Copyright © 1995–2008 Charlie Zender
This is the first edition of the NCO User's Guide,
and is consistent with version 2 of texinfo.tex.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. The license is available online at http://www.gnu.org/copyleft/fdl.html
The original author of this software, Charlie Zender, wants to improve it
with the help of your suggestions, improvements, bug-reports, and patches.
Charlie Zender <surname at uci dot edu> (yes, my surname is zender)
3200 Croul Hall
Department of Earth System Science
University of California, Irvine
Irvine, CA 92697-3100
Note to readers of the NCO User's Guide in HTML format:
The NCO User's Guide in PDF format
(also on SourceForge)
contains the complete NCO documentation.
The netCDF Operators, or NCO, are a suite of programs known as operators. The operators facilitate manipulation and analysis of data stored in the self-describing netCDF format, available from (http://www.unidata.ucar.edu/packages/netcdf). Each NCO operator (e.g., ncks) takes netCDF input file(s), performs an operation (e.g., averaging, hyperslabbing, or renaming), and outputs a processed netCDF file. Although most users of netCDF data are involved in scientific research, these data formats, and thus NCO, are generic and are equally useful in fields from agriculture to zoology. The NCO User's Guide illustrates NCO use with examples from the field of climate modeling and analysis. The NCO homepage is http://nco.sf.net, and there is a mirror at http://dust.ess.uci.edu/nco.
This documentation is for NCO version 3.9.6. It was last updated 9 June 2008. Corrections, additions, and rewrites of this documentation are very welcome.
Enjoy,
Charlie Zender
NCO is the result of software needs that arose while I worked
on projects funded by NCAR, NASA, and ARM.
Thinking they might prove useful as tools or templates to others,
it is my pleasure to provide them freely to the scientific community.
Many users (most of whom I have never met) have encouraged the
development of NCO.
Thanks especially to Jan Polcher, Keith Lindsay, Arlindo da Silva,
John Sheldon, and William Weibel for stimulating suggestions and
correspondence.
Your encouragement motivated me to complete the NCO User's Guide.
So if you like NCO, send me a note!
I should mention that NCO is not connected to or
officially endorsed by Unidata, ACD, ASP,
CGD, or Nike.
Charlie Zender
Major feature improvements entitle me to write another Foreword. In the last five years a lot of work has been done to refine NCO. NCO is now an open source project and appears to be much healthier for it. The list of illustrious institutions that do not endorse NCO continues to grow, and now includes UCI.
Charlie Zender
The most remarkable advances in NCO capabilities in the last few years are due to contributions from the Open Source community. Especially noteworthy are the contributions of Henry Butowsky and Rorik Peterson.
Charlie Zender
NCO has been generously supported from 2004–2008 by US National Science Foundation (NSF) grant IIS-0431203. This support allowed me to maintain and extend core NCO code, and others to advance NCO in new directions: Gayathri Venkitachalam helped implement MPI; Harry Mangalam improved regression testing and benchmarking; Daniel Wang developed the server-side capability, SWAMP; and Henry Butowsky, a long-time contributor, developed ncap2. This support also led NCO to debut in professional journals and meetings. The personal and professional contacts made during this evolution have been immensely rewarding.
Charlie Zender
This manual describes NCO, which stands for netCDF Operators. NCO is a suite of programs known as operators. Each operator is a standalone, command line program executed at the shell-level like, e.g., ls or mkdir. The operators take netCDF files (including HDF5 files constructed using the netCDF API) as input, perform an operation (e.g., averaging or hyperslabbing), and produce a netCDF file as output. The operators are primarily designed to aid manipulation and analysis of data. The examples in this documentation are typical applications of the operators for processing climate model output. This stems from their origin, though the operators are as general as netCDF itself.
The complete NCO source distribution is currently distributed
as a compressed tarfile from
http://sf.net/projects/nco
and from
http://dust.ess.uci.edu/nco/nco.tar.gz.
The compressed tarfile must be uncompressed and untarred before building
NCO.
Uncompress the file with ‘gunzip nco.tar.gz’.
Extract the source files from the resulting tarfile with ‘tar -xvf
nco.tar’.
GNU tar lets you perform both operations in one step
with ‘tar -xvzf nco.tar.gz’.
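The unpacking steps above can be sketched as a small shell helper. This is only an illustration: the helper name unpack_tarball and the throwaway archive it operates on are assumptions, not part of the NCO distribution. It uses a gunzip-to-tar pipeline, which gives the same result as the two-step recipe without relying on the ‘-z’ option of GNU tar.

```shell
# Hypothetical helper: unpack a gzip-compressed tarfile into a directory.
# Uses a gunzip|tar pipeline, so it does not depend on GNU tar's -z option.
unpack_tarball() {
    archive=$1
    dest=$2
    mkdir -p "$dest" &&
    gunzip -c "$archive" | (cd "$dest" && tar -xf -)
}

# Demonstration on a throwaway archive (not the real NCO distribution).
work=$(mktemp -d)
mkdir "$work/src"
echo "hello" > "$work/src/demo.txt"
(cd "$work" && tar -cf - src | gzip > demo.tar.gz)
unpack_tarball "$work/demo.tar.gz" "$work/out"
ls "$work/out/src"
```

For the real distribution one would call the helper on the downloaded nco.tar.gz and a destination directory of one's choice.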
The documentation for NCO is called the NCO User's Guide. The User's Guide is available in Postscript, HTML, DVI, TeXinfo, and Info formats. These formats are included in the source distribution in the files nco.ps, nco.html, nco.dvi, nco.texi, and nco.info*, respectively. All the documentation descends from a single source file, nco.texi 1. Hence the documentation in every format is very similar. However, some of the complex mathematical expressions needed to describe ncwa can only be displayed in DVI, Postscript, and PDF formats.
If you want to quickly see what the latest improvements in NCO are (without downloading the entire source distribution), visit the NCO homepage at http://nco.sf.net. The HTML version of the User's Guide is also available online through the World Wide Web at URL http://nco.sf.net/nco.html. To build and use NCO, you must have netCDF installed. The netCDF homepage is http://www.unidata.ucar.edu/packages/netcdf.
New NCO releases are announced on the netCDF list
and on the nco-announce mailing list
http://lists.sf.net/mailman/listinfo/nco-announce.
NCO has been successfully ported and tested and is known to work on the following 32- and 64-bit platforms: IBM AIX 4.x, 5.x, FreeBSD 4.x, GNU/Linux 2.x, LinuxPPC, LinuxAlpha, LinuxARM, LinuxSparc64, SGI IRIX 5.x and 6.x, MacOS X 10.x, NEC Super-UX 10.x, DEC OSF, Sun SunOS 4.1.x, Solaris 2.x, Cray UNICOS 8.x–10.x, and MS Windows95 and all later versions. If you port the code to a new operating system, please send me a note and any patches you required.
The major prerequisite for installing NCO on a particular platform is the successful, prior installation of the netCDF library (and, as of 2003, the UDUnits library). Unidata has shown a commitment to maintaining netCDF and UDUnits on all popular UNIX platforms, and is moving towards full support for the Microsoft Windows operating system (OS). Given this, the only difficulty in implementing NCO on a particular platform is standardization of various C and Fortran interface and system calls. NCO code is tested for ANSI compliance by compiling with C compilers including those from GNU (‘gcc -std=c99 -pedantic -D_BSD_SOURCE -D_POSIX_SOURCE -Wall’) 2, Comeau Computing (‘como --c99’), Cray (‘cc’), HP/Compaq/DEC (‘cc’), IBM (‘xlc -c -qlanglvl=extc99’), Intel (‘icc -std=c99’), NEC (‘cc’), PathScale (QLogic) (‘pathcc -std=c99’), PGI (‘pgcc -c9x’), SGI (‘cc -c99’), and Sun (‘cc’). NCO (all commands and the libnco library) and the C++ interface to netCDF (called libnco_c++) comply with the ISO C++ standards as implemented by Comeau Computing (‘como’), Cray (‘CC’), GNU (‘g++ -Wall’), HP/Compaq/DEC (‘cxx’), IBM (‘xlC’), Intel (‘icc’), NEC (‘c++’), PathScale (QLogic) (‘pathCC’), PGI (‘pgCC’), SGI (‘CC -LANG:std’), and Sun (‘CC -LANG:std’). See nco/bld/Makefile and nco/src/nco_c++/Makefile.old for more details and exact settings.
Until recently (and not even yet), ANSI-compliant has meant
compliance with the 1989 ISO C-standard, usually called C89 (with
minor revisions made in 1994 and 1995).
C89 lacks variable-size arrays, restricted pointers, some useful
printf formats, and many mathematical special functions.
These are valuable features of C99, the 1999 ISO C-standard.
NCO is C99-compliant where possible and C89-compliant where
necessary.
Certain branches in the code are required to satisfy the native
SGI and SunOS C compilers, which are strictly ANSI
C89 compliant, and cannot benefit from C99 features.
However, C99 features are fully supported by modern AIX,
GNU, Intel, NEC, Solaris, and UNICOS
compilers.
NCO requires a C99-compliant compiler as of NCO
version 2.9.8, released in August, 2004.
The most time-intensive portion of NCO execution is spent in
arithmetic operations, e.g., multiplication, averaging, subtraction.
These operations were performed in Fortran by default until August,
1999.
This was a design decision based on the relative speed of Fortran-based
object code vs. C-based object code in late 1994.
C compiler vectorization capabilities have dramatically improved
since 1994.
We have accordingly replaced all Fortran subroutines with C functions.
This greatly simplifies the task of building NCO on nominally
unsupported platforms.
As of August 1999, NCO built entirely in C by default.
This allowed NCO to compile on any machine with an
ANSI C compiler.
In August 2004, the first C99 feature, the restrict type
qualifier, entered NCO in version 2.9.8.
C compilers can obtain better performance with C99 restricted
pointers since they inform the compiler when it may make Fortran-like
assumptions regarding pointer contents alteration.
Subsequently, NCO requires a C99 compiler to build correctly
3.
In June 2005, NCO version 3.0.1 began to take advantage
of C99 mathematical special functions.
These include the standardized gamma function (called tgamma()
for “true gamma”).
NCO automagically takes advantage of some GNU
Compiler Collection (GCC) extensions to ANSI C.
As of July 2000 and NCO version 1.2, NCO no
longer performs arithmetic operations in Fortran.
We decided to sacrifice executable speed for code maintainability.
Since no objective statistics were ever performed to quantify
the difference in speed between the Fortran and C code,
the performance penalty incurred by this decision is unknown.
Supporting Fortran involves maintaining two sets of routines for every
arithmetic operation.
The USE_FORTRAN_ARITHMETIC flag is still retained in the
Makefile.
The file containing the Fortran code, nco_fortran.F, has been
deprecated but a volunteer (Dr. Frankenstein?) could resurrect it.
If you would like to volunteer to maintain nco_fortran.F please
contact me.
NCO has been successfully ported and tested on the Microsoft
Windows (95/98/NT/2000/XP) operating systems.
The switches necessary to accomplish this are included in the standard
distribution of NCO.
Using the freely available Cygwin (formerly gnu-win32) development
environment
4, the compilation process is very similar to
installing NCO on a UNIX system.
Set the PVM_ARCH preprocessor token to WIN32.
Note that defining WIN32 has the side effect of disabling
Internet features of NCO (see below).
NCO should now build like it does on UNIX.
The least portable section of the code is the use of standard
UNIX and Internet protocols (e.g., ftp, rcp,
scp, sftp, getuid, gethostname, and header
files <arpa/nameser.h> and
<resolv.h>).
Fortunately, these UNIX-y calls are only invoked by the single
NCO subroutine which is responsible for retrieving files
stored on remote systems (see Remote storage).
In order to support NCO on the Microsoft Windows platforms,
this single feature was disabled (on Windows OS only).
This was required by Cygwin 18.x—newer versions of Cygwin may
support these protocols (let me know if this is the case).
The NCO operators should behave identically on Windows and
UNIX platforms in all other respects.
Like all executables, the NCO operators can be built using dynamic linking. This reduces the size of the executable and can result in significant performance enhancements on multiuser systems. Unfortunately, if your library search path (usually the LD_LIBRARY_PATH environment variable) is not set correctly, or if the system libraries have been moved, renamed, or deleted since NCO was installed, it is possible NCO operators will fail with a message that they cannot find a dynamically loaded (aka shared object or ‘.so’) library. This will produce a distinctive error message, such as ‘ld.so.1: /usr/local/bin/ncea: fatal: libsunmath.so.1: can't open file: errno=2’. If you received an error message like this, ask your system administrator to diagnose whether the library is truly missing 5, or whether you simply need to alter your library search path. As a final remedy, you may re-compile and install NCO with all operators statically linked.
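If the library exists but the search path is wrong, adjusting LD_LIBRARY_PATH is usually enough. The following is a minimal sketch of one way to do that; the helper name prepend_lib_dir and the directory /usr/local/lib are assumptions chosen for illustration, so substitute the directory that actually holds the missing library.

```shell
# Hypothetical helper: prepend a directory to LD_LIBRARY_PATH only if it
# is not already present, so repeated calls do not grow the path.
prepend_lib_dir() {
    dir=$1
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$dir:"*) ;;                                # already listed
        *) LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}

# /usr/local/lib is an assumed location; use wherever your libraries live.
prepend_lib_dir /usr/local/lib
prepend_lib_dir /usr/local/lib    # second call is a no-op
echo "$LD_LIBRARY_PATH"
```

On systems that provide the ldd utility, running it on the operator's executable lists the shared libraries the operator expects, which can help identify the one that is not being found.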
netCDF version 2 was released in 1993.
NCO (specifically ncks) began soon after this in 1994.
netCDF 3.0 was released in 1996, and we were eager to reap the
performance advantages of the newer netCDF implementation.
One netCDF3 interface call (nc_inq_libvers) was added to
NCO in January, 1998, to aid in maintenance and debugging.
In March, 2001, the final conversion of NCO to netCDF3
was completed (coincidentally on the same day netCDF 3.5 was
released).
NCO versions 2.0 and higher are built with the
-DNO_NETCDF_2 flag to ensure no netCDF2 interface calls
are used.
However, the ability to compile NCO with only netCDF2
calls is worth maintaining because HDF version 4
6
(available from HDF)
supports only the netCDF2 library calls
(see http://hdf.ncsa.uiuc.edu/UG41r3_html/SDS_SD.fm12.html#47784).
Note that there are multiple versions of HDF.
Currently HDF version 4.x supports netCDF2 and thus
NCO version 1.2.x.
If NCO version 1.2.x (or earlier) is built with only
netCDF2 calls then all NCO operators should work with
HDF4 files as well as netCDF files
7.
The preprocessor token NETCDF2_ONLY exists
in NCO version 1.2.x to eliminate all netCDF3
calls.
Only versions of NCO numbered 1.2.x and earlier have this
capability.
The NCO 1.2.x branch will be maintained with bugfixes only
(no new features) until HDF begins to fully support the
netCDF3 interface (which is employed by NCO 2.x).
If, at compilation time, NETCDF2_ONLY is defined, then
NCO version 1.2.x will not use any netCDF3 calls
and, if linked properly, the resulting NCO operators will work
with HDF4 files.
The Makefile supplied with NCO 1.2.x is written
to simplify building in this HDF capability.
When NCO is built with make HDF4=Y, the Makefile
sets all required preprocessor flags and library links to build
with the HDF4 libraries (which are assumed to reside under
/usr/local/hdf4, edit the Makefile to suit your
installation).
HDF version 5 became available in 1999, but did not support netCDF (or, for that matter, Fortran) as of December 1999. By early 2001, HDF5 did support Fortran90. However, support for netCDF4 in HDF5 is incomplete. Much of the HDF5-netCDF interface is complete, however, and it may be separately downloaded from the netCDF4 website. We are eager for HDF5 to complete netCDF support. This is scheduled to occur sometime in 2007, with the releases of HDF version 1.8 and netCDF version 4, which are collaborations between Unidata and NCSA. NCO version 3.0.3 added support for reading/writing netCDF4-formatted HDF5 files in October, 2005. See Selecting Output File Format for more details.
NCO version 3.9.0 added full support for all netCDF4 atomic data types in May, 2007. Support for netCDF4 features will be incremental, i.e., we will add one netCDF4 feature at a time. You must build NCO with netCDF4 to obtain this support.
The main netCDF4 features that NCO currently supports are the new
atomic data types and Lempel-Ziv compression.
The new atomic data types are NC_UBYTE, NC_USHORT,
NC_UINT, NC_INT64, and NC_UINT64.
Eight-byte integer support is an especially useful improvement over
netCDF3.
All NCO operators support these types, e.g., ncks
copies and prints them, ncra averages them, and
ncap2 processes algebraic scripts with them.
ncks prints compression information, if any, to screen.
Lempel-Ziv deflation is a lossless compression technique. See Deflation for more details.
netCDF4-enabled NCO handles netCDF3 files without change. In addition, it automagically handles netCDF4 (HDF5) files: If you feed NCO netCDF3 files, it produces netCDF3 output. If you feed NCO netCDF4 files, it produces netCDF4 output. Use the handy-dandy ‘-4’ switch to request netCDF4 output from netCDF3 input, i.e., to convert netCDF3 to netCDF4. See Selecting Output File Format for more details.
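As a hedged illustration of the ‘-4’ switch, the sketch below converts every netCDF3 file in a directory. The helper name convert_dir_to_nc4, the DRY_RUN variable, and the _nc4.nc output suffix are all assumptions made for this example. With DRY_RUN=1 the commands are printed rather than executed, so the sketch can be tried without NCO installed.

```shell
# Hypothetical helper: convert every netCDF3 file in a directory to netCDF4
# with the '-4' switch. Output names (suffix _nc4.nc) are an assumption.
# Set DRY_RUN=1 to print the commands instead of running them.
convert_dir_to_nc4() {
    dir=$1
    for fl_in in "$dir"/*.nc; do
        [ -e "$fl_in" ] || continue          # skip if the glob matched nothing
        case "$fl_in" in
            *_nc4.nc) continue ;;            # skip outputs of an earlier run
        esac
        fl_out="${fl_in%.nc}_nc4.nc"
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "ncks -4 $fl_in $fl_out"
        else
            ncks -4 "$fl_in" "$fl_out"
        fi
    done
}

# Dry run on a scratch directory holding two empty placeholder files.
scratch=$(mktemp -d)
touch "$scratch/a.nc" "$scratch/b.nc"
DRY_RUN=1 convert_dir_to_nc4 "$scratch"
```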
Use appropriate caution while netCDF4 is beta software. Problems with netCDF4 and HDF libraries are still being fixed. NCO support for netCDF4 atomic types is relatively untested. Binary NCO distributions (RPMs and debs) still use netCDF3.
For now you must build NCO from source to get netCDF4 support.
Typically, one specifies the root of the netCDF4-beta
installation directory. Do this with the NETCDF4_ROOT variable.
Then use your preferred NCO build mechanism, e.g.,
export NETCDF4_ROOT=/usr/local/netcdf4 # Set netCDF4 location
cd ~/nco;./configure --enable-netcdf4 # Configure mechanism -or-
cd ~/nco/bld;make NETCDF4=Y allinone # Old Makefile mechanism
Our short term goal is to track the netCDF4-beta releases, keep the new netCDF4 atomic type support working, and iron out any problems. Our long term goal is to utilize more of the extensive new netCDF4 feature set. The next major netCDF4 feature we are likely to utilize is parallel I/O. We will enable this in the MPI netCDF operators.
We generally receive three categories of mail from users: help requests, bug reports, and feature requests. Notes saying the equivalent of "Hey, NCO continues to work great and it saves me more time every day than it took to write this note" are a distant fourth.
There is a different protocol for each type of request. The preferred etiquette for all communications is via NCO Project Forums. Do not contact project members via personal e-mail unless your request comes with money or you have damaging information about our personal lives. Please use the Forums—they preserve a record of the questions and answers so that others can learn from our exchange. Also, since NCO is government-funded, this record helps us provide program officers with information they need to evaluate our project.
Before posting to the NCO forums described below, you might first register your name and email address with SourceForge.net or else all of your postings will be attributed to "nobody". Once registered you may choose to "monitor" any forum and to receive (or not) email when there are any postings including responses to your questions. We usually reply to the forum message, not to the original poster.
If you want us to include a new feature in NCO, check first to see if that feature is already on the TODO list. If it is, why not implement that feature yourself and send us the patch? If the feature is not yet on the list, then send a note to the NCO Discussion forum.
Read the manual before reporting a bug or posting a help request. Sending questions whose answers are not in the manual is the best way to motivate us to write more documentation. We would also like to accentuate the contrapositive of this statement. If you think you have found a real bug the most helpful thing you can do is simplify the problem to a manageable size and then report it. The first thing to do is to make sure you are running the latest publicly released version of NCO.
Once you have read the manual, if you are still unable to get NCO to perform a documented function, submit a help request. Follow the same procedure as described below for reporting bugs (after all, it might be a bug). That is, describe what you are trying to do, and include the complete commands (run with ‘-D 5’), error messages, and version of NCO (with ‘-r’). Post your help request to the NCO Help forum.
If you think you used the right command when NCO misbehaves, then you might have found a bug. Incorrect numerical answers are the highest priority. We usually fix those within one or two days. Core dumps and segmentation violations receive lower priority. They are always fixed, eventually.
How do you simplify a problem that reveals a bug? Cut out extraneous variables, dimensions, and metadata from the offending files and re-run the command until it no longer breaks. Then back up one step and report the problem. Usually the file(s) will be very small, i.e., one variable with one or two small dimensions ought to suffice. Run the operator with ‘-r’ and then run the command with ‘-D 5’ to increase the verbosity of the debugging output. It is very important that your report contain the exact error messages and compile-time environment. Include a copy of your sample input file(s), or place a copy in a publicly accessible location. Post the full bug report to the NCO Project buglist.
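The two runs requested above can be written down mechanically. The sketch below only prints the commands whose output belongs in a report; the helper name bug_report_cmds and the file names in.nc and out.nc are placeholders invented for this example.

```shell
# Hypothetical helper: print the two commands a bug report should include
# output from: the operator's version ('-r') and the failing command re-run
# with verbose debugging ('-D 5').
bug_report_cmds() {
    op=$1; shift
    echo "$op -r"
    echo "$op -D 5 $*"
}

# in.nc and out.nc are placeholder names for the reduced test files.
bug_report_cmds ncks in.nc out.nc
```

When running the printed commands, redirect both standard output and standard error to a file so that the exact messages can be pasted into the report.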
Build failures count as bugs.
Our limited machine access means we cannot fix all build failures.
The information we need to diagnose, and often fix, build failures
is contained in the three files output by GNU build tools,
nco.config.log.${GNU_TRP}.foo,
nco.configure.${GNU_TRP}.foo,
and nco.make.${GNU_TRP}.foo.
The file configure.eg shows how to produce these files.
Here ${GNU_TRP} is the "GNU architecture triplet",
the chip-vendor-OS string returned by config.guess.
Please send us your improvements to the examples supplied in
configure.eg.
The regressions archive at http://dust.ess.uci.edu/nco/rgr
contains the build output from our standard test systems.
You may find you can solve the build problem yourself by examining the
differences between these files and your own.
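To make the naming scheme for the three build files concrete, here is a minimal sketch that expands the names for a given triplet. The helper name build_log_names and the example triplet are assumptions. The file configure.eg remains the authoritative recipe for actually producing the files.

```shell
# Hypothetical helper: list the three build-log names for a given triplet.
build_log_names() {
    GNU_TRP=$1
    for stem in nco.config.log nco.configure nco.make; do
        echo "${stem}.${GNU_TRP}.foo"
    done
}

# i686-pc-linux-gnu is only an example triplet; on a real system, take the
# string printed by config.guess instead.
build_log_names i686-pc-linux-gnu
```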
The main design goal is command line operators which perform useful, scriptable operations on netCDF files. Many scientists work with models and observations which produce too much data to analyze in tabular format. Thus, it is often natural to reduce and massage this raw or primary level data into summary, or second level data, e.g., temporal or spatial averages. These second level data may become the inputs to graphical and statistical packages, and are often more suitable for archival and dissemination to the scientific community. NCO performs a suite of operations useful in manipulating data from the primary to the second level state. Higher level interpretive languages (e.g., IDL, Yorick, Matlab, NCL, Perl, Python), and lower level compiled languages (e.g., C, Fortran) can always perform any task performed by NCO, but often with more overhead. NCO, on the other hand, is limited to a much smaller set of arithmetic and metadata operations than these full blown languages.
Another goal has been to implement enough command line switches so that frequently used sequences of these operators can be executed from a shell script or batch file. Finally, NCO was written to consume the absolute minimum amount of system memory required to perform a given job. The arithmetic operators are extremely efficient; their exact memory usage is detailed in Memory Requirements.
NCO was developed at NCAR to aid analysis and manipulation of datasets produced by General Circulation Models (GCMs). Datasets produced by GCMs share many features with all gridded scientific datasets and so provide a useful paradigm for the explication of the NCO operator set. Examples in this manual use a GCM paradigm because latitude, longitude, time, temperature and other fields related to our natural environment are as easy to visualize for the layman as the expert.
NCO operators are designed to be reasonably fault tolerant, so
that if there is a system failure or the user aborts the operation (e.g.,
with C-c), then no data are lost.
The user-specified output-file is only created upon successful
completion of the operation
8.
This is accomplished by performing all operations in a temporary copy
of output-file.
The name of the temporary output file is constructed by appending
.pid<process ID>.<operator name>.tmp to the
user-specified output-file name.
When the operator completes its task with no fatal errors, the temporary
output file is moved to the user-specified output-file.
Note the construction of a temporary output file uses more disk space
than just overwriting existing files “in place” (because there may be
two copies of the same file on disk until the NCO operation
successfully concludes and the temporary output file overwrites the
existing output-file).
Also, note this feature increases the execution time of the operator
by approximately the time it takes to copy the output-file.
Finally, note this feature allows the output-file to be the same
as the input-file without any danger of “overlap”.
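The write-then-move strategy described above can be imitated in a few lines of shell. This is a sketch of the general pattern, not of the C code inside NCO: the helper names tmp_name and safe_write, and the operator name demo, are assumptions made for illustration.

```shell
# Build the temporary name described above:
#   <output-file>.pid<process ID>.<operator name>.tmp
tmp_name() {
    echo "$1.pid$2.$3.tmp"
}

# Hypothetical stand-in for an operator: write to the temporary file and
# move it into place only if the write succeeded.
safe_write() {
    out=$1
    tmp=$(tmp_name "$out" "$$" demo)
    if cat > "$tmp"; then
        mv "$tmp" "$out"
    else
        rm -f "$tmp"
        return 1
    fi
}

scratch=$(mktemp -d)
echo "result" | safe_write "$scratch/out.txt"
tmp_name out.nc 1234 ncks    # prints out.nc.pid1234.ncks.tmp
```

Because the final name appears only after the move, an interrupted run leaves at most a stray temporary file and never a half-written output-file.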
Other safeguards exist to protect the user from inadvertently overwriting data. If the output-file specified for a command is a pre-existing file, then the operator will prompt the user whether to overwrite (erase) the existing output-file, attempt to append to it, or abort the operation. However, in processing large amounts of data, too many interactive questions slow productivity. Therefore NCO also implements two ways to override its own safety features, the ‘-O’ and ‘-A’ switches. Specifying ‘-O’ tells the operator to overwrite any existing output-file without prompting the user interactively. Specifying ‘-A’ tells the operator to attempt to append to any existing output-file without prompting the user interactively. These switches are useful in batch environments because they suppress interactive keyboard input.
Adding variables from one file to another is often desirable. This is referred to as appending, although some prefer the terminology merging 9 or pasting. Appending is often confused with what NCO calls concatenation. In NCO, concatenation refers to splicing a variable along the record dimension. Appending, on the other hand, refers to adding variables from one file to another 10. In this sense, ncks can append variables from one file to another file. This capability is invoked by naming two files on the command line, input-file and output-file. When output-file already exists, the user is prompted whether to overwrite, append/replace, or exit from the command. Selecting overwrite tells the operator to erase the existing output-file and replace it with the results of the operation. Selecting exit causes the operator to exit; the output-file will not be touched in this case. Selecting append/replace causes the operator to attempt to place the results of the operation in the existing output-file. See ncks netCDF Kitchen Sink.
The simplest way to create the union of two files is
ncks -A fl_1.nc fl_2.nc
This puts the contents of fl_1.nc into fl_2.nc. The ‘-A’ is optional. On output, fl_2.nc is the union of the input files, regardless of whether they share dimensions and variables, or are completely disjoint. The append fails if the input files have differently named record dimensions (since netCDF supports only one), or have dimensions of the same name but different sizes.
Users comfortable with NCO semantics may find it easier to perform some simple mathematical operations in NCO rather than higher level languages. ncbo (see ncbo netCDF Binary Operator) does file addition, subtraction, multiplication, division, and broadcasting. ncflint (see ncflint netCDF File Interpolator) does file addition, subtraction, multiplication and interpolation. Sequences of these commands can accomplish simple but powerful operations from the command line.
The most frequently used operators of NCO are probably the averagers and concatenators. Because there are so many permutations of averaging (e.g., across files, within a file, over the record dimension, over other dimensions, with or without weights and masks) and of concatenating (across files, along the record dimension, along other dimensions), there are currently no fewer than five operators which tackle these two purposes: ncra, ncea, ncwa, ncrcat, and ncecat. These operators do share many capabilities 11, but each has its unique specialty. Two of these operators, ncrcat and ncecat, are for concatenating hyperslabs across files. Two others, ncra and ncea, are for averaging hyperslabs across files 12. First, let's describe the concatenators, then the averagers.
Joining independent files together along a record dimension is called
concatenation.
ncrcat is designed for concatenating record variables, while
ncecat is designed for concatenating fixed length variables.
Consider five files, 85.nc, 86.nc,
... 89.nc each containing a year's worth of data.
Say you wish to create from them a single file, 8589.nc
containing all the data, i.e., spanning all five years.
If the annual files make use of the same record variable, then
ncrcat will do the job nicely with, e.g.,
ncrcat 8?.nc 8589.nc.
The number of records in the input files is arbitrary and can vary from
file to file.
See ncrcat netCDF Record Concatenator, for a complete description of
ncrcat.
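It may help to see exactly which names the shell hands to ncrcat when it expands the pattern 8?.nc. The sketch below uses empty placeholder files in a scratch directory; they are not valid netCDF files and only their names matter.

```shell
# Create empty placeholder files to see which names the pattern 8?.nc
# selects. These are not valid netCDF files; only the names matter here.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch 85.nc 86.nc 87.nc 88.nc 89.nc 8589.nc

# '?' matches exactly one character, so 8589.nc is not selected.
matched=$(echo 8?.nc)
echo "$matched"    # prints: 85.nc 86.nc 87.nc 88.nc 89.nc
```

Since the output name 8589.nc does not match the pattern, an output file left over from an earlier run is not mistaken for an input file.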
However, suppose the annual files have no record variable, and thus
their data are all fixed length.
For example, the files may not be conceptually sequential, but rather
members of the same group, or ensemble.
Members of an ensemble may have no reason to contain a record dimension.
ncecat will create a new record dimension (named record
by default) with which to glue together the individual files into the
single ensemble file.
If ncecat is used on files which contain an existing record
dimension, that record dimension is converted to a fixed-length
dimension of the same name and a new record dimension (named
record) is created.
Consider five realizations, 85a.nc, 85b.nc,
... 85e.nc of 1985 predictions from the same climate
model.
Then ncecat 85?.nc 85_ens.nc glues the individual realizations
together into the single file, 85_ens.nc.
If an input variable was dimensioned [lat,lon], it will
have dimensions [record,lat,lon] in the output file.
A restriction of ncecat is that the hyperslabs of the
processed variables must be the same from file to file.
Normally this means all the input files are the same size, and contain
data on different realizations of the same variables.
See ncecat netCDF Ensemble Concatenator, for a complete description
of ncecat.
ncpdq makes it possible to concatenate files along any
dimension, not just the record dimension.
First, use ncpdq to convert the dimension to be concatenated
(i.e., extended with data from other files) into the record dimension.
Second, use ncrcat to concatenate these files.
Finally, if desirable, use ncpdq to revert to the original
dimensionality.
As a concrete example, say that files x_01.nc, x_02.nc,
... x_10.nc contain time-evolving datasets from spatially
adjacent regions.
The time and spatial coordinates are time and x, respectively.
Initially the record dimension is time.
Our goal is to create a single file that joins all the
spatially adjacent regions into one single time-evolving dataset.
for idx in 01 02 03 04 05 06 07 08 09 10; do # Bourne Shell
ncpdq -a x,time x_${idx}.nc foo_${idx}.nc # Make x record dimension
done
ncrcat foo_??.nc out.nc # Concatenate along x
ncpdq -a time,x out.nc out.nc # Revert to time as record dimension
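When there are many regional files, typing the index list by hand becomes tedious. One possible way to generate the zero-padded indices is sketched below. The variable names n_fl and idx_lst are assumptions, and the loop prints the ncpdq commands rather than running them so that the sketch works without any data files.

```shell
# Generate the zero-padded indices 01..10 instead of typing them out.
# n_fl is a hypothetical variable holding the number of regional files.
n_fl=10
idx_lst=""
i=1
while [ "$i" -le "$n_fl" ]; do
    idx_lst="$idx_lst $(printf '%02d' "$i")"
    i=$((i + 1))
done
idx_lst=${idx_lst# }     # drop the leading space

for idx in $idx_lst; do
    # Print, rather than run, the ncpdq command used in the loop above
    echo "ncpdq -a x,time x_${idx}.nc foo_${idx}.nc"
done
```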
Note that ncrcat will not concatenate fixed-length variables, whereas ncecat concatenates both fixed-length and record variables along a new record variable. To conserve system memory, use ncrcat where possible.
The differences between the averagers ncra and ncea are analogous to the differences between the concatenators. ncra is designed for averaging record variables from at least one file, while ncea is designed for averaging fixed length variables from multiple files. ncra performs a simple arithmetic average over the record dimension of all the input files, with each record having an equal weight in the average. ncea performs a simple arithmetic average of all the input files, with each file having an equal weight in the average. Note that ncra cannot average fixed-length variables, but ncea can average both fixed-length and record variables. To conserve system memory, use ncra rather than ncea where possible (e.g., if each input-file is one record long). The file output from ncea will have the same dimensions (meaning dimension names as well as sizes) as the input hyperslabs (see ncea netCDF Ensemble Averager, for a complete description of ncea). The file output from ncra will have the same dimensions as the input hyperslabs except for the record dimension, which will have a size of 1 (see ncra netCDF Record Averager, for a complete description of ncra).
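The two weighting schemes can be written out explicitly. The notation here is introduced only for this illustration: there are M input files, file i holds N_i records, x_{i,r} is the value of a variable at record r of file i, and x_i(e) is the value of any single element e of a variable in file i.

```latex
% ncra: every record in every file carries equal weight
\bar{x}^{\mathrm{ncra}} =
  \frac{1}{\sum_{i=1}^{M} N_i} \sum_{i=1}^{M} \sum_{r=1}^{N_i} x_{i,r}

% ncea: every file carries equal weight, element by element
\bar{x}^{\mathrm{ncea}}(e) = \frac{1}{M} \sum_{i=1}^{M} x_i(e)
```

The first expression collapses the record dimension to size 1, while the second leaves the shape of each variable unchanged, in agreement with the description of the output dimensions above.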
ncflint can interpolate data between two files. Since no other operators have this ability, the description of interpolation is given fully on the ncflint reference page (see ncflint netCDF File Interpolator). Note that this capability also allows ncflint to linearly rescale any data in a netCDF file, e.g., to convert between differing units.
Occasionally one desires to digest (i.e., concatenate or average)
hundreds or thousands of input files.
Unfortunately, data archives (e.g., NASA EOSDIS) may not
name netCDF files in a format understood by the ‘-n loop’
switch (see Specifying Input Files) that automagically generates
arbitrary numbers of input filenames.
The ‘-n loop’ switch has the virtue of being concise,
and of minimizing the command line.
This helps keep the output file small since the command line is stored
as metadata in the history attribute
(see History Attribute).
However, the ‘-n loop’ switch is useless when there is no
simple, arithmetic pattern to the input filenames (e.g.,
h00001.nc, h00002.nc, ... h90210.nc).
Moreover, filename globbing does not work when the input files are too
numerous or their names are too lengthy (when strung together as a
single argument) to be passed by the calling shell to the NCO
operator.
When this occurs, the ANSI C-standard argc-argv
method of passing arguments from the calling shell to a C-program (i.e.,
an NCO operator) breaks down.
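A rough way to anticipate this failure is to compare the total length of the filenames with the system limit reported by getconf ARG_MAX. The following is a minimal sketch, not part of NCO; the function name args_fit is hypothetical, and the estimate is optimistic because on many systems the environment variables also count against the same limit.

```shell
# args_fit LIMIT NAME...
# Succeed if the names, each counted with one terminating byte, occupy
# no more than LIMIT bytes. ${#name} counts characters, which equals
# bytes for plain ASCII filenames.
args_fit() {
    limit=$1; shift
    total=0
    for name in "$@"; do
        total=$((total + ${#name} + 1))
    done
    [ "$total" -le "$limit" ]
}

# Typical use (the glob is expanded inside the shell, so the check
# itself is not subject to the limit):
#   args_fit "$(getconf ARG_MAX)" h?????.nc || echo "list too long"
```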
There are (at least) three alternative methods of specifying the input
filenames to NCO in environment-limited situations.
The recommended method for sending very large numbers (hundreds or
more, typically) of input filenames to the multi-file operators is
to pass the filenames with the UNIX standard input
feature, aka stdin:
# Pipe large numbers of filenames to stdin
/bin/ls | grep ${CASEID}_'......'.nc | ncecat -o foo.nc
This method avoids all constraints on command line size imposed by
the operating system.
A drawback to this method is that the history attribute
(see History Attribute) does not record the name of any input
files since the names were not passed on the command line.
This makes determining the data provenance at a later date difficult.
To remedy this situation, multi-file operators store the number of
input files in the nco_input_file_number global attribute and the
input file list itself in the nco_input_file_list global attribute
(see File List Attributes).
Although this does not preserve the exact command used to generate the
file, it does retain all the information required to reconstruct the
command and determine the data provenance.
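The filename list sent to stdin can also be produced without ls and grep. The sketch below is one possible alternative, assuming the same six-character naming pattern as the example above; list_inputs is a hypothetical helper, not an NCO command. The loop expands the glob inside the shell and never hands the full list to an external program, so the command line limit does not apply.

```shell
# list_inputs PREFIX
# Print, one per line, every file named PREFIX_??????.nc in the
# current directory.
list_inputs() {
    for fl in "$1"_??????.nc; do
        # An unmatched glob is left as a literal pattern; skip it
        [ -e "$fl" ] && printf '%s\n' "$fl"
    done
    return 0
}

# Typical use, mirroring the stdin example above:
#   list_inputs "${CASEID}" | ncecat -o foo.nc
```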
A second option is to use the UNIX xargs command. This simple example selects as input to xargs all the filenames in the current directory that match a given pattern. For illustration, consider a user trying to average millions of files which each have a six character filename. If the shell buffer can not hold the results of the corresponding globbing operator, ??????.nc, then the filename globbing technique will fail. Instead we express the filename pattern as an extended regular expression, ......\.nc (see Subsetting Variables). We use grep to filter the directory listing for this pattern and to pipe the results to xargs which, in turn, passes the matching filenames to an NCO multi-file operator, e.g., ncecat.
# Use xargs to transfer filenames on the command line
/bin/ls | grep ${CASEID}_'......'.nc | xargs -x ncecat -o foo.nc
The single quotes protect the only sensitive parts of the extended
regular expression (the grep argument), and allow shell
interpolation (the ${CASEID} variable substitution) to
proceed unhindered on the rest of the command.
xargs uses the UNIX pipe feature to append the
suitably filtered input file list to the end of the ncecat
command options.
The -o foo.nc switch ensures that the input files supplied by
xargs are not confused with the output file name.
xargs does, unfortunately, have its own limit (usually about
20,000 characters) on the size of command lines it can pass.
Give xargs the ‘-x’ switch to ensure it dies if it
reaches this internal limit.
When this occurs, use either the stdin method above, or the
symbolic link method presented next.
Even when its internal limits have not been reached, the xargs technique may not be sophisticated enough to handle all situations. A full scripting language like Perl can handle any level of complexity of filtering input filenames, and any number of filenames. The technique of last resort is to write a script that creates symbolic links between the irregular input filenames and a set of regular, arithmetic filenames that the ‘-n loop’ switch understands. For example, the following Perl script creates a monotonically enumerated symbolic link to each of up to one million .nc files in a directory. If there are 999,999 netCDF files present, the links are named 000001.nc to 999999.nc:
# Create enumerated symbolic links
/bin/ls | grep '\.nc$' | perl -e \
'$idx=1;while(<STDIN>){chomp;symlink $_,sprintf("%06d.nc",$idx++);}'
ncecat -n 999999,6,1 000001.nc foo.nc
# Remove symbolic links when finished
/bin/rm ??????.nc
The ‘-n loop’ option tells the NCO operator to
automatically generate the filenames of the symbolic links.
This circumvents any OS and shell limits on command line size.
The symbolic links are easily removed once NCO is finished.
One drawback to this method is that the history attribute
(see History Attribute) retains the filename list of the symbolic
links, rather than the data files themselves.
This makes it difficult to determine the data provenance at a later date.
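The same links can be created in the Bourne shell alone. This is a minimal sketch rather than a replacement for the Perl script; the function name link_enumerated and its two arguments are hypothetical. It assumes the source directory is given as an absolute path (so that the links resolve from anywhere) and it writes the links into a separate directory so they are never mistaken for data files.

```shell
# link_enumerated SRC_DIR LNK_DIR
# Create LNK_DIR/000001.nc, LNK_DIR/000002.nc, ... as symbolic links to
# the .nc files in SRC_DIR, in glob (sorted) order. Prints the number
# of links created.
link_enumerated() {
    src_dir=$1; lnk_dir=$2
    idx=1
    for fl in "$src_dir"/*.nc; do
        [ -e "$fl" ] || continue          # skip an unmatched glob
        ln -s "$fl" "$lnk_dir/$(printf '%06d' "$idx").nc"
        idx=$((idx + 1))
    done
    echo $((idx - 1))
}
```

Removing the link directory afterwards leaves the original data files untouched.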
Large datasets are those files that are comparable in size to the amount of random access memory (RAM) in your computer. Many users of NCO work with files larger than 100 MB. Files this large not only push the current edge of storage technology, they present special problems for programs which attempt to access the entire file at once, such as ncea and ncecat. If you work with a 300 MB file on a machine with only 32 MB of memory then you will need large amounts of swap space (virtual memory on disk) and NCO will work slowly, or even fail. There is no easy solution for this. The best strategy is to work on a machine with sufficient amounts of memory and swap space. Since about 2004, many users have begun to produce or analyze files exceeding 2 GB in size. These users should familiarize themselves with NCO's Large File Support (LFS) capabilities (see Large File Support). The next section will increase your familiarity with NCO's memory requirements. With this knowledge you may re-design your data reduction approach to divide the problem into pieces solvable in memory-limited situations.
If your local machine has problems working with large files, try running
NCO from a more powerful machine, such as a network server.
Certain machine architectures, e.g., Cray UNICOS, have special
commands which allow one to increase the amount of interactive memory.
On Cray systems, try to increase the available memory with the
ilimit command.
If you get a memory-related core dump
(e.g., ‘Error exit (core dumped)’) on a GNU/Linux system,
try increasing the process-available memory with ulimit.
The speed of the NCO operators also depends on file size.
When processing large files the operators may appear to hang, or do
nothing, for long periods of time.
In order to see what the operator is actually doing, it is useful to
activate a more verbose output mode.
This is accomplished by supplying a number greater than 0 to the
‘-D debug-level’ (or ‘--debug-level’, or
‘--dbg_lvl’) switch.
When the debug-level is nonzero, the operators report their
current status to the terminal through the stderr facility.
Using ‘-D’ does not slow the operators down.
Choose a debug-level between 1 and 3 for most situations,
e.g., ncea -D 2 85.nc 86.nc 8586.nc.
A full description of how to estimate the actual amount of memory the
multi-file NCO operators consume is given in
Memory Requirements.
Many people use NCO on gargantuan files which dwarf the memory available (free RAM plus swap space) even on today's powerful machines. These users want NCO to consume the least memory possible so that their scripts do not have to tediously cut files into smaller pieces that fit into memory. We commend these greedy users for pushing NCO to its limits!
This section describes the memory NCO requires during operation. The required memory is based on the underlying algorithms. The description below is the memory usage per thread. Users with shared memory machines may use the threaded NCO operators (see OpenMP Threading). The peak and sustained memory usage will scale accordingly, i.e., by the number of threads. Memory consumption patterns of all operators are similar, with the exception of ncap2.
The multi-file operators currently comprise the record operators, ncra and ncrcat, and the ensemble operators, ncea and ncecat. The record operators require much less memory than the ensemble operators. This is because the record operators operate on a single record (i.e., time-slice) at a time, whereas the ensemble operators retrieve the entire variable into memory. Let MS be the peak sustained memory demand of an operator, FT be the memory required to store the entire contents of all the variables to be processed in an input file, FR be the memory required to store the entire contents of a single record of each of the variables to be processed in an input file, VR be the memory required to store a single record of the largest record variable to be processed in an input file, VT be the memory required to store the largest variable to be processed in an input file, and VI be the memory required to store the largest variable which is not processed, but is copied from the initial file to the output file. All operators require MI = VI during the initial copying of variables from the first input file to the output file. This is the initial (and transient) memory demand. The sustained memory demand is that memory required by the operators during the processing (i.e., averaging, concatenation) phase which lasts until all the input files have been processed. The operators have the following memory requirements:
ncrcat requires MS <= VR.
ncecat requires MS <= VT.
ncra requires MS = 2FR + VR.
ncea requires MS = 2FT + VT.
ncbo requires MS <= 3VT (both input variables and the output variable).
ncflint requires MS <= 3VT (both input variables and the output variable).
ncpdq requires MS <= 2VT (one input variable and the output variable).
ncwa requires MS <= 8VT (see below).
Note that only variables that are processed, e.g., averaged, concatenated, or differenced, contribute to MS.
Variables which do not appear in the output file (see Subsetting Variables) are never read and contribute nothing to the memory requirements.
ncwa consumes between two and eight times the memory of a variable in order to process it. Peak consumption occurs when storing simultaneously in memory one input variable, one tally array, one input weight, one conformed/working weight, one weight tally, one input mask, one conformed/working mask, and one output variable. When invoked, the weighting and masking features contribute up to three-eighths and two-eighths of these requirements, respectively. If weights and masks are not specified (i.e., no ‘-w’ or ‘-m’ options) then ncwa requirements drop to MS <= 3VT (one input variable, one tally array, and the output variable).
The above memory requirements must be multiplied by the number of threads thr_nbr (see OpenMP Threading). If this causes problems then reduce (with ‘-t thr_nbr’) the number of threads.
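The bounds above lend themselves to a quick back-of-the-envelope calculation. The following sketch simply encodes the formulas listed in this section; the function name mem_sustained is hypothetical, and the caller is assumed to supply VT, VR, FT, and FR in a common unit (e.g., MB) estimated from the input files. For ncwa it returns the worst case of eight times VT, i.e., with both weights and masks in use.

```shell
# mem_sustained OP VT VR FT FR [THR_NBR]
# Print the sustained memory bound MS for operator OP, multiplied by
# the number of threads (default 1). All sizes share one unit.
mem_sustained() {
    op=$1; vt=$2; vr=$3; ft=$4; fr=$5; thr=${6:-1}
    case $op in
        ncrcat)       ms=$vr ;;
        ncecat)       ms=$vt ;;
        ncra)         ms=$((2 * fr + vr)) ;;
        ncea)         ms=$((2 * ft + vt)) ;;
        ncbo|ncflint) ms=$((3 * vt)) ;;
        ncpdq)        ms=$((2 * vt)) ;;
        ncwa)         ms=$((8 * vt)) ;;   # worst case, weights and masks
        *) echo "unknown operator: $op" >&2; return 1 ;;
    esac
    echo $((ms * thr))
}

# Example: VT = 100, VR = 10, FT = 300, FR = 30 (all in MB)
mem_sustained ncra 100 10 300 30      # 2*30 + 10
mem_sustained ncea 100 10 300 30 2    # (2*300 + 100) * 2 threads
```

For the same inputs, the ensemble averager needs roughly ten times the memory of the record averager, which is why ncra is recommended where it applies.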
ncap2 has unique memory requirements due to its ability to process arbitrarily long scripts of any complexity. All scripts acceptable to ncap2 are ultimately processed as a sequence of binary or unary operations. ncap2 requires MS <= 2VT under most conditions. An exception to this is when left hand casting (see Left hand casting) is used to stretch the size of derived variables beyond the size of any input variables. Let VC be the memory required to store the largest variable defined by left hand casting. In this case, MS <= 2VC.
ncap2 scripts are completely dynamic and may be of arbitrary length. A script that contains many thousands of operations may uncover a slow memory leak even though each single operation consumes little additional memory. Memory leaks are usually identifiable by their memory usage signature. Leaks cause peak memory usage to increase monotonically with time regardless of script complexity. Slow leaks are very difficult to find. Sometimes a malloc() (or new[]) failure is the only noticeable clue to their existence. If you have good reasons to believe that a memory allocation failure is ultimately due to an NCO memory leak (rather than inadequate RAM on your system), then we would be very interested in receiving a detailed bug report.
An overview of NCO capabilities as of about 2006 is in Zender, C. S. (2008), “Analysis of Self-describing Gridded Geoscience Data with netCDF Operators (NCO)”, Environ. Modell. Softw., doi:10.1016/j.envsoft.2008.03.004. This paper is also available at http://dust.ess.uci.edu/ppr/ppr_Zen08_ems.pdf.
NCO performance and scaling for arithmetic operations is described in Zender, C. S., and H. J. Mangalam (2007), “Scaling Properties of Common Statistical Operators for Gridded Datasets”, Int. J. High Perform. Comput. Appl., 21(4), 485-498, doi:10.1177/1094342007083802. This paper is also available at http://dust.ess.uci.edu/ppr/ppr_ZeM07_ijhpca.pdf.
It is helpful to be aware of the aspects of NCO design that can limit its performance:
Many features have been implemented in more than one operator and are described here for brevity. The description of each feature is preceded by a box listing the operators for which the feature is implemented. Command line switches for a given feature are consistent across all operators wherever possible. If no “key switches” are listed for a feature, then that particular feature is automatic and cannot be controlled by the user.
Availability: All operators

Availability: ncatted, ncks, ncrename
Short options: None
Long options: ‘--hdr_pad’, ‘--header_pad’
This optimization exploits the netCDF library nc__enddef()
function, which behaves differently with different versions of netCDF.
It will improve speed of future metadata expansion with CLASSIC
and 64bit netCDF files, but not necessarily with NETCDF4
files, i.e., those created by the netCDF interface to the HDF5
library (see Selecting Output File Format).
Availability: ncbo, ncea, ncecat, ncflint, ncpdq, ncra, ncrcat, ncwa
Short options: ‘-t’
Long options: ‘--thr_nbr’, ‘--threads’, ‘--omp_num_threads’
OMP_NUM_THREADS environment variable, if present, or from the
OS, if not.
NCO may modify thr_nbr according to its own internal
settings before it requests any threads from the system.
Certain operators contain hard-coded limits to the number of threads they
request.
We base these limits on our experience and common sense, and to reduce
potentially wasteful system usage by inexperienced users.
For example, ncrcat is extremely I/O-intensive so we restrict
thr_nbr <= 2 for ncrcat.
This is based on the notion that the best performance that can be
expected from an operator which does no arithmetic is to have one thread
reading and one thread writing simultaneously.
In the future (perhaps with netCDF4), we hope to
demonstrate significant threading improvements with operators
like ncrcat by performing multiple simultaneous writes.
Compute-intensive operators (ncwa and ncpdq)
are expected to benefit the most from threading.
The greatest increases in throughput due to threading will occur on
large datasets where each thread performs millions or more floating
point operations.
Otherwise, the system overhead of setting up threads may outweigh
the theoretical speed enhancements due to SMP parallelism.
However, we have not yet demonstrated that the SMP parallelism
scales well beyond four threads for these operators.
Hence we restrict thr_nbr <= 4 for all operators.
We encourage users to play with these limits (edit file
nco_omp.c) and send us their feedback.
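The limits just described amount to a simple clamp on the requested thread count. The sketch below restates them in shell for illustration only; it is not taken from nco_omp.c, whose actual logic may differ, and the function name effective_thr_nbr is hypothetical.

```shell
# effective_thr_nbr OP REQUESTED
# Print the thread count left after applying the limits described in
# the text: at most 2 for ncrcat, at most 4 for every other operator.
effective_thr_nbr() {
    op=$1; req=$2
    case $op in
        ncrcat) lmt=2 ;;
        *)      lmt=4 ;;
    esac
    if [ "$req" -gt "$lmt" ]; then
        echo "$lmt"
    else
        echo "$req"
    fi
}

effective_thr_nbr ncrcat 8   # clamped to the I/O-bound limit
effective_thr_nbr ncwa 3     # below the limit, left unchanged
```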
Once the initial thr_nbr has been modified for any operator-specific limits, NCO requests the system to allocate a team of thr_nbr threads for the body of the code. The operating system then decides how many threads to allocate based on this request. Users may keep track of this information by running the operator with dbg_lvl > 0.
By default, operators with threading attach one global attribute to any
file they create or modify.
The nco_openmp_thread_number global attribute contains the
number of threads the operator used to process the input files.
This information helps to verify that the answers with threaded and
non-threaded operators are equal to within machine precision.
This information is also useful for benchmarking.
Availability: All operators
Extended options, also called long options, are implemented using the system-supplied getopt.h header file, if possible. This provides the getopt_long function to NCO.
The syntax of short options (single letter options) is -key value (dash-key-space-value). Here, key is the single letter option name, e.g., ‘-D 2’.
The syntax of long options (multi-letter options) is --long_name value (dash-dash-key-space-value), e.g., ‘--dbg_lvl 2’ or --long_name=value (dash-dash-key-equal-value), e.g., ‘--dbg_lvl=2’. Thus the following are all valid for the ‘-D’ (short version) or ‘--dbg_lvl’ (long version) command line option.
ncks -D 3 in.nc # Short option
ncks --dbg_lvl=3 in.nc # Long option, preferred form
ncks --dbg_lvl 3 in.nc # Long option, alternate form
The second example is preferred for two reasons. First, ‘--dbg_lvl’ is more specific and less ambiguous than ‘-D’. The long option form makes scripts more self-documenting and less error-prone. Often long options are named after the source code variable whose value they carry. Second, the equals sign = joins the key (i.e., long_name) to the value in an uninterruptible text block. Experience shows that users are less likely to mis-parse commands when restricted to this form.
GNU implements a superset of the POSIX standard which allows any unambiguous truncation of a valid option to be used.
ncks -D 3 in.nc # Short option
ncks --dbg_lvl=3 in.nc # Long option, full form
ncks --dbg=3 in.nc # Long option, unambiguous truncation
ncks --db=3 in.nc # Long option, unambiguous truncation
ncks --d=3 in.nc # Long option, ambiguous truncation
The first four examples are equivalent and will work as expected. The final example will exit with an error since ncks cannot disambiguate whether ‘--d’ is intended as a truncation of ‘--dbg_lvl’, of ‘--dimension’, or of some other long option.
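The truncation rule can be pictured as a prefix search over the list of valid long options. The following is a simplified sketch of that idea, not the getopt_long implementation, which handles further cases; the function name resolve_long_opt is hypothetical and the option list passed to it is only a small sample.

```shell
# resolve_long_opt ABBREV OPTION...
# Print the single OPTION that ABBREV stands for. An exact match always
# wins. Fail (status 1, no output) if ABBREV is a prefix of no option
# or of more than one option.
resolve_long_opt() {
    abbrev=$1; shift
    match=''; count=0
    for opt in "$@"; do
        if [ "$opt" = "$abbrev" ]; then
            echo "$opt"
            return 0
        fi
        case $opt in
            "$abbrev"*) match=$opt; count=$((count + 1)) ;;
        esac
    done
    if [ "$count" -eq 1 ]; then
        echo "$match"
        return 0
    fi
    return 1
}

resolve_long_opt dbg dbg_lvl debug-level dimension   # unique prefix
resolve_long_opt d   dbg_lvl debug-level dimension \
    || echo "ambiguous truncation"
```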
NCO provides many long options for common switches. For example, the debugging level may be set in all operators with any of the switches ‘-D’, ‘--debug-level’, or ‘--dbg_lvl’. This flexibility allows users to choose their favorite mnemonic. For some, it will be ‘--debug’ (an unambiguous truncation of ‘--debug-level’), and others will prefer ‘--dbg’. Interactive users usually prefer the minimal amount of typing, i.e., ‘-D’. We recommend that scripts which are re-usable employ some form of the long options for future maintainability.
This manual generally uses the short option syntax. This is for historical reasons and to conserve space. The remainder of this manual specifies the full long_name of each option. Users are expected to pick the unambiguous truncation of each option name that most suits their taste.
Availability (‘-n’): ncea, ncecat, ncra, ncrcat
Availability (‘-p’): All operators
Short options: ‘-n’, ‘-p’
Long options: ‘--nintap’, ‘--pth’, ‘--path’
ncra 85.nc 86.nc 87.nc 88.nc 89.nc 8589.nc
ncra 8[56789].nc 8589.nc
ncra -p input-path 85.nc 86.nc 87.nc 88.nc 89.nc 8589.nc
ncra -n 5,2,1 85.nc 8589.nc
The first method (explicitly specifying all filenames) works by brute
force.
The second method relies on the operating system shell to glob
(expand) the regular expression 8[56789].nc.
The shell passes valid filenames which match the expansion to
ncra.
The third method uses the ‘-p input-path’ argument to specify
the directory where all the input files reside.
NCO prepends input-path (e.g.,
/data/usrname/model) to all input-files (but not to
output-file).
Thus, using ‘-p’, the path to any number of input files need only
be specified once.
Note input-path need not end with ‘/’; the ‘/’ is
automatically generated if necessary.
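The effect of ‘-p’ on each input filename can be sketched as a path join that supplies the separator only when it is missing. This is an illustration of the behavior described above, not NCO source code; the function name prepend_path is hypothetical.

```shell
# prepend_path INPUT_PATH FILE
# Print FILE prefixed by INPUT_PATH, inserting '/' between them only
# if INPUT_PATH does not already end with one.
prepend_path() {
    case $1 in
        */) printf '%s%s\n' "$1" "$2" ;;
        *)  printf '%s/%s\n' "$1" "$2" ;;
    esac
}

prepend_path /data/usrname/model  85.nc
prepend_path /data/usrname/model/ 86.nc
```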
The last method passes (with ‘-n’) syntax concisely describing the entire set of filenames. This option is only available with the multi-file operators: ncra, ncrcat, ncea, and ncecat. By definition, multi-file operators are able to process an arbitrary number of input-files. This option is very useful for abbreviating lists of filenames representable as alphanumeric_prefix+numeric_suffix+.+filetype where alphanumeric_prefix is a string of arbitrary length and composition, numeric_suffix is a fixed width field of digits, and filetype is a standard filetype indicator. For example, in the file ccm3_h0001.nc, we have alphanumeric_prefix = ccm3_h, numeric_suffix = 0001, and filetype = nc.
NCO is able to decode lists of such filenames encoded using the
‘-n’ option.
The simpler (3-argument) ‘-n’ usage takes the form
-n file_number,digit_number,numeric_increment
where file_number is the number of files, digit_number is
the fixed number of numeric digits comprising the numeric_suffix,
and numeric_increment is the constant, integer-valued difference
between the numeric_suffix of any two consecutive files.
The value of alphanumeric_prefix is taken from the input file,
which serves as a template for decoding the filenames.
In the example above, the encoding -n 5,2,1 along with the input
file name 85.nc tells NCO to
construct five (5) filenames identical to the template 85.nc
except that the final two (2) digits are a numeric suffix to be
incremented by one (1) for each successive file.
Currently filetype may be either empty, nc,
cdf, hdf, or hd5.
If present, these filetype suffixes (and the preceding .)
are ignored by NCO as it uses the ‘-n’ arguments to
locate, evaluate, and compute the numeric_suffix component of
filenames.
Recently the ‘-n’ option has been extended to allow convenient
specification of filenames with “circular” characteristics.
This means it is now possible for NCO to automatically
generate filenames which increment regularly until a specified maximum
value, and then wrap back to begin again at a specified minimum value.
The corresponding ‘-n’ usage becomes more complex, taking one or
two additional arguments for a total of four or five, respectively:
-n file_number,digit_number,numeric_increment[,numeric_max[,numeric_min]]
where numeric_max, if present, is the maximum integer-value of
numeric_suffix and numeric_min, if present, is the minimum
integer-value of numeric_suffix.
Consider, for example, the problem of specifying non-consecutive input
files where the filename suffixes end with the month index.
In climate modeling it is common to create summertime and wintertime
averages which contain the averages of the months June–July–August,
and December–January–February, respectively:
ncra -n 3,2,1 85_06.nc 85_0608.nc
ncra -n 3,2,1,12 85_12.nc 85_1202.nc
ncra -n 3,2,1,12,1 85_12.nc 85_1202.nc
The first example shows that three arguments to the ‘-n’ option
suffice to specify consecutive months (06, 07, 08) which do not
“wrap” back to a minimum value.
The second and third examples show how to use the optional fourth and fifth
elements of the ‘-n’ option to specify a wrap value to NCO.
The fourth argument to ‘-n’, if present, specifies the maximum
integer value of numeric_suffix.
In this case the maximum value is 12, and will be formatted as
12 in the filename string.
The fifth argument to ‘-n’, if present, specifies the minimum
integer value of numeric_suffix.
The default minimum filename suffix is 1, which is formatted as
01 in this case.
Thus the second and third examples have the same effect, that is, they
automatically generate, in order, the filenames 85_12.nc,
85_01.nc, and 85_02.nc as input to NCO.
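To check what a given ‘-n’ specification expands to before running an operator, the decoding rules can be imitated in the shell. The sketch below is an approximation written from the description in this section, not NCO's own code; the function name expand_n is hypothetical, and the wrap rule for increments larger than one (subtracting the span numeric_max - numeric_min + 1) is an assumption.

```shell
# expand_n TEMPLATE FILE_NBR DIGIT_NBR INCREMENT [MAX [MIN]]
# Print, one per line, the filenames implied by the template and the
# arguments of the '-n' option. MIN defaults to 1.
expand_n() {
    tpl=$1; fl_nbr=$2; dgt_nbr=$3; ncr=$4; max=${5:-}; min=${6:-1}
    # Separate the filetype suffix (e.g., ".nc") from the stem
    case $tpl in
        *.*) stem=${tpl%.*}; ext=.${tpl##*.} ;;
        *)   stem=$tpl; ext='' ;;
    esac
    # The last DIGIT_NBR digits of the stem form the numeric suffix
    prefix=$(printf '%s\n' "$stem" | sed "s/[0-9]\{$dgt_nbr\}\$//")
    sfx=${stem#"$prefix"}
    # Strip leading zeros so the shell does not read the suffix as octal
    val=$(printf '%s\n' "$sfx" | sed 's/^0*//')
    val=${val:-0}
    idx=0
    while [ "$idx" -lt "$fl_nbr" ]; do
        printf "%s%0${dgt_nbr}d%s\n" "$prefix" "$val" "$ext"
        val=$((val + ncr))
        # Wrap back toward MIN once MAX is exceeded (assumed rule)
        if [ -n "$max" ] && [ "$val" -gt "$max" ]; then
            val=$((val - max + min - 1))
        fi
        idx=$((idx + 1))
    done
}

expand_n 85.nc    5 2 1        # consecutive years
expand_n 85_12.nc 3 2 1 12 1   # December, January, February
```

The second call reproduces the wintertime example above: the suffix runs from 12, wraps past the maximum, and continues at 01 and 02.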
Availability: All operators
Short options: ‘-o’
Long options: ‘--fl_out’, ‘--output’
Specifying fl_out with a switch, rather than as a positional argument, allows fl_out to precede input files in the argument list. This is particularly useful with multi-file operators for three reasons. First, multi-file operators may be invoked with hundreds (or more) filenames. Visual or automatic location of fl_out in such a list is difficult when the only syntactic distinction between input and output files is their position. Second, specification of a long list of input files may be difficult (see Large Numbers of Files). Making the input file list the final argument to an operator facilitates using xargs for this purpose. Some alternatives to xargs are very ugly and undesirable. Finally, many users are more comfortable specifying output files with ‘-o fl_out’ near the beginning of an argument list. Compilers and linkers are usually invoked this way.
Availability: All operators
Short options: ‘-p’, ‘-l’
Long options: ‘--pth’, ‘--path’, ‘--lcl’, ‘--local’
To access a file via an anonymous FTP server, supply the remote file's URL. FTP is an intrinsically insecure protocol because it transfers passwords in plain text format. Users should access sites using anonymous FTP when possible. Some FTP servers require a login/password combi