Writing valid and compliant netCDF files

This page covers the basic netCDF structure and the application of the CF and ACDD conventions to netCDF files. Not all the rules are listed here; this is a summary of the most common ones. For a complete overview of both conventions, refer to their official documentation (see references). This summary is meant to help you set variables and attributes correctly at file creation; for existing files, a CF checker tool can be used. We cover some available tools on the next page.

Warning

While CF checker programs are useful for a quick assessment of a file, it is important to remember that:

None of them can check all the requirements and recommendations; they might miss rules for attributes that are used less often or that are too complex to code.
A file might satisfy all the rules but still contain incorrect information, e.g., all variables have units defined, but some of the units are wrong.

Naming conventions#

There are no recognised vocabularies for file and variable names; however, the ESGF created its own variable definition tables for the CMIP output, and these are often used by other projects. The CF conventions only have some basic requirements and recommendations on how to form the names.

Requirements

NetCDF files are required to have the file name extension .nc.
The dimensions of a variable must all have different names.
If the external_variables attribute is used, the variables it names must not be present in the file.

Recommendations

Variable, dimension and attribute names should begin with a letter and be composed of letters, digits, and underscores.
No two variable names should be identical when case is ignored.
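
The two recommendations above are easy to check before a file is written. The following is a minimal sketch in plain Python; `is_recommended_name` and `case_insensitive_clashes` are hypothetical helper names used for illustration, not part of any netCDF library.

```python
import re
from collections import defaultdict

# A letter first, then any mix of letters, digits and underscores.
_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_recommended_name(name):
    """Return True if `name` follows the naming recommendation."""
    return _NAME_PATTERN.fullmatch(name) is not None


def case_insensitive_clashes(names):
    """Return groups of names that are identical when case is ignored."""
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    return [group for group in groups.values() if len(group) > 1]


# Made-up variable names, for illustration only.
variable_names = ["tas", "TAS", "sea_level", "2m_temperature", "_private"]
bad_names = [n for n in variable_names if not is_recommended_name(n)]
clashes = case_insensitive_clashes(variable_names)
print(bad_names)
print(clashes)
```

Here the names starting with a digit or an underscore are flagged, and `tas` and `TAS` are reported as a clash.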

Warning

Attribute names commencing with underscore (‘_’) are reserved for use by the netCDF library.

Coordinate system#

A coordinate system is described in netCDF files through the combination of its dimensions, coordinates and projections. A coordinate variable is a one-dimensional variable representing a dimension and sharing its name. An auxiliary coordinate variable is an additional or alternative coordinate for an axis. Finally, a scalar coordinate variable is a coordinate with no dimension (i.e., of size one).

The most common dimensions and coordinates are time, latitude, longitude and the vertical coordinate (depth or height). These have a special attribute axis:

Requirements

The only legal values of axis are X, Y, Z, and T (case insensitive).
The only legal values for the positive attribute, associated with the Z axis, are up or down (case insensitive).
The axis attribute must be consistent with the coordinate type deduced from the attributes units and positive.
The axis attribute is not allowed for auxiliary coordinate variables.
A data variable must not have more than one coordinate variable with a particular value of the axis attribute.

Recommendations

If any or all of the dimensions of a variable have the interpretations (as given by their units or axis attribute) of time (T), height or depth (Z), latitude (Y), or longitude (X) then those dimensions should appear in the relative order T, then Z, then Y, then X in the CDL definition corresponding to the file.
Any dimensions of a variable other than space and time dimensions should be added “to the left” of the space and time dimensions as represented in CDL.
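
As an illustration of the ordering recommendations, the sketch below checks a variable's dimensions as they would be listed left to right in CDL. It is a simplified check under stated assumptions: `check_axis_order` is a hypothetical helper, the axis of each dimension is supplied by hand rather than deduced from units, and only the legal axis values and the relative order are tested.

```python
AXIS_ORDER = ("T", "Z", "Y", "X")  # recommended left-to-right order


def check_axis_order(dim_axes):
    """Check (dimension_name, axis) pairs listed left to right.

    `axis` is "T", "Z", "Y" or "X" (any case), or None for a dimension
    that is neither spatial nor temporal. Returns a list of problems.
    """
    problems = []
    ranks = []
    for name, axis in dim_axes:
        if axis is None:
            # Other dimensions belong to the left of space and time.
            if ranks:
                problems.append(f"{name}: should be left of space/time dimensions")
            continue
        if axis.upper() not in AXIS_ORDER:
            problems.append(f"{name}: illegal axis value {axis!r}")
            continue
        ranks.append(AXIS_ORDER.index(axis.upper()))
    if len(set(ranks)) != len(ranks):
        problems.append("an axis value is used by more than one dimension")
    if ranks != sorted(ranks):
        problems.append("space and time dimensions are not in T, Z, Y, X order")
    return problems


good = check_axis_order(
    [("member", None), ("time", "T"), ("lev", "z"), ("lat", "Y"), ("lon", "X")]
)
bad = check_axis_order([("lat", "Y"), ("time", "T"), ("member", None)])
print(good)
print(bad)
```

The first call reports no problems; the second reports both the misplaced `member` dimension and the wrong relative order of `lat` and `time`.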

Time coordinate#

The time coordinate is defined by its units and calendar attributes. The standard time definition doesn’t work for climatological statistics, because a calendar year, a month and a day of the year are not well-defined units of time: their length usually changes depending on the calendar. For more information on how to describe climatological statistics, refer to the CF documentation.

Requirements

The time units of a time coordinate variable must contain a reference time.
The reference time of a time coordinate variable must be a legal time in the specified calendar.

Recommendations

A time coordinate variable should have a calendar attribute.
The use of a reference time in the year 0 to indicate climatological time is deprecated.
Year and month should not be used as units, because of the potential for mistakes and confusion.
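
Some of the rules above can be checked directly on the attributes of a time coordinate. This is a minimal sketch in plain Python: `check_time_attributes` is a hypothetical helper, the attributes are passed as a plain dictionary, and the check does not verify that the reference time is legal in the given calendar. It also flags any reference time in year 0, which is stricter than the recommendation itself.

```python
import re

# Matches "<unit> since <reference time>", e.g. "days since 1970-01-01".
_TIME_UNITS = re.compile(r"\s*(?P<unit>\w+)\s+since\s+(?P<reference>\S.*)")
_DISCOURAGED_UNITS = {"year", "years", "month", "months"}


def check_time_attributes(attrs):
    """Return a list of problems found in a time coordinate's attributes."""
    problems = []
    match = _TIME_UNITS.fullmatch(attrs.get("units", ""))
    if match is None:
        problems.append("units must contain a reference time")
    else:
        if match["unit"].lower() in _DISCOURAGED_UNITS:
            problems.append("year and month should not be used as units")
        # Assumption: the reference time starts with the year.
        year = re.match(r"[+-]?\d+", match["reference"])
        if year is not None and int(year.group()) == 0:
            problems.append("a reference time in year 0 is deprecated")
    if "calendar" not in attrs:
        problems.append("a calendar attribute is recommended")
    return problems


ok = check_time_attributes(
    {"units": "days since 1970-01-01 00:00:00", "calendar": "standard"}
)
no_reference = check_time_attributes({"units": "days"})
discouraged = check_time_attributes(
    {"units": "months since 0000-01-01", "calendar": "noleap"}
)
print(ok, no_reference, discouraged, sep="\n")
```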

Warning

The CF conventions follow the UDUNITS definition of a year as exactly 365.242198781 days, and of a month as exactly year/12. These are different from a calendar year and a calendar month.
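
A short calculation shows how far a UDUNITS month is from a calendar month. The sketch below uses only the Python standard library and the year length quoted above; the reference date of 1 January 2000 is an arbitrary choice for illustration.

```python
from datetime import datetime, timedelta

UDUNITS_YEAR_DAYS = 365.242198781
UDUNITS_MONTH_DAYS = UDUNITS_YEAR_DAYS / 12  # roughly 30.44 days

reference = datetime(2000, 1, 1)
# What "1 month since 2000-01-01" means with the UDUNITS month length.
after_one_udunits_month = reference + timedelta(days=UDUNITS_MONTH_DAYS)
# What a reader usually expects: the start of the next calendar month.
start_of_next_calendar_month = datetime(2000, 2, 1)
difference = start_of_next_calendar_month - after_one_udunits_month

print(after_one_udunits_month)
print(difference)
```

One UDUNITS month after the reference lands part-way through 31 January rather than on 1 February, and the mismatch changes from month to month.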

Variable attributes#

The CF Conventions provide a complete list of the variable attributes they cover.
Here we cover only the ones we believe are most critical. It is also worth pointing out that only a few attributes are required to maintain backwards compatibility with COARDS. However, most of the highly recommended ones are expected for the files to be considered CF compliant, whenever they are applicable.

missing and valid data

CF - recommended
The _FillValue, missing_value, valid_min, valid_max, and valid_range attributes are used to indicate missing data. Missing data is allowed in data variables and auxiliary coordinate variables. Generic applications should treat the data as missing where any auxiliary coordinate variables have missing values. It is very important to define a missing-data value so that NaNs are properly interpreted by as many tools as possible.
Missing data is not allowed in coordinate variables.

_FillValue
The _FillValue attribute specifies the fill value used to pre-fill disk space allocated to the variable; it is then returned when reading values that were never written. If _FillValue is defined, it should be scalar and of the same type as the variable. If the variable is packed using the scale_factor and add_offset attributes, the _FillValue attribute should have the data type of the packed data.
If not defined the default fill value for the type of the variable is used. However, use of the default fill value for data type byte is not recommended.
Note that changing the value of this attribute won’t change previously ‘filled’ data automatically.
If valid_range is specified _FillValue should be outside of this range.

missing_value
The missing_value is not treated in any special way by the netCDF library, but it may be used by specific applications. The missing_value attribute can be a scalar or vector containing values indicating missing data. These values should all be outside the valid range.
When scale_factor and add_offset are used, the value(s) of the missing_value attribute should be specified in relation to the packed data, so that missing values can be detected before the scale_factor and add_offset are applied.
If both missing_value and _FillValue are used, they should have the same value.
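
To make the relationship between packing and missing values concrete, here is a minimal sketch in plain Python. The scale_factor, add_offset and fill value are hypothetical choices for a temperature stored as 16-bit integers; the point is that the fill value lives in the packed (integer) space and is tested before unpacking.

```python
import math

# Hypothetical packing parameters for a temperature in kelvin.
SCALE_FACTOR = 0.01
ADD_OFFSET = 273.15
FILL_VALUE = -32768  # same type as the packed data (16-bit integer)


def pack(value):
    """Convert a float to its packed integer, mapping NaN to the fill value."""
    if math.isnan(value):
        return FILL_VALUE
    return round((value - ADD_OFFSET) / SCALE_FACTOR)


def unpack(packed):
    """Detect missing data first, then apply scale_factor and add_offset."""
    if packed == FILL_VALUE:
        return math.nan
    return packed * SCALE_FACTOR + ADD_OFFSET


temperatures = [271.35, math.nan, 300.0]
packed = [pack(t) for t in temperatures]
unpacked = [unpack(p) for p in packed]
print(packed)
print(unpacked)
```

If the fill value were compared after unpacking instead, it would have to be expressed as a floating-point number and rounding could make the comparison fail.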

actual_range
The actual_range attribute is a two-element vector for numeric variables, holding the exact minimum and maximum data values; both must be within the valid_range, if specified.
If the variable is packed, the elements of the actual_range should be defined based on the unpacked data, including the type.
If the data is all missing or invalid, the actual_range attribute cannot be used.

valid_range, valid_min, valid_max
The valid_range attribute is mutually exclusive with valid_min and valid_max attributes. If none of these are defined, software applications will use _FillValue and the variable type to try to determine a valid range.
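
The consistency rules for these attributes can be collected in a small check. The sketch below is an illustration only: `check_range_attributes` is a hypothetical helper, attributes are passed as a plain dictionary, missing_value is assumed to be a scalar, and the variable is assumed not to be packed, so that all values are directly comparable.

```python
def check_range_attributes(attrs):
    """Return a list of inconsistencies among missing/valid data attributes."""
    problems = []
    has_range = "valid_range" in attrs
    if has_range and ("valid_min" in attrs or "valid_max" in attrs):
        problems.append("valid_range cannot be combined with valid_min or valid_max")
    if has_range:
        low, high = attrs["valid_range"]
    else:
        low, high = attrs.get("valid_min"), attrs.get("valid_max")

    def is_valid(value):
        return (low is None or value >= low) and (high is None or value <= high)

    bounded = low is not None or high is not None
    fill = attrs.get("_FillValue")
    if fill is not None and bounded and is_valid(fill):
        problems.append("_FillValue should be outside the valid range")
    missing = attrs.get("missing_value")
    if fill is not None and missing is not None and missing != fill:
        problems.append("missing_value and _FillValue should have the same value")
    for value in attrs.get("actual_range", ()):
        if not is_valid(value):
            problems.append(f"actual_range value {value} is outside the valid range")
    return problems


consistent = check_range_attributes({
    "_FillValue": -999.0,
    "missing_value": -999.0,
    "valid_range": (0.0, 100.0),
    "actual_range": (2.5, 97.0),
})
inconsistent = check_range_attributes({
    "_FillValue": 50.0,
    "missing_value": -1.0,
    "valid_range": (0.0, 100.0),
    "valid_min": 0.0,
    "actual_range": (-5.0, 120.0),
})
print(consistent)
print(inconsistent)
```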

Warning

In xarray, variable attributes are kept across operations depending on the value of the keep_attrs argument. By default, they are kept only in unambiguous circumstances. If keep_attrs is set to True, the attributes are always kept; if it is set to False, they are always discarded.
It is important to be aware of this: units and cell_methods, particularly for coordinates, are easily changed by calculations, and inherited attributes often become meaningless or, worse, cause issues if they are not updated. The same can happen with other software.

Global attributes#

Global attributes are the ones that apply to the entire file. They are useful to record provenance: the operations applied to the file, the data sources and software used to generate the data, and any parties involved. They are also used at the publication stage, when conventions like ACDD build on the CF ones to add publication-related information, such as DOI, main contact, license and references.
While global attributes are the most useful when sharing data, for example to increase discoverability, using some key ones from the start of the file creation is important to keep track of the file history. Often this level of information is neglected when saving data to a file, which can make it hard if not impossible to reconstruct the analysis workflow.
Also, as global attributes are preserved during most analysis operations, output files end up containing information from the source file which is usually not relevant to the new data.
The institution, source, references, and comment attributes can also be assigned to individual variables; in such cases, the variable version takes precedence.

Requirements

The title, history, institution, source, references, and comment attributes are all type string.
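
As a sketch of how provenance can be recorded from the start, the example below checks the string requirement and appends a timestamped line to the history attribute. It uses a plain dictionary to stand in for the file's global attributes; `non_string_attributes` and `append_history` are hypothetical helpers, and the timestamp layout and the example command are assumptions, not a format fixed by this page.

```python
from datetime import datetime, timezone

STRING_ATTRIBUTES = ("title", "history", "institution", "source", "references", "comment")


def non_string_attributes(global_attrs):
    """List the attributes that are present but are not strings."""
    return [
        name for name in STRING_ATTRIBUTES
        if name in global_attrs and not isinstance(global_attrs[name], str)
    ]


def append_history(global_attrs, entry, when=None):
    """Add a timestamped line describing an operation to `history`."""
    # Assumption: `when`, if given, is already expressed in UTC.
    when = when or datetime.now(timezone.utc)
    line = f"{when.strftime('%Y-%m-%dT%H:%M:%SZ')}: {entry}"
    previous = global_attrs.get("history", "")
    global_attrs["history"] = f"{previous}\n{line}" if previous else line
    return global_attrs


attrs = {"title": "Monthly mean temperature", "source": 42}
wrong_type = non_string_attributes(attrs)
append_history(
    attrs,
    "python monthly_mean.py input.nc output.nc",  # made-up command
    when=datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
)
print(wrong_type)
print(attrs["history"])
```

Attributes inherited from a source file can be reviewed in the same place, so that outdated information is removed or updated before the new file is written.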

References
A useful summarised version of requirements and recommendations for CF conventions.
