Writing valid and compliant netCDF files

This page covers the basic netCDF structure and the application of the CF and ACDD conventions to netCDF files. Not all the rules are listed here; this is only a summary of the most common ones. For a complete overview of both conventions, refer to their official documentation (see References). This summary is meant to help you set variables and attributes correctly when a file is created; for existing files, a CF checker tool can be used. We cover some of the available tools on the next page.

Warning

While CF checker programs are useful for a quick assessment of a file, it is important to remember that:

  • None of them can check all the requirements and recommendations. They might miss rules for attributes that are rarely used or rules that are too complex to code.

  • A file might satisfy all the rules and still contain incorrect information, e.g., all variables have units defined, but some of the units are wrong.

Naming conventions

There are no recognised vocabularies for file and variable names. However, the ESGF created its own variable definition tables for the CMIP output, and these are often used by other projects. The CF conventions have only some basic requirements and recommendations on how to form names.

Requirements

  • NetCDF files are required to have the file name extension .nc.

  • The dimensions of a variable must all have different names.

  • If the external_variables attribute is used, the variables it names must not be present in the file.

Recommendations

  • Variable, dimension and attribute names should begin with a letter and be composed of letters, digits, and underscores.

  • No two variable names should be identical when case is ignored.

Warning

Attribute names commencing with underscore (‘_’) are reserved for use by the netCDF library.
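
The naming rules above are simple enough to check before a file is written. The following is a minimal sketch in plain Python, assuming the names have already been collected into a list; check_names and NAME_PATTERN are hypothetical helpers, not part of any netCDF library. The same pattern is applied to every name, although the underscore reservation strictly concerns attribute names.

```python
import re

# Recommended form: a letter first, then letters, digits or underscores.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def check_names(names):
    """Return a list of human-readable problems found in `names`."""
    problems = []
    seen = {}  # lower-cased name -> first spelling encountered
    for name in names:
        if name.startswith("_"):
            problems.append(f"{name!r} starts with '_' (reserved for the netCDF library)")
        elif not NAME_PATTERN.fullmatch(name):
            problems.append(f"{name!r} should start with a letter and use only "
                            "letters, digits and underscores")
        key = name.lower()
        if key in seen and seen[key] != name:
            problems.append(f"{name!r} differs from {seen[key]!r} only by case")
        seen.setdefault(key, name)
    return problems


# Illustrative names only; "_my_flag" stands for a user-defined attribute.
problems = check_names(["tas", "Tas", "2m_temp", "_my_flag", "wind-speed"])
for line in problems:
    print(line)
```

A list of names that follows the recommendations, such as ["time", "lat", "lon"], returns an empty list.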

Coordinate system

A coordinate system is described in netCDF files through the combination of its dimensions, coordinates and projections. A coordinate variable is a one-dimensional variable that represents a dimension and shares its name. An auxiliary coordinate variable is an additional or alternative coordinate for an axis. Finally, a scalar coordinate variable is a coordinate with no dimension (i.e., of size one).

The most common dimensions and coordinates are time, latitude, longitude and the vertical coordinate (depth or height). These have a special attribute axis:

Requirements

  • The only legal values of axis are X, Y, Z, and T (case insensitive).

  • The only legal values for the positive attribute, associated with the Z axis, are up or down (case insensitive).

  • The axis attribute must be consistent with the coordinate type deduced from the attributes units and positive.

  • The axis attribute is not allowed for auxiliary coordinate variables.

  • A data variable must not have more than one coordinate variable with a particular value of the axis attribute.
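
Some of these requirements can be sketched as a small check. The example below assumes the attributes of the coordinate variables of a single data variable have been read into plain dictionaries; check_axis_attributes is a hypothetical helper. It does not cover consistency with units, nor the rule about auxiliary coordinates.

```python
def check_axis_attributes(coords):
    """Check axis and positive values for the coordinates of one data variable.

    `coords` maps each coordinate variable name to its attribute dictionary.
    """
    problems = []
    axes_seen = {}  # upper-cased axis value -> first coordinate using it
    for name, attrs in coords.items():
        axis = attrs.get("axis")
        if axis is not None:
            key = axis.upper()  # legal values are case insensitive
            if key not in {"X", "Y", "Z", "T"}:
                problems.append(f"{name}: axis={axis!r} is not one of X, Y, Z, T")
            elif key in axes_seen:
                problems.append(f"{name}: axis {key} is already used by {axes_seen[key]}")
            else:
                axes_seen[key] = name
        positive = attrs.get("positive")
        if positive is not None and positive.lower() not in {"up", "down"}:
            problems.append(f"{name}: positive={positive!r} is not 'up' or 'down'")
    return problems


# Illustrative attributes only.
example = {
    "time": {"axis": "T"},
    "lev": {"axis": "Z", "positive": "downward"},
    "lat": {"axis": "y"},
    "lat2": {"axis": "Y"},
    "lon": {"axis": "W"},
}
for line in check_axis_attributes(example):
    print(line)
```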

Recommendations

  • If any or all of the dimensions of a variable have the interpretations (as given by their units or axis attribute) of time (T), height or depth (Z), latitude (Y), or longitude (X) then those dimensions should appear in the relative order T, then Z, then Y, then X in the CDL definition corresponding to the file.

  • Any dimensions of a variable other than space and time dimensions should be added “to the left” of the space and time dimensions as represented in CDL.
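
The ordering recommendations can be expressed as a short test. This sketch assumes each dimension of a variable has already been mapped to its axis letter, or to None when it is not a space or time dimension; follows_recommended_order is a hypothetical helper.

```python
AXIS_ORDER = "TZYX"  # recommended relative order, leftmost first


def follows_recommended_order(dim_axes):
    """Return True if the dimensions follow the recommended CDL order.

    `dim_axes` lists, in CDL order, the axis letter of each dimension,
    or None for a dimension that is neither space nor time.
    """
    # Index of the first space/time dimension (or the end of the list).
    first = next((i for i, axis in enumerate(dim_axes) if axis is not None),
                 len(dim_axes))
    # Other dimensions must all sit to the left of the space/time ones.
    if any(axis is None for axis in dim_axes[first:]):
        return False
    ranks = [AXIS_ORDER.index(axis.upper()) for axis in dim_axes[first:]]
    return ranks == sorted(ranks)


print(follows_recommended_order([None, "T", "Z", "Y", "X"]))  # e.g. (member, time, lev, lat, lon)
print(follows_recommended_order(["Y", "X", "T"]))             # e.g. (lat, lon, time)
```

The first call follows the recommendation; the second does not, because time comes after latitude and longitude.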

Time coordinate

The time coordinate is defined by its units and calendar attributes. This standard definition doesn’t work for climatological statistics, because a calendar year, a calendar month and a day of the year are not well-defined units of time: their length usually depends on the calendar. For more information on how to describe climatological statistics, refer to the CF documentation.

Requirements

  • The time units of a time coordinate variable must contain a reference time.

  • The reference time of a time coordinate variable must be a legal time in the specified calendar.

Recommendations

  • A time coordinate variable should have a calendar attribute.

  • The use of a reference time in the year 0 to indicate climatological time is deprecated.

  • Year and month should not be used as units, because of the potential for mistakes and confusion.
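
Several of these rules can be checked from the attribute values alone. The sketch below assumes the attributes of the time coordinate are available as a dictionary; check_time_attributes and the regular expression are hypothetical, and the pattern is deliberately loose. Checking that the reference time is legal in the given calendar needs a calendar-aware library and is not attempted here.

```python
import re

# Loose pattern for "<unit> since <reference time>".
UNITS_PATTERN = re.compile(
    r"^\s*(?P<unit>\w+)\s+since\s+(?P<reference>\S.*)$", re.IGNORECASE
)
DISCOURAGED_UNITS = {"year", "years", "month", "months"}


def check_time_attributes(attrs):
    """Return a list of problems for the attributes of a time coordinate."""
    problems = []
    match = UNITS_PATTERN.match(attrs.get("units", ""))
    if match is None:
        problems.append("units must contain a reference time: '<unit> since <reference time>'")
    else:
        if match["unit"].lower() in DISCOURAGED_UNITS:
            problems.append(f"unit {match['unit']!r} should not be used")
        year = match["reference"].split("-")[0]
        if year.isdigit() and int(year) == 0:
            problems.append("a reference time in year 0 is deprecated")
    if "calendar" not in attrs:
        problems.append("a calendar attribute is recommended")
    return problems


print(check_time_attributes({"units": "days since 1850-01-01 00:00:00",
                             "calendar": "standard"}))
print(check_time_attributes({"units": "months since 0000-01-01"}))
```

The first call returns an empty list; the second reports the month unit, the year 0 reference and the missing calendar.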

Warning

The CF conventions follow the UDUNITS definition of a year as exactly 365.242198781 days, and of a month as exactly year/12. These are different from a calendar year and a calendar month.
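
A short calculation shows why this matters. The sketch below uses only the Python standard library and the year length quoted above; the reference date is an arbitrary example, and Python's datetime uses the proleptic Gregorian calendar.

```python
from datetime import datetime, timedelta

UDUNITS_YEAR_DAYS = 365.242198781            # value quoted above
UDUNITS_MONTH_DAYS = UDUNITS_YEAR_DAYS / 12  # a month is exactly year/12

# "12 months since 2000-01-01" decoded with the fixed-length month.
reference = datetime(2000, 1, 1)
decoded = reference + timedelta(days=12 * UDUNITS_MONTH_DAYS)

print(f"one month = {UDUNITS_MONTH_DAYS:.6f} days")
print(f"12 months since 2000-01-01 -> {decoded}")
```

Because 2000 is a leap year with 366 days, the decoded time falls on 31 December 2000 rather than on 1 January 2001, which is what a reader thinking in calendar months would expect.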

Variable attributes

The CF Conventions provide a complete list of the variable attributes they cover.
Here we cover only the ones we believe are most critical. It is also worth pointing out that only a few attributes are required, to maintain backwards compatibility with COARDS. However, most of the highly recommended ones are expected, whenever they are applicable, for a file to be considered CF compliant.

Warning

In xarray, variable attributes are kept across operations depending on the value of the keep_attrs argument. By default they are kept only in unambiguous circumstances. If keep_attrs is set to True, attributes are always kept; if it is set to False, they are always discarded.
It is important to be aware of this because, particularly for coordinates, units and cell_methods are easily changed by calculations. Inherited attributes often become meaningless or, worse, cause issues if they are not updated. The same can happen with other software.

Global attributes

Global attributes are the ones that apply to the entire file. They are useful to record provenance: the operations applied to the file, the data sources and software used to generate the data, and any parties involved. They are also used at the publication stage, when conventions like ACDD build on CF to add publication-related information such as DOI, main contact, license and references.
While global attributes are most useful when sharing data, for example to increase discoverability, using some key ones from the moment a file is created is important to keep track of its history. Often this level of information is neglected when saving data to a file, which can make it hard, if not impossible, to reconstruct the analysis workflow.
Also, as global attributes are preserved during most analysis operations, output files can end up containing information from the source file that is no longer relevant to the new data.
The institution, source, references, and comment attributes can also be assigned to individual variables; in such cases, the variable version takes precedence.

Requirements

  • The title, history, institution, source, references, and comment attributes must all be of type string.
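
As an illustration of keeping the history attribute up to date from the start, here is a minimal sketch that treats the global attributes as a plain dictionary. non_string_globals and append_history are hypothetical helpers, and the timestamp format is an assumption; CF only recommends that each history line begin with a timestamp.

```python
from datetime import datetime, timezone

STRING_GLOBALS = ("title", "history", "institution", "source", "references", "comment")


def non_string_globals(global_attrs):
    """Return the names of the attributes listed above that are not strings."""
    return [name for name in STRING_GLOBALS
            if name in global_attrs and not isinstance(global_attrs[name], str)]


def append_history(global_attrs, action, now=None):
    """Return a copy of `global_attrs` with a timestamped line added to history."""
    now = now or datetime.now(timezone.utc)
    entry = f"{now:%Y-%m-%dT%H:%M:%SZ}: {action}"
    updated = dict(global_attrs)  # leave the caller's dictionary untouched
    previous = updated.get("history", "")
    updated["history"] = f"{previous}\n{entry}" if previous else entry
    return updated


# Illustrative values only.
attrs = {"title": "Example dataset", "references": 42}
print(non_string_globals(attrs))
attrs = append_history(attrs, "regridded to a regular 1 degree grid")
print(attrs["history"])
```

Resetting attributes inherited from a source file, such as title or source, at the same point in the workflow avoids carrying over information that no longer describes the new data.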

References
A useful summarised version of requirements and recommendations for CF conventions.