New, modified, and derived datasets#

Creating new datasets from raw data#

Paola (new comments following meeting Sep23):

We discussed mentioning tools to generate or modify a netCDF file (ncdump/ncgen, NCO to modify attributes, how xarray/MATLAB “create” a netCDF file) rather than trying to re-create every possible workflow, as well as things a user should check to make sure they are following the recommendations listed in create-basics. For example, are the attributes still relevant at both the global and variable level?
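
As a starting point for those checks, a small sketch (the file name is illustrative) of inspecting global and variable-level attributes with xarray; `ncdump -h` gives the equivalent view on the command line:

```python
# A minimal sketch of checking global and variable-level attributes
# (the file name "output.nc" is illustrative only).
import xarray as xr

ds = xr.open_dataset("output.nc")

# Global attributes: do they still describe this particular file?
print(ds.attrs)

# Variable-level attributes (units, long_name, ...) for each variable.
for name, var in ds.data_vars.items():
    print(name, var.attrs)
```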

Paola:
However rare, we could cover starting from a template, such as a CDL file (i.e. an ncdump-output-style text file), as in the sketch below.
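
For example, a sketch of that template workflow, driving the netCDF utilities from Python (file names are illustrative, and ncdump/ncgen are assumed to be installed):

```python
# A sketch of starting a new file from a CDL template (file names are
# illustrative; the netCDF utilities ncdump and ncgen must be on the PATH).
import subprocess

# Write the header (dimensions, variables, attributes) of an existing file
# to an editable CDL text template.
with open("template.cdl", "w") as f:
    subprocess.run(["ncdump", "-h", "existing.nc"], stdout=f, check=True)

# ... edit template.cdl by hand: rename variables, fix attributes, etc. ...

# Generate a new (empty) netCDF file with the edited structure.
subprocess.run(["ncgen", "-o", "new_file.nc", "template.cdl"], check=True)
```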

Data saved from analysis: start saving data in a reasonable default format (open/free, commonly used, etc.) and, if netCDF, with sensible chunking and compression (see the sketch after these points).

Use the least complicated dimensions possible, taking into account how the data will be used rather than dumping everything as-is.

Introduce descriptive names for variables early on, and follow naming conventions where applicable.

Include units where applicable.

At an initial level, add some global attributes/metadata to the file to keep track of the workflow and describe what is in the file.
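
A minimal sketch pulling these points together, assuming a simple gridded field produced by an analysis script (all names, values, and chunk sizes are illustrative):

```python
# Illustrative sketch of saving analysis output to netCDF with xarray:
# descriptive names, units, global metadata, and compression/chunking
# chosen at write time. All names and values are placeholders.
import datetime
import numpy as np
import pandas as pd
import xarray as xr

# Example data: daily precipitation on a coarse lat/lon grid.
time = pd.date_range("2000-01-01", periods=365, freq="D")
lat = np.arange(-90.0, 90.1, 2.5)
lon = np.arange(0.0, 360.0, 2.5)
precip = np.random.rand(time.size, lat.size, lon.size).astype("float32")

ds = xr.Dataset(
    data_vars={
        # Descriptive variable name plus units and long_name attributes.
        "precipitation": (
            ("time", "lat", "lon"),
            precip,
            {"units": "mm day-1", "long_name": "daily precipitation"},
        ),
    },
    coords={"time": time, "lat": lat, "lon": lon},
    # Global attributes describing content and workflow for provenance.
    attrs={
        "title": "Daily precipitation from my analysis (illustrative)",
        "history": f"{datetime.date.today()}: created by analysis_script.py",
        "contact": "your.name@example.org",
    },
)

# Compression and chunking are set per variable at write time.
encoding = {
    "precipitation": {
        "zlib": True,
        "complevel": 4,
        # Chunk shape chosen to suit the expected access pattern.
        "chunksizes": (30, lat.size, lon.size),
    }
}
ds.to_netcdf("precipitation_daily_2000.nc", encoding=encoding)
```

Running `ncdump -hs` on the resulting file is a quick way to confirm that the attributes, chunking, and compression came out as intended.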

Chloe:
Provenance: https://acdguide.github.io/Governance/concepts/provenance.html

Link to ACCESS Archiver

Modifying existing files in-situ#

Attributes can be edited in place with NCO tools such as ncatted, among others.
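
The same kind of in-place attribute edit can also be done from Python with the netCDF4 library, as in this sketch (file, variable, and attribute values are illustrative):

```python
# Hedged sketch of an in-place attribute edit, analogous to what ncatted
# does on the command line (names and values are illustrative).
import netCDF4

# Open in append mode ("a"/"r+") so the file is modified in place.
with netCDF4.Dataset("precipitation_daily_2000.nc", mode="a") as nc:
    # Add or overwrite a global attribute.
    nc.setncattr("institution", "Example Institution")
    # Correct a variable-level attribute, e.g. the units string.
    nc.variables["precipitation"].setncattr("units", "mm day-1")
```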

Creating derived datasets from existing/published data#

Paola:
Make sure the original attributes/documentation are still relevant.

Be particularly careful with units, cell_methods, and coordinates, which might have changed; see the sketch below.
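
A minimal sketch of those checks when deriving a time mean from an existing file (file and variable names are illustrative): the units are unchanged by the averaging, but cell_methods, the time coordinate, and the history attribute all need attention.

```python
# Illustrative sketch of updating metadata when deriving a new dataset
# from existing data (file and variable names are placeholders).
import datetime
import xarray as xr

ds = xr.open_dataset("precipitation_daily_2000.nc")

# keep_attrs carries the original variable attributes over to the result,
# so review them: some (e.g. units) may still hold, others may not.
mean_pr = ds["precipitation"].mean("time", keep_attrs=True)

# Update metadata that no longer describes the data correctly.
mean_pr.attrs["cell_methods"] = "time: mean"
mean_pr.attrs["long_name"] = "time-mean daily precipitation"

out = mean_pr.to_dataset(name="precipitation")

# Keep the original global attributes that still apply, and append to the
# history attribute so the provenance chain is preserved.
out.attrs = dict(ds.attrs)
out.attrs["history"] = (
    f"{datetime.date.today()}: time mean computed with xarray; "
    + ds.attrs.get("history", "")
)
out.to_netcdf("precipitation_timemean_2000.nc")
```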

Chloe:
Provenance: https://acdguide.github.io/Governance/concepts/provenance.html