Publishing climate data with NCI#
NCI provides web services to publish data and metadata:
a dataset catalogue based on GeoNetwork, which acts as a metadata repository describing the dataset and its associated DOI. The catalogue record is the landing page for the DOI and includes a link to access the dataset.
a THREDDS Data Server (TDS), which is a public data repository that provides access to the data itself. THREDDS offers a variety of protocols: files can be downloaded directly or accessed remotely via OPeNDAP.
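As a rough illustration of the two access routes, the sketch below builds the download and OPeNDAP URLs for one file. The `fileServer` and `dodsC` path segments are the usual TDS service names, but the server address and the dataset path are placeholders (assumptions, not taken from NCI), so check the actual values in the THREDDS catalogue.

```python
from urllib.parse import quote

# Placeholder server address (assumption): replace with the real THREDDS base URL.
THREDDS_BASE = "https://thredds.example.org/thredds"

# Path segments conventionally used by a THREDDS Data Server for each service.
SERVICES = {
    "opendap": "dodsC",        # remote access via OPeNDAP
    "download": "fileServer",  # plain HTTP download of the whole file
}


def thredds_url(dataset_path, service="opendap", base=THREDDS_BASE):
    """Return the URL exposing `dataset_path` through the chosen service."""
    if service not in SERVICES:
        raise ValueError(f"unknown service {service!r}, expected one of {sorted(SERVICES)}")
    # Keep "/" separators but escape any other unsafe characters in the path.
    path = quote(dataset_path.strip("/"))
    return f"{base.rstrip('/')}/{SERVICES[service]}/{path}"


# Hypothetical file path, used only for illustration.
example_path = "ks32/example_dataset/v1/tas_2000.nc"
print(thredds_url(example_path, "opendap"))
print(thredds_url(example_path, "download"))
```

The download URL fetches the whole file, while the OPeNDAP URL can be opened directly by software that supports the protocol.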
What data can be published with NCI#
Publishing with NCI is a good option for large datasets and/or data in netCDF format; obvious candidates are the outputs of model simulations run on Gadi. THREDDS was developed for netCDF files, and data published on a THREDDS server is also available via OPeNDAP, which is supported by several analysis software packages. This is useful when the dataset is large or is stored in large files, since OPeNDAP allows remote access: a user can subset the data and avoid downloading the entire dataset.
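To make the subsetting idea concrete, here is a minimal sketch of how an OPeNDAP (DAP2) request can select part of a variable by index range, using the `[start:stride:stop]` hyperslab syntax appended to the dataset URL. The URL, variable name and index ranges are hypothetical.

```python
def hyperslab(start, stop, stride=1):
    """Format one DAP2 index range; both `start` and `stop` are inclusive."""
    if start < 0 or stop < start or stride < 1:
        raise ValueError("need 0 <= start <= stop and stride >= 1")
    return f"[{start}:{stride}:{stop}]"


def subset_url(opendap_url, variable, ranges, response="ascii"):
    """Build a DAP2 request for a subset of `variable`.

    `ranges` holds one (start, stop) or (start, stop, stride) tuple per
    dimension, in the order the dimensions are stored in the file.
    `response` is the DAP2 response suffix, e.g. "ascii" or "dods".
    """
    constraint = variable + "".join(hyperslab(*r) for r in ranges)
    return f"{opendap_url}.{response}?{constraint}"


# Hypothetical OPeNDAP URL and variable: first 12 time steps, a 10x10 spatial box.
url = "https://thredds.example.org/thredds/dodsC/ks32/example_dataset/v1/tas_2000.nc"
print(subset_url(url, "tas", [(0, 11), (100, 109), (200, 209)]))
```

In practice most users never write these requests by hand. Libraries such as netCDF4 and xarray accept the OPeNDAP URL in place of a file name (provided the underlying netCDF library was built with DAP support) and issue equivalent requests when the data is indexed.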
Generic procedure to publish with NCI#
NCI is currently updating its data procedures, and its official documentation does not yet include a detailed description of the process. The following information is therefore based on our own experience.
Data project#
NCI manages storage and computational resources via projects. To publish data with NCI, a project set up exclusively for this purpose is needed. Depending on the researcher’s affiliation and/or the scope of the dataset, they might be able to contribute to an existing “publishing” data project. This is a good option if the dataset is relatively small, as NCI is unlikely to set up a project for a small dataset. An example is the CLEX collection, which uses project ks32.
If the dataset is big enough to warrant its own project, the researcher should contact the NCI data team via the helpdesk and discuss the options with them. If NCI agrees to proceed with the publication, the researcher will have to provide details of the dataset and, usually, funding for the disk storage.
Creating a GeoNetwork record#
The first step is to create a data management plan (DMP) to collect information on the dataset. Once NCI has agreed to publish the dataset, they will create a page on their Confluence site for the specified project, containing a table to be populated with the dataset information. These pages are visible only to the parties involved, so we provide an example from the publication of a satellite dataset that uses a Google spreadsheet instead. Once the DMP is ready, NCI will use its content to create a GeoNetwork record and mint a DOI for the new dataset. The GeoNetwork record will provide the landing page for the DOI and will become visible only once the files are available on THREDDS.
Preparing the files#
The actual files have to be organised in a
CF-checker
The CLEX CMS team has installed the IOOS CF-checker that NCI uses and created a simple Python wrapper that produces a report similar to the NCI one. Both can be used to check the files before submitting them to NCI. Details are on the CLEX CMS wiki.
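As an illustration only (this is not the CLEX wrapper), the sketch below assembles a call to the IOOS `compliance-checker` command-line tool for a single file. The CF version and the file names are assumptions; use whichever version NCI requests for your dataset.

```python
import shlex
import shutil
import subprocess


def cf_check_command(nc_file, cf_version="1.7", report=None):
    """Build the compliance-checker command for one netCDF file.

    `cf_version` is an assumed default; `report`, if given, is the path
    of a JSON report to write instead of printing text to the terminal.
    """
    cmd = ["compliance-checker", f"--test=cf:{cf_version}"]
    if report is not None:
        cmd += ["--format=json", f"--output={report}"]
    cmd.append(str(nc_file))
    return cmd


def run_cf_check(nc_file, **kwargs):
    """Run the check if the tool is installed, otherwise show the command."""
    cmd = cf_check_command(nc_file, **kwargs)
    if shutil.which(cmd[0]) is None:
        print("compliance-checker not found; would run:", shlex.join(cmd))
        return None
    # check=False: a failed compliance test should not raise an exception here.
    return subprocess.run(cmd, check=False).returncode


# Hypothetical file name, shown as the equivalent shell command.
print(shlex.join(cf_check_command("tas_2000.nc", report="tas_2000_cf.json")))
```

Running the check on every file before submission makes it easier to fix metadata problems in one pass rather than after NCI returns its own report.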
Existing data collections#
CLEX#
CLEX has its own data collection in project ks32; anyone associated with CLEX can publish a dataset in this collection.
The CLEX CMS wiki has plenty of information on the process. CLEX currently manages a DMP web tool, which it uses to collect information on the dataset. The relevant parts of the DMP are then used to generate a GeoNetwork XML file that NCI can upload directly, instead of populating the DMP on a Confluence page.
CSIRO#
The Australian contribution to CMIP6 is published in project fs38, and the contribution to CMIP5 in project rr3.
BoM#
The Australian Water Outlook Service Data Collection is published in project iu04.
The Australian Gridded Climate Data (AGCD) Collection is published in project zv2.
NCI#
The Reference Datasets for Climate Model Analysis/Forcing collection is published in project qv56, which includes Obs4MIPs and input4MIPs.
The MERRA2 6-hourly reanalysis data is stored in project rr7.