Introduction

This book, ‘Climate Dataset Guidelines’, provides guidelines for creating, managing, sharing and publishing climate data. With a focus on Australian climate research, it has been designed to enable a common approach across the community. Although the focus is on climate data, many of the principles and tools described here also apply to other Earth science data.

These guidelines are applicable to all levels of data sharing, from peer-to-peer sharing with colleagues to full publication in a public repository.

If you intend to share your data with colleagues for simple purposes, such as sanity checking and comparisons, some level of provenance and metadata preparation is likely to be useful, so feel free to pick and choose the parts of this book that are relevant to your use case. However, if your data is to be published or shared widely or publicly, it should follow the FAIR principles (Findable, Accessible, Interoperable, Reusable) and meet the minimum set of formatting and metadata recommendations described in this book.

Creating climate data

Creating datasets that are fit for publication can be a daunting process, with many technical aspects to consider. This section outlines how to ensure that your data meets the broad FAIR principles of data sharing. It also sets out specific requirements and recommendations for climate-related data that will help the Australian and international climate communities use your data and reproduce your results.

Publishing climate data

Publishing data involves uploading it to an accessible repository and attaching a persistent identifier (e.g., a DOI), much as a research paper is published. The data can then be used and cited by other researchers, increasing the scientific impact of your analysis-ready data. Additionally, many journals now require that the data on which a paper is based be published in this manner. This section presents the various pathways to data publication, with a focus on recommendations for those at Australian research institutions.

Managing climate data

When you publish a dataset or maintain one or more dataset replicas, it is important to have a well-defined management plan before sharing the data with others. These guidelines cover the different aspects of managing data, using the NCI facility as a concrete example. These aspects include organising and managing a data project, documentation, provenance, and enabling data access and discovery. As with all scientific outputs, errors and inconsistencies can be found in climate research data, often after publication, and published data may need to be updated many times over its lifespan. This section therefore also provides guidance on updating published data, along with actions that can be taken before the initial publication (such as good versioning practices) to make this process much easier later on.

Retiring climate data

Data does not last forever: datasets usually become outdated or obsolete within 5 to 10 years, which is simply the nature of scientific research. This section presents recommendations on how to retire a dataset, whether published or replicated, without breaking citations, removing identifiers or disrupting users, while retaining the value of your research data.