Introduction

This book, ‘Climate Dataset Guidelines’, provides guidelines for creating, managing, sharing and publishing climate data. With a focus on Australian climate research, it has been designed to enable a common approach across the community. Although the focus is on climate data, many of the principles and tools described here also apply to other Earth science data applications.

These guidelines are applicable to all levels of data sharing, from peer-to-peer sharing with colleagues to full publication in a public repository.

If you intend to share your data with colleagues for simple purposes, such as sanity checking and comparisons, some level of provenance and metadata preparation is likely to be useful, so feel free to pick and choose the parts of this book that are relevant for your use case. However, if your data is to be published or shared widely/publicly, it should follow FAIR principles and meet the minimum set of formatting and metadata recommendations as described in this book.

Creating climate data

Creating datasets that are fit for publication can be a daunting process, with many technical aspects to consider. This section will outline the ways to ensure that your data meets the broad FAIR principles of data sharing, along with specific requirements and recommendations for climate-related data that will aid in the use and reproducibility of your data by the Australian, and international, climate community.

Publishing climate data

Publishing data involves uploading it to an accessible repository and having a persistent identifier (e.g., a DOI) attached, in a similar manner to publishing a research paper. This data can then be used and referenced by other researchers, increasing the scientific impact your analysis-ready data can have. Additionally, many journals now require that the data on which a paper is based be published in this manner. In this section, the various pathways to data publication are presented, with a focus on recommendations for those at Australian research institutions.

Managing climate data

When you publish a dataset or maintain one or more dataset replicas, it is important to have a well-defined management plan before sharing the data with others. These guidelines cover different aspects of managing data, using the NCI facility as a concrete example. These include organising and managing a data project, documentation, provenance, and enabling data access and discovery. As with all scientific outputs, errors and inconsistencies can be found in climate research data, often after publication. This section also provides guidance on updating published data, along with recommendations for actions that can be taken before the initial publication (such as good versioning practices) to make this process much easier down the line, as published data may need to be updated many times in its lifespan.

Retiring climate data

Data doesn’t last forever, usually becoming outdated or obsolete within 5-10 years; this is simply the nature of scientific research. In this section, recommendations are presented on how to go about retiring a dataset, whether published or replicated, without breaking citations, removing identifiers, or causing disruption to users, while retaining the value of your research data.

Creating climate data products

Climate data is often used in other research fields, in government initiatives and by private stakeholders for a variety of applications. Adapting and packaging climate data so that it is useful to a wider audience, with different backgrounds and/or different purposes, is more complex than simply sharing data with other climate researchers. At the moment we provide only an overview of what this section aims to cover. We welcome input and collaboration from people who have relevant experience or would like to propose use cases to cover.