FAIR(ER) Principles

Research data that is intended for sharing should follow some general principles of data management and stewardship. The guiding principles most often used in data governance are the FAIR principles of data sharing: Findable, Accessible, Interoperable, and Reusable. An extension of these is the FAIRER principles, which add Ethical and Revisable to the acronym and are described in the book ‘Good Data’. These principles are very general, and it is not always easy to translate them into relevant practices for climate data.

Findable

The primary way to make a dataset findable is to mint a persistent identifier, such as a DOI, which provides a long-lasting reference to a digital object such as climate data. The data must then be uploaded to a searchable repository, database or data catalogue, so that the data can be found and used. This is the primary aim of the Publishing climate data chapter of this book.
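As a minimal sketch of how a DOI acts as a long-lasting reference, the snippet below turns a DOI string into its `https://doi.org/` resolver URL. The function name `doi_to_url` and the example DOI are hypothetical and used only for illustration. The pattern checks only the general shape of a DOI, not whether it has actually been minted.

```python
import re

# Loose shape check for a DOI: "10.", a 4-9 digit registrant code, "/", then a suffix.
# This is an assumption for illustration; it does not confirm the DOI is registered.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Return the https://doi.org/ resolver URL for a DOI-shaped string."""
    cleaned = doi.strip()
    # Accept common prefixes such as "doi:" or an existing resolver URL.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):]
            break
    if not DOI_PATTERN.match(cleaned):
        raise ValueError(f"not a DOI-shaped string: {doi!r}")
    return f"https://doi.org/{cleaned}"


# Hypothetical DOI, not a real dataset.
print(doi_to_url("doi:10.1234/example-climate-dataset"))
```

Citing the resolver URL rather than a repository-specific address means the reference can keep working if the data later moves to a different host.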

Accessible

Accessibility of a dataset depends on where the data is stored (e.g., whether the storage location can be reached by users), how it is accessed (e.g., via FTP or HTTPS), the longevity of this storage, security and authentication protocols (if relevant), and usage restrictions dictated by the license applied to the data. What happens to a dataset at the end of its life is also a consideration: the accessibility principle says that metadata should always remain available, even if the data no longer exists. These are considerations of the Publishing climate data and Retiring published climate data chapters.

Interoperable

Interoperability refers to the ability of researchers to easily use a published dataset, especially with common tools and coding languages. The main ways to ensure the interoperability of your data are to use common and open file formats (e.g. netCDF4, GRIB, Zarr) rather than proprietary formats or those that require specialised tools, and to apply discipline-specific controlled vocabularies (e.g. CMIP6 CV) and keywords that are easily recognisable by the community. These aspects are covered in the Creating climate data and Concepts chapters. Data formats are discussed in another ACDG book called ‘Big Data’.
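To illustrate what applying a controlled vocabulary can look like in practice, here is a minimal sketch that checks metadata attributes against a set of allowed terms. The vocabulary `EXAMPLE_CV` and the function `find_cv_violations` are hypothetical. A real project would load the published vocabulary (for example the CMIP6 CV) rather than hard-code values like these.

```python
# Illustrative vocabulary only; these entries are assumptions, not an official CV.
EXAMPLE_CV = {
    "frequency": {"fx", "day", "mon", "yr"},
    "realm": {"atmos", "land", "ocean"},
}


def find_cv_violations(attrs, cv):
    """Return {attribute: problem} for attributes missing or outside the vocabulary."""
    problems = {}
    for name, allowed in cv.items():
        value = attrs.get(name)
        if value is None:
            problems[name] = "missing"
        elif value not in allowed:
            problems[name] = f"{value!r} is not one of {sorted(allowed)}"
    return problems


# "monthly" is understandable to a human but is not a term in the example vocabulary.
attrs = {"frequency": "monthly", "realm": "atmos"}
print(find_cv_violations(attrs, EXAMPLE_CV))
```

In this example only `frequency` is reported, because free-text values such as "monthly" are exactly what a shared vocabulary is meant to replace with a term that tools can match on.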

Reusable

The reusability of the data is dictated primarily by the quality of the metadata and attributes. In other words, the degree to which the metadata effectively and comprehensively describes the data determines its reusability. Details on the provenance (origin and history) of the data are vital, including how it was created, by whom, for what purpose, and using what tools. Additionally, appropriate licensing can affect usability, as a highly restrictive or unclear license can prevent others from using the data in their own research. Finally, the metadata and attributes should be in line with discipline-specific expectations and standards. This may involve applying an existing set of data standards (e.g. the CORDEX data standards), or simply ensuring that attributes, file naming and directory structures are likely to make sense to those using the data. Reusability is the main focus of the Creating climate data chapter, while licensing is explained in Concepts.
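One simple way to act on this is to check a dataset's global attributes before publishing. The sketch below assumes a hypothetical list of required attributes covering description, provenance and licensing. The names in `REQUIRED_ATTRS` and the function `missing_attributes` are illustrative choices, not a standard prescribed by this book, and should be replaced by whatever your chosen data standard requires.

```python
# Hypothetical minimum set of global attributes; adjust to the standard you follow.
REQUIRED_ATTRS = ("title", "summary", "source", "history", "license", "creator_name")


def missing_attributes(global_attrs, required=REQUIRED_ATTRS):
    """Return the required attribute names that are absent or blank."""
    return [
        name
        for name in required
        if not str(global_attrs.get(name, "")).strip()
    ]


# Example metadata with made-up values: the license is blank and
# the history attribute has not been set at all.
example_attrs = {
    "title": "Example regional rainfall analysis",
    "summary": "Daily rainfall totals on a regular grid.",
    "source": "Gauge observations, gridded with an in-house script",
    "license": "   ",
    "creator_name": "A. Researcher",
}
print(missing_attributes(example_attrs))
```

The same check could be run on the attribute dictionary of an opened file (for instance `ds.attrs` on an xarray dataset), so that gaps in provenance or licensing are caught before the data is shared.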

Ethical

As scientists, we have an ethical obligation to do no harm through our work. When it comes to data, we should ensure that the collection, distribution, and reuse of data are done in a way that is respectful toward humans and the planet. Privacy is fast becoming an important factor in the collection of data of all kinds, and FAIRER data should aim to avoid violating the privacy of others. The creation, storage and use of data require energy and resources at a cost to the environment, and the contributions of scientific activity to climate change and environmental degradation should be a consideration when performing science.

Revisable

The standard definition of published data generally considers the data to be fixed: a snapshot in time which, if improved upon, results in a new dataset with fresh considerations. In practice this is often not the case, as published data can require fixes or corrections (e.g., typos in the metadata), retraction and republication (e.g., due to errors in the data), or extensions (e.g., observational datasets that are regularly updated). We provide some general guidance on this in the chapter Managing climate data.