Guidelines to manage the retirement of a dataset

Sometimes it becomes necessary to “retire” a dataset. This page covers some of the cases and considerations that come into play.

For the purposes of scientific reproducibility, in an ideal world we would never need to retire data. However, data centre storage is finite and datasets sometimes contain critical errors, either of which can necessitate removing data from a publication or filesystem. When doing so, it is important to follow a procedure that ensures users are not adversely affected by the changes. Ideally, a dataset retirement procedure should be established and advertised before the data is even made available. This procedure will vary depending on each dataset’s characteristics and the community’s needs. The most important factor is whether you are the primary data publisher or are managing a replica.

Scope of the guidelines

This page is intended to provide general guidance on:

published data that has become obsolete
published data that contains a serious error
replicated data that has become obsolete
data that was never published but needs to be archived or replaced

The process may be similar in many cases, but it is important to consider how others may be using a dataset before acting, as well as any maintenance obligations associated with having published the data. For simplicity, when referring to a published dataset in the guidelines we are referring to a dataset for which you are the primary publisher.

Index

General principles of data retirement
Use case: published dataset
Use case: replica dataset
Use case: unpublished (e.g. model output)
References and further information
