Guidelines to manage the retirement of a dataset

Sometimes it becomes necessary to “retire” a dataset. This page covers some of the cases and considerations that come into play.

For the purposes of scientific reproducibility, in an ideal world we would never need to retire data. However, data centres do not have unlimited storage, and datasets sometimes contain critical errors, so it can become necessary to remove data from a publication or filesystem. When doing so, it is important to follow a procedure that keeps the impact on users to a minimum. Ideally, a dataset retirement procedure should be established and advertised before the data is even made available. This procedure will vary depending on each dataset’s characteristics and the community’s needs. The most important factor is whether you are the primary data publisher or are managing a replica.

Scope of the guidelines

This page is intended to provide general guidance on handling:

  • published data that has become obsolete

  • published data that contains a serious error

  • replicated data that has become obsolete

  • data that was never published but needs to be archived or replaced

The process may be similar in many cases, but before acting it is important to consider how others may be using the dataset, as well as any maintenance obligations that come with having published the data. For simplicity, “published dataset” in these guidelines means a dataset for which you are the primary publisher.