Data availability statement

A Data Availability Statement is a brief document that details where the supporting data are available and how they can be accessed and reused. It should list any specific restrictions. If the data underlying the study cannot be made available or reused, the Data Availability Statement should describe as completely as possible the process needed to reproduce the results of the study.

A Data Availability Statement is usually required for a journal publication; however, the same concept is very useful at the completion of a project or your PhD thesis, especially if you are leaving your institution. In such cases, it is a good idea to also indicate what data will remain available on the local server, how it can be accessed, who will manage it from now on, and where to find more detailed information on this data, such as a data management plan.

This statement differs from a published data paper, which usually includes much more information and refers to a specific dataset. A Data Availability Statement covers only how to access the data, but it refers to all the datasets used in the study.

Examples

Here are a few examples copied from the American Meteorological Society website to show what is usually included. If you are writing the statement at the end of a project and are leaving data behind on the local server, appoint a new data custodian, put all relevant information in a data management plan, and list these in the statement too.

  1. Datasets are available in a funder-mandated or public (institutional, general, or subject-specific) repository that assigns persistent identifiers to datasets.

    • Example: “All sea ice concentration data created or used during this study are openly available from the NASA National Snow and Ice Data Center Distributed Active Archive Center at https://doi.org/10.5067/8GQ8LZQVL0VL as cited in Cavalieri et al. (1996).”

  2. Datasets published in the literature.

    • Example: “Datasets for this research are included in Anderson et al. (2019).”

  3. Datasets derived from public resources and made available with the article.

    • Example 1: “Data analyzed in this study were a re-analysis of existing data, which are openly available at locations cited in the reference section. Further documentation about data processing is available at [repository name] at [insert DOI here].”

    • Example 2: “Datasets analyzed during the current study are available in the [repository name] [identifying DOI or persistent URL] [Reference number]. These datasets were derived from the following public domain resources: [list resources and their URLs].”

  4. Data sharing not applicable (e.g., for review articles or theory-based articles).

    • Example: “No datasets were generated or analyzed during the current study.”

  5. Datasets that are restricted and not publicly available.

    • Example 1: “Due to its proprietary nature, supporting data cannot be made openly available. Further information about the data and conditions for access are available at the [repository name] at [insert DOI here].”

    • Example 2: “Due to confidentiality agreements, supporting data can only be made available to bona fide researchers subject to a non-disclosure agreement. Details of the data and how to request access are available from [data manager contact info] at [institution where data reside].”

    • Example 3: “Due to privacy and ethical concerns, neither the data nor the source of the data can be made available.”

  6. No valid data repositories exist. In rare cases where no valid data repositories are identified after reviewing all resources, including information specified on the Data Archiving Guidance page, authors must provide a transparent process for making the data available to others.

    • Example 1: “The dataset on which this paper is based is too large to be retained or publicly archived with available resources. Documentation and methods used to support this study are available from [data manager contact info] at [institution].”

    • Example 2: “The authors were unable to find a valid data repository for the data used in this study. These data are available from [data manager contact info] at [host institution].”

    • Example 3: “The numerical model simulations upon which this study is based are too large to archive or to transfer. Instead, we provide all the information needed to replicate the simulations; we used model version [V#.#]. The model code, compilation script, initial and boundary condition files, and the namelist settings are available at [DOI or permanent URL].”
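
Because a statement has to cover every dataset used in the study, it can help to keep one small record per dataset and assemble the text from those records. The following is only a minimal sketch of that idea in Python. The `DatasetRecord` class, its field names, and the sentence wording are all illustrative assumptions loosely modelled on the templates above; they are not an official AMS format or tool, and the generated text should always be checked against your journal's requirements.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetRecord:
    """One dataset used in the study (hypothetical structure, illustrative field names)."""
    name: str                          # short description of the data
    location: str                      # repository name, or contact info for restricted data
    identifier: Optional[str] = None   # DOI or persistent URL, if one exists
    restriction: Optional[str] = None  # reason the data are not openly available


def describe(record: DatasetRecord) -> str:
    """Return one sentence saying where a dataset lives and how to access it."""
    if record.restriction:
        return (
            f"Due to {record.restriction}, {record.name} cannot be made openly "
            f"available; details of how to request access are available from "
            f"{record.location}."
        )
    sentence = f"{record.name} are openly available from {record.location}"
    if record.identifier:
        sentence += f" at {record.identifier}"
    return sentence + "."


def availability_statement(records: List[DatasetRecord]) -> str:
    """Join one sentence per dataset; fall back to the 'not applicable' wording."""
    if not records:
        return "No datasets were generated or analyzed during the current study."
    return " ".join(describe(r) for r in records)


# Example records: the first reuses the repository and DOI quoted in the examples,
# the second keeps a placeholder that you would replace with real contact details.
records = [
    DatasetRecord(
        name="Sea ice concentration data",
        location=("the NASA National Snow and Ice Data Center "
                  "Distributed Active Archive Center"),
        identifier="https://doi.org/10.5067/8GQ8LZQVL0VL",
    ),
    DatasetRecord(
        name="the supporting survey data",
        location="[data manager contact info]",
        restriction="confidentiality agreements",
    ),
]

print(availability_statement(records))
```

Keeping the records in a list like this also makes it harder to forget a dataset when the statement is updated, and the same list could later feed a data management plan.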