Data Management Plan

A Data Management Plan (DMP) is a document describing what data will be used and generated by a project, and how that data will be managed and shared. The definition and actual form of a DMP vary with the purpose the plan is created for. What you include also depends on the research field.

A DMP can be required:

  • to apply for a grant

  • by your institution for PhD and/or major projects

  • when you want to publish your data in a repository

and it is also useful:

  • to keep track of data provenance

  • to manage storage and computing resources at project level

  • whenever you share data, even if you do not intend to publish yet

Creating a Data Management Plan

It is best to create a DMP as early as you can in your project. Some institutions provide tools and/or guidance on how to create a DMP. CLEX, for example, provides CLEX_Roadmap, a DMP web tool which you can use to create, store and share DMPs, and which also offers guidance on how to structure one.

Here we distinguish three data management levels: personal, group and public.

Personal

This applies to any project exploring new ideas and/or procedures. The work is mostly conducted by a single researcher, so there are no specific sharing requirements.

At this level a formal data management plan is not necessary. However, one can still be useful to start collecting information in case the project grows, and to help plan how the research will be conducted.

At this stage a plan can help design the experiment:

  • which data will be used, whether it is locally available and whether it can be used (see licenses)

  • basic workflow description

  • bulk estimate of data volume and compute resources

  • software and support (including training) necessary to run experiment and/or post-process data
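A bulk estimate of data volume can be as simple as multiplying out the dimensions of the expected output. The sketch below assumes uncompressed gridded output; the function names and the example experiment (grid size, output frequency, number of variables) are hypothetical and only illustrate the arithmetic.

```python
def estimate_volume_bytes(nlon, nlat, nlev, ntime, nvars, bytes_per_value=4):
    """Uncompressed size of gridded output in bytes.

    bytes_per_value=4 assumes 32-bit floats; use 8 for 64-bit floats.
    """
    return nlon * nlat * nlev * ntime * nvars * bytes_per_value


def human_readable(nbytes):
    """Format a byte count using decimal units (1 GB = 10**9 bytes)."""
    for unit in ("B", "kB", "MB", "GB", "TB"):
        if nbytes < 1000:
            return f"{nbytes:.1f} {unit}"
        nbytes /= 1000
    return f"{nbytes:.1f} PB"


# Hypothetical experiment: 1-degree global grid, 20 vertical levels,
# 10 years of daily output (3650 time steps), 5 variables.
size = estimate_volume_bytes(nlon=360, nlat=180, nlev=20, ntime=3650, nvars=5)
print(human_readable(size))  # 94.6 GB
```

Compression and any temporary or intermediate files will change the real figure, so an estimate like this is best treated as an order of magnitude.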

It is also never too early to start to work on project provenance:

  • applying relevant metadata conventions, where possible, to any data produced; this is easier when done from the first analysis steps than when the metadata has to be updated afterwards

  • adopting version control for your code

  • building references to data/code sources
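One lightweight way to start recording provenance is to keep a small set of descriptive attributes alongside every file you produce. The sketch below is only an illustration: `title`, `source` and `history` are global attribute names from the CF conventions, while `code_version`, the function names and the `when` parameter are assumptions made for this example.

```python
from datetime import datetime, timezone


def history_line(command, when=None):
    """Return a timestamped description of one processing step."""
    when = when or datetime.now(timezone.utc)
    return f"{when.strftime('%Y-%m-%dT%H:%M:%SZ')}: {command}"


def provenance_attrs(title, source, command, code_version, when=None):
    """Build the attributes to store with a newly created data file."""
    return {
        "title": title,
        "source": source,  # how the data was produced, e.g. model name
        "history": history_line(command, when),
        # Not a CF attribute: a hypothetical field holding the
        # version-control identifier of the code that was run.
        "code_version": code_version,
    }


def add_step(attrs, command, when=None):
    """Return a copy of attrs with one more line added to the history."""
    updated = dict(attrs)
    updated["history"] = attrs["history"] + "\n" + history_line(command, when)
    return updated
```

Each later processing step then calls `add_step`, so the file carries a running record of how it was derived and from which version of the code.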

Group

This applies to any project where data will be shared: any group activity, but also single-researcher projects whose output could be of interest to other researchers (this is often true of model experiments).

When data is shared with others, it is important that it is accompanied by its provenance. This is so others can verify the validity of the data and also use the data in the correct way.

At this level the DMP should also include:

  • a list of the researchers responsible for the data and of all interested parties

  • a review of all information previously entered in the DMP, to make sure it is still applicable

  • a directory structure and filenames that are consistent and follow conventions wherever possible; the CF standard or other relevant standards should be adopted if not already in use

  • data reduction plan: does all model data need to be saved after analysis?

  • data retention plan: when can data be deleted?

  • backup strategy
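Consistency of filenames is easy to check automatically once the group has agreed on a pattern. The sketch below assumes a hypothetical convention, `<variable>_<experiment>_<frequency>_<start>-<end>.nc`; the pattern, the allowed frequency labels and the function names are illustrative and are not taken from any particular standard.

```python
import re

# Hypothetical group convention, for example:
#   tas_ctrl-01_day_20000101-20001231.nc
FILENAME_PATTERN = re.compile(
    r"^(?P<variable>[a-z][a-z0-9]*)_"
    r"(?P<experiment>[A-Za-z0-9-]+)_"
    r"(?P<frequency>mon|day|hr)_"
    r"(?P<start>\d{8})-(?P<end>\d{8})\.nc$"
)


def parse_filename(name):
    """Return the filename components as a dict, or None if no match."""
    match = FILENAME_PATTERN.match(name)
    return match.groupdict() if match else None


def nonconforming(names):
    """List the filenames that do not follow the agreed pattern."""
    return [name for name in names if parse_filename(name) is None]


files = ["tas_ctrl-01_day_20000101-20001231.nc", "final_v2_NEW.nc"]
print(nonconforming(files))  # ['final_v2_NEW.nc']
```

Encoding the same components in the filename and in the directory structure also makes it simpler to apply the data reduction and retention plans, since files can be selected by experiment or date range.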

If the research is part of a collaborative project:

  • a data agreement on how data will be shared/published should be in place from the start

  • any data policy and/or standard adopted by the project should be followed.

Public

This applies to the final output of the project when data is ready for publication. In addition to the requirements of the group level, the DMP should include:

  • a descriptive title and acronym

  • a version, including a versioning strategy for future updates

  • a plain English abstract

  • links to relevant documentation and code

  • license

  • keywords

  • long-term storage strategy

  • any other specific requirements of the chosen repository

A DMP documents and describes your data output just as a paper documents and describes your research output, so some tips on writing research papers apply to a DMP too. A few that we found online are collected below.
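Before submitting to a repository, the items listed above can be checked with a few lines of code. This is a minimal sketch assuming the record is held as a plain dictionary; the field names and the example record are hypothetical, and a real repository will define its own schema.

```python
# Hypothetical field names mirroring the list above.
REQUIRED_FIELDS = (
    "title", "version", "abstract", "documentation",
    "license", "keywords", "storage",
)


def missing_fields(record):
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


def keywords_in_title(record):
    """Return keywords that only repeat a word already in the title."""
    title_words = set(record.get("title", "").lower().split())
    return [kw for kw in record.get("keywords", []) if kw.lower() in title_words]


record = {
    "title": "Regional rainfall simulations",
    "version": "1.0",
    "abstract": "Daily rainfall from a hypothetical regional model run.",
    "documentation": "",
    "keywords": ["rainfall", "downscaling"],
}
print(missing_fields(record))     # ['documentation', 'license', 'storage']
print(keywords_in_title(record))  # ['rainfall']
```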

Tips

…Pitfalls include using complicated jargon, including unnecessary details, and writing for your highly specialised colleagues instead of a wider audience. …
Only abbreviations firmly established in the field are eligible, … avoiding those which are not broadly used …
… when looking for keywords, avoid words with a broad meaning and words already included in the title.
… you need to include detailed information so a knowledgeable reader can reproduce the experiment. However, do not repeat the details of established methods; use References and Supporting Materials to indicate the previously published procedures.
… indicate uses and extensions if appropriate. Moreover, you can suggest future experiments and point out those that are underway …
Effective research articles are interesting and useful to a broad audience, including scientists in other fields.
use standard systems … conventions