Use case: unpublished data#
This use case addresses the plethora of data that is associated with unpublished data creation. That is, storage use including, but not limited to:
Model configuration data
Failed model run output
Successful model run output
Data prepared for collaborative sharing but not publication/DOI
Intermediate data products
How each of these scenarios is handled will typically be determined on a project basis, with a view to the importance of reproducibility and considering relative compute or storage costs.
Suggested procedures#
If compute is readily available but storage is limited
Maintain a database or wiki of model runs
Create zip archives of model configurations and move to slow access tape storage if they are required to be kept for reproducibility
If model run failed, remove data immediately
If model run was successful and post-processing has been completed (and if bit-reproducibility across systems is not a concern), then data can be removed, perhaps after an initial quarantine period for data validation
Intermediate data products and collaborative data can be retired at the end of their active projects, following a quaratine period
a. Some intermediate data may not have a logical project end, such as regridded CMIP data - such data might follow a similar approach as the replicated data use case.
If compute is limited but deep storage is readily available
Repeat steps 1-3 as above.
Following post-processing and validation, or at project close, data should be tarred and transferred to a deep storage system