Data Curation

Data curation is concerned with advancing access to trustworthy and reusable data resources. DataLab researchers are actively investigating how to build rich, functional collections of digital data for research communities in the sciences and social sciences and how to improve access to open data for the public. Their work contributes to sustaining the long-term value of open data resources and global progress toward shared cyberinfrastructure.

Current Projects

Research Reproducibility and Data Reuse in Earth System Science

Earth System Science (ESS) is concerned with the physical, chemical, biological and human interactions that determine the future of our planet and the destiny of humankind. ESS research requires an intersection of disciplinary methods and results and relies on heterogeneous, ever-growing data collections. The interdisciplinary nature of ESS poses a number of data challenges related to the need for valid integration and reuse of data and for research reproducibility. This survey study investigates the perceptions, experiences, and practices of ESS researchers to benchmark the current state of reproducible research and data reuse. The results will inform the development of open data systems and services for ESS. There are high expectations that open data can support transparency, rigor, and innovation in science and accelerate the pace of new discovery. Our work addresses the particular problems of optimizing open data for application and integration across disciplines. The survey results will provide a baseline to inform further work on the data infrastructure needed to sustain data quality and foster the data access and integration necessary for robust, interdisciplinary, and reproducible ESS.

Open Data Literacy

Open Data Literacy is improving accessibility and use of open data through partnerships with public sector institutions. Action research projects are helping organizations make their data open and usable by the public. New curriculum and outreach are preparing information professionals to lead open data initiatives.

Site-Based Data Curation

The Site-Based Data Curation project (SBDC) is developing a framework for the curation of research data generated at scientifically significant research sites. The framework is based on geobiology research conducted at Yellowstone National Park, an exemplar site producing data with long-term value. Yellowstone is a tremendously important and rich site for data collection in geobiology, drawing scientists investigating research questions ranging from the origin of life on Earth to the search for life on other planets. Modern research in the earth sciences increasingly depends on the development of systematic accounts of the interactions of physical, chemical and biological phenomena and the integration of diverse measurements and observations. Making data accessible and functional for these purposes will depend on 1) principled curation practices early in the data lifecycle and 2) curating cohesive and usable sets of data for transfer to repositories. The SBDC framework is also an important step forward in evolving the professional work of curation and the inter-institutional relationships that are essential in the emerging ecology of scientific data curation.