Data Reuse and Reproducibility in Earth System Science: A Survey of Current Practices, Barriers, and Expectations
  • An Yan,
  • Caihong Huang,
  • Carole Palmer
An Yan
University of Washington Seattle Campus

Corresponding Author: yanan15@uw.edu

Caihong Huang
University of Washington Seattle Campus
Carole Palmer
University of Washington Seattle Campus

Abstract

As Earth System Science (ESS) becomes more data-intensive, collaborative, and interdisciplinary, it is important to understand how best to support and advance data reuse. We conducted an online survey of active ESS researchers from 126 U.S. universities and research centers, representing a wide variety of scientific fields. Of the 207 respondents, 51.7% had more than 20 years of research experience. Results indicated that the primary purposes for reusing data are currently to conduct new analyses (87%) and to compare results (70.4%), with only 18.5% reusing data to reproduce published studies. As expected, data hosted by federally funded data centers were reused most frequently, with open government data and data provided directly by other researchers also widely used. Reuse of data from other types of repositories lags far behind, due in part to a range of service limitations. At the same time, data sharing by respondents is strong: 96.6% actively release their data, primarily as supplements to published papers, with moderate use of open access repositories. Of the 45.9% who had attempted to reproduce research, 73.7% failed at least once, often due to the limited detail provided in published papers. Still, 92.3% believe it is the researcher’s responsibility to ensure their work is reproducible. The majority favored traditional modes of documenting research (word processors, text editors, and code commenting) over electronic notebooks or workflow systems. Interestingly, 59.9% continue to use hand-written notebooks. Challenges to data reuse and reproducibility specific to ESS included the complex nature of earth systems, increasingly complicated models, lack of data management resources, and limited emphasis on reproducibility in the field. Open-ended responses raised questions about whether “exact replication” is necessary or possible for ESS.
Most researchers agreed that data and code should be considered important research products and that outlets are needed for publishing negative results. Taken together, the results suggest a strong data sharing culture in ESS with high levels of reuse and commitment to open science. The research community would benefit greatly from better documentation and sharing of methods and research processes, as well as targeted improvements in data services and tools.