Lessons Learned from Developing and Operating the Critical Zone Collaborative Network (CZNet) Data Cyberinfrastructure
Abstract
The diverse community of Critical Zone (CZ) scientists studies the coupled chemical, biological, physical, and geological processes operating across scales to support life at the Earth's surface. Data produced by CZ scientists are as diverse as their creators, ranging from time series of observations from in situ sensors to laboratory analysis of various types of physical samples, geophysical measurements, microbiological sampling, and many other data types. In 2020, the U.S. National Science Foundation funded a network of CZ Thematic Cluster projects called “CZNet” to work collaboratively in answering scientific questions related to the effects of urbanization on CZ processes; CZ function in semi-arid landscapes and the role of dust in sustaining these ecosystems; deep bedrock processes and their relationship to CZ evolution; CZ recovery from disturbances such as fire and flooding; and changes in the coastal CZ related to rising sea level. The CZ Coordinating Hub’s goal is to make data, samples, software, and other research products created by CZNet projects Findable, Accessible, Interoperable, and Reusable (FAIR), using existing domain-specific repositories. This presentation will focus on several of the challenges the CZNet Hub project has encountered in working toward this goal. We will discuss our experiences in coordinating data collection, archival, discovery, and access for CZNet through development of shared cyberinfrastructure and best practices for data sharing across the network of Thematic Cluster projects. One challenge is the diversity of CZ research, which meant that no single domain data repository was sufficient. To address this, we built a data submission portal for multiple repositories and a catalog to enable discovery of research products wherever they are deposited.
Helping investigators choose the “best” repository remains a challenge because the choice depends not only on the scientific domain of the data but also on the perceived level of effort required to share them. Data management techniques that enable reproducibility of results and further synthesis of data are also a challenge. Solutions involve developing shared best practices, such as including code and scripts for data preparation and analysis activities, along with training and adoption by the community of scientists.