Jeffery S. Horsburgh

and 12 more

The diverse community of Critical Zone (CZ) scientists studies the coupled chemical, biological, physical, and geological processes operating across scales to support life at the Earth's surface. Data produced by CZ scientists are as diverse as their creators, ranging from time series of observations from in situ sensors to laboratory analyses of various types of physical samples, geophysical measurements, microbiological sampling, and many other data types. In 2020, the U.S. National Science Foundation funded a network of CZ Thematic Cluster projects called “CZNet” to work collaboratively in answering scientific questions related to effects of urbanization on CZ processes; CZ function in semi-arid landscapes and the role of dust in sustaining these ecosystems; deep bedrock processes and their relationship to CZ evolution; CZ recovery from disturbances such as fire and flooding; and changes in the coastal CZ related to rising sea level. The CZNet Coordinating Hub’s goal is to make data, samples, software, and other research products created by CZNet projects Findable, Accessible, Interoperable, and Reusable (FAIR), using existing domain-specific repositories. This presentation will focus on several of the challenges the CZNet Hub project has encountered in working toward this goal. We will discuss our experiences in coordinating data collection, archival, discovery, and access for CZNet through development of shared cyberinfrastructure and best practices for data sharing across the network of Thematic Cluster projects. One challenge is the diversity of CZ research, which meant that no single domain data repository was sufficient. To address this, we built a data submission portal for multiple repositories and a catalog to enable discovery of research products wherever they are deposited.
Helping investigators choose the “best” repository remains a challenge, as the choice depends not only on the scientific domain of the data but also on the perceived level of effort required for sharing them. Adopting data management techniques that enable reproducible results and further synthesis of data is also a challenge; solutions involve developing shared best practices (such as including code and scripts for data preparation and analysis), training, and adoption by the community of scientists.

Collection, management, and sharing of environmental sensor data require hardware and software to support the day-to-day data management needs of scientists and practitioners who operate networks of environmental sensors and dataloggers. Given the volume of data produced, it is challenging to consistently produce data products of sufficient quality for use in operational or scientific contexts. Specific challenges include retrieval of data from field sites; provisioning of performant storage; specification of unambiguous metadata; mediation across the different formats, standards (or lack of standards), protocols, and vocabularies used by various sensor and datalogger manufacturers; data quality control and versioning; and integration with reputable data repositories for sharing and publication. Past efforts, including the Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI) Hydrologic Information System (HIS), provided software tools that met many of these needs, but these tools are now more than a decade old, leaving few reliable options for sensor data management. While viable commercial software options exist, the cost of some systems is beyond the reach of many data collection organizations. Other solutions are tied to specific datalogger or sensor manufacturers and lack interoperability, making it difficult to collect and manage data where a diversity of sensors is required. Additionally, new ways of collecting data using Internet of Things (IoT) devices have recently emerged, along with new standards for collecting, describing, and sharing sensor data, including the Open Geospatial Consortium’s (OGC) SensorThings standard. These new methods and standards help with interoperability but lack the software implementations needed for easy adoption.
In this presentation, we describe the open-source HydroServer software, which was developed to enable collection, storage, management, and sharing of data from environmental sensors deployed at in situ monitoring sites. HydroServer provides web-based management of in situ sensor data, integration of OGC’s SensorThings application programming interface (API) and data model for automating data ingestion and querying, deployment in the commercial cloud, and integration with the HydroShare repository for data sharing and archival.
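The OGC SensorThings API exchanges JSON entities (Things, Datastreams, Observations, and so on) over HTTP and supports OData-style query options such as `$filter`, `$orderby`, `$select`, and `$top`. The sketch below shows what automated ingestion and querying against such an endpoint can look like, using only the Python standard library. It illustrates the standard, not HydroServer's own client code: the service root URL is a hypothetical placeholder, and the Datastream ID and timestamps are made-up values.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request

# Hypothetical service root; a real HydroServer deployment will use its own URL.
BASE_URL = "https://hydroserver.example.org/sensorthings/v1.1"


def observation_payload(datastream_id, phenomenon_time, result):
    """Build a SensorThings Observation that links to an existing Datastream."""
    return {
        "phenomenonTime": phenomenon_time,  # ISO 8601 time of the measurement
        "result": result,  # the measured value
        "Datastream": {"@iot.id": datastream_id},
    }


def build_post_request(payload):
    """Prepare (but do not send) a POST that creates one Observation."""
    return Request(
        f"{BASE_URL}/Observations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def observations_query_url(datastream_id, start, end, top=100):
    """Build a query for one Datastream's Observations in [start, end)."""
    params = {
        "$filter": f"phenomenonTime ge {start} and phenomenonTime lt {end}",
        "$orderby": "phenomenonTime asc",
        "$select": "phenomenonTime,result",
        "$top": str(top),
    }
    # quote_via=quote percent-encodes spaces as %20 rather than "+".
    query = urlencode(params, quote_via=quote)
    return f"{BASE_URL}/Datastreams({datastream_id})/Observations?{query}"


if __name__ == "__main__":
    print(observations_query_url(1, "2024-05-01T00:00:00Z", "2024-05-02T00:00:00Z"))
```

Passing the prepared request to `urllib.request.urlopen` would create the observation on a live server; authentication, which a real deployment will require, is left out of this sketch.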

Jeffery S. Horsburgh

and 10 more

Critical Zone (CZ) scientists study the coupled chemical, biological, physical, and geological processes operating across scales to support life at the Earth's surface. In 2020, the U.S. National Science Foundation funded a network of Thematic Cluster projects called “CZ Net” to work collaboratively in answering scientific questions related to effects of urbanization on CZ processes; CZ function in semi-arid landscapes and the role of dust in sustaining these ecosystems; deep bedrock processes and their relationship to CZ evolution; CZ recovery from disturbances such as fire and flooding; and changes in the coastal CZ related to rising sea level. Data collected by these projects are diverse, ranging from time series from in situ sensors to laboratory analyses of physical samples, geophysical measurements, and other data types. Thus, coordinating data collection, archival, discovery, and access for the network presents significant challenges. Given the diversity in scientific domains represented, data produced, and collaborations, no single repository fully meets the needs of CZ scientists, raising the questions of which repositories to use, how to enable discovery of and access to data across different repositories, and how to develop and promote best practices for sharing research products. This presentation describes cyberinfrastructure (CI) development by the CZ Net Coordinating Hub that leverages existing, domain-specific repositories for managing, curating, disseminating, and preserving data and research products from the CZ Net projects. We have developed CI that links existing data facilities and services, including HydroShare, EarthChem, Zenodo, and other repositories, via a CZ Hub that provides tools for data submission, resource registry, metadata cataloging, resource discovery and access, and links to computational resources for analysis and visualization.
The CZ Hub’s goal is to make data, samples, software, and other research products created by CZ Net projects Findable, Accessible, Interoperable, and Reusable (FAIR), using existing domain-specific repositories. The repository interoperability we have demonstrated for delivering data services for an interdisciplinary science program may provide a template for future development of integrated, interdisciplinary data services.
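A catalog of this kind stores only metadata and links; the data themselves stay in HydroShare, EarthChem, Zenodo, or wherever they were deposited. The following Python sketch shows the idea with an in-memory index and keyword search. The record fields, the `Catalog` class, and its methods are hypothetical illustrations, not the CZ Hub's actual implementation; a production catalog would more likely harvest records from each repository's metadata services and use a search engine rather than a dictionary.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CatalogRecord:
    title: str
    repository: str  # where the product is actually deposited
    landing_url: str  # link back to the product in that repository
    keywords: frozenset = field(default_factory=frozenset)


class Catalog:
    """In-memory cross-repository catalog: metadata and links only."""

    def __init__(self):
        self._records = {}

    def register(self, record):
        # Keyed on landing URL: re-registering a product replaces its old entry.
        self._records[record.landing_url] = record

    def search(self, term, repository=None):
        """Case-insensitive match on title substrings or exact keywords."""
        term = term.lower()
        hits = [
            r
            for r in self._records.values()
            if (term in r.title.lower() or term in {k.lower() for k in r.keywords})
            and (repository is None or r.repository == repository)
        ]
        return sorted(hits, key=lambda r: r.title)


if __name__ == "__main__":
    catalog = Catalog()
    # Made-up records with placeholder URLs, for illustration only.
    catalog.register(CatalogRecord("Soil moisture time series", "HydroShare",
                                   "https://example.org/hs/1", frozenset({"sensors"})))
    catalog.register(CatalogRecord("Dust geochemistry", "EarthChem",
                                   "https://example.org/ec/1", frozenset({"dust"})))
    for record in catalog.search("dust"):
        print(record.repository, record.landing_url)
```

Keying records on the landing URL means that harvesting the same product again updates its entry rather than creating a duplicate.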

Nour A. Attallah

and 1 more

We present a model of indoor residential water use that estimates water demand and conservation potential by end use for a target community by simulating indoor water end use events at a household level. The model uses end use event data from a set of representative residential households to simulate a larger community and advances existing end use models by 1) accounting for an expanded set of indoor water end uses; 2) considering the variability in flowrates, durations, and volumes of end use events over different days of the week; and 3) providing a generalized approach for simulating indoor water usage and potential conservation at the city level. The model simulates residential water use behavior in individual households by randomly sampling water end use events for different end use types for each day of the week and then aggregating the sampled events per day to estimate the daily water use per household. We used the model to evaluate a set of technological and behavioral conservation actions to quantify the conservation potential in each simulated household as well as aggregated to the city level. We evaluated the model’s performance in predicting the observed average daily water use of households in Logan City, Utah, USA, and compared it against other common water demand models to demonstrate the model’s reliability. The results of this paper are reproducible using openly available code and data, providing an accessible platform for advancing water demand modeling using detailed water end use data.
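The sampling-and-aggregation step can be sketched as a small Monte Carlo simulation. In this Python sketch, which is not the authors' published code, each end use has, for each day of the week, a list of observed daily event counts and a list of observed per-event volumes from representative households; a simulated household-day draws an event count, draws that many event volumes, and sums them across end uses. For brevity it samples volumes only, not flowrates and durations. Conservation actions are modeled, as an assumption, as a fractional reduction in the volume of a given end use. The data layout, function names, and liter units are hypothetical.

```python
import random

DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def simulate_day(observed, day, rng):
    """Sample one household-day; return the simulated volume for each end use.

    observed[end_use][day] is a hypothetical (counts, volumes) pair: observed
    numbers of events per day and observed per-event volumes (liters).
    """
    by_use = {}
    for end_use, per_day in observed.items():
        counts, volumes = per_day[day]
        n_events = rng.choice(counts)  # how many events of this type today
        by_use[end_use] = sum(rng.choice(volumes) for _ in range(n_events))
    return by_use


def simulate_community(observed, n_households, seed=0, savings=None):
    """Mean simulated daily water use per household, one week per household.

    savings maps an end use to an assumed fractional volume reduction,
    e.g. {"toilet": 0.2} for a hypothetical fixture replacement.
    """
    rng = random.Random(seed)
    savings = savings or {}
    daily_totals = []
    for _ in range(n_households):
        for day in DAYS:
            by_use = simulate_day(observed, day, rng)
            daily_totals.append(
                sum(v * (1.0 - savings.get(use, 0.0)) for use, v in by_use.items())
            )
    return sum(daily_totals) / len(daily_totals)


if __name__ == "__main__":
    # Made-up inputs for illustration only; not Logan City data.
    observed = {
        "toilet": {d: ([3, 4, 5, 6], [4.5, 6.0, 6.1]) for d in DAYS},
        "shower": {d: ([0, 1, 1, 2], [45.0, 60.0, 80.0]) for d in DAYS},
    }
    baseline = simulate_community(observed, n_households=500, seed=42)
    retrofit = simulate_community(observed, n_households=500, seed=42,
                                  savings={"toilet": 0.2})
    print(f"baseline {baseline:.1f} L/day, with retrofit {retrofit:.1f} L/day")
```

Because the savings factors are applied after sampling and consume no random numbers, running the baseline and a conservation scenario with the same seed samples identical events, so the difference between the two results isolates the effect of the conservation action.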

Jeffery Horsburgh

and 3 more

Critical Zone (CZ) scientists study the system of coupled chemical, biological, physical, and geological processes operating together across all scales to support life at the Earth’s surface (Brantley et al., 2007). In 2020, the U.S. National Science Foundation funded a new network of Thematic Cluster projects that are working collaboratively to answer scientific questions related to effects of urbanization on CZ processes; CZ function in semi-arid landscapes and the role of dust in sustaining these ecosystems; processes in deep bedrock and their relationship to CZ evolution; recovery of the CZ from disturbances such as fire and flooding; and changes in the coastal CZ related to rising sea level. Given the diversity of data being collected by these projects, supporting data collection, access, and archival for the larger network presents significant challenges. Leveraging existing repositories and cyberinfrastructure provides many benefits but still poses the questions of which repositories to use and how to enable discovery of and access to data that may be deposited across different repositories. This presentation describes new cyberinfrastructure development that leverages existing, domain-specific data repositories to enable managing, curating, disseminating, and preserving data from the new network of CZ Thematic Cluster projects. A distributed architecture is under development that links existing data facilities and services, including HydroShare, EarthChem, SESAR, and eventually other systems as needed, via a CZ Hub that provides tools for simplified data submission, discovery and access, and links to computational resources for data analysis and visualization in support of CZ synthesis efforts. Our goal is to make data, samples, and software collected by the Thematic Cluster projects Findable, Accessible, Interoperable, and Reusable (FAIR), using existing domain-specific repositories.
This collaboration among repositories to deliver integrated data services for an interdisciplinary science program may provide a template for future development of integrated, interdisciplinary data services.

Brantley, S.L., M.B. Goldhaber, and V. Ragnarsdottir (2007). Crossing disciplines and scales to understand the Critical Zone. Elements 3, 307-314, doi:10.2113/gselements.3.5.307.

Jeffery Horsburgh

and 2 more

Scientific and related management challenges in the water domain require synthesis of data from multiple domains. Many data analysis tasks are difficult because datasets are large and complex; standard formats for data types are not always agreed upon or mapped to an efficient structure for analysis; water scientists may lack training in methods needed to efficiently tackle large and complex datasets; and available tools can make it difficult to share, collaborate around, and reproduce scientific work. Overcoming these barriers to accessing, organizing, and preparing datasets for analysis will help transform scientific inquiry. Building on the HydroShare repository’s established cyberinfrastructure, we have advanced two packages for the Python language that make data loading, organization, and curation for analysis easier, reducing the time spent choosing appropriate data structures and writing code to ingest data. These packages enable automated retrieval of data from HydroShare and the USGS’s National Water Information System (NWIS); loading of data into performant structures that are keyed to specific scientific data types and integrate with existing visualization, analysis, and data science capabilities available in Python; and writing of analysis results back to HydroShare for sharing and eventual publication. Installing and maintaining the packages within CUAHSI’s HydroShare-linked JupyterHub server reduces the technical burden on scientists of creating a computational environment for executing analyses. HydroShare users can leverage these tools to build, share, and publish more reproducible scientific workflows. The HydroShare Python Client and USGS NWIS Data Retrieval packages can be installed from the Python Package Index using the pip utility within a Python environment on any computer running Microsoft Windows, Apple macOS, or Linux.
They can also be used online via the CUAHSI JupyterHub server (https://jupyterhub.cuahsi.org/) or other Python notebook environments such as Google Colaboratory (https://colab.research.google.com/). Source code, documentation, and examples for the software are freely available on GitHub at https://github.com/hydroshare/hsclient/ and https://github.com/USGS-python/dataretrieval.
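A minimal end-to-end sketch of the retrieve, analyze, and write-back workflow described above follows. The calls `nwis.get_record`, `HydroShare().sign_in()`, `hs.resource()`, and `Resource.file_upload()` come from the two packages, but their signatures should be checked against each package's current documentation. The site number, date range, column name, and resource ID are left as parameters to be filled with real values, and the monthly-mean analysis is only a hypothetical example. The package imports sit inside `run_workflow` so the analysis helpers can be used and tested without network access.

```python
import csv
import math
import statistics
from collections import defaultdict


def monthly_means(records):
    """Average (YYYY-MM-DD, value) pairs by calendar month, skipping gaps."""
    groups = defaultdict(list)
    for date_string, value in records:
        if value is None or math.isnan(value):
            continue  # missing observation
        groups[date_string[:7]].append(float(value))
    return {month: statistics.mean(vals) for month, vals in sorted(groups.items())}


def write_csv(summary, path):
    """Write a month -> mean mapping to a small CSV file for upload."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["month", "mean_value"])
        for month, mean in summary.items():
            writer.writerow([month, mean])
    return path


def run_workflow(site, start, end, column, resource_id, out_path="monthly_means.csv"):
    """Retrieve NWIS daily values, summarize them, and upload the result."""
    # Imported here so the helpers above run without these packages installed.
    from dataretrieval import nwis
    from hsclient import HydroShare

    # "dv" = daily values service; returns a pandas DataFrame with a datetime index.
    df = nwis.get_record(sites=site, service="dv", start=start, end=end)
    records = [(ts.strftime("%Y-%m-%d"), value) for ts, value in df[column].items()]
    write_csv(monthly_means(records), out_path)

    hs = HydroShare()
    hs.sign_in()  # prompts for HydroShare credentials
    resource = hs.resource(resource_id)
    resource.file_upload(out_path)
    return out_path
```

Both packages install with `pip install hsclient dataretrieval`. NWIS daily-value columns typically combine a parameter code and a statistic (for example, `00060_Mean` for mean daily discharge), so `column` should be taken from the columns of the DataFrame actually returned.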

David Tarboton

and 11 more

HydroShare is a domain-specific data and model repository operated by the Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI) to advance hydrologic science by enabling individual researchers to more easily share products resulting from their research. The community platform supports not just the scientific publication summarizing a study, but also the data, models, and workflow scripts used to create the publication and reproduce the results therein. HydroShare accepts data from anybody and supports the Findable, Accessible, Interoperable, and Reusable (FAIR) principles. HydroShare comprises two sets of functionality: (1) a repository for users to share and publish data and models, collectively referred to as resources, in a variety of formats, and (2) tools (web apps) that can act on content in HydroShare and support web-based access to compute capability. Together these serve as a platform for collaboration and computation that integrates data storage, organization, discovery, and analysis through web apps and that allows researchers to employ services beyond the desktop to make data storage and manipulation more reliable and scalable, while improving their ability to collaborate and reproduce results. This presentation will describe the capabilities developed for HydroShare to support the full research data management life cycle. Data can be entered into HydroShare as soon as they are collected and initially shared only with the team directly working on them. As analysis proceeds, tools, scripts, and models that act on the data to produce research results may be stored in HydroShare resources alongside the data. At the time of publication, these resources may be permanently published, receive digital object identifiers (DOIs), and be cited in research papers. Resources may themselves include citations to the research papers, thereby linking the publications to the supporting data, scripts, and models.
We will discuss HydroShare design choices and capabilities for establishing relationships and versioning, which are based on simplicity and ease of use, along with some of the challenges encountered.
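The life cycle described above (private work, sharing with a team, adding scripts alongside data, publishing with a DOI, linking to papers, and versioning) can be summarized as a small state model. The Python sketch below is hypothetical: the `Resource` class and its methods are illustrations, not HydroShare's data model or API, and the relation names borrow Dublin Core terms (`isReferencedBy`, `isVersionOf`) only as plausible labels. Its main design choice is that publication freezes the file list, so a DOI always resolves to the same content, and further changes go into a new version that points back to the published one.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Resource:
    """Hypothetical repository resource; not HydroShare's actual data model."""

    title: str
    files: List[str] = field(default_factory=list)
    collaborators: Set[str] = field(default_factory=set)
    relations: List[Tuple[str, str]] = field(default_factory=list)
    doi: Optional[str] = None

    @property
    def published(self) -> bool:
        return self.doi is not None

    def share_with(self, user: str) -> None:
        """Grant a team member access while the work is in progress."""
        self.collaborators.add(user)

    def add_file(self, name: str) -> None:
        # Published content is frozen so the DOI always resolves to the same files.
        if self.published:
            raise PermissionError("published resources are immutable; create a new version")
        self.files.append(name)

    def add_relation(self, relation: str, target: str) -> None:
        # Assumption of this sketch: relations stay editable after publication,
        # so a paper published later can still be linked.
        self.relations.append((relation, target))

    def publish(self, doi: str) -> None:
        if self.published:
            raise ValueError("resource is already published")
        self.doi = doi

    def new_version(self) -> "Resource":
        """Start an editable copy that points back to this published resource."""
        if not self.published:
            raise ValueError("unpublished resources can simply be edited")
        return Resource(
            title=self.title,
            files=list(self.files),
            collaborators=set(self.collaborators),
            relations=[("isVersionOf", self.doi)],
        )
```

In use, a resource would be shared with collaborators, receive data files and scripts, be published with a DOI, and then gain an `isReferencedBy` relation once a paper cites it; any later corrections go into `new_version()`.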