Going beyond the spreadsheet - developing Best Practices in ‘long-tail’ environmental data curation and publishing
  • Corinna Gries,
  • Renée Brown,
  • Mary Gastil-Buhl,
  • Sarah Elmendorf,
  • Hap Garritt,
  • Mary Martin,
  • Greg Maurer,
  • An Nguyen,
  • John Porter,
  • Timothy Whiteaker
Corinna Gries
University of Wisconsin Madison

Corresponding Author: cgries@wisc.edu

Renée Brown
University of New Mexico Main Campus
Mary Gastil-Buhl
Moorea Coral Reef LTER
Sarah Elmendorf
University of Colorado Boulder
Hap Garritt
Woods Hole Marine Biological Lab
Mary Martin
University of New Hampshire Main Campus
Greg Maurer
New Mexico State University
An Nguyen
University of Texas at Austin
John Porter
University of Virginia
Timothy Whiteaker
CRWR

Abstract

The research data repository of the Environmental Data Initiative (EDI) is a signatory of the FAIR Data Principles. Building on over 30 years of data curation research and experience in the NSF-funded US Long-Term Ecological Research program (LTER), it provides mature functionality, well-established workflows, and support for ‘long-tail’ environmental data publication. High-quality scientific metadata are enforced through automatic checks against community-developed rules and the Ecological Metadata Language (EML) standard. Although the EDI repository is far along the continuum of making its data FAIR, representatives from EDI and the LTER Information Management community have recently been developing best practices for the edge cases in environmental data publishing. Here we discuss, and seek feedback on, how best to handle the publication of these ‘long-tail’ data when they are accompanied by extensive additional data, e.g., genomics data, physical specimens, or flux tower data. While these latter data are better handled in discipline-specific repositories such as NCBI, iDigBio, and AmeriFlux, they are frequently associated with other data collected at the same time and location, or even from the same samples. This is particularly relevant across the LTER Network, where sites represent integrative research projects. Questions we address (and seek community input on) include: How should documents and images be archived when they are themselves data, e.g., field notebooks or time-lapse photographs of plant phenology? How should data from unmanned vehicles (e.g., drones and underwater gliders), acoustic data, or model outputs, which may be several terabytes in size, be handled? How should processing scripts or modeling code be associated with data? Overall, these best practices address the Findability and Accessibility of data as well as greater transparency of the research process.