Data-Driven Community Building: Measuring and Improving Connectivity in
Domain Repositories
Abstract
Domain repositories can be an integral part of extensive community
support systems that extend from proposal planning and writing, through
project initiation and implementation, data collection, management, and
archive, to publication of results and access to data by other community
members. These long-term relationships are reflected in multiple
contributions (data, software, results, papers, …) by community
members, and recognizing these contributions should be an important
community-building best practice for these repositories. Identifiers for
people and organizations are critical for recognizing community members
and, equally important, for making connections between them and all of
the various objects in the research ecosystem. This figure demonstrates
connections that can be made once identifiers are integrated into the
research ecosystem. Most domain repositories provide DOIs for datasets
in the repository. The metadata for those DOIs can include identifiers
for some authors (ORCIDs) along with names of organizations they are
affiliated with (affiliations). In practice, most authors in these
metadata records do not have ORCIDs, but if an author has an ORCID in
even one record, that ORCID can be spread across all of the datasets
they have contributed to, increasing connectivity across the repository.
Affiliations can also be spread across multiple contributions, with some
caveats. If identifiers (i.e., RORs) exist and can be found for the
affiliated organizations, they can be inserted into the metadata, again
increasing connectivity. Many domain repositories maintain lists of
research papers that have used data from the archive. Metadata for these
papers also provide a potential source for identifiers and affiliations.
These can also be harvested and spread across the repository, again
improving connectivity. These ideas and techniques were applied to
UNAVCO, a geodesy data repository with a well-developed community and
over 5000 archived datasets. The connectivity for the
repository is below 10% for dataset contributors and 0% for RORs.
Applying these techniques can increase the connectivity to 56% for
contributors and 49% for RORs.
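
The idea of spreading an ORCID found in one record across an author's other datasets can be illustrated with a minimal Python sketch. This is not the workflow used for UNAVCO. The record layout, the function names (`connectivity`, `spread_orcids`), the sample names, and the placeholder ORCID are all hypothetical, and matching authors by exact name string is a simplifying assumption that a real repository would need to replace with proper disambiguation.

```python
from collections import defaultdict


def connectivity(records):
    """Return the fraction of contributor entries that carry an ORCID."""
    entries = [c for r in records for c in r["contributors"]]
    if not entries:
        return 0.0
    return sum(1 for c in entries if c.get("orcid")) / len(entries)


def spread_orcids(records):
    """Copy an ORCID seen for a name onto every entry with the same name.

    Assumption: an exact name match identifies the same person. Names that
    map to more than one ORCID are treated as ambiguous and left untouched.
    """
    seen = defaultdict(set)
    for r in records:
        for c in r["contributors"]:
            if c.get("orcid"):
                seen[c["name"]].add(c["orcid"])
    known = {name: next(iter(ids)) for name, ids in seen.items() if len(ids) == 1}
    return [
        {
            **r,
            "contributors": [
                {**c, "orcid": c.get("orcid") or known.get(c["name"])}
                for c in r["contributors"]
            ],
        }
        for r in records
    ]


# Hypothetical metadata: one author has an ORCID in only one of three records.
records = [
    {"doi": "dataset-1", "contributors": [
        {"name": "A. Author", "orcid": "0000-0000-0000-0001"},  # placeholder ID
        {"name": "B. Writer", "orcid": None},
    ]},
    {"doi": "dataset-2", "contributors": [{"name": "A. Author", "orcid": None}]},
    {"doi": "dataset-3", "contributors": [
        {"name": "A. Author", "orcid": None},
        {"name": "B. Writer", "orcid": None},
    ]},
]

before = connectivity(records)
after = connectivity(spread_orcids(records))
print(f"contributor connectivity: {before:.0%} -> {after:.0%}")
```

In this toy example one of five contributor entries has an ORCID before spreading and three of five afterward. The same pattern could be applied to RORs by keying on affiliation strings instead of author names, subject to the caveats noted above.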