
The arXiv of the future will not look like the arXiv
  • Alberto Pepe,
  • Matteo Cantiello,
  • Josh Nicholson
Alberto Pepe
Authorea

Corresponding Author: alberto.pepe@gmail.com

Matteo Cantiello
Simons Foundation
Josh Nicholson
Authorea

Abstract

The arXiv is the most popular preprint repository in the world. Since its inception in 1991, the arXiv has allowed researchers to freely share publication-ready articles prior to formal peer review. The growth and popularity of the arXiv emerged from new technologies that made document creation and dissemination easy, and from cultural practices in which collaboration and data sharing were dominant. The arXiv occupies a unique place in the history of research communication and of the Web itself; however, it has arguably changed very little since its creation. Here we look at the strengths and weaknesses of the arXiv in an effort to identify what improvements can be made using technologies not previously available. Based on this, we argue that a modern arXiv might in fact not look at all like the arXiv of today.

Introduction

The arXiv, pronounced "archive", is the most popular preprint repository in the world. Started in 1991 by physicist Paul Ginsparg, the arXiv allows researchers to freely share publication-ready articles prior to formal peer review and publication. Today, the arXiv publishes over 10,000 articles each month from high-energy physics, computer science, quantitative biology, statistics, quantitative finance, and other fields (see Fig. \ref{104668}). The early success of the arXiv stems from the introduction of new technological advances paired with a well-developed culture of collaboration and sharing. Indeed, before the arXiv even existed, physicists were already sharing recently finished manuscripts, first by mail and later by email. To understand the success of the arXiv, it is important to understand its history. Below we give a brief history of the technology, services, and cultural norms that predate the arXiv and were integral to its early and continued success.

The history of the arXiv

Prior to the arXiv, preprints were distributed through institutional repositories, such as the SPIRES-HEP database (Stanford Physics Information REtrieval System - High Energy Physics) at the Stanford Linear Accelerator Center (SLAC) and the Document Server at CERN. Developed in the early 1970s, SPIRES created a bibliographic standard and a centralized resource that allowed high-energy physics researchers across universities to email the database and request that a list of preprints be sent to them. Since papers themselves could not be emailed at the time, the system relied on traditional mail. This resource was immediately successful, with requests numbering in the thousands within the first few years \cite{Elizalde_2017}. While SPIRES greatly improved the flow of information, it still took weeks for articles (preprints) to be sent and received. A new typesetting system would soon emerge and change this.
TeX, pronounced "tech", was developed by Donald Knuth in the late 1970s as a way for researchers to write and typeset articles programmatically. Soon after the introduction of TeX, Leslie Lamport introduced LaTeX, a macro package built on TeX that standardized document formatting and made it easy for researchers to professionally typeset their documents on their own. This system made sharing papers easier and cheaper than ever before. Indeed, many, if not most, researchers at the time relied on secretaries or typists to type up their work, which then had to be photocopied and mailed to a handful of other researchers. TeX allowed researchers to write their documents in a standardized plain-text format that could be emailed, downloaded, and compiled without the need for physical mail. Soon, physicists were emailing and downloading .tex files at great rates, speeding up research communication like never before.
Such a system immediately created a new problem for researchers: information overload. Researchers were exchanging emails containing preprints so frequently that, given the size of computer hard drives at the time, email servers were running out of space \cite{Ginsparg_2011}. To address this problem, an automated email server, called arXiv, was set up in the early 1990s. The arXiv allowed researchers to request preprints automatically via email as needed. It soon became one of the world's first web servers, and today it still serves as one of the most open and efficient forms of research communication in the world.
The arXiv was a leader in introducing and using new technology when it was launched; however, it has arguably changed very little since its inception, despite a wealth of new technologies now available. Here we look at the strengths and weaknesses of the arXiv in an effort to identify what improvements new technologies and tools make possible, and we propose that a modern arXiv might in fact not look at all like the arXiv of today, a development that will likely occur with or without the arXiv.