1 Introduction
RNA sequencing (RNA-Seq) is increasingly common in ecological and evolutionary studies focusing on variation in gene expression (Alvarez et al. 2014, Conesa et al. 2016, Ekblom & Galindo 2011). For example, RNA-Seq is commonly used in studies on physiology, conservation, and epigenetics, and to assess organismal responses to environmental variables (Todd et al. 2016, Corlett 2017, Rey et al. 2020). RNA-Seq is highly accurate for quantifying expression levels, requires less RNA than microarrays, does not necessarily require a reference genome (e.g., Cahais et al. 2012), can uncover sequence variation in transcribed regions, and shows high reproducibility (Wang et al. 2009). However, gene expression data can be strongly influenced by biological and non-biological factors such as experimental and stochastic variation (Auer & Doerge 2010, Qian et al. 2014, Todd et al. 2016). Given the recent surge in RNA-based studies, it is critical to identify and quantify sources of variation in gene expression.
Sampling methods can be an important experimental cause of variation in estimated gene expression (Mutch et al. 2008, Passow et al. 2019). Delay in sample preservation after collection may result in higher RNA degradation and introduce bias in estimated gene expression (e.g., Gayral et al. 2011, Romero et al. 2014). This is a consequence of mRNAs being produced in relatively short bursts in response to internal or external stimuli and having short half-lives (Ross 1995, Staton et al. 2000). Similarly, the use of different anesthetics, methods of tissue preservation, and RNA extraction methods, as well as the time elapsed between sample collection and RNA isolation, can all impact RNA quality and gene expression (e.g., Debey et al. 2004, Huitink et al. 2010, Jeffries et al. 2014, Mutter et al. 2004, Olsvik et al. 2007, Passow et al. 2019).
Stochastic variation in gene expression, arising from variation in cellular and molecular processes, can produce random differences among individuals of the same population for the same genes that are not necessarily a consequence of biological (e.g., maternal effects and potentially heritable variation) or micro-environmental variation. In studies with few biological replicates, this variation may be misinterpreted as biologically relevant (Hansen et al. 2011, Kaern et al. 2005). Detection of stochastic variation in gene expression may be achieved through careful sampling design (e.g., individuals vary at only one treatment) and by increasing the number of sampled individuals (Kim et al. 2015, Liu et al. 2014) to gain statistical power (Ching et al. 2014). However, RNA-Seq experiments are often limited in the number of sampled individuals because of cost, with a consequent loss of statistical power and potentially misleading results (Bi & Liu 2016, Li et al. 2013).
Independent of sample size, library construction and RNA sequencing techniques may also produce variability in estimated gene expression. Whole mRNA sequencing methods often result in fragment length bias: longer transcripts are sheared into more fragments, so more reads are assigned to them than to shorter transcripts, causing an overrepresentation of longer transcripts in sequencing libraries (Ma et al. 2019, Oshlack & Wakefield 2009, Roberts et al. 2011). Cost limitations as well as the fragment length bias of whole mRNA sequencing have led to the development of RNA sequencing library construction protocols that allow processing a larger number of samples in a more cost-effective manner (Meyer et al. 2011, Morrissy et al. 2009, Wu et al. 2010). The 3’ RNA Tag-Seq method (also known as Quant-Seq 3’ mRNA-Seq), for example, primes only the 3’ poly-A tail, reducing the sequencing effort and cost, and generates an essentially uniform distribution of fragments with respect to original RNA length (Lohman et al. 2016, Ma et al. 2019).
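The fragment length bias described above can be illustrated with a deliberately idealized sketch. It is not part of this study's analysis: the transcript names, lengths, copy numbers, and the 250-nt fragment size are hypothetical values chosen for illustration, and the model assumes that every fragment yields exactly one read.

```python
# Toy model of fragment length bias (all values below are hypothetical).
# Two transcripts are present at the SAME abundance but differ in length.
transcripts = [
    # (name, length in nt, number of mRNA molecules)
    ("short_gene", 500, 100),
    ("long_gene", 5000, 100),
]

FRAGMENT_LENGTH = 250  # assumed average fragment size after shearing


def whole_mrna_counts(transcripts, fragment_length=FRAGMENT_LENGTH):
    """Idealized whole mRNA library: each molecule is sheared into
    length // fragment_length fragments, and each fragment gives one read."""
    return {
        name: (length // fragment_length) * copies
        for name, length, copies in transcripts
    }


def tag_seq_counts(transcripts):
    """Idealized 3' tag library: one fragment (read) per molecule,
    regardless of transcript length."""
    return {name: copies for name, _length, copies in transcripts}


def length_normalize(counts, transcripts):
    """Divide counts by transcript length and rescale to proportions,
    similar in spirit to TPM-style normalization."""
    lengths = {name: length for name, length, _copies in transcripts}
    rates = {name: counts[name] / lengths[name] for name in counts}
    total = sum(rates.values())
    return {name: rate / total for name, rate in rates.items()}


whole = whole_mrna_counts(transcripts)
tags = tag_seq_counts(transcripts)
normalized = length_normalize(whole, transcripts)

print("whole mRNA read counts:", whole)
print("3' tag read counts:    ", tags)
print("length-normalized whole mRNA proportions:", normalized)
```

Under these assumptions, the whole mRNA library assigns ten times more reads to the longer transcript despite equal abundance, whereas the 3' tag counts are equal. Dividing the whole mRNA counts by transcript length recovers equal proportions in this toy case; real libraries also involve sampling noise and other biases that this sketch ignores.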
In fish, RNA-Seq data are commonly used to investigate the effects of environmental variables (e.g., temperature, hypoxia) on gene expression (e.g., Krishnan et al. 2020, Long et al. 2015, Meyer et al. 2011, Smith et al. 2013, Wang et al. 2015). However, little is known about the influence of different sampling techniques on gene expression in fish, especially under field conditions. For example, field conditions may limit the use of optimal sampling protocols or storage methods to reduce variation (e.g., using liquid nitrogen in remote sampling locations or fast processing times for tissue isolation) (Mutter et al. 2004, Pérez‐Portela & Riesgo 2013). Furthermore, field capture may also result in increased variation among individuals, including among biological replicates (Pearce et al. 2016). For example, stress-related genes may be overexpressed as a result of long handling time before sampling. The impacts of handling stress on fish physiology are well understood (Sopinka et al. 2016). Although most studies focus on glucocorticoid and blood chemistry responses to capture (Milla et al. 2010, Wiseman et al. 2007, Wood et al. 1983, Milligan 1996, Barton 2002, Ruane et al. 2001; see also Romero & Reed 2005 for the influence of handling time in non-fish species), gene expression responses to handling stress indicate that the magnitude, intensity, and duration of changes vary across genes, species, and tissue types (Krasnov et al. 2005, Lopez et al. 2014). While there is some evidence that a specimen’s blood cortisol and glucose levels are affected by capture method (e.g., electrofishing) (Barton & Dwyer 1997, Barton & Grosh 1996, Bracewell et al. 2004), to our knowledge it is unknown whether gene expression is affected by capture method or handling time prior to sample collection.
Here, we test whether sampling method (electrofishing vs. dip netting), processing time, and RNA-Seq library type (3’ Tag-Seq method vs. whole mRNA-Seq) influence gene expression data in multiple tissue types from westslope cutthroat trout (Oncorhynchus clarkii lewisi), a species of conservation concern native to western North America (Behnke 2002, Allendorf & Leary 1988, Shepard et al. 2003). The results of this study will address the sources of gene expression variation under field conditions and provide a foundation for improving future RNA-based study designs for field sampling of wild-caught non-model fish and other species.