RNA extraction
Total RNA was extracted from mesocarp using plant RNAOUT kit (TIANDZ,
China), then the quality control of RNA was performed using 2100
Bioanalyzer system and all samples were qualified as a result. RNA
samples were subjected to DNaseⅠdigestion (Takara, Japanese) according
to instructions from the manufacturer to remove the contaminating
genomic DNA. And these were purified using RNeasy MinElute Cleanup Kit
(QIAGEN, Germany), and eluted with RNase-free water. Ribo-Zero™ Magnetic
Kit (Epicentre, America) was used for depleting rRNA according to the
manufacturer’s protocol. Eventually, the rRNA-depleted RNA sample was
used in the subsequent experiments.
RNA-Seq
library construction and
sequencing
Equal amounts of total RNA for each sample was used to construct 24 RNA
libraries. RNA sequencing was performed on an Illumina HiSeq 2500
platform at Chinese National Human Genome Center, Shanghai, and 100 bp
and 150 bp paired-end reads were generated. In more detail, 24 libraries
were generated using NEB Next® UltraTM Directional RNA Library Prep Kit
for Illumina (NEB, American), and the quality, quantity and size
distribution of RNA were determined using three methods: Qubit
fluorometer, 2% agarose gel electrophoresis and Agilent High
Sensitivity DNA Kit. The libraries were then used in cluster generation
by TruSeq PE Cluster Kit (Illumina, American), and submitted for
Illumina HiSeq 2500 sequencing according to the standard operation
protocols. All the raw data have been uploaded to the GEO database of
NCBI (accession number: GSE122868).
Transcript
assembly and novel transcripts annotation
To obtain high quality data, the adapters in raw data were trimmed and
the reads with low quality (Q-value < 20) and shorter than 36
bp were filtered using FASTX-Toolkit (version 0.7,
http://hannonlab.cshl.edu/fastx_toolkit/). The filtered FASTQ files of
pair reads were justified using a custom Python script developed in our
laboratory. Then, Q20, Q30 and GC content of the filtered data were
calculated using FastQC (version 0.11.5,
http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) to confirm
the quality of filtered data. All the subsequent analyses were based on
the clean data with high quality. Paired-end clean reads were mapped
against Prunus persica L. Batsch reference genome
(prunus_persica/genome_v2.0.a1) from the genome database
(GDR)
(Jung et al., 2008) using Tophat (version
2.1.1) (Trapnell et al., 2009). To
generate transcripts for each sample, the above results were assembled
using Cufflinks suite (version 2.2.1)
(Trapnell et al., 2010;
Roberts et al., 2011;
Trapnell et al., 2012). Novel transcripts
were annotated using BLASTX (Altschul et
al., 1990; Altschul et al., 1997) against
the NCBI nr database and using Blast2GO (version 5)
(Conesa et al., 2005) to perform
InterProScan analyses. Then, the novel transcripts were annotated and
categorized into 3 principal Gene Ontology (GO) categories: biological
process, molecular function and cellular component
(The Gene Ontology Consortium et al.,
2000; The Gene Ontology Consortium,
2017). Besides, the pathway annotations were generated using SBH method
in the KEGG Automatic Annotation Server (KAAS, version 2.1)
(Moriya et al., 2007).
Quantification
of gene expression levels and differential expression analysis
We used Cuffdiff (Trapnell et al., 2013)
to calculate gene expression level as FPKM (fragments per kilobase of
exon per million reads mapped) value, and generated the adjusted
P-values of false discovery rate (FDR) between different conditions
using the Benjamin-Hochberg method
(Benjamini and Hochberg, 1995;
Benjamini and Yekutieli, 2001). FPKM value
of each gene was calculated. Genes with a corrected FDR of 5% and
|Log2ratio| ≥ 1 were assigned as
differentially
expressed genes (DEGs) using R software (version 3.2.5). Wilcoxon signed
rank test was used for the comparison of gene expression levels during 7
days between CT and HT conditions.
Gene
co-expression network analysis
All transcripts in samples were used to perform co-expression analysis
to describe the correlation patterns among genes and identify regulatory
modules. Weighted gene co-expression network analysis (WGCNA) were
conducted using R software package WGCNA based on expression profiles
(Langfelder and Horvath, 2008). Firstly,
the pairwise correlation coefficients between all genes were calculated
and further transformed into adjacency matrix with a soft threshold
power of 12.
To
counter the effects of spurious or missing connections, topological
overlap similarity was measured from the adjacency matrix
(Yip and Horvath, 2007). Then, modules
were detected as branches of the dendrogram using hierarchical
clustering with the DynamicTreeCut algorithm and assigned to different
colors(Langfelder et al., 2008).
Intramodular
connectivity was calculated based on the expression profiles of the
module members to identify the highly connected intramodular hub genes.
Besides, the correlations between gene expression profile and the
phenotype measurements of interest were also calculated to explore the
association of modules to external traits. Finally, Cytoscape (version
3.5.1) was used to visualize the network connections and calculate the
link weights to indicate the hub-genes in selected modules
(Shannon et al., 2003).