Materials and methods

Ethics statement

The sampling procedures used in this study were approved by the Ethics Committee of Hainan Medical University (approval number: HMUEC20180059).

Swab samples of rodents

We collected 588 throat and anal swab samples from 326 rodents at multiple locations in nine counties or cities in Hainan Province, a tropical island province of China, from May 2017 to June 2021. Samples were grouped into pools of 15-20 according to sampling location and species. Morphology was combined with the mitochondrial cytochrome b gene (mt-cyt b) to identify murine species and to analyse the congruence between viruses and their hosts (Table S1). The collected samples were immediately immersed in the maintenance medium of a virus-sampling tube (Yocon Biology, Beijing, China) to preserve sample quality and were transported to the laboratory within 24 h under a low-temperature cold chain [6]. On arrival at the laboratory, each sample was divided evenly into three parts and stored at -80 °C until subsequent experiments, consistent with a previous study [6].
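The pooling step can be illustrated with a minimal sketch. It assumes each sample is a record with hypothetical `id`, `location`, and `species` fields, and it only enforces the upper pool size of 20; how small remainder groups were handled is not stated in the text, so the sketch simply keeps them as separate pools.

```python
from collections import defaultdict


def make_pools(samples, max_pool_size=20):
    """Group samples by (location, species), then split each group
    into pools of at most max_pool_size samples."""
    groups = defaultdict(list)
    for sample in samples:
        groups[(sample["location"], sample["species"])].append(sample["id"])

    pools = []
    for location, species in sorted(groups):
        ids = groups[(location, species)]
        # Chunk the group; the last chunk may be smaller than the others.
        for start in range(0, len(ids), max_pool_size):
            pools.append({
                "location": location,
                "species": species,
                "sample_ids": ids[start:start + max_pool_size],
            })
    return pools


# Hypothetical records for illustration only (not the study data).
samples = [
    {
        "id": f"S{i:03d}",
        "location": "site_A" if i < 25 else "site_B",
        "species": "Rattus sp.",
    }
    for i in range(40)
]
pools = make_pools(samples)
pool_sizes = [len(pool["sample_ids"]) for pool in pools]
```

In this toy input, the 25 samples from the first site are split into two pools and the 15 samples from the second site form a single pool.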

Viral nucleic acid library construction and NGS

The 588 samples were combined into 28 pools based on swab type and sampling location. Swab samples were passed through 0.45 µm filters (MilliporeSigma, Burlington, MA, USA) to remove eukaryotic cell- and bacterium-sized particles. The filtrate was ultracentrifuged at 100,000 × g and 4 °C for 3 h. The pellets collected from the 28 pools were resuspended in Hank's balanced salt solution and digested with DNase (Applied Biosystems, Thermo Fisher Scientific, Waltham, MA, USA) to degrade and remove unprotected nucleic acids. Viral RNA was extracted using a QIAamp Viral RNA Mini Kit (Qiagen, Hilden, Germany). Viral cDNA was generated using primer K-8N and Superscript III Reverse Transcriptase (Invitrogen, Thermo Fisher Scientific), as previously described [20]. Sequence-independent polymerase chain reaction (PCR) amplification products were purified and subjected to magnetic bead sorting. Viral nucleic acid libraries, constructed after pre-processing steps such as sonication, were sequenced on an Illumina HiSeq 2500 sequencer (Illumina Inc., San Diego, CA, USA) with single-end reads of 150 bp. Sequence data were deposited in the National Center for Biotechnology Information (NCBI) Sequence Read Archive under the accession number PRJNA892773.

Metagenomic sequencing

Quality-controlled reads of each sample were assembled using Trinity v2.5.1 [21]. DIAMOND was used to compare the contigs against the NCBI non-redundant protein database. All blastx hits were annotated with taxonomy IDs using an in-house database that combined the accession-to-taxid files (ftp://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/) with other information obtained from the NCBI Entrez server. Non-redundant viral contigs longer than 500 nucleotides (nt) were selected. Contigs related to bacteriophages, plant viruses, and insect viruses were excluded. Reads were mapped to the contigs using BWA-MEM [22]. Low-quality mappings were removed using CoverM (v0.6.1, https://github.com/wwood/CoverM) with two preset parameters: --min-read-percent-identity 0.90 --min-read-aligned-percent 0.75. Relative viral abundance was also calculated with CoverM using the transcripts per million (TPM) method. A heatmap visualising the mouse-related virome profile was plotted using the pheatmap package in R.
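The filtering and normalisation logic described above can be sketched in a few lines. This is an illustration under stated assumptions, not the CoverM implementation: the thresholds are treated as inclusive, the `length` and `category` fields are hypothetical, and the TPM function follows the standard definition (length-normalised read counts scaled to sum to one million), which may differ in detail from what CoverM computes internally.

```python
def passes_read_filter(identity, aligned_fraction,
                       min_identity=0.90, min_aligned=0.75):
    """Keep a mapped read only if both thresholds are met
    (inclusive boundaries are an assumption of this sketch)."""
    return identity >= min_identity and aligned_fraction >= min_aligned


def select_contigs(contigs, min_length=500,
                   excluded=("bacteriophage", "plant virus", "insect virus")):
    """Keep contigs longer than min_length nt whose (hypothetical)
    category label is not in the excluded set."""
    return [
        contig for contig in contigs
        if contig["length"] > min_length and contig["category"] not in excluded
    ]


def tpm(read_counts, lengths):
    """Standard TPM: reads per base for each contig, rescaled so that
    the values across all contigs sum to 1e6."""
    rates = {name: read_counts[name] / lengths[name] for name in read_counts}
    total = sum(rates.values())
    if total == 0:
        return {name: 0.0 for name in rates}
    return {name: rate / total * 1e6 for name, rate in rates.items()}


# Hypothetical values for illustration only.
contigs = [
    {"name": "contig_1", "length": 1200, "category": "vertebrate virus"},
    {"name": "contig_2", "length": 450, "category": "vertebrate virus"},
    {"name": "contig_3", "length": 3000, "category": "bacteriophage"},
]
kept = select_contigs(contigs)
abundance = tpm({"contig_1": 300, "contig_2": 100},
                {"contig_1": 1000, "contig_2": 1000})
```

Because TPM divides by contig length before rescaling, two contigs of equal length are ranked purely by their mapped read counts, whereas a long contig needs proportionally more reads to reach the same value.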

Viral genome sequencing

Molecular clues from the metagenomic analyses were used to classify sequence reads into viral families or genera using MEGAN. To obtain partial or complete genomes, reads related to representative viral open reading frames (ORFs) were selected for read-based PCR and sequencing. Reads with accurate genomic locations were used to design specific nested PCR primers and to identify partial genomes, consistent with a previous study [6]. cDNA was generated using random primers and Superscript III Reverse Transcriptase (Invitrogen). The remaining genomic sequences were obtained by genome walking and 5′ and 3′ rapid amplification of cDNA ends (Invitrogen; and Takara Bio, Kyoto, Japan). The primers used to amplify the sequences obtained in this study are shown in Table S2.

Genome annotation

The basic nucleotide sequence structure of the viral genomes, together with the amino acid sequences encoded by the predicted ORFs and their locations, was deduced by comparison with the sequences of other virus families. Conserved protein families and domains were predicted using Pfam (http://www.ebi.ac.uk/services/proteins), BLASTP (https://blast.ncbi.nlm.nih.gov), and InterProScan 5 (http://www.ebi.ac.uk/services/proteins). Routine sequence alignment was performed using Clustal Omega (http://www.ebi.ac.uk/Tools/). Phylogenetic trees were visualised and annotated using iTOL (https://itol.embl.de/).

Viral prevalence

Nearly complete or partial genomic sequences of the viruses obtained by NGS were used as templates to design specific semi-nested primers targeting the non-structural gene, which were used for PCR screening of viruses in each filtrate sediment and individual sample (Table S2). PCR was performed using GoTaq Colorless Master Mix (Promega). For the nested PCR, which offers high specificity and sensitivity, 2 µL of the first-round PCR product was used as the template for the second round. The thermal cycling conditions for both PCRs were 94 °C for 5 min, followed by 35 cycles of 94 °C for 30 s, 57 °C for 35 s, and 72 °C for 30 s, and a final elongation step at 72 °C for 10 min. The PCR products were analysed using 1.5% agarose gel electrophoresis and ultraviolet imaging.
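As a worked example of the cycling programme above, the following sketch encodes the stated steps as a small data structure and sums the hold times for one PCR round. The structure and names are illustrative only, and ramp times between temperatures are ignored, so the real run time on a thermocycler would be somewhat longer.

```python
# Each step is (temperature in degrees Celsius, hold time in seconds).
PROGRAM = {
    "initial_denaturation": (94, 5 * 60),
    "cycles": 35,
    "cycle_steps": [
        (94, 30),  # denaturation
        (57, 35),  # annealing
        (72, 30),  # extension
    ],
    "final_elongation": (72, 10 * 60),
}


def total_hold_seconds(program):
    """Sum all programmed hold times for one PCR round (ramping excluded)."""
    per_cycle = sum(seconds for _, seconds in program["cycle_steps"])
    return (
        program["initial_denaturation"][1]
        + program["cycles"] * per_cycle
        + program["final_elongation"][1]
    )


seconds = total_hold_seconds(PROGRAM)
minutes, remainder = divmod(seconds, 60)
print(f"Programmed hold time per round: {minutes} min {remainder} s")
```

Each cycle holds for 95 s, so one round amounts to 300 + 35 × 95 + 600 = 4225 s, or about 70 min; the two rounds of the nested protocol therefore require roughly 141 min of programmed hold time in total.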

Phylogenetic and data analyses

Nucleotide sequences and deduced amino acid sequences were aligned using the ClustalW package (https://rdrr.io/bioc/muscle/man/muscle-package.html) with default parameters in MEGA X. Relatively conserved viral sequences were selected to construct phylogenetic trees using the maximum likelihood method. The best-fit substitution model was identified using the model selection function of MEGA X, and tree support was assessed with 1000 bootstrap replicates. The NCBI Basic Local Alignment Search Tool (BLAST) was used to perform pairwise amino acid alignments between the reference sequences and the new viruses.
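The bootstrap procedure mentioned above resamples alignment columns with replacement and rebuilds a tree from each pseudo-alignment. The sketch below shows only the resampling step, on a toy alignment with made-up sequence names; it is not the MEGA X implementation, and tree inference on each replicate is left out.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one pseudo-alignment built by sampling columns with
    replacement; every sequence receives the same sampled columns."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_columns = lengths.pop()
    columns = [rng.randrange(n_columns) for _ in range(n_columns)]
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }


def bootstrap_replicates(alignment, n_replicates=1000, seed=0):
    """Generate n_replicates pseudo-alignments from a seeded generator."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


# Toy amino acid alignment for illustration only.
toy = {"virus_A": "MKTLLV", "virus_B": "MKSLLV", "virus_C": "MRTLIV"}
replicates = bootstrap_replicates(toy)
```

The support value for a branch is then the proportion of replicate trees in which that branch appears, which is why the number of replicates (1000 here) determines the resolution of the reported percentages.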