3.2. Sequencing output and fragment length
CBH yielded 2,231,139 to 12,799,906 reads per sample. After the
first filtering and trimming steps, approximately 7 to 34% of reads were
identified as rDNA (general sequencing output given in Appendix 4). For
MTB, a total of 288,094 to 1,278,400 sequences were obtained per sample,
yielding 2231 to 4133 OTUs for 16SV4 and 770 to 2169 for 18SV1V2 per
sample. The CBH data were analyzed with two different pipelines. First,
the short fragments were used directly with Kraken 2 (hereafter referred
to as CBH-short), taking the unaligned reads with a mean length of 200 to
289 bp, which was shorter than the fragments of up to 450 bp obtained
with metabarcoding. Second, EMIRGE was used to reconstruct “full
barcodes” (hereafter named CBH-long), which yielded on average 731
near-full-length markers per sample, reaching up to 1200 to 1450 bp for
archaea, up to 1600 bp for bacteria and 1200 to 1900 bp for eukaryotes
(Fig. 2). However, for a small number
of taxa, 60 to 95% of sequences were lost.