Materials and Methods

Plant Material & Greenhouse

In January 2020, dormant vegetative cuttings were collected from 100 poplar trees spanning much of the latitudinal distribution of the contact zone between Populus trichocarpa and P .balsamifera (Figure 1, Table S1). Cuttings were then transported to Blacksburg, VA where they were rooted in a greenhouse maintained at day and night temperatures of 24 and 15.5 °C, respectively, with no supplemental lighting. Following vegetative flush of rooted cuttings, leaf tissue was sampled for DNA extraction using the Qiagen plant DNeasy kit (Qiagen Inc., Valencia, CA), with minor modifications that included a phenol-chloroform extraction in place of a QIAshredder column, approximately 100 mg of leaf was used to extract DNA. Samples with low DNA concentration were subject to a secondary extraction using a modified CTAB protocol . Quality and concentration of DNA were assessed using a combination of NanoDrop and Qubit approaches, in addition to gel electrophoresis. DNA from these samples was used to quantify telomere length using both qPCR and WGS based on Illumina short-read sequencing.

Genomic libraries and sequencing

Genomic libraries were prepared using the Illumina KAPA HyperPrep kit at the Duke University Center for Genomic and Computational Biology. Briefly, the libraries were sequenced on an S4 flow cell in 2 x 150 bp format on an IlluminaNovaSeq 6000 with 64 samples per lane. Quality control and preprocessing of raw reads were done using FastQC and Trimmomatic . Sequences were trimmed to remove adapters, short reads (< 30 bp), and reads were discarded with < 30 bp of overlap between forward and reverse reads with a maximum mismatch value of 4. Following trimming, 37.33 ± 6.5 million reads were retained with an average GC length of 35.6% ± 0.5 and an average read length of 141 bp (Table S1). Trimmed reads were used in the downstream analyses.

Telomere length from whole-genome sequence data

Telomere length was estimated from WGS data using three different bioinformatic approaches; Computel v1.3 , K-seek , and TRIP . All three programs estimate telomere length using reads from whole genome sequence data. However, the three programs use different parameters to estimate telomere length allowing for comparison.
Computel (v1.3) creates a telomeric reference using read length, in addition to telomeric repeat length and pattern . Telomeric reads are those reads that align to the telomere reference developed within the program. At least two replicates of the telomere sequence pattern were required to be considered a telomeric read. Once telomeric reads are identified, Computel calculates the relative genome coverage for individuals as a ratio of telomeric coverage to genome coverage. Telomeric coverage is defined as the distribution of coverage per base for the telomeric reference and genome coverage is the product of the total number of reads and read length, divided by total genome size. After estimating relative genome coverage, Computel estimates telomere length by counting telomeric reads and normalizing counts relative genome coverage. Computel then divides telomere length by the number of chromosomes present in the species to obtain mean telomere length per individual. For the purposes of our study, we modified Computel parameters to consider Populus genome (P. trichocarpa v3, NC_037285.1. ) parameters, including chromosome number (n=19), genome size (434,290,000 bp), and telomere sequence pattern (TTTAGGG).
In addition to Computel, we used the bioinformatics pipeline K-seek to identify and quantify reads that contained telomeric repeats. K-seek uses a pattern-matching approach that does not consider genome coverage in telomere estimates. K-seek identifies sequence repeats ranging between 1 to 20 bp . Reads with a minimum repeat length of 50 bp per repeat are considered to avoid counting short repeat sequences scattered across the genome. After identifying and counting sequence repeats, K-seek organizes the repeat sequences in alphabetical order and size (in bp). To quantify telomeric repeat sequences, we used the canonical telomere repeat sequence specific to plants, i.e., AAACCCT , which is the reverse complement of TTTAGGG (Figure S2). To calculate mean telomere length per individual in base pairs, we applied the following equation:
Equation (1) Mean telomere length per individual\(=\ \frac{\text{FTL\ x\ RL}}{2\ x\ Chr}\)
Where, FTL is the frequency of telomeric repeats, RL is the telomeric repeat length (7 base pairs for TTTAGGG), two in the denominator accounts for the two ends of a chromosome and Chr is the number of chromosomes in Populus (n=19).
Finally, we used the telomeric repeats identification pipeline (TRIP) to estimate telomere length across Populus genotypes. Like K-seek, TRIP uses a pattern-matching approach that does not include genome coverage into telomere estimates. However, unlike K-Seek only reads with a minimum of four repeat sequences are considered . TRIP searches for repeat sequences of lengths ranging from 2 to 25 bp with no mismatches permitted. Once repeat sequences are identified, TRIP reorders the repeat sequences alphabetically and summarizes the repeat regions into count tables. The frequency of the canonical telomeric repeat sequence (AAACCCT) was used to estimate mean telomere length of an individual. Telomere length was calculated using the same equation and parameters applied with K-Seek. For subsequent analyses, all individuals were included except one which exhibited low genome coverage (12.31x) from the WGS data (Table S1).
A major difference between Computel and the two other bioinformatic approaches is the inclusion of genome coverage into telomere length estimations. Variability in genome coverage may impact telomere length estimations. Therefore, we normalized telomere length estimates by individual genome coverage using Equation 2. Telomere length values for K-seek and TRIP were re-assessed because the two programs did not previously consider genome coverage.
Equation (2) Corrected telomere length\(=\ \frac{\text{FTL\ x\ RL}}{2\ x\ Chr}/\ Genome\ coverage\)
Genome coverage was calculated as the product of the total number of reads and read length, divided by total genome size for each individual. In the case of Populus , we use a genome size of 434,290,000 bp (P. trichocarpa v3, NC_037285.1, ).

Quantitative PCR (qPCR) telomere length assay.

Quantitative polymerase chain reaction (qPCR) was used to quantify relative telomere length (rTL) using a Mx3000P QPCR System , following the methods of modified for Populus . qPCR quantifies the rTL for each individual as the ratio of the telomere repeat copy number to a single-copy control reference gene, relative to the reference sample . Plant-specific telomere primers were developed to assay telomere length alongside a housekeeping gene (glyceraldehyde-3-phosphate dehydrogenase; GAPDH), which is a single-copy control reference gene that exhibits limited sequence variability within the same individual.
qPCR reactions for the housekeeping GAPDH gene and telomere primers were run in triplicate on separate plates. Each qPCR reaction contained 20 ng of DNA per reaction in addition to 12.5 μl of SYBER Green Master Mix (Quantabio), 0.25 μl forward and reverse primer, 6 μl ultrapure water, and 6 μl of DNA. The qPCR cycle conditions for the GAPDH gene were 10 min at 95°C, followed by 33 cycles of 15 s at 95°C, 30 s at 59 °C and, 30 s at 72°C. qPCR conditions for the telomere region were 10 min at 95°C, followed by 20 cycles of 15 s at 95°C, 30 s at 58°C and 30 s at 72°C. Both GADPH and telomere assays were run on the same day, withPopulus samples randomly assigned to a plate and three technical replicates per individual for each plate.
To calculate relative telomere length (rTL), we used the following formula:\(2^{-\Delta\Delta Ct}\) , where\(\Delta\Delta Ct\ =\ \left(\text{Ct}^{\text{telomere}}\ -\ \text{Ct}^{\text{GAPDH}}\ \right)\ focal\ sample\ -\ \left(\text{Ct}^{\text{telomere}}\ -\text{Ct}^{\text{GAPDH}}\right)\text{\ reference\ sample\ }\).Ct values indicate the number of PCR cycles required for the fluorescent signal to cross the threshold which is defined as the transition from linear phase to exponential phase of amplification for telomere and GAPDH reactions, respectively. Ct values are inversely proportional to the starting amount of the target sequence, samples with longer telomeres required fewer cycles to cross the threshold and thus had lower Ct -values. To standardize rTL across samples and plates, one reference Populus sample was run in triplicate on every plate.
For each plate, a serial dilution of DNA for a single Populussample was prepared to calculate reaction efficiencies. To ensure that the DNA samples fell within the standard curve, we ran a five-point standard curve (40, 20, 10, 5, and 2.5 ng/mL of DNA) across all plates. Each point in the standard curve was run in triplicate. The reaction efficiencies for the telomere plates and GAPDH plates were within the accepted range (i.e., 100 ± 15%), with an average of 93.54 ± 3.82 % for the telomere plates and 88.78 ± 3.88 % for GAPDH plates. Additionally, we included a negative control of 6 μl of water run in triplicate on each plate, which did not produce Ct values.

Statistical analyses