2.2 Sampling and sequencing of plant material
In spring 2019, we collected leaf tissue from 105 individuals ofP. vulgaris , comprising 74 heterostyles (37 pins and 37 thrums) and 31 homostyles. The samples were obtained from nine populations, including: six dimorphic populations (i.e., pins and thrums) from Turkey (TR-D), Slovakia (SK-D), Switzerland (CH-D) and England (EN1-D, EN2-D, and EN3-D); two trimorphic (i.e., pins, thrums, and homostyles) populations from England (EN4-T and EN5-T); and one monomorphic (i.e., only homostyles) population from England (EN6-M) (Table 1). Populations EN4-T, EN5-T and EN6-M corresponded to populations T04, T07, and M01, respectively, from our previous study (Mora-Carrera et al ., 2021). Moreover, the 11 homostyles from EN6-M analyzed here included three of the homostyles carrying CYPT -1 from population M01 of Mora-Carrera et al . (2021; Figure 1B). DNA extractions were performed using the Maxwell extraction method (Promega, USA) at the Functional Genomics Center Zurich (Zurich, Switzerland). Library preparation and paired-end sequencing (150bp) were conducted by RAPiD GENOMICS (Gainesville, Florida, USA) using Illumina Novaseq6000 platform, generating 7,077,866,510 paired-end sequencing reads with an average sequencing depth of 18.9 (± SD = 3.08).
2.3 Mapping and variant calling of WGR data to CYPT from the P. vulgaris genome
To determine the sequences of all five exons and four introns ofCYPT , along with its up- and down-stream intergenic regions, we mapped WGR reads from all 105 individuals to a previously published genome of P. vulgaris (Cocker et al. , 2018). We opted to use the less contiguous genome of P. vulgaris(Cocker et al., 2018) rather than the highly contiguous, chromosome-scale genome of P. veris (Potente et al., 2022) to enhance the likelihood of accurate mapping of WGR reads to intronic as well as up- and downstream intergenic regions ofCYPT . Prior to mapping, we also produced a reference S-locus assembly for P. vulgaris by replacing all S-locus contigs (LH_v2_0002458, LH_v2_0067593, LH_v2_0003915, and LH_v2_0000241) in the genome of P. vulgaris with a published 450 kb sequence of the P. vulgaris S-locus (Li et al . 2016). To determine the position of CYPT in the reference S-locus assembly, we aligned the coding sequence ofCYPT against the reference using thequery function of blastn with default parameters, which is part of the NCBI BLAST+ toolkit v2.6.0 (Camacho et al. , 2009).
Prior to mapping, Illumina adapters were clipped from raw reads with Trimmomatic v0.38 (Bolger et al. , 2014) using default parameters. Mapping was performed using BWA-mem v7.17 (Li & Durbin, 2009) with default parameters. As negative control, pin individuals (0/0) were included in the analysis and, as expected, none of the sequencing reads from the 37 pins mapped to the S-locus. Duplicated reads were marked with the MarkDuplicates tool included in Picard v2.18.4 (http://broadinstitute.github.io/picard/). Variant calling of SNPs and indels for the S-locus was conducted using HaplotypeCaller, implemented in the Genome Analysis Toolkit (GATK) v4.1.2.0 (McKenna et al. , 2010) pipeline. Finally, SNP variants were filtered from the Variant Call Format (VCF) file using the SelectVariants with following filters: quality-by-depth (QD) > 2.0; mapping quality (MQ) > 40.0; strand bias (FS) < 60.0; mapping quality rank-sum test (MQRankSum) > -12.5; a rank-sum test (ReadPositionRankSum) > -8.0; site read depth (DP < ½X) || (DP > 3X). Additionally, sites with fixed heterozygosity (i.e., InbreedingCoeff < -0.99) likely representing incorrect SNP calling (O’Leary et al ., 2018; Pavanet al ., 2020) were filtered out.
2.4 Identification of mutations putatively disrupting CYPT function in P. vulgaris
Disruptive mutations in CYPT coding regions - To identify potential homostyle-specific, loss-of-functionCYPT mutations, including non-synonymous mutations, insertions, and deletions, we compared the sequences of the five CYPT exons in 31 homostyles with the functional CYPT allele of the 37 thrums. For this, we extracted the respective sequences of the fiveCYPT exons from the S-locus VCF file using theintersect function included in BEDtools v2.29.2 (Quinlan & Hall, 2010) and converted them into a single FASTA file using vcf2phylip.py (https://github.com/edgardomortiz/vcf2phylip). The sequence alignment of all exons and the detection of putatively disruptive mutations in CYPT of homostyles and thrums were performed with MEGA X (Kumar et al. , 2018). Finally, we compared the resulting sequence alignment with an alignment of previously detected mutations in CYPT exons reported by Mora-Carrera et al . (2021).
Structural rearrangements in CYPT- To determine whether the shift to homostyly is associated with structural rearrangements involving CYPT exons, we examined the paired-end sequencing reads mapped to the introns and exons of CYPT in both thrums (as a reference) and homostyles using the Interactive Genomic Viewer (IGV) v2.8.6 (Robinsonet al. , 2011). Translocations can be identified by analyzing the mapped paired-end reads, where one read is mapped to one position in the genome (e.g., CYPT in the S-locus), while its mate-pair is mapped to a different position, either in the same or different chromosome. Inversions can be detected by comparing the orientation of the mapped read-pairs to CYPT in the reference genome. If there is a small inversion, both mapped paired-end reads should be oriented in the same direction (→→ or ←←), whereas in the absence of a structural change normal mapped paired-end reads should be oriented towards each other (→←). Finally, deletions are characterized by drops in sequencing read coverage at specific positions in the genome, in this case, withinCYPT .
Disruptive mutations in CYPT promoter region - To determine whether mutations in the promoter region are responsible for CYPT loss of function in homostyles, we conducted an analysis to identify the putativeCYPT transcription-factor binding-site and searched it for homostyle-specific mutations. It is expected that the 3kb region upstream of a gene of interest contains its promoter, including the transcription-factor binding-site (Yu et al. , 2016), thus we first extracted and aligned the 3kb sequence upstream ofCYPT exon 1 from 20 thrums in dimorphic populations TR-D, SK-D, CH-D, and EN1-D of P. vulgaris . We then added the 3kb sequence upstream of CYPT from the P. veris high-quality reference genome (Potente et al., 2022) to the aligned dataset. Secondly, we inspected the aligned upstream sequence of CYPT to look for any single nucleotide polymorphism (SNP) that was fixed in homostyles but absent in thrums. Thirdly, to identify enriched motifs of transcription binding-sites in the aligned dataset, we employed the Simple Enrichment Analysis (SEA) implemented in the Multiple Em for Motif Elicitation (MEME) suite program v5.5.0 (McLeay & Bailey, 2010). We used theArabidopsis thaliana DAP motifs database (O’Malley et al. , 2016) and default parameters. The identified enriched motif sequences were compared between the 37 thrums and 31 homostyles by aligning these regions with MEGA X (Kumar et al. , 2018). To determine the potential function of the identified enriched motifs, we consulted TheArabidopsis Information Resource (TAIR) (http://www.arabidopsis.org).