LEGENDS:
Figure 1: (A) Comparison of SNPs in CHD8 collected across databases such as Ensembl, ExAC/gnomAD, EVS and SFARI. Ensembl provided the highest number of variations, followed by gnomAD, whereas EVS had the lowest count. The SFARI database had the highest percentage of truncating variations. (B) Frequency of different SNPs in the general population versus the ASD population. In the general population, nsSNPs were the most common type, accounting for 53.45% of all variations identified in CHD8. However, truncating SNPs were the most frequently recorded variants within the ASD population. (C) Comparison of coding SNPs in the general versus the ASD population. In the general population, 94.5% of all variants collected were nsSNPs and truncating SNPs formed just 5.4%, whereas the ASD population had 55.17% truncating SNPs. 14 (27%) and 10 (35.7%) of ASD variants were common to both populations, whereas all frameshift variations identified in the ASD population were unique. (D) Exon-wise SNP density. Exon 30 recorded the highest SNP density, exon 6 had the lowest count, with only nsSNPs from the general population, and exon 14 had the highest number of truncating SNPs. Exon 10 displayed the highest SNP density within the N-terminal region, and the C-terminal exons 29 to 37, except exon 34, recorded higher SNP counts.
Figure 2: (A) The longest CHD8 protein sequence is 2,581 aa in length and is encoded by mRNA transcript NM_001170629/ENST00000399982.2, which is composed of 37 exons (protein ID NP_001164100/Q9HCK8). CHD8 contains six important domains: Chromo domain (640-790 aa, represented in yellow), Helicase ATP-binding (807-1009 aa, in maroon/pink), SNF2_N (825-1101 aa, in red/pink), Helicase C-terminal (1137-1288 aa, in light blue), BRK domain (2310-2419 aa, in sky blue) and the DNA-binding site SANT and SLIDE (1437-1683 aa, in green), along with a region between 1789 and 2302 aa that binds to CHD7 and interacts with FAM124B (CHD7_BD, Interaction with FAM124B), indicated in navy blue. (B) Heatmap representing the exon-wise comparison of SNP density. nsSNPs were clustered within the C-terminal exons as well as exons 2, 3, 10 and 21. Truncating SNPs were often localised within the N-terminal exons, specifically exons 8, 10 and 14. The lowest SNP density was observed in exons 17-20, corresponding to the most conserved region of CHD8. Residues within N-terminal exons 1-4 and C-terminal exons 31-37 were evolutionarily the most variable. Exons 3 to 5 contained the highest accumulation of PTMs, followed by exons 31, 29 and 21.
Figure 3: Exon-wise (A) and domain-wise (B) distribution of SNPs across the general and ASD populations, represented in shades of blue (nsSNPs) and yellow (truncating SNPs) against the backdrop of the evolutionary status of CHD8 residues (light pink area) and PTM sites (grey area).
Figure 4: Comparison of CHD8 protein disorder predictions by the tools IUPred2A (A) and MoRFchibi SYSTEM (B). In both panels, each residue is plotted against its disorder probability score on the Y axis. In (B), the MoRF predictions are displayed as Toggle MoRF Bands in light blue.
Figure 5: (A) DEPICTER predictions of disordered regions across the CHD8 protein and the corresponding protein-binding, RNA-binding, DNA-binding, linker and multifunctional disordered sites. (B) Mutation cluster predictions by the tool Mutant 3D. The core domain regions are highlighted in fluorescent green and nsSNPs are represented as vertical pins along the CHD8 protein 2D structure. Mutations belonging to significant mutation clusters are colour-coded separately in yellow and red. Further details are available in Supplementary Figure S2.
Figure 6: Protein-protein interaction network constructed for the enzyme CHD8 (in yellow). Stringent network-building rules were applied to obtain 13 direct interactions with protein partners, which are represented in green. Molecular functions directly associated with ASD are presented in turquoise, regulatory functions in orange and others in grey.