LEGENDS:
Figure 1: (A) Comparison of SNPs in CHD8 collected
across the Ensembl, ExAC/gnomAD, EVS and SFARI databases. Ensembl
provided the most variants, followed by gnomAD, whereas EVS had the
fewest. The SFARI database had the highest percentage of truncating
variations. (B) Frequency of different SNPs in the general
population versus the ASD population. nsSNPs were the most common class
in the general population, accounting for 53.45% of all CHD8 variations
identified there. However, truncating SNPs were the most frequently
recorded variants in the ASD population. (C) Comparison of coding SNPs in the general vs
ASD population. In the general population, 94.5% of all variants
collected were nsSNPs and truncating SNPs formed just 5.4%, whereas
55.17% of variants in the ASD population were truncating SNPs.
Fourteen (27%) and ten (35.7%) of ASD variants were common to both
populations, whereas all frameshift variations identified in the ASD
population were unique. (D) Exon-wise SNP density. Exon 30 recorded the
highest SNP density, exon 6 had the lowest count, consisting only of
nsSNPs from the general population, and exon 14 had the most truncating
SNPs. Exon 10 displayed the highest SNP density within the N-terminal
region; C-terminal exons 29 to 37, except exon 34, recorded higher SNP
counts.
Figure 2: (A) The longest protein sequence of CHD8 was
identified to be 2,581 aa in length, coded by mRNA transcript
NM_001170629/ENST00000399982.2 composed of 37 exons, encoding protein
ID NP_001164100/Q9HCK8. The CHD8 protein contains six important
domains: Chromo domain (640-790 aa, represented in yellow), Helicase
ATP-binding (807-1009 aa, in maroon/pink), SNF2_N (825-1101 aa, in
red/pink), Helicase C-terminal (1137-1288 aa, in light blue), BRK
domain (2310-2419 aa, in sky blue) and the DNA-binding site SANT and
SLIDE (1437-1683 aa, in green), as well as a region between 1789 and
2302 aa that binds to CHD7 and interacts with FAM124B (CHD7_BD,
Interaction with FAM124B), indicated in navy blue. (B) Heatmap representing exon-wise
comparison of SNP density. nsSNPs were clustered within the C-terminal
exons and in exons 2, 3, 10 and 21. Truncating SNPs were often localised
within the N-terminal exons, specifically exons 8, 10 and 14. The lowest
SNP density was observed in exons 17-20, corresponding to the most
conserved region of CHD8. Residues within N-terminal exons 1-4 and
C-terminal exons 31-37 were evolutionarily the most variable. Exons 3 to
5 contained the highest accumulation of PTMs, followed by exons 31, 29
and 21.
Figure 3: Exon-wise (A) and domain-wise (B) distribution of SNPs in the
general and ASD populations, represented in shades of blue (nsSNPs) and
yellow (truncating SNPs) against the backdrop of the evolutionary status
of CHD8 residues (light pink area) and PTM sites (grey area).
Figure 4: Comparison of CHD8 protein disorder predictions by the
tools IUPred2A (A) and MoRFchibi SYSTEM (B). In both panels, each
residue is plotted against its disorder probability score on the Y axis.
In (B), the MoRF predictions are displayed as Toggle MoRF Bands in
light blue.
Figure 5: (A) DEPICTER predictions of disordered regions across
the CHD8 protein and the corresponding protein-binding, RNA-binding,
DNA-binding, linker and multifunctional disordered sites. (B) Mutation
cluster predictions by the tool Mutant 3D. The core domain regions
are highlighted in fluorescent green and nsSNPs are represented as
vertical pins along the CHD8 protein 2D structure. Mutations belonging
to significant mutation clusters are colour-coded separately in yellow
and red. Further details are available in Supplementary
Figure S2.
Figure 6: Protein-protein interaction network constructed for
the enzyme CHD8 (in yellow). Stringent network-building rules were
applied to obtain 13 direct interactions with protein partners, which
are represented in green. Molecular functions directly associated with
ASD are presented in turquoise, regulatory functions in orange and
others in grey.