1. Introduction
Homeobox genes encode a family of homeodomain-containing transcription factors (TFs) that have been extensively studied for their roles in development, physiology and tissue homeostasis 1. Even though some members of the homeodomain family, comprising HOXs, hepatocyte nuclear factors (HNFs) and NANOGs (NKX genes), are well characterized for their roles in various cancers, the mechanistic function of Iroquois (IRX) proteins in tumorigenesis and their DNA binding sequences are still not fully explored2-4. Recent studies on some HOX genes have identified their roles in various cancers, but their functional mechanisms remain to be explored5-7. For instance, HOXA9 has been found to act as a tumour suppressor/oncogene in breast cancer and leukemia 1. Another HOX gene, HOXB13, has been well studied in prostate development and tumorigenesis, with inherited mutations contributing genetically to prostate cancer1. These classes of proteins usually function as complexes (homo- or heterodimers) to exert their regulatory function, which alters their binding preference8. Limited diversity has been observed in eukaryotes in the recognition and binding of homeodomains to DNA 9. This could be due to constraints on the amino acids associated with the homeodomain architecture and its preference for specific DNA recognition sequences10,11.
The IRXs are among the more recently added members of the homeodomain TF family and have been found to play an important role in developmental processes 12. IRX proteins contain the unique Iro-box motif, a conserved motif of 13 amino acid residues in the carboxyl-terminal region. They also have an atypical homeodomain with three extra amino acids between the first and second alpha helices, which places them in the three-amino-acid-loop-extension (TALE) family of TFs13. These homeobox TFs play important roles in embryogenesis, cell specification and differentiation, and organ development. The human IRX complex is composed of six genes arranged in two clusters of three genes each, on chromosome 5 (IRX1, 2 and 4) and chromosome 16 (IRX3, 5 and 6) 14-17.
Recently, IRX TFs have been studied in different cancers, and the findings suggest that aberrant expression of these proteins contributes to tumorigenesis. IRX5 has been reported to be regulated by vitamin D3 in prostate cancer, where it is involved in regulating the cell cycle and apoptosis18. Knockdown of IRX5 was observed to reduce the viability of androgen-sensitive LNCaP cells. IRX2 protein expression has been correlated with breast tumour size, indicating an oncogenic function in breast cancer 19. Genome-wide association studies (GWAS) identified IRX4 as a causative gene in prostate cancer susceptibility20. Additionally, alternative splicing of IRX4 has recently been studied in prostate cancer, highlighting differential regulation in prostate tumorigenesis and progression21. Epigenetic studies in pancreatic cancer found the IRX4 promoter region to be hypermethylated, which was associated with increased cell growth22. IRX4 has also been described as a tumour suppressor in prostate cancer via vitamin D interactions23. Other studies have suggested potential oncogenic roles of IRX4 in breast cancer and non-small cell lung cancer (NSCLC)24,25. Further differential roles of IRXs have been reported, linking them to multiple mechanisms associated with tumour progression26-28.
Although IRX gene clusters are now being identified as novel therapeutic targets in carcinogenesis27, their protein structures, which may help to explain their functions, have not been biophysically characterized using techniques such as nuclear magnetic resonance (NMR) spectroscopy and X-ray crystallography. Various studies have used homology modelling and molecular dynamics (MD) simulations to understand the molecular mechanisms of TF binding to DNA. A recent study on HOXB13 used computationally modelled protein structures to predict the effect of single nucleotide polymorphisms (SNPs) on the non-homeobox region 29. The same approach was used to model the HOXB13 protein and predict the functional role of SNPs in prostate cancer, demonstrating genotype-phenotype effects and paving the way for further clinical studies of its theranostic applications29. Furthermore, a study on the transcription regulator SoxR (Sulphur Oxidation) predicted the DNA-binding residues of these proteins using homology-modelled structures30. Homology modelling of the E2F1 TF revealed its dimerization partner domains and its efficiency in binding to DNA31. The structure of a protein is linked to its stability, function and interactions. Although the Protein Data Bank (PDB) holds a large number of crystallographic structures, its coverage of the human proteome remains incomplete. Computationally modelled structures are therefore of great benefit for understanding the physical and chemical properties of TFs. It is well established that missense mutations affecting the core tertiary structure of a protein play an important role in disease32,33. One of the key benefits of these approaches is the ability to analyze the effect of mutations on protein structure and binding capacity. The interpretation of mutants and their association with diseases can be significantly aided by this technique34.
A recent modelling study of the LIM1-3 domains of the zyxin family of proteins has provided new insights into protein-protein interactions and potential nucleic acid binding platforms of these proteins, highlighting opportunities for therapeutic development35.
In this work, we studied the sequence conservation of the amino acids present in the homeodomain. We built a homology model of the IRX4 homeodomain and used MD simulations and free energy calculations to provide insight into the mechanism of protein-DNA binding. We also examined mutations in the DNA-binding domain and their effect on homeodomain stability. A classical modelling approach was used in this work rather than AlphaFold36, as the prediction of protein-DNA interactions using the AlphaFold approach is still in its infancy.