Description
Colorectal cancer (CRC) is a global health issue. Recent studies of sporadic CRC have shown that certain tumors exhibit Elevated Microsatellite Alterations at Selected Tetranucleotide repeats (EMAST). EMAST is a biomarker of aggressive CRC and is characterized by insertions/deletions of tetranucleotides in repetitive non-coding DNA. The exact molecular mechanisms of EMAST are unknown, but a potential pathway involves reduced function of MSH3 (a DNA mismatch repair protein) under conditions of oxidative stress/inflammation. Studies of EMAST have been restricted to clinical samples because no appropriate animal model exists. A major difficulty in establishing an animal model is identifying repeat sequences that are characteristic of EMAST. Our aim was therefore to use bioinformatic tools to establish a candidate panel of mouse-specific EMAST sequences that could be tested in a mouse model of colon cancer. Published human EMAST sequences were analyzed to derive a set of requirements, including repeat length and type, which guided the design of a Python program. Flanking sequences from the human EMAST loci were also analyzed with the Multiple EM for Motif Elicitation (MEME) tool to identify conserved elements. These motifs were included in the program parameters, and sequences the program found in the mouse genome were then checked for homology to human EMAST loci using ClustalW. Those that showed >89% homology were analyzed further using the ENSEMBL database, which allowed us to prioritize the potential mouse EMAST sequences according to previous evidence of instability. Murine sequences found by the program were confirmed in mouse tissue using PCR. A panel of 8 mouse sequences was tested in tumor and adjacent normal tissue, and instability was observed in 2 of these sequences.
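The repeat-scanning step described above could look roughly like the following. This is a minimal sketch and not the authors' program: the function name `find_tetranucleotide_repeats`, the minimum copy number of 6, and the flank length are all hypothetical choices made for illustration, and the MEME-derived flanking motifs are not modeled here.

```python
import re

# Assumed threshold for illustration only; the abstract does not state
# the repeat-length requirement that was actually used.
MIN_REPEATS = 6


def find_tetranucleotide_repeats(sequence, min_repeats=MIN_REPEATS, flank=20):
    """Find tandem tetranucleotide repeats and report their flanking sequence."""
    sequence = sequence.upper()
    # One captured 4-mer followed by at least (min_repeats - 1) further copies.
    pattern = re.compile(r"([ACGT]{4})\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence):
        unit = match.group(1)
        # Skip units that are really mono- or dinucleotide repeats (AAAA, ACAC).
        if unit[:2] == unit[2:]:
            continue
        start, end = match.span()
        hits.append({
            "unit": unit,
            "copies": (end - start) // 4,
            "start": start,
            "end": end,
            "left_flank": sequence[max(0, start - flank):start],
            "right_flank": sequence[end:end + flank],
        })
    return hits
```

For example, `find_tetranucleotide_repeats("ACGTTGCC" + "GATA" * 8 + "TTGACCGT", flank=5)` reports a single `GATA` repeat with 8 copies. The flanks are returned alongside each hit because the workflow described above uses flanking sequence for motif analysis and homology checks.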
In conclusion, novel motifs that are significant in human EMAST loci were identified and used to design a Python program that discovers potential species-specific EMAST sequences. The program produced a list of mouse repeat sequences with the potential to display EMAST. Of the 8 sequences tested, 2 displayed EMAST in tumor samples from our mouse model.
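The >89% homology filter mentioned in the description can be illustrated with a small helper that works on two sequences that have already been aligned (for instance, rows taken from a ClustalW alignment). This is a sketch under stated assumptions: the abstract does not say how homology was computed, so percent identity over alignment columns is used here as a stand-in, and the names `percent_identity` and `passes_homology_filter` are hypothetical.

```python
# Threshold taken from the abstract (>89%); the identity definition is assumed.
HOMOLOGY_THRESHOLD = 89.0


def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns where both gapped sequences agree."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        # Columns that are gaps in both rows carry no information.
        if a == "-" and b == "-":
            continue
        columns += 1
        # A gap opposite a base never counts as a match.
        if a == b:
            matches += 1
    return 100.0 * matches / columns if columns else 0.0


def passes_homology_filter(aligned_a, aligned_b, threshold=HOMOLOGY_THRESHOLD):
    """True when identity is strictly above the threshold."""
    return percent_identity(aligned_a, aligned_b) > threshold
```

With this definition, a mouse candidate that differs from the human locus at one position in ten (90% identity) would be kept, while one with a mismatch plus a gap over the same length (80%) would be dropped.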