Description
Zalophus californianus, the California sea lion, is a coastal sea lion that ranges along the west coast of North America, from Alaska in the north to Baja California, Mexico, in the south. Sea lions are well known for their playfulness and intelligence, and they are used in military programs and wildlife shows. Sea lions have been found to have a propensity for urogenital carcinomas, and about 17% are diagnosed with cancer. Sequencing the sea lion genome provides experience in working with a large genome. It can also help identify genes responsible for specific traits, such as the transition from a terrestrial to a semi-aquatic habitat, as well as genes involved in cancer in these animals. The sea lion genome was sequenced on a 454 GS FLX sequencer, provided through a collaboration with Roche/454 Life Sciences, and the resulting reads were assembled with the Newbler assembler. In total, 1,951,532,210 bp of sea lion DNA was assembled into 1.4 million contigs. Interspersed repeats and low-complexity DNA were identified and masked with RepeatMasker. Repeats accounted for 35.56% of the sea lion genome and were dominated by LINEs, as in other carnivores. The sea lion mitochondrial DNA was identified and assembled for phylogenetic analysis. Gene structure and the compositional features of exons, introns, and intergenic regions were predicted with GlimmerHMM, a hidden Markov model (HMM)-based gene finder. To build a sea lion-specific training model, exon regions were identified with two different programs, HMMgene and GlimmerHMM. GlimmerHMM predicted 244,377 exons, far more than the 74,700 exons predicted by HMMgene. Of the 1.4 million contigs, 379 contained coding sequences with initial, internal, and terminal exons, and 248 of these 379 contigs showed similarity matches against the NCBI non-redundant ('nr') database.
A pipeline was therefore created to build a species-specific GlimmerHMM model for predicting protein-coding sequences, and this approach can be broadly applied to any newly sequenced eukaryotic genome in an academic setting. Furthermore, sequencing and annotating the sea lion genome should help uncover the underlying causes of diseases such as cancer.
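The following is a minimal sketch of how a repeat proportion like the 35.56% figure could be computed from masked contigs. It assumes soft-masked output, where repeats are written in lowercase and all other bases in uppercase (RepeatMasker produces this with its -xsmall option), and it skips ambiguous N bases. The function names are hypothetical, and this is not necessarily how the figure above was derived.

```python
def read_fasta(lines):
    """Yield sequences from FASTA-formatted lines (headers start with '>')."""
    chunks = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if chunks:
                yield "".join(chunks)
                chunks = []
        elif line:
            chunks.append(line)
    if chunks:
        yield "".join(chunks)


def masked_fraction(sequences):
    """Return the fraction of non-N bases that are soft-masked (lowercase).

    Assumes soft masking: repeats and low-complexity regions in lowercase,
    everything else in uppercase. 'N'/'n' bases are excluded from both counts.
    """
    masked = 0
    total = 0
    for seq in sequences:
        for base in seq:
            if base in "Nn":
                continue  # gaps/ambiguous bases are not counted
            total += 1
            if base.islower():
                masked += 1
    return masked / total if total else 0.0


# Tiny made-up example: 8 of the 18 non-N bases are lowercase.
example = [">contig1", "ACGTacgtAC", ">contig2", "ggggTTTTNN"]
print(f"{masked_fraction(read_fasta(example)):.2%}")  # prints 44.44%
```

Because `read_fasta` accepts any iterable of lines, an open file handle for a masked assembly can be passed to it directly.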
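GlimmerHMM's model is considerably richer than anything shown here; among other features, it distinguishes exon types such as initial, internal, and terminal exons. The core idea of HMM-based gene prediction, labeling each base with its most probable hidden state, can still be illustrated with a toy two-state model decoded by the Viterbi algorithm. The states ("N" for non-coding, "C" for coding) and all probabilities below are hypothetical illustration values, not trained sea lion parameters.

```python
import math

# Toy two-state HMM. All probabilities are hypothetical, chosen only to
# illustrate decoding; a real gene finder estimates them from training genes.
STATES = ("N", "C")
START = {"N": 0.9, "C": 0.1}
TRANS = {
    "N": {"N": 0.9, "C": 0.1},
    "C": {"N": 0.1, "C": 0.9},
}
EMIT = {
    "N": {"A": 0.3, "T": 0.3, "G": 0.2, "C": 0.2},  # AT-rich background
    "C": {"A": 0.2, "T": 0.2, "G": 0.3, "C": 0.3},  # GC-rich "coding" state
}


def viterbi(seq):
    """Return the most probable state path for seq under the toy HMM.

    seq must be non-empty and contain only the uppercase bases A, C, G, T.
    """
    # score[s] = best log-probability of any path that ends in state s
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backptrs = []
    for base in seq[1:]:
        new_score, ptr = {}, {}
        for s in STATES:
            # Best predecessor state for s at this position
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (score[prev] + math.log(TRANS[prev][s])
                            + math.log(EMIT[s][base]))
            ptr[s] = prev
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state to recover the full path
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return "".join(reversed(path))
```

Real gene finders typically add many more states (exon types, introns, splice sites, reading frames) and estimate their probabilities from known genes, which is why a species-specific training set matters for prediction quality.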
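The step that selects contigs carrying coding sequences with initial, internal, and terminal exons can be sketched as a simple filter. The input format here is a hypothetical simplification: one (contig, gene_id, exon_type) record per predicted exon. Real GlimmerHMM output would first need to be parsed into this form, and the function name is illustrative only.

```python
from collections import defaultdict

REQUIRED_TYPES = {"Initial", "Internal", "Terminal"}


def contigs_with_complete_genes(exons):
    """Return contigs carrying at least one predicted gene that has an
    initial exon, one or more internal exons, and a terminal exon.

    exons: iterable of (contig, gene_id, exon_type) tuples (hypothetical format).
    """
    # Collect the exon types seen for each gene on each contig
    types_per_gene = defaultdict(set)
    for contig, gene_id, exon_type in exons:
        types_per_gene[(contig, gene_id)].add(exon_type)
    # Keep contigs where some gene has all three required exon types
    return {
        contig
        for (contig, _gene), types in types_per_gene.items()
        if REQUIRED_TYPES <= types
    }


predictions = [
    ("contig1", "g1", "Initial"),
    ("contig1", "g1", "Internal"),
    ("contig1", "g1", "Terminal"),
    ("contig2", "g1", "Initial"),   # no internal exon
    ("contig2", "g1", "Terminal"),
    ("contig3", "g1", "Single"),    # single-exon gene
]
print(sorted(contigs_with_complete_genes(predictions)))  # prints ['contig1']
```

Grouping by gene rather than by contig prevents exons from different predicted genes on the same contig from being combined into a single "complete" structure.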