Description
The Protein Data Bank (PDB; http://wwpdb.org) was established in 1971 as the first open access digital data resource in biology, with seven protein structures as its initial holdings. The global PDB archive now contains more than 124,000 experimentally determined atomic-level 3D structures of biological macromolecules, all of which are freely accessible via the Internet. Knowledge of the 3D structure of a gene product is beneficial for understanding its function and its role in disease. The main theme of this thesis is to improve our knowledge of this relationship between genes and proteins. First, we manually review the existing knowledge of how changes in 3D protein structure are related to diseases. Second, we analyze novel datasets of genetic information and prioritize single nucleotide variations for further analysis, using protein-level annotations. Of particular interest in the PDB archive are proteins for which 3D structures of both the wild type and genetic variants have been determined, thereby revealing atomic-level structural differences. In the first part of this thesis, we present a systematic review of these cases. We observe a wide range of possible structural and functional changes at the protein level caused by single point mutations, including changes in enzyme activity, aggregation, structural stability, binding, and dissociation, some in the context of large biological assemblies. An increasing number of software methods are available that attempt to predict the consequences of genetic variation. Our results show that the range of possible consequences is much larger than is often assumed, and that a comprehensive understanding of three-dimensional structure, dynamics, and biophysics will be required to develop better tools that can accurately predict the consequences of genetic changes manifested at the atomic level in protein and RNA gene products.
In the second part of this thesis, we develop a bioinformatics pipeline for fast and accurate annotation of large variation databases, using information derived from protein sequences and 3D structures.