Statistics Seminars: Stochastic Geometrical Approaches to Protein Structural Bioinformatics

Presented by Kanti Mardia (University of Leeds)

23 May 2003 00:00 in CM221

"With all the excitement generated by gene sequences, it is easy to forget that the primary purpose of most genes is to code for proteins. The proteins are biological macromolecules that are of primary importance to all living organisms. If gene sequencing is like the recording of music, then proteins are like the playback.
Recently, there has been phenomenal growth in protein databases following from gene sequencing, and this growth has raised various new challenges. There is now a wealth of information about the primary structure of proteins, since the DNA sequence in a gene determines the amino acid sequence. In principle, this amino acid sequence determines the shape of the protein, that is, how it folds in three dimensions. One of the most challenging problems is how to predict the final three-dimensional shape from the amino acid sequence information.
Within this framework, a key problem in proteomics is to provide a methodology which, given a query molecule, will find other similar molecules within a large database. Such matching aims, for example, to resolve the functions of unknown proteins and to design new enzymes. In effect, the problem reduces to matching two configurations of unequal size in three dimensions, where the points are not labelled and the match has to be invariant under some transformation group. We will describe some new stochastic geometrical approaches to this problem. We will illustrate our methodology by matching active sites of two proteins."

