
Department of Mathematical Sciences

Seminar Archives

On this page you can find information about seminars in this and previous academic years, where available in the database.

Statistics Seminars: Estimation of Distribution Algorithms for Protein Structure Prediction via k-means clustering

Presented by Daniel Bonetti, Instituto Federal São Paulo (IFSP)

20 June 2017 15:30 in CM221

Proteins are essential for life. Knowing the structure of a protein makes it possible, for example, to model the cell regulatory mechanisms of organisms, which can enable disease treatments, or to determine relationships between protein structures and food attributes. However, discovering the structure of a protein is a difficult and expensive task, and the overall effort to find a cure for a specific disease can cost five billion dollars and take 10 years. Computational methods have therefore been developed to predict protein structures, but they require many calculations even for a small protein, because the search space is large and hard to explore.

We developed an Estimation of Distribution Algorithm (EDA) specific to the ab initio Protein Structure Prediction (PSP) problem using a full-atom representation. To address the correlation among dihedral angles, we developed a multivariate probabilistic model for the EDA based on k-means clustering, which finds high-density regions of variable values in the search space. These clusters are then used to generate the offspring of the evolutionary process. Because a new k-means clustering is performed for each generation and each group of correlated variables, k-means must create its clusters within a predefined amount of time. This ensures that the EDA does not spend too much time creating high-quality models, since a model of average quality is sufficient. Finally, we compared the proposed k-means probabilistic model against Finite Gaussian Mixtures and Multivariate Kernel Estimation.
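The abstract does not give implementation details, so the following is only a minimal sketch, under stated assumptions, of one generation step as described: cluster a group of correlated dihedral angles (here, hypothetically, a residue's (phi, psi) pair) with k-means under a wall-clock budget, then sample offspring around the resulting clusters. The function names `timed_kmeans` and `sample_offspring`, the wrapped-angle distance, the circular-mean centroids, the size-proportional cluster choice and the per-cluster Gaussian sampling are all illustrative assumptions, not the presenter's actual method.

```python
import math
import random
import time


def wrap(angle):
    """Map an angle in degrees into [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def angular_sq_distance(u, v):
    """Squared distance between two angle vectors using wrapped per-angle differences."""
    return sum(wrap(a - b) ** 2 for a, b in zip(u, v))


def circular_mean(values):
    """Circular mean of a sequence of angles in degrees."""
    s = sum(math.sin(math.radians(x)) for x in values)
    c = sum(math.cos(math.radians(x)) for x in values)
    return math.degrees(math.atan2(s, c))


def timed_kmeans(points, k, time_budget_s, max_iter=100, rng=random):
    """Lloyd-style k-means on angle vectors (hypothetical helper). Stops on convergence,
    after max_iter, or once the wall-clock budget is spent, keeping the current partition."""
    deadline = time.monotonic() + time_budget_s
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid under the wrapped distance.
        new_labels = [min(range(k), key=lambda j: angular_sq_distance(p, centroids[j]))
                      for p in points]
        converged = new_labels == labels
        labels = new_labels
        # Update step: per-angle circular mean; empty clusters keep their old centroid.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [circular_mean(col) for col in zip(*members)]
        if converged or time.monotonic() >= deadline:
            break
    return centroids, labels


def sample_offspring(points, centroids, labels, n_offspring, min_sd=1.0, rng=random):
    """Hypothetical sampler: choose a cluster with probability proportional to its size,
    then draw each angle from a Gaussian centred on that cluster's centroid."""
    k = len(centroids)
    sizes = [labels.count(j) for j in range(k)]
    spreads = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        # Per-angle spread around the centroid, floored at min_sd degrees.
        sd = [max(min_sd, math.sqrt(sum(wrap(p[d] - centroids[j][d]) ** 2 for p in members)
                                    / max(len(members), 1)))
              for d in range(len(centroids[j]))]
        spreads.append(sd)
    offspring = []
    for _ in range(n_offspring):
        j = rng.choices(range(k), weights=sizes)[0]
        offspring.append([wrap(rng.gauss(c, s)) for c, s in zip(centroids[j], spreads[j])])
    return offspring


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic "selected" population of (phi, psi) pairs drawn around two regions.
    population = [[wrap(rng.gauss(-60, 10)), wrap(rng.gauss(-45, 10))] for _ in range(50)]
    population += [[wrap(rng.gauss(-120, 10)), wrap(rng.gauss(130, 10))] for _ in range(50)]
    centroids, labels = timed_kmeans(population, k=2, time_budget_s=0.05, rng=rng)
    children = sample_offspring(population, centroids, labels, n_offspring=5, rng=rng)
    print(centroids)
    print(children)
```

The deadline check returns whatever partition exists when the budget runs out rather than waiting for full convergence, which mirrors the abstract's point that a model of average quality is sufficient. Treating dihedral angles as periodic (wrapped differences and circular means) is a design choice assumed here, since the abstract does not say how angle periodicity is handled.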
