Statistics Seminars: Robust Clustering Using Exponential Power Mixtures
9 November 2009 14:15 in CM221
Clustering is a widely used technique for extracting useful information from gene expression data, where unknown correlation structures among genes are believed to persist even after normalisation. Such correlation structures pose a great challenge to clustering methods based on the Gaussian mixture model (GM), k-means (KM), and partitioning around medoids (PAM), because these methods are not robust against general dependence within the data. Here we use exponential power mixture models to increase the robustness of clustering against general dependence and non-Gaussian components in the data. An expectation-conditional-maximisation algorithm is developed to calculate the maximum likelihood estimators of the unknown parameters in these mixtures. The Bayesian Information Criterion (BIC) is then employed to select the number of components in these mixture models. The maximum likelihood estimators are shown to be consistent under sparse dependence. Our numerical results indicate that the proposed procedure outperforms GM, KM, and PAM when there are strong correlations or non-Gaussian components in the data. This is joint work with Faming Liang at Texas A&M University.
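For readers unfamiliar with the ingredients named in the abstract, the following is a minimal sketch of a univariate exponential power mixture log-likelihood and the BIC score used to compare candidate numbers of components. It is not the speaker's implementation. The parametrisation (location mu, scale sigma, shape beta, with beta = 2 giving a Gaussian shape and beta = 1 a Laplace shape), the function names, and the parameter count of 4K - 1 for K univariate components are all assumptions made for illustration. The expectation-conditional-maximisation fitting step is not shown, and the data and parameter values are toy values, not results.

```python
import math

def ep_logpdf(x, mu, sigma, beta):
    """Log-density of an (assumed) univariate exponential power distribution:
    f(x) = beta / (2 * sigma * Gamma(1/beta)) * exp(-(|x - mu| / sigma) ** beta)
    """
    log_norm = math.log(beta) - math.log(2.0 * sigma) - math.lgamma(1.0 / beta)
    return log_norm - (abs(x - mu) / sigma) ** beta

def mixture_loglik(data, weights, params):
    """Log-likelihood of data under a mixture with the given weights and
    per-component (mu, sigma, beta) tuples, using log-sum-exp for stability."""
    total = 0.0
    for x in data:
        terms = [math.log(w) + ep_logpdf(x, *p) for w, p in zip(weights, params)]
        m = max(terms)
        total += m + math.log(sum(math.exp(t - m) for t in terms))
    return total

def bic(loglik, n_free_params, n_obs):
    """Bayesian Information Criterion; smaller values are preferred."""
    return -2.0 * loglik + n_free_params * math.log(n_obs)

# Toy illustration only: hand-picked parameters, not fitted estimates.
data = [-2.1, -1.9, -2.0, 1.8, 2.2, 2.0]
weights = [0.5, 0.5]
params = [(-2.0, 0.5, 1.5), (2.0, 0.5, 1.5)]  # (mu, sigma, beta) per component
K = len(weights)
n_free = (K - 1) + 3 * K  # mixing weights plus mu, sigma, beta per component
score = bic(mixture_loglik(data, weights, params), n_free, len(data))
```

In a model-selection loop one would fit mixtures for several values of K, compute this score for each fitted model, and keep the K with the smallest BIC.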
Contact firstname.lastname@example.org for more information