Statistics Seminars: Mixture Model Component Cluster Trees
20 February 2009 14:00 in CM221
Model-based clustering, one of the most commonly used parametric clustering methods, assumes that continuous data (possibly after a transformation) comes from a mixture of Gaussian components. The common implicit assumption is that, once the best such mixture has been chosen to fit the data, each mixture component is a cluster estimating an underlying (sub-population) group. This assumption breaks down when the underlying groups do not have Gaussian distributions. The mixture will still fit the data well, but if the true underlying groups are non-symmetric, skewed, heavy-tailed or curvilinear, or if there are outliers, the number of components in the model is likely to overestimate the number of groups. We look at using hierarchical clustering methods, based on a distance defined by the estimated mixture, to create a dendrogram with the components as leaves: a component cluster tree. This can be used to identify sub-mixtures (combinations of components) that better estimate the underlying groups.
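As a rough illustration of the idea, the sketch below builds a component cluster tree for a small one-dimensional mixture. The abstract does not say which distance or linkage the talk uses, so the Bhattacharyya distance between Gaussian components and single linkage are stand-in assumptions here, and the component parameters and function names are hypothetical values chosen for illustration rather than fitted results.

```python
import math
from itertools import combinations

# Hypothetical fitted 1-D mixture: (weight, mean, variance) per component.
# Components 0 and 1 overlap heavily, as if two Gaussians were needed to
# fit one skewed group; component 2 is well separated.
components = [
    (0.35, 0.0, 1.0),
    (0.25, 1.5, 2.0),
    (0.40, 8.0, 1.0),
]

def bhattacharyya(c1, c2):
    """Bhattacharyya distance between two univariate Gaussian components.

    Assumed stand-in distance; the mixing weights are ignored.
    """
    _, m1, v1 = c1
    _, m2, v2 = c2
    mean_term = 0.25 * (m1 - m2) ** 2 / (v1 + v2)
    var_term = 0.5 * math.log((v1 + v2) / (2.0 * math.sqrt(v1 * v2)))
    return mean_term + var_term

def component_cluster_tree(comps, dist=bhattacharyya):
    """Agglomerate components with single linkage (an assumed choice).

    Returns the merges in order as (left_members, right_members, height).
    """
    clusters = [frozenset([i]) for i in range(len(comps))]
    merges = []
    while len(clusters) > 1:
        def linkage(pair):
            # Single linkage: smallest distance between any two members.
            a, b = pair
            return min(dist(comps[i], comps[j]) for i in a for j in b)

        a, b = min(combinations(clusters, 2), key=linkage)
        merges.append((sorted(a), sorted(b), linkage((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges

tree = component_cluster_tree(components)
for left, right, height in tree:
    print(left, right, round(height, 3))
```

With these illustrative parameters the two overlapping components merge first at a low height, and the well-separated component joins only at a much greater height. Cutting the tree between those heights would give a two-group solution in which one group is a sub-mixture of two components.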
Contact firstname.lastname@example.org for more information