Statistics Seminars: Gaussian and not-so-Gaussian clustering with robustness against outliers and a stab at the number of clusters
12 December 2016 14:00 in CM221
Cluster analysis has many applications, and there are many clustering methods, which tend to give the user quite different clusterings of the same dataset. One reason the clustering problem is difficult is that small clusters are hard to tell apart from outliers, which may themselves come in small groups. Another is that people mean different things when they use the term "cluster", and cluster analysis is often done without a proper problem definition that specifies what kinds of clusters are of interest.
In the first part of my talk I will present OTRIMLE, a robust clustering method that is based on a Gaussian mixture model but allows for some observations that cannot reasonably be assigned to any cluster. The underlying method was introduced by Coretto and Hennig (2015a, 2015b) as "Robust Improper Maximum Likelihood" (RIMLE). It requires one tuning constant, a density level for noise/outliers; "OTRIMLE" stands for "Optimally Tuned RIMLE".
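As a rough illustration of the idea of a noise density level, the sketch below assigns each observation either to one of the Gaussian components or to a noise class represented by a constant (improper) density. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the parameter values and the toy data are hypothetical, the parameters are taken as given rather than estimated, and the choice of the density level (the "optimal tuning") is not shown.

```python
import numpy as np

def gaussian_pdf(X, mean, cov):
    """Multivariate Gaussian density evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mean
    # Squared Mahalanobis distance of each row to the mean
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    _, logdet = np.linalg.slogdet(cov)
    return np.exp(-0.5 * (maha + logdet + d * np.log(2 * np.pi)))

def assign_with_noise(X, noise_weight, noise_density, weights, means, covs):
    """Label 0 = noise, labels 1..G = Gaussian components.

    Each point goes to the class with the largest weighted density;
    the noise class has the constant value noise_weight * noise_density.
    """
    scores = [np.full(X.shape[0], noise_weight * noise_density)]
    for w, m, c in zip(weights, means, covs):
        scores.append(w * gaussian_pdf(X, m, c))
    return np.argmax(np.column_stack(scores), axis=1)

# Hypothetical toy data: two points near the cluster centres, one far away.
X = np.array([[0.1, -0.2], [5.2, 4.9], [30.0, -30.0]])
labels = assign_with_noise(
    X,
    noise_weight=0.05,       # assumed noise proportion
    noise_density=1e-3,      # assumed density level for noise/outliers
    weights=[0.5, 0.45],
    means=[np.zeros(2), np.full(2, 5.0)],
    covs=[np.eye(2), np.eye(2)],
)
print(labels)
```

In this toy setup the distant third point is labelled as noise because both weighted Gaussian densities fall below the constant noise level there. Raising the density level would send more points to the noise class, and lowering it would send fewer.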
Furthermore, I will present a principle for choosing a suitable number of clusters that is inspired by Davies's (1995) Data Features: a model is "adequate" for a real dataset if, according to a certain statistic, data generated from the model look like the real data. Among the adequate models the simplest can be chosen, where simplicity can be traded against a low noise proportion.
Davies, P. L. (1995). Data Features. Statistica Neerlandica, 49, 185-245.
Coretto, P. & Hennig, C. (2015a). Robust improper maximum likelihood: tuning, computation, and a comparison with other methods for robust Gaussian clustering. Journal of the American Statistical Association.
Coretto, P. & Hennig, C. (2015b). A consistent and breakdown robust model-based clustering method. arXiv:1309.6895.
Contact firstname.lastname@example.org for more information