Department of Mathematical Sciences

Seminar Archives

On this page you can find information about seminars in this and previous academic years, where available on the database.

Statistics Seminars: Gaussian and not-so-Gaussian clustering with robustness against outliers and a stab at the number of clusters

Presented by Christian Hennig, University College London

12 December 2016 14:00 in CM221

Cluster analysis has many applications, and there are many clustering methods, which tend to give the user quite different clusterings of the same dataset. One reason the clustering problem is difficult is that outliers may come in small groups, which are hard to tell apart from small clusters. Another is that people mean different things by the term "cluster", and cluster analysis is often done without a proper problem definition that specifies what kinds of clusters are of interest.

In the first part of my talk I will present OTRIMLE, a robust method for clustering based on a Gaussian mixture model but allowing for some observations that could not reasonably be assigned to any cluster. The method was introduced by Coretto and Hennig (2015a, 2015b) as "Robust Improper Maximum Likelihood" (RIMLE); "OTRIMLE" stands for "Optimally Tuned RIMLE". The method needs a density level for noise/outliers as a tuning parameter.
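As a rough illustration of this idea, and not the authors' implementation, the sketch below assumes a one-dimensional pseudo-density that adds a constant (improper) noise density to a weighted sum of Gaussian densities. Each point is assigned to the component with the largest weighted density, or to noise if the weighted noise level is larger. All function names, parameter names and values are hypothetical; fitting the parameters and tuning the noise level are not shown.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate Gaussian at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def assign_with_noise(x, noise_weight, noise_level, components):
    """Return the index of the best Gaussian component, or -1 for noise.

    components is a list of (weight, mean, sd) triples (hypothetical layout).
    noise_level is the constant density assumed for noise/outliers.
    """
    best_label = -1
    best_score = noise_weight * noise_level
    for j, (weight, mean, sd) in enumerate(components):
        score = weight * normal_pdf(x, mean, sd)
        if score > best_score:
            best_label, best_score = j, score
    return best_label


def pseudo_log_likelihood(data, noise_weight, noise_level, components):
    """Log of the pseudo-density (constant noise term plus Gaussian mixture), summed over data."""
    total = 0.0
    for x in data:
        density = noise_weight * noise_level
        density += sum(w * normal_pdf(x, m, s) for w, m, s in components)
        total += math.log(density)
    return total


# Two clusters around 0 and 10, plus a small noise proportion (made-up values).
components = [(0.45, 0.0, 1.0), (0.45, 10.0, 1.0)]
labels = [assign_with_noise(x, 0.1, 0.01, components) for x in [0.2, 9.5, 5.0, 50.0]]
```

In this toy setting, points far from both cluster centres fall below the noise level and are left unassigned, which is the behaviour the constant noise density is meant to allow.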

Furthermore, I will present a principle for choosing a suitable number of clusters, inspired by Davies's (1995) "Data Features": a model is "adequate" for a real dataset if, according to a certain statistic, data generated from the model look like the real data. Among the adequate models, the simplest can be chosen, where simplicity can be traded against a low noise proportion.
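The adequacy idea can be sketched generically as a simulation check. The code below is only an assumption about how such a check might look, not the procedure from the talk: it simulates datasets from each candidate model, compares a user-supplied statistic on the simulated data with its value on the real data, and returns the least complex model whose simulations are compatible with the real data. The names `is_adequate` and `simplest_adequate`, the two-sided tail rule, and the default settings are all hypothetical, and the trade-off against the noise proportion is not modelled.

```python
import random


def is_adequate(real_data, simulate, statistic, n_rep=200, level=0.05, seed=0):
    """Check whether the statistic on real data looks typical for the model.

    simulate(n, rng) must return a list of n values drawn from the model.
    """
    rng = random.Random(seed)
    observed = statistic(real_data)
    simulated = [statistic(simulate(len(real_data), rng)) for _ in range(n_rep)]
    # Shares of simulated statistics at least as large / as small as the observed one.
    upper = sum(s >= observed for s in simulated) / n_rep
    lower = sum(s <= observed for s in simulated) / n_rep
    # Adequate if the observed value is not in either extreme tail.
    return min(upper, lower) > level / 2


def simplest_adequate(real_data, models, statistic):
    """Return the complexity of the simplest adequate model, or None.

    models is a list of (complexity, simulate) pairs (hypothetical layout).
    """
    for complexity, simulate in sorted(models, key=lambda m: m[0]):
        if is_adequate(real_data, simulate, statistic):
            return complexity
    return None
```

A caller would pass, for example, one simulator per candidate number of clusters and a statistic such as the sample variance; the choice of statistic determines which features of the data the model is required to reproduce.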

References:
Davies, P. L. (1995). Data Features. Statistica Neerlandica, 49, 185-245.

Coretto, P. & Hennig, C. (2015a). Robust improper maximum likelihood: tuning, computation, and a comparison with other methods for robust Gaussian clustering. Journal of the American Statistical Association, published online.

Coretto, P. & Hennig, C. (2015b). A consistent and breakdown robust model-based clustering method. arXiv:1309.6895.

Contact sunil.chhita@durham.ac.uk for more information