Stats4Grads: Bayesian Inference of Mixture of Hidden Markov Models for Internet Browsing Behaviour
26 January 2011 14:15 in CM221
Clickstream data, defined as the aggregate sequence of page visits made by a user while navigating a website, can provide insight into the behaviour, buying habits and preferences of the site's visitors. We model sequences of page requests within a session using a mixture of hidden Markov models (MixHMM). The model provides a page categorization approach, as well as a method for assigning users to clusters based on their browsing patterns. In a Bayesian framework, we use Markov chain Monte Carlo (MCMC) sampling to simulate hidden Markov model (HMM) parameters from their posterior distribution conditional on the observed data. We use a forward-backward Gibbs sampler to achieve rapid mixing. The model places Dirichlet priors on the probabilities of visiting the pages of a website. The performance of the model is assessed on an artificial navigation pattern. Applying the model to real clickstream data from a commercial website, we illustrate that sensible page categorizations and user classifications are learned.
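For readers unfamiliar with the forward-backward Gibbs step mentioned in the abstract, the following is a minimal, generic sketch of forward filtering followed by backward sampling for a single discrete HMM. It is not the speaker's implementation: the function name `ffbs`, its arguments, and the toy parameters are all illustrative assumptions. In a full sampler one would presumably alternate this step with draws of the HMM parameters from their Dirichlet posteriors and, for a mixture, with draws of each session's cluster label.

```python
import random

def ffbs(obs, pi, A, B, rng):
    """Draw one hidden-state path from p(states | obs) for a discrete HMM.

    All names here are hypothetical, chosen for illustration only.
    obs: observed symbol indices (e.g. page ids within one session)
    pi:  initial state probabilities, length K
    A:   K x K transitions, A[i][j] = P(next state j | current state i)
    B:   K x V emissions,   B[k][v] = P(page v | state k)
    rng: a random.Random instance
    """
    K, T = len(pi), len(obs)

    # Forward pass: alpha[t][k] is proportional to P(state_t = k | obs[0..t]),
    # normalized at every step to avoid numerical underflow.
    alpha = []
    for t in range(T):
        if t == 0:
            a = [pi[k] * B[k][obs[0]] for k in range(K)]
        else:
            prev = alpha[-1]
            a = [B[k][obs[t]] * sum(prev[i] * A[i][k] for i in range(K))
                 for k in range(K)]
        total = sum(a)
        alpha.append([x / total for x in a])

    # Backward pass: sample the last state from the final filtered
    # distribution, then each earlier state conditional on its successor.
    states = [0] * T
    states[T - 1] = rng.choices(range(K), weights=alpha[T - 1])[0]
    for t in range(T - 2, -1, -1):
        w = [alpha[t][i] * A[i][states[t + 1]] for i in range(K)]
        states[t] = rng.choices(range(K), weights=w)[0]
    return states

# Toy usage with made-up parameters: 2 hidden states, 3 page ids.
path = ffbs(
    obs=[0, 2, 2, 1],
    pi=[0.5, 0.5],
    A=[[0.7, 0.3], [0.4, 0.6]],
    B=[[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]],
    rng=random.Random(0),
)
```

Sampling the whole state path jointly, rather than updating one hidden state at a time, is the usual reason this step is associated with faster mixing.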
See the Stats4Grads page for more details about this series.