# MATH2041 Statistical Concepts II

Anyone who collects information must decide how to draw useful conclusions from it. For example, can an opinion poll of perhaps 1000 people be trusted to give an accurate picture of the opinions of everyone else? Answering this question requires combining our knowledge of probability theory with our understanding of how opinion polls are conducted. The kind of reasoning that results is called statistical inference.
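
As a rough illustration of the kind of answer this reasoning gives, suppose (an idealising assumption that real polls rarely meet) that the 1000 people are a simple random sample of size $n$ and that a proportion $\hat p$ of them hold a particular opinion. The standard large-sample 95% confidence interval for the population proportion, one of the frequentist methods covered later in the module, then has half-width of at most about 3 percentage points, because $\hat p(1-\hat p)$ is largest when $\hat p = 0.5$:

$$
1.96\sqrt{\frac{\hat p(1-\hat p)}{n}} \le 1.96\sqrt{\frac{0.5 \times 0.5}{1000}} \approx 0.031 .
$$

This calculation ignores non-response and other practical features of how polls are carried out, which is why probability theory on its own is not enough.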

Other areas of popular debate where statistical problems arise include the effects of food additives, the results of clinical trials of medical treatments, the reliability of electrical and other products and the incidence of leukaemia near nuclear power stations. A knowledge of statistics is essential not only to those who specialise in studying such phenomena but also to anyone who wishes to develop informed opinions about them. The module introduces some basic ideas of statistical inference and develops solutions to some standard problems. There are two schools of thought about the fundamental principles of statistics: the Bayesian and the frequentist. The module covers both viewpoints, but most of the methods presented are the more widely used frequentist ones.

Practical computing sessions, using the freely available statistical package R, will be held throughout the year. They serve three purposes: to bring the module closer to the real world of applied statistics, to provide additional insight into the lectured material and to introduce you to an excellent piece of statistical software.

## Outline of Course

Aim: To introduce the main ideas and methods of statistics and statistical computing.

- Exploring Data: Summary statistics: mean, median, standard deviation, inter-quartile range, correlation. Ideas of location, scale and association. Displays: dot-plot, histogram, stem-and-leaf plot, boxplot and scatterplot. Exploration for model building.
- Probability Models: Experiments. Sources of uncertainty. Estimation. Examples of linear models and least squares estimation. Mean and variance of linear model estimates. Simple implications for inference and design.
- Bayesian Inference: Inference using Bayes' theorem. Prior, likelihood and posterior. Conjugate prior distributions for binomial and Poisson samples and for the mean of Gaussian samples. Prediction. Credible intervals. Limiting posterior distributions.
- Frequentist Inference: Some distribution theory. Confidence intervals. Large sample intervals for means and differences of means for continuous and binomial populations. Pooled sample variance for populations with a common variance. Large sample inference for linear models. t-distribution. Confidence intervals for Gaussian linear models. Significance testing. One-way analysis of variance.
- Likelihood Methods: Maximum likelihood estimation. Large sample behaviour of maximum likelihood estimator. Approximations to Fisher information. Confidence intervals and significance tests. Multinomial models. Independence hypothesis in contingency tables. Likelihood ratio tests.
- Goodness of Fit and Diagnostics: Importance of model validation. Likelihood based goodness of fit tests. Quantile-quantile plots. Use of residuals for linear models.
- Non-Parametric Inference: Order statistics, ranks and sample quantiles. Sign test. Confidence interval for median. Mann-Whitney-Wilcoxon and Kruskal-Wallis tests.

### Prerequisites

For details of prerequisites, corequisites, excluded combinations, teaching methods, and assessment details, please see the Faculty Handbook.

### Reading List

Please see the Library Catalogue for the MATH2041 reading list.

### Examination Information

For information about the use of calculators and dictionaries in exams, please see the Examination Information page in the Degree Programme Handbook.