Glossary

Mean
The most widely used measure of location.  The sum of all observations divided by the number of observations.  Sample means are symbolised by x̄ (x-bar), while population means are generally symbolised by μ.

Median
The “middle value” if the data are listed in rank order.  If there are two central values (n even) then the median is simply the average of these.  The median is a useful statistic when we are dealing with highly skewed data.

Mode
The most commonly observed value (or set of values) in a data set.  For continuous variates we cite the modal class (or classes).  The mode is a useful characteristic when we wish to quote the most “fashionable” observation.

Range
The difference between the highest and lowest values.  Perhaps the simplest measure of dispersion in data but, by definition, it is strongly influenced by extreme, atypical values.
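
As a minimal sketch of the four measures defined so far, the snippet below uses Python's standard `statistics` module on a small sample. The data values are hypothetical and chosen only so that n is even and one value repeats.

```python
import statistics

# Hypothetical sample, chosen only for illustration.
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)        # sum of observations / number of observations
median = statistics.median(data)    # n is even, so the two central values are averaged
modes = statistics.multimode(data)  # list of the most commonly observed value(s)
data_range = max(data) - min(data)  # highest value minus lowest value

print(mean, median, modes, data_range)
```

Because the range depends only on the two extreme values, replacing the 10 with a much larger number would change the range and the mean but leave the median and mode untouched.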

Variance
The most important measure of dispersion.  It is the average squared deviation of values from their mean.  If we are estimating the variance in a population as judged from a sample (by far the most common practice) then the variance (symbolised by s²) is given by:

s² = Σ(x − x̄)² / (n − 1)
 

Standard deviation
The square root of the variance.  This important measure of dispersion is essentially an attempt to undo the effect of squaring when the variance is calculated.  The standard deviation of a population as estimated from a sample is symbolised by s and is given by:

s = √[ Σ(x − x̄)² / (n − 1) ]
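
The sample variance and standard deviation can be sketched step by step as below, assuming the usual n − 1 divisor for an estimate made from a sample. The data are hypothetical, and the comparison against `statistics.variance` and `statistics.stdev` is included only as a cross-check.

```python
import math
import statistics

# Hypothetical sample, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n

# Sum of squared deviations of each value from the sample mean.
sum_sq = sum((x - mean) ** 2 for x in data)

variance = sum_sq / (n - 1)    # sample estimate of the population variance
std_dev = math.sqrt(variance)  # square root undoes the effect of squaring

# The standard library uses the same n - 1 divisor.
print(variance, statistics.variance(data))
print(std_dev, statistics.stdev(data))
```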

Confidence limits
The upper and lower values between which the true mean will lie with a particular probability (e.g. 95% or 99%).  For large samples (n > 30) the 95% and 99% confidence limits are given by:

95% confidence limits = x̄ ± 1.96 × s / √n
99% confidence limits = x̄ ± 2.58 × s / √n
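
A rough sketch of large-sample confidence limits follows. It assumes the standard normal multipliers 1.96 (95%) and 2.58 (99%); the function name `confidence_limits` and the sample are hypothetical.

```python
import math
import statistics

def confidence_limits(sample, z=1.96):
    """Return (lower, upper) large-sample confidence limits for the mean."""
    n = len(sample)
    mean = statistics.mean(sample)
    # Estimated standard error of the mean: s / sqrt(n).
    se = statistics.stdev(sample) / math.sqrt(n)
    return mean - z * se, mean + z * se

# Hypothetical sample of 36 observations (n > 30).
sample = [10 + (i % 6) for i in range(36)]

lower95, upper95 = confidence_limits(sample, z=1.96)
lower99, upper99 = confidence_limits(sample, z=2.58)
print(lower95, upper95)
print(lower99, upper99)
```

The 99% limits are always wider than the 95% limits, since a larger multiplier is applied to the same standard error.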

Poisson distribution
A discrete probability distribution which models the outcome of rare and random events.  If the mean number of rare and random events per sample is m then the probability p(x) of getting x events in a given sample is given by:

p(x) = e^(−m) × m^x / x!
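
The Poisson probabilities can be sketched directly from the standard formula e^(−m) m^x / x!. The function name `poisson_probability` and the mean of 2.0 are assumptions made for illustration.

```python
import math

def poisson_probability(x, m):
    """Probability of exactly x events when the mean number of events is m."""
    return math.exp(-m) * m ** x / math.factorial(x)

m = 2.0  # hypothetical mean number of events per sample

# Probabilities for x = 0, 1, 2, ...; the terms beyond x = 49 are negligible here.
probabilities = [poisson_probability(x, m) for x in range(50)]

for x in range(5):
    print(x, round(probabilities[x], 4))
```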

Binomial distribution
A discrete probability distribution in which there are two alternatives (e.g. heads/tails, success/failure).  If p is the probability of one outcome (outcome 1) and q is the probability of the alternative (= 1 – p, outcome 2), then the probability P(x) of getting x outcome 1’s in n trials (assuming events are independent) is given by:

P(x) = [ n! / (x! (n − x)!) ] × p^x × q^(n − x)
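
A minimal sketch of the binomial probability, using the standard formula with the binomial coefficient from `math.comb` (Python 3.8 or later). The function name `binomial_probability` and the example values are hypothetical.

```python
import math

def binomial_probability(x, n, p):
    """Probability of exactly x outcome 1's in n independent trials."""
    q = 1 - p  # probability of the alternative outcome
    return math.comb(n, x) * p ** x * q ** (n - x)

# Example: probability of exactly 2 heads in 4 tosses of a fair coin.
two_heads = binomial_probability(2, 4, 0.5)
print(two_heads)

# The probabilities over all possible x (0 to n) sum to 1.
total = sum(binomial_probability(x, 10, 0.3) for x in range(11))
print(total)
```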

  
 
Normal distribution
An important continuous distribution, characterised entirely by its mean μ and standard deviation σ.  Sometimes referred to as the Gaussian distribution, it is the classical bell-shaped curve with the mean, median and mode all lying on the line of symmetry.  It is widely used in statistics, not least because the central limit theorem dictates that the means of repeated samples will tend to be normally distributed, even when the population sampled is not itself normal, provided the samples are reasonably large.
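
As an illustration of a distribution fixed entirely by its mean and standard deviation, the sketch below uses `statistics.NormalDist` from the Python standard library. The values of `mu` and `sigma` are hypothetical.

```python
from statistics import NormalDist

mu, sigma = 100.0, 15.0  # hypothetical mean and standard deviation
dist = NormalDist(mu=mu, sigma=sigma)

# Mean, median and mode coincide on the line of symmetry.
print(dist.mean, dist.median, dist.mode)

# Proportion of the distribution lying within 1.96 and 2.58
# standard deviations of the mean (roughly 95% and 99%).
within_196 = dist.cdf(mu + 1.96 * sigma) - dist.cdf(mu - 1.96 * sigma)
within_258 = dist.cdf(mu + 2.58 * sigma) - dist.cdf(mu - 2.58 * sigma)
print(within_196, within_258)
```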

Central limit theorem
The means of samples from a normally distributed population are themselves normally distributed, regardless of the sample size n used to calculate the mean.  This is a robust theorem: as sample size increases, the means of samples drawn from a population of almost any distribution (strictly, any with a finite variance) approach a normal distribution.
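
The theorem can be explored with a small simulation. The sketch below is only illustrative: it assumes a strongly skewed exponential population with mean 1 and standard deviation 1, and uses the gap between the mean and median of the sample means as a crude indicator of skew. The function name, seed, sample sizes and number of repeats are arbitrary choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

def sample_means(n, repeats=2000):
    """Means of `repeats` samples of size n from an exponential population."""
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(repeats)
    ]

means_small = sample_means(n=2)
means_large = sample_means(n=50)

# In a symmetric (e.g. normal) distribution the mean and median coincide,
# so a shrinking gap suggests the sample means are becoming less skewed.
gap_small = abs(statistics.fmean(means_small) - statistics.median(means_small))
gap_large = abs(statistics.fmean(means_large) - statistics.median(means_large))
print(gap_small, gap_large)

# The spread of the sample means should be close to sigma / sqrt(n) = 1 / sqrt(50).
print(statistics.stdev(means_large), 1 / 50 ** 0.5)
```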
 

Standard error
A term used to describe the standard deviation of any estimate.  It is particularly used to refer to the standard deviation of sample means around the population mean. When used in this context, the standard error is estimated by:

Estimated standard error = s / √n
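
A short sketch of the estimated standard error of the mean, computed as the sample standard deviation divided by the square root of n. The sample values are hypothetical.

```python
import math
import statistics

# Hypothetical sample of nine measurements.
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.5]

n = len(sample)
s = statistics.stdev(sample)       # sample standard deviation (n - 1 divisor)
standard_error = s / math.sqrt(n)  # estimated standard error of the mean

print(s, standard_error)
```

With n = 9 the standard error is one third of s; quadrupling the sample size would halve it.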