Mean
The most widely used measure of location. The sum of all observations
divided by the number of observations. Sample means are symbolised
by $\bar{x}$, while population means are generally symbolised by $\mu$.
Median
The “middle value” if the data are listed in rank order.
If there are two central values (n even) then the median is simply the
average of these. The median is a useful statistic when we are dealing
with highly skewed data.
Mode
The most commonly observed value (or set of values) in a data set.
For continuous variates we cite the modal class (or classes). The
mode is a useful characteristic when we wish to quote the most “fashionable”
observation.
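The three measures of location above can be compared on one small data set. The following is a minimal sketch using Python's standard `statistics` module; the data values are hypothetical and chosen only to show how a single extreme value pulls the mean away from the median.

```python
import statistics

# Hypothetical, right-skewed sample (illustrative values only).
data = [2, 3, 3, 4, 5, 6, 40]

# Mean: sum of all observations divided by the number of observations.
mean = statistics.mean(data)

# Median: the middle value when the data are listed in rank order.
median = statistics.median(data)

# Mode: the most commonly observed value(s); multimode returns a list
# so that ties (more than one mode) are reported as well.
modes = statistics.multimode(data)

# With an even number of observations the median is the average
# of the two central values.
median_even = statistics.median([1, 2, 3, 4])

print(mean, median, modes, median_even)
```

Here the single value 40 lifts the mean well above the median, which is why the median is often preferred for highly skewed data.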
Range
The difference between the highest and lowest values. Perhaps
the simplest measure of dispersion in data, but by definition, it is strongly
influenced by extreme untypical values.
Variance
The most important measure of dispersion. It is the average squared
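deviation of values from their mean. If we are estimating the variance
in a population as judged from a sample (by far the most common practice)
then the sum of squared deviations is divided by n - 1 rather than n, and
the variance (symbolised by $s^2$) is given by:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$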
Standard deviation
The square root of the variance. This important measure of dispersion
is essentially an attempt to undo the effect of squaring when the variance
is calculated. The standard deviation of a population as estimated
from a sample is symbolised by $s$ and is given by:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
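As an illustration of the three measures of dispersion, the sketch below computes the range, the sample variance and the sample standard deviation step by step. It assumes the usual sample estimator (sum of squared deviations divided by n - 1); the data values are hypothetical.

```python
import math
import statistics

# Hypothetical sample (illustrative values only).
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n

# Range: difference between the highest and lowest values.
data_range = max(data) - min(data)

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
sum_sq_dev = sum((x - mean) ** 2 for x in data)
variance = sum_sq_dev / (n - 1)

# Standard deviation: square root of the variance.
sd = math.sqrt(variance)

# The standard library provides the same sample estimators directly.
assert math.isclose(variance, statistics.variance(data))
assert math.isclose(sd, statistics.stdev(data))

print(data_range, variance, sd)
```

Note that `statistics.pvariance` and `statistics.pstdev` divide by n instead, and are appropriate only when the data are the whole population.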
Confidence limits
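The upper and lower values between which the true mean will lie with
a particular probability (e.g. 95% or 99%). For large samples (n
> 30) the 95% and 99% confidence limits are given by:
$$\bar{x} \pm 1.96 \frac{s}{\sqrt{n}} \quad (95\%), \qquad \bar{x} \pm 2.58 \frac{s}{\sqrt{n}} \quad (99\%)$$
where $s / \sqrt{n}$ is the standard error of the mean.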
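A minimal sketch of the large-sample calculation, assuming the normal-approximation multipliers 1.96 (95%) and 2.58 (99%) applied to the standard error s/√n. The function name `confidence_limits` and the sample values are hypothetical, introduced only for illustration.

```python
import math
import statistics

def confidence_limits(sample, z=1.96):
    """Return (lower, upper) large-sample confidence limits for the mean.

    z is the normal multiplier: 1.96 for 95%, 2.58 for 99%.
    Assumes n > 30 so that the normal approximation is reasonable.
    """
    n = len(sample)
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    return mean - z * se, mean + z * se

# Hypothetical sample of 36 observations (values 10..15, six of each).
sample = [10 + (i % 6) for i in range(36)]

lower95, upper95 = confidence_limits(sample, z=1.96)
lower99, upper99 = confidence_limits(sample, z=2.58)

print(f"95% limits: {lower95:.2f} to {upper95:.2f}")
print(f"99% limits: {lower99:.2f} to {upper99:.2f}")
```

The 99% limits are always wider than the 95% limits for the same sample. For small samples the multipliers would normally come from the t distribution rather than the normal distribution.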
Poisson distribution
A discrete probability distribution which models the outcome of rare
and random events. If the mean number of rare and random events per
sample is m then the probability $p_x$ of getting x
events in a given sample is given by:
$$p_x = \frac{e^{-m} m^x}{x!}$$
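The Poisson probabilities can be computed directly from the standard formula p_x = e^(-m) m^x / x!. The sketch below assumes that formula; the function name `poisson_pmf` and the mean of 2 events per sample are hypothetical choices for illustration.

```python
import math

def poisson_pmf(x, m):
    """Probability of exactly x events when the mean number per sample is m."""
    return math.exp(-m) * m ** x / math.factorial(x)

# Hypothetical mean of 2 events per sample.
m = 2.0
probabilities = [poisson_pmf(x, m) for x in range(6)]

for x, p in enumerate(probabilities):
    print(f"P({x} events) = {p:.4f}")
```

Summing the probabilities over all possible counts gives 1, which is a quick check that the formula has been coded correctly.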

Binomial distribution
A discrete probability distribution in which there are two alternatives
(e.g. heads/tails, success/failure). If p
is the probability of one outcome (outcome 1) and q = 1 - p
is the probability of the alternative (outcome 2), then the
probability of getting x occurrences of outcome 1 in n trials
(assuming the trials are independent) is given by:
$$P(x) = \frac{n!}{x!\,(n - x)!} \, p^x q^{n - x}$$
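The binomial probabilities follow from the standard formula P(x) = n! / (x! (n - x)!) · p^x · q^(n - x). A minimal sketch, assuming that formula and using `math.comb` for the binomial coefficient; the function name `binomial_pmf` and the coin-tossing example are hypothetical.

```python
import math

def binomial_pmf(x, n, p):
    """Probability of exactly x occurrences of outcome 1 in n independent trials."""
    q = 1 - p  # probability of the alternative outcome
    return math.comb(n, x) * p ** x * q ** (n - x)

# Hypothetical example: number of heads in 4 tosses of a fair coin.
n, p = 4, 0.5
probabilities = [binomial_pmf(x, n, p) for x in range(n + 1)]

for x, prob in enumerate(probabilities):
    print(f"P({x} heads) = {prob:.4f}")
```

Two heads in four tosses is the most likely result, and the probabilities over x = 0 to n sum to 1.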
Central limit theorem
The means of samples from a normally distributed population are themselves
normally distributed, regardless of the sample size n
used to calculate the mean. The theorem is also robust: as sample size
increases, the means of samples drawn from a population with almost any
distribution (strictly, any with finite variance) will approach a normal
distribution.
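The theorem can be illustrated by simulation. The sketch below draws repeated samples from a strongly skewed (exponential) population and compares the skewness of the sample means for a small and a large sample size. The choice of population, the sample sizes, the number of replicates and the helper names `skewness` and `sample_means` are all assumptions made for illustration.

```python
import random
import statistics

def skewness(values):
    """Population skewness: mean cubed standardised deviation (0 for a symmetric distribution)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def sample_means(n, replicates, rng):
    """Means of `replicates` samples of size n from an exponential population with mean 1."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(replicates)
    ]

rng = random.Random(42)  # fixed seed so the run is repeatable

means_small = sample_means(n=2, replicates=5000, rng=rng)
means_large = sample_means(n=50, replicates=5000, rng=rng)

print("skewness of means, n = 2: ", round(skewness(means_small), 2))
print("skewness of means, n = 50:", round(skewness(means_large), 2))
```

The means of the larger samples should be much less skewed, that is, closer to the symmetric shape of a normal distribution, even though the underlying population is far from normal.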
Standard error
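A term used to describe the standard deviation of the sampling
distribution of any estimate. It is particularly used to refer to the
standard deviation of sample means around the population mean. When used
in this context, the standard error is estimated by:
$$SE = \frac{s}{\sqrt{n}}$$
where $s$ is the sample standard deviation and n is the sample size.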
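A short sketch of the calculation, assuming the usual estimate of the standard error of the mean, s/√n. The function name `standard_error` and the data values are hypothetical.

```python
import math
import statistics

def standard_error(sample):
    """Estimated standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# Hypothetical sample (illustrative values only).
data = [2, 4, 4, 4, 5, 5, 7, 9]

se = standard_error(data)
print(f"mean = {statistics.mean(data)}, standard error = {se:.3f}")
```

Because n appears under a square root, quadrupling the sample size only halves the standard error.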