Research into Understanding Scientific Evidence

Richard Gott, Sandra Duggan, Ros Roberts and Ahmed Hussain

 

The Concepts of Evidence listed below are part of a working document which is being continually refined. The tables in this document may be slightly out of date. Please contact us for the latest versions.

The latest downloadable version of the complete list can be obtained here.

 

GCSE in the UK

A version produced in collaboration with teachers (funded by AQA) which describes the sub-set of the complete list appropriate to GCSE science in the UK can be found here.

 

Concepts of evidence and their role in open-ended practical investigations and scientific literacy; background to published papers.

A report detailing a recent research project and links to the instruments used can be found here.

 

Research publications can be found here.

Our research is based on the belief that there is a body of knowledge which underlies an understanding of scientific evidence. Certain ideas which underpin the collection, analysis and interpretation of data have to be understood before we can handle scientific evidence effectively. We have called these ideas concepts of evidence. Some pupils/students will pick up these ideas in the course of studying the more traditional areas of science, but many will not. These students will not understand how to evaluate scientific evidence unless the underlying concepts of evidence are specifically taught. If these ideas are to be taught, then they need to be carefully defined.

We are in the process of developing a comprehensive, but as yet tentative, definition of concepts of evidence ranging from the ideas associated with a single measurement to those which are associated with evaluating evidence as a whole. What follows is the latest version which has been, and continues to be, informed by research and writing in primary and secondary science education, in science-based industry and in the public understanding of science. Our definition is by no means complete and we welcome comments or suggestions from readers of this site.

The reader should note that we are not suggesting that students need to understand all of these concepts. Although we believe that some of these ideas are fundamental and appropriate at any age, others may be necessary only for a student engaged in a particular branch of science.

We are aware that some concepts, such as sensitivity, can have several meanings in different areas of science. We aim to point this out where applicable.

A framework for data and evidence

In any discussion of the place of data and evidence in science or engineering, we must avoid the trap of failing to define terms and, as a consequence, rendering the argument unintelligible. We shall therefore begin by defining what we mean by data and evidence.

We take datum to mean the measurement of a parameter e.g. the volume of gas or the type of rubber. This does not necessarily mean a single measurement: it may be the result of averaging several repeated measurements and these could be quantitative or qualitative.

Data we take to be no more or less than the plural of datum, to state the obvious.

Evidence, on the other hand, we take as data which have been subjected to some form of validation so that it is possible, for instance, to assign a 'weight' to the data when coming to an overall judgement. This process of weighting will need to look wider than the data itself. It will need to consider, for example, the quality of the experiment and the conditions under which it was undertaken, together with its reproducibility by other workers in other circumstances and perhaps the practicality of implementing the outcomes of the evidence.

 

We begin our definition in the centre of the figure above with the ideas that underpin the making of a single measurement and work outwards. This seems a logical way to proceed but please note that we are not suggesting that this equates with the order of understanding necessary for carrying out an experiment or the order in which these ideas are best taught.

Making a single measurement

To make a single measurement, the choice of an instrument must be suited to the value to be measured. Making an appropriate choice is informed by an understanding of the basic principles underlying measuring instruments.

1  Underlying relationships

All instruments rely on an underlying relationship which converts the variable being measured into another that is easily read. For instance, the following (volume, temperature and force) are measured by instruments which convert each variable into length:

  • a measuring cylinder converts volume to a length of the column of liquid
  • a thermometer converts temperature to a change in volume and then to a change in length of the mercury thread
  • a force meter converts a force into the changing length of a spring

Other instruments convert the variable to an angle on a curved scale, such as a car speedometer. Electronic instruments convert the variable to a voltage.

Some instruments are not so obviously 'instruments' and may not be recognised as such. One example is the use of lichen as an indicator of pollution and another is pH paper where chemical change is used as the basis of the 'instrument' and the measurement is a colour. Other instruments rely on more complex and less direct relationships.

 

Topic

Understanding that:

Example

1.                 Linear relationships

...most instruments rely on an underlying and preferably linear relationship between two variables.

A thermometer relies on the relationship between the volume of a liquid and temperature.

2.                 Non-linear relationships

...some 'instruments', of necessity, rely on non-linear relationships.

Moving iron ammeter, pH.

3.                 Complex relationships

...the relationship may not be straightforward and may be confounded by other factors.

The prevalence, or size, of a species of lichen is an indicator of the level of pollution but other environmental factors such as aspect, substrate, or air movement can also affect the distribution of lichen.

4.                 Multiple relationships

...sometimes several relationships are linked together so that the measurement of a variable is indirect.

Medical diagnosis often relies on indirect, multiple relationships. Braking distance is an indirect measure of frictional force.

 

2 Calibration and error

All instruments must be calibrated so that the underlying relationship is accurately mapped onto the scale. If the relationship is non-linear, the scale has to be calibrated at more points to map that non-linearity. All instruments, no matter how well-made, are subject to error. Each instrument has finite limits on, for example, its resolution and sensitivity.
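As a rough illustration of calibrating at the end points and then checking intervening points, the sketch below maps a hypothetical raw reading (the length of a thermometer's liquid thread, in mm) onto a temperature scale. The function name and every numerical value are assumptions made for illustration; none comes from a real instrument.

```python
def two_point_calibration(raw_low, raw_high, true_low=0.0, true_high=100.0):
    """Return a converter that assumes a linear relationship between end points."""
    slope = (true_high - true_low) / (raw_high - raw_low)

    def convert(raw):
        return true_low + slope * (raw - raw_low)

    return convert


# Hypothetical thread lengths (mm) recorded at the two end points of the scale.
to_celsius = two_point_calibration(raw_low=20.0, raw_high=220.0)

# Hypothetical intervening checks: (thread length in mm, reference temperature).
check_points = [(70.0, 25.0), (120.0, 50.0), (171.0, 75.0)]

# A non-zero deviation at an intervening point suggests non-linearity,
# for instance from a non-uniform bore.
deviations = [to_celsius(raw) - reference for raw, reference in check_points]
print(deviations)
```

With these made-up numbers the first two checks agree with the linear scale and the third is half a degree out, which is the kind of discrepancy that calibration at intervening points is meant to reveal.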

 

Topic

Understanding that:

Example

5.                 End points

...the instrument must be calibrated at the end points of the scale.

A thermometer must be calibrated at 0°C and 100°C.

6.                 Intervening points

...the instrument must be calibrated at points in between to check the linearity of the underlying relationship.

A thermometer must be calibrated at a number of intervening points to check, for instance, for non-linearity due to non-uniform bore of the capillary.

7.                 Zero Errors

...there can be a systematic shift in scale and that instruments should be checked regularly.

If the zero has been wrongly calibrated, if the instrument itself was not zeroed before use or if there is fatigue in the mechanical components, a systematic error can occur.

8.                 Overload, limiting sensitivity / limit of detection

...there is a maximum (full scale deflection) and a minimum quantity which can be measured reliably with a given instrument and technique.

The lower and upper ends of the scale of a measuring instrument place limits on the lowest and highest values that can be measured. It is all too easy to read an electronic meter (in particular) without realising it is on its end stop.

9.                 Sensitivity*

...the sensitivity of an instrument is a measure of the amount of error inherent in the instrument itself.

An electronic voltmeter will give a reading which fluctuates slightly.

10.             Resolution and error

...the resolution is the smallest division which can be read easily. The resolution can be expressed as a percentage.

If the instrument can measure to 1 division and the reading is 10 divisions, the error can be expressed as 10±1 or as a percentage error of 10%.

11.             Specificity**

...an instrument must measure only what it purports to measure.

This is of particular significance in biology where indirect measurements are used as 'instruments' e.g. bicarbonate indicator used as an indirect measure of respiratory activity in woodlice could be affected by other acids such as that produced by the woodlice during excretion.

12.             Use

...there is a prescribed procedure for using an instrument which, if not followed, will lead to systematic and / or random errors.

Taking a thermometer out of the liquid to read it will lead to systematically low readings. More specifically, there is a prescribed depth of immersion for some thermometers which takes account of the expansion of the glass and the mercury (or alcohol) which is not in the liquid.

13.             Human error

...even when an instrument is chosen and used appropriately, human error can occur.

Scales on measuring instruments can easily be misread.

*Sensitivity and **specificity have a different meaning in medicine in the measurement of disease where sensitivity is the true positive rate, that is, the proportion of patients with the disease who are correctly 'measured' or identified by the test. Specificity is the proportion of patients without the disease who are correctly measured or identified by the test. These two measures describe the 'measurement efficiency'.
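The medical definitions in this note can be written as two simple proportions. The sketch below is a minimal illustration; the function names and the patient counts are hypothetical and are not drawn from any real test.

```python
def sensitivity(true_positives, false_negatives):
    """Proportion of patients WITH the disease who are correctly identified."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Proportion of patients WITHOUT the disease who are correctly identified."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical counts: 100 patients with the disease, 200 without it.
test_sensitivity = sensitivity(true_positives=90, false_negatives=10)
test_specificity = specificity(true_negatives=160, false_positives=40)

print(test_sensitivity, test_specificity)
```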

 

3 Reliability and validity of a single measurement

Triangulation, by using more than one of the same instrument or by using another type of instrument, can increase reliability.

 

Topic

Understanding that:

Example

14.             Reliability

...instruments can be subject to inherent inaccuracy so that using different instruments can increase reliability.

Measurement of blood alcohol level can be assessed with a breathalyser and cross checked with a blood test. Temperature can be measured with a mercury, alcohol and digital thermometer to ensure reliability.

15.             Reliability

...human error in the use of an instrument can be overcome by independent, random checks.

Spot checks of measurement techniques by co-workers are sometimes built into routine procedures.

16.             Validity

...measures that rely on complex or multiple relationships must ensure that they are measuring what they purport to measure.

A complex technique for measuring a vitamin may be measuring more than one form of the same vitamin.

 

Measuring a datum

Moving from the measuring instrument itself, we now turn to the actual measurement of a datum. The measurement of a single datum may be required, or the datum may be one of several data to be measured. A significant element of science in industry is indeed about the sophisticated and careful measurement of a single parameter.

1 The choice of an instrument for measuring a datum

Of prime importance is choosing the instrument to give the accuracy and precision required; a proactive choice rather than a reactive discovery that it wasn't the right instrument for the job!

 

Topic

Understanding that:

Example

17.             Trueness or accuracy*

...trueness is a measure of the extent to which repeated readings of the same quantity give a mean that is the same as the 'true' mean.

If the mean of a series of readings of the height of an individual pupil is 173 cm and her 'true' height, as measured by a clinic's instrument is 173 cm, the measuring instrument is 'true'.

18.             Repeatability

...repeated readings of the same quantity with the same instrument never give exactly the same answer.

Weighing yourself on a set of bathroom scales in different places on the bathroom floor, or standing on a slightly different position on the scales, will result in slightly differing readings. It is never possible to repeat the reading in exactly the same way.

19.             Precision

...precision (or imprecision) is a measure of the spread of the repeated measurements around the mean.

A precise measurement is one in which the readings cluster closely together. In the above example, a precise set of readings might be 175, 175.5, 175, 176, 175.5. A precise measurement need not be accurate or true, and vice versa.

20.             Outliers in relationships

...outliers, aberrant or anomalous values in data sets should be examined to discover possible causes.

Outliers may be due to gross error and, in medical laboratory practice for example, may have serious implications if not explored. If the source of the error can be explained, for example by poor measurement procedures, then the outlying measurement can be discarded.

* Accuracy is a term which is often used rather loosely to indicate the combined effects of precision and trueness. But, in some science-based industries the distinction we have defined here is used widely so that, for example, the precision and accuracy of a given measurement are quoted routinely.
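The distinction between trueness and precision can be sketched numerically using the height readings quoted in the table. Treating 173 cm as the reference ('true') value for those readings is our assumption about how the two examples fit together; the variable names are illustrative only.

```python
import statistics

true_height_cm = 173.0                        # reference value from the clinic
readings_cm = [175, 175.5, 175, 176, 175.5]   # repeated readings from the table

mean_reading = statistics.mean(readings_cm)

# Trueness: how far the mean of the repeated readings lies from the true value.
bias_cm = mean_reading - true_height_cm

# Precision: the spread of the repeated readings around their own mean.
spread_cm = statistics.stdev(readings_cm)

print(mean_reading, bias_cm, spread_cm)
```

Here the readings cluster tightly (a small spread) but their mean is more than 2 cm from the reference value, so the set is precise without being true.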

 

2 Sampling a datum

We shall use the term sample to mean any sub-set of a 'population'. The 'population' might be the population of a species of animal or plant or even the 'population' of possible sites where gold might be found. We shall also take the population to mean the infinite number of repeated readings that could be taken of any particular measurement.

 

Topic

Understanding that:

Example

21.             Sampling

...one or more measurements comprise a sample of all the possible measurements that could be made.

The measurement of a single blade of grass is a sample of all the blades of grass in a field.

A single measurement of the bounce height of a ball is a sample of the infinite number of such bounces that could be measured.

22.             Size of sample

...the greater the number of readings taken, the more likely they are to be representative of the population.

As more readings of, for example, the height of students in a college are taken, the more closely the sample is likely to represent the whole college population.

The more times a single ball is bounced, the more the sample is likely to represent all possible bounces of that ball.

23.             Reducing bias in sample / representative sampling

...readings must be taken using an appropriate sampling strategy, such as random sampling, stratified or systematic sampling so that the sample is as representative as possible.

In the above example of the height of college students, tables of random numbers can be used to select students.

24.             An anomalous datum

...an unexpected datum could be indicative of inherent variation in the data or the consequence of a recognised uncontrolled variable.

In the above example, a very small height may have been recorded from a child visiting the college and should not be part of the population being sampled.  A very low rebound height from a squash ball may occur as a result of differences in the material of the ball and is therefore part of the sample.
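The sampling strategies mentioned above can be sketched in a few lines. The example below stands in for a table of random numbers by using a pseudo-random generator; the population of 500 numbered students, the sample size and the seed are all hypothetical.

```python
import random

# Hypothetical college register: students numbered 1 to 500.
population = list(range(1, 501))
sample_size = 25

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Random sampling: every student has the same chance of being selected.
random_sample = rng.sample(population, sample_size)

# Systematic sampling: a random starting point, then every 20th student.
step = len(population) // sample_size
start = rng.randrange(step)
systematic_sample = population[start::step]

print(sorted(random_sample))
print(systematic_sample)
```

Stratified sampling would follow the same pattern, with the random selection carried out separately within each sub-group of the population.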

 

3 Statistical treatment of measurements of a datum

The statistical treatment of a datum is concerned with the probability that a measurement is within certain limits of the true reading. The following are some of the basic statistics associated with a single datum:

 

Topic

Understanding that:

25.             Range

...the range is a simple description of the distribution which defines the maximum and minimum values measured.

26.             Mode

...the mode is the value which occurs most often.

27.             Median

...the median is the value below and above which there are half the measurements.

28.             Mean

...the mean (average) is the sum of all the measurements divided by the number of measurements.

29.             Frequency distributions

...a series of readings of the same datum can be represented as a frequency distribution by grouping repeated measurements which fall within a given range and plotting the frequencies of the grouped measurements.

30.             Standard deviation

...the standard deviation is a way of describing the spread of normally distributed data. The standard deviation depends on the measuring instrument and technique - the more precise these are, the smaller the standard deviation of the sample or of repeated measurements.

31.             Standard deviation of the mean (standard error)

...the standard deviation of the mean describes the frequency distribution of the means from a series of readings repeated many times. The standard deviation of the mean depends on the measuring instrument and technique AND on the number of repeats.

32.             Coefficient of variation

...the coefficient of variation is the standard deviation expressed as a percentage of the mean (CV = SD*100/mean).

33.             Confidence limits

...confidence limits indicate the degree of confidence that can be placed on the datum. For example, 95% confidence limits mean that 95% of the measurements in a normal distribution lie within approximately 2 standard deviations of the mean.
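The statistics in this table can be computed for a small set of repeated readings as follows. This is a minimal sketch using Python's standard library; the readings are hypothetical bounce heights, and the limits use the 'mean plus or minus 2 standard deviations' rule of thumb quoted above rather than an exact calculation.

```python
import math
import statistics
from collections import Counter

readings = [48, 50, 50, 51, 52, 49, 50]  # hypothetical bounce heights in cm

value_range = (min(readings), max(readings))   # 25. Range
mode = statistics.mode(readings)               # 26. Mode
median = statistics.median(readings)           # 27. Median
mean = statistics.mean(readings)               # 28. Mean
frequencies = Counter(readings)                # 29. Frequency distribution
sd = statistics.stdev(readings)                # 30. Standard deviation (sample)
sem = sd / math.sqrt(len(readings))            # 31. Standard deviation of the mean
cv = sd * 100 / mean                           # 32. CV = SD*100/mean
limits = (mean - 2 * sd, mean + 2 * sd)        # 33. Approximate 95% limits

print(value_range, mode, median, mean, sd, sem, cv, limits)
```

Note that the standard deviation of the mean shrinks as more readings are taken, whereas the standard deviation itself reflects the instrument and technique.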

 

4 Reliability and validity of a datum

Any datum must be subject to careful scrutiny to ascertain the extent to which it:

  • is valid: that is, the value of the appropriate variable has been measured
  • is reliable: for example, has the parameter been sampled so that the datum represents the population?

Only then can the datum be weighed as evidence. Evaluating a datum includes evaluating the reliability and validity of the ideas associated with the making of single measurements.

 

Topic

Understanding that:

Example

34.             Reliability

...a datum can only be weighed as evidence once the uncertainty associated with the instrument and the measurement procedures has been ascertained.

The reliability of a measurement of blood alcohol level should be assessed in terms of the uncertainty associated with the breathalyser and in terms of how the measurement was taken.

35.             Validity

...a measurement must be of, or allow a calculation of, the appropriate datum.

The girth of a tree is not a valid indicator of the tree's age.

 

Data in investigations - looking for relationships

An investigation is an attempt to determine the relationship, or lack of one, between the independent and dependent variables or between two or more sets of data. Investigations take many forms but all have the same underlying structure.

1. The design of practical investigations

What do we need to understand to be able to appraise the design of an investigation in terms of validity and reliability?

1.1 Variable structure

Identifying and understanding the basic structure of an investigation in terms of variables and their types helps to evaluate the validity of data.

 

Topic

Understanding that:

Example

36.             The independent variable

...the independent variable is the variable for which values are changed or selected by the investigator.

The type of ball in an investigation to compare the bounciness of different types of balls; the depth in a pond at which light intensity is to be measured.

37.             The dependent variable

...the dependent variable is the variable the value of which is measured for each and every change in the independent variable.

In the same investigations as above: the height to which each type of ball bounces; the light intensity at each of the chosen depths in the pond.

38.             Correlated variables

...in some circumstances we are looking for a correlation only, rather than any implied causation.

Foot size can be predicted from hand size (both ‘caused’ by other factors).

39.             Categoric variables

...a categoric variable has values which are described by labels. Categoric variables are also known as nominal data.

The variable 'type of metal' has values 'iron', 'copper' etc.

40.             Ordered variables

...an ordered variable has values which are also descriptions, labels or categories but these categories can be ordered or ranked. Measurement of ordered variables results in ordinal data.

The variable of size e.g. 'very small', 'small', 'medium' or 'large' is an ordered variable. Although the labels can be assigned numbers (e.g. very small=1, small=2 etc.) size remains an ordered variable.

41.             Continuous variables

...a continuous variable is one which can have any numerical value and its measurement results in interval data.

Weight, length, force.

42.             Discrete variables

...a discrete variable is a special case in which the values of the variable are restricted to integer multiples.

The number of discrete layers of roof insulation.

43.             Multivariate designs

...a multivariate investigation is one in which there is more than one independent variable.

The effect of the width and the length of a model bridge on its strength. The effect of temperature and humidity on the distribution of gazelles in a particular habitat.

 

1.2 Validity, 'fair tests' and controls

‘Fair tests’ and controls aim to isolate the effect of the independent variable on the dependent variable. Laboratory-based investigations, at one end of the spectrum, involve the investigator changing the independent variable and keeping all the control variables constant. This is often termed 'the fair test', but is no more than one of a range of valid structures. At the other end of the spectrum are 'field studies' where many naturally changing variables are measured and correlations sought. For example, an ecologist might measure many variables in a habitat over a period of time. Having collected the data, correlations might be sought between variables such as day length and emergence of a butterfly, using statistical treatments to ensure validity. The possible effect of other variables can be reduced by only considering data where the values of other variables are the same or similar. In between these extremes are many types of valid design which involve different degrees of manipulation and control. Fundamentally, all these investigations have a similar structure; what differs are the strategies to ensure validity.

 

Topic

Understanding that:

Example

44.             Fair test

...a fair test is one in which only the independent variable has been allowed to affect the dependent variable.

A laboratory experiment about the effect of temperature on dissolving time, where only the temperature is changed. Everything else is kept exactly the same.

45.             Control variables in the laboratory

...other variables can affect the results of an investigation unless their effects are controlled by keeping them constant.

In the above experiment, the mass of the chemical, the volume of liquid, the stirring technique and the room temperature are some of the variables that should be controlled.

46.             Control variables in field studies

...some variables cannot be kept constant and all that can be done is to make sure that they change in the same way.

In a field study on the effect of different fertilisers on germination, the weather conditions are not held constant but each experimental plot is subjected to the same weather conditions.

47.             Control variables in surveys

…the potential effect on validity of uncontrolled variables can be reduced by selecting data from conditions that are similar with respect to other variables.

In a field study to determine whether light intensity affects the colour of dog’s mercury leaves, other variables are recorded, such as soil nutrients, pH and water content. Correlations are then sought by selecting plants growing where the value of these variables is similar.

48.             Control group experiments

...control groups are used to ensure that any effects observed are due to the independent variable(s) and not some other unidentified variable.

In a drug trial, patients with the same illness are divided into an experimental group who are given the drug and a control group who are given a placebo or no drug.

 

1.3 Choosing values

The values of the variables need to be chosen carefully. This is possible in the majority of investigations prior to the data being collected. In field studies, where data are collected from variables that change naturally, some of these concepts can only be applied retrospectively.

 

Topic

Understanding that:

Example

49.             The sample

...issues of sample size and representativeness apply in the same way as in sampling a datum (see Measuring a datum, 2).

The choice of sample size and the sampling strategy will affect the validity of the findings.

50.             Relative scale

...the choice of sensible values for quantities is necessary if measurements of the dependent variable are to be meaningful.

In differentiating the dissolving times of different chemicals, a large quantity of chemical in a small quantity of water causing saturation will invalidate the results.

51.             Range

...the range over which the values of the independent variable are chosen is important in ensuring that any pattern is detected.

An investigation into the effect of temperature on the volume of yeast dough using a range of 20 - 25°C would show little change in volume.

52.             Interval

...the choice of interval between values determines whether or not the pattern in the data can be identified.

An investigation into the effect of temperature on enzyme activity would not show the complete pattern if 20°C intervals were chosen.

53.             Number

...a sufficient number of readings is necessary to determine the pattern.

The number is determined partly by the range and interval issues above but, in some cases, for the complete pattern to be seen, more readings may be necessary in one part of the range than another. This applies particularly if the pattern changes, for example, in a mass and spring extension experiment at the top of the range.
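The effect of the choice of interval can be illustrated with a toy model. The bell-shaped 'enzyme activity' function below is entirely made up for illustration (a peak placed at 37°C with an arbitrary width) and is not real data; the sketch only shows how a coarse interval yields few readings and can misplace a peak that a finer interval locates more closely.

```python
import math


def toy_enzyme_activity(temperature_c):
    """Made-up bell-shaped response peaking at 37 degrees C (NOT real data)."""
    return math.exp(-((temperature_c - 37.0) / 8.0) ** 2)


def sampled_peak(start, stop, interval):
    """Temperature at which the sampled activity is highest."""
    temperatures = range(start, stop + 1, interval)
    return max(temperatures, key=toy_enzyme_activity)


# Same range (0 to 80 degrees C), two different intervals.
coarse_readings = len(range(0, 81, 20))   # number of readings at 20 degree steps
fine_readings = len(range(0, 81, 5))      # number of readings at 5 degree steps

coarse_peak = sampled_peak(0, 80, 20)
fine_peak = sampled_peak(0, 80, 5)

print(coarse_readings, coarse_peak, fine_readings, fine_peak)
```

In practice the extra readings need not be spread evenly: as noted above, more readings may be needed in the part of the range where the pattern changes.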

 

1.4 Accuracy and precision

The design of the investigation must provide data with sufficient accuracy and precision to answer the question. This consideration should be built into the design of the investigation. Different investigations will require different levels of accuracy and precision depending on their purpose.

 

Topic

Understanding that:

Example

54.             Determining differences

...there is a level of precision which is sufficient to provide data which will allow discrimination between two or more means.

The degree of precision required to discriminate between the bounciness of a squash ball and a ping pong ball is far less than that required to discriminate between two ping pong balls.

55.             Determining patterns

...there is a level of precision which is required for the trend in a pattern to be determined.

Large error bars on the points on a line graph may not allow discrimination between an upward curve and a straight line.

 

1.5 Tables

Tables can be used to design an experiment in advance of the data collection and, as such, contribute towards its validity. In this way, tables can be much more than just a way of presenting data, after the data have been collected.

 

Topic

Understanding that:

Example

56.             Tables

...tables can be used as organisers for the design of an experiment by preparing the table in advance of the whole experiment. A table has a conventional format.

An experiment on the effect of temperature on the dissolving time of sodium chloride:

 

 

1.6 Reliability and validity of the design

In evaluating the design of an investigation, there are two overarching questions:

  • Will the measurements result in sufficiently reliable data to answer the question?
  • Will the design result in sufficiently valid data to answer the question?

Evaluating the design of an investigation includes evaluating the reliability and validity of the ideas associated with the making of single measurements and with each and every datum.

 

Topic

Understanding that:

Example

57.             Reliability of the design

...the reliability of the design includes a consideration of all the ideas associated with the measurement of each and every datum.

...the reliability of the design includes a consideration of all the ideas associated with the measurement of the data.

Factors associated with the choice of the measuring instruments to be used must be considered e.g. the error associated with each measuring instrument.

The sampling of each datum and the accuracy and precision of the measurements should also be considered.

This includes the sample size, the sampling technique, relative scale, the range and interval of the measurements, the number of readings, and the appropriate accuracy and precision of the measurements.

58.             Validity of the design

...the validity of the design includes a consideration of the reliability (as above) and the validity of each and every datum.

...the validity of the design includes a consideration of the reliability (as above) and the validity of the data.

This includes the choice of measuring instrument in relation to whether the instrument is actually measuring what it is supposed to measure.

This includes considering the ideas associated with the variable structure and the concepts associated with the fair test.

For example, measuring the distance travelled by a car at different angles of a ramp will not answer a question about speed as a function of angle.

 

2. Data presentation, patterns and relationships in practical investigations

Having established that the design of an investigation is reliable and valid, what do we need to understand to explore the relationship between one variable and another? Another way of thinking about this is to think of the pattern between two variables or two sets of data. What do we need to understand to know that the pattern is valid and reliable? The way that data are presented allows patterns to be seen.

2.1 Data presentation

There is a close link between graphical representations and the type of variable they represent.

 

Topic

Understanding that:

Example

59.             Tables

...a table is a means of reporting and displaying data. But a table alone presents limited information about the design of an investigation e.g. control variables or measurement techniques.

Simple patterns such as directly proportional or inversely proportional relationships can be shown effectively in a table.

60.             Bar charts

...bar charts can be used to display data in which the independent variable is categoric and the dependent variable is continuous.

The number of pupils who can and cannot roll their tongues would be best presented on a bar chart.

61.             Line graphs

...line graphs can be used to display data in which both the independent variable and the dependent variable are continuous. They allow interpolation and extrapolation.

The length of a spring and the mass applied would be best displayed in a line graph.

62.             Scatter graphs (or scatter plots)

...scatter graphs can also be used to display data in which both the independent variable and the dependent variable are continuous. Scatter graphs are often used where there is much fluctuation in the data because they can allow an association to be detected. Widely scattered points can show a weak correlation, whereas points clustered around, for example, a line can indicate a relationship.

The dry mass of the aerial parts of a plant and the dry mass of the roots.

63.             Histograms

...histograms can be used to display data in which a continuous independent variable has been grouped into ranges and in which the dependent variable is continuous.

On a sea shore, the distance from the sea could be grouped into ranges and the number of limpets in each range plotted in a histogram.

64.             Box and whisker plots

...the box, in box and whisker plots, represents the middle 50% of the data, bounded by the 25th and 75th percentiles. The central line is the median. The limits of the 'whiskers' may show either the extremes of the range or the 2.5% and 97.5% values.

Box and whisker plots are often used to compare large data sets.

65.             Multi-variate data

...3D bar charts and line graphs (surfaces) are suitable for some forms of multivariate data.

 

66.             Other forms of display

...data can be transformed, for example, to logarithmic scales so that they meet the criteria for normality which allows the use of parametric statistics.

Logarithmic transformation is commonly used in clinical and laboratory medicine.
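As an illustration of the box and whisker plot described above, the sketch below computes the five values such a plot displays, with the whiskers drawn to the extremes of the range. The function name and the data are made up for illustration, and the 'inclusive' quartile method is only one of several conventions in use, so other software may give slightly different quartiles.

```python
import statistics


def box_plot_summary(values):
    """Return the five values shown by a box and whisker plot."""
    ordered = sorted(values)
    # With n=4, quantiles() returns three cut points: Q1, the median and Q3.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "min": ordered[0],    # lower whisker (extreme of the range)
        "q1": q1,             # 25th percentile, bottom of the box
        "median": median,     # central line
        "q3": q3,             # 75th percentile, top of the box
        "max": ordered[-1],   # upper whisker (extreme of the range)
    }


# Hypothetical measurements in mm, invented for this example.
lengths = [12, 15, 11, 19, 22, 17, 14, 16, 30]
summary = box_plot_summary(lengths)
print(summary)
```

The box runs from q1 to q3, so it encloses the middle half of the measurements, and the median marks the central line.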

 

2.2 Statistical treatment of measurements of data

There are a large number of statistical techniques for analysing data which address three main questions:

  • Do two groups of data differ from each other?
  • Do data change when repeated on a second occasion?
  • Is there an association between two sets of data?

Statistics consider the variability of the data and present a result based on probability. Each statistical technique has associated criteria depending on, for example, the type of data, its distribution, the sample size etc. Some common methods of statistical analysis of data are shown below.
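To show what it means for a statistic to weigh a difference against the variability of the data, the sketch below calculates a t statistic for two groups. It uses Welch's form of the statistic, which does not assume equal variances; that choice, the function name and the data are assumptions made for illustration. Converting the statistic into a probability requires the t distribution and is not attempted here.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Difference between two means divided by its standard error."""
    mean_a = statistics.mean(sample_a)
    mean_b = statistics.mean(sample_b)
    # variance() is the sample variance (divides by n - 1).
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / standard_error


# Invented data: group_a and noisy_a share the same mean (5.1),
# but noisy_a is much more variable.
group_a = [5.1, 4.9, 5.3, 5.0, 5.2]
noisy_a = [4.1, 6.1, 5.1, 4.6, 5.6]
group_b = [4.6, 4.8, 4.5, 4.7, 4.9]

t_clean = welch_t(group_a, group_b)
t_noisy = welch_t(noisy_a, group_b)
print(t_clean, t_noisy)
```

The difference between the means is the same in both comparisons, but the statistic is smaller when the data are more variable, so the same difference carries less weight.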

 

Topic

Understanding that:

67.             Differences between means

...a t-test can be used to estimate the probability that two means from normally distributed populations, derived from an investigation involving a categoric independent variable, are different. If measures are repeated with the same or matched pairs, then a paired t-test can be used.

68.             Analysis of variance

...analysis of variance is a technique which can be used to estimate the effects of a number of variables in a multi-variate problem involving categoric independent variables.

69.             Linear and non-linear regression

...regression can be used to derive the 'line of best fit' for data resulting from an investigation involving a continuous independent variable.

70.             Non-parametric measures

...when the measurements are not normally distributed, non-parametric tests, such as the Mann-Whitney U-test, can be used to estimate the probability of any differences.

71.             Categoric data

...when the data result from an investigation in which both independent and dependent variables are categoric, the analysis of the data must use a suitable test such as a chi-squared test.
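The chi-squared test for categoric data can be sketched as follows. The counts are invented (two classes of pupils, each counted as able or unable to roll their tongues), the function name is hypothetical, and no continuity correction is applied. The value 3.841 is the standard 5% critical value of the chi-squared distribution with one degree of freedom, which applies to a 2 x 2 table.

```python
def chi_squared(observed):
    """Chi-squared statistic for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Count expected in this cell if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (count - expected) ** 2 / expected
    return statistic


# Invented counts: rows are two classes, columns are can / cannot roll tongue.
counts = [[30, 10],
          [20, 20]]
statistic = chi_squared(counts)

CRITICAL_5_PERCENT_DF1 = 3.841  # 5% critical value, one degree of freedom
print(statistic, statistic > CRITICAL_5_PERCENT_DF1)
```

A statistic above the critical value suggests that an association this strong would arise by chance less than 5% of the time if the two variables were unrelated.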

 

2.3 Patterns and relationships in data

Patterns represent the behaviour of variables so that they cannot be treated in isolation from the physical system that they represent. Patterns can be seen in tables or graphs or can be reported by using the results of appropriate statistical analysis. The interpretation of patterns and relationships must respect the limitations of the data: for instance, there is a danger of over-generalisation or of implying causality when there may be a different, less direct type of association.

 

Topic

Understanding that:

Example

72.             Types of relationships between variables

...relationships

·                can be linear and directly proportional

·                can be linear (y=mx+c) but not directly proportional

·                can follow predictable curves (y=x²)

·                can be modelled mathematically to give approximations to parts of the curve

·                can be purely empirical and not be represented by any simple mathematical relationship

 

Hooke's Law.

The length of a spring and load.

 

Height and time for a falling object.

 

The terminal velocity of a parachute and its surface area.

73.             Interpretation of patterns

...there are different types of association such as causal, consequential, indirect or chance associations.

...differences or changes may or may not be significant.

In any large multivariate set of data, there will be associations, some of which will be chance associations. Even if x and y are highly correlated, x does not necessarily cause y: y may cause x or z may cause x and y.

Changes in students' understanding before and after an intervention may not be significant and / or may be due to other factors.
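The distinction between a linear relationship and a directly proportional one can be illustrated with a least-squares line of best fit. In the sketch below the spring data are invented and exactly linear, and the function name is hypothetical. The length of the spring is linear in the load but has a non-zero intercept (the unloaded length), so it is not directly proportional to the load; the extension, obtained by subtracting that intercept, is.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented data: load in grams, spring length in centimetres.
loads = [0, 100, 200, 300, 400]
lengths = [5.0, 7.0, 9.0, 11.0, 13.0]

slope, intercept = least_squares_line(loads, lengths)
# Extension is the length minus the fitted unloaded length.
extensions = [length - intercept for length in lengths]
print(slope, intercept, extensions)
```

With these data the fitted line has a slope of about 0.02 cm per gram and an intercept of about 5 cm. Doubling the load does not double the length, but it does double the extension.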

 

3. Reliability and validity of the data in the whole investigation

In evaluating the whole investigation, all the foregoing ideas about evidence need to be considered in relation to the two overarching questions:

  • Are the data reliable?
  • Are the data valid?

In addressing these two questions, ideas associated with the making of single measurements and with each and every datum in an investigation should be considered. The evaluation should also include a consideration of the design of an investigation, ideas associated with measurement, with the presentation of the data and with the interpretation of patterns and relationships.

Data to evidence - comparisons with other data

So far we have considered the data in a single investigation. In reality, the results of an investigation will usually be compared with other data.

 

Topic

Understanding that:

74.             A series of experiments

...a series of experiments can add to the reliability and validity of evidence even if, individually, their precision does not allow much weight to be placed on the results of any one experiment alone.

75.             Secondary Data

...data collected by others are a valuable source of additional evidence, provided their value as evidence can be judged. Meta-analyses.

76.             Triangulation

...triangulation with other methods can strengthen the validity of the evidence.
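One way to see how a series of experiments can carry more weight than any single one is to combine their results with an inverse-variance weighted mean, a technique commonly used in meta-analysis. The sketch below is an assumption-laden illustration rather than a method taken from this document: it assumes the experiments are independent, measure the same quantity and report reliable standard errors. The function name and the numbers are invented.

```python
import math


def combine_estimates(estimates, standard_errors):
    """Inverse-variance weighted mean and its standard error."""
    # More precise experiments (smaller standard error) get more weight.
    weights = [1 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    combined = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    combined_se = 1 / math.sqrt(total_weight)
    return combined, combined_se


# Invented results from four experiments, each with the same modest precision.
estimates = [9.5, 10.5, 10.0, 10.0]
standard_errors = [0.5, 0.5, 0.5, 0.5]

combined, combined_se = combine_estimates(estimates, standard_errors)
print(combined, combined_se)
```

Four experiments with a standard error of 0.5 each give a combined standard error of about 0.25, so the series is more precise than any one experiment alone.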

 

Societal issues

Finally, in reality, if we are faced with evidence and we want to arrive at a judgement, then other factors will also come into play, some of which are listed below. If the evidence is non-existent, or is evaluated as unreliable and/or invalid, then societal issues will be the sole means of arriving at a judgement.

 

Topic

Understanding that:

77.             Credibility

...consistency with accepted ideas (usually), common sense and personal experience may be necessary for the validity of the evidence to be accepted. Face validity.

78.             Practicality

...the implications from the evidence are practical and cost effective. For example, the side effects of a drug may outweigh its benefits for all but seriously ill patients.

79.             Bias

...evidence from a particular source must be inspected for inherent bias of the experimenters. Possible bias may be due to funding sources or intellectual rigidity (e.g. research on cancer and smoking funded by the tobacco industry, retention of ideas about an oscillating universe or a flat earth).

80.             Power structures

...evidence can be accorded undue weight, or dismissed too lightly, simply by virtue of its political significance and the operation of influential bodies.

81.             Acceptability

...evidence can be denied or dismissed for what may appear to be illogical reasons such as public and political fear of its consequences (e.g. BSE, traffic pollution). Prejudice and preconceptions can also interfere with the acceptance of the evidence and its consequences.

82.             Status

...the academic or professional status, experience and authority of the experimenters may influence the weight which is placed on the evidence.

 

To find out more about our research into understanding scientific evidence...

Here are some of our most recent publications:

Roberts, R., and Gott, R. (2010)

Questioning the evidence for a claim in a socio-scientific issue: an aspect of scientific literacy.

Research in Science & Technological Education, 28(3), 203-226

Roberts, R., Gott, R. and Glaesser, J. (2010)

Students’ approaches to open-ended science investigation: the importance of substantive and procedural understanding.

Research Papers in Education. 25(4), 377-407

Roberts, R. (2009)

How Science Works (HSW).

Education in Science. June 2009, no 233, 30-31

Roberts, R. (2009)

Can teaching about evidence encourage a creative approach in open-ended investigations?

School Science Review, 90(332) pp31-38 ISSN: 0036-6811

Glaesser, J., Gott, R., Roberts, R. & Cooper, B. (2009)

Underlying success in open-ended investigations in science: using qualitative comparative analysis to identify necessary and sufficient conditions.

Research in Science and Technological Education, 27,1,5-30.

 

Glaesser, J., Gott, R., Roberts, R. & Cooper, B. (2009)

The roles of substantive and procedural understanding in open-ended science investigations: Using fuzzy set Qualitative Comparative Analysis to compare two different tasks

Research in Science Education. 39, 4 (2009), 595-624.

 

Roberts, R. and Gott, R. (2008)

Practical work and the importance of scientific evidence in science curricula.

Education in Science, Nov 2008, 8-9.

Gott, R. and Roberts, R. (2008)

Concepts of evidence and their role in open-ended practical investigations and scientific literacy; background to published papers.

Durham, Durham University

 

Gott R. and Duggan, S. (2007)

A framework for practical work in science and scientific literacy through argumentation

Res. in Sc. and Tech. Educ. 25 (3)

 

Roberts, R and Gott R. (2007)

Questioning the Evidence: research to assess an aspect of scientific literacy.

Proceedings of European Science Education Research Association (ESERA) conference, Malmo, Sweden, August 2007

 

Roberts, R and Gott R. (2007)

Evidence, investigations and scientific literacy: what are the curriculum implications?

Proceedings of National Association for Research in Science Teaching (NARST) conference, New Orleans, April 2007

 

Gott R. and Duggan, S. (2006)

Investigations, scientific literacy and evidence

Hatfield

 

Roberts, R and Gott R. (2006)

The role of evidence in the new KS4 National Curriculum and the AQA specifications

School Science Review 87 (321)

 

Roberts, R and Gott R. (2006)

Assessment of performance in practical science and pupil attributes.

Assessment in Education 13 (1)

 

Roberts, R and Gott R. (2004)

A written test for procedural understanding: a way forward for assessment in UK science education

Res. in Sc. and Tech. Educ. 22 (1)

 

Roberts, R. (2004)

Using Different Types of Practical within a Problem-Solving Model of Science.

School Science Review 85 (312)

Roberts, R. and Gott, R. (2004)

Assessment of Sc1: alternatives to coursework?

School Science Review 85 (313)

Gott, R and Duggan, S. (2003)

Understanding and Using Scientific Evidence.

Sage, London

Gott, R and Duggan S.  (2003)

Building success in Sc 1. Workbook and interactive CD ROM

Folens, Bedfordshire.

Roberts, R and Gott R (Feb 2003)

Written tests for procedural understanding in science: why? And would they work?

Education in Science, Feb 2003, 16-18.

Roberts, R and Gott R (2003)

Assessment of biology investigations. 

Jnl. of Biol. Ed. 37, 3, 114-121

Gott R. and Duggan S. (2002)

Performance assessment of practical science in the UK National Curriculum

Cambridge Journal of Education, 32, 2, 183-201

Roberts, R and Gott, R.(2002)

Investigations: collecting and using evidence. 

In Teaching Scientific Enquiry, ASE/John Murray (Sang D Ed).

Duggan S. and Gott R.  (2002)

What sort of science do we really need?

Int. J. Sci. Ed. 24, 7, 661-679

Roberts R. 2001

Procedural understanding in biology: “the thinking behind the doing”

Journal of Biological Education 35 (3) 113-117

Tytler R., Duggan S. and Gott R. 2001

Public participation in an environmental dispute: implications for science education

Public Understanding of Science 10 343-364

Tytler R., Duggan S. and Gott R.  2001

Dimensions of evidence, the public understanding of science and science education

Int. J. Sci. Ed., 23, 8, 815-832

Duggan S. and Gott R. 2000

Intermediate GNVQ science: a missed opportunity?

Research in Science and Technological Education 18 (2) 201-214

Duggan, S. and Gott, R (2000) 

Understanding evidence in science: the way to a more relevant curriculum.   

In Issues in science teaching.  Sears J. and Sorenson P, Routledge, London, pp60-70.

Roberts R. and Gott R. 2000

Procedural understanding in biology: how is it characterised in texts?

School Science Review 82 (298) 83-91

Gott, R, Duggan, S and Roberts, R. (1999)

The science investigation workshop.

Education in Science 183, 26-27

Gott R., Foulds K. and Johnson P. 1997

Science Investigations Book 1

Collins Educational

Gott R., Foulds K. and Jones M. 1998

Science Investigations Book 2

Collins Educational

Gott R., Foulds K. and Roberts R. 1999

Science Investigations Book 3

Collins Educational

Gott R. and Duggan S. 1998

Understanding scientific evidence - why it matters and how it can be taught.  In: ASE Secondary Science Teachers’ Handbook Ed. M. Ratcliffe

Stanley Thornes (Publishers) Ltd

Gott R., Duggan S. and Johnson P. 1999

What do practising applied scientists do and what are the implications for science education?

Research in Science and Technological Education 17 (1) 97-107

Roberts R. and Gott R. 1999

Procedural understanding: its place in the biology curriculum

School Science Review 81 (294) 19-25

 

Last updated:  12/10/2010

To comment on the content of these web pages or for further information, please contact: Rosalyn.Roberts@durham.ac.uk