Bob Kentridge 1995
Comparative Psychology: Lecture 4.
Predictability or surprise?
At the end of last lecture we'd arrived at the conclusion that the
predictability of the CS-US pair of events, rather than their temporal
contiguity or even their frequency of co-occurrence, was a crucial
factor in determining the strength of conditioning that would occur
between them. Today we take this approach, dissecting the details of
the processes underlying classical conditioning, much further. We will
try to tease out the role of the predictability or surprisingness of
stimuli in classical conditioning. In order to do so we need to look
at two other classical conditioning phenomena first.
Overshadowing.
In trying to explain the classical conditioning process we will be
looking at evidence from experiments where two (or even more)
CSs are presented with a single US. By varying aspects of the CSs
and comparing the effectiveness of their conditioning to the single
US, we hope to discover what properties of the CS (and, eventually,
the US), as opposed to the CS-US pair, determine the effectiveness
of conditioning.
Probably the simplest experiment comparing two CSs is a
demonstration of the phenomenon of overshadowing. In this
experiment we take two CSs, CS1 and CS2, which, during training,
are always presented together. After training we measure the
strength of conditioning to the stimuli CS1 and CS2 presented
individually. We typically find that the strength of conditioning to
each CS depends on their relative intensity. If CS1 is a dim light
and CS2 a bright light then, after conditioning to the CS1-CS2
combination, the CR to the bright light is very strong while the
dim light alone produces little or no reaction. We refer to the
general perceived strength of stimuli as their salience. Although
it might be related to the physically measurable intensity of
stimuli, salience refers to the intensity of the subjective
experience of stimuli, not to the objective intensity of the stimuli
themselves. Some more examples make this clear. The two CSs
need not even be in the same modality; a loud tone might
overshadow a dim light, for example. We cannot equate the
physical intensities of sound and light, yet it is reasonable to
characterise their relative subjective salience. In this situation,
then, it doesn't make sense to say that the strength of
conditioning between a CS and a US depends on the intensity of
the CS. If we are to derive general principles of classical
conditioning we must say that the strength of CS-US learning
depends on the salience of the CS. Salience, as subjective
experience, varies between individuals and, more importantly,
between species. We, who have good colour vision, might find a red
light more salient than a green light of equal intensity, while a rat,
whose colour vision is far more limited than ours, may not be
affected by the difference in hue at all. Salience depends on some
combination of the physical characteristics of stimuli and of the
sensory systems of the perceiver.
Blocking.
In addition to variations in the subjective characteristics of
stimuli, we can also investigate variations in an animal's history of
experiencing them. Suppose once again that we use two stimuli, CS1
and CS2, which will be paired with a single US. Rather than
presenting the two CSs together throughout the training of an animal,
we use only CS1 in the first half of training and then, just as we
would in an overshadowing experiment, use the CS1-CS2 combination for
the second half of training. We could, for example, give an animal
50 trials in which it experiences a CS1-US combination followed by 50
in which it experiences a CS1-CS2-US combination. The result, in
general, is that when CS1 and CS2 are subsequently tested
individually, the animal shows strong conditioning to CS1 and little
or no conditioning to CS2. The effect where the prior pairing of one
stimulus with a US prevents the US from becoming associated with
other stimuli presented subsequently is called blocking.
Procedure for a simple blocking experiment:
Group       First 50 trials   Second 50 trials   Test
Blocking    CS1-US            CS1-CS2 and US     CS2 alone
Control     nothing           CS1-CS2 and US     CS2 alone
To control for the possible confounding effects of overshadowing, we
could also run experiments in which the roles of CS1 and CS2 were
reversed, so that CS2 was paired by itself with the US for 50 trials
before the 50 trials of training with the CS1-CS2 compound
stimulus. In this experiment we would normally expect strong
conditioning to CS2 and little conditioning to CS1. These results are
quite robust: if we ran another group to examine overshadowing, with
50 compound CS1-CS2 trials and no earlier experience of either CS
paired alone with the US, we would normally find that the effect of
prior pairing of one stimulus with the US in the other groups
outweighs any overshadowing effect. In fact, to produce an experiment
whose results are unambiguous we really need a control group which
experiences the same stimuli as the experimental 'blocking' group,
but in which CS1 is not predictive of the US in the first phase.
Blocking and predictability experiment:
Group           Phase 1                  Phase 2
Correlated      Correlated CS1 and US    CS1-CS2 and US
Uncorrelated    Uncorrelated CS1 and US  CS1-CS2 and US
Overshadowing   US alone                 CS1-CS2 and US
No US           CS1 alone                CS1-CS2 and US
We then go on to test the strength of association between CS2 and
the US using a suppression ratio procedure. Rescorla carried out an
experiment like this in 1971 (although his descriptions of the
groups and their names differ a little from mine) and found the
results we expected: the strength of conditioning to CS2 acquired
during phase 2 was much weaker in the group which had received
prior correlated pairings of CS1 and the US than in the groups
which had received no prior pairings (the overshadowing group) or
random presentations of CS1 and the US (the uncorrelated group).
Finally, the group which had received CS1 alone with no US in
phase 1 showed even stronger conditioning to CS2 than the
overshadowing or uncorrelated controls. We will return to this last
group later.
Rescorla and Wagner's 1972 model of classical conditioning.
How can we explain these results? Both Rescorla and Leon Kamin,
who originally discovered and named blocking, settled on the
explanation that associations are only learned when a surprising
event accompanies a CS. According to this theory in a normal
simple conditioning experiment the US is surprising the first few
times it is experienced so it is associated with salient stimuli
which immediately precede it. In a blocking experiment, once the
association between the CS (CS1) presented in the first phase of
the procedure and the US has been made, the US is no longer
surprising (since it is predicted by CS1). In the second phase,
where both CS1 and CS2 are experienced, as the US is no longer
surprising it does not induce any further learning and so no
association is made between the US and CS2. This explanation was
presented by Rescorla and Wagner in 1972 as a formal model of
conditioning which expresses the capacity a CS has to become
associated with a US at any given time. The strength of the
association between a CS and the US is referred to by the letter V,
and the change in this strength which occurs on each trial of
conditioning is called dV. Our informal explanation boils down to the
notion that the more a CS is associated with a US, the less additional association
the US can induce. We can express our informal explanation of the
role of US surprise and of CS (and US) salience in the process of
conditioning as follows:
dV = ab(L - V)
where a is the salience of the US, b is the salience of the CS and L
is the amount of processing (attention?) given to a completely
unpredicted US. Let us go through the implications of this equation
in detail. When the US is first encountered the CS has no association
to it so V is zero. On the first trial the CS gains a strength of abL
in its association with the US which is proportional to the saliences
of the CS and the US and to the initial amount of processing given to
the US. As we start trial two the associative strength V is abL, so
the change in strength that occurs with the second pairing of the CS
and US is ab(L - abL). It is smaller than the amount learned on the
first trial and this reduction in amount that is learned reflects the
fact that the CS now has some association with the US, so the US is
less surprising. As more trials ensue the equation predicts a
gradually decreasing rate of learning, with V approaching an asymptote at L.
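The trial-by-trial behaviour of dV = ab(L - V) for a single CS can be sketched in a few lines of Python. This is a minimal illustration rather than Rescorla and Wagner's own formulation; the function name and the parameter values (a = 0.5, b = 0.4, L = 1.0) are hypothetical choices made only so that the numbers are easy to follow.

```python
def rescorla_wagner_single(a, b, L, n_trials):
    """Associative strength V after each of n_trials CS-US pairings.

    Uses the lecture's notation: a = US salience, b = CS salience,
    L = asymptote. Assumes 0 < a*b <= 1 so that V rises smoothly towards L.
    """
    V = 0.0
    history = []
    for _ in range(n_trials):
        dV = a * b * (L - V)  # learning is proportional to how surprising the US still is
        V += dV
        history.append(V)
    return history


strengths = rescorla_wagner_single(a=0.5, b=0.4, L=1.0, n_trials=10)
print([round(v, 3) for v in strengths[:3]])  # [0.2, 0.36, 0.488]
```

The second increment is 0.2 x (1.0 - 0.2) = 0.16, which is ab(L - abL) as in the text, and every later increment is smaller still while V creeps up towards L.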
Unfortunately this isn't exactly what is seen when the development
of CS-US associations is measured over time. Instead we see a slower
start to learning, followed by a lot of learning which tails off quite
quickly. This might seem to undermine the point of going to the
trouble of formalising our understanding of classical conditioning
into an equation; after all, what is the point if the equation can't
predict things? Rescorla has argued that the equation is consistent
with observed behaviour if one assumes that very small changes in
associative strength are undetectable and that there is a limit to
the amount of effect that very large changes can have on behaviour.
It is not, however, this kind of prediction which makes the
Rescorla-Wagner equation a truly worthwhile bit of formalism; we will
see something I find really impressive later.
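Rescorla's argument can be illustrated by passing the model's smooth learning curve through a response rule with a detection threshold and a ceiling. The function observed_response and its threshold and ceiling values are hypothetical assumptions introduced only for this sketch; nothing in the lecture specifies how associative strength maps onto a measured CR.

```python
def observed_response(V, threshold=0.3, ceiling=0.8):
    """Hypothetical mapping from associative strength V to a measured CR in [0, 1].

    Strengths at or below `threshold` are assumed to be undetectable; above it
    the response grows linearly with V until it saturates at `ceiling`.
    """
    if V <= threshold:
        return 0.0
    return min(V - threshold, ceiling - threshold) / (ceiling - threshold)


ab, L = 0.1, 1.0
# Closed form of repeated dV = ab(L - V) starting from V = 0: V_n = L(1 - (1 - ab)^n)
V_curve = [L * (1 - (1 - ab) ** n) for n in range(1, 21)]
responses = [observed_response(V) for V in V_curve]
print([round(r, 2) for r in responses[:8]])  # [0.0, 0.0, 0.0, 0.09, 0.22, 0.34, 0.44, 0.54]
```

The model's own increments shrink from the very first trial, but under these assumptions the simulated response stays flat for three trials, then climbs steeply before stopping abruptly at the ceiling.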
The equation can also be applied to a number of CSs, each of which
contributes to the overall associative strength V on the right-hand
side of the equation. It is reasonably clear that the presence of the
CS salience term b in the equation lets it account for overshadowing.
The meaning of the equation is clearest if we think of the specific
dVs on the left-hand side as referring to the increments in the
association of specific CSs with the US, while V on the right-hand
side refers to the predictability of the US and so is the sum of all
the different CS-US associations. If we denote the conditioning
strength accrued to CS1 by dV1 and that to CS2 by dV2 then our
equations are
dV1 = ab1(L - V)
dV2 = ab2(L - V)
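where b1 and b2 are the saliences of CS1 and CS2, and both dV1 and
dV2 accrue to V on each trial. The amount of association directed to
each CS is simply proportional to its salience, so a highly salient CS
captures most of the available associative strength and overshadows a
less salient partner.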
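The compound form can be sketched in the same way. In this hypothetical example a bright light (salience 0.5) and a dim light (salience 0.1) are always presented together with the US; the names and numbers are illustrative assumptions. Both CSs are updated from the same summed prediction V on each trial, as in the dV1 and dV2 equations.

```python
def train_compound(saliences, a, L, n_trials):
    """Rescorla-Wagner learning when every CS in `saliences` is presented together.

    saliences maps each CS name to its salience b_i; a is the US salience and
    L the asymptote. Returns the associative strength accrued by each CS.
    """
    V = {cs: 0.0 for cs in saliences}
    for _ in range(n_trials):
        total = sum(V.values())           # V: how well the compound predicts the US
        for cs, b in saliences.items():
            V[cs] += a * b * (L - total)  # dV_i = a * b_i * (L - V)
    return V


V = train_compound({"bright light": 0.5, "dim light": 0.1}, a=0.5, L=1.0, n_trials=50)
print({cs: round(v, 3) for cs, v in V.items()})  # {'bright light': 0.833, 'dim light': 0.167}
```

Because both CSs share the same error term (L - V), the bright light ends up with five times the associative strength of the dim light, matching the ratio of their saliences.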
The equation also models blocking well. During the initial phase
of a blocking experiment CS1 alone is paired with the US until V is
close to L. Later, when CS2 is presented along with CS1, the term
(L - V) is already close to zero, so there is very little associative
strength left for CS2 to gain.
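The blocking procedure in the first table can be run through the same update rule. This sketch assumes equal CS saliences of 0.3, a US salience of 0.5 and L = 1.0; the helper name train is hypothetical, and only the CSs present on a trial are updated.

```python
def train(V, present, saliences, a, L, n_trials):
    """Apply n_trials of Rescorla-Wagner learning to the CSs listed in `present`.

    V holds every CS's associative strength and is updated in place; CSs that
    are absent on these trials are left unchanged.
    """
    for _ in range(n_trials):
        total = sum(V[cs] for cs in present)  # prediction of the US from the CSs present
        for cs in present:
            V[cs] += a * saliences[cs] * (L - total)
    return V


saliences = {"CS1": 0.3, "CS2": 0.3}
a, L = 0.5, 1.0

blocking = {"CS1": 0.0, "CS2": 0.0}
train(blocking, ["CS1"], saliences, a, L, 50)         # first 50 trials: CS1 and US
train(blocking, ["CS1", "CS2"], saliences, a, L, 50)  # second 50 trials: CS1-CS2 and US

control = {"CS1": 0.0, "CS2": 0.0}
train(control, ["CS1", "CS2"], saliences, a, L, 50)   # compound trials only

print(round(blocking["CS2"], 3), round(control["CS2"], 3))  # 0.0 0.5
```

In the blocking group CS1 has already absorbed almost all of L before CS2 appears, so CS2 gains next to nothing, whereas in the control group the two equally salient CSs split L between them.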
The power of formal models.
Modelling blocking and overshadowing is all very well, but the
equation was specifically set up to model these phenomena so we
shouldn't be too impressed. What is impressive is the prediction of
results so counter-intuitive that we could not predict them without
a formal model: an equation. Here is an example in
which the model predicts the effects of pairing two previously learned
CSs on learning about a third new stimulus.
If on separate occasions (not as compound stimuli) two CSs of equal
salience have each been completely associated with a US, then V=L for
each stimulus and dV on subsequent trials is zero for both. We now
present a third CS in conjunction with the original pair, so we are now
presenting three CSs together whereas we've only presented two of them
singly in the past. The overall associative strength of the US is now
2L, a contribution of L from each of the original CSs. The equation
predicts that there will be a negative change in associative strength
on this trial for each CS present, proportional to its salience:
dV = ab(L - 2L)
dV = -abL
This is probably not what we would predict intuitively, yet it reflects
what happens. Because the third CS starts with no association to the
US, this negative change gives it a negative associative strength: it
becomes a conditioned inhibitor of the US, provoking a CR of the
opposite quality to that produced by the other two CSs. We will
discuss this sort of conditioned inhibition later when we deal with
what is actually being learned during classical conditioning.
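The calculation above can be checked numerically. This sketch assumes equal saliences of 0.3 for all three CSs, a US salience of 0.5 and L = 1.0, with hypothetical helper and variable names. Continuing to present the compound after the first trial goes beyond the lecture's single-trial calculation; under these assumptions the model settles with CS1 and CS2 at about 2L/3 and CS3 at about -L/3.

```python
def train(V, present, saliences, a, L, n_trials):
    """Apply n_trials of Rescorla-Wagner learning to the CSs listed in `present`."""
    for _ in range(n_trials):
        total = sum(V[cs] for cs in present)  # summed prediction from the CSs present
        for cs in present:
            V[cs] += a * saliences[cs] * (L - total)
    return V


saliences = {"CS1": 0.3, "CS2": 0.3, "CS3": 0.3}  # equal saliences, as in the text
a, L = 0.5, 1.0
V = {"CS1": 0.0, "CS2": 0.0, "CS3": 0.0}

train(V, ["CS1"], saliences, a, L, 100)  # CS1 trained on its own until V is close to L
train(V, ["CS2"], saliences, a, L, 100)  # CS2 trained separately until V is close to L

train(V, ["CS1", "CS2", "CS3"], saliences, a, L, 1)  # one compound trial: summed V is about 2L
after_one_trial = V["CS3"]                            # about -a*b*L = -0.15

train(V, ["CS1", "CS2", "CS3"], saliences, a, L, 200)  # keep presenting the compound
print(round(after_one_trial, 3), round(V["CS3"], 3))   # -0.15 -0.333
```

After a single compound trial CS3's strength is -abL, exactly as the equation predicts, and with continued training it stays negative.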
Predictability or surprise?
The Rescorla-Wagner model is, however, not perfect. If we return to
Rescorla's experiment we begin to see why. The explanation for the
final 'superconditioning' result of Rescorla's experiment is rather
tricky. During phase 1 of the experiment the 'No US' group is
undergoing the simplest type of learning there is, habituation: they
experience CS1 many times and nothing special happens, so they learn
that nothing in particular is associated with it. Rescorla argues
that the 'No US' group learns in the first phase of the experiment
that CS1 is a predictor of 'no US', and hence that, when it is
followed by a US in phase 2, this US is even more surprising than it
would normally have been and so provokes especially strong learning.
His own model, however, predicts that there should be no change in
the associative strength of a stimulus when there is no US, and it
cannot easily be stretched to cover this case. First, it is not very
logical to assign an amount of processing to a non-event just because
that non-event is unpredicted. Second, Rescorla's model revolves around
the surprisingness of specific USs, and 'no US' must be a different
US from 'US', so prior exposure to a good predictor of 'no US' should
not affect the amount of processing devoted to the different US 'US'.
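A sketch of how the Rescorla-Wagner equation handles three of Rescorla's groups makes the problem concrete. It assumes, as is usual for the model but not stated in the lecture, that L is 0 on trials without a US, and it ignores background or contextual cues, which is why the uncorrelated group is left out. The 50-trial phases, the salience values and the helper names are hypothetical.

```python
def train(V, present, saliences, a, L, n_trials):
    """Apply n_trials of Rescorla-Wagner learning to the CSs listed in `present`."""
    for _ in range(n_trials):
        total = sum(V[cs] for cs in present)
        for cs in present:
            V[cs] += a * saliences[cs] * (L - total)
    return V


saliences = {"CS1": 0.3, "CS2": 0.3}
a = 0.5
US, NO_US = 1.0, 0.0  # assumed value of L with and without a US


def cs2_strength(phase1_present, phase1_L):
    """Strength of CS2 after a 50-trial phase 1 and a common 50-trial phase 2."""
    V = {"CS1": 0.0, "CS2": 0.0}
    train(V, phase1_present, saliences, a, phase1_L, 50)  # phase 1 differs between groups
    train(V, ["CS1", "CS2"], saliences, a, US, 50)        # phase 2: CS1-CS2 and US
    return V["CS2"]


results = {
    "Correlated": cs2_strength(["CS1"], US),  # CS1 paired with the US
    "Overshadowing": cs2_strength([], US),    # US alone: no CS present, nothing learned
    "No US": cs2_strength(["CS1"], NO_US),    # CS1 alone
}
print({group: round(v, 3) for group, v in results.items()})
# {'Correlated': 0.0, 'Overshadowing': 0.5, 'No US': 0.5}
```

Under these assumptions the model reproduces blocking in the correlated group but gives the 'No US' group exactly the same CS2 strength as the overshadowing group, so it cannot produce the superconditioning Rescorla observed.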
For these and other reasons, a series of more sophisticated models
have subsequently been developed in which the rate of learning is not
driven by the 'surprisingness' of the US (as in the L - V term of the
Rescorla-Wagner model) but by terms which represent the predictive
power of individual CSs independently (for example Mackintosh's 1975
model). I won't go into the formalism of these models, but will
quickly outline how they might deal with Rescorla's
'superconditioning' result. In this sort of model a CS which had been
experienced many times unpaired with a significant US would be
evaluated as having less than average predictive power, so in phase 2
it would attract little of the available learning, leaving more for
CS2. If, however, the CS had been paired with a different US during
phase 1 of Rescorla's experiment, then it should be evaluated as
having predictive power and hence still be associable with the US
used in phase 2, reducing the 'superconditioning' of CS2 found
previously. Tony Dickinson reported just such an effect in 1976.
That is probably enough for now on the processes that might underlie
classical conditioning. There is much more we could explore, but I
hope that today's examples have shown you that there is a lot more to
it than the simple co-occurrence of a CS and US. In the models I've
discussed today the history of an animal's experience determines how
it processes the information about CSs and USs available to it and how
it then associates them together. As these models become more
sophisticated it becomes clearer that discussions of surprisingness
and predictability can quite naturally be interpreted as referring to
the way in which an animal's experiences modify its expectations of
events and the amount of attention it devotes to different stimuli in
its environment. This is a very cognitive approach to animal learning
compared to the rather arid behaviourist explanations often proffered.
It is also a long way from Pavlov's early interpretation of
conditioning as a physiological rather than a psychological process.
Sources.
This lecture drew about equally on chapter 4 of Schwartz and chapters 2
and 4 of Tony Dickinson's 'Contemporary Animal Learning Theory'. I've
changed the names of most of the experimental groups and variables in
the equations in order to be consistent and to avoid the use of Greek
symbols!