Bob Kentridge 1995

Comparative Psychology: Lecture 4.

Predictability or surprise?

At the end of the last lecture we arrived at the conclusion that the predictability of the CS and US as a pair of events, rather than their temporal contiguity or even their frequency of co-occurrence, was a crucial factor in determining the strength of conditioning that would occur between them. Today we take this approach, dissecting the details of the processes underlying classical conditioning, much further. We will try to tease out the role of the predictability or surprisingness of stimuli in classical conditioning. In order to do so we need to look at two other classical conditioning phenomena first.

Overshadowing.

In trying to explain the classical conditioning process we will be looking at evidence from experiments where two (or even more) CSs are presented with a single US. By varying aspects of the CSs and comparing the effectiveness of their conditioning to the single US we hope to discover what properties of the CS (and, eventually, the US), as opposed to the CS-US pair, determine the effectiveness of conditioning. Probably the simplest experiment comparing two CSs is a demonstration of the phenomenon of overshadowing. In this experiment we take two CSs, CS1 and CS2, which, during training, are always presented together. After training we measure the strength of conditioning to the stimuli CS1 and CS2 presented individually. We typically find that the strength of conditioning to each CS depends on their relative intensity. If CS1 is a dim light and CS2 a bright light then, after conditioning to the CS1-CS2 combination, the CR to the bright light is very strong while the dim light alone produces little or no reaction.

We refer to the general perceived strength of stimuli as their salience. Although it might be related to the physically measurable intensity of stimuli, salience refers to the intensity of the subjective experience of stimuli, not to the objective intensity of the stimuli themselves. Some more examples make this clear. The two CSs need not even be in the same modality: a loud tone might overshadow a dim light, for example. We cannot equate the physical intensities of sound and light, yet it is reasonable to characterise their relative subjective salience. In this situation, then, it doesn't make sense to say that the strength of conditioning between a CS and a US depends on the intensity of the CS. If we are to derive general principles of classical conditioning we must say that the strength of CS-US learning depends on the salience of the CS. Salience, as subjective experience, varies between individuals and, more importantly, between species. We, who have colour vision, might find a red light more salient than a green light of equal intensity, while a rat, which only has monochromatic vision, will find them equally salient. Salience depends on some combination of the physical characteristics of stimuli and of the sensory systems of the perceiver.

Blocking.

In addition to variations in the subjective characteristics of stimuli we can also investigate variations in the history of these experiences. Suppose once again that we use two stimuli, CS1 and CS2, which will be paired with a single US. Rather than presenting the two CSs together throughout the training of an animal, we use only CS1 in the first half of training and then use the CS1-CS2 combination, just as we would in an overshadowing experiment, for the second half of training. We could, for example, give an animal 50 trials in which it experiences a CS1-US combination followed by 50 in which it experiences a CS1-CS2-US combination. The result, in general, is that when subsequently tested with each CS individually the animal will show strong conditioning to CS1 and little or no conditioning to CS2. The effect where the prior pairing of one stimulus with a US stops the US being associated with other subsequently presented stimuli is called blocking.

Procedure for a simple blocking experiment:

Group Name    1st 50 trials    2nd 50 trials     Test

Blocking      CS1-US           CS1-CS2 and US    CS2 alone
Control       nothing          CS1-CS2 and US    CS2 alone

To control for the possible confounding effects of overshadowing we could also run experiments in which the roles of CS1 and CS2 were reversed, so that CS2 was paired by itself with the US for 50 trials before the 50 trials of training with the CS1-CS2 compound stimulus. In this experiment we would normally expect strong conditioning to CS2 and little conditioning to CS1. These results are quite robust. If we ran another group to examine overshadowing, with 50 compound CS1-CS2 trials and no earlier experience of either CS paired alone with the US, we would normally find that the effect in the other groups of experiencing one or other stimulus along with the US outweighs any overshadowing effect we might find. In fact, to produce an experiment in which there is no ambiguity in the results we really need a control group which experiences the same stimuli as the experimental 'blocking' group, but in which CS1 is not predictive of the US in the first phase.

Blocking and predictability experiment:

Group Name       Phase 1                    Phase 2

Correlated       Correlated CS1 and US      CS1-CS2 and US
Uncorrelated     Uncorrelated CS1 and US    CS1-CS2 and US
Overshadowing    US alone                   CS1-CS2 and US
No US            CS1 alone                  CS1-CS2 and US

We then go on to test the strength of association between CS2 and the US using a suppression ratio procedure. Rescorla carried out an experiment like this in 1971 (although his descriptions of the groups and their names differ a little from mine). He found the results we expected: the strength of conditioning to CS2 acquired during phase 2 was much weaker in the group which had received prior correlated pairings of CS1 and the US than in the group which had received no prior pairings (the overshadowing group) or the group which had received random presentations of CS1 and the US. Finally, the group which had received CS1 alone with no US in phase 1 showed even stronger conditioning to CS2 than the overshadowing or uncorrelated controls. We will return to this last group later.

[Figure: Rescorla's 1971 results.]

Rescorla and Wagner's 1972 model of classical conditioning.

How can we explain these results? Both Rescorla and Leon Kamin, who originally discovered and named blocking, settled on the explanation that associations are only learned when a surprising event accompanies a CS. According to this theory in a normal simple conditioning experiment the US is surprising the first few times it is experienced so it is associated with salient stimuli which immediately precede it. In a blocking experiment once the association between the CS (CS1) presented in the first phase of the procedure and the US has been made the US is no longer surprising (since it is predicted by CS1). In the second phase, where both CS1 and CS2 are experienced, as the US is no longer surprising it does not induce any further learning and so no association is made between the US and CS2. This explanation was presented by Rescorla and Wagner in 1972 as a formal model of conditioning which expresses the capacity a CS has to become associated with a US at any given time. This associative strength of the US to the CS is referred to by the letter V and the change in this strength which occurs on each trial of conditioning is called dV. Our informal explanation boils down to the notion that the more a CS is associated with a US the less additional association the US can induce. We can express our informal explanation of the role of US surprise and of CS (and US) salience in the process of conditioning as follows:

dV = ab(L - V)

where a is the salience of the US, b is the salience of the CS and L is the amount of processing (attention?) given to a completely unpredicted US. Let us go through the implications of this equation in detail. When the US is first encountered the CS has no association to it, so V is zero. On the first trial the CS gains a strength of abL in its association with the US, which is proportional to the saliences of the CS and the US and to the initial amount of processing given to the US. As we start trial two the associative strength V is abL, so the change in strength that occurs with the second pairing of the CS and US is ab(L - abL). It is smaller than the amount learned on the first trial, and this reduction in the amount that is learned reflects the fact that the CS now has some association with the US, so the US is less surprising. As more trials ensue the equation predicts a gradually decreasing rate of learning, with V approaching an asymptote at L. Unfortunately this isn't exactly what is seen when the development of CS-US associations is measured over time. Instead we see a slower start to learning, followed by a lot of learning which tails off quite quickly. This appears rather to undermine the trouble of formalising our understanding of classical conditioning into an equation - after all, what is the point if the equation can't predict anything? Rescorla has argued that the equation is consistent with observed behaviour if one assumes that very small changes in associative strength are undetectable and that there is a limit to the amount of effect that very large changes can have on behaviour. It is not, however, this kind of prediction which rescues the Rescorla-Wagner equation as a truly worthwhile bit of formalism - we will see something I find really impressive later.
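
To make the trial-by-trial arithmetic concrete, here is a minimal Python sketch that simply applies dV = ab(L - V) over and over. The function name acquisition_curve and the values a = 0.5, b = 0.4 and L = 1.0 are assumptions chosen purely for illustration; they are not taken from any experiment.

```python
def acquisition_curve(a, b, L, n_trials):
    """Return the associative strength V after each CS-US pairing.

    a: salience of the US, b: salience of the CS,
    L: processing given to a completely unpredicted US.
    """
    V = 0.0                      # no association before the first trial
    history = []
    for _ in range(n_trials):
        dV = a * b * (L - V)     # the less surprising the US, the smaller dV
        V += dV
        history.append(V)
    return history


# Illustrative (assumed) parameter values.
curve = acquisition_curve(a=0.5, b=0.4, L=1.0, n_trials=20)
for trial, V in enumerate(curve[:5], start=1):
    print(f"trial {trial}: V = {V:.3f}")
```

With these assumed values the first trial adds abL = 0.2 and the second adds ab(L - abL) = 0.16, and each later increment is smaller still as V creeps towards L. This is the idealised curve the equation predicts, not the slow-start curve that is actually observed.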

[Figure: ideal and real CS-US acquisition curves.]

The equation can also be applied to a number of CSs, each of which contributes to the overall associative strength V of the US on the right hand side of the equation. It is reasonably clear that the presence of the CS salience term b in the equation lets it account for overshadowing. The meaning of the equation is clearest if we think of the specific dVs on the left hand side as referring to the increments in association between specific CSs and the US, while V on the right hand side refers to the predictability of the US and so is the sum of all the different CS-US associations. If we denote the conditioning strength accrued on a trial to CS1 by dV1 and that to CS2 by dV2 then our equations are

dV1 = ab1(L - V)
dV2 = ab2(L - V)

and both dV1 and dV2 accrue to V on each trial. The amount of association directed to each CS is simply proportional to their salience. The equation also models blocking well. During the initial phase of a blocking experiment the associative strength of the US is increased so later, when a second CS is presented the amount of associative strength it can gain has been reduced.
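
The two-CS version of the equation can be sketched in a few lines of Python. This is only an illustration under assumed values: the helper name train, the saliences, and the choice of 50 trials per phase (borrowed from the blocking procedure described earlier) are mine, and the sketch assumes every CS present on a trial is updated from the same summed V.

```python
def train(V, present, b, a=0.5, L=1.0, n_trials=50):
    """Apply dVi = a * b[i] * (L - V) to every CS in `present` for n_trials.

    V is a dict of associative strengths and is updated in place.
    The V on the right hand side is the sum over the CSs present.
    """
    for _ in range(n_trials):
        total = sum(V[cs] for cs in present)   # predictability of the US
        for cs in present:
            V[cs] += a * b[cs] * (L - total)
    return V


# Overshadowing: CS2 (assumed salience 0.4) is trained with CS1 (0.1).
over = train({"CS1": 0.0, "CS2": 0.0}, ["CS1", "CS2"], {"CS1": 0.1, "CS2": 0.4})

# Blocking: equal saliences, but CS1 is paired alone with the US first.
equal = {"CS1": 0.3, "CS2": 0.3}
blocked = train({"CS1": 0.0, "CS2": 0.0}, ["CS1"], equal)
blocked = train(blocked, ["CS1", "CS2"], equal)

# Control: compound training only.
control = train({"CS1": 0.0, "CS2": 0.0}, ["CS1", "CS2"], equal)

print("overshadowing:", over)
print("blocking:     ", blocked)
print("control:      ", control)
```

In the overshadowing run CS2 ends up with about four times the strength of CS1, matching the ratio of their saliences. In the blocking run CS2 gains almost nothing, because V is already close to L when the compound trials begin, whereas in the control run CS2 reaches about half of L.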

The power of formal models.

Modelling blocking and overshadowing is all very well, but the equation was specifically set up to model these phenomena so we shouldn't be too impressed. What is impressive is the prediction of some results which are so counter-intuitive that we could not predict them without a formal model - an equation. Here is an example in which the model predicts the effects of pairing two previously learned CSs on learning about a third, new stimulus. If on separate occasions (not as compound stimuli) two CSs of equal salience have both been completely associated with a US then V=L for both stimuli and dV on subsequent trials is zero for both. We now present a third CS in conjunction with the original pair, so we are now presenting three CSs together whereas we've only presented two of them singly in the past. The overall associative strength of the US is now 2L, a contribution of L from each of the original CSs. The equation predicts that there will be a negative change in associative strength on this trial proportional to the salience of the CSs:

dV = ab(L - 2L)
dV = -abL

This is probably not what we would predict intuitively, yet it reflects what happens - the third stimulus becomes a conditioned inhibitor of the US - it provokes a CR of the opposite quality to that produced by the other two CSs. We will discuss this sort of conditioned inhibition later when we deal with what is actually being learned during classical conditioning.
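
The same prediction can be checked numerically. The Python sketch below starts from the situation just described (two CSs already at V = L, a third at zero) and keeps applying the equation to the three-CS compound. The values of a, b and L and the number of trials are assumptions for illustration, and the sketch assumes the US continues to be presented on every compound trial.

```python
a, b, L = 0.5, 0.3, 1.0    # assumed values; all three CSs have equal salience b

# CS1 and CS2 were each trained alone to asymptote; CS3 is new.
V = {"CS1": L, "CS2": L, "CS3": 0.0}

first_change = None
for trial in range(100):
    total = sum(V.values())        # 2L on the first compound trial
    dV = a * b * (L - total)       # negative while the US is over-predicted
    if first_change is None:
        first_change = dV          # the -abL of the worked example
    for cs in V:
        V[cs] += dV                # every CS present changes by the same dV

print(f"change on first compound trial: {first_change:.3f}")
print({cs: round(v, 3) for cs, v in V.items()})
```

The first compound trial changes each CS by -abL, so CS3, which started at zero, immediately takes on a negative strength. If training simply continues under these assumptions the summed strength settles back at L, with CS3 remaining negative; that negative value is the model's version of conditioned inhibition.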

Predictability or surprise?

The Rescorla-Wagner model is, however, not perfect. If we return to Rescorla's experiment we begin to see why. The explanation for the final 'superconditioning' part of Rescorla's experiment is rather tricky. During phase 1 of the experiment the 'No US' group are undergoing the simplest type of learning there is - habituation. They experience CS1 a lot of times and nothing special happens. The process of learning that nothing in particular is associated with a CS is called habituation. Rescorla argues that the 'No US' group learn in the first phase of the experiment that CS1 is a predictor of 'no US' and hence that, when it is followed by a US in phase 2, this US is even more surprising than it would have been normally, so it provokes especially strong learning. His own model, however, predicts that there should be no change in the associative strength of the stimulus when there is no US. First, it is not very logical to assign an amount of processing devoted to a non-event if that non-event is unpredicted. Second, Rescorla's model revolves around the surprisingness of specific USs - and 'no US' must be a different US from 'US', so prior exposure to a good predictor of 'no US' should not affect the amount of processing devoted to a different US, 'US'. For these and other reasons a series of more sophisticated models have subsequently been developed in which the rate of learning is not driven by the 'surprisingness' of the US (as in the L - V term of the Rescorla-Wagner model) but by terms which represent the predictive power of individual CSs independently (for example Mackintosh's 1975 model). I won't go into the formalism of these models, but will quickly outline how they might deal with Rescorla's 'superconditioning' result. In this sort of model a CS which had been experienced many times unpaired with a significant US would be evaluated as having less than average predictive power. If, however, the CS had been paired with a different US during phase 1 of Rescorla's experiment then it should be evaluated as having predictive power and hence still be associable with a different US during phase 2, reducing the 'superconditioning' to the other CS previously found. Tony Dickinson reported just such an effect in 1976.

That is probably enough for now on the processes that might underlie classical conditioning. There is much more we could explore, but I hope that today's examples have shown you that there is a lot more to it than the simple co-occurrence of a CS and US - in the models I've discussed today the history of an animal's experience determines how it processes information about the CSs and USs available to it and how it then associates them together. As these models become more sophisticated it becomes clearer that discussions of surprisingness and predictability can be quite naturally interpreted as referring to the way in which an animal's experiences modify its expectations of events and the amount of attention it devotes to different stimuli in its environment. This is a very cognitive approach to animal learning compared to the rather arid behaviourist explanations often proffered. It is also a long way from Pavlov's early interpretation of conditioning as a physiological rather than a psychological process.

Sources.

This lecture drew about equally on chapter 4 of Schwartz and chapters 2 and 4 of Tony Dickinson's 'Contemporary Animal Learning Theory'. I've changed the names of most of the experimental groups and variables in the equations in order to be consistent and to avoid the use of Greek symbols!