Additional Guidance: Marking for Common Awards
The guidance below was circulated to all TEIs in June 2016. The guidance is not a change of policy, but rather additional information to help TEIs interpret and apply University policy in the context of the Common Awards. The guidance will be discussed at the Common Awards staff conference in July 2016. It will be revised in light of feedback received from TEIs. In the meantime, it is hoped that the draft guidance will be of use to staff in TEIs.
1. Marking is a matter of qualitative academic judgment, guided by formal criteria.
Qualitative academic judgments about student work cannot be reduced to formulae, or made a matter of ‘ticking boxes’. We do, nevertheless, provide detailed guidance to help markers translate their qualitative judgments into numerical marks, and to express those judgments in consistent language.
2. Our detailed assessment criteria are intended to be a helpful guide, not a straitjacket.
The University provides, in its core regulations, ‘Generic Assessment Criteria’ for degree-level work (§92). These provided a benchmark for the creation of more detailed criteria specific to the Common Awards. Those more detailed criteria were produced by the Common Awards Finished Product Group and have been revised by the Continuing Implementation Group. In other words, they were generated by representatives of the TEIs working together with Ministry Division. So they are not a regulation imposed by the University, but rather an attempt by the wider Common Awards community to provide helpful guidance that will support consistent marking practice across the TEIs. Feedback from TEIs on how helpful they are proving in practice will help us to refine them further.
3. Using the detailed assessment criteria is more art than science.
These detailed marking criteria are not designed to be used mechanically. Looking at the table of criteria in relation to a given piece of work, a marker may find that some rows are more applicable than others, that the implied classification of that piece of work is different in different rows, and that several different descriptions in some rows could plausibly be applied. No formula for combining all these factors into a single mark can substitute for good academic judgment, in the light of the learning outcomes for the module, the nature of the task being assessed, the kind of guidance that was given to students, the materials available to them, and so on. Nevertheless, the tables can assist a marker in calibrating her judgments against those of other markers, and can help her find language to express her judgments clearly to the student.
4. Our marking criteria provide a way of translating qualitative judgments into numerical marks.
The Common Awards marking criteria are qualitative, not quantitative. The vast majority of our marking reflects that. We do not decide that one essay is exactly 2.3 times as good as another essay, nor that a student has made 24% fewer errors than another in a given essay.
We use a numerical scale that is widely used in further and higher education, but the numbers themselves are purely conventional. We choose, for instance, to assign the boundary between upper second quality work and first class quality work the number 70. We could have assigned it the number 3, the number 270, or the number 3.8 × 10⁶⁷. The number 70 has no direct meaning: it does not mean that 70% of the learning outcomes were met; it does not mean that first class work is at least 70/40 or one and three quarter times as good as a bare pass.
5. The translation into numerical marks is intended to model our intuitive judgments about how qualitative judgments combine.
We have, however, picked these otherwise arbitrary numbers so that students who gain fairly straightforward profiles of marks over the course of their studies will normally end up with the overall classification that we intuitively deem they should. Here is a student with one high 2.2 mark, two low 2.1 marks, and one high 2.1 mark; turn those into numbers, take the average as required by our degree classification rules, and we’re looking at a 2.1 overall, which ‘seems about right’. ‘Seeming about right’ in the kinds of cases where we find it fairly easy to agree in our judgments is the only real test of whether the otherwise arbitrary mathematical rules that we have put in place for determining classifications are appropriate ones.
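The arithmetic behind that example can be sketched as follows. This is an illustration only: the four marks are hypothetical stand-ins for the profile described above, the modules are assumed to be equally weighted, and the band boundaries at 50 and 60 (and the label ‘Third’) are assumptions; only the boundary at 70 and the pass mark of 40 are stated in this guidance.

```python
from statistics import mean

# Hypothetical marks for the profile described above:
# one high 2.2, two low 2.1s, and one high 2.1.
marks = [58, 61, 62, 68]


def classify(average: float) -> str:
    """Map an average mark to a classification.

    The 70 and 40 boundaries come from the guidance; the 60 and 50
    boundaries and the 'Third' label are assumed for illustration.
    """
    if average >= 70:
        return "First"
    if average >= 60:
        return "2.1"
    if average >= 50:
        return "2.2"
    if average >= 40:
        return "Third"
    return "Fail"


# Simple unweighted average; real classification rules may weight modules.
average = mean(marks)
print(average, classify(average))  # prints: 62.25 2.1
```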
6. The system is also designed to extend to cases that are more difficult, where our intuition gives out.
In the case of a student whose marks are all over the place, for instance, we may well not have any agreed sense of what ‘seems about right’. So we trust the numbers and rules that have worked in more straightforward cases.
Using the Full Range of Marks
7. We do not ‘mark student work out of 100’, and the regular call to ‘use the full range of marks’ does not mean ‘marks should go all the way up to 100’.
There is no useful sense in which our marks are ‘percentages’. A piece of work that gets a mark of 60 has not got 60 things out of 100 right, or achieved six tenths of perfect clarity. We could decide that the highest possible mark was 76 or 80 or 92, and the mere fact that there are numbers between that highest mark and the number 100 would be, in itself, a completely uninteresting fact – and the common call to ‘use the full range of marks’ is meaningless if this is all that it rests on.
8. There are, however, good reasons for ensuring that we don’t confine our first-class marks to the low 70s.
It is appropriate to ask whether the range of marks that we do use for good first class work models our intuitive judgments well. Our rules for award classifications happen to give a prominent role to numerical averages, and this does mean that the width of the range of first class marks matters. If, for instance, all our first class marks are clustered into the 70-75 band while our second class marks remain spread over the 20 marks between 50 and 70, it becomes very easy for a second class mark to pull a student’s average down below 70, and very hard for a first class mark to pull it up. In such a case, a student might well get a whole range of the very highest first class marks we are prepared to give, and yet be pulled down by a few marks that are low-ish 2.1. Does that ‘seem about right’? If not – and successive Boards of Examiners in multiple universities have tended to say that it does not – then we need to use a wider range of first class marks in order to make our quantitative model of our qualitative judgments work better.
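A small numerical sketch may make the effect concrete. The marks below are invented for illustration, and a simple unweighted average is assumed: a student has four pieces of work that markers regard as the very best first class work, and three pieces at a low 2.1.

```python
from statistics import mean

FIRST_BOUNDARY = 70  # boundary between upper second and first class marks

# Hypothetical profile: three low-ish 2.1 marks ...
low_upper_seconds = [62, 62, 62]

# ... plus four top-quality pieces, marked under two different habits.
narrow_firsts = [75, 75, 75, 75]  # first class marks confined to 70-75
wide_firsts = [82, 82, 82, 82]    # the same work, using a wider range

narrow_average = mean(narrow_firsts + low_upper_seconds)
wide_average = mean(wide_firsts + low_upper_seconds)

print(f"{narrow_average:.2f}")  # prints: 69.43 (below the first class boundary)
print(f"{wide_average:.2f}")    # prints: 73.43 (above the first class boundary)
```

Under these assumed figures, the same qualitative judgments yield a different overall classification depending only on how wide a range of first class marks is used.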
9. Our marking scheme includes a band from 86–100 to recognise extraordinary work.
In line with many other universities, our assessment criteria contain guidance on marks all the way up to the 86–100 range. That has nothing to do with the false idea that we are ‘marking out of 100’, and therefore need to make sense of the numbers all the way up to the ‘top’. It is instead a way of allowing and encouraging us to recognise truly extraordinary work on the occasions when we meet it, and to give it a mark that will make a serious difference to the student’s overall grade. Marks in this range will be rare. The detailed Level 6 criteria for this band for ‘Essays and Other Written Assignment’, for instance, indicate that all such work will typically demonstrate ‘complete mastery’ of the question set, ‘extremely powerful, original argument’, ‘outstanding analysis’ and more.
And marks in this range will get rarer the higher you go. Work marked at 86 will be work where we judge that, on balance, yes, these amazing things can actually be said about this essay, though only just. Work much higher up the band will be work that clamours to be acknowledged in these terms, and could still be described in this way even if it were significantly worse. As a result, marks in the 90s tend to be very rare indeed, and it would be no surprise to attend several exam boards and not see any examples.
10. None of our marking is quantitative, so quantitative scores normally need to undergo conversion to become marks.
Some assignments, like multiple-choice tests, might produce quantitative scores. Those scores are not the marks for that assignment. In principle, they will always need to go through some process of conversion – where conversion involves making qualitative judgments about what is meant by different levels of achievement in a quantitatively scored test, and then translating those qualitative judgments into numerical marks in the normal way.
For instance, imagine that there is a multiple-choice test with 100 questions, which straightforwardly yields a score between 0 and 100. There is absolutely no reason, in principle, why the mark that the student should get for that assignment should be identical to this numerical score, because (for instance) there is no reason to think that an ability to get 40 questions out of 100 right on this test must match the qualitative criteria we have for gaining a pass mark, and no reason to think that an ability to get 70 right matches the qualitative criteria we have for first class work. It might be, rather, that on this simple multiple-choice questionnaire, we judge that a student needs to get a score of 80 out of 100 to pass (and so to deserve a mark of 40), but that if he or she gets the maximum score of 100 out of 100, he or she is achieving really excellent first class quality (and so deserves a mark of 85). Some formula will therefore be needed to convert the numerical score into an appropriate mark, in the light of these judgments.
This might well mean that marks from such a test are effectively capped. In the example just given, the test cannot yield marks above 85. That is not an awkward mathematical problem: it is a recognition that some forms of assignment simply don’t allow students to demonstrate the extraordinary levels of penetrating insight that we recognise with the very highest marks.
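One possible conversion for the example above is sketched here. It is not a prescribed formula. The two anchor points (a score of 80 giving a mark of 40, and a score of 100 giving a mark of 85) come from the example; the choice of straight-line interpolation between them, the straight-line scaling of scores below 80 down to a mark of 0, and the function name `convert_score` are all assumptions made for illustration.

```python
PASS_SCORE, PASS_MARK = 80, 40  # score judged to equate to a bare pass
MAX_SCORE, TOP_MARK = 100, 85   # full score judged to merit a mark of 85


def convert_score(score: float) -> float:
    """Convert a raw test score (0-100) into a mark.

    Assumes straight-line interpolation between the two anchor points,
    and straight-line scaling from 0 up to the pass mark below them.
    """
    if not 0 <= score <= MAX_SCORE:
        raise ValueError("score must be between 0 and 100")
    if score <= PASS_SCORE:
        # Below the pass threshold: scale 0..80 onto marks 0..40 (assumed).
        return score * PASS_MARK / PASS_SCORE
    # At or above the threshold: scale 80..100 onto marks 40..85.
    return PASS_MARK + (score - PASS_SCORE) * (TOP_MARK - PASS_MARK) / (
        MAX_SCORE - PASS_SCORE
    )


print(convert_score(80))   # prints: 40.0
print(convert_score(90))   # prints: 62.5
print(convert_score(100))  # prints: 85.0 (the cap for this test)
```

Whatever formula is chosen, the anchor points themselves remain qualitative judgments about what each level of achievement on the test represents.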
It is sometimes possible to design a test that yields quantitative scores in such a way that (i) getting a score of 40 does equate qualitatively to a pass; (ii) getting a score of 70 or more really does smell qualitatively of ‘first class’ work in a way that lower scores do not; (iii) it is very difficult indeed to get a score of more than 85; (iv) it is well-nigh impossible (or perhaps actually impossible) to get a score of more than 90. These scores might be actual percentages (i.e., the number might represent the proportion of items on the test that the student has got right), or (more plausibly) they might be yielded by some more complex numerical scoring system. But if they match the qualitative criteria in the way suggested, the mark awarded could be the same numerical value as the score.
If quantitative tests are used, it is good to design them in the way just described, so that conversion can be avoided – because score conversion can cause confusion and upset, if the reasons for it are not well understood by all involved. Where such test design is not possible, however, numerical scores do need to be converted.