A much simpler way of awarding exam results in exceptional circumstances

Students at Newham Collegiate Sixth Form queue to receive their A-Level results in London, August 2020. Photo by Tolga Akmen / AFP via Getty Images.

In summer 2020, Ofqual, the body responsible for regulating the award of general qualifications in England, put in place arrangements to enable grades for GCSEs, A levels and BTECs to be awarded when the normal end-of-course exams could not take place because of the COVID-19 pandemic. The outcomes of this process, which relied heavily on the now-infamous ‘algorithm’ or statistical model, led to much confusion and distress as pupils received grades sometimes many levels below those they had anticipated. Consequently, the Department for Education was forced to revert to the ‘centre-assessed grades’ which schools and colleges had submitted as the base data to which the algorithm was applied. These predictions were thereby made to bear a weight they were never designed to carry, raising further questions about comparability and reliability.

As the pandemic has continued and the safe options for carrying out assessment in summer 2021 have narrowed, the question of how to award fair grades has again risen to the top of the agenda. The challenge is to design a process which is transparent and robust, addresses the challenges and criticisms that emerged in 2020, but is also practical to implement. In the longer term, as we reflect on how the pandemic has made us change how we do many of the things we previously took for granted, we may also be able to use the insights gained from designing a new approach to explain pervasive misunderstandings about how the qualification system works, in particular the concept of ‘grade inflation’.

Lessons from the past

Three major difficulties seem to have been encountered during the summer 2020 session:

1) Confusion and misunderstanding about the role teachers’ predictions and ‘centre-assessed grades’ would play in the final determination of the results for individual pupils.

2) Overly optimistic assumptions about the ability of a statistical model (the so-called ‘algorithm’) to provide credible results for individual pupils.

3) An organisational mind-set which was largely premised on resolving potential anomalies rather late in the day.

Teachers as examiners

Ofqual’s own research provides a good account of the strengths and limitations of teachers’ contributions to the business of marking and grade setting. Reviewing the evidence and commenting on how ‘accurate’ teachers’ estimates can be, it distinguishes two dimensions:

- absolute accuracy: the ability of a teacher to estimate the actual grades that individual students will achieve; and

- relative accuracy: the ability of a teacher to estimate the rank order of students by their grades, ie their levels of achievement relative to each other.

Teachers turn out to be rather good at making judgements which correspond with examiners’ rank orders of pupils’ performance — the correlations between their judgements and those of the examiners are impressively high.

By comparison, they are less good at predicting the actual grades pupils will be awarded — the correlations are a good deal lower. This is perhaps less surprising as the process by which examiners determine the cut-offs between grade boundaries for different grades only emerges rather late in the day; indeed, prior to final standardisation meetings, most examiners themselves may be a bit uncertain how exactly marks will be turned into grades.
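The distinction between the two senses of accuracy can be made concrete with a toy example. The sketch below uses entirely hypothetical predicted and awarded grades for ten pupils (the numbers are invented for illustration, not drawn from Ofqual's data): the rank correlation between prediction and award is very high even though fewer than two-thirds of the individual grades match exactly.

```python
def ranks(values):
    """Average ranks (1-based), with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, 1-based
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson(xs, ys):
    """Pearson correlation; applied to ranks it gives Spearman's rho."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical predicted vs awarded grades for ten pupils (A* = 6 ... U = 0)
predicted = [6, 6, 5, 5, 4, 4, 3, 3, 2, 1]
awarded   = [6, 5, 5, 4, 4, 3, 3, 2, 2, 1]

relative = pearson(ranks(predicted), ranks(awarded))                  # rank order
absolute = sum(p == a for p, a in zip(predicted, awarded)) / len(predicted)

print(f"relative accuracy (rank correlation): {relative:.2f}")   # ~0.95
print(f"absolute accuracy (exact-grade match): {absolute:.0%}")  # 60%
```

The teacher here has placed every pupil in almost exactly the right order, yet 'mispredicts' four grades — typically because the eventual grade boundaries fell somewhere other than expected.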

There is an additional ambiguity about the ways in which predicted grades are developed. Predicted grades serve multiple purposes. They are used, amongst other things, to provide information on which higher education providers and future employers may make judgements about students’ potential as well as to motivate pupils by ‘stretching’ them. In both cases an element of optimism is built into the process. In the circumstances it is not particularly surprising that teachers’ predictions, generated under these conditions, do not correlate as strongly with the grades examiners eventually award.

On the other hand, there is growing evidence that teachers have become increasingly sophisticated in judging their students’ performance, not least because they have had an array of information about students’ prior attainment available to them. To the best of my knowledge, however, no research study has asked teachers to predict what grades their students will actually be given as opposed to those they feel they deserve. In an ideal world the two may be identical.

In short, in exceptional circumstances such as those faced in summer 2020, the most valid form of assessment is likely to be one which gives teachers a major role in determining the eventual outcomes but subjects their judgements to various challenges and constraints.

Benchmarking the standard

There has rightly been concern about ‘grade inflation’ diminishing the credibility of exam qualifications. The larger the number of top grades, the more sceptical the wider audience may become about their value. The challenge is to develop benchmarks that are fair to a particular cohort of pupils but do not stretch the bounds of credibility. As Ofqual has identified, the school’s own track record in recent years is probably the most obvious reference point.

There are problems, however, with this approach. Understandably, many schools will want to claim that they are on rising improvement trajectories and that their results are constantly getting better. Ofqual investigated schools’ trajectories and concluded that in most cases results tended to be stable from one year to the next; only a very small minority (less than one per cent) seemed to be on modestly rising trajectories when compared with previous years’ results.

These conclusions are supported by earlier independent research. Within fairly narrow bounds, schools’ performances tend to bounce around a bit from year to year. In the majority of cases it is almost impossible to extrapolate next year’s results from knowledge of this year’s with any degree of accuracy. They may go up a bit, they may go down a bit. Some schools, however, do seem to manage to improve their performances year on year, taking into account the changing nature of their intakes. To talk of a trend, however, requires a minimum of three years of upward movement. Only a very small minority of schools manage to achieve this. Most importantly, perhaps, it is rare for a school to sustain improvement into a fourth or subsequent year.

In short, these patterns suggest that it is not unreasonable to use the last three years of a school’s performance as a reference point. This is in fact what the Ofqual model did in 2020.

But this averaging out may not count for much in the eyes of a pupil who believes that they have been disadvantaged by a ‘bad year’ amongst previous cohorts in their school. To deal with this concern, in exceptional circumstances it is important to be seen to be generous in determining the appropriate benchmarks, which can be done by allowing schools themselves to choose their benchmark.

Schools will doubtless choose their ‘best year’ in the last three to benchmark their performance. In many cases, choices about which is ‘the best’ will not be self-evident but they can be reassured that the worst that will happen to them is that they plateau or mark time. To smooth out fluctuations in subjects with small numbers of students, it is probably necessary to combine results of any two out of the three years. The fact that each school will have made its own choice will considerably increase the likelihood of the benchmark being perceived as legitimate by students and parents.
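The benchmarking step above can be sketched in a few lines. Everything here is an assumption for illustration: the grade history is invented, and ‘best’ is taken to mean the highest share of grades at A*–B, whereas the post deliberately leaves the criterion to the school.

```python
from itertools import combinations

def top_share(dist):
    """Share of entries at A*-B in a {grade: count} distribution."""
    total = sum(dist.values())
    return sum(dist.get(g, 0) for g in ("A*", "A", "B")) / total

def combine(*dists):
    """Pool grade counts across years (to smooth small cohorts)."""
    pooled = {}
    for d in dists:
        for g, n in d.items():
            pooled[g] = pooled.get(g, 0) + n
    return pooled

def choose_benchmark(years, small_cohort=False):
    """Best single year; or, for small cohorts, best pooled pair of years."""
    candidates = ([combine(a, b) for a, b in combinations(years, 2)]
                  if small_cohort else list(years))
    return max(candidates, key=top_share)

# Hypothetical three-year history for one subject at one school
history = [
    {"A*": 2, "A": 8, "B": 12, "C": 10, "D": 4},   # year 1
    {"A*": 4, "A": 10, "B": 11, "C": 8, "D": 3},   # year 2
    {"A*": 3, "A": 9, "B": 10, "C": 9, "D": 5},    # year 3
]
benchmark = choose_benchmark(history)
print(f"benchmark top-grade share: {top_share(benchmark):.0%}")
```

With `small_cohort=True` the same routine pools any two of the three years before choosing, which is the smoothing suggested for subjects with few entrants.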

Determining final results

Choosing the benchmark against which to reference this performance is the first step towards determining final grades. While the process outlined here repeats more or less closely some steps which also took place in 2020 (though not necessarily in the same order), as a whole it has the merit of being simpler and therefore transparent to pupils and parents, which also reduces the burden on teachers and exam boards.

Importantly, the basis for decision making is clear at each stage, with decisions largely taken at a local, school level, rather than centrally, which will help to raise levels of trust in the outcome.

1) Establishing initial benchmarks: Exam boards would ask schools to nominate their benchmarks and to submit them for verification. There is likely to be a small increase in results because everyone will be choosing their ‘best’ year. However, given what we know about year to year variations, this is likely to be comparatively small.

2) The national picture: The exam boards or Ofqual would construct the overall national picture from the schools’ submissions. Early on in the process, then, Ministers could be informed about likely outcomes and could determine whether they are likely to be satisfied by the total picture or whether they wish to impose any constraints in the interests of year-on-year fairness and comparability across cohorts. There might, for example, be some evidence from earlier performance measures that the cohort in question had markedly lower prior attainments than in previous years; however, minor differences are probably best left to stand without adjustment, in the interests of fairness in exceptional circumstances.

3) Profile for each individual school: Based on the schools’ benchmarks, the boards confirm to schools the expected distribution of grades, factoring in the size of this year’s cohort. Schools will already have a reasonable idea of what to expect from knowing what they submitted for validation.
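Step 3 amounts to rescaling the benchmark year's grade proportions to the size of this year's cohort. A minimal sketch follows; the largest-remainder rounding rule, which keeps the counts summing exactly to the cohort size, is my assumption — the post does not specify how fractional places would be resolved.

```python
def expected_distribution(benchmark, cohort_size):
    """Scale a {grade: count} benchmark to a new cohort size.

    Uses largest-remainder rounding so the counts sum to cohort_size.
    """
    total = sum(benchmark.values())
    raw = {g: n * cohort_size / total for g, n in benchmark.items()}
    counts = {g: int(v) for g, v in raw.items()}          # round down first
    shortfall = cohort_size - sum(counts.values())
    # hand the remaining places to the grades with the largest fractional parts
    for g in sorted(raw, key=lambda g: raw[g] - counts[g], reverse=True)[:shortfall]:
        counts[g] += 1
    return counts

# Hypothetical benchmark year (36 entries) scaled to a cohort of 41
benchmark = {"A*": 4, "A": 10, "B": 11, "C": 8, "D": 3}
profile = expected_distribution(benchmark, cohort_size=41)
print(profile)
```

The resulting profile is what, on this sketch, the board would confirm back to the school before the Assessment Committee begins its work.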

4) Centre-moderated grading: Each exam centre (typically a single school or college) is asked to set up an ‘Assessment Committee’ to oversee the setting of grades. Individual subject departments would be asked to make recommendations about the grades to be awarded to individual pupils based on the expected distribution of grades. These judgements would then be debated and signed off by the Assessment Committee. This is where the teachers’ knowledge of individual learners, and of their performance relative to each other, would matter most.

5) Decisions at the grade boundaries: Assessment Committees will be faced with some dilemmas. Just as in normal times where examiners have to make difficult decisions about grade boundaries, so too teachers may find that they have pupils on the borderline of particular grade categories and will need to draw on further evidence. In the interests of facilitating fairness and legitimacy, Assessment Committees could be allocated a small number of ‘wild cards’ (the exact number to be dependent on the number of entrants) which can be played to increase the number of grades available at a particular level in a subject above that proposed by their exam board. These ‘wild cards’ are likely to be mostly used at the borderlines of the higher grades.

6) Confirmation of grades: Assessment Committees would be asked to provide a reasoned account of the processes they had used, the evidence available to them and any dilemmas they had experienced and how they had used the ‘wild cards’, alongside submitting proposed grades to the exam boards for signing off. The boards would then issue their final awards.

7) Appeal processes: Pupils who were dissatisfied with their grades would have two (limited) avenues to pursue their concerns. First, they could request that the Assessment Committee consider previously overlooked evidence. Second, they could be given the option of sitting more customary exams at the start of the next term. Given the transparency and evidence-based nature of the process followed up to this point, the number of dissatisfied pupils should be no more than the small numbers who appeal in normal years.

The overall result should be a cohort of pupils with grades that accurately represent their potential and can be used to support their progression, whether to the next level of study or into employment, but which also align with the historical patterns of achievement, within known patterns of year to year variation, thus countering the challenges which any perception of ‘grade inflation’ creates.

John Gray is Emeritus Professor of Education at Cambridge University and former Vice-Principal of Homerton College. A Fellow of the British Academy and member of its Education Section, he chaired the Standards Committee at Cambridge Assessment for more than a decade and was a Special Adviser to the House of Commons Select Committee’s investigation into School Accountability in 2009–10. This post has been produced as part of the British Academy’s Shape the Future initiative, which explores how to create a positive post-pandemic future for people, the economy and the environment.

We are the UK’s national academy for the humanities and social sciences. We mobilise these disciplines to understand the world and shape a brighter future.
