Reforming the Performance Tables - Sig+ for School Data

There was a time when I believed that a 101 page RAISE report was an essential and accurate summary of a school’s performance; that floor standards were vital; that children really did make a point of progress each term; that there really was such a thing as a 3b+; that the performance tables told us something useful; and that a comparison of the results of five disadvantaged children in a cohort against those of 475,000 non-disadvantaged children nationally was meaningful.

OK, I never believed the last one – I’m not that daft – but I believed the others. I believed these things because I was told they were real, and important, and insightful.

But I now realise that 101 pages of school performance data was probably 99 pages too long; that floor standards were arbitrary thresholds that caused schools and those monitoring them to focus on the wrong things; that there is no such thing as a point of progress and that 3b+ was invented solely to support that flimsy premise. I reached the conclusion that much of the data generated in schools was – due to its overwhelming focus on accountability – prone to distortion and quite possibly made up; and that national key performance measures – especially in primary schools where teacher assessment is prevalent – are seriously exposed to perverse incentives.

Here are a few real quotes that illustrate the point:

“We were told not to have too many pupils exceed the early learning goals.” (advice from LA advisor to headteacher)
“We don’t do level 3 at KS1 anymore.” (headteacher at an LA briefing)
“I was told we had a quota system: I could have no more than five level 3s and I had to have as many Level 1s to balance it out.” (KS1 teacher on the system in their previous school)

I could go on.

A game was being played in some schools, especially at KS1 where results formed the baseline of the progress measure: get it as low as possible without arousing suspicion. Goldilocks assessment: not too hot, not too cold, just right.

And yet these measures are so critical to the public (and Ofsted’s) perception of schools.

Which brings me back to the DfE performance tables. Here, everything a school does is boiled down to a number, a bright colour and simple descriptor to aid interpretation: well below average (red), below average (orange), average (yellow), above average (light green), well above average (dark green). These measures can influence the timing and outcome of an inspection; they can also influence school choice and house prices. The stakes are high.

But how many people understand this data? Who really knows what -1.7 means? How many people understand the complexities and loopholes of progress measures: the effect of outliers, nominal scores, teacher assessment, context, mobility, and that resource base of 20 pupils with EHCPs whose results are included in the data? Who understands how these performance indicators are calculated and how, sometimes, removing just one child from results or having one score an extra mark or two can make all the difference? Yes, the data may reveal an interesting deviation from national average, but this does not tell us anything about standards in that school.
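To make the point about outliers concrete, here is a rough sketch in Python. The pupil scores are invented for illustration, and it assumes, for simplicity, that a school's progress score is the plain average of its pupils' individual progress scores; the published measures involve further adjustments that are not modelled here.

```python
from statistics import mean

# Hypothetical pupil-level progress scores for a cohort of 10.
# These are made-up numbers, not real data: nine pupils close to
# zero and one extreme outlier.
scores = [0.5, -0.5, 1.0, -1.0, 0.0, 0.5, -0.5, 1.0, -1.0, -17.0]

# Assumption: the school's score is the simple mean of pupil scores.
with_outlier = mean(scores)
without_outlier = mean(scores[:-1])  # same cohort, outlier removed

print(f"School score with all 10 pupils: {with_outlier:.1f}")
print(f"School score without the outlier: {without_outlier:.1f}")
```

In this invented cohort, nine pupils average out to zero, yet the school's headline figure is -1.7 because of one child.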

Sadly, that is exactly what is implied and what is inferred; and too many people – parents included – base their judgements on this alluring yet oversimplified information. Are we really looking at poor progress or are we looking at a school with high results at KS1 or a school with a high proportion of pupils with SEND that did not sit KS2 tests? And what about those schools that are well above average? Yes, progress may be extraordinary but that may be due to context – EAL pupils with low start points often get high progress scores – or it may be a school that decided to have a quota system for its KS1 results. We have no idea what we are looking at.

If we are to have performance tables then it is time for an overhaul. Perhaps it is time to question the need for progress measures, especially in primary schools where those perverse incentives influence outcomes at KS1, and results in writing at KS2. Maybe we should publish results only in the tested subjects: reading, maths, and grammar, punctuation and spelling. But if we are to do that, those results need to be placed into context, which requires more meaningful narrative.

Those 101-page RAISE reports were packed full of numbers and yet readers were left to their own devices when it came to interpretation. The report was then trimmed down to a relatively svelte 63 pages, but as far as guidance went, this was limited to a ‘G’ for Governors at the top of pretty much every page. In 2016, the Ofsted inspection dashboard came out – a bewildering kaleidoscope of colours and graphs, but at least it listed strengths and weaknesses. This was replaced by the IDSR in 2017, which weighed in at 22 pages long and took the confusing decision to merge strengths and weaknesses under one banner of ‘areas to investigate’. This remained in 2018 but now the report was trimmed down to 11 pages: more narrative, less data. And then we come to the 2019 version with its five pages of narrative on results (now termed ‘areas of interest’) and context, and just half a page of data (presented as quintiles, which I’m not hugely keen on), and pretty much nothing on groups. For most schools, the IDSR is greyed out because there is nothing of any significance.

And perhaps the DfE’s performance tables can take a lead from this approach: more narrative, less data. Start with context – tell parents something meaningful about the school, about its community, its SEND provision and its inclusivity. Establish that first and then move onto results. In primary schools this could be limited to proportions of pupils meeting expected standards in tests and presented using statements such as:

“56% of pupils met the expected standard in key stage 2 tests. This is below the national average but in line with results of similar schools nationally. 5 pupils with SEND did not take the tests. 2 pupils were discounted from results because they had recently arrived from overseas. 4 pupils missed expected standards by between 1 and 3 marks. There are 30 pupils in this year group. 1 pupil therefore accounts for 3.3% of the results.”

Data suppression rules, which restrict publication of data in the performance tables to groups of six or more pupils, may prevent presenting data in exactly that way, but it’s a start. In fact, those data suppression rules are interesting – data in the IDSR is ‘greyed out’ if the cohort or group comprises fewer than 11 pupils. Perhaps the performance tables should adopt that rule, too.
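As a rough illustration of how a narrative statement and a suppression rule could work together, here is a minimal Python sketch. The function name, its parameters and the wording of the output are all hypothetical; the default threshold of 11 simply mirrors the IDSR rule described above, and nothing here reflects how the DfE or Ofsted actually generate their reports.

```python
def results_statement(cohort, met_standard, suppression_threshold=11):
    """Return a short narrative sentence about a cohort's results.

    Hypothetical helper: cohorts smaller than the threshold get a
    suppression notice instead of figures.
    """
    if cohort < suppression_threshold:
        return (
            f"Results not shown: fewer than {suppression_threshold} "
            "pupils in the cohort."
        )
    pct_met = 100 * met_standard / cohort
    pct_per_pupil = 100 / cohort  # weight of a single pupil in the headline figure
    return (
        f"{pct_met:.0f}% of pupils met the expected standard. "
        f"There are {cohort} pupils in this year group. "
        f"1 pupil therefore accounts for {pct_per_pupil:.1f}% of the results."
    )


# Made-up inputs: a cohort of 30, and a cohort too small to publish.
print(results_statement(cohort=30, met_standard=17))
print(results_statement(cohort=8, met_standard=5))
```

Stating the weight of a single pupil alongside the headline percentage is the part that matters: it tells the reader how much the figure would move if one child's result changed.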

We need to stop presenting complex and flawed data in an oversimplified way that is so open to misinterpretation. Performance tables in their current guise do not provide parents with useful information; they give a distorted perspective, potentially causing parents to shun one school and clamour to get their children into another without having any idea of the story that underlies the seemingly simple numbers and pretty colours. School performance is complicated and messy and we need to stop pretending otherwise just for the sake of convenience. It is hugely risky to reduce a school to a single number, colour or descriptor on which people will inevitably make judgements.

Following a key speech by Amanda Spielman HMCI at the Bryanston Education Summit in 2018, Ofsted’s main report on school performance – the IDSR – has undergone a radical transformation from data heavy to prose-based. Whilst not perfect and still reliant on data of questionable validity, it is certainly easier to understand and less prone to misinterpretation.

If performance tables are going to be with us for the foreseeable future then they should follow that lead and aim to tell a better story about our schools.

More context, more narrative, less data.
