When Is A Trend Not A Trend?

Most of us think we can spot a trend in school data when we see one, but increasingly I’m not so sure. The problem is not that trends don’t exist; some schools will genuinely be improving and others declining. The problem is not even a failure to recognise that some trends might be completely outwith the control of the school. The problem is that what looks like a trend, might be no such thing.

Actually, I think the deeper problem is that most people accept that schools have blips in their data for reasons that are almost completely random, but assume that looking at data over several years gets past this problem. Reaction to the Cramlington Learning Village Ofsted report (Outstanding to Special Measures) is a good example.

Some of the arguments against this kind of interpretation have been about the validity of statistical methods – @Jack_Marwood has blogged about this extensively. There has also been a little frisson of excitement about the way that the clustering effect has been used to overrule the apparent statistical significance in the recent RCT on reception baseline testing, with speculation about how this might be applied to RAISEonline etc.

But I want to take a different tack. Have a look at this graph.

[Graph: 5A*-CEM results over ten years for a red ‘school’ and a green ‘school’]

Over ten years, the red school data falls pretty steadily from the national average of 56% 5A*-CEM – what do you think Ofsted would make of this? And the green? Well, it’s not as dramatic as the improvement at Huntington School, but as long as they survived the first four years, I think most headteachers would give their right arm for this data.

The thing is, about once a year I give myself a little VBA project. I am totally crap at writing code but I enjoy the challenge (and the immediate, if often entirely unhelpful, feedback!). This year, the red and green graph is the outcome. I have taken a spreadsheet with 26 identical ‘schools’ with a 56% A*-CEM score and applied an algorithm that randomly allocates anything up to a 4% variation each year, for ten years. The spreadsheet then graphs the best and worst performing ‘school’. If you want to have a go for yourself, here is the Excel spreadsheet:
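The procedure described above can also be sketched in Python. This is not the VBA from the spreadsheet. It is a minimal sketch that assumes each year's change is drawn uniformly between -4 and +4 percentage points and added to the previous year's score, and that "best" and "worst" mean the highest and lowest final-year score. The function names, the uniform distribution and the seed are all illustrative assumptions.

```python
import random


def simulate_schools(n_schools=26, start=56.0, max_change=4.0, years=10, seed=None):
    """Return one list of yearly scores per 'school', all starting at `start`."""
    rng = random.Random(seed)
    schools = []
    for _ in range(n_schools):
        scores = [start]
        for _ in range(years):
            # Assumption: a uniform change in [-max_change, +max_change],
            # applied to the previous year's score (a random walk).
            step = rng.uniform(-max_change, max_change)
            scores.append(min(100.0, max(0.0, scores[-1] + step)))
        schools.append(scores)
    return schools


def best_and_worst(schools):
    """Assumption: rank 'schools' by their final-year score."""
    best = max(schools, key=lambda s: s[-1])
    worst = min(schools, key=lambda s: s[-1])
    return best, worst


if __name__ == "__main__":
    best, worst = best_and_worst(simulate_schools(seed=1))
    print("best: ", [round(x, 1) for x in best])
    print("worst:", [round(x, 1) for x in worst])
```

Running it with different seeds gives a different pair of 'schools' each time, none of which has any cause behind it.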

[Spreadsheet: wholespreadsheet]

You’ll need to enable macros when you open it, then click the RESET button, and then click the ADD 1 YEAR… button. As a bit of fun I’ve added some of the things that are sometimes cited as key reasons for a school’s success (or not), but I promise you the data is completely random.

Best wishes


14 thoughts on “When Is A Trend Not A Trend?”

  1. I’d add a few things to consider:

    Whilst I realise that your Excel model is purely to give pause for thought, it’s worth mentioning that the data which it generates isn’t random, as it assumes a dependent relationship between the GCSE 5A*-CEM measures in successive years. The cohorts are independent of each other, and therefore it would be better to generate annual percentages from a model of the distribution of annual 5A*-CEM percentages for each school, rather than a variation on the previous year.

    The cohort independence is the point I was alluding to in the Tweet you referenced, which is lost on far too many people. As you say, ‘The problem is that what looks like a trend, might be no such thing’ because, as I argue, the random distribution of any school’s actual year-on-year numbers (such as Cramlington Learning Village) is primarily due to the random variation in the pupil effect in each cohort, not any school or teacher effect.

    Oh, and graphs of percentages should include zero, otherwise they are misleading. I explained this here: http://icingonthecakeblog.weebly.com/blog/why-it-is-important-to-present-percentages-properly

    Yours in the pursuit of better understanding of data,

    Jack Marwood

  2. Aha! – I’ve finally grasped what you mean about the cohort independence. I’ve got it now (I think). The performance of each cohort is dependent on both external and, to a lesser extent, internal factors, so tends to be similar from year to year, because these tend not to change. BUT each cohort is a random sample from the pool of children across multiple years within the local area. I was having trouble with the cohorts being ‘independent’ (of each other) when they must be ‘dependent’ (on other factors).

    So, whilst you are right that my spreadsheet was only meant to illustrate that something that lots of people would see as a trend could be generated by randomly adding an annual change of up to 4% to the stats across fewer than 30 schools, I could do the same sort of thing but with a better model.

    I sometimes (often) get defeated by my very limited knowledge of VBA but I’m thinking I could populate an array with maybe 2000 pupils’ results distributed realistically (to represent ten years’ worth of results) and then the algorithm could pull 200 at random to fill the table.

    I’m not sure how to generate 1000 realistically distributed results, though. Any thoughts? Other than by feel? Perhaps it doesn’t matter too much – if the annual variation looks reasonable then it will perform the same service as my current spreadsheet and produce trends by chance.
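The pool-of-pupils idea above could look something like the following in Python rather than VBA. It is a minimal sketch under loud assumptions: each pupil's result is reduced to a simple pass or fail on 5A*-CEM, the pool pass rate is fixed at 56%, and each year's cohort of 200 is drawn afresh from the whole pool of 2000. The function names and parameters are hypothetical.

```python
import random


def make_pool(size=2000, pass_rate=0.56, seed=None):
    """Build a pool of pupils; True means the pupil achieved 5A*-CEM."""
    rng = random.Random(seed)
    n_pass = round(size * pass_rate)
    pool = [True] * n_pass + [False] * (size - n_pass)
    rng.shuffle(pool)
    return pool


def cohort_percentages(pool, cohort_size=200, years=10, seed=None):
    """Draw an independent random cohort each year; return the % passing."""
    rng = random.Random(seed)
    results = []
    for _ in range(years):
        # Assumption: sampling without replacement within a year, but the
        # same pupil may turn up again in a later year's draw.
        cohort = rng.sample(pool, cohort_size)
        results.append(100.0 * sum(cohort) / cohort_size)
    return results


if __name__ == "__main__":
    pool = make_pool(seed=0)
    print(cohort_percentages(pool, seed=0))
```

Under a plain binomial approximation, a cohort of 200 with a 56% pass rate has a standard deviation of roughly 3.5 percentage points from sampling alone, so year-on-year swings of a few points need no further explanation in this model.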

    Another tack would be to do what I think you have suggested, which is to take a model of the distribution of annual 5A*-CEM percentages as my starting point. But if someone believed that the differences within this group were to do with school or teacher effectiveness rather than random variation, then any attempt to use them to support the random variation argument would be doomed as a circular argument.

    Fascinating!

  3. Oh, and I read that post about graph scales (I think that’s how I discovered your blog in the first place) but the whole thing is about seeing signal in the noise so I thought all cheats were fair game.

  4. I’m not actually sure what the point is here. Yes, random changes will (in a small number of cases) result in what look like trends. But what is at issue is whether the changes are random. If they are not then this point becomes irrelevant. If they are, then it takes more than this to show it. There are plenty of statistical tests that can be used to show that either results, or changes in results, are independent of either previous results or previous changes in results.

      • I’m saying we can test for various types of randomness. I suggest doing this for all schools, as I doubt we can do it for a specific school. Pointless looking at random simulations and saying we could mistake it for a pattern. We know we are bad at recognising randomness.
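One example of the kind of test mentioned above is the Wald-Wolfowitz runs test, applied to whether each year's result sits above or below the series median. The sketch below is an illustration only: the choice of this particular test, the function name and the handling of ties are assumptions, and the normal approximation behind the z-score is rough for a series as short as ten years, which is one reason to pool many schools rather than judge one.

```python
import math
import statistics


def runs_test_z(values):
    """Wald-Wolfowitz runs test on values above/below their median.

    Returns a z-score, or None when it cannot be computed.
    A strongly negative z means fewer runs than chance would give,
    i.e. the series looks more trend-like than a random ordering.
    """
    median = statistics.median(values)
    signs = [v > median for v in values if v != median]  # drop ties
    n_above = sum(signs)
    n_below = len(signs) - n_above
    if n_above == 0 or n_below == 0:
        return None
    n = n_above + n_below
    # A run is a maximal stretch of consecutive identical signs.
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    expected = 2 * n_above * n_below / n + 1
    variance = (2 * n_above * n_below * (2 * n_above * n_below - n)) / (n * n * (n - 1))
    if variance == 0:
        return None
    return (runs - expected) / math.sqrt(variance)


if __name__ == "__main__":
    print(runs_test_z([50, 51, 52, 53, 60, 61, 62, 63]))  # steadily rising series
    print(runs_test_z([50, 60, 50, 60, 50, 60, 50, 60]))  # alternating series
```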

  5. Thanks, TB/OA. I see your point 😉
    I think we can argue from first principles on this. If each school cohort is independent of any other, and if the main variation in GCSE results is at individual pupil level, then the year on year changes in 5A*-CEM – at school level at least – must be largely random. I’d argue this, others may not.

    Looking at the 2011-13 5A*-CEM results at http://www.education.gov.uk/cgi-bin/schools/performance/school.pl?urn=137457, my assumption seems to be reasonable (66%, 61%, 57%). A different set of assumptions (and misunderstanding of randomness) might lead someone to assume that this meant that ‘standards were slipping’, whereas it’s entirely reasonable (by my reasoning) that the students in each cohort were simply different to each other. With this limited data, one could *suggest* that cohorts might have a mean of 61.3% with a standard deviation of 4.5 percentage points (it probably doesn’t).

    At system level, the nature of grades means that year on year changes are dependent on the way grades are awarded, and therefore are (by definition) less likely to be randomly distributed. 2011-13 5A*-CEM show this (59%, 59.4%, 59.2%) in that these numbers are effectively the same.

    2014 was clearly quite different (53.4%) both nationally and at CLV (53%, almost 2 SD from the hypothetical mean of 61.3% as calculated above), for reasons which would need to be explained (and which I’d hope anyone who reads this would find trivial).
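The arithmetic behind the figures quoted above can be checked in a few lines of Python. This sketch assumes the standard deviation is the sample standard deviation (dividing by n - 1), which is what reproduces the quoted 4.5; the variable names are illustrative.

```python
import statistics

# 2011-13 school-level 5A*-CEM percentages quoted in the comment above.
results_2011_13 = [66, 61, 57]

mean = statistics.mean(results_2011_13)
sd = statistics.stdev(results_2011_13)  # sample SD, i.e. divides by n - 1

# How far the 2014 figure of 53% sits from that hypothetical mean, in SDs.
z_2014 = (53 - mean) / sd

print(round(mean, 1), round(sd, 1), round(z_2014, 2))
```

With only three data points the estimate of the standard deviation is itself very uncertain, which fits the comment's own caveat that the true figures are probably different.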

    • Hi Andrew. The only point was that “we know we are bad at recognising randomness” isn’t my experience of people in education – or life more generally – and I think some of the tweets about the CLV data support my point, although it was just a coincidence that I finally got my VBA to work the week that Ofsted reported on that school.
      I think a hefty proportion of people would look at the graph above and conclude that there was something terrific happening in the one school and something very badly wrong in the other but I’ve generated those graphs by just assuming that, each year, there are things outside a school’s control which can have a random effect on 5A*-CEM that is normally distributed with a standard deviation of 2.5%. It’s meant to be illustrative, nothing more.
      You’re quite right that the issue, in any particular case, is whether or not the changes are random. I would suggest that a school, or department, needs to look long and hard at what they are doing if there appears to be a trend in their data, but that everyone needs to bear in mind the possibility that a downwards trend might not represent a decline in effectiveness, and equally an upwards trend might just be luck too.
      I wasn’t making any statement about whether the apparent trend at CLV was random or not, if that’s what you were thinking. How could I possibly know that from 350 miles away?

  6. I compliment you on your spreadsheet. I looked at the macros but couldn’t see how the random values were generated. Anyway, I believe the only way some non-mathematicians in education will stop trying to measure things is if they are tricked into accepting some data, and then later told ‘actually this was made up’. Until then (and perhaps some personal embarrassment) they will continue to believe everything they are told. Your spreadsheet idea may prove a useful tool. I do not have the statistical expertise to comment further so I will leave it to the others.

  7. PS my present personal markbook is on a spreadsheet. It goes overboard on stats (for the sake of it and because management seem to like data, so data is what they can have if they really want it all). My weekly scores are inputted as percentages, but these are then converted and normalised. The normalised marks are traffic lighted using the auto green to red spectrum feature. The mean normalised score for each student is also displayed. Although I set it up in a sarcastic mood, it does actually help me to decide who is at the top and bottom of the class. It also helps me determine whether the work I marked was pitched right and ‘differentiated’, i.e. is the mean over 50% (in order to encourage), and is the s.d. large enough to genuinely spread them out beyond lucky/unlucky answers.

  8. Pingback: When Is a Trend, Not a Trend (2nd attempt) | docendo discimus
