Buffy took seven complete series before the First Evil was finally defeated, when Spike’s amulet channeled the power of the Sun into the Hellmouth and Sunnydale High School collapsed into a hole that makes the VW-swallowing Buckinghamshire effort seem pretty tame. It’s looking as though it might take a mere seven months for the research on the reliability of lesson observations, unleashed by Rob Coe, to do the same for the graded lesson observations that have stalked the corridors of our own schools, devouring innocent teachers, for many a year.
I have never believed that teacher effectiveness could be judged on three graded lesson observations per year; I cannot see how Ofsted inspectors can believe that the teaching charade they view during an inspection gives them much useful information about the quality of teaching and learning in a school; and I think that basing PRP decisions on individual lesson observations comes close to breaking employment law. I will be happy to see these worst excesses of the system swept away, and if that’s the end of graded observations entirely, well, maybe it’s a price worth paying. But if we want to measure teacher effectiveness (I’ll leave the argument about whether we do or not for another time), how are we going to do it now?
My first suggestion is that, if the MET project that Coe has been referring to is good enough research to justify binning graded lesson observations then, given that MET stands for Measures of Effective Teaching, it should be good enough research to suggest how we might validly and reliably measure just that. The culminating findings make the following points:
- It is definitely possible to measure teacher effectiveness. Teachers were assessed and then pupils were assigned randomly and the earlier assessment was used to predict student outcomes. Those teachers who had been identified as more effective did have better student outcomes on average.
- There are some subtleties to this, however. My interpretation is that, for any one individual teacher, it’s possible that student achievement gains in a particular year would not match their assessed level of effectiveness. That means there are no guarantees that a teacher identified as particularly effective will not have a year with poor outcomes, but the original measurement of effectiveness is solid.
- “Estimates of teachers’ effectiveness are more stable from year to year when they combine classroom observations, student surveys, and measures of student achievement gains than when they are based solely on the latter.” I presume this is because of the noise in the student achievement gains.
So this leaves me thinking that, if we want to assess teacher effectiveness, we can do so, using a combination of VA, student surveys, and lesson observations. It’s tempting to think that having several years of data would average out the noise and make VA the stand-out indicator, but it’s crucial to realise that the whole point of the randomisation in this research was that, without it, there was no way to decide whether differences in student outcomes were due to teachers or due to other factors. In other words, other factors do matter. This research definitely does not suggest that we can just ignore which classes a teacher has worked with and rely on VA.
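For anyone who likes to see the noise argument in numbers, here is a toy simulation in Python. To be clear, everything in it is my own assumption for illustration and none of it comes from the MET data: I’ve invented a "true effectiveness" score for each teacher, and assumed that achievement gains, student surveys and observations each measure it with independent noise of the same size. It also assumes classes are effectively random, so it says nothing about the "other factors" problem.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n_teachers=2000, noise_sd=1.0, seed=0):
    """Return year-to-year correlations for (gains only, three-part composite).

    All numbers here are hypothetical: a made-up 'true effectiveness'
    per teacher, plus independent noise on each of the three measures.
    """
    rng = random.Random(seed)
    true_effect = [rng.gauss(0, 1) for _ in range(n_teachers)]

    def one_year():
        gains_only, composite = [], []
        for t in true_effect:
            gains = t + rng.gauss(0, noise_sd)        # achievement gains
            survey = t + rng.gauss(0, noise_sd)       # student survey
            observation = t + rng.gauss(0, noise_sd)  # lesson observation
            gains_only.append(gains)
            composite.append((gains + survey + observation) / 3)
        return gains_only, composite

    gains_y1, comp_y1 = one_year()
    gains_y2, comp_y2 = one_year()
    return pearson(gains_y1, gains_y2), pearson(comp_y1, comp_y2)


gains_stability, composite_stability = simulate()
print(f"gains only: {gains_stability:.2f}, composite: {composite_stability:.2f}")
```

Under these made-up assumptions the composite is more stable from one year to the next than achievement gains alone, simply because averaging three noisy measures cancels some of the noise. How far that carries over to real schools depends on how independent the three measures really are, which this sketch just assumes.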
As soon as I start to think about transferring all this to a typical English school, with its busy teachers and SLT, small and sometimes imperfect data-sets, and varied classes, I find myself in strong agreement with Tom Sherrington’s blog post “How do I know how good my teachers are?” I don’t think there will ever be a perfect measure but we can have a pretty good stab at it. And lesson observations are part of this.
My second question is about what that research actually says about the reliability of graded lesson observations. Coe’s figures have been widely circulated. I’m not going to dispute them, but I am going to query whether it’s possible to generalise those findings to our current system, and comment on what the MET project says about making observations more reliable, given that its authors suggest observations are important in assessing teacher effectiveness. That’s for another post, coming soon.