What does lesson observation research actually say?

Buffy took seven complete series before the First Evil was finally defeated when Spike’s amulet channeled the power of the Sun into the Hellmouth and Sunnydale High School collapsed into a hole that makes the VW swallowing Buckinghamshire effort seem pretty tame. It’s looking as though it might take more like a mere seven months for the research on reliability of lesson observations, unleashed by Rob Coe, to do the same for the graded lesson observations that have stalked the corridors of our own schools, devouring innocent teachers, for many a year.

I have never believed that teacher effectiveness could be judged on three graded lesson observations per year; I cannot see how Ofsted inspectors can believe that the teaching charade they view during an inspection gives them much useful information about the quality of teaching and learning in a school; and I think that basing PRP decisions on individual lesson observations comes close to breaking employment law. I will be happy to see these worst excesses of the system swept away, and if that’s the end of graded observations entirely, well, maybe it’s a price worth paying. But if we want to measure teacher effectiveness (I’ll leave the argument about whether we do or not for another time), how are we going to do it now?

My first suggestion is that, if the MET project that Coe has been referring to is good enough research to justify binning graded lesson observations then, given that MET stands for Measures of Effective Teaching, it should be good enough research to suggest how we might validly and reliably measure just that. The culminating findings make the following points:

  • It is definitely possible to measure teacher effectiveness. Teachers were assessed and then pupils were assigned randomly and the earlier assessment was used to predict student outcomes. Those teachers who had been identified as more effective did have better student outcomes on average.
  • There are some subtleties to this, however. My interpretation is that, for any individual teacher, student achievement gains in a particular year might not match their assessed level of effectiveness. So there is no guarantee that a teacher identified as particularly effective will not have a year with poor outcomes, but the original measurement of effectiveness is solid.
  • “Estimates of teachers’ effectiveness are more stable from year to year when they combine classroom observations, student surveys, and measures of student achievement gains than when they are based solely on the latter.” I presume this is because of the noise in the student achievement gains.

So this leaves me thinking that, if we want to assess teacher effectiveness, we can do so, using a combination of value-added (VA) data, student surveys, and lesson observations. It’s tempting to think that having several years of data would average out the noise and make VA the stand-out indicator, but it’s crucial to realise that the whole point of the randomisation in this research was that without it there was no way to decide whether differences in student outcomes were due to teachers or due to other factors – in other words, other factors do matter. This research definitely does not suggest that we can just ignore which classes a teacher has worked with and rely on VA.
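To make the point about noise concrete, here is a toy simulation. It is a sketch under invented assumptions, not the MET project's model: every teacher has a fixed "true" effectiveness, each measure (achievement gains, student survey, observation) is that true value plus independent random noise of the same size, and the composite is a plain unweighted average. The noise level, the equal weighting and all the function names are made up for illustration.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n_teachers=2000, noise_sd=1.5, seed=0):
    """Return (gains-only stability, composite stability) across two years.

    Assumption: each measure = true effectiveness + independent noise.
    The numbers are illustrative only, not estimates from any real data.
    """
    rng = random.Random(seed)
    true_effect = [rng.gauss(0, 1) for _ in range(n_teachers)]

    def noisy(value):
        # One noisy reading of a teacher's true effectiveness.
        return value + rng.gauss(0, noise_sd)

    def one_year():
        gains = [noisy(t) for t in true_effect]
        # Composite: gains averaged with a survey and an observation score.
        composite = [
            (g + noisy(t) + noisy(t)) / 3 for g, t in zip(gains, true_effect)
        ]
        return gains, composite

    gains_y1, composite_y1 = one_year()
    gains_y2, composite_y2 = one_year()
    return pearson(gains_y1, gains_y2), pearson(composite_y1, composite_y2)


if __name__ == "__main__":
    gains_only, combined = simulate()
    print(f"year-to-year correlation, gains only: {gains_only:.2f}")
    print(f"year-to-year correlation, composite:  {combined:.2f}")
```

Under these assumptions the composite is more stable from year to year simply because averaging independent noise shrinks it. What the toy cannot show is the other half of the argument: it contains no "other factors", such as which classes a teacher was given, so it says nothing about whether VA alone is fair.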

As soon as I start to think about transferring all this to a typical English school, with its busy teachers and SLT, small and sometimes imperfect data-sets, and varied classes, I find myself in strong agreement with Tom Sherrington’s blog post “How do I know how good my teachers are?” I don’t think there will ever be a perfect measure but we can have a pretty good stab at it. And lesson observations are part of this.

My second question is about what that research actually says about the reliability of graded lesson observations. Coe’s figures have been widely circulated. I’m not going to dispute them but I am going to query whether it’s possible to generalise those findings to our current system and comment on what the MET project says about making observations (which they are suggesting are important in assessing teacher effectiveness) more reliable. That’s for another post, coming soon.


Educational Research: Too much, too little, too often?

As the concept of research-based practice grows in stature within the teaching community in the UK, and to some extent amongst policy-makers too (when it suits them), the relationship between academics and teachers has regularly come under the spotlight. ResearchEd 2013 is possibly the most prominent recent example with ResearchEd Midlands on the horizon.

I thought it might just be useful to take a moment to consider if any individual working on this part-time, whether a teacher reading late into the night, a teacher educator preparing sessions for their trainees, or SLT looking for the magic bullet, can get any kind of effective handle on the whole picture.

I’m a science teacher. Here is Keith Taber’s list of some journals specialising in science education. Keith Taber works at Cambridge University; he is a cheese of the large variety in science education so if his name is not familiar, perhaps I can rest my case in terms of teachers having an overview of academic research.

I think the length of this list is a symptom of a fundamental problem in educational research. There are a lot of journals because there are a lot of small research projects and they all need publishing because that’s the way you demonstrate your credentials in academia. Having a very high profile journal cherry-picking the best of the rest would help – Nature, The Lancet, or the BMJ being prime examples in science and medicine – but I think the problem is more fundamental than just a communication problem. Young people can be slippery little beggars at the best of times, and teachers are worse; trying to get the kind of validity and reliability that you get from lab rats is futile. Therefore good research often needs very large sample sizes. Large sample sizes in education research are massively expensive and time-consuming (not quite LHC but £100,000s). We desperately need to replace a large number of small research projects with a small number of large ones. There are subtleties to this, with small projects helping to select the big ones, but that’s the shift that just might start to yield results that just might make evidence-based practice a convincing reality. There is a shift in this direction (although the emphasis on “RCTs good, everything else bad” shows a limited understanding of the issues in educational research), but it’s still just a few flagship projects; the majority of academics are working on tiny little projects which, however good, will produce insignificant results to be buried behind a paywall in an obscure journal.
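For a rough sense of scale, here is a back-of-envelope sample size calculation. It is a sketch only: it uses the standard normal-approximation formula for comparing two group means, the effect sizes are illustrative rather than taken from any study, and the function name is made up. It also ignores the fact that pupils are clustered in classes and schools, which pushes the real numbers up.

```python
import math
from statistics import NormalDist


def pupils_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate pupils needed in EACH of two groups.

    Normal-approximation formula for a two-sided comparison of means:
        n = 2 * (z_(1 - alpha/2) + z_power)^2 / effect_size^2
    where effect_size is the difference in means in standard deviations.
    Assumes independent pupils (no clustering), so it is a lower bound.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


if __name__ == "__main__":
    # Illustrative effect sizes only, from large to small.
    for d in (0.5, 0.2, 0.1):
        print(f"effect size {d}: about {pupils_per_group(d)} pupils per group")
```

The pattern is what matters: halving the effect you want to detect roughly quadruples the number of pupils you need, which is why a one-teacher, two-class project struggles to say anything about small effects.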

Can we please fix this?