A Potentially Different Approach

The approach to teaching energy has been a hot topic in physics education for a while now. The long-established approach has been to list different types of energy and then to think of physical processes as involving transformation of energy from one form to another. Although it’s not inherent to this way of thinking, it’s quite common for physical processes to be explained in terms of these energy changes too.

For some time now, the IoP and others have been strongly promoting an alternative approach which sets clear start and end points and considers different types of stores of energy, with conservation of energy dictating that as one store empties, another fills. Linked to this alternative approach is a clear emphasis on energy as a calculation tool only and not as a way of explaining processes.

Now that the new KS3 NC and the new GCSE Subject Content have both been written in a way that favours this new approach to energy, I clearly need to make sure I am not only passing this approach on to my trainee teachers, but also exploring its ramifications for other areas of science teaching. This is something I’m struggling with quite a bit, and I ran into a significant issue earlier this week while tweaking my session on basic circuits.

My approach to basic circuits is to start with potential difference and sort that out before dealing with current. The thinking here is that p.d. is the most conceptually difficult part of circuits work, and the usual problem is that p.d. and current are not well enough separated mentally, so misconceptions like p.d. splitting when components are in parallel are common. By teaching p.d. first, that concept can be secured before grappling with current; doing it the other way round means that p.d. is being taught whilst students are still grappling with current, and so inevitably the two get conflated. Regardless of the merits of this order of teaching, a clear concept of p.d. is necessary, and I’ve always approached this by invoking energy.

If p.d. is a measure of the change in energy between two points (change in energy per unit charge, obviously, but I would tend to avoid that technical detail at first) then it becomes fairly easy to be convinced that the change in energy across the power supply must equal the change in energy across the other components in the circuit. Equally, it makes the idea of p.d. splitting across components in series but not across components in parallel fairly clear too. This is really all based on the idea of conservation of energy but I find kids get that pretty instinctively. Anyway, I’ve been teaching it this way for a long time and with a pretty high level of success as far as I can tell (albeit generally with fairly high-achieving students).
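For readers who want to see the energy bookkeeping behind this, here is a minimal Python sketch. It assumes an ideal cell and components with fixed resistance, and the function names are purely illustrative. Note that sharing the p.d. in proportion to resistance quietly brings in current (V = IR), which is exactly what the teaching order above postpones; the point here is only that the shares in series always add up to the supply p.d., while each parallel branch gets the whole of it.

```python
# A minimal sketch, assuming an ideal cell and ohmic (fixed-resistance) components.
# p.d. is treated as energy transferred per unit charge (volts = joules per coulomb).


def series_pds(supply_pd, resistances):
    """Share the supply p.d. between components in series.

    Every coulomb passes through each component in turn, so the energy it
    transfers across all of them must add up to what it gained in the cell.
    With a single shared current, each share is proportional to resistance.
    """
    total = sum(resistances)
    return [supply_pd * r / total for r in resistances]


def parallel_pds(supply_pd, resistances):
    """Each parallel branch connects directly across the supply,
    so a coulomb taking any one branch transfers the full supply p.d."""
    return [supply_pd for _ in resistances]


def energy_transferred(pd, charge):
    # E = V * Q: joules transferred when `charge` coulombs cross `pd` volts
    return pd * charge


supply = 6.0          # hypothetical 6 V battery
bulbs = [2.0, 4.0]    # hypothetical resistances in ohms

print(series_pds(supply, bulbs))        # [2.0, 4.0]: shares add up to 6.0
print(parallel_pds(supply, bulbs))      # [6.0, 6.0]: no splitting in parallel
print(energy_transferred(supply, 2.0))  # 12.0 J for 2 C through the cell
```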

But you can see the problem – talking about energy like this is not going to sit comfortably with the new approach to energy at KS3/4 because in that approach, the circuit system would be described as a chemical store of energy in the cell being emptied, and a thermal store of energy (in and around the other components) being filled. There is an electrical pathway transferring energy from one store to the other but talking about amounts of (electrical) energy changing around the circuit doesn’t fit. I did try it this way with my trainee teachers this week; I related p.d. to the rate of emptying of the chemical store, and the rate of transfer to the thermal store. It kind of worked but did feel a bit clunky.

So possibly it’s just a case of everyone getting up to speed with the new approach, after which it won’t feel clunky. Another possibility is to teach the energy topic using the new approach and then not get too hung up on it within other topics (rather as, at A-Level, we happily ignore quantum models when teaching interference of e-m waves), but that seems like an unhelpful compromise as far as children’s understanding is concerned. Is there an alternative?

I sent out a quick Tweet asking for suggestions, got a brief flurry of ideas, and the one that caught my eye was this:

Could this be an alternative approach to p.d. that allows teaching it first in a way that establishes a solid foundation for subsequent work? Maybe.

Here are the diagrams I use for establishing what a potential difference is.

[Figure: p.d. diagrams]


At the moment I talk about the movement of electrons creating a ‘difference’ between the ends of the cell, or of part of the circuit. I then name this ‘difference’ a “potential difference”. An alternative approach is to emphasise that the charge has to be evenly distributed along a wire, and to describe this as all points on a wire being at the same potential. It then becomes fairly clear that the difference across components is a potential difference. If you focus on the words ‘difference’ and ‘potential’, there has been a shift from talking about differences to talking about potentials. Maybe this slippery technical term means the alternative approach is going to be a harder sell, but on the other hand it does offer a way to avoid energy altogether. Does this advantage outweigh that problem?

I would be really interested in anyone’s thoughts.


A Statistical Battleground

Given that I’ve just launched into a deeper reading of John Hattie’s book, Visible Learning, I’m taking a keen interest in the current statistical battle rumbling away on Twitter. I’ve come across two bloggers, both as far as I can tell with statistical backgrounds, who are making the strongly-worded point that the Effect Size is a statistical technique being applied incorrectly in education and other social science research.

According to @Jack_Marwood “The Effect Size should be used to check whether an experiment will have enough data from which to draw valid conclusions before the experiment takes place. In most educational research, the Effect Size is used to compare different methods of teaching or outcomes of a change in educational process. There is no justification for this.”

The view of @OllieOrange2 is that “Mathematicians don’t use it”.

Whether this is damning or not is debatable. From the arguments, as far as I can follow them, it seems that using the Effect Size to work out the relative difference between the means of two data sets might not achieve the level of perfection mathematicians strive for, but that, with some provisos, it is roughly what education research wants to know. It would be really helpful if a whole bunch of independent statisticians weighed in with an informed opinion (one that looked at the big picture rather than obsessing over the mathematical niceties), but since that’s not very likely, I think there is a fair weight of academic thought in favour of the Effect Size as an imperfect but useful measure. I would also claim to have followed the statistical arguments as far as they go (although I wouldn’t presume to be able to spot errors in how they are presented), and my view is that the case for the provisos is pretty clear but the case for the damnation of the Effect Size has not been sustained. So what follows is a list of those provisos.

First, there is the difference between the means in a pre-test/post-test design, and the difference between the means in an intervention-and-control-group design. OllieOrange2 highlights this issue. I can see that if the time scale is quite long then this matters, because a programme evaluated over a year that had zero influence on achievement could have an effect size of either about 0.40 (typical for one year of schooling) or 0.00, depending on which methodology is used. OllieOrange2 is implying that (a) this will be a big discrepancy for lots of studies, and (b) Hattie hasn’t noticed the difference. However, I can’t see how a pre-test/post-test design over a year could show anything unless it was compared to a control, or there was some kind of regression analysis to pull out the effect of the variable being studied from the background, so I’m not sure it’s a problem. For short-term studies the two methodologies would converge. It would be nice to know whether or not Hattie had thought about this, though.
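The discrepancy can be illustrated with a small Python sketch using made-up scores for a programme that adds nothing beyond a year’s normal growth. It computes the effect size as the difference in means divided by a pooled standard deviation, which is one common choice (some pre/post studies standardise on the pre-test SD instead); the data and names are hypothetical.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Standardised mean difference: (mean_a - mean_b) / pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


# Hypothetical scores for a programme that adds nothing: both groups start
# from the same scores and simply gain 4 marks through a year of schooling.
pre_intervention = [40.0, 40.0, 50.0, 60.0, 60.0]
post_intervention = [s + 4.0 for s in pre_intervention]
post_control = [s + 4.0 for s in pre_intervention]  # identical normal growth

pre_post_d = cohens_d(post_intervention, pre_intervention)
controlled_d = cohens_d(post_intervention, post_control)

print(round(pre_post_d, 2))    # 0.4: a year's growth shows up as an "effect"
print(round(controlled_d, 2))  # 0.0: nothing left once compared with a control
```

Shrinking the background growth (as in a short study) pulls the pre/post figure down towards the controlled one.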

In reading Visible Learning I think that a bigger issue may be the difference between research evaluating interventions, and research comparing pre-existing situations. Hattie is very clear that “almost everything works”, and uses the mean of all the Effect Sizes from all the meta-analyses to argue that 0.40 is generally the bar an influence should clear before it is judged worthwhile. This is a Hawthorne Effect at work, where students, and perhaps more significantly teachers, respond to the novelty of the intervention by pulling out a few stops. That makes a lot of sense for an intervention, but some influences are different. As an obvious example, the researchers correlating birth weight with achievement cannot possibly have influenced any embryos, and if the achievement data was taken from existing records (as it presumably was) then they cannot have influenced the achievement scores either. So for birth weight the full 0.54 applies – there isn’t some kind of 0.40 Hawthorne Effect – and the same applies to a number of other influences too. The two types of influence are not separated, and the difference doesn’t seem to be mentioned in Visible Learning.

The age of the students being studied makes a big difference. The graph in this post shows this very clearly, but I quite like the height example I’ve just thought up: the Effect Size of 6″ heels on height is bigger for someone 4’6″ than for someone 5’8″ (the reference to height comes from Hattie but the high heels are all mine). Hattie clearly synthesises meta-analyses without adjusting for this. Possibly there is enough random variety of student ages in the original studies to compensate a bit, but it’s a clear limitation of Hattie’s work.

Homogeneity of the students being studied is also significant. This is because the Effect Size is the difference in means divided by the SD: if the students are closer in achievement then the SD is smaller, so the same difference in means gives a bigger Effect Size. This again is a clear limitation, particularly where the original studies by their nature focused on restricted groups.
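A toy Python calculation shows the mechanism: the same hypothetical 5-mark improvement measured against two cohorts with different spreads of attainment. The cohorts and numbers are invented purely for illustration.

```python
import statistics


def effect_size(mean_difference, sd):
    """Standardised mean difference: a raw gain expressed in SD units."""
    return mean_difference / sd


# Hypothetical cohorts with the same mean (50) but different spreads.
broad_cohort = [30.0, 40.0, 50.0, 60.0, 70.0]
narrow_cohort = [40.0, 45.0, 50.0, 55.0, 60.0]  # half the spread

gain = 5.0  # the same raw improvement in both cases

broad_d = effect_size(gain, statistics.stdev(broad_cohort))
narrow_d = effect_size(gain, statistics.stdev(narrow_cohort))

print(round(broad_d, 2), round(narrow_d, 2))  # 0.32 0.63: halving the SD doubles d
```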

Dylan Wiliam has made the point that trying to alter something that responds well to teaching will tend to produce a larger effect size. Having said that, if working hard at something doesn’t have much effect because it doesn’t respond well to teaching, that’s quite a good reason for leaving it alone and spending the effort on something more effective. Given the calibre of the author I may be missing something, but the lengthy comment quoted in this post (and the reply in the comments) is available, as is this .ppt from the Presentations page on his website. I’m sure I have also read a full paper on this but I can’t find it just at the moment.

Before finishing, I think this paper (that I can’t find) on the limitations of the Effect Size is probably the best criticism I’ve read. It seems more balanced than the recent blog posts, and less focused on mathematical issues, which may not matter too much at the level at which education research operates. In contrast, Rob Coe’s CEM and EEF briefings describe the advantages of using the Effect Size.

As I’ve been writing this short post, to clarify my thinking and maybe take stock of a project that might be a waste of time, I’ve stumbled across several other relevant pieces on the subject. Neil Brown’s review of Visible Learning is good, and he also reported briefly on Robert Coe’s ResearchEd2013 debate with OllieOrange2. This post by Leafstrewn is an early criticism and references the “Norwegian debate”, which is reported in this (for me) unpronounceable post. EvidenceIntoPractice is always a mine of useful information on research, and its post on issues with meta-analyses is no exception.

Since I wrote this post, there has been a very important debate on using Value-Added Models (VAM) to measure educational effectiveness. I think a lot of the research on which Visible Learning is based will have used different ways of assessing outcomes, partly because VAM are quite a new approach. However, I think it’s worth flagging up here as it’s clearly related, particularly to current work like the EEF projects. The most comprehensive case against using VAM as an evidence base for policy that I’ve seen is a Washington Post article, with a perhaps more politically aware statement from the American Statistical Association.

Self-reported grades (Effect Size = a whopping 1.44)

This post is part of a series looking at the influences on attainment described in Hattie (2009) Visible Learning: a synthesis of more than 800 meta-analyses relating to achievement. Abingdon: Routledge. The interpretation of Hattie’s work is problematical because the meaning of the different influences on achievement isn’t always clear. Further context here.

Following my post on Piagetian programs (effect size = 1.28) comes the top-ranked influence, Self-reported grades (effect size = 1.44). Until now I’ve been assuming that if you take one group of students who say they are working at A-grade standard, and another who say they are working at C-grade standard, then you find that, sure enough, the ones self-reporting A grades are achieving more highly. Hattie implies that if self-reported grades are very accurate then there is less need for testing. My thinking on this is that there is good evidence that low-stakes testing is an effective method for improving recall, and that replacing high-stakes testing with self-reported grades isn’t going to happen any time soon.

What I have been wondering is how much of this self-reporting of grades is predictive; if the two groups of students are actually working at the same current level but one group declare themselves A grade students and the other group declare themselves C grade, maybe that becomes a self-fulfilling prophecy. In a moment of self-delusion I’ve even used this as a piece of evidence supporting the importance of setting challenging learning objectives – hopefully my other evidence excuses this slip.

So, back to Hattie’s evidence then. I’m afraid the only way to report this is to go through the individual meta-analyses. Kuncel, Credé and Thomas (2005) were looking at the validity of self-reported Grade Point Averages. It’s not totally clear to me quite how GPAs work in the USA, but I think this would be roughly the same as asking graduates in the UK what their final percentage mark was for their degree. The point of this meta-analysis is to try to establish the validity of researchers asking for GPA rather than getting it from a transcript of some sort, so I don’t think it has any relevance to teachers – it’s just about whether people remember accurately and whether or not they lie.

Falchikov and Goldfinch (2000) were looking at the validity of peer marking compared to teacher marking at undergraduate level: they found a high level of correlation. This study also reports the findings from Falchikov and Boud (1989), which are similar. Mabe and West (1982) found a low correlation between self-evaluation and other measures of performance. The range of studies they looked at was really broad, including academic, clerical and athletic performance. It’s a psychology study so, of course, most subjects were, again, university undergraduates. Finally, Ross (1998) found pretty variable levels of self-assessment accuracy among those learning a second language. There is a vague theme running through these studies that novices are worse at self-assessment than more experienced learners in a particular area.

I think the only useful thing that comes out of this for teachers is that, with capable students, it may be possible to do quite a bit of peer-marking and self-assessment, to ease the workload of teacher marking, if what you are after is marks for your markbook (none of this evidence says anything about any other aspect of feedback). Perhaps the very limited relevance of this influence is why it isn’t mentioned anywhere in Visible Learning for Teachers but it does seem odd that it gets Rank 1 and then is completely ignored.

The rest of the list of influences brought by the student doesn’t seem terribly interesting. Either these are things that teachers have no control over – like pre-term birth weight – or they would be much more interesting if looked at in terms of the effect on trying to change something. For example, Concentration/persistence/engagement (effect size = 0.48) appears important but all the recent focus on this, stemming from Duckworth’s Grit, and Dweck’s Mindset work, only matters to teachers if there is some good evidence that we can shift children along these scales. I’ll have a little look at this one in case there is something interesting lurking in there but otherwise it might be time to move on to school effects, starting with Acceleration (effect size = 0.88) and the behaviour management effects, in particular what the difference is between Classroom management (effect size = 0.52), Classroom cohesion (effect size = 0.53), Classroom behavioural (effect size = 0.88), and Decreasing disruptive behaviour (effect size = 0.34), and what the research says about Peer influences (effect size = 0.53).

Piagetian programs: effect size = 1.28

Hattie states that the one meta-analysis for this influence found a very high correlation between Piagetian stage and achievement (higher for maths, 0.73, than for reading, 0.40). Quite what is meant by this isn’t clear. I’m guessing that some sort of test was done to determine the Piagetian stage, and the correlation is between this and achievement. Piaget’s original theory suggests that the stages are age-related, but later work has criticised this part of the theory – he did base his theories a lot on the development of just his own children – so presumably the research behind this meta-analysis was based on the idea that children make the breakthrough to a new stage at different ages, and that those who reach stages earlier might achieve more highly. If I remember correctly, the CASE and CAME programmes (and Let’s Think! for primary) were designed to accelerate progress through the Piagetian stages (from the concrete to the formal-operational stage, in the case of CASE and CAME), and there is some evidence that all these programmes have a significant effect, including a long-lasting influence on achievement not only in science but spilling over into English, and several years later at that. Maybe these would count as Piagetian programmes.

So that’s my starting point but what does the Jordan and Brownlee (1981) meta-analysis actually deal with? Well, at the moment all I can find is the abstract:

The relationship between Piagetian and school achievement tests was examined through a meta-analysis of correlational data between tests in these domains. Highlighted is the extent to which performance on Piagetian tasks was related to achievement in these areas. The average age for the subjects used in the analysis was 88 months, the average IQ was 107. Mathematics and reading tests were administered. Averaged correlations indicated that Piagetian tests account for approximately 29% of variance in mathematics achievement and 16% of variance in reading achievement. Piagetian tests were more highly correlated with achievement than with intelligence tests. One implication might be the use of Piagetian tests as a diagnostic aid for children experiencing difficulties in mathematics or reading.

I have made a few enquiries and will update this post if I get hold of the full text, but it seems quite close to my assumption that it’s about a correlation between tests of Piagetian stages and achievement. I don’t think that’s of any direct use, since it doesn’t tell us anything about how we accelerate progression through the stages. On the other hand, if we know that there is a good correlation between Piagetian stage and achievement, and if it transpires that it is possible to change the former, and that this does have a causal effect on the latter, then we would perhaps be cooking on gas.
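The abstract reports variance explained rather than correlations. For a single correlation, variance explained is r squared, so the figures can be turned back into correlations with a few lines of Python: 16% for reading gives r = 0.40, matching the figure Hattie quotes, while 29% for mathematics gives r of about 0.54. The second function applies one widely used conversion from a correlation to a standardised mean difference, d = 2r / sqrt(1 - r^2). It happens to turn r of about 0.54 into roughly 1.28, but I have not checked that this is how Hattie arrived at his figure, so treat that match as suggestive only; the function names are illustrative.

```python
import math


def r_from_variance_explained(proportion):
    """For a simple correlation, variance explained is r squared."""
    return math.sqrt(proportion)


def d_from_r(r):
    """One common conversion from a correlation r to an effect size d.
    Assumption: not confirmed as the conversion Hattie used."""
    return 2 * r / math.sqrt(1 - r ** 2)


maths_r = r_from_variance_explained(0.29)    # 29% of variance (from the abstract)
reading_r = r_from_variance_explained(0.16)  # 16% of variance (from the abstract)

print(round(maths_r, 2), round(reading_r, 2))  # 0.54 0.4
print(round(d_from_r(maths_r), 2))             # 1.28
```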

Where do CASE, CAME and Let’s Think! come into this? Well, these Cognitive Acceleration (CA) programmes cannot be relevant to this influence, as classified by Hattie, because the first paper on CASE was published in 1990 and the meta-analysis used by Hattie for this influence, labelled Piagetian programs, dates from 1981. However, as well as the evidence for the effectiveness of these CA programmes from those involved in developing them, they were included in a meta-analysis on thinking skills (Higgins et al., 2005), which Hattie has made use of. Where do you think this is found? Not under Piagetian programs; not under Metacognitive strategies; no, I don’t think you’ll guess – under Creativity programs (Effect Size = 0.65). I would instinctively have thought Creativity programs was something in the Ken Robinson mould. Instead Hattie is picking up a collection of specific curriculum programmes based around clearly stated things to be taught, and particular ways to do the teaching, that emphasise the explicit development of thinking strategies. And buried in here are some very high effect sizes.

I actually taught CASE (without proper training, I’m afraid) for a year, whilst doing a maternity cover about ten years ago. I thought it was pretty good at the time, but if the effect sizes hold up (the EEF have a Let’s Think Secondary Science effectiveness trial underway that will report in 2016), then we should probably be thinking about making this a pretty integral part of science and maths teaching. If anyone is looking for access to the programmes, it’s organised through Let’s Think.

Probably the final point on all this is that I’ve started this post with a title that includes Piaget, whose theory of cognitive development is a primary source of justification for the whole constructivist teaching movement. And I’ve ended up talking about a programme directly drawing on his theory that appears to have an effect size at least comparable to Direct Instruction. Should the new-traditionalists be worried? No more than is justified. CASE has at least as much in common with Direct Instruction as it does with Problem-based Learning, and although it includes significant amounts of peer discussion it is definitely teacher-led. I continue to argue my case that teachers should be in charge of learning, but that we shouldn’t throw the quality learning baby out with the constructivist bath-water.

Next, Self-reported grades (Effect Size = a whopping 1.44)

Looking More Closely at Visible Learning

A somewhat careless comment on Andrew Smith’s blog (which he responded to with a clear demonstration that he knew more about Hattie’s work than I do) has led me back to the original Visible Learning: a synthesis of over 800 meta-analyses relating to achievement. There are a whole bunch of issues with Hattie’s methodology, which are probably fairly well-known by now e.g. David Weston’s ResearchEd 2013 talk; Learning Spy’s post which is related to Ollie Orange’s . I’ve tried to summarise these for my own clarity. If you read the introduction to Visible Learning, or churn your way through Visible Learning for Teachers, it’s pretty clear that Hattie is conscious of at least some of the limitations of his work (maybe not some of the statistical issues, though). In some ways Andrew is bucking the trend in education at the moment – a few years ago Hattie was definitely the most prominent researcher in the field of education but his star has undoubtedly waned. For a while there, he really was The Messiah, but that wasn’t his fault, more a consequence of being responsible for some important evidence at just the moment that the deep-water swell of evidence-based practice felt bottom and started to build. At first surfers flocked to, and eulogised, Hattie’s miraculous surf break but when it turned out to not be as smooth, glassy and regular as they hoped, and other surf spots were discovered, it almost inevitably fell from favour somewhat.

As Hattie himself points out, any attempt to just look at the headline effect sizes and conclude “this works, that doesn’t” is not only misinterpreting his work, but missing the point. His approach is to take the huge mass of evidence and use it to draw out themes that really do tell us something about how to teach more effectively, but always to appreciate that this must be in the context of our own teaching, our own students, and our own settings.

However, I think there is another barrier to making effective use of Hattie’s work. I think I’ve been aware of it for a while but the recent brief exchange with Andrew Smith has highlighted it for me. Interpretation of Hattie’s work is problematical because the meaning of the different influences on achievement isn’t clear. I first encountered Hattie’s work through the Head of History at the college where I worked. He was a fantastic teacher and had been significantly influenced by Geoff Petty’s book Evidence Based Teaching which in turn was heavily influenced by Visible Learning. I think Petty made a pretty decent stab at interpreting Hattie’s work but I also think he was influenced by some of his own ideas about effective teaching (Teaching Today pre-dates Visible Learning and I think shows that he didn’t take on board all the evidence from Visible Learning when he read it) and there are points where he freely admits to basically taking an educated guess at what some of Hattie’s influences actually refer to.

So, having gone on to read quite a lot online about Hattie’s work and continuing to encounter the same issue, I keenly started on Visible Learning for Teachers, and was enormously disappointed with it. I was expecting non-technical clarification and additional detail about the meta-analyses; instead it is an attempt to leave all the detail behind and draw some conclusions about the implications for teachers. A worthy aim, but a good couple of hundred pages longer than necessary; it reminded me of Jane Eyre!

It wasn’t long after reading this that the methodological issues with Visible Learning started to be spoken of more prominently and although I have continued to use the list of effect sizes as a kind of quick reference to support some ideas about effective teaching, I’ve more-or-less left it at that. So the video posted on Tom Sherrington’s blog over the summer blew me away somewhat – here was the clear, coherent message that was missing from Visible Learning for Teachers. Subsequently, and spurred on by my recent error, I’ve gone back to the original Visible Learning. I really see no reason why Hattie thought that teachers needed this interpreting; it’s not very technical and the introductory and concluding chapters draw the threads together at least as well as anything in the Teachers’ version. That fundamental issue still remains though, that for at least some of the influences, the meaning is hazy. On the other hand, the references are clear, and working at a university I am lucky enough to have unobstructed access to many of them.

It’s therefore time to do some reading, and sort out the nature of the influences that remain unclear to me. My plan is to take each influence in order from Visible Learning and do just enough to feel confident of its meaning. I’m hoping that for most influences this will just involve reading the relevant page or two of Visible Learning (a lot are very clear), but for some I expect to need to go back to the most prominent original meta-analysis to see what it was actually about. Hattie starts with the section on Contributions from the Student: Background. Prior achievement (Effect Size = 0.67) is clear enough, but Piagetian programs (Effect Size = 1.28) is not (I had assumed this was things like CASE and CAME – which have been shown to be very effective – so that shows how much I need to do this reading). I can’t make much sense of Hattie’s paragraph on this so, here we go. I’ll let you know how I get on.

A Little Meeting with Ofsted

Having been to the unmistakably impressive UCAS building in Cheltenham a few weeks ago as a member of the UCAS Teacher Training Advisory Group, I walked right past Ofsted’s London office. As an organisation, Ofsted holds such a prominent place in the English education system that you couldn’t possibly miss it, and for no very good reason I think I expected its offices to be impossible to miss too. Rectifying my error, I engaged with the very pleasant G4S reception people and in short order Sean Harford came down to meet me and squeezed me into a very bijou meeting room with Angela Milner. I think the ensuing discussion was helpful in clarifying for me some of the issues around ITE inspections and how these are going to work under the new, new framework, which has started but only on a small scale this year (I think there were ten part 1 inspections last term, so there will have been ten completed inspections by Christmas).

Previous reports of meetings with senior Ofsted people have been very positive and Sean and Angela didn’t let the side down. I think it was very much a discussion about where Ofsted are at with ITE inspection, and the thinking behind that position, rather than anything earth-shattering that might make a big difference in the future. We covered most of the things I have been thinking about, although there are a couple of things I might expand on a bit now the dust has settled (actually the Ofsted office wasn’t dusty but you really couldn’t have swung a cat in that meeting room).

If there was a theme to the meeting, it was that Angela and Sean were very focused on the two closely tied issues of the Ofsted ITE remit and the quality of NQTs in our schools. I got a sense of awareness – not so sure about sympathy – of the difficult decisions we have to make when viability of ITE provision and quality of trainee teachers are not necessarily served by the same choices. But I think I came out of the meeting more aware than when I went in that, essentially, the Ofsted line is that they are commissioned to report on quality of ITE in terms of the quality of the NQTs produced, and they do not see taking into account the difficulties that providers experience in achieving that as part of their job. This came through most clearly in discussing the validity of Ofsted grading. I had suggested that a provider might be performing minor miracles with weak trainee teachers but still come up short in comparison to another provider with the reputation to attract stronger applicants; Sean’s view was that children are only affected by the quality of the NQT, not the progress they’ve made to get there. I think he has a good point, however harsh that might be. The same theme came through with recruitment decisions – if a provider is accepting marginal applicants, that’s their call, but Ofsted aren’t interested in how far their training takes them, only in how good they are at the end of it. If the alternative is to close down, or to exacerbate the teacher recruitment shortage, that’s not an issue within Ofsted’s remit.

If you read my earlier blog post on ITE inspections you may remember that I suggested the elephant in the ITE room was the School Direct route. I got a bit more of a sense of sympathy here – you can’t be involved in ITE without being very conscious of the enormous upheaval that shifting so many places to SD has caused. Again, though, the message was that it’s the outcome that matters, not the training route. So ITE inspections will be looking at a mixture of PL and SD trainee teachers in part 1 and PL and SD NQTs in part 2, and whilst these will be looked at as separate groups (as for different subjects and phases), the quality of training is judged on performance, and no account will be taken of the route: not the advantages of SD’s greater experience in the one school, nor the disadvantages of poorer opportunities for wide experience, nor the difficulties of maintaining standards. So I guess that’s a level playing field, at least. If it’s harder to maintain high standards of training across SD provision, then that’s tough on old-framework Grade 2 providers that had to get heavily involved in SD to maintain numbers; if SD confers an advantage because more observed trainees and NQTs will be well established in their schools, then that’s tough on the Grade 1 providers that had protected allocations and didn’t see the writing on the DfE wall. So maybe it’ll all come out in the wash, but it’s tricky for providers, who have a tremendously difficult balance to strike between holding Alliances to account for weaknesses in their SD provision and not pissing them off so they go looking for a softer option, taking the money with them.

Perhaps more importantly, from the perspective of those not directly affected by ITE inspections, rolling PL and SD together in this way will make it difficult to judge whether SD is, in general, providing a better, worse, or just different training route. That’s a massive question and Ofsted are the only people likely to be able to make a reasonably impartial judgement. The DfE and NCTL have too much to lose, having promoted it so fiercely; and it’s caused too much damage to the established HEI providers for their view not to be easily dismissed as partisan. Ofsted still have work to do to persuade everyone that they are completely apolitical (see, e.g., this Times Higher Ed article), but I would like to see them report on the overall quality of SD and their perception of the strengths and weaknesses of the model at some point in the future – maybe it will be late 2015 before they’ve inspected enough SD to have any useful evidence. Meanwhile, they could undo some of the damage from the March 2013 statement by Michael Wilshaw, which seemed calculated to lend support to DfE and NCTL policy, by publishing an update on the relative performance of HEIs compared with SCITTs and other employment-based routes – both Sean and Angela seemed to think that grade breakdowns were pretty comparable.

Of course, all this talk of measuring outcomes and judging the provider on the quality of the product still depends on being able to measure with accuracy, validity and reliability. I don’t think anyone meeting with Ofsted has come away with the sense that Ofsted believe their judgements are infallible, and Sean was quite open about the possibility that not every inspection report was perfect. The discussion on how Ofsted might take this forward was very brief and, if I happen to find myself in this kind of situation again, it’s the area I would want to ask about more. I remain astonished at how quickly Ofsted seemed to roll over when Rob Coe suggested individual lesson grades were unreliable. Maybe that was an open door waiting to be pushed, and really it was just the discrepancy between policy and practice that remained, but Michael Wilshaw did respond with “Which ivory towered academic, for example, recently suggested that lesson observation was a waste of time – Goodness me!” so I don’t think it was a fait accompli. If Ofsted had engaged with research more, they would either have already found themselves in agreement with Rob, or would have had the ammunition to hold their ground. I’m not suggesting individual lesson observation grades would be a good thing (and Sean didn’t miss the opportunity to state clearly that ITE inspections do not grade individual lessons), just that the response to Rob’s message suggests more uncertainty within Ofsted than they might be comfortable admitting.

Perhaps more of a thought out loud than anything stronger, but whilst there is obviously a moderation process as part of training inspectors, Sean did express an interest in what would be termed ‘blind second-marking’ in a university context. Interestingly, he said something similar when he met Andrew Smith. It’s not an area I can claim any expertise in, but I am pretty sure that there are various ways in which these measurement issues could be investigated. This data, showing that KS2 level across a cohort significantly influences secondary school Ofsted grade, is one example, but there are much more sophisticated regression analysis techniques that might be relevant (although maybe Ofsted should be starting with Section 5 inspections rather than ITE if they are going to commission this kind of research).

A minor point about Part 2 of the inspection was clarified. The reference in the framework to NQTs/former trainees is purely because FE trainees don’t become NQTs, so Ofsted definitely won’t be looking at any trainee beyond their first term as an NQT (or former trainee, if in FE) during ITE inspections.

Both Angela and Sean were very clear that the Part 2 inspection of NQTs was about how well-prepared they were, not some kind of bald snapshot of their teaching in one observed lesson. They were as quick to raise the sample-size issue as I was, and their model noticeably reflected education research methodology, in which only large samples allow conclusions to be drawn across contexts, but small samples often provide richer information because the data goes deeper and can be, in fact has to be, considered in context. I found this reassuring because it makes the precise timing of this part of the inspection less critical. I think the difficulty for providers will be that a lot will be riding on the way in which NQTs report their experience – the inspectors will need to be pretty astute to distinguish the NQT who has been given loads of personalised support and an extensive toolkit to take into their NQT year but hasn’t engaged with, and drawn on, it very effectively, from the NQT given the same support and toolkit who can rattle off a list when asked and explain how they are using it.

I was pleased that Sean and Angela were talking much more about the quality of information and preparation for NQTs, and the quality of information passed on to schools, rather than the support provided to NQTs, particularly since an inspection team might be looking at NQTs outside the provider’s partnership. Some schools engage really well with the local HEI that has trained their NQT but many don’t, and aren’t keen to release NQTs for this purpose either, and it’s not something providers can always influence. Also, during Section 5 inspections in schools the inspection team will normally sit down with the NQTs and look at the support the school have provided, so that should help significantly in persuading schools that continued engagement with training providers might be worthwhile.

The final thing that came through loud and clear was that the focus on behaviour was going to become significantly stronger. I guess this is unsurprising in the week that Ofsted have published a report on low-level disruption in schools, and during a period when they are trying to move away from giving an impression that behaviour in typical schools is pretty good. I’ve put forward my views on behaviour training in ITE before and hope that everyone involved in training teachers can use this Ofsted priority to collaborate on finding best (or better) practice. I’m in no doubt that if inspection teams find NQTs struggling with behaviour, they will be asking hard questions about whether their training exposed them to a wide enough variety of kids and gave them the tools for the battle. Again, a lot is required of inspectors to correctly distinguish the NQT having a ding-dong battle with a difficult Y10 class, but holding up and gradually turning the tide, from the NQT who wasn’t so well prepared but doesn’t have such a tricky class, or who never needed any help with behaviour. With that massive proviso, I am prepared to concede that providers are not entirely at the mercy of the quality of the school their NQTs are working in.

My remaining bone of contention is the emphasis on Grade 3 trainee teachers being unacceptable. Angela was clear that the process of grading was holistic and involved working up from the bottom, to establish first whether all the Grade 4 criteria were met and then whether there was evidence to award the next grade up and so on, as described in the Handbook. To me that still seems as though one Grade 3 NQT might be a sticking point and Angela and Sean didn’t entirely convince me that it wouldn’t be. In effect I got the sense that, in making the grading judgement, the door might still be unlocked, if not ajar, if the provider could demonstrate a convincing narrative of significant personalised support, extended placement or additional experience, and clear advice and follow-up for both the NQT and employing school. If I’ve interpreted that correctly then I at least am clear that what we are doing at Southampton will be recognised by Ofsted but, as with the developing Section 5 inspection process, I do worry that the ability of the provider to narrate convincingly might be more significant than whether or not they’ve actually done the business. I also think that while the pressure for providers to convince themselves that a Grade 3 trainee is actually a Grade 2 is not quite so remorseless, it’s still definitely there. I don’t understand why that first sentence in the first criterion is included; surely stating that “Trainees demonstrate excellent practice in some of the standards for teaching and all related to their personal and professional conduct” would have covered it without preceding it with “All primary and secondary trainees awarded QTS exceed the minimum level of practice expected of teachers” [their emphasis]. The former statement is about the quality of trainees; the latter is about the numbers assigned to them.

So that’s my summary of what came out of the meeting. From my point of view it has clarified some of the intention behind the revised framework and demonstrated that Ofsted have at least understood the difficulties that remain with inspecting ITE providers in a way that genuinely recognises the best provision. It still feels like a big, and very pointy, stick to me; well, a big and pointy axe, really. I guess we’ll have to see how well it’s wielded – the right intentions aren’t enough (and don’t even think about trying to wield anything in that meeting room). I wish any colleagues inspected last term and waiting for Part 2 the best of luck (bet you didn’t get much summer holiday this year!). For what it’s worth, here’s what I would still like to see from Ofsted:

  • Get rid of the statement on outcomes that makes everyone paranoid about Grade 3 trainees rather than paranoid about whether they are doing the best for all trainees
  • Openly commission some research into validity and reliability of inspection judgements (fair enough if this starts with Section 5)
  • Continue to work with providers to find the best way to time inspections to avoid ‘funny’ weeks – Angela seemed keen on this
  • Be even more clear that sample sizes are small and judgements shouldn’t be skewed by particular incidents, or individual trainee or NQT performances
  • Be even more clear that it is the preparation of trainees, the package that NQTs take with them, and the extent to which they can draw on that package, that is being judged, and not just the quality of their teaching regardless of school support and context
  • Produce an update on the report on types of providers, published early in the life of the last framework – the data for all inspections under that framework must now be available
  • Consider producing an analysis of the strengths and weaknesses seen in SD compared to PL routes as soon as there have been enough inspections to do so

Many thanks, Sean and Angela, for making time to see me, and continuing the very welcome engagement of Ofsted with the people being inspected.