NAO Report: Training New Teachers

Some time ago, quite soon after I moved into an ITE job at the University of Southampton, I posted my thoughts on the relative merits of university-led and School Direct training routes. Looking back now, I would summarise the post as essentially suggesting that there were some advantages to SD that universities ought to have sorted out ages ago, but there are problems with SD too, and all sorts of quality-control issues. A surprising number of people seemed to think I got that about right.

Since then things have progressed and, for example, I’m now doing a bit of tutoring for Teach First so understand that programme much better than before. I’ve worked with trainee teachers from a SCITT as well. And, in general, I have just seen more of the system, its triumphs, and its disasters. At the same time, we’ve been going through a series of desperate measures by the NCTL to boost recruitment (bursaries at SLT salary levels, abandonment of allocations, QTS on the side for undergraduates) and denials of the blindingly, bloody obvious from the DfE about current recruitment and retention levels. So things have progressed but I remain worried about the future of ITE in this country.

The National Audit Office report was much-needed and I strongly suggest anyone with an opinion on ITE at the system level should read it. However, this post is prompted as much by what’s not in the report, as by what is.

Everyone in ITE – university tutors, school-based training co-ordinators, TF leadership development officers, and particularly the many individual mentors who are generally adding the demands of mentoring on top of their own teaching workloads with very little compensation – is working like a Trojan to deliver the best possible training for new entrants to the profession, but it is all made so much more difficult by the lack of any stability in the system. If anyone has the impression that the DfE have a carefully thought-through plan, are proceeding intelligently, and properly evaluating as they go then I haven’t met them yet. Sure, there is a policy direction, but that’s not the same thing at all. The NAO report states “The Department… does not yet have sufficient information about long-term costs and the extent to which each route, and increasing schools’ role in the process, has improved teaching standards”. I think that’s very generous!

On the other hand, powerful people associated with the DfE are hardly unique in spending some time getting the feel of things, thinking they have the answer to making a significant improvement, and then ploughing ahead with lots of determination and not much sensitivity to feedback. Read The Blunders of Our Governments for further insight!

In a better world, what would be the questions it would help to know the answers to in ITE? Here is my current list:

  1. Which training routes, or aspects of training, tend to produce the best teachers?
  2. Which training routes, or aspects of training, tend to produce teachers who stay in teaching?
  3. For both the above questions, what is the answer in absolute terms, and what is the answer when looking at value-added?
  4. What elements of the various selection processes correlate with successful outcomes?
  5. How much does each training route actually cost the taxpayer?

There are plenty of people that will happily pontificate on these, and probably provide an answer, but I’m yet to be convinced that anyone can back their assertions up with convincing evidence.

I believe Education Datalab are about to report on some aspects of Q2. That’ll be a great start! And from what I know about this project it has the potential to provide a permanent and rich source of information to relate training to retention and other aspects of early careers in teaching.

There have been a couple of commendable attempts to evaluate the impact of Teach First on children’s outcomes too, but as far as I know, that’s about it for teacher quality. Trying to measure the effectiveness of individual teachers is a significant problem but, if you are trying to identify trends across large groups of trainees, then it is certainly possible. Ofsted make some kind of attempt to measure absolute outcomes but it’s based on a small number of single observations and some pretty arbitrary judgements, is almost certainly unreliable, and in the end all ITE is effectively graded on a two-point scale, so that’s not an awful lot of use.

Given how long universities have had to work on selection criteria, and the research expertise around in some of them, it’s a bit embarrassing that selection procedures haven’t been more thoroughly investigated. To be fair, though, medical schools are only just beginning to get their act together on this too, and the outcome metrics for doctors are probably rather simpler to sort out than for teachers.

Maybe we do know the answer to Q5. The NAO report contains the graph below but these are not simple calculations because trainee teachers’ cost impacts and benefits in schools are complex. The thing that puzzles me is that we pay schools for placements and, although they provide some training, £3000 per trainee seems like a very high net cost. Also, although I suspect TF is expensive, they must save about half a salary in most schools even with days out, lighter timetables, mentor remission etc. I would like to see details of the source analysis for this graph.

[Figure: NAO costs]

There are the beginnings of a project, in which I have a hand, to try to develop a value-added model of evaluation that can be applied to ITE. This is important because absolute measures are likely to assess the quality of successful applicants, and that’s definitely not the same as the quality of training. It isn’t going to be easy and, at first, it’s likely to be a bit ropey because the measurement of both initial potential and NQT teacher effectiveness is problematic. However, if we can get some momentum going, and perhaps tie it in with some of the work happening elsewhere like the Education Datalab project, then we just might be able to start to fill the gap the DfE don’t seem to be addressing. Let’s hope so. It would be a startling revelation if we could actually point at robust data and say “Look! This is working better than that – now let’s figure out why.” If we want an evidence-informed profession, finding out what really works in training teachers might be quite a good move. We certainly haven’t got any spare trainee teachers to break!
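To illustrate why absolute and value-added measures can point in opposite directions, here is a minimal sketch. Everything in it is an assumption made for illustration: the two providers, the 0-100 scale, and the scores are invented, and a simple gain score (outcome minus intake) stands in for whatever model a real project might use.

```python
from statistics import mean

# Hypothetical intake and end-of-training scores on an arbitrary 0-100 scale.
# These numbers are invented for illustration; they are not real data.
providers = {
    "Provider A": {"intake": [70, 75, 80], "outcome": [78, 80, 85]},
    "Provider B": {"intake": [50, 55, 60], "outcome": [70, 72, 77]},
}


def absolute_outcome(scores):
    """Mean end-of-training score: reflects applicants as well as training."""
    return mean(scores["outcome"])


def value_added(scores):
    """Mean gain per trainee: a crude stand-in for a value-added measure."""
    return mean(o - i for o, i in zip(scores["outcome"], scores["intake"]))


best_absolute = max(providers, key=lambda name: absolute_outcome(providers[name]))
best_value_added = max(providers, key=lambda name: value_added(providers[name]))

for name, scores in providers.items():
    print(name, absolute_outcome(scores), value_added(scores))
print("Best on absolute outcome:", best_absolute)
print("Best on value added:", best_value_added)
```

With these made-up numbers the provider with the stronger intake comes out ahead on the absolute measure, while the other comes out ahead on gain. A simple gain score ignores measurement error and ceiling effects, so it is only a starting point.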


A Little Meeting with Ofsted

Having been to the unmistakably impressive UCAS building in Cheltenham a few weeks ago as a member of the UCAS Teacher Training Advisory Group, I walked right past Ofsted’s London office. As an organisation, it holds such a prominent place in the English education system that you couldn’t possibly miss it, and for no very good reason I think I expected the offices to be impossible to miss too. Rectifying my error I engaged with the very pleasant G4S reception people and in short order Sean Harford came down to meet me and squeezed me into a very bijou meeting room with Angela Milner. I think the ensuing discussion was helpful in clarifying for me some of the issues around ITE inspections and how these are going to work under the new, new framework, which has started but only on a small scale this year (I think there were ten Part 1 inspections last term, so there will have been ten completed inspections by Christmas).

Previous reports of meetings with senior Ofsted people have been very positive and Sean and Angela didn’t let the side down. I think it was very much a discussion about where Ofsted are at with ITE inspection, and the thinking behind that position, rather than anything earth-shattering that might make a big difference in the future. We covered most of the things I have been thinking about, although there are a couple of things I might expand on a bit now the dust has settled (actually the Ofsted office wasn’t dusty but you really couldn’t have swung a cat in that meeting room).

If there was a theme to the meeting it was that Angela and Sean were very focused on the two closely tied issues of the Ofsted ITE remit, and the quality of NQTs in our schools. I got a sense of awareness – not so sure about sympathy – for the difficult decisions we have to make when viability of ITE provision, and quality of trainee teachers, are not necessarily served by the same choices but I think I came out of the meeting more aware than when I went in that essentially the Ofsted line is that they are commissioned to report on quality of ITE in terms of the quality of the NQTs produced and they do not see taking into account the difficulties that providers experience in achieving that, as part of their job. This came through most clearly in discussing validity of Ofsted grading. I had suggested that a provider might be performing minor miracles with weak trainee teachers but still come up short in comparison to another provider with the reputation to attract stronger applicants; Sean’s view was that children are only affected by the quality of the NQT, not the progress they’ve made to get there. I think he has a good point, however harsh that might be. The same theme came through with recruitment decisions – if a provider is accepting marginal applicants, that’s their call but Ofsted aren’t interested in how far their training takes them, only in how good they are at the end of it. If the alternative is to close down, or exacerbate the teacher recruitment shortage, that’s not an issue within Ofsted’s remit.

If you read my earlier blog on ITE inspections you may remember that I suggested the elephant in the ITE room was the School Direct route. I got a bit more of a sense of sympathy here – you can’t be involved in ITE without being very conscious of the enormous upheaval shifting so many places to SD has caused. Again, though, the message was that it’s the outcome that matters, not the training route. So ITE inspections will be looking at a mixture of PL and SD trainee teachers in part 1 and PL and SD NQTs in part 2, and whilst these will be looked at as separate groups (as for different subjects and phases) the quality of training is judged on the performance and no account will be taken of the route or the advantages of SD’s greater experience in the one school, or the disadvantages of poorer opportunities for wide experience, or the difficulties of maintaining standards. So I guess that’s a level playing field, at least. If it’s harder to maintain high standards of training across SD provision then that’s tough on old framework Grade 2 providers that had to get heavily involved in SD to maintain numbers; if SD confers an advantage because more observed trainees and NQTs will be well-established in their schools then that’s tough on the Grade 1 providers that had protected allocations and didn’t see the writing on the DfE wall. So maybe it’ll all come out in the wash but it’s tricky for providers who have a tremendously difficult balance to strike between holding Alliances to account for weaknesses in their SD provision and not pissing them off so they go looking for a softer option, taking the money with them.

Perhaps more importantly, from the perspective of those not directly affected by ITE inspections, rolling PL and SD together in this way will make it difficult to judge whether SD is, in general, providing a better, worse, or just different training route. That’s a massive question and Ofsted are the only people likely to be able to make a reasonably impartial judgement. The DfE and NCTL have too much to lose, having promoted it so fiercely; and it’s caused too much damage to the established HEI providers for their view not to be easily dismissed as partisan. Ofsted still have work to do to persuade everyone that they are completely apolitical e.g. see this Times Higher Ed article but I would like to see them report on the overall quality of SD and their perception of the strengths and weaknesses of the model at some point in the future – maybe it will be late 2015 before they’ve inspected enough SD to have any useful evidence. Meanwhile, they could undo some of the damage from the March 2013 statement by Michael Wilshaw that seemed calculated to lend support to DfE and NCTL policy, by publishing an update on the relative performance of HEIs, against SCITTs and other employment-based routes – both Sean and Angela seemed to think that grade breakdowns were pretty comparable.

Of course, all this talk of measuring outcomes and judging the provider on the quality of the product still depends on being able to measure with accuracy, validity and reliability. I don’t think anyone meeting with Ofsted has come away with the sense that Ofsted believe their judgements are infallible and Sean was quite open about the possibility that not every inspection report was perfect. The discussion on how Ofsted might take this forward was very brief and, if I happen to find myself in this kind of situation again, it’s the area I would want to ask about more. I remain astonished at how quickly Ofsted seemed to roll-over when Rob Coe suggested individual lesson grades were unreliable (maybe that was an open door waiting to be pushed and really it was just the discrepancy between policy and practice that remained but Michael Wilshaw did respond with “Which ivory towered academic, for example, recently suggested that lesson observation was a waste of time – Goodness me!” so I don’t think it was a fait accompli). If Ofsted had engaged with research more they would either have already found themselves in agreement with Rob, or would have had the ammunition to hold their ground. I’m not suggesting individual lesson observation grades would be a good thing, and Sean didn’t miss the opportunity to state clearly that ITE inspections do not grade individual lessons, just that the response to Rob’s message suggests more uncertainty within Ofsted than they might be comfortable admitting.

Perhaps more of a thought out loud than anything stronger, but whilst there is obviously a moderation process as part of training inspectors, Sean did express an interest in what would be termed ‘blind second-marking’ in a university context. Interestingly he said something similar when he met Andrew Smith. It’s not an area I can claim any expertise in but I am pretty sure that there are various ways in which these measurement issues could be investigated. This data, showing that KS2 level across a cohort significantly influences secondary school Ofsted grade, is an example but there are much more sophisticated regression analysis techniques that might be relevant (although maybe Ofsted should be starting with Section 5 inspections rather than ITE if they are going to commission this kind of research).

A minor point about Part 2 of the inspection was clarified. The reference in the framework to NQTs/former trainees is purely because FE trainees don’t become NQTs so Ofsted definitely won’t be looking at any trainee beyond their first term as an NQT (or former trainee if in FE) during ITE inspections.

Both Angela and Sean were very clear that the Part 2 Inspection of NQTs was about how well-prepared they were, not some kind of bald snapshot of their teaching in one observed lesson. They were as quick to raise the sample size issue as I was and their model was quite a noticeable reflection of education research methodology where only large sample sizes allow conclusions to be drawn across contexts, but small samples often provide richer information because the data goes deeper and can be, in fact has to be, considered in context. I found this reassuring because it makes the precise timing of this part of the inspection less critical. I think the difficulty for providers will be that a lot will be riding on the way in which NQTs report their experience – the inspectors will need to be pretty astute to spot the NQT who has been given loads of personalised support and an extensive toolkit to take into their NQT year but hasn’t engaged with, and drawn on, it very effectively, as against the NQT given the same support and toolkit who can rattle off a list when asked and explain how they are using it.

I was pleased that Sean and Angela were talking much more about the quality of information and preparation for NQTs, and the quality of information passed on to schools, rather than the support provided to NQTs, particularly since an inspection team might be looking at NQTs outside the provider’s partnership. Some schools engage really well with the local HEI that has trained their NQT but many don’t, and aren’t keen to release NQTs for this purpose either, and it’s not something providers can always influence. Also, during Section 5 inspections in schools the inspection team will normally sit down with the NQTs and look at the support the school have provided, so that should help significantly in persuading schools that continued engagement with training providers might be worthwhile.

The final thing that came through loud and clear was that the focus on behaviour was going to become significantly stronger. I guess this is unsurprising in the week that Ofsted have published a report on low-level disruption in schools, and during a period when they are trying to move away from giving an impression that behaviour in typical schools is pretty good. I’ve put forward my views on behaviour training in ITE before and hope that everyone involved in training teachers can use this Ofsted priority to collaborate on finding best (or better) practice. I’m in no doubt that if inspection teams find NQTs struggling with behaviour, they will be asking hard questions about whether their training exposed them to a wide enough variety of kids and gave them the tools for the battle. Again, a lot is required of inspectors to correctly distinguish the NQT having a ding dong battle with a difficult Y10 class, but holding up and gradually turning the tide, from the NQT who wasn’t so well prepared but doesn’t have such a tricky class, or who never needed any help with behaviour. With that massive proviso I am prepared to concede that providers are not entirely at the mercy of the quality of the school their NQTs are working in.

My remaining bone of contention is the emphasis on Grade 3 trainee teachers being unacceptable. Angela was clear that the process of grading was holistic and involved working up from the bottom, to establish first whether all the Grade 4 criteria were met and then whether there was evidence to award the next grade up and so on, as described in the Handbook. To me that still seems as though one Grade 3 NQT might be a sticking point and Angela and Sean didn’t entirely convince me that it wouldn’t be. In effect I got the sense that, in making the grading judgement, the door might still be unlocked, if not ajar, if the provider could demonstrate a convincing narrative of significant personalised support, extended placement or additional experience, and clear advice and follow-up for both the NQT and employing school. If I’ve interpreted that correctly then I at least am clear that what we are doing at Southampton will be recognised by Ofsted but, as with the developing Section 5 inspection process, I do worry that the ability of the provider to narrate convincingly might be more significant than whether or not they’ve actually done the business. I also think that while the pressure for providers to convince themselves that a Grade 3 trainee is actually a Grade 2 is not quite so remorseless, it’s still definitely there. I don’t understand why that first sentence in the first criterion is included; surely stating that “Trainees demonstrate excellent practice in some of the standards for teaching and all related to their personal and professional conduct” would have covered it without preceding it with “All primary and secondary trainees awarded QTS exceed the minimum level of practice expected of teachers” [their emphasis]. The former statement is about the quality of trainees; the latter is about the numbers assigned to them.

So that’s my summary of what came out of the meeting. From my point of view it has clarified some of the intention behind the revised framework and demonstrated that Ofsted have at least understood the difficulties that remain with inspecting ITE providers in a way that genuinely recognises the best provision. It still feels like a big, and very pointy, stick to me, well a big and pointy axe, really. I guess we’ll have to see how well it’s wielded – the right intentions aren’t enough (and don’t even think about trying to wield anything in that meeting room). I wish any colleagues, inspected last term and waiting for Part 2, the best of luck (bet you didn’t get much summer holiday this year!). For what it’s worth, here’s what I would still like to see from Ofsted:

  • Get rid of the statement on outcomes that makes everyone paranoid about Grade 3 trainees rather than paranoid about whether they are doing the best for all trainees
  • Openly commission some research into validity and reliability of inspection judgements (fair enough if this starts with Section 5)
  • Continue to work with providers to find the best way to time inspections to avoid ‘funny’ weeks – Angela seemed keen on this
  • Be even more clear that sample sizes are small and judgements shouldn’t be skewed by particular incidents, or individual trainee or NQT performances
  • Be even more clear that it is the preparation of trainees, the package that NQTs take with them, and the extent to which they can draw on that package, that is being judged, and not just the quality of their teaching regardless of school support and context
  • Produce an update on the report about types of providers early in the life of the last framework – the data for all inspections under that framework must now be available
  • Consider producing an analysis of the strengths and weaknesses seen in SD compared to PL routes as soon as there have been enough inspections to do so

Many thanks, Sean and Angela, for making time to see me, and continuing the very welcome engagement of Ofsted with the people being inspected.

Is Ofsted helping to improve ITE? Part II

My first post on Ofsted inspections of ITE set out where I am coming from, and considered the purpose of these inspections. It concluded:

“So if Ofsted were to step back from reporting on good practice, and if the difference between Grade 1 and 2 (over 80% of providers) has a rather arbitrary effect on available provision, that leaves Ofsted as an effective enforcer of absolute minimum standards and a possible pressure, and possible guide, to improving the quality of training. The former role requires reliable differentiation between Grade 1/2 and Grade 3/4; the latter two require valid measurement of training quality, and the ‘guide’ bit requires accurate identification of strengths and weaknesses. In this second post, I’ll try to dig into the issues of reliability, validity, and accuracy that my original comment alluded to.”

For those of you who are not aware of how an ITE inspection works, the call comes first thing Thursday, with the inspection starting on Monday. For the two years the previous Framework operated, this could be at any point in the academic year. The inspectors look at statutory requirements; data (on outcomes and tracking of progress, and NQT surveys about their training); observe training sessions (if there are any); observe trainees teaching to get an idea of their progress, to look at the quality of mentoring, and for evidence of good training showing in their teaching; and observe NQTs (and maybe RQTs) teaching to judge the quality of the final product. That’s my summary, for more information check the Handbook.

When we were inspected, secondary trainees were observed right at the beginning of their second placement i.e. day 3, so the ones affected only found out with a weekend’s notice that not only were they going to be teaching a class on Monday, in a school they didn’t know, but it was going to be with an Ofsted inspector observing. I thought that was an unacceptably awful thing to do to trainees. The inspection team handled it sensitively but I just felt grossly unprofessional about the whole thing. It’s less important, but clearly there is also a major issue with reliability here, too. How can inspectors reasonably be expected to judge trainee progress if one lot are observed in their first placement, others on day 3 of their second placement, and at another provider they are observed after many weeks of teaching?

This is one of the main drivers behind my original comment about the way in which Ofsted inspects ITE. However, under the new framework this has been sorted out. Hurrah! The changes are probably best summarised in the revisions to the framework but under the newest framework, there will be a summer inspection which will include observation of training sessions (if there are any) and trainees teaching; and an autumn inspection which will focus on observation of NQTs (and maybe RQTs) teaching. There is still an issue with HEI courses finishing placements at Whitsun, and SDs going to the end of the summer term, and some weeks having training sessions, and some weeks none, so I do think Ofsted really need to get calendar info before setting dates if they want to improve reliability by comparing like with like, but it is a solid step away from ‘dreadful’ and actually I think quite a bold and imaginative idea.

The second thing that really upsets me about Ofsted is the pressure on ITE providers over grading trainees. Under the Grading Descriptors on p.33 the Handbook states that for Grade 1 or Grade 2 ITE providers “all trainees awarded QTS exceed the minimum level of practice expected of teachers as defined in the Teachers’ Standards”. That word ‘exceed’ is critical; in other words, if any trainee gets a Grade 3 then an ITE provider Requires Improvement. I think this is probably a remaining ripple from the big splash caused by the changing of Grade 3 from ‘Satisfactory’ to ‘Requires Improvement’. At Grade 3 a trainee meets the Teachers’ Standards and therefore will be awarded QTS but where once this was Satisfactory, it no longer is. Providers certainly ought to be trying to provide extended placements with extra support to reach Grade 2 before gaining QTS but it also ought to be acceptable for providers to work hard to support Grade 3 NQTs in schools. At the moment, this is a very dangerous strategy because a Grade 3 might not go into teaching (but will still be in the data, and qualified). The incentive to find some spurious evidence and chance upgrading them before awarding QTS is obvious. We have taken the right approach at my university; I will be fuming if that comes back to bite us.

Of course, the alternative is to find some spurious evidence and fail them. If we are really saying that we don’t want these trainees in the profession then, fine, but the Teachers’ Standards and/or award of QTS needs changing to reflect the standard required. Don’t just tell providers that Grade 3 meets the Standards for QTS but it isn’t acceptable to let anyone at this standard be awarded QTS. And, of course, completion rates are significant data in an inspection. Just like exclusion rates for schools, high completion rates might demonstrate excellent recruitment and training, but they could also reflect over-grading and low standards. Good recruitment decisions obviously help with completion rates but where is the evidence that there is a reliable way to discriminate all the potentially good teachers? Where are the science and maths teachers we need going to come from if we only take dead certs?

Anyway, those are the two points that led to my labelling ITE inspections ‘dreadful’, so it’s one down and one to go for Ofsted on fixing these. I will now try to get some perspective on the issues with reliability, validity, and accuracy, promised at the start of this post.

So here are some of the reliability problems with ITE inspections:

  • Even under the new Framework, inspectors are likely to see different things at different providers depending on when in the summer term they visit. This is not easily resolved but I would like to see Ofsted acknowledging the challenge, at least.
  • The amount of training observed is likely to be tiny (if any). I think the danger of a poor session from one trainer tarring the whole course with the same brush is too high.
  • NQT observations are attempting to evaluate the quality of the finished product. There is no mention of individual lesson observation grades in the Handbook but our inspection team saw only seven secondary NQTs which leaves an awful lot riding on those individual performances. Hopefully the two-part inspection will increase this number but there is nothing in the Handbook to reassure me that Ofsted are clear about how many are required to ensure reliability isn’t affected by random variation.
  • The same reliability issue affects any comparisons drawn between NQT quality when observed, and grading of trainees at the end of training. The Handbook doesn’t appear to require this but it was a clear feature of our inspection (so maybe the framework has changed).
  • Any observation of NQTs is bound to be influenced by the quality of induction and training provided by the employing school, and their ability to pick NQTs that suit their school. Under the previous framework all schools involved would be in the ITE Partnership, so maybe that’s fair game; under the new Framework I’m not so sure that will be the case.
  • Observation of RQTs is hard to justify (although interviewing them about their training may well be appropriate), because so much will have happened in schools since training. Maybe this won’t be a feature of inspections but the Handbook is a bit ambiguous on this, the phrase being “NQTs/former trainees”.
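To put a rough number on the sample-size worry in the list above, here is a minimal sketch using a Wilson score interval for a proportion. The figure of seven observed NQTs comes from the inspection described above; the assumption that six of them were judged ‘good or better’, and the larger comparison sample of 70, are hypothetical. Nothing here reflects how Ofsted actually reach a judgement.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (z = 1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical: 6 of 7 observed NQTs judged 'good or better'.
small = wilson_interval(6, 7)
# Same hypothetical proportion, but with ten times as many observations.
large = wilson_interval(60, 70)

print("n=7:  %.2f to %.2f" % small)
print("n=70: %.2f to %.2f" % large)
```

The interval for seven observations is several times wider than for seventy, which is the sense in which a handful of individual performances leaves a lot to random variation. This treats each observation as an independent yes/no judgement, which is itself a simplification.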

And here are some of the validity problems:

  • There is no evidence-based way to determine the standard of trainees at the start of their training; so any measure of the quality of outcomes will reflect not only the quality of training but also the quality of applicants. It’s not currently possible to measure ‘value-added’ but there is a sense that this is nonetheless what Ofsted think they are doing. Maybe the argument is that recruitment and training quality together are being evaluated but this is pretty advantageous for the providers with the best reputations who get more applicants. Is reputation really a variable that Ofsted want to include in their inspection outcomes?
  • Completion rates might demonstrate excellent training and support, but they could also reflect over-grading and low standards, as described above. ITE providers must, in the end, be gatekeepers to the profession – children are owed that.
  • The Grade 3 penalty means, as described above, that if the best ITE provider in the country correctly grades a trainee 3 and hasn’t sorted it before inspection then that one piece of data will count more than everything else combined.
  • The new framework places a big emphasis on behaviour. Inspectors won’t be seeing the training, only the performance of trainees and NQTs. What they see will depend an awful lot on context. The NQT having a ding-dong battle (that they will eventually win) with a truculent Y10 class could easily represent outstanding training, whilst the clockwork smoothness of another class might be due to smashing kids, or a trainee for whom good behaviour comes as easily as breathing.
  • The NQT Survey data depends a lot on responses and there is no mechanism for validating the data; our inspection was possibly triggered by a drop in the previously high ratings from this survey but that data was flatly contradicted by our exit point survey data so what happened remains an unsolved mystery.

Finally, on the subject of accuracy, inspectors are in for three days maximum; during this time they may be able to make a fair stab at judging the quality of the provider but I really don’t think that they can achieve a level of understanding that would allow an accurate description of not only what, but why, the provider was doing well in certain areas, or not so well. I think inspectors will tend to see strengths and weaknesses in the presence or absence of the things they value in ITE – confirmation bias at work – and I don’t think that is good enough evidence on which to build world-class initial teacher education.

I’m not actually saying that I think Ofsted ITE judgements are necessarily unreliable or invalid, I’m just saying that there are all these issues that are fairly obvious and I have no sense that Ofsted are engaged in worrying about these things. Maybe it is possible for an inspection team to accurately grade providers on a 1-4 scale, but I think it’s ambitious, and if these judgements aren’t right then Ofsted could be failing to correctly identify providers offering poor quality training, and they could be creating pressure to improve, and offering guidance, that doesn’t actually lead in the direction of genuine improvements – the problem we’ve been seeing in schools until recently.

There have been some very sensible suggestions that school inspections should move to a three-tier grading system and I think this would make sense for ITE. I’m not sure that trying to distinguish Outstanding from Good is terribly helpful whereas getting really effective at distinguishing Requires Improvement from ‘Good or Better’ is terribly important so we don’t have badly trained NQTs entering the system. And this brings me to the massive elephant in the ITE inspection room.


I’m very aware that the effectiveness of the established system of training teachers has been a moot point but it has at least been pretty stable. Now, ITE is going through a massive upheaval. SCITTs are sometimes, effectively, single schools, and SD alliances can be very small too, or dominated by one school. I’m certain some brilliant things will be happening but also sure there will be some disasters. A lot of this new training is, on paper, quality assured by HEIs or well-established SCITTs but SD has put schools in an exceptionally strong position to plough their own furrows. The chaotic nature of all this is entirely the doing of the DfE but it is Ofsted that are ultimately responsible for enforcing standards. SD should have been introduced more gradually but, given that the seeds were all cast at once, it needs a bit of germination time and there may be a few sickly seedlings that will produce excellent crops so it seems a bit harsh for Ofsted to get the hoe out straight away. For this reason, the complete avoidance of SD in our recent inspection is possibly justified, but Ofsted need to quickly be exceptionally clear about how they are going to engage with SD. In particular, I don’t think it is acceptable to lump SD and provider-led training together. Yes, a provider that allows poor quality SD to run on their watch needs to be pulled up on this, but unless somehow this drills down to the decisions made at school-level, providers will be held responsible for decisions made at the periphery of their control (even when their own training is excellent) whilst the school leaders who should have done better (or stayed out of it if they weren’t sure they were going to get it right) remain largely unscathed. If Ofsted tame the elephant, we might all come out of the SD revolution in some semblance of order and then be able to get on with the question of how to make our NQTs even better-prepared for civilisation’s most essential profession. 
If Ofsted don’t get this right, children will suffer.

 

Is Ofsted helping to improve ITE?

A short time ago, I wrote a post about the Carter Review, and my thoughts on the future for Initial Teacher Education. With one casual tweet, the education blogmeister, Tom Bennett, catapulted that post into the limelight (well, maybe into the wings) and several people were kind enough to tweet a smattering of applause, which has provided me with useful encouragement. Thank you.

Sean Harford is Ofsted’s Director, Initial Teacher Education and Regional Director, East of England. He responded to what was possibly not the most thoroughly considered part of my post, by extending an invitation to discuss the Ofsted ITE inspection process. This follows some fairly high profile meetings between senior Ofsteders like Sean, and Mike Cladingbowl, and people like Andrew Smith, Tom Bennett, Tom Sherrington, David Didau, Ross McGill, Shena Lewington et al.

What I actually said was “Do something about the dreadful way in which Ofsted inspects ITE (won’t go into details here but it really sucks)”. That is not terribly nuanced so I think the first thing I need to do is to clarify my own thinking about this. And since it is possible that the university I work for will become known, I should start by stating unequivocally that our most recent inspection, which was under what at the time we were calling the new framework but is now the old framework (i.e. the one that ran from September 2012 to June 2014), was highly professional, very well-led, produced a report which reflected strengths and weaknesses in our courses, and the grade was probably about right. I am making this statement partly to attempt to show that my thoughts on inspection of ITE are not just the rumblings of someone who feels his chips have been pissed on, and partly because the inspectors’ names are obviously on the report and I don’t want anything I say to reflect badly on them.

So, moving on from the preamble, it seems to me that the starting point for thinking about either the ITE Inspection Framework or the wider role of Ofsted in teacher training, is to decide what the purpose of inspection is. At the moment, its primary function is to grade ITE providers and report on the strengths and weaknesses of their provision. What purpose does that serve?

Ofsted grading affects allocation of places to training providers; this is set out clearly for next year but is not new. This is pretty crucial; in schools the difference between grades might have some implications for SLT careers but only an Ofsted disaster usually leads to redundancies. In HEIs the difference between Grade 1 and 2 might well be the difference between financially viable or not, and therefore everyone’s jobs. The impact of the Grade 3 for the University of Leeds will be worth monitoring. This could all be seen as a drive to higher standards – sorting the wheat from the chaff – but this assumes both that the grading is reliable* (at least to within about 1/4 of a grade) and that Ofsted grading has a direct effect on the future of ITE provision (it doesn’t – ITE is much more precarious in a Russell Group or 1994 Group university than in an ex-teacher training college or SCITT because it’s not the main focus of the institution).

Secondly, Ofsted grading might affect trainee choices. I can’t produce any evidence to support this claim but I think that the most astute trainees probably do look at the Ofsted grade (and HEI reputation if relevant), although it is difficult to see how anyone not familiar with the system would correctly compare reports for HEIs, SCITTs, and SD lead schools. The less astute trainees are often thoroughly confused by the variety of training routes and have done shockingly little research before making their decisions so Ofsted reports don’t have any impact on their choices, and even for the first group, I think a lot of decisions are based on geography in the end.

Thirdly, within any given institution, there is likely to be pressure to aspire to an Outstanding grade (even if this pressure is not the same for every provider). This will drive standards up if, and only if, inspection outcomes make a valid measurement of the quality of training. In the end, the reliability* of the grade doesn’t matter for this but it does matter if Ofsted divert attention away from the quality of training towards other things that might influence the inspectors.

Finally, an Ofsted grade of Inadequate would lead to the removal of accreditation by the NCTL so Ofsted inspections have a role in setting a minimum standard. I don’t think there has been a Grade 4 since 2010 but a Grade 3 will lead to a further inspection within 12 months and might lead quite quickly to improvement or annihilation.

Actually, not finally, but it’s instructive that all my first thoughts were focused on the grading. An Ofsted report, of course, also identifies what the inspection team think are the strengths and weaknesses of the provision. If these are accurately identified then the report would be a useful guide to making genuine improvements; if these are not accurately identified then they become a ticklist of things to fix before the next inspection and may have no positive impact on the quality of training. And accurate or not, if the tutors don’t buy in to the conclusions then it will definitely be an exercise in papering over cracks, whether these are structural or cosmetic.

Ofsted also has a secondary role in identifying and reporting particularly good practice but I think these reports tend to be too superficial to do more than point out a direction – with the emphasis at the moment strongly focused on effective partnership. I guess there are some suggestions here for ways of managing partnerships that seem to be working but there isn’t the detail needed to understand why some partnerships work better than others. I think the danger with this secondary function is that ITE providers will start looking for “what Ofsted want” which has been the scourge of many schools and colleges, and we don’t necessarily want every provider running an EAL session in Hungarian, as Durham do, so maybe Ofsted should restrict itself to commenting on themes emerging from its inspections e.g. that the quality of partnerships is often an important difference between the more and less effective providers, with the best providers identified. These providers might then be in the best position to explain to the rest of us exactly what they have done, perhaps with UCET or the TSC etc. helping to co-ordinate this; I think that might be a more effective way of disseminating good practice and it matches the model that hopefully schools and teachers are moving towards, of taking professional responsibility for their own development.

So if Ofsted were to step back from reporting on good practice, and if the difference between Grade 1 and 2 (over 80% of providers) has a rather arbitrary effect on available provision, that leaves Ofsted as an effective enforcer of absolute minimum standards and a possible pressure, and possible guide, to improving the quality of training. The former role requires reliable* differentiation between Grade 1/2 and Grade 3/4; the latter two require valid measurement of training quality, and the ‘guide’ bit requires accurate identification of strengths and weaknesses. In my second post on this, I’ll try to dig into the issues of reliability, validity, and accuracy that my original comment alluded to.

 

*Yes, science teachers, I know this should be “reproducible” but this is social science, not GCSE Physics, so I’m going old skool.

 

#EducationFest No.1: Play up, play up, and play the game

This is the first in a series of posts on the Festival of Education at Wellington College.

I’m a little surprised that Michael Wilshaw chose the glorious surroundings of Wellington College to launch an attack on state sector mediocrity in sport. Given that anyone wanting to attend his speech had to park on the athletics field and walk past the 1st XI cricket pitch with its pavilion the size of a small comprehensive, the possibility that different levels of facilities might contribute to the divide won’t have been far from anyone’s thoughts. I think he made a decent point on Radio 4 about schools working with whatever local facilities they have but I don’t suppose Anthony Seldon has ever needed to know where the local park is. However, I think the facilities are a red herring; that’s not why £33000/year translates into sporting success. The real difference is the 7 day week at Wellington College, and the balance of teaching, residential, and extra-curricular responsibilities that go with a boarding school job. This is a lot more significant to sporting opportunities for pupils than whether or not a school has a boathouse on the Thames. I can’t see the DfE stumping up to give teachers a chunk off their teaching load in exchange for a longer working day and weekend commitments. It’s all very well implying that teachers don’t do these things because they lack ambition and have been subverted into lazy and/or anti-competitive mind-sets by the progressive movement but that ignores the reality. My first teaching job was in an HMC boarding school, and for sure I was working 80+ hours a week with the children, putting in a couple of evenings and a fair chunk of weekends, doing sport. But I also got paid £3K over a state-sector starting salary, had a free flat, got three meals a day, and had someone pick up my laundry and return it cleaned and ironed. My timetable was about 70% compared to the 90% a state sector teacher would expect and I got 19 weeks’ holiday a year.

This is a shame because his speech was actually a rallying call to make comprehensives everything he would like them to be, and whilst I don’t think that having a decent rugby team matters a jot, most of what he said about academic standards, parental responsibility, behaviour, and leadership adds up to not a bad combination to be aiming for. He didn’t really say so much about sport – it certainly didn’t dominate his speech – but given that Ofsted published their report on competitive school sport the same day, with this as the focus of their press release, that’s what everyone will be talking about. I can’t help thinking that Wilshaw was a better headteacher than HMCI and that what he achieved at Mossbourne had far more potential to influence the quality of comprehensive education in this country than all the speeches he makes now about how we all need to pull our socks up. I wonder if he has considered going back into school leadership and leading by example rather than exhortation. I will always listen with an open mind to what he says, because of what he achieved, but the more he suggests it’s just a case of making a bigger effort, the less convinced I will be.

Grade 2 or bust! Perverse incentives

Two things that the DfE have got right – there may be others – are two of the changes to school accountability systems. The Wolf report quite rightly identified the perverse incentives in accountability measures that led to schools pushing pupils into BTECs and other vocational qualifications. And there is no doubt that the 5 A*-C measures and floor targets focused too great a proportion of a school’s attention on the C/D borderline. Now, I’m not suggesting all is rosy in this particular garden – some pupils benefit from doing high-quality vocational qualifications; I’m not convinced that the EBacc is anything other than Gove’s tendency to assume that what worked for him is best for everyone else as well; the changes were handled clumsily at times; and it’s not clear whether long-term planning is anything more than an oxymoron under the current regime, but perverse the previous incentives were, and Progress 8 and the new floor standards just have to be an improvement.

So, this post is about a similar perverse incentive in ITT and the impact it is having on the quality of the NQT you may be working with this, or next, year. In ITT trainees are graded 1-4 on a very similar basis to experienced teachers. A trainee who has not met the Teachers’ Standards is graded 4 and would not be awarded QTS. A trainee who has just met the Teachers’ Standards would be 3, and those consistently teaching Good or Outstanding lessons (possibly a bit rough round the edges but more-or-less on the same basis as experienced teachers) would be graded 2 or 1. There is a bit more to it than that, but the quality of teaching is (quite rightly) key. So far, so good. But how is an ITT provider judged? Well, as an ITT provider, to get a Grade 1 or Grade 2 the inspection handbook states that “all trainees awarded QTS exceed the minimum level of practice expected of teachers as defined in the Teachers’ Standards”. That word ‘exceed’ is critical; in other words, if any trainee gets a Grade 3 then an ITT provider Requires Improvement. Is it just me or is this nuts? Of course, the better the training, the greater the likelihood of trainees being 2 or 1, but if they’re a 3 then they’re a 3 and an incentive like this just means providers have to find some way of them being a ‘2’. We get judged on completion rates as well, so while the genuine 4s get weeded out, we can’t afford to take the same approach to a 3. In any case, a trainee at Grade 3 has met the Teachers’ Standards. If they’re not ready to take their own classes then the Teachers’ Standards need tightening up. Threatening ITT providers with a big stick is just papering over the cracks. Wilshaw has been taking ITT providers to task over the quality of some NQTs recently but his own organisation is pushing us to overgrade trainees.
If these trainees are ready for their NQT year, with the expectation that they improve as they go, then let us say that honestly so everyone knows where they stand; if they’re not ready, then let’s be clear about that too and have a mechanism for dealing with the problem. At the moment, this perverse incentive just sweeps the whole thing under the carpet and that cannot be good for the children in our schools.

Graded Lesson Observations: Defibrillation or a Stake through the Heart?

An observer enters your classroom. Is this person your HoD, the assistant head with responsibility for T&L, an Ofsted inspector, or a demon who has occupied a corpse and is coming to suck your blood? A fair number of commentators have recently suggested the latter and have been sharpening words, and presumably a variety of sticks, with a view to dispatching said vampires to the demon dimensions. Like Rupert Giles, Robert Coe from Durham University CEM (possibly a pseudonym for the Watchers’ Council) has been quietly dispensing the wisdom of the ancients (or rather, academics), guiding the Slayers in their quest. But is the graded lesson observation really the personification of evil, or does it have a soul worth saving?

Wilshaw’s Westminster Education Forum speech on 7th November 2013 included the line: “Which ivory towered academic, for example, recently suggested that lesson observation was a waste of time – Goodness me!” Does Wilshaw need to pay more attention to the ivory towered ones? Is his organisation trying to perform a task as fundamentally uncertain as measuring the combined momentum and position of a sub-atomic particle; is it engaged in a legitimate assessment technique but doing it in a slightly crap way; or is the Ofsted Christmas party actually a masquerade ball of orgiastic hedonism where innocent teachers are dragged to be ripped asunder in a feeding frenzy of unimaginable gore?

In ITT, observations are a big part of how we assess the progress of trainees. It doesn’t feel as though the judgements we make are unreliable; over the course of a number of observations, we would feel confident that an accurate picture of a trainee’s teaching was being drawn. Are we deluding ourselves when we reflect on this practice; are we even capable of reflection…

If you pick up Robert Coe’s blog entry on this you’ll see that he is linking to two pieces of research. The first is the massive (and massively well-funded – thanks Bill & Melinda) MET project. Now, I make no claims to either the academic clout of Robert Coe, or to expertise in this area, but reading the MET policy and practice brief I can see where Coe’s figures are coming from, but not his conclusion that observations are unreliable to the point of worthlessness as a measure of teacher performance. The MET project seems to me to be making suggestions about how to improve the reliability of observations, not concluding that they are good only for a staking. Of course, like Wilshaw, anyone involved in a project called “Measures of Effective Teaching” may be somewhat biased towards the idea that it is actually possible to measure such a thing, and continued research funding may even depend on that outcome, but the MET project is looking at a range of ways to measure teacher effectiveness and I can’t see why, if they were looking at data that suggested observations were a waste of time, they wouldn’t say so and recommend a system based on other measurement methods.

Strong, Gargani & Hacifazlioğlu (2011) is the other piece of research. It’s behind a paywall but for good papers there’s often an academic somewhere that has breached their institution’s copyright rules and posted it somewhere helpful. In interpreting the results, it’s important to appreciate that of the three experiments, two involved judging teachers on the basis of two-minute clips of whole-class teaching (chosen to avoid any behavioural management incidents!). However, the third experiment did involve observations of videos of whole lessons, but using a complex observational protocol – the CLASS tool – that seems to weight student engagement and various other, dare I say it, constructivist ideals quite strongly. Coe is right to state that the ability of observers to pick good teachers in these experiments was in the same league as Buffy’s ability to pick good boyfriends but he leaves out a crucial point which I think I’d better quote.

This analysis showed that a small subset of items produced scores that accurately identified teachers as either above or below average. All of these items were from the instructional domain. They included clearly expressing the lesson objective, integrating students’ prior knowledge, using opportunities to go beyond the current lesson, using more than one delivery mechanism or modality, using multiple examples, giving feedback about process, and asking how and why questions.

The final point made in the paper is that “This… has motivated us to undertake development of an observational measure that can predict teacher effectiveness.”

So I’m not sure that Coe has it right on this evidence. Yes, we all (ITT, Ofsted, and school leaders) need to recognise that sloppy observation procedure and training will lead to meaningless judgements. Yes, using graded observations for staff development may be a bit like burning witches to improve their chances at the last judgement. Yes, value-added data may be a better, or even the best, method for judging the effectiveness of a teacher and/or their teaching. But, in ITT where value-added data does not exist, I think my colleagues and I really ought to be bringing some of the academic clout of our Faculty to bear on using research like this to develop a model for lesson observation that delivers reliable outcomes. I’ll let you know how we get on, and give you a shout if we need any stake holders.