This Decade-Long Experiment In Teacher Evaluation Is An Unsurprising Failure

Hatched by economists, bureaucrats and politicians, the theory, back at the dawn of the Race To The Top era, was this: We would adopt standards nationally (Common Core) and then test those nationally accepted standards with tests that, if not national themselves, were at least nationally comparable. We could take high stakes testing from its existing use to evaluate buildings, and drill down to the classroom level. The test results would reveal which teachers were good and which teachers were bad. Schools would pay the good ones more and fire the bad ones; schools would improve, and student achievement would climb, lifting students out of poverty and strengthening the economy of the nation.

EdWeek captures a certain attitude about the approach with its headline calling the programs “efforts to toughen teacher evaluation,” because part of the guiding theory was that teachers had had it too easy for too long.

All of that lay behind the programs that were implemented. A new study underlines what most people in the education field already knew—the programs failed.

A working paper just issued by five researchers concludes that the “massive effort to institute new high-stakes teacher evaluation systems” had essentially no effect on “student achievement.”

This is not a surprise. The explanation of why it failed so hard comes in several parts.

As a sort of preface, we should note that the standards undergirding all of this, the infamous Common Core, were deeply flawed every step of the way from conception through implementation (many fine autopsies have been written, but Tom Loveless’s Between the State and Schoolhouse is one of the better ones).

On that flawed foundation, the teacher evaluation system was built of toothpicks and mayonnaise.

The term “student achievement” was thrown around a lot, but all it ever actually meant was “test scores.” Therefore, in the classrooms where these policies lurched to life, “improve student achievement” really meant “raise test scores.” Linking that to teacher evaluation sent a clear message to teachers: we don’t care what else you do, because your job is now defined as “raise test scores on this one test.”

That was demoralizing on several levels. First, teachers could see that the tests weren’t very good (see, for example, the infamous talking pineapple questions, or the poet who couldn’t answer test questions about her own work). Second, a raft of research told us that test scores were hugely correlated to factors far beyond a teacher’s control. The effect is that a “good” teacher is one who’s been put in a classroom with high-scoring students, and a “bad” one is in a classroom with low-scoring students. Third, the test only covered math and reading, and yet was used to generate ratings for all teachers in the building; teachers found themselves being evaluated for the scores of students they didn’t have in class on subjects they didn’t teach. And for teachers who did teach those subjects, the pressure was on.

Schools had already been restructuring themselves around the Big Standardized Test, with resource allocation tilted away from anything not on the test. Tying the tests to teacher evaluation personalized this, ramping up what even Michael Petrilli of the right-tilted, accountability-favoring Thomas B. Fordham Institute notes was “anxiety and bad morale.”

Proponents of the system were disappointed that it did not result in more firings, but in many buildings, administrators were being asked to choose between the “evidence” of the test scores and the evidence of their own firsthand experience with the teacher. Virtually nobody working in a school thought that the test-based teacher evaluation system was fair or accurate, but anyone who dared to challenge it was accused of being against any kind of accountability for teachers or schools.

Meanwhile, as is often the case, public education was about a decade behind private industry. The test-linked teacher evaluation system was a form of stack ranking, where employees are rated, stacked in order of rating, and then the bottom chunk are fired. Microsoft jettisoned that system in 2013, saying it blocked teamwork and innovation (don’t take chances that might hurt your ranking, and don’t help someone because that might just move them ahead of you). By the late 2010s, education was one of the few places left where people were still claiming you could fire your way to excellence.

If the test-based teacher evaluation system had simply failed, it would be one more story of educational amateurs insisting that you can measure a cloud by taping a ruler to a sponge while standing in a river, but this approach has done considerable damage to public education. The warping of education by high stakes testing has long been noted, and that has been damaging enough. But as everyone is setting off alarms about the long-stewing exodus of teachers from the classroom, we must note that test-based evaluation is a large factor.

High stakes test-based evaluation has eroded teachers’ autonomy and hamstrung their ability to use their own professional skills and judgment. It has placed teachers in a crazy world where they face the threat of punitive action over things they cannot control. And it has reduced the attractiveness, the basic appeal, of the profession. “My dream is to get into teaching so that I can help students get a better score on a single bad standardized test of math and reading,” said no teacher ever.

We don’t have an effective, empirical tool for identifying bad teachers, and even if we did, schools could fire them and replace them with…who? We’ve been backing away from teacher evaluations tied to high stakes test scores for a couple of years, but we cannot back away far enough, fast enough. Here’s hoping that reports like this one keep us moving in the right direction.

The Tycoon Herald