Education Policy Analysis Archives recently published an article by Audrey Amrein-Beardsley and Clarin Collins that effectively exposes the Houston Independent School District's use of a value-added teacher evaluation system as a disaster. The Educational Value-Added Assessment System (EVAAS) is alleged by its creators, the software giant SAS, to be "the most robust and reliable" system of teacher evaluation ever invented. Amrein-Beardsley and Collins demonstrate, to the contrary, that EVAAS is a psychometric bad joke and a nightmare for teachers.
EVAAS produces "value-added" measures for the same teachers that jump around willy-nilly from large and negative to large and positive from year to year, even when neither the general nature of the students nor the nature of the teaching changes over time. In defense of EVAAS, one could note that this instability is common to all such systems of attributing students' test scores to teachers' actions. EVAAS might therefore still lay claim to being "most robust and reliable," since they are all unreliable; and who knows what "robust" means?
Unlike many school districts which have the good sense to use these value-added systems for symbolic purposes only (“Look at us; we are getting tough about quality.”), Houston actually fired four teachers (three African-American, one Latina) based on their EVAAS scores. Houston fired Teacher A partly on the basis of EVAAS scores that looked like this:
EVAAS Scores for Teacher A by Year and Subject

| Subject | 2006-2007 | 2007-2008 |
|---|---|---|
| Math | -2.0 | +0.7 |
| Science | +2.4 | -3.5 |
The above scores are just a representative sample of the wildly unreliable scores that Teacher A accumulated over four years in several subjects.
As if this pattern alone did not exonerate Teacher A, her supervisors' ratings of her performance, based on classroom observations, were highly negatively correlated with her EVAAS scores. Amrein-Beardsley and Collins report that 1) teachers insisted that their teaching methods changed little from year to year while their EVAAS scores jumped around wildly, and 2) principals reported having been pressured to adjust their supervisory ratings of teachers so that they agreed with the EVAAS scores. After all, did the administration want to admit that it had spent half a million dollars on an elaborate mistake?
The whole Houston story as reported by Amrein-Beardsley & Collins is gruesome in the extreme, and I recommend that you read it in its entirety. For me, the story sparked recollections of disasters from thirty years ago that I chronicled in a chapter in a book edited by Jason Millman and Linda Darling-Hammond. (Glass, Gene V. (1990). Using student test scores to evaluate teachers. Pp. 229-240 in Jason Millman & Linda Darling-Hammond (Eds.), The new handbook of teacher evaluation: Assessing elementary and secondary school teachers. Newbury Park, CA: SAGE Publications.) You can read the entire chapter here.
In the mid-1980s, I was able to find six school districts in the entire country that claimed to have based teacher compensation on the test-score performance of their teachers. Each of the six showed the same pattern of behaviors that I summarized thus:
Using student achievement data to evaluate teachers...
1. ...is nearly always undertaken at the level of a school (either all or none of the teachers in a school are rewarded equally) rather than at the level of individual teachers, since a) no authoritative tests exist in most areas of the secondary school curriculum, nor for most special roles played by elementary teachers; and b) teachers reject the notion that they should compete with their colleagues for raises, privileges and perquisites;
2. ...is always combined with other criteria (such as absenteeism or extra work) which prove to be the real discriminators between who is rewarded and who is not;
3. ...is too susceptible to intentional distortion and manipulation to engender any confidence in the data; moreover, teachers and others believe that no type of test or manner of statistical analysis can equate the difficulty of the teacher's task across the wide variety of circumstances in which teachers work;
4. ...elevates tests themselves to the level of curriculum goals, obscuring the distinction between learning and performing on tests;
5. ...is often a symbolic administrative act undertaken to reassure the lay public that student learning is valued and assiduously sought after.
Most of what I saw in the mid-1980s is true today and is true of the present-day EVAAS system in Houston. Regrettably, point #5 is no longer so true. Not content to use these systems as mere symbolic window dressing, Houston has actually fired teachers based on their students' test scores. Is HISD the bellwether of a dawning scientific age? Is it the district with the courage of its convictions? Should the nation look to Houston for leadership in ensuring that teacher evaluation is hard-headed and results-based?
Well, coincidentally, Houston was one of the six school districts that I investigated for my chapter in the Millman & Darling-Hammond handbook. And here is what became of Houston’s early-day effort to reward teachers for their students’ test-score gains.
Rod Paige, who eventually became Secretary of Education for Bush 43, became superintendent of HISD in 1994. But Houston had a system of teacher incentive pay based on student test scores before Paige ever arrived. Since Paige was also an officer of the HISD from 1989 to 1994 and had co-authored the district's "Declaration of Beliefs and Visions," his influence may have been responsible for the teacher incentive pay system that antedated his superintendency.
Teachers in Houston elementary schools were given monetary bonuses (not base salary increases) when the entire school matched the previous year's average test-score gain plus an additional increment: say, last year's growth plus 2 months in grade-equivalent units. The bonuses amounted to a few hundred dollars for each teacher. After a couple of years in which the bonuses effectively replaced cost-of-living increases in the salary schedule, the test score gains were bumping up against the ceiling of the test. One or two schools missed their bonuses, and tensions were rising.
In the meantime, the flow of money did not go unnoticed by the building administrators. Now, principals are "instructional leaders," or so the story goes. How they find the time (or the expertise) to lead teaching while ensuring the safety of students and staff, enforcing discipline, directing traffic, and fielding complaints from angry parents is a mystery to me; but perhaps I don't know what an instructional leader really is. So the principals banded together, approached the HISD administration, and asked for their own reward for making the test score gains. Only this time, the rewards were $10,000, $12,000, and sometimes $15,000 (and we're talking 1985 dollars here, too). Within a year or two, a couple of building principals were discovered to have taken the test answer sheets into their offices and done some erasing and marking. The entire system blew up, and the status quo ante was reinstituted.
Are things different now? Has some genius or some software company come up with a new system that is truly "robust and reliable"? Has a system been found that teachers and administrators acknowledge as legitimate and fair, so that they will not be tempted to take whatever steps might be necessary to avoid becoming its victims? And when will we see the value-added system that can be applied to politicians and school board members, or even to the researchers who invent value-added measurement systems? Or, as such persons regularly argue, is the value of their work so much more complex than that of a teacher of young children that it could not possibly be captured by a clumsy quantitative index?
Gene V Glass
University of Colorado Boulder
Arizona State University