Saturday, January 28, 2012

The Problem with Value-Added Measures

Matthew Di Carlo in his blog on January 26th wrote this in a discussion of Florida's use of value-added assessment of teachers and schools:
I would argue that growth-based measures are the only ones that, if they’re designed and interpreted correctly, actually measure the performance of the school or district in any meaningful way....
Those italics are in the original, and they are a bit of a cop-out. In my opinion there is no way to "design and interpret correctly" the various growth measures that have been proposed for measuring the contribution of a teacher, or even a group of teachers, to a group of children's learning. In the first place, any system of high-stakes, punitive measurement of teachers for purposes of monetary rewards or other benefits produces not just teaching to the test (a problem so pervasive that even the President of the U.S. can talk about it in a State of the Union address) but also cheating...and before those feelings of moral outrage begin to take you over, please summon the honesty to admit that you would too if placed in the same circumstance.

But there is another problem with the value-added measures that is much too infrequently talked about.

Way back in the 1990s, I was moderating an online discussion of the Tennessee Value-Added Assessment System (due to a business prof at U Tenn named William Sanders, I believe). Sanders himself, and later his assistant and occasional co-author, Sandra Horn, each made a brief appearance in the discussion and quickly retreated. That discussion among a dozen or more scholars runs to several thousand words and is available to anyone here:

I happened to be giving a talk in the mid-1990s at a conference in Denver where Sanders was also speaking. After my talk I was approached by a young woman who identified herself as Horn and asked if I had time for a brief conversation. "Yes, certainly." Horn started off by saying that Sanders and she felt that if I just understood a few things about TVAAS, the objections I had expressed in the online discussion would surely be cleared up. "Try me."

For 15 minutes I listened to descriptions of TVAAS that were entirely irrelevant to my objections. Finally I interrupted:

GVG: Let me pose a hypothetical to you. Suppose that there are two classes of children and that Class A and Class B are taught by two teachers who teach in exactly the same way. In fact, every word, action, and thought they produce is identical. And suppose further that these two groups of children begin the school year with identical knowledge acquired in the past. Now here is the critical assumption. Suppose that the pupils in Class A have an average IQ of 75, and the pupils in Class B have an average IQ of 125. Do you believe that your measure of teacher value-added will produce the same numeric value for these two teachers?

SH: Yes.

Rather than deliver an impromptu lecture on the difference between aptitude (mental ability, a portion of which is undeniably inherited) and school achievement, I excused myself.

And such is the Achilles heel of all of the so-called value-added assessment systems. They act as though statistically equating groups of students on achievement tests (fallible as that equating is) has held all other influences constant (ceteris paribus), and hence that the gain score is a valid and fair measure of a teacher's or a school's contribution to learning. It is not, and never will be.
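
The Class A / Class B hypothetical can be put in numbers. What follows is a minimal sketch in Python, not TVAAS itself, which is a far more elaborate statistical model. Every figure in it is an invented assumption: the starting score of 50, the teacher effect of 10 points, and above all the toy rule that a pupil's yearly gain rises by 0.2 points per IQ point. The function names `simulate_class` and `naive_value_added` are hypothetical as well. Both teachers are given the same effect, and the only adjustment made is for prior achievement.

```python
import random
import statistics


def simulate_class(mean_iq, teacher_effect, n_pupils=30, seed=0):
    """Return (pre, post) score lists for one class under a toy learning model."""
    rng = random.Random(seed)
    pre, post = [], []
    for _ in range(n_pupils):
        iq = rng.gauss(mean_iq, 10)
        start = 50.0  # identical prior achievement, as in the hypothetical
        # ASSUMPTION (invented for illustration): gain depends on the teacher
        # AND on aptitude, at 0.2 score points per IQ point, plus a little noise.
        gain = teacher_effect + 0.2 * (iq - 100) + rng.gauss(0, 2)
        pre.append(start)
        post.append(start + gain)
    return pre, post


def naive_value_added(pre, post):
    """Mean gain score: adjusts for prior achievement and nothing else."""
    return statistics.mean(q - p for p, q in zip(pre, post))


# Two teachers with the SAME true effect (10 points) and pupils who start
# at the same achievement level; only the classes' mean IQ differs.
va_a = naive_value_added(*simulate_class(mean_iq=75, teacher_effect=10.0, seed=1))
va_b = naive_value_added(*simulate_class(mean_iq=125, teacher_effect=10.0, seed=2))

print(f"Class A teacher: {va_a:.1f}   Class B teacher: {va_b:.1f}")
```

Under these assumptions the teacher of Class B comes out well ahead of the teacher of Class A, although the two are identical by construction. The gap is produced entirely by the aptitude term, which the gain score attributes to the teacher.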

Gene V Glass
University of Colorado Boulder
Arizona State University

1 comment:

  1. The following comments are offered by Jane Jackson, who is Co-Director of the Modeling Instruction Program in the Dept. of Physics at Arizona State University. For 19 years, the Modeling Instruction Program has been helping teachers attain knowledge and skills needed to benefit their students. Modeling Instruction is recognized as an Exemplary K-12 science program by the U.S. Department of Education.

    Here are relevant excerpts from listserv posts in March 2012 by three experienced high school physics teachers in three different states.
    (First teacher): True, many of the proposed evaluation systems are overly narrow and rely too much on a single test, but we need to stop rejecting the idea that all evaluation systems are bad.
    I for one would welcome a system that truly measures the progress I can make in the key skills of my students: their ability to think, to problem solve and to communicate their understanding of concepts of Physics they have been exposed to in my classroom. If we want the respect afforded to other professionals, we need to accept the responsibility for the quality of the job we do.
    (Second teacher):
    I would welcome such a system as well. We are dealing with designing evaluations right now, which must be based partially on student results. None of your key points will play any part in the new evaluations. If we had an instrument which measured your designated key skills, I would totally be on board, but we do not, nor do I see any evidence tools with such focus are being considered or developed. I don't even think there is agreement on what you have identified as key skills, so we don't even have a common definition of effectiveness in the subject area.
    In our evaluation design, it is proposed that my impact on students will be evaluated on how the children do on the state's standardized Math and Language Arts exams, which most of my students have already passed by the time they get to my class. Other measures of my effectiveness include posting sample student work, data, and the vision statement in my room. As a serious professional, of course I have a visceral reaction--I don't see what I do for kids in the evaluation system at all.
    (Third teacher): I think there are plenty of teachers who welcome true evaluation and the promised pay and respect that are supposed to come with it. But I do oppose the plan for 50% of my evaluation of teaching 11th and 12th grade physics students coming from 9th and 10th grade reading scores. Seriously, I'm a good teacher if the students I will teach next year are reading at grade level before I get them?