*growth model*, a diverse group of statistical techniques that attempt to isolate a teacher's impact on his or her students' testing progress while controlling for other measurable factors, such as student and school characteristics, that are outside that teacher's control. Opponents, including many teachers, argue that value-added models are unreliable and invalid and have no place in teacher evaluations, especially high-stakes evaluations that guide employment and compensation decisions. Supporters, in stark contrast, assert that teacher evaluations are meaningful only if these measures are a heavily weighted component.

## Reliability and Validity Apply to All Measures

One common criticism is that value-added estimates are *unreliable*. Depending on how much data are available and where you set the bar, a teacher could be classified as a "top" or "bottom" teacher because of random statistical error (Schochet & Chiang, 2010). That same teacher could receive a significantly different rating the next year (Goldhaber & Hansen, 2008; McCaffrey, Sass, Lockwood, & Mihaly, 2009). It makes little sense, critics argue, to base hiring, firing, and compensation decisions on such imprecise estimates. There are also strong objections to value-added measures in terms of *validity*: the degree to which they actually measure teacher performance. (For an accessible discussion of validity and reliability in value-added measures, see Harris, 2011.)

## Four Research-Based Recommendations

### Avoid mandating universally high weights for value-added measures.

### Pay attention to all components of the evaluation.

A component's *actual* importance will depend in no small part on the other components chosen and how they are scored. Consider an extreme hypothetical example: If an evaluation is composed of value-added data and observations, with each counting for 50 percent, and a time-strapped principal gives all teachers the same observation score, then value-added measures will determine 100 percent of the variation in teachers' final scores.
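The arithmetic behind this hypothetical can be sketched in a few lines. The scores below are invented for illustration; the point is only that a component with zero variation contributes nothing to the variation in final scores, whatever its nominal weight.

```python
import statistics

# Hypothetical scores for five teachers (illustrative numbers only).
value_added = [35, 48, 60, 72, 85]   # varies across teachers
observation = [70, 70, 70, 70, 70]   # principal gives everyone the same score

# Each component nominally counts for 50 percent of the final score.
final = [0.5 * v + 0.5 * o for v, o in zip(value_added, observation)]

# The observation component has zero variance, so every difference
# between teachers' final scores comes from the value-added component.
print(statistics.variance(observation))  # 0.0: contributes nothing
print(statistics.variance(final))        # equals 0.25 * variance(value_added)
```

Despite the 50 percent nominal weight, ranking teachers by `final` produces exactly the same ordering as ranking them by `value_added` alone.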

### Don't ignore error—address it.

Some of the error in value-added estimates is *systematic*. For example, there may be differences between students in different classes that are not measurable, and these differences may cause some teachers to receive lower (or higher) scores for reasons they cannot control (Rothstein, 2009).

There is also *random* error: statistical noise due largely to small samples. Even a perfect value-added model would generate estimates with random error.

The standard way to express this uncertainty is to present each estimate with a *confidence interval*.
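One way a system could put confidence intervals to work is a simple decision rule: label a teacher above or below average only when the interval around the estimate excludes zero. This is a minimal sketch; the estimates and standard errors are invented, and real value-added models produce both.

```python
def classify(estimate, std_error, z=1.96):
    """Label an estimate only when its 95% confidence interval excludes 0."""
    lower = estimate - z * std_error
    upper = estimate + z * std_error
    if lower > 0:
        return "above average"
    if upper < 0:
        return "below average"
    return "not distinguishable from average"

# Two teachers with the same point estimate but different amounts of data:
print(classify(0.10, 0.03))  # above average: interval is (0.041, 0.159)
print(classify(0.10, 0.08))  # not distinguishable: interval is (-0.057, 0.257)
```

The second teacher has the same estimated effect, but the larger standard error (fewer students, fewer years of data) means the evidence is too weak to support a high-stakes label either way.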

### Continually monitor results and evaluate the evaluations.

## If We Do This, Let's Do It Right

Too much of the debate centers on *whether* to include value-added measures in new evaluation systems. Supporters of value-added scoring say it should dominate evaluations, whereas opponents say it has no legitimate role at all. It is as much of a mistake to use value-added estimates carelessly as it is to refuse to consider them at all.

#### EL Online

For another perspective on the use of value-added data, see the online-only article "Value-Added: The Emperor with No Clothes" by Stephen J. Caldas at www.ascd.org/el1112caldas.