In recent years, assessment data have begun to play a pivotal role in education policy and practice. No Child Left Behind (NCLB) requires states to implement standardized assessment-based systems to evaluate their schools. The NCLB approach rests on the assumption that assessment data can provide credible information to gauge how effectively schools and teachers are serving their students.
Educators, however, recognize that because students come to school with different backgrounds, one-time assessment scores are not a fair way to compare teachers with one another when they work under vastly different circumstances. We therefore need new methods for evaluating the effectiveness of teachers and schools—methods that differ from the typical NCLB approach.
The Purpose of Value-Added Assessment
Value-added assessment, a statistical process for looking at test score data, is one technique that researchers have been developing to identify effective and ineffective teachers and schools. In contrast to the traditional methods of measuring school effectiveness (including the adequate yearly progress system set up under NCLB), value-added models do not look only at current levels of student achievement. Instead, such models measure each student's improvement from one year to the next by following that student over time to obtain a gain score. The idea behind value-added modeling is to level the playing field by using statistical procedures that allow direct comparisons between schools and teachers—even when those schools are working with quite different populations of students.
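The gain-score idea can be shown with a small sketch. The scores and students below are hypothetical, and a real value-added model would also adjust for measurement error and student background; this only illustrates why growth and status can point in opposite directions.

```python
# Illustrative sketch of the gain-score idea behind value-added models.
# All scores are hypothetical; real models use vertically scaled test
# data and statistical adjustments.

students = {
    "A": {"year1": 420, "year2": 455},  # lower-achieving but fast-growing
    "B": {"year1": 510, "year2": 520},  # higher-achieving but slow-growing
}

for name, scores in students.items():
    gain = scores["year2"] - scores["year1"]
    print(name, "status:", scores["year2"], "gain:", gain)
```

Student A ends the year with the lower score (455 versus 520) yet shows far more growth (35 points versus 10). A status-based system such as adequate yearly progress rewards B's performance; a value-added view credits A's growth.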
The end result of value-added assessment is an estimate of teacher quality, referred to as a teacher effect in the value-added literature (Ballou, Sanders, & Wright, 2004). This measure describes how well the teacher performed in improving the achievement of the students in his or her class and how this performance compares with that of other teachers.
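A drastically simplified sketch of a teacher effect is each teacher's mean student gain relative to the overall mean gain. The teachers and gains below are hypothetical, and operational value-added models (for example, layered mixed models) also apply shrinkage and control for other factors; this shows only the comparative logic.

```python
# Simplified "teacher effect" sketch: each teacher's mean student gain
# minus the overall mean gain. Data are hypothetical; real value-added
# models use mixed-effects estimation with shrinkage.

gains = {  # teacher -> gain scores of that teacher's students
    "Teacher 1": [12, 15, 9, 14],
    "Teacher 2": [5, 7, 6, 8],
    "Teacher 3": [10, 11, 13, 10],
}

all_gains = [g for scores in gains.values() for g in scores]
grand_mean = sum(all_gains) / len(all_gains)

for teacher, scores in gains.items():
    effect = sum(scores) / len(scores) - grand_mean
    print(f"{teacher}: effect = {effect:+.1f}")
```

A positive effect means the teacher's students gained more, on average, than students across all classrooms; a negative effect means they gained less.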
Value-added models have surfaced as an important topic among education policymakers, researchers, and practitioners. U.S. Secretary of Education Margaret Spellings has organized a federal working group to investigate how such models might be incorporated into NCLB. The Government Accountability Office is investigating the integration of these models into state test-based accountability systems. There is also great interest in value-added assessment at the state level, with at least three states—Ohio, Pennsylvania, and Tennessee—using value-added assessment statewide.
The Emerging Research Base
As value-added modeling assumes a larger role in education, its research base is also flourishing. The following three topics in this field are of special interest to educators.
The Complex Statistical Machinery
Ever since the inception of value-added models, educators have expressed concern that such models are too statistically complex and difficult to understand (Darlington, 1997). However, in 2004, a team of researchers at RAND brought a great deal of clarity to the value-added discussion (McCaffrey, Lockwood, Koretz, Louis, & Hamilton, 2004). Their research documented an array of statistical approaches that can be used to analyze assessment data and discussed the benefits and limitations of each model.
Some researchers have compared the results obtained from complex statistical models with those obtained from much simpler models. Tekwe and colleagues (2004) claimed that “there is little or no benefit to using the more complex model” (p. 31). However, their study relied on a narrow data structure, which may have seriously limited its conclusions. Most value-added approaches remain highly technical, and there is little conclusive evidence that simpler designs perform as well as more complex ones.
Although the RAND report helped clarify the statistical methods used in value-added models, and value-added software programs are becoming more widely available (Doran & Lockwood, in press), implementing such a model remains complex. For this reason, schools and school districts that are interested in value-added modeling need to collaborate with professional organizations experienced with the challenges of this method.
Test Scores and Vertical Scales
In many areas of scientific research, measuring growth is straightforward. To measure changes in temperature, we need only consult a thermometer. Measuring change in student achievement, however, is not as simple.
For value-added modeling to work, tests must be vertically scaled (Ballou et al., 2004; Doran & Cohen, 2005). Essentially, vertical scaling is a statistical process that connects different tests and places them on the same “ruler,” making it possible to measure growth over time. For example, one cannot measure a child's height in inches one year and in meters the next year without adjusting the scale.
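One elementary form of linking can be sketched as a linear transformation that matches the mean and standard deviation of one form to another. The score values below are hypothetical, and this toy example is not an operational vertical-scaling method, which would rely on item response theory and common anchor items; it only illustrates the "common ruler" idea.

```python
# Toy linear-linking sketch: place hypothetical grade 8 form scores on
# the grade 7 form's scale by matching mean and standard deviation.
# Operational vertical scales use IRT-based methods with anchor items.

from statistics import mean, pstdev

grade7 = [48, 52, 55, 60, 50]  # raw scores on the base (grade 7) form
grade8 = [30, 35, 33, 38, 34]  # raw scores on the harder (grade 8) form

# Linear transformation y = a*x + b matching mean and SD:
a = pstdev(grade7) / pstdev(grade8)
b = mean(grade7) - a * mean(grade8)

linked = [a * x + b for x in grade8]  # grade 8 scores on the common scale
```

After linking, the transformed grade 8 scores share the grade 7 scale's mean and spread, so a difference between a student's two scores can be read as growth rather than as an artifact of two different rulers.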
To connect different tests and measure student growth, designers of value-added models commonly assume that the curriculum in higher grades is nothing more than a harder version of that in the previous grade; in other words, 8th grade math is the same subject as 7th grade math, just more difficult. Under this assumption, scores from different grades can sit on a single scale, and a student's increase in math knowledge is simply the difference between his or her scale scores over time.
A large body of research, however, suggests that year-to-year curricular variation is significant (Schmidt, Houang, & McKnight, 2005). Other researchers have demonstrated that the process used to create the vertical scales is a statistical challenge in itself and can actually introduce more error in longitudinal analyses (Doran & Cohen, 2005; Michaelides & Haertel, 2004).
These findings suggest that value-added modeling may need to evolve into newer forms. The research emerging in this area is too new, however, to allow solid conclusions.
Identifying Teacher Effects
Possibly the most important question about value-added assessment is whether the estimate obtained from a value-added model can actually be called a teacher effect. Can any statistical model really sift through all the other factors that may have influenced the student's score (for example, socio-economic status or early learning environment) and isolate the learning that we can specifically attribute to the teacher's methods? As it currently stands, no empirical research validates the claim that value-added models accurately identify the most effective teachers. The many anecdotal claims have not yet been verified through experimental research.
Educators Take Note
The research base on value-added methods is growing, and researchers are developing new approaches in an effort to make this technique more credible and useful to schools. Value-added modeling is an important new area of research—one that is playing a rapidly growing role in shaping assessment and accountability programs.
References

Ballou, D., Sanders, W., & Wright, P. (2004). Controlling for student background in value-added assessment of teachers. Journal of Educational and Behavioral Statistics, 29(1), 37–65.
Darlington, R. B. (1997). The Tennessee value-added assessment system: A challenge to familiar assessment methods. In J. Millman (Ed.), Grading teachers, grading schools. Thousand Oaks, CA: Sage.
Doran, H. C., & Cohen, J. (2005). The confounding effect of linking bias on gains estimated from value-added models. In R. Lissitz (Ed.), Value-added models in education: Theory and applications. Maple Grove, MN: JAM Press.
Doran, H. C., & Lockwood, J. R. (in press). Fitting value-added models in R. Journal of Educational and Behavioral Statistics.
McCaffrey, D. F., Lockwood, J., Koretz, D., Louis, T. A., & Hamilton, L. (2004). Models for value-added modeling of teacher effects. Journal of Educational and Behavioral Statistics, 29(1), 67–101.
Michaelides, M. P., & Haertel, E. H. (2004, May). Sampling of common items: An unrecognized source of error in test equating (Technical Report). Los Angeles: Center for the Study of Evaluation & National Center for Research on Evaluation, Standards, and Student Testing.
Schmidt, W. H., Houang, R. T., & McKnight, C. C. (2005). Value-added research: Right idea but wrong solution? In R. Lissitz (Ed.), Value-added models in education: Theory and applications. Maple Grove, MN: JAM Press.
Tekwe, C. D., Carter, R. L., Ma, C.-X., Algina, J., Lucas, M., Roth, J., et al. (2004). An empirical comparison of statistical models for value-added assessment of school performance. Journal of Educational and Behavioral Statistics, 29(1), 11–36.
Harold C. Doran is a Senior Research Scientist at the American Institutes for Research (AIR). Steve Fleischman, series editor of this column, is a Principal Research Scientist at AIR; firstname.lastname@example.org.