Since the Race to the Top legislation, teacher evaluation systems across the United States have emphasized measures of student learning, precisely because Race to the Top requires that such measures be included in a teacher's evaluation. To meet these requirements, state departments of education commonly use state test scores to calculate measures of student learning, which we refer to as growth scores or value-added measures.
Although such scores have an intuitive appeal, they've been criticized for a number of reasons. For one thing, using different tests to compute value-added measures for the same group of students produces very different results. For another, state tests are not necessarily aligned to a school's curriculum (Marzano & Toth, 2013). But perhaps the biggest problem is that the majority of teachers can't use these scores. In fact, only about 31 percent of U.S. teachers teach grade levels and subject areas that are associated with state-level tests (Prince et al., 2009).
So what are teachers, schools, and districts to do? The answer is to develop other ways to demonstrate that students taught by a given teacher have learned.
Recommendation 1: Use common assessments to create common growth measures.
Consider a team of 2nd grade teachers who are focusing on a unit on plant and animal survival. The goal for that unit is for students to understand what different types of plants and animals need to survive.
As the team creates both a pre-test and a post-test, it designs items for both tests at three levels of difficulty: basic, proficient, and advanced. Items at the proficient level deal directly with the unit's goal; they ask students to describe and exemplify the survival needs of specific types of plants and animals. Items at the basic level focus on simpler content, such as recalling specific details about survival (for instance, that both plants and animals need food, air, and water to survive). Items at the advanced level require students to make inferences or applications that go beyond what was directly taught. For example, an item might require students to compare and contrast the different ways in which plants and animals breathe and find nourishment.
Both the pre- and post-tests should contain these three types of items in the same proportions. Teachers can use the results of these tests to determine the needs of the entire class and of individual students.
Recommendation 2: Use common student surveys to create common growth measures.
Teachers can also use common student surveys to measure growth. A large-scale study published by the Bill and Melinda Gates Foundation (2011, 2012) concluded that student surveys should be a significant part of an array of assessments used to judge teacher quality. Here are some useful items to include on such surveys:
- I've learned a great deal in this class.
- I've accomplished more than I thought I would in this class.
- My teacher pushes us to work hard and think deeply.
- In this class, the teacher expects nothing but our best.
The first two items address how much students have learned. Research has shown that students can accurately report their own levels of learning (Hattie, 2009). The last two items deal with how hard students work in class, which connects to how much a teacher has motivated students to learn and, most likely, to how much they end up learning.
Teachers can use these surveys as both pre-tests and post-tests, computing student growth scores the same way they compute growth scores with common assessments, or they can use them as post-tests only.
Comparing Scores Across Teachers
Race to the Top requires that measures used in teacher evaluation be comparable across teachers. So how can teachers manage this?
Using Common Assessments
To help reduce measurement error resulting from teacher bias, multiple teachers should score each student's assessment. The teachers then average their scores to create a composite score for each student on that assessment. Doing this for both the pre-test and the post-test gives each student a relatively reliable growth score, obtained by subtracting the composite pre-test score from the composite post-test score. Because every student is scored the same way on the same assessments, the average growth of each teacher's students becomes comparable across teachers.
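The arithmetic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the student labels, the three raters per test, and the 0 to 4 scoring scale are all hypothetical and do not come from the article.

```python
from statistics import mean


def composite_score(rater_scores):
    """Average the scores that several teachers gave one student's assessment."""
    return mean(rater_scores)


def growth_score(pre_rater_scores, post_rater_scores):
    """Growth = composite post-test score minus composite pre-test score."""
    return composite_score(post_rater_scores) - composite_score(pre_rater_scores)


def average_growth(students):
    """Mean growth across one teacher's students.

    `students` maps a student label to a (pre_scores, post_scores) pair;
    each element of the pair is the list of scores from the different raters.
    """
    return mean(growth_score(pre, post) for pre, post in students.values())


# Hypothetical data: three raters scored each test on an assumed 0-4 scale.
class_a = {
    "student_1": ([1, 2, 1.5], [3, 3, 3]),
    "student_2": ([2, 2, 2], [3, 4, 3.5]),
}

print(average_growth(class_a))
```

Here each student's composite pre-test score is subtracted from the composite post-test score, and the class average of those differences is the figure that would be compared across teachers.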
But what can a small district do when only a few teachers teach a specific unit at the same grade level? Given the ease of Internet access and use, several small districts can come together online to develop common pre- and post-test assessments.
To develop average growth scores using surveys, teachers can use items that address student perceptions of how much they've learned and how hard they've worked. These scores can be used to compare teachers who administered the same survey items at the same grade and content level. Large districts will have large comparison groups; small districts can band together to create comparison groups of sufficient size.
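As a rough sketch of the survey comparison, the Python below averages students' ratings for each teacher and then compares each teacher with the mean of the comparison group. The 1 to 5 agreement scale, the teacher labels, the function names, and the choice to average each student's items before averaging across students are all assumptions made for illustration; the article does not specify a scoring method. The example treats the survey as a post-test only.

```python
from statistics import mean

# Assumed coding (not from the article): 1 = strongly disagree ... 5 = strongly agree.


def teacher_survey_score(responses):
    """Average a teacher's survey results.

    `responses` is a list with one entry per student; each entry is the
    list of that student's ratings on the common survey items.
    """
    return mean(mean(student_ratings) for student_ratings in responses)


def compare_to_group(scores_by_teacher):
    """Return each teacher's score minus the comparison-group mean."""
    group_mean = mean(scores_by_teacher.values())
    return {teacher: score - group_mean
            for teacher, score in scores_by_teacher.items()}


# Hypothetical responses to four common items from two students per teacher.
scores = {
    "teacher_x": teacher_survey_score([[4, 5, 4, 5], [3, 4, 4, 3]]),
    "teacher_y": teacher_survey_score([[3, 3, 2, 4], [4, 3, 3, 2]]),
}

print(compare_to_group(scores))
```

With only two teachers the comparison group is far too small to be meaningful, which is why the text suggests that small districts band together to build larger groups.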
Useful and Fair
Most teachers agree that measures of student learning should be included in teacher evaluation models, but they want these measures to be useful and fair. Teachers can use common content pre-tests and post-tests as well as common items on student surveys to produce scores of student learning that more accurately reflect their effectiveness in the classroom.
References
Bill and Melinda Gates Foundation. (2011). Learning about teaching: Initial findings from the Measures of Effective Teaching project. Seattle, WA: Author.
Bill and Melinda Gates Foundation. (2012). Gathering feedback for teaching. Seattle, WA: Author.
Hattie, J. (2009). Visible learning. New York: Routledge.
Marzano, R. J., & Toth, M. (2013). Teacher evaluation that makes a difference: A new model for teacher growth and student achievement. Alexandria, VA: ASCD.
Prince, C. D., Schuermann, P. J., Guthrie, J. W., Witham, P. J., Milanowski, A. T., & Thorn, C. A. (2009). The other 69 percent: Fairly rewarding the performance of teachers of nontested subjects and grades. Washington, DC: Center for Educator Compensation Reform.