May 1, 2016 | Vol. 73, No. 8

Research Says / Evaluating and Improving: Not the Same Thing

In 2013, a team of experts funded by the Bill & Melinda Gates Foundation wrapped up a three-year, $45 million project to identify Measures of Effective Teaching (MET)—an effort to determine whether it's possible to put numbers on something as complex as teaching. After reviewing 20,000 classroom videos, crunching data from thousands of student surveys, and parsing value-added achievement results for 3,000 teachers in seven districts, they concluded that it was, in fact, possible to accurately evaluate effective teaching—largely by triangulating data from student learning gains, student surveys, and, to a lesser extent, classroom observations (which, they cautioned, should be conducted on multiple occasions by multiple experts or by expertly trained observers).
Like any major report, the study had its critics. Some questioned what they saw as circular logic in using value-added measures (themselves fraught with accuracy concerns) to validate student surveys and classroom observations (Baker, 2013). Others questioned whether the tremendous effort to observe teachers in classrooms, given its weak overall correlation to achievement, was worth it (Greene, 2013). Amid the debate, though, few paused to ask whether better evaluations actually yielded improvement in teacher performance.

Rethinking Performance Appraisals

Not long after the report's release, Bill Gates's company, Microsoft, announced it was moving away from rating and ranking employees and toward giving employees real-time feedback and coaching aimed at fostering professional growth (Warren, 2013). Since then, other companies, including GE and Google, have also abandoned get-tough performance management approaches in favor of allowing employees to set stretch goals and receive frequent feedback and coaching from supervisors and peers (Duggan, 2015).
This rethinking of performance management by corporations seems to have come from connecting the dots found in, of all places, education research. For example:
  • Carol Dweck's (2006) studies of middle schoolers showed the power of developing a "growth" mindset (seeing intelligence as malleable) versus a "fixed" mindset (seeing intelligence as static). Performance ratings can reinforce a fixed mindset if those being evaluated internalize low marks as evidence of their permanent lack of ability.
  • Dylan Wiliam (2011) found that timely, targeted feedback is key to improving performance, yet it falls on deaf ears when coupled with numerical ratings. A study of 6th graders found that adding a numeric score to written comments wiped out the benefits of the comments, presumably because "students who got the high scores didn't need to read the comments and students who got low scores didn't want to" (p. 109).
  • Extrinsic rewards can discourage the very behaviors they intend to encourage. Researchers observed that when children were rewarded for drawing pictures, they later spent less of their free time drawing (Deci, Ryan, & Koestner, 1999), which suggests that rewarding inherently enjoyable activities (be it drawing pictures or teaching well) can turn them into chores—and leave us feeling cheated if rewards don't materialize.

Do Better Evaluations Drive Better Results?

The education world has spent the past few years ramping up annual performance reviews. Between 2009 and 2012, the number of states requiring annual teacher performance evaluations jumped from 14 to 43 (National Council on Teacher Quality, 2012). Yet given what we know about the effect of ratings on productive mindsets, receptivity to feedback, and motivation, it seems plausible that even the most precise teacher evaluation systems, if framed incorrectly, might backfire, delivering few positive effects on performance.
One of the only studies to date to examine this hypothesis looked at performance trajectories of midcareer teachers in Cincinnati who participated in an evaluation system that employed multiple, structured observations done by experienced peers from other schools (Taylor & Tyler, 2012). At the time of the study, these teachers were evaluated only once every four years, so researchers were able to observe a spike in teacher performance—equivalent to a 4.5 percentile point gain in mathematics achievement—during the year of their evaluation. Teachers with previously low levels of student achievement showed larger gains in performance. Researchers attributed these gains to teachers receiving, on four separate occasions, detailed formative feedback on their performance and having opportunities to reflect on—and converse with peers about—their practice.
It's worth noting that the evaluation system in question was fairly lenient; more than 90 percent of teachers were rated proficient or distinguished. Moreover, the overall ratings were fairly low-stakes and only loosely tied to promotion and retention decisions. Nonetheless, on individual rubric items, raters were strict. Researchers speculated that in the end, it may have been the "microlevel feedback" that drove improvements, especially as performance gains persisted long after the teachers were evaluated, despite loose coupling with personnel decisions. Presumably, teachers had internalized the feedback, which they received before they were given a summative rating. Without overstating findings from one study, it seems that when it comes to driving improvement, the most important thing evaluations may provide is high-quality feedback.
Herein lies the rub, however. Greater precision with teacher evaluations often begets greater complexity—witness the MET report's calls for multiple measures, multiple observations, multiple raters, and extensive training on evaluation frameworks. Yet as Mike Schmoker (2012) notes, there's no solid evidence that bloated systems make schools perform better, especially if they become so time-consuming that they fail to deliver clear feedback or support professional growth conversations. Similarly, a review of research by the SHRM Foundation (Pulakos, Mueller-Hanson, O'Leary, & Meyrowitz, 2012) found that attempting to review employee performance on a large number of competencies does little to change behavior.
Microsoft HR chief Lisa Brummel might have put it best when she declared in her memo to employees that they would have "no more ratings," so they could "focus on what matters," including "more timely feedback and meaningful discussions to help employees learn in the moment, grow and drive great results" (Warren, 2013). Microsoft appears to have arrived at the critical insight that the real path to improvement may lie in simplicity, so everyone can worry less about measuring and more about motivating success.

Baker, B. D. (2013). Gates still doesn't get it! Trapped in a world of circular reasoning and flawed frameworks [blog post]. Retrieved from School Finance 101 at https://schoolfinance101.wordpress.com/2013/01/09/gates-still-doesnt-get-it-trapped-in-a-world-of-circular-reasoning-flawed-frameworks/

Bill & Melinda Gates Foundation. (2013). Ensuring fair and reliable measures of effective teaching: Culminating findings from the MET Project's three-year study (Policy and practice brief). Seattle, WA: Author.

Deci, E. L., Ryan, R. M., & Koestner, R. (1999). A meta-analytic review of experiments examining the effects of extrinsic rewards on intrinsic motivation. Psychological Bulletin, 125(6), 627–668.

Duggan, K. (2015, December 15). Six companies that are redefining performance management. Fast Company. Retrieved from www.fastcompany.com/3054547/the-future-of-work/six-companies-that-are-redefining-performance-management

Dweck, C. (2006). Mindset: The new psychology of success. New York: Ballantine.

Greene, J. P. (2013). Understanding the Gates Foundation's Measuring Effective Teachers project [blog post]. Retrieved from Jay P. Greene's Blog at http://jaypgreene.com/2013/01/09/understanding-the-gates-foundations-measuring-effective-teachers-project

National Council on Teacher Quality. (2012). State of the states 2012: Teacher effectiveness policies. Washington, DC: Author.

Pulakos, E. D., Mueller-Hanson, R., O'Leary, R. S., & Meyrowitz, M. M. (2012). Building a high-performance culture: A fresh look at performance management. Alexandria, VA: SHRM.

Schmoker, M. (2012, August 29). The next education fad: Complex teacher evaluations that don't work. Education Week, 32(2), pp. 20, 24.

Taylor, E. S., & Tyler, J. H. (2012, Fall). Can teacher evaluation improve teaching? Education Next, 12(4).

Warren, T. (2013, November 12). Microsoft axes its controversial employee ranking system [blog post]. Retrieved from The Verge at www.theverge.com/2013/11/12/5094864/microsoft-kills-stack-ranking-internal-structure

Wiliam, D. (2011). Embedded formative assessment. Bloomington, IN: Solution Tree.

Bryan Goodwin is the president and CEO of McREL International, a Denver-based nonprofit education research and development organization. Goodwin, a former teacher and journalist, has been at McREL for more than 20 years, serving previously as chief operating officer and director of communications and marketing. Goodwin writes a monthly research column for Educational Leadership and presents research findings and insights to audiences across the United States and in Canada, the Middle East, and Australia.

From our issue: The Working Lives of Educators