November 1, 2009 | Vol. 51, No. 11

Making Teacher Evaluations Meaningful


Teacher evaluations have been called the missing link in the teacher quality debate, yet poor evaluations have been blamed for producing a "widget" or "Lake Wobegon" effect, in which administrators are indifferent to instructional effectiveness and teachers are rewarded for just-above-average performance.
Interest in the usefulness of teacher evaluations was arguably reignited in June of this year with the publication of The Widget Effect, a study by the New Teacher Project that examined teacher assessment practices in 12 districts across four U.S. states. Across the districts analyzed, the study found that fewer than 1 percent of teachers were rated unsatisfactory, a figure suggesting that evaluations were drawing few real distinctions in teacher effectiveness and quality.
Similar reports suggest that The Widget Effect's findings generalize to other districts as well. A recent New Yorker article ("The Rubber Room," August 31, 2009) notes that only 1.8 percent of tenured teachers in New York City Public Schools are rated unsatisfactory, a statistic that reveals more about the quality of the evaluations than about teacher performance. In a system of 89,000 teachers, inadequate assessments mean that poor performance goes unaddressed, new teachers don't get targeted support, excellence goes unrecognized, and professional development is not systematically aligned with areas of need.

A Tool of Little Use or Consequence

The Widget Effect was soon followed by the release of So Long, Lake Wobegon?, a report by University of Connecticut professor and Project on the Next Generation of Teachers researcher Morgaen Donaldson. In the report, Donaldson argues, "Teacher evaluations often suffer from the 'Lake Wobegon effect': Most if not all teachers receive satisfactory evaluation ratings. It is possible that all teachers are above average in some schools, but there is generally more variation in teacher effectiveness within schools than between them. Thus, any school—low-performing or high-performing, wealthy suburban or under-resourced urban—is likely to employ more under-performing teachers than its evaluation ratings suggest."
At a June briefing about the report, Donaldson explained that the Wobegon effect is the result of several factors, including inadequate evaluation instruments. "Simple checklists and rating systems often focus on trivialities like how neat a bulletin board is and are rarely aligned with district improvement efforts and a professional development focus," she noted. Donaldson said other contributing factors include principals' limited time and resources to tackle a wide range of staff evaluations; evaluators' lack of training, oversight, and incentives for accurately evaluating teacher effectiveness; cultural norms of noninterference with teaching; and the belief that replacement teachers would not be better or readily available.
Donaldson also noted that evaluations are often used as an opportunity for cheerleading and motivating rather than for providing critical feedback. She cited findings from The Widget Effect and many other studies showing that evaluations do not improve teachers' knowledge and skills. The bottom line: evaluations have few consequences, either positive or negative.
These concerns are compounded by generally laissez-faire policy attitudes toward educator evaluation. According to 2008 data from Education Week's Education Counts database, 8 states do not require teachers' performance to be formally evaluated at all, only 12 states require annual teacher evaluations, 26 states require evaluators to receive formal training, and 12 states link teacher evaluations to student performance.

Evaluations That Drive Effectiveness

Donaldson argues that increased competition and accountability in public education, as well as a growing infrastructure of standards for effective teaching, better data on teaching and learning, and a diverse teaching corps that's more open to experimentation with compensation, make this a perfect time to improve teacher evaluation.
In particular, competition in the form of the U.S. Department of Education's $4.35 billion Race to the Top grant (part of the American Recovery and Reinvestment Act) may create an incentive for improving teacher evaluations. To be eligible for Race to the Top funds, states like California, Wisconsin, and New York must remove legal barriers to linking student achievement data to teachers and principals. Grant applications will also be scored on state plans to differentiate teacher and principal effectiveness, report the effectiveness of teacher and principal preparation programs, and provide targeted support to teachers and principals.
Cincinnati's Teacher Evaluation System, highlighted by Donaldson as a model program, might win Ohio extra points on its Race to the Top application. The program designates administrators and teacher leaders as evaluators, is revised in response to internal and external feedback, and addresses many of the "Wobegon" factors outlined above.
According to Donaldson, the bones of a good evaluation system include an extended program development phase; valid, reliable instruments; multiple measures; robust professional development for evaluators and teachers; accountability, incentives, and support for evaluators; and reinforcement of the district's professional development initiatives.

Doing Evaluations Differently in D.C.

Harvard Business School Social Entrepreneurship Fellow Ben Weber spent this past summer researching and developing an evaluation program at E.L. Haynes Public Charter School, a Washington, D.C., elementary school. The school recently grew from 18 to 80 teachers and needed an evaluation system that was both thorough and easy to use. The staff's first priorities, says Weber, were "aligning and maximizing the instructional value of feedback" and making sure feedback was delivered and heard in a more formalized way. Part of the impetus for Weber's work came from teachers' requests for more classroom observation and support.
"That told us," says Weber, "we needed to get all of the eyes in the classroom—the principal, English language learner (ELL) specialists, curriculum coordinators, master teachers, and instructional coaches—aligned in the same direction, so we could say, 'This is what we're looking for in our classrooms right now, and this is what teachers are looking for in terms of feedback.'"
Like most schools, Haynes uses a teacher competency document to define what good teaching looks like, which in turn frames performance standards and evaluative feedback. Haynes built its framework by borrowing practices from sources such as the book The Skillful Teacher and the charter management organization Achievement First. Weber also interviewed staff at about 20 D.C.-area schools (including the Fairfax County, Va., and Montgomery County, Md., public school districts), 10 nonprofits (e.g., the SEED Foundation), businesses such as McKinsey & Company, and organizations like the World Bank to find evaluation best practices.
Haynes conducts evaluations for everyone from the head of school to principals to front-office staff. Under the new system, teachers will work with a supervisor; a professional development mentor who does not serve in an evaluative capacity; and coaches who serve as resources in specific instructional areas like literacy, curriculum, ELL instruction, and inclusion. The school has a human resources platform for storing evaluation data, but in addition, each classroom will have its own wiki where ongoing conversations about teacher goals and benchmarks will take place. The information on the wiki belongs to the teachers in that classroom and will be viewable by their coaches and supervisors; the evaluation itself can be seen only by the supervisor and the individual teacher.
So far, Haynes's evaluation program is not tied to compensation. "At this point, we're just piloting the program, and we want a lot of feedback from teachers on both the content and the process," says Weber. For now, student test scores also will not be included in evaluations. Weber adds the caveat that Haynes is so deeply data driven that it wouldn't be hard to roll in test scores. "But I'm really hesitant to do that in the first couple years that we test the system," he says. "One of the big things that makes evaluation systems fail is insufficient buy-in from the people being evaluated."
Weber wants teachers to trust the process, and he notes that adding scores right off the bat might distract from that or, at worst, lead some to game the system. He is hopeful that teachers will feel supported by more structured, standardized evaluation processes. "We want to set expectations in the right place and regularize the process so that everyone understands it," he says.
Year-round, teachers will:
  • be observed five times as teams and receive written feedback;
  • meet one-on-one with their professional development mentors to discuss progress;
  • work with their supervisors to evaluate professional growth;
  • get clear, written midyear indicators that show whether they are on or off track;
  • have three opportunities to self-assess against competency guidelines;
  • develop a professional growth plan with meaningful goals and targets; and
  • use their wiki to get feedback and access resources.
Haynes also has developed three staff surveys, based on factors like school climate, processes, and professional relationships, that ask what Weber sees as the essential question: "Is the feedback you're getting helping you become a better teacher?"

Getting to "Good" and Better

Charlotte Danielson, whose framework for teaching sets the norm for instructional quality in schools worldwide, laments that the current culture of teacher evaluation is one of protection and passivity, not professional inquiry. Although teacher evaluation could be a powerful occasion for reflection, support, and growth, statistics show it is often used merely as a rubber stamp or for punitive purposes.
To reverse the widget effect, the authors of The Widget Effect recommend that districts:
  • Adopt a fair, comprehensive evaluation system that differentiates between teachers based on their effectiveness and provides professional development supports aligned to instructional goals.
  • Provide training for administrators and evaluators and hold them accountable for accurately assessing teacher performance.
  • Use performance assessments to guide decisions about professional development, assignments, compensation, retention, and dismissal.
  • Adopt lower-stakes dismissal policies for ineffective teachers and a system of due process that is fair and efficient.
"Evaluating teachers is really a lot like evaluating artists or programmers or engineers—it's a highly skilled, highly technical profession," Weber explains. "A lot of the conversation about evaluating teachers focuses on weeding out bad teachers, and I think that misses the point. The real work of this is in helping good teachers get better."

