The typical teacher evaluation in public education consists of a single, fleeting classroom visit by a principal or other building administrator untrained in evaluation who wields a checklist of classroom conditions and teacher behaviors that often don't focus directly on the quality of instruction. According to Mary Kennedy, author of Inside Teaching: How Classroom Life Undermines Reform (2005), checklists include such items as, “Is presentably dressed,” “Starts on time,” “Room is safe,” and “The lesson occupies students.” In most instances, the evaluator just checks off either “satisfactory” or “unsatisfactory.”
Not surprisingly, it's easy for teachers to earn high marks under these drive-by rating systems. A recent study of the Chicago school system by the New Teacher Project (2007) found that 87 percent of the city's 600 schools, including 69 schools that the city declared to be failing, did not issue a single “unsatisfactory” teacher rating between 2003 and 2006. Principals also rarely use evaluations to help teachers improve instruction and student achievement; they frequently don't even discuss the results with the teachers involved.
Models That Improve Teaching
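A number of promising evaluation models point to a way out of the morass. Together they demonstrate that it's possible to evaluate teachers far more productively than most public schools currently do. These models share several key characteristics: explicit standards for good teaching, multiple measures of a teacher's performance, and multiple evaluations by multiple trained evaluators.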
The Teacher Advancement Program (TAP) is a good example of a model with explicit standards. Launched by the Milken Family Foundation in 1999 and now operated by the California-based National Institute for Excellence in Teaching, TAP has made intensive instructional evaluations the centerpiece of a comprehensive program to strengthen teaching that also includes coaching, career ladders, and performance-based compensation. The program operates in 180 schools in five states and the District of Columbia and encompasses 5,000 teachers and 60,000 students.
TAP uses a set of standards for evaluating teachers that is based on the work of consultant Charlotte Danielson.¹ In Enhancing Professional Practice: A Framework for Teaching (1996), Danielson breaks teaching down into four major categories (planning and preparation, classroom environment, instruction, and professional responsibilities); 22 themes (ranging from demonstrating knowledge of the subjects taught to designing ways to motivate students to learn); and 77 skills (such as when and how to use different groupings of students and the most effective ways to give students feedback). Danielson also created a set of rubrics for evaluators that detail what teachers need to do (or not do) to earn “unsatisfactory,” “basic,” “proficient,” and “distinguished” ratings in every skill category.
TAP's modified version of Danielson's teaching standards has three main categories—designing and planning instruction, the learning environment, and instruction—and 19 subgroups that target such areas as the frequency and quality of classroom questions and whether teachers are teaching students such higher-level thinking skills as drawing conclusions.
Whereas traditional evaluations tend to be one-dimensional, relying exclusively on a single observation of a teacher in a classroom, more comprehensive models capture a richer picture of a teacher's performance.
For example, since its inception in 1987, the National Board for Professional Teaching Standards has conferred advanced certification in 16 subjects on some 63,000 teachers across the United States by using a two-part evaluation. The first part is a portfolio that includes lesson plans, instructional materials, student work, two 20-minute videos of the candidate working with students in classrooms, the candidate's written reflections on the two taped lessons, and evidence of work with parents and peers. The second part of the evaluation is a series of 30-minute online essays that gauge teachers' expertise in the subjects they teach. Candidates for the board's high school science certification, for example, have to explain acid/base theory and relate it to the phenomenon of acid rain.
Another way to counter the limited, subjective nature of many conventional evaluations is to have teachers evaluated on multiple occasions by multiple evaluators.
Teamwork in TAP. In schools using TAP, teachers are evaluated at least three times each year against TAP's teaching standards by teams of master and mentor teachers trained to use the organization's rubrics. Schools combine the scores from the different evaluations and evaluators into an annual performance rating.
TAP evaluators must demonstrate that they can accurately rate videotaped lessons using TAP's three performance levels (unsatisfactory, proficient, and exemplary) before they can evaluate teachers in live classrooms. TAP requires schools using the program to enter every evaluation into a TAP-run online performance appraisal management system that produces charts and graphs of the results, which are used to compare a school's evaluation scores with TAP evaluation trends across the United States. In addition, TAP ships videotaped lessons to evaluators every year; to remain TAP evaluators, they must score those lessons accurately using TAP's performance levels.
Teamwork in BEST. The Connecticut Department of Education established a program called Beginning Educator Support and Training (BEST) in 1989 to strengthen its teaching force. The program supplies new teachers with mentors and training and then requires them, in their second year, to submit a portfolio chronicling a unit of instruction (at least five hours of teaching). Three state-trained teacher-evaluators who teach the same subject as the candidate score each portfolio, and a fourth evaluator rescores any portfolio that fails. As in the TAP program, scorers must complete nearly a week's worth of training and demonstrate an ability to score portfolios accurately before participating. Until New Mexico and Wisconsin recently introduced portfolio-based evaluations of new teachers, BEST was the only statewide teacher-evaluation system in the United States.
Using evaluators with backgrounds in candidates' subjects and grade levels, as TAP and BEST do, strengthens the quality of evaluations. “Good instruction doesn't look the same in chemistry as in elementary reading,” said Mike Gass, executive director of secondary education in Eagle County, Colorado, where the district's 15 schools use TAP.
Subject-area and grade-level specialists, scoring rubrics, evaluator training, and recertification requirements increase the “inter-rater reliability” of evaluations. They produce ratings that are more consistent from evaluator to evaluator and that teachers are more likely to trust.
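Inter-rater reliability can be put into numbers. The article does not say which statistic TAP or BEST uses, so the sketch below assumes two common ones: simple percent agreement and Cohen's kappa, which discounts the agreement two evaluators would reach by chance alone. The function names and the six ratings are hypothetical, made up for illustration on TAP's three performance levels.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of lessons on which two evaluators gave the identical rating."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement corrected for the agreement expected by chance."""
    observed = percent_agreement(rater_a, rater_b)
    n = len(rater_a)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability that both raters, choosing independently
    # at their own observed rates, land on the same performance level.
    expected = sum(counts_a[level] * counts_b[level] for level in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical level for every lesson
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of six videotaped lessons by two trained evaluators.
rater_a = ["proficient", "proficient", "exemplary", "unsatisfactory", "proficient", "exemplary"]
rater_b = ["proficient", "exemplary", "exemplary", "unsatisfactory", "proficient", "proficient"]

print(f"agreement: {percent_agreement(rater_a, rater_b):.2f}")
print(f"kappa:     {cohens_kappa(rater_a, rater_b):.2f}")
```

For these made-up ratings the script prints an agreement of 0.67 and a kappa of 0.45. Kappa is lower because some matching ratings would be expected even from evaluators guessing at their usual rates; rubrics, training, and recertification are meant to push both figures up.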
Teamwork in peer reviews. Some school systems, usually in collaboration with local teachers unions, have evaluation systems that rely heavily on experienced teachers to conduct peer reviews—in part as a way to expand the pool of evaluators.
The first and best-known such program is in Toledo, Ohio, a heavily unionized school system of 28,500 students and 2,300 teachers on the western edge of Lake Erie. In 1981, the president of the Toledo Federation of Teachers struck a deal with Toledo school officials that made a districtwide team of veteran teachers responsible for evaluating every new Toledo teacher as well as underperforming veterans.
Under the Toledo Peer Assistance and Review program, about a dozen “consulting teachers” on leave from their classrooms for three years mentor and evaluate Toledo's first-year teachers through frequent, informal classroom observations and as many as six (usually unannounced) evaluations each semester. Where possible, consulting teachers and the teachers with whom they work teach the same subjects but not in the same schools. The evaluations focus on teachers' subject knowledge, professionalism, classroom-management skills, and teaching skills.
At the end of each semester, the consulting teachers make recommendations on the dozen or so teachers under their supervision to a review board of five union officials and four school system administrators. The board then votes on each teacher's status; any ruling requires at least six of the nine votes. To make it to their second year, Toledo teachers must survive two rounds of review-board voting.
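The review board's voting rule is simple enough to state as a short sketch. It assumes each of the nine members votes either to retain or to release a teacher; the labels and function names are hypothetical, and because the article does not say what happens when neither side reaches six votes, the sketch simply reports "no ruling" in that case.

```python
UNION_SEATS = 5
ADMIN_SEATS = 4
BOARD_SIZE = UNION_SEATS + ADMIN_SEATS  # nine members in all
VOTES_FOR_RULING = 6


def board_ruling(votes_to_retain: int, votes_to_release: int) -> str:
    """Return the board's ruling on one teacher after one semester's review."""
    if min(votes_to_retain, votes_to_release) < 0 or votes_to_retain + votes_to_release > BOARD_SIZE:
        raise ValueError("votes must be non-negative and total at most nine")
    if votes_to_retain >= VOTES_FOR_RULING:
        return "retain"
    if votes_to_release >= VOTES_FOR_RULING:
        return "release"
    return "no ruling"  # assumption: the article does not describe this case


def reaches_second_year(semester_rulings: list[str]) -> bool:
    """A first-year teacher must come through both semester reviews retained."""
    return len(semester_rulings) == 2 and all(r == "retain" for r in semester_rulings)


# All five union officials voting together are not enough on their own.
print(board_ruling(UNION_SEATS, ADMIN_SEATS))  # no ruling
print(reaches_second_year([board_ruling(7, 2), board_ruling(6, 3)]))  # True
```

One consequence of the six-vote threshold is that neither bloc can rule alone: the five union officials need at least one administrator to join them, and the four administrators need at least two union officials.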
Principals play adjunct roles in the evaluations, supplying consulting teachers with information on the attendance and professional comportment of the teachers under review but leaving the classroom evaluations to the consulting teachers.
A Lot of Places to Grow
Unlike traditional teacher evaluations, the systems in Toledo and Connecticut are part of programs to improve teacher performance, not merely to weed out bad apples. They are drive-in rather than drive-by evaluations. At a time when research increasingly points to working conditions as more important than higher pay in keeping good teachers in the classroom, teachers in the comprehensive evaluation programs say that the combination of extensive evaluations and coaching they receive makes their working conditions more professional, and thus more attractive.
Using evaluations to strengthen teaching is part of the fabric of the school at DC Preparatory Academy, which serves 275 middle school students in northeastern Washington, D.C. The school opened in 2003 and brought on TAP in 2005. In the TAP model, a key role of the evaluations is identifying weaknesses that mentors will work on with teachers during the six weeks between evaluations. In the words of a 5th grade teacher who had spent four years teaching in nearby Fairfax County, Virginia, and who had received high marks on her evaluations there, “Holy Moley! I've learned under TAP that I've got a lot of places to grow.” Among other things, she needed to do a better job of demonstrating to her students the sorts of questions she wanted them to ask as critical readers.
To further strengthen the link between evaluation and instruction, TAP requires schools to hold weekly, hour-long cluster meetings in which TAP-trained master teachers work with teams of teachers of a particular subject or grade level. For example, 5th and 6th grade math teachers might work on strategies for teaching word problems, and middle school language arts teachers might collaborate on evaluating student writing. Predictably, some teachers in TAP schools chafe at the time and energy that the program demands of them. Some resent the increased accountability and say that the program breeds unhealthy competition among teachers.
Some studies have suggested that teachers' performance plateaus after several years in the classroom. But few teachers in public education get the sort of sophisticated coaching teachers receive under TAP; if more did, perhaps their performance would continue to improve.
The Trouble with Test Scores
One school of education reformers, including many of today's performance-pay advocates, would evaluate teachers primarily on the basis of their students' achievement. It's a reasonable notion: Teaching is ultimately about helping students learn. But currently the only way to measure student achievement on a large scale is by using standardized test scores. And standardized test scores aren't the simple solution they seem to be.
For one thing, only about 50 percent of public school teachers teach subjects and grade levels in which students are tested, which rules out a test-based system that applies fairly to all teachers. For another, most standardized tests in use today measure only a narrow band of low-level skills, such as recalling or restating facts, rather than higher-level skills such as the ability to analyze information. As a result, the tests tend to put the best teachers (those with wider teaching repertoires who are able to move students beyond the basics) at a disadvantage, while pressuring the entire school system to focus on low-level skills.
There is also the daunting challenge of separating individual teachers' effects on their students' reading and math scores from the myriad other influences on student achievement. And it is difficult to draw sound conclusions about a teacher's performance from a small number of student test scores, an especially tough problem in elementary schools, where teachers typically work with a single class of students every day.
As a result, test scores should play a supporting rather than a leading role in teacher evaluations. School systems should use schoolwide scores in their evaluation calculations rather than individual teachers' scores; schoolwide measures would also encourage school staffs to collaborate rather than compete.
The Cost in Time and Money
Not surprisingly, comprehensive classroom-evaluation systems are more labor-intensive and thus more expensive than principal drive-bys or evaluations based on test scores. As a result, they are tougher to implement for administrators trying to bring about change on the scale required in large urban school systems.
The only way TAP, the National Board for Professional Teaching Standards, Toledo, and other programs are able to provide multiple evaluations by multiple evaluators is by using such strategies as peer review and remote scoring of portfolios. Toledo union officials say that the school system is spending about $500,000 of its $344-million budget on the peer review program this year. That's $5,000 per teacher for the 100 teachers in the program. But that figure includes the year's worth of mentoring that teachers under peer review receive. Said one Toledo teacher, “It's an investment, not a cost.”
TAP costs anywhere from $250 to $700 per student, or up to 6 percent of per-pupil expenditures (National Institute for Excellence in Teaching, n.d.). That works out to $6,250–$14,900 per teacher at 25 students per classroom. However, an average of 40 percent of that money covers performance bonuses that are built into the TAP program. (Compensation for teachers in TAP schools is based partly on their evaluation scores.) As with Toledo's program, intensive, yearlong mentoring for teachers is built into the program's budget. TAP's model also includes pre- and post-evaluation conferences. TAP costs less where collective-bargaining contracts already give teachers time to attend TAP meetings, so that schools need not pay them extra.
Connecticut education officials say the state spends about $3.7 million each year on its BEST program, or just over $2,000 per teacher. About 40 percent of that money ($800 per teacher) is spent evaluating teachers' portfolios, including training and paying stipends ($100 a day) to the 500 veteran teachers who score the portfolios. The remaining 60 percent is spent on training and supporting the 1,800 or so new teachers who go through the BEST program each year as well as on central administration.
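The BEST figures can be checked with a few lines of arithmetic. This sketch uses only the numbers reported above (a $3.7 million annual budget, about 1,800 new teachers, and a 40 percent evaluation share); the function name and dictionary keys are hypothetical.

```python
def best_cost_breakdown(annual_budget, new_teachers, evaluation_share):
    """Split an annual program budget into per-teacher evaluation and support costs."""
    per_teacher = annual_budget / new_teachers
    return {
        "per_teacher": per_teacher,
        "evaluation_per_teacher": per_teacher * evaluation_share,
        "support_and_admin_per_teacher": per_teacher * (1 - evaluation_share),
        "evaluation_total": annual_budget * evaluation_share,
    }


costs = best_cost_breakdown(annual_budget=3_700_000, new_teachers=1_800, evaluation_share=0.40)
for item, dollars in costs.items():
    print(f"{item:>30}: ${dollars:,.0f}")
```

The per-teacher result, roughly $2,056, matches the “just over $2,000” reported, and the evaluation portion comes to roughly $822 per teacher (about $1.48 million in all), in line with the rounded $800 figure.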
The National Board for Professional Teaching Standards is the largest and most costly of the comprehensive evaluation systems, charging applicants a $2,500 fee for national certification.
At $1,000 per teacher, it would cost $3 billion each year to evaluate the 3 million teachers in the United States using a Connecticut- or National Board–like portfolio or TAP's multiple evaluations/multiple evaluators model. To put this figure in perspective, public education's price tag has surpassed $500 billion annually, including some $14 billion² (about $240 per student) for teachers to take professional development courses and workshops that teachers themselves say don't always improve their teaching (Killeen, Monk, & Plecki, 2002).
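To make the comparison concrete, the sketch below works through the arithmetic with the article's own round figures; the function name is hypothetical. Because total spending “has surpassed” $500 billion, the computed share of total spending is an upper bound.

```python
def evaluation_cost_comparison(teachers, cost_per_teacher, total_spending, pd_spending):
    """Return the national evaluation bill and its size relative to two benchmarks."""
    evaluation_cost = teachers * cost_per_teacher
    return (
        evaluation_cost,
        evaluation_cost / total_spending,  # share of all public education spending
        evaluation_cost / pd_spending,     # size relative to professional-development spending
    )


cost, share_of_total, ratio_to_pd = evaluation_cost_comparison(
    teachers=3_000_000,
    cost_per_teacher=1_000,
    total_spending=500_000_000_000,  # a floor: spending "has surpassed" this figure
    pd_spending=14_000_000_000,
)
print(f"national evaluation cost: ${cost:,}")
print(f"share of total spending:  {share_of_total:.1%}")
print(f"relative to PD spending:  {ratio_to_pd:.1%}")
```

On these figures, a $1,000-per-teacher evaluation system would cost about one-fifth of what public education already spends on professional development, and no more than 0.6 percent of total spending.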
Some experts question whether TAP and National Board certification justify their high costs. Some studies have found that the National Board certifies top teachers, whereas others have found that Board-certified teachers don't raise student achievement more than other teachers do. And a new study from the National Center on Performance Incentives at Vanderbilt University found that a small sample of secondary schools using TAP produced no higher levels of student achievement than schools that hadn't implemented the program, although the study did not examine the important question of whether teachers who receive high scores on TAP evaluations tend to produce greater gains in their students' test scores.
Still, the investment in comprehensive teacher-evaluation systems is worth making. Systems like TAP that focus on improving teachers' performance signal to teachers that they are professionals doing important work. In so doing, they help make public school teaching more attractive to the sort of talent that the profession has struggled to recruit and retain.
It's hard to overstate the importance to school reform of creating a more professional working environment in teaching. In a national survey of public school teachers early in their careers, the National Comprehensive Center for Teacher Quality and Public Agenda (2007) found that, given a choice between two otherwise identical schools, 76 percent of secondary teachers and 81 percent of elementary teachers would rather work at a school where administrators strongly supported teachers than at one that paid significantly higher salaries. Yet many local, state, and national union leaders have not pressed for more rigorous evaluation systems, for fear that such systems might lead to the dismissal of more teachers for poor performance and strengthen the case for performance-based pay at the expense of the single-salary schedule.
It's hard to believe that an industry that spends $400 billion of its annual budget on teachers, the resource most central to its success, pays so little attention to the return on that investment. How can public education hope to improve teacher quality without a reliable way to measure it?
No Child Left Behind (NCLB) has helped. One principal noted that by creating consequences for schools and school systems with students who fall below state standards, the law “pushes principals” to take evaluations more seriously. According to the human resources administrator in the Toledo school system, both principals and teacher teams are referring more veteran teachers to peer review in the wake of NCLB “because they don't want to work with people who are pulling the whole school down.”
New York City's school system, the largest in the United States, recently layered on top of NCLB a system of sanctions (up to and including removing principals from their jobs) and financial rewards for both schools and their principals; this system gives teachers and principals alike strong incentives to care about the quality of the teaching in their classrooms. Giving schools greater authority over teacher hiring and firing would provide them with additional incentives to evaluate teachers carefully.
Ultimately, the single-salary schedule may be the most stubborn barrier to better teacher evaluations. As Kate Walsh, president of the National Council on Teacher Quality, said,
If there are no consequences for rating a teacher at the top, the middle, or the bottom, if everyone is getting paid the same, then why would a principal spend a lot of time doing a careful evaluation? I wouldn't bother.
Many teacher unions, of course, argue that principals' failure to take evaluations seriously is precisely why a single-salary schedule is necessary.
There's no simple solution to this catch-22. But TAP, for one, has addressed it. The program's comprehensive classroom evaluations legitimize performance pay in teachers' minds, and its performance-pay component gives teachers and administrators alike a compelling reason to take evaluations seriously. Pay and evaluations become mutually reinforcing, rather than mutually exclusive.
Danielson, C. (1996). Enhancing professional practice: A framework for teaching. Alexandria, VA: ASCD.
Kennedy, M. (2005). Inside teaching: How classroom life undermines reform. Cambridge: Harvard University Press.
Killeen, K. M., Monk, D. H., & Plecki, M. S. (2002). School district spending on professional development: Insights from national data. Journal of Education Finance, 29(1), 25–50.
National Comprehensive Center for Teacher Quality & Public Agenda. (2007). Lessons learned: New teachers talk about their jobs, challenges, and long-range plans. New York: Public Agenda. Available:
National Institute for Excellence in Teaching. (n.d.). Frequently asked questions about the Teacher Advancement Program. Available:
New Teacher Project. (2007). Hiring, assignment, and transfer in Chicago Public Schools. New York: Author.
¹ Charlotte Danielson joined the National Institute for Excellence in Teaching as a consultant in 2007.
² Numbers updated to 2006 by Education Sector using Department of Labor inflation adjustment calculators. See
Editors' Note: This article draws on and in some instances excerpts sections of the newly released report, Rush to Judgment: Teacher Evaluation in Public Education, by Thomas Toch and Robert Rothman (Education Sector, 2008).
Thomas Toch is Codirector of Education Sector, a Washington, DC–based think tank.