November 1, 2012
Vol. 70, No. 3

Weighing the Pros and Cons of TAP

A Tennessee teacher offers her perspective on the state's new evaluation system.


Headlines across the United States offer a plethora of opinions and arguments for and against proposed and newly implemented teacher evaluation systems. Hardly anyone balks at the idea of evaluating teachers, but many question whether these new systems really improve teaching and learning. My experience with Tennessee's new system shows that such systems can be helpful, but they're not without flaws.

A New System

In 2011–12, Tennessee adopted a brand-new education evaluation system that uses a model from the Teacher Advancement Program (TAP). Fifty percent of a teacher's overall effectiveness score is based on observations—four for professional teachers and six for apprentice teachers (teachers in their first three years of teaching). Two of these observations (three for apprentices) are preplanned. One consists of a pre-observation conference with the evaluator, the observation, and a post-observation conference. The other requires the teacher to turn in a written lesson plan before the observation, which is again followed by a post-observation conference. The other observations are unplanned and are followed by a post-observation conference.
The evaluations cover four domains: Instruction, Planning, Environment, and Professionalism, each with its own rubric. Initially, these rubrics were covered in four separate evaluations; however, the state now allows evaluators to combine two rubrics in one observation.
The other 50 percent of a teacher's effectiveness score is based on test data. Thirty-five percent is based on value-added data calculated from student growth on the Tennessee Comprehensive Assessment Program (TCAP) standardized tests. However, because testing does not begin until 3rd grade—and growth can be calculated only once students have a prior year's score—value-added measures are available only for teachers in grades 4–12. As a result, teachers of K–3 students, as well as all special-area teachers, are evaluated according to the average gains for the 4th grade in their school. This system has two glaring problems: One is that it may not be valid to use 4th grade growth scores for a kindergarten teacher or a physical education teacher. The other is that this system places an enormous responsibility on 4th grade teachers.
Each school district determines how to allocate the remaining 15 percent of the test-data measure, but schools must use research-based data approved by the state. Our district, a K–8 district, chose to use 8th grade ACT data for grades 4–8 and all special areas and to use 4th grade social studies TCAP scores, which have historically been high in our district, for grades K–3.
As a 4th grade teacher, if I ignore the pressure of having 50 percent of all K–3 teachers' evaluations resting on my team's test scores and consider only whether these growth scores improve my teaching, I am torn. I want my students to show a year's growth and to be proficient or advanced in all areas, but I am a rational person and a realist. No amount of Superman capabilities will make students in special education who are on a pre-primer reading level read at a 4th grade level by spring. And I can't control the events that happened the night before or the morning of testing that may cause a student to be "off" that day.
Even when growth scores have come in at a negative value, I have always been able to firmly and faithfully say that each and every one of my students has made significant gains. When I do see a lack of growth, I carefully and diligently reflect on my practices over the year, study the test data, and work to improve the following year.
For instance, in math last year, even though students collectively made a year's growth, I studied the subcategories, noting that students were noticeably weaker in data and probability than in other areas. As a result, I changed my practices this year.
In this respect, testing does indeed improve my teaching. Because I have access to the data, I am able to improve instruction. On the other hand, K–3 teachers do not have data. They are simply handed the score.
If K–3 teachers are to use data to improve their instruction, the state will need to rethink this portion of the evaluation or guide K–3 teachers to use the data vertically to improve 4th grade scores. If, for example, number and operations is found to be a weakness, vertical teaming could enable teachers from kindergarten through 4th grade to make a collective effort to improve student performance in this area.

A Teacher's Hope

During my student teaching, evaluative feedback from my lead teacher and my university supervisor was a great help. I reflected, they listened, and then they shared the positive elements they had observed along with constructive criticism and suggestions for improvement.
Fast-forward to my first three years of teaching. For the most part, I was responsible for evaluating my own performance, with the exception of three yearly observations by the administration. I missed the more regular feedback of my student-teaching experience, and I often wished there were more opportunities for informal observation and discussion.
When Tennessee rolled out the new evaluation system, I was optimistic. I believe administrators and other evaluators need to visit classrooms more often. I appreciate constructive feedback. Observers can see things I miss and provide insight that I may not have thought of. When I first saw the multiple pages of rubrics that would be used to score observations, I admit I was overwhelmed. But after viewing each component, I was relieved to note that the majority of the components evaluators would be looking for were already part of most of my everyday lessons.
During our training, there was a big emphasis on how this tool would make us better teachers. The trainers also told us that few teachers would earn a 5 or a 1. We were told that "3 is a rock-solid teacher." I would never give my own 4th graders a gigantic rubric and then tell them that most of them would just score average and that's fine. Yet under the TAP system, "scoring a 5 in any area or overall connotes not just satisfactory performance or even superior performance, but truly exceptional performance. Because 5s are very difficult for even the best teachers to achieve, the evaluation system provides very nearly every TAP teacher with 'stretch goals' that encourage him or her to continue to improve."1
The idea is not that no one will get 5s but that a 3 is truly proficient and a 5 indicates exemplary performance. This system would require a change in mind-set, but it made sense.
I completed the whole cycle of a yearlong evaluation in 2011–12. Each evaluation was fair, and the ratings I was required to give myself before meeting with my evaluator were within one point of my evaluator's scores.

Learning from the Rubric

I can honestly say the system has made me a better teacher in a variety of ways. I am much more conscientious about stating my objectives not only at the start of the lesson but also during and at the end of the lesson.
I am also aware of how aspects of my teaching do or, in some cases, do not fulfill criteria on the rubric. I continually ask myself questions about the structure of lessons: Am I differentiating to the fullest extent here? Can I bring more types of problem solving into this lesson? This ongoing reflection enables me to tweak and modify my lessons as I teach, thus improving the lesson as it unfolds. The rubric guides my thinking.
But the rubric has serious flaws. For instance, for a teacher to get a 3 (at expectations) in one category, the rubric requires "evidence that most students demonstrate mastery of the objective."2 If the objective is simple and concrete, such mastery is certainly attainable. However, many concepts are abstract and complex, and teachers must approach them in multiple ways and across multiple lessons before most students reach mastery.
It would be excellent if the majority of students mastered adding and subtracting fractions with unlike denominators in 45 minutes to an hour. But in my high-poverty, culturally and linguistically diverse, special education inclusion class, it is unlikely and unrealistic. Besides, it's not clear what "most" even means. Is it 75 percent? 85 percent? 51 percent? (That is a majority, after all.)
When I taught my students how to measure volume using a graduated cylinder, it took several days of investigating, modeling, discussing, overcoming misconceptions, and dunking objects into graduated cylinders before most students could accurately measure the rise of water to calculate the volume of an object. So when one of my lessons was observed, I attained only a 2 in mastery.
Another problem is the number of indicators in the instructional rubric. There are 12 indicators to measure in one lesson, as opposed to four in the planning and environment rubrics. The indicators in the rubric are all good, and all lessons should have many of them, even a majority. Yet some elements may not fit within a particular lesson. An introductory lesson may include less problem solving but more accessing of prior knowledge. On the other hand, a lesson at the close of a unit may have a great deal of problem solving and critical thinking but less accessing of prior knowledge. Yet the rubric requires a teacher to include both and to have at least three different types of problem solving—just to score average.
To be perfectly honest, a score of a 2 on one indicator is not going to ruin a whole evaluation. If, as in the case of my 2 in mastery, a teacher has a good lesson and exhibits good practices throughout, one or two low areas will not affect an overall score. That one low score lets me know what the weak areas of the lesson were, but it doesn't mean my lessons always score a 2 in that area. If I begin to see a pattern of low scores in an area over multiple observations, I need to consider how to improve. Isn't that the whole point?

The Teacher's Burden

Many educators have spoken about the additional time that the new system requires. The chief complaint is that writing lessons for the evaluation (required for the planning evaluation and recommended for one other) takes too much time. Writing in-depth lessons does take time, and most teachers do not write such detailed plans for every lesson. I equated my own evaluation lesson with the dreaded sub plans, in which one must write out every detail so someone else can follow it. It took me a couple of hours. It was tedious, but not overwhelming, especially for only one or two lessons a year.
Although I personally abhor the thought of designing lessons just for an evaluation, I fear that many teachers will fall into creating the old "dog and pony show" for observations. I have heard of teachers creating lessons that are designed to fit the rubric and pulling them out when an evaluator walks in. Is this a real picture of standard teaching practice? Unplanned evaluations of typical lessons are far more honest and reflective of everyday practices—and they require no additional time from teachers.
Let me be clear on the big picture of time. Evaluators come in and observe for three out of four evaluation lessons, and no additional time is required of teachers during the observation itself. Two evaluations have a pre-observation conference, which requires an additional 30 minutes each. Plus, there are post-observation conferences of approximately 30 minutes each for three or four of the evaluations. The total time so far is about three hours.
The longest portion left is creating the lesson for the planning evaluation. This lesson took me a couple of hours. (I would estimate that an average lesson plan requires anywhere from 15 to 90 minutes to complete.) Add another two hours for the recommended plan for an announced evaluation.
The grand total? Seven hours. Seven hours out of one year. Is this really too much?

The Purpose of Evaluation

These evaluations are meant to improve instruction. As with any evaluation system, those responsible for creating the system must continue to revise it to strive for integrity, validity, and reliability. (In fact, Tennessee has already made several changes for 2012–13.) Yet although the system has flaws, it has indeed improved my teaching, and for that, I am grateful. With additional refinements, it could improve my teaching even more.
End Notes

1 Jerald, C. D., & Van Hook, K. (2011). More than measurement: The TAP system's lessons learned for designing better teacher evaluation systems. Chicago: Joyce Foundation.

2 National Institute for Excellence in Teaching. (2011). Instruction rubric. Santa Monica, CA: Author.
