by Robert J. Marzano
In addition to assigning topic scores to classroom assessments, one of the most important activities a teacher must perform is to assign final topic scores to students at the end of a grading period. As we saw in Chapter 4, each topic addressed in each assessment is measured using a four-point scale. By the end of a grading period, each student then has received multiple scores in each topic. To illustrate, consider the topic scores for the student Bill, in Figure 5.1.
[Figure 5.1. Bill's scores over a grading period on assessments A through O; the individual assessment scores are not reproduced here.]

Topics: Precipitation, Ocean Currents, Measurement of Temperature, Reading Tables, Estimation, Classifying, Clear Communication, Effort, Behavior, Attendance

Final topic scores include 2.25 (Precipitation), 1.75, 1.8, and 3.1.

Note: Final topic scores are not necessarily averages of column scores.
Bill has received eight scores on the topic of precipitation, seven scores on the topic of ocean currents, and so on. At the end of the grading period, the teacher must devise a way to summarize Bill's performance on each topic during the grading period.
Probably the most common way teachers summarize student achievement is to compute an average score. For example, consider Bill's eight scores for the topic of precipitation. The average of these scores is 1.94, which indicates an average performance slightly below 2.00. The critical question is whether the average score accurately represents the student's learning about this topic. My answer is that, in general, it does not accurately reflect a student's knowledge and skill at the end of a grading period. As you can see in Figure 5.1, Bill's final score for the topic of precipitation is not the average; rather, it is 2.25. To fully understand the reasoning behind this score, consider two theoretical issues: (1) the problem of error in measurement, and (2) the nature of learning.
One of the most well-established principles of educational assessment is the presence of error in all forms of measurement. For a detailed discussion of error as it relates to classroom assessment, see Marzano (2000). The nature of measurement error is commonly expressed in the following formula:
observed score = true score + error score
One of the most interesting aspects of the error component of a score is that it can work in favor of a student on one assessment and against the student on the very next assessment. For example, the teacher might assign a student a rubric score of 3.0 on a quiz for a particular topic when the student actually deserves a 3.5 (i.e., his or her true score). On the next assessment the teacher might assign a rubric score of 4.0, but the student actually deserves a 3.5. One time the teacher's error in judgment works to give the student a deflated score relative to his or her true score; in the next case, a misjudgment gives the student an inflated score. In fact, measurement experts assume that an error score will cancel itself out over time—the "negative" error in the deflated score cancels out the "positive" error in the inflated score.
This random nature of measurement error leads some measurement experts to argue that the average score is the score most representative of students' learning because it automatically cancels out low and high scores. To illustrate, consider Figure 5.2, which depicts a student's seven rubric scores for a specific topic obtained over a grading period. The dark horizontal line through the middle of the graph represents the student's true score of 1.5, meaning that the student's true understanding or skill relative to the topic is a rubric score of 1.5. From a measurement perspective, if there were no error associated with each assessment (that is, if the teacher scored each of the seven assessments with complete accuracy), then each would have received a rubric score of 1.5. However, as Figure 5.2 indicates, this was not the case. Assessments 1 and 2 were underestimates of the true score of 1.5; assessments 5 and 7 were overestimates. In such a situation, the average score is usually a fairly good estimate of the true score. In this case, the average score of 1.64 is quite close to the true score of 1.5.
[Figure 5.2 is not available for electronic dissemination. Copyright © by McREL Institute. Used by permission.]
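When the true score really is constant across a grading period, averaging does behave as this argument claims. Here is a minimal simulation sketch; the uniform error range, number of assessments, and seed are arbitrary assumptions for illustration, not values from the text:

```python
import random

def average_observed(true_score=1.5, n_assessments=7, max_error=0.5, seed=42):
    """Simulate rubric scores as a constant true score plus random,
    uniformly distributed error, and return the average observed score."""
    rng = random.Random(seed)
    observed = [true_score + rng.uniform(-max_error, max_error)
                for _ in range(n_assessments)]
    return sum(observed) / len(observed)

# Over many assessments the positive and negative errors cancel,
# so the average drifts toward the (constant) true score of 1.5.
print(average_observed(n_assessments=10000))
```

With only seven assessments the average wanders somewhat, but as the number of observations grows it settles near 1.5, exactly as the cancellation argument predicts.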
There is one major (and fatal) flaw in the logic underlying this example. It assumes that the true score for a student is the same from the beginning to the end of the grading period. In other words, the student's understanding or skill relative to a topic must be the same for the first assessment as it is for the last if the average score at the end of a grading period is to be a good estimate of the true score. Learning theory, however, tells us that this is not a reasonable assumption: A student's understanding or skill should increase over time.
Over the last few decades, research has taught us a great deal about the nature of learning (for detailed discussions, see Anderson, 1995). One of the most generalizable findings is that learning follows a trend like that depicted in Figure 5.3. The horizontal axis represents the number of practice sessions an individual has had with a new skill or a new concept. The vertical axis represents the student's learning on a 100-point scale: a score of 0 means the student has no understanding or skill; a score of 100 means that the student has total understanding or skill. Notice that the line depicting learning is an upward curve; over time, there is an increase in skill or understanding. An interesting aspect of the learning curve depicted in Figure 5.3 is that the amount of learning from practice session to practice session is large at first but then tapers off; the learning between the later sessions is far less than that between the earlier ones. This trend is also observable in Figure 5.4. Notice that the increase in learning after the first practice session is almost 23 percent (22.918 percent). The increase in learning from the second to the third practice session is 11.741 percent. However, the increase in learning from the 20th to the 21st practice session is less than 1 percent (.802 percent). The increment in learning grows smaller after each practice session.
[Figure 5.3 is not reproduced here. Copyright © McREL Institute. Used by permission.]
Figure 5.4. Increase in Learning by Practice Session

Practice Session    Increase in Learning
 1                  22.918%
 2                  11.741%
 3                   7.695%
 4                   5.593%
 5                   4.349%
 6                   3.354%
 7                   2.960%
 8                   2.535%
 9                   2.205%
10                   1.945%
11                   1.740%
12                   1.562%
13                   1.426%
14                   1.305%
15                   1.198%
16                   1.108%
17                   1.034%
18                    .963%
19                    .897%
20                    .849%
21                    .802%
22                    .761%
23                    .721%
24                    .618%
In psychology, this trend in learning (introduced by researchers Newell and Rosenbloom, 1981) is referred to as "the power law of learning" because the mathematical function describing the trend is a power function: raising the amount of practice to a power. (Appendix D, pp. 133–134, describes the formula for the power law.) The power law appears to be ubiquitous, applying to a great variety of learning situations. As Anderson (1995) explains, "Since its identification by Newell and Rosenbloom, the power law has attracted a great deal of attention in psychology, and researchers have tried to understand why learning should take the same form in all experiments" (p. 196).
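The shape of such a curve can be sketched with a simple power function. The scale and exponent below are hypothetical choices for illustration only; they do not reproduce the exact percentages in Figure 5.4, whose generating formula appears in Appendix D:

```python
def learning_curve(practices, scale=100.0, exponent=0.3):
    """A power-function learning curve on a 0-100 scale: understanding
    after a given number of practice sessions. The scale and exponent
    here are hypothetical, not the values behind Figure 5.4."""
    return scale * (1.0 - (practices + 1) ** (-exponent))

# The gain from each additional practice session shrinks steadily,
# mirroring the diminishing increments in Figure 5.4.
gains = [learning_curve(n + 1) - learning_curve(n) for n in range(24)]
assert all(earlier > later for earlier, later in zip(gains, gains[1:]))
```

Whatever the particular parameters, a power function of this form always produces the signature of the power law: large early gains that taper toward zero.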
The power law of learning suggests a great deal about the most representative score for a given student's achievement over a grading period. First and foremost, it tells us that a student's true score changes (i.e., increases) throughout a grading period according to the power law curve (see Figure 5.5).
Figure 5.5 assumes that the student's true scores follow the power law of learning. As the student learns, the true score increases. As depicted, the student's true score on the first assessment was .71, and the observed score was 1.00. The true score on the second assessment was 1.24, and the observed score was 1.00. The student's true score on the last assessment was 2.21, and the observed score was 3.00. If a student learns during a grading period, then, a final topic score based on the power law is probably a much more accurate estimate of the student's true score than is the average.
One way to see that the power law line is a better estimate of a student's true scores over a grading period than the average score is to calculate the difference between each estimate and the observed scores. Figure 5.6 dramatically illustrates that the observed scores are much closer to the power law line than they are to the average score. If the average score were, in fact, the true score, then the seven rubric scores as assigned by the teacher would contain a great deal of error.
Figure 5.6. Differences Between Observed Scores, the Average Score, and the Estimated Power Law Scores

Assessment                    1     2     3     4     5     6     7    Total Distance
Observed Score              1.00  1.00  1.50  1.50  2.00  1.50  3.00        n/a
Average Score               1.64  1.64  1.64  1.64  1.64  1.64  1.64        n/a
Estimated Power Law Score   0.71  1.24  1.55  1.78  1.94  2.08  2.21        n/a
Difference Between Observed
Score and Average Score     0.64  0.64  0.14  0.14  0.36  0.14  1.36       3.42
Difference Between Observed
Score and Estimated Power
Law Score                   0.29  0.24  0.05  0.28  0.06  0.58  0.79       2.29
In short, research on learning indicates that a final score based on the power law is most probably a better estimate of a student's true score at the end of a grading period than is the average score. In fact, this is necessarily the case if the student learns, because his or her true score will increase from assessment to assessment. Thus, the average score will be an underestimate of the student's true score at the end of a grading period.
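Marzano's exact estimation procedure appears in Appendix D; a standard way to fit a power curve of the form y = a·x^b, sketched here as an assumption rather than his actual formula, is ordinary least squares on the logarithms of the scores. The observed scores used below are reconstructed from Figure 5.6:

```python
import math

def fit_power_law(scores):
    """Fit y = a * x**b to scores observed at x = 1, 2, ..., n using
    ordinary least squares on log(x) and log(y); return (a, b)."""
    xs = [math.log(i + 1) for i in range(len(scores))]
    ys = [math.log(s) for s in scores]
    n = len(scores)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = math.exp(mean_y - b * mean_x)
    return a, b

# Observed rubric scores for the seven assessments, as reconstructed
# from Figure 5.6.
observed = [1.0, 1.0, 1.5, 1.5, 2.0, 1.5, 3.0]
a, b = fit_power_law(observed)
final_estimate = a * len(observed) ** b  # fitted curve at the 7th assessment
average = sum(observed) / len(observed)  # about 1.64
```

With these scores, the fitted value at the seventh assessment comes out near 2.2, close to (though not identical with) the 2.21 estimated power law score in Figure 5.6, and well above the 1.64 average; a small gap is expected, since Marzano's exact procedure may differ from plain least squares.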
Using the final power law score as the estimate of a student's achievement during a grading period as opposed to the average score is certainly a change from what we do now—but one that measurement experts have recommended for quite some time. For example, measurement expert Frank Davis (1964) notes:
In some ways, [measuring individual change over time] is the most important topic in educational measurement. The primary object of teaching is to produce learning (that is, change), and the amount and kind of learning that occur can be ascertained only by comparing an individual's or group's status before the learning period with what it is after the learning period. (p. 48)
In this section we will consider a technique for estimating a student's final topic score using the power law. (For a technical discussion of the power law, see Marzano [2000].) The new generation of inexpensive computer programs, however, will compute the final power law score precisely and automatically. Later in this chapter, we will briefly review one such program. I strongly recommend that teachers use computer programs because they totally eliminate any errors in estimating the final power law score. If a teacher has no access to such programs or no desire to use them, however, the final power law score can be efficiently estimated with remarkable accuracy.
The process is actually quite simple. The teacher simply examines the topic scores for a given student over a grading period and attempts to detect a trend in their progression. To illustrate, consider the scores for the three topics in Figure 5.7. These represent 10 scores on three topics over a grading period (e.g., nine weeks). As an exercise, cover up the bottom row of the figure (which contains a calculation of the final power law score) and try estimating it by answering the question, "Given the progression of scores over the grading period, which score is most representative of the student's learning at the end of the grading period?" Then, uncover the calculated score and see how far off your estimate was. I have found that by following a simple convention, teachers can estimate final power law scores that are remarkably accurate: using quarter-point intervals.
[Figure 5.7. Ten assessment scores (#1 through #10) for each of three topics over a grading period, with the computed final power law score in the bottom row (e.g., 3.00 and 2.71); the individual scores are not reproduced here.]
Recall from the discussion in Chapter 4 that judgments using rubrics are considerably more accurate when teachers assign scores in one-half-point increments (i.e., 1, 1.5, 2, 2.5, 3, 3.5, 4) as opposed to simply assigning whole-number scores (i.e., 1, 2, 3, or 4). Similarly, teachers' estimates of final power law scores are more accurate when they use quarter-point intervals when assigning final topic scores, such as 1.25, 1.50, 2.00, 2.50, 2.75, 3.25, 3.50, 3.75, or 4.00.
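Snapping an estimate to the nearest quarter point is a one-line computation. The sketch below assumes simple nearest-value rounding, which is one reading of the convention rather than anything stated explicitly in the text:

```python
def to_quarter_point(score):
    """Snap a rubric score estimate to the nearest quarter point."""
    return round(score * 4) / 4

print(to_quarter_point(2.71))  # a computed 2.71 reads as an estimate of 2.75
```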
[Figure 5.8. Teachers' estimated final power law scores, the percentage of teachers choosing each score, and the computed power law scores for each topic; the full table is not reproduced here.]
When presented with this system, many teachers have asked me why I do not recommend weighting scores on certain assessments more than others. After all, shouldn't the mid-term examination count more than a quiz? In this system, weighting makes very little sense. In fact, when estimating the final power law score, there is no mathematically legitimate way to weight one score more than another. This is not to say, however, that all assessments are the same. Indeed, they will differ in at least two important ways.
First, teachers commonly weight one assessment more than others because it addresses more topics. For example, the mid-term exam might cover four topics, and an individual quiz only one. In the system described above, the expanded coverage within one assessment will be reflected in the number of topic scores assigned in that assessment. That is, if the mid-term exam covers four topics but a quiz covers only one topic, then the mid-term exam will be assigned four topic scores and the quiz only one. Given that a single assessment can be assigned multiple topic scores, the system presented here provides the same flexibility as weighting.
The second reason teachers commonly weight one assessment more than others is that it contains more items for a given topic, which supposedly decreases the amount of error associated with a given topic score. Certainly, a test with many items for a given topic is more precise than a test with few items. It is also true that a number of short assessments given over time will provide a better indication of a student's learning than one or two large assessments given in the middle and at the end of the grading period, because the assessments keep pace with the trend of students' learning (see the earlier discussion of the power law of learning). Consequently, I recommend that teachers try to give multiple assessments of equal precision spaced over a grading period, as opposed to constructing one or two large, all-encompassing tests that attempt to cover everything.
Up to this point, the discussion about final topic scores has focused on topics directly related to academic achievement. However, in Chapter 4, we observed that a teacher might also want to keep track of nonachievement factors such as effort, behavior, and attendance. To illustrate, reconsider the scores for Bill over a given grading period (see Figure 5.1, p. 71).
Notice that the pattern of Bill's scores on effort and behavior is quite different from the pattern on the academic topics. Whereas the academic topics exhibit a gradual upward trend, indicating learning, the nonachievement factors show a more uneven pattern. There is more variation (more variance, in statistical terms) because performance on the nonachievement factors is more a matter of student choice than of student knowledge or skill. That is, exhibiting proper behavior on a given day is probably more a function of whether the student chooses to follow school or classroom rules than of whether he or she possesses the necessary knowledge or skill to follow them. In cases where student choice is the primary determinant of behavior, it is probably best to use the average score, as opposed to the final power law score, as the final score for a grading period. If you examine the final topic scores for effort and behavior in Figure 5.1, you will notice that they are average scores.
That being said, a significant number of teachers to whom I have presented this idea have argued that effort, behavior, and attendance are learned skills just as much as is reading a table or understanding precipitation. My response has been that if this is their belief, and the pattern of students' scores supports this belief, then they should use the final power law score, not the average score.
In Chapter 4, I begrudgingly noted that the point or percentage method could be adopted to keep track of students' performance on topics, although I recommended against it, in favor of the rubric method. Here we consider how to compute final topic scores when percentages have been used to keep track of topics.
When points are used, it is sometimes inappropriate to use the power law to compute final topic scores. An assessment may represent only a small fraction of the knowledge covered on a given topic. For example, the first assessment on the topic of precipitation might address only 50 percent of the knowledge to be covered in that topic over the grading period. If a student receives a score of 100 percent on that assessment, it doesn't mean that he or she has exhibited an understanding of 100 percent of the knowledge of that topic—only the 50 percent of that topic addressed in that particular assessment. The next assessment might address 75 percent of the knowledge important to the topic. A score of 80 percent in the second assessment would indicate that the student has exhibited competence in 80 percent of the 75 percent covered by the assessment—the student's learning has increased. However, if you were to plot the power law line, it would be going down: from 100 percent to 80 percent.
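The arithmetic in this example can be made explicit: multiplying each percentage score by the fraction of the topic's knowledge the assessment covers shows that mastery rose even as the raw percentages fell.

```python
# Score on the assessment multiplied by the fraction of the topic's
# knowledge the assessment covers:
first = 1.00 * 0.50   # 100% on an assessment covering 50% of the topic
second = 0.80 * 0.75  # 80% on an assessment covering 75% of the topic

# Mastery rose from 50% to 60% of the topic's knowledge, even though
# the raw percentage scores fell from 100% to 80%.
assert second > first
```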
Because of the difficulties associated with the amount of topic a percentage score addresses, I recommend that teachers use the average score as the final topic score when percentages, as opposed to rubric scores, are used to keep track of student performance.
Once final topic scores have been estimated or computed (using the final power law score or the average score), an overall grade can be computed. In Chapter 7, we will consider alternatives to combining topic scores into a single grade. In fact, a strong case will be made that there is no truly meaningful way to combine scores on various topics into an overall grade. In this section, however, we will assume that a teacher has no option but to report an overall letter grade and must, therefore, combine topic scores in some fashion. The most straightforward approach is to use some weighting scheme like that depicted in Figure 5.9. Here the teacher has given a weight of 2 to topics 1, 3, and 5, thus giving these topics twice the quantitative influence on the final grade as the other topics. The teacher assigns these weights before the grading period begins and communicates them to students at the beginning of the period. Upon assigning final topic scores for each student, the teacher then applies the weights to each topic. Figure 5.10 illustrates this computation for a student named Mark. Note that the quality points for Mark have been calculated by multiplying his score on each topic by the weight assigned to the topic. An average topic score is then calculated by using the following formula:
Average Topic Score = Total Quality Points / Total of Weights
Figure 5.9. Weights Assigned to Topics

Topic                       Weight
1. Precipitation               2
2. Ocean Currents              1
3. Temperature                 2
4. Reading Tables              1
5. Estimation                  2
6. Classifying                 1
7. Communicating Clearly       1
8. Effort                      1
9. Behavior                    1
In Mark's case, his total quality points are 33.04. The total of the weights applied to the nine topics is 12 (the sum of column 3 in Figure 5.10). To determine Mark's average rubric score on the weighted topics, the teacher divides his total quality points (33.04) by the total weight (12), for an average of 2.75.
Figure 5.10. Computing Quality Points (Student Name: Mark)

Quality points (topic score multiplied by topic weight) include 6.50, 5.50, 6.00, 2.58, 2.72, and 3.74.
Total Quality Points: 33.04
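Mark's totals can be reproduced with a short sketch. Only six of the nine quality point entries survive in Figure 5.10, so the three scores of 2.00 below are hypothetical fillers chosen to match the stated total of 33.04, and the pairing of the surviving quality points with particular topics is likewise illustrative:

```python
# (topic, final topic score, weight). Weights follow Figure 5.9:
# topics 1, 3, and 5 carry a weight of 2, all others a weight of 1.
# The three 2.00 scores are hypothetical fillers; those entries are
# not recoverable from Figure 5.10.
topics = [
    ("Precipitation",         3.25, 2),
    ("Ocean Currents",        2.58, 1),
    ("Temperature",           2.75, 2),
    ("Reading Tables",        2.72, 1),
    ("Estimation",            3.00, 2),
    ("Classifying",           3.74, 1),
    ("Communicating Clearly", 2.00, 1),
    ("Effort",                2.00, 1),
    ("Behavior",              2.00, 1),
]

# Quality points: each topic score multiplied by its weight.
quality_points = sum(score * weight for _, score, weight in topics)
total_weight = sum(weight for _, _, weight in topics)
average_rubric_score = quality_points / total_weight

print(round(quality_points, 2))        # 33.04
print(round(average_rubric_score, 2))  # 2.75
```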
The next step is to convert each student's average rubric score into an overall grade. The teacher might decide on the following conversion system:
3.26–4.00 = A
2.76–3.25 = B
2.01–2.75 = C
1.50–2.00 = D
1.49 or below = F
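A conversion system like this is straightforward to encode; the function below simply applies the cutoffs listed above:

```python
def to_letter_grade(average_rubric_score):
    """Convert an average rubric score to a letter grade using the
    sample cutoff points (3.26-4.00 = A, 2.76-3.25 = B, 2.01-2.75 = C,
    1.50-2.00 = D, below 1.50 = F)."""
    if average_rubric_score >= 3.26:
        return "A"
    if average_rubric_score >= 2.76:
        return "B"
    if average_rubric_score >= 2.01:
        return "C"
    if average_rubric_score >= 1.50:
        return "D"
    return "F"

print(to_letter_grade(2.75))  # Mark's weighted average of 2.75 earns a C
```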
The cutoff points for the various grades may appear arbitrary and, in fact, they are. This is one of the greatest weaknesses of using overall letter grades. Guskey (1996b) explains that the arbitrary nature of cutoff points is a built-in flaw of overall grades.
. . . the cutoff between grade categories is always arbitrary and difficult to justify. If the scores for a grade of B range from 80–89 for example, a student with a score of 89 receives the same grade as the student with a score of 80 even though there is a 9-point difference in their scores. But the student with a score of 79—a 1-point difference—receives a grade of C because the cutoff for a B grade is 80. (p. 17)
Up to this point, the discussion has assumed that a teacher's grade book is a notebook of some kind in which teachers make entries using pencil or pen. Appendix B even includes a page that teachers can duplicate to create their own grade books. However, in this age of technology, it seems silly to expect teachers to spend their valuable time doing what can be done more efficiently and accurately by a computer. Specifically, a number of computerized grade book programs can easily be adapted to accommodate the rubric-based approach described in this book. One such program was designed by New Measure of Cedar Rapids, Iowa (http://www.rubrics.com). The software has many of the same basic features as other computerized grade books: course titles can be recorded, student names can be entered and quickly retrieved for each course, names are automatically alphabetized, and so on. Additionally, some of the program's features are particularly suited to the rubric approach to record keeping.
First, the program asks the teacher to identify the topics that will be addressed in a given course. As many as 12 topics can be assigned to a single course. Because teachers have total control of these topics, they can include nonachievement factors if they wish. The teacher who designed our sample unit on weather would respond to the prompts of the program by entering five subject-specific topics (precipitation, ocean currents, measurement of temperature, reading tables, and estimation), one thinking and reasoning skill (classifying), and one communication skill (communicating clearly). The teacher would also enter two nonachievement factors: effort and behavior.
Copyright © 1999–2000 New Measure. All Rights Reserved. www.newgradebook.com. Used by permission.
The program is now set up to keep track of any assessment the teacher scores and automatically computes final topic scores and grades. For example, assume that the teacher has finished scoring the quiz depicted in the last chapter that addressed the topics of precipitation and reading tables (see Figure 4.13, p. 55). Each student in the class has received two rubric scores on the quiz: one for each topic. To enter these scores into the grade book, the teacher calls up a list of all students. The computer then asks which topics the assessment addressed. After the teacher records the two scores for each student, the computer automatically assigns these scores to the appropriate topics. At any point, the teacher can view an individual student's scores on all topics addressed in the unit, as shown in Figure 5.13.
Note that in addition to the score on each topic, the computer reports the average score for each topic and the final power law score (referred to as the "learning trend" score in the computer program). These are computed automatically based on the topic scores entered in the grade book up to that point. Each time a new score is entered for a topic, the average score and the final power law score are automatically recomputed. Finally, note that the overall grade is also reported. This grade, too, is automatically computed, based on the weights the teacher has assigned to each topic, the score ranges identified for the various letter grades, and the score the teacher has instructed the computer to use as the final topic score: either the final power law score or the average score. (The teacher can also use some other score as the final topic score, if desired.) Each time a new set of scores is entered into the grade book, the average score for each topic, the final power law score for each topic, and the overall grade are all recomputed. Thus, using a computer program, the teacher need not estimate or compute any quantities. However, the teacher retains complete control over which topics are recorded, how they are weighted to compute a final grade, and which quantity to use as the final topic score.
Even without a computerized grade book specifically designed for a topic-based system, most computer spreadsheets can be adapted to function in a similar way. For example, the Microsoft Excel "worksheet" can be programmed to accommodate the system described in the last two chapters. Experienced Excel users can enter the formula described in Appendix D (pp. 133–134) to compute a final power law score for each topic.
In summary, the use of computer software specifically designed for a topic-specific grade book or a generalized spreadsheet like Excel can make classroom record-keeping much easier and computation of final topic scores and final grades very precise.
Given that the purpose of organizing a grade book around topics is to provide accurate feedback, a critical feature in the success of this approach is to make sure that students and parents are aware of the nature and purpose of the topic-based system. Figure 5.14 shows a sample letter to parents about the content addressed in the sample unit on science described in this chapter. Note that in the letter the teacher has described the topics that will be addressed in the course, the weights that will be applied to each topic, and how the topics are to be assessed.
Dear Parents:
During this grading period, we will be studying weather. To master this content, students will have to show competence in five topics related to weather:
1. Precipitation
2. Ocean Currents
3. Measurement of Temperature
4. Reading Tables
5. Estimation
In addition, students will have to show competence in two general skills and abilities:
6. Ability to Classify
7. Ability to Communicate Clearly
Finally, students will also be held accountable for the following two areas that aren't necessarily academic in nature but are very important to their learning:
8. Effort
9. Behavior
Students' grades at the end of the period will be based on these nine factors. Specifically, the following weights will be applied to each of the nine factors: topics 1, 3, and 5 will each receive a weight of 2, and the remaining factors a weight of 1.
If you have any questions, please feel free to call.
This chapter presented techniques for computing final topic scores and final grades. Although computing an average of the topic scores is the most common method, we saw that the power law of learning can be used to estimate final topic scores that more accurately reflect student learning. This chapter also presented ways to compute these scores automatically with computerized grade books and to communicate the topic-based system to students and parents.