by Robert J. Marzano
This book is about designing classroom grading systems that are both precise and efficient. One of the first steps to this end is to clarify the basic purpose of grades. How a school or district defines the purpose of grades dictates much of the form and function of grades.
Measurement experts such as Peter Airasian (1994) explain that educators use grades primarily (1) for administrative purposes, (2) to give students feedback about their progress and achievement, (3) to provide guidance to students about future course work, (4) to provide guidance to teachers for instructional planning, and (5) to motivate students.
For at least several decades, grades have served a variety of administrative functions (Wrinkle, 1947), most dealing with district-level decisions about students, including
Research indicates that some districts explicitly make note of the administrative function of grades. For example, in a study of school board manuals, district guidelines, and handbooks for teaching, researchers Susan Austin and Richard McCann (1992) found the explicit mention of administration as a basic purpose for grades in 7 percent of school board documents, 10 percent of district guidelines, and 4 percent of handbooks for teachers. Finally, in a survey conducted by The College Board (1998), over 81 percent of the schools reported using grades for administrative purposes.
One of the more obvious purposes for grades is to provide feedback about student achievement. Studies have consistently shown support for this purpose. For example, in 1976, Simon and Bellanca reported that both educators and noneducators perceived providing information about student achievement as the primary purpose of grading. In a 1989 study of high school teachers, Stiggins, Frisbie, and Griswold reported that this grading function—which they refer to as the information function—was highly valued by teachers. Finally, the study by Austin and McCann (1992) found that 25 percent of school board documents, 45 percent of district documents, and 65 percent of teacher documents mentioned reporting student achievement as a basic purpose of grades.
When used for guidance purposes, grades help counselors provide direction for students (Wrinkle, 1947; Terwilliger, 1971). Specifically, counselors use grades to recommend to individual students courses they should or should not take and schools and occupations they might consider (Airasian, 1994). Austin and McCann (1992) found that 82 percent of school board documents, 40 percent of district documents, and 38 percent of teacher documents identified guidance as an important purpose of grades.
Teachers also use grades to make initial decisions about student strengths and weaknesses in order to group them for instruction. Grading as a tool for instructional planning is not commonly mentioned by measurement experts. However, the Austin and McCann (1992) study reported that 44 percent of school board documents, 20 percent of district documents, and 10 percent of teacher documents emphasized this purpose.
Those who advocate using grades to motivate students assume that grades encourage students to try harder from both negative and positive perspectives. On the negative side, receiving a low grade is believed to motivate students to try harder. On the positive side, it is assumed that receiving a high grade will motivate students to continue or renew their efforts.
As discussed later in this chapter, some educators object strongly to using grades as motivators. Rightly or wrongly, however, this purpose is manifested in some U.S. schools. For example, Austin and McCann (1992) found that 7 percent of school board documents, 15 percent of district-level documents, and 10 percent of teacher documents emphasized motivation as a purpose for grades.
According to the research cited in the previous sections, each of the five purposes for grading has some support from educators. A useful question is which of the five purposes is the most important or, more generally stated, what is the relative importance of the five purposes? Figure 2.1 depicts the results of the Austin and McCann (1992) study compared with an informal survey I undertook in preparing this book. (That survey is discussed in depth in Chapter 7.) If one uses the average rank (the last column) from the two studies as the criterion, Figure 2.1 indicates that using grades to provide feedback about student achievement should be considered the primary function of grades. Guidance is ranked second, instructional planning and motivation are tied for third, and administration is last. However, to obtain the most accurate picture of the opinions about the various purposes of grades, it is important to notice the variation in responses in Figure 2.1: whereas teachers in my informal survey ranked guidance as the least important, board-level documents ranked it as the most important. Whereas district-level documents ranked use of grades for administrative purposes last, administrators in my informal survey ranked it second.
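[Figure 2.1: Rankings of the five purposes of grades (including feedback about student achievement) from Austin and McCann (1992) and the author's informal survey, with average rank in the last column. Key: 1 = high, 5 = low]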
In short, there is no clear pattern of preference across the various sources except for the importance of feedback. Consequently, schools and districts must undertake their own studies of teachers and administrators regarding the purpose of grades. Again, use of an informal survey with teachers and administrators is discussed in depth in Chapter 7.
Another issue to address when developing a coherent grading system is the point of reference from which grades are interpreted. Three primary reference points are commonly used to interpret grades: (1) a predetermined distribution, (2) an established set of objectives, and (3) progress of individual students.
Assigning grades based on a predetermined distribution can be thought of as a "norm-referenced" approach to grading. The concept of norm-referencing is so embedded in educational practice that it is worth discussing in some detail. Most educators are familiar with the term as it relates to standardized tests. For example, scores on tests like the Iowa Tests of Basic Skills commonly are reported as percentile ranks. Results for a particular student on the reading comprehension section of a standardized test might be reported as the 73rd percentile, meaning that the score the student received was higher than 73 percent of the scores received by other students. These "other students" to which the sample student's score is compared are referred to as the "norming group." With standardized tests, the norming group is usually assumed to be students across the country at the same age/grade level. Additionally, it is commonly assumed that the scores of the norming group, when arranged in order of magnitude, are distributed in a "bell curve."
The technical name for the "bell curve" is the "normal distribution." As depicted in Figure 2.2, the normal distribution is quite symmetrical, which allows mathematicians and statisticians to make a wide variety of predictions based on it. You might recall from statistics or measurement classes that about 68 percent of the scores in a normal distribution will fall within one standard deviation above and below the mean; about 95 percent of the scores will fall within two standard deviations above and below the mean, and almost 100 percent of the scores will fall within three standard deviations above and below the mean.
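The percentages quoted above follow from the cumulative distribution function of the normal distribution. As a minimal sketch (the function name `share_within` is chosen here for illustration and is not from the book), they can be reproduced with Python's standard library:

```python
import math

def share_within(k: float) -> float:
    """Fraction of a normal distribution within k standard deviations of the mean."""
    # For a normally distributed variable, P(|Z| <= k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k) * 100:.1f}%")
# prints 68.3%, 95.4%, and 99.7% for 1, 2, and 3 standard deviations
```

These values match the "about 68 percent," "about 95 percent," and "almost 100 percent" figures given in the text.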
The concept of normal distribution has had a profound effect on educational practice and, indeed, on Western society. The mathematical equation for the normal distribution was formulated as early as 1733 by Abraham de Moivre (1667–1754). Its critical importance to probability theory was later articulated by mathematicians Pierre-Simon Laplace (1749–1827) and Carl Friedrich Gauss (1777–1855). Today, Gauss is commonly thought of as the father of the normal distribution. In fact, his writings about the characteristics and applications of the normal distribution were so compelling that it is frequently referred to as the "Gaussian distribution."
The wide use of the normal distribution in education stems from the fact that many physical and psychological phenomena adhere to it. For example, Figure 2.3 depicts the distribution of the height in inches of young Englishmen called upon for military service in 1939 as well as the distribution of IQ scores of 2,835 children ages 6 and 11 years randomly selected from London schools. For illustrative purposes both are plotted on a common axis.
Figure 2.3 dramatically illustrates that many characteristics do take the shape of a normal distribution when they are arranged in order of magnitude. Perhaps this is why many prominent researchers assume that the normal distribution can and should be used to describe student achievement. Among the most prominent are Arthur Jensen, Richard Herrnstein, and Charles Murray. Jensen is perhaps best known for his book Bias in Mental Testing (1980). In it he argues that because aptitude is distributed normally, educators and psychologists should generally expect grades (or scores on any educational test) to conform to a normal distribution. Jensen notes that the tendency for scores to take the form of the normal distribution is so strong that it occurs even when tests are designed in such a way as to avoid a normal distribution. He offers the following anecdote about Alfred Binet designing the first practical intelligence test:
Historically, the first workable mental tests were constructed without any thought of the normal distribution, and yet the distribution of scores was roughly normal. Alfred Binet, in making the first practical intelligence test, selected items only according to how well they discriminated between younger and older children, and between children of the same age who were judged bright or dull by their teachers, and by how well the items correlated with one another. He also tried to get a variety of items so that item-specific factors of ability or knowledge would not be duplicated. . . . and he tried to find items rather evenly graded in difficulty. . . . Under these conditions it turned out, in fact, that the distribution of raw scores (number of items correct) within any one-year age interval was roughly normal. (Jensen, 1980, p. 71)
Richard Herrnstein and Charles Murray wrote the popular book The Bell Curve (1994). In this controversial work, the authors make a case not only that intelligence is distributed normally, but that it is a prime determinant of differences in factors such as income level, parenting ability, success in school, and virtually every social indicator of success. Of course, this position has rather strong negative implications for members of certain socioeconomic strata.
A teacher who uses the normal distribution as a basis for grading does not necessarily agree with the assertions of Jensen or Herrnstein and Murray. However, by using the bell curve as the reference point for grading, a teacher is implicitly assuming that the performance of students should or will approximate the bell curve. Consequently, the teacher forces a set of scores or set of grades into a normal distribution. To illustrate, assume that during a nine-week grading period students in a given course accumulated the points depicted in Figure 2.4.
The teacher would arrange these scores in order from the lowest to the highest. Then, using knowledge of the normal distribution, the teacher would partition the scores into groups and then assign grades (shown in Figure 2.5).
[Figure 2.5: Grades assigned by category, with columns for the percentage expected based on the normal distribution and the expected number out of 30 students]
From the discussion of Figure 2.2 we know that certain percentages of scores in a normal distribution fall between certain intervals above and below the average score. Specifically, scores within a normal distribution can be organized into the following six categories:
Finally, the teacher would assign letter grades to each category. In Figure 2.5, the teacher has assigned a letter grade of A to the top two categories, a grade of B to the third highest category, and so on.
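The expected percentages and head counts for such categories can be sketched in a few lines of Python. This is an illustration under an assumption, not a reproduction of Figure 2.5: it assumes the six categories are bounded at one and two standard deviations above and below the mean, and the names `BAND_EDGES` and `expected_counts` are hypothetical.

```python
import math

def normal_cdf(z: float) -> float:
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

# Assumed category boundaries, in standard deviations from the mean (lowest first).
BAND_EDGES = [-math.inf, -2.0, -1.0, 0.0, 1.0, 2.0, math.inf]

def expected_counts(class_size: int) -> list[tuple[float, float]]:
    """Return (percentage, expected number of students) for each of the six categories."""
    rows = []
    for low, high in zip(BAND_EDGES, BAND_EDGES[1:]):
        share = normal_cdf(high) - normal_cdf(low)
        rows.append((share * 100, share * class_size))
    return rows

for pct, count in expected_counts(30):
    print(f"{pct:5.1f}% of scores -> about {count:.1f} students")
```

Under this assumption the categories hold roughly 2, 14, 34, 34, 14, and 2 percent of scores, or about 1, 4, 10, 10, 4, and 1 students in a class of 30. A teacher grading on the curve would then attach letter grades to the categories, whatever the students' actual level of mastery.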
It is important to note that the normal distribution is not the only distribution that can be the point of reference for assigning grades. For example, one teacher I worked with told me that she always uses the following scheme to assign grades:
There should be about as many marks of 3.5 or higher as there are pupils in a group with IQ's of 120 or above. There should be about as many marks of F (1.0 to 1.5) as there are pupils with IQ's of 95 or less. It is expected that the number of marks at the 3.5 level or higher, and at the 1.5 level or lower, may have a variance of 25 percent of the pupils in the IQ groups of 120 and up, and 95 and below. (1992, p. 10)
Specific learning objectives are another common point of reference for grades. Many measurement experts strongly endorse this approach. For example, James Terwilliger (1989) notes that "grading should be directly linked to an explicitly defined set of instructional goals . . ." (p. 15). This approach is commonly thought of as a "criterion-referenced" approach to grading, as opposed to the norm-referenced approach described previously.
Again, you are probably familiar with the term "criterion-referenced" as it relates to tests designed to assess student achievement on state standards. In such cases, the criterion is a specific score sometimes referred to as a "cut score." Students who do not obtain a score equal to or greater than the "cut score" are assumed not to have mastered the content at the requisite level. Of course, the key to designing such a test is to ensure that it contains items that students will answer correctly if they have a mastery of the content, but will answer incorrectly if they do not. Unfortunately, it is very difficult to design tests with truly valid "cut scores" (see Livingston, 1982).
Mathematics educators Warner Esty and Anne Teppo (1992) have described how they use a criterion-referenced system as the basis for grades in mathematics classes. They explain their system in the context of a unit on the concept of function. Before the unit begins, they clearly describe what the grades A, B, C, and so on will represent. Then, when the unit begins, they communicate this scale of understanding to the students. Each quiz, test, homework assignment, and so on is then graded using this scale. A C on the first quiz of the grading period indicates to a student that, "at the present time," his or her understanding of the concept of a function is at the C level. The final grade for a student is his or her level of understanding of the concept at the end of a grading period.
This last point is very important to the criterion-referenced approach to grading. Because the target is a specific level of learning, the final grade is commonly considered the level of learning the student has reached by the end of the unit of instruction. This makes good sense within a criterion-referenced system. Stated negatively, it would make little sense to combine all test scores for a given student during a unit (by computing an average score, for example), because this might penalize the student for his or her lack of knowledge at the beginning of the unit. The driving force behind criterion-referenced grading is to ascertain the extent to which students reach a specific level of knowledge or skill in a specific learning outcome at the end of a grading period.
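A small worked example makes the contrast concrete. The scores below are hypothetical, invented only to illustrate the arithmetic; they do not come from the book or from Esty and Teppo's classes.

```python
from statistics import mean

# Hypothetical scores for one student across a unit, in chronological order (0-100 scale).
scores = [55, 65, 80, 90]

average_score = mean(scores)  # what an averaging scheme would report
final_score = scores[-1]      # level of learning reached at the end of the unit

print(f"average of all scores: {average_score}")   # 72.5
print(f"final level of learning: {final_score}")  # 90
```

Averaging reports 72.5 even though the student ended the unit at 90. The gap reflects only what the student did not yet know at the start, which is the penalty the criterion-referenced approach is meant to avoid.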
As the name implies, reference to knowledge gain uses individual student learning as the basis for grading. In this approach, the point of reference for each student is the level of skill or understanding at which the student begins the grading period. Stated differently, each student's entry level of knowledge is his or her unique point of reference. Each student's grade, then, is based on how much he or she progresses beyond the initial level of knowledge or skill. The logic behind this approach is that students should not be compared to one another but, rather, to the amount of progress they can legitimately be expected to make. One challenge in this approach is to design a scale that can accommodate the different beginning points of reference for each student. Following is a sample of the types of scales that must be used in this approach.
A = Exceptional effort and improvement in the student's ability
B = Good effort; improvement exceeds expectations
C = Adequate effort; improvement consistent with level of effort and ability
D = Little improvement, but some evidence of effort
F = Little or no improvement; no effort
Although there is no right way or wrong way to design grades, there are ways that fit best with a given set of assumptions or beliefs. This book is based on two assumptions:
Research unquestionably supports the importance of feedback to specific learning goals. To illustrate, after reviewing 7,827 studies on learning and instruction, researcher John Hattie (1992) reported that providing students with specific information about their standing in terms of particular objectives increased their achievement by 37 percentile points. To dramatize the implications of this research, assume that two students of equal ability are in the same class learning the same content. Also assume that they take a test on the content before beginning instruction and that both receive a score that puts their knowledge of the content at the 50th percentile. Four weeks go by and the students receive exactly the same instruction, the same assignments, and so on. However, one student receives systematic feedback in terms of specific learning goals; the other does not. After four weeks, the two students take another test. Everything else being equal, the student who received the systematic feedback would obtain a score 37 percentile points higher than the score of the student who had not received feedback. It was this dramatic finding that led Hattie to remark: "The most powerful single innovation that enhances achievement is feedback. The simplest prescription for improving education must be 'dollops of feedback'" (p. 9).
Before concluding this general discussion of grades, let's look at a final topic: doing away with any form of quantitative feedback. More specifically, the practice of providing students with quantitative feedback about their knowledge or skill has been strongly criticized by a few zealous and, unfortunately, persuasive individuals. Education writer Alfie Kohn is perhaps the best known of this group. In a series of publications, Kohn asserts that almost all forms of grading should be abolished. His popular book Punished by Rewards: The Trouble with Gold Stars, Incentive Plans, A's, Praise and Other Bribes (1993) begins with an impassioned case against the use of rewards to motivate students. Kohn explains that American education is ostensibly trapped in a pattern of trying to bribe students into achievement.
. . . Regardless of the political persuasion or social class, whether a Fortune 500 CEO, or a preschool teacher, we are immersed in this doctrine; it is as American as rewarding someone with apple pie.
To induce students to learn, we present stickers, stars, certificates, awards, trophies, memberships in elite societies, and, above all, grades. (Kohn, 1993, p. 11)
To counteract the negative influence of behaviorism on American education, Kohn cites a number of studies indicating that rewards do not positively influence behavior. For example, rewards are not good motivators in helping people lose weight, quit smoking, or use seat belts. He also cites research indicating that rewards do not improve performance on cognitive tasks. He places heavy emphasis on a dissertation by Louise Miller, who arranged a series of drawings of faces so pairs of identical and nearly identical images would be flashed on the screen. Nine-year-olds were then asked to differentiate between identical and nonidentical faces. Some of the students were paid when they succeeded; others were not. As Kohn explains, Miller
brought 72 nine-year-olds into her laboratory one at a time and challenged them to tell the two faces apart. Some of the boys were paid when they succeeded, others were simply told each time whether or not they were correct. (1993, p. 42)
In a later work entitled Beyond Discipline: From Compliance to Community (1996) Kohn summarizes the research on rewards:
At least two dozen studies have shown that when people are promised a reward for doing a reasonably challenging task—or for doing it well—they tend to do inferior work compared with people who are given the same task without being promised any reward at all. Other research has shown that one of the least effective ways to get people to change their behavior (quit smoking, lose weight, use their seatbelts, and so on) is to offer them an incentive for doing so. The promise of a reward is sometimes not just ineffective but counterproductive—that is, worse than doing nothing at all. (p. 33)
Finally, in a 1999 article entitled "From Grading to Degrading," Kohn asserts that
Grades tend to reduce students' interest in learning itself.
Grades tend to reduce students' preference for challenging tasks.
Grades tend to reduce the quality of students' thinking. (p. 39)
Some of Kohn's recommendations have merit—particularly 4, 6, and 7. Others, however, are questionable, at best, and downright dangerous, at worst. I believe Kohn's argument suffers from four primary weaknesses or misconceptions.
First, Kohn does not adequately address the complexities surrounding the issues of assessing and evaluating human learning. Stated differently, he ignores the research on the inappropriate and appropriate uses of assessment as a tool in the learning process. This book is an attempt to articulate the very issues that Kohn has ignored.
Second, Kohn does not accurately interpret the influence of behaviorism on education today. Specifically, he interprets as behavioristic a wide variety of educational practices that have little or nothing to do with behaviorism. Psychologist John Anderson explains that this is a common trap:
Modern educational writers assume that the behaviorist approach to education has been a failure, although little hard evidence has been cited. Recent writings have tended to generalize the perceived failure of the behaviorist program to the conclusion that any program that attempts to analyze a skill into components will fail. (1995, p. 396)
Third, Kohn appears to misinterpret the research on grading, perhaps because he confuses rewards with feedback. Although it is true that tangible rewards have little effect on achievement, feedback has a strong and straightforward relationship to achievement. As mentioned previously, in a review of 7,827 studies in education, Hattie (1992) found that accurate feedback to students can increase their level of knowledge and understanding by 37 percentile points.
Finally, Kohn does not address the rather extensive body of research on rewards that contradicts his basic thesis. A basic premise for Kohn is that rewards inhibit intrinsic motivation. However, in a review of 96 experimental studies, researchers Judy Cameron and W. David Pierce (1994) note: "Results indicate that, overall, reward does not decrease intrinsic motivation. When interaction effects are examined, findings show that verbal praise produces an increase in intrinsic motivation. The only negative effect appears when expected tangible rewards are given to individuals simply for doing a task. Under these conditions, there is a minimal negative effect on intrinsic motivation . . ." (p. 363). Speaking specifically about grades, researcher David Berliner explains:
In fact, the evidence is persuasive that grades do motivate students to learn more in a given subject area. . . . The judicious use of grades that are tied to objective performance, as in mastery and some other instructional programs, appears to be related to increased achievement and positive student attitudes. (1984, p. 70)
Research offers strong support for grades and other forms of feedback (even rewards) as useful tools for learning. Unfortunately, because of a great many misconceptions about their use, they have fallen out of favor with some educators.
In this chapter we've looked at the basic purpose of and the point of reference for grades. Out of five potential purposes, feedback was identified as the most important. Out of three possible points of reference, specific learning outcomes was deemed the most compatible with feedback as a purpose. Finally, we examined arguments against the use of grades, particularly those proposed by Alfie Kohn.
Copyright © 2000 by McREL (Mid-continent Research for Education and Learning) Institute. All rights reserved.
No part of this publication—including the drawings, graphs, illustrations, or chapters, except for brief quotations in
critical reviews or articles—may be reproduced or transmitted in any form or by any means, electronic or mechanical,
including photocopy, recording, or any information storage and retrieval system, without permission from ASCD.