February 1, 2008
Vol. 65
No. 5

Ask About Accountability / What's Valid? What's Reliable?


      Question: Assessments originally created to measure standards and evaluate schools are now being "retrofitted" to measure students' grade-to-grade progress. What are the risks of retrofitting assessments in this way?
      Emily A. Larsen, White Bear Lake Area High School, White Bear Lake, Minnesota
      Answer: A single test can accomplish multiple measurement missions in only a limited number of instances. For example, if an accountability test is built to measure a modest number of important and well-described curricular outcomes and does so by providing per-outcome and per-student information about each assessed outcome, it can do a good job of evaluating schools yet also help classroom teachers make better instructional decisions. But to construct a test capable of doing this sort of double duty, it's almost always necessary to tackle those two assessment functions from the get-go. Belatedly retrofitting solo-purpose tests for a second measurement mission is almost always a dumb idea. Folks who pursue such an approach may know scads about psychometrics, but they know little about teaching.
      Question: Someone once told me that the more reliable a test is, the less valid it is—and vice versa. Is this true?
      Karen Ruffner, Rockford Public Schools, Rockford, Illinois
      Answer: Sorry, Karen, but you've been running with the wrong crowd. It's not true that when a test becomes more reliable, it becomes less valid. Nor is the vice-versa version of that same proposition true.
      Here's an easy way of keeping these two assessment concepts straight: Think of reliability as whether the test measures something with consistency; think of validity, however, as accuracy. But it's not the validity of the test that's involved here. Instead, it's the validity of the inference—or, if you prefer, the interpretation—that's based on a test taker's performance. Lots of folks refer, far too loosely, to the validity of tests. How many times, for example, have you heard someone ask, Is this test valid? But tests aren't valid or invalid; it's the test-based inference that's valid or not.
      To make a valid (that is, accurate) inference about a student's mastery of a particular cognitive skill, we need tests that are reliable. If a test measures something inconsistently, how can educators be confident that their test-based inference is accurate? Test reliability is a necessary condition for us to make valid inferences, but it is not a sufficient condition. A genuinely reliable test may or may not provide evidence leading to a valid inference. For instance, my bathroom scale may measure my weight accurately, but if I tried to use that scale to make an inference about my problem-solving skills, the resulting inference wouldn't be valid. If you keep in mind, with regard to assessment, that reliability equals consistency whereas validity equals accuracy, you'll do just fine.
      Question: What kinds of tools can we use to assess younger students' performances in citizenship, in grades 1–3, for example?
      Shaikhah al Shamsi, Citizenship and Development Center, Abu Dhabi, United Arab Emirates
      Answer: Measuring students' status with respect to citizenship is tricky. It might seem that all you need to do is to identify the skills and knowledge you want students to acquire and then measure whether students have, after instruction, acquired such skills and knowledge. But there is scant unanimity among social studies educators regarding what the appropriate citizenship skills and knowledge really are. When you focus on very young children, many of whom are not yet skilled readers, the assessment difficulties increase dramatically.
      I encourage you to consider the possibility of assessing your students' citizenship-related attitudes as well as their cognitive achievement. A number of powerful attitudes and values are associated with being good citizens, such as respect, tolerance, honesty, responsibility, courage, and compassion. To assess the affective status of students who have not yet become skilled readers, teachers sometimes rely on observations of those students' routine behaviors in class. Unfortunately, experience suggests that such behavior-based inferences about students' affect are frequently incorrect. Regrettably, for young students who can't read, it's almost impossible to establish a "setting of anonymity" required for accurate affective assessment. However, as soon as the children can read reasonably well, you can employ anonymous self-report inventories to assess their attitudes toward, for example, people from different ethnic or social groups.
      Question: Students in a classroom assessment course I teach sometimes raise the issue of whether instructional procedures "based on sound science" really are adequately supported. When Robert Marzano and others identify research-supported teaching practices, how much confidence should teachers place in those practices?
      Jeffrey Glanz, Professor of Education, Yeshiva University
      Answer: I've been following with admiration the work of Bob Marzano for many years. Thankfully, when Marzano or any other sensible synthesizer of empirical evidence about teaching arrives at a research-based recommendation regarding instruction, he or she always offers it with appropriate caveats.
      Because teaching is so consummately particularistic—that is, an activity involving particular teachers teaching particular students in particular settings—it is impossible to find research results specific to such situations that will yield infallible instructional guidelines. The best we can do is to tell teachers that when they use a given research-supported tactic, the tactic will probably prove effective. The odds are on the teacher's side. And having the odds on your side can be a real boon. Just ask any professional gambler.
      So when your students raise questions about the soundness of research-supported teaching practices, point out that even Bob Marzano doesn't supply iron-clad money-back guarantees. But it's clearly better to use teaching practices that probably will succeed than to use those that probably won't.

      James Popham is Emeritus Professor in the UCLA Graduate School of Education and Information Studies. At UCLA he won several distinguished teaching awards, and in January 2000, he was recognized by UCLA Today as one of UCLA's top 20 professors of the 20th century.

      Popham is a former president of the American Educational Research Association (AERA) and the founding editor of Educational Evaluation and Policy Analysis, an AERA quarterly journal.

      He has spent most of his career as a teacher and is the author of more than 30 books, 200 journal articles, 50 research reports, and nearly 200 papers presented before research societies. His areas of focus include student assessment and educational evaluation. One of his recent books is Assessment Literacy for Educators in a Hurry.

      From our issue: Teaching Students to Think