Why Does Conventional Grading Feel So Unfair?

This is a series of blogposts meant for students who are in courses using grading contracts of some kind to determine their final course grades, or those who just want to understand better what grades are, what they do in classrooms, and how they affect learning. This is the second post in a series of five blogposts meant to address questions about grading and grading contracts. If you're a teacher (or an inquisitive student), you might look at my Labor-Based Grading Contracts Resources page.

This series is a collaboration with the really awesome podcast Pedagogue (@_Pedagogue_) with Shane Wood. You can listen to me reading this blogpost at Pedagogue, or use the widget below. But maybe you just want to read it on your own below, or follow along.

***

Q2. Why Does Conventional Grading Feel So Unfair? 

Assumptions of Mediocrity Produce Unfairness

This question of fairness could be answered in a number of ways. For instance, standards used to grade you in writing classrooms are made by other people, not you, the students in the classroom. Our level of participation in such grading systems often determines how fair we feel those systems are, no matter what the results are. Fairness in any system is usually a function of how much participation people in that system can exercise and the results or outcomes of the system in a group or society. 

But let’s consider one really important, defining characteristic of all grading, which doesn’t always raise questions of fairness, but should. It’s a bias in the technology regardless of the standard used to determine grades. It is the bell curve bias.

A key figure in establishing the bell curve bias is Francis Galton. Galton was, among other things, a statistician, a numbers guy. He believed in the power of statistics and numbers as a way to understand who were the best people in society and where they came from, which was another way of asking what good or bad characteristics groups of individuals have. Galton’s goal was to improve the human race -- or rather, put elite White people at the top and in charge.

In 1883, he coined the term “eugenics,” which means “good creation,” “good in birth,” or “noble in heredity.” But Galton had a more specific definition and application in mind. He described eugenics as 

the science of improving stock, which is by no means confined to questions of judicious mating, but which . . . takes cognizance of all influences that tend in however remote a degree to give the more suitable races or strains of blood a better chance of prevailing speedily over the less suitable than they otherwise would have had.

While he wasn’t the first to have such ideas -- Plato entertained similar ideas in his book, The Republic -- Galton did believe that most of a person’s intelligence was inherited and could be understood in a variety of objective measurements, such as head size and reflexes. These kinds of false biological beliefs about how individuals come to have traits and intelligence made possible lots of racist pseudoscience of the time, such as craniology and phrenology (see the phrenological map image). 

In fact, in 1882, Galton established the first center for the study of human intelligence. He also was the first to try to develop an intelligence test, sort of like the IQ tests we know today (and yes, those are racist too). The connection between eugenics, intelligence testing, and the practice of grading in classrooms can be seen in their similar biases. Galton migrated his theories of randomness in physical phenomena to groups of humans and human performance, such as intelligence, which others after him accepted as fact. For example, Galton invented the Galton Board or the quincunx (also see "History of Statistics and the Quincunx"). It is a simple device that demonstrates through randomly falling beads how what Galton called a “normal distribution” or a bell curve will be obtained every time (see an animated demonstration). 
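If you'd like to see what the Galton Board does without building one, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the function name galton_board, the 12 rows of pegs, and the 10,000 beads are made-up choices, and each peg is modeled as a fair coin flip that sends a bead left or right. It shows randomness in falling beads, nothing more; it says nothing about people or their writing.

```python
import random
from collections import Counter


def galton_board(num_beads=10_000, num_rows=12, seed=0):
    """Simulate beads falling through rows of pegs (hypothetical sketch).

    At every peg a bead goes left (0) or right (1) with equal chance,
    so its final slot is the number of times it went right.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    bins = Counter()
    for _ in range(num_beads):
        slot = sum(rng.randint(0, 1) for _ in range(num_rows))
        bins[slot] += 1
    # One count per slot, from far left (0) to far right (num_rows).
    return [bins[slot] for slot in range(num_rows + 1)]


counts = galton_board()
for slot, count in enumerate(counts):
    # Crude text histogram: one '#' for every 50 beads in the slot.
    print(f"{slot:2d} {'#' * (count // 50)}")
```

The printed bars pile up in the middle slots and thin out toward the edges, which is the bell shape Galton pointed to. Notice that the shape comes from the setup itself: identical beads, identical pegs, and one fixed way of counting.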

This theory of bell curves in random physical and natural phenomena has influenced the biases in grading systems, which are number and ranking systems that Galton and others like him helped invent. Put simply: When we use grading or ranking systems, almost all of us assume a bell curve of scores will end up being produced, even when we are not thinking about it at all. Schools and departments often check the effectiveness of teachers or curricula or grading systems by how well they produce bell curves of grades in courses. The bell curve bias often suggests what makes a good grading system, even in diverse student populations. 

Here’s a way to see this bell curve bias: If a teacher is not thinking about it, their judgements of students’ writing, say in essays, will fall along a bell curve. Why? Because if you use a single standard to judge a group of things, such as essays or apples, the assumption is that most of those things are average regardless of your standard; most are grouped in a middle category of value, something around a C or B-.

In his 1886 statistical work in genetics, Galton described a popular way of understanding what makes bell curves happen. He found that naturally occurring human traits, such as height in families, exhibited what he called “regression towards mediocrity,” or regression towards the mean -- that is, the average value in a set. What he found was that if you took a really tall man, say someone 6’5”, and looked at his father, grandfather, his own sons and nephews, you’d find those other males in the family tended to be shorter, more mediocre in height. The really tall trait in that group of related men regressed toward mediocrity, or the mean of all men in that family.
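For readers who want to see regression towards the mean in numbers, here is a rough sketch in Python. Everything specific in it is my assumption, not Galton's data: the function name simulate_families, an average height of 69 inches, a spread of 3 inches, and a father-son correlation of 0.5 are all invented for illustration. The regression is built into the model by that correlation being less than 1, so this only shows what the idea looks like once you accept its premises.

```python
import random
import statistics


def simulate_families(n=20_000, mean=69.0, sd=3.0, r=0.5, seed=1):
    """Generate hypothetical (father, son) height pairs in inches.

    A son's height partly tracks his father's (strength r) and is
    partly independent noise. All parameters are illustrative.
    """
    rng = random.Random(seed)
    noise_sd = sd * (1 - r ** 2) ** 0.5  # keeps sons' spread equal to sd
    pairs = []
    for _ in range(n):
        father = rng.gauss(mean, sd)
        son = mean + r * (father - mean) + rng.gauss(0, noise_sd)
        pairs.append((father, son))
    return pairs


pairs = simulate_families()
# Pick out only the really tall fathers: 75 inches (6'3") and up.
tall = [(father, son) for father, son in pairs if father >= 75.0]
tall_fathers_mean = statistics.mean([father for father, _ in tall])
sons_mean = statistics.mean([son for _, son in tall])

print(f"tall fathers average: {tall_fathers_mean:.1f} in")
print(f"their sons average:   {sons_mean:.1f} in")
```

In this made-up population, the sons of the tallest fathers come out taller than average but shorter than their fathers. Both "tall" and "average" only exist here because a single ruler, inches, was applied to everyone.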

This theoretical principle of regression towards mediocrity has infiltrated all kinds of judgements beyond naturally occurring phenomena, such as height of people or the weight of coconuts, including teachers’ judgements of diverse ways of learning and communicating in schools. One thing to keep in mind is that you can only have a regression towards mediocrity if you have a standard by which you are measuring all these similar things or phenomena. Without that standard, there is no regression. There isn’t even mediocrity. What Galton’s work shows us is that in our grading systems our biases set us up to create things like mediocrity, excellence, and failure. These things in students are engineered in the biases of our grading technologies. They are not natural qualities in you as a student or writer. In fact, the bell curve bias often leads to faulty thinking, or errors in our thinking, which the Nobel Prize-winning psychologist Daniel Kahneman has discussed in his book Thinking, Fast and Slow.

Now, Galton didn’t make up bell curves, or mediocrity. The bell curve bias is actually a cultural bias deeply ingrained in Western traditions that goes back at least to the 16th century with folks like Nicolaus Copernicus, the famous Polish mathematician and astronomer. These first astronomers and mathematicians used the principle of mediocrity to explain things like the average distance between stars. 

The principle of mediocrity was meant to explain the normal condition of our universe -- that is, life on Earth is not an exception in the universe, but a normal occurrence, given enough instances. That is, there’s nothing special about Earth or humans or our civilizations, given enough chances at making such a planet and people. It was a principle meant to de-center the known world and its natural phenomena. It was a philosophy that helped folks like Copernicus explain the world and the stars. Like “regression towards mediocrity,” the principle of mediocrity was an explanatory theory that helped Western Europeans make sense of their world. But both ideas of mediocrity require a single standard to judge a diverse set of things. 

Thus Galton’s theory of “regression towards mediocrity” and the principle of mediocrity explain how we’ve come to accept the bell curve bias today. These are two sides of the bell curve bias. They are also assumptions we’ve come to take as fact in all things. Both assumptions, however, were invented to explain natural phenomena, stars and physical traits, not cultures, intelligences, or social aspects of people, such as languages. This should call into question bell curves’ applicability and meaningfulness in grading in your writing class. 

Today it is difficult to escape the bell curve bias, even if you know about it, because we all have lived our lives holding the hand of the bell curve bias. It has been our companion. It helps us make sense of the chaos around us. And we see it everywhere, or think we do, and this confirms it as truthful to us. 

Test out this bell curve bias. Ask anyone in your life to rank anything on any scale you wish -- 1 to 10, 1 to 3, or bad-okay-good-great. They can rank apples in terms of their sweetness, redness, or roundness, or news articles from a random sample of newspapers in terms of their informativeness, helpfulness, or clarity, or even a random sample of movies based on how good they think each movie is, or how well it was cast. It doesn’t matter the things judged, the scale used, the dimension they are judging, nor even what that judge believes about the characteristics that make for higher scores or lower ones. There just needs to be enough things judged to create a distribution of scores on a graph, say 20 or 30, or better yet 40 or 50 items. 

The results will always be the same: bell curves. But this doesn’t make it right and fair for diverse writing classrooms. It’s a dangerous bias that likely will harm many students. And it will feel unfair. Perhaps you have been harmed by the bell curve bias in your schooling. And if you didn’t know about it, you likely blamed yourself for not being smart enough. But in reality, you were just different enough from the standard used and the teacher doing the judging to place you in a grade category you didn’t want. 

Remember, the bell curve bias organizes the way a teacher makes sense of students’ learning in classrooms. It also organizes your way of understanding your learning and abilities. It even can make you blame yourself for other people’s judgements of you. If we really believe that in schools and society the best way to understand and teach people is to honor and value their cultural, social, intellectual, and linguistic diversity, then why are we using grading systems that do the opposite? Why are we using the bell curve bias, a bias that assumes the principle of regression towards mediocrity and the principle of mediocrity? This should feel unfair, because in writing classrooms it is. 

Subjective Grades Produce a Sense of Unfairness

Typically, many students, perhaps even you at some point, say that writing teachers’ grades are mostly flawed or unfair in ways that work outside of bell curve biases. They are too subjective and too much about the teacher’s personal biases. But our biases frame or organize our ways of judging and seeing things. We actually need them in order to make any sense of anything. There is no way around biases when we judge or even just try to understand something. To judge means to use or apply some set of biases to the thing you're judging.

We often say that grading in writing classrooms is highly “subjective,” and so less fair than other courses. We typically think that subjectively produced judgements, like grades on essays, are bad because they have too much human error in them, too much bias and prejudice. If we accept the bell curve bias, then most teachers’ grades, which usually fall along bell curves, should be seen as fair when they make such bell curves, but we often don’t agree with this. In fact, it is because most teachers’ grades make bell curves that we individually find them unfair to us as individuals. 

Why? Because most of us do not see ourselves as average, as regressing towards mediocrity -- that is, we do not see ourselves in the bigger, average category; instead, we usually see ourselves in exceptional categories. We are each special, not like those around us. And this is true, and it makes us experience grading as too subjective and unfair at the individual level. But it also points out that you are not like those around you, and you are especially not like your teachers. This is a human condition. But grading systems that use a single standard do not work from the assumption that uniqueness, or divergence from a standard, or diversity in groups, should be rewarded. These things are punished.

What many students tend to prefer are “objectively” determined grades, which we think of as better because we think that they contain less or no human error. Apparently, we tend to treat error and unfairness as synonymous with human judgement and subjectivity. That is, error is to human judgement as unfairness is to subjectivity. Why? Because we’ve all experienced how different people often judge similar or the same things differently. But this logic works from a false binary, one that pits objectivity, fairness, and accuracy against subjectivity, unfairness, and error.

Each teacher’s bell curves are different, just as their other biases and expectations can be different. This is why we are all unique. For instance, you may have been put into a different grade category by two different teachers when you had submitted the same kind of writing, and your response might have been to say, “well, Dr. Klein likes my writing more than Dr. Lopez does.” This may be true from one way of looking at it, but it’s not the full reason why you get a better grade from Dr. Klein. 

You get a better grade because Dr. Klein’s expectations and judging process, when used with her standard and the bell curve bias, places you in an exceptional category, such as the A category. Meanwhile, in Dr. Lopez’s class, you are placed in the C grade category for similar writing. You are in a different category of his bell curve. It seems inconsistent, so unfair. But it isn't actually, if we can assume each teacher is using their standard consistently, and -- crucially -- if we assume that fairness equals internal consistency, or each teacher using their own standard consistently in their judgements. All these conditions are difficult to achieve in any writing course.

In grading systems, this inconsistency feels wrong at the individual level, and it is especially unfair when we are talking about grading language. The uncomfortable fact is that if both teachers create bell curves in their respective grade books, then both grading systems are working as they have been designed, from the bell curve bias. Nothing has gone wrong in either case, at least not system-wise. The problem appears only when we put those two grading systems and their teachers next to each other and see the differences in how those teachers judge similar writing. Should we expect different people, say Dr. Klein and Dr. Lopez, to judge similar writing differently? Of course. It’s the grades that mess up this normal occurrence in human judgement.

This situation seems more unfair though when we know that there are other grading systems possible. What I’m describing is the difference between our felt sense of fairness in, say, a math or physics class where it is often not the teacher’s subjective evaluation of your performance that determines your grade but a so-called objective, multiple choice test. You either know those answers or you don’t. Your grade isn’t about whether the teacher likes you or your ideas or not. Your grade appears to be about how well you know physics or math. Your grade is objectively produced, or rather we think it is. 

Both systems are still subjective, only in different ways. Even those old Harvard professors who ranked their students by their family’s social class standings likely saw their grading system as objective and fair (see post 1 in this series). Today most of us see that old Harvard system as deeply subjective and flawed. Given our historical distance, changing ideas about people, and who is in colleges today, we can more easily see those early grading biases as, say, arbitrary and elitist. But our grading systems today have similar problems. 

While writing courses' grading systems are more easily seen as subjective, as based on the biases and personality of the teacher, those math course grades also are produced by subjective decisions. Who chooses the items on the test? Who chooses exactly how they are worded or what parts of the textbook to focus on in the test? Who decides the cutoff scores for A, B, C, etc.? Who decides how much time students will get to take the test? These test-making decisions require biases. In fact, some teachers even impose bell curves on the grades students receive on tests after the test scores have been calculated, so how well or poorly your colleagues do will determine what grade you get. These kinds of grading practices are simply explicit about their use of the bell curve bias.
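To make grading on a curve concrete, here is a small sketch in Python. It is one hypothetical way a teacher might do it, not a description of any real teacher's method: the function name curve_grades, the fixed shares (10% A, 20% B, 40% C, 20% D, 10% F), and both sets of class scores are invented for illustration.

```python
def curve_grades(scores, shares=(("A", 10), ("B", 20), ("C", 40), ("D", 20), ("F", 10))):
    """Assign letter grades by rank so each letter gets a fixed share.

    shares holds (letter, percent of class) pairs; these percentages are
    an assumption for illustration. Tied scores are split by list order.
    """
    n = len(scores)
    # Student indices ordered from highest score to lowest.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)

    # Turn the percentages into rank cutoffs, e.g. top 10% -> first n/10 ranks.
    cutoffs = []
    cumulative = 0
    for letter, percent in shares:
        cumulative += percent
        cutoffs.append((letter, round(cumulative * n / 100)))

    grades = [None] * n
    for rank, student in enumerate(order):
        for letter, cutoff in cutoffs:
            if rank < cutoff:
                grades[student] = letter
                break
    return grades


my_score = 85
strong_class = [98, 96, 95, 93, 92, 90, 88, 85, 80, 75]
weaker_class = [85, 80, 78, 75, 72, 70, 68, 65, 60, 55]

strong_grades = curve_grades(strong_class)
weaker_grades = curve_grades(weaker_class)

print("85 in the first class: ", strong_grades[strong_class.index(my_score)])
print("85 in the second class:", weaker_grades[weaker_class.index(my_score)])
```

The same test score of 85 earns a D in the first class and an A in the second. Nothing about your performance changed; only the people sitting around you did, because the curve decides in advance how many of each grade there will be.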

Here’s a lesson a savvy student might take from all this. If you know your teacher is going to grade your paper, try to be the first one that teacher reads and grades. And definitely do not be the last one. The farther into the stack of papers the teacher reads, the more chances they have of finding really good papers (good to the teacher, that is). So you don’t want to get regressed towards mediocrity because of those really good papers read before yours. You want to be judged with little to no chance of being compared next to those other students’ papers that live in your teacher’s mind.

While the bell curve bias is very sticky in our mental processes of judging things, it is dangerous in culturally, racially, and linguistically diverse groups of students -- that is, when we apply it to ranking diverse human beings along any social dimension. This means in writing classrooms, grading writing is like judging apples and oranges based on how banana-like they are. The teacher’s rubric is telling everyone to be more banana-like, but you are an apple and your friend is an orange, and those fruits are wonderful too, just as bananas are. So why are we artificially stringing them all on a linear scale of best to worst, on how banana-like you can be as an apple?

The bell curve bias is especially dangerous when we are dealing with language, since language and identity are closely associated, and vary with different groups of people. And remember what grades historically have been used for: Ranking students by their social standing or merit, which in the U.S. has been tangled up with assumed hierarchies of social class, religious affiliation, immigration status, and race. 

Why do conventionally produced grades feel so unfair? Because those grading systems take the biggest variable, teachers and their necessarily subjective judging, and use it to make grades on diverse students. On top of this, the standards that all teachers use as well as the habits of language those teachers operate from to make judgements are closely associated historically with elite White racial groups of people. Those groups made our standards in writing classrooms. They have been in charge of textbooks, the rules of English in schools and society. But there are lots of varieties of English that are equally good at thinking critically, discussing things, analyzing ideas, describing our world, etc.

Grading feels unfair to so many because, well, it is racist and White supremacist. But we should be clear: Writing teachers generally are not racist or White supremacist. Conventional grading with a single standard is, because it contains the bell curve bias, because it assumes a regression towards mediocrity, because it operates from the principle of mediocrity in diverse groups of students today, and because it creates hierarchies of students and their writing.

---

This blog is offered for free in order to engage language and literacy teachers of all levels in antiracist work and dialogue. The hope is that it will help raise enough money to do more substantial and ongoing antiracist work by funding the Asao and Kelly Inoue Antiracist Teaching Endowment, housed at Oregon State University. Read more about the endowment on my endowment page. Please consider donating to the endowment. Thank you for your help and engagement.
