Blogbook -- The White Supremacy of Grades in the Literacy Classroom

Entry 27

If you’ve been listening carefully to this blogbook, you may have figured out that I understand the practice of grading literacy performances by a single standard as racist and White supremacist. In short, grading in the literacy and language classroom participates in White language supremacy if we use the dominant singular standards for language that the White supremacist system has given us. Unfortunately, those are usually all we have as teachers. That condition is on purpose. 

Now, I’m not saying that your own habits of language that you’ve acquired as a teacher, and the expectations they provide you as a reader, are inherently racist, or that you can’t use your habits to read and respond to student writing. I’m saying that how we typically use our habits of language in our classrooms, as singular standards for grading, is racist. 

Please understand that what I’ve just said is NOT calling any teacher racist or White supremacist because they grade students’ writing. I don’t believe there are many true racists left in the world. And yet, we all are already in histories and systems built on racism, so to participate in racist practices is not BEING a racist. It’s participating in them. We all do this to some degree. We don't get much choice in the matter. But I'm not trying to be fatalistic about this condition either. 

Most literacy teachers from my experience want to help their students, all of them. They want social justice in their classrooms, schools, communities, and country. But if it ain’t clear, let me be clear here: White language supremacy does not require White supremacists to exist. That’s why we say it is structural, institutional, systemic. Pick your term. They all mean that the conditions we live and work and teach and learn in are racist because that’s how our conditions have been made. And your or my good antiracist intentions alone will not change those systems. 

Beyond the kinds of English language standards that we always use for grading in schools, the history of grading itself gives us a clue as to why it is tightly linked to race and White language supremacy. Grades were invented, not so ironically, around the same time that Lothrop Stoddard was writing his eugenic stuff. It was the last decades of the 19th century, and the problem colleges in the U.S. were having was how to handle the increasing number of students entering. These were previously ungraded places where small groups of homogenous young, White, privileged men circulated. People in these places, like Harvard, were not given grades before this time, even though they took examinations and classes. The assumption was not to measure them but to educate them, until the groups got bigger and relatively more diverse (note 159). 

At that time, the diversity came in the form of social class diversity. So assessment and grading were ways to control who could enter, and exclude those who were, in the parlance of the time, uneducable because of the places that made them. The logic of grades and assessment was simple: Keep people from that place out of this place. And of course, the dike metaphor circulates underneath all of this (see post 26). 

Grading Practices Spring from White Supremacy
Grading practices began in colleges like Yale and Harvard out of an institutional need to process larger numbers of students. Grading was mostly an administrative solution to an administrative problem, not a pedagogical solution to a pedagogical problem. But after the compulsory attendance laws in the U.S. in the mid-1800s, public schools (now compulsory) began filling up and getting bigger. Public schools followed the lead of colleges. 

Brave Work

Write for 15 minutes. 

Imagine your classroom without grades on any work. You might still provide a midpoint and final grade, perhaps even periodic progress reports or check-ins with students and parents, but no grades on individual assignments or points on anything. 

What do you think the benefits in learning for students could be in such an environment? What problems would go away? What evidence of learning might you have at your disposal to show students, parents, and perhaps a principal? What resistance would you likely encounter and what would people be most concerned about? Why do they care about those things?

By the end of the 19th century, grades were being used because they were faster and offered a more efficient system of evaluation, given the extreme growth in the number of students attending schools. And like other systems and structures that were invented with and in racist discourse, grading as a system maintained its racist biases too, even though its main purpose was not to keep some students out and others in. Its expressed function was to manage larger groups of students, to keep track of and rank them, to move them through compulsory systems of education, and to make decisions about who was eligible for higher education. The consequences of managing in this way, however, were to keep the rewards and opportunities of society in particular places and with particular people, namely White, middle- and upper-class places and people. Grades accomplished this through the use of singular White, middle- to upper-class standards of English (for a deeper treatment of grading's history, see my blogpost/podcast, "Where Does Grading Come From?"). 

This period of history also happened to be one of growing interest in what would later be called psychological measurement. This is the same period that produced the IQ test, the Army Alpha and Beta tests, the psychological classifications of “idiot,” “imbecile,” and “moron,” the SAT, and college entrance exams. It also spawned many statistical studies of grading, most of them concerned with understanding and creating reliable systems of marking student papers in schools. Most notable of these were Daniel Starch and Edward C. Elliott’s “Reliability of the Grading of High-School Work in English” (1912), and the numerous articles on the Hillegas-Thorndike scales for normalizing the judging of student writing (note 160). 

Of concern was the wide variance in grades given to the same papers in these studies. Starch and Elliott open their article, saying, 

the marks or grades attached to a pupil's work are the tangible measure of the result of his attainments, and constitute the chief basis for the determination of essential administrative problems of the school, such as transfer, promotion, retardation, elimination, and admission to higher institutions; to say nothing of the problem of the influence of these marks or grades upon the moral attitude of the pupil toward the school, education, and even life. (note 161)

Clearly those who were creating and refining reliable grading systems, like Starch and Elliott, saw them not just as administrative solutions but as aids to the learning and psychological development of children. Grades helped society, so went the logic. Thorndike, too, saw reliable grading scales for teachers as more than just tools to help schools and teachers do their jobs. 

Reliable grading scales, that is, sets of examples of writing at different levels of competency, could help schools engineer society through consistent judgements of language. He says about his own scale, 

Such a scale would obviously be of great value to teachers, civil-service examiners, college-entrance boards, scientific students of education and any others who need to measure the merit of specimens of English writing in order to estimate the abilities of individuals or changes in such abilities as the result of mental maturity, educational effort and other causes. (note 162)

Reliable grading of language was important to these early teachers, researchers, and educational scholars because they understood grading as an important way to sift through the individuals in society and find the most capable. And of course, what this meant was to find those who were most like them, the elite White men making the grading scales and systems. 

This concern for reliable grading mechanisms in schools and colleges continued through the latter decades of the twentieth century. This focus meant assuming or creating a singular standard that could be used consistently in judging language. Not only did this limit what “good” language and ideas were, but the emphasis on reliability also tended to ignore questions of validity, that is, whether a test actually measures what it claims to measure. In focusing on measuring something as difficult and varied as language consistently, the experts sacrificed the very thing they thought they were measuring. 

Consistency in Grading = White Homogeny or White Hegemony
If your goal is to measure something like “good writing” in consistent ways, the first thing to give way in that assessment system will be diverse and varied ways of understanding what “good writing” can look or sound like, because diversity leads to inconsistency. You have to get rid of variance, or diversity, in both what constitutes good writing and the ways people judge any instance of languaging. 

Variance in judging, in fact, is what statisticians use to calculate reliability coefficients, which express the strength and direction of a test’s consistency, that is, how reliably it produces the same results or decisions for similar students or, in the case of written exams, similar instances of language (note 163). And of course, to be reliable in grading language, you have to favor someone’s languaging -- that is, you gotta use some group’s habits of language as the standard. There can be only one scale, or a limited number of examples in the scale. 

This can best be seen in assessments of language that start with “norming” their readers to some standard set of examples. Norming is designed to help a group of readers of a test read in the same way, so that consistency in judgements can be achieved across all those readers and many samples of writing. If you’ve ever been a part of an AP reading administration, you’ll know what I’m talking about. Norming, therefore, is a process by which you get an inherently diverse set of readers to read in a singular and particular way. 

Norming is a way to calm the linguistic seas when any number of people judge the same language. Norming keeps grades and judgements from storming. Norming is getting a bunch of people to use the same standard through the same set of biases. And of course, this is not completely possible. That’s why such tests need reliability coefficients. Norming has been our response to the storming of our linguistic seas (picture found at Deviant Art). 

The problem with norming is that someone has to come up with the language norm, that is, the standard. The nature of what is good and appropriate language has to be assumed to be universal in such grading systems. Not only that, but you have to get everyone to apply those standards in the same ways. Standards or ideals in language are not the same as people’s judgements of language that use those standards. In blunt terms, language standards are someone’s or some group’s ideals. Judgements of language are an individual’s application of those ideals to instances of language. The second is a translation of the first. Judgements and decisions are one or two steps removed from the standard. 

Judgements also come from people, and people, even racially, culturally, and linguistically homogeneous people, do not judge in exactly the same ways, even if most members of a homogeneous group see things similarly. In such grading systems, White middle- and upper-class students get more advantages because they already come from places where people language in ways that match the standard and the biases teachers use to administer that standard, that is, to grade language. And those who do not come from those White places do not get those two kinds of advantages. They are disadvantaged to some degree, and often judged as less capable. 

Grading, then, turns out to be a White supremacist practice in public schools because of its focus on reliable grading methods and standards. And as discussed earlier in this blogbook, this occurrence is overdetermined. Racist discourse in society pretty much guarantees that grading will reproduce White language supremacy. Historically, those involved in these efforts were deeply invested in White habits of language that they all shared. Elite, White men controlled it all, and they didn’t question their own habits of language or biases. They focused on making consistent judgements, or building reliable grading systems from their languaging and language biases. 

This amounted to schools and teachers arguing about how to be consistent in grading student writing, and assuming as a standard the elite White habits of English language they were using. It was a way to arrange the terms of engagement on one front of the language race war: school (see post 26). These terms were mostly taken for granted in the battle over the next century in the U.S. Most ignored the fact that people use language differently and effectively, that one can be highly communicative in a wide range of Englishes (English dialects), or habits of English language. 

Lippi-Green's "Linguistic Facts of Life"
Linguists have discussed this fact for years (note 164). One kind of English, one group’s English, cannot have a monopoly on communicative effectiveness, even as one group may establish their English as preferred or “proper” from a position of power in a society or system (recall Gramsci’s hegemony, see post 17). The linguist Rosina Lippi-Green (see picture) offers what she calls the “linguistic facts of life,” five facts about language that linguists have agreed upon for some time now. They are: 
  • all spoken language changes over time
  • all spoken languages are equal in terms of linguistic potential
  • grammaticality and communicative effectiveness are distinct and independent issues
  • written language and spoken language are historically, structurally and functionally fundamentally different creatures 
  • variation is intrinsic to all spoken language at every level, and much of that variation serves an emblematic purpose (note 165)
She is mostly speaking of spoken language, but written language is related to spoken language, even though, as the fourth item above states, the two are fundamentally different. Much of her textbook on this topic explains and illustrates the relationship between spoken and written language. 

Lippi-Green further explains that most people typically work from the “standard language myth,” which assumes that there is a single standard of best or appropriate languaging and that this standard comes from an elite, White, monolingual English-speaking group of people. This assumption leads to the mistaken belief in “standard language ideology” (SLI). SLI says that an “idealized nation-state has one perfect, homogenous language,” which of course is not true in reality (note 166). Thus we get English-only laws, and policies about particular English standards and learning outcomes in schools. 

Lippi-Green draws on James and Leslie Milroy’s work that originally defined SLI. Lippi-Green gives this definition of SLI: “a bias toward an abstracted, idealized, homogenous spoken language which is imposed and maintained by dominant bloc institutions and which names as its model the written language, but which is drawn from the spoken language of the upper middle class” (note 167). And that group of people has been, and continues to be, mostly White in the U.S.

What has made these “linguistic facts of life” difficult to recognize in schools is that reliability, or consistency, in grading often has been synonymous with fairness. But you can be consistently unfair and not realize it -- that is, have a system that consistently produces bad decisions because it is designed that way. And in fact, that is what we have now. Now, if we let go of the idea that a system or situation is inherently fair or unfair, then fairness can be a social construction. We build or create fairness through negotiation and agreement in a community or classroom. It’s socially constructed. 

Brave Work

Write 10 minutes. 

Take some time today or tomorrow to notice the various ways that English is used differently around you, perhaps noticing all the ways “non-standard” or non-dominant English is used. Pause and make a note of as many as you can. Try to write down the non-dominant phrases or sentences you hear or see in your daily life. 

At the end of the day, look at all the non-dominant English you’ve collected. In each case, did you understand the message? Did it seem appropriate for its intended purpose and context? Were you confused by the language? Was your first impulse to correct the language? What racialized aspects of the languaging helped make meaning for you, or others?

So, fairness in judging language and learning has to do with (1) who participates in judgements and the making of standards, and (2) what kinds of outcomes occur in that system. Those elite White men who made the grading systems we inherit today assumed that anyone who didn’t language like them was obviously not smart, not a clear communicator, or didn’t have what it takes to merit society’s best jobs, schools, and positions. No other groups of English language users got a seat at the table to determine things. That condition is unfair and narrow-minded. It’s also White language supremacy. 

Smitherman's "Linguistic Universals"
A prominent Black linguistics scholar whom I’ll say more about later, Geneva Smitherman (see picture), offers a list similar to Lippi-Green’s in a discussion of the politics of Black English in America in the 1970s, and her discussion remains relevant today. It also adds important ideas about language that help explain the flaws in singular language standards and the grades that are used with them. Smitherman calls her list “linguistic universals,” and discusses each in detail. 

The first universal is that all languages have dialects and everyone speaks a dialect of some language. All dialects of a language, such as English, share a “deep structure” that produces meaning, even if they do not all share the same “surface structures.” She gives the examples of “He do know it” and “He does know it.” Both statements are understandable. They share a deep structure that makes meaning for an audience who speaks and writes English (note 168), but they don’t share the same surface features. 

The second universal Smitherman lists is that “every language is systematic and represents rule-governed behavior on the part of speakers.” Children learn these rules intuitively before they even get to school. She explains: “small children quickly learn to go from ‘sentences’ like ‘up’ to ‘Pick me up.’ They learn principles of English word order like ‘The book is here,’ not ‘Is here book the’” (note 169). In short, our deep structures are learned intuitively when we are young. 

The third universal is that “[a]ll native speakers of a language have an underlying competence in the forms of their language and thus can produce sentences they’ve never said before as well as understand those they’ve never heard before” (note 170). Because we know the deep structure of English, we can understand and produce an infinite variety of sentences. 

The fourth and last universal Smitherman offers is that “a language reflects a people’s culture and their world view, and thus each group’s language is suited to the needs and habits of its users” (note 171). In short, the way we language springs from our material conditions, which include things like our cultural heritages and the people around us. We language the way we do because of the habits we embody and the things we do in the places we are from with the people there. 

Earlier in the same chapter, Smitherman draws on Frantz Fanon (see picture), the influential West Indian psychiatrist and social philosopher who led important discussions and theorizing of anti-colonial liberation struggles. Smitherman says, “[i]n the history of man's inhumanity to man, it is clearly understandable why the conqueror forces his victim to learn his language, for as black psychiatrist Franz Fanon said, ‘every dialect is a way of thinking.’” Furthermore, she quotes Fanon: “to speak means . . . to assume a culture . . . The Negro of the Antilles will be proportionately whiter . . . in direct ratio to his mastery of the French language” (note 172). Language standards and grading by them in classrooms participate in the historical colonizing of BIPOC bodies globally. 

Panning back a bit, it isn’t hard to see that all of these psychological assessments, including grading in schools, work from fundamental assumptions that dictate that all people have uniform cognitive dimensions and those dimensions fall along inherent hierarchies, bell curves, that can be measured with some precision, then used to engineer society toward some end (for more on bell curves and grading, see post/podcast, "Why Does Conventional Grading Feel So Unfair"). In short, large-scale testing and grading began as and continues to promote eugenic purposes, which in this instance is another way to say colonial purposes. Grading by a single standard participates in White language supremacy. 

Grades Often Function As False Reifications
What I’m getting at is that all tests, even classroom grades to some degree, fabricate a human psychological dimension, like “IQ,” or a unified aptitude for college, or “clear and effective prose,” which do not exist before the test or grading scale is made. Stephen J. Gould called this phenomenon the fallacy of “reification,” meaning a tendency for scientists and test makers to falsely “convert abstract concepts into entities” (note 173). We create stuff, then act like that stuff is real, and many teachers in Pygmalion fashion fall in love with their own language standards in their classrooms, forgetting that their standards are made up. And unlike the myth from Ovid, there is no Aphrodite here to bless our creations and make them real for us (see picture: "Pygmalion and Galatea," 1797, by Louis Gauffier). 

In history and schools, there have been devastating consequences to the fallacy of reification. We wanted to measure intelligence so we made up the construct or idea of “intelligence” as a unified and definable thing, then acted as if that thing was real in our world outside of the test, then used that test to determine jobs and other opportunities for people in society. But we just made up “IQ” for a particular use, like finding all the “idiots” and “morons” in a society in order to sterilize or stigmatize them, which would keep the preferred smart ones from having babies with them. But there are no idiots or morons without the tests and labels, without someone's imposed standard (note 174). 

Or today, we make entrance exams for colleges so that we let in only the “best” students, which ends up keeping out poorer students and BIPOC students. Or we create a rubric for our writing assignment in a class. The “A” paper is this. The “B” paper does that, etc. And what constitutes such categories or grades is made by the test or grading rubric, all managed through our (the teacher’s) biases, biases and preferences acquired in White supremacist educational systems, racist inheritances. And we forget that such distinctions in language do not exist inherently outside of that test or rubric or classroom. They are just what we see and reify in our examples, scales, and grading. 

But after we’ve created such rubrics and examples for our students, after we’ve installed them in our classrooms, they circulate and take on a life of their own in our and our students’ minds. Teachers, students, and parents act like such constructs are real. We act as if an “A” paper is a real, identifiable, and unified thing. And that objectively, it is different from a “B” or “C” paper. They are not. And this is what all those early reliability studies couldn’t fathom. How can a paper be judged as an A, a C, and a D at the same time? Simple: Grades are human judgements. And judgements are more about the judge than what that judge is judging.

Resistances To The Hegemony of Grading and Testing Testing
It’s worth noting that by 1960, there were cracks in the grading edifice in schools and colleges. A few were looking for alternatives to grading students. For instance, in that year, Max S. Marshall, a microbiology professor at the University of California, offered a gradeless system he and his faculty colleagues were using for pre-medical courses. In 1968, he published a book on the practice called, Teaching Without Grades (note 175). In 1964, the Public Education Association in New York state proposed not reporting grades to students in high schools because grades had a negative effect on students’ motivations to take challenging classes in subsequent semesters (note 176). Their proposal was simply to provide reporting that indicated “merely that a student has satisfactorily completed a specific course on a specific level of difficulty.” And by 1993, Alfie Kohn offered his compelling argument against grades in Punished by Rewards (note 177).

What teachers like Marshall and Kohn realized is something that Gould makes clear. Test scores and definitions of grade-quality in writing are made-up constructs. They necessarily function from the politics and biases of the systems and people who invent them. So if you want to be an antiracist teacher, the question is not just who you are and where your habits of language come from, but what your politics as a teacher are and how your politics of language are antiracist. How do they counter White language supremacy in your grading practices? 

F. Allan Hanson (see picture), an emeritus professor of Anthropology at the University of Kansas, argues from an historical perspective that all tests have these reification problems. But they do other things in societies too, things that easily turn White supremacist in schools. His central argument in his book, Testing Testing, offers two theses. The first will sound familiar. He argues that “tests do not simply report on preexisting facts but, more important, they actually produce or fabricate the traits and capacities that they supposedly measure.” In a nutshell, this means that our tests and grades actually create the things we say they measure. 

For instance, a paper that gets an “F” grade because the teacher judges it as lacking something significant, maybe evidence for its arguments or deeper engagement with key texts, is an “F” paper only because the grading system makes it one. That “F” paper is created by a system that defines and looks for something like deeper engagement with texts in the specific way a particular teacher expects it -- that is, through particular textual markers that come from that teacher’s habits of language. 

Change any element in this assessment ecology, say the judge or the rubric, and you very well may change what the paper seems to measure or demonstrate. This finding has been demonstrated time and again in studies of the reliability of readers (note 178). That’s why Starch and Elliott and Thorndike were trying to find reliable grading scales and methods. People always judge language differently. That’s the nature of languaging. And when scales, rubrics, and teachers only use White habits of language, you may get more consistency, but you’ll also get White language supremacy. You'll silence the beautiful voices of BIPOC and other marginalized students. And this condition in classrooms also keeps the tools of criticality away from students and teachers. 

Hanson’s second thesis is related: “tests act as techniques for surveillance and control of the individual in a disciplinary technology of power” (note 179). As I mentioned in the last blogpost (post 26), schools, assessments, and tests generally do three things: control bodies, enforce accountability, and measure people in order to control where resources go. Tests and grading exercise power in schools. They engineer places and societies, so they easily turn eugenic. And they do this through the use of the White racial constructs that are central to their definitions of merit and competency. In literacy classrooms, these White constructs are standards. 

Brave Work

Write for 15 minutes. 

Hopefully, you have access to your past grade books and grading records. Look at a few years of grades in your classrooms (or as many years and courses as you can). Without making too many assumptions about your past students, do some tabulations by racial group and grade group. 

How many As did you give and who racially got them? What about the Bs? What about all the students who did poorly (Ds and Fs)? Who were they racially? 

In each racial group, what are the percentages of poor grades? Do more Black and Latine students do poorly in your class, relatively speaking, than their White peers? 

How might your past grades be eugenic in outcome? 

Given the above, it shouldn’t be a surprise that the contemporary term “eugenics,” which today is closely associated with White supremacist groups, was coined in the latter part of the 19th century by Francis Galton, a British mathematician and statistician who promoted the idea after misreading his half-cousin Charles Darwin’s famous work, On the Origin of Species. Galton, the creator of the statistical concepts of correlation and regression toward the mean, a pioneer of meteorology, and the maker of the first weather map for Europe, was influential in creating what we know today as the institutional practice of grades, and much of his thinking and work was motivated by -- premised on -- White supremacy through his promotion of eugenics. What better way to engineer society than through literacy practices in schools, through languaging that follows groups of people already segregated by race? (Read more about Galton in my blogpost/podcast, "Why Does Conventional Grading Feel So Unfair?") 

Galton and the statisticians after him, such as Karl Pearson, another White supremacist, were part of a larger scientific movement that attempted to measure people on a single scale, often around reifications, made-up concepts that were then measured hierarchically through tests like IQ tests, the SAT, and AP exams (by the mid-twentieth century), all of which found that White people -- the judges and measurers -- were on top, were superior. Stephen Jay Gould, Norbert Elliot, Mary Lovett Smallwood, and Angela Saini offer various accounts of this historical scientific movement that fed or led to the grading practices we know today (note 180). 

Our fast thinking about the usefulness of standards and grading for learning in all literacy classrooms, because of its history, is structured by White supremacist and eugenic thinking, which has, over the decades, become naturalized, so much so that it don’t even feel like race thinking anymore. It’s just thinking, just an objective standard, just a grading practice that helps us all in our places. But what exactly is grading helping us do? 

For me, the answer to that question is simple: The practice maintains White language supremacy. 


This blog is offered for free in order to engage language and literacy teachers of all levels in antiracist work and dialogue. The hope is that it will help raise enough money to do more substantial and ongoing antiracist work by funding the Asao and Kelly Inoue Antiracist Teaching Endowment, housed at Oregon State University. Read more about the endowment on my endowment page. Please consider donating to the endowment. Thank you for your help and engagement.