Did A Bot Write This?
Three or four months into the pandemic, depending on how one counts such things, the OpenAI corporation released their GPT-3 language model. GPT-3 is an automated system that, in response to prompts and questions, generates texts that are difficult to distinguish from those written by a human being. It consists of a machine learning model with 175 billion parameters built on a vast corpus of data, including petabytes of information stored by Common Crawl, a non-profit that provides a free archive of the content of the public internet.
Alan Turing had originally conceived of a text-based imitation game as a way of thinking about our criteria for assigning intelligence to candidate machines. If something can pass what we now call the Turing Test, that is, if it can consistently and sustainably convince us that we are texting with an intelligent being, then we have no good reason to deny that it counts as intelligent.
It shouldn’t matter that it doesn’t have a body like ours or wasn’t born of a human mother. If it passes, it is entitled to the kind of value and moral consideration we would assign to any other intelligent being. Turing’s test was intended to remove irrelevant conditions on our judgments regarding the physical features, material composition, etc., of the interlocutors.
Large language models (LLMs), like GPT-3, are likely to be a central part of projects to build artificial general intelligence systems for reasons Turing had foreseen. While many philosophers were correctly impressed by the power of GPT-3 in the summer of 2020, they focused on its consequences for traditional philosophical questions about intelligence, cognition, and the like. For me, GPT-3 represented a hack that potentially undermined the kind of writing-intensive course that had served as the backbone of my teaching for two decades. I was less worried about whether GPT-3 is genuinely intelligent and more concerned about whether the development of these tools would make us less intelligent.
GPT-3 is impressive and has impressed the media. While it’s difficult to know how much clever public relations efforts shape contemporary media coverage of a new technology, there is something important about these systems independent of the usual Californian hype.
The effects of LLMs of this kind are probably significant, with implications in a range of contexts from obvious commercial applications to the less obvious effects on our psychological well-being, relationships, political discourse, social inequality, child development, care for the elderly, and education. We are becoming increasingly sensitive to the ways that technology changes society.
The philosopher Bruno Latour argued that technology is “society made robust.” But rather than being simply the projection of culture onto the physical world, technology has reshaped culture, society, and politics. Whereas mobile telephony had unexpected effects on love, friendships, and politics, LLMs will change the traditional relationship between writing and thinking. The initial effects will be obvious to teachers as we head into the coming school year. AI is looming over the education system, and while LLMs have received relatively little attention, classroom teachers will soon see the early stages of what promises to be a transformation in our relationship to writing.
In teaching, modern philosophers take written texts as the basis for what to do in lectures and discussions in the classroom. In addition, most contemporary philosophers aim to help their students to learn the craft of producing thoughtful and rationally persuasive essays. In some sense, the ultimate goal of the creators of LLMs is to imitate someone who has mastered this craft. Like most of my colleagues who teach in the humanities, philosophers are generally convinced that writing and thinking are connected. In some sense, the creators of GPT-3 share that view.
However, the first difference is that most teachers would not regard the student in their classroom as a large weighted network whose nodes are pieces of text. Instead, the student is regarded as using the text as a vehicle for articulating and testing their ideas. Students are not being trained to produce passable text as an output. Instead, the process of writing is intended to be an aid to thinking. We teach students to write because unaided thinking is limited, and writing is a way of educating a student in the inner conversation that is the heart of thoughtful reflection.
As a practical matter, it is difficult to maintain lengthy and complex chains of reasoning without the external scaffolding of a stable text. The development of writing has permitted philosophy, law, mathematics, and other complex argumentative endeavors to develop beyond the oral traditions of the past. Unaided oral traditions can contain valuable insights, but they allow only a limited level of articulation and complication in the development of arguments.
It is for this reason that we teach students how to research the content of their essays, how to develop and construct an argument, how to write an outline, then a draft, then the progressive stages of editing, how to cull the extraneous and keep the essential. At each stage, the student must make decisions.
These judgments happen as part of the student’s thinking about what they have written. The process of judgment places us in relation to the draft on the screen or on paper. In a way, the student’s essay becomes an interlocutor, a conversation partner the student builds as they go. The text, like the student’s own thinking, should be an unsettled work in progress. By contrast, an LLM takes the vast corpus of work already done and produces a finished product that looks close enough to what we would expect from educated human writers. The LLM predicts what a likely response to the question or prompt would look like, given the corpus it has at its disposal.
Teachers aim to cultivate the ability to write well in part because this signals their ability to think well. Put simply, learning to write has helped generations of students to improve their ability to think. Essays, research papers, and the grades associated with them are a way (albeit an imperfect way) that potential employers can know something about the quality of student thinking.
The grade that students receive in humanities courses depends, in large part, on the quality of their written work. And this is where AI begins to change the game. For example, Jasper AI is marketing its GPT-3-based software heavily this summer (other free services are also increasingly accessible). While Jasper AI is not explicitly mentioning students in its marketing materials, it is difficult to believe that academic writing is not one of their target markets.
By October, if we continue with business as usual in writing-intensive courses, I expect that faculty across North America will see student “work” that has been generated by LLMs. While some colleagues have already reported what they suspect are AI-generated essays, it will be pointless (for a range of reasons) to attempt to police the use of LLMs. Since the release of GPT-3, the marketing push and the friendly, accessible interface of services like Jasper mean they will influence our teaching noticeably and very soon.
Testing the AI
I tried giving some of these systems standard topics that one might assign in an introductory ethics course, and the results were similar to the kind of work I would expect from first-year college students. The essays were written in grammatical English, with (mostly) appropriate word choice and some convincing development of arguments. Generally, the system accurately represented the philosophical positions under consideration.
What it said about Kant’s categorical imperative or Mill’s utilitarianism, for example, was accurate. And a discussion of weaknesses in Rawlsian liberalism generated by GPT-3 was stunningly good. Running a small sample of the outputs through plagiarism detection software produced no red flags for me. After a little editing, the copy that GPT-3 produced would receive at least a B+ in one of our large introductory ethics lecture courses.
I thought perhaps I could foil the AI with some quirky, idiosyncratic assignments. It turns out that the system was also pretty good, certainly as good as a mediocre undergraduate student, at generating passable paragraphs that could be strung together to produce the kinds of essays that might ordinarily get a C+ or a B-. Simply being creative with one’s assignments or avoiding standard topics will not prevent students from using LLMs.
Students have long been tempted by services that write essays for them, and plagiarism is a constant and annoying feature of undergraduate teaching, but this is different. The LLM marks the end for standard writing-intensive college courses. The use of an LLM has the potential to disconnect students from the traditional process of writing and research in ways that will inevitably reshape their thinking. At the very least, these tools will require us to reconsider the mechanics of writing-intensive courses. How should we proceed? Should we concentrate on handwritten in-class assignments? Should we design more sophisticated writing projects? Multiple drafts?
A new tool for the classroom?
Perhaps we should teach them to work with these tools? One can imagine that Silicon Valley boosters might argue that the change will be salutary. Perhaps these tools will realign our attitudes to writing in the same way that calculators changed the way we think about mathematics. The ability to simply compute arithmetical operations is not a highly valued skill. Perhaps the ability to compose a coherent paragraph from grammatical sentences will also fade into the background as our concerns with writing aspire to more elevated and rarer skills. Perhaps there will be new ways to cultivate our students’ capacity to engage in the kinds of sophisticated reasoning that writing has made possible. I see no reason to be optimistic.
Conceivably, LLMs will be a solution for certain kinds of tedious and unrewarding tasks and will benefit certain kinds of students. One can imagine AI systems helping a gifted biology student who struggles to organize results into a passable research paper or a non-native speaker of English whose scientific research simply needs to be written up into a coherent article. For writing that serves an expository function – what we might call functional writing – the LLM may be a valuable tool.
The effects of technology on education have been mixed. The internet is a paradise for autodidacts but an abyss for everyone else. The promised convenience and economies of scale that were supposed to come with online education have failed to deliver high-quality education for most students but have been an undeniably valuable resource for highly motivated and intelligent students around the world. Access is great, but if access to information were all that were required, a library card would be as good as a college education.
Online learning and its malcontents
By June 2020, it was becoming clear to teachers at various levels that the pandemic disruption would continue and that many of our students would face serious setbacks as they struggled to complete their coursework without being physically present in the classroom. The costs of the decision to switch to online teaching during the pandemic are now clear. As usual, those who were disproportionately affected are those who were already disadvantaged in various ways beforehand.
In practice, technology is not a panacea in education. Contrary to the rhetoric of boosters, we now know that online education fails to adequately replace the traditional craft of teaching in embodied and meaningful social contexts. By April 2020, Eric Winsberg and I had argued publicly that the decision to close schools for more than eight weeks was not supported by epidemiological evidence, and was traditionally thought to be a very costly response to epidemics. One of the reasons that we and others were ignored was the assumption that online education could fill the gap. We now know that this did not happen.
The consensus response to the pandemic separated teachers and students in ways that made the mediating role of technology obvious. Zoom forced new modes of interaction and new habits on us in ways that changed the “classroom” for teachers and students alike. Zoom teaching is difficult to do well and most of us stumbled. In March and April of 2020, the disruptive shift left many formerly talented teachers adrift, and as we now see, it doomed large numbers of students to failure.
As we taught through the pandemic, it became clear that technology was reshaping education in ways that would be difficult to undo. Most students and faculty were miserable in Zoomworld, but some version of it will likely hang over many of us for the foreseeable future. Technological mediation is not new, of course, but the Zoom classroom is its most drastic form.
Like Zoom, the appearance of LLMs in education will serve as a disruptive technological mediation.
Zoom broke the traditional classroom in ways that we are still unpacking. LLMs will change the way we think about the relationship between writing and thinking in the classroom. In their white paper, the creators of GPT-3 flagged what they saw as some of the harmful consequences of its release. They concentrate on the widely discussed worries about bias and fairness in AI with respect to gender and race. However, they briefly indicate the potential misuse of GPT-3 in “misinformation, spam, phishing, abuse of legal and governmental processes, fraudulent academic essay writing and social engineering pre-texting.”
The authors explain why they were relatively unconcerned about the use of their system by malicious state-sponsored and other groups, but they left the problem of fraudulent academic essay writing undiscussed and did not appreciate its significance for the traditional project of education. Now that the LLM has arrived, it’s necessary for faculty to change the way we evaluate student written work in our courses and, more importantly, to rethink the role of writing in education.
My first independent teaching assignment was a course in environmental ethics that I taught during a sweaty summer in 1999. In some ways, the advent of LLMs puts me right back into the position of being a novice teacher. The first hurdle involves figuring out what is expected of students in a philosophy course.
How should they study for exams? How would they be evaluated? And how should they write a philosophy paper? In 1999 they heard me say that a philosophy paper is not like a book report; it’s also not an expression of one’s opinions or experiences. So, if it’s not about reporting on what others have said about the way the world is, and if it’s not about how one feels about the way the world is, what is it?
“There are facts, and there are opinions,” one student confidently insisted, “there’s nothing else to write about.” I saw my task as a novice teacher to be the cultivation of students’ ability to produce a rationally persuasive defense of a claim. This is one of the virtues of analytic philosophy. At our best, we analytic philosophers help students learn how to generate and evaluate arguments, become sharper reasoners, and, in time and with care, better thinkers all around.
In the age of the LLM, we will not be able to rely on written exercises to make the work of thinking happen. We will also find that writing skills that previously served as reliable signals of the virtues we associate with thinking can no longer do so. Of course, it’s those intellectual virtues, not the hundreds of thousands of student papers, that we really care about.
John Symons is a professor of Philosophy at the University of Kansas.