Can AI Produce Great Art?
What art criticism and aesthetic beauty teach us about the artificial artistic age.
It’s a devilish question that raises various related—and equally existential—questions about art’s purpose and identity. After all, if we can’t even agree on what art is, how on earth are we supposed to judge if the art made by AI is any good?
For those who haven’t heard the apocalyptic warnings about the algorithmic assault on human creativity, or read about the lawsuits and scandals surrounding them, the new generative AI models are sophisticated tools for turning words into visual images. Written instructions (called “prompts”) are fed through a multi-layered architecture of neural networks, ground into a kind of statistical powder, and matched via probabilistic correspondences to shades in individual pixels. The resulting image is the neural network’s interpretation of the words in the prompt, informed by its familiarity with hundreds of millions of other JPEG files.
The process by which these models generate images has an ancient analog: the Greek concept of ekphrasis. Meaning literally “to speak out,” ekphrasis refers to the rhetorical technique of describing works of art with vivid language. The most famous instance of ekphrasis is Homer’s depiction of Achilles’s shield in the Iliad. When Homer describes the many emblems on the shield, which include “the inexhaustible blazing sun” and “noble cities filled with mortal men,” he is prompting the mushy gray nest of neural networks inside your brain to produce a detailed image of a “gorgeous and immortal work.”
What generative AI models make possible is the automation of ekphrasis. Rather than relying on humans to conjure an image in their mind’s eye based on a written description, these models do the work for us, turning written language into something we can actually look at.
Aesthetic virtue
Many famous works of art have resulted from artisanal versions of such a process. Bernini turned Ovid’s description of Apollo chasing Daphne into a marble masterpiece that unfolds in four dimensions as you circle around it. Even the text of Dante’s Paradiso, filled with heavenly experiences that the poet himself claimed that his eyes “could not sustain,” has inspired artists from William Blake to Gustave Doré.
One key difference between the above examples and the outputs of generative AI models is the nature of their respective prompts. For Bernini, Blake, and Doré, the words of Ovid and Dante were the prompts that fired their imaginations. For models like Midjourney or Stable Diffusion, the best prompts take a distinctly different form than classic literature.
Currently, summoning the finest images from these models means writing prompts with an esoteric syntax and vocabulary. If you want a minimalist image of a cat, including the word “minimalist” in the prompt will help, but repeating it five times in a row like a Buddhist mantra will help even more. Indicating that you prefer “4k resolution” or “volumetric lighting” can work wonders. So can specifying which rendering software style you would like the image to replicate. The end result can resemble an alchemist’s recipe of oddities that makes little sense to humans but perfect sense to an AI model:
RAW SF VFX 3d hyperrealistic 32K cosmic crashed cars sculpture gestalt Trisha Paytas composition 🌌🚀🌉🛣️highway spatial hair CHRIS lABROOY sculpture made of HAIR of 🛣️ Trisha Paytas CAR 🌌. Belin postneocubismo BY Andrew Thomas Huang. A goddess cyberpunk with a ram skull. beautiful intricately detailed Japanese crow kitsune mask and BIOTECH kimono:: OCCULTIST 🛣️ bubble CARs, epic royal background, big royal uncropped crown, royal jewelry, robotic, nature, full shot, symmetrical, Greg Rutkowski, Charlie Bowater, Beeple, Unreal 5, hyperrealistic, dynamic lighting, fantasy art
This is what that input creates:
Those who specialize in communicating with AI models in this exquisitely bizarre idiom are called prompt engineers, and a stream of job advertisements for such positions has begun to emerge in the wake of the generative AI craze.
Painting with words
Despite the excitement surrounding prompt engineering, however, some argue that this peculiar way of addressing AI models will be short-lived. A chorus of tech-focused Twitter accounts have declared that “prompt engineering will not be a thing.” OpenAI CEO Sam Altman appeared to concur when he recently announced: “I don’t think we’ll still be doing prompt engineering in five years.” Rather than a litany of seeming gibberish, these commentators argue that AI models will eventually become much more responsive to prompts written in conventional sentences. These strange machine dialects will wither away like other programming languages rendered obsolete by software advances.
Should such advancements come to fruition, the only constraint on the quality of the outputs generated by AI models will be the linguistic proficiency of the people using them. The heights of human eloquence, rather than any technical limitations, will determine the upper boundary of their potential performance.
This poses one small problem. For ages, humanity has rested on linguistic laurels, smugly satisfied with the ability—unique in all known creation—to weave together long recursive sentences made of distinct symbols. Now, technologies like ChatGPT have come along that evince the same ability, and the decline of our collective eloquence has suddenly become conspicuous.
Over the last several decades, the average American’s vocabulary has shriveled. This trend holds true for every level of educational attainment. Even though we spend on average two years longer in school than in 1974, we somehow manage to learn fewer words.
Already in 1999, David Orr coined the term “verbicide” to describe the unlettered speech of even his most gifted students. In his essay of the same name, he cites research suggesting that “the working vocabulary of the average 14-year-old has declined from some 25,000 words to 10,000 words” since the 1950s. This tendency has been going on too long to blame convenient bogeymen like smartphones or social media. But it does suggest a link with the proliferation of an earlier generation of household screens.
Not only have our vocabularies shrunk, but many fields of professional writing have come to be dominated by mind-numbingly dull prose conventions. One finds more flair on a tombstone than between the pages of peer-reviewed scientific journals. Ask scientists, and they will often admit, with some embarrassment, that these acronym-laced articles are frequently incomprehensible even to the specialists who make up their target audience. Meanwhile, at work, we are assailed by a grating corpspeak that memoirist Anna Wiener has memorably dubbed “garbage language.” This is the grandiose verbiage of company vision statements, full of buzzwords and pomposity but signifying nothing.
A picture’s worth a billion parameters
If prompt engineering is soon obsolesced by more capable AI models—less reliant on linguistic hacks and gimmicks—then the success of generative AI as a creative endeavor would seem to hinge on a linguistic toolkit rusted over with neglect. Currently, many of these models suffer from certain limitations. They often struggle to spell. They’re also not very good with hands and fingers. But these shortcomings will soon be fixed. And when they are, then it will be our unrelenting slide into inarticulateness that hobbles their potential.
To return to the question with which we began, the ability of these deep learning models to conjure beautiful works of art would seem destined to depend on our ability to compose beautiful lines of prose. It will be our capacity to carefully and evocatively describe the images we desire to see that will be tested, rather than the technology itself.
The nineteenth-century art critic John Ruskin’s advice to painters applies equally well to artists working with the generative AI models of the future. In his five-volume work Modern Painters, he insists that “every class of rock, earth, and cloud, must be known by the painter, with geologic and meteorologic accuracy.” As each combination of elements in a landscape offers “distinct pleasures” and “peculiar lessons,” the quality of an image is partly determined by the artist’s knowledge of nature’s specificities. Accordingly, creating a desired effect with AI models will necessitate fluency in the many words we use to describe the world.
Consider Ruskin’s own description of J.M.W. Turner’s The Slave Ship, first exhibited in 1840:
It is a sunset on the Atlantic after prolonged storm; but the storm is partially lulled, and the torn and streaming rain-clouds are moving in scarlet lines to lose themselves in the hollow of the night. The whole surface of sea included in the picture is divided into two ridges of enormous swell, not high, nor local, but a low, broad heaving of the whole ocean, like the lifting of its bosom by deep-drawn breath after the torture of the storm. Between these two ridges, the fire of the sunset falls along the trough of the sea, dyeing it with an awful but glorious light, the intense and lurid splendor which burns like gold and bathes like blood. Along this fiery path and valley, the tossing waves by which the swell of the sea is restlessly divided, lift themselves in dark, indefinite, fantastic forms, each casting a faint and ghastly shadow behind it along the illumined foam. They do not rise everywhere, but three or four together in wild groups, fitfully and furiously, as the under strength of the swell compels or permits them; leaving between them treacherous spaces of level and whirling water, now lighted with green and lamp-like fire, now flashing back the gold of the declining sun, now fearfully dyed from above with the indistinguishable images of the burning clouds, which fall upon them in flakes of crimson and scarlet, and give to the reckless waves the added motion of their own fiery flying.
I submit that a useful metric of a generative AI model’s performance may very well be its ability to capture the visual correlate of phrases like those used by Ruskin above.
The Ruskin test, if you will.
An artistic Turing test
The oft-cited Turing test evaluates AI systems based on how effectively they can simulate a human being in conversation. If a person cannot tell they are talking to an AI, it has passed the test. The Ruskin test offers us a similar method of assessment. In this test, a person must compare a description of a work of art written by Ruskin with an accompanying image.
If the person cannot tell whether that image is the output of a generative AI model using Ruskin’s description as a prompt or a photo of the painting Ruskin described, then the model has passed the test. Any model that passes will have demonstrated its capacity to transmute human eloquence into great art seamlessly. The moment a model passes the Ruskin test, the age of prompt engineering will be over, and a new era of eloquence will have begun.
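For readers who think in code, the procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the essay does not say how many judgments are needed or how to decide that a judge "cannot tell," so the `Trial` record, the function names, and the 0.05 threshold below are all hypothetical choices. The sketch counts how often a judge correctly identifies an image's origin and treats the model as passing when that hit rate cannot be distinguished from coin-flipping.

```python
from dataclasses import dataclass
from math import comb


@dataclass(frozen=True)
class Trial:
    """One judgment: a judge reads Ruskin's description and sees one image."""
    image_is_ai: bool     # ground truth: was the image generated from the description?
    judge_says_ai: bool   # the judge's guess about the image's origin


def count_correct(trials: list[Trial]) -> int:
    """Number of trials in which the judge guessed the origin correctly."""
    return sum(t.image_is_ai == t.judge_says_ai for t in trials)


def judge_accuracy(trials: list[Trial]) -> float:
    """Fraction of correct guesses; 0.5 is what pure guessing would give."""
    if not trials:
        raise ValueError("at least one trial is required")
    return count_correct(trials) / len(trials)


def p_value_at_least(correct: int, total: int) -> float:
    """Chance of getting `correct` or more right out of `total` by coin-flipping."""
    return sum(comb(total, k) for k in range(correct, total + 1)) / 2 ** total


def passes_ruskin_test(trials: list[Trial], alpha: float = 0.05) -> bool:
    """Assumed pass rule: the judge's hit rate is not significantly above chance."""
    if not trials:
        raise ValueError("at least one trial is required")
    return p_value_at_least(count_correct(trials), len(trials)) > alpha


# A judge who is right only half the time has, in effect, been fooled.
fooled_judge = [Trial(True, True), Trial(True, False)] * 5
print(judge_accuracy(fooled_judge), passes_ruskin_test(fooled_judge))
```

A judge who spots the machine-made image every time across ten trials would sink the model under this rule, while one who is right only about half the time would let it pass. A real evaluation would also need many judges and many paintings, which this sketch leaves out.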
An AI model that can skillfully render “reckless waves” or “green and lamp-like fire” will be one responsive to the most luxuriant expressions of human language. Such a model would reward eloquence, inspire it, and usher in a transformation in technical education. Courses in practical ekphrasis would supplement those in algorithms and data structures.
Disused words once coined to describe specific shades, landforms, or peculiarities of anatomy or architecture will need to be dusted off to better tailor future prompts. We shall need to reacquaint ourselves with the difference between vales and dales, between ash trees and elm trees, and between loggias and porticos. Repairing our vocabularies after decades of erosion will prove a key part of creating the best images these models can produce.
The advent of AI is only the latest in a series of historical traumas that have slowly undermined our sense of human exceptionalism. While Darwin showed that we share an indivisible genetic link with other animals and evolved by employing the same mechanism of natural selection, our notable language and art capacities still set us apart. Now a crowded field of neural networks has usurped these aspects of our unique identity.
What’s more, they have brought into stark relief the tragic fact that we failed to make good use of these talents. Responding to the challenges posed by these technologies will require us to undo a mechanization of thought that has flattened our forms of expression, making them frequently indistinguishable from algorithmic outputs. Doing so will require us to reconnect with our language, and with the world that inspired it, to seek out the exact places where these technologies fail and where a revitalization of human eloquence might help them succeed.
Jason Rhys Parry holds a PhD in comparative literature from Binghamton University and received a 2021 English PEN Translates Award. He tweets @JRhysParry.
This article is well researched, yet extremely subversive and socially progressive in its core implications.
"Responding to the challenges posed by these technologies will require us to undo a mechanization of thought that has flattened our forms of expression, making them frequently indistinguishable from algorithmic outputs."
The mechanisation of thought is a side effect of mechanical automation. Orwell has a great paragraph in chapter 12 of 'The Road to Wigan Pier', which I will quote below:
'The function of the machine is to save work. In a fully mechanized world all the dull drudgery will be done by machinery, leaving us free for more interesting pursuits. So expressed, this sounds splendid. But presently the question arises, what else are they to do? Supposedly they are set free from 'work' in order that they may do something which is not 'work'. But what is work and what is not work? [...] The labourer set free from digging may want to spend his leisure, or part of it, in playing the piano, while the professional pianist may be only too glad to get out and dig at the potato patch. Hence the antithesis between work, as something intolerably tedious, and not-work, as something desirable, is false.
The truth is that when a human being is not eating, drinking, sleeping, making love, talking, playing games, or merely lounging about--and these things will not fill up a lifetime--he needs work and usually looks for it, though he may not call it work. Above the level of a third- or fourth-grade moron, life has got to be lived largely in terms of effort. For man is not, as the vulgarer hedonists seem to suppose, a kind of walking stomach; he has also got a hand, an eye, and a brain. Cease to use your hands, and you have lopped off a huge chunk of your consciousness.
There is scarcely anything, from catching a whale to carving a cherry stone, that could not conceivably be done by machinery. The machine would even encroach upon the activities we now class as 'art'; it is doing so already, via the camera and the radio.
At a first glance this might not seem to matter. Why should you not get on with your 'creative work' and disregard the machines that would do it for you? But it is not so simple as it sounds. Here am I, working eight hours a day in an insurance office; in my spare time I want to do something 'creative', so I choose to do a bit of carpentering--to make myself a table, for instance. Notice that from the very start there is a touch of artificiality about the whole business, for the factories can turn me out a far better table than I can make for myself. But even when I get to work on my table, it is not possible for me to feel towards it as the cabinet-maker of a hundred years ago felt towards his table, still less as Robinson Crusoe felt towards his. For before I start, most of the work has already been done for me by machinery. I can get, for instance, planes which will cut out any moulding; the cabinet-maker of a hundred years ago would have had to do the work with chisel and gouge, which demanded real skill of eye and hand. The boards I buy are ready planed and the legs are ready turned by the lathe. I can even go to the wood-shop and buy all the parts of the table ready-made and only needing to be fitted together; my work being reduced to driving in a few pegs and using a piece of sandpaper. And if this is so at present, in the mechanized future it will be enormously more so. The tools I use demand the minimum of skill.
With the tools and materials available then, there will be no possibility of mistake, hence no room for skill. Making a table will be easier and duller than peeling a potato. In such circumstances it is nonsense to talk of 'creative work'.
But it may be said, why not retain the machine and retain 'creative work'? Because of a principle that is not always recognized, though always acted upon: that so long as the machine is _there_, one is under an obligation to use it. No one draws water from the well when he can turn on the tap. One sees a good illustration of this in the matter of travel. Everyone who has travelled by primitive methods in an undeveloped country knows that the difference between that kind of travel and modern travel in trains, cars, etc., is the difference between life and death. The nomad who walks or rides, with his baggage stowed on a camel or an ox-cart, may suffer every kind of discomfort, but at least he is living while he is travelling; whereas for the passenger in an express train or a luxury liner his journey is an interregnum, a kind of temporary death. And yet so long as the railways exist, one has got to travel by train--or by car or aeroplane.
The tendency of mechanical progress, then, is to frustrate the human need for effort and creation. It makes unnecessary and even impossible the activities of the eye and the hand.
There is really no reason why a human being should do more than eat, drink, sleep, breathe, and procreate; everything else could be done for him by machinery. Therefore the logical end of mechanical progress is to reduce the human being to something resembling a brain in a bottle. That is the goal towards which we are already moving, though, of course, we have no intention of getting there; just as a man who drinks a bottle of whisky a day does not actually intend to get cirrhosis of the liver.' - George Orwell, 'The Road to Wigan Pier', chapter 12
On the concept of ekphrasis - as an architect and a visual artist, I must admit reading about your vision of art raised my pulse. You seem well versed in literature and art theory, and thus you regard a text-to-image automation tool as an extraordinary opportunity to turn flowery, imaginative language into visual art.
But art is first and foremost participative. Just as Ruskin had to be mindful of natural landscapes, to participate in them physically and then analytically, 'to take it all in' in order to later comprehend a painting by Turner, so must painters and visual artists. You first imagine the scene you want to depict through visual imagination, and only then do you try to convey it through words. Writers do the same - Dante first imagined his realms, and later turned them into poetry. The same goes for music. McGilchrist writes on the participatory nature of art:
"A piece of music I have passively heard and overheard is familiar to the point of having no life; a piece of music practised and struggled with by a musician is familiar to the point of coming alive. One is emptied of meaning by being constantly represented; the other is enriched in meaning by being constantly present – lived with, and actively incorporated into ‘my’ life" - The Master and his Emissary.
What text-to-image AI does is remove the necessity of participation. Training my mind to envisage a certain scene is no longer necessary. Training my practical skills to learn perspectival systems, colour theory, brush strokes, measuring proportions, constructive drawing, 3d modelling, texturing, rendering, post-production and compositing - none of these skills, which the visual artist needs in order to take that initial effort of visual imagination and bring it into the real world, is required with text-to-image AI.
The very aspects that make visual art meaningful and bring joy to the artist are taken away from him, AI being a complete black box which sucks in a string of words and spews out the end result at the other end.
Now, as a classical writer, you might regard yourself as well positioned to out-compete the text prompters once text-to-image AI is finally released from the shackles of bizarre jargon; but I bet you would not be happy if someone made the opposite move, and allowed AI to create beautiful literature from visual imagery. Your literary skills would then instantly become obsolete, just like those of visual artists.
One last thing I will say: as an architect with many years of practice, I can confirm that Iain McGilchrist was absolutely correct when he argued that thinking in images is superior to thinking in language; that the processes of imagination reside in the right hemisphere of the brain, and that 'Since the left hemisphere [responsible for thinking in language] actually inhibits the breadth of attention that the right hemisphere brings to bear, creativity can increase after a left hemisphere stroke, and not just in sensory qualities but […] in ‘numerous intellectual and affective components’' - idem.