At play with DALL-E 2: fieldnotes from the algorithm

By Ian Dull and Ariel Abonizio

Since its public launch in September 2022, DALL-E 2 – a generative text-to-image algorithm – has, alongside Stable Diffusion, Imagen, Midjourney and other forms of ‘generative AI’, been making waves, even winning art competitions. More than anything, it has been inspiring endless rumination on the future of technology and art: will DALL-E be a new creative tool or the death of art altogether?

As social scientists studying emerging technologies, we have seen time and time again how the apocalyptic speculation and boundless optimism that surround new technologies rarely pan out as predicted: rather, the truth often lies somewhere in between. So, we decided to see what we could learn by putting DALL-E 2 to the test ourselves – in an office-wide contest to ‘create’ the best artwork for our own walls.

What follows here are early reflections and provocations from our own experiences playing with DALL-E, as well as a selection of images from the contest. It paints a picture of DALL-E not as an existential threat, but as a new medium for creativity and for reflecting on our relationship with technology – with plenty of kinks and quirks to work out and explore. The question isn’t whether DALL-E will be the death of art; rather, what will it change about how we make, interpret and look at art?

1. The human artist is present

The existential angst around DALL-E comes from its ability to create images. Unlike previous generations of text-to-image algorithms, DALL-E is not a search engine that combs through existing images online. Rather, it generates entirely new ones: type “a lilac school bus” into DALL-E’s online interface (‘prompting’) and it will produce an image that combines buses, colors, and connected scenery drawn from the 400+ million images it was trained on – sometimes in very surprising ways – rather than retrieving anything that already exists. The existential thinking goes: if algorithms can create new images, is DALL-E the only artist we need? After all, its name is a cross between famed surrealist Salvador Dalí and Pixar’s charming robot WALL-E, a nod to it being both robot and creator.

Yet, during the contest, we leapfrogged the idea that DALL-E was an artist at all and – rightly or wrongly – were quick to recognize human authorship. To take an example: even though, technically speaking, DALL-E created all of the images, our language was focused entirely on the human ‘artists’: “Whose was this one?”; “Tell me your reasons for producing this image”; “I loved your submissions.” DALL-E created the works and added the unexpected, but no one seriously considered it the ‘artist’ behind the art.

These are early signals. Yet our instincts for human authorship give clues about DALL-E’s evolving position in society: though a creative force, DALL-E could become a medium like any other – an oil palette, but with billions of colors – available to an artist en route to realizing their vision. What some have called the ‘AI dance’ – the ping-pong between the artist’s idea and DALL-E’s unexpected response – is, as with any medium, the feedback loop of artist and material, like a sculptor with clay, honed over time into the fine-grained movements of mastery.

Thinking of generative AI as a medium makes it more familiar, but no more mundane. DALL-E appears to free our imagination from the classical constraints of physical media and skills, opening the way to new types of artwork. A better question than whether DALL-E will take artists’ place is: what will separate the artist using DALL-E from the amateur? And where do we need to advance the medium to foster creativity with AI?

2. Style becomes a commodity

Style has historically been an artist’s most distinctive attribute. To speak of “Hieronymus Bosch’s style” is to say that he made a new addition to the historical canon; to say that someone has worked “in the style of Bosch” is to both provide a point of reference for the audience and to label the work as derivative. 

Working with DALL-E turns style into a reference point, though a generative one. In contrast to descriptors of an image in a prompt (e.g. “woman trapped in a teacup”), the results of which can vary widely, we found artists’ styles a faster way to ascribe an aesthetic or mood to a work. “Woman trapped in a teacup in the style of Bosch”, for example, yields a more predictable outcome: DALL-E will furnish texture, composition, colors, and a demonic figure to boot. Our contest wound up with numerous examples that drew on famous artists as the strongest reference points for a specific mood, or a type of content, scenery, or composition. The goal was rarely to mimic, but rather to access content in line with the (new) artist’s vision more efficiently.

Yet in the process, DALL-E renders style a commodity: it is no longer something to develop and perfect over time, but the easiest guardrail for producing a work of art in line with your intent – as much a commodity as the common paintbrush. Mastery is unlikely to be exhibited in the end product alone, but rather in the vision and the process. 

Prompting becomes the real skill, the medium through which we try to guide DALL-E towards our envisioned output. After playing with DALL-E, many in our group began to write very specific prompts, with detail around the content, setting, texture, composition, and style (e.g. Hyperrealism, Romanticism). Though far from specifying everything, these more detailed prompts represent an increasingly thought-through creation process and a more intentional reliance on chance. The audience knows this too: in our contest, people felt a strong need to see the prompt and the image side by side, or were curious about the alternatives that went unchosen – the artist’s skill lies in selecting.
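
For readers curious what this looks like in practice beyond the web interface, here is a minimal sketch of that workflow using OpenAI’s Images API through the official Python client: one deliberately specific prompt, several generated candidates, and the selection left to the human. The prompt text and parameter values below are illustrative assumptions, not examples from our contest.

```python
# A sketch of 'prompting as process': one detailed prompt, several candidates,
# and a human left to select among them. Assumes the official OpenAI Python
# client (openai>=1.0) and an OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A deliberately specific prompt: content, setting, texture, composition, style.
# (Illustrative text, not an actual contest entry.)
prompt = (
    "A woman trapped in a porcelain teacup, in a crumbling monastery garden, "
    "cracked-glaze texture, centered composition, in the style of Hieronymus Bosch"
)

# Ask for several candidates; the 'artist' selects from the results.
response = client.images.generate(
    model="dall-e-2",
    prompt=prompt,
    n=4,               # number of candidate images to generate
    size="1024x1024",  # output resolution
)

for i, image in enumerate(response.data, start=1):
    print(f"Candidate {i}: {image.url}")
```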

We would bet that some of the next big innovations in generative AI will be about process: showing the mastery in how someone reached their result, and the volume of output on the way there. While prompting is still clunky and imprecise, it is likely to develop its own vocabulary and systems that give artists more granular control, just as app design has developed standard UI elements and the coding interfaces to control them. How will artists show their skill through process? And what kinds of finer controls will help artists create what they’d envisioned?

3. A new medium for play

Generative AI appears poised to amplify the democratization of culture that has been taking shape since the smartphone and social media made everything easy to capture and distribute. It didn’t take the meme-inclined among us long to begin toying with classic Romantic paintings, imagining “good ol’ boys drinking whiskey and rye” as a scene, or Andy Warhol in an iPhone ad. 

Remixing has long been the internet-native form of self-expression, with videos, memes, and TikToks adapting existing media or formats to personal effect. A tool like DALL-E has the potential to amplify every aspect of that process – and to make it much faster. A single prompt can now generate plentiful iterations on any idea, reflection, or joke in high fidelity, right away, with a playful injection of DALL-E randomness. It’s not hard to imagine DALL-E becoming the fastest way to generate 1,000 memes of Bernie Sanders’ mittens in 1,000 different settings.

With so much visual production power at the internet’s fingertips, it seems hard to imagine generative AI won’t have an impact on everyday culture right away. Technology companies are likely to be in for exploding volumes of new content, with computing and data storage needs to match, and plenty of new experience challenges: What will be the ‘hashtag’ of the DALL-E era, the metadata that connects voluminous AI outputs, and what new formats will be needed to experience 1,000 connected memes at once? What imagery or processes will future meme-creators want algorithms to draw from to make DALL-E not only about art, but also fun?

4. Prepare for surrealism

Generative algorithms aren’t aesthetically neutral. The longer we spent with DALL-E, the more we were drawn to surreal imagery and ideas – a sad robot in an 18th-century painting of a crumbling monastery, a portrait of a lie through time, a dream – visualizations of people and things displaced from their own time, concepts lacking a clear schema for visual representation. When working in a medium with the power to create (almost) out of thin air, one is drawn towards manifestations of the non-visual, the imaginary, the impossible.

With generative AI likely to be a major tool for populating the metaverse, these tendencies aren’t academic: they tell us about the aesthetics and emotional sensibility likely to dominate our future digital spaces. Our tendency towards the surreal with image-generating AI will likely mean more and stranger forms of exploration. At bottom, DALL-E is a generative algorithm trained on caption-image pairs and applied, so far, to creating aesthetic objects. Could we create imaginative visuals from smells or moods, if the data could be collected? Could surreal images become a new tool for scenario planning, forecasting, or creative problem solving?

5. Synthetic aesthetics: a new Romanticism?

What is often most interesting in the art produced by DALL-E is the uncanny and unexpected: the strangeness of six legs on a horse and no rider; a misinterpreted prompt that yielded a canvas full of refrigerator magnets; a human hand that looked normal except for a missing finger. Our contest offered countless examples where the attraction was not (only) the ‘beauty’ of the image. Rather, it was DALL-E’s surprising output: well-composed images, yet strange in how they misrepresented the basics or made startling leaps.

There is a deeper phenomenon than surprise at work here. Edmund Burke called it the ‘sublime’ when he noted something similar at the dawn of Romanticism. In contrast to classical ideas of beauty, sublime aesthetics fixated on terror-inducing vastness, unknowability, ruin, rot – ultimately, he believed, existential angst. The sublime was aesthetically pleasing, yet it tugged not on perfection and proportion, but on an existential sense of loss.

The sublime quality of DALL-E images comes from the slow, sinking discovery of how differently a human and a machine look. A missing finger reveals that we and DALL-E understand what a hand is in quite different ways – that we are looking at ‘synthetic aesthetics’. Worse, it took us at least a few seconds to realize anything was different. To use artist James Turrell’s turn of phrase for his own perception-focused light art, “you are looking at you looking.” We realize our own way of seeing is partial and, when it comes to something like composition, limited.

This uncanny, sublime mood seems likely to dominate DALL-E art going forward: whether or not it will take artists’ jobs, we can’t help but contemplate what its synthetic aesthetic means for humanity. DALL-E is a tool for exploring and challenging our own ways of seeing and conceiving of the world. And what’s more human than that?


[Banner image by Christopher Burns, via Unsplash]
