Four pathways to an AI culture

Illustration © Romualdo Faura

Many creative skills can now be reproduced using deep learning and neural networks, yet their true potential lies in collaboration rather than replacement. Two distinct subjectivities, one human and one automated, can meet halfway and produce a captivating mutual symbolic space, brimming with possibilities but also with dead ends. It would be both premature and arrogant to hazard a guess as to the artistic impact AI will exert in the medium term.

The myth of creativity

The spectre of automation, its impact on the labour market and the shape of tomorrow’s society have generated countless analyses and prognoses so far in the 21st century. For a time, a linear rule applied in these debates: the more mechanical and routine a task, the easier it would be to automate. By this logic, the professions in which creativity plays a pivotal role would be the safest from replacement. This kind of “creative exception” somehow reasserted artists’ and creators’ special status in society.

If the proliferation of AI tools for creation in multiple domains since the summer of 2022 has had any eye-opening effect, it has been to shatter this illusion. The application of tools such as ChatGPT or Midjourney for productive ends does not, in most cases, seek artistic excellence, but cost-effective efficiency. “Good enough (and infinitely cheaper)” is the decisive factor in creative industries, where the vast majority of everyday tasks do not strive for the utmost originality or to break the established moulds. From this standpoint, artists, musicians, journalists and video makers are beginning to acknowledge that they are in the same position as any other group subjected to the imposition of a new emerging order, one that does not negotiate its conditions.

Accumulation by (digital) dispossession

Every creative AI application operates on the basis of a model trained using vast amounts of data. Stable Diffusion, one of the most widely used image generators, developed its model from the LAION-5B dataset, published openly on the internet in March 2022. LAION-5B consists of almost 6 billion image-text pairs: links to images on the open web, each combined with a caption describing its content. Anyone keen to explore what went into the model can download the dataset from https://laion.ai/blog/laion-5b/, but the answer isn’t hard to guess: web-sourced images, the product of the day-to-day collective effort made by all of us, internet users.
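For readers curious about the shape of that raw material, a single LAION-5B entry is essentially a web link paired with the scraped alt-text of the image. A minimal sketch in Python, assuming the field names of the dataset’s published metadata (the values here are invented for illustration):

# One LAION-5B-style record: the dataset stores links and captions,
# not the image files themselves. Field names follow the published
# metadata columns; the values are made up.
sample = {
    "URL": "https://example.org/photos/tabby-cat.jpg",  # link to the image on the open web
    "TEXT": "a tabby cat sleeping on a windowsill",     # the alt-text scraped alongside it
    "WIDTH": 1024,
    "HEIGHT": 768,
    "similarity": 0.31,  # CLIP image-text similarity, used to filter the pairs
}

print(f'{sample["TEXT"]} -> {sample["URL"]}')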

Over the last thirty years, and particularly during the golden age of social media and the emergence of the smartphone from 2007 onwards, the social production of data in large volumes has been part of the bargain implicit in the internet industry’s economic model. The widely accepted trade-off is simple: in exchange for the free use of all kinds of digital services, we are encouraged to actively produce and share as much information as possible. Initially, the value chain consisted of aggregating personal data on a massive scale to produce consumer profiles: personal data packaged and commoditised.

The machine learning industry has added an extra layer of value to the mountains of content that the web holds. This is the raw material for training models with the capacity to produce what some call “intelligence” and others simply “statistical predictive capability”. An automatic image generator can produce a cat of any breed and colour faithfully and accurately because the model can be fed with a practically inexhaustible pool of internet-sourced cat images.

Behind every one of those images uploaded to a social network, a photo album or a news item in a media outlet, there are citizens, authors, creators, artists, professional or amateur photographers, and every other conceivable category of producer. Needless to say, no deep learning company sought to obtain the rights to use these images in advance. Nor have they limited themselves to using royalty-free images or images under free licensing schemes such as Creative Commons. The creators of some of the most widely used services, such as Midjourney, have justified themselves by citing the unfeasibility of producing such a tool if they had to clear the rights to billions of images on an individual basis. The responsibility has been shifted: it is the author who must expressly request that their images be excluded from the model.

The term “accumulation by dispossession”, coined by urban geographer and Marxist social theorist David Harvey to describe capitalism’s ability to commodify what was previously shared and common, finds a perfect expression in this process. Although the expressive and creative potential of these new tools is seen as socially beneficial and artistically productive, the symbolic and cognitive capital of a vast collective endeavour is once again controlled and extractively appropriated by a few.

Centaurs

In all initial diagnoses, the emergence of AI as a transformative force is seen in terms of the logic of replacement: robots and automated systems will take our place. Yet while many skills can indeed be replicated using deep learning and neural networks, their true potential lies in collaboration rather than replacement. Two distinct subjectivities, one human and one automated, can meet halfway and produce a mutual symbolic space, with outcomes different from anything creators could achieve on their own.

In reality, writers, musicians and architects delegating fundamentally creative decisions to rule-based systems, autonomous evolutionary processes and tools with the capacity to generate their own responses is nothing very new. There are precedents, from the long tradition of automatic writing and the cut-up technique, instruction-based art and the algorithmic art of the 1960s, to the multiple generative tools that electronic music producers or parametric architects have used on a day-to-day basis in the last two decades.

Interaction designer Matt Jones uses the evocative image of the centaur, the half-human, half-horse mythological creature, to foster a symbiotic vision of the creative collaboration between humans and machines. Each half of the hybrid creator brings its best assets and capabilities to the table, drawing on the other in the areas in which it falls short.

The medium is the model

In July 2015, Google programmer Alexander Mordvintsev initiated the DeepDream project, a tool that uses a neural network to search for the traces of possible images within existing ones, amplifying the patterns that the network’s layers faintly detect in a photograph. The images produced by DeepDream are visual kaleidoscopes inevitably related to the aesthetics of psychedelia, the closest reference point at hand for recognising them as part of an earlier tradition. In the years that followed, artists exploring the realm of possibility of neural networks and deep learning turned to generative adversarial networks (GANs), a specific methodology for image production. In the projects of Gene Kogan, Mario Klingemann and Anna Ridler, GANs show us all the possible variations that may exist in the process of transforming one image into another, in a potentially infinite visual flow.
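Mechanically, DeepDream runs training in reverse: instead of adjusting the network to fit an image, it adjusts the image by gradient ascent until whatever a chosen layer faintly detects becomes visible. Below is a minimal sketch in Python with PyTorch and torchvision; the network, layer choice, step size and iteration count are illustrative assumptions, not Mordvintsev’s original settings.

import torch
import torchvision.models as models

# A pretrained classifier stands in for the network whose "dreams" we amplify.
model = models.googlenet(weights=models.GoogLeNet_Weights.DEFAULT).eval()

activations = {}
def grab(module, inputs, output):
    activations["target"] = output

# Amplify whatever this intermediate inception block detects.
model.inception4c.register_forward_hook(grab)

image = torch.rand(1, 3, 224, 224, requires_grad=True)  # stand-in for a photograph
optimizer = torch.optim.Adam([image], lr=0.05)

for _ in range(20):
    optimizer.zero_grad()
    model(image)
    # Gradient ascent: minimising the negative activation energy pushes the
    # image towards patterns the layer already half-recognises in it.
    loss = -activations["target"].norm()
    loss.backward()
    optimizer.step()

The GAN works mentioned above rely on a related but different gesture: interpolating between points in the model’s latent space, so that one generated image flows continuously into another.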

In July 2022, OpenAI, one of the giants of the deep learning industry, made its DALL-E 2 service publicly available: the first mainstream text-to-image generator, coupling a Transformer-based text encoder with a diffusion model that renders the final image. Images produced using DALL-E 2 quickly flooded digital media channels. They are characterised by an imperfect photorealism: images that look convincing at first glance but reveal inconsistencies on closer inspection, such as blurred or distorted faces. A DALL-E-generated portrait can easily be compared to a sketch by the painter Francis Bacon.

DALL-E 2 was swiftly followed by other text-to-image generators, such as Stable Diffusion and Midjourney, perhaps the most widely used at the time of writing. Midjourney has become a kind of implicit atlas of the history of visual culture. Its users soon learn that the formula “in the style of” in a prompt produces images that reproduce the aesthetics of any painter, illustrator or filmmaker. Nevertheless, the model is undoubtedly biased, like any technology based on statistical analysis. For instance, it is surprisingly easy to produce images that replicate the style of American film director Wes Anderson; his saturated colours, flat frames and symmetrical compositions are relatively easy for the model to imitate, so images “in the style of” Anderson are constantly being generated.
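No dedicated feature is needed for this: style imitation is plain prompt phrasing, which open models expose directly. As a hedged sketch, here is the equivalent with Stable Diffusion via Hugging Face’s diffusers library (Midjourney itself offers no such public interface; the checkpoint identifier and prompt are illustrative):

import torch
from diffusers import StableDiffusionPipeline

# Load a publicly available Stable Diffusion checkpoint
# (identifier is illustrative; any compatible checkpoint works).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

# The model has statistically associated the director's name with the
# visual regularities of the images labelled with it in the training data.
prompt = "a seaside hotel lobby, flat symmetrical composition, pastel colours, in the style of Wes Anderson"
image = pipe(prompt).images[0]
image.save("pastiche.png")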

With no special training required, anyone can easily spot the difference between an image produced with DeepDream and one generated with a GAN, DALL-E 2 or Midjourney. Moreover, the most prominent artists in this medium, such as Refik Anadol, Holly Herndon and Mat Dryhurst, are those producing their own models rather than simply using the same generic tools as everyone else.

In less than a decade of frenzied artistic practice around deep learning and neural networks, large language models, GANs and generative transformers, the possible aesthetics of artificial intelligence are proving to be a captivating space, brimming with possibilities and dead ends, with platitudes but also with deep alienation. It would be both premature and arrogant to draw any categorical, unequivocal conclusion about the artistic impact of AI technologies in the medium term.

 
