
Chapter 3. The dual creativity

After exploring the concept of the hybrid author in the previous chapters, let us now examine how creativity concretely manifests itself in this new collaborative paradigm, and try to better understand the creative dynamics that emerge from the interaction between human and artificial intelligence.

The creativity of artificial intelligence is expressed mainly in two ways: responding to user requests and—so to speak—"shuffling the cards." This latter function is particularly interesting, as the AI can combine elements in unexpected ways, producing surprising and sometimes innovative results.

An emblematic example is once again AlphaGo, the AI developed by DeepMind which, as we saw in the previous chapter, defeated Lee Sedol, one of the strongest Go players in the world. AlphaGo did not limit itself to replicating human strategies but developed completely new approaches to the game; the famous move 37 of the second game, which baffled professional commentators, is often cited as an example.

It is therefore a type of computational creativity based on the AI's ability to analyze enormous quantities of data and identify patterns that escape the human eye. In the context of image generation, this translates into the ability to create unique and unexpected visual combinations, often surprising even the human creators themselves.

This new conception of creativity, understood as a process of connecting pre-existing elements, was well described by Steve Jobs:

Creativity is just connecting things. When you ask creative people how they did something, they feel a little guilty because they didn't really do it, they just saw something. It seemed obvious to them after a while. [Gary Wolf, "Steve Jobs: The Next Insanely Great Thing," Wired, February 1, 1996]

In a certain sense, AI follows this same path, combining elements to generate results that can be genuinely surprising. Human creativity in the context of generative AI, on the other hand, manifests itself mainly through choices. The user is in fact responsible for a series of technical and conceptual decisions that guide the entire creative process. These choices, far from being marginal, are the very heart of human creativity in this new context.

When a user sits down to create with AI, the creative process unfolds as a series of choices that go well beyond the simple formulation of the prompt. This new approach to creation requires a deep understanding of both the possibilities offered by the AI and one's own creative objectives.

The prompt is only one element of the process. The user must consider a series of factors, starting from the objective they have set: What tone should the image have? Which elements should be emphasized? How should the AI interpret certain words or concepts? These decisions require not only creativity but also a form of "translation" of human thought into a language the AI can understand. Hence the need to understand, at least in broad terms, how the generative AI being used works, a need that grows in proportion to the complexity of the system.

It should also be noted that not all AI platforms offer the same degree of creative control. This variety reflects different design philosophies and usage objectives, creating a diverse ecosystem of creative tools.

Some, like Microsoft Designer's Image Creator and DALL-E, offer limited options, prioritizing ease of use. These platforms are ideal for users who want quick results with minimal complexity, but they limit the degree of creative control.

Midjourney positions itself at an intermediate level, allowing a certain degree of customization. Users can influence the style and content of generated images more directly, thanks in part to the set of parameters that can be appended to the prompt, a set that keeps growing as the platform evolves. While it maintains a relatively simple interface, its development is worth following, as it is steadily pushing the platform toward the high end of the range.
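To make this concrete, here is an illustrative prompt (the subject and the values are invented; the flags shown, such as --ar for the aspect ratio, --stylize for the weight of Midjourney's own aesthetic, --chaos for the variability of the initial grid of proposals, and --seed for reproducibility, are among the parameters the platform documents, and may change as it evolves):

    /imagine prompt: a lighthouse at dusk, oil painting, muted palette --ar 16:9 --stylize 250 --chaos 20 --seed 42

Every flag added or omitted here is, in miniature, one of the creative choices discussed above.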

It is with Stable Diffusion, however, that the user obtains the greatest possible control over the produced image, thanks in large part to the open-source community that has gathered around the project and built an ecosystem of tools and interfaces allowing an unprecedented level of customization.

Using an interface like Automatic1111 for Stable Diffusion, the user can manipulate a multitude of parameters, each of which influences the final result in often subtle but significant ways. This complexity allows a level of creative control that approaches that of traditional digital creation tools, while maintaining the advantages of generative AI.

Automatic1111, screenshot of November 16, 2024

For example, even an apparently trivial choice, such as the one concerning image dimensions, can produce markedly different results. This is because in the Stable Diffusion workflow each parameter is an integral part of the generative process, not a simple pre- or post-production adjustment; changing the value of a parameter therefore means influencing the process as a whole.
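A minimal sketch with the Hugging Face diffusers library, a programmatic way of driving Stable Diffusion, makes the point concrete (the model ID is one public example checkpoint; prompt and values are arbitrary). Stable Diffusion denoises a latent tensor whose sides are the requested width and height divided by eight, so the dimensions define the canvas on which the whole process unfolds:

    from diffusers import StableDiffusionPipeline

    # Load one public example checkpoint (any Stable Diffusion model works).
    pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")

    # Requesting 512x768 means the denoising loop works on a 64x96 latent
    # grid from the very first step: the dimensions are part of the
    # generative process itself, not a final crop or resize.
    image = pipe("a lighthouse at dusk, oil painting",
                 width=512, height=768).images[0]
    image.save("lighthouse.png")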

Among the most influential parameters is the choice of model (the checkpoint), which essentially defines the "creative personality" of the AI we are collaborating with. Selecting one model rather than another can lead to radically different results from the same prompt: some models excel in certain artistic styles, while others are better suited to particular subjects or themes.
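In code, the checkpoint is literally the first choice made. A sketch under the same assumptions as above (both model IDs are public checkpoints on the Hugging Face Hub at the time of writing; the comparison itself is only illustrative):

    from diffusers import StableDiffusionPipeline

    prompt = "a portrait of an old sailor, dramatic lighting"

    # The same prompt run through two different "creative personalities".
    for model_id in ("CompVis/stable-diffusion-v1-4",
                     "stabilityai/stable-diffusion-2-1"):
        pipe = StableDiffusionPipeline.from_pretrained(model_id)
        image = pipe(prompt).images[0]
        image.save(model_id.split("/")[1] + ".png")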

Other determining parameters include the "CFG Scale" (Classifier-Free Guidance Scale), which controls how much freedom the user leaves to the AI with respect to adherence to the prompt, and the "Sampling Steps," which determine how many refinement passes the AI performs on the image. By manipulating these and other parameters, the user can exercise fine control over the creative process, guiding the AI toward the realization of their own artistic vision.
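Continuing the diffusers sketch (the values are arbitrary): guidance_scale corresponds to the CFG Scale, num_inference_steps to the Sampling Steps, and a fixed seed keeps everything else constant so that only the parameters under study change.

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")

    # A fixed seed makes the experiment repeatable.
    generator = torch.Generator("cpu").manual_seed(42)

    image = pipe(
        "a lighthouse at dusk, oil painting",
        guidance_scale=7.5,       # CFG Scale: higher = stick closer to the prompt
        num_inference_steps=50,   # Sampling Steps: more steps = longer refinement
        generator=generator,
    ).images[0]

Rerunning with, say, guidance_scale=3.0 on the same seed shows directly how much freedom the AI takes when the leash is loosened.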

Pushing further, we find interfaces like ComfyUI, which take the concept of creative control to a new level. ComfyUI allows the user to freely assemble the "building blocks" of the creative process, as in a sophisticated construction game. This approach not only expands creative possibilities but transforms the entire process into an act of designing the creative workflow itself.

With ComfyUI, the user can create customized workflows, treating the blocks like LEGO bricks with which to build different paths. Each composition is in fact a set of choices made by the user, who can combine different models and sampling techniques, and even integrate elements of traditional image processing. This level of control allows for the creation of unique creative pipelines, adapted to the specific needs and vision of the artist, as the sketch after the figure below illustrates.

ComfyUI – Example workflow
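As a rough illustration of what these blocks look like in practice, here is a minimal sketch that submits a seven-node workflow to a locally running ComfyUI server through its HTTP API (the server address and the checkpoint filename are assumptions about a local setup; the node class names are ComfyUI's standard built-in nodes):

    import json, urllib.request

    # Each numbered entry is a "block"; the ["id", n] pairs are the wires
    # connecting one block's n-th output to another block's input.
    workflow = {
        "1": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": "v1-5-pruned-emaonly.safetensors"}},
        "2": {"class_type": "CLIPTextEncode",
              "inputs": {"text": "a lighthouse at dusk, oil painting",
                         "clip": ["1", 1]}},
        "3": {"class_type": "CLIPTextEncode",
              "inputs": {"text": "blurry, low quality", "clip": ["1", 1]}},
        "4": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 512, "height": 512, "batch_size": 1}},
        "5": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["2", 0],
                         "negative": ["3", 0], "latent_image": ["4", 0],
                         "seed": 42, "steps": 25, "cfg": 7.0,
                         "sampler_name": "euler", "scheduler": "normal",
                         "denoise": 1.0}},
        "6": {"class_type": "VAEDecode",
              "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
        "7": {"class_type": "SaveImage",
              "inputs": {"images": ["6", 0], "filename_prefix": "example"}},
    }

    # Submit the graph to the local ComfyUI server for execution.
    req = urllib.request.Request(
        "http://127.0.0.1:8188/prompt",
        data=json.dumps({"prompt": workflow}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

Rewiring one of those connections, swapping the sampler block, or inserting an upscaling node between decode and save is exactly the LEGO-like recomposition described above.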

Creative possibilities expand enormously. This multi-step, fully customizable process offers creative control that goes well beyond simple prompt-based image generation; indeed, the customization of the workflow itself becomes another element of the creative process enacted by the user.

The assembled set of blocks is called, precisely, a "workflow," because it represents, and allows the execution of, a complete flow of work. Excellent examples are the workflows of Stefano Flore, an Italian developer who has published a number of very interesting projects: https://stefanoflore.it/progetti/.