AI Image Generation

Introduction

Figure 1: Trinity

In this article I provide discussion, resources and technical support for videos on my YouTube channel on this article's topic, which, once created, I will link here.

This article's video is now live:

AI Image Secrets (released 2023.09.02)

Unless you've been away from Planet Earth recently, you know that AI is making big waves in both society and our personal lives. This article describes how a particular AI entity can create striking images on demand, in response to a simple text entry called a "prompt".

To see what's possible, look at Figure 1 on this page (and click the image for full-size). I call her "Trinity" because I chose a random seed number of 333 to create her image (other numbers create entirely different faces). My text prompt to the AI image generation program (named StableDiffusion) was "photorealistic teenage girl full face", one of the least complex text prompts one can imagine. Then I tried different random seed numbers (each of which changes the outcome) until I liked the result.
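To show why a seed number such as 333 pins down one particular outcome, here is a minimal Python sketch. It is only an analogy: StableDiffusion draws its starting noise with its own numerical library, not with Python's `random` module, and the function name `starting_noise` is made up for this illustration.

```python
import random

def starting_noise(seed, count=4):
    """Return `count` pseudorandom values from a generator seeded with `seed`."""
    rng = random.Random(seed)          # a private generator, isolated from global state
    return [rng.gauss(0.0, 1.0) for _ in range(count)]

same_a = starting_noise(333)
same_b = starting_noise(333)
other = starting_noise(334)

print(same_a == same_b)   # the same seed reproduces the same values
print(same_a == other)    # a different seed gives different values
```

The same idea explains why trying seed after seed with an unchanged prompt produces a different face each time.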

For those with little exposure to artificial intelligence (the meaning of "AI"), let me explain about Trinity — who she is and where she came from:

Trinity is not a real person; she's an imaginary construct created by an "Artificial Intelligence" (AI) computer program called StableDiffusion.

On the other hand, because of how StableDiffusion works, in a loose sense Trinity represents a statistical blend of the many teenage-girl pictures in the program's training data, which was collected from the Internet.

So in reality, Trinity is no one, and she is everyone.

The computer program that created Trinity was first trained on a very large collection of images and matching text descriptions gathered from the Internet. This training phase requires a lot of computer power and time.

As training worked through this data, it built up a database with entries called "weights." There are 3.5 billion weights in the newest database (SDXL 1.0 at the time I write this).

Now that its training is complete, people can make text entries to unlock this database, in a creative process never before seen in computer science.

In a conventional database, before AI, there would be one correct response for a given search query, but sometimes no response was possible for lack of an exact match.

In AI image generation by contrast, the AI program relies on its "weights" database to decide what to do. This is a less deterministic, more creative process — the computer chooses its response based on likelihoods and probabilities, not certainties.
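The contrast between an exact-match lookup and a probability-based choice can be sketched in a few lines of Python. This is a toy analogy, not how StableDiffusion is built internally; the catalog contents and the function names are invented for the illustration.

```python
import random

# Conventional lookup: an exact key either matches or it does not.
catalog = {"cat": "cat_001.png", "dog": "dog_001.png"}

def exact_lookup(query):
    """Return the stored entry, or None when there is no exact match."""
    return catalog.get(query)

# Probabilistic choice: every candidate carries a weight and the result is sampled.
def weighted_pick(candidates, weights, seed):
    """Sample one candidate, with likelihood proportional to its weight."""
    rng = random.Random(seed)
    return rng.choices(candidates, weights=weights, k=1)[0]

print(exact_lookup("cat"))       # an exact match exists
print(exact_lookup("kitten"))    # no exact match, so no response
print(weighted_pick(["cat", "dog"], [0.7, 0.3], seed=333))
```

The lookup fails on "kitten" because nothing matches exactly, while the weighted pick always returns something, guided by likelihoods instead of certainties.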

It's important to add that these new programs are so powerful and complex that even their creators can't predict what they'll do next.

In this article I explain how this new AI engine creates its results. I also show how to install StableDiffusion on a personal computer (but not just any personal computer) where that is feasible, or, for those with less local computer horsepower, how to find StableDiffusion online.

I also provide some AI-generated videos as well as a matrix of AI images that cross-reference text queries (rows) with generation criteria (artists and styles in columns), and I provide the Python programs that created this content. These article sections give the reader a chance to see how creative modern AI image creation can be.

Artificial Intelligence

Changes in computer hardware

Early computers consisted of a Central Processing Unit or CPU having one processor, some memory, storage devices, plus displays and printers to record results. Since then we've begun to imitate how we think the human brain works — not one central processor, but thousands or millions of less powerful processors that act in concert to solve a problem.

Here's an example — I recently updated my mathematical model of the Solar System, a task that requires a lot of computer numerical processing. My earlier model used a single processor to calculate the position of Mercury, then Venus, then Earth, and so forth, for a given step of modeled time. Then my program would use a force matrix to apply all the gravitational interactions between planets, reposition the planets and move on to the next time step. This means a useful calculation might require hours of computer time.

But because of new and more powerful computers, my updated model assigns each solar system planet or moon its own separate processor. The processors work in parallel to calculate results much faster than was possible using the old method.

Here's how this works. Let's say we have 12 planets and moons in a solar system model. In the traditional method we would compute the gravitational force of each of the 12 bodies against the remaining 11 bodies, one at a time, which requires 12 times 11 calculations (132 in total) using one processor (#), organized in a matrix:

  A B C D E F G H I J K L
A - # # # # # # # # # # #
B # - # # # # # # # # # #
C # # - # # # # # # # # #
D # # # - # # # # # # # #
E # # # # - # # # # # # #
F # # # # # - # # # # # #
G # # # # # # - # # # # #
H # # # # # # # - # # # #
I # # # # # # # # - # # #
J # # # # # # # # # - # #
K # # # # # # # # # # - #
L # # # # # # # # # # # -
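To make the count concrete, here is a small Python sketch of a single-processor traversal of such a matrix. It is not the Solar System model described in this article: the positions and masses are made-up placeholder values, the function names are invented, and it computes accelerations (force per unit mass) with plain Newtonian gravity.

```python
import math

G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2

def pair_acceleration(pos_i, pos_j, mass_j):
    """Acceleration of body i caused by body j (Newtonian gravity)."""
    dx = [b - a for a, b in zip(pos_i, pos_j)]
    r = math.sqrt(sum(d * d for d in dx))
    scale = G * mass_j / r**3
    return [scale * d for d in dx]

def accelerations_serial(positions, masses):
    """Visit every off-diagonal matrix cell in turn on one processor."""
    n = len(positions)
    evaluations = 0
    result = []
    for i in range(n):                 # one matrix row per body
        total = [0.0, 0.0, 0.0]
        for j in range(n):             # one matrix column per other body
            if i == j:
                continue               # the '-' diagonal: no self-interaction
            a = pair_acceleration(positions[i], positions[j], masses[j])
            total = [t + c for t, c in zip(total, a)]
            evaluations += 1
        result.append(total)
    return result, evaluations

# Twelve placeholder bodies spaced along the x axis (not real ephemerides).
positions = [[1.0e9 * (k + 1), 0.0, 0.0] for k in range(12)]
masses = [1.0e24] * 12
acc, count = accelerations_serial(positions, masses)
print(count)  # 12 * 11 = 132
```

Every one of the 132 evaluations waits for the one before it, which is why a long simulation run on a single processor can take hours.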

In this old method, the single processor (#) would traverse the matrix, one cell at a time, calculating the forces created by the other bodies. But if we have 12 processors available, often true in modern computers, we can process the matrix this way:

  A B C D E F G H I J K L
A - # # # # # # # # # # #
B # - # # # # # # # # # #
C # # - # # # # # # # # #
D # # # - # # # # # # # #
E # # # # - # # # # # # #
F # # # # # - # # # # # #
G # # # # # # - # # # # #
H # # # # # # # - # # # #
I # # # # # # # # - # # #
J # # # # # # # # # - # #
K # # # # # # # # # # - #
L # # # # # # # # # # # -
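A one-worker-per-row split of the matrix can be sketched in Python as follows. Everything here is an assumption made for illustration: the placeholder positions and masses, the function names, and the use of `ThreadPoolExecutor`. In standard CPython, threads running pure-Python arithmetic do not truly execute at the same time, so this shows how the work is divided, not the real speedup; `ProcessPoolExecutor` or compiled numerical code would be the usual swap for actual parallel execution.

```python
import math
from concurrent.futures import ThreadPoolExecutor

G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2

def row_acceleration(i, positions, masses):
    """Sum the contributions in matrix row i (every body except i itself)."""
    total = [0.0, 0.0, 0.0]
    for j, (pos_j, mass_j) in enumerate(zip(positions, masses)):
        if j == i:
            continue                   # skip the '-' diagonal cell
        dx = [b - a for a, b in zip(positions[i], pos_j)]
        r = math.sqrt(sum(d * d for d in dx))
        scale = G * mass_j / r**3
        total = [t + scale * d for t, d in zip(total, dx)]
    return total

def accelerations_by_row(positions, masses, workers=12):
    """Hand each matrix row to its own worker."""
    n = len(positions)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves order, so result[i] belongs to body i.
        return list(pool.map(lambda i: row_acceleration(i, positions, masses), range(n)))

# Twelve placeholder bodies spaced along the x axis (not real ephemerides).
positions = [[1.0e9 * (k + 1), 0.0, 0.0] for k in range(12)]
masses = [1.0e24] * 12
by_row = accelerations_by_row(positions, masses)
print(len(by_row))  # one result per body
```

No row needs any other row's result, so the rows can be handed out freely; that independence is what makes this problem easy to run in parallel.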

In this scenario 12 independent processors each compute 11 results, which speeds up computation by a factor of 12 (132/11), as one might expect. But there's more: what about a completely different kind of computer with an essentially unlimited number of processors? Can we assign a separate processor to each matrix cell? In a word, yes:

  A B C D E F G H I J K L
A - # # # # # # # # # # #
B # - # # # # # # # # # #
C # # - # # # # # # # # #
D # # # - # # # # # # # #
E # # # # - # # # # # # #
F # # # # # - # # # # # #
G # # # # # # - # # # # #
H # # # # # # # - # # # #
I # # # # # # # # - # # #
J # # # # # # # # # - # #
K # # # # # # # # # # - #
L # # # # # # # # # # # -

To produce this kind of result, present-day computers rely on a new device called a GPU, a Graphics Processing Unit, which solves this class of problem by having thousands of independent processors, all working in parallel.

Modern AI programs like StableDiffusion rely on massively parallel GPU processing, as with this example. These programs can run without a GPU available, but in that case they're very slow, sometimes requiring hours to produce a single result. This means practical AI activities require a GPU.

Reader Install and Access Options

Readers who want to explore AI image processing have two primary choices:

A. Install StableDiffusion or a similar program on one's own GPU-equipped computer:

Installing StableDiffusion is a topic too complex to detail here. Instead I'll provide links to sites that offer free, open-source programs that can turn your home computer into an AI image generation tool.

NOTE: Linux is a much better platform for AI image processing than Windows. If a computer has an AMD-based GPU, Linux is the only choice, to the point that people install a Linux virtual machine on a Windows system just to be able to carry on. Also, the Python programs listed below assume a Linux platform; they can be made to run on Windows, but only after some editing.

AUTOMATIC1111 / stable-diffusion-webui — This is a free, open-source, very popular software suite that provides the StableDiffusion engine plus a popular local-Web-based user interface. Very powerful, easy to learn. Setting up requires a modest amount of technical knowledge. The user interface looks like this:

InvokeAI — Also free, a somewhat large install, significant technical knowledge required, has a nice browser-based user interface. This choice requires more of the client system than AUTOMATIC1111. Here's a picture of its user interface:

ComfyUI: Also free and open-source. This is a common choice among more advanced users. It uses a modular user interface, also local-Web-based, more flexible than AUTOMATIC1111 but with a steeper learning curve. The user interface looks like this:

Very important: along with the basic software and an acceptable user interface, you need an up-to-date library of weights. A recent release has attracted a lot of favorable attention and is the source of most of the images displayed here: SDXL 1.0. As with all computer activities, be careful with downloads and make sure you're visiting a reliable site.

B. As an alternative, readers can use an online service that provides access to an AI image processor:

Midjourney — a well-known site that, after a free trial, offers paid subscriptions to gain access to their AI image generation service.

There are many similar sites, but I don't want to seem to favor one over another, and some may disappear over time, so for other options readers must use their own judgment. Having a GPU-equipped PC is a far better choice.

Image Generator Software

Here are some Python programs I wrote that I've been using to create different kinds of image content on a Linux system. These scripts can be used on Windows, but only after some editing:

These scripts are meant for the A1111 environment:

Morph Generator — I use this program to create sequences of images that become videos — examples below.

Matrix Generator — this program accepts a list of artists/styles and text prompts and generates a matrix (table) of results embedded in a Web page that it also creates. The Matrix article section below shows an example of its output.

These scripts are meant for the ComfyUI environment:

Morph Generator — I use this program to create sequences of images that become videos — examples below.

Matrix Generator — this program accepts a list of artists/styles and text prompts and generates a matrix (table) of results embedded in a Web page that it also creates. The Matrix article section below shows an example of its output.

Create Video from Stills — this small utility reads in a series of images and produces a video. The utility "ffmpeg" must be present on your system.
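For readers curious about what a stills-to-video step can look like, here is a minimal Python sketch that hands a numbered image sequence to ffmpeg. It is not the "Create Video from Stills" utility itself. The file pattern, frame rate and function names are assumptions, and the command relies on an ffmpeg build that includes the libx264 encoder.

```python
import subprocess

def build_ffmpeg_command(pattern, output, fps=30):
    """Assemble an ffmpeg command that turns numbered stills into a video."""
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it exists
        "-framerate", str(fps),    # frame rate assigned to the image sequence
        "-i", pattern,             # e.g. frames/frame_%05d.png
        "-c:v", "libx264",         # encode as H.264 video
        "-pix_fmt", "yuv420p",     # pixel format most players accept
        output,
    ]

def make_video(pattern, output, fps=30):
    """Run ffmpeg and raise an error if it fails."""
    subprocess.run(build_ffmpeg_command(pattern, output, fps), check=True)

print(" ".join(build_ffmpeg_command("frames/frame_%05d.png", "morph.mp4")))
```

Calling `make_video("frames/frame_%05d.png", "morph.mp4")` would run the command, provided files named frame_00001.png, frame_00002.png and so on exist in the hypothetical `frames` directory.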

These Python scripts require either the A1111 or ComfyUI software environment and server to be locally installed and running.
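As an illustration of what talking to a locally running server involves, here is a hedged Python sketch that requests one image from AUTOMATIC1111's web API. It assumes the server was started with its `--api` option and listens at the default local address shown; the payload fields are a small subset of what the endpoint accepts, and the function names are invented here. It is not taken from the scripts listed above.

```python
import base64
import json
import urllib.request

API_URL = "http://127.0.0.1:7860/sdapi/v1/txt2img"  # assumed default local address

def build_payload(prompt, seed, steps=30, width=1024, height=1024):
    """Collect the generation settings sent to the server."""
    return {"prompt": prompt, "seed": seed, "steps": steps,
            "width": width, "height": height}

def generate(prompt, seed, out_path, url=API_URL):
    """Request one image and save it; needs a running server to work."""
    data = json.dumps(build_payload(prompt, seed)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    # The server returns a list of base64-encoded images; strip any
    # "data:image/png;base64," style prefix before decoding.
    encoded = result["images"][0].split(",", 1)[-1]
    with open(out_path, "wb") as handle:
        handle.write(base64.b64decode(encoded))

print(json.dumps(build_payload("photorealistic teenage girl full face", 333)))
```

With the server running, `generate("photorealistic teenage girl full face", 333, "trinity.png")` would save a single image. ComfyUI uses a different, workflow-based interface, so this sketch applies to the A1111 environment only.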

AI Generated "Morph" Videos

To create these videos I used my Python script listed above named "Morph Generator", created many images that morph between prompts, then generated a video based on the resulting still images. I emphasize this is still an experimental activity and the outcomes are a bit rough. I plan to create more videos and figure out how to make them less flaky. Click the videos below to play them:
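One way to produce a gradual morph between two prompts is to shift emphasis from one to the other across the frames. The Python sketch below is an assumption about how that could be done, not a description of "Morph Generator": it uses AUTOMATIC1111's `(text:weight)` emphasis syntax, and the function name and frame count are made up.

```python
def morph_prompts(start, end, frames):
    """Return `frames` prompts that shift emphasis from `start` to `end`."""
    if frames < 2:
        raise ValueError("need at least two frames")
    prompts = []
    for k in range(frames):
        t = k / (frames - 1)           # 0.0 at the first frame, 1.0 at the last
        prompts.append(f"({start}:{1 - t:.2f}) ({end}:{t:.2f})")
    return prompts

for line in morph_prompts("cat", "dog", 5):
    print(line)
```

The first prompt is `(cat:1.00) (dog:0.00)` and the last is `(cat:0.00) (dog:1.00)`. Keeping the seed fixed while only the weights change is another assumption; it tends to keep neighboring frames similar enough to blend into a video.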

Cat -> Dog

Photorealistic -> Impressionist

Fashions 1850 -> 2020

Chronology of a Man

Chronology of a Car

Mass Attack

Donald Trump -> Vladimir Putin

The Matrix

In this section I show how easy it is to create a matrix of images that cross-reference text prompts (rows) and styles of art or named artists (columns).

I created this matrix (the Web page content as well as the images) using my script "Matrix Generator" listed above. Readers can easily reprogram the script for a new matrix with different prompts and artists.
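The cross-referencing idea, every prompt combined with every style and laid out as a table in a Web page, can be sketched in Python as follows. This is not the "Matrix Generator" script: the way a prompt and a style are joined, the image file names and the function name are all hypothetical, and the image generation itself is left out.

```python
import html

def matrix_page(prompts, styles):
    """Build an HTML table with one row per prompt and one column per style."""
    header = ("<tr><th></th>"
              + "".join(f"<th>{html.escape(s)}</th>" for s in styles)
              + "</tr>")
    rows = [header]
    for r, prompt in enumerate(prompts):
        cells = [f"<th>{html.escape(prompt)}</th>"]
        for c, style in enumerate(styles):
            full_prompt = f"{prompt}, {style}"   # hypothetical way to combine them
            filename = f"cell_{r}_{c}.png"       # hypothetical naming scheme
            cells.append(
                f'<td><img src="{filename}" alt="{html.escape(full_prompt)}"></td>')
        rows.append("<tr>" + "".join(cells) + "</tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"

page = matrix_page(["still life", "grumpy old man"], ["Van Gogh", "Hokusai"])
print(page)
```

Two prompts and two styles give four image cells; the full matrix in this section has eight prompts and nine styles, for 72 images.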

Each of the images contains within it the details of its creation, so if an image is downloaded and dropped into the StableDiffusion Automatic1111 user interface's "PNG Info" tab, the details will be listed and the reader can approximately recreate the image. (I say "approximately" because AI image recreation is rarely exact, because it relies on pseudorandom number generators and other procedural details that differ from computer to computer.)
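For those who want to read those embedded details without the user interface, here is a hedged Python sketch that pulls text entries out of a PNG file using only the standard library. It assumes the settings are stored in an uncompressed `tEXt` chunk under the keyword `parameters`, which is how AUTOMATIC1111 normally saves them; other tools, or prompts containing unusual characters, may use a different chunk type that this sketch ignores. The sample data built at the end is a made-up stand-in, not a real image.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def read_text_chunks(data):
    """Return {keyword: text} for every uncompressed tEXt chunk in PNG bytes."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    found = {}
    offset = 8
    while offset + 8 <= len(data):
        length, kind = struct.unpack(">I4s", data[offset:offset + 8])
        body = data[offset + 8:offset + 8 + length]
        if kind == b"tEXt":
            keyword, _, text = body.partition(b"\x00")
            found[keyword.decode("latin-1")] = text.decode("latin-1")
        offset += 12 + length          # 4 length + 4 type + data + 4 CRC
        if kind == b"IEND":
            break
    return found

def make_chunk(kind, body):
    """Helper used only to build a tiny stand-in file in memory."""
    crc = zlib.crc32(kind + body)
    return struct.pack(">I", len(body)) + kind + body + struct.pack(">I", crc)

sample = (PNG_SIGNATURE
          + make_chunk(b"tEXt", b"parameters\x00photorealistic teenage girl full face\nSeed: 333")
          + make_chunk(b"IEND", b""))
print(read_text_chunks(sample)["parameters"])
```

For a real file, `read_text_chunks(open("image.png", "rb").read())` would return whatever text entries the image carries, with `image.png` standing in for the downloaded file's name.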

My readers will recognize most of the artists and styles listed in the column headings, but the rightmost artist is a less-well-known personal favorite, a cartoonist named R. Crumb with a distinctive style.

This section shows that a modern AI program can accurately imitate many artists and styles, applying a recognizable style to subjects the original artists may not have considered. For example, those with an art background, on seeing any of the images in the Van Gogh column, will likely say, "That's unmistakably a Van Gogh." This is why many artists see AI as an existential threat (which it is).

Readers can click matrix images to see them in full size. There are some accidentally funny pictures in the matrix — for example click cell (3,3), the "photorealistic art nouveau / grumpy old man" image, and say, "Wait, what?"

Matrix columns (styles and artists, left to right): photorealistic | pencil drawing | photorealistic art nouveau | impressionist | Van Gogh | Rembrandt | Vermeer | Hokusai | R. Crumb

Matrix rows (text prompts, top to bottom):

teenage girl full face
still life
grumpy old man
picnic in the park
open-air market
children at play
sailors, boat, stormy sea
post-apocalyptic decaying city

Concluding Remarks

No one should regard AI as just another technology, like cars, airplanes or cell phones — over time AI will produce a much bigger effect than earlier technical advances. The reason, put bluntly, is that while those earlier technologies assisted us, AI may be able to replace us. While embracing AI, we should keep that from happening.

Pioneering firm OpenAI recently and reluctantly decided to withdraw their AI writing detector, a tool meant to distinguish human-written prose from that created by AI. They did this because it wasn't reliable enough — it produced too many false positives and false negatives. For the moment, this means we have no reliable way to distinguish human-created prose from AI-generated content.

For contrast, some attorneys recently submitted an AI-generated legal brief, only to discover that many of the included legal citations were fantasy, invented by the AI processor and without any connection to reality. This is a known AI behavior called "hallucination", in which the AI processor abandons any connection to reality without warning the human users. Confronted by a furious judge, the lawyers were fined US$5K and lost their case.

At the time I write this, professional show-business artists, writers and actors are on strike, objecting to, among other things, their replacement by AI entities. The artists object to their replacement by AI-generated art, the writers object to their replacement by AI prose generators, and the actors object to their replacement by computer-generated performances, like the recent replacement of a young Indiana Jones actor with a computer-generated simulation. These professionals have a point and deserve a fair outcome. But I have to say that in the long term, however the strike turns out and regardless of how people are treated at present, computer-generated content will replace human writers and performers whenever that's the best and least expensive choice.

This leads to an obvious question: after computers take over the jobs they do better than humans do, what then? What will people do instead, and how will society be structured after the loss of all the jobs computers do better than people?

And no, I don't have an easy answer to this question; no one does. But this is certainly a question we need to ask, well in advance of a requirement for an immediate answer.

Thanks for reading, and please visit my YouTube channel to see the video version of this article.
