This post provides information of use to anyone seeking to understand the strengths and weaknesses of existing AI systems that generate images.
The evolution of AI-enabled image generation has not stopped. All indications are that the big players in this space will keep innovating. But at this particular moment in time (1:30pm Eastern on Thursday 5 Oct 2023), OpenAI is winning. I have no intention of using Midjourney, Stable Diffusion, or other image generation capabilities until they catch up.
Here is more on why.
As we wrote in “AI Enabled Image Generation: MidJourney, Dall-E, Stable Diffusion”, these tools are all very capable and are already causing disruption. All of them are improving, and all are being integrated into other platforms.
With Dall-E, not only has OpenAI been improving the core platform, but pro users of ChatGPT can now interface with it from within ChatGPT. This is amazing. You can activate the feature, enter a simple description of what you want to see, and ChatGPT will write four more precise descriptions of an image and get to work.
Here is an example. We needed an image to go with a new post by Emilio Iasiello titled “Global Democracies Need To Reign In Intrusive Surveillance Technologies.” I typed this into ChatGPT with the image creation feature selected: draw a representation of Global Democracies Need to Reign In Intrusive Surveillance Technologies ar 2:1
It then started thinking and showed me the text of the prompts it was sending to Dall-E.
The initial results included these four drafts:
Those were certainly all better than I could do, but I needed more choices. So I just hit the regenerate button until I saw some I liked better. Each time I did, new prompts were created. In the end I went with one that was meant to show a high-tech control room. Starting from my simple request to illustrate the need for democracies to thwart intrusive surveillance, ChatGPT iterated with me until it produced this prompt:
Photo of a high-tech control room, where diverse operators monitor multiple screens displaying worldwide data feeds. Maps, satellite images, and various tracking icons fill the screens.
The result was this image:
That is exactly the one I wanted for Emilio’s article.
Not bad for a tool that only requires text input.
And a comparison to consider. When we did our first