This horse-riding astronaut is a milestone in AI's journey to make sense of the world

To support MIT Technology Review’s journalism, please consider becoming a subscriber.

Diffusion models are trained on images that have been completely distorted with random pixels. They learn to convert these images back into their original form. In DALL-E 2, there are no existing images. So the diffusion model takes the random pixels and, guided by CLIP, converts them into a brand-new image, created from scratch, that matches the text prompt.
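To make the idea concrete, here is a toy sketch of the two halves of that process on a four-"pixel" image. It is not OpenAI's code. The function names, the linear noise schedule, and the deterministic DDIM-style update are all assumptions chosen for illustration, and `oracle_denoiser` is a hypothetical stand-in that simply knows the answer. In a real system that role is played by a trained neural network steered by the text prompt.

```python
import math
import random


def noise_schedule(steps, beta_start=1e-4, beta_end=0.02):
    """Fraction of the original signal left after each noising step (assumed linear schedule)."""
    alpha_bars, alpha_bar = [], 1.0
    for t in range(steps):
        beta = beta_start + (beta_end - beta_start) * t / max(steps - 1, 1)
        alpha_bar *= 1.0 - beta
        alpha_bars.append(alpha_bar)
    return alpha_bars


def add_noise(image, alpha_bar, rng):
    """Training side: distort an image by blending it with Gaussian noise."""
    noise = [rng.gauss(0.0, 1.0) for _ in image]
    noisy = [
        math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * n
        for x, n in zip(image, noise)
    ]
    return noisy, noise


def oracle_denoiser(noisy, alpha_bar, target):
    """Hypothetical stand-in for the trained, prompt-guided network.

    It returns the noise that separates `noisy` from `target`. A real model
    has to predict this without ever seeing the target.
    """
    return [
        (x - math.sqrt(alpha_bar) * t) / math.sqrt(1.0 - alpha_bar)
        for x, t in zip(noisy, target)
    ]


def generate(target, steps=1000, seed=0):
    """Generation side: start from random pixels and denoise step by step."""
    rng = random.Random(seed)
    alpha_bars = noise_schedule(steps)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # no existing image, only noise
    for t in reversed(range(steps)):
        ab_t = alpha_bars[t]
        ab_prev = alpha_bars[t - 1] if t > 0 else 1.0
        eps = oracle_denoiser(x, ab_t, target)
        # Estimate the clean image, then re-noise it to the next, lower noise level.
        x0_hat = [
            (xi - math.sqrt(1.0 - ab_t) * e) / math.sqrt(ab_t)
            for xi, e in zip(x, eps)
        ]
        x = [
            math.sqrt(ab_prev) * x0 + math.sqrt(1.0 - ab_prev) * e
            for x0, e in zip(x0_hat, eps)
        ]
    return x
```

Calling `generate([0.2, -0.5, 0.9, 0.0])` starts from random values and ends at the target, but only because the stand-in denoiser is handed the target. The hard part, and the part the sketch leaves out, is training a network that can make that prediction from the noisy image and the prompt alone.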

The diffusion model allows DALL-E 2 to produce higher-resolution images more quickly than DALL-E. “That makes it vastly more practical and enjoyable to use,” says Aditya Ramesh at OpenAI.

In the demo, Ramesh and his colleagues showed me pictures of a hedgehog using a calculator, a corgi and a panda playing chess, and a cat dressed as Napoleon holding a piece of cheese. I remark on the weird cast of subjects. “It’s easy to burn through a whole work day thinking up prompts,” he says.

“A sea otter in the style of Girl with a Pearl Earring by Johannes Vermeer” / “An ibis in the wild, painted in the style of John Audubon”

DALL-E 2 still slips up. For example, it can struggle with a prompt that asks it to combine two or more objects with two or more attributes, such as “A red cube on top of a blue cube.” OpenAI thinks this is because CLIP does not always connect attributes to objects correctly.

As well as riffing off text prompts, DALL-E 2 can spin out variations of existing images. Ramesh plugs in a photo he took of some street art outside his apartment. The AI immediately starts generating alternate versions of the scene with different art on the wall. Each of these new images can be used to kick off its own sequence of variations. “This feedback loop could be really useful for designers,” says Ramesh.

One early user, an artist called Holly Herndon, says she is using DALL-E 2 to create wall-sized compositions. “I can stitch together giant artworks piece by piece, like a patchwork tapestry, or narrative journey,” she says. “It feels like working in a new medium.”

User beware

DALL-E 2 looks much more like a polished product than the previous version. That wasn’t the aim, says Ramesh. But OpenAI does plan to release DALL-E 2 to the public after an initial rollout to a small group of trusted users, much like it did with GPT-3. (You can sign up for access here.)

GPT-3 can produce toxic text. But OpenAI says it has used the feedback it got from users of GPT-3 to train a safer version, called InstructGPT. The company hopes to follow a similar path with DALL-E 2, which will also be shaped by user feedback. OpenAI will encourage initial users to break the AI, tricking it into generating offensive or harmful images. As it works through these problems, OpenAI will begin to make DALL-E 2 available to a wider group of people.

OpenAI is also releasing a user policy for DALL-E, which forbids asking the AI to generate offensive images (no violence or pornography) or political images. To prevent deepfakes, users will not be allowed to ask DALL-E to generate images of real people.

“A bowl of soup that looks like a monster, knitted out of wool” / “A shiba inu dog wearing a beret and black turtleneck”

As well as the user policy, OpenAI has removed certain types of image from DALL-E 2’s training data, including those showing graphic violence. OpenAI also says it will pay human moderators to review every image generated on its platform.

“Our main aim here is to just get a lot of feedback for the system before we start sharing it more broadly,” says Prafulla Dhariwal at OpenAI. “I hope eventually it will be available, so that developers can build apps on top of it.”

Creative intelligence

Multiskilled AIs that can view the world and work with concepts across multiple modalities—like language and vision—are a step towards more general-purpose intelligence. DALL-E 2 is one of the best examples yet.

But while Etzioni is impressed with the images that DALL-E 2 produces, he is cautious about what this means for the overall progress of AI. “This kind of improvement isn’t bringing us any closer to AGI,” he says. “We already know that AI is remarkably capable at solving narrow tasks using deep learning. But it is still humans who formulate these tasks and give deep learning its marching orders.”

For Mark Riedl, an AI researcher at Georgia Tech in Atlanta, creativity is a good way to measure intelligence. Unlike the Turing test, which requires a machine to fool a human through conversation, Riedl’s Lovelace 2.0 test judges a machine’s intelligence according to how well it responds to requests to create something, such as “A picture of a penguin in a spacesuit on Mars.”
