
Machine learning that can visualize and imagine outcomes of descriptions (like humans!) 💭💡🤔

Can language models understand situations well enough to imagine them and convey those projections back to us? That's what this project set out to answer, and the results are striking. As humans, we are remarkably good at piecing together scattered information to envision future circumstances in the real world. We can dream, imagine, plan, and foresee even the most unlikely of things. For example, can you imagine a red car with wings? You probably pictured exactly that, a red car with wings, without ever having seen one before. But can we get a computer to do the same thing? It turns out that, with some changes to existing models and data collection techniques, we can achieve pretty remarkable results (like the real interaction below).

[Images: web-scraped depictions of a futuristic city (futureCity.jpeg, FutureCity_1.jpg, futureCity_2.jpg)]

PROMPT

Imagine the future.

RESPONSE

In the future I envisioned, I saw big buildings and incredible technology, like flying cars.

To accomplish this life-like ability, the entire text generation process needed to be redesigned. Humans draw on previous experiences and images to form generalizations. In the red car with wings example, you have seen a red car before, and you have seen wings before, but probably never together. Despite this, imagining that new combination of objects is relatively easy for us. For the first time, this new NLP model can interpret both text and image inputs to truly visualize and predict situations and provide realistic insights.
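Conceptually, the pipeline walked through under The Method below has three stages. As a rough, hypothetical sketch of how they fit together (none of these function names come from the project's actual code, and each stage simply returns the example values shown in the walkthrough):

```python
# Hypothetical end-to-end shape of the pipeline; each stand-in returns the
# example values used in the step-by-step walkthrough below.

def gather_text_trends(prompt: str) -> list[str]:
    # Step 1 stand-in: web text research would happen here.
    return ["cities", "cars", "tech", "health"]

def gather_image_trends(prompt: str) -> list[str]:
    # Step 2 stand-in: web image scraping and analysis would happen here.
    return ["shiny buildings", "sunny", "flying planes"]

def generate_response(prompt: str, text_trends: list[str],
                      image_trends: list[str]) -> str:
    # Step 3 stand-in: a fine-tuned NLP model would condition on all inputs.
    return "In the future I envisioned, I saw " + ", ".join(image_trends)

def imagine(prompt: str) -> str:
    return generate_response(prompt,
                             gather_text_trends(prompt),
                             gather_image_trends(prompt))

print(imagine("Imagine the future"))
```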

The Method

STEP 1

Take the initial prompt and gather text-based information from the web.

We start with a prompt: "Imagine the future"

The input is parsed into key words: <the>, <future>

Key trends are determined from web research: Cities, Cars, Tech, Health

Text-based outputs are generated, e.g. "The future is cool"

STEP 2

Take the initial prompt and gather image information from the web.

We start with the same prompt: "Imagine the future"

The web is searched for matching images: images of the future

These images are analyzed to find trends: <shiny buildings>, <sunny>, <flying planes>

Image-based outputs are made
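To make these gathering stages concrete, here is a minimal sketch in Python of the prompt parsing shared by both steps and the Step 1 trend ranking. The instruction-word list and the counting heuristic are illustrative assumptions, not the project's actual code, and the Step 2 image analysis (which would need a vision model) is only stubbed in the Step 3 sketch further down.

```python
import re
from collections import Counter

# Words stripped from the prompt before key-word matching; the exact list
# used by the project is an assumption on our part.
INSTRUCTION_WORDS = {"imagine", "picture", "describe"}

def extract_keywords(prompt: str) -> list[str]:
    """Input parsing: 'Imagine the future' -> ['the', 'future']."""
    words = re.findall(r"[a-z]+", prompt.lower())
    return [w for w in words if w not in INSTRUCTION_WORDS]

def rank_trends(snippets: list[str], top_k: int = 4) -> list[str]:
    """Step 1 trend detection: count terms recurring across scraped text."""
    counts: Counter[str] = Counter()
    for snippet in snippets:
        counts.update(set(extract_keywords(snippet)))
    return [term for term, _ in counts.most_common(top_k)]

# Example with stand-in scraped snippets about "the future":
snippets = [
    "future cities will run on clean tech",
    "flying cars, health tech, and smart cities",
    "tech and health drive the cities of the future",
]
print(extract_keywords("Imagine the future"))  # ['the', 'future']
print(rank_trends(snippets))                   # e.g. ['tech', 'cities', ...]
```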

STEP 3

Combine all inputs in a trained model to construct a final answer.

The prompt, along with both outputs, is combined

The combined inputs are fed into an advanced NLP model

The model is fine-tuned to respond appropriately: "In what I envisioned..."

A final response is generated and shown
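A minimal sketch of Step 3 follows, assuming a generic Hugging Face seq2seq model (t5-small) standing in for the project's fine-tuned model, and an assumed serialization format for the combined inputs; neither detail comes from the project itself.

```python
# Step 3 sketch: combine the prompt with both research outputs and feed
# the result to a seq2seq model. "t5-small" and the conditioning format
# below are placeholders, not the project's actual fine-tuned model.
from transformers import pipeline

generator = pipeline("text2text-generation", model="t5-small")

def build_model_input(prompt: str,
                      text_trends: list[str],
                      image_trends: list[str]) -> str:
    """Serialize the prompt and both trend lists into one conditioning string."""
    return (f"prompt: {prompt} | "
            f"text trends: {', '.join(text_trends)} | "
            f"image trends: {', '.join(image_trends)}")

combined = build_model_input(
    "Imagine the future",
    ["cities", "cars", "tech", "health"],           # Step 1 output
    ["shiny buildings", "sunny", "flying planes"],  # Step 2 output
)
result = generator(combined, max_length=60)
print(result[0]["generated_text"])
# A model fine-tuned as described would answer along the lines of:
# "In the future I envisioned, I saw big buildings and incredible
#  technology, like flying cars"
```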

By combining text and images, the model gains a much greater understanding of the real-world implications of any given prompt, which in turn allows for a vastly more nuanced output. All prompt-and-response examples on this page are real interactions with this model, and the supporting images shown are likewise a sample of real web-scraped images processed by the model's architecture. I wanted to conclude with the first prompt we started with, to see how the model responds.

[Images: web-scraped pictures of red cars (redCar1.jfif, redCar2.jpg, redCar3.jpg)]

PROMPT

Imagine a red car with wings.

RESPONSE

I imagined a bright sports car with wings extending from the sides that could fly high into the sky.

Made by your friends at The Moonshine Distillery
