The Persistent Flaw in Image Models: Time



Illustration by Nikhil Kumar

It’s a bad time for AI image generators – literally. Most of the notable AI models, including DALL·E 3, Midjourney, Stable Diffusion, and Ideogram 2.0, struggle to generate analog clocks, often defaulting to 10:10.

When AIM tested DALL·E 3 with a simple prompt asking it to generate an image of an analog clock showing the time as 6:30, both of the images it produced showed 10:10.

DALL·E 3 can't tell the right time

We even tried the recently released FLUX.1 Pro model with the same prompt, and the results were similar to what DALL·E 3 produced.

FLUX.1 AI can't tell the right time

What Explains This Fixation? 

At the root of this issue lie the datasets used to train these AI image models. 

Most of the clock and watch images used to train these models come from product photos and advertisements. In those shots, clocks are almost always set to 10:10: the V-shaped arrangement keeps the hands from obscuring the logo usually placed at the 12 o'clock position and creates an aesthetically pleasing “smiling” face.

Since AI models learn patterns from training data, this overrepresentation of the 10:10 configuration gets baked in. The AI doesn’t understand the meaning or mechanics of clock hands – it simply learns the statistical pattern that clocks should look like this.
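A toy sketch can make this concrete. The 90/10 split below is an illustrative assumption, not a measured figure, and the "generator" is just a stand-in for a model that reproduces the dominant pattern in its training data:

```python
from collections import Counter

# Toy "training set": product photos overwhelmingly show 10:10.
# (The 90/10 split is an illustrative assumption, not real data.)
training_times = ["10:10"] * 90 + ["6:30", "3:45", "12:00", "9:15", "7:20",
                                   "1:50", "4:05", "8:40", "11:25", "2:35"]

counts = Counter(training_times)

def naive_generator(prompt_time: str) -> str:
    """A model with no understanding of clock mechanics just reproduces
    the statistically dominant configuration, whatever the prompt says."""
    return counts.most_common(1)[0][0]

print(naive_generator("6:30"))  # -> 10:10, regardless of the prompt
```

Real diffusion models are vastly more complex, but the failure mode is the same in spirit: the prompt asks for 6:30, and the learned distribution answers 10:10.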

Unfortunately, this issue goes beyond generating images of analog clocks. The deeper problem is that AI doesn’t understand the concept of time. 

Text-to-Image Models Don’t Get Time 

A Reddit user pointed out that AI has no concept of time. “It is basically a mathematical function where for a certain input a corresponding output is calculated. There is no timing mechanism whatsoever in that path,” he said, adding that LLMs can only derive such conclusions from their training data.

A Medium user described how LLMs like ChatGPT are essentially “blacked out” between prompts, unable to maintain a continuous memory of events or experiences. This limitation prevents AI from forming a coherent understanding of the passage of time, as it lacks the ability to perceive and process time-related cues the way humans do.

ChatGPT can't get time. Source: Medium

True, this could be mitigated with a custom prompt that queries an internal clock, but the model would then need to distinguish real time from the hypothetical times mentioned in a conversation, which is one reason ChatGPT has not integrated such a feature. 
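The custom-prompt workaround is straightforward to sketch. The helper below is a minimal, hypothetical example (not ChatGPT's actual mechanism): it injects the wall-clock time into a system message, and the final message list would be passed to whatever chat-completion API you actually use:

```python
from datetime import datetime, timezone

def build_time_aware_prompt(user_message: str) -> list[dict]:
    """Prepend the current wall-clock time as a system message so the
    model can answer time-sensitive questions. The system message also
    tries to separate real time from hypothetical times in conversation."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    return [
        {"role": "system",
         "content": f"The current real-world time is {now}. "
                    "Treat any other times the user mentions as hypothetical."},
        {"role": "user", "content": user_message},
    ]

messages = build_time_aware_prompt("How long until 18:00 UTC today?")
print(messages[0]["content"])
```

Even with this, the model only sees time as text at the moment of the request; it still has no running sense of time between turns.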

AI’s inability to understand the concept of time also stems from its lack of real-world experience. A comprehensive understanding of time develops gradually, through experience and problem-solving. 

The Reasoning Part

AI systems, even the most advanced ones, are not considered conscious, which is part of why they struggle to reason. They process information but lack the self-awareness, feelings, and intentions central to human consciousness. This limits their ability to reason flexibly and understand context deeply.

Explaining AI’s inability to reason, Subbarao Kambhampati, professor at Arizona State University, said, “The tricky part about reasoning is, if you ask me a question that requires reasoning and I gave an answer to you, on the face of it, you can never tell whether I memorised it.” 

A Medium post by James Gondola suggests that AI systems lack their own moral reasoning, which may lead to biases or overlook the subtle ethical factors that human judgement naturally considers.

There are techniques like Chain-of-Thought (CoT) prompting, where you prompt the model to break its reasoning into a series of intermediate steps. This makes it easier to see how the model arrived at a specific output. 
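At its simplest, CoT prompting is just a wrapper around the question. The sketch below shows one common phrasing; the exact wording is an illustrative choice, not a standard:

```python
def chain_of_thought_prompt(question: str) -> str:
    """Wrap a question so the model is asked to show intermediate steps
    before committing to a final answer -- the core idea of CoT prompting."""
    return (
        f"Question: {question}\n"
        "Think through the problem step by step, numbering each step, "
        "and only then state the final answer on a line starting with "
        "'Answer:'."
    )

print(chain_of_thought_prompt("A clock shows 6:30. What angle do the hands make?"))
```

The intermediate steps also give you something to inspect: if the final answer is wrong, the faulty step is often visible in the chain.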

MIT researchers have proposed a neuro-symbolic AI technique that enables LLMs to better solve natural language, maths, and data analysis problems. It has two key components: neural networks that process and generate natural language and symbolic systems that perform logical reasoning and manipulation of symbols. 

By combining these two, neuro-symbolic AI can leverage the strengths of deep learning for handling unstructured data while also incorporating the reasoning capabilities of symbolic systems.
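A minimal sketch of that division of labour, with the neural half stubbed out: in a real system an LLM would translate the natural-language question into a formal expression, and a symbolic engine would then compute the answer exactly rather than pattern-match it. Both function names here are hypothetical:

```python
from fractions import Fraction

def neural_translate(question: str) -> tuple[Fraction, Fraction]:
    """Stand-in for the neural half: an LLM would parse the question
    into a formal expression. Here the translation is hard-coded for
    one question, as if the model parsed "15 percent of 240"."""
    assert "15 percent of 240" in question
    return Fraction(15, 100), Fraction(240)

def symbolic_solve(rate: Fraction, base: Fraction) -> Fraction:
    """The symbolic half: exact, rule-based arithmetic the neural
    network would otherwise have to approximate from text patterns."""
    return rate * base

rate, base = neural_translate("What is 15 percent of 240?")
print(symbolic_solve(rate, base))  # -> 36
```

The point of the split is that the symbolic step is guaranteed correct by construction, so any error can only come from the translation step, which is much easier to audit.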


Sagar Sharma

A software engineer who loves to experiment with new-gen AI. He also happens to love testing hardware and sometimes they crash. While reviving his crashed system, you can find him reading literature, manga, or watering plants.
