Users In Awe of OpenAI’s GPT-4o Native Image Generation Feature


OpenAI, the company behind the GPT family of AI models, unveiled native image generation capabilities in GPT-4o on Tuesday. GPT-4o can now generate a wide variety of images, including infographics, comic strips, signboards, graphics, menus, memes, and street signs.

Users can also refine and edit generated images with follow-up prompts. OpenAI has introduced native image generation for users on the Plus, Pro, Team, and Free plans. Access for Enterprise and Edu plans will follow shortly, and API access will roll out in the next few weeks.

Native image generation means that GPT-4o can generate images using its own inherent knowledge, so it doesn’t have to rely on an external diffusion model such as the company’s own DALL-E. OpenAI also mentioned that users can continue to use DALL-E as usual.

“Creating and customising images is as simple as chatting using GPT‑4o – just describe what you need, including any specifics like aspect ratio, exact colors using hex codes, or a transparent background,” said the company.
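As a rough illustration of the kind of specifics the company describes, the sketch below assembles a plain-text prompt from a subject, an aspect ratio, hex colors, and a transparent-background flag. The function name `build_image_prompt` and its parameters are hypothetical and are not part of any OpenAI API; the output is simply a string that could be typed or pasted into a chat.

```python
import re

# Six-digit hex color such as "#FF5733" (assumed format for this sketch).
HEX_COLOR = re.compile(r"^#[0-9A-Fa-f]{6}$")


def build_image_prompt(subject, aspect_ratio=None, hex_colors=(),
                       transparent_background=False):
    """Assemble a natural-language image prompt (hypothetical helper)."""
    parts = [subject.strip()]
    if aspect_ratio:
        parts.append(f"Aspect ratio: {aspect_ratio}.")
    if hex_colors:
        # Reject anything that is not a six-digit hex code.
        for color in hex_colors:
            if not HEX_COLOR.match(color):
                raise ValueError(f"not a six-digit hex color: {color!r}")
        parts.append("Use these exact colors: " + ", ".join(hex_colors) + ".")
    if transparent_background:
        parts.append("Use a transparent background.")
    return " ".join(parts)


if __name__ == "__main__":
    print(build_image_prompt(
        "A menu for a small cafe.",
        aspect_ratio="16:9",
        hex_colors=["#FF5733", "#1A1A1A"],
        transparent_background=True,
    ))
```

The helper only formats text. How closely the model follows each instruction is up to the model.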

Users were quickly impressed by its capabilities. Tobias Lutke, CEO of Shopify, shared in a post on X how the model could describe the anatomy of an unknown animal on his son’s t-shirt. After seeing the results, he remarked, “How is this even real?” The model can also generate text in images without distortions or errors.

It can also generate user interfaces from the details in a prompt, without any reference images.

Users have also been experimenting with style transformations on existing photos. Grant Slatton, a founding engineer at Row Zero, showcased an example of how GPT-4o could convert a regular photo into a ‘Studio Ghibli’-style anime image. His post quickly gained traction, inspiring many others to share their own AI-generated creations.

In another instance, a user reproduced an advertisement image, including its copy. The user shared an ad image on X as a reference and asked GPT-4o to recreate it for his app, with the app screenshot in the original ad replaced by a screenshot of his own app. “Within minutes, it had almost perfectly replicated it,” he said. Users have also been amazed by the model’s ability to generate photorealistic images.

OpenAI’s announcement comes a few days after Google introduced native image generation in its Gemini 2.0 Flash AI model. First made available to trusted testers in December, the feature is now accessible in all regions supported by Google AI Studio.

“Developers can now test this new capability using an experimental version of Gemini 2.0 Flash (gemini-2.0-flash-exp) in Google AI Studio and via the Gemini API,” Google said.
