ByteDance Releases Seedream 3.0 to Rival GPT-4o and Imagen 3


Published on April 21, 2025
In AI News

The model can generate images with resolutions of up to 2K, enabling it to deliver high-quality results.

ByteDance, the company behind TikTok, has unveiled its latest image generation foundation model, Seedream 3.0, claiming that it outperforms OpenAI’s GPT-4o in image creation.

Seedream 3.0 is a bilingual (Chinese-English) model that aims to address limitations found in its predecessor, Seedream 2.0.

The release comes shortly after the ‘Ghiblification’ trend, in which users restyled images in the manner of Studio Ghibli with the help of GPT-4o.

The model is trained on a dataset expanded to roughly double its size and sampled with a dynamic sampling mechanism. The pre-training phase incorporates mixed-resolution training, cross-modality RoPE, representation alignment loss, and resolution-aware timestep sampling for improved scalability and visual-language alignment.
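For readers unfamiliar with RoPE (rotary position embedding), the idea can be illustrated with a minimal Python sketch of the standard, single-sequence form. This is not ByteDance's cross-modality variant, whose details are not described here; the function names `rope` and `dot`, the `base` value of 10000 and the sample vectors are assumptions chosen purely for illustration.

```python
import math

def rope(vec, position, base=10000.0):
    """Rotate consecutive channel pairs of `vec` by position-dependent angles.

    Generic RoPE sketch (hypothetical helper), not Seedream's implementation.
    """
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, dim, 2):
        # Each channel pair has its own frequency; earlier pairs rotate faster.
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        a, b = vec[i], vec[i + 1]
        out.extend([a * c - b * s, a * s + b * c])
    return out

def dot(u, v):
    """Plain dot product, standing in for an attention score."""
    return sum(a * b for a, b in zip(u, v))

# Illustrative query and key vectors (made-up values).
q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.75, -0.5, 1.0]

# The same offset (2) at different absolute positions.
score_a = dot(rope(q, 3), rope(k, 1))
score_b = dot(rope(q, 12), rope(k, 10))
```

Because each pair of channels is simply rotated, the score between a query and a key depends only on how far apart they sit, not on their absolute positions, so `score_a` and `score_b` agree up to floating-point error.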

Post-training optimisation utilises diverse aesthetic captions and a VLM-based reward model to improve the quality of the final output.

The technical report states, “By employing consistent noise expectation and importance-aware timestep sampling, we achieve a 4 to 8 times speedup while maintaining image quality.”

Seedream 3.0 supports image generation at resolutions of up to 2K.

According to the report, Seedream 3.0 was compared with OpenAI’s GPT-4o, Imagen 3 and Midjourney, among others. ByteDance claims the model initially topped the charts; in the latest benchmarks from Artificial Analysis at the time of publication, it appears to be on par with GPT-4o and ahead of Imagen 3.

ByteDance highlights the distinct strengths of the model. In dense text rendering, Seedream 3.0 excels in handling complex Chinese text generation with superior typesetting and aesthetic composition, whereas GPT-4o, while strong with small English characters and LaTeX, shows limitations with Chinese fonts.

In image editing tasks, ByteDance’s SeedEdit, derived from Seedream, demonstrates better ID preservation and prompt following compared to GPT-4o and Gemini-2.0, although it faces challenges with more complex editing scenarios.

ByteDance claims that images generated by GPT-4o tend to exhibit a dark yellowish hue and significant noise, potentially impacting their usability. By contrast, the company says, Seedream models have consistently demonstrated strong performance in terms of colour, texture, clarity, and overall aesthetic appeal.