🚀 First Benchmark of OpenAI's 4o Image Generation Model!
We've just completed the first-ever (to our knowledge) benchmarking of the new OpenAI 4o image generation model, and the results are impressive!
In our tests, OpenAI 4o image generation absolutely crushed leading competitors, including Black Forest Labs, Google, xAI, Ideogram, Recraft, and DeepSeek AI, in prompt alignment and coherence! It leads the nearest competitor by more than 20% in Bradley-Terry score, the largest gap we have seen since the benchmark began!
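For readers curious how Bradley-Terry scores turn pairwise human votes into a leaderboard: each model gets a strength s_i such that P(i beats j) = s_i / (s_i + s_j), fitted from the win/loss counts. Here's a minimal sketch using the standard minorization-maximization update (the win counts below are purely illustrative, not our actual benchmark data, and the model names are shorthand):

```python
# Hypothetical pairwise win counts (illustrative only):
# wins[(a, b)] = number of times voters preferred a over b.
wins = {
    ("4o", "flux"): 70, ("flux", "4o"): 30,
    ("4o", "ideogram"): 80, ("ideogram", "4o"): 20,
    ("flux", "ideogram"): 55, ("ideogram", "flux"): 45,
}

models = sorted({m for pair in wins for m in pair})

# Bradley-Terry model: P(i beats j) = s_i / (s_i + s_j).
# Fit scores with the MM (Zermelo/Hunter) iteration until convergence.
scores = {m: 1.0 for m in models}
for _ in range(200):
    new = {}
    for i in models:
        total_wins = sum(wins.get((i, j), 0) for j in models if j != i)
        denom = sum(
            (wins.get((i, j), 0) + wins.get((j, i), 0)) / (scores[i] + scores[j])
            for j in models if j != i
        )
        new[i] = total_wins / denom
    norm = sum(new.values())           # normalize so scores sum to 1
    scores = {m: s / norm for m, s in new.items()}

print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

With these toy counts the fitted scores rank the models by how often voters preferred them, which is exactly how the leaderboard gap is measured.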
The benchmarks are based on 200k human responses collected through our API. However, the most challenging part wasn't the benchmarking itself, but generating and downloading the images:
- 5 hours to generate 1000 images (no API available yet)
- Just 10 minutes to set up and launch the benchmark
- Over 200,000 responses rapidly collected
While generating the images, we hit some hurdles that forced us to leave out parts of our prompt set. In particular, we observed that the OpenAI 4o model proactively refused to generate certain images:
🚫 Styles of living artists: completely blocked
🚫 Copyrighted characters (e.g., Darth Vader, Pokémon): initially generated but subsequently blocked
Overall, OpenAI 4o stands out significantly in alignment and coherence, especially on unusual prompts that have historically tripped up image models, such as "A chair on a cat." See the images for more examples!