OpenAI has released a new image generation model, gpt-image-2, making it available to developers immediately through the OpenAI API and the Codex coding environment. gpt-image-2 is the API identifier for OpenAI's second-generation image generator, released alongside the ChatGPT Images 2.0 product on April 21, 2026. The OpenAI API model page describes it as the company's "state-of-the-art image generation model for fast, high-quality image generation and editing."

Access Tiers Across ChatGPT and the API

The rollout is tiered. ChatGPT Free users receive access to the standard gpt-image-2 model. ChatGPT Plus, Pro, and Business subscribers additionally get thinking mode, longer reasoning runs, and web search inside the generation process, with an Enterprise rollout expected to follow. API developers receive both modes via the gpt-image-2 model ID.
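For developers, a request against the model ID might look roughly like the sketch below. It assumes gpt-image-2 is served through the same Images endpoint of the OpenAI Python SDK (client.images.generate) as earlier gpt-image models and returns base64 image data; the helper names build_request and generate_png are hypothetical. The article does not say how thinking mode is selected through the API, so no parameter for it is shown.

```python
import base64


def build_request(prompt: str, size: str = "1024x1024", quality: str = "low") -> dict:
    """Assemble keyword arguments for an Images API call (values are illustrative)."""
    return {"model": "gpt-image-2", "prompt": prompt, "size": size, "quality": quality}


def generate_png(prompt: str, path: str) -> None:
    """Call the API and write the first returned image to disk.

    Requires the openai package and an OPENAI_API_KEY environment variable.
    """
    from openai import OpenAI  # imported lazily so the sketch loads without the SDK

    client = OpenAI()
    result = client.images.generate(**build_request(prompt))
    # Assumption: the response carries base64 data, as with earlier gpt-image models.
    with open(path, "wb") as fh:
        fh.write(base64.b64decode(result.data[0].b64_json))


# Inspect the request without making a network call.
print(build_request("A bilingual storefront sign in English and Hindi"))
```

Calling generate_png("...", "out.png") would perform the actual request and incur the charges described in the pricing section.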

Capabilities: Text Rendering, Multilingual Output, and Resolution

The model is designed to handle fine-grained elements that previous image models consistently struggled with, including small text, iconography, UI elements, dense compositions, and subtle stylistic instructions.

OpenAI describes gpt-image-2 as a "polyglot" model with significant gains in non-Latin script rendering. Specifically, the model now supports high-fidelity text generation in Japanese, Korean, Chinese, Hindi, and Bengali.

Aspect ratio support ranges from 3:1 (ultra-wide) to 1:3 (ultra-tall), covering formats from banners and presentation slides to mobile screens. Resolution goes up to 2K through the API.
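The stated aspect ratio limits can be checked client-side before a request is sent. This is a minimal sketch, assuming the 3:1 to 1:3 range applies to the width-to-height ratio and that both endpoints are inclusive; is_supported_aspect_ratio is a hypothetical helper, not part of any OpenAI SDK.

```python
from fractions import Fraction

# Bounds taken from the article: 3:1 (ultra-wide) to 1:3 (ultra-tall).
MAX_RATIO = Fraction(3, 1)
MIN_RATIO = Fraction(1, 3)


def is_supported_aspect_ratio(width: int, height: int) -> bool:
    """Return True if width:height falls within the stated 3:1 to 1:3 range.

    Treating both endpoints as inclusive is an assumption.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return MIN_RATIO <= Fraction(width, height) <= MAX_RATIO


print(is_supported_aspect_ratio(1536, 1024))  # 3:2 landscape
print(is_supported_aspect_ratio(3072, 1024))  # exactly 3:1
print(is_supported_aspect_ratio(4096, 1024))  # 4:1, outside the range
```

Exact fractions avoid floating-point edge cases at the 3:1 and 1:3 boundaries.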

Thinking Mode and Multi-Image Generation

ChatGPT Images 2.0 runs on the new gpt-image-2 model, which can "think" before generating, spending more or less time reasoning depending on the selected mode, and can search the web during that process.

With thinking mode enabled, ChatGPT Images 2.0 can generate up to eight images at once from a single prompt, with characters, objects, and styles remaining consistent across all scenes. OpenAI lists page-long manga generated from a single picture and a text prompt, a series of social media graphics, and design plans for different rooms in a house as example use cases.

In thinking mode, the model can pull reference images and facts mid-generation, which helps with diagram accuracy, including charts with real numbers and maps with correct labels. Standard mode does not include web search.

Image Arena Benchmark Results

gpt-image-2 claimed the number one spot on every Image Arena leaderboard: Text-to-Image, Single-Image Edit, and Multi-Image Edit. Arena ranks AI models based on blind human preference votes, making it one of the most credible third-party benchmarks in the industry.

The margin at the top of the Text-to-Image leaderboard was notable. gpt-image-2 scored 1,512 against 1,271 for second-place Nano Banana 2, a lead reported as +242 points (the two scores as published differ by 241, so the figures were presumably rounded). Arena described it as the largest gap between first and second place ever recorded on the leaderboard.
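To give a sense of scale, a rating gap can be translated into an expected head-to-head preference rate. The sketch below assumes Arena scores behave like Elo ratings (a logistic curve with base 10 and a scale of 400). The article does not state the scoring formula, so treat both the formula and the resulting figure as illustrative.

```python
def expected_win_rate(score_a: float, score_b: float, scale: float = 400.0) -> float:
    """Expected share of pairwise votes won by A under an Elo-style logistic model.

    The Elo-style formula is an assumption about how Arena scores map to votes.
    """
    return 1.0 / (1.0 + 10 ** ((score_b - score_a) / scale))


# Scores as quoted in the article.
gap = 1512 - 1271
print(gap)
print(round(expected_win_rate(1512, 1271), 2))
```

Under that assumption, the gap would correspond to the leader being preferred in roughly four out of five direct comparisons with the runner-up.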

The lead held across all seven of Arena's Text-to-Image sub-categories, with particularly large gains over its predecessor, gpt-image-1.5. Those gains ranged from +197 points in Art to +316 points in Text Rendering, with Cartoon/Anime/Fantasy and Portraits both registering +296 point improvements.

API Pricing

Developers can access the model via the API under the name gpt-image-2. OpenAI charges on a token basis: $8 per million image input tokens and $30 per million image output tokens. Text tokens cost $5 (input) and $10 (output) per million.
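The token rates above translate into a per-request cost with simple arithmetic. A minimal sketch follows; the token counts in the example are hypothetical and chosen only to illustrate the calculation, not measured from the API.

```python
# Per-million-token prices quoted in the article (USD).
PRICES_PER_MILLION = {
    "image_input": 8.00,
    "image_output": 30.00,
    "text_input": 5.00,
    "text_output": 10.00,
}


def estimate_cost(token_counts: dict) -> float:
    """Sum the cost in USD for a mapping of token category to token count."""
    total = 0.0
    for category, count in token_counts.items():
        if category not in PRICES_PER_MILLION:
            raise KeyError(f"unknown token category: {category}")
        total += count * PRICES_PER_MILLION[category] / 1_000_000
    return total


# Hypothetical request: a 200-token text prompt producing 5,000 image output tokens.
usage = {"text_input": 200, "image_output": 5_000}
print(f"${estimate_cost(usage):.4f}")
```

In this hypothetical case the image output tokens account for nearly all of the cost, which is why output-heavy settings such as higher quality tend to matter most for budgeting.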

According to OpenAI's pricing overview, a 1024 x 1024 image costs $0.006 at low quality, $0.053 at medium quality, and $0.211 at high quality. The larger 1024 x 1536 size is listed at slightly lower prices: $0.005, $0.041, and $0.165, respectively.
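For batch planning, the per-image figures can be put in a small lookup table. This is a sketch using only the two sizes and three quality levels quoted in the article; batch_cost is a hypothetical helper, and actual billing is token-based, so real charges may differ from these estimates.

```python
# Per-image prices quoted in the article (USD), keyed by size and then quality.
PER_IMAGE_PRICE = {
    "1024x1024": {"low": 0.006, "medium": 0.053, "high": 0.211},
    "1024x1536": {"low": 0.005, "medium": 0.041, "high": 0.165},
}


def batch_cost(size: str, quality: str, count: int) -> float:
    """Estimate the USD cost of generating `count` images at a given size and quality."""
    if count < 0:
        raise ValueError("count must be non-negative")
    try:
        unit_price = PER_IMAGE_PRICE[size][quality]
    except KeyError:
        raise ValueError(f"no quoted price for size={size!r}, quality={quality!r}")
    return unit_price * count


# Example: a campaign of 500 images at two different settings.
print(f"${batch_cost('1024x1024', 'medium', 500):.2f}")
print(f"${batch_cost('1024x1536', 'high', 500):.2f}")
```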

Implications for Marketing and Creative Workflows

For digital marketers and creative teams building production workflows, gpt-image-2's improvements in text rendering and multilingual output are the most practically significant changes. The ability to generate ad creative, infographics, and branded visuals with accurate in-image text, including non-Latin scripts, removes one of the most persistent manual correction steps in AI-assisted content production. The tiered access structure means developers evaluating the model for production use should account for the distinction between standard and thinking mode outputs, as thinking mode capabilities carry additional token costs that will vary by prompt complexity.

The model is already being integrated by downstream tools, including Figma, Canva, Firefly, fal, and Hermes Agent. OpenAI highlights localized advertising, infographics, educational content, design tools, and creative platforms as its target use cases for the model.
