Models reference
GPT-5.4, GPT-5.5, Codex-5.3, GPT-5.4-mini, Fast and Priority tiers — cost coefficients, reasoning effort levels, selection guide.
Summary table
| Model | Purpose | Cost coefficient | Context window (tokens) |
|---|---|---|---|
| gpt-5.4 | Base GPT-5, balanced default | 1.0× | 256k |
| gpt-5.5 | Extended, better for long-context tasks | 1.4× | 400k |
| gpt-5.4-mini | Cheap and fast, for classification | 0.2× | 128k |
| codex-5.3 | Code-specialized for Codex CLI | 1.1× | 256k |
The coefficient multiplies the base OpenAI token price, and the result is then multiplied by the 1.09 margin: billed price = base price × coefficient × 1.09.
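A minimal Python sketch of the pricing formula, assuming the coefficient and the 1.09 margin are both applied multiplicatively to a base price you supply (the base OpenAI prices themselves are not listed on this page). The function name `effective_price` is hypothetical.

```python
# Sketch: billed price = base OpenAI price × model coefficient × 1.09 margin.
# Coefficients mirror the summary table; base prices are caller-supplied inputs.
MARGIN = 1.09

COEFFICIENTS = {
    "gpt-5.4": 1.0,
    "gpt-5.5": 1.4,
    "gpt-5.4-mini": 0.2,
    "codex-5.3": 1.1,
}


def effective_price(base_price: float, model: str) -> float:
    """Return the billed price for one pricing unit (e.g. 1M tokens) of `model`.

    `base_price` is the OpenAI list price for the same unit; it is an input
    here, not a value taken from this page.
    """
    if model not in COEFFICIENTS:
        raise ValueError(f"unknown model: {model!r}")
    return base_price * COEFFICIENTS[model] * MARGIN
```

For a hypothetical base price of 10.0 per 1M tokens, `effective_price(10.0, "gpt-5.5")` returns approximately 15.26 (10 × 1.4 × 1.09).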
GPT-5.4
The default general-purpose model, suited to chat, content generation, and simple reasoning. Pick it when unsure: it is the balanced default between price and quality.
```json
{"model": "gpt-5.4", "messages": [...]}
```
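The same request can be built from Python with only the standard library. This is a sketch under stated assumptions: the base URL is a placeholder, and the OpenAI-compatible path `/v1/chat/completions` with Bearer-token auth is assumed rather than documented on this page. `build_chat_request` is a hypothetical helper; it only constructs the request, and `urllib.request.urlopen(req)` would send it.

```python
import json
import urllib.request

# Placeholder: the gateway's real base URL is not given on this page.
BASE_URL = "https://api.example.com"


def build_chat_request(api_key: str, prompt: str, model: str = "gpt-5.4") -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request.

    Assumes an OpenAI-compatible /v1/chat/completions path and Bearer auth.
    """
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```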
GPT-5.5
Extended version with larger context and deeper reasoning. Use it for:
- long documents (>100k tokens),
- complex code analysis,
- multi-step reasoning,
- math and logic problems.
It costs 40% more than gpt-5.4 (1.4× vs. 1.0×), but pays off when gpt-5.4's results fall short on these tasks.
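To apply the >100k-token guideline automatically, a rough router could estimate prompt size and switch to gpt-5.5 when the input is long or would not fit in gpt-5.4's window. This sketch assumes about 4 characters per token (a crude heuristic; a real tokenizer is more accurate), reads "256k"/"400k" as 256,000/400,000 tokens, and reserves a hypothetical 8,000 tokens for the answer. `pick_long_context_model` is a hypothetical helper.

```python
# Crude heuristic (assumption): roughly 4 characters per token for English text.
CHARS_PER_TOKEN = 4

# Context windows from the summary table, treating "k" as 1,000 (assumption).
CONTEXT_WINDOW = {"gpt-5.4": 256_000, "gpt-5.5": 400_000}
LONG_DOC_THRESHOLD = 100_000  # tokens, from the ">100k tokens" guideline


def estimate_tokens(text: str) -> int:
    # Ceiling division so short non-empty texts count as at least 1 token.
    return -(-len(text) // CHARS_PER_TOKEN)


def pick_long_context_model(text: str, reserved_output: int = 8_000) -> str:
    """Return "gpt-5.4" for ordinary inputs and "gpt-5.5" for long ones.

    Raises ValueError when input plus reserved output exceeds gpt-5.5's window.
    """
    prompt_tokens = estimate_tokens(text)
    needed = prompt_tokens + reserved_output
    if needed > CONTEXT_WINDOW["gpt-5.5"]:
        raise ValueError(f"~{needed} tokens exceeds the 400k window; split the input")
    if prompt_tokens > LONG_DOC_THRESHOLD or needed > CONTEXT_WINDOW["gpt-5.4"]:
        return "gpt-5.5"
    return "gpt-5.4"
```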
GPT-5.4-mini
The cheapest model. Suitable for:
- classification, tagging, labeling,
- structured-data extraction from text,
- simple fallback flows,
- embedding-alternative use cases.
Do not use it for long-form generation or code: output quality drops noticeably.
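For classification, here is a sketch of a gpt-5.4-mini request body plus a defensive parser: the prompt constrains the reply to a fixed label set, and any other reply maps to a fallback label. The label set, the system-prompt wording, and both helper names are hypothetical.

```python
# Hypothetical label set for a support-ticket classifier.
LABELS = ("billing", "bug_report", "feature_request", "other")


def classification_payload(text: str) -> dict:
    """Chat request body for single-label classification with gpt-5.4-mini."""
    return {
        "model": "gpt-5.4-mini",
        "reasoning_effort": "minimal",
        "messages": [
            {
                "role": "system",
                "content": "Classify the user message. Reply with exactly one of: "
                + ", ".join(LABELS),
            },
            {"role": "user", "content": text},
        ],
    }


def parse_label(reply: str, fallback: str = "other") -> str:
    """Normalize the model's reply; return `fallback` if it is not a known label."""
    label = reply.strip().strip(".").lower()
    return label if label in LABELS else fallback
```

Validating the reply in code keeps a cheap model usable even when it occasionally answers in free text.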
Codex-5.3
Specialized for Codex CLI and code generation. Supports the Responses API (POST /v1/responses) for interactive agent sessions. Pick it for:
- IDE code completion,
- patch/diff generation,
- agent loops with tool use.
```shell
codex --model codex-5.3 "Generate a REST API in FastAPI"
```
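A sketch of a Responses API call for codex-5.3, built with the Python standard library. The base URL is a placeholder and Bearer-token auth is assumed. The `input` and `previous_response_id` fields follow OpenAI's Responses API; whether this gateway passes `previous_response_id` through for multi-turn agent sessions is an assumption. `build_responses_request` is a hypothetical helper that only constructs the request.

```python
import json
import urllib.request
from typing import Optional

# Placeholder: the gateway's real base URL is not given on this page.
BASE_URL = "https://api.example.com"


def build_responses_request(
    api_key: str, prompt: str, previous_response_id: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/responses request for codex-5.3."""
    body = {"model": "codex-5.3", "input": prompt}
    if previous_response_id is not None:
        # Continue an earlier turn of an agent session (assumed to be supported).
        body["previous_response_id"] = previous_response_id
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/responses",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```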
Tiers: Fast and Priority
You can choose a processing tier with the service_tier request parameter:
- default: standard queue.
- priority: higher priority and lower latency, +30% price.
- flex: batched processing, −40% price, latency of up to several minutes.
```json
{"model": "gpt-5.4", "service_tier": "priority", "messages": [...]}
```
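A small sketch of how the tier adjustments translate into cost, assuming the +30% and −40% apply multiplicatively to the whole default-tier price of a request. `tier_cost` and `choose_tier` are hypothetical helpers; `choose_tier` is a rule of thumb, not part of the API.

```python
# Price multipliers per service_tier, from the list above.
TIER_MULTIPLIER = {"default": 1.0, "priority": 1.3, "flex": 0.6}


def tier_cost(default_cost: float, service_tier: str = "default") -> float:
    """Cost of a request under `service_tier`, given its default-tier cost."""
    if service_tier not in TIER_MULTIPLIER:
        raise ValueError(f"unsupported service_tier: {service_tier!r}")
    return default_cost * TIER_MULTIPLIER[service_tier]


def choose_tier(latency_sensitive: bool, can_wait_minutes: bool) -> str:
    """Rule of thumb: pay for priority only when latency matters;
    use flex for work that can tolerate multi-minute delays."""
    if latency_sensitive:
        return "priority"
    if can_wait_minutes:
        return "flex"
    return "default"
```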
Reasoning effort
GPT-5.x models accept a reasoning_effort parameter that controls internal reasoning depth:
| Value | Output-token multiplier | When to use |
|---|---|---|
| minimal | 1.0× | Plain answers, chat, classification |
| low | 1.5× | Basic reasoning, typical tasks |
| medium | 2.5× | Complex analysis, multi-step logic |
| high | 4.0× | Math, proof checking, debugging |
```json
{
  "model": "gpt-5.4",
  "reasoning_effort": "medium",
  "messages": [{"role": "user", "content": "Solve this equation..."}]
}
```
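To budget for reasoning effort, the table's multipliers can be turned into a rough output-token estimate. This sketch assumes the multiplier scales the visible answer length to cover internal reasoning tokens; actual usage varies per request. `estimated_output_tokens` is a hypothetical helper.

```python
# Output-token multipliers from the reasoning effort table.
EFFORT_MULTIPLIER = {"minimal": 1.0, "low": 1.5, "medium": 2.5, "high": 4.0}


def estimated_output_tokens(visible_tokens: int, effort: str = "minimal") -> int:
    """Approximate billed output tokens for an answer of `visible_tokens` length."""
    if effort not in EFFORT_MULTIPLIER:
        raise ValueError(f"unknown reasoning_effort: {effort!r}")
    return round(visible_tokens * EFFORT_MULTIPLIER[effort])


for level in EFFORT_MULTIPLIER:
    print(level, estimated_output_tokens(800, level))
```

For an 800-token answer it prints 800, 1200, 2000 and 3200 tokens for minimal, low, medium and high.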
Higher effort spends more internal reasoning tokens, which makes the request more expensive. Start with minimal and raise the level only if response quality is insufficient.
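The "start with minimal" advice can be automated as an escalation loop. In this sketch `call_model` and `is_good_enough` are hypothetical caller-supplied hooks: the first sends the request with the given `reasoning_effort` and returns the answer text, the second is whatever quality check fits your task.

```python
from typing import Callable, Tuple

# Cheapest to most expensive, per the reasoning effort table.
EFFORT_LEVELS = ("minimal", "low", "medium", "high")


def answer_with_escalation(
    call_model: Callable[[str], str],
    is_good_enough: Callable[[str], bool],
) -> Tuple[str, str]:
    """Try each reasoning_effort in order until the answer passes the check.

    Returns (answer, effort_used). If no level passes, returns the "high" answer.
    """
    answer = ""
    for effort in EFFORT_LEVELS:
        answer = call_model(effort)
        if is_good_enough(answer):
            return answer, effort
    return answer, EFFORT_LEVELS[-1]
```

Every failed attempt is still billed, so the worst case pays for all four levels; the pattern fits workloads where most requests pass at minimal or low.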
How to choose
- General-purpose chat / agent → gpt-5.4 + minimal.
- Long PDF / large-repo analysis → gpt-5.5 + medium.
- Codex CLI / IDE assistant → codex-5.3.
- Bulk classification → gpt-5.4-mini + minimal.
- Production agent with tool use → codex-5.3 + low/medium.
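The guide above can be encoded as a lookup table so that request parameters come from one place. The task keys and the helper name `request_params` are hypothetical; unknown tasks fall back to gpt-5.4 + minimal, and the tool-use agent starts at low.

```python
from typing import NamedTuple, Optional


class Choice(NamedTuple):
    model: str
    reasoning_effort: Optional[str]  # None: the guide does not specify a level


# Hypothetical task keys; values mirror the "How to choose" list.
GUIDE = {
    "chat": Choice("gpt-5.4", "minimal"),
    "long_document": Choice("gpt-5.5", "medium"),
    "ide_assistant": Choice("codex-5.3", None),
    "bulk_classification": Choice("gpt-5.4-mini", "minimal"),
    "tool_agent": Choice("codex-5.3", "low"),  # raise to "medium" if quality falls short
}


def request_params(task: str) -> dict:
    """Model and reasoning_effort for `task`, defaulting to general-purpose chat."""
    choice = GUIDE.get(task, GUIDE["chat"])
    params = {"model": choice.model}
    if choice.reasoning_effort is not None:
        params["reasoning_effort"] = choice.reasoning_effort
    return params
```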
See Billing for token-cost details.