Models reference

GPT-5.4, GPT-5.5, Codex-5.3, GPT-5.4-mini, Fast and Priority tiers — cost coefficients, reasoning effort levels, selection guide.

May 19, 2026

Summary table

Model	Purpose	Cost coefficient	Context
`gpt-5.4`	Base GPT-5, balanced default	1.0×	256k
`gpt-5.5`	Extended, better for long-context tasks	1.4×	400k
`gpt-5.4-mini`	Cheap and fast, for classification	0.2×	128k
`codex-5.3`	Code-specialized for Codex CLI	1.1×	256k

The billed price per token is the base OpenAI token price multiplied by the model's cost coefficient and then by the 1.09 margin.
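As a rough illustration of the pricing rule, the sketch below multiplies a caller-supplied base price by the coefficient from the table and by the 1.09 margin. The function name `billed_price` and the `base_price` argument are hypothetical. This page does not list base OpenAI prices, so the caller has to provide one.

```python
# Cost coefficients copied from the summary table.
COST_COEFFICIENT = {
    "gpt-5.4": 1.0,
    "gpt-5.5": 1.4,
    "gpt-5.4-mini": 0.2,
    "codex-5.3": 1.1,
}

# Margin stated in the text.
MARGIN = 1.09


def billed_price(model: str, base_price: float) -> float:
    """Scale a base token price by the model coefficient and the margin.

    base_price is an assumption supplied by the caller (any currency unit,
    any per-token or per-million-token basis); it is not defined on this page.
    """
    if model not in COST_COEFFICIENT:
        raise ValueError(f"unknown model: {model}")
    return base_price * COST_COEFFICIENT[model] * MARGIN
```

For a hypothetical base price of 10.0, `billed_price("gpt-5.5", 10.0)` comes to about 15.26 (10.0 × 1.4 × 1.09).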

GPT-5.4

The default general-purpose model. Good for chat, content generation, simple reasoning. Pick it when unsure — it offers the best price/quality trade-off.

{"model": "gpt-5.4", "messages": [...]}

GPT-5.5

Extended version with larger context and deeper reasoning. Use it for:

long documents (>100k tokens),
complex code analysis,
multi-step reasoning,
math and logic problems.

It costs 40% more than gpt-5.4 (coefficient 1.4× versus 1.0×), but pays off when gpt-5.4 underperforms.
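The long-document rule can be sketched as a simple token-count check. The helper name `pick_general_model` is hypothetical, and the sketch assumes that "400k" in the table means 400,000 tokens and that the caller already has a token count for the prompt.

```python
# Threshold from the "long documents (>100k tokens)" guidance.
LONG_DOCUMENT_THRESHOLD = 100_000

# Assumption: "400k" in the summary table is read as 400,000 tokens.
GPT_5_5_CONTEXT_TOKENS = 400_000


def pick_general_model(prompt_tokens: int) -> str:
    """Return gpt-5.5 for long prompts and gpt-5.4 otherwise."""
    if prompt_tokens > GPT_5_5_CONTEXT_TOKENS:
        raise ValueError("prompt exceeds the largest listed context window")
    if prompt_tokens > LONG_DOCUMENT_THRESHOLD:
        return "gpt-5.5"
    return "gpt-5.4"
```

This only covers the document-length criterion. Complex code analysis or multi-step reasoning may justify gpt-5.5 even for short prompts.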

GPT-5.4-mini

The cheapest model. Suitable for:

classification, tagging, labeling,
structured-data extraction from text,
simple fallback flows,
embedding-alternative use cases.

Avoid it for long-form generation and for code: output quality drops on both.

Codex-5.3

Specialized for Codex CLI and code generation. Supports the Responses API (POST /v1/responses) for interactive agent sessions. Pick it for:

IDE code completion,
patch/diff generation,
agent loops with tool use.

codex --model codex-5.3 "Generate a REST API in FastAPI"

Tiers: Fast and Priority

You can choose a processing tier with the service_tier request parameter:

default — standard queue.
priority — higher priority, lower latency, +30% price.
flex — batched processing, −40% price, latency up to several minutes.

{"model": "gpt-5.4", "service_tier": "priority", "messages": [...]}
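To make the tier surcharges concrete, the sketch below applies the listed adjustments (+30% for priority, −40% for flex) to a price. The function name `tier_adjusted_price` is hypothetical. It assumes the adjustment simply multiplies whatever price would otherwise apply, since this page does not say how tiers combine with the cost coefficient and margin.

```python
# Price factors derived from the tier list: +30% and -40%.
TIER_PRICE_FACTOR = {
    "default": 1.0,
    "priority": 1.3,
    "flex": 0.6,
}


def tier_adjusted_price(price: float, service_tier: str = "default") -> float:
    """Apply the tier surcharge or discount to a price.

    Assumption: the tier factor is multiplicative on top of the
    otherwise applicable price.
    """
    if service_tier not in TIER_PRICE_FACTOR:
        raise ValueError(f"unknown service_tier: {service_tier}")
    return price * TIER_PRICE_FACTOR[service_tier]
```

Under that assumption, a request that would cost 100 units on the default tier costs about 130 on priority and about 60 on flex.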

Reasoning effort

GPT-5.x models accept a reasoning_effort parameter that controls internal reasoning depth:

Value	Output-token multiplier	When to use
`minimal`	1.0×	Plain answers, chat, classification
`low`	1.5×	Basic reasoning, typical tasks
`medium`	2.5×	Complex analysis, multi-step logic
`high`	4.0×	Math, proof checking, debugging

{
  "model": "gpt-5.4",
  "reasoning_effort": "medium",
  "messages": [{"role": "user", "content": "Solve this equation..."}]
}

Higher effort means more internal reasoning tokens spent — and a more expensive request. Start with minimal and increase only if the response quality is insufficient.
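The "start with minimal and increase" advice could be implemented as an escalation loop along these lines. Both callables are hypothetical placeholders: `send` stands for whatever code issues the request at a given effort level, and `is_good_enough` stands for an application-specific quality check. Neither is part of the API described here.

```python
from typing import Callable, Tuple

# Effort levels in ascending cost order, as listed in the table.
EFFORT_LEVELS = ["minimal", "low", "medium", "high"]


def escalate_effort(
    send: Callable[[str], str],
    is_good_enough: Callable[[str], bool],
) -> Tuple[str, str]:
    """Try each effort level in order and stop at the first acceptable answer.

    Returns the effort level used and the answer. If no level passes the
    check, the answer produced at "high" is returned.
    """
    answer = ""
    for effort in EFFORT_LEVELS:
        answer = send(effort)
        if is_good_enough(answer):
            return effort, answer
    return EFFORT_LEVELS[-1], answer
```

Each failed attempt is still billed, so a loop like this makes sense mainly when most requests succeed at a low level. For workloads that regularly need `medium` or `high`, starting there directly is likely cheaper.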

How to choose

General-purpose chat / agent → gpt-5.4 + minimal.
Long PDF / large-repo analysis → gpt-5.5 + medium.
Codex CLI / IDE assistant → codex-5.3.
Bulk classification → gpt-5.4-mini + minimal.
Production agent with tool use → codex-5.3 + low/medium.
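The selection guide above can also be kept as a lookup table in application code. The use-case keys and the `recommend` function are hypothetical names chosen for this sketch. For the production agent, where the guide allows low or medium, the sketch starts from `low`.

```python
from typing import Dict, Optional, Tuple

# (model, reasoning effort) per use case, following the guide above.
# None means the guide does not name an effort level for that case.
RECOMMENDED: Dict[str, Tuple[str, Optional[str]]] = {
    "general_chat": ("gpt-5.4", "minimal"),
    "long_document": ("gpt-5.5", "medium"),
    "codex_cli": ("codex-5.3", None),
    "bulk_classification": ("gpt-5.4-mini", "minimal"),
    "production_agent": ("codex-5.3", "low"),  # guide allows low or medium
}


def recommend(use_case: str) -> Dict[str, str]:
    """Return request fields for a use case: model, plus effort if one is given."""
    if use_case not in RECOMMENDED:
        raise ValueError(f"unknown use case: {use_case}")
    model, effort = RECOMMENDED[use_case]
    fields = {"model": model}
    if effort is not None:
        fields["reasoning_effort"] = effort
    return fields
```

The returned dictionary can be merged into a request body alongside the messages or input for the call.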

See Billing for token-cost details.