BILLING

Codex Key token economy: what's counted and how to pay less

How Codex Key meters tokens, what the tariff coefficients mean, which optimizations actually cut your bill, and which are myths.

May 19, 2026·billing · tokens · optimization · pricing

Short version: you pay for tokens, not for requests or minutes. This post unpacks how each request is counted and where the actual savings hide.

What a token even is

A token is a chunk of text: the tokenizer splits both your input and the model's output into tokens. English text: ~1 token per 4 characters. Russian: ~1 token per 2-3 characters. Code: more tokens per character than English prose, usually ~1 token per 3-4 characters.

Rough planning numbers:

One A4 page of prose ≈ 400-500 tokens
100 lines of Python ≈ 800-1200 tokens
One SWE-bench ticket (input + output) ≈ 15-40k tokens
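For quick planning, the character ratios above can be turned into a back-of-the-envelope estimator. This is a minimal sketch, not a tokenizer: the function name and the midpoint ratios (2.5 for Russian, 3.5 for code) are assumptions picked from the ranges quoted here, and real counts will differ by model.

```python
import math

# Approximate characters per token, taken from the ranges quoted in the text.
# Midpoints are an assumption made for illustration only.
CHARS_PER_TOKEN = {
    "english": 4.0,
    "russian": 2.5,
    "code": 3.5,
}


def estimate_tokens(text: str, kind: str = "english") -> int:
    """Estimate the token count of `text` from its length in characters."""
    ratio = CHARS_PER_TOKEN[kind]
    # Round up so any non-empty string counts as at least one token.
    return math.ceil(len(text) / ratio)


print(estimate_tokens("a" * 400))  # 100
```

Treat the result as an order-of-magnitude figure. For billing disputes, rely on the token counts reported for the request itself.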

Codex Key billing formula

billed_tokens = (input_tokens + output_tokens) × model_coef × mode_coef

Multiplier	Values
`model_coef`	`codex-5.3` ×0.9 · `gpt-5.4` ×1.0 · `gpt-5.5` ×4.5
`mode_coef`	`standard` ×1.0 · `fast` ×2.0 · `priority` ×2.0

Example. A gpt-5.5 call in Priority with 3000 input and 800 output tokens:

(3000 + 800) × 4.5 × 2.0 = 34,200 billed tokens

On the Team plan (~3.4B tokens for $90) that's ~$0.001 per request.
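The formula and the worked example can be reproduced in a few lines. The coefficients come from the table above; the function names are made up for this sketch, and the dollar figure assumes the Team plan numbers quoted here (~3.4B tokens for $90).

```python
# Coefficients as listed in the tariff table above.
MODEL_COEF = {"codex-5.3": 0.9, "gpt-5.4": 1.0, "gpt-5.5": 4.5}
MODE_COEF = {"standard": 1.0, "fast": 2.0, "priority": 2.0}


def billed_tokens(input_tokens: int, output_tokens: int,
                  model: str, mode: str = "standard") -> float:
    """billed_tokens = (input + output) * model_coef * mode_coef."""
    return (input_tokens + output_tokens) * MODEL_COEF[model] * MODE_COEF[mode]


def cost_usd(billed: float, plan_tokens: float = 3.4e9,
             plan_price_usd: float = 90.0) -> float:
    """Dollar cost of `billed` tokens at the plan's average price per token."""
    return billed * plan_price_usd / plan_tokens


billed = billed_tokens(3000, 800, "gpt-5.5", "priority")
print(billed)                      # 34200.0
print(round(cost_usd(billed), 4))  # 0.0009
```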

What actually cuts your bill

1. Right model per task (up to ×5 savings)

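Moving all traffic from gpt-5.5 to gpt-5.4 cuts your bill 4.5×. Moving 80% of it cuts the bill roughly 2.6× (0.2 × 4.5 + 0.8 × 1.0 = 1.7, against 4.5 before), assuming similar token counts per request. Only escalate to gpt-5.5 where the quality delta is visible.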
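How much a partial migration saves depends on the share of traffic you move. Here is a sketch of that arithmetic, assuming every request carries about the same number of tokens and that all traffic started on gpt-5.5. The helper name is hypothetical.

```python
MODEL_COEF = {"codex-5.3": 0.9, "gpt-5.4": 1.0, "gpt-5.5": 4.5}


def blended_coef(traffic_share: dict) -> float:
    """Average model coefficient for a traffic mix given as {model: share}."""
    if abs(sum(traffic_share.values()) - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    return sum(share * MODEL_COEF[model] for model, share in traffic_share.items())


before = blended_coef({"gpt-5.5": 1.0})
after_80 = blended_coef({"gpt-5.5": 0.2, "gpt-5.4": 0.8})
after_all = blended_coef({"gpt-5.4": 1.0})

print(round(before / after_80, 2))   # 2.65
print(round(before / after_all, 2))  # 4.5
```

Under these assumptions the full 4.5× only shows up when everything moves. The 20% left on gpt-5.5 still accounts for more than half of the new bill.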

2. Short system prompts (×1.3-2.0)

A long system prompt ships with every request. 2,000 system tokens × 100 requests = 200k tokens before the user typed anything. Trim to 500 — save 150k.
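The same arithmetic as a sketch, plus a per-request view. The 1,200 tokens of user input and output per request are an assumption chosen for illustration; plug in your own numbers.

```python
def system_prompt_tokens(system_tokens: int, requests: int) -> int:
    """Input tokens spent on the system prompt alone across `requests` calls."""
    return system_tokens * requests


def per_request_ratio(old_system: int, new_system: int, other_tokens: int) -> float:
    """How many times smaller one request gets after trimming the system prompt."""
    return (old_system + other_tokens) / (new_system + other_tokens)


saved = system_prompt_tokens(2000, 100) - system_prompt_tokens(500, 100)
print(saved)  # 150000

# Assumed: 1,200 tokens of user input and output per request.
print(round(per_request_ratio(2000, 500, other_tokens=1200), 2))  # 1.88
```

The shorter the rest of the request, the more the system prompt dominates and the closer you get to the upper end of the range.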

3. Truncate history intelligently

Chats by default send the whole transcript. After 20 turns that's 30-50k input tokens. Strategies:

Sliding window of the last N messages
Summarize older turns via gpt-5.4 every N iterations
Tool-aware compaction: drop raw tool outputs after you've used them
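Two of these strategies fit in a few lines each. The sketch below assumes chat messages are dicts with `role` and `content` keys, as in the common chat-completions format. The function names and the placeholder string are made up for illustration.

```python
from typing import Dict, List

Message = Dict[str, str]


def sliding_window(messages: List[Message], n: int) -> List[Message]:
    """Keep every system message plus the last `n` non-system messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + (rest[-n:] if n > 0 else [])


def stub_old_tool_outputs(messages: List[Message], keep_recent: int = 1,
                          placeholder: str = "[tool output dropped]") -> List[Message]:
    """Replace the content of all but the last `keep_recent` tool messages."""
    tool_idx = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    to_stub = set(tool_idx[:-keep_recent] if keep_recent > 0 else tool_idx)
    # Build new dicts for stubbed messages so the caller's history is not mutated.
    return [
        {**m, "content": placeholder} if i in to_stub else m
        for i, m in enumerate(messages)
    ]
```

One caveat: a plain window can cut between a tool call and its result. If your API rejects orphaned tool messages, move the cut to a turn boundary instead.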

4. Stop sequences and `max_tokens`

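from openai import OpenAI  # assumption: the OpenAI Python SDK, pointed at your Codex Key endpoint

client = OpenAI()  # reads the API key (and base URL, if set) from the environment

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[...],           # your conversation goes here
    max_tokens=400,           # cap the answer
    stop=["\n\n---", "</answer>"],  # end generation at either marker
)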

Without max_tokens the model can casually emit 2-3k tokens unprompted.

5. Reasoning effort

reasoning_effort: low produces answers 30-50% shorter than medium. For simple work (classification, short answer) use low.

6. Streaming + early break

If your app can abort the stream once a condition is met (e.g. the closing } of a JSON object), you stop paying for the rest of the output.
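Here is a sketch of the early-break check for the JSON case. It takes any iterable of text chunks, so it is independent of the client library; the function name is hypothetical. It also assumes that closing the underlying stream makes the server stop generating, which is worth confirming for your setup.

```python
from typing import Iterable


def read_until_json_closed(chunks: Iterable[str]) -> str:
    """Collect streamed text and stop once the first top-level JSON object closes."""
    parts = []
    depth = 0          # current nesting level of { }
    in_string = False  # braces inside string literals must not be counted
    escaped = False    # previous character was a backslash inside a string
    for chunk in chunks:
        for i, ch in enumerate(chunk):
            if in_string:
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    # Object complete: keep text up to the brace, stop reading.
                    parts.append(chunk[: i + 1])
                    return "".join(parts)
        parts.append(chunk)
    return "".join(parts)
```

After the function returns, close the stream in whatever way your client provides so the connection does not stay open in the background.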

What does not work

"Prompt compression" via GPT: usually costs more than it saves.
Replacing words with emoji: a single emoji often takes several tokens, so the count tends to go up, not down.
Translating to English: ~20% savings, but the quality loss on Russian domain tasks usually outweighs it. Do the math.

How to inspect billing

In your Codex Key account, the Usage section shows breakdowns by model × mode × day. Each request is recorded with a request_id (also returned in the x-request-id response header). If something looks off, send that ID to support.
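To cross-check the Usage section, you can keep your own per-request log and aggregate it the same way. The sketch below assumes a hypothetical record format (`day`, `model`, `mode`, `input_tokens`, `output_tokens`) that you write yourself; it is not an export format provided by Codex Key.

```python
from collections import defaultdict

MODEL_COEF = {"codex-5.3": 0.9, "gpt-5.4": 1.0, "gpt-5.5": 4.5}
MODE_COEF = {"standard": 1.0, "fast": 2.0, "priority": 2.0}


def usage_breakdown(records):
    """Sum billed tokens per (day, model, mode) from locally logged requests."""
    totals = defaultdict(float)
    for r in records:
        raw = r["input_tokens"] + r["output_tokens"]
        key = (r["day"], r["model"], r["mode"])
        totals[key] += raw * MODEL_COEF[r["model"]] * MODE_COEF[r["mode"]]
    return dict(totals)


# Hypothetical log entries, for illustration only.
log = [
    {"day": "2026-05-18", "model": "gpt-5.5", "mode": "priority",
     "input_tokens": 3000, "output_tokens": 800},
    {"day": "2026-05-18", "model": "codex-5.3", "mode": "standard",
     "input_tokens": 1000, "output_tokens": 0},
]
for key, total in usage_breakdown(log).items():
    print(key, round(total))
```

If you also store the x-request-id next to each record, any mismatch with the Usage section can be traced to specific requests.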

Example: bill refactor on a real team

8-developer team, ~2000 requests/day:

Change	Monthly savings
Moved autocomplete from `gpt-5.4` to `codex-5.3`	~10%
Trimmed system prompt from 1800 to 600 tokens	~22%
Added history summarization in chats > 15 turns	~18%
Set `max_tokens: 600` on classification handlers	~7%
Total	~50%

Dropped from the Team plan to Pro — $360/year saved with no quality regression.
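The rows in the table sum to 57%, while the total reads ~50%. One possible reading, and it is only an assumption, is that each percentage applies to the bill left after the earlier changes, so the savings compound instead of adding up. A sketch of that calculation:

```python
def combined_saving(savings):
    """Total saving when each fractional cut applies to what earlier cuts left."""
    remaining = 1.0
    for s in savings:
        remaining *= 1.0 - s
    return 1.0 - remaining


# Per-change savings from the table: 10%, 22%, 18%, 7%.
print(round(combined_saving([0.10, 0.22, 0.18, 0.07]), 3))  # 0.465
```

Compounded, the four changes come to about 46%, which is in the neighborhood of the ~50% total.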

Bottom line

The biggest lever is picking the right model per task. Second biggest is system prompt and history hygiene. Everything else is tuning at the margins.

Start by labeling the 5 most frequent endpoints in your app: which model, which reasoning_effort, which max_tokens. That gives you 80% of the savings in one evening.
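One way to make those labels explicit is a small table in code that every call site reads from. The endpoint names and values below are hypothetical, and the sketch assumes your client accepts `model`, `reasoning_effort` and `max_tokens` as request parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointProfile:
    model: str
    reasoning_effort: str
    max_tokens: int


# Hypothetical endpoints and settings: replace with your own top five.
PROFILES = {
    "autocomplete": EndpointProfile("codex-5.3", "low", 200),
    "classify_ticket": EndpointProfile("gpt-5.4", "low", 600),
    "chat": EndpointProfile("gpt-5.4", "medium", 1000),
    "code_review": EndpointProfile("gpt-5.5", "medium", 2000),
}


def request_kwargs(endpoint: str) -> dict:
    """Return the per-endpoint settings as keyword arguments for a request."""
    p = PROFILES[endpoint]
    return {
        "model": p.model,
        "reasoning_effort": p.reasoning_effort,
        "max_tokens": p.max_tokens,
    }


print(request_kwargs("autocomplete"))
```

Keeping the labels in one place also makes the expensive choices easy to spot in review: any endpoint on gpt-5.5 should have a reason next to it.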
