Streaming Output
TTToken relays upstream events as a true stream: data is forwarded to you the moment it reaches the gateway, with no buffering or reordering in between. This means the protocol format you see is exactly the official OpenAI / Anthropic / Google format.
Parsing SSE Events
OpenAI / Claude return text/event-stream by default. Clients just need to parse per the SSE spec: events are separated by two newlines (\n\n).
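As a quick sketch of that rule, an SSE stream can be split into events on blank lines and the data payloads collected (the sample input below is hand-written, not real gateway output):

```python
def parse_sse(raw: str) -> list[str]:
    """Split an SSE stream into data payloads.
    Events are separated by a blank line (\\n\\n); each 'data: ' line
    carries the payload (multi-line data is joined with \\n per the spec)."""
    payloads = []
    for block in raw.split("\n\n"):
        data_lines = [line[len("data: "):] for line in block.split("\n")
                      if line.startswith("data: ")]
        if data_lines:
            payloads.append("\n".join(data_lines))
    return payloads

sample = (
    'data: {"delta":{"content":"Hel"}}\n\n'
    'data: {"delta":{"content":"lo"}}\n\n'
    'data: [DONE]\n\n'
)
print(parse_sse(sample))  # last payload is '[DONE]'
```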
OpenAI format
data: {"id":"chatcmpl-1","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant","content":""}}]}
data: {"id":"chatcmpl-1","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"你"}}]}
data: {"id":"chatcmpl-1","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"好"}}]}
data: {"id":"chatcmpl-1","object":"chat.completion.chunk","choices":[{"index":0,"finish_reason":"stop","delta":{}}]}
data: [DONE]
Claude format
Anthropic's SSE uses the event: field to distinguish event types. The key events, in order:
- message_start: message metadata
- content_block_start: a new block begins (text / thinking / tool_use)
- content_block_delta: incremental text
- content_block_stop
- message_delta: final stop_reason and usage
- message_stop
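A minimal dispatcher over those event types might look like this (the events list is a hand-written stand-in for a parsed Anthropic stream, not real output):

```python
def handle_event(event_type: str, data: dict, text_parts: list) -> None:
    """Collect text deltas; other event types carry metadata only."""
    if event_type == "content_block_delta" and data["delta"].get("type") == "text_delta":
        text_parts.append(data["delta"]["text"])
    elif event_type == "message_delta":
        # final stop_reason and usage arrive here
        print("stop_reason:", data["delta"].get("stop_reason"))

# Hand-written events following the order described above
events = [
    ("message_start",       {"message": {"id": "msg_1"}}),
    ("content_block_start", {"index": 0, "content_block": {"type": "text"}}),
    ("content_block_delta", {"delta": {"type": "text_delta", "text": "Hello"}}),
    ("content_block_stop",  {"index": 0}),
    ("message_delta",       {"delta": {"stop_reason": "end_turn"}}),
    ("message_stop",        {}),
]
parts: list = []
for etype, data in events:
    handle_event(etype, data, parts)
print("".join(parts))  # Hello
```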
Gemini format
Default: NDJSON (one JSON object per line, no data: prefix).
Pass ?alt=sse to get SSE-style output, matching the behavior of the official Google SDKs.
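Since NDJSON carries one complete JSON object per line, consuming it is just line-by-line json.loads. A sketch (the sample lines are fabricated with the minimal fields a Gemini chunk contains; real chunks carry more):

```python
import json

def parse_ndjson(raw: str) -> list[dict]:
    """NDJSON: one JSON object per line, no 'data: ' prefix."""
    return [json.loads(line) for line in raw.splitlines() if line.strip()]

sample = (
    '{"candidates":[{"content":{"parts":[{"text":"Hel"}]}}]}\n'
    '{"candidates":[{"content":{"parts":[{"text":"lo"}]}}]}\n'
)
text = "".join(
    chunk["candidates"][0]["content"]["parts"][0]["text"]
    for chunk in parse_ndjson(sample)
)
print(text)  # Hello
```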
Python consumption examples
from openai import OpenAI

client = OpenAI(base_url="https://tttoken.xyz/v1", api_key=KEY)  # KEY: your TTToken key

stream = client.chat.completions.create(
    model="gpt-4o",
    stream=True,
    messages=[{"role": "user", "content": "Tell me a joke"}],
)
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
from anthropic import Anthropic

client = Anthropic(api_key=KEY)  # pass base_url here to point at the gateway

with client.messages.stream(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a joke"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
import httpx, json

with httpx.stream(
    "POST",
    "https://tttoken.xyz/v1/chat/completions",
    headers={"Authorization": f"Bearer {KEY}"},
    json={"model": "gpt-4o", "stream": True,
          "messages": [{"role": "user", "content": "Hi"}]},
    timeout=None,
) as r:
    for line in r.iter_lines():
        if not line or not line.startswith("data: "):
            continue
        payload = line[6:]
        if payload == "[DONE]":
            break
        data = json.loads(payload)
        print(data["choices"][0]["delta"].get("content", ""), end="")
Debugging with curl
With -N (which disables output buffering) you can watch the stream in real time:
curl -N https://tttoken.xyz/v1/chat/completions \
  -H "Authorization: Bearer $TTT_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "stream": true,
    "messages": [{"role":"user","content":"Count from 1 to 10"}]
  }'
Errors mid-stream
If an error occurs after the stream has started, TTToken sends a special event and then closes the connection:
OpenAI style
data: {"error":{"message":"upstream_timeout","type":"server_error","code":504}}
data: [DONE]
Claude style
event: error
data: {"type":"error","error":{"type":"overloaded_error","message":"..."}}
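A sketch of checking for these in-band errors before touching choices (the payload strings are copied from the examples above):

```python
import json

def classify(payload: str):
    """Classify one OpenAI-style data payload.
    HTTP status is no help here: once the response headers are sent
    the response is already a 200, so errors arrive inside the stream."""
    if payload == "[DONE]":
        return ("done", None)
    data = json.loads(payload)
    if "error" in data:
        return ("error", data["error"])
    return ("delta", data["choices"][0]["delta"].get("content", ""))

kind, err = classify('{"error":{"message":"upstream_timeout","type":"server_error","code":504}}')
print(kind, err["code"])  # error 504
```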
Client best practices
- JSON.parse each event first; if an error field is present, treat the request as failed.
- Don't rely on the HTTP status code (once the response headers are sent, it is already 200).
- Record the id from message_start / chatcmpl chunks for troubleshooting (together with the X-Request-Id response header).
- If the connection drops before [DONE] or message_stop is read, retry under an idempotency policy.
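A sketch of the last point. stream_once is a hypothetical placeholder for any of the consumers above; it is assumed to return (chunks, finished), with finished set only once [DONE] / message_stop was read:

```python
import time

def stream_with_retry(stream_once, max_retries: int = 3):
    """Retry a dropped stream, assuming the request is idempotent.
    Backs off exponentially between attempts."""
    for attempt in range(max_retries):
        try:
            chunks, finished = stream_once()
            if finished:            # terminal event was seen
                return chunks
        except ConnectionError:
            pass                    # connection dropped mid-stream
        time.sleep(2 ** attempt)    # exponential backoff
    raise RuntimeError("stream never reached its terminal event")

# Toy consumer that drops once, then completes
state = {"calls": 0}
def flaky():
    state["calls"] += 1
    if state["calls"] == 1:
        raise ConnectionError("dropped mid-stream")
    return (["Hello"], True)

print(stream_with_retry(flaky))  # ['Hello']
```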