Streaming output

Last updated: 2026-04-24 · Estimated reading time: 7 minutes

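TTToken passes upstream events through as a true stream: each chunk is forwarded to you as soon as it reaches the gateway, with no buffering or reordering in between. As a result, the wire format you see is identical to the official OpenAI / Anthropic / Google format.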

SSE event parsing

OpenAI and Claude return text/event-stream by default. Clients can parse it according to the SSE specification: events are separated by a blank line (\n\n).
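The framing rule above can be sketched as a small parser. This is a minimal illustration rather than TTToken's own code: the name parse_sse is hypothetical, it expects an already-decoded text buffer, and it handles only the event: and data: fields (id:, retry:, and comment lines are ignored).

```python
from typing import Iterator, Tuple


def parse_sse(raw: str) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data) pairs from a decoded SSE text buffer.

    Hypothetical helper: handles only the `event:` and `data:` fields.
    """
    # Normalize line endings, then split on the blank line between events.
    for block in raw.replace("\r\n", "\n").split("\n\n"):
        event, data_lines = "message", []  # "message" is the SSE default type
        for line in block.split("\n"):
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                value = line[len("data:"):]
                # The spec strips a single leading space after the colon.
                data_lines.append(value[1:] if value.startswith(" ") else value)
        if data_lines:
            # Several data: lines in one event are joined with newlines.
            yield event, "\n".join(data_lines)
```

A real client would feed this incrementally from the socket and keep any trailing partial event in a buffer until its closing blank line arrives.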

OpenAI format

data: {"id":"chatcmpl-1","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant","content":""}}]}

data: {"id":"chatcmpl-1","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"你"}}]}

data: {"id":"chatcmpl-1","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"好"}}]}

data: {"id":"chatcmpl-1","object":"chat.completion.chunk","choices":[{"index":0,"finish_reason":"stop","delta":{}}]}

data: [DONE]
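To turn a chunk sequence like the one above into a complete reply, a client concatenates each delta.content and remembers the last finish_reason. The sketch below assumes the data: prefix has already been stripped; collect_openai_stream is a hypothetical name, and only the first choice is followed.

```python
import json
from typing import Iterable, Optional, Tuple


def collect_openai_stream(payloads: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Join delta.content pieces; return (text, finish_reason)."""
    parts, finish_reason = [], None
    for payload in payloads:
        if payload == "[DONE]":  # end-of-stream sentinel, not JSON
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) != 0:
                continue  # this sketch only follows the first choice
            # content may be missing or null (role-only and final chunks)
            parts.append(choice.get("delta", {}).get("content") or "")
            finish_reason = choice.get("finish_reason") or finish_reason
    return "".join(parts), finish_reason
```

Fed the four data: payloads shown above followed by [DONE], it returns the text 你好 and the finish_reason "stop".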

Claude format

Anthropic's SSE uses the event: field to distinguish event types. The key events arrive in this order:

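  1. message_start: message metadata
  2. content_block_start: a new block begins (text / thinking / tool_use)
  3. content_block_delta: incremental text
  4. content_block_stop: the current block ends
  5. message_delta: the final stop_reason and usage
  6. message_stop: the message is complete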
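The event order above maps naturally onto a small dispatcher. The following is a minimal sketch, assuming the stream has already been split into (event, data) pairs and that the JSON bodies follow Anthropic's published shapes (index, delta.type == "text_delta", delta.stop_reason). The name collect_claude_stream is hypothetical, and thinking and tool_use deltas are deliberately skipped.

```python
import json
from typing import Dict, Iterable, Optional, Tuple


def collect_claude_stream(
    events: Iterable[Tuple[str, str]],
) -> Tuple[Dict[int, str], Optional[str]]:
    """Return ({block_index: text}, stop_reason) from (event, data) pairs."""
    blocks: Dict[int, str] = {}
    stop_reason = None
    for event, data in events:
        body = json.loads(data)
        if event == "content_block_start":
            blocks[body["index"]] = ""
        elif event == "content_block_delta":
            delta = body["delta"]
            # Only text deltas are joined here; other delta types are skipped.
            if delta.get("type") == "text_delta":
                idx = body["index"]
                blocks[idx] = blocks.get(idx, "") + delta["text"]
        elif event == "message_delta":
            stop_reason = body["delta"].get("stop_reason")
        elif event == "message_stop":
            break
        # message_start, content_block_stop, ping: nothing to accumulate.
    return blocks, stop_reason
```

Keying the text by block index keeps a text block separate from any thinking or tool_use block that shares the same message.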

Gemini format

Default: NDJSON (one JSON object per line, with no data: prefix).
Append ?alt=sse to get SSE-style output, matching the behavior of Google's official SDKs.
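Both Gemini modes can share one reader that differs only in whether it strips the data: prefix. The sketch below assumes each JSON object follows Google's GenerateContentResponse shape (candidates → content → parts → text). The name iter_gemini_text and the sse flag are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_gemini_text(lines: Iterable[str], sse: bool = False) -> Iterator[str]:
    """Yield text pieces from NDJSON lines, or from SSE lines when sse=True."""
    for line in lines:
        line = line.strip()
        if sse:
            if not line.startswith("data:"):
                continue  # blank separators and other SSE fields
            line = line[len("data:"):].strip()
        if not line:
            continue  # tolerate empty lines in NDJSON mode
        chunk = json.loads(line)
        # Assumed response shape: candidates[].content.parts[].text
        for candidate in chunk.get("candidates", []):
            for part in candidate.get("content", {}).get("parts", []):
                if "text" in part:
                    yield part["text"]
```

Pass sse=True only when the request URL carries ?alt=sse; otherwise each non-empty line is treated as a complete JSON object.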

Python examples

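# OpenAI SDK
import os
from openai import OpenAI

client = OpenAI(base_url="https://tttoken.xyz/v1", api_key=os.environ["TTT_KEY"])

stream = client.chat.completions.create(
    model="gpt-4o",
    stream=True,
    messages=[{"role": "user", "content": "Tell me a joke"}],
)
for chunk in stream:
    if not chunk.choices:  # some chunks (e.g. usage-only) carry no choices
        continue
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)

# Anthropic SDK
import os
import anthropic

# Assumption: the SDK appends /v1/messages to base_url; adjust if your endpoint differs.
client = anthropic.Anthropic(base_url="https://tttoken.xyz", api_key=os.environ["TTT_KEY"])

with client.messages.stream(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a joke"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

# Raw HTTP with httpx
import json
import os

import httpx

KEY = os.environ["TTT_KEY"]  # same variable as the curl example

with httpx.stream(
    "POST",
    "https://tttoken.xyz/v1/chat/completions",
    headers={"Authorization": f"Bearer {KEY}"},
    json={"model": "gpt-4o", "stream": True,
          "messages": [{"role": "user", "content": "Hi"}]},
    timeout=None,  # no timeout: long generations can pause between chunks
) as r:
    r.raise_for_status()  # surface HTTP errors before reading the stream
    for line in r.iter_lines():
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        data = json.loads(payload)
        if "error" in data:  # mid-stream error event
            raise RuntimeError(data["error"])
        choices = data.get("choices") or []
        if choices:
            print(choices[0].get("delta", {}).get("content") or "", end="", flush=True)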

Debugging with curl

Pass -N (disable output buffering) to watch the stream in real time:

curl -N https://tttoken.xyz/v1/chat/completions \
  -H "Authorization: Bearer $TTT_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "stream": true,
    "messages": [{"role":"user","content":"Count from 1 to 10"}]
  }'

Handling errors mid-stream

If an error occurs after the stream has started, TTToken sends a special event and then closes the connection:

OpenAI style

data: {"error":{"message":"upstream_timeout","type":"server_error","code":504}}

data: [DONE]

Claude style

event: error
data: {"type":"error","error":{"type":"overloaded_error","message":"..."}}
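Both error shapes above carry a top-level error object, so one check can cover either style. The sketch below is a minimal illustration: StreamError and raise_on_stream_error are hypothetical names, and the caller is assumed to have stripped the data: prefix and handled the [DONE] sentinel before calling.

```python
import json
from typing import Optional


class StreamError(RuntimeError):
    """Hypothetical exception for an error event received mid-stream."""

    def __init__(self, error_type: str, message: str, code: Optional[int] = None):
        super().__init__(f"{error_type}: {message}")
        self.error_type, self.message, self.code = error_type, message, code


def raise_on_stream_error(payload: str) -> dict:
    """Parse one data: payload; raise StreamError if it carries an error.

    Do not pass the "[DONE]" sentinel: it is not JSON.
    """
    body = json.loads(payload)
    err = body.get("error")
    if isinstance(err, dict):
        # OpenAI style includes a numeric code; Claude style does not.
        raise StreamError(
            err.get("type", "unknown"), err.get("message", ""), err.get("code")
        )
    return body
```

Raising a dedicated exception lets the caller decide whether to retry, for example on server_error or overloaded_error, instead of silently printing a truncated reply.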

Client best practices