Stream AI Responses in Production
Batch responses make AI feel slow. Streaming fixes the perception problem. Here's the pattern I use with Hono and the Vercel AI SDK.
John Ryan Cottam
2 min read
AI responses are slow. Not unusable slow — but slow enough that a loading spinner feels like a lie. The model is thinking, tokens are generating, and your user is staring at nothing.
Streaming fixes the perception problem. Instead of waiting for the full response, tokens arrive as they’re generated. The interface feels alive. Users read while the model writes.
Here’s how I wire it up with Hono and the Vercel AI SDK.
The setup
bun add ai @ai-sdk/anthropic hono
The route
import { Hono } from "hono";
import { streamText } from "ai";
import { anthropic } from "@ai-sdk/anthropic";

const app = new Hono();

app.post("/api/chat", async (c) => {
  const { messages } = await c.req.json();

  const result = streamText({
    model: anthropic("claude-3-5-sonnet-20241022"),
    messages,
    maxTokens: 1024,
  });

  return result.toDataStreamResponse();
});

export default app;
That’s the server. toDataStreamResponse() handles the SSE headers, chunking, and error framing — you don’t write any of that yourself.
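Anything that speaks fetch can consume that Response. Here's a minimal sketch of the client-side read loop, with an in-memory ReadableStream standing in for the real `response.body` from fetch. (Note: the decoder below treats chunks as plain text; the AI SDK's data stream adds its own framing on top, which useChat parses for you.)

```typescript
// Stand-in for a streamed response body; in the real app this is
// the ReadableStream returned by fetch's response.body.
function fakeBody(): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream<Uint8Array>({
    start(controller) {
      for (const token of ["Streaming ", "feels ", "alive."]) {
        controller.enqueue(encoder.encode(token));
      }
      controller.close();
    },
  });
}

// Read chunks as they arrive; in a UI you'd append each chunk to
// rendered state instead of accumulating into one string.
async function readStream(stream: ReadableStream<Uint8Array>): Promise<string> {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let text = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    text += decoder.decode(value, { stream: true });
  }
  return text;
}
```

This is the whole trick: the UI updates per chunk instead of per response.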
The client
import { useChat } from "@ai-sdk/react";

export function Chat() {
  const { messages, input, handleInputChange, handleSubmit } = useChat({
    api: "/api/chat",
  });

  return (
    <div>
      {messages.map((m) => (
        <div key={m.id}>
          <strong>{m.role}:</strong> {m.content}
        </div>
      ))}
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} />
        <button type="submit">Send</button>
      </form>
    </div>
  );
}
useChat manages the stream lifecycle — connecting, buffering chunks, updating state. The component just renders what’s there.
What actually matters in production
Timeouts. Streaming connections stay open longer than typical API calls. Set maxDuration on your serverless function — Vercel defaults to 10 seconds, which isn’t enough for longer generations. I use 60.
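On Vercel, that's a per-route config export. A sketch (the Next.js/Vercel route segment convention; adjust to however your host configures function timeouts):

```typescript
// Allow the streaming handler to run for up to 60 seconds.
export const maxDuration = 60;
```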
Error handling. The stream can fail mid-response. streamText surfaces errors through the stream itself, so toDataStreamResponse() forwards them correctly. On the client, useChat exposes an error state — handle it.
Cancellation. If the client disconnects, the model keeps generating. Pass the request's AbortSignal through so the provider call is cancelled:
app.post("/api/chat", async (c) => {
  const { messages } = await c.req.json();

  const result = streamText({
    model: anthropic("claude-3-5-sonnet-20241022"),
    messages,
    abortSignal: c.req.raw.signal,
  });

  return result.toDataStreamResponse();
});
Now if the user closes the tab, the generation stops. Tokens cost money — don’t waste them.
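The mechanism is plain AbortSignal plumbing. A self-contained sketch, with a fake token loop standing in for the provider call (a real streamText invocation checks its abortSignal the same way):

```typescript
// Fake generation loop; checks the signal before producing each token.
async function generate(
  signal: AbortSignal,
  onToken?: (t: string) => void,
): Promise<string[]> {
  const tokens: string[] = [];
  for (const token of ["one", "two", "three", "four"]) {
    if (signal.aborted) break; // stop spending tokens once the client is gone
    tokens.push(token);
    onToken?.(token);
  }
  return tokens;
}
```

In the route above, the runtime aborts `c.req.raw.signal` when the connection drops, and the SDK stops the generation the same way this loop does.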
Why this stack
The AI SDK’s streamText is provider-agnostic. Swap anthropic(...) for openai(...) or google(...) and nothing else changes. That’s the right abstraction. Hono gives you the lightweight runtime that works on Cloudflare Workers, Vercel Edge, Bun — wherever you deploy.
Streaming isn’t an optimization. It’s a fundamental part of what makes AI interfaces feel good. Build it in from the start.