# Streaming Responses
Strands Agents support real-time streaming responses via API Gateway Response Streaming. This enables long-running AI tasks with immediate feedback.
## Overview

The streaming endpoint uses a dedicated API Gateway at `stream.api.universalapi.co`:

- **Dedicated Streaming API** - Separate API Gateway optimized for streaming
- **Lambda Web Adapter (LWA)** - Runs FastAPI inside Lambda
- **API Gateway Response Streaming** - Streams chunks as they're generated
- **15-minute timeout** - Supports long-running agent tasks
## Streaming vs Buffered
| Feature | Streaming | Buffered |
|---|---|---|
| Base URL | `stream.api.universalapi.co` | `api.universalapi.co` |
| Endpoint | `/agent/{agentId}/chat` | `/agent/{agentId}/chat` |
| Response | Real-time chunks | Complete response |
| Timeout | 15 minutes | 5 minutes |
| Use Case | Interactive chat | Background tasks |
| Content-Type | `text/plain` | `application/json` |
## Using the Streaming Endpoint
### Basic Request

```bash
curl -N "https://stream.api.universalapi.co/agent/{agentId}/chat" \
  -H "Content-Type: application/json" \
  -H "X-Uni-UserId: YOUR_USER_ID" \
  -H "X-Uni-SecretUniversalKey: YOUR_SECRET_KEY" \
  -d '{"prompt": "Tell me a story about a robot."}'
```

### Using Bearer Token (Recommended)
```bash
curl -N "https://stream.api.universalapi.co/agent/{agentId}/chat" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
  -d '{"prompt": "Tell me a story about a robot."}'
```

!!! tip "The -N flag"

    Use `curl -N` (or `--no-buffer`) to disable output buffering and see the stream in real time.
### Request Body

```json
{
  "prompt": "Your message to the agent",
  "conversationId": "optional-uuid-to-continue-conversation"
}
```

| Field | Type | Required | Description |
|---|---|---|---|
| `prompt` | string | Yes | The user's message |
| `conversationId` | string | No | UUID to continue an existing conversation |
## Response Format

The streaming response is `text/plain` with special markers:
```text
__META__{"conversationId": "abc123-def456"}__
Hello! I'd be happy to tell you a story about a robot.
Once upon a time, in a factory far away...
```

### Response Markers
| Marker | Format | Description |
|---|---|---|
| `__META__` | `__META__{json}__\n` | Metadata at start of response |
| `__TOOL__` | `__TOOL__{toolName}__` | Tool execution indicator |
| `__ERROR__` | `__ERROR__{json}__` | Error information |
Metadata JSON:

```json
{
  "conversationId": "625c2112-9eac-4630-bbbc-785a845a182d"
}
```

Error JSON:

```json
{
  "error": "Error message here"
}
```

### Response Headers
| Header | Description |
|---|---|
| `X-Conversation-Id` | The conversation UUID |
| `Content-Type` | `text/plain; charset=utf-8` |
| `Cache-Control` | `no-cache` |
| `Connection` | `keep-alive` |
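One caveat the per-chunk examples below gloss over: a marker can be split between two network chunks, in which case a simple per-chunk regex misses it. A small buffering parser avoids that. The sketch below is illustrative only — the `StreamParser` class is an assumption of this guide, not part of any Universal API SDK:

```python
import json
import re

# Matches the three documented markers:
# __META__{json}__, __TOOL__{toolName}__, __ERROR__{json}__
MARKER_RE = re.compile(r'__(META|TOOL|ERROR)__(\{.*?\}|\w+?)__')

class StreamParser:
    """Buffers raw chunks and separates markers from user-visible text."""

    def __init__(self):
        self.buffer = ""    # holds any trailing partial marker
        self.metadata = {}  # payload of the __META__ marker
        self.error = None   # message from an __ERROR__ marker, if any

    def feed(self, chunk: str) -> str:
        """Consume one decoded chunk; return the display text it contained."""
        self.buffer += chunk
        out = []
        while (m := MARKER_RE.search(self.buffer)):
            out.append(self.buffer[:m.start()])  # text before the marker
            kind, payload = m.group(1), m.group(2)
            if kind == "META":
                self.metadata = json.loads(payload)
            elif kind == "ERROR":
                self.error = json.loads(payload).get("error")
            # __TOOL__ markers carry only a tool name; hidden from display
            self.buffer = self.buffer[m.end():]
            if kind == "META" and self.buffer.startswith("\n"):
                self.buffer = self.buffer[1:]  # __META__ format ends with "\n"
        # Hold back anything from the last "__" on, in case it is a
        # partial marker that completes in the next chunk.
        cut = self.buffer.rfind("__")
        if cut == -1:
            out.append(self.buffer)
            self.buffer = ""
        else:
            out.append(self.buffer[:cut])
            self.buffer = self.buffer[cut:]
        return "".join(out)
```

When the stream finishes, anything still in `parser.buffer` is plain text and can be flushed to the output.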
## Frontend Integration

### JavaScript/TypeScript
```typescript
async function chatWithAgent(
  agentId: string,
  prompt: string,
  onChunk: (chunk: string) => void, // called with each display chunk
  conversationId?: string
) {
  const response = await fetch(
    `https://stream.api.universalapi.co/agent/${agentId}/chat`,
    {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${ACCESS_TOKEN}`, // Recommended
        // Or use legacy headers:
        // 'X-Uni-UserId': USER_ID,
        // 'X-Uni-SecretUniversalKey': SECRET_KEY,
      },
      body: JSON.stringify({ prompt, conversationId }),
    }
  );

  const reader = response.body?.getReader();
  const decoder = new TextDecoder();
  let fullResponse = '';
  let metadata: { conversationId?: string } = {};

  while (reader) {
    const { done, value } = await reader.read();
    if (done) break;

    const chunk = decoder.decode(value, { stream: true });

    // Parse metadata marker
    const metaMatch = chunk.match(/__META__({.*?})__/);
    if (metaMatch) {
      metadata = JSON.parse(metaMatch[1]);
      // Remove metadata from display text
      const cleanChunk = chunk.replace(/__META__.*?__\n?/, '');
      fullResponse += cleanChunk;
      onChunk(cleanChunk);
    } else {
      fullResponse += chunk;
      onChunk(chunk);
    }
  }

  return { response: fullResponse, conversationId: metadata.conversationId };
}

// Usage
const { response, conversationId } = await chatWithAgent(
  'agent-id',
  'Hello!',
  (chunk) => console.log(chunk), // render each chunk as it arrives
  undefined // or an existing conversationId
);
console.log('Response:', response);
console.log('Conversation ID:', conversationId);
```

### React Hook Example
```typescript
import { useState, useCallback } from 'react';

interface StreamingMessage {
  role: 'user' | 'assistant';
  content: string;
}

export function useAgentChat(agentId: string) {
  const [messages, setMessages] = useState<StreamingMessage[]>([]);
  const [isStreaming, setIsStreaming] = useState(false);
  const [conversationId, setConversationId] = useState<string | null>(null);

  const sendMessage = useCallback(async (prompt: string) => {
    setIsStreaming(true);

    // Add the user message and an empty assistant message to stream into
    setMessages(prev => [
      ...prev,
      { role: 'user', content: prompt },
      { role: 'assistant', content: '' },
    ]);

    try {
      const response = await fetch(
        `https://stream.api.universalapi.co/agent/${agentId}/chat`,
        {
          method: 'POST',
          headers: {
            'Content-Type': 'application/json',
            'Authorization': `Bearer ${ACCESS_TOKEN}`,
          },
          body: JSON.stringify({ prompt, conversationId }),
        }
      );

      const reader = response.body?.getReader();
      const decoder = new TextDecoder();

      while (reader) {
        const { done, value } = await reader.read();
        if (done) break;

        let chunk = decoder.decode(value, { stream: true });

        // Extract metadata
        const metaMatch = chunk.match(/__META__({.*?})__/);
        if (metaMatch) {
          const meta = JSON.parse(metaMatch[1]);
          setConversationId(meta.conversationId);
          chunk = chunk.replace(/__META__.*?__\n?/, '');
        }

        // Append the chunk to the last message (the assistant's response)
        // without mutating the previous state
        setMessages(prev => {
          const updated = [...prev];
          const last = updated[updated.length - 1];
          updated[updated.length - 1] = { ...last, content: last.content + chunk };
          return updated;
        });
      }
    } finally {
      setIsStreaming(false);
    }
  }, [agentId, conversationId]);

  return { messages, sendMessage, isStreaming, conversationId };
}
```

### Python
```python
import json
import re
from typing import Optional

import requests

def stream_chat(agent_id: str, prompt: str, conversation_id: Optional[str] = None):
    """Stream a chat response from a Strands Agent."""
    response = requests.post(
        f"https://stream.api.universalapi.co/agent/{agent_id}/chat",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {ACCESS_TOKEN}",  # Recommended
            # Or use legacy headers:
            # "X-Uni-UserId": USER_ID,
            # "X-Uni-SecretUniversalKey": SECRET_KEY,
        },
        json={"prompt": prompt, "conversationId": conversation_id},
        stream=True,
    )

    full_response = ""
    metadata = {}

    for chunk in response.iter_content(chunk_size=None, decode_unicode=True):
        # Check for the metadata marker
        if "__META__" in chunk:
            match = re.search(r'__META__({.*?})__', chunk)
            if match:
                metadata = json.loads(match.group(1))
            chunk = re.sub(r'__META__.*?__\n?', '', chunk)
        full_response += chunk
        print(chunk, end="", flush=True)

    print()  # Newline at end
    return full_response, metadata.get("conversationId")

# Usage
response, conv_id = stream_chat("agent-id", "Hello!")
print(f"Conversation ID: {conv_id}")
```

## Error Handling
Errors are streamed as `__ERROR__` markers:

```text
__META__{"conversationId": "abc123"}__
__ERROR__{"error": "AWS Bedrock credentials required. Please add your AWS credentials in the API Keys section."}__
```

### Common Errors
| Error | Cause | Solution |
|---|---|---|
| Authentication required | Missing or invalid credentials | Check your Bearer token |
| Agent not found | Invalid `agentId` | Verify the agent exists |
| Insufficient credits for Platform Bedrock | Less than 5 credits and no AWS keys | Add credits or store AWS credentials |
| Access denied | Agent is private | Use your own agent or a public agent |
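Because the stream delivers errors in-band rather than as HTTP status codes, a client has to scan the chunks itself. A minimal helper for this — the `AgentStreamError` exception name is an assumption of this guide, not a platform type:

```python
import json
import re

class AgentStreamError(Exception):
    """Raised when a streamed chunk carries an __ERROR__ marker."""

def raise_on_stream_error(chunk: str) -> None:
    """Scan one decoded chunk and raise if it contains an error marker."""
    match = re.search(r'__ERROR__(\{.*?\})__', chunk)
    if match:
        detail = json.loads(match.group(1)).get("error", "Unknown error")
        raise AgentStreamError(detail)
```

Calling `raise_on_stream_error(chunk)` inside the `iter_content` loop of `stream_chat` above turns streamed errors into ordinary exceptions.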
## Platform Bedrock (Managed AI)

No AWS account needed. If you don't have AWS credentials stored, agents automatically use Universal API's own Bedrock access (Platform Bedrock):

- Bedrock token costs + 20% infrastructure fee are charged to your credits
- Requires ≥ 5 credits to start
- Zero configuration — it just works
- The `__META__` marker will include `"bedrockProvider": "platform"` when Platform Bedrock is active

If you store your own AWS credentials in the Credentials page, those are used instead and Bedrock costs go directly to your AWS bill.
### Detecting Platform Bedrock in Streams

```text
__META__{"conversationId":"abc123","requestId":"req-xyz","bedrockProvider":"platform"}__
Hello! I'm using Platform Bedrock to respond...
```

When `bedrockProvider` is `"platform"` in the `__META__` marker, the agent is using Universal API's Bedrock credentials, and token costs will be charged to your credits.
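In code, the check is just a matter of parsing the `__META__` payload from the first chunk. A small sketch — the function name and the `"own"` fallback label for user-supplied AWS credentials are assumptions of this guide:

```python
import json
import re

def bedrock_provider(first_chunk: str) -> str:
    """Return the Bedrock provider reported in the __META__ marker.

    Returns "platform" when Platform Bedrock (billed to your credits) is
    active; "own" is used here as an assumed default label when the
    marker carries no bedrockProvider field.
    """
    match = re.search(r'__META__(\{.*?\})__', first_chunk)
    meta = json.loads(match.group(1)) if match else {}
    return meta.get("bedrockProvider", "own")

chunk = '__META__{"conversationId":"abc123","bedrockProvider":"platform"}__\nHello!'
if bedrock_provider(chunk) == "platform":
    print("Token costs for this response are billed to your credits.")
```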
## Timeout Behavior

- **Streaming endpoint**: 15 minutes (900 seconds)
- **Buffered endpoint**: 5 minutes (300 seconds)

For long-running tasks, the streaming endpoint continues sending data for as long as the agent is processing. If no data is sent for an extended period, intermediate proxies may close the connection.
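With Python's `requests`, the read timeout applies to each wait between chunks, so a streaming client should keep it above the 15-minute server limit. A sketch of suitable keyword arguments — the helper name and the 960-second figure are illustrative choices, not platform requirements:

```python
def stream_request_kwargs(prompt: str, token: str) -> dict:
    """Build requests.post keyword arguments suited to a 15-minute stream."""
    return {
        "headers": {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        "json": {"prompt": prompt},
        "stream": True,
        # (connect timeout, read timeout): requests applies the read timeout
        # to each wait between chunks, so keep it above the 900 s server limit.
        "timeout": (10, 960),
    }
```

Use it as `requests.post(f"https://stream.api.universalapi.co/agent/{agent_id}/chat", **stream_request_kwargs("Hello!", token))`.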
## Architecture Details

```text
┌─────────────┐     ┌─────────────────────┐     ┌─────────────────────┐
│   Client    │────▶│     API Gateway     │────▶│    Lambda + LWA     │
│             │◀────│  (ResponseStream)   │◀────│  (FastAPI/Uvicorn)  │
└─────────────┘     └─────────────────────┘     └─────────────────────┘
                              │                           │
                              │ ResponseTransferMode:     │ AWS_LWA_INVOKE_MODE:
                              │   STREAM                  │   RESPONSE_STREAM
                              │                           │
                              │ TimeoutInMillis:          │ Timeout: 900s
                              │   900000                  │
                                                          ▼
                                                ┌─────────────────────┐
                                                │     AWS Bedrock     │
                                                │   (Claude, etc.)    │
                                                └─────────────────────┘
```

**Key Configuration:**

API Gateway Integration:

- `ResponseTransferMode: STREAM` - uses the `response-streaming-invocations` Lambda endpoint
- `TimeoutInMillis: 900000` (15 minutes)

Lambda Environment:

- `AWS_LWA_INVOKE_MODE: RESPONSE_STREAM`
- Lambda Web Adapter layer (ARM64)
- FastAPI with `StreamingResponse`
## Best Practices

- **Always handle the `__META__` marker** - Extract the `conversationId` for follow-up messages
- **Use streaming for interactive UIs** - Better user experience with real-time feedback
- **Implement reconnection logic** - Network issues can interrupt streams
- **Parse markers before displaying** - Remove `__META__`, `__TOOL__`, and `__ERROR__` from user-visible output
- **Set appropriate timeouts** - Client-side timeouts should exceed 15 minutes for long tasks
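The reconnection advice above can start as a simple retry wrapper with exponential backoff. This sketch assumes a streaming client shaped like the `stream_chat` function shown earlier, returning `(text, conversation_id)`; the function name and defaults are illustrative:

```python
import time

def with_retries(stream_fn, prompt: str, max_attempts: int = 3,
                 base_delay: float = 1.0):
    """Call stream_fn(prompt, conversation_id) with exponential backoff.

    A production client would also capture the conversationId from the
    __META__ marker of a partial stream, so a retry resumes the same
    conversation instead of starting a new one.
    """
    conversation_id = None
    last_error = None
    for attempt in range(max_attempts):
        try:
            return stream_fn(prompt, conversation_id)
        except (ConnectionError, TimeoutError) as exc:
            last_error = exc
            time.sleep(base_delay * (2 ** attempt))  # 1 s, 2 s, 4 s, ...
    raise last_error
```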
## Next Steps
- **Creating Agents** - Build agents with custom tools
- **API Reference** - Complete endpoint documentation