
# Streaming Responses

Strands Agents support real-time streaming responses via API Gateway Response Streaming. This enables long-running AI tasks with immediate feedback.

## Overview

The streaming endpoint uses a dedicated API Gateway at `stream.api.universalapi.co`:

- **Dedicated Streaming API** - Separate API Gateway optimized for streaming
- **Lambda Web Adapter (LWA)** - Runs FastAPI inside Lambda
- **API Gateway Response Streaming** - Streams chunks as they're generated
- **15-minute timeout** - Supports long-running agent tasks

## Streaming vs Buffered

| Feature | Streaming | Buffered |
|---|---|---|
| Base URL | `stream.api.universalapi.co` | `api.universalapi.co` |
| Endpoint | `/agent/{agentId}/chat` | `/agent/{agentId}/chat` |
| Response | Real-time chunks | Complete response |
| Timeout | 15 minutes | 5 minutes |
| Use Case | Interactive chat | Background tasks |
| Content-Type | `text/plain` | `application/json` |

## Using the Streaming Endpoint

### Basic Request

With legacy header credentials:

```bash
curl -N "https://stream.api.universalapi.co/agent/{agentId}/chat" \
  -H "Content-Type: application/json" \
  -H "X-Uni-UserId: YOUR_USER_ID" \
  -H "X-Uni-SecretUniversalKey: YOUR_SECRET_KEY" \
  -d '{"prompt": "Tell me a story about a robot."}'
```

Or with a Bearer token (recommended):

```bash
curl -N "https://stream.api.universalapi.co/agent/{agentId}/chat" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
  -d '{"prompt": "Tell me a story about a robot."}'
```

!!! tip "The -N flag"
    Use `curl -N` (or `--no-buffer`) to disable output buffering and see the stream in real time.

### Request Body

```json
{
  "prompt": "Your message to the agent",
  "conversationId": "optional-uuid-to-continue-conversation"
}
```

| Field | Type | Required | Description |
|---|---|---|---|
| `prompt` | string | Yes | The user's message |
| `conversationId` | string | No | UUID to continue an existing conversation |

## Response Format

The streaming response is `text/plain` with special markers:

```
__META__{"conversationId": "abc123-def456"}__
Hello! I'd be happy to tell you a story about a robot.

Once upon a time, in a factory far away...
```

### Response Markers

| Marker | Format | Description |
|---|---|---|
| `__META__` | `__META__{json}__\n` | Metadata at start of response |
| `__TOOL__` | `__TOOL__{toolName}__` | Tool execution indicator |
| `__ERROR__` | `__ERROR__{json}__` | Error information |

Metadata JSON:

```json
{
  "conversationId": "625c2112-9eac-4630-bbbc-785a845a182d"
}
```

Error JSON:

```json
{
  "error": "Error message here"
}
```
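Because the markers arrive interleaved with plain text, and a marker can be split across two network chunks, a robust client buffers input until each marker is complete before emitting events. The sketch below is a minimal Python parser under our own assumptions (the event names are ours; it assumes marker payloads are flat JSON without nested braces and that marker-like sequences don't appear inside payloads):

```python
import json
import re

# Assumed marker grammar: flat JSON payloads, identifier-style tool names.
MARKER_RE = re.compile(r'__(META|ERROR)__({.*?})__\n?|__TOOL__([A-Za-z0-9_]+)__')

def parse_stream(chunks):
    """Yield (kind, payload) events from raw text chunks.

    kind is 'meta' or 'error' (payload: dict), 'tool' (payload: tool name),
    or 'text' (payload: str). Text that might be the start of a marker is
    held back until the marker completes or the stream ends.
    """
    buf = ""
    for chunk in chunks:
        buf += chunk
        while buf:
            m = MARKER_RE.search(buf)
            if m is None:
                cut = buf.find("__")  # possible partial marker starts here
                safe = buf if cut == -1 else buf[:cut]
                if not safe:
                    break  # wait for more data
                yield ("text", safe)
                buf = buf[len(safe):]
                if cut != -1:
                    break
            else:
                if m.start() > 0:
                    yield ("text", buf[:m.start()])
                if m.group(1):  # __META__ or __ERROR__ with a JSON payload
                    yield (m.group(1).lower(), json.loads(m.group(2)))
                else:           # __TOOL__ marker
                    yield ("tool", m.group(3))
                buf = buf[m.end():]
    if buf:
        yield ("text", buf)  # flush any held-back tail
```

Feeding it `response.iter_content(...)` (or decoded `fetch` chunks) keeps the user-visible text clean even when a chunk boundary falls inside a marker.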

### Response Headers

| Header | Description |
|---|---|
| `X-Conversation-Id` | The conversation UUID |
| `Content-Type` | `text/plain; charset=utf-8` |
| `Cache-Control` | `no-cache` |
| `Connection` | `keep-alive` |
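Since the conversation UUID is duplicated in the `X-Conversation-Id` header, a client that only needs the ID for follow-up requests can read it from the headers instead of parsing the `__META__` marker. A minimal sketch (the helper name is ours):

```python
def conversation_id_of(response):
    """Read the conversation UUID from the response headers.

    Works with any object exposing a dict-like `.headers` mapping,
    e.g. a `requests.Response`. Returns None if the header is absent.
    """
    return response.headers.get("X-Conversation-Id")
```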

## Frontend Integration

### JavaScript/TypeScript

```typescript
async function chatWithAgent(
  agentId: string,
  prompt: string,
  onChunk: (chunk: string) => void,
  conversationId?: string,
) {
  const response = await fetch(
    `https://stream.api.universalapi.co/agent/${agentId}/chat`,
    {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${ACCESS_TOKEN}`,  // Recommended
        // Or use legacy headers:
        // 'X-Uni-UserId': USER_ID,
        // 'X-Uni-SecretUniversalKey': SECRET_KEY,
      },
      body: JSON.stringify({ prompt, conversationId }),
    }
  );

  const reader = response.body?.getReader();
  const decoder = new TextDecoder();
  let fullResponse = '';
  let metadata: { conversationId?: string } = {};

  while (reader) {
    const { done, value } = await reader.read();
    if (done) break;

    const chunk = decoder.decode(value, { stream: true });

    // Parse the metadata marker (note: a marker can be split across
    // chunks; buffer the stream if you need to handle that case)
    const metaMatch = chunk.match(/__META__({.*?})__/);
    if (metaMatch) {
      metadata = JSON.parse(metaMatch[1]);
      // Remove the metadata marker from the display text
      const cleanChunk = chunk.replace(/__META__.*?__\n?/, '');
      fullResponse += cleanChunk;
      onChunk(cleanChunk);
    } else {
      fullResponse += chunk;
      onChunk(chunk);
    }
  }

  return { response: fullResponse, conversationId: metadata.conversationId };
}

// Usage
const { response, conversationId } = await chatWithAgent(
  'agent-id',
  'Hello!',
  (chunk) => console.log(chunk),  // render each chunk as it arrives
  undefined                       // or an existing conversationId
);
console.log('Response:', response);
console.log('Conversation ID:', conversationId);
```

### React Hook Example

```typescript
import { useState, useCallback } from 'react';

interface StreamingMessage {
  role: 'user' | 'assistant';
  content: string;
}

export function useAgentChat(agentId: string) {
  const [messages, setMessages] = useState<StreamingMessage[]>([]);
  const [isStreaming, setIsStreaming] = useState(false);
  const [conversationId, setConversationId] = useState<string | null>(null);

  const sendMessage = useCallback(async (prompt: string) => {
    setIsStreaming(true);

    // Add the user message plus an empty assistant message to stream into
    setMessages(prev => [
      ...prev,
      { role: 'user', content: prompt },
      { role: 'assistant', content: '' },
    ]);

    try {
      const response = await fetch(
        `https://stream.api.universalapi.co/agent/${agentId}/chat`,
        {
          method: 'POST',
          headers: {
            'Content-Type': 'application/json',
            'Authorization': `Bearer ${ACCESS_TOKEN}`,
          },
          body: JSON.stringify({ prompt, conversationId }),
        }
      );

      const reader = response.body?.getReader();
      const decoder = new TextDecoder();

      while (reader) {
        const { done, value } = await reader.read();
        if (done) break;

        let chunk = decoder.decode(value, { stream: true });

        // Extract metadata
        const metaMatch = chunk.match(/__META__({.*?})__/);
        if (metaMatch) {
          const meta = JSON.parse(metaMatch[1]);
          setConversationId(meta.conversationId);
          chunk = chunk.replace(/__META__.*?__\n?/, '');
        }

        // Append the chunk to the last message (the assistant's response)
        // without mutating the previous state object
        setMessages(prev => {
          const updated = [...prev];
          const last = updated[updated.length - 1];
          updated[updated.length - 1] = { ...last, content: last.content + chunk };
          return updated;
        });
      }
    } finally {
      setIsStreaming(false);
    }
  }, [agentId, conversationId]);

  return { messages, sendMessage, isStreaming, conversationId };
}
```

### Python

```python
import json
import re

import requests

def stream_chat(agent_id: str, prompt: str, conversation_id: str | None = None):
    """Stream a chat response from a Strands Agent."""
    response = requests.post(
        f"https://stream.api.universalapi.co/agent/{agent_id}/chat",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {ACCESS_TOKEN}",  # Recommended
            # Or use legacy headers:
            # "X-Uni-UserId": USER_ID,
            # "X-Uni-SecretUniversalKey": SECRET_KEY,
        },
        json={"prompt": prompt, "conversationId": conversation_id},
        stream=True,
    )

    full_response = ""
    metadata = {}

    for chunk in response.iter_content(chunk_size=None, decode_unicode=True):
        # Check for the metadata marker (a marker can be split across
        # chunks in rare cases; buffer the stream if you need to handle that)
        if "__META__" in chunk:
            match = re.search(r'__META__({.*?})__', chunk)
            if match:
                metadata = json.loads(match.group(1))
                chunk = re.sub(r'__META__.*?__\n?', '', chunk)

        full_response += chunk
        print(chunk, end="", flush=True)

    print()  # Newline at end
    return full_response, metadata.get("conversationId")

# Usage
response, conv_id = stream_chat("agent-id", "Hello!")
print(f"Conversation ID: {conv_id}")
```

## Error Handling

Errors are streamed as `__ERROR__` markers:

```
__META__{"conversationId": "abc123"}__
__ERROR__{"error": "AWS Bedrock credentials required. Please add your AWS credentials in the API Keys section."}__
```

### Common Errors

| Error | Cause | Solution |
|---|---|---|
| Authentication required | Missing or invalid credentials | Check Bearer token |
| Agent not found | Invalid agentId | Verify agent exists |
| Insufficient credits for Platform Bedrock | Less than 5 credits and no AWS keys | Add credits or store AWS credentials |
| Access denied | Agent is private | Use your own agent or a public agent |

## Platform Bedrock (Managed AI)

No AWS account needed. If you don't have AWS credentials stored, agents automatically use Universal API's own Bedrock access (Platform Bedrock):

- Bedrock token costs + 20% infrastructure fee are charged to your credits
- Requires ≥ 5 credits to start
- Zero configuration — it just works
- The `__META__` marker will include `"bedrockProvider": "platform"` when Platform Bedrock is active

If you store your own AWS credentials on the Credentials page, those are used instead and Bedrock costs go directly to your AWS bill.

### Detecting Platform Bedrock in Streams

```
__META__{"conversationId":"abc123","requestId":"req-xyz","bedrockProvider":"platform"}__
Hello! I'm using Platform Bedrock to respond...
```

When `bedrockProvider` is `"platform"` in the `__META__` marker, the agent is using Universal API's Bedrock credentials and token costs will be charged to your credits.
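Clients can branch on this flag, for example to show a "charged to credits" notice in the UI. A small Python sketch (the function name is ours; it assumes the metadata JSON has no nested braces):

```python
import json
import re

def uses_platform_bedrock(stream_prefix: str) -> bool:
    """Return True if the stream's __META__ marker reports Platform Bedrock."""
    m = re.search(r'__META__({.*?})__', stream_prefix)
    return bool(m) and json.loads(m.group(1)).get("bedrockProvider") == "platform"
```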

## Timeout Behavior

- **Streaming endpoint**: 15 minutes (900 seconds)
- **Buffered endpoint**: 5 minutes (300 seconds)

For long-running tasks, the streaming endpoint keeps sending data as long as the agent is processing. If no data is sent for an extended period, intermediate proxies may close the connection.
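When calling from Python with `requests`, the client-side read timeout should sit above the server-side limit so the client doesn't give up first. A hedged sketch; the helper name and the exact margins are our own choice:

```python
def client_timeout(base_url: str) -> tuple[int, int]:
    """Pick a (connect, read) timeout in seconds for a given endpoint.

    The read timeout exceeds the server-side limit (900s streaming,
    300s buffered) so the client outlasts the server, not the reverse.
    """
    if base_url.startswith("https://stream."):
        return (10, 960)  # streaming: 15-minute server limit + margin
    return (10, 330)      # buffered: 5-minute server limit + margin

# e.g. requests.post(url, json=body, stream=True, timeout=client_timeout(url))
```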

## Architecture Details

```
┌─────────────┐     ┌─────────────────────┐     ┌─────────────────────┐
│   Client    │────▶│   API Gateway       │────▶│   Lambda + LWA      │
│             │◀────│   (ResponseStream)  │◀────│   (FastAPI/Uvicorn) │
└─────────────┘     └─────────────────────┘     └─────────────────────┘
                           │                            │
                           │ ResponseTransferMode:      │ AWS_LWA_INVOKE_MODE:
                           │ STREAM                     │ RESPONSE_STREAM
                           │                            │
                           │ TimeoutInMillis:           │ Timeout: 900s
                           │ 900000                     │
                           │                            ▼
                           │                    ┌─────────────────────┐
                           │                    │   AWS Bedrock       │
                           │                    │   (Claude, etc.)    │
                           │                    └─────────────────────┘
```

Key Configuration:

1. **API Gateway Integration**:
    - `ResponseTransferMode: STREAM`
    - Uses the `response-streaming-invocations` Lambda endpoint
    - `TimeoutInMillis: 900000` (15 minutes)
2. **Lambda Environment**:
    - `AWS_LWA_INVOKE_MODE: RESPONSE_STREAM`
    - Lambda Web Adapter layer (ARM64)
    - FastAPI with `StreamingResponse`

## Best Practices

1. **Always handle the `__META__` marker** - Extract the `conversationId` for follow-up messages
2. **Use streaming for interactive UIs** - Better user experience with real-time feedback
3. **Implement reconnection logic** - Network issues can interrupt streams
4. **Parse markers before displaying** - Remove `__META__`, `__TOOL__`, and `__ERROR__` from user-visible output
5. **Set appropriate timeouts** - Client-side timeouts should exceed 15 minutes for long tasks
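Reconnection logic (point 3) can be as simple as a retry wrapper with exponential backoff around the whole streaming call. A minimal sketch; the names and retry policy are our own. Note that a retried call re-runs the prompt from the start, so pass the `conversationId` from the last successful response to keep context:

```python
import time

def with_retries(call, max_retries=3, base_delay=1.0,
                 retry_on=(ConnectionError, TimeoutError)):
    """Invoke `call()` and retry transient errors with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries; surface the error
            time.sleep(base_delay * 2 ** attempt)

# e.g. with_retries(lambda: stream_chat("agent-id", "Hello!", conv_id))
```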

## Next Steps

- Universal API - The agentic entry point to the universe of APIs