
# Streaming Responses

Strands Agents support real-time streaming responses via API Gateway Response Streaming. This enables long-running AI tasks with immediate feedback.

## Overview

The streaming endpoint uses a dedicated API Gateway at `stream.api.universalapi.co`:

- **Dedicated Streaming API** - Separate API Gateway optimized for streaming
- **Lambda Web Adapter (LWA)** - Runs FastAPI inside Lambda
- **API Gateway Response Streaming** - Streams chunks as they're generated
- **15-minute timeout** - Supports long-running agent tasks

## Streaming vs Buffered

| Feature | Streaming | Buffered |
|---|---|---|
| Base URL | `stream.api.universalapi.co` | `api.universalapi.co` |
| Endpoint | `/agent/{agentId}/chat` | `/agent/{agentId}/chat` |
| Response | Real-time chunks | Complete response |
| Timeout | 15 minutes | 5 minutes |
| Use Case | Interactive chat | Background tasks |
| Content-Type | `text/plain` | `application/json` |

## Using the Streaming Endpoint

### Basic Request

With legacy header credentials:

```bash
curl -N "https://stream.api.universalapi.co/agent/{agentId}/chat" \
  -H "Content-Type: application/json" \
  -H "X-Uni-UserId: YOUR_USER_ID" \
  -H "X-Uni-SecretUniversalKey: YOUR_SECRET_KEY" \
  -d '{"prompt": "Tell me a story about a robot."}'
```

Or with a Bearer token (recommended):

```bash
curl -N "https://stream.api.universalapi.co/agent/{agentId}/chat" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
  -d '{"prompt": "Tell me a story about a robot."}'
```

!!! tip "The -N flag"
    Use `curl -N` (or `--no-buffer`) to disable output buffering and see the stream in real time.

### Request Body

```json
{
  "prompt": "Your message to the agent",
  "conversationId": "optional-uuid-to-continue-conversation"
}
```

| Field | Type | Required | Description |
|---|---|---|---|
| `prompt` | string | Yes | The user's message |
| `conversationId` | string | No | UUID to continue an existing conversation |

## Response Format

The streaming response is `text/plain` with special markers:

```
__META__{"conversationId": "abc123-def456"}__
Hello! I'd be happy to tell you a story about a robot.

Once upon a time, in a factory far away...
```

### Response Markers

| Marker | Format | Description |
|---|---|---|
| `__META__` | `__META__{json}__\n` | Metadata at start of response |
| `__TOOL__` | `__TOOL__{toolName}__` | Tool execution indicator |
| `__ERROR__` | `__ERROR__{json}__` | Error information |

Metadata JSON:

```json
{
  "conversationId": "625c2112-9eac-4630-bbbc-785a845a182d"
}
```

Error JSON:

```json
{
  "error": "Error message here"
}
```
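Because the markers arrive interleaved with plain text, and a marker can be split across two network chunks, a robust client buffers input until each marker is complete before emitting events. The sketch below is a minimal Python parser under our own assumptions (the event names are ours; it assumes marker payloads are flat JSON without nested braces and that marker-like sequences don't appear inside payloads):

```python
import json
import re

# Assumed marker grammar: flat JSON payloads, identifier-style tool names.
MARKER_RE = re.compile(r'__(META|ERROR)__({.*?})__\n?|__TOOL__([A-Za-z0-9_]+)__')

def parse_stream(chunks):
    """Yield (kind, payload) events from raw text chunks.

    kind is 'meta' or 'error' (payload: dict), 'tool' (payload: tool name),
    or 'text' (payload: str). Text that might be the start of a marker is
    held back until the marker completes or the stream ends.
    """
    buf = ""
    for chunk in chunks:
        buf += chunk
        while buf:
            m = MARKER_RE.search(buf)
            if m is None:
                cut = buf.find("__")  # possible partial marker starts here
                safe = buf if cut == -1 else buf[:cut]
                if not safe:
                    break  # wait for more data
                yield ("text", safe)
                buf = buf[len(safe):]
                if cut != -1:
                    break
            else:
                if m.start() > 0:
                    yield ("text", buf[:m.start()])
                if m.group(1):  # __META__ or __ERROR__ with a JSON payload
                    yield (m.group(1).lower(), json.loads(m.group(2)))
                else:           # __TOOL__ marker
                    yield ("tool", m.group(3))
                buf = buf[m.end():]
    if buf:
        yield ("text", buf)  # flush any held-back tail
```

Feeding it `response.iter_content(...)` (or decoded `fetch` chunks) keeps the user-visible text clean even when a chunk boundary falls inside a marker.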

### Response Headers

| Header | Description |
|---|---|
| `X-Conversation-Id` | The conversation UUID |
| `Content-Type` | `text/plain; charset=utf-8` |
| `Cache-Control` | `no-cache` |
| `Connection` | `keep-alive` |
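Since the conversation UUID is duplicated in the `X-Conversation-Id` header, a client that only needs the ID for follow-up requests can read it from the headers instead of parsing the `__META__` marker. A minimal sketch (the helper name is ours):

```python
def conversation_id_of(response):
    """Read the conversation UUID from the response headers.

    Works with any object exposing a dict-like `.headers` mapping,
    e.g. a `requests.Response`. Returns None if the header is absent.
    """
    return response.headers.get("X-Conversation-Id")
```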

## Frontend Integration

### JavaScript/TypeScript

```typescript
async function chatWithAgent(
  agentId: string,
  prompt: string,
  onChunk: (chunk: string) => void,
  conversationId?: string,
) {
  const response = await fetch(
    `https://stream.api.universalapi.co/agent/${agentId}/chat`,
    {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${ACCESS_TOKEN}`,  // Recommended
        // Or use legacy headers:
        // 'X-Uni-UserId': USER_ID,
        // 'X-Uni-SecretUniversalKey': SECRET_KEY,
      },
      body: JSON.stringify({ prompt, conversationId }),
    }
  );

  const reader = response.body?.getReader();
  const decoder = new TextDecoder();
  let fullResponse = '';
  let metadata: { conversationId?: string } = {};

  while (reader) {
    const { done, value } = await reader.read();
    if (done) break;

    const chunk = decoder.decode(value, { stream: true });

    // Parse the metadata marker (note: a marker can be split across
    // chunks; buffer the stream if you need to handle that case)
    const metaMatch = chunk.match(/__META__({.*?})__/);
    if (metaMatch) {
      metadata = JSON.parse(metaMatch[1]);
      // Remove the metadata marker from the display text
      const cleanChunk = chunk.replace(/__META__.*?__\n?/, '');
      fullResponse += cleanChunk;
      onChunk(cleanChunk);
    } else {
      fullResponse += chunk;
      onChunk(chunk);
    }
  }

  return { response: fullResponse, conversationId: metadata.conversationId };
}

// Usage
const { response, conversationId } = await chatWithAgent(
  'agent-id',
  'Hello!',
  (chunk) => console.log(chunk),  // render each chunk as it arrives
  undefined                       // or an existing conversationId
);
console.log('Response:', response);
console.log('Conversation ID:', conversationId);
```

### React Hook Example

```typescript
import { useState, useCallback } from 'react';

interface StreamingMessage {
  role: 'user' | 'assistant';
  content: string;
}

export function useAgentChat(agentId: string) {
  const [messages, setMessages] = useState<StreamingMessage[]>([]);
  const [isStreaming, setIsStreaming] = useState(false);
  const [conversationId, setConversationId] = useState<string | null>(null);

  const sendMessage = useCallback(async (prompt: string) => {
    setIsStreaming(true);

    // Add the user message plus an empty assistant message to stream into
    setMessages(prev => [
      ...prev,
      { role: 'user', content: prompt },
      { role: 'assistant', content: '' },
    ]);

    try {
      const response = await fetch(
        `https://stream.api.universalapi.co/agent/${agentId}/chat`,
        {
          method: 'POST',
          headers: {
            'Content-Type': 'application/json',
            'Authorization': `Bearer ${ACCESS_TOKEN}`,
          },
          body: JSON.stringify({ prompt, conversationId }),
        }
      );

      const reader = response.body?.getReader();
      const decoder = new TextDecoder();

      while (reader) {
        const { done, value } = await reader.read();
        if (done) break;

        let chunk = decoder.decode(value, { stream: true });

        // Extract metadata
        const metaMatch = chunk.match(/__META__({.*?})__/);
        if (metaMatch) {
          const meta = JSON.parse(metaMatch[1]);
          setConversationId(meta.conversationId);
          chunk = chunk.replace(/__META__.*?__\n?/, '');
        }

        // Append the chunk to the last message (the assistant's response)
        // without mutating the previous state object
        setMessages(prev => {
          const updated = [...prev];
          const last = updated[updated.length - 1];
          updated[updated.length - 1] = { ...last, content: last.content + chunk };
          return updated;
        });
      }
    } finally {
      setIsStreaming(false);
    }
  }, [agentId, conversationId]);

  return { messages, sendMessage, isStreaming, conversationId };
}
```

### Python

```python
import json
import re

import requests

def stream_chat(agent_id: str, prompt: str, conversation_id: str | None = None):
    """Stream a chat response from a Strands Agent."""
    response = requests.post(
        f"https://stream.api.universalapi.co/agent/{agent_id}/chat",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {ACCESS_TOKEN}",  # Recommended
            # Or use legacy headers:
            # "X-Uni-UserId": USER_ID,
            # "X-Uni-SecretUniversalKey": SECRET_KEY,
        },
        json={"prompt": prompt, "conversationId": conversation_id},
        stream=True,
    )

    full_response = ""
    metadata = {}

    for chunk in response.iter_content(chunk_size=None, decode_unicode=True):
        # Check for the metadata marker (a marker can be split across
        # chunks in rare cases; buffer the stream if you need to handle that)
        if "__META__" in chunk:
            match = re.search(r'__META__({.*?})__', chunk)
            if match:
                metadata = json.loads(match.group(1))
                chunk = re.sub(r'__META__.*?__\n?', '', chunk)

        full_response += chunk
        print(chunk, end="", flush=True)

    print()  # Newline at end
    return full_response, metadata.get("conversationId")

# Usage
response, conv_id = stream_chat("agent-id", "Hello!")
print(f"Conversation ID: {conv_id}")
```

## Error Handling

Errors are streamed as `__ERROR__` markers:

```
__META__{"conversationId": "abc123"}__
__ERROR__{"error": "AWS Bedrock credentials required. Please add your AWS credentials in the API Keys section."}__
```

### Common Errors

| Error | Cause | Solution |
|---|---|---|
| Authentication required | Missing or invalid credentials | Check Bearer token |
| Agent not found | Invalid agentId | Verify agent exists |
| Insufficient credits for Platform Bedrock | Less than 5 credits and no AWS keys | Add credits or store AWS credentials |
| Access denied | Agent is private | Use your own agent or a public agent |

## Platform Bedrock (Managed AI)

No AWS account needed. If you don't have AWS credentials stored, agents automatically use Universal API's own Bedrock access (Platform Bedrock):

- Bedrock token costs + 20% infrastructure fee are charged to your credits
- Requires ≥ 5 credits to start
- Zero configuration — it just works
- The `__META__` marker will include `"bedrockProvider": "platform"` when Platform Bedrock is active

If you store your own AWS credentials on the Credentials page, those are used instead and Bedrock costs go directly to your AWS bill.

### Detecting Platform Bedrock in Streams

```
__META__{"conversationId":"abc123","requestId":"req-xyz","bedrockProvider":"platform"}__
Hello! I'm using Platform Bedrock to respond...
```

When `bedrockProvider` is `"platform"` in the `__META__` marker, the agent is using Universal API's Bedrock credentials and token costs will be charged to your credits.
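Clients can branch on this flag, for example to show a "charged to credits" notice in the UI. A small Python sketch (the function name is ours; it assumes the metadata JSON has no nested braces):

```python
import json
import re

def uses_platform_bedrock(stream_prefix: str) -> bool:
    """Return True if the stream's __META__ marker reports Platform Bedrock."""
    m = re.search(r'__META__({.*?})__', stream_prefix)
    return bool(m) and json.loads(m.group(1)).get("bedrockProvider") == "platform"
```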

## Timeout Behavior

- **Streaming endpoint**: 15 minutes (900 seconds)
- **Buffered endpoint**: 5 minutes (300 seconds)

For long-running tasks, the streaming endpoint keeps sending data as long as the agent is processing. If no data is sent for an extended period, intermediate proxies may close the connection.
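When calling from Python with `requests`, the client-side read timeout should sit above the server-side limit so the client doesn't give up first. A hedged sketch; the helper name and the exact margins are our own choice:

```python
def client_timeout(base_url: str) -> tuple[int, int]:
    """Pick a (connect, read) timeout in seconds for a given endpoint.

    The read timeout exceeds the server-side limit (900s streaming,
    300s buffered) so the client outlasts the server, not the reverse.
    """
    if base_url.startswith("https://stream."):
        return (10, 960)  # streaming: 15-minute server limit + margin
    return (10, 330)      # buffered: 5-minute server limit + margin

# e.g. requests.post(url, json=body, stream=True, timeout=client_timeout(url))
```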

## Architecture Details

```
┌─────────────┐     ┌─────────────────────┐     ┌─────────────────────┐
│   Client    │────▶│   API Gateway       │────▶│   Lambda + LWA      │
│             │◀────│   (ResponseStream)  │◀────│   (FastAPI/Uvicorn) │
└─────────────┘     └─────────────────────┘     └─────────────────────┘
                           │                            │
                           │ ResponseTransferMode:      │ AWS_LWA_INVOKE_MODE:
                           │ STREAM                     │ RESPONSE_STREAM
                           │                            │
                           │ TimeoutInMillis:           │ Timeout: 900s
                           │ 900000                     │
                           │                            ▼
                           │                    ┌─────────────────────┐
                           │                    │   AWS Bedrock       │
                           │                    │   (Claude, etc.)    │
                           │                    └─────────────────────┘
```

Key Configuration:

1. **API Gateway Integration**:
    - `ResponseTransferMode: STREAM`
    - Uses the `response-streaming-invocations` Lambda endpoint
    - `TimeoutInMillis: 900000` (15 minutes)
2. **Lambda Environment**:
    - `AWS_LWA_INVOKE_MODE: RESPONSE_STREAM`
    - Lambda Web Adapter layer (ARM64)
    - FastAPI with `StreamingResponse`

## Best Practices

1. **Always handle the `__META__` marker** - Extract the `conversationId` for follow-up messages
2. **Use streaming for interactive UIs** - Better user experience with real-time feedback
3. **Implement reconnection logic** - Network issues can interrupt streams
4. **Parse markers before displaying** - Remove `__META__`, `__TOOL__`, and `__ERROR__` from user-visible output
5. **Set appropriate timeouts** - Client-side timeouts should exceed 15 minutes for long tasks
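Reconnection logic (point 3) can be as simple as a retry wrapper with exponential backoff around the whole streaming call. A minimal sketch; the names and retry policy are our own. Note that a retried call re-runs the prompt from the start, so pass the `conversationId` from the last successful response to keep context:

```python
import time

def with_retries(call, max_retries=3, base_delay=1.0,
                 retry_on=(ConnectionError, TimeoutError)):
    """Invoke `call()` and retry transient errors with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries; surface the error
            time.sleep(base_delay * 2 ** attempt)

# e.g. with_retries(lambda: stream_chat("agent-id", "Hello!", conv_id))
```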

## Next Steps

- Universal API - The agentic entry point to the universe of APIs