Voice-to-Text Agent Delegation

When a voice agent delegates tasks to a text agent, the platform automatically propagates session context so your MCP tools know which voice call they're serving. Zero configuration required.

The Pattern

Voice agents (bidi) can't use MCP servers directly — they delegate complex operations to text agents. This creates a multi-agent chain:

📞 Caller speaks
  → Voice Agent (bidi, real-time audio)
    → Text Agent (delegated via call_uapi_agent tool)
      → MCP Server (booking, CRM, database tools)

The problem: Without context propagation, the MCP server wouldn't know which voice call triggered the tool execution.

The solution: UAPI automatically injects parentConversationId at every hop — your MCP tools receive the full lineage with zero setup.

How It Works

┌──────────────────────────────────────────┐
│  Voice Agent (bidi runtime)              │
│                                          │
│  conversationId: "voice-session-abc"     │
│  agentId: "voice-agent-123"              │
│                                          │
│  → Calls text agent via call_uapi_agent  │
│  → Platform auto-injects context header  │
└──────────────────┬───────────────────────┘
                   │ (automatic)
                   ▼
┌──────────────────────────────────────────┐
│  Text Agent                              │
│                                          │
│  conversationId: "text-conv-xyz"         │
│  parentConversationId: "voice-session-abc" ← auto-set
│  parentAgentId: "voice-agent-123"        │ ← auto-set
│                                          │
│  → Calls MCP tools                       │
│  → Full lineage passed downstream        │
└──────────────────┬───────────────────────┘
                   │ (automatic)
                   ▼
┌──────────────────────────────────────────┐
│  MCP Server                              │
│                                          │
│  userContext.sessionContext = {          │
│    conversationId: "text-conv-xyz",      │
│    agentId: "text-agent-456",            │
│    userId: "user-789",                   │
│    parentConversationId: "voice-session-abc", ← the voice call!
│    parentAgentId: "voice-agent-123"      │
│  }                                       │
└──────────────────────────────────────────┘

Everything above happens automatically. You don't need to pass conversation IDs manually, configure headers, or modify your agent code.
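
The hand-off shown in the diagram can be modelled in a few lines. This is a minimal sketch of the lineage rule only, not the platform's implementation: `derive_child_context` is a hypothetical helper, the field names come from the diagram, and carrying `userId` across the hop unchanged is an assumption.

```python
def derive_child_context(parent: dict, conversation_id: str, agent_id: str) -> dict:
    """Model one delegation hop: the caller's IDs become the callee's parent IDs."""
    return {
        "conversationId": conversation_id,
        "agentId": agent_id,
        # Assumption: userId travels along unchanged from the calling agent
        "userId": parent.get("userId"),
        "parentConversationId": parent["conversationId"],
        "parentAgentId": parent["agentId"],
    }


voice = {
    "conversationId": "voice-session-abc",
    "agentId": "voice-agent-123",
    "userId": "user-789",
}
text = derive_child_context(voice, "text-conv-xyz", "text-agent-456")
print(text["parentConversationId"])  # voice-session-abc
```

The resulting dictionary has the same shape as the `sessionContext` shown in the MCP Server box.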

MCP Server: Accessing Parent Context

In your MCP server, access parentConversationId to link records back to the originating voice call:

javascript

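// Assumes the official MCP TypeScript SDK and zod are installed
const { McpServer } = require("@modelcontextprotocol/sdk/server/mcp.js");
const { z } = require("zod");
// `db` below stands for your own data-access layer (not shown)

function createMcpServer(userContext) {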
  const server = new McpServer({ name: "my-booking-server", version: "1.0.0" });

  const session = userContext.sessionContext || {};

  server.registerTool("create_booking", {
    description: "Create a booking for a patient",
    inputSchema: {
      patientName: z.string(),
      date: z.string(),
      time: z.string(),
      service: z.string(),
    }
  }, async ({ patientName, date, time, service }) => {
    const booking = await db.insert({
      patientName,
      date,
      time,
      service,
      // Link this booking to both the text conversation AND the voice call
      conversationId: session.conversationId,            // text agent conv
      voiceSessionId: session.parentConversationId,      // originating voice call
      voiceAgentId: session.parentAgentId,               // voice agent that initiated
      createdBy: session.userId,
    });

    return {
      content: [{ type: "text", text: `Booking confirmed: ${booking.id}` }]
    };
  });

  return server;
}
module.exports = { createMcpServer };

Key Fields

session.conversationId — The text agent conversation that directly called your tool
session.parentConversationId — The voice session (phone call) that triggered everything
session.parentAgentId — The voice agent UUID

Voice Agent Setup

Your voice agent just needs the standard call_uapi_agent tool to delegate work. The platform handles context propagation automatically:

python

from strands.experimental.bidi.agent import BidiAgent
from strands.experimental.bidi.models.nova_sonic import BidiNovaSonicModel
from strands import tool
import os, json, urllib.request

@tool
def call_uapi_agent(agent_id: str, prompt: str, conversation_id: str = "") -> str:
    """Delegate a task to a text agent.

    Args:
        agent_id: UUID of the text agent to call
        prompt: The task to delegate
        conversation_id: Optional, for multi-turn with the same text agent
    """
    bearer_token = os.environ.get("UNIVERSALAPI_BEARER_TOKEN", "")
    url = f"https://stream.api.universalapi.co/agent/{agent_id}/chat"

    payload = {"prompt": prompt}
    if conversation_id:
        payload["conversationId"] = conversation_id

    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {bearer_token}",
            "Content-Type": "application/json",
        },
    )

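    try:
        with urllib.request.urlopen(req, timeout=840) as resp:
            raw = resp.read().decode("utf-8")
    except (urllib.error.URLError, TimeoutError) as exc:
        # Return a message the voice agent can relay instead of raising mid-call
        return f"Delegation failed: {exc}"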

    # Strip metadata markers
    text_lines = []
    for line in raw.strip().split("\n"):
        if line.startswith(("__META__", "__METRICS__", "__COMPLETE__",
                            "__TOOL__", "__ERROR__")):
            continue
        text_lines.append(line)

    return "\n".join(text_lines).strip() or "Empty response."


def create_bidi_agent():
    model = BidiNovaSonicModel(
        region="us-east-1",
        model_id="amazon.nova-sonic-v1:0",
        provider_config={
            "audio": {
                "input_sample_rate": 16000,
                "output_sample_rate": 24000,
                "voice": "tiffany"
            }
        }
    )

    system_prompt = """You are a friendly AI receptionist.
When callers want to book an appointment or check availability,
delegate to the booking assistant using call_uapi_agent.
Keep voice responses concise (1-3 sentences)."""

    return BidiAgent(
        model=model,
        system_prompt=system_prompt,
        tools=[call_uapi_agent]
    )

That's it: no special configuration is needed for context propagation. When call_uapi_agent calls the text agent, the platform automatically attaches the voice session's context via the X-UAPI-Session-Context header.

Querying Correlated Conversations

To build a "call history" or "interaction timeline" view, query the AgentConversationTable for all text agent conversations spawned by a given voice session:

python

import boto3
from boto3.dynamodb.conditions import Attr

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('AgentConversationTable')

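def get_delegated_conversations(voice_session_id: str):
    """Get all text agent work delegated from a voice call."""
    scan_kwargs = {
        "FilterExpression": Attr("parentConversationId").eq(voice_session_id)
    }
    items = []
    while True:
        # Each scan call reads at most 1 MB, so follow LastEvaluatedKey to the end
        response = table.scan(**scan_kwargs)
        items.extend(response["Items"])
        if "LastEvaluatedKey" not in response:
            return items
        scan_kwargs["ExclusiveStartKey"] = response["LastEvaluatedKey"]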

Each returned item includes:

conversationId — The text agent conversation UUID
agentId — Which text agent handled it
parentConversationId — Links back to the voice session
parentAgentId — The voice agent that delegated
title — AI-generated title for the conversation
createdAt / updatedAt — Timestamps
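
A call-history view usually wants these items in chronological order. The following is a minimal sketch, assuming the `createdAt` values are mutually comparable (for example ISO 8601 strings or epoch numbers; the exact format is not stated here). `build_timeline` is a hypothetical helper, not part of the platform.

```python
def build_timeline(items: list) -> list:
    """Return delegated conversations oldest-first, keeping only display fields."""
    ordered = sorted(items, key=lambda item: item["createdAt"])
    return [
        {
            "conversationId": item["conversationId"],
            "agentId": item["agentId"],
            # title may be missing on very new conversations (assumption)
            "title": item.get("title", "(untitled)"),
            "createdAt": item["createdAt"],
        }
        for item in ordered
    ]


sample = [
    {"conversationId": "c2", "agentId": "a1", "title": "Reschedule", "createdAt": "2024-05-01T10:05:00Z"},
    {"conversationId": "c1", "agentId": "a1", "createdAt": "2024-05-01T10:01:00Z"},
]
timeline = build_timeline(sample)
print([entry["conversationId"] for entry in timeline])  # ['c1', 'c2']
```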

GSI Recommendation

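For production at scale, consider adding a GSI with parentConversationId as its partition key, so you can run a Query that reads only the matching items. A scan with a filter reads every item in the table and applies the filter afterwards, so its cost and latency grow with table size; at moderate volume it works fine.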
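
With such an index in place, the lookup becomes a Query against it. This is a minimal sketch, assuming a GSI named `parentConversationId-index` whose partition key is `parentConversationId`; the index name is hypothetical and must match whatever you create. The function takes the boto3 `Table` resource as an argument rather than creating it.

```python
def query_delegated_conversations(table, voice_session_id: str,
                                  index_name: str = "parentConversationId-index") -> list:
    """Query a (hypothetical) GSI for conversations spawned by one voice session."""
    kwargs = {
        "IndexName": index_name,
        "KeyConditionExpression": "parentConversationId = :pid",
        "ExpressionAttributeValues": {":pid": voice_session_id},
    }
    items = []
    while True:
        response = table.query(**kwargs)
        items.extend(response.get("Items", []))
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            return items
        # Query results are paginated too; continue from where the last page ended
        kwargs["ExclusiveStartKey"] = last_key
```

Call it with the same table object used for the scan, for example `query_delegated_conversations(table, "voice-session-abc")`.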

Complete Example: Dental Receptionist

The pieces above combine into a three-part setup:

1. Voice Agent (receptionist)

Answers phone calls, greets patients, delegates booking tasks.

2. Text Agent (booking assistant)

Connects to an MCP server with scheduling tools. Receives caller context automatically.

3. MCP Server (scheduling tools)

Creates bookings, checks availability. Links every booking record to the originating voice call via parentConversationId.

Result: When you look at a booking record in your database, you can trace it directly back to the specific phone call that created it — even though the booking was made by a text agent, not the voice agent itself.

What Gets Propagated

Field	Source	Available In
`conversationId`	Current text agent session	MCP `sessionContext`
`agentId`	Current text agent UUID	MCP `sessionContext`
`userId`	End-user (voice caller's authenticated user)	MCP `sessionContext`
`parentConversationId`	The voice session UUID	MCP `sessionContext`, DynamoDB record
`parentAgentId`	The voice agent UUID	MCP `sessionContext`, DynamoDB record
`channelId`	Channel UUID (if via Twilio)	MCP `sessionContext`
`platform`	`"twilio-voice"`, `"browser"`, etc.	MCP `sessionContext`
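
If you consume these fields from Python (for instance in a service that reads the stored records), a small typed wrapper makes missing values explicit. This is a sketch under assumptions: `SessionContext` and `from_raw` are hypothetical names, every field is treated as optional, and only the field names listed in the table are taken from the platform.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class SessionContext:
    """Typed view of the propagated fields; any of them may be absent."""
    conversationId: Optional[str] = None
    agentId: Optional[str] = None
    userId: Optional[str] = None
    parentConversationId: Optional[str] = None
    parentAgentId: Optional[str] = None
    channelId: Optional[str] = None
    platform: Optional[str] = None

    @classmethod
    def from_raw(cls, raw: Optional[dict]) -> "SessionContext":
        """Build from a raw dict, tolerating None and ignoring unknown keys."""
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in (raw or {}).items() if k in known})

    @property
    def is_delegated(self) -> bool:
        """True when another agent's conversation triggered this one."""
        return self.parentConversationId is not None


ctx = SessionContext.from_raw({
    "conversationId": "text-conv-xyz",
    "parentConversationId": "voice-session-abc",
    "platform": "twilio-voice",
})
print(ctx.is_delegated)  # True
```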

FAQ

Do I need to pass `parentConversationId` manually?

No. The platform handles it automatically when one agent calls another. The X-UAPI-Session-Context header is injected transparently on any HTTP call to UAPI domains from within an agent sandbox.

What if my text agent calls another text agent (3-level chain)?

Only the immediate parent's context propagates. In a chain Agent A → Agent B → Agent C, Agent C sees parentConversationId = Agent B's conversation. To find the root of the chain, follow the parentConversationId links upward one conversation at a time.
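
Walking up to the root can be sketched as a short loop. This assumes you have some way to look up a conversation's `parentConversationId` (for example from the stored conversation record); `find_root_conversation` and the `get_parent` callback are hypothetical names, and the hop limit is a safety guard rather than a platform rule.

```python
from typing import Callable, Optional


def find_root_conversation(conversation_id: str,
                           get_parent: Callable[[str], Optional[str]],
                           max_hops: int = 32) -> str:
    """Follow parentConversationId links until a conversation has no parent."""
    current = conversation_id
    for _ in range(max_hops):
        parent = get_parent(current)
        if parent is None:
            return current
        current = parent
    raise RuntimeError(f"No root found within {max_hops} hops from {conversation_id}")


# In-memory parent map for illustration: C was called by B, B by A
parents = {"conv-c": "conv-b", "conv-b": "conv-a"}
print(find_root_conversation("conv-c", parents.get))  # conv-a
```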

Does this work with channels (Slack, SMS, etc.)?

Yes! If the voice agent is triggered via a Twilio channel, channelId and platform are also included in the session context. Your MCP tools can see how the interaction started.

What if my MCP server is called directly (not from an agent)?

userContext.sessionContext will be null. Always handle this case:

javascript

const { parentConversationId } = userContext.sessionContext || {};
if (parentConversationId) {
  // Called from a delegated agent — link to parent
} else {
  // Called directly or from a single-level agent
}

Related

Session Context: Full sessionContext reference
Multi-Agent Patterns: The call_uapi_agent tool
Voice Agents: Creating voice agents with tools
Channels: Connecting agents to Twilio, Slack, etc.
