
Voice-to-Text Agent Delegation

When a voice agent delegates tasks to a text agent, the platform automatically propagates session context so your MCP tools know which voice call they're serving. Zero configuration required.

The Pattern

Voice agents (bidi) can't use MCP servers directly — they delegate complex operations to text agents. This creates a multi-agent chain:

📞 Caller speaks
  → Voice Agent (bidi, real-time audio)
    → Text Agent (delegated via call_uapi_agent tool)
      → MCP Server (booking, CRM, database tools)

The problem: Without context propagation, the MCP server wouldn't know which voice call triggered the tool execution.

The solution: UAPI automatically injects parentConversationId at every hop — your MCP tools receive the full lineage with zero setup.

How It Works

┌────────────────────────────────────────────────┐
│  Voice Agent (bidi runtime)                    │
│                                                │
│  conversationId: "voice-session-abc"           │
│  agentId: "voice-agent-123"                    │
│                                                │
│  → Calls text agent via call_uapi_agent        │
│  → Platform auto-injects context header        │
└──────────────────┬─────────────────────────────┘
                   │ (automatic)
                   ▼
┌────────────────────────────────────────────────┐
│  Text Agent                                    │
│                                                │
│  conversationId: "text-conv-xyz"               │
│  parentConversationId: "voice-session-abc"     │ ← auto-set
│  parentAgentId: "voice-agent-123"              │ ← auto-set
│                                                │
│  → Calls MCP tools                             │
│  → Full lineage passed downstream              │
└──────────────────┬─────────────────────────────┘
                   │ (automatic)
                   ▼
┌────────────────────────────────────────────────┐
│  MCP Server                                    │
│                                                │
│  userContext.sessionContext = {                │
│    conversationId: "text-conv-xyz",            │
│    agentId: "text-agent-456",                  │
│    userId: "user-789",                         │
│    parentConversationId: "voice-session-abc",  │ ← the voice call!
│    parentAgentId: "voice-agent-123"            │
│  }                                             │
└────────────────────────────────────────────────┘

Everything above happens automatically. You don't need to pass conversation IDs manually, configure headers, or modify your agent code.
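
To make the hop-by-hop rule concrete, here is a minimal Python sketch of the lineage shown in the diagram. It is only a model for reasoning about the fields, not the platform's implementation: the function name derive_child_context and the plain-dict representation are assumptions made for illustration.

```python
def derive_child_context(parent_ctx: dict, child_conversation_id: str,
                         child_agent_id: str) -> dict:
    """Model the context a delegated agent ends up with (hypothetical helper)."""
    return {
        "conversationId": child_conversation_id,
        "agentId": child_agent_id,
        # The end user is carried along unchanged.
        "userId": parent_ctx.get("userId"),
        # Only the immediate caller is recorded as the parent.
        "parentConversationId": parent_ctx.get("conversationId"),
        "parentAgentId": parent_ctx.get("agentId"),
    }


# Sample values taken from the diagram above.
voice_ctx = {
    "conversationId": "voice-session-abc",
    "agentId": "voice-agent-123",
    "userId": "user-789",
}
text_ctx = derive_child_context(voice_ctx, "text-conv-xyz", "text-agent-456")
print(text_ctx["parentConversationId"])  # voice-session-abc
```

The resulting dictionary matches what the MCP server box in the diagram receives as userContext.sessionContext.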

MCP Server: Accessing Parent Context

In your MCP server, access parentConversationId to link records back to the originating voice call:

javascript
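// Assumed imports: the official MCP TypeScript SDK and zod.
// `db` below stands in for your own data layer and is not defined here.
const { McpServer } = require("@modelcontextprotocol/sdk/server/mcp.js");
const { z } = require("zod");

function createMcpServer(userContext) {
  const server = new McpServer({ name: "my-booking-server", version: "1.0.0" });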

  const session = userContext.sessionContext || {};

  server.registerTool("create_booking", {
    description: "Create a booking for a patient",
    inputSchema: {
      patientName: z.string(),
      date: z.string(),
      time: z.string(),
      service: z.string(),
    }
  }, async ({ patientName, date, time, service }) => {
    const booking = await db.insert({
      patientName,
      date,
      time,
      service,
      // Link this booking to both the text conversation AND the voice call
      conversationId: session.conversationId,            // text agent conv
      voiceSessionId: session.parentConversationId,      // originating voice call
      voiceAgentId: session.parentAgentId,               // voice agent that initiated
      createdBy: session.userId,
    });

    return {
      content: [{ type: "text", text: `Booking confirmed: ${booking.id}` }]
    };
  });

  return server;
}
module.exports = { createMcpServer };

Key Fields

  • session.conversationId — The text agent conversation that directly called your tool
  • session.parentConversationId — The voice session (phone call) that triggered everything
  • session.parentAgentId — The voice agent UUID

Voice Agent Setup

Your voice agent just needs the standard call_uapi_agent tool to delegate work. The platform handles context propagation automatically:

python
from strands.experimental.bidi.agent import BidiAgent
from strands.experimental.bidi.models.nova_sonic import BidiNovaSonicModel
from strands import tool
import os, json, urllib.request

@tool
def call_uapi_agent(agent_id: str, prompt: str, conversation_id: str = "") -> str:
    """Delegate a task to a text agent.

    Args:
        agent_id: UUID of the text agent to call
        prompt: The task to delegate
        conversation_id: Optional, for multi-turn with the same text agent
    """
    bearer_token = os.environ.get("UNIVERSALAPI_BEARER_TOKEN", "")
    url = f"https://stream.api.universalapi.co/agent/{agent_id}/chat"

    payload = {"prompt": prompt}
    if conversation_id:
        payload["conversationId"] = conversation_id

    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {bearer_token}",
            "Content-Type": "application/json",
        },
    )

    with urllib.request.urlopen(req, timeout=840) as resp:
        raw = resp.read().decode("utf-8")

    # Strip metadata markers
    text_lines = []
    for line in raw.strip().split("\n"):
        if line.startswith(("__META__", "__METRICS__", "__COMPLETE__",
                            "__TOOL__", "__ERROR__")):
            continue
        text_lines.append(line)

    return "\n".join(text_lines).strip() or "Empty response."


def create_bidi_agent():
    model = BidiNovaSonicModel(
        region="us-east-1",
        model_id="amazon.nova-sonic-v1:0",
        provider_config={
            "audio": {
                "input_sample_rate": 16000,
                "output_sample_rate": 24000,
                "voice": "tiffany"
            }
        }
    )

    system_prompt = """You are a friendly AI receptionist.
When callers want to book an appointment or check availability,
delegate to the booking assistant using call_uapi_agent.
Keep voice responses concise (1-3 sentences)."""

    return BidiAgent(
        model=model,
        system_prompt=system_prompt,
        tools=[call_uapi_agent]
    )

That's it: no special configuration is needed for context propagation. When call_uapi_agent calls the text agent, the platform automatically attaches the voice session's context via the X-UAPI-Session-Context header.

Querying Correlated Conversations

To build a "call history" or "interaction timeline" view, query the AgentConversationTable for all text agent conversations spawned by a given voice session:

python
import boto3
from boto3.dynamodb.conditions import Attr

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('AgentConversationTable')

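def get_delegated_conversations(voice_session_id: str):
    """Get all text agent work delegated from a voice call."""
    scan_kwargs = {
        "FilterExpression": Attr("parentConversationId").eq(voice_session_id)
    }
    items = []
    while True:
        response = table.scan(**scan_kwargs)
        items.extend(response["Items"])
        # A single Scan call reads at most 1 MB, so keep paging
        # until DynamoDB stops returning a LastEvaluatedKey.
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            return items
        scan_kwargs["ExclusiveStartKey"] = last_key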

Each returned item includes:

  • conversationId — The text agent conversation UUID
  • agentId — Which text agent handled it
  • parentConversationId — Links back to the voice session
  • parentAgentId — The voice agent that delegated
  • title — AI-generated title for the conversation
  • createdAt / updatedAt — Timestamps
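
The "interaction timeline" view mentioned above can be assembled from these items. The sketch below orders delegated conversations chronologically and keeps only the display fields. It assumes createdAt is an ISO 8601 string, which sorts correctly as text; the timestamp format is not stated here, so adjust the sort key if yours differs. build_timeline is a hypothetical helper name and the sample items are made up.

```python
def build_timeline(items: list[dict]) -> list[dict]:
    """Return delegated conversations oldest-first with display fields only."""
    ordered = sorted(items, key=lambda item: item.get("createdAt", ""))
    return [
        {
            "conversationId": item["conversationId"],
            "agentId": item.get("agentId"),
            "title": item.get("title", "(untitled)"),
            "createdAt": item.get("createdAt"),
        }
        for item in ordered
    ]


# Made-up sample data in the shape described above.
sample_items = [
    {"conversationId": "text-conv-2", "agentId": "text-agent-456",
     "title": "Reschedule cleaning", "createdAt": "2025-01-01T10:05:00Z"},
    {"conversationId": "text-conv-1", "agentId": "text-agent-456",
     "title": "Check availability", "createdAt": "2025-01-01T10:01:00Z"},
]
for entry in build_timeline(sample_items):
    print(entry["createdAt"], entry["title"])
```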

GSI Recommendation

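For production at scale, consider adding a global secondary index (GSI) on parentConversationId so you can use Query instead of Scan. A scan reads every item in the table and filters afterwards, so its cost grows with the size of the table; a query against the index reads only the matching items. For moderate volume, a scan with a filter works fine.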
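
If you do add such an index, the lookup might look like the sketch below. The index name parentConversationId-index is an assumption (use whatever name you give the GSI), and the function takes the table object as a parameter so it can be exercised without AWS access. It follows LastEvaluatedKey because a single Query call returns at most 1 MB of data.

```python
def query_delegated_conversations(table, voice_session_id: str,
                                  index_name: str = "parentConversationId-index"):
    """Fetch conversations delegated from one voice session via a GSI query.

    `table` is expected to be a boto3 DynamoDB Table resource, e.g.
    boto3.resource("dynamodb").Table("AgentConversationTable").
    """
    query_kwargs = {
        "IndexName": index_name,  # hypothetical index name
        "KeyConditionExpression": "parentConversationId = :parent",
        "ExpressionAttributeValues": {":parent": voice_session_id},
    }
    items = []
    while True:
        response = table.query(**query_kwargs)
        items.extend(response.get("Items", []))
        # Continue from where the previous page stopped, if there is more.
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            return items
        query_kwargs["ExclusiveStartKey"] = last_key
```

Passing the table in, rather than creating it inside the function, also makes the function easy to test with a stub object.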

Complete Example: Dental Receptionist

Here's a complete real-world setup:

1. Voice Agent (receptionist)

Answers phone calls, greets patients, delegates booking tasks.

2. Text Agent (booking assistant)

Connects to an MCP server with scheduling tools. Receives caller context automatically.

3. MCP Server (scheduling tools)

Creates bookings, checks availability. Links every booking record to the originating voice call via parentConversationId.

Result: When you look at a booking record in your database, you can trace it directly back to the specific phone call that created it — even though the booking was made by a text agent, not the voice agent itself.

What Gets Propagated

| Field | Source | Available In |
| --- | --- | --- |
| conversationId | Current text agent session | MCP sessionContext |
| agentId | Current text agent UUID | MCP sessionContext |
| userId | End-user (voice caller's authenticated user) | MCP sessionContext |
| parentConversationId | The voice session UUID | MCP sessionContext, DynamoDB record |
| parentAgentId | The voice agent UUID | MCP sessionContext, DynamoDB record |
| channelId | Channel UUID (if via Twilio) | MCP sessionContext |
| platform | "twilio-voice", "browser", etc. | MCP sessionContext |
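
One way to keep these fields straight in tool code is to parse them into a typed structure once. The Python sketch below mirrors the fields listed above. The SessionContext class, its from_dict helper, and the is_delegated property are illustrative names rather than part of any UAPI SDK, and every field is treated as optional because the parent and channel fields are only present on some call paths.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SessionContext:
    """Typed view of the propagated fields (illustrative, not an SDK class)."""
    conversation_id: Optional[str] = None
    agent_id: Optional[str] = None
    user_id: Optional[str] = None
    parent_conversation_id: Optional[str] = None
    parent_agent_id: Optional[str] = None
    channel_id: Optional[str] = None
    platform: Optional[str] = None

    @classmethod
    def from_dict(cls, raw: Optional[dict]) -> "SessionContext":
        raw = raw or {}  # tolerate a missing (null) sessionContext
        return cls(
            conversation_id=raw.get("conversationId"),
            agent_id=raw.get("agentId"),
            user_id=raw.get("userId"),
            parent_conversation_id=raw.get("parentConversationId"),
            parent_agent_id=raw.get("parentAgentId"),
            channel_id=raw.get("channelId"),
            platform=raw.get("platform"),
        )

    @property
    def is_delegated(self) -> bool:
        """True when a parent conversation is recorded for this call."""
        return self.parent_conversation_id is not None


ctx = SessionContext.from_dict({
    "conversationId": "text-conv-xyz",
    "agentId": "text-agent-456",
    "userId": "user-789",
    "parentConversationId": "voice-session-abc",
    "parentAgentId": "voice-agent-123",
    "platform": "twilio-voice",
})
print(ctx.is_delegated)                            # True
print(SessionContext.from_dict(None).is_delegated)  # False
```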

FAQ

Do I need to pass parentConversationId manually?

No. The platform handles it automatically when one agent calls another. The X-UAPI-Session-Context header is injected transparently on any HTTP call to UAPI domains from within an agent sandbox.

What if my text agent calls another text agent (3-level chain)?

Only the immediate parent's context propagates. In a chain Agent A → Agent B → Agent C, Agent C sees parentConversationId set to Agent B's conversation, not Agent A's. To find the root of the chain, follow the parentConversationId links upward one hop at a time.
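
Following the chain upward can be sketched as a simple loop. The example below is hypothetical: get_conversation stands in for whatever lookup you have from a conversationId to its stored record (for example, a DynamoDB read), and the in-memory dictionary is made-up sample data. The visited set guards against malformed data that would otherwise loop forever.

```python
from typing import Callable, Optional


def find_root_conversation(conversation_id: str,
                           get_conversation: Callable[[str], Optional[dict]]) -> str:
    """Follow parentConversationId links until no further parent is known."""
    visited = set()
    current = conversation_id
    while current not in visited:
        visited.add(current)
        record = get_conversation(current)
        # A missing record or a record without a parent ends the walk.
        parent = record.get("parentConversationId") if record else None
        if not parent:
            return current
        current = parent
    raise ValueError(f"Cycle detected in parent chain at {current!r}")


# Made-up records for a three-level chain: conv-a -> conv-b -> conv-c.
records = {
    "conv-a": {"conversationId": "conv-a"},
    "conv-b": {"conversationId": "conv-b", "parentConversationId": "conv-a"},
    "conv-c": {"conversationId": "conv-c", "parentConversationId": "conv-b"},
}
print(find_root_conversation("conv-c", records.get))  # conv-a
```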

Does this work with channels (Slack, SMS, etc.)?

Yes! If the voice agent is triggered via a Twilio channel, channelId and platform are also included in the session context. Your MCP tools can see how the interaction started.

What if my MCP server is called directly (not from an agent)?

userContext.sessionContext will be null. Always handle this case:

javascript
const { parentConversationId } = userContext.sessionContext || {};
if (parentConversationId) {
  // Called from a delegated agent — link to parent
} else {
  // Called directly or from a single-level agent
}
