Voice-to-Text Agent Delegation
When a voice agent delegates tasks to a text agent, the platform automatically propagates session context so your MCP tools know which voice call they're serving. Zero configuration required.
The Pattern
Voice agents (bidi) can't use MCP servers directly — they delegate complex operations to text agents. This creates a multi-agent chain:
📞 Caller speaks
→ Voice Agent (bidi, real-time audio)
→ Text Agent (delegated via call_uapi_agent tool)
→ MCP Server (booking, CRM, database tools)
The problem: Without context propagation, the MCP server wouldn't know which voice call triggered the tool execution.
The solution: UAPI automatically injects parentConversationId at every hop, so your MCP tools receive the IDs of the immediate parent conversation and agent with zero setup.
How It Works
┌──────────────────────────────────────────┐
│ Voice Agent (bidi runtime) │
│ │
│ conversationId: "voice-session-abc" │
│ agentId: "voice-agent-123" │
│ │
│ → Calls text agent via call_uapi_agent │
│ → Platform auto-injects context header │
└──────────────────┬───────────────────────┘
│ (automatic)
▼
┌──────────────────────────────────────────┐
│ Text Agent │
│ │
│ conversationId: "text-conv-xyz" │
│ parentConversationId: "voice-session-abc" │ ← auto-set
│ parentAgentId: "voice-agent-123" │ ← auto-set
│ │
│ → Calls MCP tools │
│ → Full lineage passed downstream │
└──────────────────┬───────────────────────┘
│ (automatic)
▼
┌──────────────────────────────────────────┐
│ MCP Server │
│ │
│ userContext.sessionContext = { │
│ conversationId: "text-conv-xyz", │
│ agentId: "text-agent-456", │
│ userId: "user-789", │
│ parentConversationId: "voice-session-abc", │ ← the voice call!
│ parentAgentId: "voice-agent-123" │
│ } │
└──────────────────────────────────────────┘
Everything above happens automatically. You don't need to pass conversation IDs manually, configure headers, or modify your agent code.
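The hop-by-hop rule in the diagram can be written down as a small data transformation. The sketch below is illustrative only: `SessionContext` and `derive_child_context` are hypothetical names, not platform APIs, and the real propagation happens inside UAPI. It assumes the caller's own IDs become the child's parent IDs and that `userId` carries over unchanged, as the diagram suggests.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class SessionContext:
    """Hypothetical model of the fields shown in the diagram."""
    conversationId: str
    agentId: str
    userId: Optional[str] = None
    parentConversationId: Optional[str] = None
    parentAgentId: Optional[str] = None


def derive_child_context(parent: SessionContext,
                         child_conversation_id: str,
                         child_agent_id: str) -> SessionContext:
    """Build the context a delegated agent would see after one hop.

    Assumption: the caller's conversationId/agentId become the child's
    parentConversationId/parentAgentId, and userId is carried over.
    """
    return SessionContext(
        conversationId=child_conversation_id,
        agentId=child_agent_id,
        userId=parent.userId,
        parentConversationId=parent.conversationId,
        parentAgentId=parent.agentId,
    )


# The voice session from the diagram delegates to the text agent.
voice = SessionContext("voice-session-abc", "voice-agent-123", userId="user-789")
text = derive_child_context(voice, "text-conv-xyz", "text-agent-456")
print(asdict(text))
```

The resulting dictionary has the same shape as the userContext.sessionContext object shown in the MCP Server box.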
MCP Server: Accessing Parent Context
In your MCP server, access parentConversationId to link records back to the originating voice call:
javascript
// Assumes the official MCP TypeScript SDK and zod are installed
const { McpServer } = require("@modelcontextprotocol/sdk/server/mcp.js");
const { z } = require("zod");
// `db` stands for your application's database client (not shown here)

function createMcpServer(userContext) {
  const server = new McpServer({ name: "my-booking-server", version: "1.0.0" });
  // sessionContext is null when the server is called directly, so default to {}
  const session = userContext.sessionContext || {};

  server.registerTool("create_booking", {
    description: "Create a booking for a patient",
    inputSchema: {
      patientName: z.string(),
      date: z.string(),
      time: z.string(),
      service: z.string(),
    }
  }, async ({ patientName, date, time, service }) => {
    const booking = await db.insert({
      patientName,
      date,
      time,
      service,
      // Link this booking to both the text conversation AND the voice call
      conversationId: session.conversationId,       // text agent conv
      voiceSessionId: session.parentConversationId, // originating voice call
      voiceAgentId: session.parentAgentId,          // voice agent that initiated
      createdBy: session.userId,
    });
    return {
      content: [{ type: "text", text: `Booking confirmed: ${booking.id}` }]
    };
  });

  return server;
}

module.exports = { createMcpServer };
Key Fields
- session.conversationId — The text agent conversation that directly called your tool
- session.parentConversationId — The voice session (phone call) that triggered everything
- session.parentAgentId — The voice agent UUID
Voice Agent Setup
Your voice agent just needs the standard call_uapi_agent tool to delegate work. The platform handles context propagation automatically:
python
import json
import os
import urllib.request

from strands import tool
from strands.experimental.bidi.agent import BidiAgent
from strands.experimental.bidi.models.nova_sonic import BidiNovaSonicModel


@tool
def call_uapi_agent(agent_id: str, prompt: str, conversation_id: str = "") -> str:
    """Delegate a task to a text agent.

    Args:
        agent_id: UUID of the text agent to call
        prompt: The task to delegate
        conversation_id: Optional, for multi-turn with the same text agent
    """
    bearer_token = os.environ.get("UNIVERSALAPI_BEARER_TOKEN", "")
    url = f"https://stream.api.universalapi.co/agent/{agent_id}/chat"
    payload = {"prompt": prompt}
    if conversation_id:
        payload["conversationId"] = conversation_id
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {bearer_token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=840) as resp:
        raw = resp.read().decode("utf-8")
    # Strip metadata markers so only the agent's text reaches the caller
    text_lines = []
    for line in raw.strip().split("\n"):
        if line.startswith(("__META__", "__METRICS__", "__COMPLETE__",
                            "__TOOL__", "__ERROR__")):
            continue
        text_lines.append(line)
    return "\n".join(text_lines).strip() or "Empty response."


def create_bidi_agent():
    model = BidiNovaSonicModel(
        region="us-east-1",
        model_id="amazon.nova-sonic-v1:0",
        provider_config={
            "audio": {
                "input_sample_rate": 16000,
                "output_sample_rate": 24000,
                "voice": "tiffany"
            }
        }
    )
    system_prompt = """You are a friendly AI receptionist.
When callers want to book an appointment or check availability,
delegate to the booking assistant using call_uapi_agent.
Keep voice responses concise (1-3 sentences)."""
    return BidiAgent(
        model=model,
        system_prompt=system_prompt,
        tools=[call_uapi_agent]
    )
That's it: no special configuration is needed for context propagation. When call_uapi_agent calls the text agent, the platform automatically attaches the voice session's context via the X-UAPI-Session-Context header.
Querying Correlated Conversations
To build a "call history" or "interaction timeline" view, query the AgentConversationTable for all text agent conversations spawned by a given voice session:
python
import boto3
from boto3.dynamodb.conditions import Attr

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('AgentConversationTable')


def get_delegated_conversations(voice_session_id: str):
    """Get all text agent work delegated from a voice call."""
    scan_kwargs = {
        'FilterExpression': Attr('parentConversationId').eq(voice_session_id)
    }
    items = []
    while True:
        response = table.scan(**scan_kwargs)
        items.extend(response['Items'])
        # A scan returns at most 1 MB per page, so follow LastEvaluatedKey
        last_key = response.get('LastEvaluatedKey')
        if not last_key:
            return items
        scan_kwargs['ExclusiveStartKey'] = last_key
Each returned item includes:
- conversationId — The text agent conversation UUID
- agentId — Which text agent handled it
- parentConversationId — Links back to the voice session
- parentAgentId — The voice agent that delegated
- title — AI-generated title for the conversation
- createdAt / updatedAt — Timestamps
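To turn those items into a "call history" view, one option is to group them by voice session and order each group by creation time. This is a minimal sketch: `build_call_timelines` is a hypothetical helper, and it assumes createdAt values sort chronologically (for example ISO 8601 strings or numeric timestamps).

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def build_call_timelines(items: Iterable[dict]) -> Dict[str, List[dict]]:
    """Group delegated conversations by voice session, oldest first."""
    timelines: Dict[str, List[dict]] = defaultdict(list)
    for item in items:
        parent_id = item.get("parentConversationId")
        if parent_id:  # skip conversations that were not delegated
            timelines[parent_id].append(item)
    for conversations in timelines.values():
        # Assumption: createdAt is directly comparable (ISO 8601 or numeric)
        conversations.sort(key=lambda c: c["createdAt"])
    return dict(timelines)
```

Each key in the result is a voice session ID, and each value is the ordered list of text agent conversations that call produced.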
GSI Recommendation
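For production at scale, consider adding a global secondary index (GSI) with parentConversationId as its partition key. Lookups then become a Query that reads only the matching items, whereas a Scan reads the entire table and filters afterwards. For moderate volume, a filtered scan works fine.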
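If you do add such an index, the lookup could look roughly like the sketch below. The index name `parentConversationId-index` is an assumption (use whatever name you give the GSI), and `query_delegated_conversations` is a hypothetical helper. It takes a boto3 DynamoDB Table resource as an argument and uses the string form of the key condition.

```python
from typing import Any, Dict, List


def query_delegated_conversations(
    table: Any,
    voice_session_id: str,
    index_name: str = "parentConversationId-index",  # hypothetical GSI name
) -> List[Dict]:
    """Query a GSI keyed on parentConversationId, following pagination."""
    kwargs = {
        "IndexName": index_name,
        "KeyConditionExpression": "parentConversationId = :parent",
        "ExpressionAttributeValues": {":parent": voice_session_id},
    }
    items: List[Dict] = []
    while True:
        page = table.query(**kwargs)
        items.extend(page.get("Items", []))
        # Query results are paginated; continue until no key is returned
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        kwargs["ExclusiveStartKey"] = last_key
```

Passing the table in as a parameter keeps the function easy to exercise with a stub in unit tests.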
Complete Example: Dental Receptionist
Here's a complete real-world setup:
1. Voice Agent (receptionist)
Answers phone calls, greets patients, delegates booking tasks.
2. Text Agent (booking assistant)
Connects to an MCP server with scheduling tools. Receives caller context automatically.
3. MCP Server (scheduling tools)
Creates bookings, checks availability. Links every booking record to the originating voice call via parentConversationId.
Result: When you look at a booking record in your database, you can trace it directly back to the specific phone call that created it — even though the booking was made by a text agent, not the voice agent itself.
What Gets Propagated
| Field | Source | Available In |
|---|---|---|
| conversationId | Current text agent session | MCP sessionContext |
| agentId | Current text agent UUID | MCP sessionContext |
| userId | End-user (voice caller's authenticated user) | MCP sessionContext |
| parentConversationId | The voice session UUID | MCP sessionContext, DynamoDB record |
| parentAgentId | The voice agent UUID | MCP sessionContext, DynamoDB record |
| channelId | Channel UUID (if via Twilio) | MCP sessionContext |
| platform | "twilio-voice", "browser", etc. | MCP sessionContext |
FAQ
Do I need to pass parentConversationId manually?
No. The platform handles it automatically when one agent calls another. The X-UAPI-Session-Context header is injected transparently on any HTTP call to UAPI domains from within an agent sandbox.
What if my text agent calls another text agent (3-level chain)?
The immediate parent's context propagates. In a chain Agent A → Agent B → Agent C, Agent C sees parentConversationId = Agent B's conversation. To find the root of the chain, look up each conversation's parentConversationId in turn until you reach one that has no parent.
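Walking up a multi-level chain can be sketched as a short loop. `find_root_conversation` and the `get_conversation` lookup callable are hypothetical: supply whatever function fetches a conversation record by ID in your system (for example, a DynamoDB get on the conversation table). The hop limit and cycle check are defensive additions, not platform behavior.

```python
from typing import Callable, Optional


def find_root_conversation(
    conversation_id: str,
    get_conversation: Callable[[str], Optional[dict]],
    max_hops: int = 32,
) -> str:
    """Follow parentConversationId links upward and return the root ID."""
    current = conversation_id
    seen = set()
    for _ in range(max_hops):
        if current in seen:
            raise ValueError(f"cycle detected at {current}")
        seen.add(current)
        record = get_conversation(current)
        # A missing record or a record without a parent ends the walk
        parent = record.get("parentConversationId") if record else None
        if not parent:
            return current
        current = parent
    raise ValueError("parent chain exceeds max_hops")
```

If the top-level session has no stored record of its own, the walk still ends on its ID, because that ID was read from the child's parentConversationId.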
Does this work with channels (Slack, SMS, etc.)?
Yes! If the voice agent is triggered via a Twilio channel, channelId and platform are also included in the session context. Your MCP tools can see how the interaction started.
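As an illustration of using these fields inside a tool, the sketch below turns platform and channelId into a short label for logs or audit records. `describe_origin` and its label strings are hypothetical; only the field names and the "twilio-voice" and "browser" values come from this page.

```python
from typing import Optional


def describe_origin(session: Optional[dict]) -> str:
    """Return a human-readable label for how the interaction started."""
    session = session or {}  # sessionContext may be null on direct calls
    platform = session.get("platform")
    channel_id = session.get("channelId")
    if platform == "twilio-voice":
        label = "phone call"
    elif platform == "browser":
        label = "browser session"
    elif platform:
        label = f"{platform} session"
    else:
        return "unknown origin"
    return f"{label} via channel {channel_id}" if channel_id else label
```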
What if my MCP server is called directly (not from an agent)?
userContext.sessionContext will be null. Always handle this case:
javascript
const { parentConversationId } = userContext.sessionContext || {};
if (parentConversationId) {
  // Called from a delegated agent — link to parent
} else {
  // Called directly or from a single-level agent
}
Related
- Session Context — Full sessionContext reference
- Multi-Agent Patterns — The call_uapi_agent tool
- Voice Agents — Creating voice agents with tools
- Channels — Connecting agents to Twilio, Slack, etc.