Voice Agents

Voice agents enable real-time, bidirectional voice conversations powered by Amazon Nova Sonic. Users speak naturally and the agent replies with lifelike speech, with no typing required.

What Are Voice Agents?

Voice agents (also called "bidi agents") are a special type of agent on Universal API that use WebSocket connections for real-time audio streaming. Unlike text agents that use HTTP request/response, voice agents maintain a persistent connection for continuous, natural conversation.

Key capabilities:

🎙️ Real-time speech-to-speech conversations
🔧 Tool use during voice conversations (check availability, book appointments, etc.)
🌐 Embeddable on any website with one script tag
📞 Phone integration via Twilio (inbound and outbound calls)
🗣️ Multiple voice personalities (tiffany, matthew, amy)
⚡ Barge-in support (users can interrupt the agent mid-sentence)

Quick Start

1. Write the Agent Source Code

Voice agents define a create_bidi_agent() function (instead of create_agent() for text agents):

python

from strands.experimental.bidi.agent import BidiAgent
from strands.experimental.bidi.models.nova_sonic import BidiNovaSonicModel

def create_bidi_agent():
    model = BidiNovaSonicModel(
        region="us-east-1",
        model_id="amazon.nova-sonic-v1:0",
        provider_config={
            "audio": {
                "input_sample_rate": 16000,
                "output_sample_rate": 24000,
                "voice": "tiffany"
            }
        }
    )

    system_prompt = """You are a friendly voice assistant.
Keep responses concise (1-3 sentences) for natural conversation.
Confirm important details by repeating them back."""

    return BidiAgent(model=model, system_prompt=system_prompt)

2. Deploy via API

Create the agent with agentType: "bidi":

bash

curl -X POST https://api.universalapi.co/agent/create \
  -H "Authorization: Bearer uapi_ut_your_token" \
  -H "Content-Type: application/json" \
  -d '{
    "agentName": "my-voice-assistant",
    "description": "A friendly voice assistant",
    "agentType": "bidi",
    "sourceCode": "...",
    "visibility": "public"
  }'

Or use the create_agent MCP tool with agentType="bidi".
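The same request can be assembled from Python, which avoids hand-escaping the agent source into a JSON string. The following is a minimal sketch using only the standard library. The endpoint and field names come from the curl example above, while the helper name `build_create_request` is hypothetical and not part of any SDK.

```python
import json
import urllib.request

API_URL = "https://api.universalapi.co/agent/create"  # endpoint from the curl example


def build_create_request(token: str, name: str, description: str,
                         source_code: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a bidi agent."""
    payload = {
        "agentName": name,
        "description": description,
        "agentType": "bidi",
        "sourceCode": source_code,  # json.dumps escapes newlines and quotes
        "visibility": "public",
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the request, pass the result to `urllib.request.urlopen`. The shape of the response is not described in this section, so inspect it before relying on any particular field.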

3. Connect Users

Choose one or more delivery methods:

Embed Widget — One script tag on any website
WebSocket — Custom browser integration
Twilio Phone — Inbound/outbound phone calls

Available Voices

Voice	Style	Best For
`tiffany`	Warm, professional female	Customer service, receptionists
`matthew`	Clear, friendly male	General assistants, support
`amy`	British English female	International audiences

Set the voice in the provider_config:

python

provider_config={
    "audio": {
        "input_sample_rate": 16000,
        "output_sample_rate": 24000,
        "voice": "tiffany"  # or "matthew" or "amy"
    }
}
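If the voice is chosen at deploy time, for example from a setting, it may help to validate the name before building the config. A small sketch, assuming the three voices in the table above are the only valid values; `audio_provider_config` is a hypothetical helper, not part of the Strands library.

```python
SUPPORTED_VOICES = ("tiffany", "matthew", "amy")  # voices listed in the table above


def audio_provider_config(voice: str = "tiffany") -> dict:
    """Return a provider_config dict for the given voice, rejecting unknown names."""
    if voice not in SUPPORTED_VOICES:
        raise ValueError(f"Unknown voice {voice!r}; expected one of {SUPPORTED_VOICES}")
    return {
        "audio": {
            "input_sample_rate": 16000,
            "output_sample_rate": 24000,
            "voice": voice,
        }
    }
```

The returned dict can be passed as `provider_config=audio_provider_config("amy")` when constructing the model.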

Adding Tools

Voice agents can use tools just like text agents. Define tools with the @tool decorator:

python

from strands.experimental.bidi.agent import BidiAgent
from strands.experimental.bidi.models.nova_sonic import BidiNovaSonicModel
from strands import tool

@tool
def check_availability(date: str, time_preference: str) -> dict:
    """Check appointment availability for a given date and time preference."""
    # Your implementation — call your booking API, database, etc.
    return {"available": True, "slots": ["9:00 AM", "2:00 PM", "4:30 PM"]}

@tool
def book_appointment(patient_name: str, date: str, time: str, service: str) -> dict:
    """Book an appointment for a patient."""
    return {"confirmed": True, "confirmation_number": "APT-12345"}

@tool
def get_office_info(question: str) -> dict:
    """Answer questions about the office (hours, location, services)."""
    return {"answer": "We're open Monday-Friday, 8 AM to 6 PM."}

def create_bidi_agent():
    model = BidiNovaSonicModel(
        region="us-east-1",
        model_id="amazon.nova-sonic-v1:0",
        provider_config={
            "audio": {
                "input_sample_rate": 16000,
                "output_sample_rate": 24000,
                "voice": "tiffany"
            }
        }
    )

    system_prompt = """You are a friendly AI receptionist for Bright Smile Dental.
You can check appointment availability, book appointments, and answer office questions.
Keep responses concise and confirm details by repeating them back."""

    return BidiAgent(
        model=model,
        system_prompt=system_prompt,
        tools=[check_availability, book_appointment, get_office_info]
    )

Initial Prompt (Speak-First Greeting)

By default, Nova Sonic uses Voice Activity Detection (VAD) to determine when to speak — meaning the agent waits for the caller to talk first. To make the agent speak first (greet the caller immediately), use the initialPrompt field.

The initialPrompt is a text message injected into the model at connection start, triggering it to generate speech before any caller audio arrives.

Setting initialPrompt on the Agent

bash

curl -X POST https://api.universalapi.co/agent/create \
  -H "Authorization: Bearer uapi_ut_your_token" \
  -H "Content-Type: application/json" \
  -d '{
    "agentName": "my-receptionist",
    "agentType": "bidi",
    "sourceCode": "...",
    "initialPrompt": "A caller just connected to your phone line. Greet them with your standard opening."
  }'

Or update an existing agent:

bash

curl -X PUT https://api.universalapi.co/agent/update \
  -H "Authorization: Bearer uapi_ut_your_token" \
  -H "Content-Type: application/json" \
  -d '{
    "agentId": "your-agent-id",
    "initialPrompt": "A caller just connected. Greet them warmly and ask how you can help."
  }'

Channel-Level Override

You can also set initialPrompt on a channel, which overrides the agent-level value. This is useful when the same voice agent serves several phone lines that need different greetings:

bash

curl -X PUT https://api.universalapi.co/channels/{channelId} \
  -H "Authorization: Bearer uapi_ut_your_token" \
  -H "Content-Type: application/json" \
  -d '{
    "initialPrompt": "A customer called the support line. Greet them and ask for their account number."
  }'

Priority: Channel initialPrompt > Agent initialPrompt
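The priority rule can be pictured as a simple fallback. This sketch only illustrates the documented ordering and is not the platform's actual code. The function name is hypothetical, and treating an empty string as "not set" is an assumption.

```python
from typing import Optional


def resolve_initial_prompt(agent_prompt: Optional[str],
                           channel_prompt: Optional[str]) -> Optional[str]:
    """Pick the prompt injected at connection start: channel value first, then agent value."""
    if channel_prompt:  # assumption: None and "" both mean "not set"
        return channel_prompt
    return agent_prompt
```

With no prompt at either level the function returns `None`, which corresponds to the default behavior where the agent waits for the caller to speak.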

Tips for Good Initial Prompts

Don't write the greeting itself — write an instruction that tells the agent to greet. The model reads its system prompt and generates an appropriate greeting.
Example: "A caller just connected. Greet them with your standard opening." — not "Hello! Welcome to our office."
Keep it short — the model already has its full system prompt for context.

Embed Widget

The embed widget is the easiest way to add a voice agent to your website. One script tag creates a floating voice chat button:

html

<script src="https://cdn.universalapi.co/embed/v1.js"
  data-text-agent="{agentId}"
  data-token="{emb_pk_live_xxx}"
  data-color="#6366f1"
  data-greeting="Hi! How can I help you today?"
  data-position="bottom-right">
</script>

Widget Parameters

Attribute	Required	Default	Description
`data-text-agent`	✅	—	Agent UUID
`data-token`	✅	—	Embed public key (`emb_pk_live_xxx`)
`data-position`	❌	`bottom-right`	`bottom-right` or `bottom-left`
`data-color`	❌	`#6366f1`	Brand color (hex)
`data-greeting`	❌	—	Custom greeting text shown before conversation starts
`data-trigger`	❌	—	CSS selector for a custom trigger element (hides the default floating button)
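When the snippet is generated server-side, for instance by a template, the attribute values should be HTML-escaped. A minimal sketch that renders the tag from the attributes in the table above; `embed_script_tag` is a hypothetical helper, and only the attribute names and the script URL are taken from this page.

```python
from html import escape
from typing import Optional

WIDGET_SRC = "https://cdn.universalapi.co/embed/v1.js"
VALID_POSITIONS = ("bottom-right", "bottom-left")


def embed_script_tag(agent_id: str, token: str, color: str = "#6366f1",
                     position: str = "bottom-right",
                     greeting: Optional[str] = None) -> str:
    """Render the widget <script> tag with HTML-escaped attribute values."""
    if position not in VALID_POSITIONS:
        raise ValueError(f"position must be one of {VALID_POSITIONS}")
    attrs = [
        ("src", WIDGET_SRC),
        ("data-text-agent", agent_id),
        ("data-token", token),
        ("data-color", color),
        ("data-position", position),
    ]
    if greeting is not None:
        attrs.append(("data-greeting", greeting))
    rendered = "\n  ".join(
        f'{name}="{escape(value, quote=True)}"' for name, value in attrs
    )
    return f"<script {rendered}>\n</script>"
```

Escaping matters mainly for `data-greeting`, which is free text and may contain quotes.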

How It Works

1. User clicks the floating button (or your custom trigger)
2. Browser requests microphone permission
3. WebSocket connection opens to voice.api.universalapi.co
4. Audio streams bidirectionally: the user speaks and the agent responds in real time
5. Widget uses Shadow DOM for complete style isolation from your site

WebSocket Integration

For custom browser integrations with full control over the UI:

Endpoint:

wss://voice.api.universalapi.co/ws/{agentId}?token=uapi_ut_xxx

Audio Format:

Send: PCM 16kHz mono, 16-bit signed little-endian
Receive: PCM 24kHz mono, 16-bit signed little-endian

JavaScript Example:

javascript

// Request microphone access
const stream = await navigator.mediaDevices.getUserMedia({
  audio: {
    sampleRate: 16000,
    channelCount: 1,
    echoCancellation: true,
    noiseSuppression: true
  }
});

// Connect to voice agent
const ws = new WebSocket(
  `wss://voice.api.universalapi.co/ws/${agentId}?token=${token}`
);
ws.binaryType = "arraybuffer";

// Send audio frames from AudioWorklet or MediaRecorder
// Receive audio frames and play via AudioContext
ws.onmessage = (event) => {
  if (event.data instanceof ArrayBuffer) {
    // Play the received 24 kHz PCM audio; playAudioBuffer is your own playback helper
    playAudioBuffer(event.data);
  }
};
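The audio format above is easy to get wrong when a client works with floating-point samples. The following sketch shows the conversion to 16-bit signed little-endian PCM and the resulting frame sizes. The helper names are hypothetical, and the 20 ms frame duration in the usage note is only an illustration, since this page does not specify a frame size.

```python
import struct

INPUT_SAMPLE_RATE = 16000   # Hz, audio sent to the agent
OUTPUT_SAMPLE_RATE = 24000  # Hz, audio received from the agent
BYTES_PER_SAMPLE = 2        # 16-bit samples, mono


def floats_to_pcm16le(samples) -> bytes:
    """Convert floats in [-1.0, 1.0] to 16-bit signed little-endian PCM bytes."""
    ints = [max(-32768, min(32767, int(round(s * 32767)))) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def frame_size_bytes(duration_ms: int, sample_rate: int = INPUT_SAMPLE_RATE) -> int:
    """Number of bytes in a mono frame of the given duration."""
    return sample_rate * duration_ms // 1000 * BYTES_PER_SAMPLE
```

For example, a 20 ms frame is 640 bytes at the 16 kHz send rate and 960 bytes at the 24 kHz receive rate.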

Twilio Phone Integration

Connect your voice agent to a phone number for real phone calls.

Inbound Calls (Customers Call You)

Create a Twilio voice channel:

bash

curl -X POST https://api.universalapi.co/channels \
  -H "Authorization: Bearer uapi_ut_your_token" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "office-phone",
    "platform": "twilio-voice",
    "agentId": "{your-voice-agent-id}",
    "platformConfig": {
      "accountSid": "ACxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
      "authToken": "your_twilio_auth_token",
      "phoneNumber": "+12125551234"
    }
  }'

Configure Twilio: In the Twilio Console, set your phone number's Voice webhook URL to the channel's webhookUrl returned in the response.
Test it: Call your Twilio phone number — the voice agent answers!

Outbound Calls (Agent Calls Customers)

bash

curl -X POST https://api.universalapi.co/channels/{channelId}/call \
  -H "Authorization: Bearer uapi_ut_your_token" \
  -H "Content-Type: application/json" \
  -d '{"to": "+12125559876"}'

The agent initiates the call and begins the voice conversation when the recipient answers.
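An outbound call can also be triggered from Python. This sketch builds the request shown in the curl example and checks the number first. The example number above is in E.164 format, which Twilio uses; whether the API rejects other formats is not stated here, so the check is an assumption, and `build_call_request` is a hypothetical helper.

```python
import json
import re
import urllib.request

E164 = re.compile(r"^\+[1-9]\d{1,14}$")  # "+" followed by at most 15 digits


def build_call_request(token: str, channel_id: str,
                       to_number: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts an outbound call."""
    if not E164.match(to_number):
        raise ValueError(f"{to_number!r} is not in E.164 format, e.g. +12125559876")
    url = f"https://api.universalapi.co/channels/{channel_id}/call"
    return urllib.request.Request(
        url,
        data=json.dumps({"to": to_number}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Send the request with `urllib.request.urlopen(build_call_request(...))`.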

Billing

Voice agents cost approximately 50 credits/minute while connected
Billing is per-second (no rounding up to full minutes)
2 credit minimum per session
Check your balance: GET /user/credits
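To budget for a session, the figures above can be combined into a rough estimate. This sketch assumes the approximate rate of 50 credits per minute applies linearly per second and that the 2-credit minimum is a floor on the session total. How the platform rounds fractional credits is not stated here, so the result is left unrounded.

```python
CREDITS_PER_MINUTE = 50  # approximate rate from the list above
MINIMUM_CREDITS = 2      # per-session minimum


def estimate_session_credits(seconds: float) -> float:
    """Estimate the credits used by a session of the given length in seconds."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    return max(MINIMUM_CREDITS, seconds * CREDITS_PER_MINUTE / 60)
```

Under these assumptions a 90-second call comes to about 75 credits, and a call that ends after one second is charged the 2-credit minimum.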

Best Practices

System Prompt Tips for Voice

Keep it concise — Voice users expect quick responses. Aim for 1-3 sentences per turn.
Use confirmation patterns — Repeat back important details (dates, names, phone numbers).
Handle interruptions gracefully — Nova Sonic supports barge-in. Design prompts that work when cut short.
Avoid long lists — Limit to 3-4 items and offer to continue.
Use natural filler phrases — "Let me check on that" or "One moment" while processing tool calls.
Be conversational — Avoid robotic or overly formal language.

Example System Prompt

You are a friendly AI receptionist for Bright Smile Dental in Denver.

Your capabilities:
- Check appointment availability
- Book appointments
- Answer questions about services, hours, and location

Guidelines:
- Keep responses to 1-3 sentences
- Always confirm appointment details before booking
- If unsure about something, offer to transfer to a human
- Be warm and professional

Embed Token Management

The embed widget requires an embed token (emb_pk_live_xxx) to authenticate. Embed tokens are domain-restricted publishable keys — safe to include in client-side HTML.

Create an Embed Token

bash

curl -X POST https://api.universalapi.co/embed/create \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "agentId": "YOUR_AGENT_ID",
    "allowedDomains": ["yourdomain.com", "localhost"],
    "rateLimitPerDay": 1000,
    "rateLimitPerIp": 10,
    "greeting": "Hi! Click to start a voice conversation.",
    "color": "#6366f1",
    "position": "bottom-right"
  }'

Parameters:

Parameter	Type	Required	Description
`agentId`	string	✅	Voice agent UUID
`allowedDomains`	string[]	No	Domains where widget can load (default: `["localhost"]`). Use `["*"]` for any domain.
`rateLimitPerDay`	number	No	Max calls/day across all users (default: 1000)
`rateLimitPerIp`	number	No	Max calls per IP per day (default: 10)
`greeting`	string	No	Initial greeting shown in widget
`color`	string	No	Hex color for widget button (default: `#6366f1`)
`position`	string	No	`bottom-right` or `bottom-left`
`minutesLimit`	number	No	Max minutes per call
`creditLimit`	number	No	Max credits this token can consume

List Embed Tokens

bash

curl https://api.universalapi.co/embed/list \
  -H "Authorization: Bearer YOUR_TOKEN"

Update an Embed Token

bash

curl -X PUT https://api.universalapi.co/embed/EMBED_TOKEN_ID \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"allowedDomains": ["newdomain.com"], "rateLimitPerIp": 20}'

Revoke an Embed Token

bash

curl -X DELETE https://api.universalapi.co/embed/EMBED_TOKEN_ID \
  -H "Authorization: Bearer YOUR_TOKEN"

Get Widget Config (Public)

The widget script itself calls this endpoint. No user authentication is needed, only the embed token:

bash

curl "https://api.universalapi.co/embed/config?token=emb_pk_live_xxx"

Returns the widget configuration (agentId, greeting, color, position) if the Origin header matches an allowed domain.
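The Origin check can be pictured roughly as follows. This is an illustrative sketch and not the service's actual logic. It assumes exact hostname matching against `allowedDomains`, with `["*"]` allowing any domain as described in the parameter table. Whether subdomains or ports are treated differently is not documented here.

```python
from urllib.parse import urlparse


def origin_allowed(origin: str, allowed_domains: list) -> bool:
    """Return True if the Origin header's hostname is in the allow-list."""
    if "*" in allowed_domains:
        return True
    host = urlparse(origin).hostname  # None if the origin has no scheme
    return host is not None and host in allowed_domains
```

Under this assumption, `http://localhost:5173` matches an allow-list containing `localhost` because the port is ignored.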

API Reference

Endpoint	Method	Description
`/agent/create`	POST	Create voice agent (set `agentType: "bidi"`)
`wss://voice.api.universalapi.co/ws/{agentId}`	WebSocket	Browser voice chat
`wss://voice.api.universalapi.co/ws/twilio/{agentId}`	WebSocket	Twilio phone integration
`/channels`	POST	Create Twilio voice channel
`/channels/{channelId}/call`	POST	Make outbound phone call
`/embed/create`	POST	Create embed token
`/embed/list`	GET	List your embed tokens
`/embed/{id}`	PUT	Update embed token
`/embed/{id}`	DELETE	Revoke embed token
`/embed/config`	GET	Get widget config (public)
`https://voice.api.universalapi.co/health`	GET	Voice runtime health check

Related Resources

Creating Agents — General agent creation guide
Streaming — Text agent streaming responses
Blog: AI Agent Outbound Phone Calls with Twilio — Step-by-step Twilio setup tutorial
