/library / template-voice-agent
templateAI agents

Realtime voice agent (OpenAI Realtime API + Twilio)

A phone-callable AI voice agent. Twilio handles the PSTN connection, OpenAI Realtime API handles speech-to-speech, your business logic intercepts via function calls. Use to replace 'press 1 for sales' phone trees with something that actually understands callers.

use whenSmall business gets too many calls to staff a receptionist but every missed call is lost revenue. The agent handles common asks (hours, booking, quote requests) and warm-transfers the rest.

May 18, 20262,440 bytesvoicetwilioopenairealtimephone

[Agent Name] · Realtime Voice

A phone-callable AI voice agent. Customer dials your Twilio number, agent picks up, conversation streams through OpenAI's Realtime API, your business logic provides function calls for booking, lookups, and warm transfers.

Source of truth

Production runs as a long-lived WebSocket relay on Fly.io. Twilio Media Streams send raw audio in; OpenAI Realtime sends audio back; your function calls hit your business backend. Postgres holds call transcripts + outcomes.

Tech stack

Node 22 + TypeScript + Fastify with ws for the Twilio Media Streams WebSocket. OpenAI Realtime API (gpt-realtime-mini for cost, full Realtime for harder calls). Twilio Programmable Voice (<Stream> TwiML). Postgres for transcripts. Optional: ElevenLabs voice for non-OpenAI voice options.

Deploy

fly deploy. Twilio number's webhook points at https://yourdomain.com/twilio/incoming. Set Twilio <Stream> to wss://yourdomain.com/twilio/stream.

File map

  • src/twilio/incoming.ts HTTP webhook returning TwiML that opens the stream
  • src/twilio/stream.ts WebSocket: Twilio audio in <-> OpenAI audio out
  • src/realtime/session.ts OpenAI Realtime session setup + event handlers
  • src/functions/ business logic exposed as Realtime function calls (lookup_hours, book_appointment, transfer_to_human)
  • src/db/ transcripts, call_outcomes tables
  • src/lib/audio.ts mulaw <-> PCM16 conversion (Twilio is mulaw, OpenAI wants PCM16)
  • prompts/system.md agent personality + escalation rules

.env keys

  • OPENAI_API_KEY
  • TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN
  • TWILIO_PHONE_NUMBER your bought number
  • BUSINESS_API_BASE your booking + lookup backend
  • DATABASE_URL
  • HUMAN_TRANSFER_NUMBER where to warm-transfer when agent escalates

Hard rules

  • Audio formats: Twilio sends 8kHz mulaw; OpenAI Realtime wants PCM16 (24kHz). Convert in both directions. There are off-by-one bugs here that ruin audio quality; test with a real phone call, not a recorded clip.
  • The agent's first turn MUST greet within 500ms or callers hang up. Keep system prompt short and instructions direct.
  • Function calls return JSON. Keep them small. The agent has to talk while waiting; don't make functions take 3 seconds.
  • Always have a human-transfer fallback. Some calls are emergencies. The agent must know its limits.
  • Recordings are sensitive. Comply with two-party consent states (announce recording in the greeting if any of your callers are in IL, CA, FL, MA, MD, MT, NH, PA, WA).
  • Cost: Realtime API is per-second-of-audio expensive. Cap call duration. Disconnect dormant calls.

Recent significant changes

  • 2026-05-18: Scaffolded. Locked: OpenAI Realtime over Deepgram+ElevenLabs pipeline (latency wins), Twilio over Vonage (better docs), Fly over AWS Lambda (long-lived WebSockets).

Next session: start here

  1. Buy a Twilio number. Set webhook to /twilio/incoming (use ngrok in dev).
  2. Wire OPENAI_API_KEY. Test the WebSocket relay with a real phone call.
  3. Implement first function call (e.g. lookup_business_hours). Make sure the agent uses it.
  4. Test escalation: ask something hard, confirm the warm transfer fires.
  5. Record 10 real test calls before going live. Listen to all of them.

Get the next CLAUDE.md in your inbox.

One new template every week, plus occasional case studies.