
# [Agent Name] · Realtime Voice

A phone-callable AI voice agent. Customer dials your Twilio number, agent picks up, conversation streams through OpenAI's Realtime API, your business logic provides function calls for booking, lookups, and warm transfers.

## Source of truth
Production runs as a long-lived WebSocket relay on Fly.io. Twilio Media Streams send raw audio in; OpenAI Realtime sends audio back; your function calls hit your business backend. Postgres holds call transcripts + outcomes.

## Tech stack
Node 22 + TypeScript + Fastify with `ws` for the Twilio Media Streams WebSocket. OpenAI Realtime API (`gpt-realtime-mini` for cost, full Realtime for harder calls). Twilio Programmable Voice (`<Stream>` TwiML). Postgres for transcripts. Optional: ElevenLabs voice for non-OpenAI voice options.

## Deploy
`fly deploy`. Twilio number's webhook points at `https://yourdomain.com/twilio/incoming`. Set Twilio `<Stream>` to `wss://yourdomain.com/twilio/stream`.

## File map
- `src/twilio/incoming.ts` HTTP webhook returning TwiML that opens the stream
- `src/twilio/stream.ts` WebSocket: Twilio audio in <-> OpenAI audio out
- `src/realtime/session.ts` OpenAI Realtime session setup + event handlers
- `src/functions/` business logic exposed as Realtime function calls (lookup_hours, book_appointment, transfer_to_human)
- `src/db/` transcripts, call_outcomes tables
- `src/lib/audio.ts` mulaw <-> PCM16 conversion (Twilio is mulaw, OpenAI wants PCM16)
- `prompts/system.md` agent personality + escalation rules

## .env keys
- `OPENAI_API_KEY`
- `TWILIO_ACCOUNT_SID`, `TWILIO_AUTH_TOKEN`
- `TWILIO_PHONE_NUMBER` your bought number
- `BUSINESS_API_BASE` your booking + lookup backend
- `DATABASE_URL`
- `HUMAN_TRANSFER_NUMBER` where to warm-transfer when agent escalates

## Hard rules
- Audio formats: Twilio sends 8kHz mulaw; OpenAI Realtime wants PCM16 (24kHz). Convert in both directions. There are off-by-one bugs here that ruin audio quality; test with a real phone call, not a recorded clip.
- The agent's first turn MUST greet within 500ms or callers hang up. Keep `system` prompt short and `instructions` direct.
- Function calls return JSON. Keep them small. The agent has to talk while waiting; don't make functions take 3 seconds.
- Always have a human-transfer fallback. Some calls are emergencies. The agent must know its limits.
- Recordings are sensitive. Comply with two-party consent states (announce recording in the greeting if any of your callers are in IL, CA, FL, MA, MD, MT, NH, PA, WA).
- Cost: Realtime API is per-second-of-audio expensive. Cap call duration. Disconnect dormant calls.

## Recent significant changes
- 2026-05-18: Scaffolded. Locked: OpenAI Realtime over Deepgram+ElevenLabs pipeline (latency wins), Twilio over Vonage (better docs), Fly over AWS Lambda (long-lived WebSockets).

## Next session: start here
1. Buy a Twilio number. Set webhook to `/twilio/incoming` (use ngrok in dev).
2. Wire `OPENAI_API_KEY`. Test the WebSocket relay with a real phone call.
3. Implement first function call (e.g. `lookup_business_hours`). Make sure the agent uses it.
4. Test escalation: ask something hard, confirm the warm transfer fires.
5. Record 10 real test calls before going live. Listen to all of them.
