Telecom platforms are increasingly expected to support real-time AI interactions, yet most implementations rely on CPaaS abstractions that hide the underlying call mechanics.
This session presents a practical implementation of a WhatsApp voice integration built on SIP, using Kamailio as the core.
We start with the signaling and media layer:
- Handling WhatsApp voice calls via Meta’s gateways
- Managing RTP streams and media flow
- Implementing routing logic, authentication, and CDR generation in Kamailio
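To make the media side concrete, here is a minimal sketch of reading the fixed RTP header defined in RFC 3550, which any component that handles the streams has to do before it can reorder or decode audio. It is not the code presented in the session. The names `RtpHeader` and `parse_rtp_header` are hypothetical, and the payload type 111 in the usage note is only an example value.

```python
import struct
from dataclasses import dataclass


@dataclass
class RtpHeader:
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int
    payload_offset: int  # index of the first payload byte in the packet


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the RTP header (RFC 3550) and locate the payload.

    Hypothetical helper for illustration. Trailing padding is reported
    via the `padding` flag but not stripped.
    """
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-byte fixed RTP header")

    # Network byte order: two flag bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit SSRC.
    b0, b1, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])

    version = b0 >> 6
    if version != 2:
        raise ValueError(f"unsupported RTP version {version}")

    csrc_count = b0 & 0x0F
    extension = bool(b0 & 0x10)
    offset = 12 + 4 * csrc_count  # skip the CSRC identifiers

    if extension:
        if len(packet) < offset + 4:
            raise ValueError("truncated RTP header extension")
        # Extension header: 16-bit profile id, 16-bit length in 32-bit words.
        _profile, ext_words = struct.unpack("!HH", packet[offset:offset + 4])
        offset += 4 + 4 * ext_words

    if len(packet) < offset:
        raise ValueError("truncated RTP packet")

    return RtpHeader(
        version=version,
        padding=bool(b0 & 0x20),
        extension=extension,
        csrc_count=csrc_count,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence=sequence,
        timestamp=timestamp,
        ssrc=ssrc,
        payload_offset=offset,
    )
```

For example, `parse_rtp_header(struct.pack("!BBHII", 0x80, 0xEF, 1000, 160000, 0xDEADBEEF) + b"\x01\x02")` yields a header with the marker bit set, payload type 111, sequence number 1000, and the payload starting at byte 12.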
On top of this, we introduce an open source AI voice service integrated as a SIP endpoint:
- Real-time RTP stream capture
- Streaming audio to STT services
- Processing with an LLM
- Returning synthesized speech (TTS) into the live call
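The four steps above can be read as one conversational turn. The sketch below shows that shape with `asyncio`, assuming each stage is an injectable coroutine. It is not the open source service itself: `handle_turn`, `fake_stt`, `fake_llm`, and `fake_tts` are hypothetical stand-ins, and the stubs do no real speech or language processing.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, List

# Hypothetical stage signatures; a real service would wrap actual
# STT, LLM, and TTS clients behind the same shapes.
Stt = Callable[[AsyncIterator[bytes]], Awaitable[str]]
Llm = Callable[[str], Awaitable[str]]
Tts = Callable[[str], AsyncIterator[bytes]]
Sender = Callable[[bytes], Awaitable[None]]


async def handle_turn(
    audio_in: AsyncIterator[bytes],
    stt: Stt,
    llm: Llm,
    tts: Tts,
    send_audio: Sender,
) -> None:
    """Run one turn: caller audio -> text -> reply text -> reply audio."""
    transcript = await stt(audio_in)   # consume captured audio frames
    reply = await llm(transcript)      # produce the text response
    async for chunk in tts(reply):     # stream synthesized audio back
        await send_audio(chunk)


# Stub stages, for illustration only.
async def fake_stt(frames: AsyncIterator[bytes]) -> str:
    total = 0
    async for frame in frames:
        total += len(frame)
    return f"caller spoke {total} bytes"


async def fake_llm(text: str) -> str:
    return f"reply to: {text}"


async def fake_tts(text: str) -> AsyncIterator[bytes]:
    for word in text.split():
        yield word.encode()


async def demo() -> List[bytes]:
    async def frames() -> AsyncIterator[bytes]:
        for _ in range(3):
            yield b"\x00" * 160  # three dummy audio frames

    sent: List[bytes] = []

    async def send(chunk: bytes) -> None:
        sent.append(chunk)

    await handle_turn(frames(), fake_stt, fake_llm, fake_tts, send)
    return sent
```

Running `asyncio.run(demo())` returns the chunks that would be written back into the call. Keeping the stages injectable is one way to swap STT, LLM, or TTS providers without touching the call handling.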
We close with several example services and the lessons learned from real-world usage of the service.


