If you are choosing between Retell and a self-hosted stack (Dograh, LiveKit, Pipecat, Vocode), you do not need more feedback. You need a cost model, a real cost analysis, and a break-even point. This post gives you monthly cost breakdown tables comparing self-hosted and Retell.

What this post will answer (real numbers, not vibes)
You will leave with a cost formula you can realistically budget against, and a clear view of when self-hosting saves money and when it doesn’t.
What we are comparing (Self-hosted: Dograh, LiveKit, Pipecat, Vocode vs Retell)
Two paths show up in almost every voice agent project. Both can work. The costs and risks are different.
Path A: Retell (hosted platform)
Retell is a managed voice agent platform. You pay a per-minute platform fee, then usually pay other vendors for telephony and sometimes for models. Pricing starts at $0.07+/min for voice agents and $0.002+/msg for chat agents, but real costs can rise when you add STT/TTS/LLM and telephony items.
Path B: Self-hosted (framework + your infra + your vendors)
You run your own orchestration and deployment, using components like:
- Dograh (open source, drag-and-drop builder, self-hostable)
- LiveKit (real-time infrastructure, you can host or use LiveKit Cloud)
- Pipecat / Vocode (open source frameworks for voice pipelines)
With self-hosting, you control data flow, model choice, hosting region, and tooling. You also own uptime and debugging.
I use Dograh in my real estate business for inbound and outbound calls. It saves time for my sales team, and open source gives real customization options. That flexibility matters when you are not building a demo agent, but an agent that must follow strict rules.
The promise: a simple cost formula
You will get:
- A TCO formula for Retell and self-hosted
- Scenario tables for 500, 3,000, and 10,000 minutes/month
- A break-even chart approach you can recreate in a spreadsheet
- A place for dev time as a real cost
We also call out concurrency assumptions, because concurrency can change your bill fast.
Table of Contents
- What we used (hands-on notes, real deployments, and quick benchmarks)
- Myths to Ignore before you do the math
- Cost model (TCO): exact line items you must include
- Real cost scenarios: monthly totals (tables you can copy)
- Security and Compliance: when self-hosting is the simplest path
- Developer experience and build speed (time-to-first-call vs long-term control)
- Decision Guide: pick the cheapest + safest option for your team
- FAQ
What we used (hands-on notes, real deployments, and quick benchmarks)
The numbers here are taken from official websites and real deployments. They reflect realistic ranges, not artificial precision.
From deployments:
- Retell often lands around ~$0.12-$0.15/min excluding telephony in practice, depending on voice and model choices (deployment notes from our projects).
- Self-hosted can often be kept under ~$0.06/min excluding telephony at decent scale when using third-party STT/TTS/LLM plus your own hosting (deployment notes).
- Advanced self-hosting can go below ~$0.02/min, but it is tricky and not what we recommend early (deployment notes).
Published latency benchmarks show a real trade-off:
- Hosted pipeline (Deepgram + Grok + ElevenLabs): average end-to-end 800-1200ms, p95 1.5-2.0s
- Open-source pipeline (Whisper + Llama-4bit + Piper): average 900-1500ms, p95 2.0-3.0s
That difference can matter in real phone calls.
Myths to Ignore before you do the math
Skipping these myths will save you weeks of wasted work. They are common reasons teams pick the wrong approach.
Myth 1: "Self-hosted is always cheaper"
Self-hosted setups often win on running costs, but not always on total cost, especially at lower call volumes. Below around 100k calls, proprietary platforms can be cheaper overall. Early on, engineering effort and reliability work tend to dominate the self-hosted bill.
What drives the cost up for self-hosted:
- Building the call pipeline (streaming audio, interruptions, tools)
- Observability (traces, logs, recordings, redaction)
- On-call and incident response
- Load and concurrency testing
Self-hosting becomes cheaper when your usage is stable, you have repeatable infrastructure, and you are ready to own operations.
Myth 2: "Retell pricing is just per-minute"
Per-minute is only part of the bill. The rest is where budgets usually break.
Retell's pricing starts at $0.07+/min, but extra charges for STT, TTS, LLMs, and telephony can push real costs to $0.25-$0.33/min depending on configuration. Retell also lists branded outbound calls at $0.10 per outbound call on top of per-minute charges.
Retell provides a sample breakdown that illustrates stacking costs. In one example for 4,500 monthly calls using GPT-5, ElevenLabs/Cartesia voices, and custom telephony, the listed total monthly cost is $495.00, with a cost per minute shown as $0.110/min and TTS shown as $0.070/min.
Myth 3: "Compliance is solved if the vendor says so"
Certifications help, but they do not remove architecture risk. Voice agents handle recordings, transcripts, and PII.
A managed platform can add an extra hop in the data flow. That extra vendor in the path can create:
- Data residency conflicts
- Wider access surface area
- Harder deletion and retention guarantees
With self-hosting, you can keep more of the system inside your boundary. That reduces vendor exposure in regulated setups.
Glossary (key terms)
- Concurrency tiers (included concurrent calls): The maximum number of calls that can run at the same time under your plan. If you exceed it, you may throttle, queue, or pay more.
- SIP trunk (telephony/SIP): A provider connection that routes phone calls over the internet. It is how your agent sends/receives PSTN calls without being a telecom carrier.
- p95 first-response latency: The time it takes for the agent to produce the first audible response after the call starts, measured at the 95th percentile. p95 captures common worst-case delays.
- Call recording integrity: Whether recordings are complete, correctly linked to call IDs, and reliably stored. It affects debugging, compliance, and dispute handling.
- TCO (Total Cost of Ownership): The full cost including vendor fees, telephony, hosting, monitoring, and people time (build + maintenance + on-call).
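The p95 term above can be computed from per-call measurements. The following is a minimal sketch using the nearest-rank method; the helper name `percentile_nearest_rank` and the sample latencies are made up for illustration and do not come from any platform's API or from our deployments.

```python
import math

def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile (0 < pct <= 100) using the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[rank - 1]

# Hypothetical first-response latencies in milliseconds, one value per call.
first_response_ms = [820, 900, 950, 1010, 1100, 1150, 1200, 1300, 1900, 3400]

p50 = percentile_nearest_rank(first_response_ms, 50)
p95 = percentile_nearest_rank(first_response_ms, 95)
print(f"p50={p50}ms p95={p95}ms")
```

With these sample values the median looks healthy while the p95 exposes the slow calls, which is why the p95 is the number worth alerting on.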
Cost model (TCO): exact line items you must include
Below is the full checklist of line items for Retell and for a self-hosted voice agent.
Retell cost formula (per-minute + what is included vs not included)
Retell's cost is usually: Retell Monthly Cost = (Retell platform minutes x $/min) + telephony + add-ons + dev time
Common line items:
- Per-minute platform charges (starting at $0.07+/min)
- LLM usage (e.g., GPT-5 at $0.04/min; bundled, marked up, or BYO depending on setup)
- TTS usage (often a large component; Retell's example shows $0.070/min for ElevenLabs TTS)
- STT usage
- Branded outbound calls: $0.10 per outbound call
- Telephony (phone numbers + inbound/outbound minutes; often Twilio)
From implementations: Retell tends to work quickly, with solid defaults and fewer configuration traps. If you need an MVP fast, that usually outweighs small unit-cost differences.
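The Retell formula and line items above can be turned into a small calculator. This is a rough sketch, not Retell's billing logic: the function name is hypothetical, the per-minute values are the example figures quoted in the list, STT is left out because no rate is quoted, and the 400 outbound calls are an invented input. Whether the components stack exactly this way on a real invoice is an assumption to check against the pricing page.

```python
def retell_monthly_cost(minutes, per_minute_components, outbound_calls=0,
                        branded_fee_per_call=0.10, telephony=0.0, dev_time=0.0):
    """Stacked per-minute charges + per-call branded fees + telephony + dev time."""
    per_minute_total = sum(per_minute_components.values())
    return (minutes * per_minute_total
            + outbound_calls * branded_fee_per_call
            + telephony
            + dev_time)

# Example rates from the line items above (assumed to stack; STT not included).
components = {"platform": 0.07, "llm": 0.04, "tts": 0.07}

# 3,000 minutes with 400 branded outbound calls (hypothetical volume).
estimate = retell_monthly_cost(minutes=3_000,
                               per_minute_components=components,
                               outbound_calls=400)
print(f"Estimated Retell bill: ${estimate:,.2f}")
```

Keeping each component as its own dictionary entry makes it easy to swap a voice or model and see how the per-minute total moves.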
Self-hosted cost formula (hosting + inference + telephony + tooling + people time)
Self-hosting looks cheaper if you only compare compute. But voice agents are pipelines, not a single server.
Self-hosted Monthly Cost = hosting + model usage + telephony + tooling + dev/ops time
Typical line items:
Infrastructure
- CPU/GPU instances (inference or orchestration)
- Bandwidth and egress (data transferred out of your network or cloud to an external destination)
- Containers and deployment (Docker/Kubernetes)
- Load balancers, TURN/ICE if using WebRTC components
Models
- STT (cloud API or self-hosted)
- TTS (cloud API or self-hosted)
- LLM (cloud API or self-hosted)
Telephony
- Phone numbers (monthly)
- Inbound + outbound minutes
- SIP trunk fees
Tooling
- Logging, tracing, metrics
- Call recording storage
- Analytics and dashboards
- Evaluation tools (tests, call replay)
People time
- Initial build
- Maintenance
- On-call and incident response
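The self-hosted formula splits naturally into fixed monthly items and items that scale with minutes. Here is a minimal sketch under stated assumptions: the function name is hypothetical, and the $800 hosting, $1,500 people-time, and $0.05/min usage figures are placeholders for illustration, not measured costs.

```python
def self_hosted_monthly_cost(minutes, usage_per_minute, fixed_monthly,
                             telephony_per_minute=0.0):
    """Fixed line items plus everything that scales with call minutes."""
    fixed_total = sum(fixed_monthly.values())
    variable_total = minutes * (usage_per_minute + telephony_per_minute)
    return fixed_total + variable_total

# Placeholder inputs: swap in your own quotes and salary math.
fixed = {"hosting_and_tooling": 800.0, "dev_ops_time": 1500.0}
usage = 0.05  # assumed STT + TTS + LLM API spend per minute

for minutes in (500, 3_000, 10_000):
    total = self_hosted_monthly_cost(minutes, usage, fixed)
    print(f"{minutes:>6} min: ${total:,.2f}/month, "
          f"${total / minutes:.3f}/min effective")
```

The effective per-minute figure falls as volume grows because the fixed items are spread over more minutes, which is the core reason self-hosting looks worse at MVP scale and better later.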
GPU cost reference points help when you self-host heavier pieces. On-demand GPU pricing examples:
- AWS: A10 ~$1.21/hr, A100 ~$3.21/hr, H100 ~$4.20/hr
- GCP: T4 as low as ~$0.35/hr, L4 ~$0.67/hr on-demand zonal
Those numbers matter if you try to self-host STT/LLM/TTS at scale.
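To relate an hourly GPU price to a per-minute call cost, spread the hourly price over the call-minutes the GPU actually serves. This is a back-of-the-envelope sketch: the hourly prices are the ones listed above, while the calls-per-GPU and utilization values are hypothetical and depend heavily on model size and traffic shape.

```python
def gpu_cost_per_call_minute(gpu_hourly_usd, concurrent_calls_per_gpu, utilization):
    """Hourly GPU price divided by the call-minutes served in that hour."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    if concurrent_calls_per_gpu <= 0:
        raise ValueError("concurrent_calls_per_gpu must be positive")
    call_minutes_per_hour = 60 * concurrent_calls_per_gpu * utilization
    return gpu_hourly_usd / call_minutes_per_hour

# L4 at ~$0.67/hr, assuming 4 concurrent calls per GPU and 25% average utilization.
l4 = gpu_cost_per_call_minute(0.67, concurrent_calls_per_gpu=4, utilization=0.25)

# A100 at ~$3.21/hr, assuming 10 concurrent calls per GPU and 50% utilization.
a100 = gpu_cost_per_call_minute(3.21, concurrent_calls_per_gpu=10, utilization=0.50)

print(f"L4: ${l4:.4f}/min, A100: ${a100:.4f}/min")
```

Utilization is the input that usually surprises people: an idle GPU overnight costs the same as a busy one, so low traffic pushes the per-minute figure up quickly.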
Hidden costs checklist (both sides)
These are the line items people forget. Each one becomes money or time.
- Telephony phone numbers and per-minute calling
- Call recording storage (and retention policies)
- Retries, dropped calls, and re-dials
- Prompt and tool debugging time
- QA time with real call scripts
- Load testing for concurrency spikes
- Compliance reviews and vendor security questionnaires
- Outages and incident response
- Support delays (waiting days for an answer can block shipping)
Community reality check (from builder discussions): In the thread "Lost between LiveKit Cloud vs Vapi vs Retell...", builders estimate Retell around $275-$320/month at ~3,000 minutes, while LiveKit Cloud comes to $320-$350 plus dev time (full control, open source base). The theme is consistent: dev time decides early.
What changes the result (latency, concurrency, model choice, outbound dialing)
These are the biggest cost levers. If you optimize only one thing, optimize one of these.
- Minutes/month: the obvious multiplier
- Concurrency: more simultaneous calls means more infra and often higher tiers
- Model choice: LLM and TTS can dominate cost
- Outbound vs inbound: outbound adds dialing costs and sometimes per-call fees (Retell branded outbound adds $0.10 per outbound call)
- Latency targets: lower latency may require more expensive models or co-located servers
- Region placement: placing servers near users can reduce first-response latency (self-hosting gives you more control here)
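Concurrency is the lever that is hardest to eyeball, so it helps to estimate it from minutes and calling hours. In the sketch below, average concurrency is total call-minutes divided by the minutes in the calling window. The `peak_factor` multiplier is a crude assumption standing in for real traffic data, and the function names are hypothetical.

```python
import math

def average_concurrency(minutes_per_month, active_hours_per_month):
    """Average simultaneous calls = call-minutes / minutes in the calling window."""
    return minutes_per_month / (active_hours_per_month * 60)

def peak_concurrency(minutes_per_month, active_hours_per_month, peak_factor):
    """Rough peak estimate: average concurrency scaled by an assumed burst factor."""
    avg = average_concurrency(minutes_per_month, active_hours_per_month)
    return math.ceil(avg * peak_factor)

# 10,000 minutes/month, calls only during 8 hours x 22 business days = 176 hours.
avg = average_concurrency(10_000, active_hours_per_month=176)
peak = peak_concurrency(10_000, active_hours_per_month=176, peak_factor=5)
print(f"average ~{avg:.2f} concurrent calls, plan for ~{peak} at peak")
```

Outbound campaigns that dial in bursts can have a much higher peak factor than steady inbound traffic, so measure it once you have real call logs.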
Real cost scenarios: monthly totals (tables you can copy)
These tables are meant to be copied into a sheet. They include dev time because it is part of TCO.
Scenario table: 500 minutes/month (starter MVP)
At this size, build speed dominates. The per-minute delta is usually not the deciding factor.
Assumptions
- Excluding telephony in the base per-minute ranges (telephony added separately)
- Retell real-world platform range: $0.12-$0.15/min excl. telephony (deployment notes)
- Self-hosted run-cost range: $0.04-$0.06/min excl. telephony (deployment notes)
- Dev time is a one-time build cost spread over a month for comparison
Telephony reference (typical ranges)
- Twilio US inbound: $0.0085-$0.022/min, outbound: $0.013-$0.030/min
- Telnyx: platform charge $0.002/min plus trunking, often landing ~$0.003-$0.006/min domestic depending on route.
Monthly cost table (500 minutes)
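The run-cost and telephony rows follow directly from the assumptions above. This is a minimal sketch that multiplies each stated per-minute range by the monthly minutes. It uses Twilio's inbound range for telephony and leaves dev time out, since that figure is specific to your team.

```python
def monthly_range(minutes, per_minute_low, per_minute_high):
    """Low and high monthly cost for a per-minute price range."""
    return minutes * per_minute_low, minutes * per_minute_high

MINUTES = 500

rows = {
    "Retell (excl. telephony)":      monthly_range(MINUTES, 0.12, 0.15),
    "Self-hosted (excl. telephony)": monthly_range(MINUTES, 0.04, 0.06),
    "Twilio US inbound":             monthly_range(MINUTES, 0.0085, 0.022),
}

for label, (low, high) in rows.items():
    print(f"{label:<32} ${low:>7.2f} - ${high:>7.2f}")
```

At this volume the run-cost gap between the two paths is a few tens of dollars per month, which is small next to even a few hours of engineering time. Changing `MINUTES` to 3,000 reproduces the run-cost rows for the next scenario.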
My take at 500 minutes: Pick Retell unless you have a hard requirement for data control or you already have the infrastructure and skills to run this reliably. Self-hosting can be the right call, but it is rarely the cheapest path for an MVP.
Scenario table: 3,000 minutes/month (small business / agency client)
At this size, per-minute fees start to matter. Reliability starts to matter too.
Assumptions
- Retell: $0.12-$0.15/min excl. telephony (deployment notes)
- Self-hosted: $0.04-$0.06/min excl. telephony (deployment notes)
- Telephony: add separately
- Concurrency: assume 3-10 concurrent calls during peaks
Monthly cost table (3,000 minutes)
From deployments: This is where teams start feeling the Retell bill. It is also where self-hosted starts to look attractive if you can reuse the same platform across clients. If you are an agency, self-hosting becomes a competitive advantage because you stop paying platform markup on every client minute.
Scenario table: 10,000 minutes/month (scale point where break-even often shows)
At this size, platform markup hurts. Self-hosting often wins on run-cost if you do it properly.
We show two self-hosted variants:
- A: BYO cloud STT/TTS/LLM (simpler, common)
- B: Advanced self-hosted models (cheaper per minute, but non-trivial)
Assumptions
- Retell: $0.12-$0.15/min excl. telephony (deployment notes)
- Self-hosted A: $0.03-$0.06/min excl. telephony (deployment notes)
- Self-hosted B: <$0.02/min excl. telephony but higher complexity (deployment notes)
Monthly cost table (10,000 minutes)
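At 10,000 minutes the useful number is the monthly run-cost gap between Retell and each self-hosted variant. The sketch below derives it from the stated ranges only. Variant B is given as "under $0.02/min", so only its ceiling is computed, and hosting, telephony, and people time are deliberately excluded.

```python
MINUTES = 10_000

def run_cost(minutes, rate_per_minute):
    """Monthly run cost for a flat per-minute rate."""
    return minutes * rate_per_minute

retell_low, retell_high = run_cost(MINUTES, 0.12), run_cost(MINUTES, 0.15)
a_low, a_high = run_cost(MINUTES, 0.03), run_cost(MINUTES, 0.06)
b_ceiling = run_cost(MINUTES, 0.02)  # variant B is stated only as an upper bound

# Smallest and largest possible gap between Retell and variant A.
min_gap_a = retell_low - a_high
max_gap_a = retell_high - a_low

print(f"Retell:        ${retell_low:,.0f} - ${retell_high:,.0f}")
print(f"Self-hosted A: ${a_low:,.0f} - ${a_high:,.0f}")
print(f"Self-hosted B: under ${b_ceiling:,.0f}")
print(f"Run-cost gap vs A: ${min_gap_a:,.0f} - ${max_gap_a:,.0f} per month")
```

That gap is the budget available to pay for hosting, tooling, and the extra people time before self-hosting comes out ahead.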
Warning on variant B: you can drive unit cost below $0.02/min, but it is not beginner-friendly. You need expertise in model hosting, GPU scheduling, quantization, and incident handling. Most teams should start with open source orchestration (like Dograh) while still using third-party STT/TTS/LLM, then migrate pieces later.
Break-even chart: when does self-hosted beat Retell (and when it doesn't)
Break-even depends on dev time and ops maturity. The cheapest option changes as you scale.
A clean way to model it:
- Retell TCO(M) = (M x R) + T + D
- Self-hosted TCO(M) = (M x S) + H + T + D2
Where:
- M = minutes/month
- R = Retell $/min (e.g., 0.12-0.15 excl. telephony from deployments)
- S = self-hosted $/min (e.g., 0.04-0.06 excl. telephony from deployments)
- H = hosting + tooling (fixed-ish monthly)
- T = telephony (both sides, often similar)
- D / D2 = dev + maintenance time cost
Assumptions box you can swap
- Retell R = 0.13
- Self-hosted S = 0.05
- Hosting H = $800/month
- Dev (monthly equivalent) D2 - D = +$1,500/month at first, then declines as you reuse infra
Break-even minutes (rough): If (R - S) = $0.08/min, then $800 hosting breaks even at 10,000 minutes on run-cost alone. But if self-hosting costs you $1,500/month more in people time early, break-even shifts to ~28,750 minutes until you reuse the platform.
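The break-even arithmetic comes from setting the two TCO formulas equal. Telephony T cancels, leaving M = (H + (D2 - D)) / (R - S). Here is a small sketch of that calculation using the values from the assumptions box; the function name is hypothetical.

```python
def break_even_minutes(retell_rate, self_hosted_rate, hosting, extra_people_cost=0.0):
    """Minutes/month where self-hosted TCO equals Retell TCO.

    Solves M*R + T + D = M*S + H + T + D2 for M, where
    extra_people_cost is (D2 - D). Returns None when self-hosting
    has no per-minute advantage and therefore never breaks even.
    """
    margin = retell_rate - self_hosted_rate
    if margin <= 0:
        return None
    return (hosting + extra_people_cost) / margin

run_cost_only = break_even_minutes(0.13, 0.05, hosting=800)
with_people = break_even_minutes(0.13, 0.05, hosting=800, extra_people_cost=1_500)

print(f"Run-cost only:    {run_cost_only:,.0f} minutes/month")
print(f"With people time: {with_people:,.0f} minutes/month")
```

Putting R, S, H, and the people-time delta in spreadsheet cells and plotting both TCO lines against M gives the break-even chart: the crossing point is the value this function returns.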
That is why self-hosted can be cheaper per minute and still be the wrong decision in month one.
Reliability and voice quality: what matters in real calls
Reliability is what users notice. A voice agent that sounds smart but responds late loses trust fast.
Latency and first response time (why 3-4 seconds kills trust)
A few seconds of silence at the start of a call feels broken. In real deployments, a 3-4 second lag at the beginning can destroy credibility, especially if it persists.
Published benchmarks back the idea that hosted pipelines can be faster at p95:
- Hosted: p95 end-to-end 1.5-2.0s
- Open-source: p95 end-to-end 2.0-3.0s
That gap is not always your framework's fault. It is often:
- region placement
- model cold starts
- network routing
- audio chunk sizes
What is first-response latency (vs ongoing turn latency) in live calls?
First-response latency is the time from "call connected" to the first meaningful audio response. It includes time to start streaming, detect speech, transcribe, generate tokens, and synthesize speech.
Ongoing turn latency is what happens after the call settles. That is the back-and-forth delay during the conversation.
First-response latency matters more for perceived quality. If the agent starts strong, users forgive small delays later.
A practical fix we have used in self-hosted deployments is co-locating servers closer to end users. This can reduce first-response latency, and it is a real advantage when your users are concentrated in specific regions.
Retell vs self-hosted performance trade-offs (300ms edge vs more control)
Retell often feels snappier out of the box. In practice, Retell can have a latency advantage of roughly 300ms versus some other platforms, which users perceive as more natural (expert guidance).
Why Retell feels faster early:
- pre-optimized voice settings
- curated defaults
- fewer knobs to misconfigure
Why self-hosted can win long-term:
- you can pick any model and tune aggressively
- you can place servers near your users
- you can remove vendor hops
Self-hosting is not automatically faster. But with the right architecture, it can match hosted latency while giving you more control.
Uptime, dropped calls, and incident ownership (who gets paged?)
Someone always gets paged. The only question is whether it is your team or a vendor.
Reality check:
- Hosted platforms still depend on major cloud providers.
- Self-hosted depends on the same clouds, but you own the runbooks.
Ops checklist (copy this into your project doc):
- Health checks for STT/TTS/LLM providers
- Retry logic and fallbacks (voice downgrade, model fallback)
- Call recording integrity checks (missing segments, storage failures)
- Post-call logs that join: call ID -> transcript -> tool calls -> outcome
- Alerting on dropped calls and p95 first-response latency
- A plan for partial outages (TTS degraded, LLM slow)
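The retry and fallback item in the checklist can be sketched as a provider chain. This is an illustrative pattern, not any vendor's SDK: `primary_tts` and `backup_tts` are hypothetical stand-ins for real client calls, and a production version would catch provider-specific errors rather than every exception.

```python
import time

def call_with_fallback(providers, request, retries_per_provider=2, backoff_seconds=0.0):
    """Try each provider in order, retrying failures before falling back to the next."""
    errors = []
    for name, handler in providers:
        for attempt in range(1, retries_per_provider + 1):
            try:
                return name, handler(request)
            except Exception as exc:  # narrow this to provider errors in real code
                errors.append((name, attempt, repr(exc)))
                time.sleep(backoff_seconds * attempt)  # simple linear backoff
    raise RuntimeError(f"all providers failed: {errors}")

# Hypothetical stand-ins for a primary and a backup TTS vendor client.
def primary_tts(text):
    raise TimeoutError("primary TTS timed out")

def backup_tts(text):
    return f"<audio for: {text}>"

used, audio = call_with_fallback(
    [("primary", primary_tts), ("backup", backup_tts)],
    "Hello, thanks for calling.",
)
print(f"served by: {used}")
```

On a live call the retry budget has to stay small, because every retry adds to the silence the caller hears. Failing over quickly to a lower-quality voice is often better than waiting for the preferred one.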
Security and compliance: when self-hosting is the simplest path
Compliance is often a data-flow problem, not a checkbox problem. Where data travels is usually the hard part.
Data flow and the "extra hop" problem
Each additional vendor in your audio path is another risk surface. That is true even if vendors have strong security programs.
In many proprietary setups, audio, transcripts, and recordings flow through:
- telephony provider
- voice platform vendor
- model vendors (STT/TTS/LLM)
- your application
Self-hosting can reduce hops. At minimum, it lets you control what you store, where, and how long.
What is the "extra hop" problem in voice agent data flow?
The "extra hop" problem means your call data passes through an additional third party before it reaches your systems. That third party may see audio streams, transcripts, tool payloads, and metadata.
This matters because it can create:
- extra legal agreements (DPAs)
- extra breach exposure
- extra retention risk
- data residency conflicts
Self-hosting does not remove all vendors. But it can keep orchestration and storage inside your environment, which often simplifies audits.
Healthcare example (AI voice agent for healthcare)
Healthcare voice flows can include protected health information. That triggers strict access control, logging, and retention rules.
A practical architecture pattern we have used in regulated environments:
- Self-host orchestration and storage
- Encrypt recordings at rest
- Restrict staff access with least privilege
- Implement retention and deletion policies
- Log every access to transcripts and recordings
Even if a platform claims compliance, the extra hop can make internal approval harder. With self-hosting, the data resides on your servers, which often makes privacy controls easier to document and enforce.
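The retention and deletion item in the pattern above is easy to automate once recordings carry a storage timestamp. This is a minimal sketch assuming a hypothetical record shape (`call_id`, `stored_at`) and a 30-day policy chosen purely for illustration. Your actual retention period comes from your legal and compliance requirements.

```python
from datetime import datetime, timedelta, timezone

def recordings_past_retention(recordings, retention_days, now=None):
    """Return call IDs whose recordings are older than the retention window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    return [r["call_id"] for r in recordings if r["stored_at"] < cutoff]

# Hypothetical metadata rows; in practice these come from your storage index.
now = datetime(2025, 6, 30, tzinfo=timezone.utc)
recordings = [
    {"call_id": "call-001", "stored_at": datetime(2025, 5, 1, tzinfo=timezone.utc)},
    {"call_id": "call-002", "stored_at": datetime(2025, 6, 20, tzinfo=timezone.utc)},
]

expired = recordings_past_retention(recordings, retention_days=30, now=now)
print(expired)
```

Running a job like this on a schedule, and logging what it deleted, gives auditors evidence that the retention policy is enforced rather than just documented.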
Compliance cost line items (what it costs to implement either way)
Compliance has a real cost. Include it in your TCO.
- Security review time (internal + vendor)
- Legal/DPA review time
- Audit logging storage
- Key management (KMS, rotation)
- Access control (RBAC, SSO)
- Redaction pipeline (remove PII from logs)
- Incident response plan and drills
Self-hosted can reduce vendor scope, but increases your implementation work. Hosted can reduce engineering work, but increases vendor management and data-flow complexity.
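The redaction pipeline item is one of the cheaper pieces to prototype. The sketch below masks emails and phone-like digit runs in log lines with regular expressions. The patterns are deliberately simple assumptions: they will miss some formats and over-redact others (long numeric IDs, for example), so treat this as a starting point rather than a complete PII filter.

```python
import re

# Simple illustrative patterns, not an exhaustive PII detector.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact(text):
    """Replace emails and phone-like numbers with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)

line = "Caller asked us to ring +1 (415) 555-0132 or write to jane@example.com"
print(redact(line))
```

Redacting before logs leave the call-handling service keeps raw PII out of downstream tools such as dashboards and error trackers, which narrows what a security review has to cover.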
Developer experience and build speed (time-to-first-call vs long-term control)
Speed to first call is a business advantage. Long-term control is also a business advantage. You must choose what matters now.
Learning curve reality: Retell is fast; self-hosted is flexible
Retell is usually faster to ship. You can often get a basic agent running in a few hours because defaults are intuitive and not hidden behind deep configuration (expert guidance).
More configurable platforms can take longer. This is not because they are worse. It is because they expose more choices, and choices require decisions and testing.
In my own work, I found Pipecat pretty good. But for quick experiments, using the open source OpenAI Agents JS SDK was fast. I had Twilio plus a model working in about 30 minutes. That is the time-to-first-call advantage of a minimal stack.
How to build an AI voice agent (minimal stack for each option)
You can ship with either approach. The difference is who owns the moving parts.
A) Retell setup (fast path)
- Create an agent and define the prompt and tools
- Connect a phone number and call routing
- Add webhooks for your backend actions
- Test with real call scripts and monitor logs
Pricing and add-ons should be checked directly on Retell's pricing page, because the real bill depends on model and voice choices.
B) Self-hosted setup (Dograh / LiveKit / Pipecat / Vocode)
- Choose orchestration: start with Dograh if you want a visual builder plus self-hosting options. Use LiveKit/Pipecat/Vocode if you want framework-level control.
- Pick STT/TTS/LLM vendors (or self-host later)
- Connect telephony: Twilio, Plivo, or Telnyx, depending on rates and routing needs
- Deploy and monitor: logs, metrics, recordings, dashboards
- Load test concurrency and failure cases
Dograh is designed to reduce the self-hosted glue work. It supports bring-your-own telephony and models, and it is committed to open source.
Open source starting points (voice agent GitHub links to look for)
Open source saves cost only if the repo is alive. Use this checklist before you build on it.
What to look for in a repo:
- Recent commits and active issues
- Streaming audio support
- Telephony adapters (Twilio/SIP)
- Examples that run end-to-end
- Observability hooks (logs/traces)
- Testing or eval harness
Useful searches:
- "voice agent github"
- "ai calling agent github"
- "open source voice ai"
Also, this Dograh demo tutorial is a good reference for how quickly a workflow-based voice agent can be started.
Support and debugging reality (Discord support speed, ticket lag)
Support speed affects delivery dates. It is part of cost, even if it is not on an invoice.
From experience:
- Discord-based support can be slow for many platforms.
- Some tickets can sit for days or weeks.
- Self-hosted shifts support to your team and the community.
A practical rule: if your business needs strict response times, you need either enterprise support or internal ownership. Community-only support is rarely enough for mission-critical call flows.

Decision guide: pick the cheapest + safest option for your team
The cheapest option depends on who you are and what you are building. Use this as a map.
Persona map: solo builder vs 1 engineer vs enterprise
This framing is consistent across most serious cost breakdowns. It is also a practical way to choose.
Solo builder / non-technical team
- Best fit: Retell
- Reason: fastest time-to-first-call, fewer infrastructure decisions
One engineer (startup or agency)
- Best fit: start with Retell or Dograh + hosted model vendors
- Reason: ship quickly, then migrate pieces when minutes grow
- If you need flexibility early, Dograh gives open source control without starting from scratch
Enterprise / regulated
- Best fit: self-hosted with Dograh/LiveKit-style control
- Reason: data residency, audit needs, reduced extra-hop risk
- You will still use vendors, but you can reduce scope

Prerequisites (so the cost math is fair)
You cannot compare platforms fairly without these inputs. Collect them before you decide.
- Target regions (US, EU, or both)
- Expected minutes/month and peak concurrency
- Inbound/outbound split
- Recording and retention needs
- Required integrations (CRM, calendar, ticketing)
- A realistic hourly cost for engineering and operations
Where Dograh fits (and why we built it)
Self-hosting should not mean building everything from scratch. That is why Dograh exists.
Dograh is an open source platform for voice bots and AI calling agents:
- Drag-and-drop workflow builder
- Build workflows in plain English
- Bring-your-own telephony, STT, LLM, and TTS
- Cloud-hosted or self-hosted
- Multi-agent workflows to reduce hallucinations and enforce decision paths
- A testing suite in progress (Looptalk) to stress test agents with simulated personas
If you want an open source alternative to hosted platforms, but you still want a fast setup, start with Dograh. I am looking for beta users and contributors because the fastest way to make this stronger is feedback from real call flows.
Final takeaway
Hosted platforms are the right move for most teams at the start, and Retell is one of the fastest ways to get to production.
If you have compliance constraints, need tighter control of data flow, or you are paying for enough minutes that platform markup is becoming a line-item you feel, move to self-hosting. At that point, the operational burden is worth it because you stop paying a platform tax on every call.
Use the formulas and tables above, plug in your numbers, and decide with a spreadsheet instead of gut feel.
FAQ
1. Why does Retell feel cheaper at the beginning?
Retell reduces setup time, hides infrastructure complexity, and has optimized defaults. For MVPs and early launches, speed and reliability usually matter more than per-minute savings.
2. Is Retell pricing really just per minute?
No. Per-minute pricing is only part of the bill. STT, TTS, LLM usage, telephony, outbound call fees, and branded calls can significantly increase real-world costs.
3. Why does concurrency matter so much in voice AI costs?
Concurrency determines how many calls run at once. Higher concurrency increases infrastructure needs and can push you into higher pricing tiers. It’s one of the fastest ways for bills to spike unexpectedly.
4. What costs do teams usually miss when comparing voice platforms?
Common misses include engineering, on-call time, observability, call-recording storage, load testing for concurrency spikes, and compliance or vendor security reviews.
5. Why does latency matter more in voice than chat?
In voice, even a few seconds of silence feels broken. High first-response latency immediately damages trust, especially at the start of a call. Users forgive small delays later, but not a slow start.
6. Can self-hosted voice agents really go below $0.02 per minute?
Yes, but only with advanced setups. This usually requires self-hosting models, careful GPU scheduling, quantization, and strong ops discipline. It’s not recommended early because complexity and failure risk are high.