If you are choosing between Retell and a self-hosted stack (Dograh, LiveKit, Pipecat, Vocode), you do not need more feedback. You need a cost model, real cost analysis, and a break-even point. This post gives you monthly cost breakdown tables for self-hosted vs Retell.

What this post will answer (real numbers, not vibes)

You will leave with a cost formula you can realistically budget against, and a clear view of when self-hosting saves money and when it doesn’t.

What we are comparing (Self-hosted: Dograh, LiveKit, Pipecat, Vocode vs Retell)

Two paths show up in almost every voice agent project. Both can work. The costs and risks are different.

Path A: Retell (hosted platform) Retell is a managed voice agent platform. You pay a per-minute platform fee, then usually pay other vendors for telephony and sometimes for models. Pricing starts at $0.07+/min for voice agents and $0.002+/msg for chat agents, but real costs can rise when you add STT/TTS/LLM and telephony items.

Path B: Self-hosted (framework + your infra + your vendors) You run your own orchestration and deployment, using components like:

  • Dograh (open source, drag-and-drop builder, self-hostable)
  • LiveKit (real-time infrastructure, you can host or use LiveKit Cloud)
  • Pipecat / Vocode (open source frameworks for voice pipelines)

With self-hosting, you control data flow, model choice, hosting region, and tooling. You also own uptime and debugging.

I use Dograh in my real estate business for inbound and outbound calls. It saves time for my sales team, and open source gives real customization options. That flexibility matters when you are not building a demo agent, but an agent that must follow strict rules.

The promise: a simple cost formula

You will get:

  • A TCO formula for Retell and self-hosted
  • Scenario tables for 500, 3,000, and 10,000 minutes/month
  • A break-even chart approach you can recreate in a spreadsheet
  • A place for dev time as a real cost

We also call out concurrency assumptions, because concurrency can change your bill fast.

What we used (hands-on notes, real deployments, and quick benchmarks)

The numbers here are taken from official websites and real deployments. They reflect realistic ranges, not artificial precision.

From deployments:

  • Retell often lands around ~$0.12-$0.15/min excluding telephony in practice, depending on voice and model choices (deployment notes from our projects).
  • Self-hosted can often be kept under ~$0.06/min excluding telephony at decent scale when using third-party STT/TTS/LLM plus your own hosting (deployment notes).
  • Advanced self-hosting can go below ~$0.02/min, but it is tricky and not what we recommend early (deployment notes).

Published latency benchmarks show a real trade-off:

  • Hosted pipeline (Deepgram + Grok + ElevenLabs): average end-to-end 800-1200ms, p95 1.5-2.0s
  • Open-source pipeline (Whisper + Llama-4bit + Piper): average 900-1500ms, p95 2.0-3.0s

That difference can matter in real phone calls.

Myths to Ignore before you do the math

Skipping these myths will save you weeks of wasted work. They are common reasons teams pick the wrong approach.

Myth 1: "Self-hosted is always cheaper"

Self-hosted setups often win on running costs, but not always on total cost, especially at lower call volumes. Below around 100k calls, proprietary platforms can be cheaper overall. Early on, engineering effort and reliability work tend to dominate the self-hosted bill.

What drives the cost up for self-hosted:

  • Building the call pipeline (streaming audio, interruptions, tools)
  • Observability (traces, logs, recordings, redaction)
  • On-call and incident response
  • Load and concurrency testing

Self-hosting becomes cheaper when your usage is stable, you have repeatable infrastructure, and you are ready to own operations.

Myth 2: "Retell pricing is just per-minute"

Per-minute is only part of the bill. The rest is where budgets usually break.

Retell's pricing starts at $0.07+/min, but extra charges for STT, TTS, LLMs, and telephony can push real costs to $0.25-$0.33/min depending on configuration. Retell also lists branded outbound calls at $0.10 per outbound call on top of per-minute charges.

Retell provides a sample breakdown that illustrates stacking costs. In one example for 4,500 monthly calls using GPT-5, ElevenLabs/Cartesia voices, and custom telephony, the listed total monthly cost is $495.00, with a cost per minute shown as $0.110/min and TTS shown as $0.070/min.

Myth 3: "Compliance is solved if the vendor says so"

Certifications help, but they do not remove architecture risk. Voice agents handle recordings, transcripts, and PII.

A managed platform can add an extra hop in the data flow. That extra vendor in the path can create:

  • Data residency conflicts
  • Wider access surface area
  • Harder deletion and retention guarantees

With self-hosting, you can keep more of the system inside your boundary. That reduces vendor exposure in regulated setups.

Cost model (TCO): exact line items you must include

Below is the full checklist of line items for Retell and for a self-hosted voice agent.

Retell cost formula (per-minute + what is included vs not included)

Retell's cost is usually: Retell Monthly Cost = (Retell platform minutes x $/min) + telephony + add-ons + dev time

Common line items:

  • Per-minute platform charges (starting at $0.07+/min)
  • LLM usage (e.g., GPT-5 at $0.04/min; bundled, marked up, or BYO depending on setup)
  • TTS usage (often a large component; Retell's example shows $0.070/min for ElevenLabs TTS)
  • STT usage
  • Branded outbound calls: $0.10 per outbound call
  • Telephony (phone numbers + inbound/outbound minutes; often Twilio)
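
The formula above can be sketched in a few lines of Python. This is a minimal illustration, not Retell's billing logic: the field names are hypothetical, `platform_per_min` is assumed to already include any STT/TTS/LLM charges, and the example inputs (3,000 minutes, 400 branded outbound calls, $0.02/min telephony) are made up for the demo.

```python
from dataclasses import dataclass


@dataclass
class RetellInputs:
    minutes: float                      # billed call minutes per month
    platform_per_min: float             # effective $/min, assumed to include STT/TTS/LLM
    telephony_per_min: float            # carrier $/min (e.g., Twilio)
    outbound_calls: int = 0             # calls that use branded outbound
    branded_fee_per_call: float = 0.10  # per-call fee quoted in this post
    dev_cost: float = 0.0               # dev time for the month, in dollars


def retell_monthly_cost(x: RetellInputs) -> float:
    """(platform minutes x $/min) + telephony + add-ons + dev time."""
    platform = x.minutes * x.platform_per_min
    telephony = x.minutes * x.telephony_per_min
    add_ons = x.outbound_calls * x.branded_fee_per_call
    return round(platform + telephony + add_ons + x.dev_cost, 2)


# Hypothetical inputs: 3,000 minutes at $0.13/min, 400 branded outbound calls.
example = RetellInputs(minutes=3000, platform_per_min=0.13,
                       telephony_per_min=0.02, outbound_calls=400)
print(retell_monthly_cost(example))  # 490.0
```

Keeping the per-call fee separate from the per-minute rate makes it easy to see how an outbound-heavy workload shifts the total.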

From implementations: Retell tends to work quickly, with solid defaults and fewer configuration traps. If you need an MVP fast, that usually outweighs small unit-cost differences.

Self-hosted cost formula (hosting + inference + telephony + tooling + people time)

Self-hosting looks cheaper if you only compare compute. But voice agents are pipelines, not a single server.

Self-hosted Monthly Cost = hosting + model usage + telephony + tooling + dev/ops time

Typical line items:

Infrastructure

  • CPU/GPU instances (inference or orchestration)
  • Bandwidth and egress (data transferred out of your private network, database, or cloud storage to an external location)
  • Containers and deployment (Docker/Kubernetes)
  • Load balancers, TURN/ICE if using WebRTC components

Models

  • STT (cloud API or self-hosted)
  • TTS (cloud API or self-hosted)
  • LLM (cloud API or self-hosted)

Telephony

  • Phone numbers (monthly)
  • Inbound + outbound minutes
  • SIP trunk fees

Tooling

  • Logging, tracing, metrics
  • Call recording storage
  • Analytics and dashboards
  • Evaluation tools (tests, call replay)

People time

  • Initial build
  • Maintenance
  • On-call and incident response

GPU cost reference points help when you self-host heavier pieces. On-demand GPU pricing examples:

  • AWS: A10 ~$1.21/hr, A100 ~$3.21/hr, H100 ~$4.20/hr
  • GCP: T4 as low as ~$0.35/hr, L4 ~$0.67/hr on-demand zonal

Those numbers matter if you try to self-host STT/LLM/TTS at scale.
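
To turn an hourly GPU price into a per-minute figure you can compare with the rates in this post, divide by the call-minutes the GPU actually serves. The sketch below assumes two inputs that are not from any benchmark: how many concurrent calls one GPU can handle and how busy it is on average. Both are placeholders you should replace with your own measurements.

```python
def gpu_cost_per_call_minute(hourly_rate: float,
                             concurrent_calls: int,
                             utilization: float) -> float:
    """Dollars of GPU time attributable to one call-minute.

    hourly_rate      -- on-demand price in $/hr (e.g., ~1.21 for an AWS A10)
    concurrent_calls -- calls one GPU can serve at once (assumption)
    utilization      -- fraction of the hour the GPU is serving calls (assumption)
    """
    if concurrent_calls <= 0:
        raise ValueError("concurrent_calls must be positive")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    served_call_minutes_per_hour = 60 * concurrent_calls * utilization
    return hourly_rate / served_call_minutes_per_hour


# Hypothetical: an A10 at $1.21/hr, 8 concurrent calls, 25% average utilization.
cost = gpu_cost_per_call_minute(1.21, concurrent_calls=8, utilization=0.25)
print(round(cost, 4))  # 0.0101
```

The same GPU at 5% utilization costs five times as much per call-minute, which is why idle capacity, not the hourly price, usually decides whether self-hosted models pay off.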

Hidden costs checklist (both sides)

These are the line items people forget. Each one becomes money or time.

  • Telephony phone numbers and per-minute calling
  • Call recording storage (and retention policies)
  • Retries, dropped calls, and re-dials
  • Prompt and tool debugging time
  • QA time with real call scripts
  • Load testing for concurrency spikes
  • Compliance reviews and vendor security questionnaires
  • Outages and incident response
  • Support delays (waiting days for an answer can block shipping)

Community reality check (from builder discussions): In the thread "Lost between LiveKit Cloud vs Vapi vs Retell...", builders estimate Retell around $275-$320/month at ~3,000 minutes, while LiveKit Cloud is $320-$350 + dev time (Full control, open source base). The theme is consistent: dev time decides early.

What changes the result (latency, concurrency, model choice, outbound dialing)

These are the biggest cost levers. If you optimize only one thing, optimize one of these.

  • Minutes/month: the obvious multiplier
  • Concurrency: more simultaneous calls means more infra and often higher tiers
  • Model choice: LLM and TTS can dominate cost
  • Outbound vs inbound: outbound adds dialing costs and sometimes per-call fees (Retell branded outbound adds $0.10 per outbound call)
  • Latency targets: lower latency may require more expensive models or co-located servers
  • Region placement: placing servers near users can reduce first-response latency (self-hosting gives you more control here)
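
Concurrency is the lever that is hardest to read off a minutes/month number. As a rough sketch, you can estimate average concurrency from monthly minutes and active hours, then scale it by a peak factor. The 176 active hours (22 days x 8 hours) and the peak factor of 5 below are assumptions for illustration; real traffic should be measured.

```python
import math


def average_concurrency(minutes_per_month: float, active_hours_per_month: float) -> float:
    """Average number of simultaneous calls during active hours."""
    if active_hours_per_month <= 0:
        raise ValueError("active_hours_per_month must be positive")
    return minutes_per_month / (active_hours_per_month * 60)


def peak_concurrency(avg: float, peak_factor: float) -> int:
    """Calls to provision for, given an assumed peak-to-average ratio."""
    return max(1, math.ceil(avg * peak_factor))


# Hypothetical: 10,000 minutes spread over 176 active hours, peaks at 5x average.
avg = average_concurrency(10_000, 176)
print(round(avg, 2))               # 0.95
print(peak_concurrency(avg, 5))    # 5
```

Even when average concurrency is below one call, bursty traffic can require capacity for several simultaneous calls, and that peak is what infrastructure and pricing tiers are sized against.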

Real cost scenarios: monthly totals (tables you can copy)

These tables are meant to be copied into a sheet. They include dev time because it is part of TCO.

Scenario table: 500 minutes/month (starter MVP)

At this size, build speed dominates. The per-minute delta is usually not the deciding factor.

Assumptions

  • Excluding telephony in the base per-minute ranges (telephony added separately)
  • Retell real-world platform range: $0.12-$0.15/min excl. telephony (deployment notes)
  • Self-hosted run-cost range: $0.04-$0.06/min excl. telephony (deployment notes)
  • Dev time is a one-time build cost spread over a month for comparison

Telephony reference (typical ranges)

  • Twilio US inbound: $0.0085-$0.022/min, outbound: $0.013-$0.030/min
  • Telnyx: platform charge $0.002/min plus trunking, often landing ~$0.003-$0.006/min domestic depending on route.

Monthly cost table (500 minutes)

| Line item | Retell (hosted) | Self-hosted (Dograh/LiveKit/Pipecat/Vocode) |
| --- | --- | --- |
| Platform / orchestration | 500 x $0.12-$0.15 = $60-$75 | Included in your stack = $0 (software), but infra below |
| STT/TTS/LLM add-ons | Often bundled/marked up; can push higher | BYO vendor costs (varies widely) |
| Telephony minutes (example Twilio) | 500 x $0.01-$0.03 = $5-$15 | 500 x $0.01-$0.03 = $5-$15 |
| Hosting (CPU/GPU) | $0 (included) | $30-$150 (small setup + staging) |
| Observability + storage | $0-$50 | $10-$60 |
| Dev time (one-time, MVP) | 4-12 hours | 12-40 hours |
| Estimated month-1 total (excluding dev) | $65-$140+ | $45-$275 |
| Estimated month-1 total (including dev @ $100/hr)* | $465-$1,340+ | $1,245-$4,275 |

My take at 500 minutes: Pick Retell unless you have a hard requirement for data control or you already have the infrastructure and skills to run this reliably. Self-hosting can be the right call, but it is rarely the cheapest path for an MVP.

Scenario table: 3,000 minutes/month (small business / agency client)

At this size, per-minute fees start to matter. Reliability starts to matter too.

Assumptions

  • Retell: $0.12-$0.15/min excl. telephony (deployment notes)
  • Self-hosted: $0.04-$0.06/min excl. telephony (deployment notes)
  • Telephony: add separately
  • Concurrency: assume 3-10 concurrent calls during peaks

Monthly cost table (3,000 minutes)

| Line item | Retell (hosted) | Self-hosted (Dograh/LiveKit/Pipecat/Vocode) |
| --- | --- | --- |
| Platform minutes | 3,000 x $0.12-$0.15 = $360-$450 | 3,000 x $0.04-$0.06 = $120-$180 |
| Telephony (Twilio typical) | 3,000 x $0.01-$0.03 = $30-$90 | $30-$90 |
| Branded outbound calls | If outbound: $0.10/call | N/A (depends on your carrier features) |
| Hosting | $0 | $100-$600 (depends on architecture) |
| Monitoring + storage | $0-$100 | $30-$200 |
| Reliability work (ongoing) | Lower | Higher (you own incidents) |
| Estimated month-2 run total (excluding dev) | $390-$640+ | $280-$1,070 |

From deployments: This is where teams start feeling the Retell bill. It is also where self-hosted starts to look attractive if you can reuse the same platform across clients. If you are an agency, self-hosting becomes a competitive advantage because you stop paying platform markup on every client minute.
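
If you prefer code to a spreadsheet, the 3,000-minute totals can be reproduced with a small range calculator. This is a sketch using the same low/high inputs as the table above; the helper names are made up, and branded outbound fees are left out because they depend on call count rather than minutes.

```python
from typing import Dict, Tuple

Range = Tuple[float, float]  # (low, high) in dollars


def per_minute(minutes: int, low_rate: float, high_rate: float) -> Range:
    """Cost range for a per-minute line item."""
    return (minutes * low_rate, minutes * high_rate)


def total(items: Dict[str, Range]) -> Range:
    """Sum the low ends and the high ends separately."""
    low = sum(lo for lo, _ in items.values())
    high = sum(hi for _, hi in items.values())
    return (round(low, 2), round(high, 2))


MINUTES = 3000

retell = {
    "platform": per_minute(MINUTES, 0.12, 0.15),
    "telephony": per_minute(MINUTES, 0.01, 0.03),
    "hosting": (0.0, 0.0),
    "monitoring_storage": (0.0, 100.0),
}

self_hosted = {
    "platform": per_minute(MINUTES, 0.04, 0.06),
    "telephony": per_minute(MINUTES, 0.01, 0.03),
    "hosting": (100.0, 600.0),
    "monitoring_storage": (30.0, 200.0),
}

print(total(retell))       # (390.0, 640.0)
print(total(self_hosted))  # (280.0, 1070.0)
```

Changing `MINUTES` and the fixed ranges lets you rerun the comparison for your own volume without rebuilding the table by hand.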

Scenario table: 10,000 minutes/month (scale point where break-even often shows)

At this size, platform markup hurts. Self-hosting often wins on run-cost if you do it properly.

We show two self-hosted variants:

  • A: BYO cloud STT/TTS/LLM (simpler, common)
  • B: Advanced self-hosted models (cheaper per minute, but non-trivial)

Assumptions

  • Retell: $0.12-$0.15/min excl. telephony (deployment notes)
  • Self-hosted A: $0.03-$0.06/min excl. telephony (deployment notes)
  • Self-hosted B: <$0.02/min excl. telephony but higher complexity (deployment notes)

Monthly cost table (10,000 minutes)

| Line item | Retell (hosted) | Self-hosted A (BYO providers) | Self-hosted B (advanced) |
| --- | --- | --- | --- |
| Platform minutes | 10,000 x $0.12-$0.15 = $1,200-$1,500 | 10,000 x $0.03-$0.06 = $300-$600 | 10,000 x $0.015-$0.02 = $150-$200 |
| Telephony | 10,000 x $0.01-$0.03 = $100-$300 | $100-$300 | $100-$300 |
| Hosting | $0 | $400-$2,000 | $800-$6,000 (GPU heavy, orchestration) |
| Observability + storage | $0-$200 | $80-$400 | $150-$600 |
| Ops/on-call | Lower | Medium | High |
| Estimated run total | $1,300-$2,000+ | $880-$3,300 | $1,200-$7,100 |

Warning on variant B: you can drive unit cost below $0.02/min, but it is not beginner-friendly. You need expertise in model hosting, GPU scheduling, quantization, and incident handling. Most teams should start with open source orchestration (like Dograh) while still using third-party STT/TTS/LLM, then migrate pieces later.

Break-even chart: when does self-hosted beat Retell (and when it doesn't)

Break-even depends on dev time and ops maturity. The cheapest option changes as you scale.

A clean way to model it:

  • Retell TCO(M) = (M x R) + T + D
  • Self-hosted TCO(M) = (M x S) + H + T + D2

Where:

  • M = minutes/month
  • R = Retell $/min (e.g., 0.12-0.15 excl. telephony from deployments)
  • S = self-hosted $/min (e.g., 0.04-0.06 excl. telephony from deployments)
  • H = hosting + tooling (fixed-ish monthly)
  • T = telephony (both sides, often similar)
  • D / D2 = dev + maintenance time cost

Assumptions box you can swap

  • Retell R = 0.13
  • Self-hosted S = 0.05
  • Hosting H = $800/month
  • Dev (monthly equivalent) D2 - D = +$1,500/month at first, then declines as you reuse infra

Break-even minutes (rough): If (R - S) = $0.08/min, then $800 hosting breaks even at 10,000 minutes on run-cost alone. But if self-hosting costs you $1,500/month more in people time early, break-even shifts to ~28,750 minutes until you reuse the platform.

That is why self-hosted can be cheaper per minute and still be the wrong decision in month one.
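
Setting the two TCO formulas equal, the telephony term T cancels and the break-even volume is M = (H + (D2 - D)) / (R - S). The sketch below is a direct translation of that algebra using the assumptions box values; the function and parameter names are illustrative.

```python
from typing import Optional


def break_even_minutes(retell_per_min: float,
                       self_hosted_per_min: float,
                       hosting_monthly: float,
                       extra_people_cost: float = 0.0) -> Optional[float]:
    """Minutes/month at which self-hosted TCO equals Retell TCO.

    Solves M*R + T + D = M*S + H + T + D2 for M. Telephony (T) cancels.
    extra_people_cost is D2 - D. Returns None if self-hosting has no
    per-minute saving, because then it never breaks even.
    """
    saving_per_min = retell_per_min - self_hosted_per_min
    if saving_per_min <= 0:
        return None
    return (hosting_monthly + extra_people_cost) / saving_per_min


# Values from the assumptions box: R = 0.13, S = 0.05, H = $800, D2 - D = $1,500.
print(round(break_even_minutes(0.13, 0.05, 800)))        # 10000
print(round(break_even_minutes(0.13, 0.05, 800, 1500)))  # 28750
```

Because the people-time term sits in the numerator, every extra $80/month of engineering cost pushes break-even out by another 1,000 minutes at these rates.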

Reliability and voice quality: what matters in real calls

Reliability is what users notice. A voice agent that sounds smart but responds late loses trust fast.

Latency and first response time (why 3-4 seconds kills trust)

A few seconds of silence at the start of a call feels broken. In real deployments, a 3-4 second lag at the beginning can destroy credibility, especially if it persists.

Published benchmarks back the idea that hosted pipelines can be faster at p95:

  • Hosted: p95 end-to-end 1.5-2.0s
  • Open-source: p95 end-to-end 2.0-3.0s

That gap is not always your framework's fault. It is often:

  • region placement
  • model cold starts
  • network routing
  • audio chunk sizes

What is first-response latency (vs ongoing turn latency) in live calls?

First-response latency is the time from "call connected" to the first meaningful audio response. It includes time to start streaming, detect speech, transcribe, generate tokens, and synthesize speech.

Ongoing turn latency is what happens after the call settles. That is the back-and-forth delay during the conversation.

First-response latency matters more for perceived quality. If the agent starts strong, users forgive small delays later.
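
One way to make first-response latency actionable is to log each stage separately and sum them, so you can see which stage to fix first. The stage names below mirror the definition above; the millisecond values are invented for illustration and are not measurements.

```python
from typing import Dict

# Stages between "call connected" and the first meaningful audio response.
STAGES = [
    "stream_start",
    "speech_detection",
    "transcription",
    "llm_first_token",
    "tts_first_audio",
]


def first_response_latency_ms(stage_ms: Dict[str, float]) -> float:
    """Sum of all stage durations; raises if a stage was not recorded."""
    missing = [s for s in STAGES if s not in stage_ms]
    if missing:
        raise ValueError(f"missing stage timings: {missing}")
    return sum(stage_ms[s] for s in STAGES)


def slowest_stage(stage_ms: Dict[str, float]) -> str:
    """Name of the stage contributing the most delay."""
    return max(STAGES, key=lambda s: stage_ms[s])


# Hypothetical timings for a single call, in milliseconds.
sample = {
    "stream_start": 150,
    "speech_detection": 200,
    "transcription": 250,
    "llm_first_token": 400,
    "tts_first_audio": 200,
}
print(first_response_latency_ms(sample))  # 1200
print(slowest_stage(sample))              # llm_first_token
```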

A practical fix we have used in self-hosted deployments is co-locating servers closer to end users. This can reduce first-response latency, and it is a real advantage when your users are concentrated in specific regions.

Retell vs self-hosted performance trade-offs (300ms edge vs more control)

Retell often feels snappier out of the box. In practice, Retell can have a latency advantage of roughly 300ms versus some other platforms, which users perceive as more natural (expert guidance).

Why Retell feels faster early:

  • pre-optimized voice settings
  • curated defaults
  • fewer knobs to misconfigure

Why self-hosted can win long-term:

  • you can pick any model and tune aggressively
  • you can place servers near your users
  • you can remove vendor hops

Self-hosting is not automatically faster. But with the right architecture, it can match hosted latency while giving you more control.

Uptime, dropped calls, and incident ownership (who gets paged?)

Someone always gets paged. The only question is whether it is your team or a vendor.

Reality check:

  • Hosted platforms still depend on major cloud providers.
  • Self-hosted depends on the same clouds, but you own the runbooks.

Ops checklist (copy this into your project doc):

  • Health checks for STT/TTS/LLM providers
  • Retry logic and fallbacks (voice downgrade, model fallback)
  • Call recording integrity checks (missing segments, storage failures)
  • Post-call logs that join: call ID -> transcript -> tool calls -> outcome
  • Alerting on dropped calls and p95 first-response latency
  • A plan for partial outages (TTS degraded, LLM slow)
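
The p95 alerting item in the checklist can be sketched with a nearest-rank percentile over recent first-response samples. The 2,000 ms threshold is an assumption taken from the upper end of the hosted p95 range quoted earlier; pick a threshold that matches your own target.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def should_alert(latencies_ms: Sequence[float], threshold_ms: float = 2000.0) -> bool:
    """True when p95 first-response latency exceeds the threshold."""
    return percentile_nearest_rank(latencies_ms, 95) > threshold_ms


# Hypothetical window of 20 calls: most are fast, two are slow.
window = [900.0] * 18 + [2500.0, 3100.0]
print(percentile_nearest_rank(window, 95))  # 2500.0
print(should_alert(window))                 # True
```

Alerting on p95 rather than the average catches the slow tail that callers actually notice, even when most calls look healthy.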

Security and compliance: when self-hosting is the simplest path

Compliance is often a data-flow problem, not a checkbox problem. Where data travels is usually the hard part.

Data flow and the "extra hop" problem

Each additional vendor in your audio path is another risk surface. That is true even if vendors have strong security programs.

In many proprietary setups, audio, transcripts, and recordings flow through:

  • telephony provider
  • voice platform vendor
  • model vendors (STT/TTS/LLM)
  • your application

Self-hosting can reduce hops. At minimum, it lets you control what you store, where, and how long.

What is the "extra hop" problem in voice agent data flow?

The "extra hop" problem means your call data passes through an additional third party before it reaches your systems. That third party may see audio streams, transcripts, tool payloads, and metadata.

This matters because it can create:

  • extra legal agreements (DPAs)
  • extra breach exposure
  • extra retention risk
  • data residency conflicts

Self-hosting does not remove all vendors. But it can keep orchestration and storage inside your environment, which often simplifies audits.

Healthcare example (AI voice agent for healthcare)

Healthcare voice flows can include protected health information. That triggers strict access control, logging, and retention rules.

A practical architecture pattern we have used in regulated environments:

  • Self-host orchestration and storage
  • Encrypt recordings at rest
  • Restrict staff access with least privilege
  • Implement retention and deletion policies
  • Log every access to transcripts and recordings

Even if a platform claims compliance, the extra hop can make internal approval harder. With self-hosting, the data resides on your servers, which often makes privacy controls easier to document and enforce.

Compliance cost line items (what it costs to implement either way)

Compliance has a real cost. Include it in your TCO.

  • Security review time (internal + vendor)
  • Legal/DPA review time
  • Audit logging storage
  • Key management (KMS, rotation)
  • Access control (RBAC, SSO)
  • Redaction pipeline (remove PII from logs)
  • Incident response plan and drills
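
As an illustration of the redaction line item, here is a minimal sketch that masks emails and phone-like numbers before a transcript reaches your logs. The two regular expressions are deliberately simple assumptions, not a complete PII detector; production redaction usually needs more patterns (names, addresses, account numbers) and review.

```python
import re

# Simplistic patterns for illustration only.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Replace emails and phone-like digit runs with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


line = "Call me at +1 (415) 555-0100 or jane@example.com"
print(redact(line))  # Call me at [PHONE] or [EMAIL]
```

Running redaction at the logging boundary, rather than after storage, keeps raw PII out of log retention and backup scope.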

Self-hosted can reduce vendor scope, but increases your implementation work. Hosted can reduce engineering work, but increases vendor management and data-flow complexity.

Developer experience and build speed (time-to-first-call vs long-term control)

Speed to first call is a business advantage. Long-term control is also a business advantage. You must choose what matters now.

Learning curve reality: Retell is fast; self-hosted is flexible

Retell is usually faster to ship. You can often get a basic agent running in a few hours because defaults are intuitive and not hidden behind deep configuration (expert guidance).

More configurable platforms can take longer. This is not because they are worse. It is because they expose more choices, and choices require decisions and testing.

In my own work, I found Pipecat pretty good. But for quick experiments, using the open source OpenAI Agents JS SDK was fast. I had Twilio plus a model working in about 30 minutes. That is the time-to-first-call advantage of a minimal stack.

How to build an AI voice agent (minimal stack for each option)

You can ship with either approach. The difference is who owns the moving parts.

A) Retell setup (fast path)

  • Create an agent and define the prompt and tools
  • Connect a phone number and call routing
  • Add webhooks for your backend actions
  • Test with real call scripts and monitor logs

Pricing and add-ons should be checked directly on Retell's pricing page, because the real bill depends on model and voice choices.

B) Self-hosted setup (Dograh / LiveKit / Pipecat / Vocode)

  • Choose orchestration: start with Dograh if you want a visual builder plus self-hosting options. Use LiveKit/Pipecat/Vocode if you want framework-level control
  • Pick STT/TTS/LLM vendors (or self-host later)
  • Connect telephony: Twilio, Plivo, or Telnyx, depending on rates and routing needs
  • Deploy and monitor: logs, metrics, recordings, dashboards
  • Load test concurrency and failure cases

Dograh is designed to reduce the self-hosted glue work. It supports bring-your-own telephony and models, and it is committed to open source.

Open source starting points (voice agent GitHub links to look for)

Open source saves cost only if the repo is alive. Use this checklist before you build on it.

What to look for in a repo:

  • Recent commits and active issues
  • Streaming audio support
  • Telephony adapters (Twilio/SIP)
  • Examples that run end-to-end
  • Observability hooks (logs/traces)
  • Testing or eval harness

Useful searches:

  • "voice agent github"
  • "ai calling agent github"
  • "open source voice ai"

Also, this Dograh demo tutorial is a good reference for how quickly a workflow-based voice agent can be started.

Support and debugging reality (Discord support speed, ticket lag)

Support speed affects delivery dates. It is part of cost, even if it is not on an invoice.

From experience:

  • Discord-based support can be slow for many platforms.
  • Some tickets can sit for days or weeks.
  • Self-hosted shifts support to your team and the community.

A practical rule: if your business needs strict response times, you need either enterprise support or internal ownership. Community-only support is rarely enough for mission-critical call flows.

Decision guide: pick the cheapest + safest option for your team

The cheapest option depends on who you are and what you are building. Use this section as a map.

Persona map: solo builder vs 1 engineer vs enterprise

This framing is consistent across most serious cost breakdowns. It is also a practical way to choose.

Solo builder / non-technical team

  • Best fit: Retell
  • Reason: fastest time-to-first-call, fewer infrastructure decisions

One engineer (startup or agency)

  • Best fit: start with Retell or Dograh + hosted model vendors
  • Reason: ship quickly, then migrate pieces when minutes grow
  • If you need flexibility early, Dograh gives open source control without starting from scratch

Enterprise / regulated

  • Best fit: self-hosted with Dograh/LiveKit-style control
  • Reason: data residency, audit needs, reduced extra-hop risk
  • You will still use vendors, but you can reduce scope

Prerequisites (so the cost math is fair)

You cannot compare platforms fairly without these inputs. Collect them before you decide.

  • Target regions (US, EU, or both)
  • Expected minutes/month and peak concurrency
  • Inbound/outbound split
  • Recording and retention needs
  • Required integrations (CRM, calendar, ticketing)
  • A realistic hourly cost for engineering and operations

Where Dograh fits (and why we built it)

Self-hosting should not mean building everything from scratch. That is why Dograh exists.

Dograh is an open source platform for voice bots and AI calling agents:

  • Drag-and-drop workflow builder
  • Build workflows in plain English
  • Bring-your-own telephony, STT, LLM, and TTS
  • Cloud-hosted or self-hosted
  • Multi-agent workflows to reduce hallucinations and enforce decision paths
  • A testing suite in progress (Looptalk) to stress test agents with simulated personas

If you want an open source alternative to hosted platforms, but you still want a fast setup, start with Dograh. I am looking for beta users and contributors because the fastest way to make this stronger is feedback from real call flows.

Final takeaway

Hosted platforms are the right move for most teams at the start, and Retell is one of the fastest ways to get to production.

If you have compliance constraints, need tighter control of data flow, or you are paying for enough minutes that platform markup is becoming a line-item you feel, move to self-hosting. At that point, the operational burden is worth it because you stop paying a platform tax on every call.

Use the formulas and tables above, plug in your numbers, and decide with a spreadsheet instead of gut feel.

FAQs

1. Why does Retell feel cheaper at the beginning?

Retell reduces setup time, hides infrastructure complexity, and has optimized defaults. For MVPs and early launches, speed and reliability usually matter more than per-minute savings.

2. Is Retell pricing really just per minute?

No. Per-minute pricing is only part of the bill. STT, TTS, LLM usage, telephony, outbound call fees, and branded calls can significantly increase real-world costs.

3. Why does concurrency matter so much in voice AI costs?

Concurrency determines how many calls run at once. Higher concurrency increases infrastructure needs and can push you into higher pricing tiers. It’s one of the fastest ways for bills to spike unexpectedly.

4. What costs do teams usually miss when comparing voice platforms?

Common misses include engineering, on-call time, observability, call-recording storage, load testing for concurrency spikes, and compliance or vendor security reviews.

5. Why does latency matter more in voice than chat?

In voice, even a few seconds of silence feels broken. High first-response latency immediately damages trust, especially at the start of a call. Users forgive small delays later, but not a slow start.

6. Can self-hosted voice agents really go below $0.02 per minute?

Yes, but only with advanced setups. This usually requires self-hosting models, careful GPU scheduling, quantization, and strong ops discipline. It’s not recommended early because complexity and failure risk are high.
