When it comes to pricing voice agents, don't stop at the listed price. The number that matters is the fully-loaded cost per minute, not the headline rate. This post breaks down real TCO for self-hosted voice agents (Dograh, Pipecat, LiveKit, Vocode-style stacks) versus Bland. It is written for builders and operators who expect to run 10k, 100k, or 1M minutes/month and want cost math they can trust.
What this post will answer (and who it is for)
By the end of this post, you’ll clearly understand what you really pay per minute and where the extra costs come from. You will also see where self-hosting wins on cost, latency, debugging, and compliance at scale. Vapi shows up in the SERP a lot, but this post focuses on Bland vs self-hosting because cost confusion is highest there.
The Real Question: total cost per minute, not sticker price
The listed price is rarely your real price in voice. A voice bot's bill also includes failure costs: retries, transfers, dead air, hangups, longer calls, and support time.
In my own deployments, the surprise was not the per-minute rate. The surprise was how fast the bill moved when we had small reliability issues (carrier-specific failures, latency spikes, and missing timeouts).
What "self-hosted" means in voice (pipeline ownership)
Self-hosted means you own/control the pipeline, end to end. That usually includes:
- Telephony (SIP trunks, inbound/outbound, transfers)
- Media routing (WebRTC/SIP bridges, relays, TURN if needed)
- STT (streaming speech-to-text)
- Orchestration (turn-taking, barge-in, tool calls, state, retries)
- LLM (reasoning + tool decisions)
- TTS (streaming text-to-speech)
- Logging/monitoring (per-turn timings, traces, recordings, audits)
Self-hosted does not mean free. You still pay for vendors and infrastructure:
- Twilio/Telnyx minutes
- Deepgram (or other STT)
- ElevenLabs (or other TTS)
- LLM tokens
- Compute, bandwidth, logging, on-call time
Quick answer preview (when each wins)
Below ~10k minutes/month, managed platforms can be simpler. Above ~100k minutes/month, the cost gap becomes decisive: roughly $0.03-$0.04/min raw cost versus $0.10-$0.15/min platform pricing, based on real deployments.
At 100k minutes/month, that difference is not "nice to have". It can decide whether your product has a margin.
Myths to ignore before you run the math
These myths cost you time and lead to bad budgets. Skip them and rely on real numbers like failure rates and latency.
Myth 1: Voice failures are just prompt issues
Most voice failures are not prompts. They are timing and audio edge cases: barge-in collisions, partial transcripts, race conditions, and media routing issues.
The impact is measurable:
- More hang-ups
- More retries
- Distorted audio
- Longer calls
- Higher cost per successful outcome
Myth 2: Self-hosting is only a micro-optimization
Colocation is not a micro-optimization in voice. Network latency in voice turns is often non-compressible, and it stacks.
From real measurements, colocation removed ~180ms of non-compressible latency on a leading platform. A reasonable expectation is that ~200ms saved is common when you stop bouncing between regions and vendors.
That matters because turn-taking is extremely sensitive. A "small" delay changes whether users interrupt, wait, or hang up.
Myth 3: Platforms automatically make compliance easier
Platforms can increase compliance surface area by adding a vendor layer. That can mean more DPAs, more audit paths, and more uncertainty about PII routing.
Self-hosting can simplify compliance if done well:
- Fewer vendors in the transcript/recording path
- Clear retention controls
- Better audit traces
But it only helps if you implement controls properly.
Glossary (key terms)
- Non-compressible network latency: Latency you cannot optimize away with faster code. It comes from physical distance, routing, and vendor hops. In voice, every hop adds delay that users feel instantly.
- Tail latency (p95/p99) for voice turns: The slowest 5% or 1% of turns. Averages can look fine while p95 causes barge-in failures and awkward pauses. Tail latency is what breaks live calls.
- Streaming backpressure (in real-time audio pipelines): When downstream systems (STT/TTS/bridges) cannot keep up, audio gets buffered. Users then get responses 2-4 seconds late, even if compute is healthy.
- Codec pinning (PCMU vs OPUS): Forcing a specific audio codec end-to-end to avoid mismatches across carriers and media bridges. This reduces one-way audio and dead air failures.
- TCO (Total Cost of Ownership): Fully-loaded cost including vendor minutes, tokens, infrastructure, monitoring, engineering time, and failure overhead.
Assumptions box: the TCO model inputs (copy/paste friendly)
These are the exact inputs used in the tables. If you disagree, swap the numbers and keep the same model.
Baseline assumptions we will use (100k min/month model)
We model a realistic mid-size deployment:
- Usage: 100,000 minutes/month
- p95 concurrency: 40 (bursty traffic without surprise throttling)
- Telephony: Twilio SIP inbound ~ $0.0085-$0.01/min (US blended)
- STT: Deepgram ~ $0.004-$0.006/min (real-time)
- TTS: ElevenLabs ~ $0.01-$0.015/min spoken
- LLM: GPT-4o-mini tokens ~ 1.5-2.5k tokens per call minute (bidirectional)
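Expressed as copy/paste Python, the assumptions above look like this. The midpoint raw-rate sum is only a convenience floor; LLM, infrastructure, and failure costs are layered on later in the model:

```python
# Model inputs from the assumptions box. Ranges are (low, high) per minute.
ASSUMPTIONS = {
    "minutes_per_month": 100_000,
    "p95_concurrency": 40,
    "telephony_per_min": (0.0085, 0.01),
    "stt_per_min": (0.004, 0.006),
    "tts_per_min": (0.01, 0.015),
    "tokens_per_call_min": (1500, 2500),
}

def midpoint(lo_hi):
    lo, hi = lo_hi
    return (lo + hi) / 2

# Raw vendor floor at mid-range rates, before LLM, infra, and failures.
raw = sum(midpoint(ASSUMPTIONS[k])
          for k in ("telephony_per_min", "stt_per_min", "tts_per_min"))
print(round(raw, 4))
```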
Infrastructure + region assumptions (so results are reproducible)
We assume US-East for most components. It is a common choice because many vendors have strong coverage there and latency to US carriers is often good.
Operational targets assumed:
- SLO goal: "Feels responsive" in live calls, with p95 turn latency controlled
- Basic redundancy: at least one failover path for telephony and media routing
- Infrastructure/ops includes: compute, bandwidth, TURN/media relay if needed, logs, metrics, and on-call time
What we exclude and why (keep it honest)
We exclude items that vary too widely:
- Sales time and deal cycles
- Custom ASR training
- Paid compliance consulting (outside basic engineering controls)
Pricing changes fast. The value here is the model, not any single number.
The cost math: fully-loaded per-minute formula (Bland vs self-hosted)
The formula below is the real work. Most pricing pages only show one term.
Per-minute fully-loaded cost formula (components)
Fully-loaded cost per minute:
- Telephony (carrier minutes + connection fees)
- STT (streaming speech-to-text minutes)
- TTS (spoken audio minutes)
- LLM tokens (input + output tokens per minute)
- Infrastructure/ops (compute, bandwidth, logging, on-call)
- Failure overhead (retries, transfers, longer calls, hangups)
- Engineering amortization (build + maintenance spread over minutes)
Simple version:
Cost/min = Telephony + STT + TTS + LLM + Infrastructure + Failure overhead + Engineering amortization
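A minimal sketch of that formula in Python, with illustrative mid-range inputs. The failure-overhead percentage and monthly engineering spend are hypothetical placeholders, not measurements:

```python
# Sketch of the fully-loaded cost formula. The failure-overhead
# percentage and engineering spend are hypothetical inputs.
def fully_loaded_cost_per_min(telephony, stt, tts, llm, infra,
                              failure_overhead_pct, eng_monthly,
                              minutes_per_month):
    """Sum per-minute components, inflate by failure overhead, and
    amortize engineering spend over monthly minutes."""
    raw = telephony + stt + tts + llm + infra
    return raw * (1 + failure_overhead_pct) + eng_monthly / minutes_per_month

cost = fully_loaded_cost_per_min(
    telephony=0.009, stt=0.005, tts=0.0125, llm=0.002,
    infra=0.003, failure_overhead_pct=0.08,
    eng_monthly=2000, minutes_per_month=100_000,
)
print(round(cost, 4))  # ~0.054/min fully loaded in this sketch
```

Note how engineering amortization alone adds $0.02/min at 100k minutes under these inputs, which is why low-volume deployments feel expensive in time cost.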
Bland pricing model: what you pay for (and what is unclear)
Bland publishes a usage-based baseline that many teams treat as the price. But you need to include minimums and add-ons.
From Bland's docs:
- $0.09/min inbound and outbound voice calls (baseline)
- Minimum outbound call charge: $0.015 per call attempt
- Transfers: $0.025/min (using Bland numbers)
- Transfers are free with your own Twilio (BYOT)
- SMS: $0.02 per message (in/outbound)
- Voicemail: $0.09/min
- Extra for multilingual transcription/voices, voice cloning, custom LLM hosting, advanced integrations, premium support
- Some enterprise features (higher concurrency, volume discounts) require a contract
That means your $0.09/min can become $0.09/min + transfer minutes + SMS + minimum attempt charges. The doc example also shows how quickly it adds up:
- 5-minute call costs $0.45
- Add a 3-minute transfer: +$0.075
- Send a follow-up text: +$0.02
- Total per engagement: $0.545
Also note the floor: 100 failed outbound call attempts cost at least $1.50 via the minimum attempt charge.
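The per-engagement math above can be reproduced with a small sketch using the quoted rates (rates change, so verify against Bland's current pricing page before budgeting):

```python
# Rates as quoted from Bland's docs above; verify before relying on them.
BLAND_PER_MIN = 0.09
TRANSFER_PER_MIN = 0.025   # transfers on Bland numbers
SMS_PER_MSG = 0.02
MIN_ATTEMPT = 0.015        # minimum outbound call attempt charge

def engagement_cost(call_min, transfer_min=0, sms_count=0):
    """Per-engagement cost: billed minutes + transfer minutes + SMS."""
    return (call_min * BLAND_PER_MIN
            + transfer_min * TRANSFER_PER_MIN
            + sms_count * SMS_PER_MSG)

# The doc example: 5-minute call + 3-minute transfer + 1 follow-up SMS
print(round(engagement_cost(5, transfer_min=3, sms_count=1), 3))  # 0.545
# The floor: 100 failed outbound attempts
print(round(100 * MIN_ATTEMPT, 2))  # 1.5
```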
Self-hosted pricing model: what you still pay for (BYO vendors)
Self-hosted usually means bring your own keys for each component. Your cost becomes more transparent and more controllable.
Typical self-hosted COGS:
- Telephony (Twilio/Telnyx)
- STT (Deepgram, Google STT, etc.)
- TTS (ElevenLabs, etc.)
- LLM tokens
- Infrastructure (compute + bandwidth + logging)
The strategic difference is not that vendors are free. You can switch vendors, change regions, and decide where the money goes.
Non-obvious costs to add (the ones that change decisions)
These costs are often bigger than people expect:
- Engineering time (build + maintenance)
- Monitoring and observability (traces, per-turn timings, audio slices)
- On-call and incident time
- Compliance work (PII routing, retention, encryption, DPAs/BAAs)
- Latency-driven hangups and longer calls
- Failed calls and retries
- Vendor lock-in and migration cost
In real deployments, observability gaps alone can cost weeks. If you cannot see per-leg timing, you cannot fix "it feels laggy".
Real cost breakdown: 3 scenario tables (low, mid, high volume)
These scenarios show how cost behaves as volume rises. The numbers use the fully-loaded assumptions above, drawn from production-style deployments.
Scenario 1: 10k minutes/month (minimums dominate)
At low volume, platform simplicity can be attractive. Self-hosting can still be cheaper on raw minutes while feeling expensive in time cost.
If you only run 10k minutes/month, self-hosted savings may not feel huge. You still need basic monitoring and reliability work.
Scenario 2: 100k minutes/month (the break-even neighborhood)
At this scale, the gap starts to matter. This is where platform margin becomes your biggest line item.
In this example model, self-hosted line items total roughly $0.035/min (~$3.5k/month), while platform pricing lands around $0.12/min (~$12k/month). Above ~100k minutes/month, that difference reshapes whether the product has acceptable margin.
Scenario 3: 1M minutes/month (scale economics)
At 1M minutes/month, platform fees can dominate the P&L. Self-hosting only works here if you have real operational maturity.
To capture these savings, you need:
- Multi-region planning
- Strong monitoring
- Safe deployment practices
- Clear budget caps
How to customize the tables to your stack (quick knobs)
You only need to change a few knobs to adapt this to your numbers. Start with the biggest movers:
- TTS cost: swap ElevenLabs rate if you use another provider
- Tokens/min: measure real tokens per minute for your agent
- Average call length: longer calls amplify TTS and token cost
- Concurrency and region: tail latency and throttling risk
- Redundancy level: failover trunks, relays, and logging retention
Step-by-step:
- Replace Telephony, STT, TTS unit rates
- Compute LLM $/min from your tokens/min and token price
- Add infrastructure/ops per minute (monthly infrastructure / monthly minutes)
- Add failure overhead (see next section)
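The two computed steps, LLM $/min and infrastructure $/min, can be sketched like this. All unit prices and the $1.2k/month infrastructure figure are hypothetical placeholders; substitute your vendors' actual rates:

```python
# Placeholder prices -- swap in your model's real per-1k-token rates.
def llm_cost_per_min(tokens_in, tokens_out, price_in_1k, price_out_1k):
    """LLM $/min from measured tokens/min and per-1k-token prices."""
    return tokens_in / 1000 * price_in_1k + tokens_out / 1000 * price_out_1k

def infra_cost_per_min(monthly_infra_usd, monthly_minutes):
    """Infrastructure/ops $/min: monthly spend divided by monthly minutes."""
    return monthly_infra_usd / monthly_minutes

llm = llm_cost_per_min(1500, 500, 0.00015, 0.0006)   # hypothetical rates
infra = infra_cost_per_min(1200, 100_000)            # hypothetical $1.2k/mo
print(round(llm, 6), round(infra, 6))
```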
Break-even point: when self-hosted becomes cheaper (and why)
Break-even is driven by minimums, margins, and failure costs. It is not driven by engineering purity.
Break-even chart: minutes/month vs total monthly cost
If you plotted the scenario totals, you would see:
- Bland starts simpler at low volume
- Self-hosted has a fixed ops baseline, then flattens
- Past ~100k minutes/month, the curves separate quickly
Why the curves diverge:
- Platforms bake in margin and risk
- Your raw costs (telephony/STT/TTS/LLM) scale closer to linear
- Your infrastructure cost per minute usually drops with volume
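Under the simplifying assumption that platform cost is purely per-minute while self-hosted cost is a variable rate plus fixed monthly ops, the crossover is a one-line formula. The $5k/month fixed-ops input here is a hypothetical figure:

```python
# Simplified crossover model: platform is pure per-minute, self-hosted is
# variable rate + fixed monthly ops. Fixed-ops figure is hypothetical.
def break_even_minutes(platform_rate, self_variable_rate, self_fixed_monthly):
    """Minutes/month above which self-hosting is cheaper, i.e. where
    platform_rate * M exceeds self_variable_rate * M + self_fixed_monthly."""
    return self_fixed_monthly / (platform_rate - self_variable_rate)

m = break_even_minutes(platform_rate=0.12, self_variable_rate=0.035,
                       self_fixed_monthly=5000)
print(round(m))  # ~58824 minutes/month, inside the 50k-100k neighborhood
```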
Sensitivity analysis: what moves break-even the most
The biggest break-even movers in voice are usually:
- TTS cost (often dominant in natural-sounding agents)
- Failure rate (retries and longer calls)
- Tokens/min (agents that ramble are expensive)
- Telephony rate (less flexible than other components)
- Tail latency (drives hangups, which drives retries and transfers)
Concurrency matters too. High p95 concurrency without controls can force higher-tier plans or throttling.
The performance angle: 200ms saved can reduce hangups and cost
Latency is not only a UX metric. It affects cost.
In production testing, we measured ~180ms of non-compressible latency on a leading platform. When we moved to a colocated self-hosted setup, that latency disappeared.
Guidance used for this post: colocation can often save ~200ms in network calls alone.
Why that changes cost:
- Fewer awkward pauses means fewer hangups
- Better barge-in means fewer repeated turns
- Shorter calls means fewer minutes billed
- Less need for transfers means fewer transfer minutes
Side-by-side comparison: self-hosted (Dograh, Pipecat, LiveKit, Vocode) vs Bland
This table is the fastest way to pick a direction. It reflects what changes cost, speed, and risk.
Comparison table: pricing, setup effort, scaling, compliance, support
Bland also offers a managed self-hosted infrastructure optimized for large-scale calling. That can fit enterprises that want less operations work. It is still different from owning the pipeline and negotiating every vendor contract yourself.
Latency and call experience: sluggish vs responsive (what users feel)
Voice quality can be good while the call still feels slow. That is a common failure mode in live calls.
From real experience, some managed platforms (Bland, Retell) feel sluggish during turn-taking and live calls. Self-hosting lets you colocate telephony, STT, orchestration, and models, which removes avoidable round-trip time.
STT latency is a key piece here. Deepgram Nova-2 reports sub-100ms recognition latency in streaming mode, with ~420ms average end-to-end and p95 under 500ms in voice agent tests.
Compare that with Whisper-style streaming behavior:
- Streaming latency ~ 1-2.5s for initial tokens, often unsuitable for sub-second turn-taking
This is why stack choices matter. A cheap STT that adds 1-2 seconds can cost you more in hangups and longer calls.
Feature reality check: warm transfer, SMS, advanced routing
Warm transfers and SMS are core for SMB operations: missed calls, appointment reminders, human handoff.
Guidance used for this post: Bland is a weak fit for many SMBs under $5k/month because common tools like warm transfer and SMS can be contract-gated in practice. That blocks real-world call flows.
Self-hosting makes these implementable:
- Warm transfer via Twilio call control and your workflow
- SMS via your telephony provider
- Advanced routing via your own decision tree and webhooks
Dograh is designed for this style of flexibility:
- Drag-and-drop builder
- Plain-English workflow editing
- Multi-agent workflows to reduce hallucinations
- Bring-your-own-keys across vendors
- Self-hostable, fully open source
Compliance and data control: reduce surface area by removing the middle layer
Compliance is about data paths and controls, not badges. Adding a platform can add another processor of transcripts and recordings.
Self-hosting can reduce surface area:
- Fewer vendors touching PII
- Clear retention and deletion policies
- Direct audit trails from your logs and storage
But you must implement:
- Encryption at rest
- Access controls
- Audit logs
- Retention controls
- Vendor DPAs/BAAs where required
Hidden costs and failure modes (from real deployments)
These failure modes show up in real voice systems, not demos. They directly change your cost per minute.
Dropped calls and dead air: SIP/media routing edge cases
The most painful voice bugs are when users say "it just went silent". Common incidents we have seen:
- One-way audio after SBC upgrades
- NAT/TURN misconfig causing silent agents on mobile networks
- Twilio/Telnyx region mismatch causing 5-10% call failures for specific carriers
Fixes that worked:
- Multi-region media relays
- Explicit codec pinning (PCMU/OPUS)
- Per-carrier health checks
- Automatic failover to backup trunks
A 5-10% failure rate is not only a reliability issue. It is a cost multiplier because retries and support calls pile up.
Latency regressions break barge-in (tail latency matters)
Average latency can look fine while the worst 5% destroys UX. Incidents we have seen:
- STT model version increased tail latency by 300-500ms
- LLM routing to a cheaper region added ~800ms RTT, breaking barge-in and causing hangups
Fixes:
- Canary releases with p95 latency SLOs
- Per-model circuit breakers
- Slow-path fallbacks (switch model or reduce context when RTT exceeds threshold)
Streaming backpressure bugs (2-4s delayed replies under load)
This one makes teams think the LLM is slow. Often it is your audio bridge buffering.
Incidents:
- WebRTC to STT gRPC bridges buffering audio
- Mishandled backpressure in Node/Go workers
- Result: 2-4 seconds delayed replies under load
Fixes:
- Strict frame sizes (20-40ms frames)
- Drop-instead-of-buffer past small jitter windows
- Explicit flow control in workers
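The drop-instead-of-buffer fix can be sketched as a bounded jitter buffer that discards the oldest frames once a small window fills. The class, frame counts, and window size here are illustrative, not a production implementation:

```python
from collections import deque

# With 20-40ms frames, max_frames=5 caps queued audio at ~100-200ms,
# so delay stays bounded instead of growing under load.
class JitterBuffer:
    def __init__(self, max_frames=5):
        self.frames = deque()
        self.max_frames = max_frames
        self.dropped = 0

    def push(self, frame):
        if len(self.frames) >= self.max_frames:
            self.frames.popleft()   # drop the oldest frame, never buffer
            self.dropped += 1
        self.frames.append(frame)

buf = JitterBuffer()
for i in range(8):
    buf.push(i)
print(list(buf.frames), buf.dropped)  # [3, 4, 5, 6, 7] 3
```

Dropping a few frames degrades audio slightly; buffering them degrades the whole conversation, because every later turn inherits the delay.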
Billing surprises: runaway loops and missing timeouts
Billing surprises are usually small bugs with big multipliers. Incidents:
- Agent repetition loops causing 10x token usage
- No silence timeout means abandoned calls burn budget
Fixes:
- Per-call caps (max tokens, max minutes)
- Hard budget guards
- Anomaly alerts on COGS/min spikes
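A minimal sketch of per-call caps and timeouts; all thresholds are illustrative defaults, not recommendations:

```python
# Hard per-call guards. Thresholds are illustrative only.
MAX_MINUTES_PER_CALL = 10
MAX_TOKENS_PER_CALL = 20_000
SILENCE_TIMEOUT_S = 15

def hangup_reason(elapsed_min, tokens_used, silence_s):
    """Return a reason to end the call, or None to continue."""
    if elapsed_min >= MAX_MINUTES_PER_CALL:
        return "max_minutes"
    if tokens_used >= MAX_TOKENS_PER_CALL:
        return "max_tokens"
    if silence_s >= SILENCE_TIMEOUT_S:
        return "silence_timeout"
    return None

print(hangup_reason(3, 25_000, 0))   # a repetition loop trips the token cap
print(hangup_reason(2, 1_000, 20))   # an abandoned call stops burning budget
```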
What is a canary release for STT/LLM model upgrades?
A canary release means you roll out a model change to a small slice of traffic first. In voice, this is critical because the risk is not just accuracy. It is latency and turn-taking.
In practice, a voice canary release looks like:
- 1-5% of calls use the new STT model (or new LLM routing)
- You track p95 turn latency, barge-in success, hangups, and retries
- You only expand rollout if those metrics stay within your SLO
Model changes can add 300-500ms tail latency without obvious warnings. That can turn a good demo agent into a production agent that users interrupt and abandon.
What is a circuit breaker in a real-time voice agent stack?
A circuit breaker is a safety check that stops a slow or failing service from ruining the whole call. In voice systems, it’s usually triggered by high delay or too many errors, not just complete failures.
Voice-specific circuit breaker behavior:
- If STT latency spikes or error rate increases, switch to a backup STT model
- If LLM RTT crosses a threshold (for example after a routing change), reduce context or switch regions
- If TTS is slow, use a simpler voice or shorter prompts for the next turn
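A latency-triggered breaker can be sketched as a rolling window that flips to a backup route when too many recent samples exceed a threshold. The class name, thresholds, and window size are illustrative:

```python
# Illustrative latency-triggered circuit breaker for one vendor hop.
class LatencyBreaker:
    def __init__(self, threshold_ms=500, window=10, trip_ratio=0.3):
        self.threshold_ms = threshold_ms
        self.window = window
        self.trip_ratio = trip_ratio
        self.samples = []
        self.open = False  # open == route traffic to the backup vendor

    def record(self, latency_ms):
        self.samples.append(latency_ms)
        self.samples = self.samples[-self.window:]   # keep rolling window
        slow = sum(1 for s in self.samples if s > self.threshold_ms)
        self.open = slow / len(self.samples) >= self.trip_ratio

breaker = LatencyBreaker()
for ms in [120, 130, 640, 700, 810]:  # a latency spike begins mid-call
    breaker.record(ms)
print(breaker.open)  # True: 3 of 5 recent samples exceeded the threshold
```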
This reduces cost because it prevents:
- Long dead-air segments
- Repeated turns
- Hangups that cause immediate retries
It also keeps the user experience stable during vendor incidents.
What is a failover trunk (and how does it reduce dropped-call rates)?
A failover trunk is a backup telephony route that takes over when the primary route fails. Think of it as a second carrier or a second region configuration that you can switch to automatically.
In voice agents, failover trunks help when:
- a carrier has a regional outage
- a specific carrier path has codec issues
- a region mismatch causes higher failures for certain networks
We have seen carrier/region mismatch contribute to 5-10% failures for specific carriers. Failover trunks plus per-carrier health checks are how you reduce that in production.
Build vs buy decision: recommendations by persona (with budgets)
The right answer depends on your team, volume, and compliance scope. Use these as practical defaults.
Solo founder / indie hacker (optimize for speed, avoid long contracts)
If you need to ship fast, a managed platform can get you to a demo quickly. Plan an exit path if you expect growth.
A practical approach that works:
- Start with a simple stack and measure tokens/min, hangups, and transfer rates
- Move to self-hosting once you approach the 100k min/month range
Dograh is a good fit when you want:
- Quick iteration with a visual builder
- Bring-your-own-keys
- Open source self-hosting for control
SMB ops team (<$5k/month budget): features that block you
Many SMB call flows require:
- Warm transfer to a human
- Sending SMS confirmations
- Custom routing rules
Bland usually becomes a poor fit under ~ $5k/month, because important features are locked behind contracts or extras, even though the baseline per-minute price looks simple.
Self-hosting gives you direct access to telephony features:
- Twilio call control for warm transfer
- SMS using your provider pricing
- Webhooks into your CRM and scheduling tools
Enterprise: compliance, audits, and scale (100k+ min/month)
At 100k+ minutes/month, cost and compliance become board-level topics. Self-hosting (or managed self-hosting where you still own keys and data paths) is usually the right move.
What enterprises should standardize:
- p95 latency SLOs for each hop (STT/LLM/TTS)
- Multi-region routing for media and telephony
- Retention windows for transcripts and recordings
- Encryption at rest and audit logs
- Budget guards and anomaly detection
Bland is enterprise-focused and offers managed self-hosted infrastructure. That can help, but you still need to evaluate data paths, vendor contracts, and how much you can control.
Practical self-hosted stack guide (Dograh, Pipecat, LiveKit, Vocode, GitHub projects)
Self-hosting works when you treat it like a pipeline, not a single tool. This section gives a reference architecture and evaluation checklist.
Reference architecture: telephony > media > STT > LLM/tools > TTS
A standard real-time pipeline:
- Telephony (SIP / PSTN)
- Media gateway (SIP > WebRTC or RTP handling)
- Streaming STT (partial transcripts, word timings)
- Orchestrator (state machine, barge-in, tools, memory, retries)
- LLM and tool calls (CRM, scheduling, ticketing)
- Streaming TTS (low-latency audio)
- Media back to caller
Where latency accumulates:
- SIP/media bridging
- STT partial delay
- LLM RTT (Round Trip Time) and tool RTT
- TTS first audio chunk time
- Network hops between each service
Colocating these services reduces RTT and protects barge-in behavior.
STT choice is crucial for responsiveness:
- Deepgram Nova-2: sub-100ms recognition latency, ~420ms average end-to-end, p95 ~ 500ms in tests.
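To see how the hops add up, here is an illustrative per-turn latency budget. Only the STT figure tracks the Deepgram recognition number quoted above; every other value is a placeholder:

```python
# Illustrative per-hop latency budget for one voice turn.
# Only the STT entry follows the Deepgram Nova-2 figure quoted above;
# the rest are placeholders that show how delay stacks across hops.
HOPS_MS = {
    "sip_media_bridge": 40,
    "stt_partial": 100,      # sub-100ms streaming recognition
    "llm_rtt": 350,
    "tts_first_chunk": 150,
    "network_hops": 120,     # the part colocation mostly removes
}
total_ms = sum(HOPS_MS.values())
print(total_ms)  # 760
```

Removing the cross-region network hops in this sketch cuts the turn from 760ms to 640ms, which is the scale of the ~200ms colocation savings discussed earlier.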
Open source options: Dograh vs Pipecat vs LiveKit vs Vocode (when to use what)
Pick based on what problem you need to solve first. Do not pick based on hype.
- Dograh: best when you want a visual workflow builder, intuitive UI, fast iteration, multi-agent workflows, and full self-hosting with BYO keys.
- Pipecat-style pipelines: best when you want composable real-time building blocks and you already have engineers comfortable assembling the stack.
- LiveKit: best when you want robust real-time media infrastructure (WebRTC) and need reliable media routing primitives.
- Vocode-style SDKs: best when you want a code-first developer experience and are comfortable implementing missing ops pieces yourself.
Dograh stands out because it gets you to a working call flow fast (about a 2-minute launch) without hiding the pipeline, while keeping the system open and self-hostable.
GitHub checklist: how to evaluate a voice agent repo fast
Use this when searching "self hosted voice agent github" or "ai voice agent github". It helps you avoid adopting a repo that only works in a demo.
Checklist:
- Recent commits and active issue responses
- True streaming support (STT and TTS streaming, not batch)
- Barge-in and VAD support
- Telephony adapters (Twilio/Telnyx/SIP) or clear integration docs
- Observability hooks (timings per hop, traces, call IDs)
- Load behavior documented (concurrency guidance)
- License clarity (important for commercial use)
- Deployment docs (Docker, Kubernetes, secrets handling)
- Tests for timing-sensitive components
- Examples that include transfers, webhooks, and error handling
Observability and debugging: log every hop (own the pipeline)
Voice debugging is hard everywhere. Self-hosting does not make it easy. It lets you instrument deeply.
What to log from day one:
- Per-turn timing: STT partials, final transcript time, LLM RTT, TTS first audio chunk
- Call graphs tied to a single call ID
- Audio slices around failures (with strict retention)
- VAD decisions and barge-in triggers
- Vendor response codes and throttling events
You can use OSS tracing tools like Langfuse or build your own. The key is to make "users say it feels laggy" a 10-minute investigation, not a 3-week rebuild.
Security and compliance checklist (simple and strict)
Compliance is part design, part discipline. This checklist covers what security reviews usually block.
PII data flow map: transcripts, recordings, logs and webhooks
Start by mapping where PII can appear:
- Transcripts
- Recordings
- Call metadata
- Webhook payloads
- Logs and error traces
Controls to implement:
- Encryption in transit (TLS) for all service hops
- Encryption at rest for recordings and transcripts
- Avoid putting raw transcripts in application logs
- Redact sensitive fields before storage when possible
In real deployments, go-live has been blocked because:
- Transcripts and recordings were not encrypted at rest
- PII leaked into logs
- No clear retention policy existed
These are preventable with a data flow map.
Retention, access control, and keys (what auditors ask for)
Auditor-style checklist:
- Configurable retention windows per tenant
- Per-tenant encryption keys (or strong isolation)
- RBAC for dashboards and logs
- Audit logs for access and exports
- Secure storage for recordings (scoped access, signed URLs)
- DPA/BAA readiness with telephony/STT/TTS/LLM vendors
Self-hosting makes this easier to reason about, because you choose exactly where data lives. But you still have to implement it.
Compliance trust signals to look for (and what they do not mean)
SOC2, HIPAA, and GDPR claims matter, but they are not a shield. They do not remove your responsibility for:
- Retention policies
- Access controls
- Least privilege
- Breach response planning
- Customer-specific requirements
A practical approach:
- Treat vendor badges as inputs
- Validate your own data paths and controls
- Document everything (auditors reward clarity)
Final checklist: choose the best AI voice agents platform for your situation
Answering these ten questions gives you a decision.
Decision checklist (10 questions)
- How many minutes/month will you run in 3 months? In 12 months?
- What is your p95 concurrency, not your average?
- What is your p95 turn latency target for "feels responsive"?
- Do you need warm transfer in your MVP?
- Do you need SMS and follow-ups as part of the flow?
- What failure rate is acceptable (dropped calls, dead air)?
- Do you need to control where recordings and transcripts are stored?
- Can your team support on-call and incident response?
- How quickly do you need to switch STT/TTS/LLM vendors if costs change?
- Do you need lock-in protection for compliance or margin?
Summary: who should pick Bland, who should self-host
If you need fast setup and you are below 10k minutes/month, a managed platform can be simpler. Model minimum attempt charges, transfer minutes, and SMS costs using published pricing.
If you are approaching 100k minutes/month, self-hosting is usually the economic choice. Real deployments show ~ $0.035/min self-hosted (~ $3.5k/month) vs proprietary ~ $0.12/min (~ $12k/month) at 100k minutes/month. That gap can make or break margin.
My take: if you already know you will reach 100k+ minutes/month, do not build your business on a per-minute platform fee unless you have no alternative. Own the pipeline early, or accept that margin will be capped.
Self-hosting does not make voice easy. It makes cost control, performance tuning, debugging, and compliance achievable because you own the pipeline.
If you want the self-hosted route without losing speed, Dograh's approach is straightforward:
- Build workflows in plain English with a drag-and-drop builder
- Keep full BYO keys and vendor choice
- Stay open source and self-hostable
- Test with Looptalk-style AI-to-AI testing as you harden reliability
If you are evaluating this seriously, Dograh is looking for beta users and contributors. The goal is to keep voice infrastructure inspectable and flexible, not tied up in contracts.
FAQs
1. What is a voice agent?
A voice agent is a system that understands spoken input, reasons using NLP and LLMs, and responds by speaking back, using STT for listening and TTS for replying.
2. What is the best AI voice agent?
There’s no single “best” AI voice agent, but self-hosted open-source stacks (Dograh, Pipecat, LiveKit, Vocode) are often better than proprietary tools because they give you more control, flexibility, and lower long-term cost.
3. Is self-hosting a voice agent cheaper from day one?
No. At low volume (under ~10k minutes/month), managed platforms can be simpler. Self-hosting becomes clearly cheaper as volume grows.
4. When does Bland make sense?
Bland can work well for quick demos or low-volume use, especially if you don’t need advanced routing or transfers early.
5. When does self-hosting break even?
Usually around 50k - 100k minutes/month, depending on call length, failures, and TTS cost.
6. Do I need to build everything myself to self-host?
No. You don't need to build everything from scratch; tools like Dograh, Pipecat, LiveKit, and Vocode provide the core building blocks while still letting you own and customize the pipeline.
Was this article helpful?