# Cantari

> Every great AI voice in one studio. One fair price. You own everything you make.

Cantari is a creator studio for AI voice work: text to speech across five engines, speech to text, dubbing, an audiobook studio, and sound and music. Voice cloning and a voice changer are coming soon.

Each link below ending in .md is a markdown version of the page, generated from the same content the page renders.

## Docs: Start here

- [Getting started: your first generation in five minutes](https://cantari.io/docs/getting-started.md): Create an account, learn the free plan, make your first clip, and find it in your library. No credit card, no setup.

## Docs: Tools

- [How to use the Text to Speech studio](https://cantari.io/docs/text-to-speech.md): The editor, choosing an engine by character, directing a read with bracketed cues, the fine controls, and Enhance.
- [How to clone your voice from a short recording](https://cantari.io/docs/voice-cloning.md): Clone your voice from 10 to 20 seconds of clean audio. What to record, the consent attestation, language support, and what delete actually does.
- [How to re-voice a recording as one of your cloned voices](https://cantari.io/docs/voice-changer.md): Upload or record a take, pick a clone you built, and convert. What carries over, the real caps, and how usage meters by audio length.
- [How to dub audio into another language](https://cantari.io/docs/dubbing.md): Transcribe a recording, translate the script, and re-voice it in one of 8 languages. Three real steps with an editable transcript between each.
- [How to turn a manuscript into an audiobook](https://cantari.io/docs/audiobook-studio.md): Import a manuscript, split it into chapters and segments, render each line as real audio, and export stitched chapter WAVs. Beta, documented honestly.
- [How to transcribe audio to text](https://cantari.io/docs/speech-to-text.md): Upload or record audio and get an accurate plain-text transcript from a Whisper-class model. Formats, the 25 MB cap, and what metering really counts.
- [How to generate background music and ambience](https://cantari.io/docs/sound-and-music.md): Describe a mood and get a finished instrumental clip from Lyria 3 (preview). Prompt tips, what the tool makes today, and what it honestly does not.

## Docs: Cantari Scribe

- [Cantari Scribe: push-to-talk dictation for Windows](https://cantari.io/docs/scribe.md): Install, pair with a one-time code, hold a key, and clean text lands at your cursor in any app. How metering, privacy, and the beta caveats actually work.
- [Setting up Cantari Scribe, step by step](https://cantari.io/docs/scribe-setup.md): From download to your first placed sentence: the SmartScreen prompt, the pairing code, the hotkey, and what each part of the little pill means.
- [Scribe troubleshooting: when dictation does not land](https://cantari.io/docs/scribe-troubleshooting.md): The known dead ends and their fixes: SmartScreen, busy microphones, stolen hotkeys, apps that fight paste, Signed out, and the allowance meter.

## Docs: Platform

- [The five engines, and which one to pick](https://cantari.io/docs/engines.md): The full roster by character, plus how to read the benchmark: third-party quality scores and latencies we measure ourselves.
- [How usage, limits, and file caps work](https://cantari.io/docs/usage-and-limits.md): What a character buys, when the meter resets, what happens at the limit, and the real caps on uploads and scripts.
- [Who owns the audio you generate](https://cantari.io/docs/ownership-and-rights.md): You own your generations: commercial use, worldwide, no watermark, no attribution. What we store, what delete does, and your cloning responsibilities.
- [Supported audio and file formats](https://cantari.io/docs/supported-formats.md): Every format each tool reads and writes, with the real caps: 25 MB uploads, 20 MB clone clips, 150,000-character manuscripts, MP3 and WAV out.
- [Plans and billing: what each plan includes and how payment works](https://cantari.io/docs/plans-and-billing.md): The three plans and their real allowances, what counts toward the meter, how checkout and cancellation work, and what happens when you hit the cap.
- [How the community works: sharing, remixing, and moderation](https://cantari.io/docs/community.md): Share your own clips with explicit consent, remix anyone's script into your studio, join the discussions, and see exactly how moderation works.

## Docs: Trust & safety

- [Why Cantari Has No Children's Voices](https://cantari.io/docs/no-childrens-voices.md): No child stock voices, no cloning minors, no child-like presets. The five reasons: consent, privacy law, voiceprint law, replica law, and the abuse record.
- [Voice cloning consent and the law](https://cantari.io/docs/cloning-consent-and-the-law.md): Why consent is the legal foundation of voice cloning: publicity rights, voiceprint statutes with real damages, GDPR biometrics, and what genuine permission looks like.
- [Cloning celebrities, politicians, and other public figures](https://cantari.io/docs/impersonation-and-public-figures.md): Famous voices are the most legally protected voices, not the least. The ELVIS Act, the FCC's robocall ruling, the New Hampshire deepfake fine, and where parody actually stands.
- [Voices of the dead: who can say yes to a posthumous clone](https://cantari.io/docs/voices-of-the-dead.md): Death does not end a voice's legal protection. Postmortem publicity rights, California's digital replica law, how estates actually license voices, and what we allow.
- [Deepfake disclosure laws: when synthetic audio must say so](https://cantari.io/docs/deepfake-disclosure-laws.md): The labeling rules arriving for AI audio: the EU AI Act's transparency article, US state election-deepfake laws, platform policies, and where Cantari already stands.
- [Voice cloning scams: what they sound like and how to protect your family](https://cantari.io/docs/voice-scams-and-fraud.md): The family-emergency voice scam, the case that reached Congress, what the FTC and FBI advise, the passphrase habit worth adopting today, and what we do on our side.
- [Voice banking: preserving your own voice, and consent done right](https://cantari.io/docs/voice-banking-and-consent-done-right.md): The best case for voice cloning: people banking their voices ahead of ALS, performers licensing replicas on their own terms, and how you will preserve a voice here when cloning ships (coming soon).

## Docs: Glossary

- [What is text to speech? TTS, explained simply](https://cantari.io/docs/what-is-text-to-speech.md): Text to speech (TTS) is software that turns written words into spoken audio. What modern neural engines actually do, and how to size a script in minutes.
- [What is speech to text? Transcription, explained simply](https://cantari.io/docs/what-is-speech-to-text.md): Speech to text (STT) turns recorded speech into written text. How modern transcription models work, what limits their accuracy, and what they cannot do yet.
- [What is a TTS cue? Bracketed emotion directions, explained](https://cantari.io/docs/what-is-a-tts-cue.md): A cue is a stage direction in square brackets, like [whispering], that tells a voice engine how to deliver the next line. Which engines act them, and which ignore them.
- [What is speakable text normalization?](https://cantari.io/docs/speakable-text-normalization.md): Normalization rewrites numbers, dates, and abbreviations into the words a voice should actually say: Dr. into Doctor, 1982 into nineteen eighty-two. Why it matters for long-form audio.
- [What is voice drift in AI narration?](https://cantari.io/docs/what-is-voice-drift.md): Voice drift is the slow change in a synthetic narrator's tone, pacing, or energy across long-form audio. Why it happens, and how chaptered workflows keep hour seven sounding like hour one.
- [What is TTS latency? Time to first byte vs full audio](https://cantari.io/docs/tts-latency-time-to-first-byte.md): TTS latency is how long an engine takes to speak. The number depends entirely on where you stop the clock: the first streamed byte, or the complete audio file.
- [What is a Quality Elo score for AI voices?](https://cantari.io/docs/what-is-quality-elo.md): Quality Elo is a listener-vote rating for voice engines, borrowed from chess: blind pairwise comparisons produce a score nobody can self-award. How to read one, with the attribution.
- [What is voice cloning consent, and why does it matter?](https://cantari.io/docs/voice-cloning-consent.md): Voice cloning consent is the speaker's explicit permission before their voice is cloned. What good consent practice looks like, and how it is enforced here.
- [Dubbing vs subtitling: what is the difference?](https://cantari.io/docs/dubbing-vs-subtitling.md): Dubbing replaces the voice track in a new language; subtitling keeps the original audio and translates on screen. The honest trade-offs, and which one this studio does.
- [What is zero data retention (ZDR)?](https://cantari.io/docs/zero-data-retention.md): Zero data retention means an AI provider processes your request and keeps nothing: no stored prompts, no stored outputs, no training on your text. What ZDR covers, and what it does not.

## Format guides

- [MP3 to Text](https://cantari.io/formats/mp3-to-text.md): Turn any MP3 into clean text with a Whisper-class model: podcasts, dictation, old interviews. Uploads to 25 MB, transcripts you can copy or download.
- [MP4 to Text](https://cantari.io/formats/mp4-to-text.md): Get a transcript from an MP4 video: the audio track is what we read, the picture plays no part. Zoom exports, phone clips, and screen recordings to 25 MB.
- [M4A to Text](https://cantari.io/formats/m4a-to-text.md): Transcribe M4A voice memos without converting them first. Share the file off your phone, upload up to 25 MB, and the whole backlog turns into text.
- [WAV to Text](https://cantari.io/formats/wav-to-text.md): Upload a WAV master and read it back as text. Honest math on the 25 MB cap, what fits at common sample rates, and when FLAC is the smarter upload.
- [OGG to Text](https://cantari.io/formats/ogg-to-text.md): Transcribe .ogg and .oga files, Opus or Vorbis inside: chat voice notes, Linux recordings, and open-source audio, up to 25 MB per upload.
- [WebM to Text](https://cantari.io/formats/webm-to-text.md): That .webm a browser recorder handed you transcribes as it is. No conversion step, uploads up to 25 MB, and the transcript downloads as plain text.
- [FLAC to Text](https://cantari.io/formats/flac-to-text.md): Turn FLAC recordings into text without unpacking them: oral histories, live tapes, and archival masters, with twice the minutes of WAV under 25 MB.
- [Text to MP3](https://cantari.io/formats/text-to-mp3.md): Convert a script to MP3 with five AI voice engines. About 1,000 characters is a minute of audio; downloads carry full commercial rights, no watermark.
- [Text to WAV](https://cantari.io/formats/text-to-wav.md): Generate WAV from text: Gemini Flash delivers PCM wrapped as WAV, and stitched audiobook chapters export at 24 kHz mono, ready for the edit timeline.

## Engines

- [Gemini Flash](https://cantari.io/engines/gemini.md): The only engine here that acts your bracketed [emotion] directions.
- [Grok Voice](https://cantari.io/engines/grok.md): xAI voice with 5 personas. Plain read, ignores cues.
- [Kokoro](https://cantari.io/engines/kokoro.md): Cheapest. Clean, plain read. Ignores cues.
- [MAI Voice 2](https://cantari.io/engines/mai.md): Microsoft voice with real style and speed controls.
- [Zonos](https://cantari.io/engines/zonos.md): Open-weight Zyphra engine with four accent voices.
## Use cases

- [Audiobooks & Publishing](https://cantari.io/use-cases/audiobooks.md): Narrate a whole book without the per-word meter running.
- [YouTube & Video](https://cantari.io/use-cases/youtube-video.md): A voiceover for every upload, without booking a booth.
- [Podcasts](https://cantari.io/use-cases/podcasts.md): Intros, ad reads, and pickups in your show's voice.
- [E-learning](https://cantari.io/use-cases/e-learning.md): Narrate a whole curriculum without rerecording.
- [Game Dev](https://cantari.io/use-cases/game-dev.md): Placeholder lines and barks, generated as you build.
- [Advertising](https://cantari.io/use-cases/advertising.md): Test ten reads of a spot before you commit to one.
- [Accessibility](https://cantari.io/use-cases/accessibility.md): Turn written content into a clear read-aloud.
- [Localization & Dubbing](https://cantari.io/use-cases/localization.md): Carry your audio into another language.
- [Articles to Audio](https://cantari.io/use-cases/publishers.md): Every article, listenable, by the time it publishes.
- [Corporate Training](https://cantari.io/use-cases/corporate-training.md): Onboarding that updates the day the policy does.
- [Language Learning](https://cantari.io/use-cases/language-learning.md): Dialog practice in two languages, from one studio.
- [Meditation & Wellness](https://cantari.io/use-cases/meditation-wellness.md): A voice that can go softly, and the sound beneath it.
- [News Briefings](https://cantari.io/use-cases/news-briefings.md): A daily audio brief that never misses the 7 a.m. slot.
- [Product Demos](https://cantari.io/use-cases/product-demos.md): Walkthrough VO that keeps pace with the release train.

## Blog

- [The open voice benchmark, and why we run it](https://cantari.io/blog/the-open-benchmark.md): Vendor demos tell you which engine sounds best in the vendor's hands. We wanted numbers nobody here can tilt.
- [How we measure voice latency (and why vendors won't)](https://cantari.io/blog/measuring-voice-latency.md): Wall-clock to the complete audio file, median of three, same script, same gateway. The script is in the repo.
- [You own what you make here](https://cantari.io/blog/own-your-voice.md): No meter renting you your own audio, no watermark, no clause that quietly keeps your work.
- [Why our free tier has an unlimited engine](https://cantari.io/blog/why-free-drafting-is-unlimited.md): Open weights changed the arithmetic of a draft. We priced the free plan on the arithmetic.
- [How we verify every fixed sample on this site](https://cantari.io/blog/verifying-every-fixed-sample.md): Engines intermittently truncate long takes. Every fixed clip here is transcribed back and checked before it ships.

## Key pages

- [Engine benchmark](https://cantari.io/benchmark): The open benchmark: third-party quality scores with attribution, plus our own measured latencies.
- [Pricing](https://cantari.io/pricing): The plans, and what every engine really costs, so the routing and pricing are checkable.
- [Supported formats](https://cantari.io/formats): Every file format the tools read and write, traced to the routes that enforce them.
- [About](https://cantari.io/about): Who is behind Cantari and how the product is built.

## Full content

- [llms-full.txt](https://cantari.io/llms-full.txt): every guide, format, engine, use-case, and blog page above, serialized as one markdown document.