Live · Transcribe

Turn recordings into clean, readable text.

Upload audio and get an accurate transcript you can search, edit, and repurpose. Built on a Whisper-class transcription model and wired into the same studio workflow as the rest of Cantari.

Transcript

So the first thing we noticed was the latency dropping, and that changed the whole plan.

00:14 - 00:21
The payoff

Watch the audio become text.

This is a staged replay of a real transcription, sped up so you can see the shape of the output: accurate words, clean punctuation, plain text ready to use anywhere.

Transcribed by a Whisper-class model. Your files in, your text out, nothing stored that you do not save.

EPISODE-NOTES.M4A · 03:24

Transcript

Okay, quick plan for the next episode before I lose the thread. I want to open with the lighthouse story, because it sets the tone for everything that follows.

The middle section is the interview cut. Keep the part about learning to listen before you ever press record, and trim the rest down to the strongest two minutes.

For the close, come back to the lighthouse. One line, no music underneath, and let it sit for a beat before the outro.

Note to self: the room echoes after ten, so book the small studio next time. That is everything for tonight.

How it works

From a recording to text you own.

Step 1: Upload your audio

Bring a recording, an interview, or a voice note, or record one straight into the studio. Supported formats are mp3, wav, m4a, webm, ogg, and flac, up to 25 MB per file.

Step 2: Transcribe

A Whisper-class model turns speech into accurate text in seconds, without leaving the studio.

Step 3: Take the text

Copy the transcript, download it as a .txt file, or save it with its source audio to your library.

Capabilities

What Speech to Text gives you.

Accurate transcripts

Built on a Whisper-class transcription model we already use in the studio, so quality keeps pace as those models improve.

Searchable text

Turn an hour of audio into text you can scan and quote in seconds, instead of scrubbing a timeline.

One workflow

Pairs with Text to Speech so you can transcribe, edit, and re-voice without leaving Cantari.

Live now

Speech to Text is running in the studio today.

Upload a file or record straight into the studio, and a Whisper-class model returns your transcript in seconds, ready to copy, download, or save to your library.

Open Speech to Text
Questions

The honest answers.

What Speech to Text can and cannot do today, in plain language.

Can I use this today?
Yes. Speech to Text is live: upload a recording or capture one in the browser, and a clean transcript comes back in seconds, ready to copy, download as .txt, or save to your library. The free plan covers it.
What powers it?
A Whisper-class transcription model, the same transcription step our dubbing pipeline runs on. It returns plain text today. Word-level timestamps are not available yet, and we would rather say so plainly than pretend otherwise.
What can I upload?
mp3, wav, m4a, webm, ogg, or flac, up to 25 MB per file. Clear, single-speaker audio transcribes best, and you can skip the file entirely by recording straight in the studio.
What does it cost?
It draws on the same flat monthly allowance as everything else in Cantari, measured by the length of the transcript. No credit meter, and nothing extra to unlock.

Start transcribing with Speech to Text.

Free to start, no credit meter. Open the console and try it for yourself.