
Text-to-Speech

Turn text into natural-sounding speech without a cloud voice API. PortBay runs the Kokoro text-to-speech model on your Mac through the same portbay-stt sidecar that powers Speech-to-Text: pick a voice, type your text, and play it back. Synthesized clips can be replayed instantly and exported as .wav files.

Everything runs on your Mac: the text, the model, and the audio.

Quickstart

  1. Open AI in the sidebar (/ai) and go to the Playground → Text to Speech tab (or jump straight there with /ai?playground=tts).
  2. The first time, click Download voice model to fetch Kokoro into PortBay's models folder.
  3. Choose a Voice, type some Text, and click Speak.
  4. The clip plays automatically. Replay plays it again without re-synthesizing; Export .wav saves it.

How it works

PortBay hands your text and the chosen voice to the bundled speech sidecar, which synthesizes the audio on-device and returns a WAV clip. Nothing is streamed to a server.
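This page says the sidecar returns a WAV clip but doesn't document its sample rate or sample format. If you handle the returned bytes yourself, the minimal TypeScript sketch below reads the standard RIFF/WAVE header to recover channel count, sample rate, and duration. It follows the generic WAV layout, not anything PortBay-specific; `parseWav` and `WavInfo` are hypothetical names.

```typescript
// Hypothetical helper: inspect a WAV clip using the generic RIFF/WAVE layout.
interface WavInfo {
  channels: number;
  sampleRate: number;
  bitsPerSample: number;
  durationSeconds: number;
}

// Read a four-character chunk ID such as "RIFF" or "fmt ".
function fourcc(view: DataView, offset: number): string {
  return String.fromCharCode(
    view.getUint8(offset),
    view.getUint8(offset + 1),
    view.getUint8(offset + 2),
    view.getUint8(offset + 3),
  );
}

function parseWav(bytes: Uint8Array): WavInfo {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  if (bytes.byteLength < 12 || fourcc(view, 0) !== "RIFF" || fourcc(view, 8) !== "WAVE") {
    throw new Error("not a RIFF/WAVE file");
  }
  let channels = 0;
  let sampleRate = 0;
  let bitsPerSample = 0;
  let byteRate = 0;
  let dataSize = -1;
  let offset = 12; // first sub-chunk follows the 12-byte RIFF header
  while (offset + 8 <= bytes.byteLength) {
    const id = fourcc(view, offset);
    const size = view.getUint32(offset + 4, true); // little-endian chunk size
    const body = offset + 8;
    if (id === "fmt ") {
      channels = view.getUint16(body + 2, true);
      sampleRate = view.getUint32(body + 4, true);
      byteRate = view.getUint32(body + 8, true);
      bitsPerSample = view.getUint16(body + 14, true);
    } else if (id === "data") {
      // Clamp in case the header claims more bytes than are present.
      dataSize = Math.min(size, bytes.byteLength - body);
    }
    offset = body + size + (size % 2); // chunks are padded to an even length
  }
  if (byteRate === 0 || dataSize < 0) {
    throw new Error("missing fmt or data chunk");
  }
  return { channels, sampleRate, bitsPerSample, durationSeconds: dataSize / byteRate };
}
```

Duration is the data-chunk size divided by the byte rate from the fmt chunk, so the same helper works for any uncompressed clip regardless of its sample rate.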

| Control | What it does |
|---|---|
| Voice | Pick from the model's voices, grouped American / British × female / male. |
| Text | What to speak. |
| Speak | Synthesize and play. While the text and voice are unchanged, the button becomes Replay and plays the cached clip without re-synthesizing. |
| Export .wav | Save the last synthesized clip as a .wav file. |

Editing the text or switching the voice flips Replay back to Speak — the cached clip no longer matches the inputs, so PortBay re-synthesizes.
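The Speak/Replay rule amounts to a one-entry cache keyed on the exact text and voice. A minimal TypeScript sketch, assuming a hypothetical `synthesize` callback stands in for the sidecar call; `ClipCache` and its methods are illustrative names, not PortBay's implementation:

```typescript
// Hypothetical sketch of the playground's Speak/Replay rule.
// ClipCache and synthesize are illustrative names, not PortBay's API.
type ButtonState = "disabled" | "Speak" | "Replay";

interface CachedClip {
  text: string;
  voice: string;
  wav: Uint8Array; // WAV bytes returned by the synthesizer
}

class ClipCache {
  private last: CachedClip | null = null;

  // Empty text disables the button; an exact text + voice match allows Replay.
  buttonState(text: string, voice: string): ButtonState {
    if (text.length === 0) return "disabled";
    if (this.last !== null && this.last.text === text && this.last.voice === voice) {
      return "Replay";
    }
    return "Speak";
  }

  // Replay returns the cached bytes; anything else re-synthesizes and
  // replaces the cache with the new clip.
  async play(
    text: string,
    voice: string,
    synthesize: (text: string, voice: string) => Promise<Uint8Array>,
  ): Promise<Uint8Array> {
    const state = this.buttonState(text, voice);
    if (state === "disabled") throw new Error("nothing to synthesize");
    if (state === "Replay" && this.last !== null) return this.last.wav;
    const wav = await synthesize(text, voice);
    this.last = { text, voice, wav };
    return wav;
  }

  // What Export .wav would save: the last synthesized clip, if any.
  lastClip(): Uint8Array | null {
    return this.last === null ? null : this.last.wav;
  }
}
```

Keeping only the most recent clip matches the documented behavior: any change to text or voice invalidates it, so there is never more than one clip to replay or export.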

Where the model lives

The Kokoro voice model is downloaded once and stored under the same AI-models root as Ollama and image models. Set AI → Configuration → Models directory once, and all three live together. You can also download and manage the model from the Models → Text-to-Speech family; once installed, it appears in the same installed-models list as your speech-to-text and Ollama models.

Reference

Requirements

Needs: macOS 14+, the bundled portbay-stt sidecar, and the downloaded Kokoro voice model.

Text-to-Speech is macOS-only (it shares the speech sidecar with Speech-to-Text). If the sidecar or model is missing, the panel says so rather than failing silently.
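The requirements above can be checked as a small status function that tells the user what is missing instead of failing silently. This is a hypothetical sketch: the `TtsEnvironment` fields are assumptions, since this page doesn't document how PortBay detects each condition, and the check order (platform, then sidecar, then model) is one reasonable choice. The messages paraphrase the Troubleshooting table.

```typescript
// Hypothetical inputs; PortBay's real detection logic is not documented here.
interface TtsEnvironment {
  macosMajor: number | null; // null when not running on macOS
  sidecarAvailable: boolean; // bundled portbay-stt found and responding
  modelInstalled: boolean;   // Kokoro voice model downloaded
}

type PanelStatus =
  | { kind: "unsupported"; message: string }
  | { kind: "sidecar-missing"; message: string }
  | { kind: "needs-download"; message: string }
  | { kind: "ready" };

// Report the first unmet requirement so the panel can say what is wrong.
function panelStatus(env: TtsEnvironment): PanelStatus {
  if (env.macosMajor === null || env.macosMajor < 14) {
    return { kind: "unsupported", message: "Text-to-Speech needs macOS 14+." };
  }
  if (!env.sidecarAvailable) {
    return {
      kind: "sidecar-missing",
      message: "The portbay-stt sidecar is missing; reinstalling PortBay restores it.",
    };
  }
  if (!env.modelInstalled) {
    return { kind: "needs-download", message: "Download voice model" };
  }
  return { kind: "ready" };
}
```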

Commands

The playground drives these Tauri commands; they're the same surface the rest of the app uses:

| Command | Purpose |
|---|---|
| tts_overview | The model catalog, install state, and available voices. |
| tts_download_model | Download the Kokoro voice model (streams progress). |
| tts_speak | Synthesize one clip from text + voice and return WAV. |
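A frontend might chain the three commands as follows. This is a sketch under stated assumptions: only the command names come from this page; the `TtsOverview` shape, the `{ text, voice }` argument names, and the WAV return encoding are hypothetical. In a Tauri v2 app, `invoke` is exported from `@tauri-apps/api/core`; here it is passed in as a parameter so the example runs without Tauri.

```typescript
// Signature compatible with Tauri's invoke(cmd, args?) for this sketch.
type Invoke = <T>(cmd: string, args?: Record<string, unknown>) => Promise<T>;

// Hypothetical result shape for tts_overview; the real one is not documented here.
interface TtsOverview {
  installed: boolean;
  voices: string[];
}

async function speakWithSetup(invoke: Invoke, text: string, voice: string): Promise<Uint8Array> {
  let overview = await invoke<TtsOverview>("tts_overview");
  if (!overview.installed) {
    // The real command streams progress; this sketch only awaits completion.
    await invoke<void>("tts_download_model");
    // Refresh in case voices are only listed once the model is installed.
    overview = await invoke<TtsOverview>("tts_overview");
  }
  if (!overview.installed) throw new Error("voice model is still not installed");
  if (!overview.voices.includes(voice)) throw new Error(`unknown voice: ${voice}`);
  // Hypothetical argument names; accept either a JSON byte array or raw bytes.
  const raw = await invoke<number[] | ArrayBuffer>("tts_speak", { text, voice });
  return raw instanceof ArrayBuffer ? new Uint8Array(raw) : Uint8Array.from(raw);
}
```

Accepting both a number array and an `ArrayBuffer` hedges against the unknown serialization of the WAV bytes; drop whichever branch your backend never produces.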

Troubleshooting

| Symptom | Likely cause | Next action |
|---|---|---|
| The tab shows only "Download voice model" | The Kokoro model isn't installed yet | Click Download voice model and wait for the progress bar to finish. |
| No catalog, or "No text-to-speech models" | The sidecar couldn't report a catalog | Confirm you're on macOS 14+ (Text-to-Speech requires it), then reinstall PortBay to restore the bundled sidecar. |
| Speak is disabled | The text box is empty | Type something to synthesize. |
| The clip won't replay after I edit the text | Replay only replays the exact last clip | This is expected: once the text or voice changes, click Speak to synthesize the new version. |

PortBay is pre-MVP software. Use the docs as an operating guide, not a stability guarantee.