Professional AI Audio Production · On-Device

Your words,
broadcast quality.

Phonara Studio turns books, scripts, and podcast episodes into distribution-ready audio — ACX-compliant MP3 audiobooks and podcast episodes, generated entirely on your machine.

Buy Phonara Studio V1

Phonara Studio — Chapter 12 · The Signal

✓ 97% — Validated ⚠ 83% — Review ⚡ Generating… MP3 · ACX Ready

Everything you need

A complete production studio
in a single application.

📖

Smart Import

Import ePub, DOCX, Google Docs, or plain text. Phonara detects chapters automatically and shows word counts so you choose exactly what to produce.

🎤

Voice Cloning

Clone any voice from a 5–30 second reference clip. Up to 4 independent speaker channels per project for multi-voice narration and dialogue.

⚡

Full Automation

Set Save & Continue and walk away. Phonara generates, validates, saves, and advances through every chapter overnight without intervention.

🔍

Whisper Validation

Every paragraph is transcribed by OpenAI Whisper and compared word-for-word. Green, yellow, and red badges instantly surface lines that need a re-take.

🎙

Podcast Studio

Scaffold episodes with Intro, Main, and Outro segments. Pin show defaults once and reuse them across every episode automatically.

🎛

Timeline Editor

Visual waveform timeline with clip trimming, drag-to-reorder, multi-track layout, undo/redo, and real-time playback position tracking.

📡

Distribution Ready

Download produces ACX-compliant MP3 (192kbps) for audiobooks or 128kbps for podcasts — no DAW, no post-processing, no extra steps.

🔒

Fully Local

Everything runs on your hardware. No cloud API keys, no usage fees, no internet connection required after initial model download.

🎨

11 Themes

Sapphire, Midnight, Burgundy, Emerald, Plum, and six more. Dark and light mode. Your workspace, your aesthetic.

Audiobook Production

ACX-compliant.
Upload and go.

Phonara handles the entire technical spec automatically. Every download meets the Audible/ACX audio requirements out of the box.

Format MP3 · 192kbps CBR · Mono

Sample Rate 44.1 kHz

RMS Level −20 dB (ACX: −23 to −18)

Peak −3 dBFS max

Room Tone 1s open · 1s close

Compatible ACX · Audible · KDP · Findaway

ACX Compliance Check

RMS Level−20.1 dB ✓

Peak Level−3.2 dBFS ✓

Noise Floor−62 dB ✓

Bitrate192 kbps CBR ✓

✓ Ready to submit to ACX

🎙 The Signal Podcast — Episode 12

🎵

Intro

Show default · 45 seconds

✓ Pinned

✏

Main Content

Custom · 24 minutes 18 seconds

✓ Done

🎵

Outro

Show default · 30 seconds

✓ Pinned

⚡ Combine Episode MP3 · 128kbps · −16 dB RMS

Podcast Production

Intro. Main. Outro.
Combined. Done.

Phonara manages show-level defaults so your intro and outro record once and reuse forever. Every episode is one click from a submission-ready MP3.

📌

Auto-pin show defaults

First recorded intro/outro becomes the show standard automatically

🔗

Smart episode library

All episodes indexed by show with segment done-states preserved

📦

Distribution-ready MP3

128kbps CBR · 44.1kHz · −16 dB RMS — upload directly to any host

How it works

From manuscript
to marketplace.

Five steps from import to a file ready to upload to ACX, Audible, Spotify, or any podcast host.

Load model & create voice

Select your device in Settings, load the AI model, and upload a 5–30 second reference clip to clone a voice. The speaker tab turns green when ready.

Import your content

Drop an ePub, DOCX, Google Doc, or text file. Phonara detects chapters, shows word counts, and lets you select exactly which sections to produce.

Parse & review

Text is split into paragraph cards. Edit inline, adjust pauses, assign speakers. Every detail is editable before a single byte of audio is generated.

Generate & validate

Click Generate All. Whisper validates every paragraph in real time. Green, yellow, and red badges surface issues. Regenerate any line with one click.

Download & distribute

Click Download. Receive an ACX-compliant MP3 — resampled, loudness-normalized, room tone added — ready to upload without any further processing.

Advanced Control

Expressive syntax,
inline.

Embed non-verbal sounds and pronunciation overrides directly in your text. No post-processing. No audio editor.

Non-Verbal Sounds

[laughter] [sigh] [confirmation-en] [question-en] [question-ah] [question-oh] [surprise-ah] [surprise-oh] [surprise-wa] [dissatisfaction-hnn]

"You're serious?" she said. [question-oh] He laughed. [laughter] "Completely."

CMU Pronunciation Override

The algorithm at Theranos[TH EH R AH N OW S] had been designed to deceive. She worked at MIT[EH M AY T IY] for three years before leaving.

Use the CMU Pronouncing Dictionary to find phoneme strings for any English word. Copy the uppercase sequence in brackets immediately after the word.

Production Output

The right format,
every time.

📖

Audiobook

MP3 · 192kbps CBR

Sample Rate44.1 kHz

RMS Level−20 dB

Peak−3 dBFS

Room Tone1s open + close

PlatformsACX · Audible · KDP

🎙

Podcast

MP3 · 128kbps CBR

Sample Rate44.1 kHz

RMS Level−16 dB

Peak−3 dBFS

Room ToneNone

PlatformsSpotify · Apple · Buzzsprout

⚙

Internal / Preview

Float32 · 24 kHz

UsageBrowser playback

Voice Refsvoice_temp/*.wav

ValidationWhisper input

NoteNot for distribution

Converts toMP3 on download

System Requirements

What you need
to get started.

Phonara runs entirely on your hardware. An NVIDIA GPU dramatically speeds up generation, but CPU mode is fully supported for all platforms.

Windows

Recommended

Full CUDA GPU support · Fastest generation

Linux

Fully Supported

CUDA GPU + CPU · Native performance

macOS

Supported

CPU mode · Apple Silicon MPS planned

Component

Minimum

Recommended

Notes

🐍

Python

Runtime

3.10

3.11

3.12 supported · 3.9 or earlier will not work

🎞

FFmpeg

Audio encoding

6.0

6.1+

Required for MP3 export · libmp3lame must be included

🔍

OpenAI Whisper

Validation engine

any

latest

Installed via pip · Auto-downloads model weights on first use

🔥

PyTorch

AI framework

2.1

2.8 + CUDA 12.8

CPU builds supported · CUDA 11.8+ for GPU acceleration

💾

GPU (NVIDIA)

Optional · strongly recommended

8 GB VRAM

16+ GB VRAM

RTX 3080+ ideal · CPU fallback ~10–20× slower

🖥

RAM

System memory

8 GB

16+ GB

More RAM helps with long chapters and multi-track sessions

💿

Disk Space

Model + project storage

10 GB

20+ GB

~5 GB for model weights · remaining for your episode library

Python Libraries — installed automatically by setup script

flaskWeb server

flask-corsCORS headers

omnivoiceTTS model

openai-whisperValidation

soundfileAudio I/O

scipyResampling

numpyAudio math

torchAI runtime

torchaudioAudio models

transformersHuggingFace

huggingface-hubModel downloads

ebooklibePub parsing

python-docxDOCX parsing

beautifulsoup4HTML parsing

requestsHTTP client

Installation

Up and running
in minutes.

Our platform setup scripts handle every dependency automatically — Python, PyTorch with the right CUDA version, FFmpeg, Whisper, and all libraries. One script, done.

Download & Extract

Download phonara-studio-windows.zip and extract it to your preferred location — e.g. C:\PhonaraStudio\

Run the setup script

Right-click setup_windows.bat and choose Run as administrator. The script will:

Check for Python 3.11 and install via winget if missing
Install FFmpeg via winget (required for MP3 export)
Create an isolated virtual environment in venv\
Detect your GPU and install the correct CUDA PyTorch build
Install all Python dependencies from requirements.txt

Launch Phonara Studio

Double-click start_phonara.bat. The app opens at http://localhost:7000 in your browser automatically.

setup_windows.bat

:: [1/6] Check / install Python 3.11
python --version >nul 2>&1 || (
  winget install --id Python.Python.3.11 --silent
)

:: [2/6] Install FFmpeg
ffmpeg -version >nul 2>&1 || (
  winget install --id Gyan.FFmpeg --silent
)

:: [3/6] Create virtual environment
python -m venv venv
call venv\Scripts\activate.bat

:: [4/6] Detect GPU — correct PyTorch build
nvidia-smi >nul 2>&1 && (
  pip install torch torchaudio ^
    --extra-index-url https://download.pytorch.org/whl/cu128
) || (
  pip install torch torchaudio ^
    --index-url https://download.pytorch.org/whl/cpu
)

:: [5/6] Install all Python dependencies
pip install -r requirements.txt

:: [6/6] Create project folders
mkdir templates episodes voices voice_temp media shows

echo Setup complete — run start_phonara.bat

Download & Extract

Download phonara-studio-linux.tar.gz and extract it to your preferred directory, e.g. ~/PhonaraStudio/

Run the setup script

Open a terminal in the Phonara folder and run the script. It handles:

Installs FFmpeg and Python via apt, dnf, or pacman
Creates an isolated virtualenv in venv/
Detects NVIDIA GPU and installs matching CUDA PyTorch build
Installs all Python dependencies

Launch

Run ./start_phonara.sh. The app opens at http://localhost:7000. Optionally create a systemd service for auto-start on boot.

setup_linux.sh

#!/bin/bash
# [1/6] Install system packages
if command -v apt &>/dev/null; then
  sudo apt update && sudo apt install -y \
    ffmpeg python3 python3-venv
elif command -v dnf &>/dev/null; then
  sudo dnf install -y ffmpeg python3
elif command -v pacman &>/dev/null; then
  sudo pacman -S --noconfirm ffmpeg python
fi

# [3/6] Create virtual environment
python3 -m venv venv && source venv/bin/activate

# [4/6] Detect GPU & install PyTorch
if command -v nvidia-smi &>/dev/null; then
  pip install torch torchaudio \
    --index-url https://download.pytorch.org/whl/cu128
else
  pip install torch torchaudio \
    --index-url https://download.pytorch.org/whl/cpu
fi

# [5/6] Install dependencies
pip install -r requirements.txt

# [6/6] Create project folders
mkdir -p templates episodes voices voice_temp media shows

echo "✓ Done — run ./start_phonara.sh"

Download & Extract

Download phonara-studio-mac.zip and extract it. Open Terminal, cd to the folder, then run: chmod +x setup_mac.sh start_phonara.sh

Run the setup script

Run ./setup_mac.sh. It will:

Install Homebrew if not present
Install Python 3.11 and FFmpeg via brew
Create a virtualenv and install PyTorch (CPU / Apple Silicon MPS)
Install all Python dependencies

⚠ macOS has no NVIDIA CUDA. Generation runs on CPU (~10–20× slower than GPU). Apple Silicon MPS acceleration is planned for a future release.

Launch

Run ./start_phonara.sh. The app opens at http://localhost:7000. On first model load, weights (~5 GB) download once and are cached.

setup_mac.sh

#!/bin/bash
# [1/6] Install Homebrew if missing
if ! command -v brew &>/dev/null; then
  /bin/bash -c "$(curl -fsSL \
    https://brew.sh/install.sh)"
fi

# [2/6] Install Python & FFmpeg
brew install python@3.11 ffmpeg

# [3/6] Create virtual environment
python3.11 -m venv venv
source venv/bin/activate

# [4/6] Install PyTorch (CPU / MPS)
if [ "$(uname -m)" = "arm64" ]; then
  echo "Apple Silicon — MPS available"
fi
pip install torch torchaudio

# [5/6] Install dependencies
pip install -r requirements.txt

# [6/6] Create project folders
mkdir -p templates episodes voices voice_temp media shows

echo "✓ Done — run ./start_phonara.sh"

⏱

First Launch Note — On the very first launch, Phonara downloads the OmniVoice AI model weights from HuggingFace (~5 GB). This happens once and is cached to your models/ folder. Subsequent launches start in seconds with no internet connection required.

Get Phonara Studio

Professional audio.
Your machine. Your rules.

No subscriptions. No cloud fees. No API keys. Run on your GPU, produce ACX-ready audiobooks and podcast-ready episodes, and keep every file.

Buy Phonara Studio V1

Windows · Linux · macOS · NVIDIA GPU recommended · CPU supported

Your words,broadcast quality.

A complete production studioin a single application.

ACX-compliant.Upload and go.

Intro. Main. Outro.Combined. Done.

From manuscriptto marketplace.

Expressive syntax,inline.

The right format,every time.

What you needto get started.

Up and runningin minutes.

Professional audio.Your machine. Your rules.

Your words,
broadcast quality.

A complete production studio
in a single application.

ACX-compliant.
Upload and go.

Intro. Main. Outro.
Combined. Done.

From manuscript
to marketplace.

Expressive syntax,
inline.

The right format,
every time.

What you need
to get started.

Up and running
in minutes.

Professional audio.
Your machine. Your rules.