╔══════════════════════════════════════════════════════════════╗
║  EXPERIMENT: TTS Voiceover Generation                        ║
║  KEY METRIC: 6 scenes, ~1.7MB audio, ~3 min runtime          ║
║  workway.co                                                  ║
╚══════════════════════════════════════════════════════════════╝

Validated · January 15, 2026

TTS Voiceover Generation

Can AI-generated voiceover replace manual recording for product walkthrough videos?

Hypothesis

Claim: ElevenLabs TTS with SSML break tags, optimized voice settings, and "Nicely Said" script writing can produce natural-sounding voiceovers for product walkthroughs without manual recording.

Success Criteria

• Generate 6 audio scenes covering the complete workflow walkthrough
• Audio sounds conversational, not robotic or "announcer-like"
• Script iteration cycle under 5 minutes per adjustment
• Total generation cost under $5

Methodology

What Was Built

• Screen capture script following "Nicely Said" framework (Fenton/Lee)
• Node.js script to generate audio via ElevenLabs API
• 6-scene voiceover for Focus Workflow walkthrough
• SSML break tags for natural pacing

Iteration Cycles

Iteration	Change	Result
1	Baseline voice (River)	Too fast, odd inflections
2	Try different voices (Dakota, Mark, Hope)	Better tone, still unnatural pacing
3	Rewrite script with natural cadence cues	Significant improvement
4	Add SSML break tags + speed 0.9	Natural, conversational delivery
5	Final voice (Jamahal) + stability 0.55	Production ready

Key Learnings: Script Writing for TTS

Avoid

• Choppy periods: "Task. Duration. Date."
• Numbered lists: "One: Two: Three:"
• Em dashes for pauses
• Brand names without context

Use

• Flowing commas: "The task, the duration, the date"
• Word ordinals: "First, Second, Third"
• SSML breaks: <break time="0.5s" />
• Spelled-out numbers: "twenty-three"
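
The Avoid/Use guidance above can be checked mechanically before spending API credits. The following is a minimal sketch, not part of the original project: `lintTtsScript` is a hypothetical helper, and its regular expressions are rough heuristics chosen for illustration.

```javascript
// Hypothetical helper: flags script patterns that the lists above say to avoid.
// The patterns are illustrative heuristics, not rules from the original script.
function lintTtsScript(text) {
  const warnings = [];

  // Em dashes used as pauses; a comma or an SSML break is the suggested swap.
  if (text.includes("\u2014")) {
    warnings.push("em dash: use a comma or an SSML break instead");
  }

  // Strip SSML tags first so time="0.5s" inside a break tag is not flagged.
  const withoutTags = text.replace(/<[^>]+>/g, " ");

  // Digits that should be spelled out, e.g. "twenty-three".
  if (/\d/.test(withoutTags)) {
    warnings.push("digits: spell numbers out");
  }

  // Three or more consecutive one-word sentences: "Task. Duration. Date."
  if (/(?:^|[.!?]\s+)(?:[A-Za-z]+\.\s*){3,}/.test(withoutTags)) {
    warnings.push("choppy periods: join the items with commas");
  }

  // Colon-style numbered lists: "One: Two: Three:"
  if (/\b(?:One|Two|Three|Four|Five):/i.test(withoutTags)) {
    warnings.push("numbered list: use First, Second, Third");
  }

  return warnings;
}

// Usage: an empty array means none of the patterns were found.
console.log(lintTtsScript("Task. Duration. Date."));
```

A linter like this only catches surface patterns; judging whether the delivery sounds conversational still requires listening.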

Results

Metric	Value
Total scenes	6
Total audio size	~1.7 MB
Runtime	~3 minutes
Generation time per iteration	~13 seconds
Voices tested	5 (River, Dakota, Mark, Hope, Jamahal)
Total iterations	5

Final Configuration

Voice: Jamahal (DTKMou8ccj1ZaWGBiotd)
Model: eleven_turbo_v2_5
Settings:
  stability: 0.55
  similarity_boost: 0.75
  style: 0.0
  speed: 0.9
  
SSML: <break time="0.3s" /> to <break time="0.7s" />
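
As a rough illustration of how these settings could map onto a request, the sketch below targets the ElevenLabs text-to-speech REST endpoint. It assumes the generation script calls the REST API directly (it may use the SDK instead) and that the API accepts `speed` inside `voice_settings`. `FINAL_CONFIG`, `buildTtsRequest`, and `synthesize` are hypothetical names.

```javascript
// Values copied from the Final Configuration above.
const FINAL_CONFIG = {
  voiceId: "DTKMou8ccj1ZaWGBiotd",
  modelId: "eleven_turbo_v2_5",
  voiceSettings: { stability: 0.55, similarity_boost: 0.75, style: 0.0, speed: 0.9 },
};

// Builds the URL and fetch options for one scene. Kept separate from the
// network call so the payload can be inspected without using API credits.
function buildTtsRequest(text, apiKey, config = FINAL_CONFIG) {
  return {
    url: `https://api.elevenlabs.io/v1/text-to-speech/${config.voiceId}`,
    options: {
      method: "POST",
      headers: {
        "xi-api-key": apiKey,
        "Content-Type": "application/json",
        Accept: "audio/mpeg",
      },
      body: JSON.stringify({
        text, // may contain <break time="0.5s" /> tags
        model_id: config.modelId,
        voice_settings: config.voiceSettings,
      }),
    },
  };
}

// Sends the request with the fetch built into Node.js 18+ and returns MP3 bytes.
async function synthesize(text, apiKey) {
  const { url, options } = buildTtsRequest(text, apiKey);
  const res = await fetch(url, options);
  if (!res.ok) throw new Error(`ElevenLabs request failed: ${res.status}`);
  return Buffer.from(await res.arrayBuffer());
}
```

Splitting the payload builder from the network call makes it easy to diff settings between iterations before generating audio.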

Listen

The final voiceover for the Focus Workflow walkthrough. Six scenes, ~3 minutes total.

• What Focus Does: ~45s
• Working Without Interruption: ~18s
• Completing the Block: ~25s
• Automatic Logging to Notion: ~30s
• The Setup: ~35s
• Close: ~18s

Voice: Jamahal · Model: eleven_turbo_v2_5 · Speed: 0.9x
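
The "~3 minutes" figure can be checked against the per-scene durations listed above. This is a small arithmetic sketch; the durations are the approximate values from the scene list, so the total is approximate too.

```javascript
// Approximate scene durations in seconds, copied from the list above.
const sceneSeconds = {
  "What Focus Does": 45,
  "Working Without Interruption": 18,
  "Completing the Block": 25,
  "Automatic Logging to Notion": 30,
  "The Setup": 35,
  "Close": 18,
};

// Sum the durations and convert to minutes.
const totalSeconds = Object.values(sceneSeconds).reduce((sum, s) => sum + s, 0);
const totalMinutes = totalSeconds / 60;

console.log(`${totalSeconds}s is about ${totalMinutes.toFixed(2)} min`);
```

The six scenes add up to 171 seconds, or 2.85 minutes, which rounds to the stated runtime.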

Honest Assessment

What This Proves

• TTS can produce usable voiceover for informational/product content
• Script writing matters more than voice selection
• SSML breaks are essential for natural pacing
• Iteration is fast enough (~13s) to experiment freely

What This Doesn't Prove

• That TTS works for emotional or storytelling content
• That this voice works for all demographics
• That long-form content (30+ minutes) holds up

Where Intervention Was Needed

• Script rewriting: Original script had unnatural phrasing for TTS
• Voice selection: Required listening to multiple options
• SSML tuning: Break timing required iteration
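
Because break timing needed several rounds of tuning, it can help to generate the tags from one place instead of editing them by hand. The sketch below is an assumption about how that might look: `breakTag` and `joinWithBreaks` are hypothetical helpers, and the clamp to the 0.3s to 0.7s range from the final configuration is an illustrative choice rather than an API limit.

```javascript
// Range taken from the final configuration (0.3s to 0.7s).
// Clamping to it is an assumption made for this sketch.
const MIN_BREAK_SECONDS = 0.3;
const MAX_BREAK_SECONDS = 0.7;

// Returns an SSML break tag with the duration clamped to the tuned range.
function breakTag(seconds) {
  const clamped = Math.min(MAX_BREAK_SECONDS, Math.max(MIN_BREAK_SECONDS, seconds));
  return `<break time="${clamped.toFixed(1)}s" />`;
}

// Joins phrases with the same pause between each pair.
function joinWithBreaks(phrases, seconds = 0.5) {
  return phrases.join(` ${breakTag(seconds)} `);
}

// Usage: change one number to retune every pause in a scene.
console.log(joinWithBreaks(["First, pick a task.", "Second, set a duration."], 0.3));
```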

Reproducibility

Prerequisites

• ElevenLabs account (Creator tier for premium voices)
• Node.js 18+
• API key from ElevenLabs dashboard

Files

workway-platform/
├── scripts/generate-focus-voiceover.js   # Generation script
├── docs/FOCUS_WORKFLOW_SCREEN_CAPTURE_SCRIPT.md
└── docs/voiceover-audio/
    ├── 01-problem.mp3
    ├── 02-what-focus-does.mp3
    ├── 03-working.mp3
    ├── 04-completing.mp3
    ├── 05-notion.mp3
    ├── 06-setup.mp3
    ├── 07-close.mp3
    └── VOICEOVER_SCRIPT.md

Run Command

# Set API key
export ELEVENLABS_API_KEY=sk_...

# Generate audio
cd workway-platform
node scripts/generate-focus-voiceover.js
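
The contents of generate-focus-voiceover.js are not shown here, so the following is only a guess at the general shape of such a script: loop over scenes, synthesize each one, and write a numbered MP3. `SCENE_SLUGS`, `sceneFileName`, and `generateAll` are hypothetical names. The slugs mirror the seven MP3 names in the file listing above, and the synthesizer is passed in as a function so the loop can be exercised without network access.

```javascript
const fs = require("node:fs/promises");

// Hypothetical scene slugs, mirroring the MP3 names in the Files listing.
const SCENE_SLUGS = [
  "problem", "what-focus-does", "working", "completing", "notion", "setup", "close",
];

// Builds a zero-padded file name such as "01-problem.mp3".
function sceneFileName(index, slug) {
  return `${String(index + 1).padStart(2, "0")}-${slug}.mp3`;
}

// Generates every scene in order. `synthesizeFn` is any async function that
// takes the scene text and resolves to a Buffer of MP3 bytes.
async function generateAll(scenes, synthesizeFn, outDir) {
  await fs.mkdir(outDir, { recursive: true });
  for (const [i, scene] of scenes.entries()) {
    const audio = await synthesizeFn(scene.text);
    await fs.writeFile(`${outDir}/${sceneFileName(i, scene.slug)}`, audio);
  }
}
```

Running the scenes sequentially keeps the output order predictable and avoids hitting concurrency limits, at the cost of a slightly longer run.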

Outcome

Hypothesis Validated

ElevenLabs TTS with SSML breaks and optimized script writing produces natural-sounding voiceover suitable for product walkthrough videos.

Evidence: 6 scenes, ~3 minutes of audio, natural conversational delivery achieved in 5 iterations.

Next Steps

• Record screen capture synced to audio
• Test with actual users for clarity
• Create template for future workflow walkthroughs