╔══════════════════════════════════════════════════════════════╗
║  EXPERIMENT: TTS Voiceover Generation                        ║
║  KEY METRIC: 6 scenes, ~1.7MB audio, ~3 min runtime          ║
║  workway.co                                                  ║
╚══════════════════════════════════════════════════════════════╝
Validated · January 15, 2026
TTS Voiceover Generation
Can AI-generated voiceover replace manual recording for product walkthrough videos?
Hypothesis
Claim: ElevenLabs TTS with SSML break tags, optimized voice settings, and "Nicely Said" script writing can produce natural-sounding voiceovers for product walkthroughs without manual recording.
Success Criteria
- Generate 6 audio scenes covering the complete workflow walkthrough
- Audio sounds conversational, not robotic or "announcer-like"
- Script iteration cycle under 5 minutes per adjustment
- Total generation cost under $5
Methodology
What Was Built
- Screen capture script following the "Nicely Said" framework (Fenton/Lee)
- Node.js script to generate audio via the ElevenLabs API
- 6-scene voiceover for the Focus Workflow walkthrough
- SSML break tags for natural pacing
Iteration Cycles
| Iteration | Change | Result |
|---|---|---|
| 1 | Baseline voice (River) | Too fast, odd inflections |
| 2 | Try different voices (Dakota, Mark, Hope) | Better tone, still unnatural pacing |
| 3 | Rewrite script with natural cadence cues | Significant improvement |
| 4 | Add SSML break tags + speed 0.9 | Natural, conversational delivery |
| 5 | Final voice (Jamahal) + stability 0.55 | Production ready |
Key Learnings: Script Writing for TTS
Avoid
- Choppy periods: "Task. Duration. Date."
- Numbered lists: "One: Two: Three:"
- Em dashes for pauses
- Brand names without context
Use
- Flowing commas: "The task, the duration, the date"
- Word ordinals: "First, Second, Third"
- SSML breaks: <break time="0.5s" />
- Spelled-out numbers: "twenty-three"
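The "Avoid" patterns above lend themselves to a quick automated check before each generation run. The following is a minimal sketch, not the project's code: `lintTtsScript`, its rule names, and its regular expressions are hypothetical, and the heuristics are deliberately rough.

```javascript
// Hypothetical helper: flags the script patterns listed under "Avoid".
// Rule names and regexes are illustrative assumptions, not the project's code.
function lintTtsScript(text) {
  const warnings = [];

  // Em dashes used as pauses; an SSML break tag is the suggested replacement.
  if (/\u2014/.test(text)) {
    warnings.push("em-dash");
  }

  // Digits outside of break tags; the learnings suggest spelling numbers out.
  const withoutBreaks = text.replace(/<break[^>]*\/>/g, "");
  if (/\d/.test(withoutBreaks)) {
    warnings.push("digits");
  }

  // Numbered-list labels such as "One:" or "Two:".
  if (/\b(One|Two|Three|Four|Five):/.test(text)) {
    warnings.push("numbered-list");
  }

  // Three or more consecutive one-word sentences, e.g. "Task. Duration. Date."
  if (/(?:^|[.!?]\s+)(?:[A-Za-z]+\.\s*){3,}/m.test(text)) {
    warnings.push("choppy-periods");
  }

  return warnings;
}
```

A script that follows the "Use" column, for example `The task, the duration, the date.`, should come back with an empty list, while `Task. Duration. Date.` is flagged as choppy. Brand names without context are left to a human reader, since that rule needs judgment rather than pattern matching.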
Results
| Metric | Value |
|---|---|
| Total scenes | 6 |
| Total audio size | ~1.7 MB |
| Runtime | ~3 minutes |
| Generation time per iteration | ~13 seconds |
| Voices tested | 5 |
| Total iterations | 5 |
Final Configuration
Voice: Jamahal (DTKMou8ccj1ZaWGBiotd)
Model: eleven_turbo_v2_5
Settings:
  stability: 0.55
  similarity_boost: 0.75
  style: 0.0
  speed: 0.9
SSML: <break time="0.3s" /> to <break time="0.7s" />
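To show how this configuration might map onto an API call, here is a minimal sketch of a request builder. It assumes the public ElevenLabs endpoint `POST /v1/text-to-speech/{voice_id}` with an `xi-api-key` header; `buildTtsRequest` is a hypothetical helper name, and placing `speed` inside `voice_settings` is an assumption worth checking against the current API reference.

```javascript
// Values copied from the Final Configuration above.
const VOICE_ID = "DTKMou8ccj1ZaWGBiotd"; // Jamahal
const MODEL_ID = "eleven_turbo_v2_5";
const VOICE_SETTINGS = {
  stability: 0.55,
  similarity_boost: 0.75,
  style: 0.0,
  speed: 0.9, // assumption: speed is sent as part of voice_settings
};

// Hypothetical helper: builds the URL and fetch options for one TTS call.
// It performs no network I/O, so it can be inspected or logged safely.
function buildTtsRequest(text, apiKey) {
  return {
    url: `https://api.elevenlabs.io/v1/text-to-speech/${VOICE_ID}`,
    options: {
      method: "POST",
      headers: {
        "xi-api-key": apiKey,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        text,
        model_id: MODEL_ID,
        voice_settings: VOICE_SETTINGS,
      }),
    },
  };
}
```

With Node.js 18+, the result can be passed to the built-in `fetch` as `fetch(url, options)`; the response body is the audio itself.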
Listen
The final voiceover for the Focus Workflow walkthrough. Six scenes, ~3 minutes total.
- What Focus Does (~45s)
- Working Without Interruption (~18s)
- Completing the Block (~25s)
- Automatic Logging to Notion (~30s)
- The Setup (~35s)
- Close (~18s)
Voice: Jamahal · Model: eleven_turbo_v2_5 · Speed: 0.9x
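As a quick sanity check, the approximate scene lengths listed above can be summed and compared with the stated runtime of about three minutes. The durations are the rounded values from this page, so the total is approximate as well.

```javascript
// Approximate scene lengths in seconds, as listed above.
const sceneSeconds = {
  "What Focus Does": 45,
  "Working Without Interruption": 18,
  "Completing the Block": 25,
  "Automatic Logging to Notion": 30,
  "The Setup": 35,
  "Close": 18,
};

// Sum the per-scene values and convert to minutes.
const totalSeconds = Object.values(sceneSeconds).reduce((sum, s) => sum + s, 0);
const totalMinutes = totalSeconds / 60;

console.log(totalSeconds, totalMinutes.toFixed(2)); // prints: 171 2.85
```

The six scenes add up to 171 seconds, or 2.85 minutes, which is consistent with the "~3 minutes" figure.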
Honest Assessment
What This Proves
- TTS can produce usable voiceover for informational/product content
- Script writing matters more than voice selection
- SSML breaks are essential for natural pacing
- Iteration is fast enough (~13s) to experiment freely
What This Doesn't Prove
- That TTS works for emotional/storytelling content
- That this voice works for all demographics
- That long-form content (30+ minutes) works
Where Intervention Was Needed
- Script rewriting: Original script had unnatural phrasing for TTS
- Voice selection: Required listening to multiple options
- SSML tuning: Break timing required iteration
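Because break timing had to be tuned by ear, it can help to keep the pause length in one place instead of hand-editing tags throughout the script. The sketch below is one way to do that; `joinWithBreaks` is a hypothetical helper, and the 0.3 to 0.7 second bounds simply mirror the range this experiment settled on, not an API limit.

```javascript
// Hypothetical helper for tuning pauses: joins script segments with an SSML
// break of the given length. The bounds reflect the range used in this
// experiment (0.3s to 0.7s) and are an assumption, not an API constraint.
function joinWithBreaks(segments, seconds = 0.5) {
  const clamped = Math.min(0.7, Math.max(0.3, seconds));
  const tag = `<break time="${clamped}s" />`;
  return segments
    .map((segment) => segment.trim()) // normalize stray whitespace
    .filter(Boolean)                  // drop empty segments
    .join(` ${tag} `);
}
```

Changing the second argument and regenerating is then a single edit per iteration, which fits the ~13 second generation cycle.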
Reproducibility
Prerequisites
- ElevenLabs account (Creator tier for premium voices)
- Node.js 18+
- API key from ElevenLabs dashboard
Files
workway-platform/
├── scripts/generate-focus-voiceover.js    # Generation script
├── docs/FOCUS_WORKFLOW_SCREEN_CAPTURE_SCRIPT.md
└── docs/voiceover-audio/
    ├── 01-problem.mp3
    ├── 02-what-focus-does.mp3
    ├── 03-working.mp3
    ├── 04-completing.mp3
    ├── 05-notion.mp3
    ├── 06-setup.mp3
    ├── 07-close.mp3
    └── VOICEOVER_SCRIPT.md
Run Command
# Set API key
export ELEVENLABS_API_KEY=sk_...

# Generate audio
cd workway-platform
node scripts/generate-focus-voiceover.js
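The contents of `scripts/generate-focus-voiceover.js` are not shown on this page. As a rough idea of what such a script might do, the sketch below loops over scenes, requests audio, and writes one MP3 per scene. Everything here is an assumption for illustration: the `generateVoiceover` name, the scene object shape (`file`, `text`), and the injectable `fetchImpl` parameter, which exists only so the loop can be exercised without network access.

```javascript
// Sketch only: names and structure are assumptions, not the real script.
const CONFIG = {
  voiceId: "DTKMou8ccj1ZaWGBiotd", // Jamahal
  modelId: "eleven_turbo_v2_5",
  voiceSettings: { stability: 0.55, similarity_boost: 0.75, style: 0.0, speed: 0.9 },
};

// Requests audio for each scene and writes the response bytes to outDir.
// Returns the list of file paths written, in scene order.
async function generateVoiceover(scenes, { apiKey, outDir, fetchImpl = fetch }) {
  if (!apiKey) throw new Error("ELEVENLABS_API_KEY is not set");

  // Dynamic import works in both CommonJS and ES module files.
  const fs = await import("node:fs/promises");
  const path = await import("node:path");
  await fs.mkdir(outDir, { recursive: true });

  const written = [];
  for (const scene of scenes) {
    const url = `https://api.elevenlabs.io/v1/text-to-speech/${CONFIG.voiceId}`;
    const res = await fetchImpl(url, {
      method: "POST",
      headers: { "xi-api-key": apiKey, "Content-Type": "application/json" },
      body: JSON.stringify({
        text: scene.text,
        model_id: CONFIG.modelId,
        voice_settings: CONFIG.voiceSettings,
      }),
    });
    if (!res.ok) throw new Error(`Scene ${scene.file} failed with HTTP ${res.status}`);

    const target = path.join(outDir, scene.file);
    await fs.writeFile(target, Buffer.from(await res.arrayBuffer()));
    written.push(target);
  }
  return written;
}
```

A call such as `generateVoiceover(scenes, { apiKey: process.env.ELEVENLABS_API_KEY, outDir: "docs/voiceover-audio" })` would use the built-in `fetch` from Node.js 18+. Failing early on a missing key avoids spending a generation cycle on a request that cannot succeed.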
Outcome
Hypothesis Validated
ElevenLabs TTS with SSML breaks and optimized script writing produces natural-sounding voiceover suitable for product walkthrough videos.
Evidence: 6 scenes, ~3 minutes of audio, natural conversational delivery achieved in 5 iterations.
Next Steps
- Record screen capture synced to audio
- Test with actual users for clarity
- Create template for future workflow walkthroughs