Speaker Management

OpenTranscribe provides speaker diarization and management features that identify and organize speakers across all your media files.

Speaker Diarization

Automatic speaker detection with PyAnnote.audio identifies the different speakers in a recording and segments the audio by "who spoke when".

Enabling Speaker Diarization

Configure HuggingFace token (required):
- See HuggingFace Setup

Set the expected speaker range in the UI settings or in .env:

MIN_SPEAKERS=1
MAX_SPEAKERS=20  # Increase for large meetings/conferences

Process your files. Speakers are detected automatically.
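The range settings above can be sanity-checked before a job starts. The following is a minimal sketch, not OpenTranscribe's own code: the function name `speaker_range` is hypothetical, and the fallback values simply mirror the example above rather than any confirmed defaults.

```python
import os


def speaker_range(env=os.environ):
    """Read MIN_SPEAKERS / MAX_SPEAKERS and return a validated (min, max) pair."""
    # Fallbacks mirror the example values in this page; they are assumptions.
    min_speakers = int(env.get("MIN_SPEAKERS", "1"))
    max_speakers = int(env.get("MAX_SPEAKERS", "20"))
    if min_speakers < 1:
        raise ValueError("MIN_SPEAKERS must be at least 1")
    if max_speakers < min_speakers:
        raise ValueError("MAX_SPEAKERS must be >= MIN_SPEAKERS")
    return min_speakers, max_speakers


# Pass a plain dict to try it without touching the real environment.
print(speaker_range({"MIN_SPEAKERS": "1", "MAX_SPEAKERS": "20"}))
```

pyannote.audio diarization pipelines accept `min_speakers` and `max_speakers` keyword arguments when called on an audio file. Whether OpenTranscribe forwards the .env values in exactly this way is an assumption.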

Speaker Profiles

Creating Speaker Profiles

Detected speakers are initially labeled "Speaker 1", "Speaker 2", and so on. To create a persistent profile:

Click on speaker label
Enter speaker name
Save profile

Auto-profile creation: When you label a speaker, OpenTranscribe automatically creates a global profile that can be matched across videos.

Cross-Video Speaker Recognition

OpenTranscribe uses voice fingerprinting to identify the same speaker across different media files:

Voice embeddings analyzed for similarity
High-confidence matches auto-linked
Speaker labels propagate across videos
View all appearances of a speaker
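One common way to compare voice embeddings is cosine similarity with a confidence threshold. The sketch below illustrates that idea only; the function names, the three-dimensional toy embeddings, and the 0.75 threshold are all assumptions, and OpenTranscribe's actual matching logic may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(embedding, profiles, threshold=0.75):
    """Return (name, score) for the closest profile at or above threshold, else None."""
    best_name, best_score = None, -1.0
    for name, profile_embedding in profiles.items():
        score = cosine_similarity(embedding, profile_embedding)
        if score > best_score:
            best_name, best_score = name, score
    if best_name is not None and best_score >= threshold:
        return best_name, best_score
    return None  # no high-confidence match, so the speaker stays unlinked


# Toy profiles; real voice embeddings have far more dimensions.
profiles = {"Alice": [0.9, 0.1, 0.0], "Bob": [0.0, 0.2, 0.9]}
print(best_match([0.8, 0.2, 0.1], profiles))  # close to Alice's profile
print(best_match([0.5, 0.5, 0.5], profiles))  # ambiguous, below the threshold
```

A higher threshold reduces false links at the cost of more speakers left for manual review.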

LLM-Powered Identification

If an LLM is configured, OpenTranscribe suggests speaker names based on:

Conversation context
Topics discussed
Speaking patterns
Professional role indicators

Speaker Analytics

View comprehensive speaker statistics:

Talk Time: Total speaking duration
Word Count: Words spoken
Turn-Taking: Number of speaking turns
Interruptions: Detected interruptions
Speaking Pace: Words per minute
Question Frequency: Questions asked
Cross-Media Appearances: Videos featuring speaker
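Several of these statistics can be derived directly from diarized transcript segments. The sketch below shows one plausible calculation for talk time, word count, turns, and speaking pace. The segment layout (`speaker`, `start`, `end`, `text`) and the function name `speaker_stats` are assumptions for illustration, not OpenTranscribe's data model.

```python
from collections import defaultdict


def speaker_stats(segments):
    """Aggregate per-speaker talk time (s), word count, turns, and words per minute.

    Each segment is a dict with "speaker", "start", "end" (seconds), and "text".
    """
    stats = defaultdict(lambda: {"talk_time": 0.0, "word_count": 0, "turns": 0})
    previous_speaker = None
    for seg in sorted(segments, key=lambda s: s["start"]):
        entry = stats[seg["speaker"]]
        entry["talk_time"] += seg["end"] - seg["start"]
        entry["word_count"] += len(seg["text"].split())
        # A new turn starts whenever the speaker changes.
        if seg["speaker"] != previous_speaker:
            entry["turns"] += 1
        previous_speaker = seg["speaker"]
    for entry in stats.values():
        minutes = entry["talk_time"] / 60
        entry["wpm"] = entry["word_count"] / minutes if minutes > 0 else 0.0
    return dict(stats)


segments = [
    {"speaker": "Speaker 1", "start": 0.0, "end": 30.0, "text": "one two three four five six"},
    {"speaker": "Speaker 1", "start": 30.0, "end": 60.0, "text": "seven eight nine ten"},
    {"speaker": "Speaker 2", "start": 60.0, "end": 90.0, "text": "a b c"},
    {"speaker": "Speaker 1", "start": 90.0, "end": 120.0, "text": "x y"},
]
for name, entry in speaker_stats(segments).items():
    print(name, entry)
```

Consecutive segments from the same speaker count as a single turn here, which is one reasonable definition among several.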

Managing Speakers

Edit Speaker Labels

Click speaker name in transcript
Edit name
Changes apply to all segments

Merge Speakers

If diarization incorrectly splits one speaker into several labels:

Select segments
Assign to same speaker profile
Consolidate analytics
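Conceptually, a merge reassigns every segment carrying one of the split labels to a single target profile, after which analytics are recomputed over the merged segments. A minimal sketch under that assumption follows; the function name `merge_speakers` and the segment layout are hypothetical.

```python
def merge_speakers(segments, source_labels, target_label):
    """Return new segments with every label in source_labels reassigned to target_label."""
    sources = set(source_labels)
    return [
        {**seg, "speaker": target_label} if seg["speaker"] in sources else dict(seg)
        for seg in segments
    ]


segments = [
    {"speaker": "Speaker 1", "start": 0.0, "end": 5.0},
    {"speaker": "Speaker 3", "start": 5.0, "end": 9.0},  # same person, split by diarization
    {"speaker": "Speaker 2", "start": 9.0, "end": 12.0},
]
merged = merge_speakers(segments, ["Speaker 1", "Speaker 3"], "Alice")
print([seg["speaker"] for seg in merged])
```

The function returns copies rather than mutating its input, so the original segments remain available if the merge needs to be undone.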

Speaker Verification Status

Track speaker identification confidence:

✅ Verified: Manually confirmed
🤖 AI Suggested: LLM identification
🎯 Auto-Matched: Voice fingerprint match
❓ Unverified: Default detection
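The four statuses map naturally onto an enumeration. The sketch below is illustrative only: the class name `VerificationStatus`, the helper `needs_review`, and the rule that everything except a manual confirmation should be reviewed are assumptions, not OpenTranscribe's implementation.

```python
from enum import Enum


class VerificationStatus(Enum):
    VERIFIED = "Verified"          # manually confirmed
    AI_SUGGESTED = "AI Suggested"  # LLM identification
    AUTO_MATCHED = "Auto-Matched"  # voice fingerprint match
    UNVERIFIED = "Unverified"      # default detection


def needs_review(status):
    """Treat anything that was not manually confirmed as needing a human check."""
    return status is not VerificationStatus.VERIFIED


statuses = {
    "Alice": VerificationStatus.VERIFIED,
    "Speaker 2": VerificationStatus.AUTO_MATCHED,
    "Speaker 3": VerificationStatus.UNVERIFIED,
}
print([name for name, status in statuses.items() if needs_review(status)])
```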

Configuration

Adjust Speaker Detection Range

For meetings with many participants:

# .env configuration
MIN_SPEAKERS=2       # Minimum speakers to detect
MAX_SPEAKERS=50      # Maximum speakers (no hard limit)

Note: PyAnnote can handle 50+ speakers for large conferences.

Speaker Display Preferences

Customize in UI settings:

Color coding by speaker
Show/hide speaker analytics
Filter by speaker
Export with speaker labels

Troubleshooting

All Speakers Shown as "Speaker 1"

Causes:

HuggingFace token not configured
Single speaker in audio
Poor audio quality

Solutions:

Verify HuggingFace setup
Check audio has multiple speakers
Ensure clear audio quality

Too Many/Few Speakers Detected

Solutions:

# Adjust detection range
MIN_SPEAKERS=1
MAX_SPEAKERS=30  # Tune based on actual speaker count

Speaker Segments Fragmented

Cause: Diarization split one speaker into multiple labels

Solution: Manually merge the fragmented segments into the same profile (see Merge Speakers)

Best Practices

Label Important Speakers: Create profiles for frequent speakers
Verify AI Suggestions: Review LLM-suggested names
Use Consistent Names: Maintain naming convention
Review Cross-Video Matches: Confirm auto-matched speakers
Adjust Detection Range: Tune MIN/MAX_SPEAKERS for your use case
