HuggingFace Token Setup
Speaker diarization in OpenTranscribe requires access to gated PyAnnote models on HuggingFace. This page guides you through obtaining a free token and accepting the necessary model agreements.
Speaker diarization will NOT work without a valid HuggingFace token and acceptance of both gated model agreements. Transcription will still work, but speakers will not be identified.
Why is HuggingFace Required?
OpenTranscribe uses PyAnnote.audio for speaker diarization (identifying "who spoke when"). PyAnnote's pre-trained models are hosted on HuggingFace as "gated" repositories, meaning you must:
- Create a free HuggingFace account
- Accept the model license agreements
- Use an access token to download the models
This is a one-time setup process. Once configured, models are cached locally and don't require internet access.
Step 1: Create HuggingFace Account
If you don't already have a HuggingFace account:
- Visit https://huggingface.co/join
- Sign up with email or GitHub/Google account
- Verify your email address
Time required: 2 minutes
Step 2: Generate Access Token
- Go to https://huggingface.co/settings/tokens
- Click "New token"
- Configure the token:
  - Name: `OpenTranscribe` (or any descriptive name)
  - Role: Select "Read" (default)
  - Description: Optional
- Click "Generate token"
- IMPORTANT: Copy the token and save it securely (you won't see it again)
Example token format: `hf_` followed by random characters
Save your token in a password manager or secure note. You'll need it during OpenTranscribe setup. Tokens don't expire unless you delete them.
Time required: 2 minutes
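Before moving on, you can catch the most common copy/paste mistakes locally. This is a minimal sketch (the function name is illustrative): it only checks the token's shape, not whether HuggingFace actually accepts it.

```python
import re

def looks_like_hf_token(token: str) -> bool:
    """Rough shape check for a HuggingFace access token.

    Real validation requires calling the HuggingFace API; this only
    catches common paste errors: wrong prefix, stray whitespace,
    or surrounding quotes.
    """
    return re.fullmatch(r"hf_[A-Za-z0-9]+", token) is not None

print(looks_like_hf_token("hf_abc123XYZ"))   # well-formed shape
print(looks_like_hf_token(" hf_abc123 "))    # stray whitespace
print(looks_like_hf_token('"hf_abc123"'))    # pasted with quotes
```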
Step 3: Accept Gated Model Agreements
You must accept the license for BOTH PyAnnote models. This is required for speaker diarization to work.
Model 1: PyAnnote Segmentation 3.0
- Visit https://huggingface.co/pyannote/segmentation-3.0
- Scroll to the model card
- Click "Agree and access repository"
- ✅ You should see "You have been granted access to this model"
Model 2: PyAnnote Speaker Diarization 3.1
- Visit https://huggingface.co/pyannote/speaker-diarization-3.1
- Scroll to the model card
- Click "Agree and access repository"
- ✅ You should see "You have been granted access to this model"
Accepting only one model agreement will result in errors. You must accept BOTH model agreements for speaker diarization to function.
Time required: 2 minutes
Step 4: Configure OpenTranscribe
Quick Install Method
If using the one-liner installer:
```bash
curl -fsSL https://raw.githubusercontent.com/davidamacey/OpenTranscribe/master/setup-opentranscribe.sh | bash
```
The installer will prompt you:
```
Enter your HuggingFace token (or press Enter to skip): hf_your_token_here
```
Paste your token and press Enter. The installer will:
- Validate the token
- Check model access permissions
- Download and cache models (~500MB)
- Configure the `.env` file automatically
Manual Install Method
If installing from source:
- Copy the environment template:

  ```bash
  cp .env.example .env
  ```

- Edit `.env` and add your token:

  ```bash
  # Required for speaker diarization
  HUGGINGFACE_TOKEN=hf_your_token_here
  ```

- Start OpenTranscribe:

  ```bash
  ./opentr.sh start dev
  ```
Models will download automatically on first use (~10-30 minutes).
Verification
Method 1: Check Model Cache
After first transcription with speaker diarization:
```bash
# Check if PyAnnote models were downloaded
ls -lh models/torch/pyannote/

# You should see:
# segmentation-3.0/
# speaker-diarization-3.1/
```
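The same check can be scripted. This sketch assumes the default cache layout shown above (`models/torch/pyannote/`); adjust the base path if `MODEL_CACHE_DIR` is customized.

```python
from pathlib import Path

# Directories expected after a successful first diarization run.
EXPECTED = ["segmentation-3.0", "speaker-diarization-3.1"]

def missing_pyannote_models(cache_dir: str = "models/torch/pyannote") -> list[str]:
    """Return the names of expected model directories that are absent."""
    base = Path(cache_dir)
    return [name for name in EXPECTED if not (base / name).is_dir()]

missing = missing_pyannote_models()
if missing:
    print("Missing model caches:", ", ".join(missing))
else:
    print("Both PyAnnote model caches are present.")
```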
Method 2: Test Transcription
- Upload a test file with multiple speakers
- Enable speaker diarization
- Process the file
- Check for speaker labels (Speaker 1, Speaker 2, etc.)
If diarization works, you'll see:
- ✅ Speaker segments identified
- ✅ Different speakers color-coded
- ✅ Speaker analytics in dashboard
Method 3: Check Container Logs
```bash
./opentr.sh logs celery-worker | grep -i pyannote
```
Success indicators:
- ✅ "Loading PyAnnote segmentation model"
- ✅ "Loading PyAnnote diarization pipeline"
- ✅ "Speaker diarization completed successfully"

Error indicators:
- ❌ "Cannot access gated repository"
- ❌ "Invalid HuggingFace token"
- ❌ "Model agreement not accepted"
Troubleshooting
Error: "Cannot access gated repository"
Cause: Model agreement not accepted or token invalid
Solution:
- Verify both model agreements accepted (see Step 3)
- Check token is correct in `.env` file
- Regenerate token if needed
- Restart OpenTranscribe: `./opentr.sh restart`
Error: "Invalid HuggingFace token"
Cause: Token format incorrect or expired
Solution:
- Verify token starts with `hf_`
- Check for extra spaces or quotes in `.env`
- Regenerate token from HuggingFace settings
- Update `.env` and restart
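The two errors above correspond to the HTTP status codes HuggingFace returns when a model download is refused: 401 for a bad token, 403 for a valid token without gated access. A rough diagnostic mapping (the message strings are illustrative, not actual OpenTranscribe output):

```python
def diagnose_hf_download_error(status_code: int) -> str:
    """Map an HTTP status from a HuggingFace model download to a likely cause."""
    if status_code == 401:
        return "Invalid or missing HuggingFace token - regenerate it and update .env"
    if status_code == 403:
        return "Token is valid but the gated model agreement was not accepted - see Step 3"
    if status_code == 200:
        return "Access OK"
    return f"Unexpected status {status_code} - check network connectivity"

print(diagnose_hf_download_error(403))
```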
Models Download on Every Restart
Cause: Model cache not persisting
Solution:
- Check `MODEL_CACHE_DIR` in `.env` (default: `./models`)
- Verify directory permissions: `ls -la models/` (should be owned by the user running Docker)
- Fix permissions: `./scripts/fix-model-permissions.sh`
Slow Model Download
Cause: Large model files (~500MB total)
Solution:
- Be patient on first setup (10-30 minutes)
- Models are cached permanently after first download
- Use wired connection for faster downloads
- Check internet speed: https://fast.com
Speaker Diarization Not Working
Checklist:
- HuggingFace token configured in `.env`
- Both model agreements accepted
- Models downloaded successfully (check logs)
- Speaker diarization enabled in UI settings
- Audio file has multiple speakers
- MIN_SPEAKERS and MAX_SPEAKERS configured correctly
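The last checklist item can be verified programmatically. This sketch reads the `MIN_SPEAKERS`/`MAX_SPEAKERS` variables shown in the Quick Reference (defaults mirror the documented values) and flags an inconsistent range:

```python
import os

def speaker_range(env=os.environ) -> tuple[int, int]:
    """Read and sanity-check the speaker detection range.

    Defaults to the documented values (1 and 20); raises if the
    range is inverted or non-positive.
    """
    lo = int(env.get("MIN_SPEAKERS", "1"))
    hi = int(env.get("MAX_SPEAKERS", "20"))
    if lo < 1 or hi < lo:
        raise ValueError(f"Invalid speaker range: MIN={lo}, MAX={hi}")
    return lo, hi

print(speaker_range({}))  # prints (1, 20)
```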
Security Considerations
Token Security
Your HuggingFace token is sensitive information:
- ✅ DO: Store in `.env` file (git-ignored)
- ✅ DO: Use read-only token permissions
- ✅ DO: Regenerate if compromised
- ❌ DON'T: Commit to version control
- ❌ DON'T: Share publicly
- ❌ DON'T: Use write permissions (unnecessary)
Revoking Access
If your token is compromised:
- Go to https://huggingface.co/settings/tokens
- Click "Revoke" next to the compromised token
- Generate a new token
- Update `.env` with new token
- Restart OpenTranscribe
Model Caching
Storage Location
Models are cached at `${MODEL_CACHE_DIR}/torch/pyannote/` (default: `./models/torch/pyannote/`).
Disk Space
PyAnnote models require:
- Segmentation model: ~250MB
- Diarization pipeline: ~250MB
- Total: ~500MB
Plus:
- WhisperX models: ~1.5GB
- Wav2Vec2 alignment: ~360MB
- Other models: ~200MB
- Grand total: ~2.5GB for all AI models
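Before the first download, you can confirm the cache volume has headroom for the ~2.5GB total. A minimal sketch using only the standard library (the 2.5GB figure is the documented total above):

```python
import shutil

REQUIRED_GB = 2.5  # documented total for all AI models

def enough_space(path: str = ".", required_gb: float = REQUIRED_GB) -> bool:
    """Check that the filesystem holding `path` has room for the model cache."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    print(f"Free space at {path!r}: {free_gb:.1f} GB (need ~{required_gb} GB)")
    return free_gb >= required_gb

enough_space()
```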
Offline Use
Once models are downloaded:
- ✅ No internet required for transcription
- ✅ Models cached permanently
- ✅ Works in airgapped environments
- ❌ Initial download requires internet
Alternative: Offline Installation
For airgapped/offline environments:
- Download models on an internet-connected machine (note: `speaker-diarization-3.1` is loaded as a pipeline, and both downloads need the token passed explicitly):

  ```bash
  # Set token and download models
  export HUGGINGFACE_TOKEN=hf_your_token_here
  python3 -c "import os; from pyannote.audio import Model, Pipeline; \
  Model.from_pretrained('pyannote/segmentation-3.0', use_auth_token=os.environ['HUGGINGFACE_TOKEN']); \
  Pipeline.from_pretrained('pyannote/speaker-diarization-3.1', use_auth_token=os.environ['HUGGINGFACE_TOKEN'])"
  ```

- Copy the model cache to the offline machine:

  ```bash
  # On the internet-connected machine
  tar -czf pyannote-models.tar.gz ~/.cache/torch/pyannote/

  # On the offline machine
  tar -xzf pyannote-models.tar.gz -C /path/to/opentranscribe/models/torch/
  ```

- Configure `.env` on the offline machine:

  ```bash
  HUGGINGFACE_TOKEN=hf_your_token_here  # Still required
  MODEL_CACHE_DIR=./models
  ```
See Offline Installation for complete airgapped setup guide.
Quick Reference
URLs
- Create Account: https://huggingface.co/join
- Token Settings: https://huggingface.co/settings/tokens
- Segmentation Model: https://huggingface.co/pyannote/segmentation-3.0
- Diarization Model: https://huggingface.co/pyannote/speaker-diarization-3.1
Environment Variables
```bash
# Required for speaker diarization
HUGGINGFACE_TOKEN=hf_your_token_here

# Model cache location
MODEL_CACHE_DIR=./models

# Speaker detection range
MIN_SPEAKERS=1
MAX_SPEAKERS=20
```
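A quick way to catch a missing or malformed entry is to parse `.env` directly. A minimal sketch (no external dotenv library; only handles simple `KEY=value` lines):

```python
from pathlib import Path

def read_env(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=value lines, skipping comments and blanks."""
    env = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

def check_required(env: dict[str, str]) -> list[str]:
    """Return problems with the variables this page requires."""
    problems = []
    if not env.get("HUGGINGFACE_TOKEN", "").startswith("hf_"):
        problems.append("HUGGINGFACE_TOKEN missing or does not start with hf_")
    return problems
```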
Verification Commands
```bash
# Check token configured
grep HUGGINGFACE_TOKEN .env

# Check models downloaded
ls -lh models/torch/pyannote/

# Check container logs
./opentr.sh logs celery-worker | grep -i pyannote
```
Next Steps
- Docker Compose Installation - Complete installation guide
- GPU Setup - Configure GPU acceleration
- First Transcription - Test speaker diarization
- Troubleshooting - Fix common issues