HuggingFace Token Setup

Speaker diarization in OpenTranscribe requires access to gated PyAnnote models on HuggingFace. This page guides you through obtaining a free token and accepting the necessary model agreements.

Critical Requirement

Speaker diarization will NOT work without a valid HuggingFace token and acceptance of both gated model agreements. Transcription will still work, but speakers will not be identified.

Why is HuggingFace Required?

OpenTranscribe uses PyAnnote.audio for speaker diarization (identifying "who spoke when"). PyAnnote's pre-trained models are hosted on HuggingFace as "gated" repositories, meaning you must:

  1. Create a free HuggingFace account
  2. Accept the model license agreements
  3. Use an access token to download the models

This is a one-time setup process. Once configured, models are cached locally and don't require internet access.

Step 1: Create HuggingFace Account

If you don't already have a HuggingFace account:

  1. Visit https://huggingface.co/join
  2. Sign up with email or GitHub/Google account
  3. Verify your email address

Time required: 2 minutes

Step 2: Generate Access Token

  1. Go to https://huggingface.co/settings/tokens
  2. Click "New token"
  3. Configure the token:
    • Name: OpenTranscribe (or any descriptive name)
    • Role: Select "Read" (default)
    • Description: Optional
  4. Click "Generate token"
  5. IMPORTANT: Copy the token and save it securely (you won't see it again)

Example token format: hf_ followed by random characters

Token Storage

Save your token in a password manager or secure note. You'll need it during OpenTranscribe setup. Tokens don't expire unless you delete them.
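
To sanity-check the token before continuing, you can call HuggingFace's "whoami" API with it. This is a minimal sketch using curl; replace the placeholder value with your actual token:

# A JSON response containing your username means the token is valid;
# a 401 response means it is not
curl -s -H "Authorization: Bearer hf_your_token_here" https://huggingface.co/api/whoami-v2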

Time required: 2 minutes

Step 3: Accept Gated Model Agreements

You must accept the license for BOTH PyAnnote models. This is required for speaker diarization to work.

Model 1: PyAnnote Segmentation 3.0

  1. Visit https://huggingface.co/pyannote/segmentation-3.0
  2. Scroll to the model card
  3. Click "Agree and access repository"
  4. ✅ You should see "You have been granted access to this model"

Model 2: PyAnnote Speaker Diarization 3.1

  1. Visit https://huggingface.co/pyannote/speaker-diarization-3.1
  2. Scroll to the model card
  3. Click "Agree and access repository"
  4. ✅ You should see "You have been granted access to this model"

Both Models Required

Accepting only one model agreement will result in errors. You must accept BOTH model agreements for speaker diarization to function.

Time required: 2 minutes
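
To verify that both agreements have actually been granted, you can probe each gated repository with your token. This is a hedged sketch using HuggingFace's public model API; an HTTP 200 indicates access has been granted, while 401/403 indicates a missing agreement or an invalid token:

# Probe both gated repositories; 200 = access granted, 401/403 = not yet
for repo in pyannote/segmentation-3.0 pyannote/speaker-diarization-3.1; do
  code=$(curl -s -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer hf_your_token_here" \
    "https://huggingface.co/api/models/$repo")
  echo "$repo -> HTTP $code"
done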

Step 4: Configure OpenTranscribe

Quick Install Method

If using the one-liner installer:

curl -fsSL https://raw.githubusercontent.com/davidamacey/OpenTranscribe/master/setup-opentranscribe.sh | bash

The installer will prompt you:

Enter your HuggingFace token (or press Enter to skip): hf_your_token_here

Paste your token and press Enter. The installer will:

  • Validate the token
  • Check model access permissions
  • Download and cache models (~500MB)
  • Configure the .env file automatically

Manual Install Method

If installing from source:

  1. Copy the environment template:

    cp .env.example .env
  2. Edit .env and add your token:

    # Required for speaker diarization
    HUGGINGFACE_TOKEN=hf_your_token_here
  3. Start OpenTranscribe:

    ./opentr.sh start dev

Models will download automatically on first use (~10-30 minutes).
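
A quick way to confirm the token actually reaches the worker is to print it from inside the container. This is a sketch that assumes a standard docker compose setup with a service named celery-worker (the service name appears in the log commands below, but this exec invocation is an assumption):

# Check that the token is visible inside the worker container
docker compose exec celery-worker printenv HUGGINGFACE_TOKEN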

Verification

Method 1: Check Model Cache

After first transcription with speaker diarization:

# Check if PyAnnote models were downloaded
ls -lh models/torch/pyannote/

# You should see:
# segmentation-3.0/
# speaker-diarization-3.1/

Method 2: Test Transcription

  1. Upload a test file with multiple speakers
  2. Enable speaker diarization
  3. Process the file
  4. Check for speaker labels (Speaker 1, Speaker 2, etc.)

If diarization works, you'll see:

  • ✅ Speaker segments identified
  • ✅ Different speakers color-coded
  • ✅ Speaker analytics in dashboard

Method 3: Check Container Logs

./opentr.sh logs celery-worker | grep -i pyannote

Success indicators:

✅ "Loading PyAnnote segmentation model"
✅ "Loading PyAnnote diarization pipeline"
✅ "Speaker diarization completed successfully"

Error indicators:

❌ "Cannot access gated repository"
❌ "Invalid HuggingFace token"
❌ "Model agreement not accepted"

Troubleshooting

Error: "Cannot access gated repository"

Cause: Model agreement not accepted or token invalid

Solution:

  1. Verify both model agreements accepted (see Step 3)
  2. Check token is correct in .env file
  3. Regenerate token if needed
  4. Restart OpenTranscribe: ./opentr.sh restart

Error: "Invalid HuggingFace token"

Cause: Token format incorrect, or the token was revoked

Solution:

  1. Verify token starts with hf_
  2. Check for extra spaces or quotes in .env (see the check below)
  3. Regenerate token from HuggingFace settings
  4. Update .env and restart
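
To spot stray whitespace or quotes around the token (step 2 above), one option is to print the line with invisible characters made explicit. This is a sketch using GNU coreutils:

# Show the token line with hidden characters made visible
# ("$" marks the end of line, so trailing spaces and quotes become apparent)
grep -n 'HUGGINGFACE_TOKEN' .env | cat -A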

Models Download on Every Restart

Cause: Model cache not persisting

Solution:

  1. Check MODEL_CACHE_DIR in .env (default: ./models)
  2. Verify directory permissions:
    ls -la models/
    # Should be owned by user running Docker
  3. Fix permissions:
    ./scripts/fix-model-permissions.sh
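
If that helper script is not present in your checkout, a manual equivalent is sketched below (an assumption about a typical bind-mount ownership fix; adjust the path if MODEL_CACHE_DIR points elsewhere):

# Give the invoking user ownership of the model cache so containers that
# bind-mount ./models can reuse it across restarts
sudo chown -R "$(id -u):$(id -g)" models/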

Slow Model Download

Cause: Large model files (~500MB total)

Solution:

  • Be patient on first setup (10-30 minutes)
  • Models are cached permanently after first download
  • Use wired connection for faster downloads
  • Check internet speed: https://fast.com

Speaker Diarization Not Working

Checklist:

  • HuggingFace token configured in .env
  • Both model agreements accepted
  • Models downloaded successfully (check logs)
  • Speaker diarization enabled in UI settings
  • Audio file has multiple speakers
  • MIN_SPEAKERS and MAX_SPEAKERS configured correctly

Security Considerations

Token Security

Your HuggingFace token is sensitive information:

  • DO: Store in .env file (git-ignored)
  • DO: Use read-only token permissions
  • DO: Regenerate if compromised
  • DON'T: Commit to version control
  • DON'T: Share publicly
  • DON'T: Use write permissions (unnecessary)
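
To confirm that .env is actually covered by the repository's .gitignore (the first "DO" above), you can ask git directly; this is a minimal sketch:

# Prints the .gitignore rule that matches .env; no output means it is NOT ignored
git check-ignore -v .env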

Revoking Access

If your token is compromised:

  1. Go to https://huggingface.co/settings/tokens
  2. Click "Revoke" next to the compromised token
  3. Generate a new token
  4. Update .env with new token
  5. Restart OpenTranscribe

Model Caching

Storage Location

Models are cached at:

${MODEL_CACHE_DIR}/torch/pyannote/

Default: ./models/torch/pyannote/

Disk Space

PyAnnote models require:

  • Segmentation model: ~250MB
  • Diarization pipeline: ~250MB
  • Total: ~500MB

Plus:

  • WhisperX models: ~1.5GB
  • Wav2Vec2 alignment: ~360MB
  • Other models: ~200MB
  • Grand total: ~2.5GB for all AI models
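
To check how much of that budget has actually been used, and whether the host has room for it, something like the following works (assuming the default ./models cache location):

# Size of the local model cache vs. free space on the host volume
du -sh models/
df -h .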

Offline Use

Once models are downloaded:

  • ✅ No internet required for transcription
  • ✅ Models cached permanently
  • ✅ Works in airgapped environments
  • ❌ Initial download requires internet

Alternative: Offline Installation

For airgapped/offline environments:

  1. Download models on internet-connected machine:

    # Set token and download both models (speaker-diarization-3.1 is a Pipeline, not a Model)
    export HUGGINGFACE_TOKEN=hf_your_token_here
    python3 -c "import os; from pyannote.audio import Model, Pipeline; token = os.environ['HUGGINGFACE_TOKEN']; Model.from_pretrained('pyannote/segmentation-3.0', use_auth_token=token); Pipeline.from_pretrained('pyannote/speaker-diarization-3.1', use_auth_token=token)"
  2. Copy model cache to offline machine:

    # On internet machine
    tar -czf pyannote-models.tar.gz ~/.cache/torch/pyannote/

    # On offline machine
    tar -xzf pyannote-models.tar.gz -C /path/to/opentranscribe/models/torch/
  3. Configure .env on offline machine:

    HUGGINGFACE_TOKEN=hf_your_token_here  # Still required
    MODEL_CACHE_DIR=./models

See Offline Installation for the complete airgapped setup guide.

Quick Reference

URLs

  • Create a HuggingFace account: https://huggingface.co/join
  • Generate an access token: https://huggingface.co/settings/tokens
  • PyAnnote Segmentation 3.0: https://huggingface.co/pyannote/segmentation-3.0
  • PyAnnote Speaker Diarization 3.1: https://huggingface.co/pyannote/speaker-diarization-3.1

Environment Variables

# Required for speaker diarization
HUGGINGFACE_TOKEN=hf_your_token_here

# Model cache location
MODEL_CACHE_DIR=./models

# Speaker detection range
MIN_SPEAKERS=1
MAX_SPEAKERS=20

Verification Commands

# Check token configured
grep HUGGINGFACE_TOKEN .env

# Check models downloaded
ls -lh models/torch/pyannote/

# Check container logs
./opentr.sh logs celery-worker | grep -i pyannote

Next Steps