Welcome to OpenTranscribe
OpenTranscribe is a powerful, self-hosted AI-powered transcription and media analysis platform that turns your audio and video files into searchable, analyzable text with advanced features like speaker identification, AI summarization, and cross-media intelligence.
What is OpenTranscribe?
OpenTranscribe combines state-of-the-art AI models with a modern web interface to provide:
- High-accuracy transcription using WhisperX with word-level timestamps
- Automatic speaker identification with voice fingerprinting across videos
- AI-powered summarization with customizable prompts and LLM integration
- Full-text and semantic search powered by OpenSearch
- Privacy-first processing - everything runs locally on your infrastructure
Key Features
Advanced Transcription
- WhisperX 3.8.1 with faster-whisper BatchedInferencePipeline backend
- Native word-level timestamps via cross-attention DTW (all 100+ languages, no separate alignment model)
- Multi-language support with optional translation to English (requires
large-v3model) - 40x+ realtime speed on GPU (
large-v3-turbodefault, 6x faster thanlarge-v2) - Support for audio (MP3, WAV, FLAC, M4A) and video (MP4, MOV, AVI, MKV)
- Cloud ASR providers: Deepgram, AssemblyAI, OpenAI, Google, AWS, Azure, Speechmatics, Gladia
Smart Speaker Management
- Automatic speaker diarization using PyAnnote.audio
- Cross-video speaker recognition with voice fingerprinting
- LLM-enhanced speaker identification
- Global speaker profiles that persist across transcriptions
- Confidence scoring and manual verification workflow
AI-Powered Features
- LLM summarization with BLUF (Bottom Line Up Front) format
- Support for multiple LLM providers (OpenAI, Claude, vLLM, Ollama, OpenRouter)
- Custom AI prompts for different content types
- Intelligent section-by-section processing for unlimited transcript lengths
- Speaker analytics and interaction patterns
Search & Discovery
- Hybrid search combining BM25 keyword and neural semantic search via OpenSearch ML Commons
- OpenSearch 3.4.0 with ML Commons native neural search and RRF merging
- Advanced filtering by speaker, date, tags, duration
- Collections for organizing related media with group sharing
- Interactive waveform visualization with click-to-seek
Analytics & Insights
- Speaker analytics (talk time, interruptions, pace)
- Meeting efficiency metrics
- Action item extraction
- Cross-video speaker tracking
Why OpenTranscribe?
Open Source & Self-Hosted
- Full control over your data - nothing leaves your infrastructure
- AGPL-3.0 License - open source with network copyleft protection
- No subscription fees - one-time setup, unlimited use
- Privacy-first - ideal for sensitive content (legal, medical, business)
Production-Ready
- Docker-based deployment - runs anywhere
- GPU acceleration - NVIDIA GPUs supported
- Multi-worker architecture - process multiple files in parallel
- Offline capable - works in airgapped environments
Enterprise Security
- Multiple authentication methods - Local, LDAP/AD, OIDC/Keycloak, PKI/X.509
- Multi-factor authentication - TOTP-based MFA with backup codes
- Password policies - Configurable complexity, history, and expiration
- Audit logging - FedRAMP-compliant structured logging
- Account lockout - Progressive lockout after failed attempts
Modern Stack
- Svelte + TypeScript frontend - responsive, PWA-enabled
- FastAPI backend - high-performance Python
- PostgreSQL + OpenSearch - reliable, scalable storage
- Celery workers - distributed background processing
User Interface
- Light and dark mode - Toggle between themes via the sun/moon icon in the navbar, or let the app follow your system preference automatically
- 8 UI languages - Switch the interface language from Settings: English, Spanish, French, German, Portuguese, Chinese, Japanese, and Russian. The app also detects your browser language on first visit
- Grid and list views - Toggle between card-based grid view (with thumbnails) and compact list view in the file gallery using the view toggle button
- Virtual scrolling - Smooth performance when browsing large libraries with thousands of files, loading only visible items
- Progressive Web App (PWA) - Install OpenTranscribe as a standalone app on desktop or mobile from your browser's "Install" or "Add to Home Screen" option for a native app experience
Use Cases
OpenTranscribe is perfect for:
- Meeting transcriptions - Record and analyze team meetings with speaker identification
- Podcast production - Generate transcripts and show notes automatically
- Academic research - Transcribe interviews and lectures for analysis
- Legal & compliance - Accurate transcripts with speaker identification for depositions
- Customer service - Analyze support calls for quality and training
- Content creation - Generate subtitles and content from videos
Quick Look
# Install with one command
curl -fsSL https://raw.githubusercontent.com/davidamacey/OpenTranscribe/master/setup-opentranscribe.sh | bash
# Start the application
cd opentranscribe
./opentranscribe.sh start
# Access at http://localhost:5173
Architecture Overview
┌─────────────┐ ┌──────────────┐ ┌─────────────┐
│ Frontend │────▶│ Backend │────▶│ Workers │
│ (Svelte) │ │ (FastAPI) │ │ (Celery) │
└─────────────┘ └──────────────┘ └─────────────┘
│ │
▼ ▼
┌──────────────┐ ┌─────────────┐
│ PostgreSQL │ │ WhisperX │
│ MinIO │ │ PyAnnote │
│ OpenSearch │ │ LLM │
└──────────────┘ └─────────────┘
System Requirements
Minimum:
- 8GB RAM
- 4 CPU cores
- 50GB disk space
- Docker & Docker Compose
Recommended:
- 16GB+ RAM
- 8+ CPU cores
- 100GB+ SSD
- NVIDIA GPU with 8GB+ VRAM (RTX 3070 or better)
Next Steps
Ready to get started? Follow our Quick Start Guide to install OpenTranscribe in minutes.
Or explore:
- Installation Guide - Detailed installation instructions
- Hardware Requirements - Hardware recommendations
- Configuration - Customize your setup
- Authentication - Enterprise authentication options