OpenSearch Neural Search Setup

Neural search enables AI-powered semantic search capabilities in OpenTranscribe, allowing users to find transcripts based on meaning and context rather than just keywords.

Overview

What is Neural Search?

Neural search uses machine learning embeddings to understand the semantic meaning of text. Instead of matching keywords, it finds conceptually similar content across transcripts.

Examples:

Search for "financial reporting procedures" finds mentions of "quarterly earnings calls" and "budget reviews"
Search for "meeting organizer" finds references to "facilitator", "moderator", "lead coordinator"
Search for "action items" finds TODO discussions even if the word "action" isn't used

Why It Matters:

Hybrid search (full-text + neural) provides 9.5x faster vector search
Improves recall by finding related content traditional search misses
Works across synonyms, paraphrases, and different phrasings
Complements full-text search for comprehensive coverage

Hybrid Search

OpenTranscribe combines BM25 full-text search with neural search using Reciprocal Rank Fusion (RRF). This gives you the speed of keyword search plus the intelligence of semantic search.

Why Hybrid Search Beats Either Approach Alone

Keyword search (BM25) excels at exact matches -- names, dates, technical terms -- but fails when users phrase queries differently from the transcript. Neural search catches synonyms and paraphrases but can return false positives for conceptually adjacent but irrelevant content. The hybrid approach ensures both signals contribute:

A document ranked #1 in keyword search and #5 in semantic search scores higher than one ranked #3 in both, because RRF rewards strong signal in either dimension.
Documents appearing in both result lists get a natural boost, surfacing the most confidently relevant matches.
Semantic-only results face additional filtering: the weakest 35% are suppressed, and remaining results must exceed a minimum score threshold. This prevents low-confidence semantic matches from diluting precise keyword results.

In practice, hybrid search eliminates the "I know it's in there but I can't find it" problem common with keyword-only search on conversational transcripts, where speakers rarely use the exact terms a user would search for.
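The semantic-only filtering rule can be sketched as follows. This illustrates the rule as described above and is not OpenTranscribe's actual code. The `Hit` type, its field names, and the `min_score` default of 0.5 are hypothetical, because this guide does not state the real threshold.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    score: float         # semantic similarity score (hypothetical scale)
    keyword_match: bool  # True if the document also appeared in the BM25 results


def filter_semantic_only(hits: list, drop_fraction: float = 0.35,
                         min_score: float = 0.5) -> list:
    """Keep every keyword-matched hit. For semantic-only hits, suppress the weakest
    `drop_fraction`, then require the rest to exceed `min_score` (hypothetical value)."""
    semantic_only = sorted((h for h in hits if not h.keyword_match),
                           key=lambda h: h.score, reverse=True)
    drop_count = int(len(semantic_only) * drop_fraction)  # weakest 35%, rounded down
    strongest = semantic_only[: len(semantic_only) - drop_count]
    survivors = {h.doc_id for h in strongest if h.score > min_score}
    return [h for h in hits if h.keyword_match or h.doc_id in survivors]
```

Keyword matches are never touched by this step. Only results that the neural side found on its own have to pass both gates.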

Prerequisites

Before enabling neural search, ensure:

OpenSearch 3.4.0+ - Required for ML Commons plugin
ML Commons Plugin Enabled - For model management and embeddings
Sufficient VRAM: Depends on model selection
- Small models (all-MiniLM-L6-v2): 2-4GB
- Medium models (all-mpnet-base-v2): 4-8GB
- Large models (bge-large-en-v1.5): 8GB+
Available Disk Space: ~500MB-2GB for model storage

Model Download

Models are downloaded automatically on first use. Ensure your OpenSearch container has internet access during initial setup.

Available Models

OpenTranscribe supports three embedding models in different performance tiers:

Tier 1: Small & Fast (Recommended for Most Users)

Model: sentence-transformers/all-MiniLM-L6-v2

Speed: Fastest (2-5ms per embedding)
VRAM: 2-4GB
Dimensions: 384
Quality: Good for most use cases
File Size: ~80MB

Best For:

Production deployments
Budget-conscious setups
Real-time search on large indexes
Most transcription search tasks

Tier 2: Medium & Balanced

Model: sentence-transformers/all-mpnet-base-v2

Speed: Medium (10-20ms per embedding)
VRAM: 4-8GB
Dimensions: 768
Quality: Better semantic understanding
File Size: ~420MB

Best For:

Systems with moderate VRAM
Higher semantic accuracy requirements
Mixed workloads
Specialized terminology

Tier 3: Large & High-Quality

Model: BAAI/bge-large-en-v1.5

Speed: Slower (20-50ms per embedding)
VRAM: 8GB+
Dimensions: 1024
Quality: Highest accuracy, domain-optimized
File Size: ~1.3GB

Best For:

High-accuracy requirements
Advanced systems with ample VRAM
Production systems with performance optimization
Enterprise deployments

Model Comparison

All three models provide semantic search capabilities. Tier 1 (all-MiniLM) offers excellent value for most users. Choose Tier 2 or 3 only if you need higher accuracy and have the VRAM budget.

Model Selection Rationale

The default all-MiniLM-L6-v2 was chosen for several specific reasons:

Latency: At 2-5ms per embedding, it adds negligible overhead to the indexing pipeline and keeps search response times under 50ms even on large indexes. The 768-dim models (mpnet, distilroberta) take 3-4x longer per embedding.
Memory: The model file is only ~80MB, and its 384-dimension vectors keep the HNSW vector index small. A 10,000-transcript deployment with ~50,000 chunks stores ~73MB of float32 vector data at 384 dimensions vs ~146MB at 768 dimensions, before HNSW graph overhead.
Transcript search characteristics: Meeting transcripts are conversational English with limited vocabulary diversity. The accuracy gap between MiniLM-L6 and larger models is smaller on this domain than on academic benchmarks, because the semantic space is narrower.
Multilingual alternative: For non-English deployments, paraphrase-multilingual-MiniLM-L12-v2 provides 50+ language coverage at the same 384-dim footprint (420MB model, same embedding speed tier).
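The vector-memory figures in the Memory point can be reproduced with simple arithmetic: chunks × dimensions × 4 bytes per float32 value. The second function applies the HNSW rule of thumb from the OpenSearch k-NN documentation, 1.1 × (4 × dimensions + 8 × m) bytes per vector, with m = 16 taken from the HNSW parameters table. Treat both functions as rough estimates, not measurements of a real cluster.

```python
def raw_vector_mib(num_vectors: int, dimensions: int, bytes_per_value: int = 4) -> float:
    """Size of the raw float32 vectors alone, in MiB."""
    return num_vectors * dimensions * bytes_per_value / 2**20


def hnsw_estimate_mib(num_vectors: int, dimensions: int, m: int = 16) -> float:
    """Rule-of-thumb HNSW memory (vectors plus graph links), in MiB."""
    return 1.1 * (4 * dimensions + 8 * m) * num_vectors / 2**20


chunks = 50_000
for dims in (384, 768):
    print(f"{dims}-dim: raw {raw_vector_mib(chunks, dims):.1f} MiB, "
          f"HNSW estimate {hnsw_estimate_mib(chunks, dims):.1f} MiB")
```

Doubling the dimensions roughly doubles memory. The graph-link term (8 × m) stays fixed per vector regardless of model choice.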

Changing models requires a full reindex of all transcripts, as the vector dimensions and semantic space differ between models.

Configuration Steps

Step 1: Access Admin Settings

Log in as admin
Navigate to Settings → Search Configuration
Look for "Neural Search" section

Step 2: Enable Neural Search

Toggle Enable Neural Search to ON
System will verify OpenSearch connectivity and ML Commons status

ML Commons Check

If ML Commons is disabled, you'll see a warning. Contact your infrastructure team to enable the ML Commons plugin on your OpenSearch cluster.

Step 3: Select Embedding Model

Click Select Model dropdown
Choose from available models:
- all-MiniLM-L6-v2 (Recommended - fastest)
- all-mpnet-base-v2 (Balanced)
- bge-large-en-v1.5 (Highest quality)
System shows VRAM requirements for selected model
Click Validate to verify your system can support the model

Step 4: Configure ML Service

Under ML Service Configuration:
- Embedding Service Endpoint: Auto-detected (usually internal)
- Batch Size: Default 32 (see Performance Tuning)
- Request Timeout: Default 300 seconds
Click Test Connection to verify setup

Step 5: Register Model Endpoint

Click Register Model
System downloads and registers the embedding model
Status indicator shows progress: "Downloading..." → "Registering..." → "Ready"

Initial Setup

First-time model registration takes 5-15 minutes depending on model size and internet speed. Monitor the container logs or check the status in the admin panel.

Step 6: Verify Model is Working

Once status shows Ready, click Test Embedding
Enter sample text: "This is a test transcript excerpt"
The system generates an embedding and confirms the model is working
Status badge shows Active

Ready to Index

Once the model shows "Active" status, neural search is ready for use!

Performance Tuning

Batch Size Recommendations

Batch size determines how many texts are embedded simultaneously:

System VRAM	Small Model	Medium Model	Large Model
4GB	8-16	Not recommended	Not recommended
6GB	16-32	8-16	Not recommended
8GB	32-64	16-32	8-16
12GB+	64-128	32-64	16-32
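The batch size table can be encoded as a lookup, for example in a provisioning script. This sketch assumes a system belongs to the highest VRAM row it meets, so 10GB uses the 8GB row. `None` stands for "Not recommended". The tier keys `small`, `medium`, and `large` are hypothetical labels for the three model tiers.

```python
from typing import Optional, Tuple

# (minimum VRAM in GB, {tier: (low, high) batch size or None}), copied from the table.
BATCH_SIZE_TABLE = [
    (12, {"small": (64, 128), "medium": (32, 64), "large": (16, 32)}),
    (8, {"small": (32, 64), "medium": (16, 32), "large": (8, 16)}),
    (6, {"small": (16, 32), "medium": (8, 16), "large": None}),
    (4, {"small": (8, 16), "medium": None, "large": None}),
]


def recommended_batch_range(vram_gb: float, tier: str) -> Optional[Tuple[int, int]]:
    """Return the (low, high) batch size range for this VRAM and model tier."""
    for min_vram, row in BATCH_SIZE_TABLE:
        if vram_gb >= min_vram:
            return row[tier]
    return None  # below 4GB: no row applies
```

Start at the low end of the returned range and increase the batch size only if memory headroom remains.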

How to Adjust:

Go to Settings → Search Configuration → ML Service Settings
Update Batch Size
Click Save
Recommendation: Start conservative (16) and increase if you have headroom

Hybrid Search Pipeline

Hybrid Search Strategy

OpenTranscribe uses Reciprocal Rank Fusion (RRF) to combine results:

How It Works:

BM25 full-text search returns top matches
Neural search returns semantically similar results
RRF merges rankings for balanced relevance

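RRF Formula: score(d) = Σ 1/(60 + rank_i(d)), summed over each result list (BM25 and neural) in which document d appears, with ranks starting at 1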

Example:

User searches: "quarterly business review"
BM25 finds: "Q3 earnings call", "financial summary report"
Neural finds: "performance assessment meeting", "progress update discussion"
Merged results show all related content
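At the OpenSearch level, this kind of fusion is typically expressed as a `hybrid` query combined with a search pipeline. The sketch below uses the neural-search plugin's `score-ranker-processor` with the `rrf` technique, as we understand its format. Verify it against your OpenSearch version. The field names `text` and `embedding`, and the `k` and `size` defaults, are hypothetical and not taken from OpenTranscribe.

```python
import json

# Search pipeline that fuses sub-query rankings with RRF (rank constant 60).
rrf_pipeline = {
    "description": "Hybrid BM25 + neural search with RRF",
    "phase_results_processors": [
        {"score-ranker-processor": {"combination": {"technique": "rrf", "rank_constant": 60}}}
    ],
}


def hybrid_query(query_text: str, model_id: str, k: int = 50, size: int = 20) -> dict:
    """Hybrid query body: one BM25 match clause plus one neural clause."""
    return {
        "size": size,
        "query": {
            "hybrid": {
                "queries": [
                    {"match": {"text": {"query": query_text}}},  # BM25 side
                    {"neural": {"embedding": {                   # vector side
                        "query_text": query_text, "model_id": model_id, "k": k}}},
                ]
            }
        },
    }


print(json.dumps(hybrid_query("quarterly business review", "<model_id>"), indent=2))
```

The pipeline would be created with `PUT /_search/pipeline/<name>`, and the query sent to `GET /<index>/_search?search_pipeline=<name>`.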

Tuning RRF Weights:

In settings:

BM25 Weight: How much to value keyword matches (default: 1.0)
Neural Weight: How much to value semantic similarity (default: 1.0)
Adjust weights to emphasize one signal over the other

Hybrid Search Recommendations

Use Case	BM25 Weight	Neural Weight	Reasoning
Specific terms (names, dates)	2.0	1.0	Prioritize exact matches
Conceptual search (ideas, topics)	1.0	2.0	Prioritize meaning
Balanced (default)	1.0	1.0	Equal importance
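The fusion formula and weights can be checked with a small pure-Python sketch. It assumes each weight multiplies its result list's 1/(60 + rank) contribution. This guide does not say exactly how OpenTranscribe applies the weights, so treat this as an illustration. The usage below also confirms the Overview claim that ranks #1 and #5 beat ranks #3 and #3.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def rrf_merge(result_lists: Dict[str, List[str]],
              weights: Optional[Dict[str, float]] = None,
              k: int = 60) -> List[Tuple[str, float]]:
    """Fuse ranked doc-ID lists: score(d) = sum_i w_i / (k + rank_i(d)), ranks from 1."""
    weights = weights or {}
    scores: Dict[str, float] = defaultdict(float)
    for name, docs in result_lists.items():
        w = weights.get(name, 1.0)  # default weight 1.0, as in the settings
        for rank, doc_id in enumerate(docs, start=1):
            scores[doc_id] += w / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Document A: #1 in BM25, #5 in neural. Document B: #3 in both.
lists = {"bm25": ["A", "x1", "B", "x2", "x3"],
         "neural": ["n1", "n2", "B", "n3", "A"]}
print(rrf_merge(lists)[:2])                   # A edges out B at equal weights
print(rrf_merge(lists, {"neural": 2.0})[:2])  # doubling neural weight puts B first
```

With equal weights, A scores 1/61 + 1/65 ≈ 0.03178 and B scores 2/63 ≈ 0.03175. The margin is small, which is why the weights can flip the order.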

Reindexing Transcripts

When you enable neural search or change models, you must reindex existing transcripts:

Automatic Batch Indexing

Go to Settings → Search Configuration
Click Reindex All Transcripts
System displays progress:
- Total transcripts to process
- Currently processing count
- Estimated time remaining

Progress Monitoring

Monitor in Admin Dashboard → Background Tasks
View logs: ./opentr.sh logs opensearch
Estimated speed: 100-500 transcripts/hour (model-dependent)
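For planning, the quoted rate translates directly into wall-clock time. At 100-500 transcripts/hour, for example, a full reindex of 10,000 transcripts takes roughly 20 to 100 hours. The sketch uses only the rate from this guide. Actual throughput depends on the model and hardware.

```python
def reindex_hours(transcripts: int, fastest_per_hour: int = 500,
                  slowest_per_hour: int = 100) -> tuple:
    """(best case, worst case) hours for a full reindex at the quoted rates."""
    return transcripts / fastest_per_hour, transcripts / slowest_per_hour


best, worst = reindex_hours(10_000)
print(f"Full reindex: {best:.0f}-{worst:.0f} hours")
```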

Large Indexes

For thousands of transcripts, reindexing runs in the background. Users can keep searching while it runs: transcripts that have not been embedded yet still appear through full-text (BM25) matching, and gain semantic matches once they are indexed.

Incremental Indexing

New transcripts are automatically embedded when uploaded. Only use "Reindex All" when:

First enabling neural search
Changing embedding models
Recovering from indexing errors

Frontend Features

User Search Interface

Search Bar Enhancement:

Automatic hybrid search (both full-text and neural)
No user configuration required
Results ranked by RRF relevance

Advanced Search:

Click Advanced Search option
Options:
- Search Type: Full-text only, Neural only, or Hybrid (default)
- Min Score: Filter by relevance (0.0-1.0)
- Filters: Speaker, date, duration, tags

Search Type Selection:

Full-text only: Fast keyword matching (use for specific terms)
Neural only: Semantic matching (use for conceptual searches)
Hybrid (default): Combines both approaches

Model Selection in Settings

Users can see which embedding model is active:

User Settings → Search Preferences:

Current model name and description
Dimensions
Batch indexing progress (if active)

Admin Setting

Model selection is an admin-only feature. Users can only see which model is active and view search preferences.

Backend & Infrastructure

Architecture

Neural Search Pipeline:

Upload Transcript
    ↓
Store in MinIO
    ↓
Index in OpenSearch (BM25 full-text)
    ↓
Embedding Generation (ML Commons)
    ↓
Store Vector Index (HNSW)
    ↓
Ready for Hybrid Search

Key Components:

ML Commons: OpenSearch plugin for model management -- registers, deploys, and serves embedding models server-side so the backend sends raw text and receives vectors without loading models itself
ONNX Runtime: Fast inference engine for embeddings
HNSW: Hierarchical Navigable Small World graph for approximate nearest neighbor search
RRF: Reciprocal Rank Fusion for result merging

HNSW Vector Indexing

The vector index uses HNSW (Hierarchical Navigable Small World) with these parameters:

Parameter	Value	Purpose
`ef_construction`	256	Build-time quality (higher = more accurate index, slower build)
`m`	16	Number of bi-directional links per node (higher = better recall, more memory)
Similarity	Cosine	Distance metric for comparing embeddings

These settings prioritize search recall over index build speed, which is appropriate because transcripts are indexed once (during post-processing) but searched many times. The index is also sorted by file_uuid + chunk_index to optimize the collapse-by-file grouping used in search results.
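As a sketch, an OpenSearch index body with these HNSW parameters and the file_uuid + chunk_index sort might look like the one below. It uses the k-NN plugin's `knn_vector` mapping and standard index-sorting settings. The field names `text` and `embedding`, and the choice of the `lucene` engine, are assumptions: this guide does not state which engine OpenTranscribe uses.

```python
import json


def knn_index_body(dimensions: int = 384) -> dict:
    """Index settings + mappings for chunked transcripts with an HNSW vector field."""
    return {
        "settings": {
            "index": {
                "knn": True,
                # Keep chunks of the same file adjacent to help collapse-by-file grouping.
                "sort.field": ["file_uuid", "chunk_index"],
                "sort.order": ["asc", "asc"],
            }
        },
        "mappings": {
            "properties": {
                "file_uuid": {"type": "keyword"},
                "chunk_index": {"type": "integer"},
                "text": {"type": "text"},  # BM25 field (hypothetical name)
                "embedding": {             # vector field (hypothetical name)
                    "type": "knn_vector",
                    "dimension": dimensions,  # must match the model: 384, 768, or 1024
                    "method": {
                        "name": "hnsw",
                        "space_type": "cosinesimil",
                        "engine": "lucene",  # assumption: engine not stated in this guide
                        "parameters": {"ef_construction": 256, "m": 16},
                    },
                },
            }
        },
    }


print(json.dumps(knn_index_body(), indent=2))
```

Because `dimension` is fixed at index creation, switching to a model with a different vector size requires a new index and a full reindex.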

Server-Side Embeddings via ML Commons

OpenTranscribe generates embeddings server-side within OpenSearch rather than in the backend application. Documents are sent as raw text through an OpenSearch ingest pipeline that automatically calls the deployed ML model to generate vector embeddings during indexing. This eliminates network round-trips for embedding generation, ensures consistency (one model version across all indexed documents), and keeps the embedding model out of the backend's memory footprint.
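In OpenSearch, this kind of server-side embedding is usually configured with an ingest pipeline that contains a `text_embedding` processor from the neural-search plugin. The sketch below builds such a pipeline body. The field names and the model ID placeholder are hypothetical, and this is not necessarily OpenTranscribe's exact pipeline.

```python
import json


def embedding_ingest_pipeline(model_id: str, source_field: str = "text",
                              vector_field: str = "embedding") -> dict:
    """Ingest pipeline that embeds `source_field` into `vector_field` at index time."""
    return {
        "description": "Generate embeddings server-side with an ML Commons model",
        "processors": [
            {"text_embedding": {
                "model_id": model_id,                      # ID of the deployed model
                "field_map": {source_field: vector_field},
            }}
        ],
    }


print(json.dumps(embedding_ingest_pipeline("<model_id>"), indent=2))
```

The body would be registered with `PUT /_ingest/pipeline/<name>` and attached to the index through the `index.default_pipeline` setting. After that, plain text documents are embedded automatically as they are indexed.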

Database Schema

New tables created for neural search:

search_models - Available models and configurations
embeddings - Cached embeddings for transcripts
neural_index_status - Indexing progress tracking

Existing tables enhanced:

transcript - New has_neural_embedding flag
search_query - New search_type field (full_text, neural, hybrid)

API Endpoints

Admin endpoints (require authentication):

POST /api/admin/search/enable-neural
GET  /api/admin/search/models
POST /api/admin/search/register-model
POST /api/admin/search/reindex
GET  /api/admin/search/reindex-status

User endpoints:

GET /api/search/hybrid
POST /api/search/validate
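A client for these endpoints might look like the following standard-library sketch. Only the path `/api/search/hybrid` comes from this guide. The base URL, the query parameter names `q` and `search_type`, and the bearer-token header are hypothetical placeholders. Check the backend's API documentation for the real request shape.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8080"  # hypothetical: replace with your backend URL


def hybrid_search_url(query: str, search_type: str = "hybrid") -> str:
    # "q" and "search_type" are hypothetical parameter names for illustration.
    params = urllib.parse.urlencode({"q": query, "search_type": search_type})
    return f"{BASE_URL}/api/search/hybrid?{params}"


def hybrid_search(query: str, token: str, search_type: str = "hybrid") -> dict:
    """Send the search request and decode the JSON response (auth scheme assumed)."""
    req = urllib.request.Request(hybrid_search_url(query, search_type),
                                 headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```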

Troubleshooting

Issue: Models Not Discovered

Symptoms: Model dropdown appears empty or shows "No models available"

Causes:

OpenSearch not running
ML Commons plugin not enabled
Network connectivity issue

Solutions:

Verify OpenSearch is running:
```
docker compose ps | grep opensearch
```

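Check that ML Commons is enabled (the plugin list should include opensearch-ml) and list registered models:

```
curl -s "http://localhost:5180/_cat/plugins?v" | grep -i ml
curl -s -X POST "http://localhost:5180/_plugins/_ml/models/_search" \
  -H 'Content-Type: application/json' \
  -d '{"query": {"match_all": {}}, "size": 100}' | jq .
```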

Check backend logs:

```
./opentr.sh logs backend | grep -i "neural\|embedding"
```

Restart backend:
```
./opentr.sh restart backend
```
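To script the model check, the ML Commons model search API can be called directly. This sketch assumes the port 5180 used in the commands above. It excludes the per-chunk documents that ML Commons stores for uploaded model files by filtering on `chunk_number`, and it assumes each hit's `_id` is the model ID. A working model is expected to report `model_state` as `DEPLOYED`.

```python
import json
import urllib.request

OPENSEARCH_URL = "http://localhost:5180"  # port taken from the commands above

# Search model metadata documents only, skipping model-file chunk documents.
MODEL_SEARCH_BODY = {
    "query": {"bool": {"must_not": {"exists": {"field": "chunk_number"}}}},
    "size": 100,
}


def list_models(response: dict) -> list:
    """Return (model_id, name, model_state) tuples from a _search response."""
    return [(hit["_id"], hit["_source"].get("name", "?"), hit["_source"].get("model_state", "?"))
            for hit in response.get("hits", {}).get("hits", [])]


def fetch_models() -> list:
    """POST the search to ML Commons and parse the result."""
    req = urllib.request.Request(
        f"{OPENSEARCH_URL}/_plugins/_ml/models/_search",
        data=json.dumps(MODEL_SEARCH_BODY).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return list_models(json.load(resp))
```

Calling `fetch_models()` should list the selected embedding model. An empty list points back to the causes above.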

Issue: Embedding Generation Errors

Symptoms: Status shows "Error generating embeddings" or "Model failed to load"

Error Messages & Solutions:

Error	Cause	Solution
"Out of memory"	Batch size too large	Reduce batch size (settings → ML Service)
"Model not found"	Model registration failed	Re-register model from settings
"Timeout exceeded"	Model too slow	Increase timeout or select faster model
"CUDA not available"	GPU not detected	Verify GPU setup (see GPU Setup guide)

Debugging Steps:

Check OpenSearch logs:

```
docker compose logs -f opensearch | grep -i error
```

Monitor GPU/memory:

```
nvidia-smi                       # GPU memory usage
docker compose stats opensearch  # live CPU/memory usage of the OpenSearch container
```

Test with simple text:
- Go to Settings → Search Configuration
- Click "Test Embedding" with 1-2 word phrase
- Check if it succeeds

Issue: Search Performance Problems

Symptoms: Search queries are slow or timing out

Possible Causes & Solutions:

Symptom	Cause	Solution
Slow hybrid search	Large index + heavy load	Reduce batch size, increase timeouts
Neural-only search slow	Model too large	Switch to smaller model (all-MiniLM)
Timeout errors	Long request queue	Increase timeout in settings
High memory usage	Index too large	Monitor VRAM, consider archiving

Optimization Steps:

Check index size:

```
curl -s "http://localhost:5180/_cat/indices?v" | grep transcript
```

Monitor search performance:
- Track average query time in dashboard
- Compare full-text vs neural vs hybrid performance
Adjust RRF weights:
- If neural results are poor: increase BM25 weight
- If keyword results are insufficient: increase neural weight

Issue: Memory Issues

Symptoms: "Out of memory", "Allocation failure", or crashes

Quick Fixes:

Reduce batch size:

Settings → Search Configuration → Batch Size = 8 (from default 32)

Switch to smaller model:

Settings → Select Model → all-MiniLM-L6-v2

Increase Docker memory allocation:

```
# docker-compose.yml: opensearch service
services:
  opensearch:
    mem_limit: 8g  # increase from the current limit
```

Clear old embeddings cache:

```
# Backend endpoint to clear embedding cache
POST /api/admin/search/clear-cache
```

Issue: Reindexing Stuck

Symptoms: Reindex status shows "In Progress" for hours

Solutions:

Check background task queue (monitor Celery tasks in the Flower dashboard) by opening this URL in a browser:

```
http://localhost:5175/flower
```

Check container resources:
```
docker compose stats opensearch
```

Force cancellation (if needed):

```
# Backend endpoint to cancel reindex
POST /api/admin/search/cancel-reindex
```

Restart OpenSearch:
```
./opentr.sh restart opensearch
```

Offline & Airgapped Setup

For environments without internet access, download models on an internet-connected machine first.

Step 1: Download Models on Internet Machine

```
# Download sentence-transformers models
python3 << 'EOF'
from sentence_transformers import SentenceTransformer

# Download the desired model (example: all-MiniLM-L6-v2). Recent
# sentence-transformers releases store it in the Hugging Face cache,
# ~/.cache/huggingface/hub/ by default.
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
print("Downloaded; embedding dimension:", model.get_sentence_embedding_dimension())
EOF
```

Step 2: Package Models

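```
# Create an archive of the downloaded models. -C stores paths relative to
# ~/.cache/huggingface, so the archive contains hub/... instead of home/<user>/...
tar -czf embedding-models.tar.gz -C ~/.cache/huggingface hub

# Transfer to offline machine via USB, network share, etc.
```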

Step 3: Configure Offline Machine

```
# Extract models on offline machine
mkdir -p /path/to/opentranscribe/models/huggingface/
tar -xzf embedding-models.tar.gz -C /path/to/opentranscribe/models/huggingface/

# Update .env
echo "HUGGINGFACE_OFFLINE_MODE=true" >> .env
```

Step 4: Disable Model Downloads

In admin settings:

Settings → Search Configuration
Offline Mode: Toggle ON
System will only use locally cached models
No internet access required

Model Sync

If adding new models to offline machine:

Repeat Steps 1-2 on the internet-connected machine
Transfer updated archive to offline machine
Extract over existing models
Restart OpenTranscribe

Advanced Configuration

Using Custom Models

Advanced Only

Custom model support requires backend code modifications. Only attempt if familiar with ONNX format and Python.

Prerequisites:

Model in ONNX format (preferred) or PyTorch
Dimensions ≥ 256, ≤ 1536
Model card with input/output specifications

Steps:

Save model to: /models/custom-models/your-model-name/

Add to backend config:

```
# backend/app/core/config.py
CUSTOM_EMBEDDING_MODELS = {
    "custom/your-model": {
        "path": "./models/custom-models/your-model-name/",
        "dimensions": 768,
        "batch_size": 32,
    }
}
```

Restart backend
Model appears in selection dropdown
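Before restarting the backend, a quick sanity check of the config entry can catch mistakes early. This sketch checks only the rules stated in this guide: dimensions between 256 and 1536, plus a path and batch size that make sense. The function name and the error wording are hypothetical. This is not a validator that ships with OpenTranscribe.

```python
from pathlib import Path

MIN_DIMS, MAX_DIMS = 256, 1536  # limits from the prerequisites above


def validate_custom_models(models: dict, check_paths: bool = True) -> list:
    """Return human-readable problems found in a CUSTOM_EMBEDDING_MODELS-style dict."""
    problems = []
    for name, cfg in models.items():
        dims = cfg.get("dimensions")
        if not isinstance(dims, int) or not MIN_DIMS <= dims <= MAX_DIMS:
            problems.append(f"{name}: dimensions must be an int in [{MIN_DIMS}, {MAX_DIMS}], got {dims!r}")
        batch = cfg.get("batch_size", 32)
        if not isinstance(batch, int) or batch < 1:
            problems.append(f"{name}: batch_size must be a positive int, got {batch!r}")
        path = cfg.get("path")
        if not path:
            problems.append(f"{name}: missing 'path'")
        elif check_paths and not Path(path).is_dir():
            problems.append(f"{name}: model directory {path} does not exist")
    return problems
```

An empty list means the entry passes these basic checks. It does not guarantee the model itself loads.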

Scaling to Large Indexes

For deployments with 10,000+ transcripts:

Recommended Configuration:

Model: all-MiniLM-L6-v2 (fast)
Batch Size: 64-128 (if VRAM allows)
OpenSearch Heap: 4-8GB minimum
Async Indexing: Enable background batch processing

Settings:

Search Configuration → Advanced Settings
- Enable Batch Processing: ON
- Batch Size: 64
- Process Interval: 3600 seconds (1 hour)
- Max Concurrent Tasks: 4

Next Steps

Search & Filters User Guide - How users interact with search
Performance Optimization - Advanced tuning
Troubleshooting - General system issues
