
Backup Completeness Audit

This page is an honest, store-by-store assessment of OpenTranscribe's backup coverage as currently shipped. It complements the how-to in Backup & Restore: that page tells you how to run a backup; this page tells you what is and isn't protected and where you must act yourself.

The one thing most people get wrong

A database backup is worthless without the encryption keys. OpenTranscribe encrypts secrets (user API keys, the S3 backup secret, watch-source credentials, email passwords, MFA secrets) into the database using a key that lives in .env, not in the database. If you back up the database but lose .env, those columns are permanently undecryptable — and if your scheduled backups are gpg-encrypted, the backup itself is unrecoverable without its passphrase. Back up .env (and any gpg passphrase) separately from, and as carefully as, the database. See Configuration & Secrets below.

At a glance

| Store | What's protected today | Gap | Severity | Recommendation |
| --- | --- | --- | --- | --- |
| PostgreSQL (users, transcripts, segments, speakers, settings) | In-app scheduled pg_dump -Fc (GFS retention, optional gpg) to a local mount or S3-compatible bucket; manual ./opentr.sh backup [--encrypt]; restore documented | Restore is documented but not automatically verified (no scheduled restore drill / checksum) | Low | Run the quarterly restore drill in Backup & Restore. Good as shipped. |
| MinIO media (~484 GB, irreplaceable originals) | Not in the in-app backup system. Protected only by host RAID/NAS, which is not a backup (no offsite copy, no point-in-time recovery, no protection from deletion/ransomware/bit-rot) | No automated, off-host, point-in-time copy of the irreplaceable media | High (co-critical with Postgres) | Add an off-host media copy: mc mirror to another box/drive on a schedule, and/or S3 bucket versioning + an offsite replica. See MinIO media. |
| OpenSearch (search + vector indices) | Optional in-app fs snapshot alongside each dump (backup.include_opensearch); fully rebuildable from Postgres via reindex | None that matters (derived data) | Low | Leave snapshots off unless you want to skip reindex time on restore. Confirmed adequate. |
| Configuration & Secrets (.env: ENCRYPTION_KEY, JWT_SECRET_KEY, DB/MinIO creds; gpg passphrase) | Nothing automated. Keys are environment-sourced and are not part of any backup artifact | A DB backup is undecryptable without these keys; they are the single point of total data loss | Critical | Back up .env + gpg passphrase to a separate secure location (password manager / secrets vault). See §4. |
| Redis (Celery broker/cache) | Nothing, by design | None | None | Ephemeral. Tasks re-queue (acks-late). No backup needed. Confirmed. |
| Model cache (~2.5 GB AI weights) | Nothing, by design | None | None | Re-downloaded on first use. Back up only for air-gapped installs. |
| Backup-failure visibility | backup.last_result is recorded and readable on the admin Backups page | A silently failing scheduled backup is not surfaced via metric, notification, or banner; you only see it if you go look | Medium | Emit a Prometheus gauge + a notification on failure. See §6. |

1. PostgreSQL — adequate

The relational store (every user, transcript, segment, speaker, and setting) is the authoritative state of the system and is well covered:

  • Scheduled, in-app: backend/app/services/backup_service.py runs pg_dump --format=custom from the worker on the existing celery-beat schedule (no host cron), applies grandfather-father-son retention, optionally gpg-encrypts (AES-256), and writes to either a mounted folder or an S3-compatible bucket — the latter already gets the dump off the host.
  • Manual: ./opentr.sh backup [--encrypt] and ./opentr.sh restore <file>.
  • Restore is documented for plain SQL, gzip, and custom-format dumps, including a full from-scratch disaster-recovery runbook.
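To make the grandfather-father-son idea concrete, here is a minimal sketch of how such a retention rule could pick which dumps to keep. It is not taken from backup_service.py: the function name gfs_keep and the 7 daily / 4 weekly / 12 monthly counts are illustrative assumptions, and the shipped policy may differ.

```python
from datetime import date, timedelta


def gfs_keep(backup_dates, daily=7, weekly=4, monthly=12):
    """Return the set of backup dates a GFS-style policy would keep.

    Hypothetical policy: keep the newest `daily` backups, the newest backup
    in each of the most recent `weekly` ISO weeks, and the newest backup in
    each of the most recent `monthly` calendar months.
    """
    ordered = sorted(set(backup_dates), reverse=True)  # newest first
    keep = set(ordered[:daily])  # "son": the most recent daily backups

    weeks_seen, months_seen = [], []
    for d in ordered:
        week = tuple(d.isocalendar())[:2]  # (ISO year, ISO week number)
        if week not in weeks_seen and len(weeks_seen) < weekly:
            weeks_seen.append(week)
            keep.add(d)  # "father": newest backup of that week
        month = (d.year, d.month)
        if month not in months_seen and len(months_seen) < monthly:
            months_seen.append(month)
            keep.add(d)  # "grandfather": newest backup of that month
    return keep


if __name__ == "__main__":
    # Sixty consecutive nightly dumps ending on 2024-03-31.
    dates = [date(2024, 3, 31) - timedelta(days=i) for i in range(60)]
    for kept in sorted(gfs_keep(dates)):
        print(kept.isoformat())
```

Everything not in the returned set is a candidate for deletion, so the number of stored dumps stays bounded while older restore points remain available.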

Gap: restore is documented but not automatically verified. An untested backup is a hypothesis, not a backup. Recommendation: schedule the quarterly restore drill in Testing Backups. Severity: Low.
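One inexpensive step toward automatic verification is a checksum recorded when the dump is written and re-checked before any restore. The sketch below assumes a hypothetical .sha256 sidecar file next to each dump; nothing like it is produced by the product today. A checksum only detects corruption or truncation after the fact. It does not prove the dump restores, so it supplements the restore drill rather than replacing it.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Stream a file through SHA-256 so large dumps do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksum(dump_path):
    """Record the digest beside the dump (hypothetical `.sha256` sidecar)."""
    sidecar = Path(str(dump_path) + ".sha256")
    sidecar.write_text(sha256_of(dump_path) + "\n")
    return sidecar


def verify_checksum(dump_path):
    """Return True only if the dump still matches its recorded digest."""
    sidecar = Path(str(dump_path) + ".sha256")
    if not sidecar.exists():
        return False  # an unrecorded dump is treated as unverified
    return sidecar.read_text().strip() == sha256_of(dump_path)
```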

2. MinIO media — the ~484 GB gap

The uploaded audio/video originals in MinIO are irreplaceable — unlike OpenSearch they cannot be rebuilt from anything else. They are co-critical with PostgreSQL: losing either leaves you with half a system.

The in-app scheduler does not back up media (it is a pg_dump + optional OpenSearch snapshot only). Today the media is protected solely by host-level RAID/NAS. RAID is not a backup — it survives a disk failure but not an accidental/malicious delete, a bad migration, ransomware, bit-rot, or loss of the whole machine.

Options (assessed, not yet built):

  • mc mirror to a second location — incremental copy of the media bucket to another machine, an external drive, or a remote S3 endpoint. Simplest path to an off-host copy; for write-once media the steady-state delta is tiny. This is the most direct fix and pairs naturally with the existing celery-beat schedule.
  • S3 bucket versioning — turns deletes/overwrites into recoverable previous versions. Near-zero steady-state cost for write-once video. (David is still evaluating this; it is complementary to — not a substitute for — an off-host copy, since versioning still lives in one bucket on one machine.)
  • S3 replication — bucket-to-bucket replication to a second provider/region for a true offsite second copy.

Recommendation: add an automated off-host media mirror (mirror and/or replication), and turn on versioning for deletion protection. Until that ships, mirror manually with mc mirror per Backup & Restore → MinIO. Severity: High.
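As a rough illustration of the interim manual mirror, the sketch below builds the mc command lines from Python. The alias and bucket names (local/media, offsite/media-mirror) are placeholders that would need to be set up with mc alias set, and the helper names are hypothetical. The sketch deliberately never adds mc mirror's --remove flag, since propagating deletions to the mirror would undo the deletion protection the mirror is meant to provide.

```python
import shlex


def mirror_command(source, target, overwrite=False):
    """Build the argv for an incremental `mc mirror` run.

    `--overwrite` is off by default: for write-once media, an object that
    changed at the source is more likely damage than a legitimate update.
    """
    argv = ["mc", "mirror"]
    if overwrite:
        argv.append("--overwrite")
    argv += [source, target]
    return argv


def versioning_command(bucket):
    """Build the argv that enables versioning on a bucket."""
    return ["mc", "version", "enable", bucket]


if __name__ == "__main__":
    # Placeholder aliases/buckets; replace with your own before running.
    commands = (
        mirror_command("local/media", "offsite/media-mirror"),
        versioning_command("local/media"),
    )
    for argv in commands:
        # Print for review; use subprocess.run(argv, check=True) to execute.
        print(shlex.join(argv))
```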

3. OpenSearch — adequate (derived data)

Every search and vector index is rebuildable from PostgreSQL via the reindex tasks, so OpenSearch is not a data-safety concern. The in-app scheduler can optionally take an fs snapshot beside each dump (backup.include_opensearch) purely to skip reindex time on restore. Leave it off and nothing is lost. Confirmed adequate. Severity: Low.

4. Configuration & Secrets — the sneaky-critical gap

This is the audit's most important finding.

How keys are sourced. backend/app/core/config.py reads ENCRYPTION_KEY and JWT_SECRET_KEY from the environment (i.e. .env), with insecure built-in defaults that only trigger a warning:

JWT_SECRET_KEY: str = os.getenv("JWT_SECRET_KEY", "this_should_be_changed_in_production")
ENCRYPTION_KEY: str = os.getenv("ENCRYPTION_KEY", "this_should_be_changed_in_production_for_api_key_encryption")
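Because the defaults only produce a warning, it can be worth checking explicitly before trusting any backup that the running environment is not still using them. A minimal sketch follows, assuming the two default strings quoted above are the only ones that matter; the helper insecure_keys is hypothetical and not part of config.py.

```python
import os

# Default values copied from the two config lines quoted above.
INSECURE_DEFAULTS = {
    "JWT_SECRET_KEY": "this_should_be_changed_in_production",
    "ENCRYPTION_KEY": "this_should_be_changed_in_production_for_api_key_encryption",
}


def insecure_keys(env=None):
    """Return the names of keys that are unset, empty, or still at the default."""
    env = os.environ if env is None else env
    return sorted(
        name
        for name, default in INSECURE_DEFAULTS.items()
        if env.get(name, "") in ("", default)
    )


if __name__ == "__main__":
    problems = insecure_keys()
    if problems:
        raise SystemExit("Insecure or missing keys: " + ", ".join(problems))
    print("ENCRYPTION_KEY and JWT_SECRET_KEY are set to non-default values.")
```

An instance that has been encrypting secrets under the built-in default has the opposite problem from a lost key: anyone who knows the default can decrypt those columns from a stolen database dump.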

What the encryption key protects. backend/app/utils/encryption.py derives an AES-256-GCM key (PBKDF2-SHA256, 600k iterations) from ENCRYPTION_KEY and encrypts every sensitive column into the database, including:

  • user-configured LLM / ASR API keys,
  • the S3 backup secret key (backup.s3_secret_key — yes, the backup destination's own credential),
  • watch-source S3 secrets and SMB passwords (encrypted_s3_secret_key, encrypted_smb_password),
  • email SMTP / M365 / Exchange passwords,
  • MFA secrets.

The trap. These ciphertexts live in the database, but the key that decrypts them lives in .env. A database backup does not contain the key. If you restore a database onto a new host with a different (or default) ENCRYPTION_KEY, every encrypted column is undecryptable until the original key is supplied, and permanently so if that key is lost: users must re-enter every API key and credential, and any data that depended on those secrets is lost. The same applies to JWT_SECRET_KEY for session continuity. And if your scheduled backups are gpg-encrypted, the gpg passphrase is a second key with the same property: lose it and the backup file itself is unrecoverable.
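The mechanics of the trap can be shown with the key derivation alone. The sketch below uses the standard library's PBKDF2 with the parameters named above (SHA-256, 600k iterations, 32-byte output for AES-256). The salt and the sample key strings are placeholders, and how encryption.py actually chooses and stores its salt is not covered here, so treat this as an illustration rather than the project's code.

```python
import hashlib

ITERATIONS = 600_000     # iteration count stated in the text
SALT = b"example-salt"   # assumption: placeholder, not the project's real salt


def derive_key(encryption_key, salt=SALT):
    """Derive a 32-byte (AES-256) key with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", encryption_key.encode("utf-8"), salt, ITERATIONS, dklen=32
    )


# Hypothetical ENCRYPTION_KEY values standing in for real .env contents.
original = derive_key("value-from-the-old-.env")
restored = derive_key("value-from-the-old-.env")
regenerated = derive_key("freshly-generated-value")

print(original == restored)     # True: same ENCRYPTION_KEY, same AES key
print(original == regenerated)  # False: different ENCRYPTION_KEY, different AES key
```

Since AES-GCM authenticates ciphertext, decrypting with the wrong derived key fails outright instead of returning garbage, which is why a restored database with a regenerated ENCRYPTION_KEY cannot read its own secrets.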

Is any of this backed up? No. The keys are not part of any backup artifact the product produces. The how-to docs mention copying .env, but there is no automated protection and no prominent warning that the DB backup is inert without it.

Recommendation. Treat .env (specifically ENCRYPTION_KEY and JWT_SECRET_KEY) and any gpg backup passphrase as first-class backup artifacts: store them in a password manager or secrets vault, separately from the database dumps (so a single compromised location can't expose both), and verify them as part of every restore drill. Severity: Critical — this is almost always the biggest real-world gap.

5. Redis — no backup needed (confirmed)

Redis is the Celery broker and a cache. Tasks are dispatched with acks-late, so in-flight work re-queues after a restart; cached values regenerate. Redis state is ephemeral by design and intentionally excluded from backups. No action. Severity: None.

6. Backup-failure visibility

The scheduled backup records its outcome in backup.last_result (status, error, duration), which the admin Backups page reads on demand. But a failing scheduled backup is not surfaced anywhere proactively — no Prometheus metric, no notification, no UI banner. A silently failing backup is worse than no backup, because it creates false confidence.

Recommendation: on each scheduled run, (a) emit a Prometheus gauge (e.g. opentranscribe_backup_last_success_timestamp_seconds / ..._last_status) from backend/app/core/metrics.py so the existing Grafana/Prometheus stack can alert on "no successful backup in N hours", and (b) send a notification when last_result.ok is false. Severity: Medium.
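The following is a dependency-free sketch of the logic behind that recommendation, not an implementation of metrics.py. It assumes last_result is a dict with an ok flag (as referenced above) plus a hypothetical finished_at Unix timestamp, and it renders the gauge by hand in the Prometheus text exposition format. In the real service a Prometheus client library's Gauge would more likely be used.

```python
import time

METRIC = "opentranscribe_backup_last_success_timestamp_seconds"


def last_success_timestamp(last_result, previous=0.0):
    """Advance the gauge only when the run succeeded; otherwise keep the old value."""
    if last_result.get("ok"):
        # `finished_at` is an assumed field name (Unix seconds).
        return float(last_result.get("finished_at", previous))
    return float(previous)


def should_notify(last_result):
    """A failed (or malformed) result should trigger a notification."""
    return not last_result.get("ok", False)


def render_gauge(name, value):
    """Render one gauge in the Prometheus text exposition format."""
    return f"# TYPE {name} gauge\n{name} {value}\n"


def is_stale(last_success_ts, now=None, max_age_hours=26):
    """True when no successful backup finished within the last N hours."""
    now = time.time() if now is None else now
    return now - last_success_ts > max_age_hours * 3600
```

Keeping the previous timestamp on failure is the important design choice: the gauge then ages naturally, so a single "no successful backup in N hours" alert covers both failing runs and a scheduler that has stopped running altogether.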

3-2-1 for OpenTranscribe

The industry baseline is 3-2-1: 3 copies of your data, on 2 different media, with 1 copy offsite. Mapped onto OpenTranscribe:

| 3-2-1 element | How to satisfy it |
| --- | --- |
| 3 copies | (1) live data in Postgres + MinIO; (2) the scheduled pg_dump + a media mirror; (3) a second, independent copy of both (e.g. the S3 backup destination on a different box, plus an mc mirror target). |
| 2 media | Don't keep every copy on the same RAID array. Use the host volume and a different machine / external drive / object store. |
| 1 offsite | Point the in-app S3 destination (and a media mirror/replica) at a bucket on a different machine or provider, so a fire/theft/ransomware event on the primary host can't take the backups with it. The in-app S3 destination already makes this one config change away for Postgres. |
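The three conditions are simple enough to check mechanically for each store. Here is a small sketch, using a hypothetical Copy record and made-up medium labels, that reports which parts of 3-2-1 a given set of copies satisfies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    name: str
    medium: str    # e.g. "host-raid", "external-drive", "object-store"
    offsite: bool  # True if the copy survives loss of the primary site


def check_321(copies):
    """Return which of the three 3-2-1 conditions a set of copies meets."""
    return {
        "three_copies": len(copies) >= 3,
        "two_media": len({c.medium for c in copies}) >= 2,
        "one_offsite": any(c.offsite for c in copies),
    }


if __name__ == "__main__":
    # Media as described in this audit: a single live copy on host RAID/NAS.
    media_today = [Copy("live MinIO bucket", "host-raid", False)]
    # One possible target state (illustrative, not a shipped feature).
    media_planned = media_today + [
        Copy("mc mirror on external drive", "external-drive", False),
        Copy("replica at second provider", "object-store", True),
    ]
    print(check_321(media_today))
    print(check_321(media_planned))
```

Run per store (Postgres dumps, media, keys), this makes it obvious which store is the one still failing the baseline.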

Plus the cross-cutting keys. 3-2-1 covers your data; it does not automatically cover the encryption keys that make that data usable. Back up .env (ENCRYPTION_KEY + JWT_SECRET_KEY) and any gpg passphrase alongside your 3-2-1 strategy, in a separate secure location. A perfect 3-2-1 of an undecryptable database is still total data loss.

Where OpenTranscribe stands today: the in-app S3 backup destination gets you most of the way to 1 offsite for the database. The remaining gaps to a real 3-2-1 are (a) an off-host media copy (§2), and (b) a deliberate, separate backup of the keys (§4).