On-Prem Disaster Recovery SOP
This document provides disaster recovery procedures for Unstract and LLMWhisperer on-premise deployments. It covers recovery from availability zone (AZ) or region-wide failures using a cold standby approach. Cold standby is recommended because Unstract is usually business critical but not mission critical.
| Objective | Target |
|---|---|
| Recovery Time Objective (RTO) | 6 hours |
| Recovery Point Objective (RPO) | 24 hours |
1. Scope
- Unstract platform recovery
- LLMWhisperer service recovery
- Configuration and data restoration
- Application-specific validation
Customer Responsibilities
- Infrastructure provisioning in DR region
- Database backup/restore operations
- Network, SSL and DNS configuration
- Cloud Object Storage replication setup (required for Prompt Studio sample files to be restored)
Data Loss Considerations
| Component | Impact of 24-hour RPO |
|---|---|
| Processed documents | May need re-processing |
| Workflows in progress | Remain stuck as in progress; can be retriggered |
| Configuration changes | Reverted to last backup |
| LLMWhisperer APIs | In-flight requests lost, must be retried |
| Usage data | Potential mismatch with vendor records |
2. Prerequisites
Documentation Requirements
- Current values.yaml and secret.yaml files (version controlled)
- artifact-key.json for image registry access
- Deployment documentation reference
- DR region infrastructure specifications
Infrastructure Prerequisites
The customer must be able to complete the following within 4 hours:
- Recreate the infrastructure specified in the deployment docs
- Restore the backed-up database to a new instance, or use a cross-region replica if available
- Point the deployment at the replicated cloud object storage
Backup Requirements
- Database automated backups (daily minimum)
- Cloud Object Storage cross-region replication active
- Configuration files in version control or another secure location
3. Backup Procedures
Automated Backups (Customer Managed)
- Databases: Daily automated snapshots with 7-day retention
- Cloud Object Storage: Real-time cross-region replication
- Monitoring: Backup success/failure alerts
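A backup alert can compare the age of the newest snapshot against the 24-hour RPO. The following is a minimal sketch only: backup_within_rpo is a hypothetical helper, and how you obtain the snapshot timestamp depends on your database provider.

```shell
# Sketch: check that the newest backup is no older than the 24-hour RPO.
# backup_within_rpo is a hypothetical helper, not part of Unstract.
RPO_SECONDS=$((24 * 60 * 60))

# Usage: backup_within_rpo <backup_epoch_seconds> [now_epoch_seconds]
# Returns 0 if the backup age is within the RPO, 1 otherwise.
backup_within_rpo() {
  local backup_epoch="$1"
  local now_epoch="${2:-$(date +%s)}"
  local age=$(( now_epoch - backup_epoch ))
  if [ "$age" -lt 0 ]; then
    echo "backup timestamp is in the future" >&2
    return 1
  fi
  if [ "$age" -le "$RPO_SECONDS" ]; then
    echo "OK: latest backup is ${age}s old"
    return 0
  fi
  echo "ALERT: latest backup is ${age}s old, exceeds RPO of ${RPO_SECONDS}s"
  return 1
}
```

Pass the snapshot creation time as epoch seconds; a non-zero exit status can be wired into whatever alerting you already use.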
Kubernetes Resource Backup (Optional)
Kubernetes resources do not need to be backed up as they can be restored using Helm. However, if you have directly made resource changes in Kubernetes or added additional resources, you may choose to back them up.
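If you do choose to back up manually changed resources, one possible approach is to export them as YAML. This is a sketch under assumptions: backup_namespace is a hypothetical helper, and the list of resource kinds is illustrative and should be adjusted to what you actually changed.

```shell
# Sketch: export selected resources from one namespace to a YAML file.
# backup_namespace is a hypothetical helper; the resource kinds listed
# here are an assumption, not a required set.
# Usage: backup_namespace <namespace> <output_dir>
backup_namespace() {
  ns="$1"
  out_dir="$2"
  mkdir -p "$out_dir"
  kubectl get configmap,secret,ingress,deployment,service \
    -n "$ns" -o yaml > "$out_dir/$ns-resources.yaml"
}

# Example (not executed here):
#   backup_namespace unstract ./k8s-backup
#   backup_namespace llmwhisperer ./k8s-backup
```

Exported Secret objects contain base64-encoded values, so store the output as securely as secret.yaml itself.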
4. Recovery Procedures
Phase 1: Environment Preparation (2 hours)
Infrastructure Provisioning
Customer Action Required — Provision the following in the DR region:
- Kubernetes cluster matching production specs
- PostgreSQL instances (2 databases)
- Cloud Storage buckets with replication
- Load balancers and networking
- DNS entries (can be updated later)
Database Restoration
Customer Action Required:
- Restore both PostgreSQL databases from latest backup
- Verify connectivity and update endpoints
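Connectivity to each restored database can be verified with the standard PostgreSQL client tools before any configuration is updated. A minimal sketch, assuming pg_isready and psql are installed on the machine running the check; check_db and the host, database and user names in the example are hypothetical.

```shell
# Sketch: verify that a restored PostgreSQL instance accepts connections
# and can run a trivial query. check_db is a hypothetical helper.
# Credentials are expected via PGPASSWORD or ~/.pgpass.
# Usage: check_db <host> <port> <dbname> <user>
check_db() {
  # pg_isready exits non-zero if the server is not accepting connections
  pg_isready -h "$1" -p "$2" || return 1
  # -t and -A print the bare result; expected output is a single "1"
  psql -h "$1" -p "$2" -d "$3" -U "$4" -tAc 'SELECT 1'
}

# Example (not executed here; all values are placeholders):
#   check_db dr-db.example.internal 5432 unstract_db unstract_user
```

Run the check once per restored database, from a host or pod that shares the network path the cluster will use.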
Phase 2: Configuration Update (30 minutes)
Update your values.yaml and secret.yaml files with:
- New database endpoints
- New storage bucket names/endpoints
- New ingress domains
- Any region-specific configurations
Ensure ENCRYPTION_KEY in secret.yaml remains unchanged to decrypt existing data.
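A quick comparison of the production and DR copies of secret.yaml can catch an accidental key change before deployment. This sketch assumes the key appears on a single line of the form `ENCRYPTION_KEY: <value>`; the actual layout of your secret.yaml may differ, and both helper names are hypothetical.

```shell
# Sketch: confirm ENCRYPTION_KEY is identical in two secret.yaml files.
# Assumes a single-line "ENCRYPTION_KEY: <value>" entry (an assumption
# about the file layout). Key values are never printed.
extract_encryption_key() {
  sed -n 's/^[[:space:]]*ENCRYPTION_KEY:[[:space:]]*//p' "$1" | head -n 1
}

# Usage: same_encryption_key <prod_secret.yaml> <dr_secret.yaml>
# Returns 0 only if the key is present and identical in both files.
same_encryption_key() {
  prod_key=$(extract_encryption_key "$1")
  dr_key=$(extract_encryption_key "$2")
  [ -n "$prod_key" ] && [ "$prod_key" = "$dr_key" ]
}

# Example (not executed here; paths are placeholders):
#   same_encryption_key prod/secret.yaml dr/secret.yaml || echo "KEY MISMATCH"
```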
Phase 3: Application Deployment (1 hour)
Deploy applications following the standard deployment procedures:
- Deploy Unstract: Follow the Deployment Guide — Installation section
- Deploy LLMWhisperer: Follow the LLMWhisperer deployment guide
Use the same Helm commands and values files as documented in your original deployment, with the updated DR configuration files.
Phase 4: Network Configuration (Customer Managed)
- Update DNS to point to DR region
- Configure SSL certificates
- Verify ingress timeout settings (900 seconds)
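After the DNS update, it can help to confirm that the hostname resolves to the DR load balancer before moving on to validation. A minimal sketch, assuming dig is available; dns_points_to is a hypothetical helper and the hostname and IP address in the example are placeholders.

```shell
# Sketch: check that a hostname resolves to the expected DR address.
# dns_points_to is a hypothetical helper.
# Usage: dns_points_to <hostname> <expected_ip>
dns_points_to() {
  host="$1"
  expected="$2"
  # dig +short prints one record per line; match the expected IP exactly
  dig +short "$host" | grep -qxF "$expected"
}

# Example (not executed here; values are placeholders):
#   dns_points_to unstract.example.com 203.0.113.10 && echo "DNS updated"
```

Resolvers may keep serving the old record until its TTL expires, so a failed check shortly after the change is not necessarily an error.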
5. Post-Recovery Validation
System Health Checks
Ensure all health checks at pod level (and at ingress level, if configured) are passing.
kubectl get pods -n unstract
All pods should be in Running state with zero restarts.
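The same check can be scripted by filtering the pod listing. The sketch below is illustrative: unhealthy_pods is a hypothetical helper that reads `kubectl get pods --no-headers` output (columns NAME, READY, STATUS, RESTARTS, AGE) and reports any pod that is not Running or has restarted.

```shell
# Sketch: list pods that are not Running or that have restarted.
# unhealthy_pods is a hypothetical helper. It reads the output of
# "kubectl get pods --no-headers" on stdin and exits non-zero if any
# pod is reported.
unhealthy_pods() {
  awk '
    $3 != "Running" || $4 != "0" {
      print $1 " status=" $3 " restarts=" $4
      bad = 1
    }
    END { exit (bad + 0) }
  '
}

# Example (not executed here):
#   kubectl get pods -n unstract --no-headers | unhealthy_pods
```

Pods belonging to finished jobs show a Completed status and will be reported by this filter; treat those as expected if your deployment runs one-off jobs.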
Functional Validation
- Unstract Platform:
  - Access web interface
  - Login
  - Create test workflow / View existing workflows
  - Upload and process test document
  - Verify API deployments
- LLMWhisperer:
  - Access dashboard
  - Test extract endpoint
  - Verify usage tracking
Known Issues Post-Recovery
- Workflow executions that were in flight when the last snapshot was taken may remain marked as in-progress in the restored database. Executions started after the snapshot are lost.
- LLMWhisperer in-flight requests are lost
- Usage data may be inconsistent with vendor records
- Configuration changes made in the last 24 hours (last backup point) to adapters, Prompt Studio, workflows, etc. may be lost
6. Testing Guidelines
Testing Schedule
- Frequency: Semi-annually
- Type: Table-top exercise or partial test
Test Procedure
- Document review and update
- Verify backup availability
- Test Kubernetes resource restoration
- Validate configuration files
- Deploy to test namespace (optional)
Success Criteria
- All pods running without crashes
- API endpoints responding
- Test document processed successfully
- No critical errors in logs
Appendix
A. Quick Reference Commands
# Check deployment status
helm list -n unstract
helm list -n llmwhisperer
# View current configuration
helm get values unstract-platform -n unstract > current-unstract-values.yaml
helm get values whisperer -n llmwhisperer > current-llmwhisperer-values.yaml
# Debug pod issues
kubectl describe pod [POD_NAME] -n [NAMESPACE]
kubectl logs [POD_NAME] -n [NAMESPACE] --previous
B. Troubleshooting
- Image Pull Errors
  - Verify gcr-artifact-secret exists in namespace
  - Check artifact-key.json validity
  - Ensure Helm registry login succeeded
- Database Connection Failed
  - Verify database endpoints in secret.yaml
  - Check network connectivity
  - Validate credentials
- Storage Access Denied
  - Verify IAM policies match production
  - Check bucket names in values.yaml
  - Validate credentials in secret.yaml
- Pods in CrashLoopBackOff
  - Check pod logs for specific errors
  - Verify all secrets are present
  - Ensure resource limits are adequate
C. Contact Information
| Support Type | Contact | When to Use |
|---|---|---|
| Unstract Support | support@unstract.com | Application-specific issues |
| Unstract Docs | On-prem Edition Docs | Unstract setup reference |
| LLMWhisperer Docs | On-prem Edition Docs | LLMWhisperer setup reference |