On-Prem Deployment Guide
This guide provides comprehensive instructions for deploying LLMWhisperer in an on-premises environment. LLMWhisperer is deployed inside a Kubernetes cluster, packaged as a Helm chart.
Overview
LLMWhisperer On-Prem is a self-hosted deployment that runs entirely within your infrastructure. It includes:
- LLMWhisperer Backend — the core text extraction API service
- LLMWhisperer Dashboard — a web UI for usage monitoring and management
- OCR Workers — document processing workers that scale based on load
- RabbitMQ — message broker for distributed task processing
- Redis — caching layer for performance optimization
1. Infrastructure Prerequisites
Kubernetes Cluster
- Recommended version: >= 1.29 (latest tested: 1.33)
- Node autoscaling should be enabled
- Create the cluster in a single Availability Zone (recommended), since some StatefulSet workloads do not yet support HA; multi-AZ clusters can lead to volume attach errors
- Ingress controller as a K8s cluster add-on for load balancer creation (recommended)
- The ingress must allow request timeouts of up to 900 seconds to work as expected
- In-house or cloud provider observability stack (recommended)
PostgreSQL Database
- Supported version: 15.0
- Minimum specs: 1 vCPU, 8 GiB RAM, 50 GiB SSD
- Autoscale enabled (recommended)
- A dedicated database for LLMWhisperer should be created within the PostgreSQL instance
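As a minimal sketch (assuming a reachable superuser account; the host, names, and password below are placeholders), the dedicated role and database could be created like this:
# Hypothetical host and credentials; replace with your own values.
psql -h postgres.example.com -U postgres <<'SQL'
CREATE ROLE llmwhisperer WITH LOGIN PASSWORD 'your-password';
CREATE DATABASE llmwhisperer OWNER llmwhisperer;
SQL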
DNS & SSL
- A domain pointing to LLMWhisperer (e.g., llmwhisperer.<customer-domain>.com)
- An active SSL certificate for the domain
Node Profile
Add 50 GiB SSD for application data to each machine.
| Machine Type | Label | Taint (NoSchedule) | Min | Max |
|---|---|---|---|---|
| 8 vCPU and 32 GiB | service: llmwhisperer | service: llmwhisperer | 1 | 60 |
GPU Nodes (Optional — for document insights mode)
| Cloud Provider | Instance Type | GPU Family | Label | Taint (NoSchedule) | Min | Max |
|---|---|---|---|---|---|---|
| AWS | g6.xlarge | NVIDIA L4 Tensor Core | service: llmwhisperer-gpu | service: llmwhisperer-gpu | 1 | 1 |
| GCP | g2-standard-4 | NVIDIA L4 Tensor Core | service: llmwhisperer-gpu | service: llmwhisperer-gpu | 1 | 1 |
Workloads are expected to be deployed on non-spot (on-demand) node pools.
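The labels and taints above are normally set when creating the node pools in your cloud provider. As an illustrative sketch (assuming self-managed nodes; node names are placeholders), the equivalent kubectl commands would be:
# Label and taint a CPU worker node so only LLMWhisperer workloads schedule onto it.
kubectl label node <node-name> service=llmwhisperer
kubectl taint node <node-name> service=llmwhisperer:NoSchedule

# GPU nodes (document insights mode) follow the same pattern.
kubectl label node <gpu-node-name> service=llmwhisperer-gpu
kubectl taint node <gpu-node-name> service=llmwhisperer-gpu:NoSchedule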
2. Configuration
Files Provided by Unstract Team
The following files will be provided by the Unstract team:
| File | Description |
|---|---|
| artifact-key.json | GCP service account key for Helm chart registry login and container image pull |
| sample.onprem.values.yaml | Sample Helm chart values (non-sensitive configuration) |
| onprem-profile.values.yaml | Profile values for resource allocation and scaling configuration |
Required Configuration Values
These values must be provided by the customer or the Unstract team to deploy LLMWhisperer:
| Variable | Description | Source |
|---|---|---|
| DB_LLMW_HOST | PostgreSQL host | Customer |
| DB_LLMW_USERNAME | PostgreSQL username | Customer |
| DB_LLMW_PASSWORD | PostgreSQL password | Customer |
| DB_LLMW_NAME | PostgreSQL database name | Customer |
| ENCRYPTION_KEY | Encryption key for sensitive data — must be backed up securely | Self-generated |
| LICENSE_PORTAL_API_KEY | License portal API key | Unstract Team |
| endpoint (azureOcrBilling) | Azure Cognitive Services OCR endpoint | Unstract Team |
| apiKey (azureOcrBilling) | Azure OCR API key | Unstract Team |
| INITIAL_PASSWORD | Initial admin password for the dashboard | Customer |
| X_CELERY_BROKER_USERNAME | RabbitMQ username | Customer |
| X_CELERY_BROKER_PASSWORD | RabbitMQ password | Customer |
The ENCRYPTION_KEY is used to encrypt data at rest and is required to decrypt that data when it is retrieved. Do not rotate, delete, or lose this key; doing so will render existing encrypted data inaccessible.
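One way to generate a suitably random key is sketched below. The format is an assumption; confirm the expected key format with the Unstract team before use.
# Generate a 32-byte random key, base64-encoded (assumed format; verify with Unstract).
openssl rand -base64 32
# Back the output up in a secret manager or offline store before deploying.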
Using Kubernetes Secrets (existingSecret)
Each configuration section in sample.onprem.values.yaml supports two approaches for providing sensitive values:
Option 1: Inline values — Provide values directly in the values file. Suitable for initial setup and testing.
global:
sharedConfigs:
database:
DB_LLMW_HOST: "postgres.example.com"
DB_LLMW_USERNAME: "postgres"
DB_LLMW_PASSWORD: "your-password"
DB_LLMW_NAME: "llmwhisperer"
Option 2: Kubernetes secrets (recommended for production) — Pre-create Kubernetes secrets with the matching variable names as keys, then reference the secret name via existingSecret. This avoids storing sensitive values in the Helm values file.
global:
sharedConfigs:
database:
existingSecret: "llmwhisperer-db-credentials"
The following configuration sections support existingSecret:
| Section | Example Secret Name | Keys |
|---|---|---|
| global.sharedConfigs.database | llmwhisperer-db-credentials | DB_LLMW_HOST, DB_LLMW_USERNAME, DB_LLMW_PASSWORD, DB_LLMW_NAME, DB_LLMW_PORT |
| global.sharedConfigs.redis | llmwhisperer-redis-credentials | REDIS_HOST, REDIS_PORT, REDIS_DB, REDIS_PASSWORD, REDIS_USER |
| global.sharedConfigs.workerRedis | llmwhisperer-worker-redis-credentials | WORKER_REDIS_HOST, WORKER_REDIS_PORT, WORKER_REDIS_DB, WORKER_REDIS_PASSWORD |
| global.sharedConfigs.celeryBroker | llmwhisperer-celery-broker-credentials | X_CELERY_BROKER_BASE_URL, X_CELERY_BROKER_USERNAME, X_CELERY_BROKER_PASSWORD, X_CELERY_BACKEND_URL |
| global.sharedConfigs.apiKeys | llmwhisperer-api-keys | ENCRYPTION_KEY |
| global.sharedConfigs.dashboardCredentials | llmwhisperer-dashboard-credentials | INITIAL_USER_NAME, INITIAL_PASSWORD |
| global.sharedConfigs.license | llmwhisperer-license-secret | LICENSE_PORTAL_URL, LICENSE_PORTAL_API_KEY |
| global.azureOcrBilling | azure-ocr-billing-credentials | endpoint, apiKey |
Example of creating a Kubernetes secret:
kubectl create secret generic llmwhisperer-db-credentials \
--namespace $NAMESPACE \
--from-literal=DB_LLMW_HOST="postgres.example.com" \
--from-literal=DB_LLMW_USERNAME="postgres" \
--from-literal=DB_LLMW_PASSWORD="your-password" \
--from-literal=DB_LLMW_NAME="llmwhisperer" \
--from-literal=DB_LLMW_PORT="5432"
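The other secrets in the table above follow the same pattern. For example, a sketch for the API-keys and dashboard-credentials secrets (all values shown are placeholders):
kubectl create secret generic llmwhisperer-api-keys \
  --namespace $NAMESPACE \
  --from-literal=ENCRYPTION_KEY="<your-encryption-key>"

kubectl create secret generic llmwhisperer-dashboard-credentials \
  --namespace $NAMESPACE \
  --from-literal=INITIAL_USER_NAME="admin" \
  --from-literal=INITIAL_PASSWORD="<strong-password>"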
3. Installation (One-Time)
Step 1: Check Cluster Connectivity
kubectl cluster-info
Step 2: Deploy RabbitMQ Operator (Once Per Cluster)
The RabbitMQ operator provisions the RabbitMQ cluster within the namespace using its CRD. Refer to the official documentation.
kubectl apply -f "https://github.com/rabbitmq/cluster-operator/releases/latest/download/cluster-operator.yml"
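Before proceeding, confirm the operator is running. Assuming the default manifest above, which installs into the rabbitmq-system namespace:
# The operator pod should be Running, and its CRD registered.
kubectl get pods -n rabbitmq-system
kubectl get crd rabbitmqclusters.rabbitmq.com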
Step 3: Create Namespace
export NAMESPACE=<namespace_name>
kubectl create namespace $NAMESPACE
Step 4: Authenticate Helm Registry
cat artifact-key.json | helm registry login -u _json_key --password-stdin https://us-central1-docker.pkg.dev
Step 5: Create Image Pull Secret
kubectl create secret docker-registry artifact-registry \
--namespace $NAMESPACE \
--docker-server=us-central1-docker.pkg.dev \
--docker-username=_json_key \
--docker-password="$(cat artifact-key.json)"
Validate the secret was created successfully:
kubectl get secret artifact-registry -n $NAMESPACE -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d
Step 6: Configure Values File
- Create a copy of sample.onprem.values.yaml as onprem.values.yaml
- Fill in all values marked with # <REQUIRED> — refer to the Configuration section for details on each value
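Before installing, a quick check that no required values remain unfilled:
# Any output here means onprem.values.yaml still has unfilled values.
grep -n "<REQUIRED>" onprem.values.yaml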
Step 7: Install Helm Chart
- Requires 3 × (8 vCPU / 32 GiB) nodes by default
- Processes ~1,800 pages/hour at a maximum concurrency of 10 pages, with a response time of 12–14 seconds
- With HPA enabled: up to ~7,200 pages/hour at a concurrency of 30 pages, with a response time of 15–16 seconds (uses ~9 × 8 vCPU machines)
- Capacity can be further tuned based on the processing modes in use
helm install whisperer oci://us-central1-docker.pkg.dev/pandoras-tamer/charts/llmwhisperer \
--version <version> \
-f /path/to/onprem.values.yaml \
-f /path/to/onprem-profile.values.yaml \
-n $NAMESPACE
Replace <version> with the target release version (see Version History).
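After the install completes, it can be useful to watch the rollout and confirm the release:
# Watch pods come up (Ctrl+C to stop).
kubectl get pods -n $NAMESPACE -w

# Confirm the release status and revision.
helm status whisperer -n $NAMESPACE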
4. Deployment Validation
Health Checks
| Service | Port | Network Type | Endpoint |
|---|---|---|---|
| whisperer-backend | 3006 | HTTP | /health/ping |
| llmwhisperer-dashboard | 3007 | HTTP | /health/ping |
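The endpoints can be probed from a workstation via a port-forward. A sketch, assuming the Kubernetes Services are named after the entries in the table above:
# Forward the backend service locally and hit its health endpoint.
kubectl port-forward svc/whisperer-backend 3006:3006 -n $NAMESPACE &
curl -s http://localhost:3006/health/ping

# Same check for the dashboard.
kubectl port-forward svc/llmwhisperer-dashboard 3007:3007 -n $NAMESPACE &
curl -s http://localhost:3007/health/ping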
Validation Steps
- Check that all pods in the namespace are running without restarts:
  kubectl get pods -n $NAMESPACE
- Validate the ingress configured for both the LLMWhisperer dashboard and the backend
- Log in to the dashboard using the credentials configured in onprem.values.yaml
- Validate the backend API — refer to the API documentation
5. Upgrading
- Configure onprem.values.yaml as required for the target release version
- Run the upgrade command:
helm upgrade whisperer oci://us-central1-docker.pkg.dev/pandoras-tamer/charts/llmwhisperer \
--version <version> \
-f /path/to/onprem.values.yaml \
-f /path/to/onprem-profile.values.yaml \
-n $NAMESPACE
If you are using an AWS ingress and upgrading from a version older than v2.36.0, ensure the following annotation is present: alb.ingress.kubernetes.io/target-type: ip (see Appendix a).
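A sketch for checking and adding the annotation (the ingress name is a placeholder):
# Inspect the current annotations.
kubectl get ingress -n $NAMESPACE -o yaml | grep target-type

# Add the annotation if it is missing.
kubectl annotate ingress <ingress-name> -n $NAMESPACE \
  alb.ingress.kubernetes.io/target-type=ip --overwrite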
6. Admin Login / Onboarding
Once LLMWhisperer is successfully deployed:
- Log in to the LLMWhisperer Dashboard using the INITIAL_PASSWORD configured during installation
- Change the password after first login
Appendix
a. Ingress Configuration
All ingress types must support a 900-second timeout.
AWS ALB Ingress Controller
Required annotation:
# REF: https://kubernetes-sigs.github.io/aws-load-balancer-controller/latest/how-it-works/#ip-mode
alb.ingress.kubernetes.io/target-type: ip
Nginx Ingress Controller
Required annotations (Community Version syntax):
# Default is 60. Must be increased to 900.
nginx.ingress.kubernetes.io/proxy-read-timeout: "900"
# Default is 1m. Must be increased for large document uploads.
# REF: https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/
nginx.ingress.kubernetes.io/proxy-body-size: "200m"
Avoid using the nginx.ingress.kubernetes.io/rewrite-target annotation. In Community NGINX Controller versions >= v0.22.0, the old rewrite-target: / syntax causes authentication failures (401 Unauthorized responses). If you encounter login issues, remove any rewrite-target annotations from your ingress configuration.
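To check whether any ingress in the namespace still carries the annotation:
# Any output here means a rewrite-target annotation is still present.
kubectl get ingress -n $NAMESPACE -o yaml | grep rewrite-target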
b. Outgoing Data (OCR Billing)
In an on-prem deployment, the only outgoing data from the OCR containers is billing information sent to Azure for metering purposes. No document content leaves your infrastructure.
You can find details about how Azure container billing works here.
The following billing data is sent to Unstract for license metering:
{
"subscription_id": "<subscription_id:uuid4>",
"deployment_id": "<deployment_id:uuid4>",
"page_count_total": "<total_page_count:int>",
"native_text_page_count": "<non_ocr_page_count:int>",
"low_cost_page_count": "<low_cost_page_count:int>",
"high_quality_page_count": "<high_quality_page_count:int>",
"form_page_count": "<form_page_count:int>",
"from_date": "<timestamp>",
"to_date": "<timestamp>"
}
c. Useful Commands
Kubernetes:
kubectl get pod -n <namespace>
kubectl describe pod <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace>
Helm:
helm list -n <namespace>
helm show values oci://us-central1-docker.pkg.dev/pandoras-tamer/charts/llmwhisperer --version <version>
helm rollback whisperer <revision-number> -n <namespace>
helm uninstall whisperer -n <namespace>