Version: 2.0.0

On-Prem Deployment Guide

This guide provides comprehensive instructions for deploying LLMWhisperer in an on-premises environment. LLMWhisperer is deployed inside a Kubernetes cluster, packaged as a Helm chart.

Overview

LLMWhisperer On-Prem is a self-hosted deployment that runs entirely within your infrastructure. It includes:

  • LLMWhisperer Backend — the core text extraction API service
  • LLMWhisperer Dashboard — a web UI for usage monitoring and management
  • OCR Workers — document processing workers that scale based on load
  • RabbitMQ — message broker for distributed task processing
  • Redis — caching layer for performance optimization

1. Infrastructure Prerequisites

Kubernetes Cluster

  • Recommended version: >= 1.29 (latest tested: 1.33)
  • Node autoscaling should be enabled
  • Create the cluster in a single Availability Zone (recommended); some StatefulSet workloads do not yet support HA, and multi-AZ deployments can lead to volume attach errors
  • Ingress controller as a K8s cluster add-on for load balancer creation (recommended)
    • The ingress must be configured with a timeout of 900 seconds to work as expected (see Appendix a)
  • In-house or cloud provider observability stack (recommended)

PostgreSQL Database

  • Supported version: 15.0
  • Minimum specs: 1 vCPU, 8 GiB RAM, 50 GiB SSD
  • Autoscale enabled (recommended)
  • A dedicated database for LLMWhisperer should be created within the PostgreSQL instance

DNS & SSL

  • A domain for pointing to LLMWhisperer (e.g., llmwhisperer.<customer-domain>.com)
  • An active SSL certificate is required for the domain

Node Profile

Add 50 GiB SSD for application data to each machine.

| Machine Type | Label | Taint (NoSchedule) | Min | Max |
| --- | --- | --- | --- | --- |
| 8 vCPU and 32 GiB | service: llmwhisperer | service: llmwhisperer | 1 | 60 |

GPU Nodes (Optional — for document insights mode)

| Cloud Provider | Instance Type | GPU Family | Label | Taint (NoSchedule) | Min | Max |
| --- | --- | --- | --- | --- | --- | --- |
| AWS | g6.xlarge | NVIDIA L4 Tensor Core | service: llmwhisperer-gpu | service: llmwhisperer-gpu | 1 | 1 |
| GCP | g2-standard-4 | NVIDIA L4 Tensor Core | service: llmwhisperer-gpu | service: llmwhisperer-gpu | 1 | 1 |
warning

Deploy all workloads on non-spot (on-demand) node pools.
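
For reference, a pod scheduled onto these nodes would carry a node selector matching the label and a toleration matching the NoSchedule taint. The Helm chart's profile values are expected to configure this already; the fragment below is purely illustrative of how the label and taint interact:

```yaml
# Illustrative pod-spec scheduling fragment for the node profile above.
# The chart's profile values are expected to set this up; shown only to
# clarify how the node label and the NoSchedule taint work together.
nodeSelector:
  service: llmwhisperer
tolerations:
  - key: "service"
    operator: "Equal"
    value: "llmwhisperer"
    effect: "NoSchedule"
```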

2. Configuration

Files Provided by Unstract Team

The following files will be provided by the Unstract team:

| File | Description |
| --- | --- |
| artifact-key.json | GCP service account key for Helm chart registry login and container image pull |
| sample.onprem.values.yaml | Sample Helm chart values (non-sensitive configuration) |
| onprem-profile.values.yaml | Profile values for resource allocation and scaling configuration |

Required Configuration Values

These values must be provided by the customer or the Unstract team to deploy LLMWhisperer:

| Variable | Description | Source |
| --- | --- | --- |
| DB_LLMW_HOST | PostgreSQL host | Customer |
| DB_LLMW_USERNAME | PostgreSQL username | Customer |
| DB_LLMW_PASSWORD | PostgreSQL password | Customer |
| DB_LLMW_NAME | PostgreSQL database name | Customer |
| ENCRYPTION_KEY | Encryption key for sensitive data — must be backed up securely | Self-generated |
| LICENSE_PORTAL_API_KEY | License portal API key | Unstract Team |
| endpoint (azureOcrBilling) | Azure Cognitive Services OCR endpoint | Unstract Team |
| apiKey (azureOcrBilling) | Azure OCR API key | Unstract Team |
| INITIAL_PASSWORD | Initial admin password for the dashboard | Customer |
| X_CELERY_BROKER_USERNAME | RabbitMQ username | Customer |
| X_CELERY_BROKER_PASSWORD | RabbitMQ password | Customer |
warning

The ENCRYPTION_KEY is used to encrypt data at rest and is required when retrieving the data. Do not rotate, delete, or lose this key — doing so will render existing encrypted data inaccessible.
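
The guide does not specify the exact key format LLMWhisperer expects; assuming an arbitrary high-entropy string is acceptable, one common way to generate a key is with openssl (confirm the format with the Unstract team before use):

```shell
# Generate a random 32-byte key, base64-encoded (44 characters).
# NOTE: the accepted key format is an assumption here; verify it with
# the Unstract team, then back the key up securely (e.g., in a vault).
ENCRYPTION_KEY=$(openssl rand -base64 32)
echo "$ENCRYPTION_KEY"
```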

Using Kubernetes Secrets (existingSecret)

Each configuration section in sample.onprem.values.yaml supports two approaches for providing sensitive values:

Option 1: Inline values — Provide values directly in the values file. Suitable for initial setup and testing.

global:
  sharedConfigs:
    database:
      DB_LLMW_HOST: "postgres.example.com"
      DB_LLMW_USERNAME: "postgres"
      DB_LLMW_PASSWORD: "your-password"
      DB_LLMW_NAME: "llmwhisperer"

Option 2: Kubernetes secrets (recommended for production) — Pre-create Kubernetes secrets with the matching variable names as keys, then reference the secret name via existingSecret. This avoids storing sensitive values in the Helm values file.

global:
  sharedConfigs:
    database:
      existingSecret: "llmwhisperer-db-credentials"

The following configuration sections support existingSecret:

| Section | Example Secret Name | Keys |
| --- | --- | --- |
| global.sharedConfigs.database | llmwhisperer-db-credentials | DB_LLMW_HOST, DB_LLMW_USERNAME, DB_LLMW_PASSWORD, DB_LLMW_NAME, DB_LLMW_PORT |
| global.sharedConfigs.redis | llmwhisperer-redis-credentials | REDIS_HOST, REDIS_PORT, REDIS_DB, REDIS_PASSWORD, REDIS_USER |
| global.sharedConfigs.workerRedis | llmwhisperer-worker-redis-credentials | WORKER_REDIS_HOST, WORKER_REDIS_PORT, WORKER_REDIS_DB, WORKER_REDIS_PASSWORD |
| global.sharedConfigs.celeryBroker | llmwhisperer-celery-broker-credentials | X_CELERY_BROKER_BASE_URL, X_CELERY_BROKER_USERNAME, X_CELERY_BROKER_PASSWORD, X_CELERY_BACKEND_URL |
| global.sharedConfigs.apiKeys | llmwhisperer-api-keys | ENCRYPTION_KEY |
| global.sharedConfigs.dashboardCredentials | llmwhisperer-dashboard-credentials | INITIAL_USER_NAME, INITIAL_PASSWORD |
| global.sharedConfigs.license | llmwhisperer-license-secret | LICENSE_PORTAL_URL, LICENSE_PORTAL_API_KEY |
| global.azureOcrBilling | azure-ocr-billing-credentials | endpoint, apiKey |

Example of creating a Kubernetes secret:

kubectl create secret generic llmwhisperer-db-credentials \
--namespace $NAMESPACE \
--from-literal=DB_LLMW_HOST="postgres.example.com" \
--from-literal=DB_LLMW_USERNAME="postgres" \
--from-literal=DB_LLMW_PASSWORD="your-password" \
--from-literal=DB_LLMW_NAME="llmwhisperer" \
--from-literal=DB_LLMW_PORT="5432"
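
Equivalently, the secret can be declared as a manifest and applied with kubectl apply. The values below are placeholders; stringData accepts plain text, which Kubernetes base64-encodes on write:

```yaml
# Declarative equivalent of the kubectl create secret command above.
# Apply with: kubectl apply -f secret.yaml -n $NAMESPACE
apiVersion: v1
kind: Secret
metadata:
  name: llmwhisperer-db-credentials
type: Opaque
stringData:
  DB_LLMW_HOST: "postgres.example.com"
  DB_LLMW_USERNAME: "postgres"
  DB_LLMW_PASSWORD: "your-password"
  DB_LLMW_NAME: "llmwhisperer"
  DB_LLMW_PORT: "5432"
```

For production, consider generating such manifests from your secret store (e.g., via external-secrets tooling) rather than committing them to version control.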

3. Installation (One-Time)

Step 1: Check Cluster Connectivity

kubectl cluster-info

Step 2: Deploy RabbitMQ Operator (Once Per Cluster)

The RabbitMQ operator provisions the RabbitMQ cluster within the namespace via its CRD. Refer to the official documentation.

kubectl apply -f "https://github.com/rabbitmq/cluster-operator/releases/latest/download/cluster-operator.yml"

Step 3: Create Namespace

export NAMESPACE=<namespace_name>
kubectl create namespace $NAMESPACE

Step 4: Authenticate Helm Registry

cat artifact-key.json | helm registry login -u _json_key --password-stdin https://us-central1-docker.pkg.dev

Step 5: Create Image Pull Secret

kubectl create secret docker-registry artifact-registry \
--namespace $NAMESPACE \
--docker-server=us-central1-docker.pkg.dev \
--docker-username=_json_key \
--docker-password="$(cat artifact-key.json)"

Validate the secret was created successfully:

kubectl get secret artifact-registry -n $NAMESPACE -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d

Step 6: Configure Values File

  1. Create a copy of sample.onprem.values.yaml as onprem.values.yaml
  2. Fill in all values marked with # <REQUIRED> — refer to the Configuration section for details on each value

Step 7: Install Helm Chart

  • The default profile requires 3 x 8 vCPU / 32 GiB nodes
  • Processes ~1,800 pages/hour at a maximum concurrency of 10 pages, with a response time of 12–14 seconds
  • With HPA enabled: up to ~7,200 pages/hour at a concurrency of 30 pages, with a response time of 15–16 seconds (uses ~9 x 8 vCPU machines)
  • Capacity can be further tuned based on the processing modes in use
helm install whisperer oci://us-central1-docker.pkg.dev/pandoras-tamer/charts/llmwhisperer \
--version <version> \
-f /path/to/onprem.values.yaml \
-f /path/to/onprem-profile.values.yaml \
-n $NAMESPACE

Replace <version> with the target release version (see Version History).

4. Deployment Validation

Health Checks

| Service | Port | Network Type | Endpoint |
| --- | --- | --- | --- |
| whisperer-backend | 3006 | HTTP | /health/ping |
| llmwhisperer-dashboard | 3007 | HTTP | /health/ping |

Validation Steps

  1. Check that all pods in the namespace are running without restarts:

    kubectl get pods -n $NAMESPACE
  2. Validate the ingress configured for both the LLMWhisperer dashboard and the backend

  3. Log in to the dashboard using the credentials configured in onprem.values.yaml

  4. Validate the backend API — refer to the API documentation

5. Upgrading

  1. Configure onprem.values.yaml as required for the target release version
  2. Run the upgrade command:
helm upgrade whisperer oci://us-central1-docker.pkg.dev/pandoras-tamer/charts/llmwhisperer \
--version <version> \
-f /path/to/onprem.values.yaml \
-f /path/to/onprem-profile.values.yaml \
-n $NAMESPACE
info

If you are using the AWS ALB Ingress Controller and upgrading from a version older than v2.36.0, ensure the following annotation is present: alb.ingress.kubernetes.io/target-type: ip (see Appendix a).

6. Admin Login / Onboarding

Once LLMWhisperer is successfully deployed:

  1. Log in to the LLMWhisperer Dashboard using the INITIAL_USER_NAME and INITIAL_PASSWORD configured during installation
  2. Change the password after first login

Appendix

a. Ingress Configuration

All ingress types must support a 900-second timeout.

AWS ALB Ingress Controller

  • Ingress configuration in EKS Auto Mode

  • Required annotation:

    # REF: https://kubernetes-sigs.github.io/aws-load-balancer-controller/latest/how-it-works/#ip-mode
    alb.ingress.kubernetes.io/target-type: ip

Nginx Ingress Controller

Required annotations:

# Community controller: default is 60 seconds. Must be increased to 900.
nginx.ingress.kubernetes.io/proxy-read-timeout: "900"
# Community controller: default body size limit is 1 MB. Must be increased for large document uploads.
nginx.ingress.kubernetes.io/proxy-body-size: "200m"
# NGINX Inc (nginx.org) controller equivalent for the body size limit:
# REF: https://docs.nginx.com/nginx-ingress-controller/configuration/ingress-resources/advanced-configuration-with-annotations/
nginx.org/client-max-body-size: "200m"
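
Putting the annotations together, an ingress for the backend might look like the sketch below (hostname, ingress class, and backend service name/port are placeholders; adjust them to your deployment):

```yaml
# Illustrative Ingress for the backend using the community NGINX controller.
# Host and backend names are assumptions, not chart-generated values.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: llmwhisperer-backend
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: "900"
    nginx.ingress.kubernetes.io/proxy-body-size: "200m"
spec:
  ingressClassName: nginx
  rules:
    - host: llmwhisperer.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: whisperer-backend
                port:
                  number: 3006
```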
warning

Avoid using the nginx.ingress.kubernetes.io/rewrite-target annotation. In Community NGINX Controller versions >= v0.22.0, the old rewrite-target: / syntax causes authentication failures (401 Unauthorized responses). If you encounter login issues, remove any rewrite-target annotations from your ingress configuration.

b. Outgoing Data (OCR Billing)

In an on-prem deployment, the only outgoing data from the OCR containers is billing information sent to Azure for metering purposes. No document content leaves your infrastructure.

You can find details about how Azure container billing works here.

The following billing data is sent to Unstract for license metering:

{
  "subscription_id": "<subscription_id:uuid4>",
  "deployment_id": "<deployment_id:uuid4>",
  "page_count_total": "<total_page_count:int>",
  "native_text_page_count": "<non_ocr_page_count:int>",
  "low_cost_page_count": "<low_cost_page_count:int>",
  "high_quality_page_count": "<high_quality_page_count:int>",
  "form_page_count": "<form_page_count:int>",
  "from_date": "<timestamp>",
  "to_date": "<timestamp>"
}

c. Useful Commands

Kubernetes:

kubectl get pod -n <namespace>
kubectl describe pod <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace>

Helm:

helm list -n <namespace>

helm show values oci://us-central1-docker.pkg.dev/pandoras-tamer/charts/llmwhisperer --version <version>

helm rollback whisperer <revision-number> -n <namespace>

helm uninstall whisperer -n <namespace>