Post-Processing Webhook

Overview

The Post-Processing Webhook feature allows you to send the LLM's extracted data to an external webhook endpoint for custom processing, validation, or transformation before the final output is returned. This feature is particularly useful when you need to:

Apply custom business logic to the extracted data
Validate extracted values against external systems
Transform or enrich the data with additional information
Integrate with external APIs or services for real-time processing

The webhook receives the structured output from the LLM and can modify it before it's returned as the final result.

Info: This feature is currently available only for prompts with Enforce Type: JSON.

Configuration

The Post-Processing Webhook is configured on a per-prompt basis within Prompt Studio.

Steps to Enable Post-Processing Webhook

1. Select Your Prompt: Navigate to the prompt card you want to configure in the Document Parser view.
2. Enable the Feature:
- Locate the checkbox labeled "Enable Postprocessing Webhook"
- Check the box to enable the feature
3. Enter Webhook URL:
- In the text field that appears, enter the full URL of your webhook endpoint
- Example: https://example.com/webhook
- Important: The URL must use HTTPS (HTTP URLs are rejected for security reasons)
4. Save Configuration: The settings are saved automatically once you enter the URL.

Note: The webhook URL field shows the placeholder text "Enter service URL for JSON postprocessing".

Webhook Request Format

When the LLM completes extraction and returns a structured JSON output, the system will send a POST request to your configured webhook URL.

Request Headers

Content-Type: application/json

Request Payload Structure

The webhook receives a JSON payload with the following structure:

{
  "structured_output": {
    // The extracted data from the LLM in the format defined by your prompt
  },
  "highlight_data": [
    // Optional: Array of highlight information showing where data was found in the source document
    // Only included if highlighting is enabled for the prompt
  ]
}

Example Request Payload

{
  "structured_output": {
    "invoice_number": "INV-2024-001",
    "total_amount": "1250.00",
    "date": "2024-01-15",
    "vendor_name": "Acme Corp"
  },
  "highlight_data": [
    {
      "field": "invoice_number",
      "page": 1,
      "coordinates": [100, 200, 300, 220]
    }
  ]
}
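
To exercise an endpoint before configuring it in Prompt Studio, the documented request can be reproduced by hand. The following is a minimal sketch that builds a POST request carrying the example payload using only the Python standard library. The helper names `build_webhook_request` and `send` are illustrative and not part of the product, and the real system may send headers beyond the one documented here.

```python
import json
import urllib.request

# Example payload copied from the documentation above.
SAMPLE_PAYLOAD = {
    "structured_output": {
        "invoice_number": "INV-2024-001",
        "total_amount": "1250.00",
        "date": "2024-01-15",
        "vendor_name": "Acme Corp",
    },
    "highlight_data": [
        {"field": "invoice_number", "page": 1, "coordinates": [100, 200, 300, 220]}
    ],
}


def build_webhook_request(url: str, payload: dict) -> urllib.request.Request:
    """Build a POST request shaped like the documented webhook call."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON reply (performs real network I/O)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())
```

Calling `send(build_webhook_request("https://example.com/webhook", SAMPLE_PAYLOAD))` against your own endpoint should return the response body your webhook produces.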

Webhook Response Format

Your webhook endpoint must respond with a JSON payload that includes the processed data. The system will use this response to update the final output.

Expected Response Structure

{
  "structured_output": {
    // Your modified/validated/enriched data
  },
  "highlight_data": [
    // Optional: Updated highlight information (must be a list)
  ]
}

Response Requirements

HTTP Status Code: Must return 200 OK for successful processing
Content-Type: Must be application/json
structured_output: This field is required and must contain either:
- A JSON object ({})
- A JSON array ([])
highlight_data: This field is optional but if provided, must be a list/array
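
One way to satisfy these requirements consistently is to build every response through a small helper that rejects invalid shapes before they leave your service. The sketch below is an illustration only; `build_webhook_response` is a hypothetical helper name, not an API provided by the system.

```python
from typing import Any, Optional


def build_webhook_response(structured_output: Any,
                           highlight_data: Optional[Any] = None) -> dict:
    """Assemble a response body that meets the documented requirements."""
    # structured_output is required and must be a JSON object or array.
    if not isinstance(structured_output, (dict, list)):
        raise TypeError("structured_output must be a dict or a list")

    response = {"structured_output": structured_output}

    # highlight_data is optional, but must be a list when present.
    if highlight_data is not None:
        if not isinstance(highlight_data, list):
            raise TypeError("highlight_data must be a list when provided")
        response["highlight_data"] = highlight_data

    return response
```

Raising inside your own service makes a malformed response visible in your logs, rather than having the system silently fall back to the original LLM output.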

Example Response

{
  "structured_output": {
    "invoice_number": "INV-2024-001",
    "total_amount": "1250.00",
    "total_amount_formatted": "$1,250.00",
    "date": "2024-01-15",
    "date_formatted": "January 15, 2024",
    "vendor_name": "Acme Corp",
    "vendor_id": "VND-12345",
    "validation_status": "approved"
  },
  "highlight_data": [
    {
      "field": "invoice_number",
      "page": 1,
      "coordinates": [100, 200, 300, 220]
    }
  ]
}

Webhook Behavior

Timeout

The webhook request has a fixed 60-second timeout.
If your webhook doesn't respond within 60 seconds, the original LLM output is used.
This leaves room for complex processing while ensuring the extraction pipeline never hangs indefinitely.
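
If your processing time is unpredictable, one option is to enforce your own time budget inside the webhook and return the data unchanged when the budget runs out, so that the response still arrives before the 60-second limit. The sketch below assumes a budget of 50 seconds, which is an arbitrary choice, and `process_within_budget` is a hypothetical helper name.

```python
import copy
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def process_within_budget(process, structured, budget_seconds=50.0):
    """Run process(structured) and give up after budget_seconds.

    Returns (result, True) on success, or (structured, False) if the
    budget was exceeded. The worker gets a deep copy, so the value
    returned on timeout is never half-modified.
    """
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(process, copy.deepcopy(structured))
    try:
        return future.result(timeout=budget_seconds), True
    except FutureTimeout:
        return structured, False
    finally:
        # Do not block on the still-running worker thread.
        executor.shutdown(wait=False)
```

Note that the worker thread is not cancelled on timeout; it keeps running in the background until it finishes, so this only bounds the response time, not the work itself.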

Error Handling

The system gracefully handles various error scenarios:

Error Scenario	System Behavior	Notes
Non-200 HTTP status	Returns original LLM output	A warning is logged with the status code
Request timeout	Returns original LLM output	Timeout set to 60 seconds
Invalid JSON response	Returns original LLM output	JSON parsing errors are logged
Missing `structured_output`	Returns original LLM output	Warning logged about missing key
Invalid `structured_output` type	Returns original LLM output	Must be dict or list
Invalid `highlight_data` type	Uses original highlight data	Updated data must be a list
Network errors	Returns original LLM output	Connection failures, DNS errors, etc.
Unexpected exceptions	Returns original LLM output	All errors are caught and logged

Tip: The system is designed to be fault-tolerant. If your webhook fails for any reason, the extraction will still complete successfully using the original LLM output.
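
The fallback rules in the table can be summarized as a single decision function. The following sketch is a reading of the documented behavior, not the system's actual implementation; the function name `resolve_final_output` is hypothetical, and treating a missing `highlight_data` key the same as an invalid one is an assumption.

```python
import json


def resolve_final_output(original_output, original_highlight,
                         status_code, response_body):
    """Return (structured_output, highlight_data) per the documented rules."""
    fallback = (original_output, original_highlight)

    # Non-200 HTTP status: keep the original LLM output.
    if status_code != 200:
        return fallback

    # Invalid JSON response: keep the original LLM output.
    try:
        parsed = json.loads(response_body)
    except (TypeError, ValueError):
        return fallback

    # Missing structured_output key: keep the original LLM output.
    if not isinstance(parsed, dict) or "structured_output" not in parsed:
        return fallback

    # structured_output must be a dict or a list.
    structured = parsed["structured_output"]
    if not isinstance(structured, (dict, list)):
        return fallback

    # Invalid (or, by assumption, absent) highlight_data: keep the original.
    highlight = parsed.get("highlight_data")
    if not isinstance(highlight, list):
        highlight = original_highlight

    return structured, highlight
```

Timeouts and network errors are not modeled here because they occur before any response exists; in those cases the table states that the original output is returned as well.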

Security Considerations

HTTPS Only: Only HTTPS URLs are allowed for webhook endpoints. HTTP URLs will be rejected for security reasons.
SSRF Protection: The system validates webhook URLs to prevent Server-Side Request Forgery (SSRF) attacks:
- Blocks private/loopback addresses (localhost, 127.0.0.1, etc.)
- Blocks internal network addresses (192.168.x.x, 10.x.x.x, etc.)
- Validates all DNS records to prevent DNS rebinding attacks
No Redirects: The webhook client does not follow redirects to prevent redirect-based SSRF attacks
No Authentication: Currently, the webhook does not support authentication headers (planned for future releases)
Validate Input: Your webhook should validate the incoming data structure
Sanitize Output: Ensure your webhook response doesn't introduce malicious data
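
Before entering a URL in Prompt Studio, you can approximate the HTTPS and SSRF checks locally to see whether the URL is likely to be accepted. The sketch below is not the system's validator: the function names are hypothetical, the exact set of blocked ranges on the server side is not documented here, and a local check cannot reproduce protections applied at request time.

```python
import ipaddress
import socket
from typing import Callable, List
from urllib.parse import urlparse


def resolve_all(host: str) -> List[str]:
    """Return every address the host resolves to (performs a DNS lookup)."""
    infos = socket.getaddrinfo(host, None)
    return sorted({info[4][0] for info in infos})


def is_webhook_url_allowed(url: str,
                           resolver: Callable[[str], List[str]] = resolve_all) -> bool:
    """Approximate the documented checks: HTTPS only, no private or loopback targets."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.hostname:
        return False

    try:
        addresses = resolver(parsed.hostname)
    except OSError:
        return False  # DNS failure: the endpoint is unreachable anyway
    if not addresses:
        return False

    # Every resolved record must be public; one bad record rejects the URL.
    for addr in addresses:
        try:
            ip = ipaddress.ip_address(addr)
        except ValueError:
            return False
        if ip.is_private or ip.is_loopback or ip.is_link_local:
            return False
    return True
```

The `resolver` parameter exists so the check can be exercised without network access, for example `is_webhook_url_allowed("https://localhost/hook", resolver=lambda h: ["127.0.0.1"])`, which evaluates to `False`.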

Use Cases

1. Data Validation

Validate extracted values against business rules or external databases:

# Example webhook endpoint for validation.
# Assumes FastAPI; the examples in this section reuse this `app` object.
# is_valid_invoice_number and lookup_vendor are placeholders for your own logic.
from fastapi import FastAPI

app = FastAPI()

@app.post("/validate-invoice")
def validate_invoice(data: dict):
    structured = data["structured_output"]

    # Validate invoice number format
    if not is_valid_invoice_number(structured["invoice_number"]):
        structured["validation_errors"] = ["Invalid invoice number format"]

    # Check if vendor exists in system
    vendor = lookup_vendor(structured["vendor_name"])
    if vendor:
        structured["vendor_id"] = vendor["id"]

    return {"structured_output": structured}

2. Data Enrichment

Add additional information from external systems:

@app.post("/enrich-customer-data")
def enrich_customer(data: dict):
    structured = data["structured_output"]

    # Look up customer details
    customer = get_customer_by_email(structured["email"])
    if customer:
        structured["customer_id"] = customer["id"]
        structured["customer_tier"] = customer["tier"]
        structured["lifetime_value"] = customer["ltv"]

    return {"structured_output": structured}

3. Data Transformation

Format or standardize the extracted data:

@app.post("/format-financial-data")
def format_data(data: dict):
    structured = data["structured_output"]

    # Convert date formats
    structured["date_iso"] = convert_to_iso_date(structured["date"])

    # Format currency values
    structured["total_formatted"] = format_currency(
        structured["total_amount"],
        currency="USD"
    )

    # Standardize phone numbers
    structured["phone_e164"] = format_phone_number(structured["phone"])

    return {"structured_output": structured}

4. Multi-System Integration

Trigger actions in external systems:

@app.post("/process-order")
def process_order(data: dict):
    structured = data["structured_output"]

    # Create order in ERP system
    order = create_erp_order(structured)
    structured["erp_order_id"] = order["id"]

    # Send notification
    send_notification(structured["email"], order)

    # Update CRM
    update_crm_opportunity(structured["customer_id"], order)

    return {"structured_output": structured}

Testing Your Webhook

Using a Local Development Server

Because the system rejects localhost and private addresses, local testing requires a tunneling tool such as ngrok to expose your local webhook endpoint over a public HTTPS URL:

# Start your local webhook server
python webhook_server.py

# In another terminal, create a tunnel
ngrok http 8000

# Use the ngrok HTTPS URL in Prompt Studio
# Example: https://abc123.ngrok.io/webhook

Sample Python Webhook Server

from flask import Flask, request, jsonify

app = Flask(__name__)

def process_data(structured):
    # Placeholder: replace with your own processing logic.
    # Returning the input unchanged keeps the example runnable.
    return structured

@app.post("/webhook")
def postprocess():
    try:
        data = request.get_json()
        structured = data.get("structured_output", {})
        highlight = data.get("highlight_data")

        # Your processing logic here
        processed = process_data(structured)

        # Echo highlight_data back only if it was sent
        response = {"structured_output": processed}
        if highlight is not None:
            response["highlight_data"] = highlight

        return jsonify(response), 200

    except Exception as e:
        # Return error - system will use original data
        return jsonify({"error": str(e)}), 500

if __name__ == "__main__":
    app.run(port=8000)

Limitations

Authentication: No authentication mechanism is currently supported
JSON Only: Only available for prompts with Enforce Type: JSON
Timeout: Fixed 60-second timeout (not configurable)
No Retries: Failed webhook calls are not retried
Synchronous Only: The webhook is called synchronously during extraction

Best Practices

Keep Processing Fast: Your webhook should respond within 60 seconds to avoid timeouts
Return Valid JSON: Always return properly formatted JSON with the structured_output key
Handle Errors Gracefully: Return appropriate HTTP status codes
Log Everything: Maintain logs of all webhook requests for debugging
Validate Input: Don't assume the input structure is always correct
Use Idempotency: Design your webhook to handle duplicate requests safely
Monitor Performance: Track webhook response times and error rates
Test Thoroughly: Test with various input scenarios before deploying
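
As an illustration of the idempotency practice, a webhook can derive a key from the request content and reuse the stored result when the same payload arrives again. This is a sketch under assumptions: the documentation does not describe any request identifier sent by the system, so the key here is a hash of the payload itself, and the in-memory dictionary stands in for whatever store you would actually use.

```python
import hashlib
import json
from typing import Callable, Dict

# Illustrative in-memory store; unbounded and lost on restart.
_results: Dict[str, dict] = {}


def payload_key(payload: dict) -> str:
    """Hash a canonical JSON form so key order does not affect the key."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def handle_once(payload: dict, process: Callable[[dict], dict]) -> dict:
    """Run process(payload) only the first time a given payload is seen."""
    key = payload_key(payload)
    if key not in _results:
        _results[key] = process(payload)
    return _results[key]
```

This matters most for webhooks with side effects, such as creating an order in an external system, where processing the same document twice would otherwise create duplicates.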

Troubleshooting

Webhook Not Being Called

Verify the "Enable Postprocessing Webhook" checkbox is checked
Ensure the prompt's Enforce Type is set to JSON
Check that the webhook URL uses HTTPS (not HTTP)
Verify the webhook URL is not pointing to localhost or private IP addresses
Check that the webhook URL is valid and accessible from the internet
Verify your webhook endpoint is running and accepting connections

Original Data Returned Instead of Processed Data

Check your webhook is returning HTTP 200 status
Verify the response includes the structured_output key
Ensure response is valid JSON
Check webhook response time is under 60 seconds
Review system logs for error messages

Highlight Data Not Updated

Verify your response includes highlight_data as a list/array
Check that the data structure matches the expected format
Ensure you're not returning null or an invalid type

Related Features

Managing Grammar - Define custom synonyms for better extraction
Combined Output - Combine multiple prompt outputs
Output Analyzer - Analyze and validate prompt outputs
Creating and Managing Prompts - Learn about prompt configuration
