
Post-Processing Webhook

Overview

The Post-Processing Webhook feature allows you to send the LLM's extracted data to an external webhook endpoint for custom processing, validation, or transformation before the final output is returned. This feature is particularly useful when you need to:

  • Apply custom business logic to the extracted data
  • Validate extracted values against external systems
  • Transform or enrich the data with additional information
  • Integrate with external APIs or services for real-time processing

The webhook receives the structured output from the LLM and can modify it before it's returned as the final result.

Info: This feature is currently available only for prompts with Enforce Type: JSON.

Configuration

The Post-Processing Webhook is configured on a per-prompt basis within Prompt Studio.

Steps to Enable Post-Processing Webhook

  1. Select Your Prompt: Navigate to the prompt card you want to configure in the Document Parser view.

  2. Enable the Feature:

    • Locate the checkbox labeled "Enable Postprocessing Webhook"
    • Check the box to enable the feature
  3. Enter Webhook URL:

    • In the text field that appears, enter the full URL of your webhook endpoint
    • Example: https://example.com/webhook
    • Important: The URL must use HTTPS protocol (HTTP is not allowed for security reasons)
  4. Save Configuration: The settings are automatically saved once you enter the URL.

[Screenshot: Post-Processing Webhook configuration]

Note: The webhook URL field shows the placeholder text "Enter service URL for JSON postprocessing".

Webhook Request Format

When the LLM completes extraction and returns a structured JSON output, the system will send a POST request to your configured webhook URL.

Request Headers

Content-Type: application/json

Request Payload Structure

The webhook receives a JSON payload with the following structure:

{
  "structured_output": {
    // The extracted data from the LLM in the format defined by your prompt
  },
  "highlight_data": [
    // Optional: Array of highlight information showing where data was found in the source document
    // Only included if highlighting is enabled for the prompt
  ]
}

Example Request Payload

{
  "structured_output": {
    "invoice_number": "INV-2024-001",
    "total_amount": "1250.00",
    "date": "2024-01-15",
    "vendor_name": "Acme Corp"
  },
  "highlight_data": [
    {
      "field": "invoice_number",
      "page": 1,
      "coordinates": [100, 200, 300, 220]
    }
  ]
}
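Following the "Validate Input" best practice, a handler can check the incoming body against the structure above before using it. The following is a minimal, framework-agnostic sketch. `parse_webhook_request` is a hypothetical helper, and accepting an array for `structured_output` is an assumption that mirrors the documented response rules.

```python
import json
from typing import Any, Optional, Tuple


def parse_webhook_request(body: bytes) -> Tuple[Any, Optional[list]]:
    """Parse a post-processing webhook request body.

    Returns (structured_output, highlight_data). highlight_data is None when
    the key is absent (for example, when highlighting is disabled).
    Raises ValueError (json.JSONDecodeError is a subclass) on a bad shape.
    """
    payload = json.loads(body)
    if not isinstance(payload, dict):
        raise ValueError("payload must be a JSON object")
    if "structured_output" not in payload:
        raise ValueError("missing 'structured_output'")
    structured = payload["structured_output"]
    # Assumption: accept both objects and arrays, as the response rules do
    if not isinstance(structured, (dict, list)):
        raise ValueError("'structured_output' must be an object or array")
    highlight = payload.get("highlight_data")
    if highlight is not None and not isinstance(highlight, list):
        raise ValueError("'highlight_data' must be an array when present")
    return structured, highlight
```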

Webhook Response Format

Your webhook endpoint must respond with a JSON payload that includes the processed data. The system will use this response to update the final output.

Expected Response Structure

{
  "structured_output": {
    // Your modified/validated/enriched data
  },
  "highlight_data": [
    // Optional: Updated highlight information (must be a list)
  ]
}

Response Requirements

  • HTTP Status Code: Must return 200 OK for successful processing
  • Content-Type: Must be application/json
  • structured_output: This field is required and must contain either:
    • A JSON object ({})
    • A JSON array ([])
  • highlight_data: This field is optional but if provided, must be a list/array
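Before pointing Prompt Studio at your endpoint, you can check your webhook's responses against these requirements in your own tests. The following is a sketch. `check_webhook_response` is a hypothetical helper, not part of the product. It returns a list of problems instead of raising, so a single test run reports every issue at once.

```python
from typing import Any, List


def check_webhook_response(status_code: int, payload: Any) -> List[str]:
    """Return the ways a webhook response breaks the documented rules.

    An empty list means the response satisfies the requirements:
    status 200, a JSON object body, a 'structured_output' that is an
    object or array, and (if present) a 'highlight_data' that is an array.
    """
    problems: List[str] = []
    if status_code != 200:
        problems.append(f"status code is {status_code}, expected 200")
    if not isinstance(payload, dict):
        problems.append("response body must be a JSON object")
        return problems
    if "structured_output" not in payload:
        problems.append("missing required key 'structured_output'")
    elif not isinstance(payload["structured_output"], (dict, list)):
        problems.append("'structured_output' must be a JSON object or array")
    # null is not a list, so an explicit "highlight_data": null is flagged too
    if "highlight_data" in payload and not isinstance(payload["highlight_data"], list):
        problems.append("'highlight_data' must be a JSON array when provided")
    return problems
```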

Example Response

{
  "structured_output": {
    "invoice_number": "INV-2024-001",
    "total_amount": "1250.00",
    "total_amount_formatted": "$1,250.00",
    "date": "2024-01-15",
    "date_formatted": "January 15, 2024",
    "vendor_name": "Acme Corp",
    "vendor_id": "VND-12345",
    "validation_status": "approved"
  },
  "highlight_data": [
    {
      "field": "invoice_number",
      "page": 1,
      "coordinates": [100, 200, 300, 220]
    }
  ]
}

Webhook Behavior

Timeout

  • The webhook request has a 60-second timeout
  • If your webhook doesn't respond within 60 seconds, the original LLM output will be used
  • This provides sufficient time for complex processing while ensuring the extraction pipeline doesn't hang indefinitely

Error Handling

The system gracefully handles various error scenarios:

| Error scenario | System behavior | Notes |
|---|---|---|
| Non-200 HTTP status | Returns original LLM output | A warning is logged with the status code |
| Request timeout | Returns original LLM output | Timeout is 60 seconds |
| Invalid JSON response | Returns original LLM output | JSON parsing errors are logged |
| Missing structured_output | Returns original LLM output | A warning about the missing key is logged |
| Invalid structured_output type | Returns original LLM output | Must be a dict (JSON object) or list (JSON array) |
| Invalid highlight_data type | Uses original highlight data | Updated highlight data must be a list |
| Network errors | Returns original LLM output | Connection failures, DNS errors, etc. |
| Unexpected exceptions | Returns original LLM output | All errors are caught and logged |
Tip: The system is designed to be fault-tolerant. If your webhook fails for any reason, the extraction still completes successfully using the original LLM output.
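The fallback rules in the error-handling table can be modelled as a single function, which helps when reasoning about or testing your webhook. The following is a hypothetical Python model, not the system's actual implementation. It assumes two behaviors the table does not state: when only `highlight_data` is invalid, the new `structured_output` is still used; and an omitted `highlight_data` keeps the original highlights.

```python
import json
from typing import Any, Optional, Tuple


def resolve_final_output(
    original_output: Any,
    original_highlight: Optional[list],
    status_code: Optional[int],
    body: Optional[str],
) -> Tuple[Any, Optional[list]]:
    """Model of the documented fallback rules (hypothetical).

    status_code and body are None when no response arrived at all
    (timeout, DNS failure, connection refused, ...).
    """
    fallback = (original_output, original_highlight)
    # Timeouts and network errors: nothing to use
    if status_code is None or body is None:
        return fallback
    # Non-200 status: keep the original LLM output
    if status_code != 200:
        return fallback
    try:
        payload = json.loads(body)
    except ValueError:  # invalid JSON (JSONDecodeError subclasses ValueError)
        return fallback
    if not isinstance(payload, dict):
        return fallback
    structured = payload.get("structured_output")
    # Missing or wrongly typed structured_output: keep the original
    if not isinstance(structured, (dict, list)):
        return fallback
    # Assumption: an absent highlight_data keeps the original highlights
    highlight = payload.get("highlight_data", original_highlight)
    # Invalid highlight_data: keep original highlights, still use new output
    if not isinstance(highlight, list):
        highlight = original_highlight
    return structured, highlight
```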

Security Considerations

  • HTTPS Only: Only HTTPS URLs are allowed for webhook endpoints. HTTP URLs will be rejected for security reasons.
  • SSRF Protection: The system validates webhook URLs to prevent Server-Side Request Forgery (SSRF) attacks:
    • Blocks private/loopback addresses (localhost, 127.0.0.1, etc.)
    • Blocks internal network addresses (192.168.x.x, 10.x.x.x, etc.)
    • Validates all DNS records to prevent DNS rebinding attacks
  • No Redirects: The webhook client does not follow redirects to prevent redirect-based SSRF attacks
  • No Authentication: Currently, the webhook does not support authentication headers (planned for future releases)
  • Validate Input: Your webhook should validate the incoming data structure
  • Sanitize Output: Ensure your webhook response doesn't introduce malicious data
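To check in advance whether a URL is likely to pass these checks, you can approximate them locally with the standard library (`urllib.parse`, `socket`, `ipaddress`). This is a sketch. `validate_webhook_url` is a hypothetical name, and blocking every address that `ipaddress` does not consider global is an assumption; the real validator may be stricter or looser. Checking resolved addresses once does not by itself stop DNS rebinding. That also requires connecting to the address that was checked.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def validate_webhook_url(url: str) -> None:
    """Raise ValueError if url is not an acceptable webhook target.

    Rejects non-HTTPS URLs and any host where *any* resolved address is
    non-global (loopback, private, link-local, reserved, ...).
    """
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError("webhook URL must use https")
    host = parts.hostname
    if not host:
        raise ValueError("webhook URL has no host")
    try:
        infos = socket.getaddrinfo(host, parts.port or 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        raise ValueError(f"cannot resolve {host!r}") from exc
    # Check every record, not just the first, so one bad record fails the URL
    for _family, _type, _proto, _canonname, sockaddr in infos:
        addr = ipaddress.ip_address(sockaddr[0])
        if not addr.is_global:
            raise ValueError(f"{host!r} resolves to non-public address {addr}")
```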

Use Cases

1. Data Validation

Validate extracted values against business rules or external databases:

# Example webhook endpoint for validation
# (is_valid_invoice_number and lookup_vendor are placeholders for your own logic)
@app.post("/validate-invoice")
def validate_invoice(data: dict):
    structured = data["structured_output"]

    # Validate invoice number format (.get avoids a KeyError if the field is missing)
    if not is_valid_invoice_number(structured.get("invoice_number")):
        structured["validation_errors"] = ["Invalid invoice number format"]

    # Check if vendor exists in system
    vendor = lookup_vendor(structured.get("vendor_name"))
    if vendor:
        structured["vendor_id"] = vendor["id"]

    return {"structured_output": structured}
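The validation example leaves `is_valid_invoice_number` undefined. One possible implementation, assuming invoice numbers follow the `INV-2024-001` pattern from the example payload, is a full-string regular-expression match. The pattern is hypothetical; replace it with your own format rules.

```python
import re

# Hypothetical format modelled on "INV-2024-001":
# "INV-", a four-digit year, "-", then three or more digits.
INVOICE_NUMBER_RE = re.compile(r"INV-\d{4}-\d{3,}")


def is_valid_invoice_number(value: object) -> bool:
    """Return True if value is a string in the assumed INV-YYYY-NNN format."""
    # fullmatch rejects trailing junk such as "INV-2024-001x"
    return isinstance(value, str) and INVOICE_NUMBER_RE.fullmatch(value) is not None
```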

2. Data Enrichment

Add additional information from external systems:

@app.post("/enrich-customer-data")
def enrich_customer(data: dict):
    structured = data["structured_output"]

    # Look up customer details (get_customer_by_email is a placeholder for your own lookup)
    customer = get_customer_by_email(structured.get("email"))
    if customer:
        structured["customer_id"] = customer["id"]
        structured["customer_tier"] = customer["tier"]
        structured["lifetime_value"] = customer["ltv"]

    return {"structured_output": structured}

3. Data Transformation

Format or standardize the extracted data:

@app.post("/format-financial-data")
def format_data(data: dict):
    structured = data["structured_output"]

    # Convert date formats
    structured["date_iso"] = convert_to_iso_date(structured["date"])

    # Format currency values
    structured["total_formatted"] = format_currency(
        structured["total_amount"],
        currency="USD"
    )

    # Standardize phone numbers
    structured["phone_e164"] = format_phone_number(structured["phone"])

    return {"structured_output": structured}
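The transformation example calls helpers that it does not define. The following standard-library sketch implements `convert_to_iso_date` and `format_currency`, plus a hypothetical `format_long_date` that produces the `"January 15, 2024"` style used in the example response. The accepted input date formats and the currency-symbol table are assumptions. Month names come from `%B`, which yields English names only in the default C locale.

```python
from datetime import date, datetime
from decimal import Decimal

# Assumed input formats the LLM might produce; extend as needed.
_DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y", "%d %B %Y")
# Assumed symbol table; unknown codes fall back to "CODE " as a prefix.
_CURRENCY_SYMBOLS = {"USD": "$"}


def convert_to_iso_date(value: str) -> str:
    """Parse a date in one of _DATE_FORMATS and return it as YYYY-MM-DD."""
    for fmt in _DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {value!r}")


def format_currency(amount: str, currency: str = "USD") -> str:
    """Format a numeric string with a symbol, thousands separators, 2 decimals."""
    # Decimal avoids binary floating-point rounding on money values
    value = Decimal(amount)
    prefix = _CURRENCY_SYMBOLS.get(currency, currency + " ")
    return f"{prefix}{value:,.2f}"


def format_long_date(iso_value: str) -> str:
    """Turn '2024-01-15' into 'January 15, 2024' (no zero-padded day)."""
    d = date.fromisoformat(iso_value)
    return f"{d:%B} {d.day}, {d.year}"
```

For the example payload, `format_currency("1250.00")` gives `"$1,250.00"` and `format_long_date("2024-01-15")` gives `"January 15, 2024"`, which match the `total_amount_formatted` and `date_formatted` values in the example response.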

4. Multi-System Integration

Trigger actions in external systems:

@app.post("/process-order")
def process_order(data: dict):
    structured = data["structured_output"]

    # Create order in ERP system
    order = create_erp_order(structured)
    structured["erp_order_id"] = order["id"]

    # Send notification
    send_notification(structured["email"], order)

    # Update CRM
    update_crm_opportunity(structured["customer_id"], order)

    return {"structured_output": structured}

Testing Your Webhook

Using a Local Development Server

For local testing, you can use tools like ngrok to expose your local webhook endpoint:

# Start your local webhook server
python webhook_server.py

# In another terminal, create a tunnel
ngrok http 8000

# Use the ngrok HTTPS URL in Prompt Studio
# Example: https://abc123.ngrok.io/webhook
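You can also exercise the endpoint directly, before involving ngrok or Prompt Studio, by posting the example request payload to your local server. This standard-library sketch assumes the server listens on `http://localhost:8000/webhook`, as in the ngrok example. The function names are hypothetical. `urlopen` raises `urllib.error.HTTPError` for non-2xx responses and, unlike the webhook client, follows redirects.

```python
import json
import urllib.request

# Payload copied from the "Example Request Payload" section
SAMPLE_PAYLOAD = {
    "structured_output": {
        "invoice_number": "INV-2024-001",
        "total_amount": "1250.00",
        "date": "2024-01-15",
        "vendor_name": "Acme Corp",
    },
}


def build_test_request(url: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request shaped like the documented webhook call."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_test_request(url: str, payload: dict = SAMPLE_PAYLOAD) -> dict:
    """POST payload to url and return the decoded JSON response body."""
    req = build_test_request(url, payload)
    # 60 seconds mirrors the documented webhook timeout
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())
```

With the server running, call `send_test_request("http://localhost:8000/webhook")` and inspect the returned dictionary for a `structured_output` key.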

Sample Python Webhook Server

from flask import Flask, request, jsonify

app = Flask(__name__)  # the @app.post shortcut requires Flask 2.0 or later


def process_data(structured):
    # Placeholder: replace with your own validation/enrichment logic
    return structured


@app.post("/webhook")
def postprocess():
    try:
        data = request.get_json()
        structured = data.get("structured_output", {})
        highlight = data.get("highlight_data")

        # Your processing logic here
        processed = process_data(structured)

        response = {"structured_output": processed}
        if highlight is not None:
            response["highlight_data"] = highlight

        return jsonify(response), 200

    except Exception as e:
        # Any non-200 response makes the system fall back to the original data
        return jsonify({"error": str(e)}), 500


if __name__ == "__main__":
    app.run(port=8000)

Limitations

  • Authentication: No authentication mechanism is currently supported
  • JSON Only: Only available for prompts with Enforce Type: JSON
  • Timeout: Fixed 60-second timeout (not configurable)
  • No Retries: Failed webhook calls are not retried
  • Synchronous Only: The webhook is called synchronously during extraction

Best Practices

  1. Keep Processing Fast: Your webhook should respond within 60 seconds to avoid timeouts
  2. Return Valid JSON: Always return properly formatted JSON with the structured_output key
  3. Handle Errors Gracefully: Return appropriate HTTP status codes
  4. Log Everything: Maintain logs of all webhook requests for debugging
  5. Validate Input: Don't assume the input structure is always correct
  6. Design for Idempotency: Make sure your webhook handles duplicate requests safely
  7. Monitor Performance: Track webhook response times and error rates
  8. Test Thoroughly: Test with various input scenarios before deploying
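One way to make a side-effecting webhook idempotent is to key each result on a hash of the canonicalised request. A repeated identical request then returns the stored result instead of, for example, creating a second ERP order. This is a minimal in-memory, single-process sketch built around a hypothetical `IdempotentProcessor` class; a multi-worker deployment would need a shared store. Keying on the whole payload also treats two documents that extract identical data as duplicates, so keying on a business identifier such as `invoice_number` may fit better.

```python
import hashlib
import json
from typing import Any, Callable, Dict


class IdempotentProcessor:
    """Run `process` at most once per distinct payload and cache the result."""

    def __init__(self, process: Callable[[Any], Any]) -> None:
        self._process = process
        self._results: Dict[str, Any] = {}

    @staticmethod
    def key_for(payload: Any) -> str:
        # sort_keys makes the key independent of JSON key order
        canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def handle(self, payload: Any) -> Any:
        key = self.key_for(payload)
        if key not in self._results:
            self._results[key] = self._process(payload)
        return self._results[key]
```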

Troubleshooting

Webhook Not Being Called

  • Verify the "Enable Postprocessing Webhook" checkbox is checked
  • Ensure the prompt's Enforce Type is set to JSON
  • Check that the webhook URL uses HTTPS (not HTTP)
  • Verify the webhook URL is not pointing to localhost or private IP addresses
  • Check that the webhook URL is valid and accessible from the internet
  • Verify your webhook endpoint is running and accepting connections

Original Data Returned Instead of Processed Data

  • Check your webhook is returning HTTP 200 status
  • Verify the response includes the structured_output key
  • Ensure response is valid JSON
  • Check webhook response time is under 60 seconds
  • Review system logs for error messages

Highlight Data Not Updated

  • Verify your response includes highlight_data as a list/array
  • Check that the data structure matches the expected format
  • Ensure you're not returning null or an invalid type