Limitations and Best Practices

Limitations and Edge Cases

  • Beta status. Agentic Prompt Studio is in beta. Agent behavior, pipeline structure, and output formats may change in future releases.
  • Sample quality matters. The agents can only learn patterns present in your uploaded samples. If your production documents include variants not represented in the samples, the generated schema and prompts may not cover those cases. Upload the widest range of representative samples you can.
  • Complex nested structures. While the FinalizerAgent handles nested structures (e.g., line items, addresses), deeply nested or highly irregular structures may require manual schema adjustments after generation.
  • Prompt generation depends on schema quality. If the schema is incomplete or inaccurate, the generated prompt will reflect those gaps. Review and validate the schema before generating prompts.
  • Verification Sets require manual effort upfront. You need to provide manually verified JSON for each baseline document. This initial investment pays off during iterative prompt development, but the baselines themselves must be accurate.
  • LLM adapter required. The agentic pipelines use LLM calls under the hood, so a configured LLM adapter with valid API keys is required.
  • JSON parsing edge cases. LLM responses may occasionally be malformed, truncated, or wrapped in markdown code fences. The system uses a multi-method parsing strategy (direct parse → markdown extraction → JSON repair → empty fallback), but rare edge cases can produce incomplete output.
  • Context window limitations. Documents exceeding the LLM's context window cannot be processed in a single pass. Keep documents under 100 pages when possible. Intelligent context window management with chunking and multi-pass extraction is planned for a future release.
  • No built-in cost tracking. There is currently no way to track LLM usage costs within Agentic Prompt Studio. Monitor spending through your LLM provider's dashboard.
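The multi-method parsing strategy mentioned above (direct parse → markdown extraction → JSON repair → empty fallback) can be sketched in Python. This is an illustrative reconstruction of the idea, not the product's actual parser; the function name and the trailing-comma repair rule are assumptions:

```python
import json
import re

def parse_llm_json(raw: str) -> dict:
    """Parse an LLM response into JSON, falling back through
    progressively more forgiving strategies (illustrative sketch)."""
    # 1. Direct parse: the response is already valid JSON.
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass

    # 2. Markdown extraction: pull the body out of a ```json code fence.
    fence = re.search(r"```(?:json)?\s*(.*?)```", raw, re.DOTALL)
    if fence:
        try:
            return json.loads(fence.group(1))
        except json.JSONDecodeError:
            pass

    # 3. Simple repair: strip trailing commas before } or ] and retry.
    repaired = re.sub(r",\s*([}\]])", r"\1", raw)
    try:
        return json.loads(repaired)
    except json.JSONDecodeError:
        pass

    # 4. Empty fallback: never raise; downstream code gets an empty object.
    return {}
```

A real repair step would handle more failure modes (truncated strings, unescaped quotes), which is why rare edge cases can still produce incomplete output.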

Best Practices

Document Preparation

  • Use 3–5 representative samples that cover the range of document variants you expect in production.
  • Include edge cases: documents with missing fields, unusual formatting, or multi-page layouts.
  • Ensure OCR quality by using LLMWhisperer or a comparable preprocessing tool.
  • Prefer documents under 100 pages where possible, as longer documents may approach LLM context limits.

Schema Design

  • Review auto-generated schemas before generating prompts. The schema directly shapes the prompt, so gaps or errors in the schema carry forward.
  • Add or refine field descriptions to improve extraction guidance.
  • Use consistent naming conventions (e.g., snake_case for all field names).
  • Test the schema against diverse samples to confirm it covers all expected fields.
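A consistent naming convention is easy to enforce mechanically when reviewing an auto-generated schema. A minimal helper like the following (hypothetical, not part of the product) normalizes camelCase, spaces, and hyphens to snake_case:

```python
import re

def to_snake_case(name: str) -> str:
    """Normalize a schema field name to snake_case (illustrative helper)."""
    # Insert an underscore at camelCase boundaries: "invoiceDate" -> "invoice_Date".
    s = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Collapse spaces and hyphens into single underscores.
    s = re.sub(r"[\s\-]+", "_", s)
    return s.lower()
```

Running every generated field name through a pass like this before prompt generation keeps the schema and prompt vocabulary aligned.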

Prompt Optimization

  • Start with the auto-generated prompt and iterate from there. The agents have already incorporated document-specific patterns, so targeted edits are more effective than rewriting from scratch.
  • Tune the prompt after verifying at least 10 documents to establish a reliable accuracy baseline.
  • Use the Compare Prompt Versions feature to understand the impact of each change.
  • Add version notes when saving edits to maintain a clear change history.

Verification Workflow

  • Build verified data early. The sooner you have baselines, the sooner accuracy tracking becomes useful.
  • Verify consistently — use the same standards for what constitutes "correct" extraction across all documents.
  • Use PDF highlighting to cross-reference extracted values against source documents.
  • Focus prompt refinement on high-error fields identified by the Field Comparison and Mismatch Matrix.
  • Re-verify and recalculate accuracy every 10–20 prompt edits to track trends.
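The accuracy baseline behind this workflow reduces to comparing extracted output against the manually verified JSON, field by field. The sketch below assumes exact-match scoring per field and a simple average across documents; the product's actual scoring (e.g., value normalization, nested-field handling) may differ:

```python
def field_accuracy(extracted: dict, verified: dict) -> float:
    """Fraction of verified fields whose extracted value matches exactly."""
    if not verified:
        return 0.0
    matched = sum(
        1 for field, expected in verified.items()
        if extracted.get(field) == expected
    )
    return matched / len(verified)

def set_accuracy(pairs) -> float:
    """Mean per-document accuracy over (extracted, verified) pairs."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(field_accuracy(e, v) for e, v in pairs) / len(pairs)
```

Because the verified JSON is the denominator, inaccurate baselines silently distort every accuracy number derived from them; hence the emphasis on getting them right upfront.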

Quality Monitoring

  • Run regular accuracy checks against your verification set.
  • Review the Mismatch Matrix to spot systemic extraction issues — these are fields that fail across multiple documents.
  • Track accuracy trends across prompt versions using the Version History accuracy column.
  • Define a production accuracy threshold and do not deploy until that threshold is met.
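The Mismatch Matrix idea, fields on one axis and documents on the other, can be reconstructed as a small sketch. Function names and the failure threshold here are illustrative assumptions, not the product's API:

```python
from collections import defaultdict

def mismatch_matrix(documents: dict) -> dict:
    """Build a field -> {doc_id: matched?} grid from
    doc_id -> (extracted, verified) pairs (illustrative sketch)."""
    matrix = defaultdict(dict)
    for doc_id, (extracted, verified) in documents.items():
        for field, expected in verified.items():
            matrix[field][doc_id] = extracted.get(field) == expected
    return matrix

def systemic_failures(matrix: dict, threshold: int = 2) -> list:
    """Fields failing in at least `threshold` documents: the
    systemic issues worth targeting with prompt refinement."""
    return sorted(
        field for field, cells in matrix.items()
        if sum(1 for ok in cells.values() if not ok) >= threshold
    )
```

A field that fails on a single document usually points to a document quirk; a field that fails across many documents points to a gap in the schema or prompt.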