
SummarizedExtraction Introduction

In Prompt Studio, when you are working on field extractions, it's easiest to think of one prompt as something that extracts one field, or a group of closely related fields (like the individual fields of a line item), from a document. This also means that you can tweak a prompt to improve it without having to worry about it affecting the quality of other prompts. This one-to-one relationship between a prompt and a field is a design feature of Prompt Studio that makes long-term maintenance of projects with dozens of complex extractions easier.

Comparison with SinglePass Extraction

By default, each prompt in Unstract is run against the full context of the input document. This means that, if there are many prompts, LLM token costs can climb quickly. One powerful feature available in Unstract is SinglePass Extraction, which uses an LLM to construct a single, large prompt by combining all your prompts and executes it just once against the full context, saving LLM costs. In the SinglePass Extraction Introduction section, we discuss how it works, but also its limitations: as the combined prompt grows large and complex, the chances of the extraction quality suffering increase.

SummarizedExtraction, on the other hand, uses a completely different technique. Let's see how it works.

How SummarizedExtraction works

SummarizedExtraction is a four-pass system (a rough sketch in code follows the list below).

  • Pass #1: An LLM is used for subquestion retrieval to understand what the user is trying to extract, completely ignoring the formatting instructions in each prompt, thus creating relatively simple prompts out of potentially complex ones.
  • Pass #2: All these simple prompts are then combined into one single prompt.
  • Pass #3: This prompt is then executed against the full context of the input document to create a highly summarized version of the input document with only the information needed to satisfy the extractions the user needs.
  • Pass #4: Each of the user's prompts is then executed against this summarized context. The individual outputs are combined to form the final response JSON.
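
To make the flow concrete, here is a minimal Python sketch of the four passes. Everything in it (the `llm()` helper, the function names, and the prompt wording) is an illustrative assumption, not Unstract's actual implementation.

```python
# Hypothetical sketch of the SummarizedExtraction flow. The llm()
# helper, function names, and prompt wording are assumptions for
# illustration, not Unstract internals.

def llm(prompt: str) -> str:
    """Stand-in for a call to your configured LLM (assumption)."""
    raise NotImplementedError

def summarized_extraction(document: str, prompts: dict[str, str]) -> dict[str, str]:
    # Pass 1: distill each user prompt into a simple subquestion,
    # ignoring its formatting instructions.
    subquestions = {
        field: llm("Restate what this prompt extracts, ignoring any "
                   f"formatting instructions:\n{prompt}")
        for field, prompt in prompts.items()
    }

    # Pass 2: combine the simple subquestions into one prompt.
    combined = ("Summarize the document, keeping only the information "
                "needed to answer these questions:\n"
                + "\n".join(subquestions.values()))

    # Pass 3: run the combined prompt against the full document to get
    # a highly summarized, extraction-focused context.
    summary = llm(f"{combined}\n\nDocument:\n{document}")

    # Pass 4: run each original prompt (formatting instructions and
    # all) against the short summary, then assemble the final response.
    return {field: llm(f"{prompt}\n\nContext:\n{summary}")
            for field, prompt in prompts.items()}
```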

The cost of SummarizedExtraction is more than the cost of SinglePass Extraction, but less than that of using the full context against each of the prompts (the default). SummarizedExtraction avoids the limitations of SinglePass Extraction, while still maintaining good accuracy and keeping costs down.
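
As a rough back-of-the-envelope illustration of why the costs rank this way, consider input tokens alone. All the numbers below are made-up assumptions, chosen only to show the ordering:

```python
# Back-of-the-envelope input-token comparison; every number here is an
# illustrative assumption, not a measurement.
doc_tokens = 20_000      # full document
summary_tokens = 1_000   # summarized context (assumed much smaller)
prompt_tokens = 200      # average size of one user prompt
n_prompts = 10

# Default: every prompt sees the full document.
default_cost = n_prompts * (doc_tokens + prompt_tokens)            # 202,000

# SinglePass: one combined prompt, run once against the full document.
singlepass_cost = doc_tokens + n_prompts * prompt_tokens           # 22,000

# SummarizedExtraction: subquestion distillation (pass 1), one
# summarization over the full document (passes 2-3), then every prompt
# against the small summary (pass 4).
summarized_cost = (
    n_prompts * prompt_tokens                        # pass 1
    + doc_tokens + n_prompts * prompt_tokens         # passes 2-3
    + n_prompts * (summary_tokens + prompt_tokens)   # pass 4
)                                                                  # 36,000

assert singlepass_cost < summarized_cost < default_cost
```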

A note on chunking

For the extraction use cases that Unstract is designed for, it's always a good idea to avoid chunking. You should only consider it if the document in question will never fit into the input context window of the chosen LLM.
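
If you're unsure whether a document fits, you can count its tokens before deciding. Here is a minimal sketch using the tiktoken library; the 128k window size and the cl100k_base encoding are assumptions, so substitute the limits and tokenizer of your chosen LLM:

```python
import tiktoken  # pip install tiktoken

def fits_context_window(text: str, window_tokens: int = 128_000,
                        encoding_name: str = "cl100k_base") -> bool:
    """Rough check for whether a document fits an LLM's context window.

    The 128k window and cl100k_base encoding are assumptions; use the
    actual limits of your chosen LLM, and leave headroom for the
    prompts themselves and for the model's output.
    """
    encoding = tiktoken.get_encoding(encoding_name)
    return len(encoding.encode(text)) <= window_tokens
```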

You need to think about data extraction from unstructured documents differently from regular RAG (Retrieval Augmented Generation) use cases. Most high-volume document data extraction use cases involve documents that are only a few pages long, and 100% accuracy is almost always both the target and achievable. This is why we need to operate with the full context rather than retrieved, chunked context.

This advice against chunking is for accuracy reasons. While Unstract supports chunking, retrieval is generally the weakest link in any RAG application and can severely impact the overall quality of the extraction.

To understand more about chunking, please read our Chunking Guide.