Getting Started

This guide walks you through a practical example of how Unstract can help you extract structured data from unstructured documents that come in four variants, each looking very different from the others. We'll do this with minimal effort, leveraging the power of Large Language Models.

Choose an Unstract edition

To get started with Unstract, you'll first need to get access to one of the available Unstract editions:

Unstract Cloud Edition (Easiest to get started with; offers a free trial.)
Unstract Open Source Edition
Unstract On-Premise Edition

The following pages contain common instructions for you to get started with any edition of Unstract. Where there are differences in features, they are clearly pointed out.

Your trial comes with batteries included

If you go the Unstract Cloud route, you can hit the ground running: when you sign up for Unstract's 14-day free trial, your account comes pre-configured with an LLM, a vector database, an embedding model, and LLMWhisperer for text extraction. You are welcome to configure additional services that integrate with Unstract if you plan to run a deeper trial.

The Quick Start Exercise

Suppose you want to extract structured data from a set of credit card statements from different issuers. We're using credit card statements as the example because, whatever your background, you most likely know what a typical statement looks like and what key information it contains; credit cards are a fairly common part of our lives. At the same time, every issuer has its own format, and even a single issuer's format can change from time to time. That makes such statements a pretty decent challenge for our first Unstract project.

[Image: Credit Card Statement Samples]

Credit card statements are typically emailed to users as PDF documents. Like most unstructured documents, they mostly carry the same key information (customer name, customer address, issuer name, statement date, list of spends, etc.) but, as discussed above, vary wildly in formatting and length. It has never been easy to get data from such varied statements into a database or an application in structured form for easy querying, analysis, or visualization. The Unstract Platform lets you do this with no code needed, by leveraging the power of Large Language Models.
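To make "structured form" concrete, the key fields listed above could be modeled as a small record type. This is only an illustrative sketch in Python: the class names, field names, and sample values (`CardStatement`, `Spend`, "Example Bank", and so on) are assumptions made for this example, not a schema defined by Unstract.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Spend:
    # One line item from a statement (hypothetical field names).
    date: str  # ISO date string, e.g. "2024-01-05"
    description: str
    amount: float


@dataclass
class CardStatement:
    # Key fields most statements share, regardless of issuer or layout.
    customer_name: str
    customer_address: str
    issuer_name: str
    statement_date: str
    spends: List[Spend] = field(default_factory=list)

    def total_spend(self) -> float:
        # Sum of all line-item amounts, rounded to two decimal places.
        return round(sum(s.amount for s in self.spends), 2)


# Made-up sample data, purely for illustration.
statement = CardStatement(
    customer_name="Jane Doe",
    customer_address="1 Example Street, Springfield",
    issuer_name="Example Bank",
    statement_date="2024-01-31",
    spends=[
        Spend("2024-01-05", "Grocery store", 42.50),
        Spend("2024-01-12", "Coffee shop", 15.99),
    ],
)

# Nested dataclasses serialize to plain dicts and lists, then to JSON.
statement_json = json.dumps(asdict(statement), indent=2)
print(statement_json)
```

Once every statement, whatever its layout, is reduced to one shape like this, querying and analysis no longer depend on the issuer's formatting.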

We will not only build a simple, generic parser for credit card statements, but also deploy it both as an API (to which you can send a PDF statement and get JSON data back) and as an ETL pipeline (which can structure PDF statements and push the data into a data warehouse or database for further analysis).
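As a rough picture of what the ETL half amounts to, the sketch below takes JSON of the kind such a parser might return and loads it into a database table for querying. Everything here is assumed for illustration: the JSON shape, the table and column names, and the in-memory SQLite database standing in for a real warehouse. It does not call Unstract at all.

```python
import json
import sqlite3

# Hypothetical parser output; the field names are assumptions for this sketch.
SAMPLE_JSON = """
{
  "issuer_name": "Example Bank",
  "statement_date": "2024-01-31",
  "spends": [
    {"date": "2024-01-05", "description": "Grocery store", "amount": 42.50},
    {"date": "2024-01-12", "description": "Coffee shop", "amount": 15.99}
  ]
}
"""


def load_spends(conn: sqlite3.Connection, statement: dict) -> int:
    """Insert one row per spend and return the number of rows inserted."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS spends ("
        "issuer TEXT, statement_date TEXT, spend_date TEXT, "
        "description TEXT, amount REAL)"
    )
    rows = [
        (
            statement["issuer_name"],
            statement["statement_date"],
            s["date"],
            s["description"],
            s["amount"],
        )
        for s in statement["spends"]
    ]
    conn.executemany("INSERT INTO spends VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


# An in-memory database stands in for a real warehouse here.
conn = sqlite3.connect(":memory:")
inserted = load_spends(conn, json.loads(SAMPLE_JSON))

# With the data in tabular form, analysis becomes an ordinary SQL query.
total = conn.execute("SELECT ROUND(SUM(amount), 2) FROM spends").fetchone()[0]
print(inserted, total)
```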
