Getting Started
This guide walks you through a practical example of how easily Unstract can help you extract structured data from unstructured documents that have 4 different variants that look very different from each other. We'll do with this minimal effort, leveraging the power of Large Language Models.
Choose an Unstract edition
To get started with Unstract, you'll first need to get access to one of the available Unstract editions:
- Unstract Cloud Edition (Easiest to get started with; offers a free trial.)
- Unstract Open Source Edition
- Unstract On-Premise Edition
The following pages contain common instructions for you to get started with any edition of Unstract. Where there are differences in features, they are clearly pointed out.
Your trial comes with batteries included
If you chose to go the Unstract Cloud route, to have you hit the ground running, when you sign up for Unstract's 14-day free trial, your account comes pre-configured with an LLM, a vector database, an embedding model and LLMWhisperer for text extraction. Of course, you are welcome to configure additional services that are integrated with Unstract if you plan to run a deeper trial.
The Quick Start Exercise
Let's consider that you want to extract structured data from a bunch of credit card statements from different issuers. We're taking credit card statements as an example since no matter what your background, it's safe to assume that you'll know what a typical credit card statement looks like and what key points of information it might contain, since credit cards are fairly common part of our lives. At the same time, we know that every issuer has their own format and even for the same issuer, the format of the statement can keep changing from time to time. So, it's a pretty decent challenge to use such statements to build our first Unstract project.
Credit card statements are typically emailed to users as PDF documents. Like most unstructured documents, these statements, although most of them consist of the same bits of key information (customer name, customer address, issuer name, statement date, list of spends, etc), they come in wildly different formatting and lengths like we discussed before. It has never been easy to get data from these varied types of statements into a database or into an application in structured form for easy querying, analysis or visualization. The Unstract Platform lets you do this with no code needed by leveraging the power of Large Language Models.
Not only will we build a simple, generic parser for credit card statements, we will also deploy this parser as both an API (to which you can send a PDF statement and get JSON data back) and also as an ETL pipeline (which can structure PDF statements and push data into a data warehouse or database for further analysis).