What's needed to get started
Choose an Unstract edition
To get started with Unstract, you'll first need to get access to one of the available Unstract editions:
- Unstract Cloud Edition (Easiest to get started with; offers a free trial.)
- Unstract Open Source Edition
- Unstract On-Premise Edition
The following pages contain common instructions for you to get started with any edition of Unstract. Where there are differences in features, they are clearly pointed out.
Initial setup
You'll need access to four services to get started (don't worry, these are easy to set up and are free to start with):
- LLM or Large Language Model service: this service takes in raw text and helps reason about and structure the specific data/fields we care about. We will use OpenAI for this.
- Embedding Model: this service converts text into numeric codes (embeddings) based on its meaning, which helps organize data within a document or data source. This is useful when we need to pull the relevant portions out of large documents, based on a user's query or the specific data we need to extract. We will use OpenAI for this as well.
- Vector Database: this service works in conjunction with the embedding model to store the embedded representations of documents and retrieve the portions we're interested in. Put simply, the embedding model contains the logic to organize data, and the vector database stores and retrieves it. We will use Qdrant Cloud (pronounced "quadrant") for this.
- Text Extractor: this service, much like its name implies, extracts text from documents and images, typically PDFs. These PDFs can contain native text or consist simply of scanned images. OCR is typically built into these services. However, in the near future, we're also adding the ability to use third-party OCR services.
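To make the division of labor between the embedding model and the vector database more concrete, here is a minimal, self-contained sketch in Python. It is a toy illustration only: `embed`, `cosine`, and `ToyVectorStore` are hypothetical names invented for this example, the "embedding" is a simple word-count vector rather than a real model, and nothing here calls the OpenAI, Qdrant, or Unstract APIs. In a real setup, the embedding service and the vector database would replace these stand-ins.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Toy stand-in for a vector database: stores vectors, returns nearest chunks."""

    def __init__(self) -> None:
        self._items: list[tuple[Counter, str]] = []

    def add(self, chunk: str) -> None:
        # Store the chunk alongside its vector representation.
        self._items.append((embed(chunk), chunk))

    def search(self, query: str, top_k: int = 1) -> list[str]:
        # Rank stored chunks by similarity to the query vector.
        query_vec = embed(query)
        ranked = sorted(
            self._items,
            key=lambda item: cosine(query_vec, item[0]),
            reverse=True,
        )
        return [chunk for _, chunk in ranked[:top_k]]


store = ToyVectorStore()
for chunk in [
    "The invoice total is 1,250 USD, due on 30 June.",
    "The supplier is located in Chennai, India.",
    "Payment terms are net 30 days from the invoice date.",
]:
    store.add(chunk)

# Retrieve the chunk most relevant to the question; only this chunk
# would then be passed on to the LLM for structured extraction.
best = store.search("what is the invoice total", top_k=1)
print(best[0])
```

The point of the sketch is the flow, not the math: documents are split into chunks, each chunk is turned into a vector, and a query is answered by finding the stored vectors closest to the query's vector. A real embedding model captures meaning far better than word counts do, so it can match chunks that share no words with the query.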