Test CLI Script
The LLMWhisperer Test CLI Script is a simple command-line client for LLMWhisperer that lets you test its document extraction capabilities from your terminal. It is aimed at developers who want to integrate LLMWhisperer into their workflows or evaluate the service before building it into their applications.
Overview
The Test CLI Script provides a straightforward way to extract text from PDFs, images, and scanned documents directly from the command line. It supports all of LLMWhisperer's extraction modes, allowing you to process different document types with optimal settings. Whether you're working with native text PDFs, scanned documents, forms, or tables, the CLI script offers flexible options to get the best extraction results.
Key Features
- Multiple Document Format Support: Extract text from PDFs, images, and scanned documents
- Flexible Extraction Modes: Choose from native_text, low_cost, high_quality (default), form, and table modes to optimize extraction for your specific document type
- Table Structure Preservation: Optionally preserve and recreate table borders for better structure retention
- Page-Specific Extraction: Extract text from specific pages or page ranges
- Flexible Output Options: Direct output to terminal or save to a file
- Environment-Based Configuration: Securely manage your API key through environment variables
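If you drive the script from your own tooling, it can help to check a mode name before starting a run. The following is a minimal sketch: the mode names and the default come from the list above, while `resolve_mode` and the two constants are hypothetical helpers that are not part of the CLI.

```python
# Mode names are taken from the feature list; the helper itself is hypothetical.
SUPPORTED_MODES = ("native_text", "low_cost", "high_quality", "form", "table")
DEFAULT_MODE = "high_quality"


def resolve_mode(requested=None):
    """Return a valid mode name, falling back to the default when none is given."""
    if requested is None or not requested.strip():
        return DEFAULT_MODE
    mode = requested.strip().lower()
    if mode not in SUPPORTED_MODES:
        raise ValueError(
            f"Unknown mode {requested!r}; expected one of: {', '.join(SUPPORTED_MODES)}"
        )
    return mode
```

Calling `resolve_mode()` with no argument yields the default, and an unrecognized name raises `ValueError` rather than being passed along silently.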
Getting Started
To get started with the Test CLI Script:
- Prerequisites: Ensure you have Python 3.7+ and pip installed on your system
- Installation:
    git clone https://github.com/Zipstack/llmwhisperer-cli-test-script.git
    cd llmwhisperer-cli-test-script
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    pip install -r requirements.txt
- Configuration: Set up your LLMWhisperer API key by creating a .env file in the project directory:
    LLMWHISPERER_API_KEY=your_api_key_here
- Basic Usage:
    python llmwhisperer_cli.py document.pdf
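To confirm that the key in your .env file is readable before running an extraction, a small standalone check can be useful. This is a sketch using only the standard library. How the script itself loads the file is not described here, so the parsing rules and the precedence (process environment first, then the file) are assumptions, and `read_env_file` and `get_api_key` are hypothetical names.

```python
import os
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(path=".env"):
    """Assumed precedence: an exported variable wins over the .env file."""
    key = os.environ.get("LLMWHISPERER_API_KEY") or read_env_file(path).get(
        "LLMWHISPERER_API_KEY"
    )
    if not key:
        raise RuntimeError(
            "LLMWHISPERER_API_KEY is not set; add it to .env or export it"
        )
    return key
```

Running `get_api_key()` from the project directory either returns the key or fails with a clear message, which is quicker to diagnose than an authentication error from the service.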
The Test CLI Script makes it easy to experiment with LLMWhisperer's capabilities and compare extraction modes to find the best settings for your documents. Visit the GitHub repository (https://github.com/Zipstack/llmwhisperer-cli-test-script) for detailed usage examples, command-line options, and advanced configuration.
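When testing several documents, it can be convenient to call the script from Python and keep each result in its own file. The sketch below uses only the basic invocation shown in this section, with no extra flags. It assumes the script prints the extracted text to standard output and exits with a non-zero status on failure. `build_command`, `extract_to_file`, and the `extracted` directory name are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def build_command(document, script="llmwhisperer_cli.py"):
    """Build the basic-usage command for one document, using the current interpreter."""
    return [sys.executable, script, str(document)]


def extract_to_file(document, output_dir="extracted"):
    """Run the CLI on one document and save whatever it prints to a .txt file."""
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if the script exits with an error status.
    result = subprocess.run(
        build_command(document), capture_output=True, text=True, check=True
    )
    target = out_dir / (Path(document).stem + ".txt")
    target.write_text(result.stdout, encoding="utf-8")
    return target
```

With the virtual environment active and the project directory as the working directory, `extract_to_file("document.pdf")` would write the captured output to `extracted/document.txt`. Using `sys.executable` keeps the subprocess on the same interpreter, and therefore the same installed requirements, as the calling code.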