ETL Pipeline Introduction
Unstract's ETL Pipeline extracts and transforms unstructured data from various sources and loads it into databases.
Key Features
- Extract from Filesystem Sources
- Transform data using a Tool exported from Prompt Studio
- Load to popular Database Destinations
Smart File Handling
Unstract's ETL pipelines avoid redundant work and recover from failures:
- Tracks which files earlier pipeline runs have already processed, so each run picks up only new files
- Skips duplicate files (same name, path, and content) automatically
- Processes files in parallel for faster batch processing
- Retries execution on failure to ensure reliability
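The behaviors listed above can be illustrated with a minimal Python sketch. This is not Unstract's implementation; it only shows one way the same ideas fit together. The function names (`file_key`, `run_with_retry`, `run_pipeline`), the SHA-256 content hash, the thread pool, and the retry count of 3 are all assumptions made for illustration.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def file_key(path, content):
    """Identify a file by its full path (which includes the name) and a hash of its content."""
    return (path, hashlib.sha256(content).hexdigest())


def run_with_retry(process, path, content, max_attempts=3):
    """Call process(path, content); on an exception, retry up to max_attempts in total."""
    for attempt in range(1, max_attempts + 1):
        try:
            return process(path, content)
        except Exception:
            if attempt == max_attempts:
                raise  # give up after the last attempt


def run_pipeline(files, seen, process, max_workers=4):
    """Process only files not seen before, in parallel, and record them in `seen`.

    files: iterable of (path, content_bytes) pairs (hypothetical input shape)
    seen:  set of keys from earlier runs; updated in place
    """
    # Drop files handled in earlier runs and duplicates within this run.
    pending = {}
    for path, content in files:
        key = file_key(path, content)
        if key not in seen and key not in pending:
            pending[key] = (path, content)

    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            key: pool.submit(run_with_retry, process, path, content)
            for key, (path, content) in pending.items()
        }
        for key, future in futures.items():
            results[key[0]] = future.result()  # re-raises if all retries failed
            seen.add(key)
    return results
```

In this sketch, calling `run_pipeline` a second time with the same files and the same `seen` set processes nothing, while a file whose content changed gets a new key and is processed again.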
Getting Started
To create your first ETL pipeline, see Set Up ETL Pipeline.
Execution and Monitoring
Execution Modes:
- Manual: Instant execution via UI (ETL Pipeline > Actions > Sync Now)
- Scheduled: Automated cron-based execution configured in pipeline settings
- API: REST-based execution; download the request definitions via ETL Pipeline > Actions > Download Postman Collection
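For the API mode, a request taken from the downloaded Postman collection can be reproduced in a script. The sketch below builds such a request with Python's standard library. The POST method, the bearer-token header, and the JSON body are assumptions; the actual URL, method, headers, and body should be copied from the Postman collection. The helper name `build_sync_request` is hypothetical.

```python
import json
import urllib.request


def build_sync_request(url, api_key, payload=None):
    """Build (but do not send) an HTTP request that triggers a pipeline run.

    Assumptions: POST method, bearer-token authentication, JSON body.
    Replace these with the values from your Postman collection.
    """
    body = json.dumps(payload or {}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# Usage with placeholder values (copy the real ones from the collection):
# request = build_sync_request("<pipeline-url-from-postman-collection>", "<api-key>")
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Keeping request construction separate from sending makes it easy to inspect the URL and headers before any call goes out.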
Monitoring:
- Real-time status: Dashboard view and run details
- Execution history: MANAGE > Logs > ETL Sessions
Notifications:
- Configure via ETL Pipeline > Actions > Setup Notifications