ETL Pipeline Introduction

Unstract's ETL Pipeline extracts and transforms unstructured data from various sources and loads it into databases.

Key Features

Smart File Handling

Unstract's ETL pipelines handle files efficiently and reliably:

  • Keeps track of new files in each pipeline run to avoid reprocessing
  • Skips duplicate files (same name, path, and content) automatically
  • Processes files in parallel for faster batch processing
  • Retries execution on failure to ensure reliability
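
The behaviors listed above can be illustrated with a minimal Python sketch. This is not Unstract's implementation: the function names (`fingerprint`, `select_new_files`, `with_retries`, `run_pipeline`), the SHA-256 fingerprint, the thread pool, and the retry count are all assumptions chosen only to show how duplicate skipping, parallel processing, and retries can fit together.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def fingerprint(path: str, content: bytes) -> str:
    """Hash the full path (which includes the file name) together with the content."""
    h = hashlib.sha256()
    h.update(path.encode("utf-8"))
    h.update(b"\0")  # separator so path and content cannot run together
    h.update(content)
    return h.hexdigest()


def select_new_files(files, seen):
    """Return files whose fingerprint is not yet in `seen`, recording them as seen."""
    new_files = []
    for path, content in files:
        fp = fingerprint(path, content)
        if fp not in seen:
            seen.add(fp)
            new_files.append((path, content))
    return new_files


def with_retries(func, arg, max_attempts=3):
    """Call func(arg), retrying on any exception up to max_attempts times."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return func(arg)
        except Exception as exc:  # hypothetical policy: retry on any error
            last_error = exc
    raise last_error


def run_pipeline(files, seen, process, max_workers=4):
    """Skip already-seen files, then process the rest in parallel with retries."""
    new_files = select_new_files(files, seen)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda f: with_retries(process, f), new_files))
```

In this sketch, `seen` stands in for whatever state a pipeline keeps between runs: passing the same set to a second call of `run_pipeline` with the same files results in nothing being processed again.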

Getting Started

To create your first ETL pipeline, see Set Up ETL Pipeline.

Execution and Monitoring

Execution Modes:

  • Manual: Instant execution via UI (ETL Pipeline > Actions > Sync Now)
  • Scheduled: Automated cron-based execution configured in pipeline settings
  • API: REST integration via ETL Pipeline > Actions > Download Postman Collection
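
For the scheduled mode, the following Python sketch shows how a cron expression breaks down into fields. It assumes the standard five-field cron syntax (minute, hour, day of month, month, day of week); whether Unstract accepts exactly this form is an assumption, so check the pipeline settings for the supported format. The helper `describe_cron` and the `EXAMPLES` table are hypothetical names used only for illustration.

```python
# Field order of a standard five-field cron expression.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression: str) -> dict:
    """Split a five-field cron expression into named fields."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(
            f"expected {len(CRON_FIELDS)} fields, got {len(parts)}: {expression!r}"
        )
    return dict(zip(CRON_FIELDS, parts))


# Common schedules and what they mean in standard cron syntax.
EXAMPLES = {
    "*/15 * * * *": "every 15 minutes",
    "0 * * * *": "every hour, on the hour",
    "0 2 * * *": "every day at 02:00",
    "0 9 * * 1": "every Monday at 09:00",
}

for expr, meaning in EXAMPLES.items():
    print(f"{expr:<14} {meaning:<26} {describe_cron(expr)}")
```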

Monitoring:

  • Real-time status: Dashboard view and run details
  • Execution history: MANAGE > Logs > ETL Sessions

Notifications:

  • Configure alerts via ETL Pipeline > Actions > Setup Notifications

Next Steps