Data Warehousing

ETL Orchestration Pipeline

Airflow-based ETL pipeline with multi-source extraction, dbt transformations on a DuckDB warehouse, SCD Type 2 snapshots, data quality checks, and lineage tracking.

Apache Airflow
dbt-duckdb
DuckDB
Docker Compose

Overview

A production ETL pipeline: Airflow orchestrates extraction from multiple sources, dbt transforms the data in DuckDB, and data quality checks monitor each run.

Architecture

  • Apache Airflow for DAG orchestration
  • Multi-source extraction (CSV, REST API, SQLite)
  • dbt transformations on DuckDB
  • Slowly changing dimensions (SCD Type 2) tracked with snapshots
  • Data quality checks and lineage tracking
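
To make the SCD Type 2 idea concrete, here is a minimal sketch in plain Python. It is an illustration only, not the project's implementation (the snapshots in this project are listed alongside the dbt layer). The names `DimRow` and `apply_scd2` are hypothetical. The idea: when a tracked attribute changes, the current version of the row is closed with an end date and a new version is appended, so earlier values stay queryable.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DimRow:
    key: str                          # business key, e.g. a customer id
    attrs: dict                       # tracked attributes
    valid_from: date                  # first day this version applies
    valid_to: Optional[date] = None   # None marks the current version


def apply_scd2(history: list, incoming: dict, as_of: date) -> None:
    """Update `history` in place from `incoming` (key -> attrs) as of a date."""
    # Index the open (current) version of every key.
    current = {row.key: row for row in history if row.valid_to is None}
    for key, attrs in incoming.items():
        existing = current.get(key)
        if existing is not None and existing.attrs == attrs:
            continue                  # nothing changed, keep the current version
        if existing is not None:
            existing.valid_to = as_of  # close the outdated version
        history.append(DimRow(key, dict(attrs), as_of))  # open a new version


history = []
apply_scd2(history, {"c1": {"city": "Oslo"}}, date(2024, 1, 1))
apply_scd2(history, {"c1": {"city": "Bergen"}}, date(2024, 2, 1))
apply_scd2(history, {"c1": {"city": "Bergen"}}, date(2024, 3, 1))  # no change
```

After these three runs, `history` holds two versions of `c1`: the Oslo row closed on 2024-02-01 and the Bergen row still open. The third run adds nothing because the attributes did not change.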

Key Features

  • Automated pipeline scheduling
  • Data quality validation gates
  • Lineage tracking and documentation
  • SCD Type 2 historical snapshots
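
A data quality validation gate can be sketched as a function that either passes a batch through or raises. This is a hedged example under stated assumptions: the checks (required columns, unique key), the names `check_batch`, `quality_gate`, and `QualityGateError`, and the sample columns are all hypothetical and not taken from the project.

```python
class QualityGateError(Exception):
    """Raised when a batch fails validation and must not be loaded."""


def check_batch(rows, required, unique_key):
    """Return a list of human-readable failures; an empty list means pass."""
    failures = []
    if not rows:
        failures.append("batch is empty")
    seen = set()
    for i, row in enumerate(rows):
        # Required columns must be present and non-empty.
        for col in required:
            if row.get(col) in (None, ""):
                failures.append(f"row {i}: missing {col}")
        # The unique key must not repeat within the batch.
        k = row.get(unique_key)
        if k is not None and k in seen:
            failures.append(f"row {i}: duplicate {unique_key}={k!r}")
        seen.add(k)
    return failures


def quality_gate(rows, required, unique_key):
    """Pass the batch through unchanged, or raise to stop the pipeline."""
    failures = check_batch(rows, required, unique_key)
    if failures:
        raise QualityGateError("; ".join(failures))
    return rows


good = [{"id": 1, "email": "a@x.io"}, {"id": 2, "email": "b@x.io"}]
bad = [{"id": 1, "email": "a@x.io"}, {"id": 1, "email": ""}]
passed = quality_gate(good, ["id", "email"], "id")
bad_failures = check_batch(bad, ["id", "email"], "id")
```

Raising an exception suits an orchestrated pipeline: in Airflow, a task that raises is marked failed, and with the default trigger rule its downstream tasks do not run, so a bad batch stops before the load step.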