Tools

Warning

In this section, I filter out data engineering tools that are not dynamic and flexible enough for most data architectures and the Modern Data Stack.

Note

This section groups open-source tools based on the Modern Data Stack concept. For some topics, I found the tools through ReStack.

For each tool, I will focus on the following:

  • Setting up connections
  • Implementing its features
  • Tuning & optimization

Tools Stacks

The tool stack I choose for each data architecture should fit the budget and be easy to implement, from small to large scale.

    • Dagster or Mage.ai for orchestration (TBD)
    • Polars for lightning-fast ETL workloads
    • Delta Lake as the storage layer
    • DuckDB as the analytical SQL interface
    • Rill or Evidence for data viz (TBD)

Tools Comparison

Open Table

File Format

Data Ingestion

Modern Data Stack: Reverse ETL

Computing

Dataframe API

DuckDB vs Polars

Read More: Benchmarking Python Processing Engines: Who’s the Fastest?

Data Quality

Read More: A Guide to Open-Source Data Quality Tools in Late 2023 (https://medium.com/@brunouy/a-guide-to-open-source-data-quality-tools-in-late-2023-f9dbadbc7948)

Data Orchestration

Apache Airflow vs Mage.ai in Data Engineering