
Welcome to Data Develop & Engineer

Disclaimer: These docs add my own opinions, based on about 5 years of Data Engineering experience and experimentation (since 2019).

Important

My English grammar is not perfect because I am at an intermediate level and still practicing my writing and reading. Please keep this in mind and read with an open mind before continuing with these documents 🥹

This project collects the practices and knowledge of the Data Developer and Data Engineer areas.


Getting Started

First, Data Engineering is a critical part of the Data Lifecycle that enables organizations to manage and process large volumes of data efficiently and reliably [3]. Based on these concepts, Data Engineers should design and implement Data Pipelines and Data Management Strategies that meet the requirements of their organizations and ensure that their data is managed consistently and reliably.

What does a DE do?

A Data Engineer is someone who can develop, operate, and maintain Data Infrastructure, either on-premises or in the cloud, comprising databases, storage, compute engines, and pipelines [1].

Life Cycle of Data Engineering

Fundamentals of Data Engineering

Data Engineering is the development, implementation, and maintenance of systems and processes that take in raw data and produce high-quality, consistent information that supports downstream use cases, such as analysis and machine learning. Data engineering is the intersection of security, data management, DataOps, data architecture, orchestration, and software engineering.

A Data Engineer manages the Data Engineering Lifecycle, beginning with getting data from source systems and ending with serving data for use cases, such as analysis or machine learning.

— Joe Reis and Matt Housley in Fundamentals of Data Engineering

You can see that the stages of the lifecycle include Data Ingestion, Data Transformation, Data Serving, and Data Storage.
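To make these stages more concrete, here is a minimal, hypothetical sketch that chains them together in plain Python. The function names, the sample order records, and the use of an in-memory dict as the "storage layer" are all assumptions for illustration only; a real pipeline would read from source systems, write to object storage or a warehouse, and run under an orchestrator.

```python
from typing import Any

# Hypothetical in-memory "storage layer": zone name -> list of records.
Storage = dict[str, list[dict[str, Any]]]


def ingest() -> list[dict[str, Any]]:
    """Ingestion: pull raw records from a (hypothetical) source system."""
    return [
        {"order_id": 1, "amount": "120.50", "country": "th"},
        {"order_id": 2, "amount": "80.00", "country": "TH"},
        {"order_id": 3, "amount": "42.25", "country": "sg"},
    ]


def store(storage: Storage, zone: str, records: list[dict[str, Any]]) -> None:
    """Storage: persist records into a named zone (e.g. raw, curated)."""
    storage[zone] = list(records)


def transform(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Transformation: cast types and standardize values into new records."""
    return [
        {**r, "amount": float(r["amount"]), "country": r["country"].upper()}
        for r in records
    ]


def serve(storage: Storage, zone: str) -> dict[str, float]:
    """Serving: aggregate curated data for a downstream use case."""
    totals: dict[str, float] = {}
    for r in storage[zone]:
        totals[r["country"]] = totals.get(r["country"], 0.0) + r["amount"]
    return totals


if __name__ == "__main__":
    storage: Storage = {}
    store(storage, "raw", ingest())
    store(storage, "curated", transform(storage["raw"]))
    print(serve(storage, "curated"))  # {'TH': 200.5, 'SG': 42.25}
```

Keeping the raw zone untouched and writing transformed records to a separate curated zone means the transformation can be re-run from the original data if its logic changes.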

| Best practice | Importance |
| --- | --- |
| Proactive data monitoring | Regularly checks datasets for anomalies to maintain data integrity. This includes identifying missing, duplicate, or inconsistent data entries. |
| Schema drift management | Detects and addresses changes in data structure, ensuring compatibility and reducing data pipeline breaks. |
| Continuous documentation | Manages descriptive information about data, aiding discoverability and comprehension. |
| Data security measures | Controls and monitors access to data sources, enhancing security and compliance. |
| Version control and backups | Tracks changes to datasets over time, aiding reproducibility and audit trails. |
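The first two practices in the table can start as simple automated checks. Below is a minimal, hypothetical sketch in plain Python, assuming rows arrive as dictionaries and the expected schema is a hand-written mapping from column names to Python types. The function names and sample data are illustrative only; in practice a dedicated data-quality or schema-validation tool would usually take this role.

```python
from collections import Counter
from typing import Any

# Hypothetical record shape: one dict per row, column name -> value.
Record = dict[str, Any]


def check_quality(records: list[Record], key: str, required: list[str]) -> dict[str, list]:
    """Proactive monitoring: find missing required values and duplicate keys."""
    # (row index, column) for every required column that is absent or None.
    missing = [
        (i, col)
        for i, row in enumerate(records)
        for col in required
        if row.get(col) is None
    ]
    # Key values that appear more than once, in first-seen order.
    counts = Counter(row.get(key) for row in records)
    duplicates = [value for value, n in counts.items() if n > 1]
    return {"missing": missing, "duplicates": duplicates}


def detect_schema_drift(expected: dict[str, type], row: Record) -> dict[str, list[str]]:
    """Schema drift: compare one incoming row against the expected schema."""
    added = sorted(set(row) - set(expected))
    removed = sorted(set(expected) - set(row))
    # Columns present with a non-null value of an unexpected Python type.
    type_changed = sorted(
        col
        for col, typ in expected.items()
        if row.get(col) is not None and not isinstance(row[col], typ)
    )
    return {"added": added, "removed": removed, "type_changed": type_changed}


if __name__ == "__main__":
    rows = [
        {"id": 1, "email": "a@example.com"},
        {"id": 1, "email": None},
        {"id": 2, "email": "b@example.com"},
    ]
    print(check_quality(rows, key="id", required=["email"]))
    print(detect_schema_drift({"id": int, "email": str},
                              {"id": "3", "email": "c@example.com", "phone": "555"}))
```

Running it flags row 1 as missing an email, key 1 as duplicated, a new `phone` column, and `id` arriving as a string instead of an integer.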

When I started in this role, I got an idea of what my future responsibilities would be. I knew that Data Engineering tools would shift very fast because three years ago I started with MapReduce processing on Hadoop HDFS, but nowadays it has changed to in-memory processing engines like Impala or Spark.

The 2023 MAD (ML/AI/DATA) Landscape

The picture on the right, the 2023 MAD (ML/AI/Data) Landscape [2], shows how many possible tools you can use in your project. It covers many areas, so you should choose the tools that match your current architecture or fit your cost planning model.


Finally, the diagram below shows how the focus areas of Data Engineering shift as an analytics organization evolves. This means a Data Engineer does not only build the data ingestion or serving parts. As data engineering tools change very quickly, the focus of data engineers changes as well.

Data Engineering Shift

Based upon this illustration, we can observe three distinct focus areas for the role:

  • Data Infrastructure: One example of a problem being solved in this instance might be setting up a Spark cluster for users to issue HQL queries against data on S3.

  • Data Integration: An example task would be creating a dataset via a SQL query that joins tens of other datasets, and then scheduling the query to run daily using an orchestration framework.

  • Data Accessibility: An example could be enabling end users to analyze significant metric movements in a self-serve manner.
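The Data Integration example can be sketched with Python's built-in `sqlite3` module. This is a minimal, hypothetical illustration: the table names (`orders`, `customers`, `daily_segment_sales`) and sample rows are invented, only two source tables are joined instead of tens, and the daily schedule is simulated by calling the function directly rather than through a real orchestration framework.

```python
import sqlite3
from datetime import date


def setup_sources(conn: sqlite3.Connection) -> None:
    """Create hypothetical source and target tables with a few sample rows."""
    conn.executescript(
        """
        CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, order_date TEXT, amount REAL);
        CREATE TABLE customers (customer_id INTEGER, segment TEXT);
        CREATE TABLE daily_segment_sales (run_date TEXT, segment TEXT, total_amount REAL);
        INSERT INTO orders VALUES
            (1, 10, '2024-01-01', 100.0),
            (2, 11, '2024-01-01', 50.0),
            (3, 10, '2024-01-02', 70.0);
        INSERT INTO customers VALUES (10, 'retail'), (11, 'wholesale');
        """
    )


def build_daily_dataset(conn: sqlite3.Connection, run_date: date) -> None:
    """Rebuild one daily partition by joining the source tables."""
    ds = run_date.isoformat()
    with conn:  # commit both statements together, or roll back on error
        # Delete first so a re-run for the same day does not duplicate rows.
        conn.execute("DELETE FROM daily_segment_sales WHERE run_date = ?", (ds,))
        conn.execute(
            """
            INSERT INTO daily_segment_sales (run_date, segment, total_amount)
            SELECT o.order_date, c.segment, SUM(o.amount)
            FROM orders AS o
            JOIN customers AS c ON c.customer_id = o.customer_id
            WHERE o.order_date = ?
            GROUP BY o.order_date, c.segment
            """,
            (ds,),
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    setup_sources(conn)
    # An orchestrator would trigger this once per day; here it runs twice for the same day.
    build_daily_dataset(conn, date(2024, 1, 1))
    build_daily_dataset(conn, date(2024, 1, 1))
    print(conn.execute("SELECT * FROM daily_segment_sales ORDER BY segment").fetchall())
```

Deleting the day's partition before inserting makes each run idempotent, so a retried or backfilled run for the same date replaces that day's rows instead of duplicating them.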


Additionally, the Modern Data Stack trend makes Data Engineering much easier and gives you more time to focus on business problems. On the other hand, Business Users need less technical knowledge to interact with the data in the Data Platform, so fewer of their requests depend on a Data Engineer! 🥳

You can follow the Modern Data Stack in these topics:


Roles

In the future, if I do not fall in love with the communication or management-level skills needed to become a Lead Data Engineer, I will move into a specialized role, such as:

  • Data Platform Engineer



    Read More about Data Architect

  • DataOps Engineer



    Read More about DataOps

  • MLOps Engineer


    MLOps Engineers Build and Maintain a platform to enable the development and deployment of machine learning models. They typically do that through standardization, automation, and monitoring.

    MLOps Engineers iterate on the platform and processes to make machine learning model development and deployment quicker, more reliable, reproducible, and efficient.

    Read More about MLOps

  • Analytic Engineer


    An Analytic Engineer is someone who makes sure that companies can understand their data and use it to Solve Problems, Answer Questions, or Make Decisions.

    Read More about Analytic Engineer

I referenced the roles above from Types of Data Professionals [4].


Communities

Below is a list of communities that you must join to keep your knowledge of Developer and Data Engineer trends up to date.

  • Data Engineering


    The Medium Tag for Data Engineering knowledge and solutions

  • Data Engineer Cafe


    A discussion blog for Data Engineers, like talking to your close friends at a cafe

  • ODDS Team


    The Medium Group that believes software development should be joyful and advocates deliberate practice

  • TPA Roadmap


    Community Driven Roadmaps, Articles and Resources for developers in Thailand

  • TestDriven


    Learn to build high-quality web apps with best practices

  • Second Brain


    The Data Engineering documentation website that inspired me.


  1. This definition is referenced from What is Data Engineering?

  2. The 2023 MAD (ML/AI/DATA) Landscape 

  3. Unlocking the Power of Data: A Beginner’s Guide to Data Engineering 

  4. Types of Data Professionals, credit to Kevin Rosamont Prombo for creating the Infographic