Available, Open to Work

Building Scalable Data Infrastructure

Data Engineer specializing in high-throughput ETL pipelines, real-time analytics, and cloud architecture.

Workspace
Data Engineer

Khang Do

Building data systems that scale.

2+ Years

Professional Experience

7 Projects Delivered

Infrastructure as Code

Automated provisioning and deployment.

etl_pipeline.py (python)
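# Airflow DAG Pipeline (Airflow 2.x import paths)
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_data():
    """Placeholder: the real extract step is not shown here."""


with DAG('etl_daily',
         schedule='@daily',
         start_date=datetime(2024, 1, 1)):

    extract = PythonOperator(
        task_id='extract',
        python_callable=extract_data)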

GitHub Activity

Contributions & Commits

Active contributor
150+ commits

7

Projects Completed

ETL, Data Warehouse, Web Scraping, Automation
Projects

Data Pipeline Automation

Airflow
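# Airflow DAG Pipeline (Airflow 2.x import paths; `schedule` needs 2.4+)
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


# Placeholder callables: the real implementations are not shown here.
def extract_data():
    """Pull raw data from the source systems."""


def transform_data():
    """Clean and reshape the extracted data."""


def load_to_dwh():
    """Write the transformed data to the warehouse."""


def run_dq_checks():
    """Run data quality checks on the loaded tables."""


with DAG('etl_daily',
         schedule='@daily',
         start_date=datetime(2024, 1, 1),
         catchup=False):  # do not backfill runs missed before deployment

    extract = PythonOperator(
        task_id='extract',
        python_callable=extract_data)

    transform = PythonOperator(
        task_id='transform',
        python_callable=transform_data)

    load = PythonOperator(
        task_id='load',
        python_callable=load_to_dwh)

    validate = PythonOperator(
        task_id='validate',
        python_callable=run_dq_checks)

    # Task dependencies: run the four steps in order
    extract >> transform >> load >> validate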

Key Achievements

Career highlights

End-to-end ELT Pipeline

DBT + Airflow + BigQuery

Automated Data Sync

Real-time sync pipeline

Web Scraping System

Playwright + Python

Cloud Data Architecture

BigQuery + Fivetran

Tech Stack

Tools & Technologies

Python
SQL
Airflow / DBT
BigQuery / Snowflake
Docker / CI/CD

Technical Skills

Core competencies in data engineering, cloud infrastructure, and modern development tools.

Languages

Python: 90%
SQL: 85%
TypeScript: 75%
Bash: 70%

Data Tools

Apache Airflow: 85%
DBT: 85%
Apache Spark: 75%
Kafka: 70%

Cloud & Infra

AWS: 80%
GCP / BigQuery: 80%
Docker: 85%
Kubernetes: 65%

Databases

PostgreSQL: 85%
Snowflake: 75%
MongoDB: 70%
Redis: 70%

Featured Projects

A selection of data engineering projects showcasing end-to-end pipeline development, automation, and cloud infrastructure expertise.

Enterprise Data Sync Pipeline

End-to-end data synchronization pipeline for employee, member roles, and account management with comprehensive failure logging.

  • Designed Role/Member-Role Entity schema
  • Built Employee & Account Role sync pipeline
  • Implemented failure logging system
ETL, Database Design, Data Sync
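
As an illustration of the failure logging idea, here is a minimal sketch, not the project's actual code. It assumes records are plain dictionaries with an `id` key; `SyncResult`, `sync_records`, and `demo_upsert` are hypothetical names introduced only for this example. Each record is written independently, so one bad record is logged and collected instead of aborting the whole run.

```python
import logging
from dataclasses import dataclass, field

logger = logging.getLogger("role_sync")  # hypothetical logger name


@dataclass
class SyncResult:
    """Outcome of one sync run: ids that succeeded and failures with reasons."""
    synced: list = field(default_factory=list)
    failed: list = field(default_factory=list)  # (record_id, error message)


def sync_records(records, upsert):
    """Apply `upsert` to each record; log and collect failures instead of aborting."""
    result = SyncResult()
    for record in records:
        record_id = record.get("id")
        try:
            upsert(record)
            result.synced.append(record_id)
        except Exception as exc:  # broad on purpose: one bad record must not stop the run
            logger.error("sync failed for record %s: %s", record_id, exc)
            result.failed.append((record_id, str(exc)))
    return result


def demo_upsert(record):
    """Hypothetical target writer that rejects records without a role."""
    if not record.get("role"):
        raise ValueError("missing role")
```

Calling `sync_records(rows, demo_upsert)` returns a `SyncResult` whose `failed` list can be written to a log table or reported after the run.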

ELT Pipeline with DBT

Modern Data Stack implementation with containerized ELT system, automated workflows, and real-time data quality monitoring.

  • 100% automated daily workflows via Airflow
  • Star Schema transformation with DBT
  • Data Quality Framework (6 dimensions)
DBT, Airflow, Docker, Star Schema
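
The six dimensions of the data quality framework are not listed here. The sketch below assumes three commonly used ones (completeness, uniqueness, timeliness) and shows how each could be scored as a ratio between 0 and 1. The function names and the row format (a list of dictionaries) are assumptions made for the example, not the framework's real interface.

```python
from datetime import datetime, timedelta


def completeness(rows, column):
    """Share of rows where `column` is present and not None."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(column) is not None)
    return filled / len(rows)


def uniqueness(rows, column):
    """Share of distinct values in `column` among its non-null values."""
    values = [row[column] for row in rows if row.get(column) is not None]
    if not values:
        return 0.0
    return len(set(values)) / len(values)


def timeliness(rows, column, now, max_age):
    """Share of rows whose timestamp in `column` is no older than `max_age`."""
    if not rows:
        return 0.0
    fresh = sum(
        1 for row in rows
        if row.get(column) is not None and now - row[column] <= max_age
    )
    return fresh / len(rows)
```

A validation task could compare each score against a threshold and fail the run when a score drops below it.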

CMS Migration Automation

Test automation framework for CMS migration using SeleniumBase, enabling automated QC and content verification.

  • Automated login & page creation flows
  • QC scripts for CMS migration
  • Content consistency verification
SeleniumBase, Python, QA Automation
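
One way to approach content consistency verification is to compare normalized text fingerprints of each page before and after migration. This is a sketch under assumptions: page text is taken as already extracted into dictionaries keyed by slug (in the project that text would come from the browser automation), and `normalize`, `fingerprint`, and `find_mismatches` are hypothetical helper names.

```python
import hashlib
import re


def normalize(text):
    """Collapse whitespace and lowercase so cosmetic differences are ignored."""
    return re.sub(r"\s+", " ", text).strip().lower()


def fingerprint(text):
    """SHA-256 hex digest of the normalized text."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def find_mismatches(old_pages, new_pages):
    """Return sorted slugs that are missing from `new_pages` or whose content differs."""
    mismatches = []
    for slug, old_text in old_pages.items():
        new_text = new_pages.get(slug)
        if new_text is None or fingerprint(old_text) != fingerprint(new_text):
            mismatches.append(slug)
    return sorted(mismatches)
```

Hashing normalized text keeps the comparison cheap and ignores whitespace or casing changes introduced by the new CMS templates.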

Legal Data Platform

Backend development and refactoring for EU legal data platform, including web crawlers, authentication, and subscription systems.

  • Refactored EU legal web crawlers
  • Improved authentication & billing
  • Role-based access control system
Python, Web Scraping, Backend, RBAC
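
A role-based access control check can be reduced to a mapping from roles to permissions. The sketch below is illustrative only: the role names, permission strings, and function names are hypothetical and do not describe the platform's actual model.

```python
# Hypothetical role-to-permission mapping for illustration.
ROLE_PERMISSIONS = {
    "viewer": {"document:read"},
    "subscriber": {"document:read", "document:export"},
    "admin": {"document:read", "document:export", "user:manage"},
}


def permissions_for(roles):
    """Union of the permissions granted by every known role in `roles`."""
    granted = set()
    for role in roles:
        granted |= ROLE_PERMISSIONS.get(role, set())  # unknown roles grant nothing
    return granted


def is_allowed(roles, permission):
    """True if any of the user's roles grants `permission`."""
    return permission in permissions_for(roles)
```

Keeping permissions on roles rather than on users means access changes are made in one place when a subscription tier or staff role changes.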

Cloud Analytics Audit

Data stack audit and optimization for analytics infrastructure, including BigQuery, Fivetran, and BI tool evaluation.

  • Audited data architecture
  • Resolved cloud data warehouse connectivity issues
  • Hex vs Looker Studio comparison
BigQuery, Fivetran, Looker, Mixpanel

Experience & Growth

Career progression focusing on data engineering, cloud infrastructure, and building scalable data systems.

Data Engineering

3.74/5 - Exceeding Expectation
Talos Squad, 2024 - Present

Focusing on Modern Data Stack, architecting end-to-end containerized ELT pipelines using Airflow and DBT.

  • Built 100% automated daily data workflows
  • Implemented real-time data quality monitoring
  • Led data infrastructure design for enterprise clients

Data Pipeline Architect

Enterprise Data Sync & Cloud Audit Projects, 2024

Designed and implemented data synchronization pipelines and audited cloud data architectures for enterprise clients.

  • Designed Role/Member Entity database schema for sync pipeline
  • Resolved cloud data warehouse connectivity issues
  • Created comprehensive architecture documentation

Backend & Automation Engineer

Legal Data Platform & CMS Migration Projects, 2024

Refactored backend systems and built automation frameworks for testing and content migration.

  • Refactored web crawlers for EU legal data sources
  • Built automated QC scripts for CMS migration
  • Improved authentication & subscription billing systems

B.Sc. Information Technology

HCMC University of Technology and Education (HCMUTE), 2021 - 2025

Specialized in software development with focus on data systems and cloud computing.

  • Relevant coursework: Database Systems, Data Structures, Cloud Computing
  • Completed capstone project on data pipeline automation
  • Active member of IT Club & participated in hackathons

Learning Path

Cloud Data Engineering (Target)

AWS/GCP

CI/CD & DevOps for Data Pipelines

Self-learning

Infrastructure as Code & Kubernetes

Self-learning

Career Goal

Lead the design and implementation of end-to-end data infrastructures for complex projects. Master the Modern Data Stack ecosystem to deliver optimized, scalable solutions.

Khang Do

2+ Years

Experience

12+

Projects Delivered

99.9%

Uptime Achieved

About Me

Data Engineer passionate about building scalable systems

I specialize in designing and implementing end-to-end data infrastructures using the Modern Data Stack. My focus is on building ETL/ELT pipelines with tools like Airflow, DBT, and Spark, while ensuring data quality and system reliability.

Currently working as a data engineer, I've delivered solutions for clients across various industries, including legal tech, real estate analytics, and enterprise data management.

Vietnam
dokhang1703@gmail.com
Available for opportunities