2025 | Streaming Ingestion, Idempotent Upserts, Analytics-Ready Warehousing

Designed and implemented a CDC-style analytics pipeline to reliably capture insert, update, and delete events from an OLTP system and materialize them into analytics-ready warehouse tables. The pipeline supports replayability, fault tolerance, and historical correctness, enabling near real-time analytics without sacrificing data integrity.
CDC systems must handle out-of-order events, late-arriving updates, retries, and deletes while maintaining a consistent downstream state. Naive ingestion approaches can lead to duplicated records, missed deletes, or broken history. The challenge was to build a system that remained correct under replays and failures while supporting scalable analytics workloads.
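The correctness requirement above (the same final state regardless of ordering, retries, or replays) can be illustrated with a minimal sketch. This is not the project's actual implementation: the names `ChangeEvent`, `RowState`, `apply_event`, `materialize`, and `live_rows` are hypothetical, the `version` field assumes the source exposes a per-key monotonically increasing sequence (for example a log position), and an in-memory dict stands in for the warehouse table.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    key: str                        # primary key of the source row
    op: str                         # "insert", "update", or "delete"
    version: int                    # assumed: monotonically increasing per key at the source
    payload: Optional[dict] = None  # row image; None for deletes


@dataclass
class RowState:
    version: int
    payload: Optional[dict]
    deleted: bool                   # tombstone flag, kept so late events cannot resurrect a row


def apply_event(table: Dict[str, RowState], event: ChangeEvent) -> bool:
    """Apply one event idempotently. Returns True if the table changed."""
    current = table.get(event.key)
    # Stale or duplicate event: a newer (or identical) version is already applied.
    if current is not None and event.version <= current.version:
        return False
    is_delete = event.op == "delete"
    table[event.key] = RowState(
        version=event.version,
        payload=None if is_delete else event.payload,
        deleted=is_delete,
    )
    return True


def materialize(events: Iterable[ChangeEvent]) -> Dict[str, RowState]:
    """Fold a stream of events into table state; safe to re-run on a replayed stream."""
    table: Dict[str, RowState] = {}
    for event in events:
        apply_event(table, event)
    return table


def live_rows(table: Dict[str, RowState]) -> Dict[str, dict]:
    """Rows visible to analytics: everything without a tombstone."""
    return {k: s.payload for k, s in table.items() if not s.deleted}


if __name__ == "__main__":
    ordered = [
        ChangeEvent("a", "insert", 1, {"status": "new"}),
        ChangeEvent("a", "update", 2, {"status": "paid"}),
        ChangeEvent("a", "delete", 3),
        ChangeEvent("b", "insert", 1, {"status": "new"}),
        ChangeEvent("b", "update", 2, {"status": "shipped"}),
    ]
    # Same events, out of order and with one retry duplicated.
    scrambled = [ordered[4], ordered[2], ordered[0], ordered[1], ordered[3], ordered[4]]
    print(live_rows(materialize(ordered)) == live_rows(materialize(scrambled)))
```

Keeping the tombstone with its version, rather than dropping the row outright, is what lets a late-arriving update for an already-deleted key be rejected instead of recreating the row.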
Built a fault-tolerant CDC architecture with the following design:
Core components and tooling used:
Data Engineer
Nov 2025 – Present
1 member (solo project)
Delivered a replayable, fault-tolerant CDC pipeline with reliable upserts and deletes, preserved historical state, and near real-time freshness for downstream analytics and dashboards.