Debezium Blog

When it comes to replicating operational data for analytics, Change Data Capture (CDC) is the gold standard. It offers scalability, near real-time performance, and captures all data modifications, ensuring your analytical datasets are always up-to-date. Debezium is a leading tool in this space, connecting to a wide range of databases and exporting CDC events in various formats like JSON and Avro, making integration with diverse systems a breeze.

While Debezium itself is a Java-based project, the data engineering world increasingly relies on Python. This blog post demonstrates how to leverage Debezium within a Python environment, using pydbzengine. We’ll explore how to use these technologies to build a robust and scalable CDC solution.

In this post, we are going to talk about a CDC-CQRS pipeline between a normalized relational database, MySQL, as the command database and a de-normalized NoSQL database, MongoDB, as the query database resulting in the creation of DDD Aggregates via Debezium & Kafka-Streams.