Debezium Blog

The modern data landscape bears little resemblance to the centralized databases and simple ETL processes of the past. Today’s organizations operate in environments characterized by diverse data sources, real-time streaming, microservices architectures, and multi-cloud deployments. What began as straightforward data flows from operational systems to reporting databases has evolved into complex networks of interconnected pipelines, transformations, and dependencies. The shift from ETL to ELT patterns, the adoption of data lakes, and the proliferation of streaming platforms like Apache Kafka have created unprecedented flexibility in data processing. However, this flexibility comes at a cost: understanding how data moves, transforms, and evolves through these systems has become increasingly challenging.

Understanding data lineage

Data lineage is the process of tracking the flow and transformations of data from its origin to its final destination. It essentially maps the "life cycle" of data, showing where it comes from, how it’s changed, and where it ends up within a data pipeline. This includes documenting all transformations, joins, splits, and other manipulations the data undergoes during its journey.

At its core, data lineage answers critical questions: Where did this data originate? What transformations has it undergone? Which downstream systems depend on it? When issues arise, where should teams focus their investigation?

A year ago we began this incredible journey to create a modern approach to Change Data Capture. We wanted to build a tool that lets you focus on your data, defining how it flows from sources to destinations with a pipeline-based approach, all paired with a new, modern user interface that simplifies interaction with it.

We named it Debezium Management Platform, or if you prefer, just Debezium Platform.

We are excited that Debezium 3.1 is the first official release of this year-long effort.

When it comes to replicating operational data for analytics, Change Data Capture (CDC) is the gold standard. It scales well, delivers near real-time performance, and captures every data modification, ensuring your analytical datasets are always up to date. Debezium is a leading tool in this space, connecting to a wide range of databases and exporting CDC events in formats like JSON and Avro, which makes integration with diverse systems a breeze.

While Debezium itself is a Java-based project, the data engineering world increasingly relies on Python. This blog post demonstrates how to leverage Debezium within a Python environment, using pydbzengine. We’ll explore how to use these technologies to build a robust and scalable CDC solution.
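To give a feel for what that looks like, here is a minimal sketch of running the embedded Debezium engine from Python. The class and method names (DebeziumJsonEngine, BasePythonChangeHandler, handleJsonBatch) follow the pydbzengine README and may differ across releases, and the connector properties and credentials are placeholder values, so treat this as an illustration rather than a drop-in program.

```python
from typing import List

from pydbzengine import ChangeEvent, BasePythonChangeHandler
from pydbzengine import Properties, DebeziumJsonEngine


class PrintChangeHandler(BasePythonChangeHandler):
    """Receives batches of CDC events from the embedded Debezium engine."""

    def handleJsonBatch(self, records: List[ChangeEvent]):
        # Each record exposes its destination (topic), key, and value as JSON strings.
        for record in records:
            print(f"destination: {record.destination()}")
            print(f"key: {record.key()}")
            print(f"value: {record.value()}")


if __name__ == "__main__":
    # Standard Debezium engine/connector properties; hostnames, credentials,
    # and file paths below are placeholders for this sketch.
    props = Properties()
    props.setProperty("name", "engine")
    props.setProperty("connector.class", "io.debezium.connector.postgresql.PostgresConnector")
    props.setProperty("offset.storage", "org.apache.kafka.connect.storage.FileOffsetBackingStore")
    props.setProperty("offset.storage.file.filename", "/tmp/offsets.dat")
    props.setProperty("topic.prefix", "inventory")
    props.setProperty("database.hostname", "localhost")
    props.setProperty("database.port", "5432")
    props.setProperty("database.user", "postgres")
    props.setProperty("database.password", "postgres")
    props.setProperty("database.dbname", "postgres")

    # Start the engine; change events are delivered to the handler in batches.
    engine = DebeziumJsonEngine(properties=props, handler=PrintChangeHandler())
    engine.run()
```

Because the handler is plain Python, the same pattern can hand batches off to whatever downstream tooling your pipeline already uses, such as a DataFrame library or an object store writer.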

In this post, we are going to walk through a CDC-CQRS pipeline that uses a normalized relational database, MySQL, as the command database and a denormalized NoSQL database, MongoDB, as the query database, building DDD aggregates along the way with Debezium and Kafka Streams.
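The post itself builds the aggregates with a Kafka Streams topology in Java; the sketch below is only a simplified Python stand-in, using a plain Kafka consumer and pymongo, to show the general shape of folding command-side CDC events into a query-side aggregate. The topic names, table layout, and collection name are hypothetical, and it assumes the default Debezium JSON envelope (with the change row under payload.after); it also ignores deletes and reprocessing, which a real pipeline must handle.

```python
import json

from confluent_kafka import Consumer
from pymongo import MongoClient

# Hypothetical CDC topics produced by Debezium from the command-side MySQL tables.
ORDER_TOPIC = "commanddb.inventory.orders"
ORDER_LINE_TOPIC = "commanddb.inventory.order_lines"

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "order-aggregator",
    "auto.offset.reset": "earliest",
})
consumer.subscribe([ORDER_TOPIC, ORDER_LINE_TOPIC])

# Query-side store: one denormalized document per order.
orders = MongoClient("mongodb://localhost:27017")["querydb"]["order_aggregates"]

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue

        # Debezium JSON envelope: the changed row is under payload.after.
        payload = json.loads(msg.value())["payload"]
        after = payload.get("after")
        if after is None:
            continue  # deletes/tombstones are ignored in this sketch

        if msg.topic() == ORDER_TOPIC:
            # Upsert the order header fields into the aggregate document.
            orders.update_one(
                {"_id": after["id"]},
                {"$set": {"customer_id": after["customer_id"],
                          "order_date": after["order_date"]}},
                upsert=True)
        else:
            # Fold each order line into the embedded "lines" array of its parent order.
            orders.update_one(
                {"_id": after["order_id"]},
                {"$push": {"lines": after}},
                upsert=True)
finally:
    consumer.close()
```

The appeal of the Kafka Streams approach described in the post is that this join and fold logic lives in a stateful, fault-tolerant topology rather than in hand-rolled consumer code like the above.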