Debezium is a distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect-compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.
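The ability to stop and restart without missing anything comes from the log itself: change events are appended and retained, and each consumer only has to remember how far it has read. The Python sketch below is a deliberately simplified, in-memory stand-in for a Kafka topic. The `ChangeLog` class and its `append`/`poll` methods are hypothetical and are not Kafka's or Debezium's API. It shows a consumer catching up on events that were recorded while it was stopped.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ChangeLog:
    """Hypothetical in-memory stand-in for an append-only, retained topic."""
    events: List[str] = field(default_factory=list)
    committed: Dict[str, int] = field(default_factory=dict)  # consumer name -> next offset to read

    def append(self, event: str) -> None:
        # Events are only ever appended, never modified or removed.
        self.events.append(event)

    def poll(self, consumer: str) -> List[str]:
        """Return every event the consumer has not seen yet and commit its new offset."""
        start = self.committed.get(consumer, 0)
        batch = self.events[start:]
        self.committed[consumer] = len(self.events)
        return batch


log = ChangeLog()
log.append("insert customers id=1001")
first = log.poll("app")                  # the application is running
log.append("update customers id=1001")   # the application is stopped here...
log.append("delete customers id=1001")
missed = log.poll("app")                 # ...and catches up after it restarts
```

Committing the offset as soon as events are read keeps the sketch short; a real consumer would normally commit only after it has finished processing a batch, so that a crash mid-batch leads to reprocessing rather than lost events.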

The good news is that Debezium 0.1 is now available and includes several significant features:

  • A MySQL connector that monitors MySQL databases. It’s a Kafka Connect source connector, so simply install it into a Kafka Connect service (see below) and use the service’s REST API to configure and manage a connector for each MySQL server. The connector reads the MySQL binlog and generates data change events for every committed row-level modification in the monitored databases. Each event reflects the table’s structure at the time the row was changed, and the connector automatically handles changes to table structures.

  • A small library that lets applications embed any Kafka Connect connector and consume data change events read directly from the source system. This is a much lighter-weight approach, since no ZooKeeper, Kafka, or Kafka Connect services are needed, but it is also less fault tolerant and reliable: the application must itself maintain the state normally kept in Kafka’s distributed and replicated logs, so it becomes completely responsible for managing all of that state.
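The MySQL connector bullet notes that each event reflects the table’s structure at the time the row changed, even if the table has been altered since. One way to picture this is a history of each table’s columns keyed by log position, consulted whenever a row change is read. The Python sketch below is only an illustration of that idea, not the connector’s implementation: `SchemaHistory` and `to_event` are hypothetical names, and the integer positions stand in for binlog coordinates.

```python
import bisect
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SchemaHistory:
    """Hypothetical record of each table's column list, keyed by log position."""
    changes: Dict[str, List[Tuple[int, List[str]]]] = field(default_factory=dict)

    def record_ddl(self, table: str, position: int, columns: List[str]) -> None:
        # Assumes DDL is recorded in increasing position order, as it is read from the log.
        self.changes.setdefault(table, []).append((position, columns))

    def columns_at(self, table: str, position: int) -> List[str]:
        """Return the column list in effect for `table` at `position`."""
        history = self.changes[table]
        positions = [p for p, _ in history]
        # Find the latest DDL change at or before this position.
        i = bisect.bisect_right(positions, position) - 1
        if i < 0:
            raise KeyError(f"no schema known for {table} at position {position}")
        return history[i][1]


def to_event(history: SchemaHistory, table: str, position: int, values: tuple) -> dict:
    """Pair raw row values with the column names that were valid when the row changed."""
    return dict(zip(history.columns_at(table, position), values))


history = SchemaHistory()
history.record_ddl("customers", 100, ["id", "name"])
history.record_ddl("customers", 200, ["id", "name", "email"])  # e.g. ALTER TABLE ... ADD email

old = to_event(history, "customers", 150, (1001, "Sally"))
new = to_event(history, "customers", 250, (1001, "Sally", "sally@example.com"))
```

A row changed at position 150 is interpreted with the two-column structure, while one changed at 250 picks up the added `email` column, so replaying older changes never misreads them through the current table definition.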

Although Debezium is really intended to be used as a set of turnkey services, all of Debezium’s JARs and other artifacts are available in Maven Central. Detailed information about the features, tasks, and bugs is outlined in our release notes.

To make it easier to use a Debezium connector inside your own Kafka Connect service, we created a plugin archive (in both zip and tar.gz formats) containing all of the JARs the connector uses that are not already included in Kafka Connect 0.9.0.1. Simply download the archive, extract it into your Kafka Connect 0.9.0.1 installation, and add all of its JARs to the service’s classpath. Once the service is restarted, you can use the REST API to configure and manage connector instances that monitor the databases of your choice. The MySQL connector plugin archive is also in Maven Central, so you can even use Maven to build a customized Kafka Connect service. We’ll generate these plugins for future connectors, too.
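Registering a connector instance means POSTing a JSON document with a `name` and a `config` map to the Kafka Connect service’s `/connectors` endpoint. The Python sketch below builds such a request with the standard library. It assumes the service listens on `localhost:8083` (Kafka Connect’s default REST port); the connector name, host, and credentials are hypothetical, and apart from `connector.class` and `tasks.max` the property names are assumptions that should be checked against the MySQL connector documentation for the release you install.

```python
import json
import urllib.request

# Assumed address of a local Kafka Connect service.
CONNECT_URL = "http://localhost:8083"


def registration_request(name: str, config: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /connectors request that creates a connector."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        CONNECT_URL + "/connectors",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def register(request: urllib.request.Request) -> dict:
    """Send the request; needs a running Kafka Connect service with the plugin installed."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical connection values; property names other than connector.class
# and tasks.max are assumptions, not taken from this announcement.
mysql_config = {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "tasks.max": "1",
    "database.hostname": "mysql",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "dbz",
    "database.server.id": "184054",
    "database.server.name": "mysql-server-1",
}

request = registration_request("inventory-connector", mysql_config)
```

Calling `register(request)` against a running service should return the new connector’s description, and a `GET` on the same `/connectors` endpoint lists the connectors that are currently registered.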

The Debezium platform has a lot of moving parts: ZooKeeper, Kafka, and Kafka Connect. To make it much easier for you to try it out and play with it, we created Docker images and a tutorial that walks you through using Debezium. First, it has you use Docker to start a container for each of these services and a MySQL server with an example "inventory" database. It then shows you how to use the RESTful API to register a connector that monitors the inventory database, how to watch the streams of data changes for various tables, and how changing the database produces new change events with very low latency. It also walks you through shutting down the Kafka Connect service, changing data while the service is not monitoring the database, and then restarting the Kafka Connect service to see that all of the data changes made while the service was not running are still captured correctly in the streams. This tutorial really is a great way to interactively learn the basics of Debezium and change data capture.

We hope you find Debezium interesting and useful, and that you want to give it a try. Follow us on Twitter @debezium, chat with us on Gitter, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve the MySQL connector and add more connectors. If you find problems or have ideas about how we can improve Debezium, please let us know or log an issue. We plan to release 0.2 very soon with at least one additional connector.

Thanks to Emmanuel, Chris, Akshath, James, and Paul for their help with the release, questions, and discussions!

Randall Hauch

Randall is an open source software developer at Red Hat, and has been working in data integration for almost 20 years. He is the founder of Debezium and has worked on several other open source projects. He lives in Edwardsville, IL, near St. Louis.
