Debezium Blog

Change data capture is a hot topic. Debezium’s goal is to make change data capture easy for multiple DBMSes, but admittedly we’re still a young open source project, and so far we’ve only released a connector for MySQL, with a connector for MongoDB just around the corner. So it’s great to see how others are using and implementing change data capture. In this post, we’ll review Yelp’s approach and see how it is strikingly similar to Debezium’s MySQL connector.

I’m happy to announce that Debezium 0.2.3 is now available for use with Kafka Connect 0.9.0.1. This release corrects the MySQL connector behavior when working with TINYINT and SMALLINT columns or with TIME, DATE, and TIMESTAMP columns. See our release notes for details of these changes and for upgrading recommendations.

We’ve also updated the Debezium Docker images (with label 0.2 and latest) used in our tutorial.

Thanks to Chris, Christian, Laogang, and Tony for their help with the release, issues, discussions, contributions, and questions! Stay tuned for our next release, 0.3, which will add a new MongoDB connector and support Kafka Connect 0.10.0.0.

I’m happy to announce that Debezium 0.2.2 is now available. This release fixes several bugs in the MySQL connector that can produce change events with incorrect source metadata, and eliminates the possibility of a poorly timed connector crash causing the connector to process only some of the rows in a multi-row MySQL event. See our release notes for details of these changes and for upgrading recommendations.

Also, thanks to a community member for reporting that Debezium 0.2.x can only be used with Kafka Connect 0.9.0.1. Debezium 0.2.x cannot be used with Kafka Connect 0.10.0.0 because of its backward-incompatible changes to the consumer API. Our next release of Debezium will support Kafka 0.10.x.

We’ve also updated the Debezium Docker images (with label 0.2 and latest) used in our tutorial.

I’m happy to announce that Debezium 0.2.1 is now available. The MySQL connector has been significantly improved: it can now monitor and produce change events for HA MySQL clusters using GTIDs, performs a consistent snapshot when starting up for the first time, and has a completely redesigned event message structure that provides a ton more information with every event. Our change log has all the details about bugs, enhancements, new features, and backward compatibility notices. We’ve also updated our tutorial.

Update (Oct. 11 2019): An alternative, and much simpler, approach for running Debezium (and Apache Kafka and Kafka Connect in general) on Kubernetes is to use a K8s operator such as Strimzi. You can find instructions for the set-up of Debezium on OpenShift here, and similar steps apply for plain Kubernetes.

Our Debezium Tutorial walks you step by step through using Debezium by installing, starting, and linking together all of the Docker containers running on a single host machine. Of course, you can use things like Docker Compose or your own scripts to make this easier, although that would just automate running all the containers on a single machine. What you really want is to run the containers on a cluster of machines. In this post, we’ll run Debezium using a container cluster manager from Red Hat and Google called Kubernetes.

Kubernetes is a container (Docker/Rocket/Hyper.sh) cluster management tool. Like many other popular cluster management and compute resource scheduling platforms, Kubernetes' roots are in Google, who is no stranger to running containers at scale. They start, stop, and cluster 2 billion containers per week and they contributed a lot of the Linux kernel underpinnings that make containers possible. One of their famous papers talks about an internal cluster manager named Borg. With Kubernetes, Google got tired of everyone implementing their papers in Java so they decided to implement this one themselves :)

Kubernetes is written in Go and is quickly becoming the de facto API for scheduling, managing, and clustering containers at scale. This post isn’t intended to be a primer on Kubernetes, so we recommend heading over to the Getting Started docs to learn more about Kubernetes.