Debezium Blog

When I first learned about the Debezium project last year, I was very excited about it right away.

I could see how this project would be very useful for many people out there and I was very impressed by the professional way it was set up: a solid architecture for change data capture based on Apache Kafka, a strong focus on robustness and correctness even in the case of failures, and the overall idea of creating a diverse ecosystem of CDC connectors. All that based on the principles of open source, combined with extensive documentation from day one, a friendly and welcoming website and a great getting-started experience.

So you can imagine that I was more than enthusiastic about the opportunity to take over the role of Debezium’s project lead. Debezium and CDC have close links to some data-centric projects I’ve previously worked on and also tie in with ideas I’ve been pursuing around CQRS, event sourcing and denormalization. As a core member of the Hibernate team at Red Hat, I’ve implemented the initial Elasticsearch support for Hibernate Search (which deals with full-text index updates via JPA/Hibernate). I’ve also contributed to Hibernate OGM - a project which connects JPA and the world of NoSQL. One of the plans for OGM is to create a declarative denormalization engine for creating read models optimized for specific use cases. It will be very interesting to see how this plays together with the capabilities provided by Debezium.

Just before I started the Debezium project in early 2016, Martin Kleppmann gave several presentations about turning the database inside out and how his Bottled Water project demonstrated the important role that change data capture can play in using Kafka for stream processing. Then Kafka Connect was announced, and at that point it seemed obvious to me that Kafka Connect was the foundation upon which practical and reusable change data capture could be built. As these techniques and technologies were becoming more important to Red Hat, I was given the opportunity to start a new open source project and community around building great CDC connectors for a variety of database management systems.

Over the past few years, we have created Kafka Connect connectors for MySQL, then MongoDB, and most recently PostgreSQL. Each was initially limited and had a number of problems and issues, but over time more and more people have tried the connectors, asked questions, answered questions, mentioned Debezium on Twitter, tested connectors in their own environments, reported problems, fixed bugs, discussed limitations and potential new features, implemented enhancements and new features, improved the documentation, and written blog posts. Simply put, people with similar needs and interests have worked together and have formed a community. Additional connectors for Oracle and SQL Server are in the works, but could use some help to move things along more quickly.

It’s really exciting to see how far we’ve come and how the Debezium community continues to evolve and grow. And it’s perhaps as good a time as any to hand the reins over to someone else. In fact, after nearly 10 wonderful years at Red Hat, I’m making a bigger change and as of today am part of Confluent’s engineering team, where I expect to play a more active role in the broader Kafka community and work more directly with Kafka Connect and Kafka Streams. I definitely plan to stay involved in the Debezium community, but will no longer be leading the project. That role will instead be filled by Gunnar Morling, who’s recently joined the Debezium community but has extensive experience in open source, the Hibernate community, and the Bean Validation specification effort. Gunnar is a great guy and an excellent developer, and will be an excellent lead for the Debezium community.

We’re happy to announce that Debezium 0.5.0 is now available for use with Kafka Connect 0.10.2.0. This release also includes a few fixes for the MySQL connector. See the release notes for specifics on these changes, and be sure to check out the Kafka documentation for compatibility with the version of the Kafka broker that you are using.

Kafka Connect 0.10.2.0 comes with a significant new feature called Single Message Transforms, and you can now use them with Debezium connectors. SMTs allow you to modify the messages produced by Debezium connectors and any other Kafka Connect source connectors, before those messages are written to Kafka. SMTs can also be used with Kafka Connect sink connectors to modify the messages before the sink connectors process them. You can use SMTs to filter out or mask specific fields, add new fields, modify existing fields, change the topic and/or topic partition to which the messages are written, and even more. And you can even chain multiple SMTs together.
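To make the chaining idea concrete, here is a minimal sketch of what a connector configuration with two built-in SMTs might look like. It is written as a small Python script that prints the JSON you would submit to Kafka Connect. The connector name, the aliases `route` and `tag`, the `dbserver1.inventory` topic prefix, and the `origin` field are assumptions made up for this illustration, and the connection settings a real connector needs are left out.

```python
import json

# Illustrative only: the name, aliases, topic prefix and field values below
# are hypothetical, and required connection settings are omitted.
connector_config = {
    "name": "inventory-connector",  # hypothetical connector name
    "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        # Transforms are applied in the order their aliases are listed here.
        "transforms": "route,tag",
        # Built-in RegexRouter: rewrites the topic name of each record.
        "transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
        "transforms.route.regex": "dbserver1\\.inventory\\.(.*)",
        "transforms.route.replacement": "inventory_$1",
        # Built-in InsertField: adds a static top-level field to each value.
        "transforms.tag.type": "org.apache.kafka.connect.transforms.InsertField$Value",
        "transforms.tag.static.field": "origin",
        "transforms.tag.static.value": "debezium",
    },
}

# Render the configuration as JSON, the format accepted by the Connect REST API.
rendered = json.dumps(connector_config, indent=2)
print(rendered)
```

Each alias listed under `transforms` gets its own `transforms.<alias>.*` properties, which is how several SMTs are chained on one connector.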

Kafka Connect comes with a number of built-in SMTs that you can simply configure and use, but you can also create your own SMT implementations to do more complex and interesting things. For example, although Debezium connectors normally map all of the changes in each table (or collection) to separate topics, you can write a custom SMT that uses a completely different mapping between tables and topics and even add fields to message keys and/or values. Using your new SMT is also very easy - simply put it on the Kafka Connect classpath and update the connector configuration to use it.
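As a rough illustration of the kind of logic such a custom SMT could contain, the sketch below reroutes every table's changes for a database into one shared topic and adds the table name to both key and value. It is plain Python that models a record as a `(topic, key, value)` tuple, so it shows the mapping logic only. A real SMT would be a Java class implementing Kafka Connect's `Transformation` interface, and the `all_changes` topic suffix and `__table` field name are hypothetical choices made for this example.

```python
from typing import Any, Dict, Tuple

# Simplified stand-in for a Connect record: (topic, key, value).
Record = Tuple[str, Dict[str, Any], Dict[str, Any]]


def reroute(record: Record) -> Record:
    """Send all tables of a database to one topic, tagging key and value.

    Assumes topic names of the form "<server>.<database>.<table>".
    """
    topic, key, value = record
    server, database, table = topic.split(".", 2)
    # Hypothetical target topic shared by every table in the database.
    new_topic = f"{server}.{database}.all_changes"
    # Add the table name to the key so that rows from different tables
    # with the same primary key do not collide in the shared topic.
    new_key = {**key, "__table": table}
    new_value = {**value, "__table": table}
    return new_topic, new_key, new_value


if __name__ == "__main__":
    print(reroute(("dbserver1.inventory.customers", {"id": 1001}, {"op": "c"})))
```

Adding the table name to the key matters here because, once several tables share a topic, the original primary key alone no longer identifies a row uniquely.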

We’ve also added Debezium Docker images labelled 0.5 and latest, which we use in our tutorial.

Thanks to Sanjay and everyone in the community for their help with this release, issues, discussions, contributions, and questions!

We’re happy to announce that Debezium 0.4.1 is now available for use with Kafka Connect 0.10.1.1. This release includes several fixes for the MongoDB connector and MySQL connector, including improved support for Amazon RDS and Amazon Aurora (MySQL compatibility). See the release notes for specifics on these changes.

We’ve also updated the Debezium Docker images labelled 0.4 and latest, which we use in our tutorial.

Thanks to Jan, Horia, David, Josh, Johan, Sanjay, Saulius, and everyone in the community for their help with this release, issues, discussions, contributions, and questions!

This post originally appeared on the WePay Engineering blog.

Change data capture has been around for a while, but some recent developments in technology have given it new life. Notably, using Kafka as a backbone to stream your database data in real time has become increasingly common.

If you’re wondering why you might want to stream database changes into Kafka, I highly suggest reading The Hardest Part About Microservices: Your Data. At WePay, we wanted to integrate our microservices and downstream datastores with each other, so every system could get access to the data that it needed. We use Kafka as our data integration layer, so we needed a way to get our database data into it.

Last year, Yelp’s engineering team published an excellent series of posts on their data pipeline. These included a discussion on how they stream MySQL data into Kafka. Their architecture involves a series of homegrown pieces of software to accomplish the task, notably schematizer and MySQL streamer. The write-up triggered a thoughtful post on Debezium’s blog about a proposed equivalent architecture using Kafka Connect, Debezium, and Confluent’s schema registry. This proposed architecture is what we’ve been implementing at WePay, and this post describes how we leverage Debezium and Kafka Connect to stream our MySQL databases into Kafka.