Debezium Blog

In November last year, we announced we were looking for reinforcements for the team. And I have two pieces of news for you today: a good one and an even better one.

As you are probably well aware, Gunnar Morling has stepped down from his position as Debezium project lead and is now pursuing new exciting adventures. It is sad, but every cloud has a silver lining!

What can it be? We (the Debezium team and Red Hat) are hiring! Are you a community contributor? Do you have any pull requests under your belt? Are you a happy Debezium user and eager to do more, or are you a seasoned Java developer looking for work in an exciting and inclusive open-source environment?

Some time in early 2017, I got a meeting invite from Debezium’s founder, Randall Hauch. He was about to begin a new chapter in his professional career and was looking for someone to take over as the project lead for Debezium. So we hopped on a call to talk things through, and I was immediately sold on the concept of change data capture, its large number of potential use cases and applications, and the idea of making this available to the community as open-source. After some short consideration I decided to take up this opportunity, and without a doubt this has been one of the best decisions I’ve ever made in my job.

When developing tests for your project, sooner or later you will probably run into a situation where some of the tests fail randomly. These tests, also known as flaky tests, are very unpleasant, as you never know whether a failure was random or whether there is a regression in your code. In the worst case you just ignore these tests because you know they are flaky. Most testing frameworks even have a dedicated annotation or other means to express that a test is flaky and that its failure should be ignored. The value of such a test is very questionable. The best thing you can do with such a test is of course to fix it so that it doesn’t fail randomly. That’s easy to say, but harder to do. The hardest part is usually making the test fail in your development environment so that you can debug it and understand the root cause of the failure. In this blog post I’ll try to show a few techniques which may help you simulate random test failures on your local machine.

As you probably noticed, we have started work on Debezium 2.0. One of the planned changes for the 2.0 release is to switch to Java 11 as a baseline. While some Java build providers still support Java 8, other Java 8 distributions have already reached their end of life/support. Users are moving to Java 11 anyway, as surveys like New Relic’s State of the Java Ecosystem Report indicate. But it is not only a matter of support: Java 11 comes with various performance improvements, useful tools like JDK Flight Recorder, which was open-sourced in Java 11, and more. So we felt it was about time to start thinking about using a more recent JDK as the baseline for Debezium, and the new major release is a natural milestone at which to make the switch.

Hi everyone, my name is Vojtěch Juránek and I recently joined the Debezium team.

Most of my professional IT career I’ve spent at Red Hat. I have a background in particle physics, but I did quite a lot of programming even before joining Red Hat, when working on simulations of high-energy particle collisions and their data analysis. Science is open by default, and all the software I was using was open source as well. That is where my love for open source started.

At ScyllaDB, we develop Scylla, a high-performance NoSQL database that is API-compatible with Apache Cassandra, Amazon DynamoDB and Redis. Earlier this year, we introduced support for Change Data Capture in Scylla 4.3. This new feature seemed like a perfect match for integration with the Apache Kafka ecosystem, so we developed the Scylla CDC Source Connector using the Debezium framework. In this blog post we will cover the basic structure of Scylla’s CDC, the reasons we chose the Debezium framework, and the design decisions we made.

Welcome to the latest edition of "Debezium Community Stories With…", a series of interviews with members of the Debezium and change data capture community, such as users, contributors or integrators. Today it’s my pleasure to talk to Sergei Morozov.

Welcome to the newest edition of the Debezium community newsletter, in which we share all things CDC related including blog posts, group discussions, as well as StackOverflow questions that are relevant to our user community.

It’s been a long time since our last edition. But we are back again! In case you missed our last edition, you can check it out here.

Hello everyone, my name is Anisha Mohanty and I recently joined Red Hat and the Debezium team.

I started my journey with Red Hat in April 2020 after graduating. I was introduced to open source in my early college days, but I wasn’t aware of how organizations work and wanted to get the essence of open source ethics and values. That is something I have been fascinated to learn since joining Red Hat.

My work started under the Data Virtualization team with Teiid and then under the GRAPHQLCRUD project which is a standard for a generic query interface on top of GraphQL. The project has started well and is in great shape right now. We have successfully added CRUD capabilities, paging, and filtering specifications.

Coming to Debezium, I first heard about it when some DV members started contributing here; back then it was completely new to me. I started exploring more, and it was not long before I had my first interaction with Gunnar and Jiri. With a warm welcome and a great team here, I am really excited to work with the Debezium Community.

Over the last five years, Debezium has become a leading open-source solution for change data capture for a variety of databases. Users from all kinds of industries work with Debezium for use cases like replication of data from operational databases into data warehouses, updating caches and search indexes, driving streaming queries via Kafka Streams or Apache Flink, synchronizing data between microservices, and many more.

When talking to Debezium users, we generally receive very good feedback on the range of applications enabled by Debezium and its flexibility: e.g. each connector can be configured and fine-tuned in many ways, depending on your specific requirements. A large number of metrics provide deep insight into the state of running Debezium connectors, allowing CDC pipelines to be operated safely even in huge installations with thousands of connectors.

All this comes at the cost of a learning curve, though: users new to Debezium need to understand the different options and settings as well as learn about best practices for running Debezium in production. We’re therefore constantly exploring how the user experience of Debezium can be further improved, allowing people to set up and operate its connectors more easily.

Welcome to the first edition of "Debezium Community Stories With…", a new series of interviews with members of the Debezium and change data capture community, such as users, contributors or integrators. We’re planning to publish more parts of this series in a loose rhythm, so if you’d like to be part of it, please let us know. In today’s edition it’s my pleasure to talk to Renato Mefi, a long-time Debezium user and contributor.

Hello everyone, my name is René Kerner and I recently joined Red Hat and the Debezium team.

I had been working at trivago since 2011, and in 2016 we started using Debezium at version 0.4/0.5 to capture clickstreams in the offshore datacenters into Kafka and aggregate them in the central cluster. We really intensified Debezium usage within one year, and in 2017 we also used it for trivago’s main data.

In 2014 I made my first OSS contributions to Composer, PHP’s dependency manager, and gave my first talk on it at the Developer Conference (called code.talks for many years now). Then in 2017 I made my first contributions to Debezium with work on the MySQL snapshot process and a fix for a MySQL TIME data type issue.

In 2018 I left trivago and started working at Codecentric as a consultant for software architecture and development (mainly JVM focus) and Apache Kafka, doing many trainings and workshops at German "Fortune 500" companies (insurances, industrial sector, media). I was doing lots of networking at that time, where I learned how awesome the community around Kafka is. I was always quite sad I didn’t have more time to focus on OSS projects.

Welcome to the latest edition of the Debezium community newsletter, in which we share all things CDC related including blog posts, group discussions, as well as StackOverflow questions that are relevant to our user community.

In case you missed our last edition, you can check it out here.

Welcome to the Debezium community newsletter in which we share all things CDC related including blog posts, group discussions, as well as StackOverflow questions that are relevant to our user community.

This past summer has been a super exciting time for the team. Not only have we been working hard on Debezium 0.10 but we have unveiled some recent changes to debezium.io.

Welcome to the first edition of the Debezium community newsletter in which we share blog posts, group discussions, as well as StackOverflow questions that are relevant to our user community.

Hello everyone, my name is Chris Cranford and I recently joined the Debezium team.

My journey at Red Hat began just over three years ago; however, I have been in this line of work for nearly twenty years. All throughout my career, I have advocated and supported open source software. Many of my initial software endeavors were based on open source software, several of which are still heavily used today, such as Hibernate ORM.

When I first learned about the Debezium project last year, I was very excited about it right away.

I could see how this project would be very useful for many people out there and I was very impressed by the professional way it was set up: a solid architecture for change data capture based on Apache Kafka, a strong focus on robustness and correctness also in the case of failures, the overall idea of creating a diverse eco-system of CDC connectors. All that based on the principles of open source, combined with extensive documentation from day one, a friendly and welcoming web site and a great getting-started experience.

So you can imagine that I was more than enthusiastic about the opportunity to take over the role of Debezium’s project lead. Debezium and CDC have close links to some data-centric projects I’ve previously worked on and also tie in with ideas I’ve been pursuing around CQRS, event sourcing and denormalization. As a core member of the Hibernate team at Red Hat, I’ve implemented the initial Elasticsearch support for Hibernate Search (which deals with full-text index updates via JPA/Hibernate). I’ve also contributed to Hibernate OGM - a project which connects JPA and the world of NoSQL. One of the plans for OGM is to create a declarative denormalization engine for creating read models optimized for specific use cases. It will be very interesting to see how this plays together with the capabilities provided by Debezium.

Just before I started the Debezium project in early 2016, Martin Kleppmann gave several presentations about turning the database inside out and how his Bottled Water project demonstrated the important role that change data capture can play in using Kafka for stream processing. Then Kafka Connect was announced, and at that point it seemed obvious to me that Kafka Connect was the foundation upon which practical and reusable change data capture can be built. As these techniques and technologies were becoming more important to Red Hat, I was given the opportunity to start a new open source project and community around building great CDC connectors for a variety of database management systems.

Over the past few years, we have created Kafka Connect connectors for MySQL, then MongoDB, and most recently PostgreSQL. Each was initially limited and had a number of problems and issues, but over time more and more people have tried the connectors, asked questions, answered questions, mentioned Debezium on Twitter, tested connectors in their own environments, reported problems, fixed bugs, discussed limitations and potential new features, implemented enhancements and new features, improved the documentation, and written blog posts. Simply put, people with similar needs and interests have worked together and have formed a community. Additional connectors for Oracle and SQL Server are in the works, but could use some help to move things along more quickly.

It’s really exciting to see how far we’ve come and how the Debezium community continues to evolve and grow. And it’s perhaps as good a time as any to hand the reins over to someone else. In fact, after nearly 10 wonderful years at Red Hat, I’m making a bigger change and as of today am part of Confluent’s engineering team, where I expect to play a more active role in the broader Kafka community and more directly with Kafka Connect and Kafka Streams. I definitely plan to stay involved in the Debezium community, but will no longer be leading the project. That role will instead be filled by Gunnar Morling, who’s recently joined the Debezium community but has extensive experience in open source, the Hibernate community, and the Bean Validation specification effort. Gunnar is a great guy and an excellent developer, and will be an excellent lead for the Debezium community.