Debezium Blog

Welcome to the newest edition of the Debezium community newsletter, in which we share all things CDC related including blog posts, group discussions, as well as StackOverflow questions that are relevant to our user community.

It’s been a long time since our last edition. But we are back again! In case you missed our last edition, you can check it out here.

Hello everyone, my name is Anisha Mohanty and I recently joined Red Hat and the Debezium team.

I started my journey with Red Hat in April 2020 after completing my graduation. I was introduced to open source in my early college days, but I wasn’t aware of how organizations work and wanted to get the essence of open source ethics and values. That is something that I am fascinated to learn as I joined Red Hat.

My work started under the Data Virtualization team with Teiid and then under the GRAPHQLCRUD project which is a standard for a generic query interface on top of GraphQL. The project has started well and is in great shape right now. We have successfully added CRUD capabilities, paging, and filtering specifications.

Coming to Debezium, I first heard about it as some DV members started contributing here, well back then it was a completely new thing for me. I started exploring more, and it was not long when I had my first interaction with Gunnar and Jiri. With a warm welcome and great team here, I am really excited to work with the Debezium Community.

Create new topics / pipes

When you are working with Kafka Connect Distributed then you might have realized that once you start Kafka Connect there are already some internal Kafka Connect related topics created for you:

$ kafka-topics.sh --bootstrap-server $HOSTNAME:9092 --list

connect_configs
connect_offsets
connect_statuses

This is done automatically for you by Kafka Connect with a sane, customized default topic configuration that fits the needs of these internal topics.

When you start a Debezium connector the topics for the captured events are created by the Kafka broker based on a default, maybe customized, configuration in the broker if auto.create.topics.enable = true is enabled in the broker config:

auto.create.topics.enable = true
default.replication.factor = 1
num.partitions = 1
compression.type = producer
log.cleanup.policy = delete
log.retention.ms = 604800000  ## 7 days

But often, when you use Debezium and Kafka in a production environment you might choose to disable Kafka’s topic auto creation capability with auto.create.topics.enable = false, or you want the connector topics to be configured differently from the default. In this case you have to create topics for Debezium’s captured data sources upfront.
But there’s good news! Beginning with Kafka Connect version 2.6.0, this can be automated since KIP-158 is implemented to enable customizable topic creation with Kafka Connect.

Hello everyone, my name is René Kerner and I recently joined Red Hat and the Debezium team.

I was working at trivago since 2011, and in 2016 we started using Debezium at version 0.4/0.5 for capturing clickstreams in the offshore datacenters into Kafka and aggregate them in the central cluster. We really intensified Debezium usage within one year and in 2017 we also used it for trivago’s main data.

In 2014 I did my first OSS contributions to Composer, PHP’s dependency management and gave my first talk on it at the Developer Conference (called code.talks for many years now). Then in 2017 I did my first contributions to Debezium with work on the MySQL snapshot process and fixing a MySQL TIME data type issue.

In 2018 I left trivago and started working at Codecentric as a consultant for software architecture and development (mainly JVM focus) and Apache Kafka, doing many trainings and workshops at German "Fortune 500" companies (insurances, industrial sector, media). I was doing lots of networking at that time, where I learned how awesome the community around Kafka is. I was always quite sad I didn’t have more time to focus on OSS projects.

Welcome to the latest edition of the Debezium community newsletter, in which we share all things CDC related including blog posts, group discussions, as well as StackOverflow questions that are relevant to our user community.

In case you missed our last edition, you can check it out here.

Welcome to the Debezium community newsletter in which we share all things CDC related including blog posts, group discussions, as well as StackOverflow questions that are relevant to our user community.

This past summer has been a super exciting time for the team. Not only have we been working hard on Debezium 0.10 but we have unveiled some recent changes to debezium.io.

Welcome to the first edition of the Debezium community newsletter in which we share blog posts, group discussions, as well as StackOverflow questions that are relevant to our user community.

Hello everyone, my name is Chris Cranford and I recently joined the Debezium team.

My journey at Red Hat began just over three years ago; however I have been in this line of work for nearly twenty years. All throughout my career, I have advocated and supported open source software. Many of my initial software endeavors were based on open source software, several which are still heavily used today such as Hibernate ORM.

When I first learned about the Debezium project last year, I was very excited about it right away.

I could see how this project would be very useful for many people out there and I was very impressed by the professional way it was set up: a solid architecture for change data capture based on Apache Kafka, a strong focus on robustness and correctness also in the case of failures, the overall idea of creating a diverse eco-system of CDC connectors. All that based on the principles of open source, combined with extensive documentation from day one, a friendly and welcoming web site and a great getting-started experience.

So you can imagine that I was more than enthusiastic about the opportunity to take over the role of Debezium’s project lead. Debezium and CDC have close links to some data-centric projects I’ve been previously working on and also tie in with ideas I’ve been pursuing around CQRS, even sourcing and denormalization. As core member of the Hibernate team at Red Hat, I’ve implemented the initial Elasticsearch support for Hibernate Search (which deals with full-text index updates via JPA/Hibernate). I’ve also contributed to Hibernate OGM - a project which connects JPA and the world of NoSQL. One of the plans for OGM is to create a declarative denormalization engine for creating read models optimized for specific use cases. It will be very interesting to see how this plays together with the capabilities provided by Debezium.

Just before I started the Debezium project in early 2016, Martin Kleppmann gave several presentations about turning the database inside out and how his Bottled Water project demonstrated the importantance that change data capture can play in using Kafka for stream processing. Then Kafka Connect was announced, and at that point it seemed obvious to me that Kafka Connect was the foundation upon which practical and reusable change data capture can be built. As these techniques and technologies were becoming more important to Red Hat, I was given the opportunity to start a new open source project and community around building great CDC connectors for a variety of databases management systems.

Over the past few years, we have created Kafka Connect connectors for MySQL, then MongoDB, and most recently PostgreSQL. Each were initially limited and had a number of problems and issues, but over time more and more people have tried the connectors, asked questions, answered questions, mentioned Debezium on Twitter, tested connectors in their own environments, reported problems, fixed bugs, discussed limitations and potential new features, implemented enhancements and new features, improved the documentation, and wrote blog posts. Simply put, people with similar needs and interests have worked together and have formed a community. Additional connectors for Oracle and SQL Server are in the works, but could use some help to move things along more quickly.

It’s really exciting to see how far we’ve come and how the Debezium community continues to evolve and grow. And it’s perhaps as good a time as any to hand the reigns over to someone else. In fact, after nearly 10 wonderful years at Red Hat, I’m making a bigger change and as of today am part of Confluent’s engineering team, where I expect to play a more active role in the broader Kafka community and more directly with Kafka Connect and Kafka Streams. I definitely plan to stay involved in the Debezium community, but will no longer be leading the project. That role will instead be filled by Gunnar Morling, who’s recently joined the Debezium community but has extensive experience in open source, the Hibernate community, and the Bean Validation specification effort. Gunnar is a great guy and an excellent developer, and will be an excellent lead for the Debezium community.