Welcome to the newest edition of the Debezium community newsletter, in which we share all things CDC-related, including blog posts, group discussions, and StackOverflow questions that are relevant to our user community.
It’s been a long time since our last edition, but we are back! In case you missed the last edition, you can check it out here.
Upcoming Events
Due to the ongoing global pandemic, all conferences and meet-ups have gone virtual. On the bright side, this means you get to attend some nice events from the comfort of your couch:
-
Apache Pinot meet-up — "Analyzing Real-time Order Deliveries using CDC with Debezium and Pinot" by Kenny Bastani and Gunnar Morling
-
MongoDB.Live — "Dissecting our Legacy: The Strangler Fig Pattern with Apache Kafka, Debezium and MongoDB" by Hans-Peter Grahsl and Gunnar Morling
If you’d like to have a session on Debezium at your virtual meetup or conference, please get in touch!
Articles
There have been several blog posts about Debezium lately; here are some of the latest ones that you should not miss:
-
Capturing Every Change From Shopify’s Sharded Monolith by John Martin and Adam Bellemare
-
Streaming Vitess at Bolt by Kewei Shang and Ruslan Gibaiev
-
Saga Orchestration for Microservices Using the Outbox Pattern by Gunnar Morling
-
Application modernization patterns with Apache Kafka, Debezium, and Kubernetes by Bilgin Ibryam
-
Enhancing the outbox pattern with Kafka Streams by Hinrik Örn Sigurðsson
-
Kubernetes-Run Analytics at the Edge: Postgres, Kafka, Debezium by Jonathan Katz
-
Understanding Non-Key Joins With the Quarkus Extension for Kafka Streams by Anisha Mohanty
-
A series of really insightful blog posts about Debezium and change data capture in general by Dunith Dhanushka:
-
Change Data Analysis with Debezium and Apache Pinot by Kenny Bastani
-
Change Data Capture with Flink SQL and Debezium by Marta Paes
-
Change Data Capture at DeviantArt by Ruslan Danilin
And if watching a talk is more your kind of thing, here’s the recording of the session Change Data Streaming Patterns in Distributed Systems from this year’s Berlin Buzzwords, by Gunnar Morling and Hans-Peter Grahsl:
Please also check out our compiled list of resources around Debezium for even more related posts, articles, podcasts and presentations.
Integrations
A few cool integrations and usages of Debezium appeared over the last few weeks and months. Here are some that we found especially fascinating:
-
A Debezium Server outbound adaptor for Apache Iceberg
-
The ScyllaDB CDC Source Connector, based on Debezium’s CDC connector framework
-
Bespoke support for the Debezium change event format in Apache Flink
-
Support for Debezium change events in Materialize
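Flink and Materialize can both interpret Debezium change events as a changelog for a table. As a rough illustration of what that means, here is a minimal Python sketch, assuming the events have already been deserialized from JSON into dicts holding the `op`, `before` and `after` fields of the Debezium envelope, and assuming a single-column primary key named `id`. The function `apply_change_events` is hypothetical and is not how Flink or Materialize work internally.

```python
from typing import Any, Dict, Iterable, Optional

Row = Dict[str, Any]


def apply_change_events(
    events: Iterable[Optional[Row]], key_column: str = "id"
) -> Dict[Any, Row]:
    """Fold Debezium-style change event payloads into a key -> row mapping.

    Hypothetical helper: assumes each event is the already-deserialized
    payload of a change event and that `key_column` is the primary key.
    """
    table: Dict[Any, Row] = {}
    for event in events:
        if event is None:
            # Tombstone (null value) following a delete; it exists for
            # Kafka log compaction and carries no row state, so skip it.
            continue
        op = event["op"]
        if op in ("c", "r", "u"):
            # Inserts ("c"), snapshot reads ("r") and updates ("u") carry
            # the new row state in "after".
            row = event["after"]
            table[row[key_column]] = row
        elif op == "d":
            # Deletes carry the old row state in "before"; "after" is null.
            table.pop(event["before"][key_column], None)
        else:
            raise ValueError(f"unsupported op: {op!r}")
    return table


# Example: a snapshot read, an insert, an update, a delete and its tombstone.
events = [
    {"op": "r", "before": None, "after": {"id": 1, "name": "apple"}},
    {"op": "c", "before": None, "after": {"id": 2, "name": "pear"}},
    {"op": "u", "before": {"id": 1, "name": "apple"},
     "after": {"id": 1, "name": "green apple"}},
    {"op": "d", "before": {"id": 2, "name": "pear"}, "after": None},
    None,
]
table = apply_change_events(events)
print(table)  # {1: {'id': 1, 'name': 'green apple'}}
```

Because Debezium represents a primary key change as a delete (plus tombstone) for the old key followed by a create for the new key, a fold keyed on the primary key like this one does not need special handling for that case.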
Examples
If you are getting started with Debezium, you can get hands-on experience and a better understanding of how things work from the examples and demos in our examples repository. We have introduced several new examples and updated the existing ones. Among them, we’d like to highlight some new additions:
If you are interested in showcasing a new demo or an example, please send us a GitHub pull request or reach out to us directly through our community channels found here.
Time to Upgrade
Debezium version 1.6.0.Final was released last week. Apart from Debezium Server sinks for Apache Kafka and Pravega, the 1.6 release brought a brand-new feature for incremental and ad-hoc snapshots. It provides long-awaited capabilities such as resuming long-running snapshots after a connector restart, re-snapshotting selected tables during streaming, and snapshotting tables newly added to the list of captured tables after changing the filter configuration. A big shout-out to Netflix engineers Andreas Andreakis and Ioannis Papapanagiotou for their paper DBLog: A Watermark Based Change-Data-Capture Framework, upon which incremental snapshotting is based.
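The core idea of DBLog is to interleave snapshot chunks with the transaction log: a low watermark is written before a chunk of rows is selected and a high watermark after it, and any chunk row whose key shows up in the log between the two watermarks is dropped, because the log already carries a change that may be newer. The Python sketch below is a heavily simplified illustration of that reconciliation step for a single chunk, assuming the log has already been read into a list of `(kind, key, value)` tuples where `kind` is `"change"`, `"low"` or `"high"`. The function `merge_chunk` and this tuple layout are made up for the example; they are not Debezium's internal classes.

```python
from typing import Any, Dict, List, Tuple

# (kind, key, value): kind is "change", "low" or "high" (hypothetical layout);
# watermark entries carry no key or value.
LogEntry = Tuple[str, Any, Any]


def merge_chunk(log: List[LogEntry], chunk: Dict[Any, Any]) -> List[Tuple[str, Any, Any]]:
    """Merge one snapshot chunk into the change stream, DBLog-style."""
    output: List[Tuple[str, Any, Any]] = []
    pending = dict(chunk)  # rows selected between the low and high watermark
    in_window = False
    for kind, key, value in log:
        if kind == "low":
            in_window = True
        elif kind == "high":
            # Chunk rows not superseded by a log event inside the window
            # are emitted as snapshot reads at the high watermark position.
            output.extend(("read", k, v) for k, v in pending.items())
            pending.clear()
            in_window = False
        else:
            if in_window:
                # The log has a change for this key that may be newer than
                # the selected row, so the stale chunk row is dropped.
                pending.pop(key, None)
            # Log events are always passed through in log order.
            output.append(("change", key, value))
    return output


log = [
    ("change", 1, "a1"),
    ("low", None, None),
    ("change", 2, "b2"),   # row 2 changes while the chunk is being selected
    ("high", None, None),
    ("change", 3, "c2"),   # row 3 changes after the chunk was selected
]
chunk = {1: "a1", 2: "b1", 3: "c1"}
merged = merge_chunk(log, chunk)
```

Replaying `merged` in order ends with key 1 at `a1`, key 2 at `b2` and key 3 at `c2`, i.e. the snapshot rows never overwrite a newer change from the log, which is why snapshot chunks can be taken while streaming continues.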
Given the long time since the last community newsletter, it’s also worth mentioning some of the new features added in Debezium 1.5, released in April this year: the MySQL connector saw a substantial rewrite and now also supports transaction marker events, Debezium’s LogMiner-based CDC implementation for Oracle was declared stable, and we added support for Redis Streams to Debezium Server.
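Transaction marker events let a consumer process changes transaction by transaction instead of event by event. When transaction metadata is enabled (`provide.transaction.metadata=true`), Debezium emits `BEGIN` and `END` markers, where the `END` marker carries the transaction's `event_count`, and each data change event contains a `transaction` block with the transaction `id` and a `total_order`. Since markers and data events arrive on different topics, a consumer may see them in any order. The following minimal Python sketch buffers events until a transaction is complete, assuming the messages are already deserialized into dicts; the class `TransactionBuffer` and its methods are hypothetical.

```python
from typing import Dict, List, Optional


class TransactionBuffer:
    """Groups change events into complete transactions (hypothetical helper).

    Assumes BEGIN/END markers are enabled and that every data change event
    has a "transaction" block with "id" and "total_order".
    """

    def __init__(self) -> None:
        self._events: Dict[str, List[dict]] = {}
        self._expected: Dict[str, int] = {}

    def on_marker(self, marker: dict) -> Optional[List[dict]]:
        if marker["status"] == "END":
            # Only the END marker tells us how many events to wait for.
            self._expected[marker["id"]] = marker["event_count"]
            return self._try_complete(marker["id"])
        return None  # BEGIN markers need no action in this sketch

    def on_change_event(self, event: dict) -> Optional[List[dict]]:
        tx_id = event["transaction"]["id"]
        self._events.setdefault(tx_id, []).append(event)
        return self._try_complete(tx_id)

    def _try_complete(self, tx_id: str) -> Optional[List[dict]]:
        expected = self._expected.get(tx_id)
        events = self._events.get(tx_id, [])
        if expected is None or len(events) != expected:
            return None  # transaction not complete yet
        del self._expected[tx_id]
        self._events.pop(tx_id, None)
        # Restore the order in which the transaction produced the events.
        return sorted(events, key=lambda e: e["transaction"]["total_order"])


buf = TransactionBuffer()
e1 = {"op": "c", "after": {"id": 1},
      "transaction": {"id": "tx1", "total_order": 1, "data_collection_order": 1}}
e2 = {"op": "c", "after": {"id": 2},
      "transaction": {"id": "tx1", "total_order": 2, "data_collection_order": 2}}
```

Returning the batch only once its size matches `event_count` keeps the sketch independent of the arrival order of the marker and data topics.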
If you are using an older version, we urge you to check out the latest major release. For details on all the bug fixes, enhancements, and improvements, check out the release notes.
The Debezium team has also begun active development on the next version, 1.7. The main focus of 1.7 is implementing incremental snapshotting for more connectors (MongoDB, Oracle), reworking the transaction buffer for the Oracle connector, and expanding the Debezium UI. For details on upcoming releases, check out the Debezium roadmap.
You can keep track of bug fixes, enhancements, and changes that will be coming up in the 1.7 release by visiting our releases page.
Questions and Answers
-
MongoDB as sink connector not capturing data as expected - kafka?
-
Additional unique index referencing columns not exposed by CDC causes exception
-
Unable to deserialise dynamic json with Jackson using generics
-
Flink: Interrupted while waiting for data to be acknowledged by pipeline
-
Debezium, Kafka connect: is there a way to send only payload and not schema?
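For the last question, the usual answer is a converter setting rather than code: Kafka Connect's `JsonConverter` wraps each message in a `{"schema": ..., "payload": ...}` envelope when `schemas.enable` is true, and setting `key.converter.schemas.enable=false` and `value.converter.schemas.enable=false` drops the schema part. If a consumer has to cope with both formats, it can unwrap the envelope itself. The Python sketch below does that; the function `extract_payload` is hypothetical, and recognizing the envelope by its exact `schema`/`payload` key set is an assumption of this example.

```python
import json
from typing import Any, Optional


def extract_payload(message_value: Optional[bytes]) -> Any:
    """Return the payload of a Kafka Connect JSON message (hypothetical helper).

    With schemas.enable=true the JsonConverter output is
    {"schema": ..., "payload": ...}; with schemas.enable=false the
    message already is the payload.
    """
    if message_value is None:
        return None  # tombstone: no value at all
    doc = json.loads(message_value)
    # Heuristic: treat a dict with exactly these two keys as the envelope.
    if isinstance(doc, dict) and set(doc) == {"schema", "payload"}:
        return doc["payload"]
    return doc
```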
Getting Involved
Getting started with a large, existing code base can be intimidating, but we want to make the process of getting started as easy and smooth as possible for you. We are now a vibrant community with 270+ contributors overall, and we welcome all kinds of community contributions, discussions, and enhancements. As a beginner, you can grab some of the issues labeled easy-starter if you want to dive in quickly. Below is a list of issues that are open to grab:
-
Document "schema.include.list"/"schema.exclude.list" for SQL Server connector (DBZ-2793)
-
Limit log output for "Streaming requested from LSN" warnings (DBZ-3007)
-
Create smoke test to make sure Debezium Server container image works (DBZ-3226)
-
Add signal table automatically to include list (DBZ-3293)
-
Implement support for JSON_TABLE in MySQL parser (DBZ-3575)
-
Implement window function in MySQL parser (DBZ-3576)
-
Standardize "snapshot.fetch.size" default values across connectors (DBZ-3694)
If you are new to open source, please check out our contributing guidelines to get started!
Call to Action
Our community users page includes a variety of organizations that are currently using Debezium. If you are a user of Debezium, and would like to be included, please send us a GitHub pull request or reach out to us directly through our community channels found here.
And if you haven’t done so yet, please consider adding a ⭐ to the GitHub repo; keep them coming, we’re almost at 5,000 stars!
Also, we’d like to learn about your requirements for future Debezium versions. In particular, we’d be very curious about your feedback on the CDC-based Sagas approach mentioned above. Is it something you’d like to see supported in our Quarkus extension, for instance? Please let us know about this, as well as any other feedback you may have, via the Debezium mailing list.
Lastly, we’re planning to continue our interview series Debezium Community Stories With…, so if you have exciting stories to tell about your usage of Debezium, please reach out!
And as always, stay safe and healthy. We wish you and your loved ones good health and strength.
About Debezium
Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.
Get involved
We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Zulip, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve our existing connectors and add even more connectors. If you find problems or have ideas on how we can improve Debezium, please let us know or log an issue.