Welcome to the newest edition of the Debezium community newsletter, in which we share all things CDC-related, including blog posts, group discussions, and StackOverflow questions that are relevant to our user community.
It’s been a long time since our last edition. But we are back again! In case you missed our last edition, you can check it out here.
Due to the ongoing global pandemic, all conferences and meet-ups have gone virtual. On the bright side, this means you get to attend some nice events from the comfort of your couch:
Apache Pinot meet-up — "Analyzing Real-time Order Deliveries using CDC with Debezium and Pinot" by Kenny Bastani and Gunnar Morling
MongoDB.Live — "Dissecting our Legacy: The Strangler Fig Pattern with Apache Kafka, Debezium and MongoDB" by Hans-Peter Grahsl and Gunnar Morling
If you’d like to have a session on Debezium at your virtual meetup or conference, please get in touch!
There have been several blog posts about Debezium lately; here are some of the latest ones that you should not miss:
Capturing Every Change From Shopify’s Sharded Monolith by John Martin and Adam Bellemare
Streaming Vitess at Bolt by Kewei Shang and Ruslan Gibaiev
Saga Orchestration for Microservices Using the Outbox Pattern by Gunnar Morling
Enhancing the outbox pattern with Kafka Streams by Hinrik Örn Sigurðsson
Kubernetes-Run Analytics at the Edge: Postgres, Kafka, Debezium by Jonathan Katz
Understanding Non-Key Joins With the Quarkus Extension for Kafka Streams by Anisha Mohanty
A series of really insightful blog posts about Debezium and change data capture in general by Dunith Dhanushka:
Change Data Analysis with Debezium and Apache Pinot by Kenny Bastani
Change Data Capture with Flink SQL and Debezium by Marta Paes
Change Data Capture at DeviantArt by Ruslan Danilin
And if watching a talk is more your kind of thing, here’s the recording of the session Change Data Streaming Patterns in Distributed Systems from this year’s Berlin Buzzwords, by Gunnar Morling and Hans-Peter Grahsl.
Please also check out our compiled list of resources around Debezium for even more related posts, articles, podcasts and presentations.
A few cool integrations and usages of Debezium appeared over the last few weeks and months. Here are several ones which we found especially fascinating:
If you are getting started with Debezium, the examples and demos in our examples repository offer hands-on learning and a better understanding of how things work. We have introduced several new examples and updated the existing ones. Among these, we’d like to highlight some new additions:
If you are interested in showcasing a new demo or an example, please send us a GitHub pull request or reach out to us directly through our community channels found here.
Time to Upgrade
Debezium version 1.6.0.Final was released last week. Apart from Debezium Server sinks for Apache Kafka and Pravega, the 1.6 release brought a brand-new feature for incremental and ad-hoc snapshots, providing long-awaited capabilities like resuming long-running snapshots after a connector restart, re-snapshotting selected tables during streaming, and snapshotting tables newly added to the list of captured tables after changing the filter configuration. A big shout-out to Netflix engineers Andreas Andreakis and Ioannis Papapanagiotou for their paper DBLog: A Watermark Based Change-Data-Capture Framework, upon which incremental snapshotting is based.
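To give a feel for the watermark idea behind incremental snapshotting, here is a minimal Python sketch. It is an illustration under simplifying assumptions, not Debezium's actual implementation: the function name `merge_chunk_with_log`, the tuple-based event format, and the in-memory "log" are all hypothetical. The idea it models is that a chunk of rows is read between a low and a high watermark; any chunk row whose key also changed inside that window is dropped, because the log event is newer, and the surviving rows are emitted as snapshot reads once the high watermark is reached.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical event format: (kind, key, value).
# kind is "change" for a regular log event, or "low"/"high" for watermarks.
Event = Tuple[str, Optional[int], Optional[str]]


def merge_chunk_with_log(
    chunk: Dict[int, str], log_events: Iterable[Event]
) -> List[Tuple[str, int, str]]:
    """Interleave one snapshot chunk with streamed log events.

    Chunk rows that were also changed between the low and high
    watermark are discarded, since the log event supersedes them.
    """
    output: List[Tuple[str, int, str]] = []
    remaining = dict(chunk)  # copy, so the caller's chunk is not mutated
    window_open = False

    for kind, key, value in log_events:
        if kind == "low":
            # The chunk query runs after this point in the log.
            window_open = True
        elif kind == "high":
            # Window closed: emit surviving chunk rows as snapshot reads.
            for row_key, row_value in remaining.items():
                output.append(("read", row_key, row_value))
            remaining = {}
            window_open = False
        else:
            if window_open:
                # A newer version of this row exists in the log; drop the stale read.
                remaining.pop(key, None)
            output.append(("change", key, value))

    return output


if __name__ == "__main__":
    chunk = {1: "a", 2: "b", 3: "c"}
    log = [
        ("change", 9, "x"),
        ("low", None, None),
        ("change", 2, "b2"),  # row 2 changed while the chunk was being read
        ("high", None, None),
        ("change", 3, "c2"),
    ]
    for event in merge_chunk_with_log(chunk, log):
        print(event)
```

In this sketch, row 2 never appears as a snapshot read because its change event arrived inside the watermark window, while rows 1 and 3 are emitted as reads before streaming continues. Because each chunk is handled independently, a snapshot organized this way can in principle pick up again from the last completed chunk.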
Given the long time since the last community newsletter, it’s also worth mentioning some of the new features added in Debezium 1.5, released in April this year: the MySQL connector saw a substantial rewrite and now also supports transaction marker events; Debezium’s LogMiner-based CDC implementation for Oracle was declared stable; and we’ve added support for Redis Streams to Debezium Server.
If you are using an older version, we urge you to check out the latest major release. For details on all the bug fixes, enhancements, and improvements, check out the release notes.
The Debezium team has also begun active development on the next version, 1.7. The major focus in 1.7 is implementing incremental snapshotting for more connectors (MongoDB, Oracle), reworking the transaction buffer for the Oracle connector, and expanding the Debezium UI. For details on further upcoming releases, check out the Debezium roadmap.
You can keep track of bug fixes, enhancements, and changes that will be coming up in the 1.7 release by visiting our releases page.
Questions and Answers
Getting started with a huge, existing code base can be intimidating, but we want to make sure that the process of getting started is as easy and smooth as possible for you. We are now a vibrant community with 270+ contributors overall, and we welcome all kinds of community contributions, discussions, and enhancements. As a beginner, you can grab some of the issues labeled with easy-starter if you want to dive in quickly. Below is a list of issues that are open to grab:
Document "schema.include.list"/"schema.exclude.list" for SQL Server connector (DBZ-2793)
Limit log output for "Streaming requested from LSN" warnings (DBZ-3007)
Create smoke test to make sure Debezium Server container image works (DBZ-3226)
Add signal table automatically to include list (DBZ-3293)
Implement support for JSON_TABLE in MySQL parser (DBZ-3575)
Implement window function in MySQL parser (DBZ-3576)
Standardize "snapshot.fetch.size" default values across connectors (DBZ-3694)
If you are new to open source, please check out our contributing guidelines to get started!
Call to Action
Our community users page includes a variety of organizations that are currently using Debezium. If you are a user of Debezium, and would like to be included, please send us a GitHub pull request or reach out to us directly through our community channels found here.
And if you haven’t yet done so, please consider adding a ⭐ for the GitHub repo; keep them coming, we’re almost at 5,000 stars!
Also, we’d like to learn about your requirements for future Debezium versions. In particular, we’d be very curious about your feedback on the CDC-based Sagas approach mentioned above. Is it something you’d like to see supported in our Quarkus extension for instance? Please let us know about this, as well as any other feedback you may have, via the Debezium mailing list.
Lastly, we’re planning to continue our interview series Debezium Community Stories With…; so if you have exciting stories to tell about your usage of Debezium, please reach out!
And as always, stay safe and healthy. We wish you and your loved ones good health and strength.
Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.
We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Zulip, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve our existing connectors and add even more connectors. If you find problems or have ideas for how we can improve Debezium, please let us know or log an issue.