Debezium 0.10.0.Beta4 Released

Incubating Cassandra Connector

If you have been following this blog lately, you’ll have read about the latest addition to the Debezium family in Joy Gao’s excellent posts about the new connector (part 1, part 2).

In case you haven’t read those yet, we highly recommend doing so to learn more about the challenges of implementing a CDC connector for a distributed datastore such as Cassandra, as well as the design decisions made to arrive at a first "minimum viable product". Joy also gave a great talk at QCon last year that touches on CDC for Cassandra.

The connector was originally developed internally at long-term Debezium user WePay. The WePay team then decided to open-source their work, put it under the Debezium umbrella and continue to evolve it there. That’s really great news for the Debezium community! We couldn’t be happier about this contribution and look forward to evolving this new connector together in the open.

At this point the Cassandra connector is in "incubating" state, i.e. its design and implementation are still in flux, and the structure of the events it creates may change in future releases. Note that, unlike the other Debezium connectors, this one is currently not based on Kafka Connect. Instead, it is implemented as a standalone process running on the Cassandra node(s) themselves. Refer to the blog posts linked above for the reasoning behind this design and possible future developments. Needless to say, any ideas and contributions in this area are highly welcome.

Together with the connector we’ve also provided an initial draft of the connector documentation; this is still work-in-progress and will be amended in the next few days.

Further New Features

The Postgres connector now supports the metrics already known from the SQL Server and Oracle connectors (DBZ-777). The SQL Server connector now snapshots tables in a deterministic order, as defined by the given table whitelist configuration (DBZ-1254).
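To illustrate what a deterministic, whitelist-driven snapshot order means, here is a minimal sketch. It is not the connector's actual implementation; it only assumes that the whitelist is a comma-separated list of regular expressions matched against fully-qualified table names, and the `snapshot_order` function and the table names are hypothetical.

```python
import re

def snapshot_order(table_whitelist, captured_tables):
    """Order tables by the first whitelist entry that fully matches them.

    Illustrative only: a model of the behaviour described for DBZ-1254,
    not code taken from the connector.
    """
    patterns = [re.compile(p.strip()) for p in table_whitelist.split(",") if p.strip()]
    ordered = []
    for pattern in patterns:
        # Sort the candidates so that several tables matching the same
        # pattern also come out in a repeatable order.
        for table in sorted(captured_tables):
            if pattern.fullmatch(table) and table not in ordered:
                ordered.append(table)
    return ordered

# Hypothetical tables; a set has no inherent order.
tables = {"dbo.customers", "dbo.orders", "dbo.order_lines"}
print(snapshot_order(r"dbo\.orders,dbo\.customers", tables))
```

With this model, listing `dbo.orders` before `dbo.customers` in the whitelist means it is also snapshotted first, regardless of the order in which the database reports its tables.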

There have also been two improvements to our SMTs (single message transformations):

The SMT for new record state extraction now allows adding extra fields that propagate metadata from the source block (DBZ-1395); this is useful, for example, to propagate the transaction id into sink tables.
The default structure produced by the outbox routing SMT has been further streamlined (DBZ-1385); the message value now contains only the contents of the configured outbox table payload column. If you want to re-add the eventType value, you can configure it as an "additional field", which goes either into the message as a header (recommended) or into the message value, which then becomes a nested structure as before.
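As a rough sketch of how these two SMT options might be configured, the following builds the relevant fragments of a connector configuration as JSON. The option names `add.source.fields` and `table.fields.additional.placement`, the placement values `header` and `envelope`, and the `txId` and `type` field names are assumptions based on our reading of the SMT documentation and should be verified against the version you run. The `additional_field` helper is purely hypothetical.

```python
import json

def additional_field(column, placement, alias=None):
    """Build one 'column:placement[:alias]' entry (hypothetical helper)."""
    if placement not in ("header", "envelope"):
        raise ValueError(f"unknown placement: {placement}")
    parts = [column, placement] + ([alias] if alias else [])
    return ":".join(parts)

# New record state extraction SMT: copy a field from the source block
# into the flattened record.
# ASSUMPTION: "add.source.fields" is the option name; "txId" is an example field.
unwrap_config = {
    "transforms": "unwrap",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
    "transforms.unwrap.add.source.fields": "txId",
}

# Outbox routing SMT: re-add the event type as a message header.
# ASSUMPTION: the outbox table column is named "type" and the option is
# "table.fields.additional.placement".
outbox_config = {
    "transforms": "outbox",
    "transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter",
    "transforms.outbox.table.fields.additional.placement":
        additional_field("type", "header", "eventType"),
}

print(json.dumps(unwrap_config, indent=2))
print(json.dumps(outbox_config, indent=2))
```

Placing the event type in a header keeps the message value limited to the payload column, which is why the header placement is the recommended one.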

Bugfixes and Other Improvements

Finally, here’s an overview of assorted bugfixes in the 0.10 Beta4 release:

The MySQL connector handles GRANT DELETE ON <table> statements correctly (DBZ-1411)
Superfluous table scans are avoided when using the initial_schema_only snapshot strategy with SQL Server (DBZ-1417)
The superfluous creation of connections is avoided when obtaining the xmin position of Postgres (DBZ-1381)
The new record state extraction SMT handles heartbeat events correctly (DBZ-1430)

Please refer to the 0.10.0.Beta4 release notes for the complete list of addressed issues and the upgrading procedure.

A big thank you goes out to all the contributors from the Debezium community who worked on this release: Joy Gao, Renato Mefi and Guillaume Rosauro!

Gunnar Morling

Gunnar is a software engineer and open-source enthusiast at heart, currently working as a Technologist at Confluent. Previously, he helped to build a realtime stream processing platform based on Apache Flink and led the Debezium project, a distributed platform for change data capture. He is a Java Champion and has founded multiple open source projects such as JfrUnit, kcctl, and MapStruct. Gunnar is an avid blogger (morling.dev) and has spoken at various conferences like QCon, Java One, and Devoxx. He lives in Hamburg, Germany.

About Debezium

Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.

Get involved

We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Zulip, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve our existing connectors and add even more connectors. If you find problems or have ideas for how we can improve Debezium, please let us know or log an issue.