The temperatures are slowly cooling off after the peak of the summer heat, and the Debezium community is happy to announce the release of Debezium 0.10.0.Beta4. In this release we’re happy to share some news we don’t get to share too often: with Apache Cassandra, another database gets added to the list of databases supported by Debezium!

In addition, we finished our efforts to rebase the existing Postgres connector onto the Debezium framework structure established for the SQL Server and Oracle connectors. This means more shared code between these connectors, and in turn reduced maintenance effort for the development team going forward. There’s also one immediately tangible advantage for you: the Postgres connector now exposes the same metrics you already know from the other connectors.

Finally, the new release contains a range of bugfixes and other useful improvements. Let’s explore some details below.

Incubating Cassandra Connector

If you have been following this blog lately, you’ll have read about the latest addition to the Debezium family in Joy Gao’s excellent posts about the new connector (part 1, part 2).

In case you haven’t read those yet, we highly recommend doing so in order to learn more about the challenges encountered when implementing a CDC connector for a distributed datastore such as Cassandra, as well as the design decisions made to arrive at a first "minimal viable product". Joy also gave a great talk at QCon last year, which touches on the topic of CDC for Cassandra.

The connector was originally developed internally at long-term Debezium user WePay. The WePay team then decided to open-source their work, put it under the Debezium umbrella and continue to evolve it there. That’s really great news for the Debezium community! We couldn’t be happier about this contribution and look forward to evolving this new connector together in the open.

At this point the Cassandra connector is in "incubating" state, i.e. its design and implementation are still very much in flux, and the structure of the events it creates may change in future releases. Note that, unlike the other Debezium connectors, this one is currently not based on Kafka Connect. Instead, it is implemented as a standalone process running on the Cassandra node(s) themselves. Refer to the blog posts linked above for the reasoning behind this design and possible future developments around it. Needless to say, any ideas and contributions in this area are very welcome.

Together with the connector we’ve also provided an initial draft of the connector documentation; this is still a work in progress and will be amended in the next few days.

Further New Features

The Postgres connector now supports the metrics known from the SQL Server and Oracle connectors (DBZ-777). The SQL Server connector now snapshots tables in a deterministic order, as defined by the given table whitelist configuration (DBZ-1254).
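To make the idea of a whitelist-defined snapshot order more concrete, here is a minimal sketch in plain Java. It is not the connector's actual implementation; the class `WhitelistOrder`, its methods and the table names are hypothetical, and the sketch simply assumes that the whitelist is a comma-separated list of regular expressions and that each table is ranked by the first entry matching it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; not taken from the SQL Server connector.
public class WhitelistOrder {

    // Returns the whitelisted tables, ordered by the position of the first
    // whitelist entry that matches each table name.
    public static List<String> order(List<String> tables, String whitelist) {
        List<Pattern> patterns = new ArrayList<>();
        for (String entry : whitelist.split(",")) {
            patterns.add(Pattern.compile(entry.trim(), Pattern.CASE_INSENSITIVE));
        }

        // Keep only tables matched by at least one whitelist entry.
        List<String> result = new ArrayList<>();
        for (String table : tables) {
            if (indexOfFirstMatch(table, patterns) >= 0) {
                result.add(table);
            }
        }

        // List.sort is stable, so tables matched by the same entry keep
        // their original relative order.
        result.sort(Comparator.comparingInt(t -> indexOfFirstMatch(t, patterns)));
        return result;
    }

    // Index of the first pattern fully matching the table name, or -1.
    static int indexOfFirstMatch(String table, List<Pattern> patterns) {
        for (int i = 0; i < patterns.size(); i++) {
            if (patterns.get(i).matcher(table).matches()) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Order in which the database happens to report the tables.
        List<String> discovered = Arrays.asList(
                "dbo.orders", "dbo.customers", "dbo.audit", "dbo.products");
        String whitelist = "dbo\\.customers,dbo\\.products,dbo\\.orders";
        System.out.println(order(discovered, whitelist));
    }
}
```

With an ordering like this, the sequence of snapshotted tables depends only on the configuration and not on the order in which the database returns its table metadata.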

There have also been two improvements to our SMTs (single message transformations):

  • The SMT for new record state extraction now allows adding columns that propagate metadata fields from the source block (DBZ-1395); this is useful, for example, to propagate the transaction into sink tables.

  • The default structure produced by the outbox routing SMT has been further streamlined (DBZ-1385); the message value now only contains the contents of the configured outbox table payload column. In case you want to re-add the eventType value, you can configure it as an "additional field". It then goes either into the message as a header (recommended) or into the message value, in which case the value will be a nested structure as before.
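As a rough sketch of how these two SMT options might be configured, the following Java snippet assembles the relevant properties. The option names `add.source.fields` and `table.fields.additional.placement`, as well as the transformation class names, are given here as recalled from the Debezium documentation and should be checked against the docs for your version. The aliases `unwrap` and `outbox`, the source field `txId` and the outbox column `type` are assumptions made purely for illustration.

```java
import java.util.Properties;

// Illustrative only: builds SMT settings as they might appear in a
// connector configuration. Verify all option names against the docs.
public class SmtConfigSketch {

    // New record state extraction, adding a field from the source block.
    public static Properties unwrapConfig() {
        Properties props = new Properties();
        props.setProperty("transforms", "unwrap");
        props.setProperty("transforms.unwrap.type",
                "io.debezium.transforms.ExtractNewRecordState");
        // Assumed option name; "txId" is an example source field.
        props.setProperty("transforms.unwrap.add.source.fields", "txId");
        return props;
    }

    // Outbox routing, re-adding the event type as a message header.
    public static Properties outboxConfig() {
        Properties props = new Properties();
        props.setProperty("transforms", "outbox");
        props.setProperty("transforms.outbox.type",
                "io.debezium.transforms.outbox.EventRouter");
        // Assumed format: <column>:<placement>:<alias>, with placement
        // "header" (recommended above) or "envelope" (message value).
        props.setProperty("transforms.outbox.table.fields.additional.placement",
                "type:header:eventType");
        return props;
    }

    public static void main(String[] args) {
        // Print both configurations as key=value pairs.
        unwrapConfig().forEach((k, v) -> System.out.println(k + "=" + v));
        outboxConfig().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

In a real deployment these keys would be part of the connector's JSON or properties configuration rather than Java code; the snippet only shows which settings are involved.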

Bugfixes and Other Improvements

Finally, here’s an overview of assorted bugfixes in the 0.10.0.Beta4 release:

  • The MySQL connector handles GRANT DELETE ON <table> statements correctly (DBZ-1411)

  • Superfluous table scans are avoided when using the initial_schema_only snapshot strategy with SQL Server (DBZ-1417)

  • The superfluous creation of connections is avoided when obtaining the xmin position of Postgres (DBZ-1381)

  • The new record state extraction SMT handles heartbeat events correctly (DBZ-1430)

Please refer to the 0.10.0.Beta4 release notes for the complete list of addressed issues and the upgrading procedure.

A big thank you goes out to all the contributors from the Debezium community who worked on this release: Joy Gao, Renato Mefi and Guillaume Rosauro!

Gunnar Morling

Gunnar is a software engineer at Decodable and an open-source enthusiast by heart. He has been the project lead of Debezium over many years. Gunnar has created open-source projects like kcctl, JfrUnit, and MapStruct, and is the spec lead for Bean Validation 2.0 (JSR 380). He’s based in Hamburg, Germany.


About Debezium

Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.

Get involved

We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Zulip, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve our existing connectors and add even more connectors. If you find problems or have ideas for how we can improve Debezium, please let us know or log an issue.