Debezium 1.6.0.Beta1 Released

Incremental Snapshotting - SQL Server / Db2

Debezium first introduced incremental snapshotting in 1.6.0.Alpha1. As discussed in this blog post, the feature addresses several pain points that exist when running Debezium:

the necessity to execute consistent snapshots before streaming has begun upon connector restarts
inability to trigger full or even partial snapshots after having the connector running for extended periods of time

With this release, this feature has been extended to both the SQL Server and Db2 connectors. We intend to continue to roll this feature out to additional connectors in future releases.

If you would like to try the feature yourself, you need to:

provide a signalling table
trigger an ad-hoc incremental snapshot by using a SQL command like

INSERT INTO myschema.debezium_signal VALUES('ad-hoc-1', 'execute-snapshot', '{"data-collections": ["schema1.table1", "schema1.table2"]}')
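
For readers who would rather send the signal from application code, the two steps above can be sketched in Python. This is a minimal illustration, not Debezium's own tooling. SQLite stands in for the source database so the snippet runs anywhere. The table layout (id, type, data, and their widths) is an assumption based on the signalling table described in the Debezium documentation, and request_incremental_snapshot is a hypothetical helper name. The connector also has to be told where the table lives, which the documentation describes via the signal.data.collection property.

```python
import json
import sqlite3

# SQLite is only a stand-in here. On SQL Server or Db2 the table would live
# in a schema such as myschema and be created with that database's own DDL.
conn = sqlite3.connect(":memory:")

# Assumed three-column layout: signal id, signal type, JSON payload.
conn.execute(
    """
    CREATE TABLE debezium_signal (
        id   VARCHAR(42) PRIMARY KEY,
        type VARCHAR(32) NOT NULL,
        data VARCHAR(2048)
    )
    """
)


def request_incremental_snapshot(conn, signal_id, tables):
    """Insert an execute-snapshot signal row (hypothetical helper)."""
    # json.dumps builds the payload, so quoting is not done by hand.
    payload = json.dumps({"data-collections": list(tables)})
    conn.execute(
        "INSERT INTO debezium_signal VALUES (?, ?, ?)",
        (signal_id, "execute-snapshot", payload),
    )
    conn.commit()
    return payload


payload = request_incremental_snapshot(
    conn, "ad-hoc-1", ["schema1.table1", "schema1.table2"]
)
row = conn.execute("SELECT id, type, data FROM debezium_signal").fetchone()
print(row)
```

Building the payload with json.dumps avoids hand-escaping the table names. Each signal needs its own id, because the id column is the primary key in this layout.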

SQL Server Performance Improvement

The SQL Server connector option, source.timestamp.mode, controls how the timestamp for an emitted event is resolved. The default commit setting is designed to resolve the timestamp based on when the change record was committed in the database. It was identified that this method used separate JDBC calls to resolve the timestamp for an event, which caused a loss in both performance and throughput.

This release fixes the commit mode performance problem by moving where the timestamp is resolved. This substantially increases the connector’s performance and throughput while maintaining existing functionality.

We would like to thank Sergei Morozov for identifying and contributing a solution to this problem.

Oracle Large Object Data Types

In the era of "Big Data", it’s not all that uncommon to use data types such as BLOB and CLOB to store large object data. The Debezium Oracle connector has supported a wide range of data types, and we’re happy to report that we’ve now extended that support to cover BLOB and CLOB for both the XStream and LogMiner-based implementations.

When emitting events that contain BLOB or CLOB data, the memory footprint of the connector as well as the emitted event’s message size will be directly impacted by the size of the large object data. As a result, the connector’s JVM process may require additional memory, and some Kafka configurations, such as message.max.bytes, may need to be adjusted.
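
To get a feel for the sizing question, here is a rough back-of-the-envelope sketch in Python. It is an estimate under stated assumptions, not how the connector computes anything. It assumes binary values end up Base64-encoded in the serialized event, which inflates them by roughly a third. The per-event overhead of 1024 bytes, the 1 MiB limit, and the function names are all hypothetical placeholders.

```python
def estimated_event_bytes(lob_bytes, overhead_bytes=1024, base64_encoded=True):
    """Rough size estimate for one change event carrying a single LOB.

    overhead_bytes is a hypothetical allowance for the envelope, schema
    and remaining columns; measure real events before relying on it.
    """
    if base64_encoded:
        # Base64 emits 4 output characters for every 3 input bytes (padded).
        lob_size = 4 * ((lob_bytes + 2) // 3)
    else:
        lob_size = lob_bytes
    return lob_size + overhead_bytes


def fits(lob_bytes, message_max_bytes, **kwargs):
    """True if the estimated event size stays within the given limit."""
    return estimated_event_bytes(lob_bytes, **kwargs) <= message_max_bytes


limit = 1_048_576  # hypothetical message.max.bytes value (1 MiB)
for size in (256 * 1024, 2 * 1024 * 1024):
    print(size, estimated_event_bytes(size), fits(size, limit))
```

If the largest expected LOB does not fit, the limit has to be raised wherever it is enforced along the pipeline.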

We encourage the community to test drive the support for these new data types and report any and all feedback.

Other Features

Further fixes and improvements in this release include the following:

The Debezium connector for Oracle now supports ALTER TABLE and DROP TABLE automatically (DBZ-2916)
The Debezium connector for Oracle is tested and validated using ojdbc.jar version 21.1.0.0 (DBZ-3460)
The Debezium connector for MongoDB could lead to lost change events where a long-running snapshot exceeded the configured oplog window (DBZ-3331); the connector now validates the oplog position’s existence when streaming starts
The Debezium connector for Cassandra was not responding to schema changes correctly (DBZ-3417)

Altogether, a total of 52 issues have been addressed for this release.

As always, a big thank you to all the community members who contributed: Alfusainey Jallow, Bingqin Zhou, Cao Manh Dat, John Martin, John Wu, Mike, Olivier Jacquemart, Sergei Morozov, SiuFay, Stefan Miklosovic, Thomas Aregger, and Vadzim Ramanenka.

Chris Cranford

Chris is a software engineer at Red Hat. He previously was a member of the Hibernate ORM team and now works on Debezium. He lives in North Carolina just a few hours from Red Hat towers.

About Debezium

Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.

Get involved

We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Zulip, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve our existing connectors and add even more connectors. If you find problems or have ideas for how we can improve Debezium, please let us know or log an issue.