The old saying is "April showers bring May flowers"; however, in this case it seems a new Debezium release has sprouted, packed with many new features. We’re pleased to announce that Debezium 2.7.0.Alpha2, the next pre-release in the Debezium 2.7 stream, is now available for testing.

This release includes new ROW_ID serialization for the Oracle connector, PostgreSQL array support for the JDBC sink connector, NATS authentication with Debezium Server, performance improvements with Oracle LogMiner and large tables, and more. Let’s walk through the highlights of this release in more depth…

New features and improvements

Debezium 2.7.0.Alpha2 introduces many improvements and features. Let’s take a look at each individually.

Oracle ROW_ID included in change events

While ROW_ID is not unique across all rows of a table for the table’s lifespan, it can be used in certain situations when the lifecycle of the table and its rows is managed in a very strict way. At the community’s request, we’ve added a new row_id field to the Oracle connector’s change event source information block (DBZ-4332). This new field will be populated with the ROW_ID value under the following conditions:

  • Only populated from streaming events for inserts, updates, and deletes.

  • Snapshot events will not contain a row_id value.

  • Only provided by the LogMiner and XStream streaming adapters; OpenLogReplicator is not supported.

Any event that does not meet these criteria will not include a row_id field, as it’s marked as optional.
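As a rough illustration of how a consumer might cope with the optional field, the Python sketch below reads row_id from the source block of a deserialized change event. The event payloads, the sample ROW_ID value, and the helper name extract_row_id are invented for this example; only the row_id field name and its placement in the source block come from the description above.

```python
import json
from typing import Optional


def extract_row_id(event_json: str) -> Optional[str]:
    """Return source.row_id from a change event, or None when it is absent."""
    event = json.loads(event_json)
    # Depending on converter settings, the event may be wrapped in a
    # schema/payload envelope; fall back to the event itself otherwise.
    payload = event.get("payload") or event
    source = payload.get("source") or {}
    # The field is optional, so use .get() rather than indexing.
    return source.get("row_id")


# Hypothetical streaming update event (all values are made up).
streaming_event = json.dumps({
    "op": "u",
    "source": {"connector": "oracle", "snapshot": "false",
               "row_id": "AAAR4kAAHAAAAFbAAA"},
})

# Hypothetical snapshot event: no row_id is present.
snapshot_event = json.dumps({
    "op": "r",
    "source": {"connector": "oracle", "snapshot": "true"},
})

print(extract_row_id(streaming_event))
print(extract_row_id(snapshot_event))
```

Treating a missing field as None keeps the same consumer code working for snapshot events and for events produced by adapters that do not supply the value.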

PostgreSQL Arrays with the JDBC sink

The JDBC sink connector supports mapping source columns to Kafka ARRAY-based payload field types. With Debezium 2.7, you can now serialize ARRAY-based fields to a target PostgreSQL database, with no change in configuration. The new support should be completely transparent (DBZ-7752).

Oracle flush table with custom schema names

In prior versions of Debezium, the Oracle connector was strictly designed to create the LogMiner flush table in the default tablespace of the connector user account. This wasn’t always useful in situations where the user’s default tablespace may not be the ideal destination and the DBA would prefer that table to exist in a separate tablespace.

Previously, users would need to modify the user account or use a new user with the correct tablespace to have the table created in the right tablespace location. With Debezium 2.7, this is no longer required, and you can safely include the name of the target schema/tablespace in the configuration (DBZ-7819).

Example using a custom schema name
log.mining.flush.table.name=THE_OTHER_SCHEMA.LOG_MINING_FLUSH_TABLE

The schema name is optional; if it is not supplied, the connector keeps the legacy behavior of creating the flush table, and checking for its existence, in the user’s default tablespace.
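To make the optional prefix concrete, here is a minimal Python sketch of how a value such as the one above could be split into a schema and a table name. It is not the connector’s actual parsing logic; the function name split_flush_table_name is hypothetical, and the sketch assumes plain, unquoted identifiers.

```python
from typing import Optional, Tuple


def split_flush_table_name(configured: str) -> Tuple[Optional[str], str]:
    """Split 'SCHEMA.TABLE' into (schema, table); schema is None when omitted.

    Assumes unquoted identifiers, so a dot always separates schema and table.
    """
    schema, dot, table = configured.strip().rpartition(".")
    if not dot:
        # No schema prefix: the caller would fall back to the default location.
        return None, table
    return schema, table


print(split_flush_table_name("THE_OTHER_SCHEMA.LOG_MINING_FLUSH_TABLE"))
print(split_flush_table_name("LOG_MINING_FLUSH_TABLE"))
```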

NATS authentication with JWT/seed

The Debezium Server NATS streaming sink adapter now supports JWT/seed-based authentication (DBZ-7829). To get started with JWT/seed-based authentication, supply the following values in the configuration:

JWT authentication example
debezium.sink.nats-jetstream.auth.jwt=<your_jwt_token>

Seed authentication example
debezium.sink.nats-jetstream.auth.seed=<your_nkey_seed>

For more details about JWT and NKey seed-based authentication, please see the NATS documentation.

Oracle query filter with large numbers of tables

The Debezium Oracle connector can support thousands of tables in a single connector deployment with ease; however, you may have found you wanted to customize the query filter using the IN mode. This mode is used in situations where you may have a high volume of changes for other tables and you want to filter that dataset out at the database level before the changes are passed to Debezium for processing.

In earlier versions, users may have noticed that setting log.mining.query.filter.mode to in with a table include list of more than 1000 elements generated a SQL error. Oracle does not permit more than 1000 elements within an in-clause; Debezium 2.7 works around this limitation by using a disjunction between multiple in-clause lists of up to 1000 items each (DBZ-7847).
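The bucketing idea can be sketched in a few lines of Python. This is an illustration of the general technique rather than the connector’s implementation; the function name build_in_predicate, the TABLE_NAME column, and the generated table names are assumptions made for the example.

```python
from typing import List


def build_in_predicate(column: str, values: List[str], bucket_size: int = 1000) -> str:
    """Build `col IN (...) OR col IN (...)` with at most bucket_size items per list."""
    if not values:
        raise ValueError("at least one value is required")
    # Slice the values into consecutive buckets of at most bucket_size items.
    buckets = [values[i:i + bucket_size] for i in range(0, len(values), bucket_size)]
    clauses = []
    for bucket in buckets:
        # Double any single quotes so each value is a valid SQL string literal.
        literals = ",".join("'" + v.replace("'", "''") + "'" for v in bucket)
        clauses.append(f"{column} IN ({literals})")
    # Parenthesize the disjunction so it composes safely with other predicates.
    return "(" + " OR ".join(clauses) + ")"


# 2500 hypothetical table names produce three lists: 1000, 1000, and 500 items.
tables = [f"TABLE_{i}" for i in range(2500)]
predicate = build_in_predicate("TABLE_NAME", tables)
print(predicate.count(" IN ("))  # 3
```

Because each in-clause list stays within the 1000-element limit, the combined predicate remains valid no matter how many tables are in the include list.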

Other changes

Altogether, 27 issues were fixed in this release. Here is a list of some additional noteworthy changes:

  • Log exception details early in case MySQL keep-alive causes deadlock on shutdown DBZ-7570

  • Extend mongodb system tests with ssl option DBZ-7605

  • > io.debezium.text.ParsingException : SQL Contains Partition DBZ-7805

  • Ad-hoc blocking snapshot not working through file channeling without inserting a row in the database. DBZ-7806

  • Postgres: Potential data loss on connector restart DBZ-7816

  • DEBEZIUM_VERSION is wrongly set to 2.6.0.Alpha1 DBZ-7827

  • Sql Server incorrectly applying quoted snapshot statement overrides DBZ-7828

  • Debezium JDBC Sink not handle order correctly DBZ-7830

  • Bump Outbox Extension to Quarkus 3.10.0 DBZ-7842

  • Support Oracle DDL Alter Audit Policy DBZ-7864

  • Support Oracle DDL Create Audit Policy DBZ-7865

What’s next?

We have our team face-to-face next week, and it’s going to be absolutely fantastic since it’s the first time we get to meet in person since the Covid pandemic. We’re going to use this time to discuss all the community feedback we’ve received throughout Debezium 2, reflect on what worked, and put together action plans for what didn’t.

The main focus of our meeting is to develop an action plan for Debezium 3.0 and beyond, and to assign tasks and priorities across the team. As we focus on Debezium 3 next quarter, we want to make this next major release an easy replacement for the community while also delivering a new, refreshing, feature-rich major version. We will be updating the roadmap and deliverables when we’re back, so be sure to stay tuned to our roadmap.

In terms of Debezium 2.7, we’re halfway through the quarter, and we’re about to turn our focus to the second half, where we will address any bugs and regressions and polish new features. If you have the chance to test-drive the pre-releases, we strongly encourage you to do so and file bug reports.

Until next time…​

Chris Cranford

Chris is a software engineer at Red Hat. He previously was a member of the Hibernate ORM team and now works on Debezium. He lives in North Carolina just a few hours from Red Hat towers.

   


About Debezium

Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.

Get involved

We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Zulip, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve our existing connectors and add even more connectors. If you find problems or have ideas for how we can improve Debezium, please let us know or log an issue.