We are pleased to announce the release of Debezium 2.6.0.Beta1. We enter the home stretch with this release, which is packed with improvements, enhancements, bug fixes, and, yes, a brand-new Db2 connector for iSeries. There is a lot to cover in this release, so let's dive right in!

Breaking changes

The team aims to avoid any potential breaking changes between minor releases; however, such changes are sometimes inevitable.


In older versions of Debezium, users were required to manually install the ojdbc8.jar JDBC driver. With 2.6, the Oracle connector now bundles the JDBC driver, so manual installation is no longer necessary (DBZ-7364).

We’ve also updated the driver version; after upgrading to Debezium 2.6, please verify that you do not have multiple versions of the driver installed (DBZ-7365).

Container Images

The handling of the MAVEN_DEP_DESTINATION environment variable has changed in the connect-base container image, which is the basis for debezium/connect. It is no longer used for downloading all dependencies, including connectors, but only for general purpose Maven Central located dependencies (DBZ-7551). If you were using custom images that relied on this environment variable, your image build steps may require modifications.

Improvements and changes

Db2 for iSeries connector

Debezium 2.6 introduces a brand-new connector for IBM fans to stream changes from Db2 on iSeries/AS400 using the IBM iJournal system. This connector is the result of a multi-year development effort from the community, and we’re pleased that the community has allowed it to be distributed under the Debezium umbrella.

The new connector can be obtained from Maven Central using the following coordinates or a direct download.


The documentation for this new connector is still a work-in-progress. If you have any questions, please be sure to reach out to the team on Zulip or the mailing list.

Incremental snapshot row-value constructors for PostgreSQL

The PostgreSQL driver supports a SQL syntax called a row-value constructor, using the ROW() function. This allows a query to express predicate conditions more efficiently when working with multi-column primary keys that have a suitable index. The incremental snapshot process is an ideal candidate for the ROW() function, as the process involves issuing a series of SELECT statements to fetch data in chunks. Each statement, aka chunk query, should be as efficient as possible to minimize the overhead of these queries and maximize the throughput of your WAL changes to your topics.

There are no specific changes needed, but the query issued for PostgreSQL incremental snapshots has been adjusted to take advantage of this new syntax, and therefore users who utilize incremental snapshots should see performance improvements.

An example of the old query for a simple table might look like this:

SELECT *
  FROM users
 WHERE (a = 10 AND (b > 2 OR b IS NULL)) OR (a > 10) OR (a IS NULL)
 ORDER BY a, b LIMIT 1024

The new implementation constructs this query using the ROW() function as follows:

SELECT *
  FROM users
 WHERE row(a,b) > row(10,2)
 ORDER BY a, b LIMIT 1024

We’d be interested in any feedback on this change, and what performance improvements are observed.
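The row-value comparison follows lexicographic (tuple) ordering, which is easy to sanity-check outside the database. The following Python sketch (key values are illustrative, and NULL handling is set aside) shows that the compact row-value form agrees with the expanded predicate:

```python
# Sanity check: SQL row(a, b) > row(10, 2) uses lexicographic ordering,
# matching the expanded form (a > 10) OR (a = 10 AND b > 2) for
# non-NULL key values.

def expanded_predicate(a, b):
    """The long-hand chunk-query condition (NULLs aside)."""
    return a > 10 or (a == 10 and b > 2)

def row_value_predicate(a, b):
    """Python tuples compare lexicographically, like SQL row values."""
    return (a, b) > (10, 2)

# Both forms agree on every non-NULL key pair in this range.
for a in range(8, 13):
    for b in range(0, 5):
        assert expanded_predicate(a, b) == row_value_predicate(a, b)
print("predicates agree")
```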

Signal table watermark metadata

An incremental snapshot process requires a signal table to write open/close markers to coordinate the change boundaries with the data recorded in the transaction logs, unless you’re using MySQL’s read-only flavor. In some cases, users would like to track the snapshot window, knowing when the window was opened and closed.

Starting with Debezium 2.6, the data column in the signal table is populated with the time window details, allowing users to determine when the window was opened and closed. The following shows the contents of the data column for each of the two signal markers:

Window Open Marker
{"openWindowTimestamp": "<window-open-time>"}
Window Close Marker
{"openWindowTimestamp": "<window-open-time>", "closeWindowTimestamp": "<window-close-time>"}
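As a sketch of how this metadata might be consumed, the window duration can be derived from the close marker's data column. The timestamp format below is an assumption (ISO-8601) for illustration only; check the actual values written by your connector:

```python
import json
from datetime import datetime

# Hypothetical close-marker payload; the real timestamp format may differ
# (ISO-8601 is assumed here purely for illustration).
data = ('{"openWindowTimestamp": "2024-03-01T10:15:00+00:00", '
        '"closeWindowTimestamp": "2024-03-01T10:15:02+00:00"}')

marker = json.loads(data)
opened = datetime.fromisoformat(marker["openWindowTimestamp"])
closed = datetime.fromisoformat(marker["closeWindowTimestamp"])
duration = (closed - opened).total_seconds()
print(f"window was open for {duration} seconds")  # prints 2.0 seconds
```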

Oracle Redo SQL per event with LogMiner

We have improved the Oracle connector’s event structure for inserts, updates, and deletes to optionally contain the SQL that was reconstructed by LogMiner in the source information block. This is an opt-in feature that you must explicitly enable, as it can easily more than double the size of your existing event payload.

To enable the inclusion of the REDO SQL as part of the change event, add the following connector configuration:

"log.mining.include.redo.sql": "true"

With this option enabled, the source information block contains a new field redo_sql, as shown below:

"source": {
  "redo_sql": "INSERT INTO \"DEBEZIUM\".\"TEST\" (\"ID\",\"DATA\") values ('1', 'Test');"
}

This feature cannot be used with lob.enabled set to true due to how LogMiner reconstructs the SQL related to CLOB, BLOB, and XML data types. If the above configuration is added with lob.enabled set to true, the connector will start with an error about this misconfiguration.

Oracle LogMiner transaction buffer improvements

A new delay-strategy for transaction registration has been added when using LogMiner. This strategy effectively delays the creation of the transaction record in the buffer until we observe the first captured change for that transaction.

For users who use the Infinispan cache or who have set lob.enabled to true, this delayed strategy cannot be used due to how specific operations are handled in these two modes of the connector.

Delaying transaction registration has a number of benefits, which include:

  • Reduced overhead on the transaction cache, especially in highly concurrent transaction scenarios.

  • Avoidance of long-running transactions that have no changes captured by the connector.

  • More efficient advancement of the low-watermark SCN in the offsets in specific scenarios.

We are looking into extending this change to Infinispan-based users in a future build; however, due to the nature of how lob.enabled works with LogMiner, this feature won’t be possible for that use case.

Improved event timestamp precision

Debezium 2.6 introduces a new community-requested feature to improve the precision of timestamps in change events. Users will now notice the addition of four new fields, two at the envelope level and two in the source information block, as shown below:

  "source": {
    "ts_us": "1559033904863123",
    "ts_ns": "1559033904863123000"
  },
  "ts_us": "1580390884335451",
  "ts_ns": "1580390884335451325"

The envelope values will always provide both microsecond (ts_us) and nanosecond (ts_ns) precision, while the values in the source information block may be truncated to a lower precision if the source database does not provide that level of precision.
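The three precisions are simple multiples of one another, so a consumer can always down-convert safely. A minimal sketch, using the envelope values from the example above:

```python
# Envelope example values from above: nanoseconds carry the full
# precision, and the coarser fields are integer truncations of it.
ts_ns = 1580390884335451325
ts_us = ts_ns // 1_000          # microseconds (the new ts_us field)
ts_ms = ts_us // 1_000          # milliseconds (the pre-existing ts_ms)

assert ts_us == 1580390884335451
print(ts_ms)  # prints 1580390884335
```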

Informix appends LSN to Transaction Identifier

Informix only increases the transaction identifier when there are concurrent transactions; otherwise, the value remains identical across sequential transactions. This can prove difficult for users who want to use the transaction metadata to order change events in a post-processing step.

Debezium 2.6 for Informix will now append the log sequence number (LSN) to the transaction identifier so that users can easily sort change events based on the transaction metadata. The transaction identifier field will now use the format <id>:<lsn>. This change affects transaction metadata events and the source information block for change events, as shown below:

Transaction Begin Event
  "status": "BEGIN",
  "id": "571:53195829",
Transaction End Event
  "status": "END",
  "id": "571:53195832",
Change Events
  "source": {
    "id": "571:53195832"
  }
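For that post-processing step, the composite identifier can be split on the colon to sort events numerically by LSN. A minimal sketch assuming the <id>:<lsn> format described above (the event payloads are illustrative):

```python
# Sort change events by the numeric LSN suffix of the <id>:<lsn>
# transaction identifier.
events = [
    {"id": "571:53195832"},
    {"id": "571:53195829"},
]

def lsn_of(event):
    # Split "571:53195832" into the transaction id and the LSN suffix.
    tx_id, lsn = event["id"].split(":")
    return int(lsn)

events.sort(key=lsn_of)
print([e["id"] for e in events])  # lowest LSN first
```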

New Arbitrary-based payload formats

While it’s common for users to utilize serialization based on JSON, Avro, Protobuf, or CloudEvents, there may be reasons to use a simpler format. Thanks to a community contribution as part of DBZ-7512, Debezium can be configured to use two new formats called simplestring and binary.

The simplestring and binary formats are configured in Debezium Server using the debezium.format configuration properties. For simplestring, the payload is serialized into the topic as a single STRING data type; for binary, it is serialized as BYTES using a byte[] (byte array).
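For example, a Debezium Server configuration selecting the new format for the message value might look like the following; the property name follows the usual debezium.format.value convention, so verify it against the Debezium Server documentation:

```properties
debezium.format.value=simplestring
```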

Oracle LogMiner Hybrid Mining Strategy

Debezium 2.6 also introduces a new Oracle LogMiner mining strategy called hybrid, which can be enabled by setting the configuration property log.mining.strategy to the value hybrid. This new strategy is designed to support all schema evolution features of the default mining strategy while taking advantage of all the performance optimizations of the online catalog strategy.

The main problem with the online_catalog strategy is that if a mining step observes a schema change and a data change together, LogMiner is incapable of reconstructing the SQL correctly, which results in the table name appearing as OBJ# xxxxxx or the columns represented as COL1, COL2, and so on. To avoid this while using the online catalog strategy, users are advised to perform schema changes in a lock-step pattern so that no mining step observes both a schema change and a data change together; however, this is not always feasible.

The new hybrid strategy works by tracking a table’s object id at the database level and then using this identifier to look up the schema associated with the table from Debezium’s relational table model. In short, this allows Debezium to do what Oracle LogMiner is unable to do in these specific corner cases. The table name will be taken from the relational model’s table name and columns will be mapped by column position.

Unfortunately, Oracle does not provide a way to reconstruct failed SQL operations for CLOB, BLOB, and XML data types. This means that the new hybrid strategy cannot be used with lob.enabled set to true. If a connector is started using the hybrid strategy with lob.enabled set to true, the connector will fail to start and report a configuration failure.
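Enabling the new strategy is a single connector configuration change, shown here alongside the incompatible option for clarity:

```json
"log.mining.strategy": "hybrid",
"lob.enabled": "false"
```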

Other changes

Altogether, 86 issues were fixed in this release:

  • MySQL config values validated twice DBZ-2015

  • PostgreSQL connector doesn’t restart properly if database is not reachable DBZ-6236

  • NullPointerException in MongoDB connector DBZ-6434

  • Tests in RHEL system testsuite throw errors without ocp cluster DBZ-7002

  • Move timeout configuration of MongoDbReplicaSet into Builder class DBZ-7054

  • Several Oracle tests fail regularly on Testing Farm infrastructure DBZ-7072

  • Remove obsolete MySQL version from TF DBZ-7173

  • Add Oracle 23 to CI test matrix DBZ-7195

  • Refactor sharded mongo ocp test DBZ-7221

  • Implement Snapshotter SPI Oracle DBZ-7302

  • Align snapshot modes for SQLServer DBZ-7303

  • Update snapshot mode documentation DBZ-7309

  • Cassandra-4: Debezium connector stops producing events after a schema change DBZ-7363

  • Upgrade ojdbc8 to DBZ-7365

  • Document relation between column type and serializers for outbox DBZ-7368

  • Callout annotations rendered multiple times in downstream User Guide DBZ-7418

  • Test testEmptyChangesProducesHeartbeat tends to fail randomly DBZ-7453

  • Align snapshot modes for PostgreSQL, MySQL, Oracle DBZ-7461

  • PreparedStatement leak in Oracle ReselectColumnsProcessor DBZ-7479

  • Allow special characters in signal table name DBZ-7480

  • Document toggling MariaDB mode DBZ-7487

  • Poor snapshot performance with new reselect SMT DBZ-7488

  • Debezium Oracle Connector ParsingException on XMLTYPE with lob.enabled=true DBZ-7489

  • Add informix to main repository CI workflow DBZ-7490

  • Db2ReselectColumnsProcessorIT does not clean-up after test failures DBZ-7491

  • Disable Oracle Integration Tests on GitHub DBZ-7494

  • Unify and adjust thread time outs DBZ-7495

  • Completion callback called before connector stop DBZ-7496

  • Add "IF [NOT] EXISTS" DDL support for Oracle 23 DBZ-7498

  • Deployment examples show attribute name instead of its value DBZ-7499

  • Fix MySQL 8 event timestamp resolution logic error where fallback to seconds occurs erroneously for non-GTID events DBZ-7500

  • Remove incubating from Debezium documentation DBZ-7501

  • Add ability to parse Map<String, Object> into ConfigProperties DBZ-7503

  • LogMinerHelperIT test shouldAddCorrectLogFiles randomly fails DBZ-7504

  • Support Oracle 23 SELECT without FROM DBZ-7505

  • Add Oracle 23 Annotation support for CREATE/ALTER TABLE statements DBZ-7506

  • TestContainers MongoDbReplicaSetAuthTest randomly fails DBZ-7507

  • MySQl ReadOnlyIncrementalSnapshotIT testStopSnapshotKafkaSignal fails randomly DBZ-7508

  • Add Informix to Java Outreach DBZ-7510

  • Disable parallel record processing in DBZ server tests against Apicurio DBZ-7515

  • Add Start CDC hook in Reselect Columns PostProcessor Tests DBZ-7516

  • Remove the unused 'connector' parameter in the createSourceTask method in EmbeddedEngine.java DBZ-7517

  • Update commons-compress to 1.26.0 DBZ-7520

  • Promote JDBC sink from Incubating DBZ-7521

  • Allow to download containers also from Docker Hub DBZ-7524

  • Update rocketmq version DBZ-7525

  • signalLogWithEscapedCharacter fails with pgoutput-decoder DBZ-7526

  • Move RocketMQ dependency to debezium server DBZ-7527

  • Rework shouldGenerateSnapshotAndContinueStreaming assertions to deal with parallelization DBZ-7530

  • Multi-threaded snapshot can enqueue changes out of order DBZ-7534

  • AsyncEmbeddedEngineTest#testTasksAreStoppedIfSomeFailsToStart fails randomly DBZ-7535

  • MongoDbReplicaSetAuthTest fails randomly DBZ-7537

  • SQLServer tests taking long time due to database bad state DBZ-7541

  • Explicitly import jakarta dependencies that are excluded via glassfish filter DBZ-7545

  • ReadOnlyIncrementalSnapshotIT#testStopSnapshotKafkaSignal fails randomly DBZ-7553

  • Include RocketMQ and Redis container output into test log DBZ-7557

  • Allow XStream error ORA-23656 to be retried DBZ-7559

  • Numeric default value decimal scale mismatch DBZ-7562

  • Wait for Redis server to start DBZ-7564

  • Documentation conflict DBZ-7565

  • Fix null event timestamp possible from FORMAT_DESCRIPTION and PREVIOUS_GTIDS events in MySqlStreamingChangeEventSource::setEventTimestamp DBZ-7567

  • AsyncEmbeddedEngineTest.testExecuteSmt fails randomly DBZ-7568

  • Debezium fails to compile with JDK 21 DBZ-7569

  • Upgrade PostgreSQL driver to 42.6.1 DBZ-7571

  • Upgrade Kafka to 3.7.0 DBZ-7574

  • Redis tests fail randomly with JedisConnectionException: Unexpected end of stream DBZ-7576

  • RedisOffsetIT.testRedisConnectionRetry fails randomly DBZ-7578

  • Oracle connector always brings OLR dependencies DBZ-7579

  • Correct JDBC connector dependencies DBZ-7580

  • Improved logging in case of PostgreSQL failure DBZ-7581

  • Unavailable Toasted HSTORE Json Storage Mode column causes serialization failure DBZ-7582

  • Reduce debug logs on tests DBZ-7588

  • Server SQS sink doesn’t support quick profile DBZ-7590

  • Oracle Connector REST Extension Tests Fail DBZ-7597

  • Serialization of XML columns with NULL values fails using Infinispan Buffer DBZ-7598

Outlook & What’s next?

The next few weeks will be focused primarily on stability and bug fixes. We expect to release Debezium 2.6.0.Final in just under three weeks, so we encourage you to download and test the latest Beta and provide your feedback.

If you have any questions or are interested in what the roadmap holds, not only for 2.6 but also for the road to Debezium 3.0 later this fall, we encourage you to take a look at our road map. If you have any suggestions or ideas, please feel free to get in touch with us on our mailing list or in our Zulip chat.

And in closing, our very own Mario Vitale will be speaking at Open Source Day 2024, where he will talk about Dealing with data consistency - a CDC approach to dual writes. Please be sure to check out his session on Day 1 as a part of the Beta track at 10:45am!

Until next time…​

Chris Cranford

Chris is a software engineer at Red Hat. He previously was a member of the Hibernate ORM team and now works on Debezium. He lives in North Carolina just a few hours from Red Hat towers.


About Debezium

Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.

Get involved

We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Zulip, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve our existing connectors and add even more connectors. If you find problems or have ideas how we can improve Debezium, please let us know or log an issue.