Debezium 2.2.0.Alpha1 Released

Breaking Change

An edge case was reported in DBZ-5996 where if a temporal column used ZonedTimestamp and if the column’s value had 0 micro or nanoseconds, rather than emitting the value as 2023-01-19T12:30:00.123000Z, the value would be emitted in a truncated way as 2023-01-19T12:30:00.123Z. This could lead to other issues with converters used in the event pipeline when the output from that column could be formatted inconsistently.

In order to remedy the edge case, the ZonedTimestamp implementation will now pad the fraction-based seconds value of the column’s value to the length/scale of the source database column. Using the example above of a TIMESTAMP(6) MySQL column type, the emitted value will now properly reflect a value of 2023-01-19T12:30:00.123000Z.

While this change in behavior is likely to have minimal impact to most users, we wanted to bring attention to it in the event that you’ve perhaps used other means to handle this edge case in your pipelines. If you have, you should be able to rely on Debezium to emit the value consistently, even when the fraction-based seconds is 0.

Ingesting changes from Oracle logical stand-bys

The Debezium for Oracle connector normally manages what is called a flush table, which is an internal table used to manage the flush cycles used by the Oracle Log Writer Buffer (LGWR) process. This flushing process requires that the user account the connector uses to have permission to create and write to this table. Logical stand-by databases often have more restrictive rules about data manipulation and may even be read-only, therefore, writing to the database is unfavorable or even not permissible.

To support an Oracle read-only logical stand-by database, we introduced a flag to disable the creation and management of this flush table. This feature can be used with both Oracle Standalone and Oracle RAC installations, and is currently considered incubating, meaning its subject to change in the future.

In order to enable Oracle read-only logical stand-by support, add the following connector option:

internal.log.mining.read.only=true

In a future version, we plan to add support for an Oracle read-only physical stand-by database.

This configuration option is prefixed with internal., meaning that it’s considered an undocumented and experimental feature. The semantics and behavior of this option are subject to change in future versions that may not be guaranteed forward or backward compatible.

Using Amazon S3 buckets with Storage API

Debezium provides a Storage API framework that enables connectors to store offset and schema history state in a variety of persistence datastores. Moreover, the framework enables contributors to extend the API by adding new storage implementations with ease. Currently, the Storage API framework supports the local FileSystem, a Kafka Topic, or Redis datastores.

With Debezium 2.2, we’re pleased to add Amazon S3 buckets as part of that framework, allowing the schema history to be persisted to an S3 bucket. An example connector configuration using S3 might look like the following:

...
schema.history.internal=io.debezium.storage.s3.history
schema.history.internal.s3.access.key.id=aa
schema.history.internal.s3.secret.access.key=bb
schema.history.internal.s3.region.name=aws-global
schema.history.internal.s3.bucket.name=debezium
schema.history.internal.s3.object.name=db-history.log
schema.history.internal.s3.endpoint=http://<server>:<port>

schema.history.internal.s3.access.key.id: Specifies the access key required to authenticate to S3.
schema.history.internal.s3.secret.access.key: Specifies the secret access key required to authenticate to S3.
schema.history.internal.s3.region.name: Specifies the region where the S3 bucket is available.
schema.history.internal.s3.bucket.name: Specifies the name of the S3 bucket where the schema history is to be persisted.
schema.history.internal.s3.object.name: Specifies the object name in the bucket where the schema history is to be persisted.
schema.history.internal.s3.endpoint: Specifies the S3 endpoint with the format of http://<server>:<port>;.

Retry database connections on start-up

In previous releases of Debezium, the connector start-up phase used a fail-fast strategy. Simply put, this meant that if we couldn’t connect, authenticate, or performs any of the start-up phase steps required by the connector, the connector would enter a FAILED state.

One specific problem area for users is if the connector gracefully starts, runs for a period of time, and then eventually encounters some fatal error. If the error is related to a resource that wasn’t accessed during the connector’s start-up lifecycle, the connector would typically gracefully restart just fine. However, the situation is different if the problem was related to the database’s availability and the database was still unavailable during the connector’s start-up phase. In this situation, the connector would fail-fast, and would enter a FAILED state, requiring manual intervention.

The fail-fast approach served Debezium well over the years, but in a world where a resource can come and go without warning, it became clear that changes were needed to improve Debezium’s reliability and resiliency. While the Kafka Connect’s retry/back-off framework has helped in this regard, that doesn’t address the concerns with start-up resources being unavailable with how the code is currently written.

Debezium 2.2 changes this landscape, shifting how we integrate with Kafka Connect’s source connector API slightly. Instead of accessing potentially unavailable resources during the start-up lifecycle, we moved that access to a later phase in the connector’s lifecycle. In effect, the Debezium start-up code is executed lazily that accesses potentially unavailable resources, which allows us to take advantage of the Kafka Connect retry/back-off framework even during our start-up code. In short, if the database is still unavailable during the connector’s start-up, the connector will continue to retry/back-off if Kafka Connect retries are enabled. Only once the maximum number of retry attempts has been reached or a non-retriable error occurs will the connector task enter a FAILED state.

We hope this brings more reliability and resiliency for the Debezium experience, improving how errors are handled in an ever-changing landscape, and provides a solid foundation to manage connector lifecycles.

RocketMQ and Infinispan support in Debezium Server

Debezium Server is a Quarkus-based framework that allows executing a Debezium connector from the command line, without Kafka or Kafka Connect, allowing the delivery of Debezium change events to any destination framework. With Debezium 2.2, two new sink connectors have been added to Debezium Server to support sending change events to Apache RocketMQ and to Infinispan.

RocketMQ

Apache RocketMQ is a cloud-native messaging, eventing, and streaming real-time data processing platform that covers cloud-edge-device collaboration scenarios. In order to integrate Debezium Server with RocketMQ, the Debezium Server application.properties must be modified to include the following entries:

application.properties

debezium.sink.type=rocketmq
debezium.sink.rocketmq.producer.name.srv.addr=<hostname>:<port>
debezium.sink.rocketmq.producer.group=debezuim-group
debezium.sink.rocketmq.producer.max.message.size=4194304
debezium.sink.rocketmq.producer.send.msg.timeout=3000
debezium.sink.rocketmq.producer.acl.enabled=false
debezium.sink.rocketmq.producer.access.key=<access-key>
debezium.sink.rocketmq.producer.secret.key=<secret-key>

The above configuration specifies that the sink type to be used is rocketmq, which enables the use of the RocketMQ module. The following is a description of each of the properties shown above:

debezium.sink.rocketmq.producer.name.srv.addr: Specifies the host and port where Apache RocketMQ is available.
debezium.sink.rocketmq.producer.group: Specifies the name associated with the Apache RocketMQ producer group.
debezium.sink.rocketmq.producer.max.message.size: (Optional) Specifies the maximum number of bytes a message can be. Defaults to 4193404 (4MB).
debezium.sink.rocketmq.producer.send.msg.timeout: (Optional) Specifies the timeout in milliseconds when sending messages. Defaults to 3000 (3 seconds).
debezium.sink.rocketmq.producer.acl.enabled: (Optional) Controls whether access control lists are enabled. Defaults to false.
debezium.sink.rocketmq.producer.access.key: (Optional) The access key used for connecting to the Apache RocketMQ cluster.
debezium.sink.rocketmq.producer.secret.key: (Optional) The access secret used for connecting to the Apache RocketMQ cluster.

For more information on using Debezium Server with RocketMQ, see the documentation.

Infinispan

Infinispan is an in-memory, distributed data store that offers flexible deployment options with robust capabilities to store, manage, and process data. Infinispan is based on the notion of a key-value store that allows storing any data type. In order to integrate Debezium Server with Infinispan, the Debezium Server application.properties must be modified to include the following entries:

application.properties

debezium.sink.type=infinispan
debezium.sink.infinispan.server.host=<hostname>
debezium.sink.infinispan.server.port=<port>
debezium.sink.infinispan.cache=<cache-name>
debezium.sink.infinispan.user=<user>
debezium.sink.infinispan.password=<password>

The above configuration specifies that the sink type to be used is infinispan, which enables the use of the Infinispan module. The following is a description of each of the properties shown above:

debezium.sink.infinispan.server.host: Specifies the host name of one of the servers in the Infinispan cluster. This configuration option can also supply a comma-separated list of hostnames as well, such as hostname1,hostname2.
debezium.sink.infinispan.server.port: Specifies the port of the Infinispan cluster. Defaults to 11222.
debezium.sink.infinispan.cache: Specifies the name of the Infinispan cache to write change events.

The Infinispan sink requires that the cache be created manually ahead of time. This enables the ability to create the cache with any variable configuration needed to fit your requirements.

debezium.sink.infinispan.user: An optional configuration to specify the user to authenticate with, if authentication is required.
debezium.sink.infinispan.password: An optional configuration to specify the password for the authenticating user, if authentication is required.

For more information on using Debezium Server with Infinispan, see the documentation.

Other fixes

There were quite a number of bugfixes and stability changes in this release, some noteworthy are:

Remove option for specifying driver class from MySQL Connector DBZ-4663
Debezium is not working with Apicurio and custom truststores DBZ-5282
Show/Hide password does not work on Connectors View details screen DBZ-5322
Oracle cannot undo change DBZ-5907
Postgresql Data Loss on restarts DBZ-5915
Add support for Connect Headers to Debezium Server DBZ-5926
Oracle Multithreading lost data DBZ-5945
Spanner connector is missing JSR-310 dependency DBZ-5959
Truncate records incompatible with ExtractNewRecordState DBZ-5966
Computed partition must not be negative DBZ-5967
Table size log message for snapshot.select.statement.overrides tables not correct DBZ-5985
NPE in execute snapshot signal with exclude.tables config on giving wrong table name DBZ-5988
There is a problem with postgresql connector parsing the boundary value of money type DBZ-5991
Log statement for unparseable DDL statement in MySqlDatabaseSchema contains placeholder DBZ-5993
Postgresql connector parses the null of the money type into 0 DBZ-6001
Postgres LSN check should honor event.processing.failure.handling.mode DBZ-6012

Altogether, 42 issues were fixed for this release. A big thank you to all the contributors from the community who worked on this release: Akshansh Jain, Gabor, Anil Dasari, Animesh Kumar, Anisha Mohanty, Bob Roldan, Chris Cranford, Erdinç Taşkın, Govinda Sakhare, Harvey Yue, Hossein Torabi, Indra Shukla, Jakub Zalas, Jeremy Ford, Jiri Pechanec, Jochen Schalanda, Luca Scannapieco, Mario Fiore Vitale, Mark Lambert, Rajendra Dangwal, Sun Xiao Jian, Vojtech Juranek, Yohei Yoshimuta, and yohei yoshimuta!

What’s Next?

As the road to Debezium 2.2 is just starting, this initial release covers quite a lot of the features we’ve outlined our recent 2023 road map update. However, there are still a number of features that are still in active development, which include:

Configurable signal channels, enabling users to send signals not only from a database table or a Kafka topic, but also from other means such as an HTTP endpoint, the file system, etc.
The Debezium JDBC sink connector that supports native Debezium change events out-of-the-box, without requiring the use of the Event Flattening transformation.
A new single message transformation, ExtractChangedRecordState, that supports adding headers to the emitted event that describes that fields were changed or unchanged by the source event.
And a plethora of enhancements to Debezium’s UI

As we continue development on Debezium 2.2 and bugfixes to Debezium 2.1, we would love to hear your feedback or suggestions, whether it’s regarding our road map, the changes in this release, or something you’d like to see that we haven’t mentioned. Be sure to get in touch with us on the mailing list or our chat if there is. Or if you just want to stop by and give us a "Hello", we’d welcome that too.

Until next time…

Chris Cranford

Chris is a software engineer at Red Hat. He previously was a member of the Hibernate ORM team and now works on Debezium. He lives in North Carolina just a few hours from Red Hat towers.

About Debezium

Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.

Get involved

We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Zulip, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve ours existing connectors and add even more connectors. If you find problems or have ideas how we can improve Debezium, please let us know or log an issue.