It’s my pleasure to announce the first release of the Debezium 2.3 series, 2.3.0.Alpha1!
This release brings many new and exciting features as well as bug fixes, including Debezium status notifications, storage of Debezium state into a JDBC data store, configurable signaling channels, the ability to edit connector configurations via Debezium UI, the parallelization of Vitess shards processing, and much more.
This release contains changes for 59 issues, so let’s take a moment to dive into several of these new features and any noteworthy bug fixes or breaking changes!
Breaking Changes
Debezium for PostgreSQL and MySQL can be configured to use a secured SSL connection. For PostgreSQL, this can be done by configuring database.sslmode, while for MySQL this can be done with database.ssl.mode.
With Debezium 2.3, this configuration option no longer defaults to disable (PostgreSQL) or disabled (MySQL) but instead defaults to prefer (PostgreSQL) and preferred (MySQL). This means that the connector first attempts an encrypted, secure connection; when one is unavailable, it falls back to an unsecured connection by default unless configured otherwise.
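If you depend on the previous behavior, or want to rule out the unencrypted fallback, the SSL mode can be set explicitly. The following is a minimal sketch rather than an official recipe: the helper name with_ssl_mode and the host name are hypothetical, the property names and the disable value come from the text above, and required is assumed here to be the MySQL mode that refuses unencrypted connections.

```python
# Property name per connector, taken from the text above.
SSL_MODE_PROPERTY = {
    "postgresql": "database.sslmode",
    "mysql": "database.ssl.mode",
}


def with_ssl_mode(config: dict, connector: str, mode: str) -> dict:
    """Return a copy of a connector config with the SSL mode set explicitly."""
    prop = SSL_MODE_PROPERTY[connector]
    return {**config, prop: mode}


# Restore the pre-2.3 default (no encryption) for PostgreSQL.
pg = with_ssl_mode({"database.hostname": "db.example.com"}, "postgresql", "disable")

# Assumed strict mode for MySQL: do not fall back to an unencrypted connection.
mysql = with_ssl_mode({"database.hostname": "db.example.com"}, "mysql", "required")
```

Setting the value explicitly also keeps the connector's behavior stable should the default change again in a later release.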
Status Notifications
Debezium 2.3 introduces a brand-new feature called notifications, allowing Debezium to emit events that can be consumed by any external system to know the status of various stages of Debezium’s lifecycle.
Notification events are represented as a series of key/value tuples, with a structure that contains several out-of-the-box fields. The following is an example of a simple notification event.
{
"id": "c485ccc3-16ff-47cc-b4e8-b56a57c3bad2",
"aggregate_type": "Snapshot",
"type": "Started",
"additional_data": {
...
}
}
Each notification event consists of an id field, a UUID that identifies the notification; an aggregate_type field, which names the aggregate the notification relates to, following the concept of domain-driven design; a type field that is meant to give more detail about the aggregate type itself; and an optional additional_data field, which consists of a map of string-based key/value pairs with additional information about the event.
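A consumer might model these fields as follows. This is only a sketch: the Notification class and parse_notification function are hypothetical names, and it assumes the event arrives as a plain JSON string shaped like the example above, which may not match how your converter serializes it.

```python
import json
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Notification:
    id: str                      # UUID identifying the notification
    aggregate_type: str          # e.g. "Snapshot"
    type: str                    # more detail about the aggregate type
    additional_data: Dict[str, str] = field(default_factory=dict)  # optional


def parse_notification(payload: str) -> Notification:
    """Parse a JSON notification, treating additional_data as optional."""
    raw = json.loads(payload)
    return Notification(
        id=raw["id"],
        aggregate_type=raw["aggregate_type"],
        type=raw["type"],
        additional_data=raw.get("additional_data") or {},
    )


event = parse_notification(
    '{"id": "c485ccc3-16ff-47cc-b4e8-b56a57c3bad2", '
    '"aggregate_type": "Snapshot", "type": "Started"}'
)
```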
At this time, there are two notification event types supported by Debezium:
- Status of the initial snapshot
- Monitoring of the incremental snapshot
Initial Snapshot Notifications
An initial snapshot is the consistent capture of the existing data when a connector first starts. An initial snapshot event will have an aggregate type with the value of "Initial Snapshot", and the type of event will consist of one of three logical values:
- SKIPPED: Indicates that the initial snapshot was skipped.
- ABORTED: Indicates that the initial snapshot was aborted.
- COMPLETED: Indicates that the initial snapshot has concluded successfully.
The following is an example of a notification about the completion of the initial snapshot:
{
"id": "5563ae14-49f8-4579-9641-c1bbc2d76f99",
"aggregate_type": "Initial Snapshot",
"type": "COMPLETED"
}
Incremental Snapshot Notifications
An incremental snapshot is a capture of the existing data from a configured set of tables while the connector is actively streaming changes. An incremental snapshot event will have an aggregate type with the value of "Incremental Snapshot", and the type will consist of one of several logical values:
- STARTED: Indicates an incremental snapshot has started.
- PAUSED: Indicates an incremental snapshot has been temporarily paused.
- RESUMED: Indicates an incremental snapshot that had been paused has now resumed.
- STOPPED: Indicates an incremental snapshot has stopped.
- IN_PROGRESS: Indicates an incremental snapshot is in-progress.
- TABLE_SCAN_COMPLETED: Indicates an incremental snapshot has concluded for a given table.
- COMPLETED: Indicates that an incremental snapshot has concluded for all tables.
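To illustrate how an external system might use these values, the sketch below folds a stream of notification events into a small status summary. The summarize function and the shape of its result are hypothetical; only the aggregate type and the type values come from the list above, and events are assumed to be already decoded into dictionaries.

```python
from typing import Iterable, Mapping

INCREMENTAL = "Incremental Snapshot"
FINISHED = {"STOPPED", "COMPLETED"}


def summarize(events: Iterable[Mapping[str, str]]) -> dict:
    """Fold incremental snapshot notifications into a status summary."""
    state = None       # last lifecycle state seen
    tables_done = 0    # number of TABLE_SCAN_COMPLETED events seen
    for event in events:
        if event.get("aggregate_type") != INCREMENTAL:
            continue   # ignore initial snapshot and other notifications
        kind = event["type"]
        if kind == "TABLE_SCAN_COMPLETED":
            tables_done += 1
        else:
            state = kind
    active = state is not None and state not in FINISHED
    return {"state": state, "tables_done": tables_done, "active": active}


events = [
    {"aggregate_type": "Incremental Snapshot", "type": "STARTED"},
    {"aggregate_type": "Incremental Snapshot", "type": "IN_PROGRESS"},
    {"aggregate_type": "Incremental Snapshot", "type": "TABLE_SCAN_COMPLETED"},
    {"aggregate_type": "Initial Snapshot", "type": "COMPLETED"},
    {"aggregate_type": "Incremental Snapshot", "type": "PAUSED"},
]
result = summarize(events)
```

Treating a paused snapshot as still active is a design choice of this sketch; a monitoring system could just as reasonably alert on PAUSED.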
Configuring Notifications
Debezium notifications are configured via the connector’s configuration. The following examples show how to configure the out-of-the-box channels: first the Kafka topic channel (sink), then the log-based channel (log).
{
"notification.enabled.channels": "sink",
"notification.sink.topic.name": "debezium_notifications",
...
}
{
"notification.enabled.channels": "log"
}
JDBC Storage Module
Debezium 2.3 introduces a new storage module implementation supporting the persistence of schema history and offset data in a datastore via JDBC. For environments where you may not have easy access to persistent filesystems, this offers yet another alternative for storage via a remote, persistent storage platform.
In order to take advantage of this new module, the following dependency must be added to your project or application:
<dependency>
<groupId>io.debezium</groupId>
<artifactId>debezium-storage-jdbc</artifactId>
<version>2.3.0.Alpha1</version>
</dependency>
The following examples show how to configure Offset or Schema History storage via the JDBC storage module:
{
"offset.storage.jdbc.url": "<jdbc-connection-url>",
"offset.storage.jdbc.user": "dbuser",
"offset.storage.jdbc.password": "secret",
"offset.storage.jdbc.offset_table_name": "debezium_offset_storage"
}
{
"schema.history.internal.jdbc.url": "<jdbc-connection-url>",
"schema.history.internal.jdbc.user": "dbuser",
"schema.history.internal.jdbc.password": "secret",
"schema.history.internal.jdbc.schema.history.table.name": "debezium_database_history"
}
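When both stores live in the same database, the two property sets above can be generated from one set of connection details. This is a minimal sketch under that assumption; nothing in this release requires offsets and schema history to share a database, and the jdbc_storage_config helper is a hypothetical name. The property keys and default table names are copied from the examples above.

```python
def jdbc_storage_config(
    url: str,
    user: str,
    password: str,
    offset_table: str = "debezium_offset_storage",
    history_table: str = "debezium_database_history",
) -> dict:
    """Build offset and schema history storage properties from shared details."""
    offsets = {
        "offset.storage.jdbc.url": url,
        "offset.storage.jdbc.user": user,
        "offset.storage.jdbc.password": password,
        "offset.storage.jdbc.offset_table_name": offset_table,
    }
    history = {
        "schema.history.internal.jdbc.url": url,
        "schema.history.internal.jdbc.user": user,
        "schema.history.internal.jdbc.password": password,
        "schema.history.internal.jdbc.schema.history.table.name": history_table,
    }
    return {**offsets, **history}


# Placeholder values, as in the examples above.
storage = jdbc_storage_config("<jdbc-connection-url>", "dbuser", "secret")
```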
Other fixes
There were quite a number of bug fixes and stability changes in this release. Some noteworthy ones are:
- Toasted varying character array and date array are not correctly processed DBZ-6122
- Introduce LogMiner query filtering modes DBZ-6254
- Lock contention on LOG_MINING_FLUSH table when multiple connectors deployed DBZ-6256
- Ensure that the connector can start from a stale timestamp more than one hour into the past DBZ-6307
- The rs_id field is null in Oracle change event source information block DBZ-6329
- Add JWT authentication to HTTP Client DBZ-6348
- Using pg_replication_slot_advance which is not supported by PostgreSQL10. DBZ-6353
- log.mining.transaction.retention.hours should reference last offset and not sysdate DBZ-6355
- Support multiple tasks when streaming shard list DBZ-6365
- Kinesis Sink - AWS Credentials Provider DBZ-6372
- Toasted hstore are not correctly processed DBZ-6379
- Oracle DDL shrink space for table partition can not be parsed DBZ-6386
- __source_ts_ms r (read) operation date is set to future for SQL Server DBZ-6388
- PostgreSQL connector task fails to resume streaming because replication slot is active DBZ-6396
- MongoDB connector crashes on invalid resume token DBZ-6402
- NPE on read-only MySQL connector start up DBZ-6440
Altogether, 59 issues were fixed for this release. A big thank you to all the contributors from the community who worked on this release: Anisha Mohanty, Bertrand Paquet, Bob Roldan, Breno Moreira, Chris Cranford, Frederic Laurent, Gong Chang Hua, Harvey Yue, Hidetomi Umaki, Jakub Cechacek, Jiri Pechanec, Kanthi Subramanian, Katerina Galieva, Mario Fiore Vitale, Martin Medek, Miguel Angel Sotomayor, Nir Levy, Oren Elias, RJ Nowling, Robert Roldan, Ronak Jain, Sergey Eizner, Stephen Clarkson, Thomas Thornton, and 蔡灿材!
What’s next?
With Debezium 2.3 underway, I do expect a rather quick cycle of alpha, beta, and final releases over the next six weeks. We still have a lot to do in this time period that we hope to get into this release, so stay tuned. As we get closer to the end of June, we’ll begin our planning for Debezium 2.4!
Also, Red Hat Summit 2023 is next week in Boston. There will be a break-out session where Hugo and Chris will be discussing the new Debezium JDBC sink connector. If you’re able to attend, we’d love to have an opportunity to chat with you before or after the session.
Until next time…
About Debezium
Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.
Get involved
We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Zulip, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve our existing connectors and add even more connectors. If you find problems or have ideas for how we can improve Debezium, please let us know or log an issue.