Debezium 3.0.0.Beta1 Released

Breaking changes

The team aims to avoid any potential breaking changes between minor releases; however, such changes are sometimes inevitable.

Debezium Server Kafka Sink: The Debezium Server Kafka sink adapter could wait indefinitely when a Kafka broker becomes unavailable. A new configurable timeout has been added to the sink adapter to force the adapter to fail when the timeout is reached. The new option, debezium.sink.kafka.wait.message.delivery.timeout.ms, has a default value of 30 seconds. Please adjust this accordingly if the default is insufficient for your needs (DBZ-7575).
Debezium Server RabbitMQ Sink: The Debezium Server RabbitMQ sink adapter previously sent all changes to a single stream. While this may be useful for some scenarios, it does not align with other broker systems, where each table is streamed to its own topic or stream. With Debezium 3, this logic has changed and each table is streamed to its own stream by default. To restore the legacy behavior of streaming all changes to one stream, set debezium.sink.rabbitmqstream.stream (DBZ-8118).
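To make the RabbitMQ change concrete, here is a minimal Python sketch of the routing rule just described. It is an illustration only, not the sink adapter's code: the function name target_stream, the destination names, and the stream name all-changes are hypothetical.

```python
from typing import Optional


def target_stream(destination: str, configured_stream: Optional[str] = None) -> str:
    """Pick the stream for a change event.

    destination: the event's per-table destination name (hypothetical values below).
    configured_stream: stands in for debezium.sink.rabbitmqstream.stream.
    """
    # Legacy behavior: an explicitly configured stream receives every change.
    if configured_stream:
        return configured_stream
    # Debezium 3 default: each table gets its own stream.
    return destination


events = ["server1.inventory.customers", "server1.inventory.orders"]
default_routing = [target_stream(d) for d in events]
legacy_routing = [target_stream(d, "all-changes") for d in events]
```

With no stream configured, each event keeps its own destination; with one configured, both events land on the same stream.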

New features and improvements

Debezium 3.0.0.Beta1 also introduces many improvements and features. Let's take a look at each individually.

Detailed metrics per table

Debezium now tracks metrics for the individual create, update, and delete operations performed on each relational table. For some connectors, such as PostgreSQL and Oracle, these detailed metrics also track the truncate operations performed per table. This is useful when you need to detect specific mutation patterns, or when you want to feed an analytics or observability stack with per-table detail that can help identify problems.

For users upgrading to Debezium 3, these new metrics are captured automatically. They are exposed as a Map<String, Long>, where the key is the table name and the value is the number of events observed. The new metric names are NumberOfCreateEventsSeen, NumberOfDeleteEventsSeen, NumberOfUpdateEventsSeen, and NumberOfTruncateEventsSeen (DBZ-8035).
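Because each metric is a map from table name to a count, a monitoring job can compare two readings to see which tables changed the most. The following Python sketch assumes the maps have already been read into plain dictionaries (for example through whatever JMX bridge you use) and that the counts only grow while the connector runs. The table names, the delta helper, and the threshold are hypothetical.

```python
from typing import Dict


def delta(previous: Dict[str, int], current: Dict[str, int]) -> Dict[str, int]:
    """Per-table increase between two readings of one metric map."""
    # A table missing from the earlier reading is treated as having had zero events.
    return {table: count - previous.get(table, 0) for table, count in current.items()}


# Two hypothetical readings of NumberOfDeleteEventsSeen taken some time apart.
earlier = {"inventory.customers": 10, "inventory.orders": 4}
later = {"inventory.customers": 10, "inventory.orders": 250, "inventory.products": 3}

deletes = delta(earlier, later)
# Flag tables whose delete count jumped by more than an arbitrary threshold.
suspicious = sorted(t for t, n in deletes.items() if n > 100)
```

Here only inventory.orders would be flagged, since its delete count rose by 246 between the two readings.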

PostgreSQL replication slot creation timeout

When the PostgreSQL connector is first deployed, one of its first tasks is to create a replication slot in the database if it doesn’t already exist. The replication slot is pivotal to how the connector works and facilitates the capture and dispatch of changes to Debezium. Unfortunately, some database operations, such as in-progress transactions, block the creation of replication slots, forcing the connector to wait indefinitely for the transaction to conclude. For short-lived transactions, this isn’t generally a concern; long-running transactions, however, can stall the connector for as long as they remain open.

To improve this experience, a new internal option was added, internal.create.slot.command.timeout, which defaults to 90 seconds. If the creation of the replication slot does not complete within that time, the connector retries, up to the limit set by slot.max.retries. Once the retries are exhausted, the connector throws an unrecoverable error (DBZ-8073).
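The timeout-and-retry behavior can be pictured with a short Python sketch. This is not the connector's implementation: create_slot_with_retries, SlotCreationFailed, and the fake flaky_create_slot are hypothetical names, max_attempts merely stands in for slot.max.retries with a placeholder value, and the sketch assumes the limit counts total attempts and ignores any delay between them.

```python
from typing import Callable, List


class SlotCreationFailed(Exception):
    """Raised when every attempt has timed out (the unrecoverable error)."""


def create_slot_with_retries(create_slot: Callable[[float], None],
                             timeout_s: float = 90.0,
                             max_attempts: int = 6) -> int:
    """Call create_slot until it succeeds; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        try:
            # create_slot is expected to raise TimeoutError after timeout_s seconds.
            create_slot(timeout_s)
            return attempt
        except TimeoutError:
            continue
    raise SlotCreationFailed(f"replication slot not created after {max_attempts} attempts")


calls: List[float] = []


def flaky_create_slot(timeout_s: float) -> None:
    """Fake slot creation: blocked twice by an open transaction, then succeeds."""
    calls.append(timeout_s)
    if len(calls) < 3:
        raise TimeoutError("blocked by a long-running transaction")


attempts_used = create_slot_with_retries(flaky_create_slot)
```

In this run the third attempt succeeds; a callable that always times out would end in SlotCreationFailed instead of blocking forever.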

Support for PostgreSQL pgvector data types

The pgvector extension introduces vector search functionality for PostgreSQL. There are three data types this extension introduces: vector, halfvec, and sparsevec.

In Debezium 3, all three data types will be streamed like any other data type. Each data type is emitted based on the following semantic mappings:

vector as an ARRAY of numeric values
halfvec as an ARRAY of numeric values
sparsevec as a Struct with number of dimensions and map of index to values

There is no additional configuration required after enabling the pgvector extension in your database. Please see the documentation for more details on the semantic mappings (DBZ-8121).
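To give a feel for the shapes involved, the Python sketch below parses pgvector's text representation of each type into a structure resembling the mappings listed above. It does not show how the connector decodes values, and the key names dimensions and vector in the result are illustrative; check the documentation for the actual field names.

```python
from typing import Dict, List


def parse_dense(text: str) -> List[float]:
    """Parse the text form of a vector or halfvec, e.g. '[1,2,3]'."""
    body = text.strip()[1:-1]
    return [float(x) for x in body.split(",")] if body.strip() else []


def parse_sparse(text: str) -> Dict[str, object]:
    """Parse the text form of a sparsevec, e.g. '{1:1,3:2,5:3}/5'."""
    entries, dimensions = text.strip().rsplit("/", 1)
    values: Dict[int, float] = {}
    for pair in entries.strip()[1:-1].split(","):
        if pair.strip():
            index, value = pair.split(":")
            values[int(index)] = float(value)
    # Number of dimensions plus a map of index to value.
    return {"dimensions": int(dimensions), "vector": values}


dense = parse_dense("[1,2,3]")
sparse = parse_sparse("{1:1,3:2,5:3}/5")
```

The dense value becomes a list of three numbers, while the sparse value keeps only its three non-zero entries together with the total of five dimensions.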

Oracle Ehcache transaction buffer implementation

Debezium 3 introduces a new Oracle connector transaction buffer implementation based on Ehcache, which provides off-heap storage for in-flight transaction and event data. This new implementation adds to the existing Java Heap, Infinispan Embedded, and Infinispan Remote buffer types.

To take advantage of the Ehcache implementation, set log.mining.buffer.type to ehcache. By default, the buffer type is memory, which uses the JVM’s heap for optimal performance.

In order for the Ehcache library to start successfully, several additional options must be provided to explicitly configure the caches maintained by the cache manager. These new configuration options are:

log.mining.buffer.ehcache.global.config
log.mining.buffer.ehcache.transactions.config
log.mining.buffer.ehcache.processedtransactions.config
log.mining.buffer.ehcache.schemachanges.config
log.mining.buffer.ehcache.events.config

Debezium creates the Ehcache configuration using XML, so each of these options takes an XML snippet.

The global configuration is optional and lets you provide details about persistence and other Ehcache attributes. It must not contain <cache> or <default-serializers> tags, which are handled separately. The individual cache options supply the inner XML of a <cache> configuration tag, excluding its <key-type> and <value-type>, which are managed directly by Debezium.

An example configuration

{
  "log.mining.buffer.type": "ehcache",
  "log.mining.buffer.ehcache.global.config": "<persistence directory=\"./data\"/>",
  "log.mining.buffer.ehcache.transactions.config": "<resources><heap unit=\"entries\">256</heap><disk unit=\"B\">10485760</disk></resources>",
  "log.mining.buffer.ehcache.processedtransactions.config": "<resources><heap unit=\"entries\">256</heap><disk unit=\"B\">10485760</disk></resources>",
  "log.mining.buffer.ehcache.schemachanges.config": "<resources><heap unit=\"entries\">256</heap><disk unit=\"B\">10485760</disk></resources>",
  "log.mining.buffer.ehcache.events.config": "<resources><heap unit=\"entries\">256</heap><disk unit=\"B\">10485760</disk></resources>"
}

In this example, Ehcache maintains a combination of heap and disk storage for the caches, keeping at most 256 entries per cache on the heap and using up to 10485760 bytes (10 MiB) of disk per cache. The disk caches are stored at the relative path ./data, which means you will need a persistent storage volume available when using disk-based caches.
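Hand-escaping the XML inside JSON is easy to get wrong, so it can help to generate the configuration. The Python sketch below rebuilds the example above, checks that every snippet is well-formed XML, and lets json.dumps handle the quoting. The helper resources_xml and the CACHES list are hypothetical conveniences, and the well-formedness check assumes each snippet has a single root element, as in this example.

```python
import json
import xml.etree.ElementTree as ET


def resources_xml(heap_entries: int, disk_bytes: int) -> str:
    """Inner <cache> XML: a heap tier sized in entries and a disk tier sized in bytes."""
    return (f'<resources><heap unit="entries">{heap_entries}</heap>'
            f'<disk unit="B">{disk_bytes}</disk></resources>')


CACHES = ["transactions", "processedtransactions", "schemachanges", "events"]

config = {
    "log.mining.buffer.type": "ehcache",
    "log.mining.buffer.ehcache.global.config": '<persistence directory="./data"/>',
}
for cache in CACHES:
    # 256 heap entries and 10 MiB of disk per cache, as in the example above.
    config[f"log.mining.buffer.ehcache.{cache}.config"] = resources_xml(256, 10 * 1024 * 1024)

# Fail early if any snippet is not well-formed XML.
for key, value in config.items():
    if key.endswith(".config"):
        ET.fromstring(value)

# json.dumps escapes the embedded double quotes for the connector JSON.
connector_json = json.dumps(config, indent=2)
```

Printing connector_json yields the same key/value pairs as the hand-written example, with the quotes inside the XML escaped.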

This is a new feature and is experimental, so we would love your feedback on how we can improve this (DBZ-7758).

Transformation to decode PostgreSQL logical messages

PostgreSQL is unique in that you can implement the Outbox pattern without creating an outbox table, by writing logical messages directly into the WAL using pg_logical_emit_message. The unfortunate part is that this data is then sent to Kafka as a series of bytes, which may not always be ideal for consumers who may be looking for structured messages.

Debezium 3 introduces a new PostgreSQL-specific transform called DecodeLogicalDecodingMessageContent. This transform is specifically meant to convert the pg_logical_emit_message event bytes to a structured event payload that consumer applications are capable of understanding.

Given the following configuration:

{
  "transforms": "decode",
  "transforms.decode.type": "io.debezium.connector.postgresql.transforms.DecodeLogicalDecodingMessageContent"
}

Before the transform is applied, the value of an event written using pg_logical_emit_message would be:

{
  "op": "m",
  "ts_ms": 1723115240065,
  "source": {
    ...
  },
  "message": {
    "prefix": "test-prefix",
    "content": "eyJpZCI6IDEsICJpdGVtIjogIkRlYmV6aXVtIGluIEFjdGlvbiIsICJzdGF0dXMiOiAiRU5URVJFRCIsICJxdWFudGl0eSI6IDIsICJ0b3RhbFByaWNlIjogMzkuOTh9"
  }
}

After applying the transformation, the event’s value now looks like:

{
  "op": "c",
  "ts_ms": 1723115415729,
  "source": {
    ...
  },
  "after": {
    "id": 1,
    "item": "Debezium in Action",
    "status": "ENTERED",
    "quantity": 2,
    "totalPrice": 39.98
  }
}

With this transform, you can safely implement the Outbox pattern without a physical outbox table (DBZ-8103).
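To see where the after fields come from, you can decode the content value of the untransformed event by hand. The short Python sketch below does only that: it Base64-decodes the string from the example and parses the result as JSON. It is not the transform itself, and it assumes the logical message carries a JSON document, as it does here.

```python
import base64
import json

# The "content" value from the untransformed event shown above.
content = (
    "eyJpZCI6IDEsICJpdGVtIjogIkRlYmV6aXVtIGluIEFjdGlvbiIsICJzdGF0dXMiOiAiRU5URVJFRCIs"
    "ICJxdWFudGl0eSI6IDIsICJ0b3RhbFByaWNlIjogMzkuOTh9"
)

# Base64 -> raw bytes written by pg_logical_emit_message -> parsed JSON document.
decoded = json.loads(base64.b64decode(content))
```

The decoded dictionary holds the same id, item, status, quantity, and totalPrice values that appear under after in the transformed event.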

Other changes

Altogether, 48 issues were fixed in this release. Here is a list of some additional noteworthy changes:

MySQL has deprecated mysql_native_password usage DBZ-7049
Upgrade to Apicurio 2.5.8 or higher DBZ-7357
Incremental snapshots don’t work with CloudEvent converter DBZ-7601
Snapshot retrying logic falls into infinite retry loop DBZ-7860
Move Debezium Conductor repository under Debezium Organisation DBZ-7973
Log additional details about abandoned transactions DBZ-8044
ConverterBuilder doesn’t pass Headers to be manipulated DBZ-8082
Bump Debezium Server to Quarkus 3.8.5 DBZ-8095
Primary Key Update/ Snapshot Race Condition DBZ-8113
Support DECIMAL(p) Floating Point DBZ-8114
Recalculating mining range upper bounds causes getScnFromTimestamp to fail DBZ-8119
Update Oracle connector doc to describe options for restricting access permissions for the Debezium LogMiner user DBZ-8124
ORA-00600: internal error code, arguments: [krvrdGetUID:2], [18446744073709551614], [], [], [], [], [], [], [], [], [], [] DBZ-8125
Use SQLSTATE to handle exceptions for replication slot creation command timeout DBZ-8127
ibmi Connector does not take custom properties into account anymore DBZ-8129
Unpredicatable ordering of table rows during insertion causing foreign key error DBZ-8130
schema_only crashes ibmi Connector DBZ-8131
Support larger database.server.id values DBZ-8134
Implement in process signal channel DBZ-8135
Re-add check to test for if assembly profile is active DBZ-8138
Validate log position method missing gtid info from SourceInfo DBZ-8140
Add LogMiner start mining session retry attempt counter to logs DBZ-8143
Open redo thread consistency check can lead to ORA-01291 - missing logfile DBZ-8144
SchemaOnlyRecoverySnapshotter not registered as an SPI service implementation DBZ-8147
Reduce logging verbosity of XStream DML event data DBZ-8148
When stopping the Oracle rac node the Debezium server throws an expections - ORA-12514: Cannot connect to database and retries DBZ-8149
Issue with Debezium Snapshot: DateTimeParseException with plugin pgoutput DBZ-8150
JDBC connector validation fails when using record_value with no primary.key.fields DBZ-8151
Vitess Connector Epoch should support parallelism & shard changes DBZ-8154
Add an option for publication.autocreate.mode to create a publication with no tables DBZ-8156
Taking RAC node offline and back online can lead to thread inconsistency DBZ-8162
Upgrade Outbox Extension to Quarkus 3.14.0 DBZ-8164

A huge thank you to all contributors from the community who worked on this release: Ashish Binu, Bue Von Hun, Chris Cranford, Harvey Yue, Jakub Cechacek, Jiri Pechanec, Lars M. Johansson, Mario Fiore Vitale, Ondrej Babec, Rajendra Dangwal, René Kerner, Robert Roldan, Roman Kudryashov, Ryan van Huuksloot, Thomas Thornton, Tudor Plugaru, Vojtech Juranek, and 张展业!

Chris Cranford

Chris is a software engineer at Red Hat. He previously was a member of the Hibernate ORM team and now works on Debezium. He lives in North Carolina just a few hours from Red Hat towers.

About Debezium

Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.

Get involved

We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Zulip, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve our existing connectors and add even more connectors. If you find problems or have ideas for how we can improve Debezium, please let us know or log an issue.