Even as the summer heat continues to rise, the Debezium team has some new, cool news to share. We’re pleased to announce the first beta preview of Debezium 3, 3.0.0.Beta1.

This release includes a host of new features and improvements, including detailed metrics for creates, updates, and deletes per table, replication slot creation timeout, support for PgVector data types with PostgreSQL, a new Oracle embedded buffer implementation based on Ehcache, and others. Let’s take a few moments and dive into these new features and how you can take advantage of them in Debezium 3!

Breaking changes

The team aims to avoid any potential breaking changes between minor releases; however, such changes are sometimes inevitable.

Debezium Server Kafka sink

The Debezium Server Kafka sink adapter could wait indefinitely when a Kafka broker becomes unavailable. A new configurable timeout has been added to the sink adapter to force the adapter to fail when the timeout is reached. The new option, debezium.sink.kafka.wait.message.delivery.timeout.ms, has a default value of 30 seconds. Please adjust this accordingly if the default is insufficient for your needs (DBZ-7575).

Debezium Server RabbitMQ sink

The Debezium Server RabbitMQ sink adapter was sending all changes to the same single stream. While this may be useful for some scenarios, it does not align well with other broker systems, where each table is streamed to its own unique topic or stream. With Debezium 3, this logic has changed and each table is streamed to its own unique stream by default. By setting debezium.sink.rabbitmqstream.stream, you can restore the legacy behavior of streaming all changes to the same stream (DBZ-8118).

New features and improvements

Debezium 3.0.0.Beta1 also introduces many improvements and features. Let’s take a look at each individually.

Detailed metrics per table

Debezium now tracks metrics for the individual create, update, and delete operations performed on each relational table. For some connectors, such as PostgreSQL and Oracle, these detailed metrics also track the truncate operations performed per table. This is useful when you need to detect specific mutation patterns, or when you want to feed analytics or observability stacks with detailed information that helps identify problems.

For users upgrading to Debezium 3, these new metrics are captured automatically. They are exposed using a map-based pattern of Map<String, Long>, where the key is the table name and the value is the number of events observed. The new metric names are NumberOfCreateEventsSeen, NumberOfDeleteEventsSeen, NumberOfUpdateEventsSeen, and NumberOfTruncateEventsSeen (DBZ-8035).
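As a rough illustration of how such a map might be consumed, the sketch below compares two samples of one metric and reports the per-table change. The sample values and the per_table_delta helper are hypothetical, and how you obtain the maps (for example through JMX) depends on your monitoring setup.

```python
def per_table_delta(previous, current):
    """Return how many new events each table has seen between two samples.

    Both arguments mirror the Map<String, Long> shape described above:
    table name -> cumulative event count.
    """
    delta = {}
    for table, count in current.items():
        change = count - previous.get(table, 0)
        # Tables with no change (or a counter that went backwards) are skipped.
        if change > 0:
            delta[table] = change
    return delta


# Hypothetical samples of NumberOfDeleteEventsSeen taken a minute apart.
earlier = {"inventory.orders": 10, "inventory.customers": 4}
later = {"inventory.orders": 250, "inventory.customers": 4, "inventory.products": 3}

deletes_last_minute = per_table_delta(earlier, later)
print(deletes_last_minute)
```

A spike for a single table, like inventory.orders here, is the kind of mutation pattern these per-table counters make visible.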

PostgreSQL replication slot creation timeout

When the PostgreSQL connector is first deployed, one of its first tasks is to create a replication slot in the database if it doesn’t already exist. The replication slot is pivotal to how the connector works and facilitates the capture and dispatch of changes to Debezium. Unfortunately, there are some database operations that will block the creation of replication slots, such as in-progress transactions, forcing the connector to block indefinitely while waiting for the transaction to conclude. For short-lived transactions, this isn’t generally a concern; however, for long-running transactions that’s an entirely different situation.

To improve this experience, a new internal option, internal.create.slot.command.timeout, was added; it defaults to 90 seconds. If the creation of the replication slot does not complete within this timeout, the connector retries up to slot.max.retries times. Once the retries are exhausted, the connector throws an unrecoverable error (DBZ-8073).
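The retry flow can be pictured with a small sketch. This is not the connector's code: the exception classes and the create_slot callable are hypothetical stand-ins, and the sketch assumes one initial attempt followed by up to max_retries retries, which may not match the connector's exact counting.

```python
class SlotCreationTimeout(Exception):
    """Hypothetical: a single slot creation attempt exceeded the timeout."""


class UnrecoverableConnectorError(Exception):
    """Hypothetical: all attempts were used up without success."""


def create_slot_with_retries(create_slot, max_retries):
    """Call create_slot() until it succeeds or the retries are exhausted."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return create_slot()
        except SlotCreationTimeout as error:
            # Assumption: the first attempt does not count as a retry.
            if attempts > max_retries:
                raise UnrecoverableConnectorError(
                    f"slot creation failed after {attempts} attempts"
                ) from error


# Simulate two timeouts (for example, a long-running transaction) and a success.
outcomes = iter([SlotCreationTimeout(), SlotCreationTimeout(), "slot created"])


def fake_create_slot():
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


result = create_slot_with_retries(fake_create_slot, max_retries=6)
print(result)
```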

Support for PostgreSQL PgVector data types

The pgvector extension introduces vector search functionality for PostgreSQL. There are three data types this extension introduces: vector, halfvec, and sparsevec.

In Debezium 3, all three data types will be streamed like any other data type. Each data type is emitted based on the following semantic mappings:

  • vector as an ARRAY of numeric values

  • halfvec as an ARRAY of numeric values

  • sparsevec as a Struct with number of dimensions and map of index to values

There is no additional configuration required after enabling the pgvector extension in your database. Please see the documentation for more details on the semantic mappings (DBZ-8121).
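On the consumer side, a sparsevec value may need to be expanded back into a dense vector. The sketch below shows one way to do that, under some loudly stated assumptions: the field names dimensions and vector are placeholders for whatever the documented Struct uses, and indices are assumed to be 1-based, as in pgvector's text representation.

```python
def sparsevec_to_dense(value):
    """Expand a sparsevec-style struct into a dense list of floats.

    Assumed shape (field names are hypothetical):
      {"dimensions": <int>, "vector": {<index>: <value>, ...}}
    """
    dense = [0.0] * value["dimensions"]
    for index, element in value["vector"].items():
        # JSON map keys arrive as strings; indices are assumed to be 1-based.
        dense[int(index) - 1] = float(element)
    return dense


# Hypothetical event field for a five-dimensional sparse vector.
sample = {"dimensions": 5, "vector": {"1": 1.0, "3": 2.0, "5": 3.0}}

dense = sparsevec_to_dense(sample)
print(dense)
```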

Oracle Ehcache transaction buffer implementation

Debezium 3 introduces a new Oracle connector transaction buffer implementation based on Ehcache, which provides off-heap storage for transaction processing and event data. This new implementation adds to the existing Java Heap, Infinispan Embedded, and Infinispan Remote buffer types.

To begin taking advantage of the Ehcache implementation, set log.mining.buffer.type to ehcache. By default, the buffer type is memory, which uses the JVM’s heap for optimal performance.

In order for the Ehcache library to start successfully, several additional options must be provided to explicitly configure the caches maintained by the cache manager. These new configuration options are:

  • log.mining.buffer.ehcache.global.config

  • log.mining.buffer.ehcache.transactions.config

  • log.mining.buffer.ehcache.processedtransactions.config

  • log.mining.buffer.ehcache.schemachanges.config

  • log.mining.buffer.ehcache.events.config

Debezium creates the Ehcache configuration using XML, so each of these options takes an XML snippet.

The global configuration is optional, and allows you to provide details about persistence and other Ehcache attributes, excluding specifying <cache> or <default-serializers> tags, which are handled separately. The other individual cache configurations are meant to supply the inner XML bits of a <cache> configuration tag, excluding its <key-type> and <value-type>, which are managed directly by Debezium.

An example configuration:
{
  "log.mining.buffer.type": "ehcache",
  "log.mining.buffer.ehcache.global.config": "<persistence directory=\"./data\"/>",
  "log.mining.buffer.ehcache.transactions.config": "<resources><heap unit=\"entries\">256</heap><disk unit=\"B\">10485760</disk></resources>",
  "log.mining.buffer.ehcache.processedtransactions.config": "<resources><heap unit=\"entries\">256</heap><disk unit=\"B\">10485760</disk></resources>",
  "log.mining.buffer.ehcache.schemachanges.config": "<resources><heap unit=\"entries\">256</heap><disk unit=\"B\">10485760</disk></resources>",
  "log.mining.buffer.ehcache.events.config": "<resources><heap unit=\"entries\">256</heap><disk unit=\"B\">10485760</disk></resources>"
}

In this example, Ehcache maintains a combination of heap and off-heap (disk) storage for each cache, keeping at most 256 entries on the heap at any time and flushing the rest to disk, up to 10485760 bytes (10 MiB) per cache. The disk caches are stored at the relative path ./data. This implies that you will need a persistent storage volume available when using disk-based caches.
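Because the XML snippets are embedded in JSON strings, every double quote has to be escaped, which is easy to get wrong by hand. One way around that, sketched here, is to generate the configuration and let a JSON library do the escaping. The resources_xml helper is hypothetical; the option names and values are the ones from the example above.

```python
import json


def resources_xml(heap_entries, disk_bytes):
    """Build the inner <resources> XML for one cache (hypothetical helper)."""
    return (
        f'<resources><heap unit="entries">{heap_entries}</heap>'
        f'<disk unit="B">{disk_bytes}</disk></resources>'
    )


# The four per-cache options listed above.
CACHES = ["transactions", "processedtransactions", "schemachanges", "events"]

config = {
    "log.mining.buffer.type": "ehcache",
    "log.mining.buffer.ehcache.global.config": '<persistence directory="./data"/>',
}
for cache in CACHES:
    # 256 heap entries and 10 MiB of disk per cache, as in the example.
    config[f"log.mining.buffer.ehcache.{cache}.config"] = resources_xml(
        256, 10 * 1024 * 1024
    )

# json.dumps escapes the embedded double quotes.
print(json.dumps(config, indent=2))
```

Generating the values also makes it simple to size each cache differently, for example giving the events cache more disk space than the others.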

This is a new feature and is experimental, so we would love your feedback on how we can improve this (DBZ-7758).

Transformation to decode PostgreSQL logical messages

PostgreSQL is unique in that you can implement the Outbox pattern without creating an outbox table, by writing logical messages directly into the WAL using pg_logical_emit_message. The unfortunate part is that this data is then sent to Kafka as a series of bytes, which may not always be ideal for consumers who may be looking for structured messages.

Debezium 3 introduces a new PostgreSQL-specific transform called DecodeLogicalDecodingMessageContent. This transform is specifically meant to convert the pg_logical_emit_message event bytes to a structured event payload that consumer applications are capable of understanding.

Given the following configuration:

{
  "transforms": "decode",
  "transforms.decode.type": "io.debezium.connector.postgresql.transforms.DecodeLogicalDecodingMessageContent"
}

Before the transform is applied, the value of an event written using pg_logical_emit_message would be:

{
  "op": "m",
  "ts_ms": 1723115240065,
  "source": {
    ...
  },
  "message": {
    "prefix": "test-prefix",
    "content": "eyJpZCI6IDEsICJpdGVtIjogIkRlYmV6aXVtIGluIEFjdGlvbiIsICJzdGF0dXMiOiAiRU5URVJFRCIsICJxdWFudGl0eSI6IDIsICJ0b3RhbFByaWNlIjogMzkuOTh9"
  }
}

After applying the transformation, the event’s value now looks like:

{
  "op": "c",
  "ts_ms": 1723115415729,
  "source": {
    ...
  },
  "after": {
    "id": 1,
    "item": "Debezium in Action",
    "status": "ENTERED",
    "quantity": 2,
    "totalPrice": 39.98
  }
}

With this transform, you can safely implement the Outbox pattern without the physical outbox table (DBZ-8103).
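To see what the transform saves consumers from doing, the content field of the untransformed event shown earlier can be decoded by hand. The value appears to be Base64-encoded JSON, so a minimal sketch using only the Python standard library looks like this:

```python
import base64
import json

# The "content" value copied from the untransformed event shown earlier.
content = (
    "eyJpZCI6IDEsICJpdGVtIjogIkRlYmV6aXVtIGluIEFjdGlvbiIsICJzdGF0dXMiOiAi"
    "RU5URVJFRCIsICJxdWFudGl0eSI6IDIsICJ0b3RhbFByaWNlIjogMzkuOTh9"
)

# Decode the Base64 text to bytes, then parse the bytes as JSON.
decoded = json.loads(base64.b64decode(content))
print(decoded)
```

The resulting dictionary holds the same fields that the transform places in the after block.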

Other changes

Altogether, 48 issues were fixed in this release. Here is a list of some additional noteworthy changes:

  • MySQL has deprecated mysql_native_password usage DBZ-7049

  • Upgrade to Apicurio 2.5.8 or higher DBZ-7357

  • Incremental snapshots don’t work with CloudEvent converter DBZ-7601

  • Snapshot retrying logic falls into infinite retry loop DBZ-7860

  • Move Debezium Conductor repository under Debezium Organisation DBZ-7973

  • Log additional details about abandoned transactions DBZ-8044

  • ConverterBuilder doesn’t pass Headers to be manipulated DBZ-8082

  • Bump Debezium Server to Quarkus 3.8.5 DBZ-8095

  • Primary Key Update/ Snapshot Race Condition DBZ-8113

  • Support DECIMAL(p) Floating Point DBZ-8114

  • Recalculating mining range upper bounds causes getScnFromTimestamp to fail DBZ-8119

  • Update Oracle connector doc to describe options for restricting access permissions for the Debezium LogMiner user DBZ-8124

  • ORA-00600: internal error code, arguments: [krvrdGetUID:2], [18446744073709551614], [], [], [], [], [], [], [], [], [], [] DBZ-8125

  • Use SQLSTATE to handle exceptions for replication slot creation command timeout DBZ-8127

  • ibmi Connector does not take custom properties into account anymore DBZ-8129

  • Unpredictable ordering of table rows during insertion causing foreign key error DBZ-8130

  • schema_only crashes ibmi Connector DBZ-8131

  • Support larger database.server.id values DBZ-8134

  • Implement in process signal channel DBZ-8135

  • Re-add check to test for if assembly profile is active DBZ-8138

  • Validate log position method missing gtid info from SourceInfo DBZ-8140

  • Add LogMiner start mining session retry attempt counter to logs DBZ-8143

  • Open redo thread consistency check can lead to ORA-01291 - missing logfile DBZ-8144

  • SchemaOnlyRecoverySnapshotter not registered as an SPI service implementation DBZ-8147

  • Reduce logging verbosity of XStream DML event data DBZ-8148

  • When stopping the Oracle RAC node the Debezium server throws an exception - ORA-12514: Cannot connect to database and retries DBZ-8149

  • Issue with Debezium Snapshot: DateTimeParseException with plugin pgoutput DBZ-8150

  • JDBC connector validation fails when using record_value with no primary.key.fields DBZ-8151

  • Vitess Connector Epoch should support parallelism & shard changes DBZ-8154

  • Add an option for publication.autocreate.mode to create a publication with no tables DBZ-8156

  • Taking RAC node offline and back online can lead to thread inconsistency DBZ-8162

  • Upgrade Outbox Extension to Quarkus 3.14.0 DBZ-8164

Chris Cranford

Chris is a software engineer at Red Hat. He previously was a member of the Hibernate ORM team and now works on Debezium. He lives in North Carolina just a few hours from Red Hat towers.


About Debezium

Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.

Get involved

We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Zulip, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve our existing connectors and add even more connectors. If you find problems or have ideas for how we can improve Debezium, please let us know or log an issue.