Debezium 2.2.0.Beta1 Released

JDBC Sink Connector

The Debezium 2.2 release ushers in a new era for Debezium which has had a longstanding focus purely on providing a set of source connectors for relational and non-relational databases. This release alters that landscape, introducing a new JDBC sink connector implementation.

The Debezium JDBC sink connector is quite different from other vendor implementations in that it is capable of ingesting change events emitted by Debezium connectors without the need for event flattening. This has the potential to reduce the processing footprint in your pipeline, simplifies the pipeline’s configuration, and allows Debezium’s JDBC sink connector to take advantage of numerous Debezium source connector features such as column type propagation and much more.

Getting started with the Debezium JDBC sink connector is quite simple, lets take a look at an example.

Let’s say we have a Kafka topic called orders that contains Debezium change events that were created without using the ExtractNewRecordState transformation from MySQL. A simple configuration to ingest these change events into a PostgreSQL database might look this the following:

{
  "name": "mysql-to-postgres-pipeline",
  "config": {
    "connector_class": "io.debezium.connector.jdbc.JdbcSinkConnector",
    "topics": "orders",
    "connection.url": "jdbc://postgresql://<host>:<port>/<database>",
    "connection.user": "<username>",
    "connection.password": "<password>",
    "insert.mode": "upsert",
    "delete.enabled": "true",
    "primary.key.mode": "record_key",
    "schema.evolution": "basic"
  }
}

In this example, we’ve specified a series of connection.* properties that define the connection string and credentials for accessing the destination PostgreSQL database. Additionally, records will use UPSERT semantics when writing to the destination database, choosing to use an insert if the record doesn’t exist or updating the record if it does. We have also enabled schema evolution and specified that a table’s key columns should be derived from the event’s primary key.

The JDBC sink connector presently has support for the following relational databases:

Db2
MySQL
Oracle
PostgreSQL
SQL Server

We do intend to add additional dialects in the future, and if there one you’d like to see, please get in touch with us either on our mailing list, in chat, or opening a Jira enhancement.

MongoDB Sharded Cluster Improvements

When using the Debezium for MongoDB connector in a sharded cluster deployment, the connector opens a connection with each of the shard’s replica sets directly. This is not a recommended approach and instead MongoDB suggests that the connector open a connection with the mongos instance (the router) instead.

This release aligns with this recommended strategy and users should be prepared to adjust their configurations slightly and when using the connector in such a deployment, point the connector as the mongos instance instead. There should be be other changes required.

Spanner PostgreSQL Dialect Support

Google’s Cloud Spanner platform supports a PostgreSQL interface, which combines the scalability and reliability of the Google Spanner platform with the familiarity and portability of PostgreSQL. When operating Google Spanner with this PostgreSQL interface, metadata of columns and tables is different than when using the standard GoogleSQL dialect.

This release extends the Debezium Spanner connector support not only for the GoogleSQL dialect but also for users that use the Spanner PostgreSQL dialect feature. This means regardless of which dialect your spanner environment relies on, you will be able to capture change events from Spanner using the Debezium Spanner connector seamlessly.

So if you’re using Spanner’s PostgreSQL dialect, upgrade to Debezium 2.2.0.Beta1 or later and start capturing changes!

RabbitMQ Debezium Server Sink

Debezium Server is a ready-made Quarkus-based runtime for Debezium source and sink connectors. Debezium Server provides the capability to send Debezium change events from any source connector to a variety of messaging infrastructure platforms, particularly for users who would prefer something other than Apache Kafka.

In this release, a new sink adapter has been added to the Debezium Server portfolio, allowing Debezium users to send change events to RabbitMQ. The following configuration shows a simple example of how easy it is to configure:

debezium.sink.type=rabbitmq

# Connection details
debezium.sink.rabbitmq.connection.host=<hostname>
debezium.sink.rabbitmq.connection.port=<port>

# The routing key specifies an override of where events are published
debezium.sink.rabbitmq.routingKey=<routing-key>

# The default is 30 seconds, specified in milliseconds
debezium.sink.rabbitmq.ackTimeout=30000

The debezium.sink.rabbitmq.connection.* properties are required while the latter two properties for routingKey and ackTimeout are optional or have preset defaults that should be sufficient for most use cases.

Other fixes

There were quite a number of other improvements, bug fixes, and stability changes in this release, some noteworthy are:

Create an endpoint to update a connector DBZ-5314
Refactor snapshotting to use change streams instead of oplog DBZ-5987
Update the design for Debezium based connectors Filter step DBZ-6060
NPE when setting schema.history.internal.store.only.captured.tables.ddl=true DBZ-6072
Postgres connector stuck when replication slot does not have confirmed_flush_lsn DBZ-6092
java.lang.NullPointerException in MySQL connector with max.queue.size.in.bytes DBZ-6104
debezium-connector-mysql failed to parse serveral DDLs of 'CREATE TABLE' DBZ-6124
Connect and stream from sharded clusters through mongos instances DBZ-6170
Support Azure blob storage as Debezium history storage DBZ-6180
Zerofill property failed for different int types DBZ-6185
GRANT DELETE HISTORY couldn’t be parsed in mariadb DBZ-6186
ddl parse failed for key partition table DBZ-6188
Config options internal.schema.history.internal.ddl.filter not working DBZ-6190
Support Database role in Connector Config. DBZ-6192
Use CHARSET for alterByConvertCharset clause DBZ-6194
Remove duplicated createDdlFilter method from historized connector config DBZ-6197
Create new SMT to copy/move header to record value DBZ-6201
Data loss upon connector restart DBZ-6204
ParsingException: DDL statement couldn’t be parsed DBZ-6217
The CHARACTER/CHARACTER(p)/CHARACTER VARYING(p) data types not recognized as JDBC type CHAR DBZ-6221
MySQL treats the BOOLEAN synonym differently when processed in snapshot vs streaming phases. DBZ-6225
MySQL treats REAL synonym differently when processed in snapshot vs streaming phases. DBZ-6226
Spanner Connector - Deadlock in BufferedPublisher when publish gives exception DBZ-6227
Publish of sync event fails when message becomes very large. DBZ-6228
MySQL treats NCHAR/NVARCHAR differently when processed in snapshot vs streaming phases. DBZ-6231
Add support for columns of type "bytea[]" - array of bytea (byte array) DBZ-6232
MySQL singleDeleteStatement parser does not support table alias DBZ-6243
Support ImageFromDockerfile with Debezium’s testcontainers suite DBZ-6244
Testcontainers MongoDbReplicaSetTest failing with MongoDB 4.2 DBZ-6247
Expose EmbeddedEngine configurations DBZ-6248
Wrong error thrown when snapshot.custom_class=custom and no snapshot.custom.class DBZ-6249
Missing GEOMETRY keyword which can be used as column name DBZ-6250
Postgres connector stuck trying to fallback to restart_lsn when replication slot confirmed_flush_lsn is null. DBZ-6251
MariaDB’s UUID column type cannot be parsed when scheme is loaded DBZ-6255

Altogether, 52 issues were fixed for this release. A big thank you to all the contributors from the community who worked on this release: Đỗ Ngọc Sơn, Anatolii Popov, Anisha Mohanty, Bob Roldan, Chris Cranford, Gunnar Morling, Harvey Yue, Hossein Torabi, Jakub Cechacek, Jiri Pechanec, Mario Fiore Vitale, Nir Levy, Plugaru Tudor, Robert Roldan, Russell Mora, Vojtech Juranek, Vojtěch Juránek, and tony joseph!

Outlook & What’s Next?

As we approach the end of the Debezium 2.2 development cycle, with a final release expected in the next two weeks, we’re going to begin to turn our attention toward Debezium 2.3. The Debezium 2.3 release will be a much more condensed and focused release, as our goal is to release it in late June.

We will be refining our roadmap in the coming days, so I would pay close attention to this to get an understanding of what lies ahead in the near future for Debezium 2.3. We would like to hear your feedback or suggestions, so if you have anything you’d like to share be sure to get in touch with us on the mailing list or our chat.

DevNexus 2023 is also underway this week, from April 4th until April 6th and I will be presenting a talk on CDC Patterns with Distributed Systems using Debezium. If you’re in the Atlanta area and plan to attend DevNexus on Thursday, April 6th, drop me a line.

Until next time, let the changes continue to stream…

Chris Cranford

Chris is a software engineer at Red Hat. He previously was a member of the Hibernate ORM team and now works on Debezium. He lives in North Carolina just a few hours from Red Hat towers.

About Debezium

Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.

Get involved

We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Zulip, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve ours existing connectors and add even more connectors. If you find problems or have ideas how we can improve Debezium, please let us know or log an issue.