The team is excited to announce the first beta release of the Debezium 2.2 release stream, Debezium 2.2.0.Beta1.

This release includes a plethora of bug fixes, improvements, and a number of new features including, but not limited to, a new JDBC sink connector implementation, MongoDB sharded cluster improvements, Google Spanner PostgreSQL dialect support, and a RabbitMQ sink implementation for Debezium Server to just name a few.

Let’s take moment and dive into what’s new!

JDBC Sink Connector

The Debezium 2.2 release ushers in a new era for Debezium which has had a longstanding focus purely on providing a set of source connectors for relational and non-relational databases. This release alters that landscape, introducing a new JDBC sink connector implementation.

The Debezium JDBC sink connector is quite different from other vendor implementations in that it is capable of ingesting change events emitted by Debezium connectors without the need for event flattening. This has the potential to reduce the processing footprint in your pipeline, simplifies the pipeline’s configuration, and allows Debezium’s JDBC sink connector to take advantage of numerous Debezium source connector features such as column type propagation and much more.

Getting started with the Debezium JDBC sink connector is quite simple, lets take a look at an example.

Let’s say we have a Kafka topic called orders that contains Debezium change events that were created without using the ExtractNewRecordState transformation from MySQL. A simple configuration to ingest these change events into a PostgreSQL database might look this the following:

{
  "name": "mysql-to-postgres-pipeline",
  "config": {
    "connector_class": "io.debezium.connector.jdbc.JdbcSinkConnector",
    "topics": "orders",
    "connection.url": "jdbc://postgresql://<host>:<port>/<database>",
    "connection.user": "<username>",
    "connection.password": "<password>",
    "insert.mode": "upsert",
    "delete.enabled": "true",
    "primary.key.mode": "record_key",
    "schema.evolution": "basic"
  }
}

In this example, we’ve specified a series of connection.* properties that define the connection string and credentials for accessing the destination PostgreSQL database. Additionally, records will use UPSERT semantics when writing to the destination database, choosing to use an insert if the record doesn’t exist or updating the record if it does. We have also enabled schema evolution and specified that a table’s key columns should be derived from the event’s primary key.

The JDBC sink connector presently has support for the following relational databases:

  • Db2

  • MySQL

  • Oracle

  • PostgreSQL

  • SQL Server

We do intend to add additional dialects in the future, and if there one you’d like to see, please get in touch with us either on our mailing list, in chat, or opening a Jira enhancement.

MongoDB Sharded Cluster Improvements

When using the Debezium for MongoDB connector in a sharded cluster deployment, the connector opens a connection with each of the shard’s replica sets directly. This is not a recommended approach and instead MongoDB suggests that the connector open a connection with the mongos instance (the router) instead.

This release aligns with this recommended strategy and users should be prepared to adjust their configurations slightly and when using the connector in such a deployment, point the connector as the mongos instance instead. There should be be other changes required.

Spanner PostgreSQL Dialect Support

Google’s Cloud Spanner platform supports a PostgreSQL interface, which combines the scalability and reliability of the Google Spanner platform with the familiarity and portability of PostgreSQL. When operating Google Spanner with this PostgreSQL interface, metadata of columns and tables is different than when using the standard GoogleSQL dialect.

This release extends the Debezium Spanner connector support not only for the GoogleSQL dialect but also for users that use the Spanner PostgreSQL dialect feature. This means regardless of which dialect your spanner environment relies on, you will be able to capture change events from Spanner using the Debezium Spanner connector seamlessly.

So if you’re using Spanner’s PostgreSQL dialect, upgrade to Debezium 2.2.0.Beta1 or later and start capturing changes!

RabbitMQ Debezium Server Sink

Debezium Server is a ready-made Quarkus-based runtime for Debezium source and sink connectors. Debezium Server provides the capability to send Debezium change events from any source connector to a variety of messaging infrastructure platforms, particularly for users who would prefer something other than Apache Kafka.

In this release, a new sink adapter has been added to the Debezium Server portfolio, allowing Debezium users to send change events to RabbitMQ. The following configuration shows a simple example of how easy it is to configure:

debezium.sink.type=rabbitmq

# Connection details
debezium.sink.rabbitmq.connection.host=<hostname>
debezium.sink.rabbitmq.connection.port=<port>

# The routing key specifies an override of where events are published
debezium.sink.rabbitmq.routingKey=<routing-key>

# The default is 30 seconds, specified in milliseconds
debezium.sink.rabbitmq.ackTimeout=30000

The debezium.sink.rabbitmq.connection.* properties are required while the latter two properties for routingKey and ackTimeout are optional or have preset defaults that should be sufficient for most use cases.

Other fixes

There were quite a number of other improvements, bug fixes, and stability changes in this release, some noteworthy are:

  • Create an endpoint to update a connector DBZ-5314

  • Refactor snapshotting to use change streams instead of oplog DBZ-5987

  • Update the design for Debezium based connectors Filter step DBZ-6060

  • NPE when setting schema.history.internal.store.only.captured.tables.ddl=true DBZ-6072

  • Postgres connector stuck when replication slot does not have confirmed_flush_lsn DBZ-6092

  • java.lang.NullPointerException in MySQL connector with max.queue.size.in.bytes DBZ-6104

  • debezium-connector-mysql failed to parse serveral DDLs of 'CREATE TABLE' DBZ-6124

  • Connect and stream from sharded clusters through mongos instances DBZ-6170

  • Support Azure blob storage as Debezium history storage DBZ-6180

  • Zerofill property failed for different int types DBZ-6185

  • GRANT DELETE HISTORY couldn’t be parsed in mariadb DBZ-6186

  • ddl parse failed for key partition table DBZ-6188

  • Config options internal.schema.history.internal.ddl.filter not working DBZ-6190

  • Support Database role in Connector Config. DBZ-6192

  • Use CHARSET for alterByConvertCharset clause DBZ-6194

  • Remove duplicated createDdlFilter method from historized connector config DBZ-6197

  • Create new SMT to copy/move header to record value DBZ-6201

  • Data loss upon connector restart DBZ-6204

  • ParsingException: DDL statement couldn’t be parsed DBZ-6217

  • The CHARACTER/CHARACTER(p)/CHARACTER VARYING(p) data types not recognized as JDBC type CHAR DBZ-6221

  • MySQL treats the BOOLEAN synonym differently when processed in snapshot vs streaming phases. DBZ-6225

  • MySQL treats REAL synonym differently when processed in snapshot vs streaming phases. DBZ-6226

  • Spanner Connector - Deadlock in BufferedPublisher when publish gives exception DBZ-6227

  • Publish of sync event fails when message becomes very large. DBZ-6228

  • MySQL treats NCHAR/NVARCHAR differently when processed in snapshot vs streaming phases. DBZ-6231

  • Add support for columns of type "bytea[]" - array of bytea (byte array) DBZ-6232

  • MySQL singleDeleteStatement parser does not support table alias DBZ-6243

  • Support ImageFromDockerfile with Debezium’s testcontainers suite DBZ-6244

  • Testcontainers MongoDbReplicaSetTest failing with MongoDB 4.2 DBZ-6247

  • Expose EmbeddedEngine configurations DBZ-6248

  • Wrong error thrown when snapshot.custom_class=custom and no snapshot.custom.class DBZ-6249

  • Missing GEOMETRY keyword which can be used as column name DBZ-6250

  • Postgres connector stuck trying to fallback to restart_lsn when replication slot confirmed_flush_lsn is null. DBZ-6251

  • MariaDB’s UUID column type cannot be parsed when scheme is loaded DBZ-6255

Altogether, 52 issues were fixed for this release. A big thank you to all the contributors from the community who worked on this release: Đỗ Ngọc Sơn, Anatolii Popov, Anisha Mohanty, Bob Roldan, Chris Cranford, Gunnar Morling, Harvey Yue, Hossein Torabi, Jakub Cechacek, Jiri Pechanec, Mario Fiore Vitale, Nir Levy, Plugaru Tudor, Robert Roldan, Russell Mora, Vojtech Juranek, Vojtěch Juránek, and tony joseph!

Outlook & What’s Next?

As we approach the end of the Debezium 2.2 development cycle, with a final release expected in the next two weeks, we’re going to begin to turn our attention toward Debezium 2.3. The Debezium 2.3 release will be a much more condensed and focused release, as our goal is to release it in late June.

We will be refining our roadmap in the coming days, so I would pay close attention to this to get an understanding of what lies ahead in the near future for Debezium 2.3. We would like to hear your feedback or suggestions, so if you have anything you’d like to share be sure to get in touch with us on the mailing list or our chat.

DevNexus 2023 is also underway this week, from April 4th until April 6th and I will be presenting a talk on CDC Patterns with Distributed Systems using Debezium. If you’re in the Atlanta area and plan to attend DevNexus on Thursday, April 6th, drop me a line.

Until next time, let the changes continue to stream…​

Chris Cranford

Chris is a software engineer at Red Hat. He previously was a member of the Hibernate ORM team and now works on Debezium. He lives in North Carolina just a few hours from Red Hat towers.

   


About Debezium

Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.

Get involved

We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Zulip, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve ours existing connectors and add even more connectors. If you find problems or have ideas how we can improve Debezium, please let us know or log an issue.