I am thrilled to share that Debezium 2.0.0.Alpha2 has been released!

This release is packed with tons of bugfixes and improvements, with 110 issues resolved in total. Just, WOW!

A few noteworthy changes include incremental snapshots gaining support for regular expressions and a new stop signal. We also did some housekeeping and removed a number of deprecated configuration options as well as the legacy MongoDB oplog implementation.

Let's take a look at these in closer detail.

Incremental snapshot changes

First, incremental snapshots have been a tremendous success. The feedback we’ve gotten from the community has been overwhelmingly positive about how this process works and how it’s helped streamline capturing changes, particularly for users with very large datasets. So we took the opportunity in this release to build upon that momentum and introduced two new features:

  • The ability to stop an in-progress incremental snapshot

  • Support for regular expressions in signal payloads

Stopping incremental snapshots

Since we first introduced incremental snapshots, users have asked for a way to stop one that is already in progress. To accomplish this, we have added a new signal, stop-snapshot. This signal is sent just like any other, by inserting a row into the signal table/collection, as shown below:

INSERT INTO schema.signal_table (id, type, data)
VALUES ('unique-id', 'stop-snapshot', '<signal payload>');

The stop-snapshot payload looks very similar to its execute-snapshot counterpart. An example:

{
  "data-collections": ["schema1.table1", "schema2.table2"],
  "type": "incremental"
}

This example removes both schema1.table1 and schema2.table2 from the incremental snapshot, so long as the table or collection had not already finished its incremental snapshot. If other tables or collections remain outstanding after the removal of those specified by data-collections, the incremental snapshot will continue to process those that are outstanding. If no other table or collection remains, the incremental snapshot will stop.
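To make the removal semantics concrete, here is a minimal Python sketch that models them. It is not Debezium's implementation (Debezium itself is written in Java). The function name `apply_stop_signal` and the third table, `schema3.table3`, are hypothetical and exist only for illustration.

```python
def apply_stop_signal(outstanding, data_collections):
    """Model a stop-snapshot signal that names specific collections.

    outstanding: collections that have not finished their snapshot yet.
    data_collections: names listed in the signal's data-collections property.
    Returns the collections the incremental snapshot would still process.
    """
    to_remove = set(data_collections)
    # Collections that already finished are not in `outstanding`,
    # so naming them in the signal has no effect.
    return [name for name in outstanding if name not in to_remove]


# Hypothetical state: three collections are still waiting to be snapshotted.
outstanding = ["schema1.table1", "schema2.table2", "schema3.table3"]
remaining = apply_stop_signal(outstanding, ["schema1.table1", "schema2.table2"])

print(remaining)  # ['schema3.table3']
print("snapshot continues" if remaining else "snapshot stops")
```

In this model the snapshot keeps going because `schema3.table3` is still outstanding. Had the signal named all three collections, the returned list would be empty and the snapshot would stop.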

Another example of a stop-snapshot payload is quite simply:

{
  "type": "incremental"
}

This example does not specify the data-collections property, which is optional for the stop-snapshot signal. When this property isn’t specified, the signal implies that the current in-progress incremental snapshot should be stopped entirely. This makes it possible to stop an incremental snapshot without knowing which tables or collections are currently being captured or are still outstanding.
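Both payload variants can be produced by one small helper. The sketch below is only an illustration: `build_stop_signal` is a hypothetical name, generating the id with `uuid` is an assumption (any unique value would do), and an in-memory SQLite table stands in for the real signal table, which lives in your source database under whatever name your connector is configured to use.

```python
import json
import sqlite3
import uuid


def build_stop_signal(data_collections=None):
    """Return an (id, type, data) row for a stop-snapshot signal."""
    payload = {"type": "incremental"}
    # Leaving data-collections out asks for the whole snapshot to be stopped.
    if data_collections is not None:
        payload["data-collections"] = list(data_collections)
    return (str(uuid.uuid4()), "stop-snapshot", json.dumps(payload))


# SQLite stand-in for the signal table (an assumption made for this demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE signal_table (id TEXT PRIMARY KEY, type TEXT, data TEXT)")

insert_sql = "INSERT INTO signal_table (id, type, data) VALUES (?, ?, ?)"
# Stop only two collections.
conn.execute(insert_sql, build_stop_signal(["schema1.table1", "schema2.table2"]))
# Stop the entire in-progress snapshot.
conn.execute(insert_sql, build_stop_signal())
conn.commit()

for signal_type, data in conn.execute("SELECT type, data FROM signal_table"):
    print(signal_type, data)
```

Using bound parameters rather than string concatenation avoids quoting problems when the JSON payload is embedded in the INSERT statement.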

Signals support regular expressions

Incremental snapshot signals have required the use of explicit table/collection names in the data-collections payload attribute. While this worked well, there may be situations where broad capture configurations could take advantage of regular expression usage. We already support regular expressions in connector configuration options, such as include/exclude lists, so it made sense to extend that to incremental snapshots as well.

Starting in Debezium 2.0, all incremental snapshot signals can use regular expressions in the data-collections payload property. Using one of the stop signal examples from above, the payload can be rewritten using regular expressions:

{
  "data-collections": ["schema[1|2].table[1|2]"],
  "type": "incremental"
}

Just like the explicit usage, this signal with regular expressions would also stop both schema1.table1 and schema2.table2.
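A quick way to see which names a pattern selects is to test it against a list of candidates before sending the signal. The sketch below uses Python's `re` module and assumes the pattern must match the whole collection name. Debezium evaluates patterns with Java's regular expression engine, which treats this particular pattern the same way. The candidate names beyond the two from the example are hypothetical.

```python
import re

# The pattern from the payload above.
pattern = re.compile(r"schema[1|2].table[1|2]")

# Hypothetical collection names to check the pattern against.
candidates = [
    "schema1.table1",
    "schema2.table2",
    "schema1.table2",
    "schema3.table1",
    "schema1.table3",
]

# fullmatch requires the entire name to match, not just a prefix.
matched = [name for name in candidates if pattern.fullmatch(name)]
print(matched)  # ['schema1.table1', 'schema2.table2', 'schema1.table2']
```

The pattern is broader than an explicit list: it also selects mixed combinations such as `schema1.table2`. Inside square brackets `|` is a literal character rather than an alternation, and the unescaped `.` matches any character, so a stricter equivalent would be `schema[12]\.table[12]`.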

Removal of MongoDB oplog support

In Debezium 1.8, we introduced the new MongoDB change stream feature while also deprecating the oplog implementation. The transition to change streams offers a variety of benefits, such as being able to stream changes from non-primary nodes, the ability to emit update events with a full document representation for downstream consumers, and so much more. In short, change streams are simply a far superior way to perform change data capture with MongoDB.

The removal of the oplog implementation also means that MongoDB 3.x is no longer supported. If you are using MongoDB 3.x, you will need to upgrade to MongoDB 4.0 or later to use Debezium 2.0.

Configuration option clean-up

Debezium 1.x has seen a lot of evolution over the years. Along the way we added connector-specific options to handle migrations or particular features, and many of these have since been deprecated or even replaced by common options that apply to all connectors. One of the major tasks for Debezium 2.0 is some internal housekeeping: removing the configuration options that have been deprecated.

With that, there is also more configuration housekeeping coming in the future when we look at option namespaces. Suffice it to say, it will be important as part of the upgrade path to review each connector’s documentation against your current connector configurations. You just might find that you can streamline your configurations with fewer options or that some option names have changed entirely.
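One way to approach that review is to check each connector configuration against a list of removed and renamed options that you compile from the documentation. The sketch below is only an illustration of that idea: `review_config` is a hypothetical helper, and every option name starting with `example.` is a placeholder, not a real Debezium option.

```python
def review_config(config, removed, renamed):
    """List the options in `config` that need attention before upgrading."""
    findings = []
    for key in sorted(config):
        if key in removed:
            findings.append(f"{key}: removed, drop it from the configuration")
        elif key in renamed:
            findings.append(f"{key}: renamed to {renamed[key]}")
    return findings


# Placeholder names only; fill these in from the connector documentation.
removed = {"example.removed.option"}
renamed = {"example.old.name": "example.new.name"}

config = {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "example.removed.option": "true",
    "example.old.name": "42",
}

for finding in review_config(config, removed, renamed):
    print(finding)
```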

Other fixes & improvements

There are several bugfixes and stability changes in this release. Some noteworthy ones are:

  • Postgres existing publication is not updated with the new table DBZ-3921

  • MySQL connector increment snapshot failed parse datetime column length when connector set "snapshot.fetch.size": 20000 DBZ-4939

  • DateTimeParseException: Text 'infinity' could not be parsed in Postgres connector DBZ-5014

  • PostgreSQL ENUM default values are missing from generated schema DBZ-5038

  • All connectors now use multi-partitioned codebase DBZ-5042

  • Oracle LogMiner: records missed during switch from snapshot to streaming mode DBZ-5085

  • Introduce a new field "ts_ms" to identify the process time for schema change event DBZ-5098

  • Parsing zero-day fails DBZ-5099

Altogether, an amazing 110 issues were fixed for this release.

What’s Next?

So while this release is a bit behind schedule, Debezium 2.0 is shaping up quite well.

The next major milestones include unifying snapshot modes across connectors, a new Snapshotter API for all connectors, compactable JSON database history, offset unification, an offset storage API, and much more. So the coming weeks have a lot in store as we continue to work on Debezium 2.0. And as usual, you can expect some (hopefully all) of these in approximately three weeks, sticking to our usual release cadence.

Until then, let the data capturing continue!

Chris Cranford

Chris is a software engineer at Red Hat. He previously was a member of the Hibernate ORM team and now works on Debezium. He lives in North Carolina just a few hours from Red Hat towers.

   


About Debezium

Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.

Get involved

We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Zulip, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve our existing connectors and add even more connectors. If you find problems or have ideas for how we can improve Debezium, please let us know or log an issue.