I’m very happy to announce the release of Debezium 0.10.0.Alpha1!
The major theme for Debezium 0.10 will be to do some clean-up (that’s what you do at this time of the year, right?); we’ve planned to remove a few deprecated features and to streamline some details in the structure of the CDC events produced by the different Debezium connectors.
This means that upgrading to Debezium 0.10 from earlier versions might take a bit more planning and consideration compared to earlier upgrades, depending on your usage of features and options already marked as deprecated in 0.9 and before. But no worries, we’re describing all changes in great detail in this blog post and the release notes.
Why?
First of all, let’s discuss a bit why we’re doing these changes.
Over the last three years, Debezium has grown from supporting just a single database into an entire family of CDC connectors for a range of different relational databases and MongoDB, as well as accompanying components such as message transformations for topic routing or implementing the outbox pattern.
As in any mature project, over time we figured that a few things should be done differently in the code base than we had thought at first. For instance, we moved from a hand-written parser for processing MySQL DDL statements to a much more robust implementation based on Antlr. We also realized that the way certain temporal column types were exported was at risk of value overflow in certain conditions, so we added a new mode not prone to these issues. As a last example, we made options like the batch size used during snapshotting consistent across the different connectors.
Luckily, Debezium quickly gained traction and despite the 0.x version number, it is used heavily in production at a large number of organizations, and users rely on its stability. So whenever we did such changes, we aimed at making the upgrade experience as smooth as possible; usually that means that the previous behavior is still available but is marked as deprecated in the documentation, while a new improved option, implementation etc. is added and made the default behavior.
At the same time we realized that there are a couple of differences between the connectors which shouldn’t really be there. Specifically, the source block of change events has some differences which make a uniform handling by consumers more complex than it should be; for instance the timestamp field is named "ts_sec" in MySQL events but "ts_usec" for Postgres.
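To illustrate the kind of special-casing this difference forces on consumers, here is a minimal sketch of a helper that reads the timestamp from either field. It assumes, going by the field names only, that "ts_sec" holds seconds and "ts_usec" holds microseconds since the epoch; the function name and the choice of milliseconds as the common unit are made up for this example.

```python
from typing import Optional


def source_timestamp_ms(source: dict) -> Optional[int]:
    """Return the source timestamp in milliseconds, whichever field is present.

    Assumption: "ts_sec" is in seconds and "ts_usec" is in microseconds.
    """
    if source.get("ts_usec") is not None:
        # Postgres-style field: microseconds -> milliseconds
        return source["ts_usec"] // 1000
    if source.get("ts_sec") is not None:
        # MySQL-style field: seconds -> milliseconds
        return source["ts_sec"] * 1000
    return None
```

A consumer would call this with the source part of a deserialized change event, regardless of which connector produced it.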
With all this in mind, we decided that it is about time to clean up these issues. This is done for a couple of purposes:
- Keeping the code base maintainable and open for future development by removing legacy code such as deprecated options and their handling as well as the legacy MySQL DDL parser
- Making CDC events from different connectors easier to consume by unifying the source block created by the different connectors as far as possible
- Preparing the project to go to version 1.0 with an even stronger promise of retaining backwards compatibility than already practiced today
What?
Now that we have discussed why we feel it’s time for some "clean-up", let’s take a closer look at the most relevant changes. Please also refer to the "breaking changes" section of the migration notes for more details.
- The legacy DDL parser for MySQL has been removed (DBZ-736); if you are not using the Antlr-based one yet (it was introduced in 0.8 and became the default in 0.9), it’s highly recommended that you test it with your databases. Should you run into any parsing errors, please report them so we can fix them for the 0.10 Final release.
- The SMTs for retrieving the new record/document state from change events have been renamed from io.debezium.transforms.UnwrapFromEnvelope and io.debezium.connector.mongodb.transforms.UnwrapFromMongoDbEnvelope into ExtractNewRecordState and ExtractNewDocumentState, respectively (DBZ-677). The old names can still be used as of 0.10, but doing so will raise a warning. They are planned for removal in Debezium 0.11.
- Several connector options that were deprecated in earlier Debezium versions have been removed (DBZ-1234): the drop.deletes option of new record/document state extraction SMTs (superseded by delete.handling.mode option), the rows.fetch.size option (superseded by snapshot.fetch.size), the adaptive value of time.precision.mode option for MySQL (prone to value loss, use adaptive_microseconds instead) and the snapshot.minimal.locks for the MySQL connector (superseded by snapshot.locking.mode)
- Several option names of the (incubating) SMT for the outbox pattern have been renamed for the sake of consistency (DBZ-1289)
- Several fields within the source block of CDC events have been renamed for the sake of consistency (DBZ-596); as this is technically a backwards-incompatible change when using Avro and the schema registry, we’ve added a connector option source.struct.version which, when set to the value v1, will have connectors produce the previous source structure. v2 is the default and any consumers should be adjusted to work with the new source structure as soon as possible.
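When planning an upgrade, it can help to scan existing connector configurations for the options and SMT names listed above. The following is a rough sketch of such a check, not a tool shipped with Debezium. The function and constant names are hypothetical, the new SMT class names are assumed to live in the same packages as the old ones, and SMT options are assumed to follow the usual Kafka Connect "transforms.<alias>.<option>" key layout. The script only flags entries; it does not try to translate option values.

```python
from typing import Dict, List

# Removed options mentioned in this post, mapped to their replacements.
REMOVED_OPTIONS = {
    "drop.deletes": "delete.handling.mode",
    "rows.fetch.size": "snapshot.fetch.size",
    "snapshot.minimal.locks": "snapshot.locking.mode",
}

# Old SMT class names mapped to the new ones
# (assumption: the package names stay the same).
RENAMED_SMTS = {
    "io.debezium.transforms.UnwrapFromEnvelope":
        "io.debezium.transforms.ExtractNewRecordState",
    "io.debezium.connector.mongodb.transforms.UnwrapFromMongoDbEnvelope":
        "io.debezium.connector.mongodb.transforms.ExtractNewDocumentState",
}


def find_upgrade_issues(config: Dict[str, str]) -> List[str]:
    """Return human-readable warnings for settings affected by 0.10."""
    issues = []
    for key, value in config.items():
        # Match both plain keys and SMT keys like "transforms.unwrap.drop.deletes".
        for old, new in REMOVED_OPTIONS.items():
            if key == old or key.endswith("." + old):
                issues.append(f"{key}: removed, use {new} instead")
        if key == "time.precision.mode" and value == "adaptive":
            issues.append(f"{key}: 'adaptive' removed, use adaptive_microseconds")
        if value in RENAMED_SMTS:
            issues.append(f"{key}: {value} is deprecated, use {RENAMED_SMTS[value]}")
    return issues


if __name__ == "__main__":
    example = {
        "time.precision.mode": "adaptive",
        "rows.fetch.size": "2000",
        "transforms.unwrap.type": "io.debezium.transforms.UnwrapFromEnvelope",
        "transforms.unwrap.drop.deletes": "false",
    }
    for issue in find_upgrade_issues(example):
        print(issue)
```

Running such a check against the configurations of all registered connectors before the upgrade gives a quick list of settings to revisit.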
New Features and Bugfixes
Besides these changes, the 0.10.0.Alpha1 release also contains some feature additions and bug fixes:
- The SQL Server connector supports custom SELECT statements for snapshotting (DBZ-1224)
- Database, schema and table/collection names have been added consistently to the source block for CDC events from all connectors (DBZ-875)
- Client authentication works for the MySQL connector (DBZ-1228)
- The embedded engine doesn’t duplicate events after restarts any longer (DBZ-1276)
- A parser bug related to CREATE INDEX statements was fixed (DBZ-1264)
Overall, 30 issues were addressed in this release. Many thanks to Arkoprabho Chakraborti, Ram Satish and Yuchao Wang for their contributions to this release!
Speaking of contributors, we did some housekeeping on the list of everyone who has ever contributed to Debezium, too. No fewer than 111 individuals have contributed code up to this point, which is just phenomenal! Thank you so much everyone, you folks rock!
Outlook
Going forward, there are some more details we’d like to unify across the different connectors before going to Debezium 0.10 Final. For instance the source attribute snapshot will be changed so it can take one of three states: true, false or last (indicating that this event is the last one created during initial snapshotting).
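On the consumer side, such a three-state marker could be handled along the following lines. This is only a sketch of the planned change as described here: how the value will actually be encoded in events is not settled in this post, so the code accepts both booleans and strings, and treating a missing attribute as "not a snapshot event" is an assumption of this example. All names are hypothetical.

```python
from enum import Enum


class SnapshotState(Enum):
    TRUE = "true"    # event was created during the initial snapshot
    FALSE = "false"  # regular streaming event
    LAST = "last"    # last event created during the initial snapshot


def snapshot_state(source: dict) -> SnapshotState:
    """Map the snapshot attribute of a source block to one of the three states."""
    raw = source.get("snapshot")
    if raw is None:
        # Assumption: a missing attribute means the event is not from a snapshot.
        return SnapshotState.FALSE
    # Accept booleans (True/False) as well as strings ("true", "false", "last").
    return SnapshotState(str(raw).lower())
```

A consumer could, for example, use the LAST state to trigger work that should only start once the initial snapshot has been fully processed.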
We’ll also continue our efforts to migrate the existing Postgres connector to the framework classes established for the SQL Server and Oracle connectors. Another thing we’re actively exploring is how the Postgres connector could take advantage of the "logical replication" feature added in Postgres 10. This may provide us with a way to ingest change events without requiring a custom server-side logical decoding plug-in, which proves challenging in cloud environments where there’s typically just a limited set of logical decoding options available.
Gunnar Morling
Gunnar is a software engineer at Decodable and an open-source enthusiast by heart. He has been the project lead of Debezium over many years. Gunnar has created open-source projects like kcctl, JfrUnit, and MapStruct, and is the spec lead for Bean Validation 2.0 (JSR 380). He’s based in Hamburg, Germany.
About Debezium
Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.
Get involved
We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Zulip, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve our existing connectors and add even more connectors. If you find problems or have ideas how we can improve Debezium, please let us know or log an issue.