Debezium 0.7.5 Is Released

It’s my pleasure to announce the release of Debezium 0.7.5!

This is a bugfix release for the 0.7 release line, which we decided to do while working towards Debezium 0.8. Most notably, it fixes an unfortunate bug introduced in 0.7.3 (DBZ-663), where the internal database history topic of the Debezium MySQL connector could be partly deleted under some specific conditions. Please see the dedicated blog post on this issue to find out whether you are affected and what you should do to prevent the problem.

Together with this, we released a couple of other fixes and improvements. Thanks to Maciej Brynski, the performance of the logical table routing SMT has been improved significantly (DBZ-655). Maciej also contributed the fix for DBZ-646, which lets the MySQL connector handle CREATE TABLE statements for the TokuDB storage engine.

We also got some more bugfixes from our fantastic community: long-term community member Peter Goransson fixed an issue with the snapshot JMX metrics of the MySQL connector, which are now also accessible after the snapshot has completed (DBZ-640). Andrew Tongen spotted and fixed an issue in the Debezium embedded engine (DBZ-665), which caused offsets to be committed more often than needed. And Matthias Wessendorf upgraded the Debezium dependencies and Docker images to Apache Kafka 1.0.1 (DBZ-647).

Thank you all for your help!

Please refer to the change log for the complete list of changes in Debezium 0.7.5.

What’s next?

Please see the previous release announcement for the next planned features. Due to the unplanned 0.7.5 release, though, the next release will likely take a little longer than originally scheduled.

About Debezium

Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.

Get involved

We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Gitter, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve our existing connectors and add even more connectors. If you find problems or have ideas for how we can improve Debezium, please let us know or log an issue.


Debezium 0.7.4 Is Released

It’s my pleasure to announce the release of Debezium 0.7.4!

Continuing the 0.7 release line, this new version brings several bug fixes and a handful of new features. We recommend this upgrade to all users. When upgrading from earlier versions, please check out the release notes of all versions between the one you’re currently on and 0.7.4 in order to learn about any steps potentially required for upgrading.

New features

In terms of new features, there’s a new mode for handling decimal columns in Postgres and MySQL (DBZ-611). By setting the decimal.handling.mode connector option to string, Debezium will emit decimal and numeric columns as strings. This is often easier for consumers to handle than the byte-array-based representation used by default, while still keeping the full precision. As a bonus, the string mode can also convey the special numeric values NaN and Infinity, as supported by Postgres. Note that this functionality required an update to Debezium’s logical decoding plug-in, which runs within the Postgres database server. This plug-in must be upgraded to the new version before upgrading the Debezium Postgres connector.
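On the consumer side, the string mode boils down to parsing each value. Here’s a minimal Java sketch, assuming the special values arrive as the literal strings "NaN", "Infinity" and "-Infinity" (an assumption; please check the connector documentation). The class name DecimalStringValues is hypothetical:

```java
import java.math.BigDecimal;

/**
 * Hypothetical consumer-side helper for decimal/numeric columns emitted with
 * decimal.handling.mode=string. The spelling of the special values is assumed.
 */
final class DecimalStringValues {

    private DecimalStringValues() {
    }

    /** Returns a BigDecimal for regular values, or a Double for the special values. */
    static Number parse(String value) {
        switch (value) {
            case "NaN":
                return Double.NaN;
            case "Infinity":
                return Double.POSITIVE_INFINITY;
            case "-Infinity":
                return Double.NEGATIVE_INFINITY;
            default:
                // BigDecimal keeps every digit and the scale of the string form
                return new BigDecimal(value);
        }
    }
}
```

Parsing regular values with BigDecimal preserves the full precision that the string mode transports. Only the special values, which BigDecimal cannot represent, fall back to Double.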

Speaking of byte arrays, the BYTEA column type in Postgres is now also supported (DBZ-605).

For the MySQL connector, there’s a new option for the snapshotting routine: snapshot.locking.mode (DBZ-602). Setting it to NONE lets the connector skip any table locks during snapshotting. This should be used if and only if you’re absolutely sure that the tables don’t undergo structural changes (columns added, removed etc.) while the snapshot is taken. But if that’s guaranteed, the new mode can be a useful tool for increasing overall system performance, as writes by concurrent processes won’t be blocked. That’s especially useful in environments such as Amazon RDS, where the connector would otherwise have to hold a lock for the entire duration of the snapshot. The new option supersedes the existing snapshot.minimal.locks option. Please see the connector documentation for the details. This feature was contributed by our community member Stephen Powis; many thanks!
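As a rough illustration, the setting could be expressed as java.util.Properties, e.g. for a properties file or the embedded engine. This is a sketch: the value is written in lower case on the assumption that it is matched case-insensitively, all connection settings are left out, and the class name is hypothetical:

```java
import java.util.Properties;

/** Hypothetical helper: MySQL connector settings for a snapshot without table locks. */
final class LockFreeSnapshotConfig {

    private LockFreeSnapshotConfig() {
    }

    static Properties build() {
        Properties props = new Properties();
        props.setProperty("connector.class", "io.debezium.connector.mysql.MySqlConnector");
        // Skip table locks during the snapshot. Only safe if no structural
        // changes (columns added, removed etc.) happen while it is taken.
        props.setProperty("snapshot.locking.mode", "none");
        return props;
    }
}
```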

Bug Fixes

0.7.4 brings multiple fixes related to how numeric columns are handled. For example, columns without a scale couldn’t be processed correctly by the MySQL connector during binlog reading (DBZ-615). That’s fixed now. And when using the Postgres connector, arbitrary precision column values are now correctly converted into change data message fields (DBZ-351).

We also noticed a regression introduced in Debezium 0.6: the field schema for NUMERIC columns was always marked as optional, even if the column was actually declared as NOT NULL. The same issue affected the geo-spatial array types on Postgres, which are supported as of Debezium 0.7. This has been fixed with DBZ-635. We don’t expect this change to have any impact on consumers: just as before, they’ll always get a value for such fields; only the field schema won’t be incorrectly marked as optional any more.

Please see the full change log for more details and the complete list of issues fixed in Debezium 0.7.4.

What’s next?

Following our three-week release cadence, the next Debezium release is planned for March 28th. We have some exciting changes in the works for it: if things go as planned, we’ll release the first version of our Oracle connector (DBZ-20). In this first iteration it will be based on the Oracle XStream API and won’t support snapshots yet. But we felt it’d make sense to roll out this connector incrementally, so as to get the new feature out early and collect feedback on it. We also plan to explore alternatives to the XStream API in future releases.

Another great new feature will be Reactive Streams support (DBZ-566). Built on top of the existing embedded mode, this will make it very easy to consume change data events using Reactive Streams implementations such as RxJava 2, the Java 9 Flow API and many more. It’ll also be very useful for consuming change events in reactive frameworks such as Vert.x. We’re really looking forward to shipping this feature and already have a pending pull request for it. If you like, take a look and let us know what you think!

Please also check out our roadmap for the coming months of Debezium’s development. This is our current plan for the things we’ll work on, but it’s not cast in stone, so please tell us about your feature requests by sending a message to our Google group. We’re looking forward to your feedback!


Debezium 0.7.3 Is Released

I’m very happy to announce the release of Debezium 0.7.3!

This is primarily a bugfix release, but we’ve also added a handful of smaller new features. It’s a recommended upgrade for all users. When upgrading from earlier versions, please check out the release notes of all versions between the one you’re currently on and 0.7.3 in order to learn about any steps potentially required for upgrading.

Let’s take a closer look at some of the new features.

All Connectors

Using the new connector option tombstones.on.delete, you can now control whether a tombstone event should be emitted when a record is deleted (DBZ-582). Emitting tombstones is usually the right thing to do and thus remains the default behaviour. But disabling them may be desirable in certain situations, and this gets a bit easier now with that option (previously you’d have had to use a single message transform (SMT), which isn’t supported in Debezium’s embedded mode, for instance). This feature was contributed by our community member Raf Liwoch. Thanks!
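Here’s a minimal sketch of disabling tombstones, e.g. for a connector run in embedded mode, expressed as java.util.Properties. The class name and the choice of the MySQL connector are illustrative assumptions, and connection settings are omitted:

```java
import java.util.Properties;

/** Hypothetical helper: connector settings that suppress tombstone events. */
final class NoTombstonesConfig {

    private NoTombstonesConfig() {
    }

    static Properties build() {
        Properties props = new Properties();
        props.setProperty("connector.class", "io.debezium.connector.mysql.MySqlConnector");
        // Emit only the delete event; no tombstone (null-value record) follows it
        props.setProperty("tombstones.on.delete", "false");
        return props;
    }
}
```

Keep in mind that Kafka log compaction relies on tombstones to eventually drop all records for a deleted key. For compacted topics, the default is therefore usually the better choice.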

We’ve also spent some time on a few operational aspects: the sourceInfo element of Debezium’s change data messages contains a new field representing the version of the connector that created the message (DBZ-593). This lets message consumers take specific action based on the version. For instance, this can be helpful when a new Debezium release fixes a bug that consumers have been working around so far. After the update to that new Debezium version, the workaround should no longer be applied, and the version field allows consumers to decide whether to apply it or not.
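Such a consumer-side check might be sketched in Java as follows. It assumes the version string starts with numeric major.minor.micro components, optionally followed by a qualifier (e.g. "0.7.3.Final"). The class name, the FIXED_IN constant and how the version is read from the message are all hypothetical:

```java
/** Hypothetical helper deciding whether a consumer-side workaround still applies. */
final class VersionGate {

    // Hypothetical version in which the worked-around bug was fixed
    private static final int[] FIXED_IN = {0, 7, 3};

    private VersionGate() {
    }

    /** Parses the leading major.minor.micro numbers; qualifiers like "Final" are ignored. */
    static int[] parse(String version) {
        String[] parts = version.split("\\.");
        int[] result = new int[3];
        for (int i = 0; i < 3 && i < parts.length; i++) {
            // Keep only the leading digits, e.g. "0-SNAPSHOT" -> "0"
            String digits = parts[i].replaceAll("^(\\d*).*$", "$1");
            result[i] = digits.isEmpty() ? 0 : Integer.parseInt(digits);
        }
        return result;
    }

    /** True if the message was produced by a connector older than FIXED_IN. */
    static boolean needsWorkaround(String connectorVersion) {
        int[] v = parse(connectorVersion);
        for (int i = 0; i < 3; i++) {
            if (v[i] != FIXED_IN[i]) {
                return v[i] < FIXED_IN[i];
            }
        }
        return false; // exactly the fixing version
    }
}
```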

The names of all the threads managed by Debezium are now structured in the form of "debezium-<connector>-…" (DBZ-587). This helps with identifying Debezium’s threads, for instance when analyzing thread dumps.
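Within the JVM, the naming scheme also makes it easy to find those threads programmatically, e.g. in a diagnostics endpoint. Here’s a minimal sketch using the standard Thread.getAllStackTraces(). The class name is hypothetical, and only the "debezium-" prefix is taken from the scheme above:

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical diagnostics helper listing Debezium-managed threads by name. */
final class DebeziumThreads {

    private DebeziumThreads() {
    }

    /** Returns the sorted names of all live threads whose name starts with "debezium-". */
    static List<String> names() {
        return Thread.getAllStackTraces().keySet().stream()
                .map(Thread::getName)
                .filter(name -> name.startsWith("debezium-"))
                .sorted()
                .collect(Collectors.toList());
    }
}
```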

Postgres Connector

Here we’ve focused on improving the support for array types: besides fixing a bug related to numeric arrays (DBZ-577), we’ve also completed the support for the PostGIS types (which was introduced in 0.7.2), allowing you to capture array columns of types GEOMETRY and GEOGRAPHY.

Snapshots are now correctly interruptible (DBZ-586), and the connector now correctly handles the case where, after a restart, it should continue from a WAL position that is no longer available: it’ll stop, requiring you to take a new snapshot (DBZ-590).

MySQL Connector

The MySQL connector can now create the DB history topic automatically, if needed (DBZ-278). This means you neither have to create that topic yourself nor rely on Kafka’s automatic topic creation for it any longer (any change data topics will automatically be created by Kafka Connect).

The connector can also optionally emit messages to a dedicated heartbeat topic at a configurable interval (DBZ-220). This comes in handy in situations where you only want to capture tables with low traffic, while other tables in the database are changed more frequently. In that case, no messages would be emitted to Kafka Connect for a long time, and thus no offset would be committed either. This could cause trouble when restarting the connector: it would try to resume from the last committed offset, which may no longer be available in the binlogs. But as the captured tables didn’t change, it actually wouldn’t be necessary to resume from such an old binlog position. All this is avoided by regularly emitting messages to the heartbeat topic, which causes the last offset the connector has seen to be committed.
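Enabling heartbeats could look like the following sketch, expressed as java.util.Properties. We assume the interval option is named heartbeat.interval.ms (please verify against the connector documentation). The class name is hypothetical, and connection settings are omitted:

```java
import java.time.Duration;
import java.util.Properties;

/** Hypothetical helper: MySQL connector settings with heartbeats enabled. */
final class HeartbeatConfig {

    private HeartbeatConfig() {
    }

    static Properties build(Duration interval) {
        Properties props = new Properties();
        props.setProperty("connector.class", "io.debezium.connector.mysql.MySqlConnector");
        // Emit a heartbeat message at this interval (assumed option name), so the
        // latest binlog position gets committed even if captured tables are idle
        props.setProperty("heartbeat.interval.ms", Long.toString(interval.toMillis()));
        return props;
    }
}
```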

We’ll roll out this change to the other connectors, too, in future releases.

What’s next?

Please see the full change log for more details and the complete list of issues fixed in Debezium 0.7.3.

The next release is scheduled for March 7th. We’ll still have to decide whether that will be 0.7.4 or 0.8.0, depending on how far we are by then with our work on the Oracle connector (DBZ-137).

Please also check out our roadmap describing our ideas for the future development of Debezium. This is our current thinking on the things we’d like to tackle in the coming months, but it’s not cast in stone, so please let us know about your feature requests by sending a message to our Google group. We’re looking forward to your feedback!


Debezium 0.7.2 Is Released

It’s my pleasure to announce the release of Debezium 0.7.2!

Amongst the new features there’s support for geo-spatial types, a new snapshotting mode for recovering a lost DB history topic for the MySQL connector, and a message transformation for converting MongoDB change events into a structure which can be consumed by many more sink connectors. And of course we fixed a whole lot of bugs, too.

Debezium 0.7.2 is a drop-in replacement for previous 0.7.x versions. When upgrading from versions earlier than 0.7.0, please check out the release notes of all 0.7.x releases to learn about any steps potentially required for upgrading.

A big thank you goes out to our fantastic community members for their hard work on this release: Andrey Pustovetov, Denis Mikhaylov, Peter Goransson, Robert Coup, Sairam Polavarapu and Tom Bentley.

Now let’s take a closer look at some of the new features.

MySQL Connector

The biggest change in the MySQL connector is support for geo-spatial column types such as GEOMETRY, POLYGON, MULTIPOINT etc.

There are two new logical field types, io.debezium.data.geometry.Geometry and io.debezium.data.geometry.Geography, for representing geo-spatial columns in change data messages. These types represent geo-spatial data via WKB ("well-known binary") and SRID (spatial reference system identifier), allowing downstream consumers to interpret the change events using any existing library with support for parsing WKB. A blog post with more details on this will follow soon.
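Here’s a minimal Java sketch that decodes a two-dimensional WKB Point, to give an idea of what interpreting the WKB payload involves. It covers only that one geometry type and ignores the SRID, which is carried separately. How the bytes are extracted from the change event is left out, and the class name is hypothetical. In practice you’d rather use an existing library such as the JTS Topology Suite:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical minimal decoder for a 2D WKB Point (21 bytes). */
final class WkbPoint {

    final double x;
    final double y;

    WkbPoint(double x, double y) {
        this.x = x;
        this.y = y;
    }

    static WkbPoint decode(byte[] wkb) {
        ByteBuffer buf = ByteBuffer.wrap(wkb);
        // First byte: 0 = big endian (XDR), 1 = little endian (NDR)
        byte byteOrder = buf.get();
        if (byteOrder != 0 && byteOrder != 1) {
            throw new IllegalArgumentException("Invalid WKB byte order marker " + byteOrder);
        }
        buf.order(byteOrder == 0 ? ByteOrder.BIG_ENDIAN : ByteOrder.LITTLE_ENDIAN);
        // Next 4 bytes: geometry type; 1 means a 2D Point
        int type = buf.getInt();
        if (type != 1) {
            throw new IllegalArgumentException("Not a 2D WKB Point, type " + type);
        }
        // Then the X and Y coordinates as 8-byte doubles (evaluated left to right)
        return new WkbPoint(buf.getDouble(), buf.getDouble());
    }
}
```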

The new snapshotting mode schema_only_recovery comes in handy if for some reason you have lost (parts of) the DB history topic used by the MySQL connector. It’s also useful if you’d like to compact that topic by re-creating it. Please refer to the connector documentation for the details of this mode, especially when it’s safe (and when not) to make use of it.
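Here’s a sketch of switching the MySQL connector to this mode, expressed as java.util.Properties. The class name is hypothetical, and all other settings (connection, DB history topic) are omitted. As noted above, check the documentation on when this mode is safe before using it:

```java
import java.util.Properties;

/** Hypothetical helper: MySQL connector settings for recovering the DB history topic. */
final class HistoryRecoveryConfig {

    private HistoryRecoveryConfig() {
    }

    static Properties build() {
        Properties props = new Properties();
        props.setProperty("connector.class", "io.debezium.connector.mysql.MySqlConnector");
        // Rebuild the DB history from the current table schemas
        // (see the connector docs for when this is safe)
        props.setProperty("snapshot.mode", "schema_only_recovery");
        return props;
    }
}
```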

Another new feature related to managing the size of the DB history topic is the option to control whether to include all DDL events or only those pertaining to tables captured as per the whitelist/blacklist configuration. Again, check out the connector docs to learn more about the specifics of that setting.

Finally, we fixed a few shortcomings of the MySQL DDL parser (DBZ-524, DBZ-530).

PostgreSQL Connector

Similar to the MySQL connector, the support for geo-spatial columns in Postgres has been largely improved: PostGIS column types can now be represented in change data events. Thanks a lot to Robert Coup, who contributed this feature!

The support for Postgres array columns has also been expanded; e.g. we now support tracking changes to VARCHAR and DATE array columns. Note that the connector doesn’t yet work with geo-spatial array columns (should you ever have those), but this should be added soon, too.

If you’d like to include just a subset of the rows of a captured table in snapshots, you may like the new ability to specify dedicated SELECT statements for doing so. For instance, this can be used to exclude any logically deleted records (which you can recognize based on some flag in that table) from the snapshot.
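Excluding logically deleted rows could look like the following sketch. We assume the options are named snapshot.select.statement.overrides (listing the affected tables), plus one snapshot.select.statement.overrides.<schema>.<table> property per table. Please verify these names in the connector documentation. The table public.customers, its boolean deleted column and the class name are hypothetical:

```java
import java.util.Properties;

/** Hypothetical helper: Postgres connector settings with a custom snapshot query. */
final class SnapshotFilterConfig {

    private SnapshotFilterConfig() {
    }

    static Properties build() {
        Properties props = new Properties();
        props.setProperty("connector.class", "io.debezium.connector.postgresql.PostgresConnector");
        // Tables whose snapshot query is overridden (comma-separated list; assumed option name)
        props.setProperty("snapshot.select.statement.overrides", "public.customers");
        // Custom SELECT for that table, skipping logically deleted rows (hypothetical flag column)
        props.setProperty("snapshot.select.statement.overrides.public.customers",
                "SELECT * FROM public.customers WHERE deleted = false");
        return props;
    }
}
```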

A few bugs in this connector were reported and fixed by community members, too: e.g. the connector can now be paused correctly (thanks, Andrey Pustovetov), and we fixed an issue which could potentially have committed an incorrect offset to Kafka Connect (thanks, Thon Mekathikom).

MongoDB Connector

If you’ve ever compared the structures of change events emitted by the Debezium RDBMS connectors (MySQL, Postgres) and the MongoDB connector, you’ll know that the message structure of the latter is a bit different from the others. Due to the schemaless nature of MongoDB, the change events essentially contain a String with a JSON representation of the applied insert or patch. This structure cannot be consumed by existing sink connectors, such as the Confluent connectors for JDBC or Elasticsearch.

This is now possible by means of a newly added single message transformation (SMT), which parses these JSON strings and creates a structured Kafka Connect record from them (thanks, Sairam Polavarapu!). By applying this SMT in the JDBC sink connector, you can now stream data changes from MongoDB to any supported relational database.
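Here’s a minimal sketch of a JDBC sink configuration applying the SMT, as java.util.Properties. The transforms and transforms.<alias>.type keys are standard Kafka Connect configuration. The SMT class name io.debezium.connector.mongodb.transforms.UnwrapFromMongoDbEnvelope is our assumption for this release and should be checked against the docs. The alias unwrap and the class name are arbitrary, and topics and connection settings are omitted:

```java
import java.util.Properties;

/** Hypothetical helper: JDBC sink settings that unwrap MongoDB change events. */
final class MongoToJdbcSinkConfig {

    private MongoToJdbcSinkConfig() {
    }

    static Properties build() {
        Properties props = new Properties();
        props.setProperty("connector.class", "io.confluent.connect.jdbc.JdbcSinkConnector");
        // Register one transformation under the (arbitrary) alias "unwrap"
        props.setProperty("transforms", "unwrap");
        // Parse the MongoDB JSON payload into a structured record (assumed SMT class name)
        props.setProperty("transforms.unwrap.type",
                "io.debezium.connector.mongodb.transforms.UnwrapFromMongoDbEnvelope");
        return props;
    }
}
```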

Note that this SMT is a work in progress, and details of its emitted message structure may still change. There are also some inherent limitations to what can be achieved with it: if you have arrays in your MongoDB documents, for example, the record created by this SMT will be structured accordingly, but many sink connectors cannot process such a structure.

We have some ideas for further development here; e.g. there could be an option for flattening out (non-array) nested structures, so that { "address" : { "street" : "..." } } would be represented as address_street, which could then be consumed by sink connectors expecting a flat structure.
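The idea behind such an option can be sketched in a few lines of Java, operating on a document represented as nested maps. This is purely illustrative and not part of the SMT. The class name, and the use of Map instead of Kafka Connect’s Struct, are assumptions:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration: flattens nested maps into underscore-joined keys. */
final class NestedFlattener {

    private NestedFlattener() {
    }

    static Map<String, Object> flatten(Map<String, ?> document) {
        Map<String, Object> flat = new LinkedHashMap<>();
        flattenInto("", document, flat);
        return flat;
    }

    private static void flattenInto(String prefix, Map<String, ?> source, Map<String, Object> target) {
        for (Map.Entry<String, ?> entry : source.entrySet()) {
            String key = prefix.isEmpty() ? entry.getKey() : prefix + "_" + entry.getKey();
            Object value = entry.getValue();
            if (value instanceof Map) {
                // Nested document: recurse, extending the key prefix
                @SuppressWarnings("unchecked")
                Map<String, ?> nested = (Map<String, ?>) value;
                flattenInto(key, nested, target);
            } else {
                // Scalars and arrays (lists) are copied as-is
                target.put(key, value);
            }
        }
    }
}
```

Lists are deliberately left untouched, mirroring the (non-array) restriction mentioned above.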

The new SMT is described in detail in our docs.

What’s next?

Please see the full change log for more details and the complete list of issues fixed in Debezium 0.7.2.

The 0.7.3 release is scheduled for February 14th.

We’ll focus on some more bug fixes; we’re also working on having Debezium regularly emit heartbeat messages to a dedicated topic. This will be practical for diagnostic purposes, but will also help to regularly trigger commits of the offset in Kafka Connect. That’s beneficial in certain situations when capturing tables that change only very infrequently.

We’ve also worked out a roadmap describing our ideas for future work on Debezium, going beyond the next bugfix releases. While nothing is cast in stone, this is our idea of the features to add in the coming months. If you miss anything important on this roadmap, please tell us either in the comments below or send a message to our Google group. Looking forward to your feedback!


Debezium 0.7.1 Is Released

Just a few days before Christmas, we are releasing Debezium 0.7.1! This is a bugfix release that fixes a few annoying issues found by our community during the first rounds of using Debezium 0.7. All issues relate either to the newly provided wal2json support or to the improvement that reduced the risk of an internal race condition.

Robert Coup found a performance regression in situations where 0.7.0 was used with an old version of the Protobuf decoder.

Suraj Savita (and others) found an issue where our code failed to correctly detect that it was running with the Amazon RDS wal2json plug-in. As we were outsmarted by the JDBC driver internals, we included a distinct plug-in decoder name, wal2json_rds, which bypasses the detection routine and by default expects to run against an Amazon RDS instance. This mode should be used only with RDS instances.
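Selecting the RDS-specific decoder name could look like the following sketch, expressed as java.util.Properties. It assumes the decoder is chosen via the Postgres connector’s plugin.name option. The class name is hypothetical, and connection settings are omitted:

```java
import java.util.Properties;

/** Hypothetical helper: Postgres connector settings for wal2json on Amazon RDS. */
final class RdsWal2JsonConfig {

    private RdsWal2JsonConfig() {
    }

    static Properties build() {
        Properties props = new Properties();
        props.setProperty("connector.class", "io.debezium.connector.postgresql.PostgresConnector");
        // Bypass the plug-in detection routine and assume an Amazon RDS instance;
        // only use this value against RDS
        props.setProperty("plugin.name", "wal2json_rds");
        return props;
    }
}
```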

We have also gathered feedback from the first attempts to run Debezium with Amazon RDS and included a short section on this topic in our documentation.

