Debezium 0.9.0.Beta1 Released

It’s my pleasure to announce the release of Debezium 0.9.0.Beta1! Oh, and to those of you who are celebrating it — Happy Thanksgiving!

This new Debezium release comes with several great improvements to our work-in-progress SQL Server connector:

  • Initial snapshots can be done using the snapshot isolation level if enabled in the DB (DBZ-941)

  • Changes to the structures of captured tables after the connector has been set up are supported now (DBZ-812)

  • New connector option decimal.handling.mode (DBZ-953) and pass-through of any database.* option to the JDBC driver (DBZ-964)

Besides that, we spent some time on supporting the latest versions of the different databases. The Debezium connectors now support Postgres 11 (DBZ-955) and MongoDB 4.0 (DBZ-974). We are also working on supporting MySQL 8.0, which should be completed in the next 0.9.x release. The Debezium container images have been updated to Kafka 2.0.1 (DBZ-979), and the Kafka Connect image now supports the STATUS_STORAGE_TOPIC environment variable, bringing it in line with the CONFIG_STORAGE_TOPIC and OFFSET_STORAGE_TOPIC variables that were already supported (DBZ-893).

As usual, several bugs were fixed, too. Several of them dealt with the new Antlr-based DDL parser for the MySQL connector. By now we feel confident about its implementation, so it’s the default DDL parser as of this release (DBZ-757). If you would like to continue to use the legacy parser for some reason, you can do so by setting the ddl.parser.mode connector option to "legacy". The legacy implementation will remain available for the lifetime of Debezium 0.9.x and is scheduled for removal after that. So please make sure to file issues in JIRA should you run into any problems with the Antlr parser.
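
As a minimal sketch of what opting back into the legacy parser could look like, the snippet below builds a registration payload in the name/config shape used by the Kafka Connect REST API. Only ddl.parser.mode and its "legacy" value come from the text above; the connector name and the helper function are hypothetical, and a real configuration needs the usual connection settings as well.

```python
import json

def with_legacy_ddl_parser(name, base_config):
    """Return a connector registration payload that selects the legacy DDL parser."""
    config = dict(base_config)            # copy so the caller's dict stays untouched
    config["ddl.parser.mode"] = "legacy"  # option name and value as described above
    return {"name": name, "config": config}

# Hypothetical connector name; connection settings are omitted for brevity.
payload = with_legacy_ddl_parser(
    "inventory-connector",
    {"connector.class": "io.debezium.connector.mysql.MySqlConnector"},
)
print(json.dumps(payload, indent=2))
```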

Overall, this release contains 21 fixes. Thanks a lot to all the community members who helped with making this happen: Anton Martynov, Deepak Barr, Grzegorz Kołakowski, Olavi Mustanoja, Renato Mefi, Sagar Rao and Shivam Sharma!

What else?

While the work towards Debezium 0.9 continues, we’ve lately been quite busy with presenting Debezium at multiple conferences. You can find the slides and recordings from Kafka Summit San Francisco and Voxxed Days Microservices on our list of online resources around Debezium.

There you also can find the links to the slides of the great talk "The Why’s and How’s of Database Streaming" by Joy Gao of WePay, a Debezium user of the first hour, as well as the link to a blog post by Hans-Peter Grahsl about setting up a CDC pipeline from MySQL into Cosmos DB running on Azure. If you know about other great articles, session recordings or similar on Debezium and change data capture which should be added there, please let us know.

About Debezium

Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.

Get involved

We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Gitter, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve our existing connectors and add even more connectors. If you find problems or have ideas on how we can improve Debezium, please let us know or log an issue.


Debezium 0.9.0.Alpha2 Released

It’s my pleasure to announce the release of Debezium 0.9.0.Alpha2!

While the work on the connectors for SQL Server and Oracle continues, we decided to do another Alpha release, as lots of fixes and new features - many of them contributed by community members - have piled up, which we wanted to get into your hands as quickly as possible.

This release supports Apache Kafka 2.0, comes with support for Postgres' HSTORE column type, lets you rename and filter fields in change data messages for MongoDB, and contains multiple bug fixes and performance improvements. Overall, this release contains 55 fixes (note that a few of these have been merged back to 0.8.x and are contained in earlier 0.8 releases, too).

A big "Thank You" is in order to community members Andrey Pustovetov, Artiship Artiship, Cliff Wheadon, Deepak Barr, Ian Axelrod, Liu Hanlin, Maciej Bryński, Ori Popowski, Peng Lyu, Philip Sanetra, Sagar Rao and Syed Muhammad Sufyian for their contributions to this release. We salute you!

Kafka Upgrade

Debezium runs with and has been tested on top of the recently released Apache Kafka 2.0 (DBZ-858). The widely used version Kafka 1.x continues to be supported as well.

Note that Kafka 0.10.x is not supported, because Debezium uses the admin client API, which is only available in later versions. It shouldn’t be too hard to work around this, so if you are interested in helping out, this would be a great contribution (see DBZ-883).

Support for HSTORE columns in Postgres

Postgres is an amazingly powerful and flexible RDBMS, not least due to its wide range of column types, which go far beyond what’s defined by the SQL standard. One of these types is HSTORE, which is essentially a string-to-string map.

Debezium can capture changes to columns of this type now (DBZ-898). By default, the field values will be represented using Kafka Connect’s map data type. As this may not be supported by all sink connectors, you might alternatively represent them as a string-ified JSON by setting the new hstore.handling.mode connector option to json. In this case, you’d see HSTORE columns represented as values in change messages like so: { "key1" : "val1", "key2" : "val2" }.
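
On the consumer side, the two representations can be normalized with a few lines of code. The following is only a sketch: the helper name is hypothetical, and it assumes the consumer sees either an already-deserialized map (default mode) or the string-ified JSON shown above (json mode).

```python
import json

def hstore_to_dict(value):
    """Normalize an HSTORE field value to a plain dict, whichever mode produced it."""
    if value is None:
        return None
    if isinstance(value, dict):   # default mode: the value arrives as a map
        return dict(value)
    if isinstance(value, str):    # hstore.handling.mode=json: string-ified JSON
        return json.loads(value)
    raise TypeError(f"unexpected HSTORE representation: {type(value).__name__}")

print(hstore_to_dict('{ "key1" : "val1", "key2" : "val2" }'))
```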

Field filtering and renaming for MongoDB

Unlike the connectors for MySQL and Postgres, the Debezium MongoDB connector so far didn’t allow excluding single fields of captured collections from CDC messages. Renaming them wasn’t supported either, e.g. by means of Kafka’s ReplaceField SMT. The reason is that MongoDB doesn’t mandate a fixed schema for the documents of a given collection, so documents are represented in change messages using a single string-ified JSON field.

Thanks to the fantastic work of community member Andrey Pustovetov, this finally has changed, i.e. you can remove given fields (DBZ-633) now from the CDC messages of given collections or have them renamed (DBZ-881). Please refer to the description of the new connector options field.blacklist and field.renames in the MongoDB connector documentation to learn more.

Extended source info

Another contribution by Andrey is the new optional connector field within the source info block of CDC messages (DBZ-918). This tells the type of source connector that produced the messages ("mysql", "postgres" etc.), which can come in handy in cases where specific semantics need to be applied on the consumer side depending on the type of source database.
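
A consumer could use this field to pick database-specific handling. The sketch below assumes change events have already been deserialized into dictionaries with a "source" block; the event shape, the handler functions and their return values are hypothetical and only illustrate the dispatch. Since the field is optional, the code falls back to a default handler when it is absent.

```python
def handler_for(event, handlers, default):
    """Pick a handler based on the optional `connector` field of the source block."""
    source = event.get("source") or {}
    connector = source.get("connector")   # may be missing, as the field is optional
    return handlers.get(connector, default)

# Hypothetical handlers keyed by the connector types named in the text.
handlers = {
    "mysql": lambda e: "apply MySQL-specific semantics",
    "postgres": lambda e: "apply Postgres-specific semantics",
}
default = lambda e: "apply generic semantics"

event = {"source": {"connector": "postgres"}, "op": "u"}   # trimmed, illustrative event
print(handler_for(event, handlers, default)(event))
```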

Bug fixes and version upgrades

The new release contains a good number of bug fixes and other smaller improvements. Amongst them are

  • correct handling of invalid temporal default values with MySQL (DBZ-927),

  • support for table/collection names with special characters for MySQL (DBZ-878) and MongoDB (DBZ-865) and

  • fixed handling of blacklisted tables with the new Antlr-based DDL parser (DBZ-872).

Community member Ian Axelrod provided a fix for a potential performance issue, where changes to tables with TOAST columns in Postgres would cause repeated updates to the connector’s internal schema metadata, which can be a costly operation (DBZ-911). Please refer to the Postgres connector documentation for details on the new schema.refresh.mode option, which deals with this issue.

In terms of version upgrades we migrated to the latest releases of the MySQL (DBZ-763, DBZ-764) and Postgres drivers (DBZ-912). The former is part of a longer stream of work leading towards support of MySQL 8 which should be finished in one of the next Debezium releases. For Postgres we provide a Docker image with Debezium’s supported logical decoding plug-ins based on Alpine now, which might be interesting to those concerned about container size (DBZ-705).

Please see the change log for the complete list of fixed issues.

What’s next?

The work towards Debezium 0.9 continues, and we’ll focus mostly on improvements to the SQL Server and Oracle connectors. Other potential topics include support for MySQL 8 and native logical decoding as introduced with Postgres 10, which should greatly help with using the Debezium Postgres connectors in cloud environments such as Amazon RDS.

We’ll also be talking about Debezium at the following conferences:

Already last week I had the opportunity to present Debezium at JUG Saxony Day. If you are interested, you can find the (German) slideset of that talk on Speaker Deck.


Debezium 0.8.3.Final Released

As temperatures are cooling off, the Debezium team is getting into full swing again and we’re happy to announce the release of Debezium 0.8.3.Final!

This is a bugfix release to the current stable release line of Debezium, 0.8.x, while the work on Debezium 0.9 goes on in parallel. There are 14 fixes in this release. As in earlier 0.8.x releases, we’ve further improved the new Antlr-based DDL parser used by the MySQL connector (see DBZ-901, DBZ-903 and DBZ-910).

The Postgres connector saw a huge improvement to its start-up time for databases with lots of custom types (DBZ-899). The user reporting this issue had nearly 200K entries in pg_catalog.pg_type, and due to an N + 1 SELECT issue within the Postgres driver itself, this caused the connector to take 24 minutes to start. By using a custom query for obtaining the type metadata, we were able to cut down this time to 5 seconds! Right now we’re working with the maintainers of the Postgres driver to get this issue fixed upstream, too.
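
To illustrate the general N + 1 SELECT pattern (not the actual driver code or Debezium's custom query), here is a small self-contained sketch using an in-memory SQLite database. The table and column names are made up for the example; it only shows why issuing one query per row scales so much worse than a single joined query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE types (type_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE type_details (type_id INTEGER PRIMARY KEY, category TEXT)")
for type_id in range(1, 1001):
    conn.execute("INSERT INTO types VALUES (?, ?)", (type_id, f"type_{type_id}"))
    conn.execute("INSERT INTO type_details VALUES (?, ?)", (type_id, "custom"))

def load_n_plus_one(conn):
    """One query for the list of types, then one extra query per type."""
    queries = 1
    result = {}
    for type_id, name in conn.execute("SELECT type_id, name FROM types").fetchall():
        category = conn.execute(
            "SELECT category FROM type_details WHERE type_id = ?", (type_id,)
        ).fetchone()[0]
        queries += 1
        result[type_id] = (name, category)
    return result, queries

def load_single_query(conn):
    """A single joined query that fetches all metadata at once."""
    rows = conn.execute(
        "SELECT t.type_id, t.name, d.category "
        "FROM types t JOIN type_details d ON d.type_id = t.type_id"
    ).fetchall()
    return {type_id: (name, category) for type_id, name, category in rows}, 1

slow, slow_queries = load_n_plus_one(conn)
fast, fast_queries = load_single_query(conn)
print(slow == fast, slow_queries, fast_queries)   # same data, 1001 queries vs. 1
```

With roughly 200K types, as in the reported case, the per-row variant means roughly 200K round trips to the database, which is where the start-up time went.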

More Flexible Propagation of DELETEs

Besides those bug fixes, we decided to also merge one new feature from the 0.9.x branch into the 0.8.3.Final release. It will be useful to those of you who are using the SMT for extracting the "after" state from change events (DBZ-857).

This SMT can be employed to stream changes to sink connectors which expect just a "flat" row representation of data instead of Debezium’s complex event structure. Not all sink connectors support the handling of deletions, though. E.g. some connectors will fail when encountering tombstone events. Therefore the SMT can now optionally rewrite delete events into updates of a special "deleted" marker field.

For that, set the delete.handling.mode option of the SMT to "rewrite":

...
"transforms" : "unwrap",
"transforms.unwrap.type": "io.debezium.transforms.UnwrapFromEnvelope",
"transforms.unwrap.delete.handling.mode" : "rewrite",
...

When a DELETE event is propagated, the "__deleted" field of outgoing records will be set to true. So when for instance consuming the events with the JDBC sink connector, you’d see this being reflected in a corresponding column in the sink tables:

 __deleted | last_name |  id  | first_name |         email
-----------+-----------+------+------------+-----------------------
 false     | Thomas    | 1001 | Sally      | sally.thomas@acme.com
 false     | Bailey    | 1002 | George     | gbailey@foobar.com
 false     | Kretchmar | 1004 | Anne       | annek@noanswer.org
 true      | Walker    | 1003 | Edward     | ed@walker.com
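
Conceptually, the rewrite behaves like the following sketch. It is not the SMT's implementation: it assumes change events are dictionaries with "op", "before" and "after" keys, and it represents the marker as a Python boolean, which may differ from the type the SMT actually emits.

```python
def unwrap_with_rewrite(event):
    """Flatten a change event; deletes become rows flagged via "__deleted"."""
    if event["op"] == "d":
        row = dict(event["before"])   # a delete has no "after" state, so keep the last known row
        row["__deleted"] = True
    else:
        row = dict(event["after"])
        row["__deleted"] = False
    return row

# Illustrative delete event for the last row of the table above.
delete_event = {
    "op": "d",
    "before": {"id": 1003, "first_name": "Edward", "last_name": "Walker", "email": "ed@walker.com"},
    "after": None,
}
print(unwrap_with_rewrite(delete_event))
```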

You can then, for instance, use a batch job running on your sink to remove all records flagged as deleted.
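
Such a clean-up job could be as simple as the sketch below, shown here against an in-memory SQLite database standing in for the sink. The table name, and the assumption that the marker is stored as the text values 'true' and 'false', are illustrative; adapt both to how your sink connector maps the field.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT, "
    "last_name TEXT, email TEXT, __deleted TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?, ?)",
    [
        (1001, "Sally", "Thomas", "sally.thomas@acme.com", "false"),
        (1002, "George", "Bailey", "gbailey@foobar.com", "false"),
        (1004, "Anne", "Kretchmar", "annek@noanswer.org", "false"),
        (1003, "Edward", "Walker", "ed@walker.com", "true"),
    ],
)

def purge_deleted(conn):
    """Remove all rows flagged as deleted and return how many were removed."""
    cursor = conn.execute("DELETE FROM customers WHERE __deleted = 'true'")
    conn.commit()
    return cursor.rowcount

removed = purge_deleted(conn)
remaining = [row[0] for row in conn.execute("SELECT id FROM customers ORDER BY id")]
print(removed, remaining)
```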

What’s next?

We’re continuing the work on Debezium 0.9, which will mostly be about improvements to the SQL Server and Oracle connectors. The current plan is to do the next 0.9 release (either Alpha2 or Beta1) in two weeks from now.

Also it’s the beginning of the conference season, so we’ll spend some time preparing demos and presenting Debezium at multiple locations. There will be sessions on change data capture with Debezium at these conferences:

If you are at any of these conferences, come and say Hi; we’d love to exchange with you about your use cases, feature requests, feedback on our roadmap and any other ideas around Debezium.

Finally, a big "Thank You" goes to our fantastic community members Andrey Pustovetov, Maciej Bryński and Peng Lyu for their contributions to this release!


Debezium 0.8.2 Released

The Debezium team is back from summer holidays and we’re happy to announce the release of Debezium 0.8.2!

This is a bugfix release to the current stable release line of Debezium, 0.8.x, while the work on Debezium 0.9 is continuing.

Note: By accident the version of the release artifacts is 0.8.2 instead of 0.8.2.Final. This is not in line with our recently established convention of always letting release versions end with qualifiers such as Alpha1, Beta1, CR1 or Final. The next version in the 0.8 line will be 0.8.3.Final and we’ll improve our release pipeline to make sure that this situation doesn’t occur again.

The 0.8.2 release contains 10 fixes overall, most of them dealing with issues related to DDL parsing as done by the Debezium MySQL connector. For instance, implicit non-nullable primary key columns will be handled correctly now using the new Antlr-based DDL parser (DBZ-860). Also the MongoDB connector saw a bug fix (DBZ-838): initial snapshots will be interrupted now if the connector is requested to stop (e.g. when shutting down Kafka Connect). More of a useful improvement than a bug fix is the Postgres connector’s new capability to add the table, schema and database names to the source block of emitted CDC events (DBZ-866).

Thanks a lot to community members Andrey Pustovetov, Cliff Wheadon and Ori Popowski for their contributions to this release!

What’s next?

We’re continuing the work on Debezium 0.9, which will mostly be about improvements to the SQL Server and Oracle connectors. Both will get support for handling structural changes to captured tables while the connectors are running. Also the exploration of alternatives to using the XStream API for the Oracle connector continues.

Finally, a recurring theme of our work is to further consolidate the code bases of the different connectors, which will allow us to roll out new and improved features more quickly across all the Debezium connectors. The recently added Oracle and SQL Server connectors already share a lot of code, and in the next step we’ve planned to move the existing Postgres connector to the new basis established for these two connectors.

If you’d like to learn more about some mid-term and long-term ideas, please check out our roadmap. Also please get in touch with us if you have any ideas or suggestions for future development.


Debezium 0.8 Final Is Released

I’m very happy to announce the release of Debezium 0.8.0.Final!

The key features of Debezium 0.8 are the first work-in-progress version of our Oracle connector (based on the XStream API) and a brand-new parser for MySQL DDL statements. Besides that, there are plenty of smaller new features (e.g. propagation of default values to corresponding Connect schemas, optional propagation of source queries in CDC messages and a largely improved SMT for sinking changes from MongoDB into RDBMS) as well as lots of bug fixes (e.g. around temporal and numeric column types, large transactions with Postgres).

Please see the previous announcements (Beta 1, CR 1) to learn about all the changes in more depth. The Final release largely resembles CR1; apart from further improvements to the Oracle connector (DBZ-792) there’s one nice addition to the MySQL connector contributed by Peter Goransson: when doing a snapshot, it will now expose information about the processed rows via JMX (DBZ-789), which is very handy when snapshotting larger tables.

Please take a look at the change log for the complete list of changes in 0.8.0.Final and general upgrade notes.

What’s next?

We’re continuing our work on the Oracle connector. The work on initial snapshotting is progressing well and it should be part of the next release. Other improvements will be support for structural changes to captured tables after the initial snapshot has been made, more extensive source info metadata and more. Please track DBZ-716 for this work; the improvements are planned to be released incrementally in the upcoming versions of Debezium.

We’ve also started to explore ingesting changes via LogMiner. This is more involved in terms of engineering efforts than using XStream, but it comes with the huge advantage of not requiring a separate license (LogMiner comes with the Oracle database itself). It’s not quite clear yet when we can release something on this front, and we’re also actively exploring further alternatives. But we are quite optimistic and hope to have something some time soon.

The other focus of work is a connector for SQL Server (see DBZ-40). Work on this has started as well, and there should be an Alpha1 release of Debezium 0.9 with a first drop of that connector within the next few weeks.

To find out about some more long-term ideas, please check out our roadmap and get in touch with us if you have any ideas or suggestions for future development.

