Debezium 0.9.0.Beta1 Released

It’s my pleasure to announce the release of Debezium 0.9.0.Beta1! Oh, and to those of you who are celebrating it — Happy Thanksgiving!

This new Debezium release comes with several great improvements to our work-in-progress SQL Server connector:

  • Initial snapshots can be done using the snapshot isolation level if enabled in the DB (DBZ-941)

  • Changes to the structures of captured tables after the connector has been set up are supported now (DBZ-812)

  • New connector option decimal.handling.mode (DBZ-953) and pass-through of any database.* option to the JDBC driver (DBZ-964)

Besides that, we spent some time on supporting the latest versions of the different databases. The Debezium connectors now support Postgres 11 (DBZ-955) and MongoDB 4.0 (DBZ-974). We are also working on supporting MySQL 8.0, which should be completed in the next 0.9.x release. The Debezium container images have been updated to Kafka 2.0.1 (DBZ-979), and the Kafka Connect image now supports the STATUS_STORAGE_TOPIC environment variable, bringing consistency with CONFIG_STORAGE_TOPIC and OFFSET_STORAGE_TOPIC, which were already supported (DBZ-893).

As usual, several bugs were fixed, too. Several of them dealt with the new Antlr-based DDL parser for the MySQL connector. By now we feel confident about its implementation, so it’s the default DDL parser as of this release (DBZ-757). If you would like to continue to use the legacy parser for some reason, you can do so by setting the ddl.parser.mode connector option to "legacy". The legacy implementation will remain available for the lifetime of Debezium 0.9.x and is scheduled for removal after that. So please make sure to file issues in JIRA should you run into any problems with the Antlr parser.
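
As a minimal sketch, this is how the legacy parser could be selected when building the request body for registering a MySQL connector via the Kafka Connect REST API. Only the ddl.parser.mode option and its "legacy" value are taken from this post; the connector name and the helper function are hypothetical, and a real registration needs the usual connection settings as well.

```python
import json


def mysql_connector_request(name, use_legacy_parser=False):
    """Build a connector registration body (hypothetical helper for illustration)."""
    config = {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        # Connection settings (host, user, password etc.) are omitted here.
    }
    if use_legacy_parser:
        # Opt out of the Antlr-based parser, which is the default as of this release.
        config["ddl.parser.mode"] = "legacy"
    return {"name": name, "config": config}


# "inventory-connector" is a made-up connector name.
request = mysql_connector_request("inventory-connector", use_legacy_parser=True)
print(json.dumps(request, indent=2))
```

Leaving the option out keeps the new default, so the override is easy to drop again once the legacy parser is removed.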

Overall, this release contains 21 fixes. Thanks a lot to all the community members who helped with making this happen: Anton Martynov, Deepak Barr, Grzegorz Kołakowski, Olavi Mustanoja, Renato Mefi, Sagar Rao and Shivam Sharma!

What else?

While the work towards Debezium 0.9 continues, we’ve lately been quite busy with presenting Debezium at multiple conferences. You can find the slides and recordings from Kafka Summit San Francisco and Voxxed Days Microservices on our list of online resources around Debezium.

There you can also find the links to the slides of the great talk "The Why’s and How’s of Database Streaming" by Joy Gao of WePay, one of Debezium’s earliest users, as well as the link to a blog post by Hans-Peter Grahsl about setting up a CDC pipeline from MySQL into Cosmos DB running on Azure. If you know about other great articles, session recordings or similar on Debezium and change data capture which should be added there, please let us know.

About Debezium

Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.

Get involved

We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Gitter, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve our existing connectors and add even more connectors. If you find problems or have ideas on how we can improve Debezium, please let us know or log an issue.


Debezium 0.9.0.Alpha2 Released

It’s my pleasure to announce the release of Debezium 0.9.0.Alpha2!

While the work on the connectors for SQL Server and Oracle continues, we decided to do another Alpha release, as lots of fixes and new features - many of them contributed by community members - have piled up, which we wanted to get into your hands as quickly as possible.

This release supports Apache Kafka 2.0, comes with support for Postgres' HSTORE column type, lets you rename and filter fields in change data messages for MongoDB, and contains multiple bug fixes and performance improvements. Overall, this release contains 55 fixes (note that a few of these have been merged back to 0.8.x and are contained in earlier 0.8 releases, too).

A big "Thank You" is in order to community members Andrey Pustovetov, Artiship Artiship, Cliff Wheadon, Deepak Barr, Ian Axelrod, Liu Hanlin, Maciej Bryński, Ori Popowski, Peng Lyu, Philip Sanetra, Sagar Rao and Syed Muhammad Sufyian for their contributions to this release. We salute you!

Kafka Upgrade

Debezium runs with and has been tested on top of the recently released Apache Kafka 2.0 (DBZ-858). The widely used version Kafka 1.x continues to be supported as well.

Note that 0.10.x is not supported due to Debezium’s usage of the admin client API which is only available in later versions. It shouldn’t be too hard to work around this, so if someone is interested in helping out with this, this would be a great contribution (see DBZ-883).

Support for HSTORE columns in Postgres

Postgres is an amazingly powerful and flexible RDBMS, not least due to its wide range of column types, which go far beyond what’s defined by the SQL standard. One of these types is HSTORE, which is essentially a string-to-string map.

Debezium can capture changes to columns of this type now (DBZ-898). By default, the field values will be represented using Kafka Connect’s map data type. As this may not be supported by all sink connectors, you might alternatively represent them as a string-ified JSON by setting the new hstore.handling.mode connector option to json. In this case, you’d see HSTORE columns represented as values in change messages like so: { "key1" : "val1", "key2" : "val2" }.
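
With hstore.handling.mode set to json, a consumer receives the column as a string and has to parse it itself. The following is a minimal sketch of that step, using the example value from above; the helper name is hypothetical, and it assumes a NULL column arrives as None.

```python
import json


def hstore_as_dict(field_value):
    """Parse a string-ified HSTORE value into a dict (hypothetical helper)."""
    if field_value is None:
        # Assumption: a NULL HSTORE column shows up as None in the decoded message.
        return None
    return json.loads(field_value)


attributes = hstore_as_dict('{ "key1" : "val1", "key2" : "val2" }')
print(attributes["key1"])
```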

Field filtering and renaming for MongoDB

Unlike the connectors for MySQL and Postgres, the Debezium MongoDB connector so far didn’t allow single fields of captured collections to be excluded from CDC messages. Renaming them, e.g. by means of Kafka’s ReplaceField SMT, wasn’t supported either. The reason is that MongoDB doesn’t mandate a fixed schema for the documents of a given collection, and documents therefore are represented in change messages using a single string-ified JSON field.

Thanks to the fantastic work of community member Andrey Pustovetov, this finally has changed, i.e. you can remove given fields (DBZ-633) now from the CDC messages of given collections or have them renamed (DBZ-881). Please refer to the description of the new connector options field.blacklist and field.renames in the MongoDB connector documentation to learn more.
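
To give an idea of how the two options might be used, here is a small sketch that assembles them for a connector configuration. The option names field.blacklist and field.renames come from this post; the value format (comma-separated, fully-qualified database.collection.field names, with :newName appended for renames) is an assumption that should be double-checked against the MongoDB connector documentation, and the database, collection and field names are made up.

```python
def mongodb_field_options(excluded_fields, renamed_fields):
    """Build field filtering/renaming options (hypothetical helper).

    excluded_fields: fully-qualified field names to drop from CDC messages.
    renamed_fields: mapping of fully-qualified field name -> new field name.
    """
    return {
        # Assumed format: comma-separated list of database.collection.field
        "field.blacklist": ",".join(excluded_fields),
        # Assumed format: comma-separated list of database.collection.field:newName
        "field.renames": ",".join(
            f"{old}:{new}" for old, new in renamed_fields.items()
        ),
    }


options = mongodb_field_options(
    excluded_fields=["inventory.customers.password"],
    renamed_fields={"inventory.customers.mail": "email"},
)
print(options)
```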

Extended source info

Another contribution by Andrey is the new optional connector field within the source info block of CDC messages (DBZ-918). This tells the type of source connector that produced the messages ("mysql", "postgres" etc.), which can come in handy in cases where specific semantics need to be applied on the consumer side depending on the type of source database.
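
On the consumer side, such source-specific handling could look roughly like the sketch below. It assumes the change event has already been deserialized into a dict with a "source" block; since the connector field is optional, the code falls back to a default when it is absent. The sample events and handler descriptions are invented for illustration.

```python
def source_connector_type(event, default="unknown"):
    """Return the type of connector that produced the event, if present."""
    source = event.get("source") or {}
    return source.get("connector", default)


# Hypothetical per-database handling, keyed by the value of the connector field.
HANDLERS = {
    "mysql": lambda event: "apply MySQL-specific semantics",
    "postgres": lambda event: "apply Postgres-specific semantics",
}


def handle(event):
    handler = HANDLERS.get(source_connector_type(event))
    if handler is None:
        return "apply generic semantics"
    return handler(event)


print(handle({"source": {"connector": "mysql"}}))
print(handle({"source": {}}))  # field absent, e.g. when the option is not enabled
```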

Bug fixes and version upgrades

The new release contains a good number of bug fixes and other smaller improvements. Amongst them are

  • correct handling of invalid temporal default values with MySQL (DBZ-927),

  • support for table/collection names with special characters for MySQL (DBZ-878) and MongoDB (DBZ-865) and

  • fixed handling of blacklisted tables with the new Antlr-based DDL parser (DBZ-872).

Community member Ian Axelrod provided a fix for a potential performance issue, where changes to tables with TOAST columns in Postgres would cause repeated updates to the connector’s internal schema metadata, which can be a costly operation (DBZ-911). Please refer to the Postgres connector documentation for details on the new schema.refresh.mode option, which deals with this issue.

In terms of version upgrades we migrated to the latest releases of the MySQL (DBZ-763, DBZ-764) and Postgres drivers (DBZ-912). The former is part of a longer stream of work leading towards support of MySQL 8 which should be finished in one of the next Debezium releases. For Postgres we provide a Docker image with Debezium’s supported logical decoding plug-ins based on Alpine now, which might be interesting to those concerned about container size (DBZ-705).

Please see the change log for the complete list of fixed issues.

What’s next?

The work towards Debezium 0.9 continues, and we’ll focus mostly on improvements to the SQL Server and Oracle connectors. Other potential topics include support for MySQL 8 and native logical decoding as introduced with Postgres 10, which should greatly help with using the Debezium Postgres connectors in cloud environments such as Amazon RDS.

We’ll also be talking about Debezium at the following conferences:

Already last week I had the opportunity to present Debezium at JUG Saxony Day. If you are interested, you can find the (German) slideset of that talk on Speaker Deck.


Debezium 0.9 Alpha1 and 0.8.1 Released

Just two weeks after the Debezium 0.8 release, I’m very happy to announce the release of Debezium 0.9.0.Alpha1!

The main feature of the new version is a first work-in-progress version of the long-awaited Debezium connector for MS SQL Server. Based on the CDC functionality available in the Enterprise and Standard editions, the new connector lets you stream data changes out of Microsoft’s popular RDBMS.

Besides that we’ve continued the work on the Debezium Oracle connector. Most notably, it supports initial snapshots of captured tables now. We’ve also upgraded Apache Kafka in our Docker images to 1.1.1 (DBZ-829).

Please take a look at the change log for the complete list of changes in 0.9.0.Alpha1 and general upgrade notes.

Note: At the time of writing (2018-07-26), the release artifacts (connector archives) are available on Maven Central. The Docker images have already been uploaded and are ready for use under the tags 0.9.0.Alpha1 and rolling 0.9.

SQL Server Connector

Support for SQL Server had been on the wish list of Debezium users for a long time (the original issue was DBZ-40). Thanks to lots of basic infrastructure created while working on the Oracle connector, we were finally able to come up with a first preview of this new connector in a comparatively short development time.

Just like the Oracle connector, the one for SQL Server is under active development and should be considered an incubating feature at this point. For instance, the structure of emitted change messages may change in upcoming releases. In terms of features, it supports initial snapshotting and capturing changes via SQL Server’s CDC functionality. There’s support for the most common column types, table whitelisting/blacklisting and more. The most significant feature missing is support for structural changes of tables while the connector is running. This is the next feature we’ll work on and it’s planned to be delivered as part of the next 0.9 release (see DBZ-812).

We’d be very happy to learn about any feedback you may have on this newest connector of the Debezium family. If you spot any bugs or have feature requests for it, please create a report in our JIRA tracker.

Oracle Connector

The Debezium connector for Oracle is able to take initial snapshots now. By means of the new connector option snapshot.mode you can control whether read events for all the records of all the captured tables should be emitted.

In addition, the support for numeric data types has been honed (DBZ-804); any integer columns (i.e. NUMBER with a scale <= 0) will be emitted using the corresponding int8/int16/int32/int64 field type, if the column’s precision allows for that.
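
To illustrate the idea, the following sketch picks the smallest signed integer type that can hold every value of a NUMBER column with a given precision and a scale <= 0. The thresholds are derived purely from the value ranges of the integer types, not taken from the connector’s source code, so the connector’s actual cut-off points are an assumption here; please refer to the Oracle connector documentation for the authoritative mapping.

```python
def smallest_int_type(precision, scale=0):
    """Return "int8".."int64" for an integer NUMBER column, or None if none fits.

    Illustrative only: the cut-offs follow from the signed integer ranges.
    """
    if scale > 0:
        return None  # has fractional digits, so not an integer column
    # A negative scale appends zeros, e.g. NUMBER(2, -2) can hold 9900.
    digits = precision - scale
    largest_value = 10 ** digits - 1
    for bits in (8, 16, 32, 64):
        if largest_value <= 2 ** (bits - 1) - 1:
            return f"int{bits}"
    return None  # too wide for a 64-bit integer


for precision in (2, 4, 9, 18, 19):
    print(precision, smallest_int_type(precision))
```

Under this reasoning, up to 2 digits fit into int8, up to 4 into int16, up to 9 into int32 and up to 18 into int64; wider columns need a different representation.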

We’ve also spent some time on expanding the Oracle connector documentation, which covers the structure of emitted change events and all the data type mappings in detail now.

Debezium 0.8.1.Final

Together with Debezium 0.9.0.Alpha1 we also did another release of the current stable Debezium version 0.8.

While 0.9 at this point is more interesting to those eager to try out the latest developments in the Oracle and SQL Server connectors, 0.8.1.Final is a recommended upgrade especially to the users of the Postgres connector. This release fixes an issue where it could happen that WAL segments on the server were retained longer than necessary, in case only records of non-whitelisted tables changed for a while. This has been addressed by means of supporting heartbeat messages (as already known from the MySQL connector) also for Postgres (DBZ-800). This lets the connector regularly commit offsets to Kafka Connect which also serves as the hook to acknowledge processed LSNs with the Postgres server.
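
As a rough sketch, heartbeats could be enabled in a Postgres connector configuration like this. This post doesn’t name the option; heartbeat.interval.ms is assumed here based on the option known from the MySQL connector, and the interval value and helper function are purely illustrative. Please check the Postgres connector documentation for the exact option name and its default.

```python
def postgres_config_with_heartbeat(interval_ms):
    """Build a partial Postgres connector config (hypothetical helper)."""
    if interval_ms <= 0:
        raise ValueError("heartbeat interval must be positive")
    return {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        # Assumed option name; Kafka Connect expects config values as strings.
        "heartbeat.interval.ms": str(interval_ms),
        # Connection settings and table whitelist are omitted here.
    }


config = postgres_config_with_heartbeat(10_000)
print(config)
```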

You can find the list of all changes done in Debezium 0.8.1.Final in the change log.

What’s next?

As discussed above, we’ll work on supporting structural changes to captured tables while the SQL Server connector is running. The same applies to the Oracle connector. This will require some work on our DDL parsers, but thanks to the foundations provided by our recent migration of the MySQL DDL parser to Antlr, this should be manageable.

The other big focus of work will be to provide an alternative implementation for getting changes from Oracle which isn’t based on the XStream API. We’ve done some experiments with LogMiner and are also actively exploring further alternatives. While some details are still unclear, we are optimistic that we’ll have something to release in this area soon.

If you’d like to learn more about some mid- and long-term ideas, please check out our roadmap. Also, please get in touch with us if you have any ideas or suggestions for future development.


