Debezium Blog

As temperatures are cooling off, the Debezium team is getting into full swing again and we’re happy to announce the release of Debezium 0.8.3.Final!

This is a bugfix release to the current stable release line of Debezium, 0.8.x, while the work on Debezium 0.9 goes on in parallel. There are 14 fixes in this release. As in earlier 0.8.x releases, we’ve further improved the new Antlr-based DDL parser used by the MySQL connector (see DBZ-901, DBZ-903 and DBZ-910).

The Postgres connector saw a huge improvement to its start-up time for databases with lots of custom types (DBZ-899). The user reporting this issue had nearly 200K entries in pg_catalog.pg_type, and due to an N + 1 SELECT issue within the Postgres driver itself, this caused the connector to take 24 minutes to start. By using a custom query for obtaining the type metadata, we were able to cut down this time to 5 seconds! Right now we’re working with the maintainers of the Postgres driver to get this issue fixed upstream, too.

Most of the times Debezium is used to stream data changes into Apache Kafka. What though if you’re using another streaming platform such as Apache Pulsar or a cloud-based solution such as Amazon Kinesis, Azure Event Hubs and the like? Can you still benefit from Debezium’s powerful change data capture (CDC) capabilities and ingest changes from databases such as MySQL, Postgres, SQL Server etc.?

Turns out, with just a bit of glue code, you can! In the following we’ll discuss how to use Debezium to capture changes in a MySQL database and stream the change events into Kinesis, a fully-managed data streaming service available on the Amazon cloud.

The Debezium team is back from summer holidays and we’re happy to announce the release of Debezium 0.8.2!

This is a bugfix release to the current stable release line of Debezium, 0.8.x, while the work on Debezium 0.9 is continuing.

Note: By accident the version of the release artifacts is 0.8.2 instead of 0.8.2.Final. This is not in line with our recently established convention of always letting release versions end with qualifiers such as Alpha1, Beta1, CR1 or Final. The next version in the 0.8 line will be 0.8.3.Final and we’ll improve our release pipeline to make sure that this situation doesn’t occur again.

The 0.8.2 release contains 10 fixes overall, most of them dealing with issues related to DDL parsing as done by the Debezium MySQL connector. For instance, implicit non-nullable primary key columns will be handled correctly now using the new Antlr-based DDL parser (DBZ-860). Also the MongoDB connector saw a bug fix (DBZ-838): initial snapshots will be interrupted now if the connector is requested to stop (e.g. when shutting down Kafka Connect). More a useful improvement rather than a bug fix is the Postgres connector’s capability to add the table, schema and database names to the source block of emitted CDC events (DBZ-866).

Thanks a lot to community members Andrey Pustovetov, Cliff Wheadon and Ori Popowski for their contributions to this release!

Just two weeks after the Debezium 0.8 release, I’m very happy to announce the release of Debezium 0.9.0.Alpha1!

The main feature of the new version is a first work-in-progress version of the long-awaited Debezium connector for MS SQL Server. Based on the CDC functionality available in the Enterprise and Standard editions, the new connector lets you stream data changes out of Microsoft’s popular RDBMS.

Besides that we’ve continued the work on the Debezium Oracle connector. Most notably, it supports initial snapshots of captured tables now. We’ve also upgraded Apache Kafka in our Docker images to 1.1.1 (DBZ-829).

Please take a look at the change log for the complete list of changes in 0.9.0.Alpha1 and general upgrade notes.

Note: At the time of writing (2018-07-26), the release artifacts (connector archives) are available on Maven Central. We’ll upload the Docker images for 0.9.0.Alpha1 to Docker Hub as soon as possible. The Docker images are already uplodaded and ready for use under tags 0.9.0.Alpha1 and rolling 0.9.

Yesterday I had the opportunity to present Debezium and the idea of change data capture (CDC) to the Darmstadt Java User Group. It was a great evening with lots of interesting discussions and questions. One of the questions being the following: what is the advantage of using a log-based change data capturing tool such as Debezium over simply polling for updated records?

So first of all, what’s the difference between the two approaches? With polling-based (or query-based) CDC you repeatedly run queries (e.g. via JDBC) for retrieving any newly inserted or updated rows from the tables to be captured. Log-based CDC in contrast works by reacting to any changes to the database’s log files (e.g. MySQL’s binlog or MongoDB’s op log).

As this wasn’t the first time this question came up, I thought I could provide a more extensive answer also here on the blog. That way I’ll be able to refer to this post in the future, should the question come up again :)

So without further ado, here’s my list of five advantages of log-based CDC over polling-based approaches.