Debezium Blog

Yesterday I had the opportunity to present Debezium and the idea of change data capture (CDC) to the Darmstadt Java User Group. It was a great evening with lots of interesting discussions and questions. One of the questions being the following: what is the advantage of using a log-based change data capturing tool such as Debezium over simply polling for updated records?

So first of all, what’s the difference between the two approaches? With polling-based (or query-based) CDC you repeatedly run queries (e.g. via JDBC) for retrieving any newly inserted or updated rows from the tables to be captured. Log-based CDC in contrast works by reacting to any changes to the database’s log files (e.g. MySQL’s binlog or MongoDB’s op log).

As this wasn’t the first time this question came up, I thought I could provide a more extensive answer also here on the blog. That way I’ll be able to refer to this post in the future, should the question come up again :)

So without further ado, here’s my list of five advantages of log-based CDC over polling-based approaches.

I’m very happy to announce the release of Debezium 0.8.0.Final!

The key features of Debezium 0.8 are the first work-in-progress version of our Oracle connector (based on the XStream API) and a brand-new parser for MySQL DDL statements. Besides that, there are plenty of smaller new features (e.g. propagation of default values to corresponding Connect schemas, optional propagation of source queries in CDC messages and a largely improved SMT for sinking changes from MongoDB into RDBMS) as well as lots of bug fixes (e.g. around temporal and numeric column types, large transactions with Postgres).

Please see the previous announcements (Beta 1, CR 1) to learn about all the changes in more depth. The Final release largely resembles CR1; apart from further improvements to the Oracle connector (DBZ-792) there’s one nice addition to the MySQL connector contributed by Peter Goransson: when doing a snapshot, it will now expose information about the processed rows via JMX (DBZ-789), which is very handy when snapshotting larger tables.

Please take a look at the change log for the complete list of changes in 0.8.0.Final and general upgrade notes.

A fantastic Independence Day to all the Debezium users in the U.S.! But that’s not the only reason to celebrate: it’s also with great happiness that I’m announcing the release of Debezium 0.8.0.CR1!

Following our new release scheme, the focus for this candidate release of Debezium 0.8 has been to fix bug reported for last week’s Beta release, accompanied by a small number of newly implemented features.

Thanks a lot to everyone testing the new Antlr-based DDL parser for the MySQL connector; based on the issues you reported, we were able to fix a few bugs in it. As announced recently, for 0.8 the legacy parser will remain the default implementation, but you are strongly encouraged to test out the new one (by setting the connector option ddl.parser.mode to antlr) and report any findings you may have. We’ve planned to switch to the new implementation by default in Debezium 0.9.

It’s with great excitement that I’m announcing the release of Debezium 0.8.0.Beta1!

This release brings many exciting new features as well as bug fixes, e.g. the first drop of our new Oracle connector, a brand new DDL parser for the MySQL connector, support for MySQL default values and the update to Apache Kafka 1.1.

Due to the big number of changes (the release contains exactly 42 issues overall), we decided to alter our versioning schema a little bit: going forward we may do one or more Beta and CR ("candidate release") releases before doing a final one. This will allow us to get feedback from the community early on, while still completing and polishing specific features. Final (stable) releases will be named like 0.8.0.Final etc.

Last updated at Nov 21st 2018 (adjusted to new KSQL Docker images).

Last year we have seen the inception of a new open-source project in the Apache Kafka universe, KSQL, which is a streaming SQL engine build on top of Kafka Streams. In this post, we are going to try out KSQL querying with data change events generated by Debezium from a MySQL database.

As a source of data we will use the database and setup from our tutorial. The result of this exercise should be similar to the recent post about aggregation of events into domain driven aggregates.