Debezium Blog

It’s my pleasure to announce the release of Debezium 0.9.0.Beta1! Oh, and to those of you who are celebrating it — Happy Thanksgiving!

This new Debezium release comes with several great improvements to our work-in-progress SQL Server connector:

  • Initial snapshots can be done using the snapshot isolation level if enabled in the DB (DBZ-941)

  • Changes to the structures of captured tables after the connector has been set up are supported now (DBZ-812)

  • New connector option decimal.handling.mode (DBZ-953) and pass-through of any database.* option to the JDBC driver (DBZ-964)

It’s my pleasure to announce the release of Debezium 0.9.0.Alpha2!

While the work on the connectors for SQL Server and Oracle continues, we decided to do another Alpha release, as lots of fixes and new features - many of them contributed by community members - have piled up, which we wanted to get into your hands as quickly as possible.

This release supports Apache Kafka 2.0, comes with support for Postgres' HSTORE column type, allows to rename and filter fields from change data messages for MongoDB and contains multiple bug fixes and performance improvements. Overall, this release contains 55 fixes (note that a few of these have been merged back to 0.8.x and are contained in earlier 0.8 releases, too).

A big "Thank You" is in order to community members Andrey Pustovetov, Artiship Artiship, Cliff Wheadon, Deepak Barr, Ian Axelrod, Liu Hanlin, Maciej Bryński, Ori Popowski, Peng Lyu, Philip Sanetra, Sagar Rao and Syed Muhammad Sufyian for their contributions to this release. We salute you!

Updating external full text search indexes (e.g. Elasticsearch) after data changes is a very popular use case for change data capture (CDC).

As we’ve discussed in a blog post a while ago, the combination of Debezium’s CDC source connectors and Confluent’s sink connector for Elasticsearch makes it straight forward to capture data changes in MySQL, Postgres etc. and push them towards Elasticsearch in near real-time. This results in a 1:1 relationship between tables in the source database and a corresponding search index in Elasticsearch, which is perfectly fine for many use cases.

It gets more challenging though if you’d like to put entire aggregates into a single index. An example could be a customer and all their addresses; those would typically be stored in two separate tables in an RDBMS, linked by a foreign key, whereas you’d like to have just one index in Elasticsearch, containing documents of customers with their addresses embedded, allowing you to efficiently search for customers based on their address.

Following up to the KStreams-based solution to this we described recently, we’d like to present in this post an alternative for materializing such aggregate views driven by the application layer.

As temperatures are cooling off, the Debezium team is getting into full swing again and we’re happy to announce the release of Debezium 0.8.3.Final!

This is a bugfix release to the current stable release line of Debezium, 0.8.x, while the work on Debezium 0.9 goes on in parallel. There are 14 fixes in this release. As in earlier 0.8.x releases, we’ve further improved the new Antlr-based DDL parser used by the MySQL connector (see DBZ-901, DBZ-903 and DBZ-910).

The Postgres connector saw a huge improvement to its start-up time for databases with lots of custom types (DBZ-899). The user reporting this issue had nearly 200K entries in pg_catalog.pg_type, and due to an N + 1 SELECT issue within the Postgres driver itself, this caused the connector to take 24 minutes to start. By using a custom query for obtaining the type metadata, we were able to cut down this time to 5 seconds! Right now we’re working with the maintainers of the Postgres driver to get this issue fixed upstream, too.

Most of the times Debezium is used to stream data changes into Apache Kafka. What though if you’re using another streaming platform such as Apache Pulsar or a cloud-based solution such as Amazon Kinesis, Azure Event Hubs and the like? Can you still benefit from Debezium’s powerful change data capture (CDC) capabilities and ingest changes from databases such as MySQL, Postgres, SQL Server etc.?

Turns out, with just a bit of glue code, you can! In the following we’ll discuss how to use Debezium to capture changes in a MySQL database and stream the change events into Kinesis, a fully-managed data streaming service available on the Amazon cloud.