Debezium Blog

Updating external full text search indexes (e.g. Elasticsearch) after data changes is a very popular use case for change data capture (CDC).
As we’ve discussed in a blog post a while ago, the combination of Debezium’s CDC source connectors and Confluent’s sink connector for Elasticsearch makes it straight forward to capture data changes in MySQL, Postgres etc. and push them towards Elasticsearch in near real-time. This results in a 1:1 relationship between tables in the source database and a corresponding search index in Elasticsearch, which is perfectly fine for many use cases.
It gets more challenging though if you’d like to put entire aggregates into a single index. An example could be a customer and all their addresses; those would typically be stored in two separate tables in an RDBMS, linked by a foreign key, whereas you’d like to have just one index in Elasticsearch, containing documents of customers with their addresses embedded, allowing you to efficiently search for customers based on their address.
Following up to the KStreams-based solution to this we described recently, we’d like to present in this post an alternative for materializing such aggregate views driven by the application layer.

As temperatures are cooling off, the Debezium team is getting into full swing again and we’re happy to announce the release of Debezium 0.8.3.Final!
This is a bugfix release to the current stable release line of Debezium, 0.8.x, while the work on Debezium 0.9 goes on in parallel. There are 14 fixes in this release. As in earlier 0.8.x releases, we’ve further improved the new Antlr-based DDL parser used by the MySQL connector (see DBZ-901, DBZ-903 and DBZ-910).
The Postgres connector saw a huge improvement to its start-up time for databases with lots of custom types (DBZ-899). The user reporting this issue had nearly 200K entries in pg_catalog.pg_type, and due to an N + 1 SELECT issue within the Postgres driver itself, this caused the connector to take 24 minutes to start. By using a custom query for obtaining the type metadata, we were able to cut down this time to 5 seconds! Right now we’re working with the maintainers of the Postgres driver to get this issue fixed upstream, too.

Most of the times Debezium is used to stream data changes into Apache Kafka. What though if you’re using another streaming platform such as Apache Pulsar or a cloud-based solution such as Amazon Kinesis, Azure Event Hubs and the like? Can you still benefit from Debezium’s powerful change data capture (CDC) capabilities and ingest changes from databases such as MySQL, Postgres, SQL Server etc.?
Turns out, with just a bit of glue code, you can! In the following we’ll discuss how to use Debezium to capture changes in a MySQL database and stream the change events into Kinesis, a fully-managed data streaming service available on the Amazon cloud.

The Debezium team is back from summer holidays and we’re happy to announce the release of Debezium 0.8.2!
This is a bugfix release to the current stable release line of Debezium, 0.8.x, while the work on Debezium 0.9 is continuing.
Note: By accident the version of the release artifacts is 0.8.2 instead of 0.8.2.Final. This is not in line with our recently established convention of always letting release versions end with qualifiers such as Alpha1, Beta1, CR1 or Final. The next version in the 0.8 line will be 0.8.3.Final and we’ll improve our release pipeline to make sure that this situation doesn’t occur again.
The 0.8.2 release contains 10 fixes overall, most of them dealing with issues related to DDL parsing as done by the Debezium MySQL connector. For instance, implicit non-nullable primary key columns will be handled correctly now using the new Antlr-based DDL parser (DBZ-860). Also the MongoDB connector saw a bug fix (DBZ-838): initial snapshots will be interrupted now if the connector is requested to stop (e.g. when shutting down Kafka Connect). More a useful improvement rather than a bug fix is the Postgres connector’s capability to add the table, schema and database names to the source
block of emitted CDC events (DBZ-866).
Thanks a lot to community members Andrey Pustovetov, Cliff Wheadon and Ori Popowski for their contributions to this release!

Just two weeks after the Debezium 0.8 release, I’m very happy to announce the release of Debezium 0.9.0.Alpha1!
The main feature of the new version is a first work-in-progress version of the long-awaited Debezium connector for MS SQL Server. Based on the CDC functionality available in the Enterprise and Standard editions, the new connector lets you stream data changes out of Microsoft’s popular RDBMS.
Besides that we’ve continued the work on the Debezium Oracle connector. Most notably, it supports initial snapshots of captured tables now. We’ve also upgraded Apache Kafka in our Docker images to 1.1.1 (DBZ-829).
Please take a look at the change log for the complete list of changes in 0.9.0.Alpha1 and general upgrade notes.
Note: At the time of writing (2018-07-26), the release artifacts (connector archives) are available on Maven Central. We’ll upload the Docker images for 0.9.0.Alpha1 to Docker Hub as soon as possible. The Docker images are already uplodaded and ready for use under tags 0.9.0.Alpha1
and rolling 0.9
.