Debezium Blog

This post originally appeared on the WePay Engineering blog.

In the first half of this blog post series, we explained our decision-making process for designing a streaming data pipeline for Cassandra at WePay. In this post, we will break the pipeline down into three sections and discuss each of them in more detail:

  1. Cassandra to Kafka with CDC agent

  2. Kafka to BigQuery with KCBQ (see the configuration sketch after this list)

  3. Transformation with BigQuery view
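To make the second step concrete, here is a minimal sketch of a KCBQ (Kafka Connect BigQuery) sink connector configuration. The connector class is KCBQ’s published one, but the topic, project, dataset, and keyfile values are placeholders, and the exact option names vary between KCBQ versions, so treat this as an illustration rather than a drop-in config:

    {
      "name": "kcbq-cassandra-sink",
      "config": {
        "connector.class": "com.wepay.kafka.connect.bigquery.BigQuerySinkConnector",
        "tasks.max": "1",
        "topics": "cassandra.my_keyspace.my_table",
        "project": "my-gcp-project",
        "defaultDataset": "cassandra_cdc",
        "keyfile": "/etc/kcbq/service-account.json"
      }
    }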

This post originally appeared on the WePay Engineering blog.

Historically, MySQL had been the de facto database of choice for microservices at WePay. As WePay scaled, the sheer volume of data written into some of our microservice databases forced us to choose between sharded MySQL (e.g. Vitess) and switching to a natively sharded NoSQL database. After a series of evaluations, we picked Cassandra, a NoSQL database, primarily for its high availability, horizontal scalability, and ability to handle high write throughput.

Debezium’s container images have recently received a huge structural improvement, making it extremely simple to extend their behaviour.

This is a small tutorial showing how you can, for instance, add Sentry, "an open-source error tracking [software] that helps developers monitor and fix crashes in real time". Here we’ll use it to collect and report any exceptions from Kafka Connect and its connectors. Note that this applies only to Debezium 0.9 and later.

We need a few things to get Sentry working; we’ll add each of them and then write a Dockerfile that glues everything together (a sketch follows the list):

  • A Log4j configuration

  • The SSL certificate for sentry.io, since it’s not in the JVM’s default trust chain

  • The sentry and sentry-log4j libraries
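The following is a minimal sketch of such a Dockerfile, assuming the debezium/connect base image and Sentry 1.x artifacts from Maven Central; the version number, file locations, and certificate handling are illustrative assumptions, not the finished Dockerfile:

    # Sketch only: versions and paths are assumptions; adjust to your setup.
    FROM debezium/connect:0.9

    ENV SENTRY_VERSION=1.7.30

    # 1. Log4j configuration that wires in the Sentry appender (see below)
    COPY log4j.properties /kafka/config/log4j.properties

    # 2. Import the sentry.io SSL certificate into the JVM trust store
    #    (the cacerts location varies by JDK version and layout, and
    #    importing into it may require running as root in the image)
    COPY sentry-io.crt /tmp/sentry-io.crt
    RUN keytool -importcert -alias sentry -noprompt \
          -keystore "$JAVA_HOME/lib/security/cacerts" -storepass changeit \
          -file /tmp/sentry-io.crt

    # 3. The sentry and sentry-log4j libraries, added to Kafka's classpath
    RUN cd /kafka/libs && \
        curl -sfO "https://repo1.maven.org/maven2/io/sentry/sentry/$SENTRY_VERSION/sentry-$SENTRY_VERSION.jar" && \
        curl -sfO "https://repo1.maven.org/maven2/io/sentry/sentry-log4j/$SENTRY_VERSION/sentry-log4j-$SENTRY_VERSION.jar"

The accompanying log4j.properties can then route ERROR-level events to Sentry through the appender shipped with sentry-log4j (the DSN is typically supplied via the SENTRY_DSN environment variable, and the stdout appender definition from the base image is assumed to be kept):

    # Keep the regular console logging and add a Sentry appender
    log4j.rootLogger=INFO, stdout, Sentry

    log4j.appender.Sentry=io.sentry.log4j.SentryAppender
    log4j.appender.Sentry.threshold=ERROR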

It’s my pleasure to announce the release of Debezium 0.10.0.Beta2!

This release further stabilizes the 0.10 release line, with lots of bug fixes to the different connectors. Altogether, 23 issues were fixed for this release; several of them relate to the DDL parser of the MySQL connector, e.g. around RENAME INDEX (DBZ-1329), SET NEW in triggers (DBZ-1331), and function definitions with the COLLATE keyword (DBZ-1332).

For the Postgres connector we fixed a potential inconsistency when flushing processed LSNs to the database (DBZ-1347). Also, the "include.unknown.datatypes" option now works as expected during snapshotting (DBZ-1335), and the connector no longer stumbles over materialized views during snapshotting (DBZ-1345).

The Debezium project strives to make deploying connectors easy, so users can try out and run connectors of their choice simply by downloading the right connector archive and unpacking it into Kafka Connect’s plug-in path.

This is true for all connectors except the Debezium PostgreSQL connector. This connector is special in that it requires a logical decoding plug-in to be installed inside the PostgreSQL source database(s) themselves. Currently, there are two supported logical decoding plug-ins (a configuration sketch for enabling one of them follows the list):

  • postgres-decoderbufs, which uses Protocol Buffers as a very compact transport format and which is maintained by the Debezium community

  • wal2json, which is based on JSON and which is maintained by its own upstream community
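To illustrate what that installation involves: the plug-in’s shared library must be available on the database server, and logical decoding must be enabled. Here is a minimal postgresql.conf sketch, assuming the decoderbufs plug-in (the sender and slot counts are just example values):

    # postgresql.conf (sketch): load the decoding plug-in and enable logical decoding
    shared_preload_libraries = 'decoderbufs'
    wal_level = logical
    max_wal_senders = 1
    max_replication_slots = 1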
