Debezium 0.10.0.Beta4 Released

The temperatures are slowly cooling off after the biggest summer heat, and the Debezium community is happy to announce the release of Debezium 0.10.0.Beta4. In this release we’re happy to share some news we don’t get to share too often: with Apache Cassandra, another database gets added to the list of databases supported by Debezium!

In addition, we finished our efforts for rebasing the existing Postgres connector to the Debezium framework structure established for the SQL Server and Oracle connectors. This means more shared code between these connectors, and in turn reduced maintenance efforts for the development team going forward. There’s one immediately tangible advantage for you, too: the Postgres connector now exposes the same metrics you already know from the other connectors.

Finally, the new release contains a range of bugfixes and other useful improvements. Let’s explore some details below.

Incubating Cassandra Connector

If you have been following this blog lately, you’ll have read about the latest addition to the Debezium family in Joy Gao’s excellent posts about the new connector (part 1, part 2).

In case you haven’t read those yet, we highly recommend doing so in order to learn more about the challenges encountered when implementing a CDC connector for a distributed datastore such as Cassandra, as well as the design decisions made in order to come up with a first "minimal viable product". Joy also gave a great talk at QCon last year, which touches on the topic of CDC for Cassandra.

The connector was originally developed internally at long-term Debezium user WePay; the WePay team then decided to open-source their work, put it under the Debezium umbrella and continue to evolve it there. That’s really great news for the Debezium community! We couldn’t be happier about this contribution and look forward to evolving this new connector together in the open.

At this point the Cassandra connector is in "incubating" state, i.e. its design and implementation are still pretty much in flux, and the event structure which it creates may change in future releases. Note that, unlike the other Debezium connectors, this one is currently not based on Kafka Connect. Instead, it is implemented as a standalone process running on the Cassandra node(s) themselves. Refer to the blog posts linked above for the reasoning behind this design and possible future developments around this. Needless to say, any ideas and contributions in this area are very welcome.

Together with the connector we’ve also provided an initial draft of the connector documentation; this is still work-in-progress and will be amended in the next few days.

Further New Features

The Postgres connector supports the metrics known from SQL Server and Oracle now (DBZ-777). When using the SQL Server connector, it is now ensured that tables are snapshotted in a deterministic order, as defined by the given table whitelist configuration (DBZ-1254).

There have also been two improvements to our SMTs (single message transformations):

  • The SMT for new record state extraction can now add additional columns that propagate metadata fields from the source block (DBZ-1395); this is useful e.g. for propagating the transaction into sink tables.

  • The default structure produced by the outbox routing SMT has been further streamlined (DBZ-1385); the message value will now only contain the contents of the configured outbox table payload column. In case you want to re-add the eventType value, you can configure it as an "additional field", which either goes into the message as a header (recommended) or into the message value, which will then be a nested structure as before.
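
To make the first of these SMT changes more concrete, here is a minimal sketch of a Kafka Connect configuration fragment. It assumes that the new option is named add.source.fields and that the source block contains the fields table and txId (as it does for the Postgres connector); the alias unwrap and the file name are hypothetical, so please check the SMT documentation of your Debezium version for the exact option names:

```shell
# Write a Kafka Connect properties fragment (hypothetical file name).
# Assumption: the option is called add.source.fields, and the source block
# of the change events contains "table" and "txId" (Postgres connector).
cat > unwrap-smt.properties <<'EOF'
transforms=unwrap
transforms.unwrap.type=io.debezium.transforms.ExtractNewRecordState
transforms.unwrap.add.source.fields=table,txId
EOF

# Show the fragment so it can be merged into a connector configuration
cat unwrap-smt.properties
```

The fragment would typically be merged into the configuration of the connector that applies the SMT, e.g. a sink connector writing the flattened records into a table.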

Bugfixes and Other Improvements

Finally, here’s an overview of assorted bugfixes in the 0.10 Beta4 release:

  • The MySQL connector handles GRANT DELETE ON <table> statements correctly (DBZ-1411)

  • Superfluous table scans are avoided when using the initial_schema_only snapshot strategy with SQL Server (DBZ-1417)

  • The superfluous creation of connections is avoided when obtaining the xmin position of Postgres (DBZ-1381)

  • The new record state extraction SMT handles heartbeat events correctly (DBZ-1430)

Please refer to the 0.10.0.Beta4 release notes for the complete list of addressed issues and the upgrading procedure.

A big thank you goes out to all the contributors from the Debezium community who worked on this release: Joy Gao, Renato Mefi and Guillaume Rosauro!

About Debezium

Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.

Get involved

We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Gitter, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve our existing connectors and add even more connectors. If you find problems or have ideas how we can improve Debezium, please let us know or log an issue.


Debezium 0.10.0.Beta3 Released

The summer is at its peak, but the Debezium community is not relenting in its efforts: Debezium 0.10.0.Beta3 has been released.

This version not only continues the incremental improvements to Debezium but also brings shiny new features.

All of you who are using PostgreSQL 10 or higher as a service offered by one of the cloud providers have definitely felt the complications of deploying the logical decoding plugin necessary to enable streaming. This is no longer necessary: Debezium now supports the pgoutput replication protocol (DBZ-766), which has been available out of the box since PostgreSQL 10.
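
As a rough sketch of what using pgoutput could look like, the following writes a connector registration request with plugin.name set to pgoutput. The connector name, host name, credentials, database name and server name are made-up placeholders, and the Kafka Connect address in the comment is an assumption:

```shell
# Write a registration request for a Postgres connector using pgoutput.
# All names, hosts and credentials below are hypothetical placeholders.
cat > register-postgres.json <<'EOF'
{
  "name": "inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "plugin.name": "pgoutput",
    "database.hostname": "postgres",
    "database.port": "5432",
    "database.user": "postgres",
    "database.password": "postgres",
    "database.dbname": "inventory",
    "database.server.name": "dbserver1"
  }
}
EOF

# To register it via the Kafka Connect REST API (assumed at localhost:8083):
# curl -i -X POST -H "Content-Type: application/json" \
#      --data @register-postgres.json http://localhost:8083/connectors
```

With this setting, no additional logical decoding plug-in should need to be installed on the database server, which is what makes it attractive for managed PostgreSQL offerings.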

There is a set of further minor improvements. The tombstones for deletes are now configurable for all connectors (DBZ-1365). Tables without primary keys are now supported by all connectors, too (DBZ-916). This further reduces the gap between the capabilities of the old and the new connectors.

There are improvements to the heartbeat system. Heartbeat messages now contain in their body the timestamp of when they were created (DBZ-1363). The new messages are properly skipped by the Outbox router (DBZ-1388). The MySQL connector additionally uses heartbeats for the BinlogReader (DBZ-1338). The MongoDB connector now utilizes heartbeats, too (DBZ-1198).
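
The tombstone and heartbeat settings mentioned above are plain connector options. The following sketch writes a small properties fragment using the option names tombstones.on.delete and heartbeat.interval.ms; the file name and the chosen values are only illustrative:

```shell
# Write a connector properties fragment (hypothetical file name and values).
cat > connector-extras.properties <<'EOF'
# Emit a heartbeat message every 10 seconds (the default of 0 disables heartbeats)
heartbeat.interval.ms=10000
# Do not emit a tombstone record after each delete event (the default is true)
tombstones.on.delete=false
EOF

# Show the fragment so it can be merged into a connector configuration
cat connector-extras.properties
```

Whether disabling tombstones makes sense depends on your consumers: tombstones are what allows Kafka log compaction to eventually drop all records for a deleted key.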

As we know that metrics are very important for keeping Debezium happy in production, we have extended the set of supported metrics. A new metric counting the events in error (DBZ-1222) has been added, so it is easy to monitor any irregularities in processing. Database history recovery can take a long time during startup, so it is now possible to monitor its progress (DBZ-1356).

Other changes include updating the Docker images to use Kafka 2.3.0 (DBZ-1358). The PostgreSQL connector supports lockless snapshotting (DBZ-1238), and the Outbox router now processes delete messages (DBZ-1320).

We continue with stabilization of the 0.10 release line, with lots of bug fixes to the different connectors.

Multiple defects in the MySQL parser have been fixed (DBZ-1398, DBZ-1397, DBZ-1376), and SAVEPOINT statements are no longer recorded in the database history (DBZ-794).

Under certain circumstances, the PostgreSQL connector could lose the first event while switching from snapshotting to streaming; this has been fixed (DBZ-1400).

Please refer to the 0.10.0.Beta3 release notes to learn more about all resolved issues and the upgrading procedure.

Many thanks to everybody from the Debezium community who contributed to this release: Addison Higham, Bin Li, Brandon Brown and Renato Mefi.


Tutorial for Adding Sentry into Debezium Container Images

Debezium has received a huge improvement to the structure of its container images recently, making it extremely simple to extend its behaviour.

This is a small tutorial showing how you can for instance add Sentry, "an open-source error tracking [software] that helps developers monitor and fix crashes in real time". Here we’ll use it to collect and report any exceptions from Kafka Connect and its connectors. Note that this is only applicable for Debezium 0.9+.

We need a few things to get Sentry working. We’ll add all of them one by one and then write a Dockerfile which ties everything together:

  • Configure Log4j

  • SSL certificate for sentry.io, since it’s not by default in the JVM trusted chain

  • The sentry and sentry-log4j libraries

Log4j Configuration

Let’s create a file config/log4j.properties in our local project, which is a copy of the one shipped with the Debezium images, and add Sentry to it. Note that we added Sentry to log4j.rootLogger and created the section log4j.appender.Sentry; the rest remains the same as in the original configuration:

kafka.logs.dir=logs

log4j.rootLogger=INFO, stdout, appender, Sentry

# Disable excessive reflection warnings - KAFKA-5229
log4j.logger.org.reflections=ERROR

log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.threshold=INFO
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{ISO8601} %-5p  %X{dbz.connectorType}|%X{dbz.connectorName}|%X{dbz.connectorContext}  %m   [%c]%n

log4j.appender.appender=org.apache.log4j.DailyRollingFileAppender
log4j.appender.appender.DatePattern='.'yyyy-MM-dd-HH
log4j.appender.appender.File=${kafka.logs.dir}/connect-service.log
log4j.appender.appender.layout=org.apache.log4j.PatternLayout
log4j.appender.appender.layout.ConversionPattern=%d{ISO8601} %-5p  %X{dbz.connectorType}|%X{dbz.connectorName}|%X{dbz.connectorContext}  %m   [%c]%n

log4j.appender.Sentry=io.sentry.log4j.SentryAppender
log4j.appender.Sentry.threshold=WARN

Sentry.io SSL certificate

Download the getsentry.pem file from sentry.io and put it in your project’s directory under ssl/.

The Dockerfile

Now we can glue everything together in our Debezium image:

  • Let’s first create a JKS file with our Sentry certificate; this uses a Docker multi-stage building process, where we are generating a certificates.jks which we’ll later copy into our Kafka Connect with Debezium stage

  • Copy log4j.properties into $KAFKA_HOME/config/log4j.properties

  • Copy the JKS file from the multi-stage build

  • Set ENV with the Sentry version and MD5 sums

  • Download the Sentry dependencies. The script docker-maven-download you see being called is a helper which we ship by default in our images. In this case we’re using it to download JAR files from Maven Central and put them into the Kafka libs directory. We do that by setting the ENV var MAVEN_DEP_DESTINATION=$KAFKA_HOME/libs:

FROM fabric8/java-centos-openjdk8-jdk:1.6 as ssl-jks

ARG JKS_STOREPASS="any random password, you can also set it outside via the arguments from docker build"

USER root:root

COPY /ssl /ssl

RUN chown -R jboss:jboss /ssl

USER jboss:jboss

WORKDIR /ssl

RUN keytool -import -noprompt -alias getsentry \
    -storepass "${JKS_STOREPASS}" \
    -keystore certificates.jks \
    -trustcacerts -file "/ssl/getsentry.pem"

FROM debezium/connect:0.10 AS kafka-connect

EXPOSE 8083

COPY config/log4j.properties "$KAFKA_HOME/config/log4j.properties"

COPY --from=ssl-jks --chown=kafka:kafka /ssl/certificates.jks /ssl/

ENV SENTRY_VERSION=1.7.23 \
    MAVEN_DEP_DESTINATION=$KAFKA_HOME/libs

RUN docker-maven-download \
        central io/sentry sentry "$SENTRY_VERSION" 4bf1d6538c9c0ebc22526e2094b9bbde && \
    docker-maven-download \
        central io/sentry sentry-log4j "$SENTRY_VERSION" 74af872827bd7e1470fd966449637a77

Build and Run

Now we can simply build the image:

$ docker build -t debezium/connect-sentry:1 --build-arg=JKS_STOREPASS="123456789" .

When running the image, we now have to configure our Kafka Connect application to load the JKS file by setting KAFKA_OPTS: -Djavax.net.ssl.trustStore=/ssl/certificates.jks -Djavax.net.ssl.trustStorePassword=<YOUR TRUSTSTORE PASSWORD>.

Sentry can be configured in many ways; I like to do it via environment variables. The minimum we need to set is the Sentry DSN (which is necessary to point to your project) and the name of the actual running environment (e.g. production, staging).

In this case we can configure the variables SENTRY_DSN=<GET THE DSN IN SENTRY’S DASHBOARD> and SENTRY_ENVIRONMENT=dev.
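
Putting these settings together, starting the container could look roughly like the sketch below. The env file name, the container name, the broker address kafka:9092 and the topic names are assumptions chosen for illustration, and the truststore password must match the one passed to docker build; the start_connect function is only defined here, not invoked:

```shell
# Hypothetical value; must match the JKS_STOREPASS used during "docker build"
TRUSTSTORE_PASSWORD="123456789"

# Collect all environment variables for the container in one env file.
# The broker address and topic names are placeholders for your own setup.
cat > connect-sentry.env <<EOF
KAFKA_OPTS=-Djavax.net.ssl.trustStore=/ssl/certificates.jks -Djavax.net.ssl.trustStorePassword=${TRUSTSTORE_PASSWORD}
SENTRY_DSN=<YOUR SENTRY DSN>
SENTRY_ENVIRONMENT=dev
BOOTSTRAP_SERVERS=kafka:9092
GROUP_ID=1
CONFIG_STORAGE_TOPIC=my_connect_configs
OFFSET_STORAGE_TOPIC=my_connect_offsets
STATUS_STORAGE_TOPIC=my_connect_statuses
EOF

# Start Kafka Connect from the image built above; call this function once a
# Kafka broker is reachable from the container under the configured address.
start_connect() {
  docker run -it --rm --name connect -p 8083:8083 \
    --env-file connect-sentry.env \
    debezium/connect-sentry:1
}
```

Using an env file keeps the docker run command short. Note that Docker takes everything after the first = on a line literally, so the KAFKA_OPTS value must not be wrapped in quotes there.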

In case you’d like to learn more about using the Debezium container images, please check our tutorial.

And that’s it: a basic recipe for extending our Docker setup, using Sentry as an example; other modifications should be as simple as this one. As an example of what a RecordTooLarge exception from the Kafka producer looks like in this setup, see the picture below:

Sentry Exception example

Conclusion

Thanks to the recent refactoring of the Debezium container images, it has become very easy to amend them with your custom extensions. Downloading external dependencies and adding them to the images has become a trivial task, and we’d love to hear your feedback about it!

If you are curious about the refactoring itself, you can find the details in pull request debezium/docker-images#131.


Debezium 0.10.0.Beta2 Released

It’s my pleasure to announce the release of Debezium 0.10.0.Beta2!

This further stabilizes the 0.10 release line, with lots of bug fixes to the different connectors. 23 issues were fixed for this release; a couple of those relate to the DDL parser of the MySQL connector, e.g. around RENAME INDEX (DBZ-1329), SET NEW in triggers (DBZ-1331) and function definitions with the COLLATE keyword (DBZ-1332).

For the Postgres connector we fixed a potential inconsistency when flushing processed LSNs to the database (DBZ-1347). Also the "include.unknown.datatypes" option works as expected now during snapshotting (DBZ-1335) and the connector won’t stumble over materialized views during snapshotting any longer (DBZ-1345).

The SQL Server connector will use much less memory in many situations (DBZ-1065) and it’s configurable now whether it should emit tombstone events for deletions or not (DBZ-835). This was also added for the Oracle connector, bringing consistency for this option across all the connectors.

Note that this release can be used with Apache Kafka 2.x, but not with 1.x. This was an unintentional change and compatibility with 1.x will be restored for the Beta3 release (the issue to track is DBZ-1361).

Please refer to the 0.10.0.Beta2 release notes to learn more about all resolved issues and the upgrading procedure.

Many thanks to everybody from the Debezium community who contributed to this release: Cheng Pan, Guillaume Rosauro, Mariusz Strzelecki and Stathis Souris.


Debezium 0.10.0.Beta1 Released

Another week, another Debezium release — I’m happy to announce the release of Debezium 0.10.0.Beta1!

Besides the upgrade to Apache Kafka 2.2.1 (DBZ-1316), this mostly fixes some bugs, including a regression to the MongoDB connector introduced in the Alpha2 release (DBZ-1317).

A very welcome usability improvement is that the connectors will now log a warning if not even a single table is actually captured as per the whitelist/blacklist configuration (DBZ-1242). This helps to prevent the accidental exclusion of all tables by means of an incorrect filter expression, in which case the connectors "work as intended", but no events are propagated to the message broker.
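
To illustrate how such an accidental exclusion can happen, here is a small sketch. It assumes that whitelist entries are regular expressions which must match the complete fully-qualified table identifier (databaseName.tableName in the case of MySQL), and it only emulates that matching with grep -x; the table names are made up:

```shell
# Fully-qualified identifiers of two hypothetical MySQL tables
tables="inventory.customers
inventory.orders"

# Emulate whole-identifier regex matching with grep (-E: extended regex,
# -x: the whole line must match, -c: print the number of matching lines).
# "|| true" keeps the exit status clean when grep finds no match.

# A whitelist entry without the database name matches no table at all
unqualified=$(printf '%s\n' "$tables" | grep -E -x -c 'customers' || true)

# The fully-qualified entry matches the intended table
qualified=$(printf '%s\n' "$tables" | grep -E -x -c 'inventory\.customers' || true)

echo "tables matched by 'customers': $unqualified"
echo "tables matched by 'inventory\.customers': $qualified"
```

In the first case a connector would start up just fine but never emit any change events, which is exactly the situation the new warning is meant to point out.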

Please see the release notes for the complete list of issues fixed in this release. Also make sure to examine the upgrade guidelines for 0.10.0.Alpha1 and Alpha2 when upgrading from earlier versions.

Many thanks to community members Cheng Pan and Ching Tsai for their contributions to this release!
