
Debezium 0.9.5.Final Released

It’s my pleasure to announce the release of Debezium 0.9.5.Final!

This is a recommended update for all users of earlier versions; besides bug fixes, a few new features are provided as well. The release contains 18 resolved issues overall.

Apache Kafka Update and New Features

This release has been built against and tested with Apache Kafka 2.2.0 (DBZ-1227). Earlier versions continue to be supported as well.

For all the connectors it is possible now to specify the batch size when taking snapshots (DBZ-1247). The new connector option snapshot.fetch.size has been introduced for that. This option replaces the earlier option rows.fetch.size which existed in some of the connectors and which will be removed in Debezium 0.10. Existing connector instances should therefore be re-configured to use the new option.
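As a rough sketch of what that re-configuration amounts to, the following helper renames the deprecated key in a connector configuration held as a plain map. The helper class itself, the connector name and the fetch size value are made up for illustration and are not part of Debezium; only the two option names come from this release.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative helper (not part of Debezium) that renames the deprecated option. */
final class FetchSizeMigration {

    static final String OLD_KEY = "rows.fetch.size";
    static final String NEW_KEY = "snapshot.fetch.size";

    /** Returns a copy of the config in which the deprecated key is replaced by the new one. */
    static Map<String, String> migrate(Map<String, String> config) {
        Map<String, String> result = new LinkedHashMap<>(config);
        String oldValue = result.remove(OLD_KEY);
        // An explicitly configured new option wins over the deprecated one.
        if (oldValue != null) {
            result.putIfAbsent(NEW_KEY, oldValue);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> config = new LinkedHashMap<>();
        config.put("name", "inventory-connector"); // hypothetical connector name
        config.put(OLD_KEY, "2000");               // example value only
        System.out.println(migrate(config));
    }
}
```

If both options are present, the sketch keeps the value of snapshot.fetch.size and simply drops the old key; whether that is the right precedence for your deployment is a judgment call.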

Continuing the work from Debezium 0.9.4, the Postgres connector supports some more column types: MACADDR and MACADDR8 (DBZ-1193) as well as INT4RANGE, INT8RANGE and NUMRANGE (DBZ-1076).

Fixes

Amongst others, this release includes the following fixes:

  • Failing to specify value for database.server.name results in invalid Kafka topic name (DBZ-212)

  • Postgres Connector times out in schema discovery for DBs with many tables (DBZ-1214)

  • Oracle connector: JDBC transaction can only capture single DML record (DBZ-1223)

  • Lost precision for timestamp with timezone (DBZ-1236)

  • NullPointerException due to optional value for commitTime (DBZ-1241)

  • Default value for datetime(0) is incorrectly handled (DBZ-1243)

  • Microsecond precision is lost when reading timetz data from Postgres (DBZ-1260)

Please refer to the release notes for the complete list of issues fixed in Debezium 0.9.5.

We’re very thankful to the following community members who contributed to this release: Addison Higham, Andrey Pustovetov, Jork Zijlstra, Krizhan Mariampillai, Mathieu Rozieres and Shubham Rawat.

Outlook

This release is planned to be the last in the 0.9 line.

We’re now going to focus on Debezium 0.10, whose main topic will be to clean up a few things: we’d like to remove a few deprecated options and features (e.g. the legacy DDL parser in the MySQL connector). We’re also planning to do a thorough review of the event structure of the different connectors; for instance, in the source block of CDC messages there are some field names that should be unified. We believe users will benefit from a more consistent experience across the connectors.

Another focus area will be to migrate the existing Postgres connector to the framework classes established for the SQL Server and Oracle connectors. This will allow us to expose some new features for the Postgres connector, e.g. the monitoring capabilities already rolled out for the other two connectors.

About Debezium

Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.

Get involved

We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Gitter, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve our existing connectors and add even more connectors. If you find problems or have ideas how we can improve Debezium, please let us know or log an issue.


Debezium’s Team Grows

Hello everyone, my name is Chris Cranford and I recently joined the Debezium team.

My journey at Red Hat began just over three years ago; however, I have been in this line of work for nearly twenty years. All throughout my career, I have advocated and supported open source software. Many of my initial software endeavors were based on open source software, several of which are still heavily used today, such as Hibernate ORM.

When I first joined Red Hat, I had the pleasure to work on the Hibernate ORM team. I had been an end user of the project since 2.0, so it was an excellent fit to be able to contribute full time to a project that had served me well in the corporate world n-times over.

It wasn’t long ago that @gunnarmorling and I had a brief exchange about Debezium. I had not heard of the project and I was super stoked because I immediately saw parallels between its goals and those of Hibernate Envers, a change data capture solution based on Hibernate’s event framework that I was maintaining at the time.

I believe one of my first "wow" moments was when I realized how well Debezium fits into the micro-services world. The idea of being able to share data between micro-services in a very decoupled way is a massive win for building reusable components and minimizes technical debt.

Debezium just felt like the next logical step. There are so many new and exciting things to come and the team and I cannot wait to share them.

So let’s get started!

--Chris


Debezium 0.9.4.Final Released

It’s my pleasure to announce the release of Debezium 0.9.4.Final!

This is a drop-in replacement for earlier Debezium 0.9.x versions, containing mostly bug fixes and some improvements related to metrics. Overall, 17 issues were resolved.

MySQL Connector Improvements

The Debezium connector for MySQL comes with two new metrics:

  • Whether GTID is enabled for offset tracking or not (DBZ-1221)

  • Number of filtered events (DBZ-1206)

It also supports database connections using TLS 1.2 (DBZ-1208) now.

New Postgres Datatypes

The Postgres connector can now capture changes to columns of the CIDR and INET types (DBZ-1189).

Bug Fixes

The fixed bugs include the following:

  • Closing connection after snapshotting (DBZ-1218)

  • Can parse ALTER statement affecting enum column with character set options (DBZ-1203)

  • Avoiding timeout after bootstrapping a new table (DBZ-1207)

Check out the release notes for the complete list of issues fixed in Debezium 0.9.4.

Many thanks to Debezium community members Andrey Pustovetov, Jordan Bragg, Joy Gao, Preethi Sadagopan, Renato Mefi, Sasha Kovryga, Shubham Rawat and Stephen Powis for their contributions to this release!



Debezium 0.9.3.Final Released

The Debezium team is happy to announce the release of Debezium 0.9.3.Final!

This is mostly a bug-fix release and a drop-in replacement for earlier Debezium 0.9.x versions, but there are a few significant new features too. Overall, 17 issues were resolved.

Container images will be released with a small delay due to some Docker Hub configuration issues.

New Features

The 0.9.3 release comes with two larger new features:

  • A feature request was made to execute a partial recovery of the replication process after losing the replication slot with the PostgreSQL database, e.g. after failing over to a secondary database host (DBZ-1082). Instead of adding yet another snapshotting mode, we took a step back and decided to make the Postgres snapshotting process more customizable by introducing a service provider interface (SPI). This lets you implement and register your own Java class for controlling the snapshotting process. See the issue description of DBZ-1082 for one possible custom implementation of this SPI, which is based on Postgres' catalog_xmin property and selects all records altered after the last known xmin position. To learn more about the SPI, see the Snapshotter contract. Note that the feature is still in an incubating phase and the SPI should be considered unstable for the time being.

  • Not long ago we published a blog post about implementing the outbox pattern with Debezium for propagating data changes between microservices. Community member Renato Mefi expanded the idea and created a ready-made implementation of the single message transform (SMT) described in the post for routing events from the outbox table to specific topics. This SMT is part of the Debezium core library now (DBZ-1169). Its usage will be described in the documentation soon; for the time being please refer to the EventRouter type and the accompanying configuration class.
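Until the documentation is available, here is a minimal sketch of how the SMT could be registered on a connector, using the standard Kafka Connect transforms and transforms.<alias>.type keys. The fully qualified class name io.debezium.transforms.outbox.EventRouter, the alias "outbox" and the connector name are assumptions made for this example; please verify them against the EventRouter type in the code base. The SMT's own options are deliberately left out.

```java
import java.util.Properties;

/** Illustrative only: builds the Kafka Connect entries that register an SMT. */
final class OutboxRouterConfig {

    /** Returns a copy of the connector configuration with the SMT registration added. */
    static Properties withOutboxRouter(Properties connectorConfig) {
        Properties result = new Properties();
        result.putAll(connectorConfig);
        // "outbox" is an arbitrary alias chosen for this sketch.
        result.setProperty("transforms", "outbox");
        // Assumed fully qualified name of the EventRouter SMT.
        result.setProperty("transforms.outbox.type", "io.debezium.transforms.outbox.EventRouter");
        return result;
    }

    public static void main(String[] args) {
        Properties base = new Properties();
        base.setProperty("name", "outbox-connector"); // hypothetical connector name
        withOutboxRouter(base).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```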

Bug fixes

We did a couple of fixes related to the Debezium Postgres connector:

  • A regression that introduced a deadlock in the snapshotting process has been fixed (DBZ-1161)

  • The hstore datatype works correctly in the snapshot phase (DBZ-1162)

  • The wal2json plug-in also processes empty events (DBZ-1181), e.g. those originating from materialized view updates; this should help to resolve some of the issues where log files in Postgres couldn’t be discarded due to Debezium’s replication slot not advancing.

  • The commit time is properly converted to microseconds (DBZ-1174)

The Debezium MySQL connector also saw a number of fixes, especially in the SQL parser:

  • The SERIAL datatype and default value are now supported (DBZ-1185)

  • A specific detail in the MySQL grammar that allows table options in ALTER TABLE to be listed without commas now works (DBZ-1186)

  • A false alarm for an empty MySQL password is no longer reported (DBZ-1188)

  • It is no longer necessary to create the history topic manually for brokers without a default topic replication value (DBZ-1179)

It is now possible to process multiple schemas with a single Oracle connector (DBZ-1166).

Check out the release notes for the complete list of issues fixed in Debezium 0.9.3.

Many thanks to Debezium community members Renato Mefi, Shubham Rawat, Addison Higham, Jon Casstevens, Ashar Hassan and Josh Stanfield for their contributions to this release!



Debezium meets Quarkus

Last week’s announcement of Quarkus sparked a great amount of interest in the Java community: crafted from best-of-breed Java libraries and standards, it allows you to build Kubernetes-native applications based on GraalVM & OpenJDK HotSpot. In this blog post we are going to demonstrate how a Quarkus-based microservice can consume Debezium’s data change events via Apache Kafka. For that purpose, we’ll see what it takes to convert the shipment microservice from our recent post about the outbox pattern into a Quarkus-based service.

Quarkus is a Java stack designed for the development of cloud-native applications based on the Java platform. It combines and tightly integrates mature libraries such as Hibernate ORM, Vert.x, Netty, RESTEasy and Apache Camel as well as the APIs from the Eclipse MicroProfile initiative, such as Config or Reactive Messaging. Using Quarkus, you can develop applications using both imperative and reactive styles, also combining both approaches as needed.

It is designed for significantly reduced memory consumption and improved startup time. Last but not least, Quarkus supports both OpenJDK HotSpot and GraalVM virtual machines. With GraalVM it is possible to compile the application into a native binary and thus reduce the resource consumption and startup time even more.

To learn more about Quarkus itself, we recommend taking a look at its excellent Getting Started guide.

Consuming Kafka Messages with Quarkus

In the original example application demonstrating the outbox pattern, there was a microservice ("shipment") based on Thorntail that consumed the events produced by the Debezium connector. We’ve extended the example with a new service named "shipment-service-quarkus". It provides the same functionality as the "shipment-service" but is implemented as a microservice based on Quarkus instead of Thorntail.

This makes the overall architecture look like so:

[Figure: Outbox Pattern Overview]

To retrofit the original service into a Quarkus-based application, only a few changes were needed:

  • Quarkus right now supports only MariaDB but not MySQL; hence we have included an instance of MariaDB to which the service is writing

  • The JSON-P API used to deserialize incoming JSON messages currently cannot be used without RESTEasy (see issue #1480, which should be fixed soon); so the code has been modified to use the Jackson API instead

  • Instead of the Kafka consumer API, the Reactive Messaging API defined by MicroProfile is used to receive messages from Apache Kafka; as an implementation of that API, the one provided by the SmallRye project is used, which is bundled as a Quarkus extension

While the first two steps are mere technicalities, the Reactive Messaging API is a nice simplification over the polling loop in the original consumer. All that’s needed to consume messages from a Kafka topic is to annotate a method with @Incoming, and it will automatically be invoked when a new message arrives:

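import java.io.IOException;
import java.util.concurrent.CompletionStage;

import javax.enterprise.context.ApplicationScoped;
import org.eclipse.microprofile.reactive.messaging.Incoming;
import io.smallrye.reactive.messaging.kafka.KafkaMessage;

@ApplicationScoped
public class KafkaEventConsumer {

    // Invoked for each record arriving on the "orders" message source
    @Incoming("orders")
    public CompletionStage<Void> onMessage(KafkaMessage<String, String> message)
            throws IOException {
        // handle message...

        // Acknowledge the message once it has been handled
        return message.ack();
    }
}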

The "orders" message source is configured via the MicroProfile Config API, which resolves it to the "OrderEvents" topic already known from the original outbox example.

Build Process

The build process is mostly the same as it was before. Instead of using the Thorntail Maven plug-in, the Quarkus Maven plug-in is used now.

The following Quarkus extensions are used:

  • io.quarkus:quarkus-hibernate-orm: support for Hibernate ORM and JPA

  • io.quarkus:quarkus-jdbc-mariadb: support for accessing MariaDB through JDBC

  • io.quarkus:quarkus-smallrye-reactive-messaging-kafka: support for accessing Kafka through the MicroProfile Reactive Messaging API

They pull in some other extensions too, e.g. quarkus-arc (the Quarkus CDI runtime) and quarkus-vertx (used by the reactive messaging support).

In addition, two more changes were needed:

  • A new build profile named native has been added; this is used to compile the service into a native binary image using the Quarkus Maven plug-in

  • The native-image.docker-build system property is enabled when running the build; this means that the native image build is done inside of a Docker container, so that GraalVM doesn’t have to be installed on the developer’s machine

All the heavy-lifting is done by the Quarkus Maven plug-in which is configured in pom.xml like so:

  <build>
    <finalName>shipment</finalName>
    <plugins>
      ...
      <plugin>
        <groupId>io.quarkus</groupId>
        <artifactId>quarkus-maven-plugin</artifactId>
        <version>${version.quarkus}</version>
        <executions>
          <execution>
            <goals>
              <goal>build</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
  ...
    <profile>
      <id>native</id>
      <build>
        <plugins>
          <plugin>
            <groupId>io.quarkus</groupId>
            <artifactId>quarkus-maven-plugin</artifactId>
            <version>${version.quarkus}</version>
            <executions>
              <execution>
                <goals>
                  <goal>native-image</goal>
                </goals>
                <configuration>
                  <enableHttpUrlHandler>true</enableHttpUrlHandler>
                  <autoServiceLoaderRegistration>false</autoServiceLoaderRegistration>
                </configuration>
              </execution>
            </executions>
          </plugin>
        </plugins>
      </build>
    </profile>

Configuration

Like any Quarkus application, the shipment service is configured via the application.properties file:

quarkus.datasource.url=jdbc:mariadb://shipment-db-quarkus:3306/shipmentdb
quarkus.datasource.driver=org.mariadb.jdbc.Driver
quarkus.datasource.username=mariadbuser
quarkus.datasource.password=mariadbpw
quarkus.hibernate-orm.database.generation=drop-and-create
quarkus.hibernate-orm.log.sql=true

smallrye.messaging.source.orders.type=io.smallrye.reactive.messaging.kafka.Kafka
smallrye.messaging.source.orders.topic=OrderEvents
smallrye.messaging.source.orders.bootstrap.servers=kafka:9092
smallrye.messaging.source.orders.key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
smallrye.messaging.source.orders.value.deserializer=org.apache.kafka.common.serialization.StringDeserializer
smallrye.messaging.source.orders.group.id=shipment-service-quarkus

In our case it contains

  • the definition of a datasource (based on MariaDB) to which the shipment service writes its data,

  • the definition of a messaging source, which is backed by the "OrderEvents" Kafka topic, using the given bootstrap server, deserializers and Kafka consumer group id.
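To make the naming scheme of the messaging source more tangible, the following sketch groups all entries that share the smallrye.messaging.source.orders. prefix into one map, which is roughly the set of settings that ends up describing the Kafka consumer. This only illustrates how the keys are structured; it is not how SmallRye or Quarkus actually resolve the configuration, and the class is hypothetical.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Illustrative only: groups the settings of one named messaging source. */
final class MessagingSourceConfig {

    static final String PREFIX = "smallrye.messaging.source.";

    /** Collects the settings of one named source, with the common prefix stripped. */
    static Map<String, String> forSource(Properties props, String sourceName) {
        String prefix = PREFIX + sourceName + ".";
        Map<String, String> result = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(prefix)) {
                result.put(key.substring(prefix.length()), props.getProperty(key));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("smallrye.messaging.source.orders.topic", "OrderEvents");
        props.setProperty("smallrye.messaging.source.orders.bootstrap.servers", "kafka:9092");
        props.setProperty("smallrye.messaging.source.orders.group.id", "shipment-service-quarkus");
        props.setProperty("quarkus.hibernate-orm.log.sql", "true"); // unrelated entry, ignored
        System.out.println(forSource(props, "orders"));
    }
}
```

The source name passed to forSource ("orders") is the same name used in the @Incoming annotation of the consumer.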

Execution

The Docker Compose config file has been enriched with two services, MariaDB and the new Quarkus-based shipment service. So when docker-compose up is executed, two shipment services are started side-by-side: the original Thorntail-based one and the new one using Quarkus. When the order service receives a new purchase order and exports a corresponding event to Apache Kafka via the outbox table, that message is processed by both shipment services, as they are using distinct consumer group ids.
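Why distinct group ids lead to both services seeing every event can be shown with a toy in-memory model: each consumer group tracks its own offset into the same log. This is a deliberately simplified sketch of a single-partition topic, not the Kafka client API; the group id of the Thorntail service and the event name are invented for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory model of a single-partition topic; not the Kafka client API. */
final class ConsumerGroupSketch {

    private final List<String> log = new ArrayList<>();
    private final Map<String, Integer> offsets = new HashMap<>();

    /** Appends an event to the end of the log. */
    void publish(String event) {
        log.add(event);
    }

    /** Returns all events the given group has not yet seen and advances its offset. */
    List<String> poll(String groupId) {
        int from = offsets.getOrDefault(groupId, 0);
        List<String> batch = new ArrayList<>(log.subList(from, log.size()));
        offsets.put(groupId, log.size());
        return batch;
    }

    public static void main(String[] args) {
        ConsumerGroupSketch topic = new ConsumerGroupSketch();
        topic.publish("OrderCreated#1"); // hypothetical event
        // Each group has its own offset, so both receive the event once.
        System.out.println("quarkus:   " + topic.poll("shipment-service-quarkus"));
        System.out.println("thorntail: " + topic.poll("shipment-service-thorntail")); // assumed id
        // A second poll by the same group returns nothing new.
        System.out.println("quarkus:   " + topic.poll("shipment-service-quarkus"));
    }
}
```

Had both services shared one group id, each event would have been delivered to only one of them.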

Performance Numbers

The numbers are definitely not scientific, but provide a good indication of the order-of-magnitude difference between the native Quarkus-based application and the Thorntail service running on the JVM:

                                 Quarkus service    Thorntail service
application package size [MB]    54                 131
memory [MB]                      33.8               1257
start time [ms]                  260                5746

The memory data were obtained via the htop utility. The startup time was measured until the message about application readiness was printed. As with all performance measurements, you should run your own comparisons based on your set-up and workload to gain insight into the actual differences for your specific use cases.
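To put the figures from the table above into perspective, the small program below computes how many times larger each Thorntail value is than its Quarkus counterpart. It works out to roughly 2.4 times for the package size, about 37 times for memory and about 22 times for start time. The class is just a throwaway calculation written for this post.

```java
/** Throwaway calculation of the ratios between the two columns of the table. */
final class PerformanceRatios {

    /** How many times larger the Thorntail figure is than the Quarkus figure. */
    static double ratio(double thorntail, double quarkus) {
        return thorntail / quarkus;
    }

    public static void main(String[] args) {
        System.out.printf("package size: %.1fx%n", ratio(131, 54));
        System.out.printf("memory:       %.1fx%n", ratio(1257, 33.8));
        System.out.printf("start time:   %.1fx%n", ratio(5746, 260));
    }
}
```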

Summary

In this post we have successfully demonstrated that it is possible to consume Debezium-generated events in a Java application written with the Quarkus Java stack. We have also shown that it is possible to provide such an application as a binary image and provided back-of-the-envelope performance numbers demonstrating significant savings in resources.

If you’d like to see the awesomeness of deploying Java microservices as native images for yourself, you can find the complete source code of the implementation in the Debezium examples repo. If you have any questions or feedback, please let us know in the comments below; we are looking forward to hearing from you!

Many thanks to Guillaume Smet for reviewing an earlier version of this post!
