Installing Debezium

There are several ways to install and use Debezium connectors, so we’ve documented a few of the most common ways to do this.

Installing a Debezium connector

If you’ve already installed Zookeeper, Kafka, and Kafka Connect, then using one of Debezium’s connectors is easy. Simply download one or more connector plugin archives (see below), extract their files into your Kafka Connect environment, and add the directory with the JARs to Kafka Connect’s classpath. Restart your Kafka Connect process to pick up the new JARs.
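The extraction step can be scripted. The following is a minimal Python sketch that assumes the plugin archive has already been downloaded as a local `.tar.gz` file and that `plugin_dir` is a directory you choose for Kafka Connect plugins (the function name and paths are hypothetical). It only unpacks the archive and lists the JARs it finds. It does not download anything or restart Kafka Connect.

```python
import tarfile
from pathlib import Path


def install_plugin(archive: Path, plugin_dir: Path) -> list[Path]:
    """Extract a connector plugin .tar.gz into plugin_dir; return every JAR now under it."""
    plugin_dir.mkdir(parents=True, exist_ok=True)
    root = plugin_dir.resolve()
    with tarfile.open(archive, "r:gz") as tar:
        # Refuse archives whose entries would land outside plugin_dir.
        for member in tar.getmembers():
            if not (root / member.name).resolve().is_relative_to(root):
                raise ValueError(f"unsafe path in archive: {member.name}")
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters: use the safe "data" filter.
            tar.extractall(root, filter="data")
        else:
            tar.extractall(root)
    return sorted(root.rglob("*.jar"))
```

After extracting, make sure Kafka Connect can see the directory. You can add it to the classpath or, in Kafka versions that support it, list its parent in the worker's `plugin.path` setting. Then restart the worker.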

The connector plugin archives are available from Maven Central.
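Plugin archives on Maven Central follow the standard Maven repository layout, so their download URLs can be derived from the connector name and version. The Python sketch below assumes the archives are published under the `io.debezium` group as `debezium-connector-<name>` artifacts with a `plugin` classifier and `.tar.gz` extension. Any version string you pass, such as `1.9.7.Final`, is only a placeholder, so check Maven Central for the versions that actually exist.

```python
import urllib.request

# Maven Central's canonical repository root.
MAVEN_CENTRAL = "https://repo1.maven.org/maven2"


def plugin_archive_url(connector: str, version: str) -> str:
    """URL of a Debezium connector plugin archive, e.g. connector="mysql" or "mongodb".

    Assumes groupId io.debezium, artifactId debezium-connector-<connector>,
    and the "plugin" classifier with a .tar.gz extension.
    """
    artifact = f"debezium-connector-{connector}"
    # Maven layout: <group path>/<artifactId>/<version>/<artifactId>-<version>-<classifier>.<ext>
    return f"{MAVEN_CENTRAL}/io/debezium/{artifact}/{version}/{artifact}-{version}-plugin.tar.gz"


def download_plugin(connector: str, version: str, destination: str) -> str:
    """Download the plugin archive to a local file and return its path."""
    path, _headers = urllib.request.urlretrieve(plugin_archive_url(connector, version), destination)
    return path
```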

If immutable containers are your thing, then check out Debezium’s Docker images for Zookeeper, Kafka, and Kafka Connect with the MySQL and MongoDB connectors preinstalled and ready to go. Our tutorial walks you through using these images and is a great way to learn what Debezium is all about. You can even run Debezium on Kubernetes and OpenShift.

Note that Java 8 or later is required to run the Debezium connectors.

Consuming snapshot releases

Debezium executes nightly builds and deploys them into the Central snapshot repository. If you want to try the latest changes or verify a bug fix you are interested in, use the plugins from that snapshot repository. The installation procedure is the same as for regular releases.

Using a Debezium connector

To use a connector to produce change events for a particular source server/cluster, simply create a configuration file for the MySQL Connector or MongoDB Connector, and use the Kafka Connect REST API to add that connector configuration to your Kafka Connect cluster. When the connector starts, it will connect to the source and produce events for each inserted, updated, and deleted row or document. See the Debezium documentation for the MySQL Connector and MongoDB Connector.
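As a rough illustration of the REST step, here is a Python sketch that registers a MySQL connector by POSTing its configuration to the Kafka Connect `/connectors` endpoint. The connector name, host names, credentials, server id, and topic names are hypothetical. The property names follow the Debezium 1.x MySQL connector, and newer releases renamed some of them, so check the MySQL Connector documentation for your version.

```python
import json
import urllib.request

# Hypothetical connector settings; property names follow the Debezium 1.x MySQL connector.
MYSQL_CONFIG = {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "tasks.max": "1",
    "database.hostname": "mysql",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "dbz",
    "database.server.id": "184054",
    "database.server.name": "dbserver1",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "dbhistory.inventory",
}


def build_register_request(connect_url: str, name: str, config: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /connectors request for the Kafka Connect REST API."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        connect_url.rstrip("/") + "/connectors",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def register_connector(connect_url: str, name: str, config: dict) -> dict:
    """Send the request and return Kafka Connect's JSON description of the new connector."""
    request = build_register_request(connect_url, name, config)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, with a worker exposing its REST API on the default port, `register_connector("http://localhost:8083", "inventory-connector", MYSQL_CONFIG)` would submit the configuration. Building the request separately from sending it makes it easy to inspect the payload before touching a live cluster.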

Configuring Debezium Topics

Debezium uses multiple topics (either via Kafka Connect or directly) for storing data. The topics must be created either by an administrator or by Kafka itself, if automatic topic creation is enabled. The following limitations and recommendations apply to these topics:

  • Database history topic (for MySQL connector)

    • Infinite (or very long) retention (no compaction!)

    • Replication factor at least 3 for production

    • Single partition

  • Other topics

    • Compaction enabled

    • Replicated in production

    • Single partition

      • You can relax the single-partition rule, but your application must then handle out-of-order events for different rows in the database (events for a single row are still totally ordered). If multiple partitions are used, Kafka by default determines the partition by hashing the record key. Other partitioning strategies require using SMTs (single message transforms) to set the partition number for each record.
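The recommendations above map onto topic-level Kafka settings. The following Python sketch assumes you create topics manually with the `kafka-topics.sh` tool shipped with Kafka, in a version that accepts `--bootstrap-server` (the script's name and path depend on your installation). It builds the command line for either kind of topic. The history topic keeps the `delete` cleanup policy with unlimited retention (`retention.ms=-1`), while other topics get `cleanup.policy=compact`. The function name, topic names, and bootstrap address are hypothetical.

```python
def topic_create_command(
    topic: str,
    *,
    bootstrap_server: str,
    history: bool,
    replication_factor: int = 3,
    partitions: int = 1,
) -> list[str]:
    """Build a kafka-topics.sh invocation following the topic recommendations."""
    if history:
        # Database history topic: keep every record forever, never compact.
        configs = {"cleanup.policy": "delete", "retention.ms": "-1", "retention.bytes": "-1"}
    else:
        # Other topics: compaction retains at least the latest event for each key.
        configs = {"cleanup.policy": "compact"}
    command = [
        "kafka-topics.sh",
        "--bootstrap-server", bootstrap_server,
        "--create",
        "--topic", topic,
        "--partitions", str(partitions),
        "--replication-factor", str(replication_factor),
    ]
    for key, value in configs.items():
        command += ["--config", f"{key}={value}"]
    return command
```

The returned list can be passed to `subprocess.run(command, check=True)`. The defaults encode a single partition and a replication factor of 3. Lower the replication factor only for development clusters with fewer brokers.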

Using the Debezium libraries

Although Debezium is really intended to be used as a turnkey service, all of Debezium’s JARs and other artifacts are available in Maven Central. For example, you might want to use the MySQL DDL parser from our MySQL connector library to parse the DDL statements that your consumers read from the MySQL schema change topics.

We also provide a small library that lets applications embed any Kafka Connect connector and consume data change events read directly from the source system. This is a much lighter-weight setup (Zookeeper, Kafka, and Kafka Connect services are not needed), but as a consequence it is less fault tolerant and reliable, because the application must manage and maintain all the state normally kept in Kafka’s distributed and replicated logs. It’s well suited to tests and, with careful consideration, may be useful in some applications.
