Debezium Blog

Create new topics / pipes

When you are working with Kafka Connect Distributed then you might have realized that once you start Kafka Connect there are already some internal Kafka Connect related topics created for you:

$ kafka-topics.sh --bootstrap-server $HOSTNAME:9092 --list

connect_configs
connect_offsets
connect_statuses

This is done automatically for you by Kafka Connect with a sane, customized default topic configuration that fits the needs of these internal topics.

When you start a Debezium connector the topics for the captured events are created by the Kafka broker based on a default, maybe customized, configuration in the broker if auto.create.topics.enable = true is enabled in the broker config:

auto.create.topics.enable = true
default.replication.factor = 1
num.partitions = 1
compression.type = producer
log.cleanup.policy = delete
log.retention.ms = 604800000  ## 7 days

But often, when you use Debezium and Kafka in a production environment you might choose to disable Kafka’s topic auto creation capability with auto.create.topics.enable = false, or you want the connector topics to be configured differently from the default. In this case you have to create topics for Debezium’s captured data sources upfront.
But there’s good news! Beginning with Kafka Connect version 2.6.0, this can be automated since KIP-158 is implemented to enable customizable topic creation with Kafka Connect.

Setting up change data capture (CDC) pipelines with Debezium typically is a matter of configuration, without any programming being involved. It’s still a very good idea to have automated tests for your CDC set-up, making sure that everything is configured correctly and that your Debezium connectors are set up as intended.

There’s two main components involved whose configuration need consideration:

  • The source database: it must be set up so that Debezium can connect to it and retrieve change events; details depend on the specific database, e.g. for MySQL the binlog must be in "row" mode, for Postgres, one of the supported logical decoding plug-ins must be installed, etc.

  • The Debezium connector: it must be configured using the right database host and credentials, possibly using SSL, applying table and column filters, potentially one or more single message transformations (SMTs), etc.

We have developed a Debezium connector for usage with Db2 which is now available as part of the Debezium incubator. Here we describe the use case we have for Change Data Capture (CDC), the various approaches that already exist in the Db2 ecology, and how we came to Debezium. In addition, we motivate the approach we took to implementing the Db2 Debezium connector.

This article is a dive into the realms of Event Sourcing, Command Query Responsibility Segregation (CQRS), Change Data Capture (CDC), and the Outbox Pattern. Much needed clarity on the value of these solutions will be presented. Additionally, two differing designs will be explained in detail with the pros/cons of each.

So why do all these solutions even matter? They matter because many teams are building microservices and distributing data across multiple data stores. One system of microservices might involve relational databases, object stores, in-memory caches, and even searchable indexes of data. Data can quickly become lost, out of sync, or even corrupted therefore resulting in disastrous consequences for mission critical systems.

Solutions that help avoid these serious problems are of paramount importance for many organizations. Unfortunately, many vital solutions are somewhat difficult to understand; Event Sourcing, CQRS, CDC, and Outbox are no exception. Please look at these solutions as an opportunity to learn and understand how they could apply to your specific use cases.

As you will find out at the end of this article, I will propose that three of these four solutions have high value, while the other should be discouraged except for the rarest of circumstances. The advice given in this article should be evaluated against your specific needs, because, in some cases, none of these four solutions would be a good fit.

Outbox as in that folder in my email client? No, not exactly but there are some similarities!

The term outbox describes a pattern that allows independent components or services to perform read your own write semantics while concurrently providing a reliable, eventually consistent view to those writes across component or service boundaries.

You can read more about the Outbox pattern and how it applies to microservices in our blog post, Reliable Microservices Data Exchange With the Outbox Patttern.

So what exactly is an Outbox Event Router?

In Debezium version 0.9.3.Final, we introduced a ready-to-use Single Message Transform (SMT) that builds on the Outbox pattern to propagate data change events using Debezium and Kafka. Please see the documentation for details on how to use this transformation.

back to top