Debezium Server

Table of Contents

Installation
Configuration
Extensions
- Implementation of a new sink
- Customization of an existing sink

This feature is currently in an incubating state, i.e. exact semantics, configuration options etc. may change in future revisions, based on the feedback we receive.

Please let us know if you encounter any problems while using this feature. Also, please reach out if you have requirements for specific sinks to be supported by Debezium Server, or would even be interested in contributing the required implementation.

Debezium provides a ready-to-use application that streams change events from a source database to messaging infrastructure like Amazon Kinesis, Google Cloud Pub/Sub, Apache Pulsar or Redis (Stream). For streaming change events to Apache Kafka, it is recommended to deploy the Debezium connectors via Kafka Connect.

Installation

To install the server, download and unpack the server distribution archive:

Debezium Server distribution

A directory named debezium-server will be created with these contents:

debezium-server/
|-- CHANGELOG.md
|-- conf
|-- CONTRIBUTE.md
|-- COPYRIGHT.txt
|-- debezium-server-1.9.8.Final-runner.jar
|-- lib
|-- LICENSE-3rd-PARTIES.txt
|-- LICENSE.txt
|-- README.md
`-- run.sh

The server is started using the run.sh script; dependencies are stored in the lib directory, and the conf directory contains configuration files.

Configuration

Debezium Server uses MicroProfile Configuration for configuration. This means that the application can be configured from disparate sources like configuration files, environment variables, system properties etc.

The main configuration file is conf/application.properties. There are multiple sections configured:

debezium.source is for source connector configuration; each instance of Debezium Server runs exactly one connector
debezium.sink is for the sink system configuration
debezium.format is for the output serialization format configuration
debezium.transforms is for the configuration of message transformations

An example configuration file can look like so:

debezium.sink.type=kinesis
debezium.sink.kinesis.region=eu-central-1
debezium.source.connector.class=io.debezium.connector.postgresql.PostgresConnector
debezium.source.offset.storage.file.filename=data/offsets.dat
debezium.source.offset.flush.interval.ms=0
debezium.source.database.hostname=localhost
debezium.source.database.port=5432
debezium.source.database.user=postgres
debezium.source.database.password=postgres
debezium.source.database.dbname=postgres
debezium.source.database.server.name=tutorial
debezium.source.schema.include.list=inventory

When the server is started, it generates a sequence of log messages like this:

__  ____  __  _____   ___  __ ____  ______
 --/ __ \/ / / / _ | / _ \/ //_/ / / / __/
 -/ /_/ / /_/ / __ |/ , _/ ,< / /_/ /\ \
--\___\_\____/_/ |_/_/|_/_/|_|\____/___/
2020-05-15 11:33:12,189 INFO  [io.deb.ser.kin.KinesisChangeConsumer] (main) Using 'io.debezium.server.kinesis.KinesisChangeConsumer$$Lambda$119/0x0000000840130c40@f58853c' stream name mapper
2020-05-15 11:33:12,628 INFO  [io.deb.ser.kin.KinesisChangeConsumer] (main) Using default KinesisClient 'software.amazon.awssdk.services.kinesis.DefaultKinesisClient@d1f74b8'
2020-05-15 11:33:12,628 INFO  [io.deb.ser.DebeziumServer] (main) Consumer 'io.debezium.server.kinesis.KinesisChangeConsumer' instantiated
2020-05-15 11:33:12,754 INFO  [org.apa.kaf.con.jso.JsonConverterConfig] (main) JsonConverterConfig values:
	converter.type = key
	decimal.format = BASE64
	schemas.cache.size = 1000
	schemas.enable = true

2020-05-15 11:33:12,757 INFO  [org.apa.kaf.con.jso.JsonConverterConfig] (main) JsonConverterConfig values:
	converter.type = value
	decimal.format = BASE64
	schemas.cache.size = 1000
	schemas.enable = false

2020-05-15 11:33:12,763 INFO  [io.deb.emb.EmbeddedEngine$EmbeddedConfig] (main) EmbeddedConfig values:
	access.control.allow.methods =
	access.control.allow.origin =
	admin.listeners = null
	bootstrap.servers = [localhost:9092]
	client.dns.lookup = default
	config.providers = []
	connector.client.config.override.policy = None
	header.converter = class org.apache.kafka.connect.storage.SimpleHeaderConverter
	internal.key.converter = class org.apache.kafka.connect.json.JsonConverter
	internal.value.converter = class org.apache.kafka.connect.json.JsonConverter
	key.converter = class org.apache.kafka.connect.json.JsonConverter
	listeners = null
	metric.reporters = []
	metrics.num.samples = 2
	metrics.recording.level = INFO
	metrics.sample.window.ms = 30000
	offset.flush.interval.ms = 0
	offset.flush.timeout.ms = 5000
	offset.storage.file.filename = data/offsets.dat
	offset.storage.partitions = null
	offset.storage.replication.factor = null
	offset.storage.topic =
	plugin.path = null
	rest.advertised.host.name = null
	rest.advertised.listener = null
	rest.advertised.port = null
	rest.extension.classes = []
	rest.host.name = null
	rest.port = 8083
	ssl.client.auth = none
	task.shutdown.graceful.timeout.ms = 5000
	topic.tracking.allow.reset = true
	topic.tracking.enable = true
	value.converter = class org.apache.kafka.connect.json.JsonConverter

2020-05-15 11:33:12,763 INFO  [org.apa.kaf.con.run.WorkerConfig] (main) Worker configuration property 'internal.key.converter' is deprecated and may be removed in an upcoming release. The specified value 'org.apache.kafka.connect.json.JsonConverter' matches the default, so this property can be safely removed from the worker configuration.
2020-05-15 11:33:12,763 INFO  [org.apa.kaf.con.run.WorkerConfig] (main) Worker configuration property 'internal.value.converter' is deprecated and may be removed in an upcoming release. The specified value 'org.apache.kafka.connect.json.JsonConverter' matches the default, so this property can be safely removed from the worker configuration.
2020-05-15 11:33:12,765 INFO  [org.apa.kaf.con.jso.JsonConverterConfig] (main) JsonConverterConfig values:
	converter.type = key
	decimal.format = BASE64
	schemas.cache.size = 1000
	schemas.enable = true

2020-05-15 11:33:12,765 INFO  [org.apa.kaf.con.jso.JsonConverterConfig] (main) JsonConverterConfig values:
	converter.type = value
	decimal.format = BASE64
	schemas.cache.size = 1000
	schemas.enable = true

2020-05-15 11:33:12,767 INFO  [io.deb.ser.DebeziumServer] (main) Engine executor started
2020-05-15 11:33:12,773 INFO  [org.apa.kaf.con.sto.FileOffsetBackingStore] (pool-3-thread-1) Starting FileOffsetBackingStore with file data/offsets.dat
2020-05-15 11:33:12,835 INFO  [io.deb.con.com.BaseSourceTask] (pool-3-thread-1) Starting PostgresConnectorTask with configuration:
2020-05-15 11:33:12,837 INFO  [io.deb.con.com.BaseSourceTask] (pool-3-thread-1)    connector.class = io.debezium.connector.postgresql.PostgresConnector
2020-05-15 11:33:12,837 INFO  [io.deb.con.com.BaseSourceTask] (pool-3-thread-1)    offset.flush.interval.ms = 0
2020-05-15 11:33:12,838 INFO  [io.deb.con.com.BaseSourceTask] (pool-3-thread-1)    database.user = postgres
2020-05-15 11:33:12,838 INFO  [io.deb.con.com.BaseSourceTask] (pool-3-thread-1)    database.dbname = postgres
2020-05-15 11:33:12,838 INFO  [io.deb.con.com.BaseSourceTask] (pool-3-thread-1)    offset.storage.file.filename = data/offsets.dat
2020-05-15 11:33:12,838 INFO  [io.deb.con.com.BaseSourceTask] (pool-3-thread-1)    database.hostname = localhost
2020-05-15 11:33:12,838 INFO  [io.deb.con.com.BaseSourceTask] (pool-3-thread-1)    database.password = ********
2020-05-15 11:33:12,839 INFO  [io.deb.con.com.BaseSourceTask] (pool-3-thread-1)    name = kinesis
2020-05-15 11:33:12,839 INFO  [io.deb.con.com.BaseSourceTask] (pool-3-thread-1)    database.server.name = tutorial
2020-05-15 11:33:12,839 INFO  [io.deb.con.com.BaseSourceTask] (pool-3-thread-1)    database.port = 5432
2020-05-15 11:33:12,839 INFO  [io.deb.con.com.BaseSourceTask] (pool-3-thread-1)    schema.include.list = inventory
2020-05-15 11:33:12,908 INFO  [io.quarkus] (main) debezium-server 1.2.0-SNAPSHOT (powered by Quarkus 1.4.1.Final) started in 1.198s. Listening on: http://0.0.0.0:8080
2020-05-15 11:33:12,911 INFO  [io.quarkus] (main) Profile prod activated.
2020-05-15 11:33:12,911 INFO  [io.quarkus] (main) Installed features: [cdi, smallrye-health]

Source configuration

The source configuration uses the same configuration properties that are described on the specific connector documentation pages (just with the debezium.source prefix), together with a few more specific ones that are necessary for running outside of Kafka Connect:

Property	Default	Description
`debezium.source.connector.class`		The name of the Java class implementing the source connector.
`debezium.source.offset.storage`	`org.apache.kafka.connect.storage.FileOffsetBackingStore`	Class to use for storing and retrieving offsets for non-Kafka deployments. To use Redis to store offsets, use `io.debezium.server.redis.RedisOffsetBackingStore`
`debezium.source.offset.storage.file.filename`		If using a file offset store (default), the file in which connector offsets are stored for non-Kafka deployments.
`debezium.source.offset.flush.interval.ms`		Defines how frequently the offsets are flushed into the file.
`debezium.source.offset.storage.redis.address`		(Optional) If using Redis to store offsets, an address, formatted as `host:port`, at which the Redis target streams are provided. If not supplied, will attempt to read `debezium.sink.redis.address`
`debezium.source.offset.storage.redis.user`		(Optional) If using Redis to store offsets, a user name used to communicate with Redis. If the `redis.address` configuration is not supplied, and the `redis.address` is taken from the Redis sink, will attempt to load the value from `debezium.sink.redis.user`
`debezium.source.offset.storage.redis.password`		(Optional) If using Redis to store offsets, a password (of respective user) used to communicate with Redis. A password must be set if a user is set. If the `redis.address` configuration is not supplied, and the `redis.address` is taken from the Redis sink, will attempt to load the value from `debezium.sink.redis.password`
`debezium.source.offset.storage.redis.ssl.enabled`		(Optional) If using Redis to store offsets, whether or not to use SSL to communicate with Redis. If the `redis.address` configuration is not supplied, and the `redis.address` is taken from the Redis sink, will attempt to load the value from `debezium.sink.redis.ssl.enabled`. Default is 'false'
`debezium.source.database.history`	`io.debezium.relational.history.KafkaDatabaseHistory`	Some connectors (e.g. MySQL, SQL Server, Db2, Oracle) track the database schema evolution over time and store this data in a database schema history. By default this is based on Kafka. Other options are available: `io.debezium.relational.history.FileDatabaseHistory` for non-Kafka deployments; `io.debezium.relational.history.MemoryDatabaseHistory`, a volatile store for test environments; `io.debezium.server.redis.RedisDatabaseHistory` for Redis deployments.
`debezium.source.database.history.file.filename`		The name and location of the file to which `FileDatabaseHistory` persists its data.
`debezium.source.database.history.redis.address`		The Redis host:port to connect to if using `RedisDatabaseHistory`.
`debezium.source.database.history.redis.user`		The Redis user to use if using `RedisDatabaseHistory`.
`debezium.source.database.history.redis.password`		The Redis password to use if using `RedisDatabaseHistory`.
`debezium.source.database.history.redis.ssl.enabled`		Use SSL connection if using `RedisDatabaseHistory`.
`debezium.source.database.history.redis.key`		The Redis key to use for storage if using `RedisDatabaseHistory`. Default: metadata:debezium:db_history
`debezium.source.database.history.redis.retry.initial.delay.ms`		The initial delay in case of a connection retry to Redis if using `RedisDatabaseHistory`. Default: 300 (ms)
`debezium.source.database.history.redis.retry.max.delay.ms`		The maximum delay in case of a connection retry to Redis if using `RedisDatabaseHistory`. Default: 10000 (ms)
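
As a minimal sketch of how these properties combine, the fragment below stores offsets in Redis and keeps the schema history of a history-tracking connector (e.g. MySQL) in a local file. Only the property and class names come from the table above; the Redis address and the file path are placeholder values assumed for illustration.

```properties
# Store connector offsets in Redis instead of the default file store
debezium.source.offset.storage=io.debezium.server.redis.RedisOffsetBackingStore
# Assumed address of a local Redis instance, formatted as host:port
debezium.source.offset.storage.redis.address=localhost:6379
# Keep the database schema history in a file (non-Kafka deployment)
debezium.source.database.history=io.debezium.relational.history.FileDatabaseHistory
# Assumed location of the history file
debezium.source.database.history.file.filename=data/history.dat
```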

Format configuration

The message output format can be configured for both key and value separately. By default the output is in JSON format but an arbitrary implementation of Kafka Connect’s Converter can be used.

Property	Default	Description
`debezium.format.key`	`json`	The name of the output format for key, one of `json`/`avro`/`protobuf`.
`debezium.format.key.*`		Configuration properties passed to the key converter.
`debezium.format.value`	`json`	The name of the output format for value, one of `json`/`avro`/`protobuf`.
`debezium.format.value.*`		Configuration properties passed to the value converter.
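
A short sketch of the format section could look as follows. It assumes the default JSON converter; `schemas.enable` is an option of Kafka Connect's JsonConverter (it also appears in the startup log shown earlier), used here only to illustrate how a `debezium.format.value.*` property is passed through.

```properties
# JSON for both key and value (these are also the defaults)
debezium.format.key=json
debezium.format.value=json
# Passed to the value converter; omits the schema part of each value
debezium.format.value.schemas.enable=false
```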

Transformation configuration

Before the messages are delivered to the sink, they can run through a sequence of transformations. The server supports single message transformations defined by Kafka Connect. The configuration will need to contain the list of transformations, implementation class for each transformation and configuration options for each of the transformations.

Property	Default	Description
`debezium.transforms`		The comma-separated list of symbolic names of transformations.
`debezium.transforms.<name>.type`		The name of the Java class implementing the transformation with name `<name>`.
`debezium.transforms.<name>.*`		Configuration properties passed to the transformation with name `<name>`.
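
The three property patterns above might be combined as in the following sketch. The symbolic names `unwrap` and `route` are arbitrary, and the example assumes that Debezium's `ExtractNewRecordState` and Kafka Connect's `RegexRouter` transformations are on the server's classpath; the regular expression and replacement are illustrative values only.

```properties
# Two transformations, identified by symbolic names
debezium.transforms=unwrap,route
# Implementation class for each symbolic name
debezium.transforms.unwrap.type=io.debezium.transforms.ExtractNewRecordState
debezium.transforms.route.type=org.apache.kafka.connect.transforms.RegexRouter
# Options passed to the "route" transformation (RegexRouter's regex and replacement)
debezium.transforms.route.regex=tutorial.inventory.(.*)
debezium.transforms.route.replacement=cdc-$1
```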

Additional configuration

Debezium Server runs on top of the Quarkus framework. All configuration options exposed by Quarkus are available in Debezium Server too. The most frequently used are:

Property	Default	Description
`quarkus.http.port`	8080	The port on which Debezium exposes the MicroProfile Health endpoint and other status information.
`quarkus.log.level`	INFO	The default log level for every log category.
`quarkus.log.console.json`	true	Determine whether to enable the JSON console formatting extension, which disables "normal" console formatting.

JSON logging can be disabled by setting quarkus.log.console.json=false in the conf/application.properties file, as demonstrated in the conf/application.properties.example file.
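
For illustration, the three Quarkus properties listed above could be overridden in conf/application.properties like this; the port number and log level are arbitrary example values, not recommendations.

```properties
# Expose the health endpoint on a non-default port (example value)
quarkus.http.port=8088
# More verbose logging for every log category
quarkus.log.level=DEBUG
# Plain-text console output instead of JSON
quarkus.log.console.json=false
```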

Sink configuration

Sink configuration is specific to each sink type.

The sink is selected by configuration property debezium.sink.type.

Amazon Kinesis

Amazon Kinesis is a data streaming system with support for stream sharding and other techniques for high scalability. Kinesis exposes a set of REST APIs and provides a (not-only) Java SDK that is used to implement the sink.

Property	Default	Description
`debezium.sink.type`		Must be set to `kinesis`.
`debezium.sink.kinesis.region`		A region name in which the Kinesis target streams are provided.
`debezium.sink.kinesis.endpoint`	endpoint determined by the AWS SDK	(Optional) An endpoint URL at which the Kinesis target streams are provided.
`debezium.sink.kinesis.credentials.profile`	`default`	A credentials profile name used to communicate with Amazon API.
`debezium.sink.kinesis.null.key`	`default`	Kinesis does not support the notion of messages without key. So this string will be used as message key for messages from tables without primary key.
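
The sketch below extends the earlier Kinesis example with the optional properties from this table. The endpoint URL (a hypothetical local Kinesis-compatible emulator), the profile name, and the surrogate key are assumed values chosen for illustration.

```properties
debezium.sink.type=kinesis
debezium.sink.kinesis.region=eu-central-1
# Assumed endpoint override, e.g. for a local emulator; omit to let the AWS SDK decide
debezium.sink.kinesis.endpoint=http://localhost:4566
# Hypothetical credentials profile name
debezium.sink.kinesis.credentials.profile=debezium
# Message key used for events from tables without a primary key
debezium.sink.kinesis.null.key=no-key
```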

Injection points

The Kinesis sink behaviour can be modified by custom logic that provides alternative implementations of specific functionality. When no alternative implementation is available, the default one is used.

Interface	CDI classifier	Description
`software.amazon.awssdk.services.kinesis.KinesisClient`	`@CustomConsumerBuilder`	Custom configured instance of a `KinesisClient` used to send messages to target streams.
`io.debezium.server.StreamNameMapper`		Custom implementation maps the planned destination (topic) name into a physical Kinesis stream name. By default the same name is used.

Google Cloud Pub/Sub

Google Cloud Pub/Sub is a messaging/eventing system designed for scalable batch and stream processing applications. Pub/Sub exposes a set of REST APIs and provides a (not-only) Java SDK that is used to implement the sink.

Property	Default	Description
`debezium.sink.type`		Must be set to `pubsub`.
`debezium.sink.pubsub.project.id`	system-wide default project id	A project name in which the target topics are created.
`debezium.sink.pubsub.ordering.enabled`	`true`	Pub/Sub can optionally use a message key to guarantee that messages with the same ordering key are delivered in the order in which they were sent. This feature can be disabled.
`debezium.sink.pubsub.null.key`	`default`	Tables without a primary key send messages with a `null` key. This is not supported by Pub/Sub, so a surrogate key must be used.
`debezium.sink.pubsub.batch.delay.threshold.ms`	`100`	The maximum amount of time to wait to reach element count or request bytes threshold before publishing outstanding messages to Pub/Sub.
`debezium.sink.pubsub.batch.element.count.threshold`	`100L`	Once this many messages are queued, send all of the messages in a single call, even if the delay threshold hasn’t elapsed yet.
`debezium.sink.pubsub.batch.request.byte.threshold`	`10000000L`	Once the number of bytes in the batched request reaches this threshold, send all of the messages in a single call, even if neither the delay nor the message count threshold has been exceeded yet.
`debezium.sink.pubsub.flowControl.enabled`	`false`	When enabled, configures your publisher client with flow control to limit the rate of publish requests.
`debezium.sink.pubsub.flowControl.max.outstanding.messages`	`Long.MAX_VALUE`	(Optional) If flow control is enabled, the maximum number of outstanding messages before further messages are blocked from being published.
`debezium.sink.pubsub.flowControl.max.outstanding.bytes`	`Long.MAX_VALUE`	(Optional) If flow control is enabled, the maximum number of outstanding bytes before further messages are blocked from being published.
`debezium.sink.pubsub.retry.total.timeout.ms`	`60000`	The total timeout for a call to publish (including retries) to Pub/Sub.
`debezium.sink.pubsub.retry.initial.delay.ms`	`5`	The initial amount of time to wait before retrying the request.
`debezium.sink.pubsub.retry.delay.multiplier`	`2.0`	The previous wait time is multiplied by this multiplier to come up with the next wait time, until the max is reached.
`debezium.sink.pubsub.retry.max.delay.ms`	`Long.MAX_VALUE`	The maximum amount of time to wait before retrying, i.e. after this value is reached, the wait time will not increase further by the multiplier.
`debezium.sink.pubsub.retry.initial.rpc.timeout.ms`	`10000`	Controls the timeout for the initial Remote Procedure Call.
`debezium.sink.pubsub.retry.rpc.timeout.multiplier`	`2.0`	The previous RPC timeout is multiplied by this multiplier to come up with the next RPC timeout value, until the max is reached.
`debezium.sink.pubsub.retry.max.rpc.timeout.ms`	`10000`	The max timeout for individual publish requests to Cloud Pub/Sub.
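
To show how the batching and retry properties in this table relate to each other, here is a hedged sketch of a Pub/Sub sink section. The project id is hypothetical and all numeric values are arbitrary examples rather than tuned recommendations.

```properties
debezium.sink.type=pubsub
# Hypothetical project id; omit to use the system-wide default project
debezium.sink.pubsub.project.id=my-project
debezium.sink.pubsub.ordering.enabled=true
# Surrogate key for events from tables without a primary key
debezium.sink.pubsub.null.key=default
# Publish a batch after at most 50 ms, or earlier once 500 messages are queued
debezium.sink.pubsub.batch.delay.threshold.ms=50
debezium.sink.pubsub.batch.element.count.threshold=500
# Retry waits start at 100 ms and double each time, capped at 5000 ms
debezium.sink.pubsub.retry.initial.delay.ms=100
debezium.sink.pubsub.retry.delay.multiplier=2.0
debezium.sink.pubsub.retry.max.delay.ms=5000
```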

Injection points

The Pub/Sub sink behaviour can be modified by custom logic that provides alternative implementations of specific functionality. When no alternative implementation is available, the default one is used.

Interface	CDI classifier	Description
`io.debezium.server.pubsub.PubSubChangeConsumer.PublisherBuilder`	`@CustomConsumerBuilder`	A class that provides custom configured instance of a `Publisher` used to send messages to a dedicated topic.
`io.debezium.server.StreamNameMapper`		Custom implementation maps the planned destination (topic) name into a physical Pub/Sub topic name. By default the same name is used.

HTTP Client

The HTTP Client will stream changes to any HTTP Server for additional processing with the original design goal to have Debezium act as a Knative Event Source.

Property	Default	Description
`debezium.sink.type`		Must be set to `http`
`debezium.sink.http.url`		The HTTP Server URL to stream events to. This can also be set by defining the `K_SINK` environment variable, which is used by the Knative source framework.
`debezium.sink.http.timeout.ms`	60000	The number of milliseconds to wait for a response from the server before timing out (default of 60s).
`debezium.sink.http.retries`	5	The number of retries before an exception is thrown (default 5 times).
`debezium.sink.http.retry.interval.ms`	1000	The number of milliseconds to wait before another attempt to send record is made after failure (default of 1s).
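
A minimal sketch of an HTTP sink section, using only the properties from the table above. The receiver URL is a hypothetical local endpoint and the timeout and retry values are arbitrary examples.

```properties
debezium.sink.type=http
# Hypothetical receiver URL; could instead be supplied via the K_SINK environment variable
debezium.sink.http.url=http://localhost:8081/events
# Wait up to 30 s for a response
debezium.sink.http.timeout.ms=30000
# Retry up to 3 times, waiting 2 s between attempts
debezium.sink.http.retries=3
debezium.sink.http.retry.interval.ms=2000
```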

Apache Pulsar

Apache Pulsar is a high-performance, low-latency server for server-to-server messaging. Pulsar exposes a REST API and a native endpoint, and provides a (not-only) Java client that is used to implement the sink.

Property	Default	Description
`debezium.sink.type`		Must be set to `pulsar`.
`debezium.sink.pulsar.client.*`		The Pulsar module supports pass-through configuration. The client configuration properties are passed to the client with the prefix removed. At least `serviceUrl` must be provided.
`debezium.sink.pulsar.producer.*`		The Pulsar module supports pass-through configuration. The message producer configuration properties are passed to the producer with the prefix removed. The `topic` is set by Debezium.
`debezium.sink.pulsar.null.key`	`default`	Tables without a primary key send messages with a `null` key. This is not supported by Pulsar, so a surrogate key must be used.
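
The pass-through mechanism can be sketched as follows: everything after the `debezium.sink.pulsar.client.` or `debezium.sink.pulsar.producer.` prefix is handed to the Pulsar client or producer. The broker address and producer name are assumed values; `producerName` is used here on the assumption that it is accepted as a producer configuration key by the Pulsar Java client.

```properties
debezium.sink.type=pulsar
# Passed to the Pulsar client as serviceUrl (required); assumed local broker address
debezium.sink.pulsar.client.serviceUrl=pulsar://localhost:6650
# Passed to the producer as producerName; hypothetical name
debezium.sink.pulsar.producer.producerName=debezium-server
# Surrogate key for events from tables without a primary key
debezium.sink.pulsar.null.key=default
```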

Injection points

The Pulsar sink behaviour can be modified by custom logic that provides alternative implementations of specific functionality. When no alternative implementation is available, the default one is used.

Interface	CDI classifier	Description
`io.debezium.server.StreamNameMapper`		Custom implementation maps the planned destination (topic) name into a physical Pulsar topic name. By default the same name is used.

Azure Event Hubs

Azure Event Hubs is a big data streaming platform and event ingestion service that can receive and process millions of events per second. Data sent to an event hub can be transformed and stored by using any real-time analytics provider or batching/storage adapters.

Property	Default	Description
`debezium.sink.type`		Must be set to `eventhubs`.
`debezium.sink.eventhubs.connectionstring`		Connection string required to communicate with Event Hubs. The format is: `Endpoint=sb://<NAMESPACE>/;SharedAccessKeyName=<ACCESS_KEY_NAME>;SharedAccessKey=<ACCESS_KEY_VALUE>`
`debezium.sink.eventhubs.hubname`		Name of the Event Hub
`debezium.sink.eventhubs.partitionid`		(Optional) The identifier of the Event Hub partition that the events will be sent to. Use this if you want all the change events received by Debezium to be sent to a specific partition in Event Hubs. Do not use if you have specified `debezium.sink.eventhubs.partitionkey`.
`debezium.sink.eventhubs.partitionkey`		(Optional) The partition key used to hash the events. Use this if you want all the change events received by Debezium to be sent to the same partition, which Event Hubs selects from the hash of the key. Do not use if you have specified `debezium.sink.eventhubs.partitionid`.
`debezium.sink.eventhubs.maxbatchsize`		Sets the maximum size for the batch of events, in bytes.
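
Because `partitionid` and `partitionkey` exclude each other, it can help to check a configuration before starting the server. The following is a minimal sketch of such a check; the class `EventHubsPartitionCheck` and its method are hypothetical and not part of Debezium, and how the sink itself reacts when both options are set is not specified here.

```java
import java.util.Map;

final class EventHubsPartitionCheck {

    static final String PARTITION_ID = "debezium.sink.eventhubs.partitionid";
    static final String PARTITION_KEY = "debezium.sink.eventhubs.partitionkey";

    // Describes the routing implied by the configuration, or throws if both options are set.
    static String describeRouting(Map<String, String> config) {
        boolean hasId = isSet(config.get(PARTITION_ID));
        boolean hasKey = isSet(config.get(PARTITION_KEY));
        if (hasId && hasKey) {
            throw new IllegalArgumentException(
                    "Set either " + PARTITION_ID + " or " + PARTITION_KEY + ", not both");
        }
        if (hasId) {
            return "fixed partition " + config.get(PARTITION_ID);
        }
        if (hasKey) {
            return "partition selected by hashing key " + config.get(PARTITION_KEY);
        }
        return "no explicit partition routing";
    }

    private static boolean isSet(String value) {
        return value != null && !value.isBlank();
    }

    public static void main(String[] args) {
        // prints "fixed partition 0"
        System.out.println(describeRouting(Map.of(PARTITION_ID, "0")));
        // prints "partition selected by hashing key orders"
        System.out.println(describeRouting(Map.of(PARTITION_KEY, "orders")));
    }
}
```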

Injection points

The Event Hubs sink behaviour can be modified by custom logic that provides alternative implementations of specific functionality. When no alternative implementation is available, the default one is used.

Interface	CDI classifier	Description
`com.azure.messaging.eventhubs.EventHubProducerClient`	`@CustomConsumerBuilder`	A custom-configured instance of `EventHubProducerClient` used to send messages.

Redis (Stream)

Redis is an open source (BSD licensed) in-memory data structure store, used as a database, cache and message broker. The Stream is a data type which models a log data structure in a more abstract way. It implements powerful operations to overcome the limitations of a log file.

Property	Default	Description
`debezium.sink.type`		Must be set to `redis`.
`debezium.sink.redis.address`		An address, formatted as `host:port`, at which the Redis target streams are provided.
`debezium.sink.redis.user`		(Optional) A user name used to communicate with Redis.
`debezium.sink.redis.password`		(Optional) The password of the respective user, used to communicate with Redis. A password must be set if a user is set.
`debezium.sink.redis.ssl.enabled`	`false`	(Optional) Use SSL to communicate with Redis.
`debezium.sink.redis.null.key`	`default`	Redis does not support the notion of data without a key, so this string is used as the key for records without a primary key.
`debezium.sink.redis.null.value`	`default`	Redis does not support the notion of null payloads, as is the case with tombstone events, so this string is used as the value for records without a payload.
`debezium.sink.redis.batch.size`	`500`	Number of change records to insert in a single batch write (pipelined transaction).
`debezium.sink.redis.retry.initial.delay.ms`	`300`	Initial retry delay, in milliseconds, when encountering Redis connection or OOM issues. This value is doubled on every retry but never exceeds `debezium.sink.redis.retry.max.delay.ms`.
`debezium.sink.redis.retry.max.delay.ms`	`10000`	Maximum retry delay, in milliseconds, when encountering Redis connection or OOM issues.
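
To see how the two retry settings interact, the sketch below computes the sequence of delays for the default values: start at the initial delay, double on each retry, and cap at the maximum. The class `RetryDelays` is a hypothetical illustration of that arithmetic, not the sink's implementation.

```java
import java.util.ArrayList;
import java.util.List;

final class RetryDelays {

    // Delay before each of the first `attempts` retries: doubled every time, capped at maxMs.
    static List<Long> delays(long initialMs, long maxMs, int attempts) {
        List<Long> result = new ArrayList<>();
        long delay = Math.min(initialMs, maxMs);
        for (int i = 0; i < attempts; i++) {
            result.add(delay);
            delay = Math.min(delay * 2, maxMs);
        }
        return result;
    }

    public static void main(String[] args) {
        // Defaults from the table: initial delay 300 ms, maximum delay 10000 ms.
        System.out.println(delays(300, 10_000, 8));
        // prints [300, 600, 1200, 2400, 4800, 9600, 10000, 10000]
    }
}
```

With the defaults, the cap is reached on the seventh retry, after which every further retry waits 10 seconds.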

Injection points

The Redis sink behavior can be modified by custom logic that provides alternative implementations of specific functionality. When no alternative implementation is available, the default one is used.

Interface	CDI classifier	Description
`io.debezium.server.StreamNameMapper`		A custom implementation maps the planned destination (topic) name to a physical Redis stream name. By default, the same name is used.

NATS Streaming

NATS Streaming is a data streaming system powered by NATS, and written in the Go programming language.

Property	Default	Description
`debezium.sink.type`		Must be set to `nats-streaming`.
`debezium.sink.nats-streaming.url`		URL (or comma-separated list of URLs) of a node or nodes in the cluster, formatted as `nats://host:port`.
`debezium.sink.nats-streaming.cluster.id`		NATS Streaming Cluster ID.
`debezium.sink.nats-streaming.client.id`		NATS Streaming Client ID.

Injection points

The NATS Streaming sink behavior can be modified by custom logic that provides alternative implementations of specific functionality. When no alternative implementation is available, the default one is used.

Interface	CDI classifier	Description
`io.nats.streaming.StreamingConnection`	`@CustomConsumerBuilder`	A custom-configured instance of `StreamingConnection` used to publish messages to target subjects.
`io.debezium.server.StreamNameMapper`		A custom implementation maps the planned destination (topic) name to a physical NATS Streaming subject name. By default, the same name is used.

Apache Kafka

Apache Kafka is a popular open-source platform for distributed event streaming. Debezium Server supports publishing captured change events to a configured Kafka message broker.

Property	Default	Description
`debezium.sink.type`		Must be set to `kafka`.
`debezium.sink.kafka.producer.*`		The Kafka sink adapter supports pass-through configuration. This means that all Kafka producer configuration properties are passed to the producer with the prefix removed. At least `bootstrap.servers`, `key.serializer` and `value.serializer` properties must be provided. The `topic` is set by Debezium.
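
The three producer properties named above are mandatory, so a configuration can be checked for them up front. The sketch below is a hypothetical pre-flight check (the class `KafkaSinkConfigCheck` is not part of Debezium); the broker address and serializer class are example values.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class KafkaSinkConfigCheck {

    static final String PREFIX = "debezium.sink.kafka.producer.";
    static final List<String> REQUIRED =
            List.of("bootstrap.servers", "key.serializer", "value.serializer");

    // Returns the required producer properties that are absent from the configuration.
    static List<String> missingRequired(Map<String, String> config) {
        return REQUIRED.stream()
                .filter(name -> !config.containsKey(PREFIX + name))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Example configuration that forgets the value serializer.
        Map<String, String> config = Map.of(
                "debezium.sink.type", "kafka",
                PREFIX + "bootstrap.servers", "localhost:9092",
                PREFIX + "key.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        System.out.println(missingRequired(config)); // prints [value.serializer]
    }
}
```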

Pravega

Pravega is a cloud-native storage system for event streams and data streams. This sink offers two modes: non-transactional and transactional. The non-transactional mode individually writes each event in a Debezium batch to Pravega. The transactional mode writes the Debezium batch to a Pravega transaction that commits when the batch is completed.

The Pravega sink expects the destination scope and streams to already exist.

Property	Default	Description
`debezium.sink.type`		Must be set to `pravega`.
`debezium.sink.pravega.controller.uri`	`tcp://localhost:9090`	The connection string to a Controller in the Pravega cluster.
`debezium.sink.pravega.scope`		The name of the scope in which to find the destination streams.
`debezium.sink.pravega.transaction`	`false`	Set to `true` to have the sink use Pravega transactions for each Debezium batch.
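
The practical difference between the two modes shows up when a batch is interrupted part-way. The following in-memory simulation illustrates this under the assumption that events written in a transaction become visible only when it commits; it uses plain Java lists rather than the Pravega client API, and every name in it is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class BatchWriteModes {

    // Writes a batch into `stream`; failAtIndex simulates an error at that position (-1 = no error).
    static void write(List<String> batch, List<String> stream, boolean transactional, int failAtIndex) {
        // In transactional mode events are staged and only reach the stream on commit.
        List<String> target = transactional ? new ArrayList<>() : stream;
        for (int i = 0; i < batch.size(); i++) {
            if (i == failAtIndex) {
                throw new IllegalStateException("simulated failure at event " + i);
            }
            target.add(batch.get(i));
        }
        if (transactional) {
            stream.addAll(target); // commit once the batch is complete
        }
    }

    // Returns what is visible in the stream after a write that may fail part-way.
    static List<String> visibleAfter(List<String> batch, boolean transactional, int failAtIndex) {
        List<String> stream = new ArrayList<>();
        try {
            write(batch, stream, transactional, failAtIndex);
        } catch (IllegalStateException e) {
            // The batch was interrupted; whatever already reached the stream stays there.
        }
        return stream;
    }

    public static void main(String[] args) {
        List<String> batch = List.of("e1", "e2", "e3");
        System.out.println(visibleAfter(batch, false, 2)); // prints [e1, e2]
        System.out.println(visibleAfter(batch, true, 2));  // prints []
        System.out.println(visibleAfter(batch, true, -1)); // prints [e1, e2, e3]
    }
}
```

In this model, the non-transactional mode can leave a partially written batch behind, while the transactional mode exposes either the whole batch or nothing.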

Injection points

The Pravega sink behavior can be modified by custom logic that provides alternative implementations of specific functionality. When no alternative implementation is available, the default one is used.

Interface	CDI classifier	Description
`io.debezium.server.StreamNameMapper`		A custom implementation maps the planned destination (stream) name to a physical Pravega stream name. By default, the same name is used.

Extensions

Debezium Server uses the Quarkus framework and relies on dependency injection to enable developers to extend its behaviour. Note that only the JVM mode of Quarkus is supported; native execution via GraalVM is not. The server can be extended with custom logic in two ways:

implementation of a new sink
customization of an existing sink - i.e. non-standard configuration

Implementation of a new sink

A new sink can be implemented as a CDI bean that implements the DebeziumEngine.ChangeConsumer interface, is annotated with @Named and a unique name, and has the scope @Dependent. The name of the bean is used as the value of the debezium.sink.type option.

The sink needs to read its configuration using the MicroProfile Config API. The execution path must pass the messages to the target system and regularly commit the passed/processed messages.

See the Kinesis sink implementation for further details.
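
The shape of such a sink can be sketched without any dependencies. To keep the example self-contained, the two nested interfaces below are local stand-ins whose method names are assumed to mirror `DebeziumEngine.ChangeConsumer` and `DebeziumEngine.RecordCommitter`; a real sink would implement the Debezium interface, handle Debezium change events rather than strings, and carry the `@Named` and `@Dependent` annotations. The class `LoggingSink` and its in-memory log are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class NewSinkSketch {

    // Stand-in for DebeziumEngine.RecordCommitter (assumed shape).
    interface RecordCommitter<R> {
        void markProcessed(R record) throws InterruptedException;
        void markBatchFinished() throws InterruptedException;
    }

    // Stand-in for DebeziumEngine.ChangeConsumer (assumed shape).
    interface ChangeConsumer<R> {
        void handleBatch(List<R> records, RecordCommitter<R> committer) throws InterruptedException;
    }

    // A real sink would be annotated with @Named("<unique name>") and @Dependent.
    static final class LoggingSink implements ChangeConsumer<String> {
        private final List<String> log; // hypothetical "target system"

        LoggingSink(List<String> log) {
            this.log = log;
        }

        @Override
        public void handleBatch(List<String> records, RecordCommitter<String> committer)
                throws InterruptedException {
            for (String record : records) {
                log.add("sent:" + record);       // pass the message to the target system
                committer.markProcessed(record); // mark the message as processed
            }
            committer.markBatchFinished();       // commit the batch
        }
    }

    // Drives the sink with one batch and returns the order in which calls happened.
    static List<String> run(List<String> batch) {
        List<String> log = new ArrayList<>();
        RecordCommitter<String> committer = new RecordCommitter<String>() {
            @Override
            public void markProcessed(String record) {
                log.add("processed:" + record);
            }

            @Override
            public void markBatchFinished() {
                log.add("batch-finished");
            }
        };
        try {
            new LoggingSink(log).handleBatch(batch, committer);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("event-1", "event-2")));
        // prints [sent:event-1, processed:event-1, sent:event-2, processed:event-2, batch-finished]
    }
}
```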

Customization of an existing sink

Some of the sinks expose dependency injection points that enable users to provide their own bean to modify the behaviour of the sink. Typical examples are fine-tuning of the target client setup, the destination naming, etc.

See an example of a custom topic naming policy implementation for further details.
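
As an illustration of destination naming, the sketch below prefixes every destination name. The nested `StreamNameMapper` interface is a local stand-in, assumed to have the same single `map` method as `io.debezium.server.StreamNameMapper`; in Debezium Server the mapper would be a CDI bean implementing the real interface. The class `PrefixingNameMapper` and the `cdc` prefix are hypothetical.

```java
final class TopicNamingSketch {

    // Stand-in for io.debezium.server.StreamNameMapper (assumed single-method shape).
    interface StreamNameMapper {
        String map(String topic);
    }

    // Maps the planned destination name to a physical name by adding a fixed prefix.
    static final class PrefixingNameMapper implements StreamNameMapper {
        private final String prefix;

        PrefixingNameMapper(String prefix) {
            this.prefix = prefix;
        }

        @Override
        public String map(String topic) {
            return prefix + "." + topic;
        }
    }

    public static void main(String[] args) {
        StreamNameMapper mapper = new PrefixingNameMapper("cdc");
        System.out.println(mapper.map("inventory.customers")); // prints cdc.inventory.customers
    }
}
```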