When a Debezium connector is deployed to a Kafka Connect instance it is sometimes necessary to keep database credentials hidden from other users of the Connect API.

Let’s remind how a connector registration request looks like for the MySQL Debezium connector:

{
    "name": "inventory-connector",
    "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "tasks.max": "1",
        "database.hostname": "mysql",
        "database.port": "3306",
        "database.user": "debezium",
        "database.password": "dbz",
        "database.server.id": "184054",
        "database.server.name": "dbserver1",
        "database.whitelist": "inventory",
        "database.history.kafka.bootstrap.servers": "kafka:9092",
        "database.history.kafka.topic": "schema-changes.inventory"
    }
}

The username and password are passed to the API as plain strings. Worse yet, anybody who has access to the Kafka Connect cluster and its REST API can issue a GET request to obtain a configuration of the connector including the database credentials:

curl -s http://localhost:8083/connectors/inventory-connector | jq .
{
  "name": "inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.user": "debezium",
    "database.server.id": "184054",
    "tasks.max": "1",
    "database.hostname": "mysql",
    "database.password": "dbz",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "schema-changes.inventory",
    "name": "inventory-connector",
    "database.server.name": "dbserver1",
    "database.whitelist": "inventory",
    "database.port": "3306"
  },
  "tasks": [
    {
      "connector": "inventory-connector",
      "task": 0
    }
  ],
  "type": "source"
}

If one Kafka Connect cluster is shared by multiple connectors/teams, then this behaviour can be undesiable for security reasons.

To solve the problem KIP-297 ("Externalizing Secrets for Connect Configurations") was implemented in Kafka 2.0.

The externalization expects there is at least one implementation class of the org.apache.kafka.common.config.provider.ConfigProvider interface. Kafka Connect provides the reference implementation org.apache.kafka.common.config.provider.FileConfigProvider that reads secrets from a file. Available config providers are configured at Kafka Connect worker level (e.g. in connect-distributed.properties) and are referred to from the connector configuration.

An example of worker configuration would be this:

config.providers=file
config.providers.file.class=org.apache.kafka.common.config.provider.FileConfigProvider

and the connector registration request will refer to it like so:

{
    "name": "inventory-connector",
    "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "tasks.max": "1",
        "database.hostname": "mysql",
        "database.port": "3306",
        "database.user": "${file:/secrets/mysql.properties:user}",
        "database.password": "${file:/secrets/mysql.properties:password}",
        "database.server.id": "184054",
        "database.server.name": "dbserver1",
        "database.whitelist": "inventory",
        "database.history.kafka.bootstrap.servers": "kafka:9092",
        "database.history.kafka.topic": "schema-changes.inventory"
    }
}

Here, the Placeholder ${file:/secrets/mysql.properties:user} says that the file config provider should be used, reading the property file /secrets/mysql.properties and extracting the user property from it.

The file config provider is probably the simplest possible implementation, and it can be expected that other providers will appear that will integrate with secret repositories or identity management systems. It should be noted though that the file config provider is satisfactory in Kubernetes/OpenShift deployments, as secrets objects could be injected into cluster pods as files and thus consumed by it.

We’ve created a version of the Debezium tutorial example, which demonstrates a deployment of externalized secrets. Please note the two environment variables in the Docker Compose connect service:

- CONNECT_CONFIG_PROVIDERS=file
- CONNECT_CONFIG_PROVIDERS_FILE_CLASS=org.apache.kafka.common.config.provider.FileConfigProvider

These environment variables are directly mapped into Kafka Connect worker properties as a functionality of the debezium/connect image.

When you issue the REST call to get the connector configuration, you will see that the sensitive information is externalized and masked from unauthorized users:

curl -s http://localhost:8083/connectors/inventory-connector | jq .
{
  "name": "inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.user": "${file:/secrets/mysql.properties:user}",
    "database.server.id": "184054",
    "tasks.max": "1",
    "database.hostname": "mysql",
    "database.password": "${file:/secrets/mysql.properties:password}",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "schema-changes.inventory",
    "name": "inventory-connector",
    "database.server.name": "dbserver1",
    "database.whitelist": "inventory",
    "database.port": "3306"
  },
  "tasks": [
    {
      "connector": "inventory-connector",
      "task": 0
    }
  ],
  "type": "source"
}

Please refer to the README of the tutorial example for complete instructions.

Jiri Pechanec

Jiri is a software developer (and a former quality engineer) at Red Hat. He spent most of his career with Java and system integration projects and tasks. He lives near Brno, Czech Republic.

   


About Debezium

Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.

Get involved

We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Gitter, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve ours existing connectors and add even more connectors. If you find problems or have ideas how we can improve Debezium, please let us know or log an issue.