Debezium 0.6.2 Is Released

We are accelerating! Three weeks after the 0.6.1 release, the Debezium team is bringing Debezium 0.6.2 to you!

This release revolves mostly around bug fixes, but there are a few new features, too. Let’s take a closer look at some of the changes.

PostgreSQL Connector

The big news for the Postgres connector is that Debezium now runs against PostgreSQL 10, thanks to a contribution from Scofield Xu. As a part of this change we are providing a Docker image with PostgreSQL 10, too, and we have set up a daily run of our integration tests against it.
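
If you'd like to give it a quick spin, the new image can be started just like our existing Postgres images; a minimal sketch (the exact tag name for the PostgreSQL 10 variant is an assumption here, so please check the debezium organization on Docker Hub for the tags actually published):

# start a PostgreSQL 10 instance with the Debezium logical decoding plug-in pre-installed
# (the "10" tag is an assumption; consult Docker Hub for the actual tag)
docker run -d --name postgres -p 5432:5432 \
  -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=postgres \
  debezium/postgres:10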

If you are building Postgres yourself using the Debezium logical decoding plug-in, you can save quite a few megabytes if you don’t need the PostGIS geometric extension: thanks to the work by Danila Kiver, it’s now possible to omit that extension.

MySQL Connector

We’ve received multiple reports related to parsing MySQL DDL statements; e.g. a few specific invocations of the ALTER TABLE statement weren’t handled correctly. Those, as well as a few other parser bugs, have been fixed.

If you work with the TIMESTAMP column type and your Kafka Connect server isn’t using UTC as its timezone, then the fix for DBZ-260 applies to you. Previously, the ISO 8601 formatted string emitted by Debezium incorrectly contained the UTC date and time together with the zone offset of the time zone the Kafka Connect server is located in. Now it contains the date and time adjusted to that zone offset. This may require adjustments to downstream consumers if they were relying on the previous, incorrect behavior.

DBZ-217 gives you more flexibility for handling corrupt events encountered in the MySQL binlog. By default, the connector will stop at the problematic event in such a case, but you now also have the option to just log the event and its position and continue processing after it.
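
A minimal sketch of enabling the lenient behavior in the connector configuration; the option and value names below (with fail being the default and warn logging the event and continuing) are our reading of the feature, so please double-check them against the connector documentation for your version:

"event.deserialization.failure.handling.mode": "warn",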

Another nice improvement for the MySQL connector is a much reduced CPU load after the snapshot has been completed, when using the "snapshot only" mode (DBZ-396).
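
That mode is selected via the snapshot.mode connector property; a minimal sketch, assuming the initial_only value name described in the connector documentation:

"snapshot.mode": "initial_only",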

MongoDB Connector

This connector received an important fix that applies when more than one thread is used to perform the initial snapshot (DBZ-438). Before, individual messages could get lost during snapshotting; this is fixed now.
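
For reference, the number of snapshotting threads is controlled in the connector configuration; a minimal sketch, assuming the initial.sync.max.threads property name from the connector documentation:

"initial.sync.max.threads": "4",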

Examples and Docker Images

We have expanded our examples repository with an Avro example, which may be interesting to you if you’d like to work not with JSON messages but rather with the compact Avro binary format and the Confluent schema registry.
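
In essence, switching to Avro comes down to configuring the Confluent Avro converter in the Kafka Connect worker configuration; a minimal sketch, assuming a schema registry reachable at http://schema-registry:8081:

key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://schema-registry:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://schema-registry:8081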

As a part of our release process we are now creating micro tags for our Docker images for every released version. While tags in the format x.y.z are fixed in time, tags in the format x.y are rolling updates and always point to the latest micro release of that image.
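
For example, using the Kafka Connect image (the same scheme applies to our other images):

# pinned to one specific release; this tag never changes
docker pull debezium/connect:0.6.2

# rolling tag; always points to the latest 0.6.x micro release
docker pull debezium/connect:0.6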

Please see the full change log for more details and the complete list of fixed issues.

What’s next?

The Debezium 0.7 release is planned to be out two to three weeks from now.

It will contain the move to Apache Kafka 1.0.0 and bring support for the wal2json logical decoding plug-in for Postgres. This will eventually allow using the Debezium Postgres connector on Amazon RDS (once the correct wal2json version is available there).

In parallel, the work around handling updates to the whitelist configuration of the MySQL connector continues (it may be ready for 0.7.0), and so does the work on the Oracle connector (which will be shipping in a future release).

If you’d like to contribute, please let us know. We’re happy about any help and will work with you to get you started quickly. Check out the details below on how to get in touch.



Debezium at Devoxx Belgium

Debezium’s project lead Gunnar Morling gave a few talks at the recent Devoxx Belgium 2017 conference. One of his talks was dedicated to Debezium and change data capture in general.

If you are interested in these topics and want a fast and simple introduction to them, do not hesitate to watch the talk. Batteries and demo included!

The slide deck is available, too.



Debezium 0.6.1 Is Released

Just shy of a month after the 0.6.0 release, I’m happy to announce the release of Debezium 0.6.1!

This release contains several bugfixes, dependency upgrades and a new option for controlling how BIGINT UNSIGNED columns are conveyed. We also expanded the set of Docker images and Docker Compose files accompanying our tutorial, so you can run it now with all the databases we support.

Let’s take a closer look at some of the changes.

New connector option for controlling BIGINT UNSIGNED representation

BIGINT UNSIGNED columns from MySQL databases have been represented using Kafka Connect’s Decimal type until now. This type can represent all possible values of such columns, but it’s based on a byte array, which can be a bit cumbersome for consumers to handle. Therefore we added a new option named bigint.unsigned.handling.mode to the MySQL connector that allows representing such columns using long.

In most cases that’s the preferable option; only if your column contains values larger than 2^63 - 1 (which MySQL doesn’t recommend anyway due to potential value loss when performing calculations) should you stick to the Decimal representation.

Using long will be the default as of Debezium 0.7; for the 0.6.x timeline we decided to keep the previous behavior (i.e. using Decimal) for the sake of backwards compatibility.
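
If you’d like to opt into the new representation on 0.6.1 already, set the option in your MySQL connector configuration; a minimal sketch (we’re assuming long as the value name here, matching the option’s description above):

"bigint.unsigned.handling.mode": "long",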

Thanks a lot to Ben Williams who contributed this feature!

New example Docker images and Docker Compose files

In the Debezium examples repository we now provide Docker Compose files which let you run the tutorial with all three databases we currently support: MySQL, Postgres and MongoDB.

Just choose the Compose file for your preferred database and get all the required components (ZooKeeper, Apache Kafka, Kafka Connect and the database) running within a few seconds.
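
For instance, running the tutorial against Postgres could look like this (the Compose file name is an assumption here; see the examples repository for the actual file names):

export DEBEZIUM_VERSION=0.6
docker-compose -f docker-compose-postgres.yaml up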

We’ve also deployed Docker images for Postgres and MongoDB to the Debezium organization on Docker Hub, so you have some data to play with.

Version upgrades

We’ve upgraded our images from Kafka 0.11.0.0 to 0.11.0.1. Also the binlog client library used by the MySQL connector was upgraded from 0.9.0 to 0.13.0.

Bugfixes

Finally, several bugs were fixed in 0.6.1. E.g. you can now name a column "column" in MySQL (DBZ-408), generated DROP TEMP TABLE statements won’t flood the DB history topic any longer (DBZ-295), and we’ve fixed a case where the Postgres connector would stop working due to an internal error but fail to report it via the task/connector status (DBZ-380).
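
To illustrate the first fix, a hypothetical table definition like the following, which previously tripped up the connector’s DDL parser, is now handled correctly (note the backticks MySQL requires around the reserved word):

CREATE TABLE example (
  id INT PRIMARY KEY,
  `column` VARCHAR(255)
);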

Please see the full change log for more details and the complete list of fixed issues.

What’s next?

The work on Debezium 0.7 has already begun and we’ve merged the first set of changes. You can expect to see support for using the wal2json logical decoding plug-in with the Postgres connector, which will finally make it possible to use Debezium with Postgres on Amazon RDS! We’ve also started our exploration of providing a connector for Oracle (DBZ-20) and hope to report some progress here soon.

While the work on Debezium 0.7 continues, you will likely continue to see one or more 0.6.x bugfix releases. We’ve automated the release process as much as possible, making it a breeze to ship a new release and get fixes into your hands quickly.

If you’d like to contribute, please let us know. We’re happy about any help and will work with you to get you started quickly. Check out the details below on how to get in touch.



Streaming data to a downstream database

In this blog post we will create a simple streaming data pipeline to continuously capture the changes in a MySQL database and replicate them in near real-time into a PostgreSQL database. We’ll show how to do this without writing any code, but instead by using and configuring Kafka Connect, the Debezium MySQL source connector, the Confluent JDBC sink connector, and a few single message transforms (SMTs).

This approach of replicating data through Kafka is really useful on its own, but it becomes even more advantageous when we can combine our near real-time streams of data changes with other streams, connectors, and stream processing applications. A recent Confluent blog post series shows a similar streaming data pipeline but using different connectors and SMTs. What’s great about Kafka Connect is that you can mix and match connectors to move data between multiple systems.

We will also demonstrate a new functionality that was released with Debezium 0.6.0: a single message transform for CDC Event Flattening.

Topology

The general topology for this scenario is shown in the following figure:

Figure 1: A general topology

To simplify the setup a little bit, we will use only one Kafka Connect instance that will contain all the connectors; i.e. this instance will serve as both an event producer and an event consumer:

Figure 2: A simplified topology

Configuration

We will use this Docker Compose file for a fast deployment of the demo. The deployment consists of the following components: ZooKeeper, Apache Kafka, Kafka Connect, a MySQL database (the source) and a PostgreSQL database (the target).

The Debezium MySQL Connector was designed to specifically capture database changes and provide as much information as possible about those events, beyond just the new state of each row. Meanwhile, the Confluent JDBC Sink Connector was designed to simply convert each message into a database insert/upsert based upon the structure of the message. So the two connectors expect different message structures, but they also use different topic naming conventions and represent deleted records differently.

These mismatches in structure and behavior will be common when using connectors that were not designed to work together. But this is something that we can easily deal with, and we discuss how in the next few sections.

Event format

Debezium emits events in a complex format that contains all of the information about the captured data change: the type of operation, source metadata, the timestamp at which the event was processed by the connector, and the state of the row before and after the change was made. Debezium calls this structure an "envelope":

{
	"op": "u",
	"source": {
		...
	},
	"ts_ms" : "...",
	"before" : {
		"field1" : "oldvalue1",
		"field2" : "oldvalue2"
	},
	"after" : {
		"field1" : "newvalue1",
		"field2" : "newvalue2"
	}
}

Many other Kafka Connect source connectors don’t have the luxury of knowing this much about the changes, and instead use a simpler model where each message directly represents the after state of the row. This is also what many sink connectors expect, and the Confluent JDBC Sink Connector is no different:

{
	"field1" : "newvalue1",
	"field2" : "newvalue2"
}

While we think it’s actually a great thing that Debezium CDC connectors provide as much detail as possible, we also make it easy for you to transform Debezium’s "envelope" format into the "row" format that is expected by many other connectors. Debezium provides a bridge between those two formats in the form of a single message transform. The UnwrapFromEnvelope transformation automatically extracts the new row record, effectively flattening the complex record into a simple one consumable by other connectors.

You can use this SMT on the source connector to transform the message before it is written to Kafka, or you can instead store the source connector’s richer "envelope" form of the message in Kafka and use this SMT on the sink connector to transform the message after it is read from Kafka and before it is passed to the sink connector. Both options work, and it just depends on whether you find the envelope form of the message useful for other purposes.

In our example we apply the SMT at the sink connector using these configuration properties:

"transforms": "unwrap",
"transforms.unwrap.type": "io.debezium.transforms.UnwrapFromEnvelope",

Delete records

When the Debezium connector detects a row is deleted, it creates two event messages: a delete event and a tombstone message. The delete message has an envelope with the state of the deleted row in the before field, and an after field that is null. The tombstone message contains the same key as the delete message, but the entire message value is null, and Kafka’s log compaction uses this to know that it can remove any earlier messages with the same key. A number of sink connectors, including Confluent’s JDBC Sink Connector, do not expect these messages and will fail if they see either kind of message. The UnwrapFromEnvelope SMT will by default filter out both delete and tombstone records, though you can change this if you’re using the SMT and want to keep one or both of these kinds of messages.
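
To illustrate, deleting the row with key id 1001 results in two messages along these lines (a conceptual sketch; the envelope fields are abbreviated):

delete event (an envelope with the old row state in before, and a null after):

key:   { "id" : 1001 }
value: { "op" : "d", "before" : { ... }, "after" : null, ... }

tombstone (same key, entire value null):

key:   { "id" : 1001 }
value: null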

Topic naming

Last but not least, there is a difference in the naming of topics. Debezium uses a fully qualified name for the target topic of each table it manages, following the pattern <logical-name>.<database-name>.<table-name> (e.g. dbserver1.inventory.customers), whereas the Kafka Connect JDBC Connector works with simple names of the form <table-name>.

In more complex scenarios the user may deploy the Kafka Streams framework to establish more elaborate routing between source and target topics. In our example we will use the stock RegexRouter SMT to route records created by Debezium into topics named according to the JDBC Connector’s scheme. Again, we could use this SMT in either the source or the sink connector, but for this example we’re going to use it in the source connector so we can choose the names of the Kafka topics where the records will be written:

"transforms": "route",
"transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.route.regex": "([^.]+)\\.([^.]+)\\.([^.]+)",
"transforms.route.replacement": "$3"

Example

Let’s kick the tires and try our example!

First of all we need to deploy all the components:

export DEBEZIUM_VERSION=0.6
docker-compose up

When all components are started, we will register the JDBC sink connector, which writes into the PostgreSQL database:

curl -i -X POST -H "Accept:application/json" -H  "Content-Type:application/json" http://localhost:8083/connectors/ -d @jdbc-sink.json

Using this registration request:

{
    "name": "jdbc-sink",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        "tasks.max": "1",
        "topics": "customers",
        "connection.url": "jdbc:postgresql://postgres:5432/inventory?user=postgresuser&password=postgrespw",
        "transforms": "unwrap",                                                  (1)
        "transforms.unwrap.type": "io.debezium.transforms.UnwrapFromEnvelope",   (1)
        "auto.create": "true",                                                   (2)
        "insert.mode": "upsert",                                                 (3)
        "pk.fields": "id",                                                       (4)
        "pk.mode": "record_value"                                                (4)
    }
}

The request configures these options:

  1. unwrapping Debezium’s complex format into a simple one

  2. automatically create target tables

  3. insert a row if it does not exist or update an existing one

  4. identify the primary key stored in Kafka’s record value field

Then the source connector must be set up:

curl -i -X POST -H "Accept:application/json" -H  "Content-Type:application/json" http://localhost:8083/connectors/ -d @source.json

Using this registration request:

{
    "name": "inventory-connector",
    "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "tasks.max": "1",
        "database.hostname": "mysql",
        "database.port": "3306",
        "database.user": "debezium",
        "database.password": "dbz",
        "database.server.id": "184054",
        "database.server.name": "dbserver1",                                         (1)
        "database.whitelist": "inventory",                                           (2)
        "database.history.kafka.bootstrap.servers": "kafka:9092",
        "database.history.kafka.topic": "schema-changes.inventory",
        "transforms": "route",                                                       (3)
        "transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",  (3)
        "transforms.route.regex": "([^.]+)\\.([^.]+)\\.([^.]+)",                     (3)
        "transforms.route.replacement": "$3"                                         (3)
    }
}

The request configures these options:

  1. logical name of the database

  2. the database we want to monitor

  3. an SMT which defines a regular expression matching the topic name <logical-name>.<database-name>.<table-name> and extracts the third part of it as the final topic name

Let’s check if the databases are synchronized. All the rows of the customers table should be found in the source database (MySQL) as well as the target database (Postgres):

docker-compose exec mysql bash -c 'mysql -u $MYSQL_USER  -p$MYSQL_PASSWORD inventory -e "select * from customers"'
+------+------------+-----------+-----------------------+
| id   | first_name | last_name | email                 |
+------+------------+-----------+-----------------------+
| 1001 | Sally      | Thomas    | sally.thomas@acme.com |
| 1002 | George     | Bailey    | gbailey@foobar.com    |
| 1003 | Edward     | Walker    | ed@walker.com         |
| 1004 | Anne       | Kretchmar | annek@noanswer.org    |
+------+------------+-----------+-----------------------+

docker-compose exec postgres bash -c 'psql -U $POSTGRES_USER $POSTGRES_DB -c "select * from customers"'
 last_name |  id  | first_name |         email
-----------+------+------------+-----------------------
 Thomas    | 1001 | Sally      | sally.thomas@acme.com
 Bailey    | 1002 | George     | gbailey@foobar.com
 Walker    | 1003 | Edward     | ed@walker.com
 Kretchmar | 1004 | Anne       | annek@noanswer.org

With the connectors still running, we can add a new row to the MySQL database and then check that it was replicated into the PostgreSQL database:

docker-compose exec mysql bash -c 'mysql -u $MYSQL_USER  -p$MYSQL_PASSWORD inventory'
mysql> insert into customers values(default, 'John', 'Doe', 'john.doe@example.com');
Query OK, 1 row affected (0.02 sec)

docker-compose exec postgres bash -c 'psql -U $POSTGRES_USER $POSTGRES_DB -c "select * from customers"'
 last_name |  id  | first_name |         email
-----------+------+------------+-----------------------
...
Doe        | 1005 | John       | john.doe@example.com
(5 rows)

Summary

We set up a simple streaming data pipeline to replicate data in near real-time from a MySQL database to a PostgreSQL database. We accomplished this using Kafka Connect, the Debezium MySQL source connector, the Confluent JDBC sink connector, and a few SMTs — all without having to write any code. And since it is a streaming system, it will continue to capture all changes made to the MySQL database and replicate them in near real time.

What’s next?

In a future blog post we will reproduce the same scenario with Elasticsearch as a target for events.



Debezium 0.6 Is Out

What’s better than getting Java 9? Getting Java 9 and a new version of Debezium at the same time! So it’s with great happiness that I’m announcing the release of Debezium 0.6 today.

What’s in it?

Debezium is now built against and tested with Apache Kafka 0.11.0. Also the Debezium Docker images have been updated to that version (DBZ-305). You should make sure to read the Kafka update guide when upgrading from an earlier version.

To improve integration with existing Kafka sink connectors such as the JDBC sink connector or the Elasticsearch connector, Debezium provides a new single message transformation (DBZ-226). This SMT converts Debezium’s CDC event structure into a more conventional structure, commonly used in other sink and non-CDC source connectors, where the message represents the state of the inserted or updated row, or null in the case of a deleted row. This lets you, for instance, capture the changes from a table in MySQL and update a corresponding table in a Postgres database accordingly. We’ll provide a complete example showing the usage of that new SMT in the next few days.
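
Enabling the SMT, e.g. on a sink connector, takes just two configuration properties; this is the same snippet as used in the streaming pipeline post above:

"transforms": "unwrap",
"transforms.unwrap.type": "io.debezium.transforms.UnwrapFromEnvelope",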

If you are doing the Debezium tutorial, you will like the new Docker Compose set-up provided in the examples repo (DBZ-127). This lets you start all the required Docker containers with a single command.

New connector features

Now let’s take a look at some of the changes around the specific Debezium connectors. The MySQL connector has seen multiple improvements, e.g.:

  • Snapshot consistency wasn’t guaranteed before in some corner cases (DBZ-210); that’s fixed now

  • DEC and FIXED types supported in the DDL parser (DBZ-359; thanks to Liu Hanlin!)

  • UNION clause supported for ALTER TABLE (DBZ-346)

For the MongoDB connector, the way of serializing ids into the key payload of CDC events has changed (DBZ-306). The new format allows reading ids back with the correct type. We also took the opportunity to make the id field name consistent with the other connectors, i.e. it’s "id" now. Note that this change may break existing consumers, so some work on your end may be required, depending on the implementation of your consumer. The details are discussed in the release notes, and the format of message keys is described in depth in the connector documentation. Kudos to Hans-Peter Grahsl who contributed to this feature!

Another nice improvement for this connector is support for SSL connections (DBZ-343).

Finally, the Postgres connector learned some new tricks, too:

  • Support for variable-width numeric columns (DBZ-318)

  • Views won’t stop the connector any more (DBZ-319)

  • Warnings and notifications emitted by the server are correctly forwarded to the log (DBZ-279)

Please refer to the change log for an overview of all 20 issues fixed in Debezium 0.6.0.

What’s next?

High on our agenda is exploring support for Oracle (DBZ-20). We are also looking into using another logical decoding plug-in (wal2json) for the Postgres connector, which would make it possible to use Debezium with Postgres instances running on Amazon RDS. Another feature being worked on by community member Moira Tagle is support for updates to the table.whitelist configuration of existing connector instances. Finally, we’ve planned to test and adapt the existing MySQL connector for providing CDC functionality for MariaDB.

Debezium 0.7, with one or more of those features as well as hopefully some others, will be released later this year. We’ll likely also do further 0.6.x releases with bug fixes as required.

You’d like to contribute? That’s great - let us know and we’ll get you started. Check out the details below on how to get in touch.

About Debezium

Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.

Get involved

We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Gitter, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve our existing connectors and add even more connectors. If you find problems or have ideas how we can improve Debezium, please let us know or log an issue.

