

Debezium 0.10.0.Beta1 Released

Another week, another Debezium release — I’m happy to announce the release of Debezium 0.10.0.Beta1!

Besides the upgrade to Apache Kafka 2.2.1 (DBZ-1316), this release mostly fixes bugs, including a regression in the MongoDB connector introduced in the Alpha2 release (DBZ-1317).

A very welcome usability improvement is that the connectors now log a warning if not a single table is captured as per the whitelist/blacklist configuration (DBZ-1242). This helps to prevent accidentally excluding all tables through an incorrect filter expression. In that case the connectors "work as intended", but no events are propagated to the message broker.
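The following minimal Python sketch illustrates the kind of check this adds. It assumes that both filters are comma-separated regular expressions matched against fully-qualified "database.table" names. The function captured_tables is hypothetical and is not Debezium's actual filter implementation.

```python
import re
import warnings


def captured_tables(all_tables, whitelist=None, blacklist=None):
    """Return the fully-qualified table names that pass the filters.

    Hypothetical helper for illustration: both filters are assumed to be
    comma-separated regular expressions matched against "database.table"
    names. This is not Debezium's actual filter implementation.
    """
    def compile_patterns(spec):
        if not spec:
            return []
        return [re.compile(p.strip()) for p in spec.split(",") if p.strip()]

    includes = compile_patterns(whitelist)
    excludes = compile_patterns(blacklist)

    selected = []
    for table in all_tables:
        # With a whitelist present, a table must match at least one pattern.
        if includes and not any(p.fullmatch(table) for p in includes):
            continue
        # Any blacklist match excludes the table.
        if any(p.fullmatch(table) for p in excludes):
            continue
        selected.append(table)

    if not selected:
        # The idea behind DBZ-1242: an empty capture set usually means the
        # filter expression is wrong, so say so instead of staying silent.
        warnings.warn("No table is captured as per the whitelist/blacklist configuration")
    return selected
```

Consider a whitelist with a misspelled database name, such as `inventry\..*`. It matches nothing, so the function returns an empty list and emits the warning rather than silently capturing no tables.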

Please see the release notes for the complete list of issues fixed in this release. Also make sure to examine the upgrade guidelines for 0.10.0.Alpha1 and Alpha2 when upgrading from earlier versions.

Many thanks to community members Cheng Pan and Ching Tsai for their contributions to this release!

About Debezium

Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.

Get involved

We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Gitter, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve our existing connectors and add even more connectors. If you find problems or have ideas for how we can improve Debezium, please let us know or log an issue.


Debezium’s Newsletter 01/2019

Welcome to the first edition of the Debezium community newsletter in which we share blog posts, group discussions, as well as StackOverflow questions that are relevant to our user community.

Articles

Gunnar Morling recently attended Kafka Summit in London where he gave a talk on Change Data Streaming Patterns for Microservices With Debezium. You can watch the full presentation here.

Strimzi provides an easy way to run Apache Kafka on Kubernetes or OpenShift. This article by Sincy Sebastian shows just how simple it is to replicate change events from MySQL to Elasticsearch using Debezium.

Debezium allows replicating data between heterogeneous data stores with ease. This article by Matthew Groves explains how you can replicate data from MySQL to Couchbase.

As the amount of data that systems maintain continues to grow, so does its impact on how we capture, compute, and report real-time analytics. This article by Maria Patterson explains how you can use Debezium to stream data from Postgres, perform analytical calculations using KSQL, and then stream those results back to Postgres for consumption.

In a recent article published in Portuguese, Paulo Singaretti illustrates how they use Debezium and Kafka to stream changes from their relational database and then store the change stream results in Google Cloud Services.

This recent blog post by Jia Zhai provides a complete tutorial showing how to use Debezium connectors with Apache Pulsar.

Time to upgrade

Debezium version 0.9.5 was just released. If you are using the 0.9 branch you should definitely check out 0.9.5. For details on the bug fixes as well as the enhancements this version includes, check out the release notes.

The Debezium team has also begun active development on the next major version, 0.10. We recently published a blog post that provides an overview of what 0.10 is meant to deliver. If you want details on the bug fixes and enhancements we’ve packed into this release, you can view the issue list.

Feedback

We intend to publish new editions of this newsletter periodically. If you have any suggestions for changes or for what could be highlighted here, we welcome your feedback. You can reach out to us via any of our community channels found here.


Debezium 0.10.0.Alpha2 Released

Release early, release often: less than a week after Alpha1, we are announcing the release of Debezium 0.10.0.Alpha2!

This is an incremental release that completes some of the tasks started in the Alpha1 release and provides a few bug fixes as well as quality improvements for our Docker images.

The change in the logic of the snapshot field has been delivered (DBZ-1295), as outlined in the last announcement. All connectors now indicate which record is the last one of the snapshot phase, so that downstream consumers can react to this.
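A consumer could, for example, use this marker to tell when the initial load is complete. The Python sketch below assumes that each event is a dict with a "source" block. It also assumes the block's snapshot field holds one of the values "true", "last" or "false". The helper split_snapshot is hypothetical and is not part of Debezium.

```python
def split_snapshot(events):
    """Split change events into the snapshot phase and the streaming phase.

    Hypothetical helper: assumes each event is a dict whose "source" block
    carries a "snapshot" field with the value "true", "last" or "false".
    """
    snapshot, streaming = [], []
    in_snapshot = True
    for event in events:
        # Accept booleans as well as strings by normalizing to lower-case text.
        flag = str(event["source"].get("snapshot", "false")).lower()
        if in_snapshot and flag in ("true", "last"):
            snapshot.append(event)
            if flag == "last":
                # Last snapshot record: a consumer could e.g. mark the
                # initial load as complete at this point.
                in_snapshot = False
        else:
            in_snapshot = False
            streaming.append(event)
    return snapshot, streaming
```

With an explicit "last" marker, the consumer doesn't have to guess from a pause in traffic when the snapshot has ended. It can act on the marker exactly once.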

Apache ZooKeeper was upgraded to version 3.4.14 to fix a security vulnerability (CVE-2019-0201).

Our regular contributor Renato dived deeply into our image build scripts and enriched them with a Dockerfile linter (DBZ-1279).

Schema change events include the table name(s) in the metadata describing which tables are affected by the change (DBZ-871).

Bartosz Miedlar has fixed a bug in the MySQL ANTLR grammar that caused issues with identifiers in backquotes (DBZ-1300).

What’s next?

We hope we will be able to keep the recent release cadence and get out the first beta version of 0.10 in two weeks.

Stay tuned for more!



Debezium 0.10.0.Alpha1 "Spring Clean-Up" Edition Released

I’m very happy to announce the release of Debezium 0.10.0.Alpha1!

The major theme for Debezium 0.10 will be to do some clean-up (that’s what you do at this time of the year, right?); we’ve planned to remove a few deprecated features and to streamline some details in the structure of the CDC events produced by the different Debezium connectors.

This means that upgrading to Debezium 0.10 from earlier versions might take a bit more planning and consideration compared to earlier upgrades, depending on your usage of features and options already marked as deprecated in 0.9 and before. But no worries, we’re describing all changes in great detail in this blog post and the release notes.

Why?

First of all, let’s briefly discuss why we’re making these changes.

Over the last three years, Debezium has grown from supporting just a single database into an entire family of CDC connectors for a range of different relational databases and MongoDB, as well as accompanying components such as message transformations for topic routing or implementing the outbox pattern.

As in any mature project, over time we figured that a few things should be done differently in the code base than we had thought at first. For instance we moved from a hand-written parser for processing MySQL DDL statements to a much more robust implementation based on Antlr. Also we realized the way certain temporal column types were exported was at risk of value overflow in certain conditions, so we added a new mode not prone to these issues. As a last example, we made options like the batch size used during snapshotting consistent across the different connectors.

Luckily, Debezium quickly gained traction and despite the 0.x version number, it is used heavily in production at a large number of organizations, and users rely on its stability. So whenever we made such changes, we aimed to make the upgrade experience as smooth as possible. Usually that means that the previous behavior is still available but is marked as deprecated in the documentation, while a new, improved option, implementation etc. is added and made the default behavior.

At the same time we realized that there are a couple of differences between the connectors which shouldn’t really be there. Specifically, the source block of change events has some differences which make a uniform handling by consumers more complex than it should be; for instance the timestamp field is named "ts_sec" in MySQL events but "ts_usec" for Postgres.
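Consider a consumer that wants a single timestamp regardless of which connector produced an event. With the current source blocks it needs per-connector branches, as in this Python sketch. The helper name is hypothetical, and only the two fields named above are handled.

```python
def source_timestamp_ms(source):
    """Return the timestamp of a (pre-unification) source block in milliseconds.

    Hypothetical helper: covers only the two field names mentioned above,
    "ts_sec" (seconds, MySQL) and "ts_usec" (microseconds, Postgres).
    """
    if "ts_sec" in source:
        return source["ts_sec"] * 1000
    if "ts_usec" in source:
        return source["ts_usec"] // 1000
    raise KeyError("source block has no known timestamp field")
```

Every new connector with yet another field name would add another branch here. Unifying the source block removes that burden from consumers.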

With all this in mind, we decided that it is about time to clean up these issues. This is done for a couple of purposes:

  • Keeping the code base maintainable and open for future development by removing legacy code such as deprecated options and their handling as well as the legacy MySQL DDL parser

  • Making CDC events from different connectors easier to consume by unifying the source block created by the different connectors as far as possible

  • Preparing the project to go to version 1.0 with an even stronger promise of retaining backwards compatibility than already practiced today

What?

Now that we have discussed why we feel it’s time for some "clean-up", let’s take a closer look at the most relevant changes. Please also refer to the "breaking changes" section of the migration notes for more details.

  • The legacy DDL parser for MySQL has been removed (DBZ-736); if you are not using the Antlr-based one yet (it was introduced in 0.8 and became the default in 0.9), it’s highly recommended that you test it with your databases. Should you run into any parsing errors, please report them so we can fix them for the 0.10 Final release.

  • The SMTs for retrieving the new record/document state from change events have been renamed from io.debezium.transforms.UnwrapFromEnvelope and io.debezium.connector.mongodb.transforms.UnwrapFromMongoDbEnvelope to ExtractNewRecordState and ExtractNewDocumentState, respectively (DBZ-677). The old names can still be used as of 0.10, but doing so will raise a warning; they are planned for removal in Debezium 0.11.

  • Several connector options that were deprecated in earlier Debezium versions have been removed (DBZ-1234): the drop.deletes option of the new record/document state extraction SMTs (superseded by the delete.handling.mode option), the rows.fetch.size option (superseded by snapshot.fetch.size), the adaptive value of the time.precision.mode option for MySQL (prone to value loss; use adaptive_time_microseconds instead) and the snapshot.minimal.locks option of the MySQL connector (superseded by snapshot.locking.mode).

  • Several option names of the (incubating) SMT for the outbox pattern have been renamed for the sake of consistency (DBZ-1289)

  • Several fields within the source block of CDC events have been renamed for the sake of consistency (DBZ-596); as this is technically a backwards-incompatible change when using Avro and the schema registry, we’ve added a connector option source.struct.version which, when set to the value v1, will have connectors produce the previous source structure. v2 is the default and any consumers should be adjusted to work with the new source structure as soon as possible.
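Some of these changes are plain renames and can be applied to a connector configuration mechanically. Others remove options whose replacements take different values, so they need a human decision. The following Python sketch shows one way to separate the two, assuming the configuration is held as a flat key/value dict. migrate_config is a hypothetical helper, not part of Debezium:

- It renames rows.fetch.size and the old SMT class names.
- It flags drop.deletes, snapshot.minimal.locks and time.precision.mode=adaptive for manual review.
- It can optionally set source.struct.version to v1.

The new SMT classes are assumed to live in the same packages as the old ones. The outbox option renames (DBZ-1289) are not covered, since their new names are not listed here.

```python
# Old SMT class names and their replacements (DBZ-677). The package names of
# the new classes are assumed to match those of the old ones.
RENAMED_SMTS = {
    "io.debezium.transforms.UnwrapFromEnvelope":
        "io.debezium.transforms.ExtractNewRecordState",
    "io.debezium.connector.mongodb.transforms.UnwrapFromMongoDbEnvelope":
        "io.debezium.connector.mongodb.transforms.ExtractNewDocumentState",
}


def migrate_config(config, keep_v1_source=False):
    """Hypothetical helper: return (migrated config, keys needing manual review)."""
    migrated, review = {}, []
    for key, value in config.items():
        if key == "rows.fetch.size":
            # Plain rename; an explicitly set snapshot.fetch.size always wins.
            migrated.setdefault("snapshot.fetch.size", value)
        elif key.startswith("transforms.") and key.endswith(".type") and value in RENAMED_SMTS:
            migrated[key] = RENAMED_SMTS[value]
        elif key.endswith(".drop.deletes") or key in ("drop.deletes", "snapshot.minimal.locks"):
            # Removed; the replacement option (delete.handling.mode or
            # snapshot.locking.mode) has to be chosen by hand.
            review.append(key)
        elif key == "time.precision.mode" and value == "adaptive":
            # Removed for MySQL; pick one of the supported modes by hand.
            review.append(key)
        else:
            migrated[key] = value
    if keep_v1_source:
        # Keep producing the previous source structure until consumers are updated.
        migrated["source.struct.version"] = "v1"
    return migrated, review
```

Returning the review list, rather than guessing values for delete.handling.mode or snapshot.locking.mode, keeps the sketch from silently changing connector behavior.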

New Features and Bugfixes

Besides these changes, the 0.10.0.Alpha1 release also contains some feature additions and bug fixes:

  • The SQL Server connector supports custom SELECT statements for snapshotting (DBZ-1224)

  • database, schema and table/collection names have been added consistently to the source block for CDC events from all connectors (DBZ-875)

  • Client authentication works for the MySQL connector (DBZ-1228)

  • The embedded engine doesn’t duplicate events after restarts any longer (DBZ-1276)

  • A parser bug related to CREATE INDEX statements was fixed (DBZ-1264)

Overall, 30 issues were addressed in this release. Many thanks to Arkoprabho Chakraborti, Ram Satish and Yuchao Wang for their contributions to this release!

Speaking of contributors, we also did some housekeeping on the list of everyone who has ever contributed to Debezium. No fewer than 111 individuals have contributed code up to this point, which is just phenomenal! Thank you so much everyone, you folks rock!

Outlook

Going forward, there are some more details we’d like to unify across the different connectors before going to Debezium 0.10 Final. For instance the source attribute snapshot will be changed so it can take one of three states: true, false or last (indicating that this event is the last one created during initial snapshotting).

We’ll also continue our efforts to migrate the existing Postgres connector to the framework classes established for the SQL Server and Oracle connectors. Another thing we’re actively exploring is how the Postgres connector could take advantage of the "logical replication" feature added in Postgres 10. This may provide us with a way to ingest change events without requiring a custom server-side logical decoding plug-in. Such plug-ins are challenging in cloud environments, where typically only a limited set of logical decoding options is available.



Tutorial for Using Debezium Connectors With Apache Pulsar

This is a guest post by Apache Pulsar PMC Member and Committer Jia Zhai.

Debezium is an open source project for change data capture (CDC). It is built on Apache Kafka Connect and supports multiple databases, such as MySQL, MongoDB, PostgreSQL, Oracle, and SQL Server. Apache Pulsar includes a set of built-in connectors based on the Pulsar IO framework, which is the counterpart to Apache Kafka Connect.

As of version 2.3.0, Pulsar IO comes with support for the Debezium source connectors out of the box, so you can leverage Debezium to stream changes from your databases into Apache Pulsar. This tutorial walks you through setting up the Debezium connector for MySQL with Pulsar IO.

Tutorial Steps

This tutorial is similar to the Debezium tutorial, except that the event streams are stored in Pulsar instead of Kafka. It consists of six steps:

  1. Start a MySQL server;

  2. Start a standalone Pulsar service;

  3. Start the Debezium connector in Pulsar IO, which reads the existing database changes from the MySQL server;

  4. Subscribe to Pulsar topics to monitor MySQL changes;

  5. Make changes in the MySQL server, and verify that the changes are recorded in Pulsar topics immediately;

  6. Clean up.

Step 1: Start a MySQL server

Start a MySQL server containing an example database from which Debezium captures changes. Open a new terminal and start a new container that runs a MySQL database server pre-configured with a database named inventory:

docker run --rm --name mysql -p 3306:3306 -e MYSQL_ROOT_PASSWORD=debezium -e MYSQL_USER=mysqluser -e MYSQL_PASSWORD=mysqlpw debezium/example-mysql:0.9

The following information is displayed:

2019-03-25T14:12:41.178325Z 0 [Note] Event Scheduler: Loaded 0 events
2019-03-25T14:12:41.178670Z 0 [Note] mysqld: ready for connections.
Version: '5.7.25-log'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  MySQL Community Server (GPL)

Step 2: Start a standalone Pulsar service

Start the Pulsar service locally in standalone mode. Support for running Debezium connectors in Pulsar IO was introduced in Pulsar 2.3.0. Download the Pulsar 2.3.0 binary release and the pulsar-io-kafka-connect-adaptor-2.3.0.nar file from the same release. In Pulsar, all Pulsar IO connectors are packaged as separate NAR files.

$ wget https://archive.apache.org/dist/pulsar/pulsar-2.3.0/apache-pulsar-2.3.0-bin.tar.gz
$ wget https://archive.apache.org/dist/pulsar/pulsar-2.3.0/connectors/pulsar-io-kafka-connect-adaptor-2.3.0.nar
$ tar zxf apache-pulsar-2.3.0-bin.tar.gz
$ cd apache-pulsar-2.3.0
$ mkdir connectors
$ cp ../pulsar-io-kafka-connect-adaptor-2.3.0.nar connectors
$ bin/pulsar standalone

Step 3: Start the Debezium MySQL connector in Pulsar IO

In another terminal tab, start the Debezium MySQL connector in Pulsar IO in local run mode. The “debezium-mysql-source-config.yaml” file contains the whole configuration, and the main parameters are listed under the “configs” node. Besides the "task.class" parameter, the file includes MySQL-related parameters (such as server, port, user, and password) and the names of two Pulsar topics used for "history" and "offset" storage.

$ bin/pulsar-admin source localrun  --sourceConfigFile debezium-mysql-source-config.yaml

The content in the “debezium-mysql-source-config.yaml” file is as follows.

tenant: "test"
namespace: "test-namespace"
name: "debezium-kafka-source"
topicName: "kafka-connect-topic"
archive: "connectors/pulsar-io-kafka-connect-adaptor-2.3.0.nar"

parallelism: 1

configs:
  ## sourceTask
  task.class: "io.debezium.connector.mysql.MySqlConnectorTask"

  ## config for mysql, docker image: debezium/example-mysql:0.8
  database.hostname: "localhost"
  database.port: "3306"
  database.user: "debezium"
  database.password: "dbz"
  database.server.id: "184054"
  database.server.name: "dbserver1"
  database.whitelist: "inventory"

  database.history: "org.apache.pulsar.io.debezium.PulsarDatabaseHistory"
  database.history.pulsar.topic: "history-topic"
  database.history.pulsar.service.url: "pulsar://127.0.0.1:6650"
  ## KEY_CONVERTER_CLASS_CONFIG, VALUE_CONVERTER_CLASS_CONFIG
  key.converter: "org.apache.kafka.connect.json.JsonConverter"
  value.converter: "org.apache.kafka.connect.json.JsonConverter"
  ## PULSAR_SERVICE_URL_CONFIG
  pulsar.service.url: "pulsar://127.0.0.1:6650"
  ## OFFSET_STORAGE_TOPIC_CONFIG
  offset.storage.topic: "offset-topic"

Since the tables were created automatically in the MySQL server started in step 1, the Debezium connector reads the history records from the MySQL binlog, starting at the beginning. In the output you will see that the connector has already been triggered and has processed 47 records.


For more information on how to manage connectors, see the Pulsar IO documentation.

Records that have been captured and read by Debezium are automatically published to Pulsar topics. In a new terminal, you can list the current topics in Pulsar with the following command:

$ bin/pulsar-admin topics list public/default

For each table that has been changed, the change data is stored in a separate Pulsar topic. Besides these table-related topics, two more topics named “history-topic” and “offset-topic” are used to store the history and offset data.

persistent://public/default/history-topic
persistent://public/default/offset-topic
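The table topics follow Debezium's naming convention of server name, database and table joined by dots. With database.server.name set to dbserver1, this is why the products table ends up in dbserver1.inventory.products. In this setup the topics are persistent topics in the public/default namespace. The Python sketch below derives the full topic name. pulsar_topic_for_table and its default tenant/namespace arguments are hypothetical and chosen to match this tutorial.

```python
def pulsar_topic_for_table(server_name, database, table,
                           tenant="public", namespace="default"):
    """Hypothetical helper: full name of the Pulsar topic receiving a table's changes.

    Assumes Debezium's <server name>.<database>.<table> topic naming and a
    persistent topic in the given tenant/namespace, as in this tutorial.
    """
    return f"persistent://{tenant}/{namespace}/{server_name}.{database}.{table}"


if __name__ == "__main__":
    # database.server.name and database.whitelist from debezium-mysql-source-config.yaml
    for table in ("products", "orders"):
        print(pulsar_topic_for_table("dbserver1", "inventory", table))
```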

Step 4: Subscribe to Pulsar topics to monitor MySQL changes

Take the persistent://public/default/dbserver1.inventory.products topic as an example. Use the following CLI command to consume this topic and monitor changes to the “products” table.

$ bin/pulsar-client consume -s "sub-products" public/default/dbserver1.inventory.products -n 0

The output is as follows:

…
22:17:41.201 [pulsar-client-io-1-1] INFO  org.apache.pulsar.client.impl.ConsumerImpl - [public/default/dbserver1.inventory.products][sub-products] Subscribing to topic on cnx [id: 0xfe0b4feb, L:/127.0.0.1:55585 - R:localhost/127.0.0.1:6650]
22:17:41.223 [pulsar-client-io-1-1] INFO  org.apache.pulsar.client.impl.ConsumerImpl - [public/default/dbserver1.inventory.products][sub-products] Subscribed to topic on localhost/127.0.0.1:6650 -- consumer: 0

You can also consume the offset topic to monitor how the offsets change as table changes are written to the persistent://public/default/dbserver1.inventory.products Pulsar topic.

$ bin/pulsar-client consume -s "sub-offset" offset-topic -n 0

Step 5: Make changes in the MySQL server, and verify that changes are recorded in Pulsar topics immediately

Start a Docker container running the MySQL CLI, which you can use to make changes to the “products” table in the MySQL server.

$ docker run -it --rm --name mysqlterm --link mysql mysql:5.7 sh -c 'exec mysql -h"$MYSQL_PORT_3306_TCP_ADDR" -P"$MYSQL_PORT_3306_TCP_PORT" -uroot -p"$MYSQL_ENV_MYSQL_ROOT_PASSWORD"'

After running the command, the MySQL CLI prompt is displayed, and you can change the names of two items in the “products” table.

mysql> use inventory;
mysql> show tables;
mysql> SELECT * FROM products;
mysql> UPDATE products SET name='1111111111' WHERE id=101;
mysql> UPDATE products SET name='1111111111' WHERE id=107;

In the terminal where you consume the products topic, you will see that two change events have been added.
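The connector is configured with JsonConverter for keys and values, so each message on the products topic is a JSON document. The following Python sketch shows roughly how a consumer might interpret such an update. It assumes the usual Debezium envelope (before, after, op), wrapped in a "payload" field when JsonConverter includes schemas. describe_change is a hypothetical helper.

```python
import json

# Debezium's "op" codes for the change event envelope.
OPERATIONS = {"c": "create", "u": "update", "d": "delete", "r": "read"}


def describe_change(raw_value):
    """Hypothetical helper: summarize one JSON-serialized Debezium change event.

    Returns None for tombstones (a null value), otherwise a dict with the
    operation and the columns whose values differ between "before" and "after".
    """
    event = json.loads(raw_value)
    if event is None:
        return None
    # With schemas enabled, JsonConverter wraps the envelope as {"schema": ..., "payload": ...}.
    payload = event["payload"] if "payload" in event and "schema" in event else event
    before = payload.get("before") or {}
    after = payload.get("after") or {}
    changed = {
        column: (before.get(column), after.get(column))
        for column in sorted(set(before) | set(after))
        if before.get(column) != after.get(column)
    }
    op = payload.get("op")
    return {"op": OPERATIONS.get(op, op), "changed": changed}
```

Suppose this were applied to the event for the first UPDATE above. It would report an "update" whose only changed column is name, with the new value '1111111111'. Any other before/after values in such an event depend on the data in the example database.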


In the terminal where you consume the offset topic, you will see that two offsets have been added.


In the terminal where you run the connector locally, you will see that two more records have been processed.


Step 6: Clean up

Use “Ctrl + C” to close terminals. Use “docker ps” and “docker kill” to stop MySQL related containers.

mysql> quit

$ docker ps
CONTAINER ID        IMAGE                        COMMAND                  CREATED             STATUS              PORTS                               NAMES
84d66c2f591d        debezium/example-mysql:0.8   "docker-entrypoint.s…"   About an hour ago   Up About an hour    0.0.0.0:3306->3306/tcp, 33060/tcp   mysql

$ docker kill 84d66c2f591d

To delete the Pulsar data, remove the data directory in the Pulsar binary directory.

$ pwd
/Users/jia/ws/releases/apache-pulsar-2.3.0

$ rm -rf data

Conclusion

The Pulsar IO framework allows you to run the Debezium connectors for change data capture, streaming data changes from different databases into Apache Pulsar. In this tutorial you’ve learned how to capture data changes in a MySQL database and propagate them to Pulsar. We are continuously improving support for running the Debezium connectors with Apache Pulsar, and it will be much easier to use after the Pulsar 2.4.0 release.

