A user of the Debezium connector for MySQL informed us about a potential issue with the configuration of the connector’s internal database history topic, which may cause the deletion of parts of that topic (DBZ-663). Please continue reading if you’re using the Debezium MySQL connector in versions 0.7.3 or 0.7.4.
In Debezium 0.7.3 we rolled out a feature for creating the database history automatically if it doesn’t exist yet (DBZ-278). While this feature sets the retention time for the topic to an "infinite" period, it doesn’t specify the "retention.bytes" option for the history topic. This may cause parts of the history topic to be deleted in case all of the following conditions are met:
You are using versions 0.7.3 or 0.7.4 of the Debezium connector for MySQL
The database history topic has been created by the connector (i.e. you haven’t created it yourself)
The broker level option "log.retention.bytes" is set to another value than -1 (note that the default is -1, in which case things work as intended)
The database history topic grows beyond the threshold configured via "log.retention.bytes"
If the history topic is incomplete, the connector will fail to recover the database history after a restart of the connector and will not continue with reading the MySQL binlog.
You should either create the database history topic yourself with an infinite retention or alternatively override the "retention.bytes" configuration for the history topic created by the connector:
<KAFKA_DIR>/bin/kafka-configs.sh \ --zookeeper zookeeper:2181 \ --entity-type topics \ --entity-name <DB_HISTORY_TOPIC> \ --alter \ --add-config retention.bytes=-1
In case parts of the history topic were removed already, you can use the snapshot mode
schema_only_recovery for re-creating the history topic in case no schema changes have happened since the last committed offset of the connector. Alternatively, a complete new snapshot should be taken, e.g. by setting up a new connector instance.
We’ll release Debezium 0.7.5 with a fix for this issue early next week. Note that previously created database history topics should be re-configured as described above. Please don’t hesitate to get in touch in the comments below, the chat room or the mailing list in case you have any further questions on this issue.
Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.
We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Gitter, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve ours existing connectors and add even more connectors. If you find problems or have ideas how we can improve Debezium, please let us know or log an issue.