Debezium 0.7.3 Is Released

I’m very happy to announce the release of Debezium 0.7.3!

This is primarily a bugfix release, but we’ve also added a handful of smaller new features. It’s a recommended upgrade for all users. When upgrading from earlier versions, please check out the release notes of all versions between the one you’re currently on and 0.7.3 in order to learn about any steps potentially required for upgrading.

Let’s take a closer look at some of the new features.

All Connectors

Using the new connector option tombstones.on.delete you can now control whether a tombstone event should be emitted upon record deletions (DBZ-582). Doing so is usually the right thing and thus remains the default behaviour. But disabling tombstones may be desirable in certain situations, and this gets a bit easier now using that option (previously you’d have had to use an SMT, i.e. a single message transform, which for instance isn’t supported when using Debezium’s embedded mode). This feature was contributed by our community member Raf Liwoch. Thanks!
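
As a minimal sketch, disabling tombstones just means adding the new option to the config section of your connector registration request; the connector name is only an example here, and all other properties are elided and taken from your existing configuration:

{
  "name": "inventory-connector",
  "config": {
    ...
    "tombstones.on.delete": "false"
  }
}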

We’ve also spent some time on a few operational aspects: the sourceInfo element of Debezium’s change data messages contains a new field representing the version of the connector that created the message (DBZ-593). This lets message consumers take specific action based on the version. For instance, this can be helpful where a new Debezium release fixes a bug that consumers so far had to work around themselves. After updating to that new Debezium version, the workaround should not be applied anymore; the version field allows consumers to decide whether to apply it or not.
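
For illustration, a shortened sourceInfo fragment of a MySQL change message might look as follows; the version field is the new addition, while the name and table values are made up for this example and the exact set of sibling fields differs per connector:

"source": {
  "version": "0.7.3",
  "name": "dbserver1",
  "table": "customers",
  ...
}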

The names of all the threads managed by Debezium are now structured in the form of "debezium-<connector>-…​" (DBZ-587). This helps with identifying Debezium’s threads, for instance when analyzing thread dumps.

Postgres Connector

Here we’ve focused on improving the support for array types: besides fixing a bug related to numeric arrays (DBZ-577) we’ve also completed the support for the PostGIS types (which was introduced in 0.7.2), allowing you to capture array columns of types GEOMETRY and GEOGRAPHY.

Snapshots are now correctly interruptible (DBZ-586), and the connector will correctly handle the case where after a restart it should continue from a WAL position which isn’t available any more: it’ll stop, requiring you to do a new snapshot (DBZ-590).

MySQL Connector

The MySQL connector can create the DB history topic automatically, if needed (DBZ-278). This means you don’t have to create that topic yourself and you also don’t need to rely on Kafka’s automatic topic creation any longer (any change data topics will automatically be created by Kafka Connect).

Also the connector can optionally emit messages to a dedicated heartbeat topic at a configurable interval (DBZ-220). This comes in handy in situations where you only want to capture tables with low traffic, while other tables in the database are changed more frequently. In that case, no messages would have been emitted to Kafka Connect for a long time, and thus no offset would have been committed either. This could have caused trouble when restarting the connector: it would want to resume from the last committed offset, which may not be available in the binlogs any longer. But as the captured tables didn’t change, it actually wouldn’t be necessary to resume from such an old binlog position. This all is avoided by emitting messages to the heartbeat topic regularly, which causes the last offset the connector has seen to be committed.
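
A hedged sketch of enabling heartbeats, assuming the option names heartbeat.interval.ms (emission interval in milliseconds) and heartbeat.topics.prefix (prefix of the heartbeat topic name); please check the connector documentation for the authoritative property names and defaults:

"config": {
  ...
  "heartbeat.interval.ms": "10000",
  "heartbeat.topics.prefix": "__debezium-heartbeat"
}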

We’ll roll out this change to the other connectors, too, in future releases.

What’s next?

Please see the full change log for more details and the complete list of issues fixed in Debezium 0.7.3.

The next release is scheduled for March 7th. We’ll still have to decide whether that will be 0.7.4 or 0.8.0, depending on how far we are by then with our work on the Oracle connector (DBZ-137).

Please also see our roadmap describing our ideas for future development of Debezium. This is our current thinking of the things we’d like to tackle in the coming months, but it’s not cast in stone, so please let us know about your feature requests by sending a message to our Google group. We’re looking forward to your feedback!

About Debezium

Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.

Get involved

We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Gitter, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve our existing connectors and add even more connectors. If you find problems or have ideas how we can improve Debezium, please let us know or log an issue.


Debezium 0.7.2 Is Released

It’s my pleasure to announce the release of Debezium 0.7.2!

Amongst the new features there’s support for geo-spatial types, a new snapshotting mode for recovering a lost DB history topic for the MySQL connector, and a message transformation for converting MongoDB change events into a structure which can be consumed by many more sink connectors. And of course we fixed a whole lot of bugs, too.

Debezium 0.7.2 is a drop-in replacement for previous 0.7.x versions. When upgrading from versions earlier than 0.7.0, please check out the release notes of all 0.7.x releases to learn about any steps potentially required for upgrading.

A big thank you goes out to our fantastic community members for their hard work on this release: Andrey Pustovetov, Denis Mikhaylov, Peter Goransson, Robert Coup, Sairam Polavarapu and Tom Bentley.

Now let’s take a closer look at some of the new features.

MySQL Connector

The biggest change to the MySQL connector is support for geo-spatial column types such as GEOMETRY, POLYGON, MULTIPOINT, etc.

There are two new logical field types — io.debezium.data.geometry.Geometry and io.debezium.data.geometry.Geography — for representing geo-spatial columns in change data messages. These types represent geo-spatial data via WKB ("well-known binary") and SRID (coordinate reference system identifier), allowing downstream consumers to interpret the change events using any existing library with support for parsing WKB. A blog post with more details on this will follow soon.
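
For illustration, a field of the new Geometry type might appear in the JSON representation of a change message roughly as follows; the field names wkb (Base64-encoded well-known binary) and srid reflect our current understanding, and the concrete values are made up, so please treat this as a hypothetical example:

"location": {
  "wkb": "AQEAAAAAAAAAAADwPwAAAAAAAPA/",
  "srid": 4326
}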

The new snapshotting mode schema_only_recovery comes in handy when for some reason you lost (parts of) the DB history topic used by the MySQL connector. It’s also useful if you’d like to compact that topic by re-creating it. Please refer to the connector documentation for the details of this mode, especially when it is safe (and when it is not) to make use of it.
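
As a sketch, recovering the history topic means temporarily switching the snapshot mode in the connector configuration (and reverting it once recovery is done); the property is the documented snapshot.mode option:

"config": {
  ...
  "snapshot.mode": "schema_only_recovery"
}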

Another new feature related to managing the size of the DB history topic is the option to control whether to include all DDL events or only those pertaining to tables captured as per the whitelist/blacklist configuration. Again, check out the connector docs to learn more about the specifics of that setting.

Finally, we fixed a few shortcomings of the MySQL DDL parser (DBZ-524, DBZ-530).

PostgreSQL Connector

Similar to the MySQL connector, support for geo-spatial columns in Postgres has been largely improved. More specifically, PostGIS column types can now be represented in change data events. Thanks a lot to Robert Coup, who contributed this feature!

Also the support for Postgres array columns has been expanded, e.g. we now support tracking changes to VARCHAR and DATE array columns. Note that the connector doesn’t yet work with geo-spatial array columns (should you ever have those), but this should be added soon, too.

If you’d like to include just a subset of the rows of a captured table in snapshots, you may like the ability to specify dedicated SELECT statements to do so. For instance this can be used to exclude any logically deleted records — which you can recognize based on some flag in that table — from the snapshot.
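
A hedged sketch, assuming the option follows the snapshot.select.statement.overrides scheme documented for the connectors; the table name and the is_deleted flag column are made up for this example:

"snapshot.select.statement.overrides": "public.customers",
"snapshot.select.statement.overrides.public.customers": "SELECT * FROM public.customers WHERE is_deleted = false"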

A few bugs in this connector were reported and fixed by community members, too; e.g. the connector can be correctly paused now (thanks, Andrey Pustovetov), and we fixed an issue which could potentially have committed an incorrect offset to Kafka Connect (thanks, Thon Mekathikom).

MongoDB Connector

If you’ve ever compared the structures of change events emitted by the Debezium RDBMS connectors (MySQL, Postgres) and the MongoDB connector, you’ll know that the message structure of the latter is a bit different from that of the others. Due to the schemaless nature of MongoDB, the change events essentially contain a String with a JSON representation of the applied insert or patch. This structure cannot be consumed by existing sink connectors, such as the Confluent connectors for JDBC or Elasticsearch.

This now becomes possible by means of a newly added single message transformation (SMT), which parses these JSON strings and creates structured Kafka Connect records from them (thanks, Sairam Polavarapu!). When applying this SMT to the JDBC sink connector, you can now stream data changes from MongoDB to any supported relational database.
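
As a sketch, a JDBC sink registration applying the new SMT might look like the following; the SMT class name, topic name and connection details shown here are assumptions for illustration, so please consult the SMT documentation for the exact class name and options:

{
  "name": "mongodb-jdbc-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "1",
    "topics": "dbserver1.inventory.customers",
    "connection.url": "jdbc:postgresql://postgres:5432/inventory?user=postgresuser&password=postgrespw",
    "transforms": "unwrap",
    "transforms.unwrap.type": "io.debezium.connector.mongodb.transforms.UnwrapFromMongoDbEnvelope",
    "auto.create": "true",
    "insert.mode": "upsert",
    "pk.mode": "record_key",
    "pk.fields": "id"
  }
}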

Note that this SMT is a work in progress; details of its emitted message structure may still change. Also there are some inherent limitations to what can be achieved with it: if you e.g. have arrays in your MongoDB documents, the record created by this SMT will be structured accordingly, but many sink connectors cannot process such a structure.

We have some ideas for further development here, e.g. there could be an option for flattening out (non-array) nested structures, so that e.g. { "address" : { "street" : "..." } } would be represented as address_street, which then could be consumed by sink connectors expecting a flat structure.

The new SMT is described in detail in our docs.

What’s next?

Please see the full change log for more details and the complete list of issues fixed in Debezium 0.7.2.

The 0.7.3 release is scheduled for February 14th.

We’ll focus on some more bug fixes; also we’re working on having Debezium regularly emit heartbeat messages to a dedicated topic. This will be practical for diagnostic purposes but will also help to regularly trigger commits of the offset in Kafka Connect. That’s beneficial in certain situations when capturing tables which change only very infrequently.

We’ve also worked out a roadmap describing our ideas for future work on Debezium, going beyond the next bugfix releases. While nothing is cast in stone, this is our idea of the features to add in the coming months. If you miss anything important on this roadmap, please tell us either in the comments below or send a message to our Google group. Looking forward to your feedback!

About Debezium

Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.

Get involved

We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Gitter, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve our existing connectors and add even more connectors. If you find problems or have ideas how we can improve Debezium, please let us know or log an issue.


Streaming Data Changes from Your Database to Elasticsearch

We wish all the best to the Debezium community for 2018!

While we’re working on the 0.7.2 release, we thought we’d publish another post describing an end-to-end data streaming use case based on Debezium. A few weeks ago we saw how to set up a change data stream to a downstream database. In this blog post we will follow the same approach to stream the data to an Elasticsearch server, to leverage its excellent capabilities for full-text search on our data. But to make the matter a little bit more interesting, we will stream the data to both a PostgreSQL database and Elasticsearch, so we will optimize access to the data via the SQL query language as well as via full-text search.

Topology

Here’s a diagram that shows how the data is flowing through our distributed system. First, the Debezium MySQL connector is continuously capturing the changes from the MySQL database, and sending the changes for each table to separate Kafka topics. Then, the Confluent JDBC sink connector is continuously reading those topics and writing the events into the PostgreSQL database. And, at the same time, the Confluent Elasticsearch connector is continuously reading those same topics and writing the events into Elasticsearch.

 

Figure 1: A general topology

 

We are going to deploy these components into several different processes. In this example, we’ll deploy all three connectors to a single Kafka Connect instance that will write to and read from Kafka on behalf of all of the connectors (in production you might need to keep the connectors separated to achieve better performance).

 

Figure 2: A simplified topology

Configuration

We will use this Docker Compose file for a fast deployment of the demo. The deployment consists of the following Docker images:

  • Apache ZooKeeper

  • Apache Kafka

  • An enriched Kafka Connect / Debezium image with a few changes:

    • PostgreSQL JDBC driver placed into /kafka/libs directory

    • The Confluent JDBC connector placed into /kafka/connect/kafka-connect-jdbc directory

  • Pre-populated MySQL as used in our tutorial

  • Empty PostgreSQL

  • Empty Elasticsearch

The message format is not the same for the Debezium source connector and the JDBC and Elasticsearch connectors as they are developed separately and each focuses on slightly different objectives. Debezium emits a more complex event structure so that it captures all of the information available. In particular, the change events contain the old and the new state of a changed record. Both sink connectors on the other hand expect a simple message that just represents the record state to be written.

Debezium’s UnwrapFromEnvelope single message transformation (SMT) collapses the complex change event structure into the same row-based format expected by the two sink connectors and effectively acts as a message translator between the two aforementioned formats.

Example

Let’s move directly to our example as that’s where the changes are visible. First of all we need to deploy all components:

export DEBEZIUM_VERSION=0.7
docker-compose up

When all components are started we are going to register the Elasticsearch sink connector writing into the Elasticsearch instance. We want to use the same key (the primary id) in the source database and in both PostgreSQL and Elasticsearch:

curl -i -X POST -H "Accept:application/json" \
    -H  "Content-Type:application/json" http://localhost:8083/connectors/ \
    -d @es-sink.json

We’re using this registration request:

{
  "name": "elastic-sink",
  "config": {
    "connector.class":
        "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "tasks.max": "1",
    "topics": "customers",
    "connection.url": "http://elastic:9200",
    "transforms": "unwrap,key",
    "transforms.unwrap.type": "io.debezium.transforms.UnwrapFromEnvelope",          (1)
    "transforms.key.type": "org.apache.kafka.connect.transforms.ExtractField$Key",  (2)
    "transforms.key.field": "id",                                                   (2)
    "key.ignore": "false",                                                          (3)
    "type.name": "customer"                                                         (4)
  }
}

The request configures these options:

  1. extracting only the new row’s state from Debezium’s change data message

  2. extracting the id field from the key struct, so that the same key is used for the source and both destinations; this addresses the fact that the Elasticsearch connector only supports numeric types and strings as keys. If we did not extract the id, the messages would be filtered out by the connector because of the unknown key type.

  3. using the key from the event instead of generating a synthetic one

  4. type under which the events will be registered in Elasticsearch

Next we are going to register the JDBC sink connector writing into the PostgreSQL database:

curl -i -X POST -H "Accept:application/json" \
    -H  "Content-Type:application/json" http://localhost:8083/connectors/ \
    -d @jdbc-sink.json
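
The contents of jdbc-sink.json are part of the examples repository; a sketch of what such a registration might look like is shown below, where the connection details and primary key settings are assumptions for illustration:

{
  "name": "jdbc-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "1",
    "topics": "customers",
    "connection.url": "jdbc:postgresql://postgres:5432/inventory?user=postgresuser&password=postgrespw",
    "transforms": "unwrap",
    "transforms.unwrap.type": "io.debezium.transforms.UnwrapFromEnvelope",
    "auto.create": "true",
    "insert.mode": "upsert",
    "pk.fields": "id",
    "pk.mode": "record_value"
  }
}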

Finally, the source connector must be set up:

curl -i -X POST -H "Accept:application/json" \
    -H  "Content-Type:application/json" http://localhost:8083/connectors/ \
    -d @source.json
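
Again, source.json lives in the examples repository; a sketch of a matching Debezium MySQL source registration is shown below. The RegexRouter transform renames topics like dbserver1.inventory.customers to just customers, so the two sink connectors registered above can simply subscribe to the customers topic. Host names and credentials match the Docker Compose setup and are assumptions for illustration:

{
  "name": "inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "tasks.max": "1",
    "database.hostname": "mysql",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "dbz",
    "database.server.id": "184054",
    "database.server.name": "dbserver1",
    "database.whitelist": "inventory",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "schema-changes.inventory",
    "transforms": "route",
    "transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
    "transforms.route.regex": "([^.]+)\\.([^.]+)\\.([^.]+)",
    "transforms.route.replacement": "$3"
  }
}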

Let’s check if the databases and the search server are synchronized. All the rows of the customers table should be found in the source database (MySQL) as well as the target database (Postgres) and Elasticsearch:

docker-compose exec mysql bash -c 'mysql -u $MYSQL_USER  -p$MYSQL_PASSWORD inventory -e "select * from customers"'
+------+------------+-----------+-----------------------+
| id   | first_name | last_name | email                 |
+------+------------+-----------+-----------------------+
| 1001 | Sally      | Thomas    | sally.thomas@acme.com |
| 1002 | George     | Bailey    | gbailey@foobar.com    |
| 1003 | Edward     | Walker    | ed@walker.com         |
| 1004 | Anne       | Kretchmar | annek@noanswer.org    |
+------+------------+-----------+-----------------------+
docker-compose exec postgres bash -c 'psql -U $POSTGRES_USER $POSTGRES_DB -c "select * from customers"'
 last_name |  id  | first_name |         email
-----------+------+------------+-----------------------
 Thomas    | 1001 | Sally      | sally.thomas@acme.com
 Bailey    | 1002 | George     | gbailey@foobar.com
 Walker    | 1003 | Edward     | ed@walker.com
 Kretchmar | 1004 | Anne       | annek@noanswer.org
curl 'http://localhost:9200/customers/_search?pretty'
{
  "took" : 42,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 4,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "customers",
        "_type" : "customer",
        "_id" : "1001",
        "_score" : 1.0,
        "_source" : {
          "id" : 1001,
          "first_name" : "Sally",
          "last_name" : "Thomas",
          "email" : "sally.thomas@acme.com"
        }
      },
      {
        "_index" : "customers",
        "_type" : "customer",
        "_id" : "1004",
        "_score" : 1.0,
        "_source" : {
          "id" : 1004,
          "first_name" : "Anne",
          "last_name" : "Kretchmar",
          "email" : "annek@noanswer.org"
        }
      },
      {
        "_index" : "customers",
        "_type" : "customer",
        "_id" : "1002",
        "_score" : 1.0,
        "_source" : {
          "id" : 1002,
          "first_name" : "George",
          "last_name" : "Bailey",
          "email" : "gbailey@foobar.com"
        }
      },
      {
        "_index" : "customers",
        "_type" : "customer",
        "_id" : "1003",
        "_score" : 1.0,
        "_source" : {
          "id" : 1003,
          "first_name" : "Edward",
          "last_name" : "Walker",
          "email" : "ed@walker.com"
        }
      }
    ]
  }
}

With the connectors still running, we can add a new row to the MySQL database and then check that it was replicated into both the PostgreSQL database and Elasticsearch:

docker-compose exec mysql bash -c 'mysql -u $MYSQL_USER  -p$MYSQL_PASSWORD inventory'

mysql> insert into customers values(default, 'John', 'Doe', 'john.doe@example.com');
Query OK, 1 row affected (0.02 sec)
docker-compose exec postgres bash -c 'psql -U $POSTGRES_USER $POSTGRES_DB -c "select * from customers"'
 last_name |  id  | first_name |         email
-----------+------+------------+-----------------------
...
Doe        | 1005 | John       | john.doe@example.com
(5 rows)
curl 'http://localhost:9200/customers/_search?pretty'
...
{
  "_index" : "customers",
  "_type" : "customer",
  "_id" : "1005",
  "_score" : 1.0,
  "_source" : {
    "id" : 1005,
    "first_name" : "John",
    "last_name" : "Doe",
    "email" : "john.doe@example.com"
  }
}
...

Summary

We set up a complex streaming data pipeline to synchronize a MySQL database with another database and also with an Elasticsearch instance. We managed to keep the same identifier across all systems which allows us to correlate records across the system as a whole.

Propagating data changes from a primary database in near realtime to a search engine such as Elasticsearch enables many interesting use cases. Besides different applications of full-text search, one could for instance also think about creating dashboards and all kinds of visualizations using Kibana, to gain further insight into the data.

If you’d like to try out this set-up yourself, just clone the project from our examples repo. In case you need help, have feature requests or would like to share your experiences with this pipeline, please let us know in the comments below.

About Debezium

Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.

Get involved

We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Gitter, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve our existing connectors and add even more connectors. If you find problems or have ideas how we can improve Debezium, please let us know or log an issue.


Debezium 0.7.1 Is Released

Just a few days before Christmas we are releasing Debezium 0.7.1! This is a bugfix release that fixes a few annoying issues found by our community during the first rounds of using Debezium 0.7. All issues relate either to the newly provided wal2json support or to the change that reduced the risk of an internal race condition.

Robert Coup found a performance regression in situations where 0.7.0 was used with an old version of the Protobuf decoder.

Suraj Savita (and others) found an issue where our code failed to correctly detect that it runs with the Amazon RDS wal2json plug-in. We were outsmarted by the JDBC driver internals, so we included a distinct plug-in decoder name, wal2json_rds, which bypasses the detection routine and by default expects to run against an Amazon RDS instance. This mode should be used only with RDS instances.
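
A minimal sketch of the relevant connector property when running against RDS (all other connection settings stay as before):

"config": {
  ...
  "plugin.name": "wal2json_rds"
}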

We have also gathered feedback from the first attempts to run with Amazon RDS and included a short section on this topic in our documentation.

About Debezium

Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.

Get involved

We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Gitter, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve our existing connectors and add even more connectors. If you find problems or have ideas how we can improve Debezium, please let us know or log an issue.


Debezium 0.7.0 Is Released

It’s not Christmas yet, but we already got a present for you: Debezium 0.7.0 is here, full of new features as well as many bug fixes! A big thank you goes out to all the community members who contributed to this release. It is very encouraging for us to see not only more and more issues and feature requests being reported, but also pull requests coming in.

Note that this release comes with a small number of changes to the default mappings for some data types. We try to avoid this sort of change as far as possible, but in some cases it is required, e.g. if the previous mapping could have caused potential value losses. Please see below for the details and also make sure to check out the full change log which describes these changes in detail.

Now let’s take a closer look at some of the new features.

Based on Apache Kafka 1.0

A few weeks ago the Apache Kafka team released version 1.0.0. This was an important milestone for the Kafka community, and we can now happily declare that Debezium is built against and runs on that Apache Kafka version. Our Docker images were also updated to contain Apache Kafka and Kafka Connect 1.0.0.

PostgreSQL Connector

The big news for the PostgreSQL connector is that it now supports the wal2json logical decoding plug-in as an alternative to the existing DecoderBufs plug-in. This means that you can now use Debezium to stream changes out of PostgreSQL on Amazon RDS, as wal2json is the logical decoding plug-in used in that environment. Many thanks to Robert Coup, who significantly contributed to this feature.
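
A hedged sketch of a Postgres connector registration using the new plug-in; the host, credentials and database name are placeholders for illustration, while the property names are the documented ones of the Postgres connector:

{
  "name": "rds-inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "plugin.name": "wal2json",
    "database.hostname": "mydb.abcdefgh12.us-east-1.rds.amazonaws.com",
    "database.port": "5432",
    "database.user": "postgres",
    "database.password": "secret",
    "database.dbname": "inventory",
    "database.server.name": "rds-inventory"
  }
}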

Working on this plug-in, we noticed that there was a potential race condition when it comes to applying changes to the schema of captured tables. In that case it could have happened that a number of messages pertaining to data changes done before the schema change were emitted using the new schema. With the exception of a few corner cases (which are described here), this has been addressed when using Debezium’s own DecoderBufs plug-in. So it’s highly recommended to upgrade the DecoderBufs plug-in to the new version before upgrading the Debezium connector. We’ve also worked closely with the author of the wal2json plug-in (big thanks for the quick help!) to prevent the issue when using the wal2json plug-in.

While the Debezium Docker images for Postgres already come with the latest version of DecoderBufs and wal2json, RDS for now is still using an older version of wal2json. Until this has been updated, special attention must be paid when applying schema changes to captured tables. Please see the changelog for an in-depth description of this issue and ways to mitigate it.

There are new daily CI jobs that verify that the wal2json plug-in passes our test suite. For the foreseeable future we’ll support both wal2json and the existing DecoderBufs plug-in. The latter should be more efficient due to the usage of the Protocol Buffers binary format, whereas the former comes in handy for RDS or other cloud environments where you don’t have control over the installed logical decoding plug-ins, but wal2json is available.

In other news on the Postgres connector, Andrey Pustovetov discovered and proposed a fix for a multi-threading bug that could have put the connector into an undefined state if a rebalance in the Connect cluster was triggered during snapshotting. Thanks, Andrey!

MySQL Connector

In the MySQL connector we’ve fixed two issues which affect the default mapping of certain column types.

Following up on the new BIGINT UNSIGNED mapping introduced in Debezium 0.6.1, this type is now encoded as int64 in Debezium messages by default, as it is easier for (polyglot) clients to work with. This is a reasonable mapping for the vast majority of cases. Only when using values > 2^63 should you switch back to the Decimal logical type, which is a bit more cumbersome to handle, though. This should be a rare situation, as MySQL advises against using unsigned values > 2^63 due to potential value losses when performing DB-side calculations. Please see the connector documentation for the details.

Rene Kerner has improved the support for the MySQL TIME type. MySQL allows storing values larger than 23:59:59 in such columns, and the type int32 which was previously used for TIME(0-3) columns isn’t enough to convey the entire possible value range. Therefore all TIME columns in MySQL are by default represented as int64 now, using the io.debezium.time.MicroTime logical type, i.e. the value represents microseconds. If needed, you can switch to the previous mapping by setting time.precision.mode to adaptive, but you should only do so if you’re sure that you only ever will have values that fit into int32. This option is only kept for a transitioning period and will be removed in a future release.
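
As a sketch, opting back into the previous mappings for both of the type changes discussed above would mean setting the following connector options; only do so if you are sure the old behaviour is what you need, and please double-check the option names and values against the connector docs:

"config": {
  ...
  "bigint.unsigned.handling.mode": "precise",
  "time.precision.mode": "adaptive"
}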

Recently we got a report that MySQL’s binlog can contain ROLLBACK statements and thus transactions that are actually not committed. Of course no data change messages should be emitted in this situation. This can be the case, for example, when temporary tables are dropped. So we introduced a look-ahead buffer functionality that reads the binlog by transaction and excludes those that were rolled back. This feature should be considered incubating and is disabled by default for the time being. We’d like to gather your feedback on this, so if you’d benefit from this feature, please give it a try and let us know if you run into any issues. For further details please refer to the binlog.buffer.size setting in the MySQL connector docs.
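
A sketch of enabling the incubating buffering; the concrete value below is arbitrary, and the exact semantics of the setting (and a sensible size for your workload) are described in the MySQL connector docs:

"config": {
  ...
  "binlog.buffer.size": "8192"
}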

Andras Istvan Nagy came up with the idea and implemented a way for explicitly selecting the rows from each table that will be part of the snapshotting process. This can for instance be very useful if you work with soft deletes and would like to exclude all logically deleted records from snapshotting.

Please see the full change log for more details and the complete list of fixed issues.

What’s next?

The Debezium 0.7.1 release is planned to be out roughly two weeks after Christmas.

It will contain a new SMT that will unwind MongoDB change events into regular JSON consumable by sink connectors.

A big overhaul of GEOMETRY types is in progress. When completed, all GEOMETRY types will be supported by both MySQL and PostgreSQL connectors and they will be available in standard WKB format for easy consumption by polyglot clients.

There is ongoing work for the MySQL connector to allow dynamic updates of the table.whitelist option. This will allow the user to re-configure the set of captured tables without the need to re-create the connector.

If you’d like to contribute, please let us know. We’re happy about any help and will work with you to get you started quickly. Check out the details below on how to get in touch.

About Debezium

Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.

Get involved

We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Gitter, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve our existing connectors and add even more connectors. If you find problems or have ideas how we can improve Debezium, please let us know or log an issue.

