MySQL 5.7 introduced a new data type for storing and working with JSON data. Clients can define tables with columns using the new JSON datatype, and they can store and read JSON data using SQL statements and new built-in JSON functions to construct JSON data from other relational columns, introspect the structure of JSON values, and search within and manipulate JSON data. It possible to define generated columns on tables whose values are computed from the JSON value in another column of the same table, and to then define indexes with those generated columns. Overall, this is really a very powerful feature in MySQL.

Debezium’s MySQL connector will support the JSON datatype starting with the upcoming 0.3.4 release. JSON document, array, and scalar values will appear in change events as strings with io.debezium.data.json for the schema name. This will make it natural for consumers to work with JSON data. BTW, this is the same semantic schema type used by the MongoDB connector to represent JSON data.

This sounds straightforward, and we hope it is. But implementing this required a fair amount of work. That’s because although MySQL exposes JSON data as strings to client applications, internally it stores all JSON data in a special binary form that allows the MySQL engine to efficiently access the JSON data in queries, JSON functions and generated columns. All JSON data appears in the binlog in this binary form as well, which meant that we had to parse the binary form ourselves if we wanted to extract the more useful string representation. Writing and testing this parser took a bit of time and effort, and ultimately we donated it to the excellent MySQL binlog client library that the connector uses internally to read the binlog events.

We’d like to thank Stanley Shyiko for guiding us and helping us debug the final problems with parsing JSON in the binlog, for accepting our proposed changes into his library, for releasing his library quickly when needed, and for being so responsive on this and other issues!

Randall Hauch

Randall is an open source software developer at Red Hat, and has been working in data integration for almost 20 years. He is the founder of Debezium and has worked on several other open source projects. He lives in Edwardsville, IL, near St. Louis.

     


About Debezium

Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.

Get involved

We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Zulip, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve ours existing connectors and add even more connectors. If you find problems or have ideas how we can improve Debezium, please let us know or log an issue.