Let’s talk about TOAST. Toast? No, TOAST!
So what’s that? TOAST (The Oversized-Attribute Storage Technique) is a mechanism in Postgres which stores large column values in multiple physical rows, circumventing the page size limit of 8 KB.
Typically, TOAST storage is transparent to the user, so you don’t really have to care about it. There’s an exception, though: if a table row has changed, any unchanged values that were stored using the TOAST mechanism are not included in the message that Debezium receives from the database, unless they are part of the table’s replica identity. Consequently, such unchanged TOAST column value will not be contained in Debezium data change events sent to Apache Kafka. In this post we’re going to discuss different strategies for dealing with this situation.
This is a guest post by Apache Pulsar PMC Member and Committer Jia Zhai.
Debezium is an open source project for change data capture (CDC). It is built on Apache Kafka Connect and supports multiple databases, such as MySQL, MongoDB, PostgreSQL, Oracle, and SQL Server. Apache Pulsar includes a set of built-in connectors based on Pulsar IO framework, which is counter part to Apache Kafka Connect.
As of version 2.3.0, Pulsar IO comes with support for the Debezium source connectors out of the box, so you can leverage Debezium to stream changes from your databases into Apache Pulsar. This tutorial walks you through setting up the Debezium connector for MySQL with Pulsar IO.