Today, it is common practice to build data lakes for analytics, reporting, or machine learning needs.
In this blog post we will describe a simple way to build a data lake. The solution uses a real-time data pipeline based on Debezium, supports ACID transactions and SQL updates, and is highly scalable. It does not require Apache Kafka or Apache Spark applications to build the data feed, which reduces the complexity of the overall solution.
One of the major improvements in Debezium starting in version 1.6 is support for incremental snapshots. In this blog post we explain the motivation for this feature, take a deep dive into the implementation details, and show a demo of it.
It’s with great pleasure that I am announcing the release of Debezium 1.7.0.Final!
Key features of this release include substantial improvements to incremental snapshotting (as introduced in Debezium 1.6), a web-based Debezium user interface, NATS support in Debezium Server, and support for running Apache Kafka without ZooKeeper via the Debezium Kafka container image.
Some exciting things also happened in the wider Debezium community over the last few months. For instance, we saw a CDC connector for ScyllaDB based on the Debezium connector framework, and there’s work happening towards a Debezium Server connector for Apache Iceberg (details about this coming soon in a guest post on this blog).
We are very happy to announce the release of Debezium 1.7.0.CR2!
As we move towards the final release, this version includes mostly bug fixes. It also contains important performance improvements and a new feature for read-only MySQL incremental snapshots.
At ScyllaDB, we develop Scylla, a high-performance NoSQL database that is API-compatible with Apache Cassandra, Amazon DynamoDB and Redis. Earlier this year, we introduced support for Change Data Capture in Scylla 4.3. This new feature seemed like a perfect match for integration with the Apache Kafka ecosystem, so we developed the Scylla CDC Source Connector using the Debezium framework. In this blog post we will cover the basic structure of Scylla’s CDC, the reasons we chose the Debezium framework, and the design decisions we made.