At ScyllaDB, we develop a high-performance NoSQL database Scylla, API-compatible with Apache Cassandra, Amazon DynamoDB and Redis. Earlier this year, we introduced support for Change Data Capture in Scylla 4.3. This new feature seemed like a perfect match for integration with the Apache Kafka ecosystem, so we developed the Scylla CDC Source Connector using the Debezium framework. In this blogpost we will cover the basic structure of Scylla’s CDC, reasons we chose the Debezium framework and design decisions we made.
Welcome to the newest edition of the Debezium community newsletter, in which we share all things CDC related including blog posts, group discussions, as well as StackOverflow questions that are relevant to our user community.
It’s been a long time since our last edition. But we are back again! In case you missed our last edition, you can check it out here.
Hello everyone, my name is Anisha Mohanty and I recently joined Red Hat and the Debezium team.
I started my journey with Red Hat in April 2020 after completing my graduation. I was introduced to open source in my early college days, but I wasn’t aware of how organizations work and wanted to get the essence of open source ethics and values. That is something that I am fascinated to learn as I joined Red Hat.
My work started under the Data Virtualization team with Teiid and then under the GRAPHQLCRUD project which is a standard for a generic query interface on top of GraphQL. The project has started well and is in great shape right now. We have successfully added CRUD capabilities, paging, and filtering specifications.
Coming to Debezium, I first heard about it as some DV members started contributing here, well back then it was a completely new thing for me. I started exploring more, and it was not long when I had my first interaction with Gunnar and Jiri. With a warm welcome and great team here, I am really excited to work with the Debezium Community.
Over the last five years, Debezium has become a leading open-source solution for change data capture for a variety of databases. Users from all kinds of industries work with Debezium for use cases like replication of data from operational databases into data warehouses, updating caches and search indexes, driving streaming queries via Kafka Streams or Apache Flink, synchronizing data between microservices, and many more.
When talking to Debezium users, we generally receive very good feedback on the range of applications enabled by Debezium and its flexibility: e.g. each connector can be configured and fine-tuned in many ways, depending on your specific requirements. A large number of metrics provide deep insight into the state of running Debezium connectors, allowing to safely operate CDC pipelines also in huge installations with thousands of connectors.
All this comes at the cost of a learning curve, though: users new to Debezium need to understand the different options and settings as well as learn about best practices for running Debezium in production. We’re therefore constantly exploring how the user experience of Debezium can be further improved, allowing people to set up and operate its connectors more easily.
Welcome to the first edition of "Debezium Community Stories With…", a new series of interviews with members of the Debezium and change data capture community, such as users, contributors or integrators. We’re planning to publish more parts of this series in a loose rhythm, so if you’d like to be part of it, please let us know. In today’s edition it’s my pleasure to talk to Renato Mefi, a long-time Debezium user and contributor.
Hello everyone, my name is René Kerner and I recently joined Red Hat and the Debezium team.
I was working at trivago since 2011, and in 2016 we started using Debezium at version 0.4/0.5 for capturing clickstreams in the offshore datacenters into Kafka and aggregate them in the central cluster. We really intensified Debezium usage within one year and in 2017 we also used it for trivago’s main data.
In 2014 I did my first OSS contributions to Composer, PHP’s dependency management and gave my first talk on it at the Developer Conference (called code.talks for many years now). Then in 2017 I did my first contributions to Debezium with work on the MySQL snapshot process and fixing a MySQL TIME data type issue.
In 2018 I left trivago and started working at Codecentric as a consultant for software architecture and development (mainly JVM focus) and Apache Kafka, doing many trainings and workshops at German "Fortune 500" companies (insurances, industrial sector, media). I was doing lots of networking at that time, where I learned how awesome the community around Kafka is. I was always quite sad I didn’t have more time to focus on OSS projects.
Welcome to the latest edition of the Debezium community newsletter, in which we share all things CDC related including blog posts, group discussions, as well as StackOverflow questions that are relevant to our user community.
In case you missed our last edition, you can check it out here.
Welcome to the Debezium community newsletter in which we share all things CDC related including blog posts, group discussions, as well as StackOverflow questions that are relevant to our user community.
This past summer has been a super exciting time for the team. Not only have we been working hard on Debezium 0.10 but we have unveiled some recent changes to debezium.io.
Welcome to the first edition of the Debezium community newsletter in which we share blog posts, group discussions, as well as StackOverflow questions that are relevant to our user community.
Hello everyone, my name is Chris Cranford and I recently joined the Debezium team.
My journey at Red Hat began just over three years ago; however I have been in this line of work for nearly twenty years. All throughout my career, I have advocated and supported open source software. Many of my initial software endeavors were based on open source software, several which are still heavily used today such as Hibernate ORM.
When I first learned about the Debezium project last year, I was very excited about it right away.
I could see how this project would be very useful for many people out there and I was very impressed by the professional way it was set up: a solid architecture for change data capture based on Apache Kafka, a strong focus on robustness and correctness also in the case of failures, the overall idea of creating a diverse eco-system of CDC connectors. All that based on the principles of open source, combined with extensive documentation from day one, a friendly and welcoming web site and a great getting-started experience.
So you can imagine that I was more than enthusiastic about the opportunity to take over the role of Debezium’s project lead. Debezium and CDC have close links to some data-centric projects I’ve been previously working on and also tie in with ideas I’ve been pursuing around CQRS, even sourcing and denormalization. As core member of the Hibernate team at Red Hat, I’ve implemented the initial Elasticsearch support for Hibernate Search (which deals with full-text index updates via JPA/Hibernate). I’ve also contributed to Hibernate OGM - a project which connects JPA and the world of NoSQL. One of the plans for OGM is to create a declarative denormalization engine for creating read models optimized for specific use cases. It will be very interesting to see how this plays together with the capabilities provided by Debezium.
Just before I started the Debezium project in early 2016, Martin Kleppmann gave several presentations about turning the database inside out and how his Bottled Water project demonstrated the importantance that change data capture can play in using Kafka for stream processing. Then Kafka Connect was announced, and at that point it seemed obvious to me that Kafka Connect was the foundation upon which practical and reusable change data capture can be built. As these techniques and technologies were becoming more important to Red Hat, I was given the opportunity to start a new open source project and community around building great CDC connectors for a variety of databases management systems.
Over the past few years, we have created Kafka Connect connectors for MySQL, then MongoDB, and most recently PostgreSQL. Each were initially limited and had a number of problems and issues, but over time more and more people have tried the connectors, asked questions, answered questions, mentioned Debezium on Twitter, tested connectors in their own environments, reported problems, fixed bugs, discussed limitations and potential new features, implemented enhancements and new features, improved the documentation, and wrote blog posts. Simply put, people with similar needs and interests have worked together and have formed a community. Additional connectors for Oracle and SQL Server are in the works, but could use some help to move things along more quickly.
It’s really exciting to see how far we’ve come and how the Debezium community continues to evolve and grow. And it’s perhaps as good a time as any to hand the reigns over to someone else. In fact, after nearly 10 wonderful years at Red Hat, I’m making a bigger change and as of today am part of Confluent’s engineering team, where I expect to play a more active role in the broader Kafka community and more directly with Kafka Connect and Kafka Streams. I definitely plan to stay involved in the Debezium community, but will no longer be leading the project. That role will instead be filled by Gunnar Morling, who’s recently joined the Debezium community but has extensive experience in open source, the Hibernate community, and the Bean Validation specification effort. Gunnar is a great guy and an excellent developer, and will be an excellent lead for the Debezium community.