Welcome to the latest edition of "Debezium Community Stories With…", a series of interviews with members of the Debezium and change data capture community, such as users, contributors or integrators. Today, it’s my pleasure to talk to Lars M Johansson.
Lars, could you introduce yourself? What is your job, if you’re not contributing to Debezium?
Hi, my name is Lars Johansson (which is a fairly common name in Sweden, hence the middle initial 'M' for Magnus), and I’m an IT-Architect and consultant, running my own business since the beginning of this year. I’ve been with a number of consulting firms and some "proper" tech companies over the years but I have almost exclusively kept to the Unix/Linux, Java and Open Source side of the industry. I have worked on various types of business systems with clients ranging from banking, through retail and telecom, to life sciences and public sector.
These past few years I’ve had an assignment with the Swedish Migration Agency within their Digital Transformation programme (funded by the European Union). Part of that transformation has been the introduction of Event Driven Architecture and Apache Kafka, and with that Change Data Capture though Kafka Connect.
What are your use cases for Debezium and change data capture in your current project?
We have two main use cases for Debezium and CDC.
The first, and not quite as challenging as the second, is implementing the Transactional Outbox Pattern to produce business events from existing, actively developed applications to Kafka. We do that with the Outbox Event Router and corresponding connector against mostly PostgreSQL databases.
The second, more challenging use case is producing business events from an older legacy system based on IBM Informix. This is part of a transition strategy to decouple new applications and services from the old legacy system by exposing relevant state changes as business events in the new event streaming platform.
When we stared this journey there was no publicly available connector plugin for Informix. The reason for that can probably be summed up in Gunnar’s comment when I asked him about it during an "Ask the Experts" session at Kafka Summit:
> Gunnar: I think this is the third time I've heard that question.
> Lars: Today?
> Gunnar: No, in total.
So we started writing our own, and it was a journey of learning. The more we learnt, about the internals of the Informix database, its change streaming APIs and about Kafka Connect, the more we realised maybe we had taken a larger bite than we could chew.
Also, as the work with transactional outboxes progressed, we came to realise that we would eventually want to implement that pattern for Informix as well, and utilising the Outbox Event Router would be a great benefit. So, we stared looking at porting out work to the Debezium platform.
Actually, we weren’t the only ones looking at building a Debezium connector for Informix. We found an embryo of a connector at Debezium Connector for Informix. It was based on Debezium 1.5, so we set out to bring it up to 2.x standard and merge our work into it.
Simultaneously there was an initiative at the agency regarding how to deal with contributions to Open Source. The agency’s work is financed by public funds, from the Swedish government and from the European Union, and there are strategies and policies on national as well as European levels that promote contributing back to Open Source projects. As a result, a policy was drafted and adopted that allowed us to contribute our work back to the Debezium community, and the Debezium Connector for Informix was released to the public with v2.5.0.Alpha1.
This sounds really interesting; can you tell us more about the challenges you encountered and how you solved them?
Although superficially similar, all databases do things slightly different underneath the hood. Even more so when it comes to change data capture APIs. The publicly available documentation on Informix, and it’s CDC API also does not go into much depth, and not all features are documented and some documented features are not implemented. So just figuring out all the quirks and handling their edge cases took considerable effort. For instance mapping the different data types and which data types are supported in the CDC API.
But the greatest challenge was without doubt getting all restart and recover cases to work securely and consistently. In the end it was mostly a matter of grit: doubling down on integration tests, making sure all conceivable cases are covered and then code, test, refactor. Then test, test and test again… And this probably one of the biggest benefits of porting our work to Debezium: with the testing framework and extensive test suite implementations of the existing modules to look to for inspiration and ideas.
Another challenge, that is more to do with Open Source than CDC, is that we are developing to solve our own use cases. But other people have other use cases, and they make the plugin break in (for us) entirely unforeseen ways. That was frustrating to begin with before we got used to it, but in the end it is part of the beauty of Open Source and what makes us all better in the end.
As a contributor to a community-led Debezium connector, how was the experience?
It’s been an overall smooth experience. Once the initiative to publish our work back to the community was approved by the Agency, we started setting up the repository and integrating with the CI tools. We got good help from you Chris and Jiri Pechanec, and the contribution guide was also helpful. Despite the Informix community not being particularly vibrant, we did get some attention and feedback, bug-reports and suggested improvements, quite early on, and we have been steadily improving.
Are you doing other open-source work, too?
I have dabbled a bit here and there but this is my first proper contribution to an open-source project. Certainly the first to get any sort of attention.
Is there features in Debezium you believe are missing you’d like to see in the future?
Compared to writing your own connector from scratch, Debezium is really a Swiss Army knife of connector development. But if I have to suggest something, it would be common support for distributed cache off-load of very large transactions, since that is a problem we have actually run across in Informix…
Lars, thanks a lot for taking your time, it was a pleasure to have you here!
If you’re like to stay in touch with Lars M Johansson and discuss with him, please drop a comment below or follow and reach out to him on GitHub.
About Debezium
Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.
Get involved
We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Zulip, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve ours existing connectors and add even more connectors. If you find problems or have ideas how we can improve Debezium, please let us know or log an issue.