Debezium Blog
Debezium provides a way to run the connectors directly within Debezium from the very beginning of the project. The way how it was provided has changed over the time and it still evolves. This article will describe another evolution step in this regard - new implementation of Debezium engine.
In the previous blog post, we have shown how to leverage Debezium to train neural-network model with the existing data from the database and use this pre-trained model to classify images newly stored into the database. In this blog post, we will move it one step further - we will use Debezium to create multiple data streams from the database and use one of the streams for continuous learning and to improve our model, and the second one for making predictions on the data. When the model is constantly improved or adjusted to recent data samples, this approach is known as online machine learning. Online learning is only suitable for some use cases, and implementing an online variant of a given algorithm may be challenging or even impossible. However, in situations where online learning is possible, it becomes a very powerful tool as it allows one to react to the changes in the data in real-time and avoids the need to re-train and re-deploy new models, thus saving the hardware and operational costs. As the streams of data become more and more common, e.g. with the advent of IoT, we can expect online learning to become more and more popular. It’s usually a perfect fit for analyzing streaming data in use cases where it’s possible.
With the recent success of ChatGPT, we can observe another wave of interest in the AI field and machine learning in general. The previous wave of interest in this field was, at least to a certain extent, caused by the fact that excellent ML frameworks like TensorFlow, PyTorch or general data processing frameworks like Spark became available and made the writing of ML models much more straightforward. Since that time, these frameworks have matured, and writing models are even more accessible, as you will see later in this blog. However, data set preparation and gathering data from various sources can sometimes take time and effort. Creating a complete pipeline that would pull existing or newly created data, adjust it, and ingest it into selected ML libraries can be challenging. Let’s investigate if Debezium can help with this task and explore how we can leverage Debezium’s capabilities to make it easier.