
We are pleased to announce the first beta release of Debezium 3.1, 3.1.0.Beta1.
This release includes a myriad of features and improvements, including but not limited to our first official release of Debezium Server UI, CloudEvent traceparent support, new features for Debezium’s PubSub and RabbitMQ sinks, schema access in the WASM transformation, and much, much more. Let’s dive in and take a look at all these new features and improvements.
New features and improvements
Debezium 3.1.0.Beta1 introduces new features and improvements across several components:
First release of Debezium Platform
A year ago we began this incredible journey to create a new, modern user interface for Debezium Server that aimed to ease the deployment of Debezium on Kubernetes. We are excited to announce that Debezium 3.1 will be the first official release of this year-long effort.
The new Debezium Platform provides a modern pipeline-based approach to designing source and sink configurations, transformation chains, and more within seconds. You can install the Debezium Platform using helm as follows:
helm install debezium-platform --set domain.url=<your-domain> --version 3.1.0-beta1 oci://quay.io/debezium-charts/debezium-platform
For more details on how to deploy using helm, see the README.md.
In addition, this release adds some finishing touches to the user interface: new search and list-view toggles, support for applying transforms to and editing connector pipelines, and smart editors for experienced users when configuring a pipeline.
The following videos show how to use these new features:
Minimal locking for Percona
A new snapshot.locking.mode value has been added to Debezium for MySQL Percona users to reduce the amount of locking that occurs during the snapshot. The new mode, minimal_percona_no_table_locks, provides the same semantics as minimal_percona, but additionally omits applying table-level locks (DBZ-8717). This provides an alternative for environments where table locks are not permitted.
New Oracle source info scn and timestamp fields
Debezium has added several new fields to the source information block for Oracle change events (DBZ-8740), which include:
commit_ts_ms - This specifies the time in milliseconds when the event’s transaction was committed.
start_scn - This specifies the SCN for the first event observed in the event’s transaction.
start_ts_ms - This specifies the time in milliseconds when the first event in the event’s transaction was changed by the user.
These new fields are optional, so schema registry users should find these changes are backward compatible.
Oracle SCN values are not unique, so it is possible for multiple events to have the same SCN value and timestamps. Care should be taken when using these values for any type of event ordering.
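To illustrate how a consumer might read the new optional fields, here is a minimal Go sketch. It assumes that start_scn is serialized as a string and that both timestamps are integer milliseconds; the struct, the helper names, and the sample payload are hypothetical, so verify the types against the events your connector actually emits.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// oracleSource models only the new fields described above. Pointers are used
// because the fields are optional and may be absent from an event.
// Assumption: start_scn arrives as a string and the timestamps as integers.
type oracleSource struct {
	CommitTsMs *int64  `json:"commit_ts_ms"`
	StartScn   *string `json:"start_scn"`
	StartTsMs  *int64  `json:"start_ts_ms"`
}

// parseSource decodes a JSON source block into oracleSource.
func parseSource(raw string) (oracleSource, error) {
	var src oracleSource
	err := json.Unmarshal([]byte(raw), &src)
	return src, err
}

// transactionDurationMs returns the time between the first change in the
// transaction and its commit, or false when either field is missing.
func transactionDurationMs(src oracleSource) (int64, bool) {
	if src.CommitTsMs == nil || src.StartTsMs == nil {
		return 0, false
	}
	return *src.CommitTsMs - *src.StartTsMs, true
}

func main() {
	// Hypothetical source block, trimmed to the new fields.
	raw := `{"commit_ts_ms": 1700000005000, "start_scn": "1234567", "start_ts_ms": 1700000000000}`
	src, err := parseSource(raw)
	if err != nil {
		panic(err)
	}
	if d, ok := transactionDurationMs(src); ok {
		fmt.Printf("transaction was open for %d ms\n", d)
	}
}
```

Because the fields are optional, the sketch treats a missing value as "unknown" rather than as zero, which avoids computing misleading durations for events produced by older connector versions.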
Changes to Vitess Epoch/Zero date column resolution
When the Vitess value converter emits a date column that is set to a zero date value, the field may be emitted as null or as the Unix epoch, depending on the optionality of the column. This creates an issue for consumer applications: when they receive the epoch value, they cannot tell whether it is a true epoch value or the sentinel used because the column holds a zero date in the source.
Debezium 3.1 introduces a new configuration property for Vitess users, override.datetime.to.nullable.
The default, false, continues to emit the date column using the old behavior, where the Unix epoch is used rather than null if the column is declared as not null. This means that consumers will continue to be unable to differentiate between the two cases.
When set to true, all date and datetime columns are set as optional, meaning they can be serialized with null, regardless of how the column’s optionality is set in the source database. This means that if a zero date is set in the source system, the connector will always use null to represent it and will no longer use epoch-based values unless the field is populated with an actual non-zero date value.
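The difference between the two settings can be sketched from the consumer’s point of view. The following Go example is illustrative only: it assumes the date field arrives as a nullable number of days since the Unix epoch, and the describeDate helper is hypothetical rather than part of any Debezium API.

```go
package main

import (
	"fmt"
	"time"
)

// describeDate interprets a date field taken from a change event.
// Assumption: the value is a nullable count of days since the Unix epoch.
// overrideToNullable reflects the connector's override.datetime.to.nullable setting.
func describeDate(daysSinceEpoch *int32, overrideToNullable bool) string {
	if daysSinceEpoch == nil {
		// A null can stand for either a NULL column or a zero date.
		return "NULL or zero date in source"
	}
	date := time.Unix(int64(*daysSinceEpoch)*86400, 0).UTC().Format("2006-01-02")
	if *daysSinceEpoch == 0 && !overrideToNullable {
		// With the old behavior the epoch may be a real value or the sentinel.
		return "ambiguous: " + date + " or zero date"
	}
	return date
}

func main() {
	epoch := int32(0)
	fmt.Println(describeDate(&epoch, false))
	fmt.Println(describeDate(&epoch, true))
	fmt.Println(describeDate(nil, true))
}
```

With the property enabled, an epoch value can be trusted as a real date, at the cost of every date and datetime field becoming optional in the event schema.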
Changes to Vitess binary-collated tiny, medium, and long text columns
In Debezium 3.1.0.Alpha2, we introduced a change to emit Vitess binary-collated text, enum, and set column types as character-based field types in change events (DBZ-8679). This unfortunately only covered a subset of column types, and in this release we’ve expanded on that to include tinytext, mediumtext, and longtext types (DBZ-8694).
Be aware that if you use schema registry, the change in how |
CloudEvent traceparent support
Debezium’s CloudEvents support has been updated to include the traceparent attribute, which makes it possible to integrate with OpenTelemetry and pass the trace details as part of the event (DBZ-8669).
By setting the opentelemetry.tracing.attributes.enabled configuration property to true and including traceparent:header as part of metadata.source, this information will be made available to the CloudEvents converter.
You can customize the way that the converter populates the fields by changing the defaults and specifying the fields' values in the appropriate headers. For example:
{
"value.converter.metadata.source": "value,id:header,type:header,traceparent:header,dataSchemaName:header"
}
You can find other examples in Debezium’s CloudEvents documentation.
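On the consuming side, the traceparent value follows the W3C Trace Context layout of four dash-separated hexadecimal fields (version, trace ID, parent ID, flags). The Go sketch below shows one way a consumer might split that value after reading it from the event. The parseTraceParent helper and the traceParent type are hypothetical, and the sketch only handles the version 00 layout.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// traceParent holds the fields of a W3C traceparent value.
type traceParent struct {
	Version  string
	TraceID  string
	ParentID string
	Sampled  bool
}

// parseTraceParent splits a value such as
// "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01" into its parts.
func parseTraceParent(value string) (traceParent, error) {
	parts := strings.Split(value, "-")
	if len(parts) != 4 {
		return traceParent{}, errors.New("traceparent must have four dash-separated fields")
	}
	wantLen := []int{2, 32, 16, 2} // hex characters per field
	for i, p := range parts {
		if len(p) != wantLen[i] {
			return traceParent{}, fmt.Errorf("field %d has length %d, want %d", i, len(p), wantLen[i])
		}
		if _, err := hex.DecodeString(p); err != nil {
			return traceParent{}, fmt.Errorf("field %d is not hexadecimal: %w", i, err)
		}
	}
	flags, err := hex.DecodeString(parts[3])
	if err != nil {
		return traceParent{}, err
	}
	return traceParent{
		Version:  parts[0],
		TraceID:  parts[1],
		ParentID: parts[2],
		Sampled:  flags[0]&0x01 == 0x01, // lowest bit is the sampled flag
	}, nil
}

func main() {
	tp, err := parseTraceParent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	if err != nil {
		panic(err)
	}
	fmt.Printf("trace=%s parent=%s sampled=%t\n", tp.TraceID, tp.ParentID, tp.Sampled)
}
```

In practice an OpenTelemetry SDK propagator would normally do this parsing for you; the sketch is only meant to show what the attribute carries.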
Schema access support in WASM transformation
You can now access some schema details inside your TinyGo programs using the WASM transformation (DBZ-8737). Two new methods have been added to support this, GetSchemaName and GetSchemaType.
package main

import (
	"github.com/debezium/debezium-smt-go-pdk"
)

//export process
func process(proxyPtr uint32) uint32 {
	// Name of the value schema, for example "dummy.Envelope"
	var valueSchemaName = debezium.GetSchemaName(debezium.Get(proxyPtr, "valueSchema"))
	// Schema type of the "op" field within the value schema
	var opType = debezium.GetSchemaType(debezium.Get(proxyPtr, "valueSchema.op"))
	// Filter where the schema name or the op field's schema type match
	return debezium.SetBool(valueSchemaName == "dummy.Envelope" || opType == "string")
}

func main() {}
We welcome any and all feedback on how to improve the experience with the WASM transformation. Please reach out to us on our Zulip chat or log Jira enhancements.
Conditional inclusion of components in connect-base image
Debezium’s kafka and connect images are all derived from a single common image called connect-base. By default, this base image installs Apicurio, Jolokia, and OpenTelemetry dependencies. This is great for testing purposes, but if you wish to use Debezium’s images as a basis for your own, you may prefer to omit these dependencies if they’re not necessary for your environment.
The connect-base image can now be conditioned to omit any one of these dependencies (DBZ-8709). The OTL_ENABLED, APICURIO_ENABLED, and JOLOKIA_ENABLED environment variables can be set to no to omit those dependencies when building your images, creating a smaller image footprint.
The |
PubSub sink supports concurrency and compression
In order to improve throughput and capacity with Google PubSub, we have introduced the ability to specify several new configuration properties for PubSub to support concurrency and compression (DBZ-8715). These new configuration properties can be used in any existing PubSub configuration.
pubsub.concurrency.threads - This specifies the number of threads to be used to publish messages to Google PubSub. This can be used to scale up or to limit the number of PubSub threads created by the Google PubSub client library. By default, the PubSub sink uses the default behavior of the client library.
pubsub.compression.threshold.bytes - When set to a value of 0 or greater, the PubSub sink enables the optional use of compression to transmit batches of events to the PubSub endpoint. Whether compression will be used is defined by the provided threshold value. If the batch’s total bytes is less than the threshold, compression will not be used. If the batch’s total bytes is equal to or greater than the threshold, compression will be used.
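The threshold rule described above can be expressed as a small decision function. This Go sketch is not the sink’s implementation; the shouldCompress name is hypothetical, and it assumes that a negative threshold means compression is disabled, which is inferred from the "0 or greater" wording rather than stated outright.

```go
package main

import "fmt"

// shouldCompress reports whether a batch of the given size would be compressed.
// Assumption: a negative threshold disables compression entirely.
func shouldCompress(batchBytes, thresholdBytes int64) bool {
	if thresholdBytes < 0 {
		return false
	}
	// Compression applies when the batch is equal to or larger than the threshold.
	return batchBytes >= thresholdBytes
}

func main() {
	const threshold = 1024 // example value for pubsub.compression.threshold.bytes
	for _, size := range []int64{512, 1024, 4096} {
		fmt.Printf("batch=%d bytes, compress=%t\n", size, shouldCompress(size, threshold))
	}
}
```

Note that under this rule a threshold of 0 compresses every batch, since every batch size is at least zero bytes.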
PubSub sink supports locational endpoints
When working with the PubSub sink, the pubsub.address is often not sufficient for production systems where you may need to interact with location-specific (regional) endpoints. To address this concern, Debezium 3.1 introduces a new configuration property, pubsub.region (DBZ-8735).
The new pubsub.region property allows specifying the Google Cloud region to connect to, e.g. us-central1 or asia-northeast1. When specified, Debezium will use the location-specific endpoint for PubSub in the format <region>-pubsub.googleapis.com:443. This permits connecting to the location-specific endpoint instead of the global endpoint.
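The endpoint format above is simple enough to sketch directly. The following Go example is illustrative: the pubSubEndpoint helper is hypothetical, and the fallback to the global endpoint when no region is set is an assumption about how the pieces fit together rather than a description of the sink’s code.

```go
package main

import "fmt"

// globalEndpoint is the non-regional Google Cloud Pub/Sub endpoint.
const globalEndpoint = "pubsub.googleapis.com:443"

// pubSubEndpoint builds the location-specific endpoint for a region using the
// <region>-pubsub.googleapis.com:443 format.
// Assumption: an empty region falls back to the global endpoint.
func pubSubEndpoint(region string) string {
	if region == "" {
		return globalEndpoint
	}
	return fmt.Sprintf("%s-pubsub.googleapis.com:443", region)
}

func main() {
	for _, region := range []string{"us-central1", "asia-northeast1", ""} {
		fmt.Printf("region=%q endpoint=%s\n", region, pubSubEndpoint(region))
	}
}
```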
The |
RabbitMQ sink supports key-based routing
In Debezium 3.1, we have changed how you can route events using configuration. This new approach uses a strategy-based design that retains old behaviors and introduces the new key-based routing mechanism (DBZ-8752).
First and foremost, the rabbitmq.routingKeyFromTopicName property is deprecated and will be removed in a future release. This functionality has been folded into the new rabbitmq.routingKey.source configuration property, which can be set to one of the following values:
static - When using the static routing source, the RabbitMQ sink will use the rabbitmq.routingKey static value you have specified in the sink’s configuration. As this value is set in the configuration and read only during the sink startup, the value is static and does not change over the runtime of the sink.
topic - When using the topic routing source, the RabbitMQ sink will source the routing key based on the destination topic name. This mode replaces the old rabbitmq.routingKeyFromTopicName configuration property behavior, which is now deprecated.
key - When using the new key routing source, the RabbitMQ sink will source the routing key based on the event’s record key. This provides the flexibility to control the routing mechanism for RabbitMQ by using the raw Debezium change event’s key or by using a custom transformation to change the event’s key in-flight before sending the event to RabbitMQ.
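The three routing sources can be pictured as a small strategy switch. The Go sketch below is a simplified model, not the sink’s actual code: the changeEvent type, the routingKey function, and the sample topic and key values are all hypothetical.

```go
package main

import "fmt"

// changeEvent is a simplified, hypothetical stand-in for a record handed to the sink.
type changeEvent struct {
	Topic string // destination topic name
	Key   string // record key, already serialized to a string
}

// routingKey picks the RabbitMQ routing key according to the configured source,
// mirroring the static, topic, and key values of rabbitmq.routingKey.source.
func routingKey(source, staticKey string, ev changeEvent) (string, error) {
	switch source {
	case "static":
		return staticKey, nil // fixed value read once from configuration
	case "topic":
		return ev.Topic, nil // replaces rabbitmq.routingKeyFromTopicName
	case "key":
		return ev.Key, nil // derived from the event's record key
	default:
		return "", fmt.Errorf("unknown routing key source %q", source)
	}
}

func main() {
	ev := changeEvent{Topic: "server1.inventory.customers", Key: "1001"}
	for _, source := range []string{"static", "topic", "key"} {
		key, err := routingKey(source, "debezium", ev)
		if err != nil {
			panic(err)
		}
		fmt.Printf("source=%s routingKey=%s\n", source, key)
	}
}
```

Modeling the choice as one property with named values, rather than a set of boolean flags, leaves room for additional routing sources later without new flags that could conflict with each other.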
Other changes
The following are some noteworthy changes in 3.1.0.Beta1:
- SQL Server Connector cannot be upgraded to 2.0 DBZ-5845
- JDBC sink connector doesn’t delete rows from a postgres db table DBZ-8287
- MariaDB adapter fails on an ALTER USER statement DBZ-8436
- Expressions cause SQL parser exception in Percona SEQUENCE_TABLE function DBZ-8559
- Slow Debezium startup for large number of tables DBZ-8595
- Debezium doesn’t shut down correctly when encountering message delivery timeout from pub/sub DBZ-8672
- Broken pipe on streaming connection after blocking snapshot (Postgres) DBZ-8680
- Support debezium platform in the release pipeline DBZ-8682
- Create pipeline for package helm charts and publish on quay.io DBZ-8706
- ts_ms in source may default to 0 instead of Instant.now() DBZ-8708
- PDB database name default considering as UPPERCASE DBZ-8710
- Alter table modify column fails when using DEFAULT ON NULL clause DBZ-8720
- ExtractChangedRecordState SMT Now Working With Default Values DBZ-8721
- Restart of Oracle RAC node leads to redo thread being inconsistent indefinitely DBZ-8724
- Specifying archive.log.hours with non-zero value generates bad SQL DBZ-8725
- debezium/connect docker image is not available on arm64 DBZ-8728
- Create an orchestrator pipeline to run the release DBZ-8731
- Debezium Server: Nats consumer crashes with binary serialization DBZ-8734
- Update the way tests calculates the default zoned times for MariaDB driver 3.5 DBZ-8742
- Possibly broken schema.history.internal.skip.unparseable.ddl for MariaDB DBZ-8745
- Oracle snapshot’s source.ts does not account for database zone differences DBZ-8749
- Bump assertj-core to 3.27.3 DBZ-8751
In total, 38 issues were resolved in Debezium 3.1.0.Beta1. The list of changes can also be found in our release notes.
A big thank you to all the contributors from the community who worked diligently on this release: Alvar Viana, Andrea Peruffo, Chris Cranford, Indra Shukla, Jakub Cechacek, James Johnston, Jiri Pechanec, Katerina Galieva, Krzysztof Grzechnik, Mario Fiore Vitale, Michael Cambria, Minjae Lee, Minjae Lee, Nathan Smit, Rodolphe Quiedeville, Roman Kudryashov, Thomas Thornton, Victor Castaño, Vojtech Juranek, Yannick Eisenschmidt, Zhongqiang Gong, kesompochy, and حمود سمبول!
About Debezium
Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.
Get involved
We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Zulip, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve our existing connectors and add even more connectors. If you find problems or have ideas for how we can improve Debezium, please let us know or log an issue.