I am thrilled to share that Debezium 2.0.0.Beta1 has been released!
This release contains several new features including a pluggable topic selector, the inclusion of database user who committed changes for Oracle change events, and improved handling of table unique indices as primary keys. In addition, there are several breaking changes such as the move to multi-partition mode as default and the introduction of the debezium-storage
module and its implementations. So lets take a look at all these in closer detail.
Multi-partition mode now default
Many database platforms support multi-tenancy out of the box, meaning you can have one installation of the database engine and have many unique databases. In cases like SQL Server, this traditionally required a separate connector deployment for each unique database. Over the last year, a large effort has been made to break down that barrier and to introduce a common way that any single connector deployment could connect and stream changes from multiple databases.
The first notable change is with the SQL Server connector’s configuration option, database.dbname
. This option has been replaced with a new option called database.names
. As multi-partition mode is now default, this new database.names
option can be specified using a comma-separated list of database names, as shown below:
database.names=TEST1,TEST2
In this example, the connector is being configured to capture changes from two unique databases on the same host installation. The connector will start two unique tasks in Kafka Connect and each task will be responsible for streaming changes from its respective database concurrently.
The second notable change is with connector metrics naming. A connector exposes JMX metrics via beans that are identified with a unique name. With multi-partition mode the default with multiple tasks, each task requires its own metrics bean and so a change in the naming strategy was necessary.
In older versions of Debezium using SQL Server as an example, metrics were available using the following naming strategy:
debezium.sql_server:type=connector-metrics,server=<sqlserver.server.name>,context=<context>
In this release, the naming strategy now includes a new task
component in the JMX MBean name:
debezium.sql_server:type=connector-metrics,server=<sqlserver.server.name>,task=<task.id>,context=<context>
Please review your metrics configurations as the naming changes could have an impact when collecting Debezium metrics.
Debezium storage module
In this release, we have introduced a new debezium-storage
set of artifacts for file- and kafka- based database history and offset storage. This change is the first of several future implementations set to support platforms such as Amazon S3, Redis, and possibly JDBC.
For users who install connectors via plugin artifacts, this should be a seamless change as all dependencies are bundled in those plugin downloadable archives. For users who may embed Debezium in their applications or who may be building their own connector, be aware you may need to add a new storage dependency depending on which storage implementations used.
Pluggable topic selector
Debezium’s default topic naming strategy emits change events to topics named database.schema.table
. If you require that topics be named differently, an SMT would normally be added to the connector configuration to adjust this behavior. But, this presents a challenge in situations where one of the components of this topic name, perhaps the database or table name, contains a dot (.
) and perhaps an SMT doesn’t have adequate context.
In this release, a new TopicNamingStrategy
was introduced to allow fully customizing this behavior directly inside Debezium. The default naming strategy implementation should suffice in most cases, but if you find that it doesn’t you can provide a custom implementation of the TopicNamingStrategy
contract to fully control various namings used by the connector. To provide your own custom strategy, you would specify the topic.naming.strategy
connector option with the fully-qualified class name of the strategy, as shown below:
topic.naming.strategy=org.myorganization.MyCustomTopicNamingStrategy
This custom strategy is not just limited to controlling the names of topics for table mappings, but also for schema changes, transaction metadata, and heartbeats. You can refer to the DefaultTopicNamingStrategy
found here as an example. This feature is still incubating and we’ll continue to improve and develop it as feedback is received.
Oracle commit user in change events
The source information block of change events carry a variety of context about where the change event originated. In this release, the Oracle connector now includes the user who made the database change in the captured change event. A new field, user_name
, can now be found in the source info block with this new information. This field is optional, and is only available when changes are emitted using the LogMiner-based implementation. This field may also contain the value of UNKNOWN
if the user associated with a change is dropped prior to the change being captured by the connector.
Improved table unique index handling
A table does not have to have a primary key to be captured by a Debezium connector. In cases where a primary key is not defined, Debezium will inspect a table’s unique indices to see whether a reasonable key substitution can be made. In some situations, the index may refer to columns such as CTID
for PostgreSQL or ROWID
in Oracle. These columns are not visible nor user-defined, but instead are hidden synthetic columns generated automatically by the database. In addition, the index may also use database functions to transform the column value that is stored, such as UPPER
or LOWER
for example.
In this release, indices that rely on hidden, auto-generated columns, or columns wrapped in database functions are no longer eligible as primary key alternatives. This guarantees that when relying on an index as a primary key rather than a defined primary key itself, the generated message’s primary key value tuple directly maps to the same values used by the database to represent uniqueness.
Other fixes & improvements
There are several bugfixes and stability changes in this release, some noteworthy are:
-
MongoConnector’s field exclusion configuration does not work with fields with the same name but from different collections DBZ-4846
-
Remove redundant setting of last events DBZ-5047
-
Rename
docker-images
repository and JIRA component tocontainer-images
DBZ-5048 -
Read Debezium Metrics From Debezium Server Consumer DBZ-5235
-
User input are not consistent on Filter step for the DBZ connectors DBZ-5246
-
KafkaDatabaseHistory without check database history topic create result caused UnknowTopicOrPartitionException DBZ-5249
-
Treat SQLServerException with "Broken pipe (Write failed)" exception message as a retriable exception DBZ-5292
-
Lob type data is inconsistent between source and sink, after modifying the primary key DBZ-5295
-
Caused by: java.io.EOFException: Failed to read next byte from position 2005308603 DBZ-5333
-
Incremental Snapshot: Oracle table name parsing does not support periods in DB name DBZ-5336
-
Support PostgreSQL default value function calls with schema prefixes DBZ-5340
-
Unsigned tinyint conversion fails for MySQL 8.x DBZ-5343
-
Log a warning when an unsupported LogMiner operation is detected for a captured table DBZ-5351
-
NullPointerException thrown when unique index based on both system and non-system generated columns DBZ-5356
-
MySQL Connector column hash v2 does not work DBZ-5366
-
Outbox JSON expansion fails when nested arrays contain no elements DBZ-5367
-
docker-maven-plugin needs to be upgraded for Mac Apple M1 DBZ-5369
-
AWS DocumentDB (with MongoDB Compatibility) Connect Fail DBZ-5371
-
Oracle Xstream does not propagate commit timestamp to transaction metadata DBZ-5373
-
UI View connector config in non-first cluster return 404 DBZ-5378
-
CommitScn not logged in expected format DBZ-5381
-
org.postgresql.util.PSQLException: Bad value for type timestamp/date/time: CURRENT_TIMESTAMP DBZ-5384
-
Missing "previousId" property with parsing the rename statement in kafka history topic DBZ-5386
-
Check constraint introduces a column based on constraint in the schema change event. DBZ-5390
-
Support storing extended attributes in relational model and JSON schema history topic DBZ-5396
-
The column is referenced as PRIMARY KEY, but a matching column is not defined in table DBZ-5398
-
Clarify which database name to use for signal.data.collection when using Oracle with pluggable database support DBZ-5399
-
Timestamp with time zone column’s default values not in GMT DBZ-5403
-
Upgrade to Kafka 3.1 broke build compatibility with Kafka 2.x and Kafka 3.0 DBZ-5404
-
Remove the duplicated SimpleDdlParserListener from mysql connector DBZ-5425
Altogether, a total of 59 issues were fixed for this release.
A big thank you to all the contributors from the community who worked on this release: Andrew Walker, Anisha Mohanty, Bob Roldan, Chris Cranford, Giljae Joo, Harvey Yue, Henry Cai, Hossein Torabi, Ismail Simsek, Jan Doms, Jiri Pechanec, Martin Medek, Nathan Smit, Paweł Malon, Pengwei Dou, Sergei Morozov, Vojtech Juranek, and Zhongqiang Gong!
What’s Next?
In these last few months, the team has made some incredible progress on Debezium 2.0, and we can begin to see the finish line in the distance. A large of this is in part to the grew work the community has done to contribute changes, provide feedback, and to test and help make new features stable. But we’re not done, so you can continue to expect another 2.0.0.Beta2 release in approximately 3 weeks, sticking with our usual cadence.
In addition, we do continue to backport changes to the 1.9 branch and will likely look at a 1.9.6.Final release sometime in August to round out that release stream just before we wrap up Debezium 2.0.0.Final.
So stay cool and safe and happy capturing!
About Debezium
Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.
Get involved
We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Zulip, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve ours existing connectors and add even more connectors. If you find problems or have ideas how we can improve Debezium, please let us know or log an issue.