I am thrilled to share that Debezium 2.0.0.Alpha3 has been released!
While this release contains a plethora of bugfixes, there are a few noteworthy improvements, which include providing a timestamp in transaction metadata events, the addition of several new fields in Oracle’s change event source block, and a non-backward compatible change to the Oracle connector’s offsets.
Let's take a closer look at these.
Transaction metadata changes
A transaction metadata event describes the beginning and the end (commit) of a database transaction. These events are useful for a variety of reasons, including auditing. By default, a connector does not generate transaction metadata events; to enable this feature, the provide.transaction.metadata option must be set to true.
In this release, both BEGIN and END events include a new field, ts_ms, which is the database timestamp of when the transaction either began or committed, depending on the event type. An example of such an event now looks like:
{
"status": "END",
"id": "12345",
"event_count": 2,
"ts_ms": "1657033173441",
"data_collections": [
{
"data_collection": "s1.a",
"event_count": 1
},
{
"data_collection": "s2.a",
"event_count": 1
}
]
}
If you are already using the transaction metadata feature, new events will contain this field after upgrading.
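As a minimal sketch of how a consumer might use the new field, the snippet below converts ts_ms from a transaction metadata event into a UTC datetime. The function name transaction_time is hypothetical, and the sketch assumes the event payload has already been extracted as a JSON string; it accepts ts_ms as either a number or a numeric string like the one in the example above.

```python
import json
from datetime import datetime, timedelta, timezone


def transaction_time(event_json: str) -> datetime:
    """Return the database timestamp of a BEGIN or END event as a UTC datetime."""
    event = json.loads(event_json)
    # int() handles both a JSON number and a numeric string.
    millis = int(event["ts_ms"])
    # Split into whole seconds and milliseconds to avoid float rounding.
    seconds, remainder = divmod(millis, 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=remainder)


# Abbreviated copy of the END event shown above.
sample = '{"status": "END", "id": "12345", "event_count": 2, "ts_ms": "1657033173441"}'
print(transaction_time(sample).isoformat())
```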
If you are not using the transaction metadata feature but find this useful, simply set the provide.transaction.metadata option to true in your connector configuration. By default, metadata events are emitted to a topic named after your database.server.name option. This can be overridden by specifying the transaction.topic option, as shown below:
database.server.name=server1
provide.transaction.metadata=true
transaction.topic=my-transaction-events
In this example, all transaction metadata events will be emitted to my-transaction-events. Please see your connector-specific configuration for more details.
Oracle source info changes
The source information block is a section in the change event’s payload that describes the database attributes of what generated the change event. For example, this section includes the system change number, the database timestamp of the change, and the transaction the change was part of.
In this release, we identified and fixed a regression where the scn field did not reflect the correct position at which the change event occurred. While it isn’t abnormal for Oracle to generate multiple changes with the same system change number, the regression caused the wrong system change number to be assigned to each individual event within a scoped transaction, which made it difficult for some to use this information for auditing purposes. The source.scn field should now correctly reflect the system change number from Oracle LogMiner or Oracle XStream.
Additionally, several new fields were added to the source information block to improve integration with the LogMiner implementation and Oracle RAC. An example of the new source information block:
{
"source": {
"version": "2.0.0.Alpha3",
"name": "server1",
"ts_ms": 1520085154000,
"txId": "6.28.807",
"scn": "2122184",
"commit_scn": "2122185",
"rs_id": "001234.00012345.0124",
"ssn": 0,
"redo_thread": 1
}
}
The newly added fields are:
rs_id - Specifies the rollback segment identifier associated with the change.
ssn - Specifies the SQL sequence number; combined with the rs_id, it forms a unique tuple for a change.
redo_thread - Specifies the actual database redo thread that managed the change’s lifecycle.
Whether using Oracle Standalone or RAC, these values will always be provided when using Oracle LogMiner. These values have more importance on an Oracle RAC installation because you have multiple database servers manipulating the shared database concurrently. These fields specifically annotate which node the change originated on and at what position on that node.
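To illustrate how these fields could be used together, here is a small sketch that groups the (rs_id, ssn) position of each change by the redo thread it originated on. The function name positions_by_redo_thread is hypothetical, the events are assumed to be already-deserialized change event payloads, and all sample values other than the first rs_id are made up for the example.

```python
from collections import defaultdict


def positions_by_redo_thread(events):
    """Group each change's (rs_id, ssn) tuple by its redo_thread."""
    grouped = defaultdict(list)
    for event in events:
        source = event["source"]
        grouped[source["redo_thread"]].append((source["rs_id"], source["ssn"]))
    return dict(grouped)


# Hypothetical events carrying only the new source fields.
events = [
    {"source": {"rs_id": "001234.00012345.0124", "ssn": 0, "redo_thread": 1}},
    {"source": {"rs_id": "001234.00012345.0125", "ssn": 0, "redo_thread": 2}},
    {"source": {"rs_id": "001234.00012345.0126", "ssn": 1, "redo_thread": 1}},
]
print(positions_by_redo_thread(events))
```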
Oracle connector offset changes
In an Oracle Real Application Clusters (RAC) environment, multiple nodes access and manipulate the Oracle database concurrently. Each node maintains its own redo log buffers and executes its own redo writer thread. This means that at any given moment, each node has its own unique "position", and these positions differ depending on the activity that takes place on each respective node.
In this release, a small change was necessary in DBZ-5245 to support Oracle RAC. Previously, the connector offsets maintained a field called scn, which represented the "position" from which the connector should stream changes. But since each node could be at a different position in the redo, a single scn value was inadequate for Oracle RAC.
The old Oracle connector offsets looked like this:
{
"scn": "1234567890",
"commit_scn": "2345678901",
"lcr_position": null,
"txId": null
}
Starting in 2.0.0.Alpha3, the new offset structure now has this form:
{
"scn": "1234567890:00124.234567890.1234:0:1,1234567891:42100.0987656432.4321:0:2",
"commit_scn": "2345678901",
"lcr_position": null,
"txId": null
}
You will notice that the scn field now consists of a comma-separated list of values, where each entry represents a tuple of values. This new tuple has the format scn:rollback-segment-id:ssn:redo-thread.
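The tuple format described above can be split apart with a few lines of code. The following is only an illustrative sketch, not the connector's own parsing logic; the names RedoThreadPosition and parse_scn_offset are hypothetical.

```python
from typing import List, NamedTuple


class RedoThreadPosition(NamedTuple):
    scn: int
    rs_id: str
    ssn: int
    redo_thread: int


def parse_scn_offset(value: str) -> List[RedoThreadPosition]:
    """Parse a comma-separated list of scn:rollback-segment-id:ssn:redo-thread tuples."""
    positions = []
    for entry in value.split(","):
        parts = entry.split(":")
        if len(parts) != 4:
            raise ValueError(f"expected 4 colon-separated parts, got: {entry!r}")
        scn, rs_id, ssn, redo_thread = parts
        positions.append(RedoThreadPosition(int(scn), rs_id, int(ssn), int(redo_thread)))
    return positions


# The scn value from the new offset structure shown above.
new_scn = "1234567890:00124.234567890.1234:0:1,1234567891:42100.0987656432.4321:0:2"
for position in parse_scn_offset(new_scn):
    print(position)
```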
This change is forward compatible, meaning you can safely upgrade to 2.0.0.Alpha3 and the old format can still be read. However, once the new format is written to the offsets, older versions of the connector will be unable to read them. If you upgrade and then decide you need to roll back, be aware that you’ll need to manually adjust the scn field in the connector offsets so that it contains a single string holding the most recent scn value across all redo threads.
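As a rough sketch of that manual adjustment, the snippet below derives the old-style scn string from a new-style offset. It assumes that "most recent" means the numerically highest SCN among the entries, and the function name downgrade_offset is hypothetical. How the adjusted offset is written back to your offset storage is outside the scope of this example.

```python
def downgrade_offset(offset: dict) -> dict:
    """Return a copy of the offset whose scn is a single old-style value."""
    # Each entry is scn:rollback-segment-id:ssn:redo-thread; keep only the scn part.
    scns = [int(entry.split(":")[0]) for entry in offset["scn"].split(",")]
    legacy = dict(offset)
    # Assumption: the most recent position is the highest SCN across redo threads.
    legacy["scn"] = str(max(scns))
    return legacy


new_offset = {
    "scn": "1234567890:00124.234567890.1234:0:1,1234567891:42100.0987656432.4321:0:2",
    "commit_scn": "2345678901",
    "lcr_position": None,
    "txId": None,
}
print(downgrade_offset(new_offset))
```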
Other fixes & improvements
There are several bugfixes and stability changes in this release. Some noteworthy ones are:
- Incorrect loading of LSN from offsets DBZ-3942
- Database history recovery will retain old tables after they’ve been renamed DBZ-4451
- Adding new table with incremental snapshots not working DBZ-4834
- BigDecimal has mismatching scale value for given Decimal schema DBZ-4890
- Debezium has never found starting LSN DBZ-5031
- Data duplication problem using postgresql source on debezium server DBZ-5070
- Cursor fetch is used for all results during connection DBZ-5084
- Debezuim connector fails at parsing select statement overrides when table name has space DBZ-5198
- DDL statement couldn’t be parsed 2 - Oracle connector 1.9.3.Final DBZ-5230
- Debezium server duplicates scripting jar files DBZ-5232
- Cannot convert field type tinyint(1) unsigned to boolean DBZ-5236
- Oracle unparsable ddl create table DBZ-5237
- Postgres Incremental Snapshot on parent partitioned table not working DBZ-5240
- Character set influencers are not properly parsed on default values DBZ-5241
- NPE when using Debezium Embedded in Quarkus DBZ-5251
- Oracle LogMiner may fail with an in-progress transaction in an archive log that has been deleted DBZ-5256
- Order of source block table names in a rename schema change event is not deterministic DBZ-5257
- Debezium fails to connect to replicaset if a node is down DBZ-5260
- No changes to commit_scn when oracle-connector got new lob data DBZ-5266
- Invalid date 'SEPTEMBER 31' DBZ-5267
- database.history.store.only.captured.tables.ddl not suppressing logs DBZ-5270
- io.debezium.text.ParsingException: DDL statement couldn’t be parsed DBZ-5271
- Deadlock during snapshot with Mongo connector DBZ-5272
- Mysql parser is not able to handle variables in KILL command DBZ-5273
- Debezium server fail when connect to Azure Event Hubs DBZ-5279
- ORA-01086 savepoint never established raised when database history topic cannot be created or does not exist DBZ-5281
- Enabling database.history.store.only.captured.tables.ddl does not restrict history topic records DBZ-5285
Altogether, a total of 66 issues were fixed for this release.
A big thank you to all the contributors from the community who worked on this release: Anisha Mohanty, Bob Roldan, Chai Stofkoper, Chris Cranford, Mikhail Dubrovin, Gunnar Morling, Harvey Yue, Jakub Cechacek, Jiri Novotny, Jiri Pechanec, Jun Zhao, Kanha Gupta, Mark Bereznitsky, Mickael Maison, Mike Kamornikov, Naveen Kumar KR, Oskar Polak, Rahul Khanna, Robert Roldan, Tim Patterson, Vojtech Juranek, and yangrong688!
What’s Next?
You can expect a 1.9.5.Final release in the next week. This release will include many of the bugfixes that are part of this release, as we continue to improve the stability of 1.9 in micro-releases.
You can also expect 2.0.0.Beta1 in the next 3 weeks, keeping with our usual release cadence. The next major milestone includes unifying snapshot modes across connectors, a new Snapshotter API for all connectors, compactable JSON database history, offset unification, an offset storage API, and much more.
About Debezium
Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.
Get involved
We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Zulip, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve our existing connectors and add even more connectors. If you find problems or have ideas for how we can improve Debezium, please let us know or log an issue.