9 min read

Replication Internals: Decoding the MySQL Binary Log Part 9: XID_EVENT — Transaction Commit

Replication Internals: Decoding the MySQL Binary Log Part 9: XID_EVENT — Transaction Commit

In this ninth post of our series, we decode the XID_EVENT — the smallest event in the binary log, and the one that marks every transactional commit.


Introduction

Every DML transaction we've decoded so far ends the same way: with an XID_EVENT (event type 16, 0x10). At only 31 bytes on the wire, it carries a single piece of information — an 8-byte transaction identifier — but it does some of the heaviest lifting in MySQL replication. The XID_EVENT is what allows replication to be crash-safe with the default binlog_format = ROW and transaction-isolation = REPEATABLE-READ.

When the source commits a transaction that touches an XA-capable storage engine (in practice: InnoDB), it writes an XID_EVENT as the final event of the transaction in the binary log. The numeric XID inside is the same identifier used by the InnoDB two-phase commit protocol, so the binary log and the engine's redo log share a common reference point. After a crash, the server (or a replica) can match XIDs across the two logs to decide which transactions to commit and which to roll back.

For non-transactional or empty transactions, MySQL writes a QUERY_EVENT with the literal text COMMIT instead — we'll see exactly when this happens later in the post.


Event Locations

Three of the four transactions in our binary log finish with an XID_EVENT — one per DML statement:

INSERT transaction:
  Position   688: WRITE_ROWS_EVENT
  Position   737: XID_EVENT (31 bytes) ← Commit

UPDATE transaction:
  Position  1007: UPDATE_ROWS_EVENT
  Position  1079: XID_EVENT (31 bytes) ← Commit

DELETE transaction:
  Position  1340: DELETE_ROWS_EVENT
  Position  1397: XID_EVENT (31 bytes) ← Commit

The fourth transaction — the CREATE TABLE at position 276 — does not end with an XID_EVENT. DDL is implicitly committed inside the QUERY_EVENT itself, using the Q_DDL_LOGGED_WITH_XID status variable we decoded in Part 6. That XID was 54. As we'll see in a moment, our DML XIDs pick up right where that one left off.

Let's decode the first one at position 737.


Reading the Raw Bytes

$ xxd -s 737 -l 31 binlog.000024
000002e1: 3210 3568 1001 0000 001f 0000 0000 0300  2.5h............
000002f1: 0000 0037 0000 0000 0000 00d6 65e1 cf    ...7........e..

The event is 31 bytes: 19-byte common header + 0-byte post-header + 8-byte body + 4-byte checksum. This is the smallest event we'll meet in the entire series.


Common Header (19 bytes)

32103568 10 01000000 1f000000 00030000 0000
│        │  │        │        │        │
│        │  │        │        │        └─→ Flags: 0x0000
│        │  │        │        └───────────→ Next Position: 768
│        │  │        └────────────────────→ Event Size: 31 bytes
│        │  └─────────────────────────────→ Server ID: 1
│        └────────────────────────────────→ Event Type: 16 (XID_EVENT)
└─────────────────────────────────────────→ Timestamp: 1748308018
FieldBytesLittle-EndianValue
Timestamp321035680x683510321748308018 (2025-05-27 01:06:58)
Event Type100x1016 (XID_EVENT)
Server ID010000000x000000011
Event Size1f0000000x0000001f31 bytes
Next Position000300000x00000300768
Flags00000x0000No flags

Cross-check: 737 + 31 = 768, which matches the next event's position. ✓


XID_EVENT Structure

The XID_EVENT is the simplest event we've decoded. Its layout is documented in the Xid_event class:

FieldSizeDescription
Post-Header0 bytesEmpty
Body
xid8 bytesTransaction identifier (little-endian uint64)

The post-header is empty — and we already knew that. The FORMAT_DESCRIPTION_EVENT we decoded in Part 3 reported that index 15 of the post-header length array (XID_EVENT, type code 16) is 0x00.


Field-by-Field Decoding

XID: 3700000000000000

37 00 00 00 00 00 00 00 → little-endian uint64 → 0x0000000000000037 = 55

The transaction identifier is 55.

So what does this number actually represent? It's a MySQL-internal transaction ID — the same handle InnoDB uses to coordinate the two-phase commit between the redo log and the binary log. When the server enters its commit path, MYSQL_BIN_LOG::commit() in sql/binlog.cc reads it from the session's transaction context:

TC_LOG::enum_result MYSQL_BIN_LOG::commit(THD *thd, bool all) {
  ...
  Transaction_ctx *trn_ctx = thd->get_transaction();
  my_xid xid = trn_ctx->xid_state()->get_xid()->get_my_xid();
  ...
}

my_xid is just a typedef for ulonglong (sql/xa.h:66) — a plain 64-bit integer. That number is what gets written to the binary log.

Where does the value come from? Internally, MySQL builds an XA-style 128-byte XID structure (xid_t) for every transaction, with the format "MySQLXid" + server_id + my_xid (xid_t::set()). InnoDB sees that full structure during prepare/commit, but only the trailing 8-byte my_xid portion is what lands in the XID_EVENT. The same value also appears in InnoDB's redo log for the prepared transaction, which is the link that makes 2PC work.

Checksum: d665e1cf

d6 65 e1 cf → CRC32 of the entire event

Like every other event since MySQL 5.6.2, the trailing 4 bytes are the CRC32 over the event header and body.


Visual Breakdown

Position 737: XID_EVENT (31 bytes)

┌─────────────────────────────────────────────────────────────────────────┐
│                         COMMON HEADER (19 bytes)                        │
├─────────────────────────────────────────────────────────────────────────┤
│ 32103568     │ 10   │ 01000000 │ 1f000000 │ 00030000 │ 0000            │
│ Timestamp    │ Type │ ServerID │ Size     │ NextPos  │ Flags           │
│ 1748308018   │ 16   │ 1        │ 31       │ 768      │ 0x0000          │
└─────────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────────┐
│                            POST-HEADER                                  │
├─────────────────────────────────────────────────────────────────────────┤
│ (empty — XID_EVENT has no post-header)                                  │
└─────────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────────┐
│                             BODY (8 bytes)                              │
├──────────────────────────────┬──────────────────────────────────────────┤
│ 3700000000000000             │ xid: 55 (little-endian uint64)           │
└──────────────────────────────┴──────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────────┐
│                           CHECKSUM (4 bytes)                            │
├──────────────────────────────┬──────────────────────────────────────────┤
│ d665e1cf                     │ CRC32                                    │
└──────────────────────────────┴──────────────────────────────────────────┘

mysqlbinlog output: COMMIT /* xid=55 */

Comparing All Three XID Events

Our binary log has three XID_EVENTs, and they are nearly identical — only the timestamp, the next-position pointer, the XID itself, and the checksum change:

PositionTimestampNext PosXIDTransaction
737174830801876855INSERT INTO person VALUES (1, 'Marcelo')
10791748308018111056UPDATE person SET name = 'Marcelo Altmann'
13971748308018142857DELETE FROM person WHERE ID = 1

The XIDs form a strictly increasing sequence. And remember the CREATE TABLE QUERY_EVENT from Part 6 — its Q_DDL_LOGGED_WITH_XID status variable carried xid=54. Putting that together with what we just decoded, every transaction in our workload got a unique XID:

54 → CREATE TABLE  (recorded in Q_DDL_LOGGED_WITH_XID, no XID_EVENT)
55 → INSERT        (XID_EVENT at position 737)
56 → UPDATE        (XID_EVENT at position 1079)
57 → DELETE        (XID_EVENT at position 1397)

All four XIDs are sequential, regardless of whether the commit was logged as a DDL inside a QUERY_EVENT or as a separate XID_EVENT. The same internal counter is feeding both paths.


When You Get an XID_EVENT (and When You Don't)

Not every committed transaction produces an XID_EVENT. The decision happens inside MYSQL_BIN_LOG::commit() — there's a chain of if/else branches that pick which "commit marker" to write at the end of the transaction cache:

else if (real_trans && xid && trn_ctx->rw_ha_count(trx_scope) > 1 &&
         !trn_ctx->no_2pc(trx_scope)) {
  Xid_log_event end_evt(thd, xid);
  if (cache_mngr->trx_cache.finalize(thd, &end_evt)) return RESULT_ABORTED;
}
...
else {
  Query_log_event end_evt(thd, STRING_WITH_LEN("COMMIT"), true, false, true,
                          0, true);
  if (cache_mngr->trx_cache.finalize(thd, &end_evt)) return RESULT_ABORTED;
}

In other words, the source emits an XID_EVENT only when all four of these are true:

  1. real_trans — this is a real transaction commit, not a sub-statement boundary.
  2. xid is non-zero — at least one storage engine assigned an XID to the transaction. InnoDB always does; pure-MyISAM transactions don't.
  3. rw_ha_count(trx_scope) > 1 — the transaction touched more than one read-write handlerton. The binary log itself counts as one, so "more than one" really means "at least one transactional engine on top of the binary log."
  4. !no_2pc(trx_scope) — none of the participating engines opted out of two-phase commit.

If those conditions aren't met, the transaction is closed with a Query_log_event carrying the literal text COMMIT instead.


Why XID Is Enough for Crash Safety

The XID_EVENT looks simple — eight little-endian bytes — but it's the heart of crash-safe replication. To see why, follow the order in which things actually become durable.

A committing transaction passes through three ordered phases inside the server:

  1. Engine prepare (deferred fsync). MYSQL_BIN_LOG::prepare() sets thd->durability_property = HA_IGNORE_DURABILITY and then calls ha_prepare_low(). InnoDB writes a "prepare" record carrying our XID into its in-memory redo log buffer, but does not fsync it yet — the durability flag tells it to wait. The same comment in the source spells out why: the server wants to fsync the prepare records of an entire group together, just before writing them to the binary log.
  2. Engine commit. The COMMIT stage runs ha_commit_low(), which tells InnoDB to write a "commit" record into the redo log. The transaction's row changes are now visible to other sessions.

Group flush. Inside MYSQL_BIN_LOG::ordered_commit(), the flush stage calls ha_flush_logs(true) before writing anything to the binary log. The inline comment is unambiguous:

We flush prepared records of transactions to the log of storage engine (for example, InnoDB redo log) in a group right before flushing them to binary log.

After that call returns, every prepare record for this commit group is durably on disk in the engine's redo log. Only then are the per-session binlog caches (including our XID_EVENT) appended to the binary log file, and the SYNC stage fsyncs the binary log.

That ordering — engine prepare durable → binlog durable → engine commit — is what turns the XID_EVENT into a fence: by the time the XID_EVENT is on disk in the binary log, the matching prepare record is guaranteed to already be on disk in the engine.

On restart after a crash, Binlog_recovery::recover() opens the last binary log file, walks every event, and collects the XID of every successfully-terminated transaction:

The recovered set is then handed to ha_recover(). From the Binlog_recovery class doc:

After a crash, storage engines may contain transactions that are prepared but not committed (in theory any engine, in practice InnoDB). This class's methods use the binary log as the source of truth to determine which of these transactions should be committed and which should be rolled back. […] The list of XIDs of all internally coordinated transactions that are completely written to the binary log is passed to the storage engines through the ha_recover function in the handler interface. This tells the storage engines to commit all prepared transactions that are in the set, and to roll back all prepared transactions that are not in the set.

That single rule closes every crash window:

Crash pointPrepare durable in redo?XID_EVENT durable in binlog?Recovery decision
Before the group flushNoNoNo prepared txn for this XID — nothing to do
After group flush, before binlog fsyncYesNoEngine rolls back the prepared txn
After binlog fsync, before engine commitYesYesEngine commits the prepared txn
After engine commit(already committed)YesNothing to do

Without the XID_EVENT, recovery would have no way to map a prepared transaction in InnoDB's redo log back to a transaction in the binary log. The eight bytes are doing real work.


Try It Yourself

Decoding XID_EVENT is short enough to fit in a screenful:

import struct
from datetime import datetime, timezone

XID_EVENT = 16

with open('binlog.000024', 'rb') as f:
    for pos in (737, 1079, 1397):
        f.seek(pos)
        header = f.read(19)
        timestamp, event_type, server_id, event_size, next_pos, flags = \
            struct.unpack('<IBIIIH', header)

        assert event_type == XID_EVENT, f"Not an XID_EVENT at {pos}"

        # Body: 8-byte little-endian XID, then 4-byte CRC32
        xid = struct.unpack('<Q', f.read(8))[0]
        crc = f.read(4)

        ts = datetime.fromtimestamp(timestamp, tz=timezone.utc)
        print(f"Position {pos}: XID_EVENT ({event_size} bytes)")
        print(f"  Timestamp:   {timestamp} ({ts})")
        print(f"  Server ID:   {server_id}")
        print(f"  Next pos:    {next_pos}")
        print(f"  XID:         {xid}")
        print(f"  CRC32:       {crc.hex()}")
        print(f"  → COMMIT /* xid={xid} */")
        print()

Output:

Position 737: XID_EVENT (31 bytes)
  Timestamp:   1748308018 (2025-05-27 01:06:58+00:00)
  Server ID:   1
  Next pos:    768
  XID:         55
  CRC32:       d665e1cf
  → COMMIT /* xid=55 */

Position 1079: XID_EVENT (31 bytes)
  Timestamp:   1748308018 (2025-05-27 01:06:58+00:00)
  Server ID:   1
  Next pos:    1110
  XID:         56
  CRC32:       09f17bb4
  → COMMIT /* xid=56 */

Position 1397: XID_EVENT (31 bytes)
  Timestamp:   1748308018 (2025-05-27 01:06:58+00:00)
  Server ID:   1
  Next pos:    1428
  XID:         57
  CRC32:       ace3f486
  → COMMIT /* xid=57 */

We can cross-check against mysqlbinlog:

$ mysqlbinlog --no-defaults binlog.000024 | grep -E '^(COMMIT|# at )'
...
# at 737
COMMIT/*!*/;
# at 1079
COMMIT/*!*/;
# at 1397
COMMIT/*!*/;
Note: The binary log files used in this series (binlog.000024binlog_gtid_tag.000001, and others) are available at github.com/altmannmarcelo/presentations/tree/main/binlog.

References


What's Next?

We've now decoded every event a typical DML or DDL transaction can produce — from the GTID that names it, through the QUERY_EVENT or row events that carry its work, to the XID_EVENT that commits it. Only one event is left in our binary log: the ROTATE_EVENT at position 1428, which closes the file and points the reader at the next one. That's the subject of Part 10.


Next up: Part 10: ROTATE_EVENT — Closing the File and Pointing at the Next One


This series is based on a presentation given at the MySQL Online Summit. The goal is to help MySQL users understand what goes under the hood of replication by manually decoding binary log files.