fix: #3591896 Narrow archiveSegment() chain-write lock scope
Fix for #3591896
Narrows the chain-write lock scope on ChainArchiver::archiveSegment() so the long Phase A (NDJSON write loop + hash_file() + HMAC computations) runs unlocked, and only the bounded Phase B (segment row UPDATE + segment_archived chain event INSERT + rename(temp, final)) holds the lock.
Replaces the TTL renewal pattern shipped in #3591878 for archive. Restore is NOT narrowed in this MR — see the deferred section below and follow-up #3591904.
Approach
archiveSegment (narrowed)
Before: lock acquired around the full archive operation including writeNdjson + hash_file (data-dependent runtime). Multi-million-row segments needed TTL renewal to avoid lock expiry mid-loop.
After:
Phase A (no lock):
writeNdjson() // long row loop, disjoint id window
appendArchiveRecordFooter()
hash_file('sha256')
archiveContentHmac()
lifecycleHmac(initial)
Phase B (chain-write lock held + DB transaction):
UPDATE audit_trail_segment
WHERE archived_at = 0 // optimistic concurrency predicate
chainedWrite(SegmentLifecycleAction::ARCHIVED, acquire_lock: FALSE)
UPDATE audit_trail_segment // finalize with archived_event_id
rename(temp, final)
COMMITSame-segment concurrent archive: caught by the WHERE predicate
If cron and drush both pass the early loadSegmentRow() check on the same segment (no lock around it) and both reach Phase B, the second caller's UPDATE affects 0 rows. The catch throws cleanly, rolls back, unlinks the temp file. First wins, second errors with Segment #N: concurrent archive landed.
Operational benefit
While a long archive is running, logger writes and other chain-write ops on the chain proceed unblocked. No more tens-of-seconds outages on user-facing audit log writes during multi-million-row archives.
What restoreLocked() looks like (unchanged in this MR)
Attempted to narrow restore with the same pattern. Audit surfaced a fundamental constraint: the DB transaction has to wrap the row-replay loop + chain-event INSERT + segment UPDATE for atomicity. Three placements for the lock relative to the transaction were considered:
- Lock inside transaction: Drupal's
DatabaseLockBackendshares the DB connection → semaphore INSERT joins the transaction → InnoDB X-lock on semaphore row held for the full transaction. Concurrent logger writes block at the DB level (below Drupal's retry loop) and surface rawlock_wait_timeouterrors after 50s. PHP-side release runs early but has no effect on DB-level mutual exclusion. - Lock before transaction (current behavior): lock effectively held for the full transaction anyway, via either the renewer or chainedWrite's same-holder re-acquire which also joins the transaction. Same operational behavior as MR !8 (merged)'s wide-lock pattern.
- Two transactions (T1 row replay, T2 chain event + segment UPDATE): breaks atomicity. Partial-state failure (T1 committed, T2 didn't) looks indistinguishable from a tamper at the verifier level.
No placement preserves both atomicity and DB-level narrowing without a new per-segment lifecycle marker. Tracked at #3591904 with the proposed restore_in_progress schema change + verifier tolerance + retry-detection design.
restoreLocked() in this MR keeps the wide-lock + TTL-renewal pattern from #3591878 unchanged.
Companion correctness fixes (surfaced during the audit)
Two chainedWrite calls inside DB transactions were missing acquire_lock: FALSE:
ChainArchiver::purge()line ~1035 (segment_live_purged event)ChainArchiver::restoreLocked()line ~2571 (segment_restored event)
The outer chain-write lock is already held by both methods. Default acquire_lock: TRUE caused a same-holder MERGE on the semaphore row inside the transaction, holding the InnoDB X-lock on that row for the transaction duration. For purge() the impact was negligible (transaction is bounded); for restoreLocked() it contributed to the same lock-during-transaction issue tracked at #3591904. Both fixed in this MR.
Reverted from #3591878 in this MR
writeNdjson()signature dropped$lock_nameand$renewed_atparameters; no renewer call in its loop anymore.- archiveSegment no longer passes a
$renewed_atinto writeNdjson.
Kept from #3591878 (needed by restoreLocked until #3591904 lands)
renewChainWriteLockIfStale()helperCHAIN_WRITE_LOCK_TTL_S/CHAIN_WRITE_LOCK_RENEW_INTERVAL_Sconstants- Renewer call inside restoreLocked's per-row loop
All three are scheduled for deletion in #3591904.
Verification
- Full kernel suite: 232 passing, 985 assertions.
- phpcs clean on touched files (the 2 pre-existing
AuditTrailSettingsFormnon-capturing-catch errors are out of scope). - FF-merges cleanly against
1.x.