fix: #3591871 Serialize ensureSegmentCoverage() under a per-chain coverage lock

Fix for #3591871

ChainArchiver::ensureSegmentCoverage() read existing segments unlocked, then INSERTed bare segments for the uncovered gaps. Two concurrent callers with different-but-overlapping input ranges (cron coveragePass racing drush audit_trail:auto-archive) could compute their obstacle lists before either INSERTed and produce overlapping bare segments that the (chain, from_id, to_id) UNIQUE constraint did not catch.

Approach

Add a per-chain coverage lock distinct from the chain-write lock so chain row writes (audit log entries) proceed unblocked during coverage; only coverage callers serialize on this lock. The lock wraps the SELECT-obstacles + INSERT-bare-per-gap critical section.

Lock infrastructure

  • Acquire deadline (30s) and hold TTL (60s) split into two distinct constants so changing one cannot accidentally change the other (separation of concerns suggested during review).
  • Renewal helper renewCoverageLockIfStale() called between INSERTs so a long backlog cannot outlive the TTL and let a concurrent caller in mid-loop. Renew interval at 1/3 of TTL (20s) so two consecutive missed renews still leave the lock alive. Throws if the lock was lost to another holder between renews.

Companion fix in acquireChainWriteLock()

Same MR also fixes a deadline-vs-short-circuit bug found in the existing ChainArchiver::acquireChainWriteLock() while implementing the coverage lock. The retry loop had the shape:

while (!$this->lock->acquire(...)) {
  if (!$this->lock->wait(...) && microtime(TRUE) >= $deadline) {
    throw ...;
  }
}

Because wait() returns TRUE when it times out without the lock becoming free, !wait() is FALSE in that branch and the && short-circuits the deadline check. Under sustained contention by a foreign holder the loop spins forever instead of throwing after the deadline. Fixed by separating the deadline check from the wait back-off.

The same bug shape exists on AuditTrailChainWriter::writeChainedRow() and ChainArchiver::restoreLocked()'s TTL vs critical-section length is also vulnerable on very large restores. Both are tracked as a follow-up at #3591878 (lock-handling audit across the whole module).

Tests

Two new kernel tests pin the contract:

  • testEnsureSegmentCoverageAcquiresAndReleasesCoverageLock — happy path: lock acquired before the SELECT, released after INSERTs.
  • testEnsureSegmentCoverageReleasesCoverageLockOnThrow — error path: try / finally releases the lock even when the body throws.

Tests use a CoverageLockSpyBackend that wraps the real lock service and records every acquire / release call so the test pins the lock NAME and the acquire / release pairing across both paths.

Verification

  • Full kernel suite: 232 passing, 982 assertions.
  • phpcs clean on touched files (pre-existing AuditTrailSettingsForm errors out of scope).
  • FF-merges cleanly against 1.x.
Edited by Frank Mably

Merge request reports

Loading