2
0
mirror of https://github.com/esiur/esiur-dotnet.git synced 2026-06-13 22:48:42 +00:00
Files
esiur-dotnet/Tests/Distribution/Deadlock
2026-06-12 02:43:53 +03:00
..
2026-06-03 22:11:34 +03:00
2026-06-12 02:43:53 +03:00
2026-06-03 13:02:56 +03:00
2026-06-03 22:11:34 +03:00
2026-06-03 22:11:34 +03:00
2026-06-03 22:11:34 +03:00

Distributed deadlock test (two nodes / WAN)

Two console apps that evaluate the recursive-attachment deadlock-prevention algorithm over a real TCP connection between two machines:

  • Server (Server/) hosts a configurable graph of Node resources whose references may form cycles, and prints the cycle census of the deployed graph (so the experiment can state that circular dependencies were actually generated).
  • Client (Client/) connects, fetches the graph concurrently, and classifies each run as Completed / Deadlocked / Slow using a stall detector — a deadlock is detected as the absence of attachment progress while requests are still pending, which distinguishes it from slow WAN processing rather than relying on a blunt timeout.

Authentication is disabled (AllowUnauthorizedAccess, anonymous None mode), so no credentials are needed.

Build

dotnet build Tests/Distribution/Deadlock/Server/Esiur.Tests.Deadlock.Server.csproj -c Release
dotnet build Tests/Distribution/Deadlock/Client/Esiur.Tests.Deadlock.Client.csproj -c Release

Run

On node A (server):

dotnet run --project Tests/Distribution/Deadlock/Server -c Release -- \
    --port 10950 --topology ring --nodes 8

It prints, e.g.: topology=ring nodes=8 edges=8 cyclic=True backEdges=1 and the node count to pass to the client. Leave it running (Ctrl+C to stop).

Topologies (--topology):

name cyclic description
ring yes i → (i+1) mod n; every node fetched as an independent request (cross-chain cycles)
cycle yes single-root cycle 0→1→…→n-1→0 (fetch only --roots 0)
complete yes every ordered pair i → j
staggered no two roots share a deep dependency reached at different depths (stresses non-cyclic contention; --nodes is derived)
random usually ErdősRényi directed graph (--nodes, --seed, --edge-prob)
chain no acyclic control 0→1→…→n-1
diamond no acyclic control

On node B (client):

dotnet run --project Tests/Distribution/Deadlock/Client -c Release -- \
    --host <NODE_A_IP> --port 10950 --nodes 8 \
    --mode WaitWithCycleDetection --iterations 20 --stall-ms 5000 --hard-ms 60000

Modes (--mode):

  • WaitWithCycleDetection (default, the production algorithm) — completes; breaks only genuine cycles.
  • NaiveWait (control) — no cycle handling; deadlocks on any cyclic graph (detected via the stall window).
  • LegacyCrossChainPlaceholder — for reference only.

Other client options: --roots all|0,1,2 (which nodes to fetch; default all n0..n{N-1}), --stall-ms (no-progress window ⇒ deadlock; set comfortably above your WAN round-trip × graph depth), --hard-ms (progress-but-unfinished ⇒ slow).

Output

The client prints per-iteration rows and a summary, and writes deadlock_<mode>_<host>_<port>.csv:

iteration,outcome,ms,cycle_breaks,unnecessary_placeholders,unpublished
  • outcomeCompleted / Deadlocked / SlowTimeout.
  • ms — fetch time (deadlocked rows equal the stall window).
  • cycle_breaks — placeholders returned to break a cycle on this connection.
  • unnecessary_placeholders — placeholders returned where no genuine cycle existed (always 0 for the production resolver; non-zero only for the legacy reference mode).
  • unpublished — resources delivered to the application whose dependency graph was not fully attached at delivery (-1 for a deadlocked/failed run).

Suggested WAN runs for the paper

  1. Detection works and cycles exist. Server --topology ring --nodes 8; client --mode WaitWithCycleDetection (expect all Completed, cycle_breaks > 0) and then --mode NaiveWait (expect Deadlocked — validates the detector on the same cyclic graph).
  2. Random pool census. Server --topology random --nodes 12 --seed 20260603; the server prints whether the deployed graph is cyclic; run the client in WaitWithCycleDetection.
  3. Threshold justification. Compare the client's reported completion ms (median/p99) against --stall-ms; the stall window should be orders of magnitude larger.