AI Loops Reference

How to choose the right loop

A loop is only worth running if it adds a grounded feedback signal. Work top to bottom: decide whether you need a loop at all, then let the source of that signal and the shape of the work pick the pattern and its UML view.

The procedure

Need a loop at all? If one well-prompted pass already clears your quality bar, ship it. Add iteration only when it measurably improves the outcome.
Where does the feedback come from? Rank the signal from strongest to weakest: environment ground truth (tests, compiler, tool output) → retrieved evidence → an independent critic or judge → an explicit rubric → bare self-critique. The stronger the signal, the more the loop improves correctness rather than just polish.
Is the output objectively verifiable or a matter of judgment? Verifiable → lean on tools/tests (agentic loops). Subjective → write a rubric or use a critic/persona.
One actor or several? A single model iterating → activity-style loop. Two or more roles exchanging messages → a sequence/conversation loop.
Independent or dependent sub-work? Independent subtasks → parallelize (fork/join, par). Each step depends on the last → keep it sequential.
What is the cost ceiling? Multi-agent loops can use ~15× the tokens of a single chat; reserve them for high-value breadth, and always cap iterations (3–4).

flowchart TD
  S(("start")) --> Q0{"does one good pass meet your bar?"}
  Q0 -->|"yes"| STOP["Ship it: no loop needed"]
  Q0 -->|"no"| Q1{"what grounds the feedback?"}
  Q1 -->|"tests / code / tools"| A1["ReAct · Red-Green-Refactor · eval-driven"]
  Q1 -->|"an explicit rubric"| A2["Draft to Rubric to Revise (Self-Refine)"]
  Q1 -->|"an outside critic or persona"| A3["Generator-Critic · Audience-Critic"]
  Q1 -->|"a metric to optimize"| A4["OPRO · DSPy"]
  Q1 -->|"several agents or viewpoints"| Q2{"shape of the work?"}
  Q2 -->|"independent subtasks, need breadth"| A5["Orchestrator-Worker (parallel)"]
  Q2 -->|"contested judgment"| A6["Multi-Persona Debate"]
  Q2 -->|"pick among options"| A7["Scoring Matrix + Red-Team + Premortem"]
  A1 --> E(("end"))
  A2 --> E
  A3 --> E
  A4 --> E
  A5 --> E
  A6 --> E
  A7 --> E
  STOP --> E

UML activity diagram — the selection procedure itself. The first decision is Anthropic's rule: only add a loop when a single pass falls short. Reasoning-only self-critique with no fresh signal is deliberately absent: it is unreliable for correctness.
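
The selection procedure above can be sketched as a small lookup. This is a minimal illustration, not part of any library: the `Situation` fields and the dictionary keys are hypothetical names, and the returned labels are copied from the flowchart nodes.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical keys for the feedback sources named on the flowchart edges.
BY_FEEDBACK = {
    "tools": "ReAct · Red-Green-Refactor · eval-driven",
    "rubric": "Draft to Rubric to Revise (Self-Refine)",
    "critic": "Generator-Critic · Audience-Critic",
    "metric": "OPRO · DSPy",
}
# Used only when the feedback comes from several agents or viewpoints.
BY_WORK_SHAPE = {
    "independent": "Orchestrator-Worker (parallel)",
    "contested": "Multi-Persona Debate",
    "options": "Scoring Matrix + Red-Team + Premortem",
}


@dataclass
class Situation:
    one_pass_meets_bar: bool
    feedback: Optional[str] = None    # a BY_FEEDBACK key, or "agents"
    work_shape: Optional[str] = None  # a BY_WORK_SHAPE key


def choose_loop(s: Situation) -> str:
    if s.one_pass_meets_bar:
        return "Ship it: no loop needed"
    if s.feedback == "agents":
        return BY_WORK_SHAPE[s.work_shape]
    # Unknown keys raise KeyError: bare self-critique is deliberately not an option.
    return BY_FEEDBACK[s.feedback]


choice = choose_loop(Situation(False, feedback="agents", work_shape="contested"))
```

Keeping the "ship it" check first mirrors the flowchart: no pattern is even considered until a single pass has been shown to fall short.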

Match the situation to a loop

If your situation is…	Use	Level
A single pass already clears the bar	No loop — ship the output	—
Output is code or otherwise tool-checkable	ReAct · Red-Green-Refactor · eval-driven	L3
Quality is subjective but you can write a rubric	Draft → Rubric → Revise (Self-Refine)	L1 L2
You need an outside viewpoint or audience lens	Generator-Critic · Audience-Critic	L1 L2
You are tuning a prompt against a metric	OPRO · DSPy	L3
One answer, many valid reasoning paths	Self-Consistency	L2
Compositional, multi-step problem	Least-to-Most	L2
Open-ended search or planning	Tree of Thoughts	L3
Multi-attempt task with a success signal	Reflexion	L3
Broad research with independent subtasks	Orchestrator-Worker	L3
A contested strategic call	Multi-Persona Debate	L2 L3
Choosing among discrete options	Scoring Matrix + Red-Team + Premortem	L2
Explaining or learning a concept	Explain → Probe → Refine (Feynman)	L1

Pick the UML view from the loop's structure

Loop structure	UML diagram	Loop & stop shown by
Single actor, iterative refinement	Activity	merge node upstream + decision node with a `[guard]`; back-edge to the merge
Two or more roles exchanging messages	Sequence	`loop [guard]` / `alt` / `opt` fragment; exit on the guard
Discrete attempts or modes	State machine	guarded transitions; completion transition to the final state
Independent parallel sub-work	Activity (fork/join) or Sequence (`par`)	fork bar → join bar (synchronize); no back-edge if it is one-shot

Cross-cutting rules (apply to every loop)

Ground the critique. If you cannot point to tests, tools, a rubric, retrieved data, or an independent critic, assume the loop is polishing confidence, not correctness.
Isolate the critic. A fresh-context reviewer or a distinct model catches far more than same-context self-praise.
Cap iterations (3–4) and stop the moment your evals pass — more rounds rarely help and cost compounds.
Gate irreversible actions behind a human checkpoint (plan approval, before writes/sends/deploys).
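
The iteration cap and the stop-on-pass rule can be sketched as one generic driver. This is a minimal sketch under stated assumptions: `generate` and `evaluate` are hypothetical callables standing in for a model call and a grounded check (tests, rubric, critic), and the toy versions below exist only to make the example runnable.

```python
from typing import Callable, Tuple


def run_capped_loop(
    generate: Callable[[str], str],
    evaluate: Callable[[str], Tuple[bool, str]],
    max_iters: int = 4,
) -> Tuple[str, int, bool]:
    """Iterate until the grounded check passes or the cap is reached."""
    feedback, draft = "", ""
    for i in range(1, max_iters + 1):
        draft = generate(feedback)           # new draft, informed by last feedback
        passed, feedback = evaluate(draft)   # grounded signal, not self-praise
        if passed:
            return draft, i, True            # stop the moment the evals pass
    return draft, max_iters, False           # cap reached: hand off to a human


def make_toy_generator():
    """Toy stand-in for a model: returns v1, v2, v3, ... on successive calls."""
    seen = []

    def generate(feedback: str) -> str:
        seen.append(feedback)
        return f"v{len(seen)}"

    return generate


def toy_evaluate(draft: str) -> Tuple[bool, str]:
    return draft == "v3", "not there yet"


result = run_capped_loop(make_toy_generator(), toy_evaluate)
```

Returning the pass flag alongside the draft lets the caller tell "good enough" apart from "ran out of budget", which is the point at which a human checkpoint belongs.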

Foundational Techniques

L2 Self-Consistency [UML ACTIVITY · FORK/JOIN]

What/When Sample multiple reasoning paths and take the majority answer. Use for math, logic, and any task with a single correct answer.

Prompt

Solve this problem 5 independent times, each with fresh step-by-step reasoning.
After all 5, report the answer that appears most often and your confidence.
Problem: [...]

Why it works A complex problem has many correct reasoning paths converging on one answer; majority voting filters out one-off reasoning errors (+17.9% absolute accuracy on GSM8K over chain-of-thought prompting, Wang et al. 2022).

flowchart TD
  S(("start")) --> P["Prompt with chain-of-thought"]
  P --> F[" "]:::bar
  F --> R1["Sample path 1 to a1"]
  F --> R2["Sample path 2 to a2"]
  F --> R3["Sample path N to aN"]
  R1 --> J[" "]:::bar
  R2 --> J
  R3 --> J
  J --> V["Aggregate by majority vote"]
  V --> E(("end"))
  classDef bar fill:#6ea8fe,stroke:#6ea8fe,color:#0f1115;

UML activity diagram — fork/join bars show parallel sampling that synchronizes at the vote. Note there is no loop: this pattern is parallel sampling + aggregation, not iterative refinement.

Tooling Plain chat; or API with temperature ~0.7 and a vote aggregator.
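
The vote aggregator mentioned above can be sketched in a few lines. The `sample` callable is a hypothetical stand-in for one sampled completion reduced to its final answer; the canned answers are invented for illustration.

```python
from collections import Counter
from typing import Callable, Tuple


def self_consistency(sample: Callable[[], str], n: int = 5) -> Tuple[str, float]:
    """Draw n independent answers and return the majority answer with its share."""
    answers = [sample() for _ in range(n)]
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / n


# Toy stand-in for five sampled reasoning paths (final answers only).
canned = iter(["18", "18", "17", "18", "20"])
answer, agreement = self_consistency(lambda: next(canned), n=5)
```

The agreement share is a rough confidence signal: a narrow majority suggests the problem deserves more samples or a stronger check.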

L3 Tree of Thoughts [UML ACTIVITY · SEARCH]

What/When Explore multiple branches, self-evaluate each, backtrack. Use for planning, puzzles, creative-search problems.

Prompt

For this problem, generate 3 distinct candidate next steps.
Rate each 1-10 on how likely it leads to a correct solution.
Expand only the top 2. Repeat until solved; backtrack if a branch scores low.
Problem: [...]

Why it works Adds deliberate lookahead/backtracking over coherent "thoughts" (GPT-4 on Game of 24: 4% success with chain-of-thought → 74% with Tree of Thoughts, Yao et al. 2023).

flowchart TD
  S(("start")) --> Root["Problem (root thought)"]
  Root --> A["Thought A"]
  Root --> B["Thought B"]
  Root --> C["Thought C"]
  A --> EA{"promising?"}
  B --> EB{"promising?"}
  C --> EC{"promising?"}
  EA -->|"prune"| BK["Backtrack to parent"]
  EB -->|"expand"| B1["Thought B.1"]
  EC -->|"expand"| C1["Thought C.1"]
  BK -.-> Root
  B1 --> D{"solution or depth > T?"}
  C1 --> D
  D -->|"no"| Root
  D -->|"yes"| E(("end"))

UML activity diagram with search semantics: decision nodes self-evaluate each branch; the prune edge triggers backtracking, the "no" edge continues the search, and the guard "solution or depth > T" is the stopping condition.

Tooling Manual for small trees; LangGraph / custom search for large.
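
A custom search of the kind mentioned above might look like the following depth-limited sketch. It is an illustration, not the algorithm from the paper: `propose`, `score`, and `is_solution` are hypothetical callables standing in for model calls, and the arithmetic puzzle is a toy chosen so the example runs without a model.

```python
from typing import Callable, List, Optional, Tuple

State = Tuple[int, Tuple[str, ...]]  # (current value, steps taken so far)


def tree_of_thoughts(
    state: State,
    propose: Callable[[State], List[State]],
    score: Callable[[State], float],
    is_solution: Callable[[State], bool],
    top_k: int = 2,
    max_depth: int = 4,
) -> Optional[State]:
    if is_solution(state):
        return state
    if max_depth == 0:
        return None  # depth limit reached: backtrack
    ranked = sorted(propose(state), key=score, reverse=True)
    for child in ranked[:top_k]:  # expand only the most promising thoughts
        found = tree_of_thoughts(child, propose, score, is_solution, top_k, max_depth - 1)
        if found is not None:
            return found
    return None  # every expanded branch failed: backtrack to the parent


# Toy problem: reach TARGET from 1 using the three operations below.
TARGET = 18
OPS = {"+1": lambda v: v + 1, "*2": lambda v: v * 2, "*3": lambda v: v * 3}


def toy_propose(state: State) -> List[State]:
    value, steps = state
    return [(op(value), steps + (name,)) for name, op in OPS.items()]


solution = tree_of_thoughts(
    (1, ()),
    toy_propose,
    score=lambda s: -abs(TARGET - s[0]),
    is_solution=lambda s: s[0] == TARGET,
)
```

Pruning happens implicitly: candidates ranked below `top_k` are never expanded, and returning `None` is the backtrack edge.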

L2 Least-to-Most Decomposition [UML ACTIVITY · SEQUENTIAL LOOP]

What/When Decompose into ordered subproblems, solve sequentially feeding answers forward. Use for compositional/multi-step problems.

Prompt

Stage 1: Break this problem into the ordered sub-questions needed to solve it.
Stage 2: Answer each sub-question in order, using earlier answers as inputs.
Finish with the final answer.
Problem: [...]

Why it works Separates decomposition from solving, reducing error propagation (99.7% accuracy vs. 16.2% for chain-of-thought on the SCAN length split, Zhou et al. 2022).

flowchart TD
  S(("start")) --> D["Decompose into ordered subproblems 1..n"]
  D --> M(["merge"])
  M --> Solve["Solve subproblem i using answers 1..i-1"]
  Solve --> Q{"more subproblems?"}
  Q -->|"yes: i = i + 1"| M
  Q -->|"no"| F["Emit final answer"]
  F --> E(("end"))

UML activity diagram — a sequential loop (merge upstream, decision downstream). The data dependency is the point: each answer is fed forward into the next subproblem, so this is sequential, not parallel.
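
The feed-forward dependency can be sketched as a plain sequential loop. `decompose` and `solve` are hypothetical callables standing in for the two prompt stages; the toy versions hard-code a three-step arithmetic problem.

```python
from typing import Callable, List, Tuple


def least_to_most(
    problem: str,
    decompose: Callable[[str], List[str]],
    solve: Callable[[str, List[int]], int],
) -> Tuple[int, List[int]]:
    answers: List[int] = []
    for question in decompose(problem):               # Stage 1: ordered sub-questions
        answers.append(solve(question, list(answers)))  # Stage 2: earlier answers fed forward
    return answers[-1], answers


def toy_decompose(problem: str) -> List[str]:
    return ["subtotal", "tax", "total"]


def toy_solve(question: str, prior: List[int]) -> int:
    if question == "subtotal":
        return 3 * 4                 # three items at 4 each
    if question == "tax":
        return prior[0] // 4         # 25% of the subtotal
    return prior[0] + prior[1]       # total depends on both earlier answers


final, steps = least_to_most("3 items at 4 each plus 25% tax", toy_decompose, toy_solve)
```

Because `solve` receives the earlier answers, the sub-questions cannot be run in parallel, which is what separates this pattern from fork/join sampling.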

L3 ReAct (Reason + Act) [UML SEQUENCE · LOOP]

What/When Interleave reasoning with tool calls. Use whenever the model needs live information or to execute actions.

Prompt

Work in a loop of Thought / Action / Observation.
Thought: reason about what you need next.
Action: call a tool (search, code, API).
Observation: read the result, then loop.
Stop when you can answer with evidence. Task: [...]

Why it works Grounds reasoning in tool observations, cutting hallucination vs CoT alone (Yao et al. 2022).

sequenceDiagram
  actor U as User
  participant A as Agent
  participant T as Tool / Env
  U->>A: Task
  loop until final answer
    A->>A: Thought (reasoning)
    A->>T: Action (tool call)
    T-->>A: Observation
  end
  A-->>U: Final answer

UML sequence diagram: Thought is a self-message (internal reasoning); only Action and Observation cross to the environment. The loop fragment's guard ("until final answer") is the stopping condition.

Tooling Claude Code, Cursor, LangGraph, any tool-use API.
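
The Thought / Action / Observation cycle can be sketched as a driver loop. Everything named here is hypothetical: `policy` stands in for the model choosing the next step, `tools` is a plain dictionary of callables, and the `"finish"` action name and the step cap are assumptions of this sketch rather than part of any tool-use API.

```python
from typing import Callable, Dict, List, Optional, Tuple

Transcript = List[Tuple[str, str, str]]  # (thought, action, observation)


def react(
    task: str,
    policy: Callable[[str, Transcript], Tuple[str, str, str]],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 4,
) -> Tuple[Optional[str], Transcript]:
    transcript: Transcript = []
    for _ in range(max_steps):
        thought, action, arg = policy(task, transcript)  # Thought: internal reasoning
        if action == "finish":
            return arg, transcript                       # answer backed by observations
        observation = tools[action](arg)                 # Action: the only call that leaves the agent
        transcript.append((thought, f"{action}({arg})", observation))
    return None, transcript                              # step cap reached without an answer


def toy_policy(task: str, transcript: Transcript) -> Tuple[str, str, str]:
    if not transcript:
        return ("I need the unit price first.", "lookup", "widget")
    return ("The observation answers the task.", "finish", transcript[-1][2])


toy_tools = {"lookup": lambda item: {"widget": "4 USD"}.get(item, "not found")}
answer, trace = react("What does a widget cost?", toy_policy, toy_tools)
```

The transcript doubles as the evidence trail: the final answer here is taken directly from the last observation rather than from the model's memory.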

L3 Reflexion (Verbal RL) [UML STATE MACHINE]

What/When On failure, write a natural-language reflection, store it, and retry. Use for multi-attempt tasks with a success signal.

Prompt

Attempt the task. If the test/check fails, write a short reflection:
what went wrong and what to do differently. Store it.
Retry using all stored reflections. Repeat up to N times.

Why it works Reflections act as episodic memory that improves the next trial (91% pass@1 HumanEval vs GPT-4's 80%, Shinn et al. 2023).

stateDiagram-v2
  [*] --> Acting
  Acting: Acting / generate trajectory
  Evaluating: Evaluating / compute reward
  Reflecting: Reflecting / append reflection to memory
  Acting --> Evaluating
  Evaluating --> [*]: success
  Evaluating --> Reflecting: fail and trials left
  Evaluating --> [*]: fail and max trials
  Reflecting --> Acting: retry with memory

UML state machine — discrete trial modes. The verbal reflection is appended to a bounded memory (1–3 experiences); guarded transitions encode the two stop conditions: success, or max trials reached.

Tooling Agent framework with a memory buffer + an evaluator (tests).
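
The memory buffer plus evaluator can be sketched as a bounded retry loop. `attempt`, `evaluate`, and `reflect` are hypothetical callables; in the toy version a "reflection" is simply the name of the strategy that failed, which is far cruder than a written reflection but keeps the example runnable.

```python
from collections import deque
from typing import Callable, List, Optional, Tuple


def reflexion(
    attempt: Callable[[List[str]], str],
    evaluate: Callable[[str], Tuple[bool, str]],
    reflect: Callable[[str, str], str],
    max_trials: int = 3,
    memory_size: int = 3,
) -> Tuple[Optional[str], int, List[str]]:
    memory: deque = deque(maxlen=memory_size)  # bounded episodic memory
    for trial in range(1, max_trials + 1):
        output = attempt(list(memory))         # Acting: retry with stored reflections
        ok, error = evaluate(output)           # Evaluating: external success signal
        if ok:
            return output, trial, list(memory)
        memory.append(reflect(output, error))  # Reflecting: store what went wrong
    return None, max_trials, list(memory)      # max trials reached


STRATEGIES = ["regex", "split", "csv module"]


def toy_attempt(memory: List[str]) -> str:
    return next(s for s in STRATEGIES if s not in memory)  # avoid strategies that failed


def toy_evaluate(output: str) -> Tuple[bool, str]:
    return output == "csv module", "quoted fields were parsed incorrectly"


def toy_reflect(output: str, error: str) -> str:
    return output  # toy reflection: remember the failed strategy by name


result, trials, memory = reflexion(toy_attempt, toy_evaluate, toy_reflect)
```

The two `return` statements correspond to the two guarded exits in the state machine: success, or failure with no trials left.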

1. Presentations / Pitch Decks

L1 Audience-Critic Refinement [UML SEQUENCE · LOOP]

What/When Draft, critique from the audience's view, revise. Use for decks, talks, and sales narratives.

Prompt

Audience: [who you are presenting to and what they care about].
Step 1: slide-by-slide outline, one headline assertion per slide.
Step 2: adopt that audience as a skeptical reviewer; for each slide give the
toughest objection they would raise and whether the headline survives it.
Step 3: rewrite to preempt every surviving objection. Show all steps.

Why it works Surfaces blind-spot objections; the audience persona is the grounded critic.

sequenceDiagram
  participant A as Author
  participant C as Persona Critic
  loop until persona satisfied
    A->>C: Draft
    C-->>A: Critique from audience POV
    A->>A: Revise
  end
  A-->>A: Final version

UML sequence diagram — a specialization of generator–critic where the critic lifeline adopts the target-audience persona. The loop guard is the stopping condition.

2. Prompt Engineering

L3 OPRO / DSPy Optimization Loop [UML ACTIVITY · OPTIMIZE LOOP]

What/When Automatically search for better prompts against a metric.

Prompt

Below are instruction variants and their scores (ascending): [list].
Generate 5 NEW instructions likely to score higher than the best.
Output only the instructions.

Why it works Externalizes the score; LLM-as-optimizer beats hand-tuned prompts (up to +8% on GSM8K and up to +50% on Big-Bench Hard tasks; OPRO, Yang et al. 2024).

flowchart TD
  S(("start")) --> I["Seed instructions + scores"]
  I --> M(["merge"])
  M --> G["Generate new candidate prompts"]
  G --> Ev["Evaluate on eval set to scores"]
  Ev --> Q{"improved and budget left?"}
  Q -->|"yes"| M
  Q -->|"no"| B["Keep best prompt"]
  B --> E(("end"))

UML activity diagram — a meta-prompt optimization loop. The eval score is the grounded signal; the loop stops on convergence or exhausted budget.

Tooling Chat for manual leaderboard; DSPy (MIPRO/GEPA) + eval set for automated.
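
The manual leaderboard loop can be sketched as follows. This is not the OPRO or DSPy implementation: `propose` is a hypothetical stand-in for the optimizer model, `score` for a run over an eval set, and the keyword-counting metric is a toy invented so the loop has something to climb.

```python
from typing import Callable, List, Tuple

Scored = List[Tuple[float, str]]  # (score, instruction), ascending by score


def optimize_prompt(
    seed: List[str],
    propose: Callable[[Scored], List[str]],
    score: Callable[[str], float],
    budget: int = 5,
    keep: int = 4,
) -> Tuple[float, str]:
    scored: Scored = sorted((score(p), p) for p in seed)
    for _ in range(budget):
        best_before = scored[-1][0]
        candidates = propose(scored)                  # the LLM-as-optimizer step
        pool = set(scored) | {(score(p), p) for p in candidates}
        scored = sorted(pool)[-keep:]                 # keep only the top of the leaderboard
        if scored[-1][0] <= best_before:
            break                                     # no improvement: converged
    return scored[-1]


KEYWORDS = ["step by step", "show work", "check answer"]


def toy_score(prompt: str) -> int:
    return sum(k in prompt for k in KEYWORDS)         # toy metric, not a real eval


def toy_propose(scored: Scored) -> List[str]:
    best = scored[-1][1]
    missing = [k for k in KEYWORDS if k not in best]
    return [f"{best} {missing[0]}."] if missing else []


best_score, best_prompt = optimize_prompt(["Solve the problem."], toy_propose, toy_score)
```

The loop exits on either condition from the diagram: the best score stops improving, or the budget runs out.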

3. Financial Models / Spreadsheets

L2 Assumption-Audit + Recompute [UML ACTIVITY · AUDIT LOOP]

Prompt

Build a 3-statement model from these assumptions: [...].
Then act as a CFO red-teamer: list every assumption, flag the 3 highest-
sensitivity ones, identify circular/broken logic, recompute flagged cells,
and report deltas.

Why it works Recomputation (ideally via a code tool) is a grounded check on numerical errors.

flowchart TD
  S(("start")) --> B["Build model from assumptions"]
  B --> R(["merge"])
  R --> RT["Red-team: list + stress assumptions"]
  RT --> RC["Recompute flagged cells"]
  RC --> D["Report deltas vs baseline"]
  D --> Q{"material change or new assumptions?"}
  Q -->|"yes"| R
  Q -->|"no"| E(("end"))

UML activity diagram — "report deltas vs baseline" is the signature step; recompute via a code tool supplies the grounded check that closes the audit loop.

Tooling Claude Code / Cursor with Python or spreadsheet tool.
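
The recompute-and-report-deltas step can be sketched with a one-line profit formula standing in for the full model. The assumption names, the figures, and the 10% bump are all invented for illustration; a real 3-statement model would replace `toy_profit`.

```python
from typing import Callable, Dict, Tuple

Assumptions = Dict[str, float]


def audit(
    model: Callable[[Assumptions], float],
    assumptions: Assumptions,
    bump: float = 0.10,
    top_n: int = 3,
) -> Tuple[float, Dict[str, float]]:
    """Stress each assumption by `bump`, recompute, and flag the largest deltas."""
    baseline = model(assumptions)
    deltas = {}
    for name, value in assumptions.items():
        stressed = dict(assumptions, **{name: value * (1 + bump)})
        deltas[name] = model(stressed) - baseline      # delta vs baseline
    flagged = sorted(deltas, key=lambda k: abs(deltas[k]), reverse=True)[:top_n]
    return baseline, {name: deltas[name] for name in flagged}


def toy_profit(a: Assumptions) -> float:
    revenue = a["units"] * a["price"]
    return revenue * (1 - a["cogs_pct"]) - a["fixed_costs"]


toy_assumptions = {"units": 1000, "price": 50, "cogs_pct": 0.4, "fixed_costs": 20000}
baseline, flagged = audit(toy_profit, toy_assumptions, top_n=2)
```

Running the numbers in code rather than asking the model to recompute them in prose is what makes this check grounded.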

4. Coding / Software Development

L3 Red-Green-Refactor TDD Loop [UML STATE MACHINE]

Prompt

1) Write FAILING tests for [feature] from this spec. Do NOT implement. Stop.
2) Write minimum code to pass the tests. Run them. Fix and re-run until green.
3) Refactor for clarity; tests must stay green.

Why it works Tests are objective, verifiable feedback — exactly why coding agents iterate well (Anthropic). Forces test-first against the model's implementation-first bias.

stateDiagram-v2
  [*] --> Red
  Red: Red / write a failing test
  Green: Green / write minimal code
  Refactor: Refactor / improve, keep green
  Red --> Green: test fails as expected
  Green --> Green: tests fail
  Green --> Refactor: all tests pass
  Refactor --> Refactor: tests fail
  Refactor --> Red: more features
  Refactor --> [*]: done

UML state machine — the canonical three-state cycle. Self-transitions are the inner fix loops; the pass/fail guards drive the flow and the "done" guard reaches the final state.

Tooling Claude Code (plan mode + reviewer subagent), Cursor Agent + Plan Mode.
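
The Red and Green states can be illustrated with a tiny hand-rolled test runner. The `slugify` feature and its test cases are hypothetical; in practice a test framework would play the role of `failing_cases`.

```python
from typing import Callable, List, Tuple

CASES: List[Tuple[str, str]] = [("Hello World", "hello-world"), ("  A  B ", "a-b")]


def failing_cases(impl: Callable[[str], str], cases: List[Tuple[str, str]]) -> List[str]:
    """Return the inputs whose output does not match the expected value."""
    failures = []
    for arg, expected in cases:
        try:
            ok = impl(arg) == expected
        except Exception:
            ok = False               # an exception counts as a failing test
        if not ok:
            failures.append(arg)
    return failures


def slugify_red(text: str) -> str:
    raise NotImplementedError        # Red: tests exist, implementation does not


def slugify_green(text: str) -> str:
    return "-".join(text.lower().split())  # Green: minimum code that passes


red = failing_cases(slugify_red, CASES)      # every case fails, as expected
green = failing_cases(slugify_green, CASES)  # empty list: safe to move to Refactor
```

Confirming that the tests fail before any implementation exists is the guard on the Red → Green transition; it shows the tests can actually detect the missing behavior.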

L3 Generate-then-Review Subagent [UML SEQUENCE · LOOP]

Prompt

Implement [task]. Then: "Use a subagent to review this code for security
and correctness gaps that affect the stated requirements (ignore stylistic
nitpicks)." Fix the flagged gaps and re-review.

Why it works A fresh-context reviewer catches what the author missed; constrain it to correctness to avoid over-engineering.

sequenceDiagram
  participant I as Implementer
  participant R as Reviewer Subagent
  I->>I: Implement task
  loop until no blocking gaps
    I->>R: Send code for review
    R-->>I: Security / correctness gaps
    I->>I: Fix flagged gaps
  end
  I-->>I: Ship

UML sequence diagram — the reviewer runs on a separate lifeline with fresh context. The loop continues until the reviewer returns no blocking gaps.

5. Research & Synthesis

L3 Orchestrator-Worker + Citation Pass [UML SEQUENCE · PAR]

Prompt

Decompose this question into 3-5 INDEPENDENT sub-questions. For each specify:
objective, sources/tools, output format, and explicit boundaries (what NOT to
cover). Synthesize results, then flag every claim lacking a named source.

Why it works Parallel breadth + context isolation (Anthropic reports its multi-agent system outperformed a single agent by 90.2% on an internal research eval). Tight delegation contracts prevent drift/duplication.

sequenceDiagram
  participant L as Lead Agent
  participant S1 as Subagent 1
  participant S2 as Subagent 2
  participant S3 as Subagent 3
  L->>L: Decompose and plan subtasks
  par parallel subagents
    L->>S1: Subtask 1
    S1-->>L: Findings 1
  and
    L->>S2: Subtask 2
    S2-->>L: Findings 2
  and
    L->>S3: Subtask 3
    S3-->>L: Findings 3
  end
  L->>L: Synthesize + citation pass

UML sequence diagram — the par fragment shows workers running in parallel with isolated context. Subtasks are decided at runtime by the lead (not predefined), which distinguishes this from static parallelization.

Tooling Claude Code subagents, LangGraph. Cost is roughly 15× the tokens of a single chat, so reserve it for high-value breadth.
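
The fan-out, join, and citation pass can be sketched with a thread pool. Threads here merely stand in for isolated subagent calls; `decompose`, `worker`, and `synthesize` are hypothetical callables, and the findings are toy data.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional

Finding = Dict[str, Optional[str]]


def orchestrate(
    question: str,
    decompose: Callable[[str], List[str]],
    worker: Callable[[str], Finding],
    synthesize: Callable[[List[Finding]], dict],
    max_workers: int = 3,
) -> dict:
    subtasks = decompose(question)                    # lead decides subtasks at runtime
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        findings = list(pool.map(worker, subtasks))   # par fragment; results keep subtask order
    return synthesize(findings)                       # join, then synthesis


def toy_worker(subtask: str) -> Finding:
    source = None if subtask == "regulation" else f"doc-{subtask}"
    return {"claim": f"finding about {subtask}", "source": source}


def toy_synthesize(findings: List[Finding]) -> dict:
    return {
        "claims": [f["claim"] for f in findings],
        # Citation pass: flag every claim that lacks a named source.
        "unsourced": [f["claim"] for f in findings if not f["source"]],
    }


report = orchestrate(
    "Should we enter this market?",
    lambda q: ["pricing", "competitors", "regulation"],
    toy_worker,
    toy_synthesize,
)
```

Each worker receives only its own subtask string, which is the code-level analogue of context isolation.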

6. Writing & Editing

L1 Draft to Rubric Critique to Revise [UML ACTIVITY · LOOP]

Prompt

Draft [your document]. Score 1-5 on: thesis clarity, evidence, concision, flow,
actionability, with ONE concrete fix each. Rewrite incorporating every fix.

Why it works A fixed rubric makes self-critique actionable (Self-Refine: ~20% avg improvement). Revision is reliable for style/format even where reasoning self-correction is not.

flowchart TD
  S(("start")) --> D["Draft"]
  D --> M(["merge"])
  M --> Sc["Score on rubric + one fix each"]
  Sc --> Rv["Rewrite with fixes"]
  Rv --> Q{"all dimensions pass?"}
  Q -->|"no"| M
  Q -->|"yes"| E(("end"))

UML activity diagram — the fixed rubric supplies the grounded critique; the guarded loop-back repeats until every dimension passes (the Self-Refine pattern).

7. Business Strategy

L2 Multi-Persona Debate to Synthesis [UML SEQUENCE · PAR + LOOP]

Prompt

Propose a strategy for [X]. Simulate a debate: Champion, Skeptic, Competitor,
Customer. Run 2 rounds where each responds to the others. Synthesize a revised
strategy that survives the debate.

Why it works Multi-agent debate improves factuality/reasoning by forcing cross-examination (Du et al. 2023).

sequenceDiagram
  participant M as Moderator
  participant A as Agent A
  participant B as Agent B
  participant C as Agent C
  par concurrent proposals
    A-->>M: Proposal A
  and
    B-->>M: Proposal B
  and
    C-->>M: Proposal C
  end
  loop 2-3 debate rounds
    M->>A: Share all responses
    M->>B: Share all responses
    M->>C: Share all responses
    A-->>M: Revised A
    B-->>M: Revised B
    C-->>M: Revised C
  end
  M->>M: Synthesize / majority vote

UML sequence diagram — the par fragment shows agents reasoning concurrently; the loop fragment is the debate rounds where each reads the others and revises before the moderator synthesizes.
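
The round structure can be sketched as below. Each agent is a hypothetical callable that sees the question plus the other agents' previous-round positions; the toy agents follow a crude rule (concede when all peers agree) that stands in for real model calls.

```python
from collections import Counter
from typing import Callable, Dict, Tuple

Agent = Callable[[str, Dict[str, str]], str]


def debate(agents: Dict[str, Agent], question: str, rounds: int = 2) -> Tuple[str, Dict[str, str]]:
    # Concurrent proposals: nobody sees anyone else's answer yet.
    positions = {name: agent(question, {}) for name, agent in agents.items()}
    for _ in range(rounds):
        snapshot = dict(positions)  # everyone reads the same prior round
        positions = {
            name: agent(question, {k: v for k, v in snapshot.items() if k != name})
            for name, agent in agents.items()
        }
    winner, _ = Counter(positions.values()).most_common(1)[0]  # moderator: majority vote
    return winner, positions


def make_agent(initial: str) -> Agent:
    def agent(question: str, others: Dict[str, str]) -> str:
        views = set(others.values())
        if len(views) == 1:
            return views.pop()      # toy rule: concede when all peers agree
        return initial
    return agent


personas = {
    "Champion": make_agent("expand"),
    "Skeptic": make_agent("hold"),
    "Customer": make_agent("expand"),
}
decision, final_positions = debate(personas, "Expand into the new region?")
```

Taking a snapshot before each round keeps the exchange symmetric: no agent gets to react to a revision made earlier in the same round.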

8. Data Analysis & Visualization

L3 ReAct with Code Execution [UML SEQUENCE · LOOP]

Prompt

Goal: [question] on this dataset. Loop Thought -> Action(code) -> Observation:
hypothesize, run code, read result, decide next step. Continue until answered.
Produce a labeled, publication-quality chart ("make it beautiful").

Why it works Execution output is ground truth; explicit "beautiful" prompts optimize for human perception.

sequenceDiagram
  participant A as Analyst Agent
  participant K as Code / REPL
  loop until answered with evidence
    A->>A: Thought (hypothesis)
    A->>K: Action (run code)
    K-->>A: Observation (result / error)
  end
  A-->>A: Labeled chart + answer

UML sequence diagram — ReAct specialized for data work: the REPL's execution output is ground truth, and the loop continues until the question can be answered with evidence.

Tooling Claude Code / Cursor + notebook/REPL.

9. Creative Work

L1 Diverge to Criteria Filter to Refine [UML ACTIVITY · FORK + LOOP]

Prompt

Generate 20 diverse candidates for [your brief]. Score each 1-5 against
your explicit criteria [e.g. fit, memorability, feasibility, originality].
Take the top 3 and produce 2 refined variants each with a one-line rationale.

Why it works Separates ideation from judgment so neither contaminates the other.

flowchart TD
  S(("start")) --> F[" "]:::bar
  F --> G1["Candidate 1"]
  F --> G2["Candidate 2"]
  F --> G3["Candidate N"]
  G1 --> J[" "]:::bar
  G2 --> J
  G3 --> J
  J --> Sc["Score vs criteria"]
  Sc --> Top["Filter to top-k"]
  Top --> M(["merge"])
  M --> Rf["Refine top candidate"]
  Rf --> Q{"meets criteria?"}
  Q -->|"no"| M
  Q -->|"yes"| E(("end"))
  classDef bar fill:#6ea8fe,stroke:#6ea8fe,color:#0f1115;

UML activity diagram — fork/join is the parallel divergence; the convergence loop refines the top candidates until they meet the criteria.
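
The filter step can be sketched as a top-k selection over explicit criteria scores. The candidate names, the scores, and the `refine` callable are all hypothetical; four hand-written candidates stand in for the twenty a model would generate.

```python
import heapq
from typing import Callable, Dict, List


def diverge_filter_refine(
    candidates: List[str],
    criteria_scores: Dict[str, Dict[str, int]],
    refine: Callable[[str, int], str],
    top_k: int = 3,
    variants: int = 2,
) -> Dict[str, List[str]]:
    def total(candidate: str) -> int:
        return sum(criteria_scores[candidate].values())  # judgment, kept apart from ideation

    finalists = heapq.nlargest(top_k, candidates, key=total)
    return {c: [refine(c, i) for i in range(1, variants + 1)] for c in finalists}


toy_scores = {
    "Nimbus": {"fit": 4, "memorability": 5},
    "Sky": {"fit": 3, "memorability": 3},
    "Cloudburst": {"fit": 5, "memorability": 2},
    "Aero": {"fit": 2, "memorability": 2},
}
shortlist = diverge_filter_refine(
    list(toy_scores), toy_scores, lambda c, i: f"{c} (variant {i})", top_k=2
)
```

Scoring happens only after the full candidate list exists, so an early favorite cannot narrow the ideation step.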

10. Decision-Making & Evaluation

L2 Scoring Matrix + Red-Team + Premortem [UML ACTIVITY · LOOP]

Prompt

Compare [options] on criteria [weights] in a weighted matrix; recommend.
RED-TEAM your recommendation: 3 strongest reasons it's wrong.
PREMORTEM: assume it failed in 12 months — top 3 causes. Revise if needed.

Why it works Rubric forces explicit tradeoffs; premortem/red-team surface failure modes proactively.

flowchart TD
  S(("start")) --> O["Define options + weighted criteria"]
  O --> M(["merge"])
  M --> Sc["Score options (matrix)"]
  Sc --> RT["Red-team the top option"]
  RT --> PM["Premortem: assume failure, list causes"]
  PM --> Q{"new risks or criteria surfaced?"}
  Q -->|"yes"| M
  Q -->|"no"| Dec["Decide"]
  Dec --> E(("end"))

UML activity diagram — red-team and premortem are audit actions whose findings loop back to update the scoring matrix until no new risks surface.
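
The weighted matrix, and the way a premortem finding loops back into it, can be sketched with plain arithmetic. The options, criteria, weights, and scores are invented for illustration.

```python
from typing import Dict, Tuple

Matrix = Dict[str, Dict[str, float]]  # option -> criterion -> score


def decide(matrix: Matrix, weights: Dict[str, float]) -> Tuple[str, Dict[str, float]]:
    """Return the option with the highest weighted average, plus all totals."""
    total_weight = sum(weights.values())
    totals = {
        option: sum(weights[c] * s for c, s in scores.items()) / total_weight
        for option, scores in matrix.items()
    }
    return max(totals, key=totals.get), totals


weights = {"cost": 2, "speed": 1, "risk": 2}
matrix = {
    "Build": {"cost": 2, "speed": 3, "risk": 4},
    "Buy": {"cost": 4, "speed": 5, "risk": 3},
}
first_choice, first_totals = decide(matrix, weights)

# Suppose the premortem surfaces a new criterion (vendor lock-in): update and rescore.
weights_v2 = dict(weights, lock_in=3)
matrix_v2 = {
    "Build": dict(matrix["Build"], lock_in=5),
    "Buy": dict(matrix["Buy"], lock_in=1),
}
second_choice, second_totals = decide(matrix_v2, weights_v2)
```

In this toy case the recommendation flips from Buy (3.8 vs 3.0) to Build (3.75 vs 2.75) once the new criterion is weighed, which is the "yes" branch of the diagram's guard in action.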

11. Learning & Knowledge Work

L1 Explain to Probe Gaps to Refine [UML ACTIVITY · LOOP]

Prompt

Explain [concept] for a smart non-expert. List the 5 questions a sharp student
would ask that your explanation does NOT answer well. Rewrite to answer all 5.

Why it works The model's own probing questions act as a lightweight critic (Feynman technique).

flowchart TD
  S(("start")) --> M(["merge"])
  M --> Ex["Explain in simple terms"]
  Ex --> Gp["Identify gaps / jargon"]
  Gp --> Q{"gaps remain?"}
  Q -->|"yes: study source"| M
  Q -->|"no"| Fin["Final simple explanation"]
  Fin --> E(("end"))

UML activity diagram — the Feynman loop. The "gaps remain?" guard drives refinement until the concept can be explained simply with no gaps.

12. Tooling Loops

L3 Eval-Driven Development [UML ACTIVITY · FLYWHEEL]

1) Write 10-50 eval cases (input + expected behavior / rubric) FIRST.
2) Run current prompt/agent against them; score.
3) Change ONE thing (prompt, model, retrieval); re-run; compare scores.
4) Ship only if net score improves; add production failures back to the eval set.

Why it works Evals are the working spec; "if your eval correctly captures what good means, optimizing against it is sufficient" (Braintrust). The non-deterministic cousin of TDD.

flowchart TD
  S(("start")) --> W["Write / curate evals"]
  W --> M(["merge"])
  M --> Run["Run system on eval set"]
  Run --> Score["Score: judge / deterministic / human"]
  Score --> Q{"meets bar?"}
  Q -->|"no: change one variable"| M
  Q -->|"yes"| Ship["Ship"]
  Ship --> Tr["Collect production traces"]
  Tr --> NF{"new failure modes?"}
  NF -->|"yes: add cases"| W
  NF -->|"no"| E(("end"))

UML activity diagram — an inner refinement loop (change one variable) nested inside the outer flywheel where production traces feed new eval cases back into the dataset.

Tooling DSPy, promptfoo, Braintrust, custom harness.
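
A custom harness can be as small as the sketch below. It assumes deterministic checks only; `baseline` and `candidate` are hypothetical stand-ins for two versions of a prompt or agent, and the three eval cases are toy data.

```python
from typing import Callable, Dict, List, Tuple

Case = Dict[str, object]  # {"input": str, "check": callable returning bool}


def run_evals(system: Callable[[str], str], cases: List[Case]) -> float:
    """Fraction of eval cases whose output passes its check."""
    passed = sum(1 for case in cases if case["check"](system(case["input"])))
    return passed / len(cases)


def should_ship(
    baseline: Callable[[str], str],
    candidate: Callable[[str], str],
    cases: List[Case],
) -> Tuple[bool, float, float]:
    before, after = run_evals(baseline, cases), run_evals(candidate, cases)
    return after > before, before, after   # ship only if the net score improves


toy_cases: List[Case] = [
    {"input": " Hi ", "check": lambda out: out == "hi"},
    {"input": "ok", "check": lambda out: out == "ok"},
    {"input": "A B", "check": lambda out: out == "a b"},
]


def baseline(text: str) -> str:
    return text.strip()


def candidate(text: str) -> str:
    return text.strip().lower()   # the ONE thing changed in this iteration


ship, before, after = should_ship(baseline, candidate, toy_cases)
```

Appending a new case to `toy_cases` whenever production surfaces a failure is the outer flywheel; rerunning `should_ship` after each single change is the inner loop.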

L3 Claude Code: Research to Plan to Execute to Review to Ship [UML ACTIVITY · HUMAN GATE]

Use plan mode: have Claude research the codebase and propose a gated plan;
approve it; execute with TDD; delegate review to a subagent; commit with
clear messages + a progress file for the next context window.

Why it works Human-gated plan + grounded tests + fresh-context review; progress files bridge limited context windows.

flowchart TD
  subgraph Human [Human reviewer]
    GATE{"plan approved?"}
  end
  S(("start")) --> Inv["Investigate / gather context"]
  Inv --> Plan["Produce plan (read-only)"]
  Plan --> GATE
  GATE -->|"no: revise"| Plan
  GATE -->|"yes"| Exec["Execute changes"]
  Exec --> Ver["Verify: tests / build"]
  Ver --> Q{"pass?"}
  Q -->|"no"| Exec
  Q -->|"yes"| Ship["Commit / ship"]
  Ship --> E(("end"))

UML activity diagram — the human-approval decision node sits in its own partition (swimlane) and gates execution; the inner verify→fix loop runs autonomously.

L3 Cursor: Plan Mode + Agent Loop + Rules [UML SEQUENCE · GATE + LOOP]

Shift+Tab for Plan Mode -> review the generated plan -> run Agent/Composer:
edit -> run checks -> read errors -> iterate until tests pass.
Encode reusable standards/workflows in .cursor/rules/*.mdc (keep always-on
rules short to limit the token tax).

Why it works Orchestrator rebuilds context from tool/test output each step; rules give persistent prompt-level context across sessions.

sequenceDiagram
  actor H as Human
  participant A as Cursor Agent
  participant T as Tools / Repo
  H->>A: Request feature
  A->>T: Research codebase
  T-->>A: Context
  A-->>H: Proposed plan (read-only)
  H->>A: Approve plan
  loop until tests pass
    A->>T: Edit + run checks
    T-->>A: Errors / results
    A->>A: React and iterate
  end
  A-->>H: Done

UML sequence diagram — the read-only plan is reviewed at a human approval gate, then an autonomous edit→check→react loop continues until the checks are green.