How to choose the right loop
A loop is only worth running if it adds a grounded feedback signal. Work top to bottom: decide whether you need a loop at all, then let the source of that signal and the shape of the work pick the pattern and its UML view.
The procedure
- Need a loop at all? If one well-prompted pass already clears your quality bar, ship it. Add iteration only when it measurably improves the outcome.
- Where does the feedback come from? Rank the signal from strongest to weakest: environment ground truth (tests, compiler, tool output) → retrieved evidence → an independent critic or judge → an explicit rubric → bare self-critique. The stronger the signal, the more the loop improves correctness rather than just polish.
- Is the output objectively verifiable or a matter of judgment? Verifiable → lean on tools/tests (agentic loops). Subjective → write a rubric or use a critic/persona.
- One actor or several? A single model iterating → activity-style loop. Two or more roles exchanging messages → a sequence/conversation loop.
- Independent or dependent sub-work? Independent subtasks → parallelize (fork/join, par). Each step depends on the last → keep it sequential.
- What is the cost ceiling? Multi-agent loops can use ~15× the tokens of a single chat; reserve them for high-value breadth, and always cap iterations (3–4).
flowchart TD
S(("start")) --> Q0{"does one good pass meet your bar?"}
Q0 -->|"yes"| STOP["Ship it: no loop needed"]
Q0 -->|"no"| Q1{"what grounds the feedback?"}
Q1 -->|"tests / code / tools"| A1["ReAct · Red-Green-Refactor · eval-driven"]
Q1 -->|"an explicit rubric"| A2["Draft to Rubric to Revise (Self-Refine)"]
Q1 -->|"an outside critic or persona"| A3["Generator-Critic · Audience-Critic"]
Q1 -->|"a metric to optimize"| A4["OPRO · DSPy"]
Q1 -->|"several agents or viewpoints"| Q2{"shape of the work?"}
Q2 -->|"independent subtasks, need breadth"| A5["Orchestrator-Worker (parallel)"]
Q2 -->|"contested judgment"| A6["Multi-Persona Debate"]
Q2 -->|"pick among options"| A7["Scoring Matrix + Red-Team + Premortem"]
A1 --> E(("end"))
A2 --> E
A3 --> E
A4 --> E
A5 --> E
A6 --> E
A7 --> E
STOP --> E
UML activity diagram — the selection procedure itself. The first decision is Anthropic's rule: only add a loop when a single pass falls short. Reasoning-only self-critique with no fresh signal is deliberately absent: it is unreliable for correctness.
Match the situation to a loop
| If your situation is… | Use | Level |
|---|---|---|
| A single pass already clears the bar | No loop — ship the output | — |
| Output is code or otherwise tool-checkable | ReAct · Red-Green-Refactor · eval-driven | L3 |
| Quality is subjective but you can write a rubric | Draft → Rubric → Revise (Self-Refine) | L1 L2 |
| You need an outside viewpoint or audience lens | Generator-Critic · Audience-Critic | L1 L2 |
| You are tuning a prompt against a metric | OPRO · DSPy | L3 |
| One answer, many valid reasoning paths | Self-Consistency | L2 |
| Compositional, multi-step problem | Least-to-Most | L2 |
| Open-ended search or planning | Tree of Thoughts | L3 |
| Multi-attempt task with a success signal | Reflexion | L3 |
| Broad research with independent subtasks | Orchestrator-Worker | L3 |
| A contested strategic call | Multi-Persona Debate | L2 L3 |
| Choosing among discrete options | Scoring Matrix + Red-Team + Premortem | L2 |
| Explaining or learning a concept | Explain → Probe → Refine (Feynman) | L1 |
Pick the UML view from the loop's structure
| Loop structure | UML diagram | Loop & stop shown by |
|---|---|---|
| Single actor, iterative refinement | Activity | merge node upstream + decision node with a [guard]; back-edge to the merge |
| Two or more roles exchanging messages | Sequence | loop [guard] / alt / opt fragment; exit on the guard |
| Discrete attempts or modes | State machine | guarded transitions; completion transition to the final state |
| Independent parallel sub-work | Activity (fork/join) or Sequence (par) | fork bar → join bar (synchronize); no back-edge if it is one-shot |
Cross-cutting rules (apply to every loop)
- Ground the critique. If you cannot point to tests, tools, a rubric, retrieved data, or an independent critic, assume the loop is polishing confidence, not correctness.
- Isolate the critic. A fresh-context reviewer or a distinct model catches far more than same-context self-praise.
- Cap iterations (3–4) and stop the moment your evals pass — more rounds rarely help and cost compounds.
- Gate irreversible actions behind a human checkpoint (plan approval, before writes/sends/deploys).
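The cap-and-stop rule above can be sketched as a small driver. This is a minimal illustration, not a reference implementation: `refine_until_pass`, `no_todos`, and `fix_one_todo` are hypothetical names, and the toy check stands in for whatever grounded signal (tests, rubric, evals) the loop actually uses.

```python
from typing import Callable, Tuple

def refine_until_pass(
    draft: str,
    check: Callable[[str], Tuple[bool, str]],  # grounded signal: (passed, feedback)
    revise: Callable[[str, str], str],         # returns a new draft given feedback
    max_iters: int = 4,                        # hard cap on revision rounds
) -> Tuple[str, int, bool]:
    """Return (final_draft, revisions_used, passed)."""
    revisions = 0
    passed, feedback = check(draft)
    while not passed and revisions < max_iters:
        draft = revise(draft, feedback)
        revisions += 1
        passed, feedback = check(draft)  # stop the moment the check passes
    return draft, revisions, passed

# Toy stand-ins (hypothetical): the "eval" rejects any leftover TODO marker.
def no_todos(text: str) -> Tuple[bool, str]:
    return ("TODO" not in text, "remove the remaining TODO")

def fix_one_todo(text: str, feedback: str) -> str:
    return text.replace("TODO", "done", 1)

result = refine_until_pass("TODO a; TODO b", no_todos, fix_one_todo)
capped = refine_until_pass("TODO a; TODO b", no_todos, fix_one_todo, max_iters=1)
print(result)  # ('done a; done b', 2, True)
print(capped)  # ('done a; TODO b', 1, False)
```

The second call shows the cap doing its job: the loop returns the best draft so far and reports that the check still fails, rather than iterating indefinitely.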
Foundational Techniques
L2 Self-Consistency · UML ACTIVITY · FORK/JOIN
What/When Sample multiple reasoning paths and take the majority answer. Use for math, logic, and any task with a single correct answer.
Prompt
Solve this problem 5 independent times, each with fresh step-by-step reasoning. After all 5, report the answer that appears most often and your confidence. Problem: [...]
Why it works A complex problem has many correct reasoning paths converging on one answer; majority voting filters out one-off reasoning errors (+17.9% GSM8K in Wang et al. 2022).
flowchart TD
S(("start")) --> P["Prompt with chain-of-thought"]
P --> F[" "]:::bar
F --> R1["Sample path 1 to a1"]
F --> R2["Sample path 2 to a2"]
F --> R3["Sample path N to aN"]
R1 --> J[" "]:::bar
R2 --> J
R3 --> J
J --> V["Aggregate by majority vote"]
V --> E(("end"))
classDef bar fill:#6ea8fe,stroke:#6ea8fe,color:#0f1115;
UML activity diagram — fork/join bars show parallel sampling that synchronizes at the vote. Note there is no loop: this pattern is parallel sampling + aggregation, not iterative refinement.
Tooling Plain chat; or API with temperature ~0.7 and a vote aggregator.
L3 Tree of Thoughts · UML ACTIVITY · SEARCH
What/When Explore multiple branches, self-evaluate each, backtrack. Use for planning, puzzles, creative-search problems.
Prompt
For this problem, generate 3 distinct candidate next steps. Rate each 1-10 on how likely it leads to a correct solution. Expand only the top 2. Repeat until solved; backtrack if a branch scores low. Problem: [...]
Why it works Adds deliberate lookahead/backtracking over coherent "thoughts" (GPT-4 Game-of-24: 4%->74%, Yao et al. 2023).
flowchart TD
S(("start")) --> Root["Problem (root thought)"]
Root --> A["Thought A"]
Root --> B["Thought B"]
Root --> C["Thought C"]
A --> EA{"promising?"}
B --> EB{"promising?"}
C --> EC{"promising?"}
EA -->|"prune"| BK["Backtrack to parent"]
EB -->|"expand"| B1["Thought B.1"]
EC -->|"expand"| C1["Thought C.1"]
BK -.-> Root
B1 --> D{"solution or depth > T?"}
C1 --> D
D -->|"no"| Root
D -->|"yes"| E(("end"))
UML activity diagram with search semantics — decision nodes self-evaluate each branch; the prune edge triggers backtracking, the "no" edge continues the DFS, and the guard "solution or depth > T" is the stopping condition.
Tooling Manual for small trees; LangGraph / custom search for large.
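A custom search can be sketched as a breadth-first beam: generate candidates, score them, keep the top few, and stop on a solution or at the depth limit. This is a toy under stated assumptions: the number puzzle, `OPS`, and `tree_search` are hypothetical, and a numeric distance stands in for the model's self-rating. It uses a beam rather than the depth-first backtracking drawn in the diagram.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Toy search space (hypothetical): reach `target` from `start` using these moves.
OPS: Dict[str, Callable[[int], int]] = {
    "+3": lambda v: v + 3,
    "*2": lambda v: v * 2,
}

def tree_search(start: int, target: int, beam_width: int = 2,
                max_depth: int = 6) -> Optional[Tuple[str, ...]]:
    frontier: List[Tuple[int, Tuple[str, ...]]] = [(start, ())]
    for _ in range(max_depth):              # guard: stop when the depth limit is hit
        candidates = []
        for value, path in frontier:
            for name, op in OPS.items():    # generate candidate next thoughts
                nxt, new_path = op(value), path + (name,)
                if nxt == target:
                    return new_path         # guard: solution found
                if nxt < target:            # prune overshoots (moves only increase positive values)
                    candidates.append((nxt, new_path))
        if not candidates:
            return None
        # Numeric distance stands in for the model's 1-10 self-rating.
        candidates.sort(key=lambda state: abs(target - state[0]))
        frontier = candidates[:beam_width]  # expand only the top-rated branches
    return None

path = tree_search(start=1, target=14)
print(path)  # ('+3', '+3', '*2')
```

In a real setup both the candidate generator and the scoring function would be model calls, which is where most of the cost comes from; the beam width and depth limit are the budget controls.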
L2 Least-to-Most Decomposition · UML ACTIVITY · SEQUENTIAL LOOP
What/When Decompose into ordered subproblems, solve sequentially feeding answers forward. Use for compositional/multi-step problems.
Prompt
Stage 1: Break this problem into the ordered sub-questions needed to solve it. Stage 2: Answer each sub-question in order, using earlier answers as inputs. Finish with the final answer. Problem: [...]
Why it works Separates decomposition from solving, reducing error propagation (99.7% vs 16.2% CoT on SCAN length split, Zhou et al. 2022).
flowchart TD
S(("start")) --> D["Decompose into ordered subproblems 1..n"]
D --> M(["merge"])
M --> Solve["Solve subproblem i using answers 1..i-1"]
Solve --> Q{"more subproblems?"}
Q -->|"yes: i = i + 1"| M
Q -->|"no"| F["Emit final answer"]
F --> E(("end"))
UML activity diagram — a sequential loop (merge upstream, decision downstream). The data dependency is the point: each answer is fed forward into the next subproblem, so this is sequential, not parallel.
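The feed-forward dependency can be sketched in a few lines. This is an illustrative toy: `least_to_most` and the pricing plan are hypothetical, and each lambda stands in for a model call that would answer one sub-question given the earlier answers.

```python
from typing import Callable, Dict, List, Tuple

Solver = Callable[[Dict[str, float]], float]

def least_to_most(subproblems: List[Tuple[str, Solver]]) -> Dict[str, float]:
    """Solve ordered subproblems, passing all earlier answers to each solver."""
    answers: Dict[str, float] = {}
    for name, solve in subproblems:
        # Each solver sees a copy of answers 1..i-1; this is the data dependency.
        answers[name] = solve(dict(answers))
    return answers

# Stage 1 output (hypothetical): an ordered decomposition of a pricing question.
plan: List[Tuple[str, Solver]] = [
    ("unit_price", lambda a: 4.0),
    ("subtotal", lambda a: a["unit_price"] * 3),
    ("total_with_tax", lambda a: round(a["subtotal"] * 1.10, 2)),
]

answers = least_to_most(plan)
print(answers["total_with_tax"])  # 13.2
```

Reordering the plan would raise a `KeyError`, which is the same failure a model hits when it tries a later sub-question before its inputs exist.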
L3 ReAct (Reason + Act) · UML SEQUENCE · LOOP
What/When Interleave reasoning with tool calls. Use whenever the model needs live information or to execute actions.
Prompt
Work in a loop of Thought / Action / Observation. Thought: reason about what you need next. Action: call a tool (search, code, API). Observation: read the result, then loop. Stop when you can answer with evidence. Task: [...]
Why it works Grounds reasoning in tool observations, cutting hallucination vs CoT alone (Yao et al. 2022).
sequenceDiagram
actor U as User
participant A as Agent
participant T as Tool / Env
U->>A: Task
loop until final answer
A->>A: Thought (reasoning)
A->>T: Action (tool call)
T-->>A: Observation
end
A-->>U: Final answer
UML sequence diagram — Thought is a self-message (internal reasoning); only Action and Observation cross to the environment. The loop fragment plus its exit is the stopping condition.
Tooling Claude Code, Cursor, LangGraph, any tool-use API.
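Stripped of any framework, the Thought / Action / Observation loop looks roughly like this. A minimal sketch, assuming a scripted policy in place of the model: `react_loop`, `scripted_policy`, and the `multiply` tool are hypothetical names, not a real tool-use API.

```python
from typing import Callable, Dict, List, Optional, Tuple

Step = Tuple[str, str, str, str]  # (thought, action, action_input, observation)
Policy = Callable[[str, List[Step]], Tuple[str, str, str]]

def react_loop(task: str, policy: Policy, tools: Dict[str, Callable[[str], str]],
               max_steps: int = 4) -> Tuple[Optional[str], List[Step]]:
    history: List[Step] = []
    for _ in range(max_steps):            # cap the loop
        thought, action, arg = policy(task, history)
        if action == "finish":            # exit: the policy can answer with evidence
            return arg, history
        observation = tools[action](arg)  # only this call crosses to the environment
        history.append((thought, action, arg, observation))
    return None, history                  # budget exhausted without an answer

def multiply(expr: str) -> str:
    """Toy tool: multiply two integers written as 'a*b'."""
    left, right = expr.split("*")
    return str(int(left) * int(right))

def scripted_policy(task: str, history: List[Step]) -> Tuple[str, str, str]:
    """Stand-in for the model: call the tool once, then answer from the observation."""
    if not history:
        return ("I need the product before answering.", "multiply", "6*7")
    return ("The observation answers the task.", "finish", history[-1][3])

answer, history = react_loop("What is 6 times 7?", scripted_policy, {"multiply": multiply})
print(answer, len(history))  # 42 1
```

The final answer is read from the last observation rather than generated afresh, which is the grounding the pattern relies on.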
L3 Reflexion (Verbal RL) · UML STATE MACHINE
What/When On failure, write a natural-language reflection, store it, and retry. Use for multi-attempt tasks with a success signal.
Prompt
Attempt the task. If the test/check fails, write a short reflection: what went wrong and what to do differently. Store it. Retry using all stored reflections. Repeat up to N times.
Why it works Reflections act as episodic memory that improves the next trial (91% pass@1 HumanEval vs GPT-4's 80%, Shinn et al. 2023).
stateDiagram-v2
[*] --> Acting
Acting: Acting / generate trajectory
Evaluating: Evaluating / compute reward
Reflecting: Reflecting / append reflection to memory
Acting --> Evaluating
Evaluating --> [*]: success
Evaluating --> Reflecting: fail and trials left
Evaluating --> [*]: fail and max trials
Reflecting --> Acting: retry with memory
UML state machine — discrete trial modes. The verbal reflection is appended to a bounded memory (1–3 experiences); guarded transitions encode the two stop conditions: success, or max trials reached.
Tooling Agent framework with a memory buffer + an evaluator (tests).
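The memory buffer plus evaluator can be sketched as follows. This is a toy under stated assumptions: `reflexion`, `actor`, `evaluate`, and `reflect` are hypothetical stand-ins for the model and the test harness, and the formatting task exists only to make the retries deterministic.

```python
from collections import deque
from typing import Callable, List, Tuple

def reflexion(task: str,
              actor: Callable[[str, List[str]], str],
              evaluate: Callable[[str], Tuple[bool, str]],
              reflect: Callable[[str, str], str],
              max_trials: int = 4,
              memory_size: int = 3) -> Tuple[str, int, bool]:
    """Return (last_attempt, trials_used, success)."""
    memory: deque = deque(maxlen=memory_size)  # bounded episodic memory
    attempt = ""
    for trial in range(1, max_trials + 1):
        attempt = actor(task, list(memory))    # Acting: retry with stored reflections
        success, error = evaluate(attempt)     # Evaluating: grounded pass/fail signal
        if success:
            return attempt, trial, True
        memory.append(reflect(attempt, error)) # Reflecting: store what went wrong
    return attempt, max_trials, False          # stop: max trials reached

def actor(task: str, memory: List[str]) -> str:
    """Toy actor: applies whichever fixes the stored reflections mention."""
    out = task
    if any("uppercase" in note for note in memory):
        out = out.upper()
    if any("exclamation" in note for note in memory):
        out += "!"
    return out

def evaluate(attempt: str) -> Tuple[bool, str]:
    if attempt != attempt.upper():
        return False, "output must be uppercase"
    if not attempt.endswith("!"):
        return False, "output must end with an exclamation mark"
    return True, ""

def reflect(attempt: str, error: str) -> str:
    return f"Attempt {attempt!r} failed: {error}."

result = reflexion("ship it", actor, evaluate, reflect)
print(result)  # ('SHIP IT!', 3, True)
```

Each failed trial adds one lesson, and the third attempt succeeds only because both earlier reflections are still in memory.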
1. Presentations / Pitch Decks
L1 Audience-Critic Refinement · UML SEQUENCE · LOOP
What/When Draft, critique from the audience's view, revise. Decks, talks, sales narratives.
Audience: [who you are presenting to and what they care about]. Step 1: slide-by-slide outline, one headline assertion per slide. Step 2: adopt that audience as a skeptical reviewer; for each slide give the toughest objection they would raise and whether the headline survives it. Step 3: rewrite to preempt every surviving objection. Show all steps.
Why it works Surfaces blind-spot objections; the audience persona is the grounded critic.
sequenceDiagram
participant A as Author
participant C as Persona Critic
loop until persona satisfied
A->>C: Draft
C-->>A: Critique from audience POV
A->>A: Revise
end
A-->>A: Final version
UML sequence diagram — a specialization of generator–critic where the critic lifeline adopts the target-audience persona. The loop guard is the stopping condition.
2. Prompt Engineering
L3 OPRO / DSPy Optimization Loop · UML ACTIVITY · OPTIMIZE LOOP
What/When Automatically search for better prompts against a metric.
Below are instruction variants and their scores (ascending): [list]. Generate 5 NEW instructions likely to score higher than the best. Output only the instructions.
Why it works Externalizes the score; LLM-as-optimizer beats hand-tuned prompts (up to +8% GSM8K, +50% BBH, OPRO/Yang et al. 2024).
flowchart TD
S(("start")) --> I["Seed instructions + scores"]
I --> M(["merge"])
M --> G["Generate new candidate prompts"]
G --> Ev["Evaluate on eval set to scores"]
Ev --> Q{"improved and budget left?"}
Q -->|"yes"| M
Q -->|"no"| B["Keep best prompt"]
B --> E(("end"))
UML activity diagram — a meta-prompt optimization loop. The eval score is the grounded signal; the loop stops on convergence or exhausted budget.
Tooling Chat for manual leaderboard; DSPy (MIPRO/GEPA) + eval set for automated.
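The manual-leaderboard version of this loop can be sketched without any optimizer library. Everything here is a hypothetical stand-in, not the OPRO or DSPy API: `score` replaces eval-set accuracy with a keyword count, and `propose` replaces the LLM that reads the scored history and writes new instructions.

```python
from typing import Callable, List, Tuple

Scored = Tuple[str, float]
KEYWORDS = ("step by step", "check your work", "state the answer")

def score(instruction: str) -> float:
    """Toy metric standing in for accuracy on an eval set."""
    return sum(k in instruction for k in KEYWORDS) / len(KEYWORDS)

def propose(history: List[Scored]) -> List[str]:
    """Toy proposer: extend the current best (last item, ascending order) with unused phrases."""
    best = history[-1][0]
    return [f"{best} {k}" for k in KEYWORDS if k not in best]

def optimize(seed: str,
             propose: Callable[[List[Scored]], List[str]],
             score: Callable[[str], float],
             budget: int = 5) -> Scored:
    history: List[Scored] = [(seed, score(seed))]
    for _ in range(budget):                   # stop: budget exhausted
        history.sort(key=lambda pair: pair[1])  # ascending, as in the meta-prompt
        best_score = history[-1][1]
        candidates = propose(history)
        if not candidates:
            break                             # stop: nothing new to try
        scored = [(c, score(c)) for c in candidates]
        history.extend(scored)
        if max(s for _, s in scored) <= best_score:
            break                             # stop: no improvement this round
    return max(history, key=lambda pair: pair[1])

best_prompt, best_score = optimize("Solve the problem.", propose, score)
print(best_score)  # 1.0
```

The loop keeps the whole scored history, which is what lets the proposer see both what worked and what did not.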
3. Financial Models / Spreadsheets
L2 Assumption-Audit + Recompute · UML ACTIVITY · AUDIT LOOP
Build a 3-statement model from these assumptions: [...]. Then act as a CFO red-teamer: list every assumption, flag the 3 highest-sensitivity ones, identify circular/broken logic, recompute flagged cells, and report deltas.
Why it works Recomputation (ideally via a code tool) is a grounded check on numerical errors.
flowchart TD
S(("start")) --> B["Build model from assumptions"]
B --> R(["merge"])
R --> RT["Red-team: list + stress assumptions"]
RT --> RC["Recompute flagged cells"]
RC --> D["Report deltas vs baseline"]
D --> Q{"material change or new assumptions?"}
Q -->|"yes"| R
Q -->|"no"| E(("end"))
UML activity diagram — "report deltas vs baseline" is the signature step; recompute via a code tool supplies the grounded check that closes the audit loop.
Tooling Claude Code / Cursor with Python or spreadsheet tool.
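With a Python tool, the recompute step can be a one-at-a-time sensitivity pass. A minimal sketch with invented numbers: `net_income` is a toy single-line model, not a 3-statement model, and `sensitivity`, `base_case`, and the 10% bump are all assumptions made for illustration.

```python
from typing import Callable, Dict, Tuple

Assumptions = Dict[str, float]

def net_income(a: Assumptions) -> float:
    """Toy income model (hypothetical): revenue minus COGS minus opex."""
    revenue = a["units"] * a["price"]
    cogs = revenue * a["cogs_pct"]
    return revenue - cogs - a["opex"]

def sensitivity(model: Callable[[Assumptions], float],
                assumptions: Assumptions,
                bump: float = 0.10) -> Tuple[float, Dict[str, float]]:
    """Recompute the model with each assumption raised by `bump`; return deltas vs baseline."""
    baseline = model(assumptions)
    deltas: Dict[str, float] = {}
    for name, value in assumptions.items():
        stressed = {**assumptions, name: value * (1 + bump)}  # change one input only
        deltas[name] = round(model(stressed) - baseline, 2)
    return baseline, deltas

# Illustrative figures only.
base_case = {"units": 1000.0, "price": 50.0, "cogs_pct": 0.4, "opex": 20000.0}
baseline, deltas = sensitivity(net_income, base_case)

# Rank assumptions by absolute impact to pick which ones to flag.
ranked = sorted(deltas, key=lambda name: abs(deltas[name]), reverse=True)
print(ranked[:2], deltas)
```

In this toy case a 10% change in units or price moves net income by about 3,000 against a baseline of about 10,000, so those two would be flagged first.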
4. Coding / Software Development
L3 Red-Green-Refactor TDD Loop · UML STATE MACHINE
1) Write FAILING tests for [feature] from this spec. Do NOT implement. Stop. 2) Write minimum code to pass the tests. Run them. Fix and re-run until green. 3) Refactor for clarity; tests must stay green.
Why it works Tests are objective, verifiable feedback — exactly why coding agents iterate well (Anthropic). Forces test-first against the model's implementation-first bias.
stateDiagram-v2
[*] --> Red
Red: Red / write a failing test
Green: Green / write minimal code
Refactor: Refactor / improve, keep green
Red --> Green: test fails as expected
Green --> Green: tests fail
Green --> Refactor: all tests pass
Refactor --> Refactor: tests fail
Refactor --> Red: more features
Refactor --> [*]: done
UML state machine — the canonical three-state cycle. Self-transitions are the inner fix loops; the pass/fail guards drive the flow and the "done" guard reaches the final state.
Tooling Claude Code (plan mode + reviewer subagent), Cursor Agent + Plan Mode.
L3 Generate-then-Review Subagent · UML SEQUENCE · LOOP
Implement [task]. Then: "Use a subagent to review this code for security and correctness gaps that affect the stated requirements (ignore stylistic nitpicks)." Fix the flagged gaps and re-review.
Why it works A fresh-context reviewer catches what the author missed; constrain it to correctness to avoid over-engineering.
sequenceDiagram
participant I as Implementer
participant R as Reviewer Subagent
I->>I: Implement task
loop until no blocking gaps
I->>R: Send code for review
R-->>I: Security / correctness gaps
I->>I: Fix flagged gaps
end
I-->>I: Ship
UML sequence diagram — the reviewer runs on a separate lifeline with fresh context. The loop continues until the reviewer returns no blocking gaps.
5. Research & Synthesis
L3 Orchestrator-Worker + Citation Pass · UML SEQUENCE · PAR
Decompose this question into 3-5 INDEPENDENT sub-questions. For each specify: objective, sources/tools, output format, and explicit boundaries (what NOT to cover). Synthesize results, then flag every claim lacking a named source.
Why it works Parallel breadth + context isolation (Anthropic reports its multi-agent system outperforming a single agent by 90.2% on its internal research eval). Tight delegation contracts prevent drift/duplication.
sequenceDiagram
participant L as Lead Agent
participant S1 as Subagent 1
participant S2 as Subagent 2
participant S3 as Subagent 3
L->>L: Decompose and plan subtasks
par parallel subagents
L->>S1: Subtask 1
S1-->>L: Findings 1
and
L->>S2: Subtask 2
S2-->>L: Findings 2
and
L->>S3: Subtask 3
S3-->>L: Findings 3
end
L->>L: Synthesize + citation pass
UML sequence diagram — the par fragment shows workers running in parallel with isolated context. Subtasks are decided at runtime by the lead (not predefined), which distinguishes this from static parallelization.
Tooling Claude Code subagents, LangGraph. Cost ~15x a chat — use for high-value breadth.
6. Writing & Editing
L1 Draft to Rubric Critique to Revise · UML ACTIVITY · LOOP
Draft [your document]. Score 1-5 on: thesis clarity, evidence, concision, flow, actionability, with ONE concrete fix each. Rewrite incorporating every fix.
Why it works A fixed rubric makes self-critique actionable (Self-Refine: ~20% avg improvement). Revision is reliable for style/format even where reasoning self-correction is not.
flowchart TD
S(("start")) --> D["Draft"]
D --> M(["merge"])
M --> Sc["Score on rubric + one fix each"]
Sc --> Rv["Rewrite with fixes"]
Rv --> Q{"all dimensions pass?"}
Q -->|"no"| M
Q -->|"yes"| E(("end"))
UML activity diagram — the fixed rubric supplies the grounded critique; the guarded loop-back repeats until every dimension passes (the Self-Refine pattern).
7. Business Strategy
L2 Multi-Persona Debate to Synthesis · UML SEQUENCE · PAR + LOOP
Propose a strategy for [X]. Simulate a debate: Champion, Skeptic, Competitor, Customer. Run 2 rounds where each responds to the others. Synthesize a revised strategy that survives the debate.
Why it works Multi-agent debate improves factuality/reasoning by forcing cross-examination (Du et al. 2023).
sequenceDiagram
participant M as Moderator
participant A as Agent A
participant B as Agent B
participant C as Agent C
par concurrent proposals
A-->>M: Proposal A
and
B-->>M: Proposal B
and
C-->>M: Proposal C
end
loop 2-3 debate rounds
M->>A: Share all responses
M->>B: Share all responses
M->>C: Share all responses
A-->>M: Revised A
B-->>M: Revised B
C-->>M: Revised C
end
M->>M: Synthesize / majority vote
UML sequence diagram — the par fragment shows agents reasoning concurrently; the loop fragment is the debate rounds where each reads the others and revises before the moderator synthesizes.
8. Data Analysis & Visualization
L3 ReAct with Code Execution · UML SEQUENCE · LOOP
Goal: [question] on this dataset. Loop Thought -> Action(code) -> Observation:
hypothesize, run code, read result, decide next step. Continue until answered.
Produce a labeled, publication-quality chart ("make it beautiful").
Why it works Execution output is ground truth; explicit "beautiful" prompts optimize for human perception.
sequenceDiagram
participant A as Analyst Agent
participant K as Code / REPL
loop until answered with evidence
A->>A: Thought (hypothesis)
A->>K: Action (run code)
K-->>A: Observation (result / error)
end
A-->>A: Labeled chart + answer
UML sequence diagram — ReAct specialized for data work: the REPL's execution output is ground truth, and the loop continues until the question can be answered with evidence.
Tooling Claude Code / Cursor + notebook/REPL.
9. Creative Work
L1 Diverge to Criteria Filter to Refine · UML ACTIVITY · FORK + LOOP
Generate 20 diverse candidates for [your brief]. Score each 1-5 against your explicit criteria [e.g. fit, memorability, feasibility, originality]. Take the top 3 and produce 2 refined variants each with a one-line rationale.
Why it works Separates ideation from judgment so neither contaminates the other.
flowchart TD
S(("start")) --> F[" "]:::bar
F --> G1["Candidate 1"]
F --> G2["Candidate 2"]
F --> G3["Candidate N"]
G1 --> J[" "]:::bar
G2 --> J
G3 --> J
J --> Sc["Score vs criteria"]
Sc --> Top["Filter to top-k"]
Top --> M(["merge"])
M --> Rf["Refine top candidate"]
Rf --> Q{"meets criteria?"}
Q -->|"no"| M
Q -->|"yes"| E(("end"))
classDef bar fill:#6ea8fe,stroke:#6ea8fe,color:#0f1115;
UML activity diagram — fork/join is the parallel divergence; the convergence loop refines the top candidates until they meet the criteria.
10. Decision-Making & Evaluation
L2 Scoring Matrix + Red-Team + Premortem · UML ACTIVITY · LOOP
Compare [options] on criteria [weights] in a weighted matrix; recommend. RED-TEAM your recommendation: 3 strongest reasons it's wrong. PREMORTEM: assume it failed in 12 months — top 3 causes. Revise if needed.
Why it works Rubric forces explicit tradeoffs; premortem/red-team surface failure modes proactively.
flowchart TD
S(("start")) --> O["Define options + weighted criteria"]
O --> M(["merge"])
M --> Sc["Score options (matrix)"]
Sc --> RT["Red-team the top option"]
RT --> PM["Premortem: assume failure, list causes"]
PM --> Q{"new risks or criteria surfaced?"}
Q -->|"yes"| M
Q -->|"no"| Dec["Decide"]
Dec --> E(("end"))
UML activity diagram — red-team and premortem are audit actions whose findings loop back to update the scoring matrix until no new risks surface.
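The matrix arithmetic and the loop-back can be sketched as below. The options, criteria, weights, and scores are invented for illustration, and `weighted_scores` and `best_option` are hypothetical helpers; scores run 1-5 with higher meaning better on every criterion.

```python
from typing import Dict

Matrix = Dict[str, Dict[str, float]]

def weighted_scores(matrix: Matrix, weights: Dict[str, float]) -> Dict[str, float]:
    """Weighted average score per option; weights need not sum to 1."""
    total_weight = sum(weights.values())
    return {
        option: sum(scores[c] * w for c, w in weights.items()) / total_weight
        for option, scores in matrix.items()
    }

def best_option(matrix: Matrix, weights: Dict[str, float]) -> str:
    totals = weighted_scores(matrix, weights)
    return max(totals, key=totals.get)

# First pass: score the options on the initial criteria.
weights = {"cost": 5, "speed": 3, "risk": 2}
matrix = {
    "build": {"cost": 2, "speed": 3, "risk": 4},
    "buy":   {"cost": 4, "speed": 5, "risk": 2},
}
first_pick = best_option(matrix, weights)    # build = 2.7, buy = 3.9

# The premortem surfaces a new criterion, so loop back and re-score.
weights["lock_in"] = 4
matrix["build"]["lock_in"] = 5
matrix["buy"]["lock_in"] = 1
revised_pick = best_option(matrix, weights)  # build = 47/14, buy = 43/14

print(first_pick, revised_pick)  # buy build
```

The recommendation flips once the premortem's criterion enters the matrix, which is the case the "new risks or criteria surfaced?" guard exists to catch.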
11. Learning & Knowledge Work
L1 Explain to Probe Gaps to Refine · UML ACTIVITY · LOOP
Explain [concept] for a smart non-expert. List the 5 questions a sharp student would ask that your explanation does NOT answer well. Rewrite to answer all 5.
Why it works The model's own probing questions act as a lightweight critic (Feynman technique).
flowchart TD
S(("start")) --> M(["merge"])
M --> Ex["Explain in simple terms"]
Ex --> Gp["Identify gaps / jargon"]
Gp --> Q{"gaps remain?"}
Q -->|"yes: study source"| M
Q -->|"no"| Fin["Final simple explanation"]
Fin --> E(("end"))
UML activity diagram — the Feynman loop. The "gaps remain?" guard drives refinement until the concept can be explained simply with no gaps.
12. Tooling Loops
L3 Eval-Driven Development · UML ACTIVITY · FLYWHEEL
1) Write 10-50 eval cases (input + expected behavior / rubric) FIRST. 2) Run current prompt/agent against them; score. 3) Change ONE thing (prompt, model, retrieval); re-run; compare scores. 4) Ship only if net score improves; add production failures back to the eval set.
Why it works Evals are the working spec; "if your eval correctly captures what good means, optimizing against it is sufficient" (Braintrust). The non-deterministic cousin of TDD.
flowchart TD
S(("start")) --> W["Write / curate evals"]
W --> M(["merge"])
M --> Run["Run system on eval set"]
Run --> Score["Score: judge / deterministic / human"]
Score --> Q{"meets bar?"}
Q -->|"no: change one variable"| M
Q -->|"yes"| Ship["Ship"]
Ship --> Tr["Collect production traces"]
Tr --> NF{"new failure modes?"}
NF -->|"yes: add cases"| W
NF -->|"no"| E(("end"))
UML activity diagram — an inner refinement loop (change one variable) nested inside the outer flywheel where production traces feed new eval cases back into the dataset.
Tooling DSPy, promptfoo, Braintrust, custom harness.
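A custom harness can start very small. A minimal sketch, assuming exact-match scoring and plain functions in place of the prompt or agent under test; `run_evals`, `should_ship`, and the text-normalization task are hypothetical and unrelated to the tools listed above.

```python
from typing import Callable, List, Tuple

Case = Tuple[str, str]  # (input, expected output)

def run_evals(system: Callable[[str], str], cases: List[Case]) -> float:
    """Fraction of cases where the system output exactly matches the expectation."""
    passed = sum(1 for given, expected in cases if system(given) == expected)
    return passed / len(cases)

def should_ship(baseline: Callable[[str], str],
                candidate: Callable[[str], str],
                cases: List[Case]) -> Tuple[bool, float, float]:
    """Ship only if the candidate beats the baseline on the same eval set."""
    base_score = run_evals(baseline, cases)
    cand_score = run_evals(candidate, cases)
    return cand_score > base_score, base_score, cand_score

# Step 1: write the eval cases first (toy task: normalize a label).
cases: List[Case] = [(" Hello ", "hello"), ("WORLD", "world"), ("ok", "ok"), (" ok ", "ok")]

def baseline(text: str) -> str:
    return text.strip()

def candidate(text: str) -> str:  # the ONE change: also lowercase
    return text.strip().lower()

decision = should_ship(baseline, candidate, cases)
print(decision)  # (True, 0.5, 1.0)

# Flywheel: a production failure becomes a new eval case.
cases.append(("a  b", "a b"))
print(run_evals(candidate, cases))  # 0.8
```

The drop to 0.8 after adding the production case is the signal to start the next inner loop: change one thing, re-run, compare.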
L3 Claude Code: Research to Plan to Execute to Review to Ship · UML ACTIVITY · HUMAN GATE
Use plan mode: have Claude research the codebase and propose a gated plan; approve it; execute with TDD; delegate review to a subagent; commit with clear messages + a progress file for the next context window.
Why it works Human-gated plan + grounded tests + fresh-context review; progress files bridge limited context windows.
flowchart TD
subgraph Human [Human reviewer]
GATE{"plan approved?"}
end
S(("start")) --> Inv["Investigate / gather context"]
Inv --> Plan["Produce plan (read-only)"]
Plan --> GATE
GATE -->|"no: revise"| Plan
GATE -->|"yes"| Exec["Execute changes"]
Exec --> Ver["Verify: tests / build"]
Ver --> Q{"pass?"}
Q -->|"no"| Exec
Q -->|"yes"| Ship["Commit / ship"]
Ship --> E(("end"))
UML activity diagram — the human-approval decision node sits in its own partition (swimlane) and gates execution; the inner verify→fix loop runs autonomously.
L3 Cursor: Plan Mode + Agent Loop + Rules · UML SEQUENCE · GATE + LOOP
Shift+Tab for Plan Mode -> review the generated plan -> run Agent/Composer: edit -> run checks -> read errors -> iterate until tests pass. Encode reusable standards/workflows in .cursor/rules/*.mdc (keep always-on rules short to limit the token tax).
Why it works Orchestrator rebuilds context from tool/test output each step; rules give persistent prompt-level context across sessions.
sequenceDiagram
actor H as Human
participant A as Cursor Agent
participant T as Tools / Repo
H->>A: Request feature
A->>T: Research codebase
T-->>A: Context
A-->>H: Proposed plan (read-only)
H->>A: Approve plan
loop until tests pass
A->>T: Edit + run checks
T-->>A: Errors / results
A->>A: React and iterate
end
A-->>H: Done
UML sequence diagram — the read-only plan is reviewed at a human approval gate, then an autonomous edit→check→react loop continues until the checks are green.