More cost-effective workflow policy prompts for Agent3

Thanks Mikko, this is super helpful.
I’d been testing a simpler prompt, but yours covers more bases. In parallel, @Gipity-Steve evolved my original prompt (journey linked below). I’ve now combined both and added the consolidated prompt as an .md file in my repo.

My current workflow

  1. Draft in ChatGPT 5 (Thinking): I develop the prompt I’ll feed to Replit Agent.

  2. Send to Replit (Plan mode): Paste the prompt prefaced with:
    “Adhere to agent3.md. Review these instructions, assess feasibility, ask clarifying questions if needed, and tell me when you’re ready to proceed.”

  3. Reflect back in ChatGPT: Whatever Replit replies (questions or plan), I paste back into ChatGPT 5 (Thinking). It proposes precise next steps, flags risks, and suggests tweaks.

  4. Execute in Replit (Build mode): I paste the refined instructions back to Replit, again prefacing with: “Adhere to agent3.md and execute these instructions.”

  5. Verify: If the result is good enough, I move on. If not, I copy Replit’s workflow/output back into ChatGPT, explain what worked and what didn’t (adding UI screenshots if useful), and repeat.

Why this helps

  • Lower cost: Significantly reduced vs. letting Agent 3 roam.

  • Higher first-try success: Most executions now succeed on the first attempt.

  • Fewer bug loops: In my experience, even fewer than with Agent 2.

I can’t yet say whether costs are as low as with Agent 2, but they’re much lower than the horrors of this weekend. Overall, this workflow is far more effective and avoids many of the loops I used to hit.

Thanks again, Mikko — your prompt was the missing piece that stopped me from jumping ship and finding an immediate alternative to Replit.

Files & references

User Instructions:
If you always say “adhere to agent3.md” and keep the file present, then most behaviors are already locked in.

The only extra instructions that really matter are the command triggers defined in the doc. They do things the MD file alone cannot, because they explicitly unlock or permit exceptions:

  • WRITE UNLOCK → lets the agent move from PLAN/INVESTIGATE to making edits. Without this, it must stay read-only.

  • /delegate → temporarily allows sub-agents/architect. Agent3.md forbids them by default.

  • /test → authorizes a bounded validation batch (otherwise testing is forbidden).

Everything else (budget caps, batching, no scope creep, verification rules, etc.) is already enforced by the MD file once you say adhere to agent3.md.

So in practice, the only additional commands you need to use are the three “keys” above.
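To make the three "keys" concrete, here is a minimal sketch of how a message could be scanned for them. The `Permissions` class and `parse_triggers` function are hypothetical names for illustration only; nothing here reflects how Replit Agent actually parses prompts.

```python
from dataclasses import dataclass


@dataclass
class Permissions:
    """Exceptions unlocked by the three command triggers (hypothetical model)."""
    write_unlocked: bool = False    # set by "WRITE UNLOCK"
    delegate_allowed: bool = False  # set by "/delegate"
    test_allowed: bool = False      # set by "/test"


def parse_triggers(message: str) -> Permissions:
    """Scan a user message for the exact trigger phrases from agent3.md."""
    tokens = message.split()
    return Permissions(
        write_unlocked="WRITE UNLOCK" in message,
        # Slash commands must appear as standalone tokens in this sketch.
        delegate_allowed="/delegate" in tokens,
        test_allowed="/test" in tokens,
    )


perms = parse_triggers("Adhere to agent3.md. WRITE UNLOCK and /test please")
print(perms)
```

With no trigger in the message, every flag stays `False`, which matches the default-locked behavior described above.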

agent3.md
Development Workflow Policies & Guidelines
Version: 2.1
HARD CAP: 5 total tool calls per change request (the agent MUST state a planned count and MUST NOT exceed it).
Target: 3–5 total tool calls for most modification requests.

Definitions (authoritative):
  • Tool call = one invocation of any tool (read, edit, multi_edit, grep, search_codebase, architect, screenshot, restart_workflow, diagnostics, bash, etc.).
  • Batch = a single tool call that performs multiple parallel reads/edits/ops within that call.
  • Single-agent mode = no sub-agents or architect unless explicitly authorized.
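The hard cap and the tool-call/batch distinction can be sketched as a small counter. This is an illustration under assumptions: `ToolBudget` and `BudgetExceeded` are hypothetical names, and the policy itself is enforced only by the prompt, not by code like this.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a call would exceed the planned tool-call count."""


class ToolBudget:
    HARD_CAP = 5  # from agent3.md: 5 total tool calls per change request

    def __init__(self, planned: int):
        # The agent must state a planned count up front, within the cap.
        if not 1 <= planned <= self.HARD_CAP:
            raise ValueError(f"planned count must be 1..{self.HARD_CAP}")
        self.planned = planned
        self.used = 0

    def record_call(self, tool: str, items: int = 1) -> int:
        """Count one tool invocation and return the calls remaining.

        A batch (items > 1) still costs exactly one call, per the
        definition of "batch" in the policy.
        """
        if self.used >= self.planned:
            raise BudgetExceeded(f"{tool}: planned budget of {self.planned} spent")
        self.used += 1
        return self.planned - self.used


budget = ToolBudget(planned=2)
budget.record_call("read", items=4)  # one batched read of four files
budget.record_call("multi_edit")     # one edit batch
```

A third `record_call` on this budget would raise `BudgetExceeded`, which mirrors the "MUST NOT exceed it" rule.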
Autonomy Profile & Feature Flags (MANDATORY)
  • Agent-2 Autonomy Profile: Operate as if Agent 2 for autonomy. No autonomous behaviors beyond the approved plan unless explicitly authorized.

Feature Flags (echo required at session start and each BUILD):
  • App testing: DISABLED (do not initiate synthetic test runs).
  • Max autonomy: DISABLED (no extended runtimes, background tasks, or long autonomous sequences).
  • Architect/sub-agents: LOCKED unless /delegate.
  • Conflicting Rules Override: If any internal/default rules conflict with this policy, this policy overrides. The agent MUST state that it can comply.
Core Philosophy
  • Find the source, not the symptom.
  • Fix the pattern, not just the instance.
  • Batch all related changes.
  • Trust development tools.
  • Stop when success is confirmed.
  • Trace to source, not symptoms: identify the originating file/function, not just where errors surface.
File Prediction & Surgical Reading ⚠️ CRITICAL
Core Principle: The agent MUST predict BOTH analysis files AND edit targets before acting.

Mandatory Workflow
  1. Map problem → affected components → specific files.
  2. Predict which files to READ (analysis) AND EDIT (changes).
  3. Batch ALL predicted reads in the initial information-gathering call.
  4. Execute all changes in a single multi_edit batch per file (one file = one multi_edit).

File Prediction Rules
  • UI issues: read component + parent + related hooks/state.
  • API issues: read routes + services + storage + schema.
  • Data issues: read schema + storage + related API endpoints.
  • Feature additions: read similar existing implementations.
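The file prediction rules amount to a lookup from issue type to the kinds of files worth reading in one batch. A minimal sketch follows; the dictionary keys and the `predict_reads` helper are hypothetical, and the values are file categories taken from the rules above rather than real paths.

```python
# Issue type -> categories of files to read, copied from the rules above.
READ_SETS: dict[str, list[str]] = {
    "ui": ["component", "parent", "related hooks/state"],
    "api": ["routes", "services", "storage", "schema"],
    "data": ["schema", "storage", "related API endpoints"],
    "feature": ["similar existing implementations"],
}


def predict_reads(issue_types: list[str]) -> list[str]:
    """Return one de-duplicated read batch covering every issue type."""
    batch: list[str] = []
    for issue in issue_types:
        for category in READ_SETS[issue]:
            if category not in batch:  # keep first occurrence, preserve order
                batch.append(category)
    return batch


print(predict_reads(["api", "data"]))
```

An issue that spans the API and data layers collapses into a single read batch instead of two, which is the point of predicting before acting.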
Cost Optimization
  • Goal: 2 calls total where feasible (1 read batch + 1 edit batch).
  • Anti-pattern: read → analyze → search → read more → edit.
  • Optimal: read everything predicted → edit everything needed.
  • Success Metric: Zero search_codebase calls when project structure is known.

Super-batching Workflow ⚠️ CRITICAL
HARD CAP: 5 total calls; Target: 3–5 for any feature implementation.

Mode Gates & Write Locks (MANDATORY)
  • PLAN mode: Reasoning only. No tools.
  • INVESTIGATE mode (BUILD-READONLY): Tools allowed for discovery only (read/grep/diagnostics/logs). Edits forbidden.
  • WRITE mode (BUILD-WRITE): Edits allowed only after explicit user approval via WRITE UNLOCK.
  • The agent MUST propose which mode it needs and wait for approval.
  • If investigation is requested, the agent MUST remain in BUILD-READONLY until WRITE UNLOCK is given.
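The mode gates behave like a small state machine. The sketch below is one way to picture it; the `Mode` enum, the tool sets, and both functions are hypothetical, and the choice to allow only read and edit tools in BUILD-WRITE is a simplification, since the policy does not enumerate every tool per mode.

```python
from enum import Enum


class Mode(Enum):
    PLAN = "PLAN"
    BUILD_READONLY = "BUILD-READONLY"  # INVESTIGATE
    BUILD_WRITE = "BUILD-WRITE"


READ_TOOLS = {"read", "grep", "diagnostics", "logs"}
EDIT_TOOLS = {"edit", "multi_edit"}


def tool_allowed(mode: Mode, tool: str) -> bool:
    """PLAN: no tools. BUILD-READONLY: discovery only. BUILD-WRITE: edits too."""
    if mode is Mode.PLAN:
        return False
    if mode is Mode.BUILD_READONLY:
        return tool in READ_TOOLS
    return tool in READ_TOOLS | EDIT_TOOLS


def next_mode(mode: Mode, user_message: str) -> Mode:
    """Only an explicit WRITE UNLOCK from the user lifts the write lock."""
    if "WRITE UNLOCK" in user_message:
        return Mode.BUILD_WRITE
    return mode


mode = next_mode(Mode.BUILD_READONLY, "Looks right, go ahead")
print(mode, tool_allowed(mode, "multi_edit"))
```

A friendly "go ahead" does not count as approval here; only the exact phrase moves the agent into BUILD-WRITE.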
Phase 1: Planning Before Acting (MANDATORY, 0 calls)
  • Map ALL info needed (files to read, searches).
  • Map ALL changes to make (edits, DB updates, new files).
  • Identify dependencies; collapse steps where possible.
  • Read stack traces fully; use the deepest frame to locate the real issue.
  • Prefer pattern search (e.g., localStorage) before guessing locations.

Phase 2: Information Gathering & Discovery (MAX PARALLEL, 1–2 calls)
  • Batch ALL independent reads/searches in one function_calls block.
  • NEVER do: read(file1) → analyze → read(file2) → analyze.
  • ALWAYS do: read(file1, file2, file3, …) + grep() (+ search_codebase ONLY IF locations unknown).
  • Use search_codebase ONLY IF file locations remain unknown after prediction.
  • Read targets directly (batch 3–6 files at once). Be surgical; skip exploratory reading.

Phase 3: Implementation & Pattern-Based Execution (AGGRESSIVE MULTI-EDITING, 1–3 calls)
  • Use multi_edit for ANY file needing multiple changes.
  • NEVER make multiple edit() calls to the same file.
  • Batch independent file changes in parallel (e.g., multi_edit(schema.ts) + multi_edit(routes.ts) + multi_edit(storage.ts)).
  • Plan all related changes upfront; no incremental drip fixes.
  • Identify scope: if root cause is pattern-wide (e.g., localStorage), update all occurrences.
  • Apply patterns consistently; group by file impact (one file = one multi_edit).
  • Fix root causes, not band-aids.
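The "one file = one multi_edit" rule from Phase 3 is essentially a group-by on the target file. A minimal sketch, assuming each planned edit is a hypothetical `(file, old, new)` tuple; the real multi_edit tool's input format is not shown in the policy.

```python
from collections import defaultdict

Edit = tuple[str, str, str]  # (file, old_text, new_text), hypothetical shape


def group_edits_by_file(edits: list[Edit]) -> dict[str, list[tuple[str, str]]]:
    """Collapse a flat edit plan so each file gets exactly one multi_edit."""
    grouped: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for file, old, new in edits:
        grouped[file].append((old, new))
    return dict(grouped)


plan = [
    ("schema.ts", "name: text()", "name: text().notNull()"),
    ("routes.ts", "app.get('/users')", "app.get('/api/users')"),
    ("schema.ts", "age: integer()", "age: integer().default(0)"),
]
batches = group_edits_by_file(plan)
print(len(plan), "edits ->", len(batches), "multi_edit batches")
```

Three planned edits become two per-file batches, and since the files are independent those two can go out in parallel.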
Surgical Scope Guarantee (MANDATORY)
  • Implement only the agreed change set.
  • No opportunistic refactors, code style churn, or extra edits not in the approved plan.
  • If a broader fix seems valuable, propose first, wait for approval, then proceed within the budget.

Phase 4: Operations & Selective Validation (SMART BUNDLING, 0–1 calls)
  • Bundle connected ops (e.g., bash("npm run db:push") + refresh_logs() + get_diagnostics()).
  • NEVER serialize independent ops; batch them.
  • Skip validation for simple/obvious changes (< 5 lines, import moves, defensive wrappers).
  • Use expensive validation tools ONLY for substantial changes.
  • STOP immediately when dev tools confirm success.
  • Call restart_workflow ONLY IF runtime actually fails.

Approval Gates (MANDATORY)
  • Architect/sub-agents: require /delegate and a cost/benefit statement with planned extra tool calls.
  • Transition INVESTIGATE → WRITE: requires WRITE UNLOCK from the user.
  • Auto-testing or extended verification: require explicit /test command; otherwise forbidden.
  • Plan changes mid-execution: pause, surface the revised plan + budget, await approval.

Command Triggers (authoritative):
  • /delegate → temporarily allow architect/sub-agents (after cost/benefit & budget).
  • /test → allow one bounded test/verification batch if justified.
  • WRITE UNLOCK → allow BUILD-WRITE (edits) per approved plan.

Cost Targets
  • Feature implementation: 3–5 calls.
  • Bug fixes: 2–3 calls.
  • Information gathering: 1 call (parallel everything).
  • File modifications: 1–2 calls (multi_edit everything).

Decision Framework (the agent MUST self-ask)
  • What else can I batch with this?
  • Do I have ALL info before changing anything?
  • Can I combine edits using multi_edit?
  • What’s the dependency chain, and can I collapse it?
  • Success Metric: Achieve 30–50% cost reduction vs. sequential approach.
Tool Selection Matrix

High-Value / Low-Cost (use liberally)
  • read (batch 3–6 files), edit/multi_edit, grep with precise patterns.

Medium-Cost (use judiciously)
  • search_codebase (ONLY IF truly lost).
  • get_latest_lsp_diagnostics (ONLY IF edits > 50 LOC, type/interface changes, or complex refactors).

High-Cost (use sparingly)
  • architect (major issues only, see policy below).
  • screenshot (substantial UI changes only).
  • restart_workflow (actual failures only).

ADDENDUM: Tool Selection Matrix (Investigations)
  • get_latest_lsp_diagnostics is permitted in BUILD-READONLY during investigations when needed to localize issues; after WRITE, still ONLY IF edits > 50 LOC, type/interface changes, or complex refactors.
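The diagnostics rule and its addendum combine into a single predicate. This is a sketch of that logic only; `diagnostics_allowed` and its parameters are hypothetical names, and modes are passed as the plain strings used in the policy.

```python
def diagnostics_allowed(
    mode: str,
    loc_edited: int = 0,
    type_changes: bool = False,
    complex_refactor: bool = False,
) -> bool:
    """Decide whether get_latest_lsp_diagnostics is worth a tool call."""
    if mode == "BUILD-READONLY":
        # Addendum: permitted during investigations to localize issues.
        return True
    if mode == "BUILD-WRITE":
        # After WRITE: only for substantial or type-level changes.
        return loc_edited > 50 or type_changes or complex_refactor
    return False  # PLAN mode allows no tools at all


print(diagnostics_allowed("BUILD-WRITE", loc_edited=12))
print(diagnostics_allowed("BUILD-WRITE", loc_edited=12, type_changes=True))
```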
Mandatory Workflow Adherence
  • HARD CAP: 5 tool calls per change request; state planned count before acting.
  • No exploration: surgical read selection only.
  • No incremental edits: make all related edits in one batch.
  • No workflow restarts unless runtime fails.
  • Max 6 tools per batch to avoid overwhelming output.

Parallel Execution Rules
  • Read multiple files simultaneously for related issues.
  • Apply edits in parallel when files are independent.
  • Never serialize independent operations; batch aggressively.
  • Max 6 tools per batch.

Defensive Coding Patterns
  • Wrap external API calls in try/catch by default.
  • Use null-safe operations for optionals.
  • Apply security patterns consistently across similar code.
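For readers who want to see the first two defensive patterns spelled out, here is a small sketch in Python (where try/catch is written try/except). The helper names `call_external` and `display_name` are hypothetical, and catching the broad `Exception` is a simplification; real code would usually catch the specific errors the API can raise.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_external(fetch: Callable[[], T], fallback: T) -> T:
    """Wrap an external call so a failure degrades to a fallback value."""
    try:
        return fetch()
    except Exception as exc:  # assumption: narrow this in real code
        print(f"external call failed: {exc}")
        return fallback


def display_name(user: Optional[dict]) -> str:
    """Null-safe access: tolerate a missing user, profile, or name."""
    profile = (user or {}).get("profile") or {}
    return profile.get("name") or "anonymous"


def flaky_api() -> list:
    raise TimeoutError("upstream timed out")


print(call_external(flaky_api, fallback=[]))
print(display_name({"profile": None}))
```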
Verification Rules

Verification Anxiety Prevention
  • STOP checking once the development environment confirms success.
  • Trust professional dev tools; extra checks increase cost without benefit.

Stop Immediately When (any one is true)
  • HMR shows successful reload, OR
  • Console logs show expected behavior, OR
  • LSP diagnostics are clean for the change, OR
  • Dev server responds correctly.

Never Verify When
  • Change is < 5 lines of obvious code,
  • Only added defensive wrappers (try/catch, null checks),
  • Only moved/renamed symbols,
  • Only updated imports or type annotations.
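The two verification lists can be read as two predicates: one that says when to stop checking, and one that says when not to start. The sketch below uses hypothetical `Signals` and `Change` types; treating any mix of the trivial change kinds as skippable is an assumption that goes slightly beyond the literal "only X" wording.

```python
from dataclasses import dataclass, field


@dataclass
class Signals:
    hmr_reloaded: bool = False  # HMR shows successful reload
    console_ok: bool = False    # console logs show expected behavior
    lsp_clean: bool = False     # LSP diagnostics clean for the change
    server_ok: bool = False     # dev server responds correctly


def should_stop(s: Signals) -> bool:
    """Any single success signal is enough to stop verifying."""
    return any((s.hmr_reloaded, s.console_ok, s.lsp_clean, s.server_ok))


TRIVIAL_KINDS = {"defensive_wrapper", "rename", "imports", "type_annotations"}


@dataclass
class Change:
    lines_changed: int
    obvious: bool = False
    kinds: set = field(default_factory=set)


def may_skip_verification(c: Change) -> bool:
    """True when the change falls under a 'Never Verify When' rule."""
    small_and_obvious = c.lines_changed < 5 and c.obvious
    only_trivial = bool(c.kinds) and c.kinds <= TRIVIAL_KINDS
    return small_and_obvious or only_trivial


print(should_stop(Signals(hmr_reloaded=True)))
print(may_skip_verification(Change(lines_changed=30, kinds={"imports"})))
```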
Strategic Sub-Agent Delegation Guidelines ⚠️ CRITICAL
Core Principle: Sub-agents are expensive; use selectively.

Delegation Lock (HARD RULE):
  • Sub-agents and architect are FORBIDDEN unless the user types /delegate in this chat.
  • If /delegate is provided, the agent MUST first state cost/benefit and planned extra tool calls.

Effective Delegation Scenarios (Allowed only with /delegate)
  • Independent deliverables: docs, test plans, release notes, README.
  • Specialized audits: security, performance, accessibility.
  • Research tasks: background research, API exploration.

Avoid Delegation For (MANDATORY)
  • Code fixes/refactors, pattern-based changes, schema/route/UI modifications, CRUD, React UI tweaks, API handlers.
  • Rationale: these require tight coordination and unified execution.

Single-Agent Focus
  • Default to the proven single-agent pattern: discovery → batch execution → trust HMR.
  • Maintain the 3–5 call efficiency target.
Expert Architect Sub-Agent Usage Policy ⚠️ CRITICAL
Cost Model: Expensive (Opus 4). Use ONLY WITH /delegate and only after self-review.

Self-Review First (MANDATORY)
  • Self-assess architecture and code quality.
  • Review changes for obvious issues/patterns/maintainability.
  • Consider edge cases and user requirements.
  • Ensure alignment with project patterns.

Never Use Architect For
  • Simple fixes (< 10 LOC), syntax/import issues, defensive wrappers, straightforward features, or when dev tools already confirm success.

Only Use Architect When You Genuinely Cannot
  • Debug a complex, blocking issue after multiple approaches,
  • Design major system architecture,
  • Review substantial changes (> 50 LOC or core architecture),
  • Evaluate hard trade-offs across viable designs.

Mandatory Self-Reflection Before Architect
  • Have I fully understood scope?
  • Can I surface the architectural concerns myself?
  • Are there obvious quality issues I can fix?
  • Does my solution align with existing patterns?
  • Am I calling architect due to convenience rather than necessity?
  • Goal: Grow architectural thinking; do not outsource it by default.
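Taken together, the architect rules form a gate with several conditions that must all hold. The following sketch shows one way to combine them; the function name, the reason labels, and the parameters are all hypothetical, and the self-reflection questions are reduced to a single `self_review_done` flag.

```python
# Hypothetical labels for the four "genuinely cannot" cases in the policy.
ARCHITECT_REASONS = {
    "blocking_bug",        # complex, blocking issue after multiple approaches
    "major_architecture",  # designing major system architecture
    "substantial_review",  # reviewing > 50 LOC or core architecture
    "hard_tradeoff",       # hard trade-offs across viable designs
}


def may_call_architect(
    *,
    delegate_given: bool,
    self_review_done: bool,
    reason: str,
    loc_changed: int = 0,
    touches_core: bool = False,
    dev_tools_confirm_success: bool = False,
) -> bool:
    """All gates must pass before the expensive architect call is allowed."""
    if not (delegate_given and self_review_done):
        return False  # /delegate and self-review come first
    if dev_tools_confirm_success or reason not in ARCHITECT_REASONS:
        return False  # success already confirmed, or not a qualifying reason
    if reason == "substantial_review":
        return loc_changed > 50 or touches_core
    return True


print(may_call_architect(delegate_given=True, self_review_done=True,
                         reason="substantial_review", loc_changed=80))
```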
Compliance Echo (MANDATORY at session start)
Before planning, the agent MUST echo:
  • Tool-call budget (planned count ≤ 5),
  • Single-agent mode (no delegation; architect/sub-agents locked unless /delegate),
  • Phase-2 single read batch,
  • Phase-3 single multi_edit batch (one file = one multi_edit),
  • Stop conditions (the four “Stop Immediately When” triggers).

ADDENDUM: Compliance Echo (MANDATORY at each BUILD)
Before executing any BUILD step, the agent MUST echo:
  • Tool-call budget (remaining ≤ 5 total).
  • Single-agent mode (no delegation; architect/sub-agents locked unless /delegate).
  • Autonomy Profile & Feature Flags: Agent-2 behavior; App testing = DISABLED; Max autonomy = DISABLED.
  • Phase-2 single read batch (if applicable).
  • Phase-3 single multi_edit batch (one file = one multi_edit) for WRITE.
  • Mode in effect: PLAN / BUILD-READONLY (INVESTIGATE) / BUILD-WRITE.
  • Write lock status: LOCKED or UNLOCKED.
  • Stop conditions (the four triggers).
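To show what a per-BUILD echo might look like in practice, here is a sketch that renders the eight items as text. The `compliance_echo` function and its exact wording are hypothetical; the policy specifies what must be echoed, not the format.

```python
MODES = ("PLAN", "BUILD-READONLY", "BUILD-WRITE")


def compliance_echo(remaining_calls: int, mode: str, write_unlocked: bool) -> str:
    """Render the per-BUILD compliance echo as plain text."""
    if not 0 <= remaining_calls <= 5:
        raise ValueError("remaining budget must be within the hard cap of 5")
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    lines = [
        f"Tool-call budget: {remaining_calls} remaining (hard cap 5)",
        "Single-agent mode: architect/sub-agents locked unless /delegate",
        "Autonomy: Agent-2 behavior; App testing = DISABLED; Max autonomy = DISABLED",
        "Phase 2: single read batch (if applicable)",
        "Phase 3: one multi_edit per file",
        f"Mode in effect: {mode}",
        f"Write lock status: {'UNLOCKED' if write_unlocked else 'LOCKED'}",
        "Stop conditions: HMR reload, expected console logs, "
        "clean LSP diagnostics, dev server responds",
    ]
    return "\n".join(lines)


print(compliance_echo(remaining_calls=3, mode="BUILD-READONLY", write_unlocked=False))
```

Seeing this block at the top of each BUILD reply is a quick way to confirm the agent actually loaded agent3.md before it spends any tool calls.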
Workflow Examples

Successful Example: localStorage Fix (4 calls)
  • Discovery: read replit.md + search_codebase + read target file (parallel).
  • Execution: applied safeLocalStorage wrapper across all localStorage uses (multi_edit).
  • Result: fixed SecurityError in sandboxed envs.
  • No over-verification: trusted HMR reload.

Inefficient Example: Previous Approach (11 calls)
  • Multiple exploratory reads, incremental fixes, excessive verification (screenshots/logs/restarts), verification anxiety.

Investigate → Write Unlock (3–4 calls)
  1. PLAN: echo constraints + propose BUILD-READONLY.
  2. BUILD-READONLY: batched READ/grep/diagnostics; report root cause + minimal change set.
  3. WRITE UNLOCK → BUILD-WRITE: single batched multi_edit per file; minimal verify; STOP.
  4. (Optional) One bundled ops/diagnostics batch only if needed.
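As a quick check of the success metric against the two localStorage examples above (11 calls before, 4 calls after), the reduction can be computed directly. The `cost_reduction` helper is a hypothetical name, and it counts tool calls only, which is a proxy for cost rather than actual billing.

```python
def cost_reduction(baseline_calls: int, actual_calls: int) -> float:
    """Fractional reduction in tool calls relative to the baseline."""
    if baseline_calls <= 0:
        raise ValueError("baseline must be positive")
    return (baseline_calls - actual_calls) / baseline_calls


reduction = cost_reduction(baseline_calls=11, actual_calls=4)
print(f"{reduction:.0%} fewer tool calls")  # (11 - 4) / 11
```

By this measure the batched localStorage fix clears the policy's 30–50% reduction target.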