Hey everyone,
I didn’t have any previous coding experience, but I constantly found myself unable to tell AI exactly what I wanted. To solve my own problem, I started building Prompt Kaizen in my free time after work, coding the entire thing straight from my phone. I actually ended up building the optimization engine by using the engine itself to write the prompts! I want to give a huge thank you to Replit for providing such an incredible platform that allows someone without experience to just start building something real.
Prompt Kaizen compiles vague inputs into production-grade specifications. It’s not a “make my prompt longer” tool. It’s a multi-stage pipeline that analyzes, enriches, stress-tests, and scores your prompts before you ever paste them into an AI.
Here’s the live app: https://promptkaizen.com
THE PROBLEM I KEPT HITTING
Every AI coding session starts the same way: you type something lazy like “Create a React component,” the LLM fills in 47 assumptions you never made, and you spend the next hour unwinding hallucinated constraints.
The issue isn’t the model — it’s that we’re feeding it ambiguous inputs and expecting precise outputs. That’s a signal processing problem, not an AI problem.
WHAT THE PIPELINE ACTUALLY DOES (with real numbers)
Your prompt goes through 5 stages — you can watch each one stream in real time via server-sent events (SSE):
- Understand — Classifies your intent, detects domain, identifies what’s missing
- Enrich — Expands the prompt with structure, constraints, edge cases, and domain knowledge
- Stress-Test — A red-team pass that attacks the enriched prompt for logical flaws, contradictions, assumption injection, and prompt injection vulnerabilities
- Refine — Reconciles the enrichment with every critique finding to produce the final prompt
- Measure — Evaluates the output across 4 dimensions: clarity, structure, specificity, and redundancy
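To make the flow concrete, here is a minimal TypeScript sketch of how a staged pipeline like this could be wired. The stage names mirror the post; the stage bodies are stubs standing in for the real LLM calls, and every function name here is hypothetical, not Prompt Kaizen's actual code:

```typescript
// Hypothetical five-stage pipeline sketch. Stage bodies are stubs;
// in the real app each stage would call a model.

type StageName = "understand" | "enrich" | "stressTest" | "refine" | "measure";

interface StageResult {
  stage: StageName;
  output: string;
}

type Stage = (input: string) => Promise<string>;

// Stub stages, in the order the post describes.
const stages: Array<[StageName, Stage]> = [
  ["understand", async (p) => `intent+gaps for: ${p}`],
  ["enrich", async (p) => `expanded: ${p}`],
  ["stressTest", async (p) => `critiques of: ${p}`],
  ["refine", async (p) => `reconciled: ${p}`],
  ["measure", async (p) => `scored: ${p}`],
];

// Runs the stages in order, emitting each result so a UI can render
// the pipeline as it progresses (e.g. pushed to the client over SSE).
async function runPipeline(
  prompt: string,
  onStage: (r: StageResult) => void,
): Promise<string> {
  let current = prompt;
  for (const [stage, fn] of stages) {
    current = await fn(current);
    onStage({ stage, output: current });
  }
  return current;
}
```

The key design point is that each stage consumes the previous stage's output, so a failed or flagged stage can halt the chain before a bad prompt reaches the user.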
Across 65+ completed production runs, the numbers look like this:
- Average output score: 84/100
- Average score lift: +43 points (input vs. output)
- Average expansion: 34x (word count)
- Max expansion: 509x (a single-sentence input became a 3,089-word specification)
THE REAL DIFFERENTIATOR: THE STRESS-TEST CATCHES THINGS HUMANS MISS
The stage I’m most proud of is the red-team critique. It doesn’t just check grammar — it attacks the enriched prompt for logical contradictions, scope explosion, assumption injection, and even prompt substitution.
Here are three real catches from production runs:
Example 1 — Input: “Create a React component” (4 words, scored 8/100)
The enrichment stage expanded it to 667 words with TypeScript interfaces, React 18+ hooks, accessibility requirements, and error boundary patterns. But the stress-test flagged it: the enriched prompt told the developer to include error boundaries while the rest of the prompt pushed functional-only patterns — and React error boundaries require class components. The pipeline caught the contradiction, forced a rewrite, and the final output scored 95/100.
Example 2 — Input: “Write a python script” (19 words with context, scored 15/100 → final 94/100)
The enrichment expanded a vague automation request into a 610-word structured specification with TOCTOU race condition handling, path validation, symlink resolution, dry-run safety mode, and a blocklist for protected directories. The stress-test validated the expansion, the synthesis cleaned it up, and the final output scored 94/100 — a +79 point lift from a prompt that started as barely usable.
Example 3 — Input: “Write me a prompt for 3rd game” (8 words, stress-test score 42/100)
This one is where the system earns its keep. The stress-test fired three [CRITICAL] alerts:
- Context Mismatch: “Assumes sequential game development learning path, but original provides no context about prior projects or skill level”
- Scope Explosion: “Expands 8-word request into ~2000+ word specification with 9 major sections never implied in original”
- Assumption Injection: “Unilaterally decides genre (2D platformer), tech stack (JavaScript/HTML5 Canvas), and complexity level without user consent”
The system flagged it for review instead of blindly shipping an over-engineered spec. In another run, the stress-test even detected prompt substitution — where the enrichment stage generated a completely different prompt instead of optimizing the original. It fired a [CRITICAL] alert: “The optimized prompt is not an optimization of the original — it is a completely different prompt.” The system essentially caught itself fabricating, and flagged it before the user ever saw the output.
5 PRISM MODES
The pipeline adapts based on what you’re working on:
- General Logic — Clear, structured prompts for any AI task
- Software Engineering — Technical precision, error handling, implementation specs
- Creative Writing — Narrative flow, emotional resonance, world-building
- Business Strategy — Brand voice, compliance, professional communication
- Education — Socratic approach, learning objectives, reasoning steps
HOW I BUILT IT ON REPLIT
I wanted to push the native Replit stack as far as it goes. Here’s the full architecture:
- Frontend: React + TypeScript (Vite), Wouter routing, Tailwind CSS, shadcn/ui
- Backend: Express.js with SSE streaming for real-time pipeline visualization
- Database: PostgreSQL via Drizzle ORM
- Auth: Replit Auth (OIDC) — zero-config sign-in, no auth headaches
- AI: Replit AI Integrations — dynamically routes between a fast model for scoring/classification and a heavy reasoning model for expansion, critique, and synthesis
- Payments: Stripe integration for subscriptions
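The fast-vs-reasoning routing described above could be sketched as a simple task-to-model map. This is an assumption about the shape of the logic, not the actual code, and the model IDs below are placeholders:

```typescript
// Hypothetical model router: a cheap/fast model for scoring and
// classification, a heavier reasoning model for expansion, critique,
// and synthesis. Model IDs are placeholders, not real identifiers.

type Task = "classify" | "score" | "expand" | "critique" | "synthesize";

const FAST_MODEL = "fast-model";           // placeholder id
const REASONING_MODEL = "reasoning-model"; // placeholder id

function modelFor(task: Task): string {
  switch (task) {
    case "classify":
    case "score":
      return FAST_MODEL;
    case "expand":
    case "critique":
    case "synthesize":
      return REASONING_MODEL;
  }
}
```

Keeping the routing in one exhaustive switch means adding a new pipeline task forces a compile-time decision about which model tier it belongs to.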
The SSE implementation streams each pipeline stage to the UI in real time — you literally watch each stage think, then hand off to the next one. The state machine handles resume/rollback if any stage fails mid-run, so you never lose progress.
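For readers curious what per-stage SSE streaming looks like, here is a self-contained sketch. The response type is reduced to the two methods needed so it compiles without Express types, but the headers and wire format are the standard text/event-stream ones; the function names are mine, not Prompt Kaizen's:

```typescript
// Sketch of streaming per-stage pipeline events over SSE.
// `SseResponse` is a minimal stand-in for an Express `Response`.

interface SseResponse {
  setHeader(name: string, value: string): void;
  write(chunk: string): boolean;
}

// Serializes one event in the text/event-stream wire format:
// an `event:` line, a `data:` line, and a terminating blank line.
function formatSseEvent(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Writes each pipeline stage result to the client as it completes,
// then signals completion with a final "done" event.
async function streamPipelineEvents(
  res: SseResponse,
  results: AsyncIterable<{ stage: string; output: string }>,
): Promise<void> {
  res.setHeader("Content-Type", "text/event-stream");
  res.setHeader("Cache-Control", "no-cache");
  res.setHeader("Connection", "keep-alive");
  for await (const result of results) {
    res.write(formatSseEvent("stage", result));
  }
  res.write(formatSseEvent("done", { ok: true }));
}
```

On the client, an `EventSource` listening for `stage` events can render each stage's output as it arrives, which is what makes the "watch each stage think" effect possible.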
PRICING (transparent, no hidden gotchas)
- Free: €0/month — 5 optimizations + 10 bonus credits on sign-up. All Prism Modes, quality scoring included.
- Entry: €29/month — 100 optimizations, full history, export/import, Concept Studio. Annual plan available (~20% off).
- Pro: €79/month — 350 optimizations + Integrity Engine, Forensic Analysis, Bulk Optimization, Prompt Memories. Annual plan available (~20% off).
TRY TO BREAK IT
I’d genuinely love for this community to stress-test the pipeline. Some things to try:
- Feed it a deliberately contradictory prompt and see if the red-team stage catches it
- Try a complex multi-step technical spec and check what the enrichment adds
- Submit something intentionally vague (like “make an app”) and see how it handles extreme ambiguity
- Run the same prompt through different Prism Modes and compare the outputs
The app is live at https://promptkaizen.com — Free tier gives you enough runs to put it through its paces. If you find something the stress-test misses, or the SSE streaming breaks on your browser, I want to hear about it. That’s how this gets better.
Built entirely on Replit. Source runs on the Replit stack (Express + Vite + PostgreSQL + Replit Auth + Replit AI Integrations).
