Most AI writing tools are built for fluency. They help you turn rough text into cleaner text. They make a draft sound smarter, smoother, and more complete.
That is useful, but it is not the real problem most teams have.
The real problem is judgment.
When you are working on a pitch, a memo, an idea brief, a go-to-market narrative, or a 30-second script, the hard part is not generating words. The hard part is knowing whether the words will survive contact with different audiences who care about different things.
A customer wants to know if the message feels true. A product marketer wants sharper positioning. A sales leader wants specificity and credibility. An engineer wants the claims to hold up. An investor wants to know why this matters and why it wins.
Those are not minor edits. Those are different ways of seeing.
I built kritique for myself because I kept hitting that exact judgment problem in my own writing. The single-model rewrite loop was not the bottleneck. The bottleneck was that the rewrite never had to survive the people who would actually read the work. I am writing this up because the underlying idea seems useful beyond my own desk, and the pattern is worth more than any single implementation of it.
Instead of asking one model to write and approve its own work, kritique runs a council. One persona writes. Multiple personas critique from distinct lenses. The system keeps iterating until the artifact is strong enough to pass a configured threshold of agreement, while still preserving the dissent that remains.
The result is not generic AI polish. The result is structured pressure.
why this needs to exist
Most business writing fails for reasons that normal AI rewriting does not catch. It may sound polished but still be vague. It may sound ambitious but still be strategically thin. It may sound persuasive to the founder and completely unconvincing to a buyer, operator, or investor.
That happens because writing quality is contextual. There is no single universal standard for a good memo or pitch. There are only competing standards imposed by the people who will read it. In practice, good communication emerges from surviving criticism across those standards.
That is what kritique automates.
You bring an artifact:
- an idea
- a pitch
- a memo
- a 30-second script
- a positioning statement
- a go-to-market narrative
Then you pick the kind of council that should pressure-test it. The system gives you something closer to a real review room than a generic autocomplete box.
how it helps
kritique helps in three ways.
First, it gives you perspective separation. Most AI tools collapse into one voice. Even when they are useful, they tend to average everything into one notion of "better." kritique does the opposite. It makes the different perspectives explicit. A customer critic should care about resonance. A sales critic should care about clarity and objections. A technical critic should care about credibility. A founder-style writer should preserve ambition while incorporating what actually improves the artifact.
Second, it turns critique into something actionable. Critics do not just say "this feels weak." They return structured feedback: what text they object to, how they would rewrite it, why it is better from their lens, whether they are satisfied, and how important the issue is. That pushes the system away from vague commentary and toward specific, usable revision pressure.
Third, it gives you convergence without pretending everyone agrees. Real teams rarely reach perfect consensus. One person always wants another pass. One critic always has one more concern. If a writing tool requires unanimity, it will often loop forever or stop being useful. kritique uses threshold-based convergence instead. If enough critics are satisfied, the system can stop, show the revised artifact, and clearly report who still objects and why.
That is closer to how real decisions get made.
the core product shape
At its core, kritique is a CLI for iterative artifact review.
You choose:
- an artifact type
- a writer mode
- a preset council or custom council
- models for the writer and critics
- convergence settings like threshold, round limit, and history depth
Then the loop runs:
- the writer revises the artifact
- critics review it in parallel from their own lenses
- the engine evaluates convergence
- if needed, the writer revises again using accumulated feedback
- the system returns the latest artifact, a concise summary, unresolved holdouts, and any degraded or stalled status
This shape matters because it separates three concerns cleanly: what kind of artifact is being reviewed, who is in the room, and how strict the stopping condition is. That makes the tool flexible without turning it into open-ended chat.
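To make that shape concrete, here is a minimal sketch of the loop in Python. The function names and fields are illustrative assumptions about the pattern, not kritique's actual internals.

```python
# Minimal sketch of the review loop. Names here are illustrative, not kritique's API.

def review_loop(artifact, writer, critics, threshold, max_rounds, history_depth):
    history = []  # bounded record of recent council rounds

    for _ in range(max_rounds):
        # The writer revises, seeing only a window of recent feedback.
        artifact = writer.revise(artifact, history[-history_depth:])

        # Critics review the new version independently (a real implementation
        # would fan these calls out in parallel).
        reviews = [critic.review(artifact) for critic in critics]
        history.append(reviews)

        # Threshold-based convergence: stop when enough critics are satisfied,
        # but keep the dissenters as holdouts instead of erasing them.
        satisfied = sum(1 for r in reviews if r.satisfied)
        if satisfied / len(reviews) >= threshold:
            holdouts = [r for r in reviews if not r.satisfied]
            return artifact, holdouts, "converged"

    return artifact, [r for r in history[-1] if not r.satisfied], "max_rounds_reached"
```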
how the council actually works
The most important part of kritique is not that multiple personas exist. It is how they interact. The review loop is intentionally structured.
1. The writer starts with the current artifact
The writer gets the current version of the artifact plus the recent feedback history. That history is bounded. The system does not keep dragging the entire transcript forward forever, because that creates token bloat and weakens the signal. Instead, it carries a configurable window of recent rounds so the writer sees the most relevant unresolved pressure.
The writer's job is not to blindly accept every comment. The writer is asked to incorporate changes that improve clarity, precision, credibility, and fit for the artifact type, while preserving the core intent and requested format. That distinction matters. If the writer accepts every suggestion literally, the artifact turns into committee writing. If the writer ignores feedback, the loop becomes cosmetic. The right behavior is selective synthesis under pressure.
2. Critics review in parallel
Once the writer produces a revision, the critics review that version independently and in parallel. This matters for two reasons. It keeps the process fast enough to be usable. And it preserves epistemic independence, because critics are reacting to the artifact, not to each other.
Each critic gets the document framed through its persona lens and returns structured feedback rather than free-form commentary. In practice, a critic response carries fields like satisfied, priority, suggestions, and summary. Each suggestion is tied to exact text. The critic is expected to quote the phrase or passage it objects to, propose a rewrite, and explain why that rewrite is better from its perspective.
That is the key design decision. Comments are not treated as vague impressions. They are treated as structured edit proposals with rationale.
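In Python, that contract might be sketched as a pair of small records. The field names mirror the prose above; the actual schema may differ.

```python
from dataclasses import dataclass, field

# Sketch of the structured feedback contract. Field names mirror the prose;
# the real schema may differ.

@dataclass
class Suggestion:
    quoted_text: str   # the exact passage the critic objects to
    rewrite: str       # the critic's proposed replacement
    rationale: str     # why the rewrite is better from this critic's lens

@dataclass
class CriticReview:
    persona: str                   # e.g. "customer", "investor"
    satisfied: bool                # is this critic happy with the current draft?
    priority: str                  # how important the remaining issues are
    summary: str                   # one-line overall reaction
    suggestions: list[Suggestion] = field(default_factory=list)
```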
3. Comments are normalized before they influence the next draft
Raw critic output is not trusted as-is. The engine parses and validates responses, retries when a critic returns malformed JSON, and normalizes the output into a consistent internal form. That gives the next round a stable contract: every critic either contributes valid structured feedback, explicitly marks itself satisfied, or is marked as degraded for that round.
This solves a practical problem in LLM systems. If one critic returns beautiful prose, another returns malformed JSON, and a third returns contradictory fields, the writer cannot consume those outputs reliably. Normalization turns messy model behavior into a usable editing interface.
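A stripped-down version of that normalization step might look like the following, assuming critics return JSON text. The retry budget and the degraded marker are assumptions about the pattern, not kritique's exact behavior.

```python
import json

# Sketch of normalizing one critic's raw output. The retry budget and the
# "degraded" marker are assumptions about the pattern, not exact behavior.

def normalize_critic_output(call_critic, max_attempts=2):
    """call_critic() returns raw text that is supposed to be a JSON review."""
    for _ in range(max_attempts):
        raw = call_critic()
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed JSON: ask the critic again
        # Enforce a stable contract before the writer ever sees this feedback.
        if (
            isinstance(data, dict)
            and isinstance(data.get("satisfied"), bool)
            and isinstance(data.get("suggestions"), list)
        ):
            return {"status": "ok", "review": data}
    # The critic never produced usable output this round.
    return {"status": "degraded", "review": None}
```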
4. The writer receives accumulated revision pressure
When the next writer round begins, kritique does not just say "please improve this." It hands the writer the current artifact plus the accumulated suggestions from the recent council history. That history includes who raised the issue, what exact text they objected to, what rewrite they proposed, and why they think it is better.
This is where the system becomes more than a simple rewrite loop. The writer is not improvising from scratch each round. It is resolving a queue of targeted criticisms from multiple lenses. In effect, the comments become a prioritized editing brief.
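Rendering that queue as a brief can be simple. The sketch below assumes the structured review shape sketched earlier; sorting by priority is an assumption about how the pressure gets ordered.

```python
# Sketch: turning accumulated council feedback into an editing brief for the
# writer. Assumes the CriticReview/Suggestion shape sketched earlier.

PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}

def build_editing_brief(reviews):
    lines = []
    for review in sorted(reviews, key=lambda r: PRIORITY_ORDER.get(r.priority, 3)):
        for s in review.suggestions:
            lines.append(
                f'[{review.persona}, {review.priority}] objects to: "{s.quoted_text}"\n'
                f'  proposed rewrite: "{s.rewrite}"\n'
                f"  because: {s.rationale}"
            )
    return "\n".join(lines)
```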
5. Convergence is measured, not guessed
After each critic round, the engine evaluates whether the artifact has converged. Convergence is threshold-based, not unanimity-based. That means the system looks at how many critics are satisfied relative to the configured threshold. If the threshold is met, the run can stop even if some critics still disagree.
This is a much better stopping rule than "everyone must approve." In real review settings, one critic may stay stricter than the rest. Requiring unanimity would let a single holdout force low-value rewrite loops. Threshold convergence keeps the bar high without making the process brittle. The unresolved objections do not disappear. They are preserved in the output as holdouts, so the user can still inspect what did not converge and decide whether it matters.
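The stopping rule itself is small. A sketch, assuming each review carries a satisfied flag:

```python
# Sketch of the stopping rule: satisfied critics over total, against a
# user-chosen threshold. Holdouts are returned, not discarded.

def evaluate_convergence(reviews, threshold):
    holdouts = [r for r in reviews if not r.satisfied]
    converged = (len(reviews) - len(holdouts)) / len(reviews) >= threshold
    return converged, holdouts
```

With four critics and a threshold of 0.75, three satisfied critics end the run and the fourth is reported as a holdout.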
6. Edits are made through revision, not patchwork
kritique does not mechanically splice critic rewrites into the document one by one. Instead, the writer produces a coherent new version of the full artifact after considering the council feedback. This avoids a common failure mode in automated editing systems: local edits improve individual sentences while degrading the overall flow, voice, or argument. That means the system treats comments as constraints on the next draft, not as direct string replacements.
This is especially important for business writing, where one change often has downstream effects. Tightening positioning in the opening may require changing the summary, the CTA, and the supporting claims later in the piece. A full-document rewrite can absorb those dependencies. Patch-by-patch editing usually cannot.
7. Stall detection keeps the loop honest
One of the most common failure modes in iterative writing systems is false motion. The writer changes a few words, keeps the same underlying draft, and the loop pretends progress happened. kritique checks for that. The engine compares successive versions for high similarity. If the writer is effectively submitting the same artifact again, the system can force a more substantive rewrite attempt. If the artifact still does not materially change after that, the run can terminate with a stalled status instead of wasting more rounds.
This matters because the product is trying to improve judgment, not simulate work.
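One way to implement that check is a plain similarity ratio between successive drafts; the cutoff value here is illustrative.

```python
import difflib

# Sketch of stall detection: if two successive drafts are nearly identical,
# the round produced motion, not progress. The cutoff value is illustrative.

def is_stalled(previous_draft: str, new_draft: str, cutoff: float = 0.98) -> bool:
    similarity = difflib.SequenceMatcher(None, previous_draft, new_draft).ratio()
    return similarity >= cutoff
```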
8. Failure is surfaced, not hidden
Critics can fail. Providers can time out. Models can emit invalid JSON. A run can partially degrade. The system is designed so those failures become part of the result rather than silent corruption. A degraded critic is marked as degraded for the round. The rest of the council can continue if enough signal remains. The final output and session record preserve what failed and what still completed successfully.
That makes the review process auditable. You can tell the difference between:
- a clean convergence
- a threshold convergence with holdouts
- a stalled run
- a degraded run with partial critic failure
That distinction is essential if users are going to trust the output.
9. The session ends with both a revision and a trace
At the end of a run, the output is not just "here is your final text." The output includes:
- the latest revised artifact
- the council summary
- unresolved holdouts
- degraded or stalled status when relevant
- the recorded session history
That final trace is part of the product, not just a debugging convenience. If a team is using kritique seriously, they need to know what changed, what objections were resolved, and what concerns are still open. That is how the system turns AI critique from a black box into a review process.
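That bundle could be represented as a single record, sketched here with illustrative names rather than kritique's actual output format.

```python
from dataclasses import dataclass, field

# Sketch of what a run hands back, mirroring the list above. Names are
# illustrative, not kritique's actual output format.

@dataclass
class CouncilResult:
    artifact: str                 # the latest revised artifact
    summary: str                  # concise council summary
    holdouts: list = field(default_factory=list)   # unresolved objections
    status: str = "converged"     # or "converged_with_holdouts", "stalled", "degraded"
    history: list = field(default_factory=list)    # round-by-round session trace
```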
why personas matter
In kritique, personas are not theater. They are constraints. The point is not to have the model pretend to be a colorful character. The point is to force role accountability.
Some built-in personas reflect common business functions:
- founder
- pm
- product_marketer
- sales
- customer
- engineer
- investor
- neutral_synthesizer
Each one is supposed to care about a different failure mode. That keeps the review sharper than a generic "improve this" instruction ever could.
The system also supports custom personas at runtime. That matters because teams often need domain-specific councils: compliance, security, design, finance, support, enterprise buyer, technical evaluator. If the product is really about structured judgment, it cannot assume one fixed set of critics is enough for everyone.
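A custom persona, then, is mostly data: a name, a lens, and the failure modes it is accountable for. A sketch, with a hypothetical compliance critic that is not one of the built-ins:

```python
from dataclasses import dataclass

# Sketch of a persona as a constraint: a lens plus the failure modes it polices.
# The compliance critic below is hypothetical, not a built-in.

@dataclass
class Persona:
    name: str
    lens: str            # what this critic is accountable for
    failure_modes: str   # what it should refuse to let through

compliance_critic = Persona(
    name="compliance",
    lens="regulatory and policy risk in customer-facing claims",
    failure_modes="unverifiable claims, absolute guarantees, missing disclaimers",
)
```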
why multi-model is useful
The council is not only multi-persona. It is also multi-model.
That matters because different model families have different habits, blind spots, and standards of quality. One model may be better at terse editing. Another may be better at strategic criticism. Another may be more literal, skeptical, or operationally grounded.
When you combine model diversity with persona diversity, you get a stronger kind of pressure test. You are no longer asking a single system to generate, critique, and validate its own answer. You are asking a room of systems, each with a role and lens, to try to improve or reject the artifact. That produces a much better signal than one model saying, "looks good."
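In configuration terms, that just means the writer and each critic can point at a different model. A sketch with placeholder model identifiers:

```python
# Sketch: persona diversity crossed with model diversity. The model identifiers
# below are placeholders, not a recommendation of specific providers.

council = {
    "writer": {"persona": "founder", "model": "model-a"},
    "critics": [
        {"persona": "customer", "model": "model-b"},
        {"persona": "sales",    "model": "model-a"},
        {"persona": "engineer", "model": "model-c"},
        {"persona": "investor", "model": "model-b"},
    ],
}
```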
the problems it solves
This design solves a few recurring problems in AI-assisted writing.
It solves false confidence. One polished response from one model often feels better than it is. A council exposes weak spots before a real audience does.
It solves perspective collapse. Instead of averaging every concern into one voice, the system preserves distinct objections.
It solves vague critique. Structured suggestions force critics to point at exact text and propose improvements.
It solves endless rewriting. Threshold-based convergence and round limits stop the loop from drifting forever.
It solves fragile automation. Critics can fail, responses can come back malformed, and writers can stall. The system is designed to degrade gracefully, retry JSON parsing, detect near-identical rewrites, and preserve useful output even when a run is imperfect.
It solves poor auditability. Sessions are logged so the path from initial artifact to final revision is inspectable. That matters if you want to trust the system, debug it, or understand why it made the changes it did.
the hard parts
Building a product like this is not mainly about prompts. The hard parts are architectural.
One challenge is preventing prompt leakage from one domain into all others. If a tool starts life as a pitch reviewer, it is easy for founder language, investor framing, and pitch-specific assumptions to leak into every artifact type. kritique avoids that by treating artifact type as prompt framing, not engine logic.
Another challenge is keeping the engine generic while preserving high-signal presets. The product should feel easy for common tasks like idea review or 30-sec script review, but those presets should only configure personas, defaults, and framing. They should not fork the engine into separate products.
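Put differently, a preset is a small bundle of configuration, not a branch in the engine. A sketch with illustrative values:

```python
# Sketch: a preset is data, not a fork of the engine. It configures who is in
# the room, the framing, and the defaults. Values here are illustrative.

IDEA_REVIEW_PRESET = {
    "artifact_type": "idea",
    "framing": "Review this as an early-stage idea brief.",
    "critics": ["founder", "product_marketer", "customer", "investor"],
    "defaults": {"threshold": 0.75, "max_rounds": 4, "history_depth": 2},
}
```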
A third challenge is handling disagreement well. If every critic must say yes, one permanently strict critic can block the run forever. If the system stops too early, the output is shallow. Threshold-based convergence solves that by letting the user choose how much agreement is enough while still surfacing holdouts explicitly.
A fourth challenge is making LLM failure legible. Models return malformed JSON. Providers time out. Writers sometimes produce nearly identical rewrites while claiming progress. A usable council engine has to assume those failures will happen and design around them.
why the CLI is the right place to start
kritique is intentionally CLI-first. That is not because the terminal is the final destination. It is because the terminal is a good place to build the hard part first.
The important product here is the council engine:
- persona definitions
- prompt builders
- model adapters
- convergence logic
- stall detection
- failure handling
- session persistence
The CLI gives that engine a practical surface without forcing premature UI decisions. It keeps the loop inspectable, testable, and cheap to iterate on. Once the engine is strong, other surfaces can exist. But if the engine is weak, no interface will save it.
The experience of using kritique is less like asking an AI assistant for a rewrite and more like handing a document to a compact review board. You pick the artifact and the room.
For example:
- an idea reviewed by a founder, product marketer, customer, and investor
- a memo reviewed by a PM, engineer, and neutral synthesizer
- a 30-sec script reviewed by sales, customer, and product marketer
- a go-to-market narrative reviewed by sales, founder, and investor
The system rewrites, critiques, revises, and stops when enough of the room believes the artifact is strong enough. Then it tells you what changed, where the council converged, and which critics still disagree. That is a much more useful workflow than one-shot rewriting, because it mirrors how high-stakes communication actually improves.
what kritique really does
kritique does not try to replace judgment. It tries to concentrate judgment.
It creates a repeatable process for putting business writing under pressure from multiple perspectives, capturing the disagreement, and turning that pressure into a stronger artifact.
That is why this kind of tool is needed. Not because teams need more generated text. They already have that. They need a better way to pressure-test thinking before it leaves the room.
That is what kritique does for me, and that is why I am sharing the design. The implementation matters less than the pattern: a council, structured suggestions, threshold convergence, preserved dissent. If you are building something similar, or if you want to use what I have built, the idea is yours to take.