Proof CLI · The verification layer · Private beta

AI writes your code.
Who verifies the intent?

AI generates code 10x faster, but nobody verifies that the code matches the intent. Tests check behavior on specific inputs — formal methods prove it for all inputs. Proof is the verification layer: write requirements, prove they hold, trace every change, ship with mathematical confidence.

AI-native · CLI-first · Git-native · Zero vendor lock-in · 5-second feedback
$ proof verify
✓ realizability ✓ consistency ⚠ 2 vacuous ✓ gaps

$ proof trace suspect
⚠ 3 specs drifted from code
  handler.go changed 6 days ago — requirement not updated
  SYS-REQ-047 updated — no matching test exists

$ proof mcdc report --view hotspots
  pkg/auth/session.go   err != nil → always true (dead branch)
  pkg/merge/policy.go   4 conditions never independently tested

$ proof gaps
⚠ idle_state: no requirement governs this output
⚠ retry_count: no boundary assumption specified
The honest questions

Can your team honestly answer these?

When AI writes the code, who verifies the intent?
Do you understand exactly how your software works — all the small nuances?
How much of it is actually tested — not test coverage numbers, really tested?
Does your code align with the spec? Does your documentation?
Can you see the blast radius of every change before you ship it?
Can you prove your requirements are complete and don't contradict each other?
Is there a single place where you can see how it all works together?
§1 — Axiom

The game changed when AI started writing code.

Before: Humans wrote code. The same brain that held the intent also wrote the implementation. The gap between what you meant and what you built was manageable — imperfectly, but manageable — because the author understood both sides.

Now: AI writes the code. First autocomplete, then full functions, then entire features. The brain that holds the intent is no longer the brain that writes the code. Tests still pass — because AI is good at writing code that passes tests. But tests verify behavior, not intent. And AI does not understand intent.

The verification gap is compounding with every commit. You have Jira, GitHub, CI/CD, code review — but none of these verify that what was intended is what got built. That's not a failure of your team. It's a failure of the tools.

Axiom 1

Tests check behavior on the inputs you thought of. Formal verification checks all possible inputs — including the ones you didn't think of. As AI generates code faster, the gap between intent and implementation grows. Not linearly. Multiplicatively.

The numbers: 28 academic papers apply LLMs to requirements engineering. Best accuracy: 53–83% F1. That means 17–47% error rates on the things that define what software should do. And nobody checks the output.
§2 — Concepts

Your tests verify behavior. Proof verifies intent.

There is a set of techniques — proven over decades in aerospace and automotive engineering — that go far beyond what testing can catch. They have been locked behind $250K toolchains and specialist jargon. Proof makes them accessible. Here is what they do, in plain language.

  1. Structured requirements Write down what your software must do before writing how. Not Jira tickets — precise, verifiable behavioral specifications. AI helps you write them. The tool validates them in real time.
  2. Mathematical proof Your tests check specific inputs. Formal verification checks all possible inputs. "Can these requirements all be satisfied simultaneously?" "Do any contradict each other?" Answers in seconds — not weeks of review or hoping you covered enough test cases.
  3. True coverage MC/DC proves that every boolean condition in your code independently affects the outcome. Your go test -cover says 80% — but which 20% is missing, and does it matter? MC/DC answers that question. Required by aviation for the most critical software. Now available for Go, JavaScript, and TypeScript.
  4. Change tracking Every requirement links to the code that implements it, the tests that verify it, and the docs that describe it. Change one — instantly see what else needs updating. The blast radius of every change, visible at all times.
  5. Gap detection Automatically finds what is missing from your specification. Unconstrained outputs, missing boundary checks, variables without assumptions. This is how Proof found a crash bug in the Tyk API Gateway that all tests missed.
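To make the MC/DC idea in point 3 concrete, here is a minimal Go sketch (the `canAccess` function is a hypothetical illustration, not Proof output): for a two-condition decision, statement coverage needs one test and branch coverage two, but MC/DC needs three, because each condition must be shown to flip the outcome on its own.

```go
package main

import "fmt"

// canAccess is a hypothetical two-condition decision.
// MC/DC requires a test pair per condition in which only that
// condition changes and the decision outcome flips.
func canAccess(isValid, hasAuth bool) bool {
	return isValid && hasAuth
}

func main() {
	// Pair for isValid: hold hasAuth=true, flip isValid.
	fmt.Println(canAccess(true, true))  // true
	fmt.Println(canAccess(false, true)) // false: isValid independently affects the outcome
	// Pair for hasAuth: hold isValid=true, flip hasAuth.
	fmt.Println(canAccess(true, false)) // false: hasAuth independently affects the outcome
}
```

Note that 80% line coverage can be reached with a single `(true, true)` test, which never demonstrates that either condition matters.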
Why this matters now

The problem is intractable by hand: AI generates code faster than teams can verify intent. The gap compounds with every commit.

It's unavoidable: If you use AI to write code, you cannot interview the AI about its intent. Tests verify behavior, not intent.

It's urgent: Cursor, Claude Code, Copilot, Codex — every team is adopting AI coding tools right now. Every commit generated today is unverified intent.

Nothing exists: Zero tools verify AI-generated code against intent. ML-based traceability tops out at 64% accuracy. The only alternative to formal verification is hope.

No PhD required: You don't need to know what FRETish, LTL, or Kind2 are. AI translates your plain English into formal specifications. The compiler validates. proof workflow --quick takes 5 seconds. You learn by using, not by studying.
Not just for planes: You don't need to be building flight software. If you care whether your code does what you intended — especially when AI is generating it — these techniques apply to you.
§3 — The Chain

One tool covers the full verification chain. Nothing else does.

Proof is not a requirements database. It is not a test runner. It is not a coverage tool. It is all of these fused into a single verification chain where every link is traceable, verifiable, and auditable.

Why does nothing else cover the full chain? Because software verification is split into four separate worlds that don't intersect. Requirements tools don't verify. Verification tools don't manage requirements. Testing tools don't know the spec. Developer tools don't know any of these exist. Building a competitor requires expertise across all four — and those four communities don't overlap. Proof bridges all four.

  1. Write requirements Full authoring tools for humans and AI. Multi-level: stakeholder use cases → system components → software specifications. Quality checks, gap detection, and conflict resolution built in.
  2. Verify they hold Formal verification: are these requirements realizable? Consistent? Vacuous? Where are gaps? Proof uses Kind2 realizability, Z3 SMT proofs, and 7-category gap analysis to answer mathematically.
  3. Define your data Property-based specifications: describe the shape of your data, merge strategies, boundary conditions. Proof generates test fixtures automatically and proves properties hold for ALL inputs, not samples.
  4. Trace every change Bidirectional traceability from requirements → code → tests → documentation. See the blast radius of every change. Suspect links flag when artifacts drift. Nothing is missed.
  5. Prove the code matches the spec MC/DC coverage at both specification and code level. Compare spec-level truth tables with code-level execution traces. Close the loop until they agree — or find issues in the spec itself.
  6. Ship with evidence Generate NPR 7150, DO-178C, ISO 26262 compliant documents. Every claim backed by traceable evidence. Runs in your CI pipeline — GitHub Actions and GitLab CI supported.
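The property-based idea in step 3 can be sketched with Go's standard testing/quick package. The `mergeRate` function and the rate-limit property here are hypothetical illustrations; Proof's own fixture generation and SMT proofs go beyond random sampling to cover all inputs.

```go
package main

import (
	"fmt"
	"testing/quick"
)

// mergeRate is a hypothetical merge strategy: take the larger limit.
func mergeRate(a, b int) int {
	if a > b {
		return a
	}
	return b
}

func main() {
	// The property: the merged rate is never lower than either input.
	// testing/quick checks it on randomly generated inputs;
	// a formal proof would establish it for every input.
	property := func(a, b int) bool {
		m := mergeRate(a, b)
		return m >= a && m >= b
	}
	if err := quick.Check(property, nil); err != nil {
		fmt.Println("property violated:", err)
		return
	}
	fmt.Println("property held on all sampled inputs")
}
```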

This chain is enforced by a four-stage development workflow — SPEC, IMPLEMENT, VERIFY, DOCUMENT — with quality gates that block advancement until checks pass. You cannot skip steps. The process is the product.

Four worlds, one bridge: Requirements engineering (DOORS, Jama), formal methods (Kind2, Z3), code coverage testing (LDRA, VectorCAST), and mainstream software engineering (GitHub, CI/CD) — four communities that don't attend the same conferences, don't read the same papers, and don't know each other exist. Proof is the first product to bridge all four.
Self-verified: Proof verifies itself — 758+ requirements across 4 specification levels, all traced and verified by its own toolchain. The ultimate "we eat our own dogfood."
§4 — Write

AI drafts. You review. The tool verifies.

You don't start from scratch. Describe what your software should do in plain English — Proof's AI pipeline translates it into formal, verifiable requirements automatically. Our NL-to-FRETish translation is competitive with the best published academic approaches. You review and approve. The compiler validates in real time — you can't produce an invalid specification.

No rewrite needed. Proof layers on top of your existing codebase. Pick your most critical component, make it bulletproof, then expand. In one case, Proof was applied to the Tyk API Gateway's policy merge engine — a production system handling millions of requests — without modifying a single line of existing code. It found a nil-pointer crash (the policy store could be unavailable, and nobody had specified what should happen), plus seven undocumented design decisions. Two hours of spec work. The key: writing specs from intent, not from existing code. Spec-from-code mirrors the code's assumptions and finds nothing. Spec-from-intent finds real bugs.
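The class of bug described above, an unspecified behavior when a dependency is unavailable, looks roughly like this. The code is an illustrative reconstruction, not Tyk's actual source.

```go
package main

import "fmt"

// Policy and Store are simplified stand-ins for a gateway's
// policy machinery (illustrative, not the real Tyk types).
type Policy struct{ Rate int }

type Store struct{ policies map[string]*Policy }

func (s *Store) Get(id string) *Policy {
	if s == nil {
		return nil // store unavailable: the case nobody specified
	}
	return s.policies[id]
}

// applyRate shows the fix a spec-from-intent requirement forces:
// the unavailable-store case must have defined behavior instead
// of a nil-pointer dereference on p.Rate.
func applyRate(s *Store, id string) (int, error) {
	p := s.Get(id)
	if p == nil {
		return 0, fmt.Errorf("policy %s unavailable", id)
	}
	return p.Rate, nil
}

func main() {
	var s *Store // the store never came up
	_, err := applyRate(s, "default")
	fmt.Println(err) // policy default unavailable
}
```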

No rewrite: You don't rewrite your software. You don't stop using your existing tools. Pick one critical path, spec it, verify it, trace it. Then expand. Proof lives alongside everything you already have.
AI workflow: Works natively with Claude Code and Codex. AI assists with requirements authoring — with compiler-in-the-loop validation and automatic variable grounding. Humans review and approve.
# Three levels of requirements, each linked to the next
$ proof req add --level stakeholder   # use cases / user needs
$ proof req add --level system        # component-level behavior
$ proof req add --level software      # implementation-level detail

# Validate structure, check for gaps
$ proof validate
✓ all requirements valid
$ proof gaps --check all
⚠ 3 outputs unconstrained · gap score: 0.91
  idle_state: no governing requirement for this output
  error_code: missing boundary requirement
  session_timeout: no assumption on input range

# AI-assisted: derive system requirements from stakeholder needs
$ proof req derive STK-REQ-001 --llm
→ 4 system requirements generated · status: draft · review: pending
§5 — Verify

Even when all tests pass — how do you know the implementation is correct?

Tests prove what they test. They don't prove what they miss. Proof runs formal verification to catch what testing cannot: conflicting requirements, unrealizable specifications, vacuous conditions, and missing constraints.

The highest-value analysis is realizability checking. It answers: "Can these requirements actually be satisfied simultaneously?" NASA's FRET research applied this to a Lockheed Martin eVTOL flight controller and found a specification that would allow the vehicle to fly backwards — a "missing stay" requirement that no test would have caught because nobody thought to test for it. Proof runs the same mathematics on your specifications, in seconds.
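Proof answers these questions with Kind2 and Z3. As a toy illustration of what a consistency check asks, here is a brute-force version over a deliberately contradictory pair of hypothetical requirements; the real engines do this symbolically, for unbounded state spaces.

```go
package main

import "fmt"

// Two hypothetical requirements over the same output:
//   R1: if overloaded, the system shall shed load.
//   R2: if overloaded, the system shall preserve all sessions
//       (i.e. shall NOT shed load).
func r1(overloaded, shedLoad bool) bool { return !overloaded || shedLoad }
func r2(overloaded, shedLoad bool) bool { return !overloaded || !shedLoad }

// consistentWhenOverloaded asks the consistency question:
// is there ANY behavior that satisfies both requirements
// under the triggering condition?
func consistentWhenOverloaded() bool {
	for _, shed := range []bool{false, true} {
		if r1(true, shed) && r2(true, shed) {
			return true
		}
	}
	return false // no behavior satisfies both: the spec contradicts itself
}

func main() {
	fmt.Println(consistentWhenOverloaded()) // false
}
```

No test can flag this: each requirement is individually testable, and the contradiction only appears when both are considered at once.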

Figure 2 — The verification chain
[Diagram: SPEC (validate) → DATA (Z3 SMT · FLIP MC/DC) → TEST (fixtures) → CODE (MC/DC) → TRACE (bidirectional) → AUDIT (evidence) → DOCS (NPR 7150)]
Fig. 2 — The verification chain. Each node represents an independently verifiable layer. Spec-level and code-level MC/DC close the loop between what was specified and what was built.
Theorem

When all layers produce clean evidence, the system can demonstrate — not just claim — that what was specified is what was built, tested, traced, and shipped.

Realizability: the killer analysis. Across 42 NASA FRET papers, realizability checking finds the most real bugs: missing requirements, priority inversions, conflicting specs, the backwards-flying vehicle. It answers a question testing cannot even ask: "Is this specification physically possible?"
Real bugs found: Applied to the Tyk API Gateway, gap analysis found 1 crash bug (nil pointer when the policy store is unavailable). Spec-from-code found nothing. Spec-from-intent found it in 2 hours.
§6 — Trace

See the blast radius. Find what's missing.

Every requirement links to the code that implements it, the tests that verify it, and the documentation that describes it. Change a requirement — and Proof shows you exactly what downstream artifacts need updating. Change code — and Proof flags which requirements may now be suspect. This is bidirectional drift detection: spec-to-code and code-to-spec. When either side changes without the other, Proof raises a suspect link. Nothing drifts silently.

But traceability is not just about what changed. Gap analysis finds what was never specified: outputs with no governing requirement, inputs with no boundary assumption, behavior that no test covers. The hardest bugs to find are the ones where the specification itself is incomplete.

ML-based traceability — the best the research community has produced — achieves 64% accuracy. That means more than a third of the links are wrong. Proof's annotation-based approach achieves 100% precision — every link that exists is verifiable and correct. No false positives. Not a probability. A deterministic, verifiable chain from intent to implementation.

100% vs. 64%: ML-based traceability (the state of the art in requirements engineering research) tops out at 0.64 F1. That means 36% of trace links are wrong. Proof's approach: code annotations scanned deterministically. Every link is verifiable. Zero false positives.
# What does this requirement touch?
$ proof trace impact SYS-REQ-047
  implements: pkg/session/apply.go:Apply()
  verifies:   pkg/session/apply_test.go:TestApply()
  satisfies:  STK-REQ-012 (parent stakeholder requirement)
  docs:       generated-srs.html §4.3

# After a code change — what drifted?
$ proof trace suspect --verbose
⚠ 2 suspect links
  SYS-REQ-047 → apply_test.go (requirement updated, test unchanged)
  SYS-REQ-089 → handler.go (code updated, requirement unchanged)

# Full coverage view
$ proof trace coverage
✓ 96.2% requirements traced
  implemented: 97.6%  verified: 96.0%  documented: 100%
§7 — MC/DC

Code-level MC/DC. For the languages you actually use.

MC/DC (Modified Condition/Decision Coverage) proves that every boolean condition in your code independently affects the decision outcome. It has historically required $50K+/year tools (LDRA, VectorCAST) and only worked for C/C++ and Ada. Proof brings MC/DC to modern languages: Go (production — the only tool that offers Go MC/DC at any price), JavaScript/TypeScript (beta), C/C++ and Rust (coming).

Figure 4 — Go code-level MC/DC measurement
# Measure MC/DC coverage for a Go package
$ proof mcdc measure ./pkg/handler/... --experimental

Code MC/DC Coverage Report
  Engine:     go
  Pattern:    ./pkg/handler/...
  Statements: 87.5%
  Decisions:  42 found, 38 fully covered
  Conditions: 94 total, 88 covered (93.6%)

Hotspots:
┌──────────────────┬────────────────────┬──────────────────┐
│ ProcessRequest   │ isValid && hasAuth │ hasAuth (skipped)│
│ MergePolicy      │ a.Rate > b.Rate    │ Rate (never <)   │
└──────────────────┴────────────────────┴──────────────────┘

# Compare spec-level MC/DC with code-level
$ proof mcdc show SYS-REQ-047                  # spec truth table
$ proof mcdc measure ./pkg/... --experimental  # code coverage
→ close the loop until spec and code agree
Fig. 4 — Code-level MC/DC via AST instrumentation. Go (production), JavaScript/TypeScript (beta), C/C++ and Rust (coming). No source modification. Handles short-circuit evaluation correctly.

The loop works both ways: export spec-level MC/DC truth tables, compare them with code-level execution traces, and iterate until they agree. When they don't — you've found either a bug in the code or an issue in the spec.

How it works: Copies source to a temp dir, instruments the AST with tracking calls, runs go test, collects which conditions were evaluated, and finds MC/DC pairs automatically. Zero source modification.
The spec↔code loop: Spec-level MC/DC (FLIP) proves the formula is well-structured; code-level MC/DC proves the implementation covers all conditions. Together, they close the verification gap.
§8 — Paradigm

Built on published formal methods research.

The mathematics behind Proof are not new. They have been developed over seven years by NASA's FRET research program, validated across Lockheed Martin flight controllers, CERN safety systems, and eVTOL aircraft specifications. 42 peer-reviewed papers. 800+ requirements formalized in industrial case studies. Bugs found that testing could not conceive of.

What is new is making these techniques accessible. The AI research community builds better code generators. The formal methods community builds better verifiers. These two communities don't overlap — they publish at different conferences, cite different papers, and don't know each other exist. Proof sits at the convergence: LLM-assisted drafting on the input side, mathematical proof on the verification side. Same Kind2 realizability checker, Z3 SMT solver, and FLIP MC/DC prover that NASA uses — in a CLI that fits your git workflow, runs in 5 seconds, and costs nothing to start.

Proposition

AI broke the verification model. When humans wrote code, intent lived in the same brain as the implementation. When AI writes code, intent and implementation are separated — and nothing bridges the gap. Proof is the bridge.

Academic foundation: Built on published formal methods research from NASA, CERN, and the aerospace industry — the same mathematics that verify spacecraft and particle accelerator safety systems. Every verification technique in Proof is validated in peer-reviewed research.
Standards covered: NPR 7150.2D (NASA), DO-178C (aviation), ISO 26262 (automotive), IEC 62304 (medical), ISO 29148, IEEE 830, INCOSE. Full compliance matrices, not marketing claims.
Built for AI agents: Proof is a deterministic workflow engine with strict rules — exactly what AI agents need. No ambiguity: the workflow blocks or advances. AI follows the same verification chain as humans. It can draft, analyze, and iterate — but can never approve.
$ proof workflow init --reqs SYS-REQ-047
→ stage: SPEC
$ proof workflow advance
✗ blocked: 2 gaps found, 1 requirement missing rationale
  fix issues, then advance again
# ... fix the gaps ...
$ proof workflow advance
→ IMPLEMENT
$ proof workflow advance
→ VERIFY
$ proof workflow advance
✗ blocked: 1 suspect link, mcdc_coverage below threshold
# ... close the gaps ...
$ proof audit
✓ 52 passed ⚠ 1 warning ✓ 0 errors
STAGE 1 — SPEC: Write requirements, validate, check for gaps (proof validate · proof gaps · proof workflow check)
STAGE 2 — IMPLEMENT: Write code with annotations, build, test, lint (go test · proof lint · proof trace autolink)
STAGE 3 — VERIFY: Formal verification, traceability, suspect links (proof verify · proof trace coverage · proof mcdc)
STAGE 4 — DOCUMENT: Generate SRS, compliance docs, final audit (proof doc generate · proof audit --scope full)
§9 — Q.E.D.

The verification layer
that doesn't exist yet.
Until now.

No tool verifies the chain from intent to implementation. Requirements tools manage but don't verify. Verification tools prove but don't know the spec. Testing tools measure but don't trace. Developer tools ship but don't check intent. Proof is the first product in a new category — intent verification for AI-assisted development.

Your data is YAML files in git — zero vendor lock-in, even if Proof disappears. AI drafts requirements; you review and approve. The --quick mode gives 5-second feedback. Runs in your CI pipeline — GitHub Actions and GitLab CI supported. Start for free.

Meet the founder
Leonid Bugaev

Founder & CEO

Two decades building enterprise software — leading teams that shipped systems for banks, governments, and API infrastructure serving millions of requests. I've lived in four worlds that don't normally overlap: the move-fast startup world, the compliance-heavy enterprise world, the formal methods research world, and the modern developer tooling world.

Proof exists because I couldn't find a tool that bridged all four. Requirements tools don't verify. Verification tools don't fit developer workflows. Developer tools don't know formal methods exist. So I built the bridge — informed by years of formal methods research across NASA, CERN, and the aerospace industry and two decades of seeing where intent gets lost in practice.

Connect on LinkedIn →

Questions, partnerships, or enterprise inquiries: hello@reqproof.com