Proof CLI · hello@reqproof.com

Requirements. Code. Tests. Docs.
When did they last agree?

Proof is the full-chain verification engine — requirements authoring, formal verification, bidirectional traceability, code-level MC/DC, and compliance documentation. AI-native: works with Claude Code, Codex, and human workflows equally. No rewrites needed — start with your most critical path on your existing codebase and expand.

AI-native · MC/DC · FLIP · FRET / FRETish · Linear Temporal Logic · DO-178C · ISO 26262 · NPR 7150 · IEC 62304 · Formal Verification · Traceability · Kind2 / Z3 · Gap Analysis
Can your team honestly answer these?
  • Do you understand exactly how your software works — all the small nuances?
  • How much of it is actually tested — not test coverage numbers, really tested?
  • Does your code align with the spec? Does your documentation?
  • Can you see the blast radius of every change before you ship it?
  • Can you prove your requirements are complete and don't contradict each other?
  • Is there a single place where you can see how it all works together?
§1 — Axiom

You're doing everything right. It's still not enough.

Your team has Jira, Kanban boards, refinement sessions. You discuss tasks, try to find edge cases, define test scenarios in advance. You add test coverage. You have release management — canary deploys, staging environments, the works. You're better than most of your peers.

And yet. You still have bugs. You still can't say exactly how everything works together — it's a collection of tickets, specs, and tribal knowledge. You can't confidently answer which exact test scenarios are covered and which aren't. Your documentation isn't accurate and up to date in every place at every moment. The honest answer to all these questions is no.

And it only gets worse. You have Confluence, Jira, GitHub, Slack — but there is no single place where you can see how it all works together. The more components you add, the more micro-interactions multiply. The complexity doesn't scale linearly — it compounds. That's not a failure of your team. It's a failure of the tools.

Axiom 1

Best practices aren't enough. Jira doesn't verify requirements. Test coverage doesn't prove correctness. CI/CD doesn't trace intent. The tools that exist today either manage requirements or verify code. Nothing covers the full chain. That gap is where intent gets lost.

The questions: Can you explain exactly how your software works? All the nuances? How much is really tested? Does the code match the spec? Does the documentation? Can you see the blast radius of every change? Can you prove your requirements don't contradict each other?

§2 — Concepts

If these concepts are new to you, that's the point.

Aerospace and automotive engineers have used these techniques for decades. Most software engineers have never heard of them. That's exactly the gap Proof fills — bringing these proven methods to ordinary teams.

  1. Requirements engineering: Writing down what your software must do before writing how. Not user stories — precise, verifiable behavioral specifications that can be traced to code and tests.
  2. Formal verification: Using mathematical solvers to prove properties about your spec. "Can these requirements all be satisfied simultaneously?" "Do any contradict each other?" "Are any conditions impossible to reach?" Answers in seconds, not weeks of review.
  3. MC/DC coverage: Modified Condition/Decision Coverage — proving that every boolean condition in your code independently affects the outcome. Goes far beyond line or branch coverage. Required by aviation (DO-178C) for the most critical software.
  4. Bidirectional traceability: Every requirement links to the code that implements it, the tests that verify it, and the docs that describe it. Change one — instantly see what else needs updating. The "blast radius" of every change, visible at all times.
  5. Gap analysis: Automatically finding what's missing from your specification. Unconstrained outputs, missing boundary checks, variables without assumptions. This is how Proof found a crash bug in the Tyk API Gateway that all tests missed.
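To make the MC/DC idea above concrete, here is a minimal, self-contained Go sketch (illustrative only, not Proof's implementation). For the example decision (a && b) || c, each condition needs an "independence pair": two test vectors that differ only in that condition yet flip the decision outcome.

```go
package main

import "fmt"

// decision is the illustrative example: (a && b) || c,
// with conditions stored as v[0]=a, v[1]=b, v[2]=c.
func decision(v [3]bool) bool { return (v[0] && v[1]) || v[2] }

// independencePair searches all 8 truth-table rows for a pair of
// vectors that differ only in condition ci and produce different
// outcomes. That pair is exactly what MC/DC demands per condition.
func independencePair(ci int) ([3]bool, [3]bool, bool) {
	for i := 0; i < 8; i++ {
		v := [3]bool{i&4 != 0, i&2 != 0, i&1 != 0}
		w := v
		w[ci] = !w[ci] // toggle exactly one condition
		if decision(v) != decision(w) {
			return v, w, true
		}
	}
	return [3]bool{}, [3]bool{}, false
}

func main() {
	names := []string{"a", "b", "c"}
	for ci, name := range names {
		v, w, ok := independencePair(ci)
		fmt.Printf("condition %s: pair found=%v (%v vs %v)\n", name, ok, v, w)
	}
}
```

For this decision all three conditions have such pairs, so full MC/DC is achievable; a condition with no pair would mean it can never independently affect the outcome, which is itself a spec smell.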
Why this is urgent

Unworkable: AI generates code faster than teams can verify intent.

Unavoidable: Regulations are expanding — EU AI Act, FDA, ISO 21434.

Urgent: The gap compounds with every AI-generated commit.

Underserved: No tool on the market covers the full chain.

Why now: These techniques have existed for decades but required $250K+ toolchains and specialized training. Proof makes them accessible: structured English in, mathematical proof out. AI assists with the writing.
Not just for planes: You don't need to be building flight software. If you care whether your code does what you intended — especially when AI is generating it — these tools apply to you.

§3 — The Chain

One tool covers the full development chain. Nothing else does.

Proof is not a requirements database. It is not a test runner. It is not a coverage tool. It is all of these fused into a single verification chain where every link is traceable, verifiable, and auditable.

  1. Write requirements: Full authoring tools for humans and AI. Multi-level: stakeholder use cases → system components → software specifications. Quality checks, gap detection, and conflict resolution built in.
  2. Verify they hold: Formal verification: are these requirements realizable? Consistent? Vacuous? Where are gaps? Proof uses Kind2 realizability, Z3 SMT proofs, and 7-category gap analysis to answer mathematically.
  3. Define your data: Property-based specifications: describe the shape of your data, merge strategies, boundary conditions. Proof generates test fixtures automatically and proves properties hold for ALL inputs, not samples.
  4. Trace every change: Bidirectional traceability from requirements → code → tests → documentation. See the blast radius of every change. Suspect links flag when artifacts drift. Nothing is missed.
  5. Prove the code matches the spec: MC/DC coverage at both specification and code level. Compare spec-level truth tables with code-level execution traces. Close the loop until they agree — or find issues in the spec itself.
  6. Ship with evidence: Generate NPR 7150, DO-178C, ISO 26262 compliant documents. Every claim backed by traceable evidence. 52 quality checks across 4 stages.
No competition: Other tools manage requirements (DOORS) OR verify code (LDRA). Nothing covers the full chain. Proof is the only tool that verifies requirements AND traces them to code AND proves coverage.
Self-verified: Proof verifies itself. 758+ requirements across 4 spec levels, all traced, all verified by its own toolchain.

§4 — Write

AI drafts. You review. The tool verifies.

You don't start from scratch. Describe what your software should do in plain English — Proof's AI pipeline translates it into formal, verifiable requirements automatically. Our NL-to-FRETish translation outperforms published academic frameworks. You review and approve. The compiler validates in real-time — you can't produce an invalid specification.
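For a sense of what the formal target looks like, a requirement in this style might resemble the following FRETish fragment (structured English from NASA's FRET tool; the component and variable names here are invented for illustration):

```text
In maintenance mode, the session_manager shall always satisfy ! accept_new_sessions.
Upon shutdown_request, the session_manager shall within 30 seconds satisfy active_sessions = 0.
```

Each line carries a scope or trigger, a component, and a verifiable response, which is what makes the statement compilable to temporal logic rather than a free-form user story.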

No rewrite needed. Proof layers on top of your existing codebase. Pick your most critical component, make it bulletproof, then expand. In one case, applying Proof to a production API gateway's policy engine — without modifying a single line of existing code — found a crash bug that all tests had missed, plus seven undocumented design decisions. Two hours of spec work.

Figure 1 — Multi-level requirements chain
# Three levels of requirements, each linked to the next
$ proof req add --level stakeholder   # use cases / user needs
$ proof req add --level system        # component-level behavior
$ proof req add --level software     # implementation-level detail

# Validate structure, check for gaps
$ proof validate
✓ 758 requirements valid
$ proof gaps --check all
⚠ 3 outputs unconstrained · gap score: 0.91
  idle_state:      no governing requirement for this output
  error_code:      missing boundary requirement
  session_timeout: no assumption on input range

# AI-assisted: derive system requirements from stakeholder needs
$ proof req derive STK-REQ-001 --llm
→ 4 system requirements generated · status: draft · review: pending
Fig. 1 — Requirements flow through three levels. Gap analysis finds where specs are incomplete — unconstrained outputs, missing boundaries, absent assumptions. AI can draft requirements, but a human must always review and approve.
No rewrite You don't rewrite your software. You don't stop using your existing tools. Pick one critical path, spec it, verify it, trace it. Then expand. Proof lives alongside everything you already have.
AI workflow MCP server with 24 tools. Hooks for Claude Code and Codex. A dedicated AI agent handles the requirements authoring flow — with optimized prompts, compiler-in-the-loop validation, and automatic variable grounding.

§5 — Verify

Even when all tests pass — how do you know the implementation is correct?

Tests prove what they test. They don't prove what they miss. Proof runs formal verification to catch what testing cannot: conflicting requirements, unrealizable specifications, vacuous conditions, and missing constraints.
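What "unrealizable" means can be shown with a toy example. Proof delegates this question to Kind2 and Z3, which work symbolically; the sketch below instead brute-forces the same question ("for every input, does some output satisfy all requirements?") over two hypothetical requirements that quietly conflict.

```go
package main

import "fmt"

// A requirement is modeled as a predicate over one boolean input
// (tempHigh) and one boolean output (alarm). Names are illustrative.
type req struct {
	id string
	ok func(tempHigh, alarm bool) bool
}

// realizable checks the ∀ input ∃ output pattern by brute force:
// for each input value, some output value must satisfy every requirement.
func realizable(reqs []req) (bool, string) {
	for _, tempHigh := range []bool{false, true} {
		found := false
		for _, alarm := range []bool{false, true} {
			all := true
			for _, r := range reqs {
				if !r.ok(tempHigh, alarm) {
					all = false
					break
				}
			}
			if all {
				found = true
				break
			}
		}
		if !found {
			return false, fmt.Sprintf("no valid output when tempHigh=%v", tempHigh)
		}
	}
	return true, ""
}

func main() {
	reqs := []req{
		{"REQ-1: high temperature shall raise the alarm", func(t, a bool) bool { return !t || a }},
		{"REQ-2: the alarm shall never sound", func(t, a bool) bool { return !a }},
	}
	ok, why := realizable(reqs)
	fmt.Println(ok, why) // the pair is unrealizable once tempHigh can be true
}
```

Each requirement is individually satisfiable, and no test of either one in isolation would flag a problem; only the whole-spec check exposes the conflict.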

Figure 2 — The verification chain
SPEC (validate) → DATA (Z3 SMT) → MC/DC (FLIP) → TEST (fixtures) → CODE (MC/DC) → TRACE (bidir) → AUDIT (evidence) → DOCS (NPR 7150)
Fig. 2 — The verification chain. Each node represents an independently verifiable layer. Spec-level and code-level MC/DC close the loop between what was specified and what was built.
Theorem

When all layers produce clean evidence, the system can demonstrate — not just claim — that what was specified is what was built, tested, traced, and shipped.

Property testing: Define data shapes and merge strategies in your requirements. Z3 proves they hold for ALL inputs — not samples. Then auto-generates boundary test fixtures humans would miss.
Real bugs found: Applied to the Tyk API Gateway, Proof found one crash bug (nil pointer when the policy store is unavailable) via gap analysis. Spec-from-code found nothing. Spec-from-intent found it.
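The property-testing idea can be sketched in a few lines of Go. MergePolicy and its keep-the-higher-rate rule are hypothetical stand-ins (echoing the a.Rate > b.Rate condition shown later in Figure 4); the point is that the stated properties are checked against every pair of boundary fixtures rather than a hand-picked sample.

```go
package main

import "fmt"

// Policy is a hypothetical data shape with a single rate-limit field.
type Policy struct{ Rate int }

// MergePolicy is an illustrative merge strategy: the combined policy
// keeps whichever rate limit is higher.
func MergePolicy(a, b Policy) Policy {
	if a.Rate > b.Rate {
		return a
	}
	return b
}

func main() {
	// Boundary-heavy fixture values of the kind a generator would emit:
	// negatives, zero, off-by-one neighbors, and a large value.
	fixtures := []int{-1, 0, 1, 99, 100, 1 << 30}
	for _, x := range fixtures {
		for _, y := range fixtures {
			m := MergePolicy(Policy{x}, Policy{y})
			// Property 1: the merged rate never drops below either input.
			if m.Rate < x || m.Rate < y {
				panic("merged rate below an input")
			}
			// Property 2: the merge never invents a value.
			if m.Rate != x && m.Rate != y {
				panic("merged rate is neither input")
			}
		}
	}
	fmt.Println("all properties hold on", len(fixtures)*len(fixtures), "boundary pairs")
}
```

An SMT solver proves such properties for every possible input at once; the fixture loop above is the executable shadow of that proof, useful for regression-testing the implementation.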

§6 — Trace

See the blast radius of every change.

Every requirement links to the code that implements it, the tests that verify it, and the documentation that describes it. Change a requirement — and Proof shows you exactly what downstream artifacts need updating. Change code — and Proof flags which requirements may now be suspect.

Figure 3 — Traceability and blast radius
# What does this requirement touch?
$ proof trace impact SYS-REQ-047
  implements: pkg/session/apply.go:Apply()
  verifies:   pkg/session/apply_test.go:TestApply()
  satisfies:  STK-REQ-012 (parent stakeholder requirement)
  docs:       generated-srs.html §4.3

# After a code change — what drifted?
$ proof trace suspect --verbose
⚠ 2 suspect links
  SYS-REQ-047 → apply_test.go (requirement updated, test unchanged)
  SYS-REQ-089 → handler.go    (code updated, requirement unchanged)

# Full coverage view
$ proof trace coverage
✓ 96.2% requirements traced
  implemented: 740/758
  verified:    728/758
  documented:  758/758
Fig. 3 — Traceability is not a report you generate at the end. It is a living chain that flags when any link goes stale.
Why this matters: You don't miss anything. When a requirement changes, Proof shows exactly which tests, which code, which docs need updating. When code changes, it flags which requirements may need review.
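One way to implement suspect-link detection (illustrative only, not necessarily Proof's mechanism) is to store a content hash for each artifact at the moment its link was last reviewed, then flag any link whose artifact hashes differently now:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// link ties a requirement to an artifact and remembers the artifact's
// content hash at the time the link was reviewed and approved.
type link struct {
	reqID, artifact, reviewedHash string
}

func hash(content string) string {
	h := sha256.Sum256([]byte(content))
	return hex.EncodeToString(h[:])
}

// suspects flags every link whose artifact has drifted since review:
// the blast radius of a change, computed rather than remembered.
func suspects(links []link, current map[string]string) []string {
	var out []string
	for _, l := range links {
		if hash(current[l.artifact]) != l.reviewedHash {
			out = append(out, l.reqID+" → "+l.artifact)
		}
	}
	return out
}

func main() {
	code := map[string]string{
		"apply.go":      "func Apply() {}",
		"apply_test.go": "func TestApply() {}",
	}
	links := []link{
		{"SYS-REQ-047", "apply.go", hash(code["apply.go"])},
		{"SYS-REQ-047", "apply_test.go", hash(code["apply_test.go"])},
	}
	code["apply.go"] = "func Apply() { /* changed */ }" // the code drifts
	fmt.Println(suspects(links, code))                  // prints [SYS-REQ-047 → apply.go]
}
```

The same comparison run in the other direction (requirement text hashed against the version the test was written for) yields the "requirement updated, test unchanged" case shown in Figure 3.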

§7 — MC/DC

Code-level MC/DC for Go. Nobody else has this.

MC/DC (Modified Condition/Decision Coverage) proves that every boolean condition in your code independently affects the decision outcome. It has historically required $50K+/year tools (LDRA, VectorCAST) — and none of them support Go. Proof does.

Figure 4 — Go code-level MC/DC measurement
# Measure MC/DC coverage for a Go package
$ proof mcdc measure ./pkg/handler/... --experimental

Code MC/DC Coverage Report
  Engine:     go
  Pattern:    ./pkg/handler/...
  Statements: 87.5%
  Decisions:  42 found, 38 fully covered
  Conditions: 94 total, 88 covered (93.6%)

Hotspots:
┌────────────────┬────────────────────┬───────────────────┐
│ ProcessRequest │ isValid && hasAuth │ hasAuth (skipped) │
│ MergePolicy    │ a.Rate > b.Rate    │ Rate (never <)    │
└────────────────┴────────────────────┴───────────────────┘

# Compare spec-level MC/DC with code-level
$ proof mcdc show SYS-REQ-047                    # spec truth table
$ proof mcdc measure ./pkg/... --experimental    # code coverage
→ close the loop until spec and code agree
Fig. 4 — Code-level MC/DC measurement via Go AST instrumentation. No source modification. Works with standard go test. Handles short-circuit evaluation correctly.

The loop works both ways: export spec-level MC/DC truth tables, compare them with code-level execution traces, and iterate until they agree. When they don't — you've found either a bug in the code or an issue in the spec.

How it works: Proof copies the source to a temp dir, instruments the AST with tracking calls, runs go test, collects which conditions were evaluated, and finds MC/DC pairs automatically. Zero source modification.
The spec↔code loop: The key insight is that spec-level MC/DC (FLIP) proves the formula is well-structured, while code-level MC/DC proves the implementation covers all conditions. Together, they close the verification gap.
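The probe-style instrumentation described above can be sketched as follows. The probe shape and names are hypothetical, not Proof's generated code; the point is that wrapping each condition in a recording call preserves Go's short-circuit evaluation, so a skipped condition records nothing, exactly the "hasAuth (skipped)" hotspot from Figure 4.

```go
package main

import "fmt"

// evaluated records, per condition id, every value it was evaluated with.
var evaluated = map[string][]bool{}

// probe logs the evaluation and returns the value unchanged; this is
// the shape an AST instrumenter rewrites each condition into.
func probe(id string, v bool) bool {
	evaluated[id] = append(evaluated[id], v)
	return v
}

// processRequest stands in for an instrumented decision:
//   original:     return isValid && hasAuth
//   instrumented: return probe("isValid", isValid) && probe("hasAuth", hasAuth)
func processRequest(isValid, hasAuth bool) bool {
	return probe("isValid", isValid) && probe("hasAuth", hasAuth)
}

func main() {
	processRequest(false, true) // && short-circuits: hasAuth is never evaluated
	fmt.Println(len(evaluated["isValid"]), len(evaluated["hasAuth"])) // prints: 1 0
}
```

From these per-condition logs, the analyzer can reconstruct which truth-table rows were exercised and search them for the independence pairs MC/DC requires, all without touching the original source tree.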

§8 — Paradigm

NASA-grade engineering, affordable to everyone.

This is not an incremental improvement. It is a different paradigm in software development — one that merges the rigor of aerospace and automotive engineering with the agility of modern development workflows.

Proposition

The regulatory world built these processes because lives depend on software. The modern software world needs them because AI is writing the code and nobody is verifying the intent.

Figure 5 — The four-stage workflow
$ proof workflow init --reqs SYS-REQ-720
→ stage: SPEC

# Stage 1: Write and validate requirements
$ proof validate && proof gaps --check all
$ proof workflow advance
→ IMPLEMENT

# Stage 2: Write code with annotations, run tests
$ go test ./... && proof lint
$ proof workflow advance
→ VERIFY

# Stage 3: Formal verification + traceability
$ proof verify && proof trace coverage
$ proof workflow advance
→ DOCUMENT

# Stage 4: Generate compliance documents
$ proof doc generate npr7150-srs --format html
$ proof audit --scope full
✓ Errors: 0  Warnings: 0 · 52 checks passed
Fig. 5 — Four stages with quality gates. Each stage blocks advancement until its checks pass. Not bureaucracy — the feedback loop that catches real bugs. Takes 5 seconds with --quick.
Standards covered: NPR 7150.2D (NASA), DO-178C (aviation), ISO 26262 (automotive), IEC 62304 (medical), ISO 29148, IEEE 830, INCOSE. Full compliance matrices, not marketing claims.
Built for AI agents: Proof is a deterministic workflow engine with strict rules — exactly what AI agents need. No ambiguity: the workflow blocks or advances. The MCP server exposes 24 tools so agents follow the same verification chain as humans. AI can draft, analyze, and iterate — but can never approve.

§9 — Q.E.D.

Replace a $250K/year toolchain
with a single CLI.

Your data is YAML files in git — zero vendor lock-in, even if Proof disappears. AI drafts requirements, you review and approve. The --quick mode gives 5-second feedback. Start for free. Ship with NASA-grade evidence.

Meet the founder
Leonid Bugaev

Founder & CEO

Two decades of shipping enterprise software — banks, governments, API infrastructure, large-scale distributed systems. I've lived on both sides: the move-fast startup world and the compliance-heavy enterprise world. I know how engineering actually works in organizations — the refinement sessions that miss edge cases, the documentation that drifts, the test coverage numbers that don't tell the real story.

Proof exists because I couldn't find a tool that combined both worlds. The agility of modern development and the discipline of aerospace-grade verification. So I built one.

Connect on LinkedIn →