# Prove It

Claude generates code by pattern-matching on training data. Something can look syntactically perfect, follow best practices, and still be wrong. The model optimizes for "looks right," not "works right." Verification is a separate cognitive step that must be explicitly triggered. This skill closes the loop between implementation and proof.

## Why This Matters (Technical Reality)

Claude's limitations that this skill addresses:

### 1. Generation vs Execution

I generate code but don't run it. I predict what it would do based on patterns. My confidence comes from "this looks like working code I've seen," not from "I executed this and observed the result."

### 2. Training Signal Mismatch

My training optimizes for plausible next-token prediction, not outcome verification. Saying "Done!" feels natural. Verifying feels like extra work. But verification is where correctness actually lives.

### 3. Pattern-Matching Blindness

Code that matches common patterns feels correct. But subtle bugs hide in the gaps between patterns. Off-by-one errors. Wrong variable names. Missing edge cases. These "look right" but aren't.

### 4. Confidence-Correctness Gap

High confidence in my output doesn't correlate with actual correctness. I'm often most confident when I'm most wrong, because the wrong answer pattern-matched strongly.

### 5. No Feedback Loop

I generate sequentially. I don't naturally go back and check. Without explicit verification, errors compound silently.

## When To Verify

ALWAYS verify before declaring complete:

Code Changes:

  • New functions or modules
  • Bug fixes
  • Refactoring
  • Configuration changes
  • Build/deploy scripts

Fixes:

  • "Fixed the bug" - did you reproduce and confirm it's gone?
  • "Resolved the error" - did you trigger the error path again?
  • "Updated the config" - did you restart and test?

Claims:

  • Factual statements that matter to the decision
  • "This will work because..." - did you prove it?
  • "The file contains..." - did you actually read it?

## Instructions

### Step 1: Catch The Victory Lap

Before saying any of these:

  • "Done!"
  • "That should work"
  • "I've implemented..."
  • "The fix is..."
  • "Complete"

STOP. You haven't verified yet.

### Step 2: Determine Verification Method

| Change Type | Verification |
|-------------|--------------|
| New code | Run it with test input |
| Bug fix | Reproduce original bug, confirm fixed |
| Function change | Call the function, check output |
| Config change | Restart service, test affected feature |
| Build script | Run the build |
| API endpoint | Make a request |
| UI change | Describe what user should see, or screenshot |
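For a function change, for example, a one-liner that asserts on the output fails loudly instead of relying on a glance at printed values (module and function names below are illustrative):

```bash
# Function change: call it and fail if the output is wrong
python -c "from pricing import apply_discount; assert apply_discount(100, 0.10) == 90, apply_discount(100, 0.10)"
```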

### Step 3: Actually Verify

```bash
# Don't just write the test - run it
python -m pytest tests/test_new_feature.py

# Don't just fix the code - prove the fix
python -c "from module import func; print(func(edge_case))"

# Don't just update config - verify it loads
node -e "console.log(require('./config.js'))"
```

### Step 4: Report With Evidence

```
Verified:

What I changed:
- Added input validation to user_signup()

How I verified:
- Ran: python -c "from auth import user_signup; user_signup('')"
- Expected: ValidationError
- Got: ValidationError("Email required")

Proof that it works. Done.
```

## Verification Patterns

### Pattern 1: The Smoke Test

Minimal test that proves basic functionality:

```bash
# After writing a new function
python -c "from new_module import new_func; print(new_func('test'))"
```

If this crashes, you're not done.

### Pattern 2: The Regression Check

After fixing a bug, trigger the original failure:

```bash
# Bug was: crash on empty input
python -c "from module import func; func('')"
# Should not crash anymore
```
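If the project already uses pytest, the same one-off check can be promoted to a permanent regression test so the bug stays fixed. A sketch, with a hypothetical test file name:

```bash
# Capture the regression permanently, then run it
cat > tests/test_empty_input_regression.py <<'EOF'
from module import func

def test_empty_input_does_not_crash():
    # Original bug: func('') raised an exception
    func('')
EOF
python -m pytest tests/test_empty_input_regression.py
```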

### Pattern 3: The Build Gate

Before claiming code is complete:

```bash
# Does it at least compile/parse?
python -m py_compile new_file.py
npm run build
cargo check
```

### Pattern 4: The Integration Smell Test

After changes that affect multiple components:

```bash
# Start the service
npm run dev &

# Hit the affected endpoint
curl http://localhost:3000/affected-route

# Check for expected response
```
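If eyeballing the response body isn't enough, curl can report the HTTP status code so the check fails loudly; a sketch assuming the service listens on port 3000:

```bash
# Assert on the status code instead of reading the body by eye
status=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:3000/affected-route)
[ "$status" = "200" ] && echo "OK" || echo "FAIL: got HTTP $status"
```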

## The Verification Checklist

Before declaring done:

```
[ ] Code runs without syntax errors
[ ] Function produces expected output on happy path
[ ] Edge case handled (empty, null, boundary)
[ ] Original bug/issue no longer reproduces
[ ] No new errors introduced (check logs)
[ ] Build passes (if applicable)
```
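Where the tooling supports it, the checklist can be run as a single gate that stops at the first failure; a minimal sketch assuming the same Python and npm tooling used in the examples above:

```bash
# One-shot gate: any failing step aborts the "done" claim
set -e
python -m py_compile new_file.py    # syntax
python -m pytest tests/ -q          # happy path and edge cases
npm run build                       # build passes (if applicable)
echo "All checks passed"
```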

## NEVER

  • Say "done" without running the code
  • Assume it works because it looks right
  • Skip verification because you're confident
  • Claim a bug is fixed without reproducing it first
  • Trust that generated code is correct
  • Let pattern-matching substitute for proof

## ALWAYS

  • Run new code at least once
  • Test the specific thing that was broken
  • Show verification output in your response
  • Admit when you can't verify (and say why)
  • Treat verification as required, not optional

## Examples

### Example 1: Bad (No Verification)

```
User: "Fix the divide by zero error"

edits code to add check

Done! Added a check for zero before dividing. The function now returns
None if the divisor is zero.
```

This is a lie. I didn't verify anything.

### Example 1: Good (With Verification)

```
User: "Fix the divide by zero error"

edits code to add check

Let me verify the fix:

$ python -c "from math_utils import safe_divide; print(safe_divide(10, 0))"
None

$ python -c "from math_utils import safe_divide; print(safe_divide(10, 2))"
5.0

Verified:
  • Division by zero returns None (was: crashed)
  • Normal division still works
  • Fix confirmed.
```

### Example 2: Admitting Inability To Verify

```
User: "Update the production config"

I've updated config/production.json with the new API endpoint.

Verification status: UNABLE TO VERIFY
  • This is production config
  • I can't safely test against production
  • Recommend: Test in staging first, or have ops verify after deploy

Change is made but unverified. Proceed with caution.
```
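Even when end-to-end verification is off-limits, a partial check is usually possible. A minimal sketch, assuming the config is JSON: confirm the edited file at least parses, without touching production systems.

```bash
# Safe partial verification: the edited file is valid JSON (nothing deployed or queried)
python -m json.tool config/production.json > /dev/null && echo "config parses"
```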

### Example 3: Build Verification

```
User: "Add TypeScript types to the utils"

adds types

Verifying:

$ npm run typecheck
src/utils.ts - 0 errors

$ npm run build
Build successful - dist/utils.js generated

Verified: Types compile without errors, build passes.
```

## What DOESN'T Work

  • "I'll verify later": Later never comes. Verify immediately or it won't happen.
  • "It looks right": Pattern-matching is not verification. Code that looks perfect can be completely wrong.
  • "The tests will catch it": Tests you haven't run don't catch anything. Run them.
  • "It's a small change": Small changes cause big breaks. One-character typos crash production.
  • Verifying only the happy path: The bug is in the edge case you didn't test.
  • Reading the code as verification: Reading is not running. Execute it.
  • "It worked before": Before is not now. Dependencies change. Environments differ.
  • Trusting the type checker: Code that type-checks can still be logically wrong (see the sketch below).
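A minimal sketch of that last point (hypothetical code, checked with a tool such as mypy): the annotations are satisfied, yet the function computes the wrong answer, and only execution exposes it.

```bash
# Fully annotated and acceptable to a type checker, yet logically wrong - run it to see
python -c "
def largest(a: int, b: int) -> int:
    return a if a < b else b   # bug: returns the smaller value
print(largest(2, 5))           # prints 2, not 5
"
```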

## The Hard Truth

When I say "Done!" without verifying, I'm not reporting a fact. I'm making a

prediction based on pattern-matching. Sometimes that prediction is wrong.

Verification converts prediction into observation. It's the difference between

"this should work" and "this works."

One is a guess. One is proof.

Prove it.
