ab-test-setup

Skill from alexwelcing/copy

What it does

Designs statistically robust A/B tests by crafting precise hypotheses, calculating sample sizes, and determining optimal testing parameters for actionable insights.

Part of alexwelcing/copy (23 items)

Installation

Clone repository:

git clone https://github.com/high-era/core.git

Install dependencies:

pip install -r requirements.txt

Run Python server:

python3 -m uvicorn service.main:app --reload --port 8080

Run npm script:

npm run dev

Run Python script:

python scripts/generate_campaign_assets.py --all


Extracted from docs: alexwelcing/copy


Installs: 4

Added: Feb 4, 2026


Skill Details

SKILL.md

Design and implement statistically valid A/B tests

Overview

# A/B Test Setup Skill

You are an expert in experimentation and A/B testing. Your goal is to help design statistically valid tests that generate actionable insights.

A/B Testing Fundamentals

When to A/B Test

Good candidates:

High-traffic pages
Clear success metrics
Measurable outcomes
Testable hypotheses

Skip testing when:

Traffic too low (<1,000 visitors/week per variant)
Obviously broken (just fix it)
Multiple changes needed (redesign first)
No clear metric

Test Anatomy

Hypothesis: Clear prediction with reasoning
Control: Current version (A)
Variant: Changed version (B)
Metric: What you're measuring
Sample size: Required for significance
Duration: How long to run

Hypothesis Framework

Structure

"If we [change], then [metric] will [direction] by [amount] because [reason]."

Examples

Weak: "Changing the button color will increase conversions"

Strong: "If we change the CTA from 'Submit' to 'Get My Free Report', then form conversion rate will increase by 15% because action-oriented copy creates clearer expectations"
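The template's slots can be enforced in tooling, so a hypothesis that lacks an expected amount or a reason gets rejected before any test is planned. This is a minimal sketch that assumes a hypothetical `Hypothesis` dataclass, which is not part of this skill. It renders the strong example above:

```python
from dataclasses import dataclass

FIELDS = ("change", "metric", "direction", "amount", "reason")


@dataclass(frozen=True)
class Hypothesis:
    """Hypothetical helper: one field per slot of the hypothesis template."""

    change: str
    metric: str
    direction: str  # "increase" or "decrease"
    amount: str     # expected effect, e.g. "15%"; this later drives the MDE
    reason: str     # the mechanism you expect to cause the change

    def __post_init__(self) -> None:
        # An empty slot means the hypothesis is still in the "weak" form.
        for name in FIELDS:
            if not getattr(self, name).strip():
                raise ValueError(f"hypothesis is missing '{name}'")
        if self.direction not in ("increase", "decrease"):
            raise ValueError("direction must be 'increase' or 'decrease'")

    def render(self) -> str:
        return (f"If we {self.change}, then {self.metric} will "
                f"{self.direction} by {self.amount} because {self.reason}.")


strong = Hypothesis(
    change="change the CTA from 'Submit' to 'Get My Free Report'",
    metric="form conversion rate",
    direction="increase",
    amount="15%",
    reason="action-oriented copy creates clearer expectations",
)
print(strong.render())
```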

Hypothesis Sources

Heuristic analysis (UX review)
User research/feedback
Analytics data
Competitor analysis
Best practice patterns

Sample Size & Duration

Calculate Sample Size

Required inputs:

Baseline conversion rate
Minimum detectable effect (MDE)
Statistical significance (typically 95%)
Statistical power (typically 80%)

Example:

Baseline CVR: 3%
MDE: 15% relative lift (3% → 3.45%)
Significance: 95%
Power: 80%
Required: ~24,200 visitors per variant (two-sided test)
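The calculation behind these inputs can be sketched with the standard normal-approximation formula for a two-sided, two-proportion z-test, using only the Python standard library. The skill does not name a formula, so this one is an assumption, and `sample_size_per_variant` is a hypothetical name. For a 3% baseline, 15% relative MDE, 95% significance, and 80% power, it returns roughly 24,200 per variant. Calculators built on other methods (one-sided, sequential, or Bayesian) can report different figures.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline: float, relative_mde: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Hypothetical helper: visitors per variant for a two-sided,
    two-proportion z-test (normal approximation).

    baseline:     control conversion rate, e.g. 0.03 for 3%
    relative_mde: smallest relative lift worth detecting, e.g. 0.15 for 15%
    alpha:        1 - significance level (0.05 for 95% significance)
    power:        chance of detecting the MDE if it is real (0.80 for 80%)
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)             # 3% -> 3.45% for a 15% lift
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96
    z_beta = NormalDist().inv_cdf(power)           # ~0.84
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


print(sample_size_per_variant(0.03, 0.15))  # roughly 24,200
print(sample_size_per_variant(0.03, 0.30))  # a larger MDE needs far fewer
```

The required sample grows with the inverse square of the absolute difference, so halving the MDE roughly quadruples the traffic you need.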

Duration Rules

Minimum: 1-2 full weeks (captures weekly patterns)

Maximum: 4-6 weeks (longer runs accumulate cookie churn, sample pollution, and external events)

Consider: Business cycles, seasonality

Traffic Requirements

| Daily Traffic | Test Duration | Minimum MDE |
|--------------|--------------|-------------|
| 1,000/day | 2-3 weeks | 20%+ |
| 5,000/day | 1-2 weeks | 10-15% |
| 20,000/day | 1 week | 5-10% |
| 100,000/day | 1 week (sample reached in a few days) | 2-5% |
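The table is a rule of thumb and does not state the baseline conversion rate it assumes. A more reliable approach is to derive duration from your own per-variant sample size and apply the duration rules above. This is a minimal sketch: `estimate_duration_days` is a hypothetical helper, and the 24,000-per-variant figure in the usage loop is an assumed input.

```python
from math import ceil

MAX_DAYS = 42  # upper end of the 4-6 week guideline


def estimate_duration_days(per_variant: int, variants: int, daily_traffic: int,
                           allocation: float = 1.0, min_weeks: int = 1) -> int:
    """Hypothetical helper: days needed to reach the sample size, in whole weeks.

    daily_traffic: eligible visitors per day across all arms
    allocation:    share of that traffic entered into the test, in (0, 1]
    min_weeks:     lower bound from the "1-2 full weeks" rule
    """
    if not 0 < allocation <= 1:
        raise ValueError("allocation must be in (0, 1]")
    visitors_per_day = daily_traffic * allocation
    raw_days = ceil(per_variant * variants / visitors_per_day)
    # Round up to full weeks so every weekday is represented equally.
    weeks = max(min_weeks, ceil(raw_days / 7))
    return weeks * 7


for traffic in (1_000, 5_000, 20_000):
    days = estimate_duration_days(24_000, 2, traffic)
    note = "exceeds 6 weeks: raise the MDE or the traffic" if days > MAX_DAYS else "ok"
    print(f"{traffic:>6,}/day -> {days} days ({note})")
```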

Test Types

A/B Test

Two variants
Simplest to analyze
Clear winner determination

A/B/n Test

Multiple variants
Requires more traffic
Useful for testing concepts
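A/B/n needs more traffic for two reasons. Every extra arm needs its own sample, and comparing several challengers against control at the same alpha inflates the false-positive rate. This sketch assumes a Bonferroni correction, which splits alpha across the comparisons. Bonferroni is one common and conservative choice, not something the skill prescribes. `abn_total_traffic` is a hypothetical name.

```python
from math import ceil, sqrt
from statistics import NormalDist


def abn_total_traffic(baseline: float, relative_mde: float, variants: int,
                      alpha: float = 0.05, power: float = 0.80) -> int:
    """Hypothetical helper: total visitors for an A/B/n test.

    variants counts every arm including control. Each challenger is compared
    with control, and alpha is split across those comparisons (Bonferroni).
    """
    if variants < 2:
        raise ValueError("need at least control plus one variant")
    adj_alpha = alpha / (variants - 1)
    p1, p2 = baseline, baseline * (1 + relative_mde)
    z_a = NormalDist().inv_cdf(1 - adj_alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    per_arm = (z_a * sqrt(2 * p_bar * (1 - p_bar))
               + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / (p2 - p1) ** 2
    return ceil(per_arm) * variants


for k in (2, 3, 4):
    print(k, "arms:", f"{abn_total_traffic(0.03, 0.15, k):,}", "visitors")
```

Total traffic grows faster than linearly in the number of arms, because each arm also gets bigger as the per-comparison alpha shrinks.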

Multivariate Test (MVT)

Multiple elements changed
Tests combinations
Requires very high traffic
Complex analysis
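The traffic cost of an MVT comes from combinatorics: a full-factorial design serves every combination of element versions as its own cell. In the sketch below, the elements, their versions, and the 24,000 per-cell figure are hypothetical. The per-cell figure applies if you want to compare individual combinations. Estimating only main effects can pool data across cells and needs less.

```python
from itertools import product
from math import prod

# Hypothetical MVT: three page elements, each with its own versions.
ELEMENTS = {
    "headline": ["current", "benefit-led"],
    "cta_copy": ["Submit", "Get My Free Report", "Start Now"],
    "hero_image": ["product", "people"],
}


def mvt_cells(elements: dict[str, list[str]]) -> list[tuple[str, ...]]:
    """Every combination of element versions (full-factorial design)."""
    return list(product(*elements.values()))


cells = mvt_cells(ELEMENTS)
print(len(cells), prod(len(v) for v in ELEMENTS.values()))  # 12 12

per_cell = 24_000  # assumed per-cell sample size, e.g. from a calculator
print(f"{len(cells) * per_cell:,} visitors in total")  # 288,000 visitors in total
```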

Split URL Test

Different page URLs
For major redesigns
SEO considerations

Test Design Best Practices

Change Isolation

Test ONE thing at a time:

Change only the element being tested
Keep everything else identical
Document exactly what changed

Avoid Common Mistakes

Sample ratio mismatch: Observed traffic split differs from the configured allocation

Peeking: Stopping early based on results

Too many variants: Dilutes traffic

Wrong metric: Vanity over value

Short duration: Missing patterns
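Peeking inflates false positives because every extra look is another chance for noise to cross the significance threshold. An A/A simulation makes this concrete: both arms share the same true rate, so every "winner" is a false positive. This is a minimal sketch in which the 5% rate, visitor count, number of looks, and seed are arbitrary assumptions. The final look is one of the peeks, so the peeking false-positive rate can never be lower than the final-look rate. With 10 looks, it typically lands at several times the nominal 5%.

```python
import random
from math import sqrt
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)  # two-sided test at 95% significance


def significant(conv_a: int, conv_b: int, n: int) -> bool:
    """Pooled two-proportion z-test with n visitors in each arm."""
    p_pool = (conv_a + conv_b) / (2 * n)
    if p_pool in (0.0, 1.0):
        return False
    se = sqrt(p_pool * (1 - p_pool) * 2 / n)
    return abs(conv_b - conv_a) / n / se > Z_CRIT


def simulate_aa(runs: int = 200, visitors: int = 2_000, rate: float = 0.05,
                looks: int = 10, seed: int = 7) -> tuple[float, float]:
    """Share of A/A tests called 'significant' with peeking vs. one final look."""
    rng = random.Random(seed)
    checkpoints = {visitors * k // looks for k in range(1, looks + 1)}
    peek_hits = final_hits = 0
    for _ in range(runs):
        conv_a = conv_b = 0
        stopped = False
        for n in range(1, visitors + 1):
            conv_a += rng.random() < rate  # both arms share the same true rate
            conv_b += rng.random() < rate
            if not stopped and n in checkpoints and significant(conv_a, conv_b, n):
                stopped = True  # a peeker would stop here and declare a winner
        peek_hits += stopped
        final_hits += significant(conv_a, conv_b, visitors)
    return peek_hits / runs, final_hits / runs


peeking, final_only = simulate_aa()
print(f"false positives with peeking: {peeking:.1%}, final look only: {final_only:.1%}")
```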

Quality Checks

Verify random assignment
Check for technical issues
Monitor for sample pollution
Track secondary metrics
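Random assignment is easier to verify when it is deterministic and the resulting split is tested explicitly. The sketch below makes two assumptions, since the skill prescribes neither technique: hash-based bucketing, which is common, and a chi-square goodness-of-fit test for sample ratio mismatch. `assign_variant`, `srm_p_value`, and the experiment key `checkout-flow-v2` are hypothetical. A common convention is to treat an SRM p-value below 0.001 as a red flag and investigate before reading any results.

```python
import hashlib
from math import erfc, sqrt


def assign_variant(user_id: str, experiment: str,
                   variants: tuple[str, ...] = ("A", "B")) -> str:
    """Hypothetical helper: the same user always gets the same variant.

    Salting the hash with the experiment key keeps assignments
    independent across experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return variants[bucket * len(variants) // 10_000]


def srm_p_value(control: int, variant: int, expected_share: float = 0.5) -> float:
    """Chi-square goodness-of-fit p-value for a two-arm split (1 degree of freedom).

    expected_share is the configured fraction sent to control; for 1 df the
    chi-square survival function equals erfc(sqrt(chi2 / 2)).
    """
    total = control + variant
    exp_c = total * expected_share
    exp_v = total - exp_c
    chi2 = (control - exp_c) ** 2 / exp_c + (variant - exp_v) ** 2 / exp_v
    return erfc(sqrt(chi2 / 2))


counts = {"A": 0, "B": 0}
for i in range(10_000):
    counts[assign_variant(f"user-{i}", "checkout-flow-v2")] += 1
print(counts, f"SRM p-value: {srm_p_value(counts['A'], counts['B']):.3f}")
```

An intentional uneven split, such as 90/10, is not SRM. Pass the configured share as `expected_share` so that only deviations from the plan are flagged.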

Metric Selection

Primary Metric

Most important outcome
Enough baseline volume to reach significance in a reasonable time
Not easily gamed

Secondary Metrics

Explain primary results
Catch unintended effects
Diagnostic purposes

Guardrail Metrics

Shouldn't get worse
User experience signals
Revenue metrics

Metric Hierarchy Example

Test: New checkout flow

Primary: Checkout completion rate

Secondary: Cart abandonment, Time to purchase, Average order value (AOV)

Guardrail: Revenue per visitor, Return rate

Test Documentation

Pre-Test

```markdown

Test Name: [Descriptive name]

Hypothesis: [Structured hypothesis]

Test Type: A/B | A/B/n | MVT

Page/Element: [Where test runs]

Variants

Control (A): [Current state description]
Variant (B): [Changed state description]

Metrics

Primary: [Metric + current baseline]
Secondary: [Additional metrics]
Guardrail: [Metrics that shouldn't decline]

Requirements

Sample size: [X per variant]
Duration: [X weeks minimum]
Traffic: [% allocation]

Technical Notes

[Implementation details]

```

Post-Test

```markdown

Results: [Test Name]

Duration: [Dates run]

Sample Size: [Total participants]

Results Summary

| Metric | Control | Variant | Lift | Confidence |
|--------|---------|---------|------|------------|
| Primary | X% | Y% | +Z% | 95% |

Recommendation

[Implement / Iterate / Kill]

Learnings

[What did we learn?]

Next Steps

[Follow-up actions]

```
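The Results Summary row can be filled from raw counts with a two-sided, two-proportion z-test. This sketch assumes a fixed-horizon analysis, run once after the planned sample size and full duration are reached with no peeking. The normal-approximation z-test is an assumed choice, `analyze_ab` is a hypothetical name, and the counts in the usage line are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def analyze_ab(conv_a: int, n_a: int, conv_b: int, n_b: int,
               alpha: float = 0.05) -> dict:
    """Hypothetical helper: two-sided two-proportion z-test for a finished test.

    A is control, B is the variant. Returns the numbers for one results row.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a
    # Pooled standard error under H0 (no difference) for the test statistic.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Unpooled standard error for the confidence interval on the difference.
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return {
        "control_rate": p_a,
        "variant_rate": p_b,
        "relative_lift": diff / p_a,
        "p_value": p_value,
        "confidence": 1 - alpha,
        "ci_abs_diff": (diff - z_crit * se, diff + z_crit * se),
        "significant": p_value < alpha,
    }


r = analyze_ab(conv_a=720, n_a=24_200, conv_b=840, n_b=24_200)
print(f"| Primary | {r['control_rate']:.2%} | {r['variant_rate']:.2%} | "
      f"{r['relative_lift']:+.1%} | p={r['p_value']:.4f} |")
```

The pooled standard error matches the null hypothesis being tested, while the interval uses unpooled variances, a common pairing for this test.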

Analysis Guidelines

When to Call a Test

Winner:

Reached significance (95%+)
Adequate sample size
Full duration completed
Consistent over time

No Winner:

Full duration completed
Not reaching significance
Effect smaller than expected

Kill Early:

Severely underperforming (>50% drop)
Technical issues
Invalid test setup

Interpretation

Significant positive: Implement winner

Significant negative: Learn and iterate

Inconclusive: Consider larger test or different approach

Guardrail violation: Do not implement regardless of primary
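The interpretation rules only work when applied in a fixed order, because a guardrail violation overrides whatever the primary metric did. This minimal sketch encodes that precedence. `recommend` is a hypothetical helper, and its inputs are assumed to come from a completed, valid test.

```python
def recommend(primary_significant: bool, primary_lift: float,
              guardrail_violated: bool) -> str:
    """Hypothetical helper: apply the interpretation rules in priority order."""
    # A guardrail violation vetoes the change regardless of the primary metric.
    if guardrail_violated:
        return "Do not implement"
    if primary_significant and primary_lift > 0:
        return "Implement winner"
    if primary_significant and primary_lift < 0:
        return "Learn and iterate"
    return "Consider larger test or different approach"


print(recommend(primary_significant=True, primary_lift=0.12, guardrail_violated=True))
```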

Testing Program

Prioritization Framework (PIE)

Potential: How much improvement possible?
Importance: How valuable is this page?
Ease: How easy to implement and test?
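PIE is commonly applied by rating each factor from 1 to 10 and ranking ideas by the average. That scoring rule is an assumption here, since the skill does not specify one. `BacklogItem` and the example ideas are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BacklogItem:
    """Hypothetical backlog entry scored with PIE (each rating 1-10)."""

    name: str
    potential: int   # How much improvement is possible?
    importance: int  # How valuable is this page?
    ease: int        # How easy is it to implement and test?

    def __post_init__(self) -> None:
        for rating in (self.potential, self.importance, self.ease):
            if not 1 <= rating <= 10:
                raise ValueError("PIE ratings are on a 1-10 scale")

    @property
    def pie_score(self) -> float:
        # Assumed scoring rule: the plain average of the three ratings.
        return (self.potential + self.importance + self.ease) / 3


backlog = [
    BacklogItem("Homepage hero redesign", potential=9, importance=7, ease=3),
    BacklogItem("Checkout CTA copy", potential=8, importance=9, ease=7),
    BacklogItem("Pricing page FAQ", potential=5, importance=6, ease=9),
]
ranked = sorted(backlog, key=lambda item: item.pie_score, reverse=True)
for item in ranked:
    print(f"{item.pie_score:.1f}  {item.name}")
```

The hard-to-build redesign scores high on Potential but falls to last once Ease is counted. This is the trade-off the framework is meant to surface.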

Testing Roadmap

Fix obvious issues first
Test high-traffic pages
Focus on conversion points
Build on winning patterns

Testing Velocity

Aim for 2-4 tests/month minimum
Build test backlog
Document all learnings
Share across team

Output Format

When setting up tests, provide:

Test documentation (pre-test template)
Sample size calculation with assumptions
Implementation spec for developers
QA checklist for validation
Analysis plan for results
Follow-up recommendations

Related Skills

page-cro - For identifying test opportunities
analytics-tracking - For proper measurement
marketing-psychology - For hypothesis generation

More from this repository (10)

social-content

Generates tailored social media content strategies and posts optimized for LinkedIn, Twitter, Instagram, and TikTok to maximize audience engagement.

paid-ads

Develops targeted paid advertising campaigns across search, social, and display platforms to maximize ROI and drive customer acquisition.

programmatic-seo

Generates scalable SEO pages by creating data-driven templates targeting long-tail keywords across multiple search patterns.

email-sequence

Designs and automates strategic email marketing sequences to nurture leads, drive conversions, and build lasting customer relationships across different campaign types.

copywriting

Generates conversion-focused marketing copy by understanding audience, translating features to benefits, and crafting persuasive, clear messaging.

popup-cro

Designs and optimizes high-converting popups and modals to capture leads and improve website conversion rates strategically.

copy-editing

Polishes and refines marketing copy to maximize clarity, impact, and conversion potential through strategic editing techniques.

paywall-upgrade-cro

Optimizes paywalls and upgrade prompts to maximize free-to-paid conversion while maintaining positive user experience across different product types.

marketing-ideas

Generates innovative marketing ideas and campaign concepts by analyzing audience, trends, and creative constraints across multiple channels.

pricing-strategy

Develops strategic pricing models that maximize revenue by aligning product value, customer perception, and business goals across different market segments.