statistical-analysis

Skill from pluginagentmarketplace/custom-plugin-ai-data-scientist

What it does

Performs rigorous statistical analysis using Python's SciPy, enabling hypothesis testing, A/B testing, and data validation across various statistical methods.

Part of pluginagentmarketplace/custom-plugin-ai-data-scientist (12 items)

Installation

Add marketplace to Claude Code

/plugin marketplace add pluginagentmarketplace/custom-plugin-ai-data-scientist

Install plugin from marketplace

/plugin install ai-data-scientist-plugin@pluginagentmarketplace-ai-data-scientist

Clone repository

git clone https://github.com/pluginagentmarketplace/custom-plugin-ai-data-scientist.git

Add plugin in Claude Code

/plugin load .


7 installs

Added Feb 4, 2026


Skill Details

SKILL.md

Probability, distributions, hypothesis testing, and statistical inference. Use for A/B testing, experimental design, or statistical validation.

Overview

# Statistical Analysis

Apply statistical methods to understand data and validate findings.

Quick Start

```python
from scipy import stats
import numpy as np

# Descriptive statistics
data = np.array([1, 2, 3, 4, 5])
print(f"Mean: {np.mean(data)}")
# ddof=1 gives the sample standard deviation (np.std defaults to ddof=0)
print(f"Std: {np.std(data, ddof=1)}")

# Hypothesis testing (independent two-sample t-test)
group1 = [23, 25, 27, 29, 31]
group2 = [20, 22, 24, 26, 28]
t_stat, p_value = stats.ttest_ind(group1, group2)
print(f"P-value: {p_value}")
```

Core Tests

T-Test (Compare Means)

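```python
from scipy import stats
import numpy as np

# Illustrative sample data (placeholder values)
data = np.array([98, 102, 101, 97, 103])
group1 = [23, 25, 27, 29, 31]
group2 = [20, 22, 24, 26, 28]
before = [72, 75, 78, 71, 74]
after = [70, 74, 75, 70, 71]

# One-sample: compare the sample mean to a hypothesized population mean
t_stat, p_value = stats.ttest_1samp(data, 100)

# Two-sample: compare two independent groups
t_stat, p_value = stats.ttest_ind(group1, group2)

# Paired: before/after measurements on the same subjects
t_stat, p_value = stats.ttest_rel(before, after)
```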

Chi-Square (Categorical Data)

```python
import numpy as np
from scipy.stats import chi2_contingency

# Contingency table of observed counts (rows: groups, columns: outcomes)
observed = np.array([[10, 20], [15, 25]])
chi2, p_value, dof, expected = chi2_contingency(observed)
```
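The chi-square test is an approximation that becomes unreliable when expected cell counts are small. The sketch below checks the `expected` table returned by `chi2_contingency` against the common rule of thumb of at least 5 per cell and, for a 2x2 table, falls back to Fisher's exact test. The threshold of 5 is a convention, not something stated by this skill.

```python
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

observed = np.array([[10, 20], [15, 25]])

# For 2x2 tables SciPy applies Yates' continuity correction by default
chi2, p_value, dof, expected = chi2_contingency(observed)

# Rule of thumb (assumption): every expected count should be at least 5
if (expected < 5).any():
    # Fisher's exact test only supports 2x2 tables
    _, p_value = fisher_exact(observed)
    test_used = "fisher_exact"
else:
    test_used = "chi2_contingency"

print(f"{test_used}: p = {p_value:.4f}, min expected count = {expected.min():.2f}")
```

With the table above, every expected count exceeds 5, so the chi-square result is kept.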

ANOVA (Multiple Groups)

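```python
from scipy import stats

# group1 and group2 as in Quick Start; group3 is an illustrative third sample
group1 = [23, 25, 27, 29, 31]
group2 = [20, 22, 24, 26, 28]
group3 = [26, 28, 30, 32, 34]
f_stat, p_value = stats.f_oneway(group1, group2, group3)
```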

Confidence Intervals

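```python
import numpy as np
from scipy import stats

data = np.array([1, 2, 3, 4, 5])  # same sample as in Quick Start
confidence_level = 0.95
mean = np.mean(data)
se = stats.sem(data)  # standard error of the mean (uses ddof=1)

# t-based interval with n - 1 degrees of freedom
ci = stats.t.interval(confidence_level, len(data) - 1, loc=mean, scale=se)
print(f"{confidence_level:.0%} CI: [{ci[0]:.2f}, {ci[1]:.2f}]")
```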

Correlation

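```python
import numpy as np
from scipy import stats

# Illustrative paired observations (placeholder values)
x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])

# Pearson (linear relationship)
r, p_value = stats.pearsonr(x, y)

# Spearman (rank-based, monotonic relationship)
rho, p_value = stats.spearmanr(x, y)
```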

Distributions

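```python
import numpy as np
from scipy import stats

# Normal: density of the standard normal on a grid
x = np.linspace(-3, 3, 100)
pdf = stats.norm.pdf(x, loc=0, scale=1)

# Sampling: 1000 draws from N(0, 1)
samples = np.random.normal(0, 1, 1000)

# Test normality (Shapiro-Wilk); a small p-value suggests non-normal data
stat, p_value = stats.shapiro(samples)
```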

A/B Testing Framework

```python
import numpy as np
from scipy import stats

def ab_test(control, treatment, alpha=0.05):
    """
    Run an A/B test with a two-sample t-test.

    Returns: dict with 'significant' (bool), 'p_value' (float),
    and 'improvement' (relative lift of treatment over control, as a string)
    """
    t_stat, p_value = stats.ttest_ind(control, treatment)
    significant = p_value < alpha
    improvement = (np.mean(treatment) - np.mean(control)) / np.mean(control) * 100
    return {
        'significant': significant,
        'p_value': p_value,
        'improvement': f"{improvement:.2f}%"
    }
```
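A usage sketch for the `ab_test` helper above, run on synthetic data. The sample sizes, means, spread, and seed are assumptions chosen only for illustration, and the helper is repeated so the block runs on its own.

```python
import numpy as np
from scipy import stats

def ab_test(control, treatment, alpha=0.05):
    """Same logic as the helper above, repeated so this block is self-contained."""
    t_stat, p_value = stats.ttest_ind(control, treatment)
    improvement = (np.mean(treatment) - np.mean(control)) / np.mean(control) * 100
    return {
        'significant': bool(p_value < alpha),
        'p_value': float(p_value),
        'improvement': f"{improvement:.2f}%",
    }

# Synthetic metric values (hypothetical): treatment mean shifted from 100 to 103
rng = np.random.default_rng(42)
control = rng.normal(loc=100.0, scale=10.0, size=500)
treatment = rng.normal(loc=103.0, scale=10.0, size=500)

result = ab_test(control, treatment)
print(result)
```

The relative improvement is only meaningful when the control mean is not close to zero, since it is the denominator.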

Interpretation

P-value < 0.05: Reject the null hypothesis (statistically significant at the 0.05 level)

P-value >= 0.05: Fail to reject the null (not significant; this is not evidence that the null is true)

Common Pitfalls

Multiple testing without correction
Small sample sizes
Ignoring assumptions (normality, independence)
Confusing correlation with causation
p-hacking (searching for significance)
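
To make the first pitfall concrete, here is a minimal sketch of how the chance of at least one false positive grows with the number of tests. It assumes the tests are independent and each is run at the same alpha. Correlated tests behave differently.

```python
alpha = 0.05

# Family-wise error rate for m independent tests: 1 - (1 - alpha)^m
fwer_by_m = {m: 1 - (1 - alpha) ** m for m in (1, 5, 10, 20)}

for m, fwer in fwer_by_m.items():
    print(f"{m:>2} tests: P(at least one false positive) = {fwer:.3f}")
```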

Troubleshooting

Common Issues

Problem: Non-normal data for t-test

```python
from scipy import stats

# Check normality of each group first (Shapiro-Wilk)
_, p1 = stats.shapiro(group1)
_, p2 = stats.shapiro(group2)
if p1 < 0.05 or p2 < 0.05:
    # Use a non-parametric alternative instead of ttest_ind
    stat, p = stats.mannwhitneyu(group1, group2)
```

Problem: Multiple comparisons inflating false positives

```python
from statsmodels.stats.multitest import multipletests

# Apply Bonferroni correction
p_values = [0.01, 0.03, 0.04, 0.02, 0.06]
rejected, p_adjusted, _, _ = multipletests(p_values, method='bonferroni')
```

Problem: Underpowered study (sample too small)

```python
from statsmodels.stats.power import TTestIndPower

# Calculate required sample size
power_analysis = TTestIndPower()
sample_size = power_analysis.solve_power(
    effect_size=0.5,  # Medium effect (Cohen's d)
    power=0.8,        # 80% power
    alpha=0.05        # 5% significance
)
print(f"Required n per group: {sample_size:.0f}")
```

Problem: Heterogeneous variances

```python
# Check with Levene's test
stat, p = stats.levene(group1, group2)
if p < 0.05:
    # Variances differ: use Welch's t-test.
    # This requires equal_var=False; SciPy's default is equal_var=True.
    t, p = stats.ttest_ind(group1, group2, equal_var=False)
```

Problem: Outliers affecting results

```python
import numpy as np
from scipy.stats import zscore

# Detect outliers (|z| > 3); data must be a NumPy array for boolean indexing
z_scores = np.abs(zscore(data))
clean_data = data[z_scores < 3]

# Or use robust statistics
median = np.median(data)
mad = np.median(np.abs(data - median))  # Median Absolute Deviation
```
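The checklist below also mentions the IQR rule, which has no example here. This is a minimal sketch using Tukey's fences. The helper name `iqr_outlier_mask`, the multiplier `k=1.5`, and the sample values are assumptions for illustration.

```python
import numpy as np

def iqr_outlier_mask(values, k=1.5):
    """Return a boolean mask that is True where a value lies outside Tukey's fences."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)

# Hypothetical sample with one extreme value
data = np.array([10, 12, 11, 13, 12, 11, 95])
mask = iqr_outlier_mask(data)
clean = data[~mask]
print(f"Outliers: {data[mask]}, kept: {clean}")
```

In very small samples the |z| > 3 rule may flag nothing, because a single extreme value also inflates the standard deviation. The quartile-based rule is less affected by this.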

Debug Checklist

[ ] Check sample size adequacy (power analysis)
[ ] Test normality assumption (Shapiro-Wilk)
[ ] Test homogeneity of variance (Levene's)
[ ] Check for outliers (z-scores, IQR)
[ ] Apply multiple testing correction if needed
[ ] Report effect sizes, not just p-values
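
For the last checklist item, a minimal sketch of Cohen's d for two independent groups, using the pooled sample standard deviation. The helper name `cohens_d` is hypothetical, and the data are the two groups from Quick Start.

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled sample variance."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

group1 = [23, 25, 27, 29, 31]
group2 = [20, 22, 24, 26, 28]
d = cohens_d(group1, group2)
print(f"Cohen's d: {d:.2f}")
```

Reporting d alongside the p-value shows how large the difference is, not only whether it is detectable.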

More from this repository (10)

reinforcement-learning: Trains intelligent agents to learn optimal behaviors through interaction with environments using reinforcement learning techniques.
computer-vision: Processes and analyzes images using deep learning models for classification, detection, and visual understanding tasks.
machine-learning: Builds, trains, and evaluates machine learning models for classification, regression, and clustering using scikit-learn's powerful algorithms and techniques.
data-visualization: Generates interactive data visualizations and performs exploratory data analysis using Matplotlib, Seaborn, Plotly, and other visualization tools.
time-series: Performs time series analysis using ARIMA, SARIMA, Prophet, detecting trends, seasonality, and anomalies for precise temporal predictions.
python-programming: Enables efficient Python programming for data science, covering fundamentals, data manipulation, and advanced library usage with NumPy and Pandas.
data-engineering: Builds scalable data pipelines and infrastructure using Apache Spark, Airflow, and big data processing techniques for efficient ETL workflows.
deep-learning: Develops neural network models using PyTorch and TensorFlow for advanced machine learning tasks like image classification, NLP, and pattern recognition.
model-optimization: Optimizes machine learning models through techniques like quantization, pruning, hyperparameter tuning, and AutoML for improved performance and efficiency.
nlp-processing: nlp-processing skill from pluginagentmarketplace/custom-plugin-ai-data-scientist