Experiment Designer

Name: Experiment Designer
Author: Alireza Rezvani

Use when planning product experiments, writing testable hypotheses, estimating sample size, prioritizing tests, or interpreting A/B outcomes with practical statistical rigor.

$ npx promptcreek add experiment-designer

Auto-detects your installed agents and installs the skill to each one.

What This Skill Does

The Experiment Designer skill helps product teams design, prioritize, and evaluate product experiments. It guides users through hypothesis creation, metric definition, sample size estimation, ICE scoring, and results interpretation to make data-driven product decisions. This skill is ideal for product managers, data scientists, and engineers involved in A/B testing and experimentation.

When to Use

Planning A/B and multivariate experiments.
Writing clear hypotheses with measurable outcomes.
Estimating the sample size needed to detect a target effect with adequate power.
Prioritizing experiments using ICE scoring.
Interpreting statistical output for product decisions.
Defining primary, guardrail, and secondary metrics.

Key Features

Generates hypotheses in If/Then/Because format.

Calculates sample size based on baseline and MDE.

Prioritizes experiments using the ICE scoring model.

Provides a hypothesis quality checklist.

Identifies common experiment pitfalls.

Defines key metrics for experiment success.

Installation

Run in your project directory:

$ npx promptcreek add experiment-designer

Auto-detects your installed agents (Claude Code, Cursor, Codex, etc.) and installs the skill to each one.

View Full Skill Content

Experiment Designer

Design, prioritize, and evaluate product experiments with clear hypotheses and defensible decisions.

When To Use

Use this skill for:

A/B and multivariate experiment planning
Hypothesis writing and success criteria definition
Sample size and minimum detectable effect planning
Experiment prioritization with ICE scoring
Reading statistical output for product decisions

Core Workflow

Write hypothesis in If/Then/Because format
If we change [intervention]
Then [metric] will change by [expected direction/magnitude]
Because [behavioral mechanism]

Define metrics before running test
Primary metric: single decision metric
Guardrail metrics: quality/risk protection
Secondary metrics: diagnostics only

Estimate sample size
Baseline conversion or baseline mean
Minimum detectable effect (MDE)
Significance level (alpha) and power

Use:

python3 scripts/sample_size_calculator.py --baseline-rate 0.12 --mde 0.02 --mde-type absolute
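The internals of the script are not shown here, so the following is only a minimal sketch of one common approach: the normal-approximation formula for a two-sided test comparing two proportions with equal traffic per variant. The function name `sample_size_per_variant` and its defaults (alpha 0.05, power 0.8) are assumptions for illustration, and the script may compute its result differently.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, mde_abs, alpha=0.05, power=0.8):
    """Approximate users needed per variant (two-sided two-proportion z-test)."""
    p1 = baseline_rate
    p2 = baseline_rate + mde_abs
    if mde_abs == 0 or not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("need a non-zero MDE and rates strictly between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


# Same inputs as the command above: 12% baseline, 2-point absolute MDE.
n = sample_size_per_variant(0.12, 0.02)
print(f"per variant: {n}, total: {2 * n}")
```

Halving the MDE roughly quadruples the required sample, which is why the MDE should be fixed before launch rather than tuned to fit available traffic.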

Prioritize experiments with ICE
Impact: potential upside
Confidence: evidence quality
Ease: cost/speed/complexity

ICE Score = (Impact × Confidence × Ease) / 10
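As a rough sketch of how the score can rank a backlog, the example below assumes the three factors are multiplied and that each is rated on a 1 to 10 scale (the scale is an assumption, not stated in the skill). The `ExperimentIdea` class and the example ideas are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ExperimentIdea:
    name: str
    impact: float      # potential upside (assumed 1-10 scale)
    confidence: float  # evidence quality (assumed 1-10 scale)
    ease: float        # cost/speed/complexity (assumed 1-10 scale)

    @property
    def ice_score(self) -> float:
        return (self.impact * self.confidence * self.ease) / 10


def rank_by_ice(ideas):
    """Return ideas sorted from highest to lowest ICE score."""
    return sorted(ideas, key=lambda idea: idea.ice_score, reverse=True)


# Hypothetical backlog used only to illustrate the ranking.
backlog = [
    ExperimentIdea("Shorter signup form", impact=7, confidence=6, ease=8),
    ExperimentIdea("New pricing page", impact=9, confidence=4, ease=3),
    ExperimentIdea("Checkout button copy", impact=3, confidence=7, ease=10),
]
for idea in rank_by_ice(backlog):
    print(f"{idea.ice_score:6.1f}  {idea.name}")
```

Because the factors multiply, one low rating pulls the whole score down: the high-impact pricing idea ranks last here because its confidence and ease are both weak.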

Launch with stopping rules
Decide fixed sample size or fixed duration in advance
Avoid repeated peeking without proper method
Monitor guardrails continuously
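To illustrate why unplanned peeking is risky, here is a small A/A simulation sketch: both arms share the same true conversion rate, so every "significant" result is a false positive. All parameters (300 runs, 2,000 users per arm, 10 looks, 10% rate, the seed) are arbitrary assumptions, and the code only demonstrates the problem; it does not implement a sequential testing method.

```python
import random
from math import sqrt
from statistics import NormalDist


def z_significant(conv_a, conv_b, n, alpha=0.05):
    """Two-sided pooled two-proportion z-test with n users in each arm."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = sqrt(2 * pooled * (1 - pooled) / n)
    if se == 0:
        return False
    z = (conv_b / n - conv_a / n) / se
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)


def simulate_aa_tests(runs=300, n_per_arm=2000, looks=10, rate=0.1, seed=7):
    """Return (false-positive rate at final look only, rate if any look may stop the test)."""
    rng = random.Random(seed)
    step = n_per_arm // looks
    fixed_hits = peeking_hits = 0
    for _ in range(runs):
        conv_a = conv_b = 0
        flagged_at_any_look = False
        flagged_now = False
        for look in range(1, looks + 1):
            conv_a += sum(rng.random() < rate for _ in range(step))
            conv_b += sum(rng.random() < rate for _ in range(step))
            flagged_now = z_significant(conv_a, conv_b, look * step)
            flagged_at_any_look = flagged_at_any_look or flagged_now
        fixed_hits += flagged_now          # decision made once, at the planned end
        peeking_hits += flagged_at_any_look  # decision made at the first "win"
    return fixed_hits / runs, peeking_hits / runs


fixed_rate, peeking_rate = simulate_aa_tests()
print(f"final look only: {fixed_rate:.1%}, stop at any look: {peeking_rate:.1%}")
```

The final look is one of the ten looks, so the "stop at any look" rate can never be lower than the fixed-sample rate, and in practice it is usually well above the nominal alpha.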

Interpret results
Statistical significance is not business significance
Compare point estimate + confidence interval to decision threshold
Investigate novelty effects and segment heterogeneity
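A minimal sketch of the point-estimate-plus-interval comparison, assuming a conversion-rate metric and a simple Wald (normal approximation) interval. The counts, the function name `diff_with_ci`, and the 1-point decision threshold are hypothetical values for illustration, not results from a real test.

```python
from math import sqrt
from statistics import NormalDist


def diff_with_ci(conv_control, n_control, conv_variant, n_variant, confidence=0.95):
    """Absolute lift (variant minus control) with a Wald confidence interval."""
    p_c = conv_control / n_control
    p_v = conv_variant / n_variant
    se = sqrt(p_c * (1 - p_c) / n_control + p_v * (1 - p_v) / n_variant)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    lift = p_v - p_c
    return lift, lift - z * se, lift + z * se


# Hypothetical counts: 12.0% control vs 13.3% variant, 10,000 users each.
lift, low, high = diff_with_ci(1200, 10_000, 1330, 10_000)
decision_threshold = 0.01  # assumed smallest absolute lift worth shipping

print(f"lift={lift:.4f}, 95% CI=({low:.4f}, {high:.4f})")
if low >= decision_threshold:
    print("whole interval clears the decision threshold")
else:
    print("interval does not clearly clear the decision threshold")
```

In this example the interval excludes zero, so the result is statistically significant, yet its lower bound sits below the threshold, so the business case is still uncertain.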

Hypothesis Quality Checklist

[ ] Contains explicit intervention and audience
[ ] Specifies measurable metric change
[ ] States plausible causal reason
[ ] Includes expected minimum effect
[ ] Defines failure condition

Common Experiment Pitfalls

Underpowered tests leading to false negatives
Running too many simultaneous changes without isolation
Changing targeting or implementation mid-test
Stopping early on random spikes
Ignoring sample ratio mismatch and instrumentation drift
Declaring success from p-value without effect-size context
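Sample ratio mismatch is one pitfall that can be checked mechanically. The sketch below uses a chi-square goodness-of-fit test on the assignment counts, assuming a two-arm test with a known intended split. The function name `srm_p_value`, the counts, and the strict 0.001 cutoff are assumptions for illustration.

```python
from math import erfc, sqrt


def srm_p_value(n_control, n_variant, expected_control_share=0.5):
    """Chi-square goodness-of-fit p-value (1 degree of freedom) for the traffic split."""
    total = n_control + n_variant
    exp_control = total * expected_control_share
    exp_variant = total - exp_control
    chi2 = ((n_control - exp_control) ** 2 / exp_control
            + (n_variant - exp_variant) ** 2 / exp_variant)
    # With 1 degree of freedom, P(chi-square > x) equals erfc(sqrt(x / 2)).
    return erfc(sqrt(chi2 / 2))


# Hypothetical assignment counts for an intended 50/50 split.
p = srm_p_value(50_000, 51_200)
print(f"SRM p-value: {p:.6f}")
if p < 0.001:  # strict cutoff chosen for illustration
    print("Possible sample ratio mismatch: check assignment and logging before reading results")
```

A very small p-value here says the split itself is unlikely under correct randomization, so metric comparisons should wait until the cause is found.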

Statistical Interpretation Guardrails

p-value < alpha indicates evidence against null, not guaranteed truth.
Confidence interval crossing zero/no-effect means uncertain directional claim.
Wide intervals imply low precision even when significant.
Use practical significance thresholds tied to business impact.
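These guardrails can be expressed as a small decision helper. This is a sketch only, assuming a metric where higher is better, a confidence interval for the lift that has already been computed, and a positive practical threshold. The function name `classify_result` and the labels are hypothetical.

```python
def classify_result(ci_low, ci_high, practical_threshold):
    """Map a lift confidence interval to a plain-language reading (higher is better)."""
    if ci_low >= practical_threshold:
        return "significant and practically meaningful"
    if ci_high < 0:
        return "significant negative effect"
    if ci_low <= 0 <= ci_high:
        return "inconclusive: interval includes no effect"
    if ci_high < practical_threshold:
        return "significant but below the practical threshold"
    return "positive direction, practical impact uncertain"


# Hypothetical intervals for an absolute lift, with a 1-point practical threshold.
for interval in [(0.012, 0.030), (-0.003, 0.012), (0.001, 0.004), (0.004, 0.022)]:
    print(interval, "->", classify_result(*interval, practical_threshold=0.01))
```

Only the first case supports a confident ship decision; the last one is significant but too imprecise to say whether the effect is large enough to matter.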

See:

references/experiment-playbook.md
references/statistics-reference.md

Tooling

`scripts/sample_size_calculator.py`

Computes required sample size (per variant and total) from:

baseline rate
MDE (absolute or relative)
significance level (alpha)
statistical power

Example:

python3 scripts/sample_size_calculator.py \
  --baseline-rate 0.10 \
  --mde 0.015 \
  --mde-type absolute \
  --alpha 0.05 \
  --power 0.8
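The difference between an absolute and a relative MDE is easy to get wrong, so here is a short sketch of the usual convention: an absolute MDE is a change in percentage points, while a relative MDE is a fraction of the baseline. The helper `mde_to_absolute` is hypothetical, and how the script itself interprets a relative MDE is an assumption, since its source is not shown here.

```python
def mde_to_absolute(baseline_rate, mde, mde_type="absolute"):
    """Convert an MDE to an absolute change in conversion rate."""
    if mde_type == "absolute":
        return mde
    if mde_type == "relative":
        return baseline_rate * mde  # e.g. a 15% lift on a 10% baseline
    raise ValueError("mde_type must be 'absolute' or 'relative'")


baseline = 0.10
absolute = mde_to_absolute(baseline, 0.015, "absolute")
relative = mde_to_absolute(baseline, 0.15, "relative")
print(f"absolute MDE: {absolute:.3f}, relative MDE as absolute: {relative:.3f}")
print(f"target rate: {baseline + absolute:.3f}")
```

Under this convention, a 0.015 absolute MDE and a 0.15 relative MDE describe the same target on a 10% baseline: a move from 10.0% to 11.5%.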


Supported Agents

Claude Code, Cursor, Codex, Gemini CLI, Aider, Windsurf, OpenClaw

Attribution

Alireza Rezvani

alirezarezvani/claude-skills


Details

License: MIT
Source: seeded
Published: 3/17/2026

Related Skills

CPO Advisor

Product leadership for scaling companies. Product vision, portfolio strategy, product-market fit, and product org design. Use when setting product vision, managing a product portfolio, measuring PMF, designing product teams, prioritizing at the portfolio level, reporting to the board on product, or when user mentions CPO, product strategy, product-market fit, product organization, portfolio prioritization, or roadmap strategy.

Alireza Rezvani

#c-level #c-level advisor

Onboarding CRO

When the user wants to optimize post-signup onboarding, user activation, first-run experience, or time-to-value. Also use when the user mentions "onboarding flow," "activation rate," "user activation," "first-run experience," "empty states," "onboarding checklist," "aha moment," or "new user experience." For signup/registration optimization, see signup-flow-cro. For ongoing email sequences, see email-sequence.

Alireza Rezvani

#marketing

Paywall Upgrade CRO

When the user wants to create or optimize in-app paywalls, upgrade screens, upsell modals, or feature gates. Also use when the user mentions "paywall," "upgrade screen," "upgrade modal," "upsell," "feature gate," "convert free to paid," "freemium conversion," "trial expiration screen," "limit reached screen," "plan upgrade prompt," or "in-app pricing." Distinct from public pricing pages (see page-cro) — this skill focuses on in-product upgrade moments where the user has already experienced value.

Alireza Rezvani

#marketing
