Setup
Set up a new autoresearch experiment interactively. Collects domain, target file, eval command, metric, direction, and evaluator.
$ npx promptcreek add setup
Auto-detects your installed agents and installs the skill to each one.
What This Skill Does
This skill sets up a new autoresearch experiment by collecting necessary configuration details. It supports both interactive mode, where it prompts the user for each parameter, and direct mode, where arguments are provided via the command line. It's useful for engineers and data scientists who want to automate and track experiments.
When to Use
- Start a new engineering experiment.
- Optimize a specific target file.
- Compare different evaluation metrics.
- List existing experiments.
- Show available evaluators.
- Quickly configure an experiment.
Installation
$ npx promptcreek add setup
Auto-detects your installed agents (Claude Code, Cursor, Codex, etc.) and installs the skill to each one.
/ar:setup — Create New Experiment
Set up a new autoresearch experiment with all required configuration.
Usage
/ar:setup # Interactive mode
/ar:setup engineering api-speed src/api.py "pytest bench.py" p50_ms lower
/ar:setup --list # Show existing experiments
/ar:setup --list-evaluators # Show available evaluators
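The four invocation forms above can be told apart with a small parser. The following is a minimal sketch, not the skill's actual implementation: the function name `parse_setup_args` and the mode labels are hypothetical, and the positional order is inferred from the single usage example (domain, name, target, eval command, metric, direction).

```python
import shlex

# Positional order inferred from the usage example above (an assumption).
POSITIONAL_FIELDS = ("domain", "name", "target", "eval_cmd", "metric", "direction")


def parse_setup_args(arg_string):
    """Split the text that follows /ar:setup into a mode and a parameter dict.

    Hypothetical helper: returns one of the modes "interactive", "list",
    "list-evaluators", or "direct".
    """
    # shlex keeps a quoted eval command such as "pytest bench.py" as one token.
    tokens = shlex.split(arg_string)
    if not tokens:
        return "interactive", {}
    if tokens == ["--list"]:
        return "list", {}
    if tokens == ["--list-evaluators"]:
        return "list-evaluators", {}
    if len(tokens) != len(POSITIONAL_FIELDS):
        raise ValueError(
            f"expected {len(POSITIONAL_FIELDS)} arguments, got {len(tokens)}"
        )
    return "direct", dict(zip(POSITIONAL_FIELDS, tokens))
```

Quoting the eval command matters here: without the quotes, `pytest bench.py` would split into two tokens and the argument count check would fail.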
What It Does
If arguments provided
Pass them directly to the setup script:
python {skill_path}/scripts/setup_experiment.py \
--domain {domain} --name {name} \
--target {target} --eval "{eval_cmd}" \
--metric {metric} --direction {direction} \
[--evaluator {evaluator}] [--scope {scope}]
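One way to assemble that command from Python is sketched below. The flag names come from the command template above; the helper name `build_setup_command` is hypothetical, and passing the command as a list (rather than a shell string) is a design choice of this sketch so that an eval command containing spaces needs no extra quoting.

```python
import sys
from pathlib import Path


def build_setup_command(skill_path, domain, name, target, eval_cmd,
                        metric, direction, evaluator=None, scope=None):
    """Build the argv list for setup_experiment.py (hypothetical helper)."""
    script = Path(skill_path) / "scripts" / "setup_experiment.py"
    cmd = [
        sys.executable, str(script),
        "--domain", domain, "--name", name,
        "--target", target, "--eval", eval_cmd,
        "--metric", metric, "--direction", direction,
    ]
    # --evaluator and --scope are optional in the template, so add them
    # only when a value was supplied.
    if evaluator:
        cmd += ["--evaluator", evaluator]
    if scope:
        cmd += ["--scope", scope]
    return cmd
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` to execute the script.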
If no arguments (interactive mode)
Collect each parameter one at a time:
- Domain — Ask: "What domain? (engineering, marketing, content, prompts, custom)"
- Name — Ask: "Experiment name? (e.g., api-speed, blog-titles)"
- Target file — Ask: "Which file to optimize?" Verify it exists.
- Eval command — Ask: "How to measure it? (e.g., pytest bench.py, python evaluate.py)"
- Metric — Ask: "What metric does the eval output? (e.g., p50_ms, ctr_score)"
- Direction — Ask: "Is lower or higher better?"
- Evaluator (optional) — Show built-in evaluators. Ask: "Use a built-in evaluator, or your own?"
- Scope — Ask: "Store in project (.autoresearch/) or user (~/.autoresearch/)?"
Then run setup_experiment.py with the collected parameters.
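Before running the script, the collected answers can be checked against the constraints the prompts imply. This is a minimal sketch under stated assumptions: `validate_answers` and the dictionary keys are hypothetical, the domain list is treated as a closed set of literal values (the source does not say whether "custom" permits free-form names), and the file check is injectable so it can be tested without touching disk.

```python
from pathlib import Path

# Values taken from the prompts above; treating them as closed sets is an assumption.
DOMAINS = {"engineering", "marketing", "content", "prompts", "custom"}
DIRECTIONS = {"lower", "higher"}


def validate_answers(answers, file_exists=lambda p: Path(p).is_file()):
    """Return a list of problems found in the collected answers (empty if none)."""
    errors = []
    if answers.get("domain") not in DOMAINS:
        errors.append("domain must be one of: " + ", ".join(sorted(DOMAINS)))
    if not answers.get("name"):
        errors.append("experiment name is required")
    target = answers.get("target", "")
    # The Target file step says "Verify it exists".
    if not target or not file_exists(target):
        errors.append(f"target file not found: {target!r}")
    if not answers.get("eval_cmd"):
        errors.append("eval command is required")
    if not answers.get("metric"):
        errors.append("metric is required")
    if answers.get("direction") not in DIRECTIONS:
        errors.append("direction must be 'lower' or 'higher'")
    return errors
```

Returning every problem at once, rather than stopping at the first, lets the agent re-ask only the questions that failed.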
Listing
# Show existing experiments
python {skill_path}/scripts/setup_experiment.py --list
# Show available evaluators
python {skill_path}/scripts/setup_experiment.py --list-evaluators
Built-in Evaluators
| Name | Metric | Use Case |
|------|--------|----------|
| benchmark_speed | p50_ms (lower) | Function/API execution time |
| benchmark_size | size_bytes (lower) | File, bundle, Docker image size |
| test_pass_rate | pass_rate (higher) | Test suite pass percentage |
| build_speed | build_seconds (lower) | Build/compile/Docker build time |
| memory_usage | peak_mb (lower) | Peak memory during execution |
| llm_judge_content | ctr_score (higher) | Headlines, titles, descriptions |
| llm_judge_prompt | quality_score (higher) | System prompts, agent instructions |
| llm_judge_copy | engagement_score (higher) | Social posts, ad copy, emails |
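The table can be mirrored as a lookup from evaluator name to its metric and direction, which also makes the meaning of "lower" and "higher" concrete. This is an illustrative sketch only: the names `BUILTIN_EVALUATORS` and `is_improvement` are hypothetical, and the source does not describe how the skill itself compares results.

```python
# Evaluator name -> (metric, direction), copied from the table above.
BUILTIN_EVALUATORS = {
    "benchmark_speed": ("p50_ms", "lower"),
    "benchmark_size": ("size_bytes", "lower"),
    "test_pass_rate": ("pass_rate", "higher"),
    "build_speed": ("build_seconds", "lower"),
    "memory_usage": ("peak_mb", "lower"),
    "llm_judge_content": ("ctr_score", "higher"),
    "llm_judge_prompt": ("quality_score", "higher"),
    "llm_judge_copy": ("engagement_score", "higher"),
}


def is_improvement(evaluator, baseline, candidate):
    """Return True if candidate beats baseline in the evaluator's direction."""
    _metric, direction = BUILTIN_EVALUATORS[evaluator]
    if direction == "lower":
        return candidate < baseline
    return candidate > baseline
```

For example, with `benchmark_speed` a drop in `p50_ms` from 120 to 95 counts as an improvement, while with `test_pass_rate` a drop from 0.9 to 0.8 does not.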
After Setup
Report to the user:
- Experiment path and branch name
- Whether the eval command worked and the baseline metric
- Suggest: "Run /ar:run {domain}/{name} to start iterating, or /ar:loop {domain}/{name} for autonomous mode."
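The three report items could be assembled into a single message roughly as follows. This is a sketch, not the skill's output format: `format_setup_report` is a hypothetical helper, the exact wording of the first three lines is invented for illustration, and the path and branch are taken as inputs because the source does not specify how they are derived.

```python
def format_setup_report(domain, name, path, branch, eval_ok,
                        metric=None, baseline=None):
    """Build the post-setup message (hypothetical helper and wording)."""
    lines = [
        f"Experiment path: {path}",
        f"Branch: {branch}",
    ]
    if eval_ok:
        lines.append(f"Eval command succeeded; baseline {metric} = {baseline}")
    else:
        lines.append("Eval command failed; no baseline metric recorded")
    # The suggestion text follows the wording given in the list above.
    lines.append(
        f"Run /ar:run {domain}/{name} to start iterating, "
        f"or /ar:loop {domain}/{name} for autonomous mode."
    )
    return "\n".join(lines)
```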
Details
- License: MIT
- Source: seeded
- Published: 3/17/2026
Related Skills
Agent Protocol
Inter-agent communication protocol for C-suite agent teams. Defines invocation syntax, loop prevention, isolation rules, and response formats. Use when C-suite agents need to query each other, coordinate cross-functional analysis, or run board meetings with multiple agent roles.
CTO Advisor
Technical leadership guidance for engineering teams, architecture decisions, and technology strategy. Use when assessing technical debt, scaling engineering teams, evaluating technologies, making architecture decisions, establishing engineering metrics, or when user mentions CTO, tech debt, technical debt, team scaling, architecture decisions, technology evaluation, engineering metrics, DORA metrics, or technology strategy.
Agent Workflow Designer