Nextflow Development

Run nf-core bioinformatics pipelines (rnaseq, sarek, atacseq) on sequencing data. Use when analyzing RNA-seq, WGS/WES, or ATAC-seq data—either local FASTQs or public datasets from GEO/SRA. Triggers on nf-core, Nextflow, FASTQ analysis, variant calling, gene expression, differential expression, GEO reanalysis, GSE/GSM/SRR accessions, or samplesheet creation.

$ npx promptcreek add nextflow-development

Auto-detects your installed agents and installs the skill to each one.

What This Skill Does

This skill assists users in deploying nf-core bioinformatics pipelines for analyzing sequencing data. It guides researchers, even those without extensive bioinformatics expertise, through the process of running pipelines for tasks like differential expression analysis and variant calling. The skill focuses on simplifying the deployment and execution of these complex pipelines.

When to Use

  • Run RNA-seq pipelines for differential expression analysis.
  • Execute WGS/WES pipelines for variant calling.
  • Deploy ATAC-seq pipelines for chromatin accessibility analysis.
  • Analyze local sequencing data using nf-core pipelines.
  • Process data obtained from public repositories like GEO/SRA.
  • Troubleshoot environment issues related to Docker and Nextflow.

Key Features

  • Provides a step-by-step workflow checklist for pipeline deployment.
  • Offers scripts for acquiring data from GEO/SRA.
  • Includes environment checks to ensure proper setup.
  • Provides fix instructions for common Docker and Nextflow issues.
  • Supports running test profiles to validate pipeline configuration.
  • Guides users in creating samplesheets for their data.

Installation

Run in your project directory:
$ npx promptcreek add nextflow-development

Auto-detects your installed agents (Claude Code, Cursor, Codex, etc.) and installs the skill to each one.

nf-core Pipeline Deployment

Run nf-core bioinformatics pipelines on local or public sequencing data.

Target users: Bench scientists and researchers without specialized bioinformatics training who need to run large-scale omics analyses—differential expression, variant calling, or chromatin accessibility analysis.

Workflow Checklist

  • [ ] Step 0: Acquire data (if from GEO/SRA)
  • [ ] Step 1: Environment check (MUST pass)
  • [ ] Step 2: Select pipeline (confirm with user)
  • [ ] Step 3: Run test profile (MUST pass)
  • [ ] Step 4: Create samplesheet
  • [ ] Step 5: Configure & run (confirm genome with user)
  • [ ] Step 6: Verify outputs

Step 0: Acquire Data (GEO/SRA Only)

Skip this step if the user already has local FASTQ files.

For public datasets, fetch from GEO/SRA first. See references/geo-sra-acquisition.md for the full workflow.

Quick start:

# 1. Get study info
python scripts/sra_geo_fetch.py info GSE110004

# 2. Download (interactive mode)
python scripts/sra_geo_fetch.py download GSE110004 -o ./fastq -i

# 3. Generate samplesheet
python scripts/sra_geo_fetch.py samplesheet GSE110004 --fastq-dir ./fastq -o samplesheet.csv

DECISION POINT: After fetching study info, confirm with user:

  • Which sample subset to download (if multiple data types)
  • Suggested genome and pipeline

Then continue to Step 1.
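
The accession types this skill triggers on (GSE, GSM, SRR) can be told apart by prefix before deciding what to fetch. The following is a minimal sketch only: `classify_accession` and `ACCESSION_PATTERNS` are hypothetical names for illustration and are not taken from scripts/sra_geo_fetch.py, and the pattern list is not exhaustive.

```python
import re

# Prefix patterns for a few common public accessions (illustrative, not exhaustive).
ACCESSION_PATTERNS = {
    "geo_series": re.compile(r"^GSE\d+$"),
    "geo_sample": re.compile(r"^GSM\d+$"),
    "sra_run": re.compile(r"^[SED]RR\d+$"),  # SRR (NCBI), ERR (ENA), DRR (DDBJ)
}


def classify_accession(accession: str) -> str:
    """Return a coarse accession type, or "unknown" if no pattern matches."""
    normalized = accession.strip().upper()
    for kind, pattern in ACCESSION_PATTERNS.items():
        if pattern.match(normalized):
            return kind
    return "unknown"
```

An "unknown" result is a reasonable point to stop and ask the user for clarification instead of guessing.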


Step 1: Environment Check

Run this check first. The pipeline will fail unless the environment passes.

python scripts/check_environment.py

All critical checks must pass. If any fail, provide fix instructions:

Docker issues

| Problem | Fix |
|---------|-----|
| Not installed | Install from https://docs.docker.com/get-docker/ |
| Permission denied | sudo usermod -aG docker $USER then re-login |
| Daemon not running | sudo systemctl start docker |

Nextflow issues

| Problem | Fix |
|---------|-----|
| Not installed | curl -s https://get.nextflow.io \| bash && mv nextflow ~/bin/ |
| Version < 23.04 | nextflow self-update |

Java issues

| Problem | Fix |
|---------|-----|
| Not installed / < 11 | sudo apt install openjdk-11-jdk |

Do not proceed until all checks pass. For HPC/Singularity, see references/troubleshooting.md.
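
The kind of check described in this step can be sketched in a few lines. This is an illustration under stated assumptions, not the contents of scripts/check_environment.py: the helper names (`parse_version`, `missing_tools`, `nextflow_is_recent`) are hypothetical, and the minimum Nextflow version is taken from the table above.

```python
import re
import shutil

MIN_NEXTFLOW = (23, 4)  # "Version < 23.04" threshold from the Nextflow table


def parse_version(text):
    """Return the first dotted version found in text as a tuple of ints, or None."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def missing_tools(tools=("docker", "nextflow", "java")):
    """List the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def nextflow_is_recent(version_output):
    """Check the text printed by a Nextflow version command against MIN_NEXTFLOW."""
    version = parse_version(version_output)
    return version is not None and version[:2] >= MIN_NEXTFLOW
```

Comparing tuples of integers avoids the string-comparison trap where "23.10" sorts before "23.4".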


Step 2: Select Pipeline

DECISION POINT: Confirm with user before proceeding.

| Data Type | Pipeline | Version | Goal |
|-----------|----------|---------|------|
| RNA-seq | rnaseq | 3.22.2 | Gene expression |
| WGS/WES | sarek | 3.7.1 | Variant calling |
| ATAC-seq | atacseq | 2.1.2 | Chromatin accessibility |

Auto-detect from data:

python scripts/detect_data_type.py /path/to/data

For pipeline-specific details:


Step 3: Run Test Profile

Validates the environment with a small bundled dataset. The test MUST pass before running on real data.

nextflow run nf-core/<pipeline> -r <version> -profile test,docker --outdir test_output

| Pipeline | Command |
|----------|---------|
| rnaseq | nextflow run nf-core/rnaseq -r 3.22.2 -profile test,docker --outdir test_rnaseq |
| sarek | nextflow run nf-core/sarek -r 3.7.1 -profile test,docker --outdir test_sarek |
| atacseq | nextflow run nf-core/atacseq -r 2.1.2 -profile test,docker --outdir test_atacseq |

Verify (replace test_output with the --outdir you used, e.g. test_rnaseq):

ls test_output/multiqc/multiqc_report.html

grep "Pipeline completed successfully" .nextflow.log

If test fails, see references/troubleshooting.md.
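
The two verification commands above can be combined into one check. This is a sketch only: `run_succeeded` is a hypothetical helper, and it assumes the report path and log message shown in this step.

```python
from pathlib import Path

SUCCESS_MARKER = "Pipeline completed successfully"


def run_succeeded(outdir, log_path=".nextflow.log"):
    """Return True if the MultiQC report exists and the log reports success."""
    report = Path(outdir) / "multiqc" / "multiqc_report.html"
    log = Path(log_path)
    if not report.is_file() or not log.is_file():
        return False
    # errors="replace" keeps the check from crashing on odd bytes in the log.
    return SUCCESS_MARKER in log.read_text(errors="replace")
```

For example, `run_succeeded("test_rnaseq")` would cover the rnaseq test command from the table, provided it is called from the directory where Nextflow was launched.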


Step 4: Create Samplesheet

Generate automatically

python scripts/generate_samplesheet.py /path/to/data <pipeline> -o samplesheet.csv

The script:

  • Discovers FASTQ/BAM/CRAM files
  • Pairs R1/R2 reads
  • Infers sample metadata
  • Validates before writing

For sarek: Script prompts for tumor/normal status if not auto-detected.
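
Pairing R1/R2 reads usually comes down to grouping file names by a shared sample prefix. The sketch below assumes an `_R1`/`_R2` naming convention with an optional numeric suffix. `READ_PATTERN` and `pair_reads` are hypothetical and do not describe how scripts/generate_samplesheet.py is implemented.

```python
import re
from collections import defaultdict

# Matches names such as SAMPLE1_R1.fq.gz or SAMPLE1_R2_001.fastq.gz (assumed naming).
READ_PATTERN = re.compile(
    r"^(?P<sample>.+?)_R(?P<read>[12])(?:_\d+)?\.(?:fastq|fq)\.gz$"
)


def pair_reads(filenames):
    """Group FASTQ file names into {sample: {"R1": name, "R2": name}}."""
    pairs = defaultdict(dict)
    for name in sorted(filenames):
        match = READ_PATTERN.match(name)
        if match:  # silently skip files that do not look like FASTQ reads
            pairs[match["sample"]]["R" + match["read"]] = name
    return dict(pairs)
```

A sample whose entry has only an "R1" key is either single-end or missing its mate, which is worth confirming with the user before writing the samplesheet.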

Validate existing samplesheet

python scripts/generate_samplesheet.py --validate samplesheet.csv <pipeline>

Samplesheet formats

rnaseq:

sample,fastq_1,fastq_2,strandedness
SAMPLE1,/abs/path/R1.fq.gz,/abs/path/R2.fq.gz,auto

sarek:

patient,sample,lane,fastq_1,fastq_2,status
patient1,tumor,L001,/abs/path/tumor_R1.fq.gz,/abs/path/tumor_R2.fq.gz,1
patient1,normal,L001,/abs/path/normal_R1.fq.gz,/abs/path/normal_R2.fq.gz,0

atacseq:

sample,fastq_1,fastq_2,replicate
CONTROL,/abs/path/ctrl_R1.fq.gz,/abs/path/ctrl_R2.fq.gz,1
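
The column layouts above lend themselves to a quick structural check before launching a run. This is a minimal sketch: `samplesheet_problems` is a hypothetical helper, and the absolute-path rule is an assumption drawn from the `/abs/path/...` examples, not a documented pipeline requirement.

```python
import csv
import io

# Header columns copied from the samplesheet formats above.
REQUIRED_COLUMNS = {
    "rnaseq": ["sample", "fastq_1", "fastq_2", "strandedness"],
    "sarek": ["patient", "sample", "lane", "fastq_1", "fastq_2", "status"],
    "atacseq": ["sample", "fastq_1", "fastq_2", "replicate"],
}


def samplesheet_problems(csv_text, pipeline):
    """Return a list of human-readable problems; an empty list means none found."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    problems = [
        f"missing column: {col}"
        for col in REQUIRED_COLUMNS[pipeline]
        if col not in header
    ]
    if problems:
        return problems
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for col in ("fastq_1", "fastq_2"):
            value = row[col]
            # Assumption: non-empty FASTQ entries should be absolute paths.
            if value and not value.startswith("/"):
                problems.append(f"line {line_no}: {col} is not an absolute path")
    return problems
```

An empty `fastq_2` is allowed here so that single-end rows are not flagged.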


Step 5: Configure & Run

5a. Check genome availability

python scripts/manage_genomes.py check <genome>

If not installed:

python scripts/manage_genomes.py download <genome>

Common genomes: GRCh38 (human), GRCh37 (legacy), GRCm39 (mouse), R64-1-1 (yeast), BDGP6 (fly)

5b. Decision points

DECISION POINT: Confirm with user:

  • Genome: Which reference to use
  • Pipeline-specific options:
      - rnaseq: aligner (star_salmon recommended, hisat2 for low memory)
      - sarek: tools (haplotypecaller for germline, mutect2 for somatic)
      - atacseq: read_length (50, 75, 100, or 150)

5c. Run pipeline

nextflow run nf-core/<pipeline> \
    -r <version> \
    -profile docker \
    --input samplesheet.csv \
    --outdir results \
    --genome <genome> \
    -resume

Key flags:

  • -r: Pin version
  • -profile docker: Use Docker (or singularity for HPC)
  • --genome: iGenomes key
  • -resume: Continue from checkpoint

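Resource limits (if needed). Older pipeline releases accept the flags below; releases built on newer nf-core templates may instead expect limits to be set through process.resourceLimits in a custom config, so check the parameter documentation for the pinned version: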

--max_cpus 8 --max_memory '32.GB' --max_time '24.h'
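
When the run is launched from a script, assembling the command as an argument list avoids shell-quoting mistakes. The sketch below uses only the flags listed in step 5c. `build_run_command` and its defaults are hypothetical, chosen to mirror the example command.

```python
import shlex


def build_run_command(pipeline, version, genome,
                      samplesheet="samplesheet.csv", outdir="results",
                      profile="docker", resume=True):
    """Return the step 5c command as a list suitable for subprocess.run."""
    command = [
        "nextflow", "run", f"nf-core/{pipeline}",
        "-r", version,          # pin the pipeline version
        "-profile", profile,    # docker, or singularity on HPC
        "--input", samplesheet,
        "--outdir", outdir,
        "--genome", genome,     # iGenomes key
    ]
    if resume:
        command.append("-resume")
    return command


# Show the command for review before running it.
print(shlex.join(build_run_command("rnaseq", "3.22.2", "GRCh38")))
```

Printing the joined command first gives the user a chance to confirm the genome and options at the decision point before anything executes.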

Step 6: Verify Outputs

Check completion

ls results/multiqc/multiqc_report.html

grep "Pipeline completed successfully" .nextflow.log

Key outputs by pipeline

rnaseq:

  • results/star_salmon/salmon.merged.gene_counts.tsv - Gene counts
  • results/star_salmon/salmon.merged.gene_tpm.tsv - TPM values

sarek:

  • results/variant_calling/*/ - VCF files
  • results/preprocessing/recalibrated/ - BAM files

atacseq:

  • results/macs2/narrowPeak/ - Peak calls
  • results/bwa/mergedLibrary/bigwig/ - Coverage tracks
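
The key outputs listed above can be checked for existence in one pass. This is a sketch under the assumption that the paths are exactly as written in this section. `KEY_OUTPUTS` and `missing_outputs` are hypothetical names.

```python
from pathlib import Path

# Relative paths copied from the "Key outputs by pipeline" lists above.
KEY_OUTPUTS = {
    "rnaseq": [
        "star_salmon/salmon.merged.gene_counts.tsv",
        "star_salmon/salmon.merged.gene_tpm.tsv",
    ],
    "sarek": ["variant_calling", "preprocessing/recalibrated"],
    "atacseq": ["macs2/narrowPeak", "bwa/mergedLibrary/bigwig"],
}


def missing_outputs(results_dir, pipeline):
    """Return the expected outputs that are absent under results_dir."""
    root = Path(results_dir)
    return [rel for rel in KEY_OUTPUTS[pipeline] if not (root / rel).exists()]
```

An empty list from `missing_outputs("results", "rnaseq")` only shows that the files exist; the MultiQC report still needs to be reviewed for quality.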

Quick Reference

For common exit codes and fixes, see references/troubleshooting.md.

Resume failed run

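Re-run the original command from the same launch directory, with the same parameters, and add -resume:

nextflow run nf-core/<pipeline> -r <version> -profile docker --input samplesheet.csv --outdir results --genome <genome> -resume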

References


Disclaimer

This skill is provided as a prototype example demonstrating how to integrate nf-core bioinformatics pipelines into Claude Code for automated analysis workflows. The current implementation supports three pipelines (rnaseq, sarek, and atacseq), serving as a foundation that enables the community to expand support to the full set of nf-core pipelines.

It is intended for educational and research purposes and should not be considered production-ready without appropriate validation for your specific use case. Users are responsible for ensuring their computing environment meets pipeline requirements and for verifying analysis results.

Anthropic does not guarantee the accuracy of bioinformatics outputs, and users should follow standard practices for validating computational analyses. This integration is not officially endorsed by or affiliated with the nf-core community.

Attribution

When publishing results, cite the appropriate pipeline. Citations are available in each nf-core repository's CITATIONS.md file (e.g., https://github.com/nf-core/rnaseq/blob/3.22.2/CITATIONS.md).

Licenses

  • nf-core pipelines: MIT License (https://nf-co.re/about)
  • Nextflow: Apache License, Version 2.0 (https://www.nextflow.io/about-us.html)
  • NCBI SRA Toolkit: Public Domain (https://github.com/ncbi/sra-tools/blob/master/LICENSE)
Supported Agents

Claude Code, Cursor, Codex, Gemini CLI, Aider, Windsurf, OpenClaw

Details

License: MIT
Source: admin
Published: 3/18/2026