
Senior Computer Vision

Computer vision engineering skill for object detection, image segmentation, and visual AI systems. Covers CNN and Vision Transformer architectures, YOLO/Faster R-CNN/DETR detection, Mask R-CNN/SAM segmentation, and production deployment with ONNX/TensorRT. Includes PyTorch, torchvision, Ultralytics, Detectron2, and MMDetection frameworks. Use when building detection pipelines, training custom models, optimizing inference, or deploying vision systems.

$ npx promptcreek add senior-computer-vision

Auto-detects your installed agents and installs the skill to each one.

What This Skill Does

The Senior Computer Vision Engineer skill covers object detection, image segmentation, and visual AI system deployment. It is designed for engineers training, optimizing, and shipping vision models in production environments.

When to Use

  • Generate training configurations.
  • Analyze models for optimization opportunities (quantization, pruning).
  • Build dataset pipelines.
  • Optimize models for inference.
  • Prepare custom datasets.
  • Select appropriate architectures for your task and deployment target.

Key Features

  • Covers object detection and segmentation.
  • Supports YOLO and Faster R-CNN.
  • Offers guidance on model optimization.
  • Provides dataset pipeline building tools.
  • Supports ONNX, TensorRT, and OpenVINO.

Installation

Run in your project directory:
$ npx promptcreek add senior-computer-vision

Auto-detects your installed agents (Claude Code, Cursor, Codex, etc.) and installs the skill to each one.

View Full Skill Content

Senior Computer Vision Engineer

Production computer vision engineering skill for object detection, image segmentation, and visual AI system deployment.

Quick Start

# Generate training configuration for YOLO or Faster R-CNN
python scripts/vision_model_trainer.py models/ --task detection --arch yolov8

# Analyze model for optimization opportunities (quantization, pruning)
python scripts/inference_optimizer.py model.pt --target onnx --benchmark

# Build dataset pipeline with augmentations
python scripts/dataset_pipeline_builder.py images/ --format coco --augment

Core Expertise

This skill provides guidance on:

  • Object Detection: YOLO family (v5-v11), Faster R-CNN, DETR, RT-DETR
  • Instance Segmentation: Mask R-CNN, YOLACT, SOLOv2
  • Semantic Segmentation: DeepLabV3+, SegFormer, SAM (Segment Anything)
  • Image Classification: ResNet, EfficientNet, Vision Transformers (ViT, DeiT)
  • Video Analysis: Object tracking (ByteTrack, SORT), action recognition
  • 3D Vision: Depth estimation, point cloud processing, NeRF
  • Production Deployment: ONNX, TensorRT, OpenVINO, CoreML

Tech Stack

| Category | Technologies |
|----------|--------------|
| Frameworks | PyTorch, torchvision, timm |
| Detection | Ultralytics (YOLO), Detectron2, MMDetection |
| Segmentation | segment-anything, mmsegmentation |
| Optimization | ONNX, TensorRT, OpenVINO, torch.compile |
| Image Processing | OpenCV, Pillow, albumentations |
| Annotation | CVAT, Label Studio, Roboflow |
| Experiment Tracking | MLflow, Weights & Biases |
| Serving | Triton Inference Server, TorchServe |

Workflow 1: Object Detection Pipeline

Use this workflow when building an object detection system from scratch.

Step 1: Define Detection Requirements

Analyze the detection task requirements:

Detection Requirements Analysis:
  • Target objects: [list specific classes to detect]
  • Real-time requirement: [yes/no, target FPS]
  • Accuracy priority: [speed vs accuracy trade-off]
  • Deployment target: [cloud GPU, edge device, mobile]
  • Dataset size: [number of images, annotations per class]

Step 2: Select Detection Architecture

Choose architecture based on requirements:

| Requirement | Recommended Architecture | Why |
|-------------|-------------------------|-----|
| Real-time (>30 FPS) | YOLOv8/v11, RT-DETR | Single-stage, optimized for speed |
| High accuracy | Faster R-CNN, DINO | Two-stage, better localization |
| Small objects | YOLO + SAHI, Faster R-CNN + FPN | Multi-scale detection |
| Edge deployment | YOLOv8n, MobileNetV3-SSD | Lightweight architectures |
| Transformer-based | DETR, DINO, RT-DETR | End-to-end, no NMS required |
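Most of these architectures can be instantiated in a couple of lines. A minimal sketch using Ultralytics (the `RTDETR` class ships in recent Ultralytics releases; the checkpoint names are its published pretrained weights):

```python
from ultralytics import YOLO, RTDETR

yolo = YOLO("yolov8m.pt")       # single-stage CNN detector, real-time
rtdetr = RTDETR("rtdetr-l.pt")  # transformer detector, end-to-end (no NMS)

results = yolo("sample.jpg")    # run inference on one image
print(results[0].boxes.xyxy)    # predicted boxes as (x1, y1, x2, y2)
```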

Step 3: Prepare Dataset

Convert annotations to required format:

# COCO format (recommended)
python scripts/dataset_pipeline_builder.py data/images/ \
  --annotations data/labels/ \
  --format coco \
  --split 0.8 0.1 0.1 \
  --output data/coco/

# Verify dataset
python -c "from pycocotools.coco import COCO; coco = COCO('data/coco/train.json'); print(f'Images: {len(coco.imgs)}, Categories: {len(coco.cats)}')"

Step 4: Configure Training

Generate training configuration:

# For Ultralytics YOLO
python scripts/vision_model_trainer.py data/coco/ \
  --task detection \
  --arch yolov8m \
  --epochs 100 \
  --batch 16 \
  --imgsz 640 \
  --output configs/

# For Detectron2
python scripts/vision_model_trainer.py data/coco/ \
  --task detection \
  --arch faster_rcnn_R_50_FPN \
  --framework detectron2 \
  --output configs/

Step 5: Train and Validate

# Ultralytics training
yolo detect train data=data.yaml model=yolov8m.pt epochs=100 imgsz=640

# Detectron2 training
python train_net.py --config-file configs/faster_rcnn.yaml --num-gpus 1

# Validate on test set
yolo detect val model=runs/detect/train/weights/best.pt data=data.yaml
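The same run can also be driven from Python, which is convenient inside notebooks or pipelines. A minimal equivalent using the Ultralytics API:

```python
from ultralytics import YOLO

model = YOLO("yolov8m.pt")  # start from COCO-pretrained weights
model.train(data="data.yaml", epochs=100, imgsz=640, batch=16)
metrics = model.val()       # validates the best checkpoint on the val split
```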

Step 6: Evaluate Results

Key metrics to analyze:

| Metric | Target | Description |
|--------|--------|-------------|
| mAP@50 | >0.7 | Mean Average Precision at IoU 0.5 |
| mAP@50:95 | >0.5 | COCO primary metric |
| Precision | >0.8 | Low false positives |
| Recall | >0.8 | Low missed detections |
| Inference time | <33ms | For 30 FPS real-time |
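These targets can be checked programmatically from the validation results object (attribute names as exposed by current Ultralytics releases):

```python
from ultralytics import YOLO

metrics = YOLO("runs/detect/train/weights/best.pt").val(data="data.yaml")
print(f"mAP@50:    {metrics.box.map50:.3f}")  # target > 0.7
print(f"mAP@50:95: {metrics.box.map:.3f}")    # COCO primary metric, target > 0.5
print(f"Precision: {metrics.box.mp:.3f}")     # mean precision over classes
print(f"Recall:    {metrics.box.mr:.3f}")     # mean recall over classes
```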

Workflow 2: Model Optimization and Deployment

Use this workflow when preparing a trained model for production deployment.

Step 1: Benchmark Baseline Performance

# Measure current model performance
python scripts/inference_optimizer.py model.pt \
  --benchmark \
  --input-size 640 640 \
  --batch-sizes 1 4 8 16 \
  --warmup 10 \
  --iterations 100

Expected output:

Baseline Performance (PyTorch FP32):
  • Batch 1: 45.2ms (22.1 FPS)
  • Batch 4: 89.4ms (44.7 FPS)
  • Batch 8: 165.3ms (48.4 FPS)
  • Memory: 2.1 GB
  • Parameters: 25.9M
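Conceptually, the benchmark measures latency the same way you would by hand: warm-up runs, then timed iterations with explicit GPU synchronization. A minimal sketch, assuming a generic eval-mode `nn.Module`:

```python
import time
import torch

def benchmark(model: torch.nn.Module, batch: int = 1, warmup: int = 10, iters: int = 100):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device).eval()
    x = torch.randn(batch, 3, 640, 640, device=device)
    with torch.no_grad():
        for _ in range(warmup):       # warm-up stabilizes clocks and caches
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()  # flush queued kernels before timing
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
    ms = (time.perf_counter() - start) * 1000 / iters
    print(f"Batch {batch}: {ms:.1f}ms ({1000 / ms * batch:.1f} FPS)")
```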

Step 2: Select Optimization Strategy

| Deployment Target | Optimization Path |
|-------------------|-------------------|
| NVIDIA GPU (cloud) | PyTorch → ONNX → TensorRT FP16 |
| NVIDIA GPU (edge) | PyTorch → TensorRT INT8 |
| Intel CPU | PyTorch → ONNX → OpenVINO |
| Apple Silicon | PyTorch → CoreML |
| Generic CPU | PyTorch → ONNX Runtime |
| Mobile | PyTorch → TFLite or ONNX Mobile |
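For the generic-CPU path, running the exported model takes only a few lines with ONNX Runtime:

```python
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
x = np.random.rand(1, 3, 640, 640).astype(np.float32)  # dummy NCHW input
outputs = sess.run(None, {input_name: x})               # None = fetch all outputs
print([o.shape for o in outputs])
```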

Step 3: Export to ONNX

# Export with dynamic batch size
python scripts/inference_optimizer.py model.pt \
  --export onnx \
  --input-size 640 640 \
  --dynamic-batch \
  --simplify \
  --output model.onnx

# Verify ONNX model
python -c "import onnx; model = onnx.load('model.onnx'); onnx.checker.check_model(model); print('ONNX model valid')"
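Under the hood, the export step boils down to `torch.onnx.export` with a dynamic axis for the batch dimension. A minimal sketch; a torchvision classifier stands in here for your trained detector, which must be an eval-mode `nn.Module`:

```python
import torch
import torchvision

# Stand-in model for illustration; substitute your trained detector.
model = torchvision.models.resnet50(weights=None).eval()
dummy = torch.randn(1, 3, 640, 640)

torch.onnx.export(
    model,
    dummy,
    "model.onnx",
    input_names=["images"],
    output_names=["output"],
    dynamic_axes={"images": {0: "batch"}, "output": {0: "batch"}},  # dynamic batch
    opset_version=17,
)
```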

Step 4: Apply Quantization (Optional)

For INT8 quantization with calibration:

# INT8 quantization with a calibration dataset
python scripts/inference_optimizer.py model.onnx \
  --quantize int8 \
  --calibration-data data/calibration/ \
  --calibration-samples 500 \
  --output model_int8.onnx

Quantization impact analysis:

| Precision | Size | Speed | Accuracy Drop |
|-----------|------|-------|---------------|
| FP32 | 100% | 1x | 0% |
| FP16 | 50% | 1.5-2x | <0.5% |
| INT8 | 25% | 2-4x | 1-3% |
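ONNX Runtime also ships a quantization API. Dynamic quantization (weights quantized offline, activations at runtime) is the simplest entry point; note that the calibration-based static flow shown above generally preserves more accuracy for vision models:

```python
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    model_input="model.onnx",
    model_output="model_int8.onnx",
    weight_type=QuantType.QInt8,  # quantize weights to signed INT8
)
```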

Step 5: Convert to Target Runtime

# TensorRT (NVIDIA GPU)
trtexec --onnx=model.onnx --saveEngine=model.engine --fp16

# OpenVINO (Intel)
mo --input_model model.onnx --output_dir openvino/

# CoreML (Apple)
python -c "import coremltools as ct; model = ct.convert('model.onnx'); model.save('model.mlpackage')"

Step 6: Benchmark Optimized Model

python scripts/inference_optimizer.py model.engine \
  --benchmark \
  --runtime tensorrt \
  --compare model.pt

Expected speedup:

Optimization Results:
  • Original (PyTorch FP32): 45.2ms
  • Optimized (TensorRT FP16): 12.8ms
  • Speedup: 3.5x
  • Accuracy change: -0.3% mAP

Workflow 3: Custom Dataset Preparation

Use this workflow when preparing a computer vision dataset for training.

Step 1: Audit Raw Data

# Analyze image dataset
python scripts/dataset_pipeline_builder.py data/raw/ \
  --analyze \
  --output analysis/

Analysis report includes:

Dataset Analysis:
  • Total images: 5,234
  • Image sizes: 640x480 to 4096x3072 (variable)
  • Formats: JPEG (4,891), PNG (343)
  • Corrupted: 12 files
  • Duplicates: 45 pairs

Annotation Analysis:

  • Format detected: Pascal VOC XML
  • Total annotations: 28,456
  • Classes: 5 (car, person, bicycle, dog, cat)
  • Distribution: car (12,340), person (8,234), bicycle (3,456), dog (2,890), cat (1,536)
  • Empty images: 234
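The corrupted-file and duplicate checks can be reproduced with Pillow and a content hash. A minimal sketch that catches byte-identical duplicates only; near-duplicates would need perceptual hashing:

```python
import hashlib
from pathlib import Path
from PIL import Image

corrupted, seen, duplicates = [], {}, []
for path in Path("data/raw").rglob("*"):
    if path.suffix.lower() not in {".jpg", ".jpeg", ".png"}:
        continue
    try:
        with Image.open(path) as img:
            img.verify()  # raises on truncated or corrupt files
    except Exception:
        corrupted.append(path)
        continue
    digest = hashlib.md5(path.read_bytes()).hexdigest()  # exact-content hash
    if digest in seen:
        duplicates.append((seen[digest], path))
    else:
        seen[digest] = path

print(f"Corrupted: {len(corrupted)}, duplicate pairs: {len(duplicates)}")
```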

Step 2: Clean and Validate

# Remove corrupted and duplicate images
python scripts/dataset_pipeline_builder.py data/raw/ \
  --clean \
  --remove-corrupted \
  --remove-duplicates \
  --output data/cleaned/

Step 3: Convert Annotation Format

# Convert VOC to COCO format
python scripts/dataset_pipeline_builder.py data/cleaned/ \
  --annotations data/annotations/ \
  --input-format voc \
  --output-format coco \
  --output data/coco/

Supported format conversions:

| From | To |
|------|-----|
| Pascal VOC XML | COCO JSON |
| YOLO TXT | COCO JSON |
| COCO JSON | YOLO TXT |
| LabelMe JSON | COCO JSON |
| CVAT XML | COCO JSON |
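The core of a VOC-to-COCO conversion is simple: parse each XML file, re-index classes, and convert boxes from corner form to COCO's [x, y, w, h]. A minimal sketch of what such a converter does (segmentation and license fields omitted):

```python
import json
import xml.etree.ElementTree as ET
from pathlib import Path

def voc_to_coco(xml_dir: str, class_names: list[str], out_path: str) -> None:
    """Convert a directory of Pascal VOC XML files to one COCO JSON file."""
    images, annotations = [], []
    ann_id = 1
    for img_id, xml_file in enumerate(sorted(Path(xml_dir).glob("*.xml")), 1):
        root = ET.parse(xml_file).getroot()
        size = root.find("size")
        images.append({
            "id": img_id,
            "file_name": root.findtext("filename"),
            "width": int(size.findtext("width")),
            "height": int(size.findtext("height")),
        })
        for obj in root.iter("object"):
            box = obj.find("bndbox")
            x1, y1 = float(box.findtext("xmin")), float(box.findtext("ymin"))
            x2, y2 = float(box.findtext("xmax")), float(box.findtext("ymax"))
            annotations.append({
                "id": ann_id,
                "image_id": img_id,
                "category_id": class_names.index(obj.findtext("name")) + 1,
                "bbox": [x1, y1, x2 - x1, y2 - y1],  # COCO uses [x, y, w, h]
                "area": (x2 - x1) * (y2 - y1),
                "iscrowd": 0,
            })
            ann_id += 1
    coco = {
        "images": images,
        "annotations": annotations,
        "categories": [{"id": i + 1, "name": n} for i, n in enumerate(class_names)],
    }
    Path(out_path).write_text(json.dumps(coco))
```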

Step 4: Apply Augmentations

# Generate augmentation config
python scripts/dataset_pipeline_builder.py data/coco/ \
  --augment \
  --aug-config configs/augmentation.yaml \
  --output data/augmented/

Recommended augmentations for detection:

# configs/augmentation.yaml
augmentations:
  geometric:
    - horizontal_flip: { p: 0.5 }
    - vertical_flip: { p: 0.1 }  # Only if orientation invariant
    - rotate: { limit: 15, p: 0.3 }
    - scale: { scale_limit: 0.2, p: 0.5 }
  color:
    - brightness_contrast: { brightness_limit: 0.2, contrast_limit: 0.2, p: 0.5 }
    - hue_saturation: { hue_shift_limit: 20, sat_shift_limit: 30, p: 0.3 }
    - blur: { blur_limit: 3, p: 0.1 }
  advanced:
    - mosaic: { p: 0.5 }  # YOLO-style mosaic
    - mixup: { p: 0.1 }   # Image mixing
    - cutout: { num_holes: 8, max_h_size: 32, max_w_size: 32, p: 0.3 }
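A config like this maps onto albumentations almost one-to-one. A sketch of the geometric and color groups, with `bbox_params` keeping boxes synchronized with geometric transforms (mosaic and mixup are typically applied by the training framework, e.g. Ultralytics, rather than inside an albumentations pipeline):

```python
import albumentations as A
import numpy as np

transform = A.Compose(
    [
        A.HorizontalFlip(p=0.5),
        A.Rotate(limit=15, p=0.3),
        A.RandomBrightnessContrast(brightness_limit=0.2, contrast_limit=0.2, p=0.5),
        A.HueSaturationValue(hue_shift_limit=20, sat_shift_limit=30, p=0.3),
        A.Blur(blur_limit=3, p=0.1),
    ],
    bbox_params=A.BboxParams(format="coco", label_fields=["labels"]),
)

image = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in image
out = transform(image=image, bboxes=[[100, 120, 50, 80]], labels=[1])
print(out["bboxes"])  # boxes transformed together with the image
```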

Step 5: Create Train/Val/Test Splits

python scripts/dataset_pipeline_builder.py data/augmented/ \
  --split 0.8 0.1 0.1 \
  --stratify \
  --seed 42 \
  --output data/final/

Split strategy guidelines:

| Dataset Size | Train | Val | Test |
|--------------|-------|-----|------|
| <1,000 images | 70% | 15% | 15% |
| 1,000-10,000 | 80% | 10% | 10% |
| >10,000 | 90% | 5% | 5% |
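The essence of the split step, minus stratification (which the `--stratify` flag presumably handles by balancing class counts across splits), is a seeded shuffle and slice:

```python
import random

def split_dataset(image_ids, ratios=(0.8, 0.1, 0.1), seed=42):
    """Reproducible random train/val/test split over image IDs."""
    assert abs(sum(ratios) - 1.0) < 1e-9
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)  # fixed seed makes the split reproducible
    n_train = int(len(ids) * ratios[0])
    n_val = int(len(ids) * ratios[1])
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]

train, val, test = split_dataset(range(5234))
print(len(train), len(val), len(test))  # 4187 523 524
```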

Step 6: Generate Dataset Configuration

# For Ultralytics YOLO
python scripts/dataset_pipeline_builder.py data/final/ \
  --generate-config yolo \
  --output data.yaml

# For Detectron2
python scripts/dataset_pipeline_builder.py data/final/ \
  --generate-config detectron2 \
  --output detectron2_config.py

Architecture Selection Guide

Object Detection Architectures

| Architecture | Speed | Accuracy | Best For |
|--------------|-------|----------|----------|
| YOLOv8n | 1.2ms | 37.3 mAP | Edge, mobile, real-time |
| YOLOv8s | 2.1ms | 44.9 mAP | Balanced speed/accuracy |
| YOLOv8m | 4.2ms | 50.2 mAP | General purpose |
| YOLOv8l | 6.8ms | 52.9 mAP | High accuracy |
| YOLOv8x | 10.1ms | 53.9 mAP | Maximum accuracy |
| RT-DETR-L | 5.3ms | 53.0 mAP | Transformer, no NMS |
| Faster R-CNN R50 | 46ms | 40.2 mAP | Two-stage, high quality |
| DINO-4scale | 85ms | 49.0 mAP | SOTA transformer |

Segmentation Architectures

| Architecture | Type | Speed | Best For |
|--------------|------|-------|----------|
| YOLOv8-seg | Instance | 4.5ms | Real-time instance seg |
| Mask R-CNN | Instance | 67ms | High-quality masks |
| SAM | Promptable | 50ms | Zero-shot segmentation |
| DeepLabV3+ | Semantic | 25ms | Scene parsing |
| SegFormer | Semantic | 15ms | Efficient semantic seg |

CNN vs Vision Transformer Trade-offs

| Aspect | CNN (YOLO, R-CNN) | ViT (DETR, DINO) |
|--------|-------------------|------------------|
| Training data needed | 1K-10K images | 10K-100K+ images |
| Training time | Fast | Slow (needs more epochs) |
| Inference speed | Faster | Slower |
| Small objects | Good with FPN | Needs multi-scale |
| Global context | Limited | Excellent |
| Positional encoding | Implicit | Explicit |

Reference Documentation

→ See references/reference-docs-and-commands.md for details

Performance Targets

| Metric | Real-time | High Accuracy | Edge |
|--------|-----------|---------------|------|
| FPS | >30 | >10 | >15 |
| mAP@50 | >0.6 | >0.8 | >0.5 |
| Latency P99 | <50ms | <150ms | <100ms |
| GPU Memory | <4GB | <8GB | <2GB |
| Model Size | <50MB | <200MB | <20MB |

Resources

  • Architecture Guide: references/computer_vision_architectures.md
  • Optimization Guide: references/object_detection_optimization.md
  • Deployment Guide: references/production_vision_systems.md
  • Scripts: scripts/ directory for automation tools

Supported Agents

Claude Code, Cursor, Codex, Gemini CLI, Aider, Windsurf, OpenClaw

Details

License: MIT
Source: seeded
Published: 3/17/2026
