Senior Data Engineer

Name: Senior Data Engineer
Author: Alireza Rezvani

Data engineering skill for building scalable data pipelines, ETL/ELT systems, and data infrastructure. Covers Python, SQL, Spark, Airflow, dbt, Kafka, and the modern data stack, along with data modeling, pipeline orchestration, data quality, and DataOps. Use when designing data architectures, building data pipelines, optimizing data workflows, implementing data governance, or troubleshooting data issues.

What This Skill Does

This skill provides a production-grade data engineering toolkit for building scalable and reliable data systems. It helps users design data pipelines, make architectural decisions, and implement data quality frameworks. It's designed for data engineers who need to build and optimize data infrastructure.

When to Use

Design a batch ETL pipeline.
Implement real-time streaming data ingestion.
Set up a data quality framework.
Optimize a slow-running Spark job.
Choose between Lambda and Kappa architectures.
Create a dimensional model for a data warehouse.

Key Features

Generates pipeline orchestration configurations.

Validates data quality using configurable checks.

Analyzes and optimizes ETL performance.

Supports batch and streaming architectures.

Provides guidance on data modeling techniques.

Offers troubleshooting advice for common data engineering issues.

Installation

Run in your project directory:

$ npx promptcreek add senior-data-engineer

Auto-detects your installed agents (Claude Code, Cursor, Codex, etc.) and installs the skill to each one.

Senior Data Engineer

Production-grade data engineering skill for building scalable, reliable data systems.

Table of Contents

Trigger Phrases
Quick Start
Workflows
- Building a Batch ETL Pipeline
- Implementing Real-Time Streaming
- Data Quality Framework Setup
Architecture Decision Framework
Tech Stack
Reference Documentation
Troubleshooting

Trigger Phrases

Activate this skill when you see:

Pipeline Design:

"Design a data pipeline for..."
"Build an ETL/ELT process..."
"How should I ingest data from..."
"Set up data extraction from..."

Architecture:

"Should I use batch or streaming?"
"Lambda vs Kappa architecture"
"How to handle late-arriving data"
"Design a data lakehouse"

Data Modeling:

"Create a dimensional model..."
"Star schema vs snowflake"
"Implement slowly changing dimensions"
"Design a data vault"

Data Quality:

"Add data validation to..."
"Set up data quality checks"
"Monitor data freshness"
"Implement data contracts"

Performance:

"Optimize this Spark job"
"Query is running slow"
"Reduce pipeline execution time"
"Tune Airflow DAG"

Quick Start

Core Tools

# Generate pipeline orchestration config
python scripts/pipeline_orchestrator.py generate \
  --type airflow \
  --source postgres \
  --destination snowflake \
  --schedule "0 5 * * *"

# Validate data quality
python scripts/data_quality_validator.py validate \
  --input data/sales.parquet \
  --schema schemas/sales.json \
  --checks freshness,completeness,uniqueness

# Optimize ETL performance
python scripts/etl_performance_optimizer.py analyze \
  --query queries/daily_aggregation.sql \
  --engine spark \
  --recommend
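
The internals of scripts/data_quality_validator.py are not shown on this page. As a rough illustration of what the three check names (freshness, completeness, uniqueness) could mean, here is a minimal sketch in plain Python. The function names, the row layout, and the thresholds are all assumptions made for this example, not the script's actual behavior.

```python
from datetime import datetime, timedelta, timezone


def check_freshness(rows, ts_field, max_age, now=None):
    """Pass if the newest timestamp is no older than max_age."""
    now = now or datetime.now(timezone.utc)
    timestamps = [r[ts_field] for r in rows if r.get(ts_field) is not None]
    return bool(timestamps) and now - max(timestamps) <= max_age


def check_completeness(rows, field, min_ratio=1.0):
    """Pass if at least min_ratio of the rows have a non-null value for field."""
    if not rows:
        return False
    filled = sum(1 for r in rows if r.get(field) is not None)
    return filled / len(rows) >= min_ratio


def check_uniqueness(rows, key):
    """Pass if no two rows share the same key value."""
    keys = [r[key] for r in rows]
    return len(keys) == len(set(keys))


# Hypothetical sample rows standing in for data/sales.parquet.
NOW = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
SALES = [
    {"order_id": 1, "amount": 20.0, "updated_at": NOW - timedelta(hours=2)},
    {"order_id": 2, "amount": None, "updated_at": NOW - timedelta(hours=1)},
    {"order_id": 2, "amount": 35.5, "updated_at": NOW - timedelta(minutes=5)},
]

results = {
    "freshness": check_freshness(SALES, "updated_at", timedelta(hours=1), now=NOW),
    "completeness": check_completeness(SALES, "amount", min_ratio=0.9),
    "uniqueness": check_uniqueness(SALES, "order_id"),
}
print(results)
```

In this sample the data is fresh, but the null amount and the repeated order_id make the completeness and uniqueness checks fail.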

Workflows

→ See references/workflows.md for details

Architecture Decision Framework

Use this framework to choose the right approach for your data pipeline.

Batch vs Streaming

| Criteria | Batch | Streaming |
|----------|-------|-----------|
| Latency requirement | Hours to days | Seconds to minutes |
| Data volume | Large historical datasets | Continuous event streams |
| Processing complexity | Complex transformations, ML | Simple aggregations, filtering |
| Cost sensitivity | More cost-effective | Higher infrastructure cost |
| Error handling | Easier to reprocess | Requires careful design |

Decision Tree:

Is real-time insight required?
├── Yes → Use streaming
│   └── Is exactly-once semantics needed?
│       ├── Yes → Kafka + Flink/Spark Structured Streaming
│       └── No → Kafka + consumer groups
└── No → Use batch
    └── Is data volume > 1TB daily?
        ├── Yes → Spark/Databricks
        └── No → dbt + warehouse compute
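
The decision tree can also be written as a small function, which makes the branch order explicit. This is only a sketch: the function name `recommend_stack` and its parameters are invented for illustration, and the 1 TB threshold is taken directly from the tree.

```python
def recommend_stack(needs_real_time, needs_exactly_once=False, daily_volume_tb=0.0):
    """Walk the decision tree and return (processing mode, suggested stack)."""
    if needs_real_time:
        # Streaming branch: the delivery guarantee decides the tooling.
        if needs_exactly_once:
            return ("streaming", "Kafka + Flink/Spark Structured Streaming")
        return ("streaming", "Kafka + consumer groups")
    # Batch branch: daily volume decides the compute engine.
    if daily_volume_tb > 1.0:
        return ("batch", "Spark/Databricks")
    return ("batch", "dbt + warehouse compute")


print(recommend_stack(needs_real_time=False, daily_volume_tb=0.2))
print(recommend_stack(needs_real_time=True, needs_exactly_once=True))
```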

Lambda vs Kappa Architecture

| Aspect | Lambda | Kappa |
|--------|--------|-------|
| Complexity | Two codebases (batch + stream) | Single codebase |
| Maintenance | Higher (sync batch/stream logic) | Lower |
| Reprocessing | Native batch layer | Replay from source |
| Use case | ML training + real-time serving | Pure event-driven |

When to choose Lambda:

Need to train ML models on historical data
Complex batch transformations not feasible in streaming
Existing batch infrastructure

When to choose Kappa:

Event-sourced architecture
All processing can be expressed as stream operations
Starting fresh without legacy systems
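
To make the Kappa idea of "single codebase" and "replay from source" concrete, here is a minimal sketch in which one processing function serves both live consumption and a full reprocess. The in-memory list stands in for an append-only log such as a Kafka topic; the event fields and the function names are assumptions made for this example.

```python
from collections import defaultdict


def apply_event(totals, event):
    """The single piece of processing logic: add an order amount to a customer's total."""
    totals[event["customer"]] += event["amount"]
    return totals


def run(log, start_offset=0, totals=None):
    """Consume the log from start_offset; the same code serves live runs and replays."""
    totals = totals if totals is not None else defaultdict(float)
    for event in log[start_offset:]:
        apply_event(totals, event)
    return totals


# Hypothetical append-only event log.
LOG = [
    {"customer": "a", "amount": 10.0},
    {"customer": "b", "amount": 5.0},
    {"customer": "a", "amount": 2.5},
]

live = run(LOG[:2])                           # state after the first two events
live = run(LOG, start_offset=2, totals=live)  # continue when a new event arrives
replayed = run(LOG)                           # reprocess everything from offset 0

print(dict(live) == dict(replayed))
```

A Lambda setup would instead keep a separate batch job for the full recomputation, which is where the cost of keeping the two implementations in sync comes from.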

Data Warehouse vs Data Lakehouse

| Feature | Warehouse (Snowflake/BigQuery) | Lakehouse (Delta/Iceberg) |
|---------|-------------------------------|---------------------------|
| Best for | BI, SQL analytics | ML, unstructured data |
| Storage cost | Higher (proprietary format) | Lower (open formats) |
| Flexibility | Schema-on-write | Schema-on-read |
| Performance | Excellent for SQL | Good, improving |
| Ecosystem | Mature BI tools | Growing ML tooling |
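
The "Flexibility" row is the least self-explanatory, so here is a small sketch of the difference between the two models. It is a toy in plain Python, not the behavior of any particular product (table formats such as Delta Lake can also enforce schemas on write); the schema, class names, and records are hypothetical.

```python
import json

SCHEMA = {"order_id": int, "amount": float}  # hypothetical schema


def conforms(record, schema=SCHEMA):
    """True if the record has exactly the schema's fields with the expected types."""
    return set(record) == set(schema) and all(
        isinstance(record[name], typ) for name, typ in schema.items()
    )


class SchemaOnWriteStore:
    """Validates at write time, so non-conforming records never land."""

    def __init__(self):
        self.rows = []

    def write(self, record):
        if not conforms(record):
            raise ValueError(f"schema violation: {record!r}")
        self.rows.append(record)


class SchemaOnReadStore:
    """Stores raw payloads as-is and applies the schema only when reading."""

    def __init__(self):
        self.raw = []

    def write(self, payload):
        self.raw.append(payload)

    def read(self):
        valid, rejected = [], []
        for payload in self.raw:
            try:
                record = json.loads(payload)
            except json.JSONDecodeError:
                rejected.append(payload)
                continue
            if isinstance(record, dict) and conforms(record):
                valid.append(record)
            else:
                rejected.append(payload)
        return valid, rejected


warehouse = SchemaOnWriteStore()
warehouse.write({"order_id": 1, "amount": 9.99})
try:
    warehouse.write({"order_id": "2", "amount": 5.0})  # wrong type for order_id
    write_rejected = False
except ValueError:
    write_rejected = True

lake = SchemaOnReadStore()
lake.write('{"order_id": 1, "amount": 9.99}')
lake.write('{"order_id": "2", "amount": 5.0}')  # accepted now, caught at read time
valid, rejected = lake.read()
print(write_rejected, len(valid), len(rejected))
```

The trade-off shown here is where the bad record surfaces: at ingestion with schema-on-write, or later at query time with schema-on-read.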

Tech Stack

| Category | Technologies |
|----------|--------------|
| Languages | Python, SQL, Scala |
| Orchestration | Airflow, Prefect, Dagster |
| Transformation | dbt, Spark, Flink |
| Streaming | Kafka, Kinesis, Pub/Sub |
| Storage | S3, GCS, Delta Lake, Iceberg |
| Warehouses | Snowflake, BigQuery, Redshift, Databricks |
| Quality | Great Expectations, dbt tests, Monte Carlo |
| Monitoring | Prometheus, Grafana, Datadog |

Reference Documentation

1. Data Pipeline Architecture

See references/data_pipeline_architecture.md for:

Lambda vs Kappa architecture patterns
Batch processing with Spark and Airflow
Stream processing with Kafka and Flink
Exactly-once semantics implementation
Error handling and dead letter queues
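
The referenced file is not reproduced on this page. As an illustration of the last item only, the following is a minimal sketch of the dead letter queue pattern: records that still fail after a fixed number of attempts are set aside with their error instead of stopping the pipeline. The function names, the retry count, and the sample records are assumptions for this example.

```python
def process_with_dlq(records, transform, max_attempts=3):
    """Apply transform to each record; records that keep failing go to the DLQ."""
    processed, dead_letters = [], []
    for record in records:
        last_error = None
        for _ in range(max_attempts):
            try:
                processed.append(transform(record))
                break
            except Exception as exc:  # broad on purpose: any failure is routed to the DLQ
                last_error = exc
        else:
            # The loop finished without a successful attempt.
            dead_letters.append(
                {"record": record, "error": repr(last_error), "attempts": max_attempts}
            )
    return processed, dead_letters


def parse_amount(record):
    """Hypothetical transform: cast the amount field to float."""
    return {"order_id": record["order_id"], "amount": float(record["amount"])}


RECORDS = [
    {"order_id": 1, "amount": "19.90"},
    {"order_id": 2, "amount": "n/a"},  # not a number
    {"order_id": 3},                   # missing field
]

processed, dead_letters = process_with_dlq(RECORDS, parse_amount)
print(len(processed), [d["record"]["order_id"] for d in dead_letters])
```

In a real pipeline the dead letters would typically be written to a separate topic or table so they can be inspected and replayed after a fix.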

2. Data Modeling Patterns

See references/data_modeling_patterns.md for:

Dimensional modeling (Star/Snowflake)
Slowly Changing Dimensions (SCD Types 1-6)
Data Vault modeling
dbt best practices
Partitioning and clustering
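
The referenced file is not reproduced on this page. As one illustration of the slowly changing dimension item, here is a minimal sketch of a Type 2 update, where a changed attribute closes the current row and appends a new one so history is preserved. The column names (`valid_from`, `valid_to`, `is_current`) and the function name are assumptions chosen for this example.

```python
from datetime import date


def apply_scd2(dimension, incoming, key, tracked, effective):
    """Return a new dimension list with a Type 2 change applied for one incoming row."""
    result = [dict(row) for row in dimension]  # copy so the input is not mutated
    current = next(
        (r for r in result if r[key] == incoming[key] and r["is_current"]), None
    )
    if current is not None:
        if all(current[col] == incoming[col] for col in tracked):
            return result  # nothing changed, keep the current row open
        # Close the existing version.
        current["valid_to"] = effective
        current["is_current"] = False
    # Open a new version for new keys and for changed attributes.
    result.append(
        {
            key: incoming[key],
            **{col: incoming[col] for col in tracked},
            "valid_from": effective,
            "valid_to": None,
            "is_current": True,
        }
    )
    return result


# Hypothetical customer dimension with one current row.
dim = [
    {
        "customer_id": 7,
        "city": "Berlin",
        "valid_from": date(2023, 1, 1),
        "valid_to": None,
        "is_current": True,
    }
]
dim = apply_scd2(
    dim, {"customer_id": 7, "city": "Munich"}, "customer_id", ["city"], date(2024, 6, 1)
)
print(len(dim), dim[0]["is_current"], dim[1]["city"])
```

Applying the same incoming row again leaves the dimension unchanged, which keeps reruns of a load idempotent.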

3. DataOps Best Practices

See references/dataops_best_practices.md for:

Data testing frameworks
Data contracts and schema validation
CI/CD for data pipelines
Observability and lineage
Incident response

Troubleshooting

→ See references/troubleshooting.md for details

Supported Agents

Claude Code, Cursor, Codex, Gemini CLI, Aider, Windsurf, OpenClaw

Attribution

Alireza Rezvani

alirezarezvani/claude-skills

Details

License: MIT
Source: seeded
Published: 3/17/2026
