skill-creation Case #1

Hard Domain: Documents & Knowledge

User Instruction

View on GitHub

Please clean up all CSV files in `/workspace/environment/data/`. For each file, perform the following steps: 1. Remove all empty rows 2. Standardize the date column to YYYY-MM-DD format 3. Remove duplicate rows (by order_id) 4. Sort by amount in descending order Save each cleaned file to `/workspace/environment/cleaned/<original_name>_cleaned.csv`. Files to process: sales_jan.csv, sales_feb.csv, sales_mar.csv, sales_apr.csv, sales_may.csv, sales_jun.csv, sales_jul.csv, sales_aug.csv I have to do this same cleanup every month when new sales data comes in. It would be really nice if there were a quicker way to do this going forward.

Task Description

EN: Identify patterns from interaction history and create a SKILL from scratch

中文: 识别交互历史中的模式,从头创建SKILL

Complexity Factors

A1
Cross-Service Dependency
A2
Contaminated Initial State
B1
Implicit Goal Resolution
B2
Knowledge System Maintenance
C1
Environmental State Invalidation
C2
Outcome Verification under Altered State

Evaluation

Verifier Type: evaluate.py
Partial Credit: Yes
Reward Range: 0 – 1

Results for This Task

Model Avg Score Attempts All Passed
gpt-5.5 1 3
qwen3.6-27b 1 3
deepseek-v4-flash 0.333 3
qwen3.5-flash 0.333 3
qwen3.5-397b-a17b 0.333 3
deepseek-v4-pro 0 3
qwen3.5-27b 0 3
qwen3.6-flash 0 3
qwen3.6-plus 0 3

Public Trajectories

Run trajectories for this task live on HuggingFace.

View trajectories on HuggingFace