skill-creation Case #1

Difficulty: Hard

Domain: Documents & Knowledge

User Instruction

Please clean up all CSV files in `/workspace/environment/data/`. For each file, perform the following steps:

1. Remove all empty rows
2. Standardize the date column to YYYY-MM-DD format
3. Remove duplicate rows (by order_id)
4. Sort by amount in descending order

Save each cleaned file to `/workspace/environment/cleaned/<original_name>_cleaned.csv`.

Files to process: sales_jan.csv, sales_feb.csv, sales_mar.csv, sales_apr.csv, sales_may.csv, sales_jun.csv, sales_jul.csv, sales_aug.csv

I have to do this same cleanup every month when new sales data comes in. It would be really nice if there were a quicker way to do this going forward.
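The four cleanup steps in the instruction can be sketched as a small reusable script. This is a minimal illustration using only the Python standard library, not the reference solution for the task. The column names `date`, `order_id`, and `amount` are taken from the instruction's wording; the list of accepted input date formats, the keep-first rule for duplicates, and the function names (`normalize_date`, `clean_rows`, `clean_file`) are assumptions made for the example.

```python
import csv
from datetime import datetime
from pathlib import Path

# Assumed input date formats; the instruction only fixes the output format.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d-%m-%Y", "%Y/%m/%d")


def normalize_date(value):
    """Return value as YYYY-MM-DD, or unchanged if no assumed format matches."""
    text = value.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return text


def amount_key(row):
    """Numeric sort key; rows with a missing or invalid amount sort last."""
    try:
        return float(row.get("amount") or "")
    except ValueError:
        return float("-inf")


def clean_rows(rows):
    """Apply the four cleanup steps to a list of dict rows."""
    seen = set()
    cleaned = []
    for row in rows:
        # Step 1: drop rows in which every field is blank.
        if not any(isinstance(v, str) and v.strip() for v in row.values()):
            continue
        # Step 2: standardize the date column when it is present.
        if isinstance(row.get("date"), str):
            row = dict(row, date=normalize_date(row["date"]))
        # Step 3: keep only the first row seen for each order_id (assumed rule).
        order_id = (row.get("order_id") or "").strip()
        if order_id in seen:
            continue
        seen.add(order_id)
        cleaned.append(row)
    # Step 4: sort by amount, largest first.
    cleaned.sort(key=amount_key, reverse=True)
    return cleaned


def clean_file(src, out_dir):
    """Clean one CSV and write <original_name>_cleaned.csv into out_dir."""
    src, out_dir = Path(src), Path(out_dir)
    with src.open(newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames or []
        rows = clean_rows(list(reader))
    out_dir.mkdir(parents=True, exist_ok=True)
    dest = out_dir / f"{src.stem}_cleaned.csv"
    with dest.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    return dest
```

Calling `clean_file(path, "/workspace/environment/cleaned")` for each of the eight listed files under `/workspace/environment/data/` would cover the one-off request. Keeping the logic in a single parameterized function is also what makes it a candidate for the "quicker way going forward" the user asks for, since next month's files only need a new list of paths.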

Task Description

Identify patterns from interaction history and create a SKILL from scratch.

Complexity Factors

✗ Cross-Service Dependency

✗ Contaminated Initial State

✗ Implicit Goal Resolution

✓ Knowledge System Maintenance

✗ Environmental State Invalidation

✗ Outcome Verification under Altered State

Evaluation

Verifier Type: evaluate.py

Partial Credit: Yes

Reward Range: 0 – 1

Results for This Task

Model	Avg Score	Attempts	All Passed
gpt-5.5	1	3	✓
qwen3.6-27b	1	3	✓
deepseek-v4-flash	0.333	3	✗
qwen3.5-flash	0.333	3	✗
qwen3.5-397b-a17b	0.333	3	✗
deepseek-v4-pro	0	3	✗
qwen3.5-27b	0	3	✗
qwen3.6-flash	0	3	✗
qwen3.6-plus	0	3	✗

Public Trajectories

Run trajectories for this task live on HuggingFace.


Source Files

task.toml: tasks/skill-creation/task.toml

instruction: tasks/skill-creation/instruction.md

environment: tasks/skill-creation/environment/Dockerfile

test: tasks/skill-creation/tests/test.sh