skill-supplementation Case #2

Difficulty: Medium

Domain: Documents & Knowledge

User Instruction

Are there any operations that can be automated recently, according to potential patterns within the interaction history in /workspace/environment/history.json? Please update the skills according to the pattern you discovered, and save the updated skills in /workspace/output.

Task Description

Supplement and update an existing SKILL based on the interaction history.
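The task depends on spotting operations that recur in the interaction history. This page does not document the schema of history.json, so the sketch below assumes, hypothetically, that the file is a JSON list of records, each with an "action" field. The names `load_actions` and `find_repeated_sequences` are illustrative only and are not part of the task's tooling. The sketch counts contiguous runs of actions and keeps the runs that repeat, since those are natural candidates for a new or extended skill.

```python
import json
from collections import Counter
from pathlib import Path


def load_actions(path: str) -> list[str]:
    """Return the action name of each history record.

    HYPOTHETICAL SCHEMA: assumes history.json is a JSON list of objects
    with an "action" key. Records without that key are skipped.
    """
    records = json.loads(Path(path).read_text(encoding="utf-8"))
    return [rec["action"] for rec in records if "action" in rec]


def find_repeated_sequences(
    actions: list[str], length: int = 3, min_count: int = 2
) -> list[tuple[tuple[str, ...], int]]:
    """Count every contiguous window of `length` actions and keep the windows
    seen at least `min_count` times, most frequent first."""
    windows = Counter(
        tuple(actions[i : i + length]) for i in range(len(actions) - length + 1)
    )
    return [(seq, n) for seq, n in windows.most_common() if n >= min_count]


if __name__ == "__main__":
    # Made-up action log for illustration; not taken from the task environment.
    demo = ["search", "open", "summarize", "search", "open", "summarize", "share"]
    for seq, count in find_repeated_sequences(demo):
        print(count, " -> ".join(seq))  # prints: 2 search -> open -> summarize
```

Raising `length` surfaces longer workflows. Lowering `min_count` surfaces rarer ones but adds noise.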

Complexity Factors

Factor	Applies
Cross-Service Dependency	✗
Contaminated Initial State	✗
Implicit Goal Resolution	✗
Knowledge System Maintenance	✓
Environmental State Invalidation	✗
Outcome Verification under Altered State	✗

Evaluation

Verifier Type: evaluate.py

Partial Credit: Yes

Reward Range: 0 – 1

Results for This Task

Model	Avg Score	Attempts	All Passed
gpt-5.5	1	3	✓
deepseek-v4-pro	0.889	3	✗
qwen3.6-27b	0.889	3	✗
qwen3.6-flash	0.556	3	✗
qwen3.6-plus	0.444	3	✗
deepseek-v4-flash	0.222	3	✗
qwen3.5-27b	0.222	3	✗
qwen3.5-flash	0.222	3	✗
qwen3.5-397b-a17b	0	3	✗
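The Avg Score column appears to be the mean reward over each model's three attempts, and All Passed appears to mark models whose every attempt earned full reward. The page does not list per-attempt rewards, so the sketch below assumes that each attempt yields a partial-credit reward in [0, 1] and that averages are rounded to three decimals. `summarize_attempts` is a hypothetical helper, not part of evaluate.py. For example, 0.889 is consistent with two full-credit attempts and one at 2/3, though other combinations are also possible.

```python
from statistics import mean


def summarize_attempts(rewards: list[float]) -> tuple[float, bool]:
    """Collapse per-attempt rewards into (average score, all passed).

    ASSUMPTIONS (not confirmed by the verifier source): every reward lies in
    [0, 1], the average is rounded to 3 decimals, and "All Passed" means
    every attempt reached exactly 1.0.
    """
    if not rewards:
        raise ValueError("need at least one attempt")
    if any(not 0.0 <= r <= 1.0 for r in rewards):
        raise ValueError("rewards must lie in [0, 1]")
    avg = round(mean(rewards), 3)
    all_passed = all(r == 1.0 for r in rewards)
    return avg, all_passed


if __name__ == "__main__":
    # Hypothetical per-attempt rewards; the real ones are not shown on this page.
    print(summarize_attempts([1.0, 1.0, 1.0]))    # (1.0, True)
    print(summarize_attempts([1.0, 1.0, 2 / 3]))  # (0.889, False)
```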

Public Trajectories

Trajectories from the runs of this task are published on HuggingFace.


Source Files

task.toml: tasks/skill-supplementation/task.toml

instruction: tasks/skill-supplementation/instruction.md

environment: tasks/skill-supplementation/environment/Dockerfile

test: tasks/skill-supplementation/tests/test.sh