skill-supplementation Case #2

Difficulty: Medium
Domain: Documents & Knowledge

User Instruction

Are there any operations that can be automated recently, according to potential patterns within the interaction history in /workspace/environment/history.json? Please update the skills according to the pattern you discovered, and save the updated skills in /workspace/output.

Task Description

Supplement and update an existing SKILL based on interaction history.
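
The pattern-discovery step can be illustrated with a minimal sketch. It assumes, purely for illustration, that the history is a JSON list of records with an `action` field; the real schema of history.json is not given here, and the helper name `frequent_sequences` and the sample action names are hypothetical. The sketch counts contiguous action sequences and keeps those that repeat, which is one plausible way to surface candidates for automation.

```python
import json
from collections import Counter


def frequent_sequences(actions, length=2, min_count=2):
    """Return contiguous action sequences of `length` seen at least `min_count` times."""
    windows = (
        tuple(actions[i:i + length])
        for i in range(len(actions) - length + 1)
    )
    counts = Counter(windows)
    return {seq: n for seq, n in counts.items() if n >= min_count}


# Hypothetical sample data: the real history.json schema is an assumption.
sample_history = json.loads("""
[
  {"action": "export_report"},
  {"action": "convert_to_pdf"},
  {"action": "send_email"},
  {"action": "export_report"},
  {"action": "convert_to_pdf"},
  {"action": "rename_file"}
]
""")

actions = [record["action"] for record in sample_history]
repeated = frequent_sequences(actions)
print(repeated)
```

In this sample, the only pair that repeats is exporting a report followed by converting it to PDF, so that pair would be the candidate to fold into the updated skill. A real solution would also need to check each candidate against the current environment state before writing it to /workspace/output.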

Complexity Factors

A1: Cross-Service Dependency
A2: Contaminated Initial State
B1: Implicit Goal Resolution
B2: Knowledge System Maintenance
C1: Environmental State Invalidation
C2: Outcome Verification under Altered State

Evaluation

Verifier Type: evaluate.py
Partial Credit: Yes
Reward Range: 0 – 1

Results for This Task

Model               Avg Score   Attempts   All Passed
gpt-5.5             1           3
deepseek-v4-pro     0.889       3
qwen3.6-27b         0.889       3
qwen3.6-flash       0.556       3
qwen3.6-plus        0.444       3
deepseek-v4-flash   0.222       3
qwen3.5-27b         0.222       3
qwen3.5-flash       0.222       3
qwen3.5-397b-a17b   0           3

Public Trajectories

Run trajectories for this task live on HuggingFace.
