User Instruction
I recently moved to a new city, so I need to update the delivery address and phone number at my profile of Mosi Shop (address: http://localhost:1234/). My new delivery address is [4278 Maple View Drive, Sacramento, CA 95814, USA] and my new phone num is [12345678901]. Please help me update them at my profile of Mosi Shop (address: http://localhost:1234/, open it in your browser).
Task Description
EN: Change shipping address to '4278 Maple View Drive, Sacramento, CA 95814, USA' and phone number to '12345678901'
中文: 修改自己的收货地址为"4278 Maple View Drive, Sacramento, CA 95814, USA",手机号改成"12345678901"。
Complexity Factors
| ID | Applies | Factor |
|---|---|---|
| A1 | ✗ | Cross-Service Dependency |
| A2 | ✗ | Contaminated Initial State |
| B1 | ✗ | Implicit Goal Resolution |
| B2 | ✗ | Knowledge System Maintenance |
| C1 | ✗ | Environmental State Invalidation |
| C2 | ✗ | Outcome Verification under Altered State |
Evaluation
Verifier Type: verify.py
Partial Credit: Yes
Reward Range: 0 – 1
Results for This Task
| Model | Avg Score | Attempts | All Passed |
|---|---|---|---|
| deepseek-v4-flash | 1 | 3 | ✓ |
| deepseek-v4-pro | 1 | 3 | ✓ |
| gpt-5.5 | 1 | 3 | ✓ |
| qwen3.6-27b | 1 | 3 | ✓ |
| qwen3.6-flash | 1 | 3 | ✓ |
| qwen3.6-plus | 1 | 3 | ✓ |
| qwen3.5-397b-a17b | 0.833 | 3 | ✗ |
| qwen3.5-27b | 0.333 | 3 | ✗ |
| qwen3.5-flash | 0.333 | 3 | ✗ |
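The contents of verify.py are not shown on this page, so the following is only a minimal sketch of how a partial-credit verifier with a 0 to 1 reward range could work. It assumes two equally weighted checks (address and phone number); the `Profile` class, the 0.5 weights, and both function names are hypothetical. Under those assumptions, an average of 0.833 over three attempts would be consistent with per-attempt scores of 1, 1, and 0.5, though other breakdowns are possible.

```python
from dataclasses import dataclass

# Expected values are taken from the task description.
EXPECTED_ADDRESS = "4278 Maple View Drive, Sacramento, CA 95814, USA"
EXPECTED_PHONE = "12345678901"


@dataclass
class Profile:
    """Hypothetical snapshot of the shop profile after the agent finishes."""
    address: str
    phone: str


def score_profile(profile: Profile) -> float:
    """Return a reward in [0, 1]; each matching field earns 0.5 (assumed weights)."""
    score = 0.0
    if profile.address.strip() == EXPECTED_ADDRESS:
        score += 0.5
    if profile.phone.strip() == EXPECTED_PHONE:
        score += 0.5
    return score


def average_score(scores: list[float]) -> float:
    """Average per-attempt rewards, rounded to three decimals as in the table."""
    return round(sum(scores) / len(scores), 3)


if __name__ == "__main__":
    # Only the address was updated, so half credit is awarded.
    partial = Profile(address=EXPECTED_ADDRESS, phone="00000000000")
    print(score_profile(partial))
    print(average_score([1.0, 1.0, score_profile(partial)]))
```

Equal weights are the simplest choice for a two-field task; a real verifier might instead weight the fields differently or normalize the address before comparing.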
Public Trajectories
Run trajectories for this task live on HuggingFace.
Source Files
task.toml: tasks/info-change/task.toml
instruction: tasks/info-change/instruction.md
environment: tasks/info-change/environment/Dockerfile