User Instruction
I've booked a GKD Airlines ticket and will be departing from JFK to LAX the day after tomorrow. Could you please check my company email system (http://localhost:5174/, open it in browser) to see if I've received any emails related to my flight? Because I'm worried about potential disruptions due to the typhoon. If the flight is canceled, please communicate with the airline company to submit a claim.
Task Description
Have OpenClaw check the status of a booked flight; if the flight is cancelled, help the user file a compensation claim according to the announcement on the airline's website.
Complexity Factors
| ID | Factor | Applies |
|---|---|---|
| A1 | Cross-Service Dependency | ✓ |
| A2 | Contaminated Initial State | ✗ |
| B1 | Implicit Goal Resolution | ✓ |
| B2 | Knowledge System Maintenance | ✗ |
| C1 | Environmental State Invalidation | ✗ |
| C2 | Outcome Verification under Altered State | ✗ |
Evaluation
Verifier Type: verify.py
Partial Credit: Yes
Reward Range: 0 – 1
Results for This Task
| Model | Avg Score | Attempts | All Passed |
|---|---|---|---|
| deepseek-v4-flash | 0 | 3 | ✗ |
| deepseek-v4-pro | 0 | 3 | ✗ |
| gpt-5.5 | 0 | 3 | ✗ |
| qwen3.5-27b | 0 | 3 | ✗ |
| qwen3.5-flash | 0 | 3 | ✗ |
| qwen3.5-397b-a17b | 0 | 3 | ✗ |
| qwen3.6-27b | 0 | 3 | ✗ |
| qwen3.6-flash | 0 | 3 | ✗ |
| qwen3.6-plus | 0 | 3 | ✗ |
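The Avg Score and All Passed columns can be read as aggregates over each model's attempts. The sketch below shows one way such a row might be derived from per-attempt rewards in the 0 – 1 range. It is an illustration only: the `summarize` function, the `Summary` type, and the rule that an attempt passes only at a reward of 1.0 are assumptions, not taken from verify.py.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Summary:
    """One row of the results table (hypothetical structure)."""
    avg_score: float
    attempts: int
    all_passed: bool


def summarize(rewards: list[float], pass_threshold: float = 1.0) -> Summary:
    """Aggregate per-attempt rewards, each assumed to lie in [0, 1].

    pass_threshold is an assumed cutoff: an attempt counts as passed
    only if its reward reaches this value.
    """
    if not rewards:
        raise ValueError("at least one attempt is required")
    if any(r < 0.0 or r > 1.0 for r in rewards):
        raise ValueError("rewards must lie in the 0-1 range")
    return Summary(
        avg_score=mean(rewards),
        attempts=len(rewards),
        all_passed=all(r >= pass_threshold for r in rewards),
    )


if __name__ == "__main__":
    # Three attempts that all scored zero, as in every row of the table.
    print(summarize([0.0, 0.0, 0.0]))
```

Under these assumptions, a model can earn a non-zero average through partial credit and still fail the All Passed check.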
Public Trajectories
Run trajectories for this task live on HuggingFace.
Source Files
task.toml: tasks/flight-cancel-claim/task.toml
instruction: tasks/flight-cancel-claim/instruction.md
environment: tasks/flight-cancel-claim/environment/Dockerfile