flight-cancel-claim Case #11

Difficulty: Hard
Domain: E-commerce & Daily Svcs
Tags: airline, email

User Instruction

I've booked a GKD Airlines ticket and will be departing from JFK to LAX the day after tomorrow. Could you please check my company email system (http://localhost:5174/, open it in browser) to see if I've received any emails related to my flight? Because I'm worried about potential disruptions due to the typhoon. If the flight is canceled, please communicate with the airline company to submit a claim.

Task Description

Have the agent (OpenClaw) check whether the booked flight is still operating normally; if it has been cancelled, help the user file a compensation claim according to the announcement on the airline's website.

Complexity Factors

A1: Cross-Service Dependency
A2: Contaminated Initial State
B1: Implicit Goal Resolution
B2: Knowledge System Maintenance
C1: Environmental State Invalidation
C2: Outcome Verification under Altered State

Evaluation

Verifier Type: verify.py
Partial Credit: Yes
Reward Range: 0 – 1
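
To make the partial-credit scheme concrete, here is a minimal sketch of how a verifier could turn several checkpoints into a reward between 0 and 1. It is not the actual verify.py for this task: the `Check` class, the `compute_reward` function, and the checkpoint names and weights are all hypothetical, chosen only to illustrate weighted partial credit.

```python
from dataclasses import dataclass


@dataclass
class Check:
    """One hypothetical verification checkpoint (not taken from verify.py)."""
    name: str
    weight: float
    passed: bool


def compute_reward(checks: list[Check]) -> float:
    """Return the weighted fraction of passed checks, clamped to [0, 1]."""
    total = sum(c.weight for c in checks)
    if total <= 0:
        # No checkpoints (or zero total weight) means no credit.
        return 0.0
    earned = sum(c.weight for c in checks if c.passed)
    return max(0.0, min(1.0, earned / total))


# Assumed checkpoints and weights, for illustration only.
checks = [
    Check("read_cancellation_email", 0.25, True),
    Check("followed_airline_announcement", 0.25, True),
    Check("claim_submitted", 0.5, False),
]
reward = compute_reward(checks)
```

Under these assumed weights, an agent that finds the cancellation email and reads the airline's announcement but never submits the claim would earn half credit rather than zero.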

Results for This Task

Model               Avg Score   Attempts   All Passed
deepseek-v4-flash   0           3          No
deepseek-v4-pro     0           3          No
gpt-5.5             0           3          No
qwen3.5-27b         0           3          No
qwen3.5-flash       0           3          No
qwen3.5-397b-a17b   0           3          No
qwen3.6-27b         0           3          No
qwen3.6-flash       0           3          No
qwen3.6-plus        0           3          No

Public Trajectories

Run trajectories for this task are hosted on HuggingFace.