schedule-change-request Case #14

Medium | Domain: Calendar & Task Mgmt | airline, email, todolist

User Instruction

I'm attending my brother's wedding this Sunday, so I won't be able to attend any other events. Please check my schedule on the todolist app (http://localhost:3000/, open it in your browser), and if anyone is going with me, please email them (use my company's email system, http://localhost:5174/, open it in your browser) to let them know why I can't attend and apologize.

Task Description

A change in the user's plans alters their schedule. The agent (OpenClaw) must find the conflicting entries on the calendar, then email the people involved to tell them about the change and explain the reason.
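The workflow described above (find the entries that clash with the blocked day, then notify whoever was due to attend with the user) can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Event` shape, the `companions` field, and both helper functions are hypothetical, since the case does not show the todolist app's data format or the email system's API.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Event:
    # Hypothetical shape; the real todolist app's fields are not shown in this case.
    title: str
    day: date
    companions: list[str] = field(default_factory=list)  # assumed: email addresses


def conflicting_events(events, blocked_day):
    """Return every event scheduled on the blocked day."""
    return [e for e in events if e.day == blocked_day]


def draft_apologies(events, blocked_day, reason):
    """Build one (recipient, subject, body) draft per companion of each conflicting event."""
    drafts = []
    for event in conflicting_events(events, blocked_day):
        for recipient in event.companions:
            subject = f"Unable to attend: {event.title}"
            body = (
                f"I'm sorry, but I can't make '{event.title}' on "
                f"{blocked_day.isoformat()} because {reason}."
            )
            drafts.append((recipient, subject, body))
    return drafts
```

Events with no companions produce no drafts, which matches the instruction's "if anyone is going with me" condition.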

Complexity Factors

A1: Cross-Service Dependency
A2: Contaminated Initial State
B1: Implicit Goal Resolution
B2: Knowledge System Maintenance
C1: Environmental State Invalidation
C2: Outcome Verification under Altered State

Evaluation

Verifier Type: verify.py
Partial Credit: Yes
Reward Range: 0 – 1
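To illustrate how a verifier with partial credit might map checks onto the 0 – 1 reward range, here is a minimal sketch. It is not the task's actual verify.py, whose contents are not shown here; the `score` function, the check names, and the equal weighting are all assumptions made for the example.

```python
def score(checks):
    """Return the fraction of passed checks as a reward in [0, 1].

    `checks` maps a hypothetical check name to True/False.
    Equal weighting is an assumption, not the benchmark's documented rule.
    """
    if not checks:
        return 0.0
    passed = sum(1 for ok in checks.values() if ok)
    return passed / len(checks)


# Hypothetical check names, for illustration only.
example_checks = {
    "email_sent_to_companion": True,
    "email_mentions_reason": True,
    "email_contains_apology": True,
    "no_email_to_unrelated_people": False,
}
reward = score(example_checks)
```

With three of the four hypothetical checks passing, `reward` is 0.75.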

Results for This Task

Model               Avg Score   Attempts   All Passed
deepseek-v4-pro     1           3
gpt-5.5             1           3
qwen3.6-flash       1           3
qwen3.6-plus        0.75        3
deepseek-v4-flash   0.25        3
qwen3.5-27b         0.25        3
qwen3.5-397b-a17b   0.25        3
qwen3.6-27b         0.25        3
qwen3.5-flash       0.167       3

Public Trajectories

Run trajectories for this task live on HuggingFace.