email-reply Case #7

Difficulty: Easy
Domain: Communication & Email
Tag: email

User Instruction

Please check if I have received an email from parrot.ai in my company email system website (http://localhost:5174/, open it in your browser). They told me a few days ago that they would compile their recent requirements and send them to me. If you have indeed received their email, please reply to them to let them know that we have received it and will handle it properly.

Task Description

Have OpenClaw check whether an expected email exists; if it does, compose and send a reply.

Complexity Factors

A1: Cross-Service Dependency
A2: Contaminated Initial State
B1: Implicit Goal Resolution
B2: Knowledge System Maintenance
C1: Environmental State Invalidation
C2: Outcome Verification under Altered State

Evaluation

Verifier Type: verify.py
Partial Credit: Yes
Reward Range: 0 – 1
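
To illustrate how a partial-credit verifier can map an outcome onto the 0 – 1 reward range, here is a minimal sketch. It is not the task's actual verify.py: the `Email` record, the `score_reply` function, the 0.5/0.5 weighting, and the keyword check are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Email:
    """Hypothetical message record; the real mail system's schema is unknown."""
    sender: str
    recipient: str
    subject: str
    body: str


def score_reply(inbox: list[Email], sent: list[Email], domain: str = "parrot.ai") -> float:
    """Return a reward in [0, 1], granting partial credit (assumed weights)."""
    suffix = "@" + domain

    # Step 1: look for the expected incoming email from the given domain.
    expected = [m for m in inbox if m.sender.lower().endswith(suffix)]
    if not expected:
        # Assumption: with no such email, the correct behavior is to send no reply.
        replied = any(m.recipient.lower().endswith(suffix) for m in sent)
        return 0.0 if replied else 1.0

    # Step 2: look for a sent message addressed to one of those senders.
    senders = {m.sender.lower() for m in expected}
    replies = [m for m in sent if m.recipient.lower() in senders]
    if not replies:
        return 0.0

    # Half credit for replying to the right correspondent (assumed weight).
    score = 0.5

    # Remaining credit if the reply acknowledges receipt (naive keyword check).
    text = " ".join(m.body.lower() for m in replies)
    if "received" in text:
        score += 0.5
    return score
```

With one email from `pm@parrot.ai` in the inbox, a reply stating that the message was received scores 1.0, a reply without any acknowledgement scores 0.5, and no reply scores 0.0.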

Results for This Task

Model               Avg Score   Attempts   All Passed
deepseek-v4-flash   1           3
deepseek-v4-pro     1           3
gpt-5.5             1           3
qwen3.6-27b         1           3
qwen3.6-flash       1           3
qwen3.6-plus        1           3
qwen3.5-397b-a17b   0.667       3
qwen3.5-27b         0.333       3
qwen3.5-flash       0           3

Public Trajectories

Run trajectories for this task are published on HuggingFace.