conflict-repair-acb Case #27

Difficulty: Easy | Domain: Documents & Knowledge | Tag: doc-search

User Instruction

Your workspace notes about speculative decoding contain stale claims. Review the local corpus in `corpus/` and use the browser portal in `tools/` to find stronger evidence, then repair your notes and write `~/.openclaw/output/result.json`.

Task Description

The old durable notes contain obvious errors. Use the local materials and the internal browser portal to separate the key mechanisms of Adaptive Cache Bridging from the incorrect shorthand recorded in those notes.

Complexity Factors

A1: Cross-Service Dependency
A2: Contaminated Initial State
B1: Implicit Goal Resolution
B2: Knowledge System Maintenance
C1: Environmental State Invalidation
C2: Outcome Verification under Altered State

Evaluation

Verifier Type: llm_judge.py
Partial Credit: Yes
Reward Range: 0 – 1

LLM Judge Task

This task uses an LLM-based judge for evaluation, which requires judge credentials to run.
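
The reward range and partial-credit setting above, together with the Avg Score column in the results, suggest how per-attempt rewards might be aggregated. The following is a minimal sketch only: it assumes the average is a plain arithmetic mean of per-attempt rewards clamped to the 0 to 1 range, and the function names `clamp_reward` and `average_score` are hypothetical, not taken from `llm_judge.py`.

```python
from statistics import mean


def clamp_reward(raw: float) -> float:
    """Clamp a judge reward into the documented 0-1 range (assumed behavior)."""
    return max(0.0, min(1.0, raw))


def average_score(attempt_rewards: list[float]) -> float:
    """Arithmetic mean of clamped per-attempt rewards, rounded to 3 decimals.

    Assumption: the task card's "Avg Score" is a simple mean over attempts;
    the real aggregation used by the benchmark may differ.
    """
    if not attempt_rewards:
        raise ValueError("need at least one attempt")
    return round(mean(clamp_reward(r) for r in attempt_rewards), 3)


if __name__ == "__main__":
    # Illustrative rewards only; these are not real per-attempt results.
    print(average_score([1.0, 0.5, 0.75]))
```

With partial credit enabled, an attempt can contribute any value in the range rather than only 0 or 1, which is why a mean over three attempts can land on values such as those in the results table.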

Results for This Task

Model              Avg Score  Attempts  All Passed
qwen3.5-27b        0.979      3
qwen3.6-27b        0.954      3
qwen3.6-plus       0.892      3
qwen3.5-397b-a17b  0.888      3
deepseek-v4-flash  0.862      3
gpt-5.5            0.821      3
deepseek-v4-pro    0.792      3
qwen3.6-flash      0.763      3
qwen3.5-flash      0.649      3

Public Trajectories

Run trajectories for this task live on HuggingFace.
