conflict-repair-acb Case #27

Difficulty: Easy

Domain: Documents & Knowledge

Tag: doc-search

User Instruction

Your workspace notes about speculative decoding contain stale claims. Review the local corpus in `corpus/` and use the browser portal in `tools/` to find stronger evidence, then repair your notes and write `~/.openclaw/output/result.json`.

Task Description

Old durable notes contain obvious errors. Use the local materials and the internal browser portal to separate Adaptive Cache Bridging's key mechanisms from incorrect shorthand.

Complexity Factors

✓ Cross-Service Dependency
✗ Contaminated Initial State
✗ Implicit Goal Resolution
✓ Knowledge System Maintenance
✗ Environmental State Invalidation
✗ Outcome Verification under Altered State

Evaluation

Verifier Type: llm_judge.py

Partial Credit: Yes

Reward Range: 0 – 1

LLM Judge Task

This task uses an LLM-based judge for evaluation, which requires judge credentials to run.

Results for This Task

Model	Avg Score	Attempts	All Passed
qwen3.5-27b	0.979	3	✗
qwen3.6-27b	0.954	3	✗
qwen3.6-plus	0.892	3	✗
qwen3.5-397b-a17b	0.888	3	✗
deepseek-v4-flash	0.862	3	✗
gpt-5.5	0.821	3	✗
deepseek-v4-pro	0.792	3	✗
qwen3.6-flash	0.763	3	✗
qwen3.5-flash	0.649	3	✗

Public Trajectories

Run trajectories for this task live on HuggingFace.


Source Files

task.toml: tasks/conflict-repair-acb/task.toml

instruction: tasks/conflict-repair-acb/instruction.md

environment: tasks/conflict-repair-acb/environment/Dockerfile

test: tasks/conflict-repair-acb/tests/test.sh