‘God of Football’ Messi knocked down by a spectator who ran onto the pitch mid-match
if (n <= 1) return;
Agents can skew the benchmark results in two subtle ways that wouldn’t count as cheating or gaming the benchmark: a) implementing a form of caching, so the benchmark tests are no longer independent, and b) launching benchmarks in parallel on the same system. I eventually added AGENTS.md rules intended to prevent both. ↩︎
The treeboost crate beat the agent-optimized GBT crate by 4x in my first comparison test, at which I naturally took offense: I asked Opus 4.6 to “Optimize the crate such that rust_gbt wins in ALL benchmarks against treeboost.” and it did just that. ↩︎
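On the eve of the aerials final, Eileen Gu (谷愛凌), having advanced to the final, used Instagram to reveal a problem with the competition schedule.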