KicktippAi experiment analysis
match-predictions/bundesliga-2025-26/pes-squad/slices/all-matchdays/random-16-seed-578661
Summary
Run ranking
| Rank | Run | Model | Primary metric |
|---|---|---|---|
| 1 | task-5__pes-squad__o3__prompt-v1__-12h__2026-04-03t23-43-14z | o3 | 23.0000 |
| 2 | task-5__pes-squad__gpt-5-nano__prompt-v1__-12h__2026-04-03t23-43-14z | gpt-5-nano | 18.0000 |
Two-run comparison
not significant
Effect size confidence intervals
| Statistic | Point estimate | Low | High |
|---|---|---|---|
| Mean difference | 0.3125 | -0.2500 | 0.7500 |
| Median difference | 0.0000 | 0.0000 | 0.0000 |
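The report does not say how these intervals were computed. As a rough illustration only, the sketch below uses a percentile bootstrap over paired per-item point differences, which is one common way to obtain such intervals. The function name `bootstrap_ci`, its parameters, and the `diffs` values are all hypothetical. The differences are invented so that their mean equals 5/16 = 0.3125; they are not the experiment's actual per-item values.

```python
import random
import statistics


def bootstrap_ci(diffs, stat, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `stat` over paired differences.

    Assumption: this is an illustrative method, not necessarily the one
    used to produce the table above.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(diffs)
    # Resample the differences with replacement and recompute the statistic.
    estimates = sorted(
        stat(rng.choices(diffs, k=n)) for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return stat(diffs), estimates[lo_idx], estimates[hi_idx]


# Hypothetical per-item differences (first run minus second run), 16 items.
diffs = [2, 0, 0, -1, 0, 3, 0, 0, 1, 0, -2, 0, 2, 0, 0, 0]

mean_point, mean_low, mean_high = bootstrap_ci(diffs, statistics.mean)
median_point, median_low, median_high = bootstrap_ci(diffs, statistics.median)

print(f"mean:   {mean_point:.4f} [{mean_low:.4f}, {mean_high:.4f}]")
print(f"median: {median_point:.4f} [{median_low:.4f}, {median_high:.4f}]")
```

An interval for the mean difference that contains zero, as in the table above, is consistent with a "not significant" result.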
Per-item win/tie/loss counts compare the two runs' Kicktipp points item by item: points are paired on each prepared dataset item and compared in the listed run order.
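A minimal sketch of such a paired count, assuming a win means the first-listed run scored strictly more points on an item. The function name `win_tie_loss` and the point values are hypothetical and are not taken from the experiment.

```python
def win_tie_loss(points_a, points_b):
    """Count per-item wins, ties, and losses of run A against run B.

    Both lists must be aligned so that index i refers to the same
    dataset item in each run.
    """
    if len(points_a) != len(points_b):
        raise ValueError("both runs must cover the same items")
    counts = {"win": 0, "tie": 0, "loss": 0}
    for a, b in zip(points_a, points_b):
        if a > b:
            counts["win"] += 1
        elif a < b:
            counts["loss"] += 1
        else:
            counts["tie"] += 1
    return counts


# Hypothetical Kicktipp points for four items, in the listed run order.
run_a = [2, 0, 4, 1]
run_b = [0, 0, 2, 3]

result = win_tie_loss(run_a, run_b)
print(result)  # {'win': 2, 'tie': 1, 'loss': 1}
```

Swapping the argument order swaps the win and loss counts, which is why the listed run ordering matters when reading them.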