KicktippAi experiment analysis

match-predictions/bundesliga-2025-26/pes-squad/slices/all-matchdays/random-16-seed-578661

Task: task-5 Primary metric: total_kicktipp_points Runs: 2 Pairings: 16

Summary

Dataset: match-predictions/bundesliga-2025-26/pes-squad/slices/all-matchdays/random-16-seed-578661
Task type: task-5
Primary metric: total_kicktipp_points
Alpha: 0.0500

Run ranking

Rank  Run                                                                   Model       Primary metric
1     task-5__pes-squad__o3__prompt-v1__-12h__2026-04-03t23-43-14z          o3          23.0000
2     task-5__pes-squad__gpt-5-nano__prompt-v1__-12h__2026-04-03t23-43-14z  gpt-5-nano  18.0000

Two-run comparison

not significant
Better run: task-5__pes-squad__o3__prompt-v1__-12h__2026-04-03t23-43-14z
Other run: task-5__pes-squad__gpt-5-nano__prompt-v1__-12h__2026-04-03t23-43-14z
total_kicktipp_points delta: 5.0000
Wilcoxon p-value: 0.2763
Mean difference: 0.3125
Median difference: 0.0000
Per-item W/T/L: 2/13/1
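
The figures above are mutually consistent, which can be checked with a little arithmetic. The sketch below assumes (the report does not state this) that total_kicktipp_points is the sum of per-item points, so that the mean difference equals the delta divided by the number of pairings. The variable names are illustrative only.

```python
# Values copied from the report; variable names are illustrative.
primary_o3 = 23.0          # total_kicktipp_points for the o3 run
primary_gpt5_nano = 18.0   # total_kicktipp_points for the gpt-5-nano run
pairings = 16
wins, ties, losses = 2, 13, 1

# Assumption: the primary metric is a sum over paired items, so the
# mean per-item difference is the total delta divided by the pairings.
delta = primary_o3 - primary_gpt5_nano
mean_difference = delta / pairings

# The win/tie/loss counts should account for every pairing.
assert wins + ties + losses == pairings

print(f"delta={delta:.4f} mean_difference={mean_difference:.4f}")
# prints: delta=5.0000 mean_difference=0.3125
```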

Effect size confidence intervals

Statistic          Point estimate  Low      High
Mean difference    0.3125          -0.2500  0.7500
Median difference  0.0000          0.0000   0.0000

Per-item win/tie/loss counts compare paired Kicktipp points for the listed run ordering on each prepared dataset item.
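
As a minimal sketch of how such counts could be derived, the function below compares two equally long lists of per-item points in the listed run ordering. The function name and the sample scores are hypothetical and are not taken from this experiment; the report does not describe its actual implementation.

```python
from typing import Sequence, Tuple


def win_tie_loss(first: Sequence[float], second: Sequence[float]) -> Tuple[int, int, int]:
    """Count items where the first run scores higher, equal, or lower."""
    if len(first) != len(second):
        raise ValueError("paired runs must cover the same items")
    wins = sum(a > b for a, b in zip(first, second))
    ties = sum(a == b for a, b in zip(first, second))
    losses = sum(a < b for a, b in zip(first, second))
    return wins, ties, losses


# Hypothetical toy scores for five items (NOT the data behind this report).
run_a = [4, 0, 2, 0, 3]
run_b = [2, 0, 2, 3, 3]
print(win_tie_loss(run_a, run_b))  # prints: (1, 3, 1)
```

Swapping the argument order swaps the win and loss counts, which is why the counts are tied to the listed run ordering.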