KicktippAi experiment analysis

match-predictions/bundesliga-2025-26/ehonda-ai-arena/community-to-date/through-md28/community-to-date-md28

Task: community-to-date
Primary metric: total_kicktipp_points
Runs: 5
Pairings: 252

Summary

Dataset: match-predictions/bundesliga-2025-26/ehonda-ai-arena/community-to-date/through-md28/community-to-date-md28
Task type: community-to-date
Primary metric: total_kicktipp_points
Alpha: 0.0500

Community standings

Rank  Participant  Kicktipp Points  p-value vs baseline
1     o3           346
2     gpt-5        336
3     gpt-5-nano   331
4     o4-mini      329
5     gpt-5-mini   328

Multi-run comparison

Friedman p-value: 0.7401
Run A       Run B       Point diff  p-value  Adj. p-value  Significant  Win/Tie/Loss
o3          gpt-5       10.0000     0.7767   1.0000        no           29/199/24
o3          gpt-5-nano  15.0000     0.4962   1.0000        no           42/175/35
o3          o4-mini     17.0000     0.4686   1.0000        no           33/195/24
o3          gpt-5-mini  18.0000     0.4106   1.0000        no           35/191/26
gpt-5       gpt-5-nano  5.0000      0.6785   1.0000        no           36/183/33
gpt-5       o4-mini     7.0000      0.6530   1.0000        no           28/200/24
gpt-5       gpt-5-mini  8.0000      0.7128   1.0000        no           34/191/27
gpt-5-nano  o4-mini     2.0000      0.8791   1.0000        no           32/186/34
gpt-5-nano  gpt-5-mini  3.0000      0.7823   1.0000        no           36/177/39
o4-mini     gpt-5-mini  1.0000      0.9403   1.0000        no           23/205/24
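
The first numeric value in each pairwise row appears to be the difference in total Kicktipp points between the two listed runs, and each win/tie/loss triple appears to sum to the 252 pairings. The following is a minimal Python sketch that checks both observations against the figures reported above. The variable and function names are illustrative assumptions and are not part of the report or its tooling.

```python
from itertools import combinations

# Total Kicktipp points per run, copied from the community standings.
totals = {
    "o3": 346,
    "gpt-5": 336,
    "gpt-5-nano": 331,
    "o4-mini": 329,
    "gpt-5-mini": 328,
}

# Win/tie/loss triples copied from the pairwise comparison rows.
wtl = {
    ("o3", "gpt-5"): (29, 199, 24),
    ("o3", "gpt-5-nano"): (42, 175, 35),
    ("o3", "o4-mini"): (33, 195, 24),
    ("o3", "gpt-5-mini"): (35, 191, 26),
    ("gpt-5", "gpt-5-nano"): (36, 183, 33),
    ("gpt-5", "o4-mini"): (28, 200, 24),
    ("gpt-5", "gpt-5-mini"): (34, 191, 27),
    ("gpt-5-nano", "o4-mini"): (32, 186, 34),
    ("gpt-5-nano", "gpt-5-mini"): (36, 177, 39),
    ("o4-mini", "gpt-5-mini"): (23, 205, 24),
}


def point_differences(points):
    """Return {(a, b): points[a] - points[b]} for every pair, in standings order."""
    return {(a, b): points[a] - points[b] for a, b in combinations(points, 2)}


diffs = point_differences(totals)
for pair, diff in diffs.items():
    # Print the pair, its point difference, and the number of compared items.
    print(pair, diff, sum(wtl[pair]))
```

Note that the point difference is not the same as wins minus losses, because a single item can contribute more than one point to the total.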

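Win/tie/loss counts are computed per item: for each prepared dataset item, the Kicktipp points of the first-listed run are compared with those of the second-listed run, so a win means the first-listed run scored more points on that item.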
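
As a rough illustration of the per-item counting described above, the sketch below tallies wins, ties, and losses from two equally long lists of per-item points. The function name `win_tie_loss` and the sample point values are hypothetical. The report does not publish per-item points, so the data here is made up purely to show the mechanics.

```python
def win_tie_loss(points_a, points_b):
    """Count items where run A scored more than, the same as, or less than run B."""
    if len(points_a) != len(points_b):
        raise ValueError("both runs must cover the same dataset items")
    wins = sum(a > b for a, b in zip(points_a, points_b))
    ties = sum(a == b for a, b in zip(points_a, points_b))
    losses = sum(a < b for a, b in zip(points_a, points_b))
    return wins, ties, losses


# Hypothetical per-item Kicktipp points for two runs (not taken from the report).
run_a = [4, 0, 2, 3, 0, 2]
run_b = [2, 0, 2, 4, 0, 0]

print(win_tie_loss(run_a, run_b))  # (2, 3, 1)
print(win_tie_loss(run_b, run_a))  # (1, 3, 2): swapping the order swaps wins and losses
```

Because the counts depend on which run is listed first, reversing a pair swaps the win and loss counts while leaving the ties unchanged.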