KicktippAi experiment analysis

gpt-5.5 (xhigh) vs gpt-5-nano vs gpt-5.5 (none)

match-predictions/bundesliga-2025-26/pes-squad/repeated-match/md01-fc-bayern-munchen-vs-rb-leipzig/repeat-25-knowledge-cutoff-bayern-rbl-md1

Task: repeated-match
Primary metric: avg_kicktipp_points
Runs: 3
Pairings: 25

At a glance

Match to predict

FC Bayern München vs RB Leipzig

Matchday 1, 2025-08-22T21:30:00 UTC+02
Actual outcome: FC Bayern München 6 - 0 RB Leipzig

Prediction distribution

gpt-5.5 (xhigh) n=25
3:1 20
2:1 3
6:0 2
gpt-5-nano n=25
2:1 20
3:1 4
3:2 1
gpt-5.5 (none) n=25
3:1 25

100x low follow-up

Exact 6:0: 5 / 100

A later gpt-5.5 (low) run repeats the same source match, hosted prompt route, and exact pre-kickoff evaluation time on a 100x repeated-match dataset. It is published as a separate single-run page because this report is a paired 25x comparison.

Follow-up report: gpt-5.5 (low) 100x knowledge cutoff follow-up. Companion writeup: knowledge-cutoff-bayern-rbl-repeated-match.md.

Summary

Dataset: match-predictions/bundesliga-2025-26/pes-squad/repeated-match/md01-fc-bayern-munchen-vs-rb-leipzig/repeat-25-knowledge-cutoff-bayern-rbl-md1
Task type: repeated-match
Primary metric: avg_kicktipp_points
Alpha: 0.0500

Friedman test across all paired runs; pairwise Wilcoxon signed-rank tests use the Holm correction, with bootstrap confidence intervals for paired differences.
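As an illustration of the Holm step mentioned above, here is a minimal sketch in plain Python. The function name `holm_adjust` is hypothetical and is not taken from the KicktippAi code; the input is the list of raw pairwise p-values shown in the multi-run comparison of this report.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original order."""
    m = len(p_values)
    # Visit the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted p-values must not decrease along the sorted order.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


raw = [0.1573, 0.1573, 1.0]
print([round(p, 4) for p in holm_adjust(raw)])  # [0.4719, 0.4719, 1.0]
```

With three comparisons, the two raw p-values of 0.1573 both adjust to 0.4719, which agrees with the adjusted values listed in this report.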

Dataset metadata

Bundesliga 2025/26 opening match, FC Bayern München vs RB Leipzig on matchday 1, ended 6:0. Repeated-match dataset for probing whether models with knowledge after the fixture reproduce the exact known outcome.

Fixture: FC Bayern München vs RB Leipzig
Actual Result: FC Bayern München 6 - 0 RB Leipzig
Matchday: 1
Repetitions: 25
Why Interesting: Bundesliga 2025/26 opening match, FC Bayern München vs RB Leipzig on matchday 1, ended 6:0. Repeated-match dataset for probing whether models with knowledge after the fixture reproduce the exact known outcome.
Competition: bundesliga-2025-26
Community: pes-squad
Season: 2025/2026
Slice: repeat-25-knowledge-cutoff-bayern-rbl-md1
Source Pool: md01-fc-bayern-munchen-vs-rb-leipzig
Sample Size: 25
Sample Method: repeated-match
Scope: repeated-match
Slice Kind: repeated-match
Source Dataset: match-predictions/bundesliga-2025-26/pes-squad

Run ranking

Rank  Run              Model            Primary metric
1     gpt-5.5 (xhigh)  gpt-5.5 (xhigh)  2.1600
2     gpt-5-nano       gpt-5-nano       2.0000
3     gpt-5.5 (none)   gpt-5.5 (none)   2.0000
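The ranking values can be recomputed from the prediction distribution listed earlier in this report. The sketch below assumes a scoring rule of 4 points for the exact result, 3 for the correct goal difference, and 2 for the correct tendency. That rule is an assumption, since the report does not state the community's scoring, although it does reproduce the three averages shown. The names `kicktipp_points` and `average_points` are hypothetical.

```python
ACTUAL = (6, 0)  # FC Bayern München 6 - 0 RB Leipzig

# Prediction counts copied from the "Prediction distribution" section.
XHIGH = {(3, 1): 20, (2, 1): 3, (6, 0): 2}
NANO = {(2, 1): 20, (3, 1): 4, (3, 2): 1}
NONE = {(3, 1): 25}


def sign(x):
    return (x > 0) - (x < 0)


def kicktipp_points(pred, actual, exact=4, diff=3, tendency=2):
    """Score one prediction under the ASSUMED 4/3/2 rule (not from the report)."""
    if pred == actual:
        return exact
    pred_diff = pred[0] - pred[1]
    actual_diff = actual[0] - actual[1]
    if pred_diff == actual_diff and pred_diff != 0:
        return diff
    if sign(pred_diff) == sign(actual_diff):
        return tendency
    return 0


def average_points(distribution, actual=ACTUAL):
    """Mean points over all repetitions in a {scoreline: count} mapping."""
    total = sum(kicktipp_points(p, actual) * n for p, n in distribution.items())
    return total / sum(distribution.values())


for name, dist in [("gpt-5.5 (xhigh)", XHIGH), ("gpt-5-nano", NANO), ("gpt-5.5 (none)", NONE)]:
    print(name, f"{average_points(dist):.4f}")
```

Under this assumption every non-exact home-win prediction earns 2 points, so the only difference between the runs is the two exact 6:0 predictions from gpt-5.5 (xhigh): (23 × 2 + 2 × 4) / 25 = 2.16.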

Multi-run comparison

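Friedman p-value: 0.1353

Run A            Run B           Mean difference  p-value  Holm-adjusted p-value  Significant  Win/tie/loss
gpt-5.5 (xhigh)  gpt-5-nano      0.1600           0.1573   0.4719                 no           2/23/0
gpt-5.5 (xhigh)  gpt-5.5 (none)  0.1600           0.1573   0.4719                 no           2/23/0
gpt-5-nano       gpt-5.5 (none)  0.0000           1.0000   1.0000                 no           0/25/0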

Per-item win/tie/loss counts compare paired Kicktipp points for the listed run ordering on each prepared dataset item.
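The win/tie/loss counts and the Friedman p-value can be checked by hand with a short sketch. It assumes per-item points of 4 for an exact 6:0 and 2 for every other prediction in this report (an assumed scoring rule, not stated in the source). Because gpt-5-nano and gpt-5.5 (none) then score 2 on every item, the result does not depend on how repetitions are paired. The function names are hypothetical, and the Friedman statistic uses the standard tie-corrected formula with exact fractions.

```python
import math
from fractions import Fraction

# Per-item points as (xhigh, nano, none), ASSUMING 4 for exact and 2 for tendency.
ITEMS = [(4, 2, 2)] * 2 + [(2, 2, 2)] * 23


def win_tie_loss(a, b):
    """Count items where run a scores above, equal to, or below run b."""
    wins = sum(x > y for x, y in zip(a, b))
    ties = sum(x == y for x, y in zip(a, b))
    losses = sum(x < y for x, y in zip(a, b))
    return wins, ties, losses


def average_ranks(row):
    """Rank values within one item (1 = lowest); tied values share the mean rank."""
    ranks = []
    for v in row:
        less = sum(u < v for u in row)
        equal = sum(u == v for u in row)
        ranks.append(Fraction(2 * less + equal + 1, 2))
    return ranks


def friedman_statistic(rows):
    """Tie-corrected Friedman chi-square; undefined if every item is fully tied."""
    n, k = len(rows), len(rows[0])
    rank_sums = [Fraction(0)] * k
    tie_term = 0
    for row in rows:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
        for v in set(row):
            t = row.count(v)
            tie_term += t**3 - t
    raw = Fraction(12, n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    correction = 1 - Fraction(tie_term, n * k * (k * k - 1))
    return raw / correction


xhigh, nano, none = (list(col) for col in zip(*ITEMS))
print(win_tie_loss(xhigh, nano))  # (2, 23, 0)
print(win_tie_loss(nano, none))   # (0, 25, 0)

chi2 = friedman_statistic(ITEMS)
# With 3 runs there are 2 degrees of freedom, where the chi-square
# survival function reduces to exp(-x / 2).
p_value = math.exp(-float(chi2) / 2)
print(chi2, round(p_value, 4))    # 4 0.1353
```

The statistic comes out at exactly 4, and exp(-2) ≈ 0.1353 matches the Friedman p-value reported above.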