KicktippAi experiment analysis

gpt-5.5 (xhigh) vs gpt-5-nano vs gpt-5.5 (none)

match-predictions/bundesliga-2025-26/pes-squad/repeated-match/md01-fc-bayern-munchen-vs-rb-leipzig/repeat-25-knowledge-cutoff-bayern-rbl-md1

Task: repeated-match; Primary metric: avg_kicktipp_points; Runs: 3; Pairings: 25

At a glance

Match to predict

FC Bayern München vs RB Leipzig

Matchday 1, 2025-08-22T21:30:00 UTC+02

Actual outcome: FC Bayern München 6 - 0 RB Leipzig

Prediction distribution

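Run	Predicted score	Count
gpt-5.5 (xhigh), n=25	3:1	20
gpt-5.5 (xhigh), n=25	2:1	3
gpt-5.5 (xhigh), n=25	6:0	2
gpt-5-nano, n=25	2:1	20
gpt-5-nano, n=25	3:1	4
gpt-5-nano, n=25	3:2	1
gpt-5.5 (none), n=25	3:1	25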

100x low follow-up

Exact 6:0: 5 / 100

A later gpt-5.5 (low) run repeats the same source match, hosted prompt route, and exact pre-kickoff evaluation time on a 100x repeated-match dataset. It is published as a separate single-run page because this report is a paired 25x comparison.

Follow-up report: gpt-5.5 (low) 100x knowledge cutoff follow-up. Companion writeup: knowledge-cutoff-bayern-rbl-repeated-match.md.

Summary

Dataset: match-predictions/bundesliga-2025-26/pes-squad/repeated-match/md01-fc-bayern-munchen-vs-rb-leipzig/repeat-25-knowledge-cutoff-bayern-rbl-md1

Task type: repeated-match

Primary metric: avg_kicktipp_points

Alpha: 0.0500

A Friedman test is applied across all paired runs. Pairwise Wilcoxon signed-rank tests use Holm correction, and paired differences are reported with bootstrap confidence intervals.
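The Holm step can be illustrated with a short sketch. It assumes the standard step-down procedure (sort the raw p-values, multiply the i-th smallest by the number of hypotheses still in play, enforce monotonicity, cap at 1). The function name `holm_adjust` is hypothetical and not taken from the KicktippAi code. The inputs are the three raw pairwise p-values listed in the multi-run comparison table of this report.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, smallest first.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted p-values must not decrease along the sorted order.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Raw pairwise p-values as listed in this report's comparison table.
raw_p = [0.1573, 0.1573, 1.0000]
adjusted_p = [round(p, 4) for p in holm_adjust(raw_p)]
significant = [p < 0.05 for p in adjusted_p]
```

Under these assumptions the adjusted values come out as 0.4719, 0.4719 and 1.0, which agrees with the adjusted column of the table, and none falls below the alpha of 0.05.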

Dataset metadata

Bundesliga 2025/26 opening match, FC Bayern München vs RB Leipzig on matchday 1, ended 6:0. Repeated-match dataset for probing whether models with knowledge after the fixture reproduce the exact known outcome.

Field	Value
Fixture	FC Bayern München vs RB Leipzig
Actual Result	FC Bayern München 6 - 0 RB Leipzig
Matchday	1
Repetitions	25
Why Interesting	Bundesliga 2025/26 opening match, FC Bayern München vs RB Leipzig on matchday 1, ended 6:0. Repeated-match dataset for probing whether models with knowledge after the fixture reproduce the exact known outcome.
Competition	bundesliga-2025-26
Community	pes-squad
Season	2025/2026
Slice	repeat-25-knowledge-cutoff-bayern-rbl-md1
Source Pool	md01-fc-bayern-munchen-vs-rb-leipzig
Sample Size	25
Sample Method	repeated-match
Scope	repeated-match
Slice Kind	repeated-match
Source Dataset	match-predictions/bundesliga-2025-26/pes-squad

Run ranking

Rank	Run	Model	Primary metric
1	gpt-5.5 (xhigh)	gpt-5.5 (xhigh)	2.1600
2	gpt-5-nano	gpt-5-nano	2.0000
3	gpt-5.5 (none)	gpt-5.5 (none)	2.0000
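The ranking values can be recomputed from the prediction distribution. The sketch below assumes a scoring rule of 4 points for the exact score, 3 for the correct goal difference, and 2 for the correct tendency. The report does not state this rule, so treat it as an assumption. It does reproduce the three averages above. The function names are hypothetical.

```python
def sign(x):
    return (x > 0) - (x < 0)


def kicktipp_points(pred, actual):
    """Assumed rule: 4 = exact score, 3 = goal difference, 2 = tendency, 0 otherwise."""
    if pred == actual:
        return 4
    pred_diff = pred[0] - pred[1]
    actual_diff = actual[0] - actual[1]
    # Draws only ever earn tendency points in this sketch.
    if pred_diff == actual_diff and actual_diff != 0:
        return 3
    if sign(pred_diff) == sign(actual_diff):
        return 2
    return 0


def average_points(distribution, actual):
    """Average points over a {predicted score: count} distribution."""
    total = sum(kicktipp_points(pred, actual) * n for pred, n in distribution.items())
    return total / sum(distribution.values())


actual = (6, 0)
# Prediction counts copied from the "Prediction distribution" section.
runs = {
    "gpt-5.5 (xhigh)": {(3, 1): 20, (2, 1): 3, (6, 0): 2},
    "gpt-5-nano": {(2, 1): 20, (3, 1): 4, (3, 2): 1},
    "gpt-5.5 (none)": {(3, 1): 25},
}
averages = {name: round(average_points(dist, actual), 4) for name, dist in runs.items()}
```

With this rule, every home-win prediction earns 2 points and the two exact 6:0 predictions earn 4 each, so gpt-5.5 (xhigh) averages (23 × 2 + 2 × 4) / 25 = 2.16 and the other two runs average 2.0.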

Multi-run comparison

Friedman p-value 0.1353
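The Friedman p-value can be checked by hand. The sketch below is a plain-Python version of the tie-corrected Friedman statistic, not the report's own implementation. It assumes per-item points of 4 for the two exact 6:0 predictions and 2 for every other prediction, which is consistent with the reported averages. The item order is unknown, but it does not affect the result here because the two other runs score the same on every item. With three runs the statistic has 2 degrees of freedom, where the chi-square survival function is exactly exp(-x / 2).

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values ascending; tied values share their average rank."""
    ranks = []
    for v in values:
        less = sum(1 for w in values if w < v)
        equal = sum(1 for w in values if w == v)
        ranks.append(less + (equal + 1) / 2)
    return ranks


def friedman_chi_square(blocks):
    """Tie-corrected Friedman statistic; blocks is a list of per-item score tuples."""
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    tie_term = 0
    for block in blocks:
        for j, r in enumerate(average_ranks(block)):
            rank_sums[j] += r
        for count in Counter(block).values():
            tie_term += count * (count * count - 1)
    stat = 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    # The correction is zero (and the statistic undefined) if every block is fully tied.
    correction = 1 - tie_term / (n * k * (k * k - 1))
    return stat / correction


# Assumed per-item points as (xhigh, nano, none): two items 4/2/2, the rest 2/2/2.
blocks = [(4, 2, 2)] * 2 + [(2, 2, 2)] * 23
chi_square = friedman_chi_square(blocks)
p_value = math.exp(-chi_square / 2)  # valid only for 2 degrees of freedom (3 runs)
```

Under these assumptions the statistic is 4.0 and the p-value rounds to 0.1353, matching the value above.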


Run A	Run B	Mean difference	p-value	Holm-adjusted p-value	Significant	Win/Tie/Loss
gpt-5.5 (xhigh)	gpt-5-nano	0.1600	0.1573	0.4719	no	2/23/0
gpt-5.5 (xhigh)	gpt-5.5 (none)	0.1600	0.1573	0.4719	no	2/23/0
gpt-5-nano	gpt-5.5 (none)	0.0000	1.0000	1.0000	no	0/25/0

Per-item win/tie/loss counts compare paired Kicktipp points for the listed run ordering on each prepared dataset item.
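A minimal sketch of the per-item comparison follows, together with a percentile bootstrap for the mean paired difference. The per-item points are assumed (4 for the two exact predictions, 2 otherwise), the function names are hypothetical, and the resample count, seed, and percentile method are illustrative choices rather than the report's actual settings.

```python
import random


def win_tie_loss(a, b):
    """Count items where run a scores more than, the same as, or less than run b."""
    wins = sum(1 for x, y in zip(a, b) if x > y)
    ties = sum(1 for x, y in zip(a, b) if x == y)
    losses = sum(1 for x, y in zip(a, b) if x < y)
    return wins, ties, losses


def bootstrap_mean_diff_ci(a, b, resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of the paired differences."""
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    # Resample the paired differences with replacement and record each mean.
    means = sorted(
        sum(rng.choice(diffs) for _ in range(n)) / n for _ in range(resamples)
    )
    lower = means[int((alpha / 2) * resamples)]
    upper = means[int((1 - alpha / 2) * resamples) - 1]
    return lower, upper


# Assumed per-item points; the item order is unknown but does not change the counts.
xhigh_points = [4, 4] + [2] * 23
nano_points = [2] * 25

counts = win_tie_loss(xhigh_points, nano_points)
mean_diff = sum(x - y for x, y in zip(xhigh_points, nano_points)) / len(nano_points)
ci_lower, ci_upper = bootstrap_mean_diff_ci(xhigh_points, nano_points)
```

Here `counts` is (2, 23, 0) and `mean_diff` is 0.16, in line with the first comparison row. The interval bounds depend on the seed and resample count, so they are not expected to reproduce the report's intervals exactly.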