KicktippAi experiment analysis

gpt-5.5 (xhigh) vs gpt-5.5 (none)

match-predictions/bundesliga-2025-26/pes-squad/repeated-match/md26-vfb-stuttgart-vs-rb-leipzig/repeat-25

Task: repeated-match
Primary metric: avg_kicktipp_points
Runs: 2
Pairings: 25

At a glance

not significant · p-value 0.1573
Match to predict

VfB Stuttgart vs RB Leipzig

Matchday 26 · 2026-03-15T19:30:00 UTC+01 (+01)
Actual outcome: VfB Stuttgart 1 - 0 RB Leipzig
Compact head to head

Not significant

p-value 0.1573
gpt-5.5 (xhigh)
3.0000 avg points
gpt-5.5 (none)
2.7600 avg points

Prediction distribution

gpt-5.5 (xhigh) n=25
2:1 25
gpt-5.5 (none) n=25
2:1 23
2:2 2
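
The averages above can be reproduced from the prediction distribution under an assumed scoring rule. The report does not state the community's point scheme; the sketch below assumes 4 points for the exact result, 3 for the correct goal difference of a non-draw, 2 for the correct tendency only, and 0 otherwise. The names `kicktipp_points` and `avg_points` are hypothetical helpers, not part of KicktippAi.

```python
def kicktipp_points(pred, actual):
    """Score one prediction under the ASSUMED 4/3/2/0 rule (not stated in the report)."""
    (ph, pa), (ah, aa) = pred, actual
    if (ph, pa) == (ah, aa):
        return 4  # exact result
    pred_diff, actual_diff = ph - pa, ah - aa
    if pred_diff == actual_diff and actual_diff != 0:
        return 3  # correct goal difference of a non-draw
    # Matching sign of the goal difference means the tendency is right.
    if (pred_diff > 0) == (actual_diff > 0) and (pred_diff < 0) == (actual_diff < 0):
        return 2
    return 0


def avg_points(distribution, actual):
    """Average points over a {(home, away): count} prediction distribution."""
    total = sum(kicktipp_points(pred, actual) * n for pred, n in distribution.items())
    return total / sum(distribution.values())


actual = (1, 0)                       # VfB Stuttgart 1 - 0 RB Leipzig
xhigh = {(2, 1): 25}                  # gpt-5.5 (xhigh)
none = {(2, 1): 23, (2, 2): 2}        # gpt-5.5 (none)

print(f"{avg_points(xhigh, actual):.4f}")  # 3.0000
print(f"{avg_points(none, actual):.4f}")   # 2.7600
```

Under this assumed rule, every 2:1 prediction earns 3 points for the correct goal difference and each 2:2 prediction earns 0, which is consistent with the reported averages.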

Summary

Dataset: match-predictions/bundesliga-2025-26/pes-squad/repeated-match/md26-vfb-stuttgart-vs-rb-leipzig/repeat-25
Task type: repeated-match
Primary metric: avg_kicktipp_points
Alpha: 0.0500

Significance is assessed with a paired Wilcoxon signed-rank test on per-item Kicktipp-point differences; bootstrap confidence intervals summarize the mean and median paired differences.
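
As an illustration of the test, here is a minimal standard-library sketch of a two-sided Wilcoxon signed-rank test. The report does not say which variant it uses; this sketch assumes zero differences are dropped, tied absolute differences receive average ranks, and the p-value comes from a normal approximation with tie correction and no continuity correction. The function name `wilcoxon_signed_rank_p` is hypothetical, and the per-item differences are reconstructed from the reported averages and win/tie/loss counts (two items at +3, 23 items at 0).

```python
import math


def wilcoxon_signed_rank_p(diffs):
    """Two-sided p-value, normal approximation (ASSUMED variant, see lead-in)."""
    nonzero = [d for d in diffs if d != 0]  # drop zero differences
    n = len(nonzero)
    if n == 0:
        return float("nan")  # no information when every pair ties

    # Rank |d| ascending, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(nonzero[order[j + 1]]) == abs(nonzero[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, nonzero) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability


# Reconstructed paired differences, gpt-5.5 (xhigh) minus gpt-5.5 (none).
diffs = [3, 3] + [0] * 23
p_value = wilcoxon_signed_rank_p(diffs)
print(f"{p_value:.4f}")  # 0.1573
```

With only two non-zero differences, the test has very little power, which helps explain why a 0.24-point gap in the average is not significant at alpha 0.05.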

Dataset metadata

Stuttgart's 1-0 Matchday 26 win over Leipzig was a close top-four clash where Stuttgart leapfrogged Leipzig.

Field            Value
Fixture          VfB Stuttgart vs RB Leipzig
Actual Result    VfB Stuttgart 1 - 0 RB Leipzig
Matchday         26
Repetitions      25
Why Interesting  Stuttgart's 1-0 Matchday 26 win over Leipzig was a close top-four clash where Stuttgart leapfrogged Leipzig.
Competition      bundesliga-2025-26
Community        pes-squad
Season           2025/2026
Slice            repeat-25
Source Pool      md26-vfb-stuttgart-vs-rb-leipzig
Sample Size      25
Sample Method    repeated-match
Scope            repeated-match
Slice Kind       repeated-match
Source Dataset   match-predictions/bundesliga-2025-26/pes-squad

Run ranking

Rank  Run              Model            Primary metric
1     gpt-5.5 (xhigh)  gpt-5.5 (xhigh)  3.0000
2     gpt-5.5 (none)   gpt-5.5 (none)   2.7600

Two-run comparison

not significant
Better run: gpt-5.5 (xhigh)
Other run: gpt-5.5 (none)
avg_kicktipp_points delta: 0.2400
Wilcoxon p-value: 0.1573
Mean difference: 0.2400
Median difference: 0.0000
Per-item W/T/L: 2/23/0

Effect size confidence intervals

Statistic          Point estimate  Low      High
Mean difference    0.2400          -0.1200  0.4800
Median difference  0.0000          0.0000   0.0000
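
The report does not name the bootstrap variant behind these intervals. Since no per-item difference is negative (0 losses), a plain percentile interval for the mean could not extend below zero, so the sketch below assumes a basic (reverse-percentile) bootstrap instead. That choice, the resample count, the seed, and the helper name `basic_bootstrap_ci` are all assumptions for illustration; the differences are reconstructed from the reported counts (two items at +3, 23 at 0).

```python
import random
import statistics


def basic_bootstrap_ci(diffs, stat, n_resamples=10_000, alpha=0.05, seed=0):
    """Basic (reverse-percentile) bootstrap interval for stat(diffs)."""
    rng = random.Random(seed)
    n = len(diffs)
    observed = stat(diffs)
    # Resample the paired differences with replacement and recompute the statistic.
    boot = sorted(stat(rng.choices(diffs, k=n)) for _ in range(n_resamples))
    lo_q = boot[int((alpha / 2) * (n_resamples - 1))]
    hi_q = boot[int((1 - alpha / 2) * (n_resamples - 1))]
    # Reflect the bootstrap quantiles around the observed statistic.
    return 2 * observed - hi_q, 2 * observed - lo_q


# Reconstructed paired differences, gpt-5.5 (xhigh) minus gpt-5.5 (none).
diffs = [3, 3] + [0] * 23

mean_ci = basic_bootstrap_ci(diffs, statistics.fmean)
median_ci = basic_bootstrap_ci(diffs, statistics.median)
print(f"{mean_ci[0]:.4f} {mean_ci[1]:.4f}")      # -0.1200 0.4800
print(f"{median_ci[0]:.4f} {median_ci[1]:.4f}")  # 0.0000 0.0000
```

Under these assumptions the sketch matches the table: the bootstrap means have a 2.5% quantile of 0.00 and a 97.5% quantile of 0.60, which reflect around 0.24 to give -0.12 and 0.48.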

Per-item win/tie/loss counts compare the two runs' paired Kicktipp points on each prepared dataset item, counted from the perspective of the first-listed run.
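
A minimal sketch of that per-item count follows. The per-item point lists are reconstructed from the reported averages (3 points on every item for gpt-5.5 (xhigh); 3 points on 23 items and 0 on 2 items for gpt-5.5 (none)). The item order and the helper name `win_tie_loss` are hypothetical.

```python
def win_tie_loss(first_points, second_points):
    """Count items where the first run scores more, the same, or fewer points."""
    pairs = list(zip(first_points, second_points))
    wins = sum(a > b for a, b in pairs)
    ties = sum(a == b for a, b in pairs)
    losses = sum(a < b for a, b in pairs)
    return wins, ties, losses


# Reconstructed per-item points; the ordering of items is an assumption.
xhigh_points = [3] * 25
none_points = [3] * 23 + [0] * 2

wtl = win_tie_loss(xhigh_points, none_points)
print("/".join(map(str, wtl)))  # 2/23/0
```

Swapping the argument order gives 0/23/2, so the counts only make sense alongside the stated run order.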