KicktippAi experiment analysis

gpt-5.5 (high) vs gpt-5.5 (medium)

match-predictions/bundesliga-2025-26/pes-squad/repeated-match-slices/all-matchdays-after-20251202t230000z/random-15x10-seed-20260517-after-20251203

Task: repeated-match-slice · Primary metric: avg_kicktipp_points · Runs: 2 · Pairings: 10

At a glance

Significant · p-value 0.0020

Compact head to head

gpt-5.5 (high): 19.5000 avg points
gpt-5.5 (medium): 15.0000 avg points

Prediction distribution

gpt-5.5 (high) n=150
2:1 50
1:2 48
0:1 16
2:0 11
0:3 10
1:0 10
1:1 3
0:2 2
gpt-5.5 (medium) n=150
1:1 55
2:1 47
1:2 17
2:0 12
0:3 10
0:2 5
1:3 3
3:1 1
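
One way to read the two distributions is to collapse each scoreline into its tendency (home win, draw, away win). The sketch below does that with the counts listed above; the names `HIGH`, `MEDIUM`, and `tendency_counts` are illustrative and not part of the report's tooling.

```python
from collections import Counter

# Scoreline counts copied from the prediction distribution (home:away -> count).
HIGH = {"2:1": 50, "1:2": 48, "0:1": 16, "2:0": 11,
        "0:3": 10, "1:0": 10, "1:1": 3, "0:2": 2}
MEDIUM = {"1:1": 55, "2:1": 47, "1:2": 17, "2:0": 12,
          "0:3": 10, "0:2": 5, "1:3": 3, "3:1": 1}


def tendency_counts(distribution):
    """Collapse scoreline counts into home-win / draw / away-win counts."""
    counts = Counter()
    for scoreline, n in distribution.items():
        home, away = (int(goals) for goals in scoreline.split(":"))
        if home > away:
            counts["home"] += n
        elif home < away:
            counts["away"] += n
        else:
            counts["draw"] += n
    return counts


for name, dist in (("gpt-5.5 (high)", HIGH), ("gpt-5.5 (medium)", MEDIUM)):
    print(name, dict(tendency_counts(dist)))
```

Read this way, gpt-5.5 (high) predicted a draw in 3 of 150 cases, while gpt-5.5 (medium) predicted a draw in 55 of 150.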

Matches

15 fixtures

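Per-match averages and scoreline distributions are descriptive only; no significance tests are run on individual matches. Each prediction line shows the predicted scoreline, the Kicktipp points it earned, and the number of repetitions that produced it.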

Match 1

1. FC Heidenheim 1846 0:4 FC Bayern München

Matchday 15 · 2025-12-21T17:30:00 UTC+01 (+01)
gpt-5.5 (high) n=10
2.0000 avg points
0:3 2pt 10
gpt-5.5 (medium) n=10
2.0000 avg points
0:3 2pt 10
Match 2

Borussia Dortmund 2:0 1899 Hoffenheim

Matchday 13 · 2025-12-07T17:30:00 UTC+01 (+01)
gpt-5.5 (high) n=10
2.0000 avg points
2:1 2pt 10
gpt-5.5 (medium) n=10
2.0000 avg points
2:1 2pt 10
Match 3

VfL Wolfsburg 3:1 1. FC Union Berlin

Matchday 13 · 2025-12-06T15:30:00 UTC+01 (+01)
gpt-5.5 (high) n=10
0.2000 avg points
1:2 0pt 8
0:1 0pt 1
2:1 2pt 1
gpt-5.5 (medium) n=10
0.0000 avg points
1:1 0pt 8
1:2 0pt 2
Match 4

FC Augsburg 2:0 Bayer 04 Leverkusen

Matchday 13 · 2025-12-06T15:30:00 UTC+01 (+01)
gpt-5.5 (high) n=10
0.0000 avg points
1:2 0pt 10
gpt-5.5 (medium) n=10
0.0000 avg points
1:2 0pt 7
1:3 0pt 3
Match 5

FC St. Pauli 1:1 RB Leipzig

Matchday 16 · 2026-01-27T20:30:00 UTC+01 (+01)
gpt-5.5 (high) n=10
0.0000 avg points
0:1 0pt 6
0:2 0pt 2
1:2 0pt 2
gpt-5.5 (medium) n=10
0.0000 avg points
0:2 0pt 5
1:2 0pt 5
Match 6

1. FC Heidenheim 1846 2:2 1. FC Köln

Matchday 16 · 2026-01-10T15:30:00 UTC+01 (+01)
gpt-5.5 (high) n=10
0.0000 avg points
1:2 0pt 7
0:1 0pt 2
2:1 0pt 1
gpt-5.5 (medium) n=10
1.8000 avg points
1:1 2pt 9
1:2 0pt 1
Match 7

VfB Stuttgart 1:0 SC Freiburg

Matchday 20 · 2026-02-01T15:30:00 UTC+01 (+01)
gpt-5.5 (high) n=10
3.0000 avg points
2:1 3pt 10
gpt-5.5 (medium) n=10
3.0000 avg points
2:1 3pt 10
Match 8

RB Leipzig 1:2 FSV Mainz 05

Matchday 20 · 2026-01-31T15:30:00 UTC+01 (+01)
gpt-5.5 (high) n=10
0.0000 avg points
2:1 0pt 10
gpt-5.5 (medium) n=10
0.0000 avg points
2:1 0pt 10
Match 9

SC Freiburg 3:3 Bayer 04 Leverkusen

Matchday 25 · 2026-03-07T15:30:00 UTC+01 (+01)
gpt-5.5 (high) n=10
0.0000 avg points
1:2 0pt 10
gpt-5.5 (medium) n=10
1.8000 avg points
1:1 2pt 9
1:2 0pt 1
Match 10

Hamburger SV 1:2 RB Leipzig

Matchday 24 · 2026-03-01T19:30:00 UTC+01 (+01)
gpt-5.5 (high) n=10
4.0000 avg points
1:2 4pt 10
gpt-5.5 (medium) n=10
0.4000 avg points
1:1 0pt 9
1:2 4pt 1
Match 11

VfB Stuttgart 4:0 Hamburger SV

Matchday 29 · 2026-04-12T18:30:00 UTC+02 (+02)
gpt-5.5 (high) n=10
2.0000 avg points
2:1 2pt 10
gpt-5.5 (medium) n=10
2.0000 avg points
2:1 2pt 10
Match 12

Bayer 04 Leverkusen 6:3 VfL Wolfsburg

Matchday 28 · 2026-04-04T16:30:00 UTC+02 (+02)
gpt-5.5 (high) n=10
2.0000 avg points
2:1 2pt 7
2:0 2pt 3
gpt-5.5 (medium) n=10
2.0000 avg points
2:1 2pt 7
2:0 2pt 2
3:1 2pt 1
Match 13

Bor. Mönchengladbach 2:2 1. FC Heidenheim 1846

Matchday 28 · 2026-04-04T16:30:00 UTC+02 (+02)
gpt-5.5 (high) n=10
0.0000 avg points
2:0 0pt 8
1:0 0pt 2
gpt-5.5 (medium) n=10
0.0000 avg points
2:0 0pt 10
Match 14

Bor. Mönchengladbach 2:0 FC St. Pauli

Matchday 26 · 2026-03-13T20:30:00 UTC+01 (+01)
gpt-5.5 (high) n=10
1.8000 avg points
1:0 2pt 8
1:1 0pt 1
2:1 2pt 1
gpt-5.5 (medium) n=10
0.0000 avg points
1:1 0pt 10
Match 15

FC St. Pauli 1:2 FSV Mainz 05

Matchday 32 · 2026-05-03T16:30:00 UTC+02 (+02)
gpt-5.5 (high) n=10
2.5000 avg points
0:1 3pt 7
1:1 0pt 2
1:2 4pt 1
gpt-5.5 (medium) n=10
0.0000 avg points
1:1 0pt 10
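
The per-prediction points listed for each match are consistent with a 4/3/2/0 scheme: 4 for the exact scoreline, 3 for the correct goal difference in a non-draw, 2 for the correct tendency, 0 otherwise. The report does not state the rule, so the sketch below treats it as an assumption inferred from the listed points; `kicktipp_points` and `average_points` are hypothetical helper names.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def kicktipp_points(pred, actual):
    """Score one prediction; both arguments are (home_goals, away_goals).

    Assumed rule (inferred from the report, not stated in it):
    4 = exact scoreline, 3 = correct goal difference (non-draw),
    2 = correct tendency, 0 = wrong tendency.
    """
    if pred == actual:
        return 4
    pred_diff = pred[0] - pred[1]
    actual_diff = actual[0] - actual[1]
    if actual_diff != 0 and pred_diff == actual_diff:
        return 3
    return 2 if sign(pred_diff) == sign(actual_diff) else 0


def average_points(distribution, actual):
    """Average points over a {predicted scoreline: count} distribution."""
    total = sum(kicktipp_points(p, actual) * n for p, n in distribution.items())
    return total / sum(distribution.values())


# Match 15, gpt-5.5 (high): FC St. Pauli 1:2 FSV Mainz 05
match_15_high = {(0, 1): 7, (1, 1): 2, (1, 2): 1}
print(average_points(match_15_high, (1, 2)))  # 2.5
```

Under this assumed rule the Match 15 distribution for gpt-5.5 (high) gives (7·3 + 2·0 + 1·4) / 10 = 2.5, the per-match average shown in the report.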

Summary

Dataset: match-predictions/bundesliga-2025-26/pes-squad/repeated-match-slices/all-matchdays-after-20251202t230000z/random-15x10-seed-20260517-after-20251203
Task type: repeated-match-slice
Primary metric: avg_kicktipp_points
Alpha: 0.0500

Paired Wilcoxon signed-rank test on per-item Kicktipp-point differences; bootstrap confidence intervals summarize mean and median paired differences.
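
The report does not say whether the p-value comes from an exact or an approximate test. As a rough check, the sketch below computes an exact two-sided signed-rank p-value by enumerating all sign assignments. The function names are hypothetical, and the differences passed in are placeholders, not the report's per-item values, which are not listed.

```python
from itertools import product


def midranks(values):
    """Rank values from 1..n, averaging ranks across ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def exact_wilcoxon_two_sided(diffs):
    """Exact two-sided p-value by enumerating all 2^n sign assignments."""
    nonzero = [d for d in diffs if d != 0]  # zero differences are dropped
    ranks = midranks([abs(d) for d in nonzero])
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, nonzero) if d > 0)
    observed = min(w_plus, total - w_plus)
    hits = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= observed:
            hits += 1
    return hits / 2 ** len(ranks)


# Placeholder values: any 10 strictly positive differences give the same result.
placeholder_diffs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
print(round(exact_wilcoxon_two_sided(placeholder_diffs), 4))  # 0.002
```

With 10 pairings that all favor the same run (the report lists a per-item W/T/L of 10/0/0), the exact two-sided p-value is 2/2^10 ≈ 0.0020 whatever the magnitudes, which agrees with the reported value.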

Dataset metadata

repeated-match-slice dataset for 150 item(s) on random-15x10-seed-20260517-after-20251203

Field           Value
Competition     bundesliga-2025-26
Community       pes-squad
Season          2025/2026
Slice           random-15x10-seed-20260517-after-20251203
Source Pool     all-matchdays-after-20251202t230000z
Matches         15
Repetitions     10
Predictions     150
Sample Size     150
Sample Method   repeated-match-slice
Sample Seed     20260517
Scope           repeated-match-slice
Slice Kind      repeated-match-slice
Source Dataset  match-predictions/bundesliga-2025-26/pes-squad
Starts After    2025-12-03T00:00:00 Europe/Berlin (+01)

Run ranking

Rank  Run               Model             Primary metric
1     gpt-5.5 (high)    gpt-5.5 (high)    19.5000
2     gpt-5.5 (medium)  gpt-5.5 (medium)  15.0000
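
The primary metric values appear to equal the sum of the 15 per-match average points shown in the Matches section. The short check below adds them up; the list names are illustrative, and the interpretation of the metric as a per-slice total is an inference from the numbers rather than something the report states.

```python
# Per-match average points, Match 1 to Match 15, copied from the Matches section.
HIGH = [2.0, 2.0, 0.2, 0.0, 0.0, 0.0, 3.0, 0.0, 0.0, 4.0, 2.0, 2.0, 0.0, 1.8, 2.5]
MEDIUM = [2.0, 2.0, 0.0, 0.0, 0.0, 1.8, 3.0, 0.0, 1.8, 0.4, 2.0, 2.0, 0.0, 0.0, 0.0]

# Round to the report's four decimals to absorb floating-point noise.
high_total = round(sum(HIGH), 4)
medium_total = round(sum(MEDIUM), 4)
delta = round(high_total - medium_total, 4)

print(high_total, medium_total, delta)  # 19.5 15.0 4.5
```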

Two-run comparison

significant
Better run: gpt-5.5 (high)
Other run: gpt-5.5 (medium)
avg_kicktipp_points delta: 4.5000
Wilcoxon p-value: 0.0020
Mean difference: 4.5000
Median difference: 5.0000
Per-item W/T/L: 10/0/0

Effect size confidence intervals

Statistic          Point estimate  Low     High
Mean difference    4.5000          3.3000  5.7000
Median difference  5.0000          4.0000  7.0000
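
The report does not state which bootstrap variant produces these intervals. A percentile bootstrap is one common choice, sketched below under that assumption. The function name, the resample count, the seed, and the input values are all hypothetical; the input is placeholder data, not the report's per-item differences.

```python
import random
import statistics


def percentile_bootstrap_ci(diffs, stat, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for stat(diffs) at level 1 - alpha."""
    rng = random.Random(seed)
    n = len(diffs)
    # Resample the paired differences with replacement and recompute the statistic.
    estimates = sorted(
        stat([diffs[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    low_index = int((alpha / 2) * n_resamples)
    high_index = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[low_index], estimates[high_index]


# Placeholder paired differences, for illustration only.
placeholder_diffs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]

mean_low, mean_high = percentile_bootstrap_ci(placeholder_diffs, statistics.mean)
median_low, median_high = percentile_bootstrap_ci(placeholder_diffs, statistics.median)
print("mean CI:", mean_low, mean_high)
print("median CI:", median_low, median_high)
```

Because the resampling is over paired differences, each interval describes the uncertainty in the gap between the two runs rather than in either run's score on its own.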

Per-item win/tie/loss counts compare paired Kicktipp points for the listed run ordering on each prepared dataset item.
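
The counting itself is simple to sketch. The report's per-item values are not listed, so the example below applies the same comparison to the per-match average points from the Matches section. That is a different unit from the report's 10 pairings, so its result is not expected to match the 10/0/0 above. `win_tie_loss` is a hypothetical helper name.

```python
def win_tie_loss(first, second):
    """Count paired comparisons where first is greater, equal, or smaller."""
    wins = sum(a > b for a, b in zip(first, second))
    ties = sum(a == b for a, b in zip(first, second))
    losses = sum(a < b for a, b in zip(first, second))
    return wins, ties, losses


# Per-match average points, Match 1 to Match 15, copied from the Matches section.
HIGH = [2.0, 2.0, 0.2, 0.0, 0.0, 0.0, 3.0, 0.0, 0.0, 4.0, 2.0, 2.0, 0.0, 1.8, 2.5]
MEDIUM = [2.0, 2.0, 0.0, 0.0, 0.0, 1.8, 3.0, 0.0, 1.8, 0.4, 2.0, 2.0, 0.0, 0.0, 0.0]

print(win_tie_loss(HIGH, MEDIUM))  # (4, 9, 2)
```

At match level, gpt-5.5 (high) is ahead on 4 fixtures (Matches 3, 10, 14, 15), level on 9, and behind on 2 (Matches 6 and 9, both draws that gpt-5.5 (medium) mostly predicted as 1:1).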