KicktippAi experiment analysis

match-predictions/bundesliga-2025-26/pes-squad/repeated-match-slices/all-matchdays-after-20251202t230000z/random-15x10-seed-20260517-after-20251203

Task: repeated-match-slice

Primary metric: avg_kicktipp_points

Runs: 5

Pairings: 10

At a glance

Prediction distribution

o3 (medium) n=150

2:1 40

1:2 38

2:0 16

3:1 11

0:1 9

0:2 9

1:1 8

1:3 8

0:3 6

1:0 2

1:4 2

3:2 1

gpt-5.5 (high) n=150

2:1 50

1:2 48

0:1 16

2:0 11

0:3 10

1:0 10

1:1 3

0:2 2

gpt-5.4-nano (none) n=150

2:1 53

1:1 37

1:2 29

1:3 12

0:2 7

2:0 5

1:0 3

1:4 2

3:1 2

gpt-5.5 (none) n=150

1:1 57

2:1 56

1:2 18

1:3 13

2:0 4

1:4 2

gpt-5.5 (medium) n=150

1:1 55

2:1 47

1:2 17

2:0 12

0:3 10

0:2 5

1:3 3

3:1 1

Matches

15 fixtures

Per-match averages and scoreline distributions are descriptive only. No significance tests are run for individual matches.
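The per-match averages below can be reproduced from the scoreline distributions. The following is a minimal sketch, assuming a 4/3/2/0 scoring rule (exact result, correct goal difference, correct tendency, miss) that is inferred from the point labels shown in this report rather than stated by it. The names `kicktipp_points` and `average_points` are hypothetical helpers, not part of the KicktippAi code base.

```python
from typing import Dict, Tuple

Score = Tuple[int, int]


def sign(x: int) -> int:
    """Return -1, 0 or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def kicktipp_points(pred: Score, actual: Score) -> int:
    """Assumed 4/3/2/0 rule, inferred from the point labels in this report."""
    if pred == actual:
        return 4  # exact scoreline
    pred_diff = pred[0] - pred[1]
    actual_diff = actual[0] - actual[1]
    if sign(pred_diff) != sign(actual_diff):
        return 0  # wrong tendency
    if pred_diff == actual_diff and actual_diff != 0:
        return 3  # correct goal difference on a decided match
    return 2  # correct tendency only (includes non-exact draws)


def average_points(distribution: Dict[Score, int], actual: Score) -> float:
    """Average points over a {predicted scoreline: count} distribution."""
    total = sum(kicktipp_points(p, actual) * n for p, n in distribution.items())
    return total / sum(distribution.values())


# Match 15, o3 (medium): actual result 1:2, predictions 1:2 (7x) and 0:1 (3x).
o3_match15 = {(1, 2): 7, (0, 1): 3}
print(average_points(o3_match15, (1, 2)))  # 3.7
```

Under this assumed rule, a predicted 1:1 for an actual 2:2 scores 2 points, which matches the draw fixtures in this report.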

Match 1

1. FC Heidenheim 1846 0:4 FC Bayern München

Matchday 15, 2025-12-21T17:30:00 UTC+01 (+01)

o3 (medium) n=10

2.0000 avg points

0:3 2pt 6

1:3 2pt 2

1:4 2pt 2

gpt-5.5 (high) n=10

2.0000 avg points

0:3 2pt 10

gpt-5.4-nano (none) n=10

2.0000 avg points

1:3 2pt 8

1:4 2pt 2

gpt-5.5 (none) n=10

2.0000 avg points

1:3 2pt 8

1:4 2pt 2

gpt-5.5 (medium) n=10

2.0000 avg points

0:3 2pt 10

Match 2

Borussia Dortmund 2:0 1899 Hoffenheim

Matchday 13, 2025-12-07T17:30:00 UTC+01 (+01)

o3 (medium) n=10

2.0000 avg points

2:1 2pt 9

3:2 2pt 1

gpt-5.5 (high) n=10

2.0000 avg points

2:1 2pt 10

gpt-5.4-nano (none) n=10

2.1000 avg points

2:1 2pt 9

3:1 3pt 1

gpt-5.5 (none) n=10

2.0000 avg points

2:1 2pt 10

gpt-5.5 (medium) n=10

2.0000 avg points

2:1 2pt 10

Match 3

VfL Wolfsburg 3:1 1. FC Union Berlin

Matchday 13, 2025-12-06T15:30:00 UTC+01 (+01)

o3 (medium) n=10

0.0000 avg points

1:2 0pt 8

0:1 0pt 1

1:1 0pt 1

gpt-5.5 (high) n=10

0.2000 avg points

1:2 0pt 8

0:1 0pt 1

2:1 2pt 1

gpt-5.4-nano (none) n=10

0.0000 avg points

1:1 0pt 10

gpt-5.5 (none) n=10

0.0000 avg points

1:1 0pt 10

gpt-5.5 (medium) n=10

0.0000 avg points

1:1 0pt 8

1:2 0pt 2

Match 4

FC Augsburg 2:0 Bayer 04 Leverkusen

Matchday 13, 2025-12-06T15:30:00 UTC+01 (+01)

o3 (medium) n=10

0.0000 avg points

1:3 0pt 6

0:2 0pt 2

1:2 0pt 2

gpt-5.5 (high) n=10

0.0000 avg points

1:2 0pt 10

gpt-5.4-nano (none) n=10

0.0000 avg points

0:2 0pt 5

1:3 0pt 3

1:2 0pt 2

gpt-5.5 (none) n=10

0.0000 avg points

1:2 0pt 5

1:3 0pt 5

gpt-5.5 (medium) n=10

0.0000 avg points

1:2 0pt 7

1:3 0pt 3

Match 5

FC St. Pauli 1:1 RB Leipzig

Matchday 16, 2026-01-27T20:30:00 UTC+01 (+01)

o3 (medium) n=10

0.0000 avg points

0:2 0pt 7

1:2 0pt 2

0:1 0pt 1

gpt-5.5 (high) n=10

0.0000 avg points

0:1 0pt 6

0:2 0pt 2

1:2 0pt 2

gpt-5.4-nano (none) n=10

0.0000 avg points

1:2 0pt 7

0:2 0pt 2

1:3 0pt 1

gpt-5.5 (none) n=10

0.0000 avg points

1:2 0pt 10

gpt-5.5 (medium) n=10

0.0000 avg points

0:2 0pt 5

1:2 0pt 5

Match 6

1. FC Heidenheim 1846 2:2 1. FC Köln

Matchday 16, 2026-01-10T15:30:00 UTC+01 (+01)

o3 (medium) n=10

0.4000 avg points

1:2 0pt 7

1:1 2pt 2

0:1 0pt 1

gpt-5.5 (high) n=10

0.0000 avg points

1:2 0pt 7

0:1 0pt 2

2:1 0pt 1

gpt-5.4-nano (none) n=10

2.0000 avg points

1:1 2pt 10

gpt-5.5 (none) n=10

2.0000 avg points

1:1 2pt 10

gpt-5.5 (medium) n=10

1.8000 avg points

1:1 2pt 9

1:2 0pt 1

Match 7

VfB Stuttgart 1:0 SC Freiburg

Matchday 20, 2026-02-01T15:30:00 UTC+01 (+01)

o3 (medium) n=10

2.7000 avg points

2:1 3pt 7

3:1 2pt 3

gpt-5.5 (high) n=10

3.0000 avg points

2:1 3pt 10

gpt-5.4-nano (none) n=10

3.0000 avg points

2:1 3pt 10

gpt-5.5 (none) n=10

3.0000 avg points

2:1 3pt 10

gpt-5.5 (medium) n=10

3.0000 avg points

2:1 3pt 10

Match 8

RB Leipzig 1:2 FSV Mainz 05

Matchday 20, 2026-01-31T15:30:00 UTC+01 (+01)

o3 (medium) n=10

0.0000 avg points

2:1 0pt 6

3:1 0pt 4

gpt-5.5 (high) n=10

0.0000 avg points

2:1 0pt 10

gpt-5.4-nano (none) n=10

0.0000 avg points

2:0 0pt 5

2:1 0pt 4

3:1 0pt 1

gpt-5.5 (none) n=10

0.0000 avg points

2:1 0pt 10

gpt-5.5 (medium) n=10

0.0000 avg points

2:1 0pt 10

Match 9

SC Freiburg 3:3 Bayer 04 Leverkusen

Matchday 25, 2026-03-07T15:30:00 UTC+01 (+01)

o3 (medium) n=10

0.2000 avg points

1:2 0pt 4

0:1 0pt 3

2:1 0pt 2

1:1 2pt 1

gpt-5.5 (high) n=10

0.0000 avg points

1:2 0pt 10

gpt-5.4-nano (none) n=10

0.0000 avg points

1:2 0pt 10

gpt-5.5 (none) n=10

2.0000 avg points

1:1 2pt 10

gpt-5.5 (medium) n=10

1.8000 avg points

1:1 2pt 9

1:2 0pt 1

Match 10

Hamburger SV 1:2 RB Leipzig

Matchday 24, 2026-03-01T19:30:00 UTC+01 (+01)

o3 (medium) n=10

3.2000 avg points

1:2 4pt 8

1:1 0pt 2

gpt-5.5 (high) n=10

4.0000 avg points

1:2 4pt 10

gpt-5.4-nano (none) n=10

4.0000 avg points

1:2 4pt 10

gpt-5.5 (none) n=10

1.2000 avg points

1:1 0pt 7

1:2 4pt 3

gpt-5.5 (medium) n=10

0.4000 avg points

1:1 0pt 9

1:2 4pt 1

Match 11

VfB Stuttgart 4:0 Hamburger SV

Matchday 29, 2026-04-12T18:30:00 UTC+02 (+02)

o3 (medium) n=10

2.0000 avg points

2:1 2pt 8

3:1 2pt 2

gpt-5.5 (high) n=10

2.0000 avg points

2:1 2pt 10

gpt-5.4-nano (none) n=10

2.0000 avg points

2:1 2pt 10

gpt-5.5 (none) n=10

2.0000 avg points

2:1 2pt 10

gpt-5.5 (medium) n=10

2.0000 avg points

2:1 2pt 10

Match 12

Bayer 04 Leverkusen 6:3 VfL Wolfsburg

Matchday 28, 2026-04-04T16:30:00 UTC+02 (+02)

o3 (medium) n=10

2.0000 avg points

2:0 2pt 8

3:1 2pt 2

gpt-5.5 (high) n=10

2.0000 avg points

2:1 2pt 7

2:0 2pt 3

gpt-5.4-nano (none) n=10

2.0000 avg points

2:1 2pt 10

gpt-5.5 (none) n=10

2.0000 avg points

2:1 2pt 7

2:0 2pt 3

gpt-5.5 (medium) n=10

2.0000 avg points

2:1 2pt 7

2:0 2pt 2

3:1 2pt 1

Match 13

Bor. Mönchengladbach 2:2 1. FC Heidenheim 1846

Matchday 28, 2026-04-04T16:30:00 UTC+02 (+02)

o3 (medium) n=10

0.0000 avg points

2:0 0pt 8

1:0 0pt 1

2:1 0pt 1

gpt-5.5 (high) n=10

0.0000 avg points

2:0 0pt 8

1:0 0pt 2

gpt-5.4-nano (none) n=10

0.6000 avg points

2:1 0pt 7

1:1 2pt 3

gpt-5.5 (none) n=10

0.0000 avg points

2:1 0pt 9

2:0 0pt 1

gpt-5.5 (medium) n=10

0.0000 avg points

2:0 0pt 10

Match 14

Bor. Mönchengladbach 2:0 FC St. Pauli

Matchday 26, 2026-03-13T20:30:00 UTC+01 (+01)

o3 (medium) n=10

1.6000 avg points

2:1 2pt 7

1:1 0pt 2

1:0 2pt 1

gpt-5.5 (high) n=10

1.8000 avg points

1:0 2pt 8

1:1 0pt 1

2:1 2pt 1

gpt-5.4-nano (none) n=10

1.2000 avg points

1:1 0pt 4

1:0 2pt 3

2:1 2pt 3

gpt-5.5 (none) n=10

0.0000 avg points

1:1 0pt 10

gpt-5.5 (medium) n=10

0.0000 avg points

1:1 0pt 10

Match 15

FC St. Pauli 1:2 FSV Mainz 05

Matchday 32, 2026-05-03T16:30:00 UTC+02 (+02)

o3 (medium) n=10

3.7000 avg points

1:2 4pt 7

0:1 3pt 3

gpt-5.5 (high) n=10

2.5000 avg points

0:1 3pt 7

1:1 0pt 2

1:2 4pt 1

gpt-5.4-nano (none) n=10

0.0000 avg points

1:1 0pt 10

gpt-5.5 (none) n=10

0.0000 avg points

1:1 0pt 10

gpt-5.5 (medium) n=10

0.0000 avg points

1:1 0pt 10

Summary

Dataset: match-predictions/bundesliga-2025-26/pes-squad/repeated-match-slices/all-matchdays-after-20251202t230000z/random-15x10-seed-20260517-after-20251203

Task type: repeated-match-slice

Primary metric: avg_kicktipp_points

Alpha: 0.0500

A Friedman test is run across all paired runs. Pairwise Wilcoxon signed-rank tests use Holm correction, and paired differences are reported with bootstrap confidence intervals.
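To make the pairwise p-values easier to read, here is a minimal sketch of an exact two-sided Wilcoxon signed-rank p-value, computed by enumerating every sign pattern. The report does not say whether an exact or a normal-approximation variant was used, so this is an assumption. The function `exact_wilcoxon_p` and the example differences are hypothetical and chosen only to illustrate the case where one run wins all 10 paired items.

```python
from itertools import product
from typing import Sequence


def exact_wilcoxon_p(differences: Sequence[float]) -> float:
    """Exact two-sided signed-rank p-value.

    Assumes no zero differences and no tied magnitudes (hypothetical helper).
    """
    ordered = sorted(differences, key=abs)
    n = len(ordered)
    ranks = range(1, n + 1)
    # Observed statistic: sum of ranks belonging to positive differences.
    observed = sum(r for r, d in zip(ranks, ordered) if d > 0)
    # Null distribution: each rank is positive or negative with equal probability.
    totals = [
        sum(r for r, keep in zip(ranks, signs) if keep)
        for signs in product((0, 1), repeat=n)
    ]
    lower = sum(t <= observed for t in totals) / len(totals)
    upper = sum(t >= observed for t in totals) / len(totals)
    return min(1.0, 2 * min(lower, upper))


# Hypothetical data: ten positive paired differences with distinct magnitudes.
all_wins = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
print(round(exact_wilcoxon_p(all_wins), 4))  # 0.002
```

With 10 paired items the smallest attainable two-sided value is 2/1024, about 0.0020. That is consistent with the p-values reported for the 10/0/0 comparisons, although it does not prove which variant the report used.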

Dataset metadata

Repeated-match-slice dataset with 150 items (15 matches × 10 repetitions) on random-15x10-seed-20260517-after-20251203

Field	Value
Competition	bundesliga-2025-26
Community	pes-squad
Season	2025/2026
Slice	random-15x10-seed-20260517-after-20251203
Source Pool	all-matchdays-after-20251202t230000z
Matches	15
Repetitions	10
Predictions	150
Sample Size	150
Sample Method	repeated-match-slice
Sample Seed	20260517
Scope	repeated-match-slice
Slice Kind	repeated-match-slice
Source Dataset	match-predictions/bundesliga-2025-26/pes-squad
Starts After	2025-12-03T00:00:00 Europe/Berlin (+01)

Run ranking

Rank	Run	Model	Primary metric
1	o3 (medium)	o3 (medium)	19.8000
2	gpt-5.5 (high)	gpt-5.5 (high)	19.5000
3	gpt-5.4-nano (none)	gpt-5.4-nano (none)	18.9000
4	gpt-5.5 (none)	gpt-5.5 (none)	16.2000
5	gpt-5.5 (medium)	gpt-5.5 (medium)	15.0000
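The report does not spell out how the primary metric is aggregated. The ranking values happen to equal the sum of each run's 15 per-match averages, which is the same as the average points per repetition of the 15-match slice. The sketch below checks this arithmetic using the per-match averages listed in the Matches section; the variable names are hypothetical.

```python
# Per-match average points for Match 1 to Match 15, copied from the Matches section.
match_averages = {
    "o3 (medium)": [2.0, 2.0, 0.0, 0.0, 0.0, 0.4, 2.7, 0.0, 0.2, 3.2, 2.0, 2.0, 0.0, 1.6, 3.7],
    "gpt-5.5 (high)": [2.0, 2.0, 0.2, 0.0, 0.0, 0.0, 3.0, 0.0, 0.0, 4.0, 2.0, 2.0, 0.0, 1.8, 2.5],
    "gpt-5.4-nano (none)": [2.0, 2.1, 0.0, 0.0, 0.0, 2.0, 3.0, 0.0, 0.0, 4.0, 2.0, 2.0, 0.6, 1.2, 0.0],
    "gpt-5.5 (none)": [2.0, 2.0, 0.0, 0.0, 0.0, 2.0, 3.0, 0.0, 2.0, 1.2, 2.0, 2.0, 0.0, 0.0, 0.0],
    "gpt-5.5 (medium)": [2.0, 2.0, 0.0, 0.0, 0.0, 1.8, 3.0, 0.0, 1.8, 0.4, 2.0, 2.0, 0.0, 0.0, 0.0],
}

# Sum the 15 per-match averages for each run.
totals = {run: sum(values) for run, values in match_averages.items()}

# Rank runs by total, highest first.
ranking = sorted(totals, key=totals.get, reverse=True)
for rank, run in enumerate(ranking, start=1):
    print(f"{rank}\t{run}\t{totals[run]:.4f}")
# 1	o3 (medium)	19.8000
# 2	gpt-5.5 (high)	19.5000
# 3	gpt-5.4-nano (none)	18.9000
# 4	gpt-5.5 (none)	16.2000
# 5	gpt-5.5 (medium)	15.0000
```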

Multi-run comparison

Friedman p-value: 0.0001

Run A	Run B	Difference (A - B)	p-value	Holm-adjusted p-value	Significant	Win/Tie/Loss
o3 (medium)	gpt-5.5 (high)	0.3000	0.4102	1.0000	no	8/0/2
o3 (medium)	gpt-5.4-nano (none)	0.9000	0.4062	1.0000	no	5/1/4
o3 (medium)	gpt-5.5 (none)	3.6000	0.0195	0.1172	no	9/0/1
o3 (medium)	gpt-5.5 (medium)	4.8000	0.0020	0.0195	yes	10/0/0
gpt-5.5 (high)	gpt-5.4-nano (none)	0.6000	0.4297	1.0000	no	5/1/4
gpt-5.5 (high)	gpt-5.5 (none)	3.3000	0.0020	0.0195	yes	10/0/0
gpt-5.5 (high)	gpt-5.5 (medium)	4.5000	0.0020	0.0195	yes	10/0/0
gpt-5.4-nano (none)	gpt-5.5 (none)	2.7000	0.0234	0.1172	no	8/0/2
gpt-5.4-nano (none)	gpt-5.5 (medium)	3.9000	0.0039	0.0273	yes	9/0/1
gpt-5.5 (none)	gpt-5.5 (medium)	1.2000	0.3125	1.0000	no	4/5/1

Per-item win/tie/loss counts compare paired Kicktipp points on each prepared dataset item, from the perspective of the first-listed run against the second-listed run.
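The adjusted p-values in the pairwise table can be approximately reproduced with a Holm step-down correction. The sketch below is a plain-Python illustration, not the report's implementation; `holm_adjust` is a hypothetical helper. It starts from the rounded raw p-values as printed, so the results can differ from the table in the last digits (for example 0.0200 instead of 0.0195).

```python
from typing import List, Sequence


def holm_adjust(p_values: Sequence[float]) -> List[float]:
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1) and capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity along the sorted order.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Raw p-values in the row order of the pairwise table (rounded as printed).
raw = [0.4102, 0.4062, 0.0195, 0.0020, 0.4297, 0.0020, 0.0020, 0.0234, 0.0039, 0.3125]
alpha = 0.05

adjusted = holm_adjust(raw)
significant = [p < alpha for p in adjusted]
for p_raw, p_adj, sig in zip(raw, adjusted, significant):
    print(f"{p_raw:.4f}\t{p_adj:.4f}\t{'yes' if sig else 'no'}")
```

At alpha 0.05 this yields the same four significant pairings as the table: o3 (medium) against gpt-5.5 (medium), gpt-5.5 (high) against gpt-5.5 (none) and gpt-5.5 (medium), and gpt-5.4-nano (none) against gpt-5.5 (medium).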