KicktippAi experiment analysis
gpt-5.5 (high) vs gpt-5.5 (medium)
Task: repeated-match-slice
Primary metric: avg_kicktipp_points
Runs: 2
Pairings: 10
Compact head to head
Significant (p-value: 0.0020)
| Run | avg points |
| --- | --- |
| gpt-5.5 (high) | 19.5000 |
| gpt-5.5 (medium) | 15.0000 |
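Prediction distribution
gpt-5.5 (high), n=150
| Predicted score | Count |
| --- | --- |
| 2:1 | 50 |
| 1:2 | 48 |
| 0:1 | 16 |
| 2:0 | 11 |
| 0:3 | 10 |
| 1:0 | 10 |
| 1:1 | 3 |
| 0:2 | 2 |
gpt-5.5 (medium), n=150
| Predicted score | Count |
| --- | --- |
| 1:1 | 55 |
| 2:1 | 47 |
| 1:2 | 17 |
| 2:0 | 12 |
| 0:3 | 10 |
| 0:2 | 5 |
| 1:3 | 3 |
| 3:1 | 1 |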
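The score counts above can be collapsed into match tendencies to make the difference between the two runs easier to see. The following is a minimal sketch, not part of the report's own tooling: the dictionaries are copied from the distribution above, the function name `tendency_counts` is hypothetical, and it assumes scores are written in home:away order.

```python
from collections import Counter

# Score -> count, copied from the prediction distribution above.
HIGH = {"2:1": 50, "1:2": 48, "0:1": 16, "2:0": 11,
        "0:3": 10, "1:0": 10, "1:1": 3, "0:2": 2}
MEDIUM = {"1:1": 55, "2:1": 47, "1:2": 17, "2:0": 12,
          "0:3": 10, "0:2": 5, "1:3": 3, "3:1": 1}


def tendency_counts(distribution):
    """Aggregate score counts into home win / draw / away win.

    Assumption: each score string is in home:away order.
    """
    totals = Counter({"home": 0, "draw": 0, "away": 0})
    for score, count in distribution.items():
        home, away = (int(part) for part in score.split(":"))
        if home > away:
            totals["home"] += count
        elif home == away:
            totals["draw"] += count
        else:
            totals["away"] += count
    return dict(totals)


high_tendencies = tendency_counts(HIGH)
medium_tendencies = tendency_counts(MEDIUM)
print("high:", high_tendencies)
print("medium:", medium_tendencies)
```

Under that assumption, the high run predicts a draw in 3 of 150 cases, while the medium run does so in 55 of 150.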
Matches
15 fixtures
| Fixture (listing order) | Run | n | avg points | Predictions (score, points, count) |
| --- | --- | --- | --- | --- |
| 1 | gpt-5.5 (high) | 10 | 2.0000 | 0:3 (2pt) ×10 |
| 1 | gpt-5.5 (medium) | 10 | 2.0000 | 0:3 (2pt) ×10 |
| 2 | gpt-5.5 (high) | 10 | 2.0000 | 2:1 (2pt) ×10 |
| 2 | gpt-5.5 (medium) | 10 | 2.0000 | 2:1 (2pt) ×10 |
| 3 | gpt-5.5 (high) | 10 | 0.2000 | 1:2 (0pt) ×8; 0:1 (0pt) ×1; 2:1 (2pt) ×1 |
| 3 | gpt-5.5 (medium) | 10 | 0.0000 | 1:1 (0pt) ×8; 1:2 (0pt) ×2 |
| 4 | gpt-5.5 (high) | 10 | 0.0000 | 1:2 (0pt) ×10 |
| 4 | gpt-5.5 (medium) | 10 | 0.0000 | 1:2 (0pt) ×7; 1:3 (0pt) ×3 |
| 5 | gpt-5.5 (high) | 10 | 0.0000 | 0:1 (0pt) ×6; 0:2 (0pt) ×2; 1:2 (0pt) ×2 |
| 5 | gpt-5.5 (medium) | 10 | 0.0000 | 0:2 (0pt) ×5; 1:2 (0pt) ×5 |
| 6 | gpt-5.5 (high) | 10 | 0.0000 | 1:2 (0pt) ×7; 0:1 (0pt) ×2; 2:1 (0pt) ×1 |
| 6 | gpt-5.5 (medium) | 10 | 1.8000 | 1:1 (2pt) ×9; 1:2 (0pt) ×1 |
| 7 | gpt-5.5 (high) | 10 | 3.0000 | 2:1 (3pt) ×10 |
| 7 | gpt-5.5 (medium) | 10 | 3.0000 | 2:1 (3pt) ×10 |
| 8 | gpt-5.5 (high) | 10 | 0.0000 | 2:1 (0pt) ×10 |
| 8 | gpt-5.5 (medium) | 10 | 0.0000 | 2:1 (0pt) ×10 |
| 9 | gpt-5.5 (high) | 10 | 0.0000 | 1:2 (0pt) ×10 |
| 9 | gpt-5.5 (medium) | 10 | 1.8000 | 1:1 (2pt) ×9; 1:2 (0pt) ×1 |
| 10 | gpt-5.5 (high) | 10 | 4.0000 | 1:2 (4pt) ×10 |
| 10 | gpt-5.5 (medium) | 10 | 0.4000 | 1:1 (0pt) ×9; 1:2 (4pt) ×1 |
| 11 | gpt-5.5 (high) | 10 | 2.0000 | 2:1 (2pt) ×10 |
| 11 | gpt-5.5 (medium) | 10 | 2.0000 | 2:1 (2pt) ×10 |
| 12 | gpt-5.5 (high) | 10 | 2.0000 | 2:1 (2pt) ×7; 2:0 (2pt) ×3 |
| 12 | gpt-5.5 (medium) | 10 | 2.0000 | 2:1 (2pt) ×7; 2:0 (2pt) ×2; 3:1 (2pt) ×1 |
| 13 | gpt-5.5 (high) | 10 | 0.0000 | 2:0 (0pt) ×8; 1:0 (0pt) ×2 |
| 13 | gpt-5.5 (medium) | 10 | 0.0000 | 2:0 (0pt) ×10 |
| 14 | gpt-5.5 (high) | 10 | 1.8000 | 1:0 (2pt) ×8; 1:1 (0pt) ×1; 2:1 (2pt) ×1 |
| 14 | gpt-5.5 (medium) | 10 | 0.0000 | 1:1 (0pt) ×10 |
| 15 | gpt-5.5 (high) | 10 | 2.5000 | 0:1 (3pt) ×7; 1:1 (0pt) ×2; 1:2 (4pt) ×1 |
| 15 | gpt-5.5 (medium) | 10 | 0.0000 | 1:1 (0pt) ×10 |
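The per-fixture averages above appear to add up to the headline avg_kicktipp_points values, which suggests the primary metric is the average total points per repetition across the 15-fixture slice. The sketch below checks that arithmetic; the lists are copied from the per-fixture figures in listing order, and the helper `fixture_average` is a hypothetical name used only for illustration.

```python
# Per-fixture average points in listing order, copied from the Matches section.
HIGH_AVG = [2.0, 2.0, 0.2, 0.0, 0.0, 0.0, 3.0, 0.0,
            0.0, 4.0, 2.0, 2.0, 0.0, 1.8, 2.5]
MEDIUM_AVG = [2.0, 2.0, 0.0, 0.0, 0.0, 1.8, 3.0, 0.0,
              1.8, 0.4, 2.0, 2.0, 0.0, 0.0, 0.0]


def fixture_average(breakdown):
    """Average points for one fixture from (points, count) pairs."""
    total_points = sum(points * count for points, count in breakdown)
    total_count = sum(count for _, count in breakdown)
    return total_points / total_count


# Last fixture, gpt-5.5 (high): 0:1 (3pt) x7, 1:1 (0pt) x2, 1:2 (4pt) x1.
last_fixture_high = fixture_average([(3, 7), (0, 2), (4, 1)])

# Summing the per-fixture averages gives the expected total per repetition.
high_total = round(sum(HIGH_AVG), 4)
medium_total = round(sum(MEDIUM_AVG), 4)
delta = round(high_total - medium_total, 4)

print(last_fixture_high, high_total, medium_total, delta)
```

The totals match the reported 19.5000 and 15.0000, and their difference matches the reported delta of 4.5000.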
Summary
Dataset: match-predictions/bundesliga-2025-26/pes-squad/repeated-match-slices/all-matchdays-after-20251202t230000z/random-15x10-seed-20260517-after-20251203
Task type: repeated-match-slice
Primary metric: avg_kicktipp_points
Alpha: 0.0500
Dataset metadata
| Field | Value |
| --- | --- |
| Competition | bundesliga-2025-26 |
| Community | pes-squad |
| Season | 2025/2026 |
| Slice | random-15x10-seed-20260517-after-20251203 |
| Source Pool | all-matchdays-after-20251202t230000z |
| Matches | 15 |
| Repetitions | 10 |
| Predictions | 150 |
| Sample Size | 150 |
| Sample Method | repeated-match-slice |
| Sample Seed | 20260517 |
| Scope | repeated-match-slice |
| Slice Kind | repeated-match-slice |
| Source Dataset | match-predictions/bundesliga-2025-26/pes-squad |
| Starts After | 2025-12-03T00:00:00 Europe/Berlin (+01) |
| Rank | Run | Model | Primary metric |
| --- | --- | --- | --- |
| 1 | gpt-5.5 (high) | gpt-5.5 (high) | 19.5000 |
| 2 | gpt-5.5 (medium) | gpt-5.5 (medium) | 15.0000 |
Better run: gpt-5.5 (high)
Other run: gpt-5.5 (medium)
avg_kicktipp_points delta: 4.5000
Wilcoxon p-value: 0.0020
Mean difference: 4.5000
Median difference: 5.0000
Per-item W/T/L: 10/0/0
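With 10 pairings and a W/T/L record of 10/0/0, every paired difference favors the same run, which is the most extreme outcome a signed-rank test can see. The sketch below enumerates the null distribution to show what an exact two-sided Wilcoxon signed-rank test would return in that case. The report does not state which variant it uses, so the exact two-sided test is an assumption, and `exact_two_sided_p` is a hypothetical helper rather than the report's implementation.

```python
from itertools import product


def exact_two_sided_p(w_small, n_pairs):
    """Exact two-sided signed-rank p-value by full enumeration.

    w_small is the smaller of the positive and negative rank sums.
    Assumes no zero differences and untied ranks 1..n_pairs; under the
    null hypothesis each difference is negative with probability 1/2.
    """
    ranks = range(1, n_pairs + 1)
    total = 0
    at_most = 0
    for signs in product((0, 1), repeat=n_pairs):
        total += 1
        rank_sum = sum(rank for rank, negative in zip(ranks, signs) if negative)
        if rank_sum <= w_small:
            at_most += 1
    return min(1.0, 2 * at_most / total)


# All 10 differences are positive, so the negative rank sum is 0.
p_value = exact_two_sided_p(0, 10)
print(f"{p_value:.4f}")
```

Only one of the 1024 sign assignments yields a negative rank sum of 0, so the result is 2/1024, which rounds to the reported 0.0020 and is below the stated alpha of 0.0500.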
Effect size confidence intervals
| Statistic | Point estimate | Low | High |
| --- | --- | --- | --- |
| Mean difference | 4.5000 | 3.3000 | 5.7000 |
| Median difference | 5.0000 | 4.0000 | 7.0000 |