A Friedman test is run across all paired runs; pairwise Wilcoxon signed-rank tests use Holm correction, with bootstrap confidence intervals for the paired differences.
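As an illustration of the bootstrap step, the following is a minimal sketch of a percentile bootstrap confidence interval for a mean paired difference. The function name `paired_bootstrap_ci`, the resample count, the seed, and the two score lists are all assumptions made up for this example; they are not taken from this report, and the report's actual resampling scheme may differ.

```python
import random


def paired_bootstrap_ci(a, b, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of paired differences a[i] - b[i].

    Hypothetical helper for illustration; not the report's implementation.
    """
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible

    # Resample the paired differences with replacement and record each mean.
    means = []
    for _ in range(n_resamples):
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()

    # Take the alpha/2 and 1 - alpha/2 percentiles of the resampled means.
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(diffs) / n, lower, upper


# Made-up paired scores (NOT data from this report).
run_a = [22, 18, 25, 19, 20, 17, 23, 21, 16, 19]
run_b = [20, 19, 21, 19, 22, 15, 20, 22, 18, 17]

mean_diff, lower, upper = paired_bootstrap_ci(run_a, run_b)
print(f"mean difference {mean_diff:.4f}, 95% CI [{lower:.4f}, {upper:.4f}]")
```

Resampling the differences rather than the two runs independently keeps each pair intact, which is what makes the interval a paired one.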
Dataset metadata
Repeated-match-slice dataset with 150 items on slice random-15x10-seed-20260517-after-20251203.
| Field | Value |
| --- | --- |
| Competition | bundesliga-2025-26 |
| Community | pes-squad |
| Season | 2025/2026 |
| Slice | random-15x10-seed-20260517-after-20251203 |
| Source Pool | all-matchdays-after-20251202t230000z |
| Matches | 15 |
| Repetitions | 10 |
| Predictions | 150 |
| Sample Size | 150 |
| Sample Method | repeated-match-slice |
| Sample Seed | 20260517 |
| Scope | repeated-match-slice |
| Slice Kind | repeated-match-slice |
| Source Dataset | match-predictions/bundesliga-2025-26/pes-squad |
| Starts After | 2025-12-03T00:00:00 Europe/Berlin (+01) |
Run ranking
| Rank | Run | Model | Primary metric |
| --- | --- | --- | --- |
| 1 | o3 (high) | o3 (high) | 20.0000 |
| 2 | o3 (medium) | o3 (medium) | 19.8000 |
| 3 | o3 (low) | o3 (low) | 19.0000 |
Multi-run comparison
Friedman p-value 0.9726
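| Run A | Run B | Difference | p-value | Holm-adjusted p-value | Significant | Win/Tie/Loss |
| --- | --- | --- | --- | --- | --- | --- |
| o3 (high) | o3 (medium) | 0.2000 | 0.7969 | 1.0000 | no | 4/1/5 |
| o3 (high) | o3 (low) | 1.0000 | 0.2969 | 0.8906 | no | 5/1/4 |
| o3 (medium) | o3 (low) | 0.8000 | 0.6562 | 1.0000 | no | 4/2/4 |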
Per-item win/tie/loss counts compare paired Kicktipp points for the listed run ordering on each prepared dataset item.
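The Holm-adjusted values in the pairwise comparison can be checked by hand from the raw Wilcoxon p-values. The sketch below is a plain implementation of the standard Holm step-down adjustment, applied to the three raw p-values listed above; the function name `holm_adjust` is an assumption for this example and is not the report's own code.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is scaled by (m - k + 1), capped at 1.
        scaled = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted value never drops below an earlier one.
        running_max = max(running_max, scaled)
        adjusted[idx] = running_max
    return adjusted


# Raw pairwise p-values as listed in the comparison above (rounded to 4 decimals).
raw = {
    ("o3 (high)", "o3 (medium)"): 0.7969,
    ("o3 (high)", "o3 (low)"): 0.2969,
    ("o3 (medium)", "o3 (low)"): 0.6562,
}

adjusted = holm_adjust(list(raw.values()))
for (run_a, run_b), p_adj in zip(raw, adjusted):
    print(f"{run_a} vs {run_b}: adjusted p = {p_adj:.4f}")
```

The smallest raw p-value (0.2969) is multiplied by 3, giving about 0.8907; the listed 0.8906 presumably reflects the unrounded input. The other two comparisons are capped at 1.0000, so none falls below a conventional 0.05 threshold, in line with the "no" entries.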