KicktippAi experiment analysis

o3 (medium) vs o3 (high)

match-predictions/bundesliga-2025-26/pes-squad/slices/all-matchdays-after-20251130t230000z/random-40-seed-20260505-prod-plus-o3-effort

Task: slice
Primary metric: total_kicktipp_points
Runs: 2
Pairings: 40

At a glance

Not significant · p-value 0.8852

Compact head to head

Not significant (p-value 0.8852)
o3 (medium): 1.2750 avg points
o3 (high): 1.2500 avg points

Prediction distribution

o3 (medium), n=40
Score  Count
2:1    13
1:2    12
3:1    5
2:0    3
1:1    2
0:2    1
1:0    1
1:3    1
2:2    1
4:1    1

o3 (high), n=40
Score  Count
2:1    16
1:2    11
2:0    4
0:2    2
1:0    2
1:1    2
0:1    1
4:0    1
4:1    1

Summary

Dataset: match-predictions/bundesliga-2025-26/pes-squad/slices/all-matchdays-after-20251130t230000z/random-40-seed-20260505-prod-plus-o3-effort
Task type: slice
Primary metric: total_kicktipp_points
Alpha: 0.0500

Significance is assessed with a paired Wilcoxon signed-rank test on the per-item Kicktipp-point differences; bootstrap confidence intervals summarize the mean and median paired differences.
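
To make the bootstrap step concrete, here is a minimal sketch of a percentile bootstrap confidence interval for the mean paired difference. The report does not say which bootstrap variant, resample count, or seed it uses, so the percentile method, `n_resamples`, `seed`, and the function name `bootstrap_ci` are assumptions, and the per-item points are made-up illustrative values, not the data behind this report.

```python
import random
import statistics


def bootstrap_ci(diffs, stat=statistics.mean, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for a statistic of paired differences.

    Assumption: plain percentile method; the report's tool may differ.
    """
    rng = random.Random(seed)
    n = len(diffs)
    # Resample the differences with replacement and recompute the statistic.
    estimates = sorted(stat(rng.choices(diffs, k=n)) for _ in range(n_resamples))
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Hypothetical per-item Kicktipp points for two runs (NOT the report's data).
run_a = [4, 0, 2, 0, 3, 0, 2, 0, 0, 4]
run_b = [4, 0, 0, 0, 3, 2, 2, 0, 0, 3]

# Paired differences: same item, run A minus run B.
diffs = [a - b for a, b in zip(run_a, run_b)]
mean_diff = statistics.mean(diffs)
low, high = bootstrap_ci(diffs)
print(f"mean difference {mean_diff:.4f}, 95% CI [{low:.4f}, {high:.4f}]")
```

Passing `stat=statistics.median` gives the corresponding interval for the median paired difference. When most items tie, as in this comparison, such a median interval can collapse to a single value.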

Dataset metadata

Random-sample dataset of 40 items on random-40-seed-20260505-prod-plus-o3-effort

Field          Value
Competition    bundesliga-2025-26
Community      pes-squad
Season         2025/2026
Slice          random-40-seed-20260505-prod-plus-o3-effort
Source Pool    all-matchdays-after-20251130t230000z
Sample Size    40
Sample Method  random-sample
Sample Seed    20260505
Scope          match-slice
Slice Kind     random-sample
Source Dataset match-predictions/bundesliga-2025-26/pes-squad
Starts After   2025-12-01T00:00:00 Europe/Berlin (+01)

Run ranking

Rank  Run          Model        Primary metric
1     o3 (medium)  o3 (medium)  51.0000
2     o3 (high)    o3 (high)    50.0000
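
The per-run averages and the mean difference reported elsewhere on this page follow directly from these totals and the 40 pairings. The short check below uses only figures from the report; the variable names are illustrative.

```python
# Totals from the run ranking and the number of pairings from the page header.
pairings = 40
totals = {"o3 (medium)": 51.0, "o3 (high)": 50.0}

# Average points per item for each run.
averages = {run: total / pairings for run, total in totals.items()}

# Difference in totals, and the same difference per paired item.
delta = totals["o3 (medium)"] - totals["o3 (high)"]
mean_difference = delta / pairings

for run, avg in averages.items():
    print(f"{run}: {avg:.4f} avg points")
print(f"delta {delta:.4f}, mean difference {mean_difference:.4f}")
```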

Two-run comparison

not significant
Better run: o3 (medium)
Other run: o3 (high)
total_kicktipp_points delta: 1.0000
Wilcoxon p-value: 0.8852
Mean difference: 0.0250
Median difference: 0.0000
Per-item W/T/L: 5/32/3
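
For readers who want to see how a signed-rank p-value is obtained from paired differences, here is a rough sketch of a two-sided Wilcoxon signed-rank test using the normal approximation. It assumes zero differences are dropped, tied absolute differences receive average ranks, and no continuity correction is applied; the report does not state which variant it uses, so this sketch is not expected to reproduce the 0.8852 above. The function name and the example differences are hypothetical.

```python
import math


def wilcoxon_signed_rank(diffs):
    """Return (W+, two-sided p) via the normal approximation.

    Assumptions: zeros dropped, average ranks for ties, no continuity correction.
    """
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    if n == 0:
        return 0.0, 1.0  # every pair tied: no evidence of a difference

    ordered = sorted(nonzero, key=abs)
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the group of equal absolute differences starting at i.
        j = i
        while j < n and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg_rank
        t = j - i
        tie_term += t**3 - t
        i = j

    # Sum of ranks belonging to positive differences.
    w_plus = sum(r for r, d in zip(ranks, ordered) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    if var == 0:
        return w_plus, 1.0
    z = (w_plus - mean) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return w_plus, p_value


# Hypothetical paired differences (NOT the report's per-item data).
example_diffs = [0, 0, 2, 0, 0, -2, 0, 0, 0, 1]
w_plus, p_value = wilcoxon_signed_rank(example_diffs)
print(f"W+ = {w_plus}, p = {p_value:.4f}")
```

With as few non-zero differences as this comparison has (5 wins and 3 losses out of 40), the normal approximation is crude, and an exact test may give a noticeably different p-value.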

Effect size confidence intervals

Statistic          Point estimate  Low      High
Mean difference    0.0250          -0.1750  0.2250
Median difference  0.0000          0.0000   0.0000

Per-item win/tie/loss counts compare paired Kicktipp points for the listed run ordering on each prepared dataset item.
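
As an illustration of how such counts are tallied, the sketch below compares two runs item by item, with the first argument playing the role of the first-listed run. The function name `win_tie_loss` and the sample points are hypothetical and do not come from the report.

```python
def win_tie_loss(points_a, points_b):
    """Count items where run A scores more than, the same as, or less than run B."""
    pairs = list(zip(points_a, points_b))
    wins = sum(a > b for a, b in pairs)
    ties = sum(a == b for a, b in pairs)
    losses = sum(a < b for a, b in pairs)
    return wins, ties, losses


# Hypothetical per-item Kicktipp points (NOT the report's data).
run_a = [4, 0, 2, 0]
run_b = [4, 2, 0, 0]
wins, ties, losses = win_tie_loss(run_a, run_b)
print(f"W/T/L {wins}/{ties}/{losses}")
```

The three counts always sum to the number of paired items, which is a quick sanity check: 5 + 32 + 3 = 40 for the comparison on this page.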