Phase 2c · v2026-06-15_090237_phase2c · Δ +0.000 vs prev train
Rich blender, 88 features (adds dew/RH/cloud/wind/pressure). Trained 2026-06-15. Metric: Test MAE (°C).
| Lead | Blend | Best single | Δ vs best |
|---|---|---|---|
| +24h | 0.255 | temp_ecmwf (0.284) | -10.1% |
| +48h | 0.303 | temp_aifs (0.310) | -2.3% |
| +72h | 0.365 | temp_aifs (0.382) | -4.4% |
| +96h | 0.423 | temp_aifs (0.464) | -8.9% |
| +120h | 0.492 | temp_aifs (0.479) | +2.7% |
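The Δ vs best column appears to be the relative change of the blend MAE against the best single model's MAE, negative when the blend is better. A minimal sketch of that arithmetic, assuming this definition (the names `delta_vs_best` and `BLEND_VS_BEST` are illustrative, not taken from the pipeline):

```python
# Relative change of the blend vs the best single model at each lead.
# ASSUMPTION: Δ vs best = (blend MAE - best single MAE) / best single MAE.
# Inputs are the rounded Test MAE values (°C) from the table above.
BLEND_VS_BEST = {
    "+24h": (0.255, 0.284),   # best single: temp_ecmwf
    "+48h": (0.303, 0.310),   # best single: temp_aifs
    "+72h": (0.365, 0.382),   # best single: temp_aifs
    "+96h": (0.423, 0.464),   # best single: temp_aifs
    "+120h": (0.492, 0.479),  # best single: temp_aifs
}


def delta_vs_best(blend_mae: float, best_mae: float) -> float:
    """Percent change of blend MAE relative to the best single MAE (negative = blend better)."""
    return 100.0 * (blend_mae - best_mae) / best_mae


# Rounded to one decimal, like the dashboard column.
deltas = {lead: round(delta_vs_best(b, s), 1) for lead, (b, s) in BLEND_VS_BEST.items()}

if __name__ == "__main__":
    for lead, d in deltas.items():
        print(f"{lead:>6}: {d:+.1f}%")
```

Recomputed from the three-decimal MAEs, this gives -10.2%, -4.5% and -8.8% at +24h, +72h and +96h instead of the listed -10.1%, -4.4% and -8.9%; the dashboard presumably works from unrounded MAEs, which is an assumption. The sign flip at +120h (blend worse than temp_aifs) is reproduced either way.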
Verify history (2 runs)
Rolling MAE (°C) from the Monday and Thursday verification runs. A per-lead cell is flagged (shown in red on the dashboard) when its rolling MAE breaches that lead's drift threshold; the verify report has the per-cell breakdown. The Version column names the trained model; a freshly promoted champion takes about 5 to 9 days to show up here.
| Run (UTC) | Version | N | +24h | +48h | +72h | +96h | +120h |
|---|---|---|---|---|---|---|---|
| | v2026-06-07_145248_phase2c | 384 | 0.507 temp_ecmwf: 0.223 | 0.450 temp_ecmwf: 0.275 | 0.224 temp_ecmwf: 0.377 | 0.276 temp_mf: 0.400 | — |
| | v2026-06-07_145248_phase2c | 48 | 0.588 temp_aifs: 0.092 | — | — | — | — |
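The drift flag described above can be sketched as a per-lead MAE over the rolling window compared against a per-lead threshold. This is a minimal illustration, not the verify job itself: the threshold values, the window selection (the caller is assumed to pass only samples inside the window), and the helper names `rolling_mae_by_lead` and `breached_leads` are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# HYPOTHETICAL lead-specific drift thresholds in °C, keyed by lead hours.
# The real thresholds belong to the verify job and are not shown on this page.
DRIFT_THRESHOLDS = {24: 0.45, 48: 0.50, 72: 0.55, 96: 0.60, 120: 0.70}


def rolling_mae_by_lead(samples):
    """Mean absolute error per lead over one rolling window.

    samples: iterable of (lead_hours, forecast_c, observed_c) tuples, assumed
    to be already restricted to the rolling window.
    """
    errors = defaultdict(list)
    for lead, forecast, observed in samples:
        errors[lead].append(abs(forecast - observed))
    return {lead: mean(errs) for lead, errs in errors.items()}


def breached_leads(mae_by_lead, thresholds=DRIFT_THRESHOLDS):
    """Leads whose rolling MAE exceeds that lead's own threshold (the flagged cells)."""
    return sorted(
        lead for lead, mae in mae_by_lead.items()
        if lead in thresholds and mae > thresholds[lead]
    )


if __name__ == "__main__":
    window = [
        (24, 20.5, 20.0),  # |error| = 0.5
        (24, 18.0, 18.6),  # |error| = 0.6
        (48, 21.0, 21.2),  # |error| = 0.2
    ]
    maes = rolling_mae_by_lead(window)
    print(maes, breached_leads(maes))
```

With this sample window, +24h (MAE ≈ 0.55 °C) breaches its hypothetical 0.45 °C threshold and +48h (≈ 0.20 °C) does not. Keeping a separate threshold per lead reflects that MAE naturally grows with lead, so a single cutoff would either miss drift at short leads or fire constantly at long ones.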
By actual NWP forecast lead (6h buckets)
Same data, grouped by actual forecast lead (ValidTime − freshest contributing NWP cycle, in 6h buckets) instead of by the trained-lead label. This reveals the MAE structure within a trained-lead bucket now that predictions are spread to hourly outputs (since 2026-05-04). Buckets start at the trained lead. Earlier versions of this view measured lead from the cron-fire time, which made offset-day models look like sub-lead forecasts.
| Run (UTC) | 24-29h | 30-35h | 36-41h | 42-47h | 48-53h | 54-59h | 60-65h | 66-71h | 72-77h | 78-83h | 84-89h | 90-95h | 96-101h | 102-107h |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 0.506 | 0.460 | 0.561 | 0.612 | 0.479 | 0.407 | 0.304 | 0.481 | 0.211 | 0.287 | 0.241 | 0.220 | 0.254 | 0.520 |
| | 0.582 | 0.656 | — | — | — | — | — | — | — | — | — | — | — | — |
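The regrouping by actual lead can be sketched as follows: take the freshest initialization time among the NWP cycles that fed a prediction, subtract it from the valid time, and drop the result into 6h buckets starting at 24h. This is a minimal sketch under stated assumptions; the record layout, the helper names (`actual_lead_hours`, `bucket_label`, `mae_by_bucket`), and the choice to leave leads under 24h unbucketed are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from statistics import mean

BUCKET_WIDTH_H = 6         # 6h buckets, as in the table
FIRST_BUCKET_START_H = 24  # buckets start at the first trained lead (24h)


def actual_lead_hours(valid_time, contributing_cycles):
    """Hours from the freshest contributing NWP cycle to the valid time."""
    freshest = max(contributing_cycles)
    return (valid_time - freshest) / timedelta(hours=1)


def bucket_label(lead_h, start=FIRST_BUCKET_START_H, width=BUCKET_WIDTH_H):
    """Map a lead in hours to a label such as '30-35h'.

    ASSUMPTION: leads shorter than the first trained lead get no bucket.
    """
    if lead_h < start:
        return None
    lo = start + int((lead_h - start) // width) * width
    return f"{lo}-{lo + width - 1}h"


def mae_by_bucket(records):
    """records: iterable of (valid_time, contributing_cycles, forecast_c, observed_c)."""
    errors = defaultdict(list)
    for valid_time, cycles, forecast, observed in records:
        label = bucket_label(actual_lead_hours(valid_time, cycles))
        if label is not None:
            errors[label].append(abs(forecast - observed))
    return {label: mean(errs) for label, errs in errors.items()}


if __name__ == "__main__":
    utc = timezone.utc
    cycles = [datetime(2026, 6, 9, 0, tzinfo=utc), datetime(2026, 6, 9, 6, tzinfo=utc)]
    records = [
        (datetime(2026, 6, 10, 12, tzinfo=utc), cycles, 20.0, 20.5),  # lead 30h
        (datetime(2026, 6, 10, 17, tzinfo=utc), cycles, 19.0, 18.0),  # lead 35h
        (datetime(2026, 6, 10, 6, tzinfo=utc), cycles, 15.25, 15.0),  # lead 24h
    ]
    print(mae_by_bucket(records))  # {'30-35h': 0.75, '24-29h': 0.25}
```

Measuring from the cron-fire time instead shortens every lead: if the cron had fired at 2026-06-09 18:00 UTC, the first record would appear to be an 18h forecast, below the 24h trained lead, even though its freshest NWP input was initialized 30h before the valid time.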