WeatherBlend

Multi-model forecast blending for Bonehill Rocks, Dartmoor

Dry-window models

Each model predicts, per (station, window), P(at least one N-hour dry block in 09–18 local time). Two phases are shown: Phase 3b, a 53-feature LightGBM champion, and Phase 3g, a parameter-free Monte Carlo (MC) over Phase 3a's hourly P(wet), which guarantees cross-window monotonicity by construction. The metric is the Brier score; lower is better. The Δ column compares the blend to the best single NWP on the same test slice; negative means the blend wins.
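As a rough illustration of the two numbers reported in each table, the sketch below computes a Brier score and a relative Δ. The function names are hypothetical, the toy forecasts are made up, and the Δ formula (relative change against the best single model) is an assumption that happens to be consistent with the rounded values in the tables; it is not taken from the WeatherBlend code.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or len(probs) == 0:
        raise ValueError("probs and outcomes must be non-empty and of equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def delta_vs_best(blend_brier: float, best_single_brier: float) -> float:
    """Relative change of the blend against the best single model, in percent.

    Assumed formula: 100 * (blend - best) / best. Negative means the blend wins.
    """
    return 100.0 * (blend_brier - best_single_brier) / best_single_brier


# Toy forecasts and outcomes, made up for illustration only.
toy_probs = [0.9, 0.2, 0.7, 0.4]
toy_outcomes = [1, 0, 1, 1]
toy_score = brier_score(toy_probs, toy_outcomes)

# Rounded values as they appear in a table row: blend 0.121, best single 0.242.
row_delta = delta_vs_best(0.121, 0.242)
print(f"toy Brier = {toy_score:.3f}, row delta = {row_delta:.1f}%")
```

Because the tables show Brier scores rounded to three decimals, recomputing Δ from the displayed values can differ from the displayed Δ by a few tenths of a percentage point.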

Dry window — Bellever Dartmoor — 3-hour

Phase 3b · v2026-04-29_120905

Per-(station, window) classifier for whether at least one N-hour dry block occurs in the target UTC day. Trained 2026-04-29. Metric: Test Brier.
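The target described above can be sketched as a longest-run check over hourly wet/dry flags. This is only an illustration: the function names and the example hours are hypothetical, and how the pipeline actually decides that an hour is "wet" is not stated on this page.

```python
from typing import Sequence


def longest_dry_run(hourly_wet: Sequence[bool]) -> int:
    """Length, in hours, of the longest consecutive run of dry hours."""
    longest = 0
    current = 0
    for wet in hourly_wet:
        # A wet hour resets the current run; a dry hour extends it.
        current = 0 if wet else current + 1
        longest = max(longest, current)
    return longest


def has_dry_window(hourly_wet: Sequence[bool], window_hours: int) -> bool:
    """True if at least one dry block of window_hours consecutive hours exists."""
    return longest_dry_run(hourly_wet) >= window_hours


# Made-up example: True marks a wet hour, False a dry hour.
example_hours = [True, False, False, False, True, False, False, False, False, True]
labels = {n: has_dry_window(example_hours, n) for n in (3, 4, 6)}
```

For this example the longest dry run is four hours, so the 3-hour and 4-hour labels are positive and the 6-hour label is negative.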

Lead Blend Best single Δ vs best
+24h 0.117 has_dry_window_ecmwf (0.242) -51.6%
+48h 0.121 has_dry_window_ecmwf (0.242) -50.0%
+72h 0.121 has_dry_window_gfs (0.220) -45.1%
Verify history (1 run)

Twice-weekly Brier/MAE on the held-out rolling window: one row per verify run, with the drift flag in the last column. Metric: Brier. The Version column names the trained model that produced the row's numbers. A freshly retrained champion shows zero rows here for ~5-9d (one verify cycle plus 5d ERA5 latency), so a row labelled with an older version is the previous lineage's history under the same phase.

Run (UTC) Version +24h +48h +72h Drift
v2026-04-23_101107, v2026-04-27_192657 0.000 0.000 0.000

Phase 3g · v2026-05-03_152442_phase3g

Parameter-free MC — 10,000 Bernoulli draws over Phase 3a's hourly P(wet); the prediction is the fraction whose longest dry run reaches the window length. Cross-window monotonicity holds by construction. Trained 2026-05-03. Metric: Test Brier.
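A minimal sketch of the Monte Carlo step described above, using only the standard library. Several things here are assumptions rather than facts from this page: the function name, the seed, the example probabilities, independence between hours, and the choice to score every window length from the same set of draws. Sharing the draws is one simple way to make the estimates monotone across windows, since any draw whose longest dry run reaches 6 hours also reaches 4 and 3.

```python
import random
from typing import Dict, Sequence


def mc_dry_window_probs(
    hourly_p_wet: Sequence[float],
    windows: Sequence[int] = (3, 4, 6),
    n_draws: int = 10_000,
    seed: int = 0,
) -> Dict[int, float]:
    """Estimate P(longest dry run >= window) for each window from shared draws."""
    rng = random.Random(seed)
    hits = {w: 0 for w in windows}
    for _ in range(n_draws):
        longest = 0
        current = 0
        for p_wet in hourly_p_wet:
            # One Bernoulli draw per hour (assumed independent between hours).
            if rng.random() < p_wet:
                current = 0
            else:
                current += 1
                longest = max(longest, current)
        # Score every window against the same simulated day.
        for w in windows:
            if longest >= w:
                hits[w] += 1
    return {w: hits[w] / n_draws for w in windows}


# Made-up hourly P(wet) values for a nine-hour span, for illustration only.
example_p_wet = [0.1, 0.1, 0.2, 0.5, 0.6, 0.3, 0.1, 0.1, 0.1]
example_probs = mc_dry_window_probs(example_p_wet)
```

With shared draws, `example_probs[3] >= example_probs[4] >= example_probs[6]` holds for any input, which is the cross-window ordering the card refers to.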

Lead Blend Best single Δ vs best
+24h 0.108 (0.000 val)
+48h 0.114 (0.000 val)
+72h 0.134 (0.000 val)
Verify history (no runs yet)

No verify rows on disk match this card's phase (3g). Either the next verify (twice-weekly Mon + Thu 09:30 UTC, then 5d ERA5 latency) hasn't yet scored predictions made by this version, or older verify files used a different phase tag for this lineage. Re-check after the next Mon/Thu cycle.
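To estimate when a given prediction can first be scored, the schedule above can be sketched as follows. The function name is hypothetical, and the sketch assumes that the 5d ERA5 latency is counted from 00:00 UTC on the target day and that a verify run scores every target day whose ERA5 data is available by then; the real pipeline may count differently.

```python
from datetime import date, datetime, time, timedelta, timezone

VERIFY_WEEKDAYS = {0, 3}  # Monday and Thursday (date.weekday() numbering)
VERIFY_TIME = time(9, 30, tzinfo=timezone.utc)
ERA5_LATENCY = timedelta(days=5)


def first_scoring_run(target_day: date) -> datetime:
    """First Mon/Thu 09:30 UTC verify run at or after target_day + ERA5 latency."""
    earliest = (
        datetime.combine(target_day, time(0, 0), tzinfo=timezone.utc) + ERA5_LATENCY
    )
    candidate_day = earliest.date()
    while True:
        run = datetime.combine(candidate_day, VERIFY_TIME)
        if candidate_day.weekday() in VERIFY_WEEKDAYS and run >= earliest:
            return run
        candidate_day += timedelta(days=1)


# Example: a prediction whose target day is Monday 2026-05-04.
example_run = first_scoring_run(date(2026, 5, 4))
print(example_run.isoformat())
```

Under these assumptions the wait from the start of the target day to its first scoring run falls between roughly 5 and 9 days, depending on the weekday.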

Dry window — Bellever Dartmoor — 4-hour

Phase 3b · v2026-04-29_121003

Per-(station, window) classifier for whether at least one N-hour dry block occurs in the target UTC day. Trained 2026-04-29. Metric: Test Brier.

Lead Blend Best single Δ vs best
+24h 0.120 has_dry_window_aifs (0.192) -37.6%
+48h 0.125 has_dry_window_ecmwf (0.218) -42.8%
+72h 0.154 has_dry_window_ecmwf (0.260) -40.9%
Verify history (1 run)

Twice-weekly Brier/MAE on the held-out rolling window: one row per verify run, with the drift flag in the last column. Metric: Brier. The Version column names the trained model that produced the row's numbers. A freshly retrained champion shows zero rows here for ~5-9d (one verify cycle plus 5d ERA5 latency), so a row labelled with an older version is the previous lineage's history under the same phase.

Run (UTC) Version +24h +48h +72h Drift
v2026-04-23_101150, v2026-04-27_192749 0.000 0.000 0.000

Phase 3g · v2026-05-03_152503_phase3g

Parameter-free MC — 10,000 Bernoulli draws over Phase 3a's hourly P(wet); the prediction is the fraction whose longest dry run reaches the window length. Cross-window monotonicity holds by construction. Trained 2026-05-03. Metric: Test Brier.

Lead Blend Best single Δ vs best
+24h 0.115 (0.000 val)
+48h 0.116 (0.000 val)
+72h 0.158 (0.000 val)
Verify history (no runs yet)

No verify rows on disk match this card's phase (3g). Either the next verify (twice-weekly Mon + Thu 09:30 UTC, then 5d ERA5 latency) hasn't yet scored predictions made by this version, or older verify files used a different phase tag for this lineage. Re-check after the next Mon/Thu cycle.

Dry window — Bellever Dartmoor — 6-hour

Phase 3b · v2026-05-03_120517

Per-(station, window) classifier for whether at least one N-hour dry block occurs in the target UTC day. Trained 2026-05-03. Metric: Test Brier.

Lead Blend Best single Δ vs best
+24h 0.119 has_dry_window_mf (0.207) -42.5%
+48h 0.140 has_dry_window_aifs (0.228) -38.7%
+72h 0.177 has_dry_window_ecmwf (0.256) -30.8%
Verify history (1 run)

Twice-weekly Brier/MAE on the held-out rolling window: one row per verify run, with the drift flag in the last column. Metric: Brier. The Version column names the trained model that produced the row's numbers. A freshly retrained champion shows zero rows here for ~5-9d (one verify cycle plus 5d ERA5 latency), so a row labelled with an older version is the previous lineage's history under the same phase.

Run (UTC) Version +24h +48h +72h Drift
v2026-04-23_101214, v2026-04-27_182428, v2026-04-27_192839 0.001 0.000 0.001

Phase 3g · v2026-05-03_152525_phase3g

Parameter-free MC — 10,000 Bernoulli draws over Phase 3a's hourly P(wet); the prediction is the fraction whose longest dry run reaches the window length. Cross-window monotonicity holds by construction. Trained 2026-05-03. Metric: Test Brier.

Lead Blend Best single Δ vs best
+24h 0.108 (0.000 val)
+48h 0.108 (0.000 val)
+72h 0.128 (0.000 val)
Verify history (no runs yet)

No verify rows on disk match this card's phase (3g). Either the next verify (twice-weekly Mon + Thu 09:30 UTC, then 5d ERA5 latency) hasn't yet scored predictions made by this version, or older verify files used a different phase tag for this lineage. Re-check after the next Mon/Thu cycle.

Dry window — Bovey Tracey — 3-hour

Phase 3b · v2026-05-03_233816

Per-(station, window) classifier for whether at least one N-hour dry block occurs in the target UTC day. Trained 2026-05-03. Metric: Test Brier.

Lead Blend Best single Δ vs best
+24h 0.064 has_dry_window_icon (0.111) -42.2%
+48h 0.063 has_dry_window_icon (0.145) -56.8%
+72h 0.075 has_dry_window_gfs (0.155) -51.8%
Verify history (no runs yet)

No verify rows on disk match this card's phase (3b). Either the next verify (twice-weekly Mon + Thu 09:30 UTC, then 5d ERA5 latency) hasn't yet scored predictions made by this version, or older verify files used a different phase tag for this lineage. Re-check after the next Mon/Thu cycle.

Phase 3g · v2026-05-03_234315_phase3g

Parameter-free MC — 10,000 Bernoulli draws over Phase 3a's hourly P(wet); the prediction is the fraction whose longest dry run reaches the window length. Cross-window monotonicity holds by construction. Trained 2026-05-03. Metric: Test Brier.

Lead Blend Best single Δ vs best
+24h 0.062 (0.000 val)
+48h 0.070 (0.000 val)
+72h 0.075 (0.000 val)
Verify history (no runs yet)

No verify rows on disk match this card's phase (3g). Either the next verify (twice-weekly Mon + Thu 09:30 UTC, then 5d ERA5 latency) hasn't yet scored predictions made by this version, or older verify files used a different phase tag for this lineage. Re-check after the next Mon/Thu cycle.

Dry window — Bovey Tracey — 4-hour

Phase 3b · v2026-05-03_233924

Per-(station, window) classifier for whether at least one N-hour dry block occurs in the target UTC day. Trained 2026-05-03. Metric: Test Brier.

Lead Blend Best single Δ vs best
+24h 0.089 has_dry_window_gfs (0.214) -58.5%
+48h 0.106 has_dry_window_mf (0.179) -41.2%
+72h 0.108 has_dry_window_gfs (0.241) -55.3%
Verify history (no runs yet)

No verify rows on disk match this card's phase (3b). Either the next verify (twice-weekly Mon + Thu 09:30 UTC, then 5d ERA5 latency) hasn't yet scored predictions made by this version, or older verify files used a different phase tag for this lineage. Re-check after the next Mon/Thu cycle.

Phase 3g · v2026-05-03_234420_phase3g

Parameter-free MC — 10,000 Bernoulli draws over Phase 3a's hourly P(wet); the prediction is the fraction whose longest dry run reaches the window length. Cross-window monotonicity holds by construction. Trained 2026-05-03. Metric: Test Brier.

Lead Blend Best single Δ vs best
+24h 0.099 (0.000 val)
+48h 0.113 (0.000 val)
+72h 0.116 (0.000 val)
Verify history (no runs yet)

No verify rows on disk match this card's phase (3g). Either the next verify (twice-weekly Mon + Thu 09:30 UTC, then 5d ERA5 latency) hasn't yet scored predictions made by this version, or older verify files used a different phase tag for this lineage. Re-check after the next Mon/Thu cycle.

Dry window — Bovey Tracey — 6-hour

Phase 3b · v2026-05-03_234027

Per-(station, window) classifier for whether at least one N-hour dry block occurs in the target UTC day. Trained 2026-05-03. Metric: Test Brier.

Lead Blend Best single Δ vs best
+24h 0.135 has_dry_window_mf (0.162) -16.9%
+48h 0.144 has_dry_window_mf (0.214) -32.7%
+72h 0.173 has_dry_window_mf (0.259) -33.1%
Verify history (no runs yet)

No verify rows on disk match this card's phase (3b). Either the next verify (twice-weekly Mon + Thu 09:30 UTC, then 5d ERA5 latency) hasn't yet scored predictions made by this version, or older verify files used a different phase tag for this lineage. Re-check after the next Mon/Thu cycle.

Phase 3g · v2026-05-03_234523_phase3g

Parameter-free MC — 10,000 Bernoulli draws over Phase 3a's hourly P(wet); the prediction is the fraction whose longest dry run reaches the window length. Cross-window monotonicity holds by construction. Trained 2026-05-03. Metric: Test Brier.

Lead Blend Best single Δ vs best
+24h 0.096 (0.000 val)
+48h 0.109 (0.000 val)
+72h 0.128 (0.000 val)
Verify history (no runs yet)

No verify rows on disk match this card's phase (3g). Either the next verify (twice-weekly Mon + Thu 09:30 UTC, then 5d ERA5 latency) hasn't yet scored predictions made by this version, or older verify files used a different phase tag for this lineage. Re-check after the next Mon/Thu cycle.

Dry window — Dartmoor Nr Hexworthy — 3-hour

Phase 3b · v2026-04-29_121459

Per-(station, window) classifier for whether at least one N-hour dry block occurs in the target UTC day. Trained 2026-04-29. Metric: Test Brier.

Lead Blend Best single Δ vs best
+24h 0.127 has_dry_window_jma (0.192) -33.7%
+48h 0.156 has_dry_window_aifs (0.272) -42.6%
+72h 0.143 has_dry_window_jma (0.208) -31.2%
Verify history (no runs yet)

No verify rows on disk match this card's phase (3b). Either the next verify (twice-weekly Mon + Thu 09:30 UTC, then 5d ERA5 latency) hasn't yet scored predictions made by this version, or older verify files used a different phase tag for this lineage. Re-check after the next Mon/Thu cycle.

Phase 3g · v2026-05-03_152650_phase3g

Parameter-free MC — 10,000 Bernoulli draws over Phase 3a's hourly P(wet); the prediction is the fraction whose longest dry run reaches the window length. Cross-window monotonicity holds by construction. Trained 2026-05-03. Metric: Test Brier.

Lead Blend Best single Δ vs best
+24h 0.123 (0.000 val)
+48h 0.142 (0.000 val)
+72h 0.151 (0.000 val)
Verify history (no runs yet)

No verify rows on disk match this card's phase (3g). Either the next verify (twice-weekly Mon + Thu 09:30 UTC, then 5d ERA5 latency) hasn't yet scored predictions made by this version, or older verify files used a different phase tag for this lineage. Re-check after the next Mon/Thu cycle.

Dry window — Dartmoor Nr Hexworthy — 4-hour

Phase 3b · v2026-04-29_121554

Per-(station, window) classifier for whether at least one N-hour dry block occurs in the target UTC day. Trained 2026-04-29. Metric: Test Brier.

Lead Blend Best single Δ vs best
+24h 0.132 has_dry_window_aifs (0.224) -41.3%
+48h 0.175 has_dry_window_ecmwf (0.234) -25.4%
+72h 0.171 has_dry_window_aifs (0.264) -35.2%
Verify history (no runs yet)

No verify rows on disk match this card's phase (3b). Either the next verify (twice-weekly Mon + Thu 09:30 UTC, then 5d ERA5 latency) hasn't yet scored predictions made by this version, or older verify files used a different phase tag for this lineage. Re-check after the next Mon/Thu cycle.

Phase 3g · v2026-05-03_152711_phase3g

Parameter-free MC — 10,000 Bernoulli draws over Phase 3a's hourly P(wet); the prediction is the fraction whose longest dry run reaches the window length. Cross-window monotonicity holds by construction. Trained 2026-05-03. Metric: Test Brier.

Lead Blend Best single Δ vs best
+24h 0.127 (0.000 val)
+48h 0.152 (0.000 val)
+72h 0.168 (0.000 val)
Verify history (no runs yet)

No verify rows on disk match this card's phase (3g). Either the next verify (twice-weekly Mon + Thu 09:30 UTC, then 5d ERA5 latency) hasn't yet scored predictions made by this version, or older verify files used a different phase tag for this lineage. Re-check after the next Mon/Thu cycle.

Dry window — Dartmoor Nr Hexworthy — 6-hour

Phase 3b · v2026-05-03_120600

Per-(station, window) classifier for whether at least one N-hour dry block occurs in the target UTC day. Trained 2026-05-03. Metric: Test Brier.

Lead Blend Best single Δ vs best
+24h 0.165 has_dry_window_ecmwf (0.223) -26.0%
+48h 0.175 has_dry_window_ecmwf (0.281) -37.7%
+72h 0.200 has_dry_window_ecmwf (0.264) -24.3%
Verify history (no runs yet)

No verify rows on disk match this card's phase (3b). Either the next verify (twice-weekly Mon + Thu 09:30 UTC, then 5d ERA5 latency) hasn't yet scored predictions made by this version, or older verify files used a different phase tag for this lineage. Re-check after the next Mon/Thu cycle.

Phase 3g · v2026-05-03_152732_phase3g

Parameter-free MC — 10,000 Bernoulli draws over Phase 3a's hourly P(wet); the prediction is the fraction whose longest dry run reaches the window length. Cross-window monotonicity holds by construction. Trained 2026-05-03. Metric: Test Brier.

Lead Blend Best single Δ vs best
+24h 0.165 (0.000 val)
+48h 0.164 (0.000 val)
+72h 0.185 (0.000 val)
Verify history (no runs yet)

No verify rows on disk match this card's phase (3g). Either the next verify (twice-weekly Mon + Thu 09:30 UTC, then 5d ERA5 latency) hasn't yet scored predictions made by this version, or older verify files used a different phase tag for this lineage. Re-check after the next Mon/Thu cycle.