WeatherBlend

Multi-model forecast blending

About WeatherBlend

WeatherBlend is a multi-model weather-forecast blending proof of concept for (0.0000°, 0.0000°, 0m). Eight numerical weather-prediction (NWP) models feed in via Open-Meteo: NOAA GFS, ECMWF IFS, DWD ICON, Météo-France, UK Met Office (UKV+UM Global blended), Environment Canada GEM, ECMWF AIFS (the GraphCast-style AI model), and JMA Global. Their predictions are blended with LightGBM models trained against ERA5 reanalysis or per-station EA Hydrology rainfall, depending on the target. The Met Office DataHub Spot product is also plotted alongside the blenders on the temperature and rain skill pages as a sanity check.

What it predicts

Temperature at 2 m: leads 24 / 48 / 72 / 96 / 120 h, blended against ERA5 reanalysis at the Bonehill grid cell. Two flavours ship side by side: Phase 2b lean (13-feature champion) and Phase 2c rich (88-feature challenger, which adds per-NWP humidity / cloud / wind / pressure secondaries).
Precipitation occurrence, P(wet ≥ 0.1 mm/h) per hour: one classifier per EA gauge (Bellever Dartmoor, Bovey Tracey, Dartmoor nr Hexworthy) at the same five leads. Two flavours again: Phase 3a lean (27 features) and Phase 3c rich (55 features, adding per-NWP humidity, surface pressure, and EA trailing-rainfall persistence). Truth comes from EA Hydrology 15-min tip readings, aggregated to hourly with a 4-of-4 reading gate.
Dry-window probability per UTC day: P(∃ contiguous N-hour dry block in 09:00–18:00 local time) for N ∈ {3, 4, 6} hours at leads 24 / 48 / 72 h, per station. Phase 3b (53-feature LightGBM champion) and Phase 3p ship side by side. Phase 3p is a Gaussian copula Monte Carlo over Phase 3o's hourly P(wet) marginals: 20,000 draws per row through a single empirical Σ per station, fit on train-split observed daytime wet/dry sequences. The copula sampler captures within-day wet/dry autocorrelation that an iid sampler misses. It also reads all three window indicators off the same MC pass, so cross-window monotonicity P(N=3) ≥ P(N=4) ≥ P(N=6) holds by construction.
Start-hour curve: for each (station, window, lead, day) and each candidate start hour within the daytime window, P(an N-hour dry block runs from that hour). Derived from the same 3p MC pass as the dry-window probability. Each start hour gets its own marginal probability; because the candidate windows overlap, the curve need not sum to the daily "any block" figure. The curve sits alongside each window's dry-window cards.
Feels-like: Bröde 2012 UTCI and Steadman 1994 shade-form apparent temperature. Both are derived at predict time (no separate model training) from the temperature blender plus four element blenders (humidity, wind, shortwave radiation, cloud cover). UTCI is the rigorous biothermal index; Steadman is the BBC/BoM "feels like" figure the public knows.
Confidence tags. Conformal calibrators (split-conformal, α = 0.10 → 90% coverage) wrap every active P(wet) and dry-window blender. They're auto-fit on the validation slice the moment a champion or challenger is promoted, so a freshly-retrained version always ships with calibrators in place. Each forecast hour or window carries a "confident wet" / "ambiguous" / "confident dry" tag based on which prediction-set the calibrator places it in.
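
To make the dry-window and start-hour items above concrete, here is a minimal sketch of a Gaussian copula Monte Carlo pass. It is not the project's code. The function names, the AR(1)-shaped correlation matrix and the example marginals are all assumptions for illustration; the text says the real Σ is empirical and fit per station. It uses only the Python standard library, so it is far slower than a vectorised version would be.

```python
import math
import random
from statistics import NormalDist


def ar1_sigma(n, rho):
    """Hypothetical stand-in for the fitted matrix: corr(i, j) = rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]


def cholesky(sigma):
    """Lower-triangular factor of a positive-definite matrix."""
    n = len(sigma)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(sigma[i][i] - acc)
            else:
                low[i][j] = (sigma[i][j] - acc) / low[j][j]
    return low


def dry_window_mc(p_wet, sigma, windows=(3, 4, 6), draws=20_000, seed=0):
    """Return (P(any N-hour dry block), start-hour curve) per window length N."""
    rng = random.Random(seed)
    n = len(p_wet)
    low = cholesky(sigma)
    std_normal = NormalDist()
    # Hour h is wet when its latent normal falls below Phi^-1(p_wet[h]),
    # which preserves each hourly marginal. Clip so inv_cdf stays finite.
    cut = [std_normal.inv_cdf(min(max(p, 1e-9), 1 - 1e-9)) for p in p_wet]
    any_hits = {w: 0 for w in windows}
    start_hits = {w: [0] * (n - w + 1) for w in windows}
    for _ in range(draws):
        eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = [sum(low[i][k] * eps[k] for k in range(i + 1)) for i in range(n)]
        # run[h] = length of the dry run that begins at hour h (0 if h is wet)
        run = [0] * (n + 1)
        for h in range(n - 1, -1, -1):
            run[h] = run[h + 1] + 1 if z[h] >= cut[h] else 0
        # Every window length reads the same draw, so a 6 h block implies a
        # 4 h and a 3 h block in that draw and the estimates stay monotone.
        for w in windows:
            starts = [s for s in range(n - w + 1) if run[s] >= w]
            for s in starts:
                start_hits[w][s] += 1
            any_hits[w] += bool(starts)
    p_any = {w: any_hits[w] / draws for w in windows}
    curve = {w: [c / draws for c in start_hits[w]] for w in windows}
    return p_any, curve


if __name__ == "__main__":
    # Nine hourly slots for 09:00-18:00; these marginals are made up.
    p_wet = [0.30, 0.35, 0.40, 0.40, 0.30, 0.20, 0.15, 0.10, 0.10]
    p_any, curve = dry_window_mc(p_wet, ar1_sigma(len(p_wet), 0.7))
    print(p_any)
    print(curve[3])
```

Because overlapping start hours can all succeed in one draw, the entries of each curve can add up to more than the matching "any block" probability, while no single entry can exceed it.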

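The confidence tags described above can be sketched with a textbook split-conformal recipe. This is an assumption about the mechanics, not the project's calibrator: the score used here (1 minus the probability given to the true class), the function names and the calibration data are illustrative only. The sketch fits one quantile on a held-out slice at α = 0.10, then tags a forecast by which labels survive in its prediction set.

```python
import math


def fit_threshold(p_wet_cal, y_cal, alpha=0.10):
    """Split-conformal quantile of the scores 1 - p(true class)."""
    scores = sorted(
        1.0 - (p if y == 1 else 1.0 - p) for p, y in zip(p_wet_cal, y_cal)
    )
    n = len(scores)
    rank = math.ceil((n + 1) * (1.0 - alpha))  # 1-based, finite-sample corrected
    if rank > n:
        return 1.0  # too few calibration rows: every label stays in the set
    return scores[rank - 1]


def tag(p_wet, q_hat):
    """Map one forecast probability to a confidence tag via its prediction set."""
    wet_in_set = (1.0 - p_wet) <= q_hat
    dry_in_set = p_wet <= q_hat
    if wet_in_set and not dry_in_set:
        return "confident wet"
    if dry_in_set and not wet_in_set:
        return "confident dry"
    return "ambiguous"  # both labels (or neither) survive


if __name__ == "__main__":
    # Made-up validation slice: forecast P(wet) and observed wet (1) / dry (0).
    p_cal = [0.9, 0.8, 0.1, 0.2, 0.7, 0.3, 0.95, 0.05, 0.6, 0.4]
    y_cal = [1, 1, 0, 0, 1, 0, 1, 0, 1, 0]
    q_hat = fit_threshold(p_cal, y_cal)
    for p in (0.95, 0.5, 0.02):
        print(p, tag(p, q_hat))
```

Refitting the threshold whenever a model is promoted, as the text describes, would amount to calling the fitting step again on that model's validation predictions.
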
Data sources

Forecasts: Open-Meteo (live + historical-forecast API) provides every NWP listed above through one consistent JSON interface; Met Office DataHub Spot adds a ninth deterministic forecast as a comparator (not a blender input).
Training truth (temperature + element blenders): ERA5 reanalysis via Open-Meteo (gapless, quantitative, ~5-day publication lag).
Training + verification truth (precipitation, dry window): Environment Agency Hydrology rainfall gauges (Bellever Dartmoor, Bovey Tracey, Dartmoor nr Hexworthy), 15-min tips aggregated to hourly with a 4-of-4 reading gate.
Verification cross-checks (temperature): METAR EGTE from aviationweather.gov (Exeter Airport, ~30 km E of Bonehill, 31 m elevation), and Met Office DataHub Land Observations at geohash gcj0z3 (Cocktree Throat / Taw Green near North Wyke, ~22 km NNW, ~120-150 m elevation). Both sit well below Bonehill's 393 m, so both carry a systematic warm bias; Taw Green's is smaller (~1.6 °C lapse-rate estimate vs ~2.4 °C for EGTE). Both are used as cross-checks, not as metrics we tune to.
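
The lapse-rate estimates quoted for the two cross-check stations can be reproduced with simple arithmetic. The sketch below assumes the standard-atmosphere lapse rate of 6.5 °C per km, which the text does not state; the function name is hypothetical. Under that assumption the elevation gap gives about 2.4 °C for EGTE and roughly 1.6 to 1.8 °C for Taw Green, depending on which end of its 120-150 m range is used.

```python
LAPSE_RATE_C_PER_KM = 6.5  # assumed standard-atmosphere value, not from the text
BONEHILL_M = 393.0         # target elevation quoted in the text


def warm_bias_c(station_elev_m, target_elev_m=BONEHILL_M, lapse=LAPSE_RATE_C_PER_KM):
    """Expected warm bias (°C) of a lower station relative to the target height."""
    return (target_elev_m - station_elev_m) / 1000.0 * lapse


if __name__ == "__main__":
    stations = [
        ("EGTE", 31.0),
        ("Taw Green, upper elevation", 150.0),
        ("Taw Green, lower elevation", 120.0),
    ]
    for name, elev in stations:
        print(f"{name}: {warm_bias_c(elev):.2f} °C")
```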

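The hourly aggregation of 15-min gauge readings mentioned above can be sketched as follows. This reads the "4-of-4 reading gate" as meaning an hour is kept only when all four 15-min readings are present, and it assigns each reading to the clock hour containing its timestamp. Both are assumptions, as are the function name and the input shape; the real feed's timestamp convention is not stated in the text.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WET_THRESHOLD_MM = 0.1  # hourly wet label quoted in the text
READINGS_PER_HOUR = 4   # 15-min cadence


def hourly_truth(readings):
    """readings: iterable of (datetime, mm or None).

    Returns {hour_start: (total_mm, is_wet)} for complete hours only.
    """
    buckets = defaultdict(dict)
    for ts, mm in readings:
        if mm is None:
            continue  # a missing value counts as a missing reading
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour][ts] = mm  # keyed by timestamp so duplicates collapse
    out = {}
    for hour in sorted(buckets):
        values = list(buckets[hour].values())
        if len(values) == READINGS_PER_HOUR:  # the 4-of-4 gate
            total = sum(values)
            out[hour] = (total, total >= WET_THRESHOLD_MM)
    return out


if __name__ == "__main__":
    start = datetime(2026, 1, 1, 9, 0)
    demo = [(start + timedelta(minutes=15 * i), mm)
            for i, mm in enumerate([0.0, 0.2, 0.0, 0.0, 0.2, None, 0.0, 0.0])]
    print(hourly_truth(demo))  # the second hour is dropped: one reading missing
```

Dropping incomplete hours rather than scaling them up keeps partial totals from being labelled dry when rain may have fallen in the missing quarter-hour.
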
Pipeline

A Cloudflare Worker fires four GitHub Actions workflows on cron schedules:
collect: pulls fresh NWP forecasts + observations every 6 h (08:30 / 14:30 / 20:30 / 02:30 UTC).
predict-and-render: runs 30 min later on the same cycle, executing every blender against the freshest inputs and regenerating this static site on Cloudflare Pages.
truth-refresh: backfills the daily ERA5 + EA-rainfall truth window at 12:00 UTC.
verify: runs Mon + Thu at 09:30 UTC and flags rolling-MAE / Brier drift > 1.5× training-test score per (model version, lead), emitting JSON sidecars that feed the Models-page verify-history tables.
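
The drift rule in the verify step can be illustrated with a short sketch. It assumes the rolling score is a plain mean of recent per-row losses (absolute errors for MAE, squared probability errors for Brier) and that the training-test score is available per (model version, lead). The function name, the dictionary layout and the sidecar field names are hypothetical.

```python
import json
from statistics import mean

DRIFT_RATIO = 1.5  # threshold quoted in the text


def drift_report(recent_losses, baseline, ratio=DRIFT_RATIO):
    """Flag keys whose rolling mean loss exceeds ratio x the training-test score.

    recent_losses: {(model_version, lead_h): [loss, ...]}
    baseline:      {(model_version, lead_h): training-test score}
    """
    rows = []
    for (version, lead_h), losses in sorted(recent_losses.items()):
        base = baseline.get((version, lead_h))
        if not losses or base is None:
            continue  # nothing to compare yet, for example during warm-up
        rolling = mean(losses)
        rows.append({
            "model_version": version,
            "lead_h": lead_h,
            "rolling": rolling,
            "baseline": base,
            "drift": rolling > ratio * base,
        })
    return rows


if __name__ == "__main__":
    # Made-up absolute temperature errors in °C for one model version.
    recent = {("temp-2b", 24): [1.0, 2.0, 3.0], ("temp-2b", 48): [1.0, 1.2]}
    base = {("temp-2b", 24): 1.0, ("temp-2b", 48): 1.0}
    print(json.dumps(drift_report(recent, base), indent=2))  # sidecar-style JSON
```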

Caveats

ERA5 is a 0.25° gridded reanalysis. It represents a grid-cell average near Bonehill, not the tor itself — the blender learns the systematic offset.
METAR EGTE and Met Office obs (Taw Green) are both lowland sources well below Bonehill's 393 m. They are useful as cross-checks only: the lapse-rate bias is roughly 1.6 to 2.4 °C, and the blender doesn't try to predict either.
Rolling-MAE / Brier charts and Models-page verify-history tables warm up over the first 5-9 days post-retrain (one verify cycle plus 5-day ERA5 latency before a fresh champion's predictions reach the verify window).
Open-Meteo's historical-forecast endpoint returns best-available-per-valid-time, not rigorous "as-issued" forecasts. Good for PoC training; not publication-grade re-verification.
Met Office Spot's PoP threshold is "any measurable precip", looser than our 0.1 mm/h training label; its line on the rain skill chart reads as direction-of-effect, not like-for-like.

Source: github.com/harry1310/WeatherBlend. Rendered 2026-06-20 21:37Z.