Daily briefing
All value bets across every open market, ranked by edge. Status tells you whether to bet now or wait for fresher models.
Model freshness — current time: UTC
Analyse market
Paste a Polymarket URL to load buckets automatically, then run the forecast analysis.
Step 1 — Load from Polymarket
Paste a URL above — city, date and buckets will all fill in automatically.
Step 2 — Confirm details
Step 3 — Buckets
Temperature bucket | Market yes %
Results tracker
Paper trades logged automatically from the briefing. Results resolve overnight — check here each morning.
📋 Paper trades
Auto-logged from briefing · resolves overnight from Polymarket
Manual trade log — for real bets or manual paper trades
Bet analysis
AI-powered statistical report on your paper trade history. Filters by bet type, outcome, stars, and city. Meaningful conclusions only appear once sufficient trades are logged.
Filters
Weather Edge — User Guide
The definitive reference: how the app works, how to use it, and the betting strategy behind it.
1 · What this app does and why it works

Polymarket runs daily weather markets where traders bet on whether the temperature in a given city will land in a specific range. The market price reflects collective human judgment about the probability. Weather Edge replaces that human judgment with something better: a 31-member numerical weather prediction ensemble that produces a genuine probability distribution across outcomes.

The edge comes from a structural asymmetry that will not close: processing 31 ensemble forecasts and comparing them to market prices requires System 2 thinking — slow, deliberate, computational. The people pricing these markets use System 1 — fast, intuitive, heuristic. System 1 is incapable of running the calculation. This is not a temporary inefficiency. It is permanent, because it is cognitive.

The app automates the entire System 2 process: fetch models → compute probabilities → compare to market → size the bet → log and track. Your job is to check it twice a day and press the button.

2 · The four data sources
GFS ensemble (31 members)

The American Global Forecast System run 31 times with slightly different initial conditions. Each run produces a complete temperature forecast. We count what fraction of the 31 runs land in each bucket after rounding to whole degrees — matching Wunderground's resolution, which is how markets resolve. This is the primary betting signal. If 14 of 31 members show a daily high of 19°C, model probability = 45%.
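
The counting step described above can be sketched in a few lines. This is a minimal illustration, not the app's actual code: the function names, the bucket layout, the member values, and the half-up rounding rule are all assumptions made for the example.

```python
import math

def round_half_up(temp_c: float) -> int:
    """Round to the nearest whole degree, halves upward (assumed convention)."""
    return math.floor(temp_c + 0.5)

def bucket_probabilities(member_highs, buckets):
    """Fraction of ensemble members whose rounded high lands in each bucket.

    buckets maps a label to inclusive (low, high) whole-degree bounds;
    None marks an open-ended side.
    """
    rounded = [round_half_up(t) for t in member_highs]
    probs = {}
    for label, (low, high) in buckets.items():
        hits = sum(
            1 for r in rounded
            if (low is None or r >= low) and (high is None or r <= high)
        )
        probs[label] = hits / len(rounded)
    return probs

# Hypothetical 31-member ensemble: 14 members round to 19, 10 to 18, 7 to 20.
members = [18.6] * 14 + [18.2] * 10 + [19.7] * 7
buckets = {"<=18": (None, 18), "19": (19, 19), ">=20": (20, None)}
probs = bucket_probabilities(members, buckets)
print(f"{probs['19']:.0%}")  # 14 of 31 members
```

Python's built-in `round` rounds halves to the nearest even number, so the sketch uses an explicit half-up helper. Whether the resolution source rounds the same way is worth checking before relying on it.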

ECMWF IFS (deterministic + ensemble)

The European Centre for Medium-Range Weather Forecasts — generally considered the world's most accurate NWP system. Used as a cross-check against GFS. When both agree, confidence is high. When they diverge by more than 1.5°, caution is warranted.

GFS deterministic

The single best-estimate GFS forecast (as opposed to the ensemble). Used as a sanity check — it should be close to the ensemble mean. Large divergence between deterministic and ensemble mean is a flag.

METAR (live station)

Current observed temperature at the exact airport station the market resolves against. Most valuable for same-day markets. Requires a free CheckWX API key in your Cloudflare environment variables.

3 · Your daily workflow

The 80/20 answer: set one alarm for 07:05 BST. That single session captures the majority of available edge.

Primary — 07:05 BST daily

GFS 00Z and ECMWF 00Z are both available by ~06:00 BST, so both models are fresh at the same time. Markets have been dormant all night, which maximises the anchoring gap. European markets have not yet repriced and US traders are asleep. Run the briefing, click Log all BET NOW, and you are done in 15 minutes.

Secondary — 19:00 BST daily

GFS 12Z and ECMWF 12Z both available ~18:00 BST. Good for US markets — evening repricing often incomplete. Best window for Asian cities.

Opportunistic — ~00:30 BST

GFS 18Z available ~midnight BST. ECMWF stale. BET NOW fires on GFS alone if spread ≤1.5° and edge ≥12pp — shown with blue GFS only badge. Do not set an alarm for this.

4 · Model freshness — the three-tier status system
Tier 1 — Both GFS and ECMWF fresh

Maximum confidence. BET NOW fires when edge ≥10pp, spread tight, members sufficient. Standard case at 07:05 and 19:00 BST.

Tier 2 — GFS fresh, ECMWF stale (blue GFS only badge)

BET NOW fires only if GFS spread ≤1.5° AND edge ≥12pp. Applies at the midnight window.

Tier 3 — GFS stale

Always WAIT regardless of ECMWF freshness. GFS ensemble is the primary signal.

GFS runs at 00Z/06Z/12Z/18Z with ~5h lag. ECMWF runs at 00Z/12Z with ~5h lag. In BST (UTC+1): GFS available ~06:00/12:00/18:00/00:00. ECMWF ~06:00/18:00.
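
As a rough sketch, the tier logic can be derived from the run hours and the ~5h lag stated above. The guide does not say how long a run counts as "fresh", so the 3-hour `FRESH_WINDOW` below is a hypothetical value, and the function names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

GFS_RUN_HOURS = (0, 6, 12, 18)      # UTC initialisation hours, from the text
ECMWF_RUN_HOURS = (0, 12)
LAG = timedelta(hours=5)            # approximate publication lag, from the text
FRESH_WINDOW = timedelta(hours=3)   # hypothetical: how long a run stays "fresh"

def latest_available_run(now_utc, run_hours, lag=LAG):
    """Initialisation time of the newest run whose output is already published."""
    candidates = []
    for day_offset in (0, -1):  # today's and yesterday's runs
        day = (now_utc + timedelta(days=day_offset)).date()
        for hour in run_hours:
            init = datetime(day.year, day.month, day.day, hour, tzinfo=timezone.utc)
            if init + lag <= now_utc:
                candidates.append(init)
    return max(candidates)

def is_fresh(now_utc, run_hours):
    """True if the newest available run was published within FRESH_WINDOW."""
    available_at = latest_available_run(now_utc, run_hours) + LAG
    return now_utc - available_at <= FRESH_WINDOW

def tier(now_utc):
    """1 = both fresh, 2 = GFS fresh and ECMWF stale, 3 = GFS stale."""
    if not is_fresh(now_utc, GFS_RUN_HOURS):
        return 3
    return 1 if is_fresh(now_utc, ECMWF_RUN_HOURS) else 2

# 07:05 BST is 06:05 UTC
print(tier(datetime(2024, 6, 1, 6, 5, tzinfo=timezone.utc)))
```

Working in UTC and converting to BST only for display avoids off-by-one-hour mistakes when the clocks change.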

5 · Reading the daily briefing table

Each row is the single best opportunity in that market — the bucket with the largest model-vs-market divergence.

Dir — Direction

YES — model thinks this bucket is more likely than market implies. Buy YES shares. NO — market has overpriced this bucket. Buy NO shares.

Model% — Model probability

Fraction of GFS ensemble members landing in this bucket after rounding to whole degrees. 14 of 31 = 45%.

Mkt% — Market implied probability

Current YES price as a percentage. Edge figures are overstated by roughly 2-4pp due to fees and spread. Never bet on edges below 8pp gross.

Edge — The opportunity

Model% minus Mkt%. Green (+) = underpriced, bet YES. Red (−) = overpriced, bet NO.

Kelly — Suggested bet size

Quarter-Kelly from your bankroll, scaled by lead time and confidence stars. Always treat Kelly as a ceiling, not a target.

Spread — GFS internal uncertainty

Standard deviation of the 31 GFS members. ±0.8° = confident. ±2.5° = uncertain — consider halving Kelly.

Stars — Combined confidence rating

★★★ all conditions favourable. ★★☆ one condition marginal. ★☆☆ multiple conditions weak — speculative only.

6 · The status signals

BET NOW Edge ≥10pp · GFS fresh · spread acceptable · members sufficient.

BET NOW GFS only  Tier 2 — GFS fresh, ECMWF stale, spread ≤1.5° and edge ≥12pp.

WAIT Edge exists but one or more conditions unmet. Check again after the next model run.

PASS Edge below threshold. Not worth trading after fees.
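
The four signals can be read as a small decision function. The sketch below is one possible reading, not the app's implementation. The 10pp and 12pp edge thresholds and the 1.5° Tier 2 spread limit come from the text. The Tier 1 spread limit, the minimum member count, and the order in which the checks run are assumptions.

```python
def status(edge_pp, gfs_fresh, ecmwf_fresh, spread_deg, members,
           max_spread_tier1=2.0,   # hypothetical: "spread acceptable" is not quantified
           min_members=25):        # hypothetical: "members sufficient" is not quantified
    """Classify one briefing row as BET NOW, BET NOW (GFS only), WAIT or PASS."""
    edge = abs(edge_pp)  # YES and NO edges are treated symmetrically
    if edge < 10:
        return "PASS"                      # edge below threshold
    if not gfs_fresh:
        return "WAIT"                      # Tier 3: always wait
    if members < min_members:
        return "WAIT"
    if ecmwf_fresh:                        # Tier 1
        return "BET NOW" if spread_deg <= max_spread_tier1 else "WAIT"
    if spread_deg <= 1.5 and edge >= 12:   # Tier 2
        return "BET NOW (GFS only)"
    return "WAIT"

print(status(edge_pp=15, gfs_fresh=True, ecmwf_fresh=True, spread_deg=0.8, members=31))
```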

7 · Betting strategy — from conventional edge to Kahneman/Taleb

The people pricing these markets are not irrational. They are human. Human brains run decision-making shortcuts that create predictable, systematic, exploitable errors.

Daniel Kahneman — Thinking, Fast and Slow

System 1 (fast, intuitive) prices the market. System 2 (slow, deliberate) runs the 31-member ensemble calculation. System 1 can't do that. It will never do that. This structural gap is the permanent, non-arbitrageable core of your edge.

Nassim Taleb — The Black Swan

Markets built on human intuition systematically underprice tail events. The tail buckets in temperature markets face a double discount: statistically underpriced (Taleb) and psychologically avoided (Kahneman).

WYSIATI. The market prices what it can see: yesterday's weather, the BBC headline, the season. It cannot see 31 ensemble runs, the spread across members, or the ECMWF divergence. You can.

Anchoring. The first prices set on a market are highly sticky. The largest edges appear in the 1–2 hours after model updates, before the market has repriced.

Availability bias. After a cold spell, cold feels probable. The ensemble has no memory of last week. After unusual weather, the opposite tail is systematically underpriced.

Overconfidence — the rule for you. Never override the model based on personal weather intuition. The moment you do, you have become the market you are trying to beat.
The four opportunity types
Type A — Fresh model · stale market (WYSIATI + Anchoring)

The bread and butter. By 07:05 BST each day a fresh GFS run is available, while the market price was set by humans yesterday. The gap between them is your edge, and it closes within 1-2 hours as traders reprice. The majority of your BET NOW rows will be Type A.

Type B — Recency play · check opposite tail (Availability bias)

After unusual weather — a heatwave, cold snap — the market overweights continuation. When the model starts showing reversion the market hasn't priced, the opposite tail is systematically underpriced. Not auto-detected yet; spot manually after any sustained unusual weather run.

Type C — Tail underpriced · barbell bet (Loss aversion + Fat tails)

A tail bucket trading at very low odds (≤10%) where the ensemble shows meaningful support. Small stake, high payout. Auto-detected when market odds ≤10% and edge ≥8pp. Individually speculative — collectively exploiting a structural inefficiency that will never close.

Type D — History + model agree · highest confidence (Base rate neglect)

Where the GFS model, the ERA5 historical base rate, and the edge all point in the same direction. Two independent signals disagree with the market. The D tag fires in the Analyse tab when Hist%, Model%, and edge all agree on direction and edge ≥8pp.
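
The two auto-detected tags (C and D) could be expressed roughly as follows. The thresholds are the ones quoted above. Reading "agree on direction" as "Hist% and Model% sit on the same side of Mkt%" is an assumption, and the function name is illustrative.

```python
def opportunity_tags(model_pct, market_pct, hist_pct=None):
    """Return the auto-detectable opportunity tags for one bucket."""
    edge = model_pct - market_pct
    tags = []

    # Type C: cheap tail bucket with meaningful ensemble support
    if market_pct <= 10 and edge >= 8:
        tags.append("C")

    # Type D: history and model both disagree with the market, same direction
    if hist_pct is not None and abs(edge) >= 8:
        hist_gap = hist_pct - market_pct
        if hist_gap != 0 and (hist_gap > 0) == (edge > 0):
            tags.append("D")

    return tags

print(opportunity_tags(model_pct=20, market_pct=8, hist_pct=15))
```

Types A and B are left out because the text describes them as timing-based and manually spotted respectively.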

8 · Kelly sizing — what it is and how to use it

The Kelly criterion answers: given an edge, what fraction of bankroll should you stake to maximise long-run growth without risking ruin?

The formula

f = (p×b − q) / b, where p = model probability, q = 1−p, and b = net odds (the profit per unit staked, i.e. decimal odds minus 1). The result f is the optimal fraction of bankroll.
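
For a binary share it can help to restate the formula in terms of price. The derivation below takes b as the net payout per unit staked, which is the standard Kelly convention, and introduces m (the price paid per share, between 0 and 1) as notation for this example only. Fees and spread are ignored.

```latex
b = \frac{1 - m}{m},
\qquad
f = \frac{pb - q}{b}
  = p - \frac{(1 - p)\,m}{1 - m}
  = \frac{p - m}{1 - m}
```

With p = 0.45 and m = 0.30, full Kelly is 0.15 / 0.70 ≈ 0.214 of bankroll, so quarter-Kelly is about 0.054 before any other scaling.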

Quarter-Kelly baseline. The app uses 25% of full Kelly. Accounts for the fact that our probability estimates are uncertain.

Continuous lead-time decay. max(0.25, 1 − 0.12×(daysAhead−1)). Day 1: 100%. Day 3: 76%. Day 5: 52%. Day 7: 28%.

Divergence reducer. When GFS and ECMWF disagree: 0–1°: 100%, 1–2°: 85%, 2–3°: 70%, 3–4°: 50%, >4°: 25%.

Star multiplier. ★★★ = 100%, ★★☆ = 50%, ★☆☆ = 25% of scaled Kelly.

5% bankroll cap. Hard ceiling per trade regardless of formula output.

The honest caveat. All multipliers are derived from judgment, not empirical calibration. Until you have 200+ resolved trades and Brier score below 0.20, start real-money bets at half the displayed amount.
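
Putting the sizing rules above together gives something like the sketch below. The multiplier values are taken from the text. Combining them by straight multiplication, the treatment of band boundaries (for example exactly 1°), and the function names are assumptions.

```python
def lead_time_factor(days_ahead):
    """Continuous lead-time decay with a floor of 25%."""
    return max(0.25, 1 - 0.12 * (days_ahead - 1))

def divergence_factor(gfs_ecmwf_diff_deg):
    """Stake reducer for GFS/ECMWF disagreement (boundaries assigned to the lower band)."""
    d = abs(gfs_ecmwf_diff_deg)
    if d <= 1:
        return 1.0
    if d <= 2:
        return 0.85
    if d <= 3:
        return 0.70
    if d <= 4:
        return 0.50
    return 0.25

STAR_FACTOR = {3: 1.0, 2: 0.5, 1: 0.25}

def stake_fraction(p, price, days_ahead, gfs_ecmwf_diff_deg, stars,
                   kelly_share=0.25, cap=0.05):
    """Suggested fraction of bankroll for a share bought at `price`."""
    full_kelly = (p - price) / (1 - price)  # Kelly for a binary share, fees ignored
    if full_kelly <= 0:
        return 0.0                          # no edge, no bet
    scaled = (full_kelly * kelly_share
              * lead_time_factor(days_ahead)
              * divergence_factor(gfs_ecmwf_diff_deg)
              * STAR_FACTOR[stars])
    return min(scaled, cap)                 # 5% hard ceiling per trade

print(round(stake_fraction(0.45, 0.30, days_ahead=3, gfs_ecmwf_diff_deg=0.5, stars=2), 4))
```

For a NO bet, pass the model probability that the bucket does not hit as `p` and the NO share price as `price`.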
9 · Reading the reliability diagram

The reliability diagram answers: when the model says 40% probability, does it actually win 40% of the time? It appears in the tracker once you have 15 resolved trades.

How to read it. X axis = model predicted probability. Y axis = actual win rate. Dashed diagonal = perfect calibration. Dots sized by trade count per bucket.

Points above the diagonal — underconfident. Model predicts 40% but you win 55%. True edge is larger than calculated.

Points below the diagonal — overconfident. Model predicts 60% but you win 45%. Kelly stakes are too large. Reduce until the diagram corrects.
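
The points on the diagram can be computed by grouping trades into probability bins. This is a minimal sketch. The ten equal-width bins and the function name are assumptions, and the trade data is made up.

```python
def reliability_bins(predicted, outcomes, n_bins=10):
    """Group trades by predicted probability.

    Returns (mean predicted probability, actual win rate, trade count)
    for each non-empty bin, in ascending order of probability.
    """
    totals = [[0.0, 0, 0] for _ in range(n_bins)]  # [sum of p, wins, count]
    for p, won in zip(predicted, outcomes):
        i = min(int(p * n_bins), n_bins - 1)       # p == 1.0 goes in the last bin
        totals[i][0] += p
        totals[i][1] += int(won)
        totals[i][2] += 1
    return [(s / n, w / n, n) for s, w, n in totals if n > 0]

# Hypothetical resolved trades: model probability and whether the bet won
predicted = [0.42, 0.45, 0.48, 0.41, 0.62]
outcomes = [1, 0, 1, 1, 0]
for mean_p, win_rate, count in reliability_bins(predicted, outcomes):
    side = "above" if win_rate > mean_p else "on or below"
    print(f"predicted {mean_p:.2f}, won {win_rate:.2f}, n={count}: {side} the diagonal")
```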
Brier score

Mean squared error of probability forecasts. 0 = perfect. 0.25 = uninformative coin flip. A well-calibrated weather model achieves 0.15–0.20 for day-1 forecasts.
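
A minimal sketch of the calculation, with outcomes coded as 1 for a win and 0 for a loss (the function name is illustrative):

```python
def brier_score(predicted, outcomes):
    """Mean squared difference between forecast probability and the 0/1 outcome."""
    return sum((p - o) ** 2 for p, o in zip(predicted, outcomes)) / len(predicted)

# Always forecasting 50% scores 0.25 whatever happens
print(brier_score([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]))
```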

10 · City bias analysis

The city analysis panel in the tracker diagnoses whether GFS has a systematic warm or cold bias for each location. Cards appear at 5+ resolved trades per city.

The YES/NO split. The key diagnostic. If GFS runs warm, YES bets on high-temperature buckets lose more than expected while NO bets win more.

Confidence gates. Below 10 trades: no diagnosis. 10–19: tentative. 20–29: emerging, 10% stake reduction. 30+: 25% reduction.

Non-stationarity. GFS bias varies by season and is reset by model upgrades. Treat city bias as a rolling signal, not a fixed correction.
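
The confidence gates above map naturally onto a lookup. In this sketch the label for the 30+ band and the zero reduction for the tentative band are assumptions, since the text gives reductions only for 20+ trades.

```python
def city_bias_gate(resolved_trades):
    """Return (diagnosis level, stake reduction) for a city's resolved trade count."""
    if resolved_trades < 10:
        return ("no diagnosis", 0.0)
    if resolved_trades < 20:
        return ("tentative", 0.0)    # assumed: flagged, but no stake change yet
    if resolved_trades < 30:
        return ("emerging", 0.10)
    return ("confirmed", 0.25)       # assumed label for the 30+ band

def adjusted_stake(stake, resolved_trades):
    """Apply the gate's reduction to a stake on the biased direction."""
    _, reduction = city_bias_gate(resolved_trades)
    return stake * (1 - reduction)

print(city_bias_gate(24), adjusted_stake(20.0, 24))
```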
10a · Historical base rates — ERA5

The Hist% column in the Analyse tab shows the historical frequency of each temperature bucket for that city and month, drawn from 10 years of ERA5 reanalysis data.

What it's good for. Catching structural market mispricing. Detecting availability bias. Identifying Type D opportunities.

What it cannot do. Predict tomorrow. GFS is far better at that.

Recency weighting. The app applies a linearly decaying weight to the 10-year ERA5 data: the most recent year gets weight 2.0, the oldest year 0.2. This partially corrects for the cold bias that the warming trend introduces into older years.

Sample size note. ~21 observations per bucket — treat Hist% as having ±5-8pp uncertainty. A gap of 2-3pp is not meaningful. A gap of 15pp+ is.
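
The recency weighting described above might look like this in code. The endpoints 2.0 and 0.2 are from the text. Normalising by weighted day counts, the function names, and the sample data are assumptions for illustration.

```python
def recency_weights(n_years=10, newest=2.0, oldest=0.2):
    """Linearly spaced weights; index 0 is the oldest year."""
    step = (newest - oldest) / (n_years - 1)
    return [oldest + step * i for i in range(n_years)]

def weighted_frequency(hits_by_year, days_by_year, weights):
    """Recency-weighted share of days on which the bucket hit."""
    numerator = sum(w * h for w, h in zip(weights, hits_by_year))
    denominator = sum(w * d for w, d in zip(weights, days_by_year))
    return numerator / denominator

weights = recency_weights()
# Hypothetical bucket that only hit in the most recent year (6 of 30 days)
hits = [0] * 9 + [6]
days = [30] * 10
print(round(weighted_frequency(hits, days, weights), 4))  # unweighted would be 0.02
```

Because recent years count for more, a bucket that has only started to appear recently gets a higher Hist% than a plain 10-year average would give it.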
11 · Known limitations
31 members is not a large ensemble. Probability estimates are granular to ~3pp. True probability could differ from our estimate by 10pp in either direction.

Members are not independent. Effective sample size for capturing true atmospheric uncertainty is considerably less than 31.

Edge overstated by ~2-4pp. We compare to the displayed market price, not the true breakeven price after fees.

Kelly multipliers are arbitrary. Not derived from empirical data. Reasonable starting points, nothing more.

Selection bias in hit rate. We only bet when edge exceeds a threshold. Hit rate is not an unbiased estimator of model accuracy.

One season is not calibration. 50 trades in June tells you almost nothing about winter.
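
The first limitation above can be put in numbers. The sketch treats the 31 members as independent draws, which the text notes they are not, so the standard error it gives is best read as a lower bound. The function names are illustrative.

```python
import math

def ensemble_resolution(n_members=31):
    """Smallest probability step a member count can express."""
    return 1 / n_members

def binomial_std_error(p, n_members=31):
    """Standard error of a proportion, assuming independent members."""
    return math.sqrt(p * (1 - p) / n_members)

print(f"resolution: {ensemble_resolution():.1%}")              # about 3pp
print(f"std error at p=0.45: {binomial_std_error(0.45):.1%}")  # about 9pp
```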
11a · Sample sizes — when can you trust the data?
To detect 60% hit rate vs 50% (large effect): 50 trades at 90% confidence.

To detect 55% hit rate vs 50% (moderate effect): 193 trades at 90% confidence.

For city-level bias direction: 100 trades per city.

For auto-betting go/no-go: 500 total trades, 50+ per major city, Brier score below 0.20.
The autocorrelation problem

Sequential trades during the same weather regime are not statistically independent. Ten trades during a June heatwave may have an effective sample size of 2-3. Real-world requirements are roughly double the thresholds above.
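
One common way to approximate this shrinkage is the first-order autocorrelation (AR(1)) correction, n_eff ≈ n(1 − ρ)/(1 + ρ). The guide does not say which method lies behind its figures, so both the formula choice and the ρ value below are illustrative assumptions.

```python
def effective_sample_size(n_trades, rho):
    """Approximate number of independent observations under AR(1) correlation.

    rho is the lag-1 correlation between consecutive trade outcomes (0 <= rho < 1).
    """
    return n_trades * (1 - rho) / (1 + rho)

# Hypothetical: ten trades in one heatwave with lag-1 correlation 0.6
print(effective_sample_size(10, 0.6))
```

With ρ = 0 the trade count is unchanged. The stronger the correlation, the fewer independent observations each regime contributes.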

12 · What comes next
Historical backtest tab. Fetch resolved Polymarket markets + ERA5 actuals, simulate Kelly bets, test market mispricing thesis without waiting for live trade volume.

Automatic bias compensation. Once city bias is confirmed at 30+ trades, Kelly auto-reduces on the biased direction.

Auto-betting infrastructure. Python backend, ★★★ only, Tier 1 only, daily loss limit 5%, kill switch. Not until 500+ calibrated trades.

Bootstrap confidence intervals. Error bars on all dashboard statistics.

Precipitation markets. Infrastructure exists, awaiting Polymarket daily markets.