Daily briefing
All value bets across every open market, ranked by edge. Status tells you whether to bet now or wait for fresher models.
Model freshness — current time: UTC
Analyse market
Paste a Polymarket URL to load buckets automatically, then run the forecast analysis.
Step 1 — Load from Polymarket
Paste a URL above — city, date and buckets will all fill in automatically.
Step 2 — Confirm details
Step 3 — Buckets
Temperature bucket | Market yes %
Results tracker
Paper trades logged automatically from the briefing. Results resolve overnight — check here each morning.
📋 Paper trades
Auto-logged from briefing · resolves overnight from Polymarket
Manual trade log — for real bets or manual paper trades
Weather Edge — User Guide
The definitive reference: how the app works, how to use it, and the betting strategy behind it.
1 · What this app does and why it works

Polymarket runs daily weather markets where traders bet on whether the temperature in a given city will land in a specific range. The market price reflects collective human judgment about the probability. Weather Edge replaces that human judgment with something better: a 31-member numerical weather prediction ensemble that produces a genuine probability distribution across outcomes.

The edge comes from a structural asymmetry that will not close: processing 31 ensemble forecasts and comparing them to market prices requires System 2 thinking — slow, deliberate, computational. The people pricing these markets use System 1 — fast, intuitive, heuristic. System 1 is incapable of running the calculation. This is not a temporary inefficiency. It is permanent, because it is cognitive.

The app automates the entire System 2 process: fetch models → compute probabilities → compare to market → size the bet → log and track. Your job is to check it twice a day and press the button.

2 · The four data sources
GFS ensemble (31 members)

The American Global Forecast System run 31 times with slightly different initial conditions. Each run produces a complete temperature forecast. We count what fraction of the 31 runs land in each bucket after rounding to whole degrees — matching Wunderground's resolution, which is how markets resolve. This is the primary betting signal. If 14 of 31 members show a daily high of 19°C, model probability = 45%.
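
The member-counting step above can be sketched in a few lines of Python. This is a minimal illustration, not the app's actual code: the function names and the sample member values are hypothetical, and rounding half up is an assumption about how whole-degree resolution is applied.

```python
import math
from collections import Counter


def round_to_whole_degree(temp_c):
    """Round to the nearest whole degree, halves rounding up (assumed rule)."""
    return int(math.floor(temp_c + 0.5))


def bucket_probabilities(member_highs_c):
    """Fraction of ensemble members landing in each whole-degree bucket."""
    counts = Counter(round_to_whole_degree(t) for t in member_highs_c)
    n = len(member_highs_c)
    return {deg: c / n for deg, c in sorted(counts.items())}


# Hypothetical 31-member ensemble of daily highs (made-up values):
# 10 members round to 18, 14 round to 19, 7 round to 20.
members = (
    [17.6 + 0.05 * i for i in range(10)]
    + [18.6 + 0.05 * i for i in range(14)]
    + [19.6 + 0.05 * i for i in range(7)]
)
probs = bucket_probabilities(members)
print({deg: round(100 * p) for deg, p in probs.items()})
```

With 14 of 31 members in the 19°C bucket, the model probability for that bucket is 14/31, which displays as 45%.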

ECMWF IFS (deterministic + ensemble)

The European Centre for Medium-Range Weather Forecasts — generally considered the world's most accurate NWP system. Used as a cross-check against GFS. When both agree, confidence is high. When they diverge by more than 1.5°, caution is warranted. The briefing runs GFS-only for speed. The full Analyse tab fetches both.

GFS deterministic

The single best-estimate GFS forecast (as opposed to the ensemble). Used as a sanity check — it should be close to the ensemble mean. Large divergence between deterministic and ensemble mean is a flag.

METAR (live station)

Current observed temperature at the exact airport station the market resolves against. Most valuable for same-day markets. Requires a free CheckWX API key in your Cloudflare environment variables.

3 · Your daily workflow

The 80/20 answer: set one alarm for 07:05 BST. That single session captures the majority of available edge.

Primary — 07:05 BST daily

GFS 00Z and ECMWF 00Z are both available by ~06:00 BST, so both models are fresh simultaneously. Markets dormant all night: maximum anchoring gap. European markets not yet repriced. US traders asleep. Structurally the best window of the 24-hour cycle. Run briefing, click Log all BET NOW, done in 15 minutes.

Secondary — 19:00 BST daily

GFS 12Z and ECMWF 12Z both available ~18:00 BST. Good for US markets — evening repricing often incomplete. Worth doing; adds roughly 25% more opportunities. Best window for Asian cities (Tokyo, Singapore, HK) whose trading day is ending.

Opportunistic — ~00:30 BST (if you are up)

GFS 18Z available ~midnight BST. ECMWF won't update until ~06:00. The app fires BET NOW on GFS alone if spread ≤1.5° and edge ≥12pp — shown with a blue GFS only badge. Do not set an alarm for this. US markets have the best anchoring here.

4 · Model freshness — the three-tier status system
Tier 1 — Both GFS and ECMWF fresh

Maximum confidence. BET NOW fires when edge ≥10pp, spread tight, members sufficient. Standard case at 07:05 and 19:00 BST.

Tier 2 — GFS fresh, ECMWF stale (blue GFS only badge)

BET NOW fires only if GFS spread ≤1.5° AND edge ≥12pp. A tight spread means 31 members are self-consistently clustering — the ensemble is its own cross-check. Applies at the midnight window. Slightly lower confidence than Tier 1; consider reducing Kelly by one star rating.

Tier 3 — GFS stale

Always WAIT regardless of ECMWF freshness. GFS ensemble is the primary signal — stale primary = no reliable edge. Check again after the next GFS update.

GFS runs at 00Z/06Z/12Z/18Z and ECMWF at 00Z/12Z, each with a ~5h lag. In BST (UTC+1): GFS available ~06:00/12:00/18:00/00:00. ECMWF ~06:00/18:00.
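
As a rough check on the schedule, the sketch below converts each run time to an availability time in BST, assuming a flat 5-hour lag and BST = UTC+1. The function name and the calendar date are arbitrary choices for illustration; real lags vary from run to run.

```python
from datetime import datetime, timedelta, timezone

BST = timezone(timedelta(hours=1), "BST")  # British Summer Time = UTC+1
LAG = timedelta(hours=5)                   # approximate lag quoted in the guide

GFS_RUNS_UTC = (0, 6, 12, 18)
ECMWF_RUNS_UTC = (0, 12)


def availability_bst(run_hours_utc):
    """Return 'HH:MM' strings (BST) for when each run becomes available."""
    day = datetime(2025, 6, 1, tzinfo=timezone.utc)  # arbitrary summer date
    return [
        (day.replace(hour=h) + LAG).astimezone(BST).strftime("%H:%M")
        for h in run_hours_utc
    ]


gfs_times = availability_bst(GFS_RUNS_UTC)
ecmwf_times = availability_bst(ECMWF_RUNS_UTC)
print("GFS:", gfs_times)
print("ECMWF:", ecmwf_times)
```

Under these assumptions GFS becomes available around 06:00, 12:00, 18:00 and 00:00 BST, and ECMWF around 06:00 and 18:00 BST, which lines up with the ~06:00, ~18:00 and ~midnight windows in the daily workflow.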

5 · Reading the daily briefing table

Each row is the single best opportunity in that market — the bucket with the largest model-vs-market divergence.

Dir — Direction

YES — model thinks this bucket is more likely than market implies. Buy YES shares. NO — market has overpriced this bucket. Buy NO shares. Both directions are equally valid; the edge calculation is symmetric.

Model% — Model probability

Fraction of GFS ensemble members landing in this bucket after rounding to whole degrees. 14 of 31 = 45%. Real probability from real forecast data. Briefing uses GFS only; Analyse tab adds ECMWF for a combined figure.

Mkt% — Market implied probability

Current YES price as a percentage. Note: this is not the true fair-value probability — Polymarket takes ~2% fees and there is a bid-ask spread. Edge figures are overstated by roughly 2-4pp. Never bet on edges below 8pp gross.

Edge — The opportunity

Model% minus Mkt%. Green (+) = underpriced, bet YES. Red (−) = overpriced, bet NO. Edge is a point estimate with considerable uncertainty — with 31 members, model probabilities are granular to ~3pp. Do not treat edge figures as precise.

Kelly — Suggested bet size

Quarter-Kelly from your bankroll, scaled by lead time and confidence stars. Stars apply a multiplier (★★★ = 100%, ★★☆ = 50%, ★☆☆ = 25%). A dash means no active market — skip. Always treat Kelly as a ceiling, not a target. Until you have 30+ calibrated trades, consider halving the displayed figure for real bets.

Spread — GFS internal uncertainty

Standard deviation of the 31 GFS members. ±0.8° = confident. ±2.5° = uncertain — consider halving Kelly. Tight spread is also the condition enabling BET NOW without ECMWF at the midnight window.

Stars — Combined confidence rating

★★★ all conditions favourable: 25+ members, models agree within 1.5°, spread tight, edge above 12pp. ★★☆ one condition marginal. ★☆☆ multiple conditions weak — speculative only. Stars feed into Kelly multiplier automatically.
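
The Edge column described above, and the reason it should not be read as precise, can be sketched as follows. The function names are hypothetical, and the standard error treats the 31 members as independent draws, which is optimistic because the members share an initial state.

```python
import math


def edge_pp(model_prob, market_prob):
    """Edge in percentage points: Model% minus Mkt%."""
    return 100.0 * (model_prob - market_prob)


def direction(edge):
    """Positive edge means buy YES, negative edge means buy NO."""
    if edge > 0:
        return "YES"
    if edge < 0:
        return "NO"
    return "-"


def model_prob_std_error_pp(model_prob, n_members=31):
    """Binomial standard error in pp, assuming independent members."""
    return 100.0 * math.sqrt(model_prob * (1 - model_prob) / n_members)


model_prob = 14 / 31   # 14 of 31 members in the bucket
market_prob = 0.06     # YES price of 6%

edge = edge_pp(model_prob, market_prob)
granularity = 100.0 / 31
std_err = model_prob_std_error_pp(model_prob)
print(f"edge {edge:+.0f}pp, bet {direction(edge)}")
print(f"granularity {granularity:.1f}pp, std error ~{std_err:.0f}pp")
```

One member more or fewer moves Model% by about 3pp, and the sampling error on a 45% estimate is roughly 9pp even before fees and spread are considered.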

6 · The status signals

BET NOW Edge ≥10pp · GFS fresh · spread acceptable · members sufficient. The reason any condition is not met is shown inline.

BET NOW GFS only  Tier 2 — GFS fresh, ECMWF stale, but spread ≤1.5° and edge ≥12pp. Actionable with slightly lower confidence.

WAIT Edge exists but one or more conditions unmet. The specific reason is shown. Check again after the next model run.

PASS Edge below threshold. Not worth trading after fees.
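
The four signals can be read as one decision function. The sketch below is an interpretation of the rules in this guide, not the app's source. The 10pp and 12pp edge thresholds and the 1.5° GFS-only spread limit come from the text; the Tier 1 spread limit (2.5°), the member minimum (25), and using 10pp as the PASS cut-off are assumptions made for illustration.

```python
def status(edge_pp, gfs_fresh, ecmwf_fresh, spread_deg, n_members,
           max_spread_deg=2.5, min_members=25):
    """Return the briefing status for one market row.

    max_spread_deg and min_members are assumed defaults, not documented values.
    """
    edge = abs(edge_pp)  # YES and NO edges are treated symmetrically
    if edge < 10:
        return "PASS"    # assumed: PASS threshold equals the BET NOW threshold
    if not gfs_fresh:
        return "WAIT"    # Tier 3: stale primary signal
    if n_members < min_members:
        return "WAIT"
    if ecmwf_fresh:      # Tier 1: both models fresh
        return "BET NOW" if spread_deg <= max_spread_deg else "WAIT"
    # Tier 2: GFS fresh, ECMWF stale
    if spread_deg <= 1.5 and edge >= 12:
        return "BET NOW (GFS only)"
    return "WAIT"


print(status(39, True, True, 0.9, 31))    # both models fresh, large edge
print(status(13, True, False, 1.2, 31))   # midnight window, tight spread
print(status(11, True, False, 1.2, 31))   # edge too small for GFS-only
print(status(20, False, True, 0.9, 31))   # GFS stale
print(status(6, True, True, 0.9, 31))     # edge below threshold
```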

7 · Betting strategy — from conventional edge to Kahneman/Taleb

The people pricing these markets are not irrational. They are human. Human brains run decision-making shortcuts that create predictable, systematic, exploitable errors. Two intellectual frameworks, one from Nobel laureate Daniel Kahneman and one from Nassim Taleb, map precisely onto the opportunities this app finds.

Daniel Kahneman — Thinking, Fast and Slow

Human decision-making runs on two systems. System 1 is fast, automatic, and intuitive. It handles the vast majority of everyday decisions, including pricing a temperature market at 8am. System 2 is slow and deliberate, the kind of thinking required to process 31 ensemble runs and calculate probability distributions. System 1 can't do that. It will never do that. This structural gap is the permanent, non-arbitrageable core of your edge.

Nassim Taleb — The Black Swan, Fooled by Randomness

Markets built on human intuition systematically underprice tail events — the extreme outcomes that feel unlikely because they rarely come to mind easily. This mispricing is structural. It persists as long as humans price markets. The tail buckets in temperature markets face a double discount: statistically underpriced (Taleb) and psychologically avoided (Kahneman). That is where the highest-payout opportunities live.

WYSIATI — What You See Is All There Is. The market prices what it can see: yesterday's weather, the BBC headline, the season. It cannot see 31 ensemble runs, the spread across members, or the ECMWF divergence. You can. That is the entire edge in one sentence.

Anchoring. The first prices set on a market are highly sticky. Even when new model data arrives, the market under-adjusts from the opening anchor. The largest edges appear in the 1–2 hours after model updates, before the market has repriced. This is why timing matters.

Availability bias. After a cold spell, cold feels probable. After a hot week, warmth feels inevitable. The ensemble has no memory of last week — it uses only current atmospheric state. After unusual weather in one direction, the opposite tail is systematically underpriced.

Loss aversion. People avoid long-shot tail bets emotionally. Losing a 10:1 bet feels worse than the arithmetic loss. This suppresses tail bucket prices beyond statistical analysis alone, creating a structural double discount on extreme outcomes.

Overconfidence: the rule for you. Algorithms consistently outperform expert judgment in complex probabilistic environments. Kahneman documents this at length in Thinking, Fast and Slow. You are the algorithm. Never override the model based on personal weather intuition. The moment you do, you have become the market you are trying to beat.
The four opportunity types
Type A — Fresh model · stale market (WYSIATI + Anchoring) · Live now

The bread and butter. By 07:05 BST each day a fresh GFS run with new atmospheric data is available. The market price was set by humans yesterday and hasn't moved yet. The anchor is yesterday's price; the signal is today's model. The gap between them is your edge, and it closes within 1-2 hours as traders reprice. This is what the daily briefing finds automatically. The majority of your BET NOW rows will be Type A.

The cognitive error: anchoring. The market under-adjusts from the opening price even when new information arrives.

Example: At 07:10 BST, GFS ensemble shows 19°C as 45% likely in London tomorrow. Market prices it at 6%. Edge +39pp, spread ±0.9°, ★★★. The anchor was set last night when 19°C felt like a stretch. The model now says otherwise. Act within 2 hours before the market reprices.

Type B — Recency play · check opposite tail (Availability bias) · Live now

After unusual weather — a heatwave, cold snap, five consecutive days of rain — the market overweights continuation. The ensemble has no memory of last week. It sees only the current atmospheric state. When the model starts showing reversion the market hasn't priced, the opposite tail is systematically underpriced. You need to spot these manually — run the briefing after any sustained unusual weather run and look at the tail buckets in the opposite direction. Not auto-detected yet; coming in a future build.

The cognitive error: availability bias. Recent vivid events dominate probability estimates. The market thinks "it's been hot all week so it'll stay hot" even when the model says otherwise.

Example: London has had five consecutive days above 28°C. The market is pricing warmth continuation heavily. GFS suddenly shows the cold front arriving — 14°C or below is now 35% likely but priced at only 8%. The market is stuck on last week's weather. After any sustained unusual run, check the opposite tail before doing anything else.

Type C — Tail underpriced · barbell bet (Loss aversion + Fat tails) · Live now

A tail bucket trading at very low odds (≤10%) where the ensemble shows meaningful support. Two things suppress the price below the statistical probability: loss aversion (people hate losing a longshot emotionally) and the availability heuristic (extreme temperatures feel unlikely because they're hard to imagine). Small stake, high payout, hold to resolution. The strategy is explicitly barbell — many small Type C bets running alongside your standard Type A plays. Individually speculative. Collectively exploiting a structural inefficiency that will never close. Auto-detected when market odds ≤10% and edge ≥8pp.

The cognitive errors: loss aversion (Kahneman) + fat tail underpricing (Taleb). The double discount that Taleb built a career identifying.

Example: The 22°C+ bucket in London on a June day is priced at 4%. Six of 31 GFS members show it — model probability 19%. Edge +15pp. Kelly suggests $3. Place it, log it, do not check it obsessively. You are not predicting the outcome — you are exploiting structural underpricing of tails. The law of large numbers works in your favour over time.
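
The Type C auto-detection rule and the example above can be sketched like this. The function names are hypothetical, and the expected-value figure ignores fees and bid-ask spread.

```python
def is_type_c(model_prob, market_prob):
    """Tail rule from the guide: market odds <= 10% and edge >= 8pp."""
    edge_pp = 100.0 * (model_prob - market_prob)
    return market_prob <= 0.10 and edge_pp >= 8.0


def expected_profit_per_dollar(model_prob, market_prob):
    """Expected profit per $1 staked on YES if the model probability is right.

    Each $1 buys 1/market_prob shares paying $1 each on a win (fees ignored).
    """
    return model_prob / market_prob - 1.0


model_prob = 6 / 31   # six of 31 members in the 22°C+ bucket
market_prob = 0.04    # priced at 4%

tail_flag = is_type_c(model_prob, market_prob)
payout_multiple = 1.0 / market_prob
ev = expected_profit_per_dollar(model_prob, market_prob)
print(tail_flag, round(payout_multiple), round(ev, 2))
```

The bet loses about four times out of five even if the model is exactly right, which is why the stake stays small and the result only matters across many such bets.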

Type D — History + model agree · highest confidence (Base rate neglect) · Live now

The strongest signal in the framework. A Type D opportunity is where the GFS model, the ERA5 historical base rate, and the edge all point in the same direction simultaneously. The market is wrong not just because of today's model but because it's inconsistent with the long-run empirical frequency of that outcome. When you see a purple D tag in the Analyse bucket table, treat it as higher confidence than a standard Type A — you have two independent signals disagreeing with the market. The ideal bet is A+D simultaneously: fresh model signal confirmed by history.

The cognitive error: base rate neglect. People ignore the long-run frequency of outcomes when a recent vivid signal dominates. The market prices today's narrative; history prices the structural reality.

Auto-detection: the D tag fires when Hist%, Model%, and edge all agree in direction and edge ≥8pp. Shown in the Analyse tab bucket table. The briefing does not fetch ERA5, for speed. Click through to Analyse to check for Type D confirmation on any BET NOW row.

Example: A June day above 25°C in London has occurred 35% of the time historically over 10 years. The market prices it at 18% due to a recent cold week (availability bias). GFS ensemble shows 40% probability. All three signals agree — market is underpricing. Edge from model alone is +22pp; history confirms it. This is a Type A+D grand slam — the highest confidence opportunity type.
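
One way to read the D-tag rule is sketched below. It assumes "agree in direction" means the historical frequency and the model probability both sit on the same side of the market price; the function name and that interpretation are assumptions, not the app's documented logic.

```python
def is_type_d(hist_prob, model_prob, market_prob, min_edge_pp=8.0):
    """True when history and model disagree with the market in the same direction."""
    edge_pp = 100.0 * (model_prob - market_prob)
    hist_gap = hist_prob - market_prob
    same_direction = (edge_pp > 0 and hist_gap > 0) or (edge_pp < 0 and hist_gap < 0)
    return same_direction and abs(edge_pp) >= min_edge_pp


# Example from the guide: history 35%, model 40%, market 18%.
confirmed = is_type_d(hist_prob=0.35, model_prob=0.40, market_prob=0.18)

# Counter-example: the model says underpriced, but history sits below the market.
unconfirmed = is_type_d(hist_prob=0.10, model_prob=0.40, market_prob=0.18)
print(confirmed, unconfirmed)
```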

8 · Kelly sizing — what it is and how to use it

The Kelly criterion answers the question: given an edge, what fraction of your bankroll should you stake to maximise long-run growth without risking ruin? It was developed by John Kelly at Bell Labs in 1956 and is used by professional gamblers and quantitative traders.

The formula

f = (p×b − q) / b, where p = your probability, q = 1−p, and b = net odds (what you win per dollar staked, i.e. decimal odds minus 1). For a YES share bought at price m, b = (1−m)/m. The result f is the optimal fraction of bankroll to stake. If the answer is negative, don't bet. The formula assumes your probability estimate is correct; if it's wrong, Kelly can lead to overbetting.
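
A worked example may help. The sketch below takes b as the net profit per dollar staked, which for a binary share bought at price m is (1−m)/m. The function names and the 45% / 30% figures are illustrative only.

```python
def net_odds_from_price(price):
    """Net profit per $1 staked on a share that pays $1 if it wins."""
    return (1.0 - price) / price


def kelly_fraction(p, b):
    """Full-Kelly fraction f = (p*b - q) / b, with q = 1 - p."""
    q = 1.0 - p
    return (p * b - q) / b


p = 0.45       # model probability
price = 0.30   # market YES price

b = net_odds_from_price(price)
f_full = kelly_fraction(p, b)
f_negative = kelly_fraction(0.20, b)  # model below market price: do not bet
print(round(b, 3), round(f_full, 4), round(f_negative, 4))
```

For a binary share the formula simplifies to f = (p − m) / (1 − m), so here f = 0.15 / 0.70, about 21% of bankroll at full Kelly before any of the reductions described in this section.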

Quarter-Kelly baseline. The app uses 25% of full Kelly. This accounts for the fact that our probability estimates are uncertain — with only 31 ensemble members and no long-run calibration yet, our model probability could easily be off by 10-15pp. Quarter-Kelly protects against this estimation error.

Continuous lead-time decay. Kelly is scaled by max(0.25, 1 − 0.12×(daysAhead−1)). Day 1: 100%. Day 2: 88%. Day 3: 76%. Day 4: 64%. Day 5: 52%. Day 6: 40%. Day 7: 28%. NWP forecast skill degrades continuously with lead time — further-ahead forecasts are less reliable so stakes are smaller.

Divergence reducer. When GFS and ECMWF disagree, Kelly is further scaled: 0–1°: 100%, 1–2°: 85%, 2–3°: 70%, 3–4°: 50%, >4°: 25%. Model disagreement signals genuine forecast uncertainty.

Star multiplier. ★★★ = 100%, ★★☆ = 50%, ★☆☆ = 25% of the scaled Kelly. Translates combined confidence rating into stake size.

5% bankroll cap. Hard ceiling per trade regardless of formula output. Prevents ruin on data errors or anomalous signals.
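
Putting the multipliers together, a stake calculation might look like the sketch below. The multiplier values come from the text; the order of application, the handling of exact boundaries (1°, 2°, and so on) and the function names are assumptions.

```python
STAR_MULTIPLIER = {3: 1.0, 2: 0.5, 1: 0.25}


def lead_time_multiplier(days_ahead):
    """max(0.25, 1 - 0.12 * (daysAhead - 1)) as stated in the guide."""
    return max(0.25, 1.0 - 0.12 * (days_ahead - 1))


def divergence_multiplier(divergence_deg):
    """Scale by GFS/ECMWF disagreement; boundary handling is an assumption."""
    d = abs(divergence_deg)
    if d <= 1:
        return 1.0
    if d <= 2:
        return 0.85
    if d <= 3:
        return 0.70
    if d <= 4:
        return 0.50
    return 0.25


def suggested_stake(bankroll, full_kelly, days_ahead, divergence_deg, stars):
    """Quarter-Kelly, scaled by the three multipliers, capped at 5% of bankroll."""
    if full_kelly <= 0:
        return 0.0
    fraction = (0.25 * full_kelly
                * lead_time_multiplier(days_ahead)
                * divergence_multiplier(divergence_deg)
                * STAR_MULTIPLIER[stars])
    return bankroll * min(fraction, 0.05)


stake = suggested_stake(1000, 0.15 / 0.70, days_ahead=2, divergence_deg=1.4, stars=3)
capped = suggested_stake(1000, 0.60, days_ahead=1, divergence_deg=0.5, stars=3)
print(round(stake, 2), round(capped, 2))
```

In the first case a full-Kelly fraction of about 21% shrinks to roughly 4% of bankroll; in the second the formula would ask for 15%, so the 5% cap applies.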

The honest caveat. All multipliers are derived from judgment, not empirical calibration of this specific model. Until you have 200+ resolved trades and a Brier score below 0.20, treat displayed Kelly as a ceiling and start real-money bets at half the displayed amount.
9 · Reading the reliability diagram

The reliability diagram is the single most important diagnostic in the tracker. It answers the fundamental question: when the model says 40% probability, does it actually win 40% of the time? It appears once you have 15 resolved trades.

How to read it. X axis = model predicted probability. Y axis = actual win rate. The dashed diagonal = perfect calibration. Dots are sized by trade count in that probability bucket — bigger dots are more statistically reliable. Numbers above each dot show the count.

Points above the diagonal — underconfident. Model predicts 40% but you win 55% of the time. The model is being too conservative — true edge is larger than calculated. Once confirmed across 30+ trades, you can be more aggressive with Kelly sizing.

Points below the diagonal — overconfident. Model predicts 60% but you win 45% of the time. The model overstates certainty — Kelly stakes are too large. Reduce Kelly until the diagram corrects. This is the more dangerous pattern.

Systematic directional bias. If YES bets cluster below the diagonal but NO bets sit on it, GFS is running warm — overestimating high-temperature outcomes. The city analysis panel will flag this separately and suggest a stake adjustment.
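
The points on the diagram can be computed by binning trades on predicted probability and comparing each bin's average prediction with its win rate. This is a minimal sketch with made-up trades: the function name and bin count are hypothetical, and each predicted probability is assumed to be the probability of the side actually bet (so 1 − Model% for a NO bet).

```python
def reliability_bins(trades, n_bins=5):
    """trades: list of (predicted_prob, won) pairs.

    Returns (mean_predicted, win_rate, count) for each non-empty bin.
    """
    bins = [[] for _ in range(n_bins)]
    for p, won in trades:
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, won))
    points = []
    for b in bins:
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            win_rate = sum(1 for _, won in b if won) / len(b)
            points.append((mean_p, win_rate, len(b)))
    return points


# Hypothetical resolved trades, not real results.
trades = [
    (0.15, False), (0.19, True), (0.12, False), (0.18, False),
    (0.45, True), (0.45, False), (0.42, True), (0.42, False), (0.48, False),
]
points = reliability_bins(trades)
for mean_p, win_rate, count in points:
    side = "above" if win_rate > mean_p else "below or on"
    print(f"predicted {mean_p:.2f}, won {win_rate:.2f}, n={count} ({side} diagonal)")
```

With this few trades per bin the position relative to the diagonal is mostly noise, which is the point of the caveat on sample size.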
Brier score

Mean squared error of probability forecasts. Shown as the fourth stat tile once 15 trades are resolved. 0 = perfect. 0.25 = uninformative coin flip. A well-calibrated weather model achieves 0.15–0.20 for day-1 forecasts. Below 0.15 is excellent. Above 0.22 suggests either poor calibration or you are betting markets where the model has no real edge.
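
The score itself is a one-liner. The sketch below uses made-up forecasts to show the two reference points mentioned above; the function name is hypothetical.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probability and outcome (0 or 1)."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


outcomes = [1, 0, 1, 0]

coin_flip = brier_score([0.5, 0.5, 0.5, 0.5], outcomes)  # always says 50%
perfect = brier_score([1.0, 0.0, 1.0, 0.0], outcomes)    # always right and certain
example = brier_score([0.8, 0.3, 0.6, 0.1], outcomes)    # hypothetical forecasts
print(coin_flip, perfect, round(example, 3))
```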

Important caveat. 15 trades renders the diagram but is far too few for conclusions — most buckets will have 1–2 data points. Treat as directional only until 50+ trades spanning varied weather conditions. A heatwave producing 10 consecutive trades in one probability band is not 10 independent calibration points due to autocorrelation.

10 · City bias analysis

The city analysis panel in the tracker diagnoses whether GFS has a systematic warm or cold bias for each location. Cards appear at 5+ resolved trades per city; bias diagnosis unlocks progressively.

The YES/NO split. The key diagnostic. If GFS runs warm, YES bets on high-temperature buckets lose more than expected while NO bets win more. A city showing 50% overall hit rate but YES 38% / NO 62% has a warm GFS bias that aggregate stats would miss entirely.

Confidence gates. Below 10 trades: data shown, no diagnosis. 10–19 trades: tentative signal, no adjustment. 20–29 trades: emerging pattern, 10% stake reduction on biased direction. 30+ trades: 25% reduction. These thresholds are conservative by design — sequential trades during a single weather regime are not independent observations.
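
The YES/NO split and the confidence gates can be sketched together. The trade list is made up to reproduce the 38% / 62% example, and the function names are hypothetical.

```python
def hit_rate(trades, direction=None):
    """trades: list of (direction, won) pairs. Optionally filter by 'YES' or 'NO'."""
    results = [won for d, won in trades if direction is None or d == direction]
    return sum(results) / len(results)


def bias_stake_reduction(resolved_trades):
    """Stake reduction on the biased direction, per the confidence gates."""
    if resolved_trades >= 30:
        return 0.25
    if resolved_trades >= 20:
        return 0.10
    return 0.0  # below 20 trades: data or tentative signal only, no adjustment


# Hypothetical city: 13 YES bets (5 won) and 13 NO bets (8 won).
trades = ([("YES", True)] * 5 + [("YES", False)] * 8
          + [("NO", True)] * 8 + [("NO", False)] * 5)

overall = hit_rate(trades)
yes_rate = hit_rate(trades, "YES")
no_rate = hit_rate(trades, "NO")
reduction = bias_stake_reduction(len(trades))
print(f"overall {overall:.0%}, YES {yes_rate:.0%}, NO {no_rate:.0%}, "
      f"reduction {reduction:.0%}")
```

The overall hit rate is exactly 50%, so the warm bias only shows up once the trades are split by direction. With 26 resolved trades the city sits in the 20–29 gate.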

The sparkline. The coloured squares at the right of each city card show the last 10 outcomes (green = win, red = loss). This tells you whether a bias pattern is recent or historical — a streak of recent reds during a heatwave may be regime noise rather than systematic model error.

Non-stationarity. GFS bias varies by season and is reset by model upgrades. A warm bias measured in June may not hold in October. Treat city bias as a rolling signal, not a fixed correction.

10a · Historical base rates — what ERA5 tells you and what it doesn't

The Hist% column in the Analyse tab shows the historical frequency of each temperature bucket for that city and month, drawn from 10 years of ERA5 reanalysis data. ERA5 is the gold standard historical weather dataset produced by ECMWF, covering 1940 to present.

What it's good for. Catching structural market mispricing. If London June history says 15°C happens 2% of the time and the market prices it at 40%, something is wrong. Detecting availability bias — after a cold week the market overprices cold buckets; history provides an anchor. Identifying Type D opportunities where historical frequency and model agree against the market.

What it cannot do. Predict tomorrow. 300 historical data points tell you the climatological distribution, not what tomorrow will do. GFS is far better at that. ERA5 also cannot give fine-grained bucket probabilities with high confidence: 300 days split across 10-15 temperature buckets gives only 20-30 observations per bucket. Confidence intervals are wide.

The climate non-stationarity problem. London June temperatures have a measurable upward trend — roughly +0.2-0.3°C per decade in mean maximum temperature. This means historical data from 2016-2020 is systematically cooler than 2021-2025. A naive 10-year average has a slight cold bias — the true 2026 base rate for warm buckets is probably 2-5pp higher than the raw ERA5 estimate suggests.

The fix — recency weighting. The app applies a linear decay weight to the 10-year ERA5 data. The most recent year gets weight 2.0, the oldest year gets weight 0.2, with linear interpolation between. This down-weights older cooler years and up-weights recent warmer ones, partially correcting the cold bias while preserving sample size. The label "recency-weighted" confirms this is applied.
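
The weighting can be illustrated with a short sketch. It assumes one weight per year applied to every day in that year, and the yearly counts are invented for illustration; the app's exact implementation may differ.

```python
def recency_weights(n_years=10, newest=2.0, oldest=0.2):
    """Linear weights from oldest year (index 0) to newest year (last index)."""
    step = (newest - oldest) / (n_years - 1)
    return [oldest + step * i for i in range(n_years)]


def weighted_frequency(hits_by_year, days_by_year, weights):
    """Weighted share of days on which the bucket occurred."""
    num = sum(w * h for w, h in zip(weights, hits_by_year))
    den = sum(w * d for w, d in zip(weights, days_by_year))
    return num / den


# Hypothetical June counts for a warm bucket, oldest year first (30 days per year).
hits_by_year = [8, 9, 9, 10, 10, 11, 11, 12, 12, 13]
days_by_year = [30] * 10

weights = recency_weights()
raw = sum(hits_by_year) / sum(days_by_year)
weighted = weighted_frequency(hits_by_year, days_by_year, weights)
print(f"raw {raw:.1%}, recency-weighted {weighted:.1%}")
```

Because the made-up counts trend upward, the weighted frequency comes out a few points above the raw 10-year average, which is the direction of correction described above.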

Residual limitations. The weighting is an approximation, not a precise climate correction. The warming trend varies by city, season, and temperature range. Cities with strong urban heat island effects (Tokyo, Shanghai) may have larger trends than rural stations. Treat Hist% as a directional signal, not a precise probability. A gap of 2-3pp between Hist% and Mkt% is not meaningful. A gap of 15pp+ is.

Sample size note. The effective sample size after weighting is approximately 70% of the raw count. For 300 days this gives ~210 effective observations. Split across 10 buckets that's ~21 per bucket — wide confidence intervals. The Hist% figures should be treated as having ±5-8pp uncertainty at the individual bucket level.

11 · Known limitations — what a statistician would say

This app generates systematic hypotheses and collects calibration data. It is not yet a validated betting system. Be honest about these limitations:

31 members is not a large ensemble. Probability estimates are granular to ~3pp. True probability could differ from our estimate by 10pp in either direction. Edge figures are point estimates, not precise quantities.

Members are not independent. Generated by perturbing a single initial state. Effective sample size for capturing true atmospheric uncertainty is considerably less than 31.

Edge is overstated by ~2-4pp. We compare to the displayed market price, not the true breakeven price after fees and spread.

Kelly multipliers are arbitrary. Not derived from empirical data on this model. Reasonable starting points, nothing more.

No multiple testing correction. The briefing scans hundreds of markets per session — some will show spurious edge by chance. High-confidence rows (★★★, tight spread) are more likely to reflect real edge.

Selection bias in hit rate. We only bet when edge exceeds a threshold. Our observed hit rate is conditional on having detected edge, which correlates with model bias. Hit rate is not an unbiased estimator of model accuracy.

One season is not calibration. 50 trades in June tells you about summer. Almost nothing about winter or market regime changes. Stay humble across seasons.
11a · Sample sizes — when can you trust the data?

The most common mistake in self-assessed betting systems is drawing conclusions from insufficient data. Here are the minimum sample sizes required before any finding can be acted on, based on standard statistical power analysis.

To detect a 60% hit rate vs 50% chance (large effect): 50 trades at 90% confidence. This is the bare minimum for any claim about model performance.

To detect a 55% hit rate vs 50% chance (moderate effect): 193 trades at 90% confidence. This is the realistic target for claiming the model works.

For city-level bias direction: 100 trades per city.

For city temperature offset magnitude: 200 trades per city.

For bet type comparison (is Type A better than Type C?): 200 trades per type.

For auto-betting go/no-go decision: 500 total trades, 50+ per major city, Brier score below 0.20.
The autocorrelation problem

Sequential trades during the same weather regime are not statistically independent. Ten trades during a June heatwave may have an effective sample size of 2-3. Genuine calibration requires trades spanning multiple distinct weather situations. The sample size thresholds above assume independence — real-world requirements are roughly double to account for autocorrelation.

The betting analysis report applies these thresholds rigorously. It will not suggest parameter changes without sufficient data, and will explicitly state confidence levels and sample sizes with every finding.

12 · What comes next
Historical base rates (ERA5). Open-Meteo's historical API covers 80+ years of daily temperatures. For each market the app will fetch the empirical frequency for that city, month, and bucket — a third signal implementing the Type D opportunity. Will appear as Historical% in the bucket table.

Automatic bias compensation. Once city bias is confirmed at 30+ trades, Kelly will automatically reduce stakes on the biased direction rather than requiring manual application.

Regime-conditional calibration. Separate reliability diagrams for anticyclonic vs cyclonic conditions, short vs long lead time. Pooling all trades hides the structure that matters for betting decisions.

Bootstrap confidence intervals. Error bars on hit rate, edge, and Brier score. Every number currently presented as a point estimate should carry uncertainty bounds.