Polymarket runs daily weather markets where traders bet on whether the temperature in a given city will land in a specific range. The market price reflects collective human judgment about the probability. Weather Edge replaces that human judgment with something better: a 31-member numerical weather prediction ensemble that produces a genuine probability distribution across outcomes.
The edge comes from a structural asymmetry that will not close: processing 31 ensemble forecasts and comparing them to market prices requires System 2 thinking — slow, deliberate, computational. The people pricing these markets use System 1 — fast, intuitive, heuristic. System 1 is incapable of running the calculation. This is not a temporary inefficiency. It is permanent, because it is cognitive.
The app automates the entire System 2 process: fetch models → compute probabilities → compare to market → size the bet → log and track. Your job is to check it twice a day and press the button.
The American Global Forecast System (GFS), run 31 times with slightly different initial conditions. Each run produces a complete temperature forecast. We count what fraction of the 31 runs land in each bucket after rounding to whole degrees, which matches the whole-degree resolution of Wunderground, the source markets resolve against. This is the primary betting signal. If 14 of 31 members show a daily high of 19°C, model probability = 45%.
The European Centre for Medium-Range Weather Forecasts — generally considered the world's most accurate NWP system. Used as a cross-check against GFS. When both agree, confidence is high. When they diverge by more than 1.5°, caution is warranted. The briefing runs GFS-only for speed. The full Analyse tab fetches both.
The single best-estimate GFS forecast (as opposed to the ensemble). Used as a sanity check — it should be close to the ensemble mean. Large divergence between deterministic and ensemble mean is a flag.
Current observed temperature at the exact airport station the market resolves against. Most valuable for same-day markets. Requires a free CheckWX API key in your Cloudflare environment variables.
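The GFS ensemble counting rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the app's actual code: the function name `bucket_probabilities`, the sample member values, and the round-half-up rule are all assumptions made for the example.

```python
import math
from collections import Counter

def bucket_probabilities(member_highs_c):
    """Map each whole-degree bucket to the fraction of members landing in it."""
    # Assumption: round half up (19.5 -> 20). The rounding rule applied at
    # market resolution may differ.
    rounded = [math.floor(t + 0.5) for t in member_highs_c]
    n = len(rounded)
    return {deg: count / n for deg, count in sorted(Counter(rounded).items())}

# Hypothetical 31-member ensemble of forecast daily highs in °C.
members = [18.2] * 10 + [18.7] * 14 + [19.6] * 7
probs = bucket_probabilities(members)
print({deg: f"{p:.0%}" for deg, p in probs.items()})
```

With these made-up members, 14 of 31 round to 19°C, so the 19°C bucket carries a model probability of 14/31, about 45%.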
The 80/20 answer: set one alarm for 07:05 BST. That single session captures the majority of available edge.
GFS 00Z and ECMWF 00Z both available ~06:00 BST. Both models fresh simultaneously. Markets dormant all night: maximum anchoring gap. European markets not yet repriced. US traders asleep. Structurally the best window of the 24-hour cycle. Run briefing, click Log all BET NOW, done in 15 minutes.
GFS 12Z and ECMWF 12Z both available ~18:00 BST. Good for US markets — evening repricing often incomplete. Worth doing; adds roughly 25% more opportunities. Best window for Asian cities (Tokyo, Singapore, HK) whose trading day is ending.
GFS 18Z available ~midnight BST. ECMWF won't update until ~06:00. The app fires BET NOW on GFS alone if spread ≤1.5° and edge ≥12pp — shown with a blue GFS only badge. Do not set an alarm for this. US markets have the best anchoring here.
Maximum confidence. BET NOW fires when edge ≥10pp, spread tight, members sufficient. Standard case at 07:05 and 19:00 BST.
BET NOW fires only if GFS spread ≤1.5° AND edge ≥12pp. A tight spread means 31 members are self-consistently clustering — the ensemble is its own cross-check. Applies at the midnight window. Slightly lower confidence than Tier 1; consider reducing Kelly by one star rating.
Always WAIT regardless of ECMWF freshness. GFS ensemble is the primary signal — stale primary = no reliable edge. Check again after the next GFS update.
GFS runs 00Z/06Z/12Z/18Z + ~5h lag. ECMWF runs 00Z/12Z + ~5h lag. In BST (UTC+1): GFS available ~06:00/12:00/18:00/00:00. ECMWF ~06:00/18:00.
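The conversion from run time to local availability is simple clock arithmetic: take the run hour in UTC, add the quoted lag of roughly 5 hours, and shift to BST (UTC+1). A small sketch, assuming a flat 5-hour lag and an arbitrary summer date (the function name `available_bst` and the default date are illustrative only):

```python
from datetime import datetime, timedelta, timezone

BST = timezone(timedelta(hours=1))  # British Summer Time = UTC+1
LAG = timedelta(hours=5)            # approximate lag; real delivery times vary

GFS_RUNS_UTC = (0, 6, 12, 18)
ECMWF_RUNS_UTC = (0, 12)

def available_bst(run_hours_utc, year=2024, month=6, day=1):
    """Return approximate BST clock times at which each run becomes available."""
    times = []
    for hour in run_hours_utc:
        run_start = datetime(year, month, day, hour, tzinfo=timezone.utc)
        times.append((run_start + LAG).astimezone(BST).strftime("%H:%M"))
    return times

print("GFS:  ", available_bst(GFS_RUNS_UTC))
print("ECMWF:", available_bst(ECMWF_RUNS_UTC))
```

Outside summer time the UTC+1 offset no longer applies, so the `BST` constant would need to change.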
Each row is the single best opportunity in that market — the bucket with the largest model-vs-market divergence.
YES — model thinks this bucket is more likely than market implies. Buy YES shares. NO — market has overpriced this bucket. Buy NO shares. Both directions are equally valid; the edge calculation is symmetric.
Fraction of GFS ensemble members landing in this bucket after rounding to whole degrees. 14 of 31 = 45%. Real probability from real forecast data. Briefing uses GFS only; Analyse tab adds ECMWF for a combined figure.
Current YES price as a percentage. Note: this is not the true fair-value probability — Polymarket takes ~2% fees and there is a bid-ask spread. Edge figures are overstated by roughly 2-4pp. Never bet on edges below 8pp gross.
Model% minus Mkt%. Green (+) = underpriced, bet YES. Red (−) = overpriced, bet NO. Edge is a point estimate with considerable uncertainty — with 31 members, model probabilities are granular to ~3pp. Do not treat edge figures as precise.
Quarter-Kelly from your bankroll, scaled by lead time and confidence stars. Stars apply a multiplier (★★★ = 100%, ★★☆ = 50%, ★☆☆ = 25%). A dash means no active market — skip. Always treat Kelly as a ceiling, not a target. Until you have 30+ calibrated trades, consider halving the displayed figure for real bets.
Standard deviation of the 31 GFS members. ±0.8° = confident. ±2.5° = uncertain — consider halving Kelly. Tight spread is also the condition enabling BET NOW without ECMWF at the midnight window.
★★★ all conditions favourable: 25+ members, models agree within 1.5°, spread tight, edge above 12pp. ★★☆ one condition marginal. ★☆☆ multiple conditions weak — speculative only. Stars feed into Kelly multiplier automatically.
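The Edge and Spread columns above reduce to a few lines of arithmetic. The sketch below is illustrative rather than the app's own code: the function names are hypothetical, and using the sample standard deviation for spread is an assumption (the app may use the population form).

```python
import statistics

def edge_pp(model_pct, market_pct):
    """Edge in percentage points: Model% minus Mkt%."""
    return model_pct - market_pct

def bet_direction(edge):
    """Positive edge means buy YES, negative edge means buy NO."""
    if edge > 0:
        return "YES"
    if edge < 0:
        return "NO"
    return None

def ensemble_spread(member_highs_c):
    """Spread of the ensemble members in degrees (sample standard deviation)."""
    return statistics.stdev(member_highs_c)

def granularity_pp(n_members=31):
    """Smallest possible step in model probability, in percentage points."""
    return 100 / n_members

edge = edge_pp(45, 6)
print(edge, bet_direction(edge), round(granularity_pp(), 1))
```

Because one member out of 31 moves the model probability by about 3.2pp, two edges that differ by only a few points are not meaningfully different.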
BET NOW Edge ≥10pp · GFS fresh · spread acceptable · members sufficient. The reason any condition is not met is shown inline.
BET NOW GFS only Tier 2 — GFS fresh, ECMWF stale, but spread ≤1.5° and edge ≥12pp. Actionable with slightly lower confidence.
WAIT Edge exists but one or more conditions unmet. The specific reason is shown. Check again after the next model run.
PASS Edge below threshold. Not worth trading after fees.
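The four signals can be read as a small decision function. The sketch below follows the thresholds stated in this guide (10pp standard, 12pp and spread ≤1.5° for GFS only), but several details are assumptions: the guide does not give a numeric spread limit for the standard case or a minimum member count, so `max_spread_c=1.5` and `min_members=25` are placeholders, and the order of the checks is a guess.

```python
def classify(edge_pp, gfs_fresh, ecmwf_fresh, spread_c, members,
             max_spread_c=1.5, min_members=25):
    """Return (signal, reason). Thresholds marked 'assumed' are placeholders."""
    edge = abs(edge_pp)  # YES and NO edges are treated symmetrically
    if edge < 10:
        return "PASS", "edge below 10pp"
    if not gfs_fresh:
        # Stale primary signal: always wait, whatever ECMWF says.
        return "WAIT", "GFS stale"
    if members < min_members:          # assumed minimum
        return "WAIT", "too few members"
    if spread_c > max_spread_c:        # assumed limit for the standard case
        return "WAIT", "spread too wide"
    if ecmwf_fresh:
        return "BET NOW", ""
    if edge >= 12:
        return "BET NOW (GFS only)", ""
    return "WAIT", "ECMWF stale and edge below 12pp"

print(classify(39, True, True, 0.9, 31))
print(classify(11, True, False, 1.0, 31))
```

Returning the reason alongside the signal mirrors the inline explanation shown whenever a condition is not met.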
The people pricing these markets are not irrational. They are human. Human brains run decision-making shortcuts that create predictable, systematic, exploitable errors. Two intellectual frameworks, one from Nobel laureate Daniel Kahneman and one from Nassim Nicholas Taleb, map precisely onto the opportunities this app finds.
Human decision-making runs on two systems. System 1 is fast, automatic, and intuitive; it handles the vast majority of everyday decisions, including pricing a temperature market at 8am. System 2 is slow and deliberate, the kind of thinking required to process 31 ensemble runs and calculate probability distributions. System 1 can't do that. It will never do that. This structural gap is the permanent, non-arbitrageable core of your edge.
Markets built on human intuition systematically underprice tail events — the extreme outcomes that feel unlikely because they rarely come to mind easily. This mispricing is structural. It persists as long as humans price markets. The tail buckets in temperature markets face a double discount: statistically underpriced (Taleb) and psychologically avoided (Kahneman). That is where the highest-payout opportunities live.
Anchoring. The first prices set on a market are highly sticky. Even when new model data arrives, the market under-adjusts from the opening anchor. The largest edges appear in the 1–2 hours after model updates, before the market has repriced. This is why timing matters.
Availability bias. After a cold spell, cold feels probable. After a hot week, warmth feels inevitable. The ensemble has no memory of last week — it uses only current atmospheric state. After unusual weather in one direction, the opposite tail is systematically underpriced.
Loss aversion. People avoid long-shot tail bets emotionally. Losing a 10:1 bet feels worse than the arithmetic loss. This suppresses tail bucket prices beyond statistical analysis alone, creating a structural double discount on extreme outcomes.
Overconfidence: the rule for you. Algorithms consistently outperform expert judgment in complex probabilistic environments, a finding Kahneman documented extensively. You are the algorithm. Never override the model based on personal weather intuition. The moment you do, you have become the market you are trying to beat.
The standard BET NOW. Models fresh, GFS spread tight, edge clear, days 1–2. The market has anchored on yesterday's prices and the new model data has not yet been absorbed. Your bread and butter — this is what the daily briefing finds automatically.
Example: At 07:10 BST, GFS ensemble shows 19°C as 45% likely in London tomorrow. Market prices it at 6%. Edge +39pp, spread ±0.9°, ★★★. The anchor was set last night when 19°C felt like a stretch. The model now says otherwise. Act within 2 hours before the market reprices.
After a sustained run of unusual weather, the market overweights continuation and underweights reversion. The ensemble has no memory — it sees only current atmospheric conditions. When the model starts showing reversion that the market has not yet priced, check the opposite tail buckets manually.
Example: London has had five consecutive days above 28°C. The market is pricing warmth continuation heavily. The GFS ensemble suddenly shows the cold front arriving — 14°C or below is now 35% likely but priced at only 8%. The market is stuck on last week's weather. After any sustained unusual run, the Type B opportunity is the first thing to check.
A tail bucket trading at very low odds (3–8%) where the ensemble shows meaningful support. The market's loss aversion suppresses the price below even the already-low statistical probability. Small stake, high payout, hold to resolution. The strategy is barbell: many small tail bets alongside the standard plays. The law of large numbers works in your favour across many such bets.
Example: The 22°C+ bucket in London on a June day is priced at 4%. Six of 31 GFS members show it — model probability 19%. Edge +15pp. Kelly suggests $3. Place it, log it, do not check it obsessively. You are not predicting the outcome — you are exploiting structural underpricing of tails across many bets over time.
The market prices today's forecast but systematically ignores the long-run historical frequency of outcomes. Open-Meteo's historical ERA5 dataset covers 80+ years. When current market odds diverge significantly from the empirical base rate for that city and month, that is a base rate neglect play — independent of what today's model says. When GFS ensemble, ECMWF, and the historical base rate all agree the market is wrong in the same direction, that is the highest-confidence opportunity in the entire framework.
Example (coming): A June day above 25°C in London has occurred 12% of the time historically across 80 years of ERA5 data. The market prices it at 5% due to a recent cold week. Even without a strong model signal, the base rate alone implies 7pp underpricing. The app will show this as a third column in the bucket table — Historical% alongside Model% and Mkt%.
Kelly criterion gives the mathematically optimal fraction of bankroll to stake given an edge. Full Kelly requires that your probability estimate is correct. Ours is not yet verified by calibration data.
Lead-time scaling. Days 1–2: 100%. Days 3–4: 75%. Days 5–6: 50%. Day 7: 25%. NWP skill degrades with lead time. These steps are approximations of a continuous decay in forecast skill.
Star multiplier. ★★★ = 100%, ★★☆ = 50%, ★☆☆ = 25% of the scaled Kelly. Translates confidence directly into stake size.
5% bankroll cap. Hard ceiling per trade. Prevents ruin on data errors.
The honest caveat. All multipliers are derived from intuition, not empirical calibration. Once you have 50+ resolved trades and the reliability diagram shows the model is well-calibrated, consider moving to half-Kelly. Until then, treat displayed Kelly as a ceiling and start with half that amount in your first real-money bets.
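The sizing chain described above (quarter-Kelly, then lead-time scaling, then star multiplier, then the 5% cap) can be sketched as follows. The Kelly fraction for a binary contract bought at cost c with win probability p is taken here as (p − c) / (1 − c); that this is the exact formula the app uses is an assumption, and the function names are hypothetical. Fees and bid-ask spread are ignored.

```python
LEAD_SCALE = {1: 1.0, 2: 1.0, 3: 0.75, 4: 0.75, 5: 0.5, 6: 0.5, 7: 0.25}
STAR_MULT = {3: 1.0, 2: 0.5, 1: 0.25}

def kelly_fraction(p_model, yes_price, side):
    """Full-Kelly fraction for a binary contract (assumed formula, no fees)."""
    if side == "YES":
        p_win, cost = p_model, yes_price
    else:  # a NO share costs 1 - yes_price and wins with probability 1 - p_model
        p_win, cost = 1 - p_model, 1 - yes_price
    return max(0.0, (p_win - cost) / (1 - cost))

def stake(bankroll, p_model, yes_price, side, lead_days, stars,
          kelly_mult=0.25, cap=0.05):
    """Quarter-Kelly stake scaled by lead time and stars, capped per trade."""
    fraction = (kelly_fraction(p_model, yes_price, side)
                * kelly_mult * LEAD_SCALE[lead_days] * STAR_MULT[stars])
    return bankroll * min(fraction, cap)

# Model 45%, market 6%, day 1, three stars, hypothetical 1000 bankroll.
print(stake(1000, 0.45, 0.06, "YES", lead_days=1, stars=3))
```

In this example the raw quarter-Kelly fraction is about 10% of bankroll, so the 5% cap is what actually sets the stake. Very large displayed edges are exactly where a data error is most likely, which is why the cap matters.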
The reliability diagram is the single most important diagnostic in the tracker. It answers the fundamental question: when the model says 40% probability, does it actually win 40% of the time? It appears once you have 15 resolved trades.
Points above the diagonal — underconfident. Model predicts 40% but you win 55% of the time. The model is being too conservative — true edge is larger than calculated. Once confirmed across 30+ trades, you can be more aggressive with Kelly sizing.
Points below the diagonal — overconfident. Model predicts 60% but you win 45% of the time. The model overstates certainty — Kelly stakes are too large. Reduce Kelly until the diagram corrects. This is the more dangerous pattern.
Systematic directional bias. If YES bets cluster below the diagonal but NO bets sit on it, GFS is running warm — overestimating high-temperature outcomes. The city analysis panel will flag this separately and suggest a stake adjustment.
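A reliability diagram is built by grouping resolved trades into probability bands and comparing the average predicted probability in each band with the observed win rate. A minimal sketch, assuming each trade is stored as a `(predicted_win_probability, won)` pair and using five equal-width bands (both the data layout and the band count are assumptions):

```python
def reliability_bins(trades, n_bins=5):
    """Return one (mean_predicted, observed_hit_rate, count) row per non-empty band."""
    bins = [[] for _ in range(n_bins)]
    for p, won in trades:
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the top band
        bins[idx].append((p, won))
    rows = []
    for band in bins:
        if not band:
            continue
        mean_pred = sum(p for p, _ in band) / len(band)
        hit_rate = sum(1 for _, won in band if won) / len(band)
        rows.append((mean_pred, hit_rate, len(band)))
    return rows

# Hypothetical resolved trades.
trades = [(0.41, True), (0.45, False), (0.42, True), (0.90, True)]
for mean_pred, hit_rate, n in reliability_bins(trades):
    position = "above" if hit_rate > mean_pred else "on or below"
    print(f"predicted {mean_pred:.2f}, won {hit_rate:.2f}, n={n}: {position} diagonal")
```

The count in each row is worth displaying next to the point: a band holding one or two trades says very little about calibration.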
Brier score. Mean squared error of probability forecasts. Shown as the fourth stat tile once 15 trades are resolved. 0 = perfect. 0.25 = uninformative coin flip. A well-calibrated weather model achieves 0.15–0.20 for day-1 forecasts. Below 0.15 is excellent. Above 0.22 suggests either poor calibration or that you are betting markets where the model has no real edge.
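The mean squared error of probability forecasts described above is the Brier score. It is computed by squaring the gap between each forecast probability and the outcome (1 for a win, 0 for a loss) and averaging. A minimal sketch, assuming trades are stored as `(predicted_win_probability, won)` pairs:

```python
def brier_score(trades):
    """Mean squared difference between forecast probability and outcome."""
    pairs = list(trades)
    if not pairs:
        raise ValueError("need at least one resolved trade")
    return sum((p - (1.0 if won else 0.0)) ** 2 for p, won in pairs) / len(pairs)

# Always forecasting 50% scores 0.25 regardless of outcomes.
print(brier_score([(0.5, True), (0.5, False)]))
```

The 0.25 reference point follows directly: a constant 50% forecast misses every outcome by 0.5, and 0.5 squared is 0.25.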
Important caveat. Fifteen trades are enough to render the diagram but far too few for conclusions: most buckets will have 1–2 data points. Treat the diagram as directional only until you have 50+ trades spanning varied weather conditions. A heatwave producing 10 consecutive trades in one probability band does not give 10 independent calibration points, because those trades are autocorrelated.
The city analysis panel in the tracker diagnoses whether GFS has a systematic warm or cold bias for each location. Cards appear at 5+ resolved trades per city; bias diagnosis unlocks progressively.
Confidence gates. Below 10 trades: data shown, no diagnosis. 10–19 trades: tentative signal, no adjustment. 20–29 trades: emerging pattern, 10% stake reduction on biased direction. 30+ trades: 25% reduction. These thresholds are conservative by design — sequential trades during a single weather regime are not independent observations.
The sparkline. The coloured squares at the right of each city card show the last 10 outcomes (green = win, red = loss). This tells you whether a bias pattern is recent or historical — a streak of recent reds during a heatwave may be regime noise rather than systematic model error.
Non-stationarity. GFS bias varies by season and is reset by model upgrades. A warm bias measured in June may not hold in October. Treat city bias as a rolling signal, not a fixed correction.
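The confidence gates for city bias amount to a lookup on the number of resolved trades for that city. The sketch below encodes the thresholds stated above; the function names and stage labels are illustrative, not taken from the app.

```python
def bias_stage(resolved_trades):
    """Describe what the city card shows at a given trade count."""
    if resolved_trades < 5:
        return "no card"
    if resolved_trades < 10:
        return "data only"
    if resolved_trades < 20:
        return "tentative signal"
    if resolved_trades < 30:
        return "emerging pattern"
    return "confirmed"

def bias_stake_multiplier(resolved_trades):
    """Multiplier applied to stakes in the biased direction only."""
    if resolved_trades >= 30:
        return 0.75  # 25% reduction
    if resolved_trades >= 20:
        return 0.90  # 10% reduction
    return 1.0       # no adjustment below 20 trades

for n in (4, 8, 15, 25, 40):
    print(n, bias_stage(n), bias_stake_multiplier(n))
```

Since bias shifts with season and model upgrades, a rolling window over recent trades may be a better input for `resolved_trades` than the all-time count.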
This app generates systematic hypotheses and collects calibration data. It is not yet a validated betting system. Be honest about these limitations:
Members are not independent. Generated by perturbing a single initial state. Effective sample size for capturing true atmospheric uncertainty is considerably less than 31.
Edge is overstated by ~2-4pp. We compare to the displayed market price, not the true breakeven price after fees and spread.
Kelly multipliers are arbitrary. Not derived from empirical data on this model. Reasonable starting points, nothing more.
No multiple testing correction. The briefing scans hundreds of markets per session — some will show spurious edge by chance. High-confidence rows (★★★, tight spread) are more likely to reflect real edge.
Selection bias in hit rate. We only bet when edge exceeds a threshold. Our observed hit rate is conditional on having detected edge, which correlates with model bias. Hit rate is not an unbiased estimator of model accuracy.
One season is not calibration. 50 trades in June tells you about summer. Almost nothing about winter or market regime changes. Stay humble across seasons.
Automatic bias compensation. Once city bias is confirmed at 30+ trades, Kelly will automatically reduce stakes on the biased direction rather than requiring manual application.
Regime-conditional calibration. Separate reliability diagrams for anticyclonic vs cyclonic conditions, short vs long lead time. Pooling all trades hides the structure that matters for betting decisions.
Bootstrap confidence intervals. Error bars on hit rate, edge, and Brier score. Every number currently presented as a point estimate should carry uncertainty bounds.
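As an indication of what bootstrap error bars on hit rate could look like, here is a minimal percentile-bootstrap sketch. It is not the planned implementation: the function name is hypothetical, and plain resampling treats trades as independent, which this guide warns they are not during a single weather regime. The resulting interval should therefore be read as optimistically narrow.

```python
import random

def bootstrap_hit_rate_ci(outcomes, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the hit rate of 1/0 trade outcomes."""
    if not outcomes:
        raise ValueError("need at least one resolved trade")
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(outcomes)
    rates = sorted(
        sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples)
    )
    lower = rates[int((alpha / 2) * n_resamples)]
    upper = rates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Hypothetical record: 12 wins and 8 losses.
outcomes = [1] * 12 + [0] * 8
low, high = bootstrap_hit_rate_ci(outcomes)
print(f"hit rate {sum(outcomes) / len(outcomes):.2f}, 95% interval {low:.2f} to {high:.2f}")
```

A block bootstrap that resamples runs of consecutive trades would respect the autocorrelation better, at the cost of needing considerably more data.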