Election Night — March 17, 2026

Methodology

How IL9Cast generates its forecasts

The Markets page aggregates real-time probabilities from two major prediction markets: Manifold Markets (community-driven) and Kalshi (real-money exchange). This provides a live, crowd-sourced forecast updated automatically as new trades occur.

Overall Weighting: 40% Manifold + 60% Kalshi

Manifold Markets (40% weight):

  • Community prediction market using play money
  • Lower barrier to entry, broader participation
  • Captures general consensus and casual forecasters
  • Uses an Automated Market Maker (AMM) for price discovery
  • Good for long-term sentiment, less susceptible to short-term noise

Kalshi (60% weight):

  • Real-money exchange with cash-settled contracts
  • Higher barrier to entry, attracts serious forecasters with skin in the game
  • Reflects genuine conviction (money is at risk)
  • Traditional order book exchange (bid/ask spread pricing)

Kalshi Component Breakdown (60% total weight)

Within Kalshi's 60% weight, we use three price signals:

1. Last Price (42% overall = 70% of Kalshi's 60%)

  • The most recent completed trade price
  • Reflects real money execution and recent market sentiment
  • Given highest weight because it shows actual conviction backed by capital
  • Most responsive to new information and recent trades

2. Midpoint (12% overall = 20% of Kalshi's 60%)

  • Average of the current best bid and best ask prices: (bid + ask) / 2
  • Represents the fair value between buyers and sellers right now
  • Includes current order book state but not yet executed
  • Smooths out last trade anomalies or outliers

Thin-Market Fallback (Midpoint & Liquidity)

When a candidate has no yes-side bids (yes_bid = 0), the standard midpoint formula (yes_bid + yes_ask) / 2 produces a meaningless number — it averages zero against the ask, massively inflating the result. In these cases, both the Midpoint and Liquidity-Weighted components fall back to the candidate's last trade price instead.

Example — Mike Simmons: Simmons has virtually no buy-side interest on Kalshi. His order book might show yes_bid = 0, yes_ask = 19, last_price = 1. Without the fallback, the midpoint formula would compute (0 + 19) / 2 = 9.5%, and the liquidity-weighted price would land around 9.0% — suggesting nearly 10% support for a candidate trading at 1 cent. With the fallback, both Midpoint and Liquidity correctly report 1%, matching his actual last trade. The spread-based liquidity adjustment is also skipped entirely because there is no real two-sided market to analyze.

3. Liquidity-Weighted Price (6% overall = 10% of Kalshi's 60%)

  • Purpose: Captures buying vs selling pressure by analyzing where the last trade occurred within the bid-ask spread
  • Why only Kalshi? Kalshi uses a traditional order book exchange where buyers and sellers post bids and asks. Manifold uses an Automated Market Maker (AMM) that algorithmically sets prices, so order book analysis doesn't apply
  • How it works: Calculate position in spread = (Last Price - Bid) / (Ask - Bid). Position of 0.0 = traded at bid (sellers aggressive), 0.5 = at midpoint (balanced), 1.0 = at ask (buyers aggressive)
  • Spread dampening: Wider spreads reduce the adjustment because they indicate less confidence. At 0pp spread: factor = 1.0 (full shift potential). At 10pp spread: factor = 0.2 (heavily dampened)
  • Example: Bid=60, Ask=68, Last=66, Mid=64, Spread=8. Position = (66-60)/8 = 0.75 (near ask). Offset = 0.75 - 0.5 = +0.25. Dampening = 1 - (8/10)×0.8 = 0.36. Shift = 0.25 × 8 × 0.36 = +0.72pp. Result = 64 + 0.72 = 64.7% (buyers showed aggression)
  • Shift is proportional to the spread width, so the result always stays within the bid-ask range
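The steps above, including the thin-market fallback described earlier, can be sketched in plain Python. This is a reconstruction from the worked example, not the production code; in particular, clamping the dampening factor at zero for very wide spreads is an assumption.

```python
def liquidity_weighted(yes_bid, yes_ask, last_price):
    """Liquidity-weighted price with the thin-market fallback."""
    if yes_bid == 0:
        return last_price  # no two-sided market: fall back to last trade
    spread = yes_ask - yes_bid
    if spread <= 0:
        return last_price
    mid = (yes_bid + yes_ask) / 2
    position = (last_price - yes_bid) / spread   # 0.0 = at bid, 1.0 = at ask
    offset = position - 0.5                      # signed buy/sell pressure
    # Wider spread => smaller shift; floor at 0 is an assumption for
    # spreads beyond 12.5pp, where the formula would otherwise go negative.
    dampening = max(0.0, 1 - (spread / 10) * 0.8)
    return mid + offset * spread * dampening
```

On the worked example (bid 60, ask 68, last 66) this returns the 64.7% result, and on the Simmons order book (bid 0, ask 19, last 1) it falls back to 1.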

The Complete Formula

Aggregate Probability = (0.40 × Manifold) + (0.42 × Kalshi Last Price) + (0.12 × Kalshi Midpoint) + (0.06 × Kalshi Liquidity-Weighted)
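As a Python sketch (all inputs are percentages on a 0-100 scale; the helper name is illustrative):

```python
# Weights from the formula above; the three Kalshi terms sum to 0.60.
WEIGHTS = {"manifold": 0.40, "kalshi_last": 0.42,
           "kalshi_mid": 0.12, "kalshi_liq": 0.06}

def aggregate_probability(manifold, kalshi_last, kalshi_mid, kalshi_liq):
    return (WEIGHTS["manifold"] * manifold
            + WEIGHTS["kalshi_last"] * kalshi_last
            + WEIGHTS["kalshi_mid"] * kalshi_mid
            + WEIGHTS["kalshi_liq"] * kalshi_liq)
```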

Why Aggregation Beats Single-Source Data: The Laura Fine Example

A real incident during this race perfectly illustrates why IL9Cast's multi-source approach is more reliable than watching a single market.

What happened: A trader on Kalshi bought Laura Fine yes-contracts aggressively, pushing the last trade price all the way up to 30%. Then they stopped. The market maker immediately placed no-orders at 16%, while existing yes-orders sat unfilled at 14%. For quite a while, Kalshi's displayed probability showed 30% — based solely on that last executed trade.

The reality: The actual market consensus was a 14-16% spread. No one was willing to buy above 14%, and no one was willing to sell below 16%. The 30% price was a single trade artifact, not a reflection of genuine market sentiment.

IL9Cast's response: Our aggregated probability showed the upward price movement somewhat — the trade did contain real information — but stayed grounded around the spread rather than jumping to 30%. Here's why:

  • Manifold's 40% weight provided a stabilizing second opinion that didn't react to the thin Kalshi trade
  • Midpoint component (12%) used the 14-16% spread, reporting ~15% instead of 30%
  • Liquidity-weighted component (6%) also used spread-based pricing rather than the outlier trade
  • Spike dampening capped the change at ±3 percentage points per 3-minute interval, preventing wild single-trade swings

The result: while Kalshi briefly showed Laura Fine at 30%, IL9Cast accurately reflected the true market state — modest upward movement constrained by the actual bid-ask reality. This is exactly the kind of thin-market artifact our aggregation approach is designed to filter out.

The takeaway: Single prediction markets can be misleading when liquidity is low or a single large trade occurs. By combining multiple price signals and applying dampening, IL9Cast gives you a clearer, more stable view of the race.

Spread-Based Weight Throttle (Feb 2026)

When a Kalshi last trade price falls outside the current bid-ask spread, its weight in the aggregate is automatically reduced. This prevents stale or outlier trades from disproportionately influencing the forecast.

  • Normal weights (last price within spread): 40% Manifold, 42% Kalshi Last, 12% Midpoint, 6% Liquidity-Weighted
  • Throttled weights (last price outside spread): 40% Manifold, 20% Kalshi Last, 28% Midpoint, 12% Liquidity-Weighted
  • Why not zero: Even an outlier trade contains some price signal — a buyer was willing to pay that price — so we reduce its influence rather than ignoring it entirely
  • Thin markets unaffected: When yes_bid = 0 (no buy-side orders), the throttle does not activate. These cases are already handled by the separate thin-market fallback that uses last_price directly

Using the Laura Fine example above: with last_price at 30% but the spread at 14–16%, the throttle would shift weight toward the spread-based components (midpoint ~15%, liquidity-weighted ~15%), keeping the aggregate grounded near market consensus instead of being pulled toward the 30% outlier.
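A minimal sketch of the throttle's weight selection, assuming exactly the three checks described above:

```python
NORMAL_WEIGHTS    = {"manifold": 0.40, "last": 0.42, "mid": 0.12, "liq": 0.06}
THROTTLED_WEIGHTS = {"manifold": 0.40, "last": 0.20, "mid": 0.28, "liq": 0.12}

def choose_weights(yes_bid, yes_ask, last_price):
    if yes_bid == 0:
        # Thin market: the separate fallback handles it; throttle stays off
        return NORMAL_WEIGHTS
    if last_price < yes_bid or last_price > yes_ask:
        # Stale or outlier trade: lean on the spread-based components
        return THROTTLED_WEIGHTS
    return NORMAL_WEIGHTS
```

With the Laura Fine order book (bid 14, ask 16, last 30), the last price falls outside the spread and the throttled weights apply.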

Soft Normalization (30% Strength)

Final probabilities are lightly normalized to prevent excessive drift while preserving raw market values:

  • Candidates with no Kalshi market (marked with *) use 100% of their Manifold probability
  • Only 30% of the normalization adjustment is applied to each candidate
  • 70% of the raw aggregated value is preserved
  • This keeps top candidates higher while preventing probabilities from summing to unrealistic totals
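One plausible reading of the 30%/70% split, assuming the "adjustment" is the gap between a candidate's raw value and its fully normalized value (a sketch, not the site's exact implementation):

```python
def soft_normalize(probs, strength=0.30):
    """Move each probability 30% of the way toward a sum of 100."""
    total = sum(probs.values())
    if total == 0:
        return dict(probs)
    return {name: p + strength * (p * 100 / total - p)
            for name, p in probs.items()}
```

For example, two candidates at 60% each (sum 120%) would each land at 57%: 70% of the raw 60 plus 30% of the fully normalized 50.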

Chart Smoothing

To produce clean, readable trend lines, the chart pipeline applies multiple smoothing layers:

  • Spike dampening: Per-candidate probability changes are capped at ±3 percentage points per 3-minute collection interval, preventing sudden jumps from thin-market trades or API glitches
  • Exponential Moving Average (EMA): Chart data is smoothed server-side with an EMA (alpha = 0.15) — each displayed point is 15% new value + 85% previous smoothed value
  • RDP simplification: The Ramer-Douglas-Peucker algorithm reduces thousands of raw data points to ~200-400 visually significant points, removing redundant noise
  • Monotone cubic interpolation: The frontend renders curves using monotone splines that prevent overshoot between data points
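The first two smoothing layers are simple enough to sketch directly (illustrative helper names):

```python
def spike_dampen(prev, new, cap=3.0):
    """Clamp a per-interval change to ±cap percentage points."""
    return prev + max(-cap, min(cap, new - prev))

def ema_smooth(series, alpha=0.15):
    """EMA: each point is 15% new value + 85% previous smoothed value."""
    smoothed = [series[0]]
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed
```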

Update Frequency

  • Default: Every 3 minutes
  • Final 3 days before election: Every 1 minute
  • Election day: Every 1 minute

Historical snapshots are saved with every update and aggregated hourly for clean trend visualization.

Results Heatmap (1st & 2nd Place Probabilities)

The Markets page includes a results heatmap showing the derived probability of every possible 1st-place and 2nd-place finish combination. Since prediction markets only provide marginal win probabilities (P(candidate wins)), we derive the joint probabilities using conditional probability:

Joint probability formula:
P(A finishes 1st, B finishes 2nd) = P(A wins) × P(B wins) / (1 − P(A wins))

Intuition: If candidate A wins (with probability P(A)), the remaining candidates compete for 2nd place. We assume their relative win probabilities determine who finishes 2nd. So candidate B's chance of finishing 2nd, given A won, equals B's probability renormalized among all non-A candidates: P(B) / (1 − P(A)). Multiplying by P(A) gives the joint probability of A finishing 1st and B finishing 2nd.

Key assumption: This treats candidates as independent — it assumes that A winning doesn't change the relative ordering of the remaining candidates. In practice, ideologically similar candidates may cannibalize each other's support, which would make some 2nd-place finishes more or less likely than this model predicts. The heatmap is a first-order approximation, not a full correlated model.
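The derivation translates directly to code (probabilities on a 0-1 scale; a sketch, not the site's implementation):

```python
def joint_first_second(win_probs):
    """P(a 1st, b 2nd) = P(a) * P(b) / (1 - P(a)) for every ordered pair."""
    joint = {}
    for a, pa in win_probs.items():
        for b, pb in win_probs.items():
            if a != b and pa < 1.0:
                joint[(a, b)] = pa * pb / (1.0 - pa)
    return joint
```

A useful sanity check: when the marginal win probabilities sum to 1, the joint probabilities across all ordered pairs also sum to 1, since each candidate's second-place probabilities are renormalized over the remaining field.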

Reading the heatmap: Rows represent the 1st-place finisher, columns represent the 2nd-place finisher. The diagonal is empty (a candidate can't finish both 1st and 2nd). Brighter blue cells indicate higher probability pairs. The most likely 1st/2nd combination is highlighted at the top.

Our precinct-level election model is built in Stata and produces precinct-by-precinct projections displayed on an interactive map on the Model page. The full methodology is documented in the paper linked below.

Read the full methodology (PDF) →

Note: The PDF documents the February 2026 model architecture. See v3.1 updates below.

Model v3.1 Update (March 5, 2026)

Key changes from the February model:

  • New poll integrated: PPP/Evanston Roundtable poll (Feb 20–21, n=501 likely voters). First non-affiliated poll of the race — no candidate or party commissioned it. Weighted at 73% in the polling composite.
  • Schakowsky endorsement boost removed: Biss’s endorsement from the retiring incumbent is now reflected in the polling data itself, so the separate model multiplier has been removed to avoid double-counting.
  • Wider polling uncertainty: Polling error sigma increased from 7 to 8 percentage points, reflecting the genuine uncertainty with only two independent polls.
  • Abughazaleh progressive lane loading: Increased from 0.6 to 0.75, reflecting stronger youth/progressive constituency alignment observed in the PPP data.
  • Updated results: Biss 77.2% win probability (was ~81%), Abughazaleh 15.6% (was ~13%), Fine 7.2% (was ~6%).
  • 5 new visualization graphs added to the Model page.

The Money page aggregates campaign finance data directly from the Federal Election Commission's public API. This gives us real-time insights into total raised, cash on hand, spending patterns, and grassroots support metrics for all candidates in the race.

FEC Data Sources

All fundraising data is sourced from official FEC filings, currently hardcoded from Pre-Primary reports (coverage through Feb 25, 2026; filed March 5, retrieved March 6, 2026). The original data was pulled from two FEC API endpoints:

Candidate Totals: GET /v1/candidate/{candidate_id}/totals/
Cumulative financial summary: receipts (Line 11e), disbursements (Line 22), cash on hand (Line 27), and individual contribution breakdowns.
Period-Specific Figures: FEC Form 3 Column A
Receipts and disbursements for the current filing period only (Jan 1 – Feb 25, 2026). Used for burn rate and raise rate calculations.

Each candidate is identified by their FEC committee ID. For example:

  • Daniel Biss → Committee C00905307
  • Kat Abughazaleh → Committee C00900449
  • Mike Simmons → Committee C00910976

Note on Cash on Hand: The FEC-reported COH (Line 27) may differ slightly from total receipts minus total disbursements due to beginning balance carryforward, loans, refunds, and other adjustments. We use the FEC-reported figure as the authoritative value.

Small Dollar Donations

The "Small $%" metric measures grassroots support by calculating what percentage of a candidate's individual contributions come from donations under $200. This is the FEC's standard definition of small-dollar/grassroots fundraising.

Calculation formula:

Small Dollar % = (individual_unitemized_contributions ÷ individual_contributions) × 100

The FEC breaks down individual contributions into two categories:

  • Itemized contributions — Individual donations of $200 or more (reported with donor names)
  • Unitemized contributions — Individual donations under $200 (aggregated without names)

Why we use individual_contributions as the denominator: Total receipts include PAC money, party transfers, and other non-individual sources. To accurately measure grassroots support, we only consider money from individual human donors.

Real example from Kat Abughazaleh's FEC data:

Individual contributions (total): $2,702,469.35
Unitemized (under $200): $1,898,015.77
Itemized ($200+): $804,453.58

Small Dollar % = ($1,898,015.77 ÷ $2,702,469.35) × 100 = 70.2%

This means 70.2% of Kat's individual donations came from contributors giving less than $200 — a strong indicator of broad grassroots support.
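The calculation reduces to a one-liner over the two FEC categories:

```python
def small_dollar_pct(unitemized, itemized):
    """Share of individual contributions under $200."""
    individual_total = unitemized + itemized
    if individual_total == 0:
        return 0.0
    return unitemized / individual_total * 100
```

Plugging in the Abughazaleh figures above reproduces the 70.2% result.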

Burn Rate & Raise Rate

Burn rate and raise rate measure monthly spending and fundraising velocity during the most recent FEC filing period. Both use period-specific figures from FEC Form 3, Column A — not cumulative totals.

Monthly Rate = period_amount ÷ (period_days ÷ 30.44)

For the Pre-Primary filing (Jan 1 – Feb 25, 2026 = 56 days):
Burn rate uses period disbursements (Column A, Line 22)
Raise rate uses period receipts (Column A, Line 11e)

Cash runway = cash_on_hand ÷ burn_rate_monthly

Why period-specific? Cumulative totals include money raised and spent over the entire campaign lifecycle. Period-specific figures capture the campaign's current spending and fundraising velocity, which is more useful for projecting forward to election day.

Important distinction: A candidate's burn rate may exceed their raise rate even if cumulative total_raised exceeds total_spent. This indicates the campaign is currently spending faster than it's raising — drawing down reserves built up earlier in the cycle.
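The two rates and the runway calculation can be sketched as:

```python
AVG_DAYS_PER_MONTH = 30.44  # average calendar-month length

def monthly_rate(period_amount, period_days):
    """Burn or raise rate: period dollars scaled to a 30.44-day month."""
    return period_amount / (period_days / AVG_DAYS_PER_MONTH)

def cash_runway_months(cash_on_hand, monthly_burn):
    return cash_on_hand / monthly_burn
```

For the 56-day Pre-Primary period, a campaign that spent $560,000 burns about $304,400 per month; with $608,800 on hand, its runway would be about two months.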

Data Source & Updates

The fundraising data is currently hardcoded into the application using Pre-Primary FEC filings through February 25, 2026 (filed March 5, retrieved March 6, 2026). The data is static and doesn't require live API calls.

This approach provides instant page loads (no 2-3 second API delays) while keeping the displayed figures exactly in line with the official filings.

Chart Visualization

The fundraising chart uses a dual Y-axis approach to compare two fundamentally different metrics:

  • Blue bars (left Y-axis): Total dollars raised, scaled in currency
  • Nested inner bars (right Y-axis): Percentage of raised funds spent, scaled 0-100%

The spent percentage bars are nested inside the blue fundraising bars (same X position, narrower width) to save horizontal space while keeping both metrics independently readable on their own scales. This nested-bar technique is implemented using Chart.js with stack: 'combined' and stacked: false to center both bars at the same position without summing their values.

IL9Cast is a pretty lean operation — a single Python app running on Railway that does everything from fetching market data to serving the website. Here's how it all fits together.

The Cloud Setup

We run on Railway, which is a platform-as-a-service that deploys straight from GitHub. Every time we push code to the main branch, Railway automatically builds and deploys a new version using Nixpacks (their build system). The whole deploy cycle takes about 30 seconds.

The app runs behind Gunicorn, a production-grade Python WSGI server. We use the --preload flag, which is important — it loads the app once in memory before forking workers, so our background data collector only starts a single thread instead of duplicating itself across workers. Without that flag, you'd get multiple scrapers all writing to the same file at the same time, which is a recipe for corrupted data.

Here's the key configuration from railway.toml:

startCommand = "gunicorn app:app --preload"
healthcheckPath = "/"
restartPolicyType = "ON_FAILURE"
restartPolicyMaxRetries = 10

That restart policy is our safety net. If the app crashes — maybe a dependency breaks, maybe Railway has a hiccup — it'll automatically restart up to 10 times before giving up. In practice, it almost never needs more than one.

Persistent Storage (the tricky part)

Railway containers are ephemeral by default — when the app redeploys, everything on the filesystem gets wiped. That's fine for code, but our historical data needs to survive between deploys. So we use a Railway persistent volume mounted at /app/data.

When the app boots up, it runs a path resolution function that checks, in order: /data, /app/data, then falls back to a local data/ directory. This means the same code works on Railway (where the volume lives at /app/data) and on a developer's laptop (where it just uses the local folder). No environment variables, no conditional imports — just a directory check.
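A sketch of that resolution order (the candidate paths match the description; the fallback directory name is illustrative):

```python
import os

def resolve_data_dir(candidates=("/data", "/app/data"), fallback="data"):
    """Return the first existing mount point, else a local directory."""
    for path in candidates:
        if os.path.isdir(path):
            return path
    os.makedirs(fallback, exist_ok=True)  # local development fallback
    return fallback
```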

Why JSONL (and not a database)

You might wonder why we're storing data in a flat file instead of PostgreSQL or SQLite. Honestly? For our use case, JSONL is better.

JSONL (JSON Lines) means one JSON object per line. Every 3 minutes, the scraper appends a single line to the file — that's it. No connection pools, no schema migrations, no ORM overhead. The file is human-readable (you can literally tail it to see the latest data), and if a line gets corrupted, every other line is still perfectly valid. Try saying that about a SQLite database after a partial write.

The append operation is actually a bit more careful than a simple file write:

  • Write to a temp file first — the new snapshot goes to historical_snapshots.jsonl.tmp
  • Copy existing content + new line — both old data and the new snapshot get written to the temp file
  • Atomic replace — os.replace() swaps the temp file into place in a single filesystem operation. If the app crashes mid-write, either the old file or the new file exists — never a half-written mess
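Put together, the append looks roughly like this (a sketch of the pattern described above, not the production function):

```python
import json
import os

def append_snapshot(path, snapshot):
    """Append one JSON line via a temp file + atomic os.replace()."""
    tmp = path + ".tmp"
    with open(tmp, "w") as out:
        if os.path.exists(path):
            with open(path) as existing:
                out.write(existing.read())        # carry over old lines
        out.write(json.dumps(snapshot) + "\n")    # the new snapshot line
    os.replace(tmp, path)                         # atomic swap-in
```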

At 480 snapshots per day (~3 KB each), the file grows at roughly 1.4 MB/day. After a full election cycle, we're looking at maybe 70 MB total. That's nothing — your phone has thousands of times more storage. A database would be overkill here.

The Data Collection Loop

Every 3 minutes, a background thread wakes up and runs our collect_market_data() function. Here's what happens in those few hundred milliseconds:

  • 1. Fetch Manifold — HTTP GET to their public API. Returns JSON with each candidate's probability (0.0–1.0 scale, which we multiply by 100). Timeout: 10 seconds.
  • 2. Fetch Kalshi — HTTP GET to their trade API. Returns an array of markets, each with last_price, yes_bid, and yes_ask. Same 10-second timeout.
  • 3. Normalize names — Manifold calls her "Kat Abughazaleh" and Kalshi calls her "Katheryn Abughazaleh." We strip prefixes, suffixes, and known variations to get a canonical key for matching.
  • 4. Aggregate — Apply the 40/42/12/6 weighted formula (see the Markets Aggregation section above).
  • 5. Soft normalize — Nudge probabilities 30% toward summing to 100%, so the chart doesn't show the race at 130% or 80% total.
  • 6. Spike dampen — Compare to the previous snapshot. If any candidate moved more than ±3 percentage points, clamp the change. This prevents chart artifacts from thin-market Kalshi trades.
  • 7. Save — Atomically append to the JSONL file.

If both APIs fail (say Manifold is down and Kalshi returns an error), we skip the snapshot entirely. Bad data is worse than missing data — the chart just won't have a point for that 3-minute window, and the gap detection handles it gracefully.
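Step 3's name normalization might look like the sketch below; the prefix list and alias table are illustrative stand-ins, not the production tables:

```python
def canonical_name(raw):
    """Reduce a market's display name to a canonical matching key."""
    name = raw.strip().lower()
    # Strip honorific prefixes (illustrative list)
    for prefix in ("rep. ", "sen. ", "state sen. ", "dr. "):
        if name.startswith(prefix):
            name = name[len(prefix):]
    # Map known variations to one canonical form (illustrative entry)
    aliases = {"katheryn abughazaleh": "kat abughazaleh"}
    return aliases.get(name, name)
```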

The Scheduler Problem

Running a background task alongside a web server sounds simple, but there's a subtle gotcha. When you run Flask locally, you get one process — easy. But Gunicorn can spawn multiple worker processes, and each one would try to run its own scheduler. That means two workers = two scrapers = double the data (and double the API calls).

We solve this two ways depending on the environment:

  • Local development: Uses APScheduler's BackgroundScheduler, which runs the job in a background thread within the single Flask process.
  • Production (Gunicorn): Detects Gunicorn via sys.argv[0] and instead spins up a plain threading.Thread in daemon mode. The --preload flag ensures this thread is created once in the master process before workers fork, so only one thread ever exists.

Chart Data Pipeline

When you load the Markets page, your browser hits /api/snapshots/chart?period=1d (or 7d or all). Here's what the server does before sending data back:

  • Cache check — We keep a 60-second in-memory cache. If the same period was requested within the last minute, we serve the cached version instantly.
  • Load & filter — Read all snapshots from the JSONL file, parse timestamps, sort chronologically, and filter to the requested time window.
  • Gap detection — Scan consecutive timestamps for gaps > 2 hours. These become the dashed-line segments on the chart — they represent real outages (Railway restarts, AWS issues), not normal 3-minute intervals.
  • EMA smoothing — Run an exponential moving average (alpha = 0.15) across each candidate's probability series. This is the biggest smoothing step — each data point becomes 15% raw value + 85% previous smoothed value, which kills jitter while preserving genuine trends.
  • RDP simplification — The Ramer-Douglas-Peucker algorithm finds which points you can remove without changing the visual shape of the line (within an epsilon tolerance of 0.5 percentage points). A week of data at 3-minute intervals = ~3,360 points per candidate. After RDP, that drops to maybe 200–400 points. Your browser thanks us.

The RDP algorithm is actually kind of elegant. Imagine drawing a straight line from the first data point to the last. Now find whichever intermediate point is farthest from that line. If it's farther than epsilon, that point matters — keep it, and recursively check both halves. If it's closer than epsilon, the whole segment is "flat enough" to represent with just the endpoints. It's O(n log n) on average and perfectly preserves peaks, valleys, and inflection points while throwing away the boring flat stretches.
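The recursive structure described above fits in about 25 lines of plain Python (a sketch consistent with the description, not the production code):

```python
def rdp(points, epsilon):
    """Ramer-Douglas-Peucker: keep only visually significant points.
    points is a list of (x, y) tuples; endpoints are always kept."""
    if len(points) < 3:
        return list(points)
    (x1, y1), (x2, y2) = points[0], points[-1]
    dx, dy = x2 - x1, y2 - y1
    norm = (dx * dx + dy * dy) ** 0.5
    # Find the interior point farthest from the first-to-last chord
    best_i, best_d = 0, 0.0
    for i in range(1, len(points) - 1):
        x0, y0 = points[i]
        if norm == 0:
            d = ((x0 - x1) ** 2 + (y0 - y1) ** 2) ** 0.5
        else:
            d = abs(dy * x0 - dx * y0 + x2 * y1 - y2 * x1) / norm
        if d > best_d:
            best_i, best_d = i, d
    if best_d > epsilon:
        # The farthest point matters: keep it and recurse on both halves
        left = rdp(points[: best_i + 1], epsilon)
        right = rdp(points[best_i:], epsilon)
        return left[:-1] + right
    return [points[0], points[-1]]  # segment is flat enough
```

A straight run of points collapses to its two endpoints, while a genuine peak survives simplification.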

The Frontend Rendering

The chart itself is rendered with Chart.js using a time-scaled x-axis. The data comes in as {x: timestamp, y: probability} pairs, and Chart.js handles the rest. A few important settings:

  • Monotone cubic interpolation — This is the cubicInterpolationMode: 'monotone' setting. Regular cubic splines can "overshoot" — if a candidate goes from 60% to 62%, a regular spline might draw a curve that briefly dips to 59% between the points. Monotone splines guarantee the curve never exceeds the actual data values. No fake dips, no fake peaks.
  • Tension 0.5 — Controls how curvy the lines are. At 0 you get straight segments (ugly). At 1 you get maximally curvy (too smooth, hides real movement). 0.5 is the sweet spot.
  • Central Time display — All timestamps are stored in UTC but displayed in Central Time (America/Chicago) using Intl.DateTimeFormat. This handles daylight saving transitions automatically — no hardcoded offsets.
  • Segment styling for gaps — Chart.js lets you style individual line segments. For each segment, we check if the two endpoints span a known gap period. If they do, the segment gets dashed and faded. This is done per-frame via a callback, so it works even when you zoom or pan.

What Could Go Wrong (and what we do about it)

A few things have bitten us before, so we built defenses:

  • API timeouts: Both Manifold and Kalshi calls have 10-second timeouts. If they're slow, we fail fast instead of blocking the scheduler thread.
  • Partial API failure: If only one API goes down, we still collect what we can, but spike dampening prevents the sudden weight shift from creating chart artifacts.
  • Railway restarts: The persistent volume survives container restarts. When the app boots, it checks for existing data and picks up where it left off.
  • Corrupt JSONL lines: The reader skips unparseable lines and logs a warning. One bad line doesn't take down the whole dataset — that's the beauty of line-delimited formats.
  • Duplicate schedulers: The Gunicorn --preload + sys.argv detection ensures exactly one scraper thread exists in production.

Dependencies

The whole app runs on five Python packages:

Flask 2.3.2  —  web framework
Werkzeug 2.3.6  —  WSGI utilities (Flask dependency)
Requests 2.31.0  —  HTTP client for API calls
Gunicorn 21.2.0  —  production WSGI server
APScheduler 3.10.4  —  background job scheduling (dev mode)

No NumPy, no Pandas, no heavyweight data libraries. The EMA, RDP, and aggregation math are all hand-written in ~100 lines of plain Python. For a project that processes a few hundred data points, there's no reason to import a 30 MB library.

Questions?

IL9Cast's prediction market aggregation is maintained by Ryan McComb, a student at ETHS, as an educational project to help voters understand real-time sentiment in the Illinois 9th Democratic Primary race.

For questions about the market aggregation methodology or to report data issues, please reach out directly.