Postmortem - IL9Cast

The final precinct model landed every candidate's district share within two points and got the finishing order right at the top, though a closer look at the map reveals where it saw the district a good deal more smoothly than the voters actually drew it.

When I built the IL9Cast precinct model the honest constraint was how little there was to anchor it to. The Ninth had not held a genuinely competitive Democratic primary in roughly twenty-five years; the public polling was thin and uneven, a handful of external surveys, several of them already dated by primary day, set beside a larger body of internal numbers whose crosstabs and underlying samples were sparse enough that they had to carry more weight than I would have wanted; and the model itself was built in Stata, which imposed constraints of its own. Forecasting a six-candidate field under those conditions is less a question of precision than of whether the model captures the basic shape of the race, and so the fairer way to grade it, now that the canvass is in, is to ask two separate questions, kept apart on purpose: did it get the district right, and did the precinct-level detail actually add anything beyond what a single district-wide guess would already have told you.

< 2 pts: largest topline miss, all six candidates

+11%: mean-error skill over the no-skill district baseline

+10%: Brier skill on the precinct win-probability calls

72%: precinct winners called, weighted by votes cast

The district result, more or less

On the topline the model did about as well as one could reasonably ask of it. The final simulation, run on the fifth of March and frozen there, put Daniel Biss at 32.4 percent of the six-candidate vote against an actual 31.2, Kat Abughazaleh at 27.1 against 28.0, and Laura Fine at 19.4 against 21.3, with the three minor candidates each missed by under two points as well; every contender, in other words, landed within two points of the forecast, and the order of finish at the top of the field, including the Biss-over-Abughazaleh call that was the only real suspense of the night, came out exactly right, with only the two trailing minor candidates, Andrew and Amiwala, ending a point apart in the reverse of the predicted order. I would not want to oversell this, given that some of the precision is the ordinary good fortune of errors that partly cancel one another out, but a six-way forecast assembled on polling this thin and uneven has no business being this close to the mark, and it was.

Forecast versus actual district vote share, weighted by votes cast. Hollow markers are the forecast, filled markers the result; the gap between them is the miss.
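
The topline comparison is plain arithmetic, and a minimal sketch makes it easy to check. It uses only the three pairs of shares quoted above; the minor candidates are left out because their individual figures are not given in the text, and the function name is illustrative rather than taken from the project's code.

```python
# Forecast and actual district shares (percent) as quoted in the text.
# The three minor candidates are omitted: their figures are not quoted.
forecast = {"Biss": 32.4, "Abughazaleh": 27.1, "Fine": 19.4}
actual = {"Biss": 31.2, "Abughazaleh": 28.0, "Fine": 21.3}

def topline_misses(forecast, actual):
    """Signed miss (forecast minus actual), in points, per candidate."""
    return {name: round(forecast[name] - actual[name], 1) for name in forecast}

misses = topline_misses(forecast, actual)
largest = max(abs(m) for m in misses.values())

# Does the forecast order of finish match the actual order?
order_matches = (sorted(forecast, key=forecast.get, reverse=True)
                 == sorted(actual, key=actual.get, reverse=True))
```

On these three candidates the largest miss is Fine's, at 1.9 points, and the order of finish matches.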

Did the precinct map earn its keep?

A district forecast and a precinct forecast are different animals, though, and the more searching test is whether the map carried real information or merely dressed up the district average in 435 different colours. The natural benchmark is what forecasters sometimes call the no-skill baseline: assign every precinct the district-wide share for each candidate, predict no geographic variation whatsoever, and see how far off that lands you. Measured against that baseline the model comes out modestly but genuinely ahead, cutting the average precinct error by about eleven percent across the six candidates pooled together, and the improvement is concentrated almost entirely where it mattered, with Fine's precinct error falling twenty-four percent, Abughazaleh's seventeen, and Biss's ten. That concentration was partly by design, because the precinct model was built around the crosstabs and demographic profiles of the three candidates anyone believed could win, whose numbers I averaged and leaned on directly, while the minor candidates were never given the same demographic scaffolding; for them the model knew next to nothing the district average did not already say, which is the expected result both of that deliberate focus elsewhere and of how little signal there was to find for them in the first place. The point that survives is that the map was not decoration, but carried real, if modest, geographic information about the candidates who were actually contesting the race.

Reduction in mean precinct error against predicting the district average everywhere. Bars to the right of zero mean the map added real signal.
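
The comparison against the no-skill baseline can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's own code: it assumes "mean precinct error" means an unweighted mean absolute error pooled across candidates, and the record layout (`votes_cast`, `forecast`, `actual`) and the toy numbers are invented for the example.

```python
def vote_weighted_share(precincts, candidate):
    """District share for a candidate: votes won over ballots cast."""
    total = sum(p["votes_cast"] for p in precincts)
    won = sum(p["actual"][candidate] * p["votes_cast"] for p in precincts)
    return won / total

def mean_abs_error(precincts, predict):
    """Average absolute precinct error, pooled over all candidates."""
    errors = [abs(predict(p, c) - p["actual"][c])
              for p in precincts for c in p["actual"]]
    return sum(errors) / len(errors)

def error_skill(precincts):
    """1 - model error / baseline error; positive means the map helped."""
    candidates = list(precincts[0]["actual"])
    district = {c: vote_weighted_share(precincts, c) for c in candidates}
    model_mae = mean_abs_error(precincts, lambda p, c: p["forecast"][c])
    base_mae = mean_abs_error(precincts, lambda p, c: district[c])
    return 1.0 - model_mae / base_mae

# Hypothetical two-precinct, two-candidate district, made up for illustration.
toy = [
    {"votes_cast": 100, "forecast": {"A": 0.55, "B": 0.45},
     "actual": {"A": 0.70, "B": 0.30}},
    {"votes_cast": 300, "forecast": {"A": 0.45, "B": 0.55},
     "actual": {"A": 0.30, "B": 0.70}},
]
skill = error_skill(toy)  # model error 0.15 against baseline 0.20
```

In the toy district the forecast leans the right way in each precinct but not far enough, and still beats the flat baseline by a quarter.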

The model saw the district too smoothly

Where the precinct forecast fell down is visible the moment you plot predicted share against actual share for each candidate, because in place of the diagonal smear you would hope for, what you get is a set of narrow bands: the model placed almost every precinct close to the candidate's district average and badly understated how extreme individual precincts really were. This is a compression problem rather than a bias problem, which is to say the forecast was not systematically wrong about who was strong where so much as it was too timid about the magnitude of that strength, and the consequence is a respectable precinct correlation for the top three, between roughly 0.4 and 0.6, sitting beside a near-zero correlation for Bushra Amiwala, for whom the model genuinely had no usable geographic signal at all. That unevenness was not an accident. For Biss, Abughazaleh, and Fine the model was allowed to move with the precincts, which is the reason their clouds tilt toward the diagonal at all; but for Simmons, Amiwala, and Andrew, where the polling gave me almost nothing to work with, I deliberately smoothed hard toward the district average, on the reasoning that a flat guess near a candidate's overall share is safer than a confident precinct-by-precinct estimate that could be badly wrong in some direction. It is a defensible hedge and also, plainly, not an ideal one, since it throws away whatever local signal those candidates did carry, and if there is one thing I would change first in a second version it is this, because the simulation needs to let precincts wander further from the district baseline than it currently permits, at least once there is enough data to justify the wandering.

Each dot is a precinct, sized by votes cast; the dashed line is a perfect forecast and the solid line is what the model actually did. Pick a candidate and hover a dot to read it.
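
One way to put a number on compression, as distinct from bias, is to compare the spread of the forecast shares with the spread of the actual ones and to fit a line through the cloud. The sketch below is illustrative only: the two diagnostics are a common choice rather than the ones the project necessarily used, and the five precinct shares are invented.

```python
from statistics import mean, pstdev

def spread_ratio(forecast, actual):
    """Forecast spread over actual spread; well below 1 signals compression."""
    return pstdev(forecast) / pstdev(actual)

def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical shares for one candidate in five precincts.
actual = [0.10, 0.20, 0.30, 0.40, 0.50]
forecast = [0.26, 0.28, 0.30, 0.32, 0.34]

ratio = spread_ratio(forecast, actual)   # forecast spread is a fifth of actual
slope = ols_slope(actual, forecast)      # 1.0 would be a perfect forecast
bias = mean(forecast) - mean(actual)     # essentially zero: no average bias
```

The invented forecast ranks every precinct correctly and is right on average, yet moves only a fifth as far from the district mean as the voters did, which is the compression-without-bias pattern described above.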

Were the probabilities honest?

The precinct win probabilities are a separate matter, and a more reassuring one for Abughazaleh than for Fine. A well-calibrated forecast that gives a candidate a sixty percent chance of carrying a precinct should, taken across all the precincts it speaks of that way, see that candidate carry about sixty percent of them. Abughazaleh's probabilities track that diagonal closely, which is most of what you want to see, once you are past the very low end, where even she was handed rather too little chance in a sizable block of precincts she went on to carry. Biss's are noisier, and Fine's are close to inverted at the low end, with a cluster of precincts in which the model gave her only a modest chance but which she went on to carry more often than not, a pattern that follows directly from the same underlying flaw, since a model that compresses everyone toward the centre will hand out too few confident calls and misjudge the tails when it does.

A perfectly calibrated forecast would sit on the dashed line. The thick grey line pools all three contenders; the lone star is the combined minor field, which the model gave a zero percent chance in every precinct yet which still carried nine of them. Each label shows that series' Brier contribution, and the four candidate pieces add up to the overall 0.566. Click any name in the legend to add or remove its line.
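
A calibration curve of this kind is built by binning precincts on the probability they were given and comparing each bin's average forecast with the rate at which the candidate actually carried those precincts. The following is a minimal sketch of that binning, assuming equal-width bins; the bin count, function name and nine example precincts are all hypothetical.

```python
def reliability_table(probs, outcomes, n_bins=5):
    """Bin forecasts by probability and return, for each non-empty bin,
    (mean forecast probability, observed win rate, precinct count)."""
    bins = [[] for _ in range(n_bins)]
    for p, won in zip(probs, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 joins the top bin
        bins[index].append((p, won))
    table = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            rate = sum(won for _, won in members) / len(members)
            table.append((mean_p, rate, len(members)))
    return table

# Hypothetical calls for one candidate: 1 means she carried the precinct.
probs = [0.1, 0.1, 0.1, 0.1, 0.6, 0.6, 0.6, 0.6, 0.6]
outcomes = [0, 1, 1, 1, 1, 1, 1, 0, 0]
table = reliability_table(probs, outcomes)
```

In this made-up case the upper bin sits on the diagonal (60 percent forecast, 60 percent carried) while the low bin is inverted (10 percent forecast, 75 percent carried), the same shape the text describes at Fine's low end.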

A single number summarises all of this: the Brier score, the standard way of grading probabilistic forecasts, where zero is a perfect call sheet and lower is better. Scoring the precinct win probabilities across the four outcomes that mattered (Biss, Abughazaleh, Fine, and everyone else) gives a Brier of 0.566, against 0.631 for the honest climatological baseline of simply quoting each candidate's district-wide win rate in every precinct, so the probabilities scored about ten percent better than knowing nothing precinct-specific at all, which is the same modest, real edge the vote-share errors showed.
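
For readers who want the mechanics, here is a minimal sketch of a multi-category Brier score and the skill figure derived from it. It assumes the convention in which squared errors are summed over the four outcomes within each precinct and then averaged over precincts, which is consistent with the per-candidate pieces adding up to the total; the two example precincts are invented, and only the 0.566 and 0.631 come from the text.

```python
def brier_score(forecasts, winners):
    """Multi-category Brier score: the mean over precincts of the summed
    squared gap between each outcome's probability and what happened."""
    total = 0.0
    for probs, winner in zip(forecasts, winners):
        total += sum((p - (1.0 if outcome == winner else 0.0)) ** 2
                     for outcome, p in probs.items())
    return total / len(forecasts)

def skill_score(model, baseline):
    """Fractional improvement of the model's score over the baseline's."""
    return 1.0 - model / baseline

# Two hypothetical precincts given identical probabilities.
call = {"Biss": 0.5, "Abughazaleh": 0.3, "Fine": 0.2, "Other": 0.0}
toy_brier = brier_score([call, call], ["Biss", "Other"])

# Skill implied by the two scores quoted in the text.
reported_skill = skill_score(0.566, 0.631)  # about 0.10
```

The second toy precinct shows why a zero-percent call is expensive under this score: when the outcome given no chance wins, that single term contributes a full 1.0.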

The geography of the misses, and McHenry

Mapping the errors makes the one real structural failure jump out. Across most of the district the misses are the ordinary salt-and-pepper of precinct noise, but McHenry County, the exurban western end, is a block of solid red on the Abughazaleh error map, where she ran on the order of sixteen points ahead of what the model expected and the model called fewer than a third of the precincts correctly, against roughly three-quarters in Chicago and in Lake. The regional breakdown tells the same story from a different angle, since outside Chicago the model leaned consistently toward Mike Simmons and away from Abughazaleh, and in McHenry that lean hardened into a genuine blind spot, almost certainly a turnout or demographic-weighting problem specific to that end of the district rather than anything wrong with the model's basic structure. It is the first place I would look in a rebuild, because a sixteen-point county-level miss is the kind of thing that ought to be fixable once you know to go looking for it.

Hover any precinct to read the miss. Toggle between Abughazaleh's signed error and the total combined error across all six candidates.

Mean forecast error by region and candidate. Warm cells mean the model ran a candidate too high there; cool cells mean it ran them too low.

What the model could not see

The geography is really a stand-in for demography, and joining the precinct errors to census and turnout data makes the blind spot concrete. The model's miss on Abughazaleh tracks two variables more than any others: the median age of a precinct and its Hispanic share. She ran well ahead of the forecast in younger and more Hispanic precincts and behind it in older ones, which is to say the model under-weighted exactly the young, diverse coalition that a candidate like her was always likeliest to assemble, and which the thin crosstabs in the polling gave it almost no way to anticipate. The correlations are modest, as everything at the precinct level is, but they point in a single coherent direction, and they line up precisely with the McHenry miss, since the exurban west is the older, whiter, more competitive end of the district where the model was least accurate of all.

Correlation between each precinct characteristic and the model's error. Longer bars mean the characteristic tracks more closely with where the forecast went wrong, though even the strongest of these explains only a modest share of it.

Each dot is a precinct; the line is the trend. Above zero the model forecast Abughazaleh too high; below zero she beat the forecast.
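
The bars and the trend line rest on an ordinary Pearson correlation between a precinct characteristic and the signed forecast error. A minimal sketch follows, assuming an unweighted correlation and the sign convention in the caption (positive error means the forecast was too high); the five precincts are invented and far tidier than real precinct data, where the text reports only modest correlations.

```python
from math import sqrt
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical precincts: median age, and forecast minus actual share
# for one candidate (negative means she beat the forecast there).
median_age = [28, 35, 42, 50, 61]
signed_error = [-0.08, -0.03, 0.00, 0.02, 0.06]

r_age = pearson(median_age, signed_error)
```

A positive value here would mean the forecast ran too high in older precincts and too low in younger ones.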

Where the winner calls landed

Pulling all of that together, the model's 435 precinct winner calls flowed mostly, though not entirely, to where they were meant to. It handed Biss a plurality in 346 precincts and Abughazaleh in 72, whereas the actual count was 215 for Biss and 131 for Abughazaleh, with Fine carrying 80, so a sizable band of precincts the model gave to Biss went in the event to Abughazaleh or to Fine, which is the visual signature of that same too-smooth map crediting the front-runner with a little more than the neighbourhoods owed him. Counted straight the model got 63 percent of precinct winners right, and weighting by the number of votes actually cast it got 72 percent, against 49 and 45 percent respectively for the naive baseline of simply calling Biss everywhere. Put as error rates, that is 37 percent wrong against 51 on the straight count and 28 against 55 on the vote-weighted one, so the map cut the dumb model's error by about a quarter in the first case and roughly halved it in the second, which is about where I would expect a first honest attempt to come down.

Each ribbon traces precincts from the winner the model forecast (left) to the winner they actually elected (right). Hover a ribbon to read the count.
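
The straight and vote-weighted accuracy figures differ only in how each precinct is counted. The sketch below shows both, along with a check on the one baseline that can be reproduced from the counts in the text; the function name and the three example precincts are illustrative assumptions.

```python
def call_accuracy(called, actual, votes=None):
    """Share of precinct winners called correctly. With `votes`, each
    precinct counts in proportion to the ballots cast there."""
    weights = votes if votes is not None else [1] * len(called)
    right = sum(w for c, a, w in zip(called, actual, weights) if c == a)
    return right / sum(weights)

# Hypothetical precincts: the one miss falls in the smallest precinct.
called = ["Biss", "Biss", "Abughazaleh"]
actual = ["Biss", "Fine", "Abughazaleh"]
votes = [400, 100, 500]

straight = call_accuracy(called, actual)          # 2 of 3 precincts
weighted = call_accuracy(called, actual, votes)   # 900 of 1,000 votes

# From the text: Biss carried 215 of the 435 matched precincts, so calling
# him everywhere is right in about 49 percent of them on a straight count.
biss_everywhere = 215 / 435
precincts_accounted = 215 + 131 + 80 + 9  # Biss, Abughazaleh, Fine, minor field
```

Weighted accuracy rises above the straight figure whenever the misses are concentrated in smaller precincts, as they are in the toy example.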

The verdict

So, was it a good forecast? On the district question the answer is an unqualified yes, and on the precinct question it is a qualified one. The model beat the fair baseline, it captured the progressive-versus-establishment geography that the race was actually about, and it did so on polling thin and uneven enough that anchoring to it was a strain, which is genuinely hard and which I am glad to have gotten broadly right. Where it fell short was on magnitude and on McHenry, both of them the failures of a model that was too cautious rather than one that was confused, and both of them the sort of thing a second version can address directly, by loosening the precinct variance and rethinking how the western counties are weighted. For a first competitive-primary forecast in a generation, built mostly in the dark, I will take it.

How this was measured: the forecast is the final precinct simulation of the fifth of March, frozen there, set against the official county canvass of the primary of the seventeenth of March. Of 436 precincts, 435 matched cleanly between the two sources. All shares are computed over the six candidates who appear in both files, and the no-skill baseline assigns each precinct the vote-weighted district share. The underlying figures and code live on GitHub.
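
The matching step amounts to an inner join on a precinct identifier, keeping a list of whatever fails to pair up. A minimal sketch, assuming both sources can be keyed by a shared precinct id; the ids and values below are made up and do not reflect the real files.

```python
def match_precincts(forecast, canvass):
    """Keep precincts present in both sources and list those that are not."""
    matched = {pid: (forecast[pid], canvass[pid])
               for pid in forecast if pid in canvass}
    unmatched = sorted(set(forecast) ^ set(canvass))
    return matched, unmatched

# Hypothetical shares for one candidate, keyed by precinct id.
forecast_rows = {"001": 0.31, "002": 0.28, "003": 0.35}
canvass_rows = {"001": 0.30, "002": 0.33, "004": 0.29}

matched, unmatched = match_precincts(forecast_rows, canvass_rows)
```

Reporting the unmatched ids alongside the matched set makes it easy to see which precincts were dropped from the scoring and why.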

How the Forecast Did