Introduction
Demographics are group-level characteristics that can be attached to individuals. Campaigns often look to them to inform predictions of election outcomes, including the likelihood of candidate support and turnout, and to decide which populations to target with their limited resources. As a result, demographic conformity is almost an expected norm: individuals are expected to vote in line with their perceived group interests.
However, are demographics truly predictive of vote choice? A 2024 paper by Seo-young Silvia Kim and Jan Zilinsky found that the accuracy of vote choice predictions is generally low (< 65%) when using demographics alone. In this blog post, I add demographics to my state-level popular vote model from Blog 3 to assess their predictive power. I also use recent polling data to update last week’s combined model for the incumbent party candidate’s national popular vote share.
Examining Demographics for 2024 Voter File Samples
One of the ways in which campaigns measure individual-level demographics is through state voter files. Before applying demographics to my predictive models, I wanted to take a closer look at the 2024 voter file data that Statara graciously provided to our class. Thank you to our excellent TF Matthew Dardet, who initially cleaned these voter files and took a 1% sample from each state’s 2024 voter file. I work with those 1% samples here.
In particular, I examine demographic data from 13 states that have been designated as “Lean/Likely Democrat,” “Lean/Likely Republican,” or “Toss Up” in the 2024 state-level predictions from Cook Political Report and Sabato’s Crystal Ball, the expert models examined last week. I exclude NE-02 and ME-02 because data at the congressional-district level were difficult to obtain. If we treat the states rated “Safe/Solid Democrat” or “Safe/Solid Republican” by these expert models as confident calls for their respective parties, then these 13 states are where the remaining uncertainty about the Electoral College lies.
The descriptive table below reports registered party, gender, and race from those 13 state voter file samples. All values are percentages of each state’s sample.
State | Democrat | Republican | Unaffiliated/Independent | Unregistered | Other Party | Male | Female | Other Gender | African-American | Asian | Non-Hispanic White | Hispanic | Native American | Other Race |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Arizona | 21.13 | 24.68 | 27.78 | 25.80 | 0.61 | 47.60 | 48.90 | 3.49 | 2.18 | 2.40 | 70.29 | 21.06 | 1.15 | 2.92 |
Florida | 23.55 | 26.79 | 23.31 | 26.09 | 0.26 | 48.00 | 50.97 | 1.03 | 12.58 | 2.22 | 61.84 | 18.93 | 0.24 | 4.18 |
Georgia | 0.83 | 0.51 | 75.81 | 22.83 | 0.02 | 47.42 | 51.79 | 0.79 | 29.16 | 3.16 | 52.18 | 5.09 | 0.62 | 9.79 |
Michigan | 0.29 | 0.20 | 79.25 | 20.25 | 0.00 | 49.22 | 50.12 | 0.66 | 11.87 | 2.32 | 80.76 | 2.63 | 0.12 | 2.30 |
Minnesota | 0.47 | 0.29 | 71.54 | 27.69 | 0.01 | 48.76 | 48.34 | 2.91 | 3.36 | 3.71 | 88.05 | 2.13 | 0.28 | 2.47 |
Nevada | 22.61 | 21.04 | 31.68 | 23.97 | 0.70 | 48.27 | 46.68 | 5.05 | 7.22 | 5.77 | 62.73 | 20.39 | 0.18 | 3.71 |
New Hampshire | 21.53 | 20.75 | 27.95 | 29.72 | 0.04 | 47.94 | 49.81 | 2.25 | 0.27 | 1.38 | 94.30 | 1.71 | 0.03 | 2.31 |
New Mexico | 30.46 | 22.62 | 18.69 | 27.04 | 1.19 | 47.70 | 51.47 | 0.84 | 0.68 | 1.19 | 52.02 | 38.52 | 3.80 | 3.79 |
North Carolina | 22.96 | 21.89 | 28.92 | 25.77 | 0.46 | 47.17 | 51.66 | 1.17 | 18.71 | 1.96 | 64.55 | 4.88 | 0.65 | 9.26 |
Pennsylvania | 32.31 | 29.29 | 11.60 | 26.36 | 0.44 | 47.68 | 51.00 | 1.32 | 8.46 | 2.68 | 82.19 | 4.30 | 0.04 | 2.33 |
Texas | 0.43 | 0.51 | 71.01 | 28.04 | 0.02 | 47.37 | 49.45 | 3.18 | 10.16 | 4.33 | 52.37 | 29.87 | 0.07 | 3.20 |
Virginia | 0.94 | 0.59 | 75.50 | 22.94 | 0.03 | 47.51 | 51.56 | 0.93 | 15.47 | 5.48 | 71.07 | 5.16 | 0.04 | 2.78 |
Wisconsin | 0.19 | 0.17 | 87.08 | 12.55 | 0.01 | 46.70 | 50.55 | 2.75 | 4.99 | 1.61 | 88.44 | 3.24 | 0.24 | 1.48 |
The information on registered party is especially interesting. In most of these voter file samples (10 of 13), a majority of individuals are either registered as “Unaffiliated/Independent” or have no registered party. It is important to note, however, that voters in some states register without reference to party, which likely explains the near-zero Democratic and Republican shares in states such as Georgia, Michigan, and Wisconsin. The samples are also fairly balanced between genders. As expected, non-Hispanic White voters make up the largest share of every sample.
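As a quick check on the party-registration observation above, the following sketch counts the states in which unaffiliated and unregistered individuals together exceed half of the sample. The figures are copied from the table above; the variable and function names are hypothetical, and this is an illustration rather than the code used to build the table.

```python
# Shares (%) of each state's 1% voter file sample, copied from the table above:
# (Unaffiliated/Independent, Unregistered)
no_party_shares = {
    "Arizona": (27.78, 25.80),
    "Florida": (23.31, 26.09),
    "Georgia": (75.81, 22.83),
    "Michigan": (79.25, 20.25),
    "Minnesota": (71.54, 27.69),
    "Nevada": (31.68, 23.97),
    "New Hampshire": (27.95, 29.72),
    "New Mexico": (18.69, 27.04),
    "North Carolina": (28.92, 25.77),
    "Pennsylvania": (11.60, 26.36),
    "Texas": (71.01, 28.04),
    "Virginia": (75.50, 22.94),
    "Wisconsin": (87.08, 12.55),
}


def states_with_no_party_majority(shares):
    """Return states where unaffiliated + unregistered exceeds 50% of the sample."""
    return sorted(
        state for state, (unaffiliated, unregistered) in shares.items()
        if unaffiliated + unregistered > 50.0
    )


majority_states = states_with_no_party_majority(no_party_shares)
exceptions = sorted(set(no_party_shares) - set(majority_states))

print(f"{len(majority_states)} of {len(no_party_shares)} states have a no-party majority")
print("Exceptions:", ", ".join(exceptions))
```

The exceptions are the states where registered Democrats and Republicans together still make up at least half of the sample.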
Revisiting Blog 3’s State Sept. Polling Averages Model
To predict state-level outcomes and ultimately the Electoral College, I start by updating my state-level popular vote share model from Blog 3 with new polling data; I now have polling data for the entire month of September.
This is a regression table for the Democratic candidate’s state-level popular vote share and state-level polling averages in September (weighted by weeks left before the election) for the presidential elections from 2000-2020.
Dependent variable: Democrat's State-Level Popular Vote Share
Predictors | Estimates | Std. Error | CI | p |
---|---|---|---|---|
(Intercept) | -0.15 | 1.28 | -2.68 – 2.37 | 0.904 |
sept poll | 1.13 | 0.03 | 1.07 – 1.19 | <0.001 |
Observations | 291 | | | |
R² / R² adjusted | 0.839 / 0.838 | | | |
With an R-squared of 0.84, this table suggests that state-level September polling is a strong predictor of the Democratic candidate’s two-party popular vote share in that state. As in Blog 3, I then use this regression model to predict Vice President Harris’ share of the two-party popular vote at the state level.
State | Prediction | Lower Bound | Upper Bound | Winner |
---|---|---|---|---|
Arizona | 52.69 | 44.63 | 60.75 | Harris |
Florida | 50.92 | 42.86 | 58.98 | Harris |
Georgia | 53.20 | 45.14 | 61.26 | Harris |
Michigan | 53.90 | 45.84 | 61.96 | Harris |
Minnesota | 56.09 | 48.02 | 64.16 | Harris |
Nevada | 53.10 | 45.04 | 61.16 | Harris |
New Hampshire | 57.48 | 49.41 | 65.55 | Harris |
New Mexico | 56.49 | 48.42 | 64.56 | Harris |
North Carolina | 53.15 | 45.09 | 61.22 | Harris |
Pennsylvania | 53.67 | 45.61 | 61.74 | Harris |
Texas | 49.84 | 41.78 | 57.90 | Trump |
Virginia | 56.14 | 48.07 | 64.20 | Harris |
Wisconsin | 54.72 | 46.66 | 62.79 | Harris |
This model predicts that Harris will win 12 of the 13 states, including all 7 key battleground states, while Trump wins only Texas. Of note: the numbers here are quite similar to those predicted a few weeks ago in Blog 3, although with smaller prediction intervals. This indicates that there is less uncertainty, which makes sense given that we are now closer to November 5, 2024. However, many of these races remain contested: the predicted two-party popular vote share for Harris is quite close to 50% (e.g. Nevada at 53.10%, Arizona at 52.69%), and every prediction interval in the table includes 50%. With this model, Harris would carry the Electoral College with 349 electoral votes to former President Trump’s 189.
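One way to make “contested” concrete is to check which prediction intervals include 50%. The sketch below does this with the bounds copied from the table above; the helper name `is_contested` is hypothetical, and the 50% threshold assumes a two-party vote share.

```python
# (prediction, lower bound, upper bound) for Harris' two-party vote share (%),
# copied from the polling-only prediction table above.
polling_only = {
    "Arizona": (52.69, 44.63, 60.75),
    "Florida": (50.92, 42.86, 58.98),
    "Georgia": (53.20, 45.14, 61.26),
    "Michigan": (53.90, 45.84, 61.96),
    "Minnesota": (56.09, 48.02, 64.16),
    "Nevada": (53.10, 45.04, 61.16),
    "New Hampshire": (57.48, 49.41, 65.55),
    "New Mexico": (56.49, 48.42, 64.56),
    "North Carolina": (53.15, 45.09, 61.22),
    "Pennsylvania": (53.67, 45.61, 61.74),
    "Texas": (49.84, 41.78, 57.90),
    "Virginia": (56.14, 48.07, 64.20),
    "Wisconsin": (54.72, 46.66, 62.79),
}


def is_contested(lower, upper, threshold=50.0):
    """True when the prediction interval contains the win threshold."""
    return lower < threshold < upper


harris_states = [s for s, (pred, _, _) in polling_only.items() if pred > 50.0]
contested = [s for s, (_, lower, upper) in polling_only.items() if is_contested(lower, upper)]

print(f"Harris leads in {len(harris_states)} of {len(polling_only)} states")
print(f"{len(contested)} of {len(polling_only)} intervals include 50%")
```

A point prediction above 50% names a winner, but an interval that spans 50% means the model cannot rule out the other candidate in that state.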
Adding Demographics to the State Sept. Polling Averages Model
I now turn to incorporating demographics data — specifically race — into my September polling averages model. To do so, I created a new ordinary least squares (OLS) regression model using past state-level September polling averages and demographics data on race from the U.S. Census for 2000-2020. The regression table is below.
Dependent variable: Democrat's State-Level Popular Vote Share
Predictors | Estimates | Std. Error | CI | p |
---|---|---|---|---|
(Intercept) | -19.86 | 5.92 | -31.51 – -8.21 | 0.001 |
sept poll | 1.09 | 0.03 | 1.03 – 1.15 | <0.001 |
non hispanic white | 0.21 | 0.06 | 0.10 – 0.33 | <0.001 |
african american | 0.18 | 0.06 | 0.06 – 0.30 | 0.004 |
hispanic | 0.50 | 0.11 | 0.29 – 0.71 | <0.001 |
asian | 0.53 | 0.12 | 0.31 – 0.76 | <0.001 |
native american | 0.18 | 0.12 | -0.06 – 0.42 | 0.133 |
Observations | 291 | | | |
R² / R² adjusted | 0.857 / 0.854 | | | |
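Similarly, the R-squared of approximately 0.85 indicates a relatively good fit, though only a slight improvement over the polling-only model (adjusted R-squared of 0.854 versus 0.838). September polling averages remain the strongest predictor of the Democratic candidate’s state-level two-party popular vote share (coefficient of 1.09), while the race coefficients are smaller (0.18 to 0.53) and the Native American coefficient is not statistically significant (p = 0.133).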
I input the race data from the 2024 voter file samples to create state-level predictions of Harris’ two-party popular vote share for the upcoming election.
State | Prediction | Lower Bound | Upper Bound | Winner |
---|---|---|---|---|
Arizona | 58.57 | 50.53 | 66.61 | Harris |
Florida | 55.57 | 47.65 | 63.49 | Harris |
Georgia | 52.28 | 44.58 | 59.98 | Harris |
Michigan | 54.25 | 46.57 | 61.92 | Harris |
Minnesota | 56.93 | 49.23 | 64.63 | Harris |
Nevada | 59.52 | 51.47 | 67.58 | Harris |
New Hampshire | 57.56 | 49.85 | 65.27 | Harris |
New Mexico | 66.63 | 57.78 | 75.48 | Harris |
North Carolina | 52.29 | 44.63 | 59.96 | Harris |
Pennsylvania | 54.74 | 47.06 | 62.43 | Harris |
Texas | 58.65 | 50.20 | 67.10 | Harris |
Virginia | 57.90 | 50.18 | 65.61 | Harris |
Wisconsin | 55.41 | 47.72 | 63.11 | Harris |
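To see how the race terms feed into the predictions in the table above, the sketch below applies the rounded coefficients from the regression table to Arizona and Texas. Two assumptions are worth flagging: the September polling averages are not listed in this post, so the sketch backs them out of the polling-only predictions using that model’s rounded coefficients (-0.15 and 1.13), and the race shares are the voter file percentages from the first table. Because of rounding, the results only approximate the table, and the function names are hypothetical.

```python
# Rounded coefficients from the polling + race regression table above.
INTERCEPT = -19.86
COEFS = {
    "sept_poll": 1.09,
    "non_hispanic_white": 0.21,
    "african_american": 0.18,
    "hispanic": 0.50,
    "asian": 0.53,
    "native_american": 0.18,
}


def implied_sept_poll(polling_only_prediction):
    """ASSUMPTION: invert the polling-only model, prediction = -0.15 + 1.13 * poll."""
    return (polling_only_prediction + 0.15) / 1.13


def predict(features):
    """Linear prediction: intercept plus the sum of coefficient * value."""
    return INTERCEPT + sum(COEFS[name] * value for name, value in features.items())


# Polling-only predictions and voter file race shares (%), copied from the tables above.
states = {
    "Arizona": {"polling_only": 52.69, "non_hispanic_white": 70.29,
                "african_american": 2.18, "hispanic": 21.06,
                "asian": 2.40, "native_american": 1.15},
    "Texas": {"polling_only": 49.84, "non_hispanic_white": 52.37,
              "african_american": 10.16, "hispanic": 29.87,
              "asian": 4.33, "native_american": 0.07},
}

results = {}
for state, row in states.items():
    features = {name: value for name, value in row.items() if name != "polling_only"}
    # Points contributed by the race terms alone, before adding polling.
    race_points = sum(COEFS[name] * value for name, value in features.items())
    features["sept_poll"] = implied_sept_poll(row["polling_only"])
    results[state] = (predict(features), race_points)
    print(f"{state}: prediction about {results[state][0]:.1f}, "
          f"race terms contribute {race_points:.1f} points")
```

Under these assumptions the sketch lands within about half a point of the table for both states. The Hispanic term alone adds roughly 15 points in Texas (0.50 × 29.87), which is one plausible reason the model moves Texas so far toward Harris.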
This model is more optimistic about Harris’ prospects: her predicted two-party vote share is higher than in the previous model in 11 of the 13 states, with Georgia and North Carolina as the exceptions. However, some of these results appear counter-intuitive at face value. For example, this model predicts that Harris will win both Texas and Florida, both of which have been won by the Republican candidate in recent elections, with 58.65% and 55.57% of the two-party vote, respectively. Of course, we have to take into account the wide prediction intervals, but these results lack face validity. Perhaps, then, the demographic variables included in this model overestimate the potential gains for Harris in some of these states. With this model, Harris would carry the Electoral College with 389 electoral votes to Trump’s 149.
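The Electoral College total can be tallied as in the sketch below. The electoral vote counts are the 2024 apportionment for the 13 states. The two baseline constants for the states and districts not modeled here are assumptions: they are backed out of the 389-to-149 total above rather than taken from the expert ratings directly.

```python
# 2024 electoral votes for the 13 modeled states.
ELECTORAL_VOTES = {
    "Arizona": 11, "Florida": 30, "Georgia": 16, "Michigan": 15,
    "Minnesota": 10, "Nevada": 6, "New Hampshire": 4, "New Mexico": 5,
    "North Carolina": 16, "Pennsylvania": 19, "Texas": 40,
    "Virginia": 13, "Wisconsin": 10,
}

# ASSUMPTION: electoral votes from everywhere outside the 13 states,
# backed out of the totals reported in the post (389 and 149).
BASELINE_DEM = 194
BASELINE_REP = 149

# Harris' predicted two-party vote share (%) from the polling + race table above.
polling_plus_race = {
    "Arizona": 58.57, "Florida": 55.57, "Georgia": 52.28, "Michigan": 54.25,
    "Minnesota": 56.93, "Nevada": 59.52, "New Hampshire": 57.56,
    "New Mexico": 66.63, "North Carolina": 52.29, "Pennsylvania": 54.74,
    "Texas": 58.65, "Virginia": 57.90, "Wisconsin": 55.41,
}


def tally(harris_share_by_state):
    """Award each state's electoral votes to whoever is predicted above 50%."""
    dem, rep = BASELINE_DEM, BASELINE_REP
    for state, share in harris_share_by_state.items():
        if share > 50.0:
            dem += ELECTORAL_VOTES[state]
        else:
            rep += ELECTORAL_VOTES[state]
    return dem, rep


dem_total, rep_total = tally(polling_plus_race)
print(f"Harris {dem_total}, Trump {rep_total}")
```

The winner-take-all rule here ignores the district-level allocation in Maine and Nebraska, in keeping with the exclusion of NE-02 and ME-02 earlier in this post.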
Updating Last Week’s National Popular Vote Share Model
Lastly, I also update the incumbent party candidate’s national popular vote share model from Blog 4 with the recent September polling data. The following regression table is copied from last week’s blog.
Dependent variable: National Popular Vote Share for Incumbent Party Candidate
Predictors | Estimates | Std. Error | CI | p |
---|---|---|---|---|
(Intercept) | 31.62 | 4.03 | 22.33 – 40.91 | <0.001 |
GDP growth quarterly | 0.67 | 0.23 | 0.15 – 1.19 | 0.018 |
RDPI growth quarterly | -0.58 | 0.29 | -1.26 – 0.10 | 0.083 |
sept poll | 0.42 | 0.10 | 0.19 – 0.65 | 0.003 |
incumbent | 0.05 | 1.39 | -3.16 – 3.26 | 0.972 |
Observations | 13 | | | |
R² / R² adjusted | 0.876 / 0.815 | | | |
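As a reading aid for the table above, the reported confidence intervals can be approximately reproduced as estimate ± t × standard error. The sketch assumes these are 95% intervals with 13 − 5 = 8 residual degrees of freedom, for which the two-sided critical value is about 2.306. Neither assumption is stated in the table itself, and rounding in the reported figures means the match is only approximate.

```python
# ASSUMPTION: 97.5th percentile of Student's t with 8 degrees of freedom
# (13 observations minus 5 estimated coefficients).
T_CRIT = 2.306

# (estimate, standard error, reported CI lower, reported CI upper), from the table above.
terms = {
    "(Intercept)": (31.62, 4.03, 22.33, 40.91),
    "GDP growth quarterly": (0.67, 0.23, 0.15, 1.19),
    "RDPI growth quarterly": (-0.58, 0.29, -1.26, 0.10),
    "sept poll": (0.42, 0.10, 0.19, 0.65),
    "incumbent": (0.05, 1.39, -3.16, 3.26),
}


def confidence_interval(estimate, std_error, t_crit=T_CRIT):
    """Return (lower, upper) as estimate -/+ t_crit * std_error."""
    half_width = t_crit * std_error
    return estimate - half_width, estimate + half_width


max_gap = 0.0
for name, (estimate, std_error, reported_lower, reported_upper) in terms.items():
    lower, upper = confidence_interval(estimate, std_error)
    # Track the largest disagreement with the reported bounds.
    max_gap = max(max_gap, abs(lower - reported_lower), abs(upper - reported_upper))
    print(f"{name}: [{lower:.2f}, {upper:.2f}] vs reported [{reported_lower}, {reported_upper}]")

print(f"Largest gap from the reported bounds: {max_gap:.3f}")
```

The intervals for RDPI growth and the incumbency indicator include zero, consistent with their p-values above 0.05.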
This model is then used to predict Harris’ expected share of the national two-party popular vote. Similar to last week, the combined model of economic fundamentals and national September polling averages predicts that Harris will win approximately 53.4% of the national two-party popular vote. The numbers here are quite similar to the ones from Blog 4, changing by a few decimal points.
Prediction | Lower Bound | Upper Bound |
---|---|---|
53.44 | 47.49 | 59.40 |
Conclusion
In line with my other blogs, the models in this week’s entry anticipate that Harris will both win the national two-party popular vote and obtain the 270 electoral votes required to win the Electoral College. However, my attempt at incorporating demographics into my state-level predictive model produced unexpected, and seemingly unlikely, election outcomes, which gives me something to improve on in future work. I’m eager to keep developing my predictions for the national popular vote and the Electoral College, and I look forward to learning about campaign-focused factors in the coming weeks.