A Short Realization on Gelman-King (1994)

Elections, Bayes, and one realization about the Gelman-King Model

People make a lot of hay out of Nate Silver and Bayesian poll averaging when discussing the rise of data-driven electoral prediction and analysis. These methods are pretty neat. But they’re based on very old understandings of statistics which, in the right light, seem quite intuitive.

The Gelman-King (1994) model

Say, for instance, we’re examining some vector of electoral outcomes, (y) given some predictor matrix (\mathbf{X}). Depending on the strength and significance of the predictors recorded in (\mathbf{X}), we may have strongly-related and weakly-related factors. Let’s call a vector of these strengths (\beta). Because we’re charitable, we allow our prediction to be a little bit wrong, so we include some unknown (or stochastic) error term (\epsilon).

But, elections are somewhat deterministic: an electoral system, when given full information about voters, will transform them into an electoral result. However, we don’t have perfect information, so let’s model that bit of information we don’t know as (\gamma). 

Thus, our model for how the electoral system outcome will look after all the votes are cast looks something like:

$$ y = \mathbf{X}\beta + \gamma + \epsilon$$

This is the Gelman & King (1994) model of electoral outcomes. What was novel in this model formulation for 1994 was how it walked the thin (and often blurry) line of observed data and the theoretical constructs that data supposedly measures. In essence, this distinction between “random error” and “structural error” allows Gelman and King to hop between different “states of information” in their understanding of the electoral system, which is quite novel. 
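A minimal simulation of this decomposition may help make it concrete. The dimensions, predictor roles, and noise scales below are my own illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n_districts, n_predictors = 100, 3

# Predictor matrix X: things like incumbency, past vote share, demographics
X = rng.normal(size=(n_districts, n_predictors))
beta = np.array([0.5, -0.2, 0.1])  # the "strengths" of each predictor

# Structural error gamma: stable but unobserved features of each district
gamma = rng.normal(scale=0.05, size=n_districts)
# Stochastic error epsilon: election-day noise
epsilon = rng.normal(scale=0.02, size=n_districts)

# The decomposition: y = X @ beta + gamma + epsilon
y = X @ beta + gamma + epsilon
```

Keeping (\gamma) separate from (\epsilon) is exactly what lets you later condition on the structural part, or integrate it out, to move between those “states of information.”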

Bayes Estimators?

But, getting to the connection with Bayesian statistics, let’s consider what happens when our structural error, (\gamma), significantly affects our model.

Bayesian statistics has a few central realizations, and there are many many good discussions of Bayes’ Theorem and the differences between frequentist and subjectivist statistics[^1]. But, let’s instead focus on an important realization that ties Bayesian prediction back to the “normal” world of statistics.

Say we have some well-derived statistical estimator for (y), (\hat{y})[^2]. We could then use (\hat{y}), if we only observe (\mathbf{X}), to predict what (y) will be or would be, given different circumstances. This kind of analysis is typically called “predictive” or “counterfactual,” respectively, and plays a big part in strategic planning of campaigns. 
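As a sketch of that predictive/counterfactual workflow, here is an ordinary least-squares plug-in estimator on simulated data; the data and the counterfactual shift are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 2
X = rng.normal(size=(n, k))
beta_true = np.array([0.4, -0.3])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Least-squares estimate of beta; X @ beta_hat is then our estimator y_hat
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat

# Counterfactual: same fitted model, hypothetical predictors X_hyp
X_hyp = X.copy()
X_hyp[:, 0] += 1.0  # e.g., shift every district on the first predictor
y_counterfactual = X_hyp @ beta_hat
```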

But, let’s say you think you know something about how the system behaves that takes into account different information than what we’ve observed. That is, before you really analyze the data and generate potential estimates of the election outcomes, you have some belief ((b)) about how the electoral system will go. Could that be used?

If you had to combine both pieces of information into one estimator, you could weight the relative importance of both sets of information and include them in your predictions. Given some weight (w) between zero and one, you could have a convex combination of the two pieces of information that shows your relative confidence in the estimator and your beliefs:

$$ \hat{\theta} = w b + (1 - w)\hat{y}$$

Typically, when that prior belief (b) is well-behaved and (\hat{y}) has certain properties, we call (\hat{\theta}) a Bayes Estimator.
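That convex combination is simple enough to write down directly; the vote shares and weight below are hypothetical:

```python
def bayes_combine(b, y_hat, w):
    """Weight a prior belief b against an estimator y_hat, with w in [0, 1]."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return w * b + (1 - w) * y_hat

# Example: you believe the incumbent gets 52%, the model says 48%,
# and you put only a quarter of the weight on your own belief.
theta_hat = bayes_combine(0.52, 0.48, w=0.25)  # leans toward the model
```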

Getting back to Gelman

So, where does Gelman & King’s electoral model come into this?

Given that we’ve estimated the model outlined above, we can find some new, hypothetical electoral outcome using the equation:

$$ P(v^{hyp} | v) = N\left(v^{hyp} \mid \lambda v + (X^{hyp} - \lambda X)\hat{\beta} + \delta^{hyp},\; (1 - \lambda^2)\sigma^2 I + (X^{hyp} - \lambda X)\Sigma_\beta (X^{hyp} - \lambda X)' \right)$$

That is, some new voting result (v^{hyp}), given an observed voting outcome (v), is distributed over a normal distribution with the mean and variance given above.
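The predictive distribution can also be read as a simulation recipe. The sketch below draws one hypothetical outcome from it, with (\lambda), (\sigma^2), (\Sigma_\beta), and all the data set to made-up values for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 50, 2

X = rng.normal(size=(n, k))
X_hyp = X.copy()
X_hyp[:, 0] += 0.5  # hypothetical change in the predictors

beta_hat = np.array([0.4, -0.3])
Sigma_beta = 0.01 * np.eye(k)  # assumed covariance of beta_hat
lam, sigma2 = 0.7, 0.04        # assumed lambda and sigma^2
delta_hyp = np.zeros(n)        # no hypothesized uniform swing
v = X @ beta_hat + rng.normal(scale=0.2, size=n)  # "observed" outcomes

# Mean and covariance taken term-by-term from the predictive distribution
D = X_hyp - lam * X
mean = lam * v + D @ beta_hat + delta_hyp
cov = (1 - lam**2) * sigma2 * np.eye(n) + D @ Sigma_beta @ D.T

v_hyp = rng.multivariate_normal(mean, cov)  # one simulated hypothetical election
```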

But, what if we have a really good idea of what (v) is? Like, so good that it already happened? Could we make this something like a Bayes Estimator?

The short answer is yes.

See, Gelman & King’s model is really quite useful for predicting hypothetical outcomes. It allows our prior belief (b) to be based on actual information that we observe, like (v). Thus, given that we can actually observe voting outcomes, call them (v_o), our expectation of electoral outcomes becomes:

$$ E(v^{hyp} | v_o ) = \lambda v_o + (1 - \lambda) X \beta$$

where (\lambda) is just a meaningful version of the (w) presented above, based on the variance of the observed process.
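Numerically, the shrinkage looks like this (the vote shares and (\lambda) value are invented):

```python
import numpy as np

lam = 0.8                        # high lambda: the observed outcome dominates
v_o = np.array([0.55, 0.47])     # observed vote shares in two districts
X_beta = np.array([0.50, 0.50])  # structural prediction X @ beta

# E(v_hyp | v_o): shrink the observation toward the structural prediction
expected = lam * v_o + (1 - lam) * X_beta
```

With (\lambda = 0.8), a district observed at 55% has a hypothetical expectation of 54%: the structural model pulls it only slightly back toward 50%.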

So what?

Setting aside other applied perambulations that occasionally implicate the “largely-overblown” conflict between Frequentism and Bayesianism discussed earlier, it’s clear that there are intuitive connections to Bayesian perspectives that simply have not been explored in most treatments of probability. Here, in electoral studies, we have a wealth of historical knowledge, quantified in high-quality data, that can inform reasonable prior expectations.

Statistical analyses that fail to take these priors into account can’t really leverage the value that Bayesian methods provide in making prior expectations useful.

If you’re studying a phenomenon in social science with a long literature, you might want to check out Bayesian methods in your field or subdomain.

[^1]: As an aside, I think these conflicts over frequentist v. subjectivist statistical interpretations are among the most edited & flamed topics on Wikipedia.

[^2]: Usually (and I mean almost always), (\hat{y}) is a Maximum Likelihood estimator, but the choice of estimator is not critically important here.

imported from: yetanothergeographer