# Some New Realizations on Oooooold Material

For my TA appointment for the Masters in Applied Sciences program, I am having to generate some lectures on statistical theory. Because of some interest across the program in improving the statistics portion of the curriculum, I wrote about two weeks of lectures and assignments on spatial statistics, building from the **Kolmogorov Axioms** and fundamental results in elementary probability theory all the way up through spatial regression.

I thought about providing a start from a subjectivist standpoint, considering my later statistical writing starts getting into ideas about prior beliefs and plausibility, but I think Kolmogorov works pretty much anywhere and for anyone, until you start really getting a solid understanding of where the Bayesians part ways from Bayes’s theorem. And, considering Bayes’s theorem received a two-sentence explanation in the first lecture of the course, it’s unlikely that people will really hang onto a thirst for the more interesting aspects of Bayesian theory as it pertains to spatial statistics.

But, I thought I’d share some realizations on the nature of common spatial autocorrelation statistics from my lecture notes. Namely, I’d never even *heard* of the generalized crossproduct statistic, or
the fact that it’s essentially the driver for both Moran’s I and Geary’s C, two fundamentally important spatial autocorrelation measures.

I guess this really speaks to why it might be a good idea to have a strong, formal course offering in theoretical spatial statistics.

It took the Mathematical Statistics course from over in Industrial Engineering to really get my head in the game about how the weights matrix is just a precomputation of an indicator function for the joint probability distributions at each polygon. We learn in geography that it’s this piece of data floating around, representing the connectivity structure of our data. But, really, it’s definable by a simple indicator function used *any* time statisticians need to define joint distributions with incompatible or varying domains.

That was just never on my radar, cause I’ve never had a super-theoretical, high-level mathematical course on spatial statistics.

What I wouldn’t give for one now, though. If, as a matter of curriculum, we moved to requiring a strong mathematical-statistics course (ideally taught out of my one true love, Casella & Berger ;)) and then, after it, taught mathematical spatial stats, where we derived and proved things like Moran’s I, Geary’s C, LISA statistics, and spatial regression techniques, that would be beautiful. Just for fun, when I figured this out, I sketched a proof of statistical sufficiency for Moran’s I, something I’d never even seen discussed in a course.

Maybe not a fault of the curriculum being *bad* at a university I’ve thoroughly enjoyed attending. But, maybe a slight incompatibility between me and its thematic focus. As it stands, I guess these kinds of realizations are part of the “personal development” I’m supposed to take on as an enterprising PhD student.

What follows is just a short selection of what I realized throughout the course of writing my lecture notes.

## Join Counts and the $C$ Statistic

One of the simplest measures of spatial autocorrelation is the join count statistic. The join count statistic codes the data into a binary classification, then counts the number of similar-group joins and different-group joins. Then, the number of inter-class joins can be used as a test statistic for spatial autocorrelation.

In fact, the number of inter-color joins is related to a different kind
of statistic known as the “cross product statistic.” Indeed, *all* major
techniques for calculating spatial autocorrelation in lattices use some
manner or manipulation of the cross product statistic.

To calculate a crossproduct statistic, there must be some index of difference between observations, $Y_{i,j}$, and some connectivity matrix $\mathbf{W}$ constructed, where each $W_{i,j}$ is either 1, if observations are considered neighboring, or 0 if they are not neighboring. Then, the crossproduct statistic, $CR$, is calculated by:

$$ CR = \sum_i \sum_j W_{ij} Y_{ij} $$

Join count statistics are constructed when $Y_{ij}$ returns a 1 when a join is between different elements and 0 when it is between similar elements. For data that is *not* binary, we typically set $Y_{ij} = (x_i - x_j)^2$.
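As a quick sketch of the binary case, here is the crossproduct statistic on a small made-up lattice (the connectivity matrix and the 0/1 coding below are purely illustrative):

```python
import numpy as np

# Hypothetical 4-polygon lattice: a symmetric 0/1 connectivity matrix W
# with joins (0,1), (0,2), (1,3), (2,3).
W = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]])

# Binary classification of the four polygons.
x = np.array([1, 0, 1, 0])

# Y_ij = 1 when a join connects *different* classes, 0 otherwise.
Y = (x[:, None] != x[None, :]).astype(int)

# The crossproduct statistic CR = sum_i sum_j W_ij * Y_ij.
# With a symmetric W, each different-class join is counted twice,
# once per direction.
CR = (W * Y).sum()
print(CR)
```

On this toy lattice there are two different-class joins, so the double-counted statistic comes out to 4.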

## Moran and Geary

Using insights related to the crossproduct statistic, two other families of
methods for assessing global spatial autocorrelation have been developed. First, these methods recognize the importance of the *total* connectivity of the weights matrix. This connectivity constrains the possible outcomes of a spatial autocorrelation measure: in a perfectly connected network, where every element is connected to every other element, no pattern of values can register as autocorrelated, because *all* observations neighbor all other observations.

So, two statisticians, R. Geary and A. Moran, developed two other measures of spatial autocorrelation. First, Geary’s C:

$$ C = \frac{ (N - 1) \sum_i \sum_j w_{ij} (x_i - x_j)^2 }{ 2 W \sum_i (x_i - \bar{x})^2 }$$

Where $N$ is the total number of spatial units, $W$ is the sum of the entries of the
connectivity matrix $\mathbf{W}$, $w_{ij}$ is the spatial weights matrix entry between units $i$ and $j$, $x_i$ and $x_j$ are observations, and
$\bar{x}$ is the average of all observations. This statistic ranges
between 0 and 2, with values greater than 1 indicating *negative* spatial
autocorrelation, and values between 0 and 1 indicating positive spatial
autocorrelation. A value close to 1 indicates no spatial autocorrelation.
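As a minimal sketch of the formula above (the chain graph and the clustered values below are invented for illustration):

```python
import numpy as np

def gearys_c(x, W):
    """Geary's C: ((N - 1) * sum_ij w_ij (x_i - x_j)^2)
    divided by (2 * W_sum * sum_i (x_i - xbar)^2)."""
    x = np.asarray(x, dtype=float)
    N = x.size
    diff2 = (x[:, None] - x[None, :]) ** 2   # all pairwise squared differences
    num = (N - 1) * (W * diff2).sum()        # keep only neighboring pairs
    den = 2 * W.sum() * ((x - x.mean()) ** 2).sum()
    return num / den

# A 4-unit chain (0-1-2-3) with a clustered pattern of values.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
x = [1, 1, 0, 0]
print(gearys_c(x, W))
```

For this clustered pattern the statistic comes out below 1, consistent with positive spatial autocorrelation.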

This measure works well, but is sometimes more sensitive to local autocorrelation than is desirable for a statistic used across the entire problem frame. Therefore, we typically use Moran’s I to calculate global autocorrelation:

$$ I = \frac{N}{W} \cdot \frac{\sum_i \sum_j w_{ij} (x_i -\bar{x})(x_j - \bar{x})} {\sum_i (x_i - \bar{x})^2} $$

Where all symbols are the same as above. Moran’s I ranges from -1 to 1, with negative values indicating negative spatial autocorrelation and positive values indicating positive spatial autocorrelation. However, the expected value of Moran’s I is actually:

$$ \frac{-1} {N-1}$$

which, critically, is *not zero*. This means that the expected value of this
statistic in cases of no spatial autocorrelation is *slightly negative*.
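A corresponding sketch for Moran's $I$, on the same kind of invented chain data as above:

```python
import numpy as np

def morans_i(x, W):
    """Moran's I: (N / W_sum) * sum_ij w_ij z_i z_j / sum_i z_i^2,
    where z_i = x_i - xbar."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()                    # deviations from the sample mean
    num = (W * np.outer(z, z)).sum()    # crossproduct restricted to neighbors
    return (x.size / W.sum()) * num / (z ** 2).sum()

# Same 4-unit chain: clustered values give a positive I, while the
# null expectation here is -1/(N-1) = -1/3, not zero.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
x = [1, 1, 0, 0]
print(morans_i(x, W))
```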

## The Point Where I Really Grok It

More importantly, though, examine what these statistics are doing intuitively. Put in the language of “sums of squared deviations” from regression literatures, we have that Geary’s $C$ statistic is finding some sum of squared deviations between pairs of observations ($i, j$) and standardizing it by the sum of squared deviations from the mean. Let’s call the sum of squared deviations from the mean $S_{x\bar{x}}$.

But, note that, for any pair of observations $(i,j)$, we only
include their information if $i$ and $j$ are neighbors. That is, the sum of
squared deviations between pairs $(i,j)$ is multiplied by the corresponding entry in the spatial weights matrix $\mathbf{W}$. If $(i,j)$ are neighbors, we keep their squared difference. If they are not neighbors, we drop them from the sum of squared deviations. In this way, let us call the term $\sum_i \sum_j W_{ij} (x_i - x_j)^2$ the *sum of squared deviations between neighbors*. This is the same as $CR$!

We could also think of this as the “sum of squared spatial lags” rather than the “sum of squared deviations between neighbors”, but we’ll call it $S_{CR}$ here to reflect its origin *and* the fact that it’s a sum of squared deviations statistic. Thus, we can rephrase Geary’s C as:

$$C = \frac{N - 1}{2W} \frac{S_{CR}}{S_{x\bar{x}}} $$

Then, we can interpret the first term as capturing or expressing the relationship between the degrees of freedom of our study area and its total spatial connectivity. The second term, then, reflects the relationship between deviations between neighbors and deviations in the data overall. Put this way, the statistic is more understandable.
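This decomposition can be checked numerically. A small sketch, again with invented chain data, computing $S_{CR}$ and $S_{x\bar{x}}$ separately and assembling $C$ from the two pieces:

```python
import numpy as np

# Toy chain data (illustrative only).
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
x = np.array([1.0, 1.0, 0.0, 0.0])
N = x.size

# Sum of squared deviations *between neighbors* (the crossproduct CR).
S_CR = (W * (x[:, None] - x[None, :]) ** 2).sum()

# Sum of squared deviations from the mean.
S_xx = ((x - x.mean()) ** 2).sum()

# Geary's C assembled from the two sums-of-squares terms.
C = (N - 1) / (2 * W.sum()) * S_CR / S_xx
print(C)
```

The result matches the direct form of Geary's $C$ on the same data, which is the whole point of the rephrasing.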

Then, Moran’s I can be expressed similarly, but not identically. Recall the
crossproduct statistic $CR$. It compares observations to their neighbors by taking the sum of squared differences between an observation $x_i$ and all of its neighbors $x_j$, and then adds this value for *all* $x_i$ in the sample frame. For Moran’s $I$, we are computing something similar, but not exactly the same. We are still computing a crossproduct over $x_i$ and its neighbors, but the deviations involved are *relative to the sample mean*!

This is an important difference. For Geary’s $C$ statistic, we are
comparing neighbors directly using $S_{CR}$. For the $I$ statistic, we are comparing neighbors indirectly, through their relationship to the global mean. Let’s call this comparison of neighbors *through* the mean of observations $S_{\bar{CR}}$. Keeping this in mind, we can express Moran’s $I$ as:

$$ I = \frac{N}{W} \frac{S_{\bar{CR}}}{S_{x\bar{x}}} $$

This difference is slight but important. Recall that Geary’s $C$ is *more*
sensitive to local autocorrelation than is desirable for a global measure? That is a *clear result* of its use of direct comparisons between neighboring observations, where Moran’s $I$ instead uses the deviations of neighbors from the mean.
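To make the contrast concrete, here is a sketch computing both $S_{CR}$ and $S_{\bar{CR}}$ on one invented dataset, and assembling both statistics from them:

```python
import numpy as np

# Toy chain data (illustrative only).
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
x = np.array([1.0, 1.0, 0.0, 0.0])
N = x.size
z = x - x.mean()

S_CR = (W * (x[:, None] - x[None, :]) ** 2).sum()  # direct neighbor comparison
S_CRbar = (W * np.outer(z, z)).sum()               # comparison *through* the mean
S_xx = (z ** 2).sum()

C = (N - 1) / (2 * W.sum()) * S_CR / S_xx    # Geary's C
I = (N / W.sum()) * S_CRbar / S_xx           # Moran's I
print(C, I)
```

Both statistics share the denominator $S_{x\bar{x}}$; only the neighbor-comparison term in the numerator differs.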

## Conclusions

I guess, when you learn how statistics work in a purely-applied context, you fail to see the neat trends of statistical reasoning building on top of one another as the literature progresses. I bet that both Geary and Moran were aware of the use of the general crossproduct statistic (or the theory behind it, which is much broader than geographic research alone), and then constructed their respective measures from there.

In my lecture notes, I go on to explain LISA statistics through discussing the local Moran’s $I$ statistic from Anselin (1995). But, I pretty much understood the logic behind LISA statistics, given that I could abstract away the original Moran’s I as a given, inherited piece of knowledge.

I really wish I had the opportunity to take a high-level, PhD-level course on this stuff. But, with figuring out what applications to do as my advisor splits from ASU, and with this TA-ship sucking up all of my oxygen (and doing so even before the point where I kinda started functioning as a full statistics instructor), I can’t even say that I’d have the time to chase up these leads until I get a few things off of my plate.

*imported from:* yetanothergeographer