Stressing the Local

Today, we’re going to look at local spatial autocorrelation. Like a kind of outlier diagnostic, local spatial autocorrelation measures how the local structure of a spatial relationship around each site either conforms to, or departs from, what we expect. Together, local spatial statistics form a general branch of statistics that aims to characterize the relationship between a single observation and the sites surrounding it.

Often, local spatial autocorrelation is contrasted with global spatial autocorrelation, which is the structural relationship between sites (in the abstract) and their surroundings (again, in the abstract). This structure may look strongly different depending on the idea of what “surrounds” each observation. Thus, local statistics are an attempt at measuring the geographical behavior of a given social, physical, or behavioral process around each observation.

library(sf)
library(mosaic)
bristol = sf::st_read('./data/bristol-imd.shp')
## Reading layer `bristol-imd' from data source `/home/lw17329/OneDrive/teaching/second_year_methods/spatial-practicals/data/bristol-imd.shp' using driver `ESRI Shapefile'
## Simple feature collection with 263 features and 12 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 350239.4 ymin: 166638.9 xmax: 364618.1 ymax: 183052.8
## epsg (SRID):    27700
## proj4string:    +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +towgs84=446.448,-125.157,542.06,0.15,0.247,0.842,-20.489 +units=m +no_defs

First, Correlation

First, though, let’s refresh our understanding of plain, a-spatial correlation, and how our intuition about outliers works in two dimensions for a standard relationship between two variables. Here, let’s look at the relationship between housing deprivation and crime in Bristol:

correlation = cor.test(housing ~ crime, data=bristol)
correlation
## 
##  Pearson's product-moment correlation
## 
## data:  housing and crime
## t = 7.8199, df = 261, p-value = 1.309e-13
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.3322359 0.5287749
## sample estimates:
##       cor 
## 0.4356841

While there is some correlation between these two scores (areas with housing deprivation tend to have higher crime), we can also see that not every observation agrees with this trend. Namely, we can highlight one such observation with really low crime rates, but relatively high housing deprivation:

plot(housing ~ crime, data=bristol, pch=19)
abline(lm(housing ~ crime, data=bristol), 
       col='orangered', lwd=2)
abline(v=mean(bristol$crime), lwd=1.5, col='slategrey', lty='dashed')
abline(h=mean(bristol$housing), lwd=1.5, col='slategrey', lty='dashed')
housing_outlier = bristol[bristol$LSOA11CD == 'E01014714',]
points(housing_outlier$crime, housing_outlier$housing, col='red', pch=20)

Exercise: Thinking about outliers

  1. There are about 4 values with our possible outlier’s level of housing deprivation.
  • Approximately (using your eyeballs, not R), what’s the mean of those four observations’ crime levels?
  • Is this mean substantially different from the value at our potential outlier?
  2. There are approximately three observations with the same crime levels as our candidate outlier.
  • Approximately (using your eyeballs, not R), what’s the mean of those three observations’ housing deprivation?
  • Is this mean substantially different from the value at our potential outlier?
  3. Make a map of Bristol housing and crime values. Highlight the potential outlier (don’t forget, it’s named housing_outlier) in white. Describe the difference between the two maps around the housing_outlier. Where is the housing_outlier?

Challenge: Statistics as noise & distance

Humans do a few things when they “see” outliers in a scatterplot. Mainly, our intuition about which observations are (visual) outliers comes from the distance between an observation and the center of the data being analyzed. This is why we are apt to think that the red observation is an outlier, but the blue one may not be.

Fundamentally, this is because the data at our outlier site, \((x_i, y_i)\), is still very far away from the center of the point cloud, \((\bar x, \bar y)\), despite the variability and co-variability in the data. The co-variability matters greatly for how easy it is to detect outliers. For instance, it’s much easier for you to just “see” an outlier in this data:

Than in this data:

even though those two plots have means in about the same location. The strength of covariation matters greatly to our perception of what data are outliers in 2 dimensions, and this holds in higher dimensions as well (though it becomes incredibly difficult to see).
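To see the difference for yourself, here is a small, hypothetical simulation (the variable names and parameters are ours, not part of the practical) that draws two point clouds with the same center: one with strong covariation, one with almost none. An unusual point stands out much more readily against the tightly-covarying cloud:

# two clouds with the same center, but different strengths of covariation
set.seed(112)
x = rnorm(200)
strong_y = .9*x + rnorm(200, sd=.3)  # y closely follows x
weak_y = .1*x + rnorm(200, sd=1)     # y is mostly noise
par(mfrow=c(1,2))
plot(x, strong_y, pch=20, main='strong covariation')
plot(x, weak_y, pch=20, main='weak covariation')
par(mfrow=c(1,1))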

One way that we can mathematically model this idea of extreme values (those that are unlike the other values), using the means and covariance in our problem, is to use another kind of variance-weighted distance. Recently, we worked with Cohen’s \(d\), a univariate measure of the distance between two samples, standardized by the variance of the samples. Another measure of difference for two- (or more) dimensional data is the Mahalanobis distance. This is a statistical distance, defined as a weighted difference between observations and the “center” of the point cloud. The weighting is a function of the covariation between variables, and the “center” is the mean of each variable. The Mahalanobis distance itself measures how far out of the point cloud a given point is.
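In symbols, if \(\mathbf{z}_i = (x_i, y_i)\) is the data at site \(i\), \(\bar{\mathbf{z}} = (\bar{x}, \bar{y})\) is the center of the point cloud, and \(S\) is the sample covariance matrix of the two variables, then the squared Mahalanobis distance is: \[ D_i^2 = (\mathbf{z}_i - \bar{\mathbf{z}})^\top S^{-1} (\mathbf{z}_i - \bar{\mathbf{z}}) \] Note that R’s mahalanobis() function returns this squared form, which is what we will compute below.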

  1. Another statistic we work with (first shown in Year 1) is also a variance-standardized distance. Can you recall which statistic this is? (Hint: the \(t\)-statistic.)
  2. Compute the Mahalanobis distance in R using the following code:
# Compute the mahalanobis distance:
just_data = sf::st_drop_geometry(bristol[c('housing','crime')])
housing_mean = mean(bristol$housing)
crime_mean = mean(bristol$crime)
mdistance = mahalanobis(just_data, 
                        c(housing_mean, crime_mean),
                        cov(just_data))

Then, make a histogram of mdistance. Describe the distribution of Mahalanobis distances. Why might the distances be distributed this way if the Mahalanobis distance measures how far out of the point cloud each point is?1
  3. Make a scatterplot of the same housing ~ crime relationship we’ve seen before, but make each point’s color a function of the Mahalanobis distance we computed before. Below is the code (using ggplot, which we haven’t explicitly used yet) that can build the plot:

# We'll use ggplot because it's easier for the coloring:
library(ggplot2)
# Start a ggplot
ggplot2::ggplot(bristol, aes(x=crime, y=housing, color=mdistance)) +
  # Then, add points at each x,y in (crime, housing)
  ggplot2::geom_point() +
  # add an "lm" smooth to the scatterplot with no standard error:
  ggplot2::geom_smooth(method='lm', se=FALSE) +
  # add a horizontal line at the average of housing
  ggplot2::geom_hline(yintercept = mean(bristol$housing)) +
  # add a vertical line at the average of crime
  ggplot2::geom_vline(xintercept = mean(bristol$crime)) +
  # use viridis colors instead of the default
  ggplot2::scale_color_viridis_c(option='plasma') +
  # and use a dark scheme so it's easier to see
  ggplot2::theme_dark()
  4. Describe what the Mahalanobis distance shows in the scatterplot above. Do the colors in the plot conform to subjective interpretations of how “outlier-y” observations are with respect to the line of best fit plotted?
  5. Make a map of the Mahalanobis distance2 (one possible approach is sketched below). Do areas with large Mahalanobis distances tend to be near one another?
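For the map in the last question, one possible approach (a sketch following the hint in footnote 2, not the only way to do it) is:

# attach the distances to bristol as a column, then map that column:
bristol$mdistance = mdistance
plot(bristol['mdistance'], lwd=.1)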

Local Stats

The idea of observations that just don’t fit within an expected relationship or structure in our data takes on special meaning in a geographical context. Given our existing explorations into spatial autocorrelation, measures of local spatial autocorrelation capture how outlier-y each observation is relative to its surroundings. In a similar manner to our previous discussions of outliers in a scatterplot, it is easy to develop a measure of local spatial autocorrelation from the Moran scatterplot.

To get started, let us build touches weights, also known as contiguity weights, for our data. This can be done quickly using the spdep package in R. Using spdep, we can build an object that contains all the information about which areas touch each other:

library(spdep)
touches = spdep::poly2nb(bristol)
# for some reason, spdep has many formats of weight:
touches = spdep::nb2listw(touches)
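As a quick sanity check (optional, but useful), summary() on the resulting listw object reports how many areas there are and how the number of neighbors is distributed:

# inspect the weights object: how many areas, and how many neighbors each?
summary(touches)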

Whereas before we used sf to build the within_distance relationships between every pair of observations manually, spdep turns these kinds of requests into handy functions (that are also substantially faster). We can compute a few different types of weights using spdep, and do so way faster than we did before:

# All of the following are `nb` objects. Don't forget to convert them to listw!
within_2000m = spdep::dnearneigh(sf::st_centroid(bristol), 0, 2000)
within_4000m = spdep::dnearneigh(sf::st_centroid(bristol), 0, 4000)
nearest_10 = spdep::knearneigh(sf::st_centroid(bristol), k=10)
nearest_10 = spdep::knn2nb(nearest_10)
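As the comment above says, before these can be used as weights, each nb object needs the same conversion we applied to touches, for instance:

# convert each nb object into a listw object, like we did for touches:
within_2000m = spdep::nb2listw(within_2000m)
within_4000m = spdep::nb2listw(within_4000m)
nearest_10 = spdep::nb2listw(nearest_10)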

One common kind of local statistic used in spatial statistics relates to the Moran scatterplot.3 We can make this plot below in the way we did last practical:

nearby_crime = spdep::lag.listw(touches, bristol$crime)
bristol['nearby_crime'] = nearby_crime

plot(nearby_crime ~ crime, data=bristol, pch=20)
abline(lm(nearby_crime ~ crime, data=bristol), lwd=2, col='orangered')
abline(h=mean(bristol$nearby_crime), col='slategrey', lwd=2, lty='dashed')
abline(v=mean(bristol$crime), col='slategrey', lwd=2, lty='dashed')

And, this scatterplot corresponds to the map of crime in our data:

plot(bristol['crime'], lwd=.1)

Recalling the formula for the line of best fit, we can express the relationship between the variable observed at sites \(i\) and \(j\) (spelled \(x_i\) and \(x_j\)) as a function of the variability (and co-variability) of \(x_i\) and \(x_j\) around the mean, weighted by the spatial relationship between sites \(i\) and \(j\), modeled by \(w_{ij}\). Formally, this is stated: \[ I = \frac{N}{\sum_i^N\sum_j^N w_{ij}}\frac{\sum_i^N\sum_j^Nw_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\sum_i^N \left(x_i - \bar{x}\right)^2}\]

Practically speaking, you can think of the top as the spatially-weighted co-variation over all pairs of sites \(i\) and \(j\), and the bottom as the intrinsic variation, or just the variance, of \(x\), regardless of site. Like the Mahalanobis distance we looked at above, we might want to build a measure of how outlier-y the data at each site \(x_i\) is relative to its local average, \(\sum_j^N w_{ij}(x_j - \bar{x})\). To build this, Anselin (1995) broke the estimator up into two distinct parts:4 \[ I = \sum_i^N (x_i - \bar{x}) \left[\frac{N}{\sum_i^N\sum_j^Nw_{ij}} \frac{\sum_j^N w_{ij}(x_j - \bar{x})}{\sum_i^N(x_i - \bar{x})^2}\right]\]

Then, we can say that the contribution of each site to Moran’s \(I\), the slope of the line shown above, is a result of the partial sums over \(i\). That is, we just pick some \(i\), and the local Moran’s \(I\), the co-variation between the site and the local average, becomes: \[ I_i = (x_i - \bar{x})\left[\frac{N}{\sum_i^N\sum_j^N w_{ij}} \frac{\sum_j^Nw_{ij}(x_j - \bar{x})}{\sum_i^N (x_i - \bar{x})^2} \right] \] This statistic is called the local Moran’s \(I\).
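To make this decomposition concrete, here is a minimal sketch (our own, not part of the original practical) that computes each \(I_i\) directly from the formula above, using the row-standardized touches weights. With row-standardized weights, \(\sum_i^N\sum_j^N w_{ij} = N\), so the leading fraction equals 1:

# compute the pieces of the local Moran's I by hand:
z = bristol$crime - mean(bristol$crime)  # deviations, (x_i - xbar)
lag_z = spdep::lag.listw(touches, z)     # local average deviations, sum_j w_ij (x_j - xbar)
I_i = z * lag_z / sum(z^2)               # each site's contribution to Moran's I
sum(I_i)                                 # the partial sums recover the global I

Note that spdep::localmoran() (used in the next exercise) reports each \(I_i\) scaled up by \(N\), so its local statistics average, rather than sum, to the global \(I\).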

Exercise: Working with a Local Moran

Local Moran’s \(I\) statistics are one of the most common ways to understand the local variability of data.

  1. Interpreting the statistic above, what is the sign of \(I_i\) if:
  a. \(x_i\) is above the average of \(x\) (spelled \(\bar{x}\), also known as the map average or just the average), and the local average, \(\sum_j^N w_{ij}(x_j - \bar{x})\), is also above the map average of \(x\)?
  b. What happens if the site value \(x_i\) is above the map average (\(\bar{x}\)), but the local average is below the map average?
  c. If I told you that \(I\) were positive overall, would the local \(I_i\) in part (a.) contribute to this being positive? Would the local \(I_i\) in part (b.)?
  2. Computing Local Moran’s \(I_i\) statistics is simple if you’ve loaded spdep:
local_moran = spdep::localmoran(bristol$crime, touches)

Compute the Local Moran statistic for the other notions of nearness as well (within_2000m, within_4000m, and nearest_10).5
  3. The local_moran result is a matrix with a few columns. Ii is the Moran score itself. Using one of the senses of geographical nearness, make a Moran scatterplot for nearby_crime ~ crime, using the first column of the local_moran matrix (e.g. within_2000m_moran[,1]) to color in the points of the scatterplot. Where are the large, positive values of the Local Moran statistic? Where are the large, negative values of the Local Moran statistic?6
  4. Make a map of the Local Moran’s \(I_i\) values. Keeping in mind that large local statistics are not necessarily large crime values (recalling the scatterplot above), how can you interpret the areas of light yellow and areas of dark blue in the plot?

  5. Which quadrant(s) reflect plateaus, where an observation and its neighbors are both above the mean?
  6. Which quadrant(s) reflect peaks, where an observation is much larger than average, but its surroundings are much lower than average?
  7. Which quadrant(s) reflect pits, where an observation is much lower than average, and its surroundings are much higher than average?
  8. Which quadrant(s) reflect playas, where an observation and its neighbors are both below the mean?7
  9. Conceptually, which two categories are clusters? Which two are outliers? (One way to compute these four categories in R is sketched after this list.)
  10. Repeat the local analysis with housing, income, education, and employment. Interpret the patterns in the Local Moran scores there, too. Which patterns are the most similar? The most different?
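As promised above, here is one way (a sketch, with names of our own choosing) to compute the four quadrant categories from the deviations and local average deviations:

# classify each LSOA by its own deviation and its neighbors' average deviation:
z = bristol$crime - mean(bristol$crime)
lag_z = spdep::lag.listw(touches, z)
quadrant = ifelse(z > 0 & lag_z > 0, 'plateau',  # both above the mean
           ifelse(z > 0 & lag_z <= 0, 'peak',    # site high, surroundings low
           ifelse(z <= 0 & lag_z > 0, 'pit',     # site low, surroundings high
                  'playa')))                     # both below the mean
table(quadrant)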

Challenge: Cluster mapping with Local Morans

The Local Moran statistic (like the Mahalanobis statistic) is actually amenable to statistical inference, in addition to simply visualizing the structure of the data. Here, let’s do some significance testing on our results. The null hypothesis for Local Moran’s \(I_i\) testing is that there is no significant clustering or outlier structure in the data.

  1. There are 263 LSOA observations in Bristol. At a 95% confidence level (\(\alpha = .05\)), how many local statistics would you expect to be statistically significant, even if there were no clustering/outliers in the data? How about at a 99% confidence level? (A sketch after this list shows one way to check this in R.)
  2. How many of the \(p\)-values in your chosen local_moran matrix are significant at the 95% confidence level? How about at the 99% confidence level?
  3. Using what you know about boolean/binary mask variables, complete the following code to get a mask for only the \(I_i\) values that are statistically significant:
local_moran_scores = local_moran[ ,1]
p_values = local_moran[ ,5]
is_significant = p_values <= # set your significance level here!
  4. Make a Moran scatterplot for crime like we have done before. Then, in addition to the lines we add for the axes and the line of best fit, use points to plot over the significant \(I_i\) values in red.
  • How many of each local type (peak/pit/plateau/playa) do you have?
  • How does this change if you tighten your significance threshold?
  5. Make a map of crime like we have done before. Be sure to set the reset=FALSE option, however. It will also help legibility if you use lwd=.01, to make the width of the lines between LSOAs very small. Then, add the following line to plot the outlines of the significant values in white:
plot(sf::st_geometry(bristol[is_significant,]), lwd=2, border='white', add=TRUE, reset=FALSE)
  6. Interpret the detected clusters (outlined in white) in the preceding plot. What clusters or outliers are detected in the map at your given significance level? How does that change if you tighten your significance level from, say, .05 to .001?
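For questions 1 and 2, here is a quick sketch of the arithmetic and the counting (assuming the p_values vector from the code above):

# expected false positives at the 5% level, even with no real structure:
nrow(bristol) * .05
# the number of p-values that actually fall at or below .05:
sum(p_values <= .05)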

Conclusion

Local spatial statistics are a form of outlier diagnostic, designed to give you insight into how the structure of the relationship between an observation and its surroundings changes right around the observation itself. Because these measures are so flexible, local spatial statistics are among the most commonly-used spatial statistics across the social sciences. These statistics, especially the local Moran’s \(I\), are easy to understand, simple to visualize, and quick to compute. Thus, as you’ve seen, they give a good indication of how to explore the local structure of univariate data in a given dataset.

In the future, however, we will need to analyze more than just one variable at a time. Indeed, most social or physical modelling problems involve multiple variables all at once. To do this, we need to start building models, not just using exploratory statistics. Our final lab for this segment will consider how geography is built into models, and strategies that are common in all forms of spatial models (not just regression).

Footnotes


  1. Put another way, if everyone is unique, how can anyone be unique? Some points have to be more “typical” than others.

  2. Add the Mahalanobis distance as a column.

  3. As a bit of an R challenge, try making this scatterplot in ggplot instead of base plot. No pressure; ggplot is tough.

  4. This is possible because you can factor each \(x_i - \bar{x}\) out of the inner sum over \(j\). If you think of site \(i\) as the “focus,” then the sum over \(i\) in the numerator captures the contribution of each site. The sum over \(j\) happens for each site, and reflects the local average at site \(i\). Thus, we can pull that sum over \(i\) out of the part that computes the local average at each site. The denominator is the variance of \(x\), and never changes regardless of where we take the sum or where we compute the local average.

  5. Don’t forget: we had to use spdep::nb2listw to convert the touches object from one type of representation to another. This is simply a legacy issue for spdep.

  6. Even if you didn’t attempt the challenge on the Mahalanobis distance, there is R code in part 2 that may be of assistance.

  7. These are just nicknames for these structures of relationships, and many people use different nicknames. Think of what they represent (e.g. a focal point and its surroundings are all higher than average, a focal point and its surroundings are all below average, or the focal point is below (above) average and its surroundings are above (below)). These names are only meant to be evocative and geographical, but other kinds of names include hotspot/coldspot (for plateau/playa), donut/diamond-in-the-rough (for pit/peak), or cluster/outlier (for both plateau & playa / peak & pit). There is no standardization here.