# Statistical Surfaces in GIS

A statistical surface is any geographic entity that can be thought of as having a Z value for each X,Y location. Digital elevation models are the best-known example; others include gradient, temperature, population, and economic potential. Any numerically measurable attribute that varies continuously over space and is recorded as interval or ratio data, such as temperature or population density, can be treated as a statistical surface. These surfaces are called “statistical” because the Z values are a statistical measure (e.g. a mean or standard deviation) of the features under consideration.

There are two basic types of statistical surfaces:

1. Continuous; Z values occur everywhere within the area of study
2. Discrete; Z values occur only at specific locations, but can be summarized (such as number per neighborhood).  These discrete surfaces are calculated from “punctiform” (point) data which are composed of individuals whose distribution can be modeled as a field (such as population density).
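As a rough illustration of summarizing punctiform data, the sketch below counts points per square grid cell. The function name `counts_per_cell`, the square cells standing in for neighborhoods, and the sample coordinates are all assumptions made for this example.

```python
from collections import Counter

def counts_per_cell(points, cell_size):
    """Count how many (x, y) points fall in each square cell.

    Cells are identified by integer (column, row) indices; square cells
    are only a stand-in for whatever neighborhoods a study really uses.
    """
    return Counter(
        (int(x // cell_size), int(y // cell_size)) for x, y in points
    )

# Hypothetical point locations (e.g. individual households)
points = [(0.5, 0.5), (0.7, 0.2), (1.5, 0.5), (2.2, 2.9)]
density = counts_per_cell(points, cell_size=1.0)
print(density[(0, 0)])  # number of points in the cell at column 0, row 0
```

Dividing each count by the cell area would turn the counts into a density value per cell.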

The values of unknown areas (i.e. the areas between measured points) can be estimated in two ways. Interpolation “fills in the gaps” by estimating values at locations for which there is no data from the known values at nearby locations. Extrapolation “guesses what’s beyond the edges” by estimating values at locations outside the range of the available data from the known values.

## Types of Interpolation

Linear interpolation is the simplest method, but it is often inaccurate because it assumes that values change at a constant rate between known data points. It works best when data points are uniformly spaced.
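A minimal sketch of linear interpolation along a single transect, assuming two known points and a straight-line change between them. The function name `linear_interpolate` and the elevation values are illustrative only.

```python
def linear_interpolate(x0, z0, x1, z1, x):
    """Estimate z at x on the straight line joining (x0, z0) and (x1, z1)."""
    if x1 == x0:
        raise ValueError("the two known points must have different positions")
    t = (x - x0) / (x1 - x0)  # fractional distance from the first point
    return z0 + t * (z1 - z0)

# Hypothetical elevations (m) measured at 0 m and 100 m along a transect
estimate = linear_interpolate(0.0, 200.0, 100.0, 250.0, 40.0)
print(estimate)
```

Passing an `x` outside the interval between `x0` and `x1` makes the same formula extrapolate rather than interpolate.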

Non-linear methods are designed to eliminate the assumption of linearity. There are three types of non-linear interpolation methods: weighting, trend surfaces and Kriging.

Global interpolation uses the average of all values in the dataset to estimate the value at an unknown point. Local interpolation uses the average of the values found within a specified radius of the unknown point.
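The difference between the two approaches can be sketched with simple averages. The helper names `global_mean` and `local_mean`, the `(x, y, z)` tuple layout, and the sample values are assumptions made for this example.

```python
import math

def global_mean(samples):
    """Average of every known z value, regardless of location."""
    return sum(z for _, _, z in samples) / len(samples)

def local_mean(samples, x, y, radius):
    """Average of the z values within `radius` of (x, y), or None if none."""
    nearby = [
        z for sx, sy, z in samples
        if math.hypot(sx - x, sy - y) <= radius
    ]
    if not nearby:
        return None  # no control points inside the search radius
    return sum(nearby) / len(nearby)

# Hypothetical samples as (x, y, z); the last one is far from the others
samples = [(0, 0, 10.0), (1, 0, 12.0), (0, 1, 14.0), (10, 10, 40.0)]
print(global_mean(samples))
print(local_mean(samples, 0.5, 0.5, radius=2.0))
```

In this sketch the distant sample pulls the global estimate upward but has no effect on the local one.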

Distance weighted interpolation (Inverse Distance Weighted, IDW) gives each neighboring data value a weight (influence) that is inversely proportional to the square of its distance from the location being estimated. It assumes that the closer values are to each other, the more likely they are to be affected by one another. When distance weighting is used to interpolate an unknown value, the values closest to it are therefore weighted more heavily than values that are farther away.
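A minimal sketch of inverse distance weighting with weights of 1/d². The function name `idw`, the `power` parameter, and the choice to return the sample value when the target coincides with a sample are assumptions for this example, not a reference implementation.

```python
import math

def idw(samples, x, y, power=2.0):
    """Inverse-distance-weighted estimate of z at (x, y).

    `samples` is a list of (x, y, z) tuples. With power=2 each weight is
    inversely proportional to the squared distance.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, z in samples:
        d = math.hypot(sx - x, sy - y)
        if d == 0.0:
            return z  # target coincides with a sample; avoid dividing by zero
        w = 1.0 / d ** power
        weighted_sum += w * z
        weight_total += w
    return weighted_sum / weight_total

# Two hypothetical samples; the target is three times closer to the first
samples = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
print(idw(samples, 0.5, 0.0))
```

Because the first sample is three times closer, it receives nine times the weight, so the estimate sits much nearer to 10 than to 20.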

Trend surface interpolation is a global method that uses multiple regression (predicting the Z value, such as elevation, from the X and Y location). Conceptually, a trend surface is a plane of best fit passing through a cloud of sample data points; it does not necessarily pass through each original sample data point. This interpolation is used when the user wants to understand the general trends of a surface.
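As a sketch of the simplest case, the code below fits a first-order trend surface, the plane z = a + b·x + c·y, by ordinary least squares. The function name `fit_plane`, the use of normal equations solved by Gauss-Jordan elimination, and the sample values are choices made for this example.

```python
def fit_plane(samples):
    """Least-squares fit of z = a + b*x + c*y; returns (a, b, c)."""
    # Build the augmented normal equations (A^T A | A^T z) for rows [1, x, y].
    m = [[0.0] * 4 for _ in range(3)]
    for x, y, z in samples:
        row = (1.0, x, y)
        for i in range(3):
            for j in range(3):
                m[i][j] += row[i] * row[j]
            m[i][3] += row[i] * z
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("sample points are collinear or too few")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                for k in range(col, 4):
                    m[r][k] -= factor * m[col][k]
    return tuple(m[i][3] / m[i][i] for i in range(3))

# Hypothetical samples lying exactly on z = 100 + 2x - 3y
samples = [(0, 0, 100.0), (1, 0, 102.0), (0, 1, 97.0), (1, 1, 99.0), (2, 1, 101.0)]
a, b, c = fit_plane(samples)
print(a, b, c)
```

With noisy samples the fitted plane generally misses the individual points, which is the behavior described above.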

For trend surfaces, the more complex the surface to be modeled, the higher the degree (order) of trend surface required.

Splines use local interpolation. Spline interpolation fits a mathematical function to a neighborhood of sample data points. The result is a curved surface that passes through all original sample data points.

Kriging is an interpolation method commonly used in geologic applications. Kriging addresses both global variation (i.e. the drift or trend present in the entire sample data set) and local variation (the distance over which sample data points “influence” one another). Kriging looks at three components:

1. Drift – general trend of the surface
2. Fluctuations – small, spatially correlated changes in the surface
3. Noise – random changes not related to the underlying surface
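One common way to examine the distance over which sample points influence one another is an empirical semivariogram, which is not described above and is introduced here as an assumption. The sketch below groups point pairs into distance bins and computes half the mean squared difference in Z for each bin. The function name `empirical_semivariogram` and the `lag_width` parameter are hypothetical.

```python
import math
from collections import defaultdict

def empirical_semivariogram(samples, lag_width):
    """Return {lag_bin_index: semivariance} computed over all point pairs.

    A pair separated by distance h falls in bin int(h // lag_width).
    Semivariance is the sum of squared z differences divided by twice
    the number of pairs in the bin.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i in range(len(samples)):
        xi, yi, zi = samples[i]
        for j in range(i + 1, len(samples)):
            xj, yj, zj = samples[j]
            h = math.hypot(xi - xj, yi - yj)
            lag_bin = int(h // lag_width)
            sums[lag_bin] += (zi - zj) ** 2
            counts[lag_bin] += 1
    return {b: sums[b] / (2 * counts[b]) for b in sorted(counts)}

# Hypothetical samples along a line, as (x, y, z)
samples = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (2.0, 0.0, 4.0)]
gamma = empirical_semivariogram(samples, lag_width=1.0)
print(gamma)
```

Semivariance that rises with distance and then levels off suggests spatially correlated fluctuations, while a nonzero value at very short distances is often read as noise.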

## Problems in Interpolation

Since interpolation is used for predictive modeling, it involves “guessing” at unknown values by using available information. If there are too few control points, there may not be enough measured values to form a statistically significant sample. The distribution of control points is also important. More complex surfaces need more sample points than flat or simple surfaces. Areas at the edges of the map contain the highest error, because they have known values on only one side. Therefore, control points should be sampled beyond the area being interpolated, and the interpolated surface should then be cropped back to remove the edge error.
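The sample-beyond-then-crop idea can be sketched as two small helpers. The names `buffered_extent` and `crop`, the `(xmin, ymin, xmax, ymax)` extent layout, and the buffer width are assumptions for this example; a suitable buffer depends on the data and the interpolation method.

```python
def buffered_extent(extent, buffer):
    """Grow an (xmin, ymin, xmax, ymax) extent by `buffer` on every side."""
    xmin, ymin, xmax, ymax = extent
    return (xmin - buffer, ymin - buffer, xmax + buffer, ymax + buffer)

def crop(points, extent):
    """Keep only the (x, y, z) points that fall inside the extent."""
    xmin, ymin, xmax, ymax = extent
    return [
        (x, y, z) for x, y, z in points
        if xmin <= x <= xmax and ymin <= y <= ymax
    ]

study_area = (0.0, 0.0, 10.0, 10.0)
sampling_area = buffered_extent(study_area, buffer=2.0)  # collect samples here

# Hypothetical interpolated values covering the larger sampling area
interpolated = [(-1.0, 5.0, 3.0), (5.0, 5.0, 4.0), (11.0, 5.0, 5.0)]
final_surface = crop(interpolated, study_area)  # discard the error-prone margin
print(sampling_area, final_surface)
```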