This analysis compares two interpolation methods for predicting rainfall
in Santa Barbara County proper, i.e., excluding the Channel Islands. The
methods initially chosen were multiple regression (via map algebra) and
kriging (via a GIS package). Kriging requires analysis of a semi-variogram
to determine which model function fits the data. The semi-variogram that was
produced showed no definite form, so it was concluded that kriging would be
unsuitable for these data. Inverse Distance Weighting (IDW) was therefore chosen
for comparison with multiple regression, since it requires no such model fitting.
In addition, two contrasting rainfall seasons were selected for both
interpolation methods: an "El Niño" season (1982-83) and an anomalously
dry season (1989-90). This leads to two initial hypotheses:
1. Multiple regression is a better predictor of rainfall than IDW;
i.e., the relative residual error term, $X$, is smaller for multiple
regression than for IDW, where
$$X = \frac{e}{\text{observed}}, \qquad e = \text{observed} - \text{expected}.$$
2. Rainfall interpolation is generally better for wet seasons than for dry seasons;
i.e., the relative residual error term, $X$, is smaller for 1982-83
than for 1989-90.
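The relative residual defined in the hypotheses is simple to compute. The sketch below is a minimal Python illustration; the function names are hypothetical and not part of the original analysis. The report does not say whether signs were removed before averaging, so this sketch averages the signed values exactly as $X$ is defined (over- and under-predictions can then cancel) and offers averaging of absolute values as an assumed alternative.

```python
from statistics import mean


def relative_residual(observed: float, expected: float) -> float:
    """X = e / observed, with e = observed - expected (rainfall in inches)."""
    if observed == 0:
        raise ValueError("X is undefined when observed rainfall is zero")
    e = observed - expected
    return e / observed


def mean_relative_residual(observed, expected, absolute=False):
    """Average X over paired observed/predicted values.

    absolute=False averages the signed X values, as defined above;
    absolute=True averages |X| instead (an assumed alternative).
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    xs = [relative_residual(o, p) for o, p in zip(observed, expected)]
    return mean(abs(x) for x in xs) if absolute else mean(xs)


# Hypothetical example: 20.0 in observed, 18.0 in predicted.
x = relative_residual(20.0, 18.0)
print(f"X = {x:.2f}, i.e. {100 * x:.0f}% error")
```

The example prints `X = 0.10, i.e. 10% error`. With hypothetical pairs (10.0 observed, 9.0 predicted) and (20.0 observed, 22.0 predicted), the signed mean is 0.0 even though each prediction is 10% off, while `absolute=True` gives 0.1; this is why the averaging convention matters when comparing methods.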
Data Description and Collection
Rainfall data were collected from the Santa Barbara Flood Control District
for both seasons (1982-83 and 1989-90).
Rainfall was reported in inches.
Gauge station locations were reported as latitude and longitude to the nearest thousandth of a degree.
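To put that precision in distance terms, and to obtain planar kilometres like those used later for the IDW search radius, latitude and longitude can be converted to approximate local offsets. The report does not describe how ARC/INFO projected the coordinates; this sketch assumes a simple equirectangular approximation about a hypothetical reference point near Santa Barbara (34.4 N, 119.7 W), which is adequate over a county-sized area but is not a substitute for a proper map projection.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Hypothetical reference point near Santa Barbara (decimal degrees).
LAT0, LON0 = 34.4, -119.7


def to_local_km(lat, lon, lat0=LAT0, lon0=LON0):
    """Approximate east (x) and north (y) offsets in km from (lat0, lon0)
    using an equirectangular approximation (assumption, not the original method)."""
    rad = math.pi / 180.0
    x = EARTH_RADIUS_KM * (lon - lon0) * rad * math.cos(lat0 * rad)
    y = EARTH_RADIUS_KM * (lat - lat0) * rad
    return x, y


# Size of one reported unit (0.001 degree) at the reference latitude.
dx, _ = to_local_km(LAT0, LON0 + 0.001)
_, dy = to_local_km(LAT0 + 0.001, LON0)
print(f"0.001 deg ~ {dx * 1000:.0f} m east-west, {dy * 1000:.0f} m north-south")
```

At this latitude, a thousandth of a degree works out to roughly 92 m east-west and 111 m north-south, so the reported station positions are precise to about a hundred metres.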
Method of Data Analysis
A total of 60 rainfall gauge sites throughout the county, each
reporting rainfall data for both seasons in question, were selected.
For simplicity, this analysis assumes that rainfall at every point
in Santa Barbara County is a function of three factors:
Distance from the west-facing coast
Distance from the south-facing coast
Elevation
Semi-Variogram for Kriging
Here, gamma (the semivariance) is half the average squared difference in rainfall between pairs of gauges
separated by a given lag distance. The curve shows no easily recognizable shape for either year;
neither a punctual nor a universal form would be suitable for kriging with these data. This is likely due
to large variations in rainfall between sample points, which may reflect extreme differences in elevation between
those points or measurement error in the rainfall data itself. Thus, it was concluded that
kriging should not be performed on these data sets.
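The semivariance described above can be computed directly from the gauge data. A minimal Python sketch, assuming station coordinates are already in planar km; the lag width and number of lags are hypothetical parameters, and the sketch only builds the empirical curve without fitting any model to it.

```python
import math
from itertools import combinations


def empirical_semivariogram(points, values, lag_width, n_lags):
    """Return a list of (lag_centre, gamma, n_pairs) for each non-empty bin.

    points: (x, y) coordinates in km; values: rainfall at each point.
    gamma(h) = sum((z_i - z_j)**2) / (2 * N(h)) over the N(h) pairs whose
    separation falls in the bin [k * lag_width, (k + 1) * lag_width).
    """
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i, j in combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])
        k = int(d // lag_width)
        if k < n_lags:  # ignore pairs beyond the last lag bin
            sums[k] += (values[i] - values[j]) ** 2
            counts[k] += 1
    return [
        ((k + 0.5) * lag_width, sums[k] / (2 * counts[k]), counts[k])
        for k in range(n_lags)
        if counts[k]
    ]


# Hypothetical toy data: three gauges on a line, 1 km apart.
print(empirical_semivariogram([(0, 0), (1, 0), (2, 0)], [0, 1, 3], 1.5, 2))
```

Plotting gamma against lag centre gives the kind of curve discussed above; a recognizable sill and range would be needed before fitting a kriging model.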
Data, Grid and Coverage Preparation
Data Model: Vector
Imported gauge station data (including station names) into ARC/INFO via a comma-delimited file to create a points coverage
Result: gauge station points coverage created
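A comma-delimited gauge file of the kind imported above might be read as follows. This is a Python sketch only; the column names (`name`, `lat`, `lon`, `rain_1982_83`, `rain_1989_90`) and the sample row are hypothetical placeholders, not the District's actual file layout or data.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Gauge:
    name: str
    lat: float           # decimal degrees
    lon: float           # decimal degrees (negative = west)
    rain_1982_83: float  # inches
    rain_1989_90: float  # inches


def read_gauges(text: str) -> list[Gauge]:
    """Parse comma-delimited gauge records (hypothetical column names)."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        Gauge(
            name=row["name"],
            lat=float(row["lat"]),
            lon=float(row["lon"]),
            rain_1982_83=float(row["rain_1982_83"]),
            rain_1989_90=float(row["rain_1989_90"]),
        )
        for row in reader
    ]


# Placeholder row for illustration only; not real gauge data.
SAMPLE = "name,lat,lon,rain_1982_83,rain_1989_90\nStation A,34.400,-119.700,10.0,5.0\n"
gauges = read_gauges(SAMPLE)
print(gauges[0])
```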
Data Model: Raster
Created grids of distance from a control parallel and a control meridian (collectively, "control lines")
Converted arc coverages of the control lines into grids
Created distance grids by calculating the distance from each cell to the nearest control-line cell
Result: two distance grids (one for each coast) were established
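Because a control parallel or meridian is a straight line spanning the grid, the distance from any cell to its nearest control-line cell reduces to the row or column offset times the cell size. A NumPy sketch, assuming a projected grid in which the meridian is a column and the parallel is a row; the grid size, cell size, and control-line positions are hypothetical. For an irregular line such as a real coastline, a Euclidean distance transform (for example, `scipy.ndimage.distance_transform_edt`) would be needed instead.

```python
import numpy as np


def control_line_distance_grids(n_rows, n_cols, cell_km, meridian_col, parallel_row):
    """Return (distance to control meridian, distance to control parallel) in km.

    The meridian occupies grid column `meridian_col` and the parallel occupies
    grid row `parallel_row`; each cell's distance is its perpendicular offset.
    """
    rows, cols = np.indices((n_rows, n_cols))
    dist_meridian = np.abs(cols - meridian_col) * cell_km  # east-west distance
    dist_parallel = np.abs(rows - parallel_row) * cell_km  # north-south distance
    return dist_meridian, dist_parallel


# Hypothetical 4 x 5 grid of 0.5 km cells: meridian at column 0 (west edge),
# parallel at the bottom row (row 3, the south edge).
d_west, d_south = control_line_distance_grids(4, 5, 0.5, meridian_col=0, parallel_row=3)
print(d_west)
print(d_south)
```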
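Applied the regression equation $y = a x_1 + b x_2 + c x_3 + d$ with the regression coefficients for each year, where $y$ is predicted rainfall (inches) and $x_1$, $x_2$, $x_3$ are the three predictors (distance from the west-facing coast, distance from the south-facing coast, and elevation)
Result: rainfall grids for each year were established, as pictured below: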
Cross-validation of the multiple regression model is straightforward
and easily done in a spreadsheet program. The distances of a gauge station from the
west and south coasts, and the station's elevation, are entered into the
regression equation for the appropriate year; the returned value is the predicted
rainfall at that point.
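The map-algebra step and the spreadsheet step apply the same equation, once to whole grids and once to individual gauges. A NumPy sketch under stated assumptions: the report does not say how the coefficients a, b, c, d were estimated, so ordinary least squares via `numpy.linalg.lstsq` is assumed, and the function names are hypothetical.

```python
import numpy as np


def fit_rainfall_regression(x1, x2, x3, y):
    """Least-squares fit of y = a*x1 + b*x2 + c*x3 + d; returns (a, b, c, d)."""
    A = np.column_stack([x1, x2, x3, np.ones(len(y))])
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return tuple(float(c) for c in coeffs)


def apply_regression(coeffs, x1, x2, x3):
    """Evaluate the equation; works on scalars (one gauge) or whole grids."""
    a, b, c, d = coeffs
    return a * np.asarray(x1) + b * np.asarray(x2) + c * np.asarray(x3) + d
```

In use, the coefficients would be fitted once per season from the gauges' distances, elevations, and observed rainfall; passing the two distance grids and an elevation grid to `apply_regression` yields a rainfall grid, while passing one station's three values yields its predicted rainfall, as in the spreadsheet step.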
Regression grids for each season are shown below. (The pound signs mark
one of the sub-sampled gauges, the Santa Barbara Sewage Station.)
Inverse Distance Weighting (IDW)
Cross-validation of the IDW model is moderately complicated and time-consuming, but necessary.
One of the 60 sample points is removed, and an IDW interpolation of the remaining
59 stations is calculated. The predicted rainfall at the removed point is then read from
the grid cell at that location. Ideally, this would be repeated for each of the 60
points in both seasons, resulting in 120 individual grids.
However, due to limitations in computer time and storage, only a random sub-sample of 20 of
the 60 points was analyzed, so a total of 40 different IDW interpolations were produced. Regression
predictions for the same 20 points were also calculated. A second-order (power 2) IDW was used with a radial search radius of 25.0 km.
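The leave-one-out procedure can be sketched without building a full grid by evaluating the IDW estimate directly at each withheld station. This is a simplification of the GRID workflow described above (reading a cell value introduces a small cell-centre offset that this sketch ignores). Power 2 and the 25.0 km radius follow the text; the function names, the handling of stations with no neighbours inside the radius (returned as `None`), and the seeded random sub-sample are assumptions.

```python
import math
import random


def idw_predict(target, points, values, power=2.0, radius_km=25.0):
    """IDW estimate at `target` from stations within `radius_km` (planar km).
    Returns None if no station lies within the radius."""
    num = 0.0
    den = 0.0
    for p, z in zip(points, values):
        d = math.dist(target, p)
        if d == 0.0:
            return z  # coincident station: use its value directly
        if d <= radius_km:
            w = 1.0 / d ** power  # inverse-distance weight
            num += w * z
            den += w
    return num / den if den > 0 else None


def leave_one_out_idw(points, values, indices, power=2.0, radius_km=25.0):
    """For each index, drop that station, interpolate from the others,
    and return {index: (observed, predicted)}."""
    results = {}
    for i in indices:
        others = [p for k, p in enumerate(points) if k != i]
        other_vals = [z for k, z in enumerate(values) if k != i]
        predicted = idw_predict(points[i], others, other_vals, power, radius_km)
        results[i] = (values[i], predicted)
    return results


# Hypothetical random sub-sample of 20 of 60 station indices (seeded for repeatability).
subsample = random.Random(0).sample(range(60), 20)
```

Calling `leave_one_out_idw(points, rainfall, subsample)` once per season would give the 40 observed/predicted pairs; each pair can then be turned into a relative residual.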
General IDW grids for each season are shown below. NOTE: These are NOT the exact grids from which
values were derived for the 20 points; each leave-one-out grid appears slightly different.
Results of Data Analysis
Statistical analyses were performed on the values returned for
each of the 20 sub-sample stations in each season.
The relative residual was calculated for each station, and
the average relative residual, $\bar{X}$, was derived for each of the four cases (two methods by two seasons).
It can be seen that the average relative residual error term, $\bar{X}$,
for multiple regression is less than that for IDW in each season,
and that $\bar{X}$ for 1982-83 is less than that for 1989-90 for each interpolation method.
An accurate predictor returns a relative residual near zero for each input.
Multiplied by 100, these relative residuals can also be read
as percentage errors relative to the observed value.
These data, this analysis, and this procedure therefore suggest that:
1. The multiple regression method appears to
interpolate rainfall values better than the Inverse Distance
Weighting method (with the specified parameters); and
2. Spatial interpolation of rainfall in Santa Barbara County proper
appears to work better for significantly wetter winter seasons
than for significantly drier ones.