Introduction
Statistics developed side by side with the nation-state in 18th century Europe. The word stems from l’etat, which is French for state. Thus Statistics has had, in general, the central tendency as its primary concern. A key idea in statistics is that the mean minimizes the sum of the squared deviations between the mean value and all the points in the distribution. This paper builds on this idea – that the sum of distances between a given point in ‘distribution space’ and all the data points, when raised to a power (not necessarily 2) and summed, provides a measure of the relative ‘location’ of a given distribution. Furthermore, it should be noted that this approach is inductive rather than deductive. Our focus is on giving the data another voice "to speak for itself" [Tukey,1975], rather than trying to prove some optimal values based on assumptions that are very likely not going to be true for the majority of real world datasets. The data being analyzed is a TM scene of the Maraba region of Brazil, [Roberts, In Press]. The Sum of Distance technique is used to qualitatively explore this data and quantitatively determine a natural number of ‘best’ clusters.
Overview
This paper is comprised of six sections. In the first section we briefly state the background for this topic. In section two the data being analyzed is discussed. In the third section the Sum of Distance method is defined. In the fourth section a bivariate clustering of the Maraba scene is performed with the SoD technique. In the fifth section multivariate classification is discussed. In the sixth section an overview of this research is given, and goals for further work are set forth.
Background
The problem of clustering/classification has had a very long and broad history. Hartigan has provided one classic reference surveying this broad field [Hartigan, 1975]. His book summarizes the traditional methods of clustering, which can be generally partitioned into three regimes: optimal, agglomerative (hierarchical), and divisive. Optimal techniques make use of the rigorous methods of integer programming to ensure partitions which meet certain optimality constraints. Generally, these techniques are limited to several hundred, or at most thousands of points, and are thus inappropriate for remote sensing applications. Agglomerative techniques work by pairing individual points, and then iteratively collapsing two clusters or swapping points between them. These techniques build up a hierarchy of clusters. While more robust than optimal methods, agglomerative techniques have difficulty handling the millions of observations typically encountered in remote sensing applications. For this reason, various flavors of partitioning techniques such as k-means [Hartigan, 1975], maximum likelihood and parrallelpiped [Richards, 1993, Lillesand, 1987] are typically used to classify images. These methods commonly share the following attributes:
In this paper we explore an innovative new direction – the sum of distances transformation. This technique has as its roots in the work of Geographers and Location Scientists who have developed methods to determine the spatial median, modes and dispersion of points in geographic space [Neft,1966]. Our insight has been that these techniques may be fruitfully applied to non-geographic information spaces. Further study into this area will be required to make the appropriate citations.
Classification in remote sensing has continued to maintain a place of significance, in part due to the need to identify and quantify landcover change. Our study draws heavily from the work of Roberts, et al. [Roberts, In Press]. Roberts’ work stresses the need for remotes sensing techniques to map land cover change, and presents the Multitemporal Spectral Mixture Model Analysis (MeSMA) as a standard frame of reference for comparing scenes from different years. While we have chosen to work with principal components in this study, since they most compactly express the variance of the TM data allowing us to work effectively with just 2 dimensions, we plan on using Robert’s MeSMA generated fractions in future studies, due to the increased comprehensibility and temporal stability provided by this method.
Data
A 1024 by 1024 pixel TM scene was analyzed. The statistics for the bands are listed below (table 1). Note that all the bands exhibit a high degree of skew and kurtosis, with means very close to their minimum values, and maximum values ranging up to 35 times the mean for that band. Also note that the correlation matrix (table 2) denotes a high degree of cross-correlation between the bands, indicating that actual rank of the TM matrix is considerably less than 6. Bands 1 through strongly correlated with each other, and somewhat weakly with bands 5 and 7, which were also strongly correlated with each other. The interesting exception to this pattern of inter-band linkage is TM4, which is only weakly negatively correlated to bands 3 and 7. The negative correlation between bands 3 and 4 makes sense, since vegetative material reflects in 4 and chlorophyll absorbs in 3 [Roberts, A Spectral Primer], and a large amount of the scene is covered in vegetation. Similarly, leaves reflect more in the NIR than in the MIR, which seems to generate the negative correlation between bands 4 and 7.
Table 1: Basic Statistics of Maraba 92 TM Bands
| Band | Min | Max | Mean | Stdev | Standardized
Max |
| TM1 |
51
|
255
|
62.88
|
8.06
|
25.31
|
| TM2 |
15
|
168
|
24.88
|
4.86
|
31.45
|
| TM3 |
12
|
204
|
22.64
|
7.88
|
24.36
|
| TM4 |
10
|
189
|
62.52
|
10.03
|
17.85
|
| TM5 |
4
|
255
|
57.51
|
19.81
|
12.67
|
| TM7 |
0
|
255
|
17.56
|
10.13
|
25.18
|
Table 2: Correlation Matrix of Maraba 92 TM Bands
| Band | TM1 | TM2 | TM3 | TM4 | TM5 | TM7 |
| TM1 |
1.00
|
0.96
|
0.91
|
0.04
|
0.70
|
0.76
|
| TM2 |
0.96
|
1.00
|
0.95
|
0.05
|
0.77
|
0.82
|
| TM3 |
0.91
|
0.95
|
1.00
|
-0.09
|
0.87
|
0.92
|
| TM4 |
0.04
|
0.05
|
-0.09
|
1.00
|
0.03
|
-0.11
|
| TM5 |
0.70
|
0.77
|
0.87
|
0.03
|
1.00
|
0.96
|
| TM7 |
0.76
|
0.82
|
0.92
|
-0.11
|
0.96
|
1.00
|
A principal components analysis was performed on the TM bands, which resulted in the factor loadings presented in table 3. A map of the first three components is shown in image 1. The first factor is, as expected, an indication of ‘brightness’, with all bands except 4 loading positively. This ‘Brightness’ factor loaded most heavily on the MIR, suggesting that it was picking up the clouds most strongly. Indeed, the high skew and kurtosis in each of the raw TM bands can largely be accounted for by the existence of a relatively few extremely reflective pixels. While the cloud pixels could have been removed, or another, less cloudy, time period could have been examined, we chose to include the cloudy regions in our data in the hope that the sum of distance analysis would reveal them and successfully isolate them from other types of pixels. After brightness, the second principal component was ‘Greenness’. We have already described how TM4 was poorly correlated with the other bands. Since PC1 is largely just a weighted average of the other bands, it is not surprising that the PC2 would largely be just TM4, which is independent of the other bands. Note that the negative correlations between TM4 and TM3 and TM7 reappear as negative factor loadings. The third component used in this study was interpreted as an NPV factor. This band differentiated reflectance in the visible with reflectance in TM5. The fourth component appeared to differentiate between ‘wet’ black bodies (e.g. rivers) and ‘dry’ dark bodies such as burned areas. Factors 5 did not have any immediate physical interpretation and Factor 6 was almost complete noise. Below table 3 is a small plot of the eigenvalues of the components. One question this study would like to address is the correct number of components to use. Clearly, the first 2 factors ‘explain’ most of the data, but will they be sufficient to differentiate between burned areas and rivers, as well as bare soil and NPV?
Table 3: Factor Loadings of Principal Components
| Band | Bright | Green | NPV | Burn | Band5 | Band6 |
| TM1 |
0.26
|
0.03
|
-0.73
|
0.15
|
-0.60
|
-0.16
|
| TM2 |
0.17
|
0.02
|
-0.36
|
0.11
|
0.32
|
0.85
|
| TM3 |
0.30
|
-0.07
|
-0.38
|
0.02
|
0.72
|
-0.49
|
| TM4 |
0.00
|
0.99
|
-0.03
|
-0.13
|
0.06
|
-0.04
|
| TM5 |
0.80
|
0.07
|
0.45
|
0.37
|
-0.10
|
0.01
|
| TM7 |
0.41
|
-0.11
|
0.01
|
-0.90
|
-0.09
|
0.08
|
| Eigenvalue |
593
|
103
|
44
|
4
|
2
|
.5
|
Maraba 1992 Principal Components
Red=PC1, Green=PC2, Blue=PC3
Image 1. This scene from Maraba is composed large stands of primary
and secondary growth, as well as pasture and degraded farmland. Note the
large number of clouds and their associated shadows.
Methods
One of the most significant features of space is that a large amount of it is empty. This is true of variable ‘space’ as much as it is of ‘geographic’ space. The fact that our measurements of the world fall, in reality, into only a limited subspace allows us to formulate theories and generate predictions, thereby making sense out of our sensations. In remote sensing, researchers routinely face a need to map sets of wavelength measurements to more understandable types of variables. One of two general approaches are usually followed: Classification Analysis (CA) or Spectral Mixture Analysis (SMA). Cluster Analysis takes an image and generates a thematic transform describing the scene as composed of a limited number of classes. Spectral Mixture Analysis takes an image and generates a set of fractions; metric variables describing the relative membership of a pixel in a class defined by a set number of endmembers. SMA is clearly a much more rich description of a given data set, yet, in the end, a clustering of data is often required. Maps must be made, or the change of landcover types must be analyzed. Since the need for clustering is not likely to go away, it might be worthwhile to ask how we might improve upon the current methods. In this paper we explore a new way of looking at our data, using a statistic called the Sum of Distance.
Sum of Distance (SoD) analysis takes a given set of m points of n-dimensional data (x) and a grid G of potential locations in n-space. From these generates a sum of distances grid, SD. The relative values of SD will be indicative of the distribution of the original points of X. Formally, let X be an n-dimensional vector of m observations. A matrix G, typically of the same dimension as X, is defined which contains points regularly distributed over the span of x. Each element of G is an n-dimensional vector defining a potential cluster location. For simplicity, we assume that we have broken up the variable range of vector x into q quintiles. G will then contain qn locations. We further define a matrix SoD of the same rank as G, composed of scalar quantities equal to the sum of distance between the corresponding grid point and all the observations in X. The following equation expresses this value:
qn m
let
SoD = S S [ D(
Gi, Xj)p
]
i=1 j=1
where
SoD is
a sum of distance matrix
G is an n-dimensional
grid of potential locations
X is a vector of
m observed values
m is the number of
observed values
n is the number of
variables in each observation
D is the Euclidean
Distance Function
p is the power to
which the distance is raised
It is a well established fact that if p is greater than 1.0 then the
value of SoD will be convex up, with a minimum value at the mean
of X. The following sum of distance plot (image 1) of PC1 versus
PC2 illustrates this. This plot was generated from a 300 x 300 sub-pixel
area. The brightness axis flows along bottom of the image, and greenness
goes up and down. The plot was calculated using a power value of 2.0. This
image gives us almost no information concerning the distribution of points
within the Maraba 1992 scene. Compare this image with the one next to it.
This image was calculated with a power of –2.0. Thus metaphorically, the
plot represents the sum of ‘gravitational attractions’ between each pixel
in our original image and every grid cell in PC space. Note the tasseled
cap structure. This structure is more clearly revealed by subtracting out
the ‘expected value’ of the sum of distance distribution, represented by
the third plot.
The values of this plot are calculated as:
qn m
let SoDT = S
S [ D( Gi, Xj)p
]
i=1 j=1
where
SoDT
is a theoretically derived sum of distance matrix
G is an n-dimensional
grid of potential locations
X is a vector of
m observed values
m is the number of
observed values
n is the number of
variables in each observation
D is the Euclidean
Distance Function
p is the power to
which the distance is raised
P is the chance of
a pixel landing in grid cell Gi
P was calculated by assuming a bivariate normal distribution, centered at the mean of X, and having the same variance as PC1 and PC2. Image is the ‘residual’ left after we subtract SoDT from SoD. Note how the outliers are emphasized. The images are difficult to see, but there are clearly endmembers at three points of the triangle. The images on the bottom have been convolved by a 3x3 highpass filter matrix of the form:
| -1 | -1 | -1 |
| -1 | 9 | -1 |
| -1 | -1 | -1 |
This highpass filter raises the values of abnormally high pixels, bringing up abnormally high values even further. In particular, note the bright pixels to the left of the tassled cap. These pixels lie outside the ‘veg line’ and would likely generate errors in a 3 end member mixing model. After these explorations showed promise, the study was repeated for the full Maraba image.

PC1 vs. PC2, p=2
PC1 vs. PC2, p=-2

PC1 vs. PC2, p=-2, Null Hypothesis
PC1 vs. PC2, p=-2, ‘Normalized’

Highpass Filter of p=-2
Highpass filter of Residuals

PC1 vs. PC2, p=-2 PC1 vs. PC2, p=-2, Residuals

PC1 vs. PC2,
p=-2, Highpass Filtered
PC1 vs. PC2, p=-2, Residual, Highpass
Bivariate Clustering with the Sum of Distance Technique
These 2d plots show the sum of distance measurement calculated to a range of greenness and brightness values for the full Maraba image. The images were calculated to a coarser resolution in order to keep processing times reasonable. The first plot shows the sum of inverse distance squared plot for the entire 1024 by 1024 image. Note the tasseled cap structure. The orange area defines the mean for the image, which is mostly composed of forest. To the right of this image is the ‘normalized version’, which has had the expected value of the sum of distance measure subtracted away. Note how this accentuates the outliers located to the far right and on the bottom of the images. The areas are almost four standard deviations away from the mean center, and should ‘normally’ be empty. These pixels, therefore, are almost certainly pure, and make good candidates for an endmember analysis. In order to better differentiate cluster centers the normalized SoD field was passed through a highpass filter, and the twlve grid cells with highest values were extracted. Two of these points were deemed redundant, and removed. These centroids are plotted below. Note that the ‘tasseled cap’ seems stretched out to the right. Presumably this is caused by the existence of clouds in our image.

We took these coordinates and used them to perform a ‘minimum distance classification. This classification appears below

These results seem reasonable, but the classification failed to adequately distinguish between clouds and pasture. Another considerable error in this clustering is the confusion of burned and river regions.
To test the accuracy of our method, we reintroduced the clusters that
we deemed ‘redundant’ and reran the cluster analysis. We considered it
possible that we might need more contrast along the brightness dimension
to catch the difference between clouds and pasture.
This is the resulting image. Our cloud problem has become less severe and fewer burned areas appear as ‘water’. The vegetation seems to have been nicely differentiated.

In general this is rather impressive for an unsupervised classification based on just two variables. The problems that we encountered seem to be mostly attributable to the nature of the principal components we didn’t use: PC3 – the NPV component, would have provided additional information helping us
differentiate burned areas, pastures and degraded farmland. PC4, which appears to load positively on water and negatively on burned areas, would have helped to split the river/burn class appropriately. The Sum of Inverse Distance plots for these components can be found on the next two pages. Clearly, these plots are telling us something very powerful about a small number of pixels. This is a significant advantage of the SoD approach – small portions of a large population can have a significant impact on the SoD field, if they are strongly clustered and adequately far enough away from the other points. This suggests that the SoD approach might prove very useful in detecting and visualizing weak, yet distinct, signals within remotely sensed data. This could be useful in a wide array of applications.
Our original intention with the SoD programs was to extend our G matrix
into three, four, and possibly more dimensions. Code for the three dimensional
case, as well as for 3d visual display was written. When these were tested
on a 300 by 300 sub-region of the study site, for PC’s 1,2 and 3, nothing
exciting happened – because the region was mostly composed of forest. Experiments
with applying the method to the full 1024x1024 image were not encouraging.
The computational complexity of the SoD technique, at present, is O(qn*m),
where q is the number of steps in variable space (the width, height, etc.
of our grid), n is the number of dimensions, and m is the total number
of pixels. Thus for a 20x20x20 ‘Component Space’, which is not really a
sufficient resolution, for our small subset of 10242 pixels,
roughly 8x1016 distance calculation/additions were needed. This
is not very appropriate for an ‘exploratory’ data analysis tool. The curse
of dimensionality is indeed difficult to overcome. Still, we feel that
we have shown that this method is worth exploring further. There are several
approaches to exploring/utilizing higher dimensions within this framework.
These are discussed in the next section: Multidimensional Clustering with
the SoD technique.
Maraba 1992 Full Image
PC1 vs. PC3 , p = -2.0
Highpass Filtered



PC1 vs. PC4 Highpass Filter



Multidimensional Clustering with the Sum of Distance Technique
In general, moving data analysis techniques into higher dimensions is usually either trivial, or very difficult. Performing a k-means clustering, on one single additional ‘dimension’ is as simple as adding up one more number for every point examined. Yet this ‘simplicity’ may be very misleading. Please examine the brightness-NPV plot. The NPV values ranged from –13 to +4 standard deviations. The interesting behavior all occurs at the extreme bottom of the range. There are almost no statistical models which would predict this occurrence, and a set of seed points spread ‘randomly’ throughout the data would almost surely miss these outliers entirely. The SoD method shows the outlying patterns pretty clearly, even without looking at the ‘residuals’ from a normal model as we did earlier in this study. How to take advantage of this additional structures is another matter. Working on an n-dimensional hypercube might be feasible, but since this space grows exponentially, and since visualizing higher dimensions becomes exceedingly difficult, the next section of this study will focus on ‘faking it in higher dimensions’ – that is using multidimensional relationships derived from multiple sets of bivariate plots.
We begin with our cluster centroids from the end of the previous section.
What we would like to do is extend this set of clusters out to four dimensions
to include the NPV and wetness components. We list the 12 clusters here,
ranked by shade and greenness. Some of these are redundant, as shown by
the chart below. What we would like to do is to add components to the ‘cloud’
and ‘bright soil’ classes so that they can differentiate better. By observing
the plots above we see that indeed, both the NPV and wetness plots show
2 clusters of outliers at high brightness ranges – thus high brightness,
high NPV, and low wetness is probably East Campina sand or a recently burned
area, while the opposite would hold true of clouds. We modify the table
below appropriately, based on the values indicated in the SoD plots. A
similar analysis of the plots above allows us to modify the darker elements
of our classes in a similar fashion. The middle values we assign a value
of 127, the average for each PC. Note that there are still interesting
aspects of the Greenness SoD plots listed above. The extremely high Greenness,
low NPV values might reveal Capoeria stands, but we let this pass for now,
defining a new set of clusters as:
| Brightness | Greenness | NPV | Wetness |
|
222.52
|
127
|
25 | 189 |
|
210.58
|
127
|
200 | 10 |
|
186.7
|
124
|
200 | 20 |
|
162.82
|
127
|
127 | 127 |
|
103.12
|
148.264
|
127 | 127 |
|
103.12
|
91.56
|
127 | 127 |
|
79.24
|
127
|
127 | 127 |
|
79.24
|
91.56
|
189 | 25 |
|
55.36
|
77.384
|
189 | 25 |
|
43.42
|
91.56
|
25 | 189 |
This set of values produced a set of clusters so bad that we are not
going to even include them in this paper. Clearly, teasing out appropriate
clusters from multiple bivariate plots must be done appropriately. To increase
our comprehension of this problem we ran our Sum of Distance algorithm
for a 20x20x20 cube and wrote a program to interactively explore this space.
A screen shot, which fails to capture the full excitement, shows the first
three components, after being transformed by a high pass filter.

Clearly, as we can see from this image, there is a significant outlying group in the PC1/PC2/PC3 combination. Figuring out how to map and understand these spaces remains a challenge, which until resolved, will impose an impediment to the contributions that the Sum of Distance method can make. There is however, hope. At present the code which generates the Sum of Distance matrix is exhaustive, calculating the distance to every data point for every grid point. This is massively inefficient if the power exponent p is less than zero, since each inverse distance value will approach zero quickly. This suggests that order-of-magnitude sppeed enhancements could be made by cleverly written code, if a maximum distance was defined for each grid element. For example, assume that a high resolution 4 dimensional, 50 x 50 x 50 x 50 cube was being generated, but that the distance limit constrained calculations such that each data point only contributed to 64 grid cells (3 on each side). The latter process would run in something like 2/1000th of the time required for the former.
If such a speed-up was effective, then higher dimensional datasets could be generated, and the spaces searched automatically for interesting points. These sets of higher dimensional points could in turn be used to perform clustering, or to help define the convex hull of a Spectral Mixing Model. Multidimensional Scaling can reduce higher order topologies with a minimum of distortion in the inter-point distances [Kruskal and Wish, 1978]. This technique might be reasonably applied to automatically generated higher dimensional sets of clusters, bringing them down to smaller sets of dimensions comprehensible to our mortal minds.
Summary
In this study we developed a new data analysis technique, the Sum of Distance method, and used it to examine and cluster a TM Scene from Maraba. The TM scenes were converted to Principal Component scores, and it was seen that these scores reiterated the basic covariance structure of the data. The principal components appeared in the ‘typical’ KT transform format – brightness, greenness, NPV and Wetness. Bivariate sum of inverse distance squared plots were then generated for PC1 and PC2. A model SoD plot was also generated for a theoretical a bivariate (independent) normal distribution with the same variance values. This theoretical model could be improved by including a covariance term. The residuals (SoD-SoDT) were examined, and showed large outliers, lying near the usual locations of the 3-em shade, vegetation, and soil members. Both the raw SoD grid and the residual SoD grid were passed through a 3 by 3 highpass filter. The highest 12 filtered residual points were then selected as cluster centroids. The resultant clustered image appears to be physically reasonable, except for some confusion regarding clouds and burned areas. This is perhaps to be anticipated, given the principal components which were used.
Our experiments with higher order clustering were evocative but frustrating. Since a straight forward application of the SoD method seemed to generate good clusters in the bivariate case, this would probably be the case in higher dimensions. One issue, which has heretofore been avoided, is the question of appropriate distance metrics. In this study, simple euclidean distance was used. It is not clear that this was better than using measures of statistical distance, where each deviation value is normalized by the standard deviation of the associated variable. As higher principal components are used this issue becomes increasingly important, since by definition they have lower and lower variance levels. It may be that using an MNF transform, or MeSMA produced fractions will obviate this issue.
Future Research
This present study is lacking in many respects, paramount of which is the need to tie the actual clusters more tightly with the biophysical character of the remotely sensed image. Do the clusters make sense? Do they map to physical reality, or are they chimerical? These can only be answered by examining this and other SoD clusterings more rigorously. The SoD program needs to be written much efficiently, allowing fast calculations of higher dimensional grids. An image display module need to be written and linked to the SoD displays, allowing users to circle SoD regions and have the corresponding pixels highlighted on an image. Together these two innovations would allow the technique to be applied to higher dimensional data.
We hope to apply the improved SoD clustering technique to each of the Maraba scenes analyzed by Roberts: 1984, 1989, 1992, 1994. Comparisons of the resultant annual SoD fields will hopefully offer a unique way to visualize and quantify landcover change. Additionally, interclass transition matrices will be generated, and the possibility of simulating landcover change will be examined.
Bibliography
Breiman, L, Freidman, J. H., Olshen, R.A., Stone, C. J. (1984). Classification
and Regression Trees. Wadsworth International Group, Belmont, CA.
Everitt, B. (1980). Cluster Analysis (second edition). Halsted, New York.
Hartigan, J. A. (1975). Clustering Algorithms. Wiley. New York, NY.
Lillesand, T.M., Kiefer, R. W., (1987). Remote Sensing and Image Interpretation, John Wiley and Sons, New York, NY.
Neft, D. S. (1966). Statistical Analysis for Areal Distributions, Regional Science Research Institute, Philadelphia, PA.
Richards, John A., (1993). Remote Sensing Digital Image Analysis, An Introduction. Springer-Verlag, Berlin.
Roberts, D.A., Batista, G., Pereira, J., Waller, E., and Nelson, B., (In Press), Remote Sensing Change Detection: Environmental Monitoring Application and Methods. Elvidge, C.D., and Lunetta, R. Eds, Ann Arbor Press, Ann Arbor, MI.
Tukey, J. (1977), W., Exploratory Data Analysis, Addison-Wesley.