1 : Principal Components Analysis

The exercise uses seven image files, H87TM1 through H87TM7, one for each Landsat TM band. Each image is distributed together with its documentation.

Introduction

Principal Components Analysis is an image transformation technique that is used for a variety of purposes in Remote Sensing and GIS, including data compression and change analysis. In this exercise we will explore the nature of the Principal Components transformation and its application in data compression. By doing so, we will also be able to gain an appreciation for the fundamental information content of the different bands associated with a multi-spectral image.

The spectral bands of a multi-spectral image most commonly do not contain completely independent information. More likely, there will be some degree of correlation between bands, indicating that they share elements of information in common. To illustrate this, consider the example in Figure 1.

Figure 1 depicts the reflectance levels for a set of pixels by plotting their positions in what is commonly called band space (in this case, for an image with two spectral bands). Each of the axes represents reflectance in the spectral band indicated. Each image pixel can thus be plotted in this space by placing its location at the intersection of its reflectance level on each band. As can be noted, there is a significant amount of correlation between the bands (i.e., if a pixel has a high reflectance on Band 1, it is likely also to have a high reflectance on Band 2).

Since the bands in Figure 1 are correlated, they do not each carry independent information. The fact that you have a good chance of being able to predict the reflectance of a pixel on one band from its reflectance on the other confirms this -- there is some degree of redundancy in the information they carry. It is this redundancy that Principal Components seeks to remove when it is used for the purpose of data compression. By removing redundant data, we are left with the same information, but with smaller volumes of data. Unfortunately, the process is not so clear cut. Unless the bands are perfectly correlated, some independent information will always exist in each band. Thus removing data after transforming to remove redundancy always implies some loss of information. However, as we shall see in this exercise, the information we typically reject in order to achieve data compression is often inconsequential or, indeed, even undesirable.
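The link between correlation and predictability can be made concrete with a small numerical sketch. The two bands below are synthetic stand-ins (they are not the exercise data), and NumPy is assumed to be available. The sketch measures the correlation between the bands and the share of Band 2's variance that cannot be predicted from Band 1, which for a straight-line fit equals 1 minus the squared correlation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-band image: band2 is largely predictable from band1.
band1 = rng.uniform(20, 120, size=1000)
band2 = 0.8 * band1 + rng.normal(0, 8, size=1000)

# Pearson correlation between the two bands.
r = np.corrcoef(band1, band2)[0, 1]

# Predict band2 from band1 with a least-squares straight line.
slope, intercept = np.polyfit(band1, band2, 1)
residual = band2 - (slope * band1 + intercept)

# Share of band2's variance that band1 cannot predict (equals 1 - r**2).
unexplained = residual.var() / band2.var()
print(f"r = {r:.3f}, unexplained share = {unexplained:.3f}")
```

The unexplained share is the independent information that would be lost if Band 2 were simply discarded.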

The Principal Components transformation is a linear transformation that is closely related to Factor Analysis. Its structure can be described as a weighted linear combination -- i.e., it produces a new set of bands (called components) by multiplying each of the bands in the original image by a weight, and adding the results (note 1).

e.g. C = w1B1 + w2B2 + w3B3 + ... + wnBn

The weights in the transformations are collectively known as the eigenvectors. For any given number of original bands, an equal number of transformation equations can be produced, thus yielding an equivalent number of component images.
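As a minimal sketch of the weighted linear combination, the snippet below forms one component image from a tiny three-band stack. The pixel values and the weights are invented for illustration only; in a real analysis the weights would come from the eigenvectors computed for the image. NumPy is assumed.

```python
import numpy as np

# Hypothetical 3-band image stack, shape (bands, rows, cols).
bands = np.array([
    [[10.0, 20.0], [30.0, 40.0]],
    [[12.0, 18.0], [33.0, 41.0]],
    [[50.0, 45.0], [20.0, 10.0]],
])

# Hypothetical weights w1..w3 for a single component.
w = np.array([0.6, 0.6, -0.5])

# C = w1*B1 + w2*B2 + w3*B3, evaluated for every pixel at once.
component = np.tensordot(w, bands, axes=1)

# The same result written out band by band.
explicit = w[0] * bands[0] + w[1] * bands[1] + w[2] * bands[2]
print(component)
```

Each set of weights yields one component image, so n bands with n sets of weights yield n components.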

Perhaps the easiest way to understand the result of the Principal Components transformation is to think of the process as a mathematical determination of a new set of axes in band space such that the resulting images are:

  1. uncorrelated with one another; and
  2. ordered in terms of their explanatory power.

This is illustrated in Figure 1, in which Component 1 (CI) is oriented along the axis of largest variation. Component 2 (CII) will be, by definition, perpendicular to Component 1. Note that it is oriented in the direction of lesser variation. Because there were two input bands, there may also be two output components, and these components describe all of the information inherent in the original set (indeed, it is possible to reverse the transformation and thereby reconstruct the original band set).
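The properties just described can be checked numerically. The sketch below uses synthetic two-band data (not the exercise images) and NumPy's eigen-decomposition of the covariance matrix, which is one common way to obtain the new axes; your software may organise the computation differently.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical correlated two-band data, one row per pixel.
b1 = rng.normal(60, 20, size=2000)
b2 = 0.7 * b1 + rng.normal(0, 6, size=2000)
pixels = np.column_stack([b1, b2])

mean = pixels.mean(axis=0)
cov = np.cov(pixels, rowvar=False)

# eigh returns eigenvalues in ascending order; reorder so C1 comes first.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project the centred pixels onto the new axes (columns of eigvecs).
components = (pixels - mean) @ eigvecs

# 1. Uncorrelated: the off-diagonal covariance of the components is ~0.
comp_cov = np.cov(components, rowvar=False)

# 2. Ordered: the variance of C1 is at least that of C2.
# 3. Perpendicular axes: the dot product of the two eigenvectors is ~0.
axis_dot = eigvecs[:, 0] @ eigvecs[:, 1]

# 4. Reversible: the original bands are recovered from the components.
restored = components @ eigvecs.T + mean
print(comp_cov.round(6), axis_dot, np.allclose(restored, pixels))
```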

Given this form of output, we can now see how data compression can be achieved. As a result of the transformation, redundancy in the data has already been removed (to test this, see if you feel you can predict the value on Component 2 given its value on Component 1). However, we still have the same amount of data (we started with two bands and end up with two). To achieve data compression we will have to get rid of one or more of the new component images. In this simple example, consider what would happen if you were to get rid of Component 2 and keep only Component 1. Since Component 1 explains the major element of variation, most pixels would be meaningfully related to each other (i.e., be distinguishable from one another with roughly the same relative difference). If it is known, for example, that Component 1 contains 90% of the original information, then we will have kept 90% of the information while retaining only half of the original data.
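The trade-off between data volume and information can be illustrated with a short sketch. It again uses synthetic two-band data and NumPy; the specific numbers are assumptions chosen only to give strongly correlated bands. Keeping Component 1 alone halves the data, and the variance lost in the reconstruction equals the share that Component 2 held.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical strongly correlated two-band data, one row per pixel.
b1 = rng.normal(60, 20, size=2000)
b2 = 0.9 * b1 + rng.normal(0, 5, size=2000)
pixels = np.column_stack([b1, b2])

mean = pixels.mean(axis=0)
centred = pixels - mean
eigvals, eigvecs = np.linalg.eigh(np.cov(centred, rowvar=False))
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # C1 first

# Proportion of total variance carried by Component 1.
share_c1 = eigvals[0] / eigvals.sum()

# Keep only Component 1: one value per pixel instead of two.
c1 = centred @ eigvecs[:, 0]

# Approximate reconstruction of both bands from Component 1 alone.
approx = np.outer(c1, eigvecs[:, 0]) + mean

# Proportion of variance lost by discarding Component 2.
lost = ((pixels - approx) ** 2).sum() / (centred ** 2).sum()
print(f"kept {share_c1:.1%} of the variance with half the data")
```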

The procedure just described may seem risky since we achieve data compression at the cost of some information loss. However, as will be seen in the next example, the decision about how to balance these two is often not difficult.

The Data Set

The data to be used for this exercise consist of seven bands of Landsat TM data for a location just to the west of Worcester, Massachusetts, in the USA. The images are intentionally small (just 72 columns by 86 rows) in order to allow the effects on individual pixels to be seen. The date of the image is September 10, 1987, just at the end of the summer season. The area is largely covered by deciduous forest, with distinctive stands of conifers (largely red and white pine) planted in the vicinity of reservoirs to reduce the input of tannins from deciduous species and thereby enhance the visual quality of the water supply. A small residential area is located to the north of the image.

Procedure

1.P1 Use the display system of your software to examine the image named H87TM4 as a grey level image. If you undertake any contrast stretching, use either a linear or a linear with saturation contrast procedure. (implementation note 1.P1)

The near-infrared band is often the most informative band from a multi-spectral set. The water body farthest to the west and that farthest south are reasonably deep reservoirs while the other lakes are shallow ponds. To the north of the image is a small residential area. Otherwise the area is predominantly forested. To the north of both reservoirs can be found distinctive conifer plantations (also a small one to the immediate west of the southern reservoir). Because of their needle leaf structure these conifers (largely white and red pine) appear darker on this band than the more prevalent deciduous trees.
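If your software does not offer the two contrast procedures named in step 1.P1, they can be approximated as in the sketch below. The function name linear_stretch, the 0-255 output range, and the 2.5% saturation level are all assumptions made for illustration, and the band is a random stand-in with the same dimensions as the exercise images. NumPy is assumed.

```python
import numpy as np

def linear_stretch(band, saturation=0.0):
    """Rescale a band to 0-255. `saturation` is the percentage of pixels
    clipped at each tail; 0 gives a plain minimum-maximum linear stretch."""
    lo, hi = np.percentile(band, [saturation, 100.0 - saturation])
    if hi == lo:
        # A constant band has no contrast to stretch.
        return np.zeros_like(band, dtype=np.uint8)
    scaled = (band.astype(float) - lo) / (hi - lo)
    return (np.clip(scaled, 0.0, 1.0) * 255).round().astype(np.uint8)

# Hypothetical stand-in for one band: 86 rows by 72 columns.
rng = np.random.default_rng(3)
band = rng.integers(10, 90, size=(86, 72))

plain = linear_stretch(band)                     # linear
saturated = linear_stretch(band, saturation=2.5)  # linear with saturation
print(plain.min(), plain.max(), saturated.min(), saturated.max())
```

Both procedures preserve the relative ordering of pixel values, which is why they are suitable for comparing bands and components visually.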

1.P2 Now examine all of the other bands in this set using a similar display procedure. The names for the bands range from H87TM1 for TM Band 1 to H87TM7 for TM band 7. If your system permits it, display them simultaneously on the screen. (implementation note 1.P2)

1.Q1 Comparing the seven bands to each other, visually estimate the two pairs of images that appear to be the most alike. Then visually estimate the two pairs of images that appear to be the most dissimilar. Which images are these? Overall, does it appear that there is much redundancy? Make a very rough guess about what proportion of the data in these seven bands is truly unique (note 2). Don't worry about making an incorrect or imprecise guess.

1.P3 Now run your software's Principal Components Analysis routine. If you are given a choice between Standardized and Unstandardized, choose Unstandardized. Indicate that you wish to create 7 component images and, when required, specify the names of the input bands: H87TM1 through H87TM7. You may be offered options for scaling of these output images -- simply choose whatever defaults are offered. Then print out any tabular results or summary statistics it produces. (implementation note 1.P3)

Your software should offer several tables of information about the transformation undertaken. These might include:

COR MATRIX    h87tm1     h87tm2     h87tm3     h87tm4     h87tm5     h87tm6     h87tm7
h87tm1      1.000000   0.865584   0.896827   0.212218   0.528601   0.668724   0.727225
h87tm2      0.865584   1.000000   0.907277   0.448274   0.732874   0.629802   0.855495
h87tm3      0.896827   0.907277   1.000000   0.215375   0.598823   0.690495   0.814087
h87tm4      0.212218   0.448274   0.215375   1.000000   0.827869   0.078081   0.565691
h87tm5      0.528601   0.732874   0.598823   0.827869   1.000000   0.403770   0.896793
h87tm6      0.668724   0.629802   0.690495   0.078081   0.403770   1.000000   0.604335
h87tm7      0.727225   0.855495   0.814087   0.565691   0.896793   0.604335   1.000000

COMPONENT   % var.   eigenval.
C1           86.53      890.14
C2           11.16      114.83
C3            1.51       15.53
C4            0.37        3.85
C5            0.17        1.79
C6            0.17        1.78
C7            0.07        0.76
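The percentage of variance listed for each component is simply its eigenvalue divided by the sum of all the eigenvalues. The sketch below reproduces that arithmetic from the eigenvalues in the component table and adds a running total, which shows how much variance is retained if only the first k components are kept. The variable names are illustrative.

```python
# Eigenvalues as listed in the component table.
eigenvalues = [890.14, 114.83, 15.53, 3.85, 1.79, 1.78, 0.76]

total = sum(eigenvalues)
percent = [100.0 * ev / total for ev in eigenvalues]

# Running total: variance retained when keeping the first k components.
cumulative = []
running = 0.0
for p in percent:
    running += p
    cumulative.append(running)

for k, (p, c) in enumerate(zip(percent, cumulative), start=1):
    print(f"C{k}: {p:6.2f}%  cumulative {c:6.2f}%")
```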

LOADING       h87tm1     h87tm2     h87tm3     h87tm4     h87tm5     h87tm6     h87tm7
C1          0.366946   0.599055   0.398372   0.972001   0.935887   0.226973   0.734338
C2          0.722886   0.638897   0.801506  -0.231374   0.333748   0.680047   0.646630
C3          0.531826   0.369463   0.372870   0.041073  -0.110188   0.329322   0.011719
C4         -0.089418  -0.084987  -0.090066   0.001944  -0.001728   0.638006   0.000000
C5         -0.212893   0.166828   0.176656   0.001635  -0.005435   0.015140  -0.006569
C6         -0.047863  -0.034201  -0.037718   0.004109  -0.022295  -0.036257   0.207079
C7         -0.000219  -0.238578   0.104731   0.001197  -0.000563  -0.000659  -0.000684
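For readers curious about how tables of this kind arise, the sketch below produces a correlation matrix, eigenvalues, percentages of variance, and loadings for a synthetic seven-band stack. It is an illustration under stated assumptions, not a description of how any particular package works: the data are random stand-ins for the exercise images, NumPy is assumed, and the loadings are taken to be the correlations between each original band and each component.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical 7-band stack, 86 rows by 72 columns, built from two
# underlying patterns plus noise so that the bands are correlated.
rows, cols, n_bands = 86, 72, 7
p1 = rng.normal(size=rows * cols)
p2 = rng.normal(size=rows * cols)
mix = rng.uniform(0.2, 1.0, size=(n_bands, 2))
pixels = 40 + 15 * (np.outer(p1, mix[:, 0]) + np.outer(p2, mix[:, 1]))
pixels += rng.normal(0, 2, size=pixels.shape)   # shape (pixels, bands)

# Correlation matrix between the bands.
cor = np.corrcoef(pixels, rowvar=False)

# Unstandardized PCA: eigen-decomposition of the covariance matrix.
cov = np.cov(pixels, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # largest first
percent_var = 100 * eigvals / eigvals.sum()

# Loadings: row i is component i+1, column j is band j+1.
band_std = np.sqrt(np.diag(cov))
loadings = (eigvecs * np.sqrt(eigvals)).T / band_std

# Cross-check against correlations measured directly on the images.
components = (pixels - pixels.mean(axis=0)) @ eigvecs
direct = np.corrcoef(components, pixels, rowvar=False)[:n_bands, n_bands:]
print(np.round(percent_var, 2))
```

The cross-check confirms that, under this definition, a loading can be read directly as the correlation between a band and a component.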

1.Q2 Using the correlation matrix, which pair of original bands was the most correlated? Which pair was the least correlated? Compare these to your original guesses in 1.Q1.

1.Q3 Using the loadings chart, which of the original bands is most correlated with Component 1? What is the level of correlation?

1.P4 Use your display system to bring the image for Component 1 onto the screen (again as a grey level image with only a simple linear contrast stretch). Then bring up the image you identified in 1.Q3 in a similar fashion for comparison. Notice that they look almost identical. (implementation note 1.P4)

1.Q4 What is the percentage of variance explained by Component 1?

1.Q5 Using the loadings chart, which of the original bands is most correlated with Component 2? What is the level of correlation?

1.P5 Use your display system to bring the image for Component 2 onto the screen (again as a grey level image with only a simple linear contrast stretch). Then bring up the image you identified in 1.Q5 in a similar fashion for comparison. (implementation note 1.P5)

1.Q6 What is the percentage of variance explained by Component 2? What is the combined percentage of variance explained by Components 1 and 2? Since percent of variance can be equated with information (in an Information Theoretic sense), what proportion of information resides in the remaining 5 components?

1.P6 Let's now jump to the other end of the sequence. Use your display system to bring the image for Component 7 onto the screen (again as a grey level image with only a simple linear contrast stretch). (implementation note 1.P6)

1.Q7 How would you describe the pattern you see on the screen? Would you describe this as information? What proportion of variance (i.e., information) does this component describe?

This is a fascinating image! Notice that there are some elements which appear to have a somewhat systematic horizontal pattern. This image most probably represents a combination of system noise and atmospheric interference. Clearly there is little here that we might wish to save. Thus discarding the band entirely would not only be of little concern, but would in fact most likely be considered a benefit (since we are discarding noise). The percentage of variance explained by this component is indicative of information only in an Information Theoretic sense, where information is equated with variation. However, this is not meaningful information. Therefore we can discard it without concern.

1.P7 Now use your display system to bring the images for Components 6, 5, 4 and 3 onto the screen (again as a grey level image with only a simple linear contrast stretch). (implementation note 1.P7)

1.Q8 What is the percent variance explained by each of these images? Moving from the component which explains the least to that which explains the most, at what point do you start to see evidence of real geographic features?

1.Q9 For purposes of data compression, one wishes to minimize the loss of geographically meaningful information while maximizing the amount of data reduced. Which components do you feel should be kept, and which ones should be rejected?

1.Q10 Given your choices in 1.Q9, what proportion of the data have you kept (i.e., what proportion of the original number of bands)? What is the proportion of variation (i.e., information) retained?

Observations

As we have seen in this exercise, multi-spectral images often possess a significant amount of redundancy in the data carried by each band. The Principal Components transform allows us to remove this redundancy, and thereby to select an informationally-rich subset of the components produced -- an effective means of data compression.

During the course of this exercise, we have also observed several other important points. First, we saw that one band (the near-infrared) carried an enormous amount of the geographically meaningful information inherent to this data set. This will not always be the case, but is commonly so in vegetated landscapes (since leaf structure differences show up in this wavelength region most, and the contrast of vegetation with water and non-vegetated surfaces is strong). It is for this reason that the near-infrared channel is often a good one to examine if you are only able to view a single band.

Next in importance was the red wavelength band. Again, this is a result of the fact that the landscape examined here is highly vegetated. The red band is frequently called a chlorophyll absorption band since it is in this area of the electromagnetic spectrum that energy is absorbed most by chlorophyll for the purpose of photosynthesis.

We also noted that Components 1 and 2 carried almost 98% of all the information in the component set. Since these are most heavily correlated with TM bands 4 and 3 respectively, we can safely assume that in vegetated landscapes, these two bands will carry most of the geographically meaningful information. You can begin to see why counterparts of these two wavelength bands are found in SPOT multi-spectral imagery.

Finally, we noted that the PCA procedure was very effective in weeding out the noise in the image. Since the transformation can be reversed, this suggests that we should be able to reconstruct the original bands without these noise elements by simply forcing the reverse coefficients for the components concerned to zero before the reverse transformation. Also, note that it is this tendency to order information elements into meaningful groups that underlies the use of PCA in such areas as Change and Time Series Analysis.
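The idea of reversing the transformation with the noise components forced to zero can be sketched as follows. The four-band data set is synthetic (one shared signal plus independent noise), the choice to keep a single component is an assumption that suits this toy example, and NumPy is assumed. With real imagery the number of components to keep is the judgement exercised in 1.Q9.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical 4-band data: one shared signal plus independent noise.
signal = rng.normal(0, 20, size=5000)
weights = np.array([1.0, 0.9, 0.8, 0.7])
clean = 50 + np.outer(signal, weights)
noisy = clean + rng.normal(0, 2, size=clean.shape)

mean = noisy.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(noisy, rowvar=False))
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # C1 first

components = (noisy - mean) @ eigvecs

# Force the unwanted components to zero, then reverse the transformation.
keep = 1                      # number of leading components retained
filtered = components.copy()
filtered[:, keep:] = 0.0
restored = filtered @ eigvecs.T + mean

# Root-mean-square error against the noise-free bands, before and after.
rmse_before = np.sqrt(((noisy - clean) ** 2).mean())
rmse_after = np.sqrt(((restored - clean) ** 2).mean())
print(f"RMSE before: {rmse_before:.2f}, after: {rmse_after:.2f}")
```

In this toy case the restored bands are closer to the noise-free bands than the noisy inputs were, because most of the noise lived in the discarded components.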

Credits

This exercise was written by Ron Eastman at Clark University. The data were provided by EOSAT Corporation.

References

The Principal Components transform is described in many Remote Sensing texts. However, for an excellent intermediate-level discussion, you may wish to consult:

Jensen, J.R., (1986) Introductory Digital Image Processing: A Remote Sensing Perspective, (Prentice Hall: Englewood Cliffs, NJ).

For a more detailed account illustrating the mathematics of the transformation process, consult:

Richards, J.A., (1986) Remote Sensing Digital Image Analysis: An Introduction, (Springer-Verlag: Berlin).

For examples of the use of Principal Components for Change and Time Series Analysis, see:

Lodwick, G.D., (1979) "Measuring Ecological Changes in Multitemporal Landsat Data Using Principal Components", Proceedings, 13th International Symposium on Remote Sensing of the Environment, Vol. 2, 1131-1141.

Eastman, J.R., and Fulk, M., (1993) "Long Sequence Time Series Evaluation using Standardized Principal Components", Photogrammetric Engineering and Remote Sensing, 59, 8, 1307-1312.

Finally, for a more general discussion of the underlying logic of Principal Components Analysis and its relationship to Factor Analysis, see:

Johnston, R.J., (1980) Multivariate Statistical Analysis in Geography, (Longman : London).

Note 1.

Technically, this logic describes what is known as Unstandardized Principal Components (also known as the Karhunen-Loeve or Hotelling Transform). There is a second variant known as Standardized Principal Components in which the input data are effectively converted into standard scores (by subtracting the mean and dividing the result by the standard deviation) before transformation.
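The practical difference between the two variants can be seen in a small sketch. The three-band data set below is synthetic and deliberately gives the third band a much larger variance; the helper name first_axis is hypothetical, and NumPy is assumed. Converting to standard scores before the transformation is equivalent to working from the correlation matrix rather than the covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical 3-band data: bands 1 and 2 are correlated with each other,
# band 3 is unrelated to them but has a much larger variance.
base = rng.normal(size=3000)
pixels = np.column_stack([
    10 * base + rng.normal(0, 5, size=3000),
    8 * base + rng.normal(0, 5, size=3000),
    rng.normal(0, 60, size=3000),
])

def first_axis(matrix):
    """Return the eigenvector belonging to the largest eigenvalue."""
    eigvals, eigvecs = np.linalg.eigh(matrix)
    return eigvecs[:, np.argmax(eigvals)]

# Unstandardized: eigenvectors of the covariance matrix.
unstandardized_axis = first_axis(np.cov(pixels, rowvar=False))

# Standardized: convert each band to standard scores first.
z = (pixels - pixels.mean(axis=0)) / pixels.std(axis=0, ddof=1)
standardized_axis = first_axis(np.cov(z, rowvar=False))

print(np.round(unstandardized_axis, 3), np.round(standardized_axis, 3))
```

In the unstandardized case the first axis is dominated by the high-variance band, whereas in the standardized case it follows the two correlated bands, since every band then contributes the same variance.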

Note 2.

If you have trouble with this question, consider a case with four bands in which two appear almost identical and a third is quite similar, while the fourth is completely different. The image that looks completely different clearly carries new information and thus might be given a weight of 1. Only one of the virtually identical pair carries full information. Therefore give one of them a weight of 1 and the other a weight of 0.1. Finally, the image which looks fairly similar to the virtually identical pair only carries a limited amount of new information (say 25%). Therefore give it a weight of 0.25. Adding the weights one gets 1 + 1 + 0.1 + 0.25 = 2.35. Since there are four bands of data, this constitutes only 2.35 / 4 = 0.59. Very roughly, then, we might estimate that only 60% of the data offer unique information.
