GEOGRAPHY 275: SEMINAR IN GIS

What is geographic information?

information that is geographically referenced
every set of geographic information must contain some form of coordinate reference

some agreed method for identifying location on the Earth's surface

agreed in the sense that the method is known to all users of the information
information that links a point on the Earth's surface to some general property, and perhaps a time interval
the atom of geographic information <x, t, z>: location x, time t, property z
<120W, 34N; 10am 4/8/01; air temperature 10C>
if you had enough such atoms you could build up a complete picture of the Earth

an infinity of atoms would be needed
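As an illustration only, the atom <x, t, z> can be written down as a small record. The class name GeoAtom and its field names are hypothetical, and the date 4/8/01 is assumed to be month/day/year.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class GeoAtom:
    """One atom <x, t, z>: a location, a time, and a property value."""
    lon: float        # x: longitude in degrees (west is negative)
    lat: float        # x: latitude in degrees (north is positive)
    time: datetime    # t: when the property was observed
    prop: str         # z: name of the property
    value: float      # z: value of the property


# The example atom from the notes: <120W, 34N; 10am 4/8/01; air temperature 10C>.
# Assumption: 4/8/01 is read as April 8, 2001.
atom = GeoAtom(
    lon=-120.0,
    lat=34.0,
    time=datetime(2001, 4, 8, 10, 0),
    prop="air temperature (C)",
    value=10.0,
)
```

A complete picture of the Earth would need one such record for every location, time, and property, which is why compression methods are needed.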

compression methods
spatial aggregation
general property applies to a region

regional geography

area class maps

sampling
properties between samples can be interpolated
"Mt Everest is 8848m high"
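A minimal sketch of how properties between samples might be interpolated. Inverse distance weighting is used here only as one common choice, not as the method the notes have in mind; the function name idw, the power parameter, and the sample values are all assumptions.

```python
import math


def idw(samples, x, y, power=2.0):
    """Estimate a property at (x, y) from (sx, sy, value) samples
    using inverse distance weighting."""
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in samples:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            # The query point coincides with a sample: return it exactly.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Two hypothetical temperature samples 2 units apart.
samples = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
midpoint_estimate = idw(samples, 1.0, 0.0)  # equidistant, so the plain mean
```

Two stored samples stand in for every point between them, which is the sense in which sampling compresses geographic information.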
What is information?
is it physical stuff or something else?

how do I know when I have it?

can I measure how much I have?

Shannon-Weaver information theory
the information content of a message

information content associated with a code is determined by the number of options

26 options for a single letter in the Roman alphabet
one letter resolves uncertainty by specifying one of 26 options
some letters occur more often than others
value in resolving uncertainty varies inversely with rate of occurrence (a symbol with proportion p carries -log2 p bits)

e is less valuable than z

if there are n symbols in a code, and the proportion of each symbol i is denoted by p_i, then a measure of the information content of the code is its information statistic:
- (sum over i = 1..n) p_i log2 p_i
this is maximum (the code is most efficient) when all symbols are used equally
in this case the information statistic is log2 n

this is the number of binary digits (bits) needed to represent each symbol of the code

if there are 60,000 words in the English language, the most efficient code would have 16 bits (2 bytes)
English could be written in a language of pairs of 8-bit ASCII codes
instead of combinations of Roman letters ranging from 1 letter to ~16
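The information statistic above can be checked numerically. This is a small sketch; the function name information_statistic and the skewed example proportions are assumptions chosen for illustration.

```python
import math


def information_statistic(proportions):
    """Return -(sum over i) p_i * log2(p_i), in bits per symbol.
    Symbols with zero proportion contribute nothing."""
    return -sum(p * math.log2(p) for p in proportions if p > 0)


# 26 letters used equally often: the statistic reaches its maximum, log2 26.
uniform_letters = [1 / 26] * 26
h_uniform = information_statistic(uniform_letters)

# A hypothetical 4-symbol code with unequal use scores below log2 4 = 2 bits.
skewed = [0.5, 0.25, 0.125, 0.125]
h_skewed = information_statistic(skewed)

# Whole bits needed to give each of 60,000 words its own code.
bits_for_words = math.ceil(math.log2(60_000))
```

With equal use the result is log2 26, about 4.70 bits per letter; the skewed code gives 1.75 bits; and 60,000 words need 16 bits, matching the 2-byte figure in the notes.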
Shannon-Weaver is a syntactic theory
concerned with the syntax of communication
sender and receiver both understand the code
no concern for semantics
what does the code mean?
Issues not resolved by Shannon-Weaver
what if the receiver does not understand the code?

what if the receiver already knew the content?

no measure of the impact of the message on the receiver
theory is dependent on media
a message in code

what about paper maps?

System
a container of information
human being, computer, map
A system contains a given item of information if it is capable of resolving a query to which that information is the answer
I know the temperature in SB if I can tell you the temperature in SB
a map or GIS "knows" the temperature in SB if it can tell you....
a map requires a trained user
definition is medium-independent

suppose a system was capable of resolving all geographic queries

Digital Earth

it would contain all of the information necessary to create a complete geographic Earth

is any query capable of determining whether such a system is Digital Earth or the real Earth?

compare the Turing test

can any query determine whether the respondent is a machine or a human being?

are there exceptions to this idea?
facts about the Earth that cannot be derived from any representation of the Earth?

naive geography

Information links already-familiar concepts
Mt Everest is 8848m high
meaningless to someone who does not understand the concept Mt Everest or the concept 8848m high

geographic information is meaningless to someone who does not understand the Earth's space-time frame

a concept is understood or already familiar if it is already linked to other concepts
Mt Everest to images, facts, coordinates

8848m to systems of measurement, definition of m, definition of high

information's semantic content is related to the number of links that already exist in the receiving system for the concepts at both ends of the link
and is zero if the link it contains already exists in the receiving system
the semantic value of an atom of information is determined by the receiving system
not by the sender, the channel, or the message
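A toy sketch of this idea, with loud caveats. The notes say only that semantic content is "related to" the number of existing links and is zero when the link is already present. The product used below is an illustrative assumption, chosen so that the value is also zero when either concept is unfamiliar. The function names and the receiver's contents are hypothetical.

```python
def semantic_value(links, a, b):
    """Toy semantic value of the message "a is linked to b" for a receiver.
    links maps each concept to the set of concepts it is already linked to."""
    known_a = links.get(a, set())
    known_b = links.get(b, set())
    if b in known_a or a in known_b:
        return 0  # the receiver already holds this link
    # Assumption: the product of existing link counts. It is zero if either
    # concept is unknown to the receiver, so the message is meaningless.
    return len(known_a) * len(known_b)


def learn(links, a, b):
    """Store the link a <-> b in the receiving system."""
    links.setdefault(a, set()).add(b)
    links.setdefault(b, set()).add(a)


# A hypothetical receiver that already understands both concepts.
receiver = {
    "Mt Everest": {"images", "coordinates"},
    "8848m": {"systems of measurement", "definition of m"},
}
value_before = semantic_value(receiver, "Mt Everest", "8848m")
learn(receiver, "Mt Everest", "8848m")
value_after = semantic_value(receiver, "Mt Everest", "8848m")
value_unknown = semantic_value(receiver, "an unfamiliar peak", "8848m")
```

The same message scores differently for different receivers, and scores zero once it has been learned, which is the sense in which value is determined by the receiving system.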
Example:
20 coordinate pairs representing the outline of California
each resolved to 1m (Mercator projection)
Shannon-Weaver perspective
each x coordinate specifies one of 40,000,000 options (26 bits)
each y coordinate specifies one of 20,000,000 options (25 bits)

51 bits in total per coordinate pair

8 bytes per pair (rounded up from 51 bits)
160 bytes for the whole message (20 pairs)
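The arithmetic behind these figures can be reproduced as follows. The helper name bits_needed is hypothetical. Note that 51 bits fit in 7 whole bytes; the 8 bytes in the notes presumably round up to a 64-bit word, and that reading is an assumption.

```python
import math


def bits_needed(options):
    """Whole bits needed to pick one of `options` equally likely values."""
    return math.ceil(math.log2(options))


x_bits = bits_needed(40_000_000)   # one x coordinate at 1m resolution
y_bits = bits_needed(20_000_000)   # one y coordinate at 1m resolution
pair_bits = x_bits + y_bits        # one coordinate pair

min_pair_bytes = math.ceil(pair_bits / 8)  # smallest whole number of bytes
pair_bytes = 8                     # figure used in the notes (assumed 64-bit word)
message_bytes = 20 * pair_bytes    # 20 coordinate pairs
```

This gives 26 + 25 = 51 bits per pair and 160 bytes for the 20-pair outline.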
How many distinct and new geographic queries does this information allow the receiving system to resolve?
how many queries of the form "is x,y in California"?
how many queries of the form "is z in California", given knowledge of the location of z?
1. Infinity
but coordinates are only resolved to 1m
so answers should be also
2. The number of sq m in California
about 411,000,000,000 (411,000 sq km)

the system appears to be extremely knowledgeable

because it possesses a simple rule that allows a very short message to resolve an enormous number of apparently independent queries
the rule (or theory, generalization) compresses information at a very high rate
allows a small amount of information to be expanded
compare the rule for generating the infinite number of digits of pi
the rule:
the sequence of ordered coordinate pairs defines a polygon
all points within this polygon have the property specified in the message
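The rule can be sketched as a point-in-polygon test. Ray casting is used here as one standard technique; the notes do not name a method. The function name and the square test polygon are assumptions, and points exactly on the boundary are not treated specially.

```python
def point_in_polygon(x, y, polygon):
    """Return True if (x, y) lies inside the polygon given as an ordered
    list of (x, y) vertices. Casts a ray in the +x direction and counts
    how many edges it crosses; an odd count means inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the polygon
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x position where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# A hypothetical 4-vertex polygon standing in for the 20-pair outline.
square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
inside_answer = point_in_polygon(2.0, 2.0, square)   # True
outside_answer = point_in_polygon(5.0, 2.0, square)  # False
```

A few lines of rule plus 20 stored coordinate pairs resolve every "is x,y in California" query, which is the compression the notes describe.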
Conclusions:
geographic information consists of atoms that link locations, and perhaps times, to properties

information is possessed when queries can be resolved

information is not physical stuff
Shannon-Weaver information theory deals with the efficiency of codes

the semantic value of information is determined by the receiving system alone

by the number of new queries that can be answered using the information

by the number of links already existing for the pairs of concepts that the information links

geographic information has properties that allow answers to queries to be imputed, predicted, and otherwise obtained from rules
Tobler's law

uniform regions