LECTURE 12 -
HIERARCHICAL DATA STRUCTURES
A.
INTRODUCTION
B.
INDEXING PIXELS
C.
THE QUADTREE
D.
VARIANTS OF QUADTREES
E.
ADVANTAGES OF HIERARCHICAL DATA STRUCTURES
LECTURE 12 - HIERARCHICAL DATA STRUCTURES
A.
INTRODUCTION
-
different scan orders produce only small differences in compression
-
the major reason for interest in Morton and other hierarchical
scan orders is for faster data access
-
the amount of information shown on a map varies enormously
from area to area, depending on the local variability
-
it would make sense then to use rasters of different sizes
depending on the density of information
-
large cells in smooth or unvarying areas, small cells in
rugged or rapidly varying areas
-
unfortunately unequal-sized squares won't fit together ("tile
the plane") except under unusual circumstances
-
one such circumstance is when small squares nest within large
ones
-
there are, however, some methods for compressing raster data
that do allow for varying information densities
B.
INDEXING PIXELS
-
consider the 16 by 16 array in which just one cell is different
-
notation: row and column numbering starts at 0
-
thus the odd cell is at row 4, column 7
Procedure
Decoding
locations
-
the conversion to row and column is the same as for decoding
Morton numbers except that in this case the code is in base 4
-
in the example the lone B pixel is assigned code 0311
1. convert the code to base 2
-
hint: every base 4 digit converts to a pair of base 2 digits
-
thus 0311 becomes 00110101
2. separate the bits to get:
-
row 0100 = 4
-
column 0111 = 7
-
so the numbering system is just the Morton numbering of blocks,
expressed in base 4
-
however, sequence and data compression are not the most useful
aspects of this concept
C.
THE QUADTREE
-
can express this sequencing as a tree
-
the top is the entire array
-
at each level there is a four-way branching
-
each branch terminates at a homogeneous block
-
see illustration
-
the term quadtree is used because it is based on a rule of
4
-
each of the terminal branches in the tree (the ones having
values) is known as a leaf
-
in this case there are 13 leafs or homogeneous square blocks
Coding
quadtrees
-
to store this tree in memory, need to decide what to store
in each memory location
-
there are many ways of storing quadtrees, but they all share
the same basic ideas
-
one way is to store in each memory location EITHER: 1. the
value of the block (e.g. A or B), or 2. a pointer to the first of the four
"daughter" blocks at the next level down
-
all four daughter blocks of any parent always occur together
-
thus, the quadtree might be stored in memory as:
Position: 1
2 3 4
5 6 7
8 9 10
11 12 13
14 15 16
17
Contents: 2
6 A A
A A A A
10 A 14
A A A
B A A
(level): 0
1 1 1
1 2 2
2 2 3
3 3
3 4
4 4
4
-
the content of position 1 is a pointer indicating that the
map is subdivided into four blocks whose contents can be found starting
at position 2
-
position 2 indicates that the four parts of the 0 block can
be found beginning at position 6
-
positions 3, 4 and 5 indicate that the other three level
1 blocks are all A and are not further subdivided
-
see illustration
Accessing
data through a quadtree
-
consider two ways in which this quadtree may be accessed:
1. find all parts of the map with a given value 2. determine the contents
of a given pixel
-
notation: if the array has 2n by 2n
pixels
-
there are n possible levels in the tree, or n+1 if we count
the top level (level 0)
-
use m for the number of leafs
1. to find the parts of a map with a given value we must
examine every leaf to see if its value matches the one required
-
this requires m steps as there are m leafs
2. to find the contents of a given pixel, start at the top
of the tree
-
if the entire map is homogeneous, stop as the contents of
the pixel are known already
-
if not, follow the branch containing the pixel
-
do know which branch to follow:
-
take the row and column numbers, write them in binary, interleave
the bits, and convert to base 4
-
e.g. row 4, column 7 converts to 0311
-
at each level, use the appropriate digit to determine which
branch to follow
-
e.g. for 0311, at level 0 follow branch 0, at level 1 follow
branch 3, etc.
-
in the worst case, may have to go to level n to find the
contents of the pixel, so the number of steps will be n
Comparison
of different data structures
-
summary of the work needed to perform the two types of queries:
Option Find
parts with given value Find contents
of pixel
Quadtree
m
n
Row by row
22n (a)
1 (b)
Row by row
run encoded
m (c)
m (d)
Morton
run encoded
m (c)
m (d)
-
notes: (a) must examine every pixel in the array
(b) can calculate the position of every pixel and access
it directly
(c) the number of runs will be approximately the same
as the number of leafs
-
although the orders are different, we decided earlier that
order made little difference to the number of runs
(d) each run must be examined to see if it contains the pixel
-
thus, quadtree structures offer definite advantages over
other systems for queries
D.
VARIANTS OF QUADTREES
-
the octree (or octtree) is a three-dimensional version of
the quadtree, based on a rule of 8
-
the cube is divided recursively into eight pieces
-
octrees are useful for 3D data, particularly in mining and
geology, and in medical imaging
-
global data presents significant problems
-
we might use a projection such as the Mercator, and represent
the data as a raster on this projection
-
the area and shape of the area represented by each pixel
would be significantly distorted, particularly near the poles
-
the relationships between neighboring pixels would be distorted
as well
-
in reality all of the pixels in the top and bottom rows are
neighbors of each other across the poles
-
these problems create serious distortions in models based
on such data
-
a more suitable approach devised by Geoffrey Dutton is as
follows:
-
project the globe onto an octahedron, consisting of eight
triangles
-
the vertices of the octahedron are at the poles, and 90 degrees
apart around the equator
-
number these from 0 through 7
-
each triangle is recursively divided using a rule of 4 into
4 smaller triangles, by connecting the midpoints of the edges
-
number these 0 (central triangle), 1 (vertically above or
below), 2 (diagonally to the left) and 3 (diagonally to the right)
-
level 20 in this scheme has a resolution (triangle size)
of about 1 m on the earth's surface
-
its address requires one base 8 digit and 20 base 4 digits,
or 43 (binary) bits
E.
ADVANTAGES OF HIERARCHICAL DATA STRUCTURES
-
both coordinates are in a single address
-
a single number indicates a 2D location
-
every square meter on the earth's surface has a consistent
21-digit address
-
resolution is automatically known from the length of the
address
-
in the global scheme described above, a 21-digit address
has 1 m resolution, for 1 km resolution we need only a 13-digit address
in comparison, a lat/long address needs two numbers, and
it is not always easy to tell resolution from the way the numbers are presented