Chapter 4

GIS Data Model: Raster Data Model

A. THE DATA MODEL (Raster Data Model)


geographical variation in the real world is infinitely complex
the closer you look, the more detail you see, almost without limit
it would take an infinitely large database to capture the real world precisely
data must somehow be reduced to a finite and manageable quantity by a process of generalization or abstraction
geographical variation must be represented in terms of discrete elements or objects
the rules used to convert real geographical variation into discrete objects are the data model
Tsichritzis and Lochovsky (1977) define a data model as "a set of guidelines for the representation of the logical organization of the data in a database... (consisting) of named logical units of data and the relationships between them"
current GISs differ according to the way in which they organize reality through the data model
each model tends to fit certain types of data and applications better than others
the data model chosen for a particular project or application is also influenced by:
- the software available
- the training of the key individuals
- historical precedent

there are two major choices of data model: raster and vector

raster model
divides the entire study area into a regular grid of cells in a specific sequence
the conventional sequence is row by row from the top left corner
each cell contains a single value
is space-filling, since every location in the study area corresponds to a cell in the raster
one set of cells and associated values is a layer
there may be many layers in a database, e.g. soil type, elevation, land use, land cover

vector model
uses discrete line segments or points to identify locations
discrete objects (boundaries, streams, cities) are formed by connecting line segments
vector objects do not necessarily fill space; not all locations in space need to be referenced in the model

a raster model tells what occurs everywhere, at each place in the area
a vector model tells where everything occurs, giving a location to every object
conceptually, the raster models are the simplest of the available data models

B. CREATING A RASTER
consider laying a grid over a geologic map
create a raster by coding each cell with a value that represents the rock type which appears in the majority of that cell's area
when finished, every cell will have a coded value
in most cases the values that are to be assigned to each cell in the raster are written into a file, often coded in ASCII
this file can be created manually by using a word processor, database or spreadsheet program, or it can be created automatically
it is then normally imported into the GIS so that the program can reformat the data for its specific processing needs
there are several methods for creating raster databases

Cell by cell entry
direct entry of each layer cell by cell is simplest
entry may be done within the GIS or into an ASCII file for importing
each program will have specific requirements
the process is normally tedious and time-consuming
a layer can contain millions of cells
average Landsat image is around 7.4 x 10^6 pixels, average TM scene is about 34.9 x 10^6 pixels

Run length encoding
run length encoding can be more efficient
values often occur in runs across several cells
this is a form of spatial autocorrelation: the tendency for nearby things to be more similar than distant things
data are entered as pairs, first run length, then value
e.g. the array 0 0 0 1 1 0 0 1 1 1 0 0 1 1 1 0 1 1 1 1
would be entered as 3 0 2 1 2 0 3 1 2 0 3 1 1 0 4 1
this is 16 items to enter, instead of 20
in this case the saving is 20%, but much higher savings occur in practice
imagine a database of 10,000,000 cells and a layer which records the county containing each pixel
suppose there are only two counties in the area covered by the database
each cell can have one of only two values, so the runs will be very long
only some GISs have the capability to use run length encoded files

Digital data
much raster data is already in digital form, as images, etc.
however, resampling will likely be needed in order that pixels coincide in each layer
because remote sensing generates images, it is easier to interface with a raster GIS than any other type
elevation data is commonly available in digital raster form from agencies such as the US Geological Survey
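The run length encoding described above can be sketched in a few lines of Python. This is only an illustration of the pairing rule (run length first, then value); the function names `rle_encode` and `rle_decode` are made up for this sketch and do not belong to any particular GIS.

```python
def rle_encode(cells):
    """Return a flat list [run_length, value, run_length, value, ...]."""
    encoded = []
    i = 0
    while i < len(cells):
        j = i
        # advance j to the end of the current run of equal values
        while j < len(cells) and cells[j] == cells[i]:
            j += 1
        encoded.extend([j - i, cells[i]])
        i = j
    return encoded


def rle_decode(encoded):
    """Expand [run_length, value, ...] pairs back into the cell sequence."""
    cells = []
    for k in range(0, len(encoded), 2):
        cells.extend([encoded[k + 1]] * encoded[k])
    return cells


# the 20-cell array from the text
array = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1]
encoded = rle_encode(array)
print(encoded)        # 8 pairs, i.e. 16 items instead of 20
```

Decoding with `rle_decode(encoded)` gives back the original 20 cells, so nothing is lost by the encoding.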

C. CELL VALUES
Types of values
the type of values contained in cells in a raster depends upon both the reality being coded and the GIS
different systems allow different classes of values, including:
- whole numbers (integers)
- real (decimal) values
- alphabetic values
many systems only allow integers; others which allow different types restrict each separate raster layer to a single kind of value
if systems allow several types of values, e.g. some layers numeric, some non-numeric, they should warn the user against doing unreasonable operations
e.g. it is unreasonable to try to multiply the values in a numeric layer with the values in a non-numeric layer
integer values often act as code numbers, which "point" to names in an associated table or legend
e.g. the first example might have the following legend identifying the name of each soil class:
0 = "no class"
1 = "fine sandy loam"
2 = "coarse sand"
3 = "gravel"

One value per cell
each pixel or cell is assumed to have only one value
this is often inaccurate: the boundary of two soil types may run across the middle of a pixel
in such cases the pixel is given the value of the largest fraction of the cell, or the value of the middle point in the cell
note, however, that a few systems allow a pixel to have multiple values
the NARIS system developed at the University of Illinois in the 1970s allowed each pixel to have any number of values and associated percentages, e.g. 30% a, 30% b, 40% c

D. MAP LAYERS
the data for an area can be visualized as a set of maps or layers
a map layer is a set of data describing a single characteristic for each location within a bounded geographic area
only one item of information is available for each location within a single layer
multiple items of information require multiple layers
on the other hand, a topographic map can show multiple items of information for each location, within limits
e.g. elevation (contours), counties (boundaries), roads, railroads, urbanized areas
these would be 5 layers in a raster GIS
typical raster databases contain up to a hundred layers
each layer (matrix, lattice, raster, array) typically contains hundreds or thousands of cells
important characteristics of a layer are its resolution, orientation and zones

Resolution
in general, resolution can be defined as the minimum linear dimension of the smallest unit of geographic space for which data are recorded
in the raster model the smallest units are generally rectangular (occasionally systems have used hexagons or triangles)
these smallest units are known as cells or pixels
note: high resolution refers to rasters with small cell dimensions
high resolution means lots of detail, lots of cells, large rasters, small cells

Orientation
the angle between true north and the direction defined by the columns of the raster

Zones
each zone of a map layer is a set of contiguous locations that exhibit the same value
these might be:
- ownership parcels
- political units such as counties or nations
- lakes or islands
- individual patches of the same soil or vegetation type
there is considerable confusion over terms here
other terms commonly used for this concept are patch, region, polygon
each of these terms, however, has different meanings to individual users and different definitions in specific GIS packages
in addition, there is a need for a second term which refers to all individual zones that have the same characteristics
class is often used for this concept
note that not all map layers will have zones; cell contents may vary continuously over the region, making every cell's value unique
e.g. satellite sensors record a separate value for reflection from each cell
major components of a zone are its value and location(s)

Value
is the item of information stored in a layer for each pixel or cell
cells in the same zone have the same value

Location
generally location is identified by an ordered pair of coordinates (row and column numbers) that unambiguously identify the location of each unit of geographic space in the raster (cell, pixel, grid cell)
usually the true geographic location of one or more of the corners of the raster is also known

E. EXAMPLE ANALYSIS USING A RASTER GIS
Objective
identify areas suitable for logging
an area is suitable if it satisfies the following criteria:
- is Jackpine (Black Spruce are not valuable)
- is well drained (poorly drained and waterlogged terrain cannot support equipment, logging causes unacceptable environmental damage)
- is not within 500 m of a lake or watercourse (erosion may cause deterioration of water quality)

Procedure
recode layer 2 as follows, creating layer 4:
- y if value 2 (Jackpine)
- n if other value
recode layer 3 as follows, creating layer 5:
- y if value 2 (good)
- n if other value
spread the lake on layer 1 by one cell (500 m), creating layer 6
recode the spread lake on layer 6 as follows, creating layer 7:
- n if in spread lake
- y if not
overlay layers 4 and 5 to obtain layer 8, coding as follows:
- y if both 4 and 5 are y
- n otherwise
overlay layers 7 and 8 to obtain layer 9, coding as follows:
- y if both 7 and 8 are y
- n otherwise

Operations used
recode
overlay
spread
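A minimal Python sketch of the three operations used in this procedure (recode, overlay, spread). The three input layers below are hypothetical 4x4 grids invented for illustration, since the original maps are not reproduced here; the coding (1 = lake on layer 1, 2 = Jackpine on layer 2, 2 = good drainage on layer 3) and the choice to spread to all eight neighbours of a cell are assumptions of the sketch.

```python
def recode(layer, rule):
    """Apply a recoding rule to every cell of a layer."""
    return [[rule(v) for v in row] for row in layer]


def overlay(a, b):
    """'y' where both input layers are 'y', otherwise 'n'."""
    return [["y" if x == "y" and y == "y" else "n" for x, y in zip(ra, rb)]
            for ra, rb in zip(a, b)]


def spread(layer, target, cells=1):
    """Mark (with 1) every cell within `cells` steps of a target cell.
    Assumption: diagonal neighbours count as one step."""
    rows, cols = len(layer), len(layer[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if layer[r][c] != target:
                continue
            for rr in range(max(0, r - cells), min(rows, r + cells + 1)):
                for cc in range(max(0, c - cells), min(cols, c + cells + 1)):
                    out[rr][cc] = 1
    return out


# hypothetical input layers (500 m cells)
layer1 = [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]  # 1 = lake
layer2 = [[2, 2, 2, 1], [2, 2, 2, 1], [2, 2, 2, 2], [1, 2, 2, 2]]  # 2 = Jackpine
layer3 = [[2, 2, 2, 2], [2, 2, 2, 2], [1, 2, 2, 2], [1, 1, 2, 2]]  # 2 = good drainage

layer4 = recode(layer2, lambda v: "y" if v == 2 else "n")
layer5 = recode(layer3, lambda v: "y" if v == 2 else "n")
layer6 = spread(layer1, target=1, cells=1)
layer7 = recode(layer6, lambda v: "n" if v == 1 else "y")
layer8 = overlay(layer4, layer5)
layer9 = overlay(layer7, layer8)

for row in layer9:
    print(" ".join(row))
```

Layer 9 holds y only where all three criteria are met; every intermediate layer is kept, which mirrors the way the procedure numbers its layers.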

Raster GIS capability

Slopes and aspects
if the values in a layer are elevations, we can compute the steepness of slopes by looking at the difference between a pixel's value and those of its adjacent neighbors
the direction of steepest slope, or the direction in which the surface is locally "facing", is called its aspect
aspect can be measured in degrees from North or by compass points (N, NE, E, etc.)
slope and aspect are useful in analyzing vegetation patterns, computing energy balances and modeling erosion or runoff
aspect determines the direction of runoff
this can be used to sketch drainage paths for runoff

E. OPERATIONS ON EXTENDED NEIGHBORHOODS
Distance
calculate the distance of each cell from a cell or the nearest of several cells
each pixel's value in the new layer is its distance from the given cell(s)

Buffer zones
buffers around objects and features are very useful GIS capabilities
e.g. build a logging buffer 500 m wide around all lakes and watercourses
buffer operations can be visualized as spreading the object spatially by a given distance
the result could be a layer with values:
1 if in original selected object
2 if in buffer
0 if outside object and buffer
applications include noise buffers around roads, safety buffers around hazardous facilities

in many programs the buffer operation requires the user to first do a distance operation, then a reclassification of the distance layer
the rate of spreading may be modified by another layer representing "friction"
e.g. the friction layer could represent varying cost of travel
this will affect the width of the buffer: narrow in areas of high friction, etc.

Visible area or "viewshed"
given a layer of elevations, and one or more viewpoints, compute the area visible from at least one viewpoint
e.g. value = 1 if visible, 0 if not
useful for planning locations of unsightly facilities such as smokestacks, or surveillance facilities such as fire towers, or transmission facilities

F. OPERATIONS ON ZONES (GROUPS OF PIXELS)
Identifying zones
by comparing adjacent pixels, identify all patches or zones having the same value
give each such patch or zone a unique number
set each pixel's value to the number of its patch or zone

Areas of zones
measure the area of each zone and assign this value to each pixel instead of the zone's number
alternatively output may be in the form of a summary table sent to the printer or a file

Perimeter of zones
measure the perimeter of each zone and assign this value to each pixel instead of the zone's number
alternatively output may be in the form of a summary table sent to the printer or a file
length of perimeter is determined by summing the number of exterior cell edges in each zone
note: the values calculated for both area and perimeter are highly dependent upon the orientation of objects (zones) with respect to the orientation of the grid
however, if boundaries in the study area do not have a dominant orientation such errors may cancel out

Distance from zone boundary
measure the distance from each pixel to the nearest part of its zone boundary, and assign this value to the pixel
the boundary is defined as the pixels which are adjacent to pixels of different values

Shape of zone
measure the shape of the zone and assign this to each pixel in the zone
one of the most common ways to measure shape is by comparing the perimeter length of a zone to the square root of its area
by dividing this number by 3.54 we get a measure which ranges from 1 for a circle (the most compact shape possible) to 1.13 for a square to large numbers for long, thin, wiggly zones
commands like this are important in landscape ecology
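The perimeter and shape measures above can be sketched as follows. The divisor 3.54 in the text is 2 x sqrt(pi) rounded (about 3.545), which is the perimeter-to-root-area ratio of a circle. The function names and the small test raster are hypothetical, and the sketch treats all cells with the given value as one zone without checking contiguity.

```python
import math


def zone_area_perimeter(raster, value):
    """Count the cells of a zone and its exterior cell edges."""
    rows, cols = len(raster), len(raster[0])
    area = perimeter = 0
    for r in range(rows):
        for c in range(cols):
            if raster[r][c] != value:
                continue
            area += 1
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                off_grid = not (0 <= rr < rows and 0 <= cc < cols)
                # an edge is exterior if it faces the raster edge or another value
                if off_grid or raster[rr][cc] != value:
                    perimeter += 1
    return area, perimeter


def shape_index(area, perimeter):
    """Perimeter over sqrt(area), scaled so that a circle scores 1."""
    return perimeter / (2 * math.sqrt(math.pi) * math.sqrt(area))


# hypothetical layer: a 2x2 square zone of value 1 and a 1x4 strip of value 2
raster = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [2, 2, 2, 2, 0],
]
square = zone_area_perimeter(raster, 1)
strip = zone_area_perimeter(raster, 2)
print(square, round(shape_index(*square), 2))   # the square scores about 1.13
print(strip, round(shape_index(*strip), 2))     # the thin strip scores higher
```

Both zones have the same area, so the difference in the index comes entirely from the longer perimeter of the strip.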

such measures are helpful in studying the effects of geometry and spatial arrangement of habitat
e.g. size and shape of woodlots on the animal species they can sustain
e.g. value of linear park corridors across urban areas in allowing migration of animal species

G. COMMANDS TO DESCRIBE CONTENTS OF LAYERS
it is important to have ways of describing a layer's contents
particularly new layers created by GIS operations
particularly in generating results of analysis

One layer
generate statistics on a layer
e.g. mean, median, most common value, other statistics

More than one layer
compare two maps statistically
e.g. is the pattern on one map related to the pattern on the other?
e.g. chi-square test, regression, analysis of variance

Zones on one layer
generate statistics for the zones on a layer
e.g. largest, smallest, number, mean area

H. ESSENTIAL HOUSEKEEPING
list available layers
input, copy, rename layers
import and export layers to and from other systems:
- other raster GIS
- input of images from remote sensing systems
- other types of GIS
identify resolution, orientation
"resample": changing cell size, orientation, portion of raster to analyze
change colors
provide help to the user
exit from the GIS (the most important command of all!)

INTRODUCTION
Why use raster?
data are acquired in that form, through remote sensing, photogrammetry or scanning
raster is a common way of structuring digital elevation data
raster assumes no prior knowledge of the phenomenon; sampling is done uniformly
knowledge of variability would allow us to sample more heavily in areas of high variability (rugged terrain) and less heavily in smooth terrain
data are often converted to raster as a common format for data interchange, for merging with remote sensing images or DEMs
raster algorithms are often simpler and faster
e.g. buffer zone generation is simpler in raster
raster may be appropriate if the solution requires uniform resolution, e.g. in finding optimum routes for linear features such as power lines, or in inferring the locations of stream networks from DEMs

Objectives
there are many options for storing raster data (many data structures)
some are more economical than others in use of storage
some are more efficient in access and processing speed

B. STORAGE OPTIONS FOR RASTER DATA
by convention, raster data is normally stored row by row from the top left
this is the European/North American reading order
it is also the order of scan of a TV image
example: the image
A A A A
A B B B
A A B B
A A A B
would be stored in 16 memory positions, one for each pixel, in the sequence:
A A A A A B B B A A B B A A A B

What if there is more than one layer?
two options:
1. store the layers separately
this is the normal practice
2. store all information for each pixel together
this requires extra space to be allocated initially within each pixel's storage location for layers which might be created later during analysis
this is usually difficult to anticipate

What do raster systems store in each pixel?
some allow only an integer, in a fixed range, e.g. -127 to +127 (1 byte per pixel) or -32767 to +32767 (2 bytes per pixel)
some allow integers, real (decimal) numbers and mixed alphabetic letters and numbers in each pixel
in this case it helps if the system keeps track of what type of data is stored in each layer and stops the user doing wrong types of analysis on the data

Example:
vegetation data is recorded as a class (A thru G) in each pixel
elevation data is recorded as a decimal number (e.g. 100.3 m)
the system should not allow the user to add the pixel values from the two layers (A + 100.3) or perform any other kind of arithmetic operation on the vegetation data

Raster/Vector combinations
many raster-based systems allow vector input
Example:
a polygon, defined by its vertices, is input
convert this to a raster, e.g. assign 1 to all pixels inside the polygon, 0 to all outside
some forms of data are really hybrids of raster and vector:
Freeman chain code has finite resolution based on pixels (raster-like) but defines lines and the boundaries of objects (vector-like)
a raster can be used to define objects at fixed resolution if every pixel is given an object number instead of a value
the object numbers are pointers to an attribute table:

Raster          Object  Attributes
23 23 23 23     23      A  100.0
23 24 24 24     24      B  101.1
23 23 24 24
23 23 23 24

this gives us an object with its attributes, plus a list of pixels associated with the object instead of the object's coordinates
in this sense, a raster is a finite resolution geometry rather than an alternative way of structuring spatial data

C. RUN ENCODING
geographical data tends to be "spatially autocorrelated", meaning that objects which are close to each other tend to have similar attributes
Tobler expressed it this way: "All things are related, but nearby things are more related than distant things"
because of this principle, we expect neighboring pixels to have similar values
so instead of repeating pixel values, we can code the raster as pairs of numbers (run length, value)
e.g. instead of 16 pixel values in the original raster matrix, we have:
4A 1A 3B 2A 2B 3A 1B
this produces 7 integer/value pairs to be stored
if a run is not required to break at the end of each line we can compress this further:
5A 3B 2A 2B 3A 1B = 6 pairs
however, it helps to limit the possible size of the run so that we can use less space to store the run length, as the amount of space allocated must be sufficient for the maximum run length

Problems
layers now have different lengths depending on the amount of compression (lengths of runs)
storing all layers together for each pixel now makes no sense
run encoding would be of little use for DEM data or any other type of data where neighboring pixels almost always have different values

D. SCAN ORDER
1. Row order
described already
are there better ways of ordering the raster than row by row from the top left?
other orders may produce greater compression

2. Row prime order (Boustrophedon)
suppose we reverse every other row
this has the charming name boustrophedon, from the Greek for "how an ox plows a field"
avoids a long jump at the end of each row, so perhaps the raster would produce fewer runs and thus greater compression
this order is used in the Public Land Survey System: the sections in each township are numbered in this way
on the original raster it results in:
4A 3B 3A 3B 3A = 5 runs

3. Morton order
Morton order is the basis of many efforts to reduce database volume
named for Guy Morton, who devised it as a way of ordering data in the Canada Geographic Information System
however, this way of ordering or scanning a raster was well known long before Morton
it is associated with the names of several mathematicians and geometers: Hilbert, Peano, and Koch
coincidentally, Morton is the name of the lower left corner county in Kansas
the strategy is to exhaust each area of the map in sequence, whereas row by row order scans from one side to the other
this minimizes the number of large jumps
this is one of several hierarchical ordering systems
it is built up level by level, repeating the same pattern at each level, starting from the lower left corner, as follows:

2x2:
 2  3
 0  1

4x4:
10 11 14 15
 8  9 12 13
 2  3  6  7
 0  1  4  5

the 8x8 level repeats the pattern again, with positions running from 0 to 63
it is only valid for square arrays where the numbers of rows and columns are powers of 2, e.g. 2x2, 4x4, 8x8, 16x16, 32x32, 64x64, etc.
how does it do on our 4x4 array?
5A 3B 1A 1B 2A 2B 2A = 7 runs
which is as long as row by row compression

4. Peano scan (also Pi-order or Hilbert)
the Peano scan or Pi-order is like boustrophedon in always moving to a neighboring pixel
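The run counts quoted for the three scan orders can be checked with a short Python sketch. It assumes, as the run counts in the text imply, that Morton positions start at the lower left corner of the image, and it lets runs continue across row ends; `run_lengths` and `morton_to_row_col` are names made up for this sketch.

```python
def run_lengths(seq):
    """Run-encode a sequence and format it like '5A 3B ...'."""
    runs = []
    for value in seq:
        if runs and runs[-1][1] == value:
            runs[-1][0] += 1
        else:
            runs.append([1, value])
    return " ".join(f"{n}{v}" for n, v in runs)


def morton_to_row_col(pos, bits):
    """Split a Morton position into row and column (row bit first in each pair)."""
    row = col = 0
    for b in range(bits):
        col |= ((pos >> (2 * b)) & 1) << b
        row |= ((pos >> (2 * b + 1)) & 1) << b
    return row, col


# the 4x4 example image, top row first
image = ["AAAA", "ABBB", "AABB", "AAAB"]
n = len(image)

row_order = "".join(image)
boustrophedon = "".join(row if i % 2 == 0 else row[::-1]
                        for i, row in enumerate(image))
# Morton row 0 is the bottom row of the image, hence image[n - 1 - row]
morton = "".join(image[n - 1 - r][c]
                 for r, c in (morton_to_row_col(p, 2) for p in range(n * n)))

print("row order:    ", run_lengths(row_order))
print("boustrophedon:", run_lengths(boustrophedon))
print("Morton:       ", run_lengths(morton))
```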

E. DECODING SCAN ORDERS
since Morton and Peano orders are useful but complex, two types of questions arise when they are used:
1. What are the row and column numbers for a given pixel?
2. What is the position in the scan order for a given row and column number?

Method
start by numbering the rows and columns from 0 up:

row 3   10 11 14 15
row 2    8  9 12 13
row 1    2  3  6  7
row 0    0  1  4  5
column   0  1  2  3

row 2, column 3 is position 13 in the Morton sequence

1. How to go from row 2, column 3 to Morton sequence?
a. convert row and column numbers to binary representations:
row 2 = 1 0
column 3 = 1 1
b. interleave the bits, alternating row and column bits (called bit interleaving):
1 1 0 1 (row col row col)
c. evaluate this sequence of bits as a binary number:
Answer: 8 + 4 + 1 = 13
so to get the Morton position, interleave the bits of the row and column number

2. How to find row and column number from Morton position 9?
a. convert the position number to a binary number:
1 0 0 1 (8 + 1 = 9), read as row col row col
b. separate the bits:
row bits 1 0 = 2
column bits 0 1 = 1

Generalization
can express the row and column number to any base, not just base 2 (binary), and including mixtures of bases
example: row 6, column 15, using base 4 instead of base 2
row 6 = 1 2 (1x4 + 2x1)
column 15 = 3 3 (3x4 + 3x1)
interleaving: 1 3 2 3
1x64 + 3x16 + 2x4 + 3x1 = 123
answer: row 6, column 15 is position 123
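The interleaving and separating steps above, including the generalization to other bases, can be written compactly. A minimal sketch in Python, assuming the row digit comes first in each pair as in the worked examples; `interleave` and `deinterleave` are hypothetical names.

```python
def interleave(row, col, digits, base=2):
    """Interleave the base-`base` digits of row and col, row digit first."""
    pos = 0
    for d in reversed(range(digits)):
        row_digit = (row // base ** d) % base
        col_digit = (col // base ** d) % base
        pos = (pos * base + row_digit) * base + col_digit
    return pos


def deinterleave(pos, digits, base=2):
    """Recover (row, col) from an interleaved position."""
    row = col = 0
    for d in range(digits):
        col += (pos % base) * base ** d   # lowest digit of a pair is the column
        pos //= base
        row += (pos % base) * base ** d   # next digit up is the row
        pos //= base
    return row, col


print(interleave(2, 3, digits=2))            # Morton position of row 2, column 3
print(deinterleave(9, digits=2))             # row and column of Morton position 9
print(interleave(6, 15, digits=2, base=4))   # the base 4 example
```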

HIERARCHICAL DATA STRUCTURES
A. INTRODUCTION
different scan orders produce only small differences in compression
the major reason for interest in Morton and other hierarchical scan orders is for faster data access
the amount of information shown on a map varies enormously from area to area, depending on the local variability
it would make sense then to use rasters of different sizes depending on the density of information
large cells in smooth or unvarying areas, small cells in rugged or rapidly varying areas
unfortunately unequal-sized squares won't fit together ("tile the plane") except under unusual circumstances
one such circumstance is when small squares nest within large ones
there are, however, some methods for compressing raster data that do allow for varying information densities

B. INDEXING PIXELS
consider the 16 by 16 array in which just one cell is different
notation: row and column numbering starts at 0
thus the odd cell is at row 4, column 7

Procedure
begin by dividing the array into four 8x8 quadrants, and numbering them 0, 1, 2 and 3 as in the Morton order
quads 1, 2 and 3 are homogeneous (all A)
quad 0 is not homogeneous, so we divide only it into four 4x4 quads
these are numbered 00, 01, 02 and 03 because they are partitions of the 8x8 quad 0
of these, 00, 01 and 02 are homogeneous, but 03 is divided again into 030, 031, 032 and 033
now only 031 is not homogeneous, so it is divided again into 0310, 0311, 0312 and 0313
what we have done is to recursively subdivide using a rule of 4 until either:
- a square is homogeneous, or
- we reach the highest level of resolution (the pixel size)
this allows for discretely adaptable resolution where each resolution step is fixed
this concept is related to the use of Morton order for run encoding
if we had coded the raster using Morton order, each homogeneous square would have been a run
8x8 squares are runs of 64 in Morton order, 4x4 are runs of 16, etc.
the run encoded Morton order would have been:
16A 16A 16A 4A 1A 1B 1A 1A 4A 4A 64A 64A 64A
if we allow runs to continue between blocks we could reduce this to:
53A 1B 202A
i.e. a homogeneous block of 2^m by 2^m pixels is equivalent to a Morton run of 2^(2m) pixels
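The recursive subdivision above can be sketched in Python. The sketch builds the 16 by 16 example array (all A except row 4, column 7) and returns the homogeneous blocks in Morton order; `build_quadtree` is a hypothetical name, and the quadrant digit is taken as 2 x row half + column half, in line with the row-first bit interleaving used earlier.

```python
def build_quadtree(grid, row=0, col=0, size=None, code=""):
    """Return {location code: value} for every homogeneous block, in Morton order."""
    if size is None:
        size = len(grid)
    first = grid[row][col]
    if all(grid[r][c] == first
           for r in range(row, row + size)
           for c in range(col, col + size)):
        return {code: first}          # homogeneous block: stop subdividing
    half = size // 2
    leaves = {}
    for digit in range(4):            # rule of 4: visit quadrants 0, 1, 2, 3
        d_row, d_col = divmod(digit, 2)
        leaves.update(build_quadtree(grid, row + d_row * half,
                                     col + d_col * half, half,
                                     code + str(digit)))
    return leaves


# the example array: 16 x 16, all A except the cell at row 4, column 7
grid = [["A"] * 16 for _ in range(16)]
grid[4][7] = "B"

leaves = build_quadtree(grid)
levels = 4                            # 16 = 2^4, so codes have at most 4 digits
morton_runs = " ".join(f"{4 ** (levels - len(code))}{value}"
                       for code, value in leaves.items())
print(len(leaves), "homogeneous blocks")
print(morton_runs)
```

A block with a k-digit code covers 4^(4 - k) pixels, which is where the run lengths 64, 16, 4 and 1 come from.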

Decoding locations
the conversion to row and column is the same as for decoding Morton numbers except that in this case the code is in base 4
in the example the lone B pixel is assigned code 0311
1. convert the code to base 2
hint: every base 4 digit converts to a pair of base 2 digits
thus 0311 becomes 00110101
2. separate the bits to get:
row 0100 = 4
column 0111 = 7
so the numbering system is just the Morton numbering of blocks, expressed in base 4
however, sequence and data compression are not the most useful aspects of this concept

C. THE QUADTREE
can express this sequencing as a tree
the top is the entire array
at each level there is a four-way branching
each branch terminates at a homogeneous block
the term quadtree is used because it is based on a rule of 4
each of the terminal branches in the tree (the ones having values) is known as a leaf
in this case there are 13 leafs or homogeneous square blocks

Coding quadtrees
to store this tree in memory, we need to decide what to store in each memory location
there are many ways of storing quadtrees, but they all share the same basic ideas
one way is to store in each memory location EITHER:
1. the value of the block (e.g. A or B), or
2. a pointer to the first of the four "daughter" blocks at the next level down
all four daughter blocks of any parent always occur together
thus, the quadtree might be stored in memory as:

Position:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
Contents:  2  6  A  A  A  A  A  A 10  A 14  A  A  A  B  A  A
(level):   0  1  1  1  1  2  2  2  2  3  3  3  3  4  4  4  4

the content of position 1 is a pointer indicating that the map is subdivided into four blocks whose contents can be found starting at position 2
position 2 indicates that the four parts of the 0 block can be found beginning at position 6
positions 3, 4 and 5 indicate that the other three level 1 blocks are all A and are not further subdivided

Accessing data through a quadtree
consider two ways in which this quadtree may be accessed:
1. find all parts of the map with a given value
2. determine the contents of a given pixel
notation: if the array has 2^n by 2^n pixels there are n possible levels in the tree, or n+1 if we count the top level (level 0)
use m for the number of leafs

1. to find the parts of a map with a given value we must examine every leaf to see if its value matches the one required
this requires m steps as there are m leafs
2. to find the contents of a given pixel, start at the top of the tree
if the entire map is homogeneous, stop, as the contents of the pixel are known already
if not, follow the branch containing the pixel
to know which branch to follow: take the row and column numbers, write them in binary, interleave the bits, and convert to base 4
e.g. row 4, column 7 converts to 0311
at each level, use the appropriate digit to determine which branch to follow
e.g. for 0311, at level 0 follow branch 0, at level 1 follow branch 3, etc.
in the worst case, we may have to go to level n to find the contents of the pixel, so the number of steps will be n
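The pixel lookup can be traced on the 17-position storage layout given for the example quadtree. A minimal Python sketch, assuming pointers are stored as integers and block values as letters so the two can be told apart; index 0 of the list is left unused so that list indices match the position numbers. `location_code` and `pixel_value` are hypothetical names.

```python
# positions 1..17 of the stored quadtree (index 0 unused)
TREE = [None, 2, 6, "A", "A", "A", "A", "A", "A", 10,
        "A", 14, "A", "A", "A", "B", "A", "A"]


def location_code(row, col, n):
    """Interleave the n row and column bits into n base 4 digits."""
    return [2 * ((row >> b) & 1) + ((col >> b) & 1)
            for b in reversed(range(n))]


def pixel_value(tree, row, col, n):
    """Follow pointers from the top of the tree; return (value, steps taken)."""
    digits = location_code(row, col, n)
    pos, steps = 1, 0
    while isinstance(tree[pos], int):     # a pointer: descend one level
        pos = tree[pos] + digits[steps]   # daughters are stored together
        steps += 1
    return tree[pos], steps


print(location_code(4, 7, 4))        # digits of the code 0311
print(pixel_value(TREE, 4, 7, 4))    # worst case: n = 4 steps
print(pixel_value(TREE, 12, 3, 4))   # a pixel in a large block: 1 step
```

Finding all parts of the map with a given value, by contrast, means scanning every leaf entry in the list.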

QUADTREE ALGORITHMS AND SPATIAL INDEXES
A. INTRODUCTION
this unit examines how quadtrees are used in several simple processes, including:
- measurement of area
- overlay
- finding adjacent leafs
- measuring the area of contiguous patches
in addition, this unit will look at how quadtrees can be used to provide indexes for faster access to vector-coded objects
finally, alternative forms of spatial indexing will be reviewed

Definition
to traverse a quadtree:
begin by moving down the leftmost branch to the first leaf
after processing each leaf in this branch, move back up to the previous branching point, and turn right
this will either lead down to another leaf, or back to a previous branching point
several of the following examples use this simple raster and its associated quadtree

B. AREA ALGORITHM
Procedure
to measure the area of A on the map:
traverse the tree and add those leafs coded A, weighted by the area at the level of the leaf

Example
in the example quadtree, elements at level 0 have area 16, at level 1 area 4, at level 2 area 1
thus, the area of A is:
1 (leaf 00) + 1 (leaf 02) + 1 (leaf 03) + 4 (leaf 2) + 1 (leaf 32) = 8 units

C. OVERLAY ALGORITHM
Procedure
to overlay the two maps:
traverse the trees simultaneously, following all branches which exist in either tree
where one tree lacks branches (has a leaf where the other tree has branches), assign the value of the associated leaf to each of the branches
e.g. node 3 is branched on map 1, not on map 2
the leafs derived from this node (30, 31, 32 and 33) have values B, B, A and B on map 1, all 2 on map 2
the new tree has the attributes of both of the maps, e.g. A1, B2
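Both procedures are short recursive traversals. In the Python sketch below, a tree is a nested list (a list of four children for a branch, a value for a leaf). The values of map 1 are pieced together from the leafs named in this unit, with leaf 1 taken to be B because the two classes must add up to 16 units; the contents of map 2 other than node 3 are not given in the text, so the `map2` tree is a hypothetical stand-in.

```python
# map 1 as a nested list: branches 0 and 3 are subdivided, 1 and 2 are leafs
map1 = [["A", "B", "A", "A"], "B", "A", ["B", "B", "A", "B"]]
# map 2: only "node 3 has value 2" comes from the text, the rest is assumed
map2 = ["1", "1", "1", "2"]


def area(node, value, block_area):
    """Sum the areas of all leafs with the given value."""
    if isinstance(node, list):
        return sum(area(child, value, block_area // 4) for child in node)
    return block_area if node == value else 0


def overlay(a, b):
    """Traverse both trees together, branching wherever either tree branches."""
    if isinstance(a, list) or isinstance(b, list):
        kids_a = a if isinstance(a, list) else [a] * 4   # pass a leaf value down
        kids_b = b if isinstance(b, list) else [b] * 4
        return [overlay(x, y) for x, y in zip(kids_a, kids_b)]
    return f"{a}{b}"


print(area(map1, "A", 16))   # level 0 has area 16
print(overlay(map1, map2))
```

The overlay keeps every branch found in either input, so it can leave four identical daughters under one parent; merging those back into a single leaf would be a separate clean-up pass.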

D. ADJACENCY ALGORITHM
Problem
find if two leafs (e.g. 03 and 2) are adjacent
Corollary: find the leafs adjacent to a given leaf (e.g. 03)
note that in arc based systems adjacencies are coded in the data structure (R and L polygons), so this operation is simpler with vector based systems

Definition
here adjacent means sharing a common edge, not just a common point

Two cases
leaf codes are:
1. same length (same size blocks, e.g. 01 and 02), or
2. one is longer than the other (different size blocks, e.g. 03 and 2)
solving this problem requires the use of:
1. conversion from base 4 to binary and back
base 4 because of the "rule of 4" used in constructing quadtrees
2. bit interleaving
3. a new concept called Tesseral Arithmetic

Tesseral Arithmetic
tesseral arithmetic is an alternate arithmetic useful for working with the peculiarities of quadtree addressing
to add binary numbers normally, a "carry" works to the position to the left
e.g. adding 1 to 0001 gives 0010
this is the same as decimal arithmetic except that carries occur when the total reaches 2 instead of 10

in tesseral arithmetic, a "carry" works two positions to the left
e.g. adding 1 to 0001 gives 0100
the reverse happens on subtraction
0100 less 1 is 0001, not 0011, as the subtraction affects only the alternate bits
in other words, if we number the bits from the left starting at 1:
adding or subtracting 1 affects only the even-numbered bits
adding or subtracting 2 (binary 10) affects only the odd-numbered bits

Determining adjacency
1. same size blocks:
two leafs are adjacent if their binary representations differ by binary 1 or 10 (decimal 1 or 2) in tesseral arithmetic

example: 01 and 03 are adjacent because 0001 and 0011 differ by binary 10, or decimal 2
example: 033 and 211 are adjacent because in tesseral arithmetic 001111 + 10 = 100101, or 100101 - 10 = 001111
2. different size blocks:
taking the longer of the two codes:
- convert it from base 4 to binary
- tesseral-add and -subtract 01 and 10 to create four new codes
- reject any cases where subtracting was not possible (a "negative" code would have resulted, or a "carry" would have been necessary to the left of the leftmost digit)
- discard the excess rightmost digits in the resulting transformed longer codes
- convert back to base 4 to get the leaf

the two blocks are adjacent if any of the transformed and truncated codes are equal to the shorter code

example: Are 02 and 2 adjacent?
convert 02 to binary = 0010
0010 + 1 = 0011
0010 + 10 = 1000
0010 - 1 (impossible)
0010 - 10 = 0000
truncating gives 00 and 10
these are equal to 0 and 2 in base 4
therefore, 02 and 2 are adjacent (also 02 and 0 are adjacent)

example: Are 033 and 2 adjacent?
convert 033 to binary = 001111
001111 + 1 = 011010
001111 + 10 = 100101
001111 - 1 = 001110
001111 - 10 = 001101
truncating to two digits gives 01, 10 and 00
these are equal to 1, 2 and 0 in base 4
therefore, 033 and 2 are adjacent

example: Find leafs adjacent to 03 in the first map above
method: find the codes of adjacent blocks of the same size, then work down the tree to find the appropriate leaf (note: can only find equal or shorter codes, i.e. equal or bigger leaf blocks)
0011 + 1 = 0110 = 12 : leaf 1
0011 + 10 = 1001 = 21 : leaf 2
0011 - 1 = 0010 = 02 : leaf 02
0011 - 10 = 0001 = 01 : leaf 01
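The worked examples above can be reproduced with a small Python sketch. Rather than manipulating alternate bits directly, it separates a base 4 code into its row and column numbers, changes one of them, and interleaves again; this is an equivalent formulation chosen for readability, not necessarily how a GIS would implement tesseral arithmetic. All function names are hypothetical.

```python
def split(code):
    """Base 4 location code -> (row, col); the high bit of each digit is the row bit."""
    row = int("".join(str(int(d) // 2) for d in code), 2)
    col = int("".join(str(int(d) % 2) for d in code), 2)
    return row, col


def join(row, col, digits):
    """(row, col) -> base 4 location code with the given number of digits."""
    return "".join(str(2 * ((row >> b) & 1) + ((col >> b) & 1))
                   for b in reversed(range(digits)))


def tesseral_add(code, step):
    """step +1/-1 changes the column bits, +2/-2 (binary 10) the row bits.
    Returns None where the text says the operation is impossible."""
    row, col = split(code)
    if abs(step) == 1:
        col += step
    else:
        row += step // 2
    size = 2 ** len(code)
    if not (0 <= row < size and 0 <= col < size):
        return None
    return join(row, col, len(code))


def neighbours(code):
    """Codes of the same-size blocks on the four sides (None if off the map)."""
    return [tesseral_add(code, step) for step in (1, 2, -1, -2)]


def are_adjacent(code_a, code_b):
    """Truncate the neighbours of the longer code to the length of the shorter."""
    longer, shorter = sorted((code_a, code_b), key=len, reverse=True)
    return any(n is not None and n[:len(shorter)] == shorter
               for n in neighbours(longer))


print(tesseral_add("033", 2))     # 001111 + 10 in tesseral arithmetic
print(neighbours("03"))           # same-size blocks around 03
print(are_adjacent("033", "2"), are_adjacent("02", "2"), are_adjacent("01", "2"))
```

As in the second worked example, the test also reports a block as adjacent to a larger block that contains it (02 and 0), so a caller interested only in true neighbours would need to exclude that case.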

Length of common boundary

- the length of common boundary between the two blocks is determined by the level of the longer code
- can use this to construct an algorithm to determine the perimeter of a patch
- e.g. the length of the A/B boundary in the first example map

(diagram)

E. AREA OF A CONTIGUOUS PATCH ALGORITHM

Problem
- find the area of a contiguous patch of the same value, e.g. all A
- Corollary: How many separate patches of A are there?
- note: this is a general method which can be used in both quadtree and vector data structures
- i.e. find contiguous sets of quadtree blocks or irregularly shaped polygons, given that adjacencies are known or can be determined
- the following example uses the original raster map
- note that there are only two contiguous patches; the areas of A and B form only one patch each

Method
- create a list of leafs, with their associated codes, by traversing the tree
- allow space for a "pointer" for each leaf, and give it an initial value of 0

Algorithm
- for each leaf i:
  - find all adjacent leafs j with equal or shorter length codes (4 maximum)
  - if the adjacent leaf j has the same value, determine which of i and j has the higher position (larger position number) in the list, and set its pointer to the lower position
  - (note: if a pointer has already been changed, it may be changed again or left, the result is the same)
- this produces the final pointer list

Results
1. the number of contiguous patches will be equal to the number of zeros in the pointer list
- in the example, two pointers are zero, indicating two contiguous patches

2. the value of each patch can be obtained by looking up the values of leafs with 0 pointers
- in the example, leafs 00 and 01 have 0 pointers
- these have the values A and B respectively
3. to find the area of each patch, select one of the zeros and sum its area plus the areas of any leafs which point to it directly or indirectly
- the component leafs of each patch can be found by starting with a leaf at the end (or beginning) of the list and following the pointers until a 0 is found
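The pointer method can be sketched in Python for the ten-leaf example map. This is an illustration under assumptions rather than the notes' own procedure. The helper names (`block`, `share_edge`, `root`, `label_patches`, `patch_areas`) are hypothetical. Adjacency is computed from block geometry instead of tesseral arithmetic, assuming the low bit of each base-4 digit moves in x and the high bit in y. As a standard union-find precaution, the sketch follows each pointer chain to its zero-pointer leaf before linking, so individual pointer values can differ from a hand-worked list while the zeros and the areas are unaffected.

```python
def block(code, depth=2):
    """(x, y, size) of a leaf in a grid that is 2**depth cells on a side."""
    x = y = 0
    size = 2 ** depth
    for digit in code:
        size //= 2
        d = int(digit)
        x += (d & 1) * size     # assumed: low bit of the digit moves in x
        y += (d >> 1) * size    # assumed: high bit of the digit moves in y
    return x, y, size


def share_edge(a, b):
    """True if two square blocks touch along an edge (not just a corner)."""
    ax, ay, asz = a
    bx, by, bsz = b
    x_overlap = min(ax + asz, bx + bsz) - max(ax, bx)
    y_overlap = min(ay + asz, by + bsz) - max(ay, by)
    return (x_overlap == 0 and y_overlap > 0) or (y_overlap == 0 and x_overlap > 0)


def root(pointer, i):
    """Follow pointers from position i until a 0 pointer is found."""
    while pointer[i] != 0:
        i = pointer[i]
    return i


def label_patches(leafs, depth=2):
    """Return a 1-based pointer list; entry 0 is unused padding (None)."""
    n = len(leafs)
    blocks = [None] + [block(code, depth) for code, _ in leafs]
    values = [None] + [value for _, value in leafs]
    pointer = [None] + [0] * n
    for i in range(1, n + 1):
        for j in range(1, i):
            if values[i] == values[j] and share_edge(blocks[i], blocks[j]):
                ri, rj = root(pointer, i), root(pointer, j)
                if ri != rj:
                    pointer[max(ri, rj)] = min(ri, rj)   # higher points to lower
    return pointer


def patch_areas(leafs, pointer, depth=2):
    """Sum leaf areas per patch, keyed by the position holding the 0 pointer."""
    areas = {}
    for i, (code, _) in enumerate(leafs, start=1):
        size = block(code, depth)[2]
        r = root(pointer, i)
        areas[r] = areas.get(r, 0) + size * size
    return areas


# Leaf list of the example map, in traversal order (positions 1 to 10).
leafs = [("00", "A"), ("01", "B"), ("02", "A"), ("03", "A"), ("1", "B"),
         ("2", "A"), ("30", "B"), ("31", "B"), ("32", "A"), ("33", "B")]

pointer = label_patches(leafs)
print(pointer.count(0))             # 2 contiguous patches
print(patch_areas(leafs, pointer))  # {1: 8, 2: 8}
```

Positions 1 and 2 (codes 00 and 01) keep zero pointers, so the two patches carry the values A and B, each with an area of 8 cells.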

for example, the leaf at position 10 (code 33) points to 8, which points to 7, which points to 5, which points to 2, which has a zero pointer
- therefore, leaf position 10 (code 33) is part of the same patch as leaf 2 (code 01) and has the value B
- the areas can be found by summing the leaf areas

for the example:
    A leafs:      00  02  03  2   32
    A positions:  1   3   4   6   9
    Area of A:    1 + 1 + 1 + 4 + 1 = 8

    B leafs:      01  1   30  31  33
    B positions:  2   5   7   8   10
    Area of B:    1 + 4 + 1 + 1 + 1 = 8

F. QUADTREE INDEXES

Indexing using quadtrees
- indexes are used in vector systems to get fast access to the objects in a particular area of a map
- very useful in searching for potentially overlapping or intersecting objects
- therefore, they are an essential part of a polygon overlay operation
- have looked at the usefulness of a simple sort of objects on one axis (e.g. x) in the moving band operation for intersection calculations
- now will look at methods which can be thought of as sorting on both axes simultaneously
- these use 2D coding systems and a simple one-dimensional sort

Setting up the index
steps are:
1. for each object (point, line, area) in the database, find the smallest quadtree leaf which encloses the object

- some large objects will have to be classified as NULL, as they span more than one of the four leafs in the first branching (0, 1, 2 and 3)
- other smaller objects may be enclosed within a small leaf, e.g. 031
2. sort or index the objects by the enclosing quadtree leafs

Using the index
- to find all objects which might intersect an area, line or point of interest:
  - find the quadtree leaf enclosing the object of interest
  - starting at this point, follow up the quadtree through all branching points that contain the original cell, and down the quadtree to all branching points and leafs below the cell

example: the area of interest is enclosed in leaf 31 of the original example quadtree
- the objects which may intersect the area of interest are those in leaf 31 and all leafs above it
- thus, these are 3 and the null leaf
- objects in other (remote) leafs cannot intersect the area of interest, so need not be checked

example: the area of interest is enclosed in leaf 0
- the objects which may intersect the area are in leaf 0, the null leaf and all leafs below 0: 00, 01, 02, 03
- there may be other leafs below these as well, such as 010, 011, 012, 013, etc.
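A minimal Python sketch of setting up and using such an index follows. Everything specific in it is an assumption made for illustration: the map is taken to be the unit square, each object is reduced to its bounding box, the NULL leaf is represented by the empty string, digits are assigned as 2*row + column, and the object names and function names (`enclosing_leaf`, `build_index`, `candidates`) are hypothetical.

```python
def enclosing_leaf(xmin, ymin, xmax, ymax, max_depth=3):
    """Code of the smallest quadrant wholly enclosing the box ('' = NULL)."""
    code = ""
    x0, y0, size = 0.0, 0.0, 1.0
    for _ in range(max_depth):
        half = size / 2
        col_min, col_max = int(xmin >= x0 + half), int(xmax >= x0 + half)
        row_min, row_max = int(ymin >= y0 + half), int(ymax >= y0 + half)
        if col_min != col_max or row_min != row_max:
            break                       # box spans more than one quadrant
        code += str(2 * row_min + col_min)
        x0 += col_min * half
        y0 += row_min * half
        size = half
    return code


def build_index(objects, max_depth=3):
    """Group object names by the code of their enclosing leaf."""
    index = {}
    for name, box in objects.items():
        index.setdefault(enclosing_leaf(*box, max_depth=max_depth), []).append(name)
    return index


def candidates(index, query_code):
    """Objects in the query leaf, in all leafs above it, and in all leafs below it."""
    return sorted(
        name
        for code, names in index.items()
        if code.startswith(query_code) or query_code.startswith(code)
        for name in names
    )


# Hypothetical objects as (xmin, ymin, xmax, ymax) boxes in the unit square.
objects = {
    "well":    (0.90, 0.60, 0.90, 0.60),   # a point
    "highway": (0.10, 0.45, 0.90, 0.55),   # long corridor, ends up in NULL
    "park":    (0.55, 0.55, 0.95, 0.95),
    "pond":    (0.10, 0.10, 0.20, 0.20),
}
index = build_index(objects)
print(index)                    # {'311': ['well'], '': ['highway'], '3': ['park'], '00': ['pond']}
print(candidates(index, "31"))  # ['highway', 'park', 'well']
print(candidates(index, "0"))   # ['highway', 'pond']
```

Because codes are strings, "above" and "below" in the tree reduce to prefix tests, and sorting the codes is the simple one-dimensional sort mentioned earlier. The NULL code is a prefix of every code, so NULL objects are returned for every query.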


Generalizations
- quadtree indexing is most effective for small objects, particularly points
- large objects tend to require large enclosing leafs even though they may not fill much of the space (e.g. highway corridors)
- these objects will always need to be checked for intersection
- it may pay to subdivide objects so that the pieces fall entirely within smaller leafs
- indexing in this way is intuitively more efficient than indexing by x or y alone, since the quadtree index is effectively two-dimensional
- the divisions at each branching need not be equal in size
- it may pay to have some blocks of smaller area and some of larger area, rather than four equal squares at each branching
- however, for general efficiency the blocks should be rectangular

G. R-TREE INDEXES

- R-tree indexes are a response to the problem of indexing large areas
- R stands for "range", a concept similar to MER (minimum enclosing rectangle)

Method
- find two, possibly overlapping, rectangles (aligned with x and y axes) such that:
  - as many objects as possible are wholly within one or the other rectangle
  - there are roughly equal numbers of objects wholly enclosed in each rectangle
  - the overlap between the rectangles is minimum


- indexing is determined by the rectangle in which the object is contained
  - objects which are wholly within a rectangle are associated with that rectangle
  - objects which are not wholly within either of the two rectangles are associated with the undivided map
- apply the procedure recursively, finding two new smaller rectangles within each existing rectangle
- this creates a tree structure similar to the quadtree
- every object is associated with some node in the tree
- to find the objects which might intersect a given area of interest:
  - find the smallest rectangle used in the indexing procedure which wholly encloses the area of interest
  - the objects are those associated with this rectangle and all nodes above and below it in the tree

Problem
- although benchmark tests have shown that R-trees are generally more efficient than quadtrees and simple 1-D sorts, they are computationally intensive to construct
