
LAB 2: GIS Data Models
Time for completion: One Week

Outline

* Please use the double sided option when printing [File -> Print, Click on Properties, Select Print on Both Sides] 

1.0 Purpose
  • To gain a clear understanding of what a data model is, and why data models are important.
  • To learn the data models ESRI supports in ArcGIS, and the similarities and differences between them.
  • To learn the advantages and disadvantages of using certain data models for different tasks.
  • To reinforce basic ArcGIS skills.

2.0 Introduction and background

    Basic Concepts

    You will notice some diversity in these definitions, as they are in the context of different companies, software, times, and degrees of specificity. For this lab, focus on the hierarchy described in the main body of the lab.

    There are three basic spatial data types used with GIS (points, lines, and areas):

    * Points represent anything that can be described as a discrete x, y location
    * Lines represent anything having a length
    * Areas, or polygons, describe anything having boundaries

    These data types comprise the vector model, which is the model you will deal with most often in GIS.

    Vector data model:
    Discrete features, such as customer locations, are usually represented using the vector model. Features can be discrete locations or events, lines, or areas. Lines, such as streams or roads, are represented as a series of coordinate pairs. Areas are defined by borders, and are represented by closed polygons. When you analyze vector data, much of your analysis involves working with (summarizing) the attributes in the layer's data table.

    Raster data model:
    Continuous numeric values, such as elevation, and continuous categories, such as vegetation types, are represented using the raster model. The raster data model represents features as a matrix/lattice of cells in continuous space. A point is one cell, a line is a continuous row of cells, and an area is represented as continuous touching cells.
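
To make the contrast between the two models concrete, here is a minimal Python sketch. It is not part of ArcGIS: the coordinates, the 4 x 5 grid, the cell size of 1, and the helper names `make_raster` and `burn_cells` are all invented for illustration. The same three features are stored first as vector coordinate lists and then as cell values in a small raster matrix, with the cells picked by hand to match the coordinates.

```python
def make_raster(n_rows, n_cols, fill=0):
    """Return an n_rows x n_cols matrix of cells, all set to `fill`."""
    return [[fill for _ in range(n_cols)] for _ in range(n_rows)]


def burn_cells(raster, cells, value):
    """Set each (row, col) cell listed in `cells` to `value`."""
    for row, col in cells:
        raster[row][col] = value
    return raster


# Vector model: each feature is a list of x, y coordinate pairs (made-up values).
vector_point = [(4.5, 0.5)]
vector_line = [(0.5, 2.5), (4.5, 2.5)]
vector_area = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0), (0.0, 0.0)]

# Raster model: the same features as cell values (assumed cell size of 1).
raster = make_raster(4, 5)
burn_cells(raster, [(0, 4)], 1)                             # point: one cell
burn_cells(raster, [(2, col) for col in range(5)], 2)       # line: a row of cells
burn_cells(raster, [(0, 0), (0, 1), (1, 0), (1, 1)], 3)     # area: touching cells

for row in raster:
    print(row)
```

Note that the vector version keeps exact coordinates, while the raster version only records which cells a feature falls in.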

    Tabular data:
    Tabular data describe map features in the form of a table or spreadsheet. For example, a GIS database of customer locations may be linked to address and personnel information. GIS links this tabular data to associated spatial data.

    For more information, see GIdata models [from Universidade Nova de Lisboa]

    Question 1:
    Give an example of how a continuous phenomenon can be represented using the vector data model.

    Geographic Data Modeling: An Introduction

    Data Model - An abstraction of the real world which incorporates only those properties thought to be relevant to the application at hand, defines specific groups of entities and their attributes, and states relationships between these entities. A data model is independent of a computer system. [Association for Geographic Information]

    Data models are a crucial concept for GIS users to understand. Data models describe how geographic features will be represented in the GIS. Any time you wish to deal with geographic data in a computing environment, you must choose a geographic data model by which to do it. The choice of data model will yield benefits in terms of simplifying real-world features enough to deal with them easily, but will also incur costs in terms of oversimplifying or misrepresenting different aspects of them in the process.

    A paper map is an example of an analog data model -- it is a formalized framework that cartographers use to capture and represent information about a landscape on a sheet of paper. The same sort of thing is also needed to capture and represent geographic information when the medium is digital rather than ink-and-paper. In a GIS, abstractions of real-world features must therefore be formalized into a data model that defines how the computer will represent and manage the geographic information (geometry and attributes).

    Bernhardsen (1999) diagrams the data model formalization process along these lines:



    Figure 1: The modeling process.
    (after Bernhardsen 1999, p.39.  Map graphics from www.gis.com)



    Most of the confusion about data models arises from their diversity. Some data models are more abstract/theoretical while others are made with specific database frameworks in mind. For example, the vector data model and the raster data model are very general, whereas the georelational data model and geodatabase data model are made to fit specific categories of database software. Furthermore, a given data model may belong to more than one category: a coverage is both a vector data model (general) and a georelational data model (database specific).

    The many types of data models are easier to think about if one pictures them as being part of a general hierarchy. Below is a figure showing the hierarchy of ArcGIS's data models:

    Figure 2: Hierarchy of ESRI's ArcGIS data models.


    The data models go from most general at the top level (vector, raster, TIN) to most specific at the bottom level (shapefile, coverage, geodatabase). It is important to note that a geodatabase can handle all three general models, not just the vector model.

    Geographic data models have evolved under the influences of technology (e.g., increasing storage space and processing power, networking, or software evolution) and even history (e.g., ESRI introduced the "coverage" data model in 1980).

    Every GIS software package will be capable of supporting a number of data models. The capabilities of the data models may change with new versions of the software, and compatibility issues may arise between different GIS software, and even between different versions of the same software. Certain functions will be accessible using data in the form of one data model but not another.

    Data Structures vs. Data Models

    Once we have decided on a data model to use, there remains the question of how to actually store this model in the computer. The specific format to be used for storing it is known as the data structure. To illustrate, consider a basic vector data model. The vector model represents features as consisting of lines which individually link together a start node, vertices in between, and an end node. To draw and analyze features represented this way, the computer needs information on the locations of each node and vertex of the lines. This could be provided in the form of a table listing the coordinates of these points, and indicating which line(s) go through them. This table would be the basic data structure. Coverages and shapefiles use this type of structure.
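
The table described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual file layout of a coverage or shapefile; the field names (`point_id`, `role`, `line_ids`), the coordinates, and the helper `line_coordinates` are made up for this example.

```python
# Each row describes one node or vertex: its coordinates and the line(s) through it.
point_table = [
    {"point_id": 1, "x": 0.0, "y": 0.0, "role": "node",   "line_ids": [10]},
    {"point_id": 2, "x": 1.0, "y": 0.5, "role": "vertex", "line_ids": [10]},
    {"point_id": 3, "x": 2.0, "y": 0.0, "role": "node",   "line_ids": [10, 11]},
    {"point_id": 4, "x": 3.0, "y": 1.0, "role": "node",   "line_ids": [11]},
]


def line_coordinates(table, line_id):
    """Return the (x, y) pairs of one line, in the order the rows appear."""
    return [(row["x"], row["y"]) for row in table if line_id in row["line_ids"]]


line_10 = line_coordinates(point_table, 10)
line_11 = line_coordinates(point_table, 11)
print(line_10)
print(line_11)
```

In this sketch, point 3 is shared by both lines, which is how the table records that line 10 ends where line 11 begins. The sketch relies on row order to sequence the points along a line; a real data structure would store that order explicitly.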

    In Figure 1 above, the lower left box titled "DATABASE (relational tables)" represents the data structure. In it you can see numbered rows and columns with labels; this is the 'structure' of the data. Some columns have only numbers, some have only text, and some have both.

    Several different types of data structures can potentially be used to represent the same data model. For example, you could represent a vector data model using coverages, shapefiles, or geodatabases. Although these all take the same basic approach in representing the model, there are still significant differences between them. We will discuss what these differences are later, but for now keep in mind that 1) data models do not necessarily imply any particular data structures; and 2) data structures can represent the same data model while still being very different from one another.

    GIS Information Resources

    National Center for Geographic Information and Analysis (NCGIA) has its Core GIScience curriculum online. Some resources relevant to Data Structures and Data Models: Fundamentals of Data Storage, Information Organization and Data Structure, and Non-spatial Database Models.

    Question 2:
    a) Which data models can be stored in a Geodatabase?

    b) Do CAD data use a different kind of data model or a different type of data structure?

    c) What is the difference between a data structure and a data model?

    Data Models, Datasets, and Feature Classes in ArcGIS

    In ArcCatalog the geometry and data model of every dataset is identified by a small picture or icon.  This works much like Windows Explorer, except that only file formats recognized by ArcCatalog as geographic data will be displayed.

    Your life will be made much easier if you learn ArcCatalog's icons. There are a lot of them and they can be initially confusing, so here is the handy table from Lab #1 that you can refer back to.  Below is a display from ArcCatalog showing how the icons are identified by type.



    In the frame on the right, along the top, is a folder tab called 'Contents' which shows the contents of the folder with the icons symbolizing each data model and type.

    The folders and files that make up shapefiles, coverages, geodatabase feature classes, rasters, and TINs fall into an organizational hierarchy (similar but not identical to the Windows folder/file hierarchy, beware!). This is a VERY different type of hierarchy from the one we discussed with regard to Figure 2 and data models. There we address the theoretical/conceptual relationship of the geographic data model, not the "nuts and bolts" of the actual files. With regard to ArcCatalog we are referring directly to the particular data model and the specific data structure of each file type or file format. Figure 3, below, shows this hierarchy of folders, data models, datasets, and feature classes as displayed in ArcCatalog. Feature classes are the lowest level that the user accesses.


    Figure 3:  Icons and hierarchy

    Some file format basics:

    • Shapefiles: A single geographic feature type (counties, roads, capitals, etc.) will be contained in a shapefile, and each shapefile corresponds to a feature class. The geometric information (stored in hidden binary files) will be displayed in ArcCatalog's "Preview" and the attribute information (stored in dBASE tables) will be displayed in the "Table Preview". This linkage of geometric files to separate attribute tables is common to shapefiles and coverages and is called a georelational data model  by ESRI.
    • Coverages: Multiple geographic feature types will be contained in a coverage, and each of these types corresponds to a feature class. The folder that contains all of these feature classes is the actual coverage. Within it, the geometric and attribute information (again stored in hidden binary files) can be displayed using ArcCatalog's "Preview" and "Table Preview", respectively. Like shapefiles, coverages employ a georelational data model.
    • Geodatabases: A single geographic feature type corresponds to a feature class, as with shapefiles. Multiple feature classes can be grouped into a feature dataset (symbolized as three overlapping grey tiles), which specifies a common geographic framework for all the feature classes it contains. All feature classes grouped inside a feature dataset must have the same spatial reference, i.e. projection and coordinate system information. (In Figure 3, for example, the "USA container" contains information about the USA, capitals, counties, etc.) Unlike shapefiles and coverages, geodatabases employ a geodatabase data model that stores each feature, complete with its geometry, as a row in a relational database table. A number of feature datasets can be stored in a geodatabase.
    • It is important to note that there are two types of personal geodatabases supported in ArcGIS Desktop.  The first is a Microsoft Access database with a 2 GB storage limit.  In many cases, this is sufficient space to store the data necessary for a small project.  With the release of ArcGIS 9.2, an alternative file based personal geodatabase has been made available that uses the same data structure stored in a folder instead of an Access file.  Individual datasets in the new file based geodatabase have a size limitation of 1 terabyte, and the geodatabase itself has no size limitation other than the size of your hard drive. 
    • Looking again at Figure 3, you will notice that the geodatabase, the coverages, and the shapefiles are all contained within the folder named 'Some-Data.' The little blue symbol on the folder indicates that it contains recognizable geographic data in the first level beneath 'Some-Data.'  In the context of coverages, this folder is referred to as a workspace.  Remember, you will not see the blue symbol on the folder in ArcCatalog unless you specifically set this option.  If this option is set, ArcCatalog takes longer to refresh while it searches every single folder on your system for recognizable data.  This can be time consuming.
    • Note: Do NOT use spaces in file/folder names! Use an underscore ("_") instead. Certain ArcGIS software components seem to need an "unbroken path" to function correctly -- if you use spaces, you may run into problems. This is a holdover from older Arc software even though most Windows-based software can now handle spaces in names.  An example of a location that contains spaces in the path name is C:\Documents and Settings\student\Desktop\176B_Lab2.  Notice that there is a space after the word "Documents" and another space after the word "and". 
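
If you want to check a path before using it, a small helper like the one below can flag spaces and suggest a replacement name. This is a hypothetical convenience sketch in Python, not an ArcGIS function; the names `has_spaces` and `underscore_path` are invented here.

```python
def has_spaces(path):
    """Return True if the path contains at least one space."""
    return " " in path


def underscore_path(path):
    """Return a copy of the path with every space replaced by an underscore."""
    return path.replace(" ", "_")


risky = r"C:\Documents and Settings\student\Desktop\176B_Lab2"
safe = r"C:\Workspace\Lab_2"

print(has_spaces(risky), has_spaces(safe))
print(underscore_path(risky))
```

The helper only builds a new string. It does not rename anything on disk, so it is best used to pick a safe name before you create a folder.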
    Question 3:
     a) The fact that you can't use spaces in file names or folders has to do with what? (the software, data model, data structure, or something else)

     b) What is a "feature class"?

     c) What is an ArcGIS "coverage" and how is it different from a shapefile?

     d) What is the main difference between the geodatabase data model and the other data models?



    3.0 Get the data
    Open My Computer and go to either your removable disk or C:\Workspace and create a folder (right-click New -> Folder) and name it "Lab_2" (remember - do not use spaces in names, use an "_" or "-" instead).

    Right-click and save lab_2.zip to the C:\Workspace\Lab_2 folder (or equivalent) you just created. Open My Computer, go to C:\Workspace\Lab_2, and double-click the file you just saved. WinZip will open. Click the Extract button and make sure you extract the files to C:\Workspace\Lab_2.  Alternatively, if WinZip is not available, you can right-click the folder and choose "Extract Here".  After you successfully extract the files you can close WinZip. You can then view the data in ArcCatalog to verify that the following files and folders have been extracted.
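
The extraction step can also be scripted. The sketch below uses Python's standard `zipfile` module; the function name `extract_lab_data` is made up for this example, and it assumes Python is available on your machine, which the lab itself does not require.

```python
import os
import zipfile


def extract_lab_data(zip_path, target_dir):
    """Extract every file in zip_path into target_dir and return the member names."""
    os.makedirs(target_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
        return archive.namelist()
```

For the paths used in this lab, the call would look like `extract_lab_data(r"C:\Workspace\Lab_2\lab_2.zip", r"C:\Workspace\Lab_2")`. The returned list of names can be compared against the folders listed below.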

    /mystery -- Contains 8 data layers of several features in different data models. You will be figuring out what these are in the lab.

    /sb
        roads -- Santa Barbara county roads coverage, clipped to the Goleta-Santa Barbara region

        sbdem -- digital elevation model of Santa Barbara county

        sbtin -- TIN derived from sbdem

        sbcontour -- Contour coverage derived from sbdem

        cacounties -- counties of California, from the GDT dataset

    The street data we are using in this class was provided by Tele Atlas




    4.0 Procedure

    4.1 Understanding data models
     
    Question 4:
    As you work through the lab, fill out Tables A and B below based on information from the lab introduction, exercises, course text, and lecture.  If time is short, you may want to leave some of the tables to fill out outside of section.

    Table A.
    Columns: Vector | Raster | TIN

    Briefly describe the essential characteristics of each.
        Vector:
        Raster:
        TIN:

    Include the types of data generally represented (i.e. continuous or discontinuous).
        Vector:
        Raster:
        TIN:

    Give an example of a likely geographic feature that would be represented.
        Vector:
        Raster:
        TIN:

    Table B.
    Columns: Geodatabases | Coverages | Shapefiles

    Historic Software Origin:
        Geodatabases: ArcInfo8
        Coverages:
        Shapefiles:

    How the data is stored in the computer (i.e. does the data need to be in a special type of folder?  What files are required for the data model?)
        Geodatabases:
        Coverages:
        Shapefiles: No special folder for storage.  Three files containing spatial and attribute data are required; there may be other files with index information.

    In what type of files are the attributes stored?
        Geodatabases:
        Coverages: INFO files
        Shapefiles:

    Describe the topological features in each data model
        Geodatabases: Allows for topological feature classes, geometric networks.  Polygon topology implemented through on-the-fly topological editing.
        Coverages:
        Shapefiles:

    What type of data can be created in each data model?
        Geodatabases:
        Coverages: Points, arcs, lines, linear measurement system, polygon, regions, tics, nodes, annotation
        Shapefiles:

    ArcGIS Help

    ArcGIS Help works like any Windows program help section. This is THE MOST important resource you will have for this class and in the future, so read it and learn how to use it. Go to Menu Bar -> Help -> ArcGIS Help.

    When you're looking for something in ArcGIS Help, make sure to search in both the Index tab and the Search tab. Trying the search with different terms (e.g., data models, or coverage, or geodatabase) increases the odds of finding something useful.  support.esri.com and the GIS Dictionary are also excellent resources.


    Question 5:
    Use ArcGIS Help to look up coverages and answer the following questions.

    a) List the feature classes that a coverage can contain.

    b) What is the purpose of an INFO table?

    c) What are tic points?

    d) What is planar topology?

    Use ArcGIS Help to look up shapefiles and answer the following questions.
    e) How many feature classes can a shapefile use?

    f)  Do shapefiles have planar topology?


    4.2 Mystery Models

    Copy the mystery and sb directories onto your removable disk or to your Lab_2 folder in C:\Workspace if you have not done so already.  Connect to this folder in ArcCatalog and examine the layers in the folder mystery.
     
    Question 6:
    What are the data models for each of the layers? What geographic feature does each layer seem to represent? 
    (be as specific as possible)

    mystery1 -- 
    mystery2 -- 

    mystery3 -- 

    mystery4 --

    mystery5 -- 

    mystery6 -- 

    mystery7 -- 

    mystery8 -- 

    Once you have identified the layers and their data models, convert mystery5 into the same data model as mystery2. You will have to figure out how to do this yourself, but here are some hints:
     
    Converting Between Data Models
  • You will have to use ArcToolbox to accomplish this task.  Recall that you can open ArcToolbox by clicking on the ArcToolbox button in either ArcCatalog or ArcMap.
  • Find the toolbox menu that would contain the appropriate tools, then the sub menu for converting data in mystery5's data model, then the tool that will let you convert to mystery2's data model. You should be able to figure out which layer to use as input. Recall that you can drag-and-drop from ArcCatalog instead of typing or browsing.

    Be sure to UNcheck the "Simplify" option. Use the defaults for everything else.
  • Specify the output directory, give the file a name you will remember, and run the conversion.

    Take your resulting layer and display it in ArcMap, along with mystery5.
     
    Question 7:
    a) How similar are mystery5 and your converted layer?

    b) Briefly describe the major differences between the two. What is the cause of them? 

    c) What source data was used to make mystery5?

        Go to the directory sb.

    Now, add sbcontour, sbdem, and sbtin into ArcMap.  Display just sbcontour and sbtin, and overlay sbcontour on top of sbtin.  To make the display intelligible, you will have to change the properties for the two layers.
     
    Changing Layer Properties in ArcMap

         Make the sbtin layer display on top of the DEM by dragging layers up or down in the Table of Contents which contains a list of the layers in your map.  To change the Properties of a layer in ArcMap, right-click on sbtin in the legend and go to Properties. Double-clicking on it will also work.

    • You get a large window with many tabs, like this:
    • Go to the Display tab. 
    • Change the transparency of sbtin so that the DEM raster can be seen underneath it, and click OK


       

    If you're curious about making better use of Properties, the main methods are the creation of Layers in ArcCatalog, and ArcMap's Style Manager, found in the Menu Bar under Tools -> Styles -> Style Manager.

    You will be repeating these steps to change a layer's properties many times throughout the quarter.  You will probably find the Properties functions very useful. ArcMap's Style Manager is an easier way to manipulate layer properties that we will learn about later on in the quarter but feel free to experiment with it.
     
    Question 8:
    Which of the three layers (sbdem, sbtin, sbcontour) do you think was the original data layer?  Which is "second generation" and which is "third generation"?  Why do you think this?

    4.3 Data Models and ArcToolbox

    ArcGIS continues to draw on a variety of data models and formats for its functionality. The ArcToolbox tools reflect this, requiring different data types for input depending on the analysis and data management tasks at hand. This situation can be a little confusing, but once you gain some exposure to how the tools are categorized and the patterns of the ways they ask for inputs and function settings, you will quickly be able to navigate through their use.

    To familiarize yourself with ArcToolbox and the input formats various tools require, find each tool listed below and figure out what kind of input file(s) it supports (e.g., coverage, geodatabase feature class, grid, TIN, etc.)

    Finding and Examining Tools
    • Again, recall that you can open ArcToolbox by clicking on the ArcToolbox button in ArcCatalog or ArcMap.
    • If you can't find a particular tool in ArcToolbox, use the Search tab at the bottom of the ArcToolbox window to search by name or description.
    • Important: If you click on a tool and get an error message referring to a license problem, you need to switch to ArcMap to be able to open it. Some tools are unfortunately module-specific and cannot be opened from every ArcGIS module.  In some cases, you may just need to enable ArcGIS Desktop extensions if they have not been enabled already.  From the Tools drop down menu near the top of ArcMap or ArcCatalog, choose Extensions and check all of the boxes except for the Data Interoperability extension.  UCSB has licenses for all of the extensions except for this one. 
    • Every time you click on a tool name, a short description displays in the bottom of the Toolbox window. For more information on a tool, double-click it to open it and click Show Help near the bottom right of the dialog box if it is not already showing.  You can now click your mouse in any of the input boxes to read a description of what the tool is looking for. 
    • From ArcToolbox, you can also right-click any tool and choose help from the context menu to open up the ArcGIS Desktop help system for that particular tool.

    Question 9:
    Find each of these tools and determine what data model type(s) (or perhaps other file types) it takes as input:

    a) Clip, Select, Intersect, & most other Analysis Tools (all the same answer)
    b) Viewshed
    c) Buffer
    d) Add Spatial Index
    e) Float to Raster
    f) Feature Class to Geodatabase
    g) Raster to Other Format
    h) Export to Interchange File
    i) Join Info Tables
    j) Create Labels


    4.4 AATs & PATs

    As discussed above, coverages were the standard vector data model in earlier releases of Arc/INFO. With the release of ArcGIS 8, all of the modules of Arc/INFO (Arc, ArcEdit, Grid, Tables, ArcPlot, INFO, etc.) were fully integrated, and the new geodatabase model was launched and promoted. However, coverages have been in use for such a long time that you will undoubtedly encounter them.

    Recall that coverages employ the georelational database model. The INFO part of Arc/INFO was the relational database manager for earlier versions of the Arc software (Arc was the name given to the mapping component). An INFO file is a table that stores the information associated with the geographic features of a spatially referenced dataset. This gives a GIS the ability to manipulate information both spatially and via standard tabular database functions. In a relational model, for example, two tables are related through a column they share. In a georelational model the individual records in two or more tables are related through their location in space. The polygon coverage below serves as a simple example of this concept. The common column is often called the KEY (or ID) column and is used to relate the tables and features.
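
The key-column idea can be sketched in Python as two separate tables joined on a shared ID. The table contents and the names `geometry_table`, `attribute_table`, and `join_on_key` are invented for illustration; they do not reproduce the actual coverage or INFO file formats.

```python
# Geometry is stored separately from the attributes; a shared key links them.
geometry_table = {
    1: [(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)],   # boundary of polygon 1
    2: [(2, 0), (4, 0), (4, 2), (2, 2), (2, 0)],   # boundary of polygon 2
}

attribute_table = [
    {"id": 1, "name": "Parcel A", "landuse": "residential"},
    {"id": 2, "name": "Parcel B", "landuse": "commercial"},
]


def join_on_key(geometries, attributes, key="id"):
    """Attach each attribute row to the geometry that shares its key value."""
    joined = []
    for row in attributes:
        feature = dict(row)                       # copy so the input is not modified
        feature["geometry"] = geometries[row[key]]
        joined.append(feature)
    return joined


features = join_on_key(geometry_table, attribute_table)
for feature in features:
    print(feature["id"], feature["name"], len(feature["geometry"]), "coordinate pairs")
```

Because the two tables only meet through the key, an attribute row with no matching geometry (or the reverse) breaks the link. That is one reason the key column must be kept consistent when tables are edited.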
     

    Figure 4. Diagram showing the coverage data structure for storing vector data.


    Let's explore an attribute table that is part of the roads coverage. Go to ArcCatalog and Preview the data.
     
    Previewing Tables
    • Below the preview map, locate the Preview box: 
    • Change the preview option from Geography to Table.
    • You are now looking at the arc attribute table (AAT). 
    Answer the question below.

    Question 10:
    a) How many records are there?

    b) What do FNODE# and TNODE# mean?

    c) What other attribute information can you recognize or guess at in the table (pick 3 columns)?

    For a look at polygons and Polygon Attribute Tables (PATs), open cacounty.  Explore the tables for the arc, polygon, and region.cty coverage feature classes.
     
    Sorting a Column in Table Preview
    • Click on any column heading in the table you wish to sort. This should highlight the column.
    • Right-click and choose Sort Ascending or Sort Descending (as appropriate).

    Question 11:
    a) How many counties are there in California?

    b) Why do the AAT, PAT, and RAT have different numbers of records?

    c) Explain the relationship between arc, polygon, and region.cty in this coverage.

    d) What are the label and tic feature classes for?

    Hints:  To figure out the answers, you will need to examine several of the tables. In addition, you might want to use the Identify Tool in the Geography Preview to query a few of the features. Also read Help.

    Map for Lab 2: 
         Make a map of the greater Santa Barbara metropolitan region with the roads coverage overlaid on the contour coverage.  You will have to choose appropriate properties for the two themes so that map readers can tell them apart. Also, make sure you follow the basic principles of cartography outlined in Lab 1.  Export your map to a pdf, jpeg, gif, or bmp.


    5.0 Conclusion

    In this lab, you have gained a basic understanding of geographic data models and data modeling, and the primary data models used in ESRI's ArcGIS 9.2 software.  You have seen how the ESRI data models resemble and differ from each other, and how each has advantages and disadvantages for certain purposes.  You have gained further experience with some basic ArcGIS 9 skills, such as changing properties and using the help functions.

    6.0 Additional Reading

    Bernhardsen, Tor.  Geographic Information Systems: An Introduction.  New York: John Wiley & Sons, Inc., 1999, pp. 37-99.

    Booth, Bob.  Getting Started with ArcInfo.  Redlands, CA: ESRI Press, 1999, pp. 45-56.

    Minami, Michael, Sakala, Michelle, and Wrightsell, Jennifer.  Table: "Comparing the structure of vector datasets."  In Using ArcMap.  Redlands, CA: ESRI Press, 1999, p. 403.

    Zeiler, Michael.  Modeling Our World: The ESRI Guide to Geodatabase Design.  Redlands, CA: ESRI Press, 1999, pp. 1-199.

    Online Sources:
         W01 Lecture 1: OVERVIEW OF ARCINFO 8 (Arc8 Data models)

          LECTURE2: REPRESENTATION

          LECTURE3: STRUCTURE OF ARCINFO 8

          AGI dictionary definition of data model
          FOLDOC definition of data model



    7.0 To turn in

    • The question sheet, with typed answers (Word document)
    • One map of Santa Barbara roads and contours

    Created by Sean Benison, Sunhui Sim, and Jordan Hastings

    Based on previous lab by Nicholas Matzke, Sarah Battersby and Jeff Hemphill
    UC Santa Barbara, Department of Geography

    © 2000-2007 Regents of the University of California

    This page was last modified on January 21, 2008 by Indy Hurt