GIS/EM4 - Implementation of cellular automata models in a raster GIS dynamic modeling environment

Implementation of cellular automata models
in a raster GIS dynamic modeling environment:

an example using the Clarke Urban Growth Model


Matthew J. Ungerer

Abstract

This paper discusses some of the issues involved in implementing a cellular automaton (CA) model of urban growth in a fully integrated raster Geographic Information System (GIS) dynamic modeling environment. CA models are dynamic spatial models in which the basic unit is the cell, situated in a two-dimensional plane, and are thus very similar to raster GIS in many respects. Work by Batty, White and others has shown that CA models are a simple and effective means of creating portable models of urban growth. The Clarke Urban Growth Model is a CA model which has been calibrated and used to predict urban growth in several urban areas. Using only the built-in functions of the PCRaster GIS package, an approximation of the Clarke Urban Growth Model can be implemented in about 200 lines of code. The GIS model has the same input maps and growth parameters and very similar growth rules, and it produces the same output files as the original C-code version. Thus the output files from the two models can be compared to examine whether the models are functionally equivalent. The C-code version and the PCRaster version of the Clarke Urban Growth Model were each run ten times for a period of 20 years using the same input maps and model parameters. The last output file (i.e. year 20) from each of the ten runs of the C-code version was compared to the last output file from each of the ten runs of the PCRaster version. The results of the comparison indicate that the output files from the two versions of the model do differ statistically. However, a visual comparison of the output files in animation format shows that they are remarkably similar.
Modeling using a dynamic modeling language in a GIS does not give the modeler the same degree of control as an ordinary computer language, but has several advantages including a reduced development time, enhanced ease of use, and access to the functionality of the GIS for display and file management. A dynamic modeling language tightly coupled with a raster GIS offers an integrated environment that makes it relatively simple to implement cellular automata models, as well as other models of continuous phenomena which vary in time as well as space.

Keywords

Cellular automata, human-environment interactions, urban development impacts, GIS, dynamic model, integrated modeling

Introduction

GIS has great potential as an environment for the creation of dynamic models of physical environmental processes. Many models with explicitly spatial parameters or inputs which are currently implemented in modeling languages such as STELLA (process modeling and analysis) or CELLANG (cellular automata), or written in a standard computer language like C, C++ or Java, could easily be created directly in a GIS with dynamic modeling capabilities. Unfortunately, most contemporary GIS packages are notably deficient in providing built-in dynamic modeling capabilities (for a discussion, see Fedra 1993, Nyerges 1993, Van Deursen 1995, Peuquet 1999). In this paper I describe the implementation of the Clarke Urban Growth Model in the high-level modeling language of PCRaster, a dynamic GIS package, and compare the original version of the model, written in C, with the version implemented in PCRaster. In addition, I describe some of the issues surrounding dynamic modeling in GIS. Throughout this paper I will refer to the compiled C-code version of the Clarke Urban Growth Model as the UGM, or simply UGM, and the PCRaster script version as PCR.

Background

Urban growth is a familiar phenomenon to most people today: during this century we have seen the boundaries between cities on the eastern seaboard of the United States slowly disappear as a massive urban sprawl has taken shape. This has resulted in a significant increase in research which attempts to model urban growth. Research on GIS as a means to store and query large spatially-enabled databases has increased over the same period. Given the spatial nature of urban growth models and the sophistication of today's GIS packages, it is not surprising that an integration of urban growth models and GIS is occurring.

Modeling in a GIS Environment

Four main strategies exist for coupling dynamic models with GIS, and these span a range from low to high integration: isolated, loose coupling, tight coupling, and full integration (Chou and Ding 1992, Goodchild et al. 1992, Raper and Livingstone 1993, Fedra 1993, Nyerges 1993, Peuquet 1999). The model is considered isolated if it runs independently of a GIS and integrated if it is run using the GIS. Between these extremes, models may be coupled using strategies which include the exchange of transfer files between the model and the GIS and/or the embedding of model commands within the GIS. There are tradeoffs unique to each of the coupling strategies (see Van Deursen 1995), but a high level of integration is often preferable because it is flexible and adaptable to changes in model inputs and parameters and because it involves no computational overhead in file conversion. Researchers have worked on the integration of GIS and CA models with both loose and tight coupling strategies, noting that integrating GIS with CA modeling functions could serve to enhance the dynamic modeling capabilities of current GIS packages (Park and Wagner 1997).

Cellular automata and the Clarke Urban Growth Model

The Clarke Urban Growth Model is a cellular automaton model which simulates urban expansion over time (Clarke et al. 1997, Clarke and Gaydos 1998). Cellular automata were first developed in the 1940s and 1950s by two mathematicians working at the Los Alamos National Laboratory, Stanislaw M. Ulam and John von Neumann (von Neumann 1966). CA are dynamic mathematical systems based on discrete time and space. The system is arranged on a regular lattice of cells, each of which has a value or state. At each time step a rule or a series of rules is applied to the entire lattice of cells and each cell’s state is updated. The state of a cell at time step t is determined by the state of the cell at time step t-1 and the state of the cell’s neighbors at time step t-1. The neighborhood of any given cell is usually defined as the eight cells which share an edge or a corner with the cell on the lattice. While the transition rules from one state to the next may be simple in comparison with a model based on differential equations, the behavior exhibited by CA systems may be quite complex. Figure 1 shows a familiar example of a cellular automaton model, John Conway's "Game of Life" (Gardner 1971).

Figure 1: The most famous example of a cellular automaton model, John Conway's "Game of Life" (Gardner 1971), shown with an "explosion" input layer. This animation is from a PCRaster implementation of "Life".
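The update scheme described above can be illustrated with a minimal Python sketch of one "Life" time step. This is not the PCRaster script used for Figure 1. The function name life_step, the list-of-lists lattice and the choice to treat cells beyond the edge of the lattice as dead are assumptions made for the illustration.

```python
def life_step(grid):
    """Apply one synchronous Game of Life update to a 2-D list of 0/1 cells."""
    rows, cols = len(grid), len(grid[0])
    new_grid = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count live cells among the eight surrounding cells.
            # Assumption: cells beyond the edge of the lattice are dead.
            live = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if (dr, dc) == (0, 0):
                        continue
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        live += grid[rr][cc]
            # Conway's rules: a live cell survives with 2 or 3 live neighbors,
            # a dead cell becomes live with exactly 3 live neighbors.
            if grid[r][c] == 1:
                new_grid[r][c] = 1 if live in (2, 3) else 0
            else:
                new_grid[r][c] = 1 if live == 3 else 0
    return new_grid


# A "blinker": three live cells in a row oscillate with period two.
blinker = [
    [0, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]
after_one = life_step(blinker)    # the row becomes a column
after_two = life_step(after_one)  # and returns to the original row
```

Note that every new state is computed from the lattice at the previous time step and written to a separate lattice, so the order in which cells are visited does not affect the result.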

Modeling urbanization and land use transition as formal cellular automaton models began with the work of White and Engelen, who examined the fractal nature of urban areas and developed a CA model of land use transition which they ran on data from four U.S. cities (1993). Batty and Longley have also used a somewhat similar approach, called diffusion-limited aggregation, to model urban expansion (1994).
The Clarke Urban Growth Model builds upon this previous work to create a unique and very complex CA model of urban growth and land use transition. The model takes multiple input sources, including topography, road networks, existing urban areas and areas of urban growth exclusion, and applies a series of growth rules. There are three growth functions: spontaneous growth occurs by the formation of randomly located new urban areas, organic growth occurs as a cellular-automata-like expansion of existing urban areas, and road gravity growth simulates the tendency of urban areas to grow around the transportation network. At each time step each of the three growth rules is applied to the current map of urban areas. Five parameters control the growth functions: breed, spread, road gravity, diffusion and slope resistance. In the original model these parameters are estimated by a brute-force calibration method which estimates the value of each parameter from known past growth patterns. For the PCRaster model implementation the values of these five parameters are assumed to be known, and can be changed directly in the model script. The UGM also has the capability to self-modify the model parameters, but PCR was not implemented in this manner (although this could be done). Finally, the UGM has been extended to model land-use transitions, but the current version of PCR lacks these capabilities.
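To make the structure of a growth step concrete, the following Python sketch applies a heavily simplified spontaneous growth rule followed by a simplified organic growth rule. It is neither the UGM nor the PCR code. The function name growth_step, the pairing of the diffusion parameter with spontaneous growth and of the spread parameter with organic growth, the two-neighbor threshold and the percentage scaling are all assumptions made for the illustration, and road gravity, breed and slope resistance are omitted entirely.

```python
import random

URBAN = 1


def growth_step(urban, excluded, spread, diffusion, rng):
    """One simplified time step: spontaneous growth, then organic growth."""
    rows, cols = len(urban), len(urban[0])
    new_urban = [row[:] for row in urban]

    # Spontaneous growth (assumed form): try `diffusion` randomly chosen
    # cells and urbanize each one that is not excluded.
    for _ in range(diffusion):
        r, c = rng.randrange(rows), rng.randrange(cols)
        if not excluded[r][c]:
            new_urban[r][c] = URBAN

    # Organic growth (assumed form): a non-urban, non-excluded cell with at
    # least two urban neighbors at the previous time step urbanizes with
    # probability spread / 100.
    for r in range(rows):
        for c in range(cols):
            if urban[r][c] == URBAN or excluded[r][c]:
                continue
            neighbors = sum(
                urban[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            )
            if neighbors >= 2 and rng.random() < spread / 100:
                new_urban[r][c] = URBAN
    return new_urban


# Hypothetical 3 x 3 example: an urban top row and one excluded cell.
seed_map = [[1, 1, 1], [0, 0, 0], [0, 0, 0]]
exclusion_map = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
next_map = growth_step(seed_map, exclusion_map, spread=100, diffusion=0,
                       rng=random.Random(0))
```

With spread set to 100 and diffusion set to 0 the step is deterministic: the two non-excluded cells below the urban row become urban and the excluded cell does not.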

Methods

I ran the UGM and PCR models ten times each for a period of 20 years using the same parameters (see Table 1). The UGM was run in test mode, rather than predict mode. The year-20 output of each model run constitutes one observation, giving a sample size of n = 10 for each model. In order to compare the output of the two models, I first calculated the following measures for each run of the UGM and PCR models using the Fragstats analysis package: total area, number of urban growth patches, mean patch area, total perimeter of patches, an index of landscape shape (the perimeter-to-area ratio for the whole landscape) and an index of mean shape (the average perimeter-to-area ratio). Each measure was then compared between the two models using a Student’s t-test with the null hypothesis of equal means. The model output was also visually compared.
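As an illustration of what the patch measures capture, the following Python sketch computes the number of patches, the mean patch area and the total edge length for a small binary urban map. It is a simplified stand-in and not the Fragstats algorithm. The function name patch_metrics, the use of an eight-neighbor rule to delineate patches and the counting of the map boundary as edge are assumptions made for the illustration.

```python
from collections import deque


def patch_metrics(grid, cell_size=1.0):
    """Return patch count, mean patch area and total edge for a 0/1 grid."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    areas = []
    total_edge = 0.0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1:
                continue
            # Each side of an urban cell that faces a non-urban cell or the
            # map boundary contributes one cell side to the total edge.
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if not (0 <= rr < rows and 0 <= cc < cols) or grid[rr][cc] != 1:
                    total_edge += cell_size
            if seen[r][c]:
                continue
            # Flood fill over the eight-cell neighborhood to collect one patch.
            seen[r][c] = True
            queue = deque([(r, c)])
            cells = 0
            while queue:
                pr, pc = queue.popleft()
                cells += 1
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = pr + dr, pc + dc
                        if (0 <= rr < rows and 0 <= cc < cols
                                and grid[rr][cc] == 1 and not seen[rr][cc]):
                            seen[rr][cc] = True
                            queue.append((rr, cc))
            areas.append(cells * cell_size ** 2)
    count = len(areas)
    mean_area = sum(areas) / count if count else 0.0
    return {"patches": count, "mean_area": mean_area, "total_edge": total_edge}


# Hypothetical map: an L-shaped patch and a two-cell diagonal patch.
example_map = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
metrics = patch_metrics(example_map)
```

Under a four-neighbor rule the two diagonal cells would count as separate patches, so the choice of neighborhood rule directly affects the patch count and the mean patch area.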

Parameter	Value
Breed	50
Spread	30
Road Gravity	10
Diffusion	4
Slope Resistance	20

Table 1. Summary of model parameters.

Results

Table 2 lists the results of a statistical comparison of output from the Clarke Urban Growth Model written in C (UGM) and from the PCRaster version (PCR). The reported p value is from a Student’s t-test with the null hypothesis of equal means. Every p value is below 0.05, so the UGM output differs significantly from the PCR output for each statistic that was analyzed.

	Total Area (ha)	No. Patches	Mean Patch Size (ha)	Total Edge (m)	Landscape Shape Index	Mean Shape Index
Mean (UGM)	7.44	479.00	0.02	15,824.00	15.58	1.04
Mean (PCR)	7.86	517.89	0.02	17,073.33	16.30	1.04
Std. Dev. (UGM)	0.04	411.11	0.00	239,826.67	0.07	0.00
Std. Dev. (PCR)	0.02	156.84	0.00	106,100.00	0.03	0.00
p value	0.00	0.00	0.00	0.00	0.00	0.04

Table 2. Summary of statistics for comparison of UGM and PCR.
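For reference, the test statistic behind the p values can be sketched in a few lines of Python. This is the textbook pooled-variance (Student's) two-sample t statistic and not the software used for the analysis. The function name pooled_t_statistic is an assumption, and the sample values are hypothetical and are not taken from the model runs.

```python
import math
import statistics


def pooled_t_statistic(sample_a, sample_b):
    """Return the pooled-variance two-sample t statistic and its degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # statistics.variance is the sample variance (divides by n - 1).
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    t = (mean_a - mean_b) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df


# Hypothetical measurements for two groups of runs (not the paper's data).
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 3.0, 4.0, 5.0, 6.0]
t_value, degrees_of_freedom = pooled_t_statistic(group_a, group_b)
```

The p value is then read from the t distribution with the returned degrees of freedom, for example from a statistical table or from a library routine such as scipy.stats.ttest_ind.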

Figures 2 and 3 show the output in year 20 from UGM and PCR, respectively. Although the two models differ statistically on every measure that was compared, these figures show that their outputs are visually very similar.


Figures 2 and 3: An example of model output from UGM (left) and PCR (right). Simulations were each run for 20 years, using the same input data and model parameters. Legend: Gray = new urban growth, Red = urban seed cell, Green = excluded area, Black (PCR only) = road, Blue = other

Discussion

The two models differ in terms of output and execution time, but this is not particularly surprising upon examination of the implementation details.
There are several possible reasons for the statistical differences in model output. First, PCR differs from UGM because certain UGM functions have no equivalent in the PCRaster dynamic modeling language. For example, it is not possible to specify a random walk of a given length along a road in PCRaster, although approximations of this can be made. Second, because of the randomness involved in a CA model and the fact that the two versions use different random number generators, some variation between the models is expected. Finally, the small sample size used for comparison of the two models (n=10) could be influencing the results of the statistical tests to some degree.
Certain UGM functions must be approximated by using PCRaster functions in ways for which they were not intended, causing a large increase in execution time. Choosing a random neighbor of a cell, which is done a total of five times per time step in PCR, involves a call to PCRaster’s ldd (local drainage direction) function. The ldd function is very computationally expensive, causing PCR to perform at least 100 times slower than UGM on the same data set. Fortunately, the PCRaster package includes the ability to add user-defined functions in the form of dynamic libraries written in C, so the missing functions could be added. The availability of the UGM source code, written in C, a language compatible with the PCRaster function API, makes that task even easier.
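In a general-purpose language, choosing a random neighbor of a cell is a small and cheap operation, which helps to explain the gap in execution time. The following Python sketch shows one way to do it. It is not the UGM routine and not a PCRaster function. The function name random_neighbor and the decision to discard neighbors that fall outside the map are assumptions made for the illustration.

```python
import random

# Offsets of the eight cells surrounding a cell on a regular lattice.
NEIGHBOR_OFFSETS = [
    (-1, -1), (-1, 0), (-1, 1),
    (0, -1),           (0, 1),
    (1, -1),  (1, 0),  (1, 1),
]


def random_neighbor(row, col, rows, cols, rng):
    """Return a uniformly chosen neighbor of (row, col) that lies inside the map."""
    candidates = [
        (row + dr, col + dc)
        for dr, dc in NEIGHBOR_OFFSETS
        if 0 <= row + dr < rows and 0 <= col + dc < cols
    ]
    # Assumption: the map has at least two cells, so candidates is never empty.
    return rng.choice(candidates)


# Example: pick a neighbor of the upper-left corner of a 3 x 3 map.
rng = random.Random(1)
neighbor = random_neighbor(0, 0, 3, 3, rng)
```

A corner cell has only three neighbors inside the map, so the selection is made among those three rather than among all eight offsets.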

Conclusion

It is useful to generalize the comparison of the UGM and PCR models into a discussion of the advantages and disadvantages of models written in a standard computer or modeling language and integrated GIS models. Integrated GIS models can be comparatively short: PCR is only about 200 lines of code compared to thousands of lines for UGM. This drastically reduces the time necessary to implement the model. One disadvantage of integrated GIS models is that they may take much longer to run than standard computer or modeling language models. PCR takes an hour or more to predict 20 years into the future, as compared to less than a minute for the compiled UGM. Also, when modeling in an integrated GIS environment the modeler might not have access to all of the appropriate functions. This leads to approximations which may be a further abstraction from the actual process being modeled. If the functions are available in the integrated GIS modeling environment then the modeling task is greatly simplified. Even if not all functions are available, an integrated GIS offers a modeling environment which may be suitable for quick prototype creation and demonstration purposes. Provision of an API for adding user-defined functions makes an integrated GIS an even more attractive modeling environment.

Recommendations for future research

One direction for future research on integrated modeling with a GIS is to investigate and identify CA and also more generic modeling functions which may not normally be included in standard raster GIS implementations. Examples of these functions include:

cellular automata and diffusion-limited aggregation functions (e.g. random adjacent cell selection)
spatial neighborhood filters of different size and shape (other than the standard 8 neighbor)
random walk function
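As an indication of what the last of these functions might look like, the following Python sketch performs a random walk of a given number of steps along road cells. It is an illustration only and is not taken from the UGM source. The function name road_random_walk, the eight-neighbor connectivity and the rule that the walk stops early when no adjacent road cell exists are assumptions.

```python
import random


def road_random_walk(roads, start, steps, rng):
    """Walk up to `steps` moves along road cells (value 1), starting at `start`."""
    rows, cols = len(roads), len(roads[0])
    r, c = start
    path = [start]
    for _ in range(steps):
        # Collect the adjacent cells (eight-neighbor rule) that are road cells.
        options = [
            (r + dr, c + dc)
            for dr in (-1, 0, 1)
            for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)
            and 0 <= r + dr < rows and 0 <= c + dc < cols
            and roads[r + dr][c + dc] == 1
        ]
        if not options:
            break  # assumption: an isolated road cell ends the walk early
        r, c = rng.choice(options)
        path.append((r, c))
    return path


# Hypothetical road map: a single east-west road across the middle row.
road_map = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
walk = road_random_walk(road_map, start=(1, 0), steps=5, rng=random.Random(2))
```

The walk may revisit cells, since each move is chosen independently among the road cells adjacent to the current position.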

Acknowledgements

I would like to thank Keith Clarke and Mike Goodchild for their assistance in the implementation of the model. Jeannette Candeau provided invaluable aid in translating the original model code, without which this research could not have been accomplished. Finally, I would like to thank the other researchers in the NCGIA at UCSB for their support and guidance.

References used

Batty, M. and Longley, P. 1994. Fractal Cities. London: Academic Press.

Clarke et al. 1997. A self-modifying cellular automaton model of historical urbanization in the San Francisco Bay area. Environment and Planning B: Planning and Design. 24: 237-61.

Fedra, K. 1993. Distributed models and embedded GIS: Strategies and case studies of integration. Second International Conference/Workshop on Integrating GIS and Environmental Modeling. Breckenridge, CO. Sept. 1993.

Nyerges, T. L. 1993. Understanding the scope of GIS: Its relation to environmental modeling. In M. F. Goodchild, B. O. Parks, & L. T. Steyaert, eds. Environmental Modeling with GIS. Oxford University Press.

Peuquet, D. J. 1999. Time in GIS and geographical databases. In P. A. Longley, M. F. Goodchild, D. J. Maguire, D. W. Rhind, eds. Geographical Information Systems: Principles, Techniques, Management and Applications. New York: Wiley, pp. 91-103.

Raper, J. and Livingstone, D. 1993. High level coupling of GIS and environmental process modeling. Second International Conference/Workshop on Integrating GIS and Environmental Modeling. Breckenridge, CO. Sept. 1993.

van Deursen, W.P.A. 1995. Geographical Information Systems and Dynamic Models. Ph.D. thesis, Utrecht University, NGS Publication 190, 198 pp.

von Neumann, J. 1966. Theory of Self-Reproducing Automata. University of Illinois Press, Illinois. Edited and completed by A.W. Burks.

White, R. and Engelen, G. 1993. Cellular automata and fractal urban form: a cellular modeling approach to the evolution of urban land-use patterns. Environment and Planning A. 25:1175-1199.

Gardner, Martin. 1971. On cellular automata, self-reproduction, the Garden of Eden and the game 'Life.' Scientific American.

Author

Matthew J. Ungerer, Graduate Student, Department of Geography
University of California at Santa Barbara, 3611 Ellison Hall, Santa Barbara, CA, USA 93106-4060.
Email: unj@geog.ucsb.edu, Tel: 805-893-8652, Fax: 805-893-8617.