Saturday, September 28, 2013

Scale in GIS: An overview

Goodchild, Michael F. (2011).  Scale in GIS: An overview.  Geomorphology 130 (2011), 5-9.

This review article discusses the relevance of scale, as a disciplinary problem, within GIS and geomorphology.  It covers three general problems of scale, then two methods for conceptualizing phenomena, raster vs. vector data, problems specific to GIS and modeling, and lastly, methods of formalizing scale within various disciplines.

The first problem of scale is semantic.  Scale is used in 3 senses in science:
1)   Cartographers refer to the representative fraction as the parameter that defines the scaling of the earth to a sheet of paper
a.     ratio between map/ground
b.     defines level of detail, positional accuracy
c.      this is undefined for digital data
2) Extent of study area
3) Resolution 
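
To make the first sense concrete, here is a minimal sketch of how a representative fraction turns a distance on paper into a distance on the ground. The function name and the example scales are my own choices for illustration, not anything from the article.

```python
def ground_distance_m(map_distance_cm: float, scale_denominator: int) -> float:
    """Convert a distance measured on a paper map (cm) to a ground distance (m).

    A representative fraction of 1:N means 1 unit on the map stands for
    N of the same units on the ground.
    """
    return map_distance_cm * scale_denominator / 100.0  # 100 cm per metre


# 1 cm on a 1:24,000 sheet
print(ground_distance_m(1.0, 24_000))  # 240.0
# 2 cm on a 1:50,000 sheet
print(ground_distance_m(2.0, 50_000))  # 1000.0
```

This only makes sense for a physical sheet; a digital data set has no fixed "map distance", which is one way to read point c above.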

The second problem (resolution is finite):
1)   The earth’s surface is infinitely complex and could be mapped down to molecules or smaller
2)   In practice, only the largest (usually most relevant) features get mapped
a.     This makes me wonder why the largest features are usually most relevant

The third problem (resolution in processes):
1)   If the process is significantly influenced by detail smaller than the spatial resolution of the data, then the results of analysis/modeling will be misleading (this is the problem of downscaling, which geostatistics seeks to alleviate through the use of an inferred correlogram)
a.     This leads me to wonder if there are any processes that require all sorts of different levels of scale
2)   Most theories are scale-free
a.     so we can’t tell whether a model’s error (all models are imperfect) is due to spatial-resolution effects, the model itself, or both

There are two methods for conceptualizing geomorphological phenomena, which are scale-independent theoretical frames:
1) Discrete objects
a.     the world’s surface is empty like a table top (empty in the sense of being simply flat, not truly empty or without a coordinate system) except for discrete things
b.     things can overlap
c.      good for biological organisms, vehicles, buildings
d.     represented as points, lines, areas, volumes
e.     so these would be better able to map interactions between things (even if those things are not things per se), whereas in the continuous-field view this would be lost/aggregated/generalized, esp. in raster models?
f.      maybe we can make a partial analogy between this and substance ontology?
2) Continuous-field
a.     phenomena expressed as mappings from location to value/class, so every location in space-time has exactly one value of each variable
b.     topography, soil type, air temperature, land cover class, soil moisture
c.      process ontology?
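
A rough sketch of how the two conceptualizations might look as data structures. All of the names here (`DiscreteObject`, `air_temperature_c`) and the numbers are hypothetical, made up only to show the contrast.

```python
from dataclasses import dataclass
from typing import List, Tuple

Location = Tuple[float, float]


@dataclass
class DiscreteObject:
    """A countable thing sitting on an otherwise empty surface."""
    name: str
    geometry_type: str            # "point", "line", "area" or "volume"
    coordinates: List[Location]


# Discrete-object view: a collection of things. They may overlap,
# and most locations hold nothing at all.
objects = [
    DiscreteObject("building", "area",
                   [(0.0, 0.0), (10.0, 0.0), (10.0, 8.0), (0.0, 8.0)]),
    DiscreteObject("vehicle", "point", [(4.0, 3.0)]),  # inside the building footprint
]


# Continuous-field view: a mapping from location to value, so every
# location has exactly one value of the variable.
def air_temperature_c(location: Location) -> float:
    x, y = location
    return 20.0 - 0.01 * x + 0.005 * y  # invented linear trend, illustration only


print(len(objects), "objects;", air_temperature_c((4.0, 3.0)), "deg C at the vehicle")
```

Neither structure says anything about scale yet, which matches the idea that both frames are scale-independent until we pick a representation.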

Both of these can be represented as raster or vector data, but a raster can’t be laid on a curved surface such as the earth’s without distortion
1)   Raster data has resolution explicit in the size of its cells
a.     smaller cells, more of them = better resolution
b.     the intervals between samples define this
c.      in 3-d sampling the vertical dimension is often sampled differently than the horizontals, leading to a difference in resolution
      i.     e.g. a remotely sensed 2-d map + field photos of the vertical dimension
2)   Vector resolution is poorly defined and difficult or impossible to infer a posteriori from the contents of a data set (which matters because most GIS work is done on already existing data sets)
a.     if the data’s taken at irregular points, the distance between points may be used as resolution. How often is the data taken irregularly?
      i.     use the minimum, mean, or maximum nearest-neighbor distance?
b.     when data’s captured as attributes of areas/volumes, it’s represented as polygons/polyhedra
      i.     resolution then involves several things: the infinitely thin boundaries, the density of sampling of the boundary, the within-area/volume variation that’s replaced by homogeneity, and the size of the areas/volumes
      ii.     this creates the impression that vector data has infinite resolution
1.     at finer resolutions the number of areas/volumes increases and their boundaries are given more detail, but each one is more homogeneous
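
For the irregular-points case above, here is a small sketch that computes the minimum, mean, and maximum nearest-neighbor distance as candidate "resolutions". The sample coordinates are invented, and the brute-force search is only meant for small point sets (it needs at least two points).

```python
import math
from typing import List, Tuple


def nearest_neighbor_distances(points: List[Tuple[float, float]]) -> List[float]:
    """Distance from each point to its closest other point (brute force, O(n^2))."""
    distances = []
    for i, (xi, yi) in enumerate(points):
        nearest = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        distances.append(nearest)
    return distances


samples = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0), (5.0, 2.0)]  # made-up sample sites
nn = nearest_neighbor_distances(samples)
print(min(nn), sum(nn) / len(nn), max(nn))  # 1.0 2.0 4.0
```

The three summaries differ by a factor of four even for this tiny set, which suggests why no single number is an obvious choice.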

GIS encounters a problem that applies to many types of geographic data
-measuring the length of a digitized line
-a vector polyline’s length is easily computed as the sum of straight-line segments
-if we assume each sampled point lies exactly on the true line, we will still get an underestimate of the true length, because the true line curves between the sampled points while the segments cut straight across
      -the underestimate is by an amount that depends on sampling density
-this applies to all natural features, slope, land cover
-this reminds me of stats (linear regression) and calc (instantaneous velocity)
-the issue of zooming out and losing detail/generalizing reminds me of “essentialism” from philosophy, & I’m sure the problem of definition comes up elsewhere in GIS (what variables to use, what sampling method)
-the modifiable areal unit problem shows us that aggregation or intersections of mapping elements (data collection and state lines, for example) will change correlations in a manner that is not random variation from alternative samples
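
The line-length underestimate is easy to see numerically. In this sketch I use a unit circle as a stand-in for the "true line" (my own choice, since its length is known exactly), sample points that lie exactly on it, and sum the straight segments. It assumes Python 3.8+ for `math.dist`.

```python
import math
from typing import List, Tuple


def polyline_length(points: List[Tuple[float, float]]) -> float:
    """Sum of straight-line segment lengths between consecutive vertices."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def sample_circle(n: int) -> List[Tuple[float, float]]:
    """n evenly spaced points on the unit circle, first point repeated to close the loop."""
    return [
        (math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
        for k in range(n + 1)
    ]


true_length = 2 * math.pi
for n in (4, 8, 16, 64, 256):
    measured = polyline_length(sample_circle(n))
    print(f"{n:4d} vertices: {measured:.4f} ({measured / true_length:.2%} of true length)")
```

Every measured length falls short of the true circumference, and the shortfall shrinks as the sampling density goes up. A smooth circle converges quickly; a rough natural feature such as a coastline would keep getting longer as more detail is sampled.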

-Mandelbrot tells us that the rate of info loss/gain is constant/predictable through scaling laws and exhibits power-law behavior and self-similarity
             [image: an example of self-similarity]

-geostatistics gives us spatial interpolation to artificially refine the spatial resolution of a point data set, exploiting spatial autocorrelation as summarized by tools such as correlograms and variograms
      when creating an inferred correlogram, does one infer up in levels/“steps” until reaching the desired resolution of the study?
-wavelet analysis (an extension of Fourier analysis) allows the decomposition of field variables to vary spatially in a hierarchical fashion; this sounds very interesting
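
To get a feel for what a variogram measures, here is a minimal sketch of an empirical semivariance calculation. It assumes a regularly spaced 1-D transect with invented values, which is much simpler than the irregular 2-D point sets geostatistics usually deals with, and it is not a method taken from the article.

```python
from typing import List


def semivariance(values: List[float], lag: int) -> float:
    """Empirical semivariance at an integer lag along a regularly spaced transect.

    Half the mean squared difference between all pairs of samples
    that are `lag` steps apart.
    """
    pairs = [(values[i], values[i + lag]) for i in range(len(values) - lag)]
    return sum((a - b) ** 2 for a, b in pairs) / (2 * len(pairs))


transect = [1.0, 2.0, 4.0, 3.0, 5.0, 7.0, 6.0, 8.0]  # made-up samples
for lag in (1, 2, 3):
    print(f"lag {lag}: semivariance {semivariance(transect, lag):.3f}")
```

Semivariance rising with lag means nearby samples are more alike than distant ones. That spatial autocorrelation is what interpolation leans on when it fills in values between samples.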

In light of all of this, I am currently wondering about the study of part-whole interactions in relation to resolution and vector/raster datasets.

1 comment:

  1. Nice article. I appreciate all your own thoughts. In light of your comment "This leads me to wonder if there are any processes that require all sorts of different levels of scale", Emma Davis recently posted an article about a study that used three or four different types of scaling to reach its conclusions. You should definitely look at it.