Mapview: A Visualization Tool for Spatial Outlier Detection
Spatial data mining is the process of discovering interesting, previously unknown, but potentially useful patterns from large spatial datasets. Extracting patterns or rules from spatial datasets is challenging due to the complexity of spatial data types, spatial relationships, and spatial correlation. Spatial outliers are spatially referenced objects whose non-spatial attribute values differ significantly from those of other spatially referenced objects in their neighborhoods. A spatial outlier is a local instability, or an extreme observation with respect to its neighboring values.
Mapview is web-based spatial analysis software designed to facilitate the observation and discovery of spatial outliers in US census data. Mapview supports the visualization of 11 different census attributes and detects local abnormalities using several spatial outlier detection algorithms. Developed in Java, Mapview can be accessed through a web browser, and users can interactively define execution parameters. To overcome deficiencies of the existing spatial outlier detection algorithms, we also propose two new algorithms, Iterative Z-value and Iterative Ratio, which can detect true outliers ignored by the existing algorithms and remove falsely detected spatial outliers.
How to Access the Mapview Application
- You may need to download and install J2SE (Java 2 SDK 1.4.1_06) for Internet Explorer
- Reboot the machine
- Run it from this Web Page: Mapview
Software Architecture: The Mapview system has a three-tier architecture consisting of a graphical user interface (GUI), outlier detection algorithms, and data files. The GUI draws a US map using the geographic coordinates of each county. The outlier detection algorithms receive the user's query from the GUI, compute the spatial outliers from the data files, and send the results back to the GUI for display. There are three data files: polygon, county attribute, and neighborhood relationship.
System Demonstration: Mapview discovers spatial outlier counties and marks them in a distinguishing color. Users can click any county to view its attribute value and those of its neighboring counties. In Figure 1, Fairfax County in Virginia is selected and its population density is displayed, along with the population densities of its six neighboring counties. Before running the outlier detection algorithm, the system asks the user for the number of spatial outliers to detect. In Figure 2, the outlier counties are identified and marked in blue. Their attribute values and the attribute values of their neighboring counties are displayed in another window.
Figure 1: Population Densities of All Counties in the U.S.A.
Figure 2: The 20 Detected Spatial Outliers.
Supported Spatial Outlier Detection Algorithms
Scatterplot is a graph-based outlier detection method. It plots attribute values on the X-axis and the average of the attribute values in the neighborhood on the Y-axis. A least-squares regression line is fitted to the points, and nodes far away from the regression line are flagged as spatial outliers.
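The Scatterplot idea can be sketched in a few lines of Python. This is only an illustration, not Mapview's Java implementation: the toy chain of nine regions, the function names, and the choice of vertical distance from the fitted line as the measure of "far away" are all assumptions made for the example.

```python
# Toy data (hypothetical): nine regions in a chain; values rise steadily except "E".
values = {"A": 0, "B": 10, "C": 20, "D": 30, "E": 0, "F": 50, "G": 60, "H": 70, "I": 80}
order = list(values)
neighbors = {
    k: [order[j] for j in (i - 1, i + 1) if 0 <= j < len(order)]
    for i, k in enumerate(order)
}

def neighbor_mean(values, neighbors, key):
    """Average attribute value over the neighbors of `key`."""
    near = neighbors[key]
    return sum(values[n] for n in near) / len(near)

def scatterplot_outliers(values, neighbors, top_k):
    """Return the top_k nodes farthest from the least-squares line.

    X is the attribute value, Y is the neighborhood average.
    Assumes the attribute values are not all identical.
    """
    keys = list(values)
    xs = [values[k] for k in keys]
    ys = [neighbor_mean(values, neighbors, k) for k in keys]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Ordinary least-squares fit of y = slope * x + intercept.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Assumption: "distance" is the vertical distance to the fitted line.
    distance = {
        k: abs(y - (slope * x + intercept)) for k, x, y in zip(keys, xs, ys)
    }
    return sorted(keys, key=lambda k: distance[k], reverse=True)[:top_k]

print(scatterplot_outliers(values, neighbors, top_k=1))  # ['E']
```

Ranking by distance and returning the top k mirrors the way Mapview asks the user for the number of outliers; a fixed distance threshold would work equally well in this sketch.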
Moran Scatterplot is a plot of the normalized attribute value against the neighborhood average of the normalized attribute values. It contains four quadrants. The upper left and lower right quadrants indicate a spatial association of dissimilar values: low values surrounded by high-value neighbors and high values surrounded by low-value neighbors. Spatial outliers can be identified from these two quadrants.
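A minimal sketch of the quadrant test follows, assuming that "normalized" means subtracting the mean and dividing by the (population) standard deviation. The toy chain of regions and the function name are hypothetical and are not taken from Mapview.

```python
from statistics import mean, pstdev

# Toy data (hypothetical): nine regions in a chain; values rise steadily except "E".
values = {"A": 0, "B": 10, "C": 20, "D": 30, "E": 0, "F": 50, "G": 60, "H": 70, "I": 80}
order = list(values)
neighbors = {
    k: [order[j] for j in (i - 1, i + 1) if 0 <= j < len(order)]
    for i, k in enumerate(order)
}

def moran_scatterplot_outliers(values, neighbors):
    """Return nodes in the upper-left or lower-right quadrant, with a label."""
    data = list(values.values())
    mu, sigma = mean(data), pstdev(data)
    z = {k: (v - mu) / sigma for k, v in values.items()}
    flagged = {}
    for k in values:
        # Neighborhood average of the normalized values (the Y coordinate).
        lag = mean(z[n] for n in neighbors[k])
        if z[k] < 0 < lag:
            flagged[k] = "low-high"   # upper-left quadrant
        elif lag < 0 < z[k]:
            flagged[k] = "high-low"   # lower-right quadrant
    return flagged

print(moran_scatterplot_outliers(values, neighbors))
# {'E': 'low-high', 'F': 'high-low'}
```

On this toy data the low value at "E" also pulls down the neighborhood average of "F", so "F" lands in the lower-right quadrant even though its own value fits the overall trend.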
Z-value Approach calculates the standardized difference between the attribute value of a point and the average attribute value of its neighbors. Points whose standardized difference exceeds a pre-defined threshold are flagged as spatial outliers.
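The following sketch shows one way to compute the standardized difference. It assumes the difference is standardized against the mean and population standard deviation of all differences and compared in absolute value; the toy data and names are hypothetical.

```python
from statistics import mean, pstdev

# Toy data (hypothetical): nine regions in a chain; values rise steadily except "E".
values = {"A": 0, "B": 10, "C": 20, "D": 30, "E": 0, "F": 50, "G": 60, "H": 70, "I": 80}
order = list(values)
neighbors = {
    k: [order[j] for j in (i - 1, i + 1) if 0 <= j < len(order)]
    for i, k in enumerate(order)
}

def z_value_outliers(values, neighbors, threshold):
    """Flag nodes whose standardized neighborhood difference exceeds threshold."""
    keys = list(values)
    # Difference between each value and the average of its neighbors.
    diff = {k: values[k] - mean(values[n] for n in neighbors[k]) for k in keys}
    diffs = list(diff.values())
    mu, sigma = mean(diffs), pstdev(diffs)
    if sigma == 0:
        return []  # every node matches its neighborhood equally well
    # Assumption: the absolute standardized difference is compared.
    return [k for k in keys if abs(diff[k] - mu) / sigma > threshold]

print(z_value_outliers(values, neighbors, threshold=2.0))  # ['E']
```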
Iterative Z-value Approach is one of the two proposed algorithms. The key idea of the iterative approach is to detect spatial outliers one by one. After an outlier is detected, its attribute value is replaced with the average attribute value of its neighbors before the next iteration begins.
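The detect-then-replace loop can be sketched as follows. This is an illustrative reading of the description above, not the published algorithm verbatim: the ring of eight regions, the function name, and the stopping rule (a requested number of outliers `m`) are assumptions.

```python
from statistics import mean, pstdev

# Toy data (hypothetical): eight regions in a ring; "c3" and "c7" stand out.
names = [f"c{i}" for i in range(8)]
values = dict(zip(names, [10, 10, 10, 100, 10, 10, 10, 40]))
neighbors = {names[i]: [names[i - 1], names[(i + 1) % 8]] for i in range(8)}

def iterative_z_outliers(values, neighbors, m):
    """Detect up to m outliers one at a time, smoothing each before the next pass."""
    current = dict(values)  # work on a copy so the caller's data is untouched
    found = []
    for _ in range(m):
        avg = {k: mean(current[n] for n in neighbors[k]) for k in current}
        diff = {k: current[k] - avg[k] for k in current}
        diffs = list(diff.values())
        mu, sigma = mean(diffs), pstdev(diffs)
        candidates = [k for k in current if k not in found]
        if sigma == 0 or not candidates:
            break  # nothing left that differs from its neighborhood
        worst = max(candidates, key=lambda k: abs(diff[k] - mu) / sigma)
        found.append(worst)
        # Replace the detected value with its neighborhood average.
        current[worst] = avg[worst]
    return found

print(iterative_z_outliers(values, neighbors, m=2))  # ['c3', 'c7']
```

On this toy ring, a single-pass Z-value ranking would place "c2" and "c4" ahead of "c7", because the extreme value at "c3" inflates their neighborhood averages. Replacing "c3" after the first pass removes that effect, so the second pass finds "c7".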
Iterative Ratio Approach is similar to the Iterative Z-value Approach, but it identifies outliers through the ratio between the attribute value of a point and the average attribute value of its neighbors. Points whose ratio (or inverse ratio) exceeds a pre-defined threshold are flagged as spatial outliers.
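A hedged sketch of the ratio variant is shown below. It assumes strictly positive attribute values (so that ratios are defined), scores each point by the larger of the ratio and its inverse, and stops when no remaining point exceeds the threshold. The toy ring and all names are hypothetical.

```python
# Toy data (hypothetical): eight regions in a ring; "c3" and "c7" stand out.
names = [f"c{i}" for i in range(8)]
values = dict(zip(names, [10, 10, 10, 100, 10, 10, 10, 40]))
neighbors = {names[i]: [names[i - 1], names[(i + 1) % 8]] for i in range(8)}

def iterative_ratio_outliers(values, neighbors, threshold):
    """Detect outliers one at a time by ratio to the neighborhood average.

    Assumes all attribute values are strictly positive.
    """
    current = dict(values)  # work on a copy so the caller's data is untouched
    found = []
    while len(found) < len(current):
        avg, score = {}, {}
        for k in current:
            if k in found:
                continue
            near = neighbors[k]
            avg[k] = sum(current[n] for n in near) / len(near)
            ratio = current[k] / avg[k]
            # Use the ratio or its inverse, whichever is larger.
            score[k] = max(ratio, 1 / ratio)
        worst = max(score, key=score.get)
        if score[worst] <= threshold:
            break  # no remaining point exceeds the threshold
        found.append(worst)
        # Replace the detected value with its neighborhood average.
        current[worst] = avg[worst]
    return found

print(iterative_ratio_outliers(values, neighbors, threshold=3.0))  # ['c3', 'c7']
```

Without the replacement step, "c2" and "c4" would also exceed a threshold of 3 on this data, since their neighborhood averages are dominated by "c3".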
- Algorithms for Spatial Outlier Detection, IEEE International Conference on Data Mining, 2003, C.T. Lu, D. Chen, Y. Kou.
- Mapcube and Mapview Demonstration Report, International Workshop on Next Generation Geospatial Information, 2003.
- Mapcube and Mapview Presentation Slides, 2003.
- Chang-Tien Lu (Assistant Prof., Dept. of Computer Science, Virginia Tech)
- Dechang Chen (Assistant Prof., Preventive Medicine and Biometrics, Uniformed Services University of the Health Sciences)
- Hongjun Wang (Dept. of Computer Science, Virginia Tech)
- Yufeng Kou (Dept. of Computer Science, Virginia Tech)