Spatial Outlier Detection: Application in Traffic Domain


Spatial outliers are significantly different from their neighborhood even though they may not be significantly different from the entire population. For example, a brand-new house in an old neighborhood of a growing metropolitan area is a spatial outlier. Figure 1 shows another use of spatial outliers: traffic measurements for sensors on I-35W (northbound) over a 24-hour period. Sensor 9 appears to be a spatial outlier and may be a bad sensor. Note that Figure 1 also shows three clusters of sensor behavior: morning rush hour, evening rush hour, and busy daytime. Spatial statistics tests for detecting spatial outliers do not scale up to massive datasets, such as the Twin Cities traffic dataset, which is measured at thousands of locations in 30-second intervals and archived for years. We generalized spatial statistics tests to spatio-temporal datasets and developed scalable algorithms for detecting spatial outliers in massive traffic datasets.

Figure 1: Traffic Volume for I-35W Northbound on 1/15 1997

Traditional approaches to outlier detection are based on a global model and may not detect spatial outliers, since they do not capture neighborhood relationships. In this project, we defined a neighborhood-based statistic and designed a statistically meaningful test for spatial outliers. We also developed fast algorithms to estimate the model parameters and to determine the results of a spatial outlier test. We showed that spatial-self-join based algorithms are sufficient for detecting a general class of spatial outliers, namely S-outliers, which subsumes almost all popular definitions of spatial outlier. Our method was evaluated on a large real-world data set from the Minnesota Department of Transportation. Transportation engineers found our spatial outlier detection method to be much more useful than conventional outlier detection methods in identifying faulty sensors. Preliminary results from this project were reported in Proc. ACM SIGKDD 2001 and in the Journal of Intelligent Data Analysis. A paper summarizing the final results is under review for IEEE Trans. on Knowledge and Data Eng. Technical details are also available in PDF slides for a computer science audience and in slides for a traffic engineering audience.
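The idea of a neighborhood-based test can be illustrated with a minimal Python sketch. It assumes one common form of such a statistic: the difference between a sensor's reading and the average reading of its neighbors, standardized across all sensors and compared with a threshold. The exact statistic and test used in the project are not given on this page, so the function name spatial_outliers, the neighbor lists, and the threshold theta are illustrative assumptions only.

```python
from statistics import mean, pstdev

def spatial_outliers(values, neighbors, theta=2.0):
    """Flag sensors whose reading differs unusually from their neighbors.

    values:    dict mapping sensor id -> reading (e.g. traffic volume)
    neighbors: dict mapping sensor id -> list of neighboring sensor ids
    theta:     assumed z-score threshold (hypothetical default)
    """
    # Neighborhood difference: own reading minus mean reading of neighbors.
    diff = {s: values[s] - mean(values[n] for n in neighbors[s])
            for s in values}

    # Standardize the differences across all sensors.
    all_diffs = list(diff.values())
    mu = mean(all_diffs)
    sigma = pstdev(all_diffs)
    if sigma == 0:
        return []  # every sensor agrees with its neighborhood equally

    # A sensor is flagged when its standardized difference exceeds theta.
    return [s for s in values if abs((diff[s] - mu) / sigma) > theta]
```

In this sketch the neighbor lookup is a plain dictionary; on a large dataset the same pairing of each sensor with its neighbors would come from a spatial self-join, which is the step the text identifies as sufficient for this class of outliers.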

Example: We provide an example of a faulty sensor identified by our methods. Figure 2 shows the details of traffic measurements of sensor 139 and its neighboring sensors 138 and 140. It is interesting to note that individual readings from sensor 139 are within the normal range of readings and thus are not global outliers. However, several readings of sensor 139 are very different from the readings of its neighbors. In other words, sensor 139 is a spatial outlier, and traffic engineers considered it to be a faulty sensor.
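The contrast between a global and a spatial view of the same readings can be sketched in Python as follows. The volumes below are made-up numbers chosen for illustration, not the measurements shown in Figure 2, and the helper names global_z and neighbor_gap are hypothetical.

```python
from statistics import mean, pstdev

def global_z(reading, population):
    """Z-score of one reading against all readings from all sensors."""
    return (reading - mean(population)) / pstdev(population)

def neighbor_gap(reading, neighbor_readings):
    """Difference between a reading and the mean of its neighbors' readings."""
    return reading - mean(neighbor_readings)

# Hypothetical volumes at five time slots (NOT the data from Figure 2).
left   = [20, 60, 110, 60, 20]   # stands in for an upstream neighbor
center = [60, 20, 20, 110, 60]   # stands in for the suspect sensor
right  = [22, 58, 108, 62, 18]   # stands in for a downstream neighbor

population = left + center + right

# Global view: each reading of the center sensor against the whole population.
z_scores = [global_z(r, population) for r in center]

# Spatial view: each reading against its two neighbors at the same time slot.
gaps = [neighbor_gap(c, [l, r]) for l, c, r in zip(left, center, right)]

for t, (z, g) in enumerate(zip(z_scores, gaps)):
    print(f"slot {t}: global z = {z:+.2f}, gap to neighbors = {g:+.1f}")
```

With these made-up values every global z-score stays below 2 in magnitude, while the gap to the neighbors is large at every time slot, which mirrors the situation described for sensor 139.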

Figure 2: Traffic Volumes of Sensors 138, 139 and 140 on 1/12 1997