Probabilistic Inference of Unknown Locations: Exploiting Collective Behavior when Individual Data is Scarce

Datalab Faculty

Joshua Blumenstock

Project Description

New sources of large-scale geospatial data can inform policy decisions ranging from disease monitoring and city planning to disaster management and humanitarian relief. However, existing methods for mining these data are poorly suited to most developing-country contexts, where technology use is less intense and digital traces are generally sparse. Here, we present a method for predicting the approximate location of a mobile phone subscriber that is better suited to settings where the signal generated by each individual may be intermittent but the population as a whole generates a large amount of data. The method works well when, for instance, an individual is not consistently active on the network or the phone is switched off. Our model uses a nonparametric approach to probabilistically interpolate locations, and has the advantage of associating a confidence with each prediction. We test the method on a large dataset of anonymized mobile phone records from Afghanistan and find that we can correctly predict a subscriber's unknown location in 76%-95% of cases, and that on average our predicted location is off by 0.2-1.9 kilometers.
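
To make the idea of population-based interpolation with a confidence score concrete, here is a minimal sketch in Python. It is an illustration only and not the model used in this project: it assumes each subscriber's record is a time-ordered list of cell tower IDs, and it fills a one-slot gap using the empirical distribution of towers that other subscribers were seen at between the same preceding and following towers. The function names `build_gap_counts` and `predict_missing` and the toy tower IDs are hypothetical.

```python
from collections import Counter, defaultdict


def build_gap_counts(trajectories):
    """Count, across the whole population, which tower was observed
    between each (previous tower, next tower) pair.

    trajectories: iterable of lists of tower IDs, one list per
    subscriber, ordered by time slot (an assumed input format).
    """
    counts = defaultdict(Counter)
    for towers in trajectories:
        # Slide a window of three consecutive observations.
        for prev, mid, nxt in zip(towers, towers[1:], towers[2:]):
            counts[(prev, nxt)][mid] += 1
    return counts


def predict_missing(counts, prev, nxt):
    """Return (most likely tower, confidence) for a gap between prev
    and nxt. The confidence is the share of population observations
    that agree with the prediction. Returns (None, 0.0) if the pair
    was never observed."""
    observed = counts.get((prev, nxt))
    if not observed:
        return None, 0.0
    tower, n = observed.most_common(1)[0]
    return tower, n / sum(observed.values())


# Toy population: three subscribers pass through B, one through D.
population = [
    ["A", "B", "C"],
    ["A", "B", "C"],
    ["A", "D", "C"],
    ["A", "B", "C"],
]
counts = build_gap_counts(population)
print(predict_missing(counts, "A", "C"))  # ('B', 0.75)
```

In this toy case the subscriber with a gap between towers A and C is placed at tower B with confidence 0.75, because three of the four population observations for that pair passed through B. A realistic version would also need to condition on time of day and gap length, which this sketch leaves out.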