Data visualizations convey patterns by encoding data in the visual attributes (e.g., color, size) of graphical marks (e.g., bars, lines). Interaction allows analysts to manipulate and compare large datasets. The DataLab develops better tools to facilitate visualization-based analysis and communication.
Understanding the causes and effects of internal migration is critical to the effective design and implementation of policies that promote human development. Here, we describe how large sources of geotagged data generated by mobile phones can provide a novel source of data on internal migration.
For hundreds of years, scientists have been laying down trails of citations. These trails form a vast network, where papers are nodes and citations are links. This network can tell us a lot about the formation of new ideas, fields, and technology. We can identify salient papers and authors. We can construct maps that help us navigate this ever-growing network. And we can better understand how information flows in social systems. These are some of the goals of the Eigenfactor Project (http://www.eigenfactor.org).
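To make the "papers are nodes, citations are links" idea concrete, here is a minimal sketch that scores papers in a toy citation graph with a generic PageRank-style power iteration. This is an illustration under stated assumptions, not the Eigenfactor Project's own algorithm: the function name `pagerank`, the damping value, and the four-paper graph are all hypothetical.

```python
def pagerank(citations, damping=0.85, iterations=50):
    """Score papers in a citation graph with a plain PageRank-style iteration.

    citations maps each paper to the list of papers it cites.
    Generic textbook sketch; NOT the Eigenfactor algorithm.
    """
    # Collect every paper that appears as a citer or as a cited work.
    papers = set(citations)
    for cited in citations.values():
        papers.update(cited)
    papers = sorted(papers)
    n = len(papers)

    score = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        # Every paper receives a small baseline share.
        new = {p: (1.0 - damping) / n for p in papers}
        for p in papers:
            cited = citations.get(p, [])
            if cited:
                # A paper passes its score evenly to the papers it cites.
                share = damping * score[p] / len(cited)
                for q in cited:
                    new[q] += share
            else:
                # A paper with no references spreads its score over all papers.
                share = damping * score[p] / n
                for q in papers:
                    new[q] += share
        score = new
    return score


# Hypothetical toy graph: A is cited by B, C, and D.
toy_citations = {"A": [], "B": ["A"], "C": ["A", "B"], "D": ["A", "C"]}
scores = pagerank(toy_citations)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy graph the most-cited paper, A, ends up first in `ranking`. Real citation data would call for a sparse-matrix implementation and for choices about self-citation and journal-level aggregation that this sketch does not attempt.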
New sources of large-scale geospatial data can inform policy decisions ranging from disease monitoring and city planning to disaster management and humanitarian relief. However, existing methods for mining these data are not well suited to most developing-country contexts, where technology use is less intense and the digital traces are generally quite sparse. Here, we present a method for predicting the approximate location of a mobile phone subscriber that is more appropriate to contexts where the signal generated by each individual may be intermittent, but the collective population generates a large amount of data.
Variation and uncertainty are unavoidable in data analysis, but most people have trouble incorporating uncertainty into their interpretations of a data set. This project aims to identify the design constraints and experimentally evaluate the usefulness of a technique that depicts uncertainty as a set of alternative possible outcomes. The outcome plots are presented in an animated or interactive format, so that the user can gain a better sense of the potential for variation in the data by watching possible outcomes "play out".
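As a rough illustration of depicting uncertainty as a set of alternative possible outcomes, the sketch below draws a series of "frames", each holding one plausible value per group. It assumes each estimate can be summarized by a mean and standard deviation and sampled from a normal distribution; that assumption, the function name `outcome_frames`, and the example numbers are hypothetical and not taken from the project.

```python
import random


def outcome_frames(estimates, n_frames=20, seed=0):
    """Draw one possible outcome per group for each animation frame.

    estimates maps a group name to (mean, standard deviation).
    Normal sampling is an assumption made for this sketch.
    """
    rng = random.Random(seed)  # fixed seed so the frames are reproducible
    return [
        {name: rng.gauss(mean, sd) for name, (mean, sd) in estimates.items()}
        for _ in range(n_frames)
    ]


# Hypothetical estimates for two groups with overlapping uncertainty.
estimates = {"A": (50.0, 5.0), "B": (55.0, 5.0)}
frames = outcome_frames(estimates, n_frames=200)

# Fraction of frames in which B comes out higher than A: the impression
# a viewer would build up by watching the outcomes play out.
share_b_higher = sum(f["B"] > f["A"] for f in frames) / len(frames)
```

Each dictionary in `frames` would be rendered as one chart in the animation. A viewer who watches the sequence sees how often the ordering of A and B flips, which is the kind of intuition a static error bar may not convey.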
Measurements are ubiquitous yet can be challenging to understand when the unit (e.g., decaliters, tons) or magnitude (e.g., 320m, $5 mil) is unfamiliar to us. Strategies like re-unitization, in which an unfamiliar measurement is re-expressed using a new unit (e.g., 10kg is equal to the weight of 2 printers), can aid understanding but often require a skilled designer to realize. These re-expressions can also be personalized given some information about the user, such as their location (e.g., 11 miles is twice the distance from your house to the Space Needle). This project develops databases of familiar objects and landmarks and their measurements, drawing on web crawling techniques, semantic databases like WordNet and ImageNet, object databases like Amazon and Wikipedia, and crowdsourcing. We design automated algorithms for strategies like re-unitization and proportional analogy that rank re-expressions based on a number of dimensions. We apply these automated strategies in web applications that allow a user to get on-demand re-expressions of complex measurements.
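A minimal sketch of re-unitization might look like the following. It re-expresses a weight as a small whole number of familiar objects and ranks candidates by how closely they match. The reference weights, the `reunitize` function, and the two ranking criteria (relative error, then size of the multiple) are hypothetical stand-ins for illustration; the project's actual databases and ranking dimensions are not described here.

```python
from dataclasses import dataclass


@dataclass
class Reexpression:
    obj: str        # familiar object used as the new unit
    multiple: int   # how many of that object
    error: float    # relative mismatch with the original measurement


# Hypothetical reference weights in kilograms (illustrative values only).
FAMILIAR_WEIGHTS_KG = {
    "printer": 5.0,
    "bowling ball": 7.0,
    "bag of sugar": 1.0,
    "car": 1500.0,
}


def reunitize(value_kg, references, max_multiple=10):
    """Rank re-expressions of a positive weight as N familiar objects."""
    candidates = []
    for name, ref_kg in references.items():
        multiple = round(value_kg / ref_kg)
        # Skip objects that would need zero or too many copies to be intuitive.
        if 1 <= multiple <= max_multiple:
            error = abs(multiple * ref_kg - value_kg) / value_kg
            candidates.append(Reexpression(name, multiple, error))
    # Prefer the closest match, then the smaller (easier to picture) multiple.
    return sorted(candidates, key=lambda c: (c.error, c.multiple))


ranked = reunitize(10.0, FAMILIAR_WEIGHTS_KG)
best = ranked[0]
message = f"10 kg is about the weight of {best.multiple} {best.obj}s"
```

With these illustrative weights, the top-ranked re-expression of 10 kg is two printers, matching the example above. A personalized variant could swap in reference objects drawn from the user's own context, such as distances to nearby landmarks.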
Visualizations like scatterplots and bar charts are common ways of presenting data for analysis, but in contrast to statistical analysis, visual analysis introduces perceptual errors and cognitive biases. This project explores which factors, related to both the data set and the visual presentation of the data, most affect a person's judgments about the data.
Expert journalists and designers often present visualized data related to an article to help people gain context for what they are reading (e.g., a locator map to help readers place a foreign location). The systems we are developing analyze the text of a news article, identify relevant datasets, and produce automated, annotated visualizations to help readers better understand the context of the article.