Join the Lab

The DataLab faculty are all actively involved in cutting-edge Data Science research that is published in leading venues around the world. There are several different ways for students to get involved in these projects – see below for guidelines. Core faculty members Joshua Blumenstock, Jessica Hullman, Emma Spiro and Jevin West each supervise a group working on different research areas and projects.

Prof. Joshua Blumenstock’s group is concerned with using “big data” and cutting-edge techniques from data science to solve problems that affect poor and resource-constrained populations. Ongoing research uses terabyte-scale data from large communication networks and social media outlets to understand processes of human and economic development in field sites including Afghanistan, Ghana, Pakistan, Rwanda, and the United States. Current questions include: Is it possible to predict someone’s wealth or gender by analyzing their mobile phone activity? Can large text message broadcasts influence people’s behavior in ways that improve their quality of life? How do the travel patterns of millions of individuals affect the structure of local communities and national economies?

Desired Qualifications (Prof. Blumenstock):

  • Strong technical skills (computer programming, quantitative analysis, big data management, data visualization)
  • Impeccable work ethic that often involves working late into the night fueled by Red Bull and pizza
  • Experience using the group’s current technologies: Hadoop, Spark, Hive, Scala, Python (NumPy/SciPy/scikit-learn), R, Java
  • A detailed list of desired qualifications can be found on Dr. Blumenstock’s website

Prof. Jessica Hullman’s group designs, builds and studies interactive systems that apply data mining and visualization to help people better comprehend information they encounter online. We start by studying the work of expert designers, journalists and educators in order to identify the design principles they use to create useful visualizations, text articles, and other explanatory aids. We then apply these principles in automated and semi-automated algorithms to create, at scale, the same types of artifacts that experts do. Current and past systems generate customized, annotated visualizations to accompany news articles, automatically re-express unfamiliar measurements in text in more familiar terms, and suggest automated simplifications of scientific jargon to help journalists as they read and write about science.

Desired Qualifications (Prof. Hullman):

  • Programming and technical skills, including Linux, back-end languages (e.g., Java, Python, C), and front-end technologies (JavaScript, D3, HTML5)
  • Strong quantitative analysis skills
  • Creativity and an interest in creative work
  • Ability to commit at least 10 hours a week to project work

Prof. Emma Spiro's group studies online communication and information-related behaviors in the context of emergencies and crisis events, such as natural disasters, acts of terrorism and civil unrest. Her group focuses on questions about how the structure and dynamics of interpersonal and organizational social networks affect information diffusion, communication rates, personal expression, and interaction. One project that illustrates some of this work is Project HEROIC.

Desired Qualifications (Prof. Spiro):

  • Experience with at least one programming language (R or Python preferred)
  • Experience with statistical analysis and environments such as R, MATLAB, NumPy/SciPy, etc.
  • Some experience working with large databases (e.g., MySQL)
  • Ability to work in a Linux/command-line environment
  • Domain expertise and interest in an area of social science

Prof. Jevin West’s group tries to answer the following questions: How can we rank and map science to better understand how new fields of science form and how we promote and fund good science? How can we use these ranking and mapping algorithms to better find relevant papers in science? How can we visualize large information networks in general and better understand the flow of information in social systems? One project that illustrates some approaches is the Eigenfactor Project.

Desired Qualifications (Prof. West):

  • Some scientific programming experience (e.g., Python, MATLAB, R, C++)
  • Some web programming experience (e.g., PHP, JavaScript, CSS, HTML)
  • Some experience working with large databases (e.g., MySQL)
  • Domain expertise in a related field (Information Science, CS, Complex Systems)
  • Comfortable in a Linux/Unix environment
  • Ability to commit at least 10 hours/week to a given project

Guidelines for prospective students
If you are interested in joining the DataLab to work on one of these projects, send an email to and include the following information:

  • Cover letter explaining why you want to join the DataLab, who you are interested in working with, and why you want to work with that person.
  • Resume or CV that highlights your relevant experience, and a recent academic transcript.
  • 1-2 paragraph summary of a recent research article you read and why you found it noteworthy.
  • Optional but recommended: Links to recently completed projects that showcase your skills and aptitude. This could be a research paper you have written, a website you built, a GitHub repository, a PDF portfolio of data visualizations, a final project report, or anything else that you are particularly proud of. Put your best foot forward!