Khairi Reda, assistant professor of data science and human-computer interaction at the IU School of Informatics and Computing at IUPUI, recently received a $174,977 National Science Foundation grant for his project, “CRII: Concept-Driven Visual Analysis.” The funding will advance Reda’s research into data visualization techniques for exploring and analyzing scientific data sets.
These techniques will help scientists who have data, and questions about it, see that data in new ways, Reda says. His ultimate goal is to spur discovery by making data accessible and empowering people to examine it more deeply, rather than leaving the analysis to computers alone.
Davide Bolchini, chair of the Human-Centered Computing (HCC) Department, says of Reda, “The highly competitive NSF CRII award seeds a novel and very timely line of research in broadening the conceptual tools for hypothesis-based visual analytics. We are all very proud of his accomplishment.” The Computer and Information Science and Engineering (CISE) Research Initiation Initiative (CRII) is meant to support early-career faculty in establishing research independence.
Existing visualization tools typically support only a narrow style of analysis—like a one-way street—where computers generate plots, people look at them, and possibly update them. This style is useful when scientists have little previous knowledge of what questions to ask or what the data may hold.
According to Reda, most scientists tend to have developed prior hypotheses and models, and are often looking to see how well the data fits these models. In particular, they are interested in closely examining instances where the data violates their assumptions, as these so-called data violations could provide alternative explanations and potentially lead to new discoveries. However, current visualization tools don’t allow this process to occur naturally; they show the data generically, making it difficult for viewers to evaluate the fit of the data to their own hypotheses.
Reda proposes re-architecting data visualization tools to support bi-directional analysis. In this new paradigm, analysts will be able to share their hypotheses and models with the system. The system will then generate visualizations that are directly responsive to those inputs, while visually emphasizing any discrepancy between data and models.
An example is the study of gene mutations involved in a particular type of cancer and the order in which they are expressed. Rather than visualizing all of the gene expression data available, the researcher could tell the system about only the gene mutations thought to be involved in the disease, and specify the suspected chronology of gene expression. Given this information, the system would correlate gene activation levels with each patient’s disease stage and automatically highlight instances where activation occurs earlier or later than expected. Non-conforming data could be explored further to discover the underlying reasons for any discrepancies, possibly leading to the identification of a new cancer pathway.
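The gene expression example can be made concrete with a short sketch. The code below is purely illustrative and is not drawn from Reda’s project: the gene names, patient records, stage numbers, and the `flag_discrepancies` helper are all hypothetical, and a real system would work with continuous activation levels rather than a single stage number per gene. It only shows the basic idea of comparing observations against an analyst’s stated hypothesis and surfacing the mismatches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One hypothetical record: when a gene was first seen active in a patient."""
    patient_id: str
    gene: str
    activation_stage: int  # disease stage at which activation was first observed


def flag_discrepancies(observations, expected_stage):
    """Return (observation, label) pairs that deviate from the hypothesis.

    expected_stage maps each gene in the analyst's hypothesis to the disease
    stage at which it is expected to activate. Genes outside the hypothesis
    are ignored rather than plotted.
    """
    flagged = []
    for obs in observations:
        if obs.gene not in expected_stage:
            continue  # not part of the analyst's hypothesis
        expected = expected_stage[obs.gene]
        if obs.activation_stage < expected:
            flagged.append((obs, "earlier"))
        elif obs.activation_stage > expected:
            flagged.append((obs, "later"))
    return flagged


# Hypothetical hypothesis: GENE_A activates first, then GENE_B, then GENE_C.
expected_stage = {"GENE_A": 1, "GENE_B": 2, "GENE_C": 3}

# Hypothetical observations, invented for illustration only.
observations = [
    Observation("p1", "GENE_A", 1),  # matches the hypothesis
    Observation("p1", "GENE_B", 3),  # activates later than expected
    Observation("p2", "GENE_C", 2),  # activates earlier than expected
    Observation("p2", "GENE_X", 1),  # not in the hypothesis, so skipped
]

flagged = flag_discrepancies(observations, expected_stage)
for obs, label in flagged:
    print(f"{obs.patient_id} {obs.gene}: activated {label} than expected")
```

In a visualization tool, the flagged records would be the ones drawn with visual emphasis, while conforming records would recede into the background.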
Reda’s research team will first need to discover how people could intuitively but accurately express their models and hypotheses to the data analytics system. The project will therefore begin by developing an understanding of how people come to form hypotheses about data, and will involve testing various expression techniques, from natural language to concept mapping and graphical sketching. Human-centered design methods will be used throughout the study to iteratively develop and validate system prototypes.