Externally Funded Research Projects

This is an evolving list of selected, externally funded research projects led by faculty in the BioHealth Informatics Department. Potential collaborators, faculty, and students at all levels are encouraged to contact the investigators to learn more about the projects and to explore opportunities for research collaboration.

Molecular Genetics of Hereditary Endoplasmic Reticulum Diabetes

Funding Source: National Institutes of Health (2022)

Principal Investigator: Bohdan B Khomtchouk

The endoplasmic reticulum (ER) is best known for its role as the locus of protein folding, calcium storage, and lipid metabolism. The organelle also integrates numerous other molecular pathways and contributes to cellular calcium homeostasis, reduction-oxidation regulation, and cell death. Given the many vital and complex functions of the ER, it is little wonder that its failure can trigger a range of diseases. It has been shown that dysregulation of ER homeostasis may underlie β cell dysfunction and death in type 1 and type 2 diabetes, as well as in monogenic forms of diabetes, including Wolfram syndrome; Wolcott-Rallison syndrome; microcephaly, epilepsy, and diabetes syndrome (MEDS); and mutant insulin gene-induced diabetes, caused by pathogenic variants in the WFS1 and CISD2, EIF2AK3, IER3IP1, and INS genes, respectively. To further understand the contribution of ER dysfunction to β cell death and design novel treatments targeting the ER for diabetes, we need to establish functional studies of gene variants affecting ER homeostasis, design treatments targeting common molecular pathways altered in ER-stressed β cells, and identify other ER genes involved in β cell dysfunction and death. In this proposal, we will characterize WFS1 and CISD2, EIF2AK3, IER3IP1, and INS variants using functional assays and bioinformatics and test novel treatments targeting the common molecular pathways altered in β cells expressing pathogenic variants of the WFS1 and CISD2, EIF2AK3, IER3IP1, and INS genes. Successful completion of this study will lead to the establishment of precision medicine for hereditary ER diabetes.

Real-time high-throughput cost-effective sequencing platform for 2019-nCoV detection and genotyping

Funding Source: INDO-US SCI TECH FORUM (2021)

Principal Investigator: Sarath Chandra Janga

Transcriptome wide mapping of RNA-protein interactions in hepatic cells to dissect the role of post-transcriptional mechanisms in fatty liver disease

Funding Source: ELI LILLY & CO (2021)

Principal Investigator: Sarath Chandra Janga

Design and Analytic Support, Indiana Volunteer Workforce Solutions, Phase I

Funding Source: INTL ASSN FIRE CHIEFS (2021)

Principal Investigator: Schwebach, Gary Dee

HeartShare DeCODE-HF: Data translation center to Combine Omics, Deep phenotyping, and Electronic health records for Heart Failure subtypes and treatment targets

Funding Source: National Institutes of Health (2021)

Co-Investigator: Bohdan B Khomtchouk

The Overarching Aim of the Northwestern HeartShare DeCODE-HF: Data translation center to Combine Omics, Deep phenotyping, and Electronic health records for Heart Failure subtypes and treatment targets is to provide overall management and oversight for HeartShare, including coordination and communication across all subsections and cores of the program, and with the 4 HeartShare Clinical Centers (CCs). The Data Translation Center (DTC) will need to ensure timely completion of the retrospective and prospective components of HeartShare. In our application, we demonstrate the ability and prior experience of our multi-PI team and core leaders in the conduct and leadership of large-scale, multi-center studies, particularly in the realms of heart failure with preserved ejection fraction (HFpEF), electronic health record (EHR)-based investigation, deep phenotyping, machine learning, and biorepositories. For both the retrospective and prospective HeartShare components, the Northwestern DTC will leverage its considerable experience in: (1) cohorts and trials that have data on adjudicated HF patients, and (2) its leadership and track record as the top enroller in HFpEF trials/studies to ensure optimal recruitment by the HeartShare CCs. We will also leverage our expertise in HFpEF, data coordination/management, BioData Catalyst, EHR-based research, biostatistics, machine learning, multi-omics, human-computer interaction, and mobile health data monitoring to (1) successfully execute the HeartShare program and (2) meet the HeartShare DTC goals of providing a rich resource to the research community to advance the science of next-generation phenomics to identify HFpEF subtypes and therapeutic targets. Our HeartShare DTC will carry out each of its 4 primary responsibilities. 
(1) The Administrative and Outreach Core will oversee HeartShare program operations, research skills development, and capitation to CCs for deep phenotyping costs; support a call center to follow patient outcomes; and promote rapid and broad sharing of clinical and molecular data. (2) The Data Portal Core will serve as the primary access point (via a web-based interface) and coordinating center for all HeartShare data, and will develop an interactive patient-facing web-based interface for remote consent, completion of forms, and collection of mobile health data using BioData Catalyst tools. (3) The Data Management Core will ensure data quality, integration, and harmonization of various data types; perform advanced analytics; coordinate biospecimen and imaging biorepositories; and integrate with TOPMed for omics analyses. (4) The Cohort Core will aggregate data on HF patients from completed epidemiology cohorts and HF trials and will combine these data with further bioprofiling of existing samples to allow discovery and validation of novel targets.

CAREER: Computational strategies for incompleteness and heterogeneity in multi-omic data

Funding Source: National Science Foundation (2021)

Principal Investigator: Jingwen Yan

Multi-omics refers to the integrative analysis of multiple types of -omics data (e.g., genotype, gene expression, and protein expression). The growing volume of multi-omic data provides opportunities to discover disease biomarkers at multiple molecular scales and can therefore further our understanding of underlying disease mechanisms. Despite this great potential, existing multi-omic data collections are mostly incomplete and of heterogeneous types (e.g., continuous and categorical values). Integrating these data for joint analysis typically requires excluding many subjects with missing values; as a consequence, a large portion of the data remains unused. This project provides novel perspectives on handling the incompleteness and heterogeneity problems in multi-omics data and thereby allows biomedical researchers to gain more insights from rapidly growing yet imperfect biomedical data. In addition, the growth of multi-omics data has led to a massive transformation in biomedical research and has resulted in an unprecedented need for information management, decision support, and advanced analytics. In this project, a series of educational activities will be conducted to engage students at early stages of their education and to increase their awareness of educational opportunities and career paths in biomedical informatics.
This project aims to develop new classes of computational methods to enable the joint mining of incomplete and heterogeneous multi-omic data by leveraging various biological networks for discovery of functionally connected biomarkers. Towards this, two tasks will be performed: 1) identify multi-omic subnetworks as biomarkers via a multi-task joint network module detection and feature selection model, and 2) select associated features between heterogeneous -omics layers via a novel multi-task sparse association model. The first task aims to address the incomplete data problem. This new model can not only handle the incomplete data collected from one large-scale project, but also allow the joint analysis of -omics data from multiple small-scale projects without overlap in subjects. The second task addresses the heterogeneity problem with a novel two-step strategy in associating different -omics layers. Built upon these research efforts, three outreach educational activities will be conducted: 1) develop a project-based curriculum for high school students, 2) host an annual summer workshop on multi-omics for high school students, and 3) provide advanced research opportunities to undergraduates from biomedical informatics and related disciplines. This research effort will lead to discovery of more reliable biomarkers for further validation and better understanding of their relationships with disease traits than currently possible.

Monoallelic autosomal spreading of a novel family of nucleolus-localized ncRNAs

Funding Source: UNIV IL URBANA-CHAM (2020)

Principal Investigator: Sarath Chandra Janga

Effective Industry and Academic Engagement Features and Processes

Funding Source: IND BIOSCI RES INST (2020)

Principal Investigator: Schwebach, Gary Dee

Gene co-expression underlying the connectomic alterations in Alzheimer’s disease

Funding Source: National Institutes of Health (2020)

Principal Investigator: Jingwen Yan

A brain connectome at the macroscale is typically represented as a network, where nodes are brain regions of interest (ROIs) and links indicate their functional or structural connections. Both functional and structural brain network architectures are heritable and have been found to be disrupted in Alzheimer’s disease (AD) or its prodromal stage. The recent availability of brain-wide transcriptome data has made possible another type of brain connectome, the brain co-expression network, which captures spatial variations in gene expression, with links representing transcriptional coupling between ROIs. Some studies have shown that the co-expression network is closely connected to structural and functional brain networks. However, the genes inducing such connections remain unknown. Identification of these genes will transform our understanding of the biological underpinnings of the altered neural system in AD and can exert a huge impact on the development of new diagnostic, therapeutic, and preventative approaches for AD. The complexity of network data, however, presents critical computational challenges requiring new concepts and enabling approaches. To address these challenges, we propose novel integrative approaches and will perform the following two tasks: 1) identifying functional and structural brain networks altered in AD via meta-analyses, and 2) identifying the genes underlying the association between co-expression networks and AD-altered networks. By leveraging the brain-wide transcriptome data, we will learn a small set of genes whose co-expression patterns across ROIs can best explain their altered connections in AD.

FW-HTF-RM: Measuring learning gains in man-machine assemblage when augmenting radiology work with artificial intelligence

Funding Source: National Science Foundation (2020)

Principal Investigator: Saptarshi Purkayastha

The work setting of the future presents an opportunity for human-technology partnerships, where a harmonious connection between humans and technology produces unprecedented productivity gains. A conundrum at this human-technology frontier remains: will humans be augmented by technology, or will technology be augmented by humans? This project overcomes the conundrum of human and machine as separate entities and instead treats them as an assemblage. As groundwork for the harmonious human-technology connection, this assemblage needs to learn to fit synergistically. This learning is called assemblage learning, and it will be important for Artificial Intelligence (AI) applications in health care, where diagnostic and treatment decisions augmented by AI will have a direct and significant impact on patient care and outcomes. This project will also identify ways in which learning can be shared between assemblages, such that collective swarms of connected assemblages can be created. The project will create a new learning model that integrates and measures concepts ranging from individual learning to swarm learning. The project will help demonstrate a symbiotic learning assemblage, such that envisioned productivity gains from AI can be achieved without loss of human jobs. Even though the focus is on visual cognitive tasks in radiology, lessons from this project may be applicable to other domains where human intelligence will be augmented by machine intelligence.

Recent studies of human versus machine competitions have demonstrated that assemblages built on human-technology partnerships are stronger than individual humans or machines. Building on these studies, this project will integrate state-of-the-art algorithms into the radiology workflow. The project will answer the following research questions: Q1: How can assemblages be developed such that human-technology partnerships produce a “good fit” for visually based, cognition-oriented tasks in radiology? Q2: What level of training should pre-exist in the individual human (radiologist) and the independent machine learning model for human-technology partnerships to thrive? Q3: In which aspects, and to what extent, does an assemblage learning approach lead to reduced errors, improved accuracy, faster turn-around times, reduced fatigue, improved self-efficacy, and resilience? A rigorous counterbalanced trial will be performed to assess individual radiologists interpreting images with and without the assemblage. Data on clinician engagement from EHR systems will be captured and analyzed, along with pre-test and post-test surveys and interviews. Deep and wide analysis of the quantitative and qualitative data from the trial will answer questions related to learning gains, task performance, and the emotional as well as behavioral aspects of learning in an assemblage. The project employs perspectives from Science & Technology Studies, Computer Science, Psychology, and Learning Sciences to create and study assemblages that can produce gains in routine radiology work.

EAGER: Algorithmic frameworks and resources for mapping RNA modifications from single molecule direct RNA-sequencing data

Funding Source: National Science Foundation (2020)

Principal Investigator: Sarath Chandra Janga

A basic question in cell biology is to understand the driving mechanisms that control how and when genes are expressed, and to identify the active switches in those processes. The first step of gene expression is production of an RNA molecule from the genomic DNA, “transcription”. As instruments become available that allow detection of the original RNA molecules from cells, it is becoming possible to identify sites where RNA bases have been chemically modified after their initial transcription. This is important because some of these post-transcriptional modifications play a role in how the expressed RNA is translated into expressed protein. Little is known as yet about the molecular players that are involved in the myriad steps that govern expression patterns, including localization, splicing, stability and folded structure of the RNA. This project aims to detect, identify and quantify the extent of modifications on RNA molecules as measured on the Oxford Nanopore platform, as a required first step in understanding those biological functions. Gold-standard calibration sets of synthetic oligonucleotides will be designed, produced and tested as part of the experimental design, and new algorithms and subsequent software will provide single-nucleotide resolution of the type and locations of robustly detected modifications in natural transcripts in yeast and human data sets.

Lack of efficient high throughput detection methods has plagued the emerging field of epitranscriptomics, which is focused on the role of chemical modifications on RNA bases in modulating the biological function and structure of RNA molecules. The overarching research goal of this project is to develop computational methods to map RNA modification sites for 5-methyl cytosine (5mC), 1-methyl adenosine (m1A) and methylation of the backbone of the RNA nucleotides (Nm) at a single nucleotide resolution. Experiments will employ synthetic calibration oligonucleotides as well as use newly developed algorithms to probe natural yeast and human transcripts, using the long-read direct RNA sequencing data resulting from Oxford Nanopore sequencing technology. The project will complement current transcriptomic reference maps of these modification events with additional data needed to train computational methods, from gold-standard calibration sets composed of synthetic RNA oligonucleotides. The resulting Oxford Nanopore signatures of modification sites will be analyzed using deep learning for signal analysis and statistical methods for robustness in precision and accuracy. All resulting methods, databases and maps of RNA modification types across species will be made publicly available from the project web site.

Mapping RNA protein interaction networks in the human genome

Funding Source: National Institutes of Health (2018)

Principal Investigator: Sarath Chandra Janga

Detecting protein-RNA interactions is challenging, both experimentally and computationally, because RNA transcripts are large in number and diverse in cellular location and function. As a result, many RNA-binding proteins (RBPs) and their cognate motifs are likely unknown or uncharacterized in humans as well as other model organisms. With an increasing number of RBPs implicated in human diseases, there is an urgent need to identify and map functional and phenotypic information for RBPs and to complete a map of the protein-RNA interaction network. The objective here is to establish a robust computational technique that integrates expression associations with sequence as well as several RBP-centric features for genome-scale prediction of binding motifs for hundreds of human RBPs to facilitate the elucidation of their tissue-specific post-transcriptional networks. At the completion of this project, we expect to have developed the most advanced tool for predicting human RBP motifs, as well as methods and resources that can facilitate the construction of tissue-specific RBP-RNA networks. Our central hypothesis, supported by our initial genome-scale computational study and assessment by comparative analysis of known RBP binding motifs, is that, since many RBPs are involved in different stages of RNA metabolism, exon expression level associations with an RBP and other exon-related features can be very powerful in identifying the binding motifs of an RBP in a tissue-specific manner. The proposed integrated approach, which experimentally validates several binding motifs using CLIP-seq and deconvolutes global post-transcriptional networks in specific cell/tissue types using genome-wide data from protein protection assays (POP-seq), will significantly enhance our capability of uncovering network dynamics of RBPs in cell types and tissues.
Such high-quality predictions based on experimental validations, resulting from all the Aims and to be made public, can become an avenue for future experimental follow-up to dissect the role of these important regulatory molecules in different tissues and disease states. The proposed studies will make an impact in the field as the first large-scale computational mapping of protein-RNA interaction networks in human tissues, taking our ability to predict RBP targets to the next level.

Computational Methods to Mine Multi-omic Data for Systems Biology of Complex Diseases

Funding Source: National Science Foundation (2018)

Principal Investigator: Jingwen Yan

Recent advances in high-throughput technologies have led to a substantial increase in multi-omic data characterizing various levels of molecular changes in the progression of disease, including the genome, transcriptome, proteome, and metabolome. The availability of computational methods that are sufficiently powerful to handle the high dimensionality and heterogeneity of multi-omic data is still very limited. In addition, major findings generated from current -omics studies have been largely restricted to relatively simple patterns, e.g., individual biomarkers, possibly with few functional interactions, which makes it difficult to validate these findings and relate them to downstream biology. This project, by coupling multi-omic data with systems biology networks, will develop novel computational methods to explore the functional network modules associated with disease quantitative traits. By enabling both strategic and efficient knowledge extraction from the vast biological landscape represented by multi-omic data, this research may lead to unprecedented discovery of disease mechanisms and suggest surrogate biomarkers for therapeutic trials.

This work will develop new computational methods to enable the integration of large scale heterogeneous multi-omic data with rich domain knowledge for better biomarker and association discovery. Two interrelated tasks will be performed: 1) Develop a novel biological knowledge guided structured sparse learning model together with large-scale optimization methods to integrate -omic data and biological networks from multiple sources and discover -omic modules involving heterogeneous biomarkers for accurately predicting outcomes of interest; and 2) Couple multi-task learning with structured sparse association models to jointly learn the bi-multivariate associations between imaging phenotypes and -omic features with dense functional connections for multiple groups.

RNA Rustbelt Meeting 2017

Funding Source: National Science Foundation (2017)

Principal Investigator: Sarath Chandra Janga

This award will support attendance by early career researchers at the 19th Rust Belt RNA Meeting to be held at the Sheraton Indianapolis City Centre Hotel in Indianapolis, Indiana on October 13-14, 2017. The format of the meeting is a combination of talks and posters, which will promote scientific exchange in both formal and informal settings. A number of activities are aimed at professional development and mentoring for early career scientists. There will be ample opportunity for sharing the latest research results and for networking.
The ever-increasing impact of RNA science has been fueled by its immense potential to shed light on a wide array of important problems ranging from the origin of life to the molecular basis of inter-cellular communication. Some of these advances will be among the topics featured in the scientific sessions, which will include: RNA-protein complexes, mRNA maturation, ribosome assembly and translation, and RNA-dependent regulation. In addition, some of the sessions will focus on emerging areas of RNA research such as RNA nanotechnology, and bioinformatics and systems biology of RNA.

Implementation of the Telemedicine system in Bhutan

Funding Source: WORLD HEALTH ORG (2016)

Principal Investigator: Saptarshi Purkayastha

Iraq Bioscience Fellowship Program, Mohammed Omar Baba Sheik

Funding Source: CRDF (2015)

Principal Investigator: Sarath Chandra Janga

Uncovering the bacterial contribution and their interplay during rhinovirus infection in asthma patients, through a genomics guided integrated network-based approach

Funding Source: UNIV OF WISCONSIN (2015)

Principal Investigator: Sarath Chandra Janga

Bench Fee in Support of Fellow

Funding Source: INDIAN CNCL AGR RSCH (2014)

Principal Investigator: Sarath Chandra Janga