DSI Funded Research Projects

2017 Projects:

Data science over graphs, streams, and sequences: From the analysis of fake news to prediction and intervention

Fake news and misinformation have long been a serious problem, and the advent of social media has made it more acute, particularly in the context of the 2016 U.S. Presidential election. This illustrates how social networks and media have come to play a fundamental role in most people's lives: they influence the way we receive, assimilate, and share information with others. The online lives of social media users leave behind a trail of data that can be harnessed to drive applications that either did not exist or could not be launched effectively before. Drs. Laks Lakshmanan and Ruben Zamar will develop a multi-pronged approach to detecting fake news. The collaboration will include developing text mining techniques to analyze "tweets" and exploring new statistical tools for the analysis of social network and media data. In particular, they will use Twitter data to learn user intents and to identify discriminative linguistic features of fake news by comparing it with genuine news. The project aims to devise strategies to combat and contain the propagation of fake news, reducing its consequences and its harm to the effective functioning of civil society.
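The idea of finding discriminative linguistic features by comparing corpora can be sketched in a few lines. This is a hypothetical toy illustration, not the project's actual method: it ranks words by how much more often they appear in a "fake" corpus than in a "genuine" one, with add-one smoothing to avoid division by zero.

```python
from collections import Counter

def discriminative_features(fake_tweets, genuine_tweets, top_n=3):
    """Rank words by the ratio of their smoothed frequency in fake
    tweets to their smoothed frequency in genuine tweets."""
    fake_counts = Counter(w for t in fake_tweets for w in t.lower().split())
    real_counts = Counter(w for t in genuine_tweets for w in t.lower().split())
    vocab = set(fake_counts) | set(real_counts)
    # Add-one smoothing: every vocabulary word gets a pseudo-count of 1.
    fake_total = sum(fake_counts.values()) + len(vocab)
    real_total = sum(real_counts.values()) + len(vocab)

    def ratio(word):
        p_fake = (fake_counts[word] + 1) / fake_total
        p_real = (real_counts[word] + 1) / real_total
        return p_fake / p_real

    return sorted(vocab, key=ratio, reverse=True)[:top_n]
```

A real system would of course use richer features (user intent, propagation patterns, network structure) rather than raw word ratios, but the comparison-of-corpora idea is the same.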

Modeling multiple types of "omics" data to understand the biology of human exposure to pollution and allergens

Inhaled environmental and occupational exposures such as air pollution and allergens are known to have a profound effect on our respiratory and immunological health. This collaboration between Drs. Sara Mostafavi and Christopher Carlsten seeks to better understand how the human body responds adversely to these perturbants by developing and applying new computational models for the analysis of integrated molecular data sets, collectively known as 'omics profiling (e.g., genomics, proteomics, metabolomics, epigenomics, transcriptomics, and polymorphisms). Joint analyses of these high-dimensional data sets, derived from experimentally controlled human exposures to inhaled particles and allergens, may unlock insights into better treatments for asthma, COPD, and other respiratory diseases.

This project is sponsored by PHIX and VCHRI.

From heuristics to guarantees: the mathematical foundations of algorithms for data science

Many of the most successful approaches commonly used in data-science applications (e.g., machine learning) come with little or no theoretical guarantees. Notable examples include convolutional neural networks (CNNs) and data-fitting formulations based on non-convex loss functions. In both cases, the training procedures require solving intractable optimization problems. While these methods are undeniably successful in a wide variety of machine learning and signal-processing tasks (e.g., classification of images, speech, and text), the robustness that comes with theoretical guarantees is paramount for more critical applications such as medical diagnosis or unsupervised algorithms embedded in electronic devices (e.g., self-driving cars). This collaboration between Drs. Michael Friedlander and Ozgur Yilmaz aims to build theoretical foundations for key algorithmic approaches used in data-science applications.
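A toy example of the gap between heuristics and guarantees: plain gradient descent on a non-convex loss converges to whichever local minimum lies near its starting point, while on a convex loss it reaches the unique global minimum from any start. The loss functions below are illustrative choices, not drawn from the project.

```python
def gradient_descent(grad, x0, lr=0.01, steps=2000):
    """Minimize a 1-D function by following its negative gradient."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Non-convex loss f(x) = (x^2 - 1)^2 has two minima, at x = +1 and x = -1:
# the answer depends entirely on the starting point -- no global guarantee.
grad_nonconvex = lambda x: 4 * x * (x**2 - 1)

# Convex loss f(x) = x^2: every starting point converges to the minimum at 0.
grad_convex = lambda x: 2 * x
```

Starting the non-convex problem at x0 = 2.0 yields roughly +1, while x0 = -2.0 yields roughly -1; the convex problem lands near 0 from either start.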

A platform for interactive, collaborative, and repeatable genomic analysis

Computer systems – both hardware and software – currently represent an active barrier to the scientific investigation of genomic data. Answering even relatively simple questions requires assembling disparate software tools (for alignment, variant calling, and filtering) into an analytics pipeline, and then solving practical IT problems to get that pipeline to run stably and at scale. In this project, Drs. Loren Rieseberg and Andrew Warfield will employ a whole-system approach to building a framework for genomic analysis. Using an existing botany-based analysis pipeline as a starting point, and taking advantage of emerging high-density “rack-scale” computer hardware, they will refactor and extend existing genomic analysis software to produce a platform that accelerates many traditionally long-running analytical tasks to the point of enabling interactive analysis. The platform will facilitate sharing of datasets and analysis code across the research community and will capture enough data and analysis provenance to encourage reproducibility of published results.
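The pipeline pattern described above can be sketched as a chain of named stages whose execution order is recorded as provenance alongside the result. Every stage implementation here is a hypothetical placeholder standing in for a real tool (an aligner, a variant caller, a quality filter); only the chaining-with-provenance structure is the point.

```python
def run_pipeline(reads, stages):
    """Chain analysis stages, each a function of the previous output.
    Recording each stage name as it runs gives a simple provenance log,
    which supports repeatable, auditable analyses."""
    provenance = []
    result = reads
    for name, stage in stages:
        result = stage(result)
        provenance.append(name)
    return result, provenance

# Hypothetical toy stages standing in for real genomics tools.
align = lambda reads: [(i, r) for i, r in enumerate(reads)]           # (position, read)
call_variants = lambda aln: [(p, r) for p, r in aln if "X" in r]      # flag mismatch reads
quality_filter = lambda variants: [v for v in variants if len(v[1]) > 2]

stages = [("align", align), ("call", call_variants), ("filter", quality_filter)]
```

Publishing the provenance list together with the result is the kind of capture-of-analysis-history the project describes, here reduced to its simplest possible form.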

Application of deep learning approaches in modelling cheminformatics data and discovery of novel therapeutic agents for prostate cancer

The recent explosion of chemical and biological information calls for fundamentally novel ways of dealing with big data in the life sciences. This problem can potentially be addressed by the latest technological breakthroughs on both the software and hardware frontiers. In particular, recent advances in artificial intelligence (AI) enable cognitive data processing at very large scale by means of deep learning (DL). In this project, Drs. Artem Cherkasov and William Welch will develop a deep neural network (DNN) environment with a reinforcement learning component that will use GPU power to capture all available information on hundreds of millions of existing small molecules (including their interactions with proteins and other cell components). The ultimate goal is to develop an “all chemistry on one chip” expert system that can accurately generate structures of small molecules with user-defined biological, physical, and chemical properties. Such a cognitive AI platform can be integrated with existing high-throughput synthesis technologies (click chemistry) to yield a paradigm-shifting ‘molecular printer’ that could revolutionize the life sciences.

This project is sponsored by PHIX and VCHRI.

Using text analysis for chronic disease management

The diagnosis, management, and treatment of chronic diseases (e.g., diabetes, chronic obstructive pulmonary disease, and heart failure) have traditionally relied on longitudinal histories and physical examinations as the primary tools of assessment, augmented by laboratory testing and imaging. Equally important is an objective assessment and understanding of the contribution of patients' states of mind to their disease states. This has historically been documented only qualitatively and is highly challenging to measure quantitatively. Recent advances in data science techniques such as natural language processing, however, are providing new opportunities: speech and text analysis is an emerging strategy for quantifying cognition, sentiment, physical symptoms, and social influences. Drs. Kendall Ho and Giuseppe Carenini seek to integrate speech and text analysis into the longitudinal management of chronic diseases to maintain optimal stability, support recovery, and detect deterioration. Furthermore, as researchers at UBC Digital Emergency Medicine (DigEM), a unit within the UBC Department of Emergency Medicine that is employing wearables and sensors for chronic disease management, they will also analyze speech and text measurements together with sensor-based physiologic data to gauge the stability of patients' chronic diseases.
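At its simplest, quantifying sentiment from patient free text can be sketched as lexicon-based scoring. This is a minimal illustration under assumed word lists; the hypothetical lexicons below are placeholders, and real work in this area would use full NLP models rather than word counts.

```python
def sentiment_score(text, negative_lexicon, positive_lexicon):
    """Crude lexicon-based score: (positive - negative) word hits,
    normalized by text length. Positive values suggest improving
    sentiment, negative values suggest deterioration."""
    words = text.lower().split()
    if not words:
        return 0.0
    pos = sum(w in positive_lexicon for w in words)
    neg = sum(w in negative_lexicon for w in words)
    return (pos - neg) / len(words)
```

Tracked over time and combined with sensor-based physiologic data, even a coarse score like this could serve as one longitudinal signal among many, which is the kind of synergy the project proposes to study with far more sophisticated analysis.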

This project is sponsored by PHIX and VCHRI.