Automated diagnosis and prognostication of severity in COPD via deep learning frameworks using multi-modal data
Chronic Obstructive Pulmonary Disease (COPD) is a progressive, debilitating, chronic respiratory disease. It is currently the 4th leading cause of mortality and is responsible for 100,000 hospitalizations and 10,000 deaths annually in Canada, and 3 million deaths worldwide. Although our understanding of COPD pathogenesis has improved substantially over the past 20 years, there is a notable lack of treatments that can modify disease progression and reduce mortality. Furthermore, current methods to clinically diagnose COPD are non-specific and insufficient to advance knowledge. This project will build on the recent successes of advanced machine learning (ML) techniques applied to automated image analysis of medical scans across various medical fields to improve COPD diagnosis and prognostication. Specifically, the project will implement and test new frameworks based on deep learning (DL) to automate the staging of COPD severity and to predict disease progression using multi-modal and/or heterogeneous data (e.g., non-imaging and imaging-based data). The outcome of this project will be new machine learning tools that better support clinicians treating COPD patients.
Awarded to: Leonid Sigal and Roger Tam
DSI Postdoctoral Fellow: Lisa Tang
Large-scale Bayesian modelling of drug resistance and evolution in human cancers at single-cell resolution
Recent advances in next-generation sequencing (NGS) technologies have made it possible to measure gene expression and DNA mutations across thousands of cells in cancer tumors at single-cell resolution. This allows us to quantify the effect of chemotherapeutic drugs on the way tumors mutate and to answer questions about why particular groups of cells (known as clones) evade treatment and cause relapse. However, the vast quantities of data produced by such measurements, combined with their low signal-to-noise ratio, make analysis and interpretation particularly difficult. This project aims to develop a suite of state-of-the-art Bayesian methods (e.g., sequential Monte Carlo (SMC) and black-box variational inference) for learning from single-cell cancer genomics data, with a focus on scalable inference to help address these challenges. Development of these tools will enable precision medicine by equipping clinicians with the ability to better predict which treatment(s) will work best, and to adjust treatment appropriately, for each individual cancer patient.
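To illustrate the flavor of the sequential Monte Carlo methods named above, the following is a toy bootstrap particle filter on a one-dimensional Gaussian random-walk model. This is a generic sketch of the method class under simplified assumptions, not the project's single-cell models; all names and the toy model are illustrative.

```python
# Toy bootstrap particle filter (sequential Monte Carlo) sketch.
# Purely illustrative of the method class; not the project's models.
import math
import random

def smc_step(particles, weights, transition, log_likelihood, obs):
    n = len(particles)
    # 1. Resample particles in proportion to their current weights.
    particles = random.choices(particles, weights=weights, k=n)
    # 2. Propagate each particle through the transition model.
    particles = [transition(p) for p in particles]
    # 3. Reweight by the likelihood of the new observation (log-space for stability).
    logw = [log_likelihood(p, obs) for p in particles]
    m = max(logw)
    w = [math.exp(lw - m) for lw in logw]
    s = sum(w)
    return particles, [x / s for x in w]

# Toy model: latent Gaussian random walk observed with Gaussian noise.
random.seed(0)
particles = [random.gauss(0.0, 1.0) for _ in range(500)]
weights = [1.0 / 500] * 500
for obs in [0.5, 1.0, 1.5]:
    particles, weights = smc_step(
        particles, weights,
        transition=lambda x: x + random.gauss(0.0, 0.5),
        log_likelihood=lambda x, y: -0.5 * (x - y) ** 2,
        obs=obs,
    )
posterior_mean = sum(p * w for p, w in zip(particles, weights))
print(round(posterior_mean, 2))  # tracks the observations toward ~1.5
```

The resample–propagate–reweight loop is the core of SMC; scalable inference for real genomics data adds far richer models and many computational refinements on top of this skeleton.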
Awarded to: Alex Bouchard-Côté and Sohrab Shah
DSI Postdoctoral Fellow: Kieran Campbell
Leveraging more accurate and flexible discourse structures in question-answering and summarization
Existing systems for critical NLP tasks like question-answering and summarization are still unable to accurately uncover and effectively leverage the discourse structure of text; i.e., how clauses and sentences are related to each other in a document. This is a serious limitation in that relationships between clauses and sentences carry important information, which allows the text to express a meaning as a whole, beyond the sum of its parts. The goal of discourse parsing is to automatically determine the coherence structure of text. In essence, a discourse parser takes a document as input and returns its discourse structure, or tree, showing how clauses and sentences are related to each other, via the use of various discourse relations. In this project, Dr. Carenini's team seeks to improve discourse parsing performance and to apply discourse parsing outputs to improve the performance of other NLP tasks, with a specific focus on state-of-the-art approaches to Q&A systems and text summarization.
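As a minimal illustration of the parser's output described above, a discourse tree in the style of Rhetorical Structure Theory can be sketched as follows; the data structures, relation labels, and example text are hypothetical illustrations, not the team's parser.

```python
# Illustrative sketch of a discourse tree: leaves are elementary discourse
# units (EDUs); internal nodes carry a discourse relation between subtrees.
from dataclasses import dataclass
from typing import Union

@dataclass
class EDU:
    text: str

@dataclass
class RelationNode:
    relation: str                          # e.g., "Elaboration", "Contrast"
    nucleus: Union["RelationNode", EDU]    # the more central span
    satellite: Union["RelationNode", EDU]  # the supporting span

def leaves(node):
    """Recover the document's EDUs, in order, from a discourse tree."""
    if isinstance(node, EDU):
        return [node.text]
    return leaves(node.nucleus) + leaves(node.satellite)

# A two-EDU document joined by a Contrast relation.
tree = RelationNode(
    "Contrast",
    EDU("The market rallied sharply,"),
    EDU("but analysts remained cautious."),
)
print(leaves(tree))
```

A downstream summarizer could, for instance, prefer nucleus spans over satellite spans when selecting content, which is one way discourse structure can inform other NLP tasks.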
Awarded to: Giuseppe Carenini
Trainees: Patrick Huber (PhD candidate), Wen Xiao (MSc candidate)
This project is sponsored by the DSI-Huawei Research Program
Computer vision and machine learning techniques for video and facial understanding
In this project, Drs. Sigal and Schmidt are pursuing a number of research goals at the intersection of computer vision and machine learning. In part one, the team will advance automatic video summarization by exploring novel, richer joint video-linguistic and graph-structured representations to facilitate video retrieval, summarization and, potentially, action recognition tasks. In part two, the team will develop generative models that can effectively "imagine" what images of faces or objects would look like in a canonical view (e.g., a frontal face image in face recognition) or, more broadly, in any view or unobstructed configuration. In the final part of the project, the team aims to develop much faster training methods for the deep neural networks underlying computer vision systems (such as those used in parts one and two), covering both the networks' parameters and their hyper-parameters, which include the structure of the network and other design choices, through automated tuning techniques. The outcomes of this project will be significant improvements to computer vision performance and runtime, with applications in areas such as surveillance.
Awarded to: Mark Schmidt & Leonid Sigal
This project is sponsored by the DSI-Huawei Research Program
Knowledge Graphs – Mining, Cleaning and Maintenance
Extraction of knowledge from information sources ranging from unstructured and semi-structured to structured has gained significant interest in both academia and industry. This is fueled by applications such as question answering and computational fact checking. Knowledge graphs (KGs) have lately emerged as a de facto standard for knowledge representation, whereby knowledge is expressed as a collection of "facts" represented in the form of (subject, predicate, object) triples, where the subject and object are entities and the predicate is a relation between those entities. This collection can be conveniently stored, queried, and maintained as a graph, with the entities modeled as vertices and relations as links or edges. In this project, Dr. Lakshmanan and his team will mine a large KG from information sources, with an emphasis on publicly available documents, including structured sources such as tables. They will also develop techniques for cleaning the KG and for maintaining it against updates. Finally, they will exploit the resulting KG in question answering and computational fact checking applications, both of which will leverage the pattern search capabilities of a knowledge graph.
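The triple-based representation described above can be sketched minimally as follows; the class, entity, and relation names are illustrative assumptions, not a specific KG system used by the project.

```python
# Minimal sketch of a knowledge graph as a set of (subject, predicate, object)
# triples, with a simple wildcard pattern query. Example facts are illustrative.
from collections import defaultdict

class KnowledgeGraph:
    def __init__(self):
        self.triples = set()
        self.by_subject = defaultdict(set)  # adjacency: entity -> outgoing edges

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))
        self.by_subject[subject].add((predicate, obj))

    def query(self, subject=None, predicate=None, obj=None):
        """Return all triples matching a pattern; None acts as a wildcard."""
        return [
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        ]

kg = KnowledgeGraph()
kg.add("UBC", "located_in", "Vancouver")
kg.add("Vancouver", "located_in", "Canada")
print(kg.query(subject="UBC"))
```

A fact-checking application could, for example, verify a claimed (subject, predicate, object) statement by issuing exactly such a pattern query against the mined KG.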
Awarded to: Laks Lakshmanan
Trainees: Michael Simpson (PDF), Sarah Habashi (MSc candidate)
This project is sponsored by the DSI-Huawei Research Program
Leveraging eye-tracking data to improve reliable detection of Alzheimer’s Disease and related patient states
Reliable detection of disease in the early stages of Alzheimer’s Disease (AD) continues to be a challenge. This project led by Drs. Conati and Field aims to investigate the value of eye-tracking data as one of the sources of information to build machine learning detectors of AD. In addition, the team will investigate eye-tracking based detectors of AD-related states such as confusion and distress during naturally occurring tasks. This project aims to 1) validate the concept of using spontaneous speech and eye-tracking data as clinical markers for early detection of AD through the use of machine-learning algorithms; and 2) investigate ways to increase detection accuracy by exploring both alternative machine learning settings and alternative diagnostic tasks used for data collection. Eventually, the goal is to develop software that can detect states of confusion or distress in AD patients during day-to-day activities (e.g., reading an article) and automatically trigger interventions aimed at reducing the levels of discomfort and stress.
Awarded to: Cristina Conati and Thalia Field
Using contact networks, administrative, and linked genomic data to understand tuberculosis transmission in BC
Tuberculosis (TB) is still a problem in British Columbia, with approximately 250 cases diagnosed each year. In order to meet the WHO’s goal of achieving TB pre-elimination by 2030, TB rates in BC need to decline at a faster rate, and a change in how we manage TB prevention and care in the province is needed. Fortunately, all TB-related laboratory, epidemiology, clinical, and public health activities are centralized at the BC Centre for Disease Control (BCCDC). This provides a unique opportunity to harness these data to understand TB transmission in BC, which can in turn inform public health policy and action. This project, led by Drs. Jennifer Gardy and Matías Salibián-Barrera, seeks to develop and implement a predictive analytics platform into the TB Services Program within the BCCDC. Specifically, it will explore whether features such as i) contact or transmission network properties (static or over time), ii) clinical/epidemiological/demographic attributes of early cases within a network, and/or iii) genomic data can be used to predict whether a newly diagnosed case is likely to lead to a sustained outbreak. In addition, the team will explore whether patterns of patient interaction with the healthcare system can be used to infer potentially undiagnosed TB infections.
Awarded to: Jennifer Gardy and Matías Salibián-Barrera
Data science over graphs, streams, and sequences: From the analysis of fake news to prediction and intervention
Fake news and misinformation have long been a serious problem, and the advent of social media has made it more acute, particularly in the context of the 2016 U.S. Presidential election. This illustrates how social networks and media have come to play a fundamental role in the lives of most people: they influence the way we receive, assimilate, and share information with others. As such, the online lives of social media users tend to leave behind a trail of data that can be harnessed to drive applications that either did not exist or could not be launched effectively before. This project will develop a multi-pronged approach to detect fake news. It will develop text mining techniques to analyze "tweets" and explore the possible use of new statistical tools for the analysis of social network and media data. In particular, Twitter data will be used to learn user intents and to analyze language for features that discriminate fake news from genuine news. The project aims to devise strategies to combat and contain fake news propagation, reducing its consequences and its harm to the effective functioning of civil society.
Awarded to: Laks Lakshmanan and Ruben Zamar
DSI Postdoctoral Fellow: Ezequiel Smucler
Modeling multiple types of "omics" data to understand the biology of human exposure to pollution and allergens
Inhaled environmental and occupational exposures such as air pollution and allergens are known to have a profound effect on our respiratory and immunological health. This collaborative project seeks to better understand how the human body responds adversely to these perturbants by developing and applying new computational models for analyses of integrated molecular data sets, collectively known as 'omics profiling (e.g., genomics, proteomics, metabolomics, epigenomics, transcriptomics, and polymorphisms). Joint analyses of these high-dimensional data sets, derived from experimentally controlled human exposures to inhaled particles and allergens, may unlock insights to better treat asthma, COPD, and other respiratory diseases.
Awarded to: Chris Carlsten and Sara Mostafavi
DSI Postdoctoral Fellow: Zahra Jalali
From heuristics to guarantees: the mathematical foundations of algorithms for data science
Many of the most successful approaches commonly used in data-science applications (e.g., machine learning) come with little or no guarantees. Notable examples include convolutional neural networks (CNNs) and data-fitting formulations based on non-convex loss functions. In both cases, the training procedures require solving intractable optimization problems. While these methods are undeniably successful in a wide variety of machine learning and signal-processing tasks (e.g., classification of images, speech, and text), the robustness that comes with theoretical guarantees is paramount for more critical applications, such as medical diagnoses or unsupervised algorithms embedded in electronic devices (e.g., self-driving cars). This project aims to build theoretical foundations for key algorithmic approaches used in data-science applications.
Awarded to: Michael Friedlander and Ozgur Yilmaz
DSI Postdoctoral Fellow: Halyun Jeong
A platform for interactive, collaborative, and repeatable genomic analysis
Computer systems – both hardware and software – currently represent an active barrier to the scientific investigation of genomic data. Answering even relatively simple questions requires assembling disparate software tools (for alignment, variant calling, and filtering) into an analytics pipeline, and then solving practical IT problems in order to get that pipeline to function stably and at scale. This project will employ a whole-system approach to providing a framework for genomic analysis. By building on an existing botany-based analysis pipeline and exploiting emerging high-density “rack-scale” computer hardware, the project will refactor and extend existing genomic analysis software to provide a platform that accelerates many traditionally long-running analytical tasks enough to enable interactive analysis. This will facilitate sharing of datasets and analysis code across the research community and will provide sufficient capture of data and analysis provenance to encourage reproducibility of published results.
Awarded to: Loren Rieseberg and Andy Warfield
DSI Postdoctoral Fellow: Jean-Sébastien Légaré
Application of deep learning approaches in modelling cheminformatics data and discovery of novel therapeutic agents for prostate cancer
The recent explosion of chemical and biological information calls for fundamentally novel ways of dealing with big data in the life sciences. This problem can potentially be addressed by the latest technological breakthroughs on both software and hardware frontiers. In particular, the latest advances in artificial intelligence (AI) enable cognitive data processing at very large scale by means of deep learning (DL). This project will develop a deep neural network (DNN) environment with a reinforcement learning component that will utilize GPU power to capture all available information on hundreds of millions of existing small molecules (including their interactions with proteins and other cell components). The ultimate goal is to develop an “all chemistry on one chip” expert system that can accurately generate structures of a small molecule with user-defined biological, physical, and chemical properties. Such a cognitive AI platform can be integrated with already existing technologies of high-throughput synthesis (click chemistry) to yield a paradigm-shifting ‘molecular printer’ that will revolutionize the life sciences.
Awarded to: Artem Cherkasov and Will Welch
DSI Postdoctoral Fellow: Michael Fernandez Llamosa
Using text analysis for chronic disease management
The diagnosis, management, and treatment of chronic diseases (e.g., diabetes, chronic obstructive pulmonary disease, and heart failure) have traditionally focused on longitudinal histories and physical examinations as primary tools of assessment, augmented by laboratory testing and imaging. Equally important to history taking and physical examinations is the objective assessment and understanding of the contribution of patients' states of mind to their disease states. Historically, this has been documented only qualitatively and remains highly challenging to measure quantitatively. However, recent advances in data science techniques such as natural language processing are providing new opportunities. Speech and text analysis is an emerging strategy for analyzing cognition, sentiments, physical symptoms, and social influences to support such quantification. Thus, this project seeks to integrate speech and text analysis into the longitudinal management of chronic diseases to maintain optimal stability, support recovery, and detect deterioration. Furthermore, the project will analyze how speech and text analysis can work synergistically with physiologic data captured by wearables and sensors used in chronic disease management to gauge the stability of patients’ chronic diseases.
Awarded to: Kendall Ho and Giuseppe Carenini
DSI Postdoctoral Fellow: Hyeju Jang
User Modeling and Adaptive Support for MOOCs
Massive open online courses (MOOCs) have great potential to innovate education, but suffer from one key limitation typical of many online learning environments: lack of personalization. Intelligent Tutoring Systems (ITS) is a field that leverages Artificial Intelligence and Machine Learning to devise educational tools that can provide instruction tailored to the needs of individual learners, as good teachers do. In this project, Drs. Conati and Roll aim to apply some of the concepts and techniques from ITS research to MOOCs. Specifically, in previous work they have developed a framework to: i) discover from data student behaviors that are detrimental or conducive to learning with specific educational software; and ii) use these behaviors to build classifiers that can detect ineffective learners in real time and provide personalized support accordingly. They have already successfully applied this framework to two different online educational tools, and now plan to extend it to make existing MOOCs more responsive to specific student needs.
Awarded to: Cristina Conati and Ido Roll
DSI Postdoctoral Fellow: Sébastien Lallé