Data science over graphs, streams, and sequences: From the analysis of fake news to prediction and intervention
Fake news and misinformation have long been a serious problem, and the advent of social media has made it more acute, as seen in the context of the 2016 U.S. Presidential election. This illustrates how social networks and media have come to play a fundamental role in the lives of most people--they influence the way we receive, assimilate, and share information with others. The online lives of social media users leave behind a trail of data that can be harnessed to drive applications that either did not exist or could not be launched effectively before. Drs. Laks Lakshmanan and Ruben Zamar will develop a multi-pronged approach to detecting fake news. The collaboration will include developing text mining techniques to analyze "tweets" and exploring the possible use of new statistical tools for the analysis of social network and media data. In particular, they will use Twitter data to learn user intents and analyze language for features that discriminate fake news from genuine news. The project aims to devise strategies to combat and contain the propagation of fake news, reducing its harmful consequences for the effective functioning of civil society.
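The idea of discriminative language features can be illustrated with a minimal sketch. This is not the researchers' actual method; the tiny corpora below are hypothetical, and the smoothed log-odds score is just one simple way to surface words that appear disproportionately in one class of text.

```python
from collections import Counter
import math

# Hypothetical toy corpora (illustrative only, not real tweet data):
fake = ["shocking secret cure doctors hate",
        "shocking election fraud proof leaked"]
genuine = ["officials report election results certified",
           "study finds new treatment in clinical trial"]

def log_odds(pos_docs, neg_docs):
    """Score each word by its smoothed log-odds of occurring in
    pos_docs versus neg_docs; high scores mark pos-leaning features."""
    pos = Counter(w for d in pos_docs for w in d.split())
    neg = Counter(w for d in neg_docs for w in d.split())
    vocab = set(pos) | set(neg)
    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    # Add-one (Laplace) smoothing so unseen words get finite scores.
    return {w: math.log((pos[w] + 1) / (n_pos + len(vocab)))
               - math.log((neg[w] + 1) / (n_neg + len(vocab)))
            for w in vocab}

scores = log_odds(fake, genuine)
# Words with the highest scores are the most "fake-leaning" features.
top = sorted(scores, key=scores.get, reverse=True)[:3]
```

In this toy example, "shocking" tops the ranking because it occurs twice in the fake corpus and never in the genuine one; real systems would replace word counts with richer linguistic and network features.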
Modeling multiple types of "omics" data to understand the biology of human exposure to pollution and allergens
Inhaled environmental and occupational exposures such as air pollution and allergens are known to have a profound effect on our respiratory and immunological health. This collaboration between Drs. Sara Mostafavi and Christopher Carlsten seeks to better understand how the human body responds adversely to these perturbants by developing and applying new computational models for the analysis of integrated molecular data sets, collectively known as 'omics profiling (e.g., genomics, proteomics, metabolomics, epigenomics, transcriptomics, and polymorphisms). Joint analyses of these high-dimensional data sets, derived from experimentally controlled human exposures to inhaled particles and allergens, may unlock insights that lead to better treatments for asthma, COPD, and other respiratory diseases.
This project is sponsored by PHIX and VCHRI.
From heuristics to guarantees: the mathematical foundations of algorithms for data science
Many of the most successful approaches commonly used in data-science applications (e.g., machine learning) come with little or no theoretical guarantees. Notable examples include convolutional neural networks (CNNs) and data-fitting formulations based on non-convex loss functions. In both cases, the training procedures amount to solving intractable optimization problems. While these methods are undeniably successful in a wide variety of machine learning and signal-processing tasks (e.g., classification of images, speech, and text), the robustness that comes with theoretical guarantees is paramount for more critical applications, such as medical diagnosis or unsupervised algorithms embedded in electronic devices (e.g., self-driving cars). This collaboration between Drs. Michael Friedlander and Ozgur Yilmaz aims to build theoretical foundations for key algorithmic approaches used in data-science applications.
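Why non-convexity undermines guarantees can be seen in a minimal sketch (a hypothetical toy problem, not the researchers' work): gradient descent on the non-convex loss f(x) = (x² - 1)² converges to a different local minimizer depending on where it starts, so no single run certifies global optimality.

```python
# Toy non-convex loss: f(x) = (x**2 - 1)**2, with minima at x = -1 and x = +1
# separated by a local maximum at x = 0.

def grad(x):
    # f'(x) = 4 * x * (x**2 - 1)
    return 4 * x * (x * x - 1)

def gradient_descent(x0, step=0.05, iters=200):
    """Plain gradient descent from initial point x0."""
    x = x0
    for _ in range(iters):
        x -= step * grad(x)
    return x

# Starting on either side of the hump at x = 0 lands in a different
# minimizer; the method "works" but provides no global guarantee.
left = gradient_descent(-0.5)   # converges toward x = -1
right = gradient_descent(0.5)   # converges toward x = +1
```

Convex formulations avoid this ambiguity, which is one reason bridging the gap between practically successful non-convex methods and provable guarantees is a central theme of this line of work.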