Blessings and curses of overparameterized learning: Optimization and generalization principles

Post Doc Fellows

Drs. Christos Thrampoulidis and Mark Schmidt are teaming up to address unresolved challenges in the training of neural networks and its applications. With this postdoctoral funding award from the Data Science Institute, the team will combine their expertise in optimization and high-dimensional statistical learning theory to design more efficient training algorithms that are better suited for real-world use.

Summary: Deep neural networks are often heavily overparameterized and trained without explicit regularization. What are the principles behind their state-of-the-art performance? How do different optimization algorithms and learning schedules affect generalization? Does overparameterization speed up convergence? How does the choice of loss function affect training dynamics and accuracy? How do the answers to these questions depend on the data distribution (e.g., data imbalances)? This project aims to shed light on these questions by adopting a joint optimization and statistical viewpoint. Despite admirable progress over the past couple of years, existing theories are mostly limited to simplified data and architecture models (e.g., binary classification, linear or linearized architectures, random feature models). The goal of the project is to extend the theory to more realistic architectures (e.g., tangent-kernel regime, shallow and deep neural networks) and data models (e.g., multiclass, imbalanced).
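
As a small illustration of what training "without explicit regularization" can mean in the simplest linearized setting, the sketch below runs plain gradient descent on an underdetermined least-squares problem (more parameters than samples). Started from zero, gradient descent on the squared loss is known to converge to the minimum-norm interpolating solution, even though no norm penalty appears in the objective. The data, step size, and function names are illustrative assumptions, not taken from the project.

```python
# Sketch: implicit regularization of gradient descent in an
# overparameterized linear model (2 samples, 5 parameters).
# All data and names here are illustrative assumptions.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gradient_descent(X, y, lr=0.05, steps=500):
    """Minimize 0.5 * sum_i (x_i . w - y_i)^2 starting from w = 0."""
    d = len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        residuals = [dot(x, w) - yi for x, yi in zip(X, y)]
        grad = [sum(r * x[j] for r, x in zip(residuals, X)) for j in range(d)]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def min_norm_interpolator(X, y):
    """Closed form w = X^T (X X^T)^{-1} y, written out for two samples."""
    a, b, c = dot(X[0], X[0]), dot(X[0], X[1]), dot(X[1], X[1])
    det = a * c - b * b
    alpha0 = (c * y[0] - b * y[1]) / det
    alpha1 = (a * y[1] - b * y[0]) / det
    return [alpha0 * u + alpha1 * v for u, v in zip(X[0], X[1])]

X = [[1.0, 2.0, 0.0, -1.0, 0.5],
     [0.0, 1.0, 1.0, 2.0, -1.0]]
y = [1.0, -2.0]

w_gd = gradient_descent(X, y)
w_mn = min_norm_interpolator(X, y)

# Largest coordinate-wise difference between the two solutions.
gap = max(abs(p - q) for p, q in zip(w_gd, w_mn))
print("max difference from min-norm solution:", gap)
```

Infinitely many weight vectors fit these two samples exactly; the algorithm, rather than the objective, selects one of them. The questions above ask how this kind of selection behaves in more realistic architectures and data models.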

Details: This project will shed light on the fundamental statistical and optimization principles of modern machine-learning (ML) methods that aim to address system-design requirements on reliability, distributional robustness, safety, and resource efficiency. Such requirements become critical as we aspire to use data-driven ML algorithms to create automated decision rules in more aspects of everyday life. For example, in applications that directly involve data about people, such as decisions on who is granted a loan or who gets hired, we need to ensure fairness against demographic imbalances that exist in our society and carry over into the data. ML algorithms used for perception tasks in self-driving cars need to be safe against disturbances by adversaries. To effectively use modern deep-learning models, which are increasingly complex and thus computationally expensive, on resource-constrained platforms such as mobile devices, we need to carefully balance accuracy and resource efficiency.

Research Goals:

  • Design efficient training algorithms and study their generalization for distributed overparameterized learning. We will extend the theory of implicit regularization to distributed settings. We are particularly interested in the effect of system constraints such as limited communication bandwidth and limited memory.
  • Develop a statistically grounded theory for training neural networks under data imbalances. In overparameterized learning, classical approaches to mitigate imbalances, such as weight normalization loss-adjustments, fail. Building on our recent works, we will propose alternative loss-adjustments and study the corresponding training dynamics and the induced generalization performance with respect to appropriate fairness metrics.
  • Out-of-domain generalization remains a significant challenge for machine learning models. Approaches that aim to train invariant representations across domains are formulated as non-convex optimization objectives whose optimization and generalization principles remain unexplored even in simple settings. We aim to understand the generalization principles of these methods by building on appropriate extensions of our statistical models to account for domain shift. On the optimization front, we will investigate complexity lower bounds for stochastic first-order methods. We also expect our study to have implications for the closely related problem of adversarial learning.
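
To make the second goal above more concrete, here is a minimal sketch of one commonly used classical loss adjustment for imbalanced data: class-weighted cross-entropy with inverse-frequency weights. This is an assumption chosen for illustration. The text does not name a specific classical adjustment, and the alternative loss adjustments the project will propose are not shown here. All function names are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def inverse_frequency_weights(labels, num_classes):
    """Weight each class by n / (num_classes * class_count)."""
    counts = [0] * num_classes
    for k in labels:
        counts[k] += 1
    n = len(labels)
    return [n / (num_classes * c) if c > 0 else 0.0 for c in counts]

def weighted_cross_entropy(logits_batch, labels, weights):
    """Mean over samples of -weights[label] * log softmax(logits)[label]."""
    total = 0.0
    for logits, k in zip(logits_batch, labels):
        total += -weights[k] * math.log(softmax(logits)[k])
    return total / len(labels)

# Illustrative imbalanced batch: three samples of class 0, one of class 1.
labels = [0, 0, 0, 1]
logits_batch = [[0.0, 0.0] for _ in labels]

weights = inverse_frequency_weights(labels, num_classes=2)
loss = weighted_cross_entropy(logits_batch, labels, weights)
print("class weights:", weights)
print("weighted loss:", loss)
```

The minority class receives a larger weight, so its errors count more in the objective. The second goal starts from the observation that adjustments of this classical kind fail in overparameterized learning.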

