Optimal Transport Enhances Data Privacy in Diffusion-based Foundation Models


Post Doc Fellows

Drs. Mi Jung Park (Computer Science) and Young-Heon Kim (Math) have been awarded the DSI Postdoctoral Matching Fund for their project "Optimal Transport Enhances Data Privacy in Diffusion-based Foundation Models".

Summary

The research proposes a novel approach to “machine unlearning” in diffusion-based foundation models (e.g., text-to-image generators) using optimal transport (OT) theory, addressing the need for data privacy. It highlights that diffusion models often memorize training data, posing privacy risks, and aims to demonstrate how OT can rigorously remove or “forget” specific data points without costly retraining.

Background

Diffusion-based models, used in applications like image and text generation, can inadvertently expose sensitive or copyrighted information, underscoring the need for effective “data negating” methods. Regulations such as the European Union’s General Data Protection Regulation (GDPR) emphasize the “right to be forgotten,” propelling research on machine unlearning to ensure compliant, privacy-preserving AI systems.

Technical Challenge

Retraining large-scale models after removing data is computationally prohibitive, while approximate forgetting methods can degrade performance and lack theoretical guarantees. Implementing OT in the unlearning process is promising but poses its own complexities, especially for high-dimensional diffusion models, requiring careful algorithmic design and theoretical analysis.

Research Goals

  • Develop provable machine unlearning techniques by leveraging OT to measure distances between distributions, ensuring robust removal of private or sensitive data while preserving model capabilities. 
  • Explore methods such as stochastic sampling strategies and “machine-unlearned sampling” to efficiently erase data distributions, enhance generalization, and maintain or improve performance in generative tasks.
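
As a rough illustration of what "leveraging OT to measure distances between distributions" can mean, the sketch below computes the p-Wasserstein distance between two one-dimensional empirical samples. It is not the project's method: the function name `wasserstein_1d`, the Gaussian stand-in data, and the "retain", "forget", and "unlearned" labels are all assumptions made for the example. It relies only on the standard fact that, in one dimension with equal sample sizes, the optimal coupling pairs the sorted samples in order.

```python
import random


def wasserstein_1d(xs, ys, p=1):
    """p-Wasserstein distance between two equal-size 1-D empirical samples."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    if not xs:
        raise ValueError("samples must be non-empty")
    # In one dimension the optimal coupling matches sorted samples in order.
    pairs = zip(sorted(xs), sorted(ys))
    cost = sum(abs(a - b) ** p for a, b in pairs) / len(xs)
    return cost ** (1.0 / p)


# Hypothetical stand-ins: real diffusion models live in far higher dimensions.
rng = random.Random(0)
retain = [rng.gauss(0.0, 1.0) for _ in range(2000)]     # data the model should keep
forget = [rng.gauss(4.0, 1.0) for _ in range(2000)]     # data the model should forget
unlearned = [rng.gauss(0.1, 1.0) for _ in range(2000)]  # samples from an "unlearned" model

d_retain = wasserstein_1d(unlearned, retain)
d_forget = wasserstein_1d(unlearned, forget)
print(f"distance to retained data:  {d_retain:.3f}")
print(f"distance to forgotten data: {d_forget:.3f}")
```

In this toy setting, a successful unlearning step would leave the model's samples close to the retained data and far from the forgotten data. The high-dimensional case the project targets needs different solvers and is exactly where the algorithmic difficulty lies.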

Musqueam First Nation land acknowledgement

We honour xʷməθkʷəy̓əm (Musqueam) on whose ancestral, unceded territory UBC Vancouver is situated. UBC Science is committed to building meaningful relationships with Indigenous peoples so we can advance Reconciliation and ensure traditional ways of knowing enrich our teaching and research.

Learn more: Musqueam First Nation

Data Science Institute

EOS Main Building
6339 Stores Road, Room 113C
dsi.admin@science.ubc.ca

Faculty of Science

Office of the Dean, Earth Sciences Building
2178–2207 Main Mall
Vancouver, BC Canada
V6T 1Z4