Data Science for Social Good

The Data Science Institute (DSI) at UBC offers the Data Science for Social Good (DSSG) Summer Fellowships, modelled after successful programs at the University of Chicago, Georgia Tech and the University of Washington. This 14-week, full-time summer program brings together undergraduate and graduate students from diverse backgrounds with experience in data science, urban research and planning, design thinking and other domains to work on focused, collaborative research projects that have the potential to benefit society.

2019 Projects



Sponsor: City of Surrey

Description: The City of Surrey is developing an Electric Vehicle Strategy as a core element of the City’s approach to reducing GHG emissions. Because Surrey is a highly auto-dependent community, electric vehicles will be critical to the City’s success in mitigating climate change. This project will involve the exploratory analysis of multiple datasets to help City staff better understand the key vehicle stock, demographic, and sociotechnical characteristics needed to ground the EV Strategy in our current reality and to develop impactful, strategic policies and programs. Using the team’s findings, the project lead is also seeking a set of data visualizations that deftly communicate some of the most important findings and the key stories they tell. In both cases, the project lead will spend time with the student team to develop an understanding of what matters in energy system transformation and the passenger vehicle sector, and work with the team to design visualizations that make an impact.

This work builds on a 2018 DSSG project that involved a lighter, less-focused analysis of multiple datasets as part of an effort centred on building a workable projection tool. Some of the findings from that exploratory analysis provided such valuable insight that City staff are now pursuing a more comprehensive data analysis than has ever been done before to inform a city’s electric vehicle strategy.

In addition to the passenger EV focus, the City is working alongside organizations like BC Hydro and Renewable Cities to better understand the medium/heavy-duty vehicle sector and how to transition it to zero-emission technologies in the future. A smaller aspect of this project will involve exploratory analysis of multiple datasets to help the City better understand the existing stock of these vehicles in Surrey and how it has changed over time.


Sponsor: Metro Vancouver

Description: British Columbia, like the rest of the world, is experiencing unprecedented declines in biodiversity. Simultaneously, our knowledge about where species are and which habitat is most important to them is patchy and difficult for local governments to access and consider when making decisions. To address this challenge, urban planners across the Metro Vancouver region need tools that help practitioners consider and plan for biodiversity.

Data on species occurrence (plant and animal; common and rare) in Metro Vancouver is distributed across many platforms and in different formats, and is therefore challenging for local government planners and other practitioners to use in decision-making. A single source of this consolidated knowledge does not exist.

This project will create a consolidated spatial database and web mapping viewer that brings together these disparate data sources into one location for planners and others (e.g., researchers, public) to view and manipulate. This tool would greatly assist in incorporating biodiversity in land use planning.
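As a rough illustration of what this kind of consolidation involves, the Python sketch below maps records from two differently formatted sources onto one shared record type and drops duplicates. The source layouts, the field names (`scientific_name`, `lat`, `lon`, `taxon`, `coords`) and the sample records are all hypothetical, invented for this example; they do not describe any actual Metro Vancouver dataset or the project's design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    """One species observation in the shared (hypothetical) schema."""
    species: str
    latitude: float
    longitude: float
    source: str


def from_source_a(row: dict) -> Occurrence:
    # Hypothetical source A: separate "lat" and "lon" text fields.
    return Occurrence(row["scientific_name"].strip(),
                      float(row["lat"]), float(row["lon"]), "source_a")


def from_source_b(row: dict) -> Occurrence:
    # Hypothetical source B: a single "lon,lat" string.
    lon, lat = (float(part) for part in row["coords"].split(","))
    return Occurrence(row["taxon"].strip(), lat, lon, "source_b")


def consolidate(*batches) -> list:
    """Merge batches, keeping the first record seen for each species/location."""
    seen = set()
    merged = []
    for batch in batches:
        for occ in batch:
            # Case-insensitive name plus coordinates rounded to 5 decimals.
            key = (occ.species.lower(),
                   round(occ.latitude, 5), round(occ.longitude, 5))
            if key not in seen:
                seen.add(key)
                merged.append(occ)
    return merged


source_a_rows = [
    {"scientific_name": "Ardea herodias ", "lat": "49.2827", "lon": "-123.1207"},
]
source_b_rows = [
    {"taxon": "ardea herodias", "coords": "-123.1207,49.2827"},
    {"taxon": "Thuja plicata", "coords": "-122.8490,49.1913"},
]

records = consolidate(
    [from_source_a(r) for r in source_a_rows],
    [from_source_b(r) for r in source_b_rows],
)
for rec in records:
    print(rec)
```

Real occurrence data would likely need more careful matching (taxonomic synonyms, coordinate reference systems, observation dates) than the simple name-and-location key assumed here.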


Sponsor: National Energy Board of Canada

Description: The National Energy Board is Canada’s energy regulator. We regulate pipelines, energy development and trade from coast to coast to coast. One of our key data management systems is called REGDOCS, which contains a wealth of information, including the transcripts of our environmental assessment hearings and oral traditional evidence from Indigenous communities.

We are in the process of consolidating our transcript data from 1959 to the present day into a single data file. (The transcripts arise because the National Energy Board approves energy projects as an administrative court.) We would like assistance in making this file an accessible tool that staff, communities, and researchers can use to analyze the information.

This project will assist in the development of this tool, or parts of it, by:

a) transforming the data into Linked Open Data (if this is not already completed by our internal team prior to the commencement of this project)
b) creating a visual tool to create “community profiles” that highlight concerns and issues by Indigenous community, landowner or geographic location
c) creating a method to preserve and share information back to communities
d) identifying trends that can help the regulator be more responsive in addressing community issues

This project was inspired by web applications such as this one that turns data into a tool that educates individuals about residential schools.
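Items (a) and (b) in the list above can be pictured with a small Python sketch: each statement is expressed as subject-predicate-object triples, the basic shape of Linked Open Data, and the triples are then tallied into a per-community profile of concerns. Everything here is an assumption made for illustration: the `http://example.org/` namespace is a placeholder rather than a real vocabulary, and the statements, community names and concerns are invented, not drawn from REGDOCS.

```python
from collections import Counter, defaultdict

# Placeholder namespace; a real project would choose an established vocabulary.
BASE = "http://example.org/"


def to_triples(record: dict) -> list:
    """Express one (hypothetical) transcript statement as triples."""
    subject = f"{BASE}statement/{record['id']}"
    triples = [
        (subject, BASE + "community", record["community"]),
        (subject, BASE + "year", str(record["year"])),
    ]
    triples += [(subject, BASE + "concern", c) for c in record["concerns"]]
    return triples


def community_profiles(triples: list) -> dict:
    """Count how often each concern is raised, grouped by community."""
    community_of = {s: o for s, p, o in triples if p == BASE + "community"}
    profiles = defaultdict(Counter)
    for s, p, o in triples:
        if p == BASE + "concern":
            profiles[community_of[s]][o] += 1
    return dict(profiles)


# Invented sample statements, for illustration only.
statements = [
    {"id": 1, "community": "Community A", "year": 1998,
     "concerns": ["water quality", "access to land"]},
    {"id": 2, "community": "Community A", "year": 2005,
     "concerns": ["water quality"]},
    {"id": 3, "community": "Community B", "year": 2005,
     "concerns": ["noise"]},
]

triples = [t for record in statements for t in to_triples(record)]
profiles = community_profiles(triples)
for community, concerns in profiles.items():
    print(community, dict(concerns))
```

The counts produced this way are the kind of summary a visual “community profile” could be built on; identifying concerns in raw transcript text is a separate and harder step that this sketch does not attempt.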


Sponsor: BC Centre for Disease Control

Description: While laboratory tests have been standardized to LOINC coding schemes, the results of those tests are often unstructured text or, where result codes exist, many different codes can be used to represent the same result. New result codes are often created when small differences in the associated text are needed (e.g. “test negative for WNV” vs. “test negative for WNV-convalescent sera required”). Our data warehouse team works with lab personnel to review and classify all result codes as either positive, negative or indeterminate. This is a time-intensive process, but it is necessary to determine whether someone meets the case definition for a reportable communicable disease.

This project will build on the work of the 2018 DSSG project to apply machine learning techniques to laboratory data in order to classify the results automatically. In phase 1, exploratory work was done on various techniques to prepare the data and apply machine learning algorithms; MetaMap was used to tag concepts to assist the algorithms; and a pipeline was coded to classify the data.

In this phase, we will tweak the configuration of MetaMap to optimize the input to the machine learning algorithms, attempt to improve the accuracy of the existing pipeline that classifies test outcomes, and investigate the use of a terminology dictionary to improve microorganism classification results.
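To give a sense of the classification task, the Python sketch below trains a tiny bag-of-words naive Bayes classifier to label free-text results as positive, negative or indeterminate. This is a stand-in technique chosen for brevity, not the project's actual pipeline: it does not use MetaMap or LOINC, and the training phrases and labels are invented for illustration rather than taken from laboratory data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.label_counts = Counter()            # label -> number of examples
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.label_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text: str) -> str:
        tokens = tokenize(text)
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_examples in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n_examples / total)
            for token in tokens:
                score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training examples, for illustration only.
training = [
    ("detected", "positive"),
    ("reactive", "positive"),
    ("positive for antibody", "positive"),
    ("not detected", "negative"),
    ("non reactive", "negative"),
    ("negative for antibody", "negative"),
    ("equivocal", "indeterminate"),
    ("indeterminate repeat recommended", "indeterminate"),
]
texts, labels = zip(*training)
model = NaiveBayes().fit(texts, labels)

for result_text in ["antibody not detected", "equivocal result"]:
    print(result_text, "->", model.predict(result_text))
```

A word-count model like this treats “reactive” and “non reactive” as near neighbours, which hints at why concept tagging and careful data preparation matter for real result text.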

2019 Fellows

Raghav Aggarwal, BSc - Computer Science
Alex Chow, BA - Sociology
Huaiwen Dong, BSc - Statistics
Iris Gao, BSc - Computer Science & Statistics
Laura Greenstreet, BSc - Computer Science & Mathematics
Yingqiu Kuang, PhD - Political Science
Eugenie Lai, BA - Business & Computer Science
Jackie Lam, BA - Economics
Tae Yoon Lee, MSc - Statistics
Lesley Miller, BSc - Land & Food Systems
Alexi Rodriguez-Arelis, PhD - Statistics
Gabriel Smith, PhD - Psychology



2018 Data Science for Social Good Program

2017 Data Science for Social Good Program



BC Centre for Disease Control

City of Surrey


National Energy Board of Canada

Applied Statistics and Data Science Group (ASDa)


The 2019 DSSG Program is supported by:


For questions about this program, email