Data Science for Social Good

The Data Science Institute (DSI) at UBC, in partnership with the University of Washington eScience Institute, offers the Data Science for Social Good (DSSG) Summer Fellowships—modelled after successful programs at the University of Chicago, Georgia Tech and the eScience Institute. This 14-week summer program is designed to bring together undergraduate students and graduate students from diverse backgrounds with experience in data science, urban research and planning, design thinking and other domains to work on focused, collaborative research projects that have the potential to benefit society.

2018 Projects

Improving Early and Middle Childhood Outcomes

The goal of this project is to improve decision making in the areas of planning and investment across the early and middle years sector in Surrey. To achieve this goal, the project will assess the impact the City is having on the community through the use of data and measurement. For example, we will examine relationships between neighbourhood environments (e.g., community service and program registrations and access records, service work requests, built environments), the EDI (Early Development Instrument) and the MDI (Middle Years Development Instrument) to better understand the conditions that support childhood outcomes. Ultimately, the hope is that the project will uncover new methods to capture data and identify the indicators and measures that can amplify positive change to improve outcomes for children and families.

Transportation Energy and Emissions Baseline and Forecast for Ongoing Modelling and Policy Analysis

Surrey has the highest transportation greenhouse gas (GHG) emissions of all municipalities in the Greater Vancouver Regional District (GVRD), representing 25% of the region’s total transportation emissions. Currently, transportation emissions represent 65% of Surrey’s community GHG emissions. The goal of this project is to analyze trends in transportation and vehicle registration to establish a baseline and business-as-usual (BAU) forecast for Surrey to 2050. For example, the project will analyze passenger vehicle data alongside demographic, transit, and property use data to help guide the design and development of targeted intermodal infrastructure (e.g., transit, fuelling infrastructure, bike lanes), and to identify areas within Surrey for potential electric vehicle adoption. Overall, a better understanding of trends in vehicle registrations and intermodal commuting options for Surrey’s current and future population can inform future development projects and help improve liveability in Surrey’s neighbourhoods.

Uncovering the Hidden Universe of Rental Units in Surrey

Current information on the distribution and number of rental units in the City of Surrey is incomplete. This hampers the City's ability to accurately plan for community services, schools, transportation, and other infrastructure. Without a full accounting of rental units, the City also does not know the actual vacancy rate, which has policy implications for affordable housing supply. The goal of this project is to assemble information (e.g., create a database and visualizations) from multiple sources and datasets to construct as complete a picture as possible. Equipped with this data, the City will review and shape policy development on a number of fronts, including parking management policies, secondary suite legalization campaigns, and affordable housing policies that specifically address the supply of rental housing.

Use of Machine Learning Techniques to Classify Laboratory Test Results

While laboratory tests have been standardized to LOINC coding schemes, the results of those tests are often unstructured text or, where result codes exist, many different codes can be used to represent the same result. New result codes are often created when small differences in the associated text are needed (e.g., "test negative for WNV" vs. "test negative for WNV - convalescent sera required"). The BC Centre for Disease Control employs a data warehouse team that works with lab personnel to review and classify all result codes as positive, negative or indeterminate. This process is time-intensive, but it is necessary to determine whether someone meets the case definition for a reportable communicable disease. This project will explore whether and how machine learning techniques might be applied to laboratory data in order to automatically classify the results.
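As a rough illustration of what automatic classification of result text could look like, the sketch below trains a small multinomial Naive Bayes classifier using only the Python standard library. It is not the project's method: the choice of Naive Bayes, the word-plus-bigram features, the names `tokenize`, `NaiveBayesClassifier` and `TRAINING`, and all of the example result strings are assumptions made up for illustration, not real laboratory data.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, invented result strings for illustration only (not real lab data).
TRAINING = [
    ("WNV IgM antibody detected", "positive"),
    ("test positive for WNV", "positive"),
    ("reactive", "positive"),
    ("test negative for WNV", "negative"),
    ("WNV IgM antibody not detected", "negative"),
    ("non-reactive", "negative"),
    ("equivocal result, please resubmit", "indeterminate"),
    ("insufficient sample, unable to test", "indeterminate"),
]


def tokenize(text):
    """Lowercase the text and return its words plus adjacent-word bigrams."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    bigrams = [a + "_" + b for a, b in zip(words, words[1:])]
    return words + bigrams


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, examples):
        self.class_counts = Counter()            # number of examples per label
        self.token_counts = defaultdict(Counter)  # token frequencies per label
        self.vocab = set()
        for text, label in examples:
            self.class_counts[label] += 1
            for token in tokenize(text):
                self.token_counts[label][token] += 1
                self.vocab.add(token)
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        # Tokens never seen in training carry no evidence, so they are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.token_counts[label].values()) + len(self.vocab)
            for token in tokens:
                score += math.log((self.token_counts[label][token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


model = NaiveBayesClassifier().fit(TRAINING)
print(model.predict("test negative for WNV - convalescent sera required"))
```

The bigram features are included so that phrases such as "not detected" can be told apart from "detected" alone. A real system would need far more labelled examples, evaluation against the manually reviewed codes, and a way to route low-confidence predictions back to human reviewers.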


2017 Data Science for Social Good Program


Cascadia Urban Analytics Cooperative (CUAC)

eScience Institute, University of Washington

Applied Statistics and Data Science Group (ASDa)

City of Surrey

BC Centre for Disease Control

This program is sponsored by Microsoft.

For questions about this program, email