Data Science for Social Good

The Data Science for Social Good (DSSG) program at the University of British Columbia is an interdisciplinary and applied research training program that partners with public organizations to extract insights from open and proprietary data sets. This 14-week, full-time summer program brings together undergraduate and graduate students from diverse backgrounds, with experience in data science, urban research and planning, design thinking and other domains, to work on focused, collaborative research projects that have the potential to benefit society. The UBC DSSG program is modelled after successful programs at the University of Chicago, Georgia Tech and the University of Washington.

2019 Projects

DEVELOPING NLP TOOLS FOR SHARING OF INDIGENOUS AND COMMUNITY KNOWLEDGE

Sponsor: National Energy Board of Canada

Description: The National Energy Board is Canada’s energy regulator. We regulate pipelines, energy development and trade from coast to coast to coast. One of our key data management systems, REGDOCS, contains a wealth of information, including the transcripts of our environmental assessment hearings and oral traditional evidence from Indigenous communities.

We are consolidating our transcript data from 1959 to the present day into a single data file (the National Energy Board approves energy projects as an administrative court) and would like assistance in making it an accessible tool that staff, communities, and researchers can use to analyze this information.

This project will assist in developing this tool, or parts of it, by:

a) transforming the data into Linked Open Data (if this is not already completed by our internal team prior to the commencement of this project)
b) creating a visual tool for building “community profiles” that highlight concerns and issues by Indigenous community, landowner or geographic location
c) creating a method to preserve and share information back to communities
d) identifying trends that can help the regulator be more responsive in addressing community issues

This project was inspired by web applications such as one that turns data into a tool that educates people about residential schools.

Final presentation: Download here

EXPLORATORY DATA ANALYSIS AND VISUALIZATION FOR SURREY’S ELECTRIC VEHICLE STRATEGY AND HEAVY-DUTY VEHICLE APPROACH

Sponsor: City of Surrey

Description: The City of Surrey is developing an Electric Vehicle Strategy as a core element of the City’s approach to reducing GHG emissions. Because Surrey is a highly auto-dependent community, electric vehicles will be critical to the City’s success in mitigating climate change. This project will involve exploratory analysis of multiple datasets to help City staff better understand the key vehicle stock, demographic, and sociotechnical characteristics needed to ground the EV Strategy in our current reality and to develop impactful, strategic policies and programs. Using the team’s findings, the project lead is also seeking a set of data visualizations that deftly communicate the most important findings and the key stories they tell. In both cases, the project lead will spend time with the student team to develop an understanding of what matters in energy system transformation and the passenger vehicle sector, and will work with the team to design visualizations that make an impact.

This work builds on a 2018 DSSG project that included a lighter, less-focused analysis of multiple datasets as part of a larger effort to build a workable projection tool. Some findings from that exploratory analysis provided such valuable insight that City staff are now pursuing a more comprehensive data analysis than any previously done to inform a city’s electric vehicle strategy.

In addition to the passenger EV focus, the City is working alongside organizations like BC Hydro and Renewable Cities to better understand the medium/heavy-duty vehicle sector and how to transition it to zero-emission technologies in the future. A smaller part of this project will involve exploratory analysis of multiple datasets to help the City better understand the current stock of these vehicles in Surrey and how it has changed over time.

Final presentation: Download here

INCREASING ACCESSIBILITY OF BIODIVERSITY DATA IN METRO VANCOUVER

Sponsor: Metro Vancouver

Description: British Columbia, like the rest of the world, is experiencing unprecedented declines in biodiversity. Simultaneously, our knowledge about where species are and which habitat is most important to them is patchy and difficult for local governments to access and consider when making decisions. To address this challenge, urban planners across the Metro Vancouver region need tools that help practitioners consider and plan for biodiversity.

Data on species occurrence (plant and animal; common and rare) in Metro Vancouver is distributed across many platforms and in different formats, which makes it challenging for local government planners and other practitioners to use in decision-making. A single source of this consolidated knowledge does not exist.

This project will create a consolidated spatial database and web mapping viewer that brings together these disparate data sources into one location for planners and others (e.g., researchers, public) to view and manipulate. This tool would greatly assist in incorporating biodiversity in land use planning.

Final presentation: Download here

USE OF MACHINE LEARNING TECHNIQUES TO CLASSIFY LABORATORY TEST RESULTS (PHASE 2)

Sponsor: BC Centre for Disease Control

Description: While laboratory tests have been standardized to LOINC coding schemes, the results of those tests are often unstructured text, or where result codes exist, many different codes can be used to represent the same result.  New result codes are often created when small differences in the associated text are needed (e.g. test negative for WNV vs. test negative for WNV-convalescent sera required).  Our data warehouse team works with lab personnel to review and classify all result codes as either positive, negative or indeterminate.  This is a time-intensive process, but necessary to determine if someone meets the case definition for a reportable communicable disease.

This project will build on the work of the 2018 DSSG project to apply machine learning techniques to laboratory data in order to classify the results automatically.  In phase 1, exploratory work was done on various techniques to prepare the data and apply machine learning algorithms, MetaMap was used to tag concepts to assist the algorithms, and a pipeline was coded to classify data.

In this phase, we will tweak the configuration of MetaMap to optimize the input to the machine learning algorithms, attempt to improve the accuracy of the existing pipeline that classifies test outcomes, and investigate the use of a terminology dictionary to improve microorganism classification results.

Final presentation: Download here

2019 Fellows

Raghav Aggarwal

Raghav is an undergraduate in Computer Science entering his third year this fall. He transferred from the Indian Institute of Technology Kanpur, where he researched opinion mining and sentiment analysis last summer. With DSSG, he aims to broaden his data manipulation and visualization skills while contributing to society. He also wishes to expand his network and understand the perspectives of people in business and the arts. He believes data science can help solve problems in his home country. In his leisure time, he likes to read, play the guitar and run.

Alex Chow

Alex is a fourth-year honours Sociology student who specializes in quantitative research methods and higher education research. He has a keen interest in using data to identify social inequalities in existing institutions. Through the DSSG program, Alex hopes to learn from and work with the other DSSG fellows to conduct research that directly benefits socially marginalized communities, while honing his technical skills in data and statistical analysis.

Huaiwen Dong

Huaiwen is a recent BSc graduate who majored in Statistics. She developed solid theoretical and technical skills in statistics and data science during her undergraduate years, but her most valuable university experience was exploring her interests in social science disciplines. Her internship as a macroeconomic researcher and her work as a research assistant at the Sauder School of Business showed her the crucial role quantitative fields play in solving social, political and economic issues, now that the digital era has given us channels to generate huge amounts of data and methods to acquire, store and explain them. She hopes to explore more possibilities in both fields in the DSSG program.

Iris Gao

Iris is a fourth-year student studying Mathematical Sciences at UBC. She has gained experience in data analysis through her co-op placement at Vancouver Coastal Health and her research project with UBC Earth and Ocean Sciences. She believes that data science will play a major role in social development and in changing the dynamics of many industries. Her goal as a DSSG fellow is to further develop her skills in machine learning and to make an impact on the local health care community. Besides work, Iris enjoys dancing, playing music and cooking.

Laura Greenstreet

Laura will soon be finishing her BSc with honours in Computer Science and Mathematics. She is interested in interdisciplinary problems, especially sustainable resource usage and its social, environmental and economic impacts. Having previously worked for a non-profit in the agriculture sector, Laura is excited to continue working on sustainability issues. She is eager to practice and learn new data analysis techniques in the DSSG program and looks forward to gaining hands-on experience integrating and analyzing multiple datasets. After graduation, Laura plans to pursue a graduate degree in Computer Science. Inspired by her previous non-profit work, she intends to return to the non-profit or government sector after completing her graduate studies.

Yingqiu Kuang

Yingqiu Kuang is a PhD candidate in political science at UBC. Her dissertation research explores emerging economies and global technology governance in international political economy, with a focus on multinational corporations and technology standardization in China and South Korea. She is also a Mitacs Research Fellow at the Asia Pacific Foundation of Canada, where she has applied a range of technical skills to analyze Canada-China clean-tech innovations, bilateral trade and investment, and Canada-China Track II dialogue on energy. Before coming to UBC, she completed her undergraduate studies in world history and international relations at Peking University and Waseda University. As a DSSG fellow, she hopes to further learn and apply data analysis and computer programming techniques for social good projects and her own research.

Eugenie Lai

Eugenie is a fourth-year undergraduate student in the Business and Computer Science Combined Major. She has enjoyed doing data analysis in the past and is looking forward to putting her business knowledge into practice. She also hopes to pursue graduate studies in data mining, data engineering, or business intelligence. In her spare time, she enjoys baking, hiking, and movies.

Jackie Lam

Jackie recently completed his Bachelor of International Economics with a minor in Mathematics. Through his studies, he gained experience in data analytics in finance and economics. Working as a research assistant, he further expanded his knowledge and gained experience in Natural Language Processing. Jackie hopes to diversify and improve his skill set through the DSSG program and pave his way to a career as a data scientist.

Taeyoon (Harry) Lee

Harry is an MSc student in Statistics. While his research topics are theoretical and foundational in nature, he has developed a great passion for data-driven applications for social good through consulting, data science competitions, and modelling workshops. This same passion led him to an internship at the UN, where, as the sole statistician, he proposed a method and designed a survey for an impact evaluation study of financial training and products for garment factory workers in Laos. He also recently completed a computer science course called AI for Social Impact, in which students worked with the City of Vancouver to identify parking inefficiencies using a diverse set of AI and statistical tools. He is ready for another adventure with DSSG.

Lesley Miller

Lesley has just completed her Bachelor of Science degree in the Faculty of Land and Food Systems. During her undergraduate degree, she worked in two genomics labs focused on biodiversity research and contributed to a team tasked with understanding food system resilience in Vancouver. These experiences fostered a strong desire to use her computational skills on projects that promote sustainability, protect the environment and improve the lives of everyday people. As a DSSG Fellow, she is excited to expand her technical skill set while working on a project that will benefit the public good. When she is not programming or analyzing data, she enjoys cycling and running in Vancouver’s outdoor beauty.

Alexi Rodriguez Arelis

Alexi is a PhD candidate in the Statistics program in UBC's Faculty of Science. He has a background in Industrial Engineering as well as Applied Statistics. His current research focuses on computer experiments that emulate scientific and engineering systems with Gaussian stochastic processes. He has developed modelling reformulations that aim to improve prediction accuracy by relying on dimensional analysis and statistical principles from the first half of the 20th century. Alexi hopes to use his time as a DSSG fellow to broaden his statistical consulting experience through collaborative projects that benefit society.

Gabriel Smith

Gabriel is a PhD student in UBC's Department of Psychology, whose research focuses on mind wandering and the neural mechanisms of thought production. He is also pursuing the quantitative methods specialization in psychology, and hopes to unite his two areas of interest by developing advanced statistical methods with which to analyze both behavioural and neuroimaging data (e.g., machine learning techniques that can be applied to fMRI data). Originally from Dartmouth, Nova Scotia, he loves dogs, board games, Thai food and snow.

Partners:

BC Centre for Disease Control

City of Surrey

Metro Vancouver

National Energy Board of Canada

Applied Statistics and Data Science Group (ASDa)

The 2019 DSSG Program is supported by:

Archive:

2018 Data Science for Social Good Program

2017 Data Science for Social Good Program

For questions about this program, email dsi.admin@science.ubc.ca.