Data-Driven Privacy, Fall 2015

Background and Course Description

In 2006, AOL released user search logs stripped of email addresses, account holder names, and other explicitly identifying information. The New York Times promptly examined the queries in the data and identified user 4417749 as a 62-year-old resident of Lilburn, GA [1]. This is one of many ‘reidentification’ attacks. More recently, researchers have shown how pseudonymous eBay users can be identified [2] and how Netflix users can be identified from their movie ratings [3].
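The core of many such attacks is a linkage step: records released without names are joined to a public dataset on shared quasi-identifiers (e.g. ZIP code, age, sex). A minimal sketch of this idea, using entirely fabricated records and hypothetical field names:

```python
# Toy linkage (reidentification) attack: match an "anonymized" release
# against a public record source on shared quasi-identifiers.
# All names, values, and fields below are fabricated for illustration.

anonymized = [  # released without names attached
    {"zip": "30047", "age": 62, "sex": "F", "queries": ["landscapers in lilburn ga"]},
    {"zip": "27606", "age": 24, "sex": "M", "queries": ["cheap textbooks"]},
]

public_records = [  # e.g. a voter roll that does carry names
    {"name": "Jane Doe", "zip": "30047", "age": 62, "sex": "F"},
    {"name": "John Roe", "zip": "27606", "age": 31, "sex": "M"},
]

def link(anon, public):
    """Pair up records whose quasi-identifiers (zip, age, sex) agree exactly."""
    matches = []
    for a in anon:
        for p in public:
            if all(a[k] == p[k] for k in ("zip", "age", "sex")):
                matches.append((p["name"], a["queries"]))
    return matches

print(link(anonymized, public_records))
# → [('Jane Doe', ['landscapers in lilburn ga'])]
```

Even this naive exact-match join succeeds whenever a combination of quasi-identifiers is unique in both datasets, which is the property the attacks above exploit at scale.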

When hearing about privacy breaches like these, it’s natural to ask what technology is available to users to protect their privacy. It’s important to consider not only the technical functionality of privacy-enhancing technologies (PETs) but also their usability and user experience, all of which impact adoption. Hillary Clinton’s use of a personal email account for government business [4] demonstrates the strong influence usability can have on privacy and security choices.

Muddying the waters somewhat is controversy over whether PETs meet a genuine user need. Some argue that despite the frequency of outcries over breaches, users do not care much about privacy. The evidence presented includes studies in which users were willing to give up sensitive information (e.g. passwords, birthdates) for candy, as well as examples of over-sharing online [5]. This is often referred to as the “privacy paradox,” because users’ privacy behavior appears at odds with their attitudes.

In this course, we will study how to use data in a principled way to understand privacy risk, attitudes, and concerns. We will focus on several of the attacks, PETs, and privacy measurement techniques behind this research and press coverage, in particular:

  • Reidentification and Data Aggregation: Inference risks, attacks and protection strategies

  • Privacy-Enhancing Technologies & the User Experience: Protocols for privacy-enhanced data storage and communication, and the user experience issues that can impede adoption

  • Privacy Measurement: Usage of self-reported and behavioral data to understand the user perspective on privacy

This course will provide students with an introduction to PETs and active areas of privacy research.

Class format: Class time will be spent discussing papers that students must read before class. The class will be mostly discussion driven, with few lectures.

Prerequisites: The course is geared toward graduate students, but advanced undergraduates are encouraged to join with the instructor's permission. There are no hard requirements, but experience with statistical analysis software (e.g. R) will give students a wider range of class projects to choose from. Undergraduate-level coursework in statistics/probability and number theory may also be helpful when reading the papers; however, the instructor will recommend statistics and number theory resources as needed.

Grading: Students are evaluated on class participation and a final class project. The class project may be completed in teams of 2-3 students.

Grades will be based on the following:

  • 20% Class participation

  • 20% Paper summaries (presented in class and submitted in written form)

  • 60% Class project (of the student's choosing) due at the end of the semester

CSC 591-008, Data-Driven Privacy: Schedule