In 1999, the then-CEO of Sun Microsystems, Scott McNealy, famously said, "You have zero privacy anyway. Get over it." To a degree, McNealy was quite prescient. Remember, he said this years before people started sharing their most personal information in droves with Facebook, Instagram, Gmail, and myriad other online services.
Yet with all that, there are still plenty of privacy needs. From financial and medical records to US census data and more, a massive amount of data needs to be protected. Protection may be the easiest part, but how does an organization share data in a secure manner to ensure the privacy of the data subjects?
In Differential Privacy (MIT Press), author Simson Garfinkel has written an interesting guide that will introduce many to a robust privacy framework that can be used to share statistical information while protecting the privacy of individual data subjects.
Differential Privacy (DP) is a mathematical approach for statistical disclosure control, but it is not a single mathematical algorithm or process. Instead, DP is a framework based on a formal definition of privacy loss. This formal definition is the breakthrough Adam Smith, one of the inventors of DP, introduced.
The power of DP is that it enables those holding data to share specific aggregate patterns of the data while limiting information shared about specific individuals. The need for DP is based on the reality that it is tough to fully and truly anonymize data.
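To make the idea of sharing an aggregate while limiting what is revealed about any one person more concrete, here is a minimal sketch of one well-known DP technique, the Laplace mechanism, applied to a simple count. It is an illustration only and is not taken from Garfinkel's book; the function names, the sample ages, and the choice of epsilon are all hypothetical.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records matching `predicate`.

    Adding or removing one person changes a count by at most 1
    (sensitivity 1), so Laplace noise with scale 1/epsilon is used.
    A smaller epsilon means more noise and less privacy loss.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical data: ages of six survey respondents.
    ages = [34, 51, 67, 72, 45, 80]
    noisy = private_count(ages, lambda age: age >= 65, epsilon=0.5)
    print(f"Noisy count of respondents aged 65+: {noisy:.2f}")
```

The published number is close to the true count on average, but because the noise is random, the output alone does not reveal whether any particular respondent was in the data. Real deployments also have to track the total privacy loss across all released statistics, which this sketch does not attempt.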
Computer scientist and privacy advocate Latanya Sweeney made that eminently clear in 1997 when she performed a data re-identification experiment where she could identify the medical records of then-Massachusetts governor William Weld.
Her landmark paper Only You, Your Doctor, and Many Others May Know matched patient names to publicly available health data sold by Washington State. Sweeney showed how poor data-sharing practices at the time, combined with a false sense of adequate data anonymization, coalesced into a privacy disaster.
While Garfinkel may be one of the biggest DP cheerleaders, he's quite pragmatic about the challenges of using it. He writes that while DP is almost 20 years old, scientific inventions in general, and information security tools specifically, take significant time to go mainstream.
Although the RSA public-key cryptosystem was invented by Ron Rivest, Adi Shamir, and Leonard Adleman in 1977, it was not until the early 2010s, over 30 years later, that the idea of encrypting all data in transit using public-key cryptography became widespread.
Garfinkel estimates that DP will be widely deployed by the mid-2030s. Until then, he expects DP deployments to require specialty software and trained consultants, and to take several years.
The slow adoption of the DP framework is similar to that of FAIR (Factor Analysis of Information Risk) from the FAIR Institute. FAIR is a very powerful quantitative risk analysis framework for assessing and managing information risk, but its depth and breadth do not lend themselves to quick rollouts.
While it will be a while before DP becomes mainstream, it has had many significant uses. Some of the biggest DP projects have been from Apple, Google, and the US Census Bureau for the 2020 census.
DP's challenge is partly due to the abstract nature of privacy. We have agreed-upon definitions and protocols for what an inch is, what a UDP packet should look like, and more. However, privacy is abstract and means different things to different people. Combine that with the challenging mathematics that underlies DP, and you have a powerful yet intimidating framework.
The power of DP has led Apple, Google, the Census Bureau, and many other organizations to take it seriously and implement it. However, as Garfinkel acknowledges, DP adoption is a very slow and deliberate process.
DP is a powerful privacy tool and an idea whose time has come. It's not a perfect solution for every organization. However, for those dealing with massive data sets, it may be the balm they have long searched for to assist in their data privacy sharing initiatives.
At under 200 pages, the book is a perfect introduction to DP and to the need for privacy frameworks like it. For those who want a hard copy of this important reference, it's available via the regular commercial channels. The book also comes with an open-access license and is freely available online.