Differential Privacy
Differential privacy is something I need to learn about in order to safely build what I want to build.
Differential Privacy Notes
Differential privacy (DP) is a mathematically rigorous framework for releasing statistical information about datasets while protecting the privacy of individual data subjects. It enables a data holder to share aggregate patterns of the group while limiting information that is leaked about specific individuals. This is done by injecting carefully calibrated noise into statistical computations such that the utility of the statistic is preserved while provably limiting what can be inferred about any individual in the dataset.
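A concrete sketch of "carefully calibrated noise" is the Laplace mechanism: add noise scaled to the query's sensitivity divided by the privacy parameter ε. A minimal Python illustration for a counting query (the dataset and function names here are mine, not from any particular library):

```python
import numpy as np

def dp_count(data, predicate, epsilon):
    """epsilon-DP count via the Laplace mechanism.

    A counting query has sensitivity 1 (adding or removing one person
    changes the count by at most 1), so Laplace noise with scale
    1/epsilon yields epsilon-differential privacy.
    """
    true_count = sum(1 for row in data if predicate(row))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Hypothetical toy dataset: ages of eight survey respondents.
ages = [34, 29, 51, 62, 45, 38, 70, 23]
noisy_over_40 = dp_count(ages, lambda a: a >= 40, epsilon=1.0)
```

Each call returns a different noisy answer near the true count (4 here); averaged over many releases the noise cancels, which is why repeated queries consume privacy budget.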
- Differential privacy algorithms are used by some government agencies to publish demographic information or other statistical aggregates while ensuring confidentiality of survey responses.
- An algorithm is differentially private if an observer seeing its output cannot tell whether a particular individual's information was used in the computation.
- ε-differential privacy is a mathematical definition for the privacy loss associated with any data release drawn from a statistical database.
- A statistical database is a set of data collected under a pledge of confidentiality for the purpose of producing statistics that, by their production, do not compromise the privacy of the individuals who provided the data.
- The definition of ε-differential privacy requires that changing one entry in the database produces only a small change in the probability distribution of the outputs of measurements, as seen by the attacker. The key insight of differential privacy is that as the query is made on the data of fewer and fewer people, more noise needs to be added to the query result to produce the same amount of privacy.
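The "fewer people, more noise" point can be made concrete with a bounded mean: one person can shift the mean of n values in [lower, upper] by at most (upper − lower)/n, so for a fixed ε the Laplace scale grows as n shrinks. A sketch (function name and bounds are illustrative, not from any library):

```python
import numpy as np

def dp_mean(values, lower, upper, epsilon):
    """epsilon-DP mean of values clipped to [lower, upper].

    Sensitivity of the clipped mean is (upper - lower) / n: changing
    one person's record moves the mean by at most that much, so the
    Laplace scale is (upper - lower) / (n * epsilon).
    """
    values = np.clip(np.asarray(values, dtype=float), lower, upper)
    n = len(values)
    scale = (upper - lower) / (n * epsilon)
    return values.mean() + np.random.laplace(0.0, scale)

# Same epsilon, same bounds: the group of 10 gets 10x the noise
# scale of the group of 100.
small_group = np.random.uniform(0, 100, size=10)
large_group = np.random.uniform(0, 100, size=100)
```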
Public Purpose Considerations
- Data Utility and Accuracy
- The main concern with differential privacy is the trade-off between data utility and individual privacy: if the privacy loss parameter is set to favor utility, less “noise” is injected and the privacy benefit shrinks; if it is set to favor privacy, more “noise” is injected and the accuracy and utility of the dataset drop.
- Data Privacy and Security
- Differential privacy provides a quantified upper bound on privacy loss, allowing curators to choose an explicit trade-off between privacy and accuracy.
Attacks in Practice
- Subtle algorithmic or analytical mistakes
- Timing side-channel attacks
- Leakage through floating-point arithmetic
- Timing channel through floating-point arithmetic
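On the floating-point items: Mironov showed that textbook Laplace sampling leaks information through the gaps between representable double-precision values. One mitigation direction is to round the released value to a coarse grid. The sketch below illustrates that idea only; it is a simplified assumption-laden sketch, not a vetted implementation of the full snapping mechanism (which also requires clamping the output and accounting for rounding in the privacy analysis):

```python
import math
import secrets

def rounded_laplace_release(true_value, scale, grid):
    """Add Laplace noise, then round the release to a coarse grid.

    Rounding hides the exact floating-point bit pattern of the noise,
    which is what the floating-point attacks exploit. Simplified
    sketch only -- a real snapping mechanism also clamps the result
    and adjusts the privacy budget for the rounding.
    """
    # Laplace sample = random sign times an exponential magnitude;
    # u is uniform in (0, 1], drawn from a cryptographic source.
    u = (secrets.randbits(53) + 1) / float(2**53)
    magnitude = -scale * math.log(u)
    sign = 1.0 if secrets.randbits(1) else -1.0
    noisy = true_value + sign * magnitude
    return round(noisy / grid) * grid
```

Every released value is a multiple of `grid`, so an attacker observing outputs sees only the coarse lattice, not the raw noise bits.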