
Machine Learning and Differential Privacy
Travis Dick
Dec. 6, 2017
Learning and Privacy
• To do machine learning, we need data.

• What if the data contains sensitive information?

• Even if the (person running the) learning algorithm can be trusted, perhaps the output of the algorithm reveals sensitive information.
Example: Search query completions

What if we use your friends’ search logs to suggest completions?


Good for accuracy, but…

why are _
why are my feet so itchy?
Privacy leaks can be subtle!
Imagine you live in a small town…

The local hospital wants to use ML to study the effects of some condition 𝑋 on school performance. They collect data from people with condition 𝑋 and train a model using the perceptron algorithm.
𝑤 = (0.1, 1.3, 4.4, 1, 0)    (one coordinate ≡ “has green hair?”)
• Feature 𝑗 ≡ “has green hair” is non-zero in 𝑤
• Only one person in town has green hair.
• We now know the green-haired person has condition 𝑋!
How can you be confident that this won’t happen?
Differential Privacy
• Differential privacy is an approach to these problems.
• It is a constraint on an algorithm that…
• Lets us learn many population-level properties of a dataset.
• But not about specific individuals in the dataset.
Intuition: DP requires that the algorithm is randomized and that no
individual influences the output distribution very much.

(Figure: a dataset is fed into a randomized algorithm, inducing a distribution over output values 𝑣. The output distribution with my data being 𝑥ᵢ is only slightly different from the output distribution with my data being 𝑥ᵢ′.)
You only see the output 0.55. Can you tell whether my data was 𝑥ᵢ or 𝑥ᵢ′?
Differential Privacy
Def: Two datasets 𝑆 and 𝑆′ are neighboring if they differ on at most one entry (1 entry ≡ 1 person).
𝑆 = (𝑥₁, …, 𝑥ᵢ, …, 𝑥ₙ)    𝑆′ = (𝑥₁, …, 𝑥ᵢ′, …, 𝑥ₙ)    (only entry 𝑖 differs)

Def: A randomized algorithm 𝐴 is 𝜖-differentially private if for any neighboring datasets 𝑆, 𝑆′ and any set 𝐶 of outcomes, we have:
Pr[𝐴(𝑆) ∈ 𝐶] ≤ 𝑒^𝜖 · Pr[𝐴(𝑆′) ∈ 𝐶]

Equivalently: Pr[𝐴(𝑆) ∈ 𝐶] / Pr[𝐴(𝑆′) ∈ 𝐶] ∈ [𝑒^(−𝜖), 𝑒^𝜖] ≈ [1 − 𝜖, 1 + 𝜖]
Differential Privacy: Guarantee
Differential Privacy promises plausible deniability.
E.g., based on the output of a private algorithm, no one learns whether my data was 𝑥ᵢ or 𝑥ᵢ′.
Pr[𝑥ᵢ | output] / Pr[𝑥ᵢ′ | output] = (Pr[output | 𝑥ᵢ] / Pr[output | 𝑥ᵢ′]) · (Pr[𝑥ᵢ] / Pr[𝑥ᵢ′]) ≈ (1 ± 𝜖) · (Pr[𝑥ᵢ] / Pr[𝑥ᵢ′])
(Posterior ≈ Prior)

More generally, all future events occur with approximately the same
probability, whether the algorithm is run with your real data or not.
Designing Differentially Private Algorithms
How can we design useful differentially private algorithms?

1. Laplace Mechanism.
• Privately computing averages.
2. Basic Composition Theorem.
• Private version of gradient descent.
3. Exponential Mechanism.
• Dataset sanitization.
The Laplace Mechanism
A very useful building block for designing private algorithms.
Goal: Answer a “query” 𝑓 : 𝐷 → ℝ that maps datasets to numbers while preserving 𝜖-differential privacy.
Example: 𝑓(𝑆) = mean age of people in 𝑆.

Idea: Compute 𝑓(𝑆) and add noise to hide any individual’s influence. (How big can that influence be? The next definition measures exactly this.)
Def: The sensitivity of 𝑓 is Δ𝑓 = max |𝑓(𝑆) − 𝑓(𝑆′)|, where the max is over neighboring datasets 𝑆, 𝑆′.

Def: The Laplace Mechanism outputs 𝑓(𝑆) + 𝑍, where 𝑍 is drawn from the Lap(Δ𝑓/𝜖) distribution (density 𝑝(𝑧) = (𝜖 / 2Δ𝑓) · exp(−(𝜖/Δ𝑓)|𝑧|)).

(Figure: the Laplace density, centered at 𝑓(𝑆).)
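As a concrete illustration (not from the original slides), here is a minimal Python sketch of the Laplace Mechanism. The function name and the age-query example are invented; the ages are assumed to lie in [0, 100] so the mean has sensitivity 100/𝑛.

```python
import numpy as np

def laplace_mechanism(data, query, sensitivity, epsilon):
    """Answer query(data) with noise drawn from Lap(sensitivity / epsilon)."""
    true_answer = query(data)
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_answer + noise

# Example: mean age, with ages assumed to lie in [0, 100].  One person can
# change the mean by at most 100/n, so the sensitivity is 100/n.
ages = np.array([23.0, 45.0, 31.0, 62.0, 38.0])
print(laplace_mechanism(ages, np.mean, sensitivity=100 / len(ages), epsilon=0.5))
```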
The Laplace Mechanism: Analysis
Recall: Δ𝑓 = max |𝑓(𝑆) − 𝑓(𝑆′)| over neighboring 𝑆, 𝑆′, and the Laplace Mechanism outputs 𝑓(𝑆) + 𝑍 with 𝑍 ∼ Lap(Δ𝑓/𝜖), i.e., density 𝑝(𝑧) = (𝜖 / 2Δ𝑓) · exp(−(𝜖/Δ𝑓)|𝑧|).
Privacy: The Laplace mechanism preserves 𝜖-differential privacy.
Proof: Let 𝑆 and 𝑆′ be neighbors and let 𝑝, 𝑝′ be the output densities for 𝑆, 𝑆′.
It is sufficient to show that for all 𝑣 ∈ ℝ we have 𝑝(𝑣) ≤ 𝑒^𝜖 · 𝑝′(𝑣).
𝑝(𝑣) = (𝜖 / 2Δ𝑓) · exp(−(𝜖/Δ𝑓) |𝑣 − 𝑓(𝑆)|)
     ≤ (𝜖 / 2Δ𝑓) · exp(−(𝜖/Δ𝑓) (|𝑣 − 𝑓(𝑆′)| − Δ𝑓)) = 𝑒^𝜖 · 𝑝′(𝑣),
where the inequality uses |𝑣 − 𝑓(𝑆)| ≥ |𝑣 − 𝑓(𝑆′)| − Δ𝑓, which holds because |𝑓(𝑆) − 𝑓(𝑆′)| ≤ Δ𝑓.
Utility: With probability at least 1 − 𝛿, we have |𝑍| ≤ (Δ𝑓/𝜖) · log(1/𝛿).
Proof: An easy integration shows Pr[|𝑍| ≥ (Δ𝑓/𝜖) · 𝑡] ≤ 𝑒^(−𝑡). Choose 𝑡 so that 𝑒^(−𝑡) = 𝛿.
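As a quick sanity check (my addition, with arbitrary constants), the tail bound can be verified empirically by sampling:

```python
import numpy as np

# Empirically check the tail bound: for Z ~ Lap(sensitivity/epsilon),
# Pr[|Z| >= (sensitivity/epsilon) * t] equals e^{-t}.
sensitivity, epsilon, t = 1.0, 0.5, 3.0
scale = sensitivity / epsilon
Z = np.random.laplace(scale=scale, size=1_000_000)
print("empirical:", np.mean(np.abs(Z) >= scale * t))
print("e^(-t):   ", np.exp(-t))
```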
Privately Computing Means
How can we privately compute the mean of a set 𝑆 = {𝑥₁, …, 𝑥ₙ} of numbers in the interval [0, 1]?
Let 𝑓(𝑆) = (1/𝑛) Σᵢ 𝑥ᵢ be the mean of 𝑆.

Question: What is the sensitivity of 𝑓? Δ𝑓 = 1/𝑛, since changing one entry in [0, 1] changes the sum by at most 1, hence the mean by at most 1/𝑛.


So we can output 𝑓(𝑆) + 𝑍, where 𝑍 ∼ Lap(1/(𝑛𝜖)), while preserving 𝜖-differential privacy.
With high probability, the error is |𝑍| ≈ 1/(𝑛𝜖).
Now imagine 𝑆 is an i.i.d. sample from a distribution 𝑃 and our real goal is to estimate 𝐸_{𝑥∼𝑃}[𝑥] using the mean of the sample 𝑆.
Then |𝑓(𝑆) − 𝐸_{𝑥∼𝑃}[𝑥]| ≈ 1/√𝑛.

So error due to privacy is negligible compared to sampling error!
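A sketch of this comparison in Python (all constants are made up; the uniform distribution stands in for 𝑃):

```python
import numpy as np

# Compare the privacy error (~1/(n*eps)) to the sampling error (~1/sqrt(n))
# when estimating the mean of a distribution on [0, 1] from n samples.
rng = np.random.default_rng(0)
n, epsilon = 10_000, 0.5
S = rng.uniform(0, 1, size=n)      # stand-in for the dataset
true_mean = 0.5                    # E[x] for Uniform(0, 1)

f_S = S.mean()                                   # sensitivity of the mean is 1/n
private = f_S + rng.laplace(scale=1 / (n * epsilon))

print(f"sampling error: {abs(f_S - true_mean):.5f}")   # order 1/sqrt(n)
print(f"privacy error:  {abs(private - f_S):.5f}")     # order 1/(n*eps)
```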


The Basic Composition Theorem & Private Gradient Descent
Laplace Mech. For 𝑑 Dimensional Queries
What if our query 𝑓 gives us 𝑑 numbers? I.e., 𝑓 : 𝐷 → ℝᵈ.
Example: 𝑓(𝑆) = ⟨mean age in 𝑆, mean height in 𝑆⟩.

Def: The sensitivity of 𝑓 is Δ𝑓 = max ‖𝑓(𝑆) − 𝑓(𝑆′)‖₁ over neighboring 𝑆, 𝑆′.

Def: The Laplace Mechanism outputs 𝑓(𝑆) + 𝑍, where 𝑍 ∈ ℝᵈ has components drawn independently from the Lap(Δ𝑓/𝜖) distribution.

Privacy: The Laplace mechanism preserves 𝜖-differential privacy.


Utility: w.p. ≥ 1 − 𝛿, we have ‖𝑍‖∞ ≤ (Δ𝑓/𝜖) · log(𝑑/𝛿).
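A sketch of the 𝑑-dimensional mechanism on the slide’s ⟨mean age, mean height⟩ example. The value ranges assumed for age and height (and hence the L1 sensitivity) are my own:

```python
import numpy as np

# f(S) = (mean age, mean height).  Assuming ages lie in [0, 100] and
# heights in [0, 2] metres, one person moves the answer by at most
# 100/n + 2/n in L1 norm, so Delta_f = 102/n.
rng = np.random.default_rng(1)
ages = rng.uniform(20, 80, size=50)
heights = rng.uniform(1.4, 2.0, size=50)
n, epsilon = len(ages), 1.0
sensitivity = (100 + 2) / n

f_S = np.array([ages.mean(), heights.mean()])
Z = rng.laplace(scale=sensitivity / epsilon, size=2)  # per-coordinate noise
print(f_S + Z)
```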
Useful Property: Composition
What if we want to run multiple differentially private algorithms?

Theorem: “We can, and the 𝜖’s add up.”
• Let 𝐴₁, …, 𝐴ₖ be 𝜖-differentially private algorithms.
• 𝐴ᵢ can be chosen based on the outputs of algorithms 𝐴₁, …, 𝐴ᵢ₋₁.
• Then applying 𝐴₁, …, 𝐴ₖ to 𝑆 and publishing all outcomes is 𝑘𝜖-differentially private.
Proof idea: The easy case is when 𝐴ᵢ does not depend on earlier outputs.
Let 𝑆, 𝑆′ be neighboring datasets.
Let 𝑣₁, …, 𝑣ₖ be any outputs for algorithms 𝐴₁, …, 𝐴ₖ.

Pr[for all 𝑖, 𝐴ᵢ(𝑆) = 𝑣ᵢ] = Πᵢ Pr[𝐴ᵢ(𝑆) = 𝑣ᵢ] ≤ Πᵢ 𝑒^𝜖 · Pr[𝐴ᵢ(𝑆′) = 𝑣ᵢ] = 𝑒^(𝑘𝜖) · Pr[for all 𝑖, 𝐴ᵢ(𝑆′) = 𝑣ᵢ]

This lets us design private algs. by breaking problems into sub-problems!
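For instance (an illustrative sketch, not from the slides), we can answer 𝑘 threshold queries by giving each one an 𝜖/𝑘 share of the privacy budget:

```python
import numpy as np

# Basic composition: k queries, each (epsilon/k)-DP, are epsilon-DP overall.
# Each query asks what fraction of the data exceeds a threshold; one person
# changes a fraction by at most 1/n, so each query has sensitivity 1/n.
rng = np.random.default_rng(2)
data = rng.uniform(0, 1, size=1000)
n, epsilon, thresholds = len(data), 1.0, [0.25, 0.5, 0.75]
eps_each = epsilon / len(thresholds)   # split the privacy budget

answers = [float(np.mean(data > t)) + rng.laplace(scale=(1 / n) / eps_each)
           for t in thresholds]
print(answers)
```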


Private Gradient Descent
Given 𝑆 = {(𝑥₁, 𝑦₁), …, (𝑥ₙ, 𝑦ₙ)}, minimize 𝐿(𝑆, 𝑤) = (1/𝑛) Σᵢ ℓ(𝑥ᵢ, 𝑦ᵢ, 𝑤).
Suppose ‖𝛻𝑤 ℓ(𝑥, 𝑦, 𝑤)‖₁ ≤ 1 for all 𝑥 ∈ ℝᵈ, 𝑦 ∈ {0, 1}, 𝑤 ∈ ℝᵈ.
Gradient Descent:
• 𝑤₁ = 0 ∈ ℝᵈ
• For 𝑡 = 1, …, 𝑇: 𝑤ₜ₊₁ = 𝑤ₜ − 𝛼 𝛻𝑤 𝐿(𝑆, 𝑤ₜ)
• Output 𝑤_{𝑇+1}.
(Figure: the loss 𝐿(𝑆, 𝑤) as a function of 𝑤.)

Let 𝑓𝑤(𝑆) = 𝑤 − 𝛼 𝛻𝐿(𝑆, 𝑤) be the one-step update.


Question: What is Δ𝑓𝑤? Let 𝑆 and 𝑆′ differ only on the 𝑗-th point.
‖𝑓𝑤(𝑆) − 𝑓𝑤(𝑆′)‖₁ = (𝛼/𝑛) ‖Σᵢ (𝛻ℓ(𝑥ᵢ, 𝑦ᵢ, 𝑤) − 𝛻ℓ(𝑥ᵢ′, 𝑦ᵢ′, 𝑤))‖₁
                = (𝛼/𝑛) ‖𝛻ℓ(𝑥ⱼ, 𝑦ⱼ, 𝑤) − 𝛻ℓ(𝑥ⱼ′, 𝑦ⱼ′, 𝑤)‖₁ ≤ 2𝛼/𝑛
Private Gradient Descent
Given 𝑆 = {(𝑥₁, 𝑦₁), …, (𝑥ₙ, 𝑦ₙ)}, minimize 𝐿(𝑆, 𝑤) = (1/𝑛) Σᵢ ℓ(𝑥ᵢ, 𝑦ᵢ, 𝑤).
Suppose ‖𝛻𝑤 ℓ(𝑥, 𝑦, 𝑤)‖₁ ≤ 1 for all 𝑥 ∈ ℝᵈ, 𝑦 ∈ {0, 1}, 𝑤 ∈ ℝᵈ.
Gradient Descent (noisy version):
• 𝑤₁ = 0 ∈ ℝᵈ
• For 𝑡 = 1, …, 𝑇: 𝑤ₜ₊₁ = 𝑤ₜ − 𝛼 𝛻𝑤 𝐿(𝑆, 𝑤ₜ) + 𝑍ₜ₊₁
• Output 𝑤_{𝑇+1}.

Let 𝑓𝑤(𝑆) = 𝑤 − 𝛼 𝛻𝐿(𝑆, 𝑤) be the one-step update.



Question: What is Δ𝑓𝑤? Δ𝑓𝑤 = 2𝛼/𝑛.
So we can set 𝑤ₜ₊₁ = 𝑤ₜ − 𝛼 𝛻𝐿(𝑆, 𝑤ₜ) + 𝑍ₜ₊₁, with each coordinate of 𝑍ₜ₊₁ ∼ Lap(2𝛼/(𝑛𝜖′)).
Each round is 𝜖′-DP. How do we set 𝜖′ to get 𝜖-DP overall? 𝜖′ = 𝜖/𝑇.
Noise magnitude: ‖𝑍ₜ₊₁‖∞ ≈ 2𝛼𝑇/(𝑛𝜖) → 0 as the dataset grows.
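Putting the pieces together, here is a hedged Python sketch of the whole algorithm. The gradient-clipping step (to enforce the ‖𝛻ℓ‖₁ ≤ 1 assumption), the squared-loss example, and all constants are my own choices, not from the slides:

```python
import numpy as np

def private_gradient_descent(X, y, per_example_grad, T=50, alpha=0.1,
                             epsilon=1.0, seed=0):
    """Noisy gradient descent: each round is (epsilon/T)-DP, so the whole
    run is epsilon-DP by basic composition.  The one-step sensitivity is
    2*alpha/n once per-example gradients are clipped to L1 norm <= 1."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    scale = (2 * alpha / n) / (epsilon / T)   # Lap scale = 2*alpha*T/(n*eps)
    w = np.zeros(d)
    for _ in range(T):
        G = per_example_grad(X, y, w)              # shape (n, d)
        norms = np.maximum(np.abs(G).sum(axis=1), 1.0)
        G = G / norms[:, None]                     # enforce ||grad||_1 <= 1
        w = w - alpha * G.mean(axis=0) + rng.laplace(scale=scale, size=d)
    return w

# Example: squared loss; the per-example gradient is (x.w - y) * x.
def sq_grad(X, y, w):
    return (X @ w - y)[:, None] * X

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 3))
y = X @ np.array([0.5, -0.2, 0.1])
print(private_gradient_descent(X, y, sq_grad))
```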
The Exponential Mechanism & Database Sanitization
The Exponential Mechanism
Goal: Choose the “best” item from a finite set 𝑌 of items.
Data-dependent utility 𝑢 𝑆, 𝑦 = “utility of 𝑦 for dataset 𝑆”
Find 𝑦 ∈ 𝑌 that maximizes 𝑢(𝑆, 𝑦).
The Laplace Mechanism doesn’t work here. E.g., 𝑌 = {cat, dog, parrot, …}.
How do I add noise to “cat”?

Def: The sensitivity of 𝑢 is Δ𝑢 = max |𝑢(𝑆, 𝑦) − 𝑢(𝑆′, 𝑦)| over neighboring 𝑆, 𝑆′ and all 𝑦 ∈ 𝑌.

Def: The Exponential Mechanism outputs 𝑦 with probability proportional to exp(𝜖 · 𝑢(𝑆, 𝑦) / (2Δ𝑢)).
(Figure: the output probability grows exponentially with utility.)
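A minimal Python sketch of the Exponential Mechanism; the pet-counting utility is an invented example (it has sensitivity 1, since one person changes one count by at most 1):

```python
import numpy as np

def exponential_mechanism(S, Y, u, sensitivity, epsilon, seed=0):
    """Sample y from Y with probability proportional to
    exp(epsilon * u(S, y) / (2 * sensitivity))."""
    rng = np.random.default_rng(seed)
    scores = np.array([u(S, y) for y in Y])
    # Subtracting the max before exponentiating is purely for numerical
    # stability; it cancels in the normalization.
    weights = np.exp(epsilon * (scores - scores.max()) / (2 * sensitivity))
    return Y[rng.choice(len(Y), p=weights / weights.sum())]

# Example: privately pick the most common pet.
S = ["cat", "dog", "cat", "parrot", "cat", "dog"]
Y = ["cat", "dog", "parrot"]
u = lambda data, y: data.count(y)   # count-based utility, sensitivity 1
print(exponential_mechanism(S, Y, u, sensitivity=1.0, epsilon=1.0))
```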
The Exponential Mechanism Summary
Goal: Choose the “best” item from a finite set 𝑌 of items.
Data-dependent utility 𝑢 𝑆, 𝑦 = “utility of 𝑦 for dataset 𝑆”
Find 𝑦 ∈ 𝑌 that maximizes 𝑢(𝑆, 𝑦).
Def: The sensitivity of 𝑢 is Δ𝑢 = max |𝑢(𝑆, 𝑦) − 𝑢(𝑆′, 𝑦)| over neighboring 𝑆, 𝑆′ and all 𝑦 ∈ 𝑌.

Def: The Exponential Mechanism outputs 𝑦 with probability proportional to exp(𝜖 · 𝑢(𝑆, 𝑦) / (2Δ𝑢)).

Privacy: The exponential mechanism preserves 𝜖-DP.
Idea: Moving from 𝑆 to 𝑆′ changes the un-normalized probabilities by at most a factor of 𝑒^(±𝜖/2). We also have to bound the change in the normalizing constant, which uses the other 𝑒^(±𝜖/2).
Utility: Let ŷ be the (random) output of the Exp. Mech. and let 𝑦∗ be the best item.
W.p. ≥ 1 − 𝛿, we have 𝑢(𝑆, ŷ) ≥ 𝑢(𝑆, 𝑦∗) − (2Δ𝑢/𝜖) · log(|𝑌|/𝛿).
Idea: The mechanism is exponentially more likely to output good 𝑦’s. It can only fail if there are many bad 𝑦’s.
Database Sanitization

Given a dataset 𝑆, can we produce a synthetic dataset Ŝ while preserving 𝜖-differential privacy, so that Ŝ behaves basically the same as 𝑆 (for our purposes)?
Database Sanitization
Let 𝑆 ⊂ {0, 1}ᵈ be a dataset of 𝑑-dimensional binary vectors.
Let 𝐻 ⊆ {ℎ : {0, 1}ᵈ → {0, 1}} be a concept class of VC-dimension 𝐷.
We write ℎ(𝑆) = (1/|𝑆|) Σ_{𝑥∈𝑆} ℎ(𝑥) for the fraction of 𝑆 with ℎ(𝑥) = 1.

Theorem: For any 𝜖, 𝛼, 𝛿 > 0, if |𝑆| ≥ Õ(𝑑𝐷/(𝛼³𝜖)), then we can output a synthetic dataset Ŝ while preserving 𝜖-DP such that w.p. ≥ 1 − 𝛿, every ℎ ∈ 𝐻 satisfies |ℎ(𝑆) − ℎ(Ŝ)| ≤ 𝛼.
Proof idea:
• There exists a dataset S̄ of size Õ(𝐷/𝛼²) such that for all ℎ ∈ 𝐻, |ℎ(𝑆) − ℎ(S̄)| ≤ 𝛼/2.
• There are only 2^Õ(𝑑𝐷/𝛼²) databases of this size.
• Use the Exp. Mech. to pick one maximizing 𝑢(𝑆, 𝑆′) = −max_{ℎ∈𝐻} |ℎ(𝑆) − ℎ(𝑆′)|.
• The best candidate has 𝑢(𝑆, 𝑆′) ≥ −𝛼/2, and the output of the Exp. Mech. is at most Õ(𝑑𝐷/(|𝑆| 𝛼² 𝜖)) worse w.h.p.
• The condition on |𝑆| makes the total error ≤ 𝛼.
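A toy sketch of this construction (entirely illustrative: tiny dimension, 𝐻 taken to be the 𝑑 coordinate functions ℎⱼ(𝑥) = 𝑥ⱼ, and brute-force enumeration of candidate synthetic datasets, which is only feasible at this scale):

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
d, m = 2, 8                                # feature dimension, synthetic size
S = rng.integers(0, 2, size=(200, d))      # the real dataset
points = list(itertools.product([0, 1], repeat=d))
candidates = list(itertools.combinations_with_replacement(points, m))

def h(dataset):
    # Vector of h_j(dataset): the fraction of rows with coordinate j = 1.
    return np.mean(np.array(dataset), axis=0)

def u(S, S_hat):
    # u(S, S') = -max_h |h(S) - h(S')|, as in the proof sketch.
    return -np.max(np.abs(h(S) - h(S_hat)))

# One person changes each fraction h_j(S) by at most 1/|S|.
epsilon, sensitivity = 1.0, 1 / len(S)
scores = np.array([u(S, c) for c in candidates])
weights = np.exp(epsilon * (scores - scores.max()) / (2 * sensitivity))
S_hat = candidates[rng.choice(len(candidates), p=weights / weights.sum())]
print("synthetic dataset:", S_hat)
print("max error over H:", -u(S, S_hat))
```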
Summary

• Differential privacy lets us learn about a population as a whole, but not about specific individuals.

• Two building blocks: Laplace & Exponential Mechanisms


• Basic Composition Theorem: “the 𝜖s add up”.

• Applications to:
• Privately computing averages.
• Private gradient descent.
• Database sanitization.