Thesis Defense
Amit Manjhi
School of Computer Science
Carnegie Mellon
March 4, 2008
Thesis committee:
Bruce Maggs (co-chair)
Todd Mowry (co-chair)
Chris Olston (co-chair)
Mahadev Satyanarayanan
Mike Franklin (UC Berkeley)
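Typical Architecture of Dynamic Web Applications
[Figure: users send a request over the Internet to the home server, which consists of a web server, an app server that executes code, and a database server that the code accesses; the response returns to the users.]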
[Figure: users reach the home server over the Internet through CDN nodes.]
2. Shared infrastructure, charged on a usage basis
CDN Application Services
[Figure: users, the Internet, and CDN nodes.]
A distributed architecture still has the database as a bottleneck
[Figure: users, home server, and database.]
Methods to Scale the Database Component
Our Approach
Database Scalability Service Architecture
[Figure: requests and responses between users and the service.]
Outline
Guaranteeing Security in a DBSS Setting
Goal: limit DBSS from observing an application’s data
DBSS caches query results, which are kept consistent by invalidation
[Figure: the DBSS alongside the Content Delivery Network.]
A Simple Example
Table: comments (id, rating, story)
[Figure: a query Q and an update U pass through a DBSS node, whose cache starts empty, to the home server database. The result of Q contains the rows with id 11 and id 15, both for story Intel. Case 1, nothing is encrypted: no invalidations. Case 2, results are encrypted: the DBSS invalidates Q.]
More encryption can lead to more invalidations
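The effect can be sketched in a few lines of Python. This is a minimal illustration, not the DBSS implementation: the `QueryCache` class, its method names, and the rule "an update to a row affects a cached result only if that row is in the result" are assumptions chosen for the example (they hold for an update that changes `rating` when the query does not select on `rating`). When a cached result is encrypted, the cache cannot inspect it and has to invalidate conservatively.

```python
from dataclasses import dataclass, field


@dataclass
class CachedResult:
    rows: list        # list of row dicts; treated as opaque when encrypted
    encrypted: bool


@dataclass
class QueryCache:
    """Hypothetical query-result cache kept consistent by invalidation."""
    entries: dict = field(default_factory=dict)

    def put(self, key, rows, encrypted):
        self.entries[key] = CachedResult(rows, encrypted)

    def on_update(self, updated_id):
        """Invalidate entries an update to row `updated_id` may affect.

        Returns the keys that were invalidated.
        """
        invalidated = []
        for key, entry in list(self.entries.items()):
            if entry.encrypted:
                # The result cannot be inspected: invalidate conservatively.
                affected = True
            else:
                # The result is visible: invalidate only if the row is cached.
                affected = any(r["id"] == updated_id for r in entry.rows)
            if affected:
                del self.entries[key]
                invalidated.append(key)
        return invalidated


rows = [
    {"id": 11, "rating": 1, "story": "Intel"},
    {"id": 15, "rating": 1, "story": "Intel"},
]
plain = QueryCache()
plain.put("Q", rows, encrypted=False)
secure = QueryCache()
secure.put("Q", rows, encrypted=True)
```

With these two caches, an update to a row that is not in the result (say id 20, a made-up value) leaves the unencrypted entry in place but removes the encrypted one.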
Security-Scalability Space for Query Result Caching
[Figure: security (x-axis) versus scalability (y-axis); not to scale, just for illustration. "No encryption" gives no security and full scalability. "Encrypt everything" gives maximum security and read-only scalability. The two points illustrate the security-scalability tradeoff.]
Outline
Key Insight: Arbitrary Queries and
Updates Not Possible
function get_toy_id($toy_name) {
    // Query template: the "?" placeholder is filled in at run time.
    $template = "SELECT toy_id FROM toys WHERE toy_name=?";
    $query = attach_to_template($template, $toy_name);
    $result = execute($query);
    …
}
Important contribution:
Given templates, an algorithm for statically identifying data that does not help in invalidation
Examples of Data Not Useful for Invalidation
Example 1:
SELECT toy_id FROM toys WHERE toy_name=?
SELECT toy_name FROM toys WHERE toy_id=?
Example 2:
SELECT toy_id FROM toys WHERE toy_name=?
Security without Hurting Scalability
As a result, the tradeoff has to be managed only over the remaining data
Security-Scalability Space for Query Result Caching
[Figure: security versus scalability; not to scale, just for illustration. Points shown: "No encryption" (no security, full scalability); "Encrypt data not useful for invalidation" [Manjhi+ SIGMOD 06]; SCSA; "Encrypt everything" (maximum security, read-only scalability). Annotation: want solutions in this space.]
Outline
Invalidation Clues: Motivation
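How do invalidation clues work? [Manjhi+ ICDE 07]
[Figure: queries and updates flow from the DBSS to the home server and its database. Each query result returns with a query clue and each update arrives with an update clue. The DBSS cache starts empty, and invalidations are decided from (query clue, update clue) pairs.]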
Home servers attach query clues to query results and update clues
to updates. DBSS uses query and update clues for invalidation.
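A minimal Python sketch of this flow, under stated assumptions: the slides do not say what a clue contains, so the clue contents here (the set of row ids in a result, and the id of the updated row) are hypothetical, as are the class and method names. The cached result itself stays an opaque encrypted blob; the DBSS node looks only at the clues.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryClue:
    row_ids: frozenset   # assumed clue content: ids of rows in the result


@dataclass(frozen=True)
class UpdateClue:
    row_id: int          # assumed clue content: id of the updated row


class DBSSNode:
    """Hypothetical cache node that sees clues but not plaintext results."""

    def __init__(self):
        self._cache = {}  # query key -> (encrypted result bytes, QueryClue)

    def store(self, key, encrypted_result, clue):
        self._cache[key] = (encrypted_result, clue)

    def lookup(self, key):
        entry = self._cache.get(key)
        return entry[0] if entry else None

    def apply_update(self, clue):
        """Invalidate cached results whose query clue matches the update clue."""
        stale = [key for key, (_, query_clue) in self._cache.items()
                 if clue.row_id in query_clue.row_ids]
        for key in stale:
            del self._cache[key]
        return stale


node = DBSSNode()
node.store("q1", b"<ciphertext 1>", QueryClue(frozenset({11, 15})))
node.store("q2", b"<ciphertext 2>", QueryClue(frozenset({7})))
stale_keys = node.apply_update(UpdateClue(row_id=15))
```

In this sketch only `q1` is dropped, so the clue lets the node avoid invalidating `q2` even though both results are encrypted. How much a clue reveals is exactly the security side of the tradeoff.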
Security-Scalability Space for Query Result Caching
[Figure: security versus scalability; not to scale, just for illustration. Points shown: "No encryption"; "Encrypt data not useful for invalidation" [Manjhi+ SIGMOD 06] (code-analysis security, maximum scalability); a point labeled "Database"; SCSA; "Encrypt everything". Annotation: want solutions in this space.]
Minimizing Invalidations in the
Clues Framework
What is the "most precise" invalidation that can be done?
It may need more data than what passes through the DBSS.
SELECT id FROM comments WHERE story=? AND rating>?
UPDATE comments SET rating=? WHERE id=?
Invalidation logic on an update with id ‘5’:
Is comment id ‘5’ present in the result?
Yes: invalidation decision is based on rating values
No: Based on rating values, need to know story
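The case analysis above can be written out as a small Python function. This is a sketch for these two templates only; the function name, its parameters, and the convention that `upd_story=None` means "the story of the updated comment is not known" are assumptions made for the illustration.

```python
from typing import Optional


def must_invalidate(result_ids, q_story, q_min_rating,
                    upd_id, new_rating,
                    upd_story: Optional[str] = None) -> bool:
    """Decide whether a cached result of
        SELECT id FROM comments WHERE story=? AND rating>?
    must be invalidated by
        UPDATE comments SET rating=? WHERE id=?
    """
    if upd_id in result_ids:
        # The comment is in the result, so its story matches and its old
        # rating passed the filter. It drops out only if the new rating fails.
        return new_rating <= q_min_rating
    if new_rating <= q_min_rating:
        # Not in the result, and the new rating still fails the filter.
        return False
    # The new rating passes the filter: the comment enters the result only
    # if its story matches, which the query and update alone do not reveal.
    if upd_story is None:
        return True  # story unknown: invalidate conservatively
    return upd_story == q_story


cached_ids = {3, 5, 8}  # made-up result for story 'Intel', rating > 2
```

The last branch is where extra data pays off: if the story of the updated comment is supplied, the function can answer precisely instead of invalidating conservatively.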
Database Inspection Strategy and Beyond
OR
Update Clue: send story of the comment On-the-fly
[Figure: evaluation setup on Emulab machines: 5 ms between the users and the CDN and DBSS; 100 ms between the CDN and DBSS and the home server.]
Scalability Benefits of Clues
[Figure: bar chart of scalability (number of concurrent users supported, axis from 0 to 900) for the Auction, Bboard, and Bookstore benchmark applications under four configurations: No DBSS; Clues (excl. DB clues); Clues (incl. DB clues); Hybrid.]
1. Factor of 2-5 improvement over using no DBSS
2. Using more clues is not necessarily a win
Related Work: View Invalidation
Our work:
• compares view-invalidation strategies
• studies database update clues formally
Related Work: Privacy
Managing the Security-Scalability Tradeoff: Contributions
Outline
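Contributors to User Latency
[Figure: in the traditional architecture, the request and the response between the user and the web server, app server, and database are high latency. In the DBSS architecture, the high-latency path is the one to the database, so a single HTTP request can lead to multiple database requests over that path.]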
Sample Web Application Code
function find_comments($user_id) {
    $template = "SELECT from_id, body FROM comments WHERE to_id=?";
    $query = attach_to_template($template, $user_id);
    $result = execute($query);
    // One extra lookup per row: get_name() resolves each commenter's id.
    foreach ($result as $row) {
        echo get_body($row), get_name(get_id($row));
    }
}
The MERGING Transformation
[Figure: a user at www.ebay.com asks for the names of users who have posted comments about John. Through the Content Delivery Network and the Database Scalability Service, the application issues 1 query to find the user_ids who have made comments, and then N queries, one per user_id, to find the name of each user. The queries incur high latency.]
The MERGING Transformation
Find names of users who have commented about John
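The idea can be sketched with Python's built-in `sqlite3` module. This is an illustration of merging a 1 + N query pattern into one join, not the transformation as implemented in the thesis: the table contents, the function names, and the use of SQLite are assumptions, while the `comments` and `users` columns follow the templates shown on the slides.

```python
import sqlite3


def setup():
    """Create an in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE comments (from_id INTEGER, to_id INTEGER, body TEXT);
        INSERT INTO users VALUES (1, 'John'), (2, 'Alice'), (3, 'Bob');
        INSERT INTO comments VALUES (2, 1, 'Great seller'),
                                    (3, 1, 'Fast shipping');
    """)
    return conn


def comments_unmerged(conn, user_id):
    """Original pattern: 1 query for the comments, then 1 query per comment."""
    queries = 1
    rows = conn.execute(
        "SELECT from_id, body FROM comments WHERE to_id=? ORDER BY from_id",
        (user_id,)).fetchall()
    out = []
    for from_id, body in rows:
        name = conn.execute(
            "SELECT name FROM users WHERE id=?", (from_id,)).fetchone()[0]
        queries += 1
        out.append((body, name))
    return out, queries


def comments_merged(conn, user_id):
    """Merged pattern: a single join returns the same rows in one round trip."""
    rows = conn.execute(
        "SELECT c.body, u.name FROM comments c "
        "JOIN users u ON u.id = c.from_id "
        "WHERE c.to_id=? ORDER BY c.from_id",
        (user_id,)).fetchall()
    return rows, 1


conn = setup()
unmerged_rows, unmerged_queries = comments_unmerged(conn, 1)
merged_rows, merged_queries = comments_merged(conn, 1)
```

Both functions return the same rows, but the merged version issues one database request instead of 1 + N, which matters when each request crosses the high-latency path to the home server.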
Home page
Content Delivery Network
1. Greet user
Transformations
Overall latency decreases by 38%; the DBSS-DB latency decreases by 65%
Impact of Latency on Scalability
[Figure: a latency curve plotted against scalability, with a latency threshold marked; annotation: improved scalability.]
Effect of the Transformations on Scalability
[Figure: scalability (number of concurrent users supported).]
Stored procedures
Difficult to optimize and cache
Related Work: NONBLOCKING transformation
Reducing User Latency in a DBSS Setting:
Contributions
Thesis Contributions
Thanks!
Questions?
Backup Slides
Number of requests a website receives
is also unpredictable
Sources:
1. CNN news release, Sept 12, 2001: http://archives.cnn.com/2001/TECH/internet/09/12/attacks.internet/
2. Keynote's news release, Sept 11, 2001: http://www.keynote.com/news_events/releases_2001/091101.html
An appealing solution is to use a CDN
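[Figure: traffic at CNN.com: page views per day (in millions) and page size (in kB).]
[Figure: the home server, consisting of a web server, an app server that executes code, and a DB server that the code accesses; the response returns to the user.]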
Dynamic content sites are becoming increasingly popular
Trusting the Site of Code Execution
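A Simple Example
Table: toys (toy_id, toy_name)
[Figure: a query Q1 (toy_id=15) and an update U1 pass through the DBSS, whose cache starts empty, to the home server database, which holds the rows (11, Barbie) and (15, GI Joe). Nothing is encrypted: no invalidations.]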
Security-Scalability tradeoff
[Figure: scalability (number of concurrent users supported, axis from 0 to 900) versus security (number of query templates with encrypted results, axis from 0 to 30), running from "Nothing encrypted" to "Everything encrypted".]
Data Sensitivity
Completely insensitive: bestsellers list. Don't care.
Moderately sensitive: inventory records, customer records. Care, but worried about scalability impact.
Extremely sensitive: credit card information. Secure at all costs.
[Figure: evaluation setup for the BOOKSTORE application: 5 ms between the users and the CDN and DBSS; 100 ms between the CDN and DBSS and the home server.]
Scalability Conscious Security Approach (SCSA) for Managing the Tradeoff
[Figure: scalability (number of concurrent users supported, axis from 0 to 900) versus security (number of query templates with encrypted results, axis from 0 to 30), showing "Nothing encrypted", "Everything encrypted", and SCSA.]
Benchmark Applications
Security Results
[Table: security results. The extracted values (4, 6, 17, 7, 7, 7 and 18, 12, 14) lost their row and column labels.]
Security Results in Detail
Scalability Conscious Security Approach:
Contributions
Identify security-scalability tradeoff
Evaluation
Blanket encryption hurts scalability
Most data encrypted for free is moderately sensitive
Invalidation Clues: Motivation
Augmented example template:
SELECT toy_id, price FROM toys WHERE toy_name='GI Joe'
(the value 'GI Joe' is the template parameter)
DELETE FROM toys WHERE toy_id=5
Previous solution:
1. Coarse-grained: either encrypt the query result or not
2. Not possible to get the best scalability
3. No general framework for studying the tradeoff
4. Did not consider specific attack models from DBSS
Invalidation Clues [ICDE 2007]
Illustrative Example of Clues
QT: SELECT item_id, category, end_date FROM items WHERE seller = ?
UT: UPDATE items SET end_date = ? WHERE item_id = 7
(in the example, the end_date parameter is 20080304)
Affects interactivity in a DBSS setting
MERGING Transformation
Names of users who have posted comments about John
comments (from_id,to_id,…), users (id,name)
Example for NONBLOCKING Transformation
Scalability Effects of Increasing
Home Server Bandwidth
[Figure: scalability (number of concurrent users supported).]
Coverage of the MERGING Transformation
Coverage of the NONBLOCKING Transformation
Impact of the MERGING Transformation on
Latency