Ingo Weber, Hiroshi Wada, Alan Fekete, Anna Liu and Len Bass
Based on a presentation at USENIX HotDep '12
NICTA in Brief
Australia's National Centre of Excellence in Information and Communication Technology
Five research labs:
ATP: Australian Technology Park, Sydney
NRL: UNSW, Sydney
CRL: ANU, Canberra
VRL: Uni. Melbourne
QRL: Uni. Queensland and QUT
700 staff, including 270 PhD students
Budget: ~$90M/yr from federal and state governments and industry
~600 research papers/year, ~150 patents total
our spin-out
Issues we face
High cost of writing unit tests
Preparing a test bed, resetting it after each test, and recovering from errors
Why is there no DBUnit for the cloud?
Our Goal
Provide an undo button to cloud users
Allow rolling back to a previous state
e.g., undelete deleted resources and reconstruct the relations among resources
Less invasive
Minimum changes in existing code or scripts
Status quo
[Diagram: Administrator/script -> API calls -> Cloud resources]
Goal
[Diagram: Administrator/script -> API calls -> Cloud resources, with checkpoint, commit, and rollback added]
Provide the ability to go back to a checkpoint
No change in scripts except checkpoint and commit
NICTA Copyright 2012 From imagination to impact 7
Our approach
Undoing operations one by one in reverse order does not always work
No undo action is available
e.g., a deleted resource cannot be undeleted
Not optimal
e.g., Bolts operations (creating many temporary resources)
API Wrapper
[Diagram: the administrator/script issues checkpoint, commit, and rollback through the API wrapper, which applies changes to the cloud resources]
Execute API calls immediately if they are undoable
Defer the execution of non-undoable calls until commit
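The execute-or-defer rule above can be sketched in a few lines of Python. This is only an illustrative sketch, not the prototype's implementation: the class `UndoWrapper`, its method names, and the `undoable` flag are hypothetical, and cloud API calls are stood in for by plain Python callables.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class UndoWrapper:
    """Hypothetical wrapper: run undoable calls now, defer the rest."""
    # Names of undoable calls executed since the last checkpoint.
    log: List[str] = field(default_factory=list)
    # Non-undoable calls waiting for commit.
    deferred: List[Callable[[], None]] = field(default_factory=list)

    def checkpoint(self) -> None:
        """Start a new unit of work with an empty log and queue."""
        self.log.clear()
        self.deferred.clear()

    def call(self, name: str, action: Callable[[], None], undoable: bool) -> None:
        if undoable:
            action()                 # safe to run now: it can be compensated
            self.log.append(name)
        else:
            self.deferred.append(action)  # hold back until commit

    def commit(self) -> None:
        """Run the deferred (non-undoable) calls, then reset."""
        for action in self.deferred:
            action()
        self.checkpoint()

    def rollback(self) -> List[str]:
        """Drop deferred calls; return executed calls that need compensation."""
        self.deferred.clear()
        executed, self.log = self.log, []
        return executed
```

In this sketch, `rollback()` only reports which executed calls still need to be compensated; producing the compensating actions is a separate step.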
API Wrapper
[Diagram: on rollback, the API wrapper invokes the AI planner, which generates a compensation plan; a code generator turns the plan into a compensation script, which the wrapper executes to apply changes]
Changes made by (semi-)undoable API calls are compensated by an AI planner
The AI planner also finds ways to handle errors that may occur during undo
AI Planning 101
Given the initial state of the world, a goal state, and a set of available actions, find a sequence of actions that leads from the initial state to the goal state
http://en.wikipedia.org/wiki/Tower_of_Hanoi
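The planning problem can be made concrete with the Tower of Hanoi puzzle. The sketch below is a minimal illustration using plain breadth-first search, which is a stand-in chosen for brevity and not the heuristic search that FF performs; the state encoding and the function names `successors` and `plan` are assumptions made for this example.

```python
from collections import deque
from typing import Dict, Iterator, List, Optional, Tuple

# One tuple of disk sizes per peg; the top disk is the last element.
State = Tuple[Tuple[int, ...], ...]
Move = Tuple[int, int]  # (source peg index, destination peg index)


def successors(state: State) -> Iterator[Tuple[Move, State]]:
    """Yield every legal move and the state it leads to."""
    for src, src_peg in enumerate(state):
        if not src_peg:
            continue
        disk = src_peg[-1]
        for dst, dst_peg in enumerate(state):
            # A disk may only go onto an empty peg or a larger disk.
            if src != dst and (not dst_peg or dst_peg[-1] > disk):
                pegs = list(state)
                pegs[src] = src_peg[:-1]
                pegs[dst] = dst_peg + (disk,)
                yield (src, dst), tuple(pegs)


def plan(initial: State, goal: State) -> Optional[List[Move]]:
    """Breadth-first search for a shortest action sequence, or None."""
    frontier = deque([initial])
    parent: Dict[State, Optional[Tuple[State, Move]]] = {initial: None}
    while frontier:
        state = frontier.popleft()
        if state == goal:
            moves: List[Move] = []
            while parent[state] is not None:
                state, move = parent[state]  # step back toward the start
                moves.append(move)
            return moves[::-1]
        for move, nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = (state, move)
                frontier.append(nxt)
    return None
```

Calling `plan(((3, 2, 1), (), ()), ((), (), (3, 2, 1)))` returns a list of peg-to-peg moves that transfers all three disks to the last peg.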
We use FF [*] with an extension to handle actions with alternative outcomes
Finds maximal contingency plans
e.g., if detaching a volume fails, stop the attached instance if possible
If the planner cannot find a solution, ask for human intervention
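A contingency plan can be pictured as a tree of steps in which each step names what to do next on success and on failure. The following is a hedged sketch of executing such a plan, not the planner extension itself: the `Step` structure, the `execute` function, and the toy `detach` and `stop` actions are all hypothetical and only mimic the detach-volume example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Step:
    """Hypothetical plan node with one branch per outcome."""
    name: str
    action: Callable[[], bool]            # returns True on success
    on_success: Optional["Step"] = None   # None: plan is finished
    on_failure: Optional["Step"] = None   # None: no contingency available


def execute(step: Optional[Step], trace: List[Tuple[str, bool]]) -> bool:
    """Walk the plan; return False when a human has to take over."""
    while step is not None:
        ok = step.action()
        trace.append((step.name, ok))
        if ok:
            step = step.on_success
        elif step.on_failure is not None:
            step = step.on_failure
        else:
            return False  # failure without a contingency branch
    return True


# Toy world (assumption): a volume cannot be detached while the instance runs.
world: Dict[str, bool] = {"running": True, "attached": True}


def detach() -> bool:
    if world["running"]:
        return False
    world["attached"] = False
    return True


def stop() -> bool:
    world["running"] = False
    return True


retry = Step("detach-volume (retry)", detach)
contingency_plan = Step(
    "detach-volume", detach,
    on_failure=Step("stop-instance", stop, on_success=retry),
)
trace: List[Tuple[str, bool]] = []
finished = execute(contingency_plan, trace)
```

Here the first detach fails, the plan falls back to stopping the instance, and the retried detach succeeds, so `finished` is `True`. A failing step with no `on_failure` branch makes `execute` return `False`, which corresponds to asking for human intervention.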
[*] Hoffmann, J., and Nebel, B. The FF planning system: Fast plan generation through heuristic search. Journal of Artificial Intelligence Research, 14 (2001).
Evaluation
Scalability of the planner based on an internally released prototype
AWS command-line tool replacement
A plan length of 20 is the maximum we need in our problem
Executing a plan with 10 steps takes ~145 sec
[Figure: planner scalability; x-axis: plan length (0 to 70)]
Basis: most difficult problem from the previous slide
The planner's cost is small unless there are 1000s of resources
Future work
Extending checkpoints to capture internal resource state
Parallelizing plans
Finding forward plans with constraints
Questions?