Cloud Computing - A Practical View

Mandeep Dhami

http://geekandpoke.typepad.com/geekandpoke/2009/03/let-the-clouds-make-your-life-easier.html

Overview
• The Context
– A specific project scenario
• Why Cloud Computing?
– Economic drivers
– Flexibility and agility
– New capabilities
• Why not Cloud Computing?
– Regulatory constraints
– Operational concerns
– Technical issues
• And the Practical “Middle Way”!
– Services evaluated
– Proposed engagement

The Context
• Cloud computing can mean different things to different people
• In this talk we evaluate the trade-offs in the context of the following hypothetical scenario:
– You work on a Medicare/Medicaid eligibility system
– Field workers use a web-based tool to input case details and to check status
– The web server is implemented using Java/WebSphere on a Windows Server
– The backend eligibility sub-system is implemented using COBOL on an IBM mainframe
– You are tasked with evaluating a cloud-based solution for the web tool
http://www.nature.com/ki/journal/v62/n5/fig_tab/4493262f1.html

Many Layers of the Cloud

Some Initial Design Constraints
• Type of cloud service required - IaaS or Private Cloud
– Since it is a custom software application, SaaS is not an option
– Since the platform is also very custom (for libraries and versions) and has some non-standard libraries (say WebSphere v6.5, DB2 v9.1, JCA for CICS, etc.), PaaS is not an option either
– IaaS might be feasible, as we own the software stack in that model
– Private cloud can always be used, as we will own the cloud in that model!
• Type of connectivity required – “VPN to VM”
– We will need a secure, encrypted connection to the backend system for the web application to get/update case status. Conceptually this is like a VPN from the VM to the backend
– Any IaaS solution that does not provide a secure connection from the server VM to the internal LAN cannot be used
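The elimination reasoning on this slide can be written down as a small filter. This is only a sketch in Python: the `Requirements` fields and the `feasible_models` function are hypothetical names invented for illustration, and the rules simply mirror the bullets above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Requirements:
    # All field names are illustrative assumptions, not part of any real tool.
    custom_application: bool    # we wrote the application ourselves
    custom_platform: bool       # pinned or non-standard platform libraries
    needs_vpn_to_backend: bool  # the server VM must reach the internal LAN securely


def feasible_models(req: Requirements, iaas_offers_vpn: bool) -> List[str]:
    """Return the service models that survive the design constraints."""
    models = []
    if not req.custom_application:
        models.append("SaaS")    # SaaS only fits off-the-shelf software
    if not req.custom_platform:
        models.append("PaaS")    # PaaS only fits a standard platform stack
    if not req.needs_vpn_to_backend or iaas_offers_vpn:
        models.append("IaaS")    # we own the stack, but connectivity must be secure
    models.append("Private cloud")  # always possible, since we own the cloud
    return models
```

For the scenario in this talk (custom application, custom platform, VPN required), only IaaS and private cloud remain, and IaaS drops out for any provider that cannot offer the secure connection.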

Why Cloud Computing?
To cloud or not to cloud, that is the question …

http://geekandpoke.typepad.com/geekandpoke/2009/11/simply-explained-project-risk-update.html

Economic Drivers
• Pay as you go
– No upfront cost to acquire server/network hardware
– Only pay for dev and test systems during dev and test phases
– No upfront cost to “try” new features like Web Firewalls
• Lower support costs
– The team does not have to manage hardware, network or storage for the production system
– No need to hire expensive consultants for non-core (infrastructure related) activities
• Deterministic Project Costing
– More transparency regarding infrastructure costs
– Less risk from last-minute capital cost requests related to production usage
– Not encumbered by internal transfer accounting!
• Lower hardware costs
– Typical server utilization is low, pay only for what you use
– Typical network utilization is low (routers, firewall, etc.), pay only for what you use
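The “pay only for what you use” argument comes down to simple arithmetic: an owned server costs the same whether it is busy or idle, so low utilization inflates the cost of every useful hour. The sketch below, in Python, makes that explicit. The function names are hypothetical, and any prices you feed in are your own assumptions, not vendor quotes.

```python
def cost_per_useful_hour(owned_hourly_cost: float, utilization: float) -> float:
    """Effective cost of one busy hour on owned hardware.

    owned_hourly_cost is the amortized all-in cost per wall-clock hour;
    utilization is the fraction of hours the server does useful work.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return owned_hourly_cost / utilization


def breakeven_utilization(owned_hourly_cost: float, cloud_hourly_price: float) -> float:
    """Utilization above which owning becomes cheaper than renting by the hour.

    A result above 1.0 means renting is cheaper at any utilization.
    """
    if cloud_hourly_price <= 0.0:
        raise ValueError("cloud_hourly_price must be positive")
    return owned_hourly_cost / cloud_hourly_price
```

As an illustration with made-up numbers: if an owned server costs 0.10 per hour amortized and the comparable cloud instance rents for 0.40 per hour, owning only wins once the server is busy more than about a quarter of the time.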

Flexibility and Agility
• Rapid Scaling
– Start small, scale as required based on production performance measurements
– Respond faster to customer demand for capacity
– Respond faster to features that require more compute/storage resources
• Dynamic Provisioning
– Spin up more test-beds as required. Keep test execution moving even as developers are debugging on an existing setup
– Spin up systems to do load testing as required. Pay only for the time used to do the tests
• Dynamic Infrastructure
– Enable infrastructure changes with mouse clicks
– Increase server pool for batch processing as required, to meet any batch window (at some cost)
– Developers can prototype “at production scale” and capacity
• More Choice
– Change infrastructure vendors for better SLA or price without impacting/altering the application
– Do a “Beta test” for a few case workers on a small system, roll out new code incrementally
– Roll back to a previous image, as a fallback option
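One way to run the “Beta test for a few case workers” idea is to assign each worker to a stable bucket and send a configurable percentage of buckets to the new system. The Python sketch below shows one possible approach, not a prescribed one; `in_beta` and `route` are hypothetical names, and the URLs passed in are whatever your two deployments happen to be.

```python
import hashlib


def in_beta(worker_id: str, beta_percent: int) -> bool:
    """Deterministically decide whether a case worker is in the beta group.

    The same worker always lands in the same bucket (0-99), so raising
    beta_percent only adds workers and never reshuffles existing ones.
    """
    digest = hashlib.sha256(worker_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < beta_percent


def route(worker_id: str, beta_percent: int, stable_url: str, beta_url: str) -> str:
    """Pick the deployment that should serve this worker."""
    return beta_url if in_beta(worker_id, beta_percent) else stable_url
```

Setting `beta_percent` to 0 acts as the rollback: every worker is routed to the stable system again without touching the application.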

New Capabilities
• Next Gen architectures
– Enable disaster recovery by using a service provider with multiple physical locations
– “Try” new features like memcached, CDNs, etc. without new investment in hardware or infrastructure expertise
• Accelerate innovation
– Shift from supporting the infrastructure to innovating on the application
– Use cost transparency to innovate processes and reduce waste
• Advanced infrastructure capabilities
– Change management for server configuration is centrally managed and encapsulated
– Self-healing, hot backups, etc. available
– APIs available to the infrastructure for flow-through automation
• Green computing
– Increase server utilization, reduce power usage
– Use more efficient cooling, reduce power usage
– Reduce number of servers and reduce waste

Why Not Cloud Computing?
There be dragons …

First, you sometimes hear some FUD …
“We will have no liability to you for any unauthorized access or use, corruption, deletion, destruction or loss of Your Content or Applications”
Customer Agreement, Amazon Web Services

“Salesforce.com shall not be responsible or liable for the deletion, correction, destruction, damage, loss or failure to store any customer data”
Master Subscription Agreement, Salesforce.com

… but this is not really very different from a software EULA
(So we believe that you can safely ignore this issue, except during contract negotiation)

But there are Real Regulatory Constraints
• Privacy
– Since this project handles medical data, HIPAA rules apply
– If your cloud infrastructure cannot be HIPAA compliant, you cannot use it
• Forensics and audit
– If your cloud APIs cannot be audited for forensic investigation, you cannot use them for sensitive data
– If audit data is not cryptographically secure, it lacks adequate controls
• Governance mandate
– Just because the application is on the cloud, the governance mandates do not go away!
– Can you produce reports on usage or controls that are comparable to a system with physical security?
• PKI infrastructure
– How are private keys stored and managed by the cloud based VMs?
– Can you meet FIPS requirements that you currently meet with hardware/physical security constraints?
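To make “cryptographically secure audit data” from the forensics bullet concrete, one common technique is a keyed hash chain: each audit record carries an HMAC computed over the previous record's HMAC plus its own content, so editing or removing an earlier record breaks every later one. The Python sketch below uses only the standard library. It illustrates the idea and is not a description of what any cloud provider implements; `append_entry` and `verify` are hypothetical names, and key storage is left out entirely.

```python
import hashlib
import hmac
import json
from typing import Dict, List


def _mac(key: bytes, prev_mac: str, event: Dict) -> str:
    # Canonical JSON so the same event always serializes identically.
    payload = json.dumps(event, sort_keys=True)
    return hmac.new(key, (prev_mac + payload).encode("utf-8"), hashlib.sha256).hexdigest()


def append_entry(log: List[Dict], key: bytes, event: Dict) -> None:
    """Append an event, chaining its MAC to the previous entry's MAC."""
    prev_mac = log[-1]["mac"] if log else ""
    log.append({"event": event, "mac": _mac(key, prev_mac, event)})


def verify(log: List[Dict], key: bytes) -> bool:
    """Recompute the chain and report whether every entry is intact."""
    prev_mac = ""
    for entry in log:
        expected = _mac(key, prev_mac, entry["event"])
        if not hmac.compare_digest(expected, entry["mac"]):
            return False
        prev_mac = entry["mac"]
    return True
```

Note the limits of this sketch: it detects edits and deletions in the middle of the log, but not truncation of the most recent entries, and it is only as strong as the protection of the key, which leads straight back to the PKI questions above.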

And Real Operational Concerns
• The Blame game
– When there is a problem today, it is already painful to get from defect to defect ownership … When a problem occurs in the cloud, how do you get from the “conf-call from hell” discussing the defect to productive “root cause analysis” and taking defect ownership?
• Priority management
– When you have a customer situation, your “tech team” works on it as #1 priority till it is resolved … How do you set priority for the cloud vendor’s tech team to fix your specific problem among their priorities?
• SLA “assurance”
– Can you measure service levels in terms of the metrics used in the SLA in the contract?
– Do you get reports on “real SLA” or on a synthetic benchmark?
– Do you get “continuous reporting” of metrics that you can use for trend analysis and planning?
• Vendor lock-in
– How real is the promise of choice?
– To resolve the technical or operational issues, are you tying into a proprietary API that limits any real choice?
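The SLA “assurance” questions above are easier to ask if you measure the service yourself in the same units as the contract. The Python sketch below assumes the SLA is stated as an availability percentage over a reporting period and that you collect your own up/down probe results. The function names are hypothetical, and real contracts often define “downtime” more narrowly than a failed probe.

```python
from typing import List


def measured_availability(probe_results: List[bool]) -> float:
    """Fraction of probes that succeeded (True means the service answered)."""
    if not probe_results:
        raise ValueError("no probe results to evaluate")
    return sum(probe_results) / len(probe_results)


def allowed_downtime_minutes(sla_percent: float, period_minutes: int) -> float:
    """Downtime budget implied by an availability SLA over a period."""
    return period_minutes * (1.0 - sla_percent / 100.0)


def meets_sla(probe_results: List[bool], sla_percent: float) -> bool:
    """Compare your own measurement against the contracted percentage."""
    return measured_availability(probe_results) * 100.0 >= sla_percent
```

For scale, a 99.9% SLA over a 30-day month (43,200 minutes) leaves a budget of roughly 43 minutes of downtime. Keeping the raw probe results also gives you the continuous data needed for trend analysis.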

And Very Real Technical Issues
• Visibility
– Clear system boundary with adequate instrumentation
– Tools to view infrastructure usage by your application
• Security
– Encrypted VPN from “Server VM to the Backend network”
– SSO integration for admin/API usage
– “Safe sharing” of shared resources (like network, swap, crash dump, etc.)
• Diagnostics
– On-demand capture of data, traffic and performance statistics
– Flow-through integration with automation/tools
– Automated data capture (black box) before the VM image is lost
• Network Services
– No good model for application level network services (like firewall, load balancer, etc.)
– We can use x86 VMs as virtual appliances, but they lack the hardware acceleration of typical network devices

The Practical “Middle Way”
In Buddhism, the “Middle Way” is the Nirvana-bound path of moderation - away from the extremes of sensual indulgence and self-mortification and toward the practice of wisdom, morality and mental cultivation.
From http://en.wikipedia.org/wiki/Middle_way

From http://dilbert.com/strips/comic/2009-11-18

… No I really did not mean that!

Cloud Services Evaluation for This Specific Project
NOTE: This is a sample evaluation. Your results will differ based on the assumptions that you make about the project and about the services themselves

Service Provider (Product), rated on Regulatory Constraints, Operational Concerns* and Technical Issues:

• Amazon (EC2)
– Solid performer, lots of 3rd party support
• Rackspace (Mosso)
– Solid performer, good enterprise support
• Savvis (Virtualization in the Cloud)
– Closest to a private cloud (VMware), very good enterprise support
• Appnexus (Appnexus Cloud)
– Not clear how it will handle issues specific to government or HIPAA compliance

* Assuming appropriate relationship and contract/penalties

Engagement Proposed for This Specific Project
• First qualify the service provider’s offering for regulatory issues
– HIPAA
– PCI (if you accept credit cards for fees)
– FIPS (for PKI)
– Etc.
• Then qualify your relationship with the service provider so that you can handle operational issues around the “blame game”, priority management, etc.
• Then qualify the network, the virtual servers, and the storage for security, visibility, manageability, diagnostics, etc. In particular, qualify the secure VPN to your virtual servers (like Amazon’s VPC)
• Finally, move development and test of the next major upgrade to the cloud service provider. Do a beta roll-out first, and then scale incrementally as you build confidence
• With dev & test success behind you, use it as a model to transition the production servers (for the web application) to the cloud
• Always build up incrementally, based on the success of the previous step!
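The engagement above is essentially an ordered series of gates, where each step starts only after the previous one has succeeded. A minimal Python sketch of that rule follows. The stage labels paraphrase the steps above, and `next_stage` is a hypothetical helper, not part of any project tooling.

```python
from typing import Optional, Set

# Ordered gates, paraphrasing the proposed engagement.
STAGES = [
    "regulatory qualification",
    "provider relationship qualification",
    "technical qualification",
    "dev and test in the cloud",
    "beta roll-out",
    "production transition",
]


def next_stage(completed: Set[str]) -> Optional[str]:
    """Return the earliest stage not yet completed, or None when all are done.

    A later stage never becomes "next" while an earlier one is still open,
    which enforces the incremental build-up.
    """
    for stage in STAGES:
        if stage not in completed:
            return stage
    return None
```

Even if someone has already experimented with a beta roll-out, the helper still points back to the earliest open gate, so regulatory qualification cannot be skipped.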