Version 1.6
TECHNICAL WHITE PAPER
Table of Contents

List of Figures  6
List of Tables  7
1. What is a VMware vCloud?  8
  1.1 Document Purpose and Assumptions  8
  1.2 Cloud Computing and vCloud Introduction  9
  1.3 vCloud Components  9
  1.4 vCloud Infrastructure  11
    1.4.1 vCloud Management Cluster  12
    1.4.2 Compute Resources  12
    1.4.3 Storage Resources  13
    1.4.4 Networking Resources  13
    1.4.5 Component Placement  13
    1.4.6 vCloud Consumer Resources  14
    1.4.7 vCloud Logical Infrastructure  14
2. vCloud Director Constructs  15
3. vCloud Consumer Resources  16
  3.1 Cloud Consumer Resources  16
  3.2 Establish Provider Virtual Datacenters  17
    3.2.1 Public Cloud Considerations  18
    3.2.2 Private Cloud Considerations  18
    3.2.3 Provider Virtual Datacenter Special Use Cases  18
    3.2.4 Compute Resources Considerations  19
    3.2.5 Storage Resources Considerations  20
    3.2.6 Networking Resources Considerations  21
  3.3 Multi-Site/Multi-Geo Clouds  21
    3.3.1 Scenario #1: Common User Interface  21
    3.3.2 Scenario #2: Common Set of Services  22
    3.3.3 Suggested Deployment  22
    3.3.4 Other Multi-Site Considerations  23
    3.3.5 Merging Chargeback Reports  23
    3.3.6 Synchronizing Catalogs  23
4. Providing Cloud Resources  24
  4.1 Establish Organizations  24
    4.1.1 Administrative Organization  24
    4.1.2 Standard Organizations  24
  4.2 Establish Networking Options  24
    4.2.1 External Networks  25
    4.2.2 Network Pools  25
    4.2.3 Cisco Nexus 1000V Considerations  26
  4.3 Establish Networking Options: Public vCloud Example  26
    4.3.1 External Networks  26
    4.3.2 Network Pools  27
    4.3.3 Organization Networks  27
    4.3.4 Cisco Nexus 1000V Considerations  28
  4.4 Establish Networking Options: Private vCloud Example  30
    4.4.1 External Networks  30
    4.4.2 Network Pools  30
    4.4.3 Organization Networks  31
    4.4.4 Cisco Nexus 1000V Considerations  31
  4.5 Establish Organization Virtual Datacenters  31
    4.5.1 Public vCloud Considerations  32
    4.5.2 Private vCloud Considerations  32
  4.6 Create vApp Templates and Media Catalogs  33
    4.6.1 Auto-Joining Active Directory Domains  33
    4.6.2 Establish Policies  33
    4.6.3 Accessing Your vCloud  34
    4.6.4 Deploy vApps  34
    4.6.5 Employ Chargeback or Showback  34
5. Extending vCloud Capabilities  34
  5.1 Core vCloud Components  34
  5.2 vCloud Request Manager  34
  5.3 vCloud API  36
  5.4 vCenter Orchestrator  36
    5.4.1 Cloud Administration Orchestration Examples  37
    5.4.2 Organization Administration Orchestration Examples  37
    5.4.3 Cloud Consumer Operation Orchestration Examples  37
  5.5 vCloud Connector  38
    5.5.1 vCloud Connector Placement  38
    5.5.2 vCloud Connector Example Usage Scenarios  39
    5.5.3 vCloud Connector Limitations  39
6. Managing the vCloud  40
  6.1 Monitoring  40
    6.1.1 Management Cluster  40
    6.1.2 Cloud Consumer Resources and Workloads  40
  6.2 Logging  40
    6.2.1 Logging Architectural Considerations  41
    6.2.2 Logging as a Service  42
  6.3 End-to-End Security Considerations with vCloud  43
    6.3.1 vCloud Environment Security  43
    6.3.2 User Access Security  43
    6.3.3 Securing Workloads at the Network Level: Workload Security  43
  6.4 Workload Availability Considerations  44
    6.4.1 Uptime SLAs at 99.99%  44
    6.4.2 Load Balancing of vCloud Director Cells  45
    6.4.3 I/O Considerations  46
    6.4.4 Disaster Recovery  46
    6.4.5 Backup and Restore of vApps  46
7. Sizing the vCloud  48
  7.1 Initial Sizing of Cloud Consumer Resources  48
  7.2 Capacity Management  50
8. Implementing Your vCloud  51
9. Appendix: vCloud Director Cell Monitoring  52
10. Appendix: vCloud Availability Considerations  58
11. Appendix: Security Considerations  61
  11.1 Network Access Security  61
  11.2 Compliance  61
  11.3 Use Cases: Why Logs Should Be Available  63
    11.3.1 Example Compliance Use Cases for Logs  64
    11.3.2 VMware vCloud Log Sources for Compliance  65
  11.4 vCloud Director Diagnostic and Audit Logs  67
  11.5 Load Balancer Considerations  68
12. Appendix: Signed Certificates with vCloud Director  70
13. Appendix: Capacity Planning  87
  13.1 Cloud Administrator (Service Provider) Perspective  87
  13.2 Network Capacity Planning  92
14. Appendix: Capacity Management  93
  14.1 vCloud-Specific Capacity Forecasting (Demand Management)  93
  14.2 Capacity Monitoring and Establishing Triggers  93
  14.3 Capacity Management Manual Processes: Provider Virtual Datacenter  94
  14.4 End-Customer (Organization) Administrator Perspective  95
  14.5 Organization Virtual Datacenter-Specific Capacity Forecasting  97
  14.6 Capacity Management Manual Processes: Organization Virtual Datacenter  100
List of Figures

Figure 1. vCloud Overview  9
Figure 2. Core vCloud Logical Architecture  11
Figure 3. vCloud Infrastructure  11
Figure 4. vCloud Logical Architecture  14
Figure 5. vCloud Director Construct to vSphere Mapping  15
Figure 6. vCloud Consumer Resource Mapping  17
Figure 7. Two Sites with Local vCloud Director Instances Managing Local vCenters  21
Figure 8. Remote Console Flow  22
Figure 9. Two Sites with Isolated vCloud Director Instances  23
Figure 10. Example Diagram of Provider Networking for a Public vCloud  27
Figure 11. Configure External IPs  28
Figure 12. vCloud Director Logical Networking with Cisco Nexus 1000V  29
Figure 13. Example Diagram of Provider Networking for a Private vCloud  30
Figure 14. vCloud Connector Architecture  38
Figure 15. Architectural Example Drawing  42
Figure 16. Configure Firewall Services  44
Figure 17. Reference Architecture Kit  51
Figure 18. Log Collection in the Cloud Environment  64
Figure 19. Architecture of vCloud Components and Log Collection  65
Figure 20. Infrastructure Layers  67
List of Tables

Table 1. Reference Documentation  8
Table 2. vCloud Components  10
Table 3. vCloud Director Constructs  15
Table 4. Component Requirements for a Management Cluster  19
Table 5. Network Pool Options  25
Table 6. vCloud vApp Requirements Checklist  47
Table 7. Definition of Resource Pool and Virtual Machine Split  49
Table 8. Memory, CPU, Storage, and Networking  49
Table 9. Example Consolidation Ratios  49
Table 10. MBeans Used to Monitor vCloud Cells  51
Table 11. vCloud Availability Considerations  58
Table 12. Network Access Security Use Cases  61
Table 13. Audit Concerns Within the Cloud  62
Table 14. vCloud Component Logs  66
Table 15. Other Component Logs  66
Table 16. Load Balancer Considerations  68
Table 17. Certificate Steps  70
Table 18. vSphere Host Variables  88
Table 19. Determining Redundancy Overhead  88
Table 20. Network Capacity Planning Items  92
Table 21. Capacity Monitoring Metrics  93
Table 22. Organization Virtual Datacenter Units of Consumption  95
Table 23. Recommended Organization Virtual Datacenter Capacity Thresholds  95
Table 24. Sample Organization Virtual Datacenter Resource Allocation  96
Table 25. Organization Virtual Datacenter Trending Information  97
Table 26. Organization Virtual Datacenter Capacity Trending Variables  98
Table 27. Sample Organization Virtual Datacenter Trending Information  99
Table 1. Reference Documentation

Requirements for a Cloud
- Service Definition for Public Cloud
- Service Definition for Private Cloud

vCloud Implementations
- Service Provider Public vCloud Implementation Example
- Private vCloud Implementation Example

vCloud Director
- vCloud Director Installation and Configuration Guide
- vCloud Director Administrator's Guide
- vCloud Director Security Hardening Guide

vCloud API

vSphere

vShield
- vShield Administration Guide

vCenter Chargeback
- vCenter Chargeback User's Guide
- Using vCenter Chargeback with VMware Cloud Director Technical Note

vCenter Orchestrator
- vCenter Orchestrator Developer's Guide
- VMware vCenter Orchestrator Administration Guide
- vCenter Server 4.1 Plug-In API Reference for vCenter Orchestrator

vCloud Request Manager
- vCloud Request Manager Installation and Configuration Guide
- vCloud Request Manager User's Guide
For further information, refer to the set of documentation for the appropriate product. For additional guidance and best practices, refer to the Knowledge Base on vmware.com.
[Figure: vCloud Director, the vCloud API, VMware vSphere, vShield Edge, vCenter Chargeback, vCenter Orchestrator, vCloud Request Manager, and vCloud Connector]

Figure 1. vCloud Overview
Table 2. vCloud Components

vCloud Director: Cloud coordinator and UI that abstracts vSphere resources. Includes the vCloud Director server(s) (also known as cells), the vCloud Director database, and the vCloud API, which is used to manage cloud objects.

vCloud API: API used to programmatically interact with a vCloud.

VMware vSphere: Underlying foundation of virtualized resources. The vSphere family of products includes vCenter Server and the vCenter Server database, ESXi hosts clustered by vCenter Server, and the Management Assistant.

VMware vShield: Provides network security services. Includes the vShield Manager (VSM) virtual appliance and vShield Edge* virtual appliances, which are automatically deployed by vCloud Director.

*The fully licensed version of vShield Edge includes optional features, such as VPN and load balancing, that are not integrated with vCloud Director.

vCenter Chargeback: Optional component that provides resource metering and reporting to facilitate resource showback/chargeback. Includes the vCenter Chargeback Server, the Chargeback Data Collector, the vCloud Data Collector, and the VSM Data Collector.

vCenter Orchestrator: Optional component that facilitates orchestration at the vCloud API and vSphere levels.

vCloud Request Manager: Optional component that provides provisioning request and approval workflows, software license tracking, and policy-based cloud partitioning.

vCloud Connector: Optional component that facilitates transfer of a powered-off vApp in OVF format from a local vCloud or vSphere environment to a remote vCloud.
Other VMware or third-party products or solutions are not addressed in this iteration of a vCloud.
From an architectural view, the following diagram shows how the core vCloud components interrelate.
[Figure: end users reach vCloud Director through its web console and the vCloud API; vCloud Director, backed by its database, drives vCenter Server, vShield, and the vCenter Chargeback Server with its data collectors; ESX/ESXi hosts running the vCloud agent, together with their datastores, host the virtual machines]

Figure 2. Core vCloud Logical Architecture

[Figure: the management cluster alongside the compute, storage, and networking resources of the virtual infrastructure]
Figure 3. vCloud Infrastructure
In building a vCloud, we assume that all management components, such as vCenter Server and vCenter Chargeback Server, run in virtual machines. As a best practice, to separate resources allocated for management functions from user-requested workloads, the underlying vSphere clusters are split into two logical groups:
- A single management cluster running all core components and services needed to run the cloud.
- The remaining vCenter clusters, aggregated into a pool called cloud consumer resources. These clusters are placed under the control of VMware vCloud Director. Multiple clusters can be managed by the same vCenter Server or by different vCenter Servers; in either case, vCloud Director manages the clusters through the vCenter Servers.
Reasons for organizing and separating vSphere resources include:
- Ensuring that management components are separate from the resources they manage.
- Minimizing overhead for cloud consumer resources. Resources allocated for cloud use have little overhead reserved; for example, cloud resource groups do not host vCenter virtual machines.
- Dedicating resources to the cloud, so that resources can be consistently and transparently managed, carved up, and scaled horizontally.
The underlying vSphere infrastructure will follow vSphere best practices. Design considerations specific to a vCloud are addressed in this document, organized by the vCloud management cluster and the cloud consumer resources.

1.4.1 vCloud Management Cluster
The management cluster will follow vSphere best practices to facilitate load balancing, redundancy, and high availability.

1.4.2 Compute Resources

Compute resources for the management cluster will follow vSphere best practices where possible, including but not limited to VMware DRS, HA, and FT. To facilitate VMware HA, a cluster of three VMware ESXi hosts will be used. While additional hosts can be added, three hosts supporting just the vCloud management components should be sufficient for typical vCloud environments. Detailed sizing guidance for the management cluster is provided later in this document.

Use a VMware HA percentage-based admission control policy in an N+1 fashion instead of dedicating a single host to host failures or defining the number of host failures a cluster can tolerate. This allows the management workloads to run evenly across the hosts in the cluster without dedicating a host strictly to host-failure situations. Additional hosts can be added to the management cluster for N+2 or greater redundancy, but this is not required by the current vCloud Service Definitions.

Use VMware HA (including VM Monitoring) and/or FT, where possible, to protect the management virtual machines. vCenter Site Recovery Manager (SRM) can be used to protect components of the management cluster. At this time, vCenter Site Recovery Manager will not be used to protect vCloud Director cells, because a secondary (DR) site is out of scope for the vCloud, and changes to IP addresses and schemas in recovered vCloud Director cells can cause problems.
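The percentage-based admission control recommendation above reduces to simple arithmetic: reserving the equivalent of one host in an N-host cluster means reserving 100/N percent of cluster capacity. A minimal sketch of that calculation (the function name and the round-up choice are ours, not from the Service Definition):

```python
import math

def ha_admission_control_percentage(hosts: int,
                                    host_failures_tolerated: int = 1) -> int:
    """Percentage of cluster CPU/memory capacity to reserve so the cluster
    can absorb the given number of host failures (N+1 by default).
    Rounded up to stay conservative."""
    if hosts <= host_failures_tolerated:
        raise ValueError("cluster must have more hosts than failures tolerated")
    return math.ceil(100 * host_failures_tolerated / hosts)

# For the three-host management cluster recommended above, reserve 34%
# (100/3 rounded up); a four-host N+1 cluster would reserve 25%.
```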
Unlike a traditional vSphere environment, where vCenter Server is used by administrators to provision virtual machines, here vCenter Server plays an integral role in end-user self-service provisioning by handling all virtual machine deployment requests from vCloud Director. Therefore, VMware recommends making vCenter Servers highly available with a solution such as vCenter Heartbeat. Because FT is not supported for multi-vCPU virtual machines, this is another reason to use vCenter Heartbeat for high resiliency.

1.4.3 Storage Resources

Shared storage in the management cluster will be configured to include, but is not limited to, the following:
- Storage paths that are redundant at the host (connector), switch, and storage array levels.
- Access for all hosts in a cluster to the same datastores.

1.4.4 Networking Resources

Host networking in the management cluster will be configured to include (but is not limited to) the following:
- Logical separation of network traffic by type (management, virtual machine, vMotion/FT, IP storage) for security and load considerations.
- Network component and path redundancy.
- GigE network speeds at a minimum, or 10GigE if available.
- Use of vNetwork Distributed Switches where possible, to simplify network management. The architecture calls for vNetwork Distributed Switches in the user workload resource group, so it is a best practice to use the vNetwork Distributed Switch across all of your clusters, including the management cluster.
- An MTU size of at least 1524 bytes (the default is 1500) on the physical switches as well as the vNetwork Distributed Switches, to accommodate the additional MAC header information used by vCloud Director Network Isolation links. vCloud Director Network Isolation is called for by the Service Definition and by the architecture described later in this document. The MTU increase is required on the transport network used for vCloud Director Network Isolation. Failure to increase the MTU size can degrade performance, because packet fragmentation reduces the network throughput of virtual machines hosted on the vCloud infrastructure.

1.4.5 Component Placement

Management components running as virtual machines in the management cluster include the following:
- vCenter Server(s) and the vCenter database
- vCloud Director cell(s) and the vCloud Director database
- vCenter Chargeback Server(s)
- vShield Manager (one per vCenter Server)

Note: vShield Edge appliances are deployed automatically by vCloud Director through vShield Manager as needed and reside in the vCloud consumer resource clusters, not in the management cluster. They are placed in a system resource pool by vCloud Director and vCenter. For additional information on the vShield Edge appliance and its functions, refer to the vShield Manager administration guides.

Optional management functions, deployed as virtual machines, include:
- vCenter Update Manager
- vCenter CapacityIQ
- VMware Management Assistant
- vCenter Orchestrator (part of vCenter Server)
- vCloud Request Manager and its associated database

The optional management virtual machines are not required by the Service Definition, but they are highly recommended to increase the operational efficiency of the solution. Database components, if running on the same platform, can be placed on the same database server. For example, the databases used by vCloud Director, vCenter Chargeback, and vCloud Request Manager can be consolidated on the same database server.
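The 1524-byte transport MTU in the networking considerations above follows from the 24 bytes of additional MAC header that vCloud Director Network Isolation adds to a standard 1500-byte guest frame. A small sketch of the check (the constant and function names are illustrative):

```python
STANDARD_MTU = 1500   # default guest and switch MTU, in bytes
VCDNI_OVERHEAD = 24   # extra MAC header bytes added by vCloud Director
                      # Network Isolation encapsulation

def transport_mtu_ok(physical_mtu: int,
                     guest_mtu: int = STANDARD_MTU,
                     overhead: int = VCDNI_OVERHEAD) -> bool:
    """True if the transport network MTU can carry an encapsulated
    guest frame without fragmentation."""
    return physical_mtu >= guest_mtu + overhead

# A transport network left at the 1500-byte default fails the check,
# which is exactly the fragmentation risk described above.
```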
For more information on the resources needed by the virtual machines in the management cluster, refer to the Sizing section later in this document.

1.4.6 vCloud Consumer Resources
The cloud consumer resources represent the vCenter clusters that host cloud workloads. These resources will be carved up by vCloud Director. We'll cover vCloud Director cloud constructs and definitions in the next section, before drilling down into the compute, storage, and networking resources.

1.4.7 vCloud Logical Infrastructure

In summary, the vCloud logical architecture, with vSphere resource separation, is depicted as follows.
[Figure: the management cluster hosts the vCloud infrastructure virtual machines (vCenter Servers and the vCenter database, vCloud Director cells and the vCloud Director database, vCenter Chargeback Servers, and one vShield Manager per vCenter Server) plus the optional management functions deployed as virtual machines (vCenter Update Manager, vCenter CapacityIQ, VMware Management Assistant, vCenter Orchestrator, and vCloud Request Manager), and runs no user workloads; the cloud consumer resources hold the space allocated to user workloads, small-footprint vCloud infrastructure virtual machines, and vShield Edge virtual appliances]

Figure 4. vCloud Logical Architecture
The management cluster may also include virtual machines or have access to servers that provide infrastructure services such as directory (LDAP/AD), timekeeping (NTP), networking (DNS, DHCP), and security (certificate). Detailed considerations for sizing are addressed in the Sizing section.
The management cluster resides in a single physical site. The vCloud consumer resources also reside within that same physical site, ensuring a consistent level of service; otherwise, latency issues might arise if workloads need to move from one site to another over a slower or less reliable network. For definition purposes, this cloud is defined in the context of a single physical site and does not span multiple sites. Considerations for connecting clouds that represent different sites are addressed later in this document, and secondary DR sites are discussed in the Disaster Recovery section.
[Figure: user clouds, with their catalogs, provisioning policies, and organization virtual datacenters, map onto vSphere; vApp networks and organization networks connect to external networks]

Figure 5. vCloud Director Construct to vSphere Mapping
Provider Virtual Datacenter: A logical grouping of vSphere compute resources (an attached vSphere cluster and one or more datastores) for the purpose of providing cloud resources to consumers.

Organization: A unit of administration that represents a logical collection of users, groups, and computing resources. It also serves as a security boundary: only users of a particular organization can deploy workloads into the cloud and have visibility into those workloads. In the simplest terms, an organization is an association of related end consumers.
Organization Virtual Datacenter: A subset allocation of a provider virtual datacenter's resources assigned to an organization, backed by a vCenter resource pool automatically created by vCloud Director. An organization virtual datacenter allocates resources using one of three models: pay-as-you-go, reservation, or allocation.

Catalog: A collection of available services for consumption. Catalogs contain vApp templates (preconfigured containers of one or more virtual machines) and/or media (ISO images of operating systems).

External Network: A network that connects to the outside using an existing vSphere network port group.

Organization Network: A network visible within an organization. It can be an external organization network with connectivity to an external network, using a direct or routed connection, or an internal network visible only to vApps within the organization.

vApp Network: A network visible within a vApp. It can be connected to other vApp networks within an organization using a direct or routed connection, or it can be an internal network visible only to virtual machines within the vApp.

Network Pool: A set of preallocated networks that vCloud Director can draw upon as needed to create private networks and NAT-routed networks.
The cloud consumer resources are dedicated vCenter clusters that host cloud workloads. vCloud Director carves these resources into one or more provider virtual datacenters, each consisting of a vCenter cluster and one or more attached datastores. Networking for the resource group encompasses the vSphere networks visible to the hosts in that cluster. Provider virtual datacenters are further carved into organization virtual datacenters, which are backed by vCenter resource pools.
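The hierarchy described above (cluster plus datastores backing a provider vDC, which is carved into resource-pool-backed organization vDCs) can be sketched as a simple data model. The class and field names below are illustrative only and are not part of any vCloud Director API.

```python
from dataclasses import dataclass, field


@dataclass
class OrgVDC:
    """Organization virtual datacenter, backed by a vCenter resource pool."""
    name: str
    allocation_model: str  # "pay-as-you-go", "allocation", or "reservation"


@dataclass
class ProviderVDC:
    """Provider virtual datacenter: one vCenter cluster plus attached datastores."""
    cluster: str
    datastores: list[str]
    org_vdcs: list[OrgVDC] = field(default_factory=list)

    def carve(self, name: str, model: str) -> OrgVDC:
        # Each organization vDC is a sub-allocation of this provider vDC.
        vdc = OrgVDC(name, model)
        self.org_vdcs.append(vdc)
        return vdc


# Illustrative names only.
pvdc = ProviderVDC(cluster="Cluster-A", datastores=["DS-01", "DS-02"])
pvdc.carve("ACME-Committed", "allocation")
```

The one-to-many relationship mirrors the text: a provider vDC owns its backing cluster and datastores, while each organization vDC holds only its allocation model.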
[Figure: a host cluster plus datastores maps to a provider virtual datacenter; a resource pool maps to an organization virtual datacenter]
Figure 6. vCloud Consumer Resource Mapping
Refer to the Service Definition for guidance on the size of vSphere clusters and datastores to attach when creating a provider virtual datacenter. Consider:
- The expected number of virtual machines
- The size of the virtual machines (CPU, RAM, disk)

3.2.1 Public Cloud Considerations
Considerations for a public vCloud include creating multiple provider virtual datacenters based on the tiers of service to be provided. Because provider virtual datacenters contain only CPU, memory, and storage resources, and those are common across all of the requirements in the Service Definition for Public Cloud, create one large provider virtual datacenter attached to a vSphere cluster with sufficient capacity to run 1,500 virtual machines. Leave overhead to grow the cluster with more resources, up to the maximum of 32 hosts, should organizations need to grow in the future. If your hosts do not have sufficient capacity to run the maximum number of virtual machines called out by the Service Definition for Public Cloud, you will need additional provider virtual datacenters.

3.2.2 Private Cloud Considerations
Because a provider virtual datacenter represents a vSphere cluster, a single provider virtual datacenter is commonly established. Since provider virtual datacenters contain only CPU, memory, and storage resources, and those are common across all of the requirements in the Service Definition for Private Cloud, create one large provider virtual datacenter attached to a cluster with sufficient capacity to run 400 virtual machines. Refer to the Service Definition for Private Cloud for details on the service tier(s) called for.
If existing host capacity cannot meet the requirement, or there is a desire to segment capacity by equipment type (for example, different CPU types in different provider virtual datacenters), establish one provider virtual datacenter for Pay-As-You-Go use cases and a separate provider virtual datacenter for the resource-reserved use cases.

3.2.3 Provider Virtual Datacenter Special Use Cases
There are instances where a provider virtual datacenter must be viewed as special purpose in one way or another; such datacenters are a good example of what makes cloud computing so flexible and powerful. The primary driver behind this need is satisfying the license restrictions imposed by software vendors who stipulate that all processors that could run specific software must be licensed for it, regardless of whether they actually run that software. To meet such a vendor's EULA requirements, you can establish a purpose-specific provider virtual datacenter, populated with enough processor sockets to meet the need but no more than necessary, to keep licensing costs down. An example of this in practice is an Oracle-only provider virtual datacenter. Because a provider virtual datacenter is backed by a cluster, you can designate a cluster for a special purpose and publish it to any clouds that need the service. You then maintain enough paid licenses to cover all the sockets in that cluster, and you are covered under the EULA, because the guests can run only on the sockets in that cluster/resource pool. Some user education is needed to verify that all Oracle instances are deployed to the special-purpose virtual datacenter; vCloud Director does not provide a way to prevent someone from incorrectly deploying virtual machines. Enforcement therefore has to be manual, typically through organizational processes. In the following example, you could give the virtual datacenter a descriptive name so that all Oracle instances are deployed there, or insert instructions in the vApp name so that users deploy to the correct virtual datacenter:
Oracle Database Use Only! PvDC
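As a rough sanity check on the cluster sizing targets discussed above (1,500 VMs for the public cloud, 400 for the private cloud), the arithmetic can be sketched as below. The per-host consolidation ratio and HA spare count are assumed inputs for illustration, not figures from the Service Definition.

```python
import math


def hosts_needed(total_vms: int, vms_per_host: int, ha_spare: int = 1) -> int:
    """Hosts required for a provider vDC, plus spare capacity for HA failover."""
    return math.ceil(total_vms / vms_per_host) + ha_spare


MAX_CLUSTER_HOSTS = 32  # vSphere cluster maximum cited in the text

# Assumed consolidation ratio of 60 VMs per host (illustrative only).
needed = hosts_needed(total_vms=1500, vms_per_host=60)
fits_in_one_cluster = needed <= MAX_CLUSTER_HOSTS  # if False, split the load
                                                   # across multiple provider vDCs
```

If the result exceeds the 32-host maximum, that is the signal, per the text, to create additional provider virtual datacenters.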
All hosts will be configured per vSphere best practices, similar to the management cluster. VMware HA will also be used to protect against host and virtual machine failures. Provider vDCs can differ in compute capacity (number of hosts, number of cores, host performance) to support differentiation of compute resources by capacity or performance for service-level tiering. Organization vDCs, in turn, should be created based on the services planned. For a detailed look at how to size the vCloud, refer to the Sizing section later in this document.

The following table lists the requirements for each of the components that will run in the vCloud Director management cluster. For the number of virtual machines and organizations listed in the Service Definitions, you will not need to scale far beyond the provided numbers.
Table 4. Component Requirements for a Management Cluster

Item                                   vCPU    Memory
vCenter Server                         2       8 GB
Oracle Database                        4       16 GB
vCloud Director x2 (stats for each)    2       4 GB
vCenter Chargeback                     2       8 GB
vShield Manager                        1       4 GB
TOTAL                                  11      40 GB

* Numbers rounded up or down will not impact overall sizing.
In the table above, the Oracle database is shared between the vCenter Server, the vCloud Director cells, and the vCenter Chargeback server. Per VMware best practices, use a different user and instance for each database instance. In addition to the storage requirements above, an NFS volume must be mounted and shared by each vCloud Director cell to facilitate uploading of vApps from cloud consumers. The size needed for this volume varies with how many concurrent uploads are in progress; once an upload completes, the vApp is moved to permanent storage on the datastores backing the organization's catalogs, and the data no longer resides on the NFS volume. The recommended starting size for the NFS transfer volume is 250 GB. Monitor this volume and increase its size if you experience more concurrent or larger uploads in your environment.
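A starting-point heuristic for the transfer volume could look like the sketch below. The 250 GB floor comes from the recommendation above; treating each in-flight upload as consuming roughly one vApp's worth of space is an assumption made for illustration.

```python
def transfer_volume_gb(concurrent_uploads: int, avg_vapp_gb: float,
                       floor_gb: float = 250) -> float:
    """NFS transfer volume sizing: enough space for all in-flight uploads,
    but never below the recommended 250 GB starting size."""
    return max(floor_gb, concurrent_uploads * avg_vapp_gb)


# Illustrative inputs: five concurrent uploads of ~40 GB vApps stays at the
# 250 GB floor; ten concurrent uploads would call for a larger volume.
starting_size = transfer_volume_gb(concurrent_uploads=5, avg_vapp_gb=40)
```

In practice, monitoring actual usage (as the text advises) should override any such estimate.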
Shared storage in the consumer resources will be configured per vSphere best practices, similar to the management cluster. Storage types supported by vSphere will be used. The use of RDMs in the vCloud infrastructure is currently not supported and should be avoided.

Creation of datastores must take into consideration Service Definition requirements and workload use cases, which affect the number and size of datastores to be created. vCloud Director assigns datastores for use through provider virtual datacenters, and only existing vSphere datastores can be assigned. Datastores attached to provider virtual datacenters are used for vCloud workloads, known as vApps. vSphere best practices apply for datastore sizing in terms of number and size. Vary datastore size or shared storage characteristics if providing differentiated or tiered levels of service.

Sizing considerations include the following.

Datastore storage expectations:
- Size a datastore sufficiently to allow placement of multiple vApps; avoid creating small datastores that can house only one or two vApps. A few large datastores are preferred over many small datastores, especially since consumers cannot choose which datastore their workload is placed on when selecting a virtual datacenter with more than one datastore; vCloud Director chooses the datastore with the most free space available.
- What is the average vApp size, multiplied by the number of vApps, plus spare capacity? For example: average virtual machine size * number of virtual machines * (1 + % headroom)
- What is the average virtual machine disk size?
- How many virtual machines are in a vApp?
- How many virtual machines are expected?
- How much spare capacity do you want to allocate as room for growth (expressed as a percentage)?
- Will expected workloads be transient or static?

Datastore performance characteristics:
- Will expected workloads be disk intensive?
- What are the performance characteristics of the associated cluster?
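The capacity formula above (average virtual machine size * number of virtual machines * (1 + % headroom)) can be sketched as follows; the input values are invented for illustration and do not come from the Service Definition.

```python
def datastore_capacity_gb(avg_vm_size_gb: float, vm_count: int,
                          headroom_pct: float) -> float:
    """Total capacity to provision, per the formula in the text:
    average VM size * number of VMs * (1 + % headroom)."""
    return avg_vm_size_gb * vm_count * (1 + headroom_pct / 100)


# Illustrative inputs only: 60 GB average VMs, 200 VMs, 25% growth headroom.
total_gb = datastore_capacity_gb(avg_vm_size_gb=60, vm_count=200,
                                 headroom_pct=25)  # 15,000 GB total
```

The resulting figure would then be divided across a few large datastores, per the guidance above, rather than many small ones.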
Refer to the requirements in the Service Definition and size your datastores accordingly. Additionally, an NFS share must be set up and made visible to all hosts for use by vCloud Director for transferring files in a multi-cell environment; NFS is the required protocol for the transfer volume. Refer to the vCloud Director Installation and Configuration Guide for more information on where to mount this volume. See the Workload Availability section for additional storage and storage I/O factors to take into account.
Host networking for hosts within a provider vDC will be configured per vSphere best practices, in the same manner as the vCloud management cluster. In addition, the number of vNetwork Distributed Switch ports per host should be increased from the default value of 128 to the maximum of 4096. Increasing the ports allows vCloud Director to dynamically create port groups as necessary for the private organization networks described later in this document. Refer to the vSphere Administrator's Guide for more information on increasing this value. Networking at the provider and organization virtual datacenter level is detailed in the next section on providing cloud resources.
[Figure: Site 1 and Site 2, each with vCloud Director cells and a local vCenter Server managing local hosts and virtual machines]
Figure 7. Two Sites with Local vCloud Director Instances Managing Local vCenters
The local vCenter Servers control resources local to each site. This appears to be a logical setup of the infrastructure until you examine some of the user flows. A user entering site #1 and requesting remote console access to a virtual machine in site #1 is not guaranteed to have all traffic stay in site #1, because you cannot control which vCloud Director cell acts as the proxy for which vCenter Server. The user could reach a vCloud Director cell in site #1 that must talk to the proxy for vCenter Server #1, which resides in site #2. That cell would then talk back to the vCenter Server in site #1, which would finish setting up the remote console connection to the local ESXi host running the virtual machine in site #1. Traffic would then flow through the vCloud Director cell in site #1 that initiated the request. This is illustrated below.
[Figure: remote console traffic crossing between Site 1 and Site 2 because the vCloud Director cell proxying a given vCenter Server may reside at the other site]
One problem with this setup is controlling which vCloud Director cell a user terminates on based on virtual machine and site-specific data; it is next to impossible to determine this and provide that logic to a load balancer. Another problem with this scenario is the need for a central Oracle database serving the vCloud Director cells from both sites. This creates even more traffic on the link between the two sites, since the message bus in vCloud Director uses the Oracle database for communication. Overall, this solution is less than optimal and is suggested only for cross-campus multi-site configurations where site-to-site communication will not overwhelm the network and where network availability is highly reliable.

3.3.2 Scenario #2: Common Set of Services
A more pragmatic approach to multi-site setups is to have a single vCloud Director setup in each site, isolated from the other sites. This solves the network cross-talk issue but introduces problems of its own. For example, how do you provide a common set of services across the different sites? How do you keep organization names and rights, as well as catalogs, networks, storage, and other information, consistent across sites? There is no mechanism for this in the currently shipping vCloud Director product. Using other VMware technologies included in the vSphere suite of products, you can synchronize cloud deployments with automation scripts and provide common sets of services across locations. In an enterprise, a private vCloud maps to a single site. Multiple vClouds can be connected using vCloud Connector for offline vApp migrations, and a public vCloud can be connected to form a hybrid cloud.

3.3.3 Suggested Deployment
Multi-site deployments are not officially supported by VMware at this time.
However, if you are still going to create a multi-site deployment, the recommended approach is to set up an isolated vCloud Director instance in each location. Each isolated instance would include local vCloud Director cells, vCenter Servers, an Oracle database instance, a vCenter Chargeback instance, and local vSphere resources, as illustrated below.
[Figure: isolated vCloud Director instances at Site 1 and Site 2, each with its own cells, vCenter Server, and local virtual machines]
To keep the sites synchronized with organization and resource information, VMware encourages you to create a set of onboarding scripts and workflows. These workflows would be used any time you need to create a new organization, or a new resource for an organization, and would drive that creation across all cloud sites. The VMware cloud services organization can assist in creating these customer-specific workflows based on templates the cloud practice already has; these template workflows were created using the vCenter Orchestrator product included with vSphere. By using workflows for administrative resource creation, you can keep multiple clouds synchronized with organization resources.

3.3.4 Other Multi-Site Considerations
When creating multi-site configurations, some resources outside the control of the vCloud setup require thought. How do you set up networking between the sites? How do you handle IP addressing? Because these physical resource decisions vary between customers, this reference architecture does not provide specific guidance on them, and setting up these physical resources is not included in the sample scripts previously mentioned.

3.3.5 Merging Chargeback Reports
The reference multi-site setup includes multiple vCenter Chargeback instances. To provide one common bill or usage report to a cloud consumer, you must aggregate all of the chargeback reports into one. You can leverage the vCenter Chargeback API, as well as vCenter Orchestrator, to pull chargeback reports from each vCenter Chargeback server and consolidate them into one master report.

3.3.6 Synchronizing Catalogs
Synchronizing catalogs between sites is the most time-consuming task. When setting up multiple cloud sites, designate one site as the master site for template creation and make all other sites replication peers.
It is advisable to leverage native storage array replication to replicate the storage for the templates in each catalog. Array replication can provide several benefits for long-distance data movement, including data deduplication and compression. Once the data is synchronized, you can leverage the catalog synchronization workflows provided by the VMware vCloud API to import the replicated templates into the appropriate catalogs in VMware vCloud Director. Synchronizing templates added at remote sites is out of scope for this version of the reference architecture; this capability can be added selectively by engaging VMware Professional Services.
4.2.1 External Networks
An external network provides connectivity outside an organization through an existing, preconfigured vSphere network port group. The vSphere port groups can be created using standard vSwitch port groups, vNetwork Distributed Switch port groups, or the Cisco Nexus 1000V. In a public vCloud, these preconfigured port groups provide access through the Internet to customer networks, typically using VPN or MPLS terminations. When creating an external network, make sure sufficient vSphere port groups are created and made available for virtual machine access in the vCloud.

4.2.2 Network Pools
vCloud Director creates private networks as needed from a pool of networks to facilitate VM-to-VM communication and NAT-routed networks. vCloud Director supports three methods of backing network pools:
- vSphere port group. vCloud Director uses one of many existing, preconfigured vSphere networks. The networks themselves can have VLAN tagging for additional security.
- VLAN. vCloud Director automatically uses VLAN tagging, from a provided range, to segment networks and create internal networks (organization and vApp networks) as needed. This assumes that vCloud Director and all the managed hosts have access to the VLANs on the physical network.
- vCloud Director Network Isolation. vCloud Director automatically creates internal networks using MAC-in-MAC encapsulation.

The following table compares the three options for a network pool.
vSphere port group backed
- How it works: isolated port groups must be created ahead of time and exist on all hosts in the cluster.
- Advantages: the only option compatible with the Cisco Nexus 1000V.
- Disadvantages: requires manual creation and management of port groups; it is possible to use a port group that is in fact not isolated.

VLAN backed
- How it works: uses a range of available, unused VLANs dedicated to the vCloud; vCloud Director creates port groups as needed.
- Advantages: best network performance.

vCloud Director Network Isolation backed
- How it works: an overlay network (with a network ID) is created for each isolated network; optionally requires one VLAN per vCloud Director Network Isolation backed network pool; vCloud Director creates port groups as needed.
- Advantages: more secure than the VLAN-backed option.
Considerations when using a vSphere port group-backed network pool:
- Standard or distributed virtual switches may be used.
- vCloud Director does not automatically create port groups; you must create them ahead of time for vCloud Director to use.

Considerations when using a VLAN-backed network pool:
- Organization and vApp networks created by vCloud Director out of VLAN-backed network pools are private to an organization or vApp, respectively.
- Hosts in the cluster backing the provider vDC used by the organization vDC must be connected to VLAN trunk ports.
- vNetwork Distributed Switches are required for all hosts in the cluster backing the provider vDC used by the organization vDC that draws from the network pool.
- vCloud Director creates port groups automatically as needed.

Considerations when using a vCloud Director Network Isolation-backed network pool:
- vNetwork Distributed Switches are required for all hosts in the cluster backing the provider vDC used by the organization vDC that draws from the network pool.
- Increase the MTU size of the physical switches, as well as the vNetwork Distributed Switches, to at least 1524 to accommodate the additional MAC header information used by vCloud Director Network Isolation. Failure to increase the MTU size can cause packet fragmentation, reducing the network throughput of virtual machines hosted on the vCloud infrastructure.
- Specify a VLAN ID for the MAC-in-MAC transport network (optional, but recommended for security). Leaving this blank defaults to VLAN 0.
- vCloud Director creates port groups automatically on vNetwork Distributed Switches as needed.
- Private networks backed by vCloud Director Network Isolation use fewer VLAN IDs.
- Organization and vApp networks created out of vCloud Director Network Isolation-backed network pools are private to an organization or vApp, respectively.
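The 1524-byte MTU figure above follows directly from the standard 1500-byte Ethernet MTU plus the 24 bytes of MAC-in-MAC encapsulation overhead implied by the text (1524 minus 1500); a trivial arithmetic check:

```python
STANDARD_ETHERNET_MTU = 1500   # default payload size on the physical network
VCDNI_HEADER_BYTES = 24        # MAC-in-MAC overhead implied by 1524 - 1500
required_mtu = STANDARD_ETHERNET_MTU + VCDNI_HEADER_BYTES  # minimum MTU to set
                                                           # on physical and
                                                           # distributed switches
```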
4.2.3 Cisco Nexus 1000V Considerations
In vCloud Director 1.0, the Cisco Nexus 1000V is supported only with the vSphere port group-backed option for network pools, which also happens to be the least flexible. The same is true for the VMware vNetwork Standard Switch, while the vNetwork Distributed Switch supports all network pool types. The Cisco Nexus 1000V requires a vNetwork Distributed Switch and therefore vSphere Enterprise Plus licensing. The Cisco Nexus 1000V is typically deployed in a vSphere environment to provide increased network visibility, common management, and advanced Layer 2 security and quality-of-service functionality for virtual networking. Because vCloud Director ideally uses dynamic, automatically provisioned, isolated networks internally, these management and security requirements do not directly apply to network pools; they are, however, relevant to the external networks where traffic enters and exits the vCloud. The next sections go into detail on networking options for external networks and network pools, and discuss public versus private vCloud perspectives.
of static IP addresses that will be consumed by vShield Edge appliances (which facilitate a routed connection) each time you connect an organization network to this external network. For sizing purposes, create a large enough IP address pool that each of your organizations can have access to an external network. Per the Service Definition, the estimated number of organizations for 1,500 virtual machines is 25, so make sure you have at least 25 IP addresses in your static IP pool. Set aside more IP addresses if you plan to allow inbound access into organizations.

4.3.2 Network Pools
In addition to access to external networks, each organization in a public vCloud will have organization-specific private networks. vCloud Director instantiates isolated Layer 2 networks through the use of network pools. Create a single large network pool for all organizations to share, and limit the use of this network pool when you create each individual organization. The network pool will use vCloud Director Network Isolation to separate traffic, over an existing vNetwork Distributed Switch previously created for connecting hosts. Use a VLAN to further segregate all of the vCloud Director Network Isolation traffic in a transport network from the rest of the infrastructure.

Because the network pools will be used by both the external organization network and private vApp networks, you will need at least 11 networks in the network pool per organization: ten for the private vApp networks, per the Service Definition for a Public Cloud, and one for the protected external organization network. Given the estimate of 25 organizations, you need at least 275 networks in the pool. There is a limit of 4096 networks in a network pool due to the port limitation of the vNetwork Distributed Switch.
Ephemeral ports in a vNetwork Distributed Switch are further limited to 1016 per switch and per vCenter Server, additionally limiting the number of networks that can be instantiated from a network pool. When connecting the network pool to a vNetwork Distributed Switch, make sure enough free ports remain on the switch (at least 275).
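The pool sizing above (ten vApp networks plus one external organization network per organization) works out as follows. The per-organization counts and the 25-organization estimate come from the text, as do the 4096-network and 1016-ephemeral-port ceilings it cites.

```python
def networks_required(orgs: int, vapp_nets_per_org: int = 10,
                      org_nets_per_org: int = 1) -> int:
    """Minimum networks a shared pool must hold for the given organizations."""
    return orgs * (vapp_nets_per_org + org_nets_per_org)


pool_size = networks_required(orgs=25)  # 275, matching the text
assert pool_size <= 4096                # vDS network ceiling per pool
assert pool_size <= 1016                # ephemeral-port limit per switch
```

The same function shows when the limits start to bind: at the cited per-organization counts, the 1016-port ephemeral limit is reached well before the 4096-network ceiling.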
[Figure: in the vCloud Datacenter, organization ACME Corp. draws from a network pool; org network ACME-Private is private internal, and org network ACME-Internet is private routed to the Provider-Internet external network]
4.3.3 Organization Networks
Create two organization networks for each organization: one external organization network and one private internal organization network. You can do this in one step in the vCloud Director UI wizard by selecting the default (recommended) option when creating a new organization network. When naming an organization network, it is a best practice to start with the organization name followed by a hyphen, for example, ACME-Internet.
Per the Service Definition for Public Cloud, the external network will be connected as a routed connection that leverages vShield Edge for firewalling and NAT, keeping traffic separated from other organizations on the same external provider network. Both the external and internal organization networks will draw from the vCloud Director Network Isolation network pool previously established. For both networks, you will need to provide a range of IP addresses and associated network information. Since both networks are private, you can use RFC 1918 addresses for both static IP address pools. The Service Definition for Public Cloud limits external connections to a maximum of 8 IP addresses, so provide a range of only 8 IP addresses when creating the static IP address pool for the external network. For the private network, you can make the static IP address pool as large as desired; typically, a full RFC 1918 class C is used for the private network IP pool.

The last step is to add external public IP addresses to the vShield Edge configuration on the external organization network. By selecting Configure Services on the external organization network, you can add the 8 public IP addresses that can be used by that particular organization. These IP addresses should come from the same subnet as the network assigned to the system's external network static IP pool.
4.3.4 Cisco Nexus 1000V Considerations
It is important to note that vCloud Director is designed for secure multi-tenancy, and Layer 2 networks are not shared between customers for organization and vApp networks. External networks can be dedicated to customers (for example, MPLS VPNs), and vShield Edge is available to securely share networks such as a common Internet VLAN. Consider these capabilities when determining whether the Cisco Nexus 1000V is required in a vCloud Director deployment. Where the Cisco Nexus 1000V has been determined to be a requirement, either operationally or technically, the recommended approach is to use the Nexus 1000V for external networks and a VMware vNetwork Distributed Switch for network pools. This approach provides advanced functionality where required without limiting the flexibility of vCloud Director networking.
Using the Cisco Nexus 1000V for vCloud Director network pools is not recommended because it increases administrative overhead: the Cisco Nexus 1000V works only with vSphere port group-backed network pools and introduces scalability limits. In this model, organization and vApp networks cannot be created dynamically, because port profiles need to be defined on the Cisco Nexus 1000V before being added to vCloud Director. To maintain isolation within these internal networks, each port group must be configured with a VLAN ID. Given the limit of 512 active VLANs across all virtual Ethernet modules (VEMs) managed by a Cisco Nexus 1000V, this also potentially limits the total number of network pools, particularly if one virtual supervisor module (VSM) is managing VEMs in multiple vCloud resource groups; the 802.1Q standard itself is limited to 4096 VLANs.

vCloud Director Network Isolation-backed network pools address these scalability limits by isolating internal vCloud Director networks using MAC-in-MAC encapsulation over a transport network. Instead of individual VLANs, vCloud Director Network Isolation network IDs are dynamically assigned to these encapsulated networks by vCloud Director. VLAN-backed network pools are another option, in which a range of VLAN IDs is allocated to vCloud Director, which then assigns these VLANs to organization and vApp networks as required. While this shares the same scalability limits as the port group-backed option, it reduces manual administrator setup. VLAN and vCloud Director Network Isolation-backed network pools are supported only with the vNetwork Distributed Switch. The following diagram illustrates the recommended deployment model, with the Cisco Nexus 1000V used for external networks only.
[Figure: the Cisco Nexus 1000V carries the external network (VLAN 20) while a VMware vDS hosts organization networks OrgNet1, OrgNet2, and OrgNet3 (VLANs 210, 220, 230) and VCD-NI pools A and B (VLANs 298, 299), each switch uplinked through its own pNICs]
[Figure: in an enterprise vCloud, the organization Software Design draws from a network pool; an optional private internal organization network and a private direct external-access organization network connect to the corporate backbone]
An important differentiation between a private vCloud and a public vCloud is the external network and organization external network. At least one external network is required to enable organization external networks to access resources outside of the vCloud Director resources: the Internet for public cloud deployments, and an internal (local) network for private cloud deployments. It is a network that already exists within the address space used by the enterprise. To establish this network, follow the wizard, filling in the network mask, default gateway, and other specifications of the LAN segment as required. When building it, specify enough address space for use as static assignments, as this is where vCloud Director draws public IP pool addresses from. A good starting range is 30 addresses that do not conflict with existing addresses in use or with ranges already committed for DHCP. Note: static IP pool address space is not used for DHCP, but the function is similar. This pool is used to provision NAT-type connectivity between the organizations and the cloud services below them.

4.4.2 Network Pools
You will need a network in the network pool for every private organization network and external organization network in the vCloud environment. The Service Definition for a Private Cloud calls for one external organization network and the ability for the organization to create private vApp networks. Since the Service Definition does not call out a minimum number of vApp networks, a good starting number is 10 per organization. Make your network pool as large as the number of organizations times 10.
4.4.3 Organization Networks
At least one organization external network is required to connect vApps created within the organization to other vApps and/or the networking layers beyond the private vCloud. To accomplish this, create an external network in the Cloud Resources section (under Manage & Monitor in the System Administration section of the vCloud Director UI). In the wizard, be sure to select a direct connection. This external network maps to an existing vSphere network for virtual machine use, as defined in the External Networks section above. Other networking options are available, such as a routed organization external network, and could be used, but they add complexity to the design that is normally not needed. For the purpose of this design there are no additional network requirements. For more information on additional network options, refer to the vCloud Director Administrator's Guide.

4.4.4 Cisco Nexus 1000V Considerations
The Cisco Nexus 1000V is applicable as a switching fabric in the enterprise. The caveats expressed in the public vCloud section for the Cisco Nexus 1000V also apply to the private vCloud. The primary difference is that the networking rails used as external networks will likely be simpler than those shown for a public vCloud. For example, there is likely to be only a single backbone LAN to connect to, which is also the path to the Internet, as compared to a public vCloud where there will be multiple, distinct network paths. In summary, the Cisco Nexus 1000V is applicable to private vCloud implementations as a switching backbone for external networks only, representing the one or more core LANs that lead to the rest of the business and/or the Internet.
4.5.1 Public vCloud Considerations
The organization virtual datacenter allocation model maps directly to a corresponding vCenter Chargeback billing model:
Pay as you go. Pricing can be set per virtual machine, and a corresponding speed of a vCPU equivalent can be specified. Billing is unpredictable because it is tied directly to actual usage.
Allocation. Consumers are allocated a baseline set of resources but can burst by tapping into additional resources as needed; they are typically charged at higher rates for exceeding baseline usage. This model results in more variable billing but allows for more closely aligning variable workloads to their cost.
Reservation. Consumers are allocated and billed for a fixed container of resources, regardless of usage. This model allows for predictable billing and level of service, but consumers may pay a premium if they do not consume all their allocated resources.
These allocation models also map directly to the service tiers found in the Service Definition for a Public Cloud. The Basic vDC model uses the Pay-as-you-go allocation model, since instances are only charged for the resources they consume and no commitment is required from the consumer. The Committed vDC model uses the Allocation Pool model, since the consumer is required to commit to a certain level of usage but is also allowed to exceed that usage. The Dedicated vDC model uses the Reservation Pool model, since this service tier requires dedicated and guaranteed resources for the consumer. An option to enable thin provisioning allows provisioning virtual machines with thin disks to conserve disk usage. vSphere best practices apply to the use of thin-provisioned virtual disks. The Service Definition for a Public Cloud provides detailed guidance on how much a provider should charge for each service tier.
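The three billing models above can be contrasted with a small, purely illustrative pricing sketch. The rates and formulas here are hypothetical examples, not vCenter Chargeback's actual pricing logic:

```python
# Hedged sketch contrasting the three Chargeback billing models described
# above. All rates and units (VM-hours, GHz) are hypothetical.
def pay_as_you_go(vm_hours, rate_per_vm_hour):
    # Billed purely on actual usage; unpredictable but commitment-free.
    return vm_hours * rate_per_vm_hour

def allocation_pool(base_ghz, used_ghz, base_rate, burst_rate):
    # Baseline is billed flat; usage above baseline at a premium rate.
    burst = max(0.0, used_ghz - base_ghz)
    return base_ghz * base_rate + burst * burst_rate

def reservation_pool(reserved_ghz, rate):
    # Fixed container: billed regardless of actual usage.
    return reserved_ghz * rate

print(pay_as_you_go(720, 0.05))
print(allocation_pool(10, 14, 2.0, 3.0))
print(reservation_pool(20, 2.0))
```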
Chargeback functionality is provided by VMware vCenter Chargeback, which is integrated with VMware vCloud Director. Refer to the VMware vCenter Chargeback User's Guide for information on how to customize the individual reports generated. For further information, refer to the vCloud Chargeback Models Implementation Guide, which details how to set up vCloud Director and vCenter Chargeback to accommodate instance-based pricing (pay as you go), reservation-based pricing, and allocation-based pricing.

4.5.2 Private vCloud Considerations
The organization virtual datacenter allocation model used depends on the type of workloads expected:
Pay as you go. A transient environment where workloads are repeatedly deployed and undeployed, such as a demonstration or training environment, is suited to this model.
Allocation. Elastic workloads that have a steady state but surge during certain periods due to special processing needs are suited to this model.
Reservation. Since a fixed set of resources is guaranteed, infrastructure-type workloads that demand a predictable level of service run well under this model.
When an organization virtual datacenter is created in vCloud Director, vCenter Server automatically creates child resource pools, with the appropriate resource reservations and limits, under the resource pool representing the provider virtual datacenter. As part of creating an organization virtual datacenter, a storage limit must be set unless you are using the Pay-as-you-go allocation model, which defaults to unlimited. For the purpose of this architecture there is no limit on storage consumed by the vApps, since we are providing static values for individual virtual machine storage and also limiting the number of virtual machines in an organization. An option to enable thin provisioning allows provisioning virtual machines with thin disks to conserve disk usage.
vSphere best practices apply to the use of thin-provisioned virtual disks. This feature can save substantial amounts of storage with very little performance impact on workloads in the vCloud infrastructure. It is recommended to enable this feature when creating each organization. For more information about this feature, refer to the vCloud Director Administrator's Guide or the VMware knowledge base.
vCloud Request Manager can be used to automate and standardize policy-based creation of organization virtual datacenters in a private cloud, through the creation and use of an organization virtual datacenter template (or blueprint) from which new organization virtual datacenters are deployed.
4.6.3 Accessing Your vCloud
Each organization should have a public URL configured for access to the organization's cloud portal in vCloud Director. These URLs have the format https://<vCD-cell-hostname>/cloud/org/<org-Name>. Each time a user of an organization logs in, they should point their browser to the organization-specific URL.

4.6.4 Deploy vApps
vApps can now be deployed from a catalog of vApp templates.

4.6.5 Employ Chargeback or Showback
In a public vCloud, chargeback is essential to accurately metering consumer usage and recouping costs to ensure profitability. In a private vCloud, IT does not necessarily face the same cost pressures as a public vCloud service provider. IT may also not have chargeback procedures or policies in place, as chargeback is typically a financial policy. An alternative to chargeback is showback, which merely attempts to raise awareness of consumption and cost without involving formal accounting procedures to bill the usage back to the consumer's department. To align consumer behavior with the actual cost of the resources being consumed, use the vCenter Chargeback reports to provide resource and financial transparency. Without showback or chargeback, consumers are not aware of the actual cost of the resources they have consumed and thus have little incentive to change their consumption patterns. Cloud computing resources can be spun up easily, and with the exception of deployment policies dictating resource leases, there are no disincentives or penalties to curb excessive use. Showback or chargeback will expose heavy or demanding users.
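The per-organization portal URL format described in Section 4.6.3 can be constructed programmatically; the hostname and organization name below are placeholders:

```python
# Sketch: build per-organization login URLs of the form
# https://<vCD-cell-hostname>/cloud/org/<org-Name> documented in 4.6.3.
def org_portal_url(vcd_host, org_name):
    return "https://{}/cloud/org/{}".format(vcd_host, org_name)

print(org_portal_url("vcd.example.com", "Engineering"))
# -> https://vcd.example.com/cloud/org/Engineering
```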
Additional components to extend the vCloud capabilities are discussed further below.
vCloud Director provides a cloud portal through which end consumers can self-provision their own workloads. For environments where a formal provisioning process, including a request/approval mechanism, is needed, vCloud Request Manager can be used as a front end to vCloud Director. vCloud Request Manager 1.0 is largely intended for private vClouds, because it requires cloud system privileges in vCloud Director and therefore is not ideal in a multi-tenant public cloud. vCloud Request Manager's primary capabilities include:
Provisioning with approvals
Software license tracking
Policy-based cloud partitioning (new organization creation)
In relation to the reference architecture, this component sits in the functional stack above vCloud Director as a new user entry point/UI, without obscuring it entirely, so that vCloud Director can remain an entry point as well. vCloud Request Manager is a front end to the default vCloud Director portal, exposed to those users who need the policy-based approval/provisioning/tracking functionality rather than the more freeform vCloud Director UI. vCloud Request Manager adds a vApp request/approval cycle to vCloud Director. This provides a flexible mechanism by which vApps in catalogs can be requested, approved, and provisioned based on business needs. The included workflows can be modified as needed to match these requirements. vCloud Request Manager also adds simple software asset tracking and pairs these assets with vApps in the catalog. The quantity of available licenses of a tracked package is decremented each time a vApp with that package is provisioned, and incremented when the vApp is destroyed. This license-to-vApp relationship is managed manually inside vCloud Request Manager's workflow engine. Lastly, vCloud Request Manager provides easy creation tools for new private clouds, with approval cycles included.
This enables the quick creation of new cloud instances (organizations) based on blueprints of cloud policies and parameters. One obvious application of this technology is adding lifecycle management to vApps provisioned in the private clouds, a solution formerly covered by vCenter Lifecycle Manager. vCloud Request Manager also provides a simplified interface for requesting resources from multiple cloud installations; it can be explored as an option if you require a single interface for all of your cloud installations, although it does not provide the full richness of vApp management in a cloud environment. Several third-party solutions also exist that can provide one interface to multiple cloud installations. Refer to the vCloud Request Manager Installation and Configuration Guide for specific installation requirements. vCloud Request Manager runs in its own virtual machine, separate from vCloud Director. Note that several objects in vCloud Request Manager map to corresponding vCloud Director objects but do not use consistent terms:
An organization in vCloud Request Manager refers to a vCloud Director system instance.
A location in vCloud Request Manager refers to an organization in vCloud Director.
A cloud in vCloud Request Manager refers to an organization vDC in vCloud Director.
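The terminology mapping above can be captured as a simple lookup; this is purely illustrative and not part of either product's API:

```python
# The vCloud Request Manager -> vCloud Director terminology mapping
# described above, as a plain dictionary for quick reference.
VCRM_TO_VCD = {
    "organization": "vCloud Director system instance",
    "location": "vCloud Director organization",
    "cloud": "vCloud Director organization vDC",
}

print(VCRM_TO_VCD["location"])
# -> vCloud Director organization
```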
vCloud API
The vCloud API allows interaction with a vCloud and can be used to communicate with vCloud Director through a UI other than the portal included with vCloud Director. For example, vCloud Request Manager communicates with vCloud Director using the vCloud API. The vCloud API is the cornerstone of federation and ecosystem support in a vCloud environment. All of the current federation tools talk to the vCloud environment through the vCloud API, and the ISV ecosystem also uses the vCloud API to allow their software to talk to vCloud environments. It is very important that a vCloud environment expose the vCloud API to the cloud consumer. Currently, VMware vCloud Director is the only software package that exposes the vCloud API. In some environments, vCloud Director is deployed behind a portal or in another location not readily accessible to the cloud consumer. In this case, an API proxy or relay is needed to expose the vCloud API to the end consumer. Due to the value of the vCloud API, some environments may wish to meter API usage and charge customers extra for it. Protecting the vCloud API through audit trails as well as API inspection is also a good idea. Lastly, there are several cases where cloud providers may wish to extend the vCloud API with new features. To support several of the vCloud API use cases discussed, the cloud provider may wish to implement an API proxy. The vCloud API is a REST-based service with XML payloads, so any suitable XML gateway can be used to proxy it. Several third-party solutions on the market excel at XML gateway services today. VMware has begun to partner with some of these vendors to develop joint guidance on how to deploy their solutions in a vCloud Director environment. For the latest information on these efforts and collateral, contact your local VMware vCloud specialist.
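Because the vCloud API is REST with XML payloads, a client can be as simple as an HTTP call plus XML parsing. The sketch below parses an abbreviated, namespace-free sample of the kind of versions listing the API returns; the real payload is namespaced and varies by release:

```python
import xml.etree.ElementTree as ET

# Abbreviated sample response shape (illustrative; the actual vCloud API
# payload is namespaced and release-specific).
SAMPLE = """<SupportedVersions>
  <VersionInfo><Version>1.0</Version></VersionInfo>
</SupportedVersions>"""

def parse_versions(xml_text):
    """Extract version strings from a versions-listing XML document."""
    tree = ET.fromstring(xml_text)
    return [el.text for el in tree.iter("Version")]

print(parse_versions(SAMPLE))
# -> ['1.0']
```

An XML gateway proxying the API would inspect and forward documents of this kind rather than parse them in application code.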
vCenter Orchestrator
vCenter Orchestrator (vCO) is a system for assembling operational workflows. Its primary benefit is coordinating multiple systems to achieve a composite operation that would otherwise have taken several individual operations on different systems. It is not meant to replace the APIs that it in turn calls; rather, it brings them together when necessary. In general, if an operation uses only one underlying system, direct access to that system should be considered, for efficiency and reduced complexity. In the vCloud use case, orchestration can be used to automate highly repetitive tasks, avoiding manual work and potential errors. vCenter Orchestrator has a plug-in framework, with plug-ins for VMware products such as vCenter Server, vCloud Director, and vCenter Chargeback. Thus, vCenter Orchestrator can orchestrate workflows at the VIM API, VIX API, vCloud API, and Chargeback API levels.
There are three main categories of use cases that vCenter Orchestrator can help satisfy:
Cloud administration operations
Organization administration operations
Organization consumer operations

5.4.1 Cloud Administration Orchestration Examples
Here are some example use cases that highlight the value of vCenter Orchestrator to the cloud owner. These use cases are primarily focused on infrastructure management and control of the resource allocation process. First, consider the case of a provider who wants to bring a new customer into their vCloud. The major steps would be to create a new organization, users (possibly imported from Active Directory), networks, virtual datacenters, and catalogs. The provider may also want to set up a recurring chargeback report so the tenant can be billed, and possibly send an email notification to the tenant advising them that their new cloud environment is ready. Another example is a tenant request for additional external network capacity. In this case, the provider may want to automate the creation of the network, which would include name generation, identification and allocation of an available VLAN and IP address range, configuration of the network switch and cloud perimeter firewall, creation of the external network in vCenter, and finally allocation of the external network to the tenant's organization.

5.4.2 Organization Administration Orchestration Examples
There are operational tasks within the tenant's organization that can benefit from automation as well. These are typically tasks that address vApp and virtual machine lifecycle management, including creation, configuration, routine maintenance, and decommissioning. Consider the case of virtual machine creation in an environment that uses Active Directory for services such as authentication and printing. After deployment, the virtual machine must join the Active Directory domain.
In many cases it is preferable to use an organizational unit (OU) other than the default Computers container. vCenter Orchestrator could be used to create the virtual machine's computer account in the proper OU prior to virtual machine deployment, ensuring that the computer account name is unique and resides in the proper OU. Similarly, when the virtual machine is decommissioned, the entry in the OU can be removed as part of the same workflow. Another example is the case where an organization administrator would like to manage recurring updates to a software package or configuration element across several virtual machines in a single operation. In this case, a workflow could be created that accepts a list of systems and a source for the software or configuration as parameters, and carries out the activity on each system.

5.4.3 Cloud Consumer Operation Orchestration Examples
These operations generally fall into the category of tasks that the organization administrator wants to offload as self-service operations. Performing the operation as a vCenter Orchestrator workflow provides an easy way to expose the operation to a customer via the built-in web portal, or via a customized portal that leverages the web services API. Many of the operations in this category can be satisfied directly via the vCloud Director UI; however, candidates that affect multiple systems, or that fit better into a customer portal, may be better implemented as an orchestration workflow. Note that none of this is exposed to the cloud consumer by default, which makes delivery somewhat difficult: such workflows must be initiated by the cloud provider using the vCenter Orchestrator Client, unless the provider creates a portal to front-end vCenter Orchestrator.
Examples of these types of use cases include resetting system or user account passwords on virtual machines using the VIX plug-in; putting a load-balanced service into maintenance mode by stopping the service, removing it from the load balancing pool, and disabling monitors; loading certificates into virtual machines; and deploying instances of custom applications from the organization's catalog. vCenter Orchestrator can be used to create custom workflows at the vCloud API and VIM levels. vCloud Request Manager is an alternative to vCenter Orchestrator that has built-in workflow functionality integrating with vCloud Director through the vCloud API.
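The tenant onboarding sequence described in Section 5.4.1 can be sketched as an ordered workflow. Every step below is a hypothetical placeholder, not a real vCO or vCloud API call; a real vCenter Orchestrator workflow would invoke the corresponding plug-in action at each step:

```python
# Hypothetical sketch of the Section 5.4.1 tenant onboarding workflow.
# Each step is a placeholder string; a real workflow would call the
# relevant vCO plug-in action instead of printing.
def onboard_tenant(name):
    steps = [
        "create organization {}".format(name),
        "import users from Active Directory",
        "create organization networks",
        "allocate virtual datacenters",
        "publish catalogs",
        "schedule recurring chargeback report",
        "send welcome email notification",
    ]
    for step in steps:
        print(step)
    return steps

onboard_tenant("AcmeCorp")
```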
For additional information on vCO installation and configuration and on workflow solution development, refer to the vCenter Orchestrator v4.1 documentation set at http://www.vmware.com/support/pubs/orchestrator_pubs.html.
vCloud Connector
As more clouds are stood up, several clouds from different sites within a private enterprise can form a larger cloud, or a private and a public cloud can form a hybrid cloud. Cloud consumers need a way to migrate workloads in a federated cloud. vCloud Connector (vCC) solves this problem by allowing you to perform migrations across all of your public and private clouds and obtain a consistent view of them from a single interface. vCloud Connector must be installed by cloud administrators but can be used by administrators and end users alike to view and manage workloads. Once vCloud Connector has been deployed to a vSphere host and registered with a vCenter Server, end users can access it under Solutions and Applications in the vSphere Client connected to the vCenter Server where the OVF was deployed.

5.5.1 vCloud Connector Placement
There are two considerations for placement of your vCloud Connector appliance. First, the virtual appliance must be deployed to a vCenter Server that the target users can access via the vSphere Client. Because the only user access is via the vSphere Client, users of vCloud Connector must have the right to log in to this vCenter Server. Second, workload copy operations use the vCloud Connector appliance as a middleman, so network latency and bandwidth between clouds need to be considered. In some cases it may be preferable to run multiple instances of vCloud Connector across multiple vCenter Servers to avoid network latency or excessive bandwidth consumption.
[Figure: vCloud Connector placement — the vCloud Connector UI is accessed via the vSphere Client against vCenter Server A; workload copies flow through the appliance between an ESXi host managed by vCenter Server B and a remote vCloud running vCloud Director.]
5.5.2 vCloud Connector Example Usage Scenarios
vCloud Connector can support a number of workload migration use cases. The following examples assume migration of a vApp comprised of one or more virtual machines:
Copying a vApp from vSphere to a vCloud
Copying a vApp from a private vCloud to a public vCloud
Copying a vApp from one vCenter to another vCenter
Even in environments not running vCloud Director, vCloud Connector can still be used to copy and move vApps. As long as both vCenter Servers are added as clouds in vCloud Connector, you can freely move workloads between them.

5.5.3 vCloud Connector Limitations
The use of vCloud Connector to copy and migrate vApps is subject to the following limitations:
Currently there is no way to have predefined clouds appear in vCloud Connector. Each user must manually add all clouds to vCloud Connector that they intend to access. No clouds are defined by default, so users add only the clouds they care to see.
Traffic to and from the vCloud Connector appliance is not WAN optimized, so it is not ideal to migrate workloads over WAN links even if sufficient bandwidth exists. It is therefore preferable to install vCloud Connector appliances in locations that avoid traversing WAN links as much as possible.
There is currently no way to limit which clouds can be added to a vCloud Connector instance, so you must instruct your users to use only the proper vCloud Connector instance for their needs.
All workloads being transferred are copied to a staging area on the vCloud Connector appliance before being copied to the destination cloud. This area is 20GB by default, which means the largest virtual machine that can be copied must consume less disk space than that. The ability to easily resize this staging area will be available in an upcoming update.
vCloud Connector is designed to give you a consistent view of your workloads across multiple clouds and to migrate those workloads.
Therefore, you will still need to use the vSphere Client and/or log in to vCloud Director to manage your workloads. vApps/virtual machines must be powered off for migration; that is, the workloads must be offline. Hot migrations are currently not available. The vApp networking configuration will also need to be modified before powering on the virtual machines.
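A simple pre-flight check against the 20GB staging-area limit described in 5.5.3 might look like the following. This is an illustrative sketch only; vCloud Connector itself performs no such user-visible check:

```python
# Sketch: check a VM's disk footprint against the default 20GB staging
# area on the vCloud Connector appliance (per the limitation above).
STAGING_AREA_GB = 20  # default staging area size, per 5.5.3

def fits_in_staging(vm_disk_gb, staging_gb=STAGING_AREA_GB):
    """A VM must be smaller than the staging area to be copied."""
    return vm_disk_gb < staging_gb

print(fits_in_staging(15))  # True
print(fits_in_staging(25))  # False
```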
6.2 Logging
This section describes logging architecture concerns and logging as a service. Logging requirements and considerations for a public vCloud can be extensive; the requirements for a private vCloud are typically not as rigorous. For a private vCloud, refer to enterprise-specific requirements and guidelines. Logs should be available in a vCloud for numerous reasons, including the following:
Regulatory compliance. Collect logs to make them available for analysis, security review, and compliance requirements, as described in the Appendix: Security Considerations. Individual logs can then be used to satisfy specific compliance controls; for example, a user access log can be used to verify that a resource is accessed only by authorized users.
Customer requirements. End customers (tenants) can retrieve logs that pertain to their environment in order to satisfy their own requirements, many of which, such as compliance, will likely be similar to the provider's requirements.
Operational integrity. Operational alerts should be defined where specific logs trigger notifications for further remediation. This will typically be a backup alert, secondary to monitoring.
Troubleshooting. Closely related to operational integrity, troubleshooting can be done with logs. For example, vShield Edge logs can show whether a specific external connection request is being passed through the firewall or NATed by it.

6.2.1 Logging Architectural Considerations
Redundancy
Many components rely on syslog for logging events. Syslog is a UDP-based protocol that lacks delivery guarantees. To help ensure delivery:
Verify that infrastructure components have physically and logically redundant network interfaces.
Send logs to two syslog targets. Where only a single syslog target is possible, it is recommended to log to a local syslog daemon configured to retransmit the logs to two remote syslog targets. For example, vCloud Director 1.0 supports only a single syslog target for its activity logs.
Where possible, place log receivers in a vSphere HA/DRS cluster so they are restarted in case of failure.
Scalability
vCloud infrastructure components generate a relatively low volume of logs for the provider infrastructure. Customer components, especially vShield Edge firewalls, can generate a very high volume of logs. The IOPS impact of log collection is therefore critical; the CPU impact of collection is negligible, although CPU will matter for analysis. It is highly recommended that logs be collected to dedicated log partitions on collection servers.
Reporting
Logs need to be available to tenants. Tenants should be able to download, in raw format, all vCloud Director and vShield Edge logs pertaining to their organizations and networks.
Logs with customer identifiers should be flagged or indexed for retrieval. Customer activity in vCloud Director generates logs that are flagged with the customer's organization ID. vShield Edge devices, however, do not have unique identifiers in vCloud Director 1.0. Therefore, VMware recommends that you keep NAT-routed organization external networks and fenced vApps connected only to single-tenant provider networks. When the vShield Edge device is deployed by vCloud Director and its external IP address is allocated, the tenant can then be identified by the IP address.
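One way to implement the redundancy recommendation above (a local syslog daemon retransmitting to two remote targets) is an rsyslog configuration fragment along these lines. The hostnames are placeholders, and the choice of TCP ("@@") over classic UDP ("@") is an illustrative assumption:

```
# /etc/rsyslog.conf (fragment) -- forward all messages to two collectors.
# "@@" requests TCP where supported; a single "@" is traditional UDP syslog.
*.* @@logcollector1.example.com:514
*.* @@logcollector2.example.com:514
```

Components that support only one syslog target (such as vCloud Director 1.0 activity logs) would point at the local daemon, which then fans out to both collectors.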
[Figure: Logging as a Service (LaaS) architecture — log collection nodes gather logs from vCloud Director cells, vCenter Orchestrator, vShield Manager, vShield Edge devices, vCenter Servers, Chargeback Manager, and ESXi hosts, feeding an HA database/collection-agent tier with a web tier for reporting and download.]
6.2.2 Logging as a Service
Logging as a service can be implemented in two directions:
With customer collection, forwarding to provider servers for analysis and reporting.
With customer collection, reporting, and analysis in the customer environment, and provider logs forwarded to the customer environment.
Customer Collection, Forwarding to Provider
Pros: Logs can be sent directly to the collector, even on customer private IP space. Resources can be allocated at the customer level for collection, allowing more granular scaling of collection.
Cons: Analysis is more difficult to scale, and correlating customer activity to storage consumption is challenging. Collection nodes are still required even though their utilization will be low. Most of the resource consumption is on the storage and analysis side, so the resources billed via the IaaS model will be minimal.
Customer Collection, with Provider Logs Forwarded into the Customer Environment
Pros: Distributed analysis relies on general cloud resources and can scale. The customer can employ their own analysis tools to organize and report on the data, or use a provider-supported package or appliance.
Cons: The provider needs a duplicate copy of infrastructure logs for provider purposes. Transmission of logs to the customer environment requires connectivity, either over the Internet or a provider service network, and inbound traffic through a firewall into the customer environment, adding risk.
[Figure: LaaS deployment in the customer environment — logging service instances placed alongside customer vApps within each organization vDC.]
Private vCloud network routing and firewall requirements depend on the security policies, organizational requirements, and workloads of the enterprise. For further details, see the Appendix on Security.
To address a 99.99% uptime SLA, VMware can control only the resiliency of its vCloud platform components and provide recommendations to mitigate single points of failure (SPOFs) in the underlying infrastructure. A provider can eliminate SPOFs by ensuring redundancy, as listed below:
Redundant power sourced from multiple feeds, with multiple whips to racks, as well as sufficient backup battery and generator capacity.
Redundant network components.
Redundant storage components. The storage design also needs to handle the I/O load; customer workloads may not be accessible under high disk latency, file locks, and so forth. Storage design should also be tied to business continuity and disaster recovery efforts, possibly including array-level backups.
Redundant server components (multiple independent power supplies, network interface cards (NICs), and, if appropriate, host bus adapters (HBAs)).
Sufficient compute resources for a minimum of N+1 redundancy within a vSphere HA cluster, including sufficient capacity for timely recovery.
Redundant databases and management components.
Appropriate change, incident, problem, and capacity management processes must also be well defined and enforced to ensure that poor operational processes do not result in unnecessary downtime. In addition to a redundant infrastructure, the employees or contractors responsible for operating and maintaining the environment and the supporting infrastructure must be adequately trained and skilled. A vCloud is capable of supporting an uptime SLA of 99.99% by following the guidelines in the table in the Appendix on vCloud Availability. The availability recommendations in that table allow a vCloud to achieve a 99.99% uptime SLA, but only with no SPOFs in the underlying infrastructure, the required skills available, and suitable processes defined and followed.

6.4.2 Load Balancing of vCloud Director Cells
vCloud Director cells are stateless front-end processors for the vCloud.
Each cell has a variety of purposes and self-manages various functions among cells while connecting to a central database. The cell manages connectivity to the cloud and provides both API and UI endpoints for clients. Multiple cells (a load-balanced group) should be used to address availability and scale. This is typically achieved by load balancing or content switching at this front-end layer. Load balancers present a consistent address for services regardless of the underlying node responding. They can spread session load across cells, monitor cell health, and add or remove cells from the active service pool. The group should not be considered a true cluster, since there is no failover from one cell to another. In general, any load balancer that supports SSL session persistence and has network connectivity to the public-facing Internet or internal service network can perform load balancing of vCloud Director cells. General concerns around performance, security, manageability, and so forth should be taken into account when deciding whether to share or dedicate load balancing resources. For the purposes of this reference architecture, the load balancer is assumed to be a dedicated virtual machine or hardware device. For additional information on load balancing, see the Appendix on vCloud Availability and the section on load balancers.
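As an illustration only, a front end for a load-balanced cell group could be sketched with an HAProxy fragment like the following. Hostnames and ports are placeholders, and source-IP persistence stands in for whichever SSL session persistence mechanism your chosen load balancer provides:

```
# haproxy.cfg (fragment) -- illustrative front end for vCloud Director
# cells. TCP mode passes SSL through to the cells; "balance source" gives
# simple source-IP persistence so a session stays on one cell.
frontend vcd_https
    bind *:443
    mode tcp
    default_backend vcd_cells

backend vcd_cells
    mode tcp
    balance source
    server cell1 vcd-cell1.example.com:443 check
    server cell2 vcd-cell2.example.com:443 check
```

Health checks ("check") let the balancer remove an unresponsive cell from the pool, matching the monitor/add/remove behavior described above.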
6.4.3 I/O Considerations
vCloud Director offers controls for new organizations to guard against the misuse of resources by other organizations. These include:
Quotas for running and stored virtual machines. Determine how many virtual machines each user in the organization can store and power on in the organization's virtual datacenters. The quotas you specify act as the default for all new users added to the organization.
Limits for resource-intensive operations. Prevent resource-intensive operations from affecting all the users in an organization, and also provide a defense against denial-of-service attacks.
Simultaneous VMware Remote Console (VMRC) connections. Limit the number of simultaneous connections for performance or security reasons.
NOTE: VMware currently does not recommend the use of SIOC/NIOC in vSphere beneath the cloud abstraction layer. vSphere contains options for storage I/O control (SIOC) and network I/O control (NIOC), but these functions are not integrated into vCloud Director, and their use in a vCloud could cause unpredictable results.

6.4.4 Disaster Recovery
Disaster recovery (DR) focuses on the recovery of systems and infrastructure after an incident that interrupts normal operations. A disaster can be defined as partial or complete unavailability of resources and services, including software, the virtualization layer, the cloud layer, and the workloads running in the resource groups. Different approaches and technologies are supported, but there are at least two areas that require disaster recovery: the management cluster and consumer resources. Consumer resources are described later in this document.
Management Cluster Disaster Recovery
Good practices at the infrastructure level will make disaster recovery of the management cluster easier.
This includes technologies such as HA and DRS for reactive and proactive protection at the primary site. vCenter Heartbeat can also be used to protect vCenter Server specifically at the primary site. For multi-site protection, vCenter Site Recovery Manager protection of virtual machines is VMware's solution, and it works normally for this use case because the management VMs are not part of a cloud instance of any type (rather, they run the cloud instances).

Cloud Consumer Resources Disaster Recovery

This section focuses on disaster recovery of the cloud infrastructure to handle failover to an alternate site. vCenter Site Recovery Manager is not supported, although manual steps can be applied as long as vApp metadata is saved, configuration information is matched between the primary site and the recovery site, and the documented steps are validated. While Site Recovery Manager is vCenter Server-aware, vCloud Director is not Site Recovery Manager-aware. Without collaboration between vCloud Director and Site Recovery Manager, the mechanisms working beneath the covers to synchronize VMs cannot keep vCloud Director in sync as well, and as a result the recovery of vCloud Director can be problematic. While it is possible to architect a solution in which one site's total environment (100% of the operational parameters of that site, including IP addressing, start-up order of dependent systems, and the like) is duplicated to another site for recovery, it would be very difficult to implement and maintain, and is therefore out of scope for this document. VMware is actively working on a streamlined solution to this situation, to be addressed by future product enhancements.

6.4.5 Backup and Restore of vApps

This section focuses on backup and restore procedures for the vApps that are deployed into the cloud. Traditional backup tools do not capture the required metadata associated with a vApp, including owner, network, and organization.
This results in recovery and restoration issues: without this data, recovery requires manual steps, and configuration attributes must be manually re-entered.
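To avoid the manual re-entry just described, the metadata could be captured alongside the backup. Below is a minimal sketch, assuming a JSON sidecar file per vApp; the function and file layout are illustrative, not a vendor feature, and the field set is the one named in the text (owner, network, organization).

```python
# Sketch: persist the vCloud metadata a traditional backup misses
# as a sidecar file next to the backup image, so a restore can
# re-associate the vApp without manual re-entry. Illustrative only.
import json
import os
import tempfile

def write_vapp_sidecar(path, owner, network, organization):
    """Write the vApp metadata named in the text as a JSON sidecar file."""
    metadata = {"owner": owner, "network": network, "organization": organization}
    with open(path, "w") as fh:
        json.dump(metadata, fh)
    return metadata

# Example with placeholder values
sidecar = os.path.join(tempfile.gettempdir(), "vapp-web01.meta.json")
meta = write_vapp_sidecar(sidecar, "alice", "org-net-01", "AcmeOrg")
```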
Within a vCloud environment, a vApp can be a single virtual machine or a group of virtual machines treated as one object. Backup of vApps on isolated networks must be supported. Identifying the inventories of individual organizations becomes challenging with current methods that enumerate backup items using vSphere, which uses UUIDs to differentiate objects, whereas vCloud Director uses object IDs. For backing up and restoring vApps, VMware recommends the use of vStorage APIs for Data Protection (VADP) based backup technologies. This approach requires no agents in guest operating systems, is centralized for improved manageability, and has a reduced dependency on backup windows. Guest-based backup solutions may not work in a vCloud because not all virtual machines are accessible over the network; virtual machines may also have identical IP addresses, which can cause problems. Therefore, backups of vCloud vApps require a virtual machine-level approach. When deploying virtual machines (as part of a vApp), use the full name and computer name fields to specify realistic names that describe the virtual machines. Otherwise, the generic information in these fields can make it hard to identify individual virtual machines. vApps and virtual machines provisioned by vCloud Director carry a large GUID in the template name, which means many virtual machines can appear very similar, making it hard for a user or administrator to ask for a specific virtual machine to be restored.

VMware Solutions

VMware Data Recovery is a VADP-based solution from VMware. Other VADP-based backup technologies are available from third-party backup vendors. Currently, due to the UUID versus object ID issue discussed above, VMware Data Recovery cannot be used with VMware vCloud Director. Backup of vCloud workloads has a few requirements to address.
VMware recommends that clients validate the level of support provided by the vendor to make sure client requirements are met. The following is a list of cloud vApp requirements to ask your vendor about:
VAPP REQUIREMENT DETAIL

- vStorage APIs for Data Protection provide change-block tracking to reduce backup windows.
- Integration to enable backup of isolated VMs and vApps.
- Integration with VADP to provide LAN-free and server-free backups, supporting better consolidation ratios for the vCloud and the underlying vSphere infrastructure.
- Use of the virtual machine UUID rather than the virtual machine name, to support multi-tenancy and avoid potential namespace conflicts.
- Interface support for cloud provider administrator teams. In the future, consumer (organization administrator and user) access will potentially be provided by some vendors.
- Inclusion of vCloud metadata for the vApps, both temporary and permanent, per VM/vApp. This is required to make sure that recovery of the VM/vApp will have all data required to support resource requirements and SLAs.
- vApp granularity for backups.
- Support for backup of multitiered vApps (for example, a Microsoft Exchange vApp that includes multiple virtual machines; backup selection of the Exchange vApp would pick up all the underlying virtual machines that are part of the main vApp). This is not available today but is being developed by vendors.

vApp requirements
Challenges

The following is a list of backup and restore challenges:
- vApp naming posing conflict issues between tenants
- vApp metadata required for recovery
- Multi-object vApp backup (protection groups for multi-tiered vApps)
- Manual recovery steps in the cloud
- Support for backup of vApps on isolated networks or with no network connectivity
- Enumeration of vApps by organization, for use by the organization administrator
- Enumeration of vApps by organization and provider, for use by the provider
- User-initiated backup/recovery
- Support for provider (provider administrator) and consumer (organization administrator and user) roles
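The naming challenge above can at least be detected mechanically. This sketch flags vApp or VM names still carrying the GUID-style suffix that vCloud Director generates by default; the regex and function name are illustrative, not part of any product.

```python
# Sketch: flag vApp/VM names that still look like the GUID-suffixed
# defaults vCloud Director generates, so administrators can ask owners
# for realistic names before a restore request depends on them.
import re

# Standard 8-4-4-4-12 hex GUID pattern, case-insensitive
GUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}", re.I
)

def needs_rename(vm_name):
    """True when the name still carries a GUID-style suffix."""
    return bool(GUID_RE.search(vm_name))
```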
TYPE OF RESOURCE POOL    TOTAL PERCENTAGE    TOTAL VMS

Pay-As-You-Go
Small Reservation Pool
Medium Reservation Pool
Large Reservation Pool
TOTAL

* Note: Some total virtual machine counts are rounded up or down due to percentages.
Table 7. Definition of Resource Pool and Virtual Machine Split
The Service Definition for a Public Cloud also calls out the distribution of virtual machines in the environment: 45% small, 35% medium, 15% large, and 5% extra large. The following table shows the total amount of memory, CPU, storage, and networking based on these assumptions and the total virtual machine count from the Service Definition for a Public Cloud.
ITEM    PERCENT    VCPUS    MEMORY    STORAGE    NETWORKING
The raw numbers above may seem high. Before you determine your final sizing, refer to VMware best practices for common consolidation ratios on these resources. The example table below shows what final numbers could look like using typical consolidation ratios seen in field deployments.
RESOURCE    BEFORE    RATIO    AFTER
The above calculations could be served by 16 of the following hosts:
- Socket count: 4
- Core count: 6
- Hyperthreading: Yes
- Memory: 128 GB
- Networking: Dual 10GigE
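The arithmetic behind this sizing can be sketched as follows. The 1,000-VM total is illustrative (not from the Service Definition), and consolidation ratios are deliberately omitted; only the quoted 45/35/15/5 split and the 16-host building block are taken from the text.

```python
# Sketch of the sizing arithmetic in this section: the 45/35/15/5 VM
# size split quoted from the Service Definition, and the raw capacity
# of the 16-host building block (4 sockets x 6 cores, 128 GB per host).
DISTRIBUTION = {"small": 0.45, "medium": 0.35, "large": 0.15, "xlarge": 0.05}

def split_vms(total_vms):
    """Apply the size distribution; counts are rounded, as Table 7 notes."""
    return {size: round(total_vms * pct) for size, pct in DISTRIBUTION.items()}

HOSTS, SOCKETS, CORES_PER_SOCKET, MEMORY_GB = 16, 4, 6, 128
total_cores = HOSTS * SOCKETS * CORES_PER_SOCKET   # physical cores, HT excluded
total_memory_gb = HOSTS * MEMORY_GB                # cluster memory floor

counts = split_vms(1000)   # illustrative 1,000-VM environment
```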
The above calculations do not take into account the storage consumed by consumers' or providers' templates. They also do not take into account the resources consumed by the vShield Edge appliances that are deployed for each organization. There will be a vShield Edge appliance for each private organization network and external organization network. Given the current Service Definition target of 25 organizations, a maximum of 275 vShield Edge appliances will be created. The specifications for each vShield Edge appliance are:
- CPU: 1 vCPU
- Memory: 64 MB
- Storage: 16 MB
- Network: 1 GigE (already accounted for in the throughput of the workloads and not to be added again)
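Aggregating the per-appliance figures above gives the total vShield Edge overhead for the maximum of 275 appliances:

```python
# Aggregate overhead of the maximum 275 vShield Edge appliances,
# using the per-appliance specification above (1 vCPU, 64 MB, 16 MB).
EDGES = 275
MEM_MB, DISK_MB = 64, 16

edge_vcpus = EDGES * 1                     # vCPUs consumed by edges
edge_memory_gb = EDGES * MEM_MB / 1024     # about 17.2 GB of memory
edge_storage_gb = EDGES * DISK_MB / 1024   # about 4.3 GB of storage
```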
What should the vCloud offer? 2a Service Definition for a Public Cloud
What to consider in building a vCloud Public Cloud (Service Provider) 3 Architecting a vCloud
4b Private Cloud
Architect-level guide identifying what components go into a vCloud and the design considerations. Uses the Service Definition as input into what to design.
Audience: vCloud architect familiar with vSphere best practices (ideally VCP-level) and with exposure to vCloud product components
com.vmware.vcloud.diagnostics.UserSessions
Description: Local (cell) user session statistics. Cardinality: 1. Instance ID: n/a.
Attributes: total number of sessions created on this cell; total number of successful logins to this cell; total number of failed login requests to this cell.

com.vmware.vcloud.GlobalUserSessionStatistics
Description: List of active user sessions by organization. Cardinality: 1. Instance ID: n/a.
Attributes: database ID of the organization; number of active sessions; number of open sessions.

com.vmware.vcloud.diagnostics.DataAccess
Description: Local (cell) user session statistics. Cardinality: 1. Instance ID: Conversation.
Attributes: object type of the last database object accessed; time taken to access the last database object accessed; object type of the worst (slowest) database object access; time taken by the worst (slowest) database object access.
com.vmware.vcloud.datasource.globalDataSource
Description: Statistics and configuration information about the database connection pool. This information is currently specific to the database JDBC driver being used (Oracle). Cardinality: 1.
LOCAL USER SESSIONS
Attributes: abandonedConnectionTimeout, availableConnectionsCount, borrowedConnectionsCount, connectionHarvestMaxCount, connectionHarvestTriggerCount, connectionPoolName, connectionWaitTimeout, databaseName, dataSourceName, fastConnectionFailoverEnabled, inactiveConnectionTimeout, initialPoolSize, loginTimeout, maxConnectionReuseCount, maxIdleTime, maxPoolSize, maxStatements, minPoolSize, networkProtocol, ONSConfiguration, portNumber, SQLForValidateConnection, timeoutCheckInterval, timeToLiveConnectionTimeout, URL, user, validateConnectionOnBorrow
VIM OPERATIONS
minPoolSize: minimum number of connections that will exist in the pool. networkProtocol: network protocol used by the JDBC driver.
com.vmware.vcloud.diagnostics.VlsiOperations
Description: Local (cell) user session statistics. Cardinality: 1 per VIM end-point (VC or host agent). Instance ID: VIM end-point URL.
Attribute: the total network round-trip time taken to make the MethodName call on an object of type ObjectType in the VIM end-point.
PRESENTATION API METHODS
com.vmware.vcloud.diagnostics.VlsiOperations
Description: Local (cell) user session statistics. Cardinality: 1 per presentation-layer method. Instance ID: method name.
Attributes: currently active invocations; total number of failed executions; total number of invocations over time; total time taken to execute.
com.vmware.vcloud.diagnostics.Jetty
Description: Web server request statistics. Cardinality: 2 (1 for REST API and 1 for UI). Instance ID: UI Requests for UI, REST API Requests for REST API.
Attribute: number of web requests currently being handled.
com.vmware.vcloud.diagnostics.VlsiOperations
Description: Local (cell) user session statistics. Cardinality: 1 per operation stage/granularity: RoundTrip, BasicLogin, Logout, Authentication, SecurityFilter, ConversationFilter, JAXRSServlet. RoundTrip is the most interesting, as it represents the overall REST API performance. Instance ID: one of RoundTrip, BasicLogin, Logout, Authentication, SecurityFilter, ConversationFilter, JAXRSServlet.
Attributes: currently active invocations; total number of failed executions; total number of invocations over time; total time taken to execute.
com.vmware.vcloud.diagnostics.TaskExecutionJobs
Description: Statistics about long-running tasks. Cardinality: 1 per task. Instance ID: name of task.
Attributes: currently active invocations.
Attributes (continued): total number of failed executions; total number of invocations over time; total time taken to execute.
Attributes: currentInvocations, totalFailed, totalInvocations, executionTime, returnedItems.
VC TASK MANAGER
com.vmware.vcloud.diagnostics.QueryService
Description: Presentation-layer query service statistics. Cardinality: 1 per query. Instance ID: query name.
Attributes: currently active invocations; total number of failed executions; total number of invocations over time; total time taken to execute; number of items returned by successful query executions.
Attributes:
successfulTasksCount: total successful tasks
failedTasksCount: total failed tasks
waitForTaskInvocationsCount: total invocations of VIM wait-for-task
completedWaitForTasksCount: total completed task waits
historicalTasksCount: total historical task updates received
vcRetrievedTaskCompletionsCount: total task completions received
taskCompletionMessagesPublishedCount: total task completion messages published on the message bus
taskCompletionMessagesReceivedCount: total task completion messages received on the message bus
success_elapsedTaskWaitTime: time elapsed for successful tasks
failed_elapsedTaskWaitTime: time elapsed for failed tasks
com.vmware.vcloud.diagnostics.VimInventoryUpdates
Description: Inventory processing statistics. Cardinality: 3 (one each for ObjectUpdate, PropertyCollector, and UpdateSets). Instance ID: ObjectUpdate.
Attributes: total number of object updates received; total number of object updates that failed to be processed; time taken for updates.
com.vmware.vcloud.diagnostics.VimInventoryEvents
Description: VIM inventory event manager statistics; tracks the frequency of common vCenter events. Cardinality: 1 per folder per VC URL, 1 MBean per event name. Instance ID: event name.
Attributes: total number of VIM inventory events that failed to be handled; total time taken to handle VIM inventory events.
Attributes: totalInvocations, executionTime, totalItemsInQueue, objectsInQueue, objectBusyRequeueCount, loadValidationObjectTime, duplicatesDiscarded.
VC OBJECT VALIDATION / REACTIONS
com.vmware.vcloud.diagnostics.VcValidation
Description: VC object validation statistics. Cardinality: 1 global plus 1 per validator. Instance ID: null = global, validator name = per validator.
Attributes: total number of validation executions; total time spent in the validator; total items currently queued for validation (global); total items currently queued for validation (per validator); total number of objects requeued for validation due to the object being busy; time taken to load the validation object; total number of discarded duplicate validations.
com.vmware.vcloud.diagnostics.Reactions
Description: Validation reaction statistics. Cardinality: 1 global plus 1 per reaction. Instance ID: null = global, reaction name = per reaction.
Attributes: total number of reaction executions; total number of reactions requeued due to objects being busy; total number of executions of this reaction; total time spent in the reaction.
Attributes: failedReactions, objectRequeueCount.
VC CONNECTIONS
failedReactions: total number of failed reactions. objectRequeueCount: number of times this reaction was requeued due to objects being busy.
Attributes: Connected Count, Disconnected Count, Start Count, UI Vim Reconnect Count.
ACTIVEMQ
com.vmware.vcloud.diagnostics.VimConnection
Description: Local (cell) user session statistics. Cardinality: 1 per VC. Instance ID: VC-VcInstanceId, where VcInstanceId is an integer identifying the vCenter instance.
Attributes: total successful connections; total disconnections; total number of times the VC listener was started; total number of times the VC was reconnected through the UI.
Attributes: lastHealthCheckDate, messageRoundTripDurationMs, unreachableCells, isHealthy, reachableCells.
TRANSFER SERVER
com.vmware.vcloud.diagnostics.ActiveMQ
Description: ActiveMQ (message bus) statistics. Cardinality: 1 global and 1 per peer vCloud Director cell (each cell other than the current one). Instance ID: Global = global statistics; to_cellName_cellPrimaryIp_cellUUID = per cell.
Attributes: last time a health check was performed; time taken for an echo message to be sent and returned; total number of unreachable cells; health of connections to peers; total number of reachable cells.
Attributes: number of items successfully transferred; number of items that failed to be transferred; number of successful upload operations.
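As an operational note, the counters exposed by the RoundTrip instance of com.vmware.vcloud.diagnostics.VlsiOperations can be reduced to the two numbers operators usually watch: mean latency and failure rate. The attribute names below follow the naming pattern listed for the query service MBean (totalInvocations, totalFailed, executionTime); the sampled values are made up for illustration.

```python
# Sketch: derive mean REST API latency and failure rate from the
# RoundTrip counters. Sampled values are illustrative, not measured.
def rest_api_summary(total_invocations, total_failed, execution_time_ms):
    """Return (mean latency in ms, failure rate) from cumulative counters."""
    if not total_invocations:
        return 0.0, 0.0
    return (execution_time_ms / total_invocations,
            total_failed / total_invocations)

latency_ms, failure_rate = rest_api_summary(
    total_invocations=2000, total_failed=40, execution_time_ms=500000
)
```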
Availability
All VMware ESXi hosts will be configured in highly available clusters with a minimum of N+1 redundancy. This protects not only the customers' virtual machines, but also the virtual machines hosting the platform portal/management applications and all of the vShield Edge appliances.
Failure Impact
In the event of a host failure, VMware HA will detect the failure within 13 seconds and commence powering on the failed virtual machines on other hosts within the cluster (analogous to pressing the power button on a physical server, not including time to boot the OS or launch applications). VMware HA Admission Control ensures sufficient resources are available to restart the virtual machines. The admission control policy "Percentage of cluster resources" is recommended, as it is flexible while guaranteeing resource availability. The following whitepaper contains best practices for increasing availability and resiliency: http://www.vmware.com/files/pdf/techpaper/VMW-Server-WPBestPractices.pdf. It is also recommended that vCenter be configured to proactively migrate virtual machines off a host in the event the host's health becomes unstable. Rules can be defined in vCenter when monitoring host system health.
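For an N-host cluster tolerating one host failure, the "Percentage of cluster resources" value works out to 100/N percent of cluster resources reserved. A sketch of that arithmetic, under the simplifying assumption of identically sized hosts:

```python
# Sketch: admission-control reservation percentage for N+1 redundancy,
# assuming identically sized hosts (a simplification).
def admission_control_pct(hosts, host_failures_tolerated=1):
    """Percentage of cluster resources to reserve for failover."""
    return 100.0 * host_failures_tolerated / hosts

pct = admission_control_pct(16)   # 16-host cluster, tolerate 1 failure
```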
VMware DRS will automatically migrate virtual machines between hosts to keep the cluster balanced, reducing the risk of a "noisy neighbor" virtual machine monopolizing CPU and memory resources within a host at the expense of other virtual machines running on the same host. VMware Storage I/O Control will automatically throttle hosts and virtual machines when it detects that a datastore is congested. This ensures that a noisy neighbor virtual machine does not monopolize storage I/O resources; Storage I/O Control ensures each virtual machine receives the resources it is entitled to by leveraging the shares mechanism.
No impact. Virtual machines are automatically migrated between hosts with no downtime by VMware DRS.
No impact. Virtual machines and ESXi hosts are throttled by Storage I/O Control automatically based on their entitlement relative to the configured shares or the maximum amount of IOPS configured. For more information on Storage I/O Control, see the whitepaper: http://www.vmware.com/files/pdf/techpaper/VMW-vSphere41SIOC.pdf
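The shares mechanism can be illustrated with a small calculation: during congestion, each virtual machine's I/O entitlement is proportional to its share of the total shares on the datastore. The share values and IOPS figure below are illustrative.

```python
# Sketch of shares-based entitlement as Storage I/O Control applies it
# during congestion: I/O is divided in proportion to configured shares.
def entitlements(datastore_iops, shares):
    """Map each VM to its proportional IOPS entitlement."""
    total = sum(shares.values())
    return {vm: datastore_iops * s / total for vm, s in shares.items()}

# Illustrative values: a noisy neighbor cannot exceed its share
alloc = entitlements(10000, {"vm-normal": 1000, "vm-high": 2000, "vm-noisy": 1000})
```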
MAINTAINING RUNNING WORKLOAD
ESXi hosts will be configured with a minimum of two physical paths to each required network (port group) to make sure a single link failure does not impact platform or virtual machine connectivity; this includes the management and vMotion networks. The Load Based Teaming mechanism will be used to avoid oversubscribed network links. ESXi hosts will be configured with a minimum of two physical paths to each LUN or NFS share to make sure a single storage path failure does not result in an impact to service. The Path Selection Plug-in will be selected based on the storage vendor's best practices.
No impact. Failover will occur with no interruption to service. Configuration of failover and failback, as well as corresponding physical settings such as PortFast, is a requirement.
MAINTAINING WORKLOAD ACCESSIBILITY
Availability vCenter Server will run as a virtual machine and make use of vCenter Server Heartbeat.
Failure Impact
vCenter Server Heartbeat provides a clustered solution for vCenter Server with fully automated failover between nodes, thereby providing near-zero downtime. vCenter Heartbeat or Oracle RAC provides a clustered solution for the vCenter database with fully automated failover between nodes, thereby providing zero downtime. Oracle RAC supports the resiliency of the vCloud Director and Chargeback databases, as they maintain vCloud Director state information and the critical Chargeback data required for customer billing, respectively. While not required for maintaining workload accessibility, clustering the Chargeback database makes sure providers can accurately produce customer billing information. In the event that one of the data collectors for a group goes offline, the others will pick up the load so that transactions are still captured by vCenter Chargeback.
VMware vCenter Database resiliency is provided with vCenter Heartbeat if MS SQL is used or Oracle RAC if Oracle is used. VMware vCloud component database resiliency is provided with Oracle RAC.
vCenter Chargeback data collectors (for vCenter Server, vCloud Director, and vShield Manager) shall be distributed to guard against failure
VCLOUD INFRASTRUCTURE PROTECTION
Component
Availability
Failure Impact
vShield Manager
vShield Manager will receive the additional protection of VMware FT, resulting in seamless failover between hosts in the event of a host failure. VM Monitoring is enabled at the cluster level within HA and uses the VMware Tools heartbeat to verify that vShield Manager is alive. When a virtual machine fails, and thus the VMware Tools heartbeat is not updated, VM Monitoring will verify whether any storage or networking I/O has occurred over the last 120 seconds before the virtual machine is restarted. It is also recommended to create a scheduled backup of vShield Manager to an external FTP or SFTP server.
Infrastructure availability: yes; service availability: no. vShield Edge devices will continue to run without management control, but no additional edge appliances can be deployed, and no modifications to existing appliances can be made, until the service comes back online.
vCenter Chargeback
vCenter Chargeback virtual machines will be deployed as a two node, load balanced cluster. Multiple Chargeback data collectors can be deployed remotely to avoid a single point of failure.
There is no impact on infrastructure availability or customer virtual machines. While not required for maintaining workload accessibility, clustering the vCenter Chargeback servers makes sure providers can accurately produce customer billing information and usage reports. Session state of users connected via the portal to the failed instance will be lost; they will be able to reconnect immediately. No impact to customer virtual machines.
vCloud Director
The vCloud Director virtual machines will be deployed as a load balanced, highly available clustered pair in an N+1 redundancy setup, with the option to scale out when the environment requires it. vShield Edge can be deployed through the API and vCloud Director. To provide network reliability, VM Monitoring will be enabled; in the case of a vShield Edge guest OS failure, VM Monitoring will restart the vShield Edge device. vShield Edge appliances do not have VMware Tools and thus are not monitored as part of VMware HA guest OS monitoring.
vShield Edge
Partial temporary loss of service. vShield Edge provides a possible connection into the organization. No impact to customer virtual machines or VM Remote Console (VMRC) access. All external network routed connectivity will be lost if a vShield Edge appliance
vShield VPN can connect multiple cloud deployments. For example, an organization's virtual datacenter at a service provider on the West Coast could be connected with the organization's virtual datacenter at a service provider on the East Coast. NOTE: Each internal subnet must have a unique address space to connect successfully. Because vShield also provides address translation, it is possible to deploy multiple organization virtual datacenters at different providers using the same RFC1918 address space; unique subnets are required only to connect them.
It is possible to have a permanent VPN from a router-based VPN, such as a Cisco 2821 Integrated Services Router, to a cloud environment with vShield Edge. This can also be accomplished via a Linux-based gateway, since vShield VPN is compatible with Openswan, a Linux IPsec implementation. Client software is generally not supported, although robust clients with static IPs that support pre-shared key authentication can connect.
To configure vShield VPN, the endpoint connecting to the vShield Edge device must support the following:
- IKE for ISAKMP/Oakley
- Pre-shared key authentication mode
- 3DES or AES128 encryption
- SHA1 authentication
- Diffie-Hellman Group 2/5 (1024-bit/1536-bit, respectively)
- PFS (Perfect Forward Secrecy)
- ESP tunnel mode
It must also disable ISAKMP aggressive mode.
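As an illustrative sketch only, the parameter set above maps onto an Openswan `conn` section roughly as follows. The connection name, addresses, and subnets are placeholders, and any real deployment should be validated against the provider's documented settings.

```
# /etc/ipsec.conf -- illustrative Openswan conn matching the parameter
# set above (placeholders only, not a validated configuration)
conn to-vshield-edge
    authby=secret            # pre-shared key authentication
    ike=aes128-sha1;modp1024 # AES128 / SHA1 / DH Group 2 (1024 bit)
    esp=aes128-sha1          # ESP tunnel encryption and authentication
    pfs=yes                  # Perfect Forward Secrecy
    aggrmode=no              # ISAKMP aggressive mode disabled
    type=tunnel              # ESP tunnel mode
    left=203.0.113.10        # local gateway (placeholder)
    leftsubnet=192.168.10.0/24
    right=198.51.100.20      # vShield Edge public address (placeholder)
    rightsubnet=192.168.20.0/24
    auto=start
```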
11.2 Compliance
Audit concepts applied to a cloud environment, such as segmentation and monitoring, reveal new challenges. Elasticity may break old segmentation controls and the ability to isolate sensitive data within a rapidly growing environment. Role-based access controls and virtual firewalls must also demonstrate compatibility with audit requirements for segmentation, including detailed audit trails and logs. Can a provider guarantee that an offline image with sensitive data in memory is accessible only by authorized users, and can a log tell who accessed it and when? Multiple admin-level roles are necessary for cloud resource management. The complexity of cloud environments, coupled with new and different technology, requires careful audits to document and detail compliance. The following is a set of common audit concerns within the cloud.
CONCERN    DETAIL
Hypervisor
An additional layer of technology is present in every cloud and therefore presents a different attack surface. It introduces a layer between the traditional processing environment and the physical layer, which brings vulnerabilities of its own as well as new paths of attack related to its communication with the layers above and below. It may expose sensitive data when not configured and monitored properly; physical and logical isolation has always been an audit concern. The ease and speed of change in a virtualized environment within cloud computing, often called elasticity, makes the review of segmentation controls even more critical to compliance. A cloud can make much more efficient use of hardware, but this brings auditors to assess whether sensitive data is at risk simply from the proximity of other virtual systems managed to a lesser level of security. Some compliance standards, for example, require one primary function per server (or virtual server), as illustrated below.
[Figure: compliance segmentation example — management controls (firewall, IDS, antivirus) over a distributed network and hypervisor cluster of ESXi hosts, separating regulated-data application VMs from other application VMs on shared physical storage, routers, and switches]
In a cloud environment, remote network access becomes the only path for customers to manage an environment. Previously, physical access was audited for equipment installation and modification; now software controls customer access capabilities. Authorization software has to be more sophisticated to handle every user, group, and role request for a cloud customer. The ability of systems to quickly change and move within the cloud gives auditors a need to track this. Cloud environments make extensive use of short-lived instances; virtual machines may have a lifecycle far shorter than physical systems because they are so easy to provision and then repurpose. Systems often share data across large arrays. Permanence of data is also affected by environments that push as much storage as possible through high-speed memory to avoid the latency of spinning disk. Customers need a view of their audit trails that is unique to their own use of the cloud environment and that can be used for investigations. Providers must enhance the sophistication of existing log tools to keep up with the new technology and new management practices within a cloud environment.
11.3.1 Example Compliance Use Cases for Logs

The following use cases are a sample of events that benefit from careful logging and monitoring in the cloud environment. Other examples may include unauthorized services or protocols, remote login successes, and certificate changes.

Shared accounts. An investigation is initiated to review network outages and finds multiple instances of an Administrator account logged into critical servers before the failure. Shared accounts make it very difficult to trace fault to one individual; it is impossible to determine from the logs on that system which person was logged into the account that made the error. Therefore, usage must be tied to an individual user ID and unique password, with correct time, to aid in investigations. Systems also should be configured to detect any and all use of generic IDs such as an administrator or root account and trace them to unique identities.

User account changes. A malicious user finds an unpatched flaw in an environment that allows elevation of privileges. That user then uses system-level privileges to create a new bogus user object from which to launch further attacks. A user object is, for example, a Microsoft Windows domain or local user account. User object logs can be used to determine when a name was changed or an account added. This assists in detecting unauthorized actions or users trying to hide attacks.

Unauthorized software. Malware, or a new virtual machine instance in the cloud, can be found in system object logs. A system must track system objects that are added, removed, or modified. This can be very helpful during installation to monitor system changes caused by software.

11.3.2 VMware vCloud Log Sources for Compliance

Customers should be able to retrieve logs from all areas that are relevant and unique to their organization. Retrieval should be possible in a programmatic fashion, such as via an API, to allow for automated queries.
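The shared-account detection described in 11.3.1 can be sketched as a simple log scan. The log-line format below is a simplified placeholder, not any product's actual format; a real implementation would parse the provider's log schema.

```python
# Sketch of the "detect any and all use of generic IDs" control:
# scan authentication log lines for shared accounts such as root or
# Administrator. The log format here is a simplified placeholder.
GENERIC_ACCOUNTS = {"root", "administrator", "admin"}

def generic_logins(log_lines):
    """Return (timestamp, account) pairs for generic-account logins."""
    hits = []
    for line in log_lines:
        # placeholder format: "<timestamp> LOGIN <account> <source-ip>"
        parts = line.split()
        if len(parts) >= 3 and parts[1] == "LOGIN" and parts[2].lower() in GENERIC_ACCOUNTS:
            hits.append((parts[0], parts[2]))
    return hits

sample = [
    "2011-06-01T10:00:00Z LOGIN alice 10.0.0.5",
    "2011-06-01T10:05:00Z LOGIN Administrator 10.0.0.9",
]
```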
Log collection nodes must be added to a cloud environment, as illustrated below.
[Figure: log collection in the cloud environment — a log collection node alongside management controls (firewall, IDS, antivirus) over the distributed network and hypervisor cluster of ESXi hosts running nonregulated-data application VMs on physical storage, routers, and switches]
Logs generated by VMware components must be maintained by the provider but must also be available to tenants. Tenants should be able to download, in raw format, all vCloud Director and vShield Edge logs pertaining to their organizations and networks. Logs with customer identifiers should be flagged or indexed for retrieval. The following diagram illustrates the architecture of vCloud components and log collection.
[Figure: vCloud component log architecture — vCloud Director cells and database, vCenter Server and database, vCenter Orchestrator, vShield Manager, vShield Edge, and vSphere hosts in the customer environment feeding a log collector, with log analysis, reporting, and archive]
The following table lists the logs to which a vCloud tenant must have access.
VMWARE COMPONENT    PROVIDER LOGS    TENANT LOGS
VMware vCloud Director (vCD)
vCenter Server (VC)
vSphere Server (ESXi)
Chargeback Manager (CBM)
vCenter Orchestrator (vCO)
vShield Manager (VSM)
vShield Edge (VSE)
Table 14. vCloud Component Logs
Other components in the cloud environment also generate logs that must be maintained by the provider, but direct tenant access to them is not required.
OTHER COMPONENT    PROVIDER LOGS    TENANT LOGS
vCloud Director DB (Oracle)
VMware Virtual Center Database
VMware vCenter Chargeback Database
MS SQL Server
Linux (vCD)
Windows System Logs (CBM, vCO, VC Server)
Table 15. Other Component Logs
Logs in the vCloud Datacenter environment can further be categorized into four logical business layers:

1. Cloud Application: Represents the external interface with which the enterprise administrators of the cloud interact. These administrators are authenticated and authorized at this layer and have no direct or indirect access to the underlying infrastructure; they interact only with the Business Orchestration layer.

2. Business Orchestration: Represents both the configuration entities of the cloud and the governance policies that control the cloud deployment, including vCenter Chargeback and the following elements:
- Service Catalog: Presents the different service levels available and their configuration elements.
- Service Design: Represents the service level and specific configuration elements, along with any policies defined.
- Configuration Management Database (CMDB): Represents the system of record, which may be federated with the enterprise CMDB.
- Service Provision: Represents the final configuration specification.

3. Service Orchestration: Represents the provisioning logic for the cloud infrastructure. This layer consists of an orchestration director system and automation elements for network, storage, security, and server/compute: vCenter Server (VC), VMware vCloud Director (vCloud Director), and vCenter Orchestrator (vCO).

4. Infrastructure Layer: Represents the physical and virtual compute, network, storage, hypervisor, security, and management components: vSphere Server (ESXi), vShield Manager (VSM), and vShield Edge (VSE).
[Figure: Four-layer architecture. Enterprise users and administrators reach the cloud through an API or custom/enterprise portal (Cloud Application layer); business administrators use business and application management functions around service provision (Business Orchestration layer); service management drives server, security, and network automation (Service Orchestration layer); and infrastructure administrators manage the physical and virtual compute, storage, network, security, hypervisor, and management components, with firewall and IDS controls (Infrastructure layer).]
The abstraction of these four layers and their security controls helps illustrate audit and compliance requirements for proper authentication and segregation. Cloud provider administrator accounts, for example, should be maintained in a central repository integrated with two-factor authentication. Different tiers of cloud deployments (VPDCs) would be made available to enterprise users. The preceding diagram illustrates the architecture of vCloud components and log collection across these layers.
Audit events, as defined earlier, are not the only event types. Diagnostic logs, described below, contain information about system operation events and are stored as files in the local file system of each cell's OS. Diagnostic logs can be useful for problem resolution but are not intended to preserve a trail of system interactions for audit. Each VMware vCloud Director cell creates several diagnostic log files, described in the Viewing the vCloud Director Logs section of the VMware vCloud Director Administrator's Guide. Audit logs, on the other hand, do record significant actions, including login and logout. A syslog server can be set up during installation, as detailed in the vCloud Director Installation Guide. Exporting the logs to a syslog server is required for compliance for several reasons:

1. Database logs are not retained after 90 days, while logs transmitted via syslog can be retained as long as desired.
2. It allows audit logs from all cells to be viewed together in a central location at the same time.
3. It protects the audit logs from loss on the local system due to failure, a lack of disk space, compromise, and so on.
4. It supports forensics operations in the face of problems like those listed above.
5. It is the method by which many log management and Security Information and Event Management (SIEM) systems integrate with vCloud Director. This enables:
   a. Correlation of events and activities across vCloud Director, vShield, vSphere, and even the physical hardware layers of the stack
   b. Integration of cloud security operations with the rest of the cloud provider's or enterprise's security operations, cutting across physical, virtual, and cloud infrastructures
6. Logging to a remote system, rather than the system the cell is deployed on, provides data integrity; that is, it inhibits tampering. A compromise of the cell does not necessarily enable access to, or alteration of, the audit log.
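The guides referenced above cover the vCloud Director side of syslog setup; on the collection path itself, a minimal illustration (assuming the cell OS runs rsyslog, and with a placeholder collector host name) is a single forwarding rule:

```
# /etc/rsyslog.conf (assumed path) - forward all messages to the central
# collector over TCP ("@@"); a single "@" would select UDP instead.
*.* @@syslog.example.com:514
```

Retention then becomes a property of the collector rather than the cell, addressing the 90-day database limitation in reason 1 above.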
Security
A front-end firewall is typically deployed in front of the load balancer. In some environments, additional firewalls may be located between the vCloud Director cells and the resource tiers, including vCenter. Load balancers may also provide NAT/SNAT (source network address translation) and are typically configured to provide this for the clustered cells. It is also recommended to secure access between cells and the other management and resource group components. Refer to the vCloud Director Installation and Configuration Guide for the ports that must be opened.
Single vCloud Director site and scope. This architecture covers load balancing of a single vCloud Director site or instance. It does not cover client application load balancing or global load balancing.

Sizing recommendations for number of cells. In general, VMware recommends that the number of vCloud Director cell instances = n + 1, where n is the number of vCenter Server instances providing compute resources for cloud consumption. Based on the Service Definition, two vCloud Director cell instances should be sufficient and allow for upgradability (upgrading one vCloud Director cell, then the other) and high availability.

The following considerations and details also apply:

Time synchronization. Multiple vCloud Director cells require NTP (Network Time Protocol), which is a best practice for all elements of the vCloud infrastructure. Consult www.vmware.com/files/pdf/Timekeeping-In-VirtualMachines.pdf for more information on how to set up NTP.

Load balancer redundancy. At least two load balancers in an HA configuration should be used to reduce single points of failure. There are multiple strategies for this, depending on the vendor or software used.

Proxy console address. Each load-balanced vCloud Director cell requires a proxy console IP address, which should be provided by the load balancer in most cases.

Cloud service URL. The cloud service URL should be mapped to the address provided via the load balancer. This is configured in the vCloud Director administrator GUI as well as in the load balancer configuration. This is the address that should be used to check the health status of the vCloud Director cell.

Cell roles. Some vCloud Director cell roles, for example image transfer, may consume significant resources. All cells can perform the same set of tasks, but policies can be set that affect which cells are used; see the advanced configuration settings.

Load balancer session persistence. Sessions are generally carried over secure channels and are terminated at the cells. Because of this, session persistence should be enabled using SSL.

Load balancing algorithm. Least connections or round-robin is generally acceptable.

vCloud Director cell status health checks. Each load balancer service should be configured to check the health of the individual vCloud Director cells. Since each cell responds via HTTPS, this can be configured quickly via the IP and API end point URL. Load balancers may support other types of health checks. Generally, services are checked every few seconds up to every 30 seconds based on load; a good starting point is 5 seconds. Example GUI URL: https://my.cloud.com/cloud/ Example API URL: https://my.cloud.com/api/versions In the second example, the versions supported by this end point should be returned as XML.

Service IP and ports. The service IP should be specified appropriately before cells are added to the service group. Typically, port 443 (standard HTTPS) is the only port exposed. Load balancers can also provide Layer 7 content switching or direction, which may allow a vCloud Director configuration to send certain types of client traffic to dedicated cells. While cells can perform any function, they can be utilized in a dedicated fashion if they only receive those types of requests.

Connection mapping. When a cell joins an existing cluster, it may try to load balance sessions. This may affect connection mapping through the load balancer, as the load balancer is unaware of the balancing happening within the cell cluster.

Session persistence. VMware vShield Edge's load balancing functionality does not currently support SSL session persistence and may not be suitable at this point for this application.
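The API health check described above can be exercised with any HTTP client. The sketch below uses the example host my.cloud.com from the text; the live request is left commented (it assumes curl and network reachability), and a canned response stands in for the XML a healthy cell would return:

```shell
# Against a live cell (certificate validation skipped with -k for brevity):
# curl -sk https://my.cloud.com/api/versions

# A healthy end point returns the supported versions as XML, for example:
response='<SupportedVersions><VersionInfo><Version>1.0</Version></VersionInfo></SupportedVersions>'

# A simple load-balancer-style health check can look for the root element:
if echo "$response" | grep -q '<SupportedVersions>'; then
  echo "cell healthy"
else
  echo "cell unhealthy"
fi
```

A load balancer performing the same probe would mark the cell down whenever the expected element is absent or the HTTPS request fails.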
a. Generate and import the CA-signed wildcard certificate for vCloud Director
b. Generate the wildcard untrusted certificate with the necessary details
c. Generate the Certificate Signing Request
d. Send the CSR to the Certificate Authority
e. The CA will send you back the cert with a root cert and possibly an intermediate certificate
f. Import the root certificate into your keystore
g. Import the intermediate certificate (if there is one; it depends on your CA)
h. When you run the vCloud configuration script it looks for the correct aliases; you need to create them by cloning the wildcard alias
i. Import the wildcard certificate for each alias
j. Delete an entry from the keystore, if needed
For further information, refer to the keytool man page, keytool(1).

Generating and Importing the CA-Signed Wildcard Certificate for vCloud Director

Each vCloud Director host requires two TLSv1/SSL certificates, one for each of its IP addresses. You can use wildcard certificates to simplify the addition of new cells. The following is the procedure to create the keystore for vCloud Director to use. You create the keystore once and then copy it to each new cell added.

NOTE: The keytool(1) certificate management utility is installed with vCloud Director; the full path is /opt/vmware/cloud-director/jre/bin/keytool

Example:
$ mkdir -p /opt/keystore
$ chown vcloud:vcloud /opt/keystore
$ cd /opt/keystore

Generating the Wildcard Untrusted Certificate with the Necessary Details

When you run the command below, you will be asked specific questions in order to generate a wildcard certificate.

Example:
$ /opt/vmware/cloud-director/jre/bin/keytool -keystore certificates.vmware -storetype JCEKS -storepass <certificate passwd> -genkey -keyalg RSA -alias wildcard
What is your first and last name?
  [Unknown]: johndoe.example.com
What is the name of your organizational unit?
  [Unknown]: Cloud Engineering
What is the name of your organization?
  [Unknown]: Example, Inc.
What is the name of your City or Locality?
  [Unknown]: Cambridge
What is the name of your State or Province?
  [Unknown]: Massachusetts
What is the two-letter country code for this unit?
  [Unknown]: US
Is CN=johndoe.example.com, OU=Cloud Engineering, O=Example, Inc., L=Cambridge, ST=Massachusetts, C=US correct?
  [no]: yes
Enter key password for <wildcard>
  (RETURN if same as keystore password):

Generating the Certificate Signing Request

Once the certificate-signing request has been approved, you should be able to obtain the server certificate in ASCII format. This contains the public key of the certificate that corresponds with the private key that was created when you generated the certificate request. The ASCII representation of the approved certificate will look something like:

-----BEGIN CERTIFICATE-----
MIICtDCCAl6gAwIBAgIBHTANBgkqhkiG9w0BAQQFADB5MQswCQYDVQQGEwJVUzEO
MAwGA1UECBMFVGV4YXMxDzANBgNVBAcTBkF1c3RpbjEZMBcGA1UEChMQU3VuIE1p
Y3Jvc3lzdGVtczEQMA4GA1UECxMHaVBsYW5ldDEcMBoGA1UEAxMTQ2VydGlmaWNh
dGUgTWFuYWdlcjAeFw0wMTEyMTMyMjQ4MzRaFw0wMjEyMTMyMjQ4MzRaMH0xCzAJ
BgNVBAYTAlVTMQ4wDAYDVQQIEwVUZXhhczEPMA0GA1UEBxMGQXVzdGluMRkwFwYD
VQQKExBTdW4gTWljcm9zeXN0ZW1zMRAwDgYDVQQLEwdpUGxhbmV0MSAwHgYDVQQD
ExdzdW5maXJlLmNlbnRyYWwuc3VuLmNvbTCBnzANBgkqhkiG9w0BAQEFAAOBjQAw
gYkCgYEA45ji7uN6LqdCVehxPnuKzzqq2PfFaTaWZhqYro903bSdf9Qp+sGabDfJ
qrspwrgjE2Owwia4H3InHpvzkcf2O2uB89bwm/RyHhU5AGt3wVFmsgN16XIL+smk
CBBSJo31RTuIZw11ZYkkqMZzVY84sBpGJ0mtD1xnWhsb0MYN5bMCAwEAAaOBiDCB
hTARBglghkgBhvhCAQEEBAMCBsAwDgYDVR0PAQH/BAQDAgTwMB0GA1UdDgQWBBTM
q/dM6tawKUfRnqKupfhU3HlDAfBgNVHSMEGDAWgBRCOcKaQjn6l7Ft1OqsPcji
gwlFuTAgBgNVHREEGTAXgRVuZWlsLmEud2lsc29uQHN1bi5jb20wDQYJKoZIhvcN
AQEEBQADQQApqNPdeDARy6xWu7/SfxAH12S/wPD43OYJqbt/R2y5/Zpde/arIyhk
fucakqo0Bk9DlI/A4IR+b9Q56k6Ce8tO
-----END CERTIFICATE-----

Example:
$ /opt/vmware/cloud-director/jre/bin/keytool -keystore certificates.vmware -storetype JCEKS -storepass <certificate passwd> -certreq -alias wildcard -file wildcard.csr

Send the CSR to the Certificate Authority

The CA will send you back the cert with a root cert and possibly an intermediate certificate.
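Before importing the CA reply, it can be useful to inspect it. The sketch below is an optional step not in the original procedure; it assumes the openssl command is available and generates a throwaway self-signed certificate purely so the inspection command has something to run against. On a real system you would point -in at the file returned by the CA:

```shell
# Create a disposable self-signed certificate for demonstration purposes.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -subj "/CN=johndoe.example.com" \
  -keyout /tmp/demo-key.pem -out /tmp/demo-cert.pem 2>/dev/null

# Inspect the subject, issuer, and validity window of a PEM certificate.
openssl x509 -in /tmp/demo-cert.pem -noout -subject -issuer -dates
```

Confirming that the subject matches the CSR you submitted avoids importing the wrong certificate into the keystore.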
Import the Root Certificate into your Keystore

Example:
$ /opt/vmware/cloud-director/jre/bin/keytool -storetype JCEKS -storepass <certificate passwd> -keystore certificates.vmware -import -alias root -file EntrustRootCertificate.cer
This step, or subsequent certificate import steps, may result in a "trust this certificate?" prompt from keytool(1), so do not be surprised when this happens. Also be aware that the keystore (certificates.vmware) needs to be the same keystore that the CSR was produced from.

NOTE: The keystore needs to be in JCEKS format, as vCloud Director 1.0 does not support other formats.

Import the Intermediate Certificate

Example:
$ /opt/vmware/cloud-director/jre/bin/keytool -storetype JCEKS -storepass <certificate passwd> -keystore certificates.vmware -import -alias intermediate -file EntrustCrossCertificate.cer

Create Correct Aliases for vCloud Director's Configuration Script

$ /opt/vmware/cloud-director/jre/bin/keytool -storetype JCEKS -storepass <certificate passwd> -keystore certificates.vmware -keyclone -alias wildcard -dest http
$ /opt/vmware/cloud-director/jre/bin/keytool -storetype JCEKS -storepass <certificate passwd> -keystore certificates.vmware -keyclone -alias wildcard -dest consoleproxy

Import the Wildcard Certificate

Example:
$ /opt/vmware/cloud-director/jre/bin/keytool -storetype JCEKS -storepass <certificate passwd> -keystore certificates.vmware -import -alias http -file wildcard.cer
$ /opt/vmware/cloud-director/jre/bin/keytool -storetype JCEKS -storepass <certificate passwd> -keystore certificates.vmware -import -alias consoleproxy -file wildcard.cer

Copy the Certificates

It may be obvious to the reader, but take care not to copy the keystores into the $VCLOUD_HOME location, as this will only cause confusion when you run the vCloud Director configure tool, which creates the certificates and proxycertificates files in $VCLOUD_HOME/etc. More important, ensure that the permissions are set such that the vcloud user has read access, and that the files are not owned by root with the group and other bits unset.

Deleting an Entry from the Keystore

If you need to delete an entry from the keystore, you can use the following procedure.
Example:
$ /opt/vmware/cloud-director/jre/bin/keytool -storetype JCEKS -storepass <certificate passwd> -keystore certificates.vmware -delete -alias consoleproxy

Viewing and Verifying the Keystore Entries

The keystore used by vCloud Director is a storage mechanism for cryptographic tokens. These tokens are also known as entries, and we will see shortly how to view the certificate details of these keystore entries using keytool(1). Each entry in a keystore is identified by a different alias or entry name. Entries also store their last modified date/time. The keystore is also password protected: the password is required to load the keystore, and a password is requested when saving a keystore for the first time. There are various types of keystores available, such as:

JKS - Java KeyStore, Sun's keystore format.
JCEKS - Java Cryptography Extension KeyStore.
PKCS #12 - Public-Key Cryptography Standards KeyStore.
BKS - Bouncy Castle KeyStore.
UBER - Bouncy Castle UBER KeyStore.
However, vCloud Director v1.0 currently supports only JCEKS, which is a more secure version of JKS. As previously mentioned, to view the contents of a keystore that has already been created, you will need to know the password; you can only verify a keystore that you have created or whose password you know. The following example shows how to view the contents of the keystore entries. There are two ways of passing the password to keytool(1):

$ /opt/vmware/cloud-director/jre/bin/keytool -storetype JCEKS -list -v -keystore certificates.vmware
Enter keystore password: <enter keystore password>

and

$ /opt/vmware/cloud-director/jre/bin/keytool -storetype JCEKS -list -storepass <keystore passwd> -v -keystore certificates.vmware

If you do not provide the correct keystore password, you will see the following error message:

$ /opt/vmware/cloud-director/jre/bin/keytool -storetype JCEKS -list -v -keystore certificates.vmware
Enter keystore password: <incorrect keystore password>
keytool error: java.io.IOException: Keystore was tampered with, or password was incorrect
java.io.IOException: Keystore was tampered with, or password was incorrect
$

It is also worth mentioning that the vCloud Director cells do not share a common keystore. As the installation guide states, each cell has its own keystore that contains two keys: one for the HTTP service and one for the console proxy. If you use a wildcard certificate, you still need two entries in the keystore with the appropriate aliases, and since there is no cell-specific information in the keystore, you can use it for the configure step on each cell. It is also important that the keystore file be protected by restrictive operating system permissions, and that the password for the file be stored securely or prompted for on server startup.
Things to Check

If you encounter any issues while setting up and implementing certificates in vCloud Director, such as the infamous "Cryptographic error", here are a few things to check:

1. Make sure that certificates.vmware is a JCEKS keystore, and not the default JKS type.
2. Make sure that any password for the keystore does not use special characters that the shell will interpret when the keystore is created, or that the arguments to keytool(1) were quoted to prevent the shell from interpreting them (namely the dollar sign followed by anything, the asterisk, and so on). There was a case where a customer used something along the lines of -storepass ca$$ and wondered why the configure script complained that the password was wrong. It worked from one particular shell because bash expands $$ to the PID of the current shell; the vCloud Director configure script does not expand the value and kept using ca$$ instead of ca<someint>.
3. Can you successfully list the certificates in the keystore, or print them, using /opt/vmware/cloud-director/jre/bin/keytool? Given that the same Java platform code is used under the covers to read in the keystore, decrypt values, and so on, this is a good check to make sure that you have not done something wrong. A failure typically surfaces as a Java stack trace such as:

at com.sun.crypto.provider.JceKeyStore.engineLoad(DashoA13*..)
at java.security.KeyStore.load(KeyStore.java:1185)
at sun.security.tools.KeyTool.doCommands(KeyTool.java:620)
at sun.security.tools.KeyTool.run(KeyTool.java:172)
at sun.security.tools.KeyTool.main(KeyTool.java:166)
Detailed Example and Output of the vCloud Director HTTP Certificate

$ /opt/vmware/cloud-director/jre/bin/keytool -storetype JCEKS -list -storepass <keystore passwd> -v -keystore certificates.vmware
Alias name: http
Creation date: Jan 18, 2011
Entry type: PrivateKeyEntry
Certificate chain length: 3
Certificate[1]:
Owner: CN=*.vcloud.vmware.com.us, OU=VMware Architects, O=VMware Inc, L=Hillview Avenue, C=US
Issuer: CN=Entrust Certification Authority - L1C, OU=(c) 2009 Entrust, Inc., OU=www.entrust.net/rpa is incorporated by reference, O=Entrust, Inc., C=US
Serial number: 3f865g44
Valid from: Tue Jan 18 13:11:31 EST 2011 until: Sat Jan 19 16:28:25 EST 2013
Certificate fingerprints:
MD5: 80:04:E8:70:0E:F1:E8:8B:68:A7:7B:16:C7:69:60:FF
SHA1: B4:C8:82:88:63:2C:E6:08:6C:23:7D:5C:53:4A:C0:54:16:0A:08:88
Signature algorithm name: SHA1withRSA
Version: 3

Extensions:

#2: ObjectId: 2.5.29.14 Criticality=false
SubjectKeyIdentifier [
KeyIdentifier [
0000: 18 F6 DA D8 68 E0 A5 F7 E3 CA 81 C8 56 3C 42 47 ....h.......V<BG
0010: 8F E0 DB F7 ....
]
]

#3: ObjectId: 1.3.6.1.5.5.7.1.1 Criticality=false
AuthorityInfoAccess [
[ accessMethod: 1.3.6.1.5.5.7.48.1
accessLocation: URIName: http://ocsp.entrust.net]
]

#5: ObjectId: 2.5.29.32 Criticality=false
CertificatePolicies [
[CertificatePolicyId: [1.2.840.113533.7.75.2]
[PolicyQualifierInfo: [
qualifierID: 1.3.6.1.5.5.7.2.1
qualifier: 0000: 16 1A 68 74 74 70 3A 2F 2F 77 77 77 2E 65 6E 74 ..http://www.ent
0010: 72 75 73 74 2E 6E 65 74 2F 72 70 61 rust.net/rpa
]] ]
]

#8: ObjectId: 2.5.29.35 Criticality=false
AuthorityKeyIdentifier [
KeyIdentifier [
0000: 1E F1 AB 89 06 F8 49 0F 01 33 77 EE 14 7A EE 19 ......I..3w..z..
0010: 7C 93 28 4D ..(M
]
]

Certificate[2]:
Owner: CN=Entrust Certification Authority - L1C, OU=(c) 2009 Entrust, Inc., OU=www.entrust.net/rpa is incorporated by reference, O=Entrust, Inc., C=US
Issuer: CN=Entrust.net Certification Authority (2048), OU=(c) 1999 Entrust.net Limited, OU=www.entrust.net/CPS_2048 incorp. by ref. (limits liab.), O=Entrust.net
Serial number: 7245h1sx
Valid from: Fri Dec 11 07:43:54 EST 2009 until: Wed Dec 11 08:13:54 EST 2019
Certificate fingerprints:
MD5: 2F:B3:00:F2:FA:12:7B:BD:82:95:70:05:96:17:DB:BE
SHA1: 61:43:AF:68:F7:B3:3A:47:94:04:74:98:8B:05:F7:B1:62:96:98:42
Signature algorithm name: SHA1withRSA
Version: 3
Extensions:
#3: ObjectId: 2.5.29.14 Criticality=false
SubjectKeyIdentifier [
KeyIdentifier [
0000: 1E F1 AB 89 06 F8 49 0F 01 33 77 EE 14 7A EE 19 ......I..3w..z..
0010: 7C 93 28 4D ..(M
]
]

#4: ObjectId: 1.3.6.1.5.5.7.1.1 Criticality=false
AuthorityInfoAccess [
[ accessMethod: 1.3.6.1.5.5.7.48.1
accessLocation: URIName: http://ocsp.entrust.net]
]

#6: ObjectId: 2.5.29.32 Criticality=false
CertificatePolicies [
[CertificatePolicyId: [2.5.29.32.0]
[PolicyQualifierInfo: [
qualifierID: 1.3.6.1.5.5.7.2.1
qualifier: 0000: 16 1A 68 74 74 70 3A 2F 2F 77 77 77 2E 65 6E 74 ..http://www.ent
0010: 72 75 73 74 2E 6E 65 74 2F 72 70 61 rust.net/rpa
]] ]
]

#7: ObjectId: 2.5.29.35 Criticality=false
AuthorityKeyIdentifier [
KeyIdentifier [
0000: 55 E4 81 D1 11 80 BE D8 89 B9 08 A3 31 F9 A1 24 U...........1..$
0010: 09 16 B9 70 ...p
]
]

Certificate[3]:
Owner: CN=Entrust.net Certification Authority (2048), OU=(c) 1999 Entrust.net Limited, OU=www.entrust.net/CPS_2048 incorp. by ref. (limits liab.), O=Entrust.net
Issuer: CN=Entrust.net Certification Authority (2048), OU=(c) 1999 Entrust.net Limited, OU=www.entrust.net/CPS_2048 incorp. by ref. (limits liab.), O=Entrust.net
Serial number: 8392k3du
Valid from: Sat Dec 25 04:50:51 EST 1999 until: Wed Jul 25 00:15:12 EST 2029
Certificate fingerprints:
MD5: EE:29:31:BC:32:7E:9A:E6:E8:B5:F7:51:B4:34:71:90
SHA1: 50:30:06:09:1D:97:D4:F5:AE:39:F7:CB:E7:92:7D:7D:65:2D:34:31
Signature algorithm name: SHA1withRSA
Version: 3
Extensions:
#3: ObjectId: 2.5.29.14 Criticality=false
SubjectKeyIdentifier [
KeyIdentifier [
0000: 55 E4 81 D1 11 80 BE D8 89 B9 08 A3 31 F9 A1 24 U...........1..$
0010: 09 16 B9 70 ...p
]
]
******************************************* *******************************************
Detailed Example and Output of the vCloud Director CONSOLEPROXY Certificate

$ /opt/vmware/cloud-director/jre/bin/keytool -storetype JCEKS -list -storepass <keystore passwd> -v -keystore consoleproxy.vmware
Alias name: consoleproxy
Creation date: Jan 18, 2011
Entry type: PrivateKeyEntry
Certificate chain length: 3
Certificate[1]:
Owner: CN=*.vcloud.vmware.com.us, OU=VMware Architects, O=VMware Inc, L=Hillview Avenue, C=US
Issuer: CN=Entrust Certification Authority - L1C, OU=(c) 2009 Entrust, Inc., OU=www.entrust.net/rpa is incorporated by reference, O=Entrust, Inc., C=US
Serial number: 3f865g44
Valid from: Tue Jan 18 13:11:31 EST 2011 until: Sat Jan 19 16:28:25 EST 2013
Certificate fingerprints:
MD5: 80:04:E8:70:0E:F1:E8:8B:68:A7:7B:16:C7:69:60:FF
SHA1: B4:C8:82:88:63:2C:E6:08:6C:23:7D:5C:53:4A:C0:54:16:0A:08:88
Signature algorithm name: SHA1withRSA
Version: 3
Extensions:
#2: ObjectId: 2.5.29.14 Criticality=false
SubjectKeyIdentifier [
KeyIdentifier [
0000: 18 F6 DA D8 68 E0 A5 F7 E3 CA 81 C8 56 3C 42 47 ....h.......V<BG
0010: 8F E0 DB F7 ....
]
]

#3: ObjectId: 1.3.6.1.5.5.7.1.1 Criticality=false
AuthorityInfoAccess [
[ accessMethod: 1.3.6.1.5.5.7.48.1
accessLocation: URIName: http://ocsp.entrust.net]
]

#5: ObjectId: 2.5.29.32 Criticality=false
CertificatePolicies [
[CertificatePolicyId: [1.2.840.113533.7.75.2]
[PolicyQualifierInfo: [
qualifierID: 1.3.6.1.5.5.7.2.1
qualifier: 0000: 16 1A 68 74 74 70 3A 2F 2F 77 77 77 2E 65 6E 74 ..http://www.ent
0010: 72 75 73 74 2E 6E 65 74 2F 72 70 61 rust.net/rpa
]] ] ]
#8: ObjectId: 2.5.29.35 Criticality=false
AuthorityKeyIdentifier [
KeyIdentifier [
0000: 1E F1 AB 89 06 F8 49 0F 01 33 77 EE 14 7A EE 19 ......I..3w..z..
0010: 7C 93 28 4D ..(M
]
]

Certificate[2]:
Owner: CN=Entrust Certification Authority - L1C, OU=(c) 2009 Entrust, Inc., OU=www.entrust.net/rpa is incorporated by reference, O=Entrust, Inc., C=US
Issuer: CN=Entrust.net Certification Authority (2048), OU=(c) 1999 Entrust.net Limited, OU=www.entrust.net/CPS_2048 incorp. by ref. (limits liab.), O=Entrust.net
Serial number: 7245h1sx
Valid from: Fri Dec 11 07:43:54 EST 2009 until: Wed Dec 11 08:13:54 EST 2019
Certificate fingerprints:
MD5: 2F:B3:00:F2:FA:12:7B:BD:82:95:70:05:96:17:DB:BE
SHA1: 61:43:AF:68:F7:B3:3A:47:94:04:74:98:8B:05:F7:B1:62:96:98:42
Signature algorithm name: SHA1withRSA
Version: 3
Extensions:
#3: ObjectId: 2.5.29.14 Criticality=false
SubjectKeyIdentifier [
KeyIdentifier [
0000: 1E F1 AB 89 06 F8 49 0F 01 33 77 EE 14 7A EE 19 ......I..3w..z..
0010: 7C 93 28 4D ..(M
]
]

#4: ObjectId: 1.3.6.1.5.5.7.1.1 Criticality=false
AuthorityInfoAccess [
[ accessMethod: 1.3.6.1.5.5.7.48.1
accessLocation: URIName: http://ocsp.entrust.net]
]

#6: ObjectId: 2.5.29.32 Criticality=false
CertificatePolicies [
[CertificatePolicyId: [2.5.29.32.0]
[PolicyQualifierInfo: [
qualifierID: 1.3.6.1.5.5.7.2.1
qualifier: 0000: 16 1A 68 74 74 70 3A 2F 2F 77 77 77 2E 65 6E 74 ..http://www.ent
0010: 72 75 73 74 2E 6E 65 74 2F 72 70 61 rust.net/rpa
]] ] ]
#7: ObjectId: 2.5.29.35 Criticality=false
AuthorityKeyIdentifier [
KeyIdentifier [
0000: 55 E4 81 D1 11 80 BE D8 89 B9 08 A3 31 F9 A1 24 U...........1..$
0010: 09 16 B9 70 ...p
]
]

Certificate[3]:
Owner: CN=Entrust.net Certification Authority (2048), OU=(c) 1999 Entrust.net Limited, OU=www.entrust.net/CPS_2048 incorp. by ref. (limits liab.), O=Entrust.net
Issuer: CN=Entrust.net Certification Authority (2048), OU=(c) 1999 Entrust.net Limited, OU=www.entrust.net/CPS_2048 incorp. by ref. (limits liab.), O=Entrust.net
Serial number: 8392k3du
Valid from: Sat Dec 25 04:50:51 EST 1999 until: Wed Jul 25 00:15:12 EST 2029
Certificate fingerprints:
MD5: EE:29:31:BC:32:7E:9A:E6:E8:B5:F7:51:B4:34:71:90
SHA1: 50:30:06:09:1D:97:D4:F5:AE:39:F7:CB:E7:92:7D:7D:65:2D:34:31
Signature algorithm name: SHA1withRSA
Version: 3
Extensions:
#3: ObjectId: 2.5.29.14 Criticality=false
SubjectKeyIdentifier [
KeyIdentifier [
0000: 55 E4 81 D1 11 80 BE D8 89 B9 08 A3 31 F9 A1 24 U...........1..$
0010: 09 16 B9 70 ...p
]
]
The host building block used in this example has the following characteristics (the per-host CPU capacity follows from the first three rows: PCPU,host = 2 x 4 x 2.4 = 19.2GHz):

ITEM                    VALUE    UNITS
Sockets per host        2        sockets
Cores per socket        4        cores
Speed per core          2.4      GHz
Memory per host         64       GB

Calculating the total memory available is very straightforward; it is simply the total amount of RAM for the vSphere host. Total CPU resources are calculated using the formulas below, which use the following variables:

Nnodes - the number of (non-redundant) nodes in a cluster.
Nredundant - the minimum number of redundant nodes.
Rredundancy,HA - a targeted ratio of redundancy, expressed as a real number greater than one. A ratio such as 1.10 indicates a ten percent overhead committed to availability; for example, a 10 node provider vDC with a 1.10 redundancy ratio would require 11 nodes to deliver the appropriate capacity. Note that this level of redundancy may vary depending on the class of service offering being delivered on that provider vDC.

The redundancy ratio is determined with the equation:

Rredundancy,HA = (Nnodes + Nredundant) / Nnodes

For example, for a cluster of ten nodes containing two redundant nodes:

Rredundancy,HA = (8 + 2) / 8 = 1.25
Once the ratio of redundancy is calculated, the number of units of consumption per provider vDC can be determined. For CPU resources per cluster, with PCPU,host = 19.2GHz:

NCPU,cluster = (Nnodes x PCPU,host) / Rredundancy,HA = (8 x 19.2GHz) / 1.25 = 122.88GHz

The number of memory units of consumption is calculated the same way. For our example, where Nmem,host = 64GB:

Nmem,cluster = (Nnodes x Nmem,host) / Rredundancy,HA = (8 x 64GB) / 1.25 = 409.6GB

So we've now established that our example provider virtual datacenter has 122.88GHz of available CPU and 409.6GB of available memory, taking a vSphere cluster redundancy of N+2 into account. Next we'll look at some guidance for capacity management as it applies to each of the consumption models.
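The worked example above can be reproduced with a short script; the awk calls are used only for floating-point arithmetic, and all input values are the ones given in the text:

```shell
# 8 usable nodes plus 2 redundant nodes; 19.2GHz CPU and 64GB RAM per host.
N_NODES=8; N_REDUNDANT=2; P_CPU=19.2; N_MEM=64

R=$(awk "BEGIN { printf \"%.2f\", ($N_NODES + $N_REDUNDANT) / $N_NODES }")
CPU=$(awk "BEGIN { printf \"%.2f\", ($N_NODES * $P_CPU) / $R }")
MEM=$(awk "BEGIN { printf \"%.1f\", ($N_NODES * $N_MEM) / $R }")

echo "Redundancy ratio: $R"          # 1.25
echo "Cluster CPU:      ${CPU}GHz"   # 122.88GHz
echo "Cluster memory:   ${MEM}GB"    # 409.6GB
```

Changing N_REDUNDANT (for example, to model an N+1 rather than N+2 class of service) immediately shows the capacity cost of the chosen redundancy level.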
Pay-As-You-Go Model

When an organization virtual datacenter is created in the Pay-As-You-Go model, a resource pool is instantiated with expandable reservations. As such, the customer organization virtual datacenters contained on that provider virtual datacenter can grow to consume all of the available provider virtual datacenter resources. While this could be true in any vSphere environment, the added challenge in a vCloud is the use of reservations at the vApp level. When an organization virtual datacenter is created out of a provider virtual datacenter using the Pay-As-You-Go consumption model, a percentage guarantee is configured for CPU and memory. This is applied to each vApp, or to each virtual machine within a vApp. For example, if the service provider configures the organization virtual datacenter with a 50% guarantee for CPU and a 75% guarantee for memory, and the customer then creates a virtual machine consuming 1 vCPU at 1GHz and 1GB of memory, a reservation for that virtual machine will be set at 50% of 1GHz (0.5GHz) and 75% of 1GB (0.75GB of memory). Since there is no way of knowing how a customer will define their virtual machine templates in their private customer catalogs, coupled with the fact that organization virtual datacenters can expand on demand, VMware recommends the following:

- Calculate the total available CPU and memory resources (less an amount reserved for global catalog templates), adjusted by the cluster redundancy ratio, at the provider virtual datacenter level.
- Establish a CPU and memory %RESERVED threshold at the provider virtual datacenter level.
- Establish the %RESERVED for the provider virtual datacenter at a number in the 60% range initially.
- As the total amount of reserved CPU or reserved memory approaches the %RESERVED threshold, do not deploy new organization virtual datacenters in that provider virtual datacenter without adding additional resources.
If the corresponding vSphere cluster has reached its maximum point of expansion, a new provider virtual datacenter should be deployed and any new organization virtual datacenter s should be assigned to the new provider virtual datacenter. In this way there is 40% of expansion capacity for the existing organization virtual datacenters in the case where the provider virtual datacenter has reached its maximum point of expansion. CPU and memory over-commitment can be applied, and if so the %RESERVED value should be set lower than if no over-commitment is applied due to the unpredictability of the virtual machine sizes being deployed (and hence reservations being established) Monitor the %RESERVED on a regular basis and adjust the value according to historical usage as well as project demand Allocation Model When an organization virtual datacenter is created in the Allocation Model, a non-expandable resource pool is instantiated with a %guaranteed value for CPU and memory that was specified. Using a %guaranteed value of 75%, this means if an organization virtual datacenter is created specifying 100GHz of CPU and 100GB of memory, a resource pool is created for that organization virtual datacenter with a reservation of 75GHz and limit of 100GHz for CPU and a reservation of 75GB with a limit of 100GB for memory. The additional 25%, in this example, is not guaranteed and can be accessed only if its available across the provider virtual datacenter. In other words, the 25% can be over-committed by the provider at the provider virtual datacenter level and therefore may not be available depending on how ALL of the organization virtual datacenters in that provider virtual datacenter are using it. At the virtual machine level, when a virtual machine is deployed, it is instantiated with no CPU reservation but with a memory reservation equal to the virtual machines memory allocation multiplied by the %guaranteed. 
Despite the fact that no CPU reservation is set at the virtual machine level, the total amount of CPU allocated across all virtual machines in that organization virtual datacenter is still subject to the overall CPU reservation of the organization virtual datacenter established by the %guarantee value.
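The reservation arithmetic described above for the Pay-As-You-Go and Allocation models can be sketched as follows. This is a minimal illustration, not VMware code; the function and key names are invented:

```python
# Minimal illustrative sketch, not VMware code: the reservation arithmetic
# described in the text for the Pay-As-You-Go and Allocation models.
# Function and key names are invented for illustration.

def payg_vm_reservation(vcpu_ghz, mem_gb, cpu_guarantee, mem_guarantee):
    """Pay-As-You-Go: the %guarantee is applied to each virtual machine."""
    return {"cpu_ghz": vcpu_ghz * cpu_guarantee, "mem_gb": mem_gb * mem_guarantee}

def allocation_pool(cpu_ghz, mem_gb, guarantee):
    """Allocation Model: the %guarantee sets the org vDC resource pool
    reservation; the allocated amount sets the limit."""
    return {
        "cpu":    {"reservation": cpu_ghz * guarantee, "limit": cpu_ghz},
        "memory": {"reservation": mem_gb * guarantee,  "limit": mem_gb},
    }

# Examples from the text: a 1 GHz / 1 GB virtual machine with a 50% CPU and
# 75% memory guarantee, and a 100 GHz / 100 GB org vDC with a 75% guarantee.
print(payg_vm_reservation(1.0, 1.0, 0.50, 0.75))  # cpu_ghz 0.5, mem_gb 0.75
print(allocation_pool(100, 100, 0.75)["cpu"])     # reservation 75.0, limit 100
```

The same inputs reproduce the 0.5 GHz / 0.75 GB per-VM reservations and the 75 GHz / 75 GB pool reservations worked through in the text.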
TECHNICAL WHITE PAPER / 90
Based on this use of reservations in the Allocation Model, VMware recommends the following:
• Calculate the total available CPU and memory resources (less an amount reserved for global catalog templates), adjusted by the cluster redundancy ratio, at the provider virtual datacenter level.
• Determine how much resource, at the provider virtual datacenter level, you want to make available for expanding organization virtual datacenters that are deployed to that provider virtual datacenter.
• Establish a CPU and memory %RESERVED (guaranteed, not allocated) threshold at the provider virtual datacenter level based on the %guaranteed value, less the amount reserved for growth. The remaining unreserved resources are available to all organization virtual datacenters for bursting.
• As the total amount of reserved CPU or reserved memory approaches the %RESERVED threshold, do not deploy new organization virtual datacenters in that provider virtual datacenter without adding additional resources. If the corresponding vSphere cluster has reached its maximum point of expansion, deploy a new provider virtual datacenter and assign any new organization virtual datacenters to it. This leaves a predetermined amount of capacity available for expanding the existing organization virtual datacenters when the provider virtual datacenter has reached its maximum point of expansion.
• CPU and memory over-commitment can be applied, but it should be based only on the amount of unreserved resources at the provider virtual datacenter level, allowing for over-committing the resources available for organization virtual datacenter bursting.
• Monitor the %RESERVED on a regular basis and adjust the value according to historical usage as well as projected demand.

Reservation Model

When an organization virtual datacenter is created in the Reservation Model, a non-expandable resource pool is instantiated with reservation and limit values equal to the amount of resources allocated. This means that if an organization virtual datacenter is created allocating 100 GHz of CPU and 100 GB of memory, a resource pool is created for that organization virtual datacenter with a reservation and limit of 100 GHz for CPU, and a reservation and limit of 100 GB for memory. At the virtual machine level, when a virtual machine is deployed, it is instantiated with no reservation or limit for either CPU or memory.

Based on this use of reservations in the Reservation Model, VMware recommends the following:
• Calculate the total available CPU and memory resources (less an amount reserved for global catalog templates), adjusted by the cluster redundancy ratio, at the provider virtual datacenter level.
• Determine how much resource, at the provider virtual datacenter level, you want to make available for expanding organization virtual datacenters that are deployed to that provider virtual datacenter.
• Establish a CPU and memory %RESERVED threshold at the provider virtual datacenter level equivalent to the capacity of the underlying vSphere cluster, taking into account HA redundancy.
• As the total amount of reserved CPU or reserved memory approaches the %RESERVED threshold, do not deploy new organization virtual datacenters in that provider virtual datacenter without adding additional resources. If the corresponding vSphere cluster has reached its maximum point of expansion, deploy a new provider virtual datacenter and assign any new organization virtual datacenters to it.
In this way, a predetermined amount of capacity remains available for expanding the existing organization virtual datacenters when the provider virtual datacenter has reached its maximum point of expansion.
• No over-commitment can be applied to the provider virtual datacenter in the Reservation Model, because the reservation is set at the resource pool level.
• Monitor the %RESERVED on a regular basis and adjust the value according to historical usage as well as projected demand.
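Across all three models, checking reserved capacity against the %RESERVED threshold reduces to a simple comparison. A hypothetical sketch (the function name and sample figures are invented for illustration):

```python
# Hypothetical sketch of the %RESERVED threshold check recommended for all
# three consumption models; names and sample values are invented.

def approaching_threshold(reserved, total, pct_reserved=0.60):
    """Return True when reserved CPU or memory has reached the %RESERVED
    threshold, meaning no new org vDCs should be deployed to this provider
    vDC without adding resources."""
    return reserved / total >= pct_reserved

# A provider vDC with 200 GHz usable and 130 GHz already reserved, checked
# against the initial 60% threshold suggested for Pay-As-You-Go.
print(approaching_threshold(130, 200))  # True (65% >= 60%)
```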
Storage

VMware vCloud Director uses a largest-available-capacity algorithm for deploying virtual machines to datastores. Storage capacity must be managed both on an individual datastore basis and in the aggregate for a provider virtual datacenter. In addition to considering VMware storage allocation best practices, manage capacity at the datastore level using the largest virtual machine storage configuration, in terms of units of consumption, offered in the service catalog when determining the amount of spare capacity to reserve. For example, if using 1 TB datastores (100 storage units of consumption, based on a 10 GB unit of consumption) and the largest virtual machine storage configuration is 6 storage units of consumption (60 GB), then applying the VMware best practice of approximately 80% datastore utilization would imply managing to 82 storage units of consumption. This results in 82% datastore utilization and reserves capacity equivalent to 3 of the largest virtual machines offered in the service catalog in terms of storage.
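The datastore sizing example above can be expressed directly in storage units of consumption. This is an illustrative sketch, not VMware code:

```python
# Illustrative sketch, not VMware code: the datastore sizing example from the
# text, expressed in storage units of consumption.

UNIT_GB = 10  # one storage unit of consumption, as in the example

def manage_to_units(datastore_gb, largest_vm_units, reserve_vm_count=3):
    """Units to manage a datastore to, reserving headroom equivalent to a few
    of the largest virtual machines offered in the service catalog."""
    total_units = datastore_gb // UNIT_GB
    return total_units - reserve_vm_count * largest_vm_units

# 1 TB datastore (100 units) and a largest VM of 6 units (60 GB).
print(manage_to_units(1000, 6))  # 82 -> manage to ~82% datastore utilization
```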
IP Addresses
• Available IP addresses to be assigned in support of a dedicated external network for an organization, such as for Internet access or hardware-based firewall rules.
• IP addresses assigned to specific organizations, which must be tracked to determine what is available for a shared external organization network.
• VLANs available for VLAN-backed pool assignment, if required.
• Additional vCloud Director Network Isolation networks that can be assigned from the vNetwork Distributed Switch (only 1016 ephemeral ports per vNetwork Distributed Switch), if used.
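Tracking assigned versus available addresses for an external network is straightforward with standard tooling. A hypothetical sketch using Python's standard ipaddress module (the subnet and assignments are invented):

```python
# Hypothetical sketch of IP address tracking for an external network, using
# Python's standard ipaddress module; the subnet and assignments are invented.
import ipaddress

def free_addresses(cidr, assigned):
    """Count host addresses in the network that are not yet assigned."""
    hosts = set(ipaddress.ip_network(cidr).hosts())
    return len(hosts - {ipaddress.ip_address(a) for a in assigned})

# A /28 external network (14 usable hosts) with two addresses assigned.
print(free_addresses("192.0.2.0/28", ["192.0.2.1", "192.0.2.2"]))  # 12
```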
Attribute                           Monitored Per
%RESERVED CPU                       provider vDC, organization vDC
%RESERVED Memory                    provider vDC, organization vDC
CPU utilization                     provider vDC, organization vDC
Memory utilization                  provider vDC, organization vDC
Datastore utilization               provider vDC
Transfer store utilization          vCloud
Network IP addresses available      vCloud
Network IP addresses consumed       vCloud
Network VLANs available             vCloud
Network ephemeral ports consumed    vCloud

Table 21. Capacity Monitoring Metrics
Once thresholds have been exceeded, the group responsible for capacity management of the vCloud should be notified to add capacity. Account for the time required to add the physical components necessary to increase the capacity of a provider virtual datacenter.

A vCloud-aware capacity management tool should be deployed. Whichever tool is chosen, the capacity model can be used to forecast new provider virtual datacenter capacity utilization as well as for ongoing capacity management of existing provider virtual datacenters. It should also account for expansion triggers based on provisioning timeframes. Once a provider virtual datacenter has had its total amount of available resources calculated, no adjustments to that provider virtual datacenter, such as adding or removing hosts, should be made without updating the calculated value. This model may need to be altered if long-term CPU and memory reservations are not at the levels they were designed for. An increase in the resources allocated to an organization virtual datacenter can affect the remaining capacity of a full provider virtual datacenter. Monitor full provider virtual datacenters on a weekly basis. Review the resource consumption of virtual machines within an organization virtual datacenter for trends indicating that the resources purchased for that organization virtual datacenter are insufficient. vCenter CapacityIQ, while not vCloud Director aware, can be used to provide insight into provider virtual datacenter utilization and trends.
Unit of Consumption    Value
PUC (CPU)              1 GHz
MUC (memory)           1 GB
DUC (storage)          10 GB
Taking such an approach enables more efficient capacity management, since the vApp component virtual machine resource allocations are predefined in the service catalog, allowing vCloud infrastructure resource consumption to be predicted more accurately. Each organization is provided with a finite quantity of resources (in the case of the allocation and reservation consumption models) from one or more provider virtual datacenters, in the form of organization virtual datacenters. This means that as the organization consumes the organization virtual datacenter resources, a trigger point must be defined to make sure steps are taken to expand the organization virtual datacenter. First, the resource consumption limits for an organization's organization virtual datacenters need to be defined; these limits define when action must be taken to remove the potential capacity issue.
• CCPULimit (limit: 80%). The limit for allocating CPU resources within the organization virtual datacenter before expansion is required. This value varies depending on the consumption model being used. From an organization virtual datacenter perspective, reservation values should be considered equal to the amount of CPU allocated, as reservation values are not available to the organization administrator.
• CmemLimit (limit: 80%). The limit for allocating memory resources within the organization virtual datacenter before expansion is required. This value varies depending on the consumption model being used. From an organization virtual datacenter perspective, reservation values should be considered equal to the amount of memory allocated, as reservation values are not available to the organization administrator.
The amount of CPU and memory resources varies depending on the size of the organization virtual datacenter contracted for. The following table provides an example of the values needed to calculate an organization virtual datacenter's capacity.
Item                                               Variable    Value    Units
Total organization virtual datacenter CPU          SorgvDC     50       GHz
Total organization virtual datacenter memory       MorgvDC     64       GB
The number of capacity units available within this organization virtual datacenter is found using the following equations.

Determining organization virtual datacenter memory units of consumption:

    MUC,orgvDC = (CmemLimit x MorgvDC) / MUC

Based on the information in the tables above, the total memory units of consumption for the organization virtual datacenter are calculated as follows:

    MUC,orgvDC = (0.8 x 64) / 1 = 51.2
This results in 51.2 memory units of consumption for the sample organization virtual datacenter.

Determining organization virtual datacenter CPU units of consumption:

    PUC,orgvDC = (CCPULimit x SorgvDC) / PUC

Based on the information in the tables above, the CPU units of consumption for the organization virtual datacenter are calculated as follows:

    PUC,orgvDC = (0.8 x 50) / 1 = 40
This results in 40 CPU units of consumption for this sample organization virtual datacenter.
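The two units-of-consumption equations above can be sketched in Python with the sample values (SorgvDC = 50 GHz, MorgvDC = 64 GB, 80% limits, MUC = 1 GB, PUC = 1 GHz). The function names are invented:

```python
# The memory and CPU units-of-consumption equations from this section,
# sketched with the sample values; function names are invented.

MUC_GB = 1.0   # memory unit of consumption (MUC)
PUC_GHZ = 1.0  # CPU unit of consumption (PUC)

def mem_units(c_mem_limit, m_orgvdc_gb):
    """MUC,orgvDC = (CmemLimit x MorgvDC) / MUC."""
    return c_mem_limit * m_orgvdc_gb / MUC_GB

def cpu_units(c_cpu_limit, s_orgvdc_ghz):
    """PUC,orgvDC = (CCPULimit x SorgvDC) / PUC."""
    return c_cpu_limit * s_orgvdc_ghz / PUC_GHZ

print(mem_units(0.8, 64))  # 51.2 memory units of consumption
print(cpu_units(0.8, 50))  # 40.0 CPU units of consumption
```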
Item                              Units                   Description
Organization virtual datacenter   Identifier              The organization virtual datacenter in which the virtual machine resides.
Date built                        Date                    The date the virtual machine is built.
CPU                               Units of consumption    The number of CPU units of consumption allocated to the virtual machine.
Memory                            Units of consumption    The number of memory units of consumption allocated to the virtual machine.
Storage                           GB                      The amount of storage (GB) allocated to the virtual machine.
Determine Trending Variables With the information recorded as described above, it is possible to determine the rate of organization virtual datacenter consumption.
• T (time), in weeks: the time between points of observation.
• NcpuUC: the total number of CPU units of consumption required for the forecasted virtual machines.
• NmemUC: the total number of memory units of consumption required for the forecasted virtual machines.
• NVGB, in GB: the total amount of storage required for the forecasted virtual machines.
• Tpurchase, in weeks: the amount of time to procure additional organization virtual datacenter resources.

The rate of consumption over an observation period is then:

    NcpuUC / T    (CPU units of consumption per week)
    NmemUC / T    (memory units of consumption per week)
    NVGB / T      (GB of storage per week)
Determining the Trend

It is important to understand that the rate of increase dictates how far in advance additional organization virtual datacenter resources need to be purchased. The following table presents a sample virtual machine forecast for a quarter, along with a sample time-to-purchase value.
Attribute        Value
NcpuUC           12
NmemUC           12
NVGB             360 GB
Tpurchase        2 weeks
NcpuUC,cluster
NmemUC,cluster

Table 27. Sample Organization Virtual Datacenter Trending Information
In this example, NcpuUC,free and NmemUC,free represent the number of free resources within an organization virtual datacenter at which point additional organization virtual datacenter resources should be ordered. To determine the trigger point for ordering, Equation 6 should be used if no pipeline data exists.

Determining Trigger Point for Ordering Capacity Using Trends (Equation 6):

    Nfree = forecasted consumption per period x Tpurchase

For storage, in this example, the trigger point is calculated at 720 GB:

    NVGB,free = NVGB x Tpurchase = 360 x 2 = 720 GB
Determine the Automatic Point of Expansion

Based on the example above, additional organization virtual datacenter resources would need to be ordered when the available units of CPU or memory fall to 24 GHz or 24 GB respectively, or when available storage capacity falls to 720 GB. The additional capacity must be on order at that point, or it will not be available in time to meet demand. Currently there are no tools available to assist in organization virtual datacenter capacity management; however, it is possible to develop scripts to gather the pertinent information with languages such as PowerCLI.
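The core arithmetic such a script might compute is simple. A sketch in Python rather than PowerCLI; the 12-unit-per-period forecast values are assumptions chosen to be consistent with the 24-unit and 720 GB triggers in this example:

```python
# Sketch of the ordering-trigger arithmetic such a script might compute
# (shown in Python rather than PowerCLI). The 12-unit-per-period forecasts
# are assumed values consistent with the 24-unit and 720 GB triggers in
# this example.

def order_trigger(forecast_per_period, t_purchase_periods):
    """Free capacity at which additional resources must be ordered."""
    return forecast_per_period * t_purchase_periods

print(order_trigger(360, 2))  # 720 GB storage trigger
print(order_trigger(12, 2))   # 24 units of CPU or memory
```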
VMware, Inc. 3401 Hillview Avenue Palo Alto CA 94304 USA Tel 877-486-9273 Fax 650-427-5001 www.vmware.com
Copyright 2010 VMware, Inc. All rights reserved. This product is protected by U.S. and international copyright and intellectual property laws. VMware products are covered by one or more patents listed at http://www.vmware.com/go/patents. VMware is a registered trademark or trademark of VMware, Inc. in the United States and/or other jurisdictions. All other marks and names mentioned herein may be trademarks of their respective companies. Item No: VMW_11Q1_WP_ArchitectingvCloud_p100_R2