High-Throughput and Predictable VM Scheduling for High-Density Workloads

  • In the increasingly competitive public-cloud marketplace, improving the efficiency of data centers is a major concern. One way to improve efficiency is to consolidate as many VMs onto as few physical cores as possible, provided that performance expectations are not violated. However, as a prerequisite for increased VM densities, the hypervisor’s VM scheduler must allocate processor time efficiently and in a timely fashion. As we show in this thesis, contemporary VM schedulers leave substantial room for improvements in both regards when facing challenging high-VM-density workloads that frequently trigger the VM scheduler. As root causes, we identify (i) high runtime overheads and (ii) unpredictable scheduling heuristics. To better support high VM densities, we propose Tableau, a VM scheduler that guarantees a minimum processor share and a maximum bound on scheduling delay for every VM in the system. Tableau combines a low-overhead, core-local, table-driven dispatcher with a fast on-demand table-generation procedure (triggered on VM creation/teardown) that employs scheduling techniques typically used in hard real-time systems. Further, we show that, owing to its focus on efficiency and scalability, Tableau provides comparable or better throughput than existing Xen schedulers in dedicated-core scenarios as are commonly employed in public clouds today. Tableau also extends this design by providing the ability to use idle cycles in the system to perform low-priority background work, without affecting the performance of primary VMs, a common requirement in public clouds. Finally, VM churn and workload variations in multi-tenant public clouds result in changing interference patterns at runtime, resulting in performance variation. In particular, variation in last-level cache (LLC) interference has been shown to have a significant impact on virtualized application performance in cloud environments. Tableau employs a novel technique for dealing with dynamically changing interference, which involves periodically regenerating tables with the same guarantees on utilization and scheduling latency for all VMs in the system, but having different LLC interference characteristics. We present two strategies to mitigate LLC interference: a randomized approach, and one that uses performance counters to detect VMs running cache-intensive workloads and selectively mitigate interference.
Author:Manohar Vanga
URN (permanent link):urn:nbn:de:hbz:386-kluedo-65518
Advisor:Björn Brandenburg
Document Type:Doctoral Thesis
Language of publication:English
Publication Date:2021/09/01
Year of Publication:2021
Publishing Institute:Technische Universität Kaiserslautern
Granting Institute:Technische Universität Kaiserslautern
Acceptance Date of the Thesis:2020/11/26
Date of the Publication (Server):2021/09/01
Number of page:XVII, 199
Faculties / Organisational entities:Fachbereich Informatik
DDC-Cassification:0 Allgemeines, Informatik, Informationswissenschaft / 004 Informatik
Licence (German):Creative Commons 4.0 - Namensnennung, nicht kommerziell, keine Bearbeitung (CC BY-NC-ND 4.0)