There are several use cases where it's beneficial to be able to automatically rearrange the VM instances in a cloud into a different placement. These usually fall into one of two categories, which could be called defragmentation and rebalancing:
- Defragmentation - condense VMs onto fewer physical VM host
machines (conceptually similar to VMware
Example use cases:
- Reduce power consumption
- Increase CPU / RAM / I/O utilization (e.g. by over-committing and/or memory page sharing)
- Evacuate physical servers
- Rebalancing - spread the VMs evenly across as many physical VM host
machines as possible (conceptually similar to
Example use cases:
- Optimise workloads for performance, by reducing CPU / I/O hotspots
- Maximise headroom on each physical machine
- Reduce thermal hotspots in order to reduce power consumption
Custom rearrangements may be required according to other IT- or business-driven policies, e.g. only rearrange VM instances relating to a specific workload, in order to increase locality of reference, reduce latency, respect availability zones, or facilitate other out-of-band workflows.
It is clear from the above that VM placement policies are likely to
vary greatly across clouds, and sometimes even within a single cloud.
OpenStack Compute (nova) has fairly sophisticated scheduling capabilities
which can be configured to implement some of the above policies on an
incremental basis, i.e. every time a VM instance is started or
migrated, the destination VM host can be chosen according to filters
and weighted cost functions. However, this approach is somewhat
limited, because the placement policies are implemented cloud-wide,
and only consider one migration at a time.
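The incremental "filter, then weigh" idea can be illustrated with a minimal sketch. This is not nova's actual scheduler API: the `Host` class, `ram_filter`, the two weighers and `choose_host` are all hypothetical names, and only RAM is considered. It shows how the same mechanism yields either a rebalancing or a defragmentation policy depending on the weigher, and also why it is limited: each call decides the destination of a single VM.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical in-memory model of a VM host (RAM only, in MiB)."""
    name: str
    ram_total: int
    ram_used: int

    @property
    def ram_free(self) -> int:
        return self.ram_total - self.ram_used


def ram_filter(host: Host, vm_ram: int) -> bool:
    """Filter: keep only hosts with enough free RAM for the VM."""
    return host.ram_free >= vm_ram


def spread_weigher(host: Host) -> float:
    """Rebalancing policy: prefer the host with the most free RAM."""
    return host.ram_free


def pack_weigher(host: Host) -> float:
    """Defragmentation policy: prefer the fullest host that still fits."""
    return -host.ram_free


def choose_host(hosts, vm_ram, weigher):
    """Pick a destination for ONE VM: filter first, then take the best weight."""
    candidates = [h for h in hosts if ram_filter(h, vm_ram)]
    if not candidates:
        return None
    return max(candidates, key=weigher)


hosts = [Host("h1", 8192, 6144), Host("h2", 8192, 1024), Host("h3", 8192, 7680)]
spread_choice = choose_host(hosts, 1024, spread_weigher)
pack_choice = choose_host(hosts, 1024, pack_weigher)
```

With these example hosts, the spreading weigher selects the emptiest host and the packing weigher selects the fullest host that still has room, but neither looks beyond the one VM being placed.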
Since OpenStack is rapidly evolving to the point where a VM's network and storage dependencies can be live migrated along with the workload in a near-seamless fashion, it is advantageous to develop mechanisms for implementing finer-grained placement policies, where not only is VM rearrangement performed automatically, but the policies themselves can be varied dynamically over time as workload requirements change.
Developing algorithms to determine optimal placement is distinctly non-trivial. For example, the defragmentation scenario above is a complex variant of the bin packing problem, which is NP-hard. The following constraints add significant complexity to the problem:
- A useful real-world solution should take into account not only the RAM footprint of the VMs, but also CPU, disk, and network.
- It also needs to ensure that SLAs are maintained whilst any rearrangement takes place.
- If the cloud is sufficiently near capacity, it may not be possible to rearrange the VMs from their current placement to a more optimal placement without first shutting down some VMs, which could be prohibited by the SLAs.
- Even if the arrangement is achievable purely via a sequence of live migrations, the algorithm must also be sensitive to the performance impact on running workloads when performing multiple live migrations, since live migrations require intensive bursts of network I/O in order to synchronize the VM's memory contents between the source and target hosts, followed by a momentary freezing of the VM as it flips from the source to the target. This trade-off between optimal resource utilization and service availability means that a sub-optimal final placement may be preferable to an optimal one.
- In the case where the hypervisor is capable of sharing memory pages between VMs, the algorithm should try to place together VMs which are likely to share memory pages (e.g. VMs running the same OS platform, OS version, software libraries, or applications). A research paper published in 2011 demonstrated that VM packing which optimises placement in this fashion can be approximated in polynomial time, achieving a 32% to 50% reduction in servers and a 25% to 57% reduction in memory footprint compared to sharing-oblivious algorithms.
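To make the bin packing connection concrete, here is a minimal sketch of the classic first-fit decreasing heuristic applied to two resource dimensions (RAM and CPUs). It is a textbook approximation, not the algorithm from the paper mentioned above, and it deliberately ignores disk, network, SLAs and page sharing. The function name and the data layout are assumptions made for illustration. Note that it only produces a final placement; it says nothing about how to migrate there.

```python
def first_fit_decreasing(vms, host_capacity):
    """Pack VMs onto as few identical hosts as the heuristic manages.

    vms:           dict of VM name -> (ram, cpus)       (hypothetical layout)
    host_capacity: (ram, cpus) available on every host
    Returns a list of hosts, each a list of VM names.
    """
    hosts = []  # each entry: [free_ram, free_cpus, [vm names]]
    # Consider the largest VMs first (ordered by RAM, then CPUs).
    ordered = sorted(vms.items(), key=lambda item: item[1], reverse=True)
    for name, (ram, cpus) in ordered:
        if ram > host_capacity[0] or cpus > host_capacity[1]:
            raise ValueError(f"{name} does not fit on an empty host")
        for host in hosts:
            # First fit: take the first host with room in BOTH dimensions.
            if host[0] >= ram and host[1] >= cpus:
                host[0] -= ram
                host[1] -= cpus
                host[2].append(name)
                break
        else:
            # No existing host fits, so open a new one.
            hosts.append([host_capacity[0] - ram, host_capacity[1] - cpus, [name]])
    return [host[2] for host in hosts]


example_vms = {
    "a": (8, 2),
    "b": (8, 4),
    "c": (4, 2),
    "d": (4, 4),
    "e": (2, 1),
}
placement = first_fit_decreasing(example_vms, host_capacity=(16, 8))
```

For these five example VMs and hosts with 16 units of RAM and 8 CPUs, the heuristic needs two hosts. Adding further dimensions or sharing-aware grouping changes the fit test and the sort order, which is where the real complexity lies.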
As noted by the above paper, this area of computer science is still evolving. There is one constant, however: any rearrangement solution must not only provide a final VM placement optimised according to the chosen constraints, but also a sequence of migrations to it from the current placement. There will often be multiple migration sequences reaching the optimised placement from the current one, and their efficiency can vary widely. In other words, there are two questions which need answering:
- Given a starting placement A, which is the best final placement B to head for?
- What's the best way to get from A to B?
The above considerations strongly suggest that the first question is much harder to answer than the second. I propose that by adopting a divide-and-conquer approach, solving the second may simplify the first. Decoupling the two should also provide a mechanism for comparatively evaluating the effectiveness of potential answers to the first. Another bonus of this decoupling is that it should be possible for the path-finding algorithm to also discover opportunities for parallelizing live migrations when walking the path, so that the target placement B can be reached more quickly.
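One way the parallelization opportunity could look is sketched below. Given an already valid sequential path, it groups migrations into rounds such that no host takes part in more than one migration per round, while migrations that share a host keep their original relative order. Two assumptions are mine rather than part of the proposal: that the validity of a state can be checked per host, and that one concurrent migration per host is an acceptable limit on network load. The `(vm, source, destination)` tuple format and the function name are hypothetical.

```python
def parallel_rounds(sequence):
    """Group a valid sequential migration path into parallel rounds.

    sequence: list of (vm, source_host, destination_host) tuples.
    Returns a list of rounds; each round is a list of migrations whose
    hosts are pairwise disjoint, so they could run concurrently.
    """
    last_round = {}  # host -> index of the latest round that touches it
    rounds = []
    for vm, src, dst in sequence:
        # Run strictly after every earlier migration touching src or dst.
        r = max(last_round.get(src, -1), last_round.get(dst, -1)) + 1
        if r == len(rounds):
            rounds.append([])
        rounds[r].append((vm, src, dst))
        last_round[src] = r
        last_round[dst] = r
    return rounds


path = [
    ("vm1", "h1", "h2"),
    ("vm2", "h3", "h4"),
    ("vm3", "h2", "h3"),
    ("vm4", "h5", "h1"),
]
rounds = parallel_rounds(path)
```

In this example the four sequential migrations collapse into two rounds: `vm1` and `vm2` touch disjoint hosts and can run together, while `vm3` and `vm4` each have to wait for a host used in the first round.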
Therefore, for this Hack Week project, I intend to design and implement an algorithm which reliably calculates a reasonably optimal sequence of VM migrations from a given initial placement to a given final placement. My goals are as follows:
- If there is any migration path, the algorithm must find at least one.
- It should find a path which is reasonably optimal with respect to the migration cost function, rather than a guaranteed optimum. In prior work, I tried Dijkstra's shortest path algorithm, which demonstrated that an exhaustive search for the shortest path has intolerably high complexity.
- The migration cost function should be pluggable. Initially it will simply consider the cost as proportional to the RAM footprint of the VM being migrated.
- For now, use a smoke-and-mirrors in-memory model of the cloud's state.
Later, the code can be ported to consume the
- In light of the above, the implementation should be in Python.
- Determination of which states are sane should be pluggable. For example, the algorithm should not hardcode any assumptions about whether physical servers can over-commit CPU or RAM.
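As a rough illustration of how the pluggable pieces listed above could fit together, the following sketch uses a plain in-memory model (dictionaries), a pluggable cost function and a pluggable sanity check. It is not the algorithm this project aims to produce: it is a naive greedy baseline that only ever moves a VM straight to its final host, so it does not meet the first goal. All names (`greedy_path`, `ram_cost`, `no_ram_overcommit`) are hypothetical.

```python
def ram_cost(vm, vm_ram):
    """Default cost: proportional to the RAM footprint of the migrated VM."""
    return vm_ram[vm]


def no_ram_overcommit(placement, vm_ram, host_ram):
    """Default sanity check: no host holds more VM RAM than it physically has."""
    used = {}
    for vm, host in placement.items():
        used[host] = used.get(host, 0) + vm_ram[vm]
    return all(used[host] <= host_ram[host] for host in used)


def greedy_path(current, target, vm_ram, host_ram,
                is_sane=no_ram_overcommit, cost=ram_cost):
    """Return (path, total_cost), or None if the greedy search gets stuck.

    current, target: dict of VM name -> host name.
    path: list of (vm, source_host, destination_host) migrations.
    """
    state = dict(current)
    path = []
    total = 0
    pending = {vm for vm in target if state[vm] != target[vm]}
    while pending:
        for vm in sorted(pending):  # sorted only for deterministic output
            trial = dict(state)
            trial[vm] = target[vm]
            if is_sane(trial, vm_ram, host_ram):
                path.append((vm, state[vm], target[vm]))
                total += cost(vm, vm_ram)
                state = trial
                pending.remove(vm)
                break
        else:
            # No pending VM can move directly; a detour would be needed.
            return None
    return path, total


vm_ram = {"a": 6, "b": 6, "c": 2}
host_ram = {"h1": 8, "h2": 8, "h3": 8}
current = {"a": "h1", "b": "h2", "c": "h1"}
target = {"a": "h2", "b": "h3", "c": "h1"}
result = greedy_path(current, target, vm_ram, host_ram)
```

Here `b` has to vacate `h2` before `a` can move in, and the sketch discovers that ordering. It gives up, however, when two VMs need to swap between full hosts, even if an empty third host would allow a detour; handling such cases at a tolerable cost is the interesting part of the problem. Allowing over-commit only requires passing a different `is_sane` function.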
This project is part of:
Hack Week 10