There are several use cases where it's beneficial to be able to automatically rearrange VM instances in a cloud into a different placement. These usually fall into one of two categories, which could be called defragmentation and rebalancing:

  • Defragmentation - condense VMs onto fewer physical VM host machines (conceptually similar to VMware DPM). Example use cases:
    • Reduce power consumption
    • Increase CPU / RAM / I/O utilization (e.g. by over-committing and/or memory page sharing)
    • Evacuate physical servers
      • for repurposing
      • for maintenance (BIOS upgrades, re-cabling etc.)
      • to protect SLAs, e.g. when SMART monitors indicate potential imminent hardware failure, or when HVAC failures are likely to cause servers to shut down due to overheating
  • Rebalancing - spread the VMs evenly across as many physical VM host machines as possible (conceptually similar to VMware DRS). Example use cases:
    • Optimise workloads for performance, by reducing CPU / I/O hotspots
    • Maximise headroom on each physical machine
    • Reduce thermal hotspots in order to reduce power consumption

Custom rearrangements may be required according to other IT- or business-driven policies, e.g. only rearrange VM instances relating to a specific workload, in order to increase locality of reference, reduce latency, respect availability zones, or facilitate other out-of-band workflows.

It is clear from the above that VM placement policies are likely to vary greatly across clouds, and sometimes even within a single cloud. OpenStack Compute (nova) has fairly sophisticated scheduling capabilities which can be configured to implement some of the above policies on an incremental basis, i.e. every time a VM instance is started or migrated, the destination VM host can be chosen according to filters and weighted cost functions. However, this approach is somewhat limited, because the placement policies are implemented cloud-wide, and placement decisions are only considered one migration at a time.
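To make the incremental approach concrete, here is a minimal sketch of choosing a destination host via filters and a weigher. It is not nova's actual scheduler API: the `Host` and `VM` classes, the filter and weigher functions, and `choose_host` are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    free_ram_mb: int
    free_vcpus: int


@dataclass
class VM:
    name: str
    ram_mb: int
    vcpus: int


def ram_filter(host, vm):
    """Reject hosts without enough free RAM (assumes no over-commit)."""
    return host.free_ram_mb >= vm.ram_mb


def cpu_filter(host, vm):
    """Reject hosts without enough free vCPUs (assumes no over-commit)."""
    return host.free_vcpus >= vm.vcpus


def spread_weigher(host, vm):
    """Prefer the host left with the most free RAM (rebalancing)."""
    return host.free_ram_mb - vm.ram_mb


def pack_weigher(host, vm):
    """Prefer the host left with the least free RAM (defragmentation)."""
    return -(host.free_ram_mb - vm.ram_mb)


def choose_host(hosts, vm, filters, weigher):
    """Return the best-scoring host that passes every filter, or None."""
    candidates = [h for h in hosts if all(f(h, vm) for f in filters)]
    if not candidates:
        return None
    return max(candidates, key=lambda h: weigher(h, vm))


hosts = [Host("host1", 8192, 4), Host("host2", 2048, 8), Host("host3", 4096, 1)]
vm = VM("vm1", 2048, 2)
filters = [ram_filter, cpu_filter]

spread_choice = choose_host(hosts, vm, filters, spread_weigher)
pack_choice = choose_host(hosts, vm, filters, pack_weigher)
print(spread_choice.name, pack_choice.name)
```

Note that each call decides the placement of a single VM in isolation, which is exactly the limitation described above: nothing here reasons about the cloud-wide arrangement or about a sequence of moves.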

Since OpenStack is rapidly evolving to the point where a VM's network and storage dependencies can be live migrated along with the workload in a near seamless fashion, it is advantageous to develop mechanisms for implementing finer-grained placement policies, where not only is VM rearrangement performed automatically, but the policies themselves can be varied dynamically over time as workload requirements change.

Developing algorithms to determine optimal placement is distinctly non-trivial. For example, the defragmentation scenario above is a complex variant of the bin packing problem, which is NP-hard. The following constraints add significant complexity to the problem:

  • A useful real world solution should take into account not only the RAM footprint of the VMs, but also CPU, disk, and network.

  • It also needs to ensure that SLAs are maintained whilst any rearrangement takes place.

  • If the cloud is sufficiently near capacity, it may not be possible to rearrange the VMs from their current placement to a more optimal placement without first shutting down some VMs, which could be prohibited by the SLAs.

  • Even if the arrangement is achievable purely via a sequence of live migrations, the algorithm must also be sensitive to the performance impact on running workloads when performing multiple live migrations. Live migrations require intensive bursts of network I/O in order to synchronize the VM's memory contents between the source and target hosts, followed by a momentary freezing of the VM as it flips from the source to the target. This trade-off between optimal resource utilization and service availability means that a sub-optimal final placement may be preferable to an optimal one.

  • In the case where the hypervisor is capable of sharing memory pages between VMs, the algorithm should try to place together VMs which are likely to share memory pages (e.g. VMs running the same OS platform, OS version, software libraries, or applications). A research paper published in 2011 demonstrated that VM packing which optimises placement in this fashion can be approximated in polynomial time, achieving a 32% to 50% reduction in servers and a 25% to 57% reduction in memory footprint compared to sharing-oblivious algorithms.
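For a feel of the simplest end of this problem space, the following sketch applies the classic first-fit decreasing heuristic to VMs with more than one resource dimension. It is only an illustration of multi-dimensional bin packing, not the algorithm from the paper mentioned above: it ignores SLAs, migration cost, and page sharing, assumes identical hosts, and the function names and resource keys (`ram_gb`, `vcpus`) are made up for this example.

```python
def fits(used, demand, capacity):
    """True if adding `demand` to `used` stays within `capacity` on every resource."""
    return all(used[r] + demand[r] <= capacity[r] for r in capacity)


def first_fit_decreasing(vms, capacity):
    """Pack VMs onto as few identical hosts as the heuristic manages.

    vms:      {vm_name: {resource: demand}}
    capacity: {resource: per-host capacity}
    Returns a list of hosts, each a list of VM names.
    """
    def size(item):
        # Rank a VM by its most constrained resource, as a fraction of a host.
        _, demand = item
        return max(demand[r] / capacity[r] for r in capacity)

    hosts = []  # each entry is (used_resources, vm_names)
    for name, demand in sorted(vms.items(), key=size, reverse=True):
        if not fits({r: 0 for r in capacity}, demand, capacity):
            raise ValueError(f"{name} does not fit on an empty host")
        for used, names in hosts:
            if fits(used, demand, capacity):
                for r in capacity:
                    used[r] += demand[r]
                names.append(name)
                break
        else:
            # No existing host has room, so open a new one.
            hosts.append(({r: demand[r] for r in capacity}, [name]))
    return [names for _, names in hosts]


capacity = {"ram_gb": 16, "vcpus": 8}
vms = {
    "a": {"ram_gb": 8, "vcpus": 2},
    "b": {"ram_gb": 8, "vcpus": 2},
    "c": {"ram_gb": 4, "vcpus": 6},
    "d": {"ram_gb": 4, "vcpus": 2},
    "e": {"ram_gb": 2, "vcpus": 1},
}
packing = first_fit_decreasing(vms, capacity)
print(packing)
```

Even this toy version shows why RAM alone is not enough: in the example, it is the vCPU limit rather than RAM that forces a second host to be opened.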

As noted by the above paper, this area of computer science is still evolving. There is one constant however: any rearrangement solution must not only provide a final VM placement optimised according to the chosen constraints, but also a sequence of migrations to it from the current placement. There will often be multiple migration sequences reaching the optimised placement from the current one, and their efficiency can vary widely. In other words, there are two questions which need answering:

  1. Given a starting placement A, which is the best final placement B to head for?
  2. What's the best way to get from A to B?
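As a small illustration of how the two questions relate, a placement can be modelled as a mapping from VM name to host name, and comparing A with B yields the VMs which must move at least once. This is a hypothetical in-memory representation, and `required_migrations` is a name invented for this sketch.

```python
def required_migrations(current, target):
    """Return {vm: (source_host, destination_host)} for every VM whose host differs."""
    if current.keys() != target.keys():
        raise ValueError("both placements must cover the same set of VMs")
    return {
        vm: (current[vm], target[vm])
        for vm in current
        if current[vm] != target[vm]
    }


placement_a = {"vm1": "host1", "vm2": "host2", "vm3": "host3"}
placement_b = {"vm1": "host1", "vm2": "host1", "vm3": "host1"}

moves = required_migrations(placement_a, placement_b)
print(moves)
```

This set is only a lower bound on the work involved. When hosts are near capacity, some VMs may first have to be moved to an intermediate host to make room, which is why the order and route of migrations matter.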

The above considerations strongly suggest that the first question is much harder to answer than the second. I propose that by adopting a divide and conquer approach, solving the second may simplify the first. Decoupling the two should also provide a mechanism for comparatively evaluating the effectiveness of potential answers to the first. Another bonus of this decoupling is that it should be possible for the path-finding algorithm to also discover opportunities for parallelizing live migrations when walking the path, so that the target placement B can be reached more quickly.
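One simple way parallelization opportunities might be discovered is sketched below. It takes an already-ordered migration sequence and groups consecutive migrations into batches which touch disjoint hosts, so that each batch could run concurrently. The rule that a host takes part in at most one live migration at a time is an assumption made for this sketch, and `batch_migrations` is a hypothetical name.

```python
def batch_migrations(sequence):
    """Group an ordered list of (vm, source, destination) into parallel batches.

    Consecutive migrations share a batch only while no host appears twice,
    so the migrations within a batch are independent of each other and the
    batches, run in order, have the same effect as the original sequence.
    """
    batches = []
    current, busy = [], set()
    for vm, src, dst in sequence:
        if src in busy or dst in busy:
            # Conflict with a host already in use: close the current batch.
            batches.append(current)
            current, busy = [], set()
        current.append((vm, src, dst))
        busy.update((src, dst))
    if current:
        batches.append(current)
    return batches


sequence = [
    ("vm1", "host1", "host2"),
    ("vm2", "host3", "host4"),
    ("vm3", "host2", "host5"),
    ("vm4", "host4", "host6"),
]
batches = batch_migrations(sequence)
print(batches)
```

Because batching never reorders migrations, any capacity constraints satisfied by the sequential path remain satisfied; a smarter approach could reorder independent migrations to form fewer batches.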

Therefore, for this Hack Week project, I intend to design and implement an algorithm which reliably calculates a reasonably optimal sequence of VM migrations from a given initial placement to a given final placement. My goals are as follows:

  • If there is any migration path, the algorithm must find at least one.
  • It should find a path which is reasonably optimal with respect to the migration cost function. In prior work, I already tried Dijkstra's shortest path algorithm and demonstrated that an exhaustive search for the shortest path has intolerably high complexity.
  • The migration cost function should be pluggable. Initially it will simply consider the cost as proportional to the RAM footprint of the VM being migrated.
  • For now, use a smoke-and-mirrors in-memory model of the cloud's state. Later, the code can be ported to consume the nova API.
  • In light of the above, the implementation should be in Python.
  • Determination of which states are sane should be pluggable. For example, the algorithm should not hardcode any assumptions about whether physical servers can over-commit CPU or RAM.
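The goals above can be sketched on a toy in-memory model. The example below is a plain uniform-cost (Dijkstra) search over placements, i.e. the exhaustive approach already noted above as having intolerably high complexity, so it is not the algorithm this project aims to produce. It is shown only to illustrate how a pluggable cost function and a pluggable sanity check might fit together. All names (`find_path`, `ram_cost`, `no_ram_overcommit`) and the data layout are assumptions made for this sketch.

```python
import heapq
from itertools import count


def ram_cost(vm, vms):
    """Default cost: proportional to the RAM footprint of the migrated VM."""
    return vms[vm]


def no_ram_overcommit(placement, vms, hosts):
    """Default sanity check: no host may exceed its RAM capacity."""
    used = {h: 0 for h in hosts}
    for vm, host in placement.items():
        used[host] += vms[vm]
    return all(used[h] <= hosts[h] for h in hosts)


def find_path(start, goal, vms, hosts, cost=ram_cost, is_sane=no_ram_overcommit):
    """Return (total_cost, [(vm, src, dst), ...]) or None if no path exists.

    vms: {vm: ram}, hosts: {host: ram_capacity}, start/goal: {vm: host}.
    The state space holds up to len(hosts) ** len(vms) placements, so this
    is only usable for very small inputs.
    """
    names = sorted(vms)
    start_key = tuple(start[v] for v in names)
    goal_key = tuple(goal[v] for v in names)
    tie = count()  # tie-breaker so the heap never compares states or paths
    queue = [(0, next(tie), start_key, [])]
    best = {start_key: 0}
    while queue:
        total, _, state, path = heapq.heappop(queue)
        if state == goal_key:
            return total, path
        if total > best.get(state, float("inf")):
            continue  # stale queue entry
        for i, vm in enumerate(names):
            for dst in sorted(hosts):
                if dst == state[i]:
                    continue
                nxt = state[:i] + (dst,) + state[i + 1:]
                if not is_sane(dict(zip(names, nxt)), vms, hosts):
                    continue
                new_total = total + cost(vm, vms)
                if new_total < best.get(nxt, float("inf")):
                    best[nxt] = new_total
                    step = (vm, state[i], dst)
                    heapq.heappush(queue, (new_total, next(tie), nxt, path + [step]))
    return None


# Two 3 GB VMs must swap hosts, but no 4 GB host can hold both at once,
# so one VM has to detour via the spare host3.
vms = {"vm1": 3, "vm2": 3}
hosts = {"host1": 4, "host2": 4, "host3": 4}
start = {"vm1": "host1", "vm2": "host2"}
goal = {"vm1": "host2", "vm2": "host1"}

result = find_path(start, goal, vms, hosts)
print(result)
```

In this example a direct swap is never sane, so the cheapest path needs three migrations. Without the spare host there is no path at all, and the search returns `None`. Swapping in a different `cost` or `is_sane` callable changes the policy without touching the search itself.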

Looking for hackers with the skills:

cloud virtualization orchestration openstack performance python

This project is part of:

Hack Week 10


  • about 9 years ago: aspiers liked this project.
  • about 9 years ago: aspiers started this project.
  • about 9 years ago: aspiers originated this project.

  • Comments

    • aspiers
      about 9 years ago by aspiers

      I completed this project, and intend to publish the code and also blog about it. I uploaded a YouTube video demoing what the code can do.
