The idea here is to study and understand how ephemeral storage for containers works and to investigate whether local storage could be avoided altogether and Ceph used instead. Could a new storage driver be developed to support Ceph storage: https://github.com/containers/storage

The goal of this project is to understand the requirements for containers' ephemeral storage (performance, latency, etc.) and possibly build a PoC of a storage driver for Ceph.
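
For reference, containers/storage picks the driver and the on-disk location for images and container layers (the "graphroot") from /etc/containers/storage.conf, so that is the knob any Ceph-backed setup or new driver would plug into. A minimal sketch with illustrative default values, not anything specific to this project:

    # where podman / containers-storage is configured today (values are illustrative defaults)
    $ cat /etc/containers/storage.conf
    [storage]
    driver = "overlay"                        # vfs, btrfs, ... are also valid
    runroot = "/run/containers/storage"       # ephemeral runtime state
    graphroot = "/var/lib/containers/storage" # image layers and container layers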

Looking for hackers with the skills:

ceph k8s

This project is part of:

Hack Week 19

Activity

  • almost 5 years ago: SLindoMansilla liked this project.
  • almost 5 years ago: denisok added keyword "ceph" to this project.
  • almost 5 years ago: denisok added keyword "k8s" to this project.
  • almost 5 years ago: tbechtold liked this project.
  • almost 5 years ago: denisok started this project.
  • almost 5 years ago: STorresi liked this project.
  • almost 5 years ago: denisok liked this project.
  • almost 5 years ago: denisok originated this project.

  • Comments

    • denisok
      almost 5 years ago by denisok

      So after reading some info about the different storage drivers and getting to know some details of containers/storage, I would first try to just mount CephFS, point the vfs storage driver at that mount, and see what the performance impact would be and how it works.

      I also plan to search for some benchmarks that I could use to test the solution.
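
      A rough sketch of that first experiment, assuming the CephFS volume already exists; the monitor address, credentials and paths below are placeholders:

      # mount CephFS with the kernel client
      sudo mount -t ceph 192.168.122.10:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret

      # point the vfs driver's graphroot at the CephFS mount in /etc/containers/storage.conf:
      #   [storage]
      #   driver = "vfs"
      #   graphroot = "/mnt/cephfs/containers/storage"

      # then compare against the local default, e.g.
      time podman pull registry.opensuse.org/opensuse/toolbox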

    • denisok
      almost 5 years ago by denisok

      I have created a virtual 3-node Ceph cluster and tried CephFS as the mount point for the storage driver's graphroot. It was working fine, but quite a bit slower than local (SSD) storage:

      dkondratenko@f66:~/Workspace> time podman pull opensuse/toolbox
      Trying to pull docker.io/opensuse/toolbox...
        denied: requested access to the resource is denied
      Trying to pull quay.io/opensuse/toolbox...
        unauthorized: access to the requested resource is not authorized
      Trying to pull registry.opensuse.org/opensuse/toolbox...
      Getting image source signatures
      Copying blob 6b18d304886e done  
      Copying blob 32329fe4ff6e done  
      Copying config d5ebabf36c done  
      Writing manifest to image destination
      Storing signatures
      d5ebabf36cd400a32148eda6b97ec2989c307a4d4ebfdb03822ec6e35b0d7f36
      
      real    3m25.789s
      user    0m35.065s
      sys     0m9.852s
      
      real    0m28.302s
      user    0m30.095s
      sys     0m6.377s
      
      
      dkondratenko@f66:~/Workspace> time podman pull opensuse/busybox
      Trying to pull docker.io/opensuse/busybox...
        denied: requested access to the resource is denied
      Trying to pull quay.io/opensuse/busybox...
        unauthorized: access to the requested resource is not authorized
      Trying to pull registry.opensuse.org/opensuse/busybox...
      Getting image source signatures
      Copying blob ce98310bda16 done  
      Copying config 52423b05e2 done  
      Writing manifest to image destination
      Storing signatures
      52423b05e29476a7619641e48c4a5c960eb18b7edb5264c433f7af0bfeebcad0
      
      real    0m25.236s
      user    0m1.147s
      sys     0m0.593s
      
      real    0m3.696s
      user    0m0.907s
      sys     0m0.327s
      
      
      time podman run -it opensuse/toolbox /bin/sh -c 'zypper in -y bonnie; bonnie'
      
      Bonnie: Warning: You have 15927MiB RAM, but you test with only 100MiB datasize!    
      Bonnie:          This might yield unrealistically good results,
      Bonnie:          for reading and seeking and writing.
      Bonnie 1.6: File './Bonnie.201', size: 104857600, volumes: 1
      Writing       25MiB with putc()...         done:  64357 kiB/s  74.1 %CPU
      Rewriting    100MiB...                     done:2696297 kiB/s  41.2 %CPU
      Writing      100MiB intelligently...       done: 558372 kiB/s  31.7 %CPU
      Reading       12MiB with getc()...         done:  86603 kiB/s 100.0 %CPU
      Reading      100MiB intelligently...       done:5248052 kiB/s  99.5 %CPU
      Seeker 2 1 3 4 7 6 8 5 9 10 12 11 15 16 14 13 start 'em ................
      Estimated seek time: raw 0.010ms, eff 0.009ms
      
                  ----Sequential Output (nosync)----- ----Sequential Input--- --Rnd Seek-
                  -Per Char-- --Block---- -Rewrite--- -Per Char-- --Block---- --04k (16)-
      Machine     MiB  kiB/s %CPU  kiB/s %CPU  kiB/s %CPU  kiB/s %CPU  kiB/s %CPU   /sec %CPU
      2b337e6 1*  100  64357 74.1  558372 31.7  2696297 41.2  86603  100  5248052 99.5  96801  617
      
      real    0m18.456s
      user    0m0.127s
      sys     0m0.056s
      
      
      Bonnie: Warning: You have 15927MiB RAM, but you test with only 100MiB datasize!
      Bonnie:          This might yield unrealistically good results,
      Bonnie:          for reading and seeking and writing.
      Bonnie 1.6: File './Bonnie.202', size: 104857600, volumes: 1
      Writing       25MiB with putc()...         done:  37109 kiB/s  46.4 %CPU
      Rewriting    100MiB...                     done:1687791 kiB/s  26.0 %CPU
      Writing      100MiB intelligently...       done: 197676 kiB/s  11.3 %CPU
      Reading       12MiB with getc()...         done:  87179 kiB/s 100.0 %CPU
      Reading      100MiB intelligently...       done:6430948 kiB/s  99.4 %CPU
      Seeker 1 3 2 4 6 5 8 7 9 10 12 11 14 13 15 16 start 'em ................
      Estimated seek time: raw 1.088ms, eff 1.087ms
      
                  ----Sequential Output (nosync)----- ----Sequential Input--- --Rnd Seek-
                  -Per Char-- --Block---- -Rewrite--- -Per Char-- --Block---- --04k (16)-
      Machine     MiB  kiB/s %CPU  kiB/s %CPU  kiB/s %CPU  kiB/s %CPU  kiB/s %CPU   /sec %CPU
      b653a06 1*  100  37109 46.4  197676 11.3  1687791 26.0  87179  100  6430948 99.4    919  5.3
      
      real    0m51.470s
      user    0m0.113s
      sys     0m0.094s
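
      To separate podman's overhead from the raw filesystem overhead, the same comparison can also be run outside of podman; a rough sketch with placeholder paths and a placeholder layer tarball (dd is only a crude sequential test, image unpacking is mostly small-file and metadata work):

      # sequential write straight to the CephFS mount vs. the local SSD
      dd if=/dev/zero of=/mnt/cephfs/ddtest bs=1M count=1024 conv=fdatasync
      dd if=/dev/zero of=/tmp/ddtest bs=1M count=1024 conv=fdatasync

      # small-file / metadata load, closer to what layer unpacking does
      time tar -xf some-layer.tar -C /mnt/cephfs/unpack-test
      time tar -xf some-layer.tar -C /tmp/unpack-test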
      

    • denisok
      almost 5 years ago by denisok

      So there are two main investigation points here:

      • tar extraction to CephFS is quite slow
      • operations in the container image are too slow

      As for unpacking to CephFS: first of all, it is slow because Ceph runs in VMs here and simply cannot deliver real performance.

      Second, with CephFS the unpacking of a given image layer needs to be done only once, so the same layer wouldn't need unpacking again and could just be mounted (see the sketch after the timings below). A container run from CephFS doesn't show that dramatic a performance impact:

      dkondratenko@f66:~/Workspace> time podman run -it opensuse/toolbox /bin/sh -c 'echo Hello world!'
      Hello world!
      
      real    0m0.448s
      user    0m0.097s
      sys     0m0.088s
      
      real    0m1.470s
      user    0m0.117s
      sys     0m0.079s
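
      The "unpack once, share afterwards" idea sketched with standard tools and placeholder paths and digests; a real storage driver would of course do this internally:

      # fetch the image once into a directory on CephFS (layers stay as blobs named by digest)
      skopeo copy docker://registry.opensuse.org/opensuse/toolbox dir:/mnt/cephfs/images/toolbox

      # unpack each layer exactly once into a shared location
      mkdir -p /mnt/cephfs/layers/<layer-digest>
      tar -xf /mnt/cephfs/images/toolbox/<layer-digest> -C /mnt/cephfs/layers/<layer-digest>

      # any node can now reuse the unpacked layer by bind-mounting it, without unpacking again
      mount --bind /mnt/cephfs/layers/<layer-digest> /var/lib/containers/shared/<layer-digest>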
      

      The next point is the container layer, which shouldn't be persistent but eats quite a bit of space and performance, as in the bonnie example, both to install the package and to run the tests. That layer could be mounted on a local FS and deleted after each run.

      Also, CephFS performance could be improved by caching; images are persistent, so techniques similar to ceph-immutable-object-cache could be used.
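
      Not a proper layer cache, but as a quick check the CephFS kernel client can already keep a local read cache via fscache; a sketch assuming cachefilesd is installed, with a placeholder monitor address and credentials:

      # local cache daemon, backed by a directory on the local SSD
      sudo systemctl enable --now cachefilesd

      # remount CephFS with the fsc option so hot image data gets cached locally
      sudo mount -t ceph 192.168.122.10:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret,fsc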

      I have tried to pull some images to local storage and, instead of some layers, just mount CephFS layers, but such a container failed to start. I tracked it down to a simple container that tries to start with a pre-mounted fuse-overlayfs mount containing both CephFS layers and local ones, but such a container doesn't see the right mount point. I wasn't able to track it down further.
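
      For reference, the failing setup can be reproduced by hand roughly like this (paths and the layer id are placeholders); fuse-overlayfs takes the same lowerdir/upperdir/workdir options as kernel overlayfs:

      # lower layers: one unpacked on CephFS, one from the local storage
      fuse-overlayfs \
        -o lowerdir=/mnt/cephfs/layers/base:/var/lib/containers/storage/overlay/<id>/diff \
        -o upperdir=/var/lib/containers/tmp/upper,workdir=/var/lib/containers/tmp/work \
        /var/lib/containers/tmp/merged

      # starting a container directly from the pre-mounted rootfs is what fails to see the right mount point
      podman run --rootfs /var/lib/containers/tmp/merged /bin/sh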

      Next steps:

      • try a hardware Ceph cluster and see how big the performance differences are
      • find out whether CephFS layers could be mounted together with local layers as an overlayfs into the container's namespace
      • implement a storage driver that provides a hybrid approach: CephFS for the common layers and a local container layer (see the sketch after this list)
      • find out whether it makes sense to implement an on-prem registry that would act as a proxy and unpack layers to CephFS
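
      One existing mechanism that already points towards the hybrid approach is the additional image stores option of containers/storage: read-only stores (which could live on CephFS) are consulted for image layers, while the writable container layer stays in the local graphroot. A sketch with illustrative paths:

      $ cat /etc/containers/storage.conf
      [storage]
      driver = "overlay"
      graphroot = "/var/lib/containers/storage"   # local, writable container layers

      [storage.options]
      # read-only image store shared between nodes, populated once on CephFS
      additionalimagestores = [ "/mnt/cephfs/containers/storage" ]

      Whether the overlay driver's on-disk layout stays usable when shared over CephFS is exactly the kind of question the PoC would need to answer.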

    Similar Projects

    This project is one of its kind!