The idea here is to study and understand how ephemeral storage for containers works, and to investigate whether local storage could be avoided entirely and Ceph used instead. Could a new storage driver be developed to support Ceph storage? https://github.com/containers/storage
The goal of this project is to understand the requirements for a container's ephemeral storage (performance, latency, etc.) and possibly to build a PoC storage driver for Ceph.
This project is part of:
Hack Week 19
Activity
Comments
-
almost 5 years ago by denisok
So after reading some info about the different storage drivers and getting to know some details of containers/storage, I will first try to simply mount CephFS, point the vfs storage driver at that mount, and see what the performance impact is and how well it works.
I also plan to look for benchmarks that I can use to test the solution.
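For reference, a minimal sketch of that first setup (the monitor address, secret file and graphroot path are placeholders, not the exact values used here):

    # mount CephFS with the kernel client (monitor address and credentials are examples)
    sudo mount -t ceph 192.168.122.10:6789:/ /mnt/cephfs \
        -o name=admin,secretfile=/etc/ceph/admin.secret

    # ~/.config/containers/storage.conf -- point the vfs driver at the CephFS mount
    # (vfs needs no overlay support on CephFS, at the cost of full layer copies):
    #
    #   [storage]
    #   driver = "vfs"
    #   graphroot = "/mnt/cephfs/containers/storage"
    #   runroot = "/run/user/1000/containers"

    # or override the location for a single invocation
    podman --storage-driver vfs --root /mnt/cephfs/containers/storage pull opensuse/toolbox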
-
almost 5 years ago by denisok
I have created a virtual 3-node Ceph cluster and tried CephFS as the mount point for the storage driver's graphroot. It worked fine, but was quite a bit slower than local (SSD) storage:
    dkondratenko@f66:~/Workspace> time podman pull opensuse/toolbox
    Trying to pull docker.io/opensuse/toolbox...
      denied: requested access to the resource is denied
    Trying to pull quay.io/opensuse/toolbox...
      unauthorized: access to the requested resource is not authorized
    Trying to pull registry.opensuse.org/opensuse/toolbox...
    Getting image source signatures
    Copying blob 6b18d304886e done
    Copying blob 32329fe4ff6e done
    Copying config d5ebabf36c done
    Writing manifest to image destination
    Storing signatures
    d5ebabf36cd400a32148eda6b97ec2989c307a4d4ebfdb03822ec6e35b0d7f36

    real    3m25.789s
    user    0m35.065s
    sys     0m9.852s

    real    0m28.302s
    user    0m30.095s
    sys     0m6.377s

    dkondratenko@f66:~/Workspace> time podman pull opensuse/busybox
    Trying to pull docker.io/opensuse/busybox...
      denied: requested access to the resource is denied
    Trying to pull quay.io/opensuse/busybox...
      unauthorized: access to the requested resource is not authorized
    Trying to pull registry.opensuse.org/opensuse/busybox...
    Getting image source signatures
    Copying blob ce98310bda16 done
    Copying config 52423b05e2 done
    Writing manifest to image destination
    Storing signatures
    52423b05e29476a7619641e48c4a5c960eb18b7edb5264c433f7af0bfeebcad0

    real    0m25.236s
    user    0m1.147s
    sys     0m0.593s

    real    0m3.696s
    user    0m0.907s
    sys     0m0.327s

    time podman run -it opensuse/toolbox /bin/sh -c 'zypper in -y bonnie; bonnie'
    Bonnie: Warning: You have 15927MiB RAM, but you test with only 100MiB datasize!
    Bonnie:          This might yield unrealistically good results,
    Bonnie:          for reading and seeking and writing.
    Bonnie 1.6: File './Bonnie.201', size: 104857600, volumes: 1
    Writing 25MiB with putc()...         done:   64357 kiB/s  74.1 %CPU
    Rewriting 100MiB...                  done: 2696297 kiB/s  41.2 %CPU
    Writing 100MiB intelligently...      done:  558372 kiB/s  31.7 %CPU
    Reading 12MiB with getc()...         done:   86603 kiB/s 100.0 %CPU
    Reading 100MiB intelligently...      done: 5248052 kiB/s  99.5 %CPU
    Seeker 2 1 3 4 7 6 8 5 9 10 12 11 15 16 14 13 start 'em ................
    Estimated seek time: raw 0.010ms, eff 0.009ms
               ----Sequential Output (nosync)----- ----Sequential Input--- --Rnd Seek-
               -Per Char-- --Block---- -Rewrite--- -Per Char-- --Block---- --04k (16)-
    Machine    MiB kiB/s %CPU  kiB/s %CPU   kiB/s %CPU kiB/s %CPU   kiB/s %CPU  /sec %CPU
    2b337e6 1* 100 64357 74.1 558372 31.7 2696297 41.2 86603  100 5248052 99.5 96801  617

    real    0m18.456s
    user    0m0.127s
    sys     0m0.056s

    Bonnie: Warning: You have 15927MiB RAM, but you test with only 100MiB datasize!
    Bonnie:          This might yield unrealistically good results,
    Bonnie:          for reading and seeking and writing.
    Bonnie 1.6: File './Bonnie.202', size: 104857600, volumes: 1
    Writing 25MiB with putc()...         done:   37109 kiB/s  46.4 %CPU
    Rewriting 100MiB...                  done: 1687791 kiB/s  26.0 %CPU
    Writing 100MiB intelligently...      done:  197676 kiB/s  11.3 %CPU
    Reading 12MiB with getc()...         done:   87179 kiB/s 100.0 %CPU
    Reading 100MiB intelligently...      done: 6430948 kiB/s  99.4 %CPU
    Seeker 1 3 2 4 6 5 8 7 9 10 12 11 14 13 15 16 start 'em ................
    Estimated seek time: raw 1.088ms, eff 1.087ms
               ----Sequential Output (nosync)----- ----Sequential Input--- --Rnd Seek-
               -Per Char-- --Block---- -Rewrite--- -Per Char-- --Block---- --04k (16)-
    Machine    MiB kiB/s %CPU  kiB/s %CPU   kiB/s %CPU kiB/s %CPU   kiB/s %CPU  /sec %CPU
    b653a06 1* 100 37109 46.4 197676 11.3 1687791 26.0 87179  100 6430948 99.4   919  5.3

    real    0m51.470s
    user    0m0.113s
    sys     0m0.094s
-
almost 5 years ago by denisok
So there are two main investigation points here:
- untarring image layers to CephFS is quite slow
- operations inside the container image are too slow
As for unpacking to CephFS: first of all, it is slow here because Ceph runs in VMs and simply cannot deliver realistic performance.
Second, with CephFS the unpacking of a given image layer only has to happen once; afterwards the same layer doesn't need to be unpacked again and can simply be mounted. Running a container from CephFS doesn't have such a dramatic performance impact:
    dkondratenko@f66:~/Workspace> time podman run -it opensuse/toolbox /bin/sh -c 'echo Hello world!'
    Hello world!

    real    0m0.448s
    user    0m0.097s
    sys     0m0.088s

    real    0m1.470s
    user    0m0.117s
    sys     0m0.079s
The next point is the container layer, which shouldn't be persistent but eats quite a lot of space and performance, as in the bonnie example, where it is needed to install the package and run the tests. That layer could be mounted on a local FS and deleted after each run.
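A rough sketch of that split with a plain overlay mount (all paths and layer names below are made up for illustration; whether overlayfs, kernel or FUSE, accepts CephFS lower layers is exactly one of the things that still needs checking):

    # hypothetical layout: read-only image layers shared on CephFS,
    # the ephemeral container layer kept on fast local storage
    LOWER=/mnt/cephfs/layers/6b18d304886e:/mnt/cephfs/layers/32329fe4ff6e
    UPPER=/var/tmp/ctr1/upper    # local throwaway writable layer
    WORK=/var/tmp/ctr1/work      # must live on the same FS as the upper dir
    MERGED=/var/tmp/ctr1/merged

    mkdir -p "$UPPER" "$WORK" "$MERGED"
    sudo mount -t overlay overlay \
        -o lowerdir="$LOWER",upperdir="$UPPER",workdir="$WORK" "$MERGED"

    # ... run the container from $MERGED ...

    # after the run, throw the container layer away
    sudo umount "$MERGED"
    rm -rf /var/tmp/ctr1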
CephFS performance could also be improved by caching; the images are persistent, so techniques similar to ceph-immutable-object-cache could be used.
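One caching angle to evaluate (an assumption on my side, not something tested here): the kernel CephFS client can use FS-Cache to keep read data on a local disk, which might help for the read-mostly image layers:

    # assumes a kernel built with CONFIG_CEPH_FSCACHE and the cachefilesd daemon installed;
    # the 'fsc' option asks the CephFS client to cache read data locally via FS-Cache
    sudo systemctl start cachefilesd
    sudo mount -t ceph 192.168.122.10:6789:/ /mnt/cephfs \
        -o name=admin,secretfile=/etc/ceph/admin.secret,fsc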
I tried pulling some images to local storage and, for some of the layers, just mounting the corresponding CephFS layers instead, but such a container failed to start. I tracked it down to a simpler case: a container started on top of a pre-created fuse-overlayfs mount that combines CephFS layers with local ones doesn't see the right mount point. I wasn't able to track it down any further.
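Roughly, the failing experiment looks like this (layer paths are placeholders; the sketch reproduces the setup, not a fix):

    # mixed lower layers: one taken from CephFS, one from the local graphroot
    LOWER=/mnt/cephfs/layers/6b18d304886e:$HOME/.local/share/containers/storage/overlay/32329fe4ff6e/diff
    mkdir -p /var/tmp/ctr2/upper /var/tmp/ctr2/work /var/tmp/ctr2/merged

    fuse-overlayfs \
        -o lowerdir="$LOWER",upperdir=/var/tmp/ctr2/upper,workdir=/var/tmp/ctr2/work \
        /var/tmp/ctr2/merged

    # start a container directly from the pre-mounted rootfs;
    # in this experiment the container did not see the expected mount point
    podman run --rootfs /var/tmp/ctr2/merged /bin/sh -c 'echo Hello world!'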
Next steps:
- try a hardware Ceph cluster and see how big the performance difference is
- find out whether CephFS layers can be mounted together with local layers as an overlayfs into the container's namespace
- implement a storage driver that provides a hybrid approach: CephFS for the common image layers, local storage for the container layer
- find out whether an on-prem registry that acts as a proxy and unpacks layers to CephFS would make sense