Nvidia has a way to support GPUs on Kubernetes via docker and cri-o, but so far they don't support SLES and CaaSP; adding that support is the goal of this project.
No Hackers yet
Looking for hackers with the skills:
This project is part of:
Hack Week 19
Activity
Comments
-
almost 5 years ago by drdavis | Reply
https://github.com/NVIDIA/gpu-operator
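For reference, the gpu-operator is normally deployed with Helm. A minimal sketch, assuming Helm 3 and NVIDIA's public chart repository (the repo URL and chart name are taken from the gpu-operator README and have not been verified on CaaSP):
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm install gpu-operator nvidia/gpu-operator --namespace gpu-operator --create-namespace
The operator then manages the driver, container toolkit, and device plugin on the GPU nodes, which is the part the manual steps below do by hand.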
-
almost 5 years ago by huizhizhao | Reply
Thank you Darren. I haven't looked into this carefully yet, but I tried the steps and failed to enable GPU in my environment. I will dig deeper into this.
-
almost 5 years ago by huizhizhao | Reply
GPU support on CaaSP: what I think enabling GPU on CaaSP requires: 1) Get Nvidia to officially support GPU on CaaSP. 2) Provide an installation guide for customers on how to enable GPU on CaaSP.
- There is an email thread "[caasp-internal] Sharing a GPU" in which Roger mentioned: "We are planning on integrating Nvidia’s vGPU driver, after initial release. But it must be licensed from them." However, from what I heard it has already been 2 years since we started on this. Also, that is about enabling vGPU on SLES, not CaaSP.
-
almost 5 years ago by huizhizhao | Reply
Please ignore this; a better-formatted version can be found below.
-
almost 5 years ago by huizhizhao | Reply
Please ignore the comment above; it is badly formatted. Below is the version with better formatting.
Steps to enable Nvidia GPU on CaaSP:
- Hardware prerequisites: ensure your GPU is CUDA-capable. If your graphics card is from NVIDIA and is listed at http://developer.nvidia.com/cuda-gpus, it is CUDA-capable.
- Install the Nvidia driver on the GPU nodes. (https://www.nvidia.com/Download/index.aspx)
- Install CUDA on the GPU nodes. (https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=SLES&target_version=15&target_type=rpmlocal)
- Install nvidia-container-runtime-hook on the GPU nodes (a scripted sketch of the driver, CUDA, and hook installation follows after this list):
  - Source code can be found at: https://github.com/Hui-Zhi/libnvidia-container/tree/sles15-sp1-support
  - Run "make sle15.1" (the checkout and build do not need to be done on the GPU nodes, but they require docker).
  - Install dist/sle15.1/x86_64/libnvidia-container*.rpm on the GPU nodes.
- Label the GPU nodes, for example:
  nvidia-smi -q | grep 'Product Name'
  kubectl label nodes node1 nvidia.com/brand=Quadro   # Use the product name of your GPU.
  kubectl label nodes node1 hardware-type=NVIDIAGPU
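The driver, CUDA, and container-hook steps above can be scripted roughly as in this sketch. It assumes the driver .run installer and the CUDA local-repo rpm have already been downloaded from the pages linked above; the <version> placeholders and exact file names are hypothetical and depend on what you downloaded, node1 is a GPU node as in the labeling example, and the build machine needs docker and git.
# On the GPU node: driver and CUDA toolkit (file names are placeholders)
sudo sh NVIDIA-Linux-x86_64-<version>.run            # proprietary driver installer
sudo rpm -i cuda-repo-sles15-<version>-local-*.rpm   # register the local CUDA repository
sudo zypper refresh
sudo zypper install cuda                             # CUDA toolkit meta-package

# On any machine with docker and git: build the SLE 15 SP1 container hook
git clone -b sles15-sp1-support https://github.com/Hui-Zhi/libnvidia-container.git
cd libnvidia-container
make sle15.1

# Back on the GPU node: install the resulting rpm
scp dist/sle15.1/x86_64/libnvidia-container*.rpm node1:/tmp/
ssh node1 "sudo zypper --non-interactive install /tmp/libnvidia-container*.rpm"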
If you have followed the steps above, GPU support should now be enabled in your CaaSP environment, but please verify it. Example: verify with vector-add:
nvidia-test.yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
  namespace: default
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      image: "docker.io/mirrorgooglecontainers/cuda-vector-add:v0.1"
      env:
        - name: NVIDIA_VISIBLE_DEVICES
          value: all
        - name: NVIDIA_DRIVER_CAPABILITIES
          value: "compute,utility"
        - name: NVIDIA_REQUIRE_CUDA
          value: "cuda>=5.0"
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]
        seLinuxOptions:
          type: nvidia_container_t
      resources:
        limits:
          nvidia.com/gpu: 1
Then run
kubectl create -f nvidia-test.yaml
Or run
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/examples/workloads/deployment.yml
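To confirm the test actually consumed a GPU, a quick check along these lines can help (a sketch; cuda-vector-add is the pod name from the manifest above and node1 is the labeled GPU node):
kubectl get pod cuda-vector-add                          # should end up in Completed
kubectl logs cuda-vector-add                             # inspect the sample's output
kubectl describe node node1 | grep -i 'nvidia.com/gpu'   # the GPU resource should be advertised and allocated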
Our comment tool doesn't handle YAML formatting well; that is why the YAML format still looks bad.
-
almost 5 years ago by huizhizhao | Reply
Before the verification, please install the Nvidia device plugin:
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/master/nvidia-device-plugin.yml
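After applying the manifest, the plugin can be sanity-checked with something like the following (a sketch; the pod name prefix and the kube-system namespace come from the upstream manifest and may differ in other versions):
kubectl -n kube-system get pods | grep nvidia-device-plugin   # the device plugin pod(s) should be Running on GPU nodes
kubectl describe nodes | grep -i 'nvidia.com/gpu'             # every GPU node should report the nvidia.com/gpu resource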
Similar Projects
This project is one of a kind!