SUSE Hack Week: Nvidia GPU support for CaaSP

### GPU support on CaaSP

What I think enabling GPUs on CaaSP requires:

1. Official Nvidia support for GPUs on CaaSP.
2. An installation guide that shows customers how to enable GPUs on CaaSP.

In the "[caasp-internal] Sharing a GPU" email thread, Roger mentioned that "We are planning on integrating Nvidia’s vGPU driver, after initial release. But it must be licensed from them." However, from what I have heard, it has already been two years since we started on this. Also, that plan is about enabling vGPU on SLES, not on CaaSP.

### Steps to enable Nvidia GPU on CaaSP:
  1. Hardware prerequisites: make sure your GPU is CUDA-capable. If your graphics card is from NVIDIA and is listed at http://developer.nvidia.com/cuda-gpus, it is CUDA-capable.
  2. Install the Nvidia driver on the GPU nodes (https://www.nvidia.com/Download/index.aspx).
  3. Install CUDA on the GPU nodes (https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=SLES&target_version=15&target_type=rpmlocal).
  4. Install nvidia-container-runtime-hook on the GPU nodes:
     1. Get the source code from https://github.com/Hui-Zhi/libnvidia-container/tree/sles15-sp1-support.
     2. Run `make sle15.1`. The build requires Docker; these first two sub-steps do not have to be run on the GPU nodes.
     3. Install `dist/sle15.1/x86_64/libnvidia-container*.rpm` on the GPU nodes.
  5. Label the GPU nodes, for example:
     ```shell
     nvidia-smi -q | grep 'Product Name'
     kubectl label nodes node1 nvidia.com/brand=Quadro   # use the brand of your own GPU
     kubectl label nodes node1 hardware-type=NVIDIAGPU
     ```
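The labeling in step 5 can also be scripted. The Python sketch below is a hypothetical helper, not part of CaaSP or the Nvidia tools: it reads `nvidia-smi -q` output, takes the first `Product Name`, and assumes the brand is the first word of that name (true for names such as `Quadro P4000` or `Tesla V100-PCIE-16GB`, but not guaranteed for every product). It also checks that the brand is a valid Kubernetes label value, because label values cannot contain spaces.

```python
import re
import shlex

# Kubernetes label values: at most 63 characters, alphanumerics plus '-', '_'
# and '.', beginning and ending with an alphanumeric character.
_LABEL_VALUE_RE = re.compile(r"[A-Za-z0-9]([-A-Za-z0-9_.]*[A-Za-z0-9])?")


def product_names(smi_output: str) -> list[str]:
    """Return every 'Product Name' value found in `nvidia-smi -q` output."""
    names = []
    for line in smi_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Product Name":
            names.append(value.strip())
    return names


def brand_label(product_name: str) -> str:
    """Hypothetical rule: the brand is the first word of the product name."""
    words = product_name.split()
    if not words:
        raise ValueError("empty product name")
    brand = words[0]
    if len(brand) > 63 or _LABEL_VALUE_RE.fullmatch(brand) is None:
        raise ValueError(f"not a valid label value: {brand!r}")
    return brand


def label_commands(node: str, smi_output: str) -> list[str]:
    """Build the two kubectl label commands from step 5 for one GPU node."""
    names = product_names(smi_output)
    if not names:
        raise ValueError("no 'Product Name' line found")
    brand = brand_label(names[0])
    quoted_node = shlex.quote(node)
    return [
        f"kubectl label nodes {quoted_node} nvidia.com/brand={brand}",
        f"kubectl label nodes {quoted_node} hardware-type=NVIDIAGPU",
    ]
```

On a GPU node the input could come from `subprocess.run(["nvidia-smi", "-q"], capture_output=True, text=True, check=True).stdout`. Given a line such as `Product Name : Quadro P4000` and the node `node1`, `label_commands` returns the same two commands shown in step 5.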

After following the steps above, GPU support should be enabled in your CaaSP environment, but please verify it, for example with the CUDA vector-add sample. Save the following manifest as `nvidia-test.yaml`:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
  namespace: default
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      image: "docker.io/mirrorgooglecontainers/cuda-vector-add:v0.1"
      env:
        - name: NVIDIA_VISIBLE_DEVICES
          value: all
        - name: NVIDIA_DRIVER_CAPABILITIES
          value: "compute,utility"
        - name: NVIDIA_REQUIRE_CUDA  
          value: "cuda>=5.0"
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]
        seLinuxOptions:
          type: nvidia_container_t
      resources:
        limits:
          nvidia.com/gpu: 1
```
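The `NVIDIA_REQUIRE_CUDA` entry in the manifest above is a constraint that libnvidia-container checks before starting the container: with `cuda>=5.0`, the CUDA version supported by the installed driver must be at least 5.0, otherwise the container is refused. The Python sketch below only illustrates that comparison under a simplifying assumption of a single `cuda<op><version>` clause; the real check also understands other keys (for example `driver` and `brand`) and combinations of clauses. The function name `cuda_requirement_met` is hypothetical.

```python
import operator
import re

# Comparison operators accepted by this simplified sketch.
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
    "=": operator.eq,
}

# A single clause of the form "cuda<op><version>", e.g. "cuda>=5.0".
_CLAUSE_RE = re.compile(r"cuda(>=|<=|>|<|=)(\d+(?:\.\d+)*)")


def _version(text: str) -> tuple[int, ...]:
    """Turn a dotted version such as "10.2" into (10, 2)."""
    return tuple(int(part) for part in text.split("."))


def cuda_requirement_met(requirement: str, driver_cuda: str) -> bool:
    """Check one cuda clause against the CUDA version the driver supports."""
    match = _CLAUSE_RE.fullmatch(requirement.strip())
    if match is None:
        raise ValueError(f"unsupported requirement: {requirement!r}")
    op, wanted = match.groups()
    have, need = _version(driver_cuda), _version(wanted)
    # Pad with zeros so that "5" and "5.0" compare as equal.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return _OPS[op](have, need)
```

For example, `cuda_requirement_met("cuda>=5.0", "10.2")` returns `True`, so a driver supporting CUDA 10.2 satisfies the manifest's requirement.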

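Then create the pod with `kubectl create -f nvidia-test.yaml`. Note that the `nvidia.com/gpu: 1` limit can only be scheduled if a device plugin, such as NVIDIA's k8s-device-plugin, advertises that resource on the GPU nodes; otherwise the pod stays `Pending`.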

Alternatively, run `kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/examples/workloads/deployment.yml`.
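Once the `cuda-vector-add` pod has completed, its log shows whether the GPU was actually used. The Python sketch below assumes the image prints `Test PASSED` on success, as the CUDA `vectorAdd` sample it is based on does; the helper names are hypothetical and it simply shells out to `kubectl logs`.

```python
import subprocess


def vector_add_passed(log_text: str) -> bool:
    """Assumes success is reported by a line reading exactly 'Test PASSED'."""
    return any(line.strip() == "Test PASSED" for line in log_text.splitlines())


def check_pod(pod: str = "cuda-vector-add", namespace: str = "default") -> bool:
    """Fetch the pod's log with kubectl and look for the success line."""
    result = subprocess.run(
        ["kubectl", "logs", pod, "--namespace", namespace],
        capture_output=True,
        text=True,
        check=True,  # raise if kubectl itself fails, e.g. the pod is missing
    )
    return vector_add_passed(result.stdout)
```

Calling `check_pod()` on a machine with access to the cluster returns `True` when the vector addition ran successfully on the GPU.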
