Current Situation [bsc#1201271]
The SUSE HA NFS Storage Guide [1] provides a resilient NFS implementation for clients even if the NFS server node fails over within the cluster. However, it is an active-passive setup on a two-node cluster.
Motivation
Would it be possible to run additional NFS server instances on both nodes? In the end, any NFS server instance could run on either node, in parallel, within the Pacemaker cluster.
Challenges and Possibilities:
- NFS server configuration file and state isolation for its own exportfs, e.g. /var/lib/nfs. Could container technology help here? (See the sketch below.)
- How to bundle the Pacemaker RA service inside a container to run nfs-server?
- How to manage the IP address inside the container during failover between nodes?
[1] https://documentation.suse.com/en-us/sle-ha/15-SP5/single-html/SLE-HA-nfs-storage/
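A minimal sketch of the isolation idea, assuming one host directory per NFS server instance (the /srv/nfsN paths are placeholders, not part of the guide): each container gets its own /var/lib/nfs state and its own export directory via bind mounts, so instances never share NFS state.
```
# Assumed layout: one state directory and one share directory per instance.
for i in 1 2; do
    mkdir -p /srv/nfs${i}/state /srv/nfs${i}/share
done
# Each container then only sees its own state and export:
#   docker run -v /srv/nfs${i}/state:/var/lib/nfs \
#              -v /srv/nfs${i}/share:/srv/nfs/share ...
```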
This project is part of:
Hack Week 23
Activity
Comments
- about 2 years ago by zzhou
```
- Exercise-1: Launch multiple NFS docker instances directly by systemd inside containers
- tumbleweed status target: 20231101
Dockerfile:
FROM opensuse/tumbleweed
RUN zypper -n install systemd nfs-kernel-server vim iproute2 iputils pacemaker-remote gawk which
RUN systemctl enable nfs-server
RUN echo "/srv/nfs/share *(rw)" > /etc/exports
CMD ["/usr/lib/systemd/systemd", "--system"]
- docker build -t nfsserver .
runnfsserverdocker () {
    i=$1   # eg. i=1
    N=nfsserver; h=$N-$i
    docker run -v /srv/nfs${i}/state:/var/lib/nfs \
        -v /srv/nfs${i}/share:/srv/nfs/share \
        -it --privileged --name=$h -h=$h $N &
}
runnfsserverdocker 1
runnfsserverdocker 2
- verify two nfsserver docker instances
ipnfsserver1=$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' nfsserver-1); echo $ipnfsserver1
ipnfsserver2=$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' nfsserver-2); echo $ipnfsserver2
showmount -e $ipnfsserver1
showmount -e $ipnfsserver2
tws-1:~ # showmount -e $ipnfsserver1
Export list for 172.17.0.3:
/srv/nfs/share *
tws-1:~ # showmount -e $ipnfsserver2
Export list for 172.17.0.4:
/srv/nfs/share *
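# Optional sanity check (not from the original notes; the mount point is a
# placeholder): mount one export from the host against the bridge IP above.
mkdir -p /mnt/nfstest
mount -t nfs -o vers=4 $ipnfsserver1:/srv/nfs/share /mnt/nfstest
touch /mnt/nfstest/hello && ls /mnt/nfstest
umount /mnt/nfstest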
- Exercise-2: Launch NFS server docker directly by pacemaker-remote inside containers
- Add a pacemaker docker bundle into the CIB.xml. TODO: FEAT: crmsh does not support container bundles yet
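# One possible interim workaround while crmsh lacks bundle support (sketch only;
# the bundle name, image and mappings below are assumptions): write the bundle
# XML by hand and load it into the CIB with cibadmin.
cat > /tmp/nfsserver-bundle.xml <<'EOF'
<bundle id="nfsserver1">
  <docker image="nfsserver" options="--privileged"/>
  <network ip-range-start="192.168.1.200">
    <port-mapping id="nfs1-port-data" port="2049"/>
  </network>
  <storage>
    <storage-mapping id="nfs1-state" source-dir="/srv/nfs1/state" target-dir="/var/lib/nfs" options="rw"/>
  </storage>
</bundle>
EOF
cibadmin -C -o resources -x /tmp/nfsserver-bundle.xml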
- Summary:
Unfortunately, in the end this reveals a significant "show stopper" issue: pcmk-init for pacemaker-remote inside the container conflicts with systemd, since both of them require PID 1.
Open questions:
- What is the purpose of pcmk-init in the bundle container?
- Is it possible to let the pacemaker bundle container still run systemd?
- Is there any solid/stable approach to run nfsserver containers without systemd?
```
- almost 2 years ago by zzhou
```
Back on this topic, with an update on some major progress since Hack Week 23.
In summary, the show-stopper issue in the past has been addressed. As a result, nfsserver can now operate within the Pacemaker bundle containers and can be distributed across the cluster nodes. The successful mounting of nfsserver exports on various cluster nodes using NFSv3 and v4 protocols has been confirmed.
The ongoing challenge lies with the showmount operation, specifically in the context of the NFS protocol GETADDR operation, which is currently not functioning correctly. Unfortunately, a resolution for this issue has not been identified yet.
The sample configuration is provided below. Some refinement may still be necessary, and adjustments might be required to enhance certain subtle NFS functionalities. See [1] for the Dockerfile and [2] for crm configure show.
[1] Dockerfile
FROM opensuse/tumbleweed
RUN zypper -n install systemd nfs-kernel-server vim iproute2 iputils pacemaker-remote gawk which
RUN echo -e "[Unit]\nRequires=pacemaker_remote.service\nAfter=pacemaker_remote.service\nRequires=nfs-server.service\nAfter=nfs-server.service" > /usr/lib/systemd/system/run_pcmk_remote_and_nfs_server.target
RUN mkdir -p /usr/lib/systemd/system/nfs-server.service.d
RUN echo -e "[Service]\nExecStartPre=/usr/bin/mkdir -p /var/lib/nfs/sm /var/lib/nfs/sm.bak" > /usr/lib/systemd/system/nfs-server.service.d/10-prepare-dirs.conf
RUN sed -e 's/STATD_PORT=.*/STATD_PORT="662"/' -i /etc/sysconfig/nfs
RUN sed -e 's/LOCKD_TCPPORT=.*/LOCKD_TCPPORT="32768"/' -i /etc/sysconfig/nfs
RUN sed -e 's/LOCKD_UDPPORT=.*/LOCKD_UDPPORT="32768"/' -i /etc/sysconfig/nfs
CMD ["/usr/lib/systemd/systemd", "--system"]
[2] crm configure show
primitive drbd1 ocf:linbit:drbd \
    params drbd_resource=nfsserver1 \
    op monitor interval=15 role=Promoted timeout=20 \
    op monitor interval=30 role=Unpromoted timeout=20 \
    op start timeout=240 interval=0s \
    op promote timeout=90 interval=0s \
    op demote timeout=90 interval=0s \
    op stop timeout=100 interval=0s
primitive exportfs1 exportfs \
    params directory="/srv/nfs/share" options="rw,mountpoint" clientspec="*" fsid=0 \
    op monitor interval=30s timeout=40s \
    op start timeout=60s interval=0s \
    op stop timeout=120s interval=0s
primitive fs1 Filesystem \
    params device="/dev/drbd1" directory="/srv/nfs1" fstype=ext4 \
    op monitor interval=30s timeout=40s \
    op start timeout=60s interval=0s \
    op stop timeout=60s interval=0s
bundle nfsserver1 \
    docker image=nfsserver options="--privileged --stop-signal SIGRTMIN+3" run-command="/usr/lib/systemd/systemd --system --unit=run_pcmk_remote_and_nfs_server.target" \
    network ip-range-start=192.168.1.200 \
    port-mapping id=nfs1portsunrpc port=111 \
    port-mapping id=nfs1portdata port=2049 \
    port-mapping id=nfs1portrpcmount port=20048 \
    port-mapping id=nfs1portstatd port=662 \
    port-mapping id=nfs1portlockd-tcpudp port=32768 \
    storage \
    storage-mapping id=nfs1-state source-dir="/srv/nfs1/state" target-dir="/var/lib/nfs" options=rw \
    storage-mapping id=nfs1-share source-dir="/srv/nfs1/share" target-dir="/srv/nfs/share" options=rw \
    meta target-role=Started \
    primitive exportfs1
clone drbd-nfs1 drbd1 \
    meta promotable=true promoted-max=1 promoted-node-max=1 clone-max=2 clone-node-max=1 notify=true interleave=true
colocation col-nfs1-on-drbd inf: nfsserver1 fs1 drbd-nfs1:Promoted
order o-drbd-before-nfs1 Mandatory: drbd-nfs1:promote fs1:start nfsserver1
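# Hedged client-side check (192.168.1.200 comes from ip-range-start above; the
# mount points are placeholders). exportfs1 uses fsid=0, so NFSv4 mounts the
# pseudo-root "/" while NFSv3 mounts the real export path.
mkdir -p /mnt/nfs3 /mnt/nfs4
mount -t nfs -o vers=4 192.168.1.200:/ /mnt/nfs4
mount -t nfs -o vers=3 192.168.1.200:/srv/nfs/share /mnt/nfs3
# For the unresolved showmount issue, rpcinfo shows which RPC programs the
# bundle actually answers for:
rpcinfo -p 192.168.1.200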
```
Similar Projects
Arcticwolf - A Rust-based user-space NFS server by vcheng
Description
Rust offers performance similar to C, along with better async I/O modules and tight integration with io_uring. This project aims to develop a user-space NFS server based on Rust.
Goals
- Get an understanding of how cargo works
- Get an understanding of how the XDR code is generated with xdrgen
- Create a Rust-based NFS server that supports basic operations like mount/readdir/read/write
Resources
https://github.com/Vicente-Cheng/arcticwolf
Hacking a SUSE MLS 7.9 Cluster by roseswe
Description
SUSE MLS (Multi-Linux Support) - A subscription where SUSE provides technical support and updates for Red Hat Enterprise Linux (RHEL) and CentOS servers
The most significant operational difference between SUSE MLS 7 and the standard SUSE Linux Enterprise Server High Availability Extension (SLES HAE) lies in the administrative toolchain. While both distributions rely on the same underlying Pacemaker resource manager and Corosync messaging layer, MLS 7 preserves the native Red Hat Enterprise Linux 7 user space. Consequently, MLS 7 administrators must utilize the Pacemaker Configuration System (pcs), a monolithic and imperative tool. The pcs utility abstracts the entire stack, controlling Corosync networking, cluster bootstrapping, and resource management through single-line commands that automatically generate the necessary configuration files. In contrast, SLES HAE employs the Cluster Resource Management Shell (crmsh). The crm utility operates as a declarative shell that focuses primarily on the Cluster Information Base (CIB). Unlike the command-driven nature of pcs, crmsh allows administrators to enter a configuration context to define the desired state of the cluster using syntax that maps closely to the underlying XML structure. This makes SLES HAE more flexible for complex edits but requires a different syntax knowledge base compared to the rigid, command-execution workflow of MLS 7.
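As a rough illustration of the difference (the IPaddr2 resource and address below are generic examples, not taken from this project), the same floating IP would be created like this with each toolchain:
```
# MLS 7 / RHEL 7 user space: imperative pcs one-liner
pcs resource create vip ocf:heartbeat:IPaddr2 ip=192.168.1.50 op monitor interval=10s

# SLES HAE: declarative crmsh, close to the CIB structure
crm configure primitive vip ocf:heartbeat:IPaddr2 \
    params ip=192.168.1.50 \
    op monitor interval=10s
```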
The scope here is MLS 7.9.
Goals
- Get more familiar with the MLS 7.9 HA toolchain, graphical user interface, and daemons
- Create a two node MLS cluster with SBD
- Check different use cases
- Create a "SUSE Best Practices" presentation slide set suitable for Consulting Customers
Resources
- You need MLS 7.9 (qcow2) installed, plus a subscription
- A KVM host with 2 guest VMs and 2 SBD devices
- RHEL7 and HA skills
Work on kqlite (lightweight remote SQLite with high availability and auto failover) by epenchev
Description
Continue the work on kqlite (Lightweight remote SQLite with high availability and auto failover).
It's a solution for applications that require High Availability but don't need all the features of a complete RDBMS and can fit SQLite in their use case.
kqlite could also be used as a lightweight storage backend for K8s (https://docs.k3s.io/datastore) and the edge, allowing HA with only 2 nodes.
Goals
- Push kqlite to a beta version.
- kqlite as a library for Go programs.
Resources
https://github.com/kqlite/kqlite
Uyuni read-only replica by cbosdonnat
Description
For now, no HA setup is possible for Uyuni. The idea is to explore setting up a read-only shadow instance of Uyuni and make it as useful as possible.
Possible things to look at:
- live sync of the database, probably using the WAL (see the sketch after this list). Some of the tables may have to be skipped or some features disabled on the RO instance (taskomatic, PXT sessions…)
- Can we use a load balancer that routes read-only queries to either instance and everything else to the RW one? For example, packages or PXE data can be served by both, as can API GET requests. The rest would be RW.
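A possible starting point for the live database sync, assuming Uyuni's PostgreSQL database and plain streaming replication (hostnames and paths are placeholders; Uyuni-specific tables and features would still need separate handling):
```
# On the read-only replica: clone the primary and stream its WAL.
pg_basebackup -h uyuni-primary -U replication -D /var/lib/pgsql/data -R -X stream -P
# -R writes the standby configuration (primary_conninfo), -X stream ships WAL
# during the base backup; afterwards the standby keeps replaying WAL continuously.
```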
Goals
- Prepare a document explaining how to do it.
- PR with the needed code changes to support it