Description
When setting up their management cluster, Rancher users start by installing... Rancher. This can quickly result in complex sets of config files, not to mention that upgrading Rancher itself is a manual process.
What if... Rancher itself could be configured through Fleet, harnessing the power of GitOps and/or HelmOps?
Goals
- Find a way for Rancher, at install time, to adopt an existing Fleet release instead of installing a new one, provided that the Rancher and Fleet releases are compatible with one another
- Check what happens in Rancher upgrade cases, and improve what can and should be improved
Stretch goals
- Establish best practices for configuring Rancher through Fleet (e.g. fleet.yaml or equivalent files, repository structure, etc.)
Resources
- Rancher Docs
- Rancher repo
- Fleet Docs
- Fleet repo
- Curiosity!
Outcome
Preliminary research
What already worked
Fleet could already install Rancher, just like any Helm chart, through a GitRepo or HelmOp resource. However, Rancher would then overwrite a pre-existing Fleet deployment with its own, only taking its own configuration (e.g. pinned Fleet version) into account.
Output of this project
Git repository structure enabling Rancher to be installed in a GitOps fashion
Rancher can be set up by doing the following:
- Installing Fleet
- Creating a GitRepo as follows (or by saving the GitRepo itself into a file and applying it through kubectl apply -n fleet-local -f $file):
cat << EOF | kubectl apply -n fleet-local -f -
kind: GitRepo
apiVersion: fleet.cattle.io/v1alpha1
metadata:
  name: install-rancher
spec:
  repo: https://github.com/weyfonk/test-fleet
  branch: test-install-rancher
  bundles:
  - base: cert-manager
  - base: rancher-cfg
  - base: rancher
  targets:
  - clusterName: local
EOF
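The file-based variant mentioned above can be sketched as a small shell function. This is only an illustration: the function name write_gitrepo and its optional repo/branch arguments are hypothetical helpers, not part of Fleet or of the test repository; the manifest content is the one shown above.

```shell
#!/bin/sh
# write_gitrepo FILE [REPO] [BRANCH]
# Hypothetical helper: writes the GitRepo manifest to FILE so that it can be
# reviewed or committed before being applied. REPO and BRANCH default to the
# values used in this project.
write_gitrepo() {
  file="$1"
  repo="${2:-https://github.com/weyfonk/test-fleet}"
  branch="${3:-test-install-rancher}"
  cat > "$file" << EOF
kind: GitRepo
apiVersion: fleet.cattle.io/v1alpha1
metadata:
  name: install-rancher
spec:
  repo: ${repo}
  branch: ${branch}
  bundles:
  - base: cert-manager
  - base: rancher-cfg
  - base: rancher
  targets:
  - clusterName: local
EOF
}
```

A possible usage would be write_gitrepo install-rancher.yaml, followed by kubectl apply -n fleet-local -f install-rancher.yaml.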
This will create 3 bundles:
- cert-manager, which will install cert-manager v1.19.0
- rancher-cfg, containing a config map with Rancher values; this is stored in a separate directory and created as a different bundle to enable dynamic resolution of a Traefik load balancer service IP as Rancher's hostname value.
- rancher, depending on the above two, installing Rancher 2.13.0.
Note: at the time of writing, this requires Fleet v0.15.0-alpha.1 or above, which has improved Helm lookup
support. This will eventually also be possible with Fleet v0.14, as the lookup fix has been backported there, but that
branch needs a new release including the fix.
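To give an idea of how the rancher bundle could depend on the other two and read its values from a config map, here is a minimal sketch of a fleet.yaml, written out by a hypothetical shell helper. It is not copied from the test repository: the config map name (rancher-values), its key (values.yaml), the chart repository URL and the dependency bundle names (assumed to follow a "GitRepo name, then base directory" pattern) are all assumptions to check against an actual setup.

```shell
#!/bin/sh
# write_rancher_fleet_yaml [DIR]
# Hypothetical helper: writes an illustrative fleet.yaml for the rancher
# bundle into DIR (default: ./rancher).
write_rancher_fleet_yaml() {
  dir="${1:-rancher}"
  mkdir -p "$dir" || return 1
  cat > "$dir/fleet.yaml" << 'EOF'
# Usual Rancher namespace and release name (assumed here).
defaultNamespace: cattle-system
helm:
  releaseName: rancher
  chart: rancher
  # Assumed chart repository; adjust to the one actually in use.
  repo: https://releases.rancher.com/server-charts/latest
  version: 2.13.0
  # Hypothetical config map, expected to be created by the rancher-cfg bundle.
  valuesFrom:
  - configMapKeyRef:
      name: rancher-values
      namespace: cattle-system
      key: values.yaml
# Assumed bundle names: <GitRepo name>-<base directory>.
dependsOn:
- name: install-rancher-cert-manager
- name: install-rancher-rancher-cfg
EOF
}
```

Keeping the values in a config map owned by a separate bundle is what allows them to be computed at deployment time (for instance the hostname), while dependsOn keeps Rancher from being installed before cert-manager and those values exist.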
Rancher patch enabling Fleet adoption
See feature branch, containing a few commits enabling Rancher's Fleet charts controller to:
- Check if a new FleetBeforeRancher flag is enabled
- If so, read the existing Fleet release, extract its version and values, and merge them with values corresponding to watched Rancher settings when such settings are updated (as opposed to installing the Fleet version pinned with Rancher, with Rancher-populated values exclusively)
These commits are of course not part of any Rancher release at this point. Testing them can be done as follows:
- Check out the feature branch locally
- Run make quick
- Tag the created rancher/rancher image as if it belonged to Rancher 2.13.0, e.g. (beware of the commit SHA, as it may change as new commits are pushed to that branch):
  docker tag rancher/rancher:v2.14-33eddc4a0-head rancher/rancher:v2.13.0
- If using k3d, import the image into your cluster, e.g.:
  k3d image import -c upstream -m direct rancher/rancher:v2.13.0
- Run the steps described in the previous section to install Rancher through Fleet
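Since the image tag produced by the build changes with every new commit, the tagging and import steps above could be wrapped in a small script taking that tag as a parameter. The following is a sketch only: the run and retag_and_import helpers and the DRY_RUN variable are hypothetical names introduced for illustration, while the docker and k3d commands are the ones listed above.

```shell
#!/bin/sh
# run CMD [ARGS...]
# Executes the command, or only prints it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# retag_and_import SOURCE_TAG [CLUSTER] [TARGET_TAG]
# Tags the locally built rancher/rancher image as TARGET_TAG (default v2.13.0)
# and imports it into the k3d cluster CLUSTER (default upstream).
retag_and_import() {
  source_tag="$1"
  cluster="${2:-upstream}"
  target_tag="${3:-v2.13.0}"
  run docker tag "rancher/rancher:${source_tag}" "rancher/rancher:${target_tag}" || return 1
  run k3d image import -c "$cluster" -m direct "rancher/rancher:${target_tag}"
}
```

Setting DRY_RUN=1 before calling retag_and_import v2.14-33eddc4a0-head prints both commands without running them, which helps double-check the tag before touching the cluster.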
Bug fix in Fleet's chart URL resolution
Testing the above surfaced a glitch in Fleet's chart URL resolution; more info here. This glitch had no effect on users or customers, as it was only present on Fleet's main branch following recent refactoring.
Kudos and thanks to Alejandro Ruiz for his swift help!
Way forward
Rancher configuration monitoring through Fleet
- With Rancher being installed through GitOps, secrets and config maps referenced by the Rancher deployment are subject to changes, which should lead to updates of the Rancher deployment accordingly. Fleet does not do this yet: it is a known shortcoming of Fleet's current usage of valuesFrom, and should be addressed through this issue.
Readiness
Bundle diffs have been added to prevent additional resources, such as CRDs, installed by the Cert Manager bundle,
from appearing as not owned by the bundle, which resulted in its status appearing as Modified. However, the same
remains to be done for Rancher.
Manual Fleet updates in Rancher
Updating Fleet manually through the Rancher UI still causes install loops, with troubleshooting work pending to diagnose and fix them.
RBAC
With Rancher installed through Fleet, a Rancher bundle will be visible through Rancher's Continuous Delivery UI. Without additional RBAC, users would be able to edit or even delete that bundle, with potentially disastrous consequences. This, together with the previous point on manually updating Fleet through the Rancher UI, raises the following question: with elements of Rancher configuration owned by Fleet, which workflow, and which restrictions or changes to the current UX, should be expected? In particular:
- Should parts of the UI be greyed out with a warning message stating that their elements are owned by Fleet, prompting authorised users to apply changes through git commits rather than the Rancher UI?
- Or the other way around: should authorised users be able to make changes in the UI, which would then be reflected in git? (this would be contrary to GitOps, not to mention risks of conflicts)
Cluster registration
In the interest of time, this Hack Week's efforts have been focused on a minimal single-cluster setup. However, users and customers will be more interested in running this in a multi-cluster fashion. This would require more tests for scenarios such as cluster registration. In particular, what would happen:
- When registering a cluster through Rancher, with Fleet having been installed before Rancher? Cluster registration should be triggered as usual by Rancher, but successful deployment of workloads by Fleet should be tested
- When updating a multi-cluster workload deployed by Fleet prior to the Rancher installation? Would any conflicts or unexpected glitches appear?
This project is part of:
Hack Week 25