rootless img init-directories fail on Talos with enforce:baseline and audit + warn: restrictive #705
I've been trying all day to make this work and I just don't understand what I am doing wrong. Perhaps it's the combination of things.
I run a Talos cluster, 5 nodes bare metal with Cilium and kube-proxy replacement. Storage provider is Longhorn 1.7.0.
Talos default pod security admission is in place:
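For reference, a sketch of what the Talos default typically looks like as a machine config fragment. It is assumed from the Talos documentation and matches the levels named in the title (enforce: baseline, audit and warn: restricted); it is not copied from this cluster.

```yaml
# Assumed Talos default PodSecurity admission configuration (machine config fragment).
cluster:
  apiServer:
    admissionControl:
      - name: PodSecurity
        configuration:
          apiVersion: pod-security.admission.config.k8s.io/v1alpha1
          kind: PodSecurityConfiguration
          defaults:
            enforce: baseline        # pods violating "baseline" are rejected
            enforce-version: latest
            audit: restricted        # violations of "restricted" are only logged
            audit-version: latest
            warn: restricted         # violations of "restricted" only produce warnings
            warn-version: latest
          exemptions:
            namespaces:
              - kube-system
            runtimeClasses: []
            usernames: []
```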
So, as expected if I don't change anything the Helm deployment will complain:
I have set in my values.yaml:
And:
And finally, this part, which I am not even sure is needed. I found it online somewhere and figured I'd try it:
I tried the above one piece at a time. I tried different variations, such as setting allowPrivilegeEscalation: true, runAsNonRoot: false and runAsUser: 0. I also added:
I'm a bit lost, to be honest. Can someone help me out here fixing this persistent issue?
I figured I'd run the root image instead, set the namespace labels to privileged and be done with it. That worked, until I tried running gitea doctor after a restore (I am migrating from Docker) and decided I need to fix this properly.
Appreciate the help!
Thanks.
I found the issue.
I have defined:
Before I deploy Gitea I create the namespace, set pod security labels and define the following two items:
No matter what I try when this is present on a new deployment (that creates new volumes) the init container fails.
I remove the extraVolumes and extraVolumeMounts and it deploys fine. I re-add these and then it works.
And now I can put namespace pod security back to defaults:
and define:
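As a rough reference only, a values.yaml sketch of security contexts that would satisfy the "restricted" Pod Security Standard with the rootless image. The field values are assumptions derived from that standard and from UID/GID 1000 mentioned later in this thread; they are not the values actually used here.

```yaml
# Sketch only: assumed values, not the configuration from this thread.
image:
  rootless: true

podSecurityContext:
  fsGroup: 1000              # assumed: group that should own the data volume

containerSecurityContext:
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL                  # "restricted" requires dropping all capabilities
  runAsNonRoot: true
  runAsUser: 1000            # assumed UID of the git user
  runAsGroup: 1000
  seccompProfile:
    type: RuntimeDefault     # "restricted" requires RuntimeDefault or Localhost
```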
Without the extraVolumes and extraVolumeMounts this deploys without a problem.
Damn, this had me chasing my tail all day ... :(
So I don't know if this is something that can be solved? Maybe it's just not possible to deploy it all at once?
Gosh, I'm sorry, I just don't understand why this issue keeps coming back. I've deployed lots of stuff and it all works. Some of it takes some troubleshooting of logs and events, but I always work it out. Straight YAML manifests are often easiest, but Helm charts shouldn't be a problem for me. It seems that for some reason it works and I think I have worked it out, then I uninstall and delete the namespace to deploy from scratch, and it doesn't work anymore. Somehow I seem to do something that solves it. I keep notes and work instructions and I save my changes to the same Helm values file. I test changes to the chart values by running
helm upgrade --install --namespace gitea -f my-values.yaml gitea gitea-charts/gitea
I just can't work out why this won't deploy from scratch. I keep getting the dreaded permission denied on init-directories:
If someone could please help me out here I'd appreciate it!
Thanks in advance!
Hard to say what the issue is, but given you're running Talos and it comes with its own security admissions, I'd guess it is related to that.
We haven't had any reported issues with the rootless image on "common" distributions.
I don't have time to dive into this in more detail. The only thing I can add is that Micro OS and Bottlerocket, two other container-optimized OSes, are working fine with the rootless image.
Appreciate the reply and I respect that, we all do what we can with the time we have.
Are you OK with leaving it open for a while, to see if someone chimes in?
Sure, maybe you can update the title to include "Talos" in some way, as it is likely that it has an influence here and might help others with search.
Maybe there is a way to temporarily turn off the security admissions and then reapply them step by step to narrow it down?
I don't think it's a Talos issue. I am also having this issue trying to deploy Gitea in an AKS cluster.
We use the Azure File Provider. Whenever I start the container with the rootless image, I get:
chmod: /data/ssh: Operation not permitted
I have tried all the possible combinations of securityContext and only running as root fixed it. But then Gitea won't start, as the rootless image will not allow running as root.
The only workaround I found is to use the rootful image and set the ownership of the data directory to user 1000 in a pre-init script. Besides running as root, which I want to avoid, the Gitea container then cannot start either, because the OpenSSH daemon fails due to overly open permissions on the SSH keys (777). A Gitea config which deactivates SSH (I don't need it) doesn't help either:
Thanks for your input!
Azure and its Gatekeeper are also known to be quite restrictive. I haven't deployed Gitea on AKS myself yet, though.
Interesting. I can confirm that in my installation, /data/ssh is also 777 but not causing an issue (this should not imply that everything is good as-is).
I guess a deep dive is needed into the rootless image and the init-containers to inspect all of the above in more detail. Any research/help is appreciated.
I'll edit the title to clarify.
However, Talos doesn't really do anything special in this respect. It simply sets security contexts, albeit a little more restrictive than a default Kubernetes install on a regular distribution. Talos does not supply its own security admissions, at least not that I am aware of.
When I set:
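A sketch of the kind of namespace labels meant here, assuming the standard Pod Security Admission label keys and the gitea namespace used in the helm command earlier in the thread. The exact labels that were posted are not shown, so treat this as an illustration.

```yaml
# Sketch only: standard Pod Security Admission labels set to "privileged".
apiVersion: v1
kind: Namespace
metadata:
  name: gitea
  labels:
    pod-security.kubernetes.io/enforce: privileged
    pod-security.kubernetes.io/audit: privileged
    pod-security.kubernetes.io/warn: privileged
```

Labels on the namespace take precedence over the cluster-wide defaults, which is why this effectively switches the checks off for that namespace only.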
It will negate / ignore / disable any and all security admissions. I have tried that and still got the error. However, I have been trying so many things, including custom init containers, and it's quite possible I might have had another issue as well.
I can use machineconfig to edit the Talos defaults, too. But it does nothing more than set a default, which I can override at the namespace level. The above namespace labels would simply translate to machine config:
So apart from being more restrictive by default, there is nothing different in Talos from another Kubernetes deployment.
Later today I will try another deployment and set everything to privileged again.
@pat-s I was able to solve the issue for AKS. I was so fixated on changing permissions via the Helm chart that I completely forgot I can change them for the volume itself. Oh man.
So I created a storage class, which mounts the volume for user and group 1000. Using it for the Gitea data volume works.
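A minimal sketch of such a storage class, assuming the Azure Files CSI provisioner. The class name and skuName are hypothetical placeholders; only the uid/gid of 1000 comes from the description above.

```yaml
# Sketch only: name and skuName are hypothetical, uid/gid 1000 is from the comment above.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azurefile-gitea          # hypothetical name
provisioner: file.csi.azure.com
parameters:
  skuName: Standard_LRS          # assumed SKU
reclaimPolicy: Retain
allowVolumeExpansion: true
mountOptions:
  - uid=1000                     # files on the share appear owned by user 1000
  - gid=1000                     # and group 1000
  # dir_mode / file_mode can also be set here; they are left out because
  # the values used in this thread are not known.
```

Because ownership on an SMB share is decided at mount time, chmod/chown from inside the container has no effect, which would fit the "Operation not permitted" error above.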
Oh, that's valuable feedback. I need to check if that would work on Longhorn as well.
Edit: that's not going to work on block storage (vs file storage on Azure).
Meh ... 🫤
@sleepingleopard Thanks for sharing!
@Outspoken I am also running Longhorn (on k3s with Micro OS) using the rootless image and haven't faced any issues yet.
Are you on Azure now using their CSI drivers or using Longhorn (on Talos?)?
One option might be to create /data with appropriate permissions already during image creation. This would simplify a lot, as creating dirs and chmod'ing is a known pain in certain environments. @techknowlogick What do you think?
@pat-s I am self-hosting on bare metal with Talos, Longhorn and Cilium. I just didn't read his message very well and thought I'd try that, until I realized he was using the Azure file provider and not block storage.
I can test in a separate namespace to troubleshoot and help resolve the issue, but it will have to wait until October due to holiday leave.
I'm running into a similar issue, but with a bare-metal, kubeadm-bootstrapped cluster. I'm loading the git directories through an NFS mount
And the following security context
If there is no pod running, all is fine. Though when I update the config and re-apply, the pod errors in the init_directory_structure.sh script with chmod: /data/.ssh: Operation not permitted.
For now I've worked around this with the initPreScript