rootless img init-directories fail on Talos with enforce:baseline and audit + warn: restrictive #705

Open
opened 2024-08-28 14:59:16 +00:00 by Outspoken · 14 comments
Outspoken commented 2024-08-28 14:59:16 +00:00 (Migrated from gitea.com)

I've been trying all day to make this work and I just don't understand what I am doing wrong. Perhaps it's the combination of things.

I run a Talos cluster, 5 nodes bare metal with Cilium and kube-proxy replacement. Storage provider is Longhorn 1.7.0.

Talos default pod security admission is in place:

```yaml
apiVersion: pod-security.admission.config.k8s.io/v1alpha1
kind: PodSecurityConfiguration
defaults:
  enforce: "baseline"
  enforce-version: "latest"
  audit: "restricted"
  audit-version: "latest"
  warn: "restricted"
  warn-version: "latest"
```

So, as expected, if I don't change anything the Helm deployment will complain:

```sh
W0828 13:51:26.773881   54860 warnings.go:70] would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (containers "init-directories", "init-app-ini", "configure-gitea", "gitea" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (containers "init-directories", "init-app-ini", "configure-gitea", "gitea" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or containers "init-directories", "init-app-ini", "configure-gitea", "gitea" must set securityContext.runAsNonRoot=true), seccompProfile (pod or containers "init-directories", "init-app-ini", "configure-gitea", "gitea" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
```

I have set in my values.yaml:

```yaml
podSecurityContext:
  fsGroup: 1000
  seccompProfile:
    type: RuntimeDefault

containerSecurityContext:
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL
  privileged: false
  readOnlyRootFilesystem: true
  runAsGroup: 1000
  runAsNonRoot: true
  runAsUser: 1000
  seccompProfile:
    type: RuntimeDefault
```

And:

```yaml
initContainers:
  - name: init-directories
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      runAsNonRoot: true
      runAsUser: 1000
      seccompProfile:
        type: RuntimeDefault
  - name: init-app-ini
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      runAsNonRoot: true
      runAsUser: 1000
      seccompProfile:
        type: RuntimeDefault
  - name: configure-gitea
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      runAsNonRoot: true
      runAsUser: 1000
      seccompProfile:
        type: RuntimeDefault
```

And finally this part, of which I am not even sure it's needed. I found it online somewhere and figured I'd try it:

```yaml
gitea:
  podSecurityContext:
    fsGroup: 1000
  containerSecurityContext:
    allowPrivilegeEscalation: false
    capabilities:
      drop:
      - ALL
    runAsNonRoot: true
    runAsUser: 1000
    seccompProfile:
      type: RuntimeDefault
```

I tried the above one piece at a time. I tried different variations, such as setting `allowPrivilegeEscalation: true` and setting `runAsNonRoot: false` and `runAsUser: 0`.

I also added:

```yaml
initContainers:
  - name: init-permissions
    image: busybox
    command: ["sh", "-c", "mkdir -p /data/gitea/conf && chown -R 1000:1000 /data/gitea/conf"]
    volumeMounts:
      - name: gitea-data
        mountPath: /data/gitea/conf
    securityContext:
      runAsUser: 0  # Run as root to change permissions
```

I'm a bit lost, to be honest. Can someone help me out here with fixing this persistent issue?

I figured I'd run the root image instead, set the namespace labels to privileged and be done with it. That worked, until I tried running `gitea doctor` after a restore (I am migrating from Docker) and decided I needed to fix this properly.

Appreciate the help!
Thanks.

Outspoken commented 2024-08-28 18:12:01 +00:00 (Migrated from gitea.com)

I found the issue.

I have defined:

```yaml
extraVolumes:
  - name: gitea-themes
    secret:
      secretName: gitea-themes
  - name: custom-templates
    configMap:
      name: gitea-custom-templates

extraVolumeMounts:
  - name: gitea-themes
    readOnly: true
    mountPath: "/data/gitea/public/assets/css"
  - name: custom-templates
    mountPath: /data/gitea/templates/custom/
```

Before I deploy Gitea I create the namespace, set pod security labels and define the following two items:

```yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: gitea-custom-templates
  namespace: gitea
data:
  extra_links.tmpl: |
    <a class="item" href="{{AppSubUrl}}/wiki/wiki/wiki">Wiki</a>    
---
apiVersion: v1
kind: Secret
metadata:
  name: gitea-themes
  namespace: gitea
type: Opaque
data:
  theme-tangerine-dream.css: |-
```

No matter what I try when this is present on a new deployment (that creates new volumes) the init container fails.

If I remove the `extraVolumes` and `extraVolumeMounts` it deploys fine. When I re-add them afterwards, it works.

Outspoken commented 2024-08-28 18:23:56 +00:00 (Migrated from gitea.com)

And now I can put namespace pod security back to defaults:

```yaml
defaults:
  enforce: "baseline"
  enforce-version: "latest"
  audit: "restricted"
  audit-version: "latest"
  warn: "restricted"
  warn-version: "latest"
```

and define:

```yaml
podSecurityContext:
  fsGroup: 1000
  seccompProfile:
    type: RuntimeDefault

containerSecurityContext:
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL
  privileged: false
  readOnlyRootFilesystem: true
  runAsGroup: 1000
  runAsNonRoot: true
  runAsUser: 1000
  seccompProfile:
    type: RuntimeDefault
```

Without the `extraVolumes` and `extraVolumeMounts` this deploys without a problem.

Damn this had me chasing my tail all day ... :(

So I don't know if this is something that can be solved? Maybe it's just not possible to deploy it all at once?
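As an aside, when the underlying problem is volume ownership rather than admission, Kubernetes also lets the kubelet fix ownership on mount via `fsGroupChangePolicy`. A hedged sketch, not something tried in this thread — it only has an effect if the CSI driver supports fsGroup-based ownership changes:

```yaml
# Sketch: ask the kubelet to chown the volume contents to fsGroup on mount.
# Only effective if the CSI driver (e.g. Longhorn) supports fsGroup.
podSecurityContext:
  fsGroup: 1000
  fsGroupChangePolicy: "OnRootMismatch"  # re-chown only when the root dir's ownership doesn't match
  seccompProfile:
    type: RuntimeDefault
```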

Outspoken commented 2024-08-31 08:53:56 +00:00 (Migrated from gitea.com)

Gosh, I'm sorry, I just don't understand why this issue keeps coming back. I've deployed lots of stuff and it all works. Some take some troubleshooting of logs and events, but I always work it out. Straight YAML manifests are often easiest, but Helm charts shouldn't be a problem for me. It seems that for some reason it works, I think I've worked it out, then I uninstall and delete the namespace to deploy from scratch and it doesn't work anymore. Somehow I seem to do something that solves it. I keep notes and work instructions and I save my changes to the same Helm values file. I test changes to the chart values by running `helm upgrade --install --namespace gitea -f my-values.yaml gitea gitea-charts/gitea`. I just can't work out why this won't deploy from scratch. I keep getting the dreaded permission denied on init-directories:

```sh
kc logs pod/gitea-7fddd946d4-wbpwp -c init-directories
+ mkdir -p /data/git/.ssh
+ chmod -R 700 /data/git/.ssh
+ '[' '!' -d /data/gitea/conf ']'
+ mkdir -p /data/gitea/conf
mkdir: can't create directory '/data/gitea/conf': Permission denied
```

If someone could please help me out here I'd appreciate it!
Thanks in advance!
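One way to narrow this down is to mount the same PVC in a throwaway pod and inspect the ownership of `/data` directly. A minimal sketch — the claim name below is an assumption, check the actual one with `kubectl get pvc -n gitea`:

```yaml
# Throwaway debug pod: mount the Gitea PVC and print /data's ownership.
apiVersion: v1
kind: Pod
metadata:
  name: pvc-inspect
  namespace: gitea
spec:
  restartPolicy: Never
  containers:
    - name: inspect
      image: busybox
      command: ["sh", "-c", "ls -lan /data && id"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: gitea-shared-storage  # assumption: verify with 'kubectl get pvc -n gitea'
```

`kubectl apply -f pvc-inspect.yaml` followed by `kubectl logs -n gitea pvc-inspect` then shows which UID/GID actually owns the directory the init container is failing to write to.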

pat-s commented 2024-09-02 08:17:11 +00:00 (Migrated from gitea.com)

Hard to say what the issue is, but given you're running Talos and it comes with its own security admissions, I'd guess it is related to that.
We haven't had any reported issues with the rootless image on "common" distributions.

I don't have time to dive into this in more detail. The only thing I can add is that [Micro OS](https://microos.opensuse.org/) and [bottlerocket](https://bottlerocket.dev/en/), two other container-optimized OSes, are working fine with the rootless image.

Outspoken commented 2024-09-02 08:23:25 +00:00 (Migrated from gitea.com)

Appreciate the reply and I respect that, we all do what we can with the time we have.

Are you okay with leaving it open for a while, to see if someone chimes in?

pat-s commented 2024-09-02 08:32:41 +00:00 (Migrated from gitea.com)

Sure, maybe you can update the title to include "Talos" in some way, as it is likely that it has an influence here and it might help others with regard to search.

Maybe there is a way to temporarily turn off the security admissions and then reapply them step by step to narrow it down?

sleepingleopard commented 2024-09-02 08:55:01 +00:00 (Migrated from gitea.com)

I don't think it's a Talos issue. I am also having this issue trying to deploy Gitea in an AKS cluster.
We use the Azure File provider. Whenever I start the container with the rootless image, I get:
`chmod: /data/ssh: Operation not permitted`

I have tried all the possible combinations of securityContext and only running as root fixed it. But then Gitea won't start, as the rootless image will not allow running as root.

The only workaround I found is to use the rootful image and set the ownership of the data directory to user 1000 in a pre-init script. Besides requiring root, which I want to avoid, the Gitea container then still cannot start, since the OpenSSH daemon fails because of too-open permissions on the SSH keys (777). A Gitea config which deactivates SSH (I don't need it) also won't help:

```yaml
gitea:
  config:
    server:
      DISABLE_SSH: true
      START_SSH_SERVER: false
      SSH_CREATE_AUTHORIZED_KEYS_FILE: false
```
pat-s commented 2024-09-02 09:02:56 +00:00 (Migrated from gitea.com)

Thanks for your input!

Azure and its Gatekeeper are also known to be quite restrictive. I haven't deployed Gitea on AKS myself yet, though.

> since OpenSSH-daemon fails because of too open permissions for the SSH keys (777)

Interesting. I can confirm that in my installation, `/data/ssh` is also 777 but not causing an issue (this should not imply that everything is good as-is).

I guess a deep dive is needed into the rootless image and the init-containers to inspect all of the above in more detail. Any research/help is appreciated.

Outspoken commented 2024-09-02 09:11:37 +00:00 (Migrated from gitea.com)

I'll edit the title to clarify.

However, Talos doesn't really do anything special in this respect; it simply sets pod security admission defaults, albeit a little more restrictive than a default Kubernetes install on a regular distribution. Talos does not supply its own security admissions, at least not that I am aware of.

When I set:

```yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: netdata
  labels:
    pod-security.kubernetes.io/enforce: privileged
    pod-security.kubernetes.io/audit: privileged
    pod-security.kubernetes.io/warn: privileged
```

It will negate/ignore/disable any and all security admissions. I have tried that and still got the error. However, I have been trying so many things, including custom init containers, and it's quite possible I might have had another issue as well.

I can use the machine config to edit the Talos defaults too, but that does nothing more than set a default, which I can override at the namespace level. The above namespace labels would simply translate to this machine config:

```yaml
defaults:
  enforce: "privileged"
  enforce-version: "latest"
  audit: "privileged"
  audit-version: "latest"
  warn: "privileged"
  warn-version: "latest"
```

So apart from being more restrictive by default, there is nothing different in Talos from another Kubernetes deployment.

Later today I will try another deployment and set everything to privileged again.

sleepingleopard commented 2024-09-02 10:35:31 +00:00 (Migrated from gitea.com)

@pat-s I could solve the issue for AKS. I was so fixated on changing permissions via the Helm chart, I completely forgot that I can change them for the volume itself. Oh man.
So I created a storage class which mounts the volume for user and group 1000. Using it for the Gitea data volume works.

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: azurefile-retain-rootless
provisioner: file.csi.azure.com
allowVolumeExpansion: true
reclaimPolicy: Retain
mountOptions:
 - dir_mode=0777
 - file_mode=0777
 - uid=1000
 - gid=1000
 - mfsymlinks
 - cache=strict
 - actimeo=30
 - nobrl
parameters:
  skuName: Premium_LRS
```
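For reference, the chart can then be pointed at a class like this through its persistence values. A sketch, assuming the chart's `persistence.storageClass` parameter (check your chart version's values):

```yaml
# values.yaml fragment: provision the Gitea data volume from the
# rootless-friendly storage class defined above (name is an assumption).
persistence:
  enabled: true
  storageClass: azurefile-retain-rootless
```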
Outspoken commented 2024-09-02 10:37:17 +00:00 (Migrated from gitea.com)

Oh, that's valuable feedback. I need to check whether that would work on Longhorn as well.
Edit: that's not going to work on block storage (vs. file storage on Azure).

Meh ... 🫤

pat-s commented 2024-09-11 13:11:47 +00:00 (Migrated from gitea.com)

@sleepingleopard Thanks for sharing!

@Outspoken I am also running Longhorn (on k3s with Micro OS) using the rootless image and haven't faced any issues yet.
Are you on Azure now using their CSI drivers, or using Longhorn on Talos?

One option might be to create `/data` with appropriate permissions already during image creation. This would simplify a lot, as creating dirs and chmod'ing is a known pain in certain environments. @techknowlogick What do you think?
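Sketched as a Dockerfile layer, that suggestion would look roughly like the following (hypothetical, not the actual image definition):

```dockerfile
# Hypothetical addition to the rootless image build: pre-create the data
# directories with the gitea UID/GID so fresh volumes start out usable.
RUN mkdir -p /data/gitea/conf /data/git/.ssh \
    && chown -R 1000:1000 /data \
    && chmod -R 700 /data/git/.ssh
```

One caveat: whether a persistent volume mounted over `/data` actually inherits these baked-in permissions depends on the storage driver, so this would not help in every environment.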

Outspoken commented 2024-09-11 14:17:16 +00:00 (Migrated from gitea.com)

@pat-s I am self-hosting, bare metal with Talos, Longhorn, and Cilium. I just didn't read his message very well and thought I'd try that, until I realized he was using the Azure file provider and not block storage.

I can test in a separate namespace to troubleshoot and help resolve the issue, but it will have to wait until October due to holiday leave.

I3eastmaster commented 2024-10-06 15:05:32 +00:00 (Migrated from gitea.com)

I'm running into a similar issue, but with a bare-metal kubeadm-bootstrapped cluster. I'm loading the git directories through an NFS mount:

```yaml
extraVolumes:
- name: data
  nfs:
    server: server
    path: /path/to/gitea
```

And the following security context:

```yaml
podSecurityContext:
  fsGroup: 5300

containerSecurityContext:
  runAsGroup: 5300
  runAsUser: 5300
```

If there is no pod running, all is fine. Though when I update the config and re-apply, the pod errors in the `init_directory_structure.sh` script with `chmod: /data/.ssh: Operation not permitted`.

For now I've worked around this with the `initPreScript`:

```yaml
initPreScript: |
  if [[ -d /data/git/.ssh ]]; then
    exit 0
  fi
```
Reference: lunny/helm-chart#705