Postgres DB issues: too many clients already #431

Closed
opened 2023-04-06 10:46:19 +00:00 by serrnovik · 13 comments
serrnovik commented 2023-04-06 10:46:19 +00:00 (Migrated from gitea.com)

Hello,

With our helm installation:

I'm observing failures on `nuget restore` when several build agents of our CI (3) work at the same time; the crashes are frequent.

After examining the pod logs I noticed the following errors:

In the postgresql pod:

```log
2023-04-06 10:13:14.497 GMT [355528] FATAL: remaining connection slots are reserved for non-replication superuser connections
2023-04-06 10:13:14.500 GMT [355521] FATAL: remaining connection slots are reserved for non-replication superuser connections
2023-04-06 10:13:14.504 GMT [355531] FATAL: remaining connection slots are reserved for non-replication superuser connections
2023-04-06 10:13:14.511 GMT [355525] FATAL: remaining connection slots are reserved for non-replication superuser connections
2023-04-06 10:13:14.511 GMT [355555] FATAL: sorry, too many clients already
2023-04-06 10:13:14.511 GMT [355560] FATAL: sorry, too many clients already
2023-04-06 10:13:14.512 GMT [355557] FATAL: sorry, too many clients already
2023-04-06 10:13:14.512 GMT [355559] FATAL: sorry, too many clients already
2023-04-06 10:13:14.512 GMT [355558] FATAL: sorry, too many clients already
2023-04-06 10:13:14.513 GMT [355527] FATAL: remaining connection slots are reserved for non-replication superuser connections
```
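(For context: these FATAL messages mean PostgreSQL's `max_connections` limit, 100 by default with a few slots reserved for superusers, is exhausted. If the bundled PostgreSQL dependency is kept, the limit could in principle be raised via the sub-chart values; the sketch below is an assumption on my part, and the exact keys depend on the Bitnami sub-chart version.)

```yaml
# Sketch only: key names assume a recent Bitnami PostgreSQL sub-chart.
postgresql:
  primary:
    extendedConfiguration: |
      # PostgreSQL default is 100; every connection is a separate backend
      # process, so only raise this if the pod has memory headroom.
      max_connections = 200
```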

In the gitea pod:

```log
2023/04/06 10:13:18 ...ices/context/user.go:26:1() [E] [642e9b3e-161] GetUserByName: pq: sorry, too many clients already
2023/04/06 10:13:18 ...kages/nuget/nuget.go:29:apiError() [E] [642e9b3e-89] pq: sorry, too many clients already
2023/04/06 10:13:18 ...ices/context/user.go:26:1() [E] [642e9b3e-132] GetUserByName: pq: remaining connection slots are reserved for non-replication superuser connections
2023/04/06 10:13:18 ...rvices/auth/basic.go:129:Verify() [E] [642e9b3e-134] UserSignIn: pq: remaining connection slots are reserved for non-replication superuser connections
2023/04/06 10:13:18 ...ices/context/user.go:26:1() [E] [642e9b3e-141] GetUserByName: pq: remaining connection slots are reserved for non-replication superuser connections
2023/04/06 10:13:18 ...rvices/auth/basic.go:129:Verify() [E] [642e9b3e-168] UserSignIn: pq: sorry, too many clients already
2023/04/06 10:13:18 ...ices/context/user.go:26:1() [E] [642e9b3e-170] GetUserByName: pq: sorry, too many clients already
2023/04/06 10:13:18 ...ices/context/user.go:26:1() [E] [642e9b3e-113] GetUserByName: pq: remaining connection slots are reserved for non-replication superuser connections
```

And on the client side (build agent):

```log
:13:29       | Running 'dotnet restore "/opt/buildagent/work/5385ab0c70618735/code/iriska/ide/vs/Iriska.sln" --configfile "/opt/buildagent/work/5385ab0c70618735/ide/vs/NuGet.Config" --force /p:Configuration=Debug'...
12:13:29       | Allowed exit codes: none
12:16:06          >>
12:16:06          >> Welcome to .NET 6.0!
12:16:06          >> ---------------------
12:16:06          >> SDK Version: 6.0.202
12:16:06          >>
12:16:06          >> ----------------
12:16:06          >> Installed an ASP.NET Core HTTPS development certificate.
12:16:06          >> To trust the certificate run 'dotnet dev-certs https --trust' (Windows and macOS only).
12:16:06          >> Learn about HTTPS: https://aka.ms/dotnet-https
12:16:06          >> ----------------
12:16:06          >> Write your first app: https://aka.ms/dotnet-hello-world
12:16:06          >> Find out what's new: https://aka.ms/dotnet-whats-new
12:16:06          >> Explore documentation: https://aka.ms/dotnet-docs
12:16:06          >> Report issues and find source on GitHub: https://github.com/dotnet/core
12:16:06          >> Use 'dotnet --help' to see available commands or visit: https://aka.ms/dotnet-cli
12:16:06          >> --------------------------------------------------------------------------------------
12:16:06          >>   Determining projects to restore...
12:16:06          >>   Retrying 'FindPackagesByIdAsync' for source 'https://gitea.[REDACTED].io/api/packages/[REDACTED]/nuget/package/[REDACTED]/index.json'.
12:16:06          >>   Response status code does not indicate success: 401 (Unauthorized).
12:16:06          >>   Retrying 'FindPackagesByIdAsync' for source 'https://gitea.[REDACTED].io/api/packages/[REDACTED]/nuget/package/[REDACTED]/index.json'.
12:16:06          >>   Response status code does not indicate success: 401 (Unauthorized).
12:16:06          >>   Retrying 'FindPackagesByIdAsync' for source 'https://gitea.[REDACTED].io/api/packages/[REDACTED]/nuget/package/[REDACTED]/index.json'.
12:16:06          >>   Response status code does not indicate success: 500 (Internal Server Error).
... many more 401 and 500
```

I customized the chart in the following way (this should not affect the issue) and also tried to reduce logging:

```yaml
...
ingress:
  enabled: true
  # className: nginx
  className:
  annotations:
    {}
    # kubernetes.io/ingress.class: nginx
    # kubernetes.io/tls-acme: "true"
  hosts:
    - host: gitea.[REDACTED].io
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: [REDACTED]-full-cert
      hosts:
        - gitea.[REDACTED].io
persistence:
  enabled: true
  existingClaim:
  size: 500Gi
  accessModes:
    - ReadWriteOnce
  labels: {}
  annotations: {}
  storageClass:
  subPath:

  config:
    APP_NAME: "Gitea"
    service:
      DISABLE_REGISTRATION: true
    log:
      LEVEL: Warn
      ENABLE_ACCESS_LOG: false
      DISABLE_ROUTER_LOG: true
```

All the other values are identical to the values in the repository.

I tried to push/get packages with the owner being a user and being an organization, to exclude possible permission issues; the result is the same.

The errors are frequent but random, and mostly happen on projects with a large number of dependencies.

Attached are the logs of gitea and postgres.

Thank you in advance

pat-s commented 2023-04-06 10:57:30 +00:00 (Migrated from gitea.com)

Looks like your PG is throwing errors due to too many connections at the same time. The bundled PG chart is not HA-capable by default.

I guess you need an HA-capable PG. We highly recommend using an external (managed) DB from a cloud provider of your choice. You can also try https://bitnami.com/stack/postgresql-ha.
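If you switch to an external DB, the change is done entirely in the chart values; a minimal sketch (assuming the chart's `postgresql.enabled` toggle and the `gitea.config.database` passthrough, and a placeholder hostname) could look like:

```yaml
# Sketch only: disable the bundled DB and point Gitea at an external one.
postgresql:
  enabled: false                             # don't deploy the bundled PostgreSQL
gitea:
  config:
    database:
      DB_TYPE: postgres
      HOST: my-managed-pg.example.com:5432   # placeholder endpoint
      NAME: gitea
      USER: gitea
      # PASSWD is best injected via a Secret (e.g. additionalConfigFromEnvs)
```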

serrnovik commented 2023-04-06 11:31:28 +00:00 (Migrated from gitea.com)

Thank you for the quick response.

I'll try to test with a separate PG. However, our workload is not huge either. Previously we used a basic BaGet setup with a single Docker/MariaDB instance, and it worked just fine with the same setup/projects.

Could this suggest an issue with Gitea's (or its configuration's) connection pooling? It seems that a separate connection is created for each package request.

pat-s commented 2023-04-06 11:38:11 +00:00 (Migrated from gitea.com)

It's not about the size of the workload in these situations but only about the parallel access.

I am not a DB expert and this is just a guess of mine from reading the logs - I could be wrong.
I don't think it's an issue of Gitea, though; the pooling is done and handled on the DB side (but I might also be wrong here). You are of course welcome to clarify this in Gitea core.

We haven't had such a case lately/ever AFAIR, so there must be something special in your setup which causes this (which I can't spot from here).

exones commented 2023-04-06 13:22:02 +00:00 (Migrated from gitea.com)

Hello @pat-s !

I'm working with @serrnovik

AFAIK, DB pooling actually happens on the DB client's side (in this case Gitea is a client of PG): it's like a cache of connections that you reuse. E.g. see this answer https://stackoverflow.com/a/4041136 or just the Wikipedia article on connection pooling. In our case, if the connections were pooled, Gitea wouldn't create hundreds of connections but would rather reuse those already open in the pool.

Connection pooling usually can be set up in the connection string to the database (and it's managed by the client's DB driver, not the DB server).

These are all our hypotheses, but it really looks like Gitea is opening a lot of connections and very quickly hitting the limit.

Without looking more closely at the code, I can't say more for now.

Regards

serrnovik commented 2023-04-06 20:58:46 +00:00 (Migrated from gitea.com)

Just small update:

I've deployed the suggested Bitnami PG HA chart. I spent quite some time on it, but it doesn't seem to have helped with the original issue.

However, after further reading, e.g. https://github.com/go-gitea/gitea/issues/8273 and https://github.com/go-gitea/gitea/issues/8540, I ended up solving the issue with:

```
MAX_OPEN_CONNS: 20
MAX_IDLE_CONNS: 20
CONN_MAX_LIFE_TIME: 5m
```
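These keys belong to the `[database]` section of app.ini, so in chart terms they can be set via the `gitea.config` passthrough; roughly (a sketch, assuming the standard values layout):

```yaml
gitea:
  config:
    database:
      MAX_OPEN_CONNS: 20
      MAX_IDLE_CONNS: 20
      CONN_MAX_LIFE_TIME: 5m
```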

I arrived at the number 20 by trial and error; 50, for example, did not help.

I still have hundreds of errors like:

```log
2023/04/06 20:53:31 ...les/cache/context.go:62:GetContextData() [W] [642f314a-5] cannot get cache context when getting data: &{context.Background.WithCancel.WithCancel.WithValue(type pprof.labelContextKey, val {"graceful-lifecycle":"with-hammer"}) 0xc005242640 false}
```

and

```log
2023/04/06 20:42:17 [642f2a33-25] router: slow      GET /api/packages/Everix/nuget/package/[REDACTED]/index.json for 10.42.4.5:40046, elapsed 3295.6ms @ nuget/nuget.go:335(nuget.EnumeratePackageVersionsV3)
```
pat-s commented 2023-04-07 10:57:38 +00:00 (Migrated from gitea.com)

@exones OK, that might be - as said, I am not a DB shenanigan :) So apologies for being wrong here.

> These are all our hypotheses, but it really looks like Gitea is opening a lot of connections and very quickly hitting the limit.

Could be - but again, I cannot make a comment about it. This sounds like an issue for Gitea core then.
The helm chart just provides a skeleton for the installation - when it comes to configuring application-side settings, everything can be done in `values.yml`.

It seems @serrnovik already found out how to do so! I still wonder, though, what is different in your setup that you're encountering these issues, which we haven't had reported here yet.

For example, I am administering a medium-sized instance with a managed PG DB and haven't had any issues yet or the need to limit the DB connections from within Gitea.

Maybe there's indeed room for improvement for the defaults or the general approach of connection pooling in Gitea then - contributions welcome!

pat-s commented 2023-04-23 20:24:30 +00:00 (Migrated from gitea.com)

@serrnovik Are there any new developments regarding your issue?

WRT your last comment and the "cache" issue: this part should be unrelated to the DB, as the cache is handled by memcached or redis and the DB should not play a role here.

The second one involves package storage (on disk) and should also be unrelated to any DB action.
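For reference, a dedicated cache backend is wired up through the chart values roughly as below; whether the bundled dependency is memcached or redis depends on the chart version, so the keys and the service hostname here are assumptions:

```yaml
# Sketch, assuming a chart version that ships a memcached dependency.
memcached:
  enabled: true
gitea:
  config:
    cache:
      ADAPTER: memcache
      HOST: "gitea-memcached.gitea.svc.cluster.local:11211"  # placeholder service name
```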

serrnovik commented 2023-04-23 21:04:55 +00:00 (Migrated from gitea.com)

@pat-s we made it to a somewhat working state by blindly finding which `MAX_OPEN_CONNS` / `MAX_IDLE_CONNS` / `CONN_MAX_LIFE_TIME` values are good enough.

We still have `router: slow` and `context.go:62:GetContextData()` errors that make some requests slow or even time out, but it still serves packages (at least after automatic retries by the NuGet client).

So: not good, not terrible. We can live with it for now, but we are still looking for a solution.

pat-s commented 2023-04-24 06:31:59 +00:00 (Migrated from gitea.com)

Thanks. It would be interesting to get down to the root cause of this. Could you share more details of your installation? E.g. PG deployment, file system, cache setup, and specific workloads that could potentially trigger these kinds of warnings/messages?

Are you still using the PG-HA chart? Did you ever try with a managed DB in the cloud?

serrnovik commented 2023-04-24 13:19:48 +00:00 (Migrated from gitea.com)

In fact, it is as standard as possible. When I started this thread the only non-standard config was the ingress and the app title. Now it uses an external HA PG (installed after your suggestion; having spent the time deploying HA PG, it makes more sense to keep using it rather than the default non-HA one). The HA setup has the default Bitnami params except for users/passwords.

For the Gitea chart: I took values.yaml from the chart and customized it as attached. The changes from the defaults are the following:

Gitea params (including the connection customizations mentioned above):

```yaml
    database:
      DB_TYPE: postgres
      HOST: postgresql-ha-pgpool.postgresql-ha.svc.cluster.local
      NAME: gitea
      # https://github.com/go-gitea/gitea/issues/8540#issuecomment-544446872
      MAX_OPEN_CONNS: 20
      MAX_IDLE_CONNS: 20
      CONN_MAX_LIFE_TIME: 5m
```
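Note that when going through pgpool, the effective ceiling is the smaller of pgpool's own pool size and PostgreSQL's `max_connections`. In the Bitnami postgresql-ha chart these are controlled by values along the following lines (the parameter names are assumptions and should be checked against the chart version in use):

```yaml
# postgresql-ha values sketch; parameter names are assumptions.
postgresql:
  maxConnections: "200"     # backend max_connections
pgpool:
  numInitChildren: "64"     # max concurrent client connections pgpool accepts
  maxPool: "2"              # cached backend connections per pgpool child
```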

HA secrets:

```yaml
  additionalConfigFromEnvs:
    - name: ENV_TO_INI__DATABASE__PASSWD
      valueFrom:
        secretKeyRef:
          name: gitea-app-ini-postgress-auth
          key: ENV_TO_INI__DATABASE__PASSWD
    - name: ENV_TO_INI__DATABASE__USER
      valueFrom:
        secretKeyRef:
          name: gitea-app-ini-postgress-auth
          key: ENV_TO_INI__DATABASE__USER
```
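The referenced secret is a plain Kubernetes Secret whose name and keys must match the `secretKeyRef`s above; for completeness, it could be created from a manifest like this (the values are placeholders):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: gitea-app-ini-postgress-auth
type: Opaque
stringData:
  ENV_TO_INI__DATABASE__USER: gitea        # placeholder
  ENV_TO_INI__DATABASE__PASSWD: change-me  # placeholder
```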

Increased volume size:

```yaml
persistence:
  enabled: true
  existingClaim:
  size: 500Gi
```

Enabled ingress:

```yaml
ingress:
  enabled: true
  # className: nginx
  className:
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/proxy-body-size: "8m" # Default is 1m. For bigger nuget packages push will fail. See: https://discourse.gitea.io/t/unable-to-push-to-repo-due-to-rpc-failed-http-413-error/2630/4
    nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "600"
  hosts:
    - host: gitea.MYDOMAIN.io
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: MYCERTNAME-full-cert
      hosts:
        - gitea.MYDOMAIN.io
```

As you can see in this section, I later also increased the max package size via `nginx.ingress.kubernetes.io/proxy-body-size: "8m"`, but this is unrelated to the original problem. We have one package that is slightly larger than the default 1m limit.

pat-s commented 2023-05-02 21:14:58 +00:00 (Migrated from gitea.com)

Thanks. So, just trying to wrap my head around why you might be getting the error in the first place:

  1. Without any value set, Gitea opens too many connections to the DB at once, which causes an error (probably because you're doing many things in parallel?)
  2. Limiting helps, but a reasonable value is needed for caching purposes etc.
  3. The absolute number of connections is defined by the DB type and size.

By default Gitea sets no limit (https://docs.gitea.io/en-us/administration/config-cheat-sheet/#database-database) and keeps connections open forever with PG (not so with MySQL).

https://github.com/go-gitea/gitea/issues/8540#issuecomment-544189211 mentions that the following settings work well:

```
MAX_OPEN_CONNS = 5
MAX_IDLE_CONNS = 5
CONN_MAX_LIFE_TIME = 5m
```
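As a rough sizing rule (my reading, not an official formula): the sum of `MAX_OPEN_CONNS` across all Gitea replicas, plus any other clients, has to stay below PostgreSQL's `max_connections` minus `superuser_reserved_connections` (100 and 3 by default). For example:

```yaml
# Illustrative budget only, using PostgreSQL defaults:
#   max_connections = 100, superuser_reserved_connections = 3  ->  97 usable slots
#   1 Gitea replica  x MAX_OPEN_CONNS 20  ->  at most 20 connections
#   3 Gitea replicas x MAX_OPEN_CONNS 20  ->  at most 60 connections, still < 97
database:
  MAX_OPEN_CONNS: 20
  MAX_IDLE_CONNS: 20
  CONN_MAX_LIFE_TIME: 5m
```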

Did you try those values, or only 20 as in your last comment? Just wondering, even though I don't think there will be much difference for you. The important part is to set a limit in the first place.

From a chart perspective I don't think we should set custom defaults here. But it might be worth discussing the overall Gitea defaults as described above (no limit and no expiration).

Thank you for all the insights so far!

serrnovik commented 2023-05-03 09:33:26 +00:00 (Migrated from gitea.com)

@pat-s Yes, I first tried that option (5, 5, 5m) as suggested by [that discussion](https://github.com/go-gitea/gitea/issues/8540#issuecomment-544189211). Then, with a binary-search approach, I found a higher value that still does not cause DB issues.

pat-s commented 2023-05-04 07:16:30 +00:00 (Migrated from gitea.com)

Thanks! I've started an internal discussion about defining a default limit, in contrast to not setting a limit at all.

Besides, it seems that a lot of (new) connections are being opened and used instead of old ones being reused (which would prevent hitting the DB instance limit). It might also be worth checking on your side why Gitea is opening so many connections; it seems like what you do is quite above average.

While the limits you've set might prevent the PG errors, they might also limit your tasks and performance to some degree.

I'll close this now, as the issue lies within Gitea core and is not related to the helm chart. Thanks again for your responsiveness and all the information provided!
