Postgres DB issues: too many clients already
#431
Hello,
With our helm installation:
I'm observing crashes on nuget restore when several build agents of our CI (3) try to work at the same time.
After examining the pod logs, I noticed the following errors:
In postgresql:
in gitea pod:
And on client side (build agent):
The chart is customized in the following way (this should not affect the issue; I only tried to reduce logging):
All the other values are identical to the values in the repository.
I tried to push/get packages owned both by a user and by an organization to exclude possible rights issues; the result is the same.
The errors are frequent but random. They mostly happen on projects with a large number of dependencies.
Attached are the logs of Gitea and Postgres.
Thank you in advance
Looks like your PG is throwing errors due to too many connections at the same time. The PG chart included is not HA-capable by default.
I guess you need an HA-capable PG. We highly recommend using an external (managed) DB from a cloud provider of your choice. You can also try https://bitnami.com/stack/postgresql-ha.
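For reference, pointing the chart at an external database goes through the app.ini pass-through in values.yml. A minimal sketch, assuming a recent chart version and a reachable external Postgres (host name and credentials are placeholders, not from this installation):

```yaml
postgresql:
  enabled: false   # disable the bundled non-HA Postgres sub-chart

gitea:
  config:
    database:
      DB_TYPE: postgres
      HOST: my-external-pg.example.com:5432   # placeholder host
      NAME: gitea
      USER: gitea
      PASSWD: changeme                        # better: reference a k8s secret
```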
Thank you for the quick response.
I'll try to test with a separate PG. However, our workload is not huge either. Previously we used a basic BaGet setup with a single Docker container and MariaDB, and it worked just fine with the same setup/projects.
Could this suggest that there is an issue with Gitea's (or its configuration's) connection pooling? It seems that it creates a separate connection for each package request.
It's not about the size of the workload in these situations but only about the parallel access.
I am not a DB expert and this is just a guess of mine reading the logs - I could be wrong.
I don't think it's an issue with Gitea, though; the pooling is done and handled on the DB side (but I might also be wrong here). You are of course welcome to clarify this in Gitea core.
We haven't had such a case lately/ever AFAIR, so there must be something special in your setup which causes this (which I can't spot from here).
Hello @pat-s !
I'm working with @serrnovik
AFAIK, DB pooling actually happens on the DB's client side (in this case Gitea is a client of PG): it's like a cache of connections that you reuse. E.g. see this answer https://stackoverflow.com/a/4041136 or just the wiki article on connection pooling. In our case, if the connections were pooled, Gitea wouldn't create hundreds of connections but would rather reuse those already open in the pool.
Connection pooling can usually be set up in the connection string to the database (and it's managed by the client's DB driver, not the DB server).
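Since Gitea is written in Go, the pool in question is the one built into Go's database/sql package. A minimal sketch of how such a client-side pool is configured (the DSN and values here are illustrative, not Gitea's actual defaults):

```go
package main

import (
	"database/sql"
	"log"
	"time"

	_ "github.com/lib/pq" // Postgres driver; registers itself with database/sql
)

func main() {
	// Illustrative DSN; not taken from this installation.
	db, err := sql.Open("postgres", "host=localhost user=gitea dbname=gitea sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Pool limits live on the client. With the Go defaults, MaxOpenConns is
	// unlimited, so under parallel load every request can open a fresh
	// connection until the server-side max_connections limit is hit.
	db.SetMaxOpenConns(20)                 // cap concurrent connections
	db.SetMaxIdleConns(10)                 // keep some around for reuse
	db.SetConnMaxLifetime(5 * time.Minute) // recycle long-lived connections
}
```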
These are all just our hypotheses, but it really looks like Gitea is opening a lot of connections and very quickly hits the limit.
Without closer looking at the code I can't say more for now.
Regards
Just small update:
I've deployed the suggested Bitnami PG HA chart. I spent quite some time on it, but it seems it did not help with the original issue.
However, after doing further reading, e.g.:
https://github.com/go-gitea/gitea/issues/8273 and https://github.com/go-gitea/gitea/issues/8540
I ended up solving the issue with:
The value of 20 I arrived at by trial and error; 50, for example, did not help.
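The exact snippet isn't quoted above; based on the parameter names mentioned later in this thread, it presumably looked something like this in values.yml (only the MAX_OPEN_CONNS value of 20 is confirmed, the other two lines are assumptions, and the key spelling should be checked against the config cheat sheet for your Gitea version):

```yaml
gitea:
  config:
    database:
      MAX_OPEN_CONNS: 20      # confirmed below: found by trial and error
      MAX_IDLE_CONNS: 10      # assumption, not from this thread
      CONN_MAX_LIFE_TIME: 5m  # assumption; spelling as used in this thread
```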
I still have hundreds of errors like:
and
@exones OK, that might be - as said, I am not a DB shenanigan :) So apologies for being wrong here.
Could be - but again, I cannot make a comment about it. This sounds like an issue for Gitea core then.
The helm chart is just providing a skeleton for the installation - when it comes to configuring application-side settings, everything can be done in values.yml. @serrnovik already found out how to do so, it seems! I still wonder though what is different in your setup that you're encountering these issues which we haven't had reported here yet.
For example, I am administering a medium-sized instance with a managed PG DB and haven't had any issues yet, nor the need to limit the DB connections from within Gitea.
Maybe there's indeed room for improvement for the defaults or the general approach of connection pooling in Gitea then - contributions welcome!
@serrnovik Are there any new developments regarding your issue?
WRT your last comment and the "cache" issue: this part should be unrelated to the DB part, as "cache" is handled by memcached or Redis and the DB should not be playing a role here.
The second one involves package storage (on disk) and should also be unrelated to any DB action.
@pat-s we made it to a somewhat working state by blindly finding which MAX_OPEN_CONNS / MAX_IDLE_CONNS / CONN_MAX_LIFE_TIME values are good enough. We still have router: slow and context.go:62:GetContextData() errors that make some requests slow or even time out, but it is still serving packages (at least after automatic retry(s) by the nuget client). So "not good, not terrible". We can live with it for now, but we are still looking for a solution.
Thanks. Would be interesting to get down to the root cause of this. Could you share more details of your installation? E.g. PG deployment, file system, cache setup, specific workloads that could potentially trigger these kinds of warnings/messages?
Are you still using the PG-HA chart? Did you ever try with a managed DB in the cloud?
In fact, it is as standard as possible. When I started this thread, the only non-standard config was the ingress and the app title. Now it has an external HA PG (installed after your suggestion; having spent the time deploying the HA PG, it is better to use it rather than the default non-HA one). The HA setup has default Bitnami params except for user/passwords.
For the Gitea chart: I took values.yaml from the chart and customized it as attached. The changes from the defaults are the following:
Gitea params (including the mentioned connection customizations):
HA secrets
Increased volume size:
Enabled ingress:
As you can see in this section, I later also increased the max package size (nginx.ingress.kubernetes.io/proxy-body-size: "8m"), but this is unrelated to the original problem. We have one package that is slightly above the default 1m.
Thanks. So just trying to wrap my head around why you might be getting the error in the first place:
By default, Gitea sets no limit (https://docs.gitea.io/en-us/administration/config-cheat-sheet/#database-database) and keeps connections open forever with PG (not so with MySQL).
https://github.com/go-gitea/gitea/issues/8540#issuecomment-544189211 mentions that the following settings work well:
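That snippet isn't reproduced above; judging from the (5,5,5m) values confirmed further down in this thread, it is presumably along these lines (an assumption, with the key spelling as used in this thread):

```ini
[database]
MAX_OPEN_CONNS = 5
MAX_IDLE_CONNS = 5
CONN_MAX_LIFE_TIME = 5m
```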
Did you try these values or only 20 as in your last comment? Just wondering, even though I don't think there will be much difference for you. The important part is to set a limit in the first place.
From a chart perspective, I don't think we should set custom defaults here. But it might be worth discussing the overall Gitea defaults as described above (no limit and no expiration).
Thank you for all the insights so far!
@pat-s yes, I first tried this option (5,5,5m) as suggested in that discussion. Then, with a binary-search approach, I found a value that is higher but does not cause DB issues.
Thanks! I've started an internal discussion about defining a default limit in contrast to not set a limit.
Besides, it seems that a lot of (new) connections are being opened and used instead of old ones being reused (which should prevent hitting the DB instance limit). It might also be worth checking on your side what is happening, i.e. why Gitea is opening so many connections. It seems like what you do is quite above average.
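One way to check this (assuming psql access to the PG instance and that the database is named gitea; adjust to your setup) is to watch pg_stat_activity while the build agents run:

```sql
-- Count Gitea's connections by state; run repeatedly during a nuget restore.
-- A steadily growing count suggests connections are opened faster than reused.
SELECT state, count(*)
FROM pg_stat_activity
WHERE datname = 'gitea'
GROUP BY state;
```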
While the limits you've set might prevent the PG errors, they might also limit your tasks and performance to some degree.
I'll close here now as the issue is within Gitea core and not related to the helm chart. Thanks again for your responsiveness and all information provided!