Gitea-HA multiple replica support #205
This PR provides support to run Gitea with multiple replicas out of the box.
You will need to:
This PR also adds postgresql-ha and redis-cluster as dependencies to provide real HA support.
Fixes: #206
Also moved the `gitea.database.builtIn` and `gitea.cache.builtIn` sections directly into the dependencies.
What happens when memcached fails?
You can remove the deprecated `ISSUE_INDEXER_QUEUE_TYPE` and `ISSUE_INDEXER_QUEUE_CONN_STR`. You only need the `queue.issue_indexer` section.

Not sure how to properly quote the `queue.issue_indexer` section.

I think you also need shared session configuration (redis or db).
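For context, shared sessions end up in the `[session]` section of `app.ini`; via this chart's `gitea.config` passthrough that could look roughly like the sketch below (the redis address is a placeholder, not something this PR defines):

```yaml
gitea:
  config:
    session:
      PROVIDER: redis   # or "db" to share sessions through the database
      PROVIDER_CONFIG: "redis://gitea-redis-master:6379/0"  # placeholder address
```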
Thank you for your input :)
I've already played around with shared sessions, however I ran into some issues and forgot about it :D
Will try again and adjust the rest.
- Session now uses redis (had it configured with ingress previously)
- Removed deprecated config for the issue_indexer queue
- Share sessions between instances, see other comment
Regarding memcached, when running HA I'd currently recommend redis instead of memcached.
That's what I'm currently doing.
I also have an Elasticsearch cluster for indexing.
Is it better to use `db`?

To be honest, I currently do not know which one will perform better. I plan to do some stress tests on the Gitea cluster and document the results.
I chose `db` because I already had postgresql-ha running, so it was the easiest to do :D
I'm thinking about adding elasticsearch as a dependency for indexing, but I'm worried that we will eventually have too many dependencies.
Hmm, still having issues with the `queue.issue_indexer` :/

Edit: solved :)
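For anyone hitting the same quoting problem mentioned earlier: the section name contains a dot, so it has to be quoted as a single YAML key under `gitea.config`. A sketch with a placeholder redis address:

```yaml
gitea:
  config:
    "queue.issue_indexer":  # quoted, because the section name contains a dot
      TYPE: redis
      CONN_STR: "redis://gitea-redis-master:6379/0"  # placeholder address
```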
This will be a breaking change, since I moved the following values:

- `gitea.database.builtIn.mariadb.enabled` -> `mariadb.enabled`
- `gitea.database.builtIn.postgresql.enabled` -> `postgresql.enabled`
- `gitea.database.builtIn.mysql.enabled` -> `mysql.enabled`
- `gitea.cache.builtIn.enabled` -> `memcached.enabled`
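Illustratively, an existing `values.yaml` would change like this (a sketch showing only the renamed keys from the list above; the flags are example values):

```yaml
# Before this PR:
gitea:
  database:
    builtIn:
      postgresql:
        enabled: true
  cache:
    builtIn:
      enabled: true

# After this PR:
postgresql:
  enabled: true
memcached:
  enabled: true
```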
This PR is ready if anyone wants to test it :)
What is the easiest way to test this build with terraform? I cannot find an easy way to use a helm chart from a branch.
Edit: testing with `minikube` is probably easiest. Added https://gitea.com/gitea/helm-chart/pulls/228 for instructions.

Please excuse me for asking some possibly stupid questions and executing potentially questionable steps - I'm relatively new to k8s replica deployments.
With this branch deployed, I see the following pods:

I see replicas for redis but not for gitea - is `gitea-memcached` the second one here?

Also, I see no replication for the postgres pod - is this expected?
Next, for `gitea-redis-replicas-0` I saw the following during startup:

How could I effectively test that HA works? I deleted the `gitea-0` pod and lost the connection as port 3000 was gone (not sure if this is a good test, since I used `kubectl port-forward`).
Sorry, I forgot to mention this in the initial message. You'll also need to disable memcached if you use redis; a minimal example follows below. I should probably add a check here that only one of them is enabled.
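Concretely, a minimal `values.yaml` for testing this branch might look like this (a sketch; the dependency toggle names are assumed from the chart names mentioned in this PR):

```yaml
replicaCount: 2     # more than one Gitea pod
memcached:
  enabled: false    # disable when redis provides cache and sessions
redis-cluster:
  enabled: true
postgresql-ha:
  enabled: true
```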
Have you set `replicaCount` > 1 for gitea?
I think there are still some configurations absent, including `queue`. I list them below:

- Session: possible providers are "memory", "file", "redis", "mysql", "postgres", "couchbase", "memcache", "db", and the default value is "memory". I recommend using "db" or "redis". If "redis" is used, `PROVIDER_CONFIG` is needed.
- Cache: possible adapters are "memory", "redis", "memcache", and the default cache is "memory". For HA mode, "redis" or "memcache" is suitable.
- Issue indexer: possible types are "bleve", "db" or "elasticsearch", and the default is "bleve". It should be changed to "db" or "elasticsearch".
- Code indexer: possible types are "bleve" or "elasticsearch", and the default is "bleve". It should be "elasticsearch" for HA mode.
- Storage: "attachments", "avatars", "lfs", "archives" and so on. The default is the local file system. It's better to store them in a minio instance.
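Putting the session, cache and issue-indexer points together, a rough `values.yaml` sketch using the chart's `gitea.config` passthrough (key names as in the Gitea config cheat sheet; the redis address is a placeholder, and storage/minio is intentionally left out):

```yaml
gitea:
  config:
    session:
      PROVIDER: redis
      PROVIDER_CONFIG: "redis://gitea-redis-master:6379/0"  # placeholder
    cache:
      ADAPTER: redis
      HOST: "redis://gitea-redis-master:6379/0"             # placeholder
    indexer:
      ISSUE_INDEXER_TYPE: db  # avoids the file-based bleve default
```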
@lunny Sorry, it took me quite a while to recover and get back to work. However:
Session: This is already included; I set redis as the default, including the provider config, if `replicaCount` > 1 and (recently added) if no session config was provided by the user.

Cache: Also already taken care of automatically, if no user config is provided for the cache.

Indexer: Same as cache.
I will check on the storage part :)
I will also need to check some parts of the config generation.
For indexers, there are the issue indexer and the code indexer. I think maybe you missed the latter.
@lunny regarding the storage: I think it is nice to have a minio instance for storage. However, we already have a lot of dependencies in the chart, like postgresql, redis, mysql, mariadb, etc. I fear that if we keep adding dependencies, the chart will get too complicated to use. For example, the Gitlab Helm Chart also has lots and lots of dependencies, and it is really horrific to use.

HA already works fine when using a RWX PVC (see the sketch below), so I think it might be better to not include minio directly.

Not blocking the idea of minio as a dependency, but I want to discuss this :)
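For reference, the RWX route is mostly a persistence setting; a sketch, assuming the chart exposes a standard `persistence` block and the cluster offers an RWX-capable storage class (the class name below is a placeholder):

```yaml
persistence:
  enabled: true
  accessModes:
    - ReadWriteMany         # shared repo storage across all replicas
  storageClass: nfs-client  # placeholder; any RWX-capable class works
```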
The same goes for the code indexer.

If I understood correctly, the code indexing feature is optional (`REPO_INDEXER_ENABLED` is false by default).

I wouldn't want to add elasticsearch as a dependency, for the same reason as stated above :)

We should add a description which makes clear that, if HA is used, the repo indexer should be set to elasticsearch.
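Such a description could point to something like the following (a sketch; key names as in the Gitea config cheat sheet, the elasticsearch URL and credentials are placeholders):

```yaml
gitea:
  config:
    indexer:
      REPO_INDEXER_ENABLED: true
      REPO_INDEXER_TYPE: elasticsearch
      # Placeholder URL and credentials for an external elasticsearch cluster.
      REPO_INDEXER_CONN_STR: "http://elastic:changeme@elasticsearch:9200"
```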
But if the code indexer is set to true with its default value `bleve`, the second gitea instance may hang.

For storage, if it's a RWX PVC, that's no problem.
I'm wondering if Gitea could handle `PodDisruptionBudgets`, so as to replace one pod at a time and keep Gitea up. Or is this too risky from an application perspective, regarding data(base) consistency?

In case some testing/help is needed, LMK :)
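For what it's worth, a PodDisruptionBudget itself is plain Kubernetes and independent of Gitea; a minimal sketch (the label selector is an assumption and must match the labels the chart actually renders):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: gitea
spec:
  maxUnavailable: 1  # evict at most one Gitea pod at a time
  selector:
    matchLabels:
      app.kubernetes.io/name: gitea  # assumed label; check the rendered pods
```

Note that a PDB only limits voluntary disruptions (drains, rollouts); it doesn't answer the application-level consistency question.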
@luhahn @justusbunsi
I'd be interested in testing/pushing this forward. I could start by merging `master` first and resolving conflicts, then continue with some testing/feedback.

I will hopefully continue next week; the last time I checked, I had some issues.
As per the Discord discussion, running multiple Gitea instances on the same database and storage requires some sort of internal leader election. Without the instances being aware of each other, every one of them would execute cron tasks. This can lead to unexpected inconsistencies.
Just realized that when running multiple replicas, all init containers would run x times. This might be a problem: think of upgrading Gitea. One of the init containers executes a `gitea migrate`. We wouldn't want to have this action done twice or in parallel.

#332
I may be seeing those issues sometimes now.

Symptoms:
Since we now have an option to disable migration at startup, maybe we can let one instance run the migration but not the others?
That could actually be a step in the right direction. Can you share the PR reference that implemented this?
I think that to use that functionality, we have to move away from the complex structure of the init containers. We cannot tell which replica shall act as the leader and run the migration; all replicas are identical regarding configuration.

app.ini creation and other preparation must be done once for all replicas. They share those files.

Another problem is that we cannot start a new Gitea version along with the old one to perform a rollover. I currently see no way to achieve it without a downtime. The database must be migrated in order to run the newer Gitea version, which would break older running versions. Migrate is required for major releases afaik, maybe for patch releases too.

An option would be to first scale down all non-leader replicas, then replace the leader with the new version, let it handle the migration, and scale the non-leader replicas back up with the new Gitea version. To achieve that, we have to know which instance is the leader instance.
It's https://github.com/go-gitea/gitea/pull/22053. I also think maybe it's the only way to upgrade.
Re: memcached

The bitnami chart says one can set `architecture: "high-availability"` and everything should be handled behind the scenes (see the sketch below).

That could bring everything down to the following (aiming for minimal changes to the helm chart):

- `memcached` in HA mode
- no `bleve`; use `elasticsearch` or similar

In addition, we could try to use a deployment instead of a statefulset for the gitea pods, to avoid creating replicated PVs for each replica (see discussion in #426).
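The bitnami parameter in question, as a values sketch (the replica count is an example; whether the chart wires this through unchanged is an assumption):

```yaml
memcached:
  enabled: true
  architecture: "high-availability"  # bitnami replaces standalone mode with replicas
  replicaCount: 2                    # example value
```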
I will go ahead and create a new branch with some required changes and test the above config in a toy cluster.
Tried it, doesn't work. It doesn't matter that `memcached` itself is running in HA, the operations in the back don't work without issues.

I'll close here in favor of #437, which has almost everything of this PR + more.
Pull request closed