helm-chart/docs/ha-setup.md

178 lines
8.1 KiB
Markdown
Raw Permalink Normal View History

[Breaking] Add HA-support; switch to `Deployment` (#437) # Changes A big shoutout to @luhahn for all his work in #205 which served as the base for this PR. ## Documentation - [x] After thinking for some time about it, I still prefer the distinct option (as started in #350), i.e. having a standalone "HA" doc under `docs/ha-setup.md` to not have a very long README (which is already quite long). Most of the information below should go into it with more details and explanations behind all of the individual components. ## Chart deps ~~- Adds `meilisearch` as a chart dependency for a HA-ready issue indexer. Only works with >= Gitea 1.20~~ ~~- Adds `redis` as a chart dependency for a HA-ready session and queue store.~~ - Adds `redis-cluster` as a chart dependency for a HA-ready session and queue store (alternative to `redis`). Only works with >= Gitea 1.19.2. - Removes `memcached` instead of `redis-cluster` - Add `postgresql-ha` as default DB dep in favor of `postgres` ## Adds smart HA chart logic The goal is to set smart config values that result in a HA-ready Gitea deployment if `replicaCount` > 1. - If `replicaCount` > 1, - `gitea.config.session.PROVIDER` is automatically set to `redis-cluster` - `gitea.config.indexer.REPO_INDEXER_ENABLED` is automatically set to `false` unless the value is `elasticsearch` or `meilisearch` - `redis-cluster` is used for `[queue]` and `[cache]` and `[session]`mode or not Configuration of external instances of `meilisearch` and `minio` are documented in a new markdown doc. ## Deployment vs Statefulset Given all the discussions about this lately (#428), I think we could use both. In the end, we do not have the requirement for a sequential pod scale up/scale down as it would happen in statefulsets. On the other side, we do not have actual stateless pods as we are attaching a RWX to the deployment. Yet I think because we do not have a leader-election requirement, spawning the pods as a deployment makes "Rolling Updates" easier and also signals users that there is no "leader election" logic and each pod can just be "destroyed" at anytime without causing interruption. Hence I think we should be able to switch from a statefulset to a deployment, even in the single-replica case. This change also brought up a templating/linting issue: the definition of `.Values.gitea.config.server.SSH_LISTEN_PORT` in `ssh-svc.yaml` just "luckily" worked so far due to naming-related lint processing. Due to the change from "statefulset" to "deployment", the processing queue changed and caused a failure complaining about `config.server.SSH_LISTEN_PORT` not being defined yet. The only way I could see to fix this was to "properly" define the value in `values.yaml` instead of conditionally definining it in `helpers.tpl`. Maybe there's a better way? ## Chart PVC Creation I've adapted the automated PVC creation from another chart to be able to provide the `storageClassName` as I couldn't get dynamic provisioning for EFS going with the current implementation. In addition the naming and approach within the Gitea chart for PV creation is a bit unusual and aligning it might be beneficial. A semi-unrelated change which will result in a breaking change for existing users but this PR includes a lot of breaking changes already, so including another one might not make it much worse... - New `persistence.mount`: whether to mount an existing PVC (via `persistence.existingClaim` - New `persistence.create`: whether to create a new PVC ## Testing As this PR does a lot of things, we need proper testing. The helm chart can be installed from the Git branch via `helm-git` as follows: ``` helm repo add gitea-charts git+https://gitea.com/gitea/helm-chart@/?ref=deployment helm install gitea --version 0.0.0 ``` It is **highly recommended** to test the chart in a dedicated namespace. I've tested this myself with both `redis` and `redis-cluster` and it seemed to work fine. I just did some basic operations though and we should do more niche testing before merging. Examplary `values.yml` for testing (only needs a valid RWX storage class): <details> <summary>values.yaml</summary> ```yml image: tag: "dev" PullPolicy: "Always" rootless: true replicaCount: 2 persistence: enabled: true accessModes: - ReadWriteMany storageClass: FIXME redis-cluster: enabled: false global: redis: password: gitea gitea: config: indexer: ISSUE_INDEXER_ENABLED: true REPO_INDEXER_ENABLED: false ``` </details> ## Preferred setup The preferred HA setup with respect to performance and stability might currently be as follows: - Repos: RWX (e.g. EFS or Azurefiles NFS) - Issue indexer: Meilisearch (HA) - Session and cache: Redis Cluster (HA) - Attachments/Avatars: Minio (HA) This will result in a ~ 10-pod HA setup overall. All pods have very low resource requests. fix #98 Co-authored-by: pat-s <pat-s@noreply.gitea.io> Reviewed-on: https://gitea.com/gitea/helm-chart/pulls/437 Co-authored-by: pat-s <patrick.schratz@gmail.com> Co-committed-by: pat-s <patrick.schratz@gmail.com>
2023-07-17 19:09:42 +00:00
# High Availability
All components (in-memory DB, volume/asset storage, code indexer) used by Gitea must be deployed in a HA-ready fashion to achieve a full HA-ready Gitea deployment.
The following document explains how to achieve this for all individual components.
The resulting Gitea deployment will consist of ~ 10 pods (depending on the chosen components and their replicas).
One should evaluate upfront whether a HA-deployment is required as switching between HA/non-HA comes with some effort.
For production instances, HA is always recommended to increase uptime and have a frictionless update process.
A general comment about chart dependencies and external services:
Instead of relying on chart dependencies, it is often better to rely on an external, (managed) instances (in-memory database, asset storage provider, database, etc.).
Many cloud providers offer such services, at least for databases or in-memory databases.
They might cost a bit more than using a self-hosted k8s variant but are usually easier to maintain and scale, if needed.
Also they can be centrally managed and are not linked to the Gitea helm chart or namespace.
Please consider using external services before you start with your Gitea HA setup, it will make your life (and the life of the Gitea maintainers) easier.
This helm chart tries to help as much as possible to simplify and assert the provisioning of a HA-ready Gitea instance by implementing smart conditionals if `replicaCount` is set to a value > 1.
Nevertheless, we cannot guarantee for every possible combination of Gitea settings to work together perfectly in a HA setup.
As a general advice, we recommend to have a test environment aside on which to test possible changes/upgrades before applying these to a production installation.
## Requirements for HA
Storage-wise, the HA-Gitea setup requires a RWX file-system which can be shared among the deployment-based replica pods.
In addition, the following components are required for full HA-readiness:
- A HA-ready issue (and optionally code) indexer: `elasticsearch` or `meilisearch`
- A HA-ready external object/asset storage (`minio`) (optional, assets can also be stored on the RWX file-system)
- A HA-ready cache (`redis-cluster`)
- A HA-ready DB
`postgres.enabled`, which default to `true`, must be set to `false` for a HA setup.
The default `postgres` chart dependency is not HA-ready (there's a dedicated `postgres-ha` chart).
The following sections discuss each of the components in more detail.
Note that for each component discussed, the shown configurations only provides a (working) starting point, not necessarily the most optimal setup.
We try to optimize this document over time as we have gained more experience with HA setups from users.
## Indexers (Issues and code/repo)
The default code indexer `bleve` is not able to allow multiple connections and hence cannot be used in a HA setup.
Alternatives are `elasticsearch` and `meilisearch` (as of >= 1.19.2).
Unless you have an existing `elasticsearch` cluster, we recommend using `meilisearch` as it is faster and requires way less resources.
Unfortunately, `meilisearch` does only support the `ISSUE_INDEXER` and not the `REPO_INDEXER` yet ([tracking issue](https://github.com/go-gitea/gitea/pull/24149)).
This means that the `REPO_INDEXER` must still be disabled for a HA setup right now.
An alternative to the two options above for the `ISSUE_INDEXER` is `"db"`, however we recommend to just go with `meilisearch` in this case and to not bother the DB with indexing.
To configure `meilisearch` within Gitea, do the following:
```yml
gitea:
config:
indexer:
ISSUE_INDEXER_CONN_STR: <http://meilisearch.<namespace>.svc.cluster.local:7700>
ISSUE_INDEXER_ENABLED: true
ISSUE_INDEXER_TYPE: meilisearch
REPO_INDEXER_ENABLED: false
# REPO_INDEXER_TYPE: meilisearch # not yet working
```
Unfortunately `meilisearch` cannot be deployed in HA as of now.
Nevertheless it allows for multiple Gitea requests at the same time and is therefore required in a HA setup.
Exemplary configuration for the [meilisearch-kubernetes](https://github.com/meilisearch/meilisearch-kubernetes/tree/main/charts/meilisearch) chart:
```yaml
persistence:
enabled: true
accessMode: ReadWriteOnce
size: 5Gi
```
## Cache, session and queue
A `redis` instance is required for the in-memory cache.
Two options exist:
- `redis`
- `redis-cluster`
The chart provides `redis-cluster` as a dependency as this one can be used for both HA and non-HA setups.
You're also welcome to go with `redis` if you prefer or already have a running instance.
It should be noted that `redis-cluster` support is only available starting with Gitea 1.19.2.
You can also configure an external (managed) `redis` instance to be used.
To do so, you need to set the following configuration values yourself:
- `gitea.config.queue.TYPE`: redis`
- `gitea.config.queue.CONN_STR`: `<your redis connection string>`
- `gitea.config.session.PROVIDER`: `redis`
- `gitea.config.session.PROVIDER_CONFIG`: `<your redis connection string>`
- `gitea.config.cache.ENABLED`: `true`
- `gitea.config.cache.ADAPTER`: `redis`
- `gitea.config.cache.HOST`: `<your redis connection string>`
By default, the `redis-cluster` chart provisions three standalone master nodes of which each has a single replica.
To reduce the number of pods for a default Gitea deployment, we opted to omit the replicas (`replicas: 0`) by default.
Only the minimum required number of master pods for a functional `redis-cluster` deployment are provisioned.
For a "proper" `redis-cluster` setup however, we recommend to set `replicas: 1` and `nodes: 6`.
[Breaking] Add HA-support; switch to `Deployment` (#437) # Changes A big shoutout to @luhahn for all his work in #205 which served as the base for this PR. ## Documentation - [x] After thinking for some time about it, I still prefer the distinct option (as started in #350), i.e. having a standalone "HA" doc under `docs/ha-setup.md` to not have a very long README (which is already quite long). Most of the information below should go into it with more details and explanations behind all of the individual components. ## Chart deps ~~- Adds `meilisearch` as a chart dependency for a HA-ready issue indexer. Only works with >= Gitea 1.20~~ ~~- Adds `redis` as a chart dependency for a HA-ready session and queue store.~~ - Adds `redis-cluster` as a chart dependency for a HA-ready session and queue store (alternative to `redis`). Only works with >= Gitea 1.19.2. - Removes `memcached` instead of `redis-cluster` - Add `postgresql-ha` as default DB dep in favor of `postgres` ## Adds smart HA chart logic The goal is to set smart config values that result in a HA-ready Gitea deployment if `replicaCount` > 1. - If `replicaCount` > 1, - `gitea.config.session.PROVIDER` is automatically set to `redis-cluster` - `gitea.config.indexer.REPO_INDEXER_ENABLED` is automatically set to `false` unless the value is `elasticsearch` or `meilisearch` - `redis-cluster` is used for `[queue]` and `[cache]` and `[session]`mode or not Configuration of external instances of `meilisearch` and `minio` are documented in a new markdown doc. ## Deployment vs Statefulset Given all the discussions about this lately (#428), I think we could use both. In the end, we do not have the requirement for a sequential pod scale up/scale down as it would happen in statefulsets. On the other side, we do not have actual stateless pods as we are attaching a RWX to the deployment. Yet I think because we do not have a leader-election requirement, spawning the pods as a deployment makes "Rolling Updates" easier and also signals users that there is no "leader election" logic and each pod can just be "destroyed" at anytime without causing interruption. Hence I think we should be able to switch from a statefulset to a deployment, even in the single-replica case. This change also brought up a templating/linting issue: the definition of `.Values.gitea.config.server.SSH_LISTEN_PORT` in `ssh-svc.yaml` just "luckily" worked so far due to naming-related lint processing. Due to the change from "statefulset" to "deployment", the processing queue changed and caused a failure complaining about `config.server.SSH_LISTEN_PORT` not being defined yet. The only way I could see to fix this was to "properly" define the value in `values.yaml` instead of conditionally definining it in `helpers.tpl`. Maybe there's a better way? ## Chart PVC Creation I've adapted the automated PVC creation from another chart to be able to provide the `storageClassName` as I couldn't get dynamic provisioning for EFS going with the current implementation. In addition the naming and approach within the Gitea chart for PV creation is a bit unusual and aligning it might be beneficial. A semi-unrelated change which will result in a breaking change for existing users but this PR includes a lot of breaking changes already, so including another one might not make it much worse... - New `persistence.mount`: whether to mount an existing PVC (via `persistence.existingClaim` - New `persistence.create`: whether to create a new PVC ## Testing As this PR does a lot of things, we need proper testing. The helm chart can be installed from the Git branch via `helm-git` as follows: ``` helm repo add gitea-charts git+https://gitea.com/gitea/helm-chart@/?ref=deployment helm install gitea --version 0.0.0 ``` It is **highly recommended** to test the chart in a dedicated namespace. I've tested this myself with both `redis` and `redis-cluster` and it seemed to work fine. I just did some basic operations though and we should do more niche testing before merging. Examplary `values.yml` for testing (only needs a valid RWX storage class): <details> <summary>values.yaml</summary> ```yml image: tag: "dev" PullPolicy: "Always" rootless: true replicaCount: 2 persistence: enabled: true accessModes: - ReadWriteMany storageClass: FIXME redis-cluster: enabled: false global: redis: password: gitea gitea: config: indexer: ISSUE_INDEXER_ENABLED: true REPO_INDEXER_ENABLED: false ``` </details> ## Preferred setup The preferred HA setup with respect to performance and stability might currently be as follows: - Repos: RWX (e.g. EFS or Azurefiles NFS) - Issue indexer: Meilisearch (HA) - Session and cache: Redis Cluster (HA) - Attachments/Avatars: Minio (HA) This will result in a ~ 10-pod HA setup overall. All pods have very low resource requests. fix #98 Co-authored-by: pat-s <pat-s@noreply.gitea.io> Reviewed-on: https://gitea.com/gitea/helm-chart/pulls/437 Co-authored-by: pat-s <patrick.schratz@gmail.com> Co-committed-by: pat-s <patrick.schratz@gmail.com>
2023-07-17 19:09:42 +00:00
## Object and asset storage
Object/asset storage refers to the storage of attachments, avatars, LFS files, etc.
While most of these can be stored on the RWX file-system, it is recommended to use an external S3-compatible object storage for such, mainly for performance reasons.
By default the chart provisions a single RWO volume to store everything (repos, avatars, packages, etc.).
This volume cannot be mounted by multiple pods.
Hence, a RWX volume is required and (optionally) an external HA-ready object storage.
> **Note:** Double-check that the file permissions are set correctly on the RWX volume! That is everything should be owned by the `git` user which usually has `uid=1000` and `gid=1000`.
To use `minio` you need to deploy and configure an external `minio` instance yourself and explicitly define the `STORAGE_TYPE` values as shown below.
Note that `MINIO_BUCKET` here is just a name and does not refer to a S3 bucket.
It's the root access point for all objects belonging to the respective application, i.e., to Gitea in this case.
```yaml
gitea:
config:
attachment:
STORAGE_TYPE: minio
lfs:
STORAGE_TYPE: minio
picture:
AVATAR_STORAGE_TYPE: minio
"storage.packages":
STORAGE_TYPE: minio
storage:
MINIO_ENDPOINT: <minio-headless.<namespace>.svc.cluster.local:9000>
MINIO_LOCATION: <location>
MINIO_ACCESS_KEY_ID: <access key>
MINIO_SECRET_ACCESS_KEY: <secret key>
MINIO_BUCKET: <bucket name>
MINIO_USE_SSL: false
```
Exemplary configuration for the [bitnami minio](https://github.com/bitnami/charts/blob/main/bitnami/minio) chart:
```yaml
auth:
rootUser: minio
mode: distributed
replicaCount: 4
persistence:
enabled: true
size: 20Gi
accessModes:
- ReadWriteOnce
```
## Database
If you do not have an HA-ready DB, using a managed database service in the cloud might be the easiest and most robust solution.
Remember: disable the built-in `postgres` dependency and configure the database connection manually via `gitea.config.database`:
```yml
gitea:
database:
builtIn:
postgresql:
enabled: false
config:
database:
DB_TYPE: postgres
HOST: <host>
NAME: <name>
USER: <user>
```
## Known issues
- Currently Cron jobs are run on all replicas as no leader election is implemented.
See [https://github.com/go-gitea/gitea/issues/13791](https://github.com/go-gitea/gitea/issues/13791) for a discussion and possible solution.
- Running with multiple replicas slows down Gitea a bit, i.e. page loading time increases.