Create ScalingSchedule collector

This commit adds two new collectors to the adapter:
- ClusterScalingScheduleCollector; and
- ScalingScheduleCollector

Also, it introduces the required collector plugins, initialization
logic in the server startup, documentation and a deployment example
(including the helm chart). A new config flag, `--scaling-schedule`, is
introduced and allows enabling and disabling the collection of such
metrics. It's disabled by default.

These collectors provide the logic required to utilise the CRDs
introduced in pull request #284. They make use of the Kubernetes
go-client implementations of a [Store][0] and a [Reflector][1].

[0]: https://pkg.go.dev/k8s.io/client-go/tools/cache#Store
[1]: https://pkg.go.dev/k8s.io/client-go/tools/cache#Reflector
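
For illustration, a minimal, hypothetical sketch of how a collector can
read one of these objects back from such a Store (the `v1` types and
the Store setup exist in this change; the `lookupSchedule` helper below
does not and only shows the intended lookup path):

```go
package sketch

import (
	"fmt"

	v1 "github.com/zalando-incubator/kube-metrics-adapter/pkg/apis/zalando.org/v1"
	"k8s.io/client-go/tools/cache"
)

// lookupSchedule fetches a ScalingSchedule from a Store that a Reflector
// keeps in sync. Keys follow cache.MetaNamespaceKeyFunc, i.e.
// "<namespace>/<name>" for namespaced objects.
func lookupSchedule(store cache.Store, namespace, name string) (*v1.ScalingSchedule, error) {
	obj, exists, err := store.GetByKey(namespace + "/" + name)
	if err != nil {
		return nil, err
	}
	if !exists {
		return nil, fmt.Errorf("ScalingSchedule %s/%s not found", namespace, name)
	}
	schedule, ok := obj.(*v1.ScalingSchedule)
	if !ok {
		return nil, fmt.Errorf("unexpected object of type %T in store", obj)
	}
	return schedule, nil
}
```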

Signed-off-by: Jonathan Juares Beber <jonathanbeber@gmail.com>
Jonathan Juares Beber 2021-05-21 09:00:39 +02:00
parent 7a68304389
commit a382dbfe7b
No known key found for this signature in database
GPG Key ID: 41D3F4ACE4465751
17 changed files with 1838 additions and 5 deletions


@ -1,6 +1,8 @@
FROM registry.opensource.zalan.do/library/alpine-3.12:latest
LABEL maintainer="Team Teapot @ Zalando SE <team-teapot@zalando.de>"
RUN apk add --no-cache tzdata
# add binary
ADD build/linux/kube-metrics-adapter /

118
README.md

@ -671,3 +671,121 @@ metric-config.<metricType>.<metricName>.<collectorType>/interval: "30s"
The default is `60s` but can be reduced to let the adapter collect metrics more
often.

## ScalingSchedule Collectors

The `ScalingSchedule` and `ClusterScalingSchedule` collectors allow
collecting time-based metrics from the respective CRD objects specified
in the HPA.

### Supported metrics

| Metric | Description | Type | K8s Versions |
| ---------- | -------------- | ------- | -- |
| ObjectName | The metric is calculated and stored for each `ScalingSchedule` and `ClusterScalingSchedule` referenced in the HPAs | `ScalingSchedule` and `ClusterScalingSchedule` | `>=1.16` |

### Example

This is an example of using the ScalingSchedule collectors to collect
metrics from deployed objects of these CRDs. First, the schedule
object:
```yaml
apiVersion: zalando.org/v1
kind: ClusterScalingSchedule
metadata:
  name: "scheduling-event"
spec:
  schedules:
  - type: OneTime
    date: "2021-10-02T08:08:08+02:00"
    durationMinutes: 30
    value: 100
  - type: Repeating
    durationMinutes: 10
    value: 120
    period:
      startTime: "15:45"
      timezone: "Europe/Berlin"
      days:
      - Mon
      - Wed
      - Fri
```

This resource defines a `ClusterScalingSchedule` named
`scheduling-event` with two schedules. `ClusterScalingSchedule` objects
aren't namespaced, which means they can be referenced by any HPA in any
namespace in the cluster. `ScalingSchedule` objects have the exact same
fields and behavior, but can be referenced only by HPAs in the same
namespace. The schedules can have the type `Repeating` or `OneTime`.

This example configuration will generate the following result: at
`2021-10-02T08:08:08+02:00`, for 30 minutes, a metric with the value of
100 will be returned. Every Monday, Wednesday and Friday, starting at
15:45 (Berlin time), a metric with the value of 120 will be returned
for 10 minutes. It's not the case in this example, but if multiple
schedules overlap in time, the biggest value is returned.
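
To make that overlap rule concrete, here is an illustrative Go sketch.
It only encodes the behavior described above, with made-up `schedule`
and `metricValueAt` names; it is not the collector's actual
implementation:

```go
package main

import (
	"fmt"
	"time"
)

// schedule is a simplified, hypothetical stand-in for one entry of a
// ScalingSchedule: a window in which it is active and the value it
// yields while active.
type schedule struct {
	start    time.Time
	duration time.Duration
	value    int64
}

// metricValueAt returns the metric value at time t: among all schedules
// active at t, the biggest value wins; if none is active, 0 is returned.
func metricValueAt(t time.Time, schedules []schedule) int64 {
	var result int64
	for _, s := range schedules {
		active := !t.Before(s.start) && t.Before(s.start.Add(s.duration))
		if active && s.value > result {
			result = s.value
		}
	}
	return result
}

func main() {
	oneTime := schedule{
		start:    time.Date(2021, 10, 2, 8, 8, 8, 0, time.FixedZone("CEST", 2*60*60)),
		duration: 30 * time.Minute,
		value:    100,
	}
	// Ten minutes into the OneTime window the metric is 100.
	fmt.Println(metricValueAt(oneTime.start.Add(10*time.Minute), []schedule{oneTime}))
}
```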

Check the CRD definitions
([ScalingSchedule](./docs/scaling_schedules_crd.yaml),
[ClusterScalingSchedule](./docs/cluster_scaling_schedules_crd.yaml)) for
a better understanding of the possible fields and their behavior.

An HPA can reference the deployed `ClusterScalingSchedule` object as in
this example:
```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: "myapp-hpa"
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 1
  maxReplicas: 15
  metrics:
  - type: Object
    object:
      describedObject:
        apiVersion: zalando.org/v1
        kind: ClusterScalingSchedule
        name: "scheduling-event"
      metric:
        name: "scheduling-event"
      target:
        type: AverageValue
        averageValue: "10"
```

The name of the metric is equal to the name of the referenced object.
The `target.averageValue` in this example is set to 10. This value will
be used by the HPA controller to define the desired number of pods,
based on the metric obtained (check the [HPA algorithm
details](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details)
for more context). This HPA configuration explicitly states that each
pod of this application supports 10 units of the
`ClusterScalingSchedule` metric. Multiple applications can share the
same `ClusterScalingSchedule` or `ScalingSchedule` event and have a
different number of pods based on their `target.averageValue`
configuration.

In this specific example, at `2021-10-02T08:08:08+02:00`, since the
metric has the value 100, the application will scale to 10 pods
(100/10). Every Monday, Wednesday and Friday, starting at 15:45 (Berlin
time), the application will scale to 12 pods (120/10). Both scale-ups
will last at least the configured duration of the schedules. After
that, the regular HPA scale-down behavior applies.
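
As a rough illustration of that arithmetic (a simplification of the
linked HPA algorithm, ignoring tolerances, stabilization windows and
replica bounds; the `desiredReplicas` helper is made up for this
example):

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas is a deliberately simplified version of the HPA
// computation for an Object metric with an AverageValue target: the
// collected metric value divided by target.averageValue, rounded up.
func desiredReplicas(metricValue, targetAverageValue float64) int {
	return int(math.Ceil(metricValue / targetAverageValue))
}

func main() {
	fmt.Println(desiredReplicas(100, 10)) // OneTime schedule: 10 pods
	fmt.Println(desiredReplicas(120, 10)) // Repeating schedule: 12 pods
}
```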

Note that these pod counts only take these custom metrics into account;
the normal HPA behavior still applies: with multiple metrics the
biggest resulting number of pods is used, and the HPA max and min
replica configuration, autoscaling policies, etc. are respected.

These collectors are disabled by default; you have to start the server
with the `--scaling-schedule` flag to enable them. Remember to deploy
the `ScalingSchedule` and `ClusterScalingSchedule` CRDs and to allow
the service account used by the server to read, watch and list them.


@ -28,6 +28,7 @@ spec:
- --prometheus-server=http://prometheus.kube-system.svc.cluster.local
- --skipper-ingress-metrics
- --aws-external-metrics
- --scaling-schedule
env:
- name: AWS_REGION
value: eu-central-1


@ -1,6 +1,6 @@
apiVersion: v2
name: kube-metrics-adapter
version: 0.1.10
version: 0.1.11
description: kube-metrics-adapter helm chart
home: https://github.com/zalando-incubator/kube-metrics-adapter
maintainers:


@ -0,0 +1,119 @@
{{- if .Values.scalingSchedule.enabled }}
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.5.0
  creationTimestamp: null
  name: clusterscalingschedules.zalando.org
spec:
  group: zalando.org
  names:
    kind: ClusterScalingSchedule
    listKind: ClusterScalingScheduleList
    plural: clusterscalingschedules
    singular: clusterscalingschedule
  scope: Cluster
  versions:
  - name: v1
    schema:
      openAPIV3Schema:
        description: ClusterScalingSchedule describes a cluster scoped time based
          metric to be used in autoscaling operations.
        properties:
          apiVersion:
            description: 'APIVersion defines the versioned schema of this representation
              of an object. Servers should convert recognized schemas to the latest
              internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
            type: string
          kind:
            description: 'Kind is a string value representing the REST resource this
              object represents. Servers may infer this from the endpoint the client
              submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
            type: string
          metadata:
            type: object
          spec:
            description: ScalingScheduleSpec is the spec part of the ScalingSchedule.
            properties:
              schedules:
                description: Schedules is the list of schedules for this ScalingSchedule
                  resource. All the schedules defined here will result in the value
                  of the same metric. New metrics require a new ScalingSchedule resource.
                items:
                  description: Schedule is the schedule details to be used inside
                    a ScalingSchedule.
                  properties:
                    date:
                      description: Defines the starting date of a OneTime schedule.
                        It has to be a RFC3339 formatted date.
                      format: date-time
                      type: string
                    durationMinutes:
                      description: The duration in minutes that the configured value
                        will be returned for the defined schedule.
                      type: integer
                    period:
                      description: Defines the details of a Repeating schedule.
                      properties:
                        days:
                          description: The days that this schedule will be active.
                          items:
                            description: ScheduleDay represents the valid inputs for
                              days in a SchedulePeriod.
                            enum:
                            - Sun
                            - Mon
                            - Tue
                            - Wed
                            - Thu
                            - Fri
                            - Sat
                            type: string
                          type: array
                        startTime:
                          description: The startTime has the format HH:MM
                          pattern: (([0-1][0-9])|([2][0-3])):([0-5][0-9])
                          type: string
                        timezone:
                          description: The location name corresponding to a file in
                            the IANA Time Zone database, like Europe/Berlin.
                          type: string
                      required:
                      - days
                      - startTime
                      - timezone
                      type: object
                    type:
                      description: Defines if the schedule is a OneTime schedule or
                        Repeating one. If OneTime, date has to be defined. If Repeating,
                        Period has to be defined.
                      enum:
                      - OneTime
                      - Repeating
                      type: string
                    value:
                      description: The metric value that will be returned for the
                        defined schedule.
                      type: integer
                  required:
                  - durationMinutes
                  - type
                  - value
                  type: object
                type: array
            required:
            - schedules
            type: object
        required:
        - spec
        type: object
    served: true
    storage: true
status:
  acceptedNames:
    kind: ""
    plural: ""
  conditions: []
  storedVersions: []
{{- end}}


@ -182,6 +182,9 @@ spec:
{{- if .Values.zmon.tokenName }}
- --zmon-token-name={{ .Values.zmon.tokenName }}
{{- end}}
{{- if .Values.scalingSchedule.enabled }}
- --scaling-schedule
{{- end}}
resources:
limits:
cpu: {{ .Values.resources.limits.cpu }}


@ -73,6 +73,17 @@ rules:
  - get
  - list
  - watch
{{- if .Values.scalingSchedule.enabled }}
- apiGroups:
  - zalando.org
  resources:
  - clusterscalingschedules
  - scalingschedules
  verbs:
  - get
  - list
  - watch
{{- end}}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding


@ -0,0 +1,119 @@
{{- if .Values.scalingSchedule.enabled }}
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.5.0
  creationTimestamp: null
  name: scalingschedules.zalando.org
spec:
  group: zalando.org
  names:
    kind: ScalingSchedule
    listKind: ScalingScheduleList
    plural: scalingschedules
    singular: scalingschedule
  scope: Namespaced
  versions:
  - name: v1
    schema:
      openAPIV3Schema:
        description: ScalingSchedule describes a namespaced time based metric to be
          used in autoscaling operations.
        properties:
          apiVersion:
            description: 'APIVersion defines the versioned schema of this representation
              of an object. Servers should convert recognized schemas to the latest
              internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
            type: string
          kind:
            description: 'Kind is a string value representing the REST resource this
              object represents. Servers may infer this from the endpoint the client
              submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
            type: string
          metadata:
            type: object
          spec:
            description: ScalingScheduleSpec is the spec part of the ScalingSchedule.
            properties:
              schedules:
                description: Schedules is the list of schedules for this ScalingSchedule
                  resource. All the schedules defined here will result in the value
                  of the same metric. New metrics require a new ScalingSchedule resource.
                items:
                  description: Schedule is the schedule details to be used inside
                    a ScalingSchedule.
                  properties:
                    date:
                      description: Defines the starting date of a OneTime schedule.
                        It has to be a RFC3339 formatted date.
                      format: date-time
                      type: string
                    durationMinutes:
                      description: The duration in minutes that the configured value
                        will be returned for the defined schedule.
                      type: integer
                    period:
                      description: Defines the details of a Repeating schedule.
                      properties:
                        days:
                          description: The days that this schedule will be active.
                          items:
                            description: ScheduleDay represents the valid inputs for
                              days in a SchedulePeriod.
                            enum:
                            - Sun
                            - Mon
                            - Tue
                            - Wed
                            - Thu
                            - Fri
                            - Sat
                            type: string
                          type: array
                        startTime:
                          description: The startTime has the format HH:MM
                          pattern: (([0-1][0-9])|([2][0-3])):([0-5][0-9])
                          type: string
                        timezone:
                          description: The location name corresponding to a file in
                            the IANA Time Zone database, like Europe/Berlin.
                          type: string
                      required:
                      - days
                      - startTime
                      - timezone
                      type: object
                    type:
                      description: Defines if the schedule is a OneTime schedule or
                        Repeating one. If OneTime, date has to be defined. If Repeating,
                        Period has to be defined.
                      enum:
                      - OneTime
                      - Repeating
                      type: string
                    value:
                      description: The metric value that will be returned for the
                        defined schedule.
                      type: integer
                  required:
                  - durationMinutes
                  - type
                  - value
                  type: object
                type: array
            required:
            - schedules
            type: object
        required:
        - spec
        type: object
    served: true
    storage: true
status:
  acceptedNames:
    kind: ""
    plural: ""
  conditions: []
  storedVersions: []
{{- end}}


@ -92,3 +92,6 @@ resources:
  requests:
    cpu: 100m
    memory: 100Mi

scalingSchedule:
  enabled: false


@ -78,6 +78,15 @@ rules:
  - get
  - list
  - watch
- apiGroups:
  - zalando.org
  resources:
  - clusterscalingschedules
  - scalingschedules
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding

216
go.sum

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large


@ -332,7 +332,7 @@ func TestSkipperCollector(t *testing.T) {
err := makeIngress(client, tc.namespace, tc.ingressName, tc.backend, tc.hostnames, tc.backendWeights)
require.NoError(t, err)
plugin := makePlugin(tc.metric)
hpa := makeHPA(tc.namespace, tc.ingressName, tc.backend)
hpa := makeIngressHPA(tc.namespace, tc.ingressName, tc.backend)
config := makeConfig(tc.ingressName, tc.namespace, tc.backend, tc.fakedAverage)
_, err = newDeployment(client, tc.namespace, tc.backend, tc.replicas, tc.readyReplicas)
require.NoError(t, err)
@ -387,7 +387,7 @@ func makeIngress(client kubernetes.Interface, namespace, ingressName, backend st
return err
}
func makeHPA(namespace, ingressName, backend string) *autoscalingv2.HorizontalPodAutoscaler {
func makeIngressHPA(namespace, ingressName, backend string) *autoscalingv2.HorizontalPodAutoscaler {
return &autoscalingv2.HorizontalPodAutoscaler{
ObjectMeta: metav1.ObjectMeta{Namespace: namespace},
Spec: autoscalingv2.HorizontalPodAutoscalerSpec{


@ -63,7 +63,8 @@ func (s *MetricStore) insertCustomMetric(value custom_metrics.MetricValue) {
s.Lock()
defer s.Unlock()
// TODO: handle this mapping nicer
// TODO: handle this mapping nicer. This information should be
// registered as the metrics are.
var groupResource schema.GroupResource
switch value.DescribedObject.Kind {
case "Pod":
@ -71,7 +72,7 @@ func (s *MetricStore) insertCustomMetric(value custom_metrics.MetricValue) {
Resource: "pods",
}
case "Ingress":
// group can be either `extentions` or `networking.k8s.io`
// group can be either `extensions` or `networking.k8s.io`
group := "extensions"
gv, err := schema.ParseGroupVersion(value.DescribedObject.APIVersion)
if err == nil {
@ -81,6 +82,26 @@ func (s *MetricStore) insertCustomMetric(value custom_metrics.MetricValue) {
Resource: "ingresses",
Group: group,
}
case "ScalingSchedule":
group := "zalando.org"
gv, err := schema.ParseGroupVersion(value.DescribedObject.APIVersion)
if err == nil {
group = gv.Group
}
groupResource = schema.GroupResource{
Resource: "scalingschedules",
Group: group,
}
case "ClusterScalingSchedule":
group := "zalando.org"
gv, err := schema.ParseGroupVersion(value.DescribedObject.APIVersion)
if err == nil {
group = gv.Group
}
groupResource = schema.GroupResource{
Resource: "clusterscalingschedules",
Group: group,
}
}
metric := customMetricsStoredMetric{


@ -146,6 +146,120 @@ func TestInternalMetricStorage(t *testing.T) {
},
},
},
{
test: "insert/list/get a ScalingSchedule metric",
insert: collector.CollectedMetric{
Type: autoscalingv2.MetricSourceType("Object"),
Custom: custom_metrics.MetricValue{
Metric: newMetricIdentifier("scalingschedulename"),
Value: *resource.NewQuantity(10, ""),
DescribedObject: custom_metrics.ObjectReference{
Name: "metricObject",
Namespace: "default",
Kind: "ScalingSchedule",
APIVersion: "zalando.org/v1",
},
},
},
expectedFound: true,
list: []provider.CustomMetricInfo{
{
GroupResource: schema.GroupResource{
Group: "zalando.org",
Resource: "scalingschedules",
},
Namespaced: true,
Metric: "scalingschedulename",
},
},
byName: struct {
name types.NamespacedName
info provider.CustomMetricInfo
}{
name: types.NamespacedName{Name: "metricObject", Namespace: "default"},
info: provider.CustomMetricInfo{
GroupResource: schema.GroupResource{
Group: "zalando.org",
Resource: "scalingschedules",
},
Namespaced: true,
Metric: "scalingschedulename",
},
},
byLabel: struct {
namespace string
selector labels.Selector
info provider.CustomMetricInfo
}{
namespace: "default",
selector: labels.Everything(),
info: provider.CustomMetricInfo{
GroupResource: schema.GroupResource{
Group: "zalando.org",
Resource: "scalingschedules",
},
Namespaced: true,
Metric: "scalingschedulename",
},
},
},
{
test: "insert/list/get a ClusterScalingSchedule metric",
insert: collector.CollectedMetric{
Type: autoscalingv2.MetricSourceType("Object"),
Custom: custom_metrics.MetricValue{
Metric: newMetricIdentifier("clusterscalingschedulename"),
Value: *resource.NewQuantity(10, ""),
DescribedObject: custom_metrics.ObjectReference{
Name: "metricObject",
Namespace: "default", // The HPA namespace
Kind: "ClusterScalingSchedule",
APIVersion: "zalando.org/v1",
},
},
},
expectedFound: true,
list: []provider.CustomMetricInfo{
{
GroupResource: schema.GroupResource{
Group: "zalando.org",
Resource: "clusterscalingschedules",
},
Namespaced: true,
Metric: "clusterscalingschedulename",
},
},
byName: struct {
name types.NamespacedName
info provider.CustomMetricInfo
}{
name: types.NamespacedName{Name: "metricObject", Namespace: "default"},
info: provider.CustomMetricInfo{
GroupResource: schema.GroupResource{
Group: "zalando.org",
Resource: "clusterscalingschedules",
},
Namespaced: true,
Metric: "clusterscalingschedulename",
},
},
byLabel: struct {
namespace string
selector labels.Selector
info provider.CustomMetricInfo
}{
namespace: "default",
selector: labels.Everything(),
info: provider.CustomMetricInfo{
GroupResource: schema.GroupResource{
Group: "zalando.org",
Resource: "clusterscalingschedules",
},
Namespaced: true,
Metric: "clusterscalingschedulename",
},
},
},
{
test: "insert/list/get a non-namespaced resource metric",
insert: collector.CollectedMetric{


@ -18,6 +18,7 @@ package server
import (
"context"
"errors"
"fmt"
"net"
"net/http"
@ -31,15 +32,19 @@ import (
"github.com/spf13/cobra"
"github.com/zalando-incubator/cluster-lifecycle-manager/pkg/credentials-loader/platformiam"
generatedopenapi "github.com/zalando-incubator/kube-metrics-adapter/pkg/api/generated/openapi"
v1 "github.com/zalando-incubator/kube-metrics-adapter/pkg/apis/zalando.org/v1"
"github.com/zalando-incubator/kube-metrics-adapter/pkg/client/clientset/versioned"
"github.com/zalando-incubator/kube-metrics-adapter/pkg/collector"
"github.com/zalando-incubator/kube-metrics-adapter/pkg/provider"
"github.com/zalando-incubator/kube-metrics-adapter/pkg/zmon"
"golang.org/x/oauth2"
"k8s.io/apimachinery/pkg/fields"
openapinamer "k8s.io/apiserver/pkg/endpoints/openapi"
genericapiserver "k8s.io/apiserver/pkg/server"
"k8s.io/client-go/informers"
"k8s.io/client-go/kubernetes"
"k8s.io/client-go/rest"
"k8s.io/client-go/tools/cache"
"k8s.io/client-go/tools/clientcmd"
"k8s.io/klog"
)
@ -118,6 +123,8 @@ func NewCommandStartAdapterServer(stopCh <-chan struct{}) *cobra.Command {
"disregard failing to create collectors for incompatible HPAs")
flags.DurationVar(&o.MetricsTTL, "metrics-ttl", 15*time.Minute, "TTL for metrics that are stored in in-memory cache.")
flags.DurationVar(&o.GCInterval, "garbage-collector-interval", 10*time.Minute, "Interval to clean up metrics that are stored in in-memory cache.")
flags.BoolVar(&o.ScalingScheduleMetrics, "scaling-schedule", o.ScalingScheduleMetrics, ""+
"whether to enable time-based ScalingSchedule metrics")
return cmd
}
@ -245,6 +252,49 @@ func (o AdapterServerOptions) RunCustomMetricsAdapterServer(stopCh <-chan struct
collectorFactory.RegisterExternalCollector([]string{collector.AWSSQSQueueLengthMetric}, collector.NewAWSCollectorPlugin(awsSessions))
}
if o.ScalingScheduleMetrics {
scalingScheduleClient, err := versioned.NewForConfig(clientConfig)
if err != nil {
return errors.New("unable to create [Cluster]ScalingSchedule.zalando.org/v1 client")
}
clusterScalingSchedulesStore := cache.NewStore(cache.MetaNamespaceKeyFunc)
clusterReflector := cache.NewReflector(
cache.NewListWatchFromClient(scalingScheduleClient.ZalandoV1().RESTClient(), "ClusterScalingSchedules", "", fields.Everything()),
&v1.ClusterScalingSchedule{},
clusterScalingSchedulesStore,
0,
)
go clusterReflector.Run(ctx.Done())
scalingSchedulesStore := cache.NewStore(cache.MetaNamespaceKeyFunc)
reflector := cache.NewReflector(
cache.NewListWatchFromClient(scalingScheduleClient.ZalandoV1().RESTClient(), "ScalingSchedules", "", fields.Everything()),
&v1.ScalingSchedule{},
scalingSchedulesStore,
0,
)
go reflector.Run(ctx.Done())
clusterPlugin, err := collector.NewClusterScalingScheduleCollectorPlugin(clusterScalingSchedulesStore, time.Now)
if err != nil {
return fmt.Errorf("unable to create ClusterScalingScheduleCollector plugin: %v", err)
}
err = collectorFactory.RegisterObjectCollector("ClusterScalingSchedule", "", clusterPlugin)
if err != nil {
return fmt.Errorf("failed to register ClusterScalingSchedule object collector plugin: %v", err)
}
plugin, err := collector.NewScalingScheduleCollectorPlugin(scalingSchedulesStore, time.Now)
if err != nil {
return fmt.Errorf("unable to create ScalingScheduleCollector plugin: %v", err)
}
err = collectorFactory.RegisterObjectCollector("ScalingSchedule", "", plugin)
if err != nil {
return fmt.Errorf("failed to register ScalingSchedule object collector plugin: %v", err)
}
}
hpaProvider := provider.NewHPAProvider(client, 30*time.Second, 1*time.Minute, collectorFactory, o.DisregardIncompatibleHPAs, o.MetricsTTL, o.GCInterval)
go hpaProvider.Run(ctx)
@ -356,4 +406,6 @@ type AdapterServerOptions struct {
MetricsTTL time.Duration
// Interval to clean up metrics that are stored in in-memory cache
GCInterval time.Duration
// Time-based scaling based on the CRDs ScalingSchedule and ClusterScalingSchedule.
ScalingScheduleMetrics bool
}