Simple ZMON collector implementation (#2)

* Simple ZMON collector implementation

Signed-off-by: Mikkel Oscar Lyderik Larsen <mikkel.larsen@zalando.de>

* Add tests for ZMON client

Signed-off-by: Mikkel Oscar Lyderik Larsen <mikkel.larsen@zalando.de>

* Add tests for zmon collector

Signed-off-by: Mikkel Oscar Lyderik Larsen <mikkel.larsen@zalando.de>

* Update ZMON collector docs

Signed-off-by: Mikkel Oscar Lyderik Larsen <mikkel.larsen@zalando.de>

* Expose tags instead of entities for queries

Signed-off-by: Mikkel Oscar Lyderik Larsen <mikkel.larsen@zalando.de>

* Remove unused function

Signed-off-by: Mikkel Oscar Lyderik Larsen <mikkel.larsen@zalando.de>
Mikkel Oscar Lyderik Larsen
2018-10-29 14:26:25 +01:00
committed by Arjun
parent b18acf3ed0
commit c86a82ca88
8 changed files with 964 additions and 31 deletions


@ -294,3 +294,95 @@ The AWS account of the queue currently depends on how `kube-metrics-adapter` is
configured to get AWS credentials. The normal assumption is that you run the
adapter in a cluster running in the AWS account where the queue is defined.
Please open an issue if you would like support for other use cases.
## ZMON collector
The ZMON collector allows scaling based on external metrics exposed by
[ZMON](https://github.com/zalando/zmon) checks.
### Supported metrics
| Metric | Description | Type |
| ------------ | ------- | -- |
| `zmon-check` | Scale based on any ZMON check results | External |
### Example
This is an example of an HPA that will scale based on the specified value
exposed by a ZMON check with id `1234`.
```yaml
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  annotations:
    # metric-config.<metricType>.<metricName>.<collectorName>/<configKey>
    metric-config.external.zmon-check.zmon/key: "custom.*"
    metric-config.external.zmon-check.zmon/tag-application: "my-custom-app-*"
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: custom-metrics-consumer
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metricName: zmon-check
      metricSelector:
        matchLabels:
          check-id: "1234" # the ZMON check to query for metrics
          key: "custom.value"
          tag-application: my-custom-app
          aggregators: avg # comma separated list of aggregation functions, default: last
          duration: 5m # default: 10m
      targetAverageValue: 30
```
The `check-id` specifies the ZMON check to query for the metrics. `key`
specifies the JSON key in the check output to extract the metric value from.
E.g. if you have a check which returns the following data:
```json
{
    "custom": {
        "value": 1.0
    },
    "other": {
        "value": 3.0
    }
}
```
Then the value `1.0` would be returned when the key is defined as `custom.value`.
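To illustrate the dotted key convention, here is a minimal Go sketch that resolves a key like `custom.value` against the check output shown above. It is only an illustration of the naming scheme: the function name `lookup` is hypothetical, and the adapter itself passes the key on as a tag in the metrics query rather than walking the JSON client-side.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// lookup is a hypothetical helper: it walks a decoded JSON document along a
// dot-separated key such as "custom.value" and returns the number found there.
func lookup(data []byte, key string) (float64, error) {
	var current interface{}
	if err := json.Unmarshal(data, &current); err != nil {
		return 0, err
	}
	for _, part := range strings.Split(key, ".") {
		obj, ok := current.(map[string]interface{})
		if !ok {
			return 0, fmt.Errorf("cannot descend into %q: parent is not an object", part)
		}
		if current, ok = obj[part]; !ok {
			return 0, fmt.Errorf("key part %q not found", part)
		}
	}
	value, ok := current.(float64)
	if !ok {
		return 0, fmt.Errorf("value at %q is not a number", key)
	}
	return value, nil
}

func main() {
	check := []byte(`{"custom": {"value": 1.0}, "other": {"value": 3.0}}`)
	v, err := lookup(check, "custom.value")
	fmt.Println(v, err) // prints "1 <nil>"
}
```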
The `tag-<name>` labels define the tags used for the KairosDB query. In a
normal ZMON setup the following tags will be available:
* `application`
* `alias` (name of Kubernetes cluster)
* `entity` (full ZMON entity ID)
`aggregators` defines the aggregation functions applied to the metrics query.
For instance if you define the entity filter
`type=kube_pod,application=my-custom-app` you might get three entities back and
then you might want to get an average over the metrics for those three
entities. This would be possible by using the `avg` aggregator. The default
aggregator is `last` which returns only the latest metric point from the
query. The supported aggregation functions are `avg`, `dev`, `count`,
`first`, `last`, `max`, `min`, `sum`, `diff`. See the [KairosDB docs](https://kairosdb.github.io/docs/build/html/restapi/Aggregators.html) for
details.
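As a rough sketch of how the `aggregators` label could be interpreted, the following Go snippet splits the comma separated list, falls back to `last`, and rejects names outside the supported set listed above. The function name `parseAggregators` is hypothetical, and trimming whitespace and treating an empty label as "not set" are assumptions of this sketch, not documented behaviour.

```go
package main

import (
	"fmt"
	"strings"
)

// supported mirrors the aggregation functions listed in the text above.
var supported = map[string]struct{}{
	"avg": {}, "dev": {}, "count": {}, "first": {}, "last": {},
	"max": {}, "min": {}, "sum": {}, "diff": {},
}

// parseAggregators is a hypothetical helper that turns the label value into
// a validated list of aggregator names.
func parseAggregators(label string) ([]string, error) {
	if label == "" {
		// Assumption: an empty label behaves like a missing one.
		return []string{"last"}, nil
	}
	names := strings.Split(label, ",")
	for i, name := range names {
		name = strings.TrimSpace(name) // assumption: tolerate "avg, max"
		if _, ok := supported[name]; !ok {
			return nil, fmt.Errorf("invalid aggregator %q", name)
		}
		names[i] = name
	}
	return names, nil
}

func main() {
	fmt.Println(parseAggregators("avg,max")) // prints "[avg max] <nil>"
	fmt.Println(parseAggregators(""))        // prints "[last] <nil>"
}
```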
The `duration` defines the duration used for the timeseries query. E.g. if you
specify a duration of `5m` then the query will return metric points for the
last 5 minutes and apply the specified aggregation with the same duration, e.g.
`max(5m)`.
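The duration handling can be sketched in a few lines of Go, assuming the label uses Go's duration syntax (`5m`, `1h30m`) and that a missing label means the 10 minute default. The helper name `queryDuration` is hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// defaultQueryDuration is the documented default of 10 minutes.
const defaultQueryDuration = 10 * time.Minute

// queryDuration is a hypothetical helper that reads the optional "duration"
// label and falls back to the default when the label is not set.
func queryDuration(labels map[string]string) (time.Duration, error) {
	d, ok := labels["duration"]
	if !ok {
		return defaultQueryDuration, nil
	}
	return time.ParseDuration(d)
}

func main() {
	d, err := queryDuration(map[string]string{"duration": "5m"})
	fmt.Println(d, err) // prints "5m0s <nil>"

	d, err = queryDuration(map[string]string{})
	fmt.Println(d, err) // prints "10m0s <nil>"
}
```

Note that `time.ParseDuration` does not accept day or week units, so a value like `1d` would be rejected in this sketch.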
The annotations `metric-config.external.zmon-check.zmon/key` and
`metric-config.external.zmon-check.zmon/tag-<name>` can optionally be used if
you need to define a `key` or a `tag` with a "star" query syntax like
`values.*`. This *hack* is in place because `*` is not allowed in the
metric label definitions. If both an annotation and the corresponding label are
defined, the annotation takes precedence.
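The precedence rule can be sketched as a two-pass merge: collect tags from the labels first, then let matching annotations overwrite them. The helper name `mergeTags` and the `my-cluster` value are hypothetical; the prefixes are the ones documented above.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	tagLabelPrefix      = "tag-"
	tagAnnotationPrefix = "metric-config.external.zmon-check.zmon/tag-"
)

// mergeTags is a hypothetical helper that builds the query tags from metric
// labels and HPA annotations. Annotations are applied last, so they win.
func mergeTags(labels, annotations map[string]string) map[string]string {
	tags := make(map[string]string)
	for k, v := range labels {
		if strings.HasPrefix(k, tagLabelPrefix) {
			tags[strings.TrimPrefix(k, tagLabelPrefix)] = v
		}
	}
	for k, v := range annotations {
		if strings.HasPrefix(k, tagAnnotationPrefix) {
			tags[strings.TrimPrefix(k, tagAnnotationPrefix)] = v
		}
	}
	return tags
}

func main() {
	labels := map[string]string{
		"check-id":        "1234",
		"tag-application": "my-custom-app",
		"tag-alias":       "my-cluster", // hypothetical cluster alias
	}
	annotations := map[string]string{
		"metric-config.external.zmon-check.zmon/tag-application": "my-custom-app-*",
	}
	tags := mergeTags(labels, annotations)
	fmt.Println(tags["application"], tags["alias"]) // prints "my-custom-app-* my-cluster"
}
```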

go.mod

@ -22,7 +22,6 @@ require (
github.com/evanphx/json-patch v3.0.0+incompatible // indirect
github.com/fsnotify/fsnotify v1.4.7 // indirect
github.com/ghodss/yaml v1.0.0 // indirect
github.com/go-ini/ini v1.25.4 // indirect
github.com/go-openapi/jsonpointer v0.0.0-20180322222829-3a0015ad55fa // indirect
github.com/go-openapi/jsonreference v0.0.0-20180322222742-3fb327e6747d // indirect
github.com/go-openapi/spec v0.0.0-20180801175345-384415f06ee2 // indirect
@ -30,11 +29,9 @@ require (
github.com/gogo/protobuf v1.1.1 // indirect
github.com/golang/glog v0.0.0-20160126235308-23def4e6c14b
github.com/golang/groupcache v0.0.0-20180513044358-24b0969c4cb7 // indirect
github.com/golang/protobuf v1.2.0 // indirect
github.com/google/btree v0.0.0-20180813153112-4030bb1f1f0c // indirect
github.com/google/gofuzz v0.0.0-20170612174753-24818f796faf // indirect
github.com/googleapis/gnostic v0.2.0 // indirect
github.com/gopherjs/gopherjs v0.0.0-20180820052304-89baedc74dd7 // indirect
github.com/gorilla/websocket v1.3.0 // indirect
github.com/gregjones/httpcache v0.0.0-20180305231024-9cad4c3443a7 // indirect
github.com/grpc-ecosystem/go-grpc-prometheus v1.2.0 // indirect
@ -45,7 +42,6 @@ require (
github.com/inconshreveable/mousetrap v1.0.0 // indirect
github.com/jonboulle/clockwork v0.1.0 // indirect
github.com/json-iterator/go v1.1.5 // indirect
github.com/jtolds/gls v4.2.1+incompatible // indirect
github.com/kubernetes-incubator/custom-metrics-apiserver v0.0.0-20180824182428-26e5299457d3
github.com/mailru/easyjson v0.0.0-20180823135443-60711f1a8329 // indirect
github.com/matttproud/golang_protobuf_extensions v1.0.1 // indirect
@ -62,21 +58,21 @@ require (
github.com/prometheus/common v0.0.0-20180801064454-c7de2306084e
github.com/prometheus/procfs v0.0.0-20180725123919-05ee40e3a273 // indirect
github.com/sirupsen/logrus v1.0.6
github.com/smartystreets/assertions v0.0.0-20180820201707-7c9eb446e3cf // indirect
github.com/smartystreets/goconvey v0.0.0-20180222194500-ef6db91d284a // indirect
github.com/soheilhy/cmux v0.1.4 // indirect
github.com/spf13/cobra v0.0.3
github.com/spf13/pflag v1.0.2 // indirect
github.com/stretchr/testify v1.2.2 // indirect
github.com/stretchr/testify v1.2.2
github.com/tmc/grpc-websocket-proxy v0.0.0-20171017195756-830351dc03c6 // indirect
github.com/ugorji/go v1.1.1 // indirect
github.com/xiang90/probing v0.0.0-20160813154853-07dd2e8dfe18 // indirect
github.com/zalando-incubator/cluster-lifecycle-manager v0.0.0-20180921141935-824b77fb1f84
golang.org/x/crypto v0.0.0-20181015023909-0c41d7ab0a0e // indirect
golang.org/x/net v0.0.0-20180824152047-4bcd98cce591 // indirect
golang.org/x/oauth2 v0.0.0-20180821212333-d2e6202438be
golang.org/x/sync v0.0.0-20180314180146-1d60e4601c6f // indirect
golang.org/x/sys v0.0.0-20180824143301-4910a1d54f87 // indirect
golang.org/x/text v0.3.0 // indirect
golang.org/x/time v0.0.0-20180412165947-fbb02b2291d2 // indirect
google.golang.org/appengine v1.2.0 // indirect
google.golang.org/genproto v0.0.0-20180817151627-c66870c02cf8 // indirect
google.golang.org/grpc v1.14.0 // indirect
gopkg.in/airbrake/gobrake.v2 v2.0.9 // indirect

go.sum

@ -8,8 +8,6 @@ github.com/PuerkitoBio/purell v1.1.0 h1:rmGxhojJlM0tuKtfdvliR84CFHljx9ag64t2xmVk
github.com/PuerkitoBio/purell v1.1.0/go.mod h1:c11w/QuzBsJSee3cPx9rAFu61PvFxuPbtSwDGJws/X0=
github.com/PuerkitoBio/urlesc v0.0.0-20170810143723-de5bf2ad4578 h1:d+Bc7a5rLufV/sSk/8dngufqelfh6jnri85riMAaF/M=
github.com/PuerkitoBio/urlesc v0.0.0-20170810143723-de5bf2ad4578/go.mod h1:uGdkoq3SwY9Y+13GIhn11/XLaGBb4BfwItxLd5jeuXE=
github.com/aws/aws-sdk-go v1.15.21 h1:STLvc6RrpycslC1NRtTvt/YSgDkIGCTrB9K9vE5R2oQ=
github.com/aws/aws-sdk-go v1.15.21/go.mod h1:mFuSZ37Z9YOHbQEwBWztmVzqXrEkub65tZoCYDt7FT0=
github.com/aws/aws-sdk-go v1.15.61 h1:M1mnQshHau/YfY2hV45rsaAevdMgLp7zh0oHRCgX100=
github.com/aws/aws-sdk-go v1.15.61/go.mod h1:E3/ieXAlvM0XWO57iftYVDLLvQ824smPP3ATZkfNZeM=
github.com/beorn7/perks v0.0.0-20180321164747-3a771d992973 h1:xJ4a3vCFaGF/jqvzLMYoU8P317H5OQ+Via4RmuPwCS0=
@ -42,8 +40,6 @@ github.com/fsnotify/fsnotify v1.4.7 h1:IXs+QLmnXW2CcXuY+8Mzv/fWEsPGWxqefPtCP5CnV
github.com/fsnotify/fsnotify v1.4.7/go.mod h1:jwhsz4b93w/PPRr/qN1Yymfu8t87LnFCMoQvtojpjFo=
github.com/ghodss/yaml v1.0.0 h1:wQHKEahhL6wmXdzwWG11gIVCkOv05bNOh+Rxn0yngAk=
github.com/ghodss/yaml v1.0.0/go.mod h1:4dBDuWmgqj2HViK6kFavaiC9ZROes6MMH2rRYeMEF04=
github.com/go-ini/ini v1.25.4 h1:Mujh4R/dH6YL8bxuISne3xX2+qcQ9p0IxKAP6ExWoUo=
github.com/go-ini/ini v1.25.4/go.mod h1:ByCAeIL28uOIIG0E3PJtZPDL8WnHpFKFOtgjp+3Ies8=
github.com/go-openapi/jsonpointer v0.0.0-20180322222829-3a0015ad55fa h1:hr8WVDjg4JKtQptZpzyb196TmruCs7PIsdJz8KAOZp8=
github.com/go-openapi/jsonpointer v0.0.0-20180322222829-3a0015ad55fa/go.mod h1:+35s3my2LFTysnkMfxsJBAMHj/DoqoB9knIWoYG/Vk0=
github.com/go-openapi/jsonreference v0.0.0-20180322222742-3fb327e6747d h1:k3UQ7Z8yFYq0BNkYykKIheY0HlZBl1Hku+pO9HE9FNU=
@ -66,8 +62,6 @@ github.com/google/gofuzz v0.0.0-20170612174753-24818f796faf h1:+RRA9JqSOZFfKrOeq
github.com/google/gofuzz v0.0.0-20170612174753-24818f796faf/go.mod h1:HP5RmnzzSNb993RKQDq4+1A4ia9nllfqcQFTQJedwGI=
github.com/googleapis/gnostic v0.2.0 h1:l6N3VoaVzTncYYW+9yOz2LJJammFZGBO13sqgEhpy9g=
github.com/googleapis/gnostic v0.2.0/go.mod h1:sJBsCZ4ayReDTBIg8b9dl28c5xFWyhBTVRp3pOg5EKY=
github.com/gopherjs/gopherjs v0.0.0-20180820052304-89baedc74dd7 h1:WF7x3tAe0mEb4wf/yhSThHwZYQIjVmEGSbAH9hzOeZQ=
github.com/gopherjs/gopherjs v0.0.0-20180820052304-89baedc74dd7/go.mod h1:wJfORRmW1u3UXTncJ5qlYoELFm8eSnnEO6hX4iZ3EWY=
github.com/gorilla/websocket v1.3.0 h1:r/LXc0VJIMd0rCMsc6DxgczaQtoCwCLatnfXmSYcXx8=
github.com/gorilla/websocket v1.3.0/go.mod h1:E7qHFY5m1UJ88s3WnNqhKjPHQ0heANvMoAMk2YaljkQ=
github.com/gregjones/httpcache v0.0.0-20180305231024-9cad4c3443a7 h1:pdN6V1QBWetyv/0+wjACpqVH+eVULgEjkurDLq3goeM=
@ -90,8 +84,6 @@ github.com/jonboulle/clockwork v0.1.0 h1:VKV+ZcuP6l3yW9doeqz6ziZGgcynBVQO+obU0+0
github.com/jonboulle/clockwork v0.1.0/go.mod h1:Ii8DK3G1RaLaWxj9trq07+26W01tbo22gdxWY5EU2bo=
github.com/json-iterator/go v1.1.5 h1:gL2yXlmiIo4+t+y32d4WGwOjKGYcGOuyrg46vadswDE=
github.com/json-iterator/go v1.1.5/go.mod h1:+SdeFBvtyEkXs7REEP0seUULqWtbJapLOCVDaaPEHmU=
github.com/jtolds/gls v4.2.1+incompatible h1:fSuqC+Gmlu6l/ZYAoZzx2pyucC8Xza35fpRVWLVmUEE=
github.com/jtolds/gls v4.2.1+incompatible/go.mod h1:QJZ7F/aHp+rZTRtaJ1ow/lLfFfVYBRgL+9YlvaHOwJU=
github.com/kubernetes-incubator/custom-metrics-apiserver v0.0.0-20180824182428-26e5299457d3 h1:X22IRs6vbuj0xu3ZuMYMI2Qe0IDmxv0RJvWEwLw2nSg=
github.com/kubernetes-incubator/custom-metrics-apiserver v0.0.0-20180824182428-26e5299457d3/go.mod h1:KWRxWvzVCNvDtG9ejU5UdpgvxdCZFMUZu0xroKWG8Bo=
github.com/mailru/easyjson v0.0.0-20180823135443-60711f1a8329 h1:2gxZ0XQIU/5z3Z3bUBu+FXuk2pFbkN6tcwi/pjyaDic=
@ -124,10 +116,6 @@ github.com/prometheus/procfs v0.0.0-20180725123919-05ee40e3a273 h1:agujYaXJSxSo1
github.com/prometheus/procfs v0.0.0-20180725123919-05ee40e3a273/go.mod h1:c3At6R/oaqEKCNdg8wHV1ftS6bRYblBhIjjI8uT2IGk=
github.com/sirupsen/logrus v1.0.6 h1:hcP1GmhGigz/O7h1WVUM5KklBp1JoNS9FggWKdj/j3s=
github.com/sirupsen/logrus v1.0.6/go.mod h1:pMByvHTf9Beacp5x1UXfOR9xyW/9antXMhjMPG0dEzc=
github.com/smartystreets/assertions v0.0.0-20180820201707-7c9eb446e3cf h1:6V1qxN6Usn4jy8unvggSJz/NC790tefw8Zdy6OZS5co=
github.com/smartystreets/assertions v0.0.0-20180820201707-7c9eb446e3cf/go.mod h1:OnSkiWE9lh6wB0YB77sQom3nweQdgAjqCqsofrRNTgc=
github.com/smartystreets/goconvey v0.0.0-20180222194500-ef6db91d284a h1:JSvGDIbmil4Ui/dDdFBExb7/cmkNjyX5F97oglmvCDo=
github.com/smartystreets/goconvey v0.0.0-20180222194500-ef6db91d284a/go.mod h1:XDJAKZRPZ1CvBcN2aX5YOUTYGHki24fSF0Iv48Ibg0s=
github.com/soheilhy/cmux v0.1.4 h1:0HKaf1o97UwFjHH9o5XsHUOF+tqmdA7KEzXLpiyaw0E=
github.com/soheilhy/cmux v0.1.4/go.mod h1:IM3LyeVVIOuxMH7sFAkER9+bJ4dT7Ms6E4xg4kGIyLM=
github.com/spf13/cobra v0.0.3 h1:ZlrZ4XsMRm04Fr5pSFxBgfND2EBVa1nLpiy1stUsX/8=
@ -142,12 +130,15 @@ github.com/ugorji/go v1.1.1 h1:gmervu+jDMvXTbcHQ0pd2wee85nEoE0BsVyEuzkfK8w=
github.com/ugorji/go v1.1.1/go.mod h1:hnLbHMwcvSihnDhEfx2/BzKp2xb0Y+ErdfYcrs9tkJQ=
github.com/xiang90/probing v0.0.0-20160813154853-07dd2e8dfe18 h1:MPPkRncZLN9Kh4MEFmbnK4h3BD7AUmskWv2+EeZJCCs=
github.com/xiang90/probing v0.0.0-20160813154853-07dd2e8dfe18/go.mod h1:UETIi67q53MR2AWcXfiuqkDkRtnGDLqkBTpCHuJHxtU=
golang.org/x/crypto v0.0.0-20180820150726-614d502a4dac h1:7d7lG9fHOLdL6jZPtnV4LpI41SbohIJ1Atq7U991dMg=
golang.org/x/crypto v0.0.0-20180820150726-614d502a4dac/go.mod h1:6SG95UA2DQfeDnfUPMdvaQW0Q7yPrPDi9nlGo2tz2b4=
github.com/zalando-incubator/cluster-lifecycle-manager v0.0.0-20180921141935-824b77fb1f84 h1:LirBPRU6n8qjPFWCTIlTDh+1FW2M4z2RFS7lCzNkKgA=
github.com/zalando-incubator/cluster-lifecycle-manager v0.0.0-20180921141935-824b77fb1f84/go.mod h1:6GNNHCquvS1cn0APtLvgYNYEMYK+JRwp6ZTGxi+pc+w=
golang.org/x/crypto v0.0.0-20181015023909-0c41d7ab0a0e h1:IzypfodbhbnViNUO/MEh0FzCUooG97cIGfdggUrUSyU=
golang.org/x/crypto v0.0.0-20181015023909-0c41d7ab0a0e/go.mod h1:6SG95UA2DQfeDnfUPMdvaQW0Q7yPrPDi9nlGo2tz2b4=
golang.org/x/net v0.0.0-20180724234803-3673e40ba225/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4=
golang.org/x/net v0.0.0-20180824152047-4bcd98cce591 h1:4S2XUgvg3hUNTvxI307qkFPb9zKHG3Nf9TXFzX/DZZI=
golang.org/x/net v0.0.0-20180824152047-4bcd98cce591/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4=
golang.org/x/oauth2 v0.0.0-20180821212333-d2e6202438be h1:vEDujvNQGv4jgYKudGeI/+DAX4Jffq6hpD55MmoEvKs=
golang.org/x/oauth2 v0.0.0-20180821212333-d2e6202438be/go.mod h1:N/0e6XlmueqKjAGxoOufVs8QHGRruUQn6yWY3a++T0U=
golang.org/x/sync v0.0.0-20180314180146-1d60e4601c6f h1:wMNYb4v58l5UBM7MYRLPG6ZhfOqbKu7X5eyFl8ZhKvA=
golang.org/x/sync v0.0.0-20180314180146-1d60e4601c6f/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sys v0.0.0-20180824143301-4910a1d54f87 h1:GqwDwfvIpC33dK9bA1fD+JiDUNsuAiQiEkpHqUKze4o=
@ -156,6 +147,8 @@ golang.org/x/text v0.3.0 h1:g61tztE5qeGQ89tm6NTjjM9VPIm088od1l6aSorWRWg=
golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
golang.org/x/time v0.0.0-20180412165947-fbb02b2291d2 h1:+DCIGbF/swA92ohVg0//6X2IVY3KZs6p9mix0ziNYJM=
golang.org/x/time v0.0.0-20180412165947-fbb02b2291d2/go.mod h1:tRJNPiyCQ0inRvYxbN9jk5I+vvW/OXSQhTDSoE431IQ=
google.golang.org/appengine v1.2.0 h1:S0iUepdCWODXRvtE+gcRDd15L+k+k1AiHlMiMjefH24=
google.golang.org/appengine v1.2.0/go.mod h1:xpcJRLb0r/rnEns0DIKYYv+WjYCduHsrkT7/EB5XEv4=
google.golang.org/genproto v0.0.0-20180817151627-c66870c02cf8 h1:Nw54tB0rB7hY/N0NQvRW8DG4Yk3Q6T9cu9RcFQDu1tc=
google.golang.org/genproto v0.0.0-20180817151627-c66870c02cf8/go.mod h1:JiN7NxoALGmiZfu7CAH4rXhgtRTLTxftemlI0sWmxmc=
google.golang.org/grpc v1.14.0 h1:ArxJuB1NWfPY6r9Gp9gqwplT0Ge7nqv9msgu03lHLmo=


@ -0,0 +1,175 @@
package collector
import (
"fmt"
"strconv"
"strings"
"time"
"github.com/zalando-incubator/kube-metrics-adapter/pkg/zmon"
autoscalingv2beta1 "k8s.io/api/autoscaling/v2beta1"
"k8s.io/apimachinery/pkg/api/resource"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/metrics/pkg/apis/external_metrics"
)
const (
// ZMONCheckMetric defines the metric name for metrics based on ZMON
// checks.
ZMONCheckMetric = "zmon-check"
zmonCheckIDLabelKey = "check-id"
zmonKeyLabelKey = "key"
zmonDurationLabelKey = "duration"
zmonAggregatorsLabelKey = "aggregators"
zmonTagPrefixLabelKey = "tag-"
defaultQueryDuration = 10 * time.Minute
zmonKeyAnnotationKey = "metric-config.external.zmon-check.zmon/key"
zmonTagPrefixAnnotationKey = "metric-config.external.zmon-check.zmon/tag-"
)
// ZMONCollectorPlugin defines a plugin for creating collectors that can get
// metrics from ZMON.
type ZMONCollectorPlugin struct {
zmon zmon.ZMON
}
// NewZMONCollectorPlugin initializes a new ZMONCollectorPlugin.
func NewZMONCollectorPlugin(zmon zmon.ZMON) (*ZMONCollectorPlugin, error) {
return &ZMONCollectorPlugin{
zmon: zmon,
}, nil
}
// NewCollector initializes a new ZMON collector from the specified HPA.
func (c *ZMONCollectorPlugin) NewCollector(hpa *autoscalingv2beta1.HorizontalPodAutoscaler, config *MetricConfig, interval time.Duration) (Collector, error) {
switch config.Name {
case ZMONCheckMetric:
annotations := map[string]string{}
if hpa != nil {
annotations = hpa.Annotations
}
return NewZMONCollector(c.zmon, config, annotations, interval)
}
return nil, fmt.Errorf("metric '%s' not supported", config.Name)
}
// ZMONCollector defines a collector that is able to collect metrics from ZMON.
type ZMONCollector struct {
zmon zmon.ZMON
interval time.Duration
checkID int
key string
labels map[string]string
tags map[string]string
duration time.Duration
aggregators []string
metricName string
metricType autoscalingv2beta1.MetricSourceType
}
// NewZMONCollector initializes a new ZMONCollector.
func NewZMONCollector(zmon zmon.ZMON, config *MetricConfig, annotations map[string]string, interval time.Duration) (*ZMONCollector, error) {
checkIDStr, ok := config.Labels[zmonCheckIDLabelKey]
if !ok {
return nil, fmt.Errorf("ZMON check ID not specified on metric")
}
checkID, err := strconv.Atoi(checkIDStr)
if err != nil {
return nil, err
}
key := ""
// get optional key
if k, ok := config.Labels[zmonKeyLabelKey]; ok {
key = k
}
// the annotation takes precedence over the label
if k, ok := annotations[zmonKeyAnnotationKey]; ok {
key = k
}
duration := defaultQueryDuration
// parse optional duration value
if d, ok := config.Labels[zmonDurationLabelKey]; ok {
duration, err = time.ParseDuration(d)
if err != nil {
return nil, err
}
}
// parse tags
tags := make(map[string]string)
for k, v := range config.Labels {
if strings.HasPrefix(k, zmonTagPrefixLabelKey) {
key := strings.TrimPrefix(k, zmonTagPrefixLabelKey)
tags[key] = v
}
}
// parse tags from annotations
// tags defined in annotations take precedence over tags defined in
// the labels.
for k, v := range annotations {
if strings.HasPrefix(k, zmonTagPrefixAnnotationKey) {
key := strings.TrimPrefix(k, zmonTagPrefixAnnotationKey)
tags[key] = v
}
}
// default aggregator is last
aggregators := []string{"last"}
if k, ok := config.Labels[zmonAggregatorsLabelKey]; ok {
aggregators = strings.Split(k, ",")
}
return &ZMONCollector{
zmon: zmon,
interval: interval,
checkID: checkID,
key: key,
tags: tags,
duration: duration,
aggregators: aggregators,
metricName: config.Name,
metricType: config.Type,
labels: config.Labels,
}, nil
}
// GetMetrics returns a list of collected metrics for the ZMON check.
func (c *ZMONCollector) GetMetrics() ([]CollectedMetric, error) {
dataPoints, err := c.zmon.Query(c.checkID, c.key, c.tags, c.aggregators, c.duration)
if err != nil {
return nil, err
}
if len(dataPoints) < 1 {
return nil, nil
}
// pick the last data point
// TODO: do more fancy aggregations here (or in the query function)
point := dataPoints[len(dataPoints)-1]
metricValue := CollectedMetric{
Type: c.metricType,
External: external_metrics.ExternalMetricValue{
MetricName: c.metricName,
MetricLabels: c.labels,
Timestamp: metav1.Time{Time: point.Time},
Value: *resource.NewMilliQuantity(int64(point.Value*1000), resource.DecimalSI),
},
}
return []CollectedMetric{metricValue}, nil
}
// Interval returns the interval at which the collector should run.
func (c *ZMONCollector) Interval() time.Duration {
return c.interval
}
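A note on the value conversion in `GetMetrics`: `int64(point.Value*1000)` truncates toward zero, so fractions of a milli-unit are dropped. The sketch below contrasts truncation with rounding using only the standard library. The function names are hypothetical, and whether rounding is preferable for a given check is a judgment call, not something the collector prescribes.

```go
package main

import (
	"fmt"
	"math"
)

// toMilliTruncated mirrors the conversion used in the collector: the
// fractional part of the milli-value is discarded.
func toMilliTruncated(v float64) int64 {
	return int64(v * 1000)
}

// toMilliRounded is a hypothetical alternative that rounds to the nearest
// milli-unit instead of truncating.
func toMilliRounded(v float64) int64 {
	return int64(math.Round(v * 1000))
}

func main() {
	fmt.Println(toMilliTruncated(1.0), toMilliRounded(1.0))       // prints "1000 1000"
	fmt.Println(toMilliTruncated(0.0019), toMilliRounded(0.0019)) // prints "1 2"
}
```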


@ -0,0 +1,140 @@
package collector
import (
"testing"
"time"
"github.com/stretchr/testify/require"
"github.com/zalando-incubator/kube-metrics-adapter/pkg/zmon"
autoscalingv2beta1 "k8s.io/api/autoscaling/v2beta1"
"k8s.io/apimachinery/pkg/api/resource"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/metrics/pkg/apis/external_metrics"
)
type zmonMock struct {
dataPoints []zmon.DataPoint
entities []zmon.Entity
}
func (m zmonMock) Query(checkID int, key string, tags map[string]string, aggregators []string, duration time.Duration) ([]zmon.DataPoint, error) {
return m.dataPoints, nil
}
func TestZMONCollectorNewCollector(t *testing.T) {
collectPlugin, _ := NewZMONCollectorPlugin(zmonMock{})
config := &MetricConfig{
MetricTypeName: MetricTypeName{
Name: ZMONCheckMetric,
},
Labels: map[string]string{
zmonCheckIDLabelKey: "1234",
zmonAggregatorsLabelKey: "max",
zmonTagPrefixLabelKey + "alias": "cluster_alias",
zmonDurationLabelKey: "5m",
zmonKeyLabelKey: "key",
},
}
hpa := &autoscalingv2beta1.HorizontalPodAutoscaler{}
collector, err := collectPlugin.NewCollector(hpa, config, 1*time.Second)
require.NoError(t, err)
require.NotNil(t, collector)
zmonCollector := collector.(*ZMONCollector)
require.Equal(t, "key", zmonCollector.key)
require.Equal(t, 1234, zmonCollector.checkID)
require.Equal(t, 1*time.Second, zmonCollector.interval)
require.Equal(t, 5*time.Minute, zmonCollector.duration)
require.Equal(t, []string{"max"}, zmonCollector.aggregators)
require.Equal(t, map[string]string{"alias": "cluster_alias"}, zmonCollector.tags)
// check that annotations overwrite labels
hpa.ObjectMeta = metav1.ObjectMeta{
Annotations: map[string]string{
zmonKeyAnnotationKey: "annotation_key",
zmonTagPrefixAnnotationKey + "alias": "cluster_alias_annotation",
},
}
collector, err = collectPlugin.NewCollector(hpa, config, 1*time.Second)
require.NoError(t, err)
require.NotNil(t, collector)
zmonCollector = collector.(*ZMONCollector)
require.Equal(t, "annotation_key", zmonCollector.key)
require.Equal(t, map[string]string{"alias": "cluster_alias_annotation"}, zmonCollector.tags)
// should fail if the metric name isn't ZMON
config.Name = "non-zmon-check"
_, err = collectPlugin.NewCollector(nil, config, 1*time.Second)
require.Error(t, err)
// should fail if the check id is not specified.
delete(config.Labels, zmonCheckIDLabelKey)
config.Name = ZMONCheckMetric
_, err = collectPlugin.NewCollector(nil, config, 1*time.Second)
require.Error(t, err)
}
func TestZMONCollectorGetMetrics(tt *testing.T) {
config := &MetricConfig{
MetricTypeName: MetricTypeName{
Name: ZMONCheckMetric,
Type: "foo",
},
Labels: map[string]string{
zmonCheckIDLabelKey: "1234",
zmonAggregatorsLabelKey: "max",
zmonTagPrefixLabelKey + "alias": "cluster_alias",
zmonDurationLabelKey: "5m",
zmonKeyLabelKey: "key",
},
}
for _, ti := range []struct {
msg string
dataPoints []zmon.DataPoint
collectedMetrics []CollectedMetric
}{
{
msg: "test successfully getting metrics",
dataPoints: []zmon.DataPoint{
{
Time: time.Time{},
Value: 1.0,
},
},
collectedMetrics: []CollectedMetric{
{
Type: config.Type,
External: external_metrics.ExternalMetricValue{
MetricName: config.Name,
MetricLabels: config.Labels,
Timestamp: metav1.Time{Time: time.Time{}},
Value: *resource.NewMilliQuantity(int64(1.0)*1000, resource.DecimalSI),
},
},
},
},
{
msg: "test not getting any metrics",
},
} {
tt.Run(ti.msg, func(t *testing.T) {
z := zmonMock{
dataPoints: ti.dataPoints,
}
zmonCollector, err := NewZMONCollector(z, config, nil, 1*time.Second)
require.NoError(t, err)
metrics, _ := zmonCollector.GetMetrics()
require.Equal(t, ti.collectedMetrics, metrics)
})
}
}
func TestZMONCollectorInterval(t *testing.T) {
collector := ZMONCollector{interval: 1 * time.Second}
require.Equal(t, 1*time.Second, collector.Interval())
}


@ -19,6 +19,7 @@ package server
import (
"context"
"fmt"
"net"
"net/http"
"time"
@ -28,8 +29,11 @@ import (
"github.com/kubernetes-incubator/custom-metrics-apiserver/pkg/cmd/server"
"github.com/prometheus/client_golang/prometheus/promhttp"
"github.com/spf13/cobra"
"github.com/zalando-incubator/cluster-lifecycle-manager/pkg/credentials-loader/platformiam"
"github.com/zalando-incubator/kube-metrics-adapter/pkg/collector"
"github.com/zalando-incubator/kube-metrics-adapter/pkg/provider"
"github.com/zalando-incubator/kube-metrics-adapter/pkg/zmon"
"golang.org/x/oauth2"
"k8s.io/client-go/informers"
"k8s.io/client-go/kubernetes"
"k8s.io/client-go/rest"
@ -48,6 +52,8 @@ func NewCommandStartAdapterServer(stopCh <-chan struct{}) *cobra.Command {
EnableCustomMetricsAPI: true,
EnableExternalMetricsAPI: true,
MetricsAddress: ":7979",
ZMONTokenName: "zmon",
CredentialsDir: "/meta/credentials",
}
cmd := &cobra.Command{
@ -82,6 +88,14 @@ func NewCommandStartAdapterServer(stopCh <-chan struct{}) *cobra.Command {
"whether to enable External Metrics API")
flags.StringVar(&o.PrometheusServer, "prometheus-server", o.PrometheusServer, ""+
"url of prometheus server to query")
flags.StringVar(&o.ZMONKariosDBEndpoint, "zmon-kariosdb-endpoint", o.ZMONKariosDBEndpoint, ""+
"url of ZMON KairosDB endpoint to query for ZMON checks")
flags.StringVar(&o.ZMONTokenName, "zmon-token-name", o.ZMONTokenName, ""+
"name of the token used to query ZMON")
flags.StringVar(&o.Token, "token", o.Token, ""+
"static oauth2 token to use when calling external services like ZMON")
flags.StringVar(&o.CredentialsDir, "credentials-dir", o.CredentialsDir, ""+
"path to the credentials dir where tokens are stored")
flags.BoolVar(&o.SkipperIngressMetrics, "skipper-ingress-metrics", o.SkipperIngressMetrics, ""+
"whether to enable skipper ingress metrics")
flags.BoolVar(&o.AWSExternalMetrics, "aws-external-metrics", o.AWSExternalMetrics, ""+
@ -116,6 +130,13 @@ func (o AdapterServerOptions) RunCustomMetricsAdapterServer(stopCh <-chan struct
return fmt.Errorf("unable to construct lister client config to initialize provider: %v", err)
}
// convert stop channel to a context
ctx, cancel := context.WithCancel(context.Background())
go func() {
<-stopCh
cancel()
}()
clientConfig.Timeout = defaultClientGOTimeout
client, err := kubernetes.NewForConfig(clientConfig)
@ -153,7 +174,28 @@ func (o AdapterServerOptions) RunCustomMetricsAdapterServer(stopCh <-chan struct
// register generic pod collector
err = collectorFactory.RegisterPodsCollector("", collector.NewPodCollectorPlugin(client))
if err != nil {
return fmt.Errorf("failed to register skipper collector plugin: %v", err)
return fmt.Errorf("failed to register pod collector plugin: %v", err)
}
// enable ZMON based metrics
if o.ZMONKariosDBEndpoint != "" {
var tokenSource oauth2.TokenSource
if o.Token != "" {
tokenSource = oauth2.StaticTokenSource(&oauth2.Token{AccessToken: o.Token})
} else {
tokenSource = platformiam.NewTokenSource(o.ZMONTokenName, o.CredentialsDir)
}
httpClient := newOauth2HTTPClient(ctx, tokenSource)
zmonClient := zmon.NewZMONClient(o.ZMONKariosDBEndpoint, httpClient)
zmonPlugin, err := collector.NewZMONCollectorPlugin(zmonClient)
if err != nil {
return fmt.Errorf("failed to initialize ZMON collector plugin: %v", err)
}
collectorFactory.RegisterExternalCollector([]string{collector.ZMONCheckMetric}, zmonPlugin)
}
awsSessions := make(map[string]*session.Session, len(o.AWSRegions))
@ -170,13 +212,6 @@ func (o AdapterServerOptions) RunCustomMetricsAdapterServer(stopCh <-chan struct
hpaProvider := provider.NewHPAProvider(client, 30*time.Second, 1*time.Minute, collectorFactory)
// convert stop channel to a context
ctx, cancel := context.WithCancel(context.Background())
go func() {
<-stopCh
cancel()
}()
go hpaProvider.Run(ctx)
customMetricsProvider := hpaProvider
@ -200,6 +235,45 @@ func (o AdapterServerOptions) RunCustomMetricsAdapterServer(stopCh <-chan struct
return server.GenericAPIServer.PrepareRun().Run(ctx.Done())
}
// newOauth2HTTPClient creates an HTTP client with automatic oauth2
// token injection. Additionally it will spawn a goroutine for closing idle
// connections every 20 seconds on the http.Transport. This solves the problem
// of re-resolving DNS when the endpoint backend changes.
// https://github.com/golang/go/issues/23427
func newOauth2HTTPClient(ctx context.Context, tokenSource oauth2.TokenSource) *http.Client {
transport := &http.Transport{
DialContext: (&net.Dialer{
Timeout: 30 * time.Second,
KeepAlive: 30 * time.Second,
}).DialContext,
TLSHandshakeTimeout: 10 * time.Second,
ResponseHeaderTimeout: 10 * time.Second,
IdleConnTimeout: 20 * time.Second,
MaxIdleConns: 10,
MaxIdleConnsPerHost: 2,
}
go func(transport *http.Transport, duration time.Duration) {
for {
select {
case <-time.After(duration):
transport.CloseIdleConnections()
case <-ctx.Done():
return
}
}
}(transport, 20*time.Second)
client := &http.Client{
Transport: transport,
}
// add HTTP client to context (this is how the oauth2 lib gets it).
ctx = context.WithValue(ctx, oauth2.HTTPClient, client)
// instantiate an http.Client containing the token source.
return oauth2.NewClient(ctx, tokenSource)
}
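The idle-connection loop above can also be written with a `time.Ticker`, which avoids allocating a new timer on every iteration. The following is a standalone sketch of that variant, not the code from this commit: the names `closeIdleConnections` and `demo` are hypothetical, and the returned `done` channel exists only so the example can show that the goroutine exits on cancellation.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"time"
)

// closeIdleConnections periodically drops idle connections on the transport
// until ctx is cancelled. The returned channel is closed when the loop exits.
func closeIdleConnections(ctx context.Context, transport *http.Transport, every time.Duration) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		ticker := time.NewTicker(every)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				transport.CloseIdleConnections()
			case <-ctx.Done():
				return
			}
		}
	}()
	return done
}

// demo starts the loop, cancels the context and reports whether the
// goroutine stopped within one second.
func demo() bool {
	ctx, cancel := context.WithCancel(context.Background())
	transport := &http.Transport{IdleConnTimeout: 20 * time.Second}
	done := closeIdleConnections(ctx, transport, 20*time.Second)
	cancel()
	select {
	case <-done:
		return true
	case <-time.After(time.Second):
		return false
	}
}

func main() {
	fmt.Println("stopped:", demo())
}
```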
type AdapterServerOptions struct {
*server.CustomMetricsAdapterServerOptions
@ -210,8 +284,18 @@ type AdapterServerOptions struct {
// EnableExternalMetricsAPI switches on sample apiserver for External Metrics API
EnableExternalMetricsAPI bool
// PrometheusServer enables prometheus queries to the specified
// server.
// server
PrometheusServer string
// ZMONKariosDBEndpoint enables ZMON check queries to the specified
// KairosDB endpoint
ZMONKariosDBEndpoint string
// ZMONTokenName is the name of the token used to query ZMON
ZMONTokenName string
// Token is an oauth2 token used to authenticate with services like
// ZMON.
Token string
// CredentialsDir is the path to the dir where tokens are stored
CredentialsDir string
// SkipperIngressMetrics switches on support for skipper ingress based
// metric collection.
SkipperIngressMetrics bool

pkg/zmon/zmon.go

@ -0,0 +1,269 @@
package zmon
import (
"bytes"
"encoding/json"
"fmt"
"io/ioutil"
"net/http"
"net/url"
"time"
)
var (
// set of valid aggregators that can be used in queries
// https://kairosdb.github.io/docs/build/html/restapi/Aggregators.html
validAggregators = map[string]struct{}{
"avg": struct{}{},
"dev": struct{}{},
"count": struct{}{},
"first": struct{}{},
"last": struct{}{},
"max": struct{}{},
"min": struct{}{},
"sum": struct{}{},
"diff": struct{}{},
}
)
// Entity defines a ZMON entity.
type Entity struct {
ID string `json:"id"`
}
// ZMON defines an interface for talking to the ZMON API.
type ZMON interface {
Query(checkID int, key string, tags map[string]string, aggregators []string, duration time.Duration) ([]DataPoint, error)
}
// Client defines a client for interfacing with the ZMON API.
type Client struct {
dataServiceEndpoint string
http *http.Client
}
// NewZMONClient initializes a new ZMON Client.
func NewZMONClient(dataServiceEndpoint string, client *http.Client) *Client {
return &Client{
dataServiceEndpoint: dataServiceEndpoint,
http: client,
}
}
// DataPoint defines a single datapoint returned from a query.
type DataPoint struct {
Time time.Time
Value float64
}
type metricQuery struct {
StartRelative sampling `json:"start_relative"`
Metrics []metric `json:"metrics"`
}
type sampling struct {
Value int64 `json:"value"`
Unit string `json:"unit"`
}
type metric struct {
Name string `json:"name"`
Limit int `json:"limit"`
Tags map[string][]string `json:"tags"`
GroupBy []tagGroup `json:"group_by"`
Aggregators []aggregator `json:"aggregators"`
}
type tagGroup struct {
Name string `json:"name"`
Tags []string `json:"tags"`
}
type aggregator struct {
Name string `json:"name"`
Sampling sampling `json:"sampling"`
}
type queryResp struct {
Queries []struct {
Results []struct {
Values [][]float64 `json:"values"`
} `json:"results"`
} `json:"queries"`
}
// Query queries the ZMON KairosDB endpoint and returns the resulting list of
// data points for the query.
//
// https://kairosdb.github.io/docs/build/html/restapi/QueryMetrics.html
func (c *Client) Query(checkID int, key string, tags map[string]string, aggregators []string, duration time.Duration) ([]DataPoint, error) {
endpoint, err := url.Parse(c.dataServiceEndpoint)
if err != nil {
return nil, err
}
// convert tags map
tagsSlice := make(map[string][]string, len(tags))
for k, v := range tags {
tagsSlice[k] = []string{v}
}
query := metricQuery{
StartRelative: durationToSampling(duration),
Metrics: []metric{
{
Name: fmt.Sprintf("zmon.check.%d", checkID),
Limit: 10000, // maximum limit of ZMON
Tags: tagsSlice,
GroupBy: []tagGroup{
{
Name: "tag",
Tags: []string{
"key",
},
},
},
Aggregators: make([]aggregator, 0, len(aggregators)),
},
},
}
// add aggregators
for _, aggregatorName := range aggregators {
if _, ok := validAggregators[aggregatorName]; !ok {
return nil, fmt.Errorf("invalid aggregator '%s'", aggregatorName)
}
query.Metrics[0].Aggregators = append(query.Metrics[0].Aggregators, aggregator{
Name: aggregatorName,
Sampling: durationToSampling(duration),
})
}
// add key to query if defined
if key != "" {
query.Metrics[0].Tags["key"] = []string{key}
}
body, err := json.Marshal(&query)
if err != nil {
return nil, err
}
endpoint.Path += "/api/v1/datapoints/query"
req, err := http.NewRequest(http.MethodPost, endpoint.String(), bytes.NewBuffer(body))
if err != nil {
return nil, err
}
req.Header.Set("Content-Type", "application/json")
req.Header.Set("Accept", "application/json")
resp, err := c.http.Do(req)
if err != nil {
return nil, err
}
defer resp.Body.Close()
d, err := ioutil.ReadAll(resp.Body)
if err != nil {
return nil, err
}
if resp.StatusCode != http.StatusOK {
return nil, fmt.Errorf("[kariosdb query] unexpected response code: %d", resp.StatusCode)
}
var result queryResp
err = json.Unmarshal(d, &result)
if err != nil {
return nil, err
}
if len(result.Queries) < 1 {
return nil, nil
}
if len(result.Queries[0].Results) < 1 {
return nil, nil
}
dataPoints := make([]DataPoint, 0, len(result.Queries[0].Results[0].Values))
for _, value := range result.Queries[0].Results[0].Values {
if len(value) != 2 {
return nil, fmt.Errorf("[kariosdb query] unexpected response data")
}
point := DataPoint{
// KairosDB timestamps are milliseconds since the Unix epoch.
Time: time.Unix(0, int64(value[0])*int64(time.Millisecond)),
Value: value[1],
}
dataPoints = append(dataPoints, point)
}
return dataPoints, nil
}
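// Calendar units used for sampling. month and year are fixed-length
// approximations (30 and 365 days), not calendar-aware values.
const (
day = 24 * time.Hour
week = day * 7
month = day * 30
year = day * 365
)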
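// durationToSampling converts a time.Duration to the sampling format expected
// by KairosDB. The largest unit that fits into the duration at least once is
// used, and the value is rounded to the nearest whole number of that unit.
// E.g. the duration `1 * time.Hour` would be converted to:
//   sampling{
//     Unit: "hours",
//     Value: 1,
//   }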
func durationToSampling(d time.Duration) sampling {
for _, u := range []struct {
Unit string
Nanoseconds time.Duration
}{
{
Unit: "years",
Nanoseconds: year,
},
{
Unit: "months",
Nanoseconds: month,
},
{
Unit: "weeks",
Nanoseconds: week,
},
{
Unit: "days",
Nanoseconds: day,
},
{
Unit: "hours",
Nanoseconds: 1 * time.Hour,
},
{
Unit: "minutes",
Nanoseconds: 1 * time.Minute,
},
{
Unit: "seconds",
Nanoseconds: 1 * time.Second,
},
{
Unit: "milliseconds",
Nanoseconds: 1 * time.Millisecond,
},
} {
if d.Nanoseconds()/int64(u.Nanoseconds) >= 1 {
return sampling{
Unit: u.Unit,
Value: int64(d.Round(u.Nanoseconds) / u.Nanoseconds),
}
}
}
return sampling{
Unit: "milliseconds",
Value: 0,
}
}
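
A note on rounding in `durationToSampling`: the loop picks the largest unit that fits at least once and then calls `Duration.Round`, so durations that are not whole multiples of that unit are rounded to the nearest unit, with halfway values rounding away from zero. The standalone sketch below copies that logic into a hypothetical `toSampling` helper (the helper name and the `unitValue` type are illustrative and not part of the package) to make the effect visible for values such as 90 minutes or 45 days.

```go
package main

import (
	"fmt"
	"time"
)

// unitValue mirrors the sampling struct from the diff; the name is illustrative.
type unitValue struct {
	Unit  string
	Value int64
}

const (
	day   = 24 * time.Hour
	week  = 7 * day
	month = 30 * day  // fixed-length approximation, as in the source
	year  = 365 * day // fixed-length approximation, as in the source
)

// toSampling is a standalone copy of the durationToSampling logic:
// pick the largest unit that fits at least once, then round to that unit.
func toSampling(d time.Duration) unitValue {
	units := []struct {
		name string
		size time.Duration
	}{
		{"years", year},
		{"months", month},
		{"weeks", week},
		{"days", day},
		{"hours", time.Hour},
		{"minutes", time.Minute},
		{"seconds", time.Second},
		{"milliseconds", time.Millisecond},
	}
	for _, u := range units {
		if d/u.size >= 1 {
			return unitValue{Unit: u.name, Value: int64(d.Round(u.size) / u.size)}
		}
	}
	// Anything below one millisecond collapses to zero milliseconds.
	return unitValue{Unit: "milliseconds", Value: 0}
}

func main() {
	for _, d := range []time.Duration{
		time.Hour,
		80 * time.Minute,
		90 * time.Minute,
		45 * day,
		time.Nanosecond,
	} {
		s := toSampling(d)
		fmt.Printf("%v -> %d %s\n", d, s.Value, s.Unit)
	}
}
```

If whole-unit rounding is not wanted for a given check, one option is to pass durations that are exact multiples of the intended unit.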

184
pkg/zmon/zmon_test.go Normal file

@ -0,0 +1,184 @@
package zmon
import (
"fmt"
"net/http"
"net/http/httptest"
"testing"
"time"
"github.com/stretchr/testify/assert"
)
func TestQuery(tt *testing.T) {
client := &http.Client{}
for _, ti := range []struct {
msg string
duration time.Duration
aggregators []string
status int
body string
err error
dataPoints []DataPoint
key string
}{
{
msg: "test getting back a single data point",
duration: 1 * time.Hour,
status: http.StatusOK,
body: `{
"queries": [
{
"results": [
{
"values": [
[1539710395000,765952]
]
}
]
}
]
}`,
dataPoints: []DataPoint{
{
Time: time.Unix(1539710395, 0),
Value: 765952,
},
},
},
{
msg: "test getting back a single datapoint with key",
duration: 1 * time.Hour,
status: http.StatusOK,
key: "my-key",
body: `{
"queries": [
{
"results": [
{
"values": [
[1539710395000,765952]
]
}
]
}
]
}`,
dataPoints: []DataPoint{
{
Time: time.Unix(1539710395, 0),
Value: 765952,
},
},
},
{
msg: "test getting back a single datapoint with aggregators",
duration: 1 * time.Hour,
status: http.StatusOK,
aggregators: []string{"max"},
body: `{
"queries": [
{
"results": [
{
"values": [
[1539710395000,765952]
]
}
]
}
]
}`,
dataPoints: []DataPoint{
{
Time: time.Unix(1539710395, 0),
Value: 765952,
},
},
},
{
msg: "test query with invalid aggregator",
aggregators: []string{"invalid"},
err: fmt.Errorf("invalid aggregator 'invalid'"),
},
{
msg: "test query with unexpected response status",
status: http.StatusInternalServerError,
body: `{"error": 500}`,
err: fmt.Errorf("[kariosdb query] unexpected response code: 500"),
},
{
msg: "test getting invalid values response",
duration: 1 * time.Hour,
status: http.StatusOK,
body: `{
"queries": [
{
"results": [
{
"values": [
[1539710395000,765952,1]
]
}
]
}
]
}`,
err: fmt.Errorf("[kariosdb query] unexpected response data"),
},
} {
tt.Run(ti.msg, func(t *testing.T) {
ts := httptest.NewServer(http.HandlerFunc(
func(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(ti.status)
w.Write([]byte(ti.body))
}),
)
defer ts.Close()
zmonClient := NewZMONClient(ts.URL, client)
dataPoints, err := zmonClient.Query(1, ti.key, nil, ti.aggregators, ti.duration)
assert.Equal(t, ti.err, err)
assert.Len(t, dataPoints, len(ti.dataPoints))
assert.Equal(t, ti.dataPoints, dataPoints)
})
}
}
func TestDurationToSampling(tt *testing.T) {
for _, ti := range []struct {
msg string
duration time.Duration
sampling sampling
}{
{
msg: "1 hour should map to hours sampling",
duration: 1 * time.Hour,
sampling: sampling{
Unit: "hours",
Value: 1,
},
},
{
msg: "2 years should map to years sampling",
duration: 2 * day * 365,
sampling: sampling{
Unit: "years",
Value: 2,
},
},
{
msg: "1 nanosecond should map to 0 milliseconds sampling",
duration: 1 * time.Nanosecond,
sampling: sampling{
Unit: "milliseconds",
Value: 0,
},
},
} {
tt.Run(ti.msg, func(t *testing.T) {
assert.Equal(t, ti.sampling, durationToSampling(ti.duration))
})
}
}
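
The tests above cover response handling only; the JSON body that `Query` posts to KairosDB is never inspected. As a hedged sketch of how that could be checked, the standalone program below defines a handler that decodes the posted query and exposes the metric name and `key` tag for assertions. The `payload` type and the `recordingHandler` and `capture` helpers are hypothetical names that mirror only a subset of the `metricQuery` fields. Here the handler is driven directly through `httptest.NewRecorder` to keep the sketch self-contained; in `TestQuery` it would presumably take the place of the handler passed to `httptest.NewServer`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// payload mirrors a subset of the metricQuery JSON shape from the diff.
// It is an illustrative copy, not the package's own type.
type payload struct {
	StartRelative struct {
		Value int64  `json:"value"`
		Unit  string `json:"unit"`
	} `json:"start_relative"`
	Metrics []struct {
		Name string              `json:"name"`
		Tags map[string][]string `json:"tags"`
	} `json:"metrics"`
}

// recordingHandler returns a handler that decodes the posted query into got
// and answers with an empty KairosDB-style result.
func recordingHandler(got *payload) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if err := json.NewDecoder(r.Body).Decode(got); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		w.Write([]byte(`{"queries":[]}`))
	}
}

// capture posts body to the handler and returns the decoded payload together
// with the HTTP status code the handler produced.
func capture(body string) (payload, int) {
	var got payload
	req := httptest.NewRequest(http.MethodPost, "/api/v1/datapoints/query", strings.NewReader(body))
	rec := httptest.NewRecorder()
	recordingHandler(&got).ServeHTTP(rec, req)
	return got, rec.Code
}

func main() {
	body := `{"start_relative":{"value":1,"unit":"hours"},` +
		`"metrics":[{"name":"zmon.check.1234","tags":{"key":["my-key"]}}]}`
	got, code := capture(body)
	fmt.Println(code, got.Metrics[0].Name, got.Metrics[0].Tags["key"], got.StartRelative.Unit)
}
```

With a handler like this, a table entry such as the `my-key` case could additionally assert that the `key` tag and the `zmon.check.<id>` metric name actually reach the server.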