
Prometheus Operator + Blackbox exporter

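1. Introduction

The earlier post "Kubernetes Operator" covered the basics of Operators, and "Prometheus Blackbox exporter" covered installing and configuring the Blackbox exporter.

Prometheus Operator, as the name suggests, is the custom controller that automates the management of Prometheus in Kubernetes. For more details, see coreos/prometheus-operator.

This post looks at one question: how do we use Prometheus Operator to deploy Prometheus in a Kubernetes cluster and then add the Blackbox exporter component?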

2. Installing Prometheus Operator

Install Prometheus Operator by following the post "A First Look at Prometheus Operator" and the coreos/kube-prometheus repository.

1) Add flags to the kubelet configuration
vim /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
Add:

--authentication-token-webhook=true
--authorization-mode=Webhook

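2) Fetch the source and switch to the release that matches your cluster (the compatibility matrix for Kubernetes versions is in the GitHub repository)

git clone https://github.com/coreos/kube-prometheus.git
cd kube-prometheus
# check the cluster version, then pick a compatible release branch
kubectl version
git branch -a
git checkout origin/release-0.4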

3) Install Prometheus Operator

# Create the namespace and CRDs, and then wait for them to be available before creating the remaining resources
kubectl create -f manifests/setup
until kubectl get servicemonitors --all-namespaces ; do date; sleep 1; echo ""; done
kubectl create -f manifests/
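
The `until` loop above retries until the ServiceMonitor CRD is registered, and it has no upper bound if the setup manifests failed. As a rough sketch of the same polling pattern with a timeout, the Python below accepts any zero-argument check function. `wait_until` and `crd_ready` are hypothetical names introduced here, and the `kubectl` call simply mirrors the one in the loop above.

```python
import subprocess
import time


def wait_until(check, interval=1.0, timeout=120.0, sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns True or the timeout elapses.

    Returns True on success and False if the deadline passed first.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def crd_ready():
    """Hypothetical check: same command as the shell loop; exit code 0 means ready."""
    result = subprocess.run(
        ["kubectl", "get", "servicemonitors", "--all-namespaces"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

Calling `wait_until(crd_ready)` before `kubectl create -f manifests/` would keep the same ordering as the shell loop while bounding the wait.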

4) Check the installation

kubectl get crd | grep coreos
kubectl get pod -n monitoring
kubectl get svc -n monitoring

That completes the installation of Prometheus Operator, and Prometheus itself is installed along with it.

PS: to uninstall Prometheus Operator

kubectl delete --ignore-not-found=true -f manifests/ -f manifests/setup

3. Installing Blackbox exporter

1) Create the YAML file blackbox-exporter.yaml

apiVersion: v1
data:
  config.yml: |
    modules:
      http_2xx:
        prober: http
        http:
          method: GET
          preferred_ip_protocol: "ip4"
      http_post_2xx:
        prober: http
        http:
          method: POST
          preferred_ip_protocol: "ip4"
      tcp:
        prober: tcp
      ping:
        prober: icmp
        timeout: 3s
        icmp:
          preferred_ip_protocol: "ip4"
      dns_k8s:
        prober: dns
        timeout: 5s
        dns:
          transport_protocol: "tcp"
          preferred_ip_protocol: "ip4"
          query_name: "kubernetes.default.svc.cluster.local"
          query_type: "A"
kind: ConfigMap
metadata:
  name: blackbox-exporter
  namespace: monitoring
---
apiVersion: apps/v1
kind: Deployment
metadata:
  creationTimestamp: null
  labels:
    name: blackbox-exporter
    cluster: ali-huabei2-dev
  name: blackbox-exporter
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      name: blackbox-exporter
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        name: blackbox-exporter
        cluster: ali-huabei2-dev
    spec:
      containers:
      - image: prom/blackbox-exporter:v0.16.0
        name: blackbox-exporter
        ports:
        - containerPort: 9115
        volumeMounts:
        - name: config
          mountPath: /etc/blackbox_exporter
        args:
        - --config.file=/etc/blackbox_exporter/config.yml
        - --log.level=info
      volumes:
      - name: config
        configMap:
          name: blackbox-exporter
---
apiVersion: v1
kind: Service
metadata:
  #annotations:
  #  service.beta.kubernetes.io/alicloud-loadbalancer-address-type: intranet
  labels:
    name: blackbox-exporter
    cluster: ali-huabei2-dev
  name: blackbox-exporter
  namespace: monitoring
spec:
  #externalTrafficPolicy: Local
  selector:
    name: blackbox-exporter
  ports:
  - name: http-metrics
    port: 9115
    targetPort: 9115
  type: LoadBalancer

2) Apply the YAML file

kubectl apply -f blackbox-exporter.yaml
kubectl get svc -n monitoring
kubectl get deploy -n monitoring

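4. Configuring Prometheus to use Blackbox exporter (the wrong way)

In a standalone Prometheus, using Blackbox exporter is simple: add the corresponding fields under scrape_configs. With Prometheus running in Kubernetes, however, the configuration works somewhat differently.

1) Fetch the prometheus.yml configuration

kubectl get secrets -n monitoring prometheus-k8s -oyaml | grep prometheus.yaml.gz | awk '{print $2}' | base64 --decode | gzip -d > prometheus.yml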
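
The pipeline above works because the Secret stores the configuration gzip-compressed under the key prometheus.yaml.gz, and `kubectl ... -oyaml` shows Secret data base64-encoded. For readers who prefer to script it, here is a minimal Python sketch of that encoding. The function names are illustrative, and the sample text is a stand-in rather than real cluster output.

```python
import base64
import gzip


def decode_config(value: str) -> str:
    """Reverse of `base64 --decode | gzip -d`: Secret value -> plain YAML text."""
    return gzip.decompress(base64.b64decode(value)).decode("utf-8")


def encode_config(text: str) -> str:
    """Reverse direction, like `gzip | base64 | tr -d "\\n"`: YAML text -> Secret value."""
    return base64.b64encode(gzip.compress(text.encode("utf-8"))).decode("ascii")


# Stand-in content, used only to show that the two functions round-trip.
sample = "global:\n  scrape_interval: 30s\n"
encoded = encode_config(sample)
restored = decode_config(encoded)
```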

2) Inspect prometheus.yml. An excerpt:

global:
  evaluation_interval: 30s
  scrape_interval: 30s
  external_labels:
    prometheus: monitoring/k8s
    prometheus_replica: $(POD_NAME)
rule_files:
- /etc/prometheus/rules/prometheus-k8s-rulefiles-0/*.yaml
scrape_configs:
- job_name: monitoring/node-exporter/0
  honor_labels: false
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - monitoring
  scrape_interval: 15s
  scheme: https
  tls_config:
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - action: keep
    source_labels:
    - __meta_kubernetes_service_label_k8s_app
    regex: node-exporter
  - action: keep
    source_labels:
    - __meta_kubernetes_endpoint_port_name
    regex: https
  - source_labels:
    - __meta_kubernetes_endpoint_address_target_kind
    - __meta_kubernetes_endpoint_address_target_name
    separator: ;
    regex: Node;(.*)
    replacement: ${1}
    target_label: node
  - source_labels:
    - __meta_kubernetes_endpoint_address_target_kind
    - __meta_kubernetes_endpoint_address_target_name
    separator: ;
    regex: Pod;(.*)
    replacement: ${1}
    target_label: pod
  - source_labels:
    - __meta_kubernetes_namespace
    target_label: namespace
  - source_labels:
    - __meta_kubernetes_service_name
    target_label: service
  - source_labels:
    - __meta_kubernetes_pod_name
    target_label: pod
  - source_labels:
    - __meta_kubernetes_service_name
    target_label: job
    replacement: ${1}
  - source_labels:
    - __meta_kubernetes_service_label_k8s_app
    target_label: job
    regex: (.+)
    replacement: ${1}
  - target_label: endpoint
    replacement: https
  - source_labels:
    - __meta_kubernetes_pod_node_name
    target_label: instance
    regex: (.*)
    replacement: $1
    action: replace
  - source_labels:
    - __meta_kubernetes_service_label_cluster
    target_label: cluster
    regex: (.*)
    replacement: $1
    action: replace

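Here, job_name names the scrape job, kubernetes_sd_configs configures Kubernetes service discovery, and relabel_configs rewrites the labels that end up on each target. In a relabel rule, source_labels lists the existing labels whose values are read (joined with separator), regex is matched against that joined value, replacement is the value to write, and target_label is the label that receives it. $1 refers to the first capture group of regex.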
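
To make the relabeling rules above concrete, here is a simplified Python model of the `replace` action only. It is a sketch under stated assumptions, not Prometheus code: `relabel_replace` is a hypothetical helper, Python's `re` stands in for the RE2 engine Prometheus uses, and the node name `worker-1` is made up for illustration.

```python
import re


def relabel_replace(labels, source_labels, target_label,
                    regex="(.*)", replacement="$1", separator=";"):
    """Simplified model of a relabel rule with action: replace."""
    # Join the source label values; a missing label counts as an empty string.
    value = separator.join(labels.get(name, "") for name in source_labels)
    # The regex has to match the whole joined value.
    match = re.fullmatch(regex, value)
    if match is None:
        return dict(labels)  # no match: labels stay unchanged
    # Translate $1 / ${1} into Python's \g<1> group syntax.
    template = re.sub(
        r"\$(?:\{(\w+)\}|(\w+))",
        lambda m: r"\g<%s>" % (m.group(1) or m.group(2)),
        replacement,
    )
    new_value = match.expand(template)
    result = dict(labels)
    if new_value:
        result[target_label] = new_value
    else:
        result.pop(target_label, None)  # an empty value removes the label
    return result


# Hypothetical discovered labels for one node-exporter endpoint.
discovered = {
    "__meta_kubernetes_endpoint_address_target_kind": "Node",
    "__meta_kubernetes_endpoint_address_target_name": "worker-1",
}
sources = [
    "__meta_kubernetes_endpoint_address_target_kind",
    "__meta_kubernetes_endpoint_address_target_name",
]
with_node = relabel_replace(discovered, sources, "node", regex="Node;(.*)", replacement="${1}")
with_pod = relabel_replace(discovered, sources, "pod", regex="Pod;(.*)", replacement="${1}")
```

In this model the `Node;(.*)` rule sets `node` from the joined value `Node;worker-1`, while the `Pod;(.*)` rule does not match and leaves the labels untouched. This is why the excerpt can carry both rules side by side.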

3) Add the Blackbox exporter configuration

- job_name: monitoring/blackbox-exporter/0
  honor_labels: false
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - monitoring
  scrape_interval: 15s
  scheme: http
  tls_config:
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - action: keep
    source_labels:
    - __meta_kubernetes_service_label_name
    regex: blackbox-exporter
  - source_labels:
    - __meta_kubernetes_service_label_name
    target_label: job
    regex: (.+)
    replacement: ${1}
  - source_labels:
    - __meta_kubernetes_service_label_cluster
    target_label: cluster
    regex: (.*)
    replacement: $1
    action: replace

4) Apply the new configuration

# 1. compress and base64-encode prometheus.yml
cat prometheus.yml | gzip -f | base64 | tr -d "\n"
# 2. copy string
# 3. edit secret
kubectl edit secrets -n monitoring prometheus-k8s
# 4. replace prometheus.yaml.gz
# 5. get the latest config
kubectl get secrets -n monitoring prometheus-k8s -oyaml | grep prometheus.yaml.gz | awk '{print $2}' | base64 --decode | gzip -d | grep blackbox

However, the configuration contains no blackbox entry: nothing changed. This shows that the Prometheus configuration in this Secret is generated automatically by the Operator, so manual edits do not take effect.

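5. Configuring Prometheus to use Blackbox exporter (the right way)

With Prometheus Operator, scrape targets are configured through ServiceMonitor resources, which the Operator discovers dynamically.

1) Create the ServiceMonitor YAML file, blackbox-exporter-sm.yaml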

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    name: blackbox-exporter
    release: p
  name: blackbox-exporter
  namespace: monitoring
spec:
  namespaceSelector:
    matchNames:
    - monitoring
  selector:
    matchLabels:
      name: blackbox-exporter
  endpoints:
  - interval: 15s
    port: http-metrics
    path: /probe
    relabelings:
    - action: replace
      regex: (.*)
      replacement: $1
      sourceLabels:
      - __meta_kubernetes_service_label_cluster
      targetLabel: cluster
    - action: replace
      regex: (.*)
      replacement: $1
      sourceLabels:
      - __param_module
      targetLabel: module
    - action: replace
      regex: (.*)
      replacement: $1
      sourceLabels:
      - __param_target
      targetLabel: target
    params:
      module:
      - http_2xx
      target:
      - http://prometheus.io # Target to probe with http.
      - https://prometheus.io # Target to probe with https.
      - http://example.com:8080 # Target to probe with http on port 8080.
  - interval: 15s
    port: http-metrics
    path: /probe
    relabelings:
    - action: replace
      regex: (.*)
      replacement: $1
      sourceLabels:
      - __meta_kubernetes_service_label_cluster
      targetLabel: cluster
    - action: replace
      regex: (.*)
      replacement: $1
      sourceLabels:
      - __param_module
      targetLabel: module
    - action: replace
      regex: (.*)
      replacement: $1
      sourceLabels:
      - __param_target
      targetLabel: target
    params:
      module:
      - dns_k8s
      target:
      - 172.31.16.10 # dns ip address

2) Apply it to the cluster
kubectl apply -f blackbox-exporter-sm.yaml
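
Whether a ServiceMonitor picks up a Service comes down to namespace, label, and port-name matching. The Python below is a rough model of that matching, using the values from the two manifests in this post. It assumes only the `matchNames` and `matchLabels` forms used here and ignores other selector forms. `matching_endpoints` is a hypothetical helper, not part of any Operator API.

```python
def matching_endpoints(monitor_spec, service):
    """Return the ServiceMonitor endpoints that would apply to the given Service."""
    namespaces = monitor_spec.get("namespaceSelector", {}).get("matchNames", [])
    if service["namespace"] not in namespaces:
        return []
    wanted = monitor_spec.get("selector", {}).get("matchLabels", {})
    # Every selector label must be present on the Service with the same value.
    if any(service["labels"].get(key) != value for key, value in wanted.items()):
        return []
    # An endpoint refers to a Service port by its name.
    port_names = {port["name"] for port in service["ports"]}
    return [ep for ep in monitor_spec.get("endpoints", []) if ep.get("port") in port_names]


# Values copied from blackbox-exporter.yaml and blackbox-exporter-sm.yaml (trimmed).
service = {
    "namespace": "monitoring",
    "labels": {"name": "blackbox-exporter", "cluster": "ali-huabei2-dev"},
    "ports": [{"name": "http-metrics", "port": 9115}],
}
monitor_spec = {
    "namespaceSelector": {"matchNames": ["monitoring"]},
    "selector": {"matchLabels": {"name": "blackbox-exporter"}},
    "endpoints": [
        {"port": "http-metrics", "path": "/probe", "params": {"module": ["http_2xx"]}},
        {"port": "http-metrics", "path": "/probe", "params": {"module": ["dns_k8s"]}},
    ],
}
selected = matching_endpoints(monitor_spec, service)
```

If the Service's port were unnamed or its `name` label differed, this model would return an empty list, which corresponds to no target appearing in Prometheus.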

3) Wait about a minute, then verify
Open the Prometheus graph page, where the blackbox-exporter metrics can be queried.

{job=~"blackbox-exporter",__name__!~"^go.*"}

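The results show that, for the http_2xx probe, only the first target listed under params took effect; the other two targets have no probe records at all. In this experiment, then, each endpoint effectively carries a single target, and any additional entries are ignored.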
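
A likely explanation, offered as an assumption rather than something this experiment verifies: all values listed under `params` are sent as repeated query parameters on a single /probe request, and a handler that reads one `target` value only sees the first. Prometheus documents that `__param_<name>` holds the first value of that URL parameter. The Python sketch below shows the effect with the standard library only.

```python
from urllib.parse import parse_qs, urlencode

# The params block of the first endpoint, as a mapping of name -> list of values.
params = {
    "module": ["http_2xx"],
    "target": [
        "http://prometheus.io",
        "https://prometheus.io",
        "http://example.com:8080",
    ],
}

# All values are encoded into one query string for one request.
query = urlencode(params, doseq=True)

# A receiver that asks for "the" target value reads only the first entry.
received = parse_qs(query)
first_target = received["target"][0]
```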
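To probe multiple sites, the simplest approach is to configure multiple endpoints. As for probing N sites with M probe types, if you know how to configure that, please leave a comment. Thanks!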
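
One workaround for the N sites by M probe types case is to generate the endpoint list rather than write it by hand, with one endpoint per (module, target) pair. The Python below is a sketch of that idea, not a native ServiceMonitor feature. `build_endpoints` is a hypothetical helper, and the field names follow the ServiceMonitor shown in this post.

```python
import json
from itertools import product


def build_endpoints(modules, targets, port="http-metrics", interval="15s"):
    """Build one ServiceMonitor endpoint per (module, target) combination."""
    endpoints = []
    for module, target in product(modules, targets):
        endpoints.append({
            "interval": interval,
            "port": port,
            "path": "/probe",
            # Exactly one module and one target per endpoint.
            "params": {"module": [module], "target": [target]},
            "relabelings": [
                {"action": "replace", "regex": "(.*)", "replacement": "$1",
                 "sourceLabels": ["__param_module"], "targetLabel": "module"},
                {"action": "replace", "regex": "(.*)", "replacement": "$1",
                 "sourceLabels": ["__param_target"], "targetLabel": "target"},
            ],
        })
    return endpoints


endpoints = build_endpoints(
    modules=["http_2xx", "http_post_2xx"],
    targets=["http://prometheus.io", "https://prometheus.io"],
)
# JSON is also valid YAML, so the output can be placed under spec.endpoints.
print(json.dumps(endpoints, indent=2))
```

The manifest grows with N times M entries, so this suits a modest number of sites. Targets must also make sense for each module, for example URLs for the HTTP modules.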

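6. Configuring alerts

From the post "Installing and Configuring Prometheus with Docker", we know that alerting requires the Prometheus configuration file to specify the Alertmanager instances and the alert rules files.
How is alerting configured for a Prometheus deployed by the Operator? Here we define a PrometheusRule resource that carries the labels prometheus=k8s and role=alert-rules.
As an example, we configure an alert for the DNS service: it fires when DNS is broken and kubernetes.default.svc.cluster.local cannot be resolved.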

1) View the Alertmanager configuration

kubectl get secrets -n monitoring alertmanager-main -oyaml | grep "alertmanager.yaml" | awk '{print $2}' | base64 -d

2) Create prometheus-rule-dns.yaml

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: k8s
    role: alert-rules
  name: dns-alert-rules
  namespace: monitoring
spec:
  groups:
  - name: DNS
    rules:
    - alert: DNSServerError
      annotations:
        summary: No summary
        description: No description
        webhookToken: xxxxxxxxx
      expr: |
        probe_success{module="dns_k8s"} == 0
      for: 1m
      labels:
        severity: critical
        alertTag: k8s

3) Apply the rule
kubectl apply -f prometheus-rule-dns.yaml
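
To see how `for: 1m` interacts with the 30s evaluation interval from the generated configuration, here is a simplified Python model of the alert's state over successive rule evaluations. It is a sketch only: `alert_states` is a hypothetical helper, the sample series is made up, and the model assumes exactly one `probe_success` value per evaluation.

```python
def alert_states(probe_success, eval_interval_s=30, for_s=60):
    """Model the DNSServerError state at each rule evaluation.

    probe_success holds one value (1 = probe OK, 0 = probe failed) per evaluation.
    """
    states = []
    active_since = None
    for index, value in enumerate(probe_success):
        now = index * eval_interval_s
        if value == 0:  # the expression probe_success == 0 returns a result
            if active_since is None:
                active_since = now
            held_long_enough = now - active_since >= for_s
            states.append("firing" if held_long_enough else "pending")
        else:
            active_since = None  # condition cleared: the alert resets
            states.append("inactive")
    return states


# Made-up series: DNS fails for four consecutive evaluations, then recovers.
states = alert_states([1, 0, 0, 0, 0, 1])
```

In this model the alert stays pending for the first two failing evaluations and only fires once the failure has lasted the full minute, so a single failed probe does not page anyone.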