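1. Introduction

The article "Kubernetes Operator" covered the basics of Operators, and "Prometheus Blackbox exporter" covered installing and configuring the blackbox exporter.
Prometheus Operator, as the name suggests, is a Custom Controller that automates the management of Prometheus in K8S. For more details, see coreos/prometheus-operator.
The question this article addresses: how do we use Prometheus Operator to install and deploy Prometheus in a Kubernetes cluster, and then add the Blackbox exporter component?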
2. Installing Prom Operator

Install Prometheus Operator following "Prometheus Operator 初体验" (A First Look at Prometheus Operator) and coreos/kube-prometheus.

1) Add parameters to the kubelet configuration:

    vim /etc/systemd/system/kubelet.service.d/10-kubeadm.conf

Add:

    --authentication-token-webhook=true
    --authorization-mode=Webhook
2) Get the source and switch to the right version (the mapping to k8s versions is listed in the GitHub repository):

    git clone https://github.com/coreos/kube-prometheus.git
    cd kube-prometheus
    kubectl version
    git branch -a
    git checkout origin/release-0.4
3) Install Prom Operator:

    # Create the namespace and CRDs, and then wait for them to be available before creating the remaining resources
    kubectl create -f manifests/setup
    until kubectl get servicemonitors --all-namespaces ; do date; sleep 1; echo ""; done
    kubectl create -f manifests/
4) Check the installation:

    kubectl get crd | grep coreos
    kubectl get pod -n monitoring
    kubectl get svc -n monitoring

At this point Prometheus Operator is installed, and so is Prometheus.
PS: to uninstall Prom Operator:

    kubectl delete --ignore-not-found=true -f manifests/ -f manifests/setup
3. Installing Blackbox exporter

1) Create the yaml file blackbox-exporter.yaml:

    apiVersion: v1
    data:
      config.yml: |
        modules:
          http_2xx:
            prober: http
            http:
              method: GET
              preferred_ip_protocol: "ip4"
          http_post_2xx:
            prober: http
            http:
              method: POST
              preferred_ip_protocol: "ip4"
          tcp:
            prober: tcp
          ping:
            prober: icmp
            timeout: 3s
            icmp:
              preferred_ip_protocol: "ip4"
          dns_k8s:
            prober: dns
            timeout: 5s
            dns:
              transport_protocol: "tcp"
              preferred_ip_protocol: "ip4"
              query_name: "kubernetes.default.svc.cluster.local"
              query_type: "A"
    kind: ConfigMap
    metadata:
      name: blackbox-exporter
      namespace: monitoring
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      creationTimestamp: null
      labels:
        name: blackbox-exporter
        cluster: ali-huabei2-dev
      name: blackbox-exporter
      namespace: monitoring
    spec:
      replicas: 1
      selector:
        matchLabels:
          name: blackbox-exporter
      strategy: {}
      template:
        metadata:
          creationTimestamp: null
          labels:
            name: blackbox-exporter
            cluster: ali-huabei2-dev
        spec:
          containers:
          - image: prom/blackbox-exporter:v0.16.0
            name: blackbox-exporter
            ports:
            - containerPort: 9115
            volumeMounts:
            - name: config
              mountPath: /etc/blackbox_exporter
            args:
            - --config.file=/etc/blackbox_exporter/config.yml
            - --log.level=info
          volumes:
          - name: config
            configMap:
              name: blackbox-exporter
    ---
    apiVersion: v1
    kind: Service
    metadata:
      #annotations:
      #  service.beta.kubernetes.io/alicloud-loadbalancer-address-type: intranet
      labels:
        name: blackbox-exporter
        cluster: ali-huabei2-dev
      name: blackbox-exporter
      namespace: monitoring
    spec:
      #externalTrafficPolicy: Local
      selector:
        name: blackbox-exporter
      ports:
      - name: http-metrics
        port: 9115
        targetPort: 9115
      type: LoadBalancer
2) Apply the yaml file:

    kubectl apply -f blackbox-exporter.yaml
    kubectl get svc -n monitoring
    kubectl get deploy -n monitoring
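Before wiring the exporter into Prometheus, it can help to probe it by hand. The sketch below builds a `/probe` URL for one module and one target. `probe_url` is a hypothetical helper introduced only for illustration, and `http://localhost:9115` assumes a local port-forward to the `blackbox-exporter` Service defined above. The target is passed through without URL encoding, which is a simplification that is only adequate for plain hostnames, IPs, and simple URLs.

```shell
#!/bin/sh
# probe_url BASE MODULE TARGET
# Hypothetical helper: prints the blackbox exporter probe URL for one
# module/target pair. The target is NOT URL-encoded (a simplification).
probe_url() {
  printf '%s/probe?module=%s&target=%s\n' "$1" "$2" "$3"
}

# Assumption: the Service is reachable locally, e.g. via
#   kubectl port-forward -n monitoring svc/blackbox-exporter 9115:9115
probe_url "http://localhost:9115" "dns_k8s" "172.31.16.10"
```

Fetching the printed URL with `curl` should return the probe's metrics, including `probe_success`, provided the exporter is running and the module exists in `config.yml`.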
4. Configuring Blackbox exporter (the wrong way)

Configuring Blackbox exporter in a standalone Prometheus is simple: add the relevant fields under scrape_configs. With Prometheus running in k8s, however, things work a little differently.

1) Fetch the prometheus.yml configuration:

    kubectl get secrets -n monitoring prometheus-k8s -oyaml | grep prometheus.yaml.gz | awk '{print $2}' | base64 --decode | gzip -d > prometheus.yml
2) Inspect prometheus.yml. An excerpt:

    global:
      evaluation_interval: 30s
      scrape_interval: 30s
      external_labels:
        prometheus: monitoring/k8s
        prometheus_replica: $(POD_NAME)
    rule_files:
    - /etc/prometheus/rules/prometheus-k8s-rulefiles-0/*.yaml
    scrape_configs:
    - job_name: monitoring/node-exporter/0
      honor_labels: false
      kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names:
          - monitoring
      scrape_interval: 15s
      scheme: https
      tls_config:
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: keep
        source_labels:
        - __meta_kubernetes_service_label_k8s_app
        regex: node-exporter
      - action: keep
        source_labels:
        - __meta_kubernetes_endpoint_port_name
        regex: https
      - source_labels:
        - __meta_kubernetes_endpoint_address_target_kind
        - __meta_kubernetes_endpoint_address_target_name
        separator: ;
        regex: Node;(.*)
        replacement: ${1}
        target_label: node
      - source_labels:
        - __meta_kubernetes_endpoint_address_target_kind
        - __meta_kubernetes_endpoint_address_target_name
        separator: ;
        regex: Pod;(.*)
        replacement: ${1}
        target_label: pod
      - source_labels:
        - __meta_kubernetes_namespace
        target_label: namespace
      - source_labels:
        - __meta_kubernetes_service_name
        target_label: service
      - source_labels:
        - __meta_kubernetes_pod_name
        target_label: pod
      - source_labels:
        - __meta_kubernetes_service_name
        target_label: job
        replacement: ${1}
      - source_labels:
        - __meta_kubernetes_service_label_k8s_app
        target_label: job
        regex: (.+)
        replacement: ${1}
      - target_label: endpoint
        replacement: https
      - source_labels:
        - __meta_kubernetes_pod_node_name
        target_label: instance
        regex: (.*)
        replacement: $1
        action: replace
      - source_labels:
        - __meta_kubernetes_service_label_cluster
        target_label: cluster
        regex: (.*)
        replacement: $1
        action: replace
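Here, job_name sets the name of the scrape job, kubernetes_sd_configs configures k8s service discovery, and relabel_configs determines the labels that finally appear on the scraped series. source_labels are the original labels of a target and target_label is the label that gets written; regex is matched against the values of the source labels (joined with separator), and replacement is the value that ends up in target_label. $1 stands for the first capture group matched by regex.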
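To make the regex/replacement step concrete, the sketch below imitates one `replace` rule with `sed`. It takes the already-joined source label values and a regex, anchors the regex on both ends (Prometheus relabel regexes must match the whole value), and prints what `$1` would expand to. `relabel_value` is a hypothetical helper, not part of Prometheus; it assumes the replacement is exactly `$1`, that the regex has at least one capture group, and that the regex contains no `/`.

```shell
#!/bin/sh
# relabel_value JOINED REGEX
# Hypothetical helper: emulates a relabel rule whose replacement is $1.
# The regex is wrapped in an extra group so it can be anchored as a whole,
# which shifts the user's first capture group to \2 in sed.
# Prints nothing when the regex does not match (the rule would be a no-op).
relabel_value() {
  printf '%s\n' "$1" | sed -nE "s/^(${2})\$/\\2/p"
}

# target_kind=Node and target_name=worker-1 joined with separator ";"
relabel_value 'Node;worker-1' 'Node;(.*)'   # prints: worker-1
```

With the joined value `Pod;foo` the same rule prints nothing, which mirrors why the excerpt needs two separate rules for the `node` and `pod` labels.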
3) Add a scrape config for blackbox exporter:

    - job_name: monitoring/blackbox-exporter/0
      honor_labels: false
      kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names:
          - monitoring
      scrape_interval: 15s
      scheme: http
      tls_config:
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: keep
        source_labels:
        - __meta_kubernetes_service_label_name
        regex: blackbox-exporter
      - source_labels:
        - __meta_kubernetes_service_label_name
        target_label: job
        regex: (.+)
        replacement: ${1}
      - source_labels:
        - __meta_kubernetes_service_label_cluster
        target_label: cluster
        regex: (.*)
        replacement: $1
        action: replace
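4) Apply the new configuration:

    # Encode the edited file as single-line, gzipped base64
    cat prometheus.yml | gzip -f | base64 | tr -d "\n"

    # Paste the encoded string into the prometheus.yaml.gz field of the secret
    kubectl edit secrets -n monitoring prometheus-k8s

    # Check whether the change is present
    kubectl get secrets -n monitoring prometheus-k8s -oyaml | grep prometheus.yaml.gz | awk '{print $2}' | base64 --decode | gzip -d | grep blackbox

However, the configuration contains no blackbox entry: nothing has changed. This shows that the Prometheus configuration is generated automatically, so manual edits have no lasting effect.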
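The decode pipeline from step 1 and the encode pipeline from step 4 are inverses of each other, which can be checked locally without touching the cluster. The sketch below wraps both in small functions and runs a round trip on a dummy config. `encode_config` and `decode_config` are hypothetical names introduced here; the one-line config is a placeholder, not the real `prometheus.yaml.gz` content.

```shell
#!/bin/sh
# encode_config: stdin -> gzip -> single-line base64,
# the form stored under the prometheus.yaml.gz key of the Secret.
encode_config() {
  gzip -c | base64 | tr -d '\n'
}

# decode_config: the inverse, as used when dumping the Secret in step 1.
decode_config() {
  base64 --decode | gzip -dc
}

# Round trip on a placeholder config; the original text should come back.
printf 'scrape_configs: []\n' | encode_config | decode_config
```

A successful round trip suggests the encoding in step 4 was not the problem; the edit disappears because the Secret is rewritten from the cluster's custom resources.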
5. Configuring Blackbox exporter (the right way)

With Prometheus Operator, targets are configured through ServiceMonitor resources, which are discovered dynamically.

1) Create the ServiceMonitor yaml file, blackbox-exporter-sm.yaml:

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      labels:
        name: blackbox-exporter
        release: p
      name: blackbox-exporter
      namespace: monitoring
    spec:
      namespaceSelector:
        matchNames:
        - monitoring
      selector:
        matchLabels:
          name: blackbox-exporter
      endpoints:
      - interval: 15s
        port: http-metrics
        path: /probe
        relabelings:
        - action: replace
          regex: (.*)
          replacement: $1
          sourceLabels:
          - __meta_kubernetes_service_label_cluster
          targetLabel: cluster
        - action: replace
          regex: (.*)
          replacement: $1
          sourceLabels:
          - __param_module
          targetLabel: module
        - action: replace
          regex: (.*)
          replacement: $1
          sourceLabels:
          - __param_target
          targetLabel: target
        params:
          module:
          - http_2xx
          target:
          - http://prometheus.io    # Target to probe with http.
          - https://prometheus.io   # Target to probe with https.
          - http://example.com:8080 # Target to probe with http on port 8080.
      - interval: 15s
        port: http-metrics
        path: /probe
        relabelings:
        - action: replace
          regex: (.*)
          replacement: $1
          sourceLabels:
          - __meta_kubernetes_service_label_cluster
          targetLabel: cluster
        - action: replace
          regex: (.*)
          replacement: $1
          sourceLabels:
          - __param_module
          targetLabel: module
        - action: replace
          regex: (.*)
          replacement: $1
          sourceLabels:
          - __param_target
          targetLabel: target
        params:
          module:
          - dns_k8s
          target:
          - 172.31.16.10 # dns ip address
2) Apply it to the k8s cluster:

    kubectl apply -f blackbox-exporter-sm.yaml

3) Wait about a minute, then verify. Open the Prometheus graph page, where the blackbox-exporter metrics can be queried:

    {job=~"blackbox-exporter",__name__!~"^go.*"}
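The results show that, for the http_2xx params, only the first target took effect; the other two targets have no probe records at all. The experiment shows that only one entry under target is used per endpoint and any additional entries are ignored, most likely because the exporter reads a single target parameter per probe request.
The simplest way to probe multiple sites is therefore to configure multiple endpoints. As for configuring N sites with M probe types, if you know a better way to do it, please leave a comment. Thanks!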
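One way to keep the "one endpoint per target" approach manageable is to generate the endpoint entries instead of writing them by hand. The sketch below prints one ServiceMonitor endpoint per (module, target) pair, which is a brute-force take on the N sites by M probe types case rather than a built-in feature. `gen_endpoints` is a hypothetical helper; the `cluster` relabeling from the manifest above is left out for brevity, and the two remaining relabelings rely on the default `replace` action with regex `(.*)` and replacement `$1`.

```shell
#!/bin/sh
# gen_endpoints "MODULES" "TARGETS"
# Hypothetical helper: both arguments are space-separated lists.
# Prints one endpoint entry (indented to sit under spec.endpoints)
# for every module/target combination.
gen_endpoints() {
  for module in $1; do
    for target in $2; do
      cat <<EOF
  - interval: 15s
    port: http-metrics
    path: /probe
    params:
      module:
      - ${module}
      target:
      - ${target}
    relabelings:
    - sourceLabels: [__param_module]
      targetLabel: module
    - sourceLabels: [__param_target]
      targetLabel: target
EOF
    done
  done
}

# Two modules x two targets -> four endpoint entries
gen_endpoints "http_2xx http_post_2xx" "http://prometheus.io https://prometheus.io"
```

The output can be pasted under `endpoints:` in blackbox-exporter-sm.yaml. The `module` and `target` labels matter here, since they are what distinguishes the series produced by the different endpoints.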
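6. Configuring alerts

From the article "Installing and Configuring Prometheus with Docker" we know that alerting requires the Prometheus configuration file to specify the alertmanager instances and the alert rules files.
How do we configure alerts for a Prometheus deployed through the operator? We need to define a PrometheusRule resource that carries the labels prometheus=k8s and role=alert-rules.
As an example, we configure an alert for the DNS service that fires when DNS is broken and cannot resolve kubernetes.default.svc.cluster.local.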
1) Inspect the alertmanager configuration:

    kubectl get secrets -n monitoring alertmanager-main -oyaml | grep "alertmanager.yaml" | awk '{print $2}' | base64 -d
2) Create prometheus-rule-dns.yaml:

    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      labels:
        prometheus: k8s
        role: alert-rules
      name: dns-alert-rules
      namespace: monitoring
    spec:
      groups:
      - name: DNS
        rules:
        - alert: DNSServerError
          annotations:
            summary: No summary
            description: No description
            webhookToken: xxxxxxxxx
          expr: |
            probe_success{module="dns_k8s"} == 0
          for: 1m
          labels:
            severity: critical
            alertTag: k8s
3) Apply the rule:

    kubectl apply -f prometheus-rule-dns.yaml