【Prometheus Operator: Monitoring the Calico and Ingress-Nginx Components of a K8s Cluster】
2026/6/10 19:59:22

Tip: This is an original article, carefully written and focused on practical content. It is concise, clear and easy to follow.

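Contents

  • Preface
  • 1. Overview
    • 1.1 Lab Environment
    • 1.2 Metric Collection Flow for Full-Stack K8s Monitoring with Prometheus Operator
  • 2. Monitoring Calico Components
    • 2.1 Monitoring Calico Felix
      • Check the calico-node status
      • Enable the metrics listener
      • Create the Service
      • Create the ServiceMonitor
      • Grant Prometheus cross-namespace permissions
      • Add a Grafana dashboard
    • 2.2 Monitoring kube-controllers
      • Check the kube-controllers status
      • Create the ServiceMonitor
  • 3. Monitoring ingress-nginx
    • 3.1 Enable the metrics listener and map the Service port
    • 3.2 Create the ServiceMonitor
    • 3.3 Add a Grafana dashboard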

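Preface

Previously, we used Prometheus Operator to monitor a highly available K8s cluster, covering etcd, kube-controller-manager, kube-scheduler, kube-proxy and other components. This article continues with the monitoring configuration for Calico and ingress-nginx. Newcomers may want to read my two earlier articles first to fill in any gaps:
Monitoring a K8s High-Availability Cluster and the Etcd Database with Prometheus Operator: https://blog.csdn.net/m0_63756214/article/details/161484786?spm=1001.2014.3001.5501
Monitoring the K8s Controller Manager, Scheduler and Proxy Components with Prometheus Operator: https://blog.csdn.net/m0_63756214/article/details/161627526?spm=1001.2014.3001.5501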


1. Overview

1.1 Lab Environment

My lab environment:

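| Hostname | IP | Role |
| --- | --- | --- |
| K8s-master1 | 192.168.13.136 | K8s control-plane node |
| K8s-master2 | 192.168.13.137 | K8s control-plane node |
| K8s-master3 | 192.168.13.138 | K8s control-plane node |
| k8s-node1 | 192.168.13.139 | K8s worker node |
| k8s-node2 | 192.168.13.140 | K8s worker node |
| NFS | 192.168.13.141 | NFS server, provides storage |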

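1.2 Metric Collection Flow for Full-Stack K8s Monitoring with Prometheus Operator

This is the end-to-end cloud-native monitoring flow: Prometheus Operator manages Prometheus in a standardized way, ServiceMonitors declare the scrape rules, K8s-native Services/Endpoints discover Pods automatically, the metrics are scraped and stored, and Grafana visualizes them.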

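```
[Resources written by the user]
  ↓
1. ServiceMonitor (CRD: the "spec sheet" for scraping)
   Configures: label selection rules, metrics port, scrape interval
  ↓ (the Operator continuously watches for CRD changes)
2. Prometheus Operator controller
   ✅ Parses every ServiceMonitor automatically
   ✅ Generates the Prometheus configuration file (with built-in kubernetes_sd_configs K8s service discovery + relabel rules)
   ✅ Hot-reloads the Prometheus configuration; no manual restart of the Prometheus Pod is needed
  ↓ (configuration delivered to the Prometheus instance)
3. Prometheus Server
   Calls the K8s APIServer as configured, triggering K8s Endpoints service discovery
  ↓ (the APIServer looks up how cluster resources relate to each other)
4. Service (the key intermediate layer)
   · The Service binds its backend Pods through its selector labels
   · A K8s controller automatically maintains the matching Endpoints (Endpoints = list of real Pod IPs + ports)
   · The ServiceMonitor matches Service labels through spec.selector; a matching Service is added to the scrape job
  ↓ (the Endpoints list is resolved)
5. Endpoints = the actual scrape targets (each Pod IP:metrics port)
  ↓
6. Workload/component Pods (kube-controller-manager/etcd/node-exporter, etc.) expose a /metrics endpoint
  ↓
7. Prometheus scrapes the metrics periodically and stores them in its time-series database
  ↓
8. Grafana dashboards read the Prometheus data source and visualize the monitoring data
```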

Understanding this flow makes the monitoring setup below much easier to follow.
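To make steps 4 and 5 concrete, here is a minimal Python sketch of how a ServiceMonitor's selectors narrow down to scrape targets. It is a simplified model, not Prometheus Operator's actual code: the classes and `scrape_targets` are hypothetical, it only models `matchLabels` (no `matchExpressions`) and ignores Pod readiness. The sample values mirror the calico-node setup later in this article; calico-node runs on the host network, so its Pod IPs are the node IPs.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical, simplified model of ServiceMonitor -> Service -> Endpoints.
def matches(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """matchLabels semantics: every selector key must exist with the same value."""
    return all(labels.get(key) == value for key, value in selector.items())


@dataclass
class Pod:
    namespace: str
    labels: dict[str, str]
    ip: str


@dataclass
class Service:
    name: str
    namespace: str
    labels: dict[str, str]    # labels ON the Service (what a ServiceMonitor selects)
    selector: dict[str, str]  # the Service's own Pod selector
    ports: dict[str, int]     # port name -> targetPort


@dataclass
class ServiceMonitor:
    match_labels: dict[str, str]
    namespaces: list[str]     # namespaceSelector.matchNames
    port: str                 # endpoints[].port refers to a Service port NAME


def scrape_targets(sm: ServiceMonitor, services: list[Service], pods: list[Pod]) -> list[str]:
    """Return 'ip:port' targets, following steps 4-5 of the flow above."""
    targets = []
    for svc in services:
        # Step 4: the ServiceMonitor selects Services by namespace and labels.
        if svc.namespace not in sm.namespaces or not matches(sm.match_labels, svc.labels):
            continue
        # A selected Service contributes nothing unless it has the named port.
        if sm.port not in svc.ports:
            continue
        # Step 5: the Endpoints are the Pods matched by the Service selector.
        for pod in pods:
            if pod.namespace == svc.namespace and matches(svc.selector, pod.labels):
                targets.append(f"{pod.ip}:{svc.ports[sm.port]}")
    return targets


if __name__ == "__main__":
    calico_pods = [
        Pod("calico-system", {"k8s-app": "calico-node"}, f"192.168.13.{host}")
        for host in range(136, 141)
    ]
    felix_svc = Service(
        "felix-metrics-svc", "calico-system",
        labels={"k8s-app": "calico-node"},
        selector={"k8s-app": "calico-node"},
        ports={"http-metrics": 9091},
    )
    sm = ServiceMonitor({"k8s-app": "calico-node"}, ["calico-system"], "http-metrics")
    print(scrape_targets(sm, [felix_svc], calico_pods))
```

Running it prints one `IP:9091` target per node (192.168.13.136 through 192.168.13.140). Changing the ServiceMonitor's namespace or port name to something that does not match yields an empty list, which is exactly the "no targets in Prometheus" symptom you see when a selector or port name is off.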


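2. Monitoring Calico Components

Calico components:
Felix: the "brain" of Calico. It runs on every node and programs routes and network policy on that host.
Typha: an optional component that sits between the nodes and the datastore and optimizes their communication, raising the cluster size Calico can handle.
kube-controllers: Calico's control-plane housekeeper, responsible for core management tasks such as cleaning up resources and syncing with the K8s API.
Monitoring: each of these components can be configured to expose metrics to Prometheus, giving end-to-end monitoring coverage.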

2.1 Monitoring Calico Felix

Check the calico-node status

```
## 1. List the calico-node pods
root@k8s-master1:~# kubectl get pod -n calico-system | grep calico-node
calico-node-7zpmm   1/1   Running   1 (5h36m ago)   24h
calico-node-8mbk7   1/1   Running   1 (5h55m ago)   23h
calico-node-gj5pz   1/1   Running   1 (5h36m ago)   24h
calico-node-k6wpw   1/1   Running   1 (5h36m ago)   24h
calico-node-k95vj   1/1   Running   1 (5h36m ago)   24h
## Check the pod labels
root@k8s-master1:~# kubectl get pod calico-node-7zpmm -n calico-system --show-labels
NAME                READY   STATUS    RESTARTS        AGE   LABELS
calico-node-7zpmm   1/1     Running   1 (5h37m ago)   24h   app.kubernetes.io/name=calico-node,controller-revision-hash=5f84f7dcd7,k8s-app=calico-node,pod-template-generation=6
## 2. Check whether calico-node has a Service: it does not
root@k8s-master1:~# kubectl get svc -n calico-system
NAME                              TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
calico-kube-controllers-metrics   ClusterIP   None            <none>        9094/TCP   58d
calico-typha                      ClusterIP   10.105.252.79   <none>        5473/TCP   58d
## 3. Check whether a ServiceMonitor was created for calico-node automatically: none exists
root@k8s-master1:~# kubectl get servicemonitor -n monitoring | grep calico
```
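Before wiring up monitoring, it helps to confirm that every calico-node Pod is Running and fully ready (it is a DaemonSet, so one Pod per node, five here). A minimal sketch, assuming the default `kubectl get pod` column order (NAME, READY, STATUS, ...); `unhealthy_pods` is a hypothetical helper, not part of any tool.

```python
from __future__ import annotations


# Hypothetical helper: scan `kubectl get pod` output for unhealthy pods.
def unhealthy_pods(kubectl_output: str) -> list[str]:
    """Names of pods that are not Running or whose READY column is not n/n."""
    bad = []
    for line in kubectl_output.strip().splitlines():
        cols = line.split()
        if len(cols) < 3 or cols[0] == "NAME":
            continue  # skip blank lines and the header row
        name, ready, status = cols[0], cols[1], cols[2]
        ready_now, ready_total = ready.split("/")
        if status != "Running" or ready_now != ready_total:
            bad.append(name)
    return bad


if __name__ == "__main__":
    sample = """\
calico-node-7zpmm   1/1   Running   1 (5h36m ago)   24h
calico-node-8mbk7   1/1   Running   1 (5h55m ago)   23h
calico-node-gj5pz   1/1   Running   1 (5h36m ago)   24h
calico-node-k6wpw   1/1   Running   1 (5h36m ago)   24h
calico-node-k95vj   1/1   Running   1 (5h36m ago)   24h
"""
    print(unhealthy_pods(sample))  # prints []
```

An empty list means every listed Pod is ready; any name it returns is worth investigating before blaming the monitoring configuration.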

Since calico-node has neither a Service nor a ServiceMonitor, we need to create both next.


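Enable the metrics listener

Before that, we have to turn on Calico's metrics listener: Felix does not expose Prometheus metrics by default (prometheusMetricsEnabled is false).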

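```
## Enable Felix's Prometheus metrics (served on port 9091 by default)
root@k8s-master1:~# kubectl patch felixconfiguration default --type merge --patch '{"spec":{"prometheusMetricsEnabled": true}}'
## calico-node uses the host network, so the metrics are reachable on the node IP
root@k8s-master1:~# curl http://192.168.13.136:9091/metrics
```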
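Rather than eyeballing the curl output, you can check that Felix-specific series (metric names starting with `felix_`) are present. A minimal sketch that understands only the basic Prometheus text exposition format; `fetch_text`, `metric_names` and `felix_metrics_present` are hypothetical helpers, and the URL in the usage note assumes node 192.168.13.136 and Felix's default port 9091.

```python
from __future__ import annotations

from urllib.request import urlopen


# Hypothetical helpers for checking a /metrics endpoint.
def fetch_text(url: str, timeout: float = 5.0) -> str:
    """Download a /metrics page as text."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def metric_names(exposition: str) -> set[str]:
    """Collect metric names from Prometheus text exposition format."""
    names = set()
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        # A sample line is `name value` or `name{labels} value`.
        names.add(line.split("{", 1)[0].split()[0])
    return names


def felix_metrics_present(exposition: str) -> bool:
    """True if at least one felix_* metric is exposed."""
    return any(name.startswith("felix_") for name in metric_names(exposition))
```

Usage: `felix_metrics_present(fetch_text("http://192.168.13.136:9091/metrics"))` should return `True` once the patch has taken effect. If only Go runtime or process metrics show up, Felix has not picked up the new FelixConfiguration yet.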


Create the Service

```
root@k8s-master1:~# cd /k8s/svc/
root@k8s-master1:/k8s/svc# vim calico.yaml
apiVersion: v1
kind: Service
metadata:
  name: felix-metrics-svc
  namespace: calico-system
  labels:
    k8s-app: calico-node
spec:
  clusterIP: None
  selector:
    k8s-app: calico-node  # must match the pod labels
  ports:
  - name: http-metrics
    port: 9091
    targetPort: 9091
root@k8s-master1:/k8s/svc# kubectl apply -f calico.yaml
```
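The Service selector only works if it is a subset of the Pod labels shown by `--show-labels`. A minimal sketch that parses that LABELS column and checks a selector against it; `parse_labels` and `selector_matches` are hypothetical helpers. In the cluster, `kubectl get endpoints felix-metrics-svc -n calico-system` gives the same answer: an empty ENDPOINTS column means the selector matched no Pods.

```python
from __future__ import annotations


# Hypothetical helpers: parse `--show-labels` output and test a selector.
def parse_labels(column: str) -> dict[str, str]:
    """Parse the LABELS column of `kubectl get pod --show-labels` (k=v,k=v,...)."""
    labels = {}
    for pair in column.split(","):
        key, sep, value = pair.partition("=")
        if sep:  # `<none>` or empty entries have no '='
            labels[key] = value
    return labels


def selector_matches(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """Every selector key/value must appear in the pod labels."""
    return all(labels.get(k) == v for k, v in selector.items())


if __name__ == "__main__":
    pod_labels = parse_labels(
        "app.kubernetes.io/name=calico-node,controller-revision-hash=5f84f7dcd7,"
        "k8s-app=calico-node,pod-template-generation=6"
    )
    print(selector_matches({"k8s-app": "calico-node"}, pod_labels))  # prints True
```

Splitting on commas is safe here because Kubernetes label values cannot contain commas.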

Create the ServiceMonitor

```
root@k8s-master1:~# cd kube-prometheus/manifests/
root@k8s-master1:~/kube-prometheus/manifests# vim calico-serviceMonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: calico-node-metrics
  namespace: monitoring
spec:
  selector:
    matchLabels:
      k8s-app: calico-node  # matches the labels on the calico-node Service
  namespaceSelector:
    matchNames:
    - calico-system  # namespace where calico-node runs
  endpoints:
  - port: http-metrics
    interval: 15s
    path: /metrics
root@k8s-master1:~/kube-prometheus/manifests# kubectl apply -f calico-serviceMonitor.yaml
```

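Grant Prometheus cross-namespace permissions

My Calico runs in the calico-system namespace rather than kube-system, and by default kube-prometheus only grants the prometheus-k8s service account read access in a few namespaces (such as default, kube-system and monitoring). So Prometheus needs cross-namespace permissions before it can discover and scrape Calico's metrics.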

```
## Check whether Prometheus may list Endpoints in the calico-system namespace
root@k8s-master1:~# kubectl auth can-i list endpoints --as=system:serviceaccount:monitoring:prometheus-k8s -n calico-system
no
## Grant Prometheus cross-namespace permissions
root@k8s-master1:~# vim kube-prometheus/manifests/prometheus-rbac.yaml
# Create a cluster-scoped role
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-cross-namespace
rules:
- apiGroups: [""]
  resources: ["services", "endpoints", "pods"]
  verbs: ["get", "list", "watch"]
---
# Bind the role above to the Prometheus service account
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-cross-namespace-binding
subjects:
- kind: ServiceAccount
  name: prometheus-k8s
  namespace: monitoring
roleRef:
  kind: ClusterRole
  name: prometheus-cross-namespace
  apiGroup: rbac.authorization.k8s.io
root@k8s-master1:~# kubectl apply -f kube-prometheus/manifests/prometheus-rbac.yaml
## Check again
root@k8s-master1:~# kubectl auth can-i list endpoints --as=system:serviceaccount:monitoring:prometheus-k8s -n calico-system
yes
```
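Why does `can-i` flip from `no` to `yes`? RBAC is purely additive: a request is allowed if any rule bound to the subject covers its API group, resource and verb, and a ClusterRoleBinding applies the ClusterRole's rules in every namespace. A simplified sketch of that check; `PolicyRule` and `rules_allow` are hypothetical names, and real RBAC also handles resourceNames, subresources and non-resource URLs, which are ignored here.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical, simplified model of an RBAC PolicyRule.
@dataclass
class PolicyRule:
    api_groups: list[str]
    resources: list[str]
    verbs: list[str]


def _covers(allowed: list[str], value: str) -> bool:
    """'*' is the RBAC wildcard."""
    return "*" in allowed or value in allowed


def rules_allow(rules: list[PolicyRule], api_group: str, resource: str, verb: str) -> bool:
    """True if any rule grants `verb` on `resource` in `api_group` ("" = core group)."""
    return any(
        _covers(r.api_groups, api_group) and _covers(r.resources, resource) and _covers(r.verbs, verb)
        for r in rules
    )


# The rules from the prometheus-cross-namespace ClusterRole above.
CROSS_NAMESPACE = [
    PolicyRule([""], ["services", "endpoints", "pods"], ["get", "list", "watch"]),
]

if __name__ == "__main__":
    for verb in ("get", "list", "watch"):
        print(verb, rules_allow(CROSS_NAMESPACE, "", "endpoints", verb))
```

Note that these rules cover only the core API group: `endpointslices` in `discovery.k8s.io` is not included, which would only matter if your Prometheus is configured for EndpointSlice-based discovery.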

Open Prometheus in a browser: the Calico Felix instances are now being monitored.
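The same check can be scripted against the Prometheus HTTP API endpoint `/api/v1/targets`. A minimal sketch, assuming Prometheus is reachable at http://localhost:9090 (for example through the `kubectl port-forward` command used later in this article) and that the Operator names scrape pools `serviceMonitor/<namespace>/<name>/<index>`, as the Prometheus UI typically shows for ServiceMonitor jobs. `fetch_targets` and `targets_in_pool` are hypothetical helpers.

```python
from __future__ import annotations

import json
from urllib.request import urlopen


# Hypothetical helpers around the Prometheus /api/v1/targets endpoint.
def fetch_targets(base_url: str = "http://localhost:9090") -> dict:
    """GET /api/v1/targets and decode the JSON response."""
    with urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


def targets_in_pool(payload: dict, pool_prefix: str) -> list[tuple[str, str]]:
    """(scrapeUrl, health) for active targets whose scrapePool starts with pool_prefix."""
    return [
        (t["scrapeUrl"], t["health"])
        for t in payload["data"]["activeTargets"]
        if t.get("scrapePool", "").startswith(pool_prefix)
    ]
```

Usage: `targets_in_pool(fetch_targets(), "serviceMonitor/monitoring/calico-node-metrics/")` should list one Felix target per node, each with health `"up"`. An empty list usually points back to the selector, port name or RBAC steps above.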


Add a Grafana dashboard

https://grafana.com/grafana/dashboards/12175-calico-felix/



2.2 Monitoring kube-controllers

Check the kube-controllers status

```
## 1. List the kube-controllers pod
root@k8s-master1:~# kubectl get pod -n calico-system | grep kube-controllers
calico-kube-controllers-65c5875bc8-t8m28   1/1   Running   9 (56m ago)   60d
## 2. Check the pod labels
root@k8s-master1:~# kubectl get pod calico-kube-controllers-65c5875bc8-t8m28 -n calico-system --show-labels
NAME                                       READY   STATUS    RESTARTS      AGE   LABELS
calico-kube-controllers-65c5875bc8-t8m28   1/1     Running   9 (64m ago)   60d   app.kubernetes.io/name=calico-kube-controllers,k8s-app=calico-kube-controllers,pod-template-hash=65c5875bc8
## 3. Check whether a Service was created for kube-controllers automatically: one exists
root@k8s-master1:~# kubectl get svc -n calico-system | grep calico-kube-controllers-metrics
calico-kube-controllers-metrics   ClusterIP   None   <none>   9094/TCP   60d
## 4. Show the Service details
root@k8s-master1:~/kube-prometheus/manifests# kubectl get svc calico-kube-controllers-metrics -n calico-system -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/port: "9094"
    prometheus.io/scrape: "true"
  creationTimestamp: "2026-04-08T06:59:06Z"
  labels:
    k8s-app: calico-kube-controllers
  name: calico-kube-controllers-metrics
  namespace: calico-system
  ownerReferences:
  - apiVersion: operator.tigera.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: Installation
    name: default
    uid: 1541d167-34fd-4d93-99d2-574abf3eae7e
  resourceVersion: "12791"
  uid: cbe84cf1-7504-4f5c-99b4-988bd5e95dc1
spec:
  clusterIP: None
  clusterIPs:
  - None
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: metrics-port
    port: 9094
    protocol: TCP
    targetPort: 9094
  selector:
    k8s-app: calico-kube-controllers
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
## 5. Check whether a ServiceMonitor was created for kube-controllers: none exists (only calico-node-metrics from 2.1)
root@k8s-master1:~# kubectl get servicemonitor -n monitoring | grep calico
calico-node-metrics   2d22h
```

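From the output above, kube-controllers already has a Service (calico-kube-controllers-metrics, port name metrics-port) but no ServiceMonitor, so we only need to create the ServiceMonitor. The prometheus.io/scrape and prometheus.io/port annotations on the Service are not enough on their own: Prometheus Operator discovers targets through ServiceMonitor label selectors, not through these annotations.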


Create the ServiceMonitor

```
root@k8s-master1:~# cd kube-prometheus/manifests/
root@k8s-master1:~/kube-prometheus/manifests# vim calico-kube-controllers-serviceMonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: calico-kube-controllers
  namespace: monitoring
  labels:
    app: calico-kube-controllers
spec:
  jobLabel: k8s-app  # the value of this Service label (calico-kube-controllers) becomes the job name
  endpoints:
  - interval: 30s
    port: metrics-port  # must exactly match ports.name in the Service
    scheme: http
  selector:
    matchLabels:
      k8s-app: calico-kube-controllers  # must match the labels on the metrics Service
  namespaceSelector:
    matchNames:
    - calico-system
root@k8s-master1:~/kube-prometheus/manifests# kubectl apply -f calico-kube-controllers-serviceMonitor.yaml
```
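Two ServiceMonitor fields here are easy to get wrong. A minimal sketch that checks them against plain dicts (for example, the output of `kubectl get ... -o json` loaded with `json.load`), assuming Prometheus Operator's documented behaviour: `endpoints[].port` must be the *name* of a Service port, and `jobLabel` is the *key* of a Service label whose value becomes the `job` label, falling back to the Service name when it is empty or the label is missing. `port_problems` and `job_name` are hypothetical helpers.

```python
from __future__ import annotations


# Hypothetical helpers that lint a ServiceMonitor against its Service.
def port_problems(service: dict, service_monitor: dict) -> list[str]:
    """Report ServiceMonitor endpoints whose `port` is not a named port of the Service."""
    names = {p["name"] for p in service["spec"]["ports"] if "name" in p}
    return [
        f"endpoint port {ep.get('port')!r} not found; Service port names: {sorted(names)}"
        for ep in service_monitor["spec"]["endpoints"]
        if ep.get("port") not in names
    ]


def job_name(service: dict, service_monitor: dict) -> str:
    """jobLabel names a label KEY on the Service; its value becomes the job.
    If jobLabel is empty or the Service lacks that label, the Service name is used."""
    key = service_monitor["spec"].get("jobLabel", "")
    labels = service["metadata"].get("labels", {})
    return labels.get(key) or service["metadata"]["name"]


if __name__ == "__main__":
    svc = {
        "metadata": {"name": "calico-kube-controllers-metrics",
                     "labels": {"k8s-app": "calico-kube-controllers"}},
        "spec": {"ports": [{"name": "metrics-port", "port": 9094}]},
    }
    sm = {"spec": {"jobLabel": "k8s-app", "endpoints": [{"port": "metrics-port"}]}}
    print(port_problems(svc, sm), job_name(svc, sm))
```

With `jobLabel: k8s-app` the job becomes `calico-kube-controllers`; with a key the Service does not carry (such as `calico-kube-controllers`), it silently falls back to `calico-kube-controllers-metrics`.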

Open Prometheus in a browser: the kube-controllers instance is now being monitored.

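```
## If the instance does not show up yet, wait a moment for the Operator to reload the Prometheus configuration and check again;
## if the Prometheus UI itself is not reachable, forward it to http://localhost:9090
root@k8s-master1:~/kube-prometheus/manifests# kubectl -n monitoring port-forward svc/prometheus-k8s 9090:9090
```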

3. Monitoring ingress-nginx

3.1 Enable the metrics listener and map the Service port

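Edit ingress-nginx.yaml to turn on the controller's metrics endpoint (port 10254) and add that port to the ingress-nginx-controller Service. Name the Service port http-metrics, because the ServiceMonitor in 3.2 refers to the port by that name.

```
root@k8s-master1:~# cd /k8s/ingress-nginx/
root@k8s-master1:/k8s/ingress-nginx# ls
alertmanager-ingress.yaml  grafana-ingress.yaml  ingress-nginx.yaml  prometheus-ingress.yaml
root@k8s-master1:/k8s/ingress-nginx# vim ingress-nginx.yaml
```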

```
## Apply the updated ingress-nginx manifest
root@k8s-master1:/k8s/ingress-nginx# kubectl replace -f ingress-nginx.yaml
root@k8s-master1:/k8s/ingress-nginx# curl 192.168.13.139:10254/metrics
```

```
root@k8s-master1:/k8s/ingress-nginx# kubectl get svc -n ingress-nginx
NAME                                 TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                                      AGE
ingress-nginx-controller             NodePort    10.107.51.27   <none>        80:31303/TCP,443:31299/TCP,10254:31235/TCP   4m12s
ingress-nginx-controller-admission   ClusterIP   10.105.53.5    <none>        443/TCP                                      4m12s
```
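The PORT(S) column shows the metrics port 10254 published as NodePort 31235, so `curl <node-ip>:31235/metrics` should also work from outside the cluster. Prometheus itself does not need the NodePort: the ServiceMonitor resolves the Service's Endpoints and scrapes each Pod at its target port. A small sketch for extracting the mapping when scripting such checks; `parse_port_column` is a hypothetical helper that assumes kubectl's `port[:nodePort]/protocol` notation.

```python
from __future__ import annotations


# Hypothetical helper: parse the PORT(S) cell of `kubectl get svc`.
def parse_port_column(column: str) -> dict[int, int | None]:
    """Map Service port -> nodePort from a PORT(S) cell like '80:31303/TCP,...'."""
    mapping: dict[int, int | None] = {}
    for entry in column.split(","):
        spec = entry.split("/", 1)[0]             # drop the /TCP protocol suffix
        port, _, node_port = spec.partition(":")  # ClusterIP ports have no ':nodePort'
        mapping[int(port)] = int(node_port) if node_port else None
    return mapping


if __name__ == "__main__":
    print(parse_port_column("80:31303/TCP,443:31299/TCP,10254:31235/TCP"))
```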

3.2 Create the ServiceMonitor

```
root@k8s-master1:~# cd kube-prometheus/manifests/
root@k8s-master1:~/kube-prometheus/manifests# vim ingress-nginx-servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: ingress-nginx-controller
  namespace: monitoring
  labels:
    app: ingress-nginx
spec:
  endpoints:
  - interval: 30s
    port: http-metrics  # must exactly match ports.name in the Service
    path: /metrics
    scheme: http
  selector:  # matches the labels on the ingress-nginx Service
    matchLabels:
      app.kubernetes.io/name: ingress-nginx
      app.kubernetes.io/component: controller
  namespaceSelector:
    matchNames:
    - ingress-nginx  # namespace of the ingress-nginx Service
root@k8s-master1:~/kube-prometheus/manifests# kubectl apply -f ingress-nginx-servicemonitor.yaml
```
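The three ServiceMonitors in this article differ only in name, target namespace, selector labels, port name and interval. If you add more components, generating the manifest avoids copy-paste mistakes. A minimal sketch, assuming you are happy to apply JSON instead of YAML (kubectl accepts both); `service_monitor` and the script name in the usage note are hypothetical.

```python
from __future__ import annotations

import json


# Hypothetical generator for ServiceMonitor manifests (JSON is valid input for kubectl).
def service_monitor(name: str, target_namespace: str, match_labels: dict[str, str],
                    port: str, interval: str = "30s", path: str = "/metrics",
                    namespace: str = "monitoring",
                    labels: dict[str, str] | None = None) -> dict:
    """Build a ServiceMonitor with the same fields as the hand-written YAML."""
    metadata: dict = {"name": name, "namespace": namespace}
    if labels:
        metadata["labels"] = dict(labels)
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": metadata,
        "spec": {
            "endpoints": [{"interval": interval, "port": port, "path": path, "scheme": "http"}],
            "selector": {"matchLabels": dict(match_labels)},
            "namespaceSelector": {"matchNames": [target_namespace]},
        },
    }


if __name__ == "__main__":
    manifest = service_monitor(
        "ingress-nginx-controller", "ingress-nginx",
        {"app.kubernetes.io/name": "ingress-nginx", "app.kubernetes.io/component": "controller"},
        port="http-metrics",
        labels={"app": "ingress-nginx"},
    )
    print(json.dumps(manifest, indent=2))
```

Usage: `python3 gen_servicemonitor.py | kubectl apply -f -` applies the generated manifest from stdin.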

Open Prometheus in a browser: the ingress-nginx instance has been added.


3.3 Add a Grafana dashboard

https://grafana.com/grafana/dashboards/14314-kubernetes-nginx-ingress-controller-nextgen-devops-nirvana/

That completes monitoring the Calico components and ingress-nginx with Prometheus Operator!



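Notes:
If you spot any omissions or mistakes, corrections are welcome.
This article is 100% original; please credit the author if you repost it and respect the work that went into it.
Likes, follows and comments are much appreciated! Your support is my biggest motivation to keep writing. See you in the comments~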
