Using Prometheus relabeling in Ceph monitoring

1. Problem description

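My work environment has three independent Ceph clusters, responsible for object storage, block storage, and file storage respectively. When I set these clusters up, I did not fully appreciate how hard it is to rename a Ceph cluster, so all of them use the default cluster name, ceph. Unfortunately, Prometheus's ceph_exporter relies on the cluster name to tell clusters apart. As a result, Grafana could not distinguish the data of the individual clusters: everything was drawn in the same charts, which was messy, and some data did not display correctly at all.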

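You might say: just change the Ceph cluster name. The problem is that renaming a cluster is not that simple. Ceph's storage directories are tied to the cluster name, so many configuration files and data directories would have to be changed for the new name to take effect, which is risky for clusters that are already in production use. Setting up a separate Prometheus and Grafana environment for each Ceph cluster would also solve the problem, but that felt too crude, and I did not want to resort to it unless there was no other way.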

My first idea was to modify ceph_exporter: if the cluster name cannot tell the clusters apart, adding Ceph's fsid as a label surely can.

However, it is hard to tell at a glance which Ceph cluster an fsid refers to, so this is not a good solution either.

In the end, thanks to neurodrone, I learned about Prometheus's relabel feature, which solves this problem neatly.

2. relabel configuration

Relabeling is meant for modifying the label fields of exported metrics. It can filter metrics, drop unnecessary ones, rename labels, and also change label values.
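The operations listed above can be sketched as follows. This is only an illustration: the job name, target, metric-name pattern, and the pool_name label are made up for the example. It also assumes the rules should act on labels of the scraped samples, which in Prometheus is configured under metric_relabel_configs (relabel_configs acts on the target's labels before the scrape).

```yaml
scrape_configs:
  - job_name: 'example'              # hypothetical job name
    metric_relabel_configs:
      # Filter: drop every scraped series whose metric name matches the
      # regex (the pattern is a made-up example).
      - source_labels: [__name__]
        regex: 'ceph_debug_.*'
        action: drop
      # Rename, step 1: copy the value of "pool" into a new label
      # "pool_name" (hypothetical name). The default regex (.*) and the
      # default replacement $1 copy the value unchanged.
      - source_labels: [pool]
        target_label: pool_name
        action: replace
      # Rename, step 2: remove the old label. For labeldrop the regex is
      # matched against label names, not values.
      - regex: 'pool'
        action: labeldrop
    static_configs:
      - targets: ['example:9128']    # hypothetical target
```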

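For example, ceph_pool_write_total from all three clusters carries the label cluster with the value ceph. In the Prometheus configuration, however, the three clusters belong to different jobs, so we can add a relabel rule to each job that changes the cluster label value and thereby tells them apart.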

# cluster1's metric
ceph_pool_write_total{cluster="ceph",pool=".rgw.root"} 4
# cluster2's metric
ceph_pool_write_total{cluster="ceph",pool=".rgw.root"} 10
# cluster3's metric
ceph_pool_write_total{cluster="ceph",pool=".rgw.root"} 7

The configuration is as follows: each job sets the value to its own name (ceph1, ceph2, ceph3) and writes it to a new label, clusters.

scrape_configs:
  - job_name: 'ceph1'
    relabel_configs:
    # Write the fixed value "ceph1" into a new label named "clusters".
    - source_labels: ["cluster"]
      replacement: "ceph1"
      action: replace
      target_label: "clusters"
    static_configs:
    - targets: ['ceph1:9128']
      labels:
        alias: ceph1

  - job_name: 'ceph2'
    relabel_configs:
    - source_labels: ["cluster"]
      replacement: "ceph2"
      action: replace
      target_label: "clusters"
    static_configs:
    - targets: ['ceph2:9128']
      labels:
        alias: ceph2

  - job_name: 'ceph3'
    relabel_configs:
    - source_labels: ["cluster"]
      replacement: "ceph3"
      action: replace
      target_label: "clusters"
    static_configs:
    - targets: ['ceph3:9128']
      labels:
        alias: ceph3

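After the change, the metrics look like this (only the labels relevant here are shown), so the data of the different Ceph clusters can be told apart.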

# cluster1's metric
ceph_pool_write_total{clusters="ceph1",pool=".rgw.root"} 4
# cluster2's metric
ceph_pool_write_total{clusters="ceph2",pool=".rgw.root"} 10
# cluster3's metric
ceph_pool_write_total{clusters="ceph3",pool=".rgw.root"} 7
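As a variation, and not the setup used in the rest of this article, one could overwrite the existing cluster label in place instead of adding a new clusters label. The sketch below shows this for the first job only. It assumes the exporter really emits cluster="ceph" on its samples, and it uses metric_relabel_configs because the rule has to see the labels of the scraped samples.

```yaml
scrape_configs:
  - job_name: 'ceph1'
    metric_relabel_configs:
      # Runs after the scrape, on each sample's labels: where the sample
      # carries cluster="ceph", replace that value with "ceph1".
      - source_labels: ["cluster"]
        regex: "ceph"
        target_label: "cluster"
        replacement: "ceph1"
        action: replace
    static_configs:
      - targets: ['ceph1:9128']
```

With this variant, dashboards that already filter on cluster would need no new label, whereas the dashboard changes described in the next section assume the clusters label.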

3. Grafana dashboard adjustments

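Changing the Prometheus configuration is not enough, since the difference also has to show up in the UI, so the Grafana dashboard needs matching changes. The dashboard used in this article is Ceph - Cluster.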

First, add a clusters variable to the dashboard; this can be done in the UI.
Start by clicking the dashboard's "settings" button (the one with the gear icon).

Then add a variable named clusters and save.

The new variable now appears on the dashboard.

Next, the query of each panel has to be changed accordingly.

Author: blackpiglet
Link: https://www.jianshu.com/p/bb94c24e55de