Prometheus Exporter - Direct Instrumentation vs. Custom Collector

I am currently writing a Prometheus exporter for a telemetry network application.

I have read the "Writing Exporters" documentation, and while I understand the use case for implementing a custom collector to avoid race conditions, I am not sure whether my use case is actually a fit for direct instrumentation.

Basically, the network metrics are streamed over gRPC by the network devices themselves, so my exporter simply receives them rather than having to scrape them.

I have used direct instrumentation with the following code:

  • I declare my metrics using the promauto package to keep the code compact:

package metrics

import (
    "github.com/lucabrasi83/prom-high-obs/proto/telemetry"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
)

var (
    cpu5Sec = promauto.NewGaugeVec(
        prometheus.GaugeOpts{
            Name: "cisco_iosxe_iosd_cpu_busy_5_sec_percentage",
            Help: "The IOSd daemon CPU busy percentage over the last 5 seconds",
        },
        []string{"node"},
    )
)
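For reference, promauto registers these collectors with prometheus.DefaultRegisterer, so exposing them over HTTP is typically just a matter of serving the default promhttp handler somewhere in main. A minimal sketch, assuming a plain net/http server (the listen address is arbitrary):

package main

import (
    "log"
    "net/http"

    "github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
    // promauto-created metrics such as cpu5Sec are already registered with the
    // default registry, so the stock handler picks them up automatically.
    http.Handle("/metrics", promhttp.Handler())
    log.Fatal(http.ListenAndServe(":2112", nil))
}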

Below is how I simply set the metric value from the decoded gRPC protocol buffer message:

cpu5Sec.WithLabelValues(msg.GetNodeIdStr()).Set(float64(val))

Finally, here is my main loop, which basically handles the telemetry gRPC streams for the metrics I am interested in:

for {
    req, err := stream.Recv()
    if err == io.EOF {
        return nil
    }
    if err != nil {
        logging.PeppaMonLog(
            "error",
            fmt.Sprintf("Error while reading client %v stream: %v", clientIPSocket, err))

        return err
    }

    data := req.GetData()

    msg := &telemetry.Telemetry{}

    err = proto.Unmarshal(data, msg)
    if err != nil {
        log.Fatalln(err)
    }

    if !logFlag {
        logging.PeppaMonLog(
            "info",
            fmt.Sprintf(
                "Telemetry Subscription Request Received - Client %v - Node %v - YANG Model Path %v",
                clientIPSocket, msg.GetNodeIdStr(), msg.GetEncodingPath(),
            ),
        )
    }
}
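For completeness, the gauge update shown earlier sits inside this loop right after proto.Unmarshal succeeds, guarded by the YANG encoding path of the message. A rough sketch, where cpuYANGEncodingPath and parseCPUBusyPercent are simplified placeholders for my actual constant and extraction code:

// Hypothetical placement inside the for loop, after a successful Unmarshal:
if msg.GetEncodingPath() == cpuYANGEncodingPath { // placeholder constant for the CPU YANG path
    val := parseCPUBusyPercent(msg) // placeholder helper returning the 5-second CPU value as float64
    cpu5Sec.WithLabelValues(msg.GetNodeIdStr()).Set(val)
}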


I use Grafana as the frontend and, so far, have not seen any particular discrepancy when correlating the metrics exposed by Prometheus with the metrics checked directly on the device.


So I would like to understand whether this follows Prometheus best practices, or whether I should still go down the custom collector route.



1 Answer


You are not following best practice, because you are using global metrics, which is exactly what the article you linked to warns against. With your current implementation, your dashboard will show some arbitrary, constant value for a device's CPU metric forever after the device disconnects (or, more precisely, until your exporter is restarted).

Instead, the RPC method should maintain a set of local metrics and remove them once the method returns. That way, a device's metrics disappear from the scrape output when the device disconnects.

Here is one way to do this. It uses a map holding the currently active metrics. Each map element is the set of metrics for one particular stream (which, as I understand it, corresponds to one device); once the stream ends, that entry is removed.

package main

import (
    "sync"

    "github.com/prometheus/client_golang/prometheus"
)

// Exporter is a prometheus.Collector implementation.
type Exporter struct {
    // We need some way to map gRPC streams to their metrics. Using the stream
    // itself as a map key is simple enough, but anything works as long as we
    // can remove metrics once the stream ends.
    sync.Mutex
    Metrics map[StreamServer]*DeviceMetrics
}

type DeviceMetrics struct {
    sync.Mutex

    CPU prometheus.Metric
}

// Globally defined descriptions are fine.
var cpu5SecDesc = prometheus.NewDesc(
    "cisco_iosxe_iosd_cpu_busy_5_sec_percentage",
    "The IOSd daemon CPU busy percentage over the last 5 seconds",
    []string{"node"},
    nil, // constant labels
)

// Collect implements prometheus.Collector.
func (e *Exporter) Collect(ch chan<- prometheus.Metric) {
    // Copy the current metrics so we don't hold the lock for long if ch's
    // consumer is slow.
    var metrics []prometheus.Metric

    e.Lock()
    for _, deviceMetrics := range e.Metrics {
        deviceMetrics.Lock()
        metrics = append(metrics,
            deviceMetrics.CPU,
        )
        deviceMetrics.Unlock()
    }
    e.Unlock()

    for _, m := range metrics {
        if m != nil {
            ch <- m
        }
    }
}

// Describe implements prometheus.Collector.
func (e *Exporter) Describe(ch chan<- *prometheus.Desc) {
    ch <- cpu5SecDesc
}

// Service is the gRPC service implementation.
type Service struct {
    exp *Exporter
}

func (s *Service) RPCMethod(stream StreamServer) (*Response, error) {
    deviceMetrics := new(DeviceMetrics)

    s.exp.Lock()
    s.exp.Metrics[stream] = deviceMetrics
    s.exp.Unlock()

    defer func() {
        // Stop emitting metrics for this stream.
        s.exp.Lock()
        delete(s.exp.Metrics, stream)
        s.exp.Unlock()
    }()

    for {
        req, err := stream.Recv()
        if err != nil {
            return nil, err // TODO: handle io.EOF and other errors separately
        }

        var msg *Telemetry = parseRequest(req) // Your existing code that unmarshals the nested message.

        var (
            metricField *prometheus.Metric
            metric      prometheus.Metric
        )

        switch msg.GetEncodingPath() {
        case CpuYANGEncodingPath:
            metricField = &deviceMetrics.CPU
            metric = prometheus.MustNewConstMetric(
                cpu5SecDesc,
                prometheus.GaugeValue,
                ParsePBMsgCpuBusyPercent(msg), // func(*Telemetry) float64
                msg.GetNodeIdStr(),            // value for the "node" label
            )
        default:
            continue
        }

        deviceMetrics.Lock()
        *metricField = metric
        deviceMetrics.Unlock()
    }
}
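One wiring detail the sketch above leaves out is initializing the Metrics map and registering the Exporter so its Collect method is actually scraped. A minimal sketch, assuming the same placeholder types and a dedicated (non-default) registry:

// Wiring sketch: share one Exporter between the gRPC service and the registry.
exp := &Exporter{Metrics: make(map[StreamServer]*DeviceMetrics)}

reg := prometheus.NewRegistry()
reg.MustRegister(exp)

// Expose the custom registry instead of the default one.
http.Handle("/metrics", promhttp.HandlerFor(reg, promhttp.HandlerOpts{}))

// The gRPC service updates the same Exporter instance from its stream handlers.
svc := &Service{exp: exp}
_ = svc // register svc with your gRPC server as usual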