Expose the measurement that kubelet uses to judge that "PLEG is
unhealthy". If we can observe the measurement growing then we can
alert before the node goes unhealthy.
Note that the existing metrics PLEGRelistInterval and
PLEGRelistDuration are poor for this, because when relist() gets
stuck they are never updated.
Signed-off-by: Bryan Boreham <bryan@weave.works>
A preemption is a disruption event that should have a metric so that
the rate of preemption can be assessed. Nodes that are under heavy
preemption may have conflicting workloads or otherwise need attention.
A sudden burst of preemption on a cluster in steady state could
indicate pathological conditions within the scheduler or workload
controllers.
As already mentioned in this issue https://github.com/kubernetes/kubernetes/issues/79286, some metrics like
"running_pod_count" and "running_container_count" uses non-standard prometheus metrics, this change converts them to be
standard prometheus gauges
Minor refactor in kubelet/pleg/generic.go and added some test for ruuning container and running pod metrics
Fixed issues related to github CI pipeline failure
* Updated bazel for new deps
* Add comment for exported metrics variables,RuuningContainerCount and RunningPodCount
* Specify keys explicitly in Guage metric instantation
Fix go lint errors
Replace "+=1" with "++", as reported by go lint
Set container state as a label for the metrics "running_container_count"
As per the metrics name "running_container_count" it should "ideally" be showing
the number of containers in "running" state , but it was showing all the container count, irrespective of the state it is in.
This commit adds a new label "container_running_state" to the metrics "running_container_count", which doesn't change the base metrics but adds the
option to query the metrics with "container_state" such as "running"/"unknown/...
remove unused methods reported by staticcheck
Remove variables while instantiating gauge(vec) which are default set to nil
Convert kubelet metrics(running_pod_count and running_container_count) to standard gauges and added label to running_container_count metrics.
Currently kubelet metrics(running_pod_count and running_container_count) use non-standard prometheus collectors , this change
converts them to standard prometheus gauges. Also this adds a new label(container_state) to running_container_count which does a breakdown of
containers tracked by kubelet based on the containers' state(running/unknown/created/exited).
Set statbility explicitly for running_pod_count and running_container_count and reformat test
register metrics explicitly in test , so that they don't become no-op
1) Add suffix (`seconds` or `total`) to metric name
2) Switch Summary metric to Histogram metric (Summary metrics are not
supported completely by prometheus-to-sd and can't be aggregated.)
- Move from the old github.com/golang/glog to k8s.io/klog
- klog as explicit InitFlags() so we add them as necessary
- we update the other repositories that we vendor that made a similar
change from glog to klog
* github.com/kubernetes/repo-infra
* k8s.io/gengo/
* k8s.io/kube-openapi/
* github.com/google/cadvisor
- Entirely remove all references to glog
- Fix some tests by explicit InitFlags in their init() methods
Change-Id: I92db545ff36fcec83afe98f550c9e630098b3135
This PR exports config-releated metrics from the Kubelet.
The Guages for active, assigned, and last-known-good config can be used
to identify config versions and produce aggregate counts across several
nodes. The error-reporting Gauge can be used to determine whether a node
is experiencing a config-related error, and to prodouce an aggregate
count of nodes in an error state.
* Metrics will not be expose until they are hooked up to a handler
* Metrics are not cached and expose a dos vector, this must be fixed before release or the stats should not be exposed through an api endpoint
Add a new latency metric for the time from seeing the pod for the first time
to starting a pod worker for it.
Also, change PodStartLatency to include this initial processing latency.
Functions Build/ParseDockerName now work with struct instead of the long
list of arguments. This new struct also was reused in the kubelet.go
instead of auxilary podContainer struct.