Commit Graph

184 Commits

Author SHA1 Message Date
Swati Sehgal
bc941633c1 node: topology-mgr: add metric to measure topology mgr admission latency
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-02-15 13:59:47 +00:00
Kubernetes Prow Robot
4df945853e
Merge pull request #115137 from swatisehgal/topologymgr-metrics
node: topologymgr: add metrics about admission requests and errors
2023-01-30 18:43:00 -08:00
Patrick Ohly
bc6c7fa912 logging: fix names of keys
The stricter checking with the upcoming logcheck v0.4.1 pointed out these names
which don't comply with our recommendations in
https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/migration-to-structured-logging.md#name-arguments.
2023-01-23 14:24:29 +01:00
Swati Sehgal
172c55d310 node: topologymgr: add metrics about admission requests and errors
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-01-17 17:50:29 +00:00
Paco Xu
70e56fa71a cleanup: EphemeralContainers feature gate related codes 2023-01-15 21:15:01 +08:00
Peter Hunt
1a7388c2ef kubelet/metrics: add cri_metrics
that pulls metrics from the CRI

Signed-off-by: Peter Hunt <pehunt@redhat.com>
2022-11-08 14:47:08 -05:00
David Ashpole
64af1adace
Second attempt: Plumb context to Kubelet CRI calls (#113591)
* plumb context from CRI calls through kubelet

* clean up extra timeouts

* try fixing incorrectly cancelled context
2022-11-05 06:02:13 -07:00
Kubernetes Prow Robot
1bf4af4584
Merge pull request #111930 from azylinski/new-histogram-pod_start_sli_duration_seconds
New histogram: Pod start SLI duration
2022-11-04 07:28:14 -07:00
Francesco Romani
ff44dc1932 cpumanager: the FG is locked to default (ON)
hence we can remove the if() guards, the feature
is always available.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2022-11-02 18:41:41 +01:00
Antonio Ojea
9c2b333925 Revert "plumb context from CRI calls through kubelet"
This reverts commit f43b4f1b95.
2022-11-02 13:37:23 +00:00
Kubernetes Prow Robot
9bbd0fbdb2
Merge pull request #113476 from marosset/hpc-to-stable
Promoting WindowsHostProcessContainers to stable
2022-11-01 19:59:43 -07:00
Kubernetes Prow Robot
7b84436168
Merge pull request #113408 from dashpole/kubelet_context
Plumb context to Kubelet CRI calls
2022-11-01 19:59:08 -07:00
Mark Rossetti
498d065cc5
Promoting WindowsHostProcessContainers to stable
Signed-off-by: Mark Rossetti <marosset@microsoft.com>
2022-11-01 14:06:25 -07:00
David Ashpole
f43b4f1b95
plumb context from CRI calls through kubelet 2022-10-28 02:55:28 +00:00
Francesco Romani
47d3299781 node: metrics: cpumanager: add pinning metrics
In order to improve the observability of the cpumanager,
add and populate metrics to track if the combination of
the kubelet configuration and podspec would trigger
exclusive core allocation and pinning.

We should avoid leaking any node/machine specific information
(e.g. core ids, even though this is admittedly an extreme example);
tracking these metrics seems to be a good first step, because
it allows us to get feedback without exposing details.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2022-10-27 14:40:40 +02:00
Artur Żyliński
9f31669a53 New histogram: Pod start SLI duration 2022-10-26 11:28:17 +02:00
Kubernetes Prow Robot
9bcb81e13f
Merge pull request #113175 from liggitt/pr_normalize_probes_lifecycle_handlers
Record event and metric for lifecycle fallback to http
2022-10-20 02:31:08 -07:00
Jordan Liggitt
a5d785fae8
Record metric for lifecycle fallback to http 2022-10-19 14:45:25 -04:00
Francesco Romani
ba6b468982 node: metrics: register podresources metrics
Because of a bug in the commit 1e7bb20c52,
podresources metrics were added, they are updated in the right
places, but they are never exported, so they cannot be consumed.
Fix trivially registering the metrics.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2022-10-06 15:14:56 +02:00
Clayton Coleman
e9a5fb7372
kubelet: Record a metric for latency of pod status update
Track how long it takes for pod updates to propagate from detection
to successful change on API server. Will guide future improvements
in pod start and shutdown latency.

Metric is `kubelet_pod_status_sync_duration_seconds` and is ALPHA
stability. Histogram buckets are chosen based on distribution of
observed status delays in practice.
2022-09-08 12:17:44 -04:00
JunYang
c71e3a7802 When metrics are counted, discard the wrong container startup time metrics 2022-07-15 08:56:12 +08:00
JunYang
f33652ce61 Fix kubelet panic when accessing metrics/resource endpoint 2022-07-14 16:38:48 +08:00
Kubernetes Prow Robot
b0254c8a0b
Merge pull request #108758 from fengzixu/improvement-volume-health
re-push "add volume kubelet_volume_stats_health_abnormal to kubelet #105585"
2022-03-29 17:35:34 -07:00
Kubernetes Prow Robot
5cb6fab8f6 Merge pull request #105585 from fengzixu/improvement-volume-health
add volume kubelet_volume_stats_health_abnormal to kubelet
2022-03-17 01:32:38 +00:00
fengzixu
7d675381f8 fix: fix panic bug when volumeHealthStatus is nil 2022-03-17 01:32:24 +00:00
Maciej Borsz
aa95513982
Revert "add volume kubelet_volume_stats_health_abnormal to kubelet" 2022-03-16 13:44:09 +01:00
Kubernetes Prow Robot
1a5abe5d1f
Merge pull request #105585 from fengzixu/improvement-volume-health
add volume kubelet_volume_stats_health_abnormal to kubelet
2022-03-15 05:58:11 -07:00
Shiming Zhang
5eb3e88f6b Support metrics for node shutdown 2022-03-11 17:31:10 +08:00
fengzixu
9808ae48a0 change the volume health status metrics name 2022-01-23 02:44:10 +00:00
Sergey Kanzhelev
7e7bc6d53b remove DynamicKubeletConfig logic from kubelet 2022-01-19 22:38:04 +00:00
fengzixu
5d544d3f01 fix comment 2022-01-11 14:28:31 +00:00
fengzixu
f96449f2e2 fix unit test 2022-01-11 13:50:18 +00:00
fengzixu
e2b5b5465a improve metrics comment 2022-01-11 13:50:18 +00:00
fengzixu
c1a58d715c fix unit test 2022-01-11 13:50:18 +00:00
fengzixu
5593e27429 improve metrics comment 2022-01-11 13:50:18 +00:00
fengzixu
1cdc694ac2 fix unit test 2022-01-11 13:50:18 +00:00
fengzixu
4a72f08a28 add useful comment for volume stats metrics 2022-01-11 13:50:18 +00:00
fengzixu
ed7fd0ced5 add volumeHealth label to metrics 2022-01-11 13:50:17 +00:00
fengzixu
bab1755274 fix: correct metrics expression 2022-01-11 13:50:17 +00:00
fengzixu
d71e21e01e add volume kubelet_volume_stats_health_abnormal to kubelet 2022-01-11 13:50:17 +00:00
Kubernetes Prow Robot
19591a1324
Merge pull request #105829 from yuanchen8911/master
Fix and improve comments on kubelet metrics
2022-01-04 23:02:32 -08:00
Davanum Srinivas
9405e9b55e
Check in OWNERS modified by update-yamlfmt.sh
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2021-12-09 21:31:26 -05:00
Elana Hashman
b35c500541
Revert "Bump DynamicKubeConfig metric deprecation to 1.23" 2021-11-17 11:48:49 -08:00
Mark Rossetti
ef324d6bbd Adding kubelet metrics for started and failed to start HostProcess containers
Signed-off-by: Mark Rossetti <marosset@microsoft.com>
2021-11-04 14:39:57 -07:00
Yuan Chen
b99495d1d9 Fix and improve comments on kubelet metrics 2021-10-21 17:38:25 -07:00
Kubernetes Prow Robot
cab54856f1
Merge pull request #104933 from vikramcse/automate_mockery
conversion of tests from mockery to mockgen
2021-09-30 18:33:21 -07:00
vikram Jadhav
0de4397490 mockery to mockgen conversion 2021-09-25 16:15:08 +00:00
yxxhero
35df409a7e remove StartedPodsErrorsTotal metrice message
Signed-off-by: yxxhero <aiopsclub@163.com>
2021-09-23 22:18:56 +08:00
Elana Hashman
d2ed3b28b7
Revert "revert Bump DynamicKubeConfig metric deprecation to 1.23 by delta update" 2021-08-06 08:38:56 -07:00
kerthcet
980cf85439 revert Bump DynamicKubeConfig metric deprecation to 1.23 by delta update
Signed-off-by: kerthcet <kerthcet@gmail.com>
2021-08-02 23:15:10 +08:00