Commit Graph

109 Commits

Author SHA1 Message Date
Mark Rossetti
498d065cc5
Promoting WindowsHostProcessContainers to stable
Signed-off-by: Mark Rossetti <marosset@microsoft.com>
2022-11-01 14:06:25 -07:00
Francesco Romani
47d3299781 node: metrics: cpumanager: add pinning metrics
In order to improve the observability of the cpumanager,
add and populate metrics to track if the combination of
the kubelet configuration and podspec would trigger
exclusive core allocation and pinning.

We should avoid leaking any node/machine specific information
(e.g. core ids, even though this is admittedly an extreme example);
tracking these metrics seems to be a good first step, because
it allows us to get feedback without exposing details.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2022-10-27 14:40:40 +02:00
Kubernetes Prow Robot
9bcb81e13f
Merge pull request #113175 from liggitt/pr_normalize_probes_lifecycle_handlers
Record event and metric for lifecycle fallback to http
2022-10-20 02:31:08 -07:00
Jordan Liggitt
a5d785fae8
Record metric for lifecycle fallback to http 2022-10-19 14:45:25 -04:00
Francesco Romani
ba6b468982 node: metrics: register podresources metrics
Because of a bug in the commit 1e7bb20c52,
podresources metrics were added, they are updated in the right
places, but they are never exported, so they cannot be consumed.
Fix trivially registering the metrics.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2022-10-06 15:14:56 +02:00
Clayton Coleman
e9a5fb7372
kubelet: Record a metric for latency of pod status update
Track how long it takes for pod updates to propagate from detection
to successful change on API server. Will guide future improvements
in pod start and shutdown latency.

Metric is `kubelet_pod_status_sync_duration_seconds` and is ALPHA
stability. Histogram buckets are chosen based on distribution of
observed status delays in practice.
2022-09-08 12:17:44 -04:00
Kubernetes Prow Robot
b0254c8a0b
Merge pull request #108758 from fengzixu/improvement-volume-health
re-push "add volume kubelet_volume_stats_health_abnormal to kubelet #105585"
2022-03-29 17:35:34 -07:00
Kubernetes Prow Robot
5cb6fab8f6 Merge pull request #105585 from fengzixu/improvement-volume-health
add volume kubelet_volume_stats_health_abnormal to kubelet
2022-03-17 01:32:38 +00:00
Maciej Borsz
aa95513982
Revert "add volume kubelet_volume_stats_health_abnormal to kubelet" 2022-03-16 13:44:09 +01:00
Kubernetes Prow Robot
1a5abe5d1f
Merge pull request #105585 from fengzixu/improvement-volume-health
add volume kubelet_volume_stats_health_abnormal to kubelet
2022-03-15 05:58:11 -07:00
Shiming Zhang
5eb3e88f6b Support metrics for node shutdown 2022-03-11 17:31:10 +08:00
fengzixu
9808ae48a0 change the volume health status metrics name 2022-01-23 02:44:10 +00:00
Sergey Kanzhelev
7e7bc6d53b remove DynamicKubeletConfig logic from kubelet 2022-01-19 22:38:04 +00:00
fengzixu
bab1755274 fix: correct metrics expression 2022-01-11 13:50:17 +00:00
fengzixu
d71e21e01e add volume kubelet_volume_stats_health_abnormal to kubelet 2022-01-11 13:50:17 +00:00
Kubernetes Prow Robot
19591a1324
Merge pull request #105829 from yuanchen8911/master
Fix and improve comments on kubelet metrics
2022-01-04 23:02:32 -08:00
Elana Hashman
b35c500541
Revert "Bump DynamicKubeConfig metric deprecation to 1.23" 2021-11-17 11:48:49 -08:00
Mark Rossetti
ef324d6bbd Adding kubelet metrics for started and failed to start HostProcess containers
Signed-off-by: Mark Rossetti <marosset@microsoft.com>
2021-11-04 14:39:57 -07:00
Yuan Chen
b99495d1d9 Fix and improve comments on kubelet metrics 2021-10-21 17:38:25 -07:00
yxxhero
35df409a7e remove StartedPodsErrorsTotal metrice message
Signed-off-by: yxxhero <aiopsclub@163.com>
2021-09-23 22:18:56 +08:00
Elana Hashman
d2ed3b28b7
Revert "revert Bump DynamicKubeConfig metric deprecation to 1.23 by delta update" 2021-08-06 08:38:56 -07:00
kerthcet
980cf85439 revert Bump DynamicKubeConfig metric deprecation to 1.23 by delta update
Signed-off-by: kerthcet <kerthcet@gmail.com>
2021-08-02 23:15:10 +08:00
Elana Hashman
b5f24c334e
Bump DynamicKubeConfig metric deprecation to 1.23 2021-07-28 09:29:57 -07:00
Kubernetes Prow Robot
4d78db54a5
Merge pull request #103580 from tkestack/fix-version-format
fix kubelet panic when DynamicKubeletConfig enabled
2021-07-08 14:02:24 -07:00
Kubernetes Prow Robot
7c84064a4f
Merge pull request #99000 from verb/1.21-kubelet-metrics
Add kubelet metrics for ephemeral containers
2021-07-08 14:00:55 -07:00
Li Bo
79e230ea21 fix kubelet panic when DynamicKubeletConfig enabled 2021-07-08 16:20:51 +08:00
Sergey Kanzhelev
dffc2a60a2 deprecate and disable by default DynamicKubeletConfig feature flag 2021-07-02 23:53:11 +00:00
Lee Verberne
30d2ad576a Remove ManagedPod,ManagedContainer metrics
This replaces the generic ManagedPod and ManagedContainer kubelet
metrics with a gauge to track only ephemeral container usage.
2021-06-15 19:02:07 +02:00
pacoxu
650666406e update kubelet_running_pods metrics comments: pods that have a running pod sandbox
Signed-off-by: pacoxu <paco.xu@daocloud.io>
Co-authored-by: Elana Hashman <ehashman@users.noreply.github.com>
2021-04-29 11:05:52 +08:00
Lee Verberne
29178fff1c Add kubelet managed pod metrics 2021-04-13 14:13:30 +02:00
Francesco Romani
1e7bb20c52 kubelet: podresources: per-endpoint metrics
Before the addition of GetAllocatableResources, the
podresources API had just one endpoint `List()`, thus we could just
account for the total of the calls to have a good pulse of the API usage.
Now that we extend the API with more endpoints
(`GetAlloctableResources`), in order to improve the observability we add
per-endpoint counters, in addition to the existing counter of the total
API calls.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-03-09 13:14:58 +01:00
jialaijun
15612338e5 Migrate pkg/kubelet/metrics logs to structured logging. 2021-02-14 09:41:35 +08:00
wawa0210
f28f0953e6
Adjust kubelet_cgroup_manager_duration_seconds bucket 2021-01-19 16:23:14 +08:00
00041544
f2b8fdb265 Define const for metric name 2020-11-30 14:40:26 +08:00
Alvaro Aleman
801a52c06d
Allow debugging kubelet image pull times
This PR changes the buckets of the
kubelet_runtime_operation_duration_seconds metric to be
metrics.ExponentialBuckets(.005, 2.5, 14) in order to
allow debugging image pull times. Right now the biggest bucket is 10
seconds, which is an ordinary time frame to pull an image, making the
metric useless for the aforementioned usecase.
2020-11-11 20:18:36 -05:00
Renaud Gaubert
969e45f49f Add the pod_resources_endpoint_requests_total metric 2020-10-27 11:23:39 -07:00
RyderXia
2214117cd1 clean up unused var containerCache 2020-07-21 16:57:36 +08:00
RainbowMango
168c695e1a Update two metrics name to make promlint happy. 2020-06-23 15:16:18 +08:00
Davanum Srinivas
442a69c3bd
switch over k/k to use klog v2
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2020-05-16 07:54:27 -04:00
Tim Allclair
43c7f3be29 Register RunPodSandbox* metrics 2020-01-28 13:26:11 -08:00
danielqsj
ab182552b4 clean SinceInMicroseconds, convert to SinceInSeconds 2020-01-10 17:05:38 +08:00
danielqsj
1a9b121764 remove deprecated metrics of kubelet 2020-01-10 16:46:52 +08:00
Kubernetes Prow Robot
49bc696614
Merge pull request #86251 from bboreham/pleg-last-seen-metric
Kubelet: add a metric to observe time since PLEG last seen
2020-01-06 18:06:18 -08:00
Bryan Boreham
cc0b3e82eb Kubelet: add a metric to observe time since PLEG last seen
Expose the measurement that kubelet uses to judge that "PLEG is
unhealthy". If we can observe the measurement growing then we can
alert before the node goes unhealthy.

Note that the existing metrics PLEGRelistInterval and
PLEGRelistDuration are poor for this, because when relist() gets
stuck they are never updated.

Signed-off-by: Bryan Boreham <bryan@weave.works>
2020-01-03 10:01:27 +00:00
yiyang5055
0f410d625a change CounterVec to use Counter in the Kubelet's Pod Lifecycle Event Generator 2019-12-11 23:51:28 +08:00
RainbowMango
30bf1f47dd Hide kubelet metrics that have been deprecated in 1.14 2019-11-13 19:17:38 +08:00
Kubernetes Prow Robot
b3dde20411
Merge pull request #84907 from RainbowMango/pr_migrate_custom_collector_kubelet
migrate kubelet custom metrics to stability framework part 1
2019-11-10 19:43:56 -08:00
Kubernetes Prow Robot
9646bd9736
Merge pull request #83664 from RainbowMango/pr_refactor_kubelet_ut_with_metrics_testutil
Refactor kubelet ut with metrics testutil
2019-11-10 19:43:42 -08:00
RainbowMango
ee4394a306 Migrate custom collector for kubelet 2019-11-08 09:16:57 +08:00
Clayton Coleman
3c44e11cfa
kubelet: Record preemptions similarly to evictions
A preemption is a disruption event that should have a metric so that
the rate of preemption can be assessed. Nodes that are under heavy
preemption may have conflicting workloads or otherwise need attention.
A sudden burst of preemption on a cluster in steady state could
indicate pathological conditions within the scheduler or workload
controllers.
2019-10-19 19:07:37 -04:00