Commit Graph

164 Commits

Author SHA1 Message Date
Eric Ernst
8dfc548709 resource-metrics: add pod/sandbox metrics to endpoint
Pod metrics may not be the same as the sum of container metrics. Add support for pod specific
metrics to allow for more accurate accounting of resources.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2020-11-03 09:57:46 -08:00
Renaud Gaubert
969e45f49f Add the pod_resources_endpoint_requests_total metric 2020-10-27 11:23:39 -07:00
Marek Siarkowicz
7d309e0104 Move Kubelet Summary API to staging repo 2020-09-22 18:23:28 +02:00
David Ashpole
0ffc149ccc move dashpole to emeritus in kubelet 2020-09-16 11:52:35 -07:00
RyderXia
b20ceaa85d regen 2020-07-22 10:53:11 +08:00
RyderXia
d76c2cc94c update build 2020-07-22 09:36:55 +08:00
RyderXia
2214117cd1 clean up unused var containerCache 2020-07-21 16:57:36 +08:00
RainbowMango
168c695e1a Update two metrics name to make promlint happy. 2020-06-23 15:16:18 +08:00
Seth Jennings
45d2b98aa8 add sjenning as kubelet approver 2020-06-19 13:00:55 -05:00
Kubernetes Prow Robot
677e8d6871
Merge pull request #86223 from dashpole/owners_changes
Add dashpole as kubelet approver
2020-06-18 22:59:58 -07:00
Davanum Srinivas
07d88617e5
Run hack/update-vendor.sh
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2020-05-16 07:54:33 -04:00
Davanum Srinivas
442a69c3bd
switch over k/k to use klog v2
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2020-05-16 07:54:27 -04:00
David Ashpole
86192d4b9a fix cpu resource metric type by changing to counter 2020-03-26 13:30:36 -07:00
Tim Allclair
43c7f3be29 Register RunPodSandbox* metrics 2020-01-28 13:26:11 -08:00
Kubernetes Prow Robot
be26fbc638
Merge pull request #86282 from RainbowMango/pr_refactor_resource_endpoint
Refactor kubelet resource metrics
2020-01-14 02:23:09 -08:00
danielqsj
ab182552b4 clean SinceInMicroseconds, convert to SinceInSeconds 2020-01-10 17:05:38 +08:00
danielqsj
1a9b121764 remove deprecated metrics of kubelet 2020-01-10 16:46:52 +08:00
Kubernetes Prow Robot
49bc696614
Merge pull request #86251 from bboreham/pleg-last-seen-metric
Kubelet: add a metric to observe time since PLEG last seen
2020-01-06 18:06:18 -08:00
Bryan Boreham
cc0b3e82eb Kubelet: add a metric to observe time since PLEG last seen
Expose the measurement that kubelet uses to judge that "PLEG is
unhealthy". If we can observe the measurement growing then we can
alert before the node goes unhealthy.

Note that the existing metrics PLEGRelistInterval and
PLEGRelistDuration are poor for this, because when relist() gets
stuck they are never updated.

Signed-off-by: Bryan Boreham <bryan@weave.works>
2020-01-03 10:01:27 +00:00
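The reasoning in commit cc0b3e82eb can be illustrated with a minimal Python sketch. The class and method names below are hypothetical and are not taken from the kubelet source; the sketch only shows why a "time since last relist" reading keeps growing while relist is stuck, whereas a recorded duration stays frozen at its last value.

```python
class RelistClock:
    """Hypothetical tracker for relist timing (illustration only)."""

    def __init__(self, now):
        self.last_relist = now      # time the last relist completed
        self.last_duration = 0.0    # duration of the last completed relist

    def relist_finished(self, started, finished):
        # Only runs when a relist completes, so a stuck relist never updates it.
        self.last_duration = finished - started
        self.last_relist = finished

    def seconds_since_last_relist(self, now):
        # Computed at read time, so it grows even while relist is stuck.
        return now - self.last_relist


clock = RelistClock(now=0.0)
clock.relist_finished(started=1.0, finished=1.5)

# Suppose relist then hangs: nothing calls relist_finished again.
stale_duration = clock.last_duration
age_at_200s = clock.seconds_since_last_relist(now=200.0)
```

An alert on the growing age can fire while the duration value still looks healthy.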
RainbowMango
c394d821fd Deal with auto-generated files:
- Update bazel by hack/update-bazel.sh
2019-12-16 10:27:02 +08:00
RainbowMango
0db7074e1a Add new endpoint for resource metrics. 2019-12-16 10:26:54 +08:00
David Ashpole
fca84c02bb add dashpole as kubelet approver 2019-12-12 11:10:24 -08:00
yiyang5055
0f410d625a change CounterVec to use Counter in the Kubelet's Pod Lifecycle Event Generator 2019-12-11 23:51:28 +08:00
RainbowMango
30bf1f47dd Hide kubelet metrics that have been deprecated in 1.14 2019-11-13 19:17:38 +08:00
Kubernetes Prow Robot
b3dde20411
Merge pull request #84907 from RainbowMango/pr_migrate_custom_collector_kubelet
migrate kubelet custom metrics to stability framework part 1
2019-11-10 19:43:56 -08:00
Kubernetes Prow Robot
9646bd9736
Merge pull request #83664 from RainbowMango/pr_refactor_kubelet_ut_with_metrics_testutil
Refactor kubelet ut with metrics testutil
2019-11-10 19:43:42 -08:00
RainbowMango
e4a128a4f7 Deal with auto-generated files.
Update bazel by hack/update-bazel.sh
2019-11-08 09:16:57 +08:00
RainbowMango
ee4394a306 Migrate custom collector for kubelet 2019-11-08 09:16:57 +08:00
Kubernetes Prow Robot
aebc8eae9d
Merge pull request #83713 from RainbowMango/pr_refactor_kubelet_collector_test
Refactor kubelet collector test
2019-10-24 22:55:38 -07:00
Clayton Coleman
3c44e11cfa
kubelet: Record preemptions similarly to evictions
A preemption is a disruption event that should have a metric so that
the rate of preemption can be assessed. Nodes that are under heavy
preemption may have conflicting workloads or otherwise need attention.
A sudden burst of preemption on a cluster in steady state could
indicate pathological conditions within the scheduler or workload
controllers.
2019-10-19 19:07:37 -04:00
RainbowMango
633bb52b49 Deal with auto-generated files.
Update bazel by hack/update-bazel.sh
2019-10-11 09:26:58 +08:00
RainbowMango
3b07393ea8 Refactor UT with testutil from k/k. 2019-10-11 09:25:17 +08:00
SataQiu
23a8be6e5f remove direct references to prometheus/testutil from kubelet/metrics 2019-10-10 12:56:28 +08:00
RainbowMango
debe2f7b43 Refactor TestRunningPodAndContainerCount with metrics testutil 2019-10-09 15:09:23 +08:00
SataQiu
77f42c8108 eliminate direct references to prometheus 2019-10-04 21:33:34 +08:00
Rajdeep Das
c02d49d775 Update running_pod_count and running_container_count metric
As already mentioned in this issue https://github.com/kubernetes/kubernetes/issues/79286, some metrics like
"running_pod_count" and "running_container_count" use non-standard prometheus metrics; this change converts them to
standard prometheus gauges.

Minor refactor in kubelet/pleg/generic.go and added some tests for running container and running pod metrics

Fixed issues related to github CI pipeline failure

* Updated bazel for new deps
* Add comment for exported metrics variables, RunningContainerCount and RunningPodCount
* Specify keys explicitly in Gauge metric instantiation

Fix go lint errors

Replace "+=1" with "++", as reported by go lint

Set container state as a label for the metrics "running_container_count"

As per the metrics name "running_container_count" it should "ideally" be showing
the number of containers in "running" state , but it was showing all the container count, irrespective of the state it is in.
This commit adds a new label "container_running_state" to the metrics "running_container_count", which doesn't change the base metrics but adds the
option to query the metrics with "container_state" such as "running"/"unknown/...

remove unused methods reported by staticcheck

Remove variables while instantiating gauge(vec) which are default set to nil

Convert kubelet metrics(running_pod_count and running_container_count) to standard gauges and added label to running_container_count metrics.

Currently kubelet metrics(running_pod_count and running_container_count) use non-standard prometheus collectors , this change
converts them to standard prometheus gauges. Also this adds a new label(container_state) to running_container_count which does a breakdown of
containers tracked by kubelet based on the containers' state(running/unknown/created/exited).

Set stability explicitly for running_pod_count and running_container_count and reformat test

register metrics explicitly in test , so that they don't become no-op
2019-08-29 17:23:04 +02:00
Han Kang
3a50917795 migrate kubelet's metrics/probes & metrics endpoint to metrics stability framework 2019-08-28 11:16:38 -07:00
Seth Jennings
23b69cf02d kubelet: add eviction counter to metrics 2019-08-13 15:21:38 -05:00
obitech
a5bc997aa9 Fixed pull-kubernetes-verify issues 2019-08-03 21:07:12 +02:00
obitech
457972f1a4 Fix suggestions, track removed library in bazel 2019-08-03 21:07:12 +02:00
obitech
898c40a484 Fix golint failures in some pkg/kubelet packages
Fixed:
- pkg/kubelet/pod
- pkg/kubelet/metrics
- pkg/kubelet/configmap
- pkg/kubelet/config
2019-08-03 21:07:12 +02:00
Ryan Phillips
2bdf975d5b kubelet: add UID to kubelet_container_log_filesystem_used_bytes metric
buildPodRef creates a unique key with the {podName, namespace, UID}
tuple. By omitting the UID in the metric, duplicate metrics can be sent
to prometheus causing 500's on the /metrics endpoint.
2019-07-26 14:59:43 -05:00
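The duplicate-series problem described in commit 2bdf975d5b can be sketched in a few lines of Python. The pod data and the helper names `label_key` and `duplicate_keys` are hypothetical, not kubelet code; the sketch only shows how a label set without the UID can collide when two pods share a name and namespace.

```python
from collections import Counter

# Hypothetical pods: same name and namespace, different UIDs
# (for example, a pod that was deleted and recreated).
pods = [
    {"name": "web-0", "namespace": "default", "uid": "uid-a"},
    {"name": "web-0", "namespace": "default", "uid": "uid-b"},
]


def label_key(pod, include_uid):
    """Build the label tuple that identifies one metric series."""
    key = (pod["namespace"], pod["name"])
    return key + (pod["uid"],) if include_uid else key


def duplicate_keys(pods, include_uid):
    """Return the label tuples that would be emitted more than once."""
    counts = Counter(label_key(p, include_uid) for p in pods)
    return [k for k, n in counts.items() if n > 1]


without_uid = duplicate_keys(pods, include_uid=False)
with_uid = duplicate_keys(pods, include_uid=True)
```

With the UID included, every series has a distinct label set, so the collision disappears.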
Ted Yu
5d1bb99fcd Log warning if config labels deletion returns false 2019-07-16 09:46:12 -07:00
Seth Jennings
89dc2c65e4 kubelet: add sjenning to kubelet subdirectory owners files 2019-06-03 08:26:24 -05:00
David Ashpole
a95cf017e1 add dashpole to kubelet owners files 2019-05-29 13:33:48 -07:00
Yu-Ju Hong
191666d6a3 Fix computing of cpu nano core usage
CRI runtimes do not supply cpu nano core usage as it is not part of CRI
stats. However, there are upstream components that still rely on such
stats to function. The previous fix was faulty because the multiple
callers could compete and update the stats, causing
inconsistent/incoherent metrics. This change, instead, creates a
separate call for updating the usage, and rely on eviction manager,
which runs periodically, to trigger the updates. The caveat is that if
eviction manager is completely turned off, no one would compute the
usage.
2019-03-05 09:25:40 -08:00
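As a rough illustration of the approach described in commit 191666d6a3, the following Python sketch derives a nanocore rate from two cumulative CPU samples and keeps the result in a cache that is updated through one dedicated call. The names `usage_nano_cores` and `CpuUsageCache` are assumptions made for this example and do not reflect the actual kubelet implementation.

```python
def usage_nano_cores(prev, cur):
    """Average CPU usage in nanocores between two cumulative samples.

    Each sample is (timestamp_ns, usage_core_nanoseconds). Returns None when
    the samples cannot be compared (no time elapsed, or the counter went back).
    """
    prev_ts, prev_usage = prev
    cur_ts, cur_usage = cur
    elapsed_ns = cur_ts - prev_ts
    if elapsed_ns <= 0 or cur_usage < prev_usage:
        return None
    # Core-nanoseconds consumed per nanosecond of wall time gives cores used;
    # scaling by 1e9 expresses that in nanocores.
    return (cur_usage - prev_usage) * 1_000_000_000 // elapsed_ns


class CpuUsageCache:
    """Hypothetical cache with a single update path (illustration only)."""

    def __init__(self):
        self._last_sample = None
        self._nano_cores = None

    def update(self, sample):
        # Intended to be called by exactly one periodic caller, so readers
        # never move the baseline sample and skew each other's results.
        if self._last_sample is not None:
            value = usage_nano_cores(self._last_sample, sample)
            if value is not None:
                self._nano_cores = value
        self._last_sample = sample

    def get(self):
        # Read-only: returns the last computed value, or None if not ready.
        return self._nano_cores


cache = CpuUsageCache()
cache.update((0, 0))
first_read = cache.get()
cache.update((10_000_000_000, 5_000_000_000))  # 5 core-seconds over 10 s
second_read = cache.get()
```

Separating `update` from `get` mirrors the caveat in the commit message: if nothing calls `update`, `get` never returns a fresh value.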
Kubernetes Prow Robot
272d78f1d9
Merge pull request #73966 from alculquicondor/fix/lint-kubelet-server
Fix lint on pkg/kubelet/server/...
2019-02-25 20:27:48 -08:00
Aldo Culquicondor
e61cd68bf3 Fix lint on pkg/kubelet/server/... 2019-02-21 10:31:41 -05:00
haiyanmeng
ec18200f8b Fit RuntimeClass metrics to prometheus conventions
1) Add suffix (`seconds` or `total`) to metric name
2) Switch Summary metric to Histogram metric (Summary metrics are not
supported completely by prometheus-to-sd and can't be aggregated.)
2019-02-19 12:46:37 -08:00
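The aggregation point made in commit ec18200f8b can be shown with a small Python sketch. The bucket bounds and node data are made up for illustration; the sketch builds cumulative bucket counts in the style of a Prometheus histogram and shows that per-node counts can simply be summed, which precomputed Summary quantiles do not allow.

```python
from bisect import bisect_left

# Hypothetical bucket upper bounds in seconds; the last bucket catches everything.
BUCKETS = [0.1, 0.5, 1.0, float("inf")]


def histogram(observations):
    """Return cumulative counts per bucket (count of values <= each bound)."""
    counts = [0] * len(BUCKETS)
    for value in observations:
        first = bisect_left(BUCKETS, value)  # first bucket whose bound >= value
        for i in range(first, len(BUCKETS)):
            counts[i] += 1
    return counts


def merge(a, b):
    """Aggregate two histograms with identical buckets by adding counts."""
    return [x + y for x, y in zip(a, b)]


node_a = histogram([0.05, 0.3, 0.7])
node_b = histogram([0.2, 2.0])
combined = merge(node_a, node_b)
all_at_once = histogram([0.05, 0.3, 0.7, 0.2, 2.0])
```

Summing the per-node buckets gives the same result as building one histogram from all observations.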
danielqsj
79a3eb816c rename latency to duration in metrics 2019-02-18 17:40:04 +08:00
danielqsj
0bfe4c26b1 add default buckets for histogram metrics 2019-02-18 14:07:30 +08:00
danielqsj
4fa0ee7805 Mark deprecated in related kubelet metrics 2019-02-18 14:03:44 +08:00
danielqsj
0e9515c709 Move kubelet metrics to histogram metrics 2019-02-18 14:03:44 +08:00
danielqsj
9fd99a48f5 Change kubelet metrics to conform guideline 2019-02-18 14:01:58 +08:00
Kubernetes Prow Robot
289a60ad71
Merge pull request #72709 from changyaowei/pleg_relist
When pleg channel is full, discard events and record its count
2019-02-13 01:44:48 -08:00
Kubernetes Prow Robot
459e509f94
Merge pull request #73549 from haiyanmeng/runtimeclass
Add monitoring for RuntimeClass
2019-02-05 15:14:38 -08:00
haiyanmeng
18bcdcecce Add monitoring for RuntimeClass 2019-02-04 16:01:29 -08:00
changyaowei
b52afc350f when pleg channel is full, discard events and record how many events discard 2019-01-30 20:43:54 +08:00
danielqsj
1d73c7daed Add kubelet_node_name metrics 2019-01-15 18:01:04 +08:00
Yecheng Fu
ccb66066a9 vendor github.com/prometheus/client_golang/prometheus/testutil package 2018-12-02 10:25:50 +08:00
Lantao Liu
59e80cdac3 Fix kubelet panic.
Signed-off-by: Lantao Liu <lantaol@google.com>
2018-11-16 16:21:57 -08:00
Frederic Branczyk
4724fca678
pkg/kubelet/stats: Add container log size metric 2018-11-12 22:04:50 +01:00
Davanum Srinivas
954996e231
Move from glog to klog
- Move from the old github.com/golang/glog to k8s.io/klog
- klog has explicit InitFlags() so we add them as necessary
- we update the other repositories that we vendor that made a similar
change from glog to klog
  * github.com/kubernetes/repo-infra
  * k8s.io/gengo/
  * k8s.io/kube-openapi/
  * github.com/google/cadvisor
- Entirely remove all references to glog
- Fix some tests by explicit InitFlags in their init() methods

Change-Id: I92db545ff36fcec83afe98f550c9e630098b3135
2018-11-10 07:50:31 -05:00
Joonyoung Park
e6d02e9410 fix metrics help comment
pod_start_latency_microseconds is not broken down by podname.
2018-07-13 10:26:35 +09:00
Jeff Grafton
23ceebac22 Run hack/update-bazel.sh 2018-06-22 16:22:57 -07:00
Kubernetes Submit Queue
97f4a64fac
Merge pull request #63434 from adfinis-forks/bug_typo_kubelet_volume_stats
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions at https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

Fix typo in volume_stats.go

**What this PR does / why we need it**:
While reviewing the implementation details I came across a typo in volume_stats.go
sed/volumeStatsCollecotr/volumeStatsCollector/

**Release note**:

```release-note
NONE
```
2018-05-24 11:44:20 -07:00
Michael Taufen
fd3432ef05 add dynamic config metrics
This PR exports config-related metrics from the Kubelet.
The Gauges for active, assigned, and last-known-good config can be used
to identify config versions and produce aggregate counts across several
nodes. The error-reporting Gauge can be used to determine whether a node
is experiencing a config-related error, and to produce an aggregate
count of nodes in an error state.
2018-05-22 14:08:55 -07:00
Lukas Grossar
64dee74bb7
Fix typo in volume_stats.go
volumeStatsCollecotr -> volumeStatsCollector
2018-05-04 15:49:07 +02:00
Jeff Grafton
ef56a8d6bb Autogenerated: hack/update-bazel.sh 2018-02-16 13:43:01 -08:00
Yecheng Fu
fecff55c59 Fix kubelet PVC metrics using a volume stats collector.
Volumes on each node change, so we should not only add PVC metrics into a
gauge vector. It's better to use a collector to collect metrics from
stats.
2018-02-11 23:48:06 +08:00
Jiaying Zhang
048bafdd0b Adds device plugin registration count metric and allocation latency metric. 2017-11-21 13:44:10 -08:00
Jeff Grafton
aee5f457db update BUILD files 2017-10-15 18:18:13 -07:00
Yu-Ju Hong
331628b7dc Move prometheus metrics for docker operations into dockershim 2017-09-25 10:03:17 -07:00
Matthew Wong
dac2068bbd Expose PVC metrics via kubelet prometheus 2017-09-01 12:50:17 -04:00
Jeff Grafton
a7f49c906d Use buildozer to delete licenses() rules except under third_party/ 2017-08-11 09:32:39 -07:00
Jeff Grafton
33276f06be Use buildozer to remove deprecated automanaged tags 2017-08-11 09:31:50 -07:00
NickrenREN
ec7bf948d4 Unregister some metrics
delete some registered metrics since they are not observed
2017-05-17 18:31:56 +08:00
Mike Danese
a05c3c0efd autogenerated 2017-04-14 10:40:57 -07:00
David Ashpole
9f7e09ddfe eviction age metrics 2017-04-11 09:07:16 -07:00
Seth Jennings
ccd87fca3f kubelet: add cgroup manager metrics 2017-03-06 08:53:47 -06:00
Jeff Grafton
20d221f75c Enable auto-generating sources rules 2017-01-05 14:14:13 -08:00
Mike Danese
c87de85347 autoupdate BUILD files 2016-12-12 13:30:07 -08:00
Random-Liu
ced5a848f5 Add instrumented CRI service which is enabled for both grpc and non-grpc
integration.
2016-10-25 10:59:27 -07:00
Mike Danese
3b6a067afc autogenerated 2016-10-21 17:32:32 -07:00
David McMahon
ef0c9f0c5b Remove "All rights reserved" from all the headers. 2016-06-29 17:47:36 -07:00
Random-Liu
148588e6a1 1) Add docker operation timeout metrics.
2) Cleanup kubelet stats and add runtime operation error and timeout
rate monitoring.
3) Monitor runtime operation error and timeout rate in
kubelet perf.
2016-05-06 10:53:13 -07:00
Phillip Wittrock
7bca355bb4 Spread pod volume metrics calc across calc period. Metrics are calculated independently. 2016-02-22 09:54:47 -08:00
Phillip Wittrock
3de94cd23c Supply volume fs metrics to server/stats/handler.go
* Metrics will not be exposed until they are hooked up to a handler
* Metrics are not cached and expose a DoS vector; this must be fixed before release or the stats should not be exposed through an API endpoint
2016-02-05 16:00:24 -08:00
Yu-Ju Hong
7d180b337b Record pleg pod relist interval and latency
Relisting latency/interval affects how quickly kubelet discovers changes. Record
the metrics in Prometheus to surface such information.
2016-01-04 10:56:38 -08:00
Miciah Masters
8aa299da90 glog.Warning -> glog.Warningf
Fix three places where glog.Warning is used with a formatted string.
2015-08-19 16:22:28 -04:00
Mike Danese
17defc7383 run gofmt on everything we touched 2015-08-05 17:52:56 -07:00
Mike Danese
8e33cbfa28 rewrite go imports 2015-08-05 17:30:03 -07:00
Yu-Ju Hong
f96a8d0935 Kubelet: record the initial pod processing latency
Add a new latency metric for the time from seeing the pod for the first time
to starting a pod worker for it.

Also, change PodStartLatency to include this initial processing latency.
2015-06-19 12:07:55 -07:00
Prashanth Balasubramanian
831d7a36d0 Scrape /metrics of kubelets from e2e tests 2015-06-16 09:50:40 -07:00
Prashanth Balasubramanian
b5ed0e9b13 Don't generatePodStatus twice for new pods 2015-06-11 17:18:16 -07:00
Victor Marmol
6b0d3d8df0 Add DockerErrors metric in the Kubelet.
Allows the tracking of errors by Docker operation.
2015-06-02 17:38:09 -07:00
Rohit Jnagal
9eb01a6da1 Make SyncPodSync as the default SyncPodType.
We would like the default to be sync instead of create to easily differentiate
create operations in empty metrics map.
2015-05-12 06:25:48 +00:00
Eric Paris
6b3a6e6b98 Make copyright ownership statement generic
Instead of saying "Google Inc." (which is not always correct) say "The
Kubernetes Authors", which is generic.
2015-05-01 17:49:56 -04:00
Victor Marmol
090d0c95fa Remove ImagePull metric in Kubelet.
There is an equivalent metric from our Docker metrics and this one is
harder to maintain with the RuntimeHooks.
2015-04-29 17:12:03 -07:00
Yifan Gu
6c98b9daee kubelet/metrics: Move instrumented_docker.go to dockertools.
This can solve the circular import problem when we move the
kubelet.pullImage to kubelet/metrics or kubelet/container package.
2015-04-24 22:03:11 -07:00