Commit Graph

85 Commits

Author SHA1 Message Date
RainbowMango
168c695e1a Update two metrics name to make promlint happy. 2020-06-23 15:16:18 +08:00
Kubernetes Prow Robot
628102d038
Merge pull request #85390 from xiaoanyunfei/bugfix/plegTestRelisting
fix pleg TestRelisting
2020-06-20 14:40:38 -07:00
Sergey Kanzhelev
ee53488f19 fix golint issues in pkg/kubelet/container 2020-06-19 15:48:08 +00:00
xiaofei.sun
ddf1c5d3e9 fix pleg TestRelisting 2020-06-18 21:46:55 +08:00
Davanum Srinivas
07d88617e5
Run hack/update-vendor.sh
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2020-05-16 07:54:33 -04:00
Davanum Srinivas
442a69c3bd
switch over k/k to use klog v2
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2020-05-16 07:54:27 -04:00
Kubernetes Prow Robot
de34d2ce1e
Merge pull request #87193 from mattjmcnaughton/mattjmcnaughton/cleanup-rkt-code-in-pleg
Clean up rkt specific code in `pkg/kubelet/pleg`
2020-01-14 22:21:46 -08:00
mattjmcnaughton
ab7e0f58d5
Clean up rkt specific code in pkg/kubelet/pleg
Clean up code in PLEG which was only necessary for the `rkt` runtime.

Rkt is no longer a built-in runtime and docker(shim) uses the CRI, so
its safe to remove this code entirely.

This diff removes the last mentions of `rkt` in the kubelet.
2020-01-14 07:42:30 -05:00
danielqsj
1a9b121764 remove deprecated metrics of kubelet 2020-01-10 16:46:52 +08:00
Kubernetes Prow Robot
49bc696614
Merge pull request #86251 from bboreham/pleg-last-seen-metric
Kubelet: add a metric to observe time since PLEG last seen
2020-01-06 18:06:18 -08:00
Bryan Boreham
cc0b3e82eb Kubelet: add a metric to observe time since PLEG last seen
Expose the measurement that kubelet uses to judge that "PLEG is
unhealthy". If we can observe the measurement growing then we can
alert before the node goes unhealthy.

Note that the existing metrics PLEGRelistInterval and
PLEGRelistDuration are poor for this, because when relist() gets
stuck they are never updated.

Signed-off-by: Bryan Boreham <bryan@weave.works>
2020-01-03 10:01:27 +00:00
yiyang5055
0f410d625a change CounterVec to use Counter in the Kubelet's Pod Lifecycle Event Generator 2019-12-11 23:51:28 +08:00
RainbowMango
6099d49046 Deal with auto-generated files.
- Update bazel by hack/update-bazel.sh
2019-10-09 15:12:21 +08:00
RainbowMango
debe2f7b43 Refactor TestRunningPodAndContainerCount with metrics testutil 2019-10-09 15:09:23 +08:00
Rajdeep Das
c02d49d775 Update running_pod_count and running_container_count metric
As already mentioned in this issue https://github.com/kubernetes/kubernetes/issues/79286, some metrics like
"running_pod_count" and "running_container_count" uses non-standard prometheus metrics, this change converts them to be
standard prometheus gauges

Minor refactor in kubelet/pleg/generic.go and added some test for ruuning container and running pod metrics

Fixed issues related to github CI pipeline failure

* Updated bazel for new deps
* Add comment for exported metrics variables,RuuningContainerCount and RunningPodCount
* Specify keys explicitly in Guage metric instantation

Fix go lint errors

Replace "+=1" with "++", as reported by go lint

Set container state as a label for the metrics "running_container_count"

As per the metrics name "running_container_count" it should "ideally" be showing
the number of containers in "running" state , but it was showing all the container count, irrespective of the state it is in.
This commit adds a new label "container_running_state" to the metrics "running_container_count", which doesn't change the base metrics but adds the
option to query the metrics with "container_state" such as "running"/"unknown/...

remove unused methods reported by staticcheck

Remove variables while instantiating gauge(vec) which are default set to nil

Convert kubelet metrics(running_pod_count and running_container_count) to standard gauges and added label to running_container_count metrics.

Currently kubelet metrics(running_pod_count and running_container_count) use non-standard prometheus collectors , this change
converts them to standard prometheus gauges. Also this adds a new label(container_state) to running_container_count which does a breakdown of
containers tracked by kubelet based on the containers' state(running/unknown/created/exited).

Set statbility explicitly for running_pod_count and running_container_count and reformat test

register metrics explicitly in test , so that they don't become no-op
2019-08-29 17:23:04 +02:00
Tim Allclair
a2c51674cf Cleanup more static check issues (S1*,ST*) 2019-08-21 10:40:21 -07:00
Khaled Henidak(Kal)
dba434c4ba kubenet for ipv6 dualstack 2019-07-02 22:26:25 +00:00
changyaowei
850f4bbd36 modify random failure 2019-04-27 08:04:58 +08:00
changyaowei
123d1a925f modify random failure 2019-04-15 20:26:00 +08:00
Davanum Srinivas
33081c1f07
New staging repository for cri-api
Change-Id: I2160b0b0ec4b9870a2d4452b428e395bbe12afbb
2019-03-26 18:21:04 -04:00
danielqsj
79a3eb816c rename latency to duration in metrics 2019-02-18 17:40:04 +08:00
danielqsj
9fd99a48f5 Change kubelet metrics to conform guideline 2019-02-18 14:01:58 +08:00
Kubernetes Prow Robot
289a60ad71
Merge pull request #72709 from changyaowei/pleg_relist
When pleg channel is full, discard events and record its count
2019-02-13 01:44:48 -08:00
changyaowei
19f73899fc modify test case 2019-02-13 16:27:15 +08:00
xichengliudui
5dd26ecab5 Fix function comment to consistent with its name
update pull request

update pull request
2019-02-12 01:37:20 -05:00
changyaowei
c70ee4272b delete prometheus in unit testing 2019-01-31 12:18:02 +08:00
changyaowei
b52afc350f when pleg channel is full, discard events and record how many events discard 2019-01-30 20:43:54 +08:00
Robert Krawitz
3373fcf0fc Reduce logspam for crash looping containers 2018-11-28 10:48:52 -05:00
Davanum Srinivas
954996e231
Move from glog to klog
- Move from the old github.com/golang/glog to k8s.io/klog
- klog as explicit InitFlags() so we add them as necessary
- we update the other repositories that we vendor that made a similar
change from glog to klog
  * github.com/kubernetes/repo-infra
  * k8s.io/gengo/
  * k8s.io/kube-openapi/
  * github.com/google/cadvisor
- Entirely remove all references to glog
- Fix some tests by explicit InitFlags in their init() methods

Change-Id: I92db545ff36fcec83afe98f550c9e630098b3135
2018-11-10 07:50:31 -05:00
k8s-ci-robot
45f6845a59
Merge pull request #69008 from sjenning/better-pleg-msg
improve pleg error msg when it has never been successful
2018-10-30 16:15:43 -07:00
Seth Jennings
5eab76934b improve pleg error msg when it has never been successful 2018-10-01 16:41:01 -05:00
Pingan2017
158552ff35 fix golint failures - /pkg/kubelet/images 2018-09-17 10:52:25 +08:00
Jeff Grafton
23ceebac22 Run hack/update-bazel.sh 2018-06-22 16:22:57 -07:00
Jeff Grafton
ef56a8d6bb Autogenerated: hack/update-bazel.sh 2018-02-16 13:43:01 -08:00
Lee Verberne
e10042d22f Increment CRI version from v1alpha1 to v1alpha2
This also incorporates the version string into the package name so
that incompatibile versions will fail to connect.

Arbitrary choices:
- The proto3 package name is runtime.v1alpha2. The proto compiler
  normally translates this to a go package of "runtime_v1alpha2", but
  I renamed it to "v1alpha2" for consistency with existing packages.
- kubelet/apis/cri is used as "internalapi". I left it alone and put the
  public "runtimeapi" in kubelet/apis/cri/runtime.
2018-02-07 09:06:26 +01:00
Jeff Grafton
efee0704c6 Autogenerate BUILD files 2017-12-23 13:12:11 -08:00
Marcin Owsiany
36dc1c4515 Fix typo in function name.
Also remove a superfluous comment.
2017-10-17 11:31:46 +02:00
Jeff Grafton
aee5f457db update BUILD files 2017-10-15 18:18:13 -07:00
Kubernetes Submit Queue
28df7a1cae Merge pull request #47806 from dcbw/fix-pod-ip-race
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..

kubelet: fix inconsistent display of terminated pod IPs

PLEG and kubelet race when reading and sending pod status to the apiserver.  PLEG
inserts status into a cache, and then signals kubelet.  Kubelet then eventually
reads the status out of that cache, but in the mean time the status could have
been changed by PLEG.

When a pod exits, pod status will no longer include the pod's IP address because
the network plugin/runtime will report "" for terminated pod IPs.  If this status
gets inserted into the PLEG cache before kubelet gets the status out of the cache,
kubelet will see a blank pod IP address.  This happens in about 1/5 of cases when
pods are short-lived, and somewhat less frequently for longer running pods.

To ensure consistency for properties of dead pods, copy an old status update's
IP address over to the new status update if (a) the new status update's IP is
missing and (b) all sandboxes of the pod are dead/not-ready (eg, no possibility
for a valid IP from the sandbox).

Fixes: https://github.com/kubernetes/kubernetes/issues/47265
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1449373

@eparis @freehan @kubernetes/rh-networking @kubernetes/sig-network-misc
2017-09-22 21:01:50 -07:00
Casey Davenport
be5cd7fed2 Recreate pod sandbox when the sandbox does not have an IP address. 2017-09-15 09:23:52 -07:00
Jeff Grafton
a7f49c906d Use buildozer to delete licenses() rules except under third_party/ 2017-08-11 09:32:39 -07:00
Jeff Grafton
33276f06be Use buildozer to remove deprecated automanaged tags 2017-08-11 09:31:50 -07:00
Dan Williams
8c16260160 kubelet: fix inconsistent display of terminated pod IPs by using events instead
PLEG and kubelet race when reading and sending pod status to the apiserver.  PLEG
inserts status into a cache, and then signals kubelet.  Kubelet then eventually
reads the status out of that cache, but in the mean time the status could have
been changed by PLEG.

When a pod exits, pod status will no longer include the pod's IP address because
the network plugin/runtime will report "" for terminated pod IPs.  If this status
gets inserted into the PLEG cache before kubelet gets the status out of the cache,
kubelet will see a blank pod IP address.  This happens in about 1/5 of cases when
pods are short-lived, and somewhat less frequently for longer running pods.

To ensure consistency for properties of dead pods, copy an old status update's
IP address over to the new status update if (a) the new status update's IP is
missing and (b) all sandboxes of the pod are dead/not-ready (eg, no possibility
for a valid IP from the sandbox).

Fixes: https://github.com/kubernetes/kubernetes/issues/47265
2017-07-21 09:52:10 -05:00
Kubernetes Submit Queue
c1f8fcd9fe Merge pull request #45496 from andyxning/fix_pleg_relist_time
Automatic merge from submit-queue

fix pleg relist time

This PR fix pleg reslist time. According to current implementation, we have a `Healthy` method periodically check the relist time. If current timestamp subtracts latest relist time is longer than `relistThreshold`(default is 3 minutes), we should return an error to indicate the error of runtime.

`relist` method is also called periodically. If runtime(docker) hung, the relist method should return immediately without updating the latest relist time. If we update latest relist time no matter runtime(docker) hung(default timeout is 2 minutes), the `Healthy` method will never return an error.

```release-note
Kubelet PLEG updates the relist timestamp only after successfully relisting.
```

/cc @yujuhong @Random-Liu @dchen1107
2017-05-21 04:17:14 -07:00
Clayton Coleman
3e095d12b4
Refactor move of client-go/util/clock to apimachinery 2017-05-20 14:19:48 -04:00
Andy Xie
af6c040630 fix pleg relist time 2017-05-18 11:40:04 +08:00
Mike Danese
a05c3c0efd autogenerated 2017-04-14 10:40:57 -07:00
deads2k
5a8f075197 move authoritative client-go utils out of pkg 2017-01-24 08:59:18 -05:00
deads2k
c47717134b move utils used in restclient to client-go 2017-01-19 07:55:14 -05:00
Kubernetes Submit Queue
9a88687e24 Merge pull request #37865 from yujuhong/decouple_lifecycle
Automatic merge from submit-queue

kubelet: remove the pleg health check from healthz

This prevents kubelet from being killed when docker hangs.

Also, kubelet will report node not ready if PLEG hangs (`docker ps` + `docker inspect`).
2017-01-12 19:10:14 -08:00