Commit Graph

29 Commits

Author SHA1 Message Date
RainbowMango
debe2f7b43 Refactor TestRunningPodAndContainerCount with metrics testutil 2019-10-09 15:09:23 +08:00
Rajdeep Das
c02d49d775 Update running_pod_count and running_container_count metric
As already mentioned in this issue https://github.com/kubernetes/kubernetes/issues/79286, some metrics like
"running_pod_count" and "running_container_count" uses non-standard prometheus metrics, this change converts them to be
standard prometheus gauges

Minor refactor in kubelet/pleg/generic.go and added some test for ruuning container and running pod metrics

Fixed issues related to github CI pipeline failure

* Updated bazel for new deps
* Add comment for exported metrics variables,RuuningContainerCount and RunningPodCount
* Specify keys explicitly in Guage metric instantation

Fix go lint errors

Replace "+=1" with "++", as reported by go lint

Set container state as a label for the metrics "running_container_count"

As per the metrics name "running_container_count" it should "ideally" be showing
the number of containers in "running" state , but it was showing all the container count, irrespective of the state it is in.
This commit adds a new label "container_running_state" to the metrics "running_container_count", which doesn't change the base metrics but adds the
option to query the metrics with "container_state" such as "running"/"unknown/...

remove unused methods reported by staticcheck

Remove variables while instantiating gauge(vec) which are default set to nil

Convert kubelet metrics(running_pod_count and running_container_count) to standard gauges and added label to running_container_count metrics.

Currently kubelet metrics(running_pod_count and running_container_count) use non-standard prometheus collectors , this change
converts them to standard prometheus gauges. Also this adds a new label(container_state) to running_container_count which does a breakdown of
containers tracked by kubelet based on the containers' state(running/unknown/created/exited).

Set statbility explicitly for running_pod_count and running_container_count and reformat test

register metrics explicitly in test , so that they don't become no-op
2019-08-29 17:23:04 +02:00
Khaled Henidak(Kal)
dba434c4ba kubenet for ipv6 dualstack 2019-07-02 22:26:25 +00:00
changyaowei
850f4bbd36 modify random failure 2019-04-27 08:04:58 +08:00
changyaowei
123d1a925f modify random failure 2019-04-15 20:26:00 +08:00
changyaowei
19f73899fc modify test case 2019-02-13 16:27:15 +08:00
changyaowei
c70ee4272b delete prometheus in unit testing 2019-01-31 12:18:02 +08:00
changyaowei
b52afc350f when pleg channel is full, discard events and record how many events discard 2019-01-30 20:43:54 +08:00
Seth Jennings
5eab76934b improve pleg error msg when it has never been successful 2018-10-01 16:41:01 -05:00
Dan Williams
8c16260160 kubelet: fix inconsistent display of terminated pod IPs by using events instead
PLEG and kubelet race when reading and sending pod status to the apiserver.  PLEG
inserts status into a cache, and then signals kubelet.  Kubelet then eventually
reads the status out of that cache, but in the mean time the status could have
been changed by PLEG.

When a pod exits, pod status will no longer include the pod's IP address because
the network plugin/runtime will report "" for terminated pod IPs.  If this status
gets inserted into the PLEG cache before kubelet gets the status out of the cache,
kubelet will see a blank pod IP address.  This happens in about 1/5 of cases when
pods are short-lived, and somewhat less frequently for longer running pods.

To ensure consistency for properties of dead pods, copy an old status update's
IP address over to the new status update if (a) the new status update's IP is
missing and (b) all sandboxes of the pod are dead/not-ready (eg, no possibility
for a valid IP from the sandbox).

Fixes: https://github.com/kubernetes/kubernetes/issues/47265
2017-07-21 09:52:10 -05:00
Clayton Coleman
3e095d12b4
Refactor move of client-go/util/clock to apimachinery 2017-05-20 14:19:48 -04:00
deads2k
5a8f075197 move authoritative client-go utils out of pkg 2017-01-24 08:59:18 -05:00
deads2k
c47717134b move utils used in restclient to client-go 2017-01-19 07:55:14 -05:00
deads2k
6a4d5cd7cc start the apimachinery repo 2017-01-11 09:09:48 -05:00
Yu-Ju Hong
a49d28710a Extend PLEG to handle pod sandboxes
PLEG will treat them as if they are regular containers and detect changes the
same manner. Note that this makes an assumption that container IDs will not
collide with the podsandbox IDs.
2016-08-30 09:54:24 -07:00
Harry Zhang
cb14b35bde Refactor util clock into it's own pkg 2016-07-28 02:29:04 -04:00
Ron Lai
a58c774c08 Including ContainerRemoved in PLEG event reporting 2016-07-14 16:39:03 -07:00
David McMahon
ef0c9f0c5b Remove "All rights reserved" from all the headers. 2016-06-29 17:47:36 -07:00
Dan Williams
9865ac325c kubelet/cni: make cni plugin runtime agnostic
Use the generic runtime method to get the netns path.  Also
move reading the container IP address into cni (based off kubenet)
instead of having it in the Docker manager code.  Both old and new
methods use nsenter and /sbin/ip and should be functionally
equivalent.
2016-06-22 11:36:10 -05:00
Andy Goldstein
3a87bfb6f7 PLEG: reinspect pods that failed prior inspections
Fix the following sequence of events:

1. relist call 1 successfully inspects a pod (just has infra container)
1. relist call 2 gets an error inspecting the same pod (has infra container and a transient
container that failed to create) and doesn't update the old/new pod records
1. relist calls 3+ don't inspect the pod any more (just has infra container so it doesn't look like
anything changed)

This change adds a new list that keeps track of pods that failed inspection and retries them the
next time relist is called. Without this change, a pod in this state would never be inspected again,
its entry in the status cache would never be updated, and the pod worker would never call syncPod
again because the most recent entry in the status cache has an error associated with it. Without
this change, pods in this state would be stuck Terminating forever, unless the user issued a
deletion with a grace period value of 0.
2016-05-03 11:06:35 -04:00
harry
b0900bf0d4 Refactor diff into sub pkg 2016-03-21 20:21:39 +08:00
k8s-merge-robot
3f16f5f2b8 Merge pull request #22233 from yujuhong/pleg_health
Auto commit by PR queue bot
2016-03-03 11:01:26 -08:00
Yu-Ju Hong
4846c1e1b2 pleg: add an internal clock for testability
Also add tests for the health check.
2016-03-01 17:53:03 -08:00
Yu-Ju Hong
e770f25882 pleg: add more tests for detecting missing container/pods 2016-03-01 17:23:23 -08:00
Tim St. Clair
7b6d843309 Move test-only files to test-only packages 2016-03-01 09:11:32 -08:00
Yu-Ju Hong
b56ed1a8c2 Support populating the runtime cache in PLEG
This changes does not turn on this feature (cache) for kubelet.
2016-01-13 10:19:47 -08:00
Yu-Ju Hong
73a4f8225c PLEG should report events if a container is removed
Currently, pleg would report a event if a container transitions from running to
exited between relisting. However, if would not report any event if a container
gets stopped and removed between relisting. This event will eventually be
handled when the pod syncs periodically, but this is undesirable. This change
ensures that we detect all such events.
2016-01-12 16:25:19 -08:00
Random-Liu
3cbdf79f8c Change original PodStatus to APIPodStatus, and start using kubelet internal PodStatus in dockertools 2015-12-04 17:37:39 -08:00
Yu-Ju Hong
bc6414a873 kubelet: add a generic pod lifecycle event generator
This change introduces pod lifecycle event generator (PLEG), and adds a generic
PLEG. The generic PLEG relies on relisting to discover container events, and is
container-runtime-agnostic. Both docker and rkt are changed to use generic
PLEG.
2015-11-13 09:55:36 -08:00