Commit Graph

8604 Commits

Author SHA1 Message Date
Matthias Bertschy
323f99ea8c startupProbe: Kubelet changes 2019-08-30 00:40:26 +02:00
Kubernetes Prow Robot
a9e5c4d6e4 Merge pull request #81968 from mtaufen/node-csr-hash
derive node CSR hashes from public keys
2019-08-29 13:31:41 -07:00
Kubernetes Prow Robot
da986c56ab Merge pull request #73944 from xiaoanyunfei/cleanup/rm_unuse_judge
rm unnecessary judgement
2019-08-29 13:30:57 -07:00
Kevin Klues
eb0216e54e Update semantics to set Preferred field in TopologyHint generation
We now only set Preferred to true if resources can be allocated with a
size equal to the minimimum _possible_ mask when all resources are
available.
2019-08-29 14:32:10 -05:00
Kevin Klues
e0e8b3e4fd Update CPUManager topology helpers to accept multiple ids 2019-08-29 13:22:54 -05:00
Rajdeep Das
c02d49d775 Update running_pod_count and running_container_count metric
As already mentioned in this issue https://github.com/kubernetes/kubernetes/issues/79286, some metrics like
"running_pod_count" and "running_container_count" uses non-standard prometheus metrics, this change converts them to be
standard prometheus gauges

Minor refactor in kubelet/pleg/generic.go and added some test for ruuning container and running pod metrics

Fixed issues related to github CI pipeline failure

* Updated bazel for new deps
* Add comment for exported metrics variables,RuuningContainerCount and RunningPodCount
* Specify keys explicitly in Guage metric instantation

Fix go lint errors

Replace "+=1" with "++", as reported by go lint

Set container state as a label for the metrics "running_container_count"

As per the metrics name "running_container_count" it should "ideally" be showing
the number of containers in "running" state , but it was showing all the container count, irrespective of the state it is in.
This commit adds a new label "container_running_state" to the metrics "running_container_count", which doesn't change the base metrics but adds the
option to query the metrics with "container_state" such as "running"/"unknown/...

remove unused methods reported by staticcheck

Remove variables while instantiating gauge(vec) which are default set to nil

Convert kubelet metrics(running_pod_count and running_container_count) to standard gauges and added label to running_container_count metrics.

Currently kubelet metrics(running_pod_count and running_container_count) use non-standard prometheus collectors , this change
converts them to standard prometheus gauges. Also this adds a new label(container_state) to running_container_count which does a breakdown of
containers tracked by kubelet based on the containers' state(running/unknown/created/exited).

Set statbility explicitly for running_pod_count and running_container_count and reformat test

register metrics explicitly in test , so that they don't become no-op
2019-08-29 17:23:04 +02:00
Kevin Klues
dcc9f66311 Add devicemanager tests for TopologyHint consumption 2019-08-29 08:22:50 -05:00
Kevin Klues
cc567afaf0 Consume TopologyHints in the devicemanager 2019-08-29 08:22:50 -05:00
Kevin Klues
a3320f80d9 Add devicemanager tests for TopologyHint generation 2019-08-29 07:45:43 -05:00
Kevin Klues
d3d7a8f5d4 Generate TopologyHints from the devicemanager 2019-08-29 07:45:43 -05:00
Louise Daly
9a118ceac4 Added stub support for Topology Manager to Device Manager
Co-authored-by: Conor Nolan <conor.nolan@intel.com>
Co-authored-by: Sreemanti Ghosh <sreemanti.ghosh@intel.com>
Co-authored-by: Kevin Klues <kklues@nvidia.com>
2019-08-29 07:45:43 -05:00
Kevin Klues
1c1f19c61c Change Topology.NUMANode in device plugin interface to a repeated field 2019-08-29 07:45:43 -05:00
Kubernetes Prow Robot
7d4d17583b Merge pull request #81722 from klueska/upstream-add-socket-awreness-to-topologymanager
Add NUMA Node awareness to the TopologyManager
2019-08-29 05:30:58 -07:00
sunxiaofei03
45d41ed9e5 replace iteration with hashmap in *state_of_world 2019-08-29 19:22:25 +08:00
KEBE
8dc401d141 Fix sync pod log format and a func typo. 2019-08-29 14:39:43 +08:00
xieyanker
4b775046d4 remove stateCheckPeriod
If exec logForceCheckPeriod, there is no need to exec stateCheckPeriod
2019-08-29 11:06:45 +08:00
Kubernetes Prow Robot
ca5babc1da Merge pull request #81534 from logicalhan/kubelet-migration
migrate kubelet's metrics/probes & metrics endpoint to metrics stability framework
2019-08-28 18:26:45 -07:00
Kevin Klues
ddfd9ac0ca Fix bug in CPUManager with setting topology for policies
Also add a check in the unit tests to avoid regressions
2019-08-28 17:32:25 -05:00
Jordan Liggitt
aef05c8dca Plumb NextProtos to TLS client config, honor http/2 client preference 2019-08-28 16:51:56 -04:00
Kubernetes Prow Robot
c6a506bb8c Merge pull request #78174 from gaorong/oom-event
enrich kubelet system oom event message info
2019-08-28 12:01:13 -07:00
Han Kang
3a50917795 migrate kubelet's metrics/probes & metrics endpoint to metrics stability framework 2019-08-28 11:16:38 -07:00
Kevin Klues
df1b54fc09 Fail fast with TopologyManager on machines with more than 8 NUMA Nodes 2019-08-28 11:04:52 -05:00
Kevin Klues
5660cd3cfb Add NUMA Node awareness to the TopologyManager 2019-08-28 11:04:52 -05:00
Kubernetes Prow Robot
35867b160a Merge pull request #81951 from klueska/upstream-update-cpu-amanger-numa-mapping
Update the CPUManager to include NUMANodeID in its topology information
2019-08-28 08:55:40 -07:00
Kubernetes Prow Robot
879418a714 Merge pull request #81828 from mars1024/bugfix/delete_lo_network
delete lo network when TearDownPod to avoid CNI cache leak
2019-08-28 03:09:11 -07:00
Kubernetes Prow Robot
de1cfa9bc1 Merge pull request #81787 from lmdaly/topology-manager-rename-strict-policy
Renaming strict policy to restricted policy
2019-08-28 01:38:04 -07:00
Kubernetes Prow Robot
08b67378d3 Merge pull request #81397 from ddebroy/win-socket
Support Kubelet PluginWatcher in Windows
2019-08-28 01:37:12 -07:00
Kubernetes Prow Robot
6cf7f3c342 Merge pull request #80320 from wk8/wk8/gmsa_cleanup
Make container removal fail if platform-specific containers fail
2019-08-27 22:41:29 -07:00
Deep Debroy
1321c9115b Support PluginWatcher in Windows
Signed-off-by: Deep Debroy <ddebroy@docker.com>
2019-08-27 16:24:38 -07:00
Kevin Klues
f4dbd29cdb Rename TopologyHint.SocketAffinity to TopologyHint.NUMANodeAffinity
As part of this, update the logic to use the NUMA information instead of
the Socket information when generating and consuming TopologyHints in
the CPUManager.
2019-08-27 16:51:05 -05:00
Kevin Klues
ecc14fe661 Update CPUManager to include NUMANodeID in CPUTopology
Unfortunately, the NUMA information is not readily available from
cadvisor, so we have to roll the logic to discover it by hand. In the
future, we should remove this custiom code to use the information
provided by cadvisor once it is made available.
2019-08-27 16:51:05 -05:00
Kevin Klues
869962fa48 Cache the discovered topology in the CPUManager instead of MachineInfo 2019-08-27 16:23:07 -05:00
Michael Taufen
9dcf4d4ae2 derive node CSR hashes from public keys
These hashes were previously derived from the private key.
This is not a best practice. After this PR they are derived from public
keys.
2019-08-27 09:41:41 -07:00
Bruce Ma
ec342ec98f delete lo network when TearDownPod to avoid CNI cache leak
Signed-off-by: Bruce Ma <brucema19901024@gmail.com>
2019-08-27 19:26:23 +08:00
hwdef
9b3f577b1d fix typo in pkg 2019-08-26 09:25:39 +08:00
Kubernetes Prow Robot
f105fef3d5 Merge pull request #81429 from huffmanca/resize_block_volume
Enables resizing of block volumes.
2019-08-23 17:59:05 -07:00
Kubernetes Prow Robot
d5f9a81d0f Merge pull request #79873 from tedyu/kube-runtime
Set runtimeState when RuntimeReady is not set or false
2019-08-23 17:58:37 -07:00
Kubernetes Prow Robot
0e1bad3764 Merge pull request #81747 from Random-Liu/fix-windows-log-follow
Fix windows kubectl log -f.
2019-08-23 06:53:24 -07:00
Kubernetes Prow Robot
10bd85a127 Merge pull request #81663 from jfbai/update-existing-node-lease-with-retry
Update existing node lease with retry.
2019-08-22 22:03:30 -07:00
Hemant Kumar
9dbe0b3ad8 Fix devicePath for raw block expansion
Fix tests
2019-08-22 22:48:46 -04:00
Jianfei Bai
aa173c834f Remove duplicated log. 2019-08-23 10:09:24 +08:00
Jean Rouge
4d4edcb27b Make container removal fail if platform-specific containers fail
https://github.com/kubernetes/kubernetes/pull/74737 introduced a new in-memory
map for the dockershim, that could potentially (in pathological cases) cause
memory leaks - for containers that use GMSA cred specs, get created
successfully, but then never get started nor removed.

This patch addresses this issue by making container removal fail altogether
when platform-specific clean ups fail: this allows clean ups to be retried
later, when the kubelet attempts to remove the container again.

Resolves issue https://github.com/kubernetes/kubernetes/issues/74843.

Signed-off-by: Jean Rouge <rougej+github@gmail.com>
2019-08-22 18:03:48 -07:00
Kubernetes Prow Robot
a3488b4cee Merge pull request #81206 from tallclair/staticcheck-kubelet-push
Cleanup Kubelet static analysis issues
2019-08-22 15:09:43 -07:00
Kubernetes Prow Robot
37651f1cef Merge pull request #80368 from danwinship/iptables-checks
iptables feature detection improvements
2019-08-22 13:31:20 -07:00
Christian Huffman
7a4cdf5ab2 Included resizing for CSI-based block volumes.
Perform a no-op when volume is of type raw block

Fix bug with checking volume mounts for readonly
2019-08-22 15:45:57 -04:00
Kubernetes Prow Robot
6b47754740 Merge pull request #81627 from tallclair/copy
Delete duplicate resource.Quantity.Copy()
2019-08-22 11:13:13 -07:00
Kubernetes Prow Robot
2af52db689 Merge pull request #81529 from tedyu/asoww-get-pod
Drop GetPods from ActualStateOfWorld
2019-08-22 08:43:24 -07:00
Kubernetes Prow Robot
53f4e0fa26 Merge pull request #57741 from dixudx/container_hash
Compute container hash based on API content, not go type
2019-08-22 08:42:46 -07:00
Louise Daly
2fb94231d0 Renaming strict policy to restricted policy
Restricted policy will fail admission of guaranteed pods where
all requested resources are not available on a single NUMA Node
2019-08-22 07:57:55 +01:00
Di Xu
739cdc8a8c Omit nil or empty field when calculating hash value 2019-08-22 13:46:52 +08:00