kubernetes

Author	SHA1	Message	Date
Tim Hockin	ef934a2c5e	Add Protocol() method to iptables Enables simpler printing of which IP family the iptables interface is managing.	2020-04-10 15:29:49 -07:00
Giuseppe Scrivano	26d94ad628	kubelet: do not configure the device cgroup Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2020-04-09 16:18:06 +02:00
Giuseppe Scrivano	a9772b2290	kubelet: adapt cgroup_manager to cgroup v2 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2020-04-09 16:18:04 +02:00
Giuseppe Scrivano	6d16fee229	kubelet: cpu hard capping is supported on cgroup v2 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2020-04-09 16:18:03 +02:00
Kubernetes Prow Robot	a34b914e82	Merge pull request #89921 from fromanirh/cpumanager-checkpoint cpumanager: drop old custom file backend	2020-04-08 13:37:58 -07:00
Kubernetes Prow Robot	7061dddf26	Merge pull request #88521 from mattjmcnaughton/mattjmcnaughton/add-error-testing-image-service Add error path testing to image handling by `kubeGenericRuntimeManager`	2020-04-07 22:45:43 -07:00
Francesco Romani	623587ec8b	cpumanager: test: add missing helper add back the missing AssertStateEqual helper; it is needed by some tests we still want to run. Signed-off-by: Francesco Romani <fromani@redhat.com>	2020-04-07 16:59:07 +02:00
Francesco Romani	be0fe3df9b	cpumanager: drop old custom file backend The cpumanager file-based state backend was obsoleted a few releases ago, the cpumanager having moved to the checkpointmanager common infrastructure. The old test checking compatibility to/from the old format is also no longer needed, because the checkpoint format is stable (see https://github.com/kubernetes/kubernetes/tree/master/pkg/kubelet/checkpointmanager). Signed-off-by: Francesco Romani <fromani@redhat.com>	2020-04-07 13:24:48 +02:00
Kubernetes Prow Robot	037db1cb6c	Merge pull request #89687 from dims/update-docker-dependency Update docker dependency and remove deprecated method use	2020-04-06 15:42:14 -07:00
Kubernetes Prow Robot	8cdf21ab4c	Merge pull request #86409 from sshukun/fix-golint Fix go-lint issues in package pkg/kubelet/checkpointmanager/testing/example_checkpoint_formats/v1	2020-04-06 15:42:01 -07:00
Kubernetes Prow Robot	0d8b4b5df4	Merge pull request #85994 from coderanger/patch-1 Tiny typo in a comment.	2020-04-06 15:41:47 -07:00
Kubernetes Prow Robot	9441df3aad	Merge pull request #89808 from fuweid/close-resize-chan remotecommand: close resize channel for notification	2020-04-06 13:47:46 -07:00
Wei Fu	d2b59f10c5	remotecommand: close resize channel for notification Remotecommand package should notify executor by closing resizeChan. Signed-off-by: Wei Fu <fuweid89@gmail.com>	2020-04-06 23:32:23 +08:00
Andrew Sy Kim	2e56866c97	move apparmor annotation constants to k8s.io/api/core/v1 Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>	2020-04-06 10:22:04 -04:00
Davanum Srinivas	7368359782	Stop using deprecated method Signed-off-by: Davanum Srinivas <davanum@gmail.com>	2020-04-03 15:48:32 -04:00
Kubernetes Prow Robot	b030be376b	Merge pull request #89581 from Wenfeng-GAO/simplify simplify code in topologymanager	2020-04-02 23:07:46 -07:00
Kubernetes Prow Robot	adcbe470d7	Merge pull request #89619 from mattjmcnaughton/mattjmcnaughton/delete-unused-builder-type Delete unused `Builder` type from `kubelet.go`	2020-04-02 21:40:31 -07:00
Kubernetes Prow Robot	69681d7df8	Merge pull request #89286 from uzuku/mar-fix-nil-pointer-in-format Handle nil pod in pod format	2020-04-02 19:04:00 -07:00
Kubernetes Prow Robot	4e6a12223b	Merge pull request #89567 from giuseppe/cgroupv2-unit-test kubelet: add tests for cgroup v2 conversions	2020-04-02 12:42:52 -07:00
Kubernetes Prow Robot	bbe5594409	Merge pull request #89296 from danwinship/random-emptily Don't log whether we're using iptables --random-fully	2020-04-02 12:42:24 -07:00
Noah Kantrowitz	14969831e9	Apply the same style of fix as #87913 but for HTTP methods too. Go does not validate HTTP methods beyond len!=0 and that they don't contain HTTP meta chars like a newline. Also switch to using string sets instead of maps.	2020-04-02 02:15:04 -07:00
Sascha Grunert	2dfb22b5b7	Remove unnecessary sprintf in node status tests There is no invocation to sprintf needed for those strings so we can remove them. Signed-off-by: Sascha Grunert <sgrunert@suse.com>	2020-04-01 14:16:28 +02:00
Andrew Sy Kim	e2bc3a755f	move well-known kubelet cloud provider annotations to k8s.io/cloud-provider (#88631 ) * move well-known kubelet cloud provider annotations to k8s.io/cloud-provider Signed-off-by: andrewsykim <kim.andrewsy@gmail.com> * cloud provider: rename AnnotationProvidedIPAddr to AnnotationAlphaProvidedIPAddr to indicate alpha status Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>	2020-03-31 23:01:27 -07:00
Kubernetes Prow Robot	357d3c9f93	Merge pull request #89584 from kevtaylor/kep/VolumeSubpathExpansion-Remove-FeatureGate Remove VolumeSubpathEnvExpansion Feature Gate	2020-03-31 20:03:27 -07:00
Kubernetes Prow Robot	1168b4b812	Merge pull request #88006 from tedyu/socket-path Unregister csiplugin even if socket path is gone	2020-03-31 10:54:40 -07:00
Ted Yu	c7bde41478	Unregister csiplugin even if socket path is gone Signed-off-by: Ted Yu <yuzhihong@gmail.com>	2020-03-31 06:57:05 -07:00
Zhou Peng	930bedf144	[pkg/kubelet]: make func a little comfortable This func has only 1 argument, don't wrap it across multiple lines Signed-off-by: Zhou Peng <p@ctriple.cn>	2020-03-31 16:47:32 +08:00
Davanum Srinivas	a1bceb8915	add import restrictions Signed-off-by: Davanum Srinivas <davanum@gmail.com>	2020-03-30 10:45:22 -04:00
Davanum Srinivas	765e926d35	Avoid using internal packages for streaming/ package Signed-off-by: Davanum Srinivas <davanum@gmail.com>	2020-03-29 18:42:27 -04:00
mattjmcnaughton	6b5b8bb186	Delete unused `Builder` type from `kubelet.go` As far as I can tell, nothing uses this type. As a result, it doesn't really provide any benefit, and just clutters `kubelet.go`. There's also the risk of it falling out of date with `NewMainKubelet`, as nothing enforces `NewMainKubelet` being of the `Builder` type.	2020-03-28 20:04:58 -04:00
Kubernetes Prow Robot	fca2963aa2	Merge pull request #89540 from dashpole/fix_metric Fix cpu resource metric type by changing to counter	2020-03-27 14:36:07 -07:00
Kevin Taylor	9fd48b4039	Remove VolumeSubpathEnvExpansion Feature Gate	2020-03-27 16:28:33 +00:00
Wenfeng-GAO	1aebbee7da	simplify code in topologymanager	2020-03-28 00:04:51 +08:00
Giuseppe Scrivano	c4429d8bd4	kubelet: add tests for cgroup v2 conversions follow-up for https://github.com/kubernetes/kubernetes/pull/85218 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2020-03-27 13:50:57 +01:00
Kubernetes Prow Robot	708dd2ea7a	Merge pull request #89308 from zshihang/sync sync api/v1/pod/util with api/pod/util	2020-03-26 14:10:49 -07:00
Kubernetes Prow Robot	34c8b26c9f	Merge pull request #85218 from giuseppe/cgroupv2 kubelet: add initial support for cgroupv2	2020-03-26 14:10:23 -07:00
David Ashpole	86192d4b9a	fix cpu resource metric type by changing to counter	2020-03-26 13:30:36 -07:00
Kubernetes Prow Robot	4488fd4749	Merge pull request #89053 from bg-chun/move_package migration of re-usable package from pkg/kubelet/cm/cpumanager to pkg/kubelet/cm	2020-03-26 11:14:09 -07:00
yameiwang	6783f991c3	fix function NodeAllocatableRoot	2020-03-26 18:48:05 +08:00
Shihang Zhang	b56da85a77	sync api/v1/pod/util with api/pod/util and remove DefaultContainers	2020-03-24 16:42:32 -07:00
Kubernetes Prow Robot	89dfebb214	Merge pull request #89359 from gongguan/process eviction by process number	2020-03-24 15:27:25 -07:00
Kubernetes Prow Robot	f321d0ed12	Merge pull request #89361 from fuweid/me-use-statsfunc eviction: use previous statsFunc	2020-03-24 00:28:46 -07:00
Kubernetes Prow Robot	907d4c1bb9	Merge pull request #89381 from dashpole/comment_disable_readonly Add comment explaining when to remove cadvisor json endpoints	2020-03-23 20:31:19 -07:00
louisgong	e56d40d048	remove unused param	2020-03-24 09:25:04 +08:00
louisgong	0efb70c0a2	eviction by process number	2020-03-24 09:25:04 +08:00
David Ashpole	b4ed7273da	add comment explaining when to remove the --enable-cadvisor-json-endpoints	2020-03-23 12:52:00 -07:00
Wei Fu	a809aaf03d	eviction: use previous statsFunc No need to use summary to create statsFunc for localStorageEviction. Just use vals from makeSignalObservations. Signed-off-by: Wei Fu <fuweid89@gmail.com>	2020-03-23 19:11:17 +08:00
Dan Winship	8edd656238	Don't log whether we're using iptables --random-fully	2020-03-20 08:06:27 -04:00
Uzuku	302cda742a	Handle nil pod in pod format	2020-03-20 15:30:44 +08:00
Kubernetes Prow Robot	e74ad38854	Merge pull request #89013 from dims/copy-jsonlog-from-docker/docker-locally Copy jsonlog from docker/docker locally	2020-03-19 12:08:37 -07:00
Kubernetes Prow Robot	dfb6993947	Merge pull request #89182 from dims/just-use-runtime-numcpu Just use runtime.NumCPU on windows	2020-03-19 06:05:51 -07:00
Odin Ugedal	2830827442	Add support for removing unsupported huge page sizes When kubelet is restarted, it will now remove the resources for huge page sizes no longer supported. This is required when: - node disables huge pages - changing the default huge page size in older versions of linux (because it will then only support the newly set default). - Software updates that change what sizes are supported (eg. by changing boot parameters).	2020-03-19 13:08:08 +01:00
Kubernetes Prow Robot	34ad7d1984	Merge pull request #88450 from shikanon/fix/golintTypo fix typos error in handlers_test.go file	2020-03-18 14:24:44 -07:00
Kubernetes Prow Robot	0c8ac83e04	Merge pull request #88871 from dashpole/fix_oom Use the container whose limit is hit for system OOMs	2020-03-17 19:27:54 -07:00
Davanum Srinivas	825f99c396	run update-vendor.sh Signed-off-by: Davanum Srinivas <davanum@gmail.com>	2020-03-17 21:26:07 -04:00
Davanum Srinivas	0c52ffe08f	make local copy of JSONLog Signed-off-by: Davanum Srinivas <davanum@gmail.com>	2020-03-17 21:25:55 -04:00
Kubernetes Prow Robot	761c72f691	Merge pull request #88348 from tedyu/image-not-nil Check that ImageInspect pointer is not nil	2020-03-17 16:21:01 -07:00
Kubernetes Prow Robot	ffc87f2d0c	Merge pull request #88266 from mattjmcnaughton/mattjmcnaughton/delete-pluginwatcher-DOS-TODO Delete TODO around implementing rate limiting to protect against DOS	2020-03-17 16:20:34 -07:00
Davanum Srinivas	25c3ddf22e	Just use runtime.NumCPU on windows docker folks added NumCPU implementation for windows that supported hot-plugging of CPUs. The implementation used the GetProcessAffinityMask to be able to check which CPUs are active as well. `3707a76921` The golang "runtime" package has also been using GetProcessAffinityMask since 1.6 beta1: `6410e67a1e` So we don't seem to need the sysinfo.NumCPU from docker/docker. (Note that this PR is an effort to get away from dependencies from docker/docker) Signed-off-by: Davanum Srinivas <davanum@gmail.com>	2020-03-17 15:53:52 -04:00
Eric Mountain	22e0ee768b	Removes container RefManager	2020-03-16 14:30:57 +01:00
Byonggon Chun	a3047672d0	move pkg/kubelet/cm/cpumanager/containermap to pkg/kubelet/cm/containermap for reusing containerMap is used in CPU Manager to store all containers information in the node. containerMap provides a mapping from (pod, container) -> containerID for all containers a pod It is reusable in another component in pkg/kubelet/cm which needs to track changes of all containers in the node. Signed-off-by: Byonggon Chun <bg.chun@samsung.com>	2020-03-14 02:38:51 +09:00
Giuseppe Scrivano	bb5ed1b797	kubelet: add initial support for cgroupv2 do a conversion from the cgroups v1 limits to cgroups v2. e.g. cpu.shares on cgroups v1 has a range of [2-262144] while the equivalent on cgroups v2 is cpu.weight that uses a range [1-10000]. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2020-03-12 08:50:19 +01:00
Kubernetes Prow Robot	562a420d86	Merge pull request #88915 from roycaihw/fix/image-manager-data-race Fix a data race in kubelet image manager	2020-03-11 15:04:37 -07:00
Kubernetes Prow Robot	a37d68ec05	Merge pull request #88917 from adelina-t/fix_pod_admit_handler Implement noopWindowsResourceAllocator	2020-03-11 07:45:37 -07:00
Kubernetes Prow Robot	7989ca4324	Merge pull request #88734 from joelsmith/master Work-around for missing memory metrics on CRI-O exited containers	2020-03-10 16:21:36 -07:00
Haowei Cai	462b75388f	let image cache do sort on write instead of on read to avoid data race and improve efficiency	2020-03-10 15:33:34 -07:00
Adelina Tuvenie	a9f834d17d	Implement noopWindowsResourceAllocator On Windows, the podAdmitHandler returned by the GetAllocateResourcesPodAdmitHandler() func and registered by the Kubelet is nil. We implement a noopWindowsResourceAllocator that would admit any pod for Windows in order to be consistent with the original implementation.	2020-03-10 21:32:23 +01:00
Savitha Raghunathan	3234d34714	moving volume plugin dir to kubelet config - part 1	2020-03-10 16:22:29 -04:00
Clayton Coleman	c26653ced9	kubelet: Also set PodIPs when assign a host network PodIP When we clobber PodIP we should also overwrite PodIPs and not rely on the apiserver to fix it for us - this caused the Kubelet status manager to report a large string of the following warnings when it tried to reconcile a host network pod: ``` I0309 19:41:05.283623 1326 status_manager.go:846] Pod status is inconsistent with cached status for pod "machine-config-daemon-jvwz4_openshift-machine-config-operator(61176279-f752-4e1c-ac8a-b48f0a68d54a)", a reconciliation should be triggered: &v1.PodStatus{ ... // 5 identical fields HostIP: "10.0.32.2", PodIP: "10.0.32.2", - PodIPs: []v1.PodIP{{IP: "10.0.32.2"}}, + PodIPs: []v1.PodIP{}, StartTime: s"2020-03-09 19:41:05 +0000 UTC", InitContainerStatuses: nil, ... // 3 identical fields } ``` With the changes to the apiserver, this only happens once, but it is still a bug.	2020-03-09 18:15:32 -04:00
zyu	78e2668539	Delay sorting of evictUnits slice in kuberuntime_gc Signed-off-by: zyu <yuzhihong@gmail.com>	2020-03-09 12:24:42 -07:00
Kubernetes Prow Robot	ef672c1c2d	Merge pull request #88678 from verult/slow-rxm-attach Parallelize attach operations across different nodes for volumes that allow multi-attach	2020-03-06 13:17:21 -08:00
David Ashpole	fc6b4719fd	Use the container whose limit is hit for system OOMs	2020-03-06 11:06:16 -08:00
Christian Huffman	c6fd25d100	Updated CSIDriver references	2020-03-06 08:21:26 -05:00
Kubernetes Prow Robot	5708511499	Merge pull request #88708 from mikedanese/deleteopts Migrate clientset metav1.DeleteOpts to pass-by-value	2020-03-05 23:09:23 -08:00
Cheng Xing	ef3d66b98b	Parallelize attach operations across different nodes for volumes that allow multi-attach	2020-03-05 22:22:05 -08:00
Kubernetes Prow Robot	cd0057c16a	Merge pull request #88876 from nolancon/none-policy-fix Topology Manager none policy bug fix	2020-03-05 21:40:33 -08:00
Kubernetes Prow Robot	e90c908f64	Merge pull request #88141 from tedyu/pvc-being-del Don't try to create VolumeSpec immediately after underlying PVC is being deleted	2020-03-05 21:39:23 -08:00
Kubernetes Prow Robot	ce01a9bad0	Merge pull request #88857 from nolancon/test-fix Check for nil cpuManager in container manager	2020-03-05 20:05:14 -08:00
Kubernetes Prow Robot	48541a0b16	Merge pull request #87650 from nolancon/beta-feature-gate Update TopologyManager Feature Gate	2020-03-05 20:03:04 -08:00
Ted Yu	723761aa88	Don't try to create VolumeSpec immediately after underlying PVC is being deleted Signed-off-by: Ted Yu <yuzhihong@gmail.com>	2020-03-05 16:45:50 -08:00
Mike Danese	76f8594378	more artisanal fixes Most of these could have been refactored automatically but it wouldn't have been uglier. The unsophisticated tooling left lots of unnecessary struct -> pointer -> struct transitions.	2020-03-05 14:59:47 -08:00
Mike Danese	c58e69ec79	automated refactor	2020-03-05 14:59:46 -08:00
Joel Smith	da988294ec	Work-around for missing metrics on CRI-O exited containers HPA needs metrics for exited init containers before it will take action. By setting memory and CPU usage to zero for any containers that cAdvisor didn't provide statistics for, we are assured that HPA will be able to correctly calculate pod resource usage.	2020-03-05 13:20:43 -07:00
nolancon	0551d408ac	Bug fix for TM none policy	2020-03-05 14:25:48 +00:00
nolancon	4baa1d967d	Check for nil cpuManager	2020-03-05 07:54:33 +00:00
Kubernetes Prow Robot	7a513b575a	Merge pull request #88440 from smarterclayton/container_success_fix Ensure Kubelet always reports terminating pod container status	2020-03-04 20:13:04 -08:00
Kubernetes Prow Robot	ac32644d6e	Merge pull request #87759 from klueska/upstream-move-cpu-allocation-to-pod-admit Guarantee aligned resources across containers	2020-03-04 20:12:37 -08:00
Clayton Coleman	8bc5cb01a9	kubelet: Clear the podStatusChannel before invoking syncBatch The status manager syncBatch() method processes the current state of the cache, which should include all entries in the channel. Flush the channel before we call a batch to avoid unnecessary work and to unblock pod workers when the node is congested. Discovered while investigating long shutdown intervals on the node where the status channel stayed full for tens of seconds. Add a for loop around the select statement to avoid unnecessary invocations of the wait.Forever closure each time.	2020-03-04 13:34:25 -05:00
Clayton Coleman	8722c834e5	kubelet: Never restart containers in deleting pods When constructing the API status of a pod, if the pod is marked for deletion no containers should be started. Previously, if a container inside of a terminating pod failed to start due to a container runtime error (that populates reasonCache) the reasonCache would remain populated (it is only updated by syncPod for non-terminating pods) and the delete action on the pod would be delayed until the reasonCache entry expired due to other pods. This dramatically reduces the amount of time the Kubelet waits to delete pods that are terminating and encountered a container runtime error.	2020-03-04 13:34:25 -05:00
Yu-Ju Hong	2364c10e2e	kubelet: Don't delete pod until all container status is available After a pod reaches a terminal state and all containers are complete we can delete the pod from the API server. The dispatchWork method needs to wait for all container status to be available before invoking delete. Even after the worker stops, status updates will continue to be delivered and the sync handler will continue to sync the pods, so dispatchWork gets multiple opportunities to see status. The previous code assumed that a pod in Failed or Succeeded had no running containers, but eviction or deletion of running pods could still have running containers whose status needed to be reported. This modifies earlier test to guarantee that the "fallback" exit code 137 is never reported to match the expectation that all pods exit with valid status for all containers (unless some exceptional failure like eviction were to occur while the test is running).	2020-03-04 13:34:25 -05:00
Clayton Coleman	ad3d8949f0	kubelet: Preserve existing container status when pod terminated The kubelet must not allow a container that was reported failed in a restartPolicy=Never pod to be reported to the apiserver as success. If a client deletes a restartPolicy=Never pod, the dispatchWork and status manager race to update the container status. When dispatchWork (specifically podIsTerminated) returns true, it means all containers are stopped, which means status in the container is accurate. However, the TerminatePod method then clears this status. This results in a pod that has been reported with status.phase=Failed getting reset to status.phase.Succeeded, which is a violation of the guarantees around terminal phase. Ensure the Kubelet never reports that a container succeeded when it hasn't run or been executed by guarding the terminate pod loop from ever reporting 0 in the absence of container status.	2020-03-04 13:34:24 -05:00
Kubernetes Prow Robot	9d0cbb7503	Merge pull request #88673 from jsafrane/block-feature-ga Promote block volumes to GA	2020-03-03 12:17:12 -08:00
Kubernetes Prow Robot	06b798781a	Merge pull request #88591 from smarterclayton/status_update kubelet: Avoid sending no-op patches	2020-03-03 09:43:38 -08:00
Jan Safranek	3af671011a	Generated API	2020-03-02 22:21:42 +01:00
Jan Safranek	8536787133	Add unit tests	2020-03-02 12:54:02 +01:00
nolancon	e8538d9b76	Add mutex to Topology Manager Add/RemoveContainer This was exposed as a potential bug during e2e test debugging of this PR.	2020-03-02 04:07:21 +00:00
nolancon	1e613e5a4c	Update TopologyManager Feature Gate: - Alpha to Beta. - True by default. - Remove redundant validation checks.	2020-03-02 03:32:05 +00:00
Rob Scott	132d2afca0	Adding IngressClass to networking/v1beta1 Co-authored-by: Christopher M. Luciano <cmluciano@us.ibm.com>	2020-03-01 18:17:09 -08:00
Jan Safranek	2c1b743766	Promote block volume features to GA	2020-02-28 20:48:38 +01:00
James Munnelly	d5dae04898	certificates: update controllers to understand signerName field Signed-off-by: James Munnelly <james.munnelly@jetstack.io>	2020-02-27 15:54:31 +00:00
Kevin Klues	2327934a86	Rename GetTopologyPodAdmitHandler() as GetAllocateResourcesPodAdmitHandler(). It is named as such to reflect its new function. Also remove the Topology Manager feature gate check at higher level kubelet.go, as it is now done in GetAllocateResourcesPodAdmitHandler().	2020-02-27 07:52:43 +00:00
nolancon	a9c6129577	Device Manager - Update unit tests - Pass container to Allocate(). - Loop through containers to call Allocate() on container by container basis.	2020-02-27 07:24:34 +00:00
nolancon	cb9fdc49db	Device Manager - Refactor allocatePodResources - allocatePodResources logic altered to allow for container by container device allocation. - New type PodReusableDevices - New field in devicemanager devicesToReuse	2020-02-27 07:24:34 +00:00
nolancon	0a9bd0334d	CPU Manager - Updates to unit tests: - Where previously we called manager.AddContainer(), we now call both manager.Allocate() and manager.AddContainer(). - Some test cases now have two expected errors. One each from Allocate() and AddContainer(). Existing outcomes are unchanged.	2020-02-27 07:24:34 +00:00
nolancon	467f66580b	CPU Manager - Add check to policy.Allocate() for init containers If the container allocated CPUs is an init container, release those CPUs back into the shared pool for re-allocation to the next container.	2020-02-27 07:24:33 +00:00
nolancon	709989efa2	CPU Manager - Rename policy.AddContainer() to policy.Allocate()	2020-02-27 07:24:33 +00:00
Kevin Klues	0d68bffd03	Change GetTopologyPodAdmitHandler() to be more general GetTopologyPodAdmitHandler() now returns a lifecycle.PodAdmitHandler type instead of the TopologyManager directly. The handler it returns is generally responsible for attempting to allocate any resources that require a pod admission check. When the TopologyManager feature gate is on, this comes directly from the TopologyManager. When it is off, we simply attempt the allocations ourselves and fail the admission on an unexpected error. The higher level kubelet.go feature gate check will be removed in an upcoming PR.	2020-02-27 07:24:26 +00:00
Clayton Coleman	b252865479	kubelet: Avoid sending no-op patches In an e2e run, out of 1857 pod status updates executed by the Kubelet 453 (25%) were no-ops - they only contained the UID of the pod and no status changes. If the patch is a no-op we can avoid invoking the server and continue.	2020-02-26 23:06:38 -05:00
Kubernetes Prow Robot	a726c9c9cb	Merge pull request #88435 from andrewsykim/ccm-clean-up move well known cloud provider taints to k8s.io/cloud-provider/api	2020-02-26 13:33:41 -08:00
Kubernetes Prow Robot	6ec3ea855d	Merge pull request #85282 from serathius/flag-kubelet Add show-hidden-metrics-for-version to kubelet	2020-02-26 03:54:26 -08:00
Kubernetes Prow Robot	16a7650e2b	Merge pull request #86101 from PatrickLang/fix-cpumaximum Fix cpu resource limit on Windows	2020-02-26 00:20:26 -08:00
RainbowMango	7b7c73bf87	Clean up duplicate code and remove import cycle.	2020-02-26 15:19:29 +08:00
Kubernetes Prow Robot	851efa8a34	Merge pull request #84051 from bart0sh/PR0079-multiple-sizes-hugepages Implement support for multiple sizes huge pages	2020-02-25 14:40:27 -08:00
Kubernetes Prow Robot	46fcbcf84d	Merge pull request #84792 from DataDog/eric.mountain/simple_probe_no_ref_fix_master Fixes `No ref for container` in probes after kubelet restart	2020-02-25 11:58:49 -08:00
Marek Siarkowicz	d44d5b35f3	Add show-hidden-metrics-for-version to kubelet	2020-02-25 20:46:34 +01:00
mattjmcnaughton	f215096715	Add error path testing to image handling by `kubeGenericRuntimeManager` In https://github.com/kubernetes/kubernetes/pull/88372, we added the ability to inject errors to the `FakeImageService`. Use this ability to test the error paths executed by the `kubeGenericRuntimeManager` when underlying `ImageService` calls fail. I don't foresee this change having a huge impact, but it should set a good precedent for test coverage, and should the failure case behavior become more "interesting" or risky in the future, we already will have the scaffolding in place with which we can expand the tests.	2020-02-25 08:27:30 -05:00
Patrick Lang	63ff616aa8	Adding Windows CPU limit tests	2020-02-24 19:46:39 +00:00
Patrick Lang	19acf7d051	Fix cpu resource limit on Windows	2020-02-24 19:46:39 +00:00
shikanon	d09ce2d6ac	fix typos error in pkg/kubelet/lifecycle	2020-02-25 01:21:27 +08:00
andrewsykim	8c633356df	move well known cloud provider taints to k8s.io/cloud-provider/api Signed-off-by: andrewsykim <kim.andrewsy@gmail.com>	2020-02-23 19:54:59 -05:00
Kubernetes Prow Robot	20e6883a75	Merge pull request #88290 from tallclair/spr-deprecate Start deprecation process for StreamingProxyRedirects	2020-02-21 10:32:45 -08:00
Kubernetes Prow Robot	0943976757	Merge pull request #83295 from oshothebig/typo Fix typo in docker_sandbox.go	2020-02-21 10:32:32 -08:00
Eric Mountain	4cb28f64ea	Fixes for the `No ref for container` in probes after kubelet restart	2020-02-21 13:32:48 +01:00
Kubernetes Prow Robot	d0983b562d	Merge pull request #84731 from verb/ec-pid Add namespace targeting mode to CRI and kubelet	2020-02-20 04:29:17 -08:00
Kubernetes Prow Robot	224aca4e01	Merge pull request #88251 from kublr/fix/kubelet-systemd-reservation Partially fix incorrect configuration of kubepods.slice unit by kubelet	2020-02-19 16:11:25 -08:00
Ted Yu	884e5ee3d4	Check that ImageInspect pointer is not nil	2020-02-19 12:45:54 -08:00
Tim Allclair	98ad7416fa	Start deprecation process for StreamingProxyRedirects	2020-02-19 10:53:45 -08:00
Ed Bartosh	0eb65bd7da	Implement support for multiple sizes huge pages This implementation allows Pod to request multiple hugepage resources of different size and mount hugepage volumes using storage medium HugePage-<size>, e.g. spec: containers: resources: requests: hugepages-2Mi: 2Mi hugepages-1Gi: 2Gi volumeMounts: - mountPath: /hugepages-2Mi name: hugepage-2mi - mountPath: /hugepages-1Gi name: hugepage-1gi ... volumes: - name: hugepage-2mi emptyDir: medium: HugePages-2Mi - name: hugepage-1gi emptyDir: medium: HugePages-1Gi NOTE: This is an alpha feature. Feature gate HugePageStorageMediumSize must be enabled for it to work.	2020-02-19 18:15:40 +02:00
Kubernetes Prow Robot	64340bd914	Merge pull request #87906 from smarterclayton/evict_limit kubelet: Record kubelet_evictions when limits are hit	2020-02-18 22:40:25 -08:00
YuikoTakada	a80564dbdd	Fix non-ascii characters in pkg/kubelet/qos/doc.go	2020-02-19 05:07:26 +00:00
Kubernetes Prow Robot	3d70825195	Merge pull request #87933 from jdef/fix/86367 Fix docker/journald logging conformance	2020-02-18 20:58:25 -08:00
Clayton Coleman	af9e0be163	kubelet: Record kubelet_evictions when limits are hit The pod, container, and emptyDir volumes can all trigger evictions when their limits are breached. To ensure that administrators can alert on these type of evictions, update kubelet_evictions to include the following signal types: * ephemeralcontainerfs.limit - container ephemeral storage breaches its limit * ephemeralpodfs.limit - pod ephemeral storage breaches its limit * emptydirfs.limit - pod emptyDir storage breaches its limit	2020-02-18 15:08:30 -05:00
mattjmcnaughton	f5080850fc	Delete TODO in `image_gc_manager` I think the TODO here may have actually been unnecessary. There isn't a ton of interest around merging https://github.com/kubernetes/kubernetes/pull/87425, which contains a fix. Delete the TODO so we don't devote time to working on this area in the future.	2020-02-18 08:35:29 -05:00
mattjmcnaughton	ccf488fe11	Delete TODO around implementing rate limiting to protect against DOS From discussions in https://github.com/kubernetes/kubernetes/pull/83784, it appears there actually isn't a ton of risk around a DOS. Delete the TODO.	2020-02-18 08:24:25 -05:00
Oleg Chunikhin	b651178849	fix incorrect configuration of kubepods.slice unit by kubelet (issue #88197 )	2020-02-17 13:22:45 -05:00
Kubernetes Prow Robot	1c60045db0	Merge pull request #88173 from BenTheElder/gives-a-whole-new-pause upgrade pause everywhere	2020-02-15 02:11:27 -08:00
Kubernetes Prow Robot	1a0f923a65	Merge pull request #87712 from alena1108/jan30kubelet Ineffassign fixes for pkg/controller and kubelet	2020-02-14 14:29:27 -08:00
James DeFelice	0e178f9341	rename to sharedLimitWriter	2020-02-14 13:48:41 -06:00
Benjamin Elder	1631825e44	bump pause to 3.2 in kubelet	2020-02-14 11:40:15 -08:00
Kubernetes Prow Robot	3273cd99b1	Merge pull request #88138 from thockin/sig-net-driver-approvers Create an OWNERS alias for net-driver-approvers	2020-02-13 21:10:40 -08:00
Tim Hockin	fc5b08569f	Create an OWNERS alias for net-driver-approvers	2020-02-13 14:43:45 -08:00
James DeFelice	a4230055f3	address review feedback	2020-02-13 11:44:11 -06:00
Hemant Kumar	c058073046	Add an event to PV when mount fails because of fs mismatch Filesystem mismatch is a special event. This could indicate either that the user has asked for an incorrect filesystem or that there is an error from which the mount operation cannot recover on retry. Co-Authored-By: Jordan Liggitt <jordan@liggitt.net>	2020-02-13 12:29:42 -05:00
Kubernetes Prow Robot	52fb02fdbe	Merge pull request #87718 from wojtek-t/kubelet_not_watching_immutable_secret_configmaps WatchBasedManager stops watching immutable objects	2020-02-11 23:14:33 -08:00
Kevin Klues	0b168f0243	Change devicemanager to implement HintProvider.Allocate() This change will not work on its own. Higher level code needs to make sure and call Allocate() before AddContainer is called. This is already being done in cases when the TopologyManager feature gate is enabled (in the PodAdmitHandler of the TopologyManager). However, we need to make sure we add proper logic to call it in cases when the TopologyManager feature gate is disabled.	2020-02-10 03:27:47 +00:00
Kevin Klues	91f91858a5	Change CPUManager to implement HintProvider.Allocate() This change will not work on its own. Higher level code needs to make sure and call Allocate() before AddContainer is called. This is already being done in cases when the TopologyManager feature gate is enabled (in the PodAdmitHandler of the TopologyManager). However, we need to make sure we add proper logic to call it in cases when the TopologyManager feature gate is disabled.	2020-02-10 03:27:47 +00:00
Kevin Klues	9e4ee5ecc3	Add Allocate() call to TopologyManager's HintProvider interface Having this interface allows us to perform a tight loop of: for each container { containerHints = {} for each provider { containerHints[provider] = provider.GatherHints(container) } containerHints.MergeAndPublish() for each provider { provider.Allocate(container) } } With this in place we can now be sure that the hints gathered in one iteration of the loop always consider the allocations made in the previous.	2020-02-10 03:27:47 +00:00
Kevin Klues	a3f099ea4d	Split devicemanager Allocate into two functions Instead of having a single call for Allocate(), we now split this into two functions Allocate() and UpdatePluginResources(). The semantics split across them: // Allocate configures and assigns devices to a pod. From the requested // device resources, Allocate will communicate with the owning device // plugin to allow setup procedures to take place, and for the device // plugin to provide runtime settings to use the device (environment // variables, mount points and device files). Allocate(pod v1.Pod) error // UpdatePluginResources updates node resources based on devices already // allocated to pods. The node object is provided for the device manager to // update the node capacity to reflect the currently available devices. UpdatePluginResources( node schedulernodeinfo.NodeInfo, attrs *lifecycle.PodAdmitAttributes) error As we move to a model in which the TopologyManager is able to ensure aligned allocations from the CPUManager, devicemanager, and any other TopologyManager HintProviders in the same synchronous loop, we will need to be able to call Allocate() independently from an UpdatePluginResources(). This commit makes that possible.	2020-02-10 03:27:47 +00:00
Kubernetes Prow Robot	ac97b2d65e	Merge pull request #83507 from lyft/support-resetting-cpuacct Prevent returning invalid usageNanoCores value when cpuacct is reset in a live container	2020-02-09 08:45:53 -08:00
Kubernetes Prow Robot	abe6321296	Merge pull request #87952 from mikedanese/opts add *Options to Create, Update, and Patch in generated clientsets	2020-02-08 20:43:53 -08:00
Kubernetes Prow Robot	d09f8b9d54	Merge pull request #79409 from takmatsu/add-phase Modify Kubelet Pod Resources API to get only active pods	2020-02-08 16:09:52 -08:00
Mike Danese	bfc75d9a5c	manual fixes	2020-02-08 12:32:33 -05:00
Mike Danese	25651408ae	generated: run refactor	2020-02-08 12:30:21 -05:00
Kubernetes Prow Robot	dde6e8e746	Merge pull request #87858 from smarterclayton/different_type kubelet: Debug pod status output diff is wrong	2020-02-08 06:44:06 -08:00
Kubernetes Prow Robot	334d788f08	Merge pull request #87299 from mikedanese/ctx context in client-go	2020-02-08 06:43:52 -08:00
Kubernetes Prow Robot	b3ba969756	Merge pull request #87913 from cheftako/master Add code to fix kubelet/metrics memory issue.	2020-02-07 21:51:53 -08:00
Mike Danese	2637772298	some manual fixes	2020-02-07 18:17:40 -08:00
Mike Danese	3aa59f7f30	generated: run refactor	2020-02-07 18:16:47 -08:00
Kubernetes Prow Robot	d8b325b534	Merge pull request #85856 from adelina-t/cpu_requests_fix_ctrd Fix Cpu Requests priority Windows.	2020-02-07 15:19:58 -08:00
Walter Fender	9802bfcec0	Add code to fix kubelet/metrics memory issue. Bucketing url paths based on concept/handling. Bucketing code placed by handling code to encourage usage. Added unit tests. Fix format.	2020-02-07 15:12:24 -08:00
James DeFelice	6f8dfdfb0e	Fix docker/journald logging conformance See issue #86367	2020-02-07 16:04:27 -06:00
Kubernetes Prow Robot	8af70aeafc	Merge pull request #87390 from mattjmcnaughton/mattjmcnaughton/refactor-docker-specific-oom-out-of-qos-pkg Refactor docker specific oom const out of qos pkg	2020-02-07 12:31:48 -08:00
Takeaki Matsumoto	785fac6826	Make updateAllocatedDevices() as a public method and call it in podresources api	2020-02-07 13:26:56 +09:00
Kubernetes Prow Robot	3dc3c7c653	Merge pull request #87788 from ahg-g/ahg-filter Reduce overhead of error message formatting and allocation for NodeResource filter	2020-02-06 17:46:50 -08:00
Kubernetes Prow Robot	6ac8f2c70c	Merge pull request #87758 from klueska/upstream-cleanup-topology-manager Cleanup TopologyManager and update policy.Merge()	2020-02-06 17:46:27 -08:00
Alena Prokharchyk	d634ed3850	Removed unnecessary not nil check in node registration process	2020-02-06 15:49:15 -08:00
marosset	999fdfaddf	Calling hcsshim instead of docker api to get stats for windows to greatly reduce latency	2020-02-06 17:59:10 +00:00
Clayton Coleman	aed4d639a5	kubelet: Debug pod status output diff is wrong The types were different so the diff output is not useful, both should be pointers: ``` Feb 05 19:44:40 ci-ln-6k7l4-w-c-w9wbb.c.openshift-gce-devel-ci.internal hyperkube[2737]: I0205 19:44:40.222259 2737 status_manager.go:642] Pod status is inconsistent with cached status for pod "prometheus-k8s-1_openshift-monitoring(0e9137b8-3bd2-4353-b7f5-672749106dc1)", a reconciliation should be triggered: Feb 05 19:44:40 ci-ln-6k7l4-w-c-w9wbb.c.openshift-gce-devel-ci.internal hyperkube[2737]: interface{}( Feb 05 19:44:40 ci-ln-6k7l4-w-c-w9wbb.c.openshift-gce-devel-ci.internal hyperkube[2737]: - s"&PodStatus{Phase:Running,Conditions:[]PodCondition{PodCondition{Type:Initialized,Status:True,LastProbeTime:0001-01-01 00:00:00 +0000 UTC,LastTransitionTime:2020-02-05 19:13:30 +0000 UTC,Reason:,Message:,},PodCondit> Feb 05 19:44:40 ci-ln-6k7l4-w-c-w9wbb.c.openshift-gce-devel-ci.internal hyperkube[2737]: + v1.PodStatus{ Feb 05 19:44:40 ci-ln-6k7l4-w-c-w9wbb.c.openshift-gce-devel-ci.internal hyperkube[2737]: + Phase: "Running", Feb 05 19:44:40 ci-ln-6k7l4-w-c-w9wbb.c.openshift-gce-devel-ci.internal hyperkube[2737]: + Conditions: []v1.PodCondition{ ```	2020-02-05 14:52:46 -05:00
Kubernetes Prow Robot	d90dd93855	Merge pull request #82111 from xieyanker/xieyanker-patch-2 remove stateCheckPeriod	2020-02-05 04:17:55 -08:00
Abdullah Gharaibeh	0a476eb7d4	reduce overhead of error message formatting and allocation for scheduler NodeResource filter	2020-02-04 11:02:29 -05:00
Kevin Klues	d5addb4090	Cleanup logging and creation logic of TopologyManager in prep for beta	2020-02-03 17:13:29 +00:00
Kevin Klues	bc686ea27b	Update TopologyManager.GetTopologyHints() to take pointers Previously, this function was taking full Pod and Container objects unnecessarily. This commit updates this so that they will take pointers instead.	2020-02-03 17:13:28 +00:00
Kevin Klues	adaa58b6cb	Update TopologyManager.Policy.Merge() to return a simple bool Previously, the various Merge() policies of the TopologyManager all returned their own lifecycle.PodAdmitResult result. However, for consistency in any failed admits, this is better handled in the top-level Topology manager, with each policy only returning a boolean about whether or not they would like to admit the pod or not. This commit changes the semantics to match this logic.	2020-02-03 17:13:28 +00:00
Kevin Klues	95a3ac447f	Fix bug in TopologyManager RemoveContainer() Previously, we unconditionally removed all topology hints from a pod whenever just one container was being removed. This commit makes it so we only remove the hints for the single container being removed, and then conditionally remove the pod from the podTopologyHints[podUID] when no containers are left in it.	2020-02-03 17:13:14 +00:00
Kubernetes Prow Robot	7e5bfe4417	Merge pull request #85472 from dcbw/kubelet-network-approvers kubelet/network: add sig-network-approvers to OWNERS	2020-02-01 12:55:19 -08:00
wojtekt	b11b7d354d	WatchBasedManager stops watching immutable objects	2020-01-31 20:53:21 +01:00
Kubernetes Prow Robot	1baceba376	Merge pull request #87394 from mattjmcnaughton/mattjmcnaughton/delete-sysctl-runtime-admit-handler Delete the sysctl runtime admit handler	2020-01-30 21:20:45 -08:00
Alena Prokharchyk	6c3093f970	Ineffassign fixes for pkg/controller and kubelet	2020-01-30 14:35:10 -08:00
Lee Verberne	d05bcf6800	Add namespace mode targeting to dockershim	2020-01-30 15:31:43 +01:00
Lee Verberne	4d4e111f01	Generated code for kubelet namespace targeting	2020-01-30 15:31:43 +01:00
Lee Verberne	9a6d50cb2a	Add namespace targeting to the kubelet	2020-01-30 15:31:43 +01:00
Kubernetes Prow Robot	7164152844	Merge pull request #87664 from liggitt/revert-parallel-volume Revert "Merge pull request #87258 from verult/slow-rxm-attach"	2020-01-29 22:12:01 -08:00
Kubernetes Prow Robot	ec3fc59f1b	Merge pull request #87627 from tallclair/rc-metrics Register RunPodSandbox* metrics	2020-01-29 22:11:25 -08:00
Jordan Liggitt	cd1059e3c4	Revert "Merge pull request #87258 from verult/slow-rxm-attach" This reverts commit `15c3f1b119`, reversing changes made to `52d7614a8c`.	2020-01-29 14:58:32 -05:00
Mike Danese	d55d6175f8	refactor	2020-01-29 08:50:45 -08:00
Kubernetes Prow Robot	649391af22	Merge pull request #84154 from ohsewon/hugepage_per_container Implement support for setting hugepages limit on container cgroup sandbox.	2020-01-29 07:32:14 -08:00
Tim Allclair	43c7f3be29	Register RunPodSandbox* metrics	2020-01-28 13:26:11 -08:00
Kubernetes Prow Robot	15c3f1b119	Merge pull request #87258 from verult/slow-rxm-attach Parallelize attach operations across different nodes for volumes that allow multi-attach	2020-01-28 08:33:41 -08:00
sewon.oh	463442aa29	Update container hugepage limit when creating the container Unit test for updating container hugepage limit Add warning message about ignoring case. Update error handling about hugepage size requirements Signed-off-by: sewon.oh <sewon.oh@samsung.com>	2020-01-28 09:35:02 +09:00
Cheng Xing	c6a03fa5be	Parallelize attach operations across different nodes for volumes that allow multi-attach	2020-01-27 15:02:25 -08:00
Jiahui Feng	b2bb3dfb59	add logging before kubelet waiting for cert during bootstrapping.	2020-01-27 10:12:36 -08:00
Kubernetes Prow Robot	98f63eee1b	Merge pull request #87460 from nolancon/policies_refactor Refactor Topology Manager policies to reduce code duplication	2020-01-24 21:11:15 -08:00
mattjmcnaughton	9e1c99c4e2	Delete the sysctl runtime admit handler As of https://github.com/kubernetes/kubernetes/pull/72831, the minimum docker version is 1.13.1. (and the minimum API version is 1.26). The only time the `RuntimeAdmitHandler` returns anything other than accept is when the Docker API version < 1.24. In other words, we can be confident that Docker will always support sysctl. As a result, we can delete this unnecessary and docker-specific code.	2020-01-22 08:51:39 -05:00
nolancon	4d76b1c8de	Add mergeFilteredHints: - Move remaining logic from mergeProvidersHints to generic top level mergeFilteredHints function. - Add numaNodes as parameter in order to make generic. - Move single NUMA node specific check to single-numa-node Merge function.	2020-01-22 09:07:41 +00:00
nolancon	fc300e0e7d	Move filterSingleNumaHints call to top level Merge	2020-01-22 08:39:22 +00:00
nolancon	45660fd3a2	Add filterProvidersHints function: - Move initial 'filtering' functionality to generic function filterProvidersHints level policy.go. - Call new function from top level Merge function. - Rename some variables/parameters to reflect changes.	2020-01-22 08:35:28 +00:00
nolancon	df9b2595f3	Update filterHints to filterSingleNumaHints: - Change function name - Remove policy parameter (unnecessary) - Update unit test to reflect change	2020-01-22 07:15:00 +00:00
Wei Huang	3f8b202266	Move GeneralPredicates logic to kubelet.	2020-01-21 09:27:00 -08:00
Kubernetes Prow Robot	9822016bf8	Merge pull request #87397 from klueska/upstream-cpu-manager-set-initial-containers Initialize CPUManager containerMap to set of initial containers	2020-01-20 17:39:50 -08:00
Kubernetes Prow Robot	e6b5194ec1	Merge pull request #84300 from klueska/upstream-cpu-manager-reconcile-on-container-state Update logic in `CPUManager` `reconcileState()`	2020-01-20 12:27:37 -08:00
Kevin Klues	bd9d8fa42f	Initialize CPUManager containerMap to set of initial containers A recent change made it so that the CPUManager receives a list of initial containers that exist on the system at startup. This list can be non-empty, for example, after a kubelet restart. This commit ensures that the CPUManager's containerMap structure is initialized with the containers from this list.	2020-01-20 20:42:29 +01:00
Kubernetes Prow Robot	37ee6425ef	Merge pull request #87255 from klueska/upstream-remove-redundant-active-pods-check Remove check for empty activePods list in CPUManager removeStaleState	2020-01-20 09:05:50 -08:00
Kubernetes Prow Robot	23fa359d6c	Merge pull request #84705 from whypro/cpumanager-panic-master Return error instead of panic when cpu manager fails on startup.	2020-01-20 07:25:37 -08:00
mattjmcnaughton	73b2d83125	Refactor docker specific oom const out of qos pkg Previously, `pkg/kubelet/qos` contained two different docker-specific OOM constants. One set the oom adj for sandbox docker containers and the other set the oom adj for the docker daemon. Move both to be closer to their actual usages in `dockershim`. This change addresses a TODO and leads us towards the overall goal of making `pkg/kubelet` runtime agnostic except for `dockershim`.	2020-01-20 10:22:31 -05:00
Kevin Klues	7be9b0fe55	Update comments and error messages in the CPUManager	2020-01-20 15:31:01 +01:00
Kevin Klues	f2acbf6607	Base CPUManager state reconciliation on container state, not pod state	2020-01-20 13:57:30 +00:00
Kevin Klues	f6cf9b8ce9	Move CPUManager Pod Status logic before container loop	2020-01-20 13:57:30 +00:00
Davanum Srinivas	b3853138a4	update generated files	2020-01-17 11:23:53 -05:00
Kubernetes Prow Robot	50f9ea7999	Merge pull request #85798 from nolancon/merge-policy-rebase Updated - topologymanager: Add Merge method to Policy	2020-01-17 05:14:56 -08:00
Kubernetes Prow Robot	9701baea0f	Merge pull request #87283 from klueska/update-printing-for-tm-bitmask Update bitmask printing to print in groups of 2 instead of all 64 bits	2020-01-16 12:04:32 -08:00
Kevin Klues	708278098a	Update bitmask printing to print in groups of 2 instead of all 64 bits	2020-01-16 17:28:52 +01:00
Kevin Klues	7069b1d6e8	Update TopologyManager single-numa-node logic to handle "don't cares" The logic has been updated to match the logic of the best-effort policy except in two places: 1) The hint filtering function has been updated to allow "don't care" hints encoded with a `nil` affinity mask, to pass through the filter in addition to hints that have just a single NUMA bit set. 2) After calculating the `bestHint` we transform "don't care" affinities encoded as having all NUMA bits set in their affinity masks into "don't care" affinities encoded as `nil`.	2020-01-16 08:50:35 +00:00
Kevin Klues	2905ffffa7	Rename TopologyManager test TestPolicyBestEffortMerge for consistency	2020-01-16 08:50:21 +00:00
Kevin Klues	94489c137c	Cleanup use of defaultAffinity in mergePermutation of TopologyManager	2020-01-16 08:50:12 +00:00
nolancon	5e23517ebf	Use reflect.DeepEqual check in policy_test.go	2020-01-16 08:13:07 +00:00
nolancon	92eb7cd601	Update "Single NUMA hint generation" expected affinity to nil	2020-01-16 08:13:07 +00:00
nolancon	8b3f6e61a2	Move test case "Two providers, 1 with 2 hints, 1 with single non-preferred hint matching" into specific policy tests	2020-01-16 08:13:07 +00:00
nolancon	681c42bfc2	Move test case "Two providers, 1 hint each, same mask, 1 preferred, 1 not 2/2" into specific policy tests	2020-01-16 08:13:07 +00:00
nolancon	a38a2562b2	Move test case "Two providers, 1 hint each, same mask, 1 preferred, 1 not 1/2" into specific policy test.	2020-01-16 08:13:07 +00:00
nolancon	f639da7637	Move test case "Two providers, 1 hint each, no common mask" into specific policy tests.	2020-01-16 08:13:07 +00:00
nolancon	401a2bb285	Move test case "Single TopologyHint with Preferred as false and NUMANodeAffinity as nil" into specific policy tests.	2020-01-16 08:13:06 +00:00
nolancon	6460ef6392	Move test case "Single TopologyHint with Preferred as true and NUMANodeAffinity as nil" into specific policy tests.	2020-01-16 08:13:06 +00:00
nolancon	baeff9ec5d	Move test case "HintProvider returns empty non-nil map[string][]TopologyHint from provider" into specific policy tests.	2020-01-16 08:13:06 +00:00
nolancon	599217d482	Move test case "HintProvider returns -nil map[string][]TopologyHint from provider" into specific policy tests	2020-01-16 08:13:06 +00:00
nolancon	57661ee946	Move test case 'HintProvider returns empty non-nil map[string][]TopologyHint' into specific policy tests.	2020-01-16 08:13:06 +00:00
nolancon	51f1af0395	Move test case 'TopologyHint not set' into individual policy tests	2020-01-16 08:13:06 +00:00
nolancon	8466a5852a	Restore policy_test.go to upstream Following commits will contain incremental changes to this file to ease review process and ensure all tests are accounted for.	2020-01-16 08:13:06 +00:00
nolancon	59bb6c4d6f	Update checks in mergeProvidersHints: - Initialize best Hint to TopologyHint{} - Update checks. - Move generic unit test case into policy specific tests and updated expected outcome to reflect changes.	2020-01-16 08:13:06 +00:00
nolancon	6758f95117	Restore original policy none test cases: Mistakenly overwritten in earlier commit	2020-01-16 08:13:06 +00:00
nolancon	2d1a535a35	Make mergePermutation generic: - Remove policy parameters to make function generic - Move function into top level policy.go	2020-01-16 08:13:06 +00:00
nolancon	5487941485	Refactor filterHints: - Restructure function - Remove bug fix for catching {nil true} - To be fixed in later commit - Restore unit tests to original state for testing filterHints	2020-01-16 08:13:06 +00:00
nolancon	adfd11f38f	Make iterateAllProviderTopologyHints generic: - Remove policy parameters to make this function generic. - Move function out of individual policies and into policy.go	2020-01-16 08:13:06 +00:00
nolancon	e43f0a5293	Reinstate canAdmitPodResult in policy_none: This is to keep consistency with the other policies. This change may be made across all policies in a future PR, but removing it from the scope of this PR for now.	2020-01-16 08:13:05 +00:00
nolancon	4cc5b9e46c	Edit hints returned from policies and unit tests: - Best Effort Policy: Return hint with nil affinity as opposed to defaultAffinity when provider has no preference for NUMA affinity or no possible NUMA affinities. - Single NUMA Node Policy: Remove defaultHint from mergeProvidersHints. Instead return appropriate TopologyHint where required. - Update unit tests to reflect changes. Some test cases moved into individual policy test functions due to differing returned affinities per policy.	2020-01-16 08:13:05 +00:00
nolancon	e3d0c9397f	Updates to single-numa-node policy: - Remove getHintMatch method. - Replace with simplified versions of mergePermutation and iterateAllProviderTopologyHints methods - as used in best-effort. - Remove getHintMatch unit tests.	2020-01-16 08:13:05 +00:00
nolancon	b5ca4989e3	Update unit tests: - Update filterHints test to reflect changes in previous commit. - Some common test cases achieve differing expected results based on policy due to independent merge strategies. These cases are moved into individual policy based test functions.	2020-01-16 08:13:05 +00:00
nolancon	17d615bca2	Update filterHints: - Only append valid preferred-true hints to filtered - Return true if allResourceHints only consist of nil-affinity/preferred-true hints: {nil true}, update defaultHint preference accordingly.	2020-01-16 08:13:05 +00:00
Adrian Chiris	9f21f49493	Additional unit tests for Topology Manager methods	2020-01-16 08:13:05 +00:00
Adrian Chiris	f886d2a832	Update single-numa-node policy unit tests	2020-01-16 08:13:05 +00:00
Adrian Chiris	2825a7be1a	Add new functionality for single-numa-node policy: Explanation taken from original commit: - Change the current method of finding the best hint. Instead of going over all permutations, sort the hints and find the narrowest hint common to all resources. - Break out early when merging to a preferred hint is not possible	2020-01-16 08:13:05 +00:00
Adrian Chiris	5ce2ea2773	Return defaultAffinity from PolicyBestEffort: Now that PolicySingleNUMANode is not considered here, return defaultAffinity as was the original case before previous bug fix	2020-01-16 08:13:05 +00:00
Adrian Chiris	eda1521562	Make mergeProviderHints policy-specific: - Remove need to pass policy and numaNodes as arguments - Remove PolicySingleNUMANode special case check in policy_best_effort - Add mergeProviderHints base to policy_single_numa_node for upcoming commit	2020-01-16 08:13:05 +00:00
Adrian Chiris	dc36924c37	Update policy_none removing canAdmitPodResult Update unit tests for none_policy Add Name test for policy_restricted	2020-01-16 08:13:05 +00:00
Adrian Chiris	cf8b098dda	Refactor policy-best-effort - Modularize code with mergePermutation method	2020-01-16 08:13:05 +00:00
Sascha Grunert	278717bc57	Fix ineffectual assignment to CPUSets Signed-off-by: Sascha Grunert <sgrunert@suse.com>	2020-01-16 08:57:42 +01:00
Kevin Klues	34b942a41d	Remove check for empty activePods list in CPUManager removeStaleState This check is redundant since we protect this call with a call to `m.sourcesReady.AllReady()` earlier on. Moreover, having this check in place means that we will leave some stale state around in cases where there are actually no active pods in the system and this loop hasn't cleaned them up yet. This can happen, for example, if a pod exits while the kubelet is down for some reason. We see this exact case being triggered in our e2e tests, where a test has been failing since October when this change was first introduced.	2020-01-15 20:09:24 +01:00
Kevin Klues	5802f3a910	Add proper activePods list in TestGetTopologyHints for CPUManager	2020-01-15 20:08:41 +01:00
Eric Ernst	cdd92d39b7	preemption: typo cleanup Signed-off-by: Eric Ernst <eric.ernst@intel.com>	2020-01-15 10:55:40 -08:00
Kubernetes Prow Robot	208cc18c69	Merge pull request #86728 from mattjmcnaughton/mattjmcnaughton/add-test-coverage-oom-watcher Add test coverage for oom watcher	2020-01-15 01:21:33 -08:00
Kubernetes Prow Robot	de34d2ce1e	Merge pull request #87193 from mattjmcnaughton/mattjmcnaughton/cleanup-rkt-code-in-pleg Clean up rkt specific code in `pkg/kubelet/pleg`	2020-01-14 22:21:46 -08:00
mattjmcnaughton	9077603b97	Add richer unit tests for OomWatcher Add unit tests for OomWatcher that actually test the logic defined in the `Start` method. As a result of an earlier refactor, it's now trivial to mock the OOMInstance events which the `oom_watcher` is supposed to be watching.	2020-01-14 07:57:06 -05:00
mattjmcnaughton	ab7e0f58d5	Clean up rkt specific code in `pkg/kubelet/pleg` Clean up code in PLEG which was only necessary for the `rkt` runtime. Rkt is no longer a built-in runtime and docker(shim) uses the CRI, so it's safe to remove this code entirely. This diff removes the last mentions of `rkt` in the kubelet.	2020-01-14 07:42:30 -05:00
Kubernetes Prow Robot	be26fbc638	Merge pull request #86282 from RainbowMango/pr_refactor_resource_endpoint Refactor kubelet resource metrics	2020-01-14 02:23:09 -08:00
Kubernetes Prow Robot	f4db8212be	Merge pull request #76496 from danielqsj/metrics-2 Clean deprecated metrics	2020-01-13 20:53:09 -08:00
Kubernetes Prow Robot	61d36e4a43	Merge pull request #85850 from danwinship/kubelet-ipv6-node-ip Allow "kubelet --node-ip ::" to mean prefer IPv6	2020-01-13 17:41:08 -08:00
Kubernetes Prow Robot	8467561f2c	Merge pull request #86783 from mattjmcnaughton/mattjmcnaughton/remove-unnecessary-modification-container-pid-namespace Remove no longer needed `modifyContainerPIDNamespaceOverrides`	2020-01-10 15:43:50 -08:00
Kubernetes Prow Robot	befc371364	Merge pull request #86702 from mattjmcnaughton/mattjmcnaughton/refactor-oom-watcher-to-allow-greater-test-coverage Refactor oom watcher to allow greater test coverage	2020-01-10 15:43:37 -08:00
danielqsj	ab182552b4	clean SinceInMicroseconds, convert to SinceInSeconds	2020-01-10 17:05:38 +08:00
danielqsj	8ae3f80048	remove deprecated metrics of dockershim	2020-01-10 17:05:38 +08:00
danielqsj	1a9b121764	remove deprecated metrics of kubelet	2020-01-10 16:46:52 +08:00
Kubernetes Prow Robot	7a50fdb2a6	Merge pull request #85993 from chendotjs/fix-cidr kubenet: replace gateway with cni result	2020-01-09 20:13:04 -08:00
Kubernetes Prow Robot	792fe793a1	Merge pull request #86946 from cchord/fix_typo fix typo	2020-01-08 14:46:24 -08:00
Kubernetes Prow Robot	fd0358fd21	Merge pull request #86689 from klueska/upstream-fix-cpumanager-v1-state-checksum Lock checksum calculation for v1 CPUManager state to pre 1.18 logic	2020-01-08 02:57:40 -08:00
Jiahao Zhu	680df17f39	fix typo	2020-01-08 15:48:58 +08:00
Kubernetes Prow Robot	8dca390262	Merge pull request #84927 from mattjmcnaughton/mattjmcnaughton/fix-kubelet-config-common Fix golint failures for pkg/kubelet/config/...	2020-01-07 21:09:40 -08:00
mattjmcnaughton	8897c435ad	Refactor oom watcher to allow greater test coverage This diff contains a strict refactor; there are no behavioral changes. Address a long standing TODO in `oom_watcher_linux_test.go` around test coverage. We refactor our `oom.Watcher` so it takes in a struct fulfilling the `streamer` interface (i.e. defines `StreamOoms` method). In production, we will continue to use the `oomparser` from `cadvisor`. However, for testing purposes, we can now create our own `fakeStreamer`, and control how it streams `oomparser.OomInstance`. With this fake, we can implement richer unit testing for the `oom.Watcher` itself. Actually adding the additional unit tests will come in a later commit.	2020-01-07 21:48:14 -05:00
Kubernetes Prow Robot	49e24adf3e	Merge pull request #86832 from mattjmcnaughton/mattjmcnaughton/remove-dead-code-in-fake-docker-client Remove dead code in fake docker client	2020-01-07 07:36:18 -08:00
Dan Winship	ce68edf700	Allow "kubelet --node-ip ::" to mean prefer IPv6	2020-01-07 07:53:21 -05:00
Kubernetes Prow Robot	f3df7a2fdb	Merge pull request #86727 from mattjmcnaughton/mattjmcnaughton/remove-recorder-PastEventf Remove `recorder.PastEventf` method	2020-01-07 04:38:49 -08:00
Kubernetes Prow Robot	dd5272b76f	Merge pull request #86575 from gongguan/nkubemark kubemark use remote cri	2020-01-07 03:20:46 -08:00
Kubernetes Prow Robot	195e8e3ad9	Merge pull request #86844 from mattjmcnaughton/mattjmcnaughton/update-cadvisor-stats-provider-comment Correct comment around which integrations require cadvisor_stats	2020-01-07 01:13:14 -08:00
Kubernetes Prow Robot	8b8f2aa4a5	Merge pull request #85431 from irbull/api-doc Add public documentation for kubelet/apis/config	2020-01-06 23:12:18 -08:00
louisgong	324e5ce7e3	hollow-node use remote CRI	2020-01-07 11:00:45 +08:00
Kubernetes Prow Robot	59b4933fb8	Merge pull request #86724 from gongguan/fix-fake-CRI fix fake remote CRI	2020-01-06 18:06:57 -08:00
Kubernetes Prow Robot	49bc696614	Merge pull request #86251 from bboreham/pleg-last-seen-metric Kubelet: add a metric to observe time since PLEG last seen	2020-01-06 18:06:18 -08:00
Kubernetes Prow Robot	d6412b856f	Merge pull request #84345 from danielqsj/withdialer replace grpc.WithDialer which is deprecated	2020-01-06 15:56:17 -08:00
Kubernetes Prow Robot	19ecd690fa	Merge pull request #86646 from yutedz/client-protocol Require client / server protocols	2020-01-06 13:34:18 -08:00
Kubernetes Prow Robot	b112ad4f0b	Merge pull request #86845 from mattjmcnaughton/mattjmcnaughton/remove-rkt-from-runtime-options Remove `rkt` from container runtime options	2020-01-06 11:12:29 -08:00
Kubernetes Prow Robot	9acf7d11fe	Merge pull request #86344 from klueska/upstream-cm-approver Add klueska as an approver in pkg/kubelet/cm/OWNERS	2020-01-06 09:54:16 -08:00
Ted Yu	906adbdfcd	Require client / server protocols	2020-01-06 08:50:04 -08:00
mattjmcnaughton	794d0d9b4d	Remove `rkt` from container runtime options Part of efforts to clean up mentions of rkt in kubelet. rkt was removed entirely in 1.11, in favor of using `rktlet` and CRI instead. It should no longer be listed at all as a runtime.	2020-01-05 09:27:38 -05:00
mattjmcnaughton	06b44c76fd	Correct comment around which integrations require cadvisor_stats This commit is part of a larger effort to clean up references to `rkt` in the kubelet. Previously, this comment hard-coded which integrations required the cadvisor stats provider. The comment has grown stale (i.e. referenced rkt and did not reference cri-o). Update the comment to instead point to the code which determines which integrations need the cadvisor stats provider.	2020-01-05 09:23:09 -05:00
mattjmcnaughton	f2cb1f35fe	Remove dead code in fake docker client The `FakeDockerClient` had a number of methods defined on it which were not being called anywhere. The majority were of the form `Assert...`. In the spirit of removing dead code, remove the methods which aren't being called.	2020-01-05 08:31:59 -05:00
louisgong	e8eb5c656b	fix fake remote CRI	2020-01-04 08:43:17 +08:00
Aresforchina	2293b47346	add some comments for const variable	2020-01-03 23:28:21 +08:00
Bryan Boreham	cc0b3e82eb	Kubelet: add a metric to observe time since PLEG last seen Expose the measurement that kubelet uses to judge that "PLEG is unhealthy". If we can observe the measurement growing then we can alert before the node goes unhealthy. Note that the existing metrics PLEGRelistInterval and PLEGRelistDuration are poor for this, because when relist() gets stuck they are never updated. Signed-off-by: Bryan Boreham <bryan@weave.works>	2020-01-03 10:01:27 +00:00
mattjmcnaughton	d09fe8e247	Remove no longer needed `modifyContainerPIDNamespaceOverrides` As of https://github.com/kubernetes/kubernetes/pull/72831/, the minimum kubernetes version is now 1.13.1. As a result, this function becomes a no-op. As the TODO indicates, we should delete it.	2020-01-02 09:09:02 -05:00
mattjmcnaughton	92940fa80d	Remove `recorder.PastEventf` method The `recorder.PastEventf` method wasn't actually working as advertised. It was supposed to accept a timestamp, which would be used when generating the event. However, as the [source code](https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/client-go/tools/record/event.go#L316) shows, this `timestamp` was never actually used. In other words, `PastEventf` is identical to `Eventf`. We have two options: one would be to fix `PastEventf` so that it works as advertised. The other would be to delete `PastEventf` and only support `Eventf`. Ultimately, I could only find one use of `PastEventf` in the code base, so I propose we just delete `PastEventf` and convert all uses to `Eventf`.	2019-12-30 12:00:23 -05:00
Kevin Klues	b373121a14	Make CPUManagerCheckpointV2 type an alias of CPUManagerCheckpoint This change is to prevent problems when we remove the V1->V2 migration code in the future. Without this, the checksums of all checkpoints would be hashed with the name CPUManagerCheckpointV2 embedded inside of them, which is undesirable. We want the checkpoints to be hashed with the name CPUManagerCheckpoint instead.	2019-12-28 19:29:13 +01:00
Kevin Klues	5faf8f4c52	Lock checksum calculation for v1 CPUManager state to pre 1.18 logic The updated CPUManager from PR #84462 implements logic to migrate the CPUManager checkpoint file from an old format to a new one. To do so, it defines the following types: ``` type CPUManagerCheckpoint = CPUManagerCheckpointV2 type CPUManagerCheckpointV1 struct { ... } type CPUManagerCheckpointV2 struct { ... } ``` This replaces the old definition of just: ``` type CPUManagerCheckpoint struct { ... } ``` Code was put in place to ensure proper migration from checkpoints in V1 format to checkpoints in V2 format. However (and this is a big however), all of the unit tests were performed on V1 checkpoints that were generated using the type name `CPUManagerCheckpointV1` and not the original type name of `CPUManagerCheckpoint`. As such, the checksum in the checkpoint file uses the `CPUManagerCheckpointV1` type to calculate its checksum and not the original type name of `CPUManagerCheckpoint`. This causes problems in the real world since all pre-1.18 checkpoint files will have been generated with the original type name of `CPUManagerCheckpoint`. When verifying the checksum of the checkpoint file across an upgrade to 1.18, the checksum is calculated assuming a type name of `CPUManagerCheckpointV1` (which is incorrect) and the file is seen to be corrupt. This patch ensures that all V1 checksums are verified against a type name of `CPUManagerCheckpoint` instead of `CPUManagerCheckpointV1`. It also locks the algorithm used to calculate the checksum in place, since it will never change in the future (for pre-1.18 checkpoint files at least).	2019-12-28 14:17:55 +01:00
danielqsj	19fe9f8d94	replace grpc.WithDialer which is deprecated	2019-12-26 17:46:59 +08:00
SataQiu	2497a1209b	bump k8s.io/utils version	2019-12-21 14:54:44 +08:00
Kubernetes Prow Robot	03e90b80ce	Merge pull request #86167 from yiyang5055/change-CounterVec-to-Counter change CounterVec to use Counter in the Kubelet's Pod Lifecycle Event…	2019-12-19 11:33:56 -08:00
whypro	f4bd4e2e96	Return error instead of panic when cpu manager fails to start.	2019-12-19 21:56:23 +08:00
chenyaqi01	c5002a348e	kubenet: replace gateway with cni result	2019-12-19 18:32:25 +08:00
Jacek Kaniuk	4303be3d9f	Revert pull request #85879 "hollow-node use remote CRI"	2019-12-19 10:52:35 +01:00
sshukun	12a5bdec00	Fix go-lint issues in package pkg/kubelet/checkpointmanager/testing/example_checkpoint_formats/v1	2019-12-19 12:53:33 +09:00
Kubernetes Prow Robot	814fc34cde	Merge pull request #85879 from gongguan/cri-kubemark hollow-node use remote CRI	2019-12-18 06:01:57 -08:00
louisgong	e8e1cc9ee0	extract PreInitRuntimeService from NewMainKubelet	2019-12-18 11:48:29 +08:00
Kubernetes Prow Robot	40df9f82d0	Merge pull request #82492 from gnufied/fix-uncertain-mounts Fix uncertain mounts	2019-12-17 14:49:57 -08:00


8604 Commits