kubernetes

Author	SHA1	Message	Date
Kubernetes Submit Queue	14b32888de	Merge pull request #52635 from Random-Liu/fix-cri-stats Automatic merge from submit-queue (batch tested with PRs 51337, 47080, 52646, 52635, 52666). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.. Fix CRI container/imagefs stats. `ContainerStats`, `ListContainerStats` and `ImageFsInfo` are returning `not implemented` error now. This PR fixes it. @yujuhong @feiskyer @yguo0905	2017-09-19 17:31:11 -07:00
Kubernetes Submit Queue	0bd2ed16a0	Merge pull request #47080 from jingxu97/May/allocatable Automatic merge from submit-queue (batch tested with PRs 51337, 47080, 52646, 52635, 52666). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.. Map a resource to multiple signals in eviction manager It is possible to have multiple signals that point to the same type of resource, e.g., both SignalNodeFsAvailable and SignalAllocatableNodeFsAvailable refer to the same resource NodeFs. Change the map from map[v1.ResourceName]evictionapi.Signal to map[v1.ResourceName][]evictionapi.Signal What this PR does / why we need it: Which issue this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged): fixes #52661 Special notes for your reviewer: Release note: ```release-note ```	2017-09-19 17:31:07 -07:00
Kubernetes Submit Queue	08486ab4aa	Merge pull request #52561 from jiayingz/deviceplugin-failure Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.. Fixes a race in deviceplugin/manager_test.go and a race in deviceplug… …in/manager.go. What this PR does / why we need it: Which issue this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged): fixes # https://github.com/kubernetes/kubernetes/issues/52560 Special notes for your reviewer: Tested with go test -count 50 -race k8s.io/kubernetes/pkg/kubelet/deviceplugin and all runs passed. Release note: ```release-note ```	2017-09-19 13:35:44 -07:00
Kubernetes Submit Queue	f80999f438	Merge pull request #48970 from caseydavenport/fix-kubelet-restart Automatic merge from submit-queue (batch tested with PRs 48970, 52497, 51367, 52549, 52541). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.. Recreate pod sandbox when the sandbox does not have an IP address. What this PR does / why we need it: Attempts to fix a bug where Pods do not receive networking when the kubelet restarts during pod creation. Which issue this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged): fixes # https://github.com/kubernetes/kubernetes/issues/48510 Release note: ```release-note NONE ```	2017-09-19 01:17:39 -07:00
wackxu	d8aa0ca82a	fix the bad code comment and make the format unify	2017-09-19 11:15:10 +08:00
Chakravarthy Nelluri	b8d1c3bcd8	Fix volume remount on reboot	2017-09-18 16:28:21 -04:00
Jiaying Zhang	34dccc5d2a	Fixes some races in deviceplugin manager_test.go and manager.go.	2017-09-18 13:19:51 -07:00
Lantao Liu	d387eab817	Fix CRI container/imagefs stats.	2017-09-18 07:48:20 +00:00
FengyunPan	bfc171ccaa	Improve codes which checks whether sandbox contains containers Currently when evictSandboxes() checks whether sandbox contains containers, it traverses all the containers for every sandbox, but when cluster has many containres, it wastes a lot of time. It is better to use sets in this case.	2017-09-18 14:34:34 +08:00
Kubernetes Submit Queue	3277de69b4	Merge pull request #52176 from liggitt/heartbeat-timeout Automatic merge from submit-queue (batch tested with PRs 52176, 43152). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.. Eliminate hangs/throttling of node heartbeat Fixes https://github.com/kubernetes/kubernetes/issues/48638 Fixes #50304 Stops kubelet from wedging when updating node status if unable to establish tcp connection. Notes that this only affects the node status loop. The pod sync loop would still hang until the dead TCP connections timed out, so more work is needed to keep the sync loop responsive in the face of network issues, but this change lets existing pods coast without the node controller trying to evict them ```release-note kubelet to master communication when doing node status updates now has a timeout to prevent indefinite hangs ```	2017-09-16 09:45:29 -07:00
supereagle	87c29a08e1	fix typos: remove duplicated word in comments	2017-09-16 14:38:10 +08:00
David Porter	aee1e58d58	Handle nil WritableLayer	2017-09-16 00:13:17 +00:00
David Porter	0b1f806557	Fix nil dereference if storage id is nil	2017-09-16 00:13:04 +00:00
Clayton Coleman	eb0cab5b18	Do not set message when terminationMessagePath not found If terminationMessagePath is set to a file that does not exist, we should not log an error message and instead try falling back to logs (based on the user's request).	2017-09-15 16:27:36 -04:00
Casey Davenport	94bf2b0ccf	Attempt at fixing UTs	2017-09-15 09:23:52 -07:00
Casey Davenport	be5cd7fed2	Recreate pod sandbox when the sandbox does not have an IP address.	2017-09-15 09:23:52 -07:00
Kubernetes Submit Queue	b5fbd71bbc	Merge pull request #52290 from jiayingz/deviceplugin-failure Automatic merge from submit-queue (batch tested with PRs 52452, 52115, 52260, 52290) Fixes device plugin re-registration handling logic to make sure: - If a device plugin exits, its exported resource will be removed. - No capacity change if a new device plugin instance comes up to replace the old instance. What this PR does / why we need it: Which issue this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged): fixes https://github.com/kubernetes/kubernetes/issues/52510 Special notes for your reviewer: Release note: ```release-note ```	2017-09-15 02:00:08 -07:00
Kubernetes Submit Queue	86dc5fceda	Merge pull request #52451 from yujuhong/enable-cri-stats Automatic merge from submit-queue (batch tested with PRs 51824, 50476, 52451, 52009, 52237) kubelet: enable CRI container metrics Fixes #46984	2017-09-15 01:08:05 -07:00
Kubernetes Submit Queue	ce5c41ab0f	Merge pull request #52363 from balajismaniam/fix-cpuman-restartpol-never-bug Automatic merge from submit-queue (batch tested with PRs 52442, 52247, 46542, 52363, 51781) Make CPU manager release CPUs when Pod enters completed phase. What this PR does / why we need it: When CPU manager is enabled, this PR releases allocated CPUs when container is not running and is non-restartable. Which issue this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged): fixes #52351 Special notes for your reviewer: This bug is only reproduced for pods with `restartPolicy` = `Never` or `OnFailure`. The following output is from a 4 CPU node. This bug can be reproduced as long >= half the cores are requested. pod1.yaml: ``` apiVersion: v1 kind: Pod metadata: name: test-pod1 spec: containers: - image: ubuntu command: ["/bin/bash"] args: ["-c", "sleep 5"] name: test-container1 resources: requests: cpu: 2 memory: 100Mi limits: cpu: 2 memory: 100Mi restartPolicy: "Never" ``` pod2.yaml: ``` apiVersion: v1 kind: Pod metadata: name: test-pod2 spec: containers: - image: ubuntu command: ["/bin/bash"] args: ["-c", "sleep 5"] name: test-container1 resources: requests: cpu: 2 memory: 100Mi limits: cpu: 2 memory: 100Mi restartPolicy: "Never" ``` Run a local Kubernetes cluster with CPU manager enabled. ```sh KUBELET_FLAGS='--feature-gates=CPUManager=true --cpu-manager-policy=static --cpu-manager-reconcile-period=1s --kube-reserved=cpu=500m' ./hack/local-up-cluster.sh ``` _Before:_ Create `test-pod1` using pod1.yaml. ``` ./cluster/kubectl.sh create -f pod1.yaml ``` Wait for the pod to complete and wait another 90 seconds (give enough time for GC to kick-in). Create `test-pod2` using pod2.yaml. ``` ./cluster/kubectl.sh create -f pod2.yaml ``` Get all pods in the cluster. ``` ./cluster/kubectl.sh get pods -a NAME READY STATUS RESTARTS AGE test-pod1 0/1 Completed 0 1m test-pod2 0/1 not enough cpus available to satisfy request 0 9s ``` _After:_ Create `test-pod1` using pod1.yaml. ``` ./cluster/kubectl.sh create -f pod1.yaml ``` Wait for the pod to complete and wait another 90 seconds (give enough time for GC to kick-in). Create `test-pod2` using pod2.yaml. ``` ./cluster/kubectl.sh create -f pod2.yaml ``` Get all pods in the cluster. ``` ./cluster/kubectl.sh get pods -a NAME READY STATUS RESTARTS AGE test-pod1 0/1 Completed 0 1m test-pod2 0/1 Completed 0 9s ```	2017-09-15 00:11:14 -07:00
Kubernetes Submit Queue	20a4112e88	Merge pull request #46542 from derekwaynecarr/quota-ignore-pod-whose-node-lost Automatic merge from submit-queue (batch tested with PRs 52442, 52247, 46542, 52363, 51781) Ignore pods for quota marked for deletion whose node is unreachable What this PR does / why we need it: Traditionally, we charge to quota all pods that are in a non-terminal phase. We have a user report that noted the behavior change in kube 1.5 for the node controller to no longer force delete pods whose nodes have been lost. Instead, the pod is marked for deletion, and the reason is updated to state that the node is unreachable. The user expected the quota to be released. If the user was at their quota limit, their application may not be able to create a new replica given the current behavior. As a result, this PR ignores pods marked for deletion that have exceeded their grace period. Which issue this PR fixes xref https://bugzilla.redhat.com/show_bug.cgi?id=1455743 fixes https://github.com/kubernetes/kubernetes/issues/52436 Release note: ```release-note Ignore pods marked for deletion that exceed their grace period in ResourceQuota ```	2017-09-15 00:11:10 -07:00
yanxuean	3150d3ebcb	improve the relation of ExecInContainer and Exec keep the relation between ExecInContainer and Exec be consistence with PortForward in streaming server Signed-off-by: yanxuean <yan.xuean@zte.com.cn>	2017-09-15 09:29:46 +08:00
Jiaying Zhang	5cac9fc984	Fixes device plugin re-registration handling logic to make sure: - If a device plugin exits, its exported resource will be removed. - No capacity change if a new device plugin instance comes up to replace the old instance.	2017-09-14 15:24:46 -07:00
Jordan Liggitt	f8f57d8959	Use separate client for node status loop	2017-09-14 15:56:22 -04:00
David Porter	a854ddb358	Implement metrics for Windows Nodes This implements stats for windows nodes in a new package, winstats. WinStats exports methods to get cadvisor like datastructures, however with windows specific metrics. WinStats only gets node level metrics and information, container stats will go via the CRI. This enables the use of the summary api to get metrics for windows nodes.	2017-09-14 06:32:51 +00:00
Yu-Ju Hong	2c415cc506	kubelet: enable CRI container metrics	2017-09-13 15:09:35 -07:00
Lee Verberne	e2e6a8cd85	Fix typo in kubelet kuberuntime container test Changes "Expetected" to "Expected"	2017-09-13 14:32:48 +02:00
Kubernetes Submit Queue	c6a9b1e198	Merge pull request #52125 from yujuhong/fix-file-sync Automatic merge from submit-queue (batch tested with PRs 52339, 52343, 52125, 52360, 52301) dockershim: check if f.Sync() returns an error and surface it ```release-note dockershim: check the error when syncing the checkpoint. ```	2017-09-12 21:45:56 -07:00
Balaji Subramaniam	e2e356964a	Make CPU manager release allocated CPUs when container enters completed phase.	2017-09-12 21:01:01 -07:00
Kubernetes Submit Queue	b04f81d342	Merge pull request #52344 from smarterclayton/no_log_pull Automatic merge from submit-queue (batch tested with PRs 48226, 52046, 52231, 52344, 52352) Log at higher verbosity levels some common SyncPod errors This log message was 90% of all glog.Errorf level statements reported on a production cluster, hiding other more impactful errors. We already log it in start container, but for extra caution we continue to log it at v(3) here (the downside of not logging a start container error is worse than some log spam at higher levels). HandleError() is intended only for unknown and unexpected errors. ```release-note NONE ``` @derekwaynecarr @sjenning	2017-09-12 19:40:03 -07:00
Kubernetes Submit Queue	32f1521cc2	Merge pull request #52046 from dashpole/soft_eviction Automatic merge from submit-queue (batch tested with PRs 48226, 52046, 52231, 52344, 52352) [BugFix] Soft Eviction timer works correctly fixes #51516 thresholdsMet should not exclude previously met thresholds when we do not have new stats for a threshold. /assign @vishh @derekwaynecarr cc @kubernetes/sig-node-bugs	2017-09-12 19:39:55 -07:00
Kubernetes Submit Queue	8e95e39c15	Merge pull request #52297 from derekwaynecarr/code-hygiene Automatic merge from submit-queue (batch tested with PRs 51041, 52297, 52296, 52335, 52338) Use cAdvisor constant for crio imagefs What this PR does / why we need it: code hygiene to use a constant from cAdvisor Release note: ```release-note NONE ```	2017-09-12 11:10:10 -07:00
Clayton Coleman	a5ac80cbce	Log at higher verbosity levels some common SyncPod errors	2017-09-12 10:52:31 -04:00
Kubernetes Submit Queue	d8847a8f1d	Merge pull request #52119 from mtaufen/sync-files Automatic merge from submit-queue fsync config checkpoint files after writing @yujuhong brought up that it's possible for a hard reboot to result in empty checkpoint files, if they haven't been synced to disk yet. This PR ensures that Kubelet configuration checkpoints are synced after writing to avoid this issue. fixes #52222 Release note: ```release-note NONE ```	2017-09-12 05:41:25 -07:00
Kubernetes Submit Queue	01154dd3cf	Merge pull request #51870 from feiskyer/sandbox-creds Automatic merge from submit-queue (batch tested with PRs 52264, 51870) Use credentials from providers for docker sandbox image What this PR does / why we need it: Sandbox image lookup uses creds from docker config only; other credential providers are ignored. This is a regression introduced in dockershim. Which issue this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged): fixes #51293 Special notes for your reviewer: Should also cherry-pick this to release-1.6 and release-1.7. Release note: ```release-note Fix credentials providers for docker sandbox image. ```	2017-09-12 02:10:24 -07:00
yanxuean	799d0e5a6e	correct to handler	2017-09-12 13:47:08 +08:00
Derek Carr	cf2c688385	Use cAdvisor constant for crio imagefs	2017-09-11 14:08:00 -04:00
Derek Carr	da01c6d3a2	Ignore pods for quota that exceed deletion grace period	2017-09-11 13:31:52 -04:00
Yu-Ju Hong	aaf26b2eaa	dockershim: remove support for legacy containers The code was first introduced in 1.6 to help pre-CRI-kubelet upgrade to using the CRI implementation. They can safely be removed now.	2017-09-11 08:44:27 -07:00
xiangpengzhao	0484a1c2c5	Remove backward compatibility of hostportChainName	2017-09-10 00:24:00 +08:00
Kubernetes Submit Queue	d6df4a5127	Merge pull request #52063 from mtaufen/dkcfg-e2enode Automatic merge from submit-queue (batch tested with PRs 52047, 52063, 51528) Improve dynamic kubelet config e2e node test and fix bugs Rather than just changing the config once to see if dynamic kubelet config at-least-sort-of-works, this extends the test to check that the Kubelet reports the expected Node condition and the expected configuration values after several possible state transitions. Additionally, this adds a stress test that changes the configuration 100 times. It is possible for resource leaks across Kubelet restarts to eventually prevent the Kubelet from restarting. For example, this test revealed that cAdvisor's leaking journalctl processes (see: https://github.com/google/cadvisor/issues/1725) could break dynamic kubelet config. This test will help reveal these problems earlier. This commit also makes better use of const strings and fixes a few bugs that the new testing turned up. Related issue: #50217 I had been sitting on this until the cAdvisor fix merged in #51751, as these tests fail without that fix. Release note: ```release-note NONE ```	2017-09-08 16:06:56 -07:00
Pengfei Ni	4d5d97438b	Use credentials from providers for docker sandbox image	2017-09-09 07:02:04 +08:00
Kubernetes Submit Queue	943817f57b	Merge pull request #52047 from balajismaniam/cpuman-large-topo-test Automatic merge from submit-queue Added large topology tests for static policy in CPU Manager. What this PR does / why we need it: This PR adds a very large topology test case for the CPU Manager feature. Related to #51180. CC @ConnorDoyle	2017-09-08 15:57:41 -07:00
Kevin	f50761c9d4	fix prober ticking shift for kubelet restarted cases	2017-09-08 17:31:02 +08:00
Yu-Ju Hong	a850614613	dockershim: check if f.Sync() returns an error and surface it	2017-09-07 16:05:02 -07:00
Michael Taufen	a846ba191c	Improve dynamic kubelet config e2e node test and fix bugs Rather than just changing the config once to see if dynamic kubelet config at-least-sort-of-works, this extends the test to check that the Kubelet reports the expected Node condition and the expected configuration values after several possible state transitions. Additionally, this adds a stress test that changes the configuration 100 times. It is possible for resource leaks across Kubelet restarts to eventually prevent the Kubelet from restarting. For example, this test revealed that cAdvisor's leaking journalctl processes (see: https://github.com/google/cadvisor/issues/1725) could break dynamic kubelet config. This test will help reveal these problems earlier. This commit also makes better use of const strings and fixes a few bugs that the new testing turned up. Related issue: #50217	2017-09-07 15:50:17 -07:00
Michael Taufen	47beb80368	fsync config checkpoint files after writing	2017-09-07 14:42:18 -07:00
Kubernetes Submit Queue	ae6b329368	Merge pull request #51644 from sjenning/init-container-status-fix Automatic merge from submit-queue (batch tested with PRs 51239, 51644, 52076) do not update init containers status if terminated fixes #29972 #41580 This fixes an issue where, if a completed init container is removed while the pod or subsequent init containers are still running, the status for that init container will be reset to `Waiting` with `PodInitializing`. This can manifest in a number of ways. If the init container is removed why the main pod containers are running, the status will be reset with no functional problem but the status will be reported incorrectly in `kubectl get pod` for example If the init container is removed why a subsequent init container is running, the init container will be re-executed leading to all manner of badness. @derekwaynecarr @bparees	2017-09-07 14:31:23 -07:00
Derek Carr	27365eb900	Fix cross-build	2017-09-07 09:53:52 -04:00
Kubernetes Submit Queue	a51eb2ac4e	Merge pull request #49202 from cbonte/node-addresses Automatic merge from submit-queue (batch tested with PRs 51728, 49202) Fix setNodeAddress when a node IP and a cloud provider are set What this PR does / why we need it: When a node IP is set and a cloud provider returns the same address with several types, only the first address was accepted. With the changes made in PR #45201, the vSphere cloud provider returned the ExternalIP first, which led to a node without any InternalIP. The behaviour is modified to return all the address types for the specified node IP. Which issue this PR fixes: fixes #48760 Special notes for your reviewer: * I'm not a golang expert, is it possible to mock `kubelet.validateNodeIP()` to avoid the need of real host interface addresses in the test ? * It would be great to have it backported for a next 1.6.8 release. Release note: ```release-note NONE ```	2017-09-06 20:01:00 -07:00
Kubernetes Submit Queue	b6545a086c	Merge pull request #51728 from derekwaynecarr/cadvisor-stats Automatic merge from submit-queue (batch tested with PRs 51728, 49202) Enable CRI-O stats from cAdvisor What this PR does / why we need it: cAdvisor may support multiple container runtimes (docker, rkt, cri-o, systemd, etc.) As long as the kubelet continues to run cAdvisor, runtimes with native cAdvisor support may not want to run multiple monitoring agents to avoid performance regression in production. Pending kubelet running a more light-weight monitoring solution, this PR allows remote runtimes to have their stats pulled from cAdvisor when cAdvisor is registered stats provider by introspection of the runtime endpoint. See issue https://github.com/kubernetes/kubernetes/issues/51798 Special notes for your reviewer: cAdvisor will be bumped to pick up https://github.com/google/cadvisor/pull/1741 At that time, CRI-O will support fetching stats from cAdvisor. Release note: ```release-note NONE ```	2017-09-06 20:00:57 -07:00

... 3 4 5 6 7 ...

5510 Commits