Commit Graph

29256 Commits

Author SHA1 Message Date
Kubernetes Submit Queue
b5fbd71bbc Merge pull request #52290 from jiayingz/deviceplugin-failure
Automatic merge from submit-queue (batch tested with PRs 52452, 52115, 52260, 52290)

Fixes device plugin re-registration handling logic to make sure:

- If a device plugin exits, its exported resource will be removed.
- No capacity change if a new device plugin instance comes up to replace the old instance.



**What this PR does / why we need it**:

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes https://github.com/kubernetes/kubernetes/issues/52510

**Special notes for your reviewer**:

**Release note**:

```release-note
```
2017-09-15 02:00:08 -07:00
Kubernetes Submit Queue
b7953a787e Merge pull request #52260 from andyzhangx/azuremounter-issue
Automatic merge from submit-queue (batch tested with PRs 52452, 52115, 52260, 52290)

fix azure disk mounter issue

**What this PR does / why we need it**:
fix azure disk mounter issue, it's a P1 bug, it exists in 1.7, 1.8 release, should cherry pick to 1.7, 1.8

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
fixes #52261 

consider following issue: 
1) A pod mounting an azure disk in a k8s agent
2) The kubelet is restarted in that k8s agent
3) The pod could not start up, it always reports error as following:

  4d            1m              3065    kubelet, 14777acs9000                   Warning         FailedMount     MountVolume.SetUp failed for volume "pvc-7a0cdeb9-92c7-11e7-b86b-000d3a36d70c" : azureDisk - No
t a mounting point for disk andykubewin175-dynamic-pvc-7a0cdeb9-92c7-11e7-b86b-000d3a36d70c on \var\lib\kubelet\pods\d146c023-92c7-11e7-b86b-000d3a36d70c\volumes\kubernetes.io~azure-disk\pvc-7a0cdeb9-92c7-11
e7-b86b-000d3a36d70c
  4d            1m              3157    kubelet, 14777acs9000                   Warning         FailedMount     Error syncing pod

**Special notes for your reviewer**:
If you take a look at following implementation of vsphere or gce, it will return nil instead of error:
https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/vsphere_volume/vsphere_volume.go#L217-L220
https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/gce_pd/gce_pd.go#L273-L275

The logic of return info parsing here, it's wrong to return error
https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/util/operationexecutor/operation_generator.go#L469-L475

**Release note**:

```release-note
```
2017-09-15 02:00:03 -07:00
Kubernetes Submit Queue
935726f109 Merge pull request #52452 from gnufied/fix-quota-on-update
Automatic merge from submit-queue (batch tested with PRs 52452, 52115, 52260, 52290)

Fix support for updating quota on update

This PR implements support for properly handling quota when resources are updated. We never take negative values and add them up.

Fixes https://github.com/kubernetes/kubernetes/issues/51736 

cc @derekwaynecarr 

/sig storage

```release-note
Make sure that resources being updated are handled correctly by Quota system
```
2017-09-15 01:59:56 -07:00
Kubernetes Submit Queue
6b5462803e Merge pull request #52009 from zjj2wry/export-svc
Automatic merge from submit-queue (batch tested with PRs 51824, 50476, 52451, 52009, 52237)

fix issue(#47976)Invalid value error when creating service from expor…

…ted config



**What this PR does / why we need it**:
close issue #47976
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2017-09-15 01:08:08 -07:00
Kubernetes Submit Queue
86dc5fceda Merge pull request #52451 from yujuhong/enable-cri-stats
Automatic merge from submit-queue (batch tested with PRs 51824, 50476, 52451, 52009, 52237)

kubelet: enable CRI container metrics

Fixes #46984
2017-09-15 01:08:05 -07:00
Kubernetes Submit Queue
7181dd4946 Merge pull request #50476 from caesarxuchao/plumb-proxy
Automatic merge from submit-queue (batch tested with PRs 51824, 50476, 52451, 52009, 52237)

Plumbing the proxy dialer to the webhook admission plugin

* Fixing https://github.com/kubernetes/kubernetes/issues/49987. Plumb the `Dial` function to the `transport.Config`
* Fixing https://github.com/kubernetes/kubernetes/issues/52366. Let the webhook admission plugin sets the `TLSConfg.ServerName`.

I tested it in my gke setup. I don't have time to implement an e2e test before 1.8 release. I think it's ok to add the test later, because *i)* the change only affects the alpha webhook admission feature, and *ii)* the webhook feature is unusable without the fix. That said, it's up to my reviewer to decide.

Filed https://github.com/kubernetes/kubernetes/issues/52368 for the missing e2e test.

( The second commit is https://github.com/kubernetes/kubernetes/pull/52372, which is just a cleanup of client configuration in e2e tests. It removed a function that marshalled the client config to json and then unmarshalled it. It is a prerequisite of this PR, because this PR added the `Dial` function to the config which is not json marshallable.)

```release-note
Fixed the webhook admission plugin so that it works even if the apiserver and the nodes are in two networks (e.g., in GKE).
Fixed the webhook admission plugin so that webhook author could use the DNS name of the service as the CommonName when generating the server cert for the webhook.

Action required:
Anyone who generated server cert for admission webhooks need to regenerate the cert. Previously, when generating server cert for the admission webhook, the CN value doesn't matter. Now you must set it to the DNS name of the webhook service, i.e., `<service.Name>.<service.Namespace>.svc`.
```
2017-09-15 01:08:01 -07:00
Kubernetes Submit Queue
ce5c41ab0f Merge pull request #52363 from balajismaniam/fix-cpuman-restartpol-never-bug
Automatic merge from submit-queue (batch tested with PRs 52442, 52247, 46542, 52363, 51781)

Make CPU manager release CPUs when Pod enters completed phase. 

**What this PR does / why we need it**: When CPU manager is enabled, this PR releases allocated CPUs when container is not running and is non-restartable. 

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #52351

**Special notes for your reviewer**:
This bug is only reproduced for pods with `restartPolicy` = `Never` or `OnFailure`.  The following output is from a 4 CPU node. This bug can be reproduced as long >= half the cores are requested. 

pod1.yaml:
```
apiVersion: v1
kind: Pod
metadata:
  name: test-pod1
spec:
  containers:
  - image: ubuntu
    command: ["/bin/bash"]
    args: ["-c", "sleep 5"]
    name: test-container1
    resources:
      requests:
        cpu: 2
        memory: 100Mi
      limits:
        cpu: 2
        memory: 100Mi
  restartPolicy: "Never"
```

pod2.yaml:
```
apiVersion: v1
kind: Pod
metadata:
  name: test-pod2
spec:
  containers:
  - image: ubuntu
    command: ["/bin/bash"]
    args: ["-c", "sleep 5"]
    name: test-container1
    resources:
      requests:
        cpu: 2
        memory: 100Mi
      limits:
        cpu: 2
        memory: 100Mi
  restartPolicy: "Never"
```
Run a local Kubernetes cluster with CPU manager enabled. 
```sh
KUBELET_FLAGS='--feature-gates=CPUManager=true --cpu-manager-policy=static --cpu-manager-reconcile-period=1s --kube-reserved=cpu=500m' ./hack/local-up-cluster.sh
```
_Before:_
Create `test-pod1` using pod1.yaml. 
```
./cluster/kubectl.sh create -f pod1.yaml
```
Wait for the pod to complete and wait another 90 seconds (give enough time for GC to kick-in). 

Create `test-pod2` using pod2.yaml. 
```
./cluster/kubectl.sh create -f pod2.yaml
```

Get all pods in the cluster. 
```
./cluster/kubectl.sh get pods -a
NAME        READY     STATUS                                         RESTARTS   AGE
test-pod1   0/1       Completed                                      0          1m
test-pod2   0/1       not enough cpus available to satisfy request   0          9s
```

_After:_
Create `test-pod1` using pod1.yaml. 
```
./cluster/kubectl.sh create -f pod1.yaml
```
Wait for the pod to complete and wait another 90 seconds (give enough time for GC to kick-in). 

Create `test-pod2` using pod2.yaml. 
```
./cluster/kubectl.sh create -f pod2.yaml
```

Get all pods in the cluster. 
```
./cluster/kubectl.sh get pods -a
NAME        READY     STATUS      RESTARTS   AGE
test-pod1   0/1       Completed    0          1m
test-pod2   0/1       Completed    0          9s
```
2017-09-15 00:11:14 -07:00
Kubernetes Submit Queue
20a4112e88 Merge pull request #46542 from derekwaynecarr/quota-ignore-pod-whose-node-lost
Automatic merge from submit-queue (batch tested with PRs 52442, 52247, 46542, 52363, 51781)

Ignore pods for quota marked for deletion whose node is unreachable

**What this PR does / why we need it**:
Traditionally, we charge to quota all pods that are in a non-terminal phase.  We have a user report that noted the behavior change in kube 1.5 for the node controller to no longer force delete pods whose nodes have been lost.  Instead, the pod is marked for deletion, and the reason is updated to state that the node is unreachable.  The user expected the quota to be released.  If the user was at their quota limit, their application may not be able to create a new replica given the current behavior.  As a result, this PR ignores pods marked for deletion that have exceeded their grace period.

**Which issue this PR fixes**
xref https://bugzilla.redhat.com/show_bug.cgi?id=1455743
fixes https://github.com/kubernetes/kubernetes/issues/52436

**Release note**:
```release-note
Ignore pods marked for deletion that exceed their grace period in ResourceQuota
```
2017-09-15 00:11:10 -07:00
Harry Zhang
221db2bd6b Refactor node taint conditions 2017-09-15 14:34:11 +08:00
yanxuean
3150d3ebcb improve the relation of ExecInContainer and Exec
keep the relation between ExecInContainer and Exec be consistence with PortForward in streaming server

Signed-off-by: yanxuean <yan.xuean@zte.com.cn>
2017-09-15 09:29:46 +08:00
Madhan Raj Mookkandy
63020d5f72 Vendor changes
Vendoring (NEW) in github.com/Microsoft/hcsshim
2017-09-14 16:00:07 -07:00
Madhan Raj Mookkandy
5b87513972 Fix Bazel build 2017-09-14 15:50:47 -07:00
Madhan Raj Mookkandy
f503755e53 Add Windows Kernel Proxy support
Windows Kernel now exposes "Internal Load Balancing"
	using VFP (Virtual Filtering Platform) part of Virtual Switch. An inbuild
	windows service HNS (Host Networking Service) acts as interface to program
	the VFP. VFP is synonymous to iptables in functionality. HNS uses json based
	data as input.

	With the help of the interface available in github.com/Microsoft/hcsshim,
	these APIs are exposed to the world in github to program HNS and use
	the feature.

	*** More info about the changes in this PR ***
	(1) For every endpoint available in the system, an HNS Endpoint is added
	    (1.a) for local endpoints, a local HNS Endpoint would already exist, as part of
            container creation.
	    (1.b) For all remote endpoints, a remote HNS Endpoint is created via HNS

	(2) For every Service, a HNS ILB LoadBalancer is added referring the endpoints
	    created in (1)
		Sample Input to HNS:
		{
 	       "Policies":  [
        	                 {
                	             "ExternalPort":  80,
                        	     "InternalPort":  80,
	                             "Protocol":  6,
        	                     "Type":  "ELB",
                	             "VIPs":  [
                        	                  "11.0.98.129"
                                	      ]
	                         }
        	             ],
	        "References":  [
                           "/endpoints/ca8b877b-ab90-499a-bc0e-7d736c425632",
                           "/endpoints/ee0ef08b-8434-4f8b-b748-393884e77465"
        		]
    		}

	(2-a) This is done for Cluster IP, LoadBalancer Ingress IP, NodePort, External IP

	Following the regular service and endpoint updates,
	the HNS is notified of the updates and the system is kept in sync.
2017-09-14 15:50:47 -07:00
Chao Xu
186a0684d5 plumb the proxyTransport to the webhook admission plugin;
set the ServerName in the config for webhook admission plugin.
2017-09-14 15:35:12 -07:00
Kubernetes Submit Queue
8db782cc54 Merge pull request #52382 from spiffxp/test-apps-go-junit-repot
Automatic merge from submit-queue (batch tested with PRs 52376, 52439, 52382, 52358, 52372)

Workaround go-junit-report bug for TestApps

**What this PR does / why we need it**: Fix output from pkg/kubectl/apps/TestApps unit test

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #51253

**Special notes for your reviewer**: Literally copy-pasta of the approach taken in #45320.  Maybe a sign that this should be extracted into something shared. I'm just trying to see if we can make https://k8s-testgrid.appspot.com/kubernetes-presubmits and https://k8s-testgrid.appspot.com/release-master-blocking a little more green for now.

```release-note
NONE
```
2017-09-14 15:27:02 -07:00
Jiaying Zhang
5cac9fc984 Fixes device plugin re-registration handling logic to make sure:
- If a device plugin exits, its exported resource will be removed.
- No capacity change if a new device plugin instance comes up to replace the old instance.
2017-09-14 15:24:46 -07:00
juanvallejo
de21640573 add/update tests
Move negative check for testing "not patched" output to test-cmd-util.sh
as exiting with code 1 was causing patch_test.go to fail when the error
was expected as part of the test.
2017-09-14 16:23:23 -04:00
Jordan Liggitt
f8f57d8959 Use separate client for node status loop 2017-09-14 15:56:22 -04:00
Josh Horwitz
e2ccd5dcad Changes the node cloud controller to use its name for events 2017-09-14 13:41:12 +01:00
zhengjiajin
6c84274113 fix kubectl get cronjob lose age info 2017-09-14 20:11:39 +08:00
zhengjiajin
e62bc0b7c8 fix issue(11233)enhance kubectl config command 2017-09-14 17:14:42 +08:00
tianshapjq
d698404adc typo in annotations 2017-09-14 16:32:28 +08:00
David Porter
a854ddb358 Implement metrics for Windows Nodes
This implements stats for windows nodes in a new package, winstats.
WinStats exports methods to get cadvisor like datastructures, however
with windows specific metrics. WinStats only gets node level metrics and
information, container stats will go via the CRI. This enables the
use of the summary api to get metrics for windows nodes.
2017-09-14 06:32:51 +00:00
m1093782566
d082cfaf1b add ut for pkg/kubectl/autoscale_test.go 2017-09-14 09:15:44 +08:00
Hemant Kumar
066fcf785e Implement support for updating resources 2017-09-13 21:05:44 -04:00
Yu-Ju Hong
2c415cc506 kubelet: enable CRI container metrics 2017-09-13 15:09:35 -07:00
juanvallejo
dfef8574cf error msg fixes
Miscellaneous error message tweaks.
Makes the error message displayed in `kubectl get` more visible.
Returns exit code 1 if a patch fails.
2017-09-13 17:55:29 -04:00
David Zhu
d33274ce64 Updated pd.go tests to use GCE API instead of GCloud Commands 2017-09-13 11:55:18 -07:00
Erik McClenney
b6f23b1eed Add some comments to the version and user-agent changes. 2017-09-13 11:16:36 -07:00
Kubernetes Submit Queue
1c55faf0bb Merge pull request #51387 from alrs/fix-storageos-swallowed-err
Automatic merge from submit-queue

Fix swallowed errors in various volume packages

**What this PR does / why we need it**: Fixes swallowed errors in various volume packages.

**Release note**:
```release-note NONE
```
2017-09-13 11:10:24 -07:00
Kubernetes Submit Queue
a91c8939b7 Merge pull request #51601 from caesarxuchao/minor-test-fix
Automatic merge from submit-queue (batch tested with PRs 51601, 52153, 52364, 52362, 52342)

Minor fixes to validation test

Some test cases confuse the new object with the old object. This PR fixed that. Also added a test to verify that deletionTimestamp cannot be added (via the REST endpoints).
2017-09-13 09:30:01 -07:00
Aaron Crickenberger
eb08dffcb6 Workaround go-junit-report bug for TestApps
Blatant copy-pasta of 83ff8f2
2017-09-13 07:28:36 -07:00
mattjmcnaughton
dc8df2701b Fix incorrect status msg in podautoscaler
Fix #49256

When `ScalingLimited = true` for the `hpa`, there is an accompanying
status message, describing why scaling is limited. Previously if the desired
replica count was 0, and spec.minReplicas > 0, the status message
indicated "the desired replica count was less than the min replica
count". This was particularly confusing when `spec.MinReplicas = 1`. If
there was no `spec.minReplicas`, then the status message indicated "the
desired replica count was zero" which is more informative.

Update the calculation of status message so that if the desired replica
count is 0, we always display the clearer "the desired replica count was
zero" status message, even if spec.minReplicas > 0.

Signed-off-by: mattjmcnaughton <mattjmcnaughton@gmail.com>
2017-09-13 09:27:38 -04:00
Lee Verberne
e2e6a8cd85 Fix typo in kubelet kuberuntime container test
Changes "Expetected" to "Expected"
2017-09-13 14:32:48 +02:00
stewart-yu
7a49841bf1 Add testcase for SelfLink function 2017-09-13 20:20:01 +08:00
Lion-Wei
ed802f1db7 add FlagPersistent flag in nodePort and other situation 2017-09-13 16:03:25 +08:00
andyzhangx
7d4e049ea1 allow windows mount path
fix according to review comments

fix according to review comments
2017-09-13 05:35:56 +00:00
Kubernetes Submit Queue
dc02dfe560 Merge pull request #52301 from tallclair/psp-seccomp
Automatic merge from submit-queue (batch tested with PRs 52339, 52343, 52125, 52360, 52301)

'*' is valid for allowed seccomp profiles

**What this PR does / why we need it**:
This should be valid on a PodSecurityPolicy, but is currently rejected:
```
seccomp.security.alpha.kubernetes.io/allowedProfileNames: '*'
```

**Which issue this PR fixes**: fixes #52300

```release-note
NONE
```
2017-09-12 21:46:02 -07:00
Kubernetes Submit Queue
c6a9b1e198 Merge pull request #52125 from yujuhong/fix-file-sync
Automatic merge from submit-queue (batch tested with PRs 52339, 52343, 52125, 52360, 52301)

dockershim: check if f.Sync() returns an error and surface it

```release-note
dockershim: check the error when syncing the checkpoint.
```
2017-09-12 21:45:56 -07:00
Kubernetes Submit Queue
5bc9d7b412 Merge pull request #52339 from liggitt/alpha-test
Automatic merge from submit-queue (batch tested with PRs 52339, 52343, 52125, 52360, 52301)

Prevent enabling alpha APIs by default

related to #47691
This is a follow up to #51839 to add a check that we do not enable alpha APIs by default
2017-09-12 21:45:52 -07:00
Balaji Subramaniam
e2e356964a Make CPU manager release allocated CPUs when container enters completed phase. 2017-09-12 21:01:01 -07:00
Di Xu
f8211adf42 print HostPathType for kubectl describe 2017-09-13 11:24:48 +08:00
Kubernetes Submit Queue
b04f81d342 Merge pull request #52344 from smarterclayton/no_log_pull
Automatic merge from submit-queue (batch tested with PRs 48226, 52046, 52231, 52344, 52352)

Log at higher verbosity levels some common SyncPod errors

This log message was 90% of all glog.Errorf level statements reported on a production cluster, hiding other more impactful errors. We already log it in start container, but for extra caution we continue to log it at v(3) here (the downside of not logging a start container error is worse than some log spam at higher levels).

HandleError() is intended only for unknown and unexpected errors.

```release-note
NONE
```

@derekwaynecarr @sjenning
2017-09-12 19:40:03 -07:00
Kubernetes Submit Queue
32f1521cc2 Merge pull request #52046 from dashpole/soft_eviction
Automatic merge from submit-queue (batch tested with PRs 48226, 52046, 52231, 52344, 52352)

[BugFix] Soft Eviction timer works correctly

fixes #51516

thresholdsMet should not exclude previously met thresholds when we do not have new stats for a threshold.

/assign @vishh @derekwaynecarr 
cc @kubernetes/sig-node-bugs
2017-09-12 19:39:55 -07:00
Shiyang Wang
87abe13022 fix return 0 error in DefaultSubCommandRun 2017-09-13 10:26:51 +08:00
zhengjiajin
23a05f6c23 fix issue(#52244)kubectl describe serviceaccount have redundance null line, we should keep accordance for kubectl describe command 2017-09-13 09:56:26 +08:00
Kubernetes Submit Queue
58b126d98b Merge pull request #52248 from zjj2wry/set-env-list
Automatic merge from submit-queue

fix kubectl set env --list description

**What this PR does / why we need it**:

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
none
```
2017-09-12 17:43:28 -07:00
Kubernetes Submit Queue
39659ac1dd Merge pull request #51252 from andyzhangx/azuredisk-windows
Automatic merge from submit-queue

Azuredisk mount on windows node

**What this PR does / why we need it**:
This PR will enable azure disk on windows node, customer could create a pod mounted with azure disk on windows node. 
There are a few pending items still left:
1) Current fstype would be forced as NTFS, will change if there is such requirement
2) GetDeviceNameFromMount function is not implemented(empty) because in Linux, we could use "cat /proc/mounts" to read all mounting points in OS easily, but in Windows, there is no such place, I am still figuring out. The empty function would cause a few warning logging, but it will not affect the main logic now.

**Special notes for your reviewer**:
1. This PR depends on https://github.com/kubernetes/kubernetes/pull/51240, which allow windows mount path in config validation
2. There is a bug in docker on windows(https://github.com/moby/moby/issues/34729), the ContainerPath could only be a drive letter now(e.g. D:), dir path would fail in the end.

The example pod with mount path is like below:

```
kind: Pod
apiVersion: v1
metadata:
  name: pod-uses-shared-hdd-5g
  labels:
    name: storage
spec:
  containers:
  - image: microsoft/iis
    name: az-c-01
    volumeMounts:
    - name: blobdisk01
      mountPath: 'F:'
  nodeSelector:
    beta.kubernetes.io/os: windows
  volumes:
  - name: blobdisk01
    persistentVolumeClaim:
      claimName: pv-dd-shared-hdd-5
```

**Release note**:

```release-note
2017-09-12 17:43:13 -07:00
Kubernetes Submit Queue
3ed9191ac7 Merge pull request #52164 from soltysh/set_job_image
Automatic merge from submit-queue

Update set image description to remove job from resources that can update container image

**What this PR does / why we need it**:
This addressed the comment raised in https://github.com/kubernetes/kubernetes/issues/48388#issuecomment-322500960 by @harrissAvalon

**Special notes for your reviewer**:

**Release note**:

```release-note
none
```
2017-09-12 11:44:33 -07:00
Kubernetes Submit Queue
b05d8ad1ec Merge pull request #52338 from PiotrProkop/fix-hugepages
Automatic merge from submit-queue (batch tested with PRs 51041, 52297, 52296, 52335, 52338)

Fix pagesize mount option name

**What this PR does / why we need it**:
Fixes #52337  .
2017-09-12 11:10:18 -07:00