Automatic merge from submit-queue (batch tested with PRs 41921, 41695, 42139, 42090, 41949)
Unify fake runtime helper in kuberuntime, rkt and dockertools.
Addresses https://github.com/kubernetes/kubernetes/pull/42081#issuecomment-282429775.
Add `pkg/kubelet/container/testing/fake_runtime_helper.go`, and change `kuberuntime`, `rkt` and `dockertools` to use it.
@yujuhong This is a small unit test refactoring PR. Could you help me review it?
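A rough Go sketch of the shared-fake pattern this PR introduces; the interface methods shown here are simplified stand-ins for the real `RuntimeHelper` in `pkg/kubelet/container`, not its actual signatures:
```go
package testing

// RuntimeHelper is a simplified stand-in for the kubelet interface the
// fake implements; the real one lives in pkg/kubelet/container.
type RuntimeHelper interface {
	GetClusterDNS(hostNetwork bool) (dnsServers []string, err error)
	GeneratePodHostname(podName, namespace string) (string, error)
}

// FakeRuntimeHelper returns canned values, so that kuberuntime, rkt and
// dockertools unit tests can share one implementation instead of keeping
// three private copies.
type FakeRuntimeHelper struct {
	DNSServers []string
	Hostname   string
	Err        error
}

func (f *FakeRuntimeHelper) GetClusterDNS(hostNetwork bool) ([]string, error) {
	return f.DNSServers, f.Err
}

func (f *FakeRuntimeHelper) GeneratePodHostname(podName, namespace string) (string, error) {
	return f.Hostname, f.Err
}

// Compile-time check that the fake satisfies the interface.
var _ RuntimeHelper = &FakeRuntimeHelper{}
```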
Automatic merge from submit-queue (batch tested with PRs 41921, 41695, 42139, 42090, 41949)
AWS: Support shared tag `kubernetes.io/cluster/<clusterid>`
We recognize an additional cluster tag:
kubernetes.io/cluster/<clusterid>
This now allows us to share resources, in particular subnets.
In addition, the value is used to track ownership/lifecycle. When we
create objects, we record the value as "owned".
We also refactor the tag handling out into its own file and class, since we are
touching most of these functions anyway.
```release-note
AWS: Support shared tag `kubernetes.io/cluster/<clusterid>`
```
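A rough Go sketch of the tag check described above; the constant names are illustrative assumptions, not the cloud provider's actual identifiers:
```go
package main

import "fmt"

const (
	legacyClusterTag = "KubernetesCluster"      // legacy tag: key "KubernetesCluster", value <clusterid>
	clusterTagPrefix = "kubernetes.io/cluster/" // new shared tag: key "kubernetes.io/cluster/<clusterid>"
	resourceOwned    = "owned"                  // value recorded on resources we create
)

// hasClusterTag reports whether a resource's tags mark it as belonging to
// clusterID, accepting either the legacy tag or the new shared tag.
func hasClusterTag(tags map[string]string, clusterID string) bool {
	if v, ok := tags[legacyClusterTag]; ok && v == clusterID {
		return true
	}
	_, ok := tags[clusterTagPrefix+clusterID]
	return ok
}

func main() {
	// A subnet shared by two clusters carries one shared tag per cluster.
	sharedSubnet := map[string]string{
		"kubernetes.io/cluster/a": "shared",
		"kubernetes.io/cluster/b": "shared",
	}
	fmt.Println(hasClusterTag(sharedSubnet, "a"), hasClusterTag(sharedSubnet, "c")) // true false
}
```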
Automatic merge from submit-queue (batch tested with PRs 38676, 41765, 42103, 41833, 41702)
AWS: Skip instances that are tagged as a master
We recognize a few AWS tags, and skip over masters when finding zones
for dynamic volumes. This will fix #34583.
This is not perfect, in that the scheduler is really the only component
that can correctly choose the zone, but it should address the common
problem.
```release-note
AWS: Do not consider master instance zones for dynamic volume creation
```
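A toy Go sketch of the zone-selection filter; the master tag key used here is an illustrative assumption, not the exact tag the cloud provider checks:
```go
package main

import "fmt"

type instance struct {
	Zone string
	Tags map[string]string
}

// volumeZones gathers candidate zones for dynamic volumes while skipping
// instances tagged as masters.
func volumeZones(instances []instance) map[string]bool {
	zones := map[string]bool{}
	for _, inst := range instances {
		if _, master := inst.Tags["kubernetes.io/role/master"]; master {
			continue // never consider a master's zone for volume placement
		}
		zones[inst.Zone] = true
	}
	return zones
}

func main() {
	fmt.Println(volumeZones([]instance{
		{Zone: "us-east-1a", Tags: map[string]string{"kubernetes.io/role/master": "1"}},
		{Zone: "us-east-1b", Tags: map[string]string{}},
	})) // map[us-east-1b:true]
}
```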
Automatic merge from submit-queue (batch tested with PRs 38676, 41765, 42103, 41833, 41702)
kubelet: cm: refactor QoS logic into separate interface
This commit has no functional change. It refactors the QoS cgroup logic into a new `QOSContainerManager` interface to allow for better isolation for QoS cgroup features coming down the pike.
This is a breakout of the refactoring component of my QoS memory limits PR https://github.com/kubernetes/kubernetes/pull/41149 which will need to be rebased on top of this.
@vishh @derekwaynecarr
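A minimal sketch of the extracted interface; the method set below is illustrative of the responsibilities described above, not the exact signatures in `pkg/kubelet/cm`:
```go
package cm

// QOSContainerManager isolates the QoS cgroup logic behind one interface.
type QOSContainerManager interface {
	// Start creates the top-level Burstable and BestEffort cgroups.
	Start() error
	// GetQOSContainersInfo exposes the cgroup name for each QoS class.
	GetQOSContainersInfo() QOSContainersInfo
	// UpdateCgroups reconciles QoS cgroup settings; this is the hook where
	// features like QoS memory limits can plug in later.
	UpdateCgroups() error
}

// QOSContainersInfo names the top-level cgroup for each QoS tier.
type QOSContainersInfo struct {
	Guaranteed string
	Burstable  string
	BestEffort string
}
```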
Automatic merge from submit-queue (batch tested with PRs 38676, 41765, 42103, 41833, 41702)
Support --context flag completion for kubectl
**What this PR does / why we need it**:
With this PR, `--context` flag completion is supported for kubectl.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 41597, 42185, 42075, 42178, 41705)
Honor output formats in kubectl patch
Currently, output formats other than `-o name` are honored only in `--local` mode.
This PR also prints the result from the server when in regular mode.
Automatic merge from submit-queue (batch tested with PRs 41597, 42185, 42075, 42178, 41705)
force rbd image unlock if the image is not used
**What this PR does / why we need it**:
A Ceph RBD image can be locked if the host that holds the lock goes down. In that case, the image cannot be used by other pods.
The fix is to detect orphaned locks and force-unlock them.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #31790
**Special notes for your reviewer**:
Note: previously, the RBD volume plugin mapped the image, mounted it, and created a lock on the image. Since the proposed fix uses `rbd status` output to determine whether the image is in use, the sequence has to change to: lock checking (via `rbd lock list`), mapping check (via `rbd status`), forced unlock if necessary (via `rbd lock rm`), image lock, image mapping, and mount. A sketch of this sequence follows the release note below.
**Release note**:
```release-note
force unlock rbd image if the image is not used
```
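A hedged Go sketch of the reordered sequence; the exact `rbd` CLI arguments are simplified and the output parsing is elided:
```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

func rbd(args ...string) (string, error) {
	out, err := exec.Command("rbd", args...).CombinedOutput()
	return string(out), err
}

// attach follows: lock list -> status -> forced unlock if the lock is
// orphaned -> lock -> map. The caller then mounts the mapped device.
func attach(pool, image, lockID string) error {
	locks, err := rbd("lock", "list", image, "--pool", pool)
	if err != nil {
		return err
	}
	status, err := rbd("status", image, "--pool", pool)
	if err != nil {
		return err
	}
	// Orphaned lock: the image is locked, but `rbd status` shows no
	// watchers, i.e. no host currently has it mapped.
	if locks != "" && strings.Contains(status, "Watchers: none") {
		locker := "" // a real implementation parses the locker out of `locks`
		if _, err := rbd("lock", "rm", image, lockID, locker, "--pool", pool); err != nil {
			return err
		}
	}
	if _, err := rbd("lock", "add", image, lockID, "--pool", pool); err != nil {
		return err
	}
	_, err = rbd("map", image, "--pool", pool)
	return err
}

func main() {
	fmt.Println(attach("rbd", "myimage", "kubelet_lock_magic"))
}
```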
Automatic merge from submit-queue (batch tested with PRs 42162, 41973, 42015, 42115, 41923)
kubectl drain: make code reusable
DrainOptions requires a few fields to be set, and the expectation is
that these are set as part of construction of the object. If they are
set, then the drain code can be reused in other kubernetes projects.
This does not create a contract that DrainOptions must fulfill going
forward, any more than any of the other types that happen to be exposed
are part of a contract. It merely makes use outside the package possible.
```release-note
NONE
```
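A toy Go sketch of the construct-then-run pattern this enables; the struct below only mirrors the shape of `DrainOptions`, and its field names are assumptions, not a contract:
```go
package main

import "fmt"

// drainRunner stands in for DrainOptions: every field it needs is
// supplied at construction, after which RunDrain can be called from any
// project that imports the package.
type drainRunner struct {
	Force              bool
	IgnoreDaemonsets   bool
	GracePeriodSeconds int
}

func (o *drainRunner) RunDrain(node string) error {
	fmt.Printf("draining %s (force=%v, ignoreDaemonsets=%v, grace=%ds)\n",
		node, o.Force, o.IgnoreDaemonsets, o.GracePeriodSeconds)
	return nil
}

func main() {
	// Construct fully, then run: the pattern that makes reuse possible.
	o := &drainRunner{Force: true, IgnoreDaemonsets: true, GracePeriodSeconds: 30}
	_ = o.RunDrain("node-1")
}
```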
Automatic merge from submit-queue (batch tested with PRs 42162, 41973, 42015, 42115, 41923)
Increase Min Timeout for kill pod
Should mitigate #41347, which describes flakes in the inode eviction test due to "GracePeriodExceeded" errors.
When we use gracePeriod == 0, as we do in eviction, the pod worker currently sets a timeout of 2 seconds to kill a pod.
We are hitting this timeout fairly often during eviction tests, causing extra pods to be evicted (since the eviction manager "fails" to evict that pod, and kills the next one).
This PR increases the timeout from 2 seconds to 4, although we could increase it even more if we think that would be appropriate.
cc @yujuhong @vishh @derekwaynecarr
Automatic merge from submit-queue
fix kubectl describe pod, show tolerationSeconds
**What this PR does / why we need it**:
tolerationSeconds is currently not shown in the `kubectl describe` result; this PR fixes that.
With this fix, pod tolerations with tolerationSeconds are displayed as below:
```yaml
Name:           bar
Namespace:      foo
Node:           /
Labels:         <none>
Status:
IP:
Controllers:    <none>
Containers:     <none>
No volumes.
QoS Class:
Node-Selectors: <none>
Tolerations:    key1=value1
                key2=value2:NoSchedule
                key3=value3:NoExecute for 300s
```
**Which issue this PR fixes** :
Related issue: #1574
Related PR: #39469
**Special notes for your reviewer**:
**Release note**:
```release-note
make kubectl describe pod show tolerationSeconds
```
Automatic merge from submit-queue
Don't filter items when resources requested by name
Add tracking on resource.Builder if a "named" item is requested (from
file, stream, url, or resource args) and use that in `get` to accurately
determine whether to filter resources. Add tests.
Fixes #41150, #40492
```release-note
Completed pods should not be hidden when requested by name via `kubectl get`.
```
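A toy Go sketch of the tracking idea; the field and method names are illustrative stand-ins for the real `resource.Builder` API:
```go
package main

import "fmt"

// builder stands in for resource.Builder, remembering whether any item
// was requested by name (from file, stream, URL, or resource args).
type builder struct {
	requestedByName bool
}

func (b *builder) ResourceNames(names ...string) *builder {
	if len(names) > 0 {
		b.requestedByName = true
	}
	return b
}

// shouldFilter shows how `get` decides: completed pods are hidden only
// when the request was not by name.
func shouldFilter(b *builder) bool {
	return !b.requestedByName
}

func main() {
	byName := (&builder{}).ResourceNames("pods/completed-1")
	fmt.Println(shouldFilter(byName), shouldFilter(&builder{})) // false true
}
```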
Automatic merge from submit-queue (batch tested with PRs 42200, 39535, 41708, 41487, 41335)
Update kube-proxy support for Windows
**What this PR does / why we need it**:
The kube-proxy is built upon sophisticated iptables NAT rules. Windows does not have an equivalent capability. This PR changes the architecture of the userspace mode of the Windows kube-proxy to match the capabilities of Windows.
The proxy is organized around service ports and portals. For each service, a service port is created, and then a portal (an iptables NAT rule) is opened for each service IP, external IP, node port, and ingress IP. This PR merges the service port and portal into the single concept of a "ServicePortPortal", where one connection is opened for each of service IP, external IP, node port, and ingress IP.
This PR only affects the Windows kube-proxy. It is important for the Windows kube-proxy because it removes the limited portproxy rule and RRAS service and enables full TCP/UDP capability to services.
**Special notes for your reviewer**:
**Release note**:
```release-note
Add tcp/udp userspace proxy support for Windows.
```
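A minimal Go sketch of the merged concept; the type and field names below are illustrative, not the actual Windows proxy code:
```go
package main

import (
	"fmt"
	"net"
)

// servicePortPortalName identifies one (service port, listen IP) pair.
type servicePortPortalName struct {
	Namespace, Service, Port string
	PortalIP                 string // service IP, external IP, node IP, or ingress IP
}

// servicePortPortal holds the single listener opened for that pair,
// replacing the old split of one service port plus N portals.
type servicePortPortal struct {
	name     servicePortPortalName
	listener net.Listener
}

func openPortal(name servicePortPortalName, port int) (*servicePortPortal, error) {
	l, err := net.Listen("tcp", fmt.Sprintf("%s:%d", name.PortalIP, port))
	if err != nil {
		return nil, err
	}
	return &servicePortPortal{name: name, listener: l}, nil
}

func main() {
	p, err := openPortal(servicePortPortalName{
		Namespace: "default", Service: "web", Port: "http", PortalIP: "127.0.0.1",
	}, 0) // port 0: pick any free port for the demo
	if err != nil {
		fmt.Println(err)
		return
	}
	defer p.listener.Close()
	fmt.Println("portal open on", p.listener.Addr())
}
```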
- Add a new type PortworxVolumeSource (see the sketch after this list)
- Implement the Kubernetes volume plugin for Portworx volumes under pkg/volume/portworx
- The Portworx Volume Driver uses the libopenstorage/openstorage specifications and APIs for volume operations.
Changes for k8s configuration and examples for portworx volumes.
- Add PortworxVolume hooks in kubectl, kube-controller-manager and validation.
- Add a README for PortworxVolume usage as PVs, PVCs and StorageClass.
- Add example spec files
Handle code review comments.
- Modified READMEs to incorporate suggestions.
- Add a test for ReadWriteMany access mode.
- Use util.UnmountPath in TearDown.
- Add ReadOnly flag to PortworxVolumeSource
- Use hostname:port instead of unix sockets
- Delete the mount dir in TearDown.
- Fix link issue in persistentvolumes README
- In unit test check for mountpath after Setup is done.
- Add PVC Claim Name as a Portworx Volume Label
Generated code and documentation.
- Updated swagger spec
- Updated api-reference docs
- Updated generated code under pkg/api/v1
Godeps update for Portworx Volume Driver
- Adds github.com/libopenstorage/openstorage
- Adds go.pedge.io/pb/go/google/protobuf
- Updates Godep Licenses
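A sketch of the new API type named in the list above; the real definition and its generated code live under pkg/api, and the JSON tags here are assumptions:
```go
package v1

// PortworxVolumeSource describes a Portworx volume attached to a pod.
type PortworxVolumeSource struct {
	// VolumeID uniquely identifies a Portworx volume.
	VolumeID string `json:"volumeID"`
	// FSType is the filesystem type to mount, e.g. "ext4".
	FSType string `json:"fsType,omitempty"`
	// ReadOnly forces a read-only mount when true.
	ReadOnly bool `json:"readOnly,omitempty"`
}
```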
Automatic merge from submit-queue (batch tested with PRs 35094, 42095, 42059, 42143, 41944)
Use chroot for containerized mounts
This PR modifies the containerized mounter script to use chroot
instead of rkt fly. This avoids the potentially large number of mounts
left behind by rkt containers when they are not cleaned up.
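A toy Go illustration of the approach (the actual change is to a shell script): run the mount binary inside a chroot of the mounter image's rootfs instead of launching an rkt fly container per mount. The rootfs path below is an assumption:
```go
package main

import (
	"fmt"
	"os/exec"
)

// chrootMount is equivalent to: chroot <rootfs> /bin/mount <args...>
func chrootMount(rootfs string, mountArgs ...string) error {
	args := append([]string{rootfs, "/bin/mount"}, mountArgs...)
	return exec.Command("chroot", args...).Run()
}

func main() {
	err := chrootMount("/home/kubernetes/containerized_mounter/rootfs",
		"-t", "nfs", "server:/export", "/mnt/data")
	fmt.Println(err)
}
```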
Automatic merge from submit-queue (batch tested with PRs 35094, 42095, 42059, 42143, 41944)
Fix azure file secret reference
Follow up to https://github.com/kubernetes/kubernetes/pull/41957
Fixes nil dereference getting secret name from AzureFile volume source.
Adds unit tests to make sure all secret references are extracted correctly, and adds reflective tests to help catch drift if new secret references are added to the pod spec.
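A small Go sketch of the fix pattern, with types simplified for illustration (the real ones are in pkg/api/v1):
```go
package main

import "fmt"

type azureFileVolumeSource struct{ SecretName string }
type volumeSource struct{ AzureFile *azureFileVolumeSource }

// secretName guards the nil source before dereferencing, which is the
// shape of the fix: the old code read SecretName unconditionally.
func secretName(v volumeSource) (string, bool) {
	if v.AzureFile == nil {
		return "", false
	}
	return v.AzureFile.SecretName, true
}

func main() {
	name, ok := secretName(volumeSource{}) // no panic on a nil AzureFile
	fmt.Printf("%q %v\n", name, ok)        // "" false
}
```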
Automatic merge from submit-queue (batch tested with PRs 35094, 42095, 42059, 42143, 41944)
client-gen: create private registry in fake clientset
This cuts off the last `k8s.io/kubernetes/pkg/api.{Registry+Scheme+Codecs}` dependency from the clientsets. This enables clientset generation for packages that must not depend on Kubernetes itself.
@deads2k there is more than the namespace checking we discussed: the RESTMapper built from the registry. This introduces a private registry. I tried to keep that out of the normal versioned client as much as possible. I would even like to remove this private registry some day, and at best remove all registry code from the client. But that's for another day...
Automatic merge from submit-queue (batch tested with PRs 40746, 41699, 42108, 42174, 42093)
Switch kube-proxy to informers & save 2/3 of CPU & memory of non-iptables related code.
Fixes #42000
This PR should be a no-op from the behavior perspective.
It changes kube-proxy to use the standard "informer" framework instead of a combination of reflector + undelta store.
This significantly reduces kube-proxy's CPU usage and the number of memory allocations.
Previously, on every endpoints/service update, we were copying __all__ endpoints/services at least 3 times; now it is once (which should also be removed in the future).
In Kubemark-500, hollow proxies were processing the backlog from the load test for an hour after the test finished. With this change, they keep up with the load.
@thockin @ncdc @derekwaynecarr
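A hedged sketch of the informer pattern the proxy moves to, written against today's client-go packages for illustration (the PR itself predates these exact import paths): handlers receive add/update/delete callbacks backed by a local cache, rather than copying every object on each change.
```go
package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// One shared informer per resource type; the proxy reacts to deltas
	// instead of re-listing and re-copying all endpoints/services.
	factory := informers.NewSharedInformerFactory(client, 15*time.Minute)
	factory.Core().V1().Endpoints().Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc:    func(obj interface{}) { fmt.Println("endpoints added") },
		UpdateFunc: func(oldObj, newObj interface{}) { fmt.Println("endpoints updated") },
		DeleteFunc: func(obj interface{}) { fmt.Println("endpoints deleted") },
	})

	stop := make(chan struct{})
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	select {} // serve events until killed
}
```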
Automatic merge from submit-queue (batch tested with PRs 40746, 41699, 42108, 42174, 42093)
Output result of apply operation
Fixes #41690
Plumbs the resulting object from patch operations back to the top level so it can be output when printing.
Automatic merge from submit-queue
numeric ordering of kubectl outputs
**What this PR does / why we need it**:
Instead of kubectl listing the pods alphabetically:
foobar-1-build
foobar-10-build
foobar-2-build
foobar-3-build
With the parameter --sort-by '{.metadata.name}' it now gives:
foobar-1-build
foobar-2-build
foobar-3-build
foobar-10-build
**Which issue this PR fixes**
https://github.com/openshift/origin/issues/7229
**Special notes for your reviewer**:
I have followed the dependencies requirements from https://github.com/kubernetes/community/blob/master/contributors/devel/godep.md
**Release note**:
```release-note
Import a natural sorting library and use it in the sorting printer.
```
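A small Go sketch of why natural comparison yields this ordering: digit runs are compared as numbers, not byte-by-byte (kubectl imports a library for this; the code below is just the idea):
```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"unicode"
)

// chunks splits "foobar-10-build" into ["foobar-", "10", "-build"] so
// digit runs can be compared numerically.
func chunks(s string) []string {
	var out []string
	for i := 0; i < len(s); {
		j := i
		digits := unicode.IsDigit(rune(s[i]))
		for j < len(s) && unicode.IsDigit(rune(s[j])) == digits {
			j++
		}
		out = append(out, s[i:j])
		i = j
	}
	return out
}

// natLess compares chunk by chunk, numerically where both chunks are numbers.
func natLess(a, b string) bool {
	ca, cb := chunks(a), chunks(b)
	for i := 0; i < len(ca) && i < len(cb); i++ {
		if ca[i] == cb[i] {
			continue
		}
		na, errA := strconv.Atoi(ca[i])
		nb, errB := strconv.Atoi(cb[i])
		if errA == nil && errB == nil {
			return na < nb
		}
		return ca[i] < cb[i]
	}
	return len(ca) < len(cb)
}

func main() {
	names := []string{"foobar-1-build", "foobar-10-build", "foobar-2-build", "foobar-3-build"}
	sort.Slice(names, func(i, j int) bool { return natLess(names[i], names[j]) })
	fmt.Println(names) // [foobar-1-build foobar-2-build foobar-3-build foobar-10-build]
}
```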
Automatic merge from submit-queue
clean up generic apiserver options
Clean up generic apiserver options before we tag any levels. This makes them more in line with "normal" API servers running on the platform.
Also remove dead example code.
@sttts
Automatic merge from submit-queue
Reserve kubernetes.io and k8s.io namespace for flex volume options
Split from https://github.com/kubernetes/kubernetes/pull/39488.
Flex volume already stuffs system information into the options map, and assumes it is free to do so:
```go
optionFSType    = "kubernetes.io/fsType"
optionReadWrite = "kubernetes.io/readwrite"
optionKeySecret = "kubernetes.io/secret"
```
This formalizes that by reserving the `kubernetes.io` and `k8s.io` namespaces, so that user-specified options are never stomped on by the system, and flex plugins can know that options in those namespaces came from the system, not from users.
```release-note
Parameter keys in a StorageClass `parameters` map may not use the `kubernetes.io` or `k8s.io` namespaces.
```
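A Go sketch of the reservation check implied by the release note; the function name is illustrative and the real validation (with its exact matching rules) lives in the API validation code:
```go
package main

import (
	"fmt"
	"strings"
)

// validateParameterKey rejects user-supplied keys in the reserved
// kubernetes.io and k8s.io namespaces.
func validateParameterKey(key string) error {
	k := strings.ToLower(key)
	if strings.Contains(k, "kubernetes.io/") || strings.Contains(k, "k8s.io/") {
		return fmt.Errorf("parameter key %q uses a namespace reserved for the system", key)
	}
	return nil
}

func main() {
	fmt.Println(validateParameterKey("kubernetes.io/fsType")) // error: reserved
	fmt.Println(validateParameterKey("iopsPerGB"))            // <nil>
}
```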
Automatic merge from submit-queue (batch tested with PRs 41937, 41151, 42092, 40269, 42135)
Add a unit test for idempotent applies to the TPR entries.
The test in apply_test follows the general pattern of other tests.
We load from a file in test/fixtures and mock the API server in the
function closure in the HttpClient call.
The apply operation expects a last-modified-configuration annotation.
That is written verbatim in the test/fixture file.
References #40841
**What this PR does / why we need it**:
Adds one unit test for TPR's using applies.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
References:
https://github.com/kubernetes/features/issues/95
https://github.com/kubernetes/kubernetes/issues/40841#issue-204769102
**Special notes for your reviewer**:
I am not super proud of the tpr-entry name, but I feel we need to name the two objects differently:
the one with Kind: ThirdPartyResource
and the one with Kind: Foo.
Is the name "ThirdPartyResource" used interchangeably for both? I used tpr-entry for the Kind: Foo object.
Also, I *assume* this is testing an idempotent apply, because the last-applied-configuration annotation is the same as the object itself.
This is the state I see in the kubectl logs if I do a proper idempotent apply of a third-party resource entry.
I guess I will know more once I start playing around with apply commands that change TPR objects.
**Release note**:
```release-note
```
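A hedged Go sketch of the test shape described above: a closure acts as the HTTP transport and serves the fixture back, standing in for the API server, so apply sees an object whose last-applied-configuration annotation matches itself. Names here are illustrative, not the test's actual helpers:
```go
package main

import (
	"bytes"
	"fmt"
	"io/ioutil"
	"net/http"
)

// roundTripperFunc lets a closure act as the HTTP transport.
type roundTripperFunc func(*http.Request) (*http.Response, error)

func (f roundTripperFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// fixtureClient answers every request with the fixture body.
func fixtureClient(fixture []byte) *http.Client {
	return &http.Client{Transport: roundTripperFunc(func(req *http.Request) (*http.Response, error) {
		return &http.Response{
			StatusCode: http.StatusOK,
			Header:     http.Header{"Content-Type": []string{"application/json"}},
			Body:       ioutil.NopCloser(bytes.NewReader(fixture)),
		}, nil
	})}
}

func main() {
	c := fixtureClient([]byte(`{"kind":"Foo","apiVersion":"company.com/v1"}`))
	resp, err := c.Get("http://fake-apiserver/apis/company.com/v1/foos/bar")
	if err == nil {
		body, _ := ioutil.ReadAll(resp.Body)
		fmt.Println(string(body))
	}
}
```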
Automatic merge from submit-queue (batch tested with PRs 41234, 42186, 41615, 42028, 41788)
apimachinery: handle duplicated and conflicting type registration
Double registrations led to duplicates in `KnownKinds()`. Conflicting registrations with the same GVK but different types were not detected.
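A simplified Go sketch of the new behavior (types and method names are illustrative, not the apimachinery scheme API): re-registering the identical type is tolerated without duplication, while registering a different type under the same GVK is rejected.
```go
package main

import (
	"fmt"
	"reflect"
)

type gvk struct{ Group, Version, Kind string }

type scheme struct{ types map[gvk]reflect.Type }

func (s *scheme) addKnownType(k gvk, obj interface{}) error {
	t := reflect.TypeOf(obj)
	if old, ok := s.types[k]; ok {
		if old == t {
			return nil // duplicate registration of the same type: no-op
		}
		return fmt.Errorf("%v already registered as %v, cannot register %v", k, old, t)
	}
	s.types[k] = t
	return nil
}

func main() {
	type Pod struct{}
	type Service struct{}
	s := &scheme{types: map[gvk]reflect.Type{}}
	k := gvk{Version: "v1", Kind: "Pod"}
	fmt.Println(s.addKnownType(k, &Pod{}))     // <nil>
	fmt.Println(s.addKnownType(k, &Pod{}))     // <nil>: tolerated, not duplicated
	fmt.Println(s.addKnownType(k, &Service{})) // error: conflicting types
}
```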
Automatic merge from submit-queue (batch tested with PRs 41234, 42186, 41615, 42028, 41788)
Make DaemonSet respect critical pods annotation when scheduling
**What this PR does / why we need it**: #41612
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #41612
**Special notes for your reviewer**:
**Release note**:
```release-note
Make DaemonSet respect critical pods annotation when scheduling.
```
cc @kubernetes/sig-apps-feature-requests @erictune @vishh @liggitt @kargakis @lukaszo @piosz @davidopp
Automatic merge from submit-queue
Enforce Node Allocatable via cgroups
This PR enforces node allocatable across all pods using a top level cgroup as described in https://github.com/kubernetes/community/pull/348
This PR also provides an option to enforce `kubeReserved` and `systemReserved` on user specified cgroups.
This PR will by default make the kubelet create top-level cgroups even if `kubeReserved` and `systemReserved` are not specified, and hence `Allocatable = Capacity`.
```release-note
New Kubelet flag `--enforce-node-allocatable` with a default value of `pods` is added, which will make the kubelet create a top-level cgroup for all pods to enforce Node Allocatable. Optionally, `system-reserved` and `kube-reserved` values can also be specified, separated by commas, to enforce node allocatable on the cgroups specified via `--system-reserved-cgroup` and `--kube-reserved-cgroup` respectively. Note that the default values of the latter flags are "".
This feature requires a **Node Drain** prior to upgrade; failing that, pods will be restarted if possible or terminated if they have a `RestartNever` policy.
```
cc @kubernetes/sig-node-pr-reviews @kubernetes/sig-node-feature-requests
TODO:
- [x] Adjust effective Node Allocatable to subtract hard eviction thresholds
- [x] Add unit tests
- [x] Complete pending e2e tests
- [x] Manual testing
- [x] Get the proposal merged
@dashpole is working on adding support for evictions for enforcing Node allocatable more gracefully. That work will show up in a subsequent PR for v1.6
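A worked Go sketch of the allocatable arithmetic implied by the description and the first TODO item (hard eviction thresholds are subtracted), simplified to a single scalar resource:
```go
package main

import "fmt"

// allocatable computes, per resource:
// Allocatable = Capacity - kubeReserved - systemReserved - hardEvictionThreshold
func allocatable(capacity, kubeReserved, systemReserved, hardEviction int64) int64 {
	return capacity - kubeReserved - systemReserved - hardEviction
}

func main() {
	const gi = int64(1) << 30
	// 16Gi node, 1Gi kube-reserved, 0.5Gi system-reserved, 100Mi hard eviction:
	fmt.Println(allocatable(16*gi, 1*gi, gi/2, 100<<20)) // memory bytes left for pods
	// Nothing reserved and no eviction threshold: Allocatable = Capacity.
	fmt.Println(allocatable(16*gi, 0, 0, 0) == 16*gi) // true
}
```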
Automatic merge from submit-queue (batch tested with PRs 41205, 42196, 42068, 41588, 41271)
[CRI] enable kubenet traffic shaping
ref: https://github.com/kubernetes/kubernetes/issues/37316
Another way to do this would be to expose another interface in the network host to allow network plugins to retrieve annotations. But that seems unnecessary and more complicated.
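A small Go sketch of the hook: kubenet reads the pod bandwidth annotations (these keys are the existing ones kubenet honors); what it then does with them, programming tc on the pod's interface, is elided here:
```go
package main

import "fmt"

// bandwidthLimits extracts the ingress/egress shaping annotations.
func bandwidthLimits(annotations map[string]string) (ingress, egress string) {
	return annotations["kubernetes.io/ingress-bandwidth"],
		annotations["kubernetes.io/egress-bandwidth"]
}

func main() {
	in, out := bandwidthLimits(map[string]string{
		"kubernetes.io/ingress-bandwidth": "10M",
		"kubernetes.io/egress-bandwidth":  "5M",
	})
	fmt.Println(in, out) // 10M 5M
}
```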
Automatic merge from submit-queue (batch tested with PRs 42053, 41282, 42056, 41663, 40927)
Allow getting logs directly from deployment, job and statefulset
**Special notes for your reviewer**:
@smarterclayton you asked for it in OpenShift
```release-note
kubectl logs allows getting logs directly from deployment, job and statefulset
```