Automatic merge from submit-queue (batch tested with PRs 56021, 55843, 55088, 56117, 55859). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Changes nvidia-gpu device plugin addon config settings:
- Runs as system critical pod
- Makes resource limits to match its resource requets
- Modifies test/e2e/scheduling/nvidia-gpus.go to cope with the recent
change of running the device plugin as a system addon.
- The resource settings of the addon is based on the test results
from 8 nvidia-tesla-k80 gpus.
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 56021, 55843, 55088, 56117, 55859). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Extends deviceplugin to gracefully handle full device plugin lifecycle.
**What this PR does / why we need it**:
- Instead of using cm.capacity field to communicate device plugin resource capacity,
this PR changes to use an explicit cm.GetDevicePluginResourceCapacity() function
that returns device plugin resource capacity as well as any inactive device plugin resource.
Kubelet syncNodeStatus call this function during its periodic run to update node status
capacity and allocatable. After this call, device plugin can remove the inactive device
plugin resource from its allDevices field as the update is already pushed to API server.
- Extends device plugin checkpoint data to record registered resources
so that we can finish resource removing even upon kubelet restarts.
- Passes sourcesReady from kubelet to device plugin to avoid removing
inactive pods during grace period of kubelet restart.
- Extends gpu_device_plugin e2e_node test to verify that scheduled pods
can continue to run even after device plugin deletion and kubelet
restarts.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Together with https://github.com/kubernetes/kubernetes/pull/54488, fixes https://github.com/kubernetes/kubernetes/issues/53395
**Special notes for your reviewer**:
**Release note**:
```release-note
Extends deviceplugin to gracefully handle full device plugin lifecycle.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fixes issue where PVCs using `standard` StorageClass create PDs in disks in wrong zone in multi-zone GKE clusters
Fixes#50115
Changed GetAllZones to only get zones with nodes that are currently running (renamed to GetAllCurrentZones). Added E2E test to confirm this behavior.
Automatic merge from submit-queue (batch tested with PRs 55112, 56029, 55740, 56095, 55845). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Updating vsphere cloud provider to support k8s cluster spread across multiple vCenters
**What this PR does / why we need it**:
vSphere cloud provider in Kubernetes 1.8 was designed to work only if all the nodes of the cluster are in one single datacenter folder. This is a hard restriction that makes the cluster not span across different folders/datacenter/vCenters. Users have use-cases to span the cluster across datacenters/vCenters.
**Which issue(s) this PR fixes**
Fixes # https://github.com/vmware/kubernetes/issues/255
**Special notes for your reviewer**:
This is a change purely in vsphere cloud provider and no changes in kubernetes core are needed.
**Release note**:
```release-note
With this change
- User should be able to create k8s cluster which spans across multiple ESXi clusters, datacenters or even vCenters.
- vSphere cloud provider (VCP) uses OS hostname and not vSphere Inventory VM Name.
That means, now VCP can handle cases where user changes VM inventory name.
- VCP can handle cases where VM migrates to other ESXi cluster or datacenter or vCenter.
The only requirement is the shared storage. VCP needs shared storage on all Node VMs.
```
Internally tested and reviewed the code.
@tthole, @shaominchen, @abrarshivani
Automatic merge from submit-queue (batch tested with PRs 54824, 55911, 55730, 55979, 55961). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add options for mounting SCSI or NVMe local SSD though Block or Filesystem and do all of that with UUID
Fixes: #51431
Fixed version of: #53466
Mount SCSI local SSD by UUID in /mnt/disks/by-uuid/, also allows for users to request and mount NVMe disks. Both types of disks will be accessible either through block or file-system.
I have confirmed that it is no longer crashing when nodes are initialized on GKE.
- Runs as system critical pod
- Makes resource limits to match its resource requets
- Modifies test/e2e/scheduling/nvidia-gpus.go to cope with the recent
change of running the device plugin as a system addon.
- The resource settings of the addon is based on the test results
from 8 nvidia-tesla-k80 gpus.
This PR adds pod-level ephemeral storage metric into Summary API.
Pod-level ephemeral storage usage is the sum of all containers and local
ephemeral volume including EmptyDir (if not backed up by memory or
hugepages), configueMap, and downwardAPI.
running (renamed to GetAllCurrentZones). Added E2E test to confirm this
behavior.
Added node informer to cloud-provider controller to keep track of zones
with k8s nodes in them.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix potential unexpected object mutation that can lead to data races
**What this PR does / why we need it**:
In #51526 I introduced an optimization - do a deep copy instead of to and from JSON roundtrip to convert anything that implements `runtime.Unstructured`. I just discovered that the method that is used there `UnstructuredContent()` in both `Unstructured` and `UnstructuredList` may mutate the original object.
2008750398/staging/src/k8s.io/apimachinery/pkg/apis/meta/v1/unstructured/unstructured.go (L87-L92)7c10cbc642/staging/src/k8s.io/apimachinery/pkg/apis/meta/v1/unstructured/unstructured_list.go (L58-L75)
This is problematic because previously (before #51526) there was no mutation and because this is unexpected and may lead to data races - it is bad behaviour to mutate original object when you just want a copy of it.
This PR fixes the issue.
Without the fix the tests I've added are failing because when comparison is done original object is not the same:
```
converter_test.go:154: Object changed, diff:
object.Object[items]:
a: []interface {}{}
b: <nil>
converter_test.go:154: Object changed, diff:
object.Object[items]:
a: []interface {}{map[string]interface {}{"kind":"Pod"}}
b: <nil>
```
However the underlying issue is not fixed here - `UnstructuredContent()` is brittle and dangerous. Method name does not imply that it mutates data when you call it. And godoc does not mention that either:
509df603b1/staging/src/k8s.io/apimachinery/pkg/runtime/interfaces.go (L233-L249)
Something needs to be done about it IMO.
Also `UnstructuredContent()` implementation in `UnstructuredList` does not implement the behaviour required by godoc in `runtime.Unstructured`.
**Release note**:
```release-note
NONE
```
/kind bug
/sig api-machinery
/assign @sttts
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
NetworkPolicy e2e: named port egress test
**What this PR does / why we need it**:
Add an e2e NetworkPolicy test that ensures that an egress rule that specifies a named port properly applies to egress traffic.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#52040
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 54837, 55970, 55912, 55898, 52977). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix Flaky Allocatable Setup Tests
**What this PR does / why we need it**:
Fixes a flaky node e2e serial test.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#55830
**Special notes for your reviewer**:
The test was flaking because we were reading the node status before the restarted kubelet had written it.
This fixes this by waiting until we see an updated node status (looking at the condition's heartbeat time).
This also fixes an incorrect error message.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 54837, 55970, 55912, 55898, 52977). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Use framework.ConformanceIt for node e2e conformance tests
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
ref #54726#53909
**Special notes for your reviewer**:
/cc @mml
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 55642, 55897, 55835, 55496, 55313). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Table printers and server generation should always copy ListMeta
Tables should be a mapping from lists, so if the incoming object has these add them to the table. Paging over server side tables was broken without this. Add tests on the generic creater and on the resttest compatibility.
@deads2k
Automatic merge from submit-queue (batch tested with PRs 55642, 55897, 55835, 55496, 55313). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Disable container disk metrics when using the CRI stats integration
Issue: https://github.com/kubernetes/kubernetes/issues/51798
As explained in the issue, runtimes which make use of the CRI Stats API still have the performance overhead of collecting those same stats through cAdvisor.
The CRI Stats API has metrics for CPU, Memory, and Disk. This PR significantly reduces the added overhead due to collecting these stats in both cAdvisor and in the runtime.
This PR disables container disk metrics, which are very expensive to collect.
This PR does not disable node-level disk stats, as the "Raw" container handler does not currently respect ignoring DiskUsageMetrics.
This PR factors out the logic for determining whether or not to use the CRI stats provider into a helper function, as cAdvisor is instantiated before it is passed to the kubelet as a dependency.
cc @kubernetes/sig-node-pr-reviews @derekwaynecarr
/kind feature
/sig node
/assign @Random-Liu @derekwaynecarr
Automatic merge from submit-queue (batch tested with PRs 54556, 55379, 55881, 55891, 55705). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Adds node auto-repair e2e tests.
This PR adds node auto-repair e2e tests.
- vsphere.conf (cloud-config) is now needed only on master node
- VCP uses OS hostname and not vSphere inventory name
- VCP is now resilient to VM inventory name change and VM migration
AdmissionResponse allows mutating webhook to send apiserver a json patch
to mutate the object.
This reflects the imperative nature of AdmissionReview. It adds
AdmissionRequest and AdmissionResponse in place of status/spec.
The AdmissionResponse the allows the mutating webhook
to send back a json path with the mutated version of the requested
object.
Fixed the integration test to clean up properly.
Switched test image to 1.8v5 to reflect API changes.
Make sure to cache test framework client for cleaup test code.
Switched to pointer for patch type.
Factored in @liggitt's feedback.
Factored in @lavalamp's feedback.
Automatic merge from submit-queue (batch tested with PRs 55254, 55525, 50108, 54674, 55263). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Correct clean up actions in e2e tests
**What this PR does / why we need it**:
Remove the duplicate "cleanup action" code in `test/e2e/e2e.go`, and use the clean up code in test/e2e/framework instead.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#55505
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Chunking is now beta and on by default. The kops job is still using
etcd2 which does not support chunking, so flag the test as skipped until
kops is updated to a supported etcd version.
Tables should be a mapping from lists, so if the incoming object has
these add them to the table. Allows paging over server side tables.
Add tests on the generic creater and on the resttest compatibility.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Lift embedded structure out of eviction-related KubeletConfiguration fields
- Changes the following KubeletConfiguration fields from `string` to
`map[string]string`:
- `EvictionHard`
- `EvictionSoft`
- `EvictionSoftGracePeriod`
- `EvictionMinimumReclaim`
- Adds flag parsing shims to maintain Kubelet's public flags API, while
enabling structured input in the file API.
- Also removes `kubeletconfig.ConfigurationMap`, which was an ad-hoc flag
parsing shim living in the kubeletconfig API group, and replaces it
with the `MapStringString` shim introduced in this PR. Flag parsing
shims belong in a common place, not in the kubeletconfig API.
I manually audited these to ensure that this wouldn't cause errors
parsing the command line for syntax that would have previously been
error free (`kubeletconfig.ConfigurationMap` was unique in that it
allowed keys to be provided on the CLI without values. I believe this was
done in `flags.ConfigurationMap` to facilitate the `--node-labels` flag,
which rightfully accepts value-free keys, and that this shim was then
just copied to `kubeletconfig`). Fortunately, the affected fields
(`ExperimentalQOSReserved`, `SystemReserved`, and `KubeReserved`) expect
non-empty strings in the values of the map, and as a result passing the
empty string is already an error. Thus requiring keys shouldn't break
anyone's scripts.
- Updates code and tests accordingly.
Regarding eviction operators, directionality is already implicit in the
signal type (for a given signal, the decision to evict will be made when
crossing the threshold from either above or below, never both). There is
no need to expose an operator, such as `<`, in the API. By changing
`EvictionHard` and `EvictionSoft` to `map[string]string`, this PR
simplifies the experience of working with these fields via the
`KubeletConfiguration` type. Again, flags stay the same.
Other things:
- There is another flag parsing shim, `flags.ConfigurationMap`, from the
shared flag utility. The `NodeLabels` field still uses
`flags.ConfigurationMap`. This PR moves the allocation of the
`map[string]string` for the `NodeLabels` field from
`AddKubeletConfigFlags` to the defaulter for the external
`KubeletConfiguration` type. Flags are layered on top of an internal
object that has undergone conversion from a defaulted external object,
which means that previously the mere registration of flags would have
overwritten any previously-defined defaults for `NodeLabels` (fortunately
there were none).
Related: #53833 (lifting embedded structures out of string fields is part of getting this API to beta)
```release-note
The EvictionHard, EvictionSoft, EvictionSoftGracePeriod, EvictionMinimumReclaim, SystemReserved, and KubeReserved fields in the KubeletConfiguration object (kubeletconfig/v1alpha1) are now of type map[string]string, which facilitates writing JSON and YAML files.
```