Automatic merge from submit-queue (batch tested with PRs 54824, 55911, 55730, 55979, 55961). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add options for mounting SCSI or NVMe local SSD though Block or Filesystem and do all of that with UUID
Fixes: #51431
Fixed version of: #53466
Mount SCSI local SSD by UUID in /mnt/disks/by-uuid/, also allows for users to request and mount NVMe disks. Both types of disks will be accessible either through block or file-system.
I have confirmed that it is no longer crashing when nodes are initialized on GKE.
Automatic merge from submit-queue (batch tested with PRs 55615, 56010, 55990). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Master now supports staged upgrades.
It will wait until specifically told to upgrade with an action unless the configuration option require-manual-upgrade is false and then master nodes will upgrade immediately.
**What this PR does / why we need it**:
This update alters the kubernetes-master upgrade path for juju charms. It makes the master act like the worker in that it blocks the upgrade until each unit is specifically requested to update.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
PR for tests coming momentarily to juju-solutions/kubernetes-jenkins
**Release note**:
```release-note
Upgrading the kubernetes-master units now results in staged upgrades just like the kubernetes-worker nodes. Use the upgrade action in order to continue the upgrade process on each unit such as `juju run-action kubernetes-master/0 upgrade`
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Bump Heapster version to 1.5.0-beta.1
**What this PR does / why we need it**:
Bumps Heapster version to 1.5.0-beta.1
**Which issue(s) this PR fixes**:
Fixes#54962
**Special notes for your reviewer**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add kawych to cluster-monitoring deployment owners
**What this PR does / why we need it**:
Add kawych to cluster-monitoring deployment owners
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 55841, 55948, 55945). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Set -w flag on all iptables calls during master startup
Lack of this flag sometimes causes iptables to return error code 4 (if
other process holds xtables lock). As a result, because of `set -o errexit`,
whole startup script fails, leaving master in an incorrect state.
This is another occurence of (already closed) https://github.com/kubernetes/kubernetes/issues/7370
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
**Special notes for your reviewer**:
**Release note**:
```release-note
Bugfix: master startup script on GCP no longer fails randomly due to concurrent iptables invocations.
```
Lack of this flag sometimes causes iptables to return error code 4 (if
other process holds xtables lock). As a result, because of `set -o errexit`,
whole startup script fails, leaving master in an incorrect state.
This is another occurence of (already closed) https://github.com/kubernetes/kubernetes/issues/7370
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Bump addon manager version used to 6.5
**What this PR does / why we need it**:
Bump addon manager version to use #55466. This adds leader election-like mechanism to addon manager.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
**Special notes for your reviewer**:
Release note copied from #55466. This is intended to be cherrypicked into 1.7 and 1.8 branches.
**Release note**:
```release-note
Addon manager supports HA masters.
```
Automatic merge from submit-queue (batch tested with PRs 55682, 55444, 55456, 55717, 55131). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Do not crash on empty NODE_NAMES array.
**Which issue(s) this PR fixes**:
Fixes#55675
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add options for mounting SCSI or NVMe local SSD though Block or Filesystem and do all of that with UUID
Fixes: #51431
Mount SCSI local SSD by UUID in /mnt/disks/by-uuid/, also allows for users to request and mount NVMe disks. Both types of disks will be accessable either through block or filesystem
To see code in progress for NVMe and block support see working branch: https://github.com/davidz627/kubernetes/tree/localExt
Automatic merge from submit-queue (batch tested with PRs 55648, 55274, 54982, 51955, 55639). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Swap NetworkPolicy storage to networking.k8s.io/v1
Finishes(?) the NetworkPolicy v1 migration.
Fixes#50604
The integration test passes. I copied the test-update-storage-objects.sh change from #50327 and have no idea if it's right.
/cc @sttts @caesarxuchao @thockin
**Release note**:
```release-note
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix prometheus-to-sd sidecar in metadata proxy
Ref https://github.com/kubernetes/kubernetes/issues/55695#issuecomment-344300188
This is making 2 changes:
- restoring resource requests and limits of the metadata-proxy sidecar as it was before, and remove them for prom-to-sd sidecar (best effort) like at everywhere else
- pass pod name and namespace args to prom-to-sd sidecar (because just noticed)
/cc @ihmccreery @loburm @crassirostris - Does this make sense?
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Use results of kube-controller-manager leader election in addon manager
**What this PR does / why we need it**:
This adds leader election-like mechanism to addon manager. Currently, in a multi-master setup, upgrading one master will trigger a fight between addon managers on different masters, each forcing its own versions of addons. This leads to pod unavailability until all masters are upgraded to new version.
To avoid implementing leader election in bash, results of leader election in kube-controller-manager are used. Long term, addon manager probably should be rewritten in a real prgramming language (probably Go), and then, real leader election should be implemented there.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
I don't think there was an issue for this specifically, but this PR is related to https://github.com/kubernetes/kubernetes/issues/473
**Special notes for your reviewer**:
**Release note**:
```release-note
Addon manager supports HA masters.
```
Automatic merge from submit-queue (batch tested with PRs 54602, 54877, 55243, 55509, 55128). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
PodSecurityPolicies for addons
**What this PR does / why we need it**:
1. Colocate addon PodSecurityPolicy config with the addons (in a `podsecuritypolicies` subdirectory).
2. Add policies for addons that are currently missing policies (not in the default GCE suite)
3. Remove HostPath SSL certs from several heapster deployments, so that heapster doesn't require a special PSP
**Which issue(s) this PR fixes**:
#43538
**Release note**:
```release-note
- Add PodSecurityPolicies for cluster addons
- Remove SSL cert HostPath volumes from heapster addons
```
Automatic merge from submit-queue (batch tested with PRs 54602, 54877, 55243, 55509, 55128). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add some error handling in place of ilusory one.
**What this PR does / why we need it**:
TL;DR: "set -e" is ignored inside function foo when it's called like
"foo || something".
See https://github.com/kubernetes/kubernetes/issues/55229 for details.
This is a short-term hack that will hopefully let us at least see the
error messages whenever we hit intermittent certificate setup errors
next time. Once we know what fails there, we can start working on an
actual fix, which may very well involve rewriting this in a language
other than shell, with better error handling.
**Which issue(s) this PR fixes**
Partially addresses #55229
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 54826, 53576, 55591, 54946, 54825). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Run nvidia-gpu device-plugin daemonset as an addon on GCE nodes that have nvidia GPUs attached
- Instead of the old `Accelerators` feature that added `alpha.kubernetes.io/nvidia-gpu` resource, use the new `DevicePlugins` feature that adds vendor specific resources. (In case of nvidia GPUs it will
add `nvidia.com/gpu` resource.)
- Add node label to GCE nodes with accelerators attached. This node label is the same as what GKE attaches to node pools with accelerators attached. (For example, for nvidia-tesla-p100 GPU, the label would be `cloud.google.com/gke-accelerator=nvidia-tesla-p100`) This will help us target accelerator specific
daemonsets etc. to these nodes.
- Run nvidia-gpu device-plugin daemonset as an addon on GCE nodes that have nvidia GPUs attached.
- Some minor documentation improvements in addon manager.
**Release note**:
```release-note
GCE nodes with NVIDIA GPUs attached now expose `nvidia.com/gpu` as a resource instead of `alpha.kubernetes.io/nvidia-gpu`.
```
/sig cluster-lifecycle
/sig scheduling
/area hw-accelerators
https://github.com/kubernetes/features/issues/368
Automatic merge from submit-queue (batch tested with PRs 55283, 55461, 55288, 53970, 55487). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Support collecting log for alternative container runtime in e2e test.
Fixes https://github.com/kubernetes/kubernetes/issues/55629.
Add support to collect logs for alternative container runtime in e2e.
Example for `cri-containerd`:
```
$ go run hack/e2e.go -- --test -v --test_args="--report-dir=$PWD --container-runtime-services=cri-containerd,containerd,cri-containerd-installation"
```
```release-note
none
```
/cc @kubernetes/sig-node-pr-reviews @kubernetes/sig-testing-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 53337, 55465, 55512, 55522, 54554). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Allow configuring docker storage driver in GCE
**What this PR does / why we need it**:
For GCE, allow configuring of the docker storage driver.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
GCE: Provide an option to configure the docker storage driver.
```
Automatic merge from submit-queue (batch tested with PRs 54460, 55258, 54858, 55506, 55510). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix CRI fluentd config.
This should fix the cri-containerd stackdriver test failure:
```
Cluster level logging implemented by Stackdriver should ingest logs
```
I copied the pattern from a comment previously. However, it doesn't actually work properly. `\b` only matches word boundary, and seems to match the boundary of previous word in our case.
That's why we get the log with a leading space:
```
Nov 10 18:39:11.661: INFO: Unexpected error occurred: log entry ingested incorrectly, got --> <--I0101 00:00:00.000000 1 main.go:1] Text, want Text
```
@kubernetes/sig-node-bugs @kubernetes/sig-instrumentation-bugs
Signed-off-by: Lantao Liu <lantaol@google.com>
```release-note
none
```
Automatic merge from submit-queue (batch tested with PRs 54987, 55221, 54099, 55144, 54215). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
extracted elasticsearch-logging service name as environment variable
**What this PR does / why we need it**:
Deploying the cluster-addon fluentd-elasticsearch with customized resource definitions can cause elasticsearch discovery to fail because the service name `elasticsearch-logging` is hard-coded in cluster/addons/fluentd-elasticsearch/es-image/elasticsearch_logging_discovery.go
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
-> none yet
**Special notes for your reviewer**:
The name of the environment variable is ELASTICSEARCH_SERVICE_NAME. When non is given the fallback service-name fallback is `elasticsearch-logging`
```release-note
[fluentd-elasticsearch addon] Elasticsearch service name can be overridden via env variable ELASTICSEARCH_SERVICE_NAME
```
Automatic merge from submit-queue (batch tested with PRs 54987, 55221, 54099, 55144, 54215). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Increase waiting time (120s) for docker startup in health-monitor.sh
Fix the issue of killing docker again when startup takes longer time on overloaded nodes.
Automatic merge from submit-queue (batch tested with PRs 53047, 54861, 55413, 55395, 55308). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Delete network by default in kube-down unless using default network
Since I'm seeing folks leak networks in one of our test project (k8s-scale-testing) if they're using kube-up to create/delete their network.
I guess we're not having this problem for config-test.sh where we're mostly creating new network.
/cc @ixdy @zmerlynn
/release-note-none