Automatic merge from submit-queue (batch tested with PRs 48812, 48276)
Change fluentd-gcp monitoring to use metrics exposed by SD plugin
Following https://github.com/GoogleCloudPlatform/fluent-plugin-google-cloud/pull/135, make fluentd-gcp expose metrics in Prometheus registry and use them instead of counting records in the pipeline.
/cc @piosz @igorpeshansky
```release-note
Fluentd-gcp DaemonSet exposes different set of metrics.
```
Automatic merge from submit-queue (batch tested with PRs 48864, 48651, 47703)
Enable logexporter mechanism to dump logs from k8s nodes to GCS directly
Ref https://github.com/kubernetes/kubernetes/issues/48513
This adds support for logexporter from k8s side. Next I'll send a PR adding support from test-infra side.
/cc @kubernetes/sig-scalability-misc @kubernetes/test-infra-maintainers @fejta @wojtek-t @gmarek
Automatic merge from submit-queue
Fixed cluster validation for multizonal clusters.
Fixed cluster validation for multizonal clusters.
This should fix HA master e2e tests.
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 46748, 48826)
Added `CriticalAddonsOnly` toleration for npd.
**What this PR does / why we need it**:
We should add `CriticalAddonsOnly` toleration to make sure the daemonset can be scheduled on the node even if already planned to run critical pod.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#47015
**Special notes for your reviewer**:
**Release note**:
```release-note
none
```
Automatic merge from submit-queue
Properly nest code blocks
**What this PR does / why we need it**:
Markdown code blocks are adjusted to better display on GitHub. See [rendered](c3fbec7663/cluster/addons/cluster-loadbalancing/glbc/README.md) version.
**Release note**:
```release-note
Adjust markdown code block in README for Google Load Balancer addon.
```
Automatic merge from submit-queue
Update docs for user-guide
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 48781, 48817, 48830, 48829, 48053)
Fix yaml-quote typo
Caught this looking through CI logs.
/assign wojtek-t
Automatic merge from submit-queue (batch tested with PRs 48279, 48566, 48319, 48794, 47952)
Add prometheus plugin on fluentd image.
**What this PR does / why we need it**:
This PR adds the prometheus plugin on Fluentd.
**Special notes for your reviewer**:
The plugin used was: https://github.com/kazegusuri/fluent-plugin-prometheus, on the latest stable version.
All configs used are default.
**Release note**:
```release-note
Fluentd-es addon now exposes a /metrics endpoint for monitoring on port 24231.
```
Automatic merge from submit-queue
Use Container-optimzed OS images for nodes by default
Part of the deprecation of the debian-based ContainerVM images.
```release-note
kube-up and kubemark will default to using cos (GCI) images for nodes.
The previous default was container-vm (CVM, "debian"), which is deprecated.
If you need to explicitly use container-vm for some reason, you should set
KUBE_NODE_OS_DISTRIBUTION=debian
```
Automatic merge from submit-queue
Pass cluster name to Heapster with Stackdriver sink.
**What this PR does / why we need it**:
Passes cluster name as argument to Heapster when it's used with Stackdriver sink to allow setting resource label 'cluster_name' in exported metrics.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 48405, 48742, 48748, 48571, 48482)
Setting default FlexVolume driver directory on COS images.
**What this PR does / why we need it**: The original default FlexVolume driver directory is not writable on COS. A new location is necessary to make FlexVolume work.
This directory doesn't exist by default. FlexVolume users need to create this directory, bind mount it, and remount with the executable permission. The other candidate is /home/kubernetes/bin, but the directory is already getting cluttered. I will submit a different PR for a script that automates this step.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#48570
Automatic merge from submit-queue (batch tested with PRs 48698, 48712, 48516, 48734, 48735)
GCE: Allow empty NETWORK_PROJECT_ID env var
Changes:
1. Adds `GCE_API_ENDPOINT` logic to container-linux as it was added to GCI in #47881.
1. Apply `NETWORK_PROJECT_ID` value to gce.conf only if the env var is set.
/sig network
/area platform/gce
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Launch kubemark with an existing Kubemark master
In order to expand the use of kubemark, allow developers to use kubemark with a pre-existing Kubernetes cluster.
Ref issue #44393
Automatic merge from submit-queue (batch tested with PRs 48399, 48450, 48144)
Skip errors when unregistering juju kubernetes-workers
**What this PR does / why we need it**: When removing a kubernetes node from using Juju and for some reason kubernetes master fails we should not error the node, instead we should proceed with the removal of the node and the master will recognise that node as unavailable because it will fail heartbeats.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes https://github.com/juju-solutions/bundle-canonical-kubernetes/issues/300
**Special notes for your reviewer**:
**Release note**:
```
Clean decommission of Juju kubernetes worker units
```
Automatic merge from submit-queue (batch tested with PRs 48399, 48450, 48144)
configure kube-proxy to run with unset conntrack param when in lxc
**What this PR does / why we need it**: Configures the Juju Charm code to run kube-proxy with `conntrack-max-per-core` set to `0` when in an lxc as a workaround for issues when mounting `/sys/module/nf_conntrack/parameters/hashsize`
**Release note**:
```release-note
Configures the Juju Charm code to run kube-proxy with conntrack-max-per-core set to 0 when in an lxc as a workaround for issues when mounting /sys/module/nf_conntrack/parameters/hashsize
```
Automatic merge from submit-queue (batch tested with PRs 47043, 48448, 47515, 48446)
Fix charms leaving services running after remove-unit
**What this PR does / why we need it**:
This fixes a case where removed charm units can sometimes leave behind running services that interfere with the rest of the cluster.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
Fix charms leaving services running after remove-unit
```
Automatic merge from submit-queue (batch tested with PRs 48439, 48440, 48394)
Fix kubernetes charms not restarting services after snap upgrades
**What this PR does / why we need it**:
This fixes a problem where the Kubernetes charms don't restart services after upgrading snaps. This can cause certain fixes not to be picked up (for example https://github.com/juju-solutions/release/pull/10)
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
Fixed kubernetes charms not restarting services after snap upgrades
```
Automatic merge from submit-queue (batch tested with PRs 48439, 48440, 48394)
Fix: namespace-create have kubectl in path
**What this PR does / why we need it**: In juju deployed clusters namespace-create action is failing
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes https://github.com/juju-solutions/bundle-canonical-kubernetes/issues/326
**Special notes for your reviewer**:
**Release note**:
```Fix: namespace-create action on Juju deployed clusters
```
Automatic merge from submit-queue
Add configuration for swift container name
**What this PR does / why we need it:**
This review updates the OpenStack Heat provider to allow for configuring the name of the Swift object store.
**Which issue this PR fixes:**
fixes#47966
**Special notes for your reviewer**:
Note that the terminology for OpenStack Swift conflicts with K8S terminology. In this instance, container is referring to the organization structure of Swift storage objects.
**Release note**:
```release-note
Adds configuration option for Swift object store container name to OpenStack Heat provider.
```
Automatic merge from submit-queue (batch tested with PRs 48317, 48313, 48351, 48357, 48115)
Ensure get_password is accessing a file that exists.
**What this PR does / why we need it**: get_password will throw an exception instead of returning None in case the basic_auth.csv file is missing but /root/cdk/ is there in a juju deployment.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes https://github.com/juju-solutions/bundle-canonical-kubernetes/issues/324
**Special notes for your reviewer**:
**Release note**:
```
Fix race condition where /root/cdk is not yet initialised in kubernetes-master setup by Juju
```