KUBE_GCE_ENABLE_IP_ALIASES=true will enable allocation of PodCIDR ips
using the ip alias mechanism rather than using routes.
NODE_IP_RANGE will control the node instance IP cidr
KUBE_GCE_IP_ALIAS_SIZE controls the size of each podCIDR
IP_ALIAS_SUBNETWORK controls the name of the subnet created for the cluster
Automatic merge from submit-queue
[Federation] Remove FEDERATIONS_DOMAIN_MAP references
Remove all references to FEDERATIONS_DOMAIN_MAP as this method is no longer is used and is replaced by adding federation domain map to kube-dns configmap.
cc @madhusudancs @kubernetes/sig-federation-pr-reviews
**Release note**:
```
[Federation] Mechanism of adding `federation domain maps` to kube-dns deployment via `--federations` flag is superseded by adding/updating `federations` key in `kube-system/kube-dns` configmap. If user is using kubefed tool to join cluster federation, adding federation domain maps to kube-dns is already taken care by `kubefed join` and does not need further action.
```
Automatic merge from submit-queue
Use shared informers for proxy endpoints and service configs
Use shared informers instead of creating local controllers/reflectors
for the proxy's endpoints and service configs. This allows downstream
integrators to pass in preexisting shared informers to save on memory &
cpu usage.
This also enables the cache mutation detector for kube-proxy for those
presubmit jobs that already turn it on.
Follow-up to #43295 cc @wojtek-t
Will race with #43937 for conflicting changes 😄 cc @thockin
cc @smarterclayton @sttts @liggitt @deads2k @derekwaynecarr @eparis @kubernetes/rh-cluster-infra
Use shared informers instead of creating local controllers/reflectors
for the proxy's endpoints and service configs. This allows downstream
integrators to pass in preexisting shared informers to save on memory &
cpu usage.
This also enables the cache mutation detector for kube-proxy for those
presubmit jobs that already turn it on.
Automatic merge from submit-queue (batch tested with PRs 43546, 43544)
Default to enabling legacy ABAC policy in non-test kube-up.sh environments
Fixes https://github.com/kubernetes/kubernetes/issues/43541
In 1.5, we unconditionally stomped the abac policy file if KUBE_USER was set, and unconditionally used ABAC mode pointing to that file.
In 1.6, unless the user opts out (via `ENABLE_LEGACY_ABAC=false`), we want the same legacy policy included as a fallback to RBAC.
This PR:
* defaults legacy ABAC **on** in normal deployments
* defaults legacy ABAC **on** in upgrade E2Es (ensures combination of ABAC and RBAC works properly for upgraded clusters)
* defaults legacy ABAC **off** in non-upgrade E2Es (ensures e2e tests 1.6+ run with tightened permissions, and that default RBAC roles cover the required core components)
GKE changes to drive the `ENABLE_LEGACY_ABAC` envvar were made by @cjcullen out of band
```release-note
`kube-up.sh` using the `gce` provider enables both RBAC authorization and the permissive legacy ABAC policy that makes all service accounts superusers. To opt out of the permissive ABAC policy, export the environment variable `ENABLE_LEGACY_ABAC=false` before running `cluster/kube-up.sh`.
```
Automatic merge from submit-queue
Add an env KUBE_ENABLE_MASTER_NOSCHEDULE_TAINT and disable it by default
This PR changed master `NoSchedule` taint to opt-in.
As is discussed with @bgrant0607 @janetkuo, `NoSchedule` master taint breaks existing user workload, we should not enable it by default.
Previously, NPD required the taint because it can only support one OS distro with a specific configuration. If master and node are using different OS distros, NPD will not work either on master or node. However, we've already fixed this in https://github.com/kubernetes/kubernetes/pull/40206, so for NPD it's fine to disable the taint.
This should work, but I'll still try it in my cluster to confirm.
@kubernetes/sig-scheduling-misc @dchen1107 @mikedanese
Automatic merge from submit-queue
Allow ABAC to be disabled easily on upgrades
**What this PR does / why we need it**:
Adds a local variable to the configure-helper script so that ABAC_AUTHZ_FILE can be set to a nonexistent file in kube-env to disable ABAC on a cluster that previously was using ABAC.
@liggitt @Q-Lee
Automatic merge from submit-queue (batch tested with PRs 41931, 39821, 41841, 42197, 42195)
Adding legacy ABAC for 1.6
This is a fork of a previous [pull request](https://github.com/kubernetes/kubernetes/pull/42014) to include feedback as the original author is unavailable.
Adds a mechanism to optionally enable legacy abac for 1.6 to provide a migration path for existing users.
Automatic merge from submit-queue (batch tested with PRs 35094, 42095, 42059, 42143, 41944)
Use chroot for containerized mounts
This PR is to modify the containerized mounter script to use chroot
instead of rkt fly. This will avoid the problem of possible large number
of mounts caused by rkt containers if they are not cleaned up.
Automatic merge from submit-queue (batch tested with PRs 40746, 41699, 42108, 42174, 42093)
Avoid fake node names in user info
Node usernames should follow the format `system:node:<node-name>`,
but if we don't know the node name, it's worse to put a fake one in.
In the future, we plan to have a dedicated node authorizer, which would
start rejecting requests from a user with a bogus node name like this.
The right approach is to either mint correct credentials per node, or use node bootstrapping so it requests a correct client certificate itself.
Automatic merge from submit-queue (batch tested with PRs 41937, 41151, 42092, 40269, 42135)
GCE will properly regenerate basic_auth.csv on kube-apiserver start.
**What this PR does / why we need it**:
If basic_auth.csv does not exist we will generate it as normal.
If basic_auth.csv exists we will remove the old admin password before adding the "new" one. (Turns in to a no-op if the password exists).
This did not work properly before because we were replacing by key, where the key was the password. New password would not match and so not replace the old password.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#41935
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue
Use docker log rotation mechanism instead of logrotate
This is a solution for https://github.com/kubernetes/kubernetes/issues/38495.
Instead of rotating logs using logrotate tool, which is configured quite rigidly, this PR makes docker responsible for the rotation and makes it possible to configure docker logging parameters. It solves the following problems:
* Logging agent will stop loosing lines upon rotation
* Container's logs size will be more strictly constrained. Instead of checking the size hourly, size will be checked upon write, preventing https://github.com/kubernetes/kubernetes/issues/27754
It's still far from ideal, for example setting logging options per pod, as suggested in https://github.com/kubernetes/kubernetes/issues/15478 would be much more flexible, but latter approach requires deep changes, including changes in API, which may be in vain because of CRI and long-term vision for logging.
Changes include:
* Change in salt. It's possible to configure docker log parameters, using variables in pillar. They're exported from env variables on `gce`, but for different cloud provider they have to be exported first.
* Change in `configure-helper.sh` scripts for those os on `gce` that don't use salt + default values exposed via env variables
This change may be problematic for kubelet logs functionality with CRI enabled, that will be tackled in the follow-up PR, if confirmed.
CC @piosz @Random-Liu @yujuhong @dashpole @dchen1107 @vishh @kubernetes/sig-node-pr-reviews
```release-note
On GCI by default logrotate is disabled for application containers in favor of rotation mechanism provided by docker logging driver.
```
If the file does not exist we will generate it as normal.
If the file exists we will remove the old admin password before adding
the "new" one. (Turns in to a no-op if the password exists).
This did not work properly before because we were replacing by key,
where the key was the password. New password would not match and so
not replace the old password.
Added a METADATA_CLOBBERS_CONFIG flag
METADATA_CLOBBERS_CONFIG controls if we consider the values on disk or in
metadata to be the canonical source of truth. Currently defaulting to
false for GCE and forcing to true for GKE.
Added handling for older forms of the basic_auth.csv file.
Fixed comment to reflect new METADATA_CLOBBERS_CONFIG var.
This PR is to modify the containerized mounter script to use chroot
instead of rkt fly. This will avoid the problem of possible large number
of mounts caused by rkt containers if they are not cleaned up.
Automatic merge from submit-queue (batch tested with PRs 41364, 40317, 41326, 41783, 41782)
Add ability to enable cache mutation detector in GCE
Add the ability to enable the cache mutation detector in GCE. The current default behavior (disabled) is retained.
When paired with https://github.com/kubernetes/test-infra/pull/1901, we'll be able to detect shared informer cache mutations in gce e2e PR jobs.
Allow cache mutation detector enablement by PRs in an attempt to find
mutations before they're merged in to the code base. It's just for the
apiserver and controller-manager for now. If/when the other components
start using a SharedInformerFactory, we should set them up just like
this as well.
Automatic merge from submit-queue
Added configurable etcd initial-cluster-state to kube-up script.
Added configurable etcd initial-cluster-state to kube-up script. This
allows creation of multi-master cluster from scratch. This is a
cherry-pick of #41320 from 1.5 branch.
```release-note
Added configurable etcd initial-cluster-state to kube-up script.
```
Automatic merge from submit-queue (batch tested with PRs 40297, 41285, 41211, 41243, 39735)
cluster/gce: Add env var to enable apiserver basic audit log.
For now, this is focused on a fixed set of flags that makes the audit
log show up under /var/log/kube-apiserver-audit.log and behave similarly
to /var/log/kube-apiserver.log. Allowing other customization would
require significantly more complex changes.
Audit log rotation is handled the same as for `kube-apiserver.log`.
**What this PR does / why we need it**:
Add a knob to enable [basic audit logging](https://kubernetes.io/docs/admin/audit/) in GCE.
**Which issue this PR fixes**:
**Special notes for your reviewer**:
We would like to cherrypick/port this to release-1.5 also.
**Release note**:
```release-note
The kube-apiserver [basic audit log](https://kubernetes.io/docs/admin/audit/) can be enabled in GCE by exporting the environment variable `ENABLE_APISERVER_BASIC_AUDIT=true` before running `cluster/kube-up.sh`. This will log to `/var/log/kube-apiserver-audit.log` and use the same `logrotate` settings as `/var/log/kube-apiserver.log`.
```
For now, this is focused on a fixed set of flags that makes the audit
log show up under /var/log/kube-apiserver-audit.log and behave similarly
to /var/log/kube-apiserver.log. Allowing other customization would
require significantly more complex changes.
Audit log rotation is handled externally by the wildcard /var/log/*.log
already configured in configure-helper.sh.
Added configurable etcd initial-cluster-state to kube-up script. This
allows creation of multi-master cluster from scratch. This is a
cherry-pick of #41320 from 1.5 branch.
Automatic merge from submit-queue (batch tested with PRs 41121, 40048, 40502, 41136, 40759)
Remove deprecated kubelet flags that look safe to remove
Removes:
```
--config
--auth-path
--resource-container
--system-container
```
which have all been marked deprecated since at least 1.4 and look safe to remove.
```release-note
The deprecated flags --config, --auth-path, --resource-container, and --system-container were removed.
```
Automatic merge from submit-queue (batch tested with PRs 38739, 40480, 40495, 40172, 40393)
Use existing ABAC policy file when upgrading GCE cluster
When upgrading, continue loading an existing ABAC policy file so that existing system components continue working as-is
```
When upgrading an existing 1.5 GCE cluster using `cluster/gce/upgrade.sh`, an existing ABAC policy file located at /etc/srv/kubernetes/abac-authz-policy.jsonl (the default location in 1.5) will enable the ABAC authorizer in addition to the RBAC authorizer. To switch an upgraded 1.5 cluster completely to RBAC, ensure the control plane components and your superuser have been granted sufficient RBAC permissions, move the legacy ABAC policy file to a backup location, and restart the apiserver.
```
Automatic merge from submit-queue
Enable lazy initialization of ext3/ext4 filesystems
**What this PR does / why we need it**: It enables lazy inode table and journal initialization in ext3 and ext4.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#30752, fixes#30240
**Release note**:
```release-note
Enable lazy inode table and journal initialization for ext3 and ext4
```
**Special notes for your reviewer**:
This PR removes the extended options to mkfs.ext3/mkfs.ext4, so that the defaults (enabled) for lazy initialization are used.
These extended options come from a script that was historically located at */usr/share/google/safe_format_and_mount* and later ported to GO so this dependency to the script could be removed. After some search, I found the original script here: https://github.com/GoogleCloudPlatform/compute-image-packages/blob/legacy/google-startup-scripts/usr/share/google/safe_format_and_mount
Checking the history of this script, I found the commit [Disable lazy init of inode table and journal.](4d7346f7f5). This one introduces the extended flags with this description:
```
Now that discard with guaranteed zeroing is supported by PD,
initializing them is really fast and prevents perf from being affected
when the filesystem is first mounted.
```
The problem is, that this is not true for all cloud providers and all disk types, e.g. Azure and AWS. I only tested with magnetic disks on Azure and AWS, so maybe it's different for SSDs on these cloud providers. The result is that this performance optimization dramatically increases the time needed to format a disk in such cases.
When mkfs.ext4 is told to not lazily initialize the inode tables and the check for guaranteed zeroing on discard fails, it falls back to a very naive implementation that simply loops and writes zeroed buffers to the disk. Performance on this highly depends on free memory and also uses up all this free memory for write caching, reducing performance of everything else in the system.
As of https://github.com/kubernetes/kubernetes/issues/30752, there is also something inside kubelet that somehow degrades performance of all this. It's however not exactly known what it is but I'd assume it has something to do with cgroups throttling IO or memory.
I checked the kernel code for lazy inode table initialization. The nice thing is, that the kernel also does the guaranteed zeroing on discard check. If it is guaranteed, the kernel uses discard for the lazy initialization, which should finish in a just few seconds. If it is not guaranteed, it falls back to using *bio*s, which does not require the use of the write cache. The result is, that free memory is not required and not touched, thus performance is maxed and the system does not suffer.
As the original reason for disabling lazy init was a performance optimization and the kernel already does this optimization by default (and in a much better way), I'd suggest to completely remove these flags and rely on the kernel to do it in the best way.