Automatic merge from submit-queue
Pick a specific GCI version by default on GCE.
Prior to this change, a K8s branch (master as well as release) was
pinned to a GCI milestone. It would pick up the latest GCI release on
that milestone at the time of cluster creation. The rationale was that
K8s users would automatically get the bug fixes in newer versions of
GCI. In practice, however, this makes the runtime environment
non-deterministic, and the lack of continuous e2e tests means we would run
into breakages sooner or later.
With this change, each K8s release will pick a specific version
of GCI by default (similar to how the Debian-based container-vm gets used).
Users can override the default version through KUBE_GCE_MASTER_IMAGE and
KUBE_GCE_NODE_IMAGE environment variables.
We expect the default GCI version will be updated relatively frequently to
stay current with newer GCI releases. We can also automate bumping the
hard-coded GCI version in the future.
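A minimal sketch of the override behavior described above. The pinned value `gci-example-pinned` and the helper `resolve_image` are hypothetical; the real default and wiring live in the GCE cluster scripts and may differ.

```shell
# Hypothetical pinned default; stands in for the hard-coded GCI version.
DEFAULT_GCI_IMAGE="gci-example-pinned"

# Print the image to use: the caller-supplied override if it is non-empty,
# otherwise the pinned default.
resolve_image() {
  override="${1:-}"
  echo "${override:-${DEFAULT_GCI_IMAGE}}"
}

MASTER_IMAGE="$(resolve_image "${KUBE_GCE_MASTER_IMAGE:-}")"
NODE_IMAGE="$(resolve_image "${KUBE_GCE_NODE_IMAGE:-}")"
echo "master=${MASTER_IMAGE} node=${NODE_IMAGE}"
```

Exporting `KUBE_GCE_NODE_IMAGE` before running this would change only the node image; the master would still get the pinned default.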
@vishh @adityakali can you please review?
cc @kubernetes/goog-image FYI
Automatic merge from submit-queue
keep docker0 with private cidr range
fixes: #31465
Keep docker0 when using kubenet on GCI. Assign 169.254.123.1/24 to docker0 to avoid cidr conflict.
Automatic merge from submit-queue
fix feature_gates salt plumbing
Fix salt plumbing for `--feature-gate` from the `FEATURE_GATES` kube env.
Previously, grains.conf and kube-env were generated for the master only. Verified it now works for GCI and Debian masters/nodes.
cc @thockin @timstclair
Automatic merge from submit-queue
Build and push kube-dns for 1.4 release.
Fix #31355.
The following Docker images have been uploaded:
gcr.io/google_containers/kubedns-amd64:1.7
gcr.io/google_containers/kubedns-arm:1.7
gcr.io/google_containers/kubedns-arm64:1.7
The ppc64le build is disabled by default, and it failed when built with:
`KUBE_BUILD_PPC64LE=y make release`
I'm still working on the ppc64le build. Updates will follow in this thread.
@girishkalele @thockin
Automatic merge from submit-queue
gci: decouple from the built-in kubelet version
Prior to this change, configure.sh would:
(1) compare versions of built-in kubelet and downloaded kubelet, and
(2) bind-mount downloaded kubelet at /usr/bin/kubelet in case of
version mismatch
With this change, configure.sh:
(1) compares the two versions only on test clusters, and
(2) uses the actual file paths to start kubelet w/o any bind-mounting
To allow (2), this change also provides its own version of kubelet
systemd service file.
Effectively with this change we will always use the downloaded kubelet
binary along with its own systemd service file on non-test clusters. The
main advantage is this change does not rely on the kubelet being built in to
the OS image.
@dchen1107 @wonderfly can you please review
cc/ @kubernetes/goog-image FYI
Automatic merge from submit-queue
Add admission controller for default storage class.
The admission controller adds a default class to PVCs that do not request any
specific class. This way, users (PVC authors) do not need to care about
storage classes: an administrator can configure a default one, and all PVCs
that do not specify a class will get it.
The marker of default class is annotation "volume.beta.kubernetes.io/storage-class", which must be set to "true" to work. All other values (or a missing annotation) make the class non-default.
Based on @thockin's code; added tests and made it not reject a PVC when no class is marked as default.
@kubernetes/sig-storage
Automatic merge from submit-queue
Support for creation/removal of master replicas.
HA master: initial support for creation/removal of master replicas by
kube-up/kube-down scripts for GCE on GCI (other distributions, including Debian, are not supported yet).
Automatic merge from submit-queue
Use --regions instead of --region for gcloud list [resource]
gcloud has started complaining:
```
WARNING: Abbreviated flag [--region] will be disabled in release 132.0.0, use the full name [--regions].
WARNING: Abbreviated flag [--region] will be disabled in release 132.0.0, use the full name [--regions].
WARNING: Abbreviated flag [--region] will be disabled in release 132.0.0, use the full name [--regions].
```
We'll probably need to cherry-pick this, as otherwise the list-resources script will start failing at some point in the future.
Automatic merge from submit-queue
Update core etcd references to use 3.0.4
This updates the core references to use 3.0.4.
There are still legacy references in the code base that should be cleaned up or removed, but I'm reluctant to purge them.
/cc @kubernetes/sig-scalability
Automatic merge from submit-queue
Avoid unnecessary copies on GCI initialization.
The issue I faced was that when starting a cluster I was getting:
```
Aug 12 11:12:46 e2e-test-wojtekt-master configure.sh[1079]: cp: error writing '/home/kubernetes/kubernetes-src.tar.gz': No space left on device
```
This PR reduces the amount of space needed on startup and also speeds up cluster startup.
@lavalamp @dchen1107
Automatic merge from submit-queue
Add support for kube-up.sh to deploy Calico network policy to GCI masters
Also remove requirement for calicoctl from Debian / salt installed nodes and clean it up a little by deploying calico-node with a manifest rather than calicoctl. This also makes it more reliable by retrying properly.
How to use:
```
make quick-release
NETWORK_POLICY_PROVIDER=calico cluster/kube-up.sh
```
One place where I was uncertain:
- CPU allocations (on the master particularly, where there's very little spare capacity). I took some from etcd, but if there's a better way to decide this, I'm happy to change it.
It can run tests against multiple existing images that match a regex.
GCI images will be using a regex.
Signed-off-by: Vishnu kannan <vishnuk@google.com>
Automatic merge from submit-queue
AWS/GCE: Rework use of master name
* Add a pillar for `hostname` (because even if there's a good Salt function for it, I don't trust it to return the short hostname)
* Move `INITIAL_ETCD_CLUSTER` to just the GCE turn-up
* Remove `master_name`, which isn't needed
Automatic merge from submit-queue
In cluster scripts correct gcloud list arg from '--zone' to '--zones'
I started getting these messages when doing `kube-up` and similar operations:
WARNING: Abbreviated flag [--zone] will be disabled in release 132.0.0, use the full name [--zones].
This PR corrects the flag where used.
Note there are many uses of `--zone` on commands like `gcloud instances describe` which are still correct - those commands do not accept multiple zones.
Automatic merge from submit-queue
[Garbage Collector] add e2e tests again
#27151 was reverted because GKE didn't start correctly after it was merged (https://github.com/kubernetes/kubernetes/pull/27151#issuecomment-233030686).
The possible problem is the `unbound variable`, which is fixed in the second commit of this PR. However, I cannot verify whether this PR will fail the GKE suite, since I don't have an environment to run that suite.
@wojtek-t @lavalamp
Automatic merge from submit-queue
kube-up: increase download timeout for kubernetes.tar.gz
Particularly on smaller instances on AWS, we were hitting the 80 second
timeout now that our image is well over the 1GB mark.
Increase the timeout from 80 seconds to 300 seconds.
Fix #29418
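A rough sketch of a download helper with the longer limit. It assumes the download uses curl and that the limit maps to curl's `--max-time`; the helper name `download` and the variable `DOWNLOAD_TIMEOUT_SECONDS` are hypothetical and not taken from the actual scripts.

```shell
# Hypothetical knob; 300 mirrors the new limit described above.
DOWNLOAD_TIMEOUT_SECONDS="${DOWNLOAD_TIMEOUT_SECONDS:-300}"

# Fetch a URL into a destination file, aborting the transfer if it takes
# longer than DOWNLOAD_TIMEOUT_SECONDS in total.
download() {
  url="$1"
  dest="$2"
  curl --fail --silent --show-error --location \
    --max-time "${DOWNLOAD_TIMEOUT_SECONDS}" \
    --output "${dest}" "${url}"
}
```

A caller on a slow link could raise the limit further by exporting `DOWNLOAD_TIMEOUT_SECONDS` before sourcing the helper.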
Automatic merge from submit-queue
fix logrotate config (again)
We need to add the dateformat option so that logrotate
can create unique logfiles for each rotation. Without this,
log rotation is skipped with a message like (generated in
verbose mode of logrotate):
rotating log /var/log/rotate-test.log, log->rotateCount is 5
dateext suffix '-20160718'
glob pattern '-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
destination /var/log/rotate-test2.log-20160718.gz already exists, skipping rotation
Tested as follows:
# config in '/etc/logrotate.d/rotate-test':
/var/log/rotate-test.log {
rotate 5
copytruncate
missingok
notifempty
compress
maxsize 100M
daily
dateext
dateformat -%Y%m%d-%s
create 0644 root root
}
# create 150Mb of /var/log/rotate-test.log
$ dd if=/dev/zero of=/var/log/rotate-test.log bs=1048576 count=150 conv=notrunc oflag=append
# run logrotate
$ /usr/sbin/logrotate -v /etc/logrotate.conf
...
rotating pattern: /var/log/rotate-test.log after 1 days (5 rotations)
empty log files are not rotated, log files >= 104857600 are rotated earlier, old logs are removed
considering log /var/log/rotate-test.log
log needs rotating
rotating log /var/log/rotate-test.log, log->rotateCount is 5
Converted ' -%Y%m%d-%s' -> '-%Y%m%d-%s'
dateext suffix '-20160718-1468875268'
glob pattern '-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
copying /var/log/rotate-test.log to /var/log/rotate-test.log-20160718-1468875268
truncating /var/log/rotate-test.log
compressing log with: /bin/gzip
Repeating the 'dd' and 'logrotate' commands now generates logfiles correctly.
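To illustrate why the extra `%s` matters, here is a small sketch that renders both suffix formats for the epoch time seen in the log above. It assumes GNU date (`-d @EPOCH`) and pins TZ to UTC for reproducibility; the helper names are hypothetical.

```shell
# Suffix produced by a day-granularity dateformat (-%Y%m%d).
suffix_daily() { TZ=UTC date -d "@$1" +-%Y%m%d; }

# Suffix produced by the dateformat used in the test config (-%Y%m%d-%s).
suffix_with_epoch() { TZ=UTC date -d "@$1" +-%Y%m%d-%s; }

suffix_daily 1468875268        # -20160718
suffix_daily 1468875269        # -20160718 again, so a second rotation collides
suffix_with_epoch 1468875268   # -20160718-1468875268
suffix_with_epoch 1468875269   # -20160718-1468875269, a distinct file name
```

Two rotations on the same day share the daily suffix, which matches the "already exists, skipping rotation" message; the epoch-qualified suffix differs whenever the rotations are at least one second apart.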
#27754
@bprashanth can you please review?
Automatic merge from submit-queue
[garbage collector] add e2e test
This PR also includes some changes to plumb controller-manager's `--enable_garbage_collector` from the environment variable.
The e2e test will not be run by the core suite because it's marked `[Feature:GarbageCollector]`.
The corresponding jenkins job configuration PR is https://github.com/kubernetes/test-infra/pull/132.
Automatic merge from submit-queue
Substitute federation_domain_map parameter with its value in node bootstrap scripts.
This PR also removes the substitution code we added to the build scripts.
**Release Note**
```release-note
If you use one of the kube-dns replication controller manifests in `cluster/saltbase/salt/kube-dns`, i.e. `cluster/saltbase/salt/kube-dns/{skydns-rc.yaml.base,skydns-rc.yaml.in}`, either substitute `__PILLAR__FEDERATIONS__DOMAIN__MAP__` or `{{ pillar['federations_domain_map'] }}` with the corresponding federation name to domain name value, or remove them if you do not support cluster federation at this time. If you plan to substitute the parameter with its value, here is an example for `{{ pillar['federations_domain_map'] }}`:
pillar['federations_domain_map'] = "- --federations=myfederation=federation.test"
where `myfederation` is the name of the federation and `federation.test` is the domain name registered for the federation.
```
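For the `__PILLAR__FEDERATIONS__DOMAIN__MAP__` form, the substitution or removal described in the release note could be scripted roughly as follows. This is a sketch, not the project's tooling: the helper names are hypothetical, and the sed expression assumes the value contains no `|`, `&`, or backslash.

```shell
PLACEHOLDER="__PILLAR__FEDERATIONS__DOMAIN__MAP__"

# Replace the placeholder with the given value (manifest on stdin).
substitute_federation_map() {
  sed -e "s|${PLACEHOLDER}|$1|g"
}

# Drop every line carrying the placeholder (no federation support).
remove_federation_map() {
  sed -e "/${PLACEHOLDER}/d"
}
```

For example, `substitute_federation_map '- --federations=myfederation=federation.test' < skydns-rc.yaml.base` would write the substituted manifest to stdout.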
cc @erictune @kubernetes/sig-cluster-federation @MikeSpreitzer @luxas
Automatic merge from submit-queue
Add Calico as policy provider in GCE
Adds Calico as policy provider to GCE, enforcing the extensions/v1beta1 NetworkPolicy API.
Still to do:
- [x] Enable NetworkPolicy API when POLICY_PROVIDER is provided.
- [x] Fix CNI plugin, policy controller versions.
CC @thockin - does this general approach look good?
Automatic merge from submit-queue
federation: Updating KubeDNS to try finding a local service first for federation query
Ref https://github.com/kubernetes/kubernetes/issues/26762
Updating KubeDNS to try to find a local service first for federation query.
Without this change, KubeDNS always returns the DNS hostname, even if a local service exists.
I have updated the code to first remove the federation name from the path if it exists, so that the default search for a local service happens. If we don't find a local service, then we try to find the DNS hostname.
I would appreciate a strong review since this is my first change to KubeDNS.
https://github.com/kubernetes/kubernetes/pull/25727 was the original PR that added federation support to KubeDNS.
cc @kubernetes/sig-cluster-federation @quinton-hoole @madhusudancs @bprashanth @mml
Automatic merge from submit-queue
Use new fluentd-gcp container with journal support
This makes use of the systemd-journal support added in PR #27981
and fixes #27446.
cc/ @a-robinson @andyzheng0831
Following from #27830, this copies the source onto the instance and
displays the location of it prominently (keeping the download link for
anyone that just wants to curl it).
Example output (this tag doesn't exist yet):
---
Welcome to Kubernetes v1.4.0!
You can find documentation for Kubernetes at:
http://docs.kubernetes.io/
The source for this release can be found at:
/usr/local/share/doc/kubernetes/kubernetes-src.tar.gz
Or you can download it at:
https://storage.googleapis.com/kubernetes-release/release/v1.4.0/kubernetes-src.tar.gz
It is based on the Kubernetes source at:
https://github.com/kubernetes/kubernetes/tree/v1.4.0
For Kubernetes copyright and licensing information, see:
/usr/local/share/doc/kubernetes/LICENSES
---
Automatic merge from submit-queue
GCE provider: Limit Filter calls to regexps rather than insane blobs
Filters can't exceed 4k, and GET requests against the GCE API are also limited, so these break down in different ways at different cluster counts. Fix it by introducing an advisory `node-instance-prefix` configuration in the GCE provider that can hint the `EnsureLoadBalancer`/`UpdateLoadBalancer` code (and the firewall creation/update code). If it's not there, or wrong (a hostname that's registered violates it), just ignore it and grab the whole project.
Fixes #27731
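The advisory nature of the prefix can be sketched as below. The real logic lives in the Go GCE provider; this shell helper (`instance_name_regexp`, a hypothetical name) only illustrates the fallback rule, and the `prefix.*` output format is an assumption.

```shell
# Print a name regexp for the prefix if every hostname starts with it.
# Print nothing if the prefix is empty or any hostname violates it,
# which signals "ignore the hint and list the whole project".
instance_name_regexp() {
  prefix="$1"
  shift
  [ -n "$prefix" ] || return 0
  for host in "$@"; do
    case "$host" in
      "$prefix"*) ;;
      *) return 0 ;;
    esac
  done
  echo "${prefix}.*"
}
```

A single short regexp keeps the filter well under the size limit regardless of how many nodes are registered.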
Automatic merge from submit-queue
federation: Creating kubeconfig files to be used for creating secrets for clusters on aws and gke
Extension of https://github.com/kubernetes/kubernetes/pull/26914 which created the kubeconfig files for gce clusters.
This PR extends it to AWS, vagrant and GKE.
The change for AWS and vagrant is exactly the same as for GCE.
For GKE, since `gcloud create clusters` creates a kubeconfig, we just copy the generated kubeconfig to the desired location.
cc @kubernetes/sig-cluster-federation @colhom
@roberthbailey for GKE
Automatic merge from submit-queue
rkt: Map kubelet's `--stage1-image` flag to rkt's `--stage1-name` flag.
This enables rkt to use a cached stage1 image instead of unpacking the stage1 image for every pod.
After this change, users need to preload the stage1 images in order to enable rkt to find the stage1 image with the name specified by this flag.
Also, the cloud config is modified to pre-load the stage1 images.
cc @kubernetes/sig-rktnetes @kubernetes/sig-node
Automatic merge from submit-queue
add logrotate service and configuration for GCI
This change mirrors the configuration in cluster/saltbase/salt/logrotate for GCI.
On GCI we use systemd timers (https://www.freedesktop.org/software/systemd/man/systemd.timer.html) and install an hourly timer - kube-logrotate.timer. This will invoke kube-logrotate.service (which calls /usr/sbin/logrotate) once every hour to perform log rotation as per the rotation rules installed under /etc/logrotate.d/.
@kubernetes/goog-image @zmerlynn @dchen1107 @andyzheng0831
Automatic merge from submit-queue
make GCI image detection robust
This change makes sure that in case we roll back a released GCI image, the image detection logic picks a correct active image.
@kubernetes/goog-image @Amey-D @wonderfly @dchen1107
Automatic merge from submit-queue
Prep for continuous Docker validation test
```release-note
Add a test config variable to specify desired Docker version to run on GCI.
```
We want to continuously validate Docker releases (#25215), on GCI. This change
adds a new test config variable, `KUBE_GCI_DOCKER_VERSION`, through which we can
specify which version of Docker we want to run on the master and nodes. This
change also patches the Jenkins e2e-runner with the ability to fetch the latest
Docker (pre)release, and sets the aforementioned variable accordingly.
Tested on my local Jenkins instance that was able to start a cluster with the latest Docker version (different from installed version) running on both master and nodes.
@dchen1107 Can you review?
cc/ @andyzheng0831 for changes in `cluster/gce/gci/helper.sh`, and @ixdy @spxtr for changes to the Jenkins e2e-runner
cc/ @kubernetes/goog-image
Automatic merge from submit-queue
Revert "Revert "GCI: add support for network plugin""
PR #27027 added the network plugin support in GCI config, but later a bug in the network plugin broke e2e tests (see issue #27118). The bug was fixed by #27141, and we have run the serial e2e tests more than 10 times to verify the fix. It should now be safe to put the GCI network plugin support back.
We will first merge in the master branch and monitor the Jenkins serial tests for a while and then cherry-pick it into release-1.3 branch.
Automatic merge from submit-queue
version bump for gci to milestone 53
Fixes #26455
GCI release 53 includes kubernetes v1.3.0-alpha.5 with docker-1.11.2.
@dchen1107 @kubernetes/goog-image @andyzheng0831
Automatic merge from submit-queue
support for mounting local-ssds on GCI
This change adds support for mounting local ssds on GCI.
It updates the previous container-vm behavior as well to
match that for GCI nodes by mounting the local-ssds under
the same path (/mnt/disks/ssdN).
@vulpecula @roberthbailey @andyzheng0831 @kubernetes/goog-image
Automatic merge from submit-queue
Trusty: fix the 'ping' issue and fluentd-gcp issue #26379
This PR mainly picks up the fixes from #27016 and #27102 in the trusty code, so that we can fix the issues in the release-1.2 branch for GCI. It contains two parts:
(1) Adding iptables rules to accept ICMP traffic, otherwise 'ping' from a pod does not work;
(2) Revising the code for cleaning up docker0 stuff, including the bridge and iptables rules. I slightly refactored the code that starts kubelet, removing the docker0 stuff before starting kubelet. The old code did it after starting kubelet but before restarting docker. I think doing it before starting kubelet is safer.
cc/ @roberthbailey @fabioy @dchen1107 @a-robinson @kubernetes/goog-image
Automatic merge from submit-queue
cluster/gce/coreos: Update heapster apiVersion
This fixes an inadvertent search-replace error in #26617.
The error was missed then because the search-replace issue wasn't
present in the standalone controllers, but was in all the others.
I verified that with this change heapster comes up under the default influxdb monitoring and without this change addon manager spits out validation failure errors for the heapster yaml.
cc @yifan-gu
Automatic merge from submit-queue
GCI: fix the issue #26379
This PR deletes docker0 explicitly to fix the issue. In some cases, the coexistence of docker0 and cbr0 causes trouble in GCI-based cluster instances.
I verified it in GKE. With the fix, the fluentd-gcp pod shows no error, and "curl google.com" works inside a pod. Marking it as P0 to match the issue priority.
@a-robinson @roberthbailey @freehan @kubernetes/goog-image
Automatic merge from submit-queue
Enable support for memory eviction configuration via salt
Added evictions based on memory by default whenever the available memory is < 100Mi.
Updated GCE and GCI.
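As a sketch of how such a default might be turned into a kubelet flag: `--eviction-hard` is a kubelet flag, but the variable name `EVICTION_HARD` and the helper below are assumptions made for illustration, not the actual salt plumbing.

```shell
# Render the kubelet hard-eviction flag, defaulting to the 100Mi
# available-memory threshold when no value is supplied.
kubelet_eviction_flag() {
  threshold="${1:-memory.available<100Mi}"
  echo "--eviction-hard=${threshold}"
}

# Use an operator-supplied threshold if one is exported, else the default.
kubelet_eviction_flag "${EVICTION_HARD:-}"
```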
Automatic merge from submit-queue
Bump cluster autoscaler version and enable scale down by default
Follow up of https://github.com/kubernetes/contrib/pull/1148.
cc: @piosz @fgrzadkowski @jszczepkowski
Automatic merge from submit-queue
Re-enable node problem detector by default
Re-enable node problem detector started in gce cluster by default.
For now, in the master node, the node problem detector will be started and do nothing (see https://github.com/kubernetes/node-problem-detector/pull/13).
But in fact, in my test cluster, the master has no extra cpu to run the node problem detector, so node problem detector is started on all nodes except master, which is what we want but not expected...
@dchen1107
/cc @kubernetes/sig-node
/cc @andyzheng0831 for the gci script change.
Automatic merge from submit-queue
Don't run fluentd-es on GCI masters
It isn't run on containervm masters. It can't do anything on the master because the master doesn't have kube-proxy running to enable fluentd to talk to the elasticsearch service.
@andyzheng0831
Automatic merge from submit-queue
GCI/Trusty: support the Docker registry mirror
@roberthbailey @zmerlynn please review it.
cc/ @fabioy @dchen1107 @kubernetes/goog-image FYI.
cc/ @ojarjur it is very straightforward to add support for GCI, which is pretty much like the change to ContainerVM's configure-vm.sh in your original PR #25841.
Automatic merge from submit-queue
GCI: correct the fix in #26363
This PR mainly corrects the fix to the 'find' command in #26363. I added "-maxdepth 1" in an earlier change, and #26363 tried to fix it by changing the search path. That is potentially incorrect when yaml files are more than one layer deep. The real fix is to remove the "-maxdepth 1" flag from the 'find' command. This PR also updates two minor places in configure-helper.sh introduced by two previous PRs, #26413 and #26048.
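The effect of "-maxdepth 1" can be reproduced on a scratch directory. The layout and the helper `count_yaml` below are made up for illustration and are not the real addon tree.

```shell
# Build a scratch tree with one top-level and one nested yaml file.
src="$(mktemp -d)"
mkdir -p "${src}/nested/deeper"
touch "${src}/top.yaml" "${src}/nested/deeper/inner.yaml"

# Count yaml files under a directory; extra arguments (such as
# -maxdepth 1) are passed to find ahead of the name test.
count_yaml() {
  dir="$1"
  shift
  find "${dir}" "$@" -name '*.yaml' | wc -l | tr -d ' '
}

shallow="$(count_yaml "${src}" -maxdepth 1)"   # sees only top.yaml
deep="$(count_yaml "${src}")"                  # sees top.yaml and inner.yaml
echo "with -maxdepth 1: ${shallow}; without: ${deep}"
rm -rf "${src}"
```

With the flag, the nested manifest is silently missed, which is the failure mode described above.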
@roberthbailey @wonderfly
cc/ @dchen1107 @fabioy @kubernetes/goog-image
Automatic merge from submit-queue
pin GCI version to milestone 52
This is mainly for pinning the 1.2 branch to GCI milestone 52
which contains correct docker and kubelet built in.
Doing this allows us to upgrade docker to v1.11 (issue #26455)
in GCI 53 without breaking the 1.2 release branch.
@kubernetes/goog-image @dchen1107 @roberthbailey @andyzheng0831
Automatic merge from submit-queue
Move the defaults setting of GCI to util.sh
fixes #26291
This change recovers some of the side effects of
https://github.com/kubernetes/kubernetes/pull/26197, i.e., keeps the defaults of
`NODE_IMAGE` and `NODE_IMAGE_PROJECT` to `MASTER_IMAGE` and
`MASTER_IMAGE_PROJECT`, for backward compatibility. Although it keeps
`OS_DISTRIBUTION` defaulting to `gci`, the default settings of these vars are
moved to `cluster/gce/util.sh` and conditioned on `OS_DISTRIBUTION==gci`.
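A sketch of what distro-conditioned defaults with node-to-master fallback could look like. The function name and the placeholder values `example-gci-image` and `example-image-project` are hypothetical; the actual defaults in `cluster/gce/util.sh` may be structured differently.

```shell
# Fill in image defaults only when the distribution is gci.
set_image_defaults() {
  OS_DISTRIBUTION="${OS_DISTRIBUTION:-gci}"
  if [ "${OS_DISTRIBUTION}" = "gci" ]; then
    # Placeholder values, not the real defaults.
    MASTER_IMAGE="${MASTER_IMAGE:-example-gci-image}"
    MASTER_IMAGE_PROJECT="${MASTER_IMAGE_PROJECT:-example-image-project}"
    # Nodes fall back to whatever the master uses (backward compatibility).
    NODE_IMAGE="${NODE_IMAGE:-${MASTER_IMAGE}}"
    NODE_IMAGE_PROJECT="${NODE_IMAGE_PROJECT:-${MASTER_IMAGE_PROJECT}}"
  fi
}
```

Setting only `MASTER_IMAGE` before calling the function makes the node image follow it, while an explicit `NODE_IMAGE` is left untouched.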
@euank @roberthbailey Can you review?
Automatic merge from submit-queue
cluster/coreos: Update heapster addon to beta2
fixes #26616
As noted there, heapster was updated but not for gce/coreos, which breaks anything that depends on heapster's new metrics API (i.e. autoscaling).
Automatic merge from submit-queue
Support for cluster autoscaler in GCE Trusty and GCI images
Fixes: #26346
Ref: #26197
cc: @fgrzadkowski @vulpecula @piosz @jszczepkowski
Automatic merge from submit-queue
Prepull images in e2e
Quick and dirty image puller because the SQ stalled multiple times just *today* on image pull flake (https://github.com/kubernetes/kubernetes/issues/25277).
@kubernetes/sig-node @kubernetes/sig-testing wdyt?
Automatic merge from submit-queue
Make node-instance-group base names unique to prevent collisions
We create multiple IGMs for >1000 Node clusters. When base names conflict, IGMs will fight over ownership of the VMs that happen to have a name belonging to multiple IGMs.
This change will increase reliability of starting big clusters.
cc @wojtek-t @alex-mohr @roberthbailey @mikedanese
Automatic merge from submit-queue
Add node problem detector as an addon pod.
```release-note
Introduce a new add-on pod NodeProblemDetector.
NodeProblemDetector is a DaemonSet running on each node, monitoring node health and reporting
node problems as NodeCondition and Event. Currently it already supports kernel log monitoring, and
will support more problem detection in the future. It is enabled by default on gce now.
```
This PR enables NodeProblemDetector as an add-on pod.
/cc @mikedanese @kubernetes/sig-node
Automatic merge from submit-queue
Configuration for GCP webhook authentication and authorization
This PR adds configuration for GCP webhook authentication and authorization in ContainerVM and GCI. The change of configure-vm.sh and kube-apiserver.manifest is directly copied from @cjcullen's PR #25380 and #25296. The change in GCI script configure-helper.sh includes the support for webhook authentication and authorization, and also some code refactor to improve readability.
@cjcullen @roberthbailey @zmerlynn please review it. The original PRs are P1, please mark this as P1.
cc/ @fabioy @kubernetes/goog-image FYI.
I verified it by running e2e tests on a GCI cluster. Without the GCI-side change, cluster creation fails, as captured by the GKE Jenkins tests. I did not test the case where the two env variables GCP_AUTHN_URL and GCP_AUTHZ_URL are set, because they are only set in GKE. After this PR is merged, @cjcullen will test in GKE.
Automatic merge from submit-queue
cluster/gce/coreos: Set service-cluster-ip-range
Broken by #19242
See also #26002
This is necessary to kube-up for me, but depending on how #26002 plays out, this PR might not be necessary. Happy to close this or merge or whatever depending on what's best.
cc @yifan-gu @sjpotter @mikedanese
Automatic merge from submit-queue
GCI: Fix the condition for using the default image
This PR revises the condition for using the default GCI image. The old logic is inconvenient for manually running e2e tests in some cases (mainly for the GCI team testing custom images). The new logic is very similar to the logic used for ContainerVM. When the distro is set to "gci", if the master or node image is unset, we use gci-dev for it. If either is set, we respect it.
@roberthbailey @zmerlynn @dchen1107 please review it, and we should cherry pick it in release-1.2 branch. Thanks!
cc/ @kubernetes/goog-image @adityakali FYI
Automatic merge from submit-queue
GCI/Trusty: Fix an issue in using 'find' commands
This PR makes the logic of 'find' command consistent with the 'cp' command afterwards, i.e., only check one layer of a given dir. Without this fix, we have seen a recent breakage after PR #25309 added the file cluster/addons/fluentd-elasticsearch/es-image/template-k8s-logstash.json. The 'find' command discovers this json file, but the 'cp' command fails.
@roberthbailey @dchen1107 @zmerlynn please review this fix, and mark it as a cherry pick candidate. I already verified this fix can resolve the breakage.
cc/ @wonderfly @fabioy @kubernetes/goog-image FYI
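A minimal sketch of the idea, assuming the copy step uses a single-level glob. The helper name `copy_top_level_json` and its arguments are hypothetical and do not come from the actual scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy only the *.json files directly inside src_dir.
copy_top_level_json() {
  local src_dir="$1" dst_dir="$2"
  local files
  # -maxdepth 1 stops find from descending into subdirectories, so it
  # agrees with the single-level glob used by the cp command below.
  files="$(find "${src_dir}" -maxdepth 1 -type f -name '*.json')"
  if [[ -n "${files}" ]]; then
    cp "${src_dir}/"*.json "${dst_dir}/"
  fi
}
```

Without `-maxdepth 1`, a JSON file that exists only in a nested directory would make `find` report a match while the `cp` glob expands to nothing and fails.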
Automatic merge from submit-queue
GCI: Enable the log of upstart jobs
This PR enables logging of upstart jobs in master.yaml and node.yaml. By default, upstart job logs are enabled in Trusty and placed in /var/log/upstart, but they are not enabled in GCI. This change explicitly directs the logs to the system logger. On Trusty, they end up in the /var/log/syslog file. On GCI, we can check them using "journalctl". This change will be useful for debugging if cluster initialization fails.
@roberthbailey @maisem @dchen1107 please review it. This will be useful for issues like #23634. We should also cherry pick it in release-1.2
cc/ @fabioy @zmerlynn @wonderfly FYI.
Automatic merge from submit-queue
Salt configuration for the new Cluster Autoscaler for GCE
Adds support for cloud autoscaler from contrib/cloud-autoscaler in kube-up.sh GCE script.
cc: @fgrzadkowski @piosz
Automatic merge from submit-queue
Use --format='value(name)' with gcloud instead of grep/awk/cut
Fixing our fragile parsing of `gcloud` is getting old (#24746, #25159, maybe others?).
Instead, let's just get the proper output out of `gcloud` in the first place.
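As a rough illustration of the approach (not the code in this PR), a lookup that relies on `--format='value(name)'` might look like the sketch below. The helper name `list_instance_names` and its arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one instance name per line for a name prefix.
list_instance_names() {
  local project="$1" prefix="$2"
  # value(name) emits just the name column, with no header row or table
  # padding, so nothing has to be trimmed with grep/awk/cut afterwards.
  gcloud compute instances list \
    --project "${project}" \
    --filter="name ~ ^${prefix}" \
    --format='value(name)'
}
```

The output can be read line by line without depending on the column layout of the default table format.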
Automatic merge from submit-queue
Change default clusterCIDRs from /16 to /14 in GCE configs allowing 1000 Node clusters by default.
cc @thockin @roberthbailey @wojtek-t @zmerlynn @davidopp
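The arithmetic behind the 1000-node figure can be checked with a short sketch. It assumes each node is handed one /24 slice of the cluster CIDR, which is not stated above; the helper name `node_cidrs_in` is hypothetical.

```shell
#!/usr/bin/env bash
# Assumption: every node is allocated one /24 range out of the cluster CIDR.
node_cidrs_in() {
  local cluster_prefix="$1" node_prefix="${2:-24}"
  # Each extra prefix bit doubles the number of available node ranges.
  echo $(( 1 << (node_prefix - cluster_prefix) ))
}

echo "/16 -> $(node_cidrs_in 16) node ranges"   # 256
echo "/14 -> $(node_cidrs_in 14) node ranges"   # 1024
```

Under that assumption a /16 tops out at 256 nodes, while a /14 leaves room for 1024.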
Automatic merge from submit-queue
GCI: Add two GCI specific metadata pairs
This PR adds two GCI specific metadata pairs when using GCI image.
(1) "gci-update-strategy": by default the GCI in-place updater is enabled. It means that when a new image is released, the instance on the old image will be upgraded to the new image. In this change, we turn it off;
(2) "gci-ensure-gke-docker": GCI is built with two versions of docker. When this metadata is set to "true", the version that satisfies the kubernetes qualification will be used. Setting this metadata prevents an incorrect docker version from being used.
Automatic merge from submit-queue
Fix detect-node-names to not error out if there are no nodes
Fixes #21564.
Teardown was not working correctly in rare cases because `detect-node-names` was failing before any of the actual cleanup was run. I'm pretty sure the issue was that there was an instance group, but no instances in the instance group, so we bailed out when we tried to expand the bash array.
This PR adds a guard so we don't bail if the array is empty.
cc @jlowdermilk @spxtr
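One common way to write such a guard is sketched below; it is not necessarily what the PR does. The array name NODE_NAMES and the helper `print_node_names` are assumptions made for illustration.

```shell
#!/usr/bin/env bash
set -o nounset

# Assumed global that a detection step would fill in; empty when the
# instance group exists but contains no instances.
NODE_NAMES=()

# Hypothetical helper: print each node name on its own line, or nothing.
print_node_names() {
  # Checking the length first avoids expanding an empty array, which older
  # bash versions treat as an unbound variable under `set -o nounset`.
  if (( ${#NODE_NAMES[@]} > 0 )); then
    printf '%s\n' "${NODE_NAMES[@]}"
  fi
}
```

With the guard in place, an empty result is simply a no-op and the cleanup that follows still runs.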
Automatic merge from submit-queue
Add support for running clusters on GCI
Google Container-VM Image (GCI) is the next revision of Container-VM. See documentation at https://cloud.google.com/compute/docs/containers/vm-image/. This change adds support for starting a Kubernetes cluster using GCI.
With this change, users can start a kubernetes cluster using the latest kubelet and kubectl release binary built in the GCI image by running:
$ KUBE_OS_DISTRIBUTION="gci" cluster/kube-up.sh
Or run a testing cluster on GCI by running:
$ KUBE_OS_DISTRIBUTION="gci" go run hack/e2e.go -v --up
The commands above will choose the latest GCI image by default.
Automatic merge from submit-queue
Switch to ABAC authorization from AllowAll
Switch from AllowAll to ABAC. All existing identities (that are created by deployment scripts) are given full permissions through ABAC. Manually created identities will need policies added to the `policy.jsonl` file on the master.
Automatic merge from submit-queue
don't source the kube-env in addon-manager
This was added in 2feb658ed7 which became unused after #23603 but wasn't removed
Automatic merge from submit-queue
Trusty: Add debug support for docker and kubelet
This PR adds debug support in two respects: (1) For a test cluster, the docker command will have the "--debug" flag. Recently we noticed that this is very helpful in debugging e2e test failures; (2) The kubelet command line will be put in /etc/default/kubelet. If a developer wants to test kubelet flags without recreating a cluster, she/he only needs to revise this file and then run "initctl restart kubelet". In addition, this PR fixes a couple of small things like comments and alignment.
Test result:
(1) Manually verified changing /etc/default/kubelet and running "initctl restart kubelet";
(2) Verified docker command line flag "--debug";
(3) e2e on pure trusty cluster and hybrid cluster all passed.
@roberthbailey @dchen1107 @zmerlynn please review it.
cc/ @yujuhong @fabioy @wonderfly FYI.
Automatic merge from submit-queue
Trusty: Add retry in curl commands
This fix improves robustness when fetching critical metadata files while the metadata server is temporarily unreachable.
@roberthbailey @zmerlynn @dchen1107 please review it.
cc/ @fabioy @wonderfly FYI.
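A generic retry wrapper along these lines is sketched below. It is not the code from this PR (which may just as well rely on curl's own retry options); the helper name `retry` and its parameters are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command up to max_attempts times, pausing
# delay_seconds between failed attempts.
retry() {
  local max_attempts="$1" delay_seconds="$2"
  shift 2
  local attempt
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    if "$@"; then
      return 0
    fi
    echo "attempt ${attempt}/${max_attempts} failed: $*" >&2
    if (( attempt < max_attempts )); then
      sleep "${delay_seconds}"
    fi
  done
  return 1
}

# Illustrative use against the GCE metadata server (not executed here):
#   retry 5 2 curl --fail --silent -H 'Metadata-Flavor: Google' \
#     -o /tmp/kube-env \
#     http://metadata.google.internal/computeMetadata/v1/instance/attributes/kube-env
```

The wrapper returns non-zero only after every attempt has failed, so the caller can still abort initialization when the metadata server stays unreachable.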
Automatic merge from submit-queue
jenkins: Allow configuration of release bucket
This allows others to leverage the existing E2E code to test some
patched kube binary by simply overriding the bucket and reusing many of
the existing scripts
Automatic merge from submit-queue
Trusty: Handle the new var in kube-proxy manifest
This is to capture the kube-proxy manifest change in PR #24429.
@roberthbailey @fabioy @zmerlynn please review this change and mark it as cherry pick candidate. We need to catch up 1.2.3 release.
cc/ @dchen1107 @wonderfly @cjcullen FYI.
I have verified this fix. Without it, the kube-proxy pod on Trusty nodes cannot be started correctly, i.e., the command line has an unhandled variable, and some other kube-system pods do not work correctly because kube-proxy is not working. After applying this fix, kube-proxy starts correctly, and all kube-system pods run successfully.
Automatic merge from submit-queue
Strip comments from configure-vm.sh for gce
We are getting very close to the 32KiB limit on GCE metadata entry length. We used to strip comments before putting the value in metadata, but I think we removed it in a refactor because it wasn't absolutely necessary, and leaving it out made the scripts slightly cleaner. It's close to being necessary again.
Removing comments reduces the size from 31,609B to 27,221B: https://www.diffchecker.com/0xmmecvw.
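A minimal sketch of comment stripping, assuming only whole-line comments are removed and the shebang is kept. The helper name `strip_comments` is hypothetical, and the sed expressions are not taken from the PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: drop whole-line comments but keep any shebang line.
strip_comments() {
  # First expression: lines starting with "#!" skip the rest of the script.
  # Second expression: delete lines whose first non-blank character is "#".
  sed -e '/^#!/b' -e '/^[[:space:]]*#/d' "$1"
}
```

Trailing comments after code are left alone, since removing them safely would require parsing quoted strings.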
Automatic merge from submit-queue
Trusty: Fixes for running GKE master
This PR includes two fixes for running GKE master on our image:
(1) The kubelet command line assembly had a missing part for cbr0. We did not catch it because the code path is not covered by OSS k8s tests;
(2) Remove the "" from the variables in the cert files. It causes a parsing issue in GKE. Again, this code path is not covered by k8s tests.
This PR also refactors the code for assembling kubelet flags. I moved all the logic into a single function, assemble_kubelet_flags, in configure-helper.sh for better readability, and also simplified node.yaml and master.yaml.
@roberthbailey @dchen1107 please review it, and mark it as cherrypick-candidate. This PR is verified by @maisem. Together with his CL for GKE, we can run GKE cluster with master on our image and nodes on ContainerVM.
cc/ @maisem @fabioy @wonderfly FYI
This only applies to gce kube-up. 60 seconds of open connection should
be sufficient for anything that we should be downloading. The release
tar is currently 255M.
Automatic merge from submit-queue
Trusty: Regional release .tar.gz support
@zmerlynn and @roberthbailey please review it. This change supports the feature added in PR #22234. The logic is essentially the same as in #22234, with only a few minor changes in implementation.
I manually ran e2e tests with "export RELEASE_REGION_FALLBACK=true" on two clusters: (1) master on Trusty, nodes on ContainerVM; (2) master and nodes all on Trusty. All tests are green. I could not figure out a way to simulate regional fallback, but I did test the function download_or_bust() in isolation.
cc/ @wonderfly @dchen1107 @fabioy FYI.
This should allow the non_masquerade_cidr option to get configured
in /etc/salt/minion.d/grains.conf, allowing the flag to be used by kubelet
in /etc/sysconfig/kubelet. Default configuration is set in pillar
Allow the gcr.io/google_containers registry to be overridden
regionally by just blasting a new KUBE_ADDON_REGISTRY out. Instead of
adding every addon to Salt and asking all of the other consumers
(Trusty, Juju, Mesos, etc) to change, just script the sed ourselves.
This is probably the 9th grossest thing I've ever done, but it works
well, and it works quickly. I kind of wish it didn't.
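The scripted sed could look something like the sketch below. It is a guess at the shape of the approach rather than the actual change: the helper name `override_addon_registry`, the file extensions, and the example registry are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: point every manifest under a directory at another
# registry by rewriting the default registry prefix in place.
override_addon_registry() {
  local manifest_dir="$1" new_registry="$2"
  local default_pattern='gcr\.io/google_containers'
  local file
  find "${manifest_dir}" -type f \
    \( -name '*.yaml' -o -name '*.json' -o -name '*.manifest' \) -print0 |
    while IFS= read -r -d '' file; do
      # `|` is the sed delimiter because registry names contain slashes.
      sed "s|${default_pattern}|${new_registry}|g" "${file}" > "${file}.tmp"
      mv "${file}.tmp" "${file}"
    done
}

# Illustrative use (the registry value is made up):
#   override_addon_registry /etc/kubernetes/addons "${KUBE_ADDON_REGISTRY}"
```

The sketch assumes the new registry string contains no `|` or `&` characters, which sed would otherwise interpret.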
This change revises the way to provide kube-system manifests for clusters on Trusty. Originally, we maintained copies of some manifests under cluster/gce/trusty/kube-manifests, which is not scalable and hard to maintain. With this change, clusters on Trusty will use the same source of manifests as ContainerVM. This change also fixes some minor problems such as shell variables and comments to meet the style guidance better.
The kubelet flag "nosystem" was removed recently, which breaks kubelet in Trusty. This change removes the flag usage accordingly. It also revises several aspects of Trusty support to bring it in line with running on ContainerVM, such as new flags in kubelet and new logic in the api-server and etcd pods.
PR #22022 added a new variable "cpurequest" in kube-proxy.manifest. This makes kubelet in Trusty fail to start the kube-proxy pod as this variable value is not set.
* In kube-up.sh, create a staging bucket with a location nearest the
zone being created. If new variable RELEASE_REGION_FALLBACK is set
(default false), create multiple buckets and stage to fallback
URLs. (In open source, this path is primarily for testing.)
* In configure-vm.sh, split the URL env variables by comma (if any
extra are present) and retry on the fallback URLs. Also factor the
hash checking into this path rather than outside, since a corrupt
release in a particular geo can be retried in a different geo.
* Remove the local already-staged .tar.gz checks. They've caused
several issues along the way, and with this code path become virtually
unmaintainable. (I could add a sentinel for each bucket it's possibly
staged to, but ew.)
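The fallback behaviour in the second bullet can be sketched as below. This is an illustration under stated assumptions, not the configure-vm.sh code: the names `fetch` and `download_with_fallback` are hypothetical, and SHA-1 is assumed as the hash because the text does not say which one is used.

```shell
#!/usr/bin/env bash
# Hypothetical downloader; stands in for whatever the real script uses.
fetch() {
  curl --fail --silent --show-error --location -o "$2" "$1"
}

# Try each comma-separated URL until one download matches the expected hash.
download_with_fallback() {
  local expected_sha1="$1" url_list="$2" dest="$3"
  local -a urls
  local url actual
  IFS=',' read -r -a urls <<< "${url_list}"
  for url in "${urls[@]}"; do
    rm -f "${dest}"
    if fetch "${url}" "${dest}"; then
      # The hash check sits inside the loop, so a corrupt copy in one
      # location falls through to the next URL instead of aborting.
      actual="$(sha1sum "${dest}" | awk '{print $1}')"
      if [[ "${actual}" == "${expected_sha1}" ]]; then
        return 0
      fi
      echo "hash mismatch for ${url}, trying next URL" >&2
    fi
  done
  return 1
}
```

Keeping the verification inside the loop is what lets a bad release in one geo be retried from another.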
We run unattended-upgrades manually, and then reboot automatically if we
find /var/run/reboot-required; then we check if any services need
restarting and restart them automatically using the needrestart tool.
This should mean we don't _have_ to build new images on every security
update, though we can do so to avoid a reboot.
Issue #21382
This change corrects how we determine the log level. Moreover, it explicitly redirects the kubelet log to /var/log/kubelet.log, as we noticed the log may sometimes be missing.
This change moves the code for running and monitoring addon pods into a daemon-type upstart job, so that addon manifest monitoring can be restarted automatically upon failure. It also updates the usage of "kube-ui" to "dashboard" to match the change in PR #20330.
This change supports running kubernetes master on Ubuntu Trusty.
It uses pure cloud-config and shell scripts, and completely gets
rid of saltstack or the release salt tarball.