It can run tests against multiple existing images that match a regex.
GCI images will be using a regex.
Signed-off-by: Vishnu kannan <vishnuk@google.com>
Automatic merge from submit-queue
AWS/GCE: Rework use of master name
* Add a pillar for `hostname` (because even if there's a good Salt function for it, I don't trust it to return the short hostname)
* Move `INITIAL_ETCD_CLUSTER` to just the GCE turn-up
* Remove `master_name`, which isn't needed
* Add a pillar for hostname (because even if there's a good Salt
function for it, I don't trust it to return the short hostname)
* Move INITIAL_ETCD_CLUSTER to just the GCE turn-up
* Remove the master_name, which isn't needed as a pillar
Automatic merge from submit-queue
In cluster scripts correct gcloud list arg from '--zone' to '--zones'
I started getting these messages when doing `kube-up` and similar operations:
WARNING: Abbreviated flag [--zone] will be disabled in release 132.0.0, use the full name [--zones].
This PR corrects the flag where used.
Note there are many uses of `--zone` on commands like `gcloud instances describe` which are still correct - those commands do not accept multiple zones.
Automatic merge from submit-queue
[Garbage Collector] add e2e tests again
#27151 is reverted because gke didn't start correctly after it's merged (https://github.com/kubernetes/kubernetes/pull/27151#issuecomment-233030686).
The possible problem is the `unbound variable`, which is fixed in the second commit of this PR. However, I cannot verify if the PR will fail the gke suite since I don't have the environment to run that suite.
@wojtek-t @lavalamp
Automatic merge from submit-queue
kube-up: increase download timeout for kubernetes.tar.gz
Particularly on smaller instances on AWS, we were hitting the 80 second
timeout now that our image is well over the 1GB mark.
Increase the timeout from 80 seconds to 300 seconds.
Fix#29418
Particularly on smaller instances on AWS, we were hitting the 80 second
timeout now that our image is well over the 1GB mark.
Increase the timeout from 80 seconds to 300 seconds.
Fix#29418
Automatic merge from submit-queue
fix logrotate config (again)
we need to add the dateformat option so that the logrotate
can create unique logfiles for each rotation. Without this,
logrotation is skipped with message like (generated in
verbose mode of logrotate):
rotating log /var/log/rotate-test.log, log->rotateCount is 5
dateext suffix '-20160718'
glob pattern '-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
destination /var/log/rotate-test2.log-20160718.gz already exists, skipping rotation
Tested as follows:
# config in '/etc/logrotate.d/rotate-test':
/var/log/rotate-test.log {
rotate 5
copytruncate
missingok
notifempty
compress
maxsize 100M
daily
dateext
dateformat -%Y%m%d-%s
create 0644 root root
}
# create 150Mb of /var/log/rotate-test.log
$ dd if=/dev/zero of=/var/log/rotate-test.log bs=1048576 count=150 conv=notrunc oflag=append
# run logrotate
$ /usr/sbin/logrotate -v /etc/logrotate.conf
...
rotating pattern: /var/log/rotate-test.log after 1 days (5 rotations)
empty log files are not rotated, log files >= 104857600 are rotated earlier, old logs are removed
considering log /var/log/rotate-test.log
log needs rotating
rotating log /var/log/rotate-test.log, log->rotateCount is 5
Converted ' -%Y%m%d-%s' -> '-%Y%m%d-%s'
dateext suffix '-20160718-1468875268'
glob pattern '-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
copying /var/log/rotate-test.log to /var/log/rotate-test.log-20160718-1468875268
truncating /var/log/rotate-test.log
compressing log with: /bin/gzip
Repeating 'dd' and 'logrotate' commands now generate logfiles correctly.
#27754
@bprashanth can you please review?
we need to add the dateformat option so that the logrotate
can create unique logfiles for each rotation. Without this,
we logrotation is skipped with message like (generated in
verbose mode of logrotate):
rotating log /var/log/rotate-test.log, log->rotateCount is 5
dateext suffix '-20160718'
glob pattern '-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
destination /var/log/rotate-test2.log-20160718.gz already exists, skipping rotation
Tested as follows:
# config in '/etc/logrotate.d/rotate-test':
/var/log/rotate-test.log {
rotate 5
copytruncate
missingok
notifempty
compress
maxsize 100M
daily
dateext
dateformat -%Y%m%d-%s
create 0644 root root
}
# create 150Mb of /var/log/rotate-test.log
$ dd if=/dev/zero of=/var/log/rotate-test.log bs=1048576 count=150 conv=notrunc oflag=append
# run logrotate
$ /usr/sbin/logrotate -v /etc/logrotate.conf
...
rotating pattern: /var/log/rotate-test.log after 1 days (5 rotations)
empty log files are not rotated, log files >= 104857600 are rotated earlier, old logs are removed
considering log /var/log/rotate-test.log
log needs rotating
rotating log /var/log/rotate-test.log, log->rotateCount is 5
Converted ' -%Y%m%d-%s' -> '-%Y%m%d-%s'
dateext suffix '-20160718-1468875268'
glob pattern '-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
copying /var/log/rotate-test.log to /var/log/rotate-test.log-20160718-1468875268
truncating /var/log/rotate-test.log
compressing log with: /bin/gzip
Repeating 'dd' and 'logrotate' commands now generate logfiles correctly.
Automatic merge from submit-queue
[garbage collector] add e2e test
This PR also includes some changes to plumb controller-manager's `--enable_garbage_collector` from the environment variable.
The e2e test will not be run by the core suite because it's marked `[Feature:GarbageCollector]`.
The corresponding jenkins job configuration PR is https://github.com/kubernetes/test-infra/pull/132.
Automatic merge from submit-queue
Substitute federation_domain_map parameter with its value in node bootstrap scripts.
This PR also removes the substitution code we added to the build scripts.
**Release Note**
```release-note
If you use one of the kube-dns replication controller manifest in `cluster/saltbase/salt/kube-dns`, i.e. `cluster/saltbase/salt/kube-dns/{skydns-rc.yaml.base,skydns-rc.yaml.in}`, either substitute one of `__PILLAR__FEDERATIONS__DOMAIN__MAP__` or `{{ pillar['federations_domain_map'] }}` with the corresponding federation name to domain name value or remove them if you do not support cluster federation at this time. If you plan to substitute the parameter with its value, here is an example for `{{ pillar['federations_domain_map'] }`
pillar['federations_domain_map'] = "- --federations=myfederation=federation.test"
where `myfederation` is the name of the federation and `federation.test` is the domain name registered for the federation.
```
cc @erictune @kubernetes/sig-cluster-federation @MikeSpreitzer @luxas
[]()
Automatic merge from submit-queue
Add Calico as policy provider in GCE
Adds Calico as policy provider to GCE, enforcing the extensions/v1beta1 NetworkPolicy API.
Still to do:
- [x] Enable NetworkPolicy API when POLICY_PROVIDER is provided.
- [x] Fix CNI plugin, policy controller versions.
CC @thockin - does this general approach look good?
Automatic merge from submit-queue
federation: Updating KubeDNS to try finding a local service first for federation query
Ref https://github.com/kubernetes/kubernetes/issues/26762
Updating KubeDNS to try to find a local service first for federation query.
Without this change, KubeDNS always returns the DNS hostname, even if a local service exists.
Have updated the code to first remove federation name from path if it exists, so that the default search for local service happens. If we dont find a local service, then we try to find the DNS hostname.
Will appreciate a strong review since this is my first change to KubeDNS.
https://github.com/kubernetes/kubernetes/pull/25727 was the original PR that added federation support to KubeDNS.
cc @kubernetes/sig-cluster-federation @quinton-hoole @madhusudancs @bprashanth @mml
Automatic merge from submit-queue
Use new fluentd-gcp container with journal support
This makes use of the systemd-journal support added in PR #27981
and Fixes#27446.
cc/ @a-robinson @andyzheng0831
Following from #27830, this copies the source onto the instance and
displays the location of it prominently (keeping the download link for
anyone that just wants to curl it).
Example output (this tag doesn't exist yet):
---
Welcome to Kubernetes v1.4.0!
You can find documentation for Kubernetes at:
http://docs.kubernetes.io/
The source for this release can be found at:
/usr/local/share/doc/kubernetes/kubernetes-src.tar.gz
Or you can download it at:
https://storage.googleapis.com/kubernetes-release/release/v1.4.0/kubernetes-src.tar.gz
It is based on the Kubernetes source at:
https://github.com/kubernetes/kubernetes/tree/v1.4.0
For Kubernetes copyright and licensing information, see:
/usr/local/share/doc/kubernetes/LICENSES
---
Following from #27830, this copies the source onto the instance and
displays the location of it prominently (keeping the download link for
anyone that just wants to curl it).
Example output (this tag doesn't exist yet):
---
Welcome to Kubernetes v1.4.0!
You can find documentation for Kubernetes at:
http://docs.kubernetes.io/
The source for this release can be found at:
/usr/local/share/doc/kubernetes/kubernetes-src.tar.gz
Or you can download it at:
https://storage.googleapis.com/kubernetes-release/release/v1.4.0/kubernetes-src.tar.gz
It is based on the Kubernetes source at:
https://github.com/kubernetes/kubernetes/tree/v1.4.0
For Kubernetes copyright and licensing information, see:
/usr/local/share/doc/kubernetes/LICENSES
---
Automatic merge from submit-queue
GCE provider: Limit Filter calls to regexps rather than insane blobs
Filters can't exceed 4k, and GET requests against the GCE API are also limited, so these break down in different ways at different cluster counts. Fix it by introducing an advisory `node-instance-prefix` configuration in the GCE provider that can hint the `EnsureLoadBalancer`/`UpdateLoadBalancer code` (and the firewall creation/update code). If it's not there, or wrong (a hostname that's registered violates it), just ignore it and grab the whole project.
Fixes#27731
[]()
Filters can't exceed 4k, and GET requests against the GCE API are also
limited, so these break down in different ways at different cluster
counts. Fix it by introducing an advisory node-instance-prefix
configuration in the GCE provider that can hint the
EnsureLoadBalancer/UpdateLoadBalancer code (and the firewall
creation/update code). If it's not there, or wrong (a hostname that's
registered violates it), just ignore it and grab the whole project.
Automatic merge from submit-queue
federation: Creating kubeconfig files to be used for creating secrets for clusters on aws and gke
Extension of https://github.com/kubernetes/kubernetes/pull/26914 which created the kubeconfig files for gce clusters.
This PR extends it to AWS, vagrant and GKE.
The change for AWS and vagrant is exactly same as GCE.
For GKE, since `gcloud create clusters` creates kubeconfig, we are just copying the generated kubeconfig to the desired location
cc @kubernetes/sig-cluster-federation @colhom
@roberthbailey for GKE
Automatic merge from submit-queue
rkt: Map kubelet's `--stage1-image` flag to rkt's `--stage1-name` flag.
This enables rkt to use cached stage1 image instead of unpacking the stage1 image every time for every pod.
After this change, users need to preload the stage1 images in order to enable rkt to find the stage1 image with the name specified by this flag.
Also, the cloud config is modified to pre-load the stage1 images.
cc @kubernetes/sig-rktnetes @kubernetes/sig-node
Automatic merge from submit-queue
add logrotate service and configuration for GCI
This change mirrors the configuration in cluster/saltbase/salt/logrotate for GCI.
On GCI we use systemd timers (https://www.freedesktop.org/software/systemd/man/systemd.timer.html) and install an hourly timer - kube-logrotate.timer. This will invoke kube-logrotate.service (which calls /usr/sbin/logrotate) once every hour to perform log rotation as per the rotation rules installed under /etc/logrotate.d/.
@kubernetes/goog-image @zmerlynn @dchen1107 @andyzheng0831
This enables rkt to use cached stage1 image instead of unpacking the
stage1 image every time for every pod.
After this change, users need to preload the stage1 images in order to
enable rkt to find the stage1 image with the name specified by this flag.
Automatic merge from submit-queue
make GCI image detection robust
This change makes sure that in case we roll back a released GCI image, the image detection logic picks a correct active image.
@kubernetes/goog-image @Amey-D @wonderfly @dchen1107
Automatic merge from submit-queue
Prep for continuous Docker validation test
```release-note
Add a test config variable to specify desired Docker version to run on GCI.
```
We want to continuously validate Docker releases (#25215), on GCI. This change
adds a new test config variable, `KUBE_GCI_DOCKER_VERSION`, through which we can
specify which version of Docker we want to run on the master and nodes. This
change also patches the Jenkins e2e-runner with the ability to fetch the latest
Docker (pre)release, and sets the aforementioned variable accordingly.
Tested on my local Jenkins instance that was able to start a cluster with the latest Docker version (different from installed version) running on both master and nodes.
@dchen1107 Can you review?
cc/ @andyzheng0831 for changes in `cluster/gce/gci/helper.sh`, and @ixdy @spxtr for changes to the Jenkins e2e-runner
cc/ @kubernetes/goog-image
Automatic merge from submit-queue
Revert "Revert "GCI: add support for network plugin""
PR #27027 added the network plugin support in GCI config, but later a bug in the network plugin broke e2e tests (see issue #27118). The bug was fixed by #27141 and we have been repeatedly run the serial e2e tests more than 10 times to verify the fix. Now it should be safe to put the GCI network plugin support back.
We will first merge in the master branch and monitor the Jenkins serial tests for a while and then cherry-pick it into release-1.3 branch.
Automatic merge from submit-queue
version bump for gci to milestone 53
Fixes#26455
GCI release 53 includes kubernetes v1.3.0-alpha.5 with docker-1.11.2.
@dchen1107 @kubernetes/goog-image @andyzheng0831
Automatic merge from submit-queue
support for mounting local-ssds on GCI
This change adds support for mounting local ssds on GCI.
It updates the previous container-vm behavior as well to
match that for GCI nodes by mounting the local-ssds under
the same path (/mnt/disks/ssdN).
@vulpecula @roberthbailey @andyzheng0831 @kubernetes/goog-image
Automatic merge from submit-queue
Trusty: fix the 'ping' issue and fluentd-gcp issue #26379
This PR is mainly for being picking up the fix in #27016 and #27102 in trusty code, so that we can fix the issues in the release-1.2 branch for GCI. It contains two parts:
(1) Adding iptables rules to accept ICMP traffic, otherwise 'ping' from a pod does not work;
(2) Revising the code for cleaning up docker0 stuff including the bridge and iptables rules. I slightly refactor the code of starting kubelet and removing docker0 stuff before starting kubelet. The old code did it after starting kubelet but before restarting docker. I think doing it before starting kubelet is safter.
cc/ @roberthbailey @fabioy @dchen1107 @a-robinson @kubernetes/goog-image
Automatic merge from submit-queue
cluster/gce/coreos: Update heapster apiVersion
This fixes an inadvertant search-replace error in #26617.
The error was missed then because the search-replace issue wasn't
present in the standalone controllers, but was in all the others.
I verified that with this change heapster comes up under the default influxdb monitoring and without this change addon manager spits out validation failure errors for the heapster yaml.
cc @yifan-gu
This change adds support for mounting local ssds on GCI.
It updates the previous container-vm behavior as well to
match that for GCI nodes by mounting the local-ssds under
the same path (/mnt/disks/ssdN).
This fixes an inadvertant search-replace error in #26617.
The error was missed then because the search-replace issue wasn't
present in the standalone controllers, but was in all the others.
Automatic merge from submit-queue
GCI: fix the issue #26379
This PR deletes docker0 explicitly to fix the issue. In some cases, coexistence of docker0 and cbr0 make troubles in GCI-based cluster instances.
I verified it in GKE. With the fix, fluentd-gcp pod shows no error. "curl google.com" can work inside a pod. Mark it as P0 to match the issue priority.
@a-robinson @roberthbailey @freehan @kubernetes/goog-image
Automatic merge from submit-queue
Enable support for memory eviction configuration via salt
Added evictions based on memory by default whenever the available memory is < 100Mi.
Updated GCE and GCI.
Automatic merge from submit-queue
Bump cluster autoscaler version and enable scale down by default
Follow up of https://github.com/kubernetes/contrib/pull/1148.
cc: @piosz @fgrzadkowski @jszczepkowski
Automatic merge from submit-queue
Re-enable node problem detector by default
Re-enable node problem detector started in gce cluster by default.
For now, in the master node, the node problem detector will be started and do nothing (see https://github.com/kubernetes/node-problem-detector/pull/13).
But in fact, in my test cluster, the master has no extra cpu to run the node problem detector, so node problem detector is started on all nodes except master, which is what we want but not expected...
@dchen1107
/cc @kubernetes/sig-node
/cc @andyzheng0831 for the gci script change.
[]()
Automatic merge from submit-queue
Don't run fluentd-es on GCI masters
It isn't run on containervm masters. It can't do anything on the master because the master doesn't have kube-proxy running to enable fluentd to talk to the elasticsearch service.
@andyzheng0831
We want to continuously validate Docker releases (#25215), on GCI. This change
adds a new test config variable, `KUBE_GCI_DOCKER_VERSION`, through which we can
specify which version of Docker we want to run on the master and nodes. This
change also patches the Jenkins e2e-runner with the ability to fetch the latest
Docker (pre)release, and sets the aforementioned variable accordingly.
Automatic merge from submit-queue
GCI/Trusty: support the Docker registry mirror
@roberthbailey @zmerlynn please review it.
cc/ @fabioy @dchen1107 @kubernetes/goog-image FYI.
cc/ @ojarjur it is very straightforward to add support for GCI, which is pretty much like the change to ContainerVM's configure-vm.sh in your original PR #25841.
Automatic merge from submit-queue
GCI: correct the fix in #26363
This PR is mainly for correcting the fix to 'find' command in #26363. I added "-maxdepth 1" in an earlier change, and #26363 tried to fix it by changing the search path. This is potentially incorrect, when yaml files are in more than one layer deep. The real fix should be removing the "-maxdepth 1" flag from 'find' command. This PR also updates two minor places in the file configure-helper.sh introduced by two previous PR #26413 and #26048.
@roberthbailey @wonderfly
cc/ @dchen1107 @fabioy @kubernetes/goog-image
Automatic merge from submit-queue
pin GCI version to milestone 52
This is mainly for pinning the 1.2 branch to GCI milestone 52
which contains correct docker and kubelet built in.
Doing this allows us to upgrade docker to v1.11 (issue #26455)
in GCI 53 without breaking the 1.2 release branch.
@kubernetes/goog-image @dchen1107 @roberthbailey @andyzheng0831
This is mainly for pinning the 1.2 branch to GCI milestone 52
which contains correct docker and kubelet built in.
Doing this allows us to upgrade docker to v1.11 (issue #26455)
in GCI 53 without breaking the 1.2 release branch.
Automatic merge from submit-queue
Move the defaults setting of GCI to util.sh
fixes#26291
This change recovers some of the side effects of
https://github.com/kubernetes/kubernetes/pull/26197, i.e., keeps the defaults of
`NODE_IMAGE` and `NODE_IMAGE_PROJECT` to `MASTER_IMAGE` and
`MASTER_IMAGE_PROJECT`, for backward compatibility. Although it keeps
`OS_DISTRIBUTION` defaulting to `gci`, the default settings of these vars are
moved to `cluster/gce/util.sh` and conditioned on `OS_DISTRIBUTION==gci`.
@euank @roberthbailey Can you review?
Automatic merge from submit-queue
cluster/coreos: Update heapster addon to beta2
fixes#26616
As noted there, heapster was updated but not for gce/coreos which breaks anything that depends on heapster's new metrics API (i.e. autoscaling)
This change recovers some of the side effects of
https://github.com/kubernetes/kubernetes/pull/26197, i.e., keeps the defaults of
`NODE_IMAGE` and `NODE_IMAGE_PROJECT` to `MASTER_IMAGE` and
`MASTER_IMAGE_PROJECT`, for backward compatibility. Although it keeps
`OS_DISTRIBUTION` defaulting to `gci`, the default settings of these vars are
moved to `cluster/gce/util.sh` and conditioned on `OS_DISTRIBUTION==gci`.
Automatic merge from submit-queue
Support for cluster autoscaler in GCE Trusty and GCI images
Fixes: #26346
Ref: #26197
cc: @fgrzadkowski @vulpecula @piosz @jszczepkowski
Automatic merge from submit-queue
Prepull images in e2e
Quick and dirty image puller because the SQ stalled multiple times just *today* on image pull flake (https://github.com/kubernetes/kubernetes/issues/25277).
@kubernetes/sig-node @kubernetes/sig-testing wdyt?
Automatic merge from submit-queue
Make node-instance-group base names unique to prevent collisions
We create multiple IGMs for >1000 Node clusters. When we have a conflict on base name IGMs will fight over ownership of the VM that happen to have the name belonging to multiple IGMs.
This change will increase reliability of starting big clusters.
cc @wojtek-t @alex-mohr @roberthbailey @mikedanese
Automatic merge from submit-queue
Add node problem detector as an addon pod.
```release-note
Introduce a new add-on pod NodeProblemDetector.
NodeProblemDetector is a DaemonSet running on each node, monitoring node health and reporting
node problems as NodeCondition and Event. Currently it already supports kernel log monitoring, and
will support more problem detection in the future. It is enabled by default on gce now.
```
This PR enables NodeProblemDetector as an add-on pod.
/cc @mikedanese @kubernetes/sig-node
[]()
Automatic merge from submit-queue
Configuration for GCP webhook authentication and authorization
This PR adds configuration for GCP webhook authentication and authorization in ContainerVM and GCI. The change of configure-vm.sh and kube-apiserver.manifest is directly copied from @cjcullen's PR #25380 and #25296. The change in GCI script configure-helper.sh includes the support for webhook authentication and authorization, and also some code refactor to improve readability.
@cjcullen @roberthbailey @zmerlynn please review it. The original PRs are P1, please mark this as P1.
cc/ @fabioy @kubernetes/goog-image FYI.
I verified it by running e2e tests on GCI cluster. Without the GCI side change, cluster creation fails as being capture by GKE Jenkins tests. I don't test when the two env GCP_AUTHN_URL and GCP_AUTHZ_URL are set, because they are only set in GKE. After this PR is merged, @cjcullen will test in GKE.
Automatic merge from submit-queue
cluster/gce/coreos: Set service-cluster-ip-range
Broken by #19242
See also #26002
This is necessary to kube-up for me, but depending on how #26002 plays out, this PR might not be necessary. Happy to close this or merge or whatever depending on what's best.
cc @yifan-gu @sjpotter @mikedanese
Automatic merge from submit-queue
GCI: Fix the condition for using the default image
This PR revises the condition for using the default GCI image. The old logic is not convenient for manually run e2e tests in some cases (mainly for GCI team to test custom images). The new logic by this PR is very similar to the logic in using ContainerVM. When setting distro to "gci", if master or node image is unset, we use gci-dev for it. If either is set, we respect it.
@roberthbailey @zmerlynn @dchen1107 please review it, and we should cherry pick it in release-1.2 branch. Thanks!
cc/ @kubernetes/goog-image @adityakali FYI
Automatic merge from submit-queue
GCI/Trusty: Fix an issue in using 'find' commands
This PR makes the logic of 'find' command consistent with the 'cp' command afterwards, i.e., only check one layer of a given dir. Without this fix, we have seen a recent breakage after PR #25309 added the file cluster/addons/fluentd-elasticsearch/es-image/template-k8s-logstash.json. The 'find' command discovers this json file, but the 'cp' command fails.
@roberthbailey @dchen1107 @zmerlynn please review this fix, and mark it as a cherry pick candidate. I already verified this fix can resolve the breakage.
cc/ @wonderfly @fabioy @kubernetes/goog-image FYI
Automatic merge from submit-queue
GCI: Enable the log of upstart jobs
This PR enables the log of upstart jobs in master.yaml and node.yaml. By default, log of upstart jobs are enabled in Trusty and placed in /var/log/upstart, but not enabled in GCI. This change explicitly directs the log to the system logger. For trusty, they are in /var/log/syslog file. In GCI, we can check it using "journalctl". This change will be useful for debugging if cluster initialization fails.
@roberthbailey @maisem @dchen1107 please review it. This will be useful for issues like #23634. We should also cherry pick it in release-1.2
cc/ @fabioy @zmerlynn @wonderfly FYI.
Automatic merge from submit-queue
Salt configuration for the new Cluster Autoscaler for GCE
Adds support for cloud autoscaler from contrib/cloud-autoscaler in kube-up.sh GCE script.
cc: @fgrzadkowski @piosz
Automatic merge from submit-queue
Use --format='value(name)' with gcloud instead of grep/awk/cut
Fixing our fragile parsing of `gcloud` is getting old (#24746, #25159, maybe others?).
Instead, let's just get the proper output out of `gcloud` in the first place.
Automatic merge from submit-queue
Change default clusterCIDRs from /16 to /14 in GCE configs allowing 1000 Node clusters by default.
cc @thockin @roberthbailey @wojtek-t @zmerlynn @davidopp
Automatic merge from submit-queue
GCI: Add two GCI specific metadata pairs
This PR adds two GCI specific metadata pairs when using GCI image.
(1) "gci-update-strategy": by default the GCI in-place updater is enabled. It means that when a new image is released, the instance on the old image will be upgraded to the new image. In this change, we turn it off;
(2) "gci-ensure-gke-docker": GCI is built with two versions of docker. When this metadata is set to "true", the version satisfying kubernetes qualification will be used. Setting this metadata prevents from using incorrect docker version.
Automatic merge from submit-queue
Fix detect-node-names to not error out if there are no nodes
Fixes#21564.
Teardown was not working correctly in rare cases because `detect-node-names` was failing before any of the actual cleanup was run. I'm pretty sure the issue was that there was an instance group, but no instances in the instance group, so we bailed out when we tried to expand the bash array.
This PR adds a guard so we don't bail if the array is empty.
cc @jlowdermilk @spxtr
Automatic merge from submit-queue
Add support for running clusters on GCI
Google Container-VM Image (GCI) is the next revision of Container-VM. See documentation at https://cloud.google.com/compute/docs/containers/vm-image/. This change adds support for starting a Kubernetes cluster using GCI.
With this change, users can start a kubernetes cluster using the latest kubelet and kubectl release binary built in the GCI image by running:
$ KUBE_OS_DISTRIBUTION="gci" cluster/kube-up.sh
Or run a testing cluster on GCI by running:
$ KUBE_OS_DISTRIBUTION="gci" go run hack/e2e.go -v --up
The commands above will choose the latest GCI image by default.
Automatic merge from submit-queue
Switch to ABAC authorization from AllowAll
Switch from AllowAll to ABAC. All existing identities (that are created by deployment scripts) are given full permissions through ABAC. Manually created identities will need policies added to the `policy.jsonl` file on the master.
Automatic merge from submit-queue
don't source the kube-env in addon-manager
This was added in 2feb658ed7 which became unused after #23603 but wasn't removed
Automatic merge from submit-queue
Trusty: Add debug supports for docker and kubelet
This PR adds debug support in two aspects: (1) For a test cluster, docker command will have "--debug" flag. Recently we noticed that this is very helpful in debug e2e test failures; (2) The kubelet command line will be put in /etc/default/kubelet. If a developer wants to test kubelet flags without recreating a cluster, she/he only needs to revise this file and then run "initctl restart kubelet". In addition, this PR fixes a couple of small things like comments and alignment.
Test result:
(1) Manually verified changing /etc/default/kubelet and run "initctl restart kubelet";
(2) Verified docker command line flag "--debug";
(3) e2e on pure trusty cluster and hybrid cluster all passed.
@roberthbailey @dchen1107 @zmerlynn please review it.
cc/ @yujuhong @fabioy @wonderfly FYI.
Automatic merge from submit-queue
Trusty: Add retry in curl commands
This fix is for improving robustness in fetch critical metadata files when the metadata server is temporarily unreachable.
@roberthbailey @zmerlynn @dchen1107 please review it.
cc/ @fabioy @wonderfly FYI.
Automatic merge from submit-queue
jenkins: Allow configuration of release bucket
This allows others to leverage the existing E2E code to test some
patched kube binary by simply overriding the bucket and reusing many of
the existing scripts
Automatic merge from submit-queue
Trusty: Handle the new var in kube-proxy manifest
This is to capture the kube-proxy manifest change in PR #24429.
@roberthbailey @fabioy @zmerlynn please review this change and mark it as cherry pick candidate. We need to catch up 1.2.3 release.
cc/ @dchen1107 @wonderfly @cjcullen FYI.
I have verified this fix. Without this fix, kube-proxy pod in Trusty nodes cannot be started correctly, i.e., the command line has an unhadled variable. And some other kube-system pods do not work correctly as kube-proxy is not working well. After applying this fix, kube-proxy can be started correctly, and all kube-system pods run successfully.
Automatic merge from submit-queue
Strip comments from configure-vm.sh for gce
We are getting very close to the 32KiB limit on GCE metadata entry length. We used to strip comments before putting the value in metadata, but I think we removed it in a refactor because it wasn't absolutely necessary, and leaving it out made the scripts slightly cleaner. It's close to being necessary again.
Removing comments reduces the size from 31,609B to 27,221B: https://www.diffchecker.com/0xmmecvw.
This allows others to leverage the existing E2E code to test some
patched kube binary by simply overriding the bucket and reusing many of
the existing scripts
Automatic merge from submit-queue
Trusty: Fixes for running GKE master
This PR includes two fixes for running GKE master on our image:
(1) The kubelet command line assembly had a missing part for cbr0. We did not catch it because the code path is not covered by OSS k8s tests;
(2) Remove the "" from the variables in the cert files. It causes a parsing issue in GKE. Again, this code path is not covered by k8s tests.
This PR also refactors the code for assembling kubelet flag. I move all logic into a single function assemble_kubelet_flags in configure-helper.sh for better readability and also simplify node.yaml and master.yaml.
@roberthbailey @dchen1107 please review it, and mark it as cherrypick-candidate. This PR is verified by @maisem. Together with his CL for GKE, we can run GKE cluster with master on our image and nodes on ContainerVM.
cc/ @maisem @fabioy @wonderfly FYI
This only applies to gce kube-up. 60 seconds of open connection should
be sufficient for anything that we should be downloading. The release
tar is currently 255M.
Automatic merge from submit-queue
Trusty: Regional release .tar.gz support
@zmerlynn and @roberthbailey please review it. This change is to support the feature added in PR #22234. The entire logic is pretty much the same as in #22234, with only few minor changes in implementation.
I had manually run e2e tests with "export RELEASE_REGION_FALLBACK=true" on two clusters: (1) Trusty on master nodes on ContainerVM; (2) Master and nodes all on trusty. All tests are green. I don't figure out a way to simulate regional fallback. But I did test the function download_or_bust() out-of-box.
cc/ @wonderfly @dchen1107 @fabioy FYI.
This should allow allow the non_masquerade_cidr option to get configured
in /etc/salt/minion.d/grains.conf, allowing the flag to used by kubelet
in /etc/sysconfig/kubelet. Default configuration is set in pillar
Allow the gcr.io/google_containers registry to be overridden
regionally by just blasting a new KUBE_ADDON_REGISTRY out. Instead of
adding every addon to Salt and asking all of the other consumers
(Trusty, Juju, Mesos, etc) to change, just script the sed ourselves.
This is probably the 9th grossest thing I've ever done, but it works
well, and it works quickly. I kind of wish it didn't.
This change revises the way to provide kube-system manifests for clusters on Trusty. Originally, we maintained copies of some manifests under cluster/gce/trusty/kube-manifests, which is not scalable and hard to maintain. With this change, clusters on Trusty will use the same source of manifests as ContainerVM. This change also fixes some minor problems such as shell variables and comments to meet the style guidance better.
The kubelet flag "nosystem" was removed recently, which breaks kubelet in Trusty. This changes remove the flag usage accordingly. It also revises several aspects of Trusty support to make it in the same page as running on ContainerVM, such as new flags in kubelet and new logic in api-server and etcd pods.
PR #22022 added a new variable "cpurequest" in kube-proxy.manifest. This makes kubelet in Trusty fail to start the kube-proxy pod as this variable value is not set.
* In kube-up.sh, create a staging bucket with a location nearest the
zone being created. If new variable RELEASE_REGION_FALLBACK is set
(default false), create multiple buckets and stage to fallback
URLs. (In open source, this path is primarily for testing.)
* In configure-vm.sh, split the URL env variables by comma (if any
extra are present) and retry on the fallback URLs. Also factor the
hash checking into this path rather than outside, since a corrupt
release in a particular geo can be retried in a different geo.
* Remove the local already-staged .tar.gz checks. They've caused
several issues along the way, and with this code path become virtually
unmaintainable. (I could add a sentinel for each bucket it's possibly
staged to, but ew.)
We run unattened-upgrades manually, and then reboot automatically if we
find /var/run/reboot-required; then we check if any services need
restarting and restart them automatically using the needrestart tool.
This should mean we don't _have_ to build new images on every security
update, though we can do so to avoid a reboot.
Issue #21382
This change corrects how we determine the log level. Moreover, it explicitly redirects kubelet log to /var/log/kubelet.log, as we noticed it may miss sometimes.
This change moves the code of running and monitoring addon pods in a daemon type upstart job, so that addon manifest monitoring can be restarted automatically upon failure. Second, it updates the usage of "kube-ui" to "dashboard" to match the change in PR #20330.
This change support running kubernetes master on Ubuntu Trusty.
It uses pure cloud-config and shell scripts, and completely gets
rid of saltstack or the release salt tarball.
We adapt the existing code to work across all zones in a region.
We require a feature-flag to enable Ubernetes-Lite
Reasons:
* There are some behavioural changes if users create volumes with
the same name in two zones.
* We don't want to make one API call per zone if we're not running
Ubernetes-Lite.
* Ubernetes-Lite is still experimental.
There isn't a parallel flag implemented for AWS, because at the moment
there would be no behaviour changes from this.
This is for internal use at the moment, for testing Ubernetes Lite, but
arguably makes the code a little cleaner.
Also rename KUBE_SHARE_MASTER -> KUBE_USE_EXISTING_MASTER
Some functionality in hack/lib is currently depended on by
cluster/common.sh so kube-up from the full release tar (which
does not include hack/) is currently broken. With this PR we
create cluster/lib/ and move the necessary bits from hack/
over to get kube-up working again.
Fixes: 96d1b8d1b2
Signed-off-by: Mike Danese <mikedanese@google.com>
Implement a flag that defines the frequency at which a node's out of
disk condition can change its status. Use this flag to suspend out of
disk status changes in the time period specified by the flag, after
the status is changed once.
Set the flag to 0 in e2e tests so that we can predictably test out of
disk node condition.
Also, use util.Clock interface for all time related functionality in
the kubelet. Calling time functions in unversioned package or time
package such as unversioned.Now() or time.Now() makes it really hard
to test such code. It also makes the tests flaky and sometimes
unnecessarily slow due to time.Sleep() calls used to simulate the
time elapsed. So use util.Clock interface instead which can be faked
in the tests.
This change is to pick up the fix in PR #18178. It avoids confusing
cadvisor when systemd is present in an instance but does not act
as the init system.
This allows resource usage monitoring test to launch 100 test pods per node, in
addition to the add-on pods.
Also reduce the test time length since the results over the shorter period are
representative enough.
This change refactors the code of preparing kube-system manifests
for trusty based cluster. The manifests used by nodes do not contain
salt configuration, so we can simply copy them from the directory
cluster/saltbase/salt, make a tarball, and upload to Google Storage.
The node.yaml has some logic that will be also used by the kubernetes
master on trusty work (issue #16702). This change moves the code
shared by the master and node configuration to a separate script, and
the master and node configuration can source it to use the code.
Moreover, this change stages the script for GKE use.
Addresses #15968
This patch removes KUBE_ENABLE_EXPERIMENTAL_API and similar calls in
favor of specifying desired features in KUBE_RUNTIME_CONFIG. Changes
have also been made to e2e scripts to re-enable using
KUBE_RUNTIME_CONFIG rather than EXPERIMENTAL_API env vars.
This also introduces KUBE_ENABLE_DAEMONSETS and KUBE_ENABLE_DEPLOYMENTS.
Signed-off-by: Christian Stewart <christian@paral.in>
If the `hostname` commands used in the polling loop fail, their stdout
is going to be empty and so `getent hosts` command will actually
succeed. For the loop to work as expected, make sure the subcommands
return a string which is an invalid host name.
Allows loading existing auth from kubeconfig on kube-up if a
valid KUBE_CONTEXT is specified, instead of always force
regenerating auth (basic or token) when creating a new cluster.
When KUBE_E2E_STORAGE_TEST_ENVIRONMENT is set to 'true', kube-up.sh script
will:
- Install the right packages for all storage volumes.
- Use devicemapper as docker storage backend. 'aufs', the default one on
Debian, does not support extended attibutes required by Ceph RBD and Gluster
server containers.
Tested on GCE and Vagrant, e2e tests for storage volumes passes without any
additional configuration.