Automatic merge from submit-queue
Update GCI_VERSION to gci-dev-55-8866-0-0
Update GCI base image:
Change log:
* Built-in kubernetes updated to v1.4.0
* Enabled VXLAN and IP_SET config options in kernel to support some networking tools
* OpenSSL CVE fixes
```release-note
Update GCI base image:
* Enabled VXLAN and IP_SET config options in kernel to support some networking tools (ebtools)
* OpenSSL CVE fixes
```
cc/ @kubernetes/goog-image cc/ @dchen1107
Automatic merge from submit-queue
Nodefs becomes imagefs on GCI
Kubelet cannot identify rootfs correctly
For #33444
```release-note
Enforce Disk based pod eviction with GCI base image in Kubelet
```
Signed-off-by: Vishnu kannan <vishnuk@google.com>
Automatic merge from submit-queue
Speed up dockerized builds
This PR speeds up dockerized builds. First, we make sure that we are as incremental as possible. The bigger change is that now we use rsync to move sources into the container and get data back out.
To do yet:
* [x] Add a random password to rsync. This is 128bit MD4, but it is better than nothing.
* [x] Lock down rsync to only come from the host.
* [x] Deal with remote docker engines -- this should be necessary for docker-machine on the mac.
* [x] Allow users to specify the port for the rsync daemon. Perhaps randomize this or let docker pick an ephemeral port and detect the port?
* [x] Copy back generated files so that users can check them in. This is done for `zz_generated.*` files generated by `make generated_files`
* [x] This should include generated proto files so that we can remove the hack-o-rama that is `hack/hack/update-*-dockerized.sh`
* [x] Start "versioning" the build container and the data container so that the CI system doesn't have to be manually kicked.
* [x] Get some benchmarks to qualify how much faster.
This replaces #28518 and is related to #30600.
cc @thockin @spxtr @david-mcmahon @MHBauer
Benchmarks by running `make clean ; sync ; time bash -xc 'time build/make-build-image.sh ; time sync ; time build/run.sh make ; time sync; time build/run.sh make'` on a GCE n1-standard-8 with PD-SSD.
| setup | build image | sync | first build | sync | second build | total |
|-------|-------------|----- |----------|------|--------------|------|
| baseline | 0m11.420s | 0m0.812s | 7m2.353s | 0m42.380s | 7m8.381s | 15m5.348s |
| this pr | 0m10.977s | 0m15.168s | 7m31.096s | 1m55.692s | 0m16.514s | 10m9.449s |
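To make the table easier to read, the timings can be turned into ratios. The following is only a sketch: the helper names `to_seconds` and `speedup` are made up for illustration, and the figures are copied from the table above.

```shell
#!/usr/bin/env bash
# Convert a `time`-style duration such as "7m8.381s" into seconds.
to_seconds() {
  local value="$1"
  local minutes="${value%%m*}"   # text before the first "m"
  local rest="${value#*m}"       # text after the first "m", e.g. "8.381s"
  local seconds="${rest%s}"      # drop the trailing "s"
  awk -v m="${minutes}" -v s="${seconds}" 'BEGIN { printf "%.3f\n", m * 60 + s }'
}

# Ratio of two durations (first / second), rounded to one decimal place.
speedup() {
  awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" 'BEGIN { printf "%.1f\n", a / b }'
}

# Figures copied from the benchmark table.
echo "second build: $(speedup 7m8.381s 0m16.514s)x faster"
echo "total:        $(speedup 15m5.348s 10m9.449s)x faster"
```

The first build and the sync steps are slower with this PR, so the gain comes from the incremental second build.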
Automatic merge from submit-queue
Add support for vSphere cloud provider in kubeup
**What this PR does / why we need it**:
vSphere cloud provider added in 1.3 was not configured when deploying via kubeup
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
Add support for vSphere Cloud Provider when deploying via kubeup on vSphere.
```
When deploying on vSphere using kube up add configuration
for vSphere cloud provider.
Automatic merge from submit-queue
Bump glbc version to 0.8.0
Picks up k8s.io godeps for v1.4, thereby fixing an int overflow bug in the upstream delayed-workqueue pkg. Without this, the controller spams logs with retries in the "soft error" case, which is easy to come by when users, e.g., create ingresses that point to nonexistent services.
Should go into 1.4.1, because 1.4.0 is pretty much out at this point.
https://github.com/kubernetes/kubernetes/issues/33279
Automatic merge from submit-queue
Bump up addon kube-dns to v20 for graceful termination
Below images are built and pushed:
- gcr.io/google_containers/kubedns-amd64:1.8
- gcr.io/google_containers/kubedns-arm:1.8
- gcr.io/google_containers/kubedns-arm64:1.8
- gcr.io/google_containers/kubedns-ppc64le:1.8
Both kubedns and dnsmasq are bumped up in the manifest files.
@thockin @bprashanth
Automatic merge from submit-queue
cluster/gci: Minor spacing tweak
Two shall be the number thou shalt indent, and the level of the indent
shall be two. Three shalt thou not indent, neither indent thou once,
excepting that thou then proceed to two. Five is right out.
/cc @andyzheng0831 @jlowdermilk
This bug was inadvertently introduced in #32406.
The longer term plan (shouldn't be too much longer) is to remove this
file entirely and rely on the `gci-trusty` version of it, but to stop
some bleeding and allow our jenkins using kube-up + coreos to work, we
should merge this fix until we have the more complete solution.
Automatic merge from submit-queue
Allow building experimental etcd images
Ref #20504
Once this PR is in, I would like to build and push: "etcd:3.0.10-experimental" image to:
- start testing it
- to make it possible to build a different "3.0.10" image in the future (we will most probably build some loging into it).
@lavalamp - FYI
Automatic merge from submit-queue
Tune down initialDelaySeconds for readinessProbe.
Fixed #33053.
Tuned down `initialDelaySeconds` (originally 30s) for the readiness probe to 3 seconds and `periodSeconds` (default 10s) to 5 seconds, to shorten the initial time before a DNS server pod is exposed. This configuration passed the DNS e2e tests and did not hit any readiness failure (for kube-dns) on a 4-node GCE cluster during the experiments.
When scaling out kube-dns servers, it took less than 10s for servers to be exposed after they appeared as running, which is much faster than the original 30+s.
`failureThreshold` is left at its default (3). This would not lead to restarts, because the readiness probe status only affects whether the endpoints are exposed in the service (from the DNS service's point of view). According to the implementation of the [prober](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/prober/worker.go), the number of retries for a readiness probe is unbounded. Hence there is no obvious effect if the readiness probe fails several times in the beginning.
[Figure: state machine of the prober]
I want to see the e2e result of this PR for further evaluation.
@thockin @bprashanth
Automatic merge from submit-queue
Print a more helpful error message when failing to start rolling-updates
Hopefully this will help us track down where the 1.3 -> 1.4 upgrades are breaking down. We'll need to cherry-pick this into release-1.4 to have any effect, though.
Automatic merge from submit-queue
Split dns healthcheck into two different urls
Attempt to fix #30633.
This new kube-dns pod template creates two exechealthz processes that listen on two different ports, for kubedns and dnsmasq respectively.
@thockin @girishkalele
Automatic merge from submit-queue
Alpha JWS Discovery API for locating an apiserver securely
This PR contains an early alpha prototype of the JWS discovery API outlined in proposal #30707.
CA certificate, API endpoints, and the token to be used to authenticate to this discovery API are currently passed in as secrets. If the caller provides a valid token ID, a JWS signed blob of ClusterInfo containing the API endpoints and the CA cert to use will be returned to the caller. This is used by the alpha kubeadm to allow seamless, very quick cluster setup with simple commands well suited for copy paste.
Current TODO list:
- [x] Allow the use of arbitrary strings as token ID/token, we're currently treating them as raw keys.
- [x] Integrate the building of the pod container, move to cluster/images/kube-discovery.
- [x] Build for: amd64, arm, arm64 and ppc64le. (just replace GOARCH=)
- [x] Rename to gcr.io/google_containers/kube-discovery-ARCH:1.0
- [x] Cleanup rogue files in discovery sub-dir.
- [x] Move pkg/discovery/ to cmd/discovery/app.
There is additional pending work to return a kubeconfig rather than ClusterInfo, however I believe this is slated for post-alpha.
Automatic merge from submit-queue
Reset core_pattern on GCI
The default core_pattern pipes the core dumps to /sbin/crash_reporter
which is more restrictive in saving crash dumps. So for
now, set a generic core_pattern that users can work with.
@dchen1107 @aulanov can you please review?
cc/ @kubernetes/goog-image
Automatic merge from submit-queue
Update the containervm image to the latest one (container-v1-3-v20160…
Node e2e is running with an old containervm image that only has Docker 1.9.1. This PR fixes that.
Automatic merge from submit-queue
(GCI) Configure logrotate to rotate all .log files in /var/log.
Fixes logrotate configuration in GCI to rotate all "*.log" files in /var/log.
Fixes issue #33223.
Automatic merge from submit-queue
Setting the default image for GKE tests to Container_VM.
@vishh @spxtr @pwittrock
The purpose is to keep the current state of tests as is even if GKE changes the base image.
Automatic merge from submit-queue
Bump up GCI version.
```release-note
Upgrading Container-VM base image for k8s on GCE. Brief changelog as follows:
- Fixed performance regression in veth device driver
- Docker and related binaries are statically linked
- Fixed the issue of systemd being oom-killable
```
Fixes #32596
This needs a cherrypick into v1.4 release branch because it is fixing v1.4 release blocking issues. This patch is easy and safe to rollback in case of emergencies.
@vishh can you please review?
Fixes #32596 and many other issues.
cc/ @kubernetes/goog-image FYI
Brief changelog compared to gci-dev-54-8743-3-0:
- Fixed performance regression in veth device driver
- Docker and related binaries are statically linked
- Fixed the issue of systemd being oom-killable
- Updated built-in kubelet version to 1.3.7
- Added ethtool and ebtables binaries expected by kubelet
Fixes #32596
Automatic merge from submit-queue
Enable hostpath provisioner for vagrant environment
This flag is required to run e2e tests for certain features (petset), and for manual tests and debugging.
related: https://github.com/kubernetes/kubernetes/issues/32119
Automatic merge from submit-queue
Implemented KUBE_DELETE_NODES flag in kube-down.
Implemented KUBE_DELETE_NODES flag in kube-down script.
It prevents removal of nodes when shutting down a HA master replica.
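A minimal sketch of how such a flag could gate the teardown. The two helper functions are hypothetical stand-ins that only print, and the assumption that nodes are deleted unless `KUBE_DELETE_NODES` is set to `false` is mine, not taken from the script.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the real teardown steps; they only print.
delete_master_replica() { echo "deleting master replica"; }
delete_nodes() { echo "deleting nodes"; }

# Sketch of a kube-down style teardown. Assumption: nodes are removed unless
# KUBE_DELETE_NODES is explicitly set to "false".
kube_down_sketch() {
  delete_master_replica
  if [ "${KUBE_DELETE_NODES:-true}" != "false" ]; then
    delete_nodes
  fi
}

# Remove one master replica but keep the nodes.
KUBE_DELETE_NODES=false kube_down_sketch
```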
Automatic merge from submit-queue
Use a patched golang version for building linux/arm
Fixes: #29904
Right now, linux/arm is broken because of an internal limitation in Go.
I've filed an issue for it here: https://github.com/golang/go/issues/17028
The affected binaries of this limitation are hyperkube and kube-apiserver, which are the largest binaries.
Now that we have a patched go 1.7.1 version for building "unsupported" but important architectures (ref: https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/multi-platform.md), we should also include the patch for ppc64le and start building ppc64le again.
As soon as @laboger has the patch I need up on GitHub, I'll add ppc64le to this PR and we'll merge it.
TODO:
- [ ] ~~Update the PR with patches for ppc64le at the same time @luxas~~
- [x] Push the new kube-cross image @ixdy
- [x] Run a full `make release` before to verify nothing breaks @luxas + @ixdy
- [ ] Cherrypick into the 1.4 branch @luxas + (who?)
@lavalamp @smarterclayton @ixdy @rsc @davecheney @wojtek-t @jfrazelle @bradfitz @david-mcmahon @pwittrock
Tell systemd to keep trying to restart kubelet without limit. Without
this change at some stage systemd will stop trying to restart kubelet
and mark it failed.
These are the settings we're using elsewhere (e.g. Docker)
Automatic merge from submit-queue
Added --log-facility flag to enhance dnsmasq logging
Fix #31010.
Dnsmasq in the kube-dns pod logs with its default settings, which makes the logs hard to locate. Add the --log-facility=- flag to redirect logs to stderr.
@girishkalele
Automatic merge from submit-queue
Add glusterfs-client in hyperkube image.
When we run Kubernetes in a docker container, the glusterfs volume doesn't work.
This PR adds the glusterfs-client package to the hyperkube image to fix the bug.
It is required to run automated tests for certain features (petset),
and for manual tests and debugging.
Change-Id: I9203aab6d67c8ff0cc4574473e8d0af888fe1804
Automatic merge from submit-queue
etcd: data rollback tool of v3 -> v2
ref: https://github.com/kubernetes/features/issues/44
ref #20504
What?
This provides a tool for users to roll back etcd data from v3 to v2.
Automatic merge from submit-queue
Add flag to set CNI bin dir, and use it on gci nodes
**What this PR does / why we need it**:
When using `kube-up` on GCE, following #31023 which moved the workers from debian to gci, CNI just isn't working. The root cause is basically as discussed in #28563: one flag (`--network-plugin-dir`) means two different things, and the `configure-helper` script uses it for the wrong purpose.
This PR adds a new flag `--cni-bin-dir`, then uses it to configure CNI as desired.
As discussed at #28563, I have also added a flag `--cni-conf-dir` so users can be explicit.
**Which issue this PR fixes** : fixes #28563
**Special notes for your reviewer**:
I left the old flag largely alone for backwards-compatibility, with the exception that I stop setting the default when CNI is in use. The value of `"/usr/libexec/kubernetes/kubelet-plugins/net/exec/"` is unlikely to be what is wanted there.
**Release note**:
```release-note
Added new kubelet flags `--cni-bin-dir` and `--cni-conf-dir` to specify where CNI files are located.
Fixed CNI configuration on GCI platform when using CNI.
```
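As an illustration of the two new flags, here is a sketch that assembles a kubelet command line. The function name `cni_kubelet_flags` is hypothetical, and the default directories are assumed conventional CNI locations, not values taken from this PR.

```shell
#!/usr/bin/env bash
# Build the CNI-related kubelet flags. The two default directories are
# assumed, conventional locations, not values taken from this PR.
cni_kubelet_flags() {
  local bin_dir="${1:-/opt/cni/bin}"
  local conf_dir="${2:-/etc/cni/net.d}"
  echo "--network-plugin=cni --cni-bin-dir=${bin_dir} --cni-conf-dir=${conf_dir}"
}

# Print the command line rather than starting a kubelet.
echo "kubelet $(cni_kubelet_flags)"
```

Keeping the binary and config directories as separate arguments mirrors the point of the change: one flag no longer has to mean two different things.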
Automatic merge from submit-queue
e2e/log-dump: Collect kernel log with journald
Related to #31928
The kern.log file does not exist on journald distros typically.
cc @vishh @Random-Liu
Automatic merge from submit-queue
Update container image version for downward api volume tests
Some tests were using 0.7, and some were using 0.6, so updating all to 0.7.
@kubernetes/rh-cluster-infra
Automatic merge from submit-queue
cluster/gce: Update master root disk size
As part of #29213, the hyperkube image will be deployed alongside
existing dependencies.
This ends up just running over the root disk size of 10 during
extraction.
cc @yifan-gu @aaronlevy
Automatic merge from submit-queue
Add detect-master to local provider to get e2e working
Make it possible to run some e2e tests using the local provider (./hack/local-up-cluster.sh)
This will now work for tests that don't need more than one node:
export KUBERNETES_PROVIDER=local
go run hack/e2e.go -v -test --check_node_count=false --check_version_skew=false --test_args="--ginkgo.focus=Cadvisor"
Note: without this commit, the port and ip address are wrong and require the --host option (which is inconsistent with the other providers).
Automatic merge from submit-queue
Teach create-kubeconfig() to deal with multi path KUBECONFIG
When KUBECONFIG is in the form "A:B:C" make sure each file is
created.
fixes #17778
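The idea can be sketched as follows. This is not the code from `create-kubeconfig()`; the function name `ensure_kubeconfig_files` is hypothetical, and creating missing parent directories is an assumption.

```shell
#!/usr/bin/env bash
# Create every file named in a colon-separated KUBECONFIG-style list.
# Existing files are left untouched; missing parent directories are created
# (an assumption of this sketch).
ensure_kubeconfig_files() {
  local rest="$1"
  local path
  while [ -n "${rest}" ]; do
    path="${rest%%:*}"             # first entry of the remaining list
    if [ "${rest}" = "${path}" ]; then
      rest=""                      # that was the last entry
    else
      rest="${rest#*:}"            # drop the first entry and its colon
    fi
    [ -n "${path}" ] || continue   # skip empty entries such as "A::B"
    mkdir -p "$(dirname "${path}")"
    [ -f "${path}" ] || touch "${path}"
  done
}

# Demo in a throwaway directory.
demo_dir="$(mktemp -d)"
ensure_kubeconfig_files "${demo_dir}/a/config:${demo_dir}/b/config:${demo_dir}/c"
find "${demo_dir}" -type f | sort
rm -rf "${demo_dir}"
```

Splitting with parameter expansion instead of changing `IFS` avoids accidental globbing of the path entries.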
Automatic merge from submit-queue
Use a Deployment for kube-dns
Attempt to fix #31554
Switching kube-dns from using a Replication Controller to a Deployment.
The outdated kube-dns YAML files in the coreos and juju dirs are also updated. Most of the specific memory limits in the files remain unchanged because it seems people were modifying them explicitly (c8d82fc2a9). Only the memory limit for healthz is increased, due to this pending investigation (#29688).
YAML files stay in *-rc.yaml format because a lot of scripts in the cluster and hack dirs use this format. But it may be fine to change them all.
@bprashanth @girishkalele
Automatic merge from submit-queue
Fix/centos docker download
**What this PR does / why we need it**: The CentOS cluster provider attempts to download docker from a location that 404's.
**Which issue this PR fixes**: addresses https://github.com/kubernetes/kubernetes/issues/27572#issuecomment-226690177
**Special notes for your reviewer**: I don't know how Kubernetes decides docker compatibility, but it was previously pulling `latest` so I chose the most recent release. Is there any mechanism for keeping things like this up to date?
What is the status of kubernetes rpm's? As far as I could tell there aren't any 1.3 rpm's published. Are those officially supported or a community project?
**Release note**:
```release-note
CentOS Cluster Provider: fix docker download location & use docker 1.12.0
```
Automatic merge from submit-queue
Fix etcd2 cross-build in the Makefile
fixes https://github.com/kubernetes/kubernetes/issues/32328
Make it possible to compile both etcd2 and etcd3 in the Makefile and compile attachlease for multiple arches as well.
@lavalamp The etcd build-from-source semantics changed between etcd2 and etcd3.
I updated it to etcd3 in my last PR, and didn't think we were gonna build etcd2 more.
However, I've now fixed it to build for both versions.
Thanks!
Automatic merge from submit-queue
Fix glbc name to match image version
Risk is low, we should get it into 1.4 to avoid confusion. Image is 0.7.1 (bumped in 1.3.6) so name and label should match.
Automatic merge from submit-queue
AWS: Change default networking for kube-up to kubenet
**What this PR does / why we need it**: Fixes AWS bring-up. Again.
There's a kubelet bug that prevents NETWORK_PROVIDER=none from working right now, and we should migrate AWS to `kubenet` anyways.
Working on reproing the `none` issue on GCE, then I'll file a bug on the main issue. But this fixes AWS, so quick tactical fix.
Automatic merge from submit-queue
Use etcd 2.3.7
This will switch to etcd 2.3.7 for release 1.4, to resolve issues rolling back from 1.4 to 1.3 (while preventing those same issues rolling back to 1.4.0 from a release including etcd 3.0.x).
Fixes #32253.
See #32253 (comment) for etcd roadmap.
Automatic merge from submit-queue
Fix 127.0.01 typo
**What this PR does / why we need it**:
Fixes a small typo, though typo seems inconsequential
**Release note**:
none
Automatic merge from submit-queue
Enable kubelet eviction whenever inodes free is < 5% on GCE
This is a pre-req for enabling inodes based evictions in GKE.
Automatic merge from submit-queue
Make image-puller work on GCI nodes.
Currently image-puller works only on debian nodes. This will make our test more flaky after we switch to the GCI by default. This PR ports the image-puller to the GCI-based Nodes.
cc @vishh @wonderfly @dchen1107
Automatic merge from submit-queue
rkt: Update kube-up rkt version to v1.14.0
cc @kubernetes/sig-rktnetes
This should have been included in #31286 (whoops).
This is a bugfix that I propose for v1.4 inclusion.
Automatic merge from submit-queue
move '(master)' to end of message for uniformity
**What this PR does / why we need it**: This is a small polish operation on the kubernetes charm wrt juju status output.
**Release note**:
```release-note
NONE
```
This changes the status output from:
```
kubernetes/0 active idle 3 172.27.24.54 8088/tcp
Kubernetes running.
kubernetes/1 active idle 4 172.27.24.55 6443/tcp
(master) Kubernetes services started
```
to this:
```
kubernetes/0 active idle 3 172.27.24.54 8088/tcp
Kubernetes running.
kubernetes/1 active idle 4 172.27.24.55 6443/tcp
Kubernetes services started (master)
```
Automatic merge from submit-queue
Set eviction-hard for vagrant cluster
In order to test eviction related functionality it will be convenient to have reasonable eviction defaults.
At the moment exactly the same flags are used by the GCE environment.
Kubelet will have the following flag:
--eviction-hard=memory.available<100Mi,nodefs.available<10%
Change-Id: I56ca03bc3c5467c8450150e292f7a346fa7772a9
Automatic merge from submit-queue
Fix Bash script
**What this PR does / why we need it**: `cluster/mesos/docker/socat/build.sh` had two lines mixed together.
Old command output:
```
$ ./cluster/mesos/docker/socat/build.sh
./cluster/mesos/docker/socat/build.sh: line 21: set: pipefailscript_dir=/home/rodolfo/src/k8s.io/kubernetes/cluster/mesos/docker/socat: invalid option name
```
**Special notes for your reviewer**: probably nobody is using that script? @sttts PTAL.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Enable Rescheduler by default
Rescheduler is stable: the e2e test has been passing consistently for more than a week.
ref #29023
```release-note
The rescheduler, which ensures that critical pods are always scheduled, is enabled by default on GCE.
```
Automatic merge from submit-queue
Pick a specific GCI version by default on GCE.
Prior to this change, a K8s branch (master as well as release) was
pinned to a GCI milestone. It would pick up the latest GCI release on
that milestone at the time of cluster creation. The rationale was that
K8s users would automatically get the bug fixes in newer versions of
GCI. However, in practice it makes the runtime environment
non-deterministic, and the lack of continuous e2e tests means we would run
into breakages sooner or later.
With this change, each K8s release will pick a specific version
of GCI by default (similar to how the Debian-based container-vm gets used).
Users can override the default version through KUBE_GCE_MASTER_IMAGE and
KUBE_GCE_NODE_IMAGE environment variables.
We expect the default GCI version to be updated relatively frequently to stay
current with newer GCI releases. We can also automate the process of
bumping the hard-coded GCI version in the future.
@vishh @adityakali can you please review?
cc @kubernetes/goog-image FYI
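A sketch of the default-plus-override behaviour described above. The function name `image_for` is hypothetical; `GCI_VERSION` and its value are the ones quoted earlier in this log, and the way the real scripts wire these variables together may differ.

```shell
#!/usr/bin/env bash
# Pinned default; the value is the GCI_VERSION quoted earlier in this log.
GCI_VERSION="gci-dev-55-8866-0-0"

# Pick the image for a role ("master" or "node"): an explicit
# KUBE_GCE_*_IMAGE override wins, otherwise the pinned default is used.
image_for() {
  case "$1" in
    master) echo "${KUBE_GCE_MASTER_IMAGE:-${GCI_VERSION}}" ;;
    node)   echo "${KUBE_GCE_NODE_IMAGE:-${GCI_VERSION}}" ;;
    *)      echo "unknown role: $1" >&2; return 1 ;;
  esac
}

echo "master image: $(image_for master)"
echo "node image:   $(image_for node)"
```

With neither variable set, both roles resolve to the pinned version, which is what makes cluster creation deterministic.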
Automatic merge from submit-queue
Store startupscript from GKE clusters too
Ref https://github.com/kubernetes/kubernetes/issues/31215
@kubernetes/goog-gke Is there any reason why we don't want to do it?
@kubernetes/test-infra-maintainers
Juju bootstrapping has a cost. It should be an explicit action
by the tooling surrounding bundle-tester when testing a charm. Setting
bootstrap:false will allow us to get faster feedback at lower cost when
running the kubernetes charm under CI. Additionally, don't reset, so that
no communication attempt is made to the controller.
Additionally, add tox to the test dependency list.
Automatic merge from submit-queue
keep docker0 with private cidr range
fixes: #31465
Keep docker0 when using kubenet on GCI. Assign 169.254.123.1/24 to docker0 to avoid cidr conflict.
Automatic merge from submit-queue
AWS: Hopefully fix e2e?
**What this PR does / why we need it**: Fix AWS e2e
**Which issue this PR fixes**: fixes build broken by #28499
**Special notes for your reviewer**: This is a pump & dump, I probably won't be around to respond to comments after this. If it needs a cherry-pick or anything, please check?
Automatic merge from submit-queue
Remove deprecated Namespace admission plug-ins
```release-note
The NamespaceExists and NamespaceAutoProvision admission controllers have been removed.
All cluster operators should use NamespaceLifecycle.
```
Fixes https://github.com/kubernetes/kubernetes/issues/31195
Automatic merge from submit-queue
fix feature_gates salt plumbing
Fix salt plumbing for `--feature-gate` from the `FEATURE_GATES` kube env.
Was generating grains.conf and kube-env for master only. Verified it works now for gci and debian master/nodes.
cc @thockin @timstclair
Automatic merge from submit-queue
Build and push kube-dns for 1.4 release.
Fix #31355.
The following docker images have been uploaded:
gcr.io/google_containers/kubedns-amd64:1.7
gcr.io/google_containers/kubedns-arm:1.7
gcr.io/google_containers/kubedns-arm64:1.7
Build for ppc64le is disabled by default, and it failed to be built using:
`KUBE_BUILD_PPC64LE=y make release`
I'm still working on making the ppc64le build. Updates will be added following this thread.
@girishkalele @thockin
Automatic merge from submit-queue
Add ExternalName kube-dns e2e test
ExternalName allows kubedns to return CNAME records for external
services. No proxying is involved.
Built on top of and includes #30599
See original issue at
https://github.com/kubernetes/kubernetes/issues/13748
Feature tracking at
https://github.com/kubernetes/features/issues/33
The e2e test is at least as comprehensive as the one for headless services (namely, only to some degree)
```release-note
Add ExternalName services as CNAME references to external ones
```
Automatic merge from submit-queue
gci: decouple from the built-in kubelet version
Prior to this change, configure.sh would:
(1) compare versions of built-in kubelet and downloaded kubelet, and
(2) bind-mount downloaded kubelet at /usr/bin/kubelet in case of
version mismatch
With this change, configure.sh:
(1) compares the two versions only on test clusters, and
(2) uses the actual file paths to start kubelet w/o any bind-mounting
To allow (2), this change also provides its own version of kubelet
systemd service file.
Effectively with this change we will always use the downloaded kubelet
binary along with its own systemd service file on non-test clusters. The
main advantage is this change does not rely on the kubelet being built in to
the OS image.
@dchen1107 @wonderfly can you please review
cc/ @kubernetes/goog-image FYI
Automatic merge from submit-queue
Add validation for KUBE_USER
Malformed KUBE_USER causes error in cluster setup.
cc/ @kubernetes/goog-image
@Q-Lee @Amey-D Can you please review?
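A minimal sketch of such a check. The allowed character set (letters, digits, and `- . _ @`) is an assumption made for illustration, as is the function name `validate_kube_user`.

```shell
#!/usr/bin/env bash
# Return 0 if the user name is acceptable, 1 otherwise.
# Assumption: only letters, digits, and the characters - . _ @ are allowed.
validate_kube_user() {
  case "$1" in
    "")                  return 1 ;;  # empty name
    *[!A-Za-z0-9._@-]*)  return 1 ;;  # contains a character outside the set
    *)                   return 0 ;;
  esac
}

for candidate in "admin" "bad user" 'semi;colon'; do
  if validate_kube_user "${candidate}"; then
    echo "ok:      ${candidate}"
  else
    echo "invalid: ${candidate}"
  fi
done
```

Failing early with a clear message is cheaper than letting a malformed name break cluster setup later.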
Automatic merge from submit-queue
Add admission controller for default storage class.
The admission controller adds a default class to PVCs that do not require any
specific class. This way, users (=PVC authors) do not need to care about
storage classes, administrator can configure a default one and all these PVCs
that do not care about class will get the default one.
The marker of the default class is the annotation "volume.beta.kubernetes.io/storage-class", which must be set to "true" to work. All other values (or a missing annotation) make the class non-default.
Based on @thockin's code, added tests and made it not to reject a PVC when no class is marked as default.
@kubernetes/sig-storage
Automatic merge from submit-queue
Reduce size of images fluentd-gcp and fluentd-elasticsearch
replaces #26652
```
aledbf/fluentd-elasticsearch 1.19 769ece5c8ba8 About an hour ago 269.9 MB
gcr.io/google_containers/fluentd-elasticsearch 1.18 0a8cbfbea7f7 5 weeks ago 530.3 MB
aledbf/fluentd-gcp 1.22 ef979b82a767 About an hour ago 307.9 MB
gcr.io/google_containers/fluentd-gcp 1.21 0ef09b1bcfd7 2 weeks ago 498.5 MB
```
closes #29782
Automatic merge from submit-queue
Configure webhook
**What this PR does / why we need it**: this configures the image policy webhook + admission controller for gce/gci.
addresses: #22888
**Release note**:
```release-note
Configure image verification admission controller and webhook on gce.
```
Automatic merge from submit-queue
Support for creation/removal of master replicas.
HA master: initial support for creation/removal of master replicas by
kube-up/kube-down scripts for GCE on gci (other distributions, including debian, are not supported yet).
Automatic merge from submit-queue
fix path handling in hack/lib/init.sh
Jenkinsfile pipeline jobs get cloned into "\<project\> (\<branch\>)". As a result, I can't use certain things in `hack/lib/init.sh`.
This is a small fix for that problem.
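The source does not show the fix itself. As an illustration only, assuming the failure comes from unquoted expansions when the checkout path contains spaces and parentheses, a quoted root-directory lookup could look like this (`resolve_root` is a hypothetical name):

```shell
# Hypothetical helper: resolve the repository root two levels above a script,
# quoting every expansion so paths like "project (branch)" survive intact.
resolve_root() {
  local script_path="$1"
  (cd "$(dirname "${script_path}")/../.." && pwd -P)
}

# Demo with a throwaway directory whose name mimics a Jenkins pipeline checkout.
tmp="$(mktemp -d)"
mkdir -p "${tmp}/project (branch)/hack/lib"
touch "${tmp}/project (branch)/hack/lib/init.sh"
resolve_root "${tmp}/project (branch)/hack/lib/init.sh"
```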
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Enable the garbage collector by default
Turning GC on by default.
Memory usage of GC is back to normal after #30943. The CPU usage is a little higher than the cap in the scalability test (1.11 cores vs. 1 core). This PR sets the default number of GC workers to 20 to see if that helps CPU usage.
@kubernetes/sig-api-machinery @wojtek-t @lavalamp
Automatic merge from submit-queue
Bump heapster version
Bump heapster version to v1.2.0-beta.1.
Migrate metrics tests and HPA to use List objects introduced in the new version.
Automatic merge from submit-queue
Use --regions instead of --region for gcloud list [resource]
gcloud has started complaining:
```
WARNING: Abbreviated flag [--region] will be disabled in release 132.0.0, use the full name [--regions].
WARNING: Abbreviated flag [--region] will be disabled in release 132.0.0, use the full name [--regions].
WARNING: Abbreviated flag [--region] will be disabled in release 132.0.0, use the full name [--regions].
```
We'll probably need to cherry-pick this, as otherwise the list-resources script will start failing at some point in the future.
Automatic merge from submit-queue
Let load and density e2e tests use GC if it's on
I've run the 100 and 500 nodes tests and they both pass.
The test-infra half of the PR is https://github.com/kubernetes/test-infra/pull/369
cc @lavalamp
Automatic merge from submit-queue
Update core etcd references to use 3.0.4
This updates the core references to use 3.0.4.
There are still legacy references in the code base that should be cleaned up or removed, but I'm reluctant to purge them.
/cc @kubernetes/sig-scalability
Automatic merge from submit-queue
Add user-specified kubectl arguments to addons start script
This is a simple way, using the same environment variable paradigm used throughout these scripts, to let a user specify kubectl arguments to the addons script.
fixes #30371
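A sketch of the environment-variable pattern described above. `KUBECTL_OPTS` and the stub `kubectl` function are assumptions for illustration; the stub only echoes its arguments so the example runs without a cluster:

```shell
# Stand-in for the real kubectl binary, so the example is self-contained.
kubectl() { echo "kubectl $*"; }

# Hypothetical variable: extra arguments supplied by the user, empty by default.
KUBECTL_OPTS="${KUBECTL_OPTS:-}"

run_kubectl() {
  # Intentionally unquoted so that several space-separated flags split into
  # separate arguments; an empty value contributes no argument at all.
  # shellcheck disable=SC2086
  kubectl ${KUBECTL_OPTS} "$@"
}

KUBECTL_OPTS="--server=127.0.0.1:8080"
run_kubectl get pods --namespace=kube-system
```

Leaving the variable empty keeps the default behavior unchanged, which is why this pattern is low-risk for existing users.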
Automatic merge from submit-queue
Avoid unnecessary copies on GCI initialization.
The issue I faced was that when starting a cluster I was getting:
```
Aug 12 11:12:46 e2e-test-wojtekt-master configure.sh[1079]: cp: error writing '/home/kubernetes/kubernetes-src.tar.gz': No space left on device
```
This PR reduces the amount of space needed on startup and also speeds up cluster startup.
@lavalamp @dchen1107
Automatic merge from submit-queue
Add support for kube-up.sh to deploy Calico network policy to GCI masters
Also remove requirement for calicoctl from Debian / salt installed nodes and clean it up a little by deploying calico-node with a manifest rather than calicoctl. This also makes it more reliable by retrying properly.
How to use:
```
make quick-release
NETWORK_POLICY_PROVIDER=calico cluster/kube-up.sh
```
One place where I was uncertain:
- CPU allocations (on the master particularly, where there's very little spare capacity). I took some from etcd, but if there's a better way to decide this, I'm happy to change it.
Automatic merge from submit-queue
Fix error reporting during vagrant provisioning
The `release_not_found` shell function can be used while running both
`provision-master.sh` and `provision-node.sh` (it is used by the `install-salt`
function in `provision-utils.sh`), but it was defined only in `provision-master.sh`.
Because of this, one of my colleagues got the following diagnostic:
```
==> master: Succeeded: 52 (changed=8)
==> master: Failed: 0
==> master: -------------
==> master: Total states run: 52
==> node-1: Machine already provisioned. Run `vagrant provision` or use the `--provision`
==> node-1: flag to force provisioning. Provisioners marked to run always will still run.
==> node-1: Running provisioner: shell...
node-1: Running: /tmp/vagrant-shell20160726-19144-hahnl1.sh
==> node-1: Prepare package manager
==> node-1: Provisioning network on node
==> node-1: Network configuration verified
==> node-1: /tmp/vagrant-shell: line 134: release_not_found: command not found
The SSH command responded with a non-zero exit status. Vagrant
assumes that this means the command failed. The output for this command
should be in the log above. Please read the output to determine what
went wrong.
```
... which is rather confusing.
Automatic merge from submit-queue
Use latest GCI image based on a regex in Node e2e
This PR also makes it possible to run node e2e against multiple previous images, sorted by creation time. A regex for the image name can be used to instruct node e2e to identify test images.
Depends on #29577
Automatic merge from submit-queue
AWS: Allow no-op kube-down to exit 0
Not exactly sure why hack/e2e.go `IsUp()` is returning true right now,
but I can solve this a different way. This unifies with the GCE
behavior, which is that no-op kube-down returns 0.
It can run tests against multiple existing images that match a regex.
GCI images will be using a regex.
Signed-off-by: Vishnu kannan <vishnuk@google.com>
Automatic merge from submit-queue
Add cleanup addon pod to remove empty keys from etcd
Namespace deletion leaves a trace of empty keys in etcd. This PR adds an addon pod that periodically checks etcd for those empty keys and removes them.
fixes #27307
Automatic merge from submit-queue
Edits to bring the tls-terminated etcd cluster to the layer.
fixes #23198
```release-note
* Updates required for juju kubernetes to use the tls-terminated etcd charm.
```
* Use the current stable CoreOS image
* Switch to etcd2
* Launch flanneld on master to make nodes accessible
* Generate Service Account certificate and enable admission controls
Automatic merge from submit-queue
Cleanup k8s script noise with a verbosity concept
Fixes https://github.com/kubernetes/kubernetes/issues/30109
The KUBE_VERBOSE environment variable sets the verbosity level to
use. Log messages can specify a verbosity by setting the V
variable. e.g.
V=2 kube::log::info foo bar
Would only print "foo bar" if $KUBE_VERBOSE >= 2.
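A minimal sketch of how such a verbosity gate can work. The function name `log_info` and the defaults are assumptions for illustration; this is not the actual `kube::log::info` implementation:

```shell
# Hypothetical stand-in for kube::log::info.
# V is the verbosity of the message (default 0); the message prints only
# when KUBE_VERBOSE (default 0 here) is at least V.
log_info() {
  local v="${V:-0}"
  if (( ${KUBE_VERBOSE:-0} < v )); then
    return 0
  fi
  echo "$*"
}

KUBE_VERBOSE=1
V=2 log_info foo bar      # suppressed, because 1 < 2
KUBE_VERBOSE=2
V=2 log_info foo bar      # prints "foo bar"
```

Passing `V` as a per-call environment assignment keeps call sites short and leaves messages without a `V` at the default level.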
Examples:
Default verbosity (1 for make commands):
```
$ make kubelet
+++ [0804 17:23:32] Generating bindata:
/usr/local/google/home/stclair/go/k8s3/src/k8s.io/kubernetes/test/e2e/framework/gobindata_util.go
+++ [0804 17:23:37] Building the toolchain targets:
k8s.io/kubernetes/hack/cmd/teststale
+++ [0804 17:23:37] Building go targets for linux/amd64:
cmd/kubelet
# k8s.io/kubernetes/pkg/kubelet
pkg/kubelet/kubelet.go:247: undefined: a
make: *** [kubelet] Error 1
```
Extra verbose (5, comparable to previous levels):
<details>
```
$ make kubelet KUBE_VERBOSE=5
I0804 17:31:05.083395 2601 parse.go:353] type checker error: /usr/local/google/home/stclair/.gvm/gos/go1.6.2/src/runtime/trace.go:151:30: cannot use (traceBufHeader literal) (value of type traceBufHeader) as unsafe.ArbitraryType value in argument to unsafe.Sizeof
I0804 17:31:05.083503 2601 parse.go:353] type checker error: /usr/local/google/home/stclair/.gvm/gos/go1.6.2/src/runtime/trace.go:151:7: array length 64 << 10 - unsafe.Sizeof((traceBufHeader literal)) (value of type uintptr) must be constant
I0804 17:31:05.083600 2601 parse.go:353] type checker error: /usr/local/google/home/stclair/.gvm/gos/go1.6.2/src/runtime/mgcwork.go:269:37: cannot use (workbufhdr literal) (value of type workbufhdr) as unsafe.ArbitraryType value in argument to unsafe.Sizeof
I0804 17:31:05.083654 2601 parse.go:353] type checker error: /usr/local/google/home/stclair/.gvm/gos/go1.6.2/src/runtime/mgcwork.go:269:7: array length (_WorkbufSize - unsafe.Sizeof((workbufhdr literal))) / sys.PtrSize (value of type uintptr) must be constant
I0804 17:31:05.084006 2601 parse.go:353] type checker error: /usr/local/google/home/stclair/.gvm/gos/go1.6.2/src/runtime/chan.go:21:28: cannot use (hchan literal) (value of type hchan) as unsafe.ArbitraryType value in argument to unsafe.Sizeof
I0804 17:31:05.084040 2601 parse.go:353] type checker error: /usr/local/google/home/stclair/.gvm/gos/go1.6.2/src/runtime/chan.go:21:66: cannot use (hchan literal) (value of type hchan) as unsafe.ArbitraryType value in argument to unsafe.Sizeof
I0804 17:31:05.084076 2601 parse.go:353] type checker error: /usr/local/google/home/stclair/.gvm/gos/go1.6.2/src/runtime/chan.go:21:14: unsafe.Sizeof((hchan literal)) + uintptr(-int(unsafe.Sizeof((hchan literal))) & (maxAlign - 1)) (value of type uintptr) is not constant
I0804 17:31:05.085536 2601 parse.go:353] type checker error: /usr/local/google/home/stclair/.gvm/gos/go1.6.2/src/runtime/hashmap.go:80:31: cannot use (struct{b bmap; v int64} literal).v (value of type int64) as unsafe.ArbitraryType value in argument to unsafe.Offsetof
I0804 17:31:05.085567 2601 parse.go:353] type checker error: /usr/local/google/home/stclair/.gvm/gos/go1.6.2/src/runtime/hashmap.go:80:15: unsafe.Offsetof((struct{b bmap; v int64} literal).v) (value of type uintptr) is not constant
I0804 17:31:05.085788 2601 parse.go:353] type checker error: /usr/local/google/home/stclair/.gvm/gos/go1.6.2/src/runtime/hashmap.go:1053:45: cannot convert &zeroinitial (value of type *[1024]byte) to unsafe.Pointer
I0804 17:31:05.086995 2601 parse.go:353] type checker error: /usr/local/google/home/stclair/.gvm/gos/go1.6.2/src/runtime/mfinal.go:20:65: cannot use (finalizer literal) (value of type finalizer) as unsafe.ArbitraryType value in argument to unsafe.Sizeof
I0804 17:31:05.087031 2601 parse.go:353] type checker error: /usr/local/google/home/stclair/.gvm/gos/go1.6.2/src/runtime/mfinal.go:20:11: array length (_FinBlockSize - 2 * sys.PtrSize - 2 * 4) / unsafe.Sizeof((finalizer literal)) (value of type uintptr) must be constant
I0804 17:31:05.087957 2601 parse.go:353] type checker error: /usr/local/google/home/stclair/.gvm/gos/go1.6.2/src/runtime/mstats.go:170:39: cannot use memstats.by_size (variable of type [67]struct{size uint32; nmalloc uint64; nfree uint64}) as unsafe.ArbitraryType value in argument to unsafe.Offsetof
I0804 17:31:05.087999 2601 parse.go:353] type checker error: /usr/local/google/home/stclair/.gvm/gos/go1.6.2/src/runtime/mstats.go:170:76: cannot use memstats.by_size[0] (variable of type struct{size uint32; nmalloc uint64; nfree uint64}) as unsafe.ArbitraryType value in argument to unsafe.Sizeof
I0804 17:31:05.088483 2601 parse.go:353] type checker error: /usr/local/google/home/stclair/.gvm/gos/go1.6.2/src/runtime/panic.go:118:34: cannot use (_defer literal) (value of type _defer) as unsafe.ArbitraryType value in argument to unsafe.Sizeof
I0804 17:31:05.088510 2601 parse.go:353] type checker error: /usr/local/google/home/stclair/.gvm/gos/go1.6.2/src/runtime/panic.go:118:20: unsafe.Sizeof((_defer literal)) (value of type uintptr) is not constant
I0804 17:31:05.089812 2601 parse.go:353] type checker error: /usr/local/google/home/stclair/.gvm/gos/go1.6.2/src/runtime/sema.go:42:42: cannot use (semaRoot literal) (value of type semaRoot) as unsafe.ArbitraryType value in argument to unsafe.Sizeof
I0804 17:31:05.089845 2601 parse.go:353] type checker error: /usr/local/google/home/stclair/.gvm/gos/go1.6.2/src/runtime/sema.go:42:8: array length sys.CacheLineSize - unsafe.Sizeof((semaRoot literal)) (value of type uintptr) must be constant
I0804 17:31:05.094634 2601 parse.go:307] type checking encountered some errors in "runtime", but ignoring.
I0804 17:31:05.875185 2601 parse.go:353] type checker error: /usr/local/google/home/stclair/go/k8s3/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/github.com/davecgh/go-spew/spew/bypass.go:33:26: cannot use (*byte)(nil) (value of type *byte) as unsafe.ArbitraryType value in argument to unsafe.Sizeof
I0804 17:31:05.875234 2601 parse.go:353] type checker error: /usr/local/google/home/stclair/go/k8s3/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/github.com/davecgh/go-spew/spew/bypass.go:33:12: unsafe.Sizeof((*byte)(nil)) (value of type uintptr) is not constant
I0804 17:31:05.875838 2601 parse.go:307] type checking encountered some errors in "github.com/davecgh/go-spew/spew", but ignoring.
I0804 17:31:05.897216 2601 parse.go:353] type checker error: /usr/local/google/home/stclair/.gvm/gos/go1.6.2/src/crypto/cipher/xor.go:12:36: cannot use uintptr(0) (constant 0 of type uintptr) as unsafe.ArbitraryType value in argument to unsafe.Sizeof
I0804 17:31:05.897261 2601 parse.go:353] type checker error: /usr/local/google/home/stclair/.gvm/gos/go1.6.2/src/crypto/cipher/xor.go:12:18: int(unsafe.Sizeof(uintptr(0))) (value of type int) is not constant
I0804 17:31:05.897360 2601 parse.go:307] type checking encountered some errors in "crypto/cipher", but ignoring.
I0804 17:31:06.400904 2601 conversion.go:227] considering pkg "k8s.io/kubernetes/federation/apis/core/v1"
I0804 17:31:06.401138 2601 conversion.go:243] tags: ["k8s.io/kubernetes/federation/apis/core"]
I0804 17:31:06.427408 2601 conversion.go:283] no viable conversions, not generating for this package
I0804 17:31:06.427508 2601 main.go:73] Completed successfully.
Go version: go version go1.6.2 linux/amd64
+++ [0804 17:31:06] Generating bindata:
/usr/local/google/home/stclair/go/k8s3/src/k8s.io/kubernetes/test/e2e/framework/gobindata_util.go
Generated bindata file : 11536 ../../..//test/e2e/generated/bindata.go lines of lovely automated artifacts
+++ [0804 17:31:12] Building the toolchain targets:
k8s.io/kubernetes/hack/cmd/teststale
+++ [0804 17:31:12] Building go targets for linux/amd64:
cmd/kubelet
# k8s.io/kubernetes/pkg/kubelet
pkg/kubelet/kubelet.go:247: undefined: a
!!! Error in /usr/local/google/home/stclair/go/k8s3/src/k8s.io/kubernetes/hack/lib/golang.sh:506
'go install "${goflags[@]:+${goflags[@]}}" -gcflags "${gogcflags}" -ldflags "${goldflags}" "${nonstatics[@]:+${nonstatics[@]}}"' exited with status 2
Call stack:
1: /usr/local/google/home/stclair/go/k8s3/src/k8s.io/kubernetes/hack/lib/golang.sh:506 kube::golang::build_binaries_for_platform(...)
2: /usr/local/google/home/stclair/go/k8s3/src/k8s.io/kubernetes/hack/lib/golang.sh:692 kube::golang::build_binaries(...)
3: hack/make-rules/build.sh:27 main(...)
Exiting with status 1
!!! Error in /usr/local/google/home/stclair/go/k8s3/src/k8s.io/kubernetes/hack/lib/golang.sh:596
'( kube::golang::setup_env; echo "Go version: $(go version)"; local host_platform; host_platform=$(kube::golang::host_platform); local goflags goldflags gogcflags; eval "goflags=(${KUBE_GOFLAGS:-})"; goldflags="${KUBE_GOLDFLAGS:-} $(kube::version::ldflags)"; gogcflags="${KUBE_GOGCFLAGS:-}"; local use_go_build; local -a targets=(); local arg; readonly BINDATAS=("${KUBE_ROOT}/test/e2e/framework/gobindata_util.go"); kube::log::status "Generating bindata:" "${BINDATAS[@]}"; for bindata in ${BINDATAS[@]};
do
if [[ -f $bindata ]]; then
go generate "${bindata}";
fi;
done; for arg in "$@";
do
if [[ "${arg}" == "--use_go_build" ]]; then
use_go_build=true;
else
if [[ "${arg}" == -* ]]; then
goflags+=("${arg}");
else
targets+=("${arg}");
fi;
fi;
done; if [[ ${#targets[@]} -eq 0 ]]; then
targets=("${KUBE_ALL_TARGETS[@]}");
fi; local -a platforms=(${KUBE_BUILD_PLATFORMS:-}); if [[ ${#platforms[@]} -eq 0 ]]; then
platforms=("${host_platform}");
fi; local binaries; binaries=($(kube::golang::binaries_from_targets "${targets[@]}")); local parallel=false; if [[ ${#platforms[@]} -gt 1 ]]; then
local gigs; gigs=$(kube::golang::get_physmem); if [[ ${gigs} -ge ${KUBE_PARALLEL_BUILD_MEMORY} ]]; then
kube::log::status "Multiple platforms requested and available ${gigs}G >= threshold ${KUBE_PARALLEL_BUILD_MEMORY}G, building platforms in parallel"; parallel=true;
else
kube::log::status "Multiple platforms requested, but available ${gigs}G < threshold ${KUBE_PARALLEL_BUILD_MEMORY}G, building platforms in serial"; parallel=false;
fi;
fi; kube::golang::build_kube_toolchain; if [[ "${parallel}" == "true" ]]; then
kube::log::status "Building go targets for ${platforms[@]} in parallel (output will appear in a burst when complete):" "${targets[@]}"; local platform; for platform in "${platforms[@]}";
do
( kube::golang::set_platform_envs "${platform}"; kube::log::status "${platform}: go build started"; kube::golang::build_binaries_for_platform ${platform} ${use_go_build:-}; kube::log::status "${platform}: go build finished" ) &> "/tmp//${platform//\//_}.build" &
done; local fails=0; for job in $(jobs -p);
do
wait ${job} || let "fails+=1";
done; for platform in "${platforms[@]}";
do
cat "/tmp//${platform//\//_}.build";
done; exit ${fails};
else
for platform in "${platforms[@]}";
do
kube::log::status "Building go targets for ${platform}:" "${targets[@]}"; kube::golang::set_platform_envs "${platform}"; kube::golang::build_binaries_for_platform ${platform} ${use_go_build:-};
done;
fi )' exited with status 1
Call stack:
1: /usr/local/google/home/stclair/go/k8s3/src/k8s.io/kubernetes/hack/lib/golang.sh:596 kube::golang::build_binaries(...)
2: hack/make-rules/build.sh:27 main(...)
Exiting with status 1
make: *** [kubelet] Error 1
```
</details>
Remaining work: Add a verbosity label to more log messages.
/cc @kubernetes/sig-api-machinery @kubernetes/contributor-experience
Automatic merge from submit-queue
AWS/GCE: Rework use of master name
* Add a pillar for `hostname` (because even if there's a good Salt function for it, I don't trust it to return the short hostname)
* Move `INITIAL_ETCD_CLUSTER` to just the GCE turn-up
* Remove `master_name`, which isn't needed
* Add a pillar for hostname (because even if there's a good Salt
function for it, I don't trust it to return the short hostname)
* Move INITIAL_ETCD_CLUSTER to just the GCE turn-up
* Remove the master_name, which isn't needed as a pillar
Automatic merge from submit-queue
Replacing skydns with kubedns for the juju cluster. #29720
```release-note
* Updating the cluster/juju provider to use kubedns in place of skydns.
```
Automatic merge from submit-queue
In cluster scripts correct gcloud list arg from '--zone' to '--zones'
I started getting these messages when doing `kube-up` and similar operations:
WARNING: Abbreviated flag [--zone] will be disabled in release 132.0.0, use the full name [--zones].
This PR corrects the flag where used.
Note there are many uses of `--zone` on commands like `gcloud instances describe` which are still correct - those commands do not accept multiple zones.
Automatic merge from submit-queue
Documented second arg to create-flanneld-opts in cluster/ubuntu/util.sh
This is a bug fix, no release note needed.
Fixes #29546
Automatic merge from submit-queue
GKE test-build-release: Actually do the build.
Multiple devs (myself included!) have experienced frustration with the fact that if `KUBERNETES_PROVIDER=gke` then `hack/e2e.go --build` doesn't actually do a build.
Are we actually relying on this behavior anywhere?
Automatic merge from submit-queue
[Garbage Collector] add e2e tests again
#27151 was reverted because GKE didn't start correctly after it was merged (https://github.com/kubernetes/kubernetes/pull/27151#issuecomment-233030686).
The possible problem is the `unbound variable`, which is fixed in the second commit of this PR. However, I cannot verify if the PR will fail the gke suite since I don't have the environment to run that suite.
@wojtek-t @lavalamp
Automatic merge from submit-queue
azure: kube-up respects AZURE_RESOURCE_GROUP
This fixes #28482.
* declare AZKUBE_ variables as global to work around the lack of bash support for exporting array variables
Automatic merge from submit-queue
cluster/images/hyperkube: re-add hyperkube busybox style symlinks
Originally symlinks were added with a `--make-symlinks` command discussed in https://github.com/kubernetes/kubernetes/issues/24510 and implemented in https://github.com/kubernetes/kubernetes/pull/24511.
It was backed out in https://github.com/kubernetes/kubernetes/pull/25693 because Go binaries don't run in qemu, so running `hyperkube --make-symlinks` breaks cross-building the Dockerfile for arm.
Let's just add the symlinks manually until the upstream bug (qemu) is fixed.
fixes #28702
@mikedanese @thockin @yifan-gu @euank
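A sketch of creating the busybox-style symlinks by hand instead of running `hyperkube --make-symlinks`. The target directory and the list of component names below are assumptions for illustration; the Dockerfile may use a different set:

```shell
# Create one symlink per component, each pointing at the hyperkube binary.
make_symlinks() {
  local bin_dir="$1"
  local name
  # Assumed component names, for illustration only.
  for name in apiserver controller-manager kubectl kubelet proxy scheduler; do
    ln -sf hyperkube "${bin_dir}/${name}"
  done
}

# Demo in a throwaway directory with an empty placeholder binary.
demo_dir="$(mktemp -d)"
touch "${demo_dir}/hyperkube"
make_symlinks "${demo_dir}"
ls -l "${demo_dir}"
```

Because `ln` never executes the target binary, this step works during cross-builds where the binary cannot run.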
Automatic merge from submit-queue
Updated fluentd configuration to spawn multiple threads for processing
(by default, fluentd uses a single thread).
@a-robinson @igorpeshansky
Automatic merge from submit-queue
Bump exechealthz image
With the new image at least if we observe an exec container taking more ram than it should (like the oom situation, which shouldn't happen today because of the increased limits), we can kubectl exec and check the pprof endpoints.
Note that I'm not bumping the rc version, because I just did so with: https://github.com/kubernetes/kubernetes/pull/29693.
Automatic merge from submit-queue
Give healthz more memory to mitigate #29688
This will recreate the rc but not the pods. At least on the clusters we patched, if the pods get recreated they'll come back up with the updated limits.
#29688
Automatic merge from submit-queue
export KUBE_USER to salt (support custom usernames) for vagrant, vsph…
GCE/GKE were handled in #29164, AWS was handled in #29428. This should cover the rest of the configurations that use ABAC.
Automatic merge from submit-queue
kube-up: increase download timeout for kubernetes.tar.gz
Particularly on smaller instances on AWS, we were hitting the 80 second
timeout now that our image is well over the 1GB mark.
Increase the timeout from 80 seconds to 300 seconds.
Fix #29418
Automatic merge from submit-queue
Include CNI for all architectures in the hyperkube image
Can some of you (@jfrazelle @mikedanese) quickly lgtm this?
I'd like it if we got it merged before v1.4.0-alpha.2
It's not a huge change, I'm just cross-compiling this CNI stuff while waiting for the v0.4.0 which likely will release binaries for all arches.
Automatic merge from submit-queue
AWS kube-up: fix MASTER_OS_DISTRIBUTION
On AWS we were defining KUBE_MASTER_OS_DISTRIBUTION, but the scripts
expect MASTER_OS_DISTRIBUTION.
Fixes #29422
Automatic merge from submit-queue
Add load balancer in front of apiserver in HA master setup
The first commit is just https://github.com/kubernetes/kubernetes/pull/29201 and has been already LGTMed.
Second commit has some small fixes:
1. Precompute REGION variable in config
2. Add timeout for waiting for loadbalancer
3. Fix kube-down so that it doesn't delete some resources if there are still masters/nodes in other zones
Second commit also fixes bug in https://github.com/kubernetes/kubernetes/pull/29201 where variable `REGION` was unset in `kube-down` when deleting master IP. The bug caused leaking of IP addresses.
https://github.com/kubernetes/kubernetes/issues/21124
@davidopp @jszczepkowski @wojtek-t @mikedanese
Automatic merge from submit-queue
Bump the default etcd version in the Makefile to 3.0.3
Fixes: #29132
I haven't had time to manually validate the arm and arm64 version yet, but I think it should be fine.
cc @xiang90 @hongchaodeng @timothysc @lavalamp @wojtek-t @thockin @kubernetes/sig-scalability @Pensu @laboger
Automatic merge from submit-queue
Ubuntu: Enable ssh compression when downloading binaries during cluster creation
resolves #20971 by using the options provided by ssh.
Native ssh compression has existed for years, and the server is free to disregard the setting, so this should be safe.
With things like the kube binaries I see about a 2x speed increase.
```
λ time scp kubes-bin.tar 9.30.182.251:/mnt/build/kubin
kubes-bin.tar 100% 344MB 10.7MB/s 00:32
real 0m32.284s
user 0m1.679s
sys 0m1.263s
λ time scp -C kubes-bin.tar 9.30.182.251:/mnt/build/kubin
kubes-bin.tar 100% 344MB 22.9MB/s 00:15
real 0m14.810s
user 0m12.858s
sys 0m0.994s
λ ls -lah kubes-bin.tar
-rw-r--r-- 1 mhb staff 344M Jun 2 15:29 kubes-bin.tar
λ tar -tf kubes-bin.tar
kubectl
master/
master/etcd
master/etcdctl
master/flanneld
master/kube-apiserver
master/kube-controller-manager
master/kube-scheduler
node/
node/flanneld
node/kube-proxy
node/kubelet
```
Automatic merge from submit-queue
fix logrotate config (again)
We need to add the dateformat option so that logrotate
can create a unique logfile for each rotation. Without this,
rotation is skipped with a message like the following (generated in
logrotate's verbose mode):
rotating log /var/log/rotate-test.log, log->rotateCount is 5
dateext suffix '-20160718'
glob pattern '-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
destination /var/log/rotate-test2.log-20160718.gz already exists, skipping rotation
Tested as follows:
# config in '/etc/logrotate.d/rotate-test':
/var/log/rotate-test.log {
rotate 5
copytruncate
missingok
notifempty
compress
maxsize 100M
daily
dateext
dateformat -%Y%m%d-%s
create 0644 root root
}
# create 150Mb of /var/log/rotate-test.log
$ dd if=/dev/zero of=/var/log/rotate-test.log bs=1048576 count=150 conv=notrunc oflag=append
# run logrotate
$ /usr/sbin/logrotate -v /etc/logrotate.conf
...
rotating pattern: /var/log/rotate-test.log after 1 days (5 rotations)
empty log files are not rotated, log files >= 104857600 are rotated earlier, old logs are removed
considering log /var/log/rotate-test.log
log needs rotating
rotating log /var/log/rotate-test.log, log->rotateCount is 5
Converted ' -%Y%m%d-%s' -> '-%Y%m%d-%s'
dateext suffix '-20160718-1468875268'
glob pattern '-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
copying /var/log/rotate-test.log to /var/log/rotate-test.log-20160718-1468875268
truncating /var/log/rotate-test.log
compressing log with: /bin/gzip
Repeating the 'dd' and 'logrotate' commands now generates logfiles correctly.
#27754
@bprashanth can you please review?
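To illustrate why the extra `%s` matters: with a date-only suffix, two rotations on the same day map to the same file name, while appending the epoch seconds makes each name distinct. A small sketch using `date` (the one-second pause only simulates two separate rotations; it is not part of the fix):

```shell
log=/var/log/rotate-test.log

# Date-only suffix (the dateext default): identical for rotations on one day.
first_daily="${log}$(date +-%Y%m%d)"
# Date plus epoch seconds, as with "dateformat -%Y%m%d-%s".
first_unique="${log}$(date +-%Y%m%d-%s)"

sleep 1

second_daily="${log}$(date +-%Y%m%d)"
second_unique="${log}$(date +-%Y%m%d-%s)"

[[ "${first_daily}" == "${second_daily}" ]] && echo "daily suffix collides"
[[ "${first_unique}" != "${second_unique}" ]] && echo "epoch suffix is unique"
```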
Automatic merge from submit-queue
hyperkube: fix build for 3rd party registry (again)
Fixes issue #28487
This is a minor fix for the issue reported in #28487
Automatic merge from submit-queue
Unset KUBERNETES_PROVIDER when KUBERNETES_CONFORMANCE_TEST is set
fixes #26269
same as #26530 - I accidentally lost my fork and couldn't rebase there ;)
@mikedanese PTAL
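In shell terms, the behavior amounts to something like the following sketch (the sample values and the surrounding logic are assumptions; only the two variable names come from the change):

```shell
# Simulate an environment where both variables are set.
KUBERNETES_CONFORMANCE_TEST="y"
KUBERNETES_PROVIDER="gce"

# When running conformance tests, drop any provider-specific setting.
if [[ -n "${KUBERNETES_CONFORMANCE_TEST:-}" ]]; then
  unset KUBERNETES_PROVIDER
fi

echo "provider: ${KUBERNETES_PROVIDER:-<unset>}"
```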
Automatic merge from submit-queue
kube-up: install new Docker pre-requisite (libltdl7) when not in image
Docker now has a dependency on libltdl7; we have to specify it manually
if we are installing docker using dpkg (vs using apt-get or similar,
which would pull it in automatically)
Fixes #28644
This allows us to start building real dependencies into Makefile.
Leave old hack/* scripts in place but advise to use 'make'. There are a few
rules that call things like 'go run' or 'build/*' that I left as-is for now.
Automatic merge from submit-queue
Fixes #28205, Check release tar location for Openstack-Heat provider
This does a basic check to see where the release tars are located.
Allows people to use openstack-heat outside of compiling k8s.
Automatic merge from submit-queue
Enable extensions/v1beta1/NetworkPolicy by default
Fixes https://github.com/kubernetes/kubernetes/issues/28401
For some reason this also triggered an update to the swagger spec (which apparently hadn't been done before but wasn't failing validation...)
Automatic merge from submit-queue
Implementing a proper master/worker split in the juju cluster code.
```
release-note-none
```
General updates to the cluster/juju Kubernetes provider, to bring it up to date.
Updating the skydns templates to version 11
Updating the etcd container definition to include arch.
Updating the master template to include arch and version for hyperkube container.
Adding dns_domain configuration options.
Adding storage layer options.
Automatic merge from submit-queue
Check existence of kubernetes dir for get-kube.sh
There are a lot of references to https://get.k8s.io/ over the internet.
Most of such references do not describe KUBERNETES_SKIP_DOWNLOAD env variable
and newbies can get into a situation described below:
- execute `wget -q -O - https://get.k8s.io | bash`
- receive a failure due to missing packages or some configs
- fix the issue
- try again `wget -q -O - https://get.k8s.io | bash`
In this case, get-kube.sh does not check whether the kubernetes directory already
exists and repeats the download.
Let's make get-kube.sh more user-friendly and check for the existence of the kubernetes dir.
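A minimal sketch of the check described above, assuming the release unpacks into a `kubernetes` directory under the current directory. The helper name `needs_download` is hypothetical and is not taken from get-kube.sh.

```shell
# Hypothetical helper: decide whether the release still has to be downloaded.
# $1 is the directory the release tarball would unpack into.
needs_download() {
  if [ -d "$1" ]; then
    echo "Directory '$1' already exists; skipping download." >&2
    return 1
  fi
  return 0
}

if needs_download "./kubernetes"; then
  echo "Would download and unpack the release here."
fi
```

With a check like this, re-running the installer after fixing a local problem would reuse the existing directory instead of fetching the tarball again.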
Fixing underscore problem and adding exceptions.
Fixing the underscore flag errors.
Automatic merge from submit-queue
Make GKE detect-instance-groups work on Mac.
Make the fix from #27803 also work on Mac.
The GNU `expr` command supports both the `expr match STRING REGEXP` and `expr STRING : REGEXP` command syntax.
The BSD `expr` command only has the `expr STRING : REGEXP` syntax.
@fabioy @a-robinson
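To illustrate the portable form, here is a small sketch using only the `expr STRING : REGEXP` syntax. The function name `extract_group` and the URL are made up for illustration and do not come from the GKE scripts.

```shell
# Print the part of a string captured by a basic regular expression.
# `expr STRING : REGEXP` is accepted by both GNU and BSD expr; the match is
# anchored at the start of STRING and prints the first \( \) group.
extract_group() {
  expr "$1" : "$2"
}

# Hypothetical instance-group URL, used only for illustration.
url="https://example.invalid/zones/us-central1-b/instanceGroupManagers/my-group"
extract_group "$url" '.*/instanceGroupManagers/\(.*\)'   # prints: my-group
```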
Automatic merge from submit-queue
Bump skydns godeps to latest
Update Godeps for github.com/skynetservices/skydns and miekg/dns.
Bump kubedns version to 1.6 with latest skynetservices/skydns code
Built kube-dns for all architectures and pushed containers to gcr.io.
Automatic merge from submit-queue
Enhance kubedns pod health checks to cover kubedns container
The existing health check hits port 53, the dnsmasq container, with the same domain name every time. Since dnsmasq looks up and caches results from the kubedns container, running on port 10053, the health check is not covering the kubedns container after the first query (and once every TTL expiration).
This PR enhances the health check to directly hit port 10053 (kubedns) in addition to port 53.
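The idea can be sketched as a probe that succeeds only when both resolvers answer. This is an illustration, not the probe command from the manifest: the `LOOKUP` hook and the `dns_healthy` name are hypothetical, and the default assumes an nslookup build that understands the `-port=` option.

```shell
# LOOKUP is a hypothetical hook so the lookup command can be swapped out.
LOOKUP="${LOOKUP:-nslookup}"

# Healthy only if BOTH resolvers answer for the given name.
dns_healthy() {
  name="$1"
  for port in 53 10053; do   # 53 = dnsmasq, 10053 = kubedns
    "$LOOKUP" "-port=$port" "$name" 127.0.0.1 >/dev/null 2>&1 || return 1
  done
  return 0
}
```

Because the second lookup bypasses the dnsmasq cache, a failure in the kubedns container would surface even while dnsmasq still serves a cached answer.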
Automatic merge from submit-queue
Substitute federation_domain_map parameter with its value in node bootstrap scripts.
This PR also removes the substitution code we added to the build scripts.
**Release Note**
```release-note
If you use one of the kube-dns replication controller manifests in `cluster/saltbase/salt/kube-dns`, i.e. `cluster/saltbase/salt/kube-dns/{skydns-rc.yaml.base,skydns-rc.yaml.in}`, either substitute `__PILLAR__FEDERATIONS__DOMAIN__MAP__` or `{{ pillar['federations_domain_map'] }}` with the corresponding federation-name-to-domain-name value, or remove them if you do not support cluster federation at this time. If you plan to substitute the parameter with its value, here is an example for `{{ pillar['federations_domain_map'] }}`:
pillar['federations_domain_map'] = "- --federations=myfederation=federation.test"
where `myfederation` is the name of the federation and `federation.test` is the domain name registered for the federation.
```
cc @erictune @kubernetes/sig-cluster-federation @MikeSpreitzer @luxas
Automatic merge from submit-queue
Remove duplicated nginx image. Use nginx-slim instead
This PR removes the image `gcr.io/google_containers/nginx:1.7.9` and uses `gcr.io/google_containers/nginx-slim:0.7`.
Besides removing the duplication, `1.7.9` is 16 months old.
Automatic merge from submit-queue
Making DHCP_OPTION_SET_ID creation optional
Reason: We have a pre-configured VPC in AWS. `kube-up.sh` should not make changes to the VPC DHCP options if DHCP options are already configured.
PR Changes: When `DHCP_OPTION_SET_ID` is set as an environment variable, kube-up.sh skips creating the DHCP option set.
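A rough sketch of that behaviour, under the assumption that creation is wrapped in a single step. Both function names are hypothetical and the AWS calls made by kube-up.sh are not reproduced here.

```shell
# Hypothetical stand-in for the step that creates a DHCP option set in AWS
# and prints its ID.
create_dhcp_option_set() {
  echo "dopt-created"
}

# Reuse DHCP_OPTION_SET_ID when the caller exported it; otherwise create one.
ensure_dhcp_option_set() {
  if [ -z "${DHCP_OPTION_SET_ID:-}" ]; then
    DHCP_OPTION_SET_ID="$(create_dhcp_option_set)"
  fi
  echo "$DHCP_OPTION_SET_ID"
}
```

Running with `DHCP_OPTION_SET_ID` exported to the ID of an existing option set would then leave the VPC's DHCP configuration untouched.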
Automatic merge from submit-queue
Enable setting up Kubernetes cluster in Ubuntu on Azure
Implement basic cloud provider functionality to deploy Kubernetes on
Azure. SaltStack is used to deploy Kubernetes on top of Ubuntu
virtual machines. OpenVPN provides network connectivity. For
kubelet authentication, we use basic authentication (username and
password). The scripts use the legacy Azure Service Management APIs.
We have set up a nightly test job in our Jenkins server for federated
testing to run the e2e test suite on Azure. With the cloud provider
scripts in this commit, 14 e2e test cases pass in this environment.
We plan to implement additional Azure functionality to support more
test cases.
Automatic merge from submit-queue
[fix] allow ALLOW_PRIVILEGED to be passed to kubelet and kube-api
This is something that we need for running docker in docker. Please let me know if you would consider this change. Thanks :)
Automatic merge from submit-queue
Add Calico as policy provider in GCE
Adds Calico as policy provider to GCE, enforcing the extensions/v1beta1 NetworkPolicy API.
Still to do:
- [x] Enable NetworkPolicy API when POLICY_PROVIDER is provided.
- [x] Fix CNI plugin, policy controller versions.
CC @thockin - does this general approach look good?
Automatic merge from submit-queue
Tracked addition of federation, sed support in kube DNS
The kube DNS app recently gained support for federation (whatever that
is), including a new Salt parameter. This broke the deployAddons.sh script for cluster ubuntu. The DNS app also gained alternate
templates, intended to be friendly to `sed`. Fortunately, those do
not demand a federation parameter.
This PR fixes up the `cluster/ubuntu/deployAddons.sh` script to track those changes, by switching to the `sed`-friendly templates.
Automatic merge from submit-queue
mount instanceid file from config drive when using openstack cloud provider
Fixes https://github.com/kubernetes/kubernetes/issues/23191. The instanceid file is read, but we do not mount it as a volume, which causes the cloud provider to contact the metadata server. In some cases the metadata server is unable to serve the request, and the cloud provider then fails to initialize. We should avoid that.
Automatic merge from submit-queue
cluster/aws: Add option for kubeconfig context
Added KUBE_CONFIG_CONTEXT environment variable to customize the kubeconfig context created at the end of the aws kube-up script.
Fixes #24877
This PR does barely anything and shouldn't require e2e tests. It's just a minor convenience.
The kube DNS app recently gained support for federation (whatever that
is), including a new Salt parameter. It also gained alternate
templates, intended to be friendly to `sed`. Fortunately, those do
not demand a federation parameter.
Automatic merge from submit-queue
Revert "Federation e2e supports aws"
Reverting https://github.com/kubernetes/kubernetes/pull/27791 to get our Jenkins tests green again.
cc @kubernetes/sig-cluster-federation
Automatic merge from submit-queue
federation: Updating KubeDNS to try finding a local service first for federation query
Ref https://github.com/kubernetes/kubernetes/issues/26762
Updating KubeDNS to try to find a local service first for federation query.
Without this change, KubeDNS always returns the DNS hostname, even if a local service exists.
I have updated the code to first remove the federation name from the path if it exists, so that the default search for a local service happens. If we don't find a local service, we then try to find the DNS hostname.
I would appreciate a strong review since this is my first change to KubeDNS.
https://github.com/kubernetes/kubernetes/pull/25727 was the original PR that added federation support to KubeDNS.
cc @kubernetes/sig-cluster-federation @quinton-hoole @madhusudancs @bprashanth @mml
Automatic merge from submit-queue
Use new fluentd-gcp container with journal support
This makes use of the systemd-journal support added in PR #27981
and fixes #27446.
cc/ @a-robinson @andyzheng0831
Automatic merge from submit-queue
Support journal logs in fluentd-gcp on GCI
This maintains a single common image for each rather than having to fork out separate images, relying on different commands in yaml manifests to differentiate in the behavior. This is treading on top of @adityakali's #27906, but I wasn't able to get in touch with him this afternoon until very recently. He's handling making sure that the new yaml manifests are used when running on GCI.
```release-note
```
Only run the systemd-journal plugin when on a platform that requests it.
The plugin crashes the fluentd process if the journal isn't present, so
it can't just be run blindly in all configurations.
Following from #27830, this copies the source onto the instance and
displays the location of it prominently (keeping the download link for
anyone that just wants to curl it).
Example output (this tag doesn't exist yet):
---
Welcome to Kubernetes v1.4.0!
You can find documentation for Kubernetes at:
http://docs.kubernetes.io/
The source for this release can be found at:
/usr/local/share/doc/kubernetes/kubernetes-src.tar.gz
Or you can download it at:
https://storage.googleapis.com/kubernetes-release/release/v1.4.0/kubernetes-src.tar.gz
It is based on the Kubernetes source at:
https://github.com/kubernetes/kubernetes/tree/v1.4.0
For Kubernetes copyright and licensing information, see:
/usr/local/share/doc/kubernetes/LICENSES
---
Automatic merge from submit-queue
Pushing a new KubeDNS image and updating the YAML files
Updating KubeDNS image to include https://github.com/kubernetes/kubernetes/pull/27845
@kubernetes/sig-cluster-federation @girishkalele @mml
Automatic merge from submit-queue
Revert kube-proxy as a DaemonSet in hyperkube for the v1.3 release
It was a bit sad, but I was a bit too fast with the kube-proxy DaemonSet thing, so we have to target v1.4 for that one. Reverting to a static-pod
This one is for v1.3
@mikedanese @cheld @zreigz
Automatic merge from submit-queue
increase addon check interval
Do static pods have a crash loop back off? If so, this test would be much faster if we restarted the kubelet to clear that.
Fixes #26770
Automatic merge from submit-queue
AWS kube-up: Authorize route53 in the IAM policy
Federation needs this now (on the nodes), and I suspect ingress
controllers will shortly want this also. Given we're going to authorize
it on the nodes, we should authorize it on the master also (the master
is much more trusted).
Fix #27467
Automatic merge from submit-queue
Allow conformance tests to run on non-GCE providers
fixes https://github.com/kubernetes/kubernetes/issues/26869
Creates a skeleton provider that has all the required function stubs, and allows a previously set "skeleton" KUBERNETES_PROVIDER to not be overridden with "gce".
Automatic merge from submit-queue
AWS kube-up: move to Docker 1.11.2
This is to mirror GCE
Also we remove support for vivid as Docker no longer packages for it, and remove some of the unreachable distro code in aws kube-up.
Also bump the AMI to a 1.3 version (with preinstalled Docker 1.11.2)
Fixes https://github.com/kubernetes/kubernetes/issues/27654
Automatic merge from submit-queue
GCE provider: Limit Filter calls to regexps rather than insane blobs
Filters can't exceed 4k, and GET requests against the GCE API are also limited, so these break down in different ways at different cluster counts. Fix it by introducing an advisory `node-instance-prefix` configuration in the GCE provider that can hint the `EnsureLoadBalancer`/`UpdateLoadBalancer` code (and the firewall creation/update code). If it's not there, or wrong (a hostname that's registered violates it), just ignore it and grab the whole project.
Fixes #27731
Filters can't exceed 4k, and GET requests against the GCE API are also
limited, so these break down in different ways at different cluster
counts. Fix it by introducing an advisory node-instance-prefix
configuration in the GCE provider that can hint the
EnsureLoadBalancer/UpdateLoadBalancer code (and the firewall
creation/update code). If it's not there, or wrong (a hostname that's
registered violates it), just ignore it and grab the whole project.
Automatic merge from submit-queue
federation: Creating kubeconfig files to be used for creating secrets for clusters on aws and gke
Extension of https://github.com/kubernetes/kubernetes/pull/26914 which created the kubeconfig files for gce clusters.
This PR extends it to AWS, vagrant and GKE.
The change for AWS and vagrant is exactly the same as for GCE.
For GKE, since `gcloud create clusters` creates kubeconfig, we are just copying the generated kubeconfig to the desired location
cc @kubernetes/sig-cluster-federation @colhom
@roberthbailey for GKE
Automatic merge from submit-queue
rkt: Map kubelet's `--stage1-image` flag to rkt's `--stage1-name` flag.
This enables rkt to use cached stage1 image instead of unpacking the stage1 image every time for every pod.
After this change, users need to preload the stage1 images in order to enable rkt to find the stage1 image with the name specified by this flag.
Also, the cloud config is modified to pre-load the stage1 images.
cc @kubernetes/sig-rktnetes @kubernetes/sig-node
Automatic merge from submit-queue
add logrotate service and configuration for GCI
This change mirrors the configuration in cluster/saltbase/salt/logrotate for GCI.
On GCI we use systemd timers (https://www.freedesktop.org/software/systemd/man/systemd.timer.html) and install an hourly timer - kube-logrotate.timer. This will invoke kube-logrotate.service (which calls /usr/sbin/logrotate) once every hour to perform log rotation as per the rotation rules installed under /etc/logrotate.d/.
@kubernetes/goog-image @zmerlynn @dchen1107 @andyzheng0831
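For readers unfamiliar with systemd timers, the pairing could look roughly like the following. This is a sketch, not the unit definitions shipped for GCI: the unit names come from the description above, while the unit contents and the `write_logrotate_units` helper are assumptions.

```shell
# Hypothetical helper: write the service/timer pair into the directory $1.
write_logrotate_units() {
  dir="$1"
  cat > "$dir/kube-logrotate.service" <<'EOF'
[Unit]
Description=Kubernetes log rotation

[Service]
Type=oneshot
ExecStart=/usr/sbin/logrotate /etc/logrotate.conf
EOF
  cat > "$dir/kube-logrotate.timer" <<'EOF'
[Unit]
Description=Hourly Kubernetes log rotation

[Timer]
OnCalendar=hourly

[Install]
WantedBy=timers.target
EOF
}
```

A timer activates the service unit with the same base name by default, so starting `kube-logrotate.timer` (for example with `systemctl start kube-logrotate.timer`) is enough to have the service run once an hour.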
This enables rkt to use cached stage1 image instead of unpacking the
stage1 image every time for every pod.
After this change, users need to preload the stage1 images in order to
enable rkt to find the stage1 image with the name specified by this flag.
Automatic merge from submit-queue
make GCI image detection robust
This change makes sure that in case we roll back a released GCI image, the image detection logic picks a correct active image.
@kubernetes/goog-image @Amey-D @wonderfly @dchen1107
Automatic merge from submit-queue
Update to dnsmasq:1.3 and make hyperkube always use the latest addons
This bumps dnsmasq to a version that works on all architectures: https://github.com/kubernetes/contrib/pull/1192 (which has to be pushed first)
Also I removed the manifests in hyperkube addons in favor of machine-generated ones, which will avoid mistakes.
This one is required for `v1.3`, so it has to be cherrypicked I think...
It makes docker and docker-multinode addons work again...
(Yes, we'll probably get rid of docker in favor of minikube, but we'll have to have it in this release at least)
@girishkalele @thockin @ArtfulCoder @david-mcmahon @bgrant0607 @mikedanese
This works around a linux kernel bug with overly aggressive caching of
ARP entries, which was causing problems when we reused IP addresses in
VPCs, for example with an ASG in a relatively small subnet.
See #23395 for more explanation.
Fixes #23395
Vivid is EOL, and Docker is no longer packaged for it.
Remove support for it in 1.3 (in 1.2 we had warned users it was EOL).
Also remove unused wheezy, trusty & coreos & do general cleanup.
Automatic merge from submit-queue
Prep for continuous Docker validation test
```release-note
Add a test config variable to specify desired Docker version to run on GCI.
```
We want to continuously validate Docker releases (#25215), on GCI. This change
adds a new test config variable, `KUBE_GCI_DOCKER_VERSION`, through which we can
specify which version of Docker we want to run on the master and nodes. This
change also patches the Jenkins e2e-runner with the ability to fetch the latest
Docker (pre)release, and sets the aforementioned variable accordingly.
Tested on my local Jenkins instance that was able to start a cluster with the latest Docker version (different from installed version) running on both master and nodes.
@dchen1107 Can you review?
cc/ @andyzheng0831 for changes in `cluster/gce/gci/helper.sh`, and @ixdy @spxtr for changes to the Jenkins e2e-runner
cc/ @kubernetes/goog-image
This first reverts commit 8e8437dad8.
Also resolves conflicts with docs on f334fc41
And resolves conflicts with https://github.com/kubernetes/kubernetes/pull/22231/commits
to let people switch between two different setup methods by
setting env variables.
Conflicts:
cluster/get-kube.sh
cluster/saltbase/salt/README.md
cluster/saltbase/salt/kube-proxy/default
cluster/saltbase/salt/top.sls
Automatic merge from submit-queue
Revert "Revert "GCI: add support for network plugin""
PR #27027 added the network plugin support in GCI config, but later a bug in the network plugin broke e2e tests (see issue #27118). The bug was fixed by #27141, and we have run the serial e2e tests more than 10 times to verify the fix. Now it should be safe to put the GCI network plugin support back.
We will first merge in the master branch and monitor the Jenkins serial tests for a while and then cherry-pick it into release-1.3 branch.
Automatic merge from submit-queue
kube-up multizone: don't print scary warning
The node-count check gets confused when there are more nodes than we
launched, which is normal with KUBE_USE_EXISTING_MASTER.
This fix just suppresses the error message in that case.
Fix #23390
Automatic merge from submit-queue
Fixes and improvements to Photon Controller backend for kube-up
- Improve reliability of network address detection by using MAC
address. VMware has a MAC OUI that reliably distinguishes the VM's
NICs from the other NICs (like the CBR). This doesn't rely on the
unreliable reporting of the portgroup.
- Persist route changes. We configure routes on the master and nodes,
but previously we didn't persist them so they didn't last across
reboots. This persists them in /etc/network/interfaces
- Fix regression that didn't configure auth for kube-apiserver with
Photon Controller.
- Reliably run apt-get update: Not doing this can cause apt to fail.
- Remove unused nginx config in salt
Automatic merge from submit-queue
version bump for gci to milestone 53
Fixes #26455
GCI release 53 includes kubernetes v1.3.0-alpha.5 with docker-1.11.2.
@dchen1107 @kubernetes/goog-image @andyzheng0831
Automatic merge from submit-queue
support for mounting local-ssds on GCI
This change adds support for mounting local ssds on GCI.
It updates the previous container-vm behavior as well to
match that for GCI nodes by mounting the local-ssds under
the same path (/mnt/disks/ssdN).
@vulpecula @roberthbailey @andyzheng0831 @kubernetes/goog-image
Automatic merge from submit-queue
Trusty: fix the 'ping' issue and fluentd-gcp issue #26379
This PR mainly picks up the fixes in #27016 and #27102 for the trusty code, so that we can fix the issues in the release-1.2 branch for GCI. It contains two parts:
(1) Adding iptables rules to accept ICMP traffic, otherwise 'ping' from a pod does not work;
(2) Revising the code for cleaning up docker0 stuff, including the bridge and iptables rules. I slightly refactored the code so that removing docker0 stuff happens before starting kubelet. The old code did it after starting kubelet but before restarting docker. I think doing it before starting kubelet is safer.
cc/ @roberthbailey @fabioy @dchen1107 @a-robinson @kubernetes/goog-image
Automatic merge from submit-queue
cluster/gce/coreos: Update heapster apiVersion
This fixes an inadvertent search-replace error in #26617.
The error was missed then because the search-replace issue wasn't
present in the standalone controllers, but was in all the others.
I verified that with this change heapster comes up under the default influxdb monitoring and without this change addon manager spits out validation failure errors for the heapster yaml.
cc @yifan-gu
Automatic merge from submit-queue
GCI: add support for network plugin
I had run e2e against a cluster with both master and nodes on GCI a couple of times. The PR auto tests will cover the hybrid cluster with just master on GCI.
cc/ @roberthbailey @fabioy @kubernetes/goog-image
Automatic merge from submit-queue
Exit image puller subshell
Exit the subshell with 0 so even if the last docker pull fails the pod doesn't end up in the error state.
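The effect of the explicit exit can be sketched as follows. The `PULL` hook and the `pull_all` name are hypothetical, and the loop is an illustration rather than the command used in the image puller manifest.

```shell
# PULL is a hypothetical hook for the pull command (assumed default: docker pull).
PULL="${PULL:-docker pull}"

pull_all() {
  (
    for image in "$@"; do
      $PULL "$image"
    done
    # Without this line the subshell would return the status of the last
    # pull, so a single failed pull at the end would mark the run as failed.
    exit 0
  )
}
```

A call such as `pull_all image-a image-b` then always reports success to its caller, regardless of how the individual pulls went.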
Automatic merge from submit-queue
GCI: fix the issue #26379
This PR deletes docker0 explicitly to fix the issue. In some cases, the coexistence of docker0 and cbr0 causes trouble in GCI-based cluster instances.
I verified it in GKE. With the fix, fluentd-gcp pod shows no error. "curl google.com" can work inside a pod. Mark it as P0 to match the issue priority.
@a-robinson @roberthbailey @freehan @kubernetes/goog-image
Automatic merge from submit-queue
Enable support for memory eviction configuration via salt
Added evictions based on memory by default whenever the available memory is < 100Mi.
Updated GCE and GCI.
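As an illustration, the threshold maps onto the kubelet's `--eviction-hard` flag. The sketch below assumes the value travels in a variable named `EVICTION_HARD`; that name and the `eviction_flag` helper are assumptions, not a quote from the salt or GCI configuration.

```shell
# Build the kubelet flag, defaulting to eviction below 100Mi of available memory.
eviction_flag() {
  echo "--eviction-hard=${EVICTION_HARD:-memory.available<100Mi}"
}

eviction_flag
```

Exporting a different `EVICTION_HARD` value before calling the helper would override the default threshold.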
Automatic merge from submit-queue
Bump cluster autoscaler version and enable scale down by default
Follow up of https://github.com/kubernetes/contrib/pull/1148.
cc: @piosz @fgrzadkowski @jszczepkowski
Automatic merge from submit-queue
Re-enable node problem detector by default
Re-enable node problem detector started in gce cluster by default.
For now, in the master node, the node problem detector will be started and do nothing (see https://github.com/kubernetes/node-problem-detector/pull/13).
But in fact, in my test cluster, the master has no extra cpu to run the node problem detector, so node problem detector is started on all nodes except master, which is what we want but not expected...
@dchen1107
/cc @kubernetes/sig-node
/cc @andyzheng0831 for the gci script change.
Automatic merge from submit-queue
Don't run fluentd-es on GCI masters
It isn't run on containervm masters. It can't do anything on the master because the master doesn't have kube-proxy running to enable fluentd to talk to the elasticsearch service.
@andyzheng0831
Automatic merge from submit-queue
Add collection of the new glbc and cluster-autoscaler logs
I've incremented the version numbers by 2 to avoid conflicting with #26652. I'll make sure the potential conflict between the images gets resolved reasonably.
cc @piosz @bprashanth @aledbf
Automatic merge from submit-queue
GCI/Trusty: support the Docker registry mirror
@roberthbailey @zmerlynn please review it.
cc/ @fabioy @dchen1107 @kubernetes/goog-image FYI.
cc/ @ojarjur it is very straightforward to add support for GCI, which is pretty much like the change to ContainerVM's configure-vm.sh in your original PR #25841.
Automatic merge from submit-queue
GCI: correct the fix in #26363
This PR mainly corrects the fix to the 'find' command in #26363. I added "-maxdepth 1" in an earlier change, and #26363 tried to fix it by changing the search path. This is potentially incorrect when yaml files are more than one level deep. The real fix is to remove the "-maxdepth 1" flag from the 'find' command. This PR also updates two minor places in the file configure-helper.sh introduced by two previous PRs, #26413 and #26048.
@roberthbailey @wonderfly
cc/ @dchen1107 @fabioy @kubernetes/goog-image
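The difference is easy to reproduce on a throwaway directory tree. The paths below are invented for the demonstration and do not correspond to the directories handled by configure-helper.sh.

```shell
# Build a temporary tree with one manifest at the top level and one nested.
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/addons/dns"
touch "$demo_dir/addons/top.yaml" "$demo_dir/addons/dns/nested.yaml"

# With -maxdepth 1 only the top-level file is found.
shallow="$(find "$demo_dir/addons" -maxdepth 1 -name '*.yaml' | wc -l | tr -d ' ')"
# Without it, find descends into every subdirectory.
deep="$(find "$demo_dir/addons" -name '*.yaml' | wc -l | tr -d ' ')"

echo "with -maxdepth 1: $shallow file(s); without: $deep file(s)"
```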
Automatic merge from submit-queue
Increase failure threshold for glbc liveness probe
This pod fails a liveness probe on occasion, probably because the failure thresholds are too strict. Simple enough that either reviewer can review.
Automatic merge from submit-queue
pin GCI version to milestone 52
This is mainly for pinning the 1.2 branch to GCI milestone 52
which contains correct docker and kubelet built in.
Doing this allows us to upgrade docker to v1.11 (issue #26455)
in GCI 53 without breaking the 1.2 release branch.
@kubernetes/goog-image @dchen1107 @roberthbailey @andyzheng0831
Automatic merge from submit-queue
Rebuild elasticsearch image to include changes since 1.2
Fixes #25360. I've pushed the image to GCR.
@jimmidyson @keontang @vishh
Automatic merge from submit-queue
Move the defaults setting of GCI to util.sh
fixes #26291
This change recovers some of the side effects of
https://github.com/kubernetes/kubernetes/pull/26197, i.e., keeps the defaults of
`NODE_IMAGE` and `NODE_IMAGE_PROJECT` to `MASTER_IMAGE` and
`MASTER_IMAGE_PROJECT`, for backward compatibility. Although it keeps
`OS_DISTRIBUTION` defaulting to `gci`, the default settings of these vars are
moved to `cluster/gce/util.sh` and conditioned on `OS_DISTRIBUTION==gci`.
@euank @roberthbailey Can you review?
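The defaulting described above can be sketched with shell parameter expansion. The variable names come from the description; the `set_gci_defaults` wrapper is hypothetical and the actual code in `cluster/gce/util.sh` may be organized differently.

```shell
# Hypothetical wrapper around the defaulting logic.
set_gci_defaults() {
  OS_DISTRIBUTION="${OS_DISTRIBUTION:-gci}"
  if [ "$OS_DISTRIBUTION" = "gci" ]; then
    # Nodes fall back to the master image unless the caller overrides them.
    NODE_IMAGE="${NODE_IMAGE:-${MASTER_IMAGE:-}}"
    NODE_IMAGE_PROJECT="${NODE_IMAGE_PROJECT:-${MASTER_IMAGE_PROJECT:-}}"
  fi
}
```

Callers that already export `NODE_IMAGE` or `NODE_IMAGE_PROJECT` keep their values; only unset variables inherit the master settings.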
Automatic merge from submit-queue
cluster/coreos: Update heapster addon to beta2
fixes #26616
As noted there, heapster was updated but not for gce/coreos which breaks anything that depends on heapster's new metrics API (i.e. autoscaling)
Automatic merge from submit-queue
GCI: cherry-pick the fix in PR #25670
This PR simply copies the fix in #25670 into the GCI support.
cc/ @kubernetes/goog-image @dchen1107 @roberthbailey
Automatic merge from submit-queue
Switch DNS addons from skydns to kubedns
Change GCI and trusty cluster-helper scripts to use kubedns instead of skydns.
Unified skydns templates using a simple underscore based template and
added transform sed scripts to transform into salt and sed yaml
templates
Moved all content out of cluster/addons/dns into build/kube-dns and
saltbase/salt/kube-dns
Automatic merge from submit-queue
Support for cluster autoscaler in GCE Trusty and GCI images
Fixes: #26346
Ref: #26197
cc: @fgrzadkowski @vulpecula @piosz @jszczepkowski
Automatic merge from submit-queue
Prepull images in e2e
Quick and dirty image puller because the SQ stalled multiple times just *today* on image pull flake (https://github.com/kubernetes/kubernetes/issues/25277).
@kubernetes/sig-node @kubernetes/sig-testing wdyt?
Automatic merge from submit-queue
Make node-instance-group base names unique to prevent collisions
We create multiple IGMs for >1000 Node clusters. When base names conflict, IGMs fight over ownership of any VM whose name could belong to more than one IGM.
This change will increase reliability of starting big clusters.
cc @wojtek-t @alex-mohr @roberthbailey @mikedanese
Automatic merge from submit-queue
Add node problem detector as an addon pod.
```release-note
Introduce a new add-on pod NodeProblemDetector.
NodeProblemDetector is a DaemonSet running on each node, monitoring node health and reporting
node problems as NodeCondition and Event. Currently it already supports kernel log monitoring, and
will support more problem detection in the future. It is enabled by default on gce now.
```
This PR enables NodeProblemDetector as an add-on pod.
/cc @mikedanese @kubernetes/sig-node
Automatic merge from submit-queue
Configuration for GCP webhook authentication and authorization
This PR adds configuration for GCP webhook authentication and authorization in ContainerVM and GCI. The change of configure-vm.sh and kube-apiserver.manifest is directly copied from @cjcullen's PR #25380 and #25296. The change in GCI script configure-helper.sh includes the support for webhook authentication and authorization, and also some code refactor to improve readability.
@cjcullen @roberthbailey @zmerlynn please review it. The original PRs are P1, please mark this as P1.
cc/ @fabioy @kubernetes/goog-image FYI.
I verified it by running e2e tests on a GCI cluster. Without the GCI-side change, cluster creation fails, as captured by the GKE Jenkins tests. I did not test the case where the two env variables GCP_AUTHN_URL and GCP_AUTHZ_URL are set, because they are only set in GKE. After this PR is merged, @cjcullen will test in GKE.
Automatic merge from submit-queue
cluster/gce/coreos: Set service-cluster-ip-range
Broken by #19242
See also #26002
This is necessary to kube-up for me, but depending on how #26002 plays out, this PR might not be necessary. Happy to close this or merge or whatever depending on what's best.
cc @yifan-gu @sjpotter @mikedanese
Automatic merge from submit-queue
Updating CentOS image, adding heat back to the required cli tools.
Updated the CentOS cloudimage to the latest available, and also added heat to the required list of cli tools. This is an interim step to replacing all the commands with openstackclient.
Automatic merge from submit-queue
add CIDR allocator for NodeController
This PR:
* use pkg/controller/framework to watch nodes and reduce list calls when allocating a CIDR for a node
* decouple the cidr allocation logic from monitoring status logic
Automatic merge from submit-queue
GCI: Fix the condition for using the default image
This PR revises the condition for using the default GCI image. The old logic is inconvenient for manually running e2e tests in some cases (mainly for the GCI team to test custom images). The new logic in this PR is very similar to the logic used for ContainerVM. When the distro is set to "gci", if the master or node image is unset, we use gci-dev for it. If either is set, we respect it.
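The fallback described above can be sketched as follows. This is only an illustration, not the code from this PR: `DEFAULT_GCI_IMAGE`, its placeholder value, and the function name `resolve_gci_images` are assumptions, and the variable names `OS_DISTRIBUTION`, `MASTER_IMAGE` and `NODE_IMAGE` are borrowed from the related GCE scripts.

```shell
# Hypothetical placeholder; the real default image name is not given here.
DEFAULT_GCI_IMAGE="gci-dev-placeholder"

# Fill in the default image only for the roles that were left unset.
resolve_gci_images() {
  if [ "${OS_DISTRIBUTION-}" = "gci" ]; then
    # ${VAR:-default} keeps an explicitly set image and falls back otherwise.
    MASTER_IMAGE="${MASTER_IMAGE:-${DEFAULT_GCI_IMAGE}}"
    NODE_IMAGE="${NODE_IMAGE:-${DEFAULT_GCI_IMAGE}}"
  fi
}
```

With this shape, setting only the node image to a custom build leaves the master on the default, which is the case the GCI team needs for testing custom images.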
@roberthbailey @zmerlynn @dchen1107 please review it, and we should cherry pick it in release-1.2 branch. Thanks!
cc/ @kubernetes/goog-image @adityakali FYI
Automatic merge from submit-queue
Fix hyperkube's layer caching, and remove --make-symlinks at build time
@david-mcmahon This is required before you release. Explanation in the code.
Automatic merge from submit-queue
AWS: More support for ap-northeast-2 region
Issue #24446
The new AWS region for Seoul, Korea (ap-northeast-2)
was launched in January 2016
https://aws.amazon.com/blogs/aws/now-open-aws-asia-pacific-seoul-region/
But it requires a few changes.
To test:
```
export KUBERNETES_PROVIDER=aws
export KUBE_AWS_ZONE=ap-northeast-2a
export MASTER_SIZE=t2.medium
export NODE_SIZE=t2.medium
export NUM_NODES=4
cluster/kube-up.sh
```
I assigned the AMIs by checking the specific version used from `ap-northeast-1`,
and finding the same image with the same datestamp.
Automatic merge from submit-queue
GCI/Trusty: Fix an issue in using 'find' commands
This PR makes the logic of 'find' command consistent with the 'cp' command afterwards, i.e., only check one layer of a given dir. Without this fix, we have seen a recent breakage after PR #25309 added the file cluster/addons/fluentd-elasticsearch/es-image/template-k8s-logstash.json. The 'find' command discovers this json file, but the 'cp' command fails.
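A minimal sketch of the idea, assuming the addon files are matched by extension. The function name and the file patterns are illustrative and not taken from the actual scripts.

```shell
# List only the manifests that sit directly in the given directory, so the
# result matches what a non-recursive `cp dir/*.json` would be able to copy.
list_top_level_manifests() {
  # -maxdepth 1 stops find from descending into subdirectories.
  find "$1" -maxdepth 1 -type f \( -name '*.yaml' -o -name '*.json' \)
}
```

Without `-maxdepth 1`, a file such as `es-image/template-k8s-logstash.json` in a subdirectory is discovered by `find` but is not where the later `cp` expects it.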
@roberthbailey @dchen1107 @zmerlynn please review this fix, and mark it as a cherry pick candidate. I already verified this fix can resolve the breakage.
cc/ @wonderfly @fabioy @kubernetes/goog-image FYI
Automatic merge from submit-queue
cluster/juju: Updated the url for the getting started doc
Minor change to update the URL pointing at the "Running Kubernetes locally via Docker" document
Automatic merge from submit-queue
GCI: Enable the log of upstart jobs
This PR enables logging of upstart jobs in master.yaml and node.yaml. By default, logs of upstart jobs are enabled in Trusty and placed in /var/log/upstart, but not enabled in GCI. This change explicitly directs the logs to the system logger. For Trusty, they are in the /var/log/syslog file. In GCI, we can check them using "journalctl". This change will be useful for debugging if cluster initialization fails.
@roberthbailey @maisem @dchen1107 please review it. This will be useful for issues like #23634. We should also cherry pick it in release-1.2
cc/ @fabioy @zmerlynn @wonderfly FYI.
Automatic merge from submit-queue
Automatically download the latest stable release version of Kubernetes.
The ubuntu version of download-release.sh included in the binary release downloads the released .tar.gz file again. Right now the version of the downloaded file is manually encoded within the script. This change fetches the released version automatically, similar to the shell script available on the main Kubernetes site below:
https://get.k8s.io/
Ideally the installation on bare metal ubuntu should work with the files available in the already downloaded package.
@mikedanese
Automatic merge from submit-queue
add index template for es aggregations
This index template helps us do es aggregations on namespace_name, pod_name and container_name. After doing aggs, we get the whole name rather than all the split pieces.
fix #25127
Automatic merge from submit-queue
Salt configuration for the new Cluster Autoscaler for GCE
Adds support for cloud autoscaler from contrib/cloud-autoscaler in kube-up.sh GCE script.
cc: @fgrzadkowski @piosz
Automatic merge from submit-queue
Use --format='value(name)' with gcloud instead of grep/awk/cut
Fixing our fragile parsing of `gcloud` is getting old (#24746, #25159, maybe others?).
Instead, let's just get the proper output out of `gcloud` in the first place.
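As a rough illustration of the approach, the wrapper below asks `gcloud` for bare names instead of parsing its table output. The `GCLOUD` override variable and the function name are assumptions added here so the function can be exercised without a real `gcloud` install.

```shell
# Print one instance name per line, with no header row or column padding.
# GCLOUD is a hypothetical override used only for testing with a stub.
list_instance_names() {
  "${GCLOUD:-gcloud}" compute instances list --format='value(name)'
}
```

Because `value(name)` emits only the requested field, the caller no longer depends on column order or spacing, which is what made the grep/awk/cut pipelines fragile.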
Automatic merge from submit-queue
PSP admission
```release-note
Update PodSecurityPolicy types and add admission controller that could enforce them
```
Still working on removing the non-relevant parts of the tests but I wanted to get this open to start soliciting feedback.
- [x] bring PSP up to date with any new features we've added to SCC for discussion
- [x] create admission controller that is a pared down version of SCC (no ns based strategies, no user/groups/service account permissioning)
- [x] fix tests
@liggitt @pmorie - this is the simple implementation requested that assumes all PSPs should be checked for each requests. It is a slimmed down version of our SCC admission controller
@erictune @smarterclayton
Automatic merge from submit-queue
Remove myself from a bunch of OWNERS files
For the time being I am too overloaded to do non scheduler/admission related reviews that aren't explicitly assigned to me.
cc/ @brendandburns
Automatic merge from submit-queue
Change default clusterCIDRs from /16 to /14 in GCE configs allowing 1000 Node clusters by default.
cc @thockin @roberthbailey @wojtek-t @zmerlynn @davidopp
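The arithmetic behind the /16 to /14 change can be checked quickly. This sketch assumes each node is handed a /24 pod range, which is not stated in the message above; the function name is illustrative.

```shell
# Number of per-node subnets that fit into a cluster CIDR.
# Assumption: every node gets a /24 unless a second argument says otherwise.
node_subnets() {
  local cluster_prefix="$1"
  local node_prefix="${2:-24}"
  echo $(( 1 << (node_prefix - cluster_prefix) ))
}

node_subnets 16   # prints 256
node_subnets 14   # prints 1024
```

Under that assumption a /16 tops out at 256 nodes, while a /14 leaves room for 1024.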
There are a lot of references to https://get.k8s.io/ over the internet.
Most of such references do not describe KUBERNETES_SKIP_DOWNLOAD env variable
and newbies can get into a situation described below:
- execute `wget -q -O - https://get.k8s.io | bash`
- receive a failure due to missing packages or some configs
- fix the issue
- try again `wget -q -O - https://get.k8s.io | bash`
In this case, get-kube.sh will not check whether the kubernetes directory already
exists and will repeat the download.
Let's make get-kube.sh more user-friendly and check for the existence of the kubernetes dir.
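One possible shape for such a check is sketched below. It is not the code from get-kube.sh: the function name and the default directory argument are assumptions, and only `KUBERNETES_SKIP_DOWNLOAD` comes from the description above.

```shell
# Return success (0) when a download is still needed, failure (1) otherwise.
should_download() {
  local dir="${1:-kubernetes}"
  # An explicit opt-out always wins.
  if [ -n "${KUBERNETES_SKIP_DOWNLOAD-}" ]; then
    return 1
  fi
  # A directory left over from an earlier attempt means we can reuse it.
  if [ -d "${dir}" ]; then
    return 1
  fi
  return 0
}
```

A caller could then write `if should_download; then ...download and unpack...; fi`, so rerunning the one-liner after fixing a local problem does not fetch the tarball again.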
Automatic merge from submit-queue
Openstack provider
Our pull request delivers a solution to create a Kubernetes cluster on top of OpenStack. The Heat OpenStack orchestration engine describes the infrastructure for the Kubernetes cluster. CentOS images are used for the Kubernetes host machines.
We tested our solution with DevStack and the Citycloud provider.
We believe that our solution will fill a gap in the market.
Apparently our cluster start time increased, to the point where users
are reporting spurious timeouts (#23623) and users are reporting that
increasing the timeout fixes the issue (thanks @paralin for the
suggestion and @jlfields for confirming).
Fix#23623
Automatic merge from submit-queue
GCI/Trusty: Fix the running of kube-addon-manager
This PR fixes the issue that kube-addon-manager (added in #23600) is not started. Without this fix, no kube-system pods can run correctly. As a result, the GCI-based Jenkins testing k8s head has been down for a couple of days. The root cause is that we stopped using namespace.yaml, but configure-helper.sh still tries to copy it. This PR also gets rid of /var/cache/kubernetes-install/kube_env.yaml, as it is not needed anymore after #24108.
@mikedanese @roberthbailey @dchen1107 please review it. If possible please mark it as P1, as it blocks GCI-based Jenkins tests.
cc/ @kubernetes/goog-image @fabioy FYI
Automatic merge from submit-queue
Added vsphere support for vagrant
The native vsphere support (using the govc library) requires admin permissions on ESX/vCenter, and not everyone has such permissions. So I'm adding vsphere support through vagrant, using the vagrant-vsphere plugin.
Automatic merge from submit-queue
Move godeps to vendor/
This is a first-step towards glide support, maybe we don't want or need to take this, but it was easy to try.
This fails to compile, not sure why:
```
# k8s.io/kubernetes/pkg/apis/extensions/v1beta1
_output/local/go/src/k8s.io/kubernetes/pkg/apis/extensions/v1beta1/conversion_generated.go:2703: undefined: extensions.ClusterAutoscaler
_output/local/go/src/k8s.io/kubernetes/pkg/apis/extensions/v1beta1/conversion_generated.go:2703: undefined: ClusterAutoscaler
_output/local/go/src/k8s.io/kubernetes/pkg/apis/extensions/v1beta1/conversion_generated.go:2719: undefined: extensions.ClusterAutoscaler
_output/local/go/src/k8s.io/kubernetes/pkg/apis/extensions/v1beta1/conversion_generated.go:2719: undefined: ClusterAutoscaler
_output/local/go/src/k8s.io/kubernetes/pkg/apis/extensions/v1beta1/conversion_generated.go:2723: undefined: extensions.ClusterAutoscalerList
_output/local/go/src/k8s.io/kubernetes/pkg/apis/extensions/v1beta1/conversion_generated.go:2723: undefined: ClusterAutoscalerList
_output/local/go/src/k8s.io/kubernetes/pkg/apis/extensions/v1beta1/conversion_generated.go:3468: Convert_extensions_JobSpec_To_v1beta1_JobSpec redeclared in this block
previous declaration at _output/local/go/src/k8s.io/kubernetes/pkg/apis/extensions/v1beta1/conversion.go:328
_output/local/go/src/k8s.io/kubernetes/pkg/apis/extensions/v1beta1/conversion_generated.go:3845: Convert_extensions_ScaleStatus_To_v1beta1_ScaleStatus redeclared in this block
previous declaration at _output/local/go/src/k8s.io/kubernetes/pkg/apis/extensions/v1beta1/conversion.go:98
_output/local/go/src/k8s.io/kubernetes/pkg/apis/extensions/v1beta1/conversion_generated.go:4737: Convert_v1beta1_JobSpec_To_extensions_JobSpec redeclared in this block
previous declaration at _output/local/go/src/k8s.io/kubernetes/pkg/apis/extensions/v1beta1/conversion.go:380
_output/local/go/src/k8s.io/kubernetes/pkg/apis/extensions/v1beta1/conversion_generated.go:5186: Convert_v1beta1_ScaleStatus_To_extensions_ScaleStatus redeclared in this block
previous declaration at _output/local/go/src/k8s.io/kubernetes/pkg/apis/extensions/v1beta1/conversion.go:120
_output/local/go/src/k8s.io/kubernetes/pkg/apis/extensions/v1beta1/conversion_generated.go:2723: too many errors
!!! Error in /home/thockin/tmp/godep-vendor/src/k8s.io/kubernetes/hack/lib/golang.sh:417
```
Automatic merge from submit-queue
cluster/images/hyperkube: create symlink for each server
Add a kubelet symlink so that the hyperkube image can appear as a kubelet image. https://github.com/kubernetes/kubernetes/issues/24510
Automatic merge from submit-queue
GCI: Add two GCI specific metadata pairs
This PR adds two GCI specific metadata pairs when using GCI image.
(1) "gci-update-strategy": by default the GCI in-place updater is enabled. It means that when a new image is released, the instance on the old image will be upgraded to the new image. In this change, we turn it off;
(2) "gci-ensure-gke-docker": GCI is built with two versions of docker. When this metadata is set to "true", the version satisfying kubernetes qualification will be used. Setting this metadata prevents an incorrect docker version from being used.
Automatic merge from submit-queue
Use v1.6.2-1 tag for build.
Is there any reason these don't use the VERSION file like everything else? cc @luxas @ixdy
Automatic merge from submit-queue
Add an entry to the salt config to allow Debian jessie on GCE.
```release-note
Add an entry to the salt config to allow Debian jessie on GCE.
As with the existing Wheezy image on GCE, docker is expected
to already be installed in the image.
```
Automatic merge from submit-queue
Convert internal types to use exact precision integers
This makes conversion more suitable for future optimizations, and we need to stop pretending for some of our internal types that the width of the int doesn't matter.
@wojtek-t
Automatic merge from submit-queue
Fix detect-node-names to not error out if there are no nodes
Fixes#21564.
Teardown was not working correctly in rare cases because `detect-node-names` was failing before any of the actual cleanup was run. I'm pretty sure the issue was that there was an instance group, but no instances in the instance group, so we bailed out when we tried to expand the bash array.
This PR adds a guard so we don't bail if the array is empty.
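The failure mode and one way to guard against it can be sketched in bash. This is an illustration only, not the guard from this PR: `NODE_NAMES` and `print_node_names` are hypothetical names, and the length check is just one of several possible guards.

```shell
# Hypothetical list of node names; empty when the group has no instances.
NODE_NAMES=()

print_node_names() {
  # ${#NODE_NAMES[@]} is safe under `set -o nounset` even for an empty array,
  # whereas "${NODE_NAMES[@]}" can abort with "unbound variable" on older bash.
  if [ "${#NODE_NAMES[@]}" -eq 0 ]; then
    echo "(no nodes)"
    return 0
  fi
  printf '%s\n' "${NODE_NAMES[@]}"
}

# Run in a subshell so nounset does not leak into the caller.
( set -o nounset; print_node_names )
```

With the length check in front, teardown can continue to the real cleanup steps even when the instance group is empty.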
cc @jlowdermilk @spxtr
Automatic merge from submit-queue
Promote Pod Hostname & Subdomain to fields (were annotations)
Deprecate the podHostName, subdomain and PodHostnames annotations and create corresponding new fields for them on the PodSpec and Endpoints types.
Annotation doc: #22564
Annotation code: #20688
CentOS 7 Core nodes running on OpenStack with an SSL-enabled API
endpoint hit the following error without this patch:
F0425 19:00:58.124520 5 server.go:100] Cloud provider could not be initialized: could not init cloud provider "openstack": Post https://my.openstack.cloud:5000/v2.0/tokens: x509: failed to load system roots and no roots provided
The root cause is that the ca-bundle.crt file is actually a symlink
which points to a directory which wasn't previously exposed.
[root@kubernetesstack-master ~]# ls -l /etc/ssl/certs/ca-bundle.crt
lrwxrwxrwx. 1 root root 49 18 nov 11:02 /etc/ssl/certs/ca-bundle.crt -> /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem
[root@kubernetesstack-master ~]#
Fixed the order of fields for basic_auth.
This provider still needs to leverage common.sh for generating proper credentials though.
Also documented a pattern for how to get the SWIFT_SERVER_URL automatically
Hard-coded the enabling of the common addons:
- logging
- kube-dashboard
- monitoring
Will make it configurable in a subsequent PR.
Also need to enable configuration of basic_auth.csv
This makes it so that we download the OS image automatically.
Also contains other usability improvements:
- kubectl context created with heat stack name
- Bumped default minions to 3
Making the assumption that the person running kube-up has their
Openstack environment setup, those same variables are being passed
into heat, and then into openstack.conf.
The salt codebase was modified to add openstack as well.
If someone has an openrc as part of their profile, this will make kube-up work automatically.
The only things that have to be modified are in config-default.sh, either by editing the file or setting environment variables.
This assumes you have your environment variables set correctly.
When ENABLE_PROXY is set to true, it takes the current proxy
settings and applies them to the heat configuration.
Also modified the defaults system in config-default.sh
Automatic merge from submit-queue
Update Docker version after cockpit installation
Fixes https://github.com/kubernetes/kubernetes/issues/24530
The vagrant setup didn't work for me because `cockpit cockpit-kubernetes` brings its own Docker version (1.7), which doesn't work, so the master components don't come up. More information about this bug is in my [issue](https://github.com/kubernetes/kubernetes/issues/24530).
My test system:
```bash
$ uname -a
Darwin MyMacBook.local 15.4.0 Darwin Kernel Version 15.4.0: Fri Feb 26 22:08:05 PST 2016; root:xnu-3248.40.184~3/RELEASE_X86_64 x86_64
$ vagrant --version
Vagrant 1.8.1
$ VBoxManage --version
5.0.16r105871
```
Automatic merge from submit-queue
Add support for running clusters on GCI
Google Container-VM Image (GCI) is the next revision of Container-VM. See documentation at https://cloud.google.com/compute/docs/containers/vm-image/. This change adds support for starting a Kubernetes cluster using GCI.
With this change, users can start a kubernetes cluster using the latest kubelet and kubectl release binary built in the GCI image by running:
$ KUBE_OS_DISTRIBUTION="gci" cluster/kube-up.sh
Or run a testing cluster on GCI by running:
$ KUBE_OS_DISTRIBUTION="gci" go run hack/e2e.go -v --up
The commands above will choose the latest GCI image by default.
Added KUBE_CONFIG_CONTEXT environment variable to customize the
kubeconfig context created at the end of the aws kube-up script.
Signed-off-by: Christian Stewart <christian@paral.in>
Automatic merge from submit-queue
Switch to ABAC authorization from AllowAll
Switch from AllowAll to ABAC. All existing identities (that are created by deployment scripts) are given full permissions through ABAC. Manually created identities will need policies added to the `policy.jsonl` file on the master.
Automatic merge from submit-queue
don't source the kube-env in addon-manager
This was added in 2feb658ed7 which became unused after #23603 but wasn't removed
The UI didn't work with vSphere kube-up implementation. This fixes
that by making the following changes:
* Configure the apiserver with admission controls, especially
ServiceAccount. This will provide the token to the dashboard pod
that it needs to talk to the apiserver. This will also improve other
pods that require service accounts.
* Add routes to the master so it can communicate with the pods, so
hitting the https://MASTER/ui URL will allow it to contact the
pods.
* Add an extra subject for the cluster IP to the apiserver, so when
the dashboard communicates with the apiserver, the certificate
matches the IP address it's using.
Automatic merge from submit-queue
Trusty: Add debug supports for docker and kubelet
This PR adds debug support in two aspects: (1) For a test cluster, the docker command will have the "--debug" flag. Recently we noticed that this is very helpful in debugging e2e test failures; (2) The kubelet command line will be put in /etc/default/kubelet. If a developer wants to test kubelet flags without recreating a cluster, she/he only needs to revise this file and then run "initctl restart kubelet". In addition, this PR fixes a couple of small things like comments and alignment.
Test result:
(1) Manually verified changing /etc/default/kubelet and run "initctl restart kubelet";
(2) Verified docker command line flag "--debug";
(3) e2e on pure trusty cluster and hybrid cluster all passed.
@roberthbailey @dchen1107 @zmerlynn please review it.
cc/ @yujuhong @fabioy @wonderfly FYI.
Automatic merge from submit-queue
Initial kube-up support for VMware's Photon Controller
This is for: https://github.com/kubernetes/kubernetes/issues/24121
Photon Controller is an open-source cloud management platform. More
information is available at:
http://vmware.github.io/photon-controller/
This commit provides initial support for Photon Controller. The
following features are tested and working:
- kube-up and kube-down
- Basic pod and service management
- Networking within the Kubernetes cluster
- UI and DNS addons
It has been tested with a Kubernetes cluster of up to 10
nodes. Further work on scaling is planned for the near future.
Internally we have implemented continuous integration testing and will
run it multiple times per day against the Kubernetes master branch
once this is integrated so we can quickly react to problems.
A few things have not yet been implemented, but are planned:
- Support for kube-push
- Support for test-build-release, test-setup, test-teardown
Assuming this is accepted for inclusion, we will write documentation
for the kubernetes.io site.
We have included a script to help users configure Photon Controller
for use with Kubernetes. While not required, it will help some
users get started more quickly. It will be documented.
We are aware of the kube-deploy efforts and will track them and
support them as appropriate.
Automatic merge from submit-queue
Trusty: Add retry in curl commands
This fix improves robustness when fetching critical metadata files while the metadata server is temporarily unreachable.
@roberthbailey @zmerlynn @dchen1107 please review it.
cc/ @fabioy @wonderfly FYI.
Automatic merge from submit-queue
Use kube-system namespace
Fixes#23153.
Sadly, kube-system isn't automatically created, so people need to make
sure to create it in their turnup scripts. Also after creating
kube-system it can take 10+ seconds for master and proxy to show up.
I tested the equivalent of these changes locally, but not these changes
themselves as I don't have a dev/build env up, so please read carefully
and maybe try them out!
E2e shows occasional kubectl failures here, so add some retries. We may want
to make this more general, but I think we should try it out in small scope
first.
Also clean up the retry loop so it doesn't process errors as successful runs
(discovered in testing).
Also simplify a bit of go template syntax.
Testing: I made kubectl randomly fail 50% of the time ($RANDOM%2 ==0) and
iterated until this gave me more helpful results. Still not perfect, but
better.
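A retry loop of the kind described might look roughly like this. The function name, the `RETRY_DELAY_SECONDS` knob, and the error message are assumptions for illustration; the actual loop in this change may differ.

```shell
# retry MAX_ATTEMPTS COMMAND [ARGS...]
# Only an exit status of 0 counts as success; every other status is retried
# until the attempt budget is used up.
retry() {
  local max_attempts="$1"
  shift
  local attempt=1
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "${attempt}" -ge "${max_attempts}" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    # RETRY_DELAY_SECONDS is a hypothetical knob; it defaults to 1 second.
    sleep "${RETRY_DELAY_SECONDS:-1}"
  done
}
```

For example, `retry 3 kubectl get nodes` would tolerate two transient failures before reporting an error.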
Automatic merge from submit-queue
jenkins: Allow configuration of release bucket
This allows others to leverage the existing E2E code to test some
patched kube binary by simply overriding the bucket and reusing many of
the existing scripts
Automatic merge from submit-queue
Trusty: Handle the new var in kube-proxy manifest
This is to capture the kube-proxy manifest change in PR #24429.
@roberthbailey @fabioy @zmerlynn please review this change and mark it as cherry pick candidate. We need to catch up 1.2.3 release.
cc/ @dchen1107 @wonderfly @cjcullen FYI.
I have verified this fix. Without this fix, the kube-proxy pod on Trusty nodes cannot be started correctly, i.e., the command line has an unhandled variable. Some other kube-system pods also do not work correctly because kube-proxy is not working well. After applying this fix, kube-proxy can be started correctly, and all kube-system pods run successfully.
Automatic merge from submit-queue
Fix GLBC cluster addon README link
Fix the link to L7 load balancer controller in GLBC cluster addon README.
Fixed#24462.
Automatic merge from submit-queue
Strip comments from configure-vm.sh for gce
We are getting very close to the 32KiB limit on GCE metadata entry length. We used to strip comments before putting the value in metadata, but I think we removed it in a refactor because it wasn't absolutely necessary, and leaving it out made the scripts slightly cleaner. It's close to being necessary again.
Removing comments reduces the size from 31,609B to 27,221B: https://www.diffchecker.com/0xmmecvw.
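Stripping could be done with a small filter like the one below. This is a sketch under assumptions, not the code in this change: the function name is made up, and it removes only whole-line comments because a `#` later in a line may be part of a string.

```shell
# Drop lines that contain nothing but a comment, keeping line 1 so that a
# shebang survives. Trailing comments are left alone on purpose.
strip_comments() {
  awk 'NR == 1 || !/^[ \t]*#/'
}
```

Comparing `wc -c < configure-vm.sh` with `strip_comments < configure-vm.sh | wc -c` shows how many bytes the metadata entry would save.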
Automatic merge from submit-queue
add HOME env variable for kube-addons service
Fix https://github.com/kubernetes/kubernetes/issues/23973.
Briefly, the systemd service does not know the `HOME` environment variable, which causes kubectl to write the schema file into `/.kube` when it is expected to be `/root/.kube`.
Automatic merge from submit-queue
Add easy-rsa to hyperkube container
Otherwise it gets downloaded at runtime, which kind of breaks the container model.
See [comment](https://github.com/kubernetes/kubernetes/issues/20514#issuecomment-195835786) in #20514 - this causes dockerized install of k8s to fail if you're behind a proxy. make-ca-cert.sh already looks for a local copy of easy-rsa.tar.gz before downloading it, so this drops the tarball in the expected place in the container.
Automatic merge from submit-queue
Fix spacing in usage_from_stdin and info_from_stdin (issue #24186).
If "a" is a bash array, then the syntax to append the contents of $line as a
new element to the array is a+=("$line"), not a+=$line.
Using the latter syntax just seems to append to the first element, creating a
long string and thus losing newline information.
Fixing this allows us to drop some empty lines from invocations of
usage_from_stdin.
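The difference between the two append forms is easy to reproduce in bash. The array names and the sample input below are made up for the demonstration.

```shell
lines_wrong=()
lines_right=()

while IFS= read -r line; do
  lines_wrong+=$line       # string-appends onto element 0
  lines_right+=("$line")   # appends a new array element
done <<'EOF'
first
second
third
EOF

echo "${#lines_wrong[@]}: ${lines_wrong[0]}"   # prints "1: firstsecondthird"
echo "${#lines_right[@]}"                      # prints "3"
```

Only the parenthesized form keeps one element per input line, which is what lets the callers print the messages back with their original line breaks.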
Automatic merge from submit-queue
add labels to kube component static pods
```
$ k --namespace=kube-system get po -l 'tier in (control-plane)'
NAME READY STATUS RESTARTS AGE
kube-apiserver-k-7-master 1/1 Running 2 1m
kube-controller-manager-k-7-master 1/1 Running 1 1m
kube-scheduler-k-7-master 1/1 Running 0 54s
$ k --namespace=kube-system get po -l 'tier in (node)'
NAME READY STATUS RESTARTS AGE
kube-proxy-k-7-minion-eheu 1/1 Running 0 1m
kube-proxy-k-7-minion-mwo9 1/1 Running 0 1m
kube-proxy-k-7-minion-xw6m 1/1 Running 0 1m
```
cc @bgrant0607 @thockin @gmarek
Fixes#21267
Automatic merge from submit-queue
add config-test.sh to cluster/centos so we can run e2e test on centos/fedora/rhel
so I can run e2e test on centos locally using the following command
```console
KUBERNETES_PROVIDER=centos KUBERNETES_CONFORMANCE_TEST=y ./cluster/test-e2e.sh
```
Automatic merge from submit-queue
don't ship kube-registry-proxy and pause images in tars.
pause is built into containervm. if it's not on the machine we should just pull
it. nobody that I'm aware of uses kube-registry-proxy and it makes build/deployment
more complicated and slower.
Automatic merge from submit-queue
Fix GKE kube-up to correctly find an IGM from a multi-zone cluster
I've confirmed that this successfully brings up a cluster, fixing the immediate issue with the new e2e test. Sorry about not properly vetting it in the original PR (#24075).
This does cause a warning message to be printed based on the handling of the NUM_NODES variable though, which I could fix if you guys think it's worth it:
```
Detected 6 ready nodes, found 6 nodes out of expected 3. Found more nodes than expected, your cluster may not behave correctly.
```
@quinton-hoole
Automatic merge from submit-queue
Fix log dump for new gcloud
`gcloud compute instance-groups managed list-instances` at CI has self-link for instance instead of just name. Fixes#24120
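One way to tolerate both output shapes is to keep only the last path segment. The helper name and the sample self-link below are illustrative, and this is not necessarily how the log-dump script was changed.

```shell
# Accept either a bare instance name or a full self-link and print the name.
instance_name_from_ref() {
  # ${1##*/} removes the longest prefix that ends in "/", leaving the
  # final path component; a value without "/" is returned unchanged.
  printf '%s\n' "${1##*/}"
}

# Hypothetical self-link used only as sample input.
instance_name_from_ref "https://www.googleapis.com/compute/v1/projects/my-project/zones/us-central1-b/instances/node-abcd"
```

The same helper passes plain names through untouched, so it works with both older and newer `gcloud` output.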
Automatic merge from submit-queue
Trusty: Fixes for running GKE master
This PR includes two fixes for running GKE master on our image:
(1) The kubelet command line assembly had a missing part for cbr0. We did not catch it because the code path is not covered by OSS k8s tests;
(2) Remove the "" from the variables in the cert files. It causes a parsing issue in GKE. Again, this code path is not covered by k8s tests.
This PR also refactors the code for assembling kubelet flag. I move all logic into a single function assemble_kubelet_flags in configure-helper.sh for better readability and also simplify node.yaml and master.yaml.
@roberthbailey @dchen1107 please review it, and mark it as cherrypick-candidate. This PR is verified by @maisem. Together with his CL for GKE, we can run GKE cluster with master on our image and nodes on ContainerVM.
cc/ @maisem @fabioy @wonderfly FYI
This only applies to gce kube-up. 60 seconds of open connection should
be sufficient for anything that we should be downloading. The release
tar is currently 255M.
Automatic merge from submit-queue
Make kube2sky and skydns docker images cross-platform
ARM tracking issue: #17981
Continues on: #19216
Make it possible to create `kube2sky` and `skydns` docker images for ARM and other architectures too
Build in a container, so `golang` isn't a dependency
I've preserved the original default behaviour:
- `skydns`: It just compiles with go on host
- `kube2sky`: Build an image
@brendandburns @dchen1107 @ArtfulCoder @thockin @fgrzadkowski
Automatic merge from submit-queue
Up to golang 1.6
A second attempt to upgrade go version above `go1.4`
Merge ASAP after you've cut the `release-1.2` branch and feel ready.
`go1.6` should perform slightly better than `go1.5`, so this time it might work
@gmarek @wojtek-t @zmerlynn @mikedanese @brendandburns @ixdy @thockin
Automatic merge from submit-queue
Cross-build hyperkube and debian-iptables for ARM. Also add a flannel image
We have to be able to build complex docker images too on `amd64` hosts.
Right now we can't build Dockerfiles with `RUN` commands when building for other architectures e.g. ARM.
Resin has a tutorial about this here: https://resin.io/blog/building-arm-containers-on-any-x86-machine-even-dockerhub/
But it's a bit clumsy syntax.
The other alternative would be running this command in a Makefile:
```
# This registers in the kernel that ARM binaries should be run by /usr/bin/qemu-{ARCH}-static
docker run --rm --privileged multiarch/qemu-user-static:register --reset
```
and
```
ADD https://github.com/multiarch/qemu-user-static/releases/download/v2.5.0/x86_64_qemu-arm-static.tar.xz /usr/bin
```
Then the kernel will be able to distinguish ARM binaries from amd64 ones. When it finds an ARM binary, it will invoke `/usr/bin/qemu-arm-static` first and let `qemu` translate the ARM syscalls to amd64 ones.
Some code here: https://github.com/multiarch
WDYT is the best approach? If registering `binfmt_misc` in the kernels of the machines is OK, then I think we should go with that.
Otherwise, we'll have to wait for resin's patch to be merged into mainline qemu before we may use the code I have here now.
@fgrzadkowski @david-mcmahon @brendandburns @zmerlynn @ixdy @ihmccreery @thockin
Automatic merge from submit-queue
support NETWORK_PROVIDER=cni for KUBERNETES_PROVIDER=vagrant
While trying to develop CNI plugins for K8s, I found that the docs referenced support for --network-plugin=cni in kubelet, but this wasn't surfaced via salt to support env NETWORK_PROVIDER=cni before a kube-up deployment.
This PR is my attempt at adding CNI support to the kube-up happy path, following a lot of similar work for NETWORK_PROVIDER=kubenet which already exists.
Also, I've added the ability to consume CNI plugins (binaries) and configuration files from the local cluster/network-plugins directory into the necessary locations as referenced here for CNI:
http://kubernetes.io/docs/admin/network-plugins
This allows a local developer to easily work on CNI plugin development while following the existing kube-up.sh docs and process.
In general, I've struggled to find any authoritative information or answers to my questions in slack regarding CNI progress / correct integration, so comments are encouraged here!
Files are taken from cluster/network-plugins/{bin,conf} to be consumed within a vagrant kube-up.sh environment.
Paths used for configuration files and the 'cni' name of the network provider are all from the kubernetes documentation, but the actual implementation in the salt automation doesn't seem to exist.
Use of NETWORK_PROVIDER=cni is documented as usable (as well as its effects on the runtime args of kubelet),
however the actual implementation in the salt automation doesn't seem to exist.
This change attempts to fix that for the vagrant use case.
Automatic merge from submit-queue
use apply instead of create to setup namespaces and tokens in addon manager
When the addon manager restarts, it takes ~15 minutes (1000 seconds) to start the sync loop because it retries creation of the namespace and tokens 100 times. Create fails if the tokens already exist. Just use apply.
Automatic merge from submit-queue
Juju kube up
I found some problems with the kube-up script that this pull request addresses. We didn't have the kubectl binary in the correct location.
Just changing where we download the package from the master, and fixing the kube-down.sh script to remove those files.
We rename it to EPHEMERAL_BLOCK_DEVICE_MAPPINGS, and we also change the value
so that it starts with a `,`, instead of always inserting a comma before it.
In this way the value can be empty.
Also, if the user sets the (currently experimental) KUBE_AWS_STORAGE
environment variable to be "ebs", then we will not mount any instance storage
which will cause the machines to use EBS storage instead.
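
The leading-comma convention can be sketched as follows. This is an illustration only: `BLOCK_DEVICE_MAPPINGS_BASE`, `build_mappings` and the JSON values are hypothetical and not taken from the AWS scripts.

```shell
# Assumed base mapping, for illustration only.
BLOCK_DEVICE_MAPPINGS_BASE='{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":32}}'

# Hypothetical helper that joins the base mapping and the ephemeral part.
build_mappings() {
  local ephemeral=$1
  # The ephemeral value carries its own leading comma, so an empty
  # value leaves no dangling separator behind.
  echo "[${BLOCK_DEVICE_MAPPINGS_BASE}${ephemeral}]"
}

with_ephemeral=$(build_mappings ',{"DeviceName":"/dev/sdc","VirtualName":"ephemeral0"}')
without_ephemeral=$(build_mappings '')

echo "${with_ephemeral}"
echo "${without_ephemeral}"
```

With an empty value the result is still a well-formed list, which is what allows the instance storage mappings to be dropped entirely.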
format-disks used to run with non-strict bash semantics, but this changed in
1.2 as we now merge it into the GCE script, so pipefail and errexit are both
set.
However, the way we list the ephemeral disks, by piping to grep, would cause an
exit code of 2 if there were no ephemeral disks.
Tolerate failure here by adding `|| true`. The metadata service call is unlikely
to fail, so we continue to ignore that possibility.
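
A minimal bash sketch of the failure mode and the workaround. `list_block_devices` is a hypothetical stand-in for the metadata service call, which is not shown in this message.

```shell
set -o errexit
set -o nounset
set -o pipefail

# Hypothetical stand-in for the metadata service call; this instance
# reports no ephemeral disks.
list_block_devices() {
  printf 'ami\nroot\n'
}

# Without `|| true`, grep's non-zero exit status on "no match" would
# abort the script, because errexit and pipefail are both set.
ephemeral_devices=$(list_block_devices | grep ephemeral || true)

if [[ -z "${ephemeral_devices}" ]]; then
  echo "No ephemeral disks found; continuing"
fi
```

Note that `|| true` also swallows a failure of the left-hand side of the pipe, which matches the decision above to ignore metadata service errors.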
Automatic merge from submit-queue
Fix so setup-files don't recreate/invalidate certificates that already exist
Fixes: #23197 and a lot of other DNS and dashboard issues
This is quite critical for `docker`-based users and should be considered as a **cherrypick-candidate** as it makes a lot of people wonder why Dashboard and/or DNS doesn't work. Example: https://github.com/kubernetes/dashboard/issues/374
Earlier when you shut your `docker.md` cluster down and started it again, all ServiceAccounts became invalidated by `setup-files` that happily ran once again and replaced all files. That made `apiserver` and `controller-manager` pick up the new certs (or there was a race condition, they _could_ have picked up the old certs too, but that's unlikely) and the old certs were put into `/var/run/secrets` because the ServiceAccount's Secrets were stored in etcd, which `setup-files` didn't touch.
@fgrzadkowski @huggsboson @thockin @mikedanese @vishh @pwittrock @eparis @bgrant0607
Automatic merge from submit-queue
Trusty: Regional release .tar.gz support
@zmerlynn and @roberthbailey please review it. This change is to support the feature added in PR #22234. The entire logic is pretty much the same as in #22234, with only a few minor changes in implementation.
I manually ran e2e tests with "export RELEASE_REGION_FALLBACK=true" on two clusters: (1) Trusty on master, nodes on ContainerVM; (2) master and nodes all on Trusty. All tests are green. I haven't figured out a way to simulate regional fallback, but I did test the function download_or_bust() out-of-box.
cc/ @wonderfly @dchen1107 @fabioy FYI.
Sadly, kube-system isn't automatically created, so people need to make
sure to create it in their turnup scripts. Also after creating
kube-system it can take 10+ seconds for master and proxy to show up.
I tested the equivalent of these changes locally, but not these changes
themselves as I don't have a dev/build env up, so please read carefully
and maybe try them out!
Use kubectl create ns
Automatic merge from submit-queue
Create a new Deployment in kube-system for every version.
It appears that version numbers have already been properly added to these files. Small change to delete an old deployment entirely, so we can make a new one per version (like replication controllers).
We'll want to change this back once the kube-addons support deployments in a later version.
Mostly doc updates and cruft removal
- describe conformance test policy and howto in e2e-tests.md
- rm e2e test info from testing.md in the name of DRY
- rm cluster/test-conformance.sh; unusable in release tar, not e2e.go
- update e2e test link in write-a-getting-started-guide.md
There are actually two `roles` settings in the ubuntu installation scripts.
One is roles as string, which can be set as env and then used in scripts.
The other is roles as array, which is used by internal handling to
locate specific role by offset.
This patch distinguishes the two meanings of roles by declaring the second
as roles_array, thus eliminating the ambiguity.
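
The split between the two forms might look like the bash sketch below. The value `"ai i i"` is an example only, and the loop is illustrative rather than copied from the ubuntu scripts.

```shell
# String form: can be exported in the environment. Example value only.
roles="ai i i"

# Array form for internal use, split on whitespace so that a role can
# be looked up by node offset.
read -r -a roles_array <<< "${roles}"

for i in "${!roles_array[@]}"; do
  echo "node ${i}: role ${roles_array[${i}]}"
done
```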
When using this flag, this error is shown:
Flag --api-version has been deprecated, flag is no longer respected and will be deleted in the next release
Stop using the flag in the validate-cluster.sh script and avoid the warning.
This commit imports the latest development focus from the Charmer team
working to deliver Kubernetes charms with Juju.
Notable Changes:
- The charm is now assembled from layers in $JUJU_ROOT/layers
- Previously, the juju provider would compile and fat-pack the charms; this
new approach delivers the entirety of Kubernetes via hyperkube.
- Adds Kubedns as part of `cluster/kube-up.sh` and verification
- Removes the hard-coded port 8080 for the Kubernetes Master
- Includes TLS validation
- Validates kubernetes config from leader charm
- Targets Juju 2.0 commands
This should allow the non_masquerade_cidr option to be configured
in /etc/salt/minion.d/grains.conf, allowing the flag to be used by kubelet
in /etc/sysconfig/kubelet. Default configuration is set in pillar.
If we deleted an ELB, we often fail to delete the security group,
because deleting the ELB is invisibly asynchronous.
Add a retry loop around delete-security-group to work around this.
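
A generic retry wrapper of this kind can be sketched in bash as below. The `retry` helper and the `delete_security_group` stub are hypothetical; the stub fails twice to imitate an ELB that is still being torn down.

```shell
# Retry a command up to max_attempts times, sleeping between attempts.
retry() {
  local max_attempts=$1 delay=$2
  shift 2
  local attempt=1
  until "$@"; do
    if (( attempt >= max_attempts )); then
      echo "Giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    attempt=$(( attempt + 1 ))
    sleep "${delay}"
  done
}

# Hypothetical stand-in for the real deletion call: fails on the first
# two calls, then succeeds.
calls=0
delete_security_group() {
  calls=$(( calls + 1 ))
  (( calls >= 3 ))
}

# A delay of 0 keeps the sketch fast; a real script would wait longer.
retry 5 0 delete_security_group
echo "deleted after ${calls} attempts"
```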
Fix #21147
The only tested-working distros are vivid, wily & jessie.
vivid should not really be used because it is no longer supported, so
recommend wily or jessie instead.
For other distros, recommend jessie instead.
Fix #21218
The previous jessie image had a broken cloud-init, which would use an
Ubuntu-specific 'nobootwait' argument when mounting disks. We now
override that in the image.
Fix #22549
also:
- adds a mechanism to build and upload hyperkube for non-official
releases
- adds a mechanism for proxying azkube's traffic
- --no-cloud-provider for now
- support specifying the resource group for CI scenarios
Allow the gcr.io/google_containers registry to be overridden
regionally by just blasting a new KUBE_ADDON_REGISTRY out. Instead of
adding every addon to Salt and asking all of the other consumers
(Trusty, Juju, Mesos, etc) to change, just script the sed ourselves.
This is probably the 9th grossest thing I've ever done, but it works
well, and it works quickly. I kind of wish it didn't.
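
A rough sketch of the kind of rewrite described here. The registry value, the manifest content and the exact sed expression are assumptions for illustration; the real script is not shown in this message.

```shell
# Assumed override value, for illustration only.
KUBE_ADDON_REGISTRY="example.registry.local/mirror"

# A made-up addon manifest fragment.
manifest=$(mktemp)
cat > "${manifest}" <<'EOF'
containers:
- name: example-addon
  image: gcr.io/google_containers/example-addon:1.0
EOF

# Rewrite the default registry prefix; '@' is used as the sed delimiter
# so the slashes in the registry path need no escaping.
rewritten=$(sed "s@gcr\.io/google_containers@${KUBE_ADDON_REGISTRY}@g" "${manifest}")
echo "${rewritten}"
rm -f "${manifest}"
```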
It includes some performance improvements for parsing JSON (which is
very important for us, since all Docker logs are JSON) as well as a
couple new settings, like forcing of a flush of multiline logs after a
time period rather than having to wait until a new log is seen before
feeling confident flushing the previous one.
-Remove CPU limits to enable CPU bursting once 1.2 begins enforcing CPU limits.
-Add a memory limit for fluentd-es to match fluentd-gcp.
-Explicitly set requests to match limits.
This change revises the way to provide kube-system manifests for clusters on Trusty. Originally, we maintained copies of some manifests under cluster/gce/trusty/kube-manifests, which is not scalable and hard to maintain. With this change, clusters on Trusty will use the same source of manifests as ContainerVM. This change also fixes some minor problems such as shell variables and comments to meet the style guidance better.
Starting docker through Salt has always been problematic. Kubelet or
the babysitter process should start it. We've kept it around primarily
so we have a `service: docker` node for the Salt DAG.
Instead, we enable (but do not start) the Docker service in Salt. This
lets us keep the DAG node, but won't start it.
There's another bug in Salt, where watches will start the service even
on `service.enabled`. So we remove the watches, and move them to our
existing Salt bug-fix script.
The kubelet flag "nosystem" was removed recently, which breaks kubelet in Trusty. This change removes the flag usage accordingly. It also revises several aspects of Trusty support to bring it in line with running on ContainerVM, such as new flags in kubelet and new logic in api-server and etcd pods.
The Docker 1.9.1 package on Debian is broken, and the service fails to
install when run unattended. This is treated as an installation failure
and causes everything to fail.
However, the service can be started by Salt once we're not installing
the package, and indeed we restart docker anyway.
So, on Debian, use a helper script to install the docker package. The
script sets up a policy-rc.d file to prevent the service starting, and
then cleanly removes it afterwards (this would be difficult to do in
Salt, I believe).
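
The policy-rc.d technique can be sketched as follows. This is not the helper script from this change: `install_without_starting` and `POLICY_RC_D` are hypothetical names, and the path defaults to a scratch location so the sketch is safe to run (on a real Debian host it would be /usr/sbin/policy-rc.d).

```shell
# Scratch path by default; /usr/sbin/policy-rc.d on a real Debian host.
POLICY_RC_D="${POLICY_RC_D:-$(mktemp -d)/policy-rc.d}"

install_without_starting() {
  # Exit status 101 tells invoke-rc.d that the action is forbidden by
  # policy, so package scripts will not start the service.
  printf '#!/bin/sh\nexit 101\n' > "${POLICY_RC_D}"
  chmod +x "${POLICY_RC_D}"

  # Run the install command passed as arguments and remember its status.
  local status=0
  "$@" || status=$?

  # Remove the policy file again so later service starts are unaffected.
  rm -f "${POLICY_RC_D}"
  return "${status}"
}

# `true` stands in for the real package install command.
install_without_starting true
```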
PR #22022 added a new variable "cpurequest" in kube-proxy.manifest. This makes kubelet in Trusty fail to start the kube-proxy pod as this variable value is not set.
* In kube-up.sh, create a staging bucket with a location nearest the
zone being created. If new variable RELEASE_REGION_FALLBACK is set
(default false), create multiple buckets and stage to fallback
URLs. (In open source, this path is primarily for testing.)
* In configure-vm.sh, split the URL env variables by comma (if any
extra are present) and retry on the fallback URLs. Also factor the
hash checking into this path rather than outside, since a corrupt
release in a particular geo can be retried in a different geo.
* Remove the local already-staged .tar.gz checks. They've caused
several issues along the way, and with this code path become virtually
unmaintainable. (I could add a sentinel for each bucket it's possibly
staged to, but ew.)
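
The comma-split and fallback logic in configure-vm.sh could look roughly like this bash sketch. `fetch`, `checksum`, `download_with_fallback` and the URLs are hypothetical stand-ins, and POSIX `cksum` is used only to keep the example portable; the real script's hash check is not shown here.

```shell
# Hypothetical stand-in for the real download: only the "-eu" bucket works.
fetch() {
  local url=$1 dest=$2
  [[ "${url}" == *"-eu/"* ]] || return 1
  printf 'release-bits' > "${dest}"
}

# Portable checksum helper (first field of cksum output).
checksum() {
  cksum < "$1" | cut -d ' ' -f 1
}

download_with_fallback() {
  local urls_csv=$1 expected=$2 dest=$3
  local -a urls
  local url
  IFS=',' read -r -a urls <<< "${urls_csv}"
  for url in "${urls[@]}"; do
    # The hash check sits inside the loop, so a corrupt copy in one
    # location falls through to the next URL.
    if fetch "${url}" "${dest}" && [[ "$(checksum "${dest}")" == "${expected}" ]]; then
      echo "${url}"
      return 0
    fi
    echo "Failed to fetch ${url}; trying next" >&2
  done
  return 1
}

dest=$(mktemp)
expected=$(printf 'release-bits' | cksum | cut -d ' ' -f 1)
used_url=$(download_with_fallback \
  "https://storage.example/b-us/release.tar.gz,https://storage.example/b-eu/release.tar.gz" \
  "${expected}" "${dest}")
echo "downloaded from ${used_url}"
rm -f "${dest}"
```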
Configurations in config-default.sh should fall back to default values only
if they are not set outside of the script. The `roles` option is an exception.
This patch fixes it so that it behaves consistently with the other options.
I didn't expect glog to split single log statements onto multiple lines,
but apparently it does if they're long enough. This groups them back
together appropriately.
Default distro is jessie, due to the support situation with Ubuntu
distros. Default ubuntu distro is wily.
Update the docs to reflect the recommended distros with kube-up, and to
encourage contributions for other distros.
Spot instances take a lot longer to run; wait up to 15 minutes for the
nodes to launch when we're using spot instances. (Previously we were
waiting 5 minutes).
For the vsphere provider, only the docker 1.9.1 release is currently supported.
Older versions of docker fail on jessie due to issue https://github.com/docker/docker/issues/18793
and the newer 1.10.x version is not properly tested.
Based on the official debian image, with the following changes:
* Switched extlinux -> grub, because we need to change kernel options
to enable the memory cgroup controller, and extlinux is harder and has
reboot problems
* Added packages that would otherwise be installed as part of the boot
(just an optimization)
* Also add the cloud-initramfs-growroot package; with it the root
volume will resize.
* We add panic=10 & oops=panic to kernel options
* We install the packages as per the base image, except we install
awscli from pip, because the repo version is really old.
Once we've built the master, we can build kubeconfig. By doing so, if
we time out waiting for the nodes, the system is still configured
correctly.
In particular, spot instances can be slow to launch.
Related to issue #21200
I think we should probably leave this undocumented for now, until we
have a better way to launch multiple sets of nodes, but it's great for
cost savings while testing!
Fix #21200
We run unattended-upgrades manually, and then reboot automatically if we
find /var/run/reboot-required; then we check if any services need
restarting and restart them automatically using the needrestart tool.
This should mean we don't _have_ to build new images on every security
update, though we can do so to avoid a reboot.
Issue #21382
m3.large for > 150 nodes.
t2.micro often runs out of memory. The t2 class has very
difficult-to-understand behaviour when it runs out of CPU. The
m3.medium is reasonably affordable, and avoids these problems.
Fix #21151
Issue #18975
Otherwise we risk services coming up on the master before the backing
volume is ready.
If we then see the master-pd is already mounted, don't try to remount
it.
Issue #21155
In commit 07d7cfd3, a ${DEBUG} == "true" check was added to the file
cluster/saltbase/salt/generate-cert/make-ca-cert.sh
but no default value for DEBUG is set there. That commit set the value
of DEBUG in cluster/ubuntu/util.sh, which calls this script. When this
script is used from saltstack to bring up a cluster on other cloud platforms, it fails
to generate the cert, since make-ca-cert.sh sets set -o nounset and DEBUG
is unset. Setting a default value for DEBUG here fixes the problem.
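
A minimal sketch of the fix pattern, assuming the default is `false`; the exact line added to make-ca-cert.sh is not quoted in this message.

```shell
set -o nounset

# Under nounset, expanding an unset DEBUG would abort the script.
# The ${VAR:-default} form supplies a value without tripping nounset.
DEBUG="${DEBUG:-false}"

if [ "${DEBUG}" = "true" ]; then
  set -x
fi

echo "DEBUG=${DEBUG}"
```

Callers such as cluster/ubuntu/util.sh can still export DEBUG=true, since an externally set value takes precedence over the default.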
If the ephemeral volume is present and mounted, don't try to reinitialize
it.
Don't block the boot if the ephemeral volume is corrupt / missing -
this enables us to cope with a stop/start & presumably also corruption.
In this case, we'll reformat the ephemeral storage.
Fix #21157
This is so we have the same behaviour as on GCE.
This also lets us change the bootstrap script or the config, which is
nice. Instance data is immutable on AWS once it is booted.
Fix #21150
This is good because it removes an obstacle to using the
cluster/ubuntu scripting to install Kubernetes into a restricted
environment where the machines can not open connections to arbitrary
external locations.
Also add debuggability to make-ca-cert.sh
Resolves #21037
Resolves #21092
We were assuming the PROJECT env var was set, which the e2e tests do.
But PROJECT is normally not set on AWS (it is set on GCE); this broke as
part of the harmonization.
Revert to the pre-existing behaviour here, where we use "aws_" as the
prefix.
Fix #21141
This change corrects how we determine the log level. Moreover, it explicitly redirects the kubelet log to /var/log/kubelet.log, as we noticed the log is sometimes missing.
This change moves the code of running and monitoring addon pods in a daemon type upstart job, so that addon manifest monitoring can be restarted automatically upon failure. Second, it updates the usage of "kube-ui" to "dashboard" to match the change in PR #20330.
Package manager "dnf" does not work correctly with Salt
(cf https://github.com/saltstack/salt/issues/31001)
It causes Salt to consider that some packages (python, git, curl, etc.) are not
installed, which breaks the Vagrant Kubernetes setup.
Updating dnf and dnf-plugins-core to their latest version solves the issue.
Additionally, I've added the "fastestmirror" to dnf, which is useful if a
RPM mirror is broken or very slow. (In my case, dnf used a broken mirror which
froze the Kubernetes setup).
Fix the script for the case when we need to set up a cluster
in an existing VPC and subnet with an IP mask, for example 10.0.0.0/8.
Fixed a bug in detecting the IP of the master when MASTER_RESERVED_IP is provided.
For some reason detecting the master IP was moved to volumes, and only happened when MASTER_RESERVED_IP=auto.
If an IPv4 address such as `52.1.1.1` is specified for MASTER_RESERVED_IP, then we could
not detect the IP even during the last steps of setting up the cluster.
At that step KUBE_MASTER_IP is reset because there is no tag for the
volume.
This change supports running the kubernetes master on Ubuntu Trusty.
It uses pure cloud-config and shell scripts, and completely gets
rid of saltstack or the release salt tarball.
In the e2e tests detect-master is called directly. In turn, it calls
find-tagged-master-ip, which assumed that find-master-pd has already
been called. But this wasn't true in the e2e case.
We add a call to find-master-pd; it is idempotent.
Combine the fields that will be used for content transformation
(content-type, codec, and group version) into a single struct in client,
and then pass that struct into the rest client and request. Set the
content-type when sending requests to the server, and accept the content
type as primary.
Will form the foundation for content-negotiation via the client.
- wget is not installed by default on fedora 23. Use curl instead
since it is always available on recent Fedora.
- The repo url for cockpit resulted in an http redirect message being
saved as the repo file which broke deployment. Update the url to
the url it was redirected to and ensure that future redirects will be
handled correctly.
- The main Fedora 23 repo includes salt packages, and there is no
salt repo for 23. The salt bootstrap still creates a repo file for
a nonexistent repo, though, and this change removes it to avoid
having dnf report an error on every update.
build-runtime-config was being called in verify-prereqs, which didn't
match how GCE called it, and didn't seem to actually work.
Instead call it just before the master configuration is built. Also
call it just before the node configuration is built, even though the
nodes don't _currently_ require the runtime_config.
To fix it, I just added an openssl dependency to the "generate-cert" state. It
should work on Debian-like and RedHat-Like systems. (and, Archlinux,
Opensuse, etc)
Fixed error :
$ sudo salt 'kubernetes-master' state.apply
----------
ID: kubernetes-cert
Function: cmd.script
Result: False
Comment: Command 'kubernetes-cert' run
Started: 06:57:06.634203
Duration: 208.719 ms
Changes:
----------
pid:
793
retcode:
1
stderr:
/tmpm24T3R.sh: line 22: openssl: command not found
chgrp: cannot access '/srv/kubernetes/server.key': No such file or directory
chgrp: cannot access '/srv/kubernetes/server.cert': No such file or directory
chmod: cannot access '/srv/kubernetes/server.key': No such file or directory
chmod: cannot access '/srv/kubernetes/server.cert': No such file or directory
stdout:
After applying my patch (success) :
----------
ID: kubernetes-cert
Function: cmd.script
Result: True
Comment: Command 'kubernetes-cert' run
Started: 07:17:04.172384
Duration: 1041.092 ms
Changes:
----------
pid:
1045
retcode:
0
stderr:
Generating a 4096 bit RSA private key
......................................................................++
...............................................................................++
writing new private key to '/srv/kubernetes/server.key'
-----
stdout:
----------
If we don't use an elastic IP, the IP address will be lost if we lose
the master for any reason, and a replacement master will not have the
same IP. But the master IP is set both in client kubeconfig files and
the master SSL certificate. Hence the default should be to allocate an
elastic IP for the master.
One complication: AWS doesn't allow tags on elastic IPs, so it is hard
to track the elastic IP so we can delete it as part of kube-down.
Instead, we take the master EBS volume with the elastic IP. This is a
little odd, but works because the master volume & the master elastic IP
really need to be assigned to the same machine, so might be thought of
as a pair.
Also, we now delete the master EBS volume as part of kube-down, as
people expect kube-down to clean-up everything it creates.
We adapt the existing code to work across all zones in a region.
We require a feature-flag to enable Ubernetes-Lite
Reasons:
* There are some behavioural changes if users create volumes with
the same name in two zones.
* We don't want to make one API call per zone if we're not running
Ubernetes-Lite.
* Ubernetes-Lite is still experimental.
There isn't a parallel flag implemented for AWS, because at the moment
there would be no behaviour changes from this.
This is for internal use at the moment, for testing Ubernetes Lite, but
arguably makes the code a little cleaner.
Also rename KUBE_SHARE_MASTER -> KUBE_USE_EXISTING_MASTER
The version of Salt we're running doesn't do a good job of detecting
systemd. Inspired by https://github.com/saltstack/salt/issues/13926,
I added a provider-force to the services.
With this change, salt-call -l debug state.highstate succeeds, even for
repeated invocations.
The issue was (probably) benign, but definitely caused noise (e.g. #11297)
I got the package name wrong before, which meant that salt was failing
on invocations after the first (the name apparently doesn't matter on
the first invocation).
We've had a lot of salt problems with systemd on AWS; we have a
workaround in place that we use everywhere else, we should use that for
kube-node-unpacker too.
Fixes #19386
Issue #19388
Some functionality in hack/lib is currently depended on by
cluster/common.sh so kube-up from the full release tar (which
does not include hack/) is currently broken. With this PR we
create cluster/lib/ and move the necessary bits from hack/
over to get kube-up working again.
Fixes: 96d1b8d1b2
Signed-off-by: Mike Danese <mikedanese@google.com>
- Move keygen image mesosphere/kubernetes-mesos-keygen -> mesosphere/kubernetes-keygen:v1.0.0
- Remove resolveip in favor of github.com/karlkfi/resolveip (resolveip.sh)
- Remove util-temp-dir.sh in favor of github.com/karlkfi/intemp (intemp.sh)
- Refactor bash code to use intemp (extract functions to scripts)
- Remove util-ssl.sh in favor of mesosphere/kubernetes-keygen
Implement a flag that defines the frequency at which a node's out of
disk condition can change its status. Use this flag to suspend out of
disk status changes in the time period specified by the flag, after
the status is changed once.
Set the flag to 0 in e2e tests so that we can predictably test out of
disk node condition.
Also, use util.Clock interface for all time related functionality in
the kubelet. Calling time functions in unversioned package or time
package such as unversioned.Now() or time.Now() makes it really hard
to test such code. It also makes the tests flaky and sometimes
unnecessarily slow due to time.Sleep() calls used to simulate the
time elapsed. So use util.Clock interface instead which can be faked
in the tests.
For AWS EBS, a volume can only be attached to a node in the same AZ.
The scheduler must therefore detect if a volume is being attached to a
pod, and ensure that the pod is scheduled on a node in the same AZ as
the volume.
So that the scheduler need not query the cloud provider every time, and
to support decoupled operation (e.g. bare metal) we tag the volume with
our placement labels. This is done automatically by means of an
admission controller on AWS when a PersistentVolume is created backed by
an EBS volume.
Support for tagging GCE PVs will follow.
Pods that specify a volume directly (i.e. without using a
PersistentVolumeClaim) will not currently be scheduled correctly (i.e.
they will be scheduled without zone-awareness).
To build the python image, BUILD_PYTHON_IMAGE should be set during make.
When the addon script is running, it will check if python is installed
on the machine; if not, it will use the python image that was built previously.
- default the release to the value of latest_release instead of
the string 'latest_release'
- use curl -O when retrieving kubectl to write output to disk instead
of to the screen
On MacOS there is an error when setting up a new cluster:
```
+ sed -i -e 's/^[[:blank:]]*#.*$//' -e '/^[[:blank:]]*$/d' /sometmpfile
sed: -e: No such file or directory
```
This happens because the BSD sed shipped with MacOS expects a backup suffix argument after -i, so the remaining arguments are misparsed.
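
The actual fix in this commit is not shown here. One commonly used portable form, sketched below on a scratch file, attaches a backup suffix directly to `-i`, which both GNU sed and BSD sed accept, and then removes the backup.

```shell
tmpfile=$(mktemp)
printf '# a comment\n\nKEY=value\n   \n' > "${tmpfile}"

# The suffix is attached to -i (no space), so neither sed flavour
# mistakes the following -e for the suffix or for a file name.
sed -i.bak -e 's/^[[:blank:]]*#.*$//' -e '/^[[:blank:]]*$/d' "${tmpfile}"
rm -f "${tmpfile}.bak"

cleaned=$(cat "${tmpfile}")
rm -f "${tmpfile}"
echo "${cleaned}"
```

Comment lines are first emptied by the substitution and then removed together with the blank lines, leaving only `KEY=value`.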
Currently if a pod is being scheduled with no meta.RolesKey label
attached to it, per convention the first configured mesos (framework)
role is being used.
This is quite limiting and also causes e2e tests to fail. This commit
introduces a new configuration option "--mesos-default-pod-roles" defaulting to
"*" which defines the default pod roles in case the meta.RolesKey pod
label is missing.
Currently when using a custom elastic IP, the ENV var `KUBE_MASTER_IP` gets
the output of `$(assign-elastic-ip $ip $master_id)` assigned.
This is wrong since the command returns a string:
`Attaching IP 99.999.999.999 to instance i-9999999`
This patch fixes the assignment by calling `get_instance_public_ip` again.
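
The difference can be sketched with stub functions. Both stubs below are hypothetical re-creations for illustration (bash allows the hyphenated function name); the IP and instance id are placeholder values.

```shell
# Hypothetical stubs mirroring the behaviour described above.
assign-elastic-ip() { echo "Attaching IP $1 to instance $2"; }
get_instance_public_ip() { echo "203.0.113.10"; }

ip="203.0.113.10"
master_id="i-0123456789"

# Wrong: this captures the human-readable status message, not an IP.
wrong_value=$(assign-elastic-ip "${ip}" "${master_id}")

# Better: let the assignment print its message, then query the IP.
assign-elastic-ip "${ip}" "${master_id}" >&2
KUBE_MASTER_IP=$(get_instance_public_ip "${master_id}")

echo "wrong: ${wrong_value}"
echo "right: ${KUBE_MASTER_IP}"
```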
This change is to pick up the fix in PR #18178. It avoids confusing
cadvisor when systemd is present in an instance but does not act
as the init system.
When PROXY_SETTING is empty, you end up with an empty
command of "", as witnessed by this bash debug
output when -x is enabled:
+ '' /home/ubuntu/kube/make-ca-cert.sh 10.0.0.232 IP:10.0.0.232,IP:192.168.3.1,DNS:kubernetes,DNS:kubernetes.default,DNS:kubernetes.default.svc,DNS:kubernetes.default.svc.cluster.local
Given the example:
PROXY_SETTING="http_proxy=http://server:port https_proxy=https://server:port"
You would not want this quoted on the script executed
on the remote master or minion node.
Enabling +e, for additional tracing and to
abort on any failure in the remote SSH session.
Adding a DEBUG parameter into config-default.sh allowing additional
debug information to be present in the logs during node rollout, using
bash's "set -x" when DEBUG=true
This allows resource usage monitoring test to launch 100 test pods per node, in
addition to the add-on pods.
Also reduce the test time length since the results over the shorter period are
representative enough.
This change refactors the code of preparing kube-system manifests
for trusty based cluster. The manifests used by nodes do not contain
salt configuration, so we can simply copy them from the directory
cluster/saltbase/salt, make a tarball, and upload to Google Storage.
- document needed packages in hostexec image
- add RunHostCmdOrDie
- kube-proxy e2e: port from ssh to hostexec
- use preset NodeName to schedule test pods to different nodes
- parallel launch of pods
- port from ssh to hostexec
- add timeout because nc might block on udp
- delete test container without grace period
- PrivilegedPod e2e: port from ssh to hostexec
- NodePort e2e: port from ssh to hostexec
- cluster/mesos/docker: Enable privileged pods
Remove a comment that disabled the redirection of output destined for
`/etc/salt/minion.d/grains.conf`. Must have been a comment added to
debug the generation of the line, to view it on `STDOUT`.
The node.yaml has some logic that will be also used by the kubernetes
master on trusty work (issue #16702). This change moves the code
shared by the master and node configuration to a separate script, and
the master and node configuration can source it to use the code.
Moreover, this change stages the script for GKE use.
- add busybox static pod to mesos-docker cluster
- customize static pods with binding annotations
- code cleanup
- removed hacky podtask.And func; support minimal resources for static pods when resource accounting is disabled
- removed zip archive of static pods, changed to gzip of PodList json
- pod utilities moved to package podutil
- added e2e test
- merge watched mirror pods into the mesos pod config stream
We use the AWS CLI support for --query and --filter instead; should be
more reliable and clearer.
Also set the output format to text, so we don't have to set it every
time and don't risk problems if we forget to set it.
Fixes #16747
We do still have to use JSON parsing in one place: ELB does not support
--filter, so we have to use Python there.
Addresses #15968
This patch removes KUBE_ENABLE_EXPERIMENTAL_API and similar calls in
favor of specifying desired features in KUBE_RUNTIME_CONFIG. Changes
have also been made to e2e scripts to re-enable using
KUBE_RUNTIME_CONFIG rather than EXPERIMENTAL_API env vars.
This also introduces KUBE_ENABLE_DAEMONSETS and KUBE_ENABLE_DEPLOYMENTS.
Signed-off-by: Christian Stewart <christian@paral.in>
We can't tag ASGs, but we can see what instances are running in an ASG,
and we can match those by our tags.
So look for our running instances, and look for the ASGs that created
them, and delete those.
This can be defeated (most notably if users change the ASG size to 0),
but it is safer than other deletion methods.
By setting KUBE_SHARE_MASTER=true we reuse an existing master, rather
than creating a new one.
By setting KUBE_SUBNET_CIDR=172.20.1.0/24 you can specify the CIDR for a
new subnet, avoiding conflicts.
Both these options are documented only in kube-up and clearly marked as
'experimental' i.e. likely to change.
By combining these, you can kube-up a cluster normally, and then kube-up
a cluster in a different AZ, and the new nodes will attach to the same
master.
KUBE_SHARE_MASTER is also useful for adding a second node
auto-scaling-group, for example if you wanted to mix spot & on-demand
instances.
If the `hostname` commands used in the polling loop fail, their stdout
is going to be empty and so `getent hosts` command will actually
succeed. For the loop to work as expected, make sure the subcommands
return a string which is an invalid host name.
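
The fallback idea in isolation, as a small sketch. `failing_hostname` is a hypothetical stand-in for a `hostname` call that fails and prints nothing, and `_error_` is an arbitrary marker chosen because underscores are not valid in host names.

```shell
# Hypothetical stand-in for a `hostname` invocation that fails silently.
failing_hostname() { return 1; }

# Fall back to a string that can never be a valid host name, so a lookup
# on it fails instead of succeeding on an empty argument.
lookup_key=$(failing_hostname || echo "_error_")

echo "lookup key: ${lookup_key}"
```

In the polling loop the resulting key would then be passed to `getent hosts`, which keeps failing until the real host name resolves.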
Allows loading existing auth from kubeconfig on kube-up if a
valid KUBE_CONTEXT is specified, instead of always force
regenerating auth (basic or token) when creating a new cluster.
When KUBE_E2E_STORAGE_TEST_ENVIRONMENT is set to 'true', kube-up.sh script
will:
- Install the right packages for all storage volumes.
- Use devicemapper as docker storage backend. 'aufs', the default one on
Debian, does not support extended attributes required by Ceph RBD and Gluster
server containers.
Tested on GCE and Vagrant, e2e tests for storage volumes pass without any
additional configuration.
This ensures nfs-common is installed on GCE, and provides a more
functional explanation/example. I launched two replication controllers
so that there were busybox pods to poke around at the NFS volume, and
so that the later wget actually works (the original example would have
to work on the node, or need some other access to the container
network). After switching to two controllers, it actually makes more
sense to use PV claims, and it's probably a configuration that makes
more sense for indirection for NFS anyways.
Fixes AWS ubuntu deployment due to extra-$(uname) vs extra-virtual
package being installed. See issue #14162
Signed-off-by: Christian Stewart <christian@paral.in>
We want to match the version of netcat that is installed on GCE. We
were having problems with netcat-openbsd having slightly different
timeout behaviour (on UDP packets; when there was no listener).
The --log-level=\"debug\" flag in DOCKER_OPTS may not be correctly
interpreted in some cases. We turn on this flag only for testing
clusters. In addition to fixing the docker flag, this change
also removes the confusing numbers from the lines separating
upstart jobs.
- Refactor common and gce/upgrade.sh to use arbitrary published releases
- Update hack/get-build to use cluster/common code
- Use hack/get-build.sh in cluster upgrade test logic
This avoids either wasting CPU resources due to rounding or reporting
too much to the kubelet, which would make the e2e tests think they can
schedule more than the available resources.
This fixes https://github.com/mesosphere/kubernetes-mesos/issues/437
We need this for some tests; not all the options are fully plumbed in,
but should enable experimental/v1alpha1, as needed for jobs tests.
In particular, ENABLE_NODE_AUTOSCALER is not yet actually implemented.
This is a bit of a hack of the existing scripts, but the quickest way to get this cluster up.
Will restructure e2e.sh to do this in a more sane way in a separate PR.
This scopes down the initially ambitious PR:
https://github.com/kubernetes/kubernetes/pull/14960 to replace just
`pause` and `fluentd-elasticsearch` to come through `beta.gcr.io`.
The v2 versions have been pushed under new tags, `pause:2.0` and
`fluentd-elasticsearch:1.12`.
NOTE: `beta.gcr.io` will still serve images using v1 until they are repushed with v2. Pulls through `gcr.io` will still work after pushing through `beta.gcr.io`, but will be served over v1 (via compat logic).
Similar to #15070, we should log the distro if we're going to tell the
user we can't match it (so the user can see if they have typoed it, and
so it will hopefully be included in the error reports sent to us).
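A minimal sketch of such an error path. The function name and the distro names are placeholders, not the script's real list; the point is only that the unmatched value is echoed back.

```shell
#!/bin/bash
# Hypothetical matcher: the accepted names below are placeholders.
select_image() {
  local distro="$1"
  case "$distro" in
    debian|ubuntu)
      echo "image-for-$distro"
      ;;
    *)
      # Echo the unmatched value so typos are visible in the error output.
      echo "Cannot match OS distribution: '$distro'" >&2
      return 1
      ;;
  esac
}
```

With this shape, a typo such as `debain` shows up verbatim in the error message and in any log the user forwards.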
The current timeout of 5 seconds is needlessly short, given that we
fail kube-up if the (eventually consistent?) bucket creation takes
longer.
Raise it to 120 seconds.
Possibly related to issue #14278
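For illustration, a generic wait loop with a configurable limit could look like the sketch below. `wait_for` and `bucket_exists` are hypothetical names; the real script may rely on a tool's built-in timeout instead.

```shell
#!/bin/bash
# Retry a command once per second until it succeeds or the timeout
# (in seconds) expires. Returns 0 on success, 1 on timeout.
wait_for() {
  local timeout="$1"; shift
  local deadline=$(( $(date +%s) + timeout ))
  until "$@"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1
    fi
    sleep 1
  done
  return 0
}

# Usage with the 120-second limit; bucket_exists is a hypothetical check:
#   wait_for 120 bucket_exists "$bucket" || echo "bucket not ready" >&2
```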
The new version fixes the problem with missing metrics.
The new config decreases load on GCM/InfluxDB.
Increased stats resolution from default 5s to 30s.
Decreased sink frequency from 2m to 1m.
Changed the provision-* functions in util.sh so that every flannel interface is created after the etcd service starts, and reconfDocker.sh with the "i" option is launched after every flannel interface is up and running.
OpenContrail is open-source networking software that provides virtualization support for the cloud.
This change-set adds the ability to install and provision OpenContrail software for networking in a Kubernetes-based cloud environment.
There are basically 3 components:
o kube-network-manager -- plugin between contrail components and kubernetes components
o provision_master.sh -- OpenContrail software installer and provisioner on the master node
o provision_minion.sh -- OpenContrail software installer and provisioner on the minion node(s)
These are driven via salt configuration files.
One can provision OpenContrail by just setting "export NETWORK_PROVIDER=opencontrail".
Optionally, OPENCONTRAIL_TAG and OPENCONTRAIL_KUBERNETES_TAG can be used to
specify the opencontrail and contrail-kubernetes software versions to install and provision.
The public-IP subnet provided by contrail can be configured via the OPENCONTRAIL_PUBLIC_SUBNET
environment variable.
At this moment, the plan is to add support for AWS, GCE and Vagrant based platforms.
For more information on contrail-kubernetes, please visit https://github.com/juniper/contrail-kubernetes
For more information on opencontrail, please visit http://www.opencontrail.org
In order to make the etcd instances of the VMs join into a single cluster,
we used to use the discovery mechanism.
This made the cluster bootstrap dependent on an external etcd cluster instance.
74601ea replaced the dependency on discovery.etcd.io by a local etcd cluster.
This change completely gets rid of the dynamic discovery mechanism in favor
of the static configuration method.
This should be both safe and light since it completely removes the need for
an external etcd cluster running somewhere (either discovery.etcd.io, or locally).
the firewall rules created in e2e tests. This allows the teardown
code to run without needing to inspect the managed instance group
for the cluster (which may no longer exist) and should make e2e
teardown much more resilient.
Since skydns is created in namespace 'kube-system' and kubernetes service is created in namespace 'default', if busybox is created in namespace 'kube-system' then nslookup will work with 'kubernetes.default'.
changes to fluent-plugin-google-cloud to attach Kubernetes metadata to
logs.
Along with this, separate logs from containers in the cluster out from
logs from the daemons running on the node by instantiating two instances
of the output plugin, one which uses the new metadata (for containers)
and one which doesn't (for things like docker and the kubelet).
Upstart monitors the process of docker, kubelet, and kube-proxy.
This change adds an upstart job running as daemon to conduct
non-PID health monitoring.
etcd was constantly restarting with too many open files until I gave it more room on Ubuntu 14.04
https://gist.github.com/darron/2aadb8f30f3dd6f580bf
This is a more sensible default - but it may not be enough depending on how many minion nodes there are.
We were splitting the aufs storage into docker & kubernetes areas, but
the kubernetes area was filling up very quickly because empty volumes
went on there, and I had originally not sized it big enough for that.
Instead, create one volume for both so they can share space freely. We
can't do this for devicemapper, but that configuration seems to be
deprecated by Docker anyway.
* Using Fedora 21 as the base box
* Discover the active network interfaces in the box to avoid hardcoding
them in configuration.
* Use the master IP for the certificate.
gunk when installing the google-fluentd agent.
Also let it log things by not redirecting to a file within the container
and only using -q (warning logs only) rather than -qq (error logs only).
This registry can be accessed through proxies that run on each node
listening on port 5000. We send the proxy images to the nodes directly
to avoid requests that hit the network during cluster launch. For now,
we continue to pull the registry itself over the network, especially
given its large size (we should be able to dramatically shrink the
image). On GCE we create a PD and use that for storage, otherwise we
use an emptyDir. The registry is not enabled outside of GCE. All
communication is currently plain HTTP. In order to use SSL, we will
need to be able to request a certificate/key from the apiserver signed
by the apiserver's CA cert.
- Remove unused MESOS_DOCKER_IMAGE_DIR
mesos-slave-dind handles recursive mounting internally now
- Extract docker-compose exec to a function.
Avoids export pollution.
Avoids compose file path as a global var.
- Localize some function variables.
- Validate existence of docker & docker-compose
- Improve user account creation output
It is for running nodes on Ubuntu images up to 14.04 LTS (Trusty).
The change for running the master on Ubuntu will be added later.
The configuration consists of several upstart jobs, which are
passed to node instances through GCE metadata and parsed by cloud-init.
- Generate CA & API Server SSL key/cert in keygen docker image
- Refactor SSL generation
- Generate service account key & user files on local machine
- Enable kube-up to be run in a container (kubernetes-mesos-test)
- Add timeout env vars
- Pull docker images up front to avoid timeouts
- Remove docker image builds from test-setup
- Nuke logs dir before each kube-up
- Make run_in_docker work without KUBECONFIG defined
- Fix temp dir cleanup
- Add auth mount env var
- Default to $HOME/tmp/kubernetes/auth
- Outside of repo (which gets docker mounted when using kubernetes-mesos-test)
- Inside $HOME (which gets vm mounted when using docker-machine or boot2docker)
- Add log dump dir env var
  - Default to $HOME/tmp/kubernetes/logs (for consistency with auth dir)
- Enable errtrace
- Increase log level to aid CI debugging
The default Fedora 21 image requires some manual networking fixup that
breaks Fedora 22. This change ensures that the fixup in question is run
only for Fedora 21.
Variables $ENABLE_CLUSTER_MONITORING and $ENABLE_CLUSTER_UI are currently set in cluster/vagrant/config-default.sh but are not passed to the master VM. Therefore, cluster/saltbase/salt/kube-addons/init.sls does not have these variables, and the add-ons cannot be enabled.
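One way to hand such settings to the VM is to emit them as shell assignments and prepend them to the provisioning script. The sketch below assumes that approach; `master_env` is a hypothetical helper, not a function from the Vagrant scripts.

```shell
#!/bin/bash
# Hypothetical helper: print the add-on switches as shell assignments so
# they can be prepended to the script that provisions the master VM.
master_env() {
  echo "ENABLE_CLUSTER_MONITORING='${ENABLE_CLUSTER_MONITORING:-}'"
  echo "ENABLE_CLUSTER_UI='${ENABLE_CLUSTER_UI:-}'"
}

# Example (paths are placeholders):
#   { master_env; cat provision-master.sh; } > /tmp/master-start.sh
```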
Systemd doesn't do variable substitution on the name of the command to run, so we have to install
rkt to a directory with an absolute literal path that we can reference with environment variables.
Initialize global variable MINION_IPS in setClusterInfo function.
MINION_IPS is defined as a global variable, and each node IP is appended to it.
When setClusterInfo is called more than once, the values accumulate.
For example, you end up with MINION_IPS=192.168.0.2,192.168.0.3,192.168.0.2,192.168.0.3, which is obviously wrong.
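A minimal sketch of the fix, with a simplified function body. The `nodes` list and its `user@ip` format are assumptions for this example; the important line is the reset at the top of the function.

```shell
#!/bin/bash
# Hypothetical node list in user@ip form; the real script reads its own config.
nodes="vcap@192.168.0.2 vcap@192.168.0.3"

setClusterInfo() {
  # Reset the global so repeated calls do not append duplicates.
  MINION_IPS=""
  local node ip
  for node in $nodes; do
    ip="${node#*@}"
    if [ -z "$MINION_IPS" ]; then
      MINION_IPS="$ip"
    else
      MINION_IPS="$MINION_IPS,$ip"
    fi
  done
}

setClusterInfo
setClusterInfo
echo "$MINION_IPS"   # prints 192.168.0.2,192.168.0.3
```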
Update util.sh
separated from the apiserver running locally on the master node so that it
can be optionally enabled or disabled as needed.
Also, fix the healthchecking configuration for the master components, which
was previously only working by coincidence:
If a kubelet doesn't register with a master, it never bothers to figure out
what its local address is. In which case it ends up constructing a URL like
http://:8080/healthz for the http probe. This happens to work on the master
because all of the pods are using host networking and explicitly binding to
127.0.0.1. Once the kubelet is registered with the master and it determines
the local node address, it tries to healthcheck on an address where the pod
isn't listening and the kubelet periodically restarts each master component
when the liveness probe fails.
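The URL construction can be illustrated with a small sketch. `probe_url` is a hypothetical helper; pinning the probe to 127.0.0.1 is shown as one plausible remedy and is an assumption, since the exact configuration change is not spelled out here.

```shell
#!/bin/bash
# Build a liveness probe URL from a host and a port. An empty host
# reproduces the accidental "http://:8080/healthz" form.
probe_url() {
  local host="$1" port="$2"
  echo "http://${host}:${port}/healthz"
}

probe_url "" 8080            # unregistered kubelet: no node address known
probe_url "127.0.0.1" 8080   # probe pinned to the address the pods bind to
```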
When deploying Kubernetes using Ubuntu's script, the value of the configuration item `DOCKER_OPTS` is not set in `/etc/default/docker`.
This commit fixes this bug.
Currently make-ca-cert.sh uses (equiv of)
mktemp -d --tmpdir kube.XXXXX
but --tmpdir is not a valid option on OS X. Switch to
mktemp -d -t kube.XXXXX
which is valid, but subtly different between OS X and Linux. The
directory you get back will be different on each.
Linux: ${tmpdir}/kube.y5Bsu/
OS X: ${tmpdir}/kube.XXXXX.VQ81oOui/
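Because the generated name differs per platform, callers should capture the path that `mktemp` prints rather than reconstruct it. A minimal sketch follows; the helper name is hypothetical, and the template uses six X's, the length mkdtemp(3) expects, instead of the five shown above.

```shell
#!/bin/bash
# Hypothetical wrapper around the portable form of the call.
make_kube_tmpdir() {
  mktemp -d -t kube.XXXXXX
}

# Always use the returned path; never assume its exact shape.
tmpdir="$(make_kube_tmpdir)"
echo "working in $tmpdir"
# ... use "$tmpdir" ..., then clean up with: rm -rf "$tmpdir"
```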
Instead of hard-coding kube-cert and /srv/kubernetes, allow these to be
overridden by environment variables. / is immutable on some systems,
so /srv is not a possible location to store data there.
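The override can be expressed with default-value parameter expansion. In this sketch the variable names `CERT_DIR` and `CERT_GROUP` and the helper `cert_settings` are assumptions chosen for illustration.

```shell
#!/bin/bash
# Print the effective certificate directory and group. The environment
# variables win when set; otherwise the previous hard-coded values apply.
cert_settings() {
  echo "${CERT_DIR:-/srv/kubernetes} ${CERT_GROUP:-kube-cert}"
}

cert_settings
```

On a system with an immutable root, running with `CERT_DIR=/var/lib/kubernetes` would then redirect the output without editing the script.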
Not every cluster can be validated the same way. Factoring out the
validate-cluster call into a kube-util.sh function allows customization.
This allows GoogleCloudPlatform/kubernetes#10049 to proceed before
the mid/long-term unified cluster validation in GoogleCloudPlatform/kubernetes#11908
is implemented. Otherwise, the latter blocks the former.
When executing kube-up on an Ubuntu cluster I'm getting the following error:
bash: /root/kube/make-ca-cert: No such file or directory
Removed line as it is invalid and is duplicated by another line.
This will allow more successful kube-up.sh executions, since kube-apiserver doesn't start on the first try after etcd first starts up, possibly due to the lack of resources on my server.
The AWS API requires a signature on method calls, including the
timestamp to prevent replay attacks. A time drift of up to 5 minutes
between client and server is tolerated.
However, if the client clock drifts by >5 minutes, the server will start
to reject API calls (with the cryptic "AWS was not able to validate the
provided access credentials").
To prevent this happening, we install ntp on all nodes.
Fix #11371
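The tolerance window can be illustrated with a small check. This is only a sketch of the arithmetic: the actual change installs ntp, and `drift_exceeds_limit` is a hypothetical helper.

```shell
#!/bin/bash
# Maximum tolerated skew in seconds, per the 5-minute window described above.
MAX_DRIFT_SECONDS=300

# Succeeds (returns 0) when the two epoch timestamps differ by more than
# the tolerated skew, i.e. when signed requests would be rejected.
drift_exceeds_limit() {
  local client="$1" server="$2"
  local diff=$(( client - server ))
  if [ "$diff" -lt 0 ]; then
    diff=$(( -diff ))
  fi
  [ "$diff" -gt "$MAX_DRIFT_SECONDS" ]
}

# Example: compare the local clock against a server-provided timestamp.
#   drift_exceeds_limit "$(date +%s)" "$server_epoch" && echo "clock drift too large" >&2
```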
Previously we would rely on the s3 bucket's region being configured
correctly, at least for the existence check. By querying for the bucket
region and then going direct to the correct region, we avoid errors and
we avoid potential eventual consistency problems.
May be related to issue: #12109
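A sketch of the region lookup, assuming the AWS CLI is used. The helper `region_from_location` is hypothetical; it maps the bucket's location constraint to a region name, treating a missing constraint as us-east-1 and the legacy value `EU` as eu-west-1.

```shell
#!/bin/bash
# Map a bucket's LocationConstraint value to a region name.
region_from_location() {
  case "$1" in
    ""|None|null) echo "us-east-1" ;;
    EU)           echo "eu-west-1" ;;
    *)            echo "$1" ;;
  esac
}

# Intended use (needs the AWS CLI and credentials, so it is not run here):
#   loc="$(aws s3api get-bucket-location --bucket "$bucket" \
#            --query LocationConstraint --output text)"
#   aws --region "$(region_from_location "$loc")" s3 ls "s3://$bucket"
```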
When we deploy Kubernetes using Ubuntu's script:
1. First we set the roles "ai i i" and NUM_MINIONS=3, and it runs as expected.
2. Then we change the roles to "a i i" and NUM_MINIONS=2, and it does not run successfully.
This is because history files are left over from the previous deployment.
This commit deletes those files when stopping the cluster.
Primary motivation: enable GKE and other cluster-as-a-service folks to
easily run additional logic on the master without having to modify salt
or SSH to the master after it's been created.
kube-apiserver.service has 'ExecStartPre=/usr/bin/mkdir -p /var/lib/kube-apiserver', but if the server is not fast enough, 'mv /home/core/known_tokens.csv /var/lib/kube-apiserver/known_tokens.csv' will fail.
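One possible remedy, shown only as a sketch, is to create the target directory in the same step that moves the file, so the move no longer depends on the unit's ExecStartPre having run. `install_tokens` is a hypothetical helper; this message does not say which fix was applied.

```shell
#!/bin/bash
# Move the token file into place, creating the target directory first.
install_tokens() {
  local src="$1" dest_dir="$2"
  mkdir -p "$dest_dir" && mv "$src" "$dest_dir/known_tokens.csv"
}

# Example:
#   install_tokens /home/core/known_tokens.csv /var/lib/kube-apiserver
```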
This was originally submitted to pick up v0.3.1 of the cloud logging
plugin which had a fix for the name 'metadata' failing to resolve.
Since new releases of google-fluentd have this fix, it is no longer
required.
I've done some additional testing of 'gem update' behavior in the interim
and I think it is ok to use in targeted situations, but we should not be
doing an unconstrained update in general. The issue is that updating a
gem may bring new dependencies, some of those dependencies may include
native code, so it may try to launch a compiler, which isn't desirable
and is prone to failure.
If we do need to grab an updated gem in the future we should specify an
explicit version and the --minimal-deps flag.
KUBERNETES=libvirt-coreos cluster/kube-up.sh produced the following error:
cluster/../cluster/libvirt-coreos/../../cluster/common.sh: line 83: user_args[@]: unbound variable
This happened because a libvirt-coreos cluster runs on local VMs and has no authentication mechanism,
which left user_args in common.sh unset.
For libvirt-coreos, having no authentication token is in fact expected.
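A common bash idiom for this situation is sketched below. It is not necessarily the change made here, and `count_args` is a hypothetical helper used only to show how many words the expansion produces.

```shell
#!/bin/bash
set -u

# Print the number of arguments received.
count_args() { echo "$#"; }

user_args=()   # empty, as when no authentication is configured

# Expands to nothing when the array is empty or unset, and to all of its
# elements otherwise, without tripping "unbound variable" under set -u.
count_args ${user_args[@]+"${user_args[@]}"}   # prints 0

user_args=("--flag" "value with space")
count_args ${user_args[@]+"${user_args[@]}"}   # prints 2
```

The inner quoting keeps each element as a single word, so values containing spaces survive intact.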
The background for this change is in #9675.
In short, Vivid Vervet gives us a supported/updated image,
that runs Docker with a working storage engine, but doesn't
require a reboot as part of node start.
Fixes #9675.