This is mainly for pinning the 1.2 branch to GCI milestone 52
which contains correct docker and kubelet built in.
Doing this allows us to upgrade docker to v1.11 (issue #26455)
in GCI 53 without breaking the 1.2 release branch.
This change recovers some of the side effects of
https://github.com/kubernetes/kubernetes/pull/26197, i.e., keeps the defaults of
`NODE_IMAGE` and `NODE_IMAGE_PROJECT` to `MASTER_IMAGE` and
`MASTER_IMAGE_PROJECT`, for backward compatibility. Although it keeps
`OS_DISTRIBUTION` defaulting to `gci`, the default settings of these vars are
moved to `cluster/gce/util.sh` and conditioned on `OS_DISTRIBUTION==gci`.
Automatic merge from submit-queue
Make node-instance-group base names unique to prevent collisions
We create multiple IGMs for >1000 Node clusters. When we have a conflict on base name IGMs will fight over ownership of the VM that happen to have the name belonging to multiple IGMs.
This change will increase reliability of starting big clusters.
cc @wojtek-t @alex-mohr @roberthbailey @mikedanese
Automatic merge from submit-queue
GCI: Fix the condition for using the default image
This PR revises the condition for using the default GCI image. The old logic is not convenient for manually run e2e tests in some cases (mainly for GCI team to test custom images). The new logic by this PR is very similar to the logic in using ContainerVM. When setting distro to "gci", if master or node image is unset, we use gci-dev for it. If either is set, we respect it.
@roberthbailey @zmerlynn @dchen1107 please review it, and we should cherry pick it in release-1.2 branch. Thanks!
cc/ @kubernetes/goog-image @adityakali FYI
Automatic merge from submit-queue
Salt configuration for the new Cluster Autoscaler for GCE
Adds support for cloud autoscaler from contrib/cloud-autoscaler in kube-up.sh GCE script.
cc: @fgrzadkowski @piosz
Automatic merge from submit-queue
Fix detect-node-names to not error out if there are no nodes
Fixes#21564.
Teardown was not working correctly in rare cases because `detect-node-names` was failing before any of the actual cleanup was run. I'm pretty sure the issue was that there was an instance group, but no instances in the instance group, so we bailed out when we tried to expand the bash array.
This PR adds a guard so we don't bail if the array is empty.
cc @jlowdermilk @spxtr
Automatic merge from submit-queue
Add support for running clusters on GCI
Google Container-VM Image (GCI) is the next revision of Container-VM. See documentation at https://cloud.google.com/compute/docs/containers/vm-image/. This change adds support for starting a Kubernetes cluster using GCI.
With this change, users can start a kubernetes cluster using the latest kubelet and kubectl release binary built in the GCI image by running:
$ KUBE_OS_DISTRIBUTION="gci" cluster/kube-up.sh
Or run a testing cluster on GCI by running:
$ KUBE_OS_DISTRIBUTION="gci" go run hack/e2e.go -v --up
The commands above will choose the latest GCI image by default.
Automatic merge from submit-queue
jenkins: Allow configuration of release bucket
This allows others to leverage the existing E2E code to test some
patched kube binary by simply overriding the bucket and reusing many of
the existing scripts
This allows others to leverage the existing E2E code to test some
patched kube binary by simply overriding the bucket and reusing many of
the existing scripts
Automatic merge from submit-queue
Trusty: Regional release .tar.gz support
@zmerlynn and @roberthbailey please review it. This change is to support the feature added in PR #22234. The entire logic is pretty much the same as in #22234, with only few minor changes in implementation.
I had manually run e2e tests with "export RELEASE_REGION_FALLBACK=true" on two clusters: (1) Trusty on master nodes on ContainerVM; (2) Master and nodes all on trusty. All tests are green. I don't figure out a way to simulate regional fallback. But I did test the function download_or_bust() out-of-box.
cc/ @wonderfly @dchen1107 @fabioy FYI.
Allow the gcr.io/google_containers registry to be overridden
regionally by just blasting a new KUBE_ADDON_REGISTRY out. Instead of
adding every addon to Salt and asking all of the other consumers
(Trusty, Juju, Mesos, etc) to change, just script the sed ourselves.
This is probably the 9th grossest thing I've ever done, but it works
well, and it works quickly. I kind of wish it didn't.
* In kube-up.sh, create a staging bucket with a location nearest the
zone being created. If new variable RELEASE_REGION_FALLBACK is set
(default false), create multiple buckets and stage to fallback
URLs. (In open source, this path is primarily for testing.)
* In configure-vm.sh, split the URL env variables by comma (if any
extra are present) and retry on the fallback URLs. Also factor the
hash checking into this path rather than outside, since a corrupt
release in a particular geo can be retried in a different geo.
* Remove the local already-staged .tar.gz checks. They've caused
several issues along the way, and with this code path become virtually
unmaintainable. (I could add a sentinel for each bucket it's possibly
staged to, but ew.)
We adapt the existing code to work across all zones in a region.
We require a feature-flag to enable Ubernetes-Lite
Reasons:
* There are some behavioural changes if users create volumes with
the same name in two zones.
* We don't want to make one API call per zone if we're not running
Ubernetes-Lite.
* Ubernetes-Lite is still experimental.
There isn't a parallel flag implemented for AWS, because at the moment
there would be no behaviour changes from this.
This is for internal use at the moment, for testing Ubernetes Lite, but
arguably makes the code a little cleaner.
Also rename KUBE_SHARE_MASTER -> KUBE_USE_EXISTING_MASTER
Some functionality in hack/lib is currently depended on by
cluster/common.sh so kube-up from the full release tar (which
does not include hack/) is currently broken. With this PR we
create cluster/lib/ and move the necessary bits from hack/
over to get kube-up working again.
Fixes: 96d1b8d1b2
Signed-off-by: Mike Danese <mikedanese@google.com>
This change refactors the code of preparing kube-system manifests
for trusty based cluster. The manifests used by nodes do not contain
salt configuration, so we can simply copy them from the directory
cluster/saltbase/salt, make a tarball, and upload to Google Storage.
The node.yaml has some logic that will be also used by the kubernetes
master on trusty work (issue #16702). This change moves the code
shared by the master and node configuration to a separate script, and
the master and node configuration can source it to use the code.
Moreover, this change stages the script for GKE use.
Addresses #15968
This patch removes KUBE_ENABLE_EXPERIMENTAL_API and similar calls in
favor of specifying desired features in KUBE_RUNTIME_CONFIG. Changes
have also been made to e2e scripts to re-enable using
KUBE_RUNTIME_CONFIG rather than EXPERIMENTAL_API env vars.
This also introduces KUBE_ENABLE_DAEMONSETS and KUBE_ENABLE_DEPLOYMENTS.
Signed-off-by: Christian Stewart <christian@paral.in>
Allows loading existing auth from kubeconfig on kube-up if a
valid KUBE_CONTEXT is specified, instead of always force
regenerating auth (basic or token) when creating a new cluster.
- Refactor common and gce/upgrade.sh to use arbitrary published releases
- Update hack/get-build to use cluster/common code
- Use hack/get-build.sh in cluster upgrade test logic
This registry can be accessed through proxies that run on each node
listening on port 5000. We send the proxy images to the nodes directly
to avoid requests that hit the network during cluster launch. For now,
we continue to pull the registry itself over the network, especially
given its large size (we should be able to dramatically shrink the
image). On GCE we create a PD and use that for storage, otherwise we
use an emptyDir. The registry is not enabled outside of GCE. All
communication is currently plain HTTP. In order to use SSL, we will
need to be able to request a certificate/key from the apiserver signed
by the apiserver's CA cert.
It is for running nodes on Ubuntu image upto 14.04 LTS (Trusty).
The change for running master on Ubuntu will be added later.
The configuration consists of several upstart jobs, which is
passed to node instances through GCE metadata and parsed by cloud-init.
separated from the apiserver running locally on the master node so that it
can be optionally enabled or disabled as needed.
Also, fix the healthchecking configuration for the master components, which
was previously only working by coincidence:
If a kubelet doesn't register with a master, it never bothers to figure out
what its local address is. In which case it ends up constructing a URL like
http://:8080/healthz for the http probe. This happens to work on the master
because all of the pods are using host networking and explicitly binding to
127.0.0.1. Once the kubelet is registered with the master and it determines
the local node address, it tries to healthcheck on an address where the pod
isn't listening and the kubelet periodically restarts each master component
when the liveness probe fails.
* Set SHA1 for Kubernetes server binary and Salt tar in kube-env.
* Check SHA1 in configure-vm.sh. If the env variable isn't available,
download the SHA1 from GCS and double check that.
* Fixes a bug in the devel path where we were actually uploading the
wrong sha1 to the bucket.
Fixes#10021
The script allows also to push binaries only to the master or specified node.
Added support for released tars.
Introduced new push methods and implemented them for GCE.
We can't write the marker before we upload the file, otherwise anything
that interrupts the upload will leave a corrupted upload that we believe
to be current.
And adds a .sha1 cache file to indicate what file was already pushed
to GCS, and how to force it if not, removing a few seconds off a
kube-up/push if you're just cycling.
With this and #7602, all TAR_URLS will have a .sha1 as well.
Tested on GCE.
Includes untested modifications for AWS and Vagrant.
No changes for any other distros.
Probably will work on other up-to-date providers
but beware. Symptom would be that service proxying
stops working.
1. Generates a token kube-proxy in AWS, GCE, and Vagrant setup scripts.
1. Distributes the token via salt-overlay, and salt to /var/lib/kube-proxy/kubeconfig
1. Changes kube-proxy args:
- use the --kubeconfig argument
- changes --master argument from http://MASTER:7080 to https://MASTER
- http -> https
- explicit port 7080 -> implied 443
Possible ways this might break other distros:
Mitigation: there is an default empty kubeconfig file.
If the distro does not populate the salt-overlay, then
it should get the empty, which parses to an empty
object, which, combined with the --master argument,
should still work.
Mitigation:
- azure: Special case to use 7080 in
- rackspace: way out of date, so don't care.
- vsphere: way out of date, so don't care.
- other distros: not using salt.
ensure-kube-token is not needed anymore because
the token passed in kube-env.
In the up case it is set, in the push case it is an empty string
but not used.
Allow unset KUBELET_TOKEN (for push case).
Fix comment.
- Configure the apiserver to listen securely on 443 instead of 6443.
- Configure the kubelet to connect to 443 instead of 6443.
- Update documentation to refer to bearer tokens instead of basic auth.
Generates the new token on AWS, GCE, Vagrant.
Renames instance metadata from "kube-token" to "kubelet-token".
(Is this okay for GKE?)
Having separate tokens for kubelet and kube-proxy permits
using principle of least privilege, makes it easy to
rate limit the clients separately, allows annotation
of apiserver logs with the client identity at a finer grain
than just source-ip.
This is needed when we upgrade (and useful when you're trying to
change the startup script for reboots).
Along the way: allow add-instance-metadata[-from-file] to take a
variable number of KVs.
Address #6075: Shoot the master VM while saving the master-pd. This
takes a couple of minor changes to configure-vm.sh, some of which also
would be necessary for reboot. In particular, I changed it so that the
kube-token instance metadata is no longer required after inception;
instead, we mount the master-pd and see if we've already created the
known tokens file before blocking on the instance metadata.
Also partially addresses #6099 in bash by refactoring the kube-push
path.
Deletion is wonderful. The only weird thing was where to put the
message about the proxy URLs. Satnam suggested kubectl clusterinfo,
which seemed like a good option to put at the end of cluster turn-up.
IP address. The configure-vm script can resolve this relatively easily
on the node. This is less painful for GKE, which creates all the
resources in parallel.
Change provisioning to pass all variables to both master and node. Run
Salt in a masterless setup on all nodes ala
http://docs.saltstack.com/en/latest/topics/tutorials/quickstart.html,
which involves ensuring Salt daemon is NOT running after install. Kill
Salt master install. And fix push to actually work in this new flow.
As part of this, the GCE Salt config no longer has access to the Salt
mine, which is primarily obnoxious for two reasons: - The minions
can't use Salt to see the master: this is easily fixed by static
config. - The master can't see the list of all the minions: this is
fixed temporarily by static config in util.sh, but later, by other
means (see
https://github.com/GoogleCloudPlatform/kubernetes/issues/156, which
should eventually remove this direction).
As part of it, flatten all of cluster/gce/templates/* into
configure-vm.sh, using a single, separate piece of YAML to drive the
environment variables, rather than constantly rewriting the startup
script.
the master's IP upon creation to make it easier to replace the master later.
This pulls out the parts of PR #3174 that don't break anything and will
make upgrading existing clusters in the future less painful.
Add /etc/salt to the master-pd
Change the .kubeconfig context that gce kube-up creates to project
+ instance prefix, so you can spin up clusters with the same name
in different compute projects without overwriting .kubeconfig.
Adds labels to the services, waits for them to be created (which
should be instant, but just in case), query the forwarding rules like
as we did before.
Fixes#3893
This implements phase 1 of the proposal in #3579, moving the creation
of the pods, RCs, and services to the master after the apiserver is
available.
This is such a wide commit because our existing initial config story
is special:
* Add kube-addons service and associated salt configuration:
** We configure /etc/kubernetes/addons to be a directory of objects
that are appropriately configured for the current cluster.
** "/etc/init.d/kube-addons start" slurps up everything in that dir.
(Most of the difficult is the business logic in salt around getting
that directory built at all.)
** We cheat and overlay cluster/addons into saltbase/salt/kube-addons
as config files for the kube-addons meta-service.
* Change .yaml.in files to salt templates
* Rename {setup,teardown}-{monitoring,logging} to
{setup,teardown}-{monitoring,logging}-firewall to properly reflect
their real purpose now (the purpose of these functions is now ONLY to
bring up the firewall rules, and possibly to relay the IP to the user).
* Rework GCE {setup,teardown}-{monitoring,logging}-firewall: Both
functions were improperly configuring global rules, yet used
lifecycles tied to the cluster. Use $NODE_INSTANCE_PREFIX with the
rule. The logging rule needed a $NETWORK specifier. The monitoring
rule tried gcloud describe first, but given the instancing, this feels
like a waste of time now.
* Plumb ENABLE_CLUSTER_MONITORING, ENABLE_CLUSTER_LOGGING,
ELASTICSEARCH_LOGGING_REPLICAS and DNS_REPLICAS down to the master,
since these are needed there now.
(Desperately want just a yaml or json file we can share between
providers that has all this crap. Maybe #3525 is an answer?)
Huge caveats: I've gone pretty firm testing on GCE, including
twiddling the env variables and making sure the objects I expect to
come up, come up. I've tested that it doesn't break GKE bringup
somehow. But I haven't had a chance to test the other providers.
Removed auth for Grafana to facilitate usage via service proxy on the api-server.
Added a grafana service
Removed elasticsearch dependency for monitoring - faster startup times.
Removed auth for Grafana to facilitate usage via service proxy on the api-server.
Added a grafana service
Removed elasticsearch dependency for monitoring - faster startup times.
persistent disk when running on GCE.
I'll follow up soon with a second PR that enables kube-push to
completely bring down the master VM and replace it with a new one.
Also make downloading more reliable and run 'highstate' after install for good measure. As part of this we no longer use gsutil to download and have to make 'staged' binaries in GCS publicly readable.
This lets us blow away salt files and replace them with a new version while keeping a tree of "overlay" files that are cluster specific and generated at cluster up time.
Fixes#1783
Otherwise KUBE_MASTER_IP and KUBE_MINION_IP_ADDRESSES may contain 'external-ip'
$ detect-master
Using master: kubernetes-master (external IP: external-ip)'