Commit Graph

39 Commits

fabriziopandini
9aec633c40 Fix tests 2018-10-04 13:20:11 +02:00
k8s-ci-robot
e7eb26919b
Merge pull request #68749 from liztio/renew-etcd-certs
Renew certificates as part of upgrade rather than recreating them
2018-09-18 10:11:02 -07:00
liz
c2a93cbe06
Renew certificates as part of upgrade rather than recreating them 2018-09-17 13:24:34 -04:00
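
The key idea in this change: renewal re-signs the existing certificate with the same CA instead of generating a brand-new identity. A minimal sketch of that approach in Go (hypothetical helper, not the actual kubeadm code path):

```go
package renew

import (
	"crypto"
	"crypto/rand"
	"crypto/x509"
	"math/big"
	"time"
)

// RenewCert re-signs an existing certificate with the given CA, keeping the
// subject, public key, and SANs intact but issuing a fresh validity window.
// Sketch only; kubeadm's real renewal logic lives in its certs phase.
func RenewCert(oldCert *x509.Certificate, pubKey crypto.PublicKey,
	caCert *x509.Certificate, caKey crypto.Signer) (*x509.Certificate, error) {
	serial, err := rand.Int(rand.Reader, new(big.Int).Lsh(big.NewInt(1), 128))
	if err != nil {
		return nil, err
	}
	tmpl := x509.Certificate{
		SerialNumber: serial,
		Subject:      oldCert.Subject,  // identity is preserved...
		DNSNames:     oldCert.DNSNames, // ...and so are the SANs
		IPAddresses:  oldCert.IPAddresses,
		KeyUsage:     oldCert.KeyUsage,
		ExtKeyUsage:  oldCert.ExtKeyUsage,
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().Add(365 * 24 * time.Hour), // fresh lifetime (assumed one year)
	}
	der, err := x509.CreateCertificate(rand.Reader, &tmpl, caCert, pubKey, caKey)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}
```
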
Lubomir I. Ivanov
fb365768e0 kubeadm: update MinimumControlPlaneVersion to v1.11.0
Update MinimumControlPlaneVersion to v1.11.0. Also update related
unit tests and test configurations.
2018-09-15 05:26:40 +03:00
Lucas Käldström
5224551fa1 kubeadm: Split out ClusterConfiguration from InitConfiguration
Trivial rebase; fixed some broken tests
and inserted some TODOs: Rostislav M. Georgiev <rostislavg@vmware.com>
2018-08-22 11:43:02 +03:00
liz
394e6b554a
Yank out a bunch of manual tests and prose
`phase certs` and upgrade commands now all use certslist infra
2018-08-20 15:21:08 -04:00
Lucas Käldström
52f0591ad9
Automated rename from MasterConfiguration to InitConfiguration 2018-07-09 04:55:02 +03:00
Lucas Käldström
c9b52ede7e
Automated bump from v1alpha2 references to v1alpha3 2018-07-04 14:07:53 +03:00
Lucas Käldström
5d96a719fb
kubeadm: Fix a couple of small-ish bugs for v1.11 2018-06-12 18:59:34 +03:00
liz
6ed91fc07c
Save kubeadm manifest backup directories
Previously, when kubeadm upgraded a static pod cluster, the old manifests
were deleted. This patch alters that behaviour so they are now stored in a
timestamped temporary directory.
2018-05-31 14:41:47 -04:00
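
A sketch of the timestamped-backup behaviour described above (directory layout and helper name are assumptions, not the exact kubeadm implementation):

```go
package upgrade

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// backupManifests moves the old static pod manifests into a timestamped
// directory instead of deleting them, so a failed upgrade can be rolled back.
func backupManifests(manifestDir string) (string, error) {
	backupDir := filepath.Join("/etc/kubernetes/tmp",
		fmt.Sprintf("kubeadm-backup-manifests-%s", time.Now().Format("2006-01-02-15-04-05")))
	if err := os.MkdirAll(backupDir, 0700); err != nil {
		return "", err
	}
	components := []string{"kube-apiserver.yaml", "kube-controller-manager.yaml",
		"kube-scheduler.yaml", "etcd.yaml"}
	for _, name := range components {
		src := filepath.Join(manifestDir, name)
		if _, err := os.Stat(src); os.IsNotExist(err) {
			continue // component not present, e.g. external etcd
		}
		if err := os.Rename(src, filepath.Join(backupDir, name)); err != nil {
			return "", err
		}
	}
	return backupDir, nil
}
```
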
Lucas Käldström
fd47f8b20c
Update unit tests to use the new NodeRegistration object 2018-05-29 17:52:10 +03:00
Lucas Käldström
099e60b1db
kubeadm: Refactor the .Etcd substruct in the v1alpha2 API 2018-05-23 21:13:32 +03:00
Lucas Käldström
5687f652db
kubeadm: Remove .AuthorizationModes in the v1alpha2 API 2018-05-21 08:49:12 +03:00
Lucas Käldström
adb60f4064
kubeadm: Remove the .CloudProvider configuration option 2018-05-16 15:46:34 +01:00
Lucas Käldström
e28242a245
autogenerated move to reference the v1alpha2 API inside of kubeadm 2018-05-16 09:59:41 +01:00
Kubernetes Submit Queue
5788d4de1f
Merge pull request #63495 from detiber/external_etcd_upgrade
Automatic merge from submit-queue (batch tested with PRs 63792, 63495, 63742, 63332, 63779). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

kubeadm - fix upgrades with external etcd

**What this PR does / why we need it**:

- Allow for upgrade plan and upgrade apply to work with external etcd
  - https://github.com/kubernetes/kubeadm/issues/727
  - https://github.com/kubernetes/kubernetes/pull/62141

- Update upgrade plan output when configured for external etcd
  - Move etcd to a separate section and show available upgrades

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes https://github.com/kubernetes/kubeadm/issues/727

**Release note**:
```release-note
kubeadm upgrade now supports external etcd setups again
```

I created a gist documenting the manual testing I've been doing for this PR here: https://gist.github.com/detiber/e18d907c41901fbb5e12ffa1af5750f8
2018-05-15 09:04:20 -07:00
Jason DeTiberus
f40b7f389e
kubeadm - fix external etcd upgrades
- Update upgrade plan output when configured for external etcd
  - Move etcd to a separate section and show available upgrades
2018-05-14 20:51:20 -04:00
Craig Tracey
ac1e940c75
Support kubeadm upgrade with remote etcd cluster
Currently kubeadm only performs an upgrade if the etcd cluster is
colocated with the control plane node. As this is only one possible
configuration, kubeadm should support upgrades with etcd clusters
that are not local to the node.

Signed-off-by: Craig Tracey <craigtracey@gmail.com>
2018-05-14 20:40:57 -04:00
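
The common thread in the three commits above is branching on whether etcd is external before touching the local etcd static pod. A hedged sketch of that decision (the config shape here is illustrative, not kubeadm's real API type):

```go
package upgrade

// Config mirrors only the part of kubeadm's configuration this sketch needs:
// external etcd is signalled by a non-empty list of client endpoints.
type Config struct {
	EtcdEndpoints []string // set only when etcd is managed outside the node
}

// upgradeEtcdIfLocal rotates the local etcd static pod, but skips etcd
// entirely when it is external: `kubeadm upgrade plan` then reports etcd in
// its own section instead of trying to upgrade it.
func upgradeEtcdIfLocal(cfg *Config) error {
	if len(cfg.EtcdEndpoints) > 0 {
		// External etcd is upgraded by its own operator, not by kubeadm.
		return nil
	}
	// ... write the new local etcd static pod manifest here ...
	return nil
}
```
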
Lucas Käldström
80a31d7a5a
Stop installing kubeadm types in the generic, legacy scheme 2018-05-14 18:11:30 +01:00
Lucas Käldström
68c68dfadc
Rename kubeadmapiext to the more explicit kubeadmapiv1alpha1 2018-05-14 12:31:48 +03:00
Jason DeTiberus
4c768bb2ca [kubeadm] Add etcd L7 check on upgrade
- Adds L7 check for kubeadm etcd static pod upgrade
2018-04-24 09:56:35 -06:00
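
"L7 check" here means an application-level HTTPS request to etcd's /health endpoint rather than a bare TCP connect. A sketch using a client certificate (endpoint and file paths are assumptions):

```go
package etcdcheck

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"io/ioutil"
	"net/http"
	"time"
)

// checkEtcdHealth calls etcd's /health endpoint over TLS with a client cert,
// proving the full HTTP serving path works before an upgrade proceeds.
func checkEtcdHealth(endpoint, caFile, certFile, keyFile string) error {
	caPEM, err := ioutil.ReadFile(caFile)
	if err != nil {
		return err
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return fmt.Errorf("could not parse CA certificate %q", caFile)
	}
	clientCert, err := tls.LoadX509KeyPair(certFile, keyFile)
	if err != nil {
		return err
	}
	client := &http.Client{
		Timeout: 10 * time.Second,
		Transport: &http.Transport{TLSClientConfig: &tls.Config{
			RootCAs:      pool,
			Certificates: []tls.Certificate{clientCert},
		}},
	}
	resp, err := client.Get(endpoint + "/health") // e.g. https://127.0.0.1:2379
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	body, _ := ioutil.ReadAll(resp.Body)
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("etcd reported unhealthy: %s", body)
	}
	return nil
}
```
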
leigh schrandt
4a37e05665 [kubeadm] Update test-case, fix nil-pointer bug, and improve error message 2018-04-24 09:55:56 -06:00
leigh schrandt
99a1143676 [kubeadm] Implement etcdutils with Cluster.HasTLS()
- Test HasTLS()
- Instrument throughout upgrade plan and apply
- Update plan_test and apply_test to use new fake Cluster interfaces
- Add descriptions to upgrade range test
- Support KubernetesDir and EtcdDataDir in upgrade tests
- Cover etcdUpgrade in upgrade tests
- Cover upcoming TLSUpgrade in upgrade tests
2018-04-24 09:55:51 -06:00
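
Cluster.HasTLS() can be read as "does the etcd command line carry TLS flags?". A simplified sketch of that check (the real implementation may inspect more flags):

```go
package etcdutil

import "strings"

// hasTLS reports whether an etcd invocation enables client-facing TLS by
// looking for the standard etcd cert/key flags among its arguments.
func hasTLS(etcdArgs []string) bool {
	seen := map[string]bool{"--cert-file": false, "--key-file": false}
	for _, arg := range etcdArgs {
		for flag := range seen {
			if strings.HasPrefix(arg, flag+"=") {
				seen[flag] = true
			}
		}
	}
	return seen["--cert-file"] && seen["--key-file"]
}
```
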
Jason DeTiberus
d55d1b6fbe [kubeadm] fix mirror-pod hash race condition
- Update kubeadm static pod upgrades to use the
  kubetypes.ConfigHashAnnotationKey annotation on the mirror pod rather
  than generating a hash from the full object info. Previously, a status
  update for the pod would allow the upgrade to proceed before the
  new static pod manifest was actually deployed.

Signed-off-by: Jason DeTiberus <detiber@gmail.com>
2018-04-20 18:32:03 -06:00
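
The fix boils down to comparing the kubelet-set manifest hash annotation instead of hashing the whole mirror pod object, which changes on every status update. A sketch of that comparison (the annotation key is the real kubetypes.ConfigHashAnnotationKey value; the wait helper is illustrative):

```go
package upgrade

import (
	"time"

	v1 "k8s.io/api/core/v1"
)

// configHashAnnotationKey is the value of kubetypes.ConfigHashAnnotationKey:
// the kubelet stamps each mirror pod with a hash of its static pod manifest.
const configHashAnnotationKey = "kubernetes.io/config.hash"

// waitForManifestChange polls until the mirror pod's manifest hash differs
// from the pre-upgrade hash. Status-only updates keep the same hash, so the
// race described above can no longer let the upgrade proceed early.
func waitForManifestChange(getPod func() (*v1.Pod, error), oldHash string,
	interval, timeout time.Duration) bool {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		if pod, err := getPod(); err == nil &&
			pod.Annotations[configHashAnnotationKey] != oldHash {
			return true // the new static pod manifest has been picked up
		}
		time.Sleep(interval)
	}
	return false
}
```
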
leigh schrandt
7a1a3aa3df Generate client certificates for healthchecking kubeadm etcd static pods
Add new phase command: `certs etcd-healthcheck`
Certs are placed at /etc/kubernetes/pki/etcd/healthcheck-client.{crt,key}
2018-03-04 19:25:16 -07:00
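
The healthcheck identity is an ordinary client-auth certificate signed by the etcd CA. A sketch of its template (the common name and one-year lifetime are assumptions):

```go
package certs

import (
	"crypto/x509"
	"crypto/x509/pkix"
	"math/big"
	"time"
)

// etcdHealthcheckClientTemplate sketches the certificate written to
// /etc/kubernetes/pki/etcd/healthcheck-client.{crt,key}: client-auth only,
// signed by the dedicated etcd CA rather than the cluster CA.
func etcdHealthcheckClientTemplate() *x509.Certificate {
	return &x509.Certificate{
		SerialNumber: big.NewInt(1), // use a random serial in real code
		Subject:      pkix.Name{CommonName: "kube-etcd-healthcheck-client"},
		KeyUsage:     x509.KeyUsageDigitalSignature | x509.KeyUsageKeyEncipherment,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
		NotBefore:    time.Now(),
		NotAfter:     time.Now().Add(365 * 24 * time.Hour),
	}
}
```
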
leigh schrandt
41974cb91f Fix typos 2018-02-27 17:56:16 -07:00
leigh schrandt
2d9b2d9fef Switch to a dedicated CA for kubeadm etcd identities 2018-02-27 17:42:43 -07:00
leigh schrandt
f61430d7c8 Fix typos
- Fix typos in tests for upgrade phase
- Rename loadCertificateAuthorithy() --> loadCertificateAuthority()
- Disambiguate apiKubeletClientCert & apiEtcdClientCert
- Parameterize hard-coded certs_test config + log tempCertsDir
2018-02-23 17:05:43 -07:00
leigh schrandt
f5e11a0ce0 Change SANs for etcd serving and peer certs
- Place etcd server and peer certs & keys into pki subdir
- Move certs.altName functions to pkiutil + add appendSANstoAltNames()
    Share the append logic for the getAltName functions as suggested by
    @jamiehannaford.
    Move functions/tests to certs/pkiutil as suggested by @luxas.

    Update Bazel BUILD deps

- Warn when an APIServerCertSANs or EtcdCertSANs entry is unusable
- Add MasterConfiguration.EtcdPeerCertSANs
- Move EtcdServerCertSANs and EtcdPeerCertSANs under MasterConfiguration.Etcd
2018-02-23 17:05:39 -07:00
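
appendSANstoAltNames() amounts to classifying each configured SAN as an IP address or a DNS name and warning about anything unusable. A simplified sketch of the shared helper:

```go
package pkiutil

import (
	"fmt"
	"net"

	"k8s.io/apimachinery/pkg/util/validation"
)

// AltNames mirrors the struct kubeadm uses to collect certificate SANs.
type AltNames struct {
	DNSNames []string
	IPs      []net.IP
}

// appendSANsToAltNames sorts each entry into IPs or DNS names and warns when
// an APIServerCertSANs/EtcdCertSANs entry is neither. Simplified sketch.
func appendSANsToAltNames(altNames *AltNames, sans []string, certName string) {
	for _, san := range sans {
		if ip := net.ParseIP(san); ip != nil {
			altNames.IPs = append(altNames.IPs, ip)
		} else if len(validation.IsDNS1123Subdomain(san)) == 0 {
			altNames.DNSNames = append(altNames.DNSNames, san)
		} else {
			fmt.Printf("[certificates] WARNING: %q is not a valid IP address or DNS name, ignoring it for the %s certificate\n", san, certName)
		}
	}
}
```
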
leigh schrandt
bb689eb2bb Secure etcd API w/ TLS on kubeadm init [kubeadm/#594]
- Generate Server and Peer cert for etcd
- Generate Client cert for apiserver
- Add flags / hostMounts for etcd static pod
- Add flags / hostMounts for apiserver static pod

- Generate certs on upgrade of static-pods for etcd/kube-apiserver
- Modify logic for appending etcd flags to staticpod to be safer for external etcd
2018-02-23 16:06:55 -07:00
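
In flag terms, "secure etcd with TLS" means pointing etcd at the serving and peer cert pairs and requiring client certs on both listeners. A sketch of the flags appended to the etcd static pod (paths follow the pki/etcd layout from the commits above):

```go
package etcdpod

// etcdTLSFlags sketches the arguments added to the etcd static pod once TLS
// is enabled; these are standard etcd flags.
func etcdTLSFlags(pkiDir string) []string {
	return []string{
		"--cert-file=" + pkiDir + "/etcd/server.crt",
		"--key-file=" + pkiDir + "/etcd/server.key",
		"--client-cert-auth=true",
		"--trusted-ca-file=" + pkiDir + "/etcd/ca.crt",
		"--peer-cert-file=" + pkiDir + "/etcd/peer.crt",
		"--peer-key-file=" + pkiDir + "/etcd/peer.key",
		"--peer-client-cert-auth=true",
		"--peer-trusted-ca-file=" + pkiDir + "/etcd/ca.crt",
	}
}
```
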
Tim Hockin
3586986416 Switch to k8s.gcr.io vanity domain
This is the 2nd attempt.  The previous was reverted while we figured out
the regional mirrors (oops).

New plan: k8s.gcr.io is a read-only facade that auto-detects your source
region (us, eu, or asia for now) and pulls from the closest. To publish
an image, push to k8s-staging.gcr.io and it will be synced to the regionals
automatically (similar to today).  For now the staging is an alias to
gcr.io/google_containers (the legacy URL).

When we move off of google-owned projects (working on it), then we just
do a one-time sync, and change the google-internal config, and nobody
outside should notice.

We can, in parallel, change the auto-sync into a manual sync - send a PR
to "promote" something from staging, and a bot activates it.  Nice and
visible, easy to keep track of.
2018-02-07 21:14:19 -08:00
Tim Hockin
e9dd8a68f6 Revert k8s.gcr.io vanity domain
This reverts commit eba5b6092a.

Fixes https://github.com/kubernetes/kubernetes/issues/57526
2017-12-22 14:36:16 -08:00
Tim Hockin
eba5b6092a Use k8s.gcr.io vanity domain for container images 2017-12-18 09:18:34 -08:00
Serguei Bezverkhi
1f20a8d022 Adding etcd upgrade to kubeadm upgrade apply
List of changes:
- Refactoring staticpod and waiter functions
2017-11-18 18:47:50 -05:00
Dr. Stefan Schimanski
7773a30f67 pkg/api/legacyscheme: fixup imports 2017-10-18 17:23:55 +02:00
Kubernetes Submit Queue
e528a6e785 Merge pull request #51369 from luxas/kubeadm_poll_kubelet
Automatic merge from submit-queue (batch tested with PRs 51682, 51546, 51369, 50924, 51827)

kubeadm: Detect kubelet readiness and error out if the kubelet is unhealthy

**What this PR does / why we need it**:

In order to improve the UX when the kubelet is unhealthy or stopped, or whatever, kubeadm now polls the kubelet's API after 40 and 60 seconds, and then performs an exponential backoff for a total of 155 seconds.

If the kubelet endpoint is not returning `ok` by then, kubeadm gives up and exits.

This will mitigate at least 60% of our "[apiclient] Created API client, waiting for control plane to come up" issues in the kubeadm issue tracker 🎉, as kubeadm now informs the user what's wrong and also doesn't deadlock like before.
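
A minimal sketch of such a poll-then-backoff wait against the kubelet healthz endpoint (timings follow the description above; the helper itself is hypothetical, not the real kubeadm waiter):

```go
package kubeletcheck

import (
	"fmt"
	"net/http"
	"time"
)

// waitForKubelet probes the kubelet healthz endpoint after an initial delay,
// then retries with exponential backoff until the overall budget runs out.
func waitForKubelet(healthzURL string, initialDelay, timeout time.Duration) error {
	time.Sleep(initialDelay) // e.g. 40s before the first probe
	deadline := time.Now().Add(timeout)
	interval := 5 * time.Second
	for time.Now().Before(deadline) {
		resp, err := http.Get(healthzURL) // e.g. http://localhost:10255/healthz
		if err == nil {
			resp.Body.Close()
			if resp.StatusCode == http.StatusOK {
				return nil // the kubelet answered "ok"
			}
		}
		fmt.Println("[kubelet-check] It seems like the kubelet isn't running or healthy.")
		time.Sleep(interval)
		interval *= 2 // exponential backoff between probes
	}
	return fmt.Errorf("timed out waiting for the kubelet at %s", healthzURL)
}
```
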

Demo:
```
lucas@THEGOPHER:~/luxas/kubernetes$ sudo ./kubeadm init --skip-preflight-checks
[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[init] Using Kubernetes version: v1.7.4
[init] Using Authorization modes: [Node RBAC]
[preflight] Skipping pre-flight checks
[kubeadm] WARNING: starting in 1.8, tokens expire after 24 hours by default (if you require a non-expiring token use --token-ttl 0)
[certificates] Generated ca certificate and key.
[certificates] Generated apiserver certificate and key.
[certificates] apiserver serving cert is signed for DNS names [thegopher kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.1.115]
[certificates] Generated apiserver-kubelet-client certificate and key.
[certificates] Generated sa key and public key.
[certificates] Generated front-proxy-ca certificate and key.
[certificates] Generated front-proxy-client certificate and key.
[certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "scheduler.conf"
[controlplane] Wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[controlplane] Wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[controlplane] Wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests"
[init] This often takes around a minute; or longer if the control plane images have to be pulled.
[apiclient] All control plane components are healthy after 40.502199 seconds
[markmaster] Will mark node thegopher as master by adding a label and a taint
[markmaster] Master thegopher tainted and labelled with key/value: node-role.kubernetes.io/master=""
[bootstraptoken] Using token: 5776d5.91e7ed14f9e274df
[bootstraptoken] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstraptoken] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstraptoken] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[uploadconfig] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[addons] Applied essential addon: kube-dns
[addons] Applied essential addon: kube-proxy

Your Kubernetes master has initialized successfully!

To start using your cluster, you need to run (as a regular user):

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  http://kubernetes.io/docs/admin/addons/

You can now join any number of machines by running the following on each node
as root:

  kubeadm join --token 5776d5.91e7ed14f9e274df 192.168.1.115:6443 --discovery-token-ca-cert-hash sha256:6f301ce8c3f5f6558090b2c3599d26d6fc94ffa3c3565ffac952f4f0c7a9b2a9

lucas@THEGOPHER:~/luxas/kubernetes$ sudo ./kubeadm reset
[preflight] Running pre-flight checks
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Removing kubernetes-managed containers
[reset] Deleting contents of stateful directories: [/var/lib/kubelet /etc/cni/net.d /var/lib/dockershim /var/run/kubernetes /var/lib/etcd]
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
lucas@THEGOPHER:~/luxas/kubernetes$ sudo systemctl stop kubelet
lucas@THEGOPHER:~/luxas/kubernetes$ sudo ./kubeadm init --skip-preflight-checks
[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[init] Using Kubernetes version: v1.7.4
[init] Using Authorization modes: [Node RBAC]
[preflight] Skipping pre-flight checks
[kubeadm] WARNING: starting in 1.8, tokens expire after 24 hours by default (if you require a non-expiring token use --token-ttl 0)
[certificates] Generated ca certificate and key.
[certificates] Generated apiserver certificate and key.
[certificates] apiserver serving cert is signed for DNS names [thegopher kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.1.115]
[certificates] Generated apiserver-kubelet-client certificate and key.
[certificates] Generated sa key and public key.
[certificates] Generated front-proxy-ca certificate and key.
[certificates] Generated front-proxy-client certificate and key.
[certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "scheduler.conf"
[controlplane] Wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[controlplane] Wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[controlplane] Wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests"
[init] This often takes around a minute; or longer if the control plane images have to be pulled.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused.

Unfortunately, an error has occurred:
	timed out waiting for the condition

This error is likely caused by that:
	- The kubelet is not running
	- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
	- There is no internet connection; so the kubelet can't pull the following control plane images:
		- gcr.io/google_containers/kube-apiserver-amd64:v1.7.4
		- gcr.io/google_containers/kube-controller-manager-amd64:v1.7.4
		- gcr.io/google_containers/kube-scheduler-amd64:v1.7.4

You can troubleshoot this for example with the following commands if you're on a systemd-powered system:
	- 'systemctl status kubelet'
	- 'journalctl -xeu kubelet'
couldn't initialize a Kubernetes cluster
```

In this demo, I'm first starting kubeadm normally and everything works as usual.
In the second case, I'm explicitly stopping the kubelet so it doesn't run, and skipping preflight checks, so that kubeadm doesn't even try to exec `systemctl start kubelet` as it usually does.
That obviously results in a non-working system, but now kubeadm tells the user what the problem is instead of waiting forever.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

Fixes: https://github.com/kubernetes/kubeadm/issues/377

**Special notes for your reviewer**:

**Release note**:

```release-note
kubeadm: Detect kubelet readiness and error out if the kubelet is unhealthy
```
@kubernetes/sig-cluster-lifecycle-pr-reviews @pipejakob 

cc @justinsb @kris-nova @lukemarsden as well as you wanted this feature :)
2017-09-03 15:54:19 -07:00
Lucas Käldström
92c5997b8e
kubeadm: Detect kubelet readiness and error out if the kubelet is unhealthy 2017-09-03 18:02:46 +03:00
Lucas Käldström
b0a17d11e4
kubeadm: Add omitempty tags to nullable values and use metav1.Duration 2017-09-03 17:25:45 +03:00
Lucas Käldström
94983530d4
Add unit tests for kubeadm upgrade 2017-09-03 12:26:10 +03:00