If the kube-proxy/dns ConfigMaps are missing, show warnings and assume
that these addons were skipped during "kubeadm init",
and that their redeployment on upgrade is not desired.
TODO: remove this once "kubeadm upgrade apply" phases are supported:
https://github.com/kubernetes/kubeadm/issues/1318
The TLS bootstrapping timeout is increased to 5 minutes with a retry
once every 5 seconds. Failing fast if the kubelet is not healthy is also
preserved.
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
kubeadm uses image tags (such as `v3.4.3-0`) to specify the version of
etcd. However, the upgrade code in kubeadm uses the etcd client API to
fetch the currently deployed version. The result contains only the etcd
version without the additional information (such as image revision) that
is normally found in the tag. As a result it would refuse an upgrade
where the etcd versions match and the only difference is the image
revision number (`v3.4.3-0` to `v3.4.3-1`).
To fix the above issue, the following changes are made:
- Replace the existing etcd version querying code, which uses the etcd
client library, with code that returns the etcd image tag from the
local static pod manifest file (see the sketch after this list).
- If an etcd `imageTag` is specified in the ClusterConfiguration during
upgrade, use that tag instead. This is done regardless of whether the tag
was specified in the configuration stored in the cluster or in a new
configuration supplied via the `--config` command line parameter.
If no custom tag is specified, kubeadm will select one depending on
the desired Kubernetes version.
- `kubeadm upgrade plan` no longer prints upgrade information about
external etcd. It's the user's responsibility to manage it in that
case.
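As an illustration of the first bullet, here is a minimal sketch of reading
the etcd image tag from the local static pod manifest. The package name,
helper name and tag-splitting logic are assumptions for the example, not
kubeadm's exact implementation:

```go
// Sketch only: read the etcd static Pod manifest (typically
// /etc/kubernetes/manifests/etcd.yaml) and return the image tag.
package upgrade

import (
	"fmt"
	"os"
	"strings"

	v1 "k8s.io/api/core/v1"
	"sigs.k8s.io/yaml"
)

// etcdImageTagFromManifest returns the tag part of the etcd container image,
// e.g. "3.4.3-0" from "k8s.gcr.io/etcd:3.4.3-0".
func etcdImageTagFromManifest(manifestPath string) (string, error) {
	data, err := os.ReadFile(manifestPath)
	if err != nil {
		return "", err
	}
	pod := &v1.Pod{}
	if err := yaml.Unmarshal(data, pod); err != nil {
		return "", err
	}
	if len(pod.Spec.Containers) == 0 {
		return "", fmt.Errorf("no containers found in %s", manifestPath)
	}
	image := pod.Spec.Containers[0].Image
	idx := strings.LastIndex(image, ":")
	// Guard against a registry port (e.g. "registry:5000/etcd") being
	// mistaken for a tag separator.
	if idx < 0 || strings.Contains(image[idx+1:], "/") {
		return "", fmt.Errorf("image %q has no tag", image)
	}
	return image[idx+1:], nil
}
```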
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
If the user does not provide --config or --control-plane
but provides some other flags such as --certificate-key
kubeadm is supposed to print a warning.
The logic around printing the warning is bogus. Implement
proper checks for when to print the warning.
b117a928 added a new check during "join" for whether a Node with
the same name already exists in the cluster.
When upgrading from 1.17 to 1.18, make sure the RBAC required
by this check is added. Otherwise "kubeadm join" will complain that
it lacks permissions to GET a Node.
A narrow assumption of what is contained in the `imageID` fields for the
CoreDNS pods causes a panic upon upgrade.
Fix this by using a proper regex to match a trailing SHA256 image digest
in `imageID`, or by returning an error if one cannot be found.
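For illustration, a sketch of such a digest match; the exact regex and error
handling in kubeadm may differ:

```go
// Sketch only: extract a trailing SHA256 digest from a Pod status imageID.
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var digestRE = regexp.MustCompile(`@sha256:([a-f0-9]{64})$`)

// imageDigest returns the digest portion of an imageID such as
// "docker-pullable://k8s.gcr.io/coredns@sha256:<64 hex chars>".
func imageDigest(imageID string) (string, error) {
	m := digestRE.FindStringSubmatch(imageID)
	if m == nil {
		return "", fmt.Errorf("no trailing SHA256 digest found in imageID %q", imageID)
	}
	return m[1], nil
}

func main() {
	// Placeholder digest, just to exercise the helper.
	id := "docker-pullable://k8s.gcr.io/coredns@sha256:" + strings.Repeat("ab", 32)
	digest, err := imageDigest(id)
	if err != nil {
		panic(err)
	}
	fmt.Println(digest)
}
```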
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
Most of these could have been refactored automatically, but the result
would not have been prettier: the unsophisticated tooling left lots of
unnecessary struct -> pointer -> struct transitions.
The KCM is moving towards only signing apiserver (kubelet) client
and kubelet serving certificates. See:
https://github.com/kubernetes/enhancements/blob/master/keps/sig-auth/20190607-certificates-api.md#signers
Up until now the experimental kubeadm functionality '--use-api'
under "kubeadm alpha certs renew" was using the KCM to sign *any*
certificate as long as the KCM has the root CA cert/key.
Post discussions with the kubeadm maintainers, it was decided that
this functionality should be removed from kubeadm due to the
requirement to have external signers for renewing the common
control-plane certificates that kubeadm manages.
- Uses correct pause image for Windows
- Omits systemd specific flags
- Common build flags function to be used by Linux and Windows
- Uses user configured image repository for Windows pause image
After the shift for init phases, GetStaticPodSpecs() from
app/phases/controlplane/manifests.go gets called on each control-plane
component sub-phase. This ends up calling the Printf from
AddExtraHostPathMounts() in app/phases/controlplane/volumes.go
multiple times, printing the same volumes for different components.
- Remove the Printf call from AddExtraHostPathMounts().
- Print all volumes for a component in CreateStaticPodFiles() using klog
V(2).
Perhaps in the future a bigger refactor is needed here, where a
single control-plane component spec can be requested instead of a
map[string]v1.Pod.
The selected key type is defined by kubeadm's --feature-gates option:
if it contains PublicKeysECDSA=true then ECDSA keys will be generated
and used.
By default, RSA keys are still used.
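A minimal sketch of what the key-type selection amounts to; the package name,
helper name and the P-256 curve are assumptions for the example:

```go
// Sketch only: generate an ECDSA key when the feature gate is on, RSA otherwise.
package pkiutil

import (
	"crypto"
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/rsa"
)

func newPrivateKey(useECDSA bool) (crypto.Signer, error) {
	if useECDSA { // e.g. --feature-gates=PublicKeysECDSA=true
		return ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	}
	return rsa.GenerateKey(rand.Reader, 2048) // the default remains RSA
}
```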
Signed-off-by: Dmitry Rozhkov <dmitry.rozhkov@linux.intel.com>
When doing the very first upgrade from a cluster that contains the
source of truth in the ClusterStatus struct, the new kubeadm logic
will try to retrieve this information from annotations.
This changeset adds a special case to both the etcd and API server
endpoint retrieval so that they don't retry in certain situations.
The logic will still retry on any unknown error, but will not retry in
the following cases:
- etcd annotations do not contain etcd endpoints, but the overall list
of etcd pods is greater than 0. This means that we listed at least
one etcd pod, but they are missing the annotation.
- API server annotation is not found on the api server pod for a given
node name, but no errors aside from that one were found. This means
that the API server pod is present, but is missing the annotation.
In both cases there is no point in retrying, so this speeds up the
upgrade path when coming from a previously existing cluster.
While `ClusterStatus` will be maintained and uploaded, it won't be
used by the internal `kubeadm` logic in order to determine the etcd
endpoints anymore.
The only exception is during the first upgrade cycle (`kubeadm upgrade
apply`, `kubeadm upgrade node`), in which we will fall back to the
ClusterStatus to let the upgrade path add the required annotations to
the newly created static pods.
While kubeadm does not support CA rotation,
users might still attempt to perform it manually.
For kubeconfig files, updating to a new CA is not reflected
and users need to embed the new CA PEM manually.
On kubeconfig cert renewal, always keep the embedded CA
in sync with the one on disk.
Includes a couple of typo fixes.
It turns out that having the dual-stack feature gate enabled doesn't mean
that the cluster MUST be dual-stack; it only indicates that it MAY be
dual-stack and CAN still be single-stack.
We should relax the validation to allow single-stack clusters
with dual-stack enabled.
If a Node name in the cluster is already taken and this Node is Ready,
prevent TLS bootstrap on "kubeadm join" and exit early.
This change requires that a new ClusterRole is granted to the
"system:bootstrappers:kubeadm:default-node-token" group to be
able to get Nodes in the cluster. The same group already has access
to obtain objects such as the KubeletConfiguration and kubeadm's
ClusterConfiguration.
The motivation of this change is to prevent undefined behavior
and the potential control-plane breakdown if such a cluster
is racing to have two nodes with the same name for long periods
of time.
The following values are validated in this order of precedence,
from lowest to highest:
- actual hostname
- NodeRegistration.Name (or "--node-name") from JoinConfiguration
- "--hostname-override" passed via kubeletExtraArgs
If the user decides to not let kubeadm know about a custom node name
and to instead override the hostname from a kubelet systemd unit file,
kubeadm will not be able to detect the problem.
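A rough sketch of the check (not kubeadm's exact code), assuming a client-go
version whose typed clients take a context:

```go
// Sketch only: fail early if a Ready Node with the same name already exists.
package join

import (
	"context"
	"fmt"

	v1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func checkNodeNameAvailable(client kubernetes.Interface, nodeName string) error {
	node, err := client.CoreV1().Nodes().Get(context.TODO(), nodeName, metav1.GetOptions{})
	if apierrors.IsNotFound(err) {
		return nil // the name is free
	}
	if err != nil {
		// This is where missing GET permissions on nodes would surface.
		return err
	}
	for _, cond := range node.Status.Conditions {
		if cond.Type == v1.NodeReady && cond.Status == v1.ConditionTrue {
			return fmt.Errorf("a Node named %q already exists and is Ready; refusing to bootstrap", nodeName)
		}
	}
	return nil
}
```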
- Extend the exponential backoff for add/remove/... retries to
11 steps (~106 seconds). Experiments show that with 3 or more members
the race can take more than ~26 seconds.
- Increase the dialTimeout for client creation to 40 seconds.
20 seconds seems racy with 3 or more members.
For the etcd client, amend AddMember() to handle a very
rare bug where multiple members can end up with the same
name. Match the member peer address and assign it the name of
the member we are adding; for the rest of the members with missing
names, use their member IDs as names (see the sketch below). The etcd
node is not disrupted by the unknown names.
The important aspects are:
- The number of members of the initial cluster must match
the members in the cluster.
- The member we are currently adding is present in the initial cluster.
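The naming fallback could look roughly like this; the member struct stands in
for the etcd client's member type and the function name is only illustrative:

```go
// Sketch only: build an --initial-cluster string, filling in missing names.
package etcdutil

import (
	"fmt"
	"strings"
)

type member struct {
	ID       uint64
	Name     string
	PeerURLs []string
}

// initialCluster names the member whose peer URL matches the one being added
// after the new member, and falls back to the member ID for any other member
// that has no name yet.
func initialCluster(members []member, addedName, addedPeerURL string) string {
	parts := make([]string, 0, len(members))
	for _, m := range members {
		if len(m.PeerURLs) == 0 {
			continue // cannot address a member without a peer URL
		}
		name := m.Name
		switch {
		case m.PeerURLs[0] == addedPeerURL:
			name = addedName
		case name == "":
			name = fmt.Sprintf("%x", m.ID)
		}
		parts = append(parts, fmt.Sprintf("%s=%s", name, m.PeerURLs[0]))
	}
	return strings.Join(parts, ",")
}
```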
The CoreDNS GA feature-gate in kubeadm has been deprecated since 1.13.
The k8s policy is to remove the gate 2 releases after it transitions
to GA:
https://kubernetes.io/docs/reference/using-api/deprecation-policy/#deprecation
We kept it around for longer to prevent existing setups from breaking
as it caused minimal maintenance overhead.
The warning message
```
[config] WARNING: Ignored YAML document with GroupVersionKind ...
```
is printed for all GVKs that are not part of the kubeadm core types.
This is wrong as the component config types are supported and successfully
parsed and used despite the fact that the warning is printed for them too.
Hence, this simple fix first checks if the group of the GVK is a supported
component config group, and prints the warning only if it is not.
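In essence the fix boils down to a group check before warning. The group names
below are the ones kubeadm supports, but the function itself is just a sketch:

```go
// Sketch only: warn about an ignored GVK only if its group is truly unknown.
package config

import (
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/klog"
)

var componentConfigGroups = map[string]bool{
	"kubelet.config.k8s.io":   true,
	"kubeproxy.config.k8s.io": true,
}

func warnIfUnknownGroup(gvk schema.GroupVersionKind) {
	if gvk.Group == "kubeadm.k8s.io" || componentConfigGroups[gvk.Group] {
		return // kubeadm core types and component configs are handled elsewhere
	}
	klog.Warningf("[config] WARNING: Ignored YAML document with GroupVersionKind %v", gvk)
}
```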
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
kubeadm deploys the apiserver, controller-manager and the scheduler
with liveness probes.
The bind-address option is used to configure the probe address; if it
is set to an unspecified address, the probe will fail. When an
unspecified address is used, the probe host field is left empty;
otherwise the bind-address is used.
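A small sketch of the host selection, using only the standard library; the
package and function names are illustrative:

```go
// Sketch only: compute the liveness probe host from the bind-address.
package controlplane

import "net"

func probeHost(bindAddress string) string {
	ip := net.ParseIP(bindAddress)
	if ip == nil || ip.IsUnspecified() { // 0.0.0.0 or ::
		return "" // an empty Host makes the kubelet probe the Pod IP
	}
	return bindAddress
}
```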
The function validateKubeConfig() can end up comparing
a user generated kubeconfig to a kubeconfig generated by kubeadm.
If a user kubeconfig has a CA that is base64 encoded with whitespace
and that kubeconfig is loaded using clientcmd.LoadFromFile(),
the CertificateAuthorityData bytes will be decoded from base64
and placed raw in the v1.Config. On the other hand, a kubeconfig
generated by kubeadm will have the ca.crt parsed to a Certificate
object with whitespace ignored in the PEM input.
Make sure that validateKubeConfig() tolerates whitespace differences
when comparing CertificateAuthorityData.
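One way to make the comparison whitespace-tolerant is to compare the decoded
DER bytes rather than the PEM text; this is a sketch, not necessarily the
exact approach taken:

```go
// Sketch only: compare two CertificateAuthorityData blobs by their DER bytes.
package kubeconfig

import (
	"bytes"
	"encoding/pem"
	"fmt"
)

func sameCA(a, b []byte) (bool, error) {
	blockA, _ := pem.Decode(a)
	blockB, _ := pem.Decode(b)
	if blockA == nil || blockB == nil {
		return false, fmt.Errorf("CertificateAuthorityData is not valid PEM")
	}
	// Whitespace differences in the PEM encoding do not affect the DER bytes.
	return bytes.Equal(blockA.Bytes, blockB.Bytes), nil
}
```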
kubeadm removed the deprecated "--address" flag for controller-manager
and scheduler in favor of "--bind-address".
We should use bind-address to configure the manifest probe addresses.
If the user has modified the kubelet.conf post TLS bootstrap
in a way that makes it invalid, the function getNodeNameFromKubeletConfig()
can panic. This was observed to trigger in "kubeadm reset" use cases.
Add basic validation and unit tests around parsing the kubelet.conf
with the aforementioned function.
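A sketch of the kind of validation meant here, deriving the name from the
context's user entry under the conventional "system:node:<name>" naming.
kubeadm's actual helper inspects the embedded client certificate, but the
defensive checks are the point:

```go
// Sketch only: extract the node name from kubelet.conf without panicking.
package kubeconfig

import (
	"fmt"
	"strings"

	"k8s.io/client-go/tools/clientcmd"
)

func nodeNameFromKubeletConfig(path string) (string, error) {
	config, err := clientcmd.LoadFromFile(path)
	if err != nil {
		return "", err
	}
	ctx, ok := config.Contexts[config.CurrentContext]
	if !ok || ctx == nil {
		return "", fmt.Errorf("current context %q not found in %s", config.CurrentContext, path)
	}
	const prefix = "system:node:"
	if !strings.HasPrefix(ctx.AuthInfo, prefix) {
		return "", fmt.Errorf("user %q in %s does not have the expected %q prefix", ctx.AuthInfo, path, prefix)
	}
	return strings.TrimPrefix(ctx.AuthInfo, prefix), nil
}
```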
Currently, if the control plane fails to init, we print out a message
with some example commands that only show the docker CLI.
This tries to improve that by checking whether the configured CRI socket
is the default docker socket and printing example commands for docker,
cri-o or containerd accordingly.
CreateOrMutateConfigMap was not resilient when it was trying to Create
the ConfigMap. If this operation returned an unknown error the whole
operation would fail, because it was strict in what error it was
expecting right afterwards: if the error returned by the Create call
was an IsAlreadyExists error, it would work fine. However, if an
unexpected error (such as an EOF) happened, this call would fail.
We are seeing this error especially when running control plane node
joins in an automated fashion, where things happen at a relatively
high pace.
It was especially easy to reproduce with kind, with several control
plane instances. E.g.:
```
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
I1130 11:43:42.788952 887 round_trippers.go:443] POST https://172.17.0.2:6443/api/v1/namespaces/kube-system/configmaps?timeout=10s in 1013 milliseconds
Post https://172.17.0.2:6443/api/v1/namespaces/kube-system/configmaps?timeout=10s: unexpected EOF
unable to create ConfigMap
k8s.io/kubernetes/cmd/kubeadm/app/util/apiclient.CreateOrMutateConfigMap
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/util/apiclient/idempotency.go:65
```
This change makes this logic more resilient to unknown errors. It will
retry in the face of unknown errors until one of the expected outcomes
happens: either `IsAlreadyExists`, in which case we will mutate the
ConfigMap, or no error, in which case the ConfigMap has been created.
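The retry loop amounts to something like the following sketch; the polling
intervals and the client-go calls taking a context are assumptions for the
example, not kubeadm's exact code:

```go
// Sketch only: retry creating a ConfigMap, mutating it if it already exists.
package apiclient

import (
	"context"
	"time"

	v1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

func createOrMutateConfigMap(client kubernetes.Interface, cm *v1.ConfigMap,
	mutate func(*v1.ConfigMap) error) error {
	return wait.PollImmediate(time.Second, 30*time.Second, func() (bool, error) {
		_, err := client.CoreV1().ConfigMaps(cm.Namespace).Create(context.TODO(), cm, metav1.CreateOptions{})
		if err == nil {
			return true, nil // created
		}
		if apierrors.IsAlreadyExists(err) {
			existing, getErr := client.CoreV1().ConfigMaps(cm.Namespace).Get(context.TODO(), cm.Name, metav1.GetOptions{})
			if getErr != nil {
				return false, nil // transient; retry
			}
			if mutateErr := mutate(existing); mutateErr != nil {
				return false, mutateErr
			}
			_, updateErr := client.CoreV1().ConfigMaps(cm.Namespace).Update(context.TODO(), existing, metav1.UpdateOptions{})
			return updateErr == nil, nil
		}
		// Unknown error (e.g. an unexpected EOF): retry instead of failing hard.
		return false, nil
	})
}
```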
kubeadm always uses the IPv4 localhost address by default for etcd.
The probe hostname is obtained before the generation of the etcd
parameters, so it can't detect the right IP family for the
host of the probe.
As a result, IPv6 clusters don't work because the probe
uses the IPv4 localhost address.
This patch configures the right localhost address based on the
IP family of the AdvertiseAddress.
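A minimal sketch of the loopback selection; the package and function names
are illustrative:

```go
// Sketch only: choose the etcd probe loopback address by IP family.
package etcd

import "net"

func probeLocalhost(advertiseAddress string) string {
	ip := net.ParseIP(advertiseAddress)
	if ip != nil && ip.To4() == nil {
		return "::1" // IPv6 advertise address -> IPv6 loopback
	}
	return "127.0.0.1"
}
```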
- Use `schema.TypeMeta` instead of custom `struct` for VK
- More strict check on GVK after `Interpret` in `SplitYAMLDocuments`
- Adjust `Interpret` comment to include JSON
- Add retrieveValidatedConfigInfo to be able to better unit
test the function.
- Break some of the logic in RetrieveValidatedConfigInfo into
helper functions.
- Pass JoinConfiguration.Discovery to RetrieveValidatedConfigInfo
instead of JoinConfiguration.
- Use the discovery timeout per API call to fetch cluster-info
(optionally the user value can be split in 2).
- Add detailed unit tests for retrieveValidatedConfigInfo.
kubeadm's current implementation of component config support is "kind" centric.
This has its downsides. Namely:
- Kind names and numbers can change between config versions.
Newer kinds can be ignored. Therefore, detection of a version change is
considerably harder.
- A component config can have only one kind that is managed by kubeadm.
Thus a more appropriate way to identify component configs is required.
Probably the best solution identified so far is a config group.
A group name is unlikely to change between versions, while the kind names and
structure can.
Tracking component configs by group name allows us to:
- Spot more easily config version changes and manage alternate versions.
- Support more than one kind in a config group/version.
- Abstract component configs by hiding their exact structure.
Hence, this change rips out the old kind-based support for component configs
and replaces it with a group-name-based one. This also has the following
extra benefits:
- More tests were added.
- kubeadm now errors out if an unsupported version of a known component group
is used.
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
- Add a new preflight check for upgrade that runs the pause container
with -v in a Job.
- Wait for the Job to complete and return an error after N seconds.
- Manually clean up the Job because we don't have the TTL controller
enabled in kubeadm yet (it's still alpha).
Currently this uses the combined kubelet output (stdout + stderr), but this
causes parsing issues if the kubelet logs something on stderr.
Thus we ignore the entire stderr and use stdout only.
We do disable a couple of tests here. That is because the fakeexecer only
supports combined output and returns a "not supported" error if `.Output()`
gets invoked, thus permanently failing those tests.
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
- Don't always print to stdout that the kubelet is starting;
instead, delegate this to the callers of TryStartKubelet.
- Add a new root kubeadm init phase called "kubelet-finalize"
- Add a sub-phase to "kubelet-finalize"
called "experimental-cert-rotation"
- "cert-rotation" performs the following actions:
- tries to guess if kubelet client cert rotation is enabled
- update the kubelet.conf to use the rotatable cert/key
The PR introducing 5bb8069 got merged accidentally (the CI robot not
respecting a hold). Hence, the feedback to that PR is merged separately.
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
As the hyperkube image is itself deprecated and moved out of tree, its use with
kubeadm gets deprecated too. Hence, deprecation messages will be printed when
it is used.
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
ToClientSet() in kubeconfig.go creates a clientset from
the passed Config object (kubeconfig). For IP addresses
that are not reachable, calls such as Get() for ConfigMaps
can block for a few minutes with the default timeout.
Modify the timeout to a shorter value by passing an override.
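A sketch of what the override amounts to; the function name and the timeout
value are placeholders, not the actual ones used:

```go
// Sketch only: build a clientset from a loaded kubeconfig with a short timeout.
package kubeconfig

import (
	"time"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	clientcmdapi "k8s.io/client-go/tools/clientcmd/api"
)

func toClientSet(config *clientcmdapi.Config, timeout time.Duration) (*kubernetes.Clientset, error) {
	restConfig, err := clientcmd.NewDefaultClientConfig(*config, &clientcmd.ConfigOverrides{}).ClientConfig()
	if err != nil {
		return nil, err
	}
	restConfig.Timeout = timeout // e.g. 15 * time.Second, so unreachable servers fail fast
	return kubernetes.NewForConfig(restConfig)
}
```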
There are two writes yet only one read on a non-buffered channel that is
created locally and not passed anywhere else.
Therefore, it could leak one of its two spawned Goroutines if either:
* The provided `f` takes longer than an erroneous result from
`waiter.WaitForHealthyKubelet`, or;
* The provided `f` completes before an erroneous result from
`waiter.WaitForHealthyKubelet`.
The fix is to add a one-element buffer so that the second Goroutine's
channel write succeeds in these cases, allowing it to finish and release
its reference to the now-buffered channel, letting it be GC'd.
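The pattern and the fix, reduced to a standalone sketch:

```go
// Sketch only: two goroutines race to report a result on the same channel,
// but only one value is ever read. With an unbuffered channel the loser
// blocks forever; a one-element buffer lets it finish.
package main

import (
	"errors"
	"fmt"
	"time"
)

func firstError(f func() error, waitForKubelet func() error) error {
	errC := make(chan error, 1) // the buffer is the fix
	go func() { errC <- waitForKubelet() }()
	go func() { errC <- f() }()
	return <-errC // only the first writer is read; the second fills the buffer
}

func main() {
	err := firstError(
		func() error { time.Sleep(50 * time.Millisecond); return nil },
		func() error { return errors.New("kubelet is not healthy") },
	)
	fmt.Println(err)
}
```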
3993c42431 introduced the propagation of *_PROXY
host env. variables to the kube-proxy container.
To allow the NODE_NAME variable to be properly updated by the downward
API, make sure we preserve the existing variables when adding the *_PROXY ones.
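Roughly, the env merge has to append rather than replace; a sketch with an
illustrative helper name:

```go
// Sketch only: append the host's *_PROXY variables to the container env
// without dropping variables that are already set (such as NODE_NAME,
// which is populated via the downward API).
package proxy

import (
	"os"
	"strings"

	v1 "k8s.io/api/core/v1"
)

func withProxyEnv(existing []v1.EnvVar) []v1.EnvVar {
	out := append([]v1.EnvVar{}, existing...) // keep NODE_NAME and friends
	for _, kv := range os.Environ() {
		parts := strings.SplitN(kv, "=", 2)
		if len(parts) == 2 && strings.HasSuffix(strings.ToUpper(parts[0]), "_PROXY") {
			out = append(out, v1.EnvVar{Name: parts[0], Value: parts[1]})
		}
	}
	return out
}
```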
This change removes dependencies on the internal types of the kubelet and
kube-proxy component configs. Along with that, defaulting and validation
are removed as well. kubeadm will display a warning that it did not verify
the component config upon load.
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
Checking if the path exists before creating the volume is
problematic because the path will be created regardless
after the initial call to "kubeadm init" and once the CM Pod
is running.
Then on subsequent calls to "kubeadm init" or the "control-plane"
phase the manifest for the CM will be different.
Always mount this path, but also consider the user provided
flag override from ClusterConfiguration.
Used strings instead of bytes in the TestTokenOutput test cases, as
the expected output is plain text.
This should also simplify the data representation and the test code
a bit.
The flag has been problematic and abused by users.
While perhaps its original purpose was to be able to feed
a new version of the control-plane, it also made it possible
to apply modifications to the ClusterConfiguration object
in the cluster. The lack of a feature in kubeadm for reconfiguration
of running clusters resulted in users using this flag for
the same purpose.
While it works for certain scenarios, like updating
a static Pod for this control-plane only, it can result in
unexpected behavior if the user has, for example, fed a node name
different from the hostname they originally used when creating this node.
kubeadm 1.16 introduced the "kustomize" feature, which is a potential
replacement for this user demand.
Add a warning that this flag should not be used.
This field is not used in the kubeadm code. It was brought from
cli-runtime, where it's used to support complex relationships between
command line parameters, which are not present in kubeadm.