kubeadm uses image tags (such as `v3.4.3-0`) to specify the version of
etcd. However, the upgrade code in kubeadm uses the etcd client API to
fetch the currently deployed version. The result contains only the etcd
version without the additional information (such as image revision) that
is normally found in the tag. As a result, kubeadm would refuse an
upgrade where the etcd versions match and the only difference is the
image revision number (`v3.4.3-0` to `v3.4.3-1`).
To fix this issue, the following changes are made:
- Replace the existing etcd version querying code, which uses the etcd
client library, with code that returns the etcd image tag from the
local static pod manifest file (see the sketch after this list).
- If an etcd `imageTag` is specified in the ClusterConfiguration during
upgrade, use that tag instead. This is done regardless of whether the
tag was specified in the configuration stored in the cluster or in a
new configuration supplied via the `--config` command line parameter.
If no custom tag is specified, kubeadm will select one depending on
the desired Kubernetes version.
- `kubeadm upgrade plan` no longer prints upgrade information about
external etcd. It's the user's responsibility to manage it in that
case.
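A minimal sketch of reading the tag from the manifest, assuming the
standard kubeadm manifest layout and a single etcd container; the
helper name is hypothetical:
```
import (
	"os"
	"path/filepath"
	"strings"

	v1 "k8s.io/api/core/v1"
	"sigs.k8s.io/yaml"
)

// etcdImageTagFromManifest extracts the image tag from the local etcd
// static pod manifest (hypothetical helper, not the kubeadm function).
func etcdImageTagFromManifest(manifestDir string) (string, error) {
	data, err := os.ReadFile(filepath.Join(manifestDir, "etcd.yaml"))
	if err != nil {
		return "", err
	}
	pod := &v1.Pod{}
	if err := yaml.Unmarshal(data, pod); err != nil {
		return "", err
	}
	// e.g. "k8s.gcr.io/etcd:v3.4.3-0" -> "v3.4.3-0"
	image := pod.Spec.Containers[0].Image
	return image[strings.LastIndex(image, ":")+1:], nil
}
```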
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
Most of these could have been refactored automatically, but the result
would not have been any prettier. The unsophisticated tooling left lots
of unnecessary struct -> pointer -> struct transitions.
The selected key type is defined by kubeadm's --feature-gates option:
if it contains PublicKeysECDSA=true then ECDSA keys will be generated
and used.
RSA keys are still used by default.
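A minimal sketch of the selection, assuming a boolean derived from the
feature gate; the helper is illustrative, not kubeadm's exact code:
```
import (
	"crypto"
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/rsa"
)

// newPrivateKey generates an ECDSA key when the PublicKeysECDSA
// feature gate is enabled, and an RSA key otherwise.
func newPrivateKey(ecdsaEnabled bool) (crypto.Signer, error) {
	if ecdsaEnabled {
		return ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	}
	return rsa.GenerateKey(rand.Reader, 2048)
}
```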
Signed-off-by: Dmitry Rozhkov <dmitry.rozhkov@linux.intel.com>
When doing the very first upgrade from a cluster that contains the
source of truth in the ClusterStatus struct, the new kubeadm logic
will try to retrieve this information from annotations.
This changeset adds a special case to both the etcd and API server
endpoint retrieval so that they don't retry in such situations. The
logic still retries on any unknown error, but does not retry in the
following cases:
- The etcd annotations do not contain etcd endpoints, but the list of
etcd pods is non-empty. This means that we listed at least one etcd
pod, but the pods are missing the annotation.
- The API server annotation is not found on the API server pod for a
given node name, and no errors other than that one were found. This
means that the API server pod is present, but is missing the
annotation.
In both cases there is no point in retrying, so skipping the retries
speeds up the upgrade path when coming from a pre-existing cluster.
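A minimal sketch of the etcd case, assuming a poll-based retrieval
loop; the annotation key matches what kubeadm uses, but the helper
shape is illustrative:
```
import (
	"errors"
	"time"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/wait"
)

const etcdEndpointAnnotation = "kubeadm.kubernetes.io/etcd.advertise-client-urls"

// getEtcdEndpoints keeps retrying on unknown errors, but gives up
// immediately when etcd pods exist without the endpoint annotation.
func getEtcdEndpoints(listEtcdPods func() ([]v1.Pod, error)) ([]string, error) {
	var endpoints []string
	err := wait.PollImmediate(5*time.Second, 2*time.Minute, func() (bool, error) {
		pods, err := listEtcdPods()
		if err != nil {
			return false, nil // unknown error: retry
		}
		endpoints = endpoints[:0]
		for _, pod := range pods {
			if ep, ok := pod.Annotations[etcdEndpointAnnotation]; ok {
				endpoints = append(endpoints, ep)
			}
		}
		if len(endpoints) == 0 && len(pods) > 0 {
			// pods are present but lack the annotation: retrying cannot help
			return false, errors.New("etcd pods found without endpoint annotations")
		}
		return len(endpoints) > 0, nil
	})
	return endpoints, err
}
```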
While `ClusterStatus` will be maintained and uploaded, it won't be
used by the internal `kubeadm` logic in order to determine the etcd
endpoints anymore.
The only exception is during the first upgrade cycle (`kubeadm upgrade
apply`, `kubeadm upgrade node`), in which we will fall back to the
ClusterStatus to let the upgrade path add the required annotations to
the newly created static pods.
- Extend the exponential backoff for add/remove/... retries to
11 steps ~=106 seconds (see the sketch below). Experiments show that
for 3 or more members the race can take more than ~=26 seconds.
- Increase the dialTimeout for client creation to 40 seconds.
20 seconds proved racy for 3 or more members.
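For illustration, an 11-step exponential backoff of roughly that
length can be expressed with `wait.Backoff`; the exact constants here
are assumptions:
```
import (
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// 0.05 + 0.1 + 0.2 + ... + 51.2 ≈ 102 seconds, plus jitter ≈ 106 seconds.
var etcdBackoff = wait.Backoff{
	Steps:    11,
	Duration: 50 * time.Millisecond,
	Factor:   2.0,
	Jitter:   0.1,
}

// Usage, e.g. around a member add/remove call:
//   err := wait.ExponentialBackoff(etcdBackoff, func() (bool, error) { ... })
```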
For the etcd client, amend AddMember() to handle a very
rare bug where multiple members can end up with the same
name. Match the member peer address and assign it the name of
the member we are adding; for the rest of the members with missing
names, use their member IDs as names (sketched below). The etcd node
is not disrupted by the unknown names.
The important aspects are:
- The number of members in the initial cluster must match the number
of members in the cluster.
- The member we are currently adding is present in the initial cluster.
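A minimal sketch of building the initial cluster list from a MemberAdd
response, using the etcd clientv3 API; it mirrors the matching
described above but is not the verbatim kubeadm code:
```
import (
	"fmt"
	"strconv"

	"go.etcd.io/etcd/clientv3"
)

// initialCluster builds the name=peerURL list from a MemberAdd
// response, fixing up members that report an empty or missing name.
func initialCluster(resp *clientv3.MemberAddResponse, newName, newPeerURL string) []string {
	var cluster []string
	for _, m := range resp.Members {
		if len(m.PeerURLs) == 0 {
			continue
		}
		name := m.Name
		if m.PeerURLs[0] == newPeerURL {
			// this is the member we are adding: use its intended name
			name = newName
		} else if name == "" {
			// any other unnamed member gets its ID as a name
			name = strconv.FormatUint(m.ID, 16)
		}
		cluster = append(cluster, fmt.Sprintf("%s=%s", name, m.PeerURLs[0]))
	}
	return cluster
}
```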
The warning message
```
[config] WARNING: Ignored YAML document with GroupVersionKind ...
```
is printed for all GVKs that are not part of the kubeadm core types.
This is wrong as the component config types are supported and successfully
parsed and used despite the fact that the warning is printed for them too.
Hence this simple fix first checks if the group of the GVK is a supported
component config group and the warning is printed only if it's not.
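A minimal sketch of the check, assuming a set of known component
config group names; the identifiers are illustrative:
```
import (
	"fmt"

	"k8s.io/apimachinery/pkg/runtime/schema"
)

// illustrative set of supported component config groups
var componentConfigGroups = map[string]bool{
	"kubelet.config.k8s.io":   true,
	"kubeproxy.config.k8s.io": true,
}

func warnIfIgnored(gvk schema.GroupVersionKind) {
	// warn only for groups that are not supported component configs
	if !componentConfigGroups[gvk.Group] {
		fmt.Printf("[config] WARNING: Ignored YAML document with GroupVersionKind %v\n", gvk)
	}
}
```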
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
kubeadm deploys the apiserver, controller-manager and the scheduler
using liveness probes.
The bind-address option is used to configure the probe address; if it
is set to an unspecified address (0.0.0.0 or ::), the probe will fail.
With this change, the probe host field is left empty when the address
is unspecified; otherwise the bind-address is used.
kubeadm removed the deprecated "--address" flag for controller-manager
and scheduler in favor of "--bind-address", so we should use
bind-address to configure the manifest probe addresses.
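A minimal sketch of the host selection; the helper is hypothetical:
```
import "net"

// probeHost returns the host field for the liveness probe: empty when
// the bind address is unspecified, so the kubelet probes the pod IP.
func probeHost(bindAddress string) string {
	ip := net.ParseIP(bindAddress)
	if ip == nil || ip.IsUnspecified() {
		return ""
	}
	return bindAddress
}
```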
If the user has modified the kubelet.conf after TLS bootstrap in a way
that makes it invalid, the function getNodeNameFromKubeletConfig() can
panic. This was observed to trigger in "kubeadm reset" use cases.
Add basic validation and unit tests around parsing the kubelet.conf
with the aforementioned function.
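A minimal sketch of the kind of checks involved, assuming client-go's
kubeconfig types; the exact checks are illustrative:
```
import (
	"fmt"

	clientcmdapi "k8s.io/client-go/tools/clientcmd/api"
)

// validateKubeletConfig guards against the nil dereferences that can
// make getNodeNameFromKubeletConfig() panic on a malformed kubelet.conf.
func validateKubeletConfig(config *clientcmdapi.Config) error {
	ctx, ok := config.Contexts[config.CurrentContext]
	if !ok || ctx == nil {
		return fmt.Errorf("no context %q found in kubelet.conf", config.CurrentContext)
	}
	if _, ok := config.AuthInfos[ctx.AuthInfo]; !ok {
		return fmt.Errorf("no user %q found in kubelet.conf", ctx.AuthInfo)
	}
	return nil
}
```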
CreateOrMutateConfigMap was not resilient when it was trying to Create
the ConfigMap. If this operation returned an unknown error the whole
operation would fail, because it was strict in what error it was
expecting right afterwards: if the error returned by the Create call
was an IsAlreadyExists error, it would work fine. However, if an
unexpected error (such as an EOF) happened, this call would fail.
We are seeing this error especially when running control plane node
joins in an automated fashion, where things happen at a relatively
high pace.
It was especially easy to reproduce with kind, with several control
plane instances. E.g.:
```
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
I1130 11:43:42.788952 887 round_trippers.go:443] POST https://172.17.0.2:6443/api/v1/namespaces/kube-system/configmaps?timeout=10s in 1013 milliseconds
Post https://172.17.0.2:6443/api/v1/namespaces/kube-system/configmaps?timeout=10s: unexpected EOF
unable to create ConfigMap
k8s.io/kubernetes/cmd/kubeadm/app/util/apiclient.CreateOrMutateConfigMap
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/util/apiclient/idempotency.go:65
```
This change makes this logic more resilient to unknown errors. It will
retry in the face of unknown errors until one of the expected outcomes
happens: either `IsAlreadyExists`, in which case we mutate the
ConfigMap, or no error, in which case the ConfigMap has been created.
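A minimal sketch of the retry, assuming a poll loop around the Create
call; the intervals are assumptions and this is not the verbatim
implementation:
```
import (
	"context"
	"time"

	v1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	clientset "k8s.io/client-go/kubernetes"
)

func createOrMutateConfigMap(client clientset.Interface, cm *v1.ConfigMap, mutate func(*v1.ConfigMap) error) error {
	return wait.PollImmediate(time.Second, 30*time.Second, func() (bool, error) {
		_, err := client.CoreV1().ConfigMaps(cm.Namespace).Create(context.TODO(), cm, metav1.CreateOptions{})
		if err == nil {
			return true, nil // created
		}
		if apierrors.IsAlreadyExists(err) {
			return true, mutate(cm) // already there: mutate it instead
		}
		// unknown error (e.g. an unexpected EOF): retry
		return false, nil
	})
}
```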
kubeadm always uses the IPv4 localhost address by default for the etcd
probe. The probe hostname is obtained before the generation of the etcd
parameters, so it can't detect the right IP family for the host of the
probe.
As a result, this doesn't work with IPv6 clusters, because the probe
uses the IPv4 localhost address.
This patch configures the right localhost address based on the IP
family of the AdvertiseAddress.
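A minimal sketch of the selection; the helper is hypothetical:
```
import "net"

// etcdProbeHost picks the localhost address for the etcd liveness
// probe based on the IP family of the advertise address.
func etcdProbeHost(advertiseAddress string) string {
	if ip := net.ParseIP(advertiseAddress); ip != nil && ip.To4() == nil {
		return "::1" // IPv6 cluster
	}
	return "127.0.0.1"
}
```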
- Use `schema.TypeMeta` instead of custom `struct` for VK
- More strict check on GVK after `Interpret` in `SplitYAMLDocuments`
- Adjust `Interpret` comment to include JSON
kubeadm's current implementation of component config support is "kind" centric.
This has its downsides. Namely:
- Kind names, and the number of kinds, can change between config
versions, and newer kinds can be silently ignored. Therefore, detecting
a version change is considerably harder.
- A component config can have only one kind that is managed by kubeadm.
Thus a more appropriate way to identify component configs is required.
Probably the best solution identified so far is a config group.
A group name is unlikely to change between versions, while the kind names and
structure can.
Tracking component configs by group name allows us to:
- More easily spot config version changes and manage alternate versions.
- Support more than one kind in a config group/version.
- Abstract component configs by hiding their exact structure.
Hence, this change rips out the old kind-based support for component
configs and replaces it with a group-name based one (sketched below).
This also has the following extra benefits:
- More tests were added.
- kubeadm now errors out if an unsupported version of a known component group
is used.
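For illustration, tracking configs by group might look like the sketch
below; the handler type and the registered versions are assumptions,
not the actual kubeadm structure:
```
import (
	"fmt"

	"k8s.io/apimachinery/pkg/runtime/schema"
)

// componentConfigHandler is an illustrative stand-in for the per-group
// logic (loading, defaulting, validation, ...).
type componentConfigHandler struct {
	GroupVersion schema.GroupVersion
}

// handlers are keyed by group name, which stays stable across config
// versions, rather than by kind name.
var handlers = map[string]componentConfigHandler{
	"kubelet.config.k8s.io":   {GroupVersion: schema.GroupVersion{Group: "kubelet.config.k8s.io", Version: "v1beta1"}},
	"kubeproxy.config.k8s.io": {GroupVersion: schema.GroupVersion{Group: "kubeproxy.config.k8s.io", Version: "v1alpha1"}},
}

// checkVersion errors out when a known group is used with an
// unsupported version.
func checkVersion(gv schema.GroupVersion) error {
	if h, known := handlers[gv.Group]; known && h.GroupVersion.Version != gv.Version {
		return fmt.Errorf("unsupported version %q for component config group %q", gv.Version, gv.Group)
	}
	return nil
}
```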
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
ToClientSet() in kubeconfig.go creates a clientset from
the passed Config object (kubeconfig). For IP addresses
that are not reachable, calls such as Get() for ConfigMaps
can block for a few minutes with the default timeout.
Modify the timeout to a shorter value by passing an override.
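A minimal sketch of the override using client-go's clientcmd; the 10s
value is an assumption:
```
import (
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	clientcmdapi "k8s.io/client-go/tools/clientcmd/api"
)

func toClientSet(config *clientcmdapi.Config) (*kubernetes.Clientset, error) {
	// override the default timeout so unreachable addresses fail fast
	overrides := &clientcmd.ConfigOverrides{Timeout: "10s"}
	clientConfig, err := clientcmd.NewDefaultClientConfig(*config, overrides).ClientConfig()
	if err != nil {
		return nil, err
	}
	return kubernetes.NewForConfig(clientConfig)
}
```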
There are two writes yet only one read on a non-buffered channel that is
created locally and not passed anywhere else.
Therefore, it could leak one of its two spawned Goroutines if either:
* The provided `f` takes longer than an erroneous result from
`waiter.WaitForHealthyKubelet`; or
* The provided `f` completes before an erroneous result from
`waiter.WaitForHealthyKubelet`.
The fix is to add a one-element buffer so that the channel write by the
second Goroutine succeeds in these cases, allowing it to finish and
freeing references to the now-buffered channel, letting it be GC'd.
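A minimal sketch of the pattern; the function names are illustrative:
```
// firstError returns the first result from either the kubelet health
// wait or f. The one-element buffer lets the slower goroutine complete
// its send and exit, so nothing keeps referencing the channel and it
// can be GC'd.
func firstError(waitForHealthyKubelet, f func() error) error {
	errC := make(chan error, 1)
	go func() { errC <- waitForHealthyKubelet() }()
	go func() { errC <- f() }()
	return <-errC // the later write lands in the buffer instead of blocking
}
```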
This field is not used in the kubeadm code. It was brought from
cli-runtime where it's used to support complex relationship between
command line parameters, which is not present in kubeadm.
Support a list of service subnets.
Update DNS, Cert, dry-run logic to support a list of Service CIDRs.
Added unit tests for GetKubernetesServiceCIDR and updated the
GetDNSIP() unit test to include dual-stack cases.
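A minimal sketch of handling the list, assuming ServiceSubnet is a
comma-separated string; the helper is hypothetical:
```
import (
	"net"
	"strings"
)

// primaryServiceCIDR returns the first subnet from a comma-separated
// list of service CIDRs (dual-stack clusters list one per IP family).
func primaryServiceCIDR(serviceSubnet string) (*net.IPNet, error) {
	first := strings.TrimSpace(strings.Split(serviceSubnet, ",")[0])
	_, cidr, err := net.ParseCIDR(first)
	return cidr, err
}
```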
Whenever kubeadm needs to fetch its configuration from the cluster, it gets
the component configuration of all supported components (currently only kubelet
and kube-proxy). However, kube-proxy is deemed an optional component and its
installation may be skipped (by skipping the addon/kube-proxy phase on init).
When kube-proxy's installation is skipped, its config map is not created and
all kubeadm operations that fetch the config from the cluster are bound to
fail with "not found" or "forbidden" (because of missing RBAC rules) errors.
To fix this issue, we have to ignore the 403 and 404 errors, returned on an
attempt to fetch kube-proxy's component config from the cluster.
The `GetFromKubeProxyConfigMap` function now supports returning nil for both
error and object to indicate just such a case.
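A minimal sketch of the tolerant fetch; not the verbatim
implementation:
```
import (
	"context"

	v1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	clientset "k8s.io/client-go/kubernetes"
)

// getKubeProxyConfigMap returns (nil, nil) when the kube-proxy addon
// was skipped, i.e. on "not found" or "forbidden" errors.
func getKubeProxyConfigMap(client clientset.Interface) (*v1.ConfigMap, error) {
	cm, err := client.CoreV1().ConfigMaps(metav1.NamespaceSystem).Get(context.TODO(), "kube-proxy", metav1.GetOptions{})
	if apierrors.IsNotFound(err) || apierrors.IsForbidden(err) {
		return nil, nil
	}
	if err != nil {
		return nil, err
	}
	return cm, nil
}
```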
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
The new etcd balancer (>3.3.14, 3.4.0) uses an asynchronous resolver for
endpoints. Without "WithBlock", the client may return before the
connection is up.
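A minimal sketch with the clientv3 API; the 40-second timeout matches
the change described above:
```
import (
	"time"

	"go.etcd.io/etcd/clientv3"
	"google.golang.org/grpc"
)

func newEtcdClient(endpoints []string) (*clientv3.Client, error) {
	return clientv3.New(clientv3.Config{
		Endpoints:   endpoints,
		DialTimeout: 40 * time.Second,
		// block until the underlying connection is up, so the client
		// is usable despite the asynchronous resolver
		DialOptions: []grpc.DialOption{grpc.WithBlock()},
	})
}
```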
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>