After the shift for init phases, GetStaticPodSpecs() from
app/phases/controlplane/manifests.go gets called on each control-plane
component sub-phase. This ends up calling the Printf from
AddExtraHostPathMounts() in app/phases/controlplane/volumes.go
multiple times printing the same volumes for different components.
- Remove the Printf call from AddExtraHostPathMounts().
- Print all volumes for a component in CreateStaticPodFiles() using klog
V(2).
Perhaps in the future a bigger refactor is needed here were a
single control-plane component spec can be requested instead of a
map[string]v1.Pod.
The selected key type is defined by kubeadm's --feature-gates option:
if it contains PublicKeysECDSA=true then ECDSA keys will be generated
and used.
By default RSA keys are used still.
Signed-off-by: Dmitry Rozhkov <dmitry.rozhkov@linux.intel.com>
When doing the very first upgrade from a cluster that contains the
source of truth in the ClusterStatus struct, the new kubeadm logic
will try to retrieve this information from annotations.
This changeset adds to both etcd and apiserver endpoint retrieval the
special case in which they won't retry if we are in such cases. The
logic will retry if we find any unknown error, but will not retry in
the following cases:
- etcd annotations do not contain etcd endpoints, but the overall list
of etcd pods is greater than 0. This means that we listed at least
one etcd pod, but they are missing the annotation.
- API server annotation is not found on the api server pod for a given
node name, but no errors aside from that one were found. This means
that the API server pod is present, but is missing the annotation.
In both cases there is no point in retrying, and so, this speeds up the
upgrade path when coming from a previous existing cluster.
While `ClusterStatus` will be maintained and uploaded, it won't be
used by the internal `kubeadm` logic in order to determine the etcd
endpoints anymore.
The only exception is during the first upgrade cycle (`kubeadm upgrade
apply`, `kubeadm upgrade node`), in which we will fallback to the
ClusterStatus to let the upgrade path add the required annotations to
the newly created static pods.
While kubeadm does not support CA rotation,
the users might still attempt to perform this manually.
For kubeconfig files, updating to a new CA is not reflected
and users need to embed new CA PEM manually.
On kubeconfig cert renewal, always keep the embedded CA
in sync with the one on disk.
Includes a couple of typo fixes.
It turns out that the dual-stack feature enabled doesn't mean that
the cluster MUST be dual-stack, it only indicates that it MAY be
dual-stack but CAN be single-stack.
We should relax the validation to allow single-stack clusters
with dual-stack enabled.
If a Node name in the cluster is already taken and this Node is Ready,
prevent TLS bootsrap on "kubeadm join" and exit early.
This change requires that a new ClusterRole is granted to the
"system:bootstrappers:kubeadm:default-node-token" group to be
able get Nodes in the cluster. The same group already has access
to obtain objects such as the KubeletConfiguration and kubeadm's
ClusterConfiguration.
The motivation of this change is to prevent undefined behavior
and the potential control-plane breakdown if such a cluster
is racing to have two nodes with the same name for long periods
of time.
The following values are validated in the following precedence
from lower to higher:
- actual hostname
- NodeRegistration.Name (or "--node-name") from JoinConfiguration
- "--hostname-override" passed via kubeletExtraArgs
If the user decides to not let kubeadm know about a custom node name
and to instead override the hostname from a kubelet systemd unit file,
kubeadm will not be able to detect the problem.
- Extend the exponential backoff for add/remove/... retry to
11 steps ~=106 seconds. From experiments for 3 and more members
the race can take more that ~=26 seconds.
- Increase the dialTimeout for client creation to 40 seconds.
20 seconds seems racy for 3 and more members.
For the etcd client, amend AddMember() to handle a very
rare bug when multiple members can end up with the same
name. Match the member peer address and assign it the name of
the member we are adding. For the rest of the members with missing
names use their member IDs as name. The etcd node is not disrupted
by the unknown names.
The important aspects are:
- The number of members of the initial cluster must match
the members in the cluster.
- The member we are current adding is present in the initial cluster.
The CoreDNS GA feature-gate in kubeadm was deprecated since 1.13.
The k8s policy is to remove the gate 2 releases after it transitions
to GA:
https://kubernetes.io/docs/reference/using-api/deprecation-policy/#deprecation
We kept it around for longer to prevent existing setups from breaking
as it caused minimal maintenance overhead.
The warning message
```
[config] WARNING: Ignored YAML document with GroupVersionKind ...
```
is printed for all GVKs that are not part of the kubeadm core types.
This is wrong as the component config types are supported and successfully
parsed and used despite the fact that the warning is printed for them too.
Hence this simple fix first checks if the group of the GVK is a supported
component config group and the warning is printed only if it's not.
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
kubeadm deploys the apiserver, controller-manager and the scheduler
using liveness probes.
The bind-address option is used to configure the probe address, in
case this is configured with an unspecified address, the probe
will fail. When using an unspecified address the probe host field is
left empty, otherwise the bind-address is used.
The function validateKubeConfig() can end up comparing
a user generated kubeconfig to a kubeconfig generated by kubeadm.
If a user kubeconfig has a CA that is base64 encoded with whitespace,
if said kubeconfig is loaded using clientcmd.LoadFromFile()
the CertificateAuthorityData bytes will be decoded from base64
and placed in the v1.Config raw. On the other hand a kubeconfig
generated by kubeadm will have the ca.crt parsed to a Certificate
object with whitespace ignored in the PEM input.
Make sure that validateKubeConfig() tolerates whitespace differences
when comparing CertificateAuthorityData.
kubeadm removed the deprecated "--address" flag for controller-manager
and scheduler in favor of "--bind-address"
We should use bind-address to configure the manifest probe addresses.
If the user has modified the kubelet.conf post TLS bootstrap
to become invalid, the function getNodeNameFromKubeletConfig() can
panic. This was observed to trigger in "kubeadm reset" use cases.
Add basic validation and unit tests around parsing the kubelet.conf
with the aforementioned function.
Currently if the controlplane fails to init, we print out a message
with some example commands that only show docker CLI.
This tries to improve that by printing the example commands for
docker, cri-o and containerd by checking the socket looking for
the default docker socket.
CreateOrMutateConfigMap was not resilient when it was trying to Create
the ConfigMap. If this operation returned an unknown error the whole
operation would fail, because it was strict in what error it was
expecting right afterwards: if the error returned by the Create call
was a IsAlreadyExists error, it would work fine. However, if an
unexpected error (such as an EOF) happened, this call would fail.
We are seeing this error specially when running control plane node
joins in an automated fashion, where things happen at a relatively
high speed pace.
It was specially easy to reproduce with kind, with several control
plane instances. E.g.:
```
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
I1130 11:43:42.788952 887 round_trippers.go:443] POST https://172.17.0.2:6443/api/v1/namespaces/kube-system/configmaps?timeout=10s in 1013 milliseconds
Post https://172.17.0.2:6443/api/v1/namespaces/kube-system/configmaps?timeout=10s: unexpected EOF
unable to create ConfigMap
k8s.io/kubernetes/cmd/kubeadm/app/util/apiclient.CreateOrMutateConfigMap
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/util/apiclient/idempotency.go:65
```
This change makes this logic more resilient to unknown errors. It will
retry on the light of unknown errors until some of the expected error
happens: either `IsAlreadyExists`, in which case we will mutate the
ConfigMap, or no error, in which case the ConfigMap has been created.
kubeadm always use the IPv4 localhost address by defaultA for etcd
The probe hostname is obtained before the generation of the etcd
parameters, so it can't detect the right IP familiy for the
host of the probe.
This causes that with IPv6 clusters doesn't work because the probe
uses the IPv4 localhost address.
This patchs configures the right localhost address based on the used
AdvertiseAddress IP family.
- Use `schema.TypeMeta` instead of custom `struct` for VK
- More strict check on GVK after `Interpret` in `SplitYAMLDocuments`
- Adjust `Interpret` comment to include JSON
- Add retrieveValidatedConfigInfo to be able to better unit
test the function.
- Break some of the logic in RetrieveValidatedConfigInfo into
helper functions.
- Pass JoinConfiguration.Discovery to RetrieveValidatedConfigInfo
instead of JoinConfiguration.
- Use the discovery timeout per API call to fetch cluster-info
(optionally the user value can be slit in 2).
- Add detailed unit tests for retrieveValidatedConfigInfo.
kubeadm's current implementation of component config support is "kind" centric.
This has its downsides. Namely:
- Kind names and numbers can change between config versions.
Newer kinds can be ignored. Therefore, detection of a version change is
considerably harder.
- A component config can have only one kind that is managed by kubeadm.
Thus a more appropriate way to identify component configs is required.
Probably the best solution identified so far is a config group.
A group name is unlikely to change between versions, while the kind names and
structure can.
Tracking component configs by group name allows us to:
- Spot more easily config version changes and manage alternate versions.
- Support more than one kind in a config group/version.
- Abstract component configs by hiding their exact structure.
Hence, this change rips off the old kind based support for component configs
and replaces it with a group name based one. This also has the following
extra benefits:
- More tests were added.
- kubeadm now errors out if an unsupported version of a known component group
is used.
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
- Add a new preflight check for upgrade that runs the pause container
with -v in a Job.
- Wait for the Job to complete and return an error after N seconds.
- Manually clean the Job because we don't have the TTL controller
enabled in kubeadm yet (it's still alpha).
Currently this uses the combined kubelet output (stdout + stderr), but this
causes parsing issues if the kubelet logs something on stderr.
Thus we ignore the entire stderr and use stdout only.
We do disable a couple of tests here. That is because the fakeexecer only
supports combined output and return a "not supported" error if `.Output()`
gets invoked thus permanently failing those.
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
- Don't always print to stdout that the kubelet is starting.
instead delegate this to the callers of TryStartKubelet.
- Add a new root kubeadm init phase called "kubelet-finalize"
- Add a sub-phase to "kubelet-finalize"
called "experimental-cert-rotation"
- "cert-rotation" performs the following actions:
- tries to guess if kubelet client cert rotation is enabled
- update the kubelet.conf to use the rotatable cert/key
The PR introducing 5bb8069 got merged accidentally (the CI robot not
respecting a hold). Hence, the feedback to that PR is merged separately.
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
As the hyperkube image is itself deprecated and moved out of tree, its use with
kubeadm gets deprecated too. Hence, deprecation messages will be printed when
it is used.
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
ToClientSet() in kubeconfig.go creates a clientset from
the passed Config object (kubeconfig). For IP addresses
that are not reachable e.g. Get() calls for ConfigMaps
can block for a few minutes with the default timeout.
Modify the timeout to a shorter value by passing an override.
There are two writes yet only one read on a non-buffered channel that is
created locally and not passed anywhere else.
Therefore, it could leak one of its two spawned Goroutines if either:
* The provided `f` takes longer than an erroneous result from
`waiter.WaitForHealthyKubelet`, or;
* The provided `f` completes before an erroneous result from
`waiter.WaitForHealthyKubelet`.
The fix is to add a one-element buffer so that the channel write happens
for the second Goroutine in these cases, allowing it to finish and freeing
references to the now-buffered channel, letting it to be GC'd.
3993c42431 introduced the propagation of *_PROXY
host env. variables to the kube-proxy container.
To allow The NODE_NAME variable to be properly updated by the downward
API make, sure we preserve the existing variables when adding *_PROXY.
This change removes dependencies on the internal types of the kubelet and
kube-proxy component configs. Along with that defaulting and validation is
removed as well. kubeadm will display a warning, that it did not verify the
component config upon load.
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
Checking if the path exists before creating the volume is
problematic because the path will be created regardless
after the initial call to "kubeadm init" and once the CM Pod
is running.
Then on subsequent calls to "kubeadm init" or the "control-plane"
phase the manifest for the CM will be different.
Always mount this path, but also consider the user provided
flag override from ClusterConfiguration.
Used strings instead of bytes in the TestTokenOutput test cases as
expected output is a plain text.
This should also simplify the data representation and the test code
a bit.
The flag has been problematic and abused by users.
While perhaps its original purpose was to be able to feed
a new version of the control-plane it also made it possible
to apply modifications to the ClusterConfiguration object
in the cluster. The lack of a feature in kubeadm for reconfiguration
of running clusters resulted in users using this flag for
the same purpose.
While it works for certain scenarios like updating
a static Pod for this control-plane only, it can result in
unexpected behavior if the user has for example fed a node name
different than the host name, when originally they created this node.
kubeadm 1.16 introduced the "kustomize" feature that
is a potential replacement for this user demand.
Add warning that this flag should not be used.
This field is not used in the kubeadm code. It was brought from
cli-runtime where it's used to support complex relationship between
command line parameters, which is not present in kubeadm.