Add a new v1beta4.ResetConfiguration.UnmountFlags field that
can be used to pass in Linux umount2() flags such as MNT_FORCE.
The default value continues to be 0 - i.e. no flags.
Instead of printing warnings when syscall.Unmount() returns errors,
store all the errors in an aggregate. Abort the reset operation if
at least one unmount error was encountered.
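A minimal sketch of the aggregation idea in Go (the helper name and
directory list are illustrative, not the exact kubeadm code):

    package main

    import (
        "fmt"
        "syscall"

        utilerrors "k8s.io/apimachinery/pkg/util/errors"
    )

    // unmountAll collects every unmount error instead of warning, then
    // returns them as one aggregate so the caller can abort the reset.
    func unmountAll(dirs []string, flags int) error {
        var errs []error
        for _, dir := range dirs {
            // flags can carry umount2() values such as syscall.MNT_FORCE.
            if err := syscall.Unmount(dir, flags); err != nil {
                errs = append(errs, fmt.Errorf("failed to unmount %q: %w", dir, err))
            }
        }
        // NewAggregate returns nil when no errors were collected.
        return utilerrors.NewAggregate(errs)
    }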
The function TryRunCommand() uses an exponential backoff,
which is good, but it is applied inconsistently and only used in a
couple of places.
Remove its usage in token.go#UpdateOrCreateTokens()
and switch to the standard function used in other places -
PollUntilContextTimeout().
Remove wait.go#TryRunCommand(), as there are no other usages.
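A minimal sketch of the replacement pattern, assuming a hypothetical
apply callback (interval and timeout values are illustrative):

    package main

    import (
        "context"
        "time"

        "k8s.io/apimachinery/pkg/util/wait"
    )

    // retryTokenUpdate shows the fixed-interval polling that replaces
    // the removed exponential backoff of TryRunCommand().
    func retryTokenUpdate(ctx context.Context, apply func(ctx context.Context) error) error {
        return wait.PollUntilContextTimeout(ctx,
            500*time.Millisecond, // retry interval
            30*time.Second,       // overall timeout
            true,                 // immediate: attempt once before sleeping
            func(ctx context.Context) (bool, error) {
                if err := apply(ctx); err != nil {
                    return false, nil // transient error; retry
                }
                return true, nil
            })
    }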
The function wait.go#WaitForKubeletAndFunc() has been used in
a number of places in kubeadm. It starts a goroutine to wait for
the kubelet /healthz endpoint and in parallel starts another
goroutine to wait for a custom function.
This logic is problematic. If kubeadm is waiting for the kubelet
in parallel with something that requires the kubelet, the right
solution is to first wait for the kubelet serially and only
then proceed with the other action. The parallelism here, particularly
during "init", required an unwanted "initial timeout" of 40s before
the kubelet wait even starts. In most cases this means the kubelet
waiter never effectively starts, while the main point of waiting
becomes the "other action".
- Remove the function WaitForKubeletAndFunc() from the Waiter interface.
- Rename the function WaitForHealthyKubelet() to just WaitForKubelet()
to be consistent with the naming WaitForAPI().
- Update WaitForKubelet() to not use TryRunCommand() and instead
use PollUntilContextTimeout() (see the sketch after this list).
- Remove the "initial timeout" of 40s in WaitForKubelet().
- Make both WaitForKubelet() and WaitForAPI() use similar error
handling and output.
- Update all usage of WaitForKubelet() to be a serial call before
any other action, such as another wait* call.
- Make the default wait timeout for the kubelet
/healthz endpoint 1 minute (kubeadmconstants.DefaultKubeletTimeout).
- Apply updates to all implementations of the Waiter interface.
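A minimal sketch of the serialized kubelet wait described above; the
healthz URL, interval and timeout are illustrative:

    package main

    import (
        "context"
        "fmt"
        "net/http"
        "time"

        "k8s.io/apimachinery/pkg/util/wait"
    )

    // waitForKubeletHealthz polls the local kubelet /healthz endpoint
    // with no initial delay, failing only after the timeout expires.
    func waitForKubeletHealthz(ctx context.Context, url string, timeout time.Duration) error {
        err := wait.PollUntilContextTimeout(ctx, time.Second, timeout, true,
            func(ctx context.Context) (bool, error) {
                req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
                if err != nil {
                    return false, err
                }
                resp, err := http.DefaultClient.Do(req)
                if err != nil {
                    return false, nil // kubelet not healthy yet; keep polling
                }
                resp.Body.Close()
                return resp.StatusCode == http.StatusOK, nil
            })
        if err != nil {
            return fmt.Errorf("the kubelet /healthz check failed: %w", err)
        }
        return nil
    }

Calling this before any subsequent wait (for example WaitForAPI())
keeps the two waits serial rather than racing in goroutines.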
Explicitly show in the help message that `kubeadm upgrade plan` can
only run on the node where "admin.conf" exists; normally, this is a
control plane node.
Signed-off-by: Dave Chen <dave.chen@arm.com>
The notes printed to the user from common.go when
loadConfig fails are outdated and incorrect.
If the config cannot be loaded, the user should not be instructed
to re-upload the config with kubeadm commands. Instead, they
should do it manually with kubectl.
On a loadConfig() error, just wrap the error in a simple message
and show it to the user.
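A minimal sketch of the wrapping (the loader and the message text are
illustrative stand-ins for the real common.go code):

    package main

    import (
        "errors"
        "fmt"
    )

    // loadConfig is a hypothetical stand-in for the real loader.
    func loadConfig() error {
        return errors.New("could not fetch the configuration from the cluster")
    }

    func main() {
        // Wrap the error in one simple message instead of printing the
        // outdated re-upload instructions.
        if err := loadConfig(); err != nil {
            fmt.Printf("unable to load the kubeadm configuration: %v\n", err)
        }
    }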
The current setup stomps on (masks) IsNotFound errors for Node
objects. The underlying fetching of the init configuration uses
the Node object to construct an InitConfiguration for this
upgrade process, so if the Node is missing, the kubeadm-config
ConfigMap will be reported as missing, which is incorrect.
The component connection between the kube-apiserver and the kubelet
does not require the "O" field of the certificate Subject to be set
to the "system:masters" privileged group. It can be a less
privileged group like "kubeadm:cluster-admins".
Change the group in the apiserver-kubelet-client
certificate specification. This cert is passed to
--kubelet-client-certificate.
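A minimal sketch of the Subject in question, using the standard
library's pkix.Name (the CN shown is an assumption, not taken from
this change):

    package main

    import (
        "crypto/x509/pkix"
        "fmt"
    )

    func apiserverKubeletClientSubject() pkix.Name {
        return pkix.Name{
            CommonName: "kube-apiserver-kubelet-client", // assumed CN
            // Previously "system:masters"; a less privileged group works.
            Organization: []string{"kubeadm:cluster-admins"},
        }
    }

    func main() {
        fmt.Println(apiserverKubeletClientSubject())
    }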
The addition of the "super-admin.conf" functionality required
init.go's Client() to create RBAC rules on its first invocation.
However, this created a problem with the "wait-control-plane" phase
of "kubeadm init", where a client is needed to connect to the
API server Discovery API's "/healthz" endpoint. The logic that
ensures the RBAC effectively became the step that polled and waited
for the API server.
To avoid this, introduce a new InitData function, ClientWithoutBootstrap.
In "wait-control-plane" use this client, which has no permissions
(anonymous) but is sufficient to connect to "/healthz".
Pending changes here would be:
- Stop using "/healthz"; instead, a regular REST client from
the kubelet cert/key can be constructed.
- Make the wait for the kubelet / API server serial (not in
goroutines).
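A minimal sketch of the anonymous "/healthz" probe, assuming the host
and CA bytes are provided by the caller (not kubeadm's exact code):

    package main

    import (
        "context"
        "fmt"
        "time"

        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/rest"
    )

    // healthzAnonymous builds a client with no credentials; requests
    // are treated as anonymous, which is enough to reach "/healthz".
    func healthzAnonymous(ctx context.Context, host string, caData []byte) error {
        cfg := &rest.Config{
            Host:            host,
            TLSClientConfig: rest.TLSClientConfig{CAData: caData},
            Timeout:         10 * time.Second,
        }
        client, err := kubernetes.NewForConfig(cfg)
        if err != nil {
            return err
        }
        body, err := client.Discovery().RESTClient().Get().AbsPath("/healthz").DoRaw(ctx)
        if err != nil {
            return fmt.Errorf("healthz check failed: %w", err)
        }
        fmt.Printf("healthz: %s\n", body)
        return nil
    }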
In EnsureAdminClusterRoleBindingImpl() there are a couple of
polls around CRB create calls. When testing the function,
a short retry interval and timeout are used. These introduce around
2x20 fake client "connections" / poll iterations under a couple
of test cases, adding about 2 seconds to the overall test time.
Given that the polls in EnsureAdminClusterRoleBindingImpl()
are of type PollUntilContextTimeout() with "immediate" set to "true",
the short retry interval / timeout can be removed when testing,
because one poll iteration is guaranteed and the tested function
is at 100% coverage with reactors and test cases.
Poll the CRB create calls for kubeadm:cluster-admins when using the
super-admin.conf credential. The prior create call that uses the
admin.conf credential was already polled. Polling this subsequent
call seems advisable to ensure that momentary errors in between
cannot trip EnsureAdminClusterRoleBindingImpl().
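A minimal sketch of polling such a create call (binding contents,
interval and timeout are illustrative):

    package main

    import (
        "context"
        "time"

        rbacv1 "k8s.io/api/rbac/v1"
        apierrors "k8s.io/apimachinery/pkg/api/errors"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/apimachinery/pkg/util/wait"
        "k8s.io/client-go/kubernetes"
    )

    // ensureClusterAdminsCRB retries the CRB create until it succeeds
    // or already exists; immediate=true guarantees one attempt.
    func ensureClusterAdminsCRB(ctx context.Context, client kubernetes.Interface) error {
        crb := &rbacv1.ClusterRoleBinding{
            ObjectMeta: metav1.ObjectMeta{Name: "kubeadm:cluster-admins"},
            RoleRef: rbacv1.RoleRef{
                APIGroup: rbacv1.GroupName,
                Kind:     "ClusterRole",
                Name:     "cluster-admin",
            },
            Subjects: []rbacv1.Subject{{
                APIGroup: rbacv1.GroupName,
                Kind:     "Group",
                Name:     "kubeadm:cluster-admins",
            }},
        }
        return wait.PollUntilContextTimeout(ctx, time.Second, 30*time.Second, true,
            func(ctx context.Context) (bool, error) {
                _, err := client.RbacV1().ClusterRoleBindings().Create(ctx, crb, metav1.CreateOptions{})
                if err == nil || apierrors.IsAlreadyExists(err) {
                    return true, nil
                }
                return false, nil // momentary error; retry
            })
    }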
- Update unit tests in certs_test.go related to the "renew" CLI command.
- In /init, (d *initData) Client(), make sure that the new logic
for bootstrapping an "admin.conf" user is performed by calling
EnsureAdminClusterRoleBinding() from the phases backend. Add an
"adminKubeConfigBootstrapped" flag that helps call this logic only
once per "kubeadm init" binary execution.
- In /phases/init include a new subphase for generating
the "super-admin.conf" file.
- In /phases/reset make sure the file "super-admin.conf" is
cleaned if present. Update unit tests.