CreateOrMutateConfigMap was not resilient when it was trying to Create
the ConfigMap. If this operation returned an unknown error the whole
operation would fail, because it was strict in what error it was
expecting right afterwards: if the error returned by the Create call
was a IsAlreadyExists error, it would work fine. However, if an
unexpected error (such as an EOF) happened, this call would fail.
We are seeing this error specially when running control plane node
joins in an automated fashion, where things happen at a relatively
high speed pace.
It was specially easy to reproduce with kind, with several control
plane instances. E.g.:
```
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
I1130 11:43:42.788952 887 round_trippers.go:443] POST https://172.17.0.2:6443/api/v1/namespaces/kube-system/configmaps?timeout=10s in 1013 milliseconds
Post https://172.17.0.2:6443/api/v1/namespaces/kube-system/configmaps?timeout=10s: unexpected EOF
unable to create ConfigMap
k8s.io/kubernetes/cmd/kubeadm/app/util/apiclient.CreateOrMutateConfigMap
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/util/apiclient/idempotency.go:65
```
This change makes this logic more resilient to unknown errors. It will
retry on the light of unknown errors until some of the expected error
happens: either `IsAlreadyExists`, in which case we will mutate the
ConfigMap, or no error, in which case the ConfigMap has been created.
- Add retrieveValidatedConfigInfo to be able to better unit
test the function.
- Break some of the logic in RetrieveValidatedConfigInfo into
helper functions.
- Pass JoinConfiguration.Discovery to RetrieveValidatedConfigInfo
instead of JoinConfiguration.
- Use the discovery timeout per API call to fetch cluster-info
(optionally the user value can be slit in 2).
- Add detailed unit tests for retrieveValidatedConfigInfo.
- Add a new preflight check for upgrade that runs the pause container
with -v in a Job.
- Wait for the Job to complete and return an error after N seconds.
- Manually clean the Job because we don't have the TTL controller
enabled in kubeadm yet (it's still alpha).
Currently this uses the combined kubelet output (stdout + stderr), but this
causes parsing issues if the kubelet logs something on stderr.
Thus we ignore the entire stderr and use stdout only.
We do disable a couple of tests here. That is because the fakeexecer only
supports combined output and return a "not supported" error if `.Output()`
gets invoked thus permanently failing those.
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
This patch removes pkg/util/mount completely, and replaces it with the
mount package now located at k8s.io/utils/mount. The code found at
k8s.io/utils/mount was moved there from pkg/util/mount, so the code is
identical, just no longer in-tree to k/k.
update tests
add comment
amend var name
update comment
add check for empty slice
fix tests
fix mask size in test
review feedback
add ipv4 and ipv6 flag for mask sizes
add to violation exception list
remove import alias
run update-openapi-spec
review feedback
run update-bazel
review feedback
review feedback