Commit Graph

111 Commits

Author SHA1 Message Date
Lubomir I. Ivanov
c29450eb00 kubeadm: apply retries to all API calls in idempotency.go
The idempotency.go (perhaps not so accurately named) contains
API calls that kubeadm does against an API server using client-go.

Some users seem to have unstable setups where for unknown reasons
the API server can be unavailable or refuse to respond as expected.

Use PollUntilContextTimeout in all exported functions to ensure
such API calls are all retry-able.

NOTE: The context passed to PollUntilContextTimeout is not propagated
in the polled function. Instead the poll function creates it's own
context 'ctx := context.Background()', this is to avoid
breaking expectations on the side of the callers, that expect
a certain type of error and not "context timeout" errors.

Additional changes:
- Make all context.TODO() -> context.Background()
- Update all unit tests and make sure during testing the retry
interval and timeout are short. Test coverage of idempotency.go
is at ~97%.
- Remove the TestMutateConfigMapWithConflict test. It does not
contribute much, because conflict handling is done at the API,
server side, not on the side of kubeadm. This simulating this is not
needed.
2024-02-18 13:14:32 +02:00
xin.li
deec79ad8d kubeadm: increase ut coverage for apiclient/idempotency
Signed-off-by: xin.li <xin.li@daocloud.io>
2024-02-05 23:02:48 +08:00
Lubomir I. Ivanov
2cdd9a7130 kubeadm: use separate context in GetConfigMapWithShortRetry
Intentionally pass a new context to this API call.
This will let the API call run independently of the parent
context timeout, which is quite short and can cause the API
call to return abruptly.
2024-01-19 00:19:07 +02:00
Lubomir I. Ivanov
26a79e4c0b kubeadm: special case context errors in GetConfigMapWithShortRetry
If some code is about to go over the context deadline,
"x/time/rate/rate.go" would return and untyped error with the string
"would exceed context deadline". If some code already exceeded
the deadline the error would be of type DeadlineExceeded.
Ignore such context errors and only store API and connectivity errors.
2024-01-18 15:35:25 +02:00
Lubomir I. Ivanov
54a6e6a772 kubeadm: keep a function with short timeout in idempotency.go
- Name the function GetConfigMapWithShortRetry to be
easier to understand that the function is with a very short timeout.
Add note that this function should be used in cases there is a
fallback to local config.
- Apply custom hardcoded interval of 50ms and timeout of 350ms to it.
Previously the fucntion used exp backoff with 5 steps up to ~340ms.
2024-01-16 17:53:21 +02:00
Lubomir I. Ivanov
caf5311413 kubeadm: start using the Timeouts struct values
Propagate usage of the Timeout struct values.
Apply sanitazation to timeout constants in contants.go.
2024-01-14 15:07:56 +02:00
Lubomir I. Ivanov
374e41cf66 kubeadm: replace deprecated wait.Poll() and wait.PollImmediate()
Replace the usage of the deprecated wait.Poll() and
wait.PollImmediate() functions with wait.PollUntilContextTimeout().
Since we don't have piping of context around kubeadm,
use context.Background() everywhere.

Some wait.Poll() functions were converted to "immediate" as there
is no point for them to not be. This is done for consistency.

Replace the only instance of wait.JitterUntil with
wait.PollUntilContextTimeout. JitterUntil is not deprecated
but this is also done for consistency.
2024-01-14 15:07:55 +02:00
Lubomir I. Ivanov
32fbb23f3b kubeadm: remove usage of the TryRunCommand() function
The function TryRunCommand() uses an exponential backoff,
which is good, but it's inconsistent and only used in a couple
of places.

Remove its usage in the token.go#UpdateOrCreateTokens()
and switch to using the standard function used in other places -
PollUntilContextTimeout().

Remove wait.go#TryRunCommand(), as there are no other usages.
2023-12-20 08:51:00 +02:00
Lubomir I. Ivanov
557118897d kubeadm: drop concurrency when waiting for kubelet /healthz
The function wait.go#WaitForKubeletAndFunc() has been used in
a number of places in kubeadm. It starts a go routine to wait for
the kubelet /healthz and in parallel starts another go routine
to wait for an custom function.

This logic is problematic. If kubeadm is waiting for the kubelet
in parallel with something that requires the kubelet, the right
solution would be to first wait for the kubelet in serial and only
then proceed with the other action. The parallelism here particularly
during "init" required a unwanted "initial timeout" of 40s, before
the kubelet waiting even starts. In most cases, this makes the kubelet
waiter to not even start, while the main point of waiting becomes
the "other action".

- Remove the function WaitForKubeletAndFunc() from the Waiter interface.
- Rename the function WaitForHealthyKubelet() to just WaitForKubelet()
to be consistent with the naming WaitForAPI().
- Update WaitForKubelet() to not use TryRunCommand() and instead
use PollUntilContextTimeout().
- Remove the "initial timeout" of 40s in WaitForKubelet().
- Make both WaitForKubelet() and WaitForAPI() use similar error
handling and output.
- Update all usage of WaitForKubelet() to be a serial call before
any other action, such as another wait* call.
- Make the default wait timeout for the kubelet
/healthz to be 1 minute (kubeadmconstants.DefaultKubeletTimeout).
- Apply updates to all implementations of the Waiter interface.
2023-12-20 08:51:00 +02:00
Kubernetes Prow Robot
589d6f3886
Merge pull request #117630 from skitt/intstr-fromint32-cluster-lifecycle
Cluster lifecycle: use new intstr functions
2023-05-19 08:50:30 -07:00
Stephen Kitt
4c83aae2cc
kubeadm: replace intstr.FromInt with intstr.FromInt32
This touches cases where FromInt() is used on numeric constants, or
values which are already int32s, or int variables which are defined
close by and can be changed to int32s with little impact.

Signed-off-by: Stephen Kitt <skitt@redhat.com>
2023-05-01 09:17:50 +02:00
cui fliter
1359ebcc5b fix doc mismatch
Signed-off-by: cui fliter <imcusg@gmail.com>
2023-04-16 18:29:45 +08:00
Lubomir I. Ivanov
54b73deaca kubeadm: handle dry run GET actions from fake discovery
The kubeadm dry run client reactor code is flawed as it assumes
all invoked "get" verb actions can be casted to GetAction.
Apparently that is not the case when Discovery().ServerVersion()
and other discovery calls are made. In such cases the action
type is the bare ActionImpl.

Catch if an action can be casted to ActionImpl and construct a
GetAction from it. GetActionImpl only suppersets ActionImpl with
a Name field (empty string in this case).

Add unit test for Discovery().ServerVersion().
2022-12-21 11:49:59 +02:00
QuantumEnergyE
847a39afc0 Retry patch when then service is unavailable or timeout. 2022-11-29 23:09:31 +08:00
SataQiu
d4cafe4738 kubeadm: optimize and make the usage consistent about apierrors.IsNotFound 2022-10-13 23:23:53 +08:00
SataQiu
61cd585ad2 kubeadm: remove redundant import alias and unused apiclient util funcs 2022-09-28 12:36:54 +08:00
B Aravind
b307321c0a
I have evaluated TODO retry remove if feasible (#112383)
Hi team, hope u all doing well.

I have checked TODO that to remove "retry" if feasible but it's important i think that it shouldn't be removed because it was used in every file on your repo.

Update idempotency.go

Update idempotency.go

Update idempotency.go
2022-09-13 22:31:00 -07:00
cndoit18
ec43037d0f style: remove redundant judgment
Signed-off-by: cndoit18 <cndoit18@outlook.com>
2022-08-25 12:07:36 +08:00
XuzhengChang
7824316e89 Print getStaticPodSingleHash err message 2022-03-02 09:34:12 +08:00
SataQiu
2c5aef9036 kubeadm: fix the bug that 'kubeadm init --dry-run --upload-certs' command failed with 'secret not found' error 2022-02-09 12:58:02 +08:00
Monokaix
eab74f15a5 Remove unused arg of kubeadm/WaitForKubeletAndFunc 2021-12-25 09:12:00 +08:00
haoyun
a600e31c55 test: add test for PatchNode when error happend
Signed-off-by: haoyun <yun.hao@daocloud.io>
2021-10-19 11:01:01 +08:00
haoyun
bd8f26c2d7 fix: patchNode retry logic
Signed-off-by: haoyun <yun.hao@daocloud.io>
2021-10-17 12:36:36 +08:00
Antonio Ojea
0cd75e8fec run hack/update-netparse-cve.sh 2021-08-20 10:42:09 +02:00
XinYang
72fd01095d
re-order imports for kubeadm
Signed-off-by: XinYang <xinydev@gmail.com>
2021-08-17 22:40:46 +08:00
XinYang
c2a8cd359f
re-order the imports in kubeadm
Signed-off-by: XinYang <xinydev@gmail.com>

Update cmd/kubeadm/app/cmd/join.go

Co-authored-by: Lubomir I. Ivanov <neolit123@gmail.com>
2021-07-04 16:41:27 +08:00
Sandeep Rajan
b8a1bd6a6c remove the deprecated kube-dns as an option in kubeadm 2021-03-04 12:12:54 -05:00
Benjamin Elder
56e092e382 hack/update-bazel.sh 2021-02-28 15:17:29 -08:00
Xianglin Gao
04ef3628e3 refact CreateOrMutateConfigMap and MutateConfigMap with PollImmediate
Signed-off-by: Xianglin Gao <xianglin.gxl@alibaba-inc.com>
2020-06-11 00:31:22 +08:00
Xianglin Gao
6d572ea9b7 Add retries for CreateOrUpdateRoleBinding
Signed-off-by: Xianglin Gao <xianglin.gxl@alibaba-inc.com>
2020-06-10 00:23:46 +08:00
Xianglin Gao
052eb7d9a5 Add retries for CreateOrUpdateRole
Signed-off-by: Xianglin Gao <xianglin.gxl@alibaba-inc.com>
2020-06-10 00:12:25 +08:00
Jordan Liggitt
b7c2faf26c client-go dynamic client: add context to callers 2020-03-06 10:56:23 -05:00
Mike Danese
76f8594378 more artisanal fixes
Most of these could have been refactored automatically but it wouldn't
have been uglier. The unsophisticated tooling left lots of unnecessary
struct -> pointer -> struct transitions.
2020-03-05 14:59:47 -08:00
Taesun Lee
97fc3e6139
Fix typos in apiclient util
fix initalTimeout to initialTimeout
2020-02-20 15:20:04 +09:00
Mike Danese
25651408ae generated: run refactor 2020-02-08 12:30:21 -05:00
Mike Danese
3aa59f7f30 generated: run refactor 2020-02-07 18:16:47 -08:00
Mike Danese
d55d6175f8 refactor 2020-01-29 08:50:45 -08:00
Rafael Fernández López
14fe7225c1
kubeadm: Improve resiliency in CreateOrMutateConfigMap
CreateOrMutateConfigMap was not resilient when it was trying to Create
the ConfigMap. If this operation returned an unknown error the whole
operation would fail, because it was strict in what error it was
expecting right afterwards: if the error returned by the Create call
was a IsAlreadyExists error, it would work fine. However, if an
unexpected error (such as an EOF) happened, this call would fail.

We are seeing this error specially when running control plane node
joins in an automated fashion, where things happen at a relatively
high speed pace.

It was specially easy to reproduce with kind, with several control
plane instances. E.g.:

```
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
I1130 11:43:42.788952     887 round_trippers.go:443] POST https://172.17.0.2:6443/api/v1/namespaces/kube-system/configmaps?timeout=10s  in 1013 milliseconds
Post https://172.17.0.2:6443/api/v1/namespaces/kube-system/configmaps?timeout=10s: unexpected EOF
unable to create ConfigMap
k8s.io/kubernetes/cmd/kubeadm/app/util/apiclient.CreateOrMutateConfigMap
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/util/apiclient/idempotency.go:65
```

This change makes this logic more resilient to unknown errors. It will
retry on the light of unknown errors until some of the expected error
happens: either `IsAlreadyExists`, in which case we will mutate the
ConfigMap, or no error, in which case the ConfigMap has been created.
2019-11-30 22:48:16 +01:00
Jordan Liggitt
752cda4fc4 guard kubeadm dependencies on k8s.io/kubernetes 2019-11-13 15:05:11 -05:00
Louis Jackman
62e314a556
Remove potential Goroutine leak in kubeadm wait.go
There are two writes yet only one read on a non-buffered channel that is
created locally and not passed anywhere else.

Therefore, it could leak one of its two spawned Goroutines if either:
* The provided `f` takes longer than an erroneous result from
  `waiter.WaitForHealthyKubelet`, or;
* The provided `f` completes before an erroneous result from
  `waiter.WaitForHealthyKubelet`.

The fix is to add a one-element buffer so that the channel write happens
for the second Goroutine in these cases, allowing it to finish and freeing
references to the now-buffered channel, letting it to be GC'd.
2019-11-08 21:05:19 +00:00
Sandeep Rajan
16191db353 skip deployment update if migration fails 2019-11-06 10:55:54 -05:00
Yassine TIJANI
f984b4c7a2 remove ipallocator in favor of k/utils net package
Signed-off-by: Yassine TIJANI <ytijani@vmware.com>
2019-10-22 18:37:13 +02:00
dzzg
cd57039927
cleanup: fix typo "contstruct" -> "construct" 2019-08-06 06:22:46 +08:00
David Xia
fabfd950b1
cleanup: fix some log and error capitalizations
Part of https://github.com/kubernetes/kubernetes/issues/15863
2019-07-20 18:26:16 -04:00
Rafael Fernández López
26c9965a97
kubeadm: Add ability to retry ConfigMap get if certain errors happen
During the control plane joins, sometimes the control plane returns an
expected error when trying to download the `kubeadm-config` ConfigMap.
This is a workaround for this issue until the root cause is completely
identified and fixed.

Ideally, this commit should be reverted in the near future.
2019-06-12 17:49:27 +02:00
Daniel (Shijun) Qian
5268f69405 fix duplicated imports of k8s code (#77484)
* fix duplicated imports of api/core/v1

* fix duplicated imports of client-go/kubernetes

* fix duplicated imports of rest code

* change import name to more reasonable
2019-05-08 10:12:47 -07:00
Kubernetes Prow Robot
36ccff1b27
Merge pull request #76821 from ereslibre/kubeadm-config-retry-on-conflict
kubeadm: improve resiliency when conflicts arise when updating the kubeadm-config configmap
2019-04-23 15:50:01 -07:00
Rafael Fernández López
bc8bafd825
kubeadm: improve resiliency when conflicts arise when updating the kubeadm-config ConfigMap
Add the functionality to support `CreateOrMutateConfigMap` and `MutateConfigMap`.

* `CreateOrMutateConfigMap` will try to create a given ConfigMap object; if this ConfigMap
  already exists, a new version of the resource will be retrieved from the server and a
  mutator callback will be called on it. Then, an `Update` of the mutated object will be
  performed. If there's a conflict during this `Update` operation, retry until no conflict
  happens. On every retry the object is refreshed from the server to the latest version.

* `MutateConfigMap` will try to get the latest version of the ConfigMap from the server,
  call the mutator callback and then try to `Update` the mutated object. If there's a
  conflict during this `Update` operation, retry until no conflict happens. On every retry
  the object is refreshed from the server to the latest version.

Add unit tests for `MutateConfigMap`

* One test checks that in case of no conflicts, the update of the
  given ConfigMap happens without any issues.

* Another test mimics 5 consecutive CONFLICT responses when updating
  the given ConfigMap, whereas the sixth try it will work.
2019-04-23 15:40:37 +02:00
aaa
a5b88f69e9 Fix two minor bugs in kubeadm 2019-04-20 06:42:36 -04:00
Lubomir I. Ivanov
6f6b364b9c kubeadm: update output of init, join reset commands
- move most unrelated to phases output to klog.V(1)
- rename some prefixes for consistency - e.g.
[kubelet] -> [kubelet-start]
- control-plane-prepare: print details for each generated CP
component manifest.
- uppercase the info text for all "[reset].." lines
- modify the text for one line in reset
2019-03-06 03:17:35 +02:00