This PR specifies minimum control plane version,
kubelet version and current K8s version for v1.20.
Signed-off-by: Kommireddy Akhilesh <akhileshkommireddy2412@gmail.com>
Client side period validation of certificates should not be
fatal, as local clock skews are not so uncommon. The validation
should be left to the running servers.
- Remove this validation from TryLoadCertFromDisk().
- Add a new function ValidateCertPeriod(), that can be used for this
purpose on demand.
- In phases/certs add a new function CheckCertificatePeriodValidity()
that will print warnings if a certificate does not pass period
validation, and caches certificates that were already checked.
- Use the function in a number of places where certificates
are loaded from disk.
CNI is no longer alpha and is widely used by almost every Kubernetes cluster, we should remove the alpha warnings that were originally added from the early days of CNI
Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>
The isCoreDNSVersionSupported() check assumes that
there is a running kubelet, that manages the CoreDNS containers.
If the containers are being created it is not possible to fetch
their image digest. To workaround that, a poll can be used in
isCoreDNSVersionSupported() and wait for the CoreDNS Pods
are expected to be running. Depending on timing and CNI
yet to be installed this can cause problems related to
addon idempotency of "kubeadm init", because if the CoreDNS
Pods are waiting for another step they will never get running.
Remove the function isCoreDNSVersionSupported() and assume that
the version is always supported. Rely on the Corefile migration
library to error out if it must.
- Ensure the directory is created with 0700 via a new function
called CreateDataDirectory().
- Call this function in the init phases instead of the manual call
to MkdirAll.
- Call this function when joining control-plane nodes with local etcd.
If the directory creation is left to the kubelet via the
static Pod hostPath mounts, it will end up with 0755
which is not desired.
A bug was discovered in the `enforceRequirements` func for `upgrade plan`.
If a command line argument that specifies the target Kubernetes version is
supplied, the returned `ClusterConfiguration` by `enforceRequirements` will
have its `KubernetesVersion` field set to the new version.
If no version was specified, the returned `KubernetesVersion` points to the
currently installed one.
This remained undetected for a couple of reasons
- It's only `upgrade plan` that allows for the version command line argument to
be optional (in `upgrade plan` it's mandatory)
- Prior to 1.19, the implementation of `upgrade plan` did not make use of the
`KubernetesVersion` returned by `enforceRequirements`.
`upgrade plan` supports this optional command line argument to enable
air-gapped setups (as not specifying a version on the command line will end up
looking for the latest version over the Interned).
Hence, the only option is to make `enforceRequirements` consistent in the
`upgrade plan` case and always return the currently installed version in the
`KubernetesVersion` field.
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
Pinning the kube-controller-manager and kube-scheduler kubeconfig files
to point to the control-plane-endpoint can be problematic during
immutable upgrades if one of these components ends up contacting an N-1
kube-apiserver:
https://kubernetes.io/docs/setup/release/version-skew-policy/#kube-controller-manager-kube-scheduler-and-cloud-controller-manager
For example, the components can send a request for a non-existing API
version.
Instead of using the CPE for these components, use the LocalAPIEndpoint.
This guarantees that the components would talk to the local
kube-apiserver, which should be the same version, unless the user
explicitly patched manifests.
A check that verifies that kubeadm does not "upgrade" to an older release was
overly optimized by skipping upgrade if the new version is the same as the old
one. This somewhat makes sense, but that way changes in any of the etcd fields
in the ClusterConfiguration won't be applied if the etcd version is not
changed.
Hence, this simple change ensures that the upgrade is done even when no version
change takes place.
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
As a part of cleaning up inactive members (who haven't been active since
beginning of 2019) from OWNERS files, this commit removes euank and
CaoShuFeng from the list of reviewers.
dependencycheck verifies that no violating depdendencies exist in pkgs
passed via a JSON file generated from go list.
Signed-off-by: hasheddan <georgedanielmangum@gmail.com>
The implementation consists of
- identifying all places where VolumeSource.PersistentVolumeClaim has
a special meaning and then ensuring that the same code path is taken
for an ephemeral volume, with the ownership check
- adding a controller that produces the PVCs for each embedded
VolumeSource.EphemeralVolume
- relaxing the PVC protection controller such that it removes
the finalizer already before the pod is deleted (only
if the GenericEphemeralVolume feature is enabled): this is
needed to break a cycle where foreground deletion of the pod
blocks on removing the PVC, which waits for deletion of the pod
The controller was derived from the endpointslices controller.
* Creates private keys and CSR files for all the control-plane certificates
* Helps with External CA mode of kubeadm
Signed-off-by: Richard Wall <richard.wall@jetstack.io>
Back in the v1alpha2 days the fuzzer test needed to be disabled. To ensure that
there were no config breaks and everything worked correctly extensive replacement
tests were put in place that functioned as unit tests for the kubeadm config utils
as well.
The fuzzer test has been reenabled for a long time now and there's no need for
these replacements. Hence, over time most of these were disabled, deleted and
refactored. The last remnants are part of the LoadJoinConfigurationFromFile test.
The test data for those old tests remains largely unused today, but it still receives
updates as it contains kubelet's and kube-proxy's component configs. Updates to these
configs are usually done because the maintainers of those need to add a new field.
Hence, to cleanup old code and reduce maintenance burden, the last test that depends
on this test data is finally refactored and cleaned up to represent a simple unit test
of `LoadJoinConfigurationFromFile`.
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
Over the course of recent development of the `componentconfigs` package,
it became evident that most of the tests in this package cannot be implemented without
using a component config. As all of the currently supported component configs are
external to the kubeadm project (kubelet and kube-proxy), practically all of the tests
in this package are now dependent on external code.
This is not desirable, because other component's configs may change frequently and
without much of a notice. In particular many configs add new fields without bumping their
versions. In addition to that, some components may be deprecated in the future and many
tests may use their configs as a place holder of a component config just to test some
common functionality.
To top that, there are many tests that test the same common functionality several times
(for each different component config).
Thus a kubeadm managed replacement and a fake test environment are introduced.
The new test environment uses kubeadm's very own `ClusterConfiguration`.
ClusterConfiguration is normally not managed by the `componentconfigs` package.
It's only used, because of the following:
- It's a versioned API that is under the control of kubeadm maintainers. This enables us to test
the componentconfigs package more thoroughly without having to have full and always up to date
knowledge about the config of another component.
- Other components often introduce new fields in their configs without bumping up the config version.
This, often times, requires that the PR that introduces such new fields to touch kubeadm test code.
Doing so, requires more work on the part of developers and reviewers. When kubeadm moves out of k/k
this would allow for more sporadic breaks in kubeadm tests as PRs that merge in k/k and introduce
new fields won't be able to fix the tests in kubeadm.
- If we implement tests for all common functionality using the config of another component and it gets
deprecated and/or we stop supporting it in production, we'll have to focus on a massive test refactoring
or just continue importing this config just for test use.
Thus, to reduce maintenance costs without sacrificing test coverage, we introduce this mini-framework
and set of tests here which replace the normal component configs with a single one (`ClusterConfiguration`)
and test the component config independent logic of this package.
As a result of this, many of the older test cases are refactored and greatly simplified to reflect
on the new change as well. The old tests that are strictly tied to specific component configs
(like the defaulting tests) are left unchanged.
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
Kubeadm setup of kube-controller-manager and kube-scheduler is
lacking the --port=0 option which caused the component to enable
the insecure port by default and serve insecurely on the default
node interface.
Add --port=0 by default to both components. Users are still allowed
the explicitly set the flag (via extraArgs), which allows them
to override this default kubeadm behavior and enable the insecure port.
NOTE: the flag is deprecated and should be removed from kubeadm manifests
once it's removed from core.
`kubeadm config upload` is a GA command that has been deprecated and scheduled
for removal since Kubernetes 1.15 (released 06/19/2019). This change will
finally removed it in Kubernetes 1.19 (planned for August 2020).
The original command has long since been replaced by a GA init phase:
`kubeadm init phase upload-config`
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
Add PatchStaticPod() in staticpod/utils.go
Apply patches to static Pods in:
- phases/controlplane/CreateStaticPodFiles()
- phases/etcd/CreateLocalEtcdStaticPodManifestFile() and
CreateStackedEtcdStaticPodManifestFile()
Add unit tests and update Bazel.
Previously, separate interfaces were defined for Reserve and Unreserve
plugins. However, in nearly all cases, a plugin that allocates a
resource using Reserve will likely want to register itself for Unreserve
as well in order to free the allocated resource at the end of a failed
scheduling/binding cycle. Having separate plugins for Reserve and
Unreserve also adds unnecessary config toil. To that end, this patch
aims to merge the two plugins into a single interface called a
ReservePlugin that requires implementing both the Reserve and Unreserve
methods.
This change enables kubeadm upgrade plan to print a state table with
information regarding known component config API groups. Most importantly this
information includes current and preferred version for each group and an
indication if a manual user upgrade is required.
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
`kubeadm upgrade plan` is using the external (currently `v1alpha1`) types of
the kubeadm output API to collect upgrade plans. This is counter intuitive
since code structure gets bound to the whatever version the output API is at.
In addition to that, the versioned API is used only in the very last stages of
a machine readable output (which is currently not implemented).
Hence, to increase flexibility and keep up with the standard Kubernetes
ecosystem practice, `kubeadm upgrade plan` is migrated to use the internal
types of the output API.
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
UploadConfiguration() now always retries the underling API calls,
which can make TestUploadConfiguration run for a long time.
Remove the negative test cases, where errors are expected.
Negative test cases should be tested in app/util/apiclient,
where a short timeout / retry count should be possible for unit tests.
Currently, kubeadm would refuse to perfom an upgrade (or even planing for one)
if it detects a user supplied unsupported component config version. Hence,
users are required to manually upgrade their component configs and store them
in the config maps prior to executing `kubeadm upgrade plan` or
`kubeadm upgrade apply`.
This change introduces the ability to use the `--config` option of the
`kubeadm upgrade plan` and `kubeadm upgrade apply` commands to supply a YAML
file containing component configs to be used in place of the existing ones in
the cluster upon upgrade.
The old behavior where `--config` is used to reconfigure a cluster is still
supported. kubeadm automatically detects which behavior to use based on the
presence (or absense) of kubeadm config types (API group
`kubeadm.kubernetes.io`).
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
We were detecting the IP family that kube-proxy should use
based on the bind address, however, this is not valid when
using an unspecified address, because on those cases
kube-proxy adopts the IP family of the address reported
in the Node API object.
The IP family will be determined by the nodeIP used by the proxier
The order of precedence is:
1. config.bindAddress if bindAddress is not 0.0.0.0 or ::
2. the primary IP from the Node object, if set
3. if no IP is found it defaults to 127.0.0.1 and IPv4
Signed-off-by: Antonio Ojea <antonio.ojea.garcia@gmail.com>
If an etcd member with the same address already exists, don't re-add it.
Instead, use the existing member list for creating the "initial cluster"
that is written for this etcd server instance static Pod.
Component configs are used by kubeadm upgrade plan at the moment. However, they
can prevent kubeadm upgrade plan from functioning if loading of an unsupported
version of a component config is attempted. For that matter it's best to just
stop loading component configs as part of the kubeadm config load process.
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
Under the feature gate DefaultPodTopologySpread, which will disable the legacy DefaultPodTopologySpread plugin from the default algorithm providers.
Signed-off-by: Aldo Culquicondor <acondor@google.com>
`getK8sVersionFromUserInput` would attempt to load the config from a user
specified YAML file (via the `--config` option of `kubeadm upgrade plan` or
`kubeadm upgrade apply`). This is done in order to fetch the `KubernetesVersion`
field of the `ClusterConfiguration`. The complete config is then immediately
discarded. The actual config that is used during the upgrade process is fetched
from within `enforceRequirements`.
This, along with the fact that `getK8sVersionFromUserInput` is always called
immediately after `enforceRequirements` makes it possible to merge the two.
Merging them would help us simplify things and avoid future problems in
component config related patches.
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
Until now, users were always asked to manually convert a component config to a
version supported by kubeadm, if kubeadm is not supporting its version.
This is true even for configs generated with older kubeadm versions, hence
getting users to make manual conversions on kubeadm generated configs.
This is not appropriate and user friendly, although, it tends to be the most
common case. Hence, we sign kubeadm generated component configs stored in
config maps with a SHA256 checksum. If a configs is loaded by kubeadm from a
config map and has a valid signature it's considered "kubeadm generated" and if
a version migration is required, this config is automatically discarded and a
new one is generated.
If there is no checksum or the checksum is not matching, the config is
considered as "user supplied" and, if a version migration is required, kubeadm
will bail out with an error, requiring manual config migration (as it's today).
The behavior when supplying component configs on the kubeadm command line
does not change. Kubeadm would still bail out with an error requiring migration
if it can recognize their groups but not versions.
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
In case a malformed flag is passed to k8s components
such as "–foo", where "–" is not an ASCII dash character,
the components currently silently ignore the flag
and treat it as a positional argument.
Make k8s components/commands exit with an error if a positional argument
that is not empty is found. Include a custom error message for all
components except kubeadm, as cobra.NoArgs is used in a lot of
places already (can be fixed in a followup).
The kubelet already handles this properly - e.g.:
'unknown command: "–foo"'
This change affects:
- cloud-controller-manager
- kube-apiserver
- kube-controller-manager
- kube-proxy
- kubeadm {alpha|config|token|version}
- kubemark
Signed-off-by: Monis Khan <mok@vmware.com>
Signed-off-by: Lubomir I. Ivanov <lubomirivanov@vmware.com>
- Use a dummy nodename instead of OS hostname
- Inline toString() function
- Use backticks to wrap expected patch
- Remove redundant test name from error logs
kubelet.DownloadConfig is an old utility function which takes a client set and
a kubelet version, uses them to fetch the kubelet component config from a
config map, and places it in a local file. This function is simple to use, but
it is dangerous and unnecessary. Practically, in all cases the kubelet
configuration is present locally and does not need to be fetched from a config
map on the cluster (it just needs to be stored in a file).
Furthermore, kubelet.DownloadConfig does not use the kubeadm component configs
module in any way. Hence, a kubelet configuration fetched using it may not be
patched, validated, or otherwise, processed in any way by kubeadm other than
piping it to a file.
This patch replaces all but a single kubelet.DownloadConfig invocation with
equivalents that get the local copy of the kubelet component config and just
store it in a file. The sole remaining invocation covers the
`kubeadm upgrade node --kubelet-version` case.
In addition to that, a possible panic is fixed in kubelet.DownloadConfig and
it now takes the kubelet version parameter as string.
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
Current calculations assume that -trimpath is not passed to go tool
compile, which is not the case for test binaries built with bazel. This
causes issues for integration tests right now but is generally not
correct.
The approach taken here is a bit of a hack but it works on the
assumption that if and only if trimpath is passed, we are running under
bazel. I didn't see a good spot for pkgPath(), so I just copied it
around.
We can remove all uses of `dockershim` from `cmd/kubelet`, by just
passing the docker options to the kubelet in their pure form, instead of
using them to create a `dockerClientConfig` (which is defined in
dockershim). We can then construct the `dockerClientConfig` only when we
actually need it.
Remove one of two uses of Dockershim in `cmd/kubelet`. The other is for
creating a docker client which we pass to the Kubelet... we will handle
that refactor in a separate diff.
I'm fairly confident, though need to double check, that no one is
actually using this experimental dockershim behavior. If they are, I
think we will want to find a new way to support it (that doesn't require
using the Kubelet only to launch Dockershim).
kubeadm is setting the IPv6DualStack feature gate in the command line of the kubelet.
However, the kubelet is gradually moving away from command line flags towards component config use.
Hence, we should set the IPv6DualStack feature gate in the component config instead.
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
In slower setups it can take more time for the existing cluster
to be in a healthy state, so the existing backoff of ~50 seconds
is apparently not sufficient.
The client dial can also fail for similar reasons.
Improve kubeadm's join toleration of adding new etcd members.
Wrap both the client dial and member add in a longer backoff
(up to ~200 seconds).
This particular change should be backported to the support skew.
In a future change for master, all etcd client operations should be
make consistent so that the etcd logic is in a sane state.
specifically:
- cmd/kubeadm/.import-restrictions
- we don't need to explicitly allow k8s.io repos (external or published)
- rm pkg/controller/.import-restrictions
- pkg/client/unversioned was removed in 59042
- pkg/kubectl/.import-restrictions
- pkg/printers is no longer used
- pkg/api was masking all of the pkg/apis prefixes
- rm staging/src/k8s.io/code-generator/cmd/lister-gen/.import-restrictions
- noop / empty file
- test/e2e/framework/.import-restrictions
- we don't need to explicitly allow k8s.io repos (external or published)
yaml has comments, so we can explain why we have certain rules or
certain prefixes
for those files that weren't already commented yaml, I converted them to
yaml and took a best guess at comments based on the PRs that introduced
or updated them
Use an init container that performs the pre-pull of a component
and then start an instance of "pause" as a regular container to
get the DaemonSet Pod in a Running state.
More details on this change in the code comments.
pkg/util/rlimit/rlimit_linux.go:25:1: exported function RlimitNumFiles should have comment or be unexported
pkg/util/rlimit/rlimit_linux.go:25:6: func name will be used as rlimit.RlimitNumFiles by other packages, and that stutters; consider calling this NumFiles
pkg/util/rlimit/rlimit_unsupported.go:25:1: exported function RlimitNumFiles should have comment or be unexported
pkg/util/rlimit/rlimit_unsupported.go:25:6: func name will be used as rlimit.RlimitNumFiles by other packages, and that stutters; consider calling this NumFiles
Ref: https://github.com/kubernetes/kubernetes/issues/68026
The flag "--use-api" for "alpha certs renew" was deprecated in 1.18.
Remove the flag and related logic that executes certificate renewal
using "api/certificates/v1beta1". kubeadm continues to be able
to create CSR files and renew using the local CA on disk.
kubeadm init prints:
W0410 23:02:10.119723 13040 manifests.go:225] the default kube-apiserver
authorization-mode is "Node,RBAC"; using "Node,RBAC"
Add a new function compareAuthzModes() and a unit test for it.
Make sure the warning is printed only if the user modes don't match
the defaults.
Allow overriding the dry-run temporary directory with
an env. variable (KUBEADM_INIT_DRYRUN_DIR).
Use the same variable in test/cmd/init_test.go.
This allows running integration tests as non-root.
Make getKubeadmPath() fetch the KUBEADM_PATH env. variable.
Panic if it's missing. Don't handle the "--kubeadm-path"
flag. Remove the same flag from the BUILD bazel test rule.
Don't handle "--kubeadm-cmd-skip" usage of this flag is missing
from the code base.
Remove usage of "kubeadmCmdSkip" as the flag "--kubeadm-cmd-skip"
is never passed.
If the kube-proxy/dns ConfigMap are missing, show warnings and assume
that these addons were skipped during "kubeadm init",
and that their redeployment on upgrade is not desired.
TODO: remove this once "kubeadm upgrade apply" phases are supported:
https://github.com/kubernetes/kubeadm/issues/1318
The TLS bootstrapping timeout is increased to 5 minutes with a retry
once every 5 seconds. Failing fast if the kubelet is not healthy is also
preserved.
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
kubeadm uses image tags (such as `v3.4.3-0`) to specify the version of
etcd. However, the upgrade code in kubeadm uses the etcd client API to
fetch the currently deployed version. The result contains only the etcd
version without the additional information (such as image revision) that
is normally found in the tag. As a result it would refuse an upgrade
where the etcd versions match and the only difference is the image
revision number (`v3.4.3-0` to `v3.4.3-1`).
To fix the above issue, the following changes are done:
- Replace the existing etcd version querying code, that uses the etcd
client library, with code that returns the etcd image tag from the
local static pod manifest file.
- If an etcd `imageTag` is specified in the ClusterConfiguration during
upgrade, use that tag instead. This is done regardless if the tag was
specified in the configuration stored in the cluster or with a new
configuration supplied by the `--config` command line parameter.
If no custom tag is specified, kubeadm will select one depending on
the desired Kubernetes version.
- `kubeadm upgrade plan` no longer prints upgrade information about
external etcd. It's the user's responsibility to manage it in that
case.
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
If the user does not provide --config or --control-plane
but provides some other flags such as --certificate-key
kubeadm is supposed to print a warning.
The logic around printing the warning is bogus. Implement
proper checks of when to print the warning.
b117a928 added a new check during "join" whether a Node with
the same name exists in the cluster.
When upgrading from 1.17 to 1.18 make sure the required RBAC
by this check is added. Otherwise "kubeadm join" will complain that
it lacks permissions to GET a Node.
A narrow assumption of what is contained in the `imageID` fields for the
CoreDNS pods causes a panic upon upgrade.
Fix this by using a proper regex to match a trailing SHA256 image digest
in `imageID` or return an error if it cannot find it.
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
So multiple instances of kube-apiserver can bind on the same address and
port, to provide seamless upgrades.
Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
This builds on previous work but only sets the sysctlConnReuse value
if the kernel is known to be above 4.19. To avoid calling GetKernelVersion
twice, I store the value from the CanUseIPVS method and then check the version
constraint at time of expected sysctl call.
Signed-off-by: Christopher M. Luciano <cmluciano@us.ibm.com>
This change removes support for basic authn in v1.19 via the
--basic-auth-file flag. This functionality was deprecated in v1.16
in response to ATR-K8S-002: Non-constant time password comparison.
Similar functionality is available via the --token-auth-file flag
for development purposes.
Signed-off-by: Monis Khan <mok@vmware.com>
Most of these could have been refactored automatically but it wouldn't
have been uglier. The unsophisticated tooling left lots of unnecessary
struct -> pointer -> struct transitions.
The KCM is moving to means of only singing apiserver (kubelet) client
and kubelet serving certificates. See:
https://github.com/kubernetes/enhancements/blob/master/keps/sig-auth/20190607-certificates-api.md#signers
Up until now the experimental kubeadm functionality '--use-api'
under "kubeadm alpha certs renew" was using the KCM to sign *any*
certficate as long as the KCM has the root CA cert/key.
Post discussions with the kubeadm maintainers, it was decided that
this functionality should be removed from kubeadm due to the
requirement to have external signers for renewing the common
control-plane certificates that kubeadm manages.
- Uses correct pause image for Windows
- Omits systemd specific flags
- Common build flags function to be used by Linux and Windows
- Uses user configured image repository for Windows pause image
The old flag name doesn't make sense with the renamed API Priority and
Fairness feature, and it's still safe to change the flag since it hasn't done
anything useful in a released k8s version yet.
After the shift for init phases, GetStaticPodSpecs() from
app/phases/controlplane/manifests.go gets called on each control-plane
component sub-phase. This ends up calling the Printf from
AddExtraHostPathMounts() in app/phases/controlplane/volumes.go
multiple times printing the same volumes for different components.
- Remove the Printf call from AddExtraHostPathMounts().
- Print all volumes for a component in CreateStaticPodFiles() using klog
V(2).
Perhaps in the future a bigger refactor is needed here were a
single control-plane component spec can be requested instead of a
map[string]v1.Pod.
The selected key type is defined by kubeadm's --feature-gates option:
if it contains PublicKeysECDSA=true then ECDSA keys will be generated
and used.
By default RSA keys are used still.
Signed-off-by: Dmitry Rozhkov <dmitry.rozhkov@linux.intel.com>
When doing the very first upgrade from a cluster that contains the
source of truth in the ClusterStatus struct, the new kubeadm logic
will try to retrieve this information from annotations.
This changeset adds to both etcd and apiserver endpoint retrieval the
special case in which they won't retry if we are in such cases. The
logic will retry if we find any unknown error, but will not retry in
the following cases:
- etcd annotations do not contain etcd endpoints, but the overall list
of etcd pods is greater than 0. This means that we listed at least
one etcd pod, but they are missing the annotation.
- API server annotation is not found on the api server pod for a given
node name, but no errors aside from that one were found. This means
that the API server pod is present, but is missing the annotation.
In both cases there is no point in retrying, and so, this speeds up the
upgrade path when coming from a previous existing cluster.
While `ClusterStatus` will be maintained and uploaded, it won't be
used by the internal `kubeadm` logic in order to determine the etcd
endpoints anymore.
The only exception is during the first upgrade cycle (`kubeadm upgrade
apply`, `kubeadm upgrade node`), in which we will fallback to the
ClusterStatus to let the upgrade path add the required annotations to
the newly created static pods.
While kubeadm does not support CA rotation,
the users might still attempt to perform this manually.
For kubeconfig files, updating to a new CA is not reflected
and users need to embed new CA PEM manually.
On kubeconfig cert renewal, always keep the embedded CA
in sync with the one on disk.
Includes a couple of typo fixes.
- Add handlers for service account issuer metadata.
- Add option to manually override JWKS URI.
- Add unit and integration tests.
- Add a separate ServiceAccountIssuerDiscovery feature gate.
Additional notes:
- If not explicitly overridden, the JWKS URI will be based on
the API server's external address and port.
- The metadata server is configured with the validating key set rather
than the signing key set. This allows for key rotation because tokens
can still be validated by the keys exposed in the JWKs URL, even if the
signing key has been rotated (note this may still be a short window if
tokens have short lifetimes).
- The trust model of OIDC discovery requires that the relying party
fetch the issuer metadata via HTTPS; the trust of the issuer metadata
comes from the server presenting a TLS certificate with a trust chain
back to the from the relying party's root(s) of trust. For tests, we use
a local issuer (https://kubernetes.default.svc) for the certificate
so that workloads within the cluster can authenticate it when fetching
OIDC metadata. An API server cannot validly claim https://kubernetes.io,
but within the cluster, it is the authority for kubernetes.default.svc,
according to the in-cluster config.
Co-authored-by: Michael Taufen <mtaufen@google.com>
It turns out that the dual-stack feature enabled doesn't mean that
the cluster MUST be dual-stack, it only indicates that it MAY be
dual-stack but CAN be single-stack.
We should relax the validation to allow single-stack clusters
with dual-stack enabled.
If a Node name in the cluster is already taken and this Node is Ready,
prevent TLS bootsrap on "kubeadm join" and exit early.
This change requires that a new ClusterRole is granted to the
"system:bootstrappers:kubeadm:default-node-token" group to be
able get Nodes in the cluster. The same group already has access
to obtain objects such as the KubeletConfiguration and kubeadm's
ClusterConfiguration.
The motivation of this change is to prevent undefined behavior
and the potential control-plane breakdown if such a cluster
is racing to have two nodes with the same name for long periods
of time.
The following values are validated in the following precedence
from lower to higher:
- actual hostname
- NodeRegistration.Name (or "--node-name") from JoinConfiguration
- "--hostname-override" passed via kubeletExtraArgs
If the user decides to not let kubeadm know about a custom node name
and to instead override the hostname from a kubelet systemd unit file,
kubeadm will not be able to detect the problem.
- Extend the exponential backoff for add/remove/... retry to
11 steps ~=106 seconds. From experiments for 3 and more members
the race can take more that ~=26 seconds.
- Increase the dialTimeout for client creation to 40 seconds.
20 seconds seems racy for 3 and more members.
For the etcd client, amend AddMember() to handle a very
rare bug when multiple members can end up with the same
name. Match the member peer address and assign it the name of
the member we are adding. For the rest of the members with missing
names use their member IDs as name. The etcd node is not disrupted
by the unknown names.
The important aspects are:
- The number of members of the initial cluster must match
the members in the cluster.
- The member we are current adding is present in the initial cluster.
As part of #68522, Switching off the cAdvisor v1 Json API that we expose
directly. These include /stats/, /stats/container, /stats/{podName}/{containerName},
and /stats/{namespace}/{podName}/{uid}/{containerName}
The CoreDNS GA feature-gate in kubeadm was deprecated since 1.13.
The k8s policy is to remove the gate 2 releases after it transitions
to GA:
https://kubernetes.io/docs/reference/using-api/deprecation-policy/#deprecation
We kept it around for longer to prevent existing setups from breaking
as it caused minimal maintenance overhead.
This creates a new EndpointSliceProxying feature gate to cover EndpointSlice
consumption (kube-proxy) and allow the existing EndpointSlice feature gate to
focus on EndpointSlice production only. Along with that addition, this enables
the EndpointSlice feature gate by default, now only affecting the controller.
The rationale here is that it's really difficult to guarantee all EndpointSlices
are created in a cluster upgrade process before kube-proxy attempts to consume
them. Although masters are generally upgraded before nodes, and in most cases,
the controller would have enough time to create EndpointSlices before a new node
with kube-proxy spun up, there are plenty of edge cases where that might not be
the case. The primary limitation on EndpointSlice creation is the API rate limit
of 20QPS. In clusters with a lot of endpoints and/or with a lot of other API
requests, it could be difficult to create all the EndpointSlices before a new
node with kube-proxy targeting EndpointSlices spun up.
Separating this into 2 feature gates allows for a more gradual rollout with the
EndpointSlice controller being enabled by default in 1.18, and EndpointSlices
for kube-proxy being enabled by default in the next release.
The warning message
```
[config] WARNING: Ignored YAML document with GroupVersionKind ...
```
is printed for all GVKs that are not part of the kubeadm core types.
This is wrong as the component config types are supported and successfully
parsed and used despite the fact that the warning is printed for them too.
Hence this simple fix first checks if the group of the GVK is a supported
component config group and the warning is printed only if it's not.
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
kubeadm deploys the apiserver, controller-manager and the scheduler
using liveness probes.
The bind-address option is used to configure the probe address, in
case this is configured with an unspecified address, the probe
will fail. When using an unspecified address the probe host field is
left empty, otherwise the bind-address is used.
The function validateKubeConfig() can end up comparing
a user generated kubeconfig to a kubeconfig generated by kubeadm.
If a user kubeconfig has a CA that is base64 encoded with whitespace,
if said kubeconfig is loaded using clientcmd.LoadFromFile()
the CertificateAuthorityData bytes will be decoded from base64
and placed in the v1.Config raw. On the other hand a kubeconfig
generated by kubeadm will have the ca.crt parsed to a Certificate
object with whitespace ignored in the PEM input.
Make sure that validateKubeConfig() tolerates whitespace differences
when comparing CertificateAuthorityData.
kubeadm removed the deprecated "--address" flag for controller-manager
and scheduler in favor of "--bind-address"
We should use bind-address to configure the manifest probe addresses.
If the user has modified the kubelet.conf post TLS bootstrap
to become invalid, the function getNodeNameFromKubeletConfig() can
panic. This was observed to trigger in "kubeadm reset" use cases.
Add basic validation and unit tests around parsing the kubelet.conf
with the aforementioned function.
Currently if the controlplane fails to init, we print out a message
with some example commands that only show docker CLI.
This tries to improve that by printing the example commands for
docker, cri-o and containerd by checking the socket looking for
the default docker socket.
switch api helper functions to v1 CRD api
switch v1 CRD for apiserver internal
switch to v1 CRD for internal controllers
api storage/validation related changes
move local-defaulting utils private to prevent spreading
boilerplate
keep the subresource status/scale spec nil unless it's enabled
clean up empty space
CreateOrMutateConfigMap was not resilient when it was trying to Create
the ConfigMap. If this operation returned an unknown error the whole
operation would fail, because it was strict in what error it was
expecting right afterwards: if the error returned by the Create call
was a IsAlreadyExists error, it would work fine. However, if an
unexpected error (such as an EOF) happened, this call would fail.
We are seeing this error specially when running control plane node
joins in an automated fashion, where things happen at a relatively
high speed pace.
It was specially easy to reproduce with kind, with several control
plane instances. E.g.:
```
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
I1130 11:43:42.788952 887 round_trippers.go:443] POST https://172.17.0.2:6443/api/v1/namespaces/kube-system/configmaps?timeout=10s in 1013 milliseconds
Post https://172.17.0.2:6443/api/v1/namespaces/kube-system/configmaps?timeout=10s: unexpected EOF
unable to create ConfigMap
k8s.io/kubernetes/cmd/kubeadm/app/util/apiclient.CreateOrMutateConfigMap
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/util/apiclient/idempotency.go:65
```
This change makes this logic more resilient to unknown errors. It will
retry on the light of unknown errors until some of the expected error
happens: either `IsAlreadyExists`, in which case we will mutate the
ConfigMap, or no error, in which case the ConfigMap has been created.
kubeadm always use the IPv4 localhost address by defaultA for etcd
The probe hostname is obtained before the generation of the etcd
parameters, so it can't detect the right IP familiy for the
host of the probe.
This causes that with IPv6 clusters doesn't work because the probe
uses the IPv4 localhost address.
This patchs configures the right localhost address based on the used
AdvertiseAddress IP family.
- Use `schema.TypeMeta` instead of custom `struct` for VK
- More strict check on GVK after `Interpret` in `SplitYAMLDocuments`
- Adjust `Interpret` comment to include JSON
- Add retrieveValidatedConfigInfo to be able to better unit
test the function.
- Break some of the logic in RetrieveValidatedConfigInfo into
helper functions.
- Pass JoinConfiguration.Discovery to RetrieveValidatedConfigInfo
instead of JoinConfiguration.
- Use the discovery timeout per API call to fetch cluster-info
(optionally the user value can be slit in 2).
- Add detailed unit tests for retrieveValidatedConfigInfo.
kubeadm's current implementation of component config support is "kind" centric.
This has its downsides. Namely:
- Kind names and numbers can change between config versions.
Newer kinds can be ignored. Therefore, detection of a version change is
considerably harder.
- A component config can have only one kind that is managed by kubeadm.
Thus a more appropriate way to identify component configs is required.
Probably the best solution identified so far is a config group.
A group name is unlikely to change between versions, while the kind names and
structure can.
Tracking component configs by group name allows us to:
- Spot more easily config version changes and manage alternate versions.
- Support more than one kind in a config group/version.
- Abstract component configs by hiding their exact structure.
Hence, this change rips off the old kind based support for component configs
and replaces it with a group name based one. This also has the following
extra benefits:
- More tests were added.
- kubeadm now errors out if an unsupported version of a known component group
is used.
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
- Add a new preflight check for upgrade that runs the pause container
with -v in a Job.
- Wait for the Job to complete and return an error after N seconds.
- Manually clean the Job because we don't have the TTL controller
enabled in kubeadm yet (it's still alpha).
Currently this uses the combined kubelet output (stdout + stderr), but this
causes parsing issues if the kubelet logs something on stderr.
Thus we ignore the entire stderr and use stdout only.
We do disable a couple of tests here. That is because the fakeexecer only
supports combined output and return a "not supported" error if `.Output()`
gets invoked thus permanently failing those.
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
This patch removes pkg/util/mount completely, and replaces it with the
mount package now located at k8s.io/utils/mount. The code found at
k8s.io/utils/mount was moved there from pkg/util/mount, so the code is
identical, just no longer in-tree to k/k.
update tests
add comment
amend var name
update comment
add check for empty slice
fix tests
fix mask size in test
review feedback
add ipv4 and ipv6 flag for mask sizes
add to violation exception list
remove import alias
run update-openapi-spec
review feedback
run update-bazel
review feedback
review feedback
- Don't always print to stdout that the kubelet is starting.
instead delegate this to the callers of TryStartKubelet.
- Add a new root kubeadm init phase called "kubelet-finalize"
- Add a sub-phase to "kubelet-finalize"
called "experimental-cert-rotation"
- "cert-rotation" performs the following actions:
- tries to guess if kubelet client cert rotation is enabled
- update the kubelet.conf to use the rotatable cert/key
Changes scheduler.New so that algorithm source is moved from the
parameter to an option. The default algorithm source is source with the
DefaultProvider.
The PR introducing 5bb8069 got merged accidentally (the CI robot not
respecting a hold). Hence, the feedback to that PR is merged separately.
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
As the hyperkube image is itself deprecated and moved out of tree, its use with
kubeadm gets deprecated too. Hence, deprecation messages will be printed when
it is used.
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
ToClientSet() in kubeconfig.go creates a clientset from
the passed Config object (kubeconfig). For IP addresses
that are not reachable e.g. Get() calls for ConfigMaps
can block for a few minutes with the default timeout.
Modify the timeout to a shorter value by passing an override.
There are two writes yet only one read on a non-buffered channel that is
created locally and not passed anywhere else.
Therefore, it could leak one of its two spawned Goroutines if either:
* The provided `f` takes longer than an erroneous result from
`waiter.WaitForHealthyKubelet`, or;
* The provided `f` completes before an erroneous result from
`waiter.WaitForHealthyKubelet`.
The fix is to add a one-element buffer so that the channel write happens
for the second Goroutine in these cases, allowing it to finish and freeing
references to the now-buffered channel, letting it to be GC'd.
3993c42431 introduced the propagation of *_PROXY
host env. variables to the kube-proxy container.
To allow The NODE_NAME variable to be properly updated by the downward
API make, sure we preserve the existing variables when adding *_PROXY.
This change removes dependencies on the internal types of the kubelet and
kube-proxy component configs. Along with that defaulting and validation is
removed as well. kubeadm will display a warning, that it did not verify the
component config upon load.
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
As part of graduating the scheduler's component config to beta, we require configurable fields to be nilable pointers (see https://github.com/kubernetes/kubernetes/issues/78109). This enables the ability to distinguish between default and unset values. We are only applying this change to external types, and reacting in our defaulting logic. This also reverts existing internal component config fields which were pointers to be non-pointers, for consistency.
This implements a lenient path for decoding a kube-scheduler config file.
The config file gets decoded with a strict serializer first, if that fails a lenient
CodecFactory that has just v1alpha1 registered into it is used for decoding. The lenient
path is to be dropped when support for v1alpha1 is dropped.
For more information on the discussion see #82924 and the linked PRs.
Removed unneeded comments
Matched style from other PR's
Only print error when lenient decoding is successful
Update Bazel for BUILD
Comment out existing strict decoder tests
Added tests for leniant path
Added comments to explain test additions
Cleanup TODO's and tests
Add explicit newline for appended config
Checking if the path exists before creating the volume is
problematic because the path will be created regardless
after the initial call to "kubeadm init" and once the CM Pod
is running.
Then on subsequent calls to "kubeadm init" or the "control-plane"
phase the manifest for the CM will be different.
Always mount this path, but also consider the user provided
flag override from ClusterConfiguration.
Used strings instead of bytes in the TestTokenOutput test cases as
expected output is a plain text.
This should also simplify the data representation and the test code
a bit.
The flag has been problematic and abused by users.
While perhaps its original purpose was to be able to feed
a new version of the control-plane it also made it possible
to apply modifications to the ClusterConfiguration object
in the cluster. The lack of a feature in kubeadm for reconfiguration
of running clusters resulted in users using this flag for
the same purpose.
While it works for certain scenarios like updating
a static Pod for this control-plane only, it can result in
unexpected behavior if the user has for example fed a node name
different than the host name, when originally they created this node.
kubeadm 1.16 introduced the "kustomize" feature that
is a potential replacement for this user demand.
Add warning that this flag should not be used.
This field is not used in the kubeadm code. It was brought from
cli-runtime where it's used to support complex relationship between
command line parameters, which is not present in kubeadm.
Kube-proxy runs two different health servers; one for monitoring the
health of kube-proxy itself, and one for monitoring the health of
specific services. Rename them to "ProxierHealthServer" and
"ServiceHealthServer" to make this clearer, and do a bit of API
cleanup too.
This integration test allows us to detect if a given feature gate will
panic kubeadm. This builds on the assumption that a golang panic makes
the process exit with the code 2.
These tests are not trying to check if the init process succeeds or
not, their only purpose is to ensure that the exit code of the
`kubeadm init` invocation is not 2, thus, reflecting a golang panic.
Some refactors had to be made to the test code, so we return the exit
code along with stdout and stderr.
If the metrics ain't created, the values will not be registered, and the
metrics will not be visible in the metric endpoint.
Therefore move init of dynamic kubelet config below the startup of the
kubelet server (and the init of metrics).
Errors from staticcheck:
cmd/kube-scheduler/app/server.go:297:27: prometheus.Handler is deprecated: Please note the issues described in the doc comment of InstrumentHandler. You might want to consider using promhttp.Handler instead. (SA1019)
pkg/apis/scheduling/v1alpha1/defaults.go:27:6: func addDefaultingFuncs is unused (U1000)
pkg/apis/scheduling/v1beta1/defaults.go:27:6: func addDefaultingFuncs is unused (U1000)
test/e2e/scheduling/predicates.go:757:6: func verifyReplicasResult is unused (U1000)
test/e2e/scheduling/predicates.go:765:6: func getPodsByLabels is unused (U1000)
test/e2e/scheduling/predicates.go:772:6: func runAndKeepPodWithLabelAndGetNodeName is unused (U1000)
test/e2e/scheduling/limit_range.go:172:3: this value of pod is never used (SA4006)
test/e2e/scheduling/limit_range.go:177:3: this value of pod is never used (SA4006)
test/e2e/scheduling/limit_range.go:196:3: this value of pod is never used (SA4006)
test/e2e/scheduling/limit_range.go:201:3: this value of pod is never used (SA4006)
test/e2e/scheduling/limit_range.go:240:3: this value of pod is never used (SA4006)
test/e2e/scheduling/taints.go:428:13: this value of err is never used (SA4006)
test/e2e/scheduling/ubernetes_lite.go:219:2: this value of pods is never used (SA4006)
test/integration/scheduler/extender_test.go:78:4: this value of resp is never used (SA4006)
test/integration/volumescheduling/volume_binding_test.go:529:15: this result of append is never used, except maybe in other appends (SA4010)
test/integration/volumescheduling/volume_binding_test.go:538:15: this result of append is never used, except maybe in other appends (SA4010)
of service subnets.
Update DNS, Cert, dry-run logic to support list of Service CIDRs.
Added unit tests for GetKubernetesServiceCIDR and updated
GetDNSIP() unit test to inclue dual-sack cases.
The firewalld monitoring code was not well tested (and not easily
testable), would never be triggered on most platforms, and was only
being taken advantage of from one place (kube-proxy), which didn't
need it anyway since it already has its own resync loop.
Since the firewalld monitoring was the only consumer of pkg/util/dbus,
we can also now delete that.
Modify the warning log of kube-proxy when we run kube-proxy server
with --proxy-mode, but in the config file, we omit it. Then it logs
like ""{"log":"W0905 09:14:40.321571 1 server_others.go:249]
Flag proxy-mode=\"\" unknown, assuming iptables proxy\n","stream":"stderr",
"time":"2019-09-05T09:14:40.321858964Z"} This may lead to confusion. I
think it should me modefied.
Whenever kubeadm needs to fetch its configuration from the cluster, it gets
the component configuration of all supported components (currently only kubelet
and kube-proxy). However, kube-proxy is deemed an optional component and its
installation may be skipped (by skipping the addon/kube-proxy phase on init).
When kube-proxy's installation is skipped, its config map is not created and
all kubeadm operations, that fetch the config from the cluster, are bound to
fail with "not found" or "forbidden" (because of missing RBAC rules) errors.
To fix this issue, we have to ignore the 403 and 404 errors, returned on an
attempt to fetch kube-proxy's component config from the cluster.
The `GetFromKubeProxyConfigMap` function now supports returning nil for both
error and object to indicate just such a case.
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
This should fix a bug that could break masters when the EndpointSlice
feature gate was enabled. This was all tied to how the apiserver creates
and manages it's own services and endpoints (or in this case endpoint
slices). Consumers of endpoint slices also need to know about the
corresponding service. Previously we were trying to set an owner
reference here for this purpose, but that came with potential downsides
and increased complexity. This commit changes behavior of the apiserver
endpointslice integration to set the service name label instead of owner
references, and simplifies consumer logic to reference that (both are
set by the EndpointSlice controller).
Additionally, this should fix a bug with the EndpointSlice GenerateName
value that had previously been set with a "." as a suffix.
If konnectivity service is enabled, the etcd client will now use it.
This did require moving a few methods to break circular dependencies.
Factored in feedback from lavalamp and wenjiaswe.
This patch moves the HostUtil functionality from the util/mount package
to the volume/util/hostutil package.
All `*NewHostUtil*` calls are changed to return concrete types instead
of interfaces.
All callers are changed to use the `*NewHostUtil*` methods instead of
directly instantiating the concrete types.
go fmt
make func private
refactor config_test
Two primary refactorings:
1. config test checkPath method is now each a distinct test
run (which makes it easier to see what is actually failing)
2. TestNewWithDelegate's root path check now parses the json output and
does a comparison against a list of expected paths (no more whitespace
and ordering issues when updating this test, yay).
go fmt
modify and simplify existing integration test for readyz/livez
simplify integration test
set default rbac policy rules for livez
rename a few functions and the entrypoint command line argument (and etcetera)
simplify interface for installing readyz and livez and make auto-register completion a bootstrapped check
untangle some of the nested functions, restructure the code
The new etcd balancer (>3.3.14, 3.4.0) uses an asynchronous resolver for
endpoints. Without "WithBlock", the client may return before the
connection is up.
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
A recent commit added warnings for KubeletConfiguration and
KubeProxyConfiguration fields that kubeadm cares about and
does not recommend the user modifying them. Kubelet's
"rotateCertificates" cannot be handled using this function
as there is not way to figure out if the user has set it explicitly to
"false". Hardcode the value to "true" and add a comment about that.
Also apply the following changes to warnDefaultComponentConfigValue()
calls:
- use a local "kind" variable that defines the Kind we are warning about.
- fix wrong paths to fields.
- replace all stray calls of os.Exit() to util.CheckError() instead
- CheckError() now checks if the klog verbosity level is >=5
and shows a stack trace of the error
- don't call klog.Fatal in version.go
It seems undesirable that Kubernetes as a system should be
blocking a node if it's Linux kernel is way too new.
If such a problem even occurs we should exclude versions from
the list of supported versions instead of blocking users
from trying e.g. the latest 7.0.0-beta kernel because our
validators are not aware of this new version.
Usage of github.com/blang/semver is not needed and
k8s.io/apimachinery/pkg/util/version should be used instead
for semantic version parsing and version comparison.
Etcd v3.3.0 added the --listen-metrics-urls flag which allows specifying
addition URLs to the already present /health and /metrics endpoints.
While /health and /metrics are enabled for URLS defined with
--listen-client-urls (v3+ ?) they do require HTTPS.
Replace the present etcdctl based liveness probe with a standard HTTP
GET v1.Probe that connects to http://127.0.0.1:2381/health.
These endpoints are not reachable from the outside and only available
for localhost connections.
Got the proxy-server coming up in the master.
Added certs and have it comiung up with those certs.
Added a daemonset to run the network-agent.
Adding support for agent running as a sameon set on every node.
Added quick hack to test that proxy server/agent were correctly
tunneling traffic to the kubelet.
Added more WIP for reading network proxy configuration.
Get flags set correctly and fix connection services.
Adding missing ApplyTo
Added ConnectivityService.
Fixed build directives. Added connectivity service configuration.
Fixed log levels.
Fixed minor issues for feature turned off.
Fixed boilerplate and format.
Moved log dialer initialization earlier as per Liggits suggestion.
Fixed a few minor issues in the configuration for GCE.
Fixed scheme allocation
Adding unit test.
Added test for direct connectivity service.
Switching to injecting the Lookup method rather than using a Singleton.
First round of mikedaneses feedback.
Fixed deployment to use yaml and other changes suggested by MikeDanese.
Switched network proxy server/agent which are kebab-case not camelCase.
Picked up DIAL_RSP fix.
Factored in deads2k feedback.
Feedback from mikedanese
Factored in second round of feedback from David.
Fix path in verify.
Factored in anfernee's feedback.
First part of lavalamps feedback.
Factored in more changes from lavalamp and mikedanese.
Renamed network-proxy to konnectivity-server and konnectivity-agent.
Fixed tolerations and config file checking.
Added missing strptr
Finished lavalamps requested rename.
Disambiguating konnectivity service by renaming it egress selector.
Switched feature flag to KUBE_ENABLE_EGRESS_VIA_KONNECTIVITY_SERVICE
For file discovery, in case the user feeds a file for the CA
from the kubeconfig, make sure it's preloaded and embedded using
the new function EnsureCertificateAuthorityIsEmbedded().
This commit also applies cleanup:
- unroll validateKubeConfig() into ValidateConfigInfo() as this way
the default cluster can be re-used.
- in ValidateConfigInfo() reuse the variable config instead of creating
a new variable kubeconfig.
- make the Ensure* functions return descriptive errors instead of
wrapping the errors on the side of the callers.
Secure serving was already enabled for kube-controller-manager.
Do the same for kube-scheduler, by passing the flags
"authentication-kubeconfig" and "authorization-kubeconfig"
to the binary in the static Pod.
This change allows the scheduler to perform reviews on incoming
requests, such as:
- authentication.k8s.io/v1beta1 TokenReview
- authorization.k8s.io/v1 SubjectAccessReview
The authentication and authorization checks for "system:kube-scheduler"
users were previously enabled by PR 72491.
On local networks (such as the typical connection path between
control plane components) gzip compression increases CPU use and
end to end p99 latency rather than decreasing it. Disable compression
within the control plane components like a 1.15 cluster would be
configured.
Kube-proxy's iptables mode used to care whether utiliptables's
EnsureRule was able to use "iptables -C" or if it had to implement it
hackily using "iptables-save". But that became irrelevant when
kube-proxy was reimplemented using "iptables-restore", and no one ever
noticed. So remove that check.
- common_test.go: use constants.CurrentKubernetesVersion
- diff_test.go: write temporary files instead of using testdata.
this allows us to not have to bump kubernetesVersions in the
testdata files (now removed)
- policy_test.go: apply fixes to tests that were previously passing,
but a bump in constants.go breaks them. these tests now work
for any version.
ipvs `getProxyMode` test fails on mac as `utilipvs.GetRequiredIPVSMods`
try to reach `/proc/sys/kernel/osrelease` to find version of the running
linux kernel. Linux kernel version is used to determine the list of required
kernel modules for ipvs.
Logic to determine kernel version is moved to GetKernelVersion
method in LinuxKernelHandler which implements ipvs.KernelHandler.
Mock KernelHandler is used in the test cases.
Read and parse file is converted to go function instead of execing cut.
progagates the kubeadm FG to the individual k8scomponents
on the control-plane node.
* Note: Users who want to join worker nodes to the cluster
will have to specify the dual-stack FG to kubelet using the
nodeRegistration.kubeletExtraArgs option as part of their
join config. Alternatively, they can use KUBELET_EXTRA_ARGS.
kubeadm FG: kubernetes/kubeadm#1612
If empty "--kubernetes-version" is given (as it's not configurable now)
k8s.io/kubernetes/cmd/kubeadm/app/util/version.go.KubernetesReleaseVersion
will fetch the version from the internet.
But, this can fail:
% kubeadm init phase certs ca --cert-dir ...
unable to fetch file. URL: "https://dl.k8s.io/release/stable-1.txt", status: 502 Bad Gateway
failed to run commands: exit status 1
Can happen to other commands:
% kubeadm init phase kubeconfig controller-manager ...
% kubeadm init phase kubeconfig scheduler ...
This make "--kubernetes-version" configurable, so users can enable offline mode.
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
All controllers in controller-manager that deal with objects generically
work with those objects without needing the full object. Update the GC
and quota controller to use PartialObjectMetadata input objects which
is faster and more efficient.
The metadata client uses protobuf and returns only a subset of object
data (the metadata) which allows operations that act only on objects
generically to work much faster. Use the metadata client in the
namespace controller to reduce the amount of work the namespace controller
has to do in large namespaces.
When adding a new etcd member the etcd cluster can enter a state
of vote, where any new members added at the exact same time will
fail with an error right away.
Implement exponential backoff retry around the MemberAdd call.
This solves a kubeadm problem when concurrently joining
control-plane nodes with stacked etcd members.
From experiment, a few retries with milliseconds apart are
sufficient to achieve the concurrent join of a 3xCP cluster.
Apply the same backoff to MemberRemove in case the concurrent
removal of members fails for similar reasons.
Nit: remove capitalization of preferred
Remove line from kubelet and add to separate PR for easier merge
nit: dependency added to separate PR
Add check to ensure strict policy cannot be set without feature gate enabled
Topology Manager runs "none" policy by default.
Added constants for policies and updated documentation.
libcni 0.7.0 caches ADD operation results and allows the runtime to
retrieve these from the cache. In case the user wants a different
cache directory than the defaul, plumb that through like we do
for --cni-bin-dir and --cni-conf-dir.
If the cluster has a PSP that blocks Pods from running as root
the DS that handles upgrade prepull will fail to create its Pods.
Workaround that by adding a PodSecurityContext with RunAsUser=999.
Instead of creating a Docker client and fetching an Info object
from the docker enpoint, call the "docker info" command
and populate a local dockerInfo struct from JSON output.
Also
- add unit tests.
- update import boss and bazel.
This change affects "test/e2e_node/e2e_node_suite_test.go"
as it consumes this Docker validator by calling
"system.ValidateSpec()".
Since we never use the cobras "SilenceErrors" or "SilenceUsage",
a command executed with "cmd.Execute()" will never return an error
without printing it.
The current behavior results in all error messages being printed twice:
Example:
$ kubectl abc
Error: unknown command "abc" for "kubectl"
Run 'kubectl --help' for usage.
unknown command "abc" for "kubectl"
This applies to all cli commands using Cobra. To verify, follow the code
path of the Execute function:
https://github.com/spf13/cobra/blob/c439c4fa0937/command.go#L793
Signed-off-by: Odin Ugedal <odin@ugedal.com>
- comment out Liz and Chuck until further notice.
Feel free to come back to kubeadm!!
- Add SataQiu as reviewer. Welcome.
- Add ereslibre as approver. Congrats!
MarshalClusterConfigurationToBytes has capabilities to output the component
configs, as separate YAML documents, besides the kubeadm ClusterConfiguration
kind. This is no longer necessary for the following reasons:
- All current use cases of this function require only the ClusterConfiguration.
- It will output component configs only if they are not the default ones. This
can produce undeterministic output and, thus, cause potential problems.
- There are only hacky ways to dump the ClusterConfiguration only (without the
component configs).
Hence, we simplify things by replacing the function with direct calls to the
underlaying MarshalToYamlForCodecs. Thus marshalling only ClusterConfiguration,
when needed.
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
Stop using //pkg/util/normalizer. Use local versions of LongDesc and Examples,
that do not require any external dependencies (other than the Go standard
library).
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
one test also proved it did not call the internet
but this was not fool proof as it did not return a string
and thus could be called with something expecting to fail.
updated the network calls to be package local so tests could pass their
own implementation. A public interface was not provided as it would not
be likely this would ever be needed or wanted.
When a kubeconfig file is read from disk it may lack the
propper mapping between contexts and clusters.
In such a case the kubeconfig phase backend will panic,
without throwing a sensible error.
Add nil checks for a couple of map operations in
validateKubeConfig().
add startup sequence duration and readyz endpoint
add rbac bootstrapping policy for readyz
add integration test around grace period and readyz
rename startup sequence duration flag
copy health checks to fields
rename health-check installed boolean, refactor clock injection logic
cleanup clock injection code
remove todo about poststarthook url registration from healthz
- Added scripts for update and verify
- golang AST code for scanning and fixing imports
- default regex allows it to run on just test/e2e.* file paths
- exclude verify-import-aliases.sh from running in CI jobs
Change-Id: I7f9c76f5525fb9a26ea2be60ea69356362957998
Co-Authored-By: Aaron Crickenberger <spiffxp@google.com>
This patch refactors pkg/util/mount to be more usable outside of
Kubernetes. This is done by refactoring mount.Interface to only contain
methods that are not K8s specific. Methods that are not relevant to
basic mount activities but still have OS-specific implementations are
now found in a mount.HostUtils interface.
This helper is used in tests and pulls in unnecessary dependency, which should
not be used if kubeadm is to move to staging.
Replace with direct use of the GroupResource type.
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
ValidateDNS1123Subdomain is a simple wrapper around IsDNS1123Subdomain, however
it's the only reason for us to pull k8s.io/kubernetes/pkg/apis/core/validation
as a dependency.
To avoid unnecessary dependencies, replace the use of ValidateDNS1123Subdomain
with IsDNS1123Subdomain.
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
RBAC construction helpers are part of the Kubernetes internal APIs. As such,
we cannot use them once we move to staging.
Hence, replace their use with manual RBAC rule construction.
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
During the control plane joins, sometimes the control plane returns an
expected error when trying to download the `kubeadm-config` ConfigMap.
This is a workaround for this issue until the root cause is completely
identified and fixed.
Ideally, this commit should be reverted in the near future.
kube-controller-manager.
If a service CIDR that overlaps with the cluster CIDR is
specified to kube-controller-manager then kube-controller-
manager will incorrectly allocate node CIDRs that overlap
with the service CIDR. The fix ensure that kubeadm
maps the --service-cidr to --service-cluster-ip-range for use
by kube-controller-manager.
As per docs, --allocate-node-cidrs must be true for
--service-cluster-ip-range to be considered. It does not make
sense for --cluster-cidr to be unspecified but for
--service-cluster-ip-range and --allocate-node-cidrs to be
set, since the purpose of these options is to have the
controller-manager do the per node CIDR allocation. Also
note that --service-cluster-ip-range is passed to the
api-server, so the presence of *just*
--service-cluster-ip-range should not imply that
--allocate-node-cidrs should be true.
Resolves: kubernetes/kubeadm/issues/1591
This fixes possible problems when kubeadm upgrade can't load the
InitConfig properly. Some new code introduced in
https://github.com/kubernetes/kubernetes/pull/75499 is placed between
the loading of the config and the error handling, hiding possible
errors.
This error cannot be ignored (as is the case now), since the cfg ptr.
returned from the configutil function will be nil in the case of an
error.
Signed-off-by: Odin Ugedal <odin@ugedal.com>
The existing logic already creates a proper "tree"
where a CA is always generated before the certs that are signed
by this CA, however the tree is not deterministic.
Always use the default list of certs when generating the
"kubeadm init phase certs" phases. Add a unit test that
makes sure that CA always precede signed certs in the default
lists.
This solves the problem where the help screen for "kubeadm
init" cert sub-phases can have a random order.
TestWatchBasedManager was racing with the default namespace creation.
To fix that flake and to ensure integration tests using a shared etcd
don't accidentally overlap in the future, move the three main tests
using the default namespace to separate namespaces, and have
TestWatchBasedManager create that namespace before it runs.
Make StartTestServer wait for default namespace creation, which will
reduce other flakes until future changes completely remove use of default
namespace.
From a failed integration run:
watch_manager_test.go:66: namespaces "default" not found
watch_manager_test.go:66: namespaces "default" not found
watch_manager_test.go:66: namespaces "default" not found
Similar to --token, do not allow the mixture of --config and
--certificate-key.
If the user has fed a config, it is expected that the certificate
key should also be provided in the config and not from
the command line.
Ever since v1alpha3, InitConfiguration is containing ClusterConfiguration
embedded in it. This was done to mimic the internal InitConfiguration, which in
turn is used throughout the kubeadm code base as if it is the old
MasterConfiguration of v1alpha2.
This, however, is confusing to users who vendor in kubeadm as the embedded
ClusterConfiguration inside InitConfiguration is not marshalled to YAML.
For this to happen, special care must be taken for the ClusterConfiguration
field to marshalled separately.
Thus, to make things smooth for users and to reduce third party exposure to
technical debt, this change removes ClusterConfiguration embedding from
InitConfiguration.
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
This commit contains only changes generated by the build process.
Nothing here was manually changed.
Changes made to:
```
cmd/kubeadm/app/apis/kubeadm/validation/BUILD
cmd/kubeadm/app/cmd/BUILD
```
were generated by running:
````
./hack/update-bazel.sh
```
update bazel build
fix get plugin config method
initialize only needed plugins
fix unit test
fix import duplicate package
update bazel
add docstrings
add weight field to plugin
add plugin to v1alpha1
add plugins at appropriate extension points
remove todo statement
fix import package file path
set plugin json schema
add plugin unit test to option
initial plugin in test integration
initialize only needed plugins
update bazel
rename func
change plugins needed logic
remove v1 alias
change the comment
fix alias shorter
remove blank line
change docstrings
fix map bool to struct
add some docstrings
add unreserve plugin
fix docstrings
move variable inside the for loop
make if else statement cleaner
remove plugin config from reserve plugin unit test
add plugin config and reduce unnecessary options for unit test
update bazel
fix race condition
fix permit plugin integration
change plugins to be pointer
change weight to int32
fix package alias
initial queue sort plugin
rename unreserve plugin
redesign plugin struct
update docstrings
check queue sort plugin amount
fix error message
fix condition
change plugin struct
add disabled plugin for unit test
fix docstrings
handle nil plugin set
currently sub phases cannot have custom arg validation and container commands can have args.
This removes phase container commands from taking args and enables custom args on the leaf phases
These are based on recommendation from
[staticcheck](http://staticcheck.io/).
- Removes dead type/function along with the import that the function
introduced.
- Removes unused struct fields.
- Removes select nested in a tight for loop, the select does not have a
default, so it will be blocking.
* fix duplicated imports of api/core/v1
* fix duplicated imports of client-go/kubernetes
* fix duplicated imports of rest code
* change import name to more reasonable
During the upgrade process, `kubeadm` will take the current
`ClusterConfiguration`, update the `KubernetesVersion` to the latest
version, and call to `UploadConfiguration`.
This change makes sure that when the mutation happens, not only the
`ClusterStatus` is mutated, but the `ClusterConfiguration` object
inside the `kubeadm-config` ConfigMap as well; it will contain the
new `KubernetesVersion`.
There are a couple of problems with regards to the `omitempty` in v1beta1:
- It is not applied to certain fields. This makes emitting YAML configuration
files in v1beta1 config format verbose by both kubeadm and third party Go
lang tools. Certain fields, that were never given an explicit value would
show up in the marshalled YAML document. This can cause confusion and even
misconfiguration.
- It can be used in inappropriate places. In this case it's used for fields,
that need to be always serialized. The only one such field at the moment is
`NodeRegistrationOptions.Taints`. If the `Taints` field is nil, then it's
defaulted to a slice containing a single control plane node taint. If it's
an empty slice, no taints are applied, thus, the cluster behaves differently.
With that in mind, a Go program, that uses v1beta1 with `omitempty` on the
`Taints` field has no way to specify an explicit empty slice of taints, as
this would get lost after marshalling to YAML.
To fix these issues the following is done in this change:
- A whole bunch of additional omitemptys are placed at many fields in v1beta2.
- `omitempty` is removed from `NodeRegistrationOptions.Taints`
- A test, that verifies the ability to specify empty slice value for `Taints`
is included.
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
This change introduces config fields to the v1beta2 format, that allow
certificate key to be specified in the config file. This certificate key is a
hex encoded AES key, that is used to encrypt certificates and keys, needed for
secondary control plane nodes to join. The same key is used for the decryption
during control plane join.
It is important to note, that this key is never uploaded to the cluster. It can
only be specified on either command line or the config file.
The new fields can be used like so:
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
certificateKey: "yourSecretHere"
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: JoinConfiguration
controlPlane:
certificateKey: "yourSecretHere"
---
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
Even though CreateServiceAccountKeyAndPublicKeyFiles() function is
an interface function it's not unittested. Instead it wraps a couple
of internal functions which are used only inside CreateServiceAccountKeyAndPublicKeyFiles()
and those internal functions are tested.
Rewrite the function to do only what it's intended to do and add unit
tests for it.
These are based on recommendation from
[staticcheck](http://staticcheck.io/).
- Remove unused struct fields
- Remove unused function
- Remove unused variables
- Remove unused constants.
- Miscellaneous cleanups