Once we patch a kubelet configuration file, the patched output
is in JSON. Make sure it is converted back to YAML, since
the kubelet config in the cluster and on disk is always YAML.
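A minimal sketch of that conversion step, assuming the sigs.k8s.io/yaml helpers already vendored in the tree (the function name is illustrative):
```
package kubelet

import "sigs.k8s.io/yaml"

// toYAML converts the JSON produced by the patch manager back to YAML,
// since the kubelet config on disk and in the cluster is always YAML.
func toYAML(patchedJSON []byte) ([]byte, error) {
	return yaml.JSONToYAML(patchedJSON)
}
```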
Add unit test for the new function applyKubeletConfigPatches()
In phases/kubelet/WriteConfigToDisk() create a patch
manager for the root patches directory and apply
the user patches with a target "kubeletconfiguration".
With phases/kubelet/WriteConfigToDisk() about to support patches
it must accept an io.Writer that the PatchManager can
write its output to, as well as a patches directory.
Modify all call sites of the function WriteConfigToDisk()
to properly prepare and pass an io.Writer and patches dir to it.
This results in command phases for init/join/upgrade to pass
the root io.Writer (usually stdout) and the patchesDir populated
either via the config file or --patches flag.
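A rough sketch of the resulting call shape; the names, signature, and stubbed helper are illustrative rather than the exact kubeadm API:
```
package kubelet

import (
	"io"
	"os"
	"path/filepath"
)

// WriteConfigToDisk writes the kubelet config, applying user patches
// from patchesDir first; PatchManager progress goes to output.
func WriteConfigToDisk(configBytes []byte, kubeletDir, patchesDir string, output io.Writer) error {
	if patchesDir != "" {
		var err error
		// applyKubeletConfigPatches targets "kubeletconfiguration" and
		// converts the patched JSON back to YAML before returning.
		configBytes, err = applyKubeletConfigPatches(configBytes, patchesDir, output)
		if err != nil {
			return err
		}
	}
	return os.WriteFile(filepath.Join(kubeletDir, "config.yaml"), configBytes, 0644)
}

// applyKubeletConfigPatches is sketched elsewhere in this series; this
// stub only keeps the example self-contained.
func applyKubeletConfigPatches(data []byte, patchesDir string, output io.Writer) ([]byte, error) {
	return data, nil
}
```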
If the user runs "kubeadm upgrade apply", kubeadm can download
a configuration from the cluster. If the configuration contains
the legacy default imageRepository of "k8s.gcr.io", mutate it
to the new default of "registry.k8s.io" and update the
configuration in the config map.
During "upgrade node/diff" download the configuration, mutate the
image repository locally, but do not mutate the in-cluster value.
That is done only on "apply".
This ensures that users are migrated from the old default registry
domain.
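The mutation itself is tiny; a hedged sketch (helper name illustrative):
```
package upgrade

// mutateImageRepository rewrites only the legacy default; custom
// user-set repositories are left untouched.
func mutateImageRepository(repo string) string {
	if repo == "k8s.gcr.io" {
		return "registry.k8s.io"
	}
	return repo
}
```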
- lock the feature gate to true by default
- cleanup wrappers and logic related to versioned vs unversioned
naming of API objects (CMs and RBAC)
- update unit tests
The OldControlPlaneTaint taint (master) can be replaced
with the new ControlPlaneTaint (control-plane) taint.
Adapt unit tests in markcontrolplane_test.go
and cluster_test.go.
- iniconfiguration.go: stop applying the "master" taint
for new clusters; update related unit tests in _test.go
- apply.go: Remove logic related to cleanup of the "master" label
during upgrade
- apply.go: Add cleanup of the "master" taint on CP nodes
during upgrade
- controlplane_nodes_test.go: remove test for old "master" taint
on nodes (this needs backport to 1.24, because we have a kubeadm
1.25 vs kubernetes test suite 1.24 e2e test)
Use the etcd 3.5.3+ HTTP(S) endpoint "/health?serializable=true",
to allow the kubelet liveness and startup probes in the
kubeadm generated etcd.yaml (static Pod) to track
individual member health instead of tracking the whole
etcd cluster health.
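A sketch of the probe kubeadm would render, assuming k8s.io/api/core/v1; the port (etcd's metrics listener) and scheme are illustrative:
```
package etcd

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// etcdProbe mirrors the probe rendered into the etcd.yaml static Pod.
func etcdProbe() *v1.Probe {
	return &v1.Probe{
		ProbeHandler: v1.ProbeHandler{
			HTTPGet: &v1.HTTPGetAction{
				// serializable=true reports only this member's health,
				// not the health of the whole etcd cluster.
				Path:   "/health?serializable=true",
				Port:   intstr.FromInt(2381),
				Scheme: v1.URISchemeHTTP,
			},
		},
	}
}
```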
Given kubeadm 1.25 only supports kubelet 1.25 and 1.24,
1.23 related logic around dockershim can be removed.
- Don't clean the directories
/var/lib/dockershim, /var/run/kubernetes, /var/lib/cni
- Pass the CRISocket directly to the kubelet
--container-runtime-endpoint flag without extra handling
of dockershim
- No longer apply the --container-runtime=remote flag
as that is the only possible value in 1.24 and 1.25
- Update unit tests
Note: we are still passing --pod-infra-container-image
to avoid the pause image being GCed by the kubelet.
During upgrade when a CP node is missing the old / legacy "master"
taint, assume the user has manually removed it to allow
workloads to schedule.
In such cases do not re-taint the node with the new "control-plane"
taint.
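A sketch of that decision, using the well-known taint keys (helper name illustrative):
```
package upgrade

import v1 "k8s.io/api/core/v1"

// shouldAddControlPlaneTaint sketches the upgrade-time decision.
func shouldAddControlPlaneTaint(node *v1.Node) bool {
	for _, t := range node.Spec.Taints {
		if t.Key == "node-role.kubernetes.io/master" {
			// The old taint is still present: adding the new
			// "node-role.kubernetes.io/control-plane" taint is safe.
			return true
		}
	}
	// The old taint was removed by the user to allow workloads to
	// schedule; do not re-taint the node.
	return false
}
```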
* Introduce networking/v1alpha1 api, ClusterCIDRConfig type
Introduce networking/v1alpha1 api group.
Add the `ClusterCIDRConfig` type to the networking/v1alpha1 api group. This type
will enable the NodeIPAM controller to support multiple ClusterCIDRs.
* Change ClusterCIDRConfig.NodeSelector type in api
* Fix review comments for API
* Update ClusterCIDRConfig API Spec
Introduce PerNodeHostBits field, remove PerNodeMaskSize
Over time the size of our junit xml has exploded to the point where
test-grid fails to process them. We still have the original/full
*.stdout files from which the junit xml files are generated, so the
junit xml files need NOT have the full/exact output for
processing/display. So let us prune the large messages with an
indicator that we have "[... clipped...]" some of the content so folks
can see that they have to consult the full *.stdout files.
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
InitLogs overrides the klog default and turns contextual logging off. This
ensures that it is only enabled in Kubernetes commands that explicitly enable
it via a feature gate. A feature gate for it gets defined in
k8s.io/component-base/logs and is then used by Options.ValidateAndApply.
The effect of disabling contextual logging is very limited according to
benchmarks with kube-scheduler. The feature gets added anyway to satisfy the
PRR recommendation that features should be controllable.
The following commands have support for contextual logging:
- kube-apiserver
- kube-controller-manager
- kubelet
- kube-scheduler
- component-base/logs example
Supporting a feature gate check in ValidateAndApply and not in InitLogs is a
simplification: changing InitLogs to accept a FeatureGate would have implied
changing also component-base/cli.Run. This didn't seem worthwhile because
ValidateAndApply already covers the relevant commands.
Include the flag "--experimental-initial-corrupt-check"
in etcd static pod manifests to ensure
etcd member data consistency.
The etcd feature is planned for graduation in 3.6,
at which point we should switch to using the flag
without the "experimental" prefix.
This commit adds the framework for the new local detection
modes BridgeInterface and InterfaceNamePrefix to work.
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Some of these changes are cosmetic (repeatedly calling klog.V instead of
reusing the result), others address real issues:
- Logging a message only above a certain verbosity threshold without
recording that verbosity level (if klog.V().Enabled() { klog.Info... }):
this matters when using a logging backend which records the verbosity
level.
- Passing a format string with parameters to a logging function that
doesn't do string formatting.
All of these locations were found by the enhanced logcheck tool from
https://github.com/kubernetes/klog/pull/297.
In some cases it reports false positives, but those can be suppressed with
source code comments.
We now re-use the crictl tool path within the `ContainerRuntime` when
exec'ing into it. This allows introducing a convenience function to
create the crictl command and re-use it where necessary.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
This change adds 2 options for windows:
--forward-healthcheck-vip: If true forward service VIP for health check
port
--root-hnsendpoint-name: The name of the hns endpoint for the root
namespace attached to l2bridge; the default is cbr0
When --forward-healthcheck-vip is set as true and winkernel is used,
kube-proxy will add an hns load balancer to forward health check request
that was sent to lb_vip:healthcheck_port to the node_ip:healthcheck_port.
Without this forwarding, the health check from google load balancer will
fail, and it will stop forwarding traffic to the windows node.
This change fixes the following 2 cases for service:
- `externalTrafficPolicy: Cluster` (default option): healthcheck_port is
10256 for all services. Without this fix, traffic won't be forwarded
directly to the windows node. It will always go through a linux node and
get forwarded to windows from there.
- `externalTrafficPolicy: Local`: a different healthcheck_port for each
service that is configured as local. Without this fix, this feature
won't work on windows nodes at all. This feature preserves the client ip
when connecting to an application running in a windows pod.
Change-Id: If4513e72900101ef70d86b91155e56a1f8c79719
* kube-proxy cluster-cidr arg accepts comma-separated list
It is possible in dual-stack clusters to provide kube-proxy with
a comma-separated list with an IPv4 and IPv6 CIDR for pods.
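A sketch of the parsing, with an illustrative helper name:
```
package proxy

import (
	"net"
	"strings"
)

// parseClusterCIDRs splits a dual-stack --cluster-cidr value such as
// "10.244.0.0/16,fd00::/64".
func parseClusterCIDRs(flagValue string) ([]*net.IPNet, error) {
	var cidrs []*net.IPNet
	for _, s := range strings.Split(flagValue, ",") {
		_, cidr, err := net.ParseCIDR(strings.TrimSpace(s))
		if err != nil {
			return nil, err
		}
		cidrs = append(cidrs, cidr)
	}
	return cidrs, nil
}
```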
Signed-off-by: Tyler Lloyd <Tyler.Lloyd@microsoft.com>
Signed-off-by: Tyler Lloyd <tylerlloyd928@gmail.com>
* Updating cluster-cidr comment description
Signed-off-by: Tyler Lloyd <tyler.lloyd@microsoft.com>
For the YAML examples, make the indentation consistent
by starting with a space and following with a TAB.
Also adjust the indentation of some fields to place them under
the right YAML field parent - e.g. ignorePreflightErrors
is under nodeRegistration.
All the controllers should use a context for signalling termination of communication with the API server. Once KCM cancels the context, all the cert controllers started via KCM should cancel the in-flight API server requests instead of hanging around.
This commit includes all the changes needed for the API server. Instead of modifying the existing signatures of the methods which either generate or return a stop channel, we generate a context from the channel and pass the generated context to the controllers started in the API server. This ensures we don't have to touch API server dependencies.
- Modify VerifyUnmarshalStrict to use serializer/json instead
of sigs.k8s.io/yaml. In strict mode, the serializers
in serializer/json use the new sigs.k8s.io/json library
that also catches case sensitive errors for field names -
e.g. foo vs Foo. Include test case for that in strict/testdata.
- Move the hardcoded schemes to check to the side of the
caller - i.e. accept a slice of runtime.Scheme.
- Move the klog warnings outside of VerifyUnmarshalStrict
and make them the responsibility of the caller.
- Call VerifyUnmarshalStrict when downloading the configuration
from kubeadm-config or the kube-proxy or kubelet-config CMs.
This validation is useful if the user has manually patched the CMs.
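A sketch of the strict decoding described in the first bullet, assuming a scheme with the relevant types registered; the real VerifyUnmarshalStrict takes a slice of schemes from the caller:
```
package strict

import (
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/runtime/schema"
	serializerjson "k8s.io/apimachinery/pkg/runtime/serializer/json"
)

// verifyStrict decodes data in strict mode so that unknown, duplicate
// and wrongly-cased fields (foo vs Foo) surface as errors.
func verifyStrict(scheme *runtime.Scheme, gvk schema.GroupVersionKind, data []byte) error {
	s := serializerjson.NewSerializerWithOptions(
		serializerjson.DefaultMetaFactory, scheme, scheme,
		serializerjson.SerializerOptions{Yaml: true, Strict: true},
	)
	obj, err := scheme.New(gvk)
	if err != nil {
		return err
	}
	_, _, err = s.Decode(data, &gvk, obj)
	return err
}
```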
The apiserver owns and manages the kubernetes.default service.
It has 3 different options to reconcile the endpoints that belong to
that service:
- None: endpoints are handled by an external party.
- MasterCount: legacy, it reconciles based on the endpoints generated
and a flag specifying the number of masters in the cluster.
- Lease: default since 1.11, each apiserver writes a lease in etcd
and renews periodically, the endpoints are generated based on the
existing leases.
It seems that when the default was set for the lease reconciler, the
controlplane code wasn't updated and kept using the master count
reconciler.
This also starts the deprecation of the master count reconciler in
favor of the lease reconciler.
The legacy naming "kubelet-config-x.yy" is no longer the
default behavior. Rename instances in documentation and comments
of "kubelet-config-x.yy" to "kubelet-config".
- Graduate the feature gate to Beta and enable it by default.
- Pre-set the default value for UnversionedKubeletConfigMap
to "true" in test/e2e_kubeadm.
- Fix a couple of typos in "tolerate" introduced in the PR that
added the FG in 1.23.
Comparing two pointers will always show that they are different values,
so the warning message will always be printed.
Signed-off-by: Dave Chen <dave.chen@arm.com>
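The bug, distilled into a runnable example:
```
package main

import "fmt"

func main() {
	a, b := "foo", "foo"
	pa, pb := &a, &b
	fmt.Println(pa == pb)   // false: distinct pointers, so the warning always fired
	fmt.Println(*pa == *pb) // true: comparing the pointed-to values is the fix
}
```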
- During "upgrade apply" call a new function AddNewControlPlaneTaint()
that finds all nodes with the new "control-plane" node-role label
and adds the new "control-plane" taint to them.
- The function is called in "apply" and is separate from
the step to remove the old "master" label for better debugging
if errors occur.
- Apply "control-plane" taint during init/join by adding the
taint in SetNodeRegistrationDynamicDefaults(). The old
taint "master" is still applied.
- Clarify API docs (v1beta2 and v1beta3) for nodeRegistration.Taint
to not mention "master" taint and be more generic. Remove
example for taints that includes the word "master".
- Update unit tests.
- Update the markcontrolplane phase used by init and join to
only label the nodes with the new control plane label.
- Cleanup TODOs about the old label.
- Remove outdated comment about selfhosting in staticpod/utils.go.
Selfhosting has not been supported in kubeadm for a while
and the comment also mentions the "master" label.
- Update unit tests.
- Rename the function in postupgrade.go to better reflect
what is being done.
- During "upgrade apply" find all nodes with the old label
and remove it by calling PatchNode.
- Update health check for CP nodes to not track "master"
labeled nodes. At this point all CP nodes should have
"control-plane" and we can use that selector only.
- Throw an error if there is more than one known socket on the host.
- Remove the special handling for docker+containerd.
- Remove the local instances of constants for endpoints for
Windows / Unix and use the defaultKnownCRISockets variable
which is populated from OS specific constants.
- Update error message in detectCRISocketImpl to have more
details.
- Make detectCRISocketImpl accept a list of "known" sockets
- Update unit tests for detectCRISocketImpl and make them
use generic paths such as "unix:///foo/bar.sock".
Change the default container runtime CRI socket endpoint to the
one of containerd. Previously it was the one for Docker.
- Rename constants.DefaultDockerCRISocket to DefaultCRISocket
- Make the constants files include the endpoints for all supported
container runtimes for Unix/Windows.
- Update unit tests related to docker runtime testing.
- In kubelet/flags.go hardcode the legacy docker socket as a check
to allow kubeadm 1.24 to run against kubelet 1.23 if the user
explicitly sets the criSocket field to "npipe:////./pipe/dockershim"
on Windows or "unix:///var/run/dockershim.sock" on Linux.
In the following code pattern, the log message will get logged with v=0 in JSON
output although conceptually it has a higher verbosity:
    if klog.V(5).Enabled() {
        klog.Info("hello world")
    }
Having the actual verbosity in the JSON output is relevant, for example for
filtering out only the important info messages. The solution is to use
klog.V(5).Info or something similar.
Whether the outer if is necessary at all depends on how complex the parameters
are. The return value of klog.V can be captured in a variable and be used
multiple times to avoid the overhead for that function call and to avoid
repeating the verbosity level.
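The fixed patterns, sketched with klog/v2 (the expensiveDetails helper is illustrative):
```
package example

import "k8s.io/klog/v2"

func logAtV5() {
	// Record the verbosity together with the message.
	klog.V(5).Info("hello world")

	// When parameters are expensive, capture the result of klog.V once
	// and reuse it, instead of calling klog.V repeatedly.
	if loggerV := klog.V(5); loggerV.Enabled() {
		loggerV.InfoS("hello world", "details", expensiveDetails())
	}
}

func expensiveDetails() string { return "..." }
```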
The API was deprecated in 1.23 when output/v1alpha2 was
added. v1alpha1 is problematic since it embeds kubeadm/v1beta2
BootstrapToken related types directly. v1alpha2 imports
a new group dedicated to bootstrap tokens apis/bootstraptoken.
cli.Run was an attempt to eliminate error handling in Kubernetes
commands. However, it had to rely on heuristics that are not necessarily right
for all commands.
kubectl is one example which has its own error printing code that should be
used in all cases after a command failure. It now gets used also for
`--warnings-as-errors`. Previously, that caused the following message to be
logged at the end:
E0110 16:56:01.987555 202060 run.go:120] "command failed" err="1 warning received"
Now it ends with:
error: 1 warning received
crictl already works with the current state of dockershim.
Using the docker CLI is not required and the DockerRuntime
can be removed from kubeadm. This means that crictl
can connect to the dockershim (or cri-dockerd) socket and
be used to list containers, pull images, remove containers, and
all actions that the kubelet can otherwise perform with the socket.
Ensure that crictl is now required for all supported container runtimes
in checks.go. In the help text in waitcontrolplane.go show only
the crictl example.
Remove the check for the docker service from checks.go.
Remove the DockerValidor check from checks.go.
These two checks were special casing Docker as CR and compensating
for the lack of the same checks in dockershim. With the
extraction of dockershim to cri-dockerd, ideally cri-dockerd
should perform the required checks whether it can support
a given Docker config / version running on a host.
During "upgrade node" and "upgrade apply" read the
kubelet env file from /var/lib/kubelet/kubeadm-flags.env,
patch the --container-runtime-endpoint flag value to
have the appropriate URL scheme prefix (e.g. unix:// on Linux)
and write the file back to disk.
This is a temporary workaround that should be kept only for 1 release
cycle - i.e. remove this in 1.25.
The CRI socket that kubeadm writes as an annotation
on a particular Node object can include an endpoint that
does not have a URL scheme. This is undesired, as long term
the kubelet may stop allowing endpoints without a URL scheme.
For control plane nodes "kubeadm upgrade apply" takes
the locally defaulted / populated NodeRegistration and refreshes
the CRI socket in PerformPostUpgradeTasks. But for secondary
nodes "kubeadm upgrade node" does not.
Adapt "upgrade node" to fetch the NodeRegistration for this node
and fix the CRI socket missing URL scheme if needed in the Node
annotation.
- Update defaults for v1beta2 and 3 to have URL scheme
- Rename DefaultUrlScheme to DefaultContainerRuntimeURLScheme
- Prepend a missing URL scheme to user sockets and warn that this
might not be supported in the future (see the sketch after this list)
- Update socket validation to exclude IsAbs() testing
(This is broken on Windows). Assume the path is not empty and has
URL scheme at this point (validation happens after defaulting).
- Use net.Dial to open Unix sockets
- Update all related unit tests
Signed-off-by: pacoxu <paco.xu@daocloud.io>
Signed-off-by: Lubomir I. Ivanov <lubomirivanov@vmware.com>
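A sketch of the scheme defaulting mentioned above; the platform handling ("npipe://" on Windows) and warning text are illustrative:
```
package defaults

import (
	"strings"

	"k8s.io/klog/v2"
)

// prepareCRISocket prepends the Unix URL scheme when it is missing.
func prepareCRISocket(socket string) string {
	if strings.Contains(socket, "://") {
		return socket
	}
	klog.Warningf("the CRI socket %q does not have a URL scheme; defaulting to %q. "+
		"Schemeless endpoints may not be supported in the future", socket, "unix://"+socket)
	return "unix://" + socket
}
```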
Currently when the dockershim socket is used, kubeadm passes only
the --network-plugin=cni flag to the kubelet and assumes the built-in
dockershim. This is valid for versions <1.24, but with dockershim
and related flags removed the kubelet will fail.
Use preflight.GetKubeletVersion() to find the version of the host
kubelet and if the version is <1.24 assume that it has built-in
dockershim. Newer versions will be treated as "remote" even
if the socket is for dockershim, for example, provided by cri-dockerd.
Update related unit tests.
Since we removed dockershim we now rely on both flags, which therefore
should not be marked experimental any more.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
Apply a small fix to ensure the kubeconfig files
that kubeadm manages have a CA when printed in the table
of the "check expiration" command. "CAName" is the field used for that.
In practice kubeconfig files can contain multiple credentials
from different CAs, but this is not supported by kubeadm and there
is a single cluster CA that signs the single client cert/key
in kubeadm managed kubeconfigs.
In case stacked etcd is used, the code that does expiration checks
does not validate if the etcd CA is "external" (missing key)
and if the etcd CA signed certificates are valid.
Add a new function UsingExternalEtcdCA() similar to existing functions
for the cluster CA and front-proxy CA, that performs the checks for
missing etcd CA key and certificate validity.
This function only runs for stacked etcd, since if etcd is external
kubeadm does not track any certs signed by that etcd CA.
This fixes a bug where the etcd CA will be reported as local even
if the etcd/ca.key is missing during "certs check-expiration".
When the "kubeadm certs check-expiration" command is used and
if the ca.key is not present, regular on disk certificate reads
pass fine, but fail for kubeconfig files. The reason for the
failure is that reading of kubeconfig files currently
requires reading both the CA key and cert from disk. Reading the CA
is done to ensure that the CA cert in the kubeconfig is not out of date
during renewal.
Instead of requiring both a CA key and cert to be read, only read
the CA cert from disk, as only the cert is needed for kubeconfig files.
This fixes printing the cert expiration table even if the ca.key
is missing on a host (i.e. the CA is considered external).
If done too soon, the klog.V() calls are ignored because the log verbosity
isn't set. In Kubernetes 1.22, the verbosity was set, but not the logging
format.
Don't use a custom dialer for the kubelet if it is not rotating
certificates, so we can reuse TCP connections; the custom dialer
is not needed in that case.
Kubelet needs to be able to recover from stale http connections.
HTTP2 has a mechanism to detect broken connections by sending periodical pings.
HTTP1 can only have one persistent connection, and it will close all idle connections
once the Kubelet heartbeat fails. However, since there are many edge cases that we can't
control, users can still opt-in to the previous behavior for closing the connections by
setting the environment variable DISABLE_HTTP2.
kubeam:node-proxier -> kubeadm:node-proxier
This causes e2e test failures:
"[area-kubeadm] proxy addon kube-proxy ServiceAccount should
be bound to the system:node-proxier cluster role"
in:
- kubeadm-kinder-latest
- kubeadm-kinder-latest-on-...
- other tests
Before this commit when setting bindAddress to 1.2.3.4 the warning was:
The recommended value for "bindAddress" in "KubeProxyConfiguration" is: 1.2.3.4; the provided value is: 0.0.0.0
Signed-off-by: Etienne Champetier <e.champetier@ateme.com>
Signed-off-by: wangyysde <net_use@bzhy.com>
Generate swagger.json.
Use v2 path for hpa_cpu_field.
Run update-codegen.sh.
Signed-off-by: wangyysde <net_use@bzhy.com>
Add the UnversionedKubeletConfigMap feature gate that can
be used to control legacy vs new behavior for naming the
default configmap used to store the KubeletConfiguration.
Update related unit tests.
This PR removes the Serve function and uses ServeWithListenerStopped
in all required places, which takes the place of the old Serve function.
This function returns a ListenerStopped channel which can be used to drain
requests before shutting down the server.
Instead of the individual error and return, it's better to aggregate all
the errors so that we can fix them all at once.
Take the chance to fix some comments, since kubeadm is not checking that
the certs are equal across controlplane.
Signed-off-by: Dave Chen <dave.chen@arm.com>
The addition of output/v1alpha2 made the converter-gen require
an explicit converter for:
kubeadm/v1beta2.BootstrapToken -> bootstraptoken/v1.BootstrapToken.
Add this converter under kubeadm/v1beta2.
Use the converter in output/v1alpha1.
In various places log messages were emitted as part of validation or even
before it (for example, cli.PrintFlags). Those log messages did not use the
final logging configuration, for example text output instead of JSON or not the
final verbosity. The last point became more obvious after moving the setup of
verbosity into logs.Options.Apply because PrintFlags never printed anything
anymore.
In order to force applications to deal with logging as soon as possible, the
Options.Validate and Options.Apply methods are now private. Applications should
use the new Options.ValidateAndApply directly after parsing.
These three options are the ones from logs.AddFlags which are not deprecated.
Therefore it makes sense to make them available also via the configuration file
support in the one command which currently supports that (kubelet).
Long-term, all commands should use LoggingConfiguration, either with a
configuration file (as in kubelet) or via flags (kube-scheduler,
kube-apiserver, kube-controller-manager).
Short-term, both approaches have to be supported. As the majority of the
commands only use logs.AddFlags, that function by default continues to register
the flags and only leaves that to Options.AddFlags when explicitly requested.
A drive-by bug fix is done for log flushing: the periodic flushing called
klog.Flush and therefore missed explicit flushing of the newer logr
backend. This bug was never present in any released version of Kubernetes and therefore the
fix is not submitted in a separate PR.
This feature has graduated to GA in v1.11 and will always be
enabled, so there is no longer a need to check if it is enabled.
Signed-off-by: Konstantin Misyutin <konstantin.misyutin@huawei.com>
* De-share the Handler struct in core API
An upcoming PR adds a handler that only applies on one of these paths.
Having fields that don't work seems bad.
This never should have been shared. Lifecycle hooks are like a "write"
while probes are more like a "read". HTTPGet and TCPSocket don't really
make sense as lifecycle hooks (but I can't take that back). When we add
gRPC, it is EXPLICITLY a health check (defined by gRPC) not an arbitrary
RPC - so a probe makes sense but a hook does not.
In the future I can also see adding lifecycle hooks that don't make
sense as probes. E.g. 'sleep' is a common lifecycle request. The only
option is `exec`, which requires having a sleep binary in your image.
* Run update scripts
* Updated non idle logging time
* Update cmd/kubelet/app/options/options.go
Co-authored-by: Jordan Liggitt <jordan@liggitt.net>
The new handler is meant to be executed at the end of the delegation chain.
It simply checks whether the request was made before the server installed all known HTTP paths.
In that case it returns a 503 response; otherwise it returns a 404.
We don't want to add additional checks to the readyz path as it might prevent fixing bricked clusters.
This specific handler is meant to "protect" requests that arrive before the paths and handlers are fully initialized.
The feature gate gets locked to "true", with the goal to remove it in two
releases.
All code now can assume that the feature is enabled. Tests for "feature
disabled" are no longer needed and get removed.
Some code wasn't using the new helper functions yet. That gets changed while
touching those lines.
The recommendation from #sig-cli was to print usage, then the error. Extra care
is taken to only print the usage instruction when the error really was about
flag parsing.
Taking kube-scheduler as example:
$ _output/bin/kube-scheduler
I0929 09:42:42.289039 149029 serving.go:348] Generated self-signed cert in-memory
...
W0929 09:42:42.489255 149029 client_config.go:620] error creating inClusterConfig, falling back to default config: unable to load in-cluster configuration, KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT must be defined
E0929 09:42:42.489366 149029 run.go:98] "command failed" err="invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable"
$ _output/bin/kube-scheduler --xxx
Usage:
kube-scheduler [flags]
...
--vmodule moduleSpec
comma-separated list of pattern=N settings for file-filtered logging
Error: unknown flag: --xxx
The kubectl behavior doesn't change:
$ _output/bin/kubectl get nodes
Unable to connect to the server: dial tcp: lookup xxxx: No address associated with hostname
$ _output/bin/kubectl --xxx
Error: unknown flag: --xxx
See 'kubectl --help' for usage.
All Kubernetes commands should show flags with hyphens in their help text even
when the flag originally was defined with underscore. Converting a command to
this style is not breaking its command line API because the old-style parameter
with underscore is accepted as alias.
The easiest solution to achieve this is to set normalization shortly before
running the command in the new central cli.Run or the few places where that
function isn't used yet.
There may be some texts which depend on normalization at flag definition time,
like the --logging-format usage warning. Those get generated assuming that
hyphens will be used.
It wasn't documented that InitLogs already uses the log flush frequency, so
some commands have called it before parsing (for example, kubectl in the
original code for logs.go). The flag never had an effect in such commands.
Fixing this turned into a major refactoring of how commands set up flags and
run their Cobra command:
- component-base/logs: implicitly registering flags during package init is an
anti-pattern that makes it impossible to use the package in commands which
want full control over their command line. Logging flags must be added
explicitly now, something that the new cli.Run does automatically.
- component-base/logs: AddFlags would have crashed in kubectl-convert if it
had been called because it relied on the global pflag.CommandLine. This
has been fixed and kubectl-convert now has the same --log-flush-frequency
flag as other commands.
- component-base/logs/testinit: an exception are tests where flag.CommandLine has
to be used. This new package can be imported to add the flags to it
once per test program.
- Normalization of the klog command line flags was inconsistent. Some commands
unintentionally didn't normalize to the recommended format with hyphens. This
gets fixed for sample programs, but not for production programs because
it would be a breaking change.
This refactoring has the following user-visible effects:
- The validation error for `go run ./cmd/kube-apiserver --logging-format=json
--add-dir-header` now references `add-dir-header` instead of `add_dir_header`.
- `staging/src/k8s.io/cloud-provider/sample` uses flags with hyphen instead of
underscore.
- `--log-flush-frequency` is not listed anymore in the --logging-format flag's
`non-default formats don't honor these flags` usage text because it will also
work for non-default formats once it is needed.
- `cmd/kubelet`: the description of `--logging-format` uses hyphens instead of
underscores for the flags, which now matches what the command is using.
- `staging/src/k8s.io/component-base/logs/example/cmd`: added logging flags.
- `apiextensions-apiserver` no longer prints a useless stack trace for `main`
when command line parsing raises an error.
The API is a copy of output/v1alpha1 with a minor difference
where output/v1alpha2.BootstrapToken embeds
bootstraptoken/v1.BootstrapToken instead of
kubeadm/v1beta2.BootstrapToken.
Embedding the latter is an undesired binding between the "kubeadm"
and "output" groups, preventing the eventual deprecation and removal
of the kubeadm.v1beta2 API.
This new output API version, unlike v1alpha1, does not include
defaulting, which is not needed.
Because the proxy.Provider interface included
proxyconfig.EndpointsHandler, all the backends needed to
implement its methods. But iptables, ipvs, and winkernel implemented
them as no-ops, and metaproxier had an implementation that wouldn't
actually work (because it couldn't handle Services with no active
Endpoints).
Since Endpoints processing in kube-proxy is deprecated (and can't be
re-enabled unless you're using a backend that doesn't support
EndpointSlice), remove proxyconfig.EndpointsHandler from the
definition of proxy.Provider and drop all the useless implementations.
Due to a cut-and-paste error in the original implementation in Kubernetes 1.19,
support for generic ephemeral inline volumes in the PVC protection controller
was incorrectly tied to the "storage object in use" feature gate.
The configuration is deprecated and targeted for removal in v1.23. Test
cases have been changed as well.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
With this commit kube-proxy accepts current system values (retrieved by sysctl) which are higher than the internally known and expected values.
The code change was mistakenly created as PR in the k3s project (see https://github.com/k3s-io/k3s/pull/3505).
A real life use case is described in Rancher issue https://github.com/rancher/rancher/issues/33360.
When Kubernetes runs on a Node which itself is a container (e.g. LXC), and the value is changed on the (LXC) host, kube-proxy then fails at the next start as it does not recognize the current value and attempts to overwrite it with the previously known one. This results in:
```
I0624 07:38:23.053960 54 conntrack.go:103] Set sysctl 'net/netfilter/nf_conntrack_max' to 524288
F0624 07:38:23.053999 54 server.go:495] open /proc/sys/net/netfilter/nf_conntrack_max: permission denied
```
However a sysctl overwrite only makes sense if the current value is lower than the previously known and expected value. If the value was increased on the host, that shouldn't really bother kube-proxy, which can just go on with it.
Signed-off-by: Claudio Kuenzler <ck@claudiokuenzler.com>
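A sketch of the relaxed check; the sysctl helper interface and its import path are assumptions based on the k8s sysctl utilities:
```
package conntrack

import utilsysctl "k8s.io/component-helpers/node/util/sysctl"

// ensureSysctl only writes when the current value is below the expected
// minimum; higher host values are accepted as-is.
func ensureSysctl(sysctl utilsysctl.Interface, name string, min int) error {
	cur, err := sysctl.GetSysctl(name)
	if err != nil {
		return err
	}
	if cur >= min {
		return nil // value on the (container) host is already high enough
	}
	return sysctl.SetSysctl(name, min)
}
```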
* This allows a controller to use cloud provider managed RBAC
when --use-service-account-credentials is set.
* Create ControllerInitFuncConstructor to pass to init funcs to avoid
future function signature growth.
* Add comments for context around legacy naming of node controllers.
* Add example for setting client names from cloud controller manager.
Panicking if not running in a test and if the component-base/version
variables are empty is not ideal. At some point sections
of kubeadm could be exposed as a library and if these sections
import the constants package, they would panic on the library
users unless they set the version information in component-base
with ldflags.
Instead:
- If the component-base version is empty, return a placeholder version
that should indicate to users who build kubeadm that something is not
right (e.g. they did not use 'make'). During library usage or unit
tests this version should not be relevant.
- Update unit tests to use hardcoded versions instead of the versions
from the constants package. Using the constants package for testing
is good but during unit tests these versions are already placeholders
since unit tests do not populate the actual component-base versions
(e.g. 1.23).
Tests under /app and /test would fail if the current/minimum k8s version
is dynamically populated from the version in the kubeadm binary.
Adapt the tests to support that.
Kubeadm requires manual version updates of its current supported k8s
control plane version and minimally supported k8s control plane and
kubelet versions every release cycle.
To avoid that, in constants.go:
- Add the helper function getSkewedKubernetesVersion() that can be
used to retrieve a MAJOR.(MINOR+n).0 version of k8s. It currently
uses the kubeadm version populated in "component-base/version" during
the kubeadm build process.
- Use the function to set existing version constants (variables).
Update util/config/common.go#NormalizeKubernetesVersion() to
tolerate the case where a k8s version in the ClusterConfiguration
is too old for the kubeadm binary to use during code freeze.
Include unit tests for the new utilities.
This change optimizes the kubeadm/etcd `AddMember` client-side function
by stopping early in the backoff loop when a peer conflict is found
(indicating the member has already been added to the etcd cluster). In
this situation, the function will stop early and relay a call to
`ListMembers` to fetch the current list of members to return. With this
optimization, front-loading a `ListMembers` call is no longer necessary,
as this functionally returns the equivalent response.
This helps reduce the amount of time taken in situational cases where an
initial client request to add a member is accepted by the server, but
fails client-side.
This situation can arise, for example, if network latency
causes the request to timeout after it was sent and accepted by the
cluster. In this situation, the following loop would occur and fail with
an `ErrPeerURLExist` response, and would be stuck until the backoff
timeout was met (roughly ~2min30sec currently).
Testing Done:
* Manual testing with an etcd cluster. The initial `AddMember` call was
successful, and the etcd manifest file was identical to prior version
of these files. Subsequent calls to add the same member succeeded
immediately (retaining idempotency), and the resulting manifest file
remains identical to the previous version as well. The difference, this
time, is that the call finished ~2min25sec faster in an identical test
environment.
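A condensed sketch of the new flow using the etcd clientv3 API; the real code wraps this in an exponential backoff loop and its error matching may differ:
```
package etcd

import (
	"context"

	"go.etcd.io/etcd/api/v3/etcdserverpb"
	"go.etcd.io/etcd/api/v3/v3rpc/rpctypes"
	clientv3 "go.etcd.io/etcd/client/v3"
)

// addMember stops backing off as soon as the peer conflict error shows
// up and answers with the live member list instead.
func addMember(ctx context.Context, cli *clientv3.Client, peerURL string) ([]*etcdserverpb.Member, error) {
	resp, err := cli.MemberAdd(ctx, []string{peerURL})
	if err == nil {
		return resp.Members, nil
	}
	if err == rpctypes.ErrPeerURLExist {
		// An earlier request likely timed out client-side after the
		// server accepted it; the member already exists.
		listResp, lerr := cli.MemberList(ctx)
		if lerr != nil {
			return nil, lerr
		}
		return listResp.Members, nil
	}
	return nil, err
}
```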
The purell package at github.com/PuerkitoBio/purell is no longer maintained; in the k/k repo it has been used under the kubeadm package for normalizing URLs. This commit removes the dependency on that package and creates a local function for normalizing URLs within the preflight package under cmd/kubeadm.
Signed-off-by: gkarthiks <github.gkarthiks@gmail.com>
chore: add new line at end of the file
Signed-off-by: gkarthiks <github.gkarthiks@gmail.com>
fix: remove unused mod from vendor modules file
Signed-off-by: gkarthiks <github.gkarthiks@gmail.com>
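A sketch of the kind of local normalization that replaces purell (helper name illustrative):
```
package preflight

import (
	"net/url"
	"strings"
)

// normalizeURL lowercases the scheme and host and drops default ports
// before URLs are compared.
func normalizeURL(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	u.Scheme = strings.ToLower(u.Scheme)
	u.Host = strings.ToLower(u.Host)
	if (u.Scheme == "http" && u.Port() == "80") ||
		(u.Scheme == "https" && u.Port() == "443") {
		u.Host = u.Hostname() // strip the default port
	}
	return u.String(), nil
}
```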
The CPUManagerPolicyOptions received from the kubelet config/command line args
is propagated to the Container Manager.
We defer the consumption of the options to a later patch(set).
Co-authored-by: Swati Sehgal <swsehgal@redhat.com>
Signed-off-by: Francesco Romani <fromani@redhat.com>
In this patch we enhance the kubelet configuration to support
cpuManagerPolicyOptions.
In order to introduce SMT-awareness in CPU Manager, we introduce a
new Kubelet flag called `cpumanager-policy-options` that allows the user
to modify the behaviour of the static policy to strictly guarantee
allocation of whole cores.
Co-authored-by: Francesco Romani <fromani@redhat.com>
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
During operations such as "upgrade", kubeadm fetches the
ClusterConfiguration object from the kubeadm ConfigMap.
However, due to requiring node specifics it wraps it in an
InitConfiguration object. The function responsible for that is:
app/util/config#FetchInitConfigurationFromCluster().
A problem with this function (and sub-calls) is that it ignores
the static defaults applied from versioned types
(e.g. v1beta3/defaults.go) and only applies dynamic defaults for:
- API endpoints
- node registration
- etc...
The introduction of Init|JoinConfiguration.ImagePullPolicy now
has static defaulting of the NodeRegistration object with a default
policy of "PullIfNotPresent". Respect this defaulting by constructing
a defaulted internal InitConfiguration from
FetchInitConfigurationFromCluster() and only then apply the dynamic
defaults over it.
This fixes a bug where "kubeadm upgrade ..." fails when pulling images
due to an empty ("") ImagePullPolicy. We could assume that empty
string means default policy on runtime in:
cmd/kubeadm/app/preflight/checks.go#ImagePullCheck()
but that might actually not be the user intent during "init" and "join",
due to e.g. a typo. Similarly, we don't allow empty tokens
on runtime and error out.
Instead of dynamically defaulting NodeRegistration.ImagePullPolicy,
which is common when doing defaulting depending on host state - e.g.
hostname, statically default it in v1beta3/defaults.go.
- Remove defaulting in checks.go
- Add one more unit test in checks_test.go
- Adapt v1beta2 conversion and fuzzer / round tripping tests
This also results in the default being visible when calling:
"kubeadm config print ...".
This change updates the CSR API to add a new, optional field called
expirationSeconds. This field is a request to the signer for the
maximum duration the client wishes the cert to have. The signer is
free to ignore this request based on its own internal policy. The
signers built-in to KCM will honor this field if it is not set to a
value greater than --cluster-signing-duration. The minimum allowed
value for this field is 600 seconds (ten minutes).
This change will help enforce safer durations for certificates in
the Kube ecosystem and will help related projects such as
cert-manager with their migration to the Kube CSR API.
Future enhancements may update the Kubelet to take advantage of this
field when it is configured in a way that can tolerate shorter
certificate lifespans with regular rotation.
Signed-off-by: Monis Khan <mok@vmware.com>
Given bootstraptoken/v1 is now a separate GV, there is no need
to duplicate the API and utilities inside v1beta3 and the internal
version.
v1beta2 must continue to use its internal copy, since output/v1alpha1
embeds the v1beta2.BootstrapToken object. See issue 2427 in k/kubeadm.
- Make v1beta3 use bootstraptoken/v1 instead of local copies
- Make the internal API use bootstraptoken/v1
- Update validation, /cmd, /util and other packages
- Update v1beta2 conversion
Package bootstraptoken contains an API and utilities wrapping the
"bootstrap.kubernetes.io/token" Secret type to ease its usage in kubeadm.
The API is released as v1, since these utilities have been part of a
GA workflow for 10+ releases.
The "bootstrap.kubernetes.io/token" Secret type is also GA.
During "join" of new control plane machines, kubeadm would
download shared certificates and keys from the cluster stored
in a Secret. Based on the contents of an entry in the Secret,
it would use helper functions from client-go to either write
it as public key, cert (mode 644) or as a private key (mode 600).
The existing logic is always writing both keys and certs with mode 600.
Allow detecting public readable data properly and writing some files
with mode 644.
First check the data with ParsePrivateKeyPEM(); if this passes
there must be at least one private key and the file should be written
with mode 600 as private. If that fails, validate if the data contains
public keys with ParsePublicKeysPEM() and write the file as public
(mode 644).
As a result of this new logic, and given the current set of managed
kubeadm files, .key files will end up with 600, while .crt and .pub
files will end up with 644.
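A sketch of the detection order, assuming client-go's keyutil helpers:
```
package copycerts

import (
	"os"

	"k8s.io/client-go/util/keyutil"
)

// fileModeForPEM checks for private keys first, then public material.
func fileModeForPEM(data []byte) os.FileMode {
	if _, err := keyutil.ParsePrivateKeyPEM(data); err == nil {
		return 0600 // at least one private key: keep the file private
	}
	if _, err := keyutil.ParsePublicKeysPEM(data); err == nil {
		return 0644 // certs and public keys can be world-readable
	}
	return 0600 // unknown content: err on the side of caution
}
```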
This change updates the backdating logic to only be applied to the
NotBefore date and not the NotAfter date when the certificate is
short lived. Thus when such a certificate is issued, it will not be
immediately expired. Long lived certificates continue to have the
same lifetime as before.
Consolidated all certificate lifetime logic into the
PermissiveSigningPolicy.policy method.
Signed-off-by: Monis Khan <mok@vmware.com>
Add {Init|Join}Configuration.Patches, which is a structure that
contains patch related options. Currently it only has the "Directory"
field which is the same option as the existing --experimental-patches
flag.
The --[experimental-]patches flag value overrides this value
if both a flag and a config are passed during "init" or "join".
The feature of "patches" in kubeadm has been in Alpha for a few
releases. It has not received major bug reports from users.
Deprecate the --experimental-patches flag and add --patches.
Both flags are allowed to be mixed with --config.
When API Priority and Fairness is enabled, the inflight limits must
add up to something positive.
This rejects the configuration that prompted
https://github.com/kubernetes/kubernetes/issues/102885
Update help for max inflight flags
This adds the gate `SeccompDefault` as a new alpha feature. Seccomp path
and field fallbacks are now passed to the helper functions, whereas unit
tests covering those code paths have been added as well.
Besides enabling the feature gate, the feature has to be enabled by the
`SeccompDefault` kubelet configuration or its corresponding
`--seccomp-default` CLI flag.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
Apply suggestions from code review
Co-authored-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
If the user has not specified a pull policy we must assume a default of
v1.PullIfNotPresent.
Add some extra verbose output to help users monitor what policy is
used and what images are skipped / pulled.
Use "fallthrough" and case handle "v1.PullAlways".
Update unit test.
Update CreateInitStaticPodManifestFiles, CreateStaticPodFiles and CreateLocalEtcdStaticPodManifestFile to take into account whether the command was run in dry-run mode.
In the Alpha stage of the feature in kubeadm to support
a rootless control plane, the allocation and assignment of
UID/GIDs to containers in the static pods will be automated.
This automation will require management of users and groups
in /etc/passwd and /etc/group.
The tools on Linux for user/group management are inconsistent
and non-standardized. It also requires us to include a number of
more dependencies in the DEB/RPMs, while complicating the UX for
non-package manager users.
The format of /etc/passwd and /etc/group is standardized.
Add code for managing (adding and deleting) a set of managed
users and groups in these files.
During Runner data initialization, if the value for the flag
"--skip-phases" was empty set the {init|join}Runner.Options.SkipPhases
to the {Init|Join}Configuration.SkipPhases value.
- Add the field SkipPhases in the public v1beta3 as a []string (omitempty)
- Add the field in the internal type
- Run generators
- Adapt v1beta2 converter for JoinConfiguration
Ideally this should be part of dockershim/CRI and not on the
side of kubeadm.
Remove the detection:
- during preflight
- during kubelet config defaulting
Update dependencies and the test images to use pause 3.5. We also
provide a changelog entry for the new container image version.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
FeatureGate acts as a secondary switch to disable cloud-controller loops
in KCM, Kubelet and KAPI.
Provide comprehensive logging information to users, so they will be
guided in adoption of out-of-tree cloud provider implementation.
- Remove the deprecated --csr* flags "init phase certs"
- Deprecate the same flags for "certs renew".
For both cases users should be using "certs generate-csr".
The command "kubeadm config view" was deprecated in 1.19.
Remove it as scheduled in 1.22.
The replacement is to use kubectl:
kubectl get cm -n kube-system kubeadm-config -o=jsonpath="{.data.ClusterConfiguration}"
- Remove the object form v1beta3 and internal type
- Deprecate a couple of phases that were specifically designed / named to
modify the ClusterStatus object
- Adapt logic around annotation vs ClusterStatus retrieval
- Update unit tests
- Run generators
Running "go test ./cmd/kubeadm/app/..." results in these 3 files
being generated, since we have more callers to the functions
for generating unique private keys during pkiutil tests.
Add the files to ensure they are not generated locally all the time.
Kubeadm no longer supports kube-dns and CoreDNS is the only
supported DNS server. Remove ClusterConfiguration.DNS.Type
from v1beta3 that is used to set the DNS server type.
Now the following flags have no effect and will be removed in v1.24:
* `--port`
* `--address`
The insecure port flag `--port` may only be set to 0 now.
Signed-off-by: Jian Zeng <zengjian.zj@bytedance.com>
- Pin the ClusterConfiguration when fuzzing
the internal InitConfiguration that embeds it. Kubeadm includes
separate constructs for this embedding in the internal type
and this round trip is not viable.
- Remove the artificial calls to SetDefaults_ClusterConfiguration()
in v1beta{2|3}'s converters from public to internal InitConfiguration.
- Make sure the internal InitConfiguration.ClusterConfiguration is
defaulted in initconfiguration.go instead.
- scheme: switch to:
utilruntime.Must(scheme.SetVersionPriority(v1beta3.SchemeGroupVersion))
- change all imports in the code base from v1beta2 to v1beta3
- rename all import aliases for kubeadmapiv1beta2 to "kubeadmapiv".
this allows smaller diffs when changing the default public API.
it turns out that setting a timeout on HTTP client affect watch requests made by the delegated authentication component.
with a 10 second timeout watch requests are being re-established exactly after 10 seconds even though the default request timeout for them is ~5 minutes.
this is because if multiple timeouts were set, the stdlib picks the smaller timeout to be applied, leaving the others useless.
for more details see a937729c2c/src/net/http/client.go (L364)
instead of setting a timeout on the HTTP client we should use context for cancellation.
The v1beta1/2 API doc.go files include an example
flag for the kubelet binary "cgroup-driver" under
"kubeletExtraArgs".
This flag is deprecated and should not be in the examples.
Add "v" instead which is one of the flags we know will
not be deprecated soon.
This is part of the "master" -> "control-plane" rename
that we missed. It's not critical for 1.21 as the
"control-plane" taint is still not added to CP nodes,
but it would be best to add the toleration preemptively
like the KEP planned.
* Removes discovery v1alpha1 API
* Replaces per Endpoint Topology with a read only DeprecatedTopology
in GA API
* Adds per Endpoint Zone field in GA API
The kubeadm documentation instructs users to set the cgroup
driver to "systemd", since kubeadm manages a kubelet via
the systemd init system. The kubelet default however is "cgroupfs".
For new clusters set the driver to "systemd" unless the user
is explicit about it. The same defaulting would not happen
during "upgrade".
Errors from staticcheck:
cmd/preferredimports/preferredimports.go:38:2:
package golang.org/x/crypto/ssh/terminal is deprecated:
this package moved to golang.org/x/term. (SA1019)
vendor/k8s.io/client-go/plugin/pkg/client/auth/exec/exec.go:36:2:
package golang.org/x/crypto/ssh/terminal is deprecated:
this package moved to golang.org/x/term. (SA1019)
vendor/k8s.io/client-go/tools/clientcmd/auth_loaders.go:26:2:
package golang.org/x/crypto/ssh/terminal is deprecated:
this package moved to golang.org/x/term. (SA1019)
Please review the above warnings. You can test via:
hack/verify-staticcheck.sh <failing package>
If the above warnings do not make sense, you can exempt the line or
file. See:
https://staticcheck.io/docs/#ignoring-problems
generated:
- hack/update-internal-modules.sh
- hack/lint-dependencies.sh
- hack/update-vendor.sh
Signed-off-by: Stephen Augustus <foo@auggie.dev>
Pass the flag --pod-infra-container-image to the kubelet not only
for Docker but for all container runtimes.
This flag tells the kubelet to special case the image and not garbage
collect it.
This change updates the number of workers that the CSR signing
controllers use. If a large number of certificates (especially
short lived ones) are approved at the same time, it can take the
signing controllers a long time to process them serially. The
NewCSRSigningController logic is already go routine safe.
Signed-off-by: Monis Khan <mok@vmware.com>
Looks like there is a bit of an issue in Blunderbuss (Prow plugin)
where it prefers to pick reviewers from a parent OWNERS file,
instead of using an approver from a current OWNERS file as
an additional reviewer.
The dependencycheck binary name was vendorcheck, which was the original
name of the tool. This updates it to dependencycheck.
Signed-off-by: hasheddan <georgedanielmangum@gmail.com>
As a part of cleaning up inactive members (those with no activity within
the past 18 months) from OWNERS files, this commit moves gmarek from an
approver to an emeritus_approver.
The `--reserved-memory` flag is now parsed straight into the
[]kubeletconfig.MemoryReservation variable instead of going through
the intermediate map representation.
This makes it possible to get rid of a lot of unneeded code and use a
single representation for the reserved memory.
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
Updates kubeadm version resolution to use kubernetes community infra
bucket to fetch appropriate k8s ci versions. The images are already
being pulled from the kubernetes community infra bucket, meaning that a
mismatch can occur when the ci version is fetched from the google infra
bucket and the image is not yet present on k8s infra.
Follow-up to kubernetes/kubernetes#97087
Signed-off-by: hasheddan <georgedanielmangum@gmail.com>
Originally raised as an issue with invalid versions passed to plan, but it has
been determined that with air gapped environments and development versions it
is not possible to fully address that issue.
But one thing that was identified was that we can do a better job in how
we output the upgrade plan information. Kubeadm outputs the requested
version as "Latest stable version", though that may not actually be the
case. For this instance, we want to change this to "Target version" to
be a little more accurate.
Then in the component upgrade table that is emitted, the last column of
AVAILABLE isn't quite right either. Also changing this to TARGET to
reflect that this is the version we are targeting to upgrade to,
regardless of its availability.
There could be some improvements in checking available versions,
particularly in air gapped environments, to make sure we actually have
access to the requested version. But this at least clarifies some of the
output a bit.
Signed-off-by: Sean McGinnis <sean.mcginnis@gmail.com>
we disabled the /healthz check because our test blocks one post-start
hook from finishing. Instead we should check all the other /healthz/...
endpoints before running the tests
Add DefaultedStaticInitConfiguration() which can be
used instead of DefaultedInitConfiguration() during unit tests.
The latter can be slow since it performs dynamic defaulting.
Apply the label:
"node.kubernetes.io/exclude-from-external-load-balancers"
to control plane nodes to preserve backwards compatibility
with the legacy mode where "master" nodes were excluded from
LBs.
This commit replaces the CSIMigrationXXXComplete flag
with the InTreePluginXXUnregister flag. This new flag will
be a superset of CSIMigrationXXXComplete, but it
decouples the plugin unregister from CSI migration. So
if a K8s distribution wants to go directly with CSI and
not support in-tree plugins, it can use this flag directly.
Testing:
1. Enable InTreePluginXXUnregister and not CSIMigrationXXX,
verify that a PVC using the old plugin name will have an error
saying the plugin cannot be found
2. Enable both InTreePluginXXUnregister and CSIMigrationXXX,
verify that a PVC using the old plugin name will start to use
the migrated CSI plugin
Migrate how the resource lock and leader election config are generated to the new way, hiding kubeClient. This also halves the kubeClient timeout, making it a useful value.
If the timeout is equal to RenewDeadline and we hit the client timeout on a request, there will be no retry, as the RenewDeadline logic will cancel the context and leader election will be lost. So setting a timeout to a value at least equal to RenewDeadline is pointless.
Setting it to half of RenewDeadline is a heuristic to resolve this missing-retry problem without adding an additional parameter.
During upgrade the coredns migration library seems to require
that the input version doesn't have the "v" prefix.
Fixes a bug where the user cannot run commands such as
"kubeadm upgrade plan" if they have `v1.8.0` installed.
Assuming this is caused by the fact that previously the image didn't
have a "v" prefix.
Fixes an issue where some kubeadm phases fail if a certificate file
contains a certificate chain with one or more intermediate CA
certificates. The validation algorithm has been changed from requiring
that a certificate was signed directly by the root CA to requiring that
there is a valid certificate chain back to the root CA.
In kubeadm etcd join there is a bug where,
if a peer already exists in etcd, it attempts to mitigate
by continuing and generating the etcd manifest file. However,
this existing "member name" may actually be unset, causing
subsequent etcd consistency checks to fail.
This change checks if the member name is empty - if it is,
it sets the member name to the node name, and resumes.
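A sketch of the fallback (helper name illustrative):
```
package etcd

// ensureMemberName replaces an unset member name with the node name so
// that later consistency checks can match it.
func ensureMemberName(memberName, nodeName string) string {
	if memberName == "" {
		return nodeName
	}
	return memberName
}
```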
The error messages when the user feeds an invalid discovery token CA
hash are vague. Make sure to:
- Print the list of supported hash formats (currently only "sha256").
- Wrap the error from pubKeyPins.Allow() with a descriptive message.
Implement pod resource metrics as described in KEP 1916. The new
`/metrics/resources` endpoint is exposed on the active scheduler
and reports kube_pod_resources* metrics that present the effective
requests and limits for all resources on the pods as calculated by
the scheduler and kubelet. This allows administrators using the
system to quickly perform resource consumption, reservation, and
pending utilization calculations when those metrics are read.
Because metrics calculation is on-demand, there is no additional
resource consumption incurred by the scheduler unless the endpoint
is scraped.
- Mark the "node-role.kubernetes.io/master" key for labels
and taints as deprecated.
- During "kubeadm init/join" apply the label
"node-role.kubernetes.io/control-plane" to new control-plane nodes,
next to the existing "node-role.kubernetes.io/master" label.
- During "kubeadm upgrade apply", find all Nodes with the "master"
label and also apply the "control-plane" label to them
(if they don't have it).
- During upgrade health-checks collect Nodes labeled both "master"
and "control-plane".
- Rename the constants.ControlPlane{Taint|Toleration} to
constants.OldControlPlane{Taint|Toleration} to manage the transition.
- Mark constants.OldControlPlane{Taint|Toleration} as deprecated.
- Use constants.OldControlPlane{Taint|Toleration} instead of
constants.ControlPlane{Taint|Toleration} everywhere.
- Introduce constants.ControlPlane{Taint|Toleration}.
- Add constants.ControlPlaneToleration to the kube-dns / CoreDNS
Deployments to make them anticipate the introduction
of the "node-role.kubernetes.io/control-plane:NoSchedule"
taint (constants.ControlPlaneTaint) on kubeadm control-plane Nodes.
Aborted requests are the ones that were disrupted with http.ErrAbortHandler.
For example, the timeout handler will panic with http.ErrAbortHandler when a response to the client has already been sent
and the timeout elapsed.
Additionally, a new metric requestAbortsTotal was defined to count aborted requests. The new metric allows for aggregation for each group, version, verb, resource, subresource and scope.
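A rough sketch of how an abort is detected and counted (req,
requestInfo and the recording helper are placeholders):

    import "net/http"

    // In a deferred recover, treat http.ErrAbortHandler as an aborted
    // request, count it, and re-panic for anything else.
    defer func() {
        if r := recover(); r != nil {
            if r == http.ErrAbortHandler {
                recordRequestAbort(req, requestInfo) // placeholder helper
                return
            }
            panic(r)
        }
    }()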
Without APIServerIdentity enabled, stale apiserver leases won't be
GC'ed, and the same goes for stale storage version entries. In that
case the storage migrator won't operate correctly without manual
intervention.
StorageVersions are updated during apiserver bootstrap. This makes
sure that the storage version filter can block certain requests until
the storage version updates are completed, and that the apiserver
works properly after the updates are done.
Also add a poststarthook to the aggregator which updates the
StorageVersions via the storageversion.Manager.
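A rough sketch of the hook, with hypothetical names (the real wiring
in the aggregator differs in detail):

    // Kick off storage version updates once the server is up; the storage
    // version filter can hold relevant requests until this completes.
    s.AddPostStartHookOrDie("storage-version-updater", func(ctx genericapiserver.PostStartHookContext) error {
        go storageVersionManager.UpdateStorageVersions(ctx.LoopbackClientConfig, apiserverID)
        return nil
    })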
Previously no timeout was set. Requests without an explicit timeout
might hang forever and lead to starvation of the application. Now,
when no timeout is specified, a default one is applied.
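A minimal sketch of the fallback, assuming an http.Client and an
arbitrary default value:

    import (
        "net/http"
        "time"
    )

    // withDefaultTimeout applies a finite default so that requests
    // cannot hang forever; the 30s value is assumed, not from the commit.
    func withDefaultTimeout(c *http.Client) *http.Client {
        if c.Timeout == 0 {
            c.Timeout = 30 * time.Second
        }
        return c
    }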
* api: structure change
* api: defaulting, conversion, and validation
* [FIX] validation: auto remove second ip/family when service changes to SingleStack
* [FIX] api: defaulting, conversion, and validation
* api-server: clusterIPs alloc, printers, storage and strategy
* [FIX] clusterIPs default on read
* alloc: auto remove second ip/family when service changes to SingleStack
* api-server: repair loop handling for clusterIPs
* api-server: force kubernetes default service into single stack
* api-server: tie dualstack feature flag with endpoint feature flag
* controller-manager: feature flag, endpoint, and endpointSlice controllers handling multi family service
* [FIX] controller-manager: feature flag, endpoint, and endpointSlice controllers handling multi family service
* kube-proxy: feature-flag, utils, proxier, and meta proxier
* [FIX] kubeproxy: call both proxiers at the same time
* kubenet: remove forced pod IP sorting
* kubectl: modify describe to include ClusterIPs, IPFamilies, and IPFamilyPolicy
* e2e: fix tests that depend on IPFamily field AND add dual stack tests
* e2e: fix expected error message for ClusterIP immutability
* add integration tests for dualstack
The third phase of dual stack is a very complex change in the API;
basically, it introduces dual-stack Services. The main changes are:
- It pluralizes the Service IPFamily field to IPFamilies,
and removes the singular field.
- It introduces a new field IPFamilyPolicy (of type
IPFamilyPolicyType) that can take 3 values to express the
"dual-stack(mad)ness" of the cluster: SingleStack, PreferDualStack
and RequireDualStack.
- It pluralizes ClusterIP to ClusterIPs.
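For illustration, a dual-stack Service using the pluralized fields
might be constructed like this (a sketch with core/v1 types):

    import v1 "k8s.io/api/core/v1"

    policy := v1.IPFamilyPolicyRequireDualStack
    svc := &v1.Service{
        Spec: v1.ServiceSpec{
            IPFamilyPolicy: &policy,
            IPFamilies:     []v1.IPFamily{v1.IPv4Protocol, v1.IPv6Protocol},
            // ClusterIPs is populated by the allocator and normally left empty.
        },
    }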
The goal is to add coverage to the services API operations,
taking into account the 6 different modes a cluster can have:
- single stack: IPv4 or IPv6 (as of today)
- dual stack: IPv4 only, IPv6 only, IPv4 - IPv6, IPv6 - IPv4
* [FIX] add integration tests for dualstack
* generated data
* generated files
Co-authored-by: Antonio Ojea <aojea@redhat.com>
The controller manager should validate the podSubnet against the
node-mask, because if they are inconsistent the controller-manager
will fail. We don't need to calculate the node-cidr-masks, because
those should be provided by the user; if they are wrong, we fail in
validation.
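A minimal sketch of the validation (function name illustrative):

    import (
        "fmt"
        "net"
    )

    // validatePodSubnetNodeMask checks that the pod subnet can actually
    // be split into per-node CIDRs of the user-provided mask size.
    func validatePodSubnetNodeMask(podSubnet string, nodeMaskSize int) error {
        _, podNet, err := net.ParseCIDR(podSubnet)
        if err != nil {
            return fmt.Errorf("invalid pod subnet %q: %v", podSubnet, err)
        }
        ones, bits := podNet.Mask.Size()
        if nodeMaskSize < ones || nodeMaskSize > bits {
            return fmt.Errorf("node CIDR mask size %d must be between %d and %d", nodeMaskSize, ones, bits)
        }
        return nil
    }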
This PR adds unit tests that check the service cluster IP range and
improves the code coverage of k8s.io/kubernetes/cmd/kube-apiserver/app
from 5.7% to 6.2%. The new cases cover:
1) Dual stack IPv4/IPv6
2) Invalid IPv4, IPv6 mask
3) Missing IPv4, IPv6 mask
4) Invalid IP address format
Tests 2, 3 and 4 were suggested by Antonio Ojea.
Currently the "generate-csr" command does not have any output.
Pass an io.Writer (bound to os.Stdout from /cmd) to the functions
responsible for generating the kubeconfig / certs keys and CSRs.
If nil is passed these functions don't output anything.
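A minimal sketch of the pattern (function name hypothetical):

    import (
        "fmt"
        "io"
    )

    // writeCSRNotice prints progress only when a writer was provided;
    // passing nil keeps the phase silent, as before.
    func writeCSRNotice(out io.Writer, name string) {
        if out == nil {
            return
        }
        fmt.Fprintf(out, "[csr] generated CSR for %q\n", name)
    }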
Discussion is ongoing about how to best handle dual-stack with clouds
and autodetected IPs, but there is at least agreement that people on
bare metal ought to be able to specify two explicit IPs on dual-stack
hosts, so allow that.
Deprecate the experimental command "alpha self-hosting" and its
sub-command "pivot" that can be used to create a self-hosting
control-plane from static Pods.
The kubeconfig phase of "kubeadm init" detects external CA mode
and skips the generation of kubeconfig files. The kubeconfig
handling during control-plane join executes
CreateJoinControlPlaneKubeConfigFiles() which requires the presence
of ca.key when preparing the spec of a kubeconfig file and prevents
usage of external CA mode.
Modify CreateJoinControlPlaneKubeConfigFiles() to skip generating
the kubeconfig files if external CA mode is detected.
- Modify validateCACertAndKey() to print warnings for missing keys
instead of erroring out.
- Update unit tests.
This allows doing a CP node join in a case where the user has:
- copied shared certificates to the new CP node, but not copied
ca.key files, treating the cluster CAs as external
- signed other required certificates in advance
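A rough sketch of the skip (the detection boolean stands in for
kubeadm's external CA check, i.e. ca.crt present but ca.key absent):

    // Skip kubeconfig generation on control-plane join when the cluster
    // CA is treated as external; the files must already be in place.
    if usingExternalCA {
        fmt.Println("[kubeconfig] external CA detected, skipping kubeconfig file generation")
        return nil
    }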
The provided DialContext wraps existing clients' DialContext in an attempt to
preserve any existing timeout configuration. In some cases, we may replace
infinite timeouts with golang defaults.
- scaleio: tcp connect/keepalive values changed from 0/15 to 30/30
- storageos: no change
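A minimal sketch of the wrapping (the finite defaults shown mirror
the scaleio change above and are otherwise assumed):

    import (
        "context"
        "net"
        "time"
    )

    type dialFunc func(ctx context.Context, network, addr string) (net.Conn, error)

    // wrapDialContext preserves an existing DialContext; when none is
    // configured it substitutes finite defaults for infinite timeouts.
    func wrapDialContext(inner dialFunc) dialFunc {
        if inner == nil {
            d := &net.Dialer{Timeout: 30 * time.Second, KeepAlive: 30 * time.Second}
            inner = d.DialContext
        }
        return func(ctx context.Context, network, addr string) (net.Conn, error) {
            return inner(ctx, network, addr)
        }
    }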
The flag was deprecated because it is problematic: it allows
overriding the kubelet configuration that is downloaded
from the cluster during upgrade.
Kubeadm node upgrades already download the KubeletConfiguration
and store it in the internal ClusterConfiguration type. It is then
only a matter of writing that KubeletConfiguration to disk.
For external CA users that have prepared the kubeconfig files
for components, they might wish to provide a custom API server URL.
When performing validation on these kubeconfig files, instead of
erroring out on such custom URLs, show a klog Warning.
This allows flexibility around topology setup, where users
wish to make the kubeconfigs point to the ControlPlaneEndpoint instead
of the LocalAPIEndpoint.
Fix validation in ValidateKubeconfigsForExternalCA, which expected
all kubeconfig files to use the ControlPlaneEndpoint. The
kube-scheduler and kube-controller-manager now use the
LocalAPIEndpoint.
This PR specifies minimum control plane version,
kubelet version and current K8s version for v1.20.
Signed-off-by: Kommireddy Akhilesh <akhileshkommireddy2412@gmail.com>
Client side period validation of certificates should not be
fatal, as local clock skews are not so uncommon. The validation
should be left to the running servers.
- Remove this validation from TryLoadCertFromDisk().
- Add a new function ValidateCertPeriod(), that can be used for this
purpose on demand.
- In phases/certs add a new function CheckCertificatePeriodValidity()
that will print warnings if a certificate does not pass period
validation, and caches certificates that were already checked.
- Use the function in a number of places where certificates
are loaded from disk.
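A minimal sketch of ValidateCertPeriod as described (the offset
parameter shifts "now" for callers that want to check validity at a
future point):

    import (
        "crypto/x509"
        "fmt"
        "time"
    )

    // ValidateCertPeriod checks, on demand, that a certificate is inside
    // its validity window at now+offset; callers may downgrade failures
    // to warnings instead of treating them as fatal.
    func ValidateCertPeriod(cert *x509.Certificate, offset time.Duration) error {
        now := time.Now().Add(offset)
        if now.Before(cert.NotBefore) {
            return fmt.Errorf("the certificate is not valid yet: NotBefore: %v", cert.NotBefore)
        }
        if now.After(cert.NotAfter) {
            return fmt.Errorf("the certificate has expired: NotAfter: %v", cert.NotAfter)
        }
        return nil
    }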
CNI is no longer alpha and is widely used by almost every Kubernetes
cluster, so we should remove the alpha warnings that were originally
added in the early days of CNI.
Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>
The isCoreDNSVersionSupported() check assumes that
there is a running kubelet that manages the CoreDNS containers.
While the containers are still being created it is not possible
to fetch their image digest. To work around that,
isCoreDNSVersionSupported() would have to poll and wait until the
CoreDNS Pods are running. Depending on timing, and with the CNI
plugin yet to be installed, this can cause problems related to
addon idempotency of "kubeadm init": if the CoreDNS Pods are
waiting on another step they will never become Running.
Remove the function isCoreDNSVersionSupported() and assume that
the version is always supported. Rely on the Corefile migration
library to error out if it must.
- Ensure the directory is created with 0700 via a new function
called CreateDataDirectory().
- Call this function in the init phases instead of the manual call
to MkdirAll.
- Call this function when joining control-plane nodes with local etcd.
If the directory creation is left to the kubelet via the
static Pod hostPath mounts, it will end up with 0755
which is not desired.
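A minimal sketch of the helper named in this change:

    import (
        "fmt"
        "os"
    )

    // CreateDataDirectory creates the etcd data directory with 0700
    // instead of leaving creation to the kubelet's hostPath mount (0755).
    func CreateDataDirectory(dataDir string) error {
        if err := os.MkdirAll(dataDir, 0700); err != nil {
            return fmt.Errorf("failed to create etcd data directory %q: %v", dataDir, err)
        }
        return nil
    }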
A bug was discovered in the `enforceRequirements` func for `upgrade plan`.
If a command line argument that specifies the target Kubernetes version is
supplied, the returned `ClusterConfiguration` by `enforceRequirements` will
have its `KubernetesVersion` field set to the new version.
If no version was specified, the returned `KubernetesVersion` points to the
currently installed one.
This remained undetected for a couple of reasons:
- It's only `upgrade plan` that allows the version command line argument
to be optional (in `upgrade apply` it's mandatory)
- Prior to 1.19, the implementation of `upgrade plan` did not make use of the
`KubernetesVersion` returned by `enforceRequirements`.
`upgrade plan` supports this optional command line argument to enable
air-gapped setups (as not specifying a version on the command line will
end up looking for the latest version over the Internet).
Hence, the only option is to make `enforceRequirements` consistent in the
`upgrade plan` case and always return the currently installed version in the
`KubernetesVersion` field.
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
Pinning the kube-controller-manager and kube-scheduler kubeconfig files
to point to the control-plane-endpoint can be problematic during
immutable upgrades if one of these components ends up contacting an N-1
kube-apiserver:
https://kubernetes.io/docs/setup/release/version-skew-policy/#kube-controller-manager-kube-scheduler-and-cloud-controller-manager
For example, the components can send a request for a non-existing API
version.
Instead of using the CPE for these components, use the LocalAPIEndpoint.
This guarantees that the components would talk to the local
kube-apiserver, which should be the same version, unless the user
explicitly patched manifests.
A check that verifies that kubeadm does not "upgrade" to an older release was
overly optimized by skipping upgrade if the new version is the same as the old
one. This somewhat makes sense, but that way changes in any of the etcd fields
in the ClusterConfiguration won't be applied if the etcd version is not
changed.
Hence, this simple change ensures that the upgrade is done even when no version
change takes place.
Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>