If the user passes "--proxy-mode ipvs", and it is not possible to use
IPVS, then error out rather than falling back to iptables.
There was never any good reason to be doing fallback; this was
presumably erroneously added to parallel the iptables-to-userspace
fallback (which only existed because we had wanted iptables to be the
default but not all systems could support it).
In particular, if the user passed configuration options for ipvs, then
they presumably *didn't* pass configuration options for iptables, and
so even if the iptables proxy is able to run, it is likely to be
misconfigured.
Back when iptables was first made the default, there were
theoretically some users who wouldn't have been able to support it due
to having an old /sbin/iptables. But kube-proxy no longer does the
things that didn't work with old iptables, and we removed that check a
long time ago. There is also a check for a new-enough kernel version,
but it's checking for a feature which was added in kernel 3.6, and no
one could possibly be running Kubernetes with a kernel that old. So
the fallback code now never actually falls back, so it should just be
removed.
This was implemented partly in server.go and partly in
server_others.go even though even the parts in server.go were totally
linux-specific. Simplify things by putting it all in server_others.go
and get rid of some unnecessary abstraction.
Introduce networking/v1alpha1 api group.
Add `ClusterCIDR` type to networking/v1alpha1 api group, this type
will enable the NodeIPAM controller to support multiple ClusterCIDRs.
This change is to promote local storage capacity isolation feature to GA
At the same time, to allow rootless system disable this feature due to
unable to get root fs, this change introduced a new kubelet config
"localStorageCapacityIsolation". By default it is set to true. For
rootless systems, they can set this configuration to false to disable
the feature. Once it is set, user cannot set ephemeral-storage
request/limit because capacity and allocatable will not be set.
Change-Id: I48a52e737c6a09e9131454db6ad31247b56c000a
Flocker storage plugin removed from k8s codebase.
Flocker, an early external storage plugin in k8s,
has not been in maintenance and their business is
down. As far as I know, the plugin is not being
used anymore.
This PR removes the whole flocker dependency and
codebase from core k8s to reduce potential security
risks and reduce maintenance work from the sig-storage community.
the fix introduced in #110634 also introduces a bug preventing `kubeadm
upgrade plan` from running on nodes having a different `os.Hostname()`
than node name. concretely, for a node `titi.company.ch`,
`os.Hostname()` will return `titi`, while the full node name is actually
`titi.company.ch`. this simple fix uses the `cfg.NodeRegistration.Name`
instead, which fixes the issue on my nodes with a FQDN node name
Keep previous hostname retrieval as fallback for dupURL CRI fix
`getClientSet` is used by both cmd `token` and `reset`, move this
method to cmd utils to decouple it from one specific cmd.
Signed-off-by: Dave Chen <dave.chen@arm.com>
- Run hack/update-codegen.sh
- Run hack/update-generated-device-plugin.sh
- Run hack/update-generated-protobuf.sh
- Run hack/update-generated-runtime.sh
- Run hack/update-generated-swagger-docs.sh
- Run hack/update-openapi-spec.sh
- Run hack/update-gofmt.sh
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
The experimental-watch-progress-notify-interval flag specifies an interval
at which etcd sends data to the kube-api server.
It is used by the WatchBookmark feature which is GA since 1.17.
It will be used by a new WatchList feature which is Alpha since 1.25
In addition to that the feature was graduated to GA (non-experiment) in etcd 3.5 without any code changes
Fix a TODO to plumb an update filter from above in the resource quota
monitor code that was handling update events for quota-able objects,
instead of hard-coding the logic in the resource quota monitor.
Signed-off-by: Andy Goldstein <andy.goldstein@redhat.com>
* kubelet: silence flag output on errors
Currently, the `--help` text is output on kubelet errors. Currently on
my machine this is 280 lines. Typically kubelet is run by systemd or
similar, starting it a loop. This means when an issue is encountered, we
are spammed by 100s of logs per second, masking the real error.
With this PR, the list of all flags is silenced. Users can still access
them by `kubelet --help` as normal. This same `SilenceUsage` is already
set in the api-server command.
* Update cmd/kubelet/app/server.go
Co-authored-by: Paco Xu <paco.xu@daocloud.io>
Co-authored-by: Paco Xu <paco.xu@daocloud.io>
- `cert-dir` could be specified to a value other than the default value
- we have tests that should be executed successfully on the working cluster
Signed-off-by: Dave Chen <dave.chen@arm.com>
It is useful to have the ability to control whether alpha or beta features are
enabled. We can group features under LoggingAlphaOptions and LoggingBetaOptions
because the configuration is designed so that each feature individually must be
enabled via its own option.
Currently, the JSON format itself is beta (graduated in 1.23) but additional
options for it were only added in 1.23 and thus are still alpha:
$ go run ./staging/src/k8s.io/component-base/logs/example/cmd/logger.go --logging-format=json --log-json-split-stream --log-json-info-buffer-size 1M --feature-gates LoggingBetaOptions=false
[format: Forbidden: Log format json is BETA and disabled, see LoggingBetaOptions feature, options.json.splitStream: Forbidden: Feature LoggingAlphaOptions is disabled, options.json.infoBufferSize: Forbidden: Feature LoggingAlphaOptions is disabled]
$ go run ./staging/src/k8s.io/component-base/logs/example/cmd/logger.go --logging-format=json --log-json-split-stream --log-json-info-buffer-size 1M
[options.json.splitStream: Forbidden: Feature LoggingAlphaOptions is disabled, options.json.infoBufferSize: Forbidden: Feature LoggingAlphaOptions is disabled]
This is the same approach that was taken for CPUManagerPolicyAlphaOptions and
CPUManagerPolicyBetaOptions.
In order to test this without modifying the global feature gate in a test file,
ValidateKubeletConfiguration must take a feature gate as argument.
Making the LoggingConfiguration part of the versioned component-base/config API
had the theoretic advantage that components could have offered different
configuration APIs with experimental features limited to alpha versions (for
example, sanitization offered only in a v1alpha1.KubeletConfiguration). Some
components could have decided to only use stable logging options.
In practice, this wasn't done. Furthermore, we don't want different components
to make different choices regarding which logging features they offer to
users. It should always be the same everywhere, for the sake of consistency.
This can be achieved with a saner Go API by dropping the distinction between
internal and external LoggingConfiguration types. Different stability levels of
indidividual fields have to be covered by documentation (done) and potentially
feature gates (not currently done).
Advantages:
- everything related to logging is under component-base/logs;
previously this was scattered across different packages and
different files under "logs" (why some code was in logs/config.go
vs. logs/options.go vs. logs/logs.go always confused me again
and again when coming back to the code):
- long-term config and command line API are clearly separated
into the "api" package underneath that
- logs/logs.go itself only deals with legacy global flags and
logging configuration
- removal of separate Go APIs like logs.BindLoggingFlags and
logs.Options
- LogRegistry becomes an implementation detail, with less code
and less exported functionality (only registration needs to
be exported, querying is internal)
After the removal of the dynamic kubelet configuration feature it became
possible to initialize logging directly after configuration parsing. The
advantage is that logs emitted by
kubeletconfigvalidation.ValidateKubeletConfiguration and
`klog.InfoS("unsupported configuration ...` already use the intended log
output.
After the code was originally added, Run was replaced by RunE. Taking advantage
of that and returning an error is cleaner.
Once we patch a kubelet configuration file, the patched output
is in JSON. Make sure it's converted back to YAML, given
the kubelet config in the cluster and on disk is always in YAML.
Add unit test for the new function applyKubeletConfigPatches()
In phases/kubelet/WriteConfigToDisk() create a patch
manager for the root patches directory and apply
the user patches with a target "kubeletconfiguration".
With phases/kubelet/WriteConfigToDisk() about to support patches
it is required that the function accepts an io.Writer
where the PatchManager can output to and also a patch directory.
Modify all call sites of the function WriteConfigToDisk()
to properly prepare an pass an io.Writer and patches dir to it.
This results in command phases for init/join/upgrade to pass
the root io.Writer (usually stdout) and the patchesDir populated
either via the config file or --patches flag.
If the user runs "kubeadm upgrade apply", kubeadm can download
a configuration from the cluster. If the configuration contains
the legacy default imageRepository of "k8s.gcr.io", mutate it
to the new default of "registry.k8s.io" and update the
configuration in the config map.
During "upgrade node/diff" download the configuration, mutate the
image repository locally, but do not mutate the in-cluster value.
That is done only on "apply".
This ensures that users are migrated from the old default registry
domain.
- lock the FG to true by default
- cleanup wrappers and logic related to versioned vs unversioned
naming of API objects (CMs and RBAC)
- update unit tests
The OldControlPlaneTaint taint (master) can be replaced
with the new ControlPlaneTaint (control-plane) taint.
Adapt unit tests in markcontrolplane_test.go
and cluster_test.go.
- iniconfiguration.go: stop applying the "master" taint
for new clusters; update related unit tests in _test.go
- apply.go: Remove logic related to cleanup of the "master" label
during upgrade
- apply.go: Add cleanup of the "master" taint on CP nodes
during upgrade
- controlplane_nodes_test.go: remove test for old "master" taint
on nodes (this needs backport to 1.24, because we have a kubeadm
1.25 vs kubernetes test suite 1.24 e2e test)
Use the etcd 3.5.3+ HTTP(s) endpoint "/health?serializable=true",
to allow the kubelet liveness and starup probes in the
kubeadm generated etcd.yaml (static Pod) to track
individual member health instead of tracking the whole
etcd cluster health.
Given kubeadm 1.25 only supports kubelet 1.25 and 1.24,
1.23 related logic around dockershim can be removed.
- Don't clean the directories
/var/lib/dockershim, /var/runkubernetes, /var/lib/cni
- Pass the CRISocket directly to the kubelet
--container-runtime-endpoint flag without extra handling
of dockershim
- No longer apply the --container-runtime=remote flag
as that is the only possible value in 1.24 and 1.25
- Update unit tests
Note: we are still passing --pod-infra-container-image
to avoid the pause image to be GCed by the kubelet.
During upgrade when a CP node is missing the old / legacy "master"
taint, assume the user has manually removed it to allow
workloads to schedule.
In such cases do not re-taint the node with the new "control-plane"
taint.
* Introduce networking/v1alpha1 api, ClusterCIDRConfig type
Introduce networking/v1alpha1 api group.
Add `ClusterCIDRConfig` type to networking/v1alpha1 api group, this type
will enable the NodeIPAM controller to support multiple ClusterCIDRs.
* Change ClusterCIDRConfig.NodeSelector type in api
* Fix review comments for API
* Update ClusterCIDRConfig API Spec
Introduce PerNodeHostBits field, remove PerNodeMaskSize
Over time the size of our junit xml has exploded to the point where
test-grid fails to process them. We still have the original/full
*.stdout files from where the junit xml files are generated from so the
junit xml files need NOT have the fill/exact output for
processing/display. So let us prune the large messages with an
indicator that we have "[... clipped...]" some of the content so folks
can see that they have to consult the full *.stdout files.
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
InitLogs overrides the klog default and turns contextual logging off. This
ensures that it is only enabled in Kubernetes commands that explicitly enable
it via a feature gate. A feature gate for it gets defined in
k8s.io/component-base/logs and is then used by Options.ValidateAndApply.
The effect of disabling contextual logging is very limited according to
benchmarks with kube-scheduler. The feature gets added anyway to satisfy the
PRR recommendation that features should be controllable.
The following commands have support for contextual logging:
- kube-apiserver
- kube-controller-manager
- kubelet
- kube-scheduler
- component-base/logs example
Supporting a feature gate check in ValidateAndApply and not in InitLogs is a
simplification: changing InitLogs to accept a FeatureGate would have implied
changing also component-base/cli.Run. This didn't seem worthwhile because
ValidateAndApply already covers the relevant commands.
Include the flag "--experimental-initial-corrupt-check"
in etcd static pod manifests to ensure
etcd member data consistency.
The etcd feature is planned for graduation in 3.6,
at which point we should switch to using the flag
without the "experimental" prefix.
This commit adds the framework for the new local detection
modes BridgeInterface and InterfaceNamePrefix to work.
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Some of these changes are cosmetic (repeatedly calling klog.V instead of
reusing the result), others address real issues:
- Logging a message only above a certain verbosity threshold without
recording that verbosity level (if klog.V().Enabled() { klog.Info... }):
this matters when using a logging backend which records the verbosity
level.
- Passing a format string with parameters to a logging function that
doesn't do string formatting.
All of these locations where found by the enhanced logcheck tool from
https://github.com/kubernetes/klog/pull/297.
In some cases it reports false positives, but those can be suppressed with
source code comments.
We now re-use the crictl tool path within the `ContainerRuntime` when
exec'ing into it. This allows introducing a convenience function to
create the crictl command and re-use it where necessary.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
This change adds 2 options for windows:
--forward-healthcheck-vip: If true forward service VIP for health check
port
--root-hnsendpoint-name: The name of the hns endpoint name for root
namespace attached to l2bridge, default is cbr0
When --forward-healthcheck-vip is set as true and winkernel is used,
kube-proxy will add an hns load balancer to forward health check request
that was sent to lb_vip:healthcheck_port to the node_ip:healthcheck_port.
Without this forwarding, the health check from google load balancer will
fail, and it will stop forwarding traffic to the windows node.
This change fixes the following 2 cases for service:
- `externalTrafficPolicy: Cluster` (default option): healthcheck_port is
10256 for all services. Without this fix, all traffic won't be directly
forwarded to windows node. It will always go through a linux node and
get forwarded to windows from there.
- `externalTrafficPolicy: Local`: different healthcheck_port for each
service that is configured as local. Without this fix, this feature
won't work on windows node at all. This feature preserves client ip
that tries to connect to their application running in windows pod.
Change-Id: If4513e72900101ef70d86b91155e56a1f8c79719
* kube-proxy cluder-cidr arg accepts comma-separated list
It is possible in dual-stack clusters to provide kube-proxy with
a comma-separated list with an IPv4 and IPv6 CIDR for pods.
update: signoff
update2: update email profile
Signed-off-by: Tyler Lloyd <Tyler.Lloyd@microsoft.com>
Signed-off-by: Tyler Lloyd <tylerlloyd928@gmail.com>
* Updating cluster-cidr comment description
Signed-off-by: Tyler Lloyd <tyler.lloyd@microsoft.com>
For the YAML examples, make the indentation consistent
by starting with a space and following with a TAB.
Also adjust the indentation of some fields to place them under
the right YAML field parent - e.g. ignorePreflightErrors
is under nodeRegistration.
All the controllers should use context for signalling termination of communication with API server. Once kcm cancels context all the cert controllers which are started via kcm should cancel the APIServer request in flight instead of hanging around.
This commit includes all the changes needed for APIServer. Instead of modifying the existing signatures for the methods which either generate or return stopChannel, we generate a context from the channel and use the generated context to be passed to the controllers which are started in APIServer. This ensures we don't have to touch APIServer dependencies.
- Modify VerifyUnmarshalStrict to use serializer/json instead
of sigs.k8s.io/yaml. In strict mode, the serializers
in serializer/json use the new sigs.k8s.io/json library
that also catches case sensitive errors for field names -
e.g. foo vs Foo. Include test case for that in strict/testdata.
- Move the hardcoded schemes to check to the side of the
caller - i.e. accept a slice of runtime.Scheme.
- Move the klog warnings outside of VerifyUnmarshalStrict
and make them the responsibility of the caller.
- Call VerifyUnmarshalStrict when downloading the configuration
from kubeadm-config or the kube-proxy or kubelet-config CMs.
This validation is useful if the user has manually patched the CMs.
The apiserver owns and manages the kubernetes.default service.
It has 3 different options to reconcile the endpoints that belong to
that service:
- None: endpoints are handled by an external party.
- MasterCount: legacy, it reconciles based on the endpoints generated
and a flag specifying the number of master on the cluster.
- Lease: default since 1.11, each apiserver writes a lease in etcd
and renews periodically, the endpoints are generated based on the
existing leases.
It seems that when the default was set for the lease reconciler, the
controlplane code wasn't updated and kept using the master count
reconciler.
This also starts the deprecation of the master count reconciler in
favor of the lease reconciler.
The legacy naming "kubelet-config-x.yy" is no longer the
default behavior. Rename instances in documentation and comments
of "kubelet-config-x.yy" to "kubelet-config".
- Graduate the feature gate to Beta and enable it by default.
- Pre-set the default value for UnversionedKubeletConfigMap
to "true" in test/e2e_kubeadm.
- Fix a couple of typos in "tolerate" introduced in the PR that
added the FG in 1.23.
Compare with two pointers will always show that they are different value,
so it will always print the warning message.
Signed-off-by: Dave Chen <dave.chen@arm.com>
- During "upgrade apply" call a new function AddNewControlPlaneTaint()
that finds all nodes with the new "control-plane" node-role label
and adds the new "control-plane" taint to them.
- The function is called in "apply" and is separate from
the step to remove the old "master" label for better debugging
if errors occur.
- Apply "control-plane" taint during init/join by adding the
taint in SetNodeRegistrationDynamicDefaults(). The old
taint "master" is still applied.
- Clarify API docs (v1beta2 and v1beta3) for nodeRegistration.Taint
to not mention "master" taint and be more generic. Remove
example for taints that includes the word "master".
- Update unit tests.
- Update the markcontrolplane phase used by init and join to
only label the nodes with the new control plane label.
- Cleanup TODOs about the old label.
- Remove outdated comment about selfhosting in staticpod/utils.go.
Selfhosting has not been supported in kubeadm for a while
and the comment also mentions the "master" label.
- Update unit tests.
- Rename the function in postupgrade.go to better reflect
what is being done.
- During "upgrade apply" find all nodes with the old label
and remove it by calling PatchNode.
- Update health check for CP nodes to not track "master"
labeled nodes. At this point all CP nodes should have
"control-plane" and we can use that selector only.
- Throw an error if there is more than one known socket on the host.
- Remove the special handling for docker+containerd.
- Remove the local instances of constants for endpoints for
Windows / Unix and use the defaultKnownCRISockets variable
which is populated from OS specific constants.
- Update error message in detectCRISocketImpl to have more
details.
- Make detectCRISocketImpl accept a list of "known" sockets
- Update unit tests for detectCRISocketImpl and make them
use generic paths such as "unix:///foo/bar.sock".
Change the default container runtime CRI socket endpoint to the
one of containerd. Previously it was the one for Docker
- Rename constants.DefaultDockerCRISocket to DefaultCRISocket
- Make the constants files include the endpoints for all supported
container runtimes for Unix/Windows.
- Update unit tests related to docker runtime testing.
- In kubelet/flags.go hardcode the legacy docker socket as a check
to allow kubeadm 1.24 to run against kubelet 1.23 if the user
explicitly sets the criSocket field to "npipe:////./pipe/dockershim"
on Windows or "unix:///var/run/dockershim.sock" on Linux.
In the following code pattern, the log message will get logged with v=0 in JSON
output although conceptually it has a higher verbosity:
if klog.V(5).Enabled() {
klog.Info("hello world")
}
Having the actual verbosity in the JSON output is relevant, for example for
filtering out only the important info messages. The solution is to use
klog.V(5).Info or something similar.
Whether the outer if is necessary at all depends on how complex the parameters
are. The return value of klog.V can be captured in a variable and be used
multiple times to avoid the overhead for that function call and to avoid
repeating the verbosity level.
The API was deprecated in 1.23 when output/v1alpha2 was
added. v1alpha1 is problematic since it embeds kubeadm/v1beta2
BootstrapToken related types directly. v1alpha2 imports
a new group dedicated to bootstrap tokens apis/bootstraptoken.
cli.Run was an attempt to elliminate error handling in Kubernetes
commands. However, it had to rely on heuristics that are not necessarily right
for all commands.
kubectl is one example which has its own error printing code that should be
used in all cases after a command failure. It now gets used also for
`--warnings-as-errors`. Previously, that caused the following message to be
logged at the end:
E0110 16:56:01.987555 202060 run.go:120] "command failed" err="1 warning received"
Now it ends with:
error: 1 warning received
crictl already works with the current state of dockershim.
Using the docker CLI is not required and the DockerRuntime
can be removed from kubeadm. This means that crictl
can connect at the dockershim (or cri-dockerd) socket and
be used to list containers, pull images, remove containers, and
all actions that the kubelet can otherwise perform with the socket.
Ensure that crictl is now required for all supported container runtimes
in checks.go. In the help text in waitcontrolplane.go show only
the crictl example.
Remove the check for the docker service from checks.go.
Remove the DockerValidor check from checks.go.
These two checks were special casing Docker as CR and compensating
for the lack of the same checks in dockershim. With the
extraction of dockershim to cri-dockerd, ideally cri-dockerd
should perform the required checks whether it can support
a given Docker config / version running on a host.
During "upgrade node" and "upgrade apply" read the
kubelet env file from /var/lib/kubelet/kubeadm-flags.env
patch the --container-runtime-endpoint flag value to
have the appropriate URL scheme prefix (e.g. unix:// on Linux)
and write the file back to disk.
This is a temporary workaround that should be kept only for 1 release
cycle - i.e. remove this in 1.25.