TL;DR: we want to start failing the LB HC if a node is tainted with ToBeDeletedByClusterAutoscaler.
This taint might need refinement, but it is currently deemed our best way of telling whether
a node is about to be deleted. We want to do this only for eTP:Cluster services.
The goal is to allow connection draining of terminating nodes.
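A minimal sketch of the check this implies (the helper name and where it hooks
into kube-proxy's health check server are assumptions; only the taint key comes
from cluster-autoscaler):

    package healthcheck

    import (
        v1 "k8s.io/api/core/v1"
    )

    // ToBeDeletedTaint is the taint key that cluster-autoscaler puts on a
    // node it is about to delete.
    const ToBeDeletedTaint = "ToBeDeletedByClusterAutoscaler"

    // isNodeTerminating (hypothetical helper) reports whether the node carries
    // the autoscaler deletion taint, i.e. whether the LB health check should
    // start failing for eTP:Cluster services served from this node.
    func isNodeTerminating(node *v1.Node) bool {
        if node == nil {
            return false
        }
        for _, taint := range node.Spec.Taints {
            if taint.Key == ToBeDeletedTaint {
                return true
            }
        }
        return false
    }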
getLocalDetector() used to pass a utiliptables.Interface to
NewDetectLocalByCIDR() so that NewDetectLocalByCIDR() could verify
that the passed-in CIDR was of the same family as the iptables
interface. It would make more sense for getLocalDetector() to verify
this itself and just *not call NewDetectLocalByCIDR* if the families
don't match, and that's what the code does now. So there's no longer
any need to pass the utiliptables.Interface to the local detector.
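A rough sketch of the new shape (the detector types and names below are
simplified stand-ins, not the real API in pkg/proxy):

    package app

    import (
        "fmt"
        "net"

        v1 "k8s.io/api/core/v1"
    )

    // localDetector stands in for the real LocalTrafficDetector interface.
    type localDetector interface{}

    type noOpLocalDetector struct{}              // treats nothing as local
    type detectLocalByCIDR struct{ cidr string } // matches traffic from the cluster CIDR

    // getLocalDetector checks the CIDR's family itself and simply does not
    // construct the CIDR-based detector when the family doesn't match, so the
    // detector no longer needs a utiliptables.Interface for that check.
    func getLocalDetector(ipFamily v1.IPFamily, clusterCIDR string) (localDetector, error) {
        if clusterCIDR == "" {
            // No cluster CIDR configured for this family: nothing to detect.
            return noOpLocalDetector{}, nil
        }
        ip, _, err := net.ParseCIDR(clusterCIDR)
        if err != nil {
            return nil, fmt.Errorf("invalid cluster CIDR %q: %v", clusterCIDR, err)
        }
        wantIPv6 := ipFamily == v1.IPv6Protocol
        if (ip.To4() == nil) != wantIPv6 {
            // Wrong family for this proxier: skip the CIDR-based detector entirely.
            return noOpLocalDetector{}, nil
        }
        return detectLocalByCIDR{cidr: clusterCIDR}, nil
    }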
Since the single-stack and dual-stack local-detector-getters now have
the same behavior in terms of error-checking and dual-stack config, we
can just replace the contents of getDualStackLocalDetectorTuple() with
a pair of calls to getLocalDetector().
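Continuing the stand-ins from the sketch above, the dual-stack getter then
collapses to something like this (signature simplified; the real code reads
the CIDRs from the proxy configuration):

    // getDualStackLocalDetectorTuple just calls getLocalDetector once per family.
    func getDualStackLocalDetectorTuple(cidrs map[v1.IPFamily]string) ([2]localDetector, error) {
        var detectors [2]localDetector
        var err error
        if detectors[0], err = getLocalDetector(v1.IPv4Protocol, cidrs[v1.IPv4Protocol]); err != nil {
            return detectors, err
        }
        detectors[1], err = getLocalDetector(v1.IPv6Protocol, cidrs[v1.IPv6Protocol])
        return detectors, err
    }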
1. When bringing up a single-stack kube-proxy in a dual-stack cluster,
allow using either the primary or secondary IP family.
2. Since the earlier config-checking code will already have bailed out
if the single-stack configuration is unusably broken, we don't need to
do that here. Instead, just return a no-op local detector if there are
no usable CIDRs of the expected IP family.
Rather than having this as part of createProxier(), explicitly figure
out what IP families the proxier can support beforehand, and bail out
if this conflicts with the detected IP family.
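A sketch of the intended flow (the presence map and the helper name are
illustrative, not the actual code):

    package app

    import (
        "fmt"

        v1 "k8s.io/api/core/v1"
    )

    // checkIPFamilySupport decides, before createProxier() runs, whether the
    // proxier can run dual-stack, and fails hard if the detected primary
    // family is not usable at all.
    func checkIPFamilySupport(iptablesPresent map[v1.IPFamily]bool, primary v1.IPFamily) (dualStack bool, err error) {
        if !iptablesPresent[primary] {
            return false, fmt.Errorf("iptables is not usable for the primary IP family %q", primary)
        }
        other := v1.IPv6Protocol
        if primary == v1.IPv6Protocol {
            other = v1.IPv4Protocol
        }
        // Run dual-stack only if the secondary family works too; otherwise
        // fall back to single-stack on the primary family.
        return iptablesPresent[other], nil
    }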
Since kube-proxy in LocalModeNodeCIDR needs to obtain the PodCIDR assigned
to its node, it watches for the Node object.
However, the kube-proxy startup process requires these watches to live in
different places, which opens up the possibility of a race condition if the
same node is recreated and a different PodCIDR is assigned.
Initializing the second watch with the value obtained by the first one
allows us to detect this situation.
Change-Id: I6adeedb6914ad2afd3e0694dcab619c2a66135f8
Signed-off-by: Antonio Ojea <aojea@google.com>
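A minimal sketch of the detection (type and method names are stand-ins for
the real node config handler):

    package app

    import (
        "fmt"
        "reflect"

        v1 "k8s.io/api/core/v1"
    )

    // nodePodCIDRHandler is seeded with the PodCIDRs obtained by the first
    // watch, so the second watch can notice a recreated node that came back
    // with different PodCIDRs.
    type nodePodCIDRHandler struct {
        podCIDRs []string
    }

    func (h *nodePodCIDRHandler) OnNodeUpdate(node *v1.Node) error {
        if !reflect.DeepEqual(h.podCIDRs, node.Spec.PodCIDRs) {
            // kube-proxy itself exits at this point so it restarts with the
            // new value; the sketch just reports the mismatch.
            return fmt.Errorf("node %s PodCIDRs changed from %v to %v",
                node.Name, h.podCIDRs, node.Spec.PodCIDRs)
        }
        return nil
    }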
Move the Linux-specific conntrack setup code into a new
"platformSetup" rather than trying to fit it into the generic setup
code.
Also move metrics registration there.
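Roughly, the resulting split looks like this (all names below are stand-ins;
only the idea of a Linux-only platformSetup hook comes from the change):

    //go:build linux

    package app

    import "fmt"

    type ProxyServer struct{}

    func (s *ProxyServer) setupConntrack() error {
        // would apply the conntrack sysctls (nf_conntrack_max, TCP timeouts, ...)
        fmt.Println("tuning conntrack")
        return nil
    }

    func registerMetrics() {
        // would register the Linux-specific proxier metrics
        fmt.Println("registering metrics")
    }

    // platformSetup is called from the generic ProxyServer setup path; the
    // Windows build provides its own implementation.
    func (s *ProxyServer) platformSetup() error {
        if err := s.setupConntrack(); err != nil {
            return err
        }
        registerMetrics()
        return nil
    }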
Rather than duplicating some of the KubeProxyConfiguration into
ProxyServer, just store the KubeProxyConfiguration itself so later
code can reference it directly.
For the fields that get platform-specific defaults (Mode,
DetectLocalMode), fill the defaults directly into the
KubeProxyConfiguration rather than keeping the original there and the
defaulted version in the ProxyServer.
Validate the --detect-local-mode value in the API object validation
rather than doing it separately later. Also, remove runtime checks and
unit tests for cases that would be blocked by validation.
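A sketch of the resulting shape (the constructor and default helpers are
illustrative; KubeProxyConfiguration, Mode and DetectLocalMode are the real
config names):

    package app

    import (
        kubeproxyconfig "k8s.io/kubernetes/pkg/proxy/apis/config"
    )

    // ProxyServer carries the defaulted KubeProxyConfiguration itself instead
    // of copies of individual fields (other runtime state omitted).
    type ProxyServer struct {
        Config *kubeproxyconfig.KubeProxyConfiguration
    }

    func newProxyServer(config *kubeproxyconfig.KubeProxyConfiguration) *ProxyServer {
        // Fill the platform-specific defaults directly into the config so
        // that later code has a single source of truth to read from.
        if config.Mode == "" {
            config.Mode = kubeproxyconfig.ProxyModeIPTables // Linux default, for illustration
        }
        if config.DetectLocalMode == "" {
            config.DetectLocalMode = kubeproxyconfig.LocalModeClusterCIDR
        }
        return &ProxyServer{Config: config}
    }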
The winkernel proxy was originally created by copying+pasting from the
iptables code, but some iptables-specific things were never removed
(and one function got left behind after its functionality was moved
into the shared proxy code).
In fact, this actually uses pkg/util/node's GetHostname() but takes
the unit tests from cmd/kubeadm/app/util's private fork of that
function since they were more extensive. (Of course the fact that
kubeadm had a private fork of this function is a strong argument for
moving it to component-helpers.)
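For reference, the shared helper behaves roughly like this (a sketch of its
behavior, not a copy of the code):

    package nodeutil

    import (
        "fmt"
        "os"
        "strings"
    )

    // getHostname returns the override if one was given, otherwise the OS
    // hostname, canonicalized to a trimmed lower-case name; an empty result
    // is an error.
    func getHostname(hostnameOverride string) (string, error) {
        hostName := hostnameOverride
        if len(hostName) == 0 {
            nodeName, err := os.Hostname()
            if err != nil {
                return "", fmt.Errorf("couldn't determine hostname: %v", err)
            }
            hostName = nodeName
        }
        hostName = strings.ToLower(strings.TrimSpace(hostName))
        if len(hostName) == 0 {
            return "", fmt.Errorf("empty hostname is invalid")
        }
        return hostName, nil
    }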
When trying to bring up a cluster via kubeadm with these feature gates enabled,
kube-proxy fails because it doesn't know about them:
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
featureGates: {"DynamicResourceAllocation":true,"ContextualLogging":true}
runtimeConfig: {"resource.k8s.io/v1alpha1":"true"}
=>
2023-01-20T07:07:54.474966617Z stderr F E0120 07:07:54.474846 1 run.go:74] "command failed" err="failed complete: unrecognized feature gate: ContextualLogging"
The effect of the logging feature gates is minor for kube-proxy; supporting
them is mostly useful for the sake of consistency and to support kubeadm.
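The fix amounts to registering the logging feature gates with kube-proxy's
feature gate instance, roughly like this (where exactly the call lives in
kube-proxy is an assumption; AddFeatureGates is the component-base helper):

    package app

    import (
        utilruntime "k8s.io/apimachinery/pkg/util/runtime"
        utilfeature "k8s.io/apiserver/pkg/util/feature"
        logsapi "k8s.io/component-base/logs/api/v1"
    )

    func init() {
        // Make kube-proxy recognize ContextualLogging, LoggingAlphaOptions, etc.,
        // so a kubeadm-generated config that enables them no longer fails with
        // "unrecognized feature gate".
        utilruntime.Must(logsapi.AddFeatureGates(utilfeature.DefaultMutableFeatureGate))
    }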
In the dual-stack case, iptables.NewDualStackProxier and
ipvs.NewDualStackProxier filtered the nodeport addresses values by IP
family before creating the single-stack proxiers. But in the
single-stack case, the kube-proxy startup code just passed the value
to the single-stack proxiers without validation, so they had to
re-check it themselves. Fix that.
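The single-stack path can pre-filter the value the same way, along these lines
(the helper is hypothetical; only NewDualStackProxier and --nodeport-addresses
are real names):

    package app

    import (
        "net"

        v1 "k8s.io/api/core/v1"
    )

    // filterNodePortAddresses keeps only the --nodeport-addresses CIDRs of the
    // proxier's IP family, so a single-stack proxier receives pre-filtered
    // values just like the per-family proxiers created by NewDualStackProxier.
    func filterNodePortAddresses(ipFamily v1.IPFamily, cidrs []string) []string {
        wantIPv6 := ipFamily == v1.IPv6Protocol
        var filtered []string
        for _, cidr := range cidrs {
            ip, _, err := net.ParseCIDR(cidr)
            if err != nil {
                continue
            }
            if (ip.To4() == nil) == wantIPv6 {
                filtered = append(filtered, cidr)
            }
        }
        return filtered
    }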
Kube-proxy was checking that iptables supports both IPv4 and IPv6 and
falling back to single-stack if not. But it always fell back to the
primary IP family, regardless of which family iptables supported...
Fix it so that if the primary IP family isn't supported then it bails
out entirely.