getLocalDetector() used to pass a utiliptables.Interface to
NewDetectLocalByCIDR() so that NewDetectLocalByCIDR() could verify
that the passed-in CIDR was of the same family as the iptables
interface. It would make more sense for getLocalDetector() to verify
this itself and just *not call NewDetectLocalByCIDR* if the families
don't match, and that's what the code does now. So there's no longer
any need to pass the utiliptables.Interface to the local detector.
Since the single-stack and dual-stack local-detector-getters now have
the same behavior in terms of error-checking and dual-stack config, we
can just replace the contents of getDualStackLocalDetectorTuple() with
a pair of calls to getLocalDetector().
1. When bringing up a single-stack kube-proxy in a dual-stack cluster,
allow using either the primary or secondary IP family.
2. Since the earlier config-checking code will already have bailed out
if the single-stack configuration is unusably broken, we don't need to
do that here. Instead, just return a no-op local detector if there are
no usable CIDRs of the expected IP family.
Rather than having this as part of createProxier(), explicitly figure
out what IP families the proxier can support beforehand, and bail out
if this conflicts with the detected IP family.
Since kube-proxy in LocalModeNodeCIDR needs to obtain the PodCIDR
assigned to the node it watches for the Node object.
However, kube-proxy startup process requires to have these watches in
different places, that opens the possibility of having a race condition
if the same node is recreated and a different PodCIDR is assigned.
Initializing the second watch with the value obtained in the first one
allows us to detect this situation.
Change-Id: I6adeedb6914ad2afd3e0694dcab619c2a66135f8
Signed-off-by: Antonio Ojea <aojea@google.com>
Move the Linux-specific conntrack setup code into a new
"platformSetup" rather than trying to fit it into the generic setup
code.
Also move metrics registration there.
Rather than duplicating some of the KubeProxyConfiguration into
ProxyServer, just store the KubeProxyConfiguration itself so later
code can reference it directly.
For the fields that get platform-specific defaults (Mode,
DetectLocalMode), fill the defaults directly into the
KubeProxyConfiguration rather than keeping the original there and the
defaulted version in the ProxyServer.
Validate the --detect-local-mode value in the API object validation
rather than doing it separately later. Also, remove runtime checks and
unit tests for cases that would be blocked by validation
Rather than duplicating some of the KubeProxyConfiguration into
ProxyServer, just store the KubeProxyConfiguration itself so later
code can reference it directly.
For the fields that get platform-specific defaults (Mode,
DetectLocalMode), fill the defaults directly into the
KubeProxyConfiguration rather than keeping the original there and the
defaulted version in the ProxyServer.
Validate the --detect-local-mode value in the API object validation
rather than doing it separately later. Also, remove runtime checks and
unit tests for cases that would be blocked by validation
In fact, this actually uses pkg/util/node's GetHostname() but takes
the unit tests from cmd/kubeadm/app/util's private fork of that
function since they were more extensive. (Of course the fact that
kubeadm had a private fork of this function is a strong argument for
moving it to component-helpers.)
In the dual-stack case, iptables.NewDualStackProxier and
ipvs.NewDualStackProxier filtered the nodeport addresses values by IP
family before creating the single-stack proxiers. But in the
single-stack case, the kube-proxy startup code just passed the value
to the single-stack proxiers without validation, so they had to
re-check it themselves. Fix that.
Kube-proxy was checking that iptables supports both IPv4 and IPv6 and
falling back to single-stack if not. But it always fell back to the
primary IP family, regardless of which family iptables supported...
Fix it so that if the primary IP family isn't supported then it bails
out entirely.
Kube/proxy, in NodeCIDR local detector mode, uses the node.Spec.PodCIDRs
field to build the Services iptables rules.
The Node object depends on the kubelet, but if kube-proxy runs as a
static pods or as a standalone binary, it is not possible to guarantee
that the values obtained at bootsrap are valid, causing traffic outages.
Kube-proxy has to react on node changes to avoid this problems, it
simply restarts if detect that the node PodCIDRs have changed.
In case that the Node has been deleted, kube-proxy will only log an
error and keep working, since it may break graceful shutdowns of the
node.
If the user passes "--proxy-mode ipvs", and it is not possible to use
IPVS, then error out rather than falling back to iptables.
There was never any good reason to be doing fallback; this was
presumably erroneously added to parallel the iptables-to-userspace
fallback (which only existed because we had wanted iptables to be the
default but not all systems could support it).
In particular, if the user passed configuration options for ipvs, then
they presumably *didn't* pass configuration options for iptables, and
so even if the iptables proxy is able to run, it is likely to be
misconfigured.
Back when iptables was first made the default, there were
theoretically some users who wouldn't have been able to support it due
to having an old /sbin/iptables. But kube-proxy no longer does the
things that didn't work with old iptables, and we removed that check a
long time ago. There is also a check for a new-enough kernel version,
but it's checking for a feature which was added in kernel 3.6, and no
one could possibly be running Kubernetes with a kernel that old. So
the fallback code now never actually falls back, so it should just be
removed.
This was implemented partly in server.go and partly in
server_others.go even though even the parts in server.go were totally
linux-specific. Simplify things by putting it all in server_others.go
and get rid of some unnecessary abstraction.
This commit adds the framework for the new local detection
modes BridgeInterface and InterfaceNamePrefix to work.
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>