TL;DR: we want to start failing the LB HC if a node is tainted with
ToBeDeletedByClusterAutoscaler. This signal might need refinement, but it is
currently deemed our best way of knowing that a node is about to be deleted.
We want to do this only for eTP:Cluster services. The goal is to enable
connection draining for terminating nodes.
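As a rough sketch of the idea (the package and function names below are invented for illustration, not the actual kube-proxy code), the proxy only needs to look for the autoscaler's taint on its own Node object:

    package example

    import (
        v1 "k8s.io/api/core/v1"
    )

    // ToBeDeletedTaint is the taint key the cluster autoscaler places on
    // nodes it is about to remove.
    const ToBeDeletedTaint = "ToBeDeletedByClusterAutoscaler"

    // nodeIsBeingDeleted reports whether the node carries the autoscaler's
    // deletion taint. If it does, the LB HC for eTP:Cluster services can be
    // failed so the load balancer drains connections before the node goes away.
    func nodeIsBeingDeleted(node *v1.Node) bool {
        for _, taint := range node.Spec.Taints {
            if taint.Key == ToBeDeletedTaint {
                return true
            }
        }
        return false
    }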
Rather than having GetNodeAddresses() return a special magic value
indicating that it matches all IPs, add a separate method to check
that. (And have GetNodeAddresses() just return the IPs as expected
instead.)
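A minimal sketch of that shape (the type and method names here are illustrative approximations, not necessarily the exact merged API): the caller asks "does this match everything?" with one method and gets plain IPs from the other, with no magic sentinel value in the returned list.

    package example

    import "net"

    type nodePortAddresses struct {
        cidrs []*net.IPNet
    }

    // matchAll reports whether the configured CIDRs cover every address,
    // e.g. because a zero-length prefix like 0.0.0.0/0 was given.
    func (a *nodePortAddresses) matchAll() bool {
        for _, c := range a.cidrs {
            if ones, _ := c.Mask.Size(); ones == 0 {
                return true
            }
        }
        return false
    }

    // nodeIPs returns only the concrete node IPs that fall inside the
    // configured CIDRs; it never returns a special "match everything" value.
    func (a *nodePortAddresses) nodeIPs(all []net.IP) []net.IP {
        var out []net.IP
        for _, ip := range all {
            for _, c := range a.cidrs {
                if c.Contains(ip) {
                    out = append(out, ip)
                    break
                }
            }
        }
        return out
    }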
Both proxies handle IPv4 and IPv6 nodeport addresses separately, but
GetNodeAddresses went out of its way to make that difficult. Fix that.
This commit does not change any externally-visible semantics, but it
makes the existing weird semantics more obvious. Specifically, if you
said "--nodeport-addresses 10.0.0.0/8,192.168.0.0/16", the dual-stack
proxy code would split that into a list of IPv4 CIDRs
(["10.0.0.0/8", "192.168.0.0/16"]) to pass to the IPv4 proxier and a
list of IPv6 CIDRs ([]) to pass to the IPv6 proxier. The IPv6 proxier
would then say "well, since the list of nodeport addresses is empty,
I'll listen on all IPv6 addresses", which probably isn't what you
meant, but that's what it did.
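The split itself is simple; a minimal sketch (the helper name is assumed for illustration) of how a mixed CIDR list ends up as an IPv4 list plus an empty IPv6 list:

    package example

    import "net"

    // splitCIDRsByFamily divides nodeport-address CIDRs into IPv4 and IPv6
    // lists. For "10.0.0.0/8,192.168.0.0/16" the v6 result is empty, and the
    // old "empty means listen on everything" rule then kicked in for the
    // IPv6 proxier.
    func splitCIDRsByFamily(cidrs []string) (v4, v6 []string) {
        for _, s := range cidrs {
            _, ipnet, err := net.ParseCIDR(s)
            if err != nil {
                continue // ignore malformed entries in this sketch
            }
            if ipnet.IP.To4() != nil {
                v4 = append(v4, s)
            } else {
                v6 = append(v6, s)
            }
        }
        return v4, v6
    }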
Now that the endpoint update fields have names that make it clear that
they only contain UDP objects, it's obvious that the "protocol == UDP"
checks in the iptables and ipvs proxiers were no-ops, so remove them.
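For illustration, the pattern being deleted looked roughly like this (names are invented; the point is only that the inner check can no longer be false):

    package example

    import v1 "k8s.io/api/core/v1"

    type deletedEndpoint struct {
        IP       string
        Protocol v1.Protocol
    }

    // cleanupConntrack sketches the no-op: the list is UDP-only by
    // construction, so the protocol check always passes and can be removed.
    func cleanupConntrack(deletedUDPEndpoints []deletedEndpoint, flush func(ip string)) {
        for _, ep := range deletedUDPEndpoints {
            if ep.Protocol == v1.ProtocolUDP { // always true; this is the check being removed
                flush(ep.IP)
            }
        }
    }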
The APIs talked about "stale services" and "stale endpoints", but the
thing that is actually "stale" is the conntrack entries, not the
services/endpoints. Fix the names to indicate what they actually keep
track of.
Also, all three fields (2 in the endpoints update object and 1 in the
service update object) are currently UDP-specific, but only the
service one made that clear. Fix that too.
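To make the renaming concrete, here is an approximate before/after (the exact field names in the merged code may differ; these are meant only to show the direction of the change):

    package example

    import "k8s.io/apimachinery/pkg/util/sets"

    // Before: the names suggest the endpoints/services themselves are stale.
    type endpointsUpdateResultOld struct {
        StaleEndpoints    []string // endpoint IP:ports whose conntrack entries need flushing
        StaleServiceNames []string // service ports whose conntrack entries need flushing
    }
    type serviceUpdateResultOld struct {
        UDPStaleClusterIP sets.String // only this one said "UDP"
    }

    // After: the names say what is really tracked -- UDP conntrack cleanup
    // work caused by deleted endpoints or newly active services.
    type endpointsUpdateResultNew struct {
        DeletedUDPEndpoints    []string
        NewlyActiveUDPServices []string
    }
    type serviceUpdateResultNew struct {
        DeletedUDPClusterIPs sets.String
    }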
This commit did not actually work: between when it was first written
and tested and when it merged, the code in pkg/proxy/endpoints.go was
changed to add only UDP endpoints to the "stale endpoints"/"stale
services" lists, so checking for "either UDP or SCTP" rather than just
UDP when processing those lists had no effect.
This reverts most of commit aa8521df66
(but leaves the changes related to
ipvs.IsRsGracefulTerminationNeeded() since that actually did have the
effect it meant to have).
Today, the health check response kube-proxy gives to load balancers asking
for the status of eTP:Local services does not include kube-proxy's own
healthz state. This means that kube-proxy might tell load balancers to
forward traffic to the node in question simply because the endpoint is
running on that node - overlooking the fact that kube-proxy itself might
not be healthy and might not have successfully written the rules that
allow traffic to reach the endpoint.
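A hypothetical sketch of the desired behavior (the handler and its fields are invented for illustration): the node-level health check returns 200 only when there are local endpoints and kube-proxy has recently synced its rules.

    package example

    import (
        "fmt"
        "net/http"
        "time"
    )

    type proxyHealthServer struct {
        localEndpoints int           // ready local endpoints for the eTP:Local service
        lastSync       time.Time     // last successful rules sync
        maxSyncAge     time.Duration // how stale a sync may be before we report unhealthy
    }

    func (s *proxyHealthServer) ServeHTTP(w http.ResponseWriter, r *http.Request) {
        proxyHealthy := time.Since(s.lastSync) < s.maxSyncAge
        if s.localEndpoints > 0 && proxyHealthy {
            w.WriteHeader(http.StatusOK)
            fmt.Fprintf(w, "%d local endpoints, proxy healthy\n", s.localEndpoints)
            return
        }
        // Either there are no local endpoints, or kube-proxy has not managed
        // to program rules recently; tell the load balancer to back off.
        w.WriteHeader(http.StatusServiceUnavailable)
    }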
With the flag, ipvs uses both the source IP and the source port (instead of
only the source IP) to distribute new connections evenly across endpoints.
This avoids sending all connections from the same client (i.e. the same
source IP) to a single endpoint. Users can still explicitly set
sessionAffinity in the service spec if all connections from a source IP
should end up on the same endpoint.
Change-Id: I42f950c0840ac06a4ee68a7bbdeab0fc5505c71f
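A toy illustration of the difference (this is not the IPVS kernel scheduler, just a demonstration of why including the source port spreads a single client's connections):

    package example

    import "hash/fnv"

    // pickEndpoint hashes the connection's source to choose an endpoint.
    // Hashing only srcIP pins every connection from one client to a single
    // endpoint; also hashing srcPort spreads that client's connections
    // across endpoints.
    func pickEndpoint(srcIP, srcPort string, numEndpoints int, usePort bool) int {
        h := fnv.New32a()
        h.Write([]byte(srcIP))
        if usePort {
            h.Write([]byte(srcPort))
        }
        return int(h.Sum32()) % numEndpoints
    }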
In addition to actually updating their data from the provided list of
changes, EndpointsMap.Update() and ServicePortMap.Update() return a
struct with some information about things that changed because of that
update (e.g. services with stale conntrack entries).
For some reason, they were also returning information about
HealthCheckNodePorts, but they were returning *static* information
based on the current (post-Update) state of the map, not information
about what had *changed* in the update. Since this doesn't match how
the other data in the struct is used (and since there's no reason to
have the data only be returned when you call Update() anyway), split
it out.
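Roughly, the split means the map grows its own accessor (names here are illustrative): callers can ask for the health-check node ports whenever they need them, independent of Update().

    package example

    type servicePort struct {
        healthCheckNodePort int // 0 when the service has no health check port
    }

    type servicePortMap map[string]servicePort

    // HealthCheckNodePorts reflects the current state of the map rather than
    // what changed in the last Update(), so it no longer belongs in the
    // update-result struct.
    func (m servicePortMap) HealthCheckNodePorts() map[string]int {
        ports := make(map[string]int)
        for name, svc := range m {
            if svc.healthCheckNodePort != 0 {
                ports[name] = svc.healthCheckNodePort
            }
        }
        return ports
    }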
In the dual-stack case, iptables.NewDualStackProxier and
ipvs.NewDualStackProxier filtered the nodeport addresses values by IP
family before creating the single-stack proxiers. But in the
single-stack case, the kube-proxy startup code just passed the value
to the single-stack proxiers without validation, so they had to
re-check it themselves. Fix that.
Handle https://github.com/moby/ipvs/issues/27
A work-around was already in place, but it would have caused a segv once
the upstream bug was fixed. That will not happen now.