The use of "Endpoint" vs "Endpoints" in these type names is tricky
because it doesn't always make sense to use the same singular/plural
convention as the corresonding service-related types, since often the
service-related type is referring to a single service while the
endpoint-related type is referring to multiple endpoint IPs.
The "endpointsInfo" types in the iptables and winkernel proxiers are
now "endpointInfo" because they describe a single endpoint IP (and
wrap proxy.BaseEndpointInfo).
"UpdateEndpointMapResult" is now "UpdateEndpointsMapResult", because
it is the result of EndpointsMap.Update (and it's clearly correct for
EndpointsMap to have plural "Endpoints" because it's a map to an array
of proxy.Endpoint objects.)
"EndpointChangeTracker" is now "EndpointsChangeTracker" because it
tracks changes to the full set of endpoints for a particular service
(and the new name matches the existing "endpointsChange" type and
"Proxier.endpointsChanges" fields.)
TestLoadBalancer and TestHealthCheckNodePort still had iptables rules
checks, but they also have sufficient runPacketFlowTests checks to
cover everything we care about.
(This leaves only TestOverallIPTablesRules and
TestSyncProxyRulesRepeated using assertIPTablesRulesEqual.)
For consistency with TestExternalTrafficPolicyLocal, test all of the
Cluster external traffic policy cases together here (ensuring that
masquerading happens where needed). Drop the assertIPTablesRulesEqual
test in favor of runPacketFlowTests.
Merge TestOnlyLocalExternalIPs, TestOnlyLocalLoadBalancing, and
TestOnlyLocalNodePorts together into TestExternalTrafficPolicyLocal.
Drop the assertIPTablesRulesEqual tests in favor of
runPacketFlowTests.
Remove TestOnlyLocalNodePortsNoClusterCIDR; the relevant bits of the
"no local detector" case are already fully covered by
TestInternalExternalMasquerade.
Previously we had TestNodePort, which tested basic NodePort behavior,
plus Test{Enable,Disable}LocalhostNodePorts{IPv4,IPv6} to test the
behavior of --localhost-nodeports under IPv4 and IPv6, plus
TestDisableLocalhostNodePortsIPv4WithNodeAddress to test
--nodeport-addresses.
Merge all of these together into TestNodePorts, and use
runPacketFlowTests to check the results rather than
assertIPTablesRulesEqual.
The packet tracer is not full-featured enough to be able to check the
"anti martian packet spoofing" rule, so we check the iptables dump for
that manually.
(This also fixes the --localhost-nodeport tests to use the same IP
ranges as most of the other tests now.)
Merge TestClusterIPReject, TestExternalIPsReject, TestNodePortReject,
and TestLoadBalancerReject into a single test.
Also remove the assertIPTablesRulesEqual tests because the packet flow
tests cover all of the details we care about here.
Create some ClusterIP services and use runPacketFlowTests to test
general functionality:
- normal connection
- hairpin connection
- multiple endpoints
- port != targetPort
- multiple protocols on same port
Remove the assertIPTablesRulesEqual test because the packet flow tests
cover all of the details we care about here.
Previously this was used to assert "something changed since the last
sync", but we already have packet flow tests in all of those cases now
to assert that the *specific* something we care about changed.
Rename TestOverallIPTablesRulesWithMultipleServices to just
TestOverallIPTablesRules, and add one rule type we weren't previously
testing (session affinity).
Packets that conntrack considers INVALID can cause unexpected and subtle
bugs on established connections, so by default we install an iptables
rule that drops packets in this conntrack state.
However, there are network scenarios, especially ones involving
multihomed nodes, where legitimate traffic is classified by conntrack as
invalid, and in those cases this iptables rule causes problems by
dropping that traffic.
An alternative way to avoid the spurious problems caused by invalid
conntrack packets is to set the nf_conntrack_tcp_be_liberal sysctl, but
that is a system-wide setting and we don't want kube-proxy to be
opinionated about the node's networking configuration as a whole.
Kube-proxy will now only install the DROP rule for invalid conntrack
states if nf_conntrack_tcp_be_liberal is not set.
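As a rough illustration of the decision (this is not the actual
kube-proxy code; the helper name and the exact rule are illustrative):

    package main

    import (
        "fmt"
        "os"
        "strings"
    )

    // Only install the DROP rule for INVALID conntrack packets when the
    // kernel is not already configured to be liberal about TCP tracking.
    func shouldDropInvalidPackets() bool {
        data, err := os.ReadFile("/proc/sys/net/netfilter/nf_conntrack_tcp_be_liberal")
        if err != nil {
            return true // sysctl unreadable; keep the historical behavior
        }
        return strings.TrimSpace(string(data)) == "0"
    }

    func main() {
        if shouldDropInvalidPackets() {
            // e.g. emit: -A KUBE-FORWARD -m conntrack --ctstate INVALID -j DROP
            fmt.Println("installing DROP rule for INVALID conntrack packets")
        }
    }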
Change-Id: I5eb326931ed915f5ae74d210f0a375842b6a790e
Instead of using two metrics, use just one metric with a label. Since
the label can only take 2 values, 200 or 503, there is no risk of
cardinality explosion, and the result is simpler to represent in graphs.
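For illustration, the rough shape of the change (the metric name here is
hypothetical and this uses client_golang directly rather than the
component-base wrappers):

    package main

    import (
        "strconv"

        "github.com/prometheus/client_golang/prometheus"
    )

    // One counter with a "code" label instead of two separate counters.
    var healthChecksTotal = prometheus.NewCounterVec(
        prometheus.CounterOpts{
            Name: "kubeproxy_proxy_health_checks_total",
            Help: "Cumulative proxy health check responses, by HTTP status code.",
        },
        []string{"code"},
    )

    func main() {
        prometheus.MustRegister(healthChecksTotal)
        healthChecksTotal.WithLabelValues(strconv.Itoa(200)).Inc()
        healthChecksTotal.WithLabelValues(strconv.Itoa(503)).Inc()
    }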
Change-Id: I0e9cbd6ec2051de44d277d673dc20f02b96aa4d1
When I first wrote TestInternalExternalMasquerade, I put "FIXME"
comments in all of the cases that seemed wrong to me, most of which
got removed as we fixed the corner cases. But there were two cases
where we decided that the implemented behavior, though odd, was
correct, and those FIXMEs never got removed.
All the code to deal with enabling/disabling the feature gate is gone,
but some of the tests were still specifying "this test case assumes
PTE is enabled".
Remove "EndpointSlice" from some unit test names, because they don't
need to clarify that they use EndpointSlices now, because all of the
tests use EndpointSlices now.
Likewise, remove TestEndpointSliceE2E entirely; it was originally an
EndpointSlice version of one of the other tests, but the other test
uses EndpointSlices now too.
TL;DR: we want to start failing the LB HC if a node is tainted with
ToBeDeletedByClusterAutoscaler.
This taint might need refinement, but it is currently deemed our best
signal that a node is about to be deleted. We want to do this only for
eTP:Cluster services.
The goal is to allow connection draining of terminating nodes.
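A minimal sketch of the intended check (the helper name is hypothetical;
only the taint key comes from the change described above):

    package main

    import (
        "fmt"

        v1 "k8s.io/api/core/v1"
    )

    const toBeDeletedTaint = "ToBeDeletedByClusterAutoscaler"

    // For eTP:Cluster services, the LB health check would start failing
    // when the node carries the cluster-autoscaler deletion taint.
    func nodeIsBeingDeleted(node *v1.Node) bool {
        for _, taint := range node.Spec.Taints {
            if taint.Key == toBeDeletedTaint {
                return true
            }
        }
        return false
    }

    func main() {
        node := &v1.Node{}
        node.Spec.Taints = []v1.Taint{{Key: toBeDeletedTaint}}
        fmt.Println(nodeIsBeingDeleted(node)) // true -> fail the LB HC
    }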
Historically, IptablesRulesTotal could have been interpreted as either
"the total number of iptables rules kube-proxy is responsible for" or
"the number of iptables rules kube-proxy rewrote on the last sync".
Post-MinimizeIPTablesRestore, these are very different things (and
IptablesRulesTotal unintentionally became the latter).
Fix IptablesRulesTotal (sync_proxy_rules_iptables_total) to be "the
total number of iptables rules kube-proxy is responsible for" and add
IptablesRulesLastSync (sync_proxy_rules_iptables_last) to be "the
number of iptables rules kube-proxy rewrote on the last sync".
This required fixing a small bug in the metric, where it had
previously been counting the "-X" lines that had been passed to
iptables-restore to delete stale chains, rather than only counting the
actual rules.
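Roughly, the fixed accounting only counts "-A" lines in the
iptables-restore input (sketch only, not the real helper):

    package main

    import (
        "fmt"
        "strings"
    )

    // Count only actual rules ("-A ...") in iptables-restore input, skipping
    // chain declarations (":KUBE-...") and the "-X" lines used to delete
    // stale chains, which the old metric mistakenly included.
    func countRules(restoreData string) int {
        count := 0
        for _, line := range strings.Split(restoreData, "\n") {
            if strings.HasPrefix(line, "-A ") {
                count++
            }
        }
        return count
    }

    func main() {
        data := "*filter\n:KUBE-FORWARD - [0:0]\n-A KUBE-FORWARD -j ACCEPT\n-X KUBE-OLD-CHAIN\nCOMMIT\n"
        fmt.Println(countRules(data)) // 1
    }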
getLocalDetector() used to pass a utiliptables.Interface to
NewDetectLocalByCIDR() so that NewDetectLocalByCIDR() could verify
that the passed-in CIDR was of the same family as the iptables
interface. It would make more sense for getLocalDetector() to verify
this itself and just *not call NewDetectLocalByCIDR* if the families
don't match, and that's what the code does now. So there's no longer
any need to pass the utiliptables.Interface to the local detector.
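In very rough terms (a toy sketch with made-up return values, not the
real signatures), the caller-side check now amounts to:

    package main

    import (
        "fmt"

        netutils "k8s.io/utils/net"
    )

    // getLocalDetector checks the family itself and falls back to a no-op
    // detector, so the CIDR-based detector no longer needs a
    // utiliptables.Interface at all.
    func getLocalDetector(clusterCIDR string, wantIPv6 bool) string {
        if netutils.IsIPv6CIDRString(clusterCIDR) != wantIPv6 {
            return "NoOpLocalDetector" // family mismatch: detect-local disabled
        }
        return "DetectLocalByCIDR(" + clusterCIDR + ")"
    }

    func main() {
        fmt.Println(getLocalDetector("10.0.0.0/8", true))  // NoOpLocalDetector
        fmt.Println(getLocalDetector("10.0.0.0/8", false)) // DetectLocalByCIDR(10.0.0.0/8)
    }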
These don't belong in pkg/proxy/util; they involve a completely
unrelated definition of proxying.
Since each is only used from one place, just inline them at the
callers.
- add a new header, "X-Load-Balancing-Endpoint-Weight", returned from the service health check. The value of the header is the number of local endpoints, which can be used for weighted load balancing; parsing the header for the endpoint count is faster than unmarshalling the JSON content body (see the sketch below).
- add missing unit tests for the new and old headers returned from the service health check
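A sketch of how a consumer (a load balancer or a test) might use the
header instead of decoding the JSON body; the address and port here are
illustrative:

    package main

    import (
        "fmt"
        "net/http"
        "strconv"
    )

    func main() {
        // Query the service health check endpoint (address/port illustrative).
        resp, err := http.Get("http://192.168.1.1:30000/healthz")
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()

        // The number of local endpoints, usable as a weight, without
        // reading or unmarshalling the response body.
        weight, err := strconv.Atoi(resp.Header.Get("X-Load-Balancing-Endpoint-Weight"))
        if err != nil {
            panic(err)
        }
        fmt.Println("endpoint weight:", weight)
    }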
Since kube-proxy in LocalModeNodeCIDR needs to obtain the PodCIDR
assigned to the node, it watches the Node object.
However, the kube-proxy startup process requires these watches to live
in different places, which opens up the possibility of a race condition
if the same node is recreated and a different PodCIDR is assigned.
Initializing the second watch with the value obtained from the first one
allows us to detect this situation.
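Conceptually (sketch only; the handler and field names below are
illustrative):

    package main

    import (
        "fmt"
        "os"
        "reflect"

        v1 "k8s.io/api/core/v1"
    )

    // The handler is seeded with the PodCIDRs obtained by the first watch,
    // so it can detect a mismatch on later Node events.
    type nodePodCIDRHandler struct {
        podCIDRs []string
    }

    func (h *nodePodCIDRHandler) OnNodeUpdate(node *v1.Node) {
        if !reflect.DeepEqual(h.podCIDRs, node.Spec.PodCIDRs) {
            fmt.Fprintf(os.Stderr, "PodCIDRs of node %s changed to %v; exiting\n",
                node.Name, node.Spec.PodCIDRs)
            os.Exit(1)
        }
    }

    func main() {
        h := &nodePodCIDRHandler{podCIDRs: []string{"10.244.0.0/24"}}
        node := &v1.Node{}
        node.Name = "node-a"
        node.Spec.PodCIDRs = []string{"10.244.1.0/24"}
        h.OnNodeUpdate(node) // mismatch -> exit and let kube-proxy restart
    }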
Change-Id: I6adeedb6914ad2afd3e0694dcab619c2a66135f8
Signed-off-by: Antonio Ojea <aojea@google.com>
Rather than having GetNodeAddresses() return a special magic value
indicating that it matches all IPs, add a separate method to check
that. (And have GetNodeAddresses() just return the IPs as expected
instead.)
Both proxies handle IPv4 and IPv6 nodeport addresses separately, but
GetNodeAddresses went out of its way to make that difficult. Fix that.
This commit does not change any externally-visible semantics, but it
makes the existing weird semantics more obvious. Specifically, if you
say "--nodeport-addresses 10.0.0.0/8,192.168.0.0/16", then the
dual-stack proxy code would have split that into a list of IPv4 CIDRs
(["10.0.0.0/8", "192.168.0.0/16"]) to pass to the IPv4 proxier, and a
list of IPv6 CIDRs ([]) to pass to the IPv6 proxier, and then the IPv6
proxier would say "well since the list of nodeport addresses is empty,
I'll listen on all IPv6 addresses", which probably isn't what you
meant, but that's what it did.
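To spell out that pre-existing behavior (sketch; the variable names are
not the real ones):

    package main

    import (
        "fmt"

        netutils "k8s.io/utils/net"
    )

    func main() {
        nodePortAddresses := []string{"10.0.0.0/8", "192.168.0.0/16"}

        // Split the configured CIDRs by family, as the dual-stack proxy does.
        var ipv4CIDRs, ipv6CIDRs []string
        for _, cidr := range nodePortAddresses {
            if netutils.IsIPv6CIDRString(cidr) {
                ipv6CIDRs = append(ipv6CIDRs, cidr)
            } else {
                ipv4CIDRs = append(ipv4CIDRs, cidr)
            }
        }

        // ipv6CIDRs ends up empty, and an empty list is interpreted as "all
        // addresses of that family", so the IPv6 proxier listens everywhere.
        fmt.Println(ipv4CIDRs, len(ipv6CIDRs) == 0)
    }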
Rather than duplicating some of the KubeProxyConfiguration into
ProxyServer, just store the KubeProxyConfiguration itself so later
code can reference it directly.
For the fields that get platform-specific defaults (Mode,
DetectLocalMode), fill the defaults directly into the
KubeProxyConfiguration rather than keeping the original there and the
defaulted version in the ProxyServer.
Validate the --detect-local-mode value in the API object validation
rather than doing it separately later. Also, remove runtime checks and
unit tests for cases that would be blocked by validation.
This was making my eyes bleed as I read over the code.
I used the following in vim. I made them up on the fly, but they seemed
to pass manual inspection.
:g/},\n\s*{$/s//}, {/
:w
:g/{$\n\s*{$/s//{{/
:w
:g/^\(\s*\)},\n\1},$/s//}},/
:w
:g/^\(\s*\)},$\n\1}$/s//}}/
:w
This touches cases where FromInt() is used on numeric constants, or
values which are already int32s, or int variables which are defined
close by and can be changed to int32s with little impact.
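An illustrative example of the kind of change:

    package main

    import (
        "fmt"

        "k8s.io/apimachinery/pkg/util/intstr"
    )

    func main() {
        // Previously: intstr.FromInt(8080), taking an int.
        // Now, when the value is a constant or already an int32:
        port := intstr.FromInt32(8080)
        fmt.Println(port.IntValue()) // 8080
    }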
Signed-off-by: Stephen Kitt <skitt@redhat.com>
The winkernel proxy was originally created by copying+pasting from the
iptables code, but some iptables-specific things were never removed
(and one function got left behind after its functionality was moved
into the shared proxy code).
Now that the endpoint update fields have names that make it clear that
they only contain UDP objects, it's obvious that the "protocol == UDP"
checks in the iptables and ipvs proxiers were no-ops, so remove them.
Rather than calling fp.deleteEndpointConnection() directly, set up the
proxy to have syncProxyRules() call it, so that we are testing it in
the way that it actually gets called.
Squash the IPv4 and IPv6 unit tests together so we don't need to
duplicate all that code. Fix a tiny bug in NewFakeProxier() found
while doing this...
The APIs talked about "stale services" and "stale endpoints", but the
thing that is actually "stale" is the conntrack entries, not the
services/endpoints. Fix the names to indicate what they actually keep
track of.
Also, all three fields (2 in the endpoints update object and 1 in the
service update object) are currently UDP-specific, but only the
service one made that clear. Fix that too.
This commit did not actually work; in between when it was first
written and tested, and when it merged, the code in
pkg/proxy/endpoints.go was changed to only add UDP endpoints to the
"stale endpoints"/"stale services" lists, and so checking for "either
UDP or SCTP" rather than just UDP when processing those lists had no
effect.
This reverts most of commit aa8521df66
(but leaves the changes related to
ipvs.IsRsGracefulTerminationNeeded() since that actually did have the
effect it meant to have).
Annotation
As part of this change, kube-proxy accepts any value for either
annotation that is not "disabled".
Change-Id: Idfc26eb4cc97ff062649dc52ed29823a64fc59a4
Today, the health check response to the load balancers asking Kube-proxy
for the status of ETP:Local services does not include the healthz state
of Kube-proxy. This means that Kube-proxy might indicate to load
balancers that they should forward traffic to the node in question
simply because an endpoint is running on the node; this overlooks the
fact that Kube-proxy might not be healthy and might not have
successfully written the rules that enable traffic to reach the
endpoint.
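A hedged sketch of the combined check (the handler shape, field names,
and port are illustrative, not the actual healthcheck server code):

    package main

    import (
        "fmt"
        "log"
        "net/http"
        "strconv"
    )

    type serviceHealthServer struct {
        localEndpoints int
        proxyHealthy   func() bool // e.g. "rules were synced recently"
    }

    func (s *serviceHealthServer) ServeHTTP(w http.ResponseWriter, r *http.Request) {
        w.Header().Set("X-Load-Balancing-Endpoint-Weight", strconv.Itoa(s.localEndpoints))
        // Only report success if there are local endpoints AND kube-proxy
        // itself is healthy enough to have programmed the rules for them.
        if s.localEndpoints > 0 && s.proxyHealthy() {
            w.WriteHeader(http.StatusOK)
        } else {
            w.WriteHeader(http.StatusServiceUnavailable)
        }
        fmt.Fprintf(w, `{"localEndpoints": %d}`, s.localEndpoints)
    }

    func main() {
        log.Fatal(http.ListenAndServe(":30000", &serviceHealthServer{
            localEndpoints: 1,
            proxyHealthy:   func() bool { return true },
        }))
    }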
For some reason we were calculating the available nodeport IPs at the
top of syncProxyRules even though we didn't use them until the end.
(Well, the previous code avoided generating KUBE-NODEPORTS chain rules
if there were no node IPs available, but that case is considered an
error anyway, so there's no need to optimize it.)
(Also fix a stale `err` reference exposed by this move.)