ServicePortMap.Update() and EndpointsMap.Update() were just tiny
wrappers around the corresponding apply() methods, which had no other
callers. So squash them together.
(Also fix the variable naming in ServicePortMap.Update() to match
other methods.)
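A rough sketch of the resulting shape (names and bookkeeping abbreviated;
this is not the literal diff):

    // Update absorbs the old apply() body, since Update() was its only caller.
    func (sm ServicePortMap) Update(changes *ServiceChangeTracker) UpdateServiceMapResult {
        changes.lock.Lock()
        defer changes.lock.Unlock()

        for _, change := range changes.items {
            sm.merge(change.current)
            // filter out entries that were added and removed in the same
            // batch before unmerging the previous state
            change.previous.filter(change.current)
            sm.unmerge(change.previous)
        }
        changes.items = map[types.NamespacedName]*serviceChange{}

        return UpdateServiceMapResult{ /* stale-service bookkeeping elided */ }
    }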
safeIpset was a wrapper for thread-safely sharing an ipset.IPSet, but
this was unnecessary because ipset.IPSet is just a wrapper around exec
anyway and doesn't need any locking.
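For reference, the removed type looked roughly like this (method set
abbreviated); since every utilipset.Interface call just shells out to the
ipset binary, the extra mutex added nothing:

    type safeIpset struct {
        ipset utilipset.Interface
        mu    sync.Mutex
    }

    func (s *safeIpset) DestroySet(set string) error {
        s.mu.Lock()
        defer s.mu.Unlock()
        return s.ipset.DestroySet(set)
    }

    // ...and so on for each method. Callers now just hold a
    // utilipset.Interface directly.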
In the original version of "MinimizeIPTablesRestore", we skipped the
bottom half of the sync loop when we weren't re-syncing a service, so
certain things that couldn't be skipped had to be done in the top
half. But the code was later changed to always run through the whole
loop body (just not necessarily writing out rules in the bottom half),
so we can reorganize things now to put some related bits of code back
together.
(In particular, this also resolves the fact that we were accidentally
adding the endpoint chains to activeNATChains twice.)
Also change activeNATChains to be a proper generic Set type.
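With the generic sets package the bookkeeping reads naturally; a minimal
illustration:

    // activeNATChains is now a sets.Set[utiliptables.Chain] instead of a
    // map[utiliptables.Chain]bool.
    activeNATChains := sets.New[utiliptables.Chain]()
    activeNATChains.Insert(svcChain, extChain)

    // ...later, when cleaning up stale chains:
    if !activeNATChains.Has(chain) {
        // the chain is no longer referenced by any service; delete it
    }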
kubemark's proxy mode exists to test how kube-proxy affects the load
on the apiserver, not how it affects the load on the node. There's no
need to generate fake iptables commands, because that all happens
entirely independently of the api watchers.
And update most of the comments to refer to "nftables" rather than
"iptables" (even though it doesn't actually do any nftables updating
at this point).
For now the proxy also internally creates a
utiliptablestesting.FakeIPTables to keep the existing sync code
compiling.
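That transitional scaffolding amounts to little more than (sketch):

    // The fake iptables object is never flushed to the node; it only keeps
    // the copied iptables sync code compiling until it is rewritten.
    iptables := iptablestest.NewFake()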
(It is confusing, but allowed, to have distinct "KUBE-SERVICES" chains
in "nat" and "filter" in iptables, but in nftables the "type nat" and
"type filter" chains end up in the same table, so we'll need different
names for the two.)
Change the svcPortInfo and endpointInfo fields to string rather than
utiliptables.Chain, and various fixups from there.
Also use a proper set for activeNATChains, and fix the capitalization
of endpointInfo.chainName.
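A hedged sketch of the direction (the constant and field names here are
illustrative, not the final nftables names):

    // In nftables, "type nat" and "type filter" base chains live in the
    // same table, so the two roles need distinct names.
    const (
        kubeServicesChain       = "services"        // the "type nat" side
        kubeServicesFilterChain = "services-filter" // the "type filter" side
    )

    // servicePortInfo now carries plain strings rather than utiliptables.Chain:
    type servicePortInfo struct {
        *proxy.BaseServicePortInfo
        clusterChainName  string
        externalChainName string
    }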
* Use k8s.io/utils/ptr in pkg/proxy
* Replace pointer.String(), pointer.StringPtr(), and pointer.Bool() with ptr.To()
* Replace pointer.Int32(constexpr) with ptr.To[int32](constexpr)
* Replace pointer.Int32(int32(var)) with ptr.To(int32(var))
* Replace remaining pointer.Int32() cases with ptr.To
* Replace 'tcpProtocol := v1.ProtocolTCP; ... &tcpProtocol', etc with ptr.To(v1.ProtocolTCP)
* Replace 'nodeName = testHostname; ... &nodeName' with ptr.To(testHostname)
* Use ptr.To for SessionAffinityConfig.ClientIP.TimeoutSeconds
* Use ptr.To for InternalTrafficPolicy
* Use ptr.To for LoadBalancer.Ingress.IPMode
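Representative before/after pairs (ptr.To infers the type parameter from
its argument, so an explicit type is only needed for untyped constants):

    import "k8s.io/utils/ptr"

    // pointer.String("foo")          ->  ptr.To("foo")
    // pointer.Bool(true)             ->  ptr.To(true)
    // pointer.Int32(3)               ->  ptr.To[int32](3)
    // pointer.Int32(int32(v))        ->  ptr.To(int32(v))
    //
    // tcpProtocol := v1.ProtocolTCP
    // ... &tcpProtocol               ->  ptr.To(v1.ProtocolTCP)

    svc.Spec.SessionAffinityConfig = &v1.SessionAffinityConfig{
        ClientIP: &v1.ClientIPConfig{TimeoutSeconds: ptr.To[int32](10800)},
    }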
BaseEndpointInfo's fields, unlike BaseServicePortInfo's, were all
exported, which then required adding "Get" before some of the function
names in Endpoint so they wouldn't conflict.
Fix that, now that the iptables and ipvs unit tests don't need to be
able to construct BaseEndpointInfos by hand.
Remove NodeName, which was unused because we only care about IsLocal
which was tracked separately.
Remove Zone, which was unused; it seems to be left over from the old
topology system.
Fix up some comments which still referred to Endpoints vs
EndpointSlice differences.
Also remove an unhelpful helper function in endpoints_test.go
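The resulting shape, roughly (field set abbreviated):

    // BaseEndpointInfo keeps its fields unexported and exposes them through
    // the proxy.Endpoint interface, without the old "Get" prefixes.
    type BaseEndpointInfo struct {
        ip          string
        port        int
        isLocal     bool
        ready       bool
        serving     bool
        terminating bool
        // NodeName and Zone are gone; isLocal already captures locality.
    }

    func (info *BaseEndpointInfo) IP() string    { return info.ip }
    func (info *BaseEndpointInfo) IsLocal() bool { return info.isLocal } // was GetIsLocal()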
The tests in pkg/proxy already test that EndpointSlice ->
BaseEndpointInfo conversion works correctly; all we need to test in
pkg/proxy/ipvs and pkg/proxy/iptables is that the correct set of
endpoints get picked out where we expect them to, which doesn't
require us to compare the complete BaseEndpointInfo objects.
A new --init-only flag is added that makes kube-proxy perform the
configuration that requires privileged mode and then exit. It is
intended to be executed in a privileged initContainer, while the main
container may run with a stricter securityContext.
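A sketch of the wiring (the field and helper names below are assumptions,
not necessarily the real ones):

    // Registered alongside the other kube-proxy flags:
    fs.BoolVar(&o.InitOnly, "init-only", o.InitOnly,
        "Perform the configuration that requires privileges and exit.")

    // ...and in Run(), before starting the sync loops:
    if o.InitOnly {
        return s.platformSetup(ctx) // privileged one-time setup only, then exit
    }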
The use of "Endpoint" vs "Endpoints" in these type names is tricky
because it doesn't always make sense to use the same singular/plural
convention as the corresponding service-related types, since often the
service-related type is referring to a single service while the
endpoint-related type is referring to multiple endpoint IPs.
The "endpointsInfo" types in the iptables and winkernel proxiers are
now "endpointInfo" because they describe a single endpoint IP (and
wrap proxy.BaseEndpointInfo).
"UpdateEndpointMapResult" is now "UpdateEndpointsMapResult", because
it is the result of EndpointsMap.Update (and it's clearly correct for
EndpointsMap to have plural "Endpoints" because it's a map to an array
of proxy.Endpoint objects.)
"EndpointChangeTracker" is now "EndpointsChangeTracker" because it
tracks changes to the full set of endpoints for a particular service
(and the new name matches the existing "endpointsChange" type and
"Proxier.endpointsChanges" fields.)
TestLoadBalancer and TestHealthCheckNodePort still had iptables rules
checks, but they also have sufficient runPacketFlowTests checks to
cover everything we care about.
(This leaves only TestOverallIPTablesRules and
TestSyncProxyRulesRepeated using assertIPTablesRulesEqual.)
For consistency with TestExternalTrafficPolicyLocal, test all of the
Cluster external traffic policy cases together here (ensuring that
masquerading happens where needed). Drop the assertIPTablesRulesEqual
test in favor of runPacketFlowTests.
Merge TestOnlyLocalExternalIPs, TestOnlyLocalLoadBalancing, and
TestOnlyLocalNodePorts together into TestExternalTrafficPolicyLocal.
Drop the assertIPTablesRulesEqual tests in favor of
runPacketFlowTests.
Remove TestOnlyLocalNodePortsNoClusterCIDR; the relevant bits of the
"no local detector" case are already fully covered by
TestInternalExternalMasquerade.
Previously we had TestNodePort, which tested basic NodePort behavior,
plus Test{Enable,Disable}LocalhostNodePorts{IPv4,IPv6} to test the
behavior of --localhost-nodeports under IPv4 and IPv6, plus
TestDisableLocalhostNodePortsIPv4WithNodeAddress to test
--nodeport-addresses.
Merge all of these together into TestNodePorts, and use
runPacketFlowTests to check the results rather than
assertIPTablesRulesEqual.
The packet tracer is not full-featured enough to be able to check the
"anti martian packet spoofing" rule, so we check the iptables dump for
that manually.
(This also fixes the --localhost-nodeport tests to use the same IP
ranges as most of the other tests now.)
Merge TestClusterIPReject, TestExternalIPsReject, TestNodePortReject,
and TestLoadBalancerReject into a single test.
Also remove the assertIPTablesRulesEqual tests because the packet flow
tests cover all of the details we care about here.
Create some ClusterIP services and use runPacketFlowTests to test
general functionality:
- normal connection
- hairpin connection
- multiple endpoints
- port != targetPort
- multiple protocols on same port
Remove the assertIPTablesRulesEqual test because the packet flow tests
cover all of the details we care about here.
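One hypothetical flow case, to show the shape of what the packet flow
tests assert (field names and addresses are illustrative):

    {
        name:     "hairpin to ClusterIP",
        sourceIP: "10.180.0.1",          // one of the service's own endpoints
        destIP:   "172.30.0.41",         // the ClusterIP
        destPort: 80,
        output:   "10.180.0.1:80, 10.180.0.2:80",
        masq:     true,                  // hairpin traffic must be masqueraded
    },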
Previously this was used to assert "something changed since the last
sync", but we already have packet flow tests in all of those cases now
to assert that the *specific* something we care about changed.
Rename TestOverallIPTablesRulesWithMultipleServices to just
TestOverallIPTablesRules, and add one rule type we weren't previously
testing (session affinity).
Conntrack-invalid packets may cause unexpected and subtle bugs on
established connections, so by default we install an iptables rule
that drops packets in this conntrack state.
However, there are network scenarios, especially those involving
multihomed nodes, that may have legitimate traffic that conntrack
detects as invalid, and in those cases this iptables rule ends up
dropping that traffic.
An alternative that avoids the spurious drops caused by invalid
conntrack packets is to set the nf_conntrack_tcp_be_liberal sysctl,
but this is a system-wide setting and we don't want kube-proxy to be
opinionated about the node's overall networking configuration.
Kube-proxy will now only install the DROP rule for the INVALID conntrack
state if nf_conntrack_tcp_be_liberal is not set.
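A minimal sketch of that check (kube-proxy's real code reads the sysctl
through its own helpers; rule shown for the IPv4 filter table):

    // conntrackTCPLiberal reports whether nf_conntrack_tcp_be_liberal is set.
    func conntrackTCPLiberal() bool {
        data, err := os.ReadFile("/proc/sys/net/netfilter/nf_conntrack_tcp_be_liberal")
        return err == nil && strings.TrimSpace(string(data)) == "1"
    }

    // Only install the DROP rule when the kernel is not already being
    // liberal about out-of-window TCP packets:
    //   -A KUBE-FORWARD -m conntrack --ctstate INVALID -j DROP
    if !conntrackTCPLiberal() {
        filterRules.Write("-A", string(kubeForwardChain),
            "-m", "conntrack", "--ctstate", "INVALID", "-j", "DROP")
    }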
Change-Id: I5eb326931ed915f5ae74d210f0a375842b6a790e