Rather than having GetNodeAddresses() return a special magic value
indicating that it matches all IPs, add a separate method to check
that. (And have GetNodeAddresses() just return the IPs as expected
instead.)
Both proxies handle IPv4 and IPv6 nodeport addresses separately, but
GetNodeAddresses went out of its way to make that difficult. Fix that.
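A minimal sketch of what that split could look like, using hypothetical names (NodePortAddresses, MatchAll) rather than the exact kube-proxy API:

```go
// Hypothetical sketch: the "matches all IPs" question becomes its own method,
// and GetNodeAddresses only ever returns concrete IPs, never a sentinel value.
package main

import (
	"fmt"
	"net"
)

type NodePortAddresses struct {
	cidrs []string
}

// MatchAll reports whether the configured CIDRs mean "all node IPs",
// instead of GetNodeAddresses returning a magic "0.0.0.0/0" entry.
func (n *NodePortAddresses) MatchAll() bool {
	if len(n.cidrs) == 0 {
		return true
	}
	for _, c := range n.cidrs {
		if c == "0.0.0.0/0" || c == "::/0" {
			return true
		}
	}
	return false
}

// GetNodeAddresses returns only the local IPs covered by the CIDRs.
func (n *NodePortAddresses) GetNodeAddresses(localIPs []net.IP) []net.IP {
	var out []net.IP
	for _, ip := range localIPs {
		for _, c := range n.cidrs {
			if _, ipnet, err := net.ParseCIDR(c); err == nil && ipnet.Contains(ip) {
				out = append(out, ip)
				break
			}
		}
	}
	return out
}

func main() {
	npa := &NodePortAddresses{cidrs: []string{"10.0.0.0/8"}}
	fmt.Println(npa.MatchAll(), npa.GetNodeAddresses([]net.IP{net.ParseIP("10.1.2.3")}))
}
```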
This commit does not change any externally-visible semantics, but it
makes the existing weird semantics more obvious. Specifically, if you
say "--nodeport-addresses 10.0.0.0/8,192.168.0.0/16", then the
dual-stack proxy code would have split that into a list of IPv4 CIDRs
(["10.0.0.0/8", "192.168.0.0/16"]) to pass to the IPv4 proxier, and a
list of IPv6 CIDRs ([]) to pass to the IPv6 proxier, and then the IPv6
proxier would say "well since the list of nodeport addresses is empty,
I'll listen on all IPv6 addresses", which probably isn't what you
meant, but that's what it did.
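For concreteness, a minimal sketch (not the actual kube-proxy helper) of the per-family split that produces the surprising empty IPv6 list described above:

```go
// Minimal sketch of splitting --nodeport-addresses by IP family; the real
// kube-proxy helper differs, but the effect described above is the same.
package main

import (
	"fmt"
	"net"
)

func splitCIDRsByFamily(cidrs []string) (v4, v6 []string) {
	for _, c := range cidrs {
		ip, _, err := net.ParseCIDR(c)
		if err != nil {
			continue
		}
		if ip.To4() != nil {
			v4 = append(v4, c)
		} else {
			v6 = append(v6, c)
		}
	}
	return v4, v6
}

func main() {
	v4, v6 := splitCIDRsByFamily([]string{"10.0.0.0/8", "192.168.0.0/16"})
	// v4 == [10.0.0.0/8 192.168.0.0/16], v6 == [] -- and an empty list
	// historically meant "all IPv6 addresses" to the IPv6 proxier.
	fmt.Println(v4, v6)
}
```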
ContainsIPv4Loopback() claimed that "::/0" contains IPv4 loopback IPs
(on the theory that listening on "::/0" will listen on "0.0.0.0/0" as
well and thus include IPv4 loopback). But its sole caller (the
iptables proxier) doesn't use listen() to accept connections, so this
theory was completely mistaken; if you passed, e.g.,
`--nodeport-addresses 192.168.0.0/16,::/0`, then it would not create
any rule that accepted nodeport connections on 127.0.0.1, but it would
nonetheless end up setting route_localnet=1 because
ContainsIPv4Loopback() claimed it needed to. Fix this.
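A minimal sketch of a corrected ContainsIPv4Loopback-style check, assuming an empty list still means "all addresses"; illustrative, not the exact kube-proxy code:

```go
// Sketch of a ContainsIPv4Loopback-style check that only considers IPv4
// CIDRs; illustrative, not the exact kube-proxy implementation.
package main

import (
	"fmt"
	"net"
)

func containsIPv4Loopback(cidrs []string) bool {
	if len(cidrs) == 0 {
		return true // empty list means "all addresses", which includes loopback
	}
	loopback := net.ParseIP("127.0.0.1")
	for _, c := range cidrs {
		ip, ipnet, err := net.ParseCIDR(c)
		if err != nil || ip.To4() == nil {
			continue // skip invalid and IPv6 CIDRs: "::/0" does not match 127.0.0.1
		}
		if ipnet.Contains(loopback) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(containsIPv4Loopback([]string{"192.168.0.0/16", "::/0"})) // false
}
```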
Add names to the tests and use t.Run() (rather than having them just
be numbered, with number 9 mistakenly being used twice, thus throwing
off all the later numbers...)
Remove unnecessary FakeNetwork element from the testCases struct since
it's always the same. Remove the expectedErr value since a non-nil
error is expected if and only if the returned set is nil, and there's
no reason to test the exact text of the error message.
Fix weird IPv6 subnet sizes.
Change the dual-stack tests to (a) actually have dual-stack interface
addrs, and (b) use a routable IPv6 address, not just localhost (given
that we never actually want to use IPv6 localhost for nodeports).
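For illustration, the reworked table style described above might look roughly like this; the struct fields, helper wiring, and case names are hypothetical, not the actual proxier test code:

```go
package proxyutil_test

import "testing"

// Hypothetical shape of the named, t.Run-based table tests described above.
func TestGetNodeAddressesShape(t *testing.T) {
	testCases := []struct {
		name     string
		cidrs    []string
		itfAddrs []string // fake interface addresses (dual-stack where relevant)
		expected []string // nil means "an error is expected"
	}{
		{
			name:     "ipv4 CIDR selects matching address",
			cidrs:    []string{"10.20.30.0/24"},
			itfAddrs: []string{"10.20.30.51/24", "2001:db8::2/64"},
			expected: []string{"10.20.30.51"},
		},
		{
			name:     "invalid CIDR returns error",
			cidrs:    []string{"not-a-cidr"},
			itfAddrs: []string{"10.20.30.51/24"},
			expected: nil,
		},
	}
	for _, tc := range testCases {
		t.Run(tc.name, func(t *testing.T) {
			_ = tc // ... build the fake network from tc.itfAddrs, call GetNodeAddresses,
			// and compare against tc.expected (nil result <=> error expected) ...
		})
	}
}
```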
This PR introduces two new modes for detecting local traffic in a
cluster (a rough sketch of both modes follows the list):
1) detectLocalByBridgeInterface: This takes a bridge name as an
argument; all traffic whose originating interface is that bridge is
considered local pod traffic.
2) detectLocalByInterfaceNamePrefix: This takes an interface name
prefix as an argument; all traffic whose originating interface name
starts with that prefix is considered local pod traffic.
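A rough sketch of the two modes, illustrative only: both ultimately reduce to an iptables `-i` (input interface) match, with `+` as the iptables interface-name wildcard (the bridge name and prefix values below are just examples):

```go
// Rough sketch of the two detect-local modes described above: both reduce to
// matching the packet's input interface. Illustrative only.
package main

import "fmt"

// detectLocalByBridgeInterface: traffic arriving on the named bridge is local.
func bridgeMatchArgs(bridgeName string) []string {
	return []string{"-i", bridgeName}
}

// detectLocalByInterfaceNamePrefix: traffic arriving on any interface whose
// name starts with the prefix is local ("+" is the iptables wildcard).
func prefixMatchArgs(prefix string) []string {
	return []string{"-i", prefix + "+"}
}

func main() {
	fmt.Println(bridgeMatchArgs("kube-bridge")) // [-i kube-bridge]
	fmt.Println(prefixMatchArgs("veth"))        // [-i veth+]
}
```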
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Now that we don't have to always append all of the iptables args into
a single array, there's no reason to have LocalTrafficDetector take in
a set of args to prepend to its own output, and also not much point in
having it write out the "-j CHAIN" by itself either.
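A hypothetical sketch of the simplified shape: the detector returns only its own match arguments, and the caller assembles the rest of the rule, including the "-j CHAIN" part (the chain name below is a placeholder):

```go
// Hypothetical sketch of the simplified detector: it contributes only its own
// match args; the caller writes the full rule, including "-j CHAIN".
package main

import (
	"fmt"
	"strings"
)

type byCIDR struct{ cidr string }

func (d byCIDR) IfLocal() []string    { return []string{"-s", d.cidr} }
func (d byCIDR) IfNotLocal() []string { return []string{"!", "-s", d.cidr} }

func main() {
	d := byCIDR{cidr: "10.0.0.0/8"}
	rule := append(append([]string{"-A", "KUBE-SVC-XXX"}, d.IfNotLocal()...), "-j", "KUBE-MARK-MASQ")
	fmt.Println(strings.Join(rule, " "))
}
```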
kube-proxy sets the sysctl net.ipv4.conf.all.route_localnet=1
so NodePort services can be accessed on the IPv4 loopback addresses,
but this may present security issues.
Leverage the --nodeport-addresses flag to opt out of this feature:
if the list is not empty and none of the IP ranges contains an IPv4
loopback address, this sysctl is not set.
In addition, add a warning to inform users about this behavior.
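Expressed as a predicate (an illustrative sketch, not the exact kube-proxy code), the opt-out is roughly:

```go
// Sketch of the opt-out: route_localnet=1 is only needed when loopback
// nodeports must keep working, i.e. when the list is empty or some IPv4
// range contains 127.0.0.1. Illustrative only.
package main

import (
	"fmt"
	"net"
)

func needsRouteLocalnet(nodePortCIDRs []string) bool {
	if len(nodePortCIDRs) == 0 {
		return true // default behavior: loopback nodeports still work
	}
	lo := net.ParseIP("127.0.0.1")
	for _, c := range nodePortCIDRs {
		if _, ipnet, err := net.ParseCIDR(c); err == nil && ipnet.Contains(lo) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(needsRouteLocalnet(nil))                     // true: sysctl is set
	fmt.Println(needsRouteLocalnet([]string{"10.0.0.0/8"}))  // false: sysctl is not set
	fmt.Println(needsRouteLocalnet([]string{"127.0.0.0/8"})) // true: sysctl is set
}
```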
If you pass just an IP address to "-s" or "-d", the iptables command
will fill in the correct mask automatically.
Originally, the proxier was just hardcoding "/32" for all of these,
which was unnecessary but simple. But when IPv6 support was added, the
code was made more complicated to deal with the fact that the "/32"
needed to be "/128" in the IPv6 case, so it would parse the IPs to
figure out which family they were, which in turn involved adding some
checks in case the parsing failed (even though that "can't happen", and
the old code didn't check for invalid IPs, even though an invalid IP
would have broken the iptables-restore).
Anyway, all of that is unnecessary because we can just pass the IP
strings to iptables directly rather than parsing and unparsing them
first.
(The diff to proxier_test.go is just deleting "/32" everywhere.)
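As an illustration (with a placeholder KUBE-SEP chain name), the generated rule arguments simply drop the suffix:

```go
// Illustration of the simplification: pass the IP string straight to
// iptables and let it infer the mask, instead of appending "/32" or "/128".
package main

import "fmt"

func main() {
	epIP := "10.0.1.1" // endpoint IP string as it already appears in the API object

	// Before: parse the IP, pick "/32" vs "/128" by family, handle parse errors.
	before := []string{"-A", "KUBE-SEP-XXX", "-s", epIP + "/32", "-j", "KUBE-MARK-MASQ"}

	// After: iptables fills in /32 (or /128 for IPv6) automatically.
	after := []string{"-A", "KUBE-SEP-XXX", "-s", epIP, "-j", "KUBE-MARK-MASQ"}

	fmt.Println(before)
	fmt.Println(after)
}
```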
* Fix regression in kube-proxy
Don't use a prepend() - that allocates. Instead, make Write() take
either strings or slices (I wish we could express that better).
* WIP: switch to intf
* WIP: less appends
* tests and ipvs
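A minimal sketch of such a Write(), assuming a hypothetical lineBuffer type; the real helper differs, but the string-or-slice type switch is the point (and is the part that's awkward to express in the type system):

```go
// Minimal sketch of a Write() that accepts either strings or string slices,
// avoiding a prepend()-style allocation; illustrative, not the exact helper.
package main

import (
	"bytes"
	"fmt"
)

type lineBuffer struct{ buf bytes.Buffer }

// Write joins its arguments with spaces and terminates the line. Each
// argument may be a string or a []string.
func (b *lineBuffer) Write(args ...interface{}) {
	for i, arg := range args {
		if i > 0 {
			b.buf.WriteByte(' ')
		}
		switch x := arg.(type) {
		case string:
			b.buf.WriteString(x)
		case []string:
			for j, s := range x {
				if j > 0 {
					b.buf.WriteByte(' ')
				}
				b.buf.WriteString(s)
			}
		default:
			panic(fmt.Sprintf("unknown argument type %T", x))
		}
	}
	b.buf.WriteByte('\n')
}

func main() {
	var b lineBuffer
	// Placeholder chain names; the slice argument is written in place, with no prepend.
	b.Write("-A", "KUBE-SVC-XXX", []string{"-m", "comment", "--comment", `"ns/svc"`}, "-j", "KUBE-SEP-XXX")
	fmt.Print(b.buf.String())
}
```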
* Migrate to Structured Logs in `pkg/proxy/util`
* Minor fixes
* change key to cidr and remove namespace arg
* Update key from cidr to CIDR
Co-authored-by: JUN YANG <69306452+yangjunmyfm192085@users.noreply.github.com>
* Update key cidr to CIDR
Co-authored-by: JUN YANG <69306452+yangjunmyfm192085@users.noreply.github.com>
* Update key ip to IP
Co-authored-by: JUN YANG <69306452+yangjunmyfm192085@users.noreply.github.com>
* Update key ip to IP
Co-authored-by: JUN YANG <69306452+yangjunmyfm192085@users.noreply.github.com>
* Interchange svcNamespace and svcName
* Change first letter of all messages to capital
* Change key names in endpoints.go
* Change all key names to the lower bumpy caps convention
Co-authored-by: JUN YANG <69306452+yangjunmyfm192085@users.noreply.github.com>
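For illustration, the target style uses klog's structured InfoS/ErrorS calls; the messages and keys below are examples, not the exact ones touched by this migration:

```go
// Illustration of the structured-logging style this migration moves to.
// klog.InfoS/ErrorS are real klog v2 calls; the messages and keys here are
// examples only (convention: capitalized messages, lowerCamelCase keys).
package main

import (
	"errors"

	"k8s.io/klog/v2"
)

func main() {
	cidr := "not-a-cidr"
	ip := "10.0.0.1"
	klog.ErrorS(errors.New("invalid CIDR"), "Error parsing CIDR", "cidr", cidr)
	klog.InfoS("Adding node port address", "ip", ip)
	klog.Flush()
}
```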
The nat KUBE-SERVICES chain is called from the OUTPUT and PREROUTING stages. In
clusters with a large number of services, the nat KUBE-SERVICES chain becomes the
largest chain, with e.g. 33k rules. This patch aims to move the KubeMarkMasq rules from
the kubeServicesChain into the respective KUBE-SVC-* chains. This way, during
packet-rule matching, we won't have to traverse the MASQ rules of all services which
get accumulated in the KUBE-SERVICES and/or KUBE-NODEPORTS chains (a simplified
before/after illustration follows the list below). Since the
jump to KUBE-MARK-MASQ ultimately sets the 0x4000 mark for nodeIP SNAT, it should not
matter whether the jump is made from the KUBE-SERVICES or KUBE-SVC-* chains.
Specifically we change:
1) For ClusterIP svc, we move the KUBE-MARK-MASQ jump rule from KUBE-SERVICES
chain into KUBE-SVC-* chain.
2) For ExternalIP svc, we move the KUBE-MARK-MASQ jump rule in the case of
non-ServiceExternalTrafficPolicyTypeLocal from KUBE-SERVICES
chain into KUBE-SVC-* chain.
3) For NodePorts svc, we move the KUBE-MARK-MASQ jump rule in case of
non-ServiceExternalTrafficPolicyTypeLocal from KUBE-NODEPORTS chain to
KUBE-SVC-* chain.
4) For load-balancer svc, we don't change anything since it is already svc specific
due to creation of KUBE-FW-* chains per svc.
This would cut the rules per svc in KUBE-SERVICES and KUBE-NODEPORTS in half.
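A simplified before/after illustration; the chain hash, service IP, and cluster CIDR are placeholders, and the comment/protocol match modules that real rules carry are omitted:

```go
// Before/after illustration of where the KUBE-MARK-MASQ jump lives for a
// ClusterIP service; chain hash, addresses, and rule details are placeholders.
package main

import "fmt"

func main() {
	before := []string{
		// One masquerade rule per service in the shared KUBE-SERVICES chain:
		"-A KUBE-SERVICES ! -s 10.0.0.0/8 -d 172.30.0.41 -p tcp --dport 80 -j KUBE-MARK-MASQ",
		"-A KUBE-SERVICES -d 172.30.0.41 -p tcp --dport 80 -j KUBE-SVC-XPGD46QRK7WJZT7O",
	}
	after := []string{
		// KUBE-SERVICES only dispatches; the masquerade rule moves into the
		// per-service chain, so it is evaluated only after the service matches:
		"-A KUBE-SERVICES -d 172.30.0.41 -p tcp --dport 80 -j KUBE-SVC-XPGD46QRK7WJZT7O",
		"-A KUBE-SVC-XPGD46QRK7WJZT7O ! -s 10.0.0.0/8 -j KUBE-MARK-MASQ",
	}
	fmt.Println(before)
	fmt.Println(after)
}
```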