Commit Graph

2083 Commits

Author SHA1 Message Date
Dan Winship
5ca73197b3 Document the nftables kube-proxy packet flow 2024-01-11 12:59:21 -05:00
Kubernetes Prow Robot
95a159299b
Merge pull request #122614 from tnqn/nftables-firewall
kube-proxy: fix LoadBalancerSourceRanges not working for nftables mode
2024-01-09 22:27:16 +01:00
Quan Tian
f21f8d9984 kube-proxy: fix LoadBalancerSourceRanges not working for nftables mode
Previously, the firewall-check chain was run in input, forward, and
output hook but not prerouting hook. When the LoadBalancer traffic
arrived at input or forward hook, it had been DNATed to endpoint IP and
port, so the firewall-check chain didn't take effect, traffic from out
of LoadBalancerSourceRanges was not dropped.

It was not detected by unit test because the chains were sorted by
priority only, while hook should be taken into consideration.

The commit links the firewall-check chain to prerouting hook and unlinks
it from input and forward hook to ensure the traffic is filtered before
DNAT. The priorities of filter chains are updated from "DNATPriority-1"
to "DNATPriority-10" to allow third parties to insert something else
between them.

Signed-off-by: Quan Tian <qtian@vmware.com>
2024-01-09 17:34:16 +08:00
Lars Ekman
50b3ffc71f kube-proxy: LoadBalancerSourceRanges as *net.IPNet 2024-01-09 09:17:56 +01:00
Lars Ekman
9eac24c656 kube-proxy: store ExternalIPs as net.IP
They were stored as strings which could be non-canonical
and cause problems
2024-01-09 09:17:50 +01:00
Lars Ekman
d2294007b0 kube-proxy: store LoadBalancerVIPs as net.IP
They were stored as strings which could be non-canonical
and cause problems
2024-01-09 09:17:43 +01:00
Lars Ekman
564b80b1e1 kube-proxy: don't use invalid cidrs in unit test
CIDRs like 192.168.200.3/24 and fd00:20::1/64 replaced with
192.168.200.0/24 and fd00:20::/64
2024-01-09 09:17:31 +01:00
Kubernetes Prow Robot
2cf7465755
Merge pull request #122605 from tnqn/stale-chain-cleanup
kube-proxy: do not delete previously stale but currently active chains
2024-01-08 17:30:53 +01:00
Kubernetes Prow Robot
f538feed8c
Merge pull request #122296 from tnqn/nftables-kernel-requirement
kube-proxy: change implementation of LoadBalancerSourceRanges for wider kernel support
2024-01-08 17:30:27 +01:00
Quan Tian
377f521038 kube-proxy: change implementation of LoadBalancerSourceRanges for wider kernel support
The nftables implementation made use of concatenation of ranges when
creating the set "firewall-allow", but the support was not available
before kernel 5.6. Therefore, nftables mode couldn't run on earlier
kernels, while 5.4 is still widely used.

An alternative of concatenation of ranges is to create a separate
firewall chain for every service port that needs firewalling, and jump
to the service's firewall chain from the common firewall chain via a
rule with vmap.

Renaming from "firewall" to "firewall-ips" is required when changing the
set to the map to support existing clusters to upgrade, otherwise it
would fail to create the map. Besides, "firewall-ips" corresponds to the
"service-ips" map, later we can add use "firewall-nodeports" if it's
determined that NodePort traffic should be subject to
LoadBalancerSourceRanges.

Signed-off-by: Quan Tian <qtian@vmware.com>
2024-01-08 19:26:38 +08:00
Quan Tian
ca8c27c480 kube-proxy: do not delete previously stale but currently active chains
In some cases a chain could change from stale to active, but once it's
added to staleChains it would always be deleted once. When the proxier
tries to delete a previously stale but currently active chain, it would
fail and lead to errors, though it won't cause real problem thanks to
kernel's validation.

The commit removes a chain from staleChains if it becomes active.

Signed-off-by: Quan Tian <qtian@vmware.com>
2024-01-08 17:53:52 +08:00
Kubernetes Prow Robot
c0dc42073d
Merge pull request #122373 from danwinship/linux-proxy
Properly build-tag the Linux kube-proxy backend code
2024-01-04 18:00:34 +01:00
Dan Winship
1c089afcf3 Fix a set type 2023-12-21 09:44:08 -05:00
Dan Winship
147114e648 Update some change tracker doc comments
In particular, fix the description of ServiceChangeTracker.Update's
return value, and point out that it's different from
EndpointsChangeTracker.EndpointSliceUpdate's (though fortunately, in a
way that doesn't matter for any existing code).
2023-12-21 09:44:08 -05:00
Dan Winship
a8a12be3d3 Rename cache's endpointSliceInfo/endpointInfo to endpointSliceData/endpointData
EndpointSliceCache was using the name "endpointInfo" to refer to two
different data types (most egregiously in addEndpoints(), which had a
variable named `endpoint` of type `*endpointInfo` and a variable named
`endpointInfo` of type `Endpoint`).

Continue using "endpointInfo" in places that refer to proxy.Endpoint /
BaseEndpointInfo, since that's consistent with other code, but rename
the local "cache of the Endpoints field of an EndpointSlice" type from
"endpointInfo" to "endpointData". Likewise, rename endpointSliceInfo
to endpointSliceData, for consistency.
2023-12-21 09:44:08 -05:00
Dan Winship
764cb0457f Move some code around in servicechangetracker.go/endpointschangetracker.go
Put the ServiceChangeTracker and EndpointsChangeTracker definitions at
the top of the files, and put the ServicePortMap and EndpointsMap
definitions before their methods.

(No code changes.)
2023-12-21 09:44:04 -05:00
Dan Winship
a73b275031 Split ServicePort/Endpoint from ServiceChangeTracker/EndpointsChangeTracker
Move the ServicePort/BaseServicePortInfo types to serviceport.go.
Move the Endpoint/BaseEndpointInfo types to endpoint.go.

To avoid confusion with the new filenames, rename service.go to
servicechangetracker.go and endpoints.go to endpointschangetracker.go.

(No code changes; this just moves some code from types.go and
services.go to serviceport.go, and some code from types.go and
endpoints.go to endpoint.go.)
2023-12-21 09:38:25 -05:00
Dan Winship
ede0dc1d07 Make newBaseServiceInfo a function rather than a method
(in preparation for moving it)
2023-12-21 09:38:25 -05:00
Dan Winship
0779042a6f Remove a useless "_" assignment to appease the linter
(This would become an error rather than a warning once we try to move
this code to another file.)

Also rename an "ok" variable to "exists" since that what it really
means.
2023-12-21 09:38:24 -05:00
Dan Winship
452fcc5fd6 Remove some dead code in service.go
ServicePortMap.merge had a giant comment explaining its return value,
but nothing ever used that return value.

ServicePort had an InternalTrafficPolicy() method, but nothing used it
(because it was redundant with InternalPolicyLocal().)
2023-12-21 09:38:24 -05:00
Dan Winship
626f349fef Drop PendingChanges methods from change trackers, move into UpdateResults
This fixes a race condition where the tracker could be updated in
between us calling .PendingChanges() and .Update().
2023-12-19 18:27:33 -05:00
Dan Winship
5d0656b1f6 Squash some unnecessarily-chained methods in the change trackers
ServicePortMap.Update() and EndpointsMap.Update() were just a tiny
wrappers around the corresponding apply() methods, which had no other
callers. So squash them together.

(Also fix the variable naming in ServicePortMap.Update() to match
other methods.)
2023-12-19 18:27:33 -05:00
Dan Winship
c1ce1e00ee Properly build-tag the Linux kube-proxy backend code
This had to be able to build on OS X before to make verify-typecheck
pass, but now that that's fixed we can tag the code properly as being
linux-only.
2023-12-18 20:20:51 -05:00
Dan Winship
b69510b069 Remove an unnecessary abstraction
safeIpset was a wrapper for thread-safely sharing an ipset.IPSet, but
this was unnecessary because ipset.IPSet is just a wrapper around exec
anyway and doesn't need any locking.
2023-12-18 19:58:47 -05:00
Kubernetes Prow Robot
8a9e0d936a
Merge pull request #121919 from uablrek/etp-local-externalips
kube-proxy: Fix etp:Local for externalIPs
2023-12-14 08:50:04 +01:00
Kubernetes Prow Robot
b54e719509
Merge pull request #122111 from danwinship/proxy-chain-creation-cleanup
proxy chain creation cleanup
2023-12-14 06:17:40 +01:00
Kubernetes Prow Robot
60cde601a8
Merge pull request #121814 from danwinship/kubemark-iptables
Remove --use-real-proxier support from kubemark
2023-12-13 23:55:01 +01:00
Kubernetes Prow Robot
de6a37bbad
Merge pull request #121744 from uablrek/ipvs-remove-sorting
Remove unnecessary sort in kube-proxy ipvs
2023-12-13 22:35:54 +01:00
Dan Winship
8acf185791 Use a generic Set for utiliptables.GetChainsFromTable 2023-11-29 11:12:27 -05:00
Dan Winship
7cedc3d741 Simplify creation/tracking of chains
In the original version of "MinimizeIPTablesRestore", we skipped the
bottom half of the sync loop when we weren't re-syncing a service, so
certain things that couldn't be skipped had to be done in the top
half. But the code was later changed to always run through the whole
loop body (just not necessarily writing out rules in the bottom half),
so we can reorganize things now to put some related bits of code back
together.

(In particular, this also resolves the fact that we were accidentally
adding the endpoint chains to activeNATChains twice.)

Also change activeNATChains to be a proper generic Set type.
2023-11-29 11:12:20 -05:00
Lars Ekman
19da26005b kube-proxy: Fix etp:Local for externalIPs
The problem was introduced by PR #108460
2023-11-16 09:15:13 +01:00
Dan Winship
2017fb2ec5 Fix "go test -count=2 ./pkg/proxy/iptables"
If you run the tests multiple times, the "partial restore failures"
metric didn't get reset in between.
2023-11-11 08:41:53 -05:00
Dan Winship
ae3235aa01 Remove --use-real-proxier support from kubemark
kubemark's proxy mode exists to test how kube-proxy affects the load
on the apiserver, not how it affects the load on the node. There's no
need to generate fake iptables commands, because that all happens
entirely independently of the api watchers.
2023-11-09 06:52:10 -05:00
Lars Ekman
d78a794be2 Remove unnecessary sort in kube-proxy ipvs
Sorting of endpoints before adding them to ipvs is not
needed, nor wanted. It just takes time
2023-11-06 14:57:18 +01:00
Dan Winship
0993bb78ef Redo service dispatch with maps 2023-10-31 17:54:53 -04:00
Dan Winship
9d71513ac1 Redo no-endpoint handling with maps 2023-10-31 17:54:53 -04:00
Dan Winship
4128631d0f Redo LoadBalancerSourceRanges firewall using sets 2023-10-31 17:54:53 -04:00
Dan Winship
edaa1d735b Redo --nodeport-addresses handling with a set 2023-10-31 17:54:53 -04:00
Dan Winship
ef1347b06d Port NAT rules to nftables (and backend is now functional) 2023-10-31 17:54:51 -04:00
Dan Winship
0c5c620b4f Port filter rules to nftables 2023-10-31 17:40:45 -04:00
Dan Winship
6cff415305 Port service/endpoint chain creation/cleanup to nftables 2023-10-31 17:40:45 -04:00
Dan Winship
2735ad541e Port table setup/cleanup code to nftables 2023-10-31 17:40:30 -04:00
Dan Winship
bcced184c5 Replace "iptables-restore" sync in nftables/proxier.go with (trivial) "nft -f -" sync 2023-10-31 17:38:32 -04:00
Dan Winship
93860a5217 Distinguish iptables-based and nftables-based backends, do startup cleanup
When switching from iptables or ipvs to nftables, clean up old
iptables/ipvs rules. When switching the other way, clean up old
nftables rules.
2023-10-31 17:38:32 -04:00
Dan Winship
abb1a458a9 Create an nftables.Interface in nftables proxier
And update most of the comments to refer to "nftables" rather than
"iptables" (even though it doesn't actually do any nftables updating
at this point).

For now the proxy also internally creates a
utiliptablestesting.FakeIPTables to keep the existing sync code
compiling.
2023-10-31 17:38:29 -04:00
Dan Winship
1a530457f9 Drop unit tests of iptables-specific unit test helpers
(We'll eventually have nftables versions.)
2023-10-31 17:33:53 -04:00
Dan Winship
958e80ca3b Clarify nftables/proxier.go by distinguishing nat/filter table KUBE-SERVICES chains
(It is confusing, but allowed, to have distinct "KUBE-SERVICES" chains
in "nat" and "filter" in iptables, but in nftables the "type nat" and
"type filter" chains end up in the same table, so we'll need different
names for the two.)
2023-10-31 17:33:53 -04:00
Dan Winship
3abdda9800 Simplify nftables/proxier.go by using string rather than utiliptables.Chain
Change the svcPortInfo and endpointInfo fields to string rather than
utiliptables.Chain, and various fixups from there.

Also use a proper set for activeNATChains, and fix the capitalization
of endpointInfo.chainName.
2023-10-31 17:33:53 -04:00
Dan Winship
96e53f64f4 Simplify nftables/proxier.go by removing the "args" reuse
since that will be done differently in nftables
2023-10-31 17:33:53 -04:00
Dan Winship
6535ac1e61 Simplify nftables/proxier.go by removing Monitor stuff
since it shouldn't be necessary
2023-10-31 17:33:53 -04:00