
511 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
ff90c1cc73
Merge pull request #119374 from danwinship/kep-3178-ga
move KEP-3178 IPTablesOwnershipCleanup to GA
2023-07-17 15:53:47 -07:00
Dan Winship
d486736dd3 Remove IPTablesOwnershipCleanup checks and dead code 2023-07-17 16:51:47 -04:00
Aohan Yang
7eab0d7a0d Proxy changes for IP mode field 2023-07-17 16:02:36 +08:00
Kubernetes Prow Robot
f34365789d
Merge pull request #116470 from alexanderConstantinescu/kep-3836-impl
[Kube-proxy]: Implement KEP-3836
2023-07-15 05:43:04 -07:00
Dan Winship
883d0c3b71 Add a dummy implementation of proxyutil.LineBuffer
Rather than actually assembling all of the rules we aren't going to
use, just count them and throw them away.
2023-07-14 08:38:25 -04:00
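To illustrate the idea of a counting-only buffer, here is a minimal sketch. The `lineBuffer` interface below is a simplified, hypothetical stand-in for proxyutil.LineBuffer, not its real method set, and the chain names are made up. Both implementations report the same line count, but only one of them keeps the text.

```go
package main

import (
	"bytes"
	"fmt"
)

// lineBuffer is a simplified, hypothetical stand-in for proxyutil.LineBuffer.
type lineBuffer interface {
	Write(args ...string)
	Lines() int
}

// realLineBuffer assembles the full text of every line it is given.
type realLineBuffer struct {
	buf   bytes.Buffer
	lines int
}

func (b *realLineBuffer) Write(args ...string) {
	for i, arg := range args {
		if i > 0 {
			b.buf.WriteByte(' ')
		}
		b.buf.WriteString(arg)
	}
	b.buf.WriteByte('\n')
	b.lines++
}

func (b *realLineBuffer) Lines() int { return b.lines }

// discardLineBuffer only counts lines; the text is thrown away.
type discardLineBuffer struct{ lines int }

func (b *discardLineBuffer) Write(args ...string) { b.lines++ }
func (b *discardLineBuffer) Lines() int           { return b.lines }

// writeRules writes the same (hypothetical) rules to either implementation.
func writeRules(lb lineBuffer) {
	lb.Write("-A", "KUBE-SVC-EXAMPLE", "-j", "KUBE-SEP-A")
	lb.Write("-A", "KUBE-SVC-EXAMPLE", "-j", "KUBE-SEP-B")
}

func main() {
	full := &realLineBuffer{}
	discard := &discardLineBuffer{}
	writeRules(full)
	writeRules(discard)
	fmt.Println(full.Lines(), discard.Lines()) // 2 2
}
```

Because callers only depend on the interface, the rule-generation code is unchanged; it simply gets a cheaper buffer when the output will not be used.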
Alexander Constantinescu
9b1c4c7b57 Implement KEP-3836
TL;DR: we want to start failing the LB HC if a node is tainted with ToBeDeletedByClusterAutoscaler.
This field might need refinement, but currently is deemed our best way of understanding if
a node is about to get deleted. We want to do this only for eTP:Cluster services.

The goal is to enable connection draining for terminating nodes.
2023-07-10 10:30:54 +02:00
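The health-check decision described above can be sketched as follows. The `taint` type and the function names are hypothetical simplifications (no Kubernetes API types are used). Only the taint key `ToBeDeletedByClusterAutoscaler` comes from the commit message.

```go
package main

import (
	"fmt"
	"net/http"
)

// taint is a minimal, hypothetical stand-in for a Kubernetes node taint.
type taint struct {
	Key string
}

// toBeDeletedTaint is the cluster-autoscaler taint named in the commit message.
const toBeDeletedTaint = "ToBeDeletedByClusterAutoscaler"

// nodeBeingDeleted reports whether the node carries the autoscaler deletion taint.
func nodeBeingDeleted(taints []taint) bool {
	for _, t := range taints {
		if t.Key == toBeDeletedTaint {
			return true
		}
	}
	return false
}

// lbHealthStatus returns the status an LB health check for an eTP:Cluster
// service would see in this sketch: 503 once the node is marked for
// deletion (so the LB stops sending new connections), 200 otherwise.
func lbHealthStatus(taints []taint) int {
	if nodeBeingDeleted(taints) {
		return http.StatusServiceUnavailable
	}
	return http.StatusOK
}

func main() {
	fmt.Println(lbHealthStatus(nil))                                  // 200
	fmt.Println(lbHealthStatus([]taint{{Key: toBeDeletedTaint}}))     // 503
}
```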
Dan Winship
68ed020b2a Split IptablesRulesTotal metric into two different metrics
Historically, IptablesRulesTotal could have been interpreted as either
"the total number of iptables rules kube-proxy is responsible for" or
"the number of iptables rules kube-proxy rewrote on the last sync".
Post-MinimizeIPTablesRestore, these are very different things (and
IptablesRulesTotal unintentionally became the latter).

Fix IptablesRulesTotal (sync_proxy_rules_iptables_total) to be "the
total number of iptables rules kube-proxy is responsible for" and add
IptablesRulesLastSync (sync_proxy_rules_iptables_last) to be "the
number of iptables rules kube-proxy rewrote on the last sync".
2023-07-07 09:04:04 -04:00
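The difference between the two metrics can be sketched as below. The `chain` type and `ruleMetrics` function are hypothetical; the sketch only models the distinction the commit draws: "total" counts every rule kube-proxy owns, while "last sync" counts only the rules actually rewritten (all of them on a full sync, only changed chains on a partial one).

```go
package main

import "fmt"

// chain is a hypothetical summary of one kube-proxy iptables chain.
type chain struct {
	Name    string
	Rules   int  // rules kube-proxy owns in this chain
	Changed bool // whether this chain is rewritten on the current sync
}

// ruleMetrics returns (total, lastSync): every owned rule, and the rules
// written out by this sync.
func ruleMetrics(chains []chain, fullSync bool) (total, lastSync int) {
	for _, c := range chains {
		total += c.Rules
		if fullSync || c.Changed {
			lastSync += c.Rules
		}
	}
	return total, lastSync
}

func main() {
	chains := []chain{
		{Name: "KUBE-SERVICES", Rules: 10, Changed: true},
		{Name: "KUBE-SVC-A", Rules: 3, Changed: false},
		{Name: "KUBE-SVC-B", Rules: 4, Changed: true},
	}
	total, last := ruleMetrics(chains, false)
	fmt.Println(total, last) // 17 14
}
```

Before the split, a single metric computed like `lastSync` here was being reported under the name that implied `total`.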
Dan Winship
02c59710ea Test the IptablesRulesTotal metric in TestSyncProxyRulesRepeated
This required fixing a small bug in the metric, where it had
previously been counting the "-X" lines that had been passed to
iptables-restore to delete stale chains, rather than only counting the
actual rules.
2023-07-06 15:48:48 -04:00
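The counting bug can be illustrated with a small parser over iptables-restore input. `countRules` is a hypothetical helper, not kube-proxy code: it counts only `-A` (append-rule) lines and ignores table headers, chain declarations, `-X` chain deletions, and `COMMIT`.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// countRules counts actual rules ("-A" lines) in iptables-restore input,
// ignoring "*table" headers, ":CHAIN" declarations, "-X" deletions, and COMMIT.
func countRules(restoreInput string) int {
	n := 0
	sc := bufio.NewScanner(strings.NewReader(restoreInput))
	for sc.Scan() {
		if strings.HasPrefix(sc.Text(), "-A ") {
			n++
		}
	}
	return n
}

func main() {
	input := `*nat
:KUBE-SERVICES - [0:0]
:KUBE-SVC-A - [0:0]
:KUBE-SEP-A - [0:0]
-A KUBE-SERVICES -j KUBE-SVC-A
-A KUBE-SVC-A -j KUBE-SEP-A
-X KUBE-SVC-STALE
COMMIT
`
	fmt.Println(countRules(input)) // 2 (the "-X" line is not a rule)
}
```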
Dan Winship
4962e6eacb Squash detectNodeIP and nodeIPTuple together 2023-06-06 20:48:00 -04:00
Dan Winship
f3ba935336 Consistently use proxyutil as the name for pkg/proxy/util
Some places were using utilproxy, but that implies that it's
pkg/util/proxy...
2023-05-30 12:18:49 -04:00
Kubernetes Prow Robot
b2a1855f2c
Merge pull request #118088 from danwinship/kube-proxy-belated-cleanup
belated cleanup of some kube-proxy stuff for old versions
2023-05-18 13:18:34 -07:00
Dan Winship
80b9c85361 belated cleanup of some kube-proxy stuff for old versions 2023-05-17 18:34:27 -04:00
Dan Winship
0e456dcf86 Clarify localhost nodeport comments/errors 2023-05-16 09:14:11 -04:00
Dan Winship
a744a186b6 Rename GetNodeAddresses to GetNodeIPs, return net.IP 2023-05-16 09:14:09 -04:00
Dan Winship
2ca215fd99 Add NodePortAddresses.MatchAll()
Rather than having GetNodeAddresses() return a special magic value
indicating that it matches all IPs, add a separate method to check
that. (And have GetNodeAddresses() just return the IPs as expected
instead.)
2023-05-16 09:09:24 -04:00
Dan Winship
9ac657bb94 Make NodePortAddresses explicitly IP-family-specific
Both proxies handle IPv4 and IPv6 nodeport addresses separately, but
GetNodeAddresses went out of its way to make that difficult. Fix that.

This commit does not change any externally-visible semantics, but it
makes the existing weird semantics more obvious. Specifically, if you
say "--nodeport-addresses 10.0.0.0/8,192.168.0.0/16", then the
dual-stack proxy code would have split that into a list of IPv4 CIDRs
(["10.0.0.0/8", "192.168.0.0/16"]) to pass to the IPv4 proxier, and a
list of IPv6 CIDRs ([]) to pass to the IPv6 proxier, and then the IPv6
proxier would say "well since the list of nodeport addresses is empty,
I'll listen on all IPv6 addresses", which probably isn't what you
meant, but that's what it did.
2023-05-15 10:53:44 -04:00
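The surprising dual-stack behavior described above can be reproduced with a short sketch. `nodePortAddresses`, `newNodePortAddresses`, and this `MatchAll` are simplified, hypothetical versions (requires Go 1.18+ for `net/netip`); here `MatchAll` treats only an empty per-family list as "match all IPs", which is the case the commit message describes.

```go
package main

import (
	"fmt"
	"net/netip"
)

// nodePortAddresses is a hypothetical per-family --nodeport-addresses filter.
type nodePortAddresses struct {
	cidrs []netip.Prefix
}

// newNodePortAddresses keeps only the CIDRs of the requested IP family.
// Unparseable CIDRs are skipped here for brevity.
func newNodePortAddresses(cidrStrings []string, ipv6 bool) nodePortAddresses {
	var npa nodePortAddresses
	for _, s := range cidrStrings {
		p, err := netip.ParsePrefix(s)
		if err != nil {
			continue
		}
		if p.Addr().Is6() == ipv6 {
			npa.cidrs = append(npa.cidrs, p)
		}
	}
	return npa
}

// MatchAll reports whether all node IPs of this family would be used;
// in this sketch that is the case exactly when the filtered list is empty.
func (npa nodePortAddresses) MatchAll() bool {
	return len(npa.cidrs) == 0
}

func main() {
	cidrs := []string{"10.0.0.0/8", "192.168.0.0/16"}
	v4 := newNodePortAddresses(cidrs, false)
	v6 := newNodePortAddresses(cidrs, true)
	fmt.Println(len(v4.cidrs), v4.MatchAll()) // 2 false
	fmt.Println(len(v6.cidrs), v6.MatchAll()) // 0 true: IPv6 listens everywhere
}
```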
Dan Winship
c3971002c9 MinimizeIPTablesRestore to GA 2023-05-09 18:19:00 -04:00
Dan Winship
cd51c1803e Add new partial/full sync time metrics for iptables kube-proxy 2023-05-05 22:48:45 -04:00
Daman
c2c8b8d178 pkg/proxy: using generic sets

Signed-off-by: Daman <aroradaman@gmail.com>
2023-05-05 14:29:23 +05:30
Daman
a6339e2702 proxy/conntrack: using common conntrack cleaning function in proxiers 2023-04-16 15:59:14 +05:30
Daman
efb0563094 proxy/conntrack: moved pkg/util/conntrack -> pkg/proxy/conntrack 2023-04-16 15:52:52 +05:30
Dan Winship
2bb35e08f4 Clarify kubelet/kube-proxy iptables rule skew constraints 2023-04-13 14:05:58 -04:00
Dan Winship
7696bcd10c Remove some now-obviously-unnecessary checks
Now that the endpoint update fields have names that make it clear that
they only contain UDP objects, it's obvious that the "protocol == UDP"
checks in the iptables and ipvs proxiers were no-ops, so remove them.
2023-03-14 12:18:58 -04:00
Dan Winship
dea8e34ea7 Improve the naming of the stale-conntrack-entry-tracking fields
The APIs talked about "stale services" and "stale endpoints", but the
thing that is actually "stale" is the conntrack entries, not the
services/endpoints. Fix the names to indicate what they actually keep
track of.

Also, all three fields (2 in the endpoints update object and 1 in the
service update object) are currently UDP-specific, but only the
service one made that clear. Fix that too.
2023-03-14 12:18:58 -04:00
Dan Winship
4381973a44 Revert (most of) "Issue 70020; Flush Conntrack entities for SCTP"
This commit did not actually work; in between when it was first
written and tested, and when it merged, the code in
pkg/proxy/endpoints.go was changed to only add UDP endpoints to the
"stale endpoints"/"stale services" lists, and so checking for "either
UDP or SCTP" rather than just UDP when processing those lists had no
effect.

This reverts most of commit aa8521df66
(but leaves the changes related to
ipvs.IsRsGracefulTerminationNeeded() since that actually did have the
effect it meant to have).
2023-03-14 12:18:58 -04:00
Kubernetes Prow Robot
611273a5bb
Merge pull request #115253 from danwinship/proxy-update-healthchecknodeport
Split out HealthCheckNodePort stuff from service/endpoint map Update()
2023-03-13 15:22:48 -07:00
Alexander Constantinescu
ec917850af Add proxy healthz result to ETP=local health check
Today, the health check response to the load balancers asking Kube-proxy for
the status of ETP:Local services does not include the healthz state of Kube-
proxy. This means that Kube-proxy might indicate to load balancers that they
should forward traffic to the node in question simply because the endpoint
is running on the node; this overlooks the fact that Kube-proxy might be
unhealthy and might not have successfully written the rules enabling traffic
to reach the endpoint.
2023-03-06 10:53:17 +01:00
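The combined check can be sketched as a single decision function. `localServiceHealth` is a hypothetical name and the status codes are an assumption chosen for illustration; the point is that a local endpoint alone is no longer sufficient: kube-proxy itself must also be healthy.

```go
package main

import (
	"fmt"
	"net/http"
)

// localServiceHealth returns the status an LB health check for an ETP:Local
// service would see in this sketch: healthy only if the node has local
// endpoints AND kube-proxy itself reports healthy.
func localServiceHealth(localEndpoints int, proxyHealthy bool) int {
	if localEndpoints > 0 && proxyHealthy {
		return http.StatusOK
	}
	return http.StatusServiceUnavailable
}

func main() {
	fmt.Println(localServiceHealth(2, true))  // 200
	fmt.Println(localServiceHealth(2, false)) // 503: endpoint exists, but rules may be stale
	fmt.Println(localServiceHealth(0, true))  // 503: no local endpoint
}
```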
Dan Winship
0c2711bf24 Make NodePortAddresses abstraction around GetNodeAddresses/ContainsIPv4Loopback 2023-02-22 08:32:19 -05:00
Dan Winship
d43878f970 Put all iptables nodeport address handling in one place
For some reason we were calculating the available nodeport IPs at the
top of syncProxyRules even though we didn't use them until the end.
(Well, the previous code avoided generating KUBE-NODEPORTS chain rules
if there were no node IPs available, but that case is considered an
error anyway, so there's no need to optimize it.)

(Also fix a stale `err` reference exposed by this move.)
2023-02-22 08:30:36 -05:00
Kubernetes Prow Robot
c94f708ce4
Merge pull request #114470 from danwinship/kep-3178-fixups
KEP-3178-related iptables rule fixups
2023-02-21 14:24:08 -08:00
Dan Winship
d901992eae Split out HealthCheckNodePort stuff from service/endpoint map Update()
In addition to actually updating their data from the provided list of
changes, EndpointsMap.Update() and ServicePortMap.Update() return a
struct with some information about things that changed because of that
update (e.g. services with stale conntrack entries).

For some reason, they were also returning information about
HealthCheckNodePorts, but they were returning *static* information
based on the current (post-Update) state of the map, not information
about what had *changed* in the update. Since this doesn't match how
the other data in the struct is used (and since there's no reason to
have the data only be returned when you call Update() anyway), split
it out.
2023-01-22 10:33:33 -05:00
Dan Winship
b9bc0e5ac8 Ensure needFullSync is set at iptables proxy startup
The unit tests were broken with MinimizeIPTablesRestore enabled
because syncProxyRules() assumed that needFullSync would be set on the
first (post-setInitialized()) run, but the unit tests didn't ensure
that.

(In fact, there was a race condition in the real Proxier case as well;
theoretically syncProxyRules() could be run by the
BoundedFrequencyRunner after OnServiceSynced() called setInitialized()
but before it called forceSyncProxyRules(), thus causing the first
real sync to try to do a partial sync and fail. This is now fixed as
well.)
2023-01-18 10:50:12 -05:00
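One way to close the race described above is to set needFullSync when the proxier is constructed, so the first sync after initialization is full regardless of which caller triggers it. The sketch below is a heavily simplified, hypothetical model (the `proxier` fields and string return values are illustrative only), not the actual fix.

```go
package main

import (
	"fmt"
	"sync"
)

// proxier is a hypothetical, simplified model of the iptables proxier's sync state.
type proxier struct {
	mu           sync.Mutex
	initialized  bool
	needFullSync bool
}

// newProxier sets needFullSync up front, so the first real sync is always full.
func newProxier() *proxier {
	return &proxier{needFullSync: true}
}

func (p *proxier) setInitialized() {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.initialized = true
}

// syncProxyRules returns which kind of sync would run: "skip", "full", or "partial".
func (p *proxier) syncProxyRules() string {
	p.mu.Lock()
	defer p.mu.Unlock()
	if !p.initialized {
		return "skip"
	}
	if p.needFullSync {
		p.needFullSync = false
		return "full"
	}
	return "partial"
}

func main() {
	p := newProxier()
	p.setInitialized()
	// Even if a periodic runner fires here, before any explicit forced sync,
	// the first real sync is still a full one.
	fmt.Println(p.syncProxyRules()) // full
	fmt.Println(p.syncProxyRules()) // partial
}
```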
Dan Winship
169604d906 Validate single-stack --nodeport-addresses sooner
In the dual-stack case, iptables.NewDualStackProxier and
ipvs.NewDualStackProxier filtered the nodeport addresses values by IP
family before creating the single-stack proxiers. But in the
single-stack case, the kube-proxy startup code just passed the value
to the single-stack proxiers without validation, so they had to
re-check it themselves. Fix that.
2023-01-03 09:01:45 -05:00
Dan Winship
e7ed7220eb Explicitly pass IP family to proxier
Rather than re-determining it from the iptables object in both proxies.
2023-01-03 09:01:45 -05:00
Dan Winship
0ea0295965 Duplicate the "anti-martian-packet" rule in kube-proxy
This rule was mistakenly added to kubelet even though it only applies
to kube-proxy's traffic. We do not want to remove it from kubelet yet
because other components may be depending on it for security, but we
should make kube-proxy output its own rule rather than depending on
kubelet.
2022-12-29 16:24:58 -05:00
Dan Winship
305641bd4c Add iptablesKubeletJumpChains to iptables proxier
Some of the chains kube-proxy creates are also created by kubelet; we
need to ensure that those chains exist but we should not delete them
in CleanupLeftovers().
2022-12-29 16:24:58 -05:00
Dan Winship
1870c4cdd7 Add a comment-only rule to the end of KUBE-FW-* chains
With the removal of the "-j KUBE-MARK-DROP" rules, the firewall chains
end rather ambiguously. Add a comment-only rule explaining what will
happen.
2022-12-29 16:24:58 -05:00
Dan Winship
bfa4948bb6 Don't re-run EnsureChain/EnsureRules on partial syncs
We currently invoke /sbin/iptables 24 times on each syncProxyRules
before calling iptables-restore. Since even trivial iptables
invocations are slow on hosts with lots of iptables rules, this adds a
lot of time to each sync. Since these checks are expected to be a
no-op 99% of the time, skip them on partial syncs.
2022-11-29 09:42:49 -05:00
cyclinder
4aff0dba0d
kube-proxy iptables: update log message 2022-11-04 10:07:15 +08:00
Kubernetes Prow Robot
d86c013b0d
Merge pull request #108250 from cyclinder/add_flag_in_proxy
kube-proxy: add a flag to disable nodePortOnLocalhost
2022-11-03 17:10:13 -07:00
Manav Agarwal
3320e50e24 If applied, this commit will refactor variable names in kube-proxy 2022-11-03 03:45:57 +05:30
cyclinder
bef2070031
kube-proxy: add a flag to disable allowing NodePort services to be accessed via localhost 2022-11-02 16:17:52 +08:00
Dan Winship
818de5a545 proxy/iptables: Add metric for partial sync failures, add test 2022-09-26 16:31:42 -04:00
Dan Winship
ab326d2f4e proxy/iptables: Don't rewrite chains that haven't changed
iptables-restore requires that if you change any rule in a chain, you
have to rewrite the entire chain. But if you avoid mentioning a chain
at all, it will leave it untouched. Take advantage of this by not
rewriting the SVC, SVL, EXT, FW, and SEP chains for services that have
not changed since the last sync, which should drastically cut down on
the size of each iptables-restore in large clusters.
2022-09-26 16:30:42 -04:00
Dan Winship
9f69a3a9d4 kube-proxy: remove iptables-to-userspace fallback
Back when iptables was first made the default, there were
theoretically some users who wouldn't have been able to support it due
to having an old /sbin/iptables. But kube-proxy no longer does the
things that didn't work with old iptables, and we removed that check a
long time ago. There is also a check for a new-enough kernel version,
but it's checking for a feature which was added in kernel 3.6, and no
one could possibly be running Kubernetes with a kernel that old. The
fallback code therefore never actually falls back, and should just be
removed.
2022-08-16 09:21:34 -04:00
Dan Winship
f65fbc877b proxy/iptables: remove last references to KUBE-MARK-DROP 2022-07-28 09:03:49 -04:00
Dan Winship
9313188909 proxy/iptables: Don't use KUBE-MARK-DROP for LoadBalancerSourceRanges 2022-07-28 09:03:46 -04:00
Kubernetes Prow Robot
ce433f87b4
Merge pull request #110266 from danwinship/minimize-prep-reorg
iptables proxy reorg in preparation for minimizing iptables-restore
2022-07-27 04:06:30 -07:00
Dan Williams
f197509879 proxy: queue syncs on node events rather than syncing immediately
The proxies watch node labels for topology changes, but node labels
can change in bursts, especially in larger clusters. This causes
pressure on all proxies because they can't filter the events, since
the topology could match on any label.

Change node event handling to queue the request rather than immediately
syncing. The sync runner can already handle short bursts, so this shouldn't
change behavior in most cases.

Signed-off-by: Dan Williams <dcbw@redhat.com>
2022-07-18 09:21:52 -05:00
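The effect of queueing instead of syncing immediately can be illustrated with a coalescing queue built on a one-slot buffered channel. `syncQueue` is a hypothetical, simplified illustration, not kube-proxy's BoundedFrequencyRunner (which additionally enforces minimum and maximum sync intervals). However many requests arrive while one is pending, they collapse into a single sync.

```go
package main

import "fmt"

// syncQueue coalesces sync requests: repeated Queue calls made while a
// request is already pending collapse into one pending sync.
type syncQueue struct {
	pending chan struct{}
}

func newSyncQueue() *syncQueue {
	return &syncQueue{pending: make(chan struct{}, 1)}
}

// Queue requests a sync without blocking; it is a no-op if one is already pending.
func (q *syncQueue) Queue() {
	select {
	case q.pending <- struct{}{}:
	default:
	}
}

// RunPending runs syncFn once if a request is pending and reports whether it ran.
func (q *syncQueue) RunPending(syncFn func()) bool {
	select {
	case <-q.pending:
		syncFn()
		return true
	default:
		return false
	}
}

func main() {
	q := newSyncQueue()
	syncs := 0
	// A burst of 100 node label updates...
	for i := 0; i < 100; i++ {
		q.Queue()
	}
	// ...results in a single sync.
	for q.RunPending(func() { syncs++ }) {
	}
	fmt.Println(syncs) // 1
}
```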
Dan Winship
367f18c49b proxy/iptables: move firewall chain setup
Part of reorganizing the syncProxyRules loop to do:
  1. figure out what chains are needed, mark them in activeNATChains
  2. write servicePort jump rules to KUBE-SERVICES/KUBE-NODEPORTS
  3. write servicePort-specific chains (SVC, SVL, EXT, FW, SEP)

This moves the FW chain creation to the end (rather than having it in
the middle of adding the jump rules for the LB IPs).
2022-07-09 07:08:42 -04:00