Part of reorganizing the syncProxyRules loop to do:
1. figure out what chains are needed, mark them in activeNATChains
2. write servicePort jump rules to KUBE-SERVICES/KUBE-NODEPORTS
3. write servicePort-specific chains (SVC, SVL, EXT, FW, SEP)
This moves the FW chain creation to the end (rather than having it in
the middle of adding the jump rules for the LB IPs).
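As a rough, self-contained sketch of the loop shape this series is heading toward (the types, chain names, and rule text below are made up for illustration, not the proxier's actual code):

    package main

    import "fmt"

    // servicePort is a stand-in for the proxier's per-servicePort info.
    type servicePort struct {
        name      string
        clusterIP string
        svcChain  string // e.g. "KUBE-SVC-XXXXXXXXXXXXXXXX"
    }

    func sync(services []servicePort) {
        activeNATChains := map[string]bool{}
        var jumpRules, chainRules []string

        for _, sp := range services {
            // 1. Decide which chains this servicePort needs; mark them active.
            activeNATChains[sp.svcChain] = true

            // 2. Write the jump rule from KUBE-SERVICES into the service chain.
            jumpRules = append(jumpRules,
                fmt.Sprintf("-A KUBE-SERVICES -d %s -j %s", sp.clusterIP, sp.svcChain))

            // 3. Create and fill in the servicePort-specific chains (SVC, SVL,
            //    EXT, FW, SEP) together at the bottom of the loop.
            chainRules = append(chainRules, fmt.Sprintf(":%s - [0:0]", sp.svcChain))
            // ... endpoint-selection rules for sp.svcChain would go here ...
        }

        for _, r := range append(jumpRules, chainRules...) {
            fmt.Println(r)
        }
    }

    func main() {
        sync([]servicePort{{"ns/svc:http", "10.0.0.10", "KUBE-SVC-EXAMPLE"}})
    }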
Part of reorganizing the syncProxyRules loop to do:
1. figure out what chains are needed, mark them in activeNATChains
2. write servicePort jump rules to KUBE-SERVICES/KUBE-NODEPORTS
3. write servicePort-specific chains (SVC, SVL, EXT, FW, SEP)
This fixes the jump rules for internal traffic. Previously we were
handling "jumping from kubeServices to internalTrafficChain" and
"adding masquerade rules to internalTrafficChain" in the same place.
Part of reorganizing the syncProxyRules loop to do:
1. figure out what chains are needed, mark them in activeNATChains
2. write servicePort jump rules to KUBE-SERVICES/KUBE-NODEPORTS
3. write servicePort-specific chains (SVC, SVL, EXT, FW, SEP)
This fixes the handling of the EXT chain.
Part of reorganizing the syncProxyRules loop to do:
1. figure out what chains are needed, mark them in activeNATChains
2. write servicePort jump rules to KUBE-SERVICES/KUBE-NODEPORTS
3. write servicePort-specific chains (SVC, SVL, EXT, FW, SEP)
This fixes the handling of the SVC and SVL chains. We were already
filling them in at the bottom of the loop; now we create them there
as well.
Part of reorganizing the syncProxyRules loop to do:
1. figure out what chains are needed, mark them in activeNATChains
2. write servicePort jump rules to KUBE-SERVICES/KUBE-NODEPORTS
3. write servicePort-specific chains (SVC, SVL, EXT, FW, SEP)
This fixes the handling of the endpoint chains. Previously they were
handled entirely at the top of the loop. Now we record which ones are
in use at the top but don't create them and fill them in until the
bottom.
We now figure out early on whether we're going to end up outputting no
endpoints, so we update the metrics at that point.
(Also remove a redundant feature gate check; svcInfo already checks
the ServiceInternalTrafficPolicy feature gate itself and so
svcInfo.InternalPolicyLocal() will always return false if the gate is
not enabled.)
Rather than marking packets to be dropped in the "nat" table and then
dropping them from the "filter" table later, just use rules in
"filter" to drop the packets we don't like directly.
"iptables-save" takes several seconds to run on machines with lots of
iptables rules, and we only use its result to figure out which chains
are no longer referenced by any rules. While it makes things less
confusing if we delete unused chains immediately, it's not actually
_necessary_ since they never get called during packet processing. So
in large clusters, make it so we only clean up chains periodically
rather than on every sync.
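A minimal sketch of that gating, with hypothetical fields and an assumed interval; the real proxier's bookkeeping and threshold may differ:

    package main

    import (
        "fmt"
        "time"
    )

    // Hypothetical bookkeeping; the real proxier's fields differ.
    type proxier struct {
        largeCluster     bool
        lastChainCleanup time.Time
    }

    const chainCleanupInterval = time.Minute

    // shouldCleanupChains reports whether this sync should run iptables-save
    // to find and delete unreferenced chains. In large clusters we skip it
    // most of the time: stale chains are unreferenced, so packets never hit
    // them, and deleting them can wait.
    func (p *proxier) shouldCleanupChains() bool {
        if p.largeCluster && time.Since(p.lastChainCleanup) < chainCleanupInterval {
            return false
        }
        p.lastChainCleanup = time.Now()
        return true
    }

    func main() {
        p := &proxier{largeCluster: true, lastChainCleanup: time.Now()}
        fmt.Println(p.shouldCleanupChains()) // false: cleaned up too recently
    }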
We don't need to parse out the counter values from the iptables-save
output (since they are always 0 for the chains we care about). Just
parse the chain names themselves.
Also, all of the callers of GetChainLines() pass it input that
contains only a single table, so just assume that, rather than
carefully parsing only a single table's worth of the input.
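A simplified stand-in for that parsing, assuming single-table input; the real GetChainLines() differs in its types, but the idea is the same:

    package main

    import (
        "bufio"
        "fmt"
        "strings"
    )

    // getChainNames extracts just the chain names from (single-table)
    // iptables-save output; the counters are ignored entirely.
    func getChainNames(save string) []string {
        var chains []string
        scanner := bufio.NewScanner(strings.NewReader(save))
        for scanner.Scan() {
            line := scanner.Text()
            // Chain declarations look like ":KUBE-SERVICES - [0:0]".
            if strings.HasPrefix(line, ":") && len(line) > 1 {
                chains = append(chains, strings.Fields(line[1:])[0])
            }
        }
        return chains
    }

    func main() {
        save := "*nat\n:PREROUTING ACCEPT [0:0]\n:KUBE-SERVICES - [0:0]\nCOMMIT\n"
        fmt.Println(getChainNames(save)) // [PREROUTING KUBE-SERVICES]
    }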
The iptables and ipvs proxies have code that tries to preserve certain
iptables counters when modifying chains via iptables-restore. But the
counters in question only actually exist for the built-in chains
(e.g. INPUT, FORWARD, PREROUTING), which we never modify via
iptables-restore (and in fact *can't* safely modify via
iptables-restore). So we were really just doing a lot of unnecessary
work to copy the constant string "[0:0]" over from iptables-save
output to iptables-restore input. Stop doing that.
Also fix a confused error message when iptables-save fails.
Signed-off-by: gkarthiks <github.gkarthiks@gmail.com>
refactor: svc port name variable #108806
Signed-off-by: gkarthiks <github.gkarthiks@gmail.com>
refactor: rename struct for service port information to servicePortInfo and fields for more readability
Signed-off-by: gkarthiks <github.gkarthiks@gmail.com>
fix: drop chain rule
Signed-off-by: gkarthiks <github.gkarthiks@gmail.com>
The various loops in the LoadBalancer rule section were mis-nested
such that if a service had multiple LoadBalancer IPs, we would write
out the firewall rules multiple times (and the allowFromNode rule for
the second and later IPs would end up being written after the "else
DROP" rule from the first IP).
This makes the "destination" policy model clearer. All external
destination captures now jump to the "XLB chain, which is the main place
that masquerade is done (removing it from most other places).
This is simpler to trace - XLB *always* exists (as long as you have an
external exposure) and never gets bypassed.
Fix internal and external traffic policy to be handled separately (so
that, in particular, services with Local internal traffic policy and
Cluster external traffic policy do not behave as though they had Local
external traffic policy as well).
Additionally, traffic to an `internalTrafficPolicy: Local` service on
a node with no endpoints is now dropped rather than being rejected
(which, as in the external case, may prevent traffic from being lost
when endpoints are in flux).
Now the XLB chain _only_ implements the "short-circuit local
connections to the SVC chain" rule, and the actual endpoint selection
happens in the SVL chain.
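Illustratively (made-up chain suffixes and cluster CIDR, not the exact rules the proxier writes):

    package main

    import "fmt"

    func main() {
        // The XLB chain now just short-circuits locally-originated traffic
        // to the SVC (cluster traffic policy) chain; the SVL chain does the
        // actual Local-policy endpoint selection.
        xlbRules := []string{
            "-A KUBE-XLB-EXAMPLE -s 10.0.0.0/8 -j KUBE-SVC-EXAMPLE",
            "-A KUBE-XLB-EXAMPLE -j KUBE-SVL-EXAMPLE",
        }
        for _, r := range xlbRules {
            fmt.Println(r)
        }
    }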
Though not quite implemented yet, this will eventually also mean that
"SVC" = "Service, Cluster traffic policy" as opposed to "SVL" =
"Service, Local traffic policy".
Rather than lazily computing and then caching the endpoint chain name
because we don't have the right information at construct time, just
pass the right information at construct time and compute the chain
name then.
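A sketch of computing the name up front; the proxier uses a hash-based naming scheme along these lines, though the exact format here should be treated as an assumption:

    package main

    import (
        "crypto/sha256"
        "encoding/base32"
        "fmt"
    )

    // endpointChainName computes the SEP chain name from information that is
    // all available at construct time. (The exact hashing/encoding format is
    // an assumption here.)
    func endpointChainName(svcPortName, protocol, endpoint string) string {
        hash := sha256.Sum256([]byte(svcPortName + protocol + endpoint))
        encoded := base32.StdEncoding.EncodeToString(hash[:])
        return "KUBE-SEP-" + encoded[:16]
    }

    func main() {
        fmt.Println(endpointChainName("ns/svc:http", "tcp", "10.1.2.3:8080"))
    }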
Now that we don't have to always append all of the iptables args into
a single array, there's no reason to have LocalTrafficDetector take in
a set of args to prepend to its own output, and also not much point in
having it write out the "-j CHAIN" by itself either.
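Roughly, the interface goes from something like the first shape below to the second (method names are illustrative, not necessarily the exact ones in the tree):

    package proxyutil

    // Before: the detector took the caller's args and wrote out a complete
    // rule, including the "-j CHAIN" jump.
    type localTrafficDetectorBefore interface {
        JumpIfLocal(args []string, toChain string) []string
        JumpIfNotLocal(args []string, toChain string) []string
    }

    // After: the detector only returns its own match arguments (e.g.
    // "-s 10.0.0.0/8"); the caller assembles the rest of the rule,
    // including the jump, itself.
    type localTrafficDetectorAfter interface {
        IfLocal() []string
        IfNotLocal() []string
    }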
For each iptables-restore call, log the number of services, endpoints,
filter chains, filter rules, NAT chains, and NAT rules in the update
at V(2), in addition to logging the actual rules if V(9).
In large clusters, the iptables-restore input will be tens of
thousands of lines long, and logging it at V(5) essentially means that
"kube-proxy -v=5" cannot be used in such clusters to see _other_
things that get logged at V(5), because logs will get rolled over far
too quickly. So bump the full-rules logging output down to V(9).
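A hedged sketch of the split, assuming klog structured logging; the message and key names are illustrative:

    package main

    import "k8s.io/klog/v2"

    // logRestoreSummary sketches the V(2) summary vs. V(9) full-rules split.
    func logRestoreSummary(services, endpoints, filterChains, filterRules, natChains, natRules int, rules []byte) {
        klog.V(2).InfoS("Reloading service iptables data",
            "numServices", services,
            "numEndpoints", endpoints,
            "numFilterChains", filterChains,
            "numFilterRules", filterRules,
            "numNATChains", natChains,
            "numNATRules", natRules)
        // The full rule dump can be tens of thousands of lines in large
        // clusters, so it only appears at the very verbose V(9) level.
        klog.V(9).InfoS("Restoring iptables", "rules", rules)
    }

    func main() {
        logRestoreSummary(1, 2, 3, 4, 5, 6, []byte("*nat\nCOMMIT\n"))
    }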
kube-proxy sets the sysctl net.ipv4.conf.all.route_localnet=1 so that
NodePort services can be accessed on IPv4 loopback addresses, but this
may present security issues.
Leverage the --nodeport-addresses flag to opt out of this behavior: if
the list is not empty and none of the IP ranges contains an IPv4
loopback address, the sysctl is not set.
In addition, add a warning to inform users about this behavior.
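A self-contained sketch of the opt-out check (the helper name is made up):

    package main

    import (
        "fmt"
        "net"
    )

    // needsRouteLocalnet reports whether route_localnet should still be set:
    // only when --nodeport-addresses is empty, or when one of its IPv4
    // ranges contains a loopback address.
    func needsRouteLocalnet(nodePortAddresses []string) bool {
        if len(nodePortAddresses) == 0 {
            return true
        }
        for _, cidr := range nodePortAddresses {
            _, ipnet, err := net.ParseCIDR(cidr)
            if err != nil || ipnet.IP.To4() == nil {
                continue
            }
            if ipnet.Contains(net.IPv4(127, 0, 0, 1)) {
                return true
            }
        }
        return false
    }

    func main() {
        fmt.Println(needsRouteLocalnet(nil))                        // true
        fmt.Println(needsRouteLocalnet([]string{"192.168.0.0/24"})) // false
        fmt.Println(needsRouteLocalnet([]string{"127.0.0.0/8"}))    // true
    }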
When nodePortAddresses is not specified for kube-proxy, it tries to
open the node port for a NodePort service twice, triggered by
IPv4ZeroCIDR and IPv6ZeroCIDR separately. The first attempt succeeds
and the second one always generates an error log like:
"listen tcp4 :30522: bind: address already in use"
This patch fixes it by ensuring that a proxier's nodeAddresses only
contain the addresses for its IP family.
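A self-contained sketch of the per-family filtering (the function name is made up):

    package main

    import (
        "fmt"
        "net"
    )

    // filterByFamily keeps only the node addresses matching the proxier's IP
    // family, so the IPv4 proxier never also tries to bind a node port for
    // the IPv6 zero CIDR (and vice versa).
    func filterByFamily(addrs []string, ipv6 bool) []string {
        var out []string
        for _, a := range addrs {
            ip := net.ParseIP(a)
            if ip == nil {
                continue
            }
            if (ip.To4() == nil) == ipv6 {
                out = append(out, a)
            }
        }
        return out
    }

    func main() {
        addrs := []string{"0.0.0.0", "::", "10.0.0.1"}
        fmt.Println(filterByFamily(addrs, false)) // [0.0.0.0 10.0.0.1]
        fmt.Println(filterByFamily(addrs, true))  // [::]
    }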
The same code appeared twice, once for the SVC chain and once for the
XLB chain, with the only difference being that the XLB version had
more verbose comments.
Also, in the NodePort code, fix it to properly take advantage of the
fact that GetNodeAddresses() guarantees that if it returns a
"match-all" CIDR, then it doesn't return anything else. That also
makes it unnecessary to loop over the node addresses twice.
If you pass just an IP address to "-s" or "-d", the iptables command
will fill in the correct mask automatically.
Originally, the proxier was just hardcoding "/32" for all of these,
which was unnecessary but simple. But when IPv6 support was added, the
code was made more complicated to deal with the fact that the "/32"
needed to be "/128" in the IPv6 case, so it would parse the IPs to
figure out which family they were, which in turn meant adding checks
in case the parsing failed (even though that "can't happen"; the old
code never checked for invalid IPs, despite the fact that an invalid
IP would have broken the iptables-restore).
Anyway, all of that is unnecessary because we can just pass the IP
strings to iptables directly rather than parsing and unparsing them
first.
(The diff to proxier_test.go is just deleting "/32" everywhere.)
If GetNodeAddresses() fails (e.g., because you passed the wrong CIDR to
`--nodeport-addresses`), then any NodePort services would end up with
only half a set of iptables rules. Fix it to just not output the
NodePort-specific parts in that case (in addition to logging an error
about the GetNodeAddresses() failure).
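A self-contained sketch of the fix, with hypothetical names and illustrative rule text:

    package main

    import "fmt"

    // writeNodePortRules skips all of the NodePort-specific rules when we
    // have no usable node addresses, instead of emitting only half of them.
    func writeNodePortRules(nodePort int, nodeAddresses []string, getAddrErr error) {
        if nodePort == 0 {
            return
        }
        if getAddrErr != nil || len(nodeAddresses) == 0 {
            fmt.Println("# skipping NodePort rules: no usable node addresses")
            return
        }
        for _, addr := range nodeAddresses {
            fmt.Printf("-A KUBE-NODEPORTS -p tcp -d %s --dport %d -j KUBE-EXT-EXAMPLE\n", addr, nodePort)
        }
    }

    func main() {
        writeNodePortRules(30522, nil, fmt.Errorf("bad --nodeport-addresses"))
        writeNodePortRules(30522, []string{"192.168.1.5"}, nil)
    }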