kubernetes

Author	SHA1	Message	Date
Dan Winship	e7bae9df81	Count iptables lines as we write them	2022-02-19 11:56:14 -05:00
Antonio Ojea	8b5fa408e0	kube-proxy: only set route_localnet if required kube-proxy sets the sysctl net.ipv4.conf.all.route_localnet=1 so NodePort services can be accessed on the loopback addresses in IPv4, but this may present security issues. Leverage the --nodeport-addresses flag to opt-out of this feature, if the list is not empty and none of the IP ranges contains an IPv4 loopback address this sysctl is not set. In addition, add a warning to inform users about this behavior.	2022-02-17 20:20:31 +01:00
Quan Tian	6ce612ef65	kube-proxy: fix duplicate port opening When nodePortAddresses is not specified for kube-proxy, it tried to open the node port for a NodePort service twice, triggered by IPv4ZeroCIDR and IPv6ZeroCIDR separately. The first attempt would succeed and the second one would always generate an error log like below: "listen tcp4 :30522: bind: address already in use" This patch fixes it by ensuring nodeAddresses of a proxier only contain the addresses for its IP family.	2022-01-08 02:35:35 +08:00
cyclinder	97bd6e977d	kube-proxy should log the payload when iptables-restore fails Signed-off-by: cyclinder <qifeng.guo@daocloud.io>	2021-12-23 09:50:56 +08:00
Neha Lohia	fa1b6765d5	move pkg/util/node to component-helpers/node/util (#105347) Signed-off-by: Neha Lohia <nehapithadiya444@gmail.com>	2021-11-12 07:52:27 -08:00
Quan Tian	95a706ba7c	Remove redundant forwarding rule in filter table	2021-11-11 10:27:53 +08:00
Dan Winship	8ef1255cdd	proxy/iptables: Abstract out code for writing service-chain-to-endpoint-chain rules The same code appeared twice, once for the SVC chain and once for the XLB chain, with the only difference being that the XLB version had more verbose comments.	2021-11-09 20:59:33 -05:00
Dan Winship	4c64008181	proxy/iptables: Abstract out shared OpenLocalPort code Also, in the NodePort code, fix it to properly take advantage of the fact that GetNodeAddresses() guarantees that if it returns a "match-all" CIDR, then it doesn't return anything else. That also makes it unnecessary to loop over the node addresses twice.	2021-11-09 20:59:30 -05:00
Dan Winship	9cd0552ddd	proxy/iptables: Remove unnecessary /32 and /128 in iptables rules If you pass just an IP address to "-s" or "-d", the iptables command will fill in the correct mask automatically. Originally, the proxier was just hardcoding "/32" for all of these, which was unnecessary but simple. But when IPv6 support was added, the code was made more complicated to deal with the fact that the "/32" needed to be "/128" in the IPv6 case, so it would parse the IPs to figure out which family they were, which in turn involved adding some checks in case the parsing fails (even though that "can't happen" and the old code didn't check for invalid IPs, even though that would break the iptables-restore if there had been any). Anyway, all of that is unnecessary because we can just pass the IP strings to iptables directly rather than parsing and unparsing them first. (The diff to proxier_test.go is just deleting "/32" everywhere.)	2021-11-09 09:32:50 -05:00
Dan Winship	62672d06e6	proxy/iptables: fix a bug in node address error handling If GetNodeAddresses() fails (eg, because you passed the wrong CIDR to `--nodeport-addresses`), then any NodePort services would end up with only half a set of iptables rules. Fix it to just not output the NodePort-specific parts in that case (in addition to logging an error about the GetNodeAddresses() failure).	2021-11-09 09:32:50 -05:00
Dan Winship	ab67a942ca	proxy/iptables, proxy/ipvs: Remove an unnecessary check The iptables and ipvs proxiers both had a check that none of the elements of svcInfo.LoadBalancerIPStrings() were "", but that was already guaranteed by the svcInfo code. Drop the unnecessary checks and remove a level of indentation.	2021-11-09 09:32:50 -05:00
Tim Hockin	731dc8cf74	Fix regression in kube-proxy (#106214) * Fix regression in kube-proxy Don't use a prepend() - that allocates. Instead, make Write() take either strings or slices (I wish we could express that better). * WIP: switch to intf * WIP: less appends * tests and ipvs	2021-11-08 15:14:49 -08:00
Kubernetes Prow Robot	0940dd6fc4	Merge pull request #106163 from aojea/conntrack_readiness kube-proxy consider endpoint readiness to delete UDP stale conntrack entries	2021-11-08 13:11:44 -08:00
Tim Hockin	f662170ff7	kube-proxy: make iptables buffer-writing cleaner	2021-11-05 12:28:19 -07:00
Tim Hockin	f558554ce0	kube-proxy: minor cleanup Get rid of overlapping helper functions.	2021-11-05 12:28:19 -07:00
Antonio Ojea	909925b492	kube-proxy: fix stale detection logic The logic to detect stale endpoints was not assuming the endpoint readiness. We can have stale entries on UDP services for 2 reasons: - an endpoint was receiving traffic and is removed or replaced - a service was receiving traffic but not forwarding it, and starts to forward it. Add an e2e test to cover the regression	2021-11-05 20:14:56 +01:00
Dan Winship	229ae58520	proxy/iptables: fix all-vs-ready endpoints a bit Filter the allEndpoints list into readyEndpoints sooner, and set "hasEndpoints" based (mostly) on readyEndpoints, not allEndpoints (so that, eg, we correctly generate REJECT rules for services with no _functioning_ endpoints, even if they have unusable terminating endpoints). Also, write out the endpoint chains at the top of the loop when we iterate the endpoints for the first time, rather than copying some of the data to another set of variables and then writing them out later. And don't write out endpoint chains that won't be used Also, generate affinity rules only for readyEndpoints rather than allEndpoints, so affinity gets broken correctly when an endpoint becomes unready.	2021-11-04 16:32:08 -04:00
Dan Winship	3679639cf1	proxy/iptables: Remove a no-op check There was code to deal with endpoints that have invalid/empty IP addresses, but EndpointSlice validation already ensures that these can't exist.	2021-11-04 16:32:08 -04:00
Dan Winship	08680192fb	proxy/iptables: Fix sync_proxy_rules_iptables_total metric It was counting the number of lines including the "COMMIT" line at the end, so it was off by one.	2021-11-04 16:30:12 -04:00
Shivanshu Raj Shrivastava	81636f2158	Fixed improperly migrated logs (#105763) * fixed improperly migrated logs * small fixes * small fix * Update pkg/proxy/iptables/proxier.go * Update pkg/proxy/healthcheck/service_health.go * Update pkg/proxy/ipvs/proxier.go * Update pkg/proxy/winkernel/proxier.go * refactoring * reverted some files back to master Co-authored-by: Marek Siarkowicz <marek.siarkowicz@protonmail.com>	2021-10-20 03:55:58 -07:00
Ricardo Pchevuzinske Katz	37d11bcdaf	Move node and networking related helpers from pkg/util to component helpers Signed-off-by: Ricardo Katz <rkatz@vmware.com>	2021-09-16 17:00:19 -03:00
Kubernetes Prow Robot	648559b63e	Merge pull request #104742 from khenidak/health-check-port change health-check port to listen to node port addresses	2021-09-13 15:43:52 -07:00
Khaled (Kal) Henidak	acdf50fbed	change proxiers to pass nodePortAddresses	2021-09-13 18:27:07 +00:00
Dan Winship	7f6fbc4482	Drop broken/no-op proxyconfig.EndpointsHandler implementations Because the proxy.Provider interface included proxyconfig.EndpointsHandler, all the backends needed to implement its methods. But iptables, ipvs, and winkernel implemented them as no-ops, and metaproxier had an implementation that wouldn't actually work (because it couldn't handle Services with no active Endpoints). Since Endpoints processing in kube-proxy is deprecated (and can't be re-enabled unless you're using a backend that doesn't support EndpointSlice), remove proxyconfig.EndpointsHandler from the definition of proxy.Provider and drop all the useless implementations.	2021-09-13 09:32:38 -04:00
Antonio Ojea	0cd75e8fec	run hack/update-netparse-cve.sh	2021-08-20 10:42:09 +02:00
Antonio Ojea	a2a22903bc	delete stale UDP conntrack entries for loadbalancer IPs	2021-07-29 17:35:07 +02:00
Swetha Repakula	0a42f7b989	Graduate EndpointSliceProxying and WindowsEndpointSliceProxying Gates	2021-07-07 13:33:30 -07:00
Kubernetes Prow Robot	96dff7d0c7	Merge pull request #102832 from Yuan-Junliang/migrateProxyEventAPI Migrate kube-proxy event to use v1 Event API	2021-07-05 17:44:17 -07:00
Swetha Repakula	03b7a699c2	Kubeproxy uses V1 EndpointSlice	2021-06-30 18:41:57 -07:00
Yuan-Junliang	2e06066bab	Migrate kube-proxy to use v1 Event API	2021-06-13 18:57:52 +08:00
Andrew Sy Kim	8c514cb232	proxier/iptables: check feature gate ProxyTerminatingEndpoints Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>	2021-06-04 15:17:43 -04:00
Andrew Sy Kim	4c8b190372	proxier/iptables: reuse the same variable for endpointchains for better memory consumption Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>	2021-06-04 15:17:43 -04:00
Andrew Sy Kim	732635fd4b	proxier/iptables: fallback to terminating endpoints if there are no ready endpoints Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>	2021-06-04 15:15:40 -04:00
刁浩 10284789	580b557592	Log spelling formatting and a redundant conversion Signed-off-by: 刁浩 10284789 <diao.hao@zte.com.cn>	2021-05-27 07:07:22 +00:00
Antonio Ojea	c6d97ee156	kube-proxy copy node labels	2021-04-28 13:26:26 +02:00
Surya Seetharaman	d3fe48e848	Kube-proxy: perf-enhancement: Reduce NAT table KUBE-SERVICES/NODEPORTS chain rules The nat KUBE-SERVICES chain is called from OUTPUT and PREROUTING stages. In clusters with large number of services, the nat-KUBE-SERVICES chain is the largest chain with for eg: 33k rules. This patch aims to move the KubeMarkMasq rules from the kubeServicesChain into the respective KUBE-SVC-* chains. This way during each packet-rule matching we won't have to traverse the MASQ rules of all services which get accumulated in the KUBE-SERVICES and/or KUBE-NODEPORTS chains. Since the jump to KUBE-MARK-MASQ ultimately sets the 0x400 mark for nodeIP SNAT, it should not matter whether the jump is made from KUBE-SERVICES or KUBE-SVC-* chains. Specifically we change: 1) For ClusterIP svc, we move the KUBE-MARK-MASQ jump rule from KUBE-SERVICES chain into KUBE-SVC-* chain. 2) For ExternalIP svc, we move the KUBE-MARK-MASQ jump rule in the case of non-ServiceExternalTrafficPolicyTypeLocal from KUBE-SERVICES chain into KUBE-SVC-* chain. 3) For NodePorts svc, we move the KUBE-MARK-MASQ jump rule in case of non-ServiceExternalTrafficPolicyTypeLocal from KUBE-NODEPORTS chain to KUBE-SVC-* chain. 4) For load-balancer svc, we don't change anything since it is already svc specific due to creation of KUBE-FW-* chains per svc. This would cut the rules per svc in KUBE-SERVICES and KUBE-NODEPORTS in half.	2021-04-21 16:41:03 +02:00
Kubernetes Prow Robot	eda1de301a	Merge pull request #100874 from lojies/proxyiptableslog improve the readability of log	2021-04-10 19:04:37 -07:00
卢振兴10069964	98d4bdb5d7	improve the readability of log	2021-04-07 15:10:05 +08:00
Masashi Honma	d43b8dbf4e	Use simpler expressions for error messages 1. Do not describe port type in message because lp.String() already has the information. 2. Remove duplicate error detail from event log. Previous log is like this. 47s Warning listen tcp4 :30764: socket: too many open files node/127.0.0.1 can't open port "nodePort for default/temp-svc:834" (:30764/tcp4), skipping it: listen tcp4 :30764: socket: too many open files	2021-04-01 09:13:45 +09:00
Masashi Honma	3266136c1d	Fire an event when failing to open NodePort [issue] When creating a NodePort service with the kubectl create command, the NodePort assignment may fail. Failure to assign a NodePort can be simulated with the following malicious command[1]. $ kubectl create service nodeport temp-svc --tcp=`python3 <<EOF print("1", end="") for i in range(2, 1026): print("," + str(i), end="") EOF ` The command succeeds and shows following output. service/temp-svc created The service has been successfully generated and can also be referenced with the get command. $ kubectl get svc NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) temp-svc NodePort 10.0.0.139 <none> 1:31335/TCP,2:32367/TCP,3:30263/TCP,(omitted),1023:31821/TCP,1024:32475/TCP,1025:30311/TCP 12s The user does not recognize failure to assign a NodePort because create/get/describe command does not show any error. This is the issue. [solution] Users can notice errors by looking at the kube-proxy logs, but it may be difficult to see the kube-proxy logs of all nodes. E0327 08:50:10.216571 660960 proxier.go:1286] "can't open port, skipping this nodePort" err="listen tcp4 :30641: socket: too many open files" port="\"nodePort for default/temp-svc:744\" (:30641/tcp4)" E0327 08:50:10.216611 660960 proxier.go:1286] "can't open port, skipping this nodePort" err="listen tcp4 :30827: socket: too many open files" port="\"nodePort for default/temp-svc:857\" (:30827/tcp4)" ... E0327 08:50:10.217119 660960 proxier.go:1286] "can't open port, skipping this nodePort" err="listen tcp4 :32484: socket: too many open files" port="\"nodePort for default/temp-svc:805\" (:32484/tcp4)" E0327 08:50:10.217293 660960 proxier.go:1612] "Failed to execute iptables-restore" err="pipe2: too many open files ()" I0327 08:50:10.217341 660960 proxier.go:1615] "Closing local ports after iptables-restore failure" So, this patch will fire an event when NodePort assignment fails. In fact, when the externalIP assignment fails, it is also notified by event. The event will be displayed like this. $ kubectl get event LAST SEEN TYPE REASON OBJECT MESSAGE ... 2s Warning listen tcp4 :31055: socket: too many open files node/127.0.0.1 can't open "nodePort for default/temp-svc:901" (:31055/tcp4), skipping this nodePort: listen tcp4 :31055: socket: too many open files 2s Warning listen tcp4 :31422: socket: too many open files node/127.0.0.1 can't open "nodePort for default/temp-svc:474" (:31422/tcp4), skipping this nodePort: listen tcp4 :31422: socket: too many open files ... This PR fixes iptables and ipvs proxier. Since userspace proxier does not seem to be affected by this issue, it is not fixed. [1] Assume that fd limit is 1024(default). $ ulimit -n 1024	2021-04-01 08:27:51 +09:00
Rob Scott	f07be06a19	Adding support for TopologyAwareHints to kube-proxy	2021-03-08 15:37:47 -08:00
Fangyuan Li	0621e90d31	Rename fields and methods for BaseServiceInfo Fields: 1. rename onlyNodeLocalEndpoints to nodeLocalExternal; 2. rename onlyNodeLocalEndpointsForInternal to nodeLocalInternal; Methods: 1. rename OnlyNodeLocalEndpoints to NodeLocalExternal; 2. rename OnlyNodeLocalEndpointsForInternal to NodeLocalInternal;	2021-03-07 16:52:59 -08:00
Fangyuan Li	7ed2f1d94d	Implements Service Internal Traffic Policy 1. Add API definitions; 2. Add feature gate and drops the field when feature gate is not on; 3. Set default values for the field; 4. Add API Validation 5. add kube-proxy iptables and ipvs implementations 6. add tests	2021-03-07 16:52:59 -08:00
Antonio Ojea	654be57022	kube-proxy iptables expose number of rules metrics add a new metric to kube-proxy iptables, so it exposes the number of rules programmed in each iteration.	2021-03-05 10:00:38 +01:00
Kubernetes Prow Robot	6dc317a107	Merge pull request #98130 from JornShen/optimze_redundant_listenPortOpener migrate to use k8s.io/util/net/port in kube-proxy	2021-02-18 10:02:51 -08:00
jornshen	dbe89a5683	migrate kube canary chain as const	2021-02-15 16:50:48 +08:00
jornshen	e68e105102	migrate to use k8s.io/util LocalPort and ListenPortOpener in iptables.proxier	2021-02-15 16:36:06 +08:00
Antonio Ojea	ed21a0e16c	kube-proxy: clear conntrack entries after rules are in place Clear conntrack entries for UDP NodePorts, this has to be done AFTER the iptables rules are programmed. It can happen that traffic to the NodePort hits the host before the iptables rules are programmed this will create an stale entry in conntrack that will blackhole the traffic, so we need to clear it ONLY when the service has endpoints.	2021-02-10 16:22:03 +01:00
Kubernetes Prow Robot	c1b3797f4b	Merge pull request #97824 from hanlins/fix/97225/hc-rules Explicitly add iptables rule to allow healthcheck nodeport	2021-02-04 15:54:52 -08:00
Hanlin Shi	4cd1eacbc1	Add rule to allow healthcheck nodeport traffic in filter table 1. For iptables mode, add KUBE-NODEPORTS chain in filter table. Add rules to allow healthcheck node port traffic. 2. For ipvs mode, add KUBE-NODE-PORT chain in filter table. Add KUBE-HEALTH-CHECK-NODE-PORT ipset to allow traffic to healthcheck node port.	2021-02-03 15:20:10 +00:00
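
The opt-out rule described in commit 8b5fa408e0 (route_localnet is left unset only when the --nodeport-addresses list is non-empty and none of its ranges contains an IPv4 loopback address) can be sketched as below. This is only an illustration of the rule as the commit message states it, not kube-proxy's Go code: the function name `needs_route_localnet` is hypothetical, and treating "contains a loopback address" as "overlaps 127.0.0.0/8" is an assumption.

```python
import ipaddress

# IPv4 loopback range; a CIDR "contains an IPv4 loopback address"
# exactly when it overlaps this network (assumption of this sketch).
IPV4_LOOPBACK = ipaddress.ip_network("127.0.0.0/8")


def needs_route_localnet(nodeport_addresses):
    """Hypothetical helper: decide whether route_localnet=1 would still be set.

    nodeport_addresses is a list of CIDR strings, standing in for the
    values passed to --nodeport-addresses.
    """
    # Empty list: nothing was opted out, so the sysctl is still set.
    if not nodeport_addresses:
        return True
    for cidr in nodeport_addresses:
        net = ipaddress.ip_network(cidr, strict=False)
        # Only IPv4 ranges can cover an IPv4 loopback address.
        if net.version == 4 and net.overlaps(IPV4_LOOPBACK):
            return True
    # Non-empty list with no range touching loopback: sysctl is not set.
    return False
```

For example, `needs_route_localnet(["10.0.0.0/8"])` evaluates to False, while an empty list or a match-all range such as `0.0.0.0/0` keeps the sysctl enabled.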

422 Commits