Commit Graph

416 Commits

Author SHA1 Message Date
Dan Winship
8ef1255cdd proxy/iptables: Abstract out code for writing service-chain-to-endpoint-chain rules
The same code appeared twice, once for the SVC chain and once for the
XLB chain, with the only difference being that the XLB version had
more verbose comments.
2021-11-09 20:59:33 -05:00
Dan Winship
4c64008181 proxy/iptables: Abstract out shared OpenLocalPort code
Also, in the NodePort code, fix it to properly take advantage of the
fact that GetNodeAddresses() guarantees that if it returns a
"match-all" CIDR, then it doesn't return anything else. That also
makes it unnecessary to loop over the node addresses twice.
2021-11-09 20:59:30 -05:00
Dan Winship
9cd0552ddd proxy/iptables: Remove unnecessary /32 and /128 in iptables rules
If you pass just an IP address to "-s" or "-d", the iptables command
will fill in the correct mask automatically.

Originally, the proxier was just hardcoding "/32" for all of these,
which was unnecessary but simple. But when IPv6 support was added, the
code was made more complicated to deal with the fact that the "/32"
needed to be "/128" in the IPv6 case, so it would parse the IPs to
figure out which family they were, which in turn involved adding some
checks in case the parsing fails (even though that "can't happen" and
the old code didn't check for invalid IPs, even though that would
break the iptables-restore if there had been any).

Anyway, all of that is unnecessary because we can just pass the IP
strings to iptables directly rather than parsing and unparsing them
first.

(The diff to proxier_test.go is just deleting "/32" everywhere.)
2021-11-09 09:32:50 -05:00
Dan Winship
62672d06e6 proxy/iptables: fix a bug in node address error handling
If GetNodeAddresses() fails (eg, because you passed the wrong CIDR to
`--nodeport-addresses`), then any NodePort services would end up with
only half a set of iptables rules. Fix it to just not output the
NodePort-specific parts in that case (in addition to logging an error
about the GetNodeAddresses() failure).
2021-11-09 09:32:50 -05:00
Dan Winship
ab67a942ca proxy/iptables, proxy/ipvs: Remove an unnecessary check
The iptables and ipvs proxiers both had a check that none of the
elements of svcInfo.LoadBalancerIPStrings() were "", but that was
already guaranteed by the svcInfo code. Drop the unnecessary checks
and remove a level of indentation.
2021-11-09 09:32:50 -05:00
Tim Hockin
731dc8cf74
Fix regression in kube-proxy (#106214)
* Fix regression in kube-proxy

Don't use a prepend() - that allocates.  Instead, make Write() take
either strings or slices (I wish we could express that better).

* WIP: switch to intf

* WIP: less appends

* tests and ipvs
2021-11-08 15:14:49 -08:00
Kubernetes Prow Robot
0940dd6fc4
Merge pull request #106163 from aojea/conntrack_readiness
kube-proxy consider endpoint readiness to delete UDP stale conntrack entries
2021-11-08 13:11:44 -08:00
Tim Hockin
f662170ff7 kube-proxy: make iptables buffer-writing cleaner 2021-11-05 12:28:19 -07:00
Tim Hockin
f558554ce0 kube-proxy: minor cleanup
Get rid of overlapping helper functions.
2021-11-05 12:28:19 -07:00
Antonio Ojea
909925b492 kube-proxy: fix stale detection logic
The logic to detect stale endpoints was not assuming the endpoint
readiness.

We can have stale entries on UDP services for 2 reasons:
- an endpoint was receiving traffic and is removed or replaced
- a service was receiving traffic but not forwarding it, and starts
to forward it.

Add an e2e test to cover the regression
2021-11-05 20:14:56 +01:00
Dan Winship
229ae58520 proxy/iptables: fix all-vs-ready endpoints a bit
Filter the allEndpoints list into readyEndpoints sooner, and set
"hasEndpoints" based (mostly) on readyEndpoints, not allEndpoints (so
that, eg, we correctly generate REJECT rules for services with no
_functioning_ endpoints, even if they have unusable terminating
endpoints).

Also, write out the endpoint chains at the top of the loop when we
iterate the endpoints for the first time, rather than copying some of
the data to another set of variables and then writing them out later.
And don't write out endpoint chains that won't be used

Also, generate affinity rules only for readyEndpoints rather than
allEndpoints, so affinity gets broken correctly when an endpoint
becomes unready.
2021-11-04 16:32:08 -04:00
Dan Winship
3679639cf1 proxy/iptables: Remove a no-op check
There was code to deal with endpoints that have invalid/empty IP
addresses, but EndpointSlice validation already ensures that these
can't exist.
2021-11-04 16:32:08 -04:00
Dan Winship
08680192fb proxy/iptables: Fix sync_proxy_rules_iptables_total metric
It was counting the number of lines including the "COMMIT" line at the
end, so it was off by one.
2021-11-04 16:30:12 -04:00
Shivanshu Raj Shrivastava
81636f2158
Fixed improperly migrated logs (#105763)
* fixed improperly migrated logs

* small fixes

* small fix

* Update pkg/proxy/iptables/proxier.go

Co-authored-by: Marek Siarkowicz <marek.siarkowicz@protonmail.com>

* Update pkg/proxy/healthcheck/service_health.go

Co-authored-by: Marek Siarkowicz <marek.siarkowicz@protonmail.com>

* Update pkg/proxy/iptables/proxier.go

Co-authored-by: Marek Siarkowicz <marek.siarkowicz@protonmail.com>

* Update pkg/proxy/iptables/proxier.go

Co-authored-by: Marek Siarkowicz <marek.siarkowicz@protonmail.com>

* Update pkg/proxy/iptables/proxier.go

Co-authored-by: Marek Siarkowicz <marek.siarkowicz@protonmail.com>

* Update pkg/proxy/iptables/proxier.go

Co-authored-by: Marek Siarkowicz <marek.siarkowicz@protonmail.com>

* Update pkg/proxy/ipvs/proxier.go

Co-authored-by: Marek Siarkowicz <marek.siarkowicz@protonmail.com>

* Update pkg/proxy/ipvs/proxier.go

Co-authored-by: Marek Siarkowicz <marek.siarkowicz@protonmail.com>

* Update pkg/proxy/ipvs/proxier.go

Co-authored-by: Marek Siarkowicz <marek.siarkowicz@protonmail.com>

* Update pkg/proxy/winkernel/proxier.go

Co-authored-by: Marek Siarkowicz <marek.siarkowicz@protonmail.com>

* Update pkg/proxy/winkernel/proxier.go

Co-authored-by: Marek Siarkowicz <marek.siarkowicz@protonmail.com>

* Update pkg/proxy/winkernel/proxier.go

Co-authored-by: Marek Siarkowicz <marek.siarkowicz@protonmail.com>

* Update pkg/proxy/winkernel/proxier.go

Co-authored-by: Marek Siarkowicz <marek.siarkowicz@protonmail.com>

* Update pkg/proxy/winkernel/proxier.go

Co-authored-by: Marek Siarkowicz <marek.siarkowicz@protonmail.com>

* Update pkg/proxy/winkernel/proxier.go

Co-authored-by: Marek Siarkowicz <marek.siarkowicz@protonmail.com>

* Update pkg/proxy/winkernel/proxier.go

Co-authored-by: Marek Siarkowicz <marek.siarkowicz@protonmail.com>

* Update pkg/proxy/winkernel/proxier.go

Co-authored-by: Marek Siarkowicz <marek.siarkowicz@protonmail.com>

* Update pkg/proxy/winkernel/proxier.go

Co-authored-by: Marek Siarkowicz <marek.siarkowicz@protonmail.com>

* Update pkg/proxy/winkernel/proxier.go

Co-authored-by: Marek Siarkowicz <marek.siarkowicz@protonmail.com>

* Update pkg/proxy/winkernel/proxier.go

Co-authored-by: Marek Siarkowicz <marek.siarkowicz@protonmail.com>

* Update pkg/proxy/winkernel/proxier.go

Co-authored-by: Marek Siarkowicz <marek.siarkowicz@protonmail.com>

* refactoring

* refactoring

* refactoring

* reverted some files back to master

Co-authored-by: Marek Siarkowicz <marek.siarkowicz@protonmail.com>
2021-10-20 03:55:58 -07:00
Ricardo Pchevuzinske Katz
37d11bcdaf Move node and networking related helpers from pkg/util to component helpers
Signed-off-by: Ricardo Katz <rkatz@vmware.com>
2021-09-16 17:00:19 -03:00
Kubernetes Prow Robot
648559b63e
Merge pull request #104742 from khenidak/health-check-port
change health-check port to listen to node port addresses
2021-09-13 15:43:52 -07:00
Khaled (Kal) Henidak
acdf50fbed change proxiers to pass nodePortAddresses 2021-09-13 18:27:07 +00:00
Dan Winship
7f6fbc4482 Drop broken/no-op proxyconfig.EndpointsHandler implementations
Because the proxy.Provider interface included
proxyconfig.EndpointsHandler, all the backends needed to
implement its methods. But iptables, ipvs, and winkernel implemented
them as no-ops, and metaproxier had an implementation that wouldn't
actually work (because it couldn't handle Services with no active
Endpoints).

Since Endpoints processing in kube-proxy is deprecated (and can't be
re-enabled unless you're using a backend that doesn't support
EndpointSlice), remove proxyconfig.EndpointsHandler from the
definition of proxy.Provider and drop all the useless implementations.
2021-09-13 09:32:38 -04:00
Antonio Ojea
0cd75e8fec run hack/update-netparse-cve.sh 2021-08-20 10:42:09 +02:00
Antonio Ojea
a2a22903bc delete stale UDP conntrack entries for loadbalancer IPs 2021-07-29 17:35:07 +02:00
Swetha Repakula
0a42f7b989 Graduate EndpointSliceProxying and WindowsEndpointSliceProxying Gates 2021-07-07 13:33:30 -07:00
Kubernetes Prow Robot
96dff7d0c7
Merge pull request #102832 from Yuan-Junliang/migrateProxyEventAPI
Migrate kube-proxy event to use v1 Event API
2021-07-05 17:44:17 -07:00
Swetha Repakula
03b7a699c2 Kubeproxy uses V1 EndpointSlice 2021-06-30 18:41:57 -07:00
Yuan-Junliang
2e06066bab Migrate kube-proxy to use v1 Event API 2021-06-13 18:57:52 +08:00
Andrew Sy Kim
8c514cb232 proxier/iptables: check feature gate ProxyTerminatingEndpoints
Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>
2021-06-04 15:17:43 -04:00
Andrew Sy Kim
4c8b190372 proxier/iptables: reuse the same variable for endpointchains for better memory consumption
Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>
2021-06-04 15:17:43 -04:00
Andrew Sy Kim
732635fd4b proxier/iptables: fallback to terminating endpoints if there are no ready endpoints
Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>
2021-06-04 15:15:40 -04:00
刁浩 10284789
580b557592 Log spelling formatting and a redundant conversion
Signed-off-by: 刁浩 10284789 <diao.hao@zte.com.cn>
2021-05-27 07:07:22 +00:00
Antonio Ojea
c6d97ee156 kube-proxy copy node labels 2021-04-28 13:26:26 +02:00
Surya Seetharaman
d3fe48e848 Kube-proxy: perf-enhancement: Reduce NAT table KUBE-SERVICES/NODEPORTS chain rules
The nat KUBE-SERVICES chain is called from OUTPUT and PREROUTING stages. In
clusters with large number of services, the nat-KUBE-SERVICES chain is the largest
chain with for eg: 33k rules. This patch aims to move the KubeMarkMasq rules from
the kubeServicesChain into the respective KUBE-SVC-* chains. This way during each
packet-rule matching we won't have to traverse the MASQ rules of all services which
get accumulated in the KUBE-SERVICES and/or KUBE-NODEPORTS chains. Since the
jump to KUBE-MARK-MASQ ultimately sets the 0x400 mark for nodeIP SNAT, it should not
matter whether the jump is made from KUBE-SERVICES or KUBE-SVC-* chains.

Specifically we change:

1) For ClusterIP svc, we move the KUBE-MARK-MASQ jump rule from KUBE-SERVICES
chain into KUBE-SVC-* chain.
2) For ExternalIP svc, we move the KUBE-MARK-MASQ jump rule in the case of
non-ServiceExternalTrafficPolicyTypeLocal from KUBE-SERVICES
chain into KUBE-SVC-* chain.
3) For NodePorts svc, we move the KUBE-MARK-MASQ jump rule in case of
non-ServiceExternalTrafficPolicyTypeLocal from KUBE-NODEPORTS chain to
KUBE-SVC-* chain.
4) For load-balancer svc, we don't change anything since it is already svc specific
due to creation of KUBE-FW-* chains per svc.

This would cut the rules per svc in KUBE-SERVICES and KUBE-NODEPORTS in half.
2021-04-21 16:41:03 +02:00
Kubernetes Prow Robot
eda1de301a
Merge pull request #100874 from lojies/proxyiptableslog
improve the readability of log
2021-04-10 19:04:37 -07:00
卢振兴10069964
98d4bdb5d7 improve the readability of log 2021-04-07 15:10:05 +08:00
Masashi Honma
d43b8dbf4e Use simpler expressions for error messages
1. Do not describe port type in message because lp.String() already has the
information.

2. Remove duplicate error detail from event log.
Previous log is like this.

47s         Warning   listen tcp4 :30764: socket: too many open files   node/127.0.0.1   can't open port "nodePort for default/temp-svc:834" (:30764/tcp4), skipping it: listen tcp4 :30764: socket: too many open files
2021-04-01 09:13:45 +09:00
Masashi Honma
3266136c1d Fire an event when failing to open NodePort
[issue]
When creating a NodePort service with the kubectl create command, the NodePort
assignment may fail.

Failure to assign a NodePort can be simulated with the following malicious
command[1].

$ kubectl create service nodeport temp-svc --tcp=`python3 <<EOF
print("1", end="")
for i in range(2, 1026):
  print("," + str(i), end="")
EOF
`

The command succeeds and shows following output.

service/temp-svc created

The service has been successfully generated and can also be referenced with the
get command.

$ kubectl get svc
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)
temp-svc     NodePort    10.0.0.139   <none>        1:31335/TCP,2:32367/TCP,3:30263/TCP,(omitted),1023:31821/TCP,1024:32475/TCP,1025:30311/TCP   12s

The user does not recognize failure to assign a NodePort because
create/get/describe command does not show any error. This is the issue.

[solution]
Users can notice errors by looking at the kube-proxy logs, but it may be difficult to see the kube-proxy logs of all nodes.

E0327 08:50:10.216571  660960 proxier.go:1286] "can't open port, skipping this nodePort" err="listen tcp4 :30641: socket: too many open files" port="\"nodePort for default/temp-svc:744\" (:30641/tcp4)"
E0327 08:50:10.216611  660960 proxier.go:1286] "can't open port, skipping this nodePort" err="listen tcp4 :30827: socket: too many open files" port="\"nodePort for default/temp-svc:857\" (:30827/tcp4)"
...
E0327 08:50:10.217119  660960 proxier.go:1286] "can't open port, skipping this nodePort" err="listen tcp4 :32484: socket: too many open files" port="\"nodePort for default/temp-svc:805\" (:32484/tcp4)"
E0327 08:50:10.217293  660960 proxier.go:1612] "Failed to execute iptables-restore" err="pipe2: too many open files ()"
I0327 08:50:10.217341  660960 proxier.go:1615] "Closing local ports after iptables-restore failure"

So, this patch will fire an event when NodePort assignment fails.
In fact, when the externalIP assignment fails, it is also notified by event.

The event will be displayed like this.

$ kubectl get event
LAST SEEN   TYPE      REASON                                            OBJECT           MESSAGE
...
2s          Warning   listen tcp4 :31055: socket: too many open files   node/127.0.0.1   can't open "nodePort for default/temp-svc:901" (:31055/tcp4), skipping this nodePort: listen tcp4 :31055: socket: too many open files
2s          Warning   listen tcp4 :31422: socket: too many open files   node/127.0.0.1   can't open "nodePort for default/temp-svc:474" (:31422/tcp4), skipping this nodePort: listen tcp4 :31422: socket: too many open files
...

This PR fixes iptables and ipvs proxier.
Since userspace proxier does not seem to be affected by this issue, it is not fixed.

[1] Assume that fd limit is 1024(default).
$ ulimit -n
1024
2021-04-01 08:27:51 +09:00
Rob Scott
f07be06a19
Adding support for TopologyAwareHints to kube-proxy 2021-03-08 15:37:47 -08:00
Fangyuan Li
0621e90d31 Rename fields and methods for BaseServiceInfo
Fields:
1. rename onlyNodeLocalEndpoints to nodeLocalExternal;
2. rename onlyNodeLocalEndpointsForInternal to nodeLocalInternal;
Methods:
1. rename OnlyNodeLocalEndpoints to NodeLocalExternal;
2. rename OnlyNodeLocalEndpointsForInternal to NodeLocalInternal;
2021-03-07 16:52:59 -08:00
Fangyuan Li
7ed2f1d94d Implements Service Internal Traffic Policy
1. Add API definitions;
2. Add feature gate and drops the field when feature gate is not on;
3. Set default values for the field;
4. Add API Validation
5. add kube-proxy iptables and ipvs implementations
6. add tests
2021-03-07 16:52:59 -08:00
Antonio Ojea
654be57022 kube-proxy iptables expose number of rules metrics
add a new metric to kube-proxy iptables, so it exposes the number
of rules programmed in each iteration.
2021-03-05 10:00:38 +01:00
Kubernetes Prow Robot
6dc317a107
Merge pull request #98130 from JornShen/optimze_redundant_listenPortOpener
migrate to use k8s.io/util/net/port in kube-proxy
2021-02-18 10:02:51 -08:00
jornshen
dbe89a5683 migrate kube canary chain as const 2021-02-15 16:50:48 +08:00
jornshen
e68e105102 migrate to use k8s.io/util LocalPort and ListenPortOpener in iptables.proxier 2021-02-15 16:36:06 +08:00
Antonio Ojea
ed21a0e16c kube-proxy: clear conntrack entries after rules are in place
Clear conntrack entries for UDP NodePorts,
this has to be done AFTER the iptables rules are programmed.
It can happen that traffic to the NodePort hits the host before
the iptables rules are programmed this will create an stale entry
in conntrack that will blackhole the traffic, so we need to
clear it ONLY when the service has endpoints.
2021-02-10 16:22:03 +01:00
Kubernetes Prow Robot
c1b3797f4b
Merge pull request #97824 from hanlins/fix/97225/hc-rules
Explicitly add iptables rule to allow healthcheck nodeport
2021-02-04 15:54:52 -08:00
Hanlin Shi
4cd1eacbc1 Add rule to allow healthcheck nodeport traffic in filter table
1. For iptables mode, add KUBE-NODEPORTS chain in filter table. Add
   rules to allow healthcheck node port traffic.
2. For ipvs mode, add KUBE-NODE-PORT chain in filter table. Add
   KUBE-HEALTH-CHECK-NODE-PORT ipset to allow traffic to healthcheck
   node port.
2021-02-03 15:20:10 +00:00
Kubernetes Prow Robot
e89e7b4ed1
Merge pull request #98083 from JornShen/optimize_proxier_duplicate_localaddrset
optimize proxier duplicate localaddrset
2021-01-29 01:21:40 -08:00
jornshen
3f506cadb0 optimize proxier duplicate localaddrset 2021-01-29 10:52:01 +08:00
jornshen
3783821553 move the redundant writeline writeBytesLine to proxy/util/util.go 2021-01-21 10:51:39 +08:00
Kubernetes Prow Robot
eb08f36c7d
Merge pull request #96371 from andrewsykim/kube-proxy-terminating
kube-proxy: track serving/terminating conditions in endpoints cache
2021-01-11 18:38:25 -08:00
Andrew Sy Kim
9c096292cc kube-proxy: iptables proxy should ignore endpoints with condition ready=false
Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>
2021-01-11 16:27:38 -05:00
jornshen
5af5a2ac7d migrate proxy.UpdateServiceMap to be a method of ServiceMap 2021-01-11 11:07:30 +08:00