When cleaning up iptables rules and ipsets used by kube-proxy in IPVS mode
iptables chain KUBE-NODE-PORT needs to be deleted before ipset
KUBE-HEALTH-CHECK-NODE-PORT can be removed. Therefore, deletion of
iptables chain KUBE-NODE-PORT is added in this change.
Fix internal and external traffic policy to be handled separately (so
that, in particular, services with Local internal traffic policy and
Cluster external traffic policy do not behave as though they had Local
external traffic policy as well.
Additionally, traffic to an `internalTrafficPolicy: Local` service on
a node with no endpoints is now dropped rather than being rejected
(which, as in the external case, may prevent traffic from being lost
when endpoints are in flux).
Now that we don't have to always append all of the iptables args into
a single array, there's no reason to have LocalTrafficDetector take in
a set of args to prepend to its own output, and also not much point in
having it write out the "-j CHAIN" by itself either.
The iptables and ipvs proxiers both had a check that none of the
elements of svcInfo.LoadBalancerIPStrings() were "", but that was
already guaranteed by the svcInfo code. Drop the unnecessary checks
and remove a level of indentation.
* Fix regression in kube-proxy
Don't use a prepend() - that allocates. Instead, make Write() take
either strings or slices (I wish we could express that better).
* WIP: switch to intf
* WIP: less appends
* tests and ipvs
Because the proxy.Provider interface included
proxyconfig.EndpointsHandler, all the backends needed to
implement its methods. But iptables, ipvs, and winkernel implemented
them as no-ops, and metaproxier had an implementation that wouldn't
actually work (because it couldn't handle Services with no active
Endpoints).
Since Endpoints processing in kube-proxy is deprecated (and can't be
re-enabled unless you're using a backend that doesn't support
EndpointSlice), remove proxyconfig.EndpointsHandler from the
definition of proxy.Provider and drop all the useless implementations.
The iptables rule that matches kubeNodePortLocalSetSCTP must be inserted
before the one matches kubeNodePortSetSCTP, otherwise all SCTP traffic
would be masqueraded regardless of whether its ExternalTrafficPolicy is
Local or not.
To cover the case in tests, the patch adds rule order validation to
checkIptables.
1. Do not describe port type in message because lp.String() already has the
information.
2. Remove duplicate error detail from event log.
Previous log is like this.
47s Warning listen tcp4 :30764: socket: too many open files node/127.0.0.1 can't open port "nodePort for default/temp-svc:834" (:30764/tcp4), skipping it: listen tcp4 :30764: socket: too many open files
[issue]
When creating a NodePort service with the kubectl create command, the NodePort
assignment may fail.
Failure to assign a NodePort can be simulated with the following malicious
command[1].
$ kubectl create service nodeport temp-svc --tcp=`python3 <<EOF
print("1", end="")
for i in range(2, 1026):
print("," + str(i), end="")
EOF
`
The command succeeds and shows following output.
service/temp-svc created
The service has been successfully generated and can also be referenced with the
get command.
$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S)
temp-svc NodePort 10.0.0.139 <none> 1:31335/TCP,2:32367/TCP,3:30263/TCP,(omitted),1023:31821/TCP,1024:32475/TCP,1025:30311/TCP 12s
The user does not recognize failure to assign a NodePort because
create/get/describe command does not show any error. This is the issue.
[solution]
Users can notice errors by looking at the kube-proxy logs, but it may be difficult to see the kube-proxy logs of all nodes.
E0327 08:50:10.216571 660960 proxier.go:1286] "can't open port, skipping this nodePort" err="listen tcp4 :30641: socket: too many open files" port="\"nodePort for default/temp-svc:744\" (:30641/tcp4)"
E0327 08:50:10.216611 660960 proxier.go:1286] "can't open port, skipping this nodePort" err="listen tcp4 :30827: socket: too many open files" port="\"nodePort for default/temp-svc:857\" (:30827/tcp4)"
...
E0327 08:50:10.217119 660960 proxier.go:1286] "can't open port, skipping this nodePort" err="listen tcp4 :32484: socket: too many open files" port="\"nodePort for default/temp-svc:805\" (:32484/tcp4)"
E0327 08:50:10.217293 660960 proxier.go:1612] "Failed to execute iptables-restore" err="pipe2: too many open files ()"
I0327 08:50:10.217341 660960 proxier.go:1615] "Closing local ports after iptables-restore failure"
So, this patch will fire an event when NodePort assignment fails.
In fact, when the externalIP assignment fails, it is also notified by event.
The event will be displayed like this.
$ kubectl get event
LAST SEEN TYPE REASON OBJECT MESSAGE
...
2s Warning listen tcp4 :31055: socket: too many open files node/127.0.0.1 can't open "nodePort for default/temp-svc:901" (:31055/tcp4), skipping this nodePort: listen tcp4 :31055: socket: too many open files
2s Warning listen tcp4 :31422: socket: too many open files node/127.0.0.1 can't open "nodePort for default/temp-svc:474" (:31422/tcp4), skipping this nodePort: listen tcp4 :31422: socket: too many open files
...
This PR fixes iptables and ipvs proxier.
Since userspace proxier does not seem to be affected by this issue, it is not fixed.
[1] Assume that fd limit is 1024(default).
$ ulimit -n
1024