Commit Graph

292 Commits

Author SHA1 Message Date
Masashi Honma
d43b8dbf4e Use simpler expressions for error messages
1. Do not describe port type in message because lp.String() already has the
information.

2. Remove duplicate error detail from event log.
Previous log is like this.

47s         Warning   listen tcp4 :30764: socket: too many open files   node/127.0.0.1   can't open port "nodePort for default/temp-svc:834" (:30764/tcp4), skipping it: listen tcp4 :30764: socket: too many open files
2021-04-01 09:13:45 +09:00
Masashi Honma
3266136c1d Fire an event when failing to open NodePort
[issue]
When creating a NodePort service with the kubectl create command, the NodePort
assignment may fail.

Failure to assign a NodePort can be simulated with the following malicious
command[1].

$ kubectl create service nodeport temp-svc --tcp=`python3 <<EOF
print("1", end="")
for i in range(2, 1026):
  print("," + str(i), end="")
EOF
`

The command succeeds and shows following output.

service/temp-svc created

The service has been successfully generated and can also be referenced with the
get command.

$ kubectl get svc
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)
temp-svc     NodePort    10.0.0.139   <none>        1:31335/TCP,2:32367/TCP,3:30263/TCP,(omitted),1023:31821/TCP,1024:32475/TCP,1025:30311/TCP   12s

The user does not recognize failure to assign a NodePort because
create/get/describe command does not show any error. This is the issue.

[solution]
Users can notice errors by looking at the kube-proxy logs, but it may be difficult to see the kube-proxy logs of all nodes.

E0327 08:50:10.216571  660960 proxier.go:1286] "can't open port, skipping this nodePort" err="listen tcp4 :30641: socket: too many open files" port="\"nodePort for default/temp-svc:744\" (:30641/tcp4)"
E0327 08:50:10.216611  660960 proxier.go:1286] "can't open port, skipping this nodePort" err="listen tcp4 :30827: socket: too many open files" port="\"nodePort for default/temp-svc:857\" (:30827/tcp4)"
...
E0327 08:50:10.217119  660960 proxier.go:1286] "can't open port, skipping this nodePort" err="listen tcp4 :32484: socket: too many open files" port="\"nodePort for default/temp-svc:805\" (:32484/tcp4)"
E0327 08:50:10.217293  660960 proxier.go:1612] "Failed to execute iptables-restore" err="pipe2: too many open files ()"
I0327 08:50:10.217341  660960 proxier.go:1615] "Closing local ports after iptables-restore failure"

So, this patch will fire an event when NodePort assignment fails.
In fact, when the externalIP assignment fails, it is also notified by event.

The event will be displayed like this.

$ kubectl get event
LAST SEEN   TYPE      REASON                                            OBJECT           MESSAGE
...
2s          Warning   listen tcp4 :31055: socket: too many open files   node/127.0.0.1   can't open "nodePort for default/temp-svc:901" (:31055/tcp4), skipping this nodePort: listen tcp4 :31055: socket: too many open files
2s          Warning   listen tcp4 :31422: socket: too many open files   node/127.0.0.1   can't open "nodePort for default/temp-svc:474" (:31422/tcp4), skipping this nodePort: listen tcp4 :31422: socket: too many open files
...

This PR fixes iptables and ipvs proxier.
Since userspace proxier does not seem to be affected by this issue, it is not fixed.

[1] Assume that fd limit is 1024(default).
$ ulimit -n
1024
2021-04-01 08:27:51 +09:00
Rob Scott
f07be06a19
Adding support for TopologyAwareHints to kube-proxy 2021-03-08 15:37:47 -08:00
Fangyuan Li
0621e90d31 Rename fields and methods for BaseServiceInfo
Fields:
1. rename onlyNodeLocalEndpoints to nodeLocalExternal;
2. rename onlyNodeLocalEndpointsForInternal to nodeLocalInternal;
Methods:
1. rename OnlyNodeLocalEndpoints to NodeLocalExternal;
2. rename OnlyNodeLocalEndpointsForInternal to NodeLocalInternal;
2021-03-07 16:52:59 -08:00
Fangyuan Li
7ed2f1d94d Implements Service Internal Traffic Policy
1. Add API definitions;
2. Add feature gate and drops the field when feature gate is not on;
3. Set default values for the field;
4. Add API Validation
5. add kube-proxy iptables and ipvs implementations
6. add tests
2021-03-07 16:52:59 -08:00
jornshen
97a5a3d4d5 migrate to use k8s.io/util LocalPort and ListenPortOpener in ipvs.proxier 2021-02-15 16:36:08 +08:00
Kubernetes Prow Robot
c1b3797f4b
Merge pull request #97824 from hanlins/fix/97225/hc-rules
Explicitly add iptables rule to allow healthcheck nodeport
2021-02-04 15:54:52 -08:00
Hanlin Shi
4cd1eacbc1 Add rule to allow healthcheck nodeport traffic in filter table
1. For iptables mode, add KUBE-NODEPORTS chain in filter table. Add
   rules to allow healthcheck node port traffic.
2. For ipvs mode, add KUBE-NODE-PORT chain in filter table. Add
   KUBE-HEALTH-CHECK-NODE-PORT ipset to allow traffic to healthcheck
   node port.
2021-02-03 15:20:10 +00:00
Kubernetes Prow Robot
e89e7b4ed1
Merge pull request #98083 from JornShen/optimize_proxier_duplicate_localaddrset
optimize proxier duplicate localaddrset
2021-01-29 01:21:40 -08:00
jornshen
3f506cadb0 optimize proxier duplicate localaddrset 2021-01-29 10:52:01 +08:00
Kubernetes Prow Robot
97076f6647
Merge pull request #98297 from JornShen/replace_ipvs_proxier_protocal_str
use exist const to replace ipvs/proxier.go tcp,udp,sctp str
2021-01-28 14:41:52 -08:00
jornshen
249996e62f use exist const to replace ipvs/proxier.go tcp,udp,sctp 2021-01-22 14:52:00 +08:00
jornshen
3783821553 move the redundant writeline writeBytesLine to proxy/util/util.go 2021-01-21 10:51:39 +08:00
Kubernetes Prow Robot
eb08f36c7d
Merge pull request #96371 from andrewsykim/kube-proxy-terminating
kube-proxy: track serving/terminating conditions in endpoints cache
2021-01-11 18:38:25 -08:00
Kubernetes Prow Robot
5e22f7fead
Merge pull request #92938 from DataDog/lbernail/CVE-2020-8558
Do not set sysctlRouteLocalnet (CVE-2020-8558)
2021-01-11 17:38:24 -08:00
Andrew Sy Kim
a11abb5475 kube-proxy: ipvs proxy should ignore endpoints with condition ready=false
Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>
2021-01-11 16:27:38 -05:00
Laurent Bernaille
15439148da
Do not set sysctlRouteLocalnet (CVE-2020-8558)
Signed-off-by: Laurent Bernaille <laurent.bernaille@datadoghq.com>
2021-01-11 11:41:32 +01:00
jornshen
5af5a2ac7d migrate proxy.UpdateServiceMap to be a method of ServiceMap 2021-01-11 11:07:30 +08:00
chengzhycn
c6c74f2a5d proxy/ipvs: return non-nil error when there is no matched IPVS service in syncEndpoint
Signed-off-by: chengzhycn <chengzhycn@gmail.com>
2021-01-07 10:49:04 +08:00
maao
d001b9b72a remove --cleanup-ipvs flag of kube-proxy
Signed-off-by: maao <maao420691301@gmail.com>
2020-12-31 11:29:38 +08:00
Kubernetes Prow Robot
6aae473318
Merge pull request #96830 from tnqn/ipvs-restore-commands
Fix duplicate chains in iptables-restore input
2020-12-08 20:03:34 -08:00
Quan Tian
9bf96b84c4 Fix duplicate chains in iptables-restore input
When running in ipvs mode, kube-proxy generated wrong iptables-restore
input because the chain names are hardcoded.

It also fixed a typo in method name.
2020-11-24 15:13:23 +08:00
Basant Amarkhed
707073d2f9 Fixup #1 addressing review comments 2020-11-17 07:13:51 +00:00
Basant Amarkhed
8fb895f3f1 Updating after merging with a conflicting commit 2020-11-14 01:09:46 +00:00
Patrik Cyvoct
d29665cc17
Revert "Merge pull request #92312 from Sh4d1/kep_1860"
This reverts commit ef16faf409, reversing
changes made to 2343b8a68b.
2020-11-11 10:26:53 +01:00
Patrik Cyvoct
20fc86df25
fix defaulting
Signed-off-by: Patrik Cyvoct <patrik@ptrk.io>
2020-11-07 10:00:59 +01:00
Patrik Cyvoct
0768b45e7b
add nil case in proxy
Signed-off-by: Patrik Cyvoct <patrik@ptrk.io>
2020-11-07 10:00:58 +01:00
Patrik Cyvoct
540901779c
fix reviews
Signed-off-by: Patrik Cyvoct <patrik@ptrk.io>
2020-11-07 10:00:53 +01:00
Patrik Cyvoct
0153b96ab8
fix review
Signed-off-by: Patrik Cyvoct <patrik@ptrk.io>
2020-11-07 10:00:27 +01:00
Patrik Cyvoct
47ae7cbf52
Add route type field to loadbalancer status ingress
Signed-off-by: Patrik Cyvoct <patrik@ptrk.io>
2020-11-07 09:59:58 +01:00
Kubernetes Prow Robot
0451848d64
Merge pull request #95787 from qingsenLi/k8s201022-format
format incorrectAddresses in klog
2020-11-05 11:50:33 -08:00
Khaled Henidak (Kal)
6675eba3ef
dual stack services (#91824)
* api: structure change

* api: defaulting, conversion, and validation

* [FIX] validation: auto remove second ip/family when service changes to SingleStack

* [FIX] api: defaulting, conversion, and validation

* api-server: clusterIPs alloc, printers, storage and strategy

* [FIX] clusterIPs default on read

* alloc: auto remove second ip/family when service changes to SingleStack

* api-server: repair loop handling for clusterIPs

* api-server: force kubernetes default service into single stack

* api-server: tie dualstack feature flag with endpoint feature flag

* controller-manager: feature flag, endpoint, and endpointSlice controllers handling multi family service

* [FIX] controller-manager: feature flag, endpoint, and endpointSlicecontrollers handling multi family service

* kube-proxy: feature-flag, utils, proxier, and meta proxier

* [FIX] kubeproxy: call both proxier at the same time

* kubenet: remove forced pod IP sorting

* kubectl: modify describe to include ClusterIPs, IPFamilies, and IPFamilyPolicy

* e2e: fix tests that depends on IPFamily field AND add dual stack tests

* e2e: fix expected error message for ClusterIP immutability

* add integration tests for dualstack

the third phase of dual stack is a very complex change in the API,
basically it introduces Dual Stack services. Main changes are:

- It pluralizes the Service IPFamily field to IPFamilies,
and removes the singular field.
- It introduces a new field IPFamilyPolicyType that can take
3 values to express the "dual-stack(mad)ness" of the cluster:
SingleStack, PreferDualStack and RequireDualStack
- It pluralizes ClusterIP to ClusterIPs.

The goal is to add coverage to the services API operations,
taking into account the 6 different modes a cluster can have:

- single stack: IP4 or IPv6 (as of today)
- dual stack: IPv4 only, IPv6 only, IPv4 - IPv6, IPv6 - IPv4

* [FIX] add integration tests for dualstack

* generated data

* generated files

Co-authored-by: Antonio Ojea <aojea@redhat.com>
2020-10-26 13:15:59 -07:00
Kubernetes Prow Robot
bdde4fb8f5
Merge pull request #93040 from cmluciano/cml/ipvsschedmodules
ipvs: ensure selected scheduler kernel modules are loaded
2020-10-26 10:25:17 -07:00
Christopher M. Luciano
51ed242194
ipvs: check for existence of scheduler module and fail if not found
Signed-off-by: Christopher M. Luciano <cmluciano@us.ibm.com>
2020-10-23 17:17:44 -04:00
qingsenLi
9ad39c9eda format incorrectAddresses in klog 2020-10-22 17:26:29 +08:00
Lion-Wei
1f7ea16560 kube-proxy ensure KUBE-MARK-DROP exist but not modify their rules 2020-10-16 14:52:07 +08:00
Amim Knabben
a18e5de51a LockToDefault the ExternalPolicyForExternalIP feature gate 2020-09-16 13:16:33 -04:00
Christopher M. Luciano
65ff4e8227
ipvs: log error if scheduler does not exist and fallback to rr
Signed-off-by: Christopher M. Luciano <cmluciano@us.ibm.com>
2020-07-23 13:58:02 -04:00
Christopher M. Luciano
e2a0eddaf0
ipvs: ensure selected scheduler kernel modules are loaded
Signed-off-by: Christopher M. Luciano <cmluciano@us.ibm.com>
2020-07-16 13:21:54 -04:00
Andrew Sy Kim
de2ecd7e2f proxier/ipvs: check already binded addresses in the IPVS dummy interface
Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>
Co-authored-by: Laurent Bernaille <laurent.bernaille@gmail.com>
2020-07-02 15:32:21 -04:00
Kubernetes Prow Robot
4d0ce2e708
Merge pull request #92584 from aojea/ipvsfix
IPVS: kubelet, kube-proxy: unmark packets before masquerading …
2020-07-01 23:13:57 -07:00
Kubernetes Prow Robot
8623c26150
Merge pull request #90909 from kumarvin123/feature/WindowsEpSlices
EndPointSlices implementation for Windows
2020-07-01 23:12:01 -07:00
Antonio Ojea
c40081b550 kube-proxy ipvs masquerade hairpin traffic
Masquerade de traffic that loops back to the originator
before they hit the kubernetes-specific postrouting rules

Signed-off-by: Antonio Ojea <antonio.ojea.garcia@gmail.com>
2020-07-01 09:16:19 +02:00
Kubernetes Prow Robot
8a76c27b8d
Merge pull request #88573 from davidstack/master
the result value of functrion NodeIPS will contain the docker0 ip , update the comment
2020-06-30 00:01:59 -07:00
Vinod K L Swamy
4505d5b182
Changes to Proxy common code 2020-06-29 14:29:46 -07:00
Damon Wang
b199dd8ee1 update the comment of NodeIPs function 2020-06-29 15:29:16 +08:00
Kubernetes Prow Robot
73fa63a86d
Merge pull request #92035 from danwinship/unmark-before-masq
kubelet, kube-proxy: unmark packets before masquerading them
2020-06-16 00:50:03 -07:00
Dan Winship
c12534d8b4 kubelet, kube-proxy: unmark packets before masquerading them
It seems that if you set the packet mark on a packet and then route
that packet through a kernel VXLAN interface, the VXLAN-encapsulated
packet will still have the mark from the original packet. Since our
NAT rules are based on the packet mark, this was causing us to
double-NAT some packets, which then triggered a kernel checksumming
bug. But even without the checksum bug, there are reasons to avoid
double-NATting, so fix the rules to unmark the packets before
masquerading them.
2020-06-15 18:45:38 -04:00
Kubernetes Prow Robot
35fc65dc2c
Merge pull request #89998 from Nordix/issue-89923
Filter nodePortAddresses to proxiers
2020-06-13 09:39:55 -07:00
Andrew Sy Kim
18741157ef proxier/ipvs: remove redundant length check for node addresses
Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>
2020-05-28 11:48:48 -04:00