Errors from staticcheck:
pkg/proxy/healthcheck/proxier_health.go:55:2: field port is unused (U1000)
pkg/proxy/healthcheck/proxier_health.go:162:20: printf-style function with dynamic format string and no further arguments should use print-style function instead (SA1006)
pkg/proxy/healthcheck/service_health.go:166:20: printf-style function with dynamic format string and no further arguments should use print-style function instead (SA1006)
pkg/proxy/iptables/proxier.go:737:2: this value of args is never used (SA4006)
pkg/proxy/iptables/proxier.go:737:15: this result of append is never used, except maybe in other appends (SA4010)
pkg/proxy/iptables/proxier.go:1287:28: this result of append is never used, except maybe in other appends (SA4010)
pkg/proxy/userspace/proxysocket.go:293:3: this value of n is never used (SA4006)
pkg/proxy/winkernel/metrics.go:74:6: func sinceInMicroseconds is unused (U1000)
pkg/proxy/winkernel/metrics.go:79:6: func sinceInSeconds is unused (U1000)
pkg/proxy/winuserspace/proxier.go:94:2: field portMapMutex is unused (U1000)
pkg/proxy/winuserspace/proxier.go:118:2: field owner is unused (U1000)
pkg/proxy/winuserspace/proxier.go:119:2: field socket is unused (U1000)
pkg/proxy/winuserspace/proxysocket.go:620:4: this value of n is never used (SA4006)
This reverts commit 1ca0ffeaf2.
kube-proxy is not recreating the rules associated with the
KUBE-MARK-DROP chain, which is created by the kubelet.
It is preferable to avoid the dependency between the kubelet and
kube-proxy and to have each of them handle its own rules.
Until now, iptables probabilities had 5 decimal places of granularity.
That meant that probabilities would start to repeat once a Service
had 319 or more endpoints.
This doubles the granularity to 10 decimal places, ensuring that
probabilities will not repeat until a Service reaches 100,223 endpoints.
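A minimal sketch of how a per-endpoint probability string could be formatted with the wider precision (the helper name `computeProbability` is illustrative):

```go
package main

import "fmt"

// computeProbability returns the iptables statistic-module probability used
// to pick one of n remaining endpoints. Ten decimal places keep 1/n and
// 1/(n+1) distinguishable for far more endpoints than five places did.
func computeProbability(n int) string {
	return fmt.Sprintf("%0.10f", 1.0/float64(n))
}

func main() {
	fmt.Println(computeProbability(319)) // 0.0031347962
	fmt.Println(computeProbability(320)) // 0.0031250000
}
```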
The proxy healthz server assumed that kube-proxy would regularly call
UpdateTimestamp() even when nothing changed, but that's no longer
true. Fix it to only report unhealthiness when updates have been
received from the apiserver but not promptly pushed out to
iptables/ipvs.
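A rough sketch of the intended behavior, assuming the server tracks when an update was queued and when rules were last applied (all names here are illustrative, not the exact production API):

```go
package healthcheck

import (
	"sync"
	"time"
)

// proxierHealthServer sketches the state needed for the new behaviour:
// track when an update arrived and when it was last pushed out.
type proxierHealthServer struct {
	lock          sync.RWMutex
	lastQueued    time.Time // last time an update arrived from the apiserver
	lastUpdated   time.Time // last time rules were successfully pushed out
	healthTimeout time.Duration
}

// QueuedUpdate is called when a change is received from the apiserver.
func (hs *proxierHealthServer) QueuedUpdate() {
	hs.lock.Lock()
	defer hs.lock.Unlock()
	hs.lastQueued = time.Now()
}

// Updated is called after a successful iptables/ipvs sync.
func (hs *proxierHealthServer) Updated() {
	hs.lock.Lock()
	defer hs.lock.Unlock()
	hs.lastUpdated = time.Now()
}

// isHealthy reports unhealthiness only when a queued update has been
// pending longer than healthTimeout, not merely because the proxier has
// been idle.
func (hs *proxierHealthServer) isHealthy() bool {
	hs.lock.RLock()
	defer hs.lock.RUnlock()
	if hs.lastQueued.IsZero() || !hs.lastUpdated.Before(hs.lastQueued) {
		return true
	}
	return time.Since(hs.lastQueued) < hs.healthTimeout
}
```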
Kube-proxy runs two different health servers: one for monitoring the
health of kube-proxy itself, and one for monitoring the health of
specific services. Rename them to "ProxierHealthServer" and
"ServiceHealthServer" to make this clearer, and do a bit of API
cleanup too.
Kubelet and kube-proxy both had loops to ensure that their iptables
rules didn't get deleted, by repeatedly recreating them. But on
systems with lots of iptables rules (i.e., thousands of services), this
can be very slow (and thus might end up holding the iptables lock for
several seconds, blocking other operations, etc.).
The specific threat that they need to worry about is
firewall-management commands that flush *all* dynamic iptables rules.
So add a new iptables.Monitor() function that handles this by creating
iptables-flush canaries and only triggering a full rule reload after
noticing that someone has deleted those chains.
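A hedged sketch of the canary-based monitor loop; the interface and names below are stand-ins for the real utiliptables plumbing:

```go
package monitor

import (
	"log"
	"time"
)

// iptablesRunner is a minimal, hypothetical abstraction over the handful of
// iptables operations the monitor needs.
type iptablesRunner interface {
	// EnsureChain creates the chain in the given table if it does not exist.
	EnsureChain(table, chain string) error
	// ChainExists reports whether the chain is present in the given table.
	ChainExists(table, chain string) (bool, error)
}

// monitorCanaries creates a throwaway chain in each table, polls cheaply for
// its existence, and only triggers a full rule reload when someone (e.g. a
// firewall-management tool) has flushed it.
func monitorCanaries(ipt iptablesRunner, canary string, tables []string,
	reload func(), interval time.Duration, stopCh <-chan struct{}) {
	for {
		// (Re)create the canary chain in every monitored table.
		for _, table := range tables {
			if err := ipt.EnsureChain(table, canary); err != nil {
				log.Printf("could not create canary chain %q in %q: %v", canary, table, err)
			}
		}
	poll:
		for {
			select {
			case <-stopCh:
				return
			case <-time.After(interval):
				for _, table := range tables {
					exists, err := ipt.ChainExists(table, canary)
					if err == nil && !exists {
						log.Printf("canary %q missing from table %q; reloading all rules", canary, table)
						reload()
						break poll // recreate canaries and keep watching
					}
				}
			}
		}
	}
}
```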
This should fix a bug that could break masters when the EndpointSlice
feature gate was enabled. This was all tied to how the apiserver creates
and manages its own services and endpoints (or in this case endpoint
slices). Consumers of endpoint slices also need to know about the
corresponding service. Previously we were trying to set an owner
reference here for this purpose, but that came with potential downsides
and increased complexity. This commit changes behavior of the apiserver
endpointslice integration to set the service name label instead of owner
references, and simplifies consumer logic to reference that (both are
set by the EndpointSlice controller).
Additionally, this should fix a bug with the EndpointSlice GenerateName
value that had previously been set with a "." as a suffix.
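An illustrative sketch of what consumers now rely on: the well-known `kubernetes.io/service-name` label rather than an owner reference (the helper here is hypothetical):

```go
package main

import "fmt"

// The apiserver's slices carry the service-name label (also set by the
// EndpointSlice controller), and consumers resolve the owning Service
// from that label instead of from an owner reference.
const labelServiceName = "kubernetes.io/service-name"

// serviceNameForSlice is how a consumer might recover the Service a slice
// belongs to, given only the slice's labels.
func serviceNameForSlice(labels map[string]string) (string, bool) {
	name, ok := labels[labelServiceName]
	return name, ok
}

func main() {
	labels := map[string]string{labelServiceName: "kubernetes"}
	if name, ok := serviceNameForSlice(labels); ok {
		fmt.Println("slice belongs to service:", name)
	}
}
```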
Work around Linux kernel bug that sometimes causes multiple flows to
get mapped to the same IP:PORT and consequently some suffer packet
drops.
Also made the same update in kubelet.
Also added cross-pointers between the two bodies of code, in comments.
Some day we should eliminate the duplicate code. But today is not
that day.
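A sketch of the workaround, assuming the local iptables build supports `--random-fully` (the capability probe and rule layout below are illustrative):

```go
package main

import "fmt"

// masqueradeArgs appends "--random-fully" to the MASQUERADE rule when the
// iptables binary supports it, so the kernel fully randomizes source-port
// allocation and two flows cannot be mapped onto the same IP:PORT.
// hasRandomFully stands in for a real capability probe of the binary.
func masqueradeArgs(markMasq string, hasRandomFully bool) []string {
	args := []string{
		"-m", "comment", "--comment", "kubernetes service traffic requiring SNAT",
		"-m", "mark", "--mark", markMasq,
		"-j", "MASQUERADE",
	}
	if hasRandomFully {
		args = append(args, "--random-fully")
	}
	return args
}

func main() {
	fmt.Println(masqueradeArgs("0x4000/0x4000", true))
}
```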
As mentioned in issue #80061, in the iptables lock contention case we
can see an increasing rate of iptables restore failures, because
kube-proxy needs to grab the iptables file lock.
The failure metric can provide administrators more insight.
Metrics will be collected in both the kube-proxy iptables and IPVS modes.
Signed-off-by: Hui Luo <luoh@vmware.com>
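A hedged sketch of such a failure counter using Prometheus; the metric name and subsystem are illustrative:

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

// IptablesRestoreFailuresTotal counts failed iptables-restore calls, which
// tend to climb under iptables lock contention.
var IptablesRestoreFailuresTotal = prometheus.NewCounter(
	prometheus.CounterOpts{
		Subsystem: "kubeproxy",
		Name:      "sync_proxy_rules_iptables_restore_failures_total",
		Help:      "Cumulative proxy iptables restore failures",
	},
)

func init() {
	prometheus.MustRegister(IptablesRestoreFailuresTotal)
}
```

The sync loop would then call `IptablesRestoreFailuresTotal.Inc()` whenever iptables-restore returns an error.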
Kube-proxy's iptables mode used to care whether utiliptables's
EnsureRule was able to use "iptables -C" or if it had to implement it
hackily using "iptables-save". But that became irrelevant when
kube-proxy was reimplemented using "iptables-restore", and no one ever
noticed. So remove that check.
Currently the BaseServiceInfo struct implements the ServicePort interface, but
only uses that interface sometimes. All the elements of BaseServiceInfo are exported,
and sometimes the interface is used to access them and other times not.
I extended the ServicePort interface so that all relevant values can be accessed through
it, and unexported all the elements of BaseServiceInfo.
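A sketch of the resulting shape, with a reduced, illustrative method set rather than the full production interface:

```go
package proxy

import "net"

// ServicePort exposes every value callers need through an accessor, so
// BaseServiceInfo's fields can be unexported and all access goes through
// the interface.
type ServicePort interface {
	ClusterIP() net.IP
	Port() int
	Protocol() string
	NodePort() int
	ExternalIPStrings() []string
}

// BaseServiceInfo keeps its data unexported and satisfies ServicePort.
type BaseServiceInfo struct {
	clusterIP   net.IP
	port        int
	protocol    string
	nodePort    int
	externalIPs []string
}

func (info *BaseServiceInfo) ClusterIP() net.IP           { return info.clusterIP }
func (info *BaseServiceInfo) Port() int                   { return info.port }
func (info *BaseServiceInfo) Protocol() string            { return info.protocol }
func (info *BaseServiceInfo) NodePort() int               { return info.nodePort }
func (info *BaseServiceInfo) ExternalIPStrings() []string { return info.externalIPs }
```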
This adds some useful metrics around pending changes and last successful
sync time.
The goal is for administrators to be able to alert on proxies that, for
whatever reason, are quite stale.
Signed-off-by: Casey Callendrello <cdc@redhat.com>
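A hedged sketch of what these staleness metrics could look like; metric names and the helper are illustrative:

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

// SyncProxyRulesLastTimestamp is set after every successful sync, so an
// administrator can alert when time() minus this gauge grows too large.
var SyncProxyRulesLastTimestamp = prometheus.NewGauge(
	prometheus.GaugeOpts{
		Subsystem: "kubeproxy",
		Name:      "sync_proxy_rules_last_timestamp_seconds",
		Help:      "The last time proxy rules were successfully synced",
	},
)

// ServiceChangesPending tracks changes received but not yet applied.
var ServiceChangesPending = prometheus.NewGauge(
	prometheus.GaugeOpts{
		Subsystem: "kubeproxy",
		Name:      "sync_proxy_rules_service_changes_pending",
		Help:      "The number of pending service changes that have not yet been synced",
	},
)

func init() {
	prometheus.MustRegister(SyncProxyRulesLastTimestamp)
	prometheus.MustRegister(ServiceChangesPending)
}

// recordSuccessfulSync is an illustrative hook for the end of a successful
// syncProxyRules run.
func recordSuccessfulSync() {
	SyncProxyRulesLastTimestamp.SetToCurrentTime()
}
```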
As part of the endpoint creation process, when going from 0 -> 1 endpoints,
conntrack entries are cleared. This is to prevent an existing conntrack entry from
preventing traffic to the service. Currently the system ignores the existence of the
service's external IP addresses, which exposes that errant behavior.
This adds the external IP addresses of UDP services to the list of conntrack entries
that get cleared, allowing traffic to flow.
Signed-off-by: Jacob Tanenbaum <jtanenba@redhat.com>
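A rough sketch of the clearing step, shelling out to the conntrack tool directly (the real proxy goes through its own exec/conntrack helpers; the IPs in main are placeholders):

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// clearUDPConntrackForIPs deletes stale UDP conntrack entries for each given
// destination IP, so that when a service gains its first endpoint, old
// "black hole" entries for the cluster IP and the external IPs cannot keep
// swallowing traffic.
func clearUDPConntrackForIPs(ips []string) error {
	for _, ip := range ips {
		out, err := exec.Command("conntrack", "-D", "--orig-dst", ip, "-p", "udp").CombinedOutput()
		// conntrack exits non-zero when nothing matched; that is not an error here.
		if err != nil && !strings.Contains(string(out), "0 flow entries have been deleted") {
			return fmt.Errorf("error deleting conntrack entries for %s: %v (%s)", ip, err, out)
		}
	}
	return nil
}

func main() {
	// Cluster IP first, then the service's external IPs (placeholder values).
	ips := []string{"10.96.0.10", "192.0.2.10"}
	if err := clearUDPConntrackForIPs(ips); err != nil {
		fmt.Println(err)
	}
}
```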
We REJECT every other case. Close this FIXME.
To get this to work in all cases, we have to process services in
filter.INPUT, since LB IPs might be managed as local addresses.
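An illustrative jump rule for the filter table's INPUT chain (chain and comment text are examples, not the exact production rule):

```go
package main

import "fmt"

// kubeServicesInputJump illustrates the point above: because load-balancer
// IPs may be managed as local addresses, locally-delivered packets traverse
// INPUT rather than FORWARD, so the filter table needs a jump from INPUT
// into the chain holding the service REJECT rules.
func kubeServicesInputJump() []string {
	return []string{
		"-A", "INPUT",
		"-m", "comment", "--comment", "kubernetes service portals",
		"-j", "KUBE-SERVICES",
	}
}

func main() {
	fmt.Println(kubeServicesInputJump())
}
```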
When using NodePort to connect to an endpoint over UDP, if the endpoint is deleted,
traffic does not flow once the endpoint is restored. This happens because conntrack
holds the state of the connection and the proxy does not correctly clear the conntrack
entry for the stale endpoint.
Introduced a new conntrack function, ClearEntriesForPortNAT, that uses the endpoint IP
and NodePort to remove the stale conntrack entry and allow traffic to resume when
the endpoint is restored.
Signed-off-by: Jacob Tanenbaum <jtanenba@redhat.com>
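A hedged sketch of what such a helper might do with the conntrack CLI; the real implementation lives behind the proxy's exec utilities:

```go
package conntrack

import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
)

// clearEntriesForPortNAT deletes conntrack entries that were NATed from a
// given node port to a (now stale) endpoint IP, so UDP traffic arriving on
// the NodePort can reach the recreated endpoint.
func clearEntriesForPortNAT(endpointIP string, nodePort int) error {
	out, err := exec.Command("conntrack",
		"-D", "-p", "udp",
		"--dport", strconv.Itoa(nodePort),
		"--dst-nat", endpointIP,
	).CombinedOutput()
	// conntrack exits non-zero when nothing matched; that is not an error here.
	if err != nil && !strings.Contains(string(out), "0 flow entries have been deleted") {
		return fmt.Errorf("error deleting conntrack entries for NodePort %d -> %s: %v", nodePort, endpointIP, err)
	}
	return nil
}
```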
- Move from the old github.com/golang/glog to k8s.io/klog
- klog has explicit InitFlags(), so we add them as necessary
- we update the other repositories that we vendor that made a similar
change from glog to klog
* github.com/kubernetes/repo-infra
* k8s.io/gengo/
* k8s.io/kube-openapi/
* github.com/google/cadvisor
- Entirely remove all references to glog
- Fix some tests by calling InitFlags explicitly in their init() methods (see the sketch below)
Change-Id: I92db545ff36fcec83afe98f550c9e630098b3135
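A minimal example of the explicit klog flag initialization this migration requires (assuming klog v1's `k8s.io/klog` import path):

```go
package main

import (
	"flag"

	"k8s.io/klog"
)

func init() {
	// Unlike glog, klog does not register its flags on the global flag set
	// implicitly, so each binary (and tests that need logging flags) calls
	// InitFlags explicitly, typically in init() or early in main().
	klog.InitFlags(nil)
}

func main() {
	flag.Parse()
	klog.Info("logging via klog instead of glog")
	klog.Flush()
}
```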
This will allow kube-proxy to be run without `privileged` and
with only the added capability `NET_ADMIN`.
Signed-off-by: Jess Frazelle <acidburn@microsoft.com>
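A sketch of the relevant security-context change, expressed with the `k8s.io/api/core/v1` types; field values are illustrative:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// kubeProxySecurityContext shows the idea: instead of privileged: true,
// grant only the NET_ADMIN capability kube-proxy needs to manage its
// iptables/ipvs rules.
func kubeProxySecurityContext() *corev1.SecurityContext {
	privileged := false
	return &corev1.SecurityContext{
		Privileged: &privileged,
		Capabilities: &corev1.Capabilities{
			Add: []corev1.Capability{"NET_ADMIN"},
		},
	}
}

func main() {
	fmt.Printf("%+v\n", kubeProxySecurityContext())
}
```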
The requested Service Protocol is checked against the supported protocols of GCE Internal LB. The supported protocols are TCP and UDP.
SCTP is not supported by OpenStack LBaaS. If SCTP is requested in a Service with type=LoadBalancer, the request is rejected. Comment style is also corrected.
SCTP is not allowed for LoadBalancer Service and for HostPort. Kube-proxy can be configured not to start listening on the host port for SCTP: see the new SCTPUserSpaceNode parameter
Changed the vendored github.com/nokia/sctp to github.com/ishidawataru/sctp, i.e. from now on we use the upstream version.
netexec.go compilation fixed. Various test cases fixed.
SCTP related conformance tests removed. Netexec's pod definition and Dockerfile are updated to expose the new SCTP port (8082).
SCTP related e2e test cases are removed as the e2e test systems do not support SCTP.
SCTP related firewall config is removed from cluster/gce/util.sh. Variable name sctp_addr is corrected to sctpAddr in pkg/proxy/ipvs/proxier.go.
cluster/gce/util.sh is copied from master
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Only run connection-rejecting rules on new connections
Kube-proxy has two iptables chains full of rules to reject incoming connections to services that don't have any endpoints. Currently these rules get tested against all incoming packets, but that's unnecessary; if a connection to a given service has already been established, then we can't have been rejecting connections to that service. By only checking the first packet in each new connection, we can get rid of a lot of unnecessary checks on incoming traffic.
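A hedged sketch of the kind of rule this produces: the jump into the rejecting chain matches only conntrack state NEW, so established flows skip it entirely (chain names and comment are illustrative):

```go
package main

import "fmt"

// filterJumpArgs builds the jump into the chain that REJECTs traffic to
// endpoint-less services, matching only packets that start a new
// connection. Established connections were necessarily accepted earlier,
// so re-testing every one of their packets is wasted work.
func filterJumpArgs(chain string) []string {
	return []string{
		"-A", chain,
		"-m", "comment", "--comment", "kubernetes service portals",
		"-m", "conntrack", "--ctstate", "NEW",
		"-j", "KUBE-SERVICES",
	}
}

func main() {
	for _, chain := range []string{"INPUT", "OUTPUT"} {
		fmt.Println(filterJumpArgs(chain))
	}
}
```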
Fixes #56842
**Release note**:
```release-note
Additional changes to iptables kube-proxy backend to improve performance on clusters with very large numbers of services.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Delete stale UDP conntrack entries that use hostPort
**What this PR does / why we need it**:
This PR introduces a change to delete stale conntrack entries for UDP connections, specifically for UDP connections that use hostPort. When the pod listening on that UDP port gets updated/restarted (and gets a new IP address), these entries need to be flushed so that ongoing UDP connections can recover once the pod is back and the new iptables rules have been installed.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #59033
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 55637, 57461, 60268, 60290, 60210). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Don't create no-op iptables rules for services with no endpoints
Currently for all services we create `-t nat -A KUBE-SERVICES` rules that match the destination IPs (ClusterIP, ExternalIP, NodePort IPs, etc) and then jump to the appropriate `KUBE-SVC-XXXXXX` chain. But if the service has no endpoints then the `KUBE-SVC-XXXXXX` chain will be empty and so nothing happens except that we wasted time (a) forcing iptables-restore to parse the match rules, and (b) forcing the kernel to test matches that aren't going to have any effect.
This PR gets rid of the match rules in this case. Which is to say, it changes things so that every incoming service packet is matched *either* by nat rules to rewrite it *or* by filter rules to ICMP reject it, but not both. (Actually, that's not quite true: there are no filter rules to reject Ingress-addressed packets, and I *think* that's a bug?)
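A rough sketch of the resulting logic; all names and the rule format are illustrative:

```go
package main

import "fmt"

// writeServiceRules only emits the nat-table match rule that jumps to a
// service's KUBE-SVC-XXXXXX chain when the service actually has endpoints.
// With no endpoints the chain would be empty, so the match would only cost
// parse time in iptables-restore and per-packet evaluation in the kernel;
// the filter-table REJECT rule handles those packets instead.
func writeServiceRules(natRules *[]string, svcChain, clusterIP, protocol string, port, numEndpoints int) {
	if numEndpoints == 0 {
		// No jump rule at all; the reject rule elsewhere covers this service.
		return
	}
	*natRules = append(*natRules, fmt.Sprintf(
		"-A KUBE-SERVICES -d %s/32 -p %s --dport %d -j %s",
		clusterIP, protocol, port, svcChain))
}

func main() {
	var rules []string
	writeServiceRules(&rules, "KUBE-SVC-ABCDEF", "10.96.0.10", "tcp", 53, 2)
	writeServiceRules(&rules, "KUBE-SVC-123456", "10.96.0.20", "tcp", 80, 0)
	fmt.Println(rules) // only the service with endpoints gets a jump rule
}
```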
I also got rid of some comments that seemed redundant.
The patch is mostly reindentation, so best viewed with `diff -w`.
Partial fix for #56842 / Related to #56164 (which it conflicts with but I'll fix that after one or the other merges).
**Release note**:
```release-note
Removed some redundant rules created by the iptables proxier, to improve performance on systems with very many services.
```