As mentioned in issue #80061, under iptables lock contention we can see
an increasing rate of iptables-restore failures, because each restore
needs to grab the iptables file lock.
The failure metric can give administrators more insight into this.
Metrics will be collected in the kube-proxy iptables and ipvs modes.
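As a sketch of what such a failure counter could look like with the
Prometheus client (the subsystem and metric name here are assumptions,
not necessarily the exact ones added by this change):

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

// IPTablesRestoreFailuresTotal counts how many times iptables-restore
// failed, e.g. because the iptables file lock could not be grabbed.
// (Name and subsystem are illustrative assumptions.)
var IPTablesRestoreFailuresTotal = prometheus.NewCounter(
	prometheus.CounterOpts{
		Subsystem: "kubeproxy",
		Name:      "sync_proxy_rules_iptables_restore_failures_total",
		Help:      "Cumulative number of iptables restore failures",
	},
)

func init() {
	prometheus.MustRegister(IPTablesRestoreFailuresTotal)
}
```

The proxier would then call IPTablesRestoreFailuresTotal.Inc() on each
failed restore, giving a rate administrators can alert on.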
Signed-off-by: Hui Luo <luoh@vmware.com>
Kube-proxy's iptables mode used to care whether utiliptables's
EnsureRule was able to use "iptables -C" or if it had to implement it
hackily using "iptables-save". But that became irrelevant when
kube-proxy was reimplemented using "iptables-restore", and no one ever
noticed. So remove that check.
The ipvs `getProxyMode` test fails on Mac because
`utilipvs.GetRequiredIPVSMods` tries to read `/proc/sys/kernel/osrelease`
to find the version of the running Linux kernel. The kernel version is
used to determine the list of kernel modules required for ipvs.
The logic that determines the kernel version is moved to a
GetKernelVersion method on LinuxKernelHandler, which implements
ipvs.KernelHandler. A mock KernelHandler is used in the test cases.
Reading and parsing the file is now done in Go instead of exec'ing `cut`.
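A minimal sketch of the pure-Go read-and-parse, assuming the version is
the portion of osrelease before the first dash (the real method lives on
LinuxKernelHandler):

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// getKernelVersion reads the running kernel's release string and keeps
// only the numeric version, e.g. "4.19.0-16-amd64" -> "4.19.0".
func getKernelVersion() (string, error) {
	data, err := os.ReadFile("/proc/sys/kernel/osrelease")
	if err != nil {
		return "", fmt.Errorf("error reading osrelease: %v", err)
	}
	release := strings.TrimSpace(string(data))
	return strings.SplitN(release, "-", 2)[0], nil
}

func main() {
	v, err := getKernelVersion()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(v)
}
```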
All node IPs were always computed twice when no node port addresses
were explicitly set: GetNodeAddresses would return the two zero CIDRs
(IPv4 and IPv6), and we would then retrieve all node IPs twice because
the loop didn't break after the first zero CIDR.
Also, a user can set explicit node port addresses that include both a
zero and a non-zero CIDR, but this makes no sense for nodeIPs, since
the zero CIDR already causes nodeIPs to include all IPs on the node.
Previously the same IP addresses were recomputed for every nodePort
service, which could be CPU intensive with a large number of nodePort
services and a large number of IP addresses on the node.
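A minimal sketch of the fixed loop, with hypothetical simplified
helpers: as soon as a zero CIDR is seen, all node IPs are returned once
instead of being resolved for every zero CIDR.

```go
package main

import "fmt"

// isZeroCIDR reports whether a CIDR matches every address of its family.
func isZeroCIDR(cidr string) bool {
	return cidr == "0.0.0.0/0" || cidr == "::/0"
}

// nodeIPs returns early on a zero CIDR, since it already implies
// "all node IPs" and the remaining CIDRs cannot add anything.
func nodeIPs(cidrs []string, allNodeIPs []string) []string {
	var ips []string
	for _, cidr := range cidrs {
		if isZeroCIDR(cidr) {
			return allNodeIPs
		}
		// The real code enumerates node addresses within the CIDR here;
		// we just record the CIDR to keep the sketch self-contained.
		ips = append(ips, cidr)
	}
	return ips
}

func main() {
	// With the default (no explicit addresses) both zero CIDRs are passed;
	// the early return means all node IPs are computed only once.
	fmt.Println(nodeIPs([]string{"0.0.0.0/0", "::/0"},
		[]string{"10.0.0.1", "fd00::1"}))
}
```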
refs https://github.com/kubernetes/perf-tests/issues/640
The bucket granularity is too fine for lower latencies, at the cost of
the higher latencies (7+ minutes). This causes spikes in SLIs
calculated from these metrics.
I don't have a strong opinion about the exact values; these seemed to
better match our needs, but let's discuss them.
Values:
0.015 s
0.030 s
0.060 s
0.120 s
0.240 s
0.480 s
0.960 s
1.920 s
3.840 s
7.680 s
15.360 s
30.720 s
61.440 s
122.880 s
245.760 s
491.520 s
983.040 s
1966.080 s
3932.160 s
7864.320 s
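These values double at each step, so (assuming the Prometheus client is
used for the histogram) they can be generated rather than listed by
hand; a sketch:

```go
package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

func main() {
	// 20 buckets starting at 0.015s, doubling each time up to 7864.32s,
	// matching the list above exactly.
	fmt.Println(prometheus.ExponentialBuckets(0.015, 2, 20))
}
```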
Currently the BaseServiceInfo struct implements the ServicePort
interface, but only uses that interface some of the time. All the
fields of BaseServiceInfo are exported, and sometimes the interface is
used to access them and other times it is not.
I extended the ServicePort interface so that all relevant values can be
accessed through it, and unexported all the fields of BaseServiceInfo.
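A simplified sketch of the resulting shape, with assumed field names and
types (the real BaseServiceInfo has more fields and uses Kubernetes API
types): all access goes through interface accessors, and the unexported
fields can no longer be reached directly by callers.

```go
package proxy

import "net"

// ServicePort exposes every relevant value through accessors.
type ServicePort interface {
	ClusterIP() net.IP
	Port() int
	Protocol() string
}

// BaseServiceInfo satisfies ServicePort; its fields are now unexported.
type BaseServiceInfo struct {
	clusterIP net.IP
	port      int
	protocol  string
}

func (info *BaseServiceInfo) ClusterIP() net.IP { return info.clusterIP }
func (info *BaseServiceInfo) Port() int         { return info.port }
func (info *BaseServiceInfo) Protocol() string  { return info.protocol }
```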
This adds some useful metrics around pending changes and last successful
sync time.
The goal is for administrators to be able to alert on proxies that, for
whatever reason, are quite stale.
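A sketch of a staleness gauge of the kind described, assuming the
Prometheus client; the exact metric name added by the change may differ.

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

// SyncProxyRulesLastTimestamp records when rules were last synced
// successfully; alerting on its age flags stale proxies.
// (Name and subsystem are illustrative assumptions.)
var SyncProxyRulesLastTimestamp = prometheus.NewGauge(
	prometheus.GaugeOpts{
		Subsystem: "kubeproxy",
		Name:      "sync_proxy_rules_last_timestamp_seconds",
		Help:      "The last time proxy rules were successfully synced.",
	},
)

// recordSuccessfulSync is called at the end of a successful sync run.
func recordSuccessfulSync() {
	SyncProxyRulesLastTimestamp.SetToCurrentTime()
}
```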
Signed-off-by: Casey Callendrello <cdc@redhat.com>
use split host port func instead of trimming a specific character (see the sketch after this list)
add unit test for metrics and healthz bind address
restore package imports
refactor set default kube proxy configuration
fix ipv4 condition
fix set default port condition
rewrite function call sites to reduce errors
set ipv6 default value
move GetBindAddressHostPort to util
use one func to handle the deprecated flag series
update bazel
define address type
return earlier in the error case
refactor set default kube proxy configuration logic
restore package imports
preserve some of the original comments
add get default address func
add unit test for appending port if needed
rewrite unit test for deprecated flags
remove unused code
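The split host port item above refers to handling like the following: a
minimal sketch using the standard library's net.SplitHostPort, which
handles IPv6 literals correctly instead of trimming characters by hand
(the addresses shown are examples, not the actual defaults).

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// SplitHostPort copes with both IPv4 and bracketed IPv6 bind addresses.
	for _, addr := range []string{"0.0.0.0:10249", "[::1]:10256"} {
		host, port, err := net.SplitHostPort(addr)
		if err != nil {
			fmt.Println("invalid bind address:", err)
			continue
		}
		fmt.Printf("host=%s port=%s\n", host, port)
	}
}
```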
Per recommendation of @thockin:
https://github.com/kubernetes/kubernetes/pull/71735#pullrequestreview-189515580
---
IMO this code is as dead as it could be. The only significant user is OpenShift as far as I know. I'd rather never touch it again, but I know that is not realistic.
Also, it seems like maybe this could be broken into a couple commits for easier review?
I raised some questions about this design, but I think you should add yourselves as approvers in OWNERS for this subdir. If it evolves, I will lose context on the impl. I don't think it is covered by e2e, either (more argument for breaking it to a separate repo and having its own e2e tests)
---
The userspace proxy does not have any rate limiting, and when many
services are used it will hammer iptables every time a service or
endpoint change occurs. Instead, build up a map of changed services
and process all of those changes at once rather than on every event.
This also ensures that no long-running processing happens in the same
call chain as the OnService* calls, since that would block other
handlers attached to the proxy's parent ServiceConfig object for long
periods of time.
Locking can also now be simplified, as the only accesses to the
proxy's serviceMap happen from syncProxyRules(). So instead of locking
in many functions, just lock once in syncProxyRules() like the other
proxies do.
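A minimal sketch of this batching pattern, with hypothetical simplified
types (the real proxier resolves additions, updates, and deletions
rather than blindly overwriting entries): handlers only record which
services changed, and the heavy work happens later in one place.

```go
package proxy

import "sync"

type serviceInfo struct{ /* port, protocol, ... */ }

// Proxier batches service changes: event handlers record names cheaply,
// and syncProxyRules applies all accumulated changes at once.
type Proxier struct {
	mu              sync.Mutex
	changedServices map[string]bool         // services touched since last sync
	serviceMap      map[string]*serviceInfo // only touched by syncProxyRules
}

// OnServiceUpdate returns quickly, so it never blocks the other handlers
// attached to the parent ServiceConfig object.
func (p *Proxier) OnServiceUpdate(name string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.changedServices[name] = true
}

// syncProxyRules drains the pending set under a single lock and applies
// all accumulated changes at once (real code would rewrite iptables here).
func (p *Proxier) syncProxyRules() {
	p.mu.Lock()
	defer p.mu.Unlock()
	for name := range p.changedServices {
		p.serviceMap[name] = &serviceInfo{}
	}
	p.changedServices = make(map[string]bool)
}
```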
https://bugzilla.redhat.com/show_bug.cgi?id=1590589
https://bugzilla.redhat.com/show_bug.cgi?id=1689690