Kube/proxy, in NodeCIDR local detector mode, uses the node.Spec.PodCIDRs
field to build the Services iptables rules.
The Node object depends on the kubelet, but if kube-proxy runs as a
static pods or as a standalone binary, it is not possible to guarantee
that the values obtained at bootsrap are valid, causing traffic outages.
Kube-proxy has to react on node changes to avoid this problems, it
simply restarts if detect that the node PodCIDRs have changed.
In case that the Node has been deleted, kube-proxy will only log an
error and keep working, since it may break graceful shutdowns of the
node.
Some of the unit tests cannot pass on Windows due to various reasons:
- fsnotify does not have a Windows implementation.
- Proxy Mode IPVS not supported on Windows.
- Seccomp not supported on Windows.
- VolumeMode=Block is not supported on Windows.
- iSCSI volumes are mounted differently on Windows, and iscsiadm is a
Linux utility.
wire up feature_gate.go with metrics via AddMetrics method
Change-Id: I9b4f6b04c0f4eb9bcb198b16284393d21c774ad8
wire in metrics to kubernetes components
Change-Id: I6d4ef8b26f149f62b03f32d1658f04f3056fe4dc
rename metric since we're using the value to determine if enabled is true or false
Change-Id: I13a6b6df90a5ffb4b9c5b34fa187562413bea029
Update staging/src/k8s.io/component-base/featuregate/feature_gate.go
Co-authored-by: Jordan Liggitt <jordan@liggitt.net>
If the user passes "--proxy-mode ipvs", and it is not possible to use
IPVS, then error out rather than falling back to iptables.
There was never any good reason to be doing fallback; this was
presumably erroneously added to parallel the iptables-to-userspace
fallback (which only existed because we had wanted iptables to be the
default but not all systems could support it).
In particular, if the user passed configuration options for ipvs, then
they presumably *didn't* pass configuration options for iptables, and
so even if the iptables proxy is able to run, it is likely to be
misconfigured.
Back when iptables was first made the default, there were
theoretically some users who wouldn't have been able to support it due
to having an old /sbin/iptables. But kube-proxy no longer does the
things that didn't work with old iptables, and we removed that check a
long time ago. There is also a check for a new-enough kernel version,
but it's checking for a feature which was added in kernel 3.6, and no
one could possibly be running Kubernetes with a kernel that old. So
the fallback code now never actually falls back, so it should just be
removed.
This was implemented partly in server.go and partly in
server_others.go even though even the parts in server.go were totally
linux-specific. Simplify things by putting it all in server_others.go
and get rid of some unnecessary abstraction.
This commit adds the framework for the new local detection
modes BridgeInterface and InterfaceNamePrefix to work.
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
This change adds 2 options for windows:
--forward-healthcheck-vip: If true forward service VIP for health check
port
--root-hnsendpoint-name: The name of the hns endpoint name for root
namespace attached to l2bridge, default is cbr0
When --forward-healthcheck-vip is set as true and winkernel is used,
kube-proxy will add an hns load balancer to forward health check request
that was sent to lb_vip:healthcheck_port to the node_ip:healthcheck_port.
Without this forwarding, the health check from google load balancer will
fail, and it will stop forwarding traffic to the windows node.
This change fixes the following 2 cases for service:
- `externalTrafficPolicy: Cluster` (default option): healthcheck_port is
10256 for all services. Without this fix, all traffic won't be directly
forwarded to windows node. It will always go through a linux node and
get forwarded to windows from there.
- `externalTrafficPolicy: Local`: different healthcheck_port for each
service that is configured as local. Without this fix, this feature
won't work on windows node at all. This feature preserves client ip
that tries to connect to their application running in windows pod.
Change-Id: If4513e72900101ef70d86b91155e56a1f8c79719
* kube-proxy cluder-cidr arg accepts comma-separated list
It is possible in dual-stack clusters to provide kube-proxy with
a comma-separated list with an IPv4 and IPv6 CIDR for pods.
update: signoff
update2: update email profile
Signed-off-by: Tyler Lloyd <Tyler.Lloyd@microsoft.com>
Signed-off-by: Tyler Lloyd <tylerlloyd928@gmail.com>
* Updating cluster-cidr comment description
Signed-off-by: Tyler Lloyd <tyler.lloyd@microsoft.com>
All Kubernetes commands should show flags with hyphens in their help text even
when the flag originally was defined with underscore. Converting a command to
this style is not breaking its command line API because the old-style parameter
with underscore is accepted as alias.
The easiest solution to achieve this is to set normalization shortly before
running the command in the new central cli.Run or the few places where that
function isn't used yet.
There may be some texts which depends on normalization at flag definition time,
like the --logging-format usage warning. Those get generated assuming that
hyphens will be used.
It wasn't documented that InitLogs already uses the log flush frequency, so
some commands have called it before parsing (for example, kubectl in the
original code for logs.go). The flag never had an effect in such commands.
Fixing this turned into a major refactoring of how commands set up flags and
run their Cobra command:
- component-base/logs: implicitely registering flags during package init is an
anti-pattern that makes it impossible to use the package in commands which
want full control over their command line. Logging flags must be added
explicitly now, something that the new cli.Run does automatically.
- component-base/logs: AddFlags would have crashed in kubectl-convert if it
had been called because it relied on the global pflag.CommandLine. This
has been fixed and kubectl-convert now has the same --log-flush-frequency
flag as other commands.
- component-base/logs/testinit: an exception are tests where flag.CommandLine has
to be used. This new package can be imported to add flags to that
once per test program.
- Normalization of the klog command line flags was inconsistent. Some commands
unintentionally didn't normalize to the recommended format with hyphens. This
gets fixed for sample programs, but not for production programs because
it would be a breaking change.
This refactoring has the following user-visible effects:
- The validation error for `go run ./cmd/kube-apiserver --logging-format=json
--add-dir-header` now references `add-dir-header` instead of `add_dir_header`.
- `staging/src/k8s.io/cloud-provider/sample` uses flags with hyphen instead of
underscore.
- `--log-flush-frequency` is not listed anymore in the --logging-format flag's
`non-default formats don't honor these flags` usage text because it will also
work for non-default formats once it is needed.
- `cmd/kubelet`: the description of `--logging-format` uses hyphens instead of
underscores for the flags, which now matches what the command is using.
- `staging/src/k8s.io/component-base/logs/example/cmd`: added logging flags.
- `apiextensions-apiserver` no longer prints a useless stack trace for `main`
when command line parsing raises an error.
Because the proxy.Provider interface included
proxyconfig.EndpointsHandler, all the backends needed to
implement its methods. But iptables, ipvs, and winkernel implemented
them as no-ops, and metaproxier had an implementation that wouldn't
actually work (because it couldn't handle Services with no active
Endpoints).
Since Endpoints processing in kube-proxy is deprecated (and can't be
re-enabled unless you're using a backend that doesn't support
EndpointSlice), remove proxyconfig.EndpointsHandler from the
definition of proxy.Provider and drop all the useless implementations.
With this commit kube-proxy accepts current system values (retrieved by sysctl) which are higher than the internally known and expected values.
The code change was mistakenly created as PR in the k3s project (see https://github.com/k3s-io/k3s/pull/3505).
A real life use case is described in Rancher issue https://github.com/rancher/rancher/issues/33360.
When Kubernetes runs on a Node which itself is a container (e.g. LXC), and the value is changed on the (LXC) host, kube-proxy then fails at the next start as it does not recognize the current value and attempts to overwrite the current value with the previously known one. This result in:
```
I0624 07:38:23.053960 54 conntrack.go:103] Set sysctl 'net/netfilter/nf_conntrack_max' to 524288
F0624 07:38:23.053999 54 server.go:495] open /proc/sys/net/netfilter/nf_conntrack_max: permission denied
```
However a sysctl overwrite only makes sense if the current value is lower than the previously known and expected value. If the value was increased on the host, that shouldn't really bother kube-proxy and just go on with it.
Signed-off-by: Claudio Kuenzler ck@claudiokuenzler.com
We were detecting the IP family that kube-proxy should use
based on the bind address, however, this is not valid when
using an unspecified address, because on those cases
kube-proxy adopts the IP family of the address reported
in the Node API object.
The IP family will be determined by the nodeIP used by the proxier
The order of precedence is:
1. config.bindAddress if bindAddress is not 0.0.0.0 or ::
2. the primary IP from the Node object, if set
3. if no IP is found it defaults to 127.0.0.1 and IPv4
Signed-off-by: Antonio Ojea <antonio.ojea.garcia@gmail.com>
In case a malformed flag is passed to k8s components
such as "–foo", where "–" is not an ASCII dash character,
the components currently silently ignore the flag
and treat it as a positional argument.
Make k8s components/commands exit with an error if a positional argument
that is not empty is found. Include a custom error message for all
components except kubeadm, as cobra.NoArgs is used in a lot of
places already (can be fixed in a followup).
The kubelet already handles this properly - e.g.:
'unknown command: "–foo"'
This change affects:
- cloud-controller-manager
- kube-apiserver
- kube-controller-manager
- kube-proxy
- kubeadm {alpha|config|token|version}
- kubemark
Signed-off-by: Monis Khan <mok@vmware.com>
Signed-off-by: Lubomir I. Ivanov <lubomirivanov@vmware.com>
This builds on previous work but only sets the sysctlConnReuse value
if the kernel is known to be above 4.19. To avoid calling GetKernelVersion
twice, I store the value from the CanUseIPVS method and then check the version
constraint at time of expected sysctl call.
Signed-off-by: Christopher M. Luciano <cmluciano@us.ibm.com>
This creates a new EndpointSliceProxying feature gate to cover EndpointSlice
consumption (kube-proxy) and allow the existing EndpointSlice feature gate to
focus on EndpointSlice production only. Along with that addition, this enables
the EndpointSlice feature gate by default, now only affecting the controller.
The rationale here is that it's really difficult to guarantee all EndpointSlices
are created in a cluster upgrade process before kube-proxy attempts to consume
them. Although masters are generally upgraded before nodes, and in most cases,
the controller would have enough time to create EndpointSlices before a new node
with kube-proxy spun up, there are plenty of edge cases where that might not be
the case. The primary limitation on EndpointSlice creation is the API rate limit
of 20QPS. In clusters with a lot of endpoints and/or with a lot of other API
requests, it could be difficult to create all the EndpointSlices before a new
node with kube-proxy targeting EndpointSlices spun up.
Separating this into 2 feature gates allows for a more gradual rollout with the
EndpointSlice controller being enabled by default in 1.18, and EndpointSlices
for kube-proxy being enabled by default in the next release.
This patch removes pkg/util/mount completely, and replaces it with the
mount package now located at k8s.io/utils/mount. The code found at
k8s.io/utils/mount was moved there from pkg/util/mount, so the code is
identical, just no longer in-tree to k/k.
Removed unneeded comments
Matched style from other PR's
Only print error when lenient decoding is successful
Update Bazel for BUILD
Comment out existing strict decoder tests
Added tests for leniant path
Added comments to explain test additions
Cleanup TODO's and tests
Add explicit newline for appended config
Kube-proxy runs two different health servers; one for monitoring the
health of kube-proxy itself, and one for monitoring the health of
specific services. Rename them to "ProxierHealthServer" and
"ServiceHealthServer" to make this clearer, and do a bit of API
cleanup too.