1. Add API definitions;
2. Add a feature gate and drop the field when the feature gate is not enabled;
3. Set default values for the field;
4. Add API validation;
5. Add kube-proxy iptables and ipvs implementations;
6. Add tests.
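A minimal sketch of step 2, the drop-disabled-fields pattern; the gate name, type, and field below are hypothetical stand-ins, not the real API:

```go
// Package example sketches the "drop the field when the feature gate is off"
// pattern used by API strategies. ExampleFeature and ExampleField are
// placeholders for the real gate and field.
package example

type ServiceSpec struct {
	ExampleField *string
}

// gateEnabled stands in for utilfeature.DefaultFeatureGate.Enabled.
var gateEnabled = func(name string) bool { return false }

// dropDisabledFields clears the gated field so that writes from clients
// cannot persist data for a disabled feature, while keeping the field if the
// existing object already uses it.
func dropDisabledFields(newSpec, oldSpec *ServiceSpec) {
	if gateEnabled("ExampleFeature") {
		return
	}
	if oldSpec != nil && oldSpec.ExampleField != nil {
		return
	}
	newSpec.ExampleField = nil
}
```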
Clear conntrack entries for UDP NodePorts;
this has to be done AFTER the iptables rules are programmed.
Traffic to the NodePort can hit the host before the iptables rules
are programmed, which creates a stale conntrack entry that blackholes
the traffic, so we need to clear it, but ONLY when the service has
endpoints.
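A rough sketch of what the cleanup amounts to, shelling out to the conntrack(8) CLI the way kube-proxy's conntrack helpers do; the helper name is illustrative and error handling is simplified:

```go
package main

import (
	"fmt"
	"os/exec"
)

// clearUDPNodePortEntries deletes conntrack entries for a UDP NodePort,
// roughly "conntrack -D -p udp --dport <port>". It must run only after the
// iptables rules are programmed, and only for services that have endpoints,
// otherwise a new stale blackhole entry can be created immediately.
func clearUDPNodePortEntries(port int) error {
	out, err := exec.Command("conntrack", "-D", "-p", "udp", "--dport", fmt.Sprint(port)).CombinedOutput()
	if err != nil {
		// conntrack exits non-zero when nothing matched; callers may want to
		// ignore that case.
		return fmt.Errorf("clearing conntrack entries for UDP port %d: %v: %s", port, err, out)
	}
	return nil
}

func main() {
	_ = clearUDPNodePortEntries(30080)
}
```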
1. For iptables mode, add KUBE-NODEPORTS chain in filter table. Add
rules to allow healthcheck node port traffic.
2. For ipvs mode, add KUBE-NODE-PORT chain in filter table. Add
KUBE-HEALTH-CHECK-NODE-PORT ipset to allow traffic to healthcheck
node port.
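Roughly what the iptables-mode rule looks like, sketched as iptables-restore style lines emitted from Go; the comment text and exact chain wiring are illustrative, not copied from the proxier:

```go
package main

import "fmt"

// healthCheckNodePortRules builds filter-table lines that accept traffic to a
// service's healthCheckNodePort via the KUBE-NODEPORTS chain (the jump from
// INPUT to KUBE-NODEPORTS is not shown).
func healthCheckNodePortRules(svcName string, port int) []string {
	return []string{
		":KUBE-NODEPORTS - [0:0]",
		fmt.Sprintf(`-A KUBE-NODEPORTS -m comment --comment "%s health check node port" -p tcp -m tcp --dport %d -j ACCEPT`, svcName, port),
	}
}

func main() {
	for _, r := range healthCheckNodePortRules("ns/svc", 32000) {
		fmt.Println(r)
	}
}
```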
Currently kube-proxy treats ExternalIPs differently depending on:
- the traffic origin
- whether the ExternalIP is present on the system or not.
It also depends on the CNI implementation to
discriminate between local and non-local traffic.
Since the ExternalIP belongs to a Service, we can avoid the round trip
of sending traffic that originates in the cluster outside of it.
We also leverage the new LocalTrafficDetector to detect local
traffic instead of relying on the CNI implementation for this.
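A simplified illustration of the idea; the type below is a stand-in and is not the exact LocalTrafficDetector API from pkg/proxy/util:

```go
package main

import "fmt"

// localDetector decides whether traffic is cluster-local, e.g. by matching
// the cluster CIDR, so kube-proxy no longer needs the CNI to make that call.
type localDetector struct {
	clusterCIDR string
}

// ifNotLocal returns iptables match arguments selecting traffic whose source
// is outside the cluster CIDR; only that traffic keeps the external round
// trip, while cluster-originated traffic is handled directly by the service
// chain.
func (d localDetector) ifNotLocal() []string {
	return []string{"!", "-s", d.clusterCIDR}
}

func main() {
	d := localDetector{clusterCIDR: "10.244.0.0/16"}
	// KUBE-EXT-EXAMPLE is a hypothetical chain name used for illustration.
	fmt.Println(append(d.ifNotLocal(), "-j", "KUBE-EXT-EXAMPLE"))
}
```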
A previous PR (#71573) intended to clear conntrack entries on endpoint
changes when using NodePorts, by introducing a dedicated function that
removes the stale conntrack entry on the node port and allows traffic to
resume. In doing so, it introduced a NodePort-specific bug: the
conntrack entries related to the ClusterIP do not get cleaned up when
endpoints change (issue #96174). We fix this by doing the ClusterIP
cleanup in all cases.
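A sketch of the fixed control flow, assuming helpers equivalent to pkg/util/conntrack's ClearEntriesForIP/ClearEntriesForPort (stubbed out here):

```go
package main

import "fmt"

// Stubs for the conntrack helpers; the real ones shell out to
// "conntrack -D --orig-dst <ip> -p udp" and "conntrack -D -p udp --dport <port>".
func clearEntriesForIP(ip string) error  { fmt.Println("clear conntrack for", ip); return nil }
func clearEntriesForPort(port int) error { fmt.Println("clear conntrack for port", port); return nil }

// onStaleUDPService clears the ClusterIP entries unconditionally (the part
// that #71573 accidentally skipped) and the NodePort entries only when a node
// port is actually allocated.
func onStaleUDPService(clusterIP string, nodePort int) {
	_ = clearEntriesForIP(clusterIP)
	if nodePort > 0 {
		_ = clearEntriesForPort(nodePort)
	}
}

func main() {
	onStaleUDPService("10.96.0.10", 30053)
}
```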
* api: structure change
* api: defaulting, conversion, and validation
* [FIX] validation: auto remove second ip/family when service changes to SingleStack
* [FIX] api: defaulting, conversion, and validation
* api-server: clusterIPs alloc, printers, storage and strategy
* [FIX] clusterIPs default on read
* alloc: auto remove second ip/family when service changes to SingleStack
* api-server: repair loop handling for clusterIPs
* api-server: force kubernetes default service into single stack
* api-server: tie dualstack feature flag with endpoint feature flag
* controller-manager: feature flag, endpoint, and endpointSlice controllers handling multi family service
* [FIX] controller-manager: feature flag, endpoint, and endpointSlice controllers handling multi family service
* kube-proxy: feature-flag, utils, proxier, and meta proxier
* [FIX] kubeproxy: call both proxier at the same time
* kubenet: remove forced pod IP sorting
* kubectl: modify describe to include ClusterIPs, IPFamilies, and IPFamilyPolicy
* e2e: fix tests that depend on the IPFamily field AND add dual stack tests
* e2e: fix expected error message for ClusterIP immutability
* add integration tests for dualstack
The third phase of dual stack is a very complex change in the API;
basically, it introduces dual-stack Services. The main changes are
as follows (a sketch of the resulting spec follows the list):
- It pluralizes the Service IPFamily field to IPFamilies,
and removes the singular field.
- It introduces a new field IPFamilyPolicyType that can take
3 values to express the "dual-stack(mad)ness" of the cluster:
SingleStack, PreferDualStack and RequireDualStack
- It pluralizes ClusterIP to ClusterIPs.
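A hedged illustration of what a Service spec looks like with the pluralized fields, using the k8s.io/api/core/v1 types (field and constant names may vary slightly between releases):

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
)

func main() {
	policy := v1.IPFamilyPolicyPreferDualStack

	spec := v1.ServiceSpec{
		// ClusterIP is kept for compatibility; ClusterIPs[0] must match it.
		ClusterIP:  "10.96.0.10",
		ClusterIPs: []string{"10.96.0.10", "fd00::10"},
		// IPFamilies replaces the old singular IPFamily field.
		IPFamilies: []v1.IPFamily{v1.IPv4Protocol, v1.IPv6Protocol},
		// SingleStack, PreferDualStack, or RequireDualStack.
		IPFamilyPolicy: &policy,
	}
	fmt.Printf("%+v\n", spec)
}
```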
The goal is to add coverage to the services API operations,
taking into account the 6 different modes a cluster can have:
- single stack: IPv4 or IPv6 (as of today)
- dual stack: IPv4 only, IPv6 only, IPv4 - IPv6, IPv6 - IPv4
* [FIX] add integration tests for dualstack
* generated data
* generated files
Co-authored-by: Antonio Ojea <aojea@redhat.com>
In #56164, we split the reject rules for existing services with no
endpoints into the KUBE-EXTERNAL-SERVICES chain in order to avoid calling
KUBE-SERVICES from INPUT. However, in #74394 KUBE-SERVICES was re-added to INPUT.
As noted in #56164, the kernel is sensitive to the size of the INPUT chain. This
patch refrains from calling the KUBE-SERVICES chain from INPUT and FORWARD,
and instead adds the load balancer reject rule to the KUBE-EXTERNAL-SERVICES
chain, which is called from INPUT and FORWARD.
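A sketch of the intended filter-table layout, printed as iptables-restore style lines; the addresses and comment text are illustrative:

```go
package main

import "fmt"

func main() {
	rules := []string{
		// KUBE-EXTERNAL-SERVICES stays small, so it is the only kube chain
		// hooked into INPUT/FORWARD for rejecting traffic; KUBE-SERVICES is
		// no longer called from INPUT.
		`-A INPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes externally-visible service portals" -j KUBE-EXTERNAL-SERVICES`,
		`-A FORWARD -m conntrack --ctstate NEW -m comment --comment "kubernetes externally-visible service portals" -j KUBE-EXTERNAL-SERVICES`,
		// The reject rule for a load balancer IP of a service without
		// endpoints now lives here instead of in KUBE-SERVICES.
		`-A KUBE-EXTERNAL-SERVICES -d 203.0.113.10/32 -p tcp --dport 80 -m comment --comment "ns/svc has no endpoints" -j REJECT`,
	}
	for _, r := range rules {
		fmt.Println(r)
	}
}
```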
Before this fix, a Service with a loadBalancerSourceRanges value that
included a space would cause kube-proxy to crashloop. This updates
kube-proxy to trim any spaces from that field.
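A minimal sketch of the fix, assuming the parsing happens roughly like this (the helper name is made up for illustration):

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// parseSourceRanges trims whitespace from each loadBalancerSourceRanges entry
// before parsing, so a value such as "10.0.0.0/8, 192.168.0.0/16" no longer
// fails CIDR parsing and crash-loops kube-proxy.
func parseSourceRanges(ranges []string) ([]*net.IPNet, error) {
	var out []*net.IPNet
	for _, r := range ranges {
		_, cidr, err := net.ParseCIDR(strings.TrimSpace(r))
		if err != nil {
			return nil, fmt.Errorf("invalid source range %q: %v", r, err)
		}
		out = append(out, cidr)
	}
	return out, nil
}

func main() {
	cidrs, err := parseSourceRanges([]string{" 10.0.0.0/8", "192.168.0.0/16 "})
	fmt.Println(cidrs, err)
}
```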
It seems that if you set the packet mark on a packet and then route
that packet through a kernel VXLAN interface, the VXLAN-encapsulated
packet will still have the mark from the original packet. Since our
NAT rules are based on the packet mark, this was causing us to
double-NAT some packets, which then triggered a kernel checksumming
bug. But even without the checksum bug, there are reasons to avoid
double-NATting, so fix the rules to unmark the packets before
masquerading them.
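Roughly the masquerade chain after the change, again as iptables-restore style lines; the mark value and comment text are illustrative:

```go
package main

import "fmt"

func main() {
	rules := []string{
		// Packets without the masquerade mark are left alone.
		"-A KUBE-POSTROUTING -m mark ! --mark 0x4000/0x4000 -j RETURN",
		// Clear the mark bit before masquerading so the VXLAN-encapsulated
		// copy, which inherits the mark, is not NATted a second time.
		"-A KUBE-POSTROUTING -j MARK --xor-mark 0x4000",
		`-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -j MASQUERADE`,
	}
	for _, r := range rules {
		fmt.Println(r)
	}
}
```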
Fixes two small issues with the metric added in #90175:
1. Bump the timestamp on initial informer sync. Otherwise it remains 0 if
restarting kube-proxy in a quiescent cluster, which isn't quite right.
2. Bump the timestamp even if no healthz server is specified.
This adds a metric, kubeproxy_sync_proxy_rules_last_queued_timestamp,
that captures the last time a change was queued to be applied to the
proxy. This matches the healthz logic, which fails if a pending change
is stale.
This allows us to write alerts that mirror healthz.
Signed-off-by: Casey Callendrello <cdc@redhat.com>
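A hedged sketch of the metric's mechanics using plain client_golang; kube-proxy itself registers metrics through k8s.io/component-base/metrics, so the actual wiring differs:

```go
package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

var syncProxyRulesLastQueuedTimestamp = prometheus.NewGauge(prometheus.GaugeOpts{
	Name: "kubeproxy_sync_proxy_rules_last_queued_timestamp",
	Help: "The last time a change was queued to be applied to the proxy rules.",
})

// onChangeQueued is called whenever a service or endpoint change is queued
// for the proxier; healthz fails when this timestamp is much newer than the
// last successful sync, so alerts can compare the two the same way.
func onChangeQueued() {
	syncProxyRulesLastQueuedTimestamp.SetToCurrentTime()
}

func main() {
	prometheus.MustRegister(syncProxyRulesLastQueuedTimestamp)
	onChangeQueued()
	fmt.Println("queued timestamp bumped")
}
```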