Masquerade traffic that loops back to the originator
before it hits the Kubernetes-specific postrouting rules
Signed-off-by: Antonio Ojea <antonio.ojea.garcia@gmail.com>
It seems that if you set the packet mark on a packet and then route
that packet through a kernel VXLAN interface, the VXLAN-encapsulated
packet will still have the mark from the original packet. Since our
NAT rules are based on the packet mark, this was causing us to
double-NAT some packets, which then triggered a kernel checksumming
bug. But even without the checksum bug, there are reasons to avoid
double-NATting, so fix the rules to unmark the packets before
masquerading them.
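A minimal sketch of the resulting ordering, written as a small Go program
that prints the nat-table stanza in iptables-restore form. The 0x4000 mark
is kube-proxy's default masquerade bit and, like the exact flags, is an
assumption for illustration rather than a copy of the real rules:

    package main

    import "fmt"

    func main() {
        // Illustrative KUBE-POSTROUTING rules: clear the mark before
        // masquerading, so a packet that re-enters this chain (e.g. after
        // VXLAN encapsulation) no longer carries the mark and cannot be
        // NATted a second time.
        rules := []string{
            "*nat",
            ":KUBE-POSTROUTING - [0:0]",
            // Leave unmarked packets alone.
            "-A KUBE-POSTROUTING -m mark ! --mark 0x4000/0x4000 -j RETURN",
            // Unmark before doing anything else.
            "-A KUBE-POSTROUTING -j MARK --xor-mark 0x4000",
            // Masquerade exactly once.
            "-A KUBE-POSTROUTING -j MASQUERADE",
            "COMMIT",
        }
        for _, r := range rules {
            fmt.Println(r)
        }
    }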
Fixes two small issues with the metric added in #90175 (see the sketch below):
1. Bump the timestamp on initial informer sync. Otherwise it remains 0 when
kube-proxy restarts in a quiescent cluster, which isn't quite right.
2. Bump the timestamp even if no healthz server is specified.
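A rough sketch of the two fixes, assuming the metric is a plain Prometheus
gauge; the function names and the healthz interface below are illustrative,
not the actual kube-proxy code:

    package main

    import "github.com/prometheus/client_golang/prometheus"

    // Stand-in for kubeproxy_sync_proxy_rules_last_queued_timestamp.
    var syncProxyRulesLastQueuedTimestamp = prometheus.NewGauge(prometheus.GaugeOpts{
        Name: "kubeproxy_sync_proxy_rules_last_queued_timestamp",
        Help: "Last time a change was queued to be applied to the proxy.",
    })

    type healthzUpdater interface{ QueuedUpdate() }

    // (1) Bump the timestamp once the informers have synced, so it does not
    // stay at 0 after restarting kube-proxy in a quiescent cluster.
    func onInitialInformerSync() {
        syncProxyRulesLastQueuedTimestamp.SetToCurrentTime()
    }

    // (2) Bump the timestamp whenever a change is queued, whether or not a
    // healthz server was configured.
    func onChangeQueued(healthz healthzUpdater) {
        if healthz != nil {
            healthz.QueuedUpdate()
        }
        syncProxyRulesLastQueuedTimestamp.SetToCurrentTime()
    }

    func main() {
        prometheus.MustRegister(syncProxyRulesLastQueuedTimestamp)
        onInitialInformerSync()
        onChangeQueued(nil)
    }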
This adds a metric, kubeproxy_sync_proxy_rules_last_queued_timestamp,
that captures the last time a change was queued to be applied to the
proxy. This matches the healthz logic, which fails if a pending change
is stale.
This allows us to write alerts that mirror healthz.
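An alert can apply the same comparison the healthz handler makes to the
queued and applied timestamps. A minimal sketch of that staleness check;
every name except the metric above is an assumption:

    package main

    import (
        "fmt"
        "time"
    )

    // stale mirrors the healthz logic: a change is pending if it was queued
    // after the last successful sync, and the check should fail once that
    // pending change is older than the allowed threshold.
    func stale(lastQueued, lastApplied time.Time, threshold time.Duration) bool {
        pending := lastQueued.After(lastApplied)
        return pending && time.Since(lastQueued) > threshold
    }

    func main() {
        lastApplied := time.Now().Add(-5 * time.Minute)
        lastQueued := time.Now().Add(-3 * time.Minute)
        fmt.Println(stale(lastQueued, lastApplied, 2*time.Minute)) // true: pending and stale
    }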
Signed-off-by: Casey Callendrello <cdc@redhat.com>
This builds on previous work, but only sets the sysctlConnReuse value
if the kernel is known to be above 4.19. To avoid calling GetKernelVersion
twice, I store the kernel version obtained in the CanUseIPVS method and
check the version constraint at the time of the expected sysctl call.
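A standalone sketch of the intended flow; it reads the kernel release
directly and writes the IPVS sysctl by path, whereas the real change reuses
the version already fetched in CanUseIPVS. The names, the parsing, and the
value written are illustrative assumptions:

    package main

    import (
        "fmt"
        "os"
        "strconv"
        "strings"
    )

    // leadingInt parses the leading digits of s, e.g. "19-rc1" -> 19.
    func leadingInt(s string) (int, bool) {
        i := 0
        for i < len(s) && s[i] >= '0' && s[i] <= '9' {
            i++
        }
        if i == 0 {
            return 0, false
        }
        n, err := strconv.Atoi(s[:i])
        return n, err == nil
    }

    // kernelAtLeast reports whether release (e.g. "5.4.0-42-generic") is at
    // least major.minor.
    func kernelAtLeast(release string, major, minor int) bool {
        parts := strings.SplitN(strings.TrimSpace(release), ".", 3)
        if len(parts) < 2 {
            return false
        }
        maj, ok1 := leadingInt(parts[0])
        min, ok2 := leadingInt(parts[1])
        if !ok1 || !ok2 {
            return false
        }
        return maj > major || (maj == major && min >= minor)
    }

    func main() {
        // Read the version once (the change stores the value obtained during
        // CanUseIPVS instead of calling GetKernelVersion a second time).
        release, err := os.ReadFile("/proc/sys/kernel/osrelease")
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            return
        }
        // Only touch conn_reuse_mode when the 4.19 constraint is satisfied.
        if kernelAtLeast(string(release), 4, 19) {
            err := os.WriteFile("/proc/sys/net/ipv4/vs/conn_reuse_mode", []byte("0"), 0o644)
            if err != nil {
                fmt.Fprintln(os.Stderr, err)
            }
        }
    }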
Signed-off-by: Christopher M. Luciano <cmluciano@us.ibm.com>
This allows the proxier to cache local addresses instead of fetching all
local addresses every time IsLocalIP is called.
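A minimal sketch of the caching idea; the type and constructor names are
made up for illustration, and the proxier's actual cache is more involved:

    package main

    import (
        "fmt"
        "net"
    )

    // localAddrCache holds the node's local IPs so IsLocalIP becomes a map
    // lookup instead of enumerating interfaces on every call.
    type localAddrCache struct {
        ips map[string]bool
    }

    func newLocalAddrCache() (*localAddrCache, error) {
        addrs, err := net.InterfaceAddrs()
        if err != nil {
            return nil, err
        }
        c := &localAddrCache{ips: make(map[string]bool)}
        for _, addr := range addrs {
            if ipnet, ok := addr.(*net.IPNet); ok {
                c.ips[ipnet.IP.String()] = true
            }
        }
        return c, nil
    }

    // IsLocalIP checks the cached set; callers no longer pay for an
    // interface scan on every lookup.
    func (c *localAddrCache) IsLocalIP(ip string) bool {
        return c.ips[ip]
    }

    func main() {
        cache, err := newLocalAddrCache()
        if err != nil {
            panic(err)
        }
        fmt.Println(cache.IsLocalIP("127.0.0.1"))
    }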
Signed-off-by: Andrew Sy Kim <kiman@vmware.com>
This creates a new EndpointSliceProxying feature gate to cover EndpointSlice
consumption (kube-proxy) and allow the existing EndpointSlice feature gate to
focus on EndpointSlice production only. Along with that addition, this enables
the EndpointSlice feature gate by default, now only affecting the controller.
The rationale here is that it's really difficult to guarantee all EndpointSlices
are created during a cluster upgrade before kube-proxy attempts to consume them.
Although masters are generally upgraded before nodes, and in most cases the
controller would have enough time to create EndpointSlices before a new node
with kube-proxy comes up, there are plenty of edge cases where that might not
hold. The primary limitation on EndpointSlice creation is the API rate limit of
20 QPS. In clusters with a lot of endpoints and/or a lot of other API requests,
it could be difficult to create all the EndpointSlices before a new node with
kube-proxy targeting EndpointSlices comes up.
Separating this into two feature gates allows for a more gradual rollout, with
the EndpointSlice controller enabled by default in 1.18 and EndpointSlices for
kube-proxy enabled by default in the next release.
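For reference, a rough sketch of how the two gates could be declared with the
component-base feature gate machinery; the defaults follow the description
above, but the exact specs are an approximation rather than the real
pkg/features source:

    package main

    import (
        "fmt"

        "k8s.io/component-base/featuregate"
    )

    const (
        // EndpointSlice covers EndpointSlice production (the controller);
        // enabled by default as described above.
        EndpointSlice featuregate.Feature = "EndpointSlice"
        // EndpointSliceProxying covers EndpointSlice consumption by
        // kube-proxy; off by default until a later release.
        EndpointSliceProxying featuregate.Feature = "EndpointSliceProxying"
    )

    func main() {
        gates := featuregate.NewFeatureGate()
        err := gates.Add(map[featuregate.Feature]featuregate.FeatureSpec{
            EndpointSlice:         {Default: true, PreRelease: featuregate.Beta},
            EndpointSliceProxying: {Default: false, PreRelease: featuregate.Alpha},
        })
        if err != nil {
            panic(err)
        }
        fmt.Println(gates.Enabled(EndpointSlice), gates.Enabled(EndpointSliceProxying))
    }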