The APIs talked about "stale services" and "stale endpoints", but the
thing that is actually "stale" is the conntrack entries, not the
services/endpoints. Fix the names to indicate what they actual keep
track of.
Also, all three fields (2 in the endpoints update object and 1 in the
service update object) are currently UDP-specific, but only the
service one made that clear. Fix that too.
In addition to actually updating their data from the provided list of
changes, EndpointsMap.Update() and ServicePortMap.Update() return a
struct with some information about things that changed because of that
update (eg services with stale conntrack entries).
For some reason, they were also returning information about
HealthCheckNodePorts, but they were returning *static* information
based on the current (post-Update) state of the map, not information
about what had *changed* in the update. Since this doesn't match how
the other data in the struct is used (and since there's no reason to
have the data only be returned when you call Update() anyway) , split
it out.
In the dual-stack case, iptables.NewDualStackProxier and
ipvs.NewDualStackProxier filtered the nodeport addresses values by IP
family before creating the single-stack proxiers. But in the
single-stack case, the kube-proxy startup code just passed the value
to the single-stack proxiers without validation, so they had to
re-check it themselves. Fix that.
To check kernel modules is a bad way to check functionality.
This commit just removes the checks and makes it possible to
use a statically linked kernel.
Minimal updates to unit-tests are made.
Some of the unit tests cannot pass on Windows due to various reasons:
- fsnotify does not have a Windows implementation.
- Proxy Mode IPVS not supported on Windows.
- Seccomp not supported on Windows.
- VolumeMode=Block is not supported on Windows.
- iSCSI volumes are mounted differently on Windows, and iscsiadm is a
Linux utility.
If the user passes "--proxy-mode ipvs", and it is not possible to use
IPVS, then error out rather than falling back to iptables.
There was never any good reason to be doing fallback; this was
presumably erroneously added to parallel the iptables-to-userspace
fallback (which only existed because we had wanted iptables to be the
default but not all systems could support it).
In particular, if the user passed configuration options for ipvs, then
they presumably *didn't* pass configuration options for iptables, and
so even if the iptables proxy is able to run, it is likely to be
misconfigured.
The ipvs proxier was figuring out LoadBalancerSourceRanges matches in
the nat table and using KUBE-MARK-DROP to mark unmatched packets to be
dropped later. But with ipvs, unlike with iptables, DNAT happens after
the packet is "delivered" to the dummy interface, so the packet will
still be unmodified when it reaches the filter table (the first time)
so there's no reason to split the work between the nat and filter
tables; we can just do it all from the filter table and call DROP
directly.
Before:
- KUBE-LOAD-BALANCER (in nat) uses kubeLoadBalancerFWSet to match LB
traffic for services using LoadBalancerSourceRanges, and sends it
to KUBE-FIREWALL.
- KUBE-FIREWALL uses kubeLoadBalancerSourceCIDRSet and
kubeLoadBalancerSourceIPSet to match allowed source/dest combos
and calls "-j RETURN".
- All remaining traffic that doesn't escape KUBE-FIREWALL is sent to
KUBE-MARK-DROP.
- Traffic sent to KUBE-MARK-DROP later gets dropped by chains in
filter created by kubelet.
After:
- All INPUT and FORWARD traffic gets routed to KUBE-PROXY-FIREWALL
(in filter). (We don't use "KUBE-FIREWALL" any more because
there's already a chain in filter by that name that belongs to
kubelet.)
- KUBE-PROXY-FIREWALL sends traffic matching kubeLoadbalancerFWSet
to KUBE-SOURCE-RANGES-FIREWALL
- KUBE-SOURCE-RANGES-FIREWALL uses kubeLoadBalancerSourceCIDRSet and
kubeLoadBalancerSourceIPSet to match allowed source/dest combos
and calls "-j RETURN".
- All remaining traffic that doesn't escape
KUBE-SOURCE-RANGES-FIREWALL is dropped (directly via "-j DROP").
- (KUBE-LOAD-BALANCER in nat is now used only to set up masquerading)
kubeLoadbalancerFWSet was the only LoadBalancer-related identifier
with a lowercase "b", so fix that.
rename TestLoadBalanceSourceRanges to TestLoadBalancerSourceRanges to
match the field name (and the iptables proxier test).
FakeIPTables barely implemented any of the iptables interface, and the
main part that it did implement, it implemented incorrectly. Fix it:
- Implement EnsureChain, DeleteChain, EnsureRule, and DeleteRule, not
just SaveInto/Restore/RestoreAll.
- Restore/RestoreAll now correctly merge the provided state with the
existing state, rather than simply overwriting it.
- SaveInto now returns the table that was requested, rather than just
echoing back the Restore/RestoreAll.
There were previously some strange iptables-rule-parsing functions
that were only used by two unit tests in pkg/proxy/ipvs. Get rid of
them and replace them with some much better iptables-rule-parsing
functions.
The iptables rule that matches kubeNodePortLocalSetSCTP must be inserted
before the one matches kubeNodePortSetSCTP, otherwise all SCTP traffic
would be masqueraded regardless of whether its ExternalTrafficPolicy is
Local or not.
To cover the case in tests, the patch adds rule order validation to
checkIptables.
* pkg/features: promote the ServiceInternalTrafficPolicy field to Beta and on by default
Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>
* pkg/api/service/testing: update Service test fixture functions to set internalTrafficPolicy=Cluster by default
Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>
* pkg/apis/core/validation: add more Service validation tests for internalTrafficPolicy
Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>
* pkg/registry/core/service/storage: fix failing Service REST storage tests to use internalTrafficPolicy: Cluster
Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>
* pkg/registry/core/service/storage: add two test cases for Service REST TestServiceRegistryInternalTrafficPolicyClusterThenLocal and TestServiceRegistryInternalTrafficPolicyLocalThenCluster
Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>
* pkg/registry/core/service: update strategy unit tests to expect default
internalTrafficPolicy=Cluster
Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>
* pkg/proxy/ipvs: fix unit test Test_EndpointSliceReadyAndTerminatingLocal to use internalTrafficPolicy=Cluster
Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>
* pkg/apis/core: update fuzzers to set Service internalTrafficPolicy field
Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>
* pkg/api/service/testing: refactor Service test fixtures to use Tweak funcs
Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>