We enabled topology hints on one of our services, and this log line was
emitted ~92 million times in one day from one cluster, tripping our log
quota for that cluster. As it stands, the log line cannot be disabled via
the `-v` flag because it does not specify a verbosity.
I think more log call sites need to set the verbosity at which they are
logged, but this one is currently hurting the most.
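For illustration, a minimal sketch of the difference (the function name,
message, and level here are made up, not the actual call site):

    package main

    import "k8s.io/klog/v2"

    func logFilteringSkipped(zone string) {
        // Unconditional: always emitted, cannot be silenced with -v.
        klog.InfoS("Skipping endpoint filtering for zone", "zone", zone)

        // With a verbosity level: only emitted at -v=4 or higher, so it
        // can be turned off by running with a lower verbosity.
        klog.V(4).InfoS("Skipping endpoint filtering for zone", "zone", zone)
    }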
Do an extra "add+delete" once to ensure that all base chains previously
created in the table are recreated. Otherwise, altering properties (e.g.
priority) of those chains would cause the transaction to fail.
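A rough sketch of the pattern, assuming the knftables API used by the
nftables proxier (the chain name and properties are only illustrative):

    package main

    import (
        "context"

        "sigs.k8s.io/knftables"
    )

    func recreateBaseChain(ctx context.Context, nft knftables.Interface) error {
        tx := nft.NewTransaction()
        // "add" guarantees the chain exists so the "delete" cannot fail,
        // "delete" removes whatever base chain an older version created,
        // and the final "add" recreates it with the desired properties.
        tx.Add(&knftables.Chain{Name: "filter-forward"})
        tx.Delete(&knftables.Chain{Name: "filter-forward"})
        tx.Add(&knftables.Chain{
            Name:     "filter-forward",
            Type:     knftables.PtrTo(knftables.FilterType),
            Hook:     knftables.PtrTo(knftables.ForwardHook),
            Priority: knftables.PtrTo(knftables.FilterPriority),
        })
        return nft.Run(ctx, tx)
    }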
Signed-off-by: Quan Tian <qtian@vmware.com>
The winkernel code was originally based on the iptables code but never
made use of some parts of it. (For example, it logs a warning if you
don't set `--cluster-cidr`, even though it doesn't actually use
`--cluster-cidr` if you do set it.)
The NFTables proxy will no longer install drop and reject rules in the
chains associated with the forward and output hooks for node port
services that have no endpoints.
Signed-off-by: Daman Arora <aroradaman@gmail.com>
The NFTables proxy will now drop traffic directed towards unallocated
ClusterIPs and reject traffic directed towards invalid ports of
ClusterIPs.
Signed-off-by: Daman Arora <aroradaman@gmail.com>
This replaces the klog formatting and message routing with a simpler
implementation that uses less code. The main difference is that we skip the
entire unused message routing.
Instead, the same split output streams as for JSON are implemented in the
io.Writer that gets passed to the textlogger.
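A minimal sketch of the approach, assuming the k8s.io/klog/v2/textlogger
package (the writer here is just a placeholder for the real
stream-splitting io.Writer):

    package main

    import (
        "io"
        "os"

        "k8s.io/klog/v2/textlogger"
    )

    func setupLogging(verbosity int) {
        // Any routing of output (e.g. splitting info and error lines into
        // separate streams, as the JSON format does) lives in this
        // io.Writer; the textlogger itself only formats messages.
        var out io.Writer = os.Stderr
        logger := textlogger.NewLogger(textlogger.NewConfig(
            textlogger.Verbosity(verbosity),
            textlogger.Output(out),
        ))
        logger.Info("logging configured", "verbosity", verbosity)
    }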
Use the fake interface in the unit tests, removing the dependency on
setting up FakeExec when conntrack cleanup is invoked.
Also, remove the isIPv6 argument to CleanStaleEntries, because it can
be inferred from the other arguments.
The iptables and nftables proxy backends had 2 unit tests
(TestDeleteEndpointConnections and TestProxierDeleteNodePortStaleUDP)
that were effectively testing that:
- If the proxy saw various Service/EndpointSlice events this would
result in specific changes to the service/endpoints trackers, AND
- If the service/endpoints trackers changed in those specific ways
this would result in specific UpdateServiceMapResult and
UpdateEndpointsMapResult values being generated, AND
- If you passed those specific UpdateServiceMapResult and
UpdateEndpointsMapResult values to conntrack.CleanStaleEntries it
would make specific calls to the lower-level conntrack methods,
AND
- If you called the lower-level conntrack methods with those
specific arguments, it would result in specific executions of the
conntrack binary, mixed with a specific number of klog
invocations.
This... is not a good unit test. We already test the change tracker
behavior in other unit tests, the Update{Service,Endpoints}MapResult
behavior in the pkg/proxy unit tests, and the conntrack exec behavior in
pkg/proxy/conntrack/conntrack_test.go, and we now test the
CleanStaleEntries behavior in pkg/proxy/conntrack/cleanup_test.go. So
there is no need to try to test the top-to-bottom behavior as a "unit
test".
Add an interface between CleanStaleEntries and the lower-level
conntrack helpers (ClearEntriesForIP, etc), and a fake implementation
of that interface, so that we can explicitly test CleanStaleEntries's
logic.
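A rough sketch of what that interface and fake might look like; only
ClearEntriesForIP is named above, so the other method names and the exact
signatures are assumptions:

    package conntrack

    import (
        v1 "k8s.io/api/core/v1"
        "k8s.io/apimachinery/pkg/util/sets"
    )

    // Interface abstracts the low-level conntrack helpers that
    // CleanStaleEntries calls.
    type Interface interface {
        ClearEntriesForIP(ip string, protocol v1.Protocol) error
        ClearEntriesForPort(port int, isIPv6 bool, protocol v1.Protocol) error
        ClearEntriesForNAT(origin, dest string, protocol v1.Protocol) error
        ClearEntriesForPortNAT(dest string, port int, protocol v1.Protocol) error
    }

    // fake records the arguments of every call so tests can assert on
    // exactly which entries CleanStaleEntries asked to clear.
    type fake struct {
        clearedIPs      sets.Set[string]
        clearedPorts    sets.Set[int]
        clearedNATs     map[string]string // origin -> dest
        clearedPortNATs map[int]string    // port -> dest
    }

    func newFake() *fake {
        return &fake{
            clearedIPs:      sets.New[string](),
            clearedPorts:    sets.New[int](),
            clearedNATs:     map[string]string{},
            clearedPortNATs: map[int]string{},
        }
    }

    func (f *fake) ClearEntriesForIP(ip string, protocol v1.Protocol) error {
        f.clearedIPs.Insert(ip)
        return nil
    }

    func (f *fake) ClearEntriesForPort(port int, isIPv6 bool, protocol v1.Protocol) error {
        f.clearedPorts.Insert(port)
        return nil
    }

    func (f *fake) ClearEntriesForNAT(origin, dest string, protocol v1.Protocol) error {
        f.clearedNATs[origin] = dest
        return nil
    }

    func (f *fake) ClearEntriesForPortNAT(dest string, port int, protocol v1.Protocol) error {
        f.clearedPortNATs[port] = dest
        return nil
    }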
Remove some comments from conntrack.go that were explaining the
functions' callers rather than explaining the functions themselves
(and which were redundant with other comments in the callers anyway).
Fix the test names to match the functions they are testing.
Abstract out the repetitive FakeExec handling.
Explicitly specify the "expectCommand" in each test case, to make it
clearer that that's really the part we're testing.
For everything except TestExec(), test each case with both a "success"
result and a "nothing to delete" result from the conntrack binary.
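A rough sketch of the kind of shared helper this describes, assuming the
fakes in k8s.io/utils/exec/testing (the helper names and details here are
illustrative, not the actual test code):

    package conntrack

    import (
        "strings"
        "testing"

        "k8s.io/utils/exec"
        fakeexec "k8s.io/utils/exec/testing"
    )

    // newFakeExec returns a FakeExec whose single invocation produces the
    // given combined output (e.g. a "success" message or a "nothing to
    // delete" message), plus the FakeCmd that records what was run.
    func newFakeExec(output string) (*fakeexec.FakeExec, *fakeexec.FakeCmd) {
        fcmd := &fakeexec.FakeCmd{
            CombinedOutputScript: []fakeexec.FakeAction{
                func() ([]byte, []byte, error) { return []byte(output), nil, nil },
            },
        }
        fexec := &fakeexec.FakeExec{
            CommandScript: []fakeexec.FakeCommandAction{
                func(cmd string, args ...string) exec.Cmd {
                    return fakeexec.InitFakeCmd(fcmd, cmd, args...)
                },
            },
        }
        return fexec, fcmd
    }

    // expectCommand asserts that the recorded invocation matches the
    // command line the test case says it should produce.
    func expectCommand(t *testing.T, fcmd *fakeexec.FakeCmd, want string) {
        t.Helper()
        got := strings.Join(fcmd.CombinedOutputLog[0], " ")
        if got != want {
            t.Errorf("expected command %q, got %q", want, got)
        }
    }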
Previously, the firewall-check chain was run from the input, forward, and
output hooks but not the prerouting hook. By the time LoadBalancer
traffic arrived at the input or forward hook, it had already been DNATed
to the endpoint IP and port, so the firewall-check chain didn't take
effect and traffic from outside LoadBalancerSourceRanges was not dropped.
This was not detected by the unit tests because the chains were sorted by
priority only, when the hook should also have been taken into
consideration.
The commit links the firewall-check chain to the prerouting hook and
unlinks it from the input and forward hooks to ensure the traffic is
filtered before DNAT. The priorities of the filter chains are updated
from "DNATPriority-1" to "DNATPriority-10" to allow third parties to
insert something else between them.
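A rough sketch of the resulting hookup, assuming the knftables API (chain
names are illustrative):

    package main

    import (
        "context"

        "sigs.k8s.io/knftables"
    )

    func linkFirewallCheck(ctx context.Context, nft knftables.Interface) error {
        tx := nft.NewTransaction()
        // Base chain on the prerouting hook, running just before DNAT so
        // the source-range check still sees the original LoadBalancer IP.
        tx.Add(&knftables.Chain{
            Name:     "filter-prerouting",
            Type:     knftables.PtrTo(knftables.FilterType),
            Hook:     knftables.PtrTo(knftables.PreroutingHook),
            Priority: knftables.PtrTo(knftables.DNATPriority + "-10"),
        })
        // Dispatch from the base chain into the shared firewall-check chain.
        tx.Add(&knftables.Chain{Name: "firewall-check"})
        tx.Add(&knftables.Rule{
            Chain: "filter-prerouting",
            Rule:  "jump firewall-check",
        })
        return nft.Run(ctx, tx)
    }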
Signed-off-by: Quan Tian <qtian@vmware.com>
The nftables implementation used concatenation of ranges when creating
the "firewall-allow" set, but that support was not available before
kernel 5.6. As a result, nftables mode couldn't run on earlier kernels,
even though 5.4 is still widely used.
An alternative to concatenation of ranges is to create a separate
firewall chain for every service port that needs firewalling, and to jump
to the service's firewall chain from the common firewall chain via a rule
with a verdict map (vmap).
Renaming "firewall" to "firewall-ips" is required when changing the set
to a map so that existing clusters can upgrade; otherwise creating the
map would fail. Besides, "firewall-ips" corresponds to the "service-ips"
map, and we can later add "firewall-nodeports" if it's determined that
NodePort traffic should be subject to LoadBalancerSourceRanges.
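A rough sketch of the resulting structure, assuming the knftables API
(chain names, IPs, and source ranges are illustrative):

    package main

    import (
        "context"

        "sigs.k8s.io/knftables"
    )

    func addServiceFirewall(ctx context.Context, nft knftables.Interface) error {
        tx := nft.NewTransaction()

        // Per-service-port chain holding the LoadBalancerSourceRanges check.
        tx.Add(&knftables.Chain{Name: "firewall-EXAMPLE"})
        tx.Add(&knftables.Rule{
            Chain: "firewall-EXAMPLE",
            Rule:  "ip saddr != { 10.0.0.0/8 } drop",
        })

        // Verdict map from "LB IP . protocol . port" to that chain; exact
        // keys only, so no concatenation-of-ranges support is needed.
        tx.Add(&knftables.Map{
            Name: "firewall-ips",
            Type: "ipv4_addr . inet_proto . inet_service : verdict",
        })
        tx.Add(&knftables.Element{
            Map:   "firewall-ips",
            Key:   []string{"192.0.2.10", "tcp", "80"},
            Value: []string{"goto firewall-EXAMPLE"},
        })

        // The common firewall chain dispatches through the map.
        tx.Add(&knftables.Chain{Name: "firewall-check"})
        tx.Add(&knftables.Rule{
            Chain: "firewall-check",
            Rule:  "ip daddr . meta l4proto . th dport vmap @firewall-ips",
        })

        return nft.Run(ctx, tx)
    }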
Signed-off-by: Quan Tian <qtian@vmware.com>
In some cases a chain could change from stale back to active, but once it
had been added to staleChains it would still be deleted later. When the
proxier tried to delete a previously stale but now active chain, the
deletion would fail and produce errors, though it wouldn't cause a real
problem thanks to the kernel's validation.
The commit removes a chain from staleChains if it becomes active again.
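A minimal sketch of the bookkeeping (the map's value type and the helper
names here are simplified stand-ins for the proxier's actual staleChains
handling):

    package main

    // markChainStale remembers a chain that is no longer referenced so it
    // can be deleted on a later sync.
    func markChainStale(staleChains map[string]bool, chain string) {
        staleChains[chain] = true
    }

    // markChainActive is the fix: a chain that is referenced again must be
    // removed from staleChains so the proxier doesn't later try to delete
    // a chain that is still in use.
    func markChainActive(staleChains map[string]bool, chain string) {
        delete(staleChains, chain)
    }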
Signed-off-by: Quan Tian <qtian@vmware.com>
In particular, fix the description of ServiceChangeTracker.Update's
return value, and point out that it's different from
EndpointsChangeTracker.EndpointSliceUpdate's (though fortunately, in a
way that doesn't matter for any existing code).
EndpointSliceCache was using the name "endpointInfo" to refer to two
different data types (most egregiously in addEndpoints(), which had a
variable named `endpoint` of type `*endpointInfo` and a variable named
`endpointInfo` of type `Endpoint`).
Continue using "endpointInfo" in places that refer to proxy.Endpoint /
BaseEndpointInfo, since that's consistent with other code, but rename
the local "cache of the Endpoints field of an EndpointSlice" type from
"endpointInfo" to "endpointData". Likewise, rename endpointSliceInfo
to endpointSliceData, for consistency.
Put the ServiceChangeTracker and EndpointsChangeTracker definitions at
the top of the files, and put the ServicePortMap and EndpointsMap
definitions before their methods.
(No code changes.)
Move the ServicePort/BaseServicePortInfo types to serviceport.go.
Move the Endpoint/BaseEndpointInfo types to endpoint.go.
To avoid confusion with the new filenames, rename service.go to
servicechangetracker.go and endpoints.go to endpointschangetracker.go.
(No code changes; this just moves some code from types.go and
service.go to serviceport.go, and some code from types.go and
endpoints.go to endpoint.go.)
(This would become an error rather than a warning once we try to move
this code to another file.)
Also rename an "ok" variable to "exists", since that's what it really
means.
ServicePortMap.merge had a giant comment explaining its return value,
but nothing ever used that return value.
ServicePort had an InternalTrafficPolicy() method, but nothing used it
(because it was redundant with InternalPolicyLocal()).