Merge pull request #121046 from danwinship/nftables

kube-proxy nftables backend
Merged by Kubernetes Prow Robot (committed via GitHub), 2023-11-01 01:50:59 +01:00
36 changed files with 10632 additions and 79 deletions
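At a high level, the new backend is opt-in: kube-proxy must run with the alpha NFTablesProxyMode feature gate enabled and `mode: nftables` in its configuration. A minimal configuration sketch (field values are illustrative, not taken from this PR; the nftables defaults the PR introduces are visible in the roundtrip fixtures further down):

```yaml
# Sketch of a KubeProxyConfiguration opting in to the new backend.
# Requires the alpha NFTablesProxyMode feature gate (off by default).
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
featureGates:
  NFTablesProxyMode: true
mode: "nftables"
nftables:
  masqueradeAll: false
  masqueradeBit: 14
  minSyncPeriod: 1s
  syncPeriod: 30s
```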

LICENSES/vendor/github.com/danwinship/knftables/LICENSE (generated, vendored, new file, 205 lines)

@@ -0,0 +1,205 @@
= vendor/github.com/danwinship/knftables licensed under: =
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
= vendor/github.com/danwinship/knftables/LICENSE 86d3f3a95c324c9479bd8986968f4327


@@ -553,6 +553,7 @@ API rule violation: names_match,k8s.io/kube-controller-manager/config/v1alpha1,V
API rule violation: names_match,k8s.io/kube-controller-manager/config/v1alpha1,VolumeConfiguration,FlexVolumePluginDir
API rule violation: names_match,k8s.io/kube-controller-manager/config/v1alpha1,VolumeConfiguration,PersistentVolumeRecyclerConfiguration
API rule violation: names_match,k8s.io/kube-proxy/config/v1alpha1,KubeProxyConfiguration,IPTables
API rule violation: names_match,k8s.io/kube-proxy/config/v1alpha1,KubeProxyConfiguration,NFTables
API rule violation: names_match,k8s.io/kubelet/config/v1beta1,KubeletConfiguration,IPTablesDropBit
API rule violation: names_match,k8s.io/kubelet/config/v1beta1,KubeletConfiguration,IPTablesMasqueradeBit
API rule violation: names_match,k8s.io/kubelet/config/v1beta1,KubeletConfiguration,ResolverConfig


@@ -375,9 +375,12 @@ func (o *Options) Run() error {
 		return o.writeConfigFile()
 	}
 
+	err := platformCleanup(o.config.Mode, o.CleanupAndExit)
 	if o.CleanupAndExit {
-		return cleanupAndExit()
+		return err
 	}
+	// We ignore err otherwise; the cleanup is best-effort, and the backends will have
+	// logged messages if they failed in interesting ways.
 
 	proxyServer, err := newProxyServer(o.config, o.master, o.InitAndExit)
 	if err != nil {


@@ -40,9 +40,11 @@ import (
v1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
utilfeature "k8s.io/apiserver/pkg/util/feature"
clientset "k8s.io/client-go/kubernetes"
toolswatch "k8s.io/client-go/tools/watch"
utilsysctl "k8s.io/component-helpers/node/util/sysctl"
"k8s.io/kubernetes/pkg/features"
"k8s.io/kubernetes/pkg/proxy"
proxyconfigapi "k8s.io/kubernetes/pkg/proxy/apis/config"
"k8s.io/kubernetes/pkg/proxy/iptables"
@@ -50,6 +52,7 @@ import (
utilipset "k8s.io/kubernetes/pkg/proxy/ipvs/ipset"
utilipvs "k8s.io/kubernetes/pkg/proxy/ipvs/util"
proxymetrics "k8s.io/kubernetes/pkg/proxy/metrics"
"k8s.io/kubernetes/pkg/proxy/nftables"
proxyutil "k8s.io/kubernetes/pkg/proxy/util"
proxyutiliptables "k8s.io/kubernetes/pkg/proxy/util/iptables"
utiliptables "k8s.io/kubernetes/pkg/util/iptables"
@@ -100,60 +103,73 @@ func (s *ProxyServer) platformSetup() error {
 	return nil
 }
 
+// isIPTablesBased checks whether mode is based on iptables rather than nftables
+func isIPTablesBased(mode proxyconfigapi.ProxyMode) bool {
+	return mode == proxyconfigapi.ProxyModeIPTables || mode == proxyconfigapi.ProxyModeIPVS
+}
+
+// getIPTables returns an array of [IPv4, IPv6] utiliptables.Interfaces. If primaryFamily
+// is not v1.IPFamilyUnknown then it will also separately return the interface for just
+// that family.
+func getIPTables(primaryFamily v1.IPFamily) ([2]utiliptables.Interface, utiliptables.Interface) {
+	execer := exec.New()
+
+	// Create iptables handlers for both families. Always ordered as IPv4, IPv6
+	ipt := [2]utiliptables.Interface{
+		utiliptables.New(execer, utiliptables.ProtocolIPv4),
+		utiliptables.New(execer, utiliptables.ProtocolIPv6),
+	}
+
+	var iptInterface utiliptables.Interface
+	if primaryFamily == v1.IPv4Protocol {
+		iptInterface = ipt[0]
+	} else if primaryFamily == v1.IPv6Protocol {
+		iptInterface = ipt[1]
+	}
+
+	return ipt, iptInterface
+}
+
 // platformCheckSupported is called immediately before creating the Proxier, to check
 // what IP families are supported (and whether the configuration is usable at all).
 func (s *ProxyServer) platformCheckSupported() (ipv4Supported, ipv6Supported, dualStackSupported bool, err error) {
-	execer := exec.New()
-	ipt := utiliptables.New(execer, utiliptables.ProtocolIPv4)
-	ipv4Supported = ipt.Present()
-	ipt = utiliptables.New(execer, utiliptables.ProtocolIPv6)
-	ipv6Supported = ipt.Present()
+	if isIPTablesBased(s.Config.Mode) {
+		ipt, _ := getIPTables(v1.IPFamilyUnknown)
+		ipv4Supported = ipt[0].Present()
+		ipv6Supported = ipt[1].Present()
+
+		if !ipv4Supported && !ipv6Supported {
+			err = fmt.Errorf("iptables is not available on this host")
+		} else if !ipv4Supported {
+			klog.InfoS("No iptables support for family", "ipFamily", v1.IPv4Protocol)
+		} else if !ipv6Supported {
+			klog.InfoS("No iptables support for family", "ipFamily", v1.IPv6Protocol)
+		}
+	} else {
+		// Assume support for both families.
+		// FIXME: figure out how to check for kernel IPv6 support using nft
+		ipv4Supported, ipv6Supported = true, true
+	}
 
 	// The Linux proxies can always support dual-stack if they can support both IPv4
 	// and IPv6.
 	dualStackSupported = ipv4Supported && ipv6Supported
 
-	if !ipv4Supported && !ipv6Supported {
-		err = fmt.Errorf("iptables is not available on this host")
-	} else if !ipv4Supported {
-		klog.InfoS("No iptables support for family", "ipFamily", v1.IPv4Protocol)
-	} else if !ipv6Supported {
-		klog.InfoS("No iptables support for family", "ipFamily", v1.IPv6Protocol)
-	}
 	return
 }
 
 // createProxier creates the proxy.Provider
 func (s *ProxyServer) createProxier(config *proxyconfigapi.KubeProxyConfiguration, dualStack, initOnly bool) (proxy.Provider, error) {
 	var proxier proxy.Provider
+	var localDetectors [2]proxyutiliptables.LocalTrafficDetector
+	var localDetector proxyutiliptables.LocalTrafficDetector
 	var err error
 
-	primaryProtocol := utiliptables.ProtocolIPv4
-	if s.PrimaryIPFamily == v1.IPv6Protocol {
-		primaryProtocol = utiliptables.ProtocolIPv6
-	}
-	execer := exec.New()
-	iptInterface := utiliptables.New(execer, primaryProtocol)
-
-	var ipt [2]utiliptables.Interface
-	// Create iptables handlers for both families, one is already created
-	// Always ordered as IPv4, IPv6
-	if primaryProtocol == utiliptables.ProtocolIPv4 {
-		ipt[0] = iptInterface
-		ipt[1] = utiliptables.New(execer, utiliptables.ProtocolIPv6)
-	} else {
-		ipt[0] = utiliptables.New(execer, utiliptables.ProtocolIPv4)
-		ipt[1] = iptInterface
-	}
-
 	if config.Mode == proxyconfigapi.ProxyModeIPTables {
 		klog.InfoS("Using iptables Proxier")
 
 		if dualStack {
 			// Always ordered to match []ipt
-			var localDetectors [2]proxyutiliptables.LocalTrafficDetector
+			ipt, _ := getIPTables(s.PrimaryIPFamily)
 			localDetectors, err = getDualStackLocalDetectorTuple(config.DetectLocalMode, config, s.podCIDRs)
 			if err != nil {
 				return nil, fmt.Errorf("unable to create proxier: %v", err)
@@ -163,7 +179,7 @@ func (s *ProxyServer) createProxier(config *proxyconfigapi.KubeProxyConfiguratio
 			proxier, err = iptables.NewDualStackProxier(
 				ipt,
 				utilsysctl.New(),
-				execer,
+				exec.New(),
 				config.IPTables.SyncPeriod.Duration,
 				config.IPTables.MinSyncPeriod.Duration,
 				config.IPTables.MasqueradeAll,
@@ -179,7 +195,7 @@ func (s *ProxyServer) createProxier(config *proxyconfigapi.KubeProxyConfiguratio
 			)
 		} else {
 			// Create a single-stack proxier if and only if the node does not support dual-stack (i.e, no iptables support).
-			var localDetector proxyutiliptables.LocalTrafficDetector
+			_, iptInterface := getIPTables(s.PrimaryIPFamily)
 			localDetector, err = getLocalDetector(s.PrimaryIPFamily, config.DetectLocalMode, config, s.podCIDRs)
 			if err != nil {
 				return nil, fmt.Errorf("unable to create proxier: %v", err)
@@ -190,7 +206,7 @@ func (s *ProxyServer) createProxier(config *proxyconfigapi.KubeProxyConfiguratio
 				s.PrimaryIPFamily,
 				iptInterface,
 				utilsysctl.New(),
-				execer,
+				exec.New(),
 				config.IPTables.SyncPeriod.Duration,
 				config.IPTables.MinSyncPeriod.Duration,
 				config.IPTables.MasqueradeAll,
@@ -210,6 +226,7 @@ func (s *ProxyServer) createProxier(config *proxyconfigapi.KubeProxyConfiguratio
 			return nil, fmt.Errorf("unable to create proxier: %v", err)
 		}
 	} else if config.Mode == proxyconfigapi.ProxyModeIPVS {
+		execer := exec.New()
 		ipsetInterface := utilipset.New(execer)
 		ipvsInterface := utilipvs.New()
 		if err := ipvs.CanUseIPVSProxier(ipvsInterface, ipsetInterface, config.IPVS.Scheduler); err != nil {
@@ -218,8 +235,9 @@ func (s *ProxyServer) createProxier(config *proxyconfigapi.KubeProxyConfiguratio
 		klog.InfoS("Using ipvs Proxier")
 		if dualStack {
+			ipt, _ := getIPTables(s.PrimaryIPFamily)
+
 			// Always ordered to match []ipt
-			var localDetectors [2]proxyutiliptables.LocalTrafficDetector
 			localDetectors, err = getDualStackLocalDetectorTuple(config.DetectLocalMode, config, s.podCIDRs)
 			if err != nil {
 				return nil, fmt.Errorf("unable to create proxier: %v", err)
@@ -250,7 +268,7 @@ func (s *ProxyServer) createProxier(config *proxyconfigapi.KubeProxyConfiguratio
 				initOnly,
 			)
 		} else {
-			var localDetector proxyutiliptables.LocalTrafficDetector
+			_, iptInterface := getIPTables(s.PrimaryIPFamily)
 			localDetector, err = getLocalDetector(s.PrimaryIPFamily, config.DetectLocalMode, config, s.podCIDRs)
 			if err != nil {
 				return nil, fmt.Errorf("unable to create proxier: %v", err)
@@ -282,6 +300,58 @@ func (s *ProxyServer) createProxier(config *proxyconfigapi.KubeProxyConfiguratio
 				initOnly,
 			)
 		}
 		if err != nil {
 			return nil, fmt.Errorf("unable to create proxier: %v", err)
 		}
+	} else if config.Mode == proxyconfigapi.ProxyModeNFTables {
+		klog.InfoS("Using nftables Proxier")
+
+		if dualStack {
+			localDetectors, err = getDualStackLocalDetectorTuple(config.DetectLocalMode, config, s.podCIDRs)
+			if err != nil {
+				return nil, fmt.Errorf("unable to create proxier: %v", err)
+			}
+
+			// TODO this has side effects that should only happen when Run() is invoked.
+			proxier, err = nftables.NewDualStackProxier(
+				utilsysctl.New(),
+				config.NFTables.SyncPeriod.Duration,
+				config.NFTables.MinSyncPeriod.Duration,
+				config.NFTables.MasqueradeAll,
+				int(*config.NFTables.MasqueradeBit),
+				localDetectors,
+				s.Hostname,
+				s.NodeIPs,
+				s.Recorder,
+				s.HealthzServer,
+				config.NodePortAddresses,
+				initOnly,
+			)
+		} else {
+			// Create a single-stack proxier if and only if the node does not support dual-stack
+			localDetector, err = getLocalDetector(s.PrimaryIPFamily, config.DetectLocalMode, config, s.podCIDRs)
+			if err != nil {
+				return nil, fmt.Errorf("unable to create proxier: %v", err)
+			}
+
+			// TODO this has side effects that should only happen when Run() is invoked.
+			proxier, err = nftables.NewProxier(
+				s.PrimaryIPFamily,
+				utilsysctl.New(),
+				config.NFTables.SyncPeriod.Duration,
+				config.NFTables.MinSyncPeriod.Duration,
+				config.NFTables.MasqueradeAll,
+				int(*config.NFTables.MasqueradeBit),
+				localDetector,
+				s.Hostname,
+				s.NodeIPs[s.PrimaryIPFamily],
+				s.Recorder,
+				s.HealthzServer,
+				config.NodePortAddresses,
+				initOnly,
+			)
+		}
+		if err != nil {
+			return nil, fmt.Errorf("unable to create proxier: %v", err)
+		}
@@ -475,27 +545,35 @@ func getDualStackLocalDetectorTuple(mode proxyconfigapi.LocalMode, config *proxy
 	return localDetectors, nil
 }
 
-// cleanupAndExit remove iptables rules and ipset/ipvs rules
-func cleanupAndExit() error {
-	execer := exec.New()
-
-	// cleanup IPv6 and IPv4 iptables rules, regardless of current configuration
-	ipts := []utiliptables.Interface{
-		utiliptables.New(execer, utiliptables.ProtocolIPv4),
-		utiliptables.New(execer, utiliptables.ProtocolIPv6),
-	}
-	ipsetInterface := utilipset.New(execer)
-	ipvsInterface := utilipvs.New()
-
+// platformCleanup removes stale kube-proxy rules that can be safely removed. If
+// cleanupAndExit is true, it will attempt to remove rules from all known kube-proxy
+// modes. If it is false, it will only remove rules that are definitely not in use by the
+// currently-configured mode.
+func platformCleanup(mode proxyconfigapi.ProxyMode, cleanupAndExit bool) error {
 	var encounteredError bool
-	for _, ipt := range ipts {
-		encounteredError = iptables.CleanupLeftovers(ipt) || encounteredError
-		encounteredError = ipvs.CleanupLeftovers(ipvsInterface, ipt, ipsetInterface) || encounteredError
+
+	// Clean up iptables and ipvs rules if switching to nftables, or if cleanupAndExit
+	if !isIPTablesBased(mode) || cleanupAndExit {
+		ipts, _ := getIPTables(v1.IPFamilyUnknown)
+		execer := exec.New()
+		ipsetInterface := utilipset.New(execer)
+		ipvsInterface := utilipvs.New()
+
+		for _, ipt := range ipts {
+			encounteredError = iptables.CleanupLeftovers(ipt) || encounteredError
+			encounteredError = ipvs.CleanupLeftovers(ipvsInterface, ipt, ipsetInterface) || encounteredError
+		}
+	}
+
+	if utilfeature.DefaultFeatureGate.Enabled(features.NFTablesProxyMode) {
+		// Clean up nftables rules when switching to iptables or ipvs, or if cleanupAndExit
+		if isIPTablesBased(mode) || cleanupAndExit {
+			encounteredError = nftables.CleanupLeftovers() || encounteredError
+		}
 	}
+
 	if encounteredError {
 		return errors.New("encountered an error while tearing down rules")
 	}
 	return nil
 }
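The mode/flag decision matrix implemented by platformCleanup can be sketched as a standalone helper, ignoring the NFTablesProxyMode feature-gate check (cleanupPlan and its string-typed modes are illustrative, not part of the PR):

```go
package main

import "fmt"

// cleanupPlan mirrors platformCleanup's decision logic: iptables/ipvs
// leftovers are removed when the configured mode does not use them (i.e.
// nftables mode) or when --cleanup-and-exit was given; likewise, nftables
// leftovers are removed when running in an iptables-based mode or on
// --cleanup-and-exit.
func cleanupPlan(mode string, cleanupAndExit bool) (cleanIPTables, cleanNFTables bool) {
	iptablesBased := mode == "iptables" || mode == "ipvs"
	cleanIPTables = !iptablesBased || cleanupAndExit
	cleanNFTables = iptablesBased || cleanupAndExit
	return
}

func main() {
	for _, mode := range []string{"iptables", "ipvs", "nftables"} {
		ipt, nft := cleanupPlan(mode, false)
		fmt.Printf("mode=%s cleanIPTables=%v cleanNFTables=%v\n", mode, ipt, nft)
	}
}
```

So a proxy running in nftables mode cleans up stale iptables/ipvs rules but leaves its own nftables rules alone, and vice versa; --cleanup-and-exit cleans everything.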


@@ -74,6 +74,11 @@ ipvs:
excludeCIDRs:
- "10.20.30.40/16"
- "fd00:1::0/64"
nftables:
masqueradeAll: true
masqueradeBit: 18
minSyncPeriod: 10s
syncPeriod: 60s
kind: KubeProxyConfiguration
metricsBindAddress: "%s"
mode: "%s"
@@ -218,6 +223,12 @@ nodePortAddresses:
SyncPeriod: metav1.Duration{Duration: 60 * time.Second},
ExcludeCIDRs: []string{"10.20.30.40/16", "fd00:1::0/64"},
},
NFTables: kubeproxyconfig.KubeProxyNFTablesConfiguration{
MasqueradeAll: true,
MasqueradeBit: ptr.To[int32](18),
MinSyncPeriod: metav1.Duration{Duration: 10 * time.Second},
SyncPeriod: metav1.Duration{Duration: 60 * time.Second},
},
MetricsBindAddress: tc.metricsBindAddress,
Mode: kubeproxyconfig.ProxyMode(tc.mode),
OOMScoreAdj: ptr.To[int32](17),


@@ -125,7 +125,10 @@ func (s *ProxyServer) createProxier(config *proxyconfigapi.KubeProxyConfiguratio
 	return proxier, nil
 }
 
-// cleanupAndExit cleans up after a previous proxy run
-func cleanupAndExit() error {
-	return errors.New("--cleanup-and-exit is not implemented on Windows")
+// platformCleanup removes stale kube-proxy rules that can be safely removed.
+func platformCleanup(mode proxyconfigapi.ProxyMode, cleanupAndExit bool) error {
+	if cleanupAndExit {
+		return errors.New("--cleanup-and-exit is not implemented on Windows")
+	}
+	return nil
 }

go.mod

@@ -27,6 +27,7 @@ require (
github.com/coreos/go-systemd/v22 v22.5.0
github.com/cpuguy83/go-md2man/v2 v2.0.2
github.com/cyphar/filepath-securejoin v0.2.4
github.com/danwinship/knftables v0.0.13
github.com/distribution/reference v0.5.0
github.com/docker/go-units v0.5.0
github.com/emicklei/go-restful/v3 v3.11.0

go.sum

@@ -308,6 +308,8 @@ github.com/creack/pty v1.1.18 h1:n56/Zwd5o6whRC5PMGretI4IdRLlmBXYNjScPaBgsbY=
github.com/creack/pty v1.1.18/go.mod h1:MOBLtS5ELjhRRrroQr9kyvTxUAFNvYEK993ew/Vr4O4=
github.com/cyphar/filepath-securejoin v0.2.4 h1:Ugdm7cg7i6ZK6x3xDF1oEu1nfkyfH53EtKeQYTC3kyg=
github.com/cyphar/filepath-securejoin v0.2.4/go.mod h1:aPGpWjXOXUn2NCNjFvBE6aRxGGx79pTxQpKOJNYHHl4=
github.com/danwinship/knftables v0.0.13 h1:89Ieiia6MMfXWQF9dyaou1CwBU8h8sHa2Zo3OlY2o04=
github.com/danwinship/knftables v0.0.13/go.mod h1:OzipaBQqkQAIbVnafTGyHgfFbjWTJecrA7/XNLNMO5E=
github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=


@@ -564,6 +564,13 @@ const (
// Robust VolumeManager reconstruction after kubelet restart.
NewVolumeManagerReconstruction featuregate.Feature = "NewVolumeManagerReconstruction"
// owner: @danwinship
// kep: https://kep.k8s.io/3866
// alpha: v1.29
//
// Allows running kube-proxy with `--mode nftables`.
NFTablesProxyMode featuregate.Feature = "NFTablesProxyMode"
// owner: @aravindhp @LorbusChris
// kep: http://kep.k8s.io/2271
// alpha: v1.27
@@ -1103,6 +1110,8 @@ var defaultKubernetesFeatureGates = map[featuregate.Feature]featuregate.FeatureS
NewVolumeManagerReconstruction: {Default: true, PreRelease: featuregate.Beta},
NFTablesProxyMode: {Default: false, PreRelease: featuregate.Alpha},
NodeLogQuery: {Default: false, PreRelease: featuregate.Alpha},
NodeOutOfServiceVolumeDetach: {Default: true, PreRelease: featuregate.GA, LockToDefault: true}, // remove in 1.31


@@ -1095,6 +1095,7 @@ func GetOpenAPIDefinitions(ref common.ReferenceCallback) map[string]common.OpenA
"k8s.io/kube-proxy/config/v1alpha1.KubeProxyConntrackConfiguration": schema_k8sio_kube_proxy_config_v1alpha1_KubeProxyConntrackConfiguration(ref),
"k8s.io/kube-proxy/config/v1alpha1.KubeProxyIPTablesConfiguration": schema_k8sio_kube_proxy_config_v1alpha1_KubeProxyIPTablesConfiguration(ref),
"k8s.io/kube-proxy/config/v1alpha1.KubeProxyIPVSConfiguration": schema_k8sio_kube_proxy_config_v1alpha1_KubeProxyIPVSConfiguration(ref),
"k8s.io/kube-proxy/config/v1alpha1.KubeProxyNFTablesConfiguration": schema_k8sio_kube_proxy_config_v1alpha1_KubeProxyNFTablesConfiguration(ref),
"k8s.io/kube-proxy/config/v1alpha1.KubeProxyWinkernelConfiguration": schema_k8sio_kube_proxy_config_v1alpha1_KubeProxyWinkernelConfiguration(ref),
"k8s.io/kube-scheduler/config/v1.DefaultPreemptionArgs": schema_k8sio_kube_scheduler_config_v1_DefaultPreemptionArgs(ref),
"k8s.io/kube-scheduler/config/v1.Extender": schema_k8sio_kube_scheduler_config_v1_Extender(ref),
@@ -54415,6 +54416,13 @@ func schema_k8sio_kube_proxy_config_v1alpha1_KubeProxyConfiguration(ref common.R
Ref: ref("k8s.io/kube-proxy/config/v1alpha1.KubeProxyIPVSConfiguration"),
},
},
"nftables": {
SchemaProps: spec.SchemaProps{
Description: "nftables contains nftables-related configuration options.",
Default: map[string]interface{}{},
Ref: ref("k8s.io/kube-proxy/config/v1alpha1.KubeProxyNFTablesConfiguration"),
},
},
"winkernel": {
SchemaProps: spec.SchemaProps{
Description: "winkernel contains winkernel-related configuration options.",
@@ -54489,11 +54497,11 @@ func schema_k8sio_kube_proxy_config_v1alpha1_KubeProxyConfiguration(ref common.R
},
},
},
-Required: []string{"clientConnection", "hostnameOverride", "bindAddress", "healthzBindAddress", "metricsBindAddress", "bindAddressHardFail", "enableProfiling", "showHiddenMetricsForVersion", "mode", "iptables", "ipvs", "winkernel", "detectLocalMode", "detectLocal", "clusterCIDR", "nodePortAddresses", "oomScoreAdj", "conntrack", "configSyncPeriod", "portRange"},
+Required: []string{"clientConnection", "hostnameOverride", "bindAddress", "healthzBindAddress", "metricsBindAddress", "bindAddressHardFail", "enableProfiling", "showHiddenMetricsForVersion", "mode", "iptables", "ipvs", "nftables", "winkernel", "detectLocalMode", "detectLocal", "clusterCIDR", "nodePortAddresses", "oomScoreAdj", "conntrack", "configSyncPeriod", "portRange"},
},
},
 Dependencies: []string{
-"k8s.io/apimachinery/pkg/apis/meta/v1.Duration", "k8s.io/component-base/config/v1alpha1.ClientConnectionConfiguration", "k8s.io/component-base/logs/api/v1.LoggingConfiguration", "k8s.io/kube-proxy/config/v1alpha1.DetectLocalConfiguration", "k8s.io/kube-proxy/config/v1alpha1.KubeProxyConntrackConfiguration", "k8s.io/kube-proxy/config/v1alpha1.KubeProxyIPTablesConfiguration", "k8s.io/kube-proxy/config/v1alpha1.KubeProxyIPVSConfiguration", "k8s.io/kube-proxy/config/v1alpha1.KubeProxyWinkernelConfiguration"},
+"k8s.io/apimachinery/pkg/apis/meta/v1.Duration", "k8s.io/component-base/config/v1alpha1.ClientConnectionConfiguration", "k8s.io/component-base/logs/api/v1.LoggingConfiguration", "k8s.io/kube-proxy/config/v1alpha1.DetectLocalConfiguration", "k8s.io/kube-proxy/config/v1alpha1.KubeProxyConntrackConfiguration", "k8s.io/kube-proxy/config/v1alpha1.KubeProxyIPTablesConfiguration", "k8s.io/kube-proxy/config/v1alpha1.KubeProxyIPVSConfiguration", "k8s.io/kube-proxy/config/v1alpha1.KubeProxyNFTablesConfiguration", "k8s.io/kube-proxy/config/v1alpha1.KubeProxyWinkernelConfiguration"},
}
}
@@ -54686,6 +54694,49 @@ func schema_k8sio_kube_proxy_config_v1alpha1_KubeProxyIPVSConfiguration(ref comm
}
}
func schema_k8sio_kube_proxy_config_v1alpha1_KubeProxyNFTablesConfiguration(ref common.ReferenceCallback) common.OpenAPIDefinition {
return common.OpenAPIDefinition{
Schema: spec.Schema{
SchemaProps: spec.SchemaProps{
Description: "KubeProxyNFTablesConfiguration contains nftables-related configuration details for the Kubernetes proxy server.",
Type: []string{"object"},
Properties: map[string]spec.Schema{
"masqueradeBit": {
SchemaProps: spec.SchemaProps{
Description: "masqueradeBit is the bit of the iptables fwmark space to use for SNAT if using the nftables proxy mode. Values must be within the range [0, 31].",
Type: []string{"integer"},
Format: "int32",
},
},
"masqueradeAll": {
SchemaProps: spec.SchemaProps{
Description: "masqueradeAll tells kube-proxy to SNAT all traffic sent to Service cluster IPs, when using the nftables mode. This may be required with some CNI plugins.",
Default: false,
Type: []string{"boolean"},
Format: "",
},
},
"syncPeriod": {
SchemaProps: spec.SchemaProps{
Description: "syncPeriod is an interval (e.g. '5s', '1m', '2h22m') indicating how frequently various re-synchronizing and cleanup operations are performed. Must be greater than 0.",
Ref: ref("k8s.io/apimachinery/pkg/apis/meta/v1.Duration"),
},
},
"minSyncPeriod": {
SchemaProps: spec.SchemaProps{
Description: "minSyncPeriod is the minimum period between iptables rule resyncs (e.g. '5s', '1m', '2h22m'). A value of 0 means every Service or EndpointSlice change will result in an immediate iptables resync.",
Ref: ref("k8s.io/apimachinery/pkg/apis/meta/v1.Duration"),
},
},
},
Required: []string{"masqueradeBit", "masqueradeAll", "syncPeriod", "minSyncPeriod"},
},
},
Dependencies: []string{
"k8s.io/apimachinery/pkg/apis/meta/v1.Duration"},
}
}
func schema_k8sio_kube_proxy_config_v1alpha1_KubeProxyWinkernelConfiguration(ref common.ReferenceCallback) common.OpenAPIDefinition {
return common.OpenAPIDefinition{
Schema: spec.Schema{


@@ -43,6 +43,7 @@ func Funcs(codecs runtimeserializer.CodecFactory) []interface{} {
obj.HealthzBindAddress = fmt.Sprintf("%d.%d.%d.%d:%d", c.Intn(256), c.Intn(256), c.Intn(256), c.Intn(256), c.Intn(65536))
obj.IPTables.MasqueradeBit = ptr.To(c.Int31())
obj.IPTables.LocalhostNodePorts = ptr.To(c.RandBool())
obj.NFTables.MasqueradeBit = ptr.To(c.Int31())
obj.MetricsBindAddress = fmt.Sprintf("%d.%d.%d.%d:%d", c.Intn(256), c.Intn(256), c.Intn(256), c.Intn(256), c.Intn(65536))
obj.OOMScoreAdj = ptr.To(c.Int31())
obj.ClientConnection.ContentType = "bar"


@@ -49,6 +49,11 @@ logging:
verbosity: 0
metricsBindAddress: 127.0.0.1:10249
mode: ""
nftables:
masqueradeAll: false
masqueradeBit: 14
minSyncPeriod: 1s
syncPeriod: 30s
nodePortAddresses: null
oomScoreAdj: -999
portRange: ""


@@ -49,6 +49,11 @@ logging:
verbosity: 0
metricsBindAddress: 127.0.0.1:10249
mode: ""
nftables:
masqueradeAll: false
masqueradeBit: 14
minSyncPeriod: 1s
syncPeriod: 30s
nodePortAddresses: null
oomScoreAdj: -999
portRange: ""


@@ -81,6 +81,25 @@ type KubeProxyIPVSConfiguration struct {
UDPTimeout metav1.Duration
}
// KubeProxyNFTablesConfiguration contains nftables-related configuration
// details for the Kubernetes proxy server.
type KubeProxyNFTablesConfiguration struct {
// masqueradeBit is the bit of the iptables fwmark space to use for SNAT if using
// the nftables proxy mode. Values must be within the range [0, 31].
MasqueradeBit *int32
// masqueradeAll tells kube-proxy to SNAT all traffic sent to Service cluster IPs,
// when using the nftables mode. This may be required with some CNI plugins.
MasqueradeAll bool
// syncPeriod is an interval (e.g. '5s', '1m', '2h22m') indicating how frequently
// various re-synchronizing and cleanup operations are performed. Must be greater
// than 0.
SyncPeriod metav1.Duration
// minSyncPeriod is the minimum period between nftables rule resyncs (e.g. '5s',
// '1m', '2h22m'). A value of 0 means every Service or EndpointSlice change will
// result in an immediate nftables resync.
MinSyncPeriod metav1.Duration
}
// KubeProxyConntrackConfiguration contains conntrack settings for
// the Kubernetes proxy server.
type KubeProxyConntrackConfiguration struct {
@@ -195,6 +214,8 @@ type KubeProxyConfiguration struct {
IPVS KubeProxyIPVSConfiguration
// winkernel contains winkernel-related configuration options.
Winkernel KubeProxyWinkernelConfiguration
// nftables contains nftables-related configuration options.
NFTables KubeProxyNFTablesConfiguration
// detectLocalMode determines mode to use for detecting local traffic, defaults to LocalModeClusterCIDR
DetectLocalMode LocalMode
@@ -228,8 +249,8 @@ type KubeProxyConfiguration struct {
// ProxyMode represents modes used by the Kubernetes proxy server.
//
// Currently, three modes of proxy are available on Linux platforms: 'iptables', 'ipvs',
// and 'nftables'. One mode of proxy is available on Windows platforms: 'kernelspace'.
//
// If the proxy mode is unspecified, the best-available proxy mode will be used (currently this
// is `iptables` on Linux and `kernelspace` on Windows). If the selected proxy mode cannot be
@@ -240,6 +261,7 @@ type ProxyMode string
const (
ProxyModeIPTables ProxyMode = "iptables"
ProxyModeIPVS ProxyMode = "ipvs"
ProxyModeNFTables ProxyMode = "nftables"
ProxyModeKernelspace ProxyMode = "kernelspace"
)


@@ -71,6 +71,12 @@ func SetDefaults_KubeProxyConfiguration(obj *kubeproxyconfigv1alpha1.KubeProxyCo
if obj.IPVS.SyncPeriod.Duration == 0 {
obj.IPVS.SyncPeriod = metav1.Duration{Duration: 30 * time.Second}
}
if obj.NFTables.SyncPeriod.Duration == 0 {
obj.NFTables.SyncPeriod = metav1.Duration{Duration: 30 * time.Second}
}
if obj.NFTables.MinSyncPeriod.Duration == 0 {
obj.NFTables.MinSyncPeriod = metav1.Duration{Duration: 1 * time.Second}
}
if obj.Conntrack.MaxPerCore == nil {
obj.Conntrack.MaxPerCore = ptr.To[int32](32 * 1024)
@@ -83,6 +89,10 @@ func SetDefaults_KubeProxyConfiguration(obj *kubeproxyconfigv1alpha1.KubeProxyCo
temp := int32(14)
obj.IPTables.MasqueradeBit = &temp
}
if obj.NFTables.MasqueradeBit == nil {
temp := int32(14)
obj.NFTables.MasqueradeBit = &temp
}
if obj.Conntrack.TCPEstablishedTimeout == nil {
obj.Conntrack.TCPEstablishedTimeout = &metav1.Duration{Duration: 24 * time.Hour} // 1 day (1/5 default)
}


@@ -62,6 +62,12 @@ func TestDefaultsKubeProxyConfiguration(t *testing.T) {
IPVS: kubeproxyconfigv1alpha1.KubeProxyIPVSConfiguration{
SyncPeriod: metav1.Duration{Duration: 30 * time.Second},
},
NFTables: kubeproxyconfigv1alpha1.KubeProxyNFTablesConfiguration{
MasqueradeBit: ptr.To[int32](14),
MasqueradeAll: false,
SyncPeriod: metav1.Duration{Duration: 30 * time.Second},
MinSyncPeriod: metav1.Duration{Duration: 1 * time.Second},
},
OOMScoreAdj: &oomScore,
Conntrack: kubeproxyconfigv1alpha1.KubeProxyConntrackConfiguration{
MaxPerCore: &ctMaxPerCore,
@@ -102,6 +108,12 @@ func TestDefaultsKubeProxyConfiguration(t *testing.T) {
IPVS: kubeproxyconfigv1alpha1.KubeProxyIPVSConfiguration{
SyncPeriod: metav1.Duration{Duration: 30 * time.Second},
},
NFTables: kubeproxyconfigv1alpha1.KubeProxyNFTablesConfiguration{
MasqueradeBit: ptr.To[int32](14),
MasqueradeAll: false,
SyncPeriod: metav1.Duration{Duration: 30 * time.Second},
MinSyncPeriod: metav1.Duration{Duration: 1 * time.Second},
},
OOMScoreAdj: &oomScore,
Conntrack: kubeproxyconfigv1alpha1.KubeProxyConntrackConfiguration{
MaxPerCore: &ctMaxPerCore,


@@ -89,6 +89,16 @@ func RegisterConversions(s *runtime.Scheme) error {
}); err != nil {
return err
}
if err := s.AddGeneratedConversionFunc((*v1alpha1.KubeProxyNFTablesConfiguration)(nil), (*config.KubeProxyNFTablesConfiguration)(nil), func(a, b interface{}, scope conversion.Scope) error {
return Convert_v1alpha1_KubeProxyNFTablesConfiguration_To_config_KubeProxyNFTablesConfiguration(a.(*v1alpha1.KubeProxyNFTablesConfiguration), b.(*config.KubeProxyNFTablesConfiguration), scope)
}); err != nil {
return err
}
if err := s.AddGeneratedConversionFunc((*config.KubeProxyNFTablesConfiguration)(nil), (*v1alpha1.KubeProxyNFTablesConfiguration)(nil), func(a, b interface{}, scope conversion.Scope) error {
return Convert_config_KubeProxyNFTablesConfiguration_To_v1alpha1_KubeProxyNFTablesConfiguration(a.(*config.KubeProxyNFTablesConfiguration), b.(*v1alpha1.KubeProxyNFTablesConfiguration), scope)
}); err != nil {
return err
}
if err := s.AddGeneratedConversionFunc((*v1alpha1.KubeProxyWinkernelConfiguration)(nil), (*config.KubeProxyWinkernelConfiguration)(nil), func(a, b interface{}, scope conversion.Scope) error {
return Convert_v1alpha1_KubeProxyWinkernelConfiguration_To_config_KubeProxyWinkernelConfiguration(a.(*v1alpha1.KubeProxyWinkernelConfiguration), b.(*config.KubeProxyWinkernelConfiguration), scope)
}); err != nil {
@@ -144,6 +154,9 @@ func autoConvert_v1alpha1_KubeProxyConfiguration_To_config_KubeProxyConfiguratio
if err := Convert_v1alpha1_KubeProxyIPVSConfiguration_To_config_KubeProxyIPVSConfiguration(&in.IPVS, &out.IPVS, s); err != nil {
return err
}
if err := Convert_v1alpha1_KubeProxyNFTablesConfiguration_To_config_KubeProxyNFTablesConfiguration(&in.NFTables, &out.NFTables, s); err != nil {
return err
}
if err := Convert_v1alpha1_KubeProxyWinkernelConfiguration_To_config_KubeProxyWinkernelConfiguration(&in.Winkernel, &out.Winkernel, s); err != nil {
return err
}
@@ -190,6 +203,9 @@ func autoConvert_config_KubeProxyConfiguration_To_v1alpha1_KubeProxyConfiguratio
if err := Convert_config_KubeProxyWinkernelConfiguration_To_v1alpha1_KubeProxyWinkernelConfiguration(&in.Winkernel, &out.Winkernel, s); err != nil {
return err
}
if err := Convert_config_KubeProxyNFTablesConfiguration_To_v1alpha1_KubeProxyNFTablesConfiguration(&in.NFTables, &out.NFTables, s); err != nil {
return err
}
out.DetectLocalMode = v1alpha1.LocalMode(in.DetectLocalMode)
if err := Convert_config_DetectLocalConfiguration_To_v1alpha1_DetectLocalConfiguration(&in.DetectLocal, &out.DetectLocal, s); err != nil {
return err
@@ -304,6 +320,32 @@ func Convert_config_KubeProxyIPVSConfiguration_To_v1alpha1_KubeProxyIPVSConfigur
return autoConvert_config_KubeProxyIPVSConfiguration_To_v1alpha1_KubeProxyIPVSConfiguration(in, out, s)
}
func autoConvert_v1alpha1_KubeProxyNFTablesConfiguration_To_config_KubeProxyNFTablesConfiguration(in *v1alpha1.KubeProxyNFTablesConfiguration, out *config.KubeProxyNFTablesConfiguration, s conversion.Scope) error {
out.MasqueradeBit = (*int32)(unsafe.Pointer(in.MasqueradeBit))
out.MasqueradeAll = in.MasqueradeAll
out.SyncPeriod = in.SyncPeriod
out.MinSyncPeriod = in.MinSyncPeriod
return nil
}
// Convert_v1alpha1_KubeProxyNFTablesConfiguration_To_config_KubeProxyNFTablesConfiguration is an autogenerated conversion function.
func Convert_v1alpha1_KubeProxyNFTablesConfiguration_To_config_KubeProxyNFTablesConfiguration(in *v1alpha1.KubeProxyNFTablesConfiguration, out *config.KubeProxyNFTablesConfiguration, s conversion.Scope) error {
return autoConvert_v1alpha1_KubeProxyNFTablesConfiguration_To_config_KubeProxyNFTablesConfiguration(in, out, s)
}
func autoConvert_config_KubeProxyNFTablesConfiguration_To_v1alpha1_KubeProxyNFTablesConfiguration(in *config.KubeProxyNFTablesConfiguration, out *v1alpha1.KubeProxyNFTablesConfiguration, s conversion.Scope) error {
out.MasqueradeBit = (*int32)(unsafe.Pointer(in.MasqueradeBit))
out.MasqueradeAll = in.MasqueradeAll
out.SyncPeriod = in.SyncPeriod
out.MinSyncPeriod = in.MinSyncPeriod
return nil
}
// Convert_config_KubeProxyNFTablesConfiguration_To_v1alpha1_KubeProxyNFTablesConfiguration is an autogenerated conversion function.
func Convert_config_KubeProxyNFTablesConfiguration_To_v1alpha1_KubeProxyNFTablesConfiguration(in *config.KubeProxyNFTablesConfiguration, out *v1alpha1.KubeProxyNFTablesConfiguration, s conversion.Scope) error {
return autoConvert_config_KubeProxyNFTablesConfiguration_To_v1alpha1_KubeProxyNFTablesConfiguration(in, out, s)
}
func autoConvert_v1alpha1_KubeProxyWinkernelConfiguration_To_config_KubeProxyWinkernelConfiguration(in *v1alpha1.KubeProxyWinkernelConfiguration, out *config.KubeProxyWinkernelConfiguration, s conversion.Scope) error {
out.NetworkName = in.NetworkName
out.SourceVip = in.SourceVip


@@ -31,6 +31,7 @@ import (
logsapi "k8s.io/component-base/logs/api/v1"
"k8s.io/component-base/metrics"
apivalidation "k8s.io/kubernetes/pkg/apis/core/validation"
"k8s.io/kubernetes/pkg/features"
kubeproxyconfig "k8s.io/kubernetes/pkg/proxy/apis/config"
netutils "k8s.io/utils/net"
)
@@ -47,8 +48,11 @@ func Validate(config *kubeproxyconfig.KubeProxyConfiguration) field.ErrorList {
}
allErrs = append(allErrs, validateKubeProxyIPTablesConfiguration(config.IPTables, newPath.Child("KubeProxyIPTablesConfiguration"))...)
switch config.Mode {
case kubeproxyconfig.ProxyModeIPVS:
allErrs = append(allErrs, validateKubeProxyIPVSConfiguration(config.IPVS, newPath.Child("KubeProxyIPVSConfiguration"))...)
case kubeproxyconfig.ProxyModeNFTables:
allErrs = append(allErrs, validateKubeProxyNFTablesConfiguration(config.NFTables, newPath.Child("KubeProxyNFTablesConfiguration"))...)
}
allErrs = append(allErrs, validateKubeProxyConntrackConfiguration(config.Conntrack, newPath.Child("KubeProxyConntrackConfiguration"))...)
allErrs = append(allErrs, validateProxyMode(config.Mode, newPath.Child("Mode"))...)
@@ -152,6 +156,28 @@ func validateKubeProxyIPVSConfiguration(config kubeproxyconfig.KubeProxyIPVSConf
return allErrs
}
func validateKubeProxyNFTablesConfiguration(config kubeproxyconfig.KubeProxyNFTablesConfiguration, fldPath *field.Path) field.ErrorList {
allErrs := field.ErrorList{}
if config.MasqueradeBit != nil && (*config.MasqueradeBit < 0 || *config.MasqueradeBit > 31) {
allErrs = append(allErrs, field.Invalid(fldPath.Child("MasqueradeBit"), config.MasqueradeBit, "must be within the range [0, 31]"))
}
if config.SyncPeriod.Duration <= 0 {
allErrs = append(allErrs, field.Invalid(fldPath.Child("SyncPeriod"), config.SyncPeriod, "must be greater than 0"))
}
if config.MinSyncPeriod.Duration < 0 {
allErrs = append(allErrs, field.Invalid(fldPath.Child("MinSyncPeriod"), config.MinSyncPeriod, "must be greater than or equal to 0"))
}
if config.MinSyncPeriod.Duration > config.SyncPeriod.Duration {
allErrs = append(allErrs, field.Invalid(fldPath.Child("SyncPeriod"), config.MinSyncPeriod, fmt.Sprintf("must be greater than or equal to %s", fldPath.Child("MinSyncPeriod").String())))
}
return allErrs
}
func validateKubeProxyConntrackConfiguration(config kubeproxyconfig.KubeProxyConntrackConfiguration, fldPath *field.Path) field.ErrorList {
allErrs := field.ErrorList{}
@@ -198,6 +224,10 @@ func validateProxyModeLinux(mode kubeproxyconfig.ProxyMode, fldPath *field.Path)
string(kubeproxyconfig.ProxyModeIPVS),
)
if utilfeature.DefaultFeatureGate.Enabled(features.NFTablesProxyMode) {
validModes.Insert(string(kubeproxyconfig.ProxyModeNFTables))
}
if mode == "" || validModes.Has(string(mode)) {
return nil
}


@@ -80,6 +80,7 @@ func (in *KubeProxyConfiguration) DeepCopyInto(out *KubeProxyConfiguration) {
in.IPTables.DeepCopyInto(&out.IPTables)
in.IPVS.DeepCopyInto(&out.IPVS)
out.Winkernel = in.Winkernel
in.NFTables.DeepCopyInto(&out.NFTables)
out.DetectLocal = in.DetectLocal
if in.NodePortAddresses != nil {
in, out := &in.NodePortAddresses, &out.NodePortAddresses
@@ -206,6 +207,29 @@ func (in *KubeProxyIPVSConfiguration) DeepCopy() *KubeProxyIPVSConfiguration {
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *KubeProxyNFTablesConfiguration) DeepCopyInto(out *KubeProxyNFTablesConfiguration) {
*out = *in
if in.MasqueradeBit != nil {
in, out := &in.MasqueradeBit, &out.MasqueradeBit
*out = new(int32)
**out = **in
}
out.SyncPeriod = in.SyncPeriod
out.MinSyncPeriod = in.MinSyncPeriod
return
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new KubeProxyNFTablesConfiguration.
func (in *KubeProxyNFTablesConfiguration) DeepCopy() *KubeProxyNFTablesConfiguration {
if in == nil {
return nil
}
out := new(KubeProxyNFTablesConfiguration)
in.DeepCopyInto(out)
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *KubeProxyWinkernelConfiguration) DeepCopyInto(out *KubeProxyWinkernelConfiguration) {
*out = *in


@@ -0,0 +1,796 @@
/*
Copyright 2015 The Kubernetes Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package nftables
import (
"context"
"fmt"
"regexp"
"runtime"
"sort"
"strings"
"testing"
"github.com/danwinship/knftables"
"github.com/google/go-cmp/cmp"
"github.com/lithammer/dedent"
"k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/util/sets"
netutils "k8s.io/utils/net"
)
// getLine returns a string containing the file and line number of the caller, if
// possible. This is useful in tests with a large number of cases - when something goes
// wrong you can find which case more easily.
func getLine() string {
_, file, line, ok := runtime.Caller(1)
if !ok {
return ""
}
return fmt.Sprintf(" (from %s:%d)", file, line)
}
// objectOrder defines the order we sort different types into (higher = earlier); while
// not strictly necessary for comparison purposes, it's more intuitive in the Diff output
// to see rules/sets/maps before chains/elements.
var objectOrder = map[string]int{
"table": 10,
"chain": 9,
"rule": 8,
"set": 7,
"map": 6,
"element": 5,
// anything else: 0
}
// sortNFTablesTransaction sorts an nftables transaction into a standard order for comparison
func sortNFTablesTransaction(tx string) string {
lines := strings.Split(tx, "\n")
// strip blank lines and comments
for i := 0; i < len(lines); {
if lines[i] == "" || lines[i][0] == '#' {
lines = append(lines[:i], lines[i+1:]...)
} else {
i++
}
}
// sort remaining lines
sort.SliceStable(lines, func(i, j int) bool {
li := lines[i]
wi := strings.Split(li, " ")
lj := lines[j]
wj := strings.Split(lj, " ")
// All lines will start with "add OBJECTTYPE ip kube-proxy". Everything
// except "add table" will have an object name after the table name, and
// "add table" will have a comment after the table name. So every line
// should have at least 5 words.
if len(wi) < 5 || len(wj) < 5 {
return false
}
// Sort by object type first.
if wi[1] != wj[1] {
return objectOrder[wi[1]] >= objectOrder[wj[1]]
}
// Sort by object name when object type is identical.
if wi[4] != wj[4] {
return wi[4] < wj[4]
}
// Leave rules in the order they were originally added.
if wi[1] == "rule" {
return false
}
// Sort by the whole line when object type and name are identical (e.g.,
// individual "add rule" and "add element" lines in a chain/set/map).
return li < lj
})
return strings.Join(lines, "\n")
}
// diffNFTablesTransaction is a (testable) helper function for assertNFTablesTransactionEqual
func diffNFTablesTransaction(expected, result string) string {
expected = sortNFTablesTransaction(expected)
result = sortNFTablesTransaction(result)
return cmp.Diff(expected, result)
}
// assertNFTablesTransactionEqual asserts that expected and result are equal, ignoring
// irrelevant differences.
func assertNFTablesTransactionEqual(t *testing.T, line string, expected, result string) {
diff := diffNFTablesTransaction(expected, result)
if diff != "" {
t.Errorf("tables do not match%s:\ndiff:\n%s\nfull result: %+v", line, diff, result)
}
}
// diffNFTablesChain is a (testable) helper function for assertNFTablesChainEqual
func diffNFTablesChain(nft *knftables.Fake, chain, expected string) string {
expected = strings.TrimSpace(expected)
result := ""
if ch := nft.Table.Chains[chain]; ch != nil {
for i, rule := range ch.Rules {
if i > 0 {
result += "\n"
}
result += rule.Rule
}
}
return cmp.Diff(expected, result)
}
// assertNFTablesChainEqual asserts that the indicated chain in nft's table contains
// exactly the rules in expected (in that order).
func assertNFTablesChainEqual(t *testing.T, line string, nft *knftables.Fake, chain, expected string) {
if diff := diffNFTablesChain(nft, chain, expected); diff != "" {
t.Errorf("rules do not match%s:\ndiff:\n%s", line, diff)
}
}
// nftablesTracer holds data used while virtually tracing a packet through a set of
// nftables rules
type nftablesTracer struct {
nft *knftables.Fake
nodeIPs sets.Set[string]
t *testing.T
// matches accumulates the list of rules that were matched, for debugging purposes.
matches []string
// outputs accumulates the list of matched terminal rule targets (endpoint
// IP:ports, or a special target like "REJECT") and is eventually used to generate
// the return value of tracePacket.
outputs []string
// markMasq tracks whether the packet has been marked for masquerading
markMasq bool
}
// newNFTablesTracer creates an nftablesTracer. nodeIPs are the IPs to treat as local node
// IPs (for determining whether rules with "fib saddr type local" or "fib daddr type
// local" match).
func newNFTablesTracer(t *testing.T, nft *knftables.Fake, nodeIPs []string) *nftablesTracer {
return &nftablesTracer{
nft: nft,
nodeIPs: sets.New(nodeIPs...),
t: t,
}
}
func (tracer *nftablesTracer) addressMatches(ipStr, not, ruleAddress string) bool {
ip := netutils.ParseIPSloppy(ipStr)
if ip == nil {
tracer.t.Fatalf("Bad IP in test case: %s", ipStr)
}
var match bool
if strings.Contains(ruleAddress, "/") {
_, cidr, err := netutils.ParseCIDRSloppy(ruleAddress)
if err != nil {
tracer.t.Errorf("Bad CIDR in kube-proxy output: %v", err)
}
match = cidr.Contains(ip)
} else {
ip2 := netutils.ParseIPSloppy(ruleAddress)
if ip2 == nil {
tracer.t.Errorf("Bad IP/CIDR in kube-proxy output: %s", ruleAddress)
}
match = ip.Equal(ip2)
}
if not == "!= " {
return !match
} else {
return match
}
}
// matchDestIPOnly checks an "ip daddr" against a set/map, and returns the matching
// Element, if found.
func (tracer *nftablesTracer) matchDestIPOnly(elements []*knftables.Element, destIP string) *knftables.Element {
for _, element := range elements {
if element.Key[0] == destIP {
return element
}
}
return nil
}
// matchDest checks an "ip daddr . meta l4proto . th dport" against a set/map, and returns
// the matching Element, if found.
func (tracer *nftablesTracer) matchDest(elements []*knftables.Element, destIP, protocol, destPort string) *knftables.Element {
for _, element := range elements {
if element.Key[0] == destIP && element.Key[1] == protocol && element.Key[2] == destPort {
return element
}
}
return nil
}
// matchDestAndSource checks an "ip daddr . meta l4proto . th dport . ip saddr" against a
// set/map, where the source is allowed to be a CIDR, and returns the matching Element, if
// found.
func (tracer *nftablesTracer) matchDestAndSource(elements []*knftables.Element, destIP, protocol, destPort, sourceIP string) *knftables.Element {
for _, element := range elements {
if element.Key[0] == destIP && element.Key[1] == protocol && element.Key[2] == destPort && tracer.addressMatches(sourceIP, "", element.Key[3]) {
return element
}
}
return nil
}
// matchDestPort checks an "meta l4proto . th dport" against a set/map, and returns the
// matching Element, if found.
func (tracer *nftablesTracer) matchDestPort(elements []*knftables.Element, protocol, destPort string) *knftables.Element {
for _, element := range elements {
if element.Key[0] == protocol && element.Key[1] == destPort {
return element
}
}
return nil
}
// We intentionally don't try to parse arbitrary nftables rules, as the syntax is quite
// complicated and context sensitive. (E.g., "ip daddr" could be the start of an address
// comparison, or it could be the start of a set/map lookup.) Instead, we just have
// regexps to recognize the specific pieces of rules that we create in proxier.go.
// Anything matching ignoredRegexp gets stripped out of the rule, and then what's left
// *must* match one of the cases in runChain or an error will be logged. In cases where
// the regexp doesn't end with `$`, and the matched rule succeeds against the input data,
// runChain will continue trying to match the rest of the rule. E.g., "ip daddr 10.0.0.1
// drop" would first match destAddrRegexp, and then (assuming destIP was "10.0.0.1") would
// match verdictRegexp.
var destAddrRegexp = regexp.MustCompile(`^ip6* daddr (!= )?(\S+)`)
var destAddrLocalRegexp = regexp.MustCompile(`^fib daddr type local`)
var destPortRegexp = regexp.MustCompile(`^(tcp|udp|sctp) dport (\d+)`)
var destIPOnlyLookupRegexp = regexp.MustCompile(`^ip6* daddr @(\S+)`)
var destLookupRegexp = regexp.MustCompile(`^ip6* daddr \. meta l4proto \. th dport @(\S+)`)
var destSourceLookupRegexp = regexp.MustCompile(`^ip6* daddr \. meta l4proto \. th dport \. ip6* saddr @(\S+)`)
var destPortLookupRegexp = regexp.MustCompile(`^meta l4proto \. th dport @(\S+)`)
var destDispatchRegexp = regexp.MustCompile(`^ip6* daddr \. meta l4proto \. th dport vmap @(\S+)$`)
var destPortDispatchRegexp = regexp.MustCompile(`^meta l4proto \. th dport vmap @(\S+)$`)
var sourceAddrRegexp = regexp.MustCompile(`^ip6* saddr (!= )?(\S+)`)
var sourceAddrLocalRegexp = regexp.MustCompile(`^fib saddr type local`)
var endpointVMAPRegexp = regexp.MustCompile(`^numgen random mod \d+ vmap \{(.*)\}$`)
var endpointVMapEntryRegexp = regexp.MustCompile(`\d+ : goto (\S+)`)
var masqueradeRegexp = regexp.MustCompile(`^jump ` + kubeMarkMasqChain + `$`)
var jumpRegexp = regexp.MustCompile(`^(jump|goto) (\S+)$`)
var returnRegexp = regexp.MustCompile(`^return$`)
var verdictRegexp = regexp.MustCompile(`^(drop|reject)$`)
var dnatRegexp = regexp.MustCompile(`^meta l4proto (tcp|udp|sctp) dnat to (\S+)$`)
var ignoredRegexp = regexp.MustCompile(strings.Join(
[]string{
// Ignore comments (which can only appear at the end of a rule).
` *comment "[^"]*"$`,
// The trace tests only check new connections, so for our purposes, this
// check always succeeds (and thus can be ignored).
`^ct state new`,
// Likewise, this rule never matches and thus never drops anything, and so
// can be ignored.
`^ct state invalid drop$`,
},
"|",
))
// runChain runs the given packet through the rules in the given table and chain, updating
// tracer's internal state accordingly. It returns true if it hits a terminal action.
func (tracer *nftablesTracer) runChain(chname, sourceIP, protocol, destIP, destPort string) bool {
ch := tracer.nft.Table.Chains[chname]
if ch == nil {
tracer.t.Errorf("unknown chain %q", chname)
return true
}
for _, ruleObj := range ch.Rules {
rule := ignoredRegexp.ReplaceAllLiteralString(ruleObj.Rule, "")
for rule != "" {
rule = strings.TrimLeft(rule, " ")
// Note that the order of (some of) the cases is important. e.g.,
// masqueradeRegexp must be checked before jumpRegexp, since
// jumpRegexp would also match masqueradeRegexp but do the wrong
// thing with it.
switch {
case destIPOnlyLookupRegexp.MatchString(rule):
// `^ip6* daddr @(\S+)`
// Tests whether destIP is a member of the indicated set.
match := destIPOnlyLookupRegexp.FindStringSubmatch(rule)
rule = strings.TrimPrefix(rule, match[0])
set := match[1]
if tracer.matchDestIPOnly(tracer.nft.Table.Sets[set].Elements, destIP) == nil {
rule = ""
break
}
case destSourceLookupRegexp.MatchString(rule):
// `^ip6* daddr . meta l4proto . th dport . ip6* saddr @(\S+)`
// Tests whether "destIP . protocol . destPort . sourceIP" is
// a member of the indicated set.
match := destSourceLookupRegexp.FindStringSubmatch(rule)
rule = strings.TrimPrefix(rule, match[0])
set := match[1]
if tracer.matchDestAndSource(tracer.nft.Table.Sets[set].Elements, destIP, protocol, destPort, sourceIP) == nil {
rule = ""
break
}
case destLookupRegexp.MatchString(rule):
// `^ip6* daddr . meta l4proto . th dport @(\S+)`
// Tests whether "destIP . protocol . destPort" is a member
// of the indicated set.
match := destLookupRegexp.FindStringSubmatch(rule)
rule = strings.TrimPrefix(rule, match[0])
set := match[1]
if tracer.matchDest(tracer.nft.Table.Sets[set].Elements, destIP, protocol, destPort) == nil {
rule = ""
break
}
case destPortLookupRegexp.MatchString(rule):
// `^meta l4proto . th dport @(\S+)`
// Tests whether "protocol . destPort" is a member of the
// indicated set.
match := destPortLookupRegexp.FindStringSubmatch(rule)
rule = strings.TrimPrefix(rule, match[0])
set := match[1]
if tracer.matchDestPort(tracer.nft.Table.Sets[set].Elements, protocol, destPort) == nil {
rule = ""
break
}
case destDispatchRegexp.MatchString(rule):
// `^ip6* daddr \. meta l4proto \. th dport vmap @(\S+)$`
// Looks up "destIP . protocol . destPort" in the indicated
// verdict map, and if found, runs the associated verdict.
match := destDispatchRegexp.FindStringSubmatch(rule)
mapName := match[1]
element := tracer.matchDest(tracer.nft.Table.Maps[mapName].Elements, destIP, protocol, destPort)
if element == nil {
rule = ""
break
} else {
rule = element.Value[0]
}
case destPortDispatchRegexp.MatchString(rule):
// `^meta l4proto \. th dport vmap @(\S+)$`
// Looks up "protocol . destPort" in the indicated verdict map,
// and if found, runs the associated verdict.
match := destPortDispatchRegexp.FindStringSubmatch(rule)
mapName := match[1]
element := tracer.matchDestPort(tracer.nft.Table.Maps[mapName].Elements, protocol, destPort)
if element == nil {
rule = ""
break
} else {
rule = element.Value[0]
}
case destAddrRegexp.MatchString(rule):
// `^ip6* daddr (!= )?(\S+)`
// Tests whether destIP does/doesn't match a literal.
match := destAddrRegexp.FindStringSubmatch(rule)
rule = strings.TrimPrefix(rule, match[0])
not, ip := match[1], match[2]
if !tracer.addressMatches(destIP, not, ip) {
rule = ""
break
}
case destAddrLocalRegexp.MatchString(rule):
// `^fib daddr type local`
// Tests whether destIP is a local IP.
match := destAddrLocalRegexp.FindStringSubmatch(rule)
rule = strings.TrimPrefix(rule, match[0])
if !tracer.nodeIPs.Has(destIP) {
rule = ""
break
}
case destPortRegexp.MatchString(rule):
// `^(tcp|udp|sctp) dport (\d+)`
// Tests whether destPort matches a literal.
match := destPortRegexp.FindStringSubmatch(rule)
rule = strings.TrimPrefix(rule, match[0])
proto, port := match[1], match[2]
if protocol != proto || destPort != port {
rule = ""
break
}
case sourceAddrRegexp.MatchString(rule):
// `^ip6* saddr (!= )?(\S+)`
// Tests whether sourceIP does/doesn't match a literal.
match := sourceAddrRegexp.FindStringSubmatch(rule)
rule = strings.TrimPrefix(rule, match[0])
not, ip := match[1], match[2]
if !tracer.addressMatches(sourceIP, not, ip) {
rule = ""
break
}
case sourceAddrLocalRegexp.MatchString(rule):
// `^fib saddr type local`
// Tests whether sourceIP is a local IP.
match := sourceAddrLocalRegexp.FindStringSubmatch(rule)
rule = strings.TrimPrefix(rule, match[0])
if !tracer.nodeIPs.Has(sourceIP) {
rule = ""
break
}
case masqueradeRegexp.MatchString(rule):
// `^jump mark-for-masquerade$`
// Mark for masquerade: we just treat the jump rule itself as
// being what creates the mark, rather than trying to handle
// the rules inside that chain and the "masquerading" chain.
match := jumpRegexp.FindStringSubmatch(rule)
rule = strings.TrimPrefix(rule, match[0])
tracer.matches = append(tracer.matches, ruleObj.Rule)
tracer.markMasq = true
case jumpRegexp.MatchString(rule):
// `^(jump|goto) (\S+)$`
// Jumps to another chain.
match := jumpRegexp.FindStringSubmatch(rule)
rule = strings.TrimPrefix(rule, match[0])
action, destChain := match[1], match[2]
tracer.matches = append(tracer.matches, ruleObj.Rule)
terminated := tracer.runChain(destChain, sourceIP, protocol, destIP, destPort)
if terminated {
// destChain reached a terminal statement, so we
// terminate too.
return true
} else if action == "goto" {
// After a goto, return to our calling chain
// (without terminating) rather than continuing
// with this chain.
return false
}
case verdictRegexp.MatchString(rule):
// `^(drop|reject)$`
// Drop/reject the packet and terminate processing.
match := verdictRegexp.FindStringSubmatch(rule)
verdict := match[1]
tracer.matches = append(tracer.matches, ruleObj.Rule)
tracer.outputs = append(tracer.outputs, strings.ToUpper(verdict))
return true
case returnRegexp.MatchString(rule):
// `^return$`
// Returns to the calling chain.
tracer.matches = append(tracer.matches, ruleObj.Rule)
return false
case dnatRegexp.MatchString(rule):
// `meta l4proto (tcp|udp|sctp) dnat to (\S+)`
// DNAT to an endpoint IP and terminate processing.
match := dnatRegexp.FindStringSubmatch(rule)
destEndpoint := match[2]
tracer.matches = append(tracer.matches, ruleObj.Rule)
tracer.outputs = append(tracer.outputs, destEndpoint)
return true
case endpointVMAPRegexp.MatchString(rule):
// `^numgen random mod \d+ vmap \{(.*)\}$`
// Selects a random endpoint and jumps to it. For tracePacket's
// purposes, we jump to *all* of the endpoints.
match := endpointVMAPRegexp.FindStringSubmatch(rule)
elements := match[1]
for _, match = range endpointVMapEntryRegexp.FindAllStringSubmatch(elements, -1) {
// `\d+ : goto (\S+)`
destChain := match[1]
tracer.matches = append(tracer.matches, ruleObj.Rule)
// Ignore return value; we know each endpoint has a
// terminating dnat verdict, but we want to gather all
// of the endpoints into tracer.output.
_ = tracer.runChain(destChain, sourceIP, protocol, destIP, destPort)
}
return true
default:
tracer.t.Errorf("unmatched rule: %s", ruleObj.Rule)
rule = ""
}
}
}
return false
}
// tracePacket determines what would happen to a packet with the given sourceIP, destIP,
// and destPort, given the indicated nftables ruleset. nodeIPs are the local node IPs (for
// rules matching "local"). (The protocol value should be lowercase as in nftables
// rules, not uppercase as in corev1.)
//
// The return values are: an array of matched rules (for debugging), the final packet
// destinations (a comma-separated list of IPs, or one of the special targets "ACCEPT",
// "DROP", or "REJECT"), and whether the packet would be masqueraded.
func tracePacket(t *testing.T, nft *knftables.Fake, sourceIP, protocol, destIP, destPort string, nodeIPs []string) ([]string, string, bool) {
tracer := newNFTablesTracer(t, nft, nodeIPs)
// Collect "base chains" (ie, the chains that are run by netfilter directly rather
// than only being run when they are jumped to). Skip postrouting because it only
// does masquerading and we handle that separately.
var baseChains []string
for chname, ch := range nft.Table.Chains {
if ch.Priority != nil && chname != "nat-postrouting" {
baseChains = append(baseChains, chname)
}
}
// Sort by priority
sort.Slice(baseChains, func(i, j int) bool {
// FIXME: IPv4 vs IPv6 doesn't actually matter here
iprio, _ := knftables.ParsePriority(knftables.IPv4Family, string(*nft.Table.Chains[baseChains[i]].Priority))
jprio, _ := knftables.ParsePriority(knftables.IPv4Family, string(*nft.Table.Chains[baseChains[j]].Priority))
return iprio < jprio
})
for _, chname := range baseChains {
terminated := tracer.runChain(chname, sourceIP, protocol, destIP, destPort)
if terminated {
break
}
}
return tracer.matches, strings.Join(tracer.outputs, ", "), tracer.markMasq
}
type packetFlowTest struct {
name string
sourceIP string
protocol v1.Protocol
destIP string
destPort int
output string
masq bool
}
func runPacketFlowTests(t *testing.T, line string, nft *knftables.Fake, nodeIPs []string, testCases []packetFlowTest) {
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
protocol := strings.ToLower(string(tc.protocol))
if protocol == "" {
protocol = "tcp"
}
matches, output, masq := tracePacket(t, nft, tc.sourceIP, protocol, tc.destIP, fmt.Sprintf("%d", tc.destPort), nodeIPs)
var errors []string
if output != tc.output {
errors = append(errors, fmt.Sprintf("wrong output: expected %q got %q", tc.output, output))
}
if masq != tc.masq {
errors = append(errors, fmt.Sprintf("wrong masq: expected %v got %v", tc.masq, masq))
}
if errors != nil {
t.Errorf("Test %q of a packet from %s to %s:%d%s got result:\n%s\n\nBy matching:\n%s\n\n",
tc.name, tc.sourceIP, tc.destIP, tc.destPort, line, strings.Join(errors, "\n"), strings.Join(matches, "\n"))
}
})
}
}
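// Example (sketch with hypothetical values): given a Fake whose table maps
// 172.30.0.41:80/tcp to a single endpoint chain that DNATs to 10.180.0.1:80,
// a cluster-IP flow could be asserted like this (the "line" argument is just
// a string appended to failure messages):
//
//	runPacketFlowTests(t, "", nft, []string{"192.168.0.2"}, []packetFlowTest{{
//		name:     "pod to ClusterIP",
//		sourceIP: "10.180.0.2",
//		destIP:   "172.30.0.41",
//		destPort: 80,
//		output:   "10.180.0.1:80",
//		masq:     false,
//	}})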
// helpers_test unit tests
var testInput = dedent.Dedent(`
add table ip testing { comment "rules for kube-proxy" ; }
add chain ip testing forward
add rule ip testing forward ct state invalid drop
add chain ip testing mark-for-masquerade
add rule ip testing mark-for-masquerade mark set mark or 0x4000
add chain ip testing masquerading
add rule ip testing masquerading mark and 0x4000 == 0 return
add rule ip testing masquerading mark set mark xor 0x4000
add rule ip testing masquerading masquerade fully-random
add set ip testing firewall { type ipv4_addr . inet_proto . inet_service ; comment "destinations that are subject to LoadBalancerSourceRanges" ; }
add set ip testing firewall-allow { type ipv4_addr . inet_proto . inet_service . ipv4_addr ; flags interval ; comment "destinations+sources that are allowed by LoadBalancerSourceRanges" ; }
add chain ip testing firewall-check
add chain ip testing firewall-allow-check
add rule ip testing firewall-allow-check ip daddr . meta l4proto . th dport . ip saddr @firewall-allow return
add rule ip testing firewall-allow-check drop
add rule ip testing firewall-check ip daddr . meta l4proto . th dport @firewall jump firewall-allow-check
# svc1
add chain ip testing service-ULMVA6XW-ns1/svc1/tcp/p80
add rule ip testing service-ULMVA6XW-ns1/svc1/tcp/p80 ip daddr 172.30.0.41 tcp dport 80 ip saddr != 10.0.0.0/8 jump mark-for-masquerade
add rule ip testing service-ULMVA6XW-ns1/svc1/tcp/p80 numgen random mod 1 vmap { 0 : goto endpoint-5OJB2KTY-ns1/svc1/tcp/p80__10.180.0.1/80 }
add chain ip testing endpoint-5OJB2KTY-ns1/svc1/tcp/p80__10.180.0.1/80
add rule ip testing endpoint-5OJB2KTY-ns1/svc1/tcp/p80__10.180.0.1/80 ip saddr 10.180.0.1 jump mark-for-masquerade
add rule ip testing endpoint-5OJB2KTY-ns1/svc1/tcp/p80__10.180.0.1/80 meta l4proto tcp dnat to 10.180.0.1:80
add element ip testing service-ips { 172.30.0.41 . tcp . 80 : goto service-ULMVA6XW-ns1/svc1/tcp/p80 }
# svc2
add chain ip testing service-42NFTM6N-ns2/svc2/tcp/p80
add rule ip testing service-42NFTM6N-ns2/svc2/tcp/p80 ip daddr 172.30.0.42 tcp dport 80 ip saddr != 10.0.0.0/8 jump mark-for-masquerade
add rule ip testing service-42NFTM6N-ns2/svc2/tcp/p80 numgen random mod 1 vmap { 0 : goto endpoint-SGOXE6O3-ns2/svc2/tcp/p80__10.180.0.2/80 }
add chain ip testing external-42NFTM6N-ns2/svc2/tcp/p80
add rule ip testing external-42NFTM6N-ns2/svc2/tcp/p80 ip saddr 10.0.0.0/8 goto service-42NFTM6N-ns2/svc2/tcp/p80 comment "short-circuit pod traffic"
add rule ip testing external-42NFTM6N-ns2/svc2/tcp/p80 fib saddr type local jump mark-for-masquerade comment "masquerade local traffic"
add rule ip testing external-42NFTM6N-ns2/svc2/tcp/p80 fib saddr type local goto service-42NFTM6N-ns2/svc2/tcp/p80 comment "short-circuit local traffic"
add chain ip testing endpoint-SGOXE6O3-ns2/svc2/tcp/p80__10.180.0.2/80
add rule ip testing endpoint-SGOXE6O3-ns2/svc2/tcp/p80__10.180.0.2/80 ip saddr 10.180.0.2 jump mark-for-masquerade
add rule ip testing endpoint-SGOXE6O3-ns2/svc2/tcp/p80__10.180.0.2/80 meta l4proto tcp dnat to 10.180.0.2:80
add element ip testing service-ips { 172.30.0.42 . tcp . 80 : goto service-42NFTM6N-ns2/svc2/tcp/p80 }
add element ip testing service-ips { 192.168.99.22 . tcp . 80 : goto external-42NFTM6N-ns2/svc2/tcp/p80 }
add element ip testing service-ips { 1.2.3.4 . tcp . 80 : goto external-42NFTM6N-ns2/svc2/tcp/p80 }
add element ip testing service-nodeports { tcp . 3001 : goto external-42NFTM6N-ns2/svc2/tcp/p80 }
add element ip testing no-endpoint-nodeports { tcp . 3001 comment "ns2/svc2:p80" : drop }
add element ip testing no-endpoint-services { 1.2.3.4 . tcp . 80 comment "ns2/svc2:p80" : drop }
add element ip testing no-endpoint-services { 192.168.99.22 . tcp . 80 comment "ns2/svc2:p80" : drop }
`)
var testExpected = dedent.Dedent(`
add table ip testing { comment "rules for kube-proxy" ; }
add chain ip testing endpoint-5OJB2KTY-ns1/svc1/tcp/p80__10.180.0.1/80
add chain ip testing endpoint-SGOXE6O3-ns2/svc2/tcp/p80__10.180.0.2/80
add chain ip testing external-42NFTM6N-ns2/svc2/tcp/p80
add chain ip testing firewall-allow-check
add chain ip testing firewall-check
add chain ip testing forward
add chain ip testing mark-for-masquerade
add chain ip testing masquerading
add chain ip testing service-42NFTM6N-ns2/svc2/tcp/p80
add chain ip testing service-ULMVA6XW-ns1/svc1/tcp/p80
add rule ip testing endpoint-5OJB2KTY-ns1/svc1/tcp/p80__10.180.0.1/80 ip saddr 10.180.0.1 jump mark-for-masquerade
add rule ip testing endpoint-5OJB2KTY-ns1/svc1/tcp/p80__10.180.0.1/80 meta l4proto tcp dnat to 10.180.0.1:80
add rule ip testing endpoint-SGOXE6O3-ns2/svc2/tcp/p80__10.180.0.2/80 ip saddr 10.180.0.2 jump mark-for-masquerade
add rule ip testing endpoint-SGOXE6O3-ns2/svc2/tcp/p80__10.180.0.2/80 meta l4proto tcp dnat to 10.180.0.2:80
add rule ip testing external-42NFTM6N-ns2/svc2/tcp/p80 ip saddr 10.0.0.0/8 goto service-42NFTM6N-ns2/svc2/tcp/p80 comment "short-circuit pod traffic"
add rule ip testing external-42NFTM6N-ns2/svc2/tcp/p80 fib saddr type local jump mark-for-masquerade comment "masquerade local traffic"
add rule ip testing external-42NFTM6N-ns2/svc2/tcp/p80 fib saddr type local goto service-42NFTM6N-ns2/svc2/tcp/p80 comment "short-circuit local traffic"
add rule ip testing firewall-allow-check ip daddr . meta l4proto . th dport . ip saddr @firewall-allow return
add rule ip testing firewall-allow-check drop
add rule ip testing firewall-check ip daddr . meta l4proto . th dport @firewall jump firewall-allow-check
add rule ip testing forward ct state invalid drop
add rule ip testing mark-for-masquerade mark set mark or 0x4000
add rule ip testing masquerading mark and 0x4000 == 0 return
add rule ip testing masquerading mark set mark xor 0x4000
add rule ip testing masquerading masquerade fully-random
add rule ip testing service-42NFTM6N-ns2/svc2/tcp/p80 ip daddr 172.30.0.42 tcp dport 80 ip saddr != 10.0.0.0/8 jump mark-for-masquerade
add rule ip testing service-42NFTM6N-ns2/svc2/tcp/p80 numgen random mod 1 vmap { 0 : goto endpoint-SGOXE6O3-ns2/svc2/tcp/p80__10.180.0.2/80 }
add rule ip testing service-ULMVA6XW-ns1/svc1/tcp/p80 ip daddr 172.30.0.41 tcp dport 80 ip saddr != 10.0.0.0/8 jump mark-for-masquerade
add rule ip testing service-ULMVA6XW-ns1/svc1/tcp/p80 numgen random mod 1 vmap { 0 : goto endpoint-5OJB2KTY-ns1/svc1/tcp/p80__10.180.0.1/80 }
add set ip testing firewall { type ipv4_addr . inet_proto . inet_service ; comment "destinations that are subject to LoadBalancerSourceRanges" ; }
add set ip testing firewall-allow { type ipv4_addr . inet_proto . inet_service . ipv4_addr ; flags interval ; comment "destinations+sources that are allowed by LoadBalancerSourceRanges" ; }
add element ip testing no-endpoint-nodeports { tcp . 3001 comment "ns2/svc2:p80" : drop }
add element ip testing no-endpoint-services { 1.2.3.4 . tcp . 80 comment "ns2/svc2:p80" : drop }
add element ip testing no-endpoint-services { 192.168.99.22 . tcp . 80 comment "ns2/svc2:p80" : drop }
add element ip testing service-ips { 1.2.3.4 . tcp . 80 : goto external-42NFTM6N-ns2/svc2/tcp/p80 }
add element ip testing service-ips { 172.30.0.41 . tcp . 80 : goto service-ULMVA6XW-ns1/svc1/tcp/p80 }
add element ip testing service-ips { 172.30.0.42 . tcp . 80 : goto service-42NFTM6N-ns2/svc2/tcp/p80 }
add element ip testing service-ips { 192.168.99.22 . tcp . 80 : goto external-42NFTM6N-ns2/svc2/tcp/p80 }
add element ip testing service-nodeports { tcp . 3001 : goto external-42NFTM6N-ns2/svc2/tcp/p80 }
`)
func Test_sortNFTablesTransaction(t *testing.T) {
output := sortNFTablesTransaction(testInput)
expected := strings.TrimSpace(testExpected)
diff := cmp.Diff(expected, output)
if diff != "" {
t.Errorf("output does not match expected:\n%s", diff)
}
}
func Test_diffNFTablesTransaction(t *testing.T) {
diff := diffNFTablesTransaction(testInput, testExpected)
if diff != "" {
t.Errorf("found diff in inputs that should have been equal:\n%s", diff)
}
notExpected := strings.Join(strings.Split(testExpected, "\n")[2:], "\n")
diff = diffNFTablesTransaction(testInput, notExpected)
if diff == "" {
t.Errorf("found no diff in inputs that should have been different")
}
}
func Test_diffNFTablesChain(t *testing.T) {
fake := knftables.NewFake(knftables.IPv4Family, "testing")
tx := fake.NewTransaction()
tx.Add(&knftables.Table{})
tx.Add(&knftables.Chain{
Name: "mark-masq-chain",
})
tx.Add(&knftables.Chain{
Name: "masquerade-chain",
})
tx.Add(&knftables.Chain{
Name: "empty-chain",
})
tx.Add(&knftables.Rule{
Chain: "mark-masq-chain",
Rule: "mark set mark or 0x4000",
})
tx.Add(&knftables.Rule{
Chain: "masquerade-chain",
Rule: "mark and 0x4000 == 0 return",
})
tx.Add(&knftables.Rule{
Chain: "masquerade-chain",
Rule: "mark set mark xor 0x4000",
})
tx.Add(&knftables.Rule{
Chain: "masquerade-chain",
Rule: "masquerade fully-random",
})
err := fake.Run(context.Background(), tx)
if err != nil {
t.Fatalf("Unexpected error running transaction: %v", err)
}
diff := diffNFTablesChain(fake, "mark-masq-chain", "mark set mark or 0x4000")
if diff != "" {
t.Errorf("unexpected difference in mark-masq-chain:\n%s", diff)
}
diff = diffNFTablesChain(fake, "mark-masq-chain", "mark set mark or 0x4000\n")
if diff != "" {
t.Errorf("unexpected difference in mark-masq-chain with trailing newline:\n%s", diff)
}
diff = diffNFTablesChain(fake, "masquerade-chain", "mark and 0x4000 == 0 return\nmark set mark xor 0x4000\nmasquerade fully-random")
if diff != "" {
t.Errorf("unexpected difference in masquerade-chain:\n%s", diff)
}
diff = diffNFTablesChain(fake, "masquerade-chain", "mark set mark xor 0x4000\nmasquerade fully-random")
if diff == "" {
t.Errorf("unexpected lack of difference in wrong masquerade-chain")
}
diff = diffNFTablesChain(fake, "empty-chain", "")
if diff != "" {
t.Errorf("unexpected difference in empty-chain:\n%s", diff)
}
diff = diffNFTablesChain(fake, "empty-chain", "\n")
if diff != "" {
t.Errorf("unexpected difference in empty-chain with trailing newline:\n%s", diff)
}
}

File diff suppressed because it is too large

File diff suppressed because it is too large


@@ -33,6 +33,12 @@ type LocalTrafficDetector interface {
// IfNotLocal returns iptables arguments that will match traffic that is not from a pod
IfNotLocal() []string
// IfLocalNFT returns nftables arguments that will match traffic from a pod
IfLocalNFT() []string
// IfNotLocalNFT returns nftables arguments that will match traffic that is not from a pod
IfNotLocalNFT() []string
}
type noOpLocalDetector struct{}
@@ -54,21 +60,39 @@ func (n *noOpLocalDetector) IfNotLocal() []string {
return nil // no-op; matches all traffic
}
func (n *noOpLocalDetector) IfLocalNFT() []string {
return nil // no-op; matches all traffic
}
func (n *noOpLocalDetector) IfNotLocalNFT() []string {
return nil // no-op; matches all traffic
}
type detectLocalByCIDR struct {
ifLocal       []string
ifNotLocal    []string
ifLocalNFT    []string
ifNotLocalNFT []string
}
// NewDetectLocalByCIDR implements the LocalTrafficDetector interface using a CIDR. This can be used when a single CIDR
// range can be used to capture the notion of local traffic.
func NewDetectLocalByCIDR(cidr string) (LocalTrafficDetector, error) {
_, parsed, err := netutils.ParseCIDRSloppy(cidr)
if err != nil {
return nil, err
}
nftFamily := "ip"
if netutils.IsIPv6CIDR(parsed) {
nftFamily = "ip6"
}
return &detectLocalByCIDR{
ifLocal:       []string{"-s", cidr},
ifNotLocal:    []string{"!", "-s", cidr},
ifLocalNFT:    []string{nftFamily, "saddr", cidr},
ifNotLocalNFT: []string{nftFamily, "saddr", "!=", cidr},
}, nil
}
@@ -84,9 +108,19 @@ func (d *detectLocalByCIDR) IfNotLocal() []string {
return d.ifNotLocal
}
func (d *detectLocalByCIDR) IfLocalNFT() []string {
return d.ifLocalNFT
}
func (d *detectLocalByCIDR) IfNotLocalNFT() []string {
return d.ifNotLocalNFT
}
type detectLocalByBridgeInterface struct {
ifLocal       []string
ifNotLocal    []string
ifLocalNFT    []string
ifNotLocalNFT []string
}
// NewDetectLocalByBridgeInterface implements the LocalTrafficDetector interface using a bridge interface name.
@@ -96,8 +130,10 @@ func NewDetectLocalByBridgeInterface(interfaceName string) (LocalTrafficDetector
return nil, fmt.Errorf("no bridge interface name set")
}
return &detectLocalByBridgeInterface{
ifLocal:       []string{"-i", interfaceName},
ifNotLocal:    []string{"!", "-i", interfaceName},
ifLocalNFT:    []string{"iif", interfaceName},
ifNotLocalNFT: []string{"iif", "!=", interfaceName},
}, nil
}
@@ -113,9 +149,19 @@ func (d *detectLocalByBridgeInterface) IfNotLocal() []string {
return d.ifNotLocal
}
func (d *detectLocalByBridgeInterface) IfLocalNFT() []string {
return d.ifLocalNFT
}
func (d *detectLocalByBridgeInterface) IfNotLocalNFT() []string {
return d.ifNotLocalNFT
}
type detectLocalByInterfaceNamePrefix struct {
ifLocal       []string
ifNotLocal    []string
ifLocalNFT    []string
ifNotLocalNFT []string
}
// NewDetectLocalByInterfaceNamePrefix implements the LocalTrafficDetector interface using an interface name prefix.
@@ -126,8 +172,10 @@ func NewDetectLocalByInterfaceNamePrefix(interfacePrefix string) (LocalTrafficDe
return nil, fmt.Errorf("no interface prefix set")
}
return &detectLocalByInterfaceNamePrefix{
ifLocal:       []string{"-i", interfacePrefix + "+"},
ifNotLocal:    []string{"!", "-i", interfacePrefix + "+"},
ifLocalNFT:    []string{"iif", interfacePrefix + "*"},
ifNotLocalNFT: []string{"iif", "!=", interfacePrefix + "*"},
}, nil
}
@@ -142,3 +190,11 @@ func (d *detectLocalByInterfaceNamePrefix) IfLocal() []string {
func (d *detectLocalByInterfaceNamePrefix) IfNotLocal() []string {
return d.ifNotLocal
}
func (d *detectLocalByInterfaceNamePrefix) IfLocalNFT() []string {
return d.ifLocalNFT
}
func (d *detectLocalByInterfaceNamePrefix) IfNotLocalNFT() []string {
return d.ifNotLocalNFT
}


@@ -77,6 +77,25 @@ type KubeProxyIPVSConfiguration struct {
UDPTimeout metav1.Duration `json:"udpTimeout"`
}
// KubeProxyNFTablesConfiguration contains nftables-related configuration
// details for the Kubernetes proxy server.
type KubeProxyNFTablesConfiguration struct {
// masqueradeBit is the bit of the fwmark space to use for SNAT when using the
// nftables proxy mode. Values must be within the range [0, 31].
MasqueradeBit *int32 `json:"masqueradeBit"`
// masqueradeAll tells kube-proxy to SNAT all traffic sent to Service cluster IPs,
// when using the nftables mode. This may be required with some CNI plugins.
MasqueradeAll bool `json:"masqueradeAll"`
// syncPeriod is an interval (e.g. '5s', '1m', '2h22m') indicating how frequently
// various re-synchronizing and cleanup operations are performed. Must be greater
// than 0.
SyncPeriod metav1.Duration `json:"syncPeriod"`
// minSyncPeriod is the minimum period between nftables rule resyncs (e.g. '5s',
// '1m', '2h22m'). A value of 0 means every Service or EndpointSlice change will
// result in an immediate resync.
MinSyncPeriod metav1.Duration `json:"minSyncPeriod"`
}
// KubeProxyConntrackConfiguration contains conntrack settings for
// the Kubernetes proxy server.
type KubeProxyConntrackConfiguration struct {
@@ -189,6 +208,8 @@ type KubeProxyConfiguration struct {
IPTables KubeProxyIPTablesConfiguration `json:"iptables"`
// ipvs contains ipvs-related configuration options.
IPVS KubeProxyIPVSConfiguration `json:"ipvs"`
// nftables contains nftables-related configuration options.
NFTables KubeProxyNFTablesConfiguration `json:"nftables"`
// winkernel contains winkernel-related configuration options.
Winkernel KubeProxyWinkernelConfiguration `json:"winkernel"`
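For illustration, a minimal sketch of how the new nftables section might look in a kube-proxy configuration file (assuming the v1alpha1 `KubeProxyConfiguration` kind and the proxy `mode` set to "nftables"; the values shown are examples, not defaults):

```yaml
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "nftables"
nftables:
  masqueradeBit: 14
  masqueradeAll: false
  syncPeriod: 30s
  minSyncPeriod: 1s
```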


@@ -57,6 +57,7 @@ func (in *KubeProxyConfiguration) DeepCopyInto(out *KubeProxyConfiguration) {
in.Logging.DeepCopyInto(&out.Logging)
in.IPTables.DeepCopyInto(&out.IPTables)
in.IPVS.DeepCopyInto(&out.IPVS)
in.NFTables.DeepCopyInto(&out.NFTables)
out.Winkernel = in.Winkernel
out.DetectLocal = in.DetectLocal
if in.NodePortAddresses != nil {
@@ -184,6 +185,29 @@ func (in *KubeProxyIPVSConfiguration) DeepCopy() *KubeProxyIPVSConfiguration {
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *KubeProxyNFTablesConfiguration) DeepCopyInto(out *KubeProxyNFTablesConfiguration) {
*out = *in
if in.MasqueradeBit != nil {
in, out := &in.MasqueradeBit, &out.MasqueradeBit
*out = new(int32)
**out = **in
}
out.SyncPeriod = in.SyncPeriod
out.MinSyncPeriod = in.MinSyncPeriod
return
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new KubeProxyNFTablesConfiguration.
func (in *KubeProxyNFTablesConfiguration) DeepCopy() *KubeProxyNFTablesConfiguration {
if in == nil {
return nil
}
out := new(KubeProxyNFTablesConfiguration)
in.DeepCopyInto(out)
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *KubeProxyWinkernelConfiguration) DeepCopyInto(out *KubeProxyWinkernelConfiguration) {
*out = *in

201
vendor/github.com/danwinship/knftables/LICENSE generated vendored Normal file

@@ -0,0 +1,201 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

177
vendor/github.com/danwinship/knftables/README.md generated vendored Normal file

@@ -0,0 +1,177 @@
# knftables: a golang nftables library
This is a library for using nftables from Go.
It is not intended to support arbitrary use cases, but instead
specifically focuses on supporting Kubernetes components which are
using nftables in the way that nftables is supposed to be used (as
opposed to using nftables in a naively-translated-from-iptables way,
or using nftables to do totally valid things that aren't the sorts of
things Kubernetes components are likely to need to do).
It is still under development and is not API stable.
## Usage
Create an `Interface` object to manage operations on a single nftables
table:
```golang
nft, err := knftables.New(knftables.IPv4Family, "my-table")
if err != nil {
return fmt.Errorf("no nftables support: %v", err)
}
```
You can use the `List`, `ListRules`, and `ListElements` methods on the
`Interface` to check if objects exist. `List` returns the names of
`"chains"`, `"sets"`, or `"maps"` in the table, while `ListElements`
returns `Element` objects and `ListRules` returns *partial* `Rule`
objects.
```golang
chains, err := nft.List(ctx, "chains")
if err != nil {
return fmt.Errorf("could not list chains: %v", err)
}
FIXME
elements, err := nft.ListElements(ctx, "map", "mymap")
if err != nil {
return fmt.Errorf("could not list map elements: %v", err)
}
FIXME
```
To make changes, create a `Transaction`, add the appropriate
operations to the transaction, and then call `nft.Run` on it:
```golang
tx := nft.NewTransaction()
tx.Add(&knftables.Chain{
Name: "mychain",
Comment: knftables.PtrTo("this is my chain"),
})
tx.Flush(&knftables.Chain{
Name: "mychain",
})
var destIP net.IP
var destPort uint16
...
tx.Add(&knftables.Rule{
Chain: "mychain",
Rule: knftables.Concat(
"ip daddr", destIP,
"ip protocol", "tcp",
"th dport", destPort,
"jump", destChain,
),
})
err := nft.Run(ctx, tx)
```
If any operation in the transaction would fail, then `Run()` will
return an error and the entire transaction will be ignored. You can
use the `knftables.IsNotFound()` and `knftables.IsAlreadyExists()`
methods to check for those well-known error types. In a large
transaction, there is no supported way to determine exactly which
operation failed.
## `knftables.Transaction` operations
`knftables.Transaction` operations correspond to the top-level commands
in the `nft` binary. Currently-supported operations are:
- `tx.Add()`: adds an object, which may already exist, as with `nft add`
- `tx.Create()`: creates an object, which must not already exist, as with `nft create`
- `tx.Flush()`: flushes the contents of a table/chain/set/map, as with `nft flush`
- `tx.Delete()`: deletes an object, as with `nft delete`
- `tx.Insert()`: inserts a rule before another rule, as with `nft insert rule`
- `tx.Replace()`: replaces a rule, as with `nft replace rule`
## Objects
The `Transaction` methods take arguments of type `knftables.Object`.
The currently-supported objects are:
- `Table`
- `Chain`
- `Rule`
- `Set`
- `Map`
- `Element`
Optional fields in objects can be filled in with the help of the
`PtrTo()` function, which just returns a pointer to its argument.
`Concat()` can be used to concatenate a series of strings, `[]string`
arrays, and other arguments (including numbers, `net.IP`s /
`net.IPNet`s, and anything else that can be formatted usefully via
`fmt.Sprintf("%s")`) together into a single string. This is often
useful when constructing `Rule`s.
## `knftables.Fake`
There is a fake (in-memory) implementation of `knftables.Interface`
for use in unit tests. Use `knftables.NewFake()` instead of
`knftables.New()` to create it, and then it should work mostly the
same. See `fake.go` for more details of the public APIs for examining
the current state of the fake nftables database.
Note that at the present time, `fake.Run()` is not actually
transactional, so unit tests that rely on things not being changed if
a transaction fails partway through will not work as expected.
## Missing APIs
Various top-level object types are not yet supported (notably the
"stateful objects" like `counter`).
Most iptables libraries have an API for "add this rule only if it
doesn't already exist", but that does not seem as useful in nftables
(or at least "in nftables as used by Kubernetes-ish components that
aren't just blindly copying over old iptables APIs"), because chains
tend to have static rules and dynamic sets/maps, rather than having
dynamic rules. If you aren't sure if a chain has the correct rules,
you can just `Flush` it and recreate all of the rules.
I've considered changing the semantics of `tx.Add(obj)` so that
`obj.Handle` is filled in with the new object's handle on return from
`Run()`, for ease of deleting later. (This would be implemented by
using the `--handle` (`-a`) and `--echo` (`-e`) flags to `nft add`.)
However, this would require potentially difficult parsing of the `nft`
output. `ListRules` fills in the handles of the rules it returns, so
it's possible to find out a rule's handle after the fact that way. For
other supported object types, either handles don't exist (`Element`)
or you don't really need to know their handles because it's possible
to delete by name instead (`Table`, `Chain`, `Set`, `Map`).
The "destroy" (delete-without-ENOENT) command that exists in newer
versions of `nft` is not currently supported because it would be
unexpectedly heavyweight to emulate on systems that don't have it, so
it is better (for now) to force callers to implement it by hand.
`ListRules` returns `Rule` objects without the `Rule` field filled in,
because it uses the JSON API to list the rules, but there is no easy
way to convert the JSON rule representation back into plaintext form.
This means that it is only useful when either (a) you know the order
of the rules in the chain, but want to know their handles, or (b) you
can recognize the rules you are looking for by their comments, rather
than the rule bodies.
# Design Notes
The library works by invoking the `nft` binary. "Write" operations are
implemented with the ordinary plain-text API, while "read" operations
are implemented with the JSON API, for parseability.
The fact that the API uses functions and objects (e.g.
`tx.Add(&knftables.Chain{...})`) rather than just specifying everything
as textual input to `nft` (e.g. `tx.Exec("add chain ...")`) is mostly
just because it's _much_ easier to have a fake implementation for unit
tests this way.

93
vendor/github.com/danwinship/knftables/error.go generated vendored Normal file

@@ -0,0 +1,93 @@
/*
Copyright 2023 Red Hat, Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package knftables
import (
"errors"
"fmt"
"os/exec"
"strings"
"syscall"
)
type nftablesError struct {
wrapped error
msg string
errno syscall.Errno
}
// wrapError wraps an error resulting from running nft
func wrapError(err error) error {
nerr := &nftablesError{wrapped: err, msg: err.Error()}
if ee, ok := err.(*exec.ExitError); ok {
if len(ee.Stderr) > 0 {
nerr.msg = string(ee.Stderr)
eol := strings.Index(nerr.msg, "\n")
// The nft binary does not call setlocale() and so will return
// English error strings regardless of the locale.
enoent := strings.Index(nerr.msg, "No such file or directory")
eexist := strings.Index(nerr.msg, "File exists")
if enoent != -1 && (enoent < eol || eol == -1) {
nerr.errno = syscall.ENOENT
} else if eexist != -1 && (eexist < eol || eol == -1) {
nerr.errno = syscall.EEXIST
}
}
}
return nerr
}
// notFoundError returns an nftablesError with the given message for which IsNotFound will
// return true.
func notFoundError(format string, args ...interface{}) error {
return &nftablesError{msg: fmt.Sprintf(format, args...), errno: syscall.ENOENT}
}
// existsError returns an nftablesError with the given message for which IsAlreadyExists
// will return true.
func existsError(format string, args ...interface{}) error {
return &nftablesError{msg: fmt.Sprintf(format, args...), errno: syscall.EEXIST}
}
func (nerr *nftablesError) Error() string {
return nerr.msg
}
func (nerr *nftablesError) Unwrap() error {
return nerr.wrapped
}
// IsNotFound tests if err corresponds to an nftables "not found" error of any sort.
// (e.g., in response to a "delete rule" command, this might indicate that the rule
// doesn't exist, or the chain doesn't exist, or the table doesn't exist.)
func IsNotFound(err error) bool {
var nerr *nftablesError
if errors.As(err, &nerr) {
return nerr.errno == syscall.ENOENT
}
return false
}
// IsAlreadyExists tests if err corresponds to an nftables "already exists" error (e.g.
// when doing a "create" rather than an "add").
func IsAlreadyExists(err error) bool {
var nerr *nftablesError
if errors.As(err, &nerr) {
return nerr.errno == syscall.EEXIST
}
return false
}

48
vendor/github.com/danwinship/knftables/exec.go generated vendored Normal file

@@ -0,0 +1,48 @@
/*
Copyright 2023 Red Hat, Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package knftables
import (
"os/exec"
)
// execer is a mockable wrapper around os/exec.
type execer interface {
// LookPath wraps exec.LookPath
LookPath(file string) (string, error)
// Run runs cmd as with cmd.Output(). If an error occurs, and the process outputs
// stderr, then that output will be returned in the error.
Run(cmd *exec.Cmd) (string, error)
}
// realExec implements execer by actually using os/exec
type realExec struct{}
// LookPath is part of execer
func (_ realExec) LookPath(file string) (string, error) {
return exec.LookPath(file)
}
// Run is part of execer
func (_ realExec) Run(cmd *exec.Cmd) (string, error) {
out, err := cmd.Output()
if err != nil {
err = wrapError(err)
}
return string(out), err
}

495
vendor/github.com/danwinship/knftables/fake.go generated vendored Normal file

@@ -0,0 +1,495 @@
/*
Copyright 2023 Red Hat, Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package knftables
import (
"context"
"fmt"
"reflect"
"sort"
"strings"
)
// Fake is a fake implementation of Interface
type Fake struct {
nftContext
nextHandle int
// Table contains the Interface's table. This will be `nil` until you `tx.Add()`
// the table.
Table *FakeTable
}
// FakeTable wraps Table for the Fake implementation
type FakeTable struct {
Table
// Chains contains the table's chains, keyed by name
Chains map[string]*FakeChain
// Sets contains the table's sets, keyed by name
Sets map[string]*FakeSet
// Maps contains the table's maps, keyed by name
Maps map[string]*FakeMap
}
// FakeChain wraps Chain for the Fake implementation
type FakeChain struct {
Chain
// Rules contains the chain's rules, in order
Rules []*Rule
}
// FakeSet wraps Set for the Fake implementation
type FakeSet struct {
Set
// Elements contains the set's elements. You can also use the FakeSet's
// FindElement() method to see if a particular element is present.
Elements []*Element
}
// FakeMap wraps Map for the Fake implementation
type FakeMap struct {
Map
// Elements contains the map's elements. You can also use the FakeMap's
// FindElement() method to see if a particular element is present.
Elements []*Element
}
// NewFake creates a new fake Interface, for unit tests
func NewFake(family Family, table string) *Fake {
return &Fake{
nftContext: nftContext{
family: family,
table: table,
},
}
}
var _ Interface = &Fake{}
// List is part of Interface.
func (fake *Fake) List(ctx context.Context, objectType string) ([]string, error) {
if fake.Table == nil {
return nil, notFoundError("no such table %q", fake.table)
}
var result []string
switch objectType {
case "chain", "chains":
for name := range fake.Table.Chains {
result = append(result, name)
}
case "set", "sets":
for name := range fake.Table.Sets {
result = append(result, name)
}
case "map", "maps":
for name := range fake.Table.Maps {
result = append(result, name)
}
default:
return nil, fmt.Errorf("unsupported object type %q", objectType)
}
return result, nil
}
// ListRules is part of Interface
func (fake *Fake) ListRules(ctx context.Context, chain string) ([]*Rule, error) {
if fake.Table == nil {
return nil, notFoundError("no such chain %q", chain)
}
ch := fake.Table.Chains[chain]
if ch == nil {
return nil, notFoundError("no such chain %q", chain)
}
return ch.Rules, nil
}
// ListElements is part of Interface
func (fake *Fake) ListElements(ctx context.Context, objectType, name string) ([]*Element, error) {
if fake.Table == nil {
return nil, notFoundError("no such %s %q", objectType, name)
}
if objectType == "set" {
s := fake.Table.Sets[name]
if s != nil {
return s.Elements, nil
}
} else if objectType == "map" {
m := fake.Table.Maps[name]
if m != nil {
return m.Elements, nil
}
}
return nil, notFoundError("no such %s %q", objectType, name)
}
// NewTransaction is part of Interface
func (fake *Fake) NewTransaction() *Transaction {
return &Transaction{nftContext: &fake.nftContext}
}
// Run is part of Interface
func (fake *Fake) Run(ctx context.Context, tx *Transaction) error {
if tx.err != nil {
return tx.err
}
// FIXME: not actually transactional!
for _, op := range tx.operations {
// If the table hasn't been created, and this isn't a Table operation, then fail
if fake.Table == nil {
if _, ok := op.obj.(*Table); !ok {
return notFoundError("no such table \"%s %s\"", fake.family, fake.table)
}
}
if op.verb == addVerb || op.verb == createVerb || op.verb == insertVerb {
fake.nextHandle++
}
switch obj := op.obj.(type) {
case *Table:
err := checkExists(op.verb, "table", fake.table, fake.Table != nil)
if err != nil {
return err
}
switch op.verb {
case flushVerb:
fake.Table = nil
fallthrough
case addVerb, createVerb:
if fake.Table != nil {
continue
}
table := *obj
table.Handle = PtrTo(fake.nextHandle)
fake.Table = &FakeTable{
Table: table,
Chains: make(map[string]*FakeChain),
Sets: make(map[string]*FakeSet),
Maps: make(map[string]*FakeMap),
}
case deleteVerb:
fake.Table = nil
default:
return fmt.Errorf("unhandled operation %q", op.verb)
}
case *Chain:
existingChain := fake.Table.Chains[obj.Name]
err := checkExists(op.verb, "chain", obj.Name, existingChain != nil)
if err != nil {
return err
}
switch op.verb {
case addVerb, createVerb:
if existingChain != nil {
continue
}
chain := *obj
chain.Handle = PtrTo(fake.nextHandle)
fake.Table.Chains[obj.Name] = &FakeChain{
Chain: chain,
}
case flushVerb:
existingChain.Rules = nil
case deleteVerb:
// FIXME delete-by-handle
delete(fake.Table.Chains, obj.Name)
default:
return fmt.Errorf("unhandled operation %q", op.verb)
}
case *Rule:
existingChain := fake.Table.Chains[obj.Chain]
if existingChain == nil {
return notFoundError("no such chain %q", obj.Chain)
}
if op.verb == deleteVerb {
i := findRule(existingChain.Rules, *obj.Handle)
if i == -1 {
return notFoundError("no rule with handle %d", *obj.Handle)
}
existingChain.Rules = append(existingChain.Rules[:i], existingChain.Rules[i+1:]...)
continue
}
rule := *obj
refRule := -1
if rule.Handle != nil {
refRule = findRule(existingChain.Rules, *obj.Handle)
if refRule == -1 {
return notFoundError("no rule with handle %d", *obj.Handle)
}
} else if obj.Index != nil {
if *obj.Index >= len(existingChain.Rules) {
return notFoundError("no rule with index %d", *obj.Index)
}
refRule = *obj.Index
}
switch op.verb {
case addVerb:
if refRule == -1 {
existingChain.Rules = append(existingChain.Rules, &rule)
} else {
existingChain.Rules = append(existingChain.Rules[:refRule+1], append([]*Rule{&rule}, existingChain.Rules[refRule+1:]...)...)
}
rule.Handle = PtrTo(fake.nextHandle)
case insertVerb:
if refRule == -1 {
existingChain.Rules = append([]*Rule{&rule}, existingChain.Rules...)
} else {
existingChain.Rules = append(existingChain.Rules[:refRule], append([]*Rule{&rule}, existingChain.Rules[refRule:]...)...)
}
rule.Handle = PtrTo(fake.nextHandle)
case replaceVerb:
existingChain.Rules[refRule] = &rule
default:
return fmt.Errorf("unhandled operation %q", op.verb)
}
case *Set:
existingSet := fake.Table.Sets[obj.Name]
err := checkExists(op.verb, "set", obj.Name, existingSet != nil)
if err != nil {
return err
}
switch op.verb {
case addVerb, createVerb:
if existingSet != nil {
continue
}
set := *obj
set.Handle = PtrTo(fake.nextHandle)
fake.Table.Sets[obj.Name] = &FakeSet{
Set: set,
}
case flushVerb:
existingSet.Elements = nil
case deleteVerb:
// FIXME delete-by-handle
delete(fake.Table.Sets, obj.Name)
default:
return fmt.Errorf("unhandled operation %q", op.verb)
}
case *Map:
existingMap := fake.Table.Maps[obj.Name]
err := checkExists(op.verb, "map", obj.Name, existingMap != nil)
if err != nil {
return err
}
switch op.verb {
case addVerb:
if existingMap != nil {
continue
}
mapObj := *obj
mapObj.Handle = PtrTo(fake.nextHandle)
fake.Table.Maps[obj.Name] = &FakeMap{
Map: mapObj,
}
case flushVerb:
existingMap.Elements = nil
case deleteVerb:
// FIXME delete-by-handle
delete(fake.Table.Maps, obj.Name)
default:
return fmt.Errorf("unhandled operation %q", op.verb)
}
case *Element:
if len(obj.Value) == 0 {
existingSet := fake.Table.Sets[obj.Set]
if existingSet == nil {
return notFoundError("no such set %q", obj.Set)
}
switch op.verb {
case addVerb, createVerb:
element := *obj
if i := findElement(existingSet.Elements, element.Key); i != -1 {
if op.verb == createVerb {
return existsError("element %q already exists", strings.Join(element.Key, " . "))
}
existingSet.Elements[i] = &element
} else {
existingSet.Elements = append(existingSet.Elements, &element)
}
case deleteVerb:
element := *obj
if i := findElement(existingSet.Elements, element.Key); i != -1 {
existingSet.Elements = append(existingSet.Elements[:i], existingSet.Elements[i+1:]...)
} else {
return notFoundError("no such element %q", strings.Join(element.Key, " . "))
}
default:
return fmt.Errorf("unhandled operation %q", op.verb)
}
} else {
existingMap := fake.Table.Maps[obj.Map]
if existingMap == nil {
return notFoundError("no such map %q", obj.Map)
}
switch op.verb {
case addVerb, createVerb:
element := *obj
if i := findElement(existingMap.Elements, element.Key); i != -1 {
if op.verb == createVerb {
return existsError("element %q already exists", strings.Join(element.Key, " . "))
}
existingMap.Elements[i] = &element
} else {
existingMap.Elements = append(existingMap.Elements, &element)
}
case deleteVerb:
element := *obj
if i := findElement(existingMap.Elements, element.Key); i != -1 {
existingMap.Elements = append(existingMap.Elements[:i], existingMap.Elements[i+1:]...)
} else {
return notFoundError("no such element %q", strings.Join(element.Key, " . "))
}
default:
return fmt.Errorf("unhandled operation %q", op.verb)
}
}
default:
return fmt.Errorf("unhandled object type %T", op.obj)
}
}
return nil
}
func checkExists(verb verb, objectType, name string, exists bool) error {
switch verb {
case addVerb:
// It's fine if the object either exists or doesn't
return nil
case createVerb:
if exists {
return existsError("%s %q already exists", objectType, name)
}
default:
if !exists {
return notFoundError("no such %s %q", objectType, name)
}
}
return nil
}
// Dump dumps the current contents of fake, in a way that looks like an nft transaction,
// but is not actually guaranteed to be usable as one. (e.g., chains may be referenced
// before they are created, etc.)
func (fake *Fake) Dump() string {
if fake.Table == nil {
return ""
}
buf := &strings.Builder{}
table := fake.Table
table.writeOperation(addVerb, &fake.nftContext, buf)
for _, cname := range sortKeys(table.Chains) {
ch := table.Chains[cname]
ch.writeOperation(addVerb, &fake.nftContext, buf)
for _, rule := range ch.Rules {
// Avoid outputting handles
dumpRule := *rule
dumpRule.Handle = nil
dumpRule.Index = nil
dumpRule.writeOperation(addVerb, &fake.nftContext, buf)
}
}
for _, sname := range sortKeys(table.Sets) {
s := table.Sets[sname]
s.writeOperation(addVerb, &fake.nftContext, buf)
for _, element := range s.Elements {
element.writeOperation(addVerb, &fake.nftContext, buf)
}
}
for _, mname := range sortKeys(table.Maps) {
m := table.Maps[mname]
m.writeOperation(addVerb, &fake.nftContext, buf)
for _, element := range m.Elements {
element.writeOperation(addVerb, &fake.nftContext, buf)
}
}
return buf.String()
}
func sortKeys[K ~string, V any](m map[K]V) []K {
keys := make([]K, 0, len(m))
for key := range m {
keys = append(keys, key)
}
sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })
return keys
}
func findRule(rules []*Rule, handle int) int {
for i := range rules {
if rules[i].Handle != nil && *rules[i].Handle == handle {
return i
}
}
return -1
}
func findElement(elements []*Element, key []string) int {
for i := range elements {
if reflect.DeepEqual(elements[i].Key, key) {
return i
}
}
return -1
}
// FindElement finds an element of the set with the given key. If there is no matching
// element, it returns nil.
func (s *FakeSet) FindElement(key ...string) *Element {
index := findElement(s.Elements, key)
if index == -1 {
return nil
}
return s.Elements[index]
}
// FindElement finds an element of the map with the given key. If there is no matching
// element, it returns nil.
func (m *FakeMap) FindElement(key ...string) *Element {
index := findElement(m.Elements, key)
if index == -1 {
return nil
}
return m.Elements[index]
}

436
vendor/github.com/danwinship/knftables/nftables.go generated vendored Normal file

@@ -0,0 +1,436 @@
/*
Copyright 2023 Red Hat, Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package knftables
import (
"context"
"encoding/json"
"fmt"
"os/exec"
)
// Interface is an interface for running nftables commands against a given family and table.
type Interface interface {
// NewTransaction returns a new (empty) Transaction
NewTransaction() *Transaction
// Run runs a Transaction and returns the result. The IsNotFound and
// IsAlreadyExists methods can be used to test the result.
Run(ctx context.Context, tx *Transaction) error
// List returns a list of the names of the objects of objectType ("chain", "set",
// or "map") in the table. If there are no such objects, this will return an empty
// list and no error.
List(ctx context.Context, objectType string) ([]string, error)
// ListRules returns a list of the rules in a chain, in order. Note that at the
// present time, the Rule objects will have their `Comment` and `Handle` fields
// filled in, but *not* the actual `Rule` field. So this can only be used to find
// the handles of rules if they have unique comments to recognize them by, or if
// you know the order of the rules within the chain. If the chain exists but
// contains no rules, this will return an empty list and no error.
ListRules(ctx context.Context, chain string) ([]*Rule, error)
// ListElements returns a list of the elements in a set or map. (objectType should
// be "set" or "map".) If the set/map exists but contains no elements, this will
// return an empty list and no error.
ListElements(ctx context.Context, objectType, name string) ([]*Element, error)
}
type nftContext struct {
family Family
table string
// noObjectComments is true if comments on Table/Chain/Set/Map are not supported.
// (Comments on Rule and Element are always supported.)
noObjectComments bool
}
// realNFTables is an implementation of Interface
type realNFTables struct {
nftContext
exec execer
path string
}
// for unit tests
func newInternal(family Family, table string, execer execer) (Interface, error) {
var err error
nft := &realNFTables{
nftContext: nftContext{
family: family,
table: table,
},
exec: execer,
}
nft.path, err = nft.exec.LookPath("nft")
if err != nil {
return nil, fmt.Errorf("could not find nftables binary: %w", err)
}
cmd := exec.Command(nft.path, "--check", "add", "table", string(nft.family), nft.table)
_, err = nft.exec.Run(cmd)
if err != nil {
return nil, fmt.Errorf("could not run nftables command: %w", err)
}
cmd = exec.Command(nft.path, "--check", "add", "table", string(nft.family), nft.table,
"{", "comment", `"test"`, "}",
)
_, err = nft.exec.Run(cmd)
if err != nil {
nft.noObjectComments = true
}
return nft, nil
}
// New creates a new nftables.Interface for interacting with the given table. If nftables
// is not available/usable on the current host, it will return an error.
func New(family Family, table string) (Interface, error) {
return newInternal(family, table, realExec{})
}
// NewTransaction is part of Interface
func (nft *realNFTables) NewTransaction() *Transaction {
return &Transaction{nftContext: &nft.nftContext}
}
// Run is part of Interface
func (nft *realNFTables) Run(ctx context.Context, tx *Transaction) error {
if tx.err != nil {
return tx.err
}
buf, err := tx.asCommandBuf()
if err != nil {
return err
}
cmd := exec.CommandContext(ctx, nft.path, "-f", "-")
cmd.Stdin = buf
_, err = nft.exec.Run(cmd)
return err
}
// jsonVal looks up key in json; if it exists and is of type T, it returns (json[key], true).
// Otherwise it returns (_, false).
func jsonVal[T any](json map[string]interface{}, key string) (T, bool) {
if ifVal, exists := json[key]; exists {
tVal, ok := ifVal.(T)
return tVal, ok
} else {
var zero T
return zero, false
}
}
// getJSONObjects takes the output of "nft -j list", validates it, and returns an array
// of just the objects of objectType.
func getJSONObjects(listOutput, objectType string) ([]map[string]interface{}, error) {
// listOutput should contain JSON looking like:
//
// {
// "nftables": [
// {
// "metainfo": {
// "json_schema_version": 1,
// ...
// }
// },
// {
// "chain": {
// "family": "ip",
// "table": "kube-proxy",
// "name": "KUBE-SERVICES",
// "handle": 3
// }
// },
// {
// "chain": {
// "family": "ip",
// "table": "kube-proxy",
// "name": "KUBE-NODEPORTS",
// "handle": 4
// }
// },
// ...
// ]
// }
//
// In this case, given objectType "chain", we would return
//
// [
// {
// "family": "ip",
// "table": "kube-proxy",
// "name": "KUBE-SERVICES",
// "handle": 3
// },
// {
// "family": "ip",
// "table": "kube-proxy",
// "name": "KUBE-NODEPORTS",
// "handle": 4
// },
// ...
// ]
jsonResult := map[string][]map[string]map[string]interface{}{}
if err := json.Unmarshal([]byte(listOutput), &jsonResult); err != nil {
return nil, fmt.Errorf("could not parse nft output: %w", err)
}
nftablesResult := jsonResult["nftables"]
if nftablesResult == nil || len(nftablesResult) == 0 {
return nil, fmt.Errorf("could not find result in nft output %q", listOutput)
}
metainfo := nftablesResult[0]["metainfo"]
if metainfo == nil {
return nil, fmt.Errorf("could not find metadata in nft output %q", listOutput)
}
// json_schema_version is an integer but `json.Unmarshal()` will have parsed it as
// a float64 since we didn't tell it otherwise.
if version, ok := jsonVal[float64](metainfo, "json_schema_version"); !ok || version != 1.0 {
return nil, fmt.Errorf("could not find supported json_schema_version in nft output %q", listOutput)
}
var objects []map[string]interface{}
for _, objContainer := range nftablesResult {
obj := objContainer[objectType]
if obj != nil {
objects = append(objects, obj)
}
}
return objects, nil
}
// List is part of Interface.
func (nft *realNFTables) List(ctx context.Context, objectType string) ([]string, error) {
// All currently-existing nftables object types have plural forms that are just
// the singular form plus 's'.
var typeSingular, typePlural string
if objectType[len(objectType)-1] == 's' {
typeSingular = objectType[:len(objectType)-1]
typePlural = objectType
} else {
typeSingular = objectType
typePlural = objectType + "s"
}
cmd := exec.CommandContext(ctx, nft.path, "--json", "list", typePlural, string(nft.family))
out, err := nft.exec.Run(cmd)
if err != nil {
return nil, fmt.Errorf("failed to run nft: %w", err)
}
objects, err := getJSONObjects(out, typeSingular)
if err != nil {
return nil, err
}
var result []string
for _, obj := range objects {
objTable, _ := jsonVal[string](obj, "table")
if objTable != nft.table {
continue
}
if name, ok := jsonVal[string](obj, "name"); ok {
result = append(result, name)
}
}
return result, nil
}
// ListRules is part of Interface
func (nft *realNFTables) ListRules(ctx context.Context, chain string) ([]*Rule, error) {
cmd := exec.CommandContext(ctx, nft.path, "--json", "list", "chain", string(nft.family), nft.table, chain)
out, err := nft.exec.Run(cmd)
if err != nil {
return nil, fmt.Errorf("failed to run nft: %w", err)
}
jsonRules, err := getJSONObjects(out, "rule")
if err != nil {
return nil, fmt.Errorf("unable to parse JSON output: %w", err)
}
rules := make([]*Rule, 0, len(jsonRules))
for _, jsonRule := range jsonRules {
rule := &Rule{
Chain: chain,
}
// handle is written as an integer in nft's output, but json.Unmarshal
// will have parsed it as a float64. (Handles are uint64s, but they are
// assigned consecutively starting from 1, so as long as fewer than 2**53
// nftables objects have been created since boot time, we won't run into
// float64-vs-uint64 precision issues.)
if handle, ok := jsonVal[float64](jsonRule, "handle"); ok {
rule.Handle = PtrTo(int(handle))
}
if comment, ok := jsonVal[string](jsonRule, "comment"); ok {
rule.Comment = &comment
}
rules = append(rules, rule)
}
return rules, nil
}
// ListElements is part of Interface
func (nft *realNFTables) ListElements(ctx context.Context, objectType, name string) ([]*Element, error) {
cmd := exec.CommandContext(ctx, nft.path, "--json", "list", objectType, string(nft.family), nft.table, name)
out, err := nft.exec.Run(cmd)
if err != nil {
return nil, fmt.Errorf("failed to run nft: %w", err)
}
jsonSetsOrMaps, err := getJSONObjects(out, objectType)
if err != nil {
return nil, fmt.Errorf("unable to parse JSON output: %w", err)
}
if len(jsonSetsOrMaps) != 1 {
return nil, fmt.Errorf("unexpected JSON output from nft (multiple results)")
}
jsonElements, _ := jsonVal[[]interface{}](jsonSetsOrMaps[0], "elem")
elements := make([]*Element, 0, len(jsonElements))
for _, jsonElement := range jsonElements {
var key, value interface{}
elem := &Element{}
if objectType == "set" {
elem.Set = name
key = jsonElement
} else {
elem.Map = name
tuple, ok := jsonElement.([]interface{})
if !ok || len(tuple) != 2 {
return nil, fmt.Errorf("unexpected JSON output from nft (elem is not [key,val]: %q)", jsonElement)
}
key, value = tuple[0], tuple[1]
}
// If the element has a comment, then key will be a compound object like:
//
// {
// "elem": {
// "val": "192.168.0.1",
// "comment": "this is a comment"
// }
// }
//
// (Where "val" contains the value that key would have held if there was no
// comment.)
if obj, ok := key.(map[string]interface{}); ok {
if compoundElem, ok := jsonVal[map[string]interface{}](obj, "elem"); ok {
if key, ok = jsonVal[interface{}](compoundElem, "val"); !ok {
return nil, fmt.Errorf("unexpected JSON output from nft (elem with no val: %q)", jsonElement)
}
if comment, ok := jsonVal[string](compoundElem, "comment"); ok {
elem.Comment = &comment
}
}
}
elem.Key, err = parseElementValue(key)
if err != nil {
return nil, err
}
if value != nil {
elem.Value, err = parseElementValue(value)
if err != nil {
return nil, err
}
}
elements = append(elements, elem)
}
return elements, nil
}
// parseElementValue parses a JSON element key/value, handling concatenations, and
// converting numeric or "verdict" values to strings.
func parseElementValue(json interface{}) ([]string, error) {
// json can be:
//
// - a single string, e.g. "192.168.1.3"
//
// - a single number, e.g. 80
//
// - a concatenation, expressed as an object containing an array of simple
// values:
// {
// "concat": [
// "192.168.1.3",
// "tcp",
// 80
// ]
// }
//
// - a verdict (for a vmap value), expressed as an object:
// {
// "drop": null
// }
//
// {
// "goto": {
// "target": "destchain"
// }
// }
switch val := json.(type) {
case string:
return []string{val}, nil
case float64:
return []string{fmt.Sprintf("%d", int(val))}, nil
case map[string]interface{}:
if concat, _ := jsonVal[[]interface{}](val, "concat"); concat != nil {
vals := make([]string, len(concat))
for i := range concat {
if str, ok := concat[i].(string); ok {
vals[i] = str
} else if num, ok := concat[i].(float64); ok {
vals[i] = fmt.Sprintf("%d", int(num))
} else {
return nil, fmt.Errorf("could not parse element value %q", concat[i])
}
}
return vals, nil
} else if len(val) == 1 {
var verdict string
// We just checked that len(val) == 1, so this loop body will only
// run once
for k, v := range val {
if v == nil {
verdict = k
} else if target, ok := v.(map[string]interface{}); ok {
verdict = fmt.Sprintf("%s %s", k, target["target"])
}
}
return []string{verdict}, nil
}
}
return nil, fmt.Errorf("could not parse element value %q", json)
}
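The JSON shapes described in the comment above can be exercised standalone. The following is a minimal sketch (not the library's actual helper) that mirrors the string, number, and concat branches of parseElementValue, relying only on how encoding/json unmarshals into interface{} (objects become map[string]interface{}, arrays []interface{}, numbers float64):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// flatten mirrors the concat-handling logic above: strings pass through,
// numbers are rendered as integers, and a {"concat": [...]} object is
// flattened element by element.
func flatten(v interface{}) ([]string, error) {
	switch val := v.(type) {
	case string:
		return []string{val}, nil
	case float64:
		return []string{fmt.Sprintf("%d", int(val))}, nil
	case map[string]interface{}:
		if concat, ok := val["concat"].([]interface{}); ok {
			out := make([]string, 0, len(concat))
			for _, c := range concat {
				s, err := flatten(c)
				if err != nil {
					return nil, err
				}
				out = append(out, s...)
			}
			return out, nil
		}
	}
	return nil, fmt.Errorf("could not parse element value %v", v)
}

func main() {
	// A concatenated key as nft's JSON output would encode it.
	var key interface{}
	_ = json.Unmarshal([]byte(`{"concat": ["192.168.1.3", "tcp", 80]}`), &key)
	parts, _ := flatten(key)
	fmt.Println(parts) // [192.168.1.3 tcp 80]
}
```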

vendor/github.com/danwinship/knftables/objects.go generated vendored Normal file

@@ -0,0 +1,377 @@
/*
Copyright 2023 Red Hat, Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package knftables
import (
"fmt"
"io"
"strings"
)
// Object implementation for Table
func (table *Table) validate(verb verb) error {
switch verb {
case addVerb, createVerb, flushVerb:
if table.Handle != nil {
return fmt.Errorf("cannot specify Handle in %s operation", verb)
}
case deleteVerb:
// Handle can be nil or non-nil
default:
return fmt.Errorf("%s is not implemented for tables", verb)
}
return nil
}
func (table *Table) writeOperation(verb verb, ctx *nftContext, writer io.Writer) {
// Special case for delete-by-handle
if verb == deleteVerb && table.Handle != nil {
fmt.Fprintf(writer, "delete table %s handle %d", ctx.family, *table.Handle)
return
}
// All other cases refer to the table by name
fmt.Fprintf(writer, "%s table %s %s", verb, ctx.family, ctx.table)
if verb == addVerb || verb == createVerb {
if table.Comment != nil && !ctx.noObjectComments {
fmt.Fprintf(writer, " { comment %q ; }", *table.Comment)
}
}
fmt.Fprintf(writer, "\n")
}
// Object implementation for Chain
func (chain *Chain) validate(verb verb) error {
if chain.Hook == nil && (chain.Type != nil || chain.Priority != nil) {
return fmt.Errorf("regular chain %q must not specify Type or Priority", chain.Name)
} else if chain.Hook != nil && (chain.Type == nil || chain.Priority == nil) {
return fmt.Errorf("base chain %q must specify Type and Priority", chain.Name)
}
switch verb {
case addVerb, createVerb, flushVerb:
if chain.Name == "" {
return fmt.Errorf("no name specified for chain")
}
if chain.Handle != nil {
return fmt.Errorf("cannot specify Handle in %s operation", verb)
}
case deleteVerb:
if chain.Name == "" && chain.Handle == nil {
return fmt.Errorf("must specify either name or handle")
}
default:
return fmt.Errorf("%s is not implemented for chains", verb)
}
return nil
}
func (chain *Chain) writeOperation(verb verb, ctx *nftContext, writer io.Writer) {
// Special case for delete-by-handle
if verb == deleteVerb && chain.Handle != nil {
fmt.Fprintf(writer, "delete chain %s %s handle %d", ctx.family, ctx.table, *chain.Handle)
return
}
fmt.Fprintf(writer, "%s chain %s %s %s", verb, ctx.family, ctx.table, chain.Name)
if verb == addVerb || verb == createVerb {
if chain.Type != nil || (chain.Comment != nil && !ctx.noObjectComments) {
fmt.Fprintf(writer, " {")
if chain.Type != nil {
// Parse the priority to a number if we can, because older
// versions of nft don't accept certain named priorities
// in all contexts (eg, "dstnat" priority in the "output"
// hook).
if priority, err := ParsePriority(ctx.family, string(*chain.Priority)); err == nil {
fmt.Fprintf(writer, " type %s hook %s priority %d ;", *chain.Type, *chain.Hook, priority)
} else {
fmt.Fprintf(writer, " type %s hook %s priority %s ;", *chain.Type, *chain.Hook, *chain.Priority)
}
}
if chain.Comment != nil && !ctx.noObjectComments {
fmt.Fprintf(writer, " comment %q ;", *chain.Comment)
}
fmt.Fprintf(writer, " }")
}
}
fmt.Fprintf(writer, "\n")
}
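For illustration, the command text the method above emits for a base chain can be sketched standalone. The table and chain names below are hypothetical, and this sketch handles only the numeric-priority path (the real method also supports named priorities, comments, and delete-by-handle):

```go
package main

import "fmt"

// chainCmd mirrors the "add chain" format written by Chain.writeOperation
// for a base chain with a numeric priority.
func chainCmd(family, table, name, ctype, hook string, priority int) string {
	return fmt.Sprintf("add chain %s %s %s { type %s hook %s priority %d ; }",
		family, table, name, ctype, hook, priority)
}

func main() {
	fmt.Println(chainCmd("ip", "kube-proxy", "filter-input", "filter", "input", -110))
	// add chain ip kube-proxy filter-input { type filter hook input priority -110 ; }
}
```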
// Object implementation for Rule
func (rule *Rule) validate(verb verb) error {
if rule.Chain == "" {
return fmt.Errorf("no chain name specified for rule")
}
if rule.Index != nil && rule.Handle != nil {
return fmt.Errorf("cannot specify both Index and Handle")
}
switch verb {
case addVerb, insertVerb:
if rule.Rule == "" {
return fmt.Errorf("no rule specified")
}
case replaceVerb:
if rule.Rule == "" {
return fmt.Errorf("no rule specified")
}
if rule.Handle == nil {
return fmt.Errorf("must specify Handle with %s", verb)
}
case deleteVerb:
if rule.Handle == nil {
return fmt.Errorf("must specify Handle with %s", verb)
}
default:
return fmt.Errorf("%s is not implemented for rules", verb)
}
return nil
}
func (rule *Rule) writeOperation(verb verb, ctx *nftContext, writer io.Writer) {
fmt.Fprintf(writer, "%s rule %s %s %s", verb, ctx.family, ctx.table, rule.Chain)
if rule.Index != nil {
fmt.Fprintf(writer, " index %d", *rule.Index)
} else if rule.Handle != nil {
fmt.Fprintf(writer, " handle %d", *rule.Handle)
}
switch verb {
case addVerb, insertVerb, replaceVerb:
fmt.Fprintf(writer, " %s", rule.Rule)
if rule.Comment != nil {
fmt.Fprintf(writer, " comment %q", *rule.Comment)
}
}
fmt.Fprintf(writer, "\n")
}
// Object implementation for Set
func (set *Set) validate(verb verb) error {
switch verb {
case addVerb, createVerb:
if (set.Type == "" && set.TypeOf == "") || (set.Type != "" && set.TypeOf != "") {
return fmt.Errorf("set must specify exactly one of Type and TypeOf")
}
if set.Handle != nil {
return fmt.Errorf("cannot specify Handle in %s operation", verb)
}
fallthrough
case flushVerb:
if set.Name == "" {
return fmt.Errorf("no name specified for set")
}
case deleteVerb:
if set.Name == "" && set.Handle == nil {
return fmt.Errorf("must specify either name or handle")
}
default:
return fmt.Errorf("%s is not implemented for sets", verb)
}
return nil
}
func (set *Set) writeOperation(verb verb, ctx *nftContext, writer io.Writer) {
// Special case for delete-by-handle
if verb == deleteVerb && set.Handle != nil {
fmt.Fprintf(writer, "delete set %s %s handle %d", ctx.family, ctx.table, *set.Handle)
return
}
fmt.Fprintf(writer, "%s set %s %s %s", verb, ctx.family, ctx.table, set.Name)
if verb == addVerb || verb == createVerb {
fmt.Fprintf(writer, " {")
if set.Type != "" {
fmt.Fprintf(writer, " type %s ;", set.Type)
} else {
fmt.Fprintf(writer, " typeof %s ;", set.TypeOf)
}
if len(set.Flags) != 0 {
fmt.Fprintf(writer, " flags ")
for i := range set.Flags {
if i > 0 {
fmt.Fprintf(writer, ",")
}
fmt.Fprintf(writer, "%s", set.Flags[i])
}
fmt.Fprintf(writer, " ;")
}
if set.Timeout != nil {
fmt.Fprintf(writer, " timeout %ds ;", int64(set.Timeout.Seconds()))
}
if set.GCInterval != nil {
fmt.Fprintf(writer, " gc-interval %ds ;", int64(set.GCInterval.Seconds()))
}
if set.Size != nil {
fmt.Fprintf(writer, " size %d ;", *set.Size)
}
if set.Policy != nil {
fmt.Fprintf(writer, " policy %s ;", *set.Policy)
}
if set.AutoMerge != nil && *set.AutoMerge {
fmt.Fprintf(writer, " auto-merge ;")
}
if set.Comment != nil && !ctx.noObjectComments {
fmt.Fprintf(writer, " comment %q ;", *set.Comment)
}
fmt.Fprintf(writer, " }")
}
fmt.Fprintf(writer, "\n")
}
// Object implementation for Map
func (mapObj *Map) validate(verb verb) error {
switch verb {
case addVerb, createVerb:
if (mapObj.Type == "" && mapObj.TypeOf == "") || (mapObj.Type != "" && mapObj.TypeOf != "") {
return fmt.Errorf("map must specify exactly one of Type and TypeOf")
}
if mapObj.Handle != nil {
return fmt.Errorf("cannot specify Handle in %s operation", verb)
}
fallthrough
case flushVerb:
if mapObj.Name == "" {
return fmt.Errorf("no name specified for map")
}
case deleteVerb:
if mapObj.Name == "" && mapObj.Handle == nil {
return fmt.Errorf("must specify either name or handle")
}
default:
return fmt.Errorf("%s is not implemented for maps", verb)
}
return nil
}
func (mapObj *Map) writeOperation(verb verb, ctx *nftContext, writer io.Writer) {
// Special case for delete-by-handle
if verb == deleteVerb && mapObj.Handle != nil {
fmt.Fprintf(writer, "delete map %s %s handle %d", ctx.family, ctx.table, *mapObj.Handle)
return
}
fmt.Fprintf(writer, "%s map %s %s %s", verb, ctx.family, ctx.table, mapObj.Name)
if verb == addVerb || verb == createVerb {
fmt.Fprintf(writer, " {")
if mapObj.Type != "" {
fmt.Fprintf(writer, " type %s ;", mapObj.Type)
} else {
fmt.Fprintf(writer, " typeof %s ;", mapObj.TypeOf)
}
if len(mapObj.Flags) != 0 {
fmt.Fprintf(writer, " flags ")
for i := range mapObj.Flags {
if i > 0 {
fmt.Fprintf(writer, ",")
}
fmt.Fprintf(writer, "%s", mapObj.Flags[i])
}
fmt.Fprintf(writer, " ;")
}
if mapObj.Timeout != nil {
fmt.Fprintf(writer, " timeout %ds ;", int64(mapObj.Timeout.Seconds()))
}
if mapObj.GCInterval != nil {
fmt.Fprintf(writer, " gc-interval %ds ;", int64(mapObj.GCInterval.Seconds()))
}
if mapObj.Size != nil {
fmt.Fprintf(writer, " size %d ;", *mapObj.Size)
}
if mapObj.Policy != nil {
fmt.Fprintf(writer, " policy %s ;", *mapObj.Policy)
}
if mapObj.Comment != nil && !ctx.noObjectComments {
fmt.Fprintf(writer, " comment %q ;", *mapObj.Comment)
}
fmt.Fprintf(writer, " }")
}
fmt.Fprintf(writer, "\n")
}
// Object implementation for Element
func (element *Element) validate(verb verb) error {
if element.Map == "" && element.Set == "" {
return fmt.Errorf("no set/map name specified for element")
} else if element.Set != "" && element.Map != "" {
return fmt.Errorf("element specifies both a set name and a map name")
}
if len(element.Key) == 0 {
return fmt.Errorf("no key specified for element")
}
if element.Set != "" && len(element.Value) != 0 {
return fmt.Errorf("map value specified for set element")
}
switch verb {
case addVerb, createVerb:
if element.Map != "" && len(element.Value) == 0 {
return fmt.Errorf("no map value specified for map element")
}
case deleteVerb:
default:
return fmt.Errorf("%s is not implemented for elements", verb)
}
return nil
}
func (element *Element) writeOperation(verb verb, ctx *nftContext, writer io.Writer) {
name := element.Set
if name == "" {
name = element.Map
}
fmt.Fprintf(writer, "%s element %s %s %s { %s", verb, ctx.family, ctx.table, name,
strings.Join(element.Key, " . "))
if verb == addVerb || verb == createVerb {
if element.Comment != nil {
fmt.Fprintf(writer, " comment %q", *element.Comment)
}
if len(element.Value) != 0 {
fmt.Fprintf(writer, " : %s", strings.Join(element.Value, " . "))
}
}
fmt.Fprintf(writer, " }\n")
}
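Likewise, the element syntax written above can be sketched standalone for the "add" case of a map element. The table, map, and chain names here are hypothetical, and comment handling is omitted:

```go
package main

import (
	"fmt"
	"strings"
)

// elementCmd mirrors the format produced by Element.writeOperation for an
// "add" of a map element: concatenated keys/values are joined with " . ",
// and a map value follows the key after " : ".
func elementCmd(family, table, name string, key, value []string) string {
	cmd := fmt.Sprintf("add element %s %s %s { %s", family, table, name,
		strings.Join(key, " . "))
	if len(value) != 0 {
		cmd += fmt.Sprintf(" : %s", strings.Join(value, " . "))
	}
	return cmd + " }"
}

func main() {
	fmt.Println(elementCmd("ip", "kube-proxy", "service-ips",
		[]string{"172.30.0.44", "tcp", "80"},
		[]string{"goto svc-ULMVA6XW"}))
	// add element ip kube-proxy service-ips { 172.30.0.44 . tcp . 80 : goto svc-ULMVA6XW }
}
```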

vendor/github.com/danwinship/knftables/transaction.go generated vendored Normal file

@@ -0,0 +1,138 @@
/*
Copyright 2023 Red Hat, Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package knftables
import (
"bytes"
"fmt"
"io"
)
// Transaction represents an nftables transaction
type Transaction struct {
*nftContext
operations []operation
err error
}
// operation contains a single nftables operation (eg "add table", "flush chain")
type operation struct {
verb verb
obj Object
}
// verb is used internally to represent the different "nft" verbs
type verb string
const (
addVerb verb = "add"
createVerb verb = "create"
insertVerb verb = "insert"
replaceVerb verb = "replace"
deleteVerb verb = "delete"
flushVerb verb = "flush"
)
// asCommandBuf returns the transaction as an io.Reader that outputs a series of nft commands
func (tx *Transaction) asCommandBuf() (io.Reader, error) {
if tx.err != nil {
return nil, tx.err
}
buf := &bytes.Buffer{}
for _, op := range tx.operations {
op.obj.writeOperation(op.verb, tx.nftContext, buf)
}
return buf, nil
}
// String returns the transaction as a string containing the nft commands; if there is
// a pending error, it will be output as a comment at the end of the transaction.
func (tx *Transaction) String() string {
buf := &bytes.Buffer{}
for _, op := range tx.operations {
op.obj.writeOperation(op.verb, tx.nftContext, buf)
}
if tx.err != nil {
fmt.Fprintf(buf, "# ERROR: %v", tx.err)
}
return buf.String()
}
func (tx *Transaction) operation(verb verb, obj Object) {
if tx.err != nil {
return
}
if tx.err = obj.validate(verb); tx.err != nil {
return
}
tx.operations = append(tx.operations, operation{verb: verb, obj: obj})
}
// Add adds an "nft add" operation to tx, ensuring that obj exists by creating it if it
// did not already exist. (If obj is a Rule, it will be appended to the end of its chain,
// or else added after the Rule indicated by this rule's Index or Handle.) The Add() call
// always succeeds, but if obj is invalid, or inconsistent with the existing nftables
// state, then an error will be returned when the transaction is Run.
func (tx *Transaction) Add(obj Object) {
tx.operation(addVerb, obj)
}
// Create adds an "nft create" operation to tx, creating obj, which must not already
// exist. (If obj is a Rule, it will be appended to the end of its chain, or else added
// after the Rule indicated by this rule's Index or Handle.) The Create() call always
// succeeds, but if obj is invalid, already exists, or is inconsistent with the existing
// nftables state, then an error will be returned when the transaction is Run.
func (tx *Transaction) Create(obj Object) {
tx.operation(createVerb, obj)
}
// Insert adds an "nft insert" operation to tx, inserting obj (which must be a Rule) at
// the start of its chain, or before the other Rule indicated by this rule's Index or
// Handle. The Insert() call always succeeds, but if obj is invalid or is inconsistent
// with the existing nftables state, then an error will be returned when the transaction
// is Run.
func (tx *Transaction) Insert(obj Object) {
tx.operation(insertVerb, obj)
}
// Replace adds an "nft replace" operation to tx, replacing an existing rule with obj
// (which must be a Rule). The Replace() call always succeeds, but if obj is invalid, does
// not contain the Handle of an existing rule, or is inconsistent with the existing
// nftables state, then an error will be returned when the transaction is Run.
func (tx *Transaction) Replace(obj Object) {
tx.operation(replaceVerb, obj)
}
// Flush adds an "nft flush" operation to tx, clearing the contents of obj. The Flush()
// call always succeeds, but if obj does not exist (or does not support flushing) then an
// error will be returned when the transaction is Run.
func (tx *Transaction) Flush(obj Object) {
tx.operation(flushVerb, obj)
}
// Delete adds an "nft delete" operation to tx, deleting obj. The Delete() call always
// succeeds, but if obj does not exist or cannot be deleted based on the information
// provided (eg, Handle is required but not set) then an error will be returned when the
// transaction is Run.
func (tx *Transaction) Delete(obj Object) {
tx.operation(deleteVerb, obj)
}
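The queue-validate-latch pattern used by Transaction can be sketched in isolation. This is a toy version (not the real API) showing the key behavior: each operation is validated as it is queued, the first error latches, later calls become no-ops, and the error surfaces at the end:

```go
package main

import (
	"fmt"
	"strings"
)

// miniTx mimics Transaction's error-latching: once err is set, further
// operations are silently skipped and the error is reported in String().
type miniTx struct {
	ops []string
	err error
}

func (tx *miniTx) op(verb, obj string) {
	if tx.err != nil {
		return // fail-fast: keep only the first error
	}
	if obj == "" {
		tx.err = fmt.Errorf("no object specified for %s", verb)
		return
	}
	tx.ops = append(tx.ops, verb+" "+obj)
}

func (tx *miniTx) String() string {
	out := strings.Join(tx.ops, "\n")
	if tx.err != nil {
		out += "\n# ERROR: " + tx.err.Error()
	}
	return out
}

func main() {
	tx := &miniTx{}
	tx.op("add", "table ip kube-proxy")
	tx.op("add", "chain ip kube-proxy services")
	fmt.Println(tx.String())
}
```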

vendor/github.com/danwinship/knftables/types.go generated vendored Normal file

@@ -0,0 +1,338 @@
/*
Copyright 2023 Red Hat, Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package knftables
import (
"io"
"time"
)
const (
// Maximum length of a table, chain, set, etc, name
NameLengthMax = 256
// Maximum length of a comment
CommentLengthMax = 128
)
// Object is the interface for an nftables object. All of the concrete object types
// implement this interface.
type Object interface {
// validate validates an object for an operation
validate(verb verb) error
// writeOperation writes out an "nft" operation involving the object. It assumes
// that the object has been validated.
writeOperation(verb verb, ctx *nftContext, writer io.Writer)
}
// Family is an nftables family
type Family string
const (
// IPv4Family represents the "ip" nftables family, for IPv4 rules.
IPv4Family Family = "ip"
// IPv6Family represents the "ip6" nftables family, for IPv6 rules.
IPv6Family Family = "ip6"
// InetFamily represents the "inet" nftables family, for mixed IPv4 and IPv6 rules.
InetFamily Family = "inet"
// ARPFamily represents the "arp" nftables family, for ARP rules.
ARPFamily Family = "arp"
// BridgeFamily represents the "bridge" nftables family, for rules operating
// on packets traversing a bridge.
BridgeFamily Family = "bridge"
// NetDevFamily represents the "netdev" nftables family, for rules operating on
// the device ingress/egress path.
NetDevFamily Family = "netdev"
)
// Table represents an nftables table.
type Table struct {
// Comment is an optional comment for the table. (Note that this can be specified
// on creation, but depending on the version of /sbin/nft that is available, it
// may not be filled in correctly in the result of a List.)
Comment *string
// Handle is an identifier that can be used to uniquely identify an object when
// deleting it. When adding a new object, this must be nil.
Handle *int
}
// BaseChainType represents the "type" of a "base chain" (ie, a chain that is attached to a hook)
type BaseChainType string
const (
// FilterType is the chain type for basic packet filtering.
FilterType BaseChainType = "filter"
// NATType is the chain type for doing DNAT, SNAT, and masquerading.
// NAT operations are only available from certain hooks.
NATType BaseChainType = "nat"
// RouteType is the chain type for rules that change the routing of packets.
// Chains of this type can only be added to the "output" hook.
RouteType BaseChainType = "route"
)
// BaseChainHook represents the "hook" that a base chain is attached to
type BaseChainHook string
// FIXME: document these correctly; virtually all of the existing iptables/nftables
// documentation is slightly wrong, particularly wrt locally-generated packets.
const (
PreroutingHook BaseChainHook = "prerouting"
InputHook BaseChainHook = "input"
ForwardHook BaseChainHook = "forward"
OutputHook BaseChainHook = "output"
PostroutingHook BaseChainHook = "postrouting"
IngressHook BaseChainHook = "ingress"
EgressHook BaseChainHook = "egress"
)
// BaseChainPriority represents the "priority" of a base chain. In addition to the const
// values, you can also use a signed integer value, or an arithmetic expression consisting
// of a const value followed by "+" or "-" and an integer. Lower values run earlier.
type BaseChainPriority string
const (
// RawPriority is the earliest named priority. In particular, it can be used for
// rules that need to run before conntrack. It is equivalent to the value -300 and
// can be used in the ip, ip6, and inet families.
RawPriority BaseChainPriority = "raw"
// ManglePriority is the standard priority for packet-rewriting operations. It is
// equivalent to the value -150 and can be used in the ip, ip6, and inet families.
ManglePriority BaseChainPriority = "mangle"
// DNATPriority is the standard priority for DNAT operations. In the ip, ip6, and
// inet families, it is equivalent to the value -100. In the bridge family it is
// equivalent to the value -300. In both cases it can only be used from the
// prerouting hook.
DNATPriority BaseChainPriority = "dstnat"
// FilterPriority is the standard priority for filtering operations. In the ip,
// ip6, inet, arp, and netdev families, it is equivalent to the value 0. In the
// bridge family it is equivalent to the value -200.
FilterPriority BaseChainPriority = "filter"
// OutPriority is FIXME. It is equivalent to the value 100 and can only be used in
// the bridge family.
OutPriority BaseChainPriority = "out"
// SecurityPriority is the standard priority for security operations ("where
// secmark can be set for example"). It is equivalent to the value 50 and can be
// used in the ip, ip6, and inet families.
SecurityPriority BaseChainPriority = "security"
// SNATPriority is the standard priority for SNAT operations. In the ip, ip6, and
// inet families, it is equivalent to the value 100. In the bridge family it is
// equivalent to the value 300. In both cases it can only be used from the
// postrouting hook.
SNATPriority BaseChainPriority = "srcnat"
)
// Chain represents an nftables chain; either a "base chain" (if Type, Hook, and Priority
// are specified), or a "regular chain" (if they are not).
type Chain struct {
// Name is the name of the chain.
Name string
// Type is the chain type; this must be set for a base chain and unset for a
// regular chain.
Type *BaseChainType
// Hook is the hook that the chain is connected to; this must be set for a base
// chain and unset for a regular chain.
Hook *BaseChainHook
// Priority is the chain priority; this must be set for a base chain and unset for
// a regular chain. You can call ParsePriority() to convert this to a number.
Priority *BaseChainPriority
// Comment is an optional comment for the object.
Comment *string
// Handle is an identifier that can be used to uniquely identify an object when
// deleting it. When adding a new object, this must be nil
Handle *int
}
// Rule represents a rule in a chain
type Rule struct {
// Chain is the name of the chain that contains this rule
Chain string
// Rule is the rule in standard nftables syntax. (Should be empty on Delete, but
// is ignored if not.) Note that this does not include any rule comment, which is
// separate from the rule itself.
Rule string
// Comment is an optional comment for the rule.
Comment *string
// Index is the number of a rule (counting from 0) to Add this Rule after or
// Insert it before. Cannot be specified along with Handle. If neither Index
// nor Handle is specified then Add appends the rule to the end of the chain and
// Insert prepends it to the beginning.
Index *int
// Handle is a rule handle. In Add or Insert, if set, this is the handle of
// existing rule to put the new rule after/before. In Delete or Replace, this
// indicates the existing rule to delete/replace, and is mandatory. In the result
// of a List, this will indicate the rule's handle that can then be used in a
// later operation.
Handle *int
}
// SetFlag represents a set or map flag
type SetFlag string
const (
// ConstantFlag is a flag indicating that the set/map is constant: its contents
// cannot be changed once it has been bound to a rule.
ConstantFlag SetFlag = "constant"
// DynamicFlag is a flag indicating that the set contains stateful objects
// (counters, quotas, or limits) that will be dynamically updated.
DynamicFlag SetFlag = "dynamic"
// IntervalFlag is a flag indicating that the set contains either CIDR elements or
// IP ranges.
IntervalFlag SetFlag = "interval"
// TimeoutFlag is a flag indicating that the set/map has a timeout after which
// dynamically added elements will be removed. (It is set automatically if the
// set/map has a Timeout.)
TimeoutFlag SetFlag = "timeout"
)
// SetPolicy represents a set or map storage policy
type SetPolicy string
const (
// PerformancePolicy indicates that the set/map should be optimized for
// lookup performance. (This is the nftables default.)
PerformancePolicy SetPolicy = "performance"
// MemoryPolicy indicates that the set/map should be optimized for memory usage.
MemoryPolicy SetPolicy = "memory"
)
// Set represents the definition of an nftables set (but not its elements)
type Set struct {
// Name is the name of the set.
Name string
// Type is the type of the set key (eg "ipv4_addr"). Either Type or TypeOf, but
// not both, must be non-empty.
Type string
// TypeOf is the type of the set key as an nftables expression (eg "ip saddr").
// Either Type or TypeOf, but not both, must be non-empty.
TypeOf string
// Flags are the set flags
Flags []SetFlag
// Timeout is the time that an element will stay in the set before being removed.
// (Optional; mandatory for sets that will be added to from the packet path)
Timeout *time.Duration
// GCInterval is the interval at which timed-out elements will be removed from the
// set. (Optional; FIXME DEFAULT)
GCInterval *time.Duration
// Size is the maximum number of elements in the set.
// (Optional; mandatory for sets that will be added to from the packet path)
Size *uint64
// Policy is the set's storage policy.
Policy *SetPolicy
// AutoMerge indicates that adjacent/overlapping set elements should be merged
// together (only for interval sets)
AutoMerge *bool
// Comment is an optional comment for the object.
Comment *string
// Handle is an identifier that can be used to uniquely identify an object when
// deleting it. When adding a new object, this must be nil
Handle *int
}
// Map represents the definition of an nftables map (but not its elements)
type Map struct {
// Name is the name of the map.
Name string
// Type is the type of the map key and value (eg "ipv4_addr : verdict"). Either
// Type or TypeOf, but not both, must be non-empty.
Type string
// TypeOf is the type of the map key and value as an nftables expression (eg
// "ip saddr : verdict"). Either Type or TypeOf, but not both, must be non-empty.
TypeOf string
// Flags are the map flags
Flags []SetFlag
// Timeout is the time that an element will stay in the map before being removed.
// (Optional; mandatory for maps that will be added to from the packet path)
Timeout *time.Duration
// GCInterval is the interval at which timed-out elements will be removed from the
// map. (Optional; FIXME DEFAULT)
GCInterval *time.Duration
// Size is the maximum number of elements in the map.
// (Optional; mandatory for maps that will be added to from the packet path)
Size *uint64
// Policy is the map's storage policy.
Policy *SetPolicy
// Comment is an optional comment for the object.
Comment *string
// Handle is an identifier that can be used to uniquely identify an object when
// deleting it. When adding a new object, this must be nil
Handle *int
}
// Element represents a set or map element
type Element struct {
// Set is the name of the set that contains this element (or the empty string if
// this is a map element.)
Set string
// Map is the name of the map that contains this element (or the empty string if
// this is a set element.)
Map string
// Key is the element key. (The list contains a single element for "simple" keys,
// or multiple elements for concatenations.)
Key []string
// Value is the map element value. As with Key, this may be a single value or
// multiple. For set elements, this must be nil.
Value []string
// Comment is an optional comment for the element
Comment *string
}

vendor/github.com/danwinship/knftables/util.go generated vendored Normal file

@@ -0,0 +1,117 @@
/*
Copyright 2023 Red Hat, Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package knftables
import (
"fmt"
"strconv"
"strings"
)
// PtrTo can be used to fill in optional field values in objects
func PtrTo[T any](val T) *T {
return &val
}
var numericPriorities = map[string]int{
"raw": -300,
"mangle": -150,
"dstnat": -100,
"filter": 0,
"security": 50,
"srcnat": 100,
}
var bridgeNumericPriorities = map[string]int{
"dstnat": -300,
"filter": -200,
"out": 100,
"srcnat": 300,
}
// ParsePriority tries to convert the string form of a chain priority into a number
func ParsePriority(family Family, priority string) (int, error) {
val, err := strconv.Atoi(priority)
if err == nil {
return val, nil
}
modVal := 0
if i := strings.IndexAny(priority, "+-"); i != -1 {
mod := priority[i:]
modVal, err = strconv.Atoi(mod)
if err != nil {
return 0, fmt.Errorf("could not parse modifier %q: %w", mod, err)
}
priority = priority[:i]
}
var found bool
if family == BridgeFamily {
val, found = bridgeNumericPriorities[priority]
} else {
val, found = numericPriorities[priority]
}
if !found {
return 0, fmt.Errorf("unknown priority %q", priority)
}
return val + modVal, nil
}
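A worked evaluation of the parsing above: in the ip family, "dstnat+10" splits into the named base "dstnat" (-100) plus the modifier +10, giving -90. This standalone sketch mirrors ParsePriority for the ip-family table only (the real function also dispatches on family):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ipPriority mirrors ParsePriority above for the ip family: accept a bare
// number directly, otherwise split off a "+N"/"-N" modifier, look up the
// named base priority, and add the modifier.
func ipPriority(priority string) (int, error) {
	if v, err := strconv.Atoi(priority); err == nil {
		return v, nil
	}
	base := map[string]int{
		"raw": -300, "mangle": -150, "dstnat": -100,
		"filter": 0, "security": 50, "srcnat": 100,
	}
	modVal := 0
	if i := strings.IndexAny(priority, "+-"); i != -1 {
		v, err := strconv.Atoi(priority[i:])
		if err != nil {
			return 0, fmt.Errorf("could not parse modifier %q: %w", priority[i:], err)
		}
		modVal = v
		priority = priority[:i]
	}
	val, found := base[priority]
	if !found {
		return 0, fmt.Errorf("unknown priority %q", priority)
	}
	return val + modVal, nil
}

func main() {
	v, _ := ipPriority("dstnat+10")
	fmt.Println(v) // -90
}
```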
// Concat is a helper (primarily) for constructing Rule objects. It takes a series of
// arguments and concatenates them together into a single string with spaces between the
// arguments. Strings are output as-is, string arrays are output element by element,
// numbers are output as with `fmt.Sprintf("%d")`, and all other types are output as with
// `fmt.Sprintf("%s")`. To help with set/map lookup syntax, an argument of "@" will not
// be followed by a space, so you can do, eg, `Concat("ip saddr", "@", setName)`.
func Concat(args ...interface{}) string {
b := &strings.Builder{}
var needSpace, wroteAt bool
for _, arg := range args {
switch x := arg.(type) {
case string:
if needSpace {
b.WriteByte(' ')
}
b.WriteString(x)
wroteAt = (x == "@")
case []string:
for _, s := range x {
if needSpace {
b.WriteByte(' ')
}
b.WriteString(s)
wroteAt = (s == "@")
needSpace = b.Len() > 0 && !wroteAt
}
case int, uint, int16, uint16, int32, uint32, int64, uint64:
if needSpace {
b.WriteByte(' ')
}
fmt.Fprintf(b, "%d", x)
default:
if needSpace {
b.WriteByte(' ')
}
fmt.Fprintf(b, "%s", x)
}
needSpace = b.Len() > 0 && !wroteAt
}
return b.String()
}
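The "@" rule above is easiest to see with an example. This trimmed sketch handles only plain strings (the real Concat also accepts string slices and numbers); the set name "service-ips" is purely illustrative:

```go
package main

import (
	"fmt"
	"strings"
)

// concatStrings mirrors Concat's spacing rule for string arguments:
// arguments are space-separated, except that "@" fuses with the
// following set/map name.
func concatStrings(args ...string) string {
	b := &strings.Builder{}
	needSpace := false
	for _, s := range args {
		if needSpace {
			b.WriteByte(' ')
		}
		b.WriteString(s)
		needSpace = b.Len() > 0 && s != "@"
	}
	return b.String()
}

func main() {
	fmt.Println(concatStrings("ip daddr", ".", "meta l4proto", "@", "service-ips"))
	// ip daddr . meta l4proto @service-ips
}
```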

vendor/modules.txt vendored

@@ -179,6 +179,9 @@ github.com/cpuguy83/go-md2man/v2/md2man
# github.com/cyphar/filepath-securejoin v0.2.4
## explicit; go 1.13
github.com/cyphar/filepath-securejoin
# github.com/danwinship/knftables v0.0.13
## explicit; go 1.20
github.com/danwinship/knftables
# github.com/davecgh/go-spew v1.1.1
## explicit
github.com/davecgh/go-spew/spew