(*) Fix cleanup of NodePort resources. (*) Fix the logic to select existing policies
Fix review comment
Fix Bazel
Update GoDep License
Fix NodePort forwarding to target port
Fix Darwin Build break. +1
Implement IsCompatible to validate kernel support for kernel mode
The following changes are proposed for the iptables proxier:
* There are three places where a string specifying IP:port is parsed
using something like this:
if index := strings.Index(e.endpoint, ":"); index != -1 {
This will fail for IPv6 since V6 addresses contain colons. Also,
the V6 address is expected to be surrounded by square brackets
(i.e. []:). Fix this by replacing call to Index with
call to LastIndex() and stripping out square brackets.
* The String() method for the localPort struct should put square brackets
around IPv6 addresses.
* The logging in the merge() method for proxyServiceMap should put brackets
around IPv6 addresses.
* There are several places where filterRules destination is hardcoded to
/32. This should be a /128 for IPv6 case.
* Add IPv6 unit test cases
fixes#48550
Windows Kernel now exposes "Internal Load Balancing"
using VFP (Virtual Filtering Platform) part of Virtual Switch. An inbuild
windows service HNS (Host Networking Service) acts as interface to program
the VFP. VFP is synonymous to iptables in functionality. HNS uses json based
data as input.
With the help of the interface available in github.com/Microsoft/hcsshim,
these APIs are exposed to the world in github to program HNS and use
the feature.
*** More info about the changes in this PR ***
(1) For every endpoint available in the system, an HNS Endpoint is added
(1.a) for local endpoints, a local HNS Endpoint would already exist, as part of
container creation.
(1.b) For all remote endpoints, a remote HNS Endpoint is created via HNS
(2) For every Service, a HNS ILB LoadBalancer is added referring the endpoints
created in (1)
Sample Input to HNS:
{
"Policies": [
{
"ExternalPort": 80,
"InternalPort": 80,
"Protocol": 6,
"Type": "ELB",
"VIPs": [
"11.0.98.129"
]
}
],
"References": [
"/endpoints/ca8b877b-ab90-499a-bc0e-7d736c425632",
"/endpoints/ee0ef08b-8434-4f8b-b748-393884e77465"
]
}
(2-a) This is done for Cluster IP, LoadBalancer Ingress IP, NodePort, External IP
Following the regular service and endpoint updates,
the HNS is notified of the updates and the system is kept in sync.
This change causes kube-proxy to supply the required "-f ipv6"
family flag whenever the conntrack utility is executed and the
associated service is using IPv6.
This change is required for IPv6-only operation.
Note that unit test coverage for the 2-line changes in
pkg/proxy/iptables/proxier.go and /pkg/proxy/ipvs/proxier.go will need
to be added after support for IPv6 service addresses is added to these
files. For pkg/proxy/iptables/proxier.go, this coverage will be added
either with PR #48551.
fixes#52027
Automatic merge from submit-queue
rsync IPVS proxier to the HEAD of iptables
**What this PR does / why we need it**:
There was a significant performance improvement made to iptables. Since IPVS proxier makes use of iptables in some use cases, I think we should rsync IPVS proxier to the HEAD of iptables.
**Which issue this PR fixes** :
xref #51679
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 51819, 51706, 51761, 51818, 51500)
fix kube-proxy panic because of nil sessionAffinityConfig
**What this PR does / why we need it**:
fix kube-proxy panic because of nil sessionAffinityConfig
**Which issue this PR fixes**: closes#51499
**Special notes for your reviewer**:
I apology that this bug is introduced by #49850 :(
@thockin @smarterclayton @gnufied
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 49850, 47782, 50595, 50730, 51341)
Paramaterize `stickyMaxAgeMinutes` for service in API
**What this PR does / why we need it**:
Currently I find `stickyMaxAgeMinutes` for a session affinity type service is hard code to 180min. There is a TODO comment, see
https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/iptables/proxier.go#L205
I think the seesion sticky max time varies from service to service and users may not aware of it since it's hard coded in all proxier.go - iptables, userspace and winuserspace.
Once we parameterize it in API, users can set/get the values for their different services.
Perhaps, we can introduce a new field `api.ClientIPAffinityConfig` in `api.ServiceSpec`.
There is an initial discussion about it in sig-network group. See,
https://groups.google.com/forum/#!topic/kubernetes-sig-network/i-LkeHrjs80
**Which issue this PR fixes**:
fixes#49831
**Special notes for your reviewer**:
**Release note**:
```release-note
Paramaterize session affinity timeout seconds in service API for Client IP based session affinity.
```
Automatic merge from submit-queue (batch tested with PRs 50094, 48966, 49478, 50593, 49140)
[kube-proxy] Move UDP conntrack operations together to pkg/proxy/util/conntrack.go
**What this PR does / why we need it**:
Fix TODO in pkg/proxy/iptables.go, see
https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/iptables/proxier.go#L1632
Move UDP conntrack operations together to from `pkg/proxy/iptables/proxier.go` to `pkg/proxy/util/conntrack.go` so that make them more consistent and add some UTs.
**Which issue this PR fixes**
Fixes#49477
**Special notes for your reviewer**:
```release-note
NONE
```
Automatic merge from submit-queue
Fix winspace proxier wrong comment message
**What this PR does / why we need it**:
Since winspace proxier has nothing to do with iptables, this PR remove the wrong comment message on iptables.
**Which issue this PR fixes**:
Fixes#50524
Automatic merge from submit-queue
Switch from package syscall to golang.org/x/sys/unix
**What this PR does / why we need it**:
The syscall package is locked down and the comment in https://github.com/golang/go/blob/master/src/syscall/syscall.go#L21-L24 advises to switch code to use the corresponding package from golang.org/x/sys. This PR does so and replaces usage of package syscall with package golang.org/x/sys/unix where applicable. This will also allow to get updates and fixes
without having to use a new go version.
In order to get the latest functionality, golang.org/x/sys/ is re-vendored. This also allows to use Eventfd() from this package instead of calling the eventfd() C function.
**Special notes for your reviewer**:
This follows previous works in other Go projects, see e.g. moby/moby#33399, cilium/cilium#588
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Log abridged set of rules at v2 in kube-proxy on error
**What this PR does / why we need it**:
this is a follow-on to https://github.com/kubernetes/kubernetes/pull/48085
**Special notes for your reviewer**:
we hit this in operations where we typically run in v2, and would like to log abridged set of output rather than full output.
**Release note**:
```release-note
NONE
```
The syscall package is locked down and the comment in [1] advises to
switch code to use the corresponding package from golang.org/x/sys. Do
so and replace usage of package syscall with package
golang.org/x/sys/unix where applicable.
[1] https://github.com/golang/go/blob/master/src/syscall/syscall.go#L21-L24
This will also allow to get updates and fixes for syscall wrappers
without having to use a new go version.
Errno, Signal and SysProcAttr aren't changed as they haven't been
implemented in /x/sys/. Stat_t from syscall is used if standard library
packages (e.g. os) require it. syscall.SIGTERM is used for
cross-platform files.
Automatic merge from submit-queue (batch tested with PRs 47232, 48625, 48613, 48567, 39173)
proxy/userspace: honor listen IP address as host IP if given
Allows the proxier to be used on an interface that's not the default route,
otherwise hostIP gets set to the default route interface even if that's
not what the user intended.
If listen IP isn't given, falls back to previous behavior.
```release-note
To allow the userspace proxy to work correctly on multi-interface hosts when using the non-default-route interface, you may now set the `bindAddress` configuration option to an IP address assigned to a network interface. The proxy will use that IP address for any required NAT operations instead of the IP address of the interface which has the default route.
```
@kubernetes/sig-network-misc @thockin @wojtek-t
Allows the proxier to be used on an interface that's not the default route,
otherwise hostIP gets set to the default route interface even if that's
not what the user intended.
If listen IP isn't given, falls back to previous behavior.
Automatic merge from submit-queue
Raise a warning instead of info if br-netfilter is missing or unset
Took quite a while to figure out why service VIP is unreachable on my cluster. It turns out br-nf-call-iptables is unset. I wish this message could be a warning to attract considerable attention.
Automatic merge from submit-queue
Move the remaining controllers to shared informers
Completes work done in 1.6 to move the last two hold outs to shared informers - tokens controller and scheduler. Adds a few more tools to allow informer reuse (like filtering the informer, or maintaining a mutation cache).
The mutation cache is identical to #45838 and will be removed when that merges
@ncdc @deads2k extracted from openshift/origin#14086
Automatic merge from submit-queue
Fix the issue in Windows kube-proxy when processing unqualified name. This is for DNS client such as ping or iwr that validate name in response and original question.
**What this PR does / why we need it**:
This PR is an additional fix to #41618 and [the corresponding commit](b9dfb69dd7). The DNS client such as nslookup does not validate name matching in response and original question. That works fine when we append DNS suffix to unqualified name in DNS query in Windows kube-proxy. However, for DNS client such as ping or Invoke-WebRequest that validates name in response and original question, the issue arises and the DNS query fails although the received DNS response has no error.
This PR fixes the additional issue by restoring the original question name in DNS response. Further, this PR refactors DNS message routines by using miekg's DNS library.
This PR affects the Windows kube-proxy only.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#42605
**Special notes for your reviewer**:
**Release note**:
```release-note
Fix DNS suffix search list support in Windows kube-proxy.
```
Automatic merge from submit-queue (batch tested with PRs 45953, 45889)
Add /metrics and profiling handlers to kube-proxy
Also expose "syncProxyRules latency" as a prometheus metrics.
Fix https://github.com/kubernetes/kubernetes/issues/45876
Automatic merge from submit-queue (batch tested with PRs 45623, 45241, 45460, 41162)
Promotes Source IP preservation for Virtual IPs from Beta to GA
Fixes#33625. Feature issue: kubernetes/features#27.
Bullet points:
- Declare 2 fields (ExternalTraffic and HealthCheckNodePort) that mirror the ESIPP annotations.
- ESIPP alpha annotations will be ignored.
- Existing ESIPP beta annotations will still be fully supported.
- Allow promoting beta annotations to first class fields or reversely.
- Disallow setting invalid ExternalTraffic and HealthCheckNodePort on services. Default ExternalTraffic field for nodePort or loadBalancer type service to "Global" if not set.
**Release note**:
```release-note
Promotes Source IP preservation for Virtual IPs to GA.
Two api fields are defined correspondingly:
- Service.Spec.ExternalTrafficPolicy <- 'service.beta.kubernetes.io/external-traffic' annotation.
- Service.Spec.HealthCheckNodePort <- 'service.beta.kubernetes.io/healthcheck-nodeport' annotation.
```
Automatic merge from submit-queue (batch tested with PRs 45685, 45572, 45624, 45723, 45733)
Remove reasons from iptables syncProxyRules
The reasons are no longer useful, since we know if something changed anyway, I think.
Automatic merge from submit-queue (batch tested with PRs 45571, 45657, 45638, 45663, 45622)
Use real proxier inside hollow-proxy but with mocked syscalls
Fixes https://github.com/kubernetes/kubernetes/issues/43701
This should make hollow-proxy better mimic the real kube-proxy in performance.
Maybe next we should have a more realistic implementation even for fake iptables (adding/updating/deleting rules/chains in an table, just not on the real one)? Though I'm not sure how important it is.
cc @kubernetes/sig-scalability-misc @kubernetes/sig-network-misc @wojtek-t @gmarek
Automatic merge from submit-queue (batch tested with PRs 44727, 45409, 44968, 45122, 45493)
Separate healthz server from metrics server in kube-proxy
From #14661, proposal is on kubernetes/community#552.
Couple bullet points as in commit:
- /healthz will be served on 0.0.0.0:10256 by default.
- /metrics and /proxyMode will be served on port 10249 as before.
- Healthz handler will verify timestamp in iptables mode.
/assign @nicksardo @bowei @thockin
**Release note**:
```release-note
NONE
```
- /healthz will be served on 0.0.0.0:10256 by default.
- /metrics and /proxyMode will be served on port 10249
as before.
- Healthz handler will verify timestamp in iptables mode.
Automatic merge from submit-queue (batch tested with PRs 44590, 44969, 45325, 45208, 44714)
Fix onlylocal endpoint's healthcheck nodeport logic
I was in the middle of rebasing #41162, surprisingly found the healthcheck nodeport logic in kube-proxy is still buggy. Separate this fix out as it isn't GA related.
/assign @freehan @thockin
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Edge-based userspace LB in kube-proxy
@thockin @bowei - if one of you could take a look if that PR doesn't break some basic kube-proxy assumptions. The similar change for winuserproxy should be pretty trivial.
And we should also do that for iptables, but that requires splitting the iptables code to syncProxyRules (which from what I know @thockin already started working on so we should probably wait for it to be done).
Automatic merge from submit-queue
Restore "Setting endpoints" log message
**What this PR does / why we need it**:
The "Setting endpoints" message from kube-proxy at high verbosity was
lost as part of a larger simplification in kubernetes/kubernetes#42747.
This change brings it back, simply outputting the just-constructed
addresses list.
I need this message to monitor delays in propagating endpoints changes across nodes.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 43871, 44053)
Proxy healthchecks overhaul
The first commit is #44051
These three commits are tightly coupled, but should be reviewed one-by-one. The first adds tests for healthchecks, and found a bug. The second basically rewrites the healthcheck pkg to be much simpler and less flexible (since we weren't using the flexibility). The third tweaks how healthchecks are handled in endpoints-path to be more like they are in services-path.
@MrHohn because I know you were in here for source-IP GA work.
@wojtek-t
The "Setting endpoints" message from kube-proxy at high verbosity was
lost as part of a larger simplification in kubernetes/kubernetes#42747.
This change brings it back, simply outputting the just-constructed
addresses list.
Automatic merge from submit-queue
kube-proxy: filter INPUT as well as OUTPUT
We need to apply filter rules on the way in (nodeports) and out (cluster
IPs). Testing here is insufficient to have caught this - will come back
for that.
Fixes#43969
@justinsb since you have the best repro, can you test? It passes what I think is repro.
@ethernetdan we will want this in 1.6.x
```release-note
Fix bug with service nodeports that have no backends not being rejected, when they should be. This is not a regression vs v1.5 - it's a fix that didn't quite fix hard enough.
```
The existing healthcheck lib was pretty complicated and was hiding some
bugs (like the count always being 1), This is a reboot of the interface
and implementation to be significantly simpler and better tested.
Automatic merge from submit-queue
Use shared informers for proxy endpoints and service configs
Use shared informers instead of creating local controllers/reflectors
for the proxy's endpoints and service configs. This allows downstream
integrators to pass in preexisting shared informers to save on memory &
cpu usage.
This also enables the cache mutation detector for kube-proxy for those
presubmit jobs that already turn it on.
Follow-up to #43295 cc @wojtek-t
Will race with #43937 for conflicting changes 😄 cc @thockin
cc @smarterclayton @sttts @liggitt @deads2k @derekwaynecarr @eparis @kubernetes/rh-cluster-infra
Adding test cases for HC updates found a bug with an update that
simultaneously removes one port and adds another. Map iteration is
randomized, so sometimes no HC would be created.
Use shared informers instead of creating local controllers/reflectors
for the proxy's endpoints and service configs. This allows downstream
integrators to pass in preexisting shared informers to save on memory &
cpu usage.
This also enables the cache mutation detector for kube-proxy for those
presubmit jobs that already turn it on.
We need to apply filter rules on the way in (nodeports) and out (cluster
IPs). Testing here is insufficient to have caught this - will come back
for that.
We don't need the svcPortToInfoMap. Its only purpose was to
send "valid" local endpoints (those with valid IP and >0 port) to the
health checker. But we shouldn't be sending invalid endpoints to
the health checker anyway, because it can't do anything with them.
If we exclude invalid endpoints earlier, then we don't need
flattenValidEndpoints().
And if we don't need flattenValidEndpoints() it makes no sense to have
svcPortToInfoMap store hostPortInfo, since endpointsInfo is the same
thing as hostPortInfo except with a combined host:port.
And if svcPortToInfoMap now only stores valid endpointsInfos, it is
exactly the same thing as newEndpoints.
Automatic merge from submit-queue (batch tested with PRs 41672, 42084, 42233, 42165, 42273)
Don't sync IPtables before underlying store/reflector is fully synced
Ref #42108
Build on top of #42108 - only the second commit is unique.
Automatic merge from submit-queue
Extensible Userspace Proxy
This PR refactors the userspace proxy to allow for custom proxy socket implementations.
It changes the the ProxySocket interface to ensure that other packages can properly implement it (making sure all arguments are publicly exposed types, etc), and adds in a mechanism for an implementation to create an instance of the userspace proxy with a non-standard ProxySocket.
Custom ProxySockets are useful to inject additional logic into the actual proxying. For example, our idling proxier uses a custom proxy socket to hold connections and notify the cluster that idled scalable resources need to be woken up.
Also-Authored-By: Ben Bennett bbennett@redhat.com
Automatic merge from submit-queue (batch tested with PRs 42316, 41618, 42201, 42113, 42191)
Support unqualified and partially qualified domain name in DNS query in Windows kube-proxy
**What this PR does / why we need it**:
In Windows container networking, --dns-search is not currently supported on Windows Docker. Besides, even with --dns-suffix, inside Windows container DNS suffix is not appended to DNS query names. That makes unqualified domain name or partially qualified domain name in DNS query not able to resolve.
This PR provides a solution to resolve unqualified domain name or partially qualified domain name in DNS query for Windows container in Windows kube-proxy. It uses well-known Kubernetes DNS suffix as well host DNS suffix search list to append to the name in DNS query. DNS packet in kube-proxy UDP stream is modified as appropriate.
This PR affects the Windows kube-proxy only.
**Special notes for your reviewer**:
This PR is based on top of Anthony Howe's commit 48647fb, 0e37f0a and 7e2c71f which is already included in the PR 41487. Please only review commit b9dfb69.
**Release note**:
```release-note
Add DNS suffix search list support in Windows kube-proxy.
```
Automatic merge from submit-queue (batch tested with PRs 42200, 39535, 41708, 41487, 41335)
Update kube-proxy support for Windows
**What this PR does / why we need it**:
The kube-proxy is built upon the sophisticated iptables NAT rules. Windows does not have an equivalent capability. This introduces a change to the architecture of the user space mode of the Windows version of kube-proxy to match the capabilities of Windows.
The proxy is organized around service ports and portals. For each service a service port is created and then a portal, or iptables NAT rule, is opened for each service ip, external ip, node port, and ingress ip. This PR merges the service port and portal into a single concept of a "ServicePortPortal" where there is one connection opened for each of service IP, external ip, node port, and ingress IP.
This PR only affects the Windows kube-proxy. It is important for the Windows kube-proxy because it removes the limited portproxy rule and RRAS service and enables full tcp/udp capability to services.
**Special notes for your reviewer**:
**Release note**:
```
Add tcp/udp userspace proxy support for Windows.
```
This changes the userspace proxy so that it cleans up its conntrack
settings when a service is removed (as the iptables proxy already
does). This could theoretically cause problems when a UDP service
as deleted and recreated quickly (with the same IP address). As
long as packets from the same UDP source IP and port were going to
the same destination IP and port, the the conntrack would apply and
the packets would be sent to the old destination.
This is astronomically unlikely if you did not specify the IP address
to use in the service, and even then, only happens with an "established"
UDP connection. However, in cases where a service could be "switched"
between using the iptables proxy and the userspace proxy, this case
becomes much more frequent.
This commit makes the userspace proxy keep an ObjectReference to the
service being proxied. This allows the consumers of the `ServiceInfo`
struct, like `ProxySockets` to emit events about or otherwise refer to
the service.
This commit adds a new method for constructing userspace proxiers,
`NewCustomProxier`. `NewCustomProxier` functions identically to
`NewProxier`, except that it allows a custom constructor method to
be passed in to construct instances of ProxySocket.
This commit makes it possible for the `ProxySocket` interface to be
implemented by types outside of the `userspace` package. It mainly just
exposes relevant types and fields as public.
This commit adds a method to the `LoadBalancer` interface in the
userspace proxy which allows consumers of the `LoadBalancer` to check if
it thinks a given service has endpoints available.
This makes it more obvious that they run together and makes the upcoming
rate-limited syncs easier.
Also make test use ints for ports, so it's easier to see when a port is
a literal value vs a name.
This is a weird function, but I didn't want to change any semantics
until the tests are in place. Testing exposed one bug where stale
connections of renamed ports were not marked stale.
There are other things that seem wrong here, more will follow.
Move the feature test to where we are activating the feature, rather
than where we detect locality. This is in service of better tests,
which is in service of less-frequent resyncing, which is going to
require refactoring.
Instead of copying the map, like OnServicesUpdate() used to do and which
was copied into buildServiceMap() to preserve semantics while creating
testcases, start with a new empty map and do deletion checking later.
The API docs say:
// ServiceTypeExternalName means a service consists of only a reference to
// an external name that kubedns or equivalent will return as a CNAME
// record, with no exposing or proxying of any pods involved.
which implies that ExternalName services should be ignored for proxy
purposes.
Automatic merge from submit-queue (batch tested with PRs 35884, 37305, 37369, 37429, 35679)
fix mixleading warning message regarding kube-proxy nodeIP initializa…
The current warning message implies that the operator should restart kube-proxy with some flag related to node IP which can be very misleading.
Automatic merge from submit-queue
Cover port_allocator_test with more conditions
The test cases of port_allocator_test should cover more conditions, such as `rangeAllocator.used.Bit`.
Automatic merge from submit-queue
Bug fix. Incoming UDP packets not reach newly deployed services
**What this PR does / why we need it**:
Incoming UDP packets not reach newly deployed services when old connection's state in conntrack is not cleared. When a packet arrives, it will not go through NAT table again, because it is not "the first" packet. The PR fix the issue
**Which issue this PR fixes**
Fixes#31983
xref https://github.com/docker/docker/issues/8795
Automatic merge from submit-queue
Curating Owners: pkg/proxy
cc @thockin
In an effort to expand the existing pool of reviewers and establish a
two-tiered review process (first someone lgtms and then someone
experienced in the project approves), we are adding new reviewers to
existing owners files.
If You Care About the Process:
------------------------------
We did this by algorithmically figuring out who’s contributed code to
the project and in what directories. Unfortunately, that doesn’t work
well: people that have made mechanical code changes (e.g change the
copyright header across all directories) end up as reviewers in lots of
places.
Instead of using pure commit data, we generated an excessively large
list of reviewers and pruned based on all time commit data, recent
commit data and review data (number of PRs commented on).
At this point we have a decent list of reviewers, but it needs one last
pass for fine tuning.
Also, see https://github.com/kubernetes/contrib/issues/1389.
TLDR:
-----
As an owner of a sig/directory and a leader of the project, here’s what
we need from you:
1. Use PR https://github.com/kubernetes/kubernetes/pull/35715 as an example.
2. The pull-request is made editable, please edit the `OWNERS` file to
remove the names of people that shouldn't be reviewing code in the
future in the **reviewers** section. You probably do NOT need to modify
the **approvers** section. Names asre sorted by relevance, using some
secret statistics.
3. Notify me if you want some OWNERS file to be removed. Being an
approver or reviewer of a parent directory makes you a reviewer/approver
of the subdirectories too, so not all OWNERS files may be necessary.
4. Please use ALIAS if you want to use the same list of people over and
over again (don't hesitate to ask me for help, or use the pull-request
above as an example)
Automatic merge from submit-queue
Change stickyMaxAge from seconds to minutes, fixes issue #35677
**What this PR does / why we need it**: Increases the service sessionAfinity time from 180 seconds to 180 minutes for proxy mode iptables which was a bug introduced in a refactor.
**Which issue this PR fixes**: fixes#35677
**Special notes for your reviewer**:
**Release note**:
``` release-note
Fixed wrong service sessionAffinity stickiness time from 180 sec to 180 minutes in proxy mode iptables.
```
Since there is no test for the sessionAffinity feature at the moment I wanted to create one but I don't know how.
Automatic merge from submit-queue
Add Windows support to kube-proxy
<!-- Thanks for sending a pull request! Here are some tips for you:
1. If this is your first time, read our contributor guidelines https://github.com/kubernetes/kubernetes/blob/master/CONTRIBUTING.md and developer guide https://github.com/kubernetes/kubernetes/blob/master/docs/devel/development.md
2. If you want *faster* PR reviews, read how: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/faster_reviews.md
3. Follow the instructions for writing a release note: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/pull-requests.md#release-notes
-->
**What this PR does / why we need it**:
This is the first stab at supporting kube-proxy (userspace mode) on Windows
**Which issue this PR fixes** :
fixes#30278
**Special notes for your reviewer**:
The MVP uses `netsh portproxy` to redirect traffic from `ServiceIP:ServicePort` to a `LocalIP:LocalPort`.
For the next version we are expecting to have guidance from Microsoft Container Networking team.
**Limitations**:
Current implementation does not support DNS queries over UDP as `netsh portproxy` currently only supports TCP. We are working with Microsoft to remediate this.
cc: @brendandburns @dcbw
**Release note**:
<!-- Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access)
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`.
-->
```release-note
```
Automatic merge from submit-queue
Add kubelet --network-plugin-mtu flag for MTU selection
* Add network-plugin-mtu option which lets us pass down a MTU to a network provider (currently processed by kubenet)
* Add a test, and thus make sysctl testable
Automatic merge from submit-queue
Remove br_netfilter warning in kube-proxy
Many distros have this module linked in, generating a spurious error.
Fixes#23385
bridge-nf-call-iptables appears to only be relevant when the containers are
attached to a Linux bridge, which is usually the case with default Kubernetes
setups, docker, and flannel. That ensures that the container traffic is
actually subject to the iptables rules since it traverses a Linux bridge
and bridged traffic is only subject to iptables when bridge-nf-call-iptables=1.
But with other networking solutions (like openshift-sdn) that don't use Linux
bridges, bridge-nf-call-iptables may not be not relevant, because iptables is
invoked at other points not involving a Linux bridge.
The decision to set bridge-nf-call-iptables should be influenced by networking
plugins, so push the responsiblity out to them. If no network plugin is
specified, fall back to the existing bridge-nf-call-iptables=1 behavior.
This allows us to use the MARK-MASQ chain as a subroutine, rather than encoding
the mark in many places. Having a KUBE-POSTROUTING chain means we can flush
and rebuild it atomically. This makes followon work to change the mark
significantly easier.
We were trying to be clever and respect TCP's notion of half-open sockets, but
it causes leaks when we can't unblock io.Copy(). This fixes those leaks and
seems to follow most expectations. I think we were just be too clever.
Refactor `pkg/proxy/config`’s ServiceConfigHandler.OnUpdate and
EndpointsConfigHandler.OnUpdate to different method names as they have
different signatures.
This will let the new proxy
(https://github.com/GoogleCloudPlatform/kubernetes/issues/3760)
implement both interfaces.
Since we won’t need a separate loadbalancer structure (load balancing
is handled in the proxy rules), we will simply handle both event types
from the same object.
Moves the userspace code in proxy to a sub-package and adds the
ProxyProvider interface.
This is in preparation for landing an implementation of
https://github.com/GoogleCloudPlatform/kubernetes/issues/3760, which
will mostly be in another sub package for iptables.
Not doing so breaks e2e tests and people that may be using them,
even though we will eventually want to stop supporting this now
that we have better alternatives for typical use cases (NodePort)
A service with a NodePort set will listen on that port, on every node.
This is both handy for some load balancers (AWS ELB) and for people
that want to expose a service without using a load balancer.
Moves proxySocket out of proxier.go to new proxysocket.go in proxy
package in order to start separating proxy logic and implementation and
make proxier more manageable to review.
Standardize how our fakes are used so that a test case can use a
simpler mechanism for providing large, complex data sets, as well
as represent queries over time.
Instead of endpoints being a flat list, it is now a list of "subsets"
where each is a struct of {Addresses, Ports}. To generate the list of
endpoints you need to take union of the Cartesian products of the
subsets. This is compact in the vast majority of cases, yet still
represents named ports and corner cases (e.g. each pod has a different
port number).
This also stores subsets in a deterministic order (sorted by hash) to
avoid spurious updates and comparison problems.
This is a fully compatible change - old objects and clients will
keepworking as long as they don't need the new functionality.
This is the prep for multi-port Services, which will add API to produce
endpoints in this new structure.
All the distros that use this have been updated,
or have PRs out to update them, or owners
have been asked to fix RPMs.
Removing this prevents further use of this model.
Remove now dead code: EtcdClientOrDie
Remove now dead pkg/proxy/config/etcd.go.
Remove unused imports.
gofmt -s from 1.4 does not like
for _ = range BLAH
it wants
for range BLAH
But gofmt from 1.3 dies:
./pkg/proxy/config/config.go:265:6: expected operand, found 'range'
./pkg/proxy/config/config.go:268:3: expected '{', found 'EOF'
So instead, rewrite the code to make them both happy
As far as I know, nobody uses it. It was replaced by PublicIPs. If I were
being very polite I would leave it in internal, but since I am 99.99% sure
nobody uses it, I am cutting it. Let's argue about it.
It was an ABA problem where the proxy loop might see its own service as
"existing" when it had been destroyed and recreated (as in an update).
To prove this I added a counter of running ProxyLoop goroutines and check that
in tests. If I undo my main change, the tests fail. This makes the
proxier_test significantly slower (3 seconds vs 0.5 seconds). Sorry.
Split SourceAPI into two subobjects.
Parallel structure for endpoints, services will allow
changing to use generic code in pkg/client/cache/reflector.go.
Rename some funcs to be more like pkg/client/cache.
After this DNS is resolvable from the host, if the DNS server is targetted
explicitly. This does NOT add the cluster DNS to the host's resolv.conf. That
is a larger problem, with distro-specific tie-ins and circular deps.
- Added process to cleanup stale session affinity records
- Automatically set cloud provided load balancer for sticky session if the service requires it - Note, this only works on GCE right now.
- Changed sessionAffinityMap a map to pointers instead of structs to improve performance
- Commented out cookie and protocol from sessionAffinityDetail to avoid confusion as it is not yet implemented.
Don't log an error when Accept failed because the interface (portal)
was just removed.
Don't pass around a pointer to a serviceInfo since another thread
deletes those. Instead, just check if service name is still in the
service map.
Delete the locking on the serviceInfo object since it is only used
by the "main" proxier thread.
A watch of the API can return an api.Status rather than the watched
obejct type. This code didn't handle that.
Tested with services e2e test (in conjunction with other PR).
The iptables args list needs to include all fields as they are eventually spit
out by iptables-save. This is because some systems do not support the
'iptables -C' arg, and so fall back on parsing iptables-save output. If this
does not match, it will not pass the check. For example: adding the /32 on
the destination IP arg is not strictly required, but causes this list to not
match the final iptables-save output. This is fragile and I hope one day we
can stop supporting such old iptables versions.
This allows the proxier to portal Public IPs even if the
createExternalLoadBalancer flag is not set.
This also fixes what appears to be a bug in the createExternalLoadBalancer path
wherein multiple PublicIPs would get truncated.
Allows us to define different watch versioning regimes in the future
as well as to encode information with the resource version.
This changes /watch/resources?resourceVersion=3 to start the watch at
4 instead of 3, which means clients can read a resource version and
then send it back to the server. Clients should no longer do math on
resource versions.
Move a lot of common error logging into better buckets:
glog.Errorf() - Always an error
glog.Warningf() - Something unexpected, but probably not an error
glog.V(0) - Generally useful for this to ALWAYS be visible
to an operator
* Programmer errors
* Logging extra info about a panic
* CLI argument handling
glog.V(1) - A reasonable default log level if you don't want
verbosity
* Information about config (listening on X, watching Y)
* Errors that repeat frequently that relate to conditions
that can be corrected (pod detected as unhealthy)
glog.V(2) - Useful steady state information about the service
* Logging HTTP requests and their exit code
* System state changing (killing pod)
* Controller state change events (starting pods)
* Scheduler log messages
glog.V(3) - Extended information about changes
* More info about system state changes
glog.V(4) - Debug level verbosity (for now)
* Logging in particularly thorny parts of code where
you may want to come back later and check it
* Make Codec separate from Scheme
* Move EncodeOrDie off Scheme to take a Codec
* Make Copy work without a Codec
* Create a "latest" package that imports all versions and
sets global defaults for "most recent encoding"
* v1beta1 is the current "latest", v1beta2 exists
* Kill DefaultCodec, replace it with "latest.Codec"
* This updates the client and etcd to store the latest known version
* EmbeddedObject is per schema and per package now
* Move runtime.DefaultScheme to api.Scheme
* Split out WatchEvent since it's not an API object today, treat it
like a special object in api
* Kill DefaultResourceVersioner, instead place it on "latest" (as the
package that understands all packages)
* Move objDiff to runtime.ObjectDiff