Automatic merge from submit-queue
Enable endpoints in kubernetes service started by local-up-cluster.sh
--advertise_address should not be set to 127.0.0.1; let the API server pick
the default if necessary.
Fixes #29374
Automatic merge from submit-queue
[garbage collector] add e2e test
This PR also includes changes to plumb the controller-manager's `--enable_garbage_collector` flag through from an environment variable.
The e2e test will not be run by the core suite because it's marked `[Feature:GarbageCollector]`.
The corresponding jenkins job configuration PR is https://github.com/kubernetes/test-infra/pull/132.
Automatic merge from submit-queue
controller-manager: make the number of garbage collector workers configurable
The number of garbage collector workers in controller-manager is currently fixed at 5; making it configurable is more appropriate.
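For illustration, a minimal sketch of the change, using an invented flag name rather than the actual controller-manager flag: the worker count becomes a flag instead of a hard-coded constant.

```go
// Sketch only: not the actual controller-manager code; the flag name is
// illustrative.
package main

import (
	"flag"
	"fmt"
)

func main() {
	// Previously equivalent to a hard-coded 5; now operator-tunable.
	gcWorkers := flag.Int("garbage-collector-workers", 5,
		"Number of garbage collector workers to run concurrently.")
	flag.Parse()
	for i := 0; i < *gcWorkers; i++ {
		fmt.Printf("starting garbage collector worker %d\n", i)
	}
}
```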
Automatic merge from submit-queue
node_e2e: configure gce images via config file
This file provides the ability to specify the image project on a per-image
basis and is more extensible for future changes.
For backwards compatibility and local development convenience, the
existing flags are kept and should work.
The eventual goal is to be able to source some images, such as the CoreOS one (and possibly the containervm one), from their upstream projects and do all new configuration changes via a cloud-init key added to the image config.
This PR is a first step there. A following PR will add a config key of `cloud-init` or `user-data` and migrate the CoreOS e2e to use that.
This is motivated by the fact that the changes currently needed for the CoreOS image can all be done quickly in cloud-init, which will make it much easier to update the image and ensure that changes are applied consistently.
/cc @timstclair @vishh @yifan-gu @pwittrock
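As a sketch of the shape such a per-image config gives the runner (field names and file format are assumptions, not the actual runner types):

```go
// Sketch of a per-image config as Go types; names are invented.
package main

import "fmt"

type imageConfig struct {
	// Images maps a short image name to its settings.
	Images map[string]gceImage `json:"images"`
}

type gceImage struct {
	Image   string `json:"image"`   // GCE image name
	Project string `json:"project"` // per-image image project
	// Per the plan above, a follow-up would add a cloud-init/user-data key here.
}

func main() {
	cfg := imageConfig{Images: map[string]gceImage{
		"coreos": {Image: "coreos-stable", Project: "coreos-cloud"},
	}}
	fmt.Println(cfg.Images["coreos"].Project)
}
```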
Automatic merge from submit-queue
Node E2E: Prep for continuous Docker validation node e2e test
Based on https://github.com/kubernetes/kubernetes/pull/28516, for https://github.com/kubernetes/kubernetes/issues/25215.
https://github.com/kubernetes/kubernetes/pull/26813 added support to run e2e test on gci preview image and newest docker version.
This PR added the same support to node e2e test.
The main dependencies of node e2e test are `docker`, `kubelet`, `etcd` and `apiserver`.
Currently, node e2e test builds `kubelet` and `apiserver` locally, and copies them into `/tmp` directory in VM instance. GCI also has built-in `docker`. So the only dependency missing is `etcd`.
This PR injected a simple cloud-init script when creating instance to install `etcd` during node startup.
@andyzheng0831 for the cloud init script.
@wonderfly for the gci instance setup.
@pwittrock for the node e2e test change.
/cc @dchen1107
Search and replace for references to moved examples
Reverted find and replace paths on auto gen docs
Reverting changes to changelog
Fix bugs in test-cmd.sh
Fixed path in examples README
Ran update-all successfully
Updated verify-flags exceptions to include renamed files
This drives conversion generation from file tags like:
// +conversion-gen=k8s.io/my/internal/version
... rather than hardcoded lists of packages.
The only net change in the generated code can be explained as correct: previously
the generator didn't know that conversion was available.
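For illustration, a hypothetical doc.go shows where such a file tag lives; the target path is the placeholder from the example above, not a real package.

```go
// A hypothetical doc.go: the tag sits in the comments of the versioned
// package, pointing the generator at the peer internal package.

// +conversion-gen=k8s.io/my/internal/version

// Package v1 contains the external (versioned) API types.
package v1
```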
Automatic merge from submit-queue
Prep for not checking in generated, part 1/2
This PR is extracted from #25978 - it is just the deep-copy related parts. All the Makefile and conversion stuff is excluded.
@wojtek-t this is literally branched, a bunch of commits deleted, and a very small number of manual fixups applied. If you think this is easier to review (and if it passes CI) you can feel free to go over it again. I will follow this with a conversion-related PR to build on this.
Or if you prefer, just close this and let the mega-PR ride.
@lavalamp
Automatic merge from submit-queue
Node E2E: Disable kubenet for local node e2e test.
After https://github.com/kubernetes/kubernetes/pull/28196, we must manually set up CNI and nsenter on the local node to run `make test_e2e_node`, which may not be necessary for local development.
I've tried moving the CNI downloading logic into `BeforeSuite`, but it is still hard to figure out who should install nsenter: every developer manually, the `setup_host.sh` script, or `BeforeSuite`?
This PR:
* Added a flag to disable kubenet and disabled kubenet in local test.
* Cleaned up the CNI installation logic a bit.
/cc @yujuhong @freehan
This re-institutes some of the rolled-back logic from previous commits. It
bounds the scope of what the deepcopy generator is willing to do with regard
to generating and calling generated functions.
Automatic merge from submit-queue
Implementing a proper master/worker split in the juju cluster code.
```
release-note-none
```
General updates to the cluster/juju Kubernetes provider, to bring it up to date.
Updating the skydns templates to version 11
Updating the etcd container definition to include arch.
Updating the master template to include arch and version for hyperkube container.
Adding dns_domain configuration options.
Adding storage layer options.
Fixing underscore problem and adding exceptions.
Fixing the underscore flag errors.
Downstream generators that want to reuse the upstream generated types
need to be able to define a different ignore tag (so that they can see
the already generated types).
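A sketch of the mechanism under these assumptions: generated files carry the conventional `ignore_autogenerated` build tag, which the producing generator uses to skip them on the next run, so a downstream generator only needs to be configured with a different tag (the downstream name below is invented).

```go
// Illustrative header of an upstream-generated file. The producing
// generator, run with ignore tag "ignore_autogenerated", skips this file
// on regeneration; a downstream generator configured with a different tag
// (say "ignore_autogenerated_downstream") still parses it, so the already
// generated types stay visible to it.

// +build !ignore_autogenerated

package generated
```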
The staging images are now created with image families, so we can get rid of the
image indices stored in GCS. Also, get images based on milestone number instead
of "image type".
Automatic merge from submit-queue
Revert "Revert "GCI: add support for network plugin""
PR #27027 added the network plugin support in GCI config, but later a bug in the network plugin broke e2e tests (see issue #27118). The bug was fixed by #27141, and we have since run the serial e2e tests more than 10 times to verify the fix. Now it should be safe to put the GCI network plugin support back.
We will first merge in the master branch and monitor the Jenkins serial tests for a while and then cherry-pick it into release-1.3 branch.
Automatic merge from submit-queue
Fixes and improvements to Photon Controller backend for kube-up
- Improve reliability of network address detection by using the MAC
address (see the sketch after this list). VMware has a MAC OUI that
reliably distinguishes the VM's NICs from the other NICs (like the CBR).
This doesn't rely on the unreliable reporting of the portgroup.
- Persist route changes. We configure routes on the master and nodes,
but previously we didn't persist them so they didn't last across
reboots. This persists them in /etc/network/interfaces
- Fix regression that didn't configure auth for kube-apiserver with
Photon Controller.
- Reliably run apt-get update: Not doing this can cause apt to fail.
- Remove unused nginx config in salt
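A minimal Go sketch of the MAC-OUI idea from the first bullet; the actual kube-up code is shell, and the OUI list here is the commonly published set of VMware prefixes.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// Commonly published VMware OUIs; 00:50:56 covers vCenter-assigned MACs.
var vmwareOUIs = []string{"00:05:69", "00:0c:29", "00:50:56"}

// isVMwareNIC reports whether an interface's MAC carries a VMware OUI,
// distinguishing the VM's NICs from others (like the CBR) without relying
// on portgroup reporting.
func isVMwareNIC(hw net.HardwareAddr) bool {
	mac := strings.ToLower(hw.String())
	for _, oui := range vmwareOUIs {
		if strings.HasPrefix(mac, oui) {
			return true
		}
	}
	return false
}

func main() {
	ifaces, err := net.Interfaces()
	if err != nil {
		panic(err)
	}
	for _, iface := range ifaces {
		if isVMwareNIC(iface.HardwareAddr) {
			fmt.Println("VM NIC:", iface.Name)
		}
	}
}
```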
Automatic merge from submit-queue
Add a custom main instead of the standard test main, to reduce stack …
Adds a custom test main handler (see: `TestMain` in https://golang.org/pkg/testing/ for details)
Partial fix for https://github.com/kubernetes/kubernetes/issues/25965
This does the standard timeout, but strips non-kubernetes stacks out of the stack trace (e.g. it filters things like:
```
goroutine 466 [IO wait, 7 minutes]:
net.runtime_pollWait(0x7fd74c4672c0, 0x72, 0xc821614000)
/usr/local/go/src/runtime/netpoll.go:160 +0x60
net.(*pollDesc).Wait(0xc8215c21b0, 0x72, 0x0, 0x0)
/usr/local/go/src/net/fd_poll_runtime.go:73 +0x3a
net.(*pollDesc).WaitRead(0xc8215c21b0, 0x0, 0x0)
/usr/local/go/src/net/fd_poll_runtime.go:78 +0x36
net.(*netFD).Read(0xc8215c2150, 0xc821614000, 0x1000, 0x1000, 0x0, 0x7fd74c491050, 0xc820014058)
/usr/local/go/src/net/fd_unix.go:250 +0x23a
net.(*conn).Read(0xc820a5a090, 0xc821614000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/usr/local/go/src/net/net.go:172 +0xe4
net/http.noteEOFReader.Read(0x7fd74c465258, 0xc820a5a090, 0xc8215f0068, 0xc821614000, 0x1000, 0x1000, 0x405773, 0x0, 0x0)
/usr/local/go/src/net/http/transport.go:1687 +0x67
net/http.(*noteEOFReader).Read(0xc8215ae1a0, 0xc821614000, 0x1000, 0x1000, 0xc82159ad1d, 0x0, 0x0)
<autogenerated>:284 +0xd0
bufio.(*Reader).fill(0xc8202a2b40)
/usr/local/go/src/bufio/bufio.go:97 +0x1e9
bufio.(*Reader).Peek(0xc8202a2b40, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0)
/usr/local/go/src/bufio/bufio.go:132 +0xcc
net/http.(*persistConn).readLoop(0xc8215f0000)
/usr/local/go/src/net/http/transport.go:1073 +0x177
created by net/http.(*Transport).dialConn
/usr/local/go/src/net/http/transport.go:857 +0x10a6
```
We may want to get even more aggressive in the future.
@kubernetes/sig-testing
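A minimal sketch (this would live in a _test.go file) of the `TestMain` pattern described above: run the suite with a timeout, and on timeout dump goroutine stacks with non-Kubernetes goroutines filtered out. The timeout value and the `k8s.io/` filter are illustrative, not the PR's exact choices.

```go
package example

import (
	"fmt"
	"os"
	"runtime"
	"strings"
	"testing"
	"time"
)

func TestMain(m *testing.M) {
	done := make(chan int, 1)
	go func() { done <- m.Run() }()
	select {
	case code := <-done:
		os.Exit(code)
	case <-time.After(10 * time.Minute):
		// Capture stacks of all goroutines, then keep only those that
		// include a Kubernetes frame.
		buf := make([]byte, 1<<20)
		buf = buf[:runtime.Stack(buf, true)]
		for _, g := range strings.Split(string(buf), "\n\n") {
			if strings.Contains(g, "k8s.io/") {
				fmt.Fprintln(os.Stderr, g+"\n")
			}
		}
		os.Exit(1)
	}
}
```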
Automatic merge from submit-queue
GCI: add support for network plugin
I had run e2e against a cluster with both master and nodes on GCI a couple of times. The PR auto tests will cover the hybrid cluster with just master on GCI.
cc/ @roberthbailey @fabioy @kubernetes/goog-image
Also includes other improvements:
- Makefile rule to run tests against remote instance using existing host or image
- Makefile will reuse an instance created from an image if it was not torn down
- Runner starts gce instances in parallel with building source
- Runner uses instance ip instead of hostname so that it doesn't need to resolve
- Runner supports cleaning up files and processes on an instance without stopping / deleting it
- Runner runs tests using `ginkgo` binary to support running tests in parallel
This PR contains Kubelet changes to enable attach/detach controller control.
* It introduces a new "enable-controller-attach-detach" kubelet flag to
enable control by the controller. Enabled by default.
* It removes all references to the "SafeToDetach" annotation from the controller.
* It adds the new VolumesInUse field to the Node Status API object.
* It modifies the controller to use VolumesInUse instead of the SafeToDetach
annotation to gate detachment (see the sketch after this list).
* There is a bug in node-problem-detector that causes VolumesInUse to
get reset every 30 seconds. Issue https://github.com/kubernetes/node-problem-detector/issues/9
opened to fix that.
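A minimal sketch, with invented names, of the new gate: a volume may be detached only once it no longer appears in the node's VolumesInUse list.

```go
package main

import "fmt"

// safeToDetach reports whether a volume is absent from the node's
// reported VolumesInUse status; the entry format below is illustrative.
func safeToDetach(volume string, volumesInUse []string) bool {
	for _, v := range volumesInUse {
		if v == volume {
			return false // kubelet still reports the volume as in use
		}
	}
	return true
}

func main() {
	inUse := []string{"kubernetes.io/gce-pd/disk-1"}
	fmt.Println(safeToDetach("kubernetes.io/gce-pd/disk-1", inUse)) // false
	fmt.Println(safeToDetach("kubernetes.io/gce-pd/disk-2", inUse)) // true
}
```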
This change recovers some of the side effects of
https://github.com/kubernetes/kubernetes/pull/26197, i.e., keeps the defaults of
`NODE_IMAGE` and `NODE_IMAGE_PROJECT` to `MASTER_IMAGE` and
`MASTER_IMAGE_PROJECT`, for backward compatibility. Although it keeps
`OS_DISTRIBUTION` defaulting to `gci`, the default settings of these vars are
moved to `cluster/gce/util.sh` and conditioned on `OS_DISTRIBUTION==gci`.
Automatic merge from submit-queue
Support per-test-environment ginkgo flags for node e2e tests to facilitate skipping misbehaving tests in the PR builder
We had an issue today where some node e2e tests were timing out in the PR builder. We want to be able to skip tests in the PR builder and leave them running in the CI if this happens again.
This allows kube-controller-manager to allocate CIDRs to nodes (with
allocate-node-cidrs=true), but will not try to configure them on the
cloud provider, even if the cloud provider supports Routes.
The default is configure-cloud-routes=true, and it will only try to
configure routes if allocate-node-cidrs is also configured, so the
default behaviour is unchanged.
This is useful because on AWS the cloud provider configures routes by
setting up VPC routing table entries, but there is a limit of 50
entries. So setting configure-cloud-routes on AWS would allow us to
continue to allocate node CIDRs as today, but replace the VPC
route-table mechanism with something not limited to 50 nodes.
We can't just turn off the cloud-provider entirely because it also
controls other things - node discovery, load balancer creation etc.
Fix #25602
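A condensed sketch, with illustrative names rather than the actual controller-manager symbols, of the flag interplay described above: routes are configured only when CIDR allocation is on, route configuration has not been disabled, and the provider supports routes.

```go
package main

import "fmt"

type routes interface{} // stand-in for the cloud provider's Routes API

type cloud struct{ r routes }

// Routes mirrors the cloud-provider pattern: the second return value
// reports whether the provider supports routes at all.
func (c cloud) Routes() (routes, bool) { return c.r, c.r != nil }

func main() {
	allocateNodeCIDRs := true     // --allocate-node-cidrs
	configureCloudRoutes := false // --configure-cloud-routes=false (the new knob)

	c := cloud{r: struct{}{}}
	if r, ok := c.Routes(); allocateNodeCIDRs && configureCloudRoutes && ok {
		fmt.Println("starting route controller with", r)
	} else {
		fmt.Println("allocating node CIDRs without configuring cloud routes")
	}
}
```

On AWS this keeps CIDR allocation working while letting something other than VPC route tables (with their 50-entry limit) handle routing.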
Automatic merge from submit-queue
Attach Detach Controller Business Logic
This PR adds the meat of the attach/detach controller proposed in #20262.
The PR splits the in-memory cache into a desired and actual state of the world.
Split controller cache into actual and desired state of world.
Controller will only operate on volumes scheduled to nodes that
have the "volumes.kubernetes.io/controller-managed-attach" annotation.
- Add JUnit test reporter
- Write etcd.log, kubelet.log and kube-apiserver.log to files instead of stdout
- scp artifacts to the Jenkins WORKSPACE
Fixes #25966
Automatic merge from submit-queue
Add a 'kubectl clusterinfo dump' option
Ref: #3500
@bgrant0607 @smarterclayton @jszczepkowski
Usage:
```
# Dump current cluster state to stdout
kubectl clusterinfo dump
# Dump current cluster state to /tmp
kubectl clusterinfo dump --output-directory=/tmp
# Dump all namespaces to stdout
kubectl clusterinfo dump --all-namespaces
# Dump a set of namespaces to /tmp
kubectl clusterinfo dump --namespaces default,kube-system --output-directory=/tmp
```
Automatic merge from submit-queue
Cache Webhook Authentication responses
Add a simple LRU cache with a 2-minute TTL to the webhook authenticator.
kubectl is a little spammy, with >= 4 API requests per command. This also prevents a single unauthenticated user from being able to DoS the remote authenticator.
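A minimal sketch, with invented names, of an LRU cache whose entries also expire after a TTL; a real authenticator would key on the bearer token and cache the full authentication response, but the point is that entries are bounded in both number (LRU) and age (TTL).

```go
package main

import (
	"container/list"
	"time"
)

type entry struct {
	key     string
	value   bool // whether the token authenticated
	expires time.Time
}

type ttlLRU struct {
	max   int
	ttl   time.Duration
	order *list.List // front = most recently used
	items map[string]*list.Element
}

func newTTLLRU(max int, ttl time.Duration) *ttlLRU {
	return &ttlLRU{max: max, ttl: ttl, order: list.New(), items: map[string]*list.Element{}}
}

func (c *ttlLRU) Get(key string) (bool, bool) {
	el, ok := c.items[key]
	if !ok {
		return false, false
	}
	e := el.Value.(*entry)
	if time.Now().After(e.expires) { // expired: drop and miss
		c.order.Remove(el)
		delete(c.items, key)
		return false, false
	}
	c.order.MoveToFront(el)
	return e.value, true
}

func (c *ttlLRU) Add(key string, value bool) {
	if el, ok := c.items[key]; ok {
		c.order.Remove(el)
		delete(c.items, key)
	}
	if c.order.Len() >= c.max { // evict least recently used
		last := c.order.Back()
		c.order.Remove(last)
		delete(c.items, last.Value.(*entry).key)
	}
	c.items[key] = c.order.PushFront(&entry{key: key, value: value, expires: time.Now().Add(c.ttl)})
}

func main() {
	cache := newTTLLRU(1024, 2*time.Minute)
	cache.Add("Bearer some-token", true)
	if authed, hit := cache.Get("Bearer some-token"); hit {
		_ = authed // cached decision; skip the remote webhook round-trip
	}
}
```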
Automatic merge from submit-queue
Updating QoS policy to be at the pod level
Quality of Service will be derived from the entire Pod spec, instead of from the resource specifications of individual containers.
A Pod is `Guaranteed` iff all its containers have limits == requests for all the first-class resources (cpu, memory as of now).
A Pod is `BestEffort` iff requests & limits are not specified for any resource across all containers.
A Pod is `Burstable` otherwise.
Note: Existing pods might be more susceptible to OOM Kills on the node due to this PR! To protect pods from being OOM killed on the node, set `limits` for all resources across all containers in a pod.
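A simplified sketch, with invented types, of the pod-level classification described above; the real implementation operates on the Pod spec's resource quantities.

```go
package main

import "fmt"

type resources map[string]string // resource name -> quantity, e.g. "cpu": "100m"

type container struct {
	requests resources
	limits   resources
}

// qosClass classifies a pod from all of its containers' requests/limits
// for the first-class resources (cpu, memory as of now).
func qosClass(containers []container) string {
	allGuaranteed := true
	anySpecified := false
	for _, c := range containers {
		for _, res := range []string{"cpu", "memory"} {
			req, limit := c.requests[res], c.limits[res]
			if req != "" || limit != "" {
				anySpecified = true
			}
			if req == "" || limit == "" || req != limit {
				allGuaranteed = false
			}
		}
	}
	if allGuaranteed {
		return "Guaranteed"
	}
	if !anySpecified {
		return "BestEffort"
	}
	return "Burstable"
}

func main() {
	pod := []container{{requests: resources{"cpu": "100m"}, limits: resources{}}}
	fmt.Println(qosClass(pod)) // Burstable: requests set but limits missing
}
```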