Automatic merge from submit-queue
Increase goroutinemap unit test timeouts.
50ms is flaky in Jenkins. This makes the test to take at least 0.5s to check that things that should block and wait for something really do block and wait (was 100ms before).
Fixes#25825
Automatic merge from submit-queue
Attach Detach Controller Business Logic
This PR adds the meat of the attach/detach controller proposed in #20262.
The PR splits the in-memory cache into a desired and actual state of the world.
Automatic merge from submit-queue
Refactor storePodsNamespacer.List()
This fixes a bug in the previous version where, when we fell back on a
brute force approach, we were still returning an error.
It also clarifies the flow control into 3 distinct cases. The cases
don't share variables any more, which makes mistakes like the one
mentioned above harder.
Automatic merge from submit-queue
volume controller: use better operation names
Using volume/claim.UID in the operation name is not really useful, as UIDs are not logged by rest of the controller. On the other hand, volume.Name and claim.Namespace/Name is logged pretty often and it would help to log these also in operation name. Still, I'd prefer to have the operation name really unique to be protected from users deleting a volume and quickly creating another one with the same name, so UID is still part of the operation name.
This has been already proven to be very useful in controller debugging.
Automatic merge from submit-queue
Use read-only root filesystem capabilities of rkt
Propagates `api.Container.SecurityContext.ReadOnlyRootFileSystem` flag to rkt container runtime.
cc @yifan-gu
Fixes#23837
Automatic merge from submit-queue
Use docker containerInfo.LogPath and not manually constructed path
## Pull Request Guidelines
Since the containerInfo has the LogPath in it, let's use that and
not manually construct the path ourselves. This also makes the code
less prone to breaking if docker change this path.
Fixes#23695
Refactor storePodsNamespacer.List() and
storeReplicationContollersNamespacer.List(). They are the same
function, just with different signatures.
This fixes a bug where, when we fell back on a brute force approach, we
were still returning an error.
Also change to explicit return without named return values.
Automatic merge from submit-queue
Ensure that init containers are preserved during pruning
Pods with multiple init containers were getting the wrong containers
pruned. Fix an error message and add a test.
Fixes#26131
Automatic merge from submit-queue
Add some extra checking in the tests to prevent flakes.
Attempts to fix https://github.com/kubernetes/kubernetes/issues/25967
The hypothesis is that somehow waitTest() catches an idle that occurs before all changes have been applied. This will block until the expected number of changes have arrived.
Automatic merge from submit-queue
use monotonic now in TestDelNode
Fixes https://github.com/kubernetes/kubernetes/issues/24971.
Briefly, the rate_limited_queue uses a `container/heap` to store values, and use this data structure to ensure we can always fetch the value with the minimum `processAt`. However, in some extreme condition, the continuous call to `time.Now()` would get the same value, which causes some unpredictable order in the queue, this fix uses a monotonic `now()` to avoid that.
@smarterclayton please take a look.
Automatic merge from submit-queue
rkt: Support alternate stage1's via annotation
This provides a basic implementation for setting a stage1 on a per-pod
basis via an annotation.
This provides a basic implementation for setting a stage1 on a per-pod
basis via an annotation. See discussion here for how this approach was arrived at: https://github.com/kubernetes/kubernetes/issues/23944#issuecomment-212653776
It's possible this feature should be gated behind additional knobs, such
as a kubelet flag to filter allowed stage1s, or a check akin to what
priviliged gets in the apiserver.
Currently, it checks `AllowPrivileged`, as a means to let people disable
this feature, though overloading it as stage1 and privileged isn't
ideal.
Fixes#23944
Testing done (note, unfortunately done with some additional ./cluster changes merged in):
```
$ cat examples/stage1-fly/fly-me-to-the-moon.yaml
apiVersion: v1
kind: Pod
metadata:
labels:
name: exit
name: exit-fast
annotations: {"rkt.alpha.kubernetes.io/stage1-name-override": "coreos.com/rkt/stage1-fly:1.3.0"}
spec:
restartPolicy: Never
containers:
- name: exit
image: busybox
command: ["sh", "-c", "ps aux"]
$ kubectl create -f examples/stage1-fly
$ ssh core@minion systemctl status -l --no-pager k8s_2f169b2e-c32a-49e9-a5fb-29ae1f6b4783.service
...
failed
...
May 04 23:33:03 minion rkt[2525]: stage0: error writing /etc/rkt-resolv.conf: open /var/lib/rkt/pods/run/2f169b2e-c32a-49e9-a5fb-29ae1f6b4783/stage1/rootfs/etc/rkt-resolv.conf: no such file or directory
...
# Restart kubelet with allow-privileged=false
$ kubectl create -f examples/stage1-fly
$ kubectl describe exit-fast
...
1m 19s 5 {kubelet euank-e2e-test-minion-dv3u} spec.containers{exit} Warning Failed Failed to create rkt container with error: cannot make "exit-fast_default(17050ce9-1252-11e6-a52a-42010af00002)": running a custom stage1 requires a privileged security context
....
```
Note as well that the "success" here is rkt spitting out an [error message](https://github.com/coreos/rkt/issues/2141) which indicates that the right stage1 was being used at least.
cc @yifan-gu @aaronlevy
Automatic merge from submit-queue
reduce conflict retries
Eliminates quota admission conflicts due to latent caches on the same API server.
@derekwaynecarr
Automatic merge from submit-queue
Downward API implementation for resources limits and requests
This is an implementation of Downward API for resources limits and requests, and it works with environment variables and volume plugin.
This is based on proposal https://github.com/kubernetes/kubernetes/pull/24051. This implementation follows API with magic keys approach as discussed in the proposal.
@kubernetes/rh-cluster-infra
<!-- Reviewable:start -->
---
This change is [<img src="http://reviewable.k8s.io/review_button.svg" height="35" align="absmiddle" alt="Reviewable"/>](http://reviewable.k8s.io/reviews/kubernetes/kubernetes/24179)
<!-- Reviewable:end -->
Split controller cache into actual and desired state of world.
Controller will only operate on volumes scheduled to nodes that
have the "volumes.kubernetes.io/controller-managed-attach" annotation.
Automatic merge from submit-queue
Add a 'kubectl clusterinfo dump' option
Ref: #3500
@bgrant0607 @smarterclayton @jszczepkowski
Usage:
```
# Dump current cluster state to stdout
kubectl clusterinfo dump
# Dump current cluster state to /tmp
kubectl clusterinfo dump --output-directory=/tmp
# Dump all namespaces to stdout
kubectl clusterinfo dump --all-namespaces
# Dump a set of namespaces to /tmp
kubectl clusterinfo dump --namespaces default,kube-system --output-directory=/tmp
```
<!-- Reviewable:start -->
---
This change is [<img src="http://reviewable.k8s.io/review_button.svg" height="35" align="absmiddle" alt="Reviewable"/>](http://reviewable.k8s.io/reviews/kubernetes/kubernetes/20672)
<!-- Reviewable:end -->
Automatic merge from submit-queue
plumb Update resthandler to allow old/new comparisons in admission
Rework how updated objects are passed to rest storage Update methods (first pass at https://github.com/kubernetes/kubernetes/pull/23928#discussion_r61444342)
* allows centralizing precondition checks (uid and resourceVersion)
* allows admission to have the old and new objects on patch/update operations (sets us up for field level authorization, differential quota updates, etc)
* allows patch operations to avoid double-GETting the object to apply the patch
Overview of important changes:
* pkg/api/rest/rest.go
* changes `rest.Update` interface to give rest storage an `UpdatedObjectInfo` interface instead of the object directly. To get the updated object, the storage must call `UpdatedObject()`, passing in the current object
* pkg/api/rest/update.go
* provides a default `UpdatedObjectInfo` impl
* passes a copy of the updated object through any provided transforming functions and returns it when asked
* builds UID preconditions from the updated object if they can be extracted
* pkg/apiserver/resthandler.go
* Reworks update and patch operations to give old objects to admission
* pkg/registry/generic/registry/store.go
* Calls `UpdatedObject()` inside `GuaranteedUpdate` so it can provide the old object
Todo:
- [x] Update rest.Update interface:
* Given the name of the object being updated
* To get the updated object data, the rest storage must pass the current object (fetched using the name) to an `UpdatedObject(ctx, oldObject) (newObject, error)` func. This is typically done inside a `GuaranteedUpdate` call.
- [x] Add old object to admission attributes interface
- [x] Update resthandler Update to move admission into the UpdatedObject() call
- [x] Update resthandler Patch to move the patch application and admission into the UpdatedObject() call
- [x] Add resttest tests to make sure oldObj is correctly passed to UpdatedObject(), and errors propagate back up
Follow-up:
* populate oldObject in admission for delete operations?
* update quota plugin to use `GetOldObject()` in admission attributes
* admission plugin to gate ownerReference modification on delete permission
* Decide how to handle preconditions (does that belong in the storage layer or in the resthander layer?)
Instead of just rate limits to operation polling, send all API calls
through a rate limited RoundTripper.
This isn't a perfect solution, since the QPS is obviously getting
split between different controllers, etc., but it's also spread across
different APIs, which, in practice, rate limit differently.
Fixes#26119 (hopefully)
This provides a basic implementation for setting a stage1 on a per-pod
basis via an annotation.
It's possible this feature should be gated behind additional knobs, such
as a kubelet flag to filter allowed stage1s, or a check akin to what
priviliged gets in the apiserver.
Currently, it checks `AllowPrivileged`, as a means to let people disable
this feature, though overloading it as stage1 and privileged isn't
ideal.
Automatic merge from submit-queue
Handle federated service name lookups in kube-dns.
For the domain name queries that fail to match any records in the local
kube-dns cache, we check if the queried name matches the federation
pattern. If it does, we send a CNAME response to the federated name.
For more details look at the comments in the code.
Tests are coming ...
Also, this PR is based on @ArtfulCoder's PR #23930. So please review only the last commit here.
PTAL @ArtfulCoder @thockin @quinton-hoole @nikhiljindal
[]()
For the domain name queries that fail to match any records in the local
kube-dns cache, we check if the queried name matches the federation
pattern. If it does, we send a CNAME response to the federated name.
For more details look at the comments in the code.
Automatic merge from submit-queue
Support sort-by timestamp in kubectl get
## Pull Request Guidelines
1. Please read our [contributor guidelines](https://github.com/kubernetes/kubernetes/blob/master/CONTRIBUTING.md).
1. See our [developer guide](https://github.com/kubernetes/kubernetes/blob/master/docs/devel/development.md).
1. Follow the instructions for [labeling and writing a release note for this PR](https://github.com/kubernetes/kubernetes/blob/master/docs/devel/pull-requests.md#release-notes) in the block below.
```release-note
```
**Before:**
```console
$ kubectl get svc --sort-by='{.metadata.creationTimestamp}'
proto: no encoder for TypeMeta unversioned.TypeMeta [GetProperties]
proto: tag has too few fields: "-"
proto: no coders for struct *reflect.rtype
proto: no encoder for sec int64 [GetProperties]
proto: no encoder for nsec int32 [GetProperties]
proto: no encoder for loc *time.Location [GetProperties]
proto: no encoder for Time time.Time [GetProperties]
proto: no coders for intstr.Type
proto: no encoder for Type intstr.Type [GetProperties]
F0513 16:46:49.499894 29562 sorting_printer.go:182] Field {.metadata.creationTimestamp} in TypeMeta:<kind:"Service" apiVersion:"v1" > metadata:<name:"kubernetes" generateName:"" namespace:"default" selfLink:"/api/v1/namespaces/default/services/kubernetes" uid:"b88b4739-1964-11e6-9ac3-64510658e388" resourceVersion:"8" generation:0 creationTimestamp:<2016-05-13T16:45:06-07:00> labels:<key:"component" value:"apiserver" > labels:<key:"provider" value:"kubernetes" > > spec:<ports:<name:"https" protocol:"TCP" port:443 targetPort:<type:0 intVal:443 strVal:"" > nodePort:0 > clusterIP:"10.0.0.1" type:"ClusterIP" sessionAffinity:"ClientIP" loadBalancerIP:"" > status:<loadBalancer:<> > is an unsortable type: struct, err: unsortable type: struct
```
**After:**
```console
$ kubectl get svc --sort-by='{.metadata.creationTimestamp}'
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes 10.0.0.1 <none> 443/TCP 48s
frontend 10.0.0.108 <none> 80/TCP 10s
```
@kubernetes/kubectl
[]()
Automatic merge from submit-queue
cache: reflector should never stop watching
A recent change tries to separate resync and relist. The motivation
was to avoid triggering relist when a resync is required.
However, the change is not effective since it stops the watcher. As hongchao
mentioned in the original comment, today's storage interface will not deliever
any progress notification to the watch chan. So any watcher that does not receive
events for the last few seconds will not be able to catch up from the previous
index after a hard close since the index of the last received event is out of
the cache window inside etcd2.
This pull request tries to fix this issue by not stoping watcher when a resync is
required.
/cc @hongchaodeng @wojtek-t @timothysc @rrati @smarterclayton
Automatic merge from submit-queue
vSphere Volume Plugin Implementation
This PR implements vSphere Volume plugin support in Kubernetes (ref. issue #23932).
A recent change tries to separate resync and relist. The motivation
was to avoid triggering relist when a resync is required.
However, the change is not effective since it stops the watcher. As hongchao
mentioned in the original comment, today's storage interface will not deliever
any progress notification to the watch chan. So any watcher that does not receive
events for the last few seconds will not be able to catch up from the previous
index after a hard close since the index of the last received event is out of
the cache window inside etcd2.
This pull request tries to fix this issue by not stoping watcher when a resync is
required.
Since the containerInfo has the LogPath in it, let's use that and
not manually construct the path ourselves. This also makes the code
less prone to breaking if docker change this path.
Fixes#23695
Automatic merge from submit-queue
ResourceQuota controller uses rate limiter to prevent hot-loops in error situations
Have resource quota controller use a rate limited queue to prevent hot-looping in error situations.