Automatic merge from submit-queue
Reorganize volume controllers and manager
* Move both PV and attach/detach volume controllers to `controllers/volume` (closes#26222)
* Rename `kubelet/volume` to `kubelet/volumemanager`
* Add/update OWNER files
Automatic merge from submit-queue
TLS bootstrap API group (alpha)
This PR only covers the new types and related client/storage code- the vast majority of the line count is codegen. The implementation differs slightly from the current proposal document based on discussions in design thread (#20439). The controller logic and kubelet support mentioned in the proposal are forthcoming in separate requests.
I submit that #18762 ("Creating a new API group is really hard") is, if anything, understating it. I've tried to structure the commits to illustrate the process.
@mikedanese @erictune @smarterclayton @deads2k
```release-note-experimental
An alpha implementation of the the TLS bootstrap API described in docs/proposals/kubelet-tls-bootstrap.md.
```
[]()
Automatic merge from submit-queue
Add EndpointReconcilerConfig to master Config
Add EndpointReconcilerConfig to master Config to allow downstream integrators to customize the reconciler and reconciliation interval when starting a customized master
@kubernetes/sig-api-machinery @deads2k @smarterclayton @liggitt @kubernetes/rh-cluster-infra
Add EndpointReconcilerConfig to master Config to allow downstream integrators to customize the reconciler
and reconciliation interval when starting a customized master.
Automatic merge from submit-queue
Add integration test for binding PVs using label selectors
Adds an integration test for persistent volume claim 'MatchExpressions' label selector.
Automatic merge from submit-queue
Reapply ScheduledJob tests (2ab885a53a)
Re-applied the ScheduledJob tests (#25737) which were reverted due to an integration test error in #27184.
The problem was in `TestBatchGroupBackwardCompatibility` which is testing backwards compatibility for storing jobs (`extensions/v1beta1` vs `batch/v1`), which is not needed for `batch/v2alpha1`. I've added a skip to aforementioned test for that group. See `test/integration/master_test.go` for the actual fix.
@caesarxuchao @mikedanese ptal
@piosz @jszczepkowski @erictune fyi
[]()
Automatic merge from submit-queue
Add possibility to run integration tests in parallel
- add env. variable with etcd URL to intergration tests
- update documentation with example how to use it to find flakes
Automatic merge from submit-queue
Add integration test for binding PVs using label selectors
Adds an integration test for persistent volume claim label selector.
Many integration tests delete all keys in etcd as part of their cleanup.
To run these tests in parallel we must run several etcd daemons, each on
different port and pass etcd url to the test suite.
Automatic merge from submit-queue
add unit and integration tests for rbac authorizer
This PR adds lots of tests for the RBAC authorizer.
The plan over the next couple days is to add a lot more test cases.
Updates #23396
cc @erictune
Automatic merge from submit-queue
Add integration test for provisioning/deleting many PVs.
The test is configurable by KUBE_INTEGRATION_PV_OBJECTS for load tests, 100 objects are created by default.
@kubernetes/sig-storage
This commit adds a new volume manager in kubelet that synchronizes
volume mount/unmount (and attach/detach, if attach/detach controller
is not enabled).
This eliminates the race conditions between the pod creation loop
and the orphaned volumes loops. It also removes the unmount/detach
from the `syncPod()` path so volume clean up never blocks the
`syncPod` loop.
Automatic merge from submit-queue
volume integration: wait for PVs before creating PVCs
The test should wait until all volumes are processed by volume controller (i.e. in the controller cache) before creating a PVC.
Without that, the "best" matching PV could not be in the cache and controller might bind the PVC to suboptiomal one.
This fixes integration test flake "Bind mismatch! Expected pvc-2 capacity 50000000000 but got pvc-2 capacity 52000000000".
Fixes#27179 (together with #26894)
Automatic merge from submit-queue
Fix integration pv flakes
There are two fixes in this PR:
- run tests in separarate functions and use objects with different names, otherwise events from the beginning of the function are caught later when we watch for events of a different PV/PVC
- don't set PV.Spec.ClaimRef.UID of pre-bound PVs. PVs with UID set are considered as bound and they are deleted/recycled when appropriate PVC does not exists yet.
Fixes#26730 and probably also ~~#26894~~ #26256
The test should wait until all volumes are processed by volume controller (i.e.
in the controller cache) before creating a PVC.
Without that, the "best" matching PV could not be in the cache and controller
might bind the PVC to suboptiomal one.
This fixes integration test flake "Bind mismatch! Expected pvc-2 capacity
50000000000 but got pvc-2 capacity 52000000000".
Automatic merge from submit-queue
Considering all nodes for the scheduler cache to allow lookups
Fixes the actual issue that led me to create https://github.com/kubernetes/kubernetes/issues/22554
Currently the nodes in the cache provided to the predicates excludes the unschedulable nodes using field level filtering for the watch results. This results in the above issue as the `ServiceAffinity` predicate uses the cached node list to look up the node metadata for a peer pod (another pod belonging to the same service). Since this peer pod could be currently hosted on a node that is currently unschedulable, the lookup could potentially fail, resulting in the pod failing to be scheduled.
As part of the fix, we are now including all nodes in the watch results and excluding the unschedulable nodes using `NodeCondition`
@derekwaynecarr PTAL
Automatic merge from submit-queue
Enable WatchCache in test/integration/ tests
We already run cmd/integration/ with watch cache on. We should also run tests in test/integration/ with watch cache on.
@wojtek-t @lavalamp
Automatic merge from submit-queue
Add a custom main instead of the standard test main, to reduce stack …
Adds a custom test main handler (see: `TestMain` in https://golang.org/pkg/testing/ for details)
Partial fix for https://github.com/kubernetes/kubernetes/issues/25965
This does the standard timeout, but strips non-kubernetes stacks out of the stack trace (e.g. it filters things like:
```
goroutine 466 [IO wait, 7 minutes]:
net.runtime_pollWait(0x7fd74c4672c0, 0x72, 0xc821614000)
/usr/local/go/src/runtime/netpoll.go:160 +0x60
net.(*pollDesc).Wait(0xc8215c21b0, 0x72, 0x0, 0x0)
/usr/local/go/src/net/fd_poll_runtime.go:73 +0x3a
net.(*pollDesc).WaitRead(0xc8215c21b0, 0x0, 0x0)
/usr/local/go/src/net/fd_poll_runtime.go:78 +0x36
net.(*netFD).Read(0xc8215c2150, 0xc821614000, 0x1000, 0x1000, 0x0, 0x7fd74c491050, 0xc820014058)
/usr/local/go/src/net/fd_unix.go:250 +0x23a
net.(*conn).Read(0xc820a5a090, 0xc821614000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/usr/local/go/src/net/net.go:172 +0xe4
net/http.noteEOFReader.Read(0x7fd74c465258, 0xc820a5a090, 0xc8215f0068, 0xc821614000, 0x1000, 0x1000, 0x405773, 0x0, 0x0)
/usr/local/go/src/net/http/transport.go:1687 +0x67
net/http.(*noteEOFReader).Read(0xc8215ae1a0, 0xc821614000, 0x1000, 0x1000, 0xc82159ad1d, 0x0, 0x0)
<autogenerated>:284 +0xd0
bufio.(*Reader).fill(0xc8202a2b40)
/usr/local/go/src/bufio/bufio.go:97 +0x1e9
bufio.(*Reader).Peek(0xc8202a2b40, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0)
/usr/local/go/src/bufio/bufio.go:132 +0xcc
net/http.(*persistConn).readLoop(0xc8215f0000)
/usr/local/go/src/net/http/transport.go:1073 +0x177
created by net/http.(*Transport).dialConn
/usr/local/go/src/net/http/transport.go:857 +0x10a6
```
We may want to get even more aggressive in the future.
@kubernetes/sig-testing
When we create a PV, we should created it withoud Spec.ClaimRef.UID.
In rare cases, when 'PV added' event with UID is processed before 'PVC
added' (created by for loop few lines above), the controller does not know
a PVC with this UID and considers the PV as released. Reclaim policy is
then executed and the PV is deleted and it's never bound.
With UID="", the controller waits for the PVC to get created and binds
it.
Different tests should use different objects and watchers - I noticed
sometimes an event from old tests leaked into subsequent test in the
same function.
And add some logs.
The test tries to bind configured nr. of PVs to the same nr. of PVCs.
'100' is used by default, which should take ~1-3 seconds (depends on log level).
Periodic sync is needed in rare cases, which may add another 10 seconds. - cache
from #25881 will help here and sync should not be needed at all.
The test is configurable and may be reused to measure binder performance.
Set KUBE_INTEGRATION_PV_* env. variables as described in
persistent_volume_test.go and run the tests:
# compile
$ cd test/integration
$ godep go test -tags 'integration no-docker' -c
# run the tests
$ KUBE_INTEGRATION_PV_SYNC_PERIOD=10s KUBE_INTEGRATION_PV_OBJECTS=1000 time ./integration.test -test.run TestPersistentVolumeMultiPVsPVCs -v 2
Log level '2' is useful to get timestamps of various events like
'TestPersistentVolumeMultiPVsPVCs: start' and 'TestPersistentVolumeMultiPVsPVCs:
claims are bound'.
- add more logs
- wait both for volume and claim to get bound
When binding volumes to claims the controller saves PV first and PVC right
after that. In theory, this saved PV could cause waitForPersistentVolumePhase
to finish and PVC could be checked in the test before the controller saves it.
So, wait for both PVC and PV to get bound and check the results only after
that. This is only a theory, there are no usable logs in integration tests.
Automatic merge from submit-queue
Use pause image depending on the server's platform when testing
Removed all pause image constant strings, now the pause image is chosen by arch. Part of the effort of making e2e arch-agnostic.
The pause image name and version is also now only in two places, and it's documented to bump both
Also removed "amd64" constants in the code. Such constants should be replaced by `runtime.GOARCH` or by looking up the server platform
Fixes: #22876 and #15140
Makes it easier for: #25730
Related: #17981
This is for `v1.3`
@ixdy @thockin @vishh @kubernetes/sig-testing @andyzheng0831 @pensu
Automatic merge from submit-queue
Attach Detach Controller Business Logic
This PR adds the meat of the attach/detach controller proposed in #20262.
The PR splits the in-memory cache into a desired and actual state of the world.
Split controller cache into actual and desired state of world.
Controller will only operate on volumes scheduled to nodes that
have the "volumes.kubernetes.io/controller-managed-attach" annotation.
Automatic merge from submit-queue
Fix panic in auth test failure
I got a spurious failure in the webhook integration test, but couldn't see the error returned because a panic was hit that assumed a body was always returned with the response
Automatic merge from submit-queue
Cache Webhook Authentication responses
Add a simple LRU cache w/ 2 minute TTL to the webhook authenticator.
Kubectl is a little spammy, w/ >= 4 API requests per command. This also prevents a single unauthenticated user from being able to DOS the remote authenticator.
- create 100 PV, ranging from 0 to 99GB; create 1 PVC to claim 50GB. Verify only one PV is bound and rest are pending
- create 2 PVs with different access modes (RWM, RWO), 1 PVC to claim RWM PV. Verify RWM is bound and RWO is not bound.
Signed-off-by: Huamin Chen <hchen@redhat.com>
Automatic merge from submit-queue
Refactor persistent volume controller
Here is complete persistent controller as designed in https://github.com/pmorie/pv-haxxz/blob/master/controller.go
It's feature complete and compatible with current binder/recycler/provisioner. No new features, it *should* be much more stable and predictable.
Testing
--
The unit test framework is quite complicated, still it was necessary to reach reasonable coverage (78% in `persistentvolume_controller.go`). The untested part are error cases, which are quite hard to test in reasonable way - sure, I can inject a VersionConflictError on any object update and check the error bubbles up to appropriate places, but the real test would be to run `syncClaim`/`syncVolume` again and check it recovers appropriately from the error in the next periodic sync. That's the hard part.
Organization
---
The PR starts with `rm -rf kubernetes/pkg/controller/persistentvolume`. I find it easier to read when I see only the new controller without old pieces scattered around.
[`types.go` from the old controller is reused to speed up matching a bit, the code looks solid and has 95% unit test coverage].
I tried to split the PR into smaller patches, let me know what you think.
~~TODO~~
--
* ~~Missing: provisioning, recycling~~.
* ~~Fix integration tests~~
* ~~Fix e2e tests~~
@kubernetes/sig-storage
<!-- Reviewable:start -->
---
This change is [<img src="http://reviewable.k8s.io/review_button.svg" height="35" align="absmiddle" alt="Reviewable"/>](http://reviewable.k8s.io/reviews/kubernetes/kubernetes/24331)
<!-- Reviewable:end -->
Fixes#15632
Automatic merge from submit-queue
Webhook Token Authenticator
Add a webhook token authenticator plugin to allow a remote service to make authentication decisions.
Automatic merge from submit-queue
Let the dynamic client take runtime.Object instead of v1.ListOptions
so that I can pass whatever version of ListOptions to the List/Watch/DeleteCollection methods.
cc @krousey