Imporved testing turned these up:
1) Headless+Selectorless, on a single-stack cluster, policy=PreferDual
Prior to this commit, the result was a single IPFamiliy (because we
checked that the 2nd allocator was present). This changes that case to
populate both families (we don't care if the allocator exists), which is
the same as RequireDual.
2) ClusterIP, user specifies 2 families but no IPs
Prior to this commit, the policy was inferred to be SingleStack. This
changes that case to correctly default to RequireDual when 2 families
are present but no IPs.
Follow the original TODO from back in c86b84c with the errors added
in d3be1ac. Edit the TODO to make clear that a dynamic response would
still be ideal.
Dramatically reduce the time based on suggestion in PR, and remove name from TODO
as not currently active.
* Mixed protocol support for Services with type=LoadBalancer
KEP: https://github.com/kubernetes/enhancements/blob/master/keps/sig-network/20200103-mixed-protocol-lb.md
Add new feature gate to control the support of mixed protocols in Services with type=LoadBalancer
Add new fields to the ServiceStatus
Add Ports to the LoadBalancerIngress, so cloud provider implementations can report the status of the requested load balanc
er ports
Add ServiceCondition to the ServiceStatus so Service controllers can indicate the conditions of the Service
* regenerate conflicting stuff
Old stored services will not have the `clusterIPs` field when read back
without this.
This includes some renaming for clarity and expanded comments, and a new
test for default on read.
Service has had a problem since forever:
- User creates a service type=LoadBalancer
- We silently allocate them a NodePort
- User changes type to ClusterIP
- We fail the operation because they did not clear NodePort
They never asked for or used the NodePort!
Dual-stack introduced some dependent fields that get auto-wiped on
updates. This carries it further.
If you squint, you can see Service as a big, messy discriminated union,
with type as the discriminator. Ignoring fields for non-selected
union-modes seems right.
This introduces the potential for an apply loop. Specifically, we will
accept YAML that we did not previously accept. Apply could see the
field in local YAML and not in the server and repeatedly try to patch it
in. But since that YAML is currently an error, it seems like a very low
risk. Almost nobody actually specifies their own NodePort values.
To mitigate this somewhat, we only auto-wipe on updates. The same YAML
would fail to create. This is a little inconsistent. We could
auto-wipe on create, too, at the risk of more potential impact.
To do this properly, we need to know the old and new values, which means
we can not do it in defaulting or conversion. So we do it in strategy.
This change also adds unit tests and updates e2e tests to rely on and
verify this behavior.
* api: structure change
* api: defaulting, conversion, and validation
* [FIX] validation: auto remove second ip/family when service changes to SingleStack
* [FIX] api: defaulting, conversion, and validation
* api-server: clusterIPs alloc, printers, storage and strategy
* [FIX] clusterIPs default on read
* alloc: auto remove second ip/family when service changes to SingleStack
* api-server: repair loop handling for clusterIPs
* api-server: force kubernetes default service into single stack
* api-server: tie dualstack feature flag with endpoint feature flag
* controller-manager: feature flag, endpoint, and endpointSlice controllers handling multi family service
* [FIX] controller-manager: feature flag, endpoint, and endpointSlicecontrollers handling multi family service
* kube-proxy: feature-flag, utils, proxier, and meta proxier
* [FIX] kubeproxy: call both proxier at the same time
* kubenet: remove forced pod IP sorting
* kubectl: modify describe to include ClusterIPs, IPFamilies, and IPFamilyPolicy
* e2e: fix tests that depends on IPFamily field AND add dual stack tests
* e2e: fix expected error message for ClusterIP immutability
* add integration tests for dualstack
the third phase of dual stack is a very complex change in the API,
basically it introduces Dual Stack services. Main changes are:
- It pluralizes the Service IPFamily field to IPFamilies,
and removes the singular field.
- It introduces a new field IPFamilyPolicyType that can take
3 values to express the "dual-stack(mad)ness" of the cluster:
SingleStack, PreferDualStack and RequireDualStack
- It pluralizes ClusterIP to ClusterIPs.
The goal is to add coverage to the services API operations,
taking into account the 6 different modes a cluster can have:
- single stack: IP4 or IPv6 (as of today)
- dual stack: IPv4 only, IPv6 only, IPv4 - IPv6, IPv6 - IPv4
* [FIX] add integration tests for dualstack
* generated data
* generated files
Co-authored-by: Antonio Ojea <aojea@redhat.com>
Currently, if you have a PDB with 0 disruptions
available and you attempt to evict a non-healthy
pod, the eviction request will always fail. This
is because the eviction API does not currently
take in to account that the pod you are removing
is the unhealthy one.
This commit accounts for trying to evict an
unhealthy pod as long as there are enough healthy
pods to satisfy the PDB's requirements. To
protect against race conditions, a ResourceVersion
constraint is enforced. This will ensure that
the target pod does not go healthy and allow
any race condition to occur which might disrupt
too many pods at once.
This commit also eliminates superfluous class to
DeepCopy for the deleteOptions struct.
When updating ephemeral containers, convert Pod to EphemeralContainers
in storage validation. This resolves a bug where admission webhook
validation fails for ephemeral container updates because the webhook
client cannot perform the conversion.
Also enable the EphemeralContainers feature gate for the admission
control integration test, which would have caught this bug.
the endpoints API handler was using the Canonicalize() method to
reorder the endpoints, however, due to differences with the
endpoint controller RepackSubsets(), the controller was considering
the endpoints different despite they were not, generating unnecessary
updates evert resync period.
The test suite was using a /24 cluster network for the allocator.
The ip allocator, if no ip is specified when creating the cluster,
picks one randomly, that means that we had 1/256 chances of
collision.
The TestServiceRegistryUpdateDryRun was creating a service without
a ClusterIP, the ip allocator assigned one random, and it was
never deleting it. The same test was checking later if one
specific IP was not allocated, not taking into consideration
that the same ip may have allocated to the first Service.
To avoid any randomness, we create the first Service with a specific
IP address.
Errors from staticcheck:
pkg/registry/autoscaling/horizontalpodautoscaler/storage/storage_test.go:207:7: this value of err is never used (SA4006)
pkg/registry/core/namespace/storage/storage.go:256:5: options.OrphanDependents is deprecated: please use the PropagationPolicy, this field will be deprecated in 1.7. Should the dependent objects be orphaned. If true/false, the "orphan" finalizer will be added to/removed from the object's finalizers list. Either this field or PropagationPolicy may be set, but not both. +optional (SA1019)
pkg/registry/core/namespace/storage/storage.go:257:11: options.OrphanDependents is deprecated: please use the PropagationPolicy, this field will be deprecated in 1.7. Should the dependent objects be orphaned. If true/false, the "orphan" finalizer will be added to/removed from the object's finalizers list. Either this field or PropagationPolicy may be set, but not both. +optional (SA1019)
pkg/registry/core/namespace/storage/storage.go:266:5: options.OrphanDependents is deprecated: please use the PropagationPolicy, this field will be deprecated in 1.7. Should the dependent objects be orphaned. If true/false, the "orphan" finalizer will be added to/removed from the object's finalizers list. Either this field or PropagationPolicy may be set, but not both. +optional (SA1019)
pkg/registry/core/namespace/storage/storage.go:267:11: options.OrphanDependents is deprecated: please use the PropagationPolicy, this field will be deprecated in 1.7. Should the dependent objects be orphaned. If true/false, the "orphan" finalizer will be added to/removed from the object's finalizers list. Either this field or PropagationPolicy may be set, but not both. +optional (SA1019)
pkg/registry/core/persistentvolumeclaim/storage/storage_test.go:165:2: this value of err is never used (SA4006)
pkg/registry/core/resourcequota/storage/storage_test.go:202:7: this value of err is never used (SA4006)
pkg/registry/core/service/ipallocator/allocator_test.go:338:2: this value of other is never used (SA4006)
pkg/registry/core/service/portallocator/allocator_test.go:199:2: this value of other is never used (SA4006)
pkg/registry/core/service/storage/rest_test.go:1843:2: this value of location is never used (SA4006)
pkg/registry/core/service/storage/rest_test.go:1849:2: this value of location is never used (SA4006)
pkg/registry/core/service/storage/rest_test.go:3174:20: use net.IP.Equal to compare net.IPs, not bytes.Equal (SA1021)
pkg/registry/core/service/storage/rest_test.go:3178:20: use net.IP.Equal to compare net.IPs, not bytes.Equal (SA1021)
pkg/registry/core/service/storage/rest_test.go:3185:20: use net.IP.Equal to compare net.IPs, not bytes.Equal (SA1021)
pkg/registry/core/service/storage/rest_test.go:3189:20: use net.IP.Equal to compare net.IPs, not bytes.Equal (SA1021)
When using the eviction API, if a pod already has
a non-zero DeletionTimestamp, we don't need to check
PDBs as it has already been marked for deletion.
Several of the functions in pkg/registry/core/service/ipallocator were
moved to k8s.io/utils/net, but then the original code was never
updated to used to the vendored versions.
(utilnet's version of RangeSize does not have the IPv6 special case
that the original code did, so we need to move that to
NewAllocatorCIDRRange now.)
If the dual-stack flag is enabled and the cluster is single stack IPv6,
the allocator logic for service clusterIP does not properly handle rejecting
a request for an IPv4 family. Return a 422 Invalid on the ipFamily field
when the dual stack flag is on (as it would when it hits beta) and the
cluster is configured for single-stack IPv6.
The family is now defaulted or cleared in BeforeCreate/BeforeUpdate,
and is either inherited from the previous object (if nil or unchanged),
or set to the default strategy's family as necessary. The existing
family defaulting when cluster ip is provided remains in the api
section. We add additonal family defaulting at the time we allocate
the IP to ensure that IPFamily is a consequence of the ClusterIP
and prevent accidental reversion. This defaulting also ensures that
old clients that submit a nil IPFamily for non ClusterIP services
receive a default.
To properly handle validation, make the strategy and the validation code
path condition on which configuration options are passed to service
storage. Move validation and preparation logic inside the strategy where
it belongs. Service validation is now dependent on the configuration of
the server, and as such ValidateConditionService needs to know what the
allowed families are.
Currently, if you have a PDB set, it is possible for
a pod stuck in pending state to be prevented from
deletion even though there are in fact enough healthy
replicas.
This commit allows pods in Pending state to be removed.
This commit also adds associated unit tests.
related-bug: #80389
The service allocator is used to allocate ip addresses for the
Service IP allocator and NodePorts for the Service NodePort
allocator. It uses a bitmap backed by etcd to store the allocation
and tries to allocate the resources directly from the local memory
instead from etcd, that can cause issues in environment with
high concurrency.
It may happen, in deployments with multiple apiservers, that the
resource allocation information is out of sync, this is more
sensible with NodePorts, per example:
1. apiserver A create a service with NodePort X
2. apiserver B deletes the service
3. apiserver A creates the service again
If the allocation data of apiserver A wasn't refreshed with the
deletion of apiserver B, apiserver A fails the allocation because
the data is out of sync. The Repair loops solve the problem later,
but there are some use cases that require to improve the concurrency
in the allocation logic.
We can try to not do the Allocation and Release operations locally,
and try instead to check if the local data is up to date with etcd,
and operate over the most recent version of the data.
This implementation allows Pod to request multiple hugepage resources
of different size and mount hugepage volumes using storage medium
HugePage-<size>, e.g.
spec:
containers:
resources:
requests:
hugepages-2Mi: 2Mi
hugepages-1Gi: 2Gi
volumeMounts:
- mountPath: /hugepages-2Mi
name: hugepage-2mi
- mountPath: /hugepages-1Gi
name: hugepage-1gi
...
volumes:
- name: hugepage-2mi
emptyDir:
medium: HugePages-2Mi
- name: hugepage-1gi
emptyDir:
medium: HugePages-1Gi
NOTE: This is an alpha feature.
Feature gate HugePageStorageMediumSize must be enabled for it to work.
This combines container names into a single list because separating them
into a long, variable length string isn't particularly useful in the
context of an streaming error message.
Remove the validation for pre-allocated hugepages on node level.
Validation is currently the only thing making it impossible to use
pre-allocated huge pages in more than one size.
We have now quite a few reports from real users that this feature is
welcome.
Such declarations will make using the package exported identifiers impossible after the declaration or create confusion when reading the code.
Signed-off-by: clarklee92 <clarklee1992@hotmail.com>
All objects with graceful deletion allow multiple DELETE calls in the pending
state. Namespace is the one outlier, and the error here predates graceful
deletion and finalizers. We should make this behavior consistent with other
calls and merely indicate success and return the state of the object, the
same as if there were pending metadata finalizers.
Clients that previously checked for a conflict error during delete to know
that the server is already deleting will now no longer receive an error
(as if the object were rapidly deleted). There is a small chance that some
clients have error checking for this state, but a much larger chance that
clients that want to trigger a delete of the namespace do not handle this
error case.
Discovered in an e2e test that used the framework namespace and triggered
deletion of that ns itself, and then the AfterEach step in e2e failed
because the namespace was already pending deletion.
conflict.
Adding unit test verify that deleteValidation is retried.
adding e2e test verifying the webhook can intercept configmap and custom
resource deletion, and the existing object is sent via the
admissionreview.OldObject.
update the admission integration test to verify that the existing object
is passed to the deletion admission webhook as oldObject, in case of an
immediate deletion and in case of an update-on-delete.
* fix duplicated imports of api/core/v1
* fix duplicated imports of client-go/kubernetes
* fix duplicated imports of rest code
* change import name to more reasonable
This makes the behavior being consistent with generic store, The
orphan finalizer should be removed if the delete options does not
specify propagarionPolicy as orphan.
- Move from the old github.com/golang/glog to k8s.io/klog
- klog as explicit InitFlags() so we add them as necessary
- we update the other repositories that we vendor that made a similar
change from glog to klog
* github.com/kubernetes/repo-infra
* k8s.io/gengo/
* k8s.io/kube-openapi/
* github.com/google/cadvisor
- Entirely remove all references to glog
- Fix some tests by explicit InitFlags in their init() methods
Change-Id: I92db545ff36fcec83afe98f550c9e630098b3135
The service account authenticator isn't the only authenticator that
should respect API audience. The authentication config structure should
reflect that.
This change adds comments to exported things and renames the tcp,
http, and exec probe interfaces to just be Prober within their
namespace.
Issue #68026
Automatic merge from submit-queue (batch tested with PRs 67323, 66717, 67038). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Prevent side effects on dryrun in service registry
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
added serviceAccountName to field selectors
What this PR does / why we need it:
Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes#66114
Special notes for your reviewer:
```release-note
NONE
```
Make scope.Kind of pod/attach, pod/exec, pod/portforward, node/proxy,
service/proxy to their respective subresource Kind, instead of the
parent Kind. The kind is used by the admission webhook controller to
determine how to convert the object.
Events have long shown the most data of the core objects in their output, but that data is of varying use
to a user. Following the principle that events are intended for the system to communicate information back
to the user, and that Message is the primary human readable field, this commit alters the default columns
to ensure event is shown with the most width.
1. Events are no longer sorted in the printer (this was a bug and was broken with paging and server side
rendering)
2. Only the last seen, type, reason, kind, and message fields are shown by default, which makes the
message prominent
3. Source, subobject, count, and first seen are only shown under `-o wide`
4. The duration fields were changed to be the more precise output introduced for job duration (2-3 sig figs)
Automatic merge from submit-queue (batch tested with PRs 65299, 65524, 65154, 65329, 65536). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Allow override of AllowCreateOnUpdate with new argument to Update
**What this PR does / why we need it**:
Changes the Update function signature to include a new bool which tells storage to override what the UpdateStrategy returns for AllowCreateOnUpdate. This is not exposed to the user, the handler is the one that sets this override value. Eventually the patch handler will set this to true, in order to provide more consistent apply behavior, without changing the existing PUT behavior.
Redo of https://github.com/kubernetes/kubernetes/pull/65075 but on master to reduce number of conflicts when we merge feature-serverside-apply with master.
/sig api-machinery
/cc @apelisse @lavalamp
**Release note**:
```release-note
NONE
```
No release note because this is just an internal change
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add limit to the TokenRequest expiration time
**What this PR does / why we need it**:
A new API TokenRequest has been implemented.It improves current serviceaccount model from many ways.
This patch adds limit to TokenRequest expiration time.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#63575
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 64613, 64596, 64573, 64154, 64639). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
printers: fix json types – int64 is only allowed integer
We have the invariant in apimachinery that all integers in JSON are int64. We panic on other types on deepcopy and possibly at other occasions.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix nodeport repair for ESIPP services
**What this PR does / why we need it**:
The nodeport allocation repair controller does not scrape the `Service.Spec.healthCheckNodePort` value and would remove the allocation from memory and etcd after 10 minutes. This opens the door for other services to use the same nodeport and cause collisions.
**Which issue(s) this PR fixes**:
Fixes#54885
**Release note**:
```release-note
Fix issue of colliding nodePorts when the cluster has services with externalTrafficPolicy=Local
```
Updates dynamic Kubelet config to use a structured status, rather than a
node condition. This makes the status machine-readable, and thus more
useful for config orchestration.
Fixes: #56896
Automatic merge from submit-queue (batch tested with PRs 63421, 63432, 63333). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
update tests to be specific about the versions they are testing
When setting up tests, you want to rely on your own scheme. This eliminates coupling to floating versions which gives unnecessary flexibility in most cases and prevents testing all the versions you need.
@liggitt scrubs unnecessary deps.
```release-note
NONE
```