Service has had a problem since forever:
- User creates a service type=LoadBalancer
- We silently allocate them a NodePort
- User changes type to ClusterIP
- We fail the operation because they did not clear NodePort
They never asked for or used the NodePort!
Dual-stack introduced some dependent fields that get auto-wiped on
updates. This carries it further.
If you squint, you can see Service as a big, messy discriminated union,
with type as the discriminator. Ignoring fields for non-selected
union-modes seems right.
This introduces the potential for an apply loop. Specifically, we will
accept YAML that we did not previously accept. Apply could see the
field in local YAML and not in the server and repeatedly try to patch it
in. But since that YAML is currently an error, it seems like a very low
risk. Almost nobody actually specifies their own NodePort values.
To mitigate this somewhat, we only auto-wipe on updates. The same YAML
would fail to create. This is a little inconsistent. We could
auto-wipe on create, too, at the risk of more potential impact.
To do this properly, we need to know the old and new values, which means
we cannot do it in defaulting or conversion. So we do it in strategy.
This change also adds unit tests and updates e2e tests to rely on and
verify this behavior.
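As a rough illustration of the update-only wipe described above, here is a
minimal sketch using pared-down stand-in types. `Service`, `ServicePort`,
and `wipeUnusedNodePorts` are hypothetical names for this example, not the
real API types or strategy code. It assumes the wipe applies only when the
type moves away from a NodePort-using mode and the user left the NodePort
values unchanged:

```go
package main

import "fmt"

// ServiceType is a simplified stand-in for the Service type discriminator.
type ServiceType string

const (
	TypeClusterIP    ServiceType = "ClusterIP"
	TypeNodePort     ServiceType = "NodePort"
	TypeLoadBalancer ServiceType = "LoadBalancer"
)

// ServicePort and Service are hypothetical, pared-down types for illustration.
type ServicePort struct {
	Port     int32
	NodePort int32
}

type Service struct {
	Type  ServiceType
	Ports []ServicePort
}

// needsNodePort reports whether NodePort is meaningful for this union mode.
func needsNodePort(svc *Service) bool {
	return svc.Type == TypeNodePort || svc.Type == TypeLoadBalancer
}

// sameNodePorts reports whether both objects carry identical NodePort values.
func sameNodePorts(a, b *Service) bool {
	if len(a.Ports) != len(b.Ports) {
		return false
	}
	for i := range a.Ports {
		if a.Ports[i].NodePort != b.Ports[i].NodePort {
			return false
		}
	}
	return true
}

// wipeUnusedNodePorts clears NodePort values on update when the old type
// used them, the new type does not, and the user did not change them.
func wipeUnusedNodePorts(newSvc, oldSvc *Service) {
	if needsNodePort(oldSvc) && !needsNodePort(newSvc) && sameNodePorts(oldSvc, newSvc) {
		for i := range newSvc.Ports {
			newSvc.Ports[i].NodePort = 0
		}
	}
}

func main() {
	oldSvc := &Service{Type: TypeLoadBalancer, Ports: []ServicePort{{Port: 80, NodePort: 30080}}}
	newSvc := &Service{Type: TypeClusterIP, Ports: []ServicePort{{Port: 80, NodePort: 30080}}}
	wipeUnusedNodePorts(newSvc, oldSvc)
	fmt.Println(newSvc.Ports[0].NodePort)
}
```

Both the old and the new object are needed to tell "left over from
auto-allocation" apart from "explicitly set by the user", which is why a
check like this cannot live in defaulting or conversion.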
Generally try to wave away folks who see a particular event stream
and feel tempted to extrapolate and build tooling that expects the
same underlying resource transition chain to continue to produce a
similar event stream as the underlying components evolve and are
updated. New controllers should not be constrained to be
backwards-compatible with previous versions with regard to Event
emission. This is distinct from the Event type itself, which has the
usual Kubernetes-API compatibility commitments for versioned types.
The EventTTL default has been 1h since 7e258b85bd (Reduce TTL for
events in etcd from 48hrs to 1hr, 2015-03-11, #5315), and remains so
today:
$ git --no-pager log -1 --format='%h %s' origin/master
8e5c02255c Merge pull request #90942 from ii/ii-create-pod%2Bpodstatus-resource-lifecycle-test
$ git --no-pager grep EventTTL: 8e5c02255c cmd/kube-apiserver/app/options/options.go
8e5c02255cc:cmd/kube-apiserver/app/options/options.go: EventTTL: 1 * time.Hour,
In this space [1,2]:
To avoid filling up master's disk, a retention policy is enforced:
events are removed one hour after the last occurrence. To provide
longer history and aggregation capabilities, a third party solution
should be installed to capture events.
...
Note: It is not guaranteed that all events happening in a cluster
will be exported to Stackdriver. One possible scenario when events
will not be exported is when event exporter is not running
(e.g. during restart or upgrade). In most cases it's fine to use
events for purposes like setting up metrics and alerts, but you
should be aware of the potential inaccuracy.
...
To prevent disturbing your workloads, event exporter does not have
resources set and is in the best effort QOS class, which means that
it will be the first to be killed in the case of resource
starvation.
Although that's talking more about export from etcd -> external
storage, and not about cluster components submitting events to etcd.
[1]: https://kubernetes.io/docs/tasks/debug-application-cluster/events-stackdriver/
[2]: https://github.com/kubernetes/website/pull/4155/files#diff-d8eb69c5436aa38b396d4f3ed75e4792R10
As part of externalizing this function to the k8s.io/component-helpers repo,
this commit simplifies the function signature and makes its 2 helpers private
(nodeSelectorRequirementsAsSelector and nodeSelectorRequirementsAsFieldSelector).
* api: structure change
* api: defaulting, conversion, and validation
* [FIX] validation: auto remove second ip/family when service changes to SingleStack
* [FIX] api: defaulting, conversion, and validation
* api-server: clusterIPs alloc, printers, storage and strategy
* [FIX] clusterIPs default on read
* alloc: auto remove second ip/family when service changes to SingleStack
* api-server: repair loop handling for clusterIPs
* api-server: force kubernetes default service into single stack
* api-server: tie dualstack feature flag with endpoint feature flag
* controller-manager: feature flag, endpoint, and endpointSlice controllers handling multi family service
* [FIX] controller-manager: feature flag, endpoint, and endpointSlice controllers handling multi family service
* kube-proxy: feature-flag, utils, proxier, and meta proxier
* [FIX] kubeproxy: call both proxier at the same time
* kubenet: remove forced pod IP sorting
* kubectl: modify describe to include ClusterIPs, IPFamilies, and IPFamilyPolicy
* e2e: fix tests that depend on IPFamily field AND add dual stack tests
* e2e: fix expected error message for ClusterIP immutability
* add integration tests for dualstack
The third phase of dual stack is a very complex change in the API:
it introduces dual-stack Services. The main changes are:
- It pluralizes the Service IPFamily field to IPFamilies,
and removes the singular field.
- It introduces a new field IPFamilyPolicyType that can take
3 values to express the "dual-stack(mad)ness" of the cluster:
SingleStack, PreferDualStack and RequireDualStack
- It pluralizes ClusterIP to ClusterIPs.
The goal is to add coverage to the services API operations,
taking into account the 6 different modes a cluster can have:
- single stack: IPv4 or IPv6 (as of today)
- dual stack: IPv4 only, IPv6 only, IPv4 - IPv6, IPv6 - IPv4
* [FIX] add integration tests for dualstack
* generated data
* generated files
Co-authored-by: Antonio Ojea <aojea@redhat.com>
fixed syntax, wrote a test
fixed a test
.
1
Update staging/src/k8s.io/apimachinery/pkg/util/intstr/intstr_test.go
Co-Authored-By: Joel Speed <Joel.speed@hotmail.co.uk>
added test
.
fix
fix test
fixed a test
gofmt
lint
fix
function name
validation fix
.
godocs added
.
Fix ingress validation so that it validates the rules of an ingress that
specifies a wildcard host. Commit 60f4fbf4f2
added an inopportune continue statement that caused this validation to be
skipped. For backwards compatibility, this change restores validation for
v1 of the api but still skips it on v1beta1.
* pkg/apis/networking/validation/validation.go (IngressValidationOptions):
Add AllowInvalidWildcardHostRule field to indicate that validation of rules
should be skipped for ingresses that specify wildcard hosts.
(ValidateIngressCreate): Set AllowInvalidWildcardHostRule to true if the
request is using the v1beta1 API version.
(ValidateIngressUpdate): Set AllowInvalidWildcardHostRule to true if the
request or old ingress is using the v1beta1 API version.
(validateIngressRules): Don't skip validation of the ingress rules unless
the ingress has a wildcard host and AllowInvalidWildcardHostRule is true.
(allowInvalidWildcardHostRule): New helper for ValidateIngressCreate and
ValidateIngressUpdate.
* pkg/apis/networking/validation/validation_test.go
(TestValidateIngressCreate, TestValidateIngressUpdate): Add test cases to
ensure that validation is performed on v1 objects and skipped on v1beta1
objects for backwards compatibility.
(TestValidateIngressTLS): Specify PathType so that the test passes.
Co-authored-by: jordan@liggitt.net
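The skip condition described for validateIngressRules can be sketched as
follows. This is only an illustration: `IngressValidationOptions` and
`AllowInvalidWildcardHostRule` come from the description above, while
`skipRuleValidation` is a hypothetical helper and the wildcard check is
deliberately simplified:

```go
package main

import (
	"fmt"
	"strings"
)

// IngressValidationOptions mirrors the option named in the commit message.
type IngressValidationOptions struct {
	AllowInvalidWildcardHostRule bool
}

// skipRuleValidation (hypothetical) reports whether rule validation may be
// skipped: only for wildcard hosts, and only when the option allows it.
func skipRuleValidation(host string, opts IngressValidationOptions) bool {
	return strings.Contains(host, "*") && opts.AllowInvalidWildcardHostRule
}

func main() {
	v1 := IngressValidationOptions{AllowInvalidWildcardHostRule: false}
	v1beta1 := IngressValidationOptions{AllowInvalidWildcardHostRule: true}
	fmt.Println(skipRuleValidation("*.example.com", v1))
	fmt.Println(skipRuleValidation("*.example.com", v1beta1))
	fmt.Println(skipRuleValidation("www.example.com", v1beta1))
}
```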
Add a test that verifies that an ingress with an empty TLS value or with a
TLS value that specifies an empty list of hosts passes validation.
* pkg/apis/networking/validation/validation_test.go
(TestValidateEmptyIngressTLS): New test.
And give ownership to pkg/scheduler/framework/plugins/volumebinding
Signed-off-by: Aldo Culquicondor <acondor@google.com>
Change-Id: I4bd89b1745a2be0e458601056ab905bdd6692195
The promotion to beta missed some code locations. The owner also
changed since the feature was initially designed and implemented.
The "is handled by an external CSI driver" to "by certain external CSI
drivers" change is supposed to avoid the misconception that this
volume type will work with arbitrary CSI drivers.
These changes add a new field, setHostnameAsFQDN, to the PodSpec. This
field is a bool that indicates whether the pod's FQDN should be set as
its hostname.
This is PART1 of the changes to enable KEP #1797 and addresses #91036
This was added to staging/src/k8s.io/api/storage/v1beta1/types.go but
unintentionally left out of the internal type. The two docs should
match, because it is uncertain which one developers will look at.
The original "iff" for "if and only if" is unnecessary because the
comment about "otherwise false" avoids ambiguity.
If the dual-stack flag is enabled and the cluster is single stack IPv6,
the allocator logic for service clusterIP does not properly handle rejecting
a request for an IPv4 family. Return a 422 Invalid on the ipFamily field
when the dual stack flag is on (as it would be once it hits beta) and the
cluster is configured for single-stack IPv6.
The family is now defaulted or cleared in BeforeCreate/BeforeUpdate,
and is either inherited from the previous object (if nil or unchanged),
or set to the default strategy's family as necessary. The existing
family defaulting when cluster ip is provided remains in the api
section. We add additional family defaulting at the time we allocate
the IP to ensure that IPFamily is a consequence of the ClusterIP
and prevent accidental reversion. This defaulting also ensures that
old clients that submit a nil IPFamily for non ClusterIP services
receive a default.
To properly handle validation, make the strategy and the validation code
path condition on which configuration options are passed to service
storage. Move validation and preparation logic inside the strategy where
it belongs. Service validation is now dependent on the configuration of
the server, and as such ValidateConditionService needs to know what the
allowed families are.
ingress: use new serviceBackend split
ingress: remove all v1beta1 restrictions on creation
This change removes creation and update restrictions enforced by
k8s 1.18 for not allowing resource backends. Paths are no longer
required to be valid regex and a PathType is now user-specified
and no longer defaulted.
Also remove all TODOs in staging/net/v1 types
Signed-off-by: Christopher M. Luciano <cmluciano@us.ibm.com>
NetworkPolicyPeer in types has an outdated comment from the
times when it only supported ingress rules. Update the comment
to reflect the current usage of the field.
As this is a local object reference from a global object, referencing a ConfigMap would not be possible. Controller specific custom resources are a much better fit here, allowing for better validation.
PodOverhead is now a beta feature and set to true by default. No need to
override to true during testing.
Signed-off-by: Eric Ernst <eric.ernst@intel.com>
This implementation allows Pod to request multiple hugepage resources
of different size and mount hugepage volumes using storage medium
HugePages-<size>, e.g.
    spec:
      containers:
        resources:
          requests:
            hugepages-2Mi: 2Mi
            hugepages-1Gi: 2Gi
        volumeMounts:
        - mountPath: /hugepages-2Mi
          name: hugepage-2mi
        - mountPath: /hugepages-1Gi
          name: hugepage-1gi
        ...
      volumes:
      - name: hugepage-2mi
        emptyDir:
          medium: HugePages-2Mi
      - name: hugepage-1gi
        emptyDir:
          medium: HugePages-1Gi
NOTE: This is an alpha feature.
Feature gate HugePageStorageMediumSize must be enabled for it to work.
Unit test for updating container hugepage limit
Add warning message about ignoring case.
Update error handling about hugepage size requirements
Signed-off-by: sewon.oh <sewon.oh@samsung.com>
Remove the validation for pre-allocated hugepages on node level.
Validation is currently the only thing making it impossible to use
pre-allocated huge pages in more than one size.
We now have quite a few reports from real users that this feature is
welcome.
The validation had an unnecessary nested loop, and it also gave misleading
error feedback: if any one policyTypes entry was invalid, every entry was
reported as unsupported.
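A single-pass check that reports only the offending entries might look
like the sketch below. `unsupportedPolicyTypes` is a hypothetical helper
for illustration, not the actual validation function, and it assumes the
supported values are "Ingress" and "Egress":

```go
package main

import "fmt"

// unsupportedPolicyTypes (hypothetical) returns only the entries that are
// not supported, using one loop over the input.
func unsupportedPolicyTypes(policyTypes []string) []string {
	allowed := map[string]bool{"Ingress": true, "Egress": true}
	var bad []string
	for _, t := range policyTypes {
		if !allowed[t] {
			bad = append(bad, t)
		}
	}
	return bad
}

func main() {
	fmt.Println(unsupportedPolicyTypes([]string{"Ingress", "Bogus", "Egress"}))
}
```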
This updates EndpointSlice port validation to mirror the validation
already in use for Service and Endpoint ports. This is required to
ensure all valid Service ports can be mapped directly to EndpointSlice
ports.
add host file write for podIPs
update tests
remove import alias
update type check
update type check
remove import alias
update open api spec
add tests
update test
add tests
address review comments
update imports
remove todo and import alias
* Fix lint errors related to receiver name
Ref #68026
* Fix lint errors related to comments
Ref #68026
* Fix package name in comments
Ref #68026
* Rename Cpu to CPU
Ref #68026
* Fix lint errors related to naming convention
Ref #68026
* Remove deprecated field
DoNotUse_ExternalID has been deprecated and is not in use anymore.
It has been removed to fix lint errors related to underscores in field
names.
Ref #68026, #61966
* Include pkg/apis/core in golint check
Ref #68026
* Rename var to fix lint errors
Ref #68026
* Revert "Remove deprecated field"
This reverts commit 75e9bfc168077fcb9346e334b59d60a2c997735b.
Ref #82919
* Remove math from godoc
Ref #82919, #68026
* Remove underscore from var name
Ref #68026
* Rename var in staging core api type
Ref #68026
Errors from staticcheck:
cmd/kube-scheduler/app/server.go:297:27: prometheus.Handler is deprecated: Please note the issues described in the doc comment of InstrumentHandler. You might want to consider using promhttp.Handler instead. (SA1019)
pkg/apis/scheduling/v1alpha1/defaults.go:27:6: func addDefaultingFuncs is unused (U1000)
pkg/apis/scheduling/v1beta1/defaults.go:27:6: func addDefaultingFuncs is unused (U1000)
test/e2e/scheduling/predicates.go:757:6: func verifyReplicasResult is unused (U1000)
test/e2e/scheduling/predicates.go:765:6: func getPodsByLabels is unused (U1000)
test/e2e/scheduling/predicates.go:772:6: func runAndKeepPodWithLabelAndGetNodeName is unused (U1000)
test/e2e/scheduling/limit_range.go:172:3: this value of pod is never used (SA4006)
test/e2e/scheduling/limit_range.go:177:3: this value of pod is never used (SA4006)
test/e2e/scheduling/limit_range.go:196:3: this value of pod is never used (SA4006)
test/e2e/scheduling/limit_range.go:201:3: this value of pod is never used (SA4006)
test/e2e/scheduling/limit_range.go:240:3: this value of pod is never used (SA4006)
test/e2e/scheduling/taints.go:428:13: this value of err is never used (SA4006)
test/e2e/scheduling/ubernetes_lite.go:219:2: this value of pods is never used (SA4006)
test/integration/scheduler/extender_test.go:78:4: this value of resp is never used (SA4006)
test/integration/volumescheduling/volume_binding_test.go:529:15: this result of append is never used, except maybe in other appends (SA4010)
test/integration/volumescheduling/volume_binding_test.go:538:15: this result of append is never used, except maybe in other appends (SA4010)
This should fix a bug that could break masters when the EndpointSlice
feature gate was enabled. This was all tied to how the apiserver creates
and manages its own services and endpoints (or in this case endpoint
slices). Consumers of endpoint slices also need to know about the
corresponding service. Previously we were trying to set an owner
reference here for this purpose, but that came with potential downsides
and increased complexity. This commit changes behavior of the apiserver
endpointslice integration to set the service name label instead of owner
references, and simplifies consumer logic to reference that (both are
set by the EndpointSlice controller).
Additionally, this should fix a bug with the EndpointSlice GenerateName
value that had previously been set with a "." as a suffix.
Currently, the character limit for the usernames set in the RunAsUserName is 20,
which is too low, considering that "ContainerAdministrator" is a valid username and
it is longer than 20 characters. A user should be able to run containers as
Administrator, if needed.
According to [1], Logon names can be up to 104 characters. The previous limit
only applies to local user accounts for the local system.
[1] https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-2000-server/bb726984(v=technet.10)
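For illustration, a length check along these lines could look like the
sketch below. The constant value 104 is an assumption taken from the logon
name limit cited above (the text does not state the exact new limit), and
`validateUserNameLength` is a hypothetical helper rather than the real
validation function:

```go
package main

import "fmt"

// maxUserNameLength is an assumed limit based on the 104-character logon
// name limit cited in the commit message.
const maxUserNameLength = 104

// validateUserNameLength (hypothetical) rejects empty and over-long names.
func validateUserNameLength(name string) error {
	if len(name) == 0 {
		return fmt.Errorf("user name must not be empty")
	}
	if len(name) > maxUserNameLength {
		return fmt.Errorf("user name must be at most %d characters, got %d", maxUserNameLength, len(name))
	}
	return nil
}

func main() {
	// "ContainerAdministrator" is 22 characters, so it failed the old limit of 20.
	fmt.Println(len("ContainerAdministrator"), validateUserNameLength("ContainerAdministrator"))
}
```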
When adding CSIDriver.Spec.VolumeLifecycleModes, the defaulting in
pkg/apis/storage/fuzzer/fuzzer.go did not quite match the one from
pkg/apis/storage/v1beta1/defaults.go, causing a test failure when the
corresponding feature gate is enabled.
This ensures that users get a good error message early on when trying
to do something that isn't okay:
$ kubectl create -f csi-hostpath-driverinfo.yaml
The CSIDriver "hostpath.csi.k8s.io" is invalid: spec.volumeLifecycleModes: Unsupported value: "foobar": supported values: "persistent", "ephemeral"
Using a "normal" CSI driver for an inline ephemeral volume may have
unexpected and potentially harmful effects when the driver gets a
NodePublishVolume call that it isn't expecting. To prevent that mistake,
driver deployments for a driver that supports such volumes must:
- deploy a CSIDriver object for the driver
- list "ephemeral" as one of the supported modes
The default is "persistent", so existing deployments continue to work
and are automatically protected against incorrect usage.
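The defaulting and the early validation described above can be sketched
roughly as follows. The helper names (`defaultModes`, `validateModes`,
`supportsEphemeral`) are hypothetical and the modes are modeled as plain
strings, so this is an illustration of the behavior rather than the real
API code:

```go
package main

import "fmt"

const (
	modePersistent = "persistent"
	modeEphemeral  = "ephemeral"
)

// defaultModes (hypothetical) applies the "persistent" default when no
// modes are listed, so existing CSIDriver objects keep working.
func defaultModes(modes []string) []string {
	if len(modes) == 0 {
		return []string{modePersistent}
	}
	return modes
}

// validateModes (hypothetical) rejects any value outside the supported set.
func validateModes(modes []string) error {
	for _, m := range modes {
		if m != modePersistent && m != modeEphemeral {
			return fmt.Errorf("spec.volumeLifecycleModes: unsupported value %q: supported values: %q, %q",
				m, modePersistent, modeEphemeral)
		}
	}
	return nil
}

// supportsEphemeral reports whether a driver opted in to inline ephemeral volumes.
func supportsEphemeral(modes []string) bool {
	for _, m := range defaultModes(modes) {
		if m == modeEphemeral {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(validateModes([]string{"foobar"}))
	fmt.Println(supportsEphemeral(nil))
}
```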
This commit contains the API change. Generated code and manual code
which uses the new API follow.
Adds the field RunAsUserName in the WindowsSecurityContextOptions type,
which is used in PodSecurityContext and SecurityContext.
This field needs to allow for a valid set of usernames allowed for
Windows containers. It must have the format "U
This commit also validates the runAsUserName field, making sure that it is valid,
having the format DOMAIN\USER (case insensitive), where DOMAIN\ is optional and
has to be a valid NetBios or DNS domain name.
For more information about the restrictions on the DOMAIN and USER parts, look here: [1] [2]
Adds the WindowsRunAsUserName alpha feature gate. By default, it is disabled.
If the feature gate is not enabled, the WindowsOptions.RunAsUserName field
will be dropped from both the PodSecurityContext and container
SecurityContext.
Co-Authored-By: Claudiu Belu <cbelu@cloudbasesolutions.com>
[1] https://support.microsoft.com/en-us/help/909264/naming-conventions-in-active-directory-for-computers-domains-sites-and
[2] https://docs.microsoft.com/en-us/powershell/module/microsoft.powershell.localaccounts/new-localuser?view=powershell-5.1
Add support for scaling to zero pods
minReplicas is allowed to be zero
condition is set once
Based on https://github.com/kubernetes/kubernetes/pull/61423
set original valid condition
add scale to/from zero and invalid metric tests
Scaling up from zero pods ignores tolerance
validate metrics when minReplicas is 0
Document HPA behaviour when minReplicas is 0
Documented minReplicas field in autoscaling APIs
Update the unit tests to include checks for incorrect APIGroup type in
PVC DataSource and change the name of the feature gate to be more clear:
s/VolumeDataSource/VolumePVCDataSource/
* fix duplicated imports of api/core/v1
* fix duplicated imports of client-go/kubernetes
* fix duplicated imports of rest code
* change import name to a more reasonable one
This PR is the first step to transition CSINodeInfo and CSIDriver
CRDs to in-tree APIs. It adds them to the existing API group
"storage.k8s.io" as core storage APIs.
The trailing period tells the resolver to stop immediately instead
of trying recursively. With that said, trailing period should be
acceptable in searches.
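One way to accept a trailing period is to strip a single trailing "."
before applying the usual subdomain check. The sketch below assumes a
simplified DNS subdomain pattern and ignores length limits;
`validSearchDomain` is a hypothetical helper, not the actual validation
code:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// subdomainRE is a simplified lowercase DNS subdomain pattern (length
// limits are intentionally not checked in this sketch).
var subdomainRE = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`)

// validSearchDomain (hypothetical) tolerates exactly one trailing period.
func validSearchDomain(domain string) bool {
	return subdomainRE.MatchString(strings.TrimSuffix(domain, "."))
}

func main() {
	fmt.Println(validSearchDomain("example.com."))
	fmt.Println(validSearchDomain("example.com.."))
}
```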
Moved all flag code from `staging/src/k8s.io/apiserver/pkg/util/[flag|globalflag]` to `component-base/cli/[flag|globalflag]` except for the term function because of unwanted dependencies.
Update the NetworkPolicy `policyTypes` definition in the spec documentation so it's
clear there are only three options: "Ingress", "Egress", and
"Ingress,Egress".
- s/objet/object/
- A relying party (validating a token) may not have access to the
resource named in the `BoundObjectRef`; only the API server can be asserted to
have access.
Note this in the field's documentation.
- Move from the old github.com/golang/glog to k8s.io/klog
- klog has explicit InitFlags() so we add them as necessary
- we update the other repositories that we vendor that made a similar
change from glog to klog
* github.com/kubernetes/repo-infra
* k8s.io/gengo/
* k8s.io/kube-openapi/
* github.com/google/cadvisor
- Entirely remove all references to glog
- Fix some tests by explicit InitFlags in their init() methods
Change-Id: I92db545ff36fcec83afe98f550c9e630098b3135
This patch introduces glusterfsPersistentVolumeSource in addition
to glusterfsVolumeSource. All fields remain the same as in glusterfsVolumeSource,
with the addition of a new field
called `EndpointsNamespace` to define the namespace of the endpoint in the
spec.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
The comment will ensure that only imports with
this path will compile. This is needed to make sure
that the vanity url k8s.io is used as the import path
instead of github.com.
See https://golang.org/doc/go1.4#canonicalimports
for more details.
Adds "MayRunAs" value among other group strategies. This strategy
allows defining a certain range of GIDs for FSGroupStrategy and
SupplementalGroupStrategy in a PSP.
This new strategy works similarly to the "MustRunAs" one, except that
when no GID is specified in a pod/container security context then no
GID is generated for the respective containers.
Resolves #56173
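The difference between the two strategies can be sketched as below. All
names here (`GroupStrategy`, `IDRange`, `resolveGID`) are hypothetical
stand-ins, and the sketch assumes that "MustRunAs" defaults to the minimum
of the first range when no GID is given:

```go
package main

import "fmt"

type GroupStrategy string

const (
	MustRunAs GroupStrategy = "MustRunAs"
	MayRunAs  GroupStrategy = "MayRunAs"
)

// IDRange is an inclusive range of allowed GIDs.
type IDRange struct{ Min, Max int64 }

// inRanges reports whether gid falls inside any allowed range.
func inRanges(gid int64, ranges []IDRange) bool {
	for _, r := range ranges {
		if gid >= r.Min && gid <= r.Max {
			return true
		}
	}
	return false
}

// resolveGID (hypothetical) validates a requested GID against the ranges.
// With no request, MustRunAs generates a default (assumed: first range
// minimum) while MayRunAs generates nothing.
func resolveGID(strategy GroupStrategy, ranges []IDRange, requested *int64) (*int64, error) {
	if requested != nil {
		if !inRanges(*requested, ranges) {
			return nil, fmt.Errorf("gid %d is not in an allowed range", *requested)
		}
		return requested, nil
	}
	if strategy == MustRunAs && len(ranges) > 0 {
		gid := ranges[0].Min
		return &gid, nil
	}
	return nil, nil
}

func main() {
	ranges := []IDRange{{Min: 1000, Max: 2000}}
	must, _ := resolveGID(MustRunAs, ranges, nil)
	may, _ := resolveGID(MayRunAs, ranges, nil)
	fmt.Println(*must, may == nil)
}
```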
One scenario where nodeName can change for the same ip address is if
the endpoints are in hostNetwork mode and nodes are being added/deleted.
With the current validation check, if endpoints controller misses a pod
delete event, future endpoint updates will never succeed.
removed unused helper functions
Add a blank line between the comment tags and the package name in doc.go, so
that comment tags such as '+k8s:deepcopy-gen=package' do not show up
in GoDoc.
Automatic merge from submit-queue (batch tested with PRs 68171, 67945, 68233). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Move the CloudControllerManagerConfiguration to an API group in `cmd/`
**What this PR does / why we need it**:
This PR is the last piece of https://github.com/kubernetes/kubernetes/issues/67233.
It moves the `CloudControllerManagerConfiguration` to its own `cloudcontrollermanager.config.k8s.io` config API group, but unlike the other components this API group is "private" (only available in `k8s.io/kubernetes`, which limits consumer base), as it's located entirely in `cmd/` vs a staging repo.
This decision was made for now as we're not sure what the story for the ccm loading ComponentConfig files is, and probably a "real" file-loading ccm will never exist in core, only helper libraries. Eventually the ccm will only be a library in any case, and implementors will/can use the base types the ccm library API group provides. It's probably good to note that there is no practical implication of this change as the ccm **cannot** read ComponentConfig files. Hence the code move isn't user-facing.
With this change, we're able to remove `pkg/apis/componentconfig`, as this was the last consumer. That is hence done in this PR as well (so the move is easily visible in git, vs first one "big add" then a "big remove"). The only piece of code that was used was the flag helper structs, so I moved them to `pkg/util/flag` that I think makes sense for now.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
ref: kubernetes/community#2354
**Special notes for your reviewer**:
This PR builds on top of (first two commits, marked as `Co-authored by: @stewart-yu`) https://github.com/kubernetes/kubernetes/pull/67689
**Release note**:
```release-note
NONE
```
/assign @liggitt @sttts @thockin @stewart-yu
1. If TTLAfterFinished feature is enabled, the value should be non-negative.
2. If TTLAfterFinished feature is disabled, the field value should not
be kept.
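The two rules can be sketched as follows. `JobSpec` here is a pared-down
stand-in with a single assumed field, and `dropDisabledTTL` and
`validateTTL` are hypothetical helper names used only for illustration:

```go
package main

import "fmt"

// JobSpec is a simplified stand-in holding only the TTL field.
type JobSpec struct {
	TTLSecondsAfterFinished *int32
}

// dropDisabledTTL (hypothetical) clears the field when the feature is off,
// so the value is not kept.
func dropDisabledTTL(spec *JobSpec, featureEnabled bool) {
	if !featureEnabled {
		spec.TTLSecondsAfterFinished = nil
	}
}

// validateTTL (hypothetical) requires a non-negative value when set.
func validateTTL(spec *JobSpec) error {
	if spec.TTLSecondsAfterFinished != nil && *spec.TTLSecondsAfterFinished < 0 {
		return fmt.Errorf("ttlSecondsAfterFinished must be greater than or equal to 0")
	}
	return nil
}

func main() {
	ttl := int32(-1)
	spec := &JobSpec{TTLSecondsAfterFinished: &ttl}
	fmt.Println(validateTTL(spec))
	dropDisabledTTL(spec, false)
	fmt.Println(spec.TTLSecondsAfterFinished == nil)
}
```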
Automatic merge from submit-queue (batch tested with PRs 63011, 68089, 67944, 68132). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Support both directory and block device for local volume plugin FileSystem VolumeMode
xref: [local storage dynamic provisioning design #1914](https://github.com/kubernetes/community/pull/1914)
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
Support both directory and block device for local volume plugin FileSystem VolumeMode
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Replace scale down window
**What this PR does / why we need it**:
Replace scale down forbidden window with scale down stabilization window.
This allows scale down based on more than one sample, to avoid rapidly changing size up and down for controllers with fluctuating load.
A bit more in https://docs.google.com/document/d/1IdG3sqgCEaRV3urPLA29IDudCufD89RYCohfBPNeWIM
This PR is a copy of #67771 with comments resolved.
**Release note**:
```release-note
Replace scale down forbidden window with scale down stabilization window. Rather than waiting a fixed period of time between scale downs, HPA now scales down to the highest recommendation it made during the scale down stabilization window.
```
Automatic merge from submit-queue (batch tested with PRs 67397, 68019). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Fix conversion for autoscaling/v1 ObjectMetricSource and add fuzzer
**What this PR does / why we need it**:
Selectors in ObjectMetricSources weren't being persisted through roundtrip conversions, and this wasn't caught because we had no fuzzer testing MetricIdentifier selectors.
**Which issue(s) this PR fixes**:
none
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Increase Horizontal Pod Autoscaler update frequency to every 15s
**What this PR does / why we need it**:
PR increases Horizontal Pod Autoscaler default update interval (30s -> 15s). It will improve HPA reaction time for metric changes.
**Release note**:
```release-note
Increase Horizontal Pod Autoscaler default update interval (30s -> 15s). It will improve HPA reaction time for metric changes.
```
Automatic merge from submit-queue (batch tested with PRs 64283, 67910, 67803, 68100). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Add a ProcMount option to the SecurityContext & AllowedProcMountTypes to PodSecurityPolicy
So there is a bit of a chicken and egg problem here in that the CRI runtimes will need to implement this for there to be any sort of e2e testing.
**What this PR does / why we need it**: This PR implements design proposal https://github.com/kubernetes/community/pull/1934. This adds a ProcMount option to the SecurityContext and AllowedProcMountTypes to PodSecurityPolicy
Relies on https://github.com/google/cadvisor/pull/1967
**Release note**:
```release-note
ProcMount added to SecurityContext and AllowedProcMounts added to PodSecurityPolicy to allow paths in the container's /proc to not be masked.
```
cc @Random-Liu @mrunalp
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Change CPU sample sanitization in HPA.
**What this PR does / why we need it**:
Change CPU sample sanitization in HPA.
Ignore samples if:
- Pod is being initialized - 5 minutes from start defined by flag
- pod is unready
- pod is ready but full window of metric hasn't been collected since
transition
- Pod is initialized - 5 minutes from start defined by flag:
- Pod has never been ready after initial readiness period.
**Release notes:**
```release-note
Improve CPU sample sanitization in HPA by taking metric's freshness into account.
```
Ignore samples if:
- Pod is being initialized - 5 minutes from start defined by flag
- pod is unready
- pod is ready but full window of metric hasn't been collected since
transition
- Pod is initialized - 5 minutes from start defined by flag:
- Pod has never been ready after initial readiness period.
Automatic merge from submit-queue (batch tested with PRs 67745, 67432, 67569, 67825, 67943). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Move volume dynamic provisioning scheduling to beta
**What this PR does / why we need it**:
* Combine feature gate VolumeScheduling and DynamicProvisioningScheduling into one
* Add allowedTopologies description in kubectl
**Special notes for your reviewer**:
Wait until related e2e and downside plugins are ready.
/hold
**Release note**:
```release-note
Move volume dynamic provisioning scheduling to beta (ACTION REQUIRED: The DynamicProvisioningScheduling alpha feature gate has been removed. The VolumeScheduling beta feature gate is still required for this feature)
```
Selectors in ObjectMetricSources weren't being persisted through roundtrip conversions, and this wasn't caught because we had no fuzzer testing MetricIdentifier selectors.
Automatic merge from submit-queue (batch tested with PRs 67694, 64973, 67902). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
SCTP support implementation for Kubernetes
**What this PR does / why we need it**: This PR adds SCTP support to Kubernetes, including Service, Endpoint, and NetworkPolicy.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #44485
**Special notes for your reviewer**:
**Release note**:
```release-note
SCTP is now supported as additional protocol (alpha) alongside TCP and UDP in Pod, Service, Endpoint, and NetworkPolicy.
```
Automatic merge from submit-queue (batch tested with PRs 64597, 67854, 67734, 67917, 67688). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Allow ImageReview backend to add audit annotations.
**What this PR does / why we need it**:
This can be used to create annotations that will allow auditing of the created
pods.
The change also introduces "fail open" audit annotations in addition to the
previously existing pod annotation for fail open. The pod annotations for
fail open will be deprecated soon.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
Allow ImageReview backend to return annotations to be added to the created pod.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Added set and map structural validation for AllowedTopologies
**What this PR does / why we need it**: Adding structural validation to AllowedTopologies field in StorageClass.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #66184
**Release note**:
```release-note
AllowedTopologies field inside StorageClass is now validated against set and map semantics. Specifically, there cannot be duplicate TopologySelectorTerms, MatchLabelExpressions keys, and TopologySelectorLabelRequirement Values.
```
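As a rough illustration of what "set and map semantics" means here, the sketch below rejects duplicate keys within one term and duplicate values within one requirement. The types `labelRequirement` and `selectorTerm` and the function `validateTerm` are hypothetical, simplified stand-ins, not the real API types or the validation code from this PR, and the check for duplicate terms is omitted.

```go
package main

import "fmt"

// labelRequirement and selectorTerm are hypothetical, simplified stand-ins
// for the real topology selector types.
type labelRequirement struct {
	Key    string
	Values []string
}

type selectorTerm struct {
	MatchLabelExpressions []labelRequirement
}

// validateTerm enforces map semantics on keys (each key appears once per term)
// and set semantics on values (each value appears once per requirement).
func validateTerm(term selectorTerm) []string {
	var errs []string
	seenKeys := map[string]bool{}
	for _, req := range term.MatchLabelExpressions {
		if seenKeys[req.Key] {
			errs = append(errs, fmt.Sprintf("duplicate key %q", req.Key))
		}
		seenKeys[req.Key] = true

		seenValues := map[string]bool{}
		for _, v := range req.Values {
			if seenValues[v] {
				errs = append(errs, fmt.Sprintf("duplicate value %q for key %q", v, req.Key))
			}
			seenValues[v] = true
		}
	}
	return errs
}

func main() {
	term := selectorTerm{MatchLabelExpressions: []labelRequirement{
		{Key: "zone", Values: []string{"us-west-2a", "us-west-2a"}},
		{Key: "zone", Values: []string{"us-west-2b"}},
	}}
	for _, e := range validateTerm(term) {
		fmt.Println(e)
	}
}
```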
The requested Service Protocol is checked against the supported protocols of GCE Internal LB. The supported protocols are TCP and UDP.
SCTP is not supported by OpenStack LBaaS. If SCTP is requested in a Service with type=LoadBalancer, the request is rejected. Comment style is also corrected.
SCTP is not allowed for LoadBalancer Service and for HostPort. Kube-proxy can be configured not to start listening on the host port for SCTP: see the new SCTPUserSpaceNode parameter
Changed the vendor from github.com/nokia/sctp to github.com/ishidawataru/sctp, i.e. from now on we use the upstream version.
netexec.go compilation fixed. Various test cases fixed
SCTP related conformance tests removed. Netexec's pod definition and Dockerfile are updated to expose the new SCTP port(8082)
SCTP related e2e test cases are removed as the e2e test systems do not support SCTP
sctp related firewall config is removed from cluster/gce/util.sh. Variable name sctp_addr is corrected to sctpAddr in pkg/proxy/ipvs/proxier.go
cluster/gce/util.sh is copied from master
This extends the Kubelet to create and periodically update leases in a
new kube-node-lease namespace. Based on [KEP-0009](https://github.com/kubernetes/community/blob/master/keps/sig-node/0009-node-heartbeat.md),
these leases can be used as a node health signal, and will allow us to
reduce the load caused by over-frequent node status reporting.
- add NodeLease feature gate
- add kube-node-lease system namespace for node leases
- add Kubelet option for lease duration
- add Kubelet-internal lease controller to create and update lease
- add e2e test for NodeLease feature
- modify node authorizer and node restriction admission controller
to allow Kubelets access to corresponding leases
Automatic merge from submit-queue (batch tested with PRs 66916, 67252, 67794, 67619, 67328). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix HPA sample sanitization
**What this PR does / why we need it**: @mwielgus pointed out a case when HPA fails as a result of my changes to HPA algorithm:
- Have pods that use a lot of CPU during initialization and become ready right after they initialize,
- Trigger a scale up,
- When new pods become ready we will count their usage (even though it's not related to any work that needs doing),
- This triggers another scale up, even though existing pods can handle the work with no problem.
The fix is:
- Use all samples for non-cpu metrics.
- Only use CPU samples if:
  - Pod is ready and was started more than 2 minutes ago, or
  - Pod is unready and last readiness change happened more than 10s after it was started.
Reasoning behind this in: https://docs.google.com/document/d/1UdtYedhmCxjaJIQi6hwJMY0eHQQKxlVD8lSHZC1BPOA/edit
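The sample-selection rule listed above can be sketched as a single predicate. This is a minimal illustration, not the HPA controller's code: `podSample`, `useSample`, and the two constants are hypothetical names, and the thresholds are hard-coded here to the 2 minute and 10 second values from the list.

```go
package main

import (
	"fmt"
	"time"
)

// Assumed thresholds, taken from the rule described in the PR text.
const (
	cpuInitializationPeriod = 2 * time.Minute
	initialReadinessDelay   = 10 * time.Second
)

// podSample is a hypothetical, simplified view of one pod's metric sample.
type podSample struct {
	metric               string        // e.g. "cpu" or "memory"
	ready                bool          // current readiness of the pod
	age                  time.Duration // time since the pod was started
	readinessChangeAfter time.Duration // offset of the last readiness change from pod start
}

// useSample reports whether the sample should be fed into the scaling decision.
func useSample(s podSample) bool {
	if s.metric != "cpu" {
		return true // all samples are used for non-CPU metrics
	}
	if s.ready {
		return s.age > cpuInitializationPeriod
	}
	return s.readinessChangeAfter > initialReadinessDelay
}

func main() {
	fresh := podSample{metric: "cpu", ready: true, age: 30 * time.Second}
	settled := podSample{metric: "cpu", ready: true, age: 5 * time.Minute}
	fmt.Println("fresh pod used:", useSample(fresh))
	fmt.Println("settled pod used:", useSample(settled))
}
```

Keeping the rule in one pure function makes it easy to table-test the boundary cases without a running cluster.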
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
**Special notes for your reviewer**:
**Release note**:
```release-note
Replace scale up forbidden window with disregarding CPU samples collected when pod was initializing.
```
Duration of initialization taint on CPU and window of initial readiness
setting controlled by flags.
Adding API violation exceptions following example of e50340ee23
Automatic merge from submit-queue (batch tested with PRs 67399, 67471, 66815, 67301, 55840). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
use NameIsDNSSubdomain validation from staging
**What this PR does / why we need it**:
> // TODO update all references to these functions to point to the apimachineryvalidation ones
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
xref #67219
**Special notes for your reviewer**:
/cc seans3
@kubernetes/sig-apps-pr-reviews
@kubernetes/sig-api-machinery-pr-reviews
**Release note**:
```release-note
None
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
added serviceAccountName to field selectors
What this PR does / why we need it:
Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #66114
Special notes for your reviewer:
```release-note
NONE
```
This replaces the following path-label munger config, except
we're using kind/api-change for everything instead of two
different kind/ labels
```
^pkg/api/([^/]+/)?types.go$ kind/api-change
^pkg/api/([^/]+/)?register.go$ kind/new-api
^pkg/apis/[^/]+/([^/]+/)?types.go$ kind/api-change
^pkg/apis/[^/]+/([^/]+/)?register.go$ kind/new-api
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Promote ShareProcessNamespace to beta
**What this PR does / why we need it**: The ability to configure PID namespace sharing per-pod was added as an alpha feature in 1.10. This promotes the feature to beta and makes the feature available by default.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
WIP #1615
**Special notes for your reviewer**:
/assign @yujuhong
**Release note**:
```release-note
The PodShareProcessNamespace feature to configure PID namespace sharing within a pod has been promoted to beta.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
refuse serviceaccount projection volume request when pod has no serviceaccount bound
**What this PR does / why we need it**:
Currently, if a user starts a cluster with the ServiceAccount admission plugin disabled and then creates a Pod
like this:
```
kind: Pod
apiVersion: v1
metadata:
  labels:
    run: nginx
  name: busybox2
spec:
  containers:
  - image: gcr.io/google-containers/nginx
    name: nginx
    volumeMounts:
    - mountPath: /var/run/secrets/tokens
      name: token
  - image: ubuntu
    name: ttt
    volumeMounts:
    - mountPath: /var/run/secrets/tokens
      name: token
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 30; done;" ]
  volumes:
  - name: token
    projected:
      sources:
      - serviceAccountToken:
          path: tokenPath
          expirationSeconds: 6000
          audience: gakki-audiences
```
The pod creation will fail with error info like:
Events:
```
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 23s default-scheduler Successfully assigned office/busybox2 to 127.0.0.1
Warning FailedMount 8s (x6 over 23s) kubelet, 127.0.0.1 MountVolume.SetUp failed for volume "token" : failed to fetch token: resource name may not be empty
```
We should refuse the projection request earlier. This patch fixes this.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
improve serviceaccount projected volume validation error info
**What this PR does / why we need it**:
Fix a small bug here:
we should use srcPath instead of fldPath, as the other projected volume sources do, so that the error
reports which source triggered it.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 67042, 66480, 67053). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
ensure MatchNodeSelectorTerms() runs statelessly
**What this PR does**:
Fix sorting behavior in selector.go:
- move sorting from NewRequirement() out to String()
- add related unit tests
- add unit tests in one of outer callers (pkg/apis/core/v1/helper)
**Why we need it**:
- Without this fix, scheduling and the daemonset controller don't work well in some (corner) cases
**Which issue(s) this PR fixes**:
Fixes #66298
**Special notes for your reviewer**:
Parameter `nodeSelectorTerms` in method MatchNodeSelectorTerms() is a slice, which is fundamentally a {*elements, len, cap} tuple - i.e. it's passing in a pointer. In that method, NodeSelectorRequirementsAsSelector() -> NewRequirement() is invoked, and the `matchExpressions[*].values` is passed into and **modified** via `sort.Strings(vals)`.
This will cause the pods of the following daemonset to fall into an infinite create/delete loop:
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: problem
spec:
  selector:
    matchLabels:
      app: sleeper
  template:
    metadata:
      labels:
        app: sleeper
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                - 127.0.0.2
                - 127.0.0.1
      containers:
      - name: busybox
        image: busybox
        command: ["/bin/sleep", "7200"]
```
(the problem can be stably reproduced on a local cluster started by `hack/local-up-cluster.sh`)
The first time, the daemonset yaml is handled by the apiserver and persisted in etcd in its original form (the original order of values is kept: 127.0.0.2, 127.0.0.1). After that, the daemonset controller tries to schedule the pod, and it reuses the predicates logic from the scheduler component, where the values are **sorted** in place. This not only causes the pod to be created with the values in sorted order (127.0.0.1, 127.0.0.2), but also introduces a bug when updating the daemonset: internally, the ds controller uses a "rawMessage" (the bytes of an object) to calculate a hash that acts as the "controller-revision-hash" controlling revision rollingUpdate/rollBack, so it keeps killing the "old" pod and spawning a "new" pod back and forth, and falls into an infinite loop.
The issue exists in `master`, `release-1.11` and `release-1.10`.
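The underlying Go behavior is easy to reproduce in isolation: a slice argument shares its backing array with the caller, so `sort.Strings` inside a callee reorders the caller's data. The sketch below contrasts that with sorting a copy. The function names `sortInPlace` and `sortedCopy` are illustrative only; the actual fix in this PR moves the sorting from NewRequirement() to String().

```go
package main

import (
	"fmt"
	"sort"
)

// sortInPlace mimics the problematic pattern: it sorts the caller's backing array.
func sortInPlace(vals []string) []string {
	sort.Strings(vals)
	return vals
}

// sortedCopy returns a sorted copy and leaves the caller's slice untouched.
func sortedCopy(vals []string) []string {
	out := make([]string, len(vals))
	copy(out, vals)
	sort.Strings(out)
	return out
}

func main() {
	values := []string{"127.0.0.2", "127.0.0.1"}

	_ = sortedCopy(values)
	fmt.Println("after sortedCopy: ", values) // original order preserved

	_ = sortInPlace(values)
	fmt.Println("after sortInPlace:", values) // caller's slice has been reordered
}
```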
**Release note**:
```release-note
NONE
```
- move sorting from NewRequirement() out to String()
- add related unit tests
- add unit tests in one of outer callers (pkg/apis/core/v1/helper)
Closes #66298
Automatic merge from submit-queue (batch tested with PRs 66351, 66883, 66156). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix coordination.Lease validation
Fix a couple of issues I noticed in the coordination.Lease validation logic while copying it for a new API:
- Field path should use the json path names (`objectMeta` -> `metadata`)
- ObjectMeta should be validated on update
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Default extensions/v1beta1 Deployment's ProgressDeadlineSeconds to MaxInt32
**What this PR does / why we need it**: Default values should be set in all API versions, because defaulting happens whenever a serialized version is read. When we switched to `apps/v1` as the storage version in `1.10` (#58854), `extensions/v1beta1` `DeploymentSpec.ProgressDeadlineSeconds` gets `apps/v1` default value (`600`) instead of being unset.
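To illustrate why each API version needs its own defaulter, here is a minimal sketch with a hypothetical `deploymentSpec` type and two hypothetical defaulting functions. It is not the generated defaulting code; it only shows that an unset (nil) field picks up whichever version's default runs when the object is read.

```go
package main

import (
	"fmt"
	"math"
)

// deploymentSpec is a hypothetical stand-in for a versioned DeploymentSpec.
type deploymentSpec struct {
	ProgressDeadlineSeconds *int32
}

// setDefaultsAppsV1 applies the apps/v1 default of 600 seconds.
func setDefaultsAppsV1(spec *deploymentSpec) {
	if spec.ProgressDeadlineSeconds == nil {
		v := int32(600)
		spec.ProgressDeadlineSeconds = &v
	}
}

// setDefaultsExtensionsV1beta1 applies the extensions/v1beta1 default of MaxInt32.
func setDefaultsExtensionsV1beta1(spec *deploymentSpec) {
	if spec.ProgressDeadlineSeconds == nil {
		v := int32(math.MaxInt32)
		spec.ProgressDeadlineSeconds = &v
	}
}

func main() {
	unset := deploymentSpec{}
	setDefaultsExtensionsV1beta1(&unset)
	// A later pass through the apps/v1 defaulter is now a no-op,
	// because the field is no longer nil.
	setDefaultsAppsV1(&unset)
	fmt.Println(*unset.ProgressDeadlineSeconds)
}
```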
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #66135
**Special notes for your reviewer**: We need to cherrypick this fix to 1.10 and 1.11. Note that this fix will only help people who haven't upgraded to 1.10 or 1.11 when the storage version is changed.
@kubernetes/sig-apps-bugs
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 65771, 65849). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add a new conversion path to replace GenericConversionFunc
reflect.Call is very expensive. We currently use a switch block as part of AddGenericConversionFunc to avoid the bulk of top level a->b conversion for our primary types which is hand-written. Instead of having these be handwritten, we should generate them.
The pattern for generating them looks like:
```
scheme.AddConversionFunc(&v1.Type{}, &internal.Type{}, func(a, b interface{}, scope conversion.Scope) error {
	return Convert_v1_Type_to_internal_Type(a.(*v1.Type), b.(*internal.Type), scope)
})
```
which matches AddDefaultObjectFunc (which proved out the approach last year). The
conversion machinery should then do a simple map lookup based on the incoming types and invoke the function. Like defaulting, it's up to the caller to match the types to arguments, which we do by generating this code. This bypasses reflect.Call and in the future allows Golang mid-stack inlining to optimize this code.
As part of this change I strengthened registration of custom functions to be generated instead of hand registered, and also strengthened error checking of the generator when it sees a manual conversion to error out. Since custom functions are automatically used by the generator, we don't really have a case for not registering the functions.
Once this is fully tested out, we can remove the reflection based path and the old registration methods, and all conversion will work from point to point methods (whether generated or custom).
Much of the need for the reflection path has been removed by changes to generation (to omit fields) and changes to Go (to make assigning equivalent structs easy).
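The map-lookup idea can be sketched in a few lines of plain Go. Everything here (`converter`, `typePair`, `addConversionFunc`, `convert`, `v1Type`, `internalType`) is a hypothetical, stripped-down illustration and not the conversion package's actual API; in particular, the scope argument is omitted. Reflection is used only to compute the lookup key, never to invoke the function.

```go
package main

import (
	"fmt"
	"reflect"
)

// typePair is the lookup key: the concrete types of the source and destination.
type typePair struct{ source, dest reflect.Type }

// conversionFunc is an untyped wrapper around a typed conversion function.
type conversionFunc func(a, b interface{}) error

type converter struct{ funcs map[typePair]conversionFunc }

func newConverter() *converter {
	return &converter{funcs: map[typePair]conversionFunc{}}
}

// addConversionFunc registers fn for the types of the example values a and b.
func (c *converter) addConversionFunc(a, b interface{}, fn conversionFunc) {
	c.funcs[typePair{reflect.TypeOf(a), reflect.TypeOf(b)}] = fn
}

// convert does a single map lookup and a direct call; no reflect.Call is involved.
func (c *converter) convert(a, b interface{}) error {
	fn, ok := c.funcs[typePair{reflect.TypeOf(a), reflect.TypeOf(b)}]
	if !ok {
		return fmt.Errorf("no conversion registered from %T to %T", a, b)
	}
	return fn(a, b)
}

type v1Type struct{ Name string }
type internalType struct{ Name string }

// convertV1ToInternal plays the role of a generated point-to-point conversion.
func convertV1ToInternal(in *v1Type, out *internalType) error {
	out.Name = in.Name
	return nil
}

func main() {
	c := newConverter()
	c.addConversionFunc((*v1Type)(nil), (*internalType)(nil), func(a, b interface{}) error {
		return convertV1ToInternal(a.(*v1Type), b.(*internalType))
	})

	out := &internalType{}
	if err := c.convert(&v1Type{Name: "example"}, out); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out.Name)
}
```

The type assertions inside the registered closure are what the generator would emit, which is why matching types to arguments can safely be left to the caller.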
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Compare stateful set updates semantically
Fixes #66137
```release-note
fixes a validation error that could prevent updates to StatefulSet objects containing non-normalized resource requests
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fixing comments in types.go to describe the changes for CSI driver default FS type override fix
This PR fixes the comment in types.go which was made in the commit 5dfe7b5758
In the above commit, the change that fixed the override of default FSType for CSI driver was made. However the comments in types.go were made for GCEPersistentDiskVolumeSource and RBDVolumeSource respectively. This commit fixes that comment to reflect the changes for CSI driver
```release-note
none
```
Automatic merge from submit-queue (batch tested with PRs 66030, 65997). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
declare conversion dependencies
fixes #65988
verified all packages regenerate cleanly individually with:
```
for x in $(find . -name '*zz*conversion*' | xargs -n 1 dirname); do touch "$x"; make generated_files; git status; done
```
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 55023, 65499). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Bugfix/csi default fs type
This PR addresses the issue mentioned in the following ticket: https://github.com/kubernetes/kubernetes/issues/65122
The FSType string is no longer defaulted to ext4 for CSI volumes. CSI plugins that depended on this default need to be updated, because fsType now remains an empty string if not provided. The CSI spec allows an empty fstype string; this is intended for non-block plugins like nfs and gluster, where filesystems are not separately created on the volume. Previously, however, the default file system was overridden to ext4, which defeated that case. This commit prevents that override.
```release-note
ACTION REQUIRED: Removes defaulting of CSI file system type to ext4. All the production drivers listed under https://kubernetes-csi.github.io/docs/Drivers.html were tested and work as expected after this change. If you are using a driver not in that list, please test the drivers on an updated test cluster first.
```
Automatic merge from submit-queue (batch tested with PRs 64226, 65880). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Populate NodeAffinity on top of labels for cloud based PersistentVolumes
**What this PR does / why we need it**:
This PR populates the NodeAffinity field (on top of the existing labels) for PVs backed by cloud providers like EC2 EBS and GCE PD.
**Special notes for your reviewer**:
Related to https://github.com/kubernetes/kubernetes/pull/63232
Sample `describe pv` output for EBS with node affinity field populated:
```
kubectl describe pv pv0001
Name:            pv0001
Labels:          failure-domain.beta.kubernetes.io/region=us-west-2
                 failure-domain.beta.kubernetes.io/zone=us-west-2a
Annotations:     <none>
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:
Status:          Available
Claim:
Reclaim Policy:  Retain
Access Modes:    RWO
Capacity:        5Gi
Node Affinity:
  Required Terms:
    Term 0:  failure-domain.beta.kubernetes.io/zone in [us-west-2a]
             failure-domain.beta.kubernetes.io/region in [us-west-2]
Message:
Source:
    Type:       AWSElasticBlockStore (a Persistent Disk resource in AWS)
    VolumeID:   vol-00cf03a068c62cbe6
    FSType:     ext4
    Partition:  0
    ReadOnly:   false
Events:          <none>
```
/sig storage
/assign @msau42
**Release note**:
```NONE```