This patch adds a new Kubelet option, topologyManagerPolicyOptions.
To introduce new TopologyManager options, we first introduce a new
flag, `topology-manager-policy-options`, that allows users to modify
the behaviour of the best-effort and restricted policies.
Signed-off-by: PiotrProkop <pprokop@nvidia.com>
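For illustration, a minimal KubeletConfiguration sketch using the new field; the option key shown is an assumption, since this commit only introduces the flag plumbing:
```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
topologyManagerPolicy: best-effort
# Map of option names to values; the accepted keys depend on which
# policy options are registered.
topologyManagerPolicyOptions:
  prefer-closest-numa-nodes: "true"  # illustrative key, an assumption
```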
Fix conversion errors
Changed the order
update
update
Fix manual conversions
keep the global parameter for backward compatibility
Address Wei's comments
Fix an error
Fix issues
Add unit tests for validation
Fix a comment
Address comments
Update comments
Fix verification errors
Add tests for scheme_test.go
Convert percentageOfNodesToScore to pointer
Fix errors
Resolve conflicts
Fix testing errors
Address Wei's comments
Revert IntPtr to Int changes
Address comments
Do not override percentageOfNodesToScore
Fix a bug
Fix a bug
Change errs to err
Fix a nit
Remove duplication
Address comments
Fix lint warning
Fix an issue
Update comments
Clean up
Address comments
Revert changes to defaults
Fix unit test error
Update
Fix tests
Use default PluginConfigs
cpu.cfs_period_us is measured in microseconds in the kernel but
provided as a time.Duration by the user; this change clarifies the code
to make that evident to the reader.
Also, the minimum value for this feature is 1ms and not 1μs, so this
change alters the validation to reject values smaller than 1ms.
cpu.cfs_period_us is 100μs by default despite having an "ms" unit
for some unfortunate reason. Documentation:
https://www.kernel.org/doc/html/latest/scheduler/sched-bwc.html#management
The desired effect of this change is to match the k8s default
`CPUCFSQuotaPeriod` value (100ms before this change) with the one used
in k8s without the `CustomCPUCFSQuotaPeriod` flag enabled and in Linux
CFS (100us, 1000x smaller than 100ms).
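For reference, a kubelet configuration sketch exercising the custom period; per this change, the value must be at least 1ms:
```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  CustomCPUCFSQuotaPeriod: true
# Parsed as a time.Duration; written to the kernel as
# cpu.cfs_period_us = 10000 (microseconds).
cpuCFSQuotaPeriod: 10ms
```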
This change promotes the local storage capacity isolation feature to GA.
At the same time, to let rootless systems disable this feature (they are
unable to get the root fs), it introduces a new kubelet config field,
"localStorageCapacityIsolation". By default it is set to true. Rootless
systems can set this configuration to false to disable the feature. Once
it is disabled, users cannot set ephemeral-storage requests/limits,
because capacity and allocatable will not be set.
Change-Id: I48a52e737c6a09e9131454db6ad31247b56c000a
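A minimal sketch of the new field for a rootless deployment:
```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Defaults to true; rootless systems that cannot get the root fs
# can opt out of local storage capacity isolation here.
localStorageCapacityIsolation: false
```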
- Run hack/update-codegen.sh
- Run hack/update-generated-device-plugin.sh
- Run hack/update-generated-protobuf.sh
- Run hack/update-generated-runtime.sh
- Run hack/update-generated-swagger-docs.sh
- Run hack/update-openapi-spec.sh
- Run hack/update-gofmt.sh
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
Making the LoggingConfiguration part of the versioned component-base/config API
had the theoretical advantage that components could have offered different
configuration APIs with experimental features limited to alpha versions (for
example, sanitization offered only in a v1alpha1.KubeletConfiguration). Some
components could have decided to only use stable logging options.
In practice, this wasn't done. Furthermore, we don't want different components
to make different choices regarding which logging features they offer to
users. It should always be the same everywhere, for the sake of consistency.
This can be achieved with a saner Go API by dropping the distinction between
internal and external LoggingConfiguration types. Different stability levels of
individual fields have to be covered by documentation (done) and potentially
feature gates (not currently done).
Advantages:
- everything related to logging is under component-base/logs;
previously this was scattered across different packages and
different files under "logs" (why some code was in logs/config.go
vs. logs/options.go vs. logs/logs.go always confused me again
and again when coming back to the code):
- long-term config and command line API are clearly separated
into the "api" package underneath that
- logs/logs.go itself only deals with legacy global flags and
logging configuration
- removal of separate Go APIs like logs.BindLoggingFlags and
logs.Options
- LogRegistry becomes an implementation detail, with less code
and less exported functionality (only registration needs to
be exported, querying is internal)
* Introduce networking/v1alpha1 api, ClusterCIDRConfig type
Introduce networking/v1alpha1 api group.
Add `ClusterCIDRConfig` type to the networking/v1alpha1 api group; this
type enables the NodeIPAM controller to support multiple ClusterCIDRs.
* Change ClusterCIDRConfig.NodeSelector type in api
* Fix review comments for API
* Update ClusterCIDRConfig API Spec
Introduce PerNodeHostBits field, remove PerNodeMaskSize
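A sketch of what a manifest could look like under this API; apart from nodeSelector and perNodeHostBits, which are named above, the field names are assumptions:
```yaml
apiVersion: networking.k8s.io/v1alpha1
kind: ClusterCIDRConfig
metadata:
  name: cidr-zone-a
spec:
  # Nodes matching this selector get pod CIDRs carved from this config.
  nodeSelector:
    nodeSelectorTerms:
    - matchExpressions:
      - key: topology.kubernetes.io/zone
        operator: In
        values: ["zone-a"]
  perNodeHostBits: 8        # replaces PerNodeMaskSize, per the spec update
  ipv4: 10.128.0.0/16       # assumed field name for the IPv4 range
```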
* Add FeatureGate PodHostIPs
* Add HostIPs field and update PodIPs field
* Types conversion
* Add dropDisabledStatusFields
* Add HostIPs for kubelet
* Add fuzzer for PodStatus
* Add status.hostIPs in ConvertDownwardAPIFieldLabel
* Add status.hostIPs in validEnvDownwardAPIFieldPathExpressions
* Downward API support for status.hostIPs
* Add DownwardAPI validation for status.hostIPs
* Add e2e to check that hostIPs works
* Add e2e to check that Downward API works
* Regenerate
Default to enabled
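A minimal sketch of consuming the new status.hostIPs field through the Downward API:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hostips-demo
spec:
  containers:
  - name: main
    image: busybox
    command: ["sh", "-c", "echo $HOST_IPS; sleep 3600"]
    env:
    - name: HOST_IPS
      valueFrom:
        fieldRef:
          fieldPath: status.hostIPs  # dual-stack list of host IPs
```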
Fix validation of null-updates/patches when the "old" PVC was persisted by
an older version. Add upgrade integration tests written by liggitt.
This commit adds the framework for the new local detection
modes BridgeInterface and InterfaceNamePrefix to work.
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
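For illustration, a kube-proxy configuration sketch selecting one of the new modes:
```yaml
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
detectLocalMode: BridgeInterface
detectLocal:
  # Traffic originating from this bridge is treated as local.
  bridgeInterface: docker0
```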
Remove the comment "As of v1.22, this field is beta and is controlled
via the CSRDuration feature gate" from the expirationSeconds field's
godoc.
Mark the "CSRDuration" feature gate as GA in 1.24, lock its value to
"true", and remove the various logic which handled when the gate was
"false".
Update conformance test to check that the CertificateSigningRequest's
Spec.ExpirationSeconds field is stored, but do not check if the field
is honored since this functionality is optional.
- Lock feature gate to true and schedule for deletion in 1.26
- Remove checks on feature gate
- Graduate E2E test to Conformance
Change-Id: I6814819d318edaed5c86dae4055f4b050a4d39fd
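For reference, a CSR sketch requesting a short-lived certificate via the now-GA field (request payload elided):
```yaml
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: short-lived-client-cert
spec:
  signerName: kubernetes.io/kube-apiserver-client
  expirationSeconds: 86400    # one day; signers may honor or ignore this
  request: LS0tLS1CRUdJTi...  # base64 PKCS#10 CSR, elided
  usages: ["client auth"]
```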
This change adds 2 options for Windows:
--forward-healthcheck-vip: if true, forward the service VIP for the health
check port
--root-hnsendpoint-name: the name of the hns endpoint for the root
namespace attached to l2bridge; default is cbr0
When --forward-healthcheck-vip is set to true and winkernel is used,
kube-proxy will add an hns load balancer to forward health check requests
sent to lb_vip:healthcheck_port to node_ip:healthcheck_port.
Without this forwarding, the health check from the Google load balancer
will fail, and it will stop forwarding traffic to the Windows node.
This change fixes the following 2 cases for services:
- `externalTrafficPolicy: Cluster` (default option): healthcheck_port is
10256 for all services. Without this fix, traffic is never forwarded
directly to a Windows node; it always goes through a Linux node and gets
forwarded to Windows from there.
- `externalTrafficPolicy: Local`: a different healthcheck_port for each
service configured as local. Without this fix, this feature does not work
on Windows nodes at all. This feature preserves the client IP of
connections to applications running in Windows pods.
Change-Id: If4513e72900101ef70d86b91155e56a1f8c79719
Automatic merge from submit-queue
Add dockershim only mode
This PR adds a hidden `--experimental-dockershim` flag to the kubelet to run dockershim only.
We introduce this flag mainly for the CRI validation test. In the future we should compile dockershim into a separate binary.
@yujuhong @feiskyer @xlgao-zju
/cc @kubernetes/sig-node-pr-reviews
Automatic merge from submit-queue
Scheduler can receive its policy configuration from a ConfigMap
**What this PR does / why we need it**: This PR adds the ability for the scheduler to receive its policy configuration from a ConfigMap. Before this, the scheduler could receive its policy config only from a file. The logic to watch the ConfigMap object will be added in a subsequent PR.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```
Add the ability to the default scheduler to receive its policy configuration from a ConfigMap object.
```
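A sketch of what such a ConfigMap might look like; the namespace, data key, and policy contents here are illustrative:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: scheduler-policy
  namespace: kube-system
data:
  # Same JSON policy format previously read from a file.
  policy.cfg: |
    {
      "kind": "Policy",
      "apiVersion": "v1",
      "predicates": [{"name": "PodFitsHostPorts"}],
      "priorities": [{"name": "LeastRequestedPriority", "weight": 1}]
    }
```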
Automatic merge from submit-queue (batch tested with PRs 43777, 44121)
Add patchMergeKey and patchStrategy support to OpenAPI
Support generating Open API extensions for strategic merge patch tags in go struct tags
Support `patchStrategy` and `patchMergeKey`.
Also support checking if the Open API extension and struct tags match.
```release-note
Support generating Open API extensions for strategic merge patch tags in go struct tags
```
cc: @pwittrock @ymqytw
(Description mostly copied from #43833)
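An illustrative excerpt of what the generated Open API output looks like for a field carrying these struct tags:
```yaml
# Go struct tag: `patchStrategy:"merge" patchMergeKey:"name"`
# becomes, on the corresponding Open API property:
volumes:
  x-kubernetes-patch-strategy: merge
  x-kubernetes-patch-merge-key: name
```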
Automatic merge from submit-queue (batch tested with PRs 44119, 42538, 43802, 42336, 43396)
iSCSI CHAP support
**What this PR does / why we need it**:
To support CHAP authentication in a multi-tenant setup
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
Support iSCSI CHAP authentication
```
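A pod volume sketch using the CHAP fields; the target details and secret name are illustrative:
```yaml
volumes:
- name: iscsi-vol
  iscsi:
    targetPortal: 10.0.0.10:3260
    iqn: iqn.2017-04.com.example:storage.disk1
    lun: 0
    chapAuthDiscovery: true
    chapAuthSession: true
    # Secret holding the CHAP credentials.
    secretRef:
      name: chap-secret
```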
Automatic merge from submit-queue
Support status.hostIP in downward API
**What this PR does / why we need it**:
Exposes pod's hostIP (node IP) via downward API.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
fixes https://github.com/kubernetes/kubernetes/issues/24657
**Special notes for your reviewer**:
Not sure if more documentation is needed; please point me in the right direction and I will add some :)
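A minimal sketch of the new field in use:
```yaml
env:
- name: HOST_IP
  valueFrom:
    fieldRef:
      fieldPath: status.hostIP  # IP of the node running the pod
```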
Kubelet flags are not necessarily appropriate for the KubeletConfiguration
object. For example, this PR also removes HostnameOverride and NodeIP
from KubeletConfiguration. This is a preliminary step to enabling Nodes
to share configurations, as part of the dynamic Kubelet configuration
feature (#29459). Fields that must be unique for each node inhibit
sharing, because their values, by definition, cannot be shared.
Automatic merge from submit-queue (batch tested with PRs 43429, 43416, 43312, 43141, 43421)
add singular resource names to discovery
Adds the singular resource name to our resources for discovery. This is something we've discussed, to remove our pseudo-pluralization library, which is unreliable even for English and really has no hope of properly handling other languages or variations we can expect from TPRs and aggregated API servers.
This pull simply adds the information to discovery; it does not re-wire any RESTMappers.
@kubernetes/sig-cli-misc @kubernetes/sig-apimachinery-misc @kubernetes/api-review
```release-note
API resource discovery now includes the `singularName` used to refer to the resource.
```
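An illustrative excerpt of a discovery document carrying the new field:
```yaml
# GET /apis/apps/v1 (excerpt)
resources:
- name: deployments
  singularName: deployment  # new: no pseudo-pluralization required
  namespaced: true
  kind: Deployment
```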
Automatic merge from submit-queue (batch tested with PRs 42998, 42902, 42959, 43020, 42948)
Add Host field to TCPSocketAction
Currently, TCPSocketAction always uses the Pod's IP for the connection. But when a pod uses the host network, firewall rules may sometimes prevent the kubelet from connecting through the Pod's IP.
This PR introduces a 'Host' field for TCPSocketAction; if it is set to a non-empty string, the probe will be performed against the configured host rather than the Pod's IP. This gives users an opportunity to explicitly specify 'localhost' as the target for the situations above.
```release-note
Add Host field to TCPSocketAction
```
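A probe sketch using the new field:
```yaml
livenessProbe:
  tcpSocket:
    host: localhost  # new field; defaults to the Pod's IP when unset
    port: 8080
```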
Automatic merge from submit-queue (batch tested with PRs 43313, 43257, 43271, 43307)
Remove 'all namespaces' meaning of empty list in PodAffinityTerm
Removes the distinction between `null` and `[]` for the PodAffinityTerm#namespaces field (option 4 discussed in https://github.com/kubernetes/kubernetes/issues/43203#issuecomment-287237992), since we can't distinguish between them in protobuf (and it's a less than ideal API)
Leaves the door open to reintroducing "all namespaces" function via a dedicated field or a dedicated token in the list of namespaces
Wanted to get a PR open and tests green in case we went with this option.
Not sure what doc/release-note is needed if the "all namespaces" function is not present in 1.6
Automatic merge from submit-queue (batch tested with PRs 42692, 42169, 42173)
Add pprof trace support
Add support for `/debug/pprof/trace`
Can wait for master to reopen for 1.7.
cc @smarterclayton @wojtek-t @gmarek @timothysc @jeremyeder @kubernetes/sig-scalability-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 42443, 38924, 42367, 42391, 42310)
Dell EMC ScaleIO Volume Plugin
**What this PR does / why we need it**
This PR implements the Kubernetes volume plugin to allow pods to seamlessly access and use data stored on ScaleIO volumes. [ScaleIO](https://www.emc.com/storage/scaleio/index.htm) is a software-based storage platform that creates a pool of distributed block storage using locally attached disks on every server. The code for this PR supports persistent volumes using PVs, PVCs, and dynamic provisioning.
You can find examples of how to use and configure the ScaleIO Kubernetes volume plugin in [examples/volumes/scaleio/README.md](examples/volumes/scaleio/README.md).
**Special notes for your reviewer**:
To facilitate code review, commits for source code implementation are separated from other artifacts such as generated, docs, and vendored sources.
```release-note
ScaleIO Kubernetes Volume Plugin added enabling pods to seamlessly access and use data stored on ScaleIO volumes.
```
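A pod volume sketch; the gateway, system, and pool names are illustrative, see the README above for full configuration:
```yaml
volumes:
- name: scaleio-vol
  scaleIO:
    gateway: https://10.0.0.20:443/api
    system: scaleio-system
    protectionDomain: pd01
    storagePool: sp01
    volumeName: vol-0001
    fsType: xfs
    # Secret with the ScaleIO gateway credentials.
    secretRef:
      name: sio-secret
```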
Automatic merge from submit-queue (batch tested with PRs 41919, 41149, 42350, 42351, 42285)
kubelet: enable qos-level memory limits
```release-note
Experimental support to reserve a pod's memory request from being utilized by pods in lower QoS tiers.
```
Enables the QoS-level memory cgroup limits described in https://github.com/kubernetes/community/pull/314
**Note: QoS level cgroups have to be enabled for any of this to take effect.**
Adds a new `--experimental-qos-reserved` flag that can be used to set the percentage of a resource to be reserved at the QoS level for pod resource requests.
For example, `--experimental-qos-reserved="memory=50%"` means that if a Guaranteed pod sets a memory request of 2Gi, the Burstable and BestEffort QoS memory cgroups will have their `memory.limit_in_bytes` set to `NodeAllocatable - (2Gi*50%)` to reserve 50% of the Guaranteed pod's request from being used by the lower QoS tiers.
If a Burstable pod sets a request, its reserve will be deducted from the BestEffort memory limit.
The result is that:
- Guaranteed limit matches the root cgroup and is not set by this code
- Burstable limit is `NodeAllocatable - Guaranteed reserve`
- BestEffort limit is `NodeAllocatable - Guaranteed reserve - Burstable reserve`
The only resource currently supported is `memory`; however, the code is generic enough that other resources can be added in the future.
@derekwaynecarr @vishh
Automatic merge from submit-queue (batch tested with PRs 41931, 39821, 41841, 42197, 42195)
Admission Controller: Add Pod Preset
Based off the proposal in https://github.com/kubernetes/community/pull/254
cc @pmorie @pwittrock
TODO:
- [ ] tests
**What this PR does / why we need it**: Implements the Pod Injection Policy admission controller
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
Added new Api `PodPreset` to enable defining cross-cutting injection of Volumes and Environment into Pods.
```
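A sketch of the new API object; the selector and injected values are illustrative:
```yaml
apiVersion: settings.k8s.io/v1alpha1
kind: PodPreset
metadata:
  name: inject-cache-env
spec:
  selector:
    matchLabels:
      role: frontend
  env:
  - name: CACHE_HOST
    value: cache.default.svc
  volumeMounts:
  - name: cache-volume
    mountPath: /cache
  volumes:
  - name: cache-volume
    emptyDir: {}
```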
Introduced changes:
1. Re-writing of the resolv.conf file generated by docker.
Cluster DNS settings are no longer passed to the docker api in any case (previously this was skipped only for pods with host network):
the resolver conf will be overwritten after infra-container creation to override docker's behaviour.
2. Added a new dnsPolicy, 'ClusterFirstWithHostNet', so now there are:
- ClusterFirstWithHostNet - use dns settings in all cases, i.e. with hostNet=true as well
- ClusterFirst - use dns settings unless hostNetwork is true
- Default
Fixes #17406
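A pod spec sketch using the new policy:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hostnet-dns
spec:
  hostNetwork: true
  # Use cluster DNS even though the pod shares the host's network.
  dnsPolicy: ClusterFirstWithHostNet
  containers:
  - name: main
    image: busybox
    command: ["sleep", "3600"]
```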
Automatic merge from submit-queue
Extend experimental support to multiple Nvidia GPUs
Extended from #28216
```release-note
`--experimental-nvidia-gpus` flag is **replaced** by `Accelerators` alpha feature gate along with support for multiple Nvidia GPUs.
To use GPUs, pass `Accelerators=true` as part of `--feature-gates` flag.
Works only with Docker runtime.
```
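For illustration, a pod requesting two GPUs; the alpha resource name below is an assumption based on the era's accelerator support:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
  - name: cuda
    image: nvidia/cuda
    resources:
      limits:
        alpha.kubernetes.io/nvidia-gpu: 2  # assumed alpha resource name
```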
1. Automated testing for this PR is not possible since creation of clusters with GPUs isn't supported yet in GCP.
1. To test this PR locally, use the node e2e.
```shell
TEST_ARGS='--feature-gates=DynamicKubeletConfig=true' FOCUS=GPU SKIP="" make test-e2e-node
```
TODO:
- [x] Run manual tests
- [x] Add node e2e
- [x] Add unit tests for GPU manager (< 100% coverage)
- [ ] Add unit tests in kubelet package