Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add proxy for container streaming in kubelet for streaming auth.
For https://github.com/kubernetes/kubernetes/issues/36666, option 2 of https://github.com/kubernetes/kubernetes/issues/36666#issuecomment-378440458.
This PR:
1. Removed the `DirectStreamingRuntime`, and changed `IndirectStreamingRuntime` to `StreamingRuntime`. All `DirectStreamingRuntime`s, `dockertools` and `rkt`, were removed.
2. Proxy container streaming in kubelet instead of returning redirect to apiserver. This solves the container runtime authentication issue, which is what we agreed on in https://github.com/kubernetes/kubernetes/issues/36666.
Please note that, this PR replaced the redirect with proxy directly instead of adding a knob to switch between the 2 behaviors. For existing CRI runtimes like containerd and cri-o, they should change to serve container streaming on localhost, so as to make the whole container streaming connection secure.
If a general authentication mechanism proposed in https://github.com/kubernetes/kubernetes/issues/62747 is ready, we can switch back to redirect, and all code can be found in github history.
Please also note that this added some overhead in kubelet when there are container streaming connections. However, the actual bottleneck is in the apiserver anyway, because it does proxy for all container streaming happens in the cluster. So it seems fine to get security and simplicity with this overhead. @derekwaynecarr @mrunalp Are you ok with this? Or do you prefer a knob?
@yujuhong @timstclair @dchen1107 @mikebrow @feiskyer
/cc @kubernetes/sig-node-pr-reviews
**Release note**:
```release-note
Kubelet now proxies container streaming between apiserver and container runtime. The connection between kubelet and apiserver is authenticated. Container runtime should change streaming server to serve on localhost, to make the connection between kubelet and container runtime local.
In this way, the whole container streaming connection is secure. To switch back to the old behavior, set `--redirect-container-streaming=true` flag.
```
While I normally try to avoid adding flags, this is a short term
scalability fix for v1.11, and there are other long-term solutions in
the works, so we shouldn't commit to this in the v1beta1 Kubelet config.
Flags are our escape hatch.
The goal of this change is to remove the registration of signal
handling from pkg/kubelet. We now pass in a stop channel.
If you register a signal handler in `main()` to aid in a controlled
and deliberate exit then the handler registered in `pkg/kubelet` often
wins and the process exits immediately. This means all other signal
handler registrations are currently racy if `DockerServer.Start()` is
directly or indirectly invoked.
This change also removes another signal handler registration from
`NewAPIServerCommand()`; a stop channel is now passed to this
function.
Automatic merge from submit-queue (batch tested with PRs 63314, 63884, 63799, 63521, 62242). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add memcg notifications for allocatable cgroup
**What this PR does / why we need it**:
Use memory cgroup notifications to trigger the eviction manager when the allocatable eviction threshold is crossed. This allows the eviction manager to respond more quickly when the allocatable cgroup's available memory becomes low. Evictions are preferable to OOMs in the cgroup since the kubelet can enforce its priorities on which pod is killed.
**Which issue(s) this PR fixes**:
Fixes https://github.com/kubernetes/kubernetes/issues/57901
**Special notes for your reviewer**:
This adds the alloctable cgroup from the container manager to the eviction config.
**Release note**:
```release-note
NONE
```
/sig node
/priority important-soon
/kind feature
I would like this to be included in the 1.11 release.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
track/close kubelet->API connections on heartbeat failure
xref #48638
xref https://github.com/kubernetes-incubator/kube-aws/issues/598
we're already typically tracking kubelet -> API connections and have the ability to force close them as part of client cert rotation. if we do that tracking unconditionally, we gain the ability to also force close connections on heartbeat failure as well. it's a big hammer (means reestablishing pod watches, etc), but so is having all your pods evicted because you didn't heartbeat.
this intentionally does minimal refactoring/extraction of the cert connection tracking transport in case we want to backport this
* first commit unconditionally sets up the connection-tracking dialer, and moves all the cert management logic inside an if-block that gets skipped if no certificate manager is provided (view with whitespace ignored to see what actually changed)
* second commit plumbs the connection-closing function to the heartbeat loop and calls it on repeated failures
follow-ups:
* consider backporting this to 1.10, 1.9, 1.8
* refactor the connection managing dialer to not be so tightly bound to the client certificate management
/sig node
/sig api-machinery
```release-note
kubelet: fix hangs in updating Node status after network interruptions/changes between the kubelet and API server
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Report node DNS info with --node-ip
**What this PR does / why we need it**:
This PR adds `ExternalDNS`, `InternalDNS`, and `ExternalIP` info for kubelets with the `--nodeip` flag enabled.
**Which issue(s) this PR fixes**
Fixes#63158
**Special notes for your reviewer**:
I added a field to the Kubelet to make IP validation more testable (`validateNodeIP` relies on the `net` package and the IP address of the host that is executing the test.) I also converted the test to use a table so new cases could be added more easily.
**Release Notes**
```release-note
Report node DNS info with --node-ip flag
```
@andrewsykim
@nckturner
/sig node
/sig network
Automatic merge from submit-queue (batch tested with PRs 62448, 59317, 59947, 62418, 62352). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
kubelet: add configuration to optionally enable server tls bootstrap
right now if the RotateKubeletServerCertificate feature is enabled,
kubelet will bootstrap server tls. this is undesirable if the deployment
is not or cannot run an approver to handle these certificate signing
requests.
Fixes https://github.com/kubernetes/kubernetes/issues/62077
```release-note
NONE
```
right now if the RotateKubeletServerCertificate feature is enabled,
kubelet will bootstrap server tls. this is undesirable if the deployment
is not or cannot run an approver to handle these certificate signing
requests.
The code was added to support rktnetes and non-CRI docker integrations.
These legacy integrations have already been removed from the codebase.
This change removes the compatibility code existing soley for the
legacy integrations.
Automatic merge from submit-queue (batch tested with PRs 57658, 61304, 61560, 61859, 61870). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
certs: exclude more nonsensical addresses from SANs
I noticed this when I saw 169.254.* SANs using server TLS bootstrap.
This change excludes more nonsensical addresses from being requested as
SANs in that flow.
I noticed this when I saw 169.254.* SANs using server TLS bootstrap.
This change excludes more nonsensical addresses from being requested as
SANs in that flow.
rktnetes is scheduled to be deprecated in 1.10 (#53601). According to
the deprecation policy for beta CLI and flags, we can remove the feature
in 1.11.
Fixes#58721
Automatic merge from submit-queue (batch tested with PRs 61487, 58353, 61078, 61219, 60792). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
remove dead code in kubelet
clean up dead code
/kind cleanup
/sig node
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 59740, 59728, 60080, 60086, 58714). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
kubelet: make --cni-bin-dir accept a comma-separated list of CNI plugin directories
Allow CNI-related network plugin drivers (kubenet, cni) to search a list of
directories for plugin binaries instead of just one. This allows using an
administrator-provided path and fallbacks to others (like the previous default
of /opt/cni/bin) for backwards compatibility.
```release-note
kubelet's --cni-bin-dir option now accepts multiple comma-separated CNI binary directory paths, which are search for CNI plugins in the given order.
```
@kubernetes/rh-networking @kubernetes/sig-network-misc @freehan @pecameron @rajatchopra
Automatic merge from submit-queue (batch tested with PRs 51423, 53880). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Disable ImageGC when high threshold is set to 100
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
fixes#51268
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
The LocalStorageCapacityIsolation feature added a new resource type
ResourceEphemeralStorage "ephemeral-storage" so that this resource can
be allocated, limited, and consumed as the same way as CPU/memory. All
the features related to resource management (resource request/limit, quota, limitrange) are avaiable for local ephemeral storage.
This local ephemeral storage represents the storage for root file system, which will be consumed by containers' writtable layer and logs. Some volumes such as emptyDir might also consume this storage.
Allow CNI-related network plugin drivers (kubenet, cni) to search a list of
directories for plugin binaries instead of just one. This allows using an
administrator-provided path and fallbacks to others (like the previous default
of /opt/cni/bin) for backwards compatibility.
Automatic merge from submit-queue (batch tested with PRs 60157, 60337, 60246, 59714, 60467). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
backoff runtime errors in kubelet sync loop
The runtime health check can race with PLEG's first relist, and this
often results in an unnecessary 5 second wait during Kubelet bootstrap.
This change aims to improve the performance.
```release-note
NONE
```
The word 'manifest' technically refers to a container-group specification
that predated the Pod abstraction. We should avoid using this legacy
terminology where possible. Fortunately, the Kubelet's config API will
be beta in 1.10 for the first time, so we still had the chance to make
this change.
I left the flags alone, since they're deprecated anyway.
I changed a few var names in files I touched too, but this PR is the
just the first shot, not the whole campaign
(`git grep -i manifest | wc -l -> 1248`).
The runtime health check can race with PLEG's first relist, and this
often results in an unnecessary 5 second wait during Kubelet bootstrap.
This change aims to improve the performance.
Automatic merge from submit-queue (batch tested with PRs 54191, 59374, 59824, 55032, 59906). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Adding per container stats for CRI runtimes
**What this PR does / why we need it**
This commit aims to collect per container log stats. The change was proposed as a part of #55905. The change includes change the log path from /var/pod/<pod uid>/containername_attempt.log to /var/pod/<pod uid>/containername/containername_attempt.log. The logs are collected by reusing volume package to collect metrics from the log path.
Fixes#55905
**Special notes for your reviewer:**
cc @Random-Liu
**Release note:**
```
Adding container log stats for CRI runtimes.
```
This commit aims to collect per container log stats. The
change was proposed as a part of #55905. The change includes
change of the log path from /var/pod/<pod uid>/containername_attempt.log
to /var/pod/<pod uid>/containername/containername_attempt.log.
The logs are collected by reusing volume package to collect
metrics from the log path.
Signed-off-by: abhi <abhi@docker.com>
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Collect ephemeral storage capacity on initialization
**What this PR does / why we need it**:
We have had some node e2e flakes where a pod can be rejected if it requests ephemeral storage. This is because we don't set capacity and allocatable for ephemeral storage on initialization.
This PR causes cAdvisor to do one round of stats collection during initialization, which will allow it to get the disk capacity when it first sets the node status.
It also sets the node to NotReady if capacities have not been initialized yet.
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
/assign @jingxu97 @Random-Liu
/sig node
/kind bug
/priority important-soon
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix kubelet PVC stale metrics
**What this PR does / why we need it**:
Volumes on each node changes, we should not only add PVC metrics into
gauge vector. It's better use a collector to collector metrics from internal
stats.
Currently, if a PV (bound to a PVC `testpv`) is attached and used by node A, then migrated to node B or just deleted from node A later. `testpvc` metrics will not disappear from kubelet on node A. After a long running time, `kubelet` process will keep a lot of stale volume metrics in memory.
For these dynamic metrics, it's better to use a collector to collect metrics from a data source (`StatsProvider` here), like [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics) scraping metrics from kube-apiserver.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes https://github.com/kubernetes/kubernetes/issues/57686
**Special notes for your reviewer**:
**Release note**:
```release-note
Fix kubelet PVC stale metrics
```
Automatic merge from submit-queue (batch tested with PRs 59276, 51042, 58973, 59377, 59472). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
kubelet: only register api source when connecting
**What this PR does / why we need it**:
before this change, an api source was always registered, even when there
was no kubeclient. this lead to some operations blocking waiting for
podConfig.SeenAllSources to pass, which it never would.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#59275
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
This adds context to all the relevant cloud provider interface signatures.
Callers of those APIs are currently satisfied using context.TODO().
There will be follow on PRs to push the context through the stack.
For an idea of the full scope of this change please look at PR #58532.
before this change, an api source was always registered, even when there
was no kubeclient. this lead to some operations blocking waiting for
podConfig.SeenAllSources to pass, which it never would.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Only rotate certificates in the background
Change the Kubelet to not block until the first certs have rotated (we didn't act on it anyway) and fall back to the bootstrap cert if the most recent rotated cert is expired on startup.
The certificate manager originally had a "block on startup" rotation behavior to ensure at least one rotation happened on startup. However, since rotation may not succeed within the first time window the code was changed to simply print the error rather than return it. This meant that the blocking rotation has no purpose - it cannot cause the kubelet to fail, and it *does* block the kubelet from starting static pods before the api server becomes available.
The current block behavior causes a bootstrapped kubelet that is also set to run static pods to wait several minutes before actually launching the static pods, which means self-hosted masters using static pods have a pointless delay on startup.
Since blocking rotation has no benefit and can't actually fail startup, this commit removes the blocking behavior and simplifies the code at the same time. The goroutine for rotation now completely owns the deadline, the shouldRotate() method is removed, and the method that sets rotationDeadline now returns it. We also explicitly guard against a negative sleep interval and omit the message.
Should have no impact on bootstrapping except the removal of a long delay on startup before static pods start.
The other change is that an expired certificate from the cert manager is *not* considered a valid cert, which triggers an immediate rotation. This causes the cert manager to fall back to the original bootstrap certificate until a new certificate is issued. This allows the bootstrap certificate on masters to be "higher powered" and allow the node to function prior to initial approval, which means someone configuring the masters with a pre-generated client cert can be guaranteed that the kubelet will be able to communicate to report self-hosted static pod status, even if the first client rotation hasn't happened. This makes master self-hosting more predictable for static configuration environments.
```release-note
When using client or server certificate rotation, the Kubelet will no longer wait until the initial rotation succeeds or fails before starting static pods. This makes running self-hosted masters with rotation more predictable.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Remove setInitError.
**What this PR does / why we need it**:
Removes setInitError, it's not sure it was ever really used, and it causes the kubelet to hang and get wedged.
**Which issue(s) this PR fixes**
Fixes#46086
**Special notes for your reviewer**:
If `initializeModules()` in `kubelet.go` encounters an error, it calls `runtimeState.setInitError(...)`
47d61ef472/pkg/kubelet/kubelet.go (L1339)
The trouble with this is that `initError` is never cleared, which means that `runtimeState.runtimeErrors()` always returns this `initError`, and thus pods never start sync-ing.
In normal operation, this is expected and desired because eventually the runtime is expected to become healthy, but in this case, `initError` is never updated, and so the system just gets wedged.
47d61ef472/pkg/kubelet/kubelet.go (L1751)
We could add some retry to `initializeModules()` but that seems unnecessary, as eventually we'd want to just die anyway. Instead, just log fatal and die, a supervisor will restart us.
Note, I'm happy to add some retry here too, if that makes reviewers happier.
**Release note**:
```release-note
Prevent kubelet from getting wedged if initialization of modules returns an error.
```
@feiskyer @dchen1107 @janetkuo
@kubernetes/sig-node-bugs
The certificate manager originally had a "block on startup" rotation
behavior to ensure at least one rotation happened on startup. However,
since rotation may not succeed within the first time window the code was
changed to simply print the error rather than return it. This meant that
the blocking rotation has no purpose - it cannot cause the kubelet to
fail, and it *does* block the kubelet from starting static pods before
the api server becomes available.
The current block behavior causes a bootstrapped kubelet that is also
set to run static pods to wait several minutes before actually launching
the static pods, which means self-hosted masters using static pods have
a pointless delay on startup.
Since blocking rotation has no benefit and can't actually fail startup,
this commit removes the blocking behavior and simplifies the code at the
same time. The goroutine for rotation now completely owns the deadline,
the shouldRotate() method is removed, and the method that sets
rotationDeadline now returns it. We also explicitly guard against a
negative sleep interval and omit the message.
Should have no impact on bootstrapping except the removal of a long
delay on startup before static pods start.
Also add a guard condition where if the current cert in the store is
expired, we fall back to the bootstrap cert initially (we use the
bootstrap cert to communicate with the server). This is consistent with
when we don't have a cert yet.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add deprecation warnings for rktnetes flags
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#53601
**Special notes for your reviewer**:
**Release note**:
```release-note
rktnetes has been deprecated in favor of rktlet. Please see https://github.com/kubernetes-incubator/rktlet for more information.
```
This moves plugin/pkg/scheduler to pkg/scheduler and
plugin/cmd/kube-scheduler to cmd/kube-scheduler.
Bulk of the work was done with gomvpkg, except for kube-scheduler main
package.
Automatic merge from submit-queue (batch tested with PRs 57127, 57011, 56754, 56601, 56483). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Remove hacks added for mesos
**What this PR does / why we need it**:
Since Mesos is no longer in your main repository and since we have
things like dynamic kubelet configuration in progress, we should
drop these undocumented, untested, private hooks.
cmd/kubelet/app/server.go::CreateAPIServerClientConfig
CreateAPIServerClientConfig::getRuntime
pkg/kubelet/kubelet_pods.go::getPhase
Also remove stuff from Dependencies struct that were specific to
the Mesos integration (ContainerRuntimeOptions and Options)
Also remove stale references in test/e2e and and test owners file
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
Drop hacks used for Mesos integration that was already removed from main kubernetes repository
```
Automatic merge from submit-queue (batch tested with PRs 57172, 55382, 56147, 56146, 56158). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
delete useless params containerized
Signed-off-by: zhangjie <zhangjie0619@yeah.net>
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
delete useless params containerized
```
Since Mesos is no longer in your main repository and since we have
things like dynamic kubelet configuration in progress, we should
drop these undocumented, untested, private hooks.
cmd/kubelet/app/server.go::CreateAPIServerClientConfig
CreateAPIServerClientConfig::getRuntime
pkg/kubelet/kubelet_pods.go::getPhase
Also remove stuff from Dependencies struct that were specific to
the Mesos integration (ContainerRuntimeOptions and Options)
Also remove stale references in test/e2e and and test owners file
Automatic merge from submit-queue (batch tested with PRs 55812, 55752, 55447, 55848, 50984). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Initial basic bootstrap-checkpoint support
**What this PR does / why we need it**:
Adds initial support for Pod checkpointing to allow for controlled recovery of the control plane during self host failure conditions.
fixes#49236
xref https://github.com/kubernetes/features/issues/378
**Special notes for your reviewer**:
Proposal is here: https://docs.google.com/document/d/1hhrCa_nv0Sg4O_zJYOnelE8a5ClieyewEsQM6c7-5-o/edit?ts=5988fba8#
1. Controlled tests work, but I have not tested the self hosted api-server recovery, that requires validation and logs. /cc @luxas
2. In adding hooks for checkpoint manager much of the tests around basicpodmanager appears to be stub'd. This has become an anti-pattern in the code and should be avoided.
3. I need a node-e2e to ensure consistency of behavior.
**Release note**:
```
Add basic bootstrap checkpointing support to the kubelet for control plane recovery
```
/cc @kubernetes/sig-cluster-lifecycle-misc @kubernetes/sig-node-pr-reviews
- Instead of using cm.capacity field to communicate device plugin resource
capacity, this PR changes to use an explicit cm.GetDevicePluginResourceCapacity()
function that returns device plugin resource capacity as well as any inactive
device plugin resource. Kubelet syncNodeStatus call this function during its
periodic run to update node status capacity and allocatable. After this call,
device plugin can remove the inactive device plugin resource from its allDevices
field as the update is already pushed to API server.
- Extends device plugin checkpoint data to record registered resources
so that we can finish resource removing even upon kubelet restarts.
- Passes sourcesReady from kubelet to device plugin to avoid removing
inactive pods during grace period of kubelet restart.
Automatic merge from submit-queue (batch tested with PRs 55839, 54495, 55884, 55983, 56069). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
seccomp is an alpha feature and not feature gated
Move SeccompProfileRoot to KubeletFlags and document flag as alpha.
wrt https://github.com/kubernetes/kubernetes/pull/53833#issuecomment-345396575, seccomp is an alpha feature, but this isn't clearly documented anywhere (the annotation just has the word "alpha" in it, and that's your signal that it's alpha).
Since seccomp was around before feature gates, it doesn't have one.
Thus SeccompProfileRoot should not be part of KubeletConfiguration, and this PR moves it to KubeletFlags, and amends the help text to note the alpha state of the feature.
fixes: #56087
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 55642, 55897, 55835, 55496, 55313). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Disable container disk metrics when using the CRI stats integration
Issue: https://github.com/kubernetes/kubernetes/issues/51798
As explained in the issue, runtimes which make use of the CRI Stats API still have the performance overhead of collecting those same stats through cAdvisor.
The CRI Stats API has metrics for CPU, Memory, and Disk. This PR significantly reduces the added overhead due to collecting these stats in both cAdvisor and in the runtime.
This PR disables container disk metrics, which are very expensive to collect.
This PR does not disable node-level disk stats, as the "Raw" container handler does not currently respect ignoring DiskUsageMetrics.
This PR factors out the logic for determining whether or not to use the CRI stats provider into a helper function, as cAdvisor is instantiated before it is passed to the kubelet as a dependency.
cc @kubernetes/sig-node-pr-reviews @derekwaynecarr
/kind feature
/sig node
/assign @Random-Liu @derekwaynecarr
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Move 'alpha' KubeletConfiguration fields that aren't feature-gated and self-registration fields to KubeletFlags
Some of these fields are marked "alpha" in help text. They cannot be in the KubeletConfiguration object unless they are feature gated or graduated from alpha.
Others relate to Kubelet self-registration, and given https://github.com/kubernetes/community/pull/911 I think its prudent to wait and see if these really should be in the KubeletConfiguration type.
For now we just leave them all as flags.
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
[Part 1] Remove docker dep in kubelet startup
**What this PR does / why we need it**:
Remove dependency of docker during kubelet start up.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
Part 1 of #54090
**Special notes for your reviewer**:
Changes include:
1. Move docker client initialization into dockershim pkg.
2. Pass a docker `ClientConfig` from kubelet to dockershim
3. Pass parameters needed by `FakeDockerClient` thru `ClientConfig` to dockershim
(TODO, the second part) Make dockershim tolerate when dockerd is down, otherwise it will still fail kubelet
Please note after this PR, kubelet will still fail if dockerd is down, this will be fixed in the subsequent PR by making dockershim tolerate dockerd failure (initializing docker client in a separate goroutine), and refactoring cgroup and log driver detection.
**Release note**:
```release-note
Remove docker dependency during kubelet start up
```
Automatic merge from submit-queue (batch tested with PRs 54460, 55258, 54858, 55506, 55510). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
redendancy code and error log message in cni
**What this PR does / why we need it**:
redendancy code and error log message in cni
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
/sig-node
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add admission handler for device resources allocation
**What this PR does / why we need it**:
Add admission handler for device resources allocation to fail fast during pod creation
**Which issue this PR fixes**
fixes#51592
**Special notes for your reviewer**:
@jiayingz Sorry, there is something wrong with my branch in #51895. And I think the existing comments in the PR might be too long for others to view. So I closed it and opened the new one, as we have basically reach an agreement on the implement :)
I have covered the functionality and unit test part here, and would set about the e2e part ASAP
/cc @jiayingz @vishh @RenaudWasTaken
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 52367, 53363, 54989, 54872, 54643). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Lift embedded structure out of ManifestURLHeader field
Related: #53833
```release-note
It is now possible to set multiple manifest url headers via the Kubelet's --manifest-url-header flag. Multiple headers for the same key will be added in the order provided. The ManifestURLHeader field in KubeletConfiguration object (kubeletconfig/v1alpha1) is now a map[string][]string, which facilitates writing JSON and YAML files.
```
Automatic merge from submit-queue (batch tested with PRs 54040, 52503). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Get fallback termination msg from docker when using journald log driver
**What this PR does / why we need it**:
When using the legacy docker container runtime and when a container has `terminationMessagePolicy=FallbackToLogsOnError` and when docker is configured with a log driver other than `json-log` (such as `journald`), the kubelet should not try to get the container's log from the json log file (since it's not there) but should instead ask docker for the logs.
**Which issue this PR fixes** fixes#52502
**Special notes for your reviewer**:
**Release note**:
```release-note
Fixed log fallback termination messages when using docker with journald log driver
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix pkg/ depends on cmd/ problems
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
Partial fix for https://github.com/kubernetes/kubernetes/issues/53341
**Special notes for your reviewer**:
No logic changes, Just moving things around
**Release note**:
```release-note
NONE
```
Revert "Merge pull request #51857 from kubernetes/revert-51307-kc-type-refactor"
This reverts commit 9d27d92420, reversing
changes made to 2e69d4e625.
See original: #51307
We punted this from 1.8 so it could go through an API review. The point
of this PR is that we are trying to stabilize the kubeletconfig API so
that we can move it out of alpha, and unblock features like Dynamic
Kubelet Config, Kubelet loading its initial config from a file instead
of flags, kubeadm and other install tools having a versioned API to rely
on, etc.
We shouldn't rev the version without both removing all the deprecated
junk from the KubeletConfiguration struct, and without (at least
temporarily) removing all of the fields that have "Experimental" in
their names. It wouldn't make sense to lock in to deprecated fields.
"Experimental" fields can be audited on a 1-by-1 basis after this PR,
and if found to be stable (or sufficiently alpha-gated), can be restored
to the KubeletConfiguration without the "Experimental" prefix.
Automatic merge from submit-queue (batch tested with PRs 53403, 53233). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Remove containers from deleted pods once containers have exited
Issue #51899
Since container deletion is currently done through periodic garbage collection every 30 seconds, it takes a long time for pods to be deleted, and causes the kubelet to send all delete pod requests at the same time, which has performance issues. This PR makes the kubelet actively remove containers of deleted pods rather than wait for them to be removed in periodic garbage collection.
/release-note-none
Automatic merge from submit-queue (batch tested with PRs 51021, 53225, 53094, 53219). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Change ImageGCManage to consume ImageFS stats from StatsProvider
Fixes#53083.
**Release note**:
```
Change ImageGCManage to consume ImageFS stats from StatsProvider
```
/assign @Random-Liu
When using the legacy docker container runtime and when a container has
terminationMessagePolicy=FallbackToLogsOnError and when docker is
configured with a log driver other than json-log (such as journald),
the kubelet should not try to get the container's log from the
json log file (since it's not there) but should instead ask docker for
the logs.
Stop supporting the "nsenter" exec handler. Only the Docker native exec
handler is supported.
The flag was deprecated in Kubernetes 1.6 and is safe to remove
in Kubernetes 1.9 according to the deprecation policy.
Automatic merge from submit-queue (batch tested with PRs 52176, 43152). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..
Eliminate hangs/throttling of node heartbeat
Fixes https://github.com/kubernetes/kubernetes/issues/48638Fixes#50304
Stops kubelet from wedging when updating node status if unable to establish tcp connection.
Notes that this only affects the node status loop. The pod sync loop would still hang until the dead TCP connections timed out, so more work is needed to keep the sync loop responsive in the face of network issues, but this change lets existing pods coast without the node controller trying to evict them
```release-note
kubelet to master communication when doing node status updates now has a timeout to prevent indefinite hangs
```
Automatic merge from submit-queue (batch tested with PRs 51186, 50350, 51751, 51645, 51837)
Set up DNS server in containerized mounter path
During NFS/GlusterFS mount, it requires to have DNS server to be able to
resolve service name. This PR gets the DNS server ip from kubelet and
add it to the containerized mounter path. So if containerized mounter is
used, service name could be resolved during mount
**Release note**:
```release-note
Allow DNS resolution of service name for COS using containerized mounter. It fixed the issue with DNS resolution of NFS and Gluster services.
```
Automatic merge from submit-queue (batch tested with PRs 51186, 50350, 51751, 51645, 51837)
Wait for container cleanup before deletion
We should wait to delete pod API objects until the pod's containers have been cleaned up. See issue: #50268 for background.
This changes the kubelet container gc, which deletes containers belonging to pods considered "deleted".
It adds two conditions under which a pod is considered "deleted", allowing containers to be deleted:
Pods where deletionTimestamp is set, and containers are not running
Pods that are evicted
This PR also changes the function PodResourcesAreReclaimed by making it return false if containers still exist.
The eviction manager will wait for containers of previous evicted pod to be deleted before evicting another pod.
The status manager will wait for containers to be deleted before removing the pod API object.
/assign @vishh
During NFS/GlusterFS mount, it requires to have DNS server to be able to
resolve service name. This PR gets the DNS server ip from kubelet and
add it to the containerized mounter path. So if containerized mounter is
used, service name could be resolved during mount
Kubelet makes sure that /var/lib/kubelet is rshared when it starts.
If not, it bind-mounts it with rshared propagation to containers
that mount volumes to /var/lib/kubelet can benefit from mount propagation.
Automatic merge from submit-queue (batch tested with PRs 50932, 49610, 51312, 51415, 50705)
Deprecation warnings for auto detecting cloud providers
**What this PR does / why we need it**:
Adds deprecation warnings for auto detecting cloud providers. As part of the initiative for out-of-tree cloud providers, this feature is conflicting since we're shifting the dependency of kubernetes core into cAdvisor. In the future kubelets should be using `--cloud-provider=external` or no cloud provider at all.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#50986
**Special notes for your reviewer**:
NOTE: I still have to coordinate with sig-node and kubernetes-dev to get approval for this deprecation, I'm only opening this PR since we're close to code freeze and it's something presentable.
**Release note**:
```release-note
Deprecate auto detecting cloud providers in kubelet. Auto detecting cloud providers go against the initiative for out-of-tree cloud providers as we'll now depend on cAdvisor integrations with cloud providers instead of the core repo. In the near future, `--cloud-provider` for kubelet will either be an empty string or `external`.
```
Automatic merge from submit-queue (batch tested with PRs 50932, 49610, 51312, 51415, 50705)
Implement StatsProvider interface using cadvisor
Ref: https://github.com/kubernetes/kubernetes/issues/46984
- This PR changes the `StatsProvider` interface in `pkg/kubelet/server/stats` so that it can provide container stats from either cadvisor or CRI, and the summary API can consume the stats without knowing how they are provided.
- The `StatsProvider` struct in the newly added package `pkg/kubelet/stats` implements part of the `StatsProvider` interface in `pkg/kubelet/server/stats`.
- In `pkg/kubelet/stats`,
- `stats_provider.go`: implements the node level stats and provides the entry point for this package.
- `cadvisor_stats_provider.go`: implements the container level stats using cadvisor.
- `cri_stats_provider.go`: implements the container level stats using CRI.
- `helper.go`: utility functions shared by the above three components.
- There should be no user visible behaviors change in this PR.
- A follow up PR will implement the StatsProvider interface using CRI.
**Release note**:
```
None
```
/assign @yujuhong
/assign @WIZARD-CXY
Automatic merge from submit-queue (batch tested with PRs 49284, 49555, 47639, 49526, 49724)
skip WaitForAttachAndMount for terminated pods in syncPod
Fixes https://github.com/kubernetes/kubernetes/issues/49663
I tried to tread lightly with a small localized change because this needs to be picked to 1.7 and 1.6 as well.
I suspect this has been as issue since we started unmounting volumes on pod termination https://github.com/kubernetes/kubernetes/pull/37228
xref openshift/origin#14383
@derekwaynecarr @eparis @smarterclayton @saad-ali @jwforres
/release-note-none
Automatic merge from submit-queue (batch tested with PRs 49651, 49707, 49662, 47019, 49747)
Add support for `no_new_privs` via AllowPrivilegeEscalation
**What this PR does / why we need it**:
Implements kubernetes/community#639
Fixes#38417
Adds `AllowPrivilegeEscalation` and `DefaultAllowPrivilegeEscalation` to `PodSecurityPolicy`.
Adds `AllowPrivilegeEscalation` to container `SecurityContext`.
Adds the proposed behavior to `kuberuntime`, `dockershim`, and `rkt`. Adds a bunch of unit tests to ensure the desired default behavior and that when `DefaultAllowPrivilegeEscalation` is explicitly set.
Tests pass locally with docker and rkt runtimes. There are also a few integration tests with a `setuid` binary for sanity.
**Release note**:
```release-note
Adds AllowPrivilegeEscalation to control whether a process can gain more privileges than it's parent process
```
Automatic merge from submit-queue (batch tested with PRs 48976, 49474, 40050, 49426, 49430)
Use presence of kubeconfig file to toggle standalone mode
Fixes#40049
```release-note
The deprecated --api-servers flag has been removed. Use --kubeconfig to provide API server connection information instead. The --require-kubeconfig flag is now deprecated. The default kubeconfig path is also deprecated. Both --require-kubeconfig and the default kubeconfig path will be removed in Kubernetes v1.10.0.
```
/cc @kubernetes/sig-cluster-lifecycle-misc @kubernetes/sig-node-misc
Automatic merge from submit-queue (batch tested with PRs 48976, 49474, 40050, 49426, 49430)
Remove duplicated import and wrong alias name of api package
**What this PR does / why we need it**:
**Which issue this PR fixes**: fixes#48975
**Special notes for your reviewer**:
/assign @caesarxuchao
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Remove flags low-diskspace-threshold-mb and outofdisk-transition-frequency
issue: #48843
This removes two flags replaced by the eviction manager. These have been depreciated for two releases, which I believe correctly follows the kubernetes depreciation guidelines.
```release-note
Remove depreciated flags: --low-diskspace-threshold-mb and --outofdisk-transition-frequency, which are replaced by --eviction-hard
```
cc @mtaufen since I am changing kubelet flags
cc @vishh @derekwaynecarr
/sig node
Automatic merge from submit-queue (batch tested with PRs 48636, 49088, 49251, 49417, 49494)
Fix issues for local storage allocatable feature
This PR fixes the following issues:
1. Use ResourceStorageScratch instead of ResourceStorage API to represent
local storage capacity
2. In eviction manager, use container manager instead of node provider
(kubelet) to retrieve the node capacity and reserved resources. Node
provider (kubelet) has a feature gate so that storagescratch information
may not be exposed if feature gate is not set. On the other hand,
container manager has all the capacity and allocatable resource
information.
This PR fixes issue #47809
Replaces use of --api-servers with --kubeconfig in Kubelet args across
the turnup scripts. In many cases this involves generating a kubeconfig
file for the Kubelet and placing it in the correct location on the node.
Automatic merge from submit-queue (batch tested with PRs 48981, 47316, 49180)
Added golint check for pkg/kubelet.
**What this PR does / why we need it**:
Added golint check for pkg/kubelet, and make golint happy.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#47315
**Release note**:
```release-note-none
```
This PR fixes the following issues:
1. Use ResourceStorageScratch instead of ResourceStorage API to represent
local storage capacity
2. In eviction manager, use container manager instead of node provider
(kubelet) to retrieve the node capacity and reserved resources. Node
provider (kubelet) has a feature gate so that storagescratch information
may not be exposed if feature gate is not set. On the other hand,
container manager has all the capacity and allocatable resource
information.
Since v1.5 and the removal of --configure-cbr0:
0800df74ab "Remove the legacy networking mode --configure-cbr0"
kubelet hasn't done any shaping operations internally. They
have all been delegated to network plugins like kubenet or
external CNI plugins. But some shaping code was still left
in kubelet, so remove it now that it's unused.
Centralize Capacity discovery of standard resources in Container manager.
Have storage derive node capacity from container manager.
Move certain cAdvisor interfaces to the cAdvisor package in the process.
This patch fixes a bug in container manager where it was writing to a map without synchronization.
Signed-off-by: Vishnu kannan <vishnuk@google.com>
Automatic merge from submit-queue (batch tested with PRs 47694, 47772, 47783, 47803, 47673)
Make different container runtimes constant
Make different container runtimes constant to avoid hardcode
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 47000, 47188, 47094, 47323, 47124)
fix sync loop health check
This PR will do error logging about the fall behind sync for kubelet instead of sync loop healthz checking.
The reason is kubelet can not do sync loop and therefore can not update sync loop time when there is any runtime error, such as docker hung.
When there is any runtime error, according to current implementation, kubelet will not do sync operation and thus kubelet's sync loop time will not be updated. This will make when there is any runtime error, kubelet will also return non 200 response status code when accessing healthz endpoint. This is contrary with #37865 which prevents kubelet from being killed when docker hangs.
**Release note**:
```release-note
fix sync loop health check with seperating runtime errors
```
/cc @yujuhong @Random-Liu @dchen1107
This PR adds the support for allocatable local storage (scratch space).
This feature is only for root file system which is shared by kubernetes
componenets, users' containers and/or images. User could use
--kube-reserved flag to reserve the storage for kube system components.
If the allocatable storage for user's pods is used up, some pods will be
evicted to free the storage resource.
Automatic merge from submit-queue (batch tested with PRs 46635, 45619, 46637, 45059, 46415)
Certificate rotation for kubelet server certs.
Replaces the current kubelet server side self signed certs with certs signed by
the Certificate Request Signing API on the API server. Also renews expiring
kubelet server certs as expiration approaches.
Two Points:
1. With `--feature-gates=RotateKubeletServerCertificate=true` set, the kubelet will
request a certificate during the boot cycle and pause waiting for the request to
be satisfied.
2. In order to have the kubelet's certificate signing request auto approved,
`--insecure-experimental-approve-all-kubelet-csrs-for-group=` must be set on
the cluster controller manager. There is an improved mechanism for auto
approval [proposed](https://github.com/kubernetes/kubernetes/issues/45030).
**Release note**:
```release-note
With `--feature-gates=RotateKubeletServerCertificate=true` set, the kubelet will
request a server certificate from the API server during the boot cycle and pause
waiting for the request to be satisfied. It will continually refresh the certificate as
the certificates expiration approaches.
```
Replaces the current kubelet server side self signed certs with certs
signed by the Certificate Request Signing API on the API server. Also
renews expiring kubelet server certs as expiration approaches.
Automatic merge from submit-queue (batch tested with PRs 46124, 46434, 46089, 45589, 46045)
Support TCP type runtime endpoint for kubelet
**What this PR does / why we need it**:
Currently the grpc server for kubelet and dockershim has a hardcoded endpoint: unix socket '/var/run/dockershim.sock', which is not applicable on non-unix OS.
This PR is to support TCP endpoint type besides unix socket.
**Which issue this PR fixes**
This is a first attempt to address issue https://github.com/kubernetes/kubernetes/issues/45927
**Special notes for your reviewer**:
Before this change, running on Windows node results in:
```
Container Manager is unsupported in this build
```
After adding the cm stub, error becomes:
```
listen unix /var/run/dockershim.sock: socket: An address incompatible with the requested protocol was used.
```
This PR is to fix those two issues.
After this change, still meets 'seccomp' related issue when running on Windows node, needs more updates later.
**Release note**:
Automatic merge from submit-queue
Fix some typo of comment in kubelet.go
**What this PR does / why we need it**:
The PR is to fix some typo in kubelet.go
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
N/A
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 46022, 46055, 45308, 46209, 43590)
Eviction does not evict unless the previous pod has been cleaned up
Addresses #43166
This PR makes two main changes:
First, it makes the eviction loop re-trigger immediately if there may still be pressure. This way, if we already waited 10 seconds to delete a pod, we dont need to wait another 10 seconds for the next synchronize call.
Second, it waits for the pod to be cleaned up (including volumes, cgroups, etc), before moving on to the next synchronize call. It has a timeout for this operation currently set to 30 seconds.
Automatic merge from submit-queue
Reorganize kubelet tree so apis can be independently versioned
@yujuhong @lavalamp @thockin @bgrant0607
This is an example of how we might reorganize `pkg/kubelet` so the apis it exposes can be independently versioned. This would also provide a logical place to put the `KubeletConfiguration` type, which currently lives in `pkg/apis/componentconfig`; it could live in e.g. `pkg/kubelet/apis/config` instead.
Take a look when you have a chance and let me know what you think. The most significant change in this PR is reorganizing `pkg/kubelet/api` to `pkg/kubelet/apis`, the rest is pretty much updating import paths and `BUILD` files.
Automatic merge from submit-queue (batch tested with PRs 45685, 45572, 45624, 45723, 45733)
Remove unused fields from Kubelet struct
Just a small attempt to clean up some unused fields in the kubelet struct. This doesn't make any actual code changes.
/assign @mtaufen
Automatic merge from submit-queue
Enable shared PID namespace by default for docker pods
**What this PR does / why we need it**: This PR enables PID namespace sharing for docker pods by default, bringing the behavior of docker in line with the other CRI runtimes when used with docker >= 1.13.1.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: ref #1615
**Special notes for your reviewer**: cc @dchen1107 @yujuhong
**Release note**:
```release-note
Kubernetes now shares a single PID namespace among all containers in a pod when running with docker >= 1.13.1. This means processes can now signal processes in other containers in a pod, but it also means that the `kubectl exec {pod} kill 1` pattern will cause the pod to be restarted rather than a single container.
```
Automatic merge from submit-queue (batch tested with PRs 45453, 45307, 44987)
Migrate the docker client code from dockertools to dockershim
Move docker client code from dockertools to dockershim/libdocker. This includes
DockerInterface (renamed to Interface), FakeDockerClient, etc.
This is part of #43234
Automatic merge from submit-queue (batch tested with PRs 45508, 44258, 44126, 45441, 45320)
cloud initialize node in external cloud controller
@thockin This PR adds support in the `cloud-controller-manager` to initialize nodes (instead of kubelet, which did it previously)
This also adds support in the kubelet to skip node cloud initialization when `--cloud-provider=external`
Specifically,
Kubelet
1. The kubelet has a new flag called `--provider-id` which uniquely identifies a node in an external DB
2. The kubelet sets a node taint - called "ExternalCloudProvider=true:NoSchedule" if cloudprovider == "external"
Cloud-Controller-Manager
1. The cloud-controller-manager listens on "AddNode" events, and then processes nodes that starts with that above taint. It performs the cloud node initialization steps that were previously being done by the kubelet.
2. On addition of node, it figures out the zone, region, instance-type, removes the above taint and updates the node.
3. Then periodically queries the cloudprovider for node addresses (which was previously done by the kubelet) and updates the node if there are new addresses
```release-note
NONE
```
Automatic merge from submit-queue
adds log when gpuManager.start() failed
If gpuManager.start() returns error, there is no log.
We confused with scheduler do not schedule any pod(with gpu) to one node.
kubectl describe node xxx shows there is no gpu on that node, because the gpu driver do not work on that node, gpuManager.start() failed, but we can not see anything in log.
Automatic merge from submit-queue (batch tested with PRs 45316, 45341)
Pass NoOpLegacyHost to dockershim in --experimental-dockershim mode
This allows dockershim to use network plugins, if needed.
/cc @Random-Liu
This commit deletes code in dockertools that is only used by
DockerManager. A follow-up change will rename and clean up the rest of
the files in this package.
The commit also sets EnableCRI to true if the container runtime is not
rkt. A follow-up change will remove the flag/field and all references to
it.
rktnetes is not a CRI implementation, and does not provide runtime
conditions. This change fixes the issue where rkt will never be
considered running from kubelet's point of view.
Make the location of dockershim.sock configurable, so downstream
projects (such as OpenShift) can place it in a location that does not
require root access (e.g. for integration tests).
Make the kubelet respect and use the values of
--container-runtime-endpoint and --image-service-endpoint, if set. If
unset, the default value of /var/run/dockershim.sock is used.
Automatic merge from submit-queue
Clearer ImageGC failure errors. Fewer events.
Addresses #26000. Kubelet often "fails" image garbage collection if cAdvisor has not completed the first round of stats collection. Don't create events for a single failure, and make log messages more specific.
@kubernetes/sig-node-bugs
Kubelet flags are not necessarily appropriate for the KubeletConfiguration
object. For example, this PR also removes HostnameOverride and NodeIP
from KubeletConfiguration. This is a preleminary step to enabling Nodes
to share configurations, as part of the dynamic Kubelet configuration
feature (#29459). Fields that must be unique for each node inhibit
sharing, because their values, by definition, cannot be shared.
Automatic merge from submit-queue
[Bug] Fix gpu initialization in Kubelet
Kubelet incorrectly fails if `AllAlpha=true` feature gate is enabled with container runtimes that are not `docker`.
Replaces #42407
Automatic merge from submit-queue (batch tested with PRs 41931, 39821, 41841, 42197, 42195)
Use `docker logs` directly if the docker logging driver is not `json-file`
Fixes https://github.com/kubernetes/kubernetes/issues/41996.
Post the PR first, I still need to manually test this, because we don't have test coverage for journald logging pluggin.
@yujuhong @dchen1107
/cc @kubernetes/sig-node-pr-reviews
Introduced chages:
1. Re-writing of the resolv.conf file generated by docker.
Cluster dns settings aren't passed anymore to docker api in all cases, not only for pods with host network:
the resolver conf will be overwritten after infra-container creation to override docker's behaviour.
2. Added new one dnsPolicy - 'ClusterFirstWithHostNet', so now there are:
- ClusterFirstWithHostNet - use dns settings in all cases, i.e. with hostNet=true as well
- ClusterFirst - use dns settings unless hostNetwork is true
- Default
Fixes#17406
Automatic merge from submit-queue
Extend experimental support to multiple Nvidia GPUs
Extended from #28216
```release-note
`--experimental-nvidia-gpus` flag is **replaced** by `Accelerators` alpha feature gate along with support for multiple Nvidia GPUs.
To use GPUs, pass `Accelerators=true` as part of `--feature-gates` flag.
Works only with Docker runtime.
```
1. Automated testing for this PR is not possible since creation of clusters with GPUs isn't supported yet in GCP.
1. To test this PR locally, use the node e2e.
```shell
TEST_ARGS='--feature-gates=DynamicKubeletConfig=true' FOCUS=GPU SKIP="" make test-e2e-node
```
TODO:
- [x] Run manual tests
- [x] Add node e2e
- [x] Add unit tests for GPU manager (< 100% coverage)
- [ ] Add unit tests in kubelet package
Automatic merge from submit-queue (batch tested with PRs 41994, 41969, 41997, 40952, 40576)
Guaranteed admission for Critical Pods
This is the first step in implementing node-level preemption for critical pods.
It defines the AdmissionFailureHandler interface, which allows callers, like the kubelet, to define how failed predicates are handled, and take steps to correct failures if necessary.
In the kubelet's implementation, it triggers preemption if the pod being admitted is critical, and if the only failed predicates are InsufficientResourceErrors, then it prempts (not yet implemented) other other pods to allow admission of the critical pod.
cc: @vishh
Where possible, switch the scheduler to use generated listers and
informers. There are still some places where it probably makes more
sense to use one-off reflectors/informers (listing/watching just a
single node, listing/watching scheduled & unscheduled pods using a field
selector).
Automatic merge from submit-queue (batch tested with PRs 38101, 41431, 39606, 41569, 41509)
Report node not ready on failed PLEG health check
Report node not ready if PLEG health check fails.
Automatic merge from submit-queue (batch tested with PRs 38101, 41431, 39606, 41569, 41509)
optimize killPod() and syncPod() functions
make sure that one of the two arguments must be non-nil: runningPod, status ,just like the function note says
and judge the return value in syncPod() function before setting podKilled
Automatic merge from submit-queue (batch tested with PRs 41466, 41456, 41550, 41238, 41416)
Delay Deletion of a Pod until volumes are cleaned up
#41436 fixed the bug that caused #41095 and #40239 to have to be reverted. Now that the bug is fixed, this shouldn't cause problems.
@vishh @derekwaynecarr @sjenning @jingxu97 @kubernetes/sig-storage-misc
make sure that one of the two arguments must be non-nil: runningPod, status ,just like the function note says
and judge the return value in syncPod() function before setting podKilled
Automatic merge from submit-queue
Allow multipe DNS servers as comma-seperated argument for kubelet --dns
This PR explores how kubectls "--dns" could be extended to specify multiple DNS servers for in-cluster PODs. Testing on the local libvirt-coreos cluster shows that multiple DNS server are injected without issues.
Specifying multiple DNS servers increases resilience against
- Packet drops
- Single server failure
I am debugging services that do 50+ DNS requests for a single incoming interactive request, thus highly increase the chance of a slowdown (+5s) due to a single packet drop. Switching to two DNS servers will reduce the impact of the issues (roughly +1s on glibc, 0s on musl, error-rate goes down to error-rate^2).
Note that there is no need to change any runtime related code as far as I know. In the case of "default" dns the /etc/resolv.conf is parsed and multiple DNS server are send to the backend anyway. This only adds the same capability for the clusterFirst case.
I've heard from @thockin that multiple DNS entries are somehow considered. I've no idea what was considered, though. This is what I would like to see for our production use, though.
```release-note
NONE
```
This change makes kubelet to use the CRI implementation by default,
unless the users opt out explicitly by using --enable-cri=false.
For the rkt integration, the --enable-cri flag will have no effect
since rktnetes does not use CRI.
Also, mark the original --experimental-cri flag hidden and deprecated,
so that we can remove it in the next release.
Automatic merge from submit-queue (batch tested with PRs 40796, 40878, 36033, 40838, 41210)
Implement TTL controller and use the ttl annotation attached to node in secret manager
For every secret attached to a pod as volume, Kubelet is trying to refresh it every sync period. Currently Kubelet has a ttl-cache of secrets of its pods and the ttl is set to 1 minute. That means that in large clusters we are targetting (5k nodes, 30pods/node), given that each pod has a secret associated with ServiceAccount from its namespaces, and with large enough number of namespaces (where on each node (almost) every pod is from a different namespace), that resource in ~30 GETs to refresh all secrets every minute from one node, which gives ~2500QPS for GET secrets to apiserver.
Apiserver cannot keep up with it very easily.
Desired solution would be to watch for secret changes, but because of security we don't want a node watching for all secrets, and it is not possible for now to watch only for secrets attached to pods from my node.
So as a temporary solution, we are introducing an annotation that would be a suggestion for kubelet for the TTL of secrets in the cache and a very simple controller that would be setting this annotation based on the cluster size (the large cluster is, the bigger ttl is).
That workaround mean that only very local changes are needed in Kubelet, we are creating a well separated very simple controller, and once watching "my secrets" will be possible it will be easy to remove it and switch to that. And it will allow us to reach scalability goals.
@dchen1107 @thockin @liggitt
Automatic merge from submit-queue (batch tested with PRs 40873, 40948, 39580, 41065, 40815)
[CRI] Enable Hostport Feature for Dockershim
Commits:
1. Refactor common hostport util logics and add more tests
2. Add HostportManager which can ADD/DEL hostports instead of a complete sync.
3. Add Interface for retreiving portMappings information of a pod in Network Host interface.
Implement GetPodPortMappings interface in dockerService.
4. Teach kubenet to use HostportManager
Automatic merge from submit-queue (batch tested with PRs 41103, 41042, 41097, 40946, 40770)
Use Clientset interface in KubeletDeps
**What this PR does / why we need it**:
This replaces the Clientset struct with the equivalent interface for the KubeClient injected via KubeletDeps. This is useful for testing and for accessing the Node and Pod status event stream without an API server.
**Special notes for your reviewer**:
Follow up to #4907
**Release note**:
`NONE`
Automatic merge from submit-queue (batch tested with PRs 40289, 40877, 40879, 39972, 40942)
Rename experimental-cgroups-per-pod flag
**What this PR does / why we need it**:
1. Rename `experimental-cgroups-per-qos` to `cgroups-per-qos`
1. Update hack/local-up-cluster to match `CGROUP_DRIVER` with docker runtime if used.
**Special notes for your reviewer**:
We plan to roll this feature out in the upcoming release. Previous node e2e runs were running with this feature on by default. We will default this feature on for all e2es next week.
**Release note**:
```release-note
Rename --experiemental-cgroups-per-qos to --cgroups-per-qos
```
Automatic merge from submit-queue
Optionally avoid evicting critical pods in kubelet
For #40573
```release-note
When feature gate "ExperimentalCriticalPodAnnotation" is set, Kubelet will avoid evicting pods in "kube-system" namespace that contains a special annotation - `scheduler.alpha.kubernetes.io/critical-pod`
This feature should be used in conjunction with the rescheduler to guarantee availability for critical system pods - https://kubernetes.io/docs/admin/rescheduler/
```
Depending on an exact cluster setup multiple dns may make sense.
Comma-seperated lists of DNS server are quite common as DNS servers
are always plain IPs.
- split out port forwarding into its own package
Allow multiple port forwarding ports
- Make it easy to determine which port is tied to which channel
- odd channels are for data
- even channels are for errors
- allow comma separated ports to specify multiple ports
Add portfowardtester 1.2 to whitelist
Automatic merge from submit-queue (batch tested with PRs 40232, 40235, 40237, 40240)
move listers out of cache to reduce import tree
Moving the listers from `pkg/client/cache` snips links to all the different API groups from `pkg/storage`, but the dreaded `ListOptions` remains.
@sttts
Automatic merge from submit-queue (batch tested with PRs 37228, 40146, 40075, 38789, 40189)
kubelet: storage: teardown terminated pod volumes
This is a continuation of the work done in https://github.com/kubernetes/kubernetes/pull/36779
There really is no reason to keep volumes for terminated pods attached on the node. This PR extends the removal of volumes on the node from memory-backed (the current policy) to all volumes.
@pmorie raised a concern an impact debugging volume related issues if terminated pod volumes are removed. To address this issue, the PR adds a `--keep-terminated-pod-volumes` flag the kubelet and sets it for `hack/local-up-cluster.sh`.
For consideration in 1.6.
Fixes#35406
@derekwaynecarr @vishh @dashpole
```release-note
kubelet tears down pod volumes on pod termination rather than pod deletion
```
Automatic merge from submit-queue (batch tested with PRs 40168, 40165, 39158, 39966, 40190)
dockershim: add support for the 'nsenter' exec handler
This change simply plumbs the kubelet configuration
(--docker-exec-handler) to DockerService.
This fixes#35747.
- exclude duplicates while merging of host's and dns' search lines to form pod's one
- truncate pod's search line if it exceeds resolver limits: is > 255 chars and containes > 6 searches
- monitoring the resolv.conf file which is used by kubelet (set thru --resolv-conf="") and logging and eventing if search line in it consists of more than 3 entries
(or 6 if Cluster Domain is set) or its lenght is > 255 chars
- logging and eventing when a pod's search line is > 255 chars or containes > 6 searches during forming
Fixes#29270
Automatic merge from submit-queue
kubelet: remove the pleg health check from healthz
This prevents kubelet from being killed when docker hangs.
Also, kubelet will report node not ready if PLEG hangs (`docker ps` + `docker inspect`).
Automatic merge from submit-queue (batch tested with PRs 39079, 38991, 38673)
Support systemd based pod qos in CRI dockershim
This PR makes pod level QoS works for CRI dockershim for systemd based cgroups. And will also fix#36807
- [x] Add cgroupDriver to dockerService and use docker info api to set value for it
- [x] Add a NOTE that detection only works for docker 1.11+, see [CHANGE LOG](https://github.com/docker/docker/blob/master/CHANGELOG.md#1110-2016-04-13)
- [x] Generate cgroupParent in syntax expected by cgroupDriver
- [x] Set cgroupParent to hostConfig for both sandbox and user container
- [x] Check if kubelet conflicts with cgroup driver of docker
cc @derekwaynecarr @vishh
Automatic merge from submit-queue (batch tested with PRs 38154, 38502)
Rename "release_1_5" clientset to just "clientset"
We used to keep multiple releases in the main repo. Now that [client-go](https://github.com/kubernetes/client-go) does the versioning, there is no need to keep releases in the main repo. This PR renames the "release_1_5" clientset to just "clientset", clientset development will be done in this directory.
@kubernetes/sig-api-machinery @deads2k
```release-note
The main repository does not keep multiple releases of clientsets anymore. Please find previous releases at https://github.com/kubernetes/client-go
```
Automatic merge from submit-queue (batch tested with PRs 37208, 37446, 37420)
Kubelet log modification
Keep in line with the other error logs in the function.
After return, the caller records the error log.Delete redundant logs
Automatic merge from submit-queue
Function annotation modification
“return kl.pleg.Healthy()”,Based on the return function,"healty" to "healthy" better
Automatic merge from submit-queue
kubelet: don't reject pods without adding them to the pod manager
kubelet relies on the pod manager as a cache of the pods in the apiserver (and
other sources) . The cache should be kept up-to-date even when rejecting pods.
Without this, kubelet may decide at any point to drop the status update
(request to the apiserver) for the rejected pod since it would think the pod no
longer exists in the apiserver.
This should fix#37658
Automatic merge from submit-queue
remove checking mount point in cleanupOrphanedPodDirs
To avoid nfs hung problem, remove the mountpoint checking code in
cleanupOrphanedPodDirs(). This removal should still be safe because it checks whether there are still directories under pod's volume and if so, do not delete the pod directory.
Note: After removing the mountpoint check code in cleanupOrphanedPodDirs(), the directories might not be cleaned up in such situation.
1. delete pod, kubelet reconciler tries to unmount the volume directory successfully
2. before reconciler tries to delete the volume directory, kubelet gets retarted
3. since under pod directory, there are still volume directors exist (but not mounted), cleanupOrphanedPodDIrs() will not clean them up.
Will work on a follow up PR to solve above issue.
kubelet relies on the pod manager as a cache of the pods in the apiserver (and
other sources) . The cache should be kept up-to-date even when rejecting pods.
Without this, kubelet may decide at any point to drop the status update
(request to the apiserver) for the rejected pod since it would think the pod no
longer exists in the apiserver.
Also check if the pod to-be-admitted has terminated or not. In the case where
it has terminated, skip the admission process completely.
Automatic merge from submit-queue
Add e2e node test for log path
fixes#34661
A node e2e test to check if container logs files are properly created with right content.
Since the log files under `/var/log/containers` are actually symbolic of docker containers log files, we can not use a pod to mount them in and do check (symbolic doesn't supported by docker volume).
cc @Random-Liu
Automatic merge from submit-queue
Use indirect streaming path for remote CRI shim
Last step for https://github.com/kubernetes/kubernetes/issues/29579
- Wire through the remote indirect streaming methods in the docker remote shim
- Add the docker streaming server as a handler at `<node>:10250/cri/{exec,attach,portforward}`
- Disable legacy streaming for dockershim
Note: This requires PR https://github.com/kubernetes/kubernetes/pull/34987 to work.
Tested manually on an E2E cluster.
/cc @euank @feiskyer @kubernetes/sig-node
Automatic merge from submit-queue
kubelet bootstrap: start hostNetwork pods before we have PodCIDR
Network readiness was checked in the pod admission phase, but pods that
fail admission are not retried. Move the check to the pod start phase.
Issue #35409
Issue #35521
Automatic merge from submit-queue
CRI: rearrange kubelet rutnime initialization
Consolidate the code used by docker+cri and remote+cri for consistency, and to
prevent changing one without the other. Enforce that
`--experimental-runtime-integration-type` has to be set in order for kubelet
use the CRI interface, *even for out-of-process shims`. This simplifies the
temporary `if` logic in kubelet while CRI still co-exists with older logic.
Automatic merge from submit-queue
CRI: Add Status into CRI.
For https://github.com/kubernetes/kubernetes/issues/35701.
Fixes https://github.com/kubernetes/kubernetes/issues/35701.
This PR added a `Status` call in CRI, and the `RuntimeStatus` is defined as following:
``` protobuf
message RuntimeCondition {
// Type of runtime condition.
optional string type = 1;
// Status of the condition, one of true/false.
optional bool status = 2;
// Brief reason for the condition's last transition.
optional string reason = 3;
// Human readable message indicating details about last transition.
optional string message = 4;
}
message RuntimeStatus {
// Conditions is an array of current observed runtime conditions.
repeated RuntimeCondition conditions = 1;
}
```
Currently, only `conditions` is included in `RuntimeStatus`, and the definition is almost the same with `NodeCondition` and `PodCondition` in K8s api.
@yujuhong @feiskyer @bprashanth If this makes sense, I'll send a follow up PR to let dockershim return `RuntimeStatus` and let kubelet make use of it.
@yifan-gu @euank Does this make sense to rkt?
/cc @kubernetes/sig-node
Automatic merge from submit-queue
[AppArmor] Hold bad AppArmor pods in pending rather than rejecting
Fixes https://github.com/kubernetes/kubernetes/issues/32837
Overview of the fix:
If the Kubelet needs to reject a Pod for a reason that the control plane doesn't understand (e.g. which AppArmor profiles are installed on the node), then it might contiinuously try to run the pod on the same rejecting node. This change adds a concept of "soft rejection", in which the Pod is admitted, but not allowed to run (and therefore held in a pending state). This prevents the pod from being retried on other nodes, but also prevents the high churn. This is consistent with how other missing local resources (e.g. volumes) is handled.
A side effect of the change is that Pods which are not initially runnable will be retried. This is desired behavior since it avoids a race condition when a new node is brought up but the AppArmor profiles have not yet been loaded on it.
``` release-note
Pods with invalid AppArmor configurations will be held in a Pending state, rather than rejected (failed). Check the pod status message to find out why it is not running.
```
@kubernetes/sig-node @timothysc @rrati @davidopp
Consolidate the code used by docker+cri and remote+cri for consistency, and to
prevent changing one without the other. Enforce that
`--experimental-runtime-integration-type` has to be set in order for kubelet
use the CRI interface, *even for out-of-process shims`. This simplifies the
temporary `if` logic in kubelet while CRI still co-exists with older logic.
This change add a container manager inside the dockershim to move docker daemon
and associated processes to a specified cgroup. The original kubelet container
manager will continue checking the name of the cgroup, so that kubelet know how
to report runtime stats.
Automatic merge from submit-queue
Add kubelet awareness to taint tolerant match caculator.
Add kubelet awareness to taint tolerant match caculator.
Ref: #25320
This is required by `TaintEffectNoScheduleNoAdmit` & `TaintEffectNoScheduleNoAdmitNoExecute `, so that node will know if it should expect the taint&tolerant
Automatic merge from submit-queue
Add node event for container/image GC failure
Follow up to #31988. Add an event for a node when container/image GC fails.
Automatic merge from submit-queue
Add seccomp and apparmor support.
This PR adds seccomp and apparmor support in new CRI.
This a WIP because I'm still adding unit test for some of the functions. Sent this PR here for design discussion.
This PR is similar with https://github.com/kubernetes/kubernetes/pull/33450.
The differences are:
* This PR passes seccomp and apparmor configuration via annotations;
* This PR keeps the seccomp handling logic in docker shim because current seccomp implementation is very docker specific, and @timstclair told me that even the json seccomp profile file is defined by docker.
Notice that this PR almost passes related annotations in `api.Pod` to the runtime directly instead of introducing new CRI annotation.
@yujuhong @feiskyer @timstclair
Automatic merge from submit-queue
Node-ip is not used when cloud provider is used
Currently --node-ip in kubelet is not being used when kubelet is configured with a cloud provider. With this fix, kubelet will get a list of IPs from the provider and parse it to return the one that matches node-ip.
This fixes#23568
Automatic merge from submit-queue
Use strongly-typed types.NodeName for a node name
We had another bug where we confused the hostname with the NodeName.
Also, if we want to use different values for the Node.Name (which is
an important step for making installation easier), we need to keep
better control over this.
A tedious but mechanical commit therefore, to change all uses of the
node name to use types.NodeName
Automatic merge from submit-queue
Move Kubelet pod-management code into kubelet_pods.go
Finish the kubelet code moves started during the 1.3 dev cycle -- move pod management code into a file called `kubelet_pods.go`.
We had another bug where we confused the hostname with the NodeName.
To avoid this happening again, and to make the code more
self-documenting, we use types.NodeName (a typedef alias for string)
whenever we are referring to the Node.Name.
A tedious but mechanical commit therefore, to change all uses of the
node name to use types.NodeName
Also clean up some of the (many) places where the NodeName is referred
to as a hostname (not true on AWS), or an instanceID (not true on GCE),
etc.
Automatic merge from submit-queue
Add positive logging for GC events
We have no positive logging for GC events. This PR:
1. Adds positive logging at V(4) for success cases
2. Adds positive logging at V(1) for the first successful GC after a failure
Automatic merge from submit-queue
Move image pull throttling logic to pkg/kubelet/images
This is part of #31458
This allows runtimes in different packages (dockertools, rkt, kuberuntime) to
share the same logic. Before this change, only dockertools support this
feature. Now all three packages support image pull throttling.
/cc @kubernetes/sig-node
This allows runtimes in different packages (dockertools, rkt, kuberuntime) to
share the same logic. Before this change, only dockertools support this
feature. Now all three packages support image pull throttling.
Automatic merge from submit-queue
simplify RC and SVC listers
Make the RC and SVC listers use the common list functions that more closely match client APIs, are consistent with other listers, and avoid unnecessary copies.
The new flag, if specified, and if --container-runtime=docker, switches
kubelet to use the new CRI implementation for testing. This is hidden flag
since the feature is still under heavy development and the flag may be changed
in the near future.
Automatic merge from submit-queue
Check kubeClient nil in Kubelet and bugfix
1. check kubeClient nil first before using as it maybe nil
2. configMaps and secrets map do not be used properly and should use it as cache
Automatic merge from submit-queue
Fixed TODO: move predicate check into a pod admitter
refractoring AdmitPod func to move predicate check into a pod admitter
Automatic merge from submit-queue
Fix hang/websocket timeout when streaming container log with no content
When streaming and following a container log, no response headers are sent from the kubelet `containerLogs` endpoint until the first byte of content is written to the log. This propagates back to the API server, which also will not send response headers until it gets response headers from the kubelet. That includes upgrade headers, which means a websocket connection upgrade is not performed and can time out.
To recreate, create a busybox pod that runs `/bin/sh -c 'sleep 30 && echo foo && sleep 10'`
As soon as the pod starts, query the kubelet API:
```
curl -N -k -v 'https://<node>:10250/containerLogs/<ns>/<pod>/<container>?follow=true&limitBytes=100'
```
or the master API:
```
curl -N -k -v 'http://<master>:8080/api/v1/<ns>/pods/<pod>/log?follow=true&limitBytes=100'
```
In both cases, notice that the response headers are not sent until the first byte of log content is available.
This PR:
* does a 0-byte write prior to handing off to the container runtime stream copy. That commits the response header, even if the subsequent copy blocks waiting for the first byte of content from the log.
* fixes a bug with the "ping" frame sent to websocket streams, which was not respecting the requested protocol (it was sending a binary frame to a websocket that requested a base64 text protocol)
* fixes a bug in the limitwriter, which was not propagating 0-length writes, even before the writer's limit was reached
This refactor removes the legacy KubeletConfig object and adds a new
KubeletDeps object, which contains injected runtime objects and
separates them from static config. It also reduces NewMainKubelet to two
arguments: a KubeletConfiguration and a KubeletDeps.
Some mesos and kubemark code was affected by this change, and has been
modified accordingly.
And a few final notes:
KubeletDeps:
KubeletDeps will be a temporary bin for things we might consider
"injected dependencies", until we have a better dependency injection
story for the Kubelet. We will have to discuss this eventually.
RunOnce:
We will likely not pull new KubeletConfiguration from the API server
when in runonce mode, so it doesn't make sense to make this something
that can be configured centrally. We will leave it as a flag-only option
for now. Additionally, it is increasingly looking like nobody actually uses the
Kubelet's runonce mode anymore, so it may be a candidate for deprecation
and removal.
Automatic merge from submit-queue
Kubelet code move: volume / util
Addresses some odds and ends that I apparently missed earlier. Preparation for kubelet code-move ENDGAME.
cc @kubernetes/sig-node
Automatic merge from submit-queue
Add kubelet --network-plugin-mtu flag for MTU selection
* Add network-plugin-mtu option which lets us pass down a MTU to a network provider (currently processed by kubenet)
* Add a test, and thus make sysctl testable
MTU selection is difficult, and if there is a transport such as IPSEC in
use may be impossible. So we allow specification of the MTU with the
network-plugin-mtu flag, and we pass this down into the network
provider.
Currently implemented by kubenet.
Automatic merge from submit-queue
Kubelet: add --container-runtime-endpoint and --image-service-endpoint
Flag `--container-runtime-endpoint` (overrides `--container-runtime`) is introduced to identify the unix socket file of the remote runtime service. And flag `--image-service-endpoint` is introduced to identify the unix socket file of the image service.
This PR is part of #28789 Milestone 0.
CC @yujuhong @Random-Liu