Automatic merge from submit-queue
Fix hang/websocket timeout when streaming container log with no content
When streaming and following a container log, no response headers are sent from the kubelet `containerLogs` endpoint until the first byte of content is written to the log. This propagates back to the API server, which also will not send response headers until it gets response headers from the kubelet. That includes upgrade headers, which means a websocket connection upgrade is not performed and can time out.
To recreate, create a busybox pod that runs `/bin/sh -c 'sleep 30 && echo foo && sleep 10'`
As soon as the pod starts, query the kubelet API:
```
curl -N -k -v 'https://<node>:10250/containerLogs/<ns>/<pod>/<container>?follow=true&limitBytes=100'
```
or the master API:
```
curl -N -k -v 'http://<master>:8080/api/v1/namespaces/<ns>/pods/<pod>/log?follow=true&limitBytes=100'
```
In both cases, notice that the response headers are not sent until the first byte of log content is available.
This PR:
* does a 0-byte write prior to handing off to the container runtime stream copy (see the sketch after this list). That commits the response headers, even if the subsequent copy blocks waiting for the first byte of content from the log.
* fixes a bug with the "ping" frame sent to websocket streams, which was not respecting the requested protocol (it was sending a binary frame to a websocket that requested a base64 text protocol)
* fixes a bug in the limitwriter, which was not propagating 0-length writes, even before the writer's limit was reached
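A minimal sketch of the 0-byte-write idea from the first bullet, using only the Go standard library (the real kubelet handler is more involved and also flows through the limit writer mentioned above):
```go
package kubeletlogs

import (
	"io"
	"net/http"
)

// serveFollowedLogs is illustrative only: in net/http, any Write call, even of
// zero bytes, commits the status code and response headers, so they can be
// flushed to the client before the log copy blocks waiting for content.
func serveFollowedLogs(w http.ResponseWriter, logs io.Reader) {
	w.Write([]byte{}) // commit headers without writing any body
	if f, ok := w.(http.Flusher); ok {
		f.Flush() // push the headers out to the client (and any proxy) now
	}
	// This may block for a long time waiting for the first byte of log
	// content, but the client already has the response headers.
	io.Copy(w, logs)
}
```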
This refactor removes the legacy KubeletConfig object and adds a new
KubeletDeps object, which contains injected runtime objects and
separates them from static config. It also reduces NewMainKubelet to two
arguments: a KubeletConfiguration and a KubeletDeps.
Some mesos and kubemark code was affected by this change, and has been
modified accordingly.
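For orientation, a rough sketch of the resulting shape (the field names here are illustrative stand-ins, not the exact production ones):
```go
package kubelet

import "time"

// KubeletConfiguration holds static, centrally-configurable settings.
type KubeletConfiguration struct {
	SyncFrequency time.Duration // example of a static config value
}

// KubeletDeps holds injected runtime objects (API client, cadvisor, mounter,
// event recorder, ...). Only one example field is shown here.
type KubeletDeps struct {
	KubeClient interface{}
}

type Kubelet struct {
	kubeCfg  *KubeletConfiguration
	kubeDeps *KubeletDeps
}

// NewMainKubelet, reduced to the two arguments described above.
func NewMainKubelet(kubeCfg *KubeletConfiguration, kubeDeps *KubeletDeps) (*Kubelet, error) {
	return &Kubelet{kubeCfg: kubeCfg, kubeDeps: kubeDeps}, nil
}
```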
And a few final notes:
KubeletDeps:
KubeletDeps will be a temporary bin for things we might consider
"injected dependencies", until we have a better dependency injection
story for the Kubelet. We will have to discuss this eventually.
RunOnce:
We will likely not pull new KubeletConfiguration from the API server
when in runonce mode, so it doesn't make sense to make this something
that can be configured centrally. We will leave it as a flag-only option
for now. Additionally, it is increasingly looking like nobody actually uses the
Kubelet's runonce mode anymore, so it may be a candidate for deprecation
and removal.
Automatic merge from submit-queue
Kubelet code move: volume / util
Addresses some odds and ends that I apparently missed earlier. Preparation for kubelet code-move ENDGAME.
cc @kubernetes/sig-node
Automatic merge from submit-queue
Add kubelet --network-plugin-mtu flag for MTU selection
* Add a network-plugin-mtu option which lets us pass down an MTU to a network provider (currently processed by kubenet)
* Add a test, and thus make sysctl testable
MTU selection is difficult, and may be impossible if a transport such as
IPsec is in use. So we allow the MTU to be specified explicitly with the
network-plugin-mtu flag, and we pass it down into the network provider.
Currently implemented by kubenet.
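As a rough sketch of what passing the MTU down can look like in kubenet, which generates a CNI bridge configuration (the bridge name, IPAM details, and CNI version below are illustrative, not the real template):
```go
package kubenet

import "fmt"

// bridgeConfig shows the "mtu" key flowing from the kubelet flag into the CNI
// bridge plugin configuration that kubenet writes out.
func bridgeConfig(mtu int, podCIDR string) string {
	return fmt.Sprintf(`{
  "cniVersion": "0.1.0",
  "name": "kubenet",
  "type": "bridge",
  "bridge": "cbr0",
  "mtu": %d,
  "ipam": {
    "type": "host-local",
    "subnet": %q
  }
}`, mtu, podCIDR)
}
```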
Automatic merge from submit-queue
Kubelet: add --container-runtime-endpoint and --image-service-endpoint
Flag `--container-runtime-endpoint` (overrides `--container-runtime`) is introduced to identify the unix socket file of the remote runtime service, and flag `--image-service-endpoint` is introduced to identify the unix socket file of the image service.
This PR is part of #28789 Milestone 0.
CC @yujuhong @Random-Liu
The serviceAccountName is occasionally useful for clients running on
Kube that need to know who they are when talking to other components.
The nodeName is useful for PetSet or DaemonSet pods that need to make
calls back to the API to fetch info about their node.
Both fields are immutable, and cannot easily be retrieved in another
way.
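For illustration (using today's core/v1 types and hypothetical env var names), exposing both fields to a container via the downward API looks roughly like this:
```go
package example

import v1 "k8s.io/api/core/v1"

// downwardEnv exposes the two newly allowed fields to a container via the
// downward API. The env var names are arbitrary.
func downwardEnv() []v1.EnvVar {
	return []v1.EnvVar{
		{
			Name: "MY_NODE_NAME",
			ValueFrom: &v1.EnvVarSource{
				FieldRef: &v1.ObjectFieldSelector{FieldPath: "spec.nodeName"},
			},
		},
		{
			Name: "MY_SERVICE_ACCOUNT",
			ValueFrom: &v1.EnvVarSource{
				FieldRef: &v1.ObjectFieldSelector{FieldPath: "spec.serviceAccountName"},
			},
		},
	}
}
```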
Automatic merge from submit-queue
Always return command output for exec probes and kubelet RunInContainer
Always return command output for exec probes and kubelet RunInContainer, even if the command invocation returns nonzero.
When #24921 replaced RunInContainer with ExecInContainer, it introduced a change where an exec probe that failed no longer included the stdout/stderr from the probe in the event. For example, when running at log level 4, you see:
```
I0816 15:01:36.259826 29713 exec.go:38] Exec probe response: "Failed to access the status endpoint : HTTP Error 404: Not Found.\nHawkular metrics has only been running for 7\n seconds not aborting yet.\n"
```
But the event looks like this:
```
54s 22s 5 hawkular-metrics-hjme4 Pod spec.containers{hawkular-metrics} Warning Unhealthy {kubelet corbeau} Readiness probe failed:
```
Note the absence of the exec probe response after "Readiness probe failed". This PR restores the previous behavior.
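A hypothetical sketch of the restored behavior, not the actual prober code: the command's combined output is kept even when the exec returns a nonzero exit, so it can be attached to the event message.
```go
package prober

import "fmt"

// execer is trimmed down to the one method this sketch needs.
type execer interface {
	CombinedOutput() ([]byte, error)
}

// runProbe captures the command output in all cases, not just on success, and
// includes it in the error that ultimately feeds the "Unhealthy" event.
func runProbe(cmd execer) (healthy bool, output string, err error) {
	out, execErr := cmd.CombinedOutput()
	output = string(out)
	if execErr != nil {
		return false, output, fmt.Errorf("probe failed: %v, output: %q", execErr, output)
	}
	return true, output, nil
}
```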
cc @kubernetes/rh-cluster-infra @mwringe
xref https://github.com/openshift/origin/issues/10424
Automatic merge from submit-queue
Unblock iterative development on pod-level cgroups
In order to allow forward progress on this feature, this takes the commits from #28017 and #29049 and then globally disables the flag that allows these features to be exercised in the kubelet. The flag can be re-added to the kubelet when it's actually ready.
/cc @vishh @dubstack @kubernetes/rh-cluster-infra
Automatic merge from submit-queue
Add Events for operation_executor to show status of mounts, failed/successful to show in describe events
Fixes #27590
@saad-ali @pmorie @erinboyd
After talking with @pmorie last week about the above issue, I decided to poke around and see if I could remedy it. The refactoring broke my previously merged UXP PRs that correctly showed failed mount errors in the describe events. I'm not sure I implemented this correctly, but it tested out and seems to be working; let me know what I missed or if this is not the correct approach.
```
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
2m 2m 1 {default-scheduler } Normal Scheduled Successfully assigned nfs-bb-pod1 to 127.0.0.1
44s 44s 1 {kubelet 127.0.0.1} Warning FailedMount Unable to mount volumes for pod "nfs-bb-pod1_default(a94f64f1-37c9-11e6-9aa5-52540073d346)": timeout expired waiting for volumes to attach/mount for pod "nfs-bb-pod1"/"default". list of unattached/unmounted volumes=[nfsvol]
44s 44s 1 {kubelet 127.0.0.1} Warning FailedSync Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "nfs-bb-pod1"/"default". list of unattached/unmounted volumes=[nfsvol]
38s 38s 1 {kubelet } Warning FailedMount Unable to mount volumes for pod "a94f64f1-37c9-11e6-9aa5-52540073d346": Mount failed: exit status 32
Mounting arguments: nfs1.rhs:/opt/data99 /var/lib/kubelet/pods/a94f64f1-37c9-11e6-9aa5-52540073d346/volumes/kubernetes.io~nfs/nfsvol nfs []
Output: mount.nfs: Connection timed out
Resolution hint: Check and make sure the NFS Server exists (ensure that correct IPAddress/Hostname was given) and is available/reachable.
Also make sure firewall ports are open on both client and NFS Server (2049 v4 and 2049, 20048 and 111 for v3).
Use commands telnet <nfs server> <port> and showmount <nfs server> to help test connectivity.
```
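Roughly, the reporting amounts to emitting a warning event on the pod when a mount operation fails. The sketch below uses today's client-go types and is illustrative rather than the exact refactored code:
```go
package operationexecutor

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/tools/record"
)

// reportMountFailure emits a FailedMount warning event on the pod so that the
// failure shows up in `kubectl describe pod` output.
func reportMountFailure(recorder record.EventRecorder, pod *v1.Pod, err error) {
	recorder.Eventf(pod, v1.EventTypeWarning, "FailedMount",
		"Unable to mount volumes for pod %q: %v", pod.Name, err)
}
```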
A new flag, --container-runtime-endpoint (overrides --container-runtime),
is introduced to the kubelet; it identifies the unix socket file of
the remote runtime service. A new flag, --image-service-endpoint, is
introduced to the kubelet; it identifies the unix socket file of the
image service.
Automatic merge from submit-queue
Fix default resource limits (node allocatable) for downward api volumes and env vars
@kubernetes/rh-cluster-infra @pmorie @derekwaynecarr
Automatic merge from submit-queue
syncNetworkUtil in kubelet and fix loadbalancerSourceRange on GCE
fixes #29997 and #29039
@yujuhong Can you take a look at the kubelet part?
@girishkalele KUBE-MARK-DROP is the chain for marking connections to be dropped. Marked connections will be dropped in the INPUT/OUTPUT chains of the filter table. Let me know if this is good enough for your use case.
Automatic merge from submit-queue
Implement AppArmor Kubelet support
Includes PR https://github.com/kubernetes/kubernetes/pull/29812
Implements the Kubelet logic for AppArmor based on the alpha API proposed [here](https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/apparmor.md). Also adds an E2E test, and I ran manual tests.
Remaining work: PodSecurityPolicy support, profile loader daemon, documentation, (maybe) beta API.
/cc @jfrazelle @Amey-D @kubernetes/sig-node
*Note on release-note-none: I am implementing AppArmor over multiple PRs. I will submit a single release note once the implementation is done to cover all of them.*
Currently, kubelet volume management works on the concept of a desired
and an actual world of states. The volume manager periodically compares the
two worlds and performs volume mount/unmount and/or attach/detach
operations. When the kubelet restarts, the caches of those two worlds are
gone. Although the desired world can be recovered through the apiserver, the
actual world cannot, which means some volumes may never be cleaned up if
their information has already been deleted from the apiserver. This change
adds reconstruction of the actual world by reading the pod directories from
disk. The reconstructed volume information is added to both the desired
world and the actual world if it cannot be found in either. The rest of the
logic is the same as before: the desired world populator may clean up a
volume entry if it is no longer in the apiserver, and the volume manager
should then invoke unmount to clean it up.
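A simplified sketch of the reconstruction idea, assuming the usual /var/lib/kubelet/pods/<podUID>/volumes/<plugin>/<volumeName> layout (not the real implementation):
```go
package reconstruct

import (
	"os"
	"path/filepath"
)

// reconstructedVolume captures what can be recovered from a pod's volume
// directory alone after a kubelet restart.
type reconstructedVolume struct {
	PodUID     string
	PluginName string
	VolumeName string
	Path       string
}

// reconstructVolumes walks the per-pod volume directories on disk to discover
// volumes whose state was lost from the in-memory caches.
func reconstructVolumes(kubeletPodsDir string) ([]reconstructedVolume, error) {
	var found []reconstructedVolume
	pods, err := os.ReadDir(kubeletPodsDir)
	if err != nil {
		return nil, err
	}
	for _, pod := range pods {
		volumesDir := filepath.Join(kubeletPodsDir, pod.Name(), "volumes")
		plugins, err := os.ReadDir(volumesDir)
		if err != nil {
			continue // no volumes directory for this pod; nothing to reconstruct
		}
		for _, plugin := range plugins {
			pluginDir := filepath.Join(volumesDir, plugin.Name())
			vols, err := os.ReadDir(pluginDir)
			if err != nil {
				continue
			}
			for _, vol := range vols {
				found = append(found, reconstructedVolume{
					PodUID:     pod.Name(),
					PluginName: plugin.Name(),
					VolumeName: vol.Name(),
					Path:       filepath.Join(pluginDir, vol.Name()),
				})
			}
		}
	}
	return found, nil
}
```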
Add a new helper method volume.GetPath(Mounter) instead of calling
the Mounter's GetPath() directly. Check whether GetPath() returns
a "" and convert that into an error. At this point, we only have
information about the type of the Mounter, so log that if
there is a problem.
Fixes #23163
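A minimal sketch of the helper described above (the real Mounter interface has more methods):
```go
package volume

import "fmt"

// Mounter is trimmed down to the one method this sketch needs.
type Mounter interface {
	GetPath() string
}

// GetPath wraps Mounter.GetPath() and turns an empty path into an error that
// records the mounter's concrete type, since that is all we know here.
func GetPath(mounter Mounter) (string, error) {
	if mounter == nil {
		return "", fmt.Errorf("nil mounter passed to GetPath")
	}
	path := mounter.GetPath()
	if path == "" {
		return "", fmt.Errorf("path is empty for mounter of type %T", mounter)
	}
	return path, nil
}
```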
Automatic merge from submit-queue
Node disk pressure should induce image gc
If the node reports disk pressure, prior to evicting pods, the node should clean up unused images.
Automatic merge from submit-queue
Use the CNI bridge plugin to set hairpin mode
Following up this part of #23711:
> I'd like to wait until containernetworking/cni#175 lands and then just pass the request through to CNI.
The code here just
* passes the required setting down from kubenet to CNI
* disables `DockerManager` from doing hairpin-veth, if kubenet is in use
Note to test you need a very recent version of the CNI `bridge` plugin; the one brought in by #28799 should be OK.
Also relates to https://github.com/kubernetes/kubernetes/issues/19766#issuecomment-232722864
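Analogous to the kubenet bridge-config sketch earlier in this document, passing hairpin mode through amounts to one more key in the generated CNI configuration (simplified; only the hairpin-related part is shown, and the bridge name is illustrative):
```go
package kubenet

import "fmt"

// bridgeConfigWithHairpin shows only the hairpin-related fragment of the CNI
// bridge configuration; the remaining fields are omitted for brevity.
func bridgeConfigWithHairpin(hairpin bool) string {
	return fmt.Sprintf(`{
  "type": "bridge",
  "bridge": "cbr0",
  "hairpinMode": %t
}`, hairpin)
}
```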
Automatic merge from submit-queue
pkg/various: plug leaky time.New{Timer,Ticker}s
According to the documentation for the Go time package, `time.Ticker` and
`time.Timer` are not reclaimed by garbage collector finalizers; they
leak until explicitly stopped. This commit ensures that all remaining
instances are stopped upon departure from their enclosing scopes.
Similar efforts were incrementally done in #29439 and #29114.
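The pattern applied throughout looks roughly like this (a self-contained illustration, not code taken from this change):
```go
package main

import (
	"fmt"
	"time"
)

// pollUntil stops the ticker when leaving the scope that created it, rather
// than relying on the garbage collector to clean it up.
func pollUntil(done <-chan struct{}, interval time.Duration, work func()) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop() // releases the ticker's resources on every return path

	for {
		select {
		case <-done:
			return
		case <-ticker.C:
			work()
		}
	}
}

func main() {
	done := make(chan struct{})
	go func() { time.Sleep(250 * time.Millisecond); close(done) }()
	pollUntil(done, 50*time.Millisecond, func() { fmt.Println("tick") })
}
```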
```release-note
* pkg/various: plugged various time.Ticker and time.Timer leaks.
```