Automatic merge from submit-queue
Fix bug in status manager TerminatePod
In TerminatePod, we previously pass pod.Status to updateStatusInternal. This is a bug, since it is the original status that we are given. Not only does it skip updates made to container statuses, but in some cases it reverted the pod's status to an earlier version, since it was being passed a stale status initially.
This was the case in #40239 and #41095. As shown in #40239, the pod's status is set to running after it is set to failed, occasionally causing very long delays in pod deletion since we have to wait for this to be corrected.
This PR fixes the bug, adds some helpful debugging statements, and adds a unit test for TerminatePod (which for some reason didnt exist before?).
@kubernetes/sig-node-bugs @vish @Random-Liu
Automatic merge from submit-queue
Make EnableCRI default to true
This change makes kubelet to use the CRI implementation by default,
unless the users opt out explicitly by using --enable-cri=false.
For the rkt integration, the --enable-cri flag will have no effect
since rktnetes does not use CRI.
Also, mark the original --experimental-cri flag hidden and deprecated,
so that we can remove it in the next release. If both flags are specified,
the --enable-cri flag overrides the --experimental-cri flag.
Automatic merge from submit-queue
fix comment
**What this PR does / why we need it**:
fix comment
Thanks.
**Special notes for your reviewer**:
**Release note**:
```release-note
```
This change makes kubelet to use the CRI implementation by default,
unless the users opt out explicitly by using --enable-cri=false.
For the rkt integration, the --enable-cri flag will have no effect
since rktnetes does not use CRI.
Also, mark the original --experimental-cri flag hidden and deprecated,
so that we can remove it in the next release.
To safely mark a volume detached when the volume controller manager is used.
An example of one such problem:
1. pod is created, volume is added to desired state of the world
2. reconciler process starts
3. reconciler starts MountVolume, which is kicked off asynchronously via
operation_executor.go
4. MountVolume mounts the volume, but hasn't yet marked it as mounted
5. pod is deleted, volume is removed from desired state of the world
6. reconciler detects volume is no longer in desired state of world,
removes it from volumes in use
7. MountVolume tries to mark volume in use, throws an error because
volume is no longer in actual state of world list.
8. controller-manager tries to detach the volume, this fails because it
is still mounted to the OS.
9. EBS gets stuck indefinitely in busy state trying to detach.
Automatic merge from submit-queue (batch tested with PRs 40796, 40878, 36033, 40838, 41210)
Implement TTL controller and use the ttl annotation attached to node in secret manager
For every secret attached to a pod as volume, Kubelet is trying to refresh it every sync period. Currently Kubelet has a ttl-cache of secrets of its pods and the ttl is set to 1 minute. That means that in large clusters we are targetting (5k nodes, 30pods/node), given that each pod has a secret associated with ServiceAccount from its namespaces, and with large enough number of namespaces (where on each node (almost) every pod is from a different namespace), that resource in ~30 GETs to refresh all secrets every minute from one node, which gives ~2500QPS for GET secrets to apiserver.
Apiserver cannot keep up with it very easily.
Desired solution would be to watch for secret changes, but because of security we don't want a node watching for all secrets, and it is not possible for now to watch only for secrets attached to pods from my node.
So as a temporary solution, we are introducing an annotation that would be a suggestion for kubelet for the TTL of secrets in the cache and a very simple controller that would be setting this annotation based on the cluster size (the large cluster is, the bigger ttl is).
That workaround mean that only very local changes are needed in Kubelet, we are creating a well separated very simple controller, and once watching "my secrets" will be possible it will be easy to remove it and switch to that. And it will allow us to reach scalability goals.
@dchen1107 @thockin @liggitt
Automatic merge from submit-queue (batch tested with PRs 41074, 41147, 40854, 41167, 40045)
Add debug logging to eviction manager
**What this PR does / why we need it**:
This PR adds debug logging to eviction manager.
We need it to help users understand when/why eviction manager is/is not making decisions to support information gathering during support.
Automatic merge from submit-queue (batch tested with PRs 40873, 40948, 39580, 41065, 40815)
[CRI] Enable Hostport Feature for Dockershim
Commits:
1. Refactor common hostport util logics and add more tests
2. Add HostportManager which can ADD/DEL hostports instead of a complete sync.
3. Add Interface for retreiving portMappings information of a pod in Network Host interface.
Implement GetPodPortMappings interface in dockerService.
4. Teach kubenet to use HostportManager
Automatic merge from submit-queue (batch tested with PRs 38796, 40823, 40756, 41083, 41105)
Let ReadLogs return when there is a read error.
Fixes a bug in kuberuntime log.
Today, @yujuhong found that once we cancel `kubectl logs -f` with `Ctrl+C`, kuberuntime will keep complaining:
```
27939 kuberuntime_logs.go:192] Failed with err write tcp 10.240.0.4:10250->10.240.0.2:53913: write: broken pipe when writing log for log file "/var/log/pods/5bb76510-ed71-11e6-ad02-42010af00002/busybox_0.log": &{timestamp:{sec:63622095387 nsec:625309193 loc:0x484c440} stream:stdout log:[84 117 101 32 70 101 98 32 32 55 32 50 48 58 49 54 58 50 55 32 85 84 67 32 50 48 49 55 10]}
```
This is because kuberuntime keeps writing to the connection even though it is already closed. Actually, kuberuntime should return and report error whenever there is a writing error.
Ref the [docker code](3a4ae1f661/pkg/stdcopy/stdcopy.go (L159-L167))
I'm still creating the cluster and verifying this fix. Will post the result here after that.
/cc @yujuhong @kubernetes/sig-node-bugs
Automatic merge from submit-queue (batch tested with PRs 38796, 40823, 40756, 41083, 41105)
kubelet/network-cni-plugin: modify the log's info
**What this PR does / why we need it**:
Checking the startup logs of kubelet, i can always find a error like this:
"E1215 10:19:24.891724 2752 cni.go:163] error updating cni config: No networks found in /etc/cni/net.d"
It will appears, neither i use cni network-plugin or not.
After analysis codes, i thought it should be a warn log, because it will not produce any actions like as exit or abort, and just ignored when not any valid plugins exit.
thank you!
Automatic merge from submit-queue (batch tested with PRs 41103, 41042, 41097, 40946, 40770)
Use Clientset interface in KubeletDeps
**What this PR does / why we need it**:
This replaces the Clientset struct with the equivalent interface for the KubeClient injected via KubeletDeps. This is useful for testing and for accessing the Node and Pod status event stream without an API server.
**Special notes for your reviewer**:
Follow up to #4907
**Release note**:
`NONE`
Automatic merge from submit-queue (batch tested with PRs 41103, 41042, 41097, 40946, 40770)
dockershim: set security option separators based on the docker version
Also add a version cache to avoid hitting the docker daemon frequently.
This is part of #38164
Automatic merge from submit-queue (batch tested with PRs 40971, 41027, 40709, 40903, 39369)
Set docker opt separator correctly for SELinux options
This is based on @pmorie's commit from #40179
Automatic merge from submit-queue (batch tested with PRs 40930, 40951)
Fix CRI port forwarding
Websocket support was introduced #33684, which broke the CRI
implementation. This change fixes it.
Automatic merge from submit-queue (batch tested with PRs 40289, 40877, 40879, 39972, 40942)
Rename experimental-cgroups-per-pod flag
**What this PR does / why we need it**:
1. Rename `experimental-cgroups-per-qos` to `cgroups-per-qos`
1. Update hack/local-up-cluster to match `CGROUP_DRIVER` with docker runtime if used.
**Special notes for your reviewer**:
We plan to roll this feature out in the upcoming release. Previous node e2e runs were running with this feature on by default. We will default this feature on for all e2es next week.
**Release note**:
```release-note
Rename --experiemental-cgroups-per-qos to --cgroups-per-qos
```
Automatic merge from submit-queue
CRI: Handle cri in-place upgrade
Fixes https://github.com/kubernetes/kubernetes/issues/40051.
## How does this PR restart/remove legacy containers/sandboxes?
With this PR, dockershim will convert and return legacy containers and infra containers as regular containers/sandboxes. Then we can rely on the SyncPod logic to stop the legacy containers/sandboxes, and the garbage collector to remove the legacy containers/sandboxes.
To forcibly trigger restart:
* For infra containers, we manually set `hostNetwork` to opposite value to trigger a restart (See [here](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kuberuntime/kuberuntime_manager.go#L389))
* For application containers, they will be restarted with the infra container.
## How does this PR avoid extra overhead when there is no legacy container/sandbox?
For the lack of some labels, listing legacy containers needs extra `docker ps`. We should not introduce constant performance regression for legacy container cleanup. So we added the `legacyCleanupFlag`:
* In `ListContainers` and `ListPodSandbox`, only do extra `ListLegacyContainers` and `ListLegacyPodSandbox` when `legacyCleanupFlag` is `NotDone`.
* When dockershim starts, it will check whether there are legacy containers/sandboxes.
* If there are none, it will mark `legacyCleanupFlag` as `Done`.
* If there are any, it will leave `legacyCleanupFlag` as `NotDone`, and start a goroutine periodically check whether legacy cleanup is done.
This makes sure that there is overhead only when there are legacy containers/sandboxes not cleaned up yet.
## Caveats
* In-place upgrade will cause kubelet to restart all running containers.
* RestartNever container will not be restarted.
* Garbage collector sometimes keep the legacy containers for a long time if there aren't too many containers on the node. In that case, dockershim will keep performing extra `docker ps` which introduces overhead.
* Manually remove all legacy containers will fix this.
* Should we garbage collect legacy containers/sandboxes in dockershim by ourselves? /cc @yujuhong
* Host port will not be reclaimed for the lack of checkpoint for legacy sandboxes. https://github.com/kubernetes/kubernetes/pull/39903 /cc @freehan
/cc @yujuhong @feiskyer @dchen1107 @kubernetes/sig-node-api-reviews
**Release note**:
```release-note
We should mention the caveats of in-place upgrade in release note.
```
Automatic merge from submit-queue
Optionally avoid evicting critical pods in kubelet
For #40573
```release-note
When feature gate "ExperimentalCriticalPodAnnotation" is set, Kubelet will avoid evicting pods in "kube-system" namespace that contains a special annotation - `scheduler.alpha.kubernetes.io/critical-pod`
This feature should be used in conjunction with the rescheduler to guarantee availability for critical system pods - https://kubernetes.io/docs/admin/rescheduler/
```
Automatic merge from submit-queue (batch tested with PRs 40795, 40863)
Use caching secret manager in kubelet
I just found that this is in my local branch I'm using for testing, but not in master :)
Automatic merge from submit-queue
Add websocket support for port forwarding
#32880
**Release note**:
```release-note
Port forwarding can forward over websockets or SPDY.
```
- adjust ports to int32
- CRI flows the websocket ports as query params
- Do not validate ports since the protocol is unknown
SPDY flows the ports as headers and websockets uses query params
- Only flow query params if there is at least one port query param
Automatic merge from submit-queue
securitycontext: move docker-specific logic into kubelet/dockertools
This change moves the code specific to docker to kubelet/dockertools,
while leaving the common utility functions at its current package
(pkg/securitycontext).
When we deprecate dockertools in the future, the code will be moved to
pkg/kubelet/dockershim instead.
Depending on an exact cluster setup multiple dns may make sense.
Comma-seperated lists of DNS server are quite common as DNS servers
are always plain IPs.
- split out port forwarding into its own package
Allow multiple port forwarding ports
- Make it easy to determine which port is tied to which channel
- odd channels are for data
- even channels are for errors
- allow comma separated ports to specify multiple ports
Add portfowardtester 1.2 to whitelist
Automatic merge from submit-queue (batch tested with PRs 40638, 40742, 40710, 40718, 40763)
move client/record
An attempt at moving client/record to client-go. It's proving very stubborn and needs a lot manual intervention and near as I can tell, no one actually gets any benefit from the sink and source complexity it adds.
@sttts @caesarchaoxu
Automatic merge from submit-queue
kuberuntime: remove the kubernetesManagedLabel label
The CRI shim should be responsible for returning only those
containers/sandboxes created through CRI. Remove this label in kubelet.
Automatic merge from submit-queue (batch tested with PRs 40527, 40738, 39366, 40609, 40748)
pkg/kubelet/dockertools/docker_manager.go: removing unused stuff
This PR removes unused constants and variables. I checked that neither kubernetes nor openshift code aren't using them.
Automatic merge from submit-queue (batch tested with PRs 40392, 39242, 40579, 40628, 40713)
optimize podSandboxChanged() function and fix some function notes
Automatic merge from submit-queue (batch tested with PRs 38443, 40145, 40701, 40682)
fix GetVolumeInUse() function
Since we just want to get volume name info, each volume name just need to added once. desiredStateOfWorld.GetVolumesToMount() will return volume and pod binding info,
if one volume is mounted to several pods, the volume name will be return several times. That is not what we want in this function.
We can add a new function to only get the volume name info or judge whether the volume name is added to the desiredVolumesMap array.
This change moves the code specific to docker to kubelet/dockertools,
while leaving the common utility functions at its current package
(pkg/securitycontext).
When we deprecate dockertools in the future, the code will be moved to
pkg/kubelet/dockershim instead.
Automatic merge from submit-queue (batch tested with PRs 40126, 40565, 38777, 40564, 40572)
docker-CRI: Remove legacy code for non-grpc integration
A minor cleanup to remove the code that is no longer in use to simplify the logic.
Automatic merge from submit-queue (batch tested with PRs 40126, 40565, 38777, 40564, 40572)
Add IsContainerNotFound in kube_docker_client
This PR added `IsContainerNotFound` function in kube_docker_client and changed dockershim to use it.
@yujuhong @freehan
Automatic merge from submit-queue (batch tested with PRs 40046, 40073, 40547, 40534, 40249)
fix a typo in cni log
**What this PR does / why we need it**:
fixes a typo s/unintialized/uninitialized in pkg/kubelet/network/cni/cni.go
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 40239, 40397, 40449, 40448, 40360)
CRI: Work around container create conflict.
Fixes https://github.com/kubernetes/kubernetes/issues/40443.
This PR added a random suffix in the container name when we:
* Failed to create the container because of "Conflict".
* And failed to remove the container because of "No such container".
@yujuhong @feiskyer
/cc @kubernetes/sig-node-bugs
Automatic merge from submit-queue (batch tested with PRs 40239, 40397, 40449, 40448, 40360)
move the discovery and dynamic clients
Moved the dynamic client, discovery client, testing/core, and testing/cache to `client-go`. Dependencies on api groups we don't have generated clients for have dropped out, so federation, kubeadm, and imagepolicy.
@caesarxuchao @sttts
approved based on https://github.com/kubernetes/kubernetes/issues/40363
Automatic merge from submit-queue (batch tested with PRs 40239, 40397, 40449, 40448, 40360)
CRI: use more gogoprotobuf plugins
Generate marshaler/unmarshaler code should help improve the performance.
This addresses #40098
Automatic merge from submit-queue
Delay deletion of pod from the API server until volumes are deleted
Depends on #37228, and will not pass tests until that PR is merged, and this is rebased.
Keeps all kubelet behavior the same, except the kubelet will not make the "Delete" call (kubeClient.Core().Pods(pod.Namespace).Delete(pod.Name, deleteOptions)) until the volumes associated with that pod are removed. I will perform some performance testing so that we better understand the latency impact of this change.
Is kubelet_pods.go the correct file to include the "when can I delete this pod" logic?
cc: @vishh @sjenning @derekwaynecarr
Automatic merge from submit-queue (batch tested with PRs 38739, 40480, 40495, 40172, 40393)
Use fnv hash in the CRI implementation
fnv is more stable than adler. This PR changes CRI implementation to
use fnv for generating container hashes, but leaving the old
implementation (dockertools/rkt). This is because hash is what kubelet
uses to identify a container -- changes to the hash will cause kubelet
to restart existing containers. This is ok for CRI implementation (which
requires a disruptive upgrade already), but not for older implementations.
#40140
Automatic merge from submit-queue (batch tested with PRs 39538, 40188, 40357, 38214, 40195)
Use SecretManager when getting secrets for EnvFrom
Merges crossed in the night which missed this needed change.
Leave the old implementation (dockertools/rkt) untouched so that
containers will not be restarted during kubelet upgrade. For CRI
implementation (kuberuntime), container restart is required for kubelet
upgrade.
Automatic merge from submit-queue
Move remaining *Options to metav1
Primarily delete options, but will remove all internal references to non-metav1 options (except ListOptions).
Still working through it @sttts @deads2k
Automatic merge from submit-queue (batch tested with PRs 39275, 40327, 37264)
dockertools: remove some dead code
Remove `dockerRoot` that's not used anywhere.
Automatic merge from submit-queue
Fix bad time values in kubelet FakeRuntimeService
These values don't affect tests but they can be confusing
for developers looking at the code for reference.
Automatic merge from submit-queue
Optional configmaps and secrets
Allow configmaps and secrets for environment variables and volume sources to be optional
Implements approved proposal c9f881b7bb
Release note:
```release-note
Volumes and environment variables populated from ConfigMap and Secret objects can now tolerate the named source object or specific keys being missing, by adding `optional: true` to the volume or environment variable source specifications.
```
Enforce the following limits:
12kb for total message length in container status
4kb for the termination message path file
2kb or 80 lines (whichever is shorter) from the log on error
Fallback to log output if the user requests it.
Automatic merge from submit-queue
Remove TODOs to refactor kubelet labels
To address #39650 completely.
Remove label refactoring TODOs, we don't need them since CRI rollout is on the way.
Automatic merge from submit-queue (batch tested with PRs 40250, 40134, 40210)
Typo fix: Change logging function to formatting version
**What this PR does / why we need it**:
Slightly broken logging message:
```
I0120 10:56:08.555712 7575 kubelet_node_status.go:135] Deleted old node object %qkubernetes-cit-kubernetes-cr0-0
```
Automatic merge from submit-queue (batch tested with PRs 40232, 40235, 40237, 40240)
move listers out of cache to reduce import tree
Moving the listers from `pkg/client/cache` snips links to all the different API groups from `pkg/storage`, but the dreaded `ListOptions` remains.
@sttts
Automatic merge from submit-queue (batch tested with PRs 37228, 40146, 40075, 38789, 40189)
Cleanup temp dirs
So funny story my /tmp ran out of space running the unit tests so I am cleaning up all the temp dirs we create.
Automatic merge from submit-queue (batch tested with PRs 37228, 40146, 40075, 38789, 40189)
kubelet: storage: teardown terminated pod volumes
This is a continuation of the work done in https://github.com/kubernetes/kubernetes/pull/36779
There really is no reason to keep volumes for terminated pods attached on the node. This PR extends the removal of volumes on the node from memory-backed (the current policy) to all volumes.
@pmorie raised a concern an impact debugging volume related issues if terminated pod volumes are removed. To address this issue, the PR adds a `--keep-terminated-pod-volumes` flag the kubelet and sets it for `hack/local-up-cluster.sh`.
For consideration in 1.6.
Fixes#35406
@derekwaynecarr @vishh @dashpole
```release-note
kubelet tears down pod volumes on pod termination rather than pod deletion
```
Automatic merge from submit-queue (batch tested with PRs 40011, 40159)
dockertools/nsenterexec: fix err shadow
The shadow of err meant the combination of `exec-handler=nsenter` +
`tty` + a non-zero exit code meant that the exit code would be LOST
FOREVER 👻
This isn't all that important since no one really used the nsenter exec
handler as I understand it
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 36693, 40154, 40170, 39033)
make client-go authoritative for pkg/client/restclient
Moves client/restclient to client-go and a util/certs, util/testing as transitives.
Automatic merge from submit-queue (batch tested with PRs 40168, 40165, 39158, 39966, 40190)
dockershim: add support for the 'nsenter' exec handler
This change simply plumbs the kubelet configuration
(--docker-exec-handler) to DockerService.
This fixes#35747.
Automatic merge from submit-queue (batch tested with PRs 40168, 40165, 39158, 39966, 40190)
CRI: upgrade protobuf to v3
For #38854, this PR upgrades CRI protobuf version to v3, and also updated related packages for confirming to new api.
**Release note**:
```
CRI: upgrade protobuf version to v3.
```
The shadow of err meant the combination of `exec-handler=nsenter` +
`tty` + a non-zero exit code meant that the exit code would be LOST
FOREVER 👻
This isn't all that important since no one really used the nsenter exec
handler as I understand it
Automatic merge from submit-queue (batch tested with PRs 39446, 40023, 36853)
Create environment variables from secrets
Allow environment variables to be populated from entire secrets.
**Release note**:
```release-note
Populate environment variables from a secrets.
```
Automatic merge from submit-queue
promote certificates api to beta
Mostly posting to see what breaks but also this API is ready to be promoted.
```release-note
Promote certificates.k8s.io to beta and enable it by default. Users using the alpha certificates API should delete v1alpha1 CSRs from the API before upgrading and recreate them as v1beta1 CSR after upgrading.
```
@kubernetes/api-approvers @jcbsmpsn @pipejakob
Automatic merge from submit-queue
move pkg/fields to apimachinery
Purely mechanical move of `pkg/fields` to apimachinery.
Discussed with @lavalamp on slack. Moving this an `labels` to apimachinery.
@liggitt any concerns? I think the idea of field selection should become generic and this ends up shared between client and server, so this is a more logical location.
Automatic merge from submit-queue
make client-go more authoritative
Builds on https://github.com/kubernetes/kubernetes/pull/40103
This moves a few more support package to client-go for origination.
1. restclient/watch - nodep
1. util/flowcontrol - used interface
1. util/integer, util/clock - used in controllers and in support of util/flowcontrol
Automatic merge from submit-queue
Fixed merging of host's and dns' search lines
Fixed forming of pod's Search line in resolv.conf:
- exclude duplicates while merging of host's and dns' search lines to form pod's one
- truncate pod's search line if it exceeds resolver limits: is > 255 chars and containes > 6 searches
- monitoring the resolv.conf file which is used by kubelet (set thru --resolv-conf="") and logging and eventing if search line in it consists of more than 3 entries (or 6 if Cluster Domain is set) or its lenght is > 255 chars
- logging and eventing when a pod's search line is > 255 chars or containes > 6 searches during forming
Fixes#29270
**Release note**:
```release-note
Fixed forming resolver search line for pods: exclude duplicates, obey libc limitations, logging and eventing appropriately.
```
Automatic merge from submit-queue
Curating Owners: pkg/kubelet
cc @euank @vishh @dchen1107 @feiskyer @yujuhong @yifan-gu @derekwaynecarr @saad-ali
In an effort to expand the existing pool of reviewers and establish a
two-tiered review process (first someone lgtms and then someone
experienced in the project approves), we are adding new reviewers to
existing owners files.
If You Care About the Process:
------------------------------
We did this by algorithmically figuring out who’s contributed code to
the project and in what directories. Unfortunately, that doesn’t work
well: people that have made mechanical code changes (e.g change the
copyright header across all directories) end up as reviewers in lots of
places.
Instead of using pure commit data, we generated an excessively large
list of reviewers and pruned based on all time commit data, recent
commit data and review data (number of PRs commented on).
At this point we have a decent list of reviewers, but it needs one last
pass for fine tuning.
Also, see https://github.com/kubernetes/contrib/issues/1389.
TLDR:
-----
As an owner of a sig/directory and a leader of the project, here’s what
we need from you:
1. Use PR https://github.com/kubernetes/kubernetes/pull/35715 as an example.
2. The pull-request is made editable, please edit the `OWNERS` file to
remove the names of people that shouldn't be reviewing code in the
future in the **reviewers** section. You probably do NOT need to modify
the **approvers** section. Names asre sorted by relevance, using some
secret statistics.
3. Notify me if you want some OWNERS file to be removed. Being an
approver or reviewer of a parent directory makes you a reviewer/approver
of the subdirectories too, so not all OWNERS files may be necessary.
4. Please use ALIAS if you want to use the same list of people over and
over again (don't hesitate to ask me for help, or use the pull-request
above as an example)
Automatic merge from submit-queue
Made tracing of calls and container lifecycle steps in FakeDockerClient optional
Fixes#39717
Slightly refactored the FakeDockerClient code and made tracing optional (but enabled by default).
@yujuhong @Random-Liu
- exclude duplicates while merging of host's and dns' search lines to form pod's one
- truncate pod's search line if it exceeds resolver limits: is > 255 chars and containes > 6 searches
- monitoring the resolv.conf file which is used by kubelet (set thru --resolv-conf="") and logging and eventing if search line in it consists of more than 3 entries
(or 6 if Cluster Domain is set) or its lenght is > 255 chars
- logging and eventing when a pod's search line is > 255 chars or containes > 6 searches during forming
Fixes#29270
Automatic merge from submit-queue
Report the Pod name and namespace when kubelet fails to sync the container
This helps debugging problems with SELinux (and other problems related to the Docker failed to run the container) as currently only the UUID of the Pod is reported:
```
Error syncing pod 670f607d-b5a8-11a4-b673-005056b7468b, skipping: failed to "StartContainer" for "deployment" with RunContainerError: "runContainer: Error response from daemon: Relabeling content in /usr is not allowed."
```
Here it would be useful to know what pod in which namespace is trying to mount the "/usr".
Automatic merge from submit-queue (batch tested with PRs 39417, 39679)
Fix 2 `sucessfully` typos
**What this PR does / why we need it**: Only fixes two typos in comments/logging
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue
replace global registry in apimachinery with global registry in k8s.io/kubernetes
We'd like to remove all globals, but our immediate problem is that a shared registry between k8s.io/kubernetes and k8s.io/client-go doesn't work. Since client-go makes a copy, we can actually keep a global registry with other globals in pkg/api for now.
@kubernetes/sig-api-machinery-misc @lavalamp @smarterclayton @sttts
Automatic merge from submit-queue
break from the for loop
**What this PR does / why we need it**:
exit loop, because the following actions will not affect the result
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue
Fix cadvisor_unsupported.go build tags
Make it so cadvisor_unsupported.go is used for linux without cgo or
non-linux/windows OSes.
Automatic merge from submit-queue
kubelet: remove the pleg health check from healthz
This prevents kubelet from being killed when docker hangs.
Also, kubelet will report node not ready if PLEG hangs (`docker ps` + `docker inspect`).
Automatic merge from submit-queue (batch tested with PRs 39803, 39698, 39537, 39478)
[scheduling] Moved pod affinity and anti-affinity from annotations to api fields #25319
Converted pod affinity and anti-affinity from annotations to api fields
Related: #25319
Related: #34508
**Release note**:
```Pod affinity and anti-affinity has moved from annotations to api fields in the pod spec. Pod affinity or anti-affinity that is defined in the annotations will be ignored.```
Automatic merge from submit-queue
[CRI] Don't include user data in CRI streaming redirect URLs
Fixes: https://github.com/kubernetes/kubernetes/issues/36187
Avoid userdata in the redirect URLs by caching the {Exec,Attach,PortForward}Requests with a unique token. When the redirect URL is created, the token is substituted for the request params. When the streaming server receives the token request, the token is used to fetch the actual request parameters out of the cache.
For additional security, the token is generated using the secure random function, is single use (i.e. the first request with the token consumes it), and has a short expiration time.
/cc @kubernetes/sig-node
Automatic merge from submit-queue
Fix kubelet cross build
**What this PR does / why we need it**: Cross builds are not passing for MacOS and Windows. We are expecting Windows binaries for `kubelet` and `kube-proxy` to be released by the first time with 1.5.2 to be released later today.
**Which issue this PR fixes**:
fixes#39005fixes#39714
**Special notes for your reviewer**: /cc @feiskyer @smarterclayton @vishh this should be P0 in order to be merged before 1.5.2 and obviously fix the cross build.
Automatic merge from submit-queue (batch tested with PRs 39475, 38666, 39327, 38396, 39613)
Create k8s.io/apimachinery repo
Don't panic.
The diff is quite large, but its all generated change. The first few commits are where are all the action is. I built a script to find the fanout from
```
k8s.io/kubernetes/pkg/apimachinery/registered
k8s.io/kubernetes/pkg/runtime/serializer
k8s.io/kubernetes/pkg/runtime/serializer/yaml
k8s.io/kubernetes/pkg/runtime/serializer/streaming
k8s.io/kubernetes/pkg/runtime/serializer/recognizer/testing
```
It copied
```
k8s.io/kubernetes/pkg/api/meta
k8s.io/kubernetes/pkg/apimachinery
k8s.io/kubernetes/pkg/apimachinery/registered
k8s.io/kubernetes/pkg/apis/meta/v1
k8s.io/kubernetes/pkg/apis/meta/v1/unstructured
k8s.io/kubernetes/pkg/conversion
k8s.io/kubernetes/pkg/conversion/queryparams
k8s.io/kubernetes/pkg/genericapiserver/openapi/common - this needs to renamed post-merge. It's just types
k8s.io/kubernetes/pkg/labels
k8s.io/kubernetes/pkg/runtime
k8s.io/kubernetes/pkg/runtime/schema
k8s.io/kubernetes/pkg/runtime/serializer
k8s.io/kubernetes/pkg/runtime/serializer/json
k8s.io/kubernetes/pkg/runtime/serializer/protobuf
k8s.io/kubernetes/pkg/runtime/serializer/recognizer
k8s.io/kubernetes/pkg/runtime/serializer/recognizer/testing
k8s.io/kubernetes/pkg/runtime/serializer/streaming
k8s.io/kubernetes/pkg/runtime/serializer/versioning
k8s.io/kubernetes/pkg/runtime/serializer/yaml
k8s.io/kubernetes/pkg/selection
k8s.io/kubernetes/pkg/types
k8s.io/kubernetes/pkg/util/diff
k8s.io/kubernetes/pkg/util/errors
k8s.io/kubernetes/pkg/util/framer
k8s.io/kubernetes/pkg/util/json
k8s.io/kubernetes/pkg/util/net
k8s.io/kubernetes/pkg/util/runtime
k8s.io/kubernetes/pkg/util/sets
k8s.io/kubernetes/pkg/util/validation
k8s.io/kubernetes/pkg/util/validation/field
k8s.io/kubernetes/pkg/util/wait
k8s.io/kubernetes/pkg/util/yaml
k8s.io/kubernetes/pkg/watch
k8s.io/kubernetes/third_party/forked/golang/reflect
```
The script does the import rewriting and gofmt. Then you do a build, codegen, bazel update, and it produces all the updates.
If we agree this is the correct approach. I'll create a verify script to make sure that no one messes with any files in the "dead" packages above.
@kubernetes/sig-api-machinery-misc @smarterclayton @sttts @lavalamp @caesarxuchao
`staging/prime-apimachinery.sh && hack/update-codegen.sh && nice make WHAT="federation/cmd/federation-apiserver/ cmd/kube-apiserver" && hack/update-openapi-spec.sh && hack/update-federation-openapi-spec.sh && hack/update-codecgen.sh && hack/update-codegen.sh && hack/update-generated-protobuf.sh && hack/update-bazel.sh`
Automatic merge from submit-queue (batch tested with PRs 39684, 39577, 38989, 39534, 39702)
kubelet: request client auth certificates from certificate API.
This fixes kubeadm and --experiment-kubelet-bootstrap.
cc @liggitt
Automatic merge from submit-queue (batch tested with PRs 39684, 39577, 38989, 39534, 39702)
Set PodStatus QOSClass field
This PR continues the work for https://github.com/kubernetes/kubernetes/pull/37968
It converts all local usage of the `qos` package class types to the new API level types (first commit) and sets the pod status QOSClass field in the at pod creation time on the API server in `PrepareForCreate` and in the kubelet in the pod status update path (second commit). This way the pod QOS class is set even if the pod isn't scheduled yet.
Fixes#33255
@ConnorDoyle @derekwaynecarr @vishh
Automatic merge from submit-queue (batch tested with PRs 38212, 38792, 39641, 36390, 39005)
Set MemorySwap to zero on Windows
Fixes https://github.com/kubernetes/kubernetes/issues/39003
@dchen1107 @michmike @kubernetes/sig-node-misc
Automatic merge from submit-queue (batch tested with PRs 38212, 38792, 39641, 36390, 39005)
Generate a kubelet CA and kube-apiserver cert-pair for kubelet auth.
cc @cjcullen
Automatic merge from submit-queue (batch tested with PRs 39486, 37288, 39477, 39455, 39542)
Revert "Small improve for GetContainerOOMScoreAdjust"
Reverts kubernetes/kubernetes#39306
This does not help current code healthy, let's revert it to avoid further confusing.
Automatic merge from submit-queue (batch tested with PRs 39493, 39496)
kubelet: fix nil deref in volume type check
An attempt to address memory exhaustion through a build up of terminated pods with memory backed volumes on the node in PR https://github.com/kubernetes/kubernetes/pull/36779 introduced this.
For the `VolumeSpec`, either the `Volume` or `PersistentVolume` field is set, not both. This results in a situation where there is a nil deref on PVs. Since PVs are inherently not memory-backend, only local/temporal volumes should be considered.
This needs to go into 1.5 as well.
Fixes#39480
@saad-ali @derekwaynecarr @grosskur @gnufied
```release-note
fixes nil dereference when doing a volume type check on persistent volumes
```
Automatic merge from submit-queue
Start moving genericapiserver to staging
This moves `pkg/auth/user` to `staging/k8s.io/genericapiserver/pkg/authentication/user`. I'll open a separate pull into the upstream gengo to support using `import-boss` on vendored folders to support staging.
After we agree this is the correct approach and see everything build, I'll start moving other packages over which don't have k8s.io/kubernetes deps.
@kubernetes/sig-api-machinery-misc @lavalamp
@sttts @caesarxuchao ptal
Automatic merge from submit-queue (batch tested with PRs 38084, 39306)
Small improve for GetContainerOOMScoreAdjust
In `GetContainerOOMScoreAdjust`, make logic more clear for the case `oomScoreAdjust >= besteffortOOMScoreAdj`. If `besteffortOOMScoreAdj` is defined to another value(e.g. 996), suppose `oomScoreAdjust` is 999, the function will return 998(which equals 999 - 1) instead of 995(996 -1).
Automatic merge from submit-queue (batch tested with PRs 38433, 36245)
Allow pods to define multiple environment variables from a whole ConfigMap
Allow environment variables to be populated from ConfigMaps
- ConfigMaps represent an entire set of EnvVars
- EnvVars can override ConfigMaps
fixes#26299
Automatic merge from submit-queue (batch tested with PRs 39280, 37350, 39389, 39390, 39313)
delete meaningless judgments
What this PR does / why we need it:
Whether "err" is nil or not, "err" can be return, so the judgment "err !=nil " is unnecessary
Automatic merge from submit-queue (batch tested with PRs 39001, 39104, 35978, 39361, 39273)
delete SetNodeStatus() function and fix some function notes words
Since we just want to get volume name info, each volume name just need to added once. desiredStateOfWorld.GetVolumesToMount() will return volume and pod binding info,
if one volume is mounted to several pods, the volume name will be return several times. That is not what we want in this function.
We can add a new function to only get the volume name info or judge whether the volume name is added to the desiredVolumesMap array.
drop SetNodeStatus() Since it is never called now. klet.defaultNodeStatusFuncs() is set to klet.setNodeStatusFuncs now and setNodeStatus() function is called by other functions.
Automatic merge from submit-queue
Kubelet: add image ref to ImageService interfaces
This PR adds image ref (digest or ID, depending on runtime) to PullImage result, and pass image ref in CreateContainer instead of image name. It also
* Adds image ref to CRI's PullImageResponse
* Updates related image puller
* Updates related testing utilities
~~One remaining issue is: it breaks some e2e tests because they [checks image repoTags](https://github.com/kubernetes/kubernetes/blob/master/test/e2e/framework/util.go#L1941) while docker always returns digest in this PR. Should we update e2e test or continue to return repoTags in `containerStatuses.image`?~~
Fixes#38833.
Automatic merge from submit-queue (batch tested with PRs 39307, 39300)
kubenet: define KubenetPluginName for all platforms
This PR moved KubenetPluginName to a general file for all platforms.
Fixes#39299.
cc/ @yifan-gu @freehan
Automatic merge from submit-queue
dockertools: don't test linux-specific cases on OSX
There are a few test cases in dockertools are linux-specific. This PR moves them to docker_manager_linux_test.go
Fixes#39183.
Automatic merge from submit-queue (batch tested with PRs 39053, 36446)
CRI: clarify purpose of annotations
Add language to make it explicit that annotations are not to be altered
by runtimes, and should only be used for features that are opaque to the
Kubernetes APIs. Unfortunately there are currently exceptions
introduced in [1][1], but this change makes it clear that they are to be
changed and that no more such semantic-affecting annotations should be
introduced.
In the spirit of the discussion and conclusion in [2][2].
Also captures the link between the annotations returned by various
status queries and those supplied in associated configs.
[1]: https://github.com/kubernetes/kubernetes/pull/34819
[2]: https://github.com/kubernetes/kubernetes/issues/30819#issuecomment-253369441
Automatic merge from submit-queue
Refactor operation_executor to make it testable
**What this PR does / why we need it**:
To refactor operation_executor to make it unit testable
**Release note**:
`NONE`
Add language to make it explicit that annotations are not to be altered
by runtimes, and should only be used for features that are opaque to the
Kubernetes APIs. Unfortunately there are currently exceptions
introduced in [1][1], but this change makes it clear that they are to be
changed and that no more such semantic-affecting annotations should be
introduced.
In the spirit of the discussion and conclusion in [2][2].
Also captures the link between the annotations returned by various
status queries and those supplied in associated configs.
[1]: https://github.com/kubernetes/kubernetes/pull/34819
[2]: https://github.com/kubernetes/kubernetes/issues/30819#issuecomment-253369441
Automatic merge from submit-queue (batch tested with PRs 39079, 38991, 38673)
Support systemd based pod qos in CRI dockershim
This PR makes pod level QoS works for CRI dockershim for systemd based cgroups. And will also fix#36807
- [x] Add cgroupDriver to dockerService and use docker info api to set value for it
- [x] Add a NOTE that detection only works for docker 1.11+, see [CHANGE LOG](https://github.com/docker/docker/blob/master/CHANGELOG.md#1110-2016-04-13)
- [x] Generate cgroupParent in syntax expected by cgroupDriver
- [x] Set cgroupParent to hostConfig for both sandbox and user container
- [x] Check if kubelet conflicts with cgroup driver of docker
cc @derekwaynecarr @vishh
Automatic merge from submit-queue (batch tested with PRs 36888, 38180, 38855, 38590)
wrong pod reference in error message for volume attach timeout
**What this PR does / why we need it**:
when a disk mount times out you get the following error:
```
Warning FailedSync Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "nginx"/"default". list of unattached/unmounted volumes=[data]
```
where the pod is referenced by "podname"/"namespace", but should be "namespace"/"podname".
**Which issue this PR fixes**
no issue number
**Special notes for your reviewer**:
untested :(
Automatic merge from submit-queue
Admit critical pods in the kubelet
Haven't verified in a live cluster yet, just unittested, so applying do-not-merge label.
Automatic merge from submit-queue
Migrated fluentd addon to daemon set
fix#23224
supersedes #23306
``` release-note
Migrated fluentd addon to daemon set
```
Automatic merge from submit-queue (batch tested with PRs 38154, 38502)
Wrong comment to describe docker version
The original comment about minimal docker version fo `room_score_adj` is wrong (though the code is right).
Really sorry for misleading :/
Automatic merge from submit-queue (batch tested with PRs 38154, 38502)
Rename "release_1_5" clientset to just "clientset"
We used to keep multiple releases in the main repo. Now that [client-go](https://github.com/kubernetes/client-go) does the versioning, there is no need to keep releases in the main repo. This PR renames the "release_1_5" clientset to just "clientset", clientset development will be done in this directory.
@kubernetes/sig-api-machinery @deads2k
```release-note
The main repository does not keep multiple releases of clientsets anymore. Please find previous releases at https://github.com/kubernetes/client-go
```
Automatic merge from submit-queue (batch tested with PRs 38689, 38743, 38734, 38430)
apply sandbox network mode based on network plugin
This allows CRI to use docker's network bridge. Can be combined with noop network plugin. This allows to use docker0 with no further configuration. Good for tools like minikube/hyperkube.
Automatic merge from submit-queue
Refactor remotecommand options parsing
Prerequisite to https://github.com/kubernetes/kubernetes/issues/36187 - This separates the options from the request, so they can be pulled from elsewhere.
/cc @liggitt
Automatic merge from submit-queue (batch tested with PRs 38727, 38726, 38347, 38348)
Add 'privileged' to sandbox to indicate if any container might be privileged in it, document privileged
Right now, the privileged flag is this magic thing which does "whatever Docker does". This documents it to make it a little less magic.
In addition, due to how rkt uses `systemd-nspawn` as an outer layer of isolation in creating the sandbox, it's helpful to know beforehand whether the pod will be privileged so additional security options can be applied earlier / applied at all.
I suspect the same indication will be useful for userns since userns should also occur at the pod layer, but it's possible that will be a separate/additional field.
cc @lucab @jonboulle @yujuhong @feiskyer @kubernetes/sig-node
```release-note
NONE
```
Automatic merge from submit-queue
CRI: fix ImageStatus comment
**What this PR does / why we need it**:
GRPC cannot encode `nil` (CRI-O itself panic while trying to encode `nil` for `ImageStatus`). This PR fixes `ImageStatus` comment to say that when the image does not exist the call returns a response having `Image` set to `nil` (instead of saying implementors should return `nil` directly).
/cc @mrunalp @vishh @feiskyer
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
Automatic merge from submit-queue (batch tested with PRs 38419, 38457, 38607)
Fix pod level QoS does not works on CRI dockershim
Fixes: https://github.com/kubernetes/kubernetes/issues/38458
We did set `CgroupParent ` in `CreateContainer`, but the `HostConfig.Resources` which `CgroupParent` belongs to is override by the following code:
```
hc.CgroupParent = lc.GetCgroupParent()
...
hc.Resources = dockercontainer.Resources{
Memory: rOpts.GetMemoryLimitInBytes(),
...
}
```
That's why `HostConfig.CgroupParent` is always empty and pod level QoS does not work.
Automatic merge from submit-queue (batch tested with PRs 38453, 36672, 38629, 34966, 38630)
Fix threshold notifier build tags
Fix threshold notifier build tags so the linux version is only built if cgo is
enabled, and the unsupported version is built if it's either not linux or not
cgo.
Fix threshold notifier build tags so the linux version is only built if cgo is
enabled, and the unsupported version is built if it's either not linux or not
cgo.
Automatic merge from submit-queue (batch tested with PRs 38432, 36887, 38415)
Add --image-pull-stuck-timeout option to kubelet
In this PR, add --image-pull-stuck-time option to specify the stuck timeout for pulling image.
When docker extracts image layer, there is no progress. The progress will exceed 1m if the layer is big or system is busy. It happend in our cluster, so I add above option to specify the timeout.
Related error log:
<pre>
[... kube_docker_client.go:29] Cancel pulling image "our_registry/demo/test" because of no progress for 1m0s, latest progress "c914ad57d670": Extracting [==================>] 513.5 MB/513.5MB"
[... docker_manager.go:2254] container start failed: ErrImagePull: net/http: request canceled
</pre>
Automatic merge from submit-queue (batch tested with PRs 36419, 38330, 37718, 38244, 38375)
Kubelet: Add image cache.
Fixes#38373.
This should be patched into 1.5.1 to solve the customer issue.
@yujuhong
/cc @kubernetes/sig-node
Adding the `privileged` bool to the sandbox allows runtimes, like rkt,
to make better security choices in some cases.
This also enumerates what "privileged" actually means and how it
interacts with other options (or more accurately, does not).
The documentation closely matches docker's current behavior because, so
far, that's what privileged has meant.
Automatic merge from submit-queue (batch tested with PRs 38318, 38258)
kernel memcg notification enabled via experimental flag
Kubelet integrates with kernel memcg notification API if and only if enabled via experimental flag.
Automatic merge from submit-queue
add a configuration for kubelet to register as a node with taints
and deprecate --register-schedulable
ref #28687#29178
cc @dchen1107 @davidopp @roberthbailey
Automatic merge from submit-queue (batch tested with PRs 37208, 37446, 37420)
Kubelet log modification
Keep in line with the other error logs in the function.
After return, the caller records the error log.Delete redundant logs
We should use:
nsenter --net=netnsPath -- -F some_command
instend of:
nsenter -n netnsPath -- -F some_command
Because "nsenter -n netnsPath" get an error output:
# nsenter -n /proc/67197/ns/net ip addr
nsenter: neither filename nor target pid supplied for ns/net
If we really want use -n, we need to use -n in such format:
# sudo nsenter -n/proc/67197/ns/net ip addr
Automatic merge from submit-queue
Clean up redundant tests in image_manager_test
There was a lot of overlap between parallel and serialized puller tests,
extracted most of these tests internals to separate functions.
Automatic merge from submit-queue
Function annotation modification
“return kl.pleg.Healthy()”,Based on the return function,"healty" to "healthy" better
Automatic merge from submit-queue
Keep host port socket open for kubenet
fixes#37087
**Release note**:
<!-- Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access)
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`.
-->
```release-note
NONE
```
When cni is set to kubenet, kubelet should hold the host port socket,
so that other application in this node could not listen/bind this port
any more. However, the sockets are closed accidentally, because
kubelet forget to reconcile the protocol format before comparing.
@kubernetes/sig-network
Automatic merge from submit-queue
RunnningContainerStatues spelling mistake
runtime.go:in the function GetRunningContainerStatuses, runnningContainerStatues spelling mistake, modified into runningContainerStatus
Automatic merge from submit-queue
kubelet: don't reject pods without adding them to the pod manager
kubelet relies on the pod manager as a cache of the pods in the apiserver (and
other sources) . The cache should be kept up-to-date even when rejecting pods.
Without this, kubelet may decide at any point to drop the status update
(request to the apiserver) for the rejected pod since it would think the pod no
longer exists in the apiserver.
This should fix#37658
Automatic merge from submit-queue
remove checking mount point in cleanupOrphanedPodDirs
To avoid nfs hung problem, remove the mountpoint checking code in
cleanupOrphanedPodDirs(). This removal should still be safe because it checks whether there are still directories under pod's volume and if so, do not delete the pod directory.
Note: After removing the mountpoint check code in cleanupOrphanedPodDirs(), the directories might not be cleaned up in such situation.
1. delete pod, kubelet reconciler tries to unmount the volume directory successfully
2. before reconciler tries to delete the volume directory, kubelet gets retarted
3. since under pod directory, there are still volume directors exist (but not mounted), cleanupOrphanedPodDIrs() will not clean them up.
Will work on a follow up PR to solve above issue.
kubelet relies on the pod manager as a cache of the pods in the apiserver (and
other sources) . The cache should be kept up-to-date even when rejecting pods.
Without this, kubelet may decide at any point to drop the status update
(request to the apiserver) for the rejected pod since it would think the pod no
longer exists in the apiserver.
Also check if the pod to-be-admitted has terminated or not. In the case where
it has terminated, skip the admission process completely.
Automatic merge from submit-queue
Fix package aliases to follow golang convention
Some package aliases are not not align with golang convention https://blog.golang.org/package-names. This PR fixes them. Also adds a verify script and presubmit checks.
Fixes#35070.
cc/ @timstclair @Random-Liu
When cni is set to kubenet, kubelet should hold the host port socket,
so that other application in this node could not listen/bind this port
any more. However, the sockets are closed accidentally, because
kubelet forget to reconcile the protocol format before comparing.
Automatic merge from submit-queue
kubelet: eviction: add memcg threshold notifier to improve eviction responsiveness
This PR adds the ability for the eviction code to get immediate notification from the kernel when the available memory in the root cgroup falls below a user defined threshold, controlled by setting the `memory.available` siginal with the `--eviction-hard` flag.
This PR by itself, doesn't change anything as the frequency at which new stats can be obtained is currently controlled by the cadvisor housekeeping interval. That being the case, the call to `synchronize()` by the notification loop will very likely get stale stats and not act any more quickly than it does now.
However, whenever cadvisor does get on-demand stat gathering ability, this will improve eviction responsiveness by getting async notification of the root cgroup memory state rather than relying on polling cadvisor.
@vishh @derekwaynecarr @kubernetes/rh-cluster-infra
Automatic merge from submit-queue
fix leaking memory backed volumes of terminated pods
Currently, we allow volumes to remain mounted on the node, even though the pod is terminated. This creates a vector for a malicious user to exhaust memory on the node by creating memory backed volumes containing large files.
This PR removes memory backed volumes (emptyDir w/ medium Memory, secrets, configmaps) of terminated pods from the node.
@saad-ali @derekwaynecarr
Automatic merge from submit-queue
Fix hostname truncate.
Fixes https://github.com/kubernetes/kubernetes/issues/36951.
This PR will keep truncating the hostname until the ending character is valid.
/cc @kubernetes/sig-node
Mark v1.5 because this is a bug fix.
/cc @saad-ali
We have observed that, after failing to create a container due to "device or
resource busy", docker may end up having inconsistent internal state. One
symptom is that docker will not report the existence of the "failed to create"
container, but if kubelet tries to create a new container with the same name,
docker will error out with a naming conflict message.
To work around this, this commit parses the creation error message and if there
is a naming conflict, it would attempt to remove the existing container.
Automatic merge from submit-queue
CRI: add docs for sysctls
#34830 adds `sysctls` features in CRI, it is based on sandbox annotations, this PR adds docs for it.
@yujuhong @timstclair @jonboulle
Automatic merge from submit-queue
CRI: Clarify User in CRI.
Addressed https://github.com/kubernetes/kubernetes/pull/36423#issuecomment-259343135.
This PR clarifies the user related fields in CRI.
One question is that:
What is the meaning of the `run_as_user` field in `LinuxSandboxSecurityContext`?
* **Is it user on the host?** Then it doesn't make sense, user shouldn't care about what users are on the host.
* **Is it user inside the infra container image?** This is how the field is currently used. However, Infra container is docker specific, I'm not sure whether we should expose this in CRI.
* **Is it the default user inside the pod?** It tells runtime that if there is a container (infra container, or some other helper containers like streaming container etc.), if their `user` is not specified, use the default "sandbox user". Then how can we guarantee that infra or helper container image have the `user`?
* **It doesn't make sense?** If we remove it, we are relying on the shim to set right user (maybe always root) for infra or helper containers (if there will be any in the future), I'm not sure whether this is what we expect.
@yujuhong @feiskyer @jonboulle @yifan-gu
/cc @kubernetes/sig-node
Automatic merge from submit-queue
[kubelet] rename --cgroups-per-qos to --experimental-cgroups-per-qos
This reflects the true nature of "cgroups per qos" feature.
```release-note
* Rename `--cgroups-per-qos` to `--experimental-cgroups-per-qos` in Kubelet
```
Automatic merge from submit-queue
fix issue in reconstruct volume data when kubelet restarts
During state reconstruction when kubelet restarts, outerVolueSpecName
cannot be recovered by scanning the disk directories. But this
information is used by volume manager to check whether pod's volume is
mounted or not. There are two possible cases:
1. pod is not deleted during kubelet restarts so that desired state
should have the information. reconciler.updateState() will use this
inforamtion to update.
2. pod is deleted during this period, reconciler has to use
InnerVolumeSpecName, but it should be ok since this information will not
be used for volume cleanup (umount)
Automatic merge from submit-queue
CRI: general grammar/spelling/consistency cleanup
No semantic changes, but a lot of shuffling of docstrings to make things
more consistent. In particular, standardise on the zeroth-article (i.e.
prefer `// Version` to `// The version`) and ending all docstrings with
periods.
(This knowingly conflicts with #36446 and intentionally omits changing the
Annotations field - I'll rebase this or that respectively as necessary.)
During state reconstruction when kubelet restarts, outerVolueSpecName
cannot be recovered by scanning the disk directories. But this
information is used by volume manager to check whether pod's volume is
mounted or not. There are two possible cases:
1. pod is not deleted during kubelet restarts so that desired state
should have the information. reconciler.updateState() will use this
inforamtion to update.
2. pod is deleted during this period, reconciler has to use
InnerVolumeSpecName, but it should be ok since this information will not
be used for volume cleanup (umount)
Automatic merge from submit-queue
Fix getting cgroup pids
Fixes https://github.com/kubernetes/kubernetes/issues/35214, https://github.com/kubernetes/kubernetes/issues/33232
Verified manually, but I didn't have time to run all the e2e's yet (will check it in the morning).
This should be cherry-picked into 1.4, and merged into 1.5 (/cc @saad-ali )
```release-note
Fix fetching pids running in a cgroup, which caused problems with OOM score adjustments & setting the /system cgroup ("misc" in the summary API).
```
/cc @kubernetes/sig-node
Automatic merge from submit-queue
Add e2e node test for log path
fixes#34661
A node e2e test to check if container logs files are properly created with right content.
Since the log files under `/var/log/containers` are actually symbolic of docker containers log files, we can not use a pod to mount them in and do check (symbolic doesn't supported by docker volume).
cc @Random-Liu
Automatic merge from submit-queue
Use indirect streaming path for remote CRI shim
Last step for https://github.com/kubernetes/kubernetes/issues/29579
- Wire through the remote indirect streaming methods in the docker remote shim
- Add the docker streaming server as a handler at `<node>:10250/cri/{exec,attach,portforward}`
- Disable legacy streaming for dockershim
Note: This requires PR https://github.com/kubernetes/kubernetes/pull/34987 to work.
Tested manually on an E2E cluster.
/cc @euank @feiskyer @kubernetes/sig-node
Automatic merge from submit-queue
Cleanup kubelet eviction manager tests
It cleans up kubelet eviction manager tests
Extracted parts of tests that were similar to each other to functions
No semantic changes, but a lot of shuffling of docstrings to make things
more consistent. In particular, standardise on the zeroth-article (i.e.
prefer `// Version` to `// The version`) and ending all docstrings with
periods.
Provides an opt-in flag, --experimental-fail-swap-on (and corresponding
KubeletConfiguration value, ExperimentalFailSwapOn), which is false by default.
Automatic merge from submit-queue
CRI: Add security context for sandbox/container
Part of #29478. This PR
- adds security context for sandbox and fixes#33139
- encaps container security context to `SecurityContext` and adds missing features
- Note that capability is not fully accomplished in this PR because it is under discussion at #33614.
cc/ @yujuhong @yifan-gu @Random-Liu @kubernetes/sig-node
This allows us to interrupt/kill the executed command if it exceeds the
timeout (not implemented by this commit).
Set timeout in Exec probes. HTTPGet and TCPSocket probes respect the
timeout, while Exec probes used to ignore it.
Add e2e test for exec probe with timeout. However, the test is skipped
while the default exec handler doesn't support timeouts.
Automatic merge from submit-queue
kubelet bootstrap: start hostNetwork pods before we have PodCIDR
Network readiness was checked in the pod admission phase, but pods that
fail admission are not retried. Move the check to the pod start phase.
Issue #35409
Issue #35521
Automatic merge from submit-queue
CRI: rearrange kubelet rutnime initialization
Consolidate the code used by docker+cri and remote+cri for consistency, and to
prevent changing one without the other. Enforce that
`--experimental-runtime-integration-type` has to be set in order for kubelet
use the CRI interface, *even for out-of-process shims`. This simplifies the
temporary `if` logic in kubelet while CRI still co-exists with older logic.
Automatic merge from submit-queue
CRI: Add Status into CRI.
For https://github.com/kubernetes/kubernetes/issues/35701.
Fixes https://github.com/kubernetes/kubernetes/issues/35701.
This PR added a `Status` call in CRI, and the `RuntimeStatus` is defined as following:
``` protobuf
message RuntimeCondition {
// Type of runtime condition.
optional string type = 1;
// Status of the condition, one of true/false.
optional bool status = 2;
// Brief reason for the condition's last transition.
optional string reason = 3;
// Human readable message indicating details about last transition.
optional string message = 4;
}
message RuntimeStatus {
// Conditions is an array of current observed runtime conditions.
repeated RuntimeCondition conditions = 1;
}
```
Currently, only `conditions` is included in `RuntimeStatus`, and the definition is almost the same with `NodeCondition` and `PodCondition` in K8s api.
@yujuhong @feiskyer @bprashanth If this makes sense, I'll send a follow up PR to let dockershim return `RuntimeStatus` and let kubelet make use of it.
@yifan-gu @euank Does this make sense to rkt?
/cc @kubernetes/sig-node
Automatic merge from submit-queue
We only report diskpressure to users, and no longer report inodepressure
See #36180 for more information on why #33218 was reverted.
Automatic merge from submit-queue
CRI: stop sandbox before removing it
Stopping a sandbox includes reclaiming the network resources. By always
stopping the sandbox before removing it, we reduce the possibility of leaking
resources in some corner cases.
Automatic merge from submit-queue
Remove GetRootContext method from VolumeHost interface
Remove the `GetRootContext` call from the `VolumeHost` interface, since Kubernetes no longer needs to know the SELinux context of the Kubelet directory.
Per #33951 and #35127.
Depends on #33663; only the last commit is relevant to this PR.
Automatic merge from submit-queue
Initial work on running windows containers on Kubernetes
<!-- Thanks for sending a pull request! Here are some tips for you:
1. If this is your first time, read our contributor guidelines https://github.com/kubernetes/kubernetes/blob/master/CONTRIBUTING.md and developer guide https://github.com/kubernetes/kubernetes/blob/master/docs/devel/development.md
2. If you want *faster* PR reviews, read how: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/faster_reviews.md
3. Follow the instructions for writing a release note: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/pull-requests.md#release-notes
-->
This is the first stab at getting the Kubelet running on Windows (fixes#30279), and getting it to deploy network-accessible pods that consist of Windows containers. Thanks @csrwng, @jbhurat for helping out.
The main challenge with Windows containers at this point is that container networking is not supported. In other words, each container in the pod will get it's own IP address. For this reason, we had to make a couple of changes to the kubelet when it comes to setting the pod's IP in the Pod Status. Instead of using the infra-container's IP, we use the IP address of the first container.
Other approaches we investigated involved "disabling" the infra container, either conditionally on `runtime.GOOS` or having a separate windows-docker container runtime that re-implemented some of the methods (would require some refactoring to avoid maintainability nightmare).
Other changes:
- The default docker endpoint was removed. This results in the docker client using the default for the specific underlying OS.
More detailed documentation on how to setup the Windows kubelet can be found at https://docs.google.com/document/d/1IjwqpwuRdwcuWXuPSxP-uIz0eoJNfAJ9MWwfY20uH3Q.
cc: @ikester @brendandburns @jstarks
Automatic merge from submit-queue
Don't add duplicate Hostname address
If the cloudprovider returned an address of type Hostname, we shouldn't
add a duplicate one.
Fixes#36234
Automatic merge from submit-queue
Per Volume Inode Accounting
Collects volume inode stats using the same find command as cadvisor. The command is "find _path_ -xdev -printf '.' | wc -c". The output is passed to the summary api, and will be consumed by the eviction manager.
This cannot be merged yet, as it depends on changes adding the InodesUsed field to the summary api, and the eviction manager consuming this. Expect tests to fail until this happens.
DEPENDS ON #35137
Automatic merge from submit-queue
[AppArmor] Hold bad AppArmor pods in pending rather than rejecting
Fixes https://github.com/kubernetes/kubernetes/issues/32837
Overview of the fix:
If the Kubelet needs to reject a Pod for a reason that the control plane doesn't understand (e.g. which AppArmor profiles are installed on the node), then it might contiinuously try to run the pod on the same rejecting node. This change adds a concept of "soft rejection", in which the Pod is admitted, but not allowed to run (and therefore held in a pending state). This prevents the pod from being retried on other nodes, but also prevents the high churn. This is consistent with how other missing local resources (e.g. volumes) is handled.
A side effect of the change is that Pods which are not initially runnable will be retried. This is desired behavior since it avoids a race condition when a new node is brought up but the AppArmor profiles have not yet been loaded on it.
``` release-note
Pods with invalid AppArmor configurations will be held in a Pending state, rather than rejected (failed). Check the pod status message to find out why it is not running.
```
@kubernetes/sig-node @timothysc @rrati @davidopp
Consolidate the code used by docker+cri and remote+cri for consistency, and to
prevent changing one without the other. Enforce that
`--experimental-runtime-integration-type` has to be set in order for kubelet
use the CRI interface, *even for out-of-process shims`. This simplifies the
temporary `if` logic in kubelet while CRI still co-exists with older logic.
Automatic merge from submit-queue
Separate Direct and Indirect streaming paths, implement indirect path for CRI
This PR refactors the `pkg/kubelet/container.Runtime` interface to remove the `ExecInContainer`, `PortForward` and `AttachContainer` methods. Instead, those methods are part of the `DirectStreamingRuntime` interface which all "legacy" runtimes implement. I also added an `IndirectStreamingRuntime` which handles the redirect path and is implemented by CRI runtimes. To control the size of this PR, I did not fully setup the indirect streaming path for the dockershim, so I left legacy path behind.
Most of this PR is moving & renaming associated with the refactoring. To understand the functional changes, I suggest tracing the code from `getExec` in `pkg/kubelet/server/server.go`, which calls `GetExec` in `pkg/kubelet/kubelet_pods.go` to determine whether to follow the direct or indirect path.
For https://github.com/kubernetes/kubernetes/issues/29579
/cc @kubernetes/sig-node
Automatic merge from submit-queue
CRI: Add devices to ContainerConfig
This PR adds devices to ContainerConfig and adds experimental GPU support.
cc/ @yujuhong @Hui-Zhi @vishh @kubernetes/sig-node
Stopping a sandbox includes reclaiming the network resources. By always
stopping the sandbox before removing it, we reduce the possibility of leaking
resources in some corner cases.
Automatic merge from submit-queue
Populate Node.Status.Addresses with Hostname
This PR is supposed to address #22063
Currently `NodeName` has to be a resolvable dns address on the master to allow apiserver -> kubelet communication (exec, log, port-forward operations on a pod). In some situations this is unfortunate (see the discussions on the issue).
The PR aims to do the following:
- Populate the `Type: Hostname` in the `Node.Status.Addresses` array, the type is already defined, but was not used so far.
- Add logic to resolve a Node's Hostname when the apiserver initiates communication with the Kubelet, instead of using the Nodename string as Hostname.
```release-note
The hostname of the node (as autodetected by the kubelet, specified via --hostname-override, or determined by the cloudprovider) is now recorded as an address of type "Hostname" in the status of the Node API object. The hostname is expected to be resolveable from the apiserver.
```
Automatic merge from submit-queue
pod and qos level cgroup support
```release-note
[Kubelet] Add alpha support for `--cgroups-per-qos` using the configured `--cgroup-driver`. Disabled by default.
```
Automatic merge from submit-queue
CRI: Handle empty container name in dockershim.
Fixes https://github.com/kubernetes/kubernetes/issues/35924.
Dead container may have no name, we should handle this properly.
@yujuhong @bprashanth
Automatic merge from submit-queue
CRI: Add kuberuntime container logs
Based on https://github.com/kubernetes/kubernetes/pull/34858.
The first 2 commits are from #34858. And the last 2 commits are new.
This PR added kuberuntime container logs support and add unit test for it.
I've tested all the functions manually, and I'll send another PR to write a node e2e test for container log.
**_Notice: current implementation doesn't support log rotation**_, which means that:
- It will not retrieve logs in rotated log file.
- If log rotation happens when following the log:
- If the rotation is using create mode, we'll still follow the old file.
- If the rotation is using copytruncate, we'll be reading at the original position and get nothing.
To solve these issues, kubelet needs to rotate the log itself, or at least kubelet should be able to control the the behavior of log rotator. These are doable but out of the scope of 1.5 and will be addressed in future release.
@yujuhong @feiskyer @yifan-gu
/cc @kubernetes/sig-node
Automatic merge from submit-queue
CRI: Rename container/sandbox states
The enum constants are not namespaced. The shorter, unspecifc names are likely
to cause naming conflicts in the future.
Also replace "SandBox" with "Sandbox" in the API for consistency.
/cc @kubernetes/sig-node
The enum constants are not namespaced. The shorter, unspecifc names are likely
to cause naming conflicts in the future.
Also replace "SandBox" with "Sandbox" in the API.
This change add a container manager inside the dockershim to move docker daemon
and associated processes to a specified cgroup. The original kubelet container
manager will continue checking the name of the cgroup, so that kubelet know how
to report runtime stats.
Automatic merge from submit-queue
Eviction manager evicts based on inode consumption
Fixes: #32526 Integrate Cadvisor per-container inode stats into the summary api. Make the eviction manager act based on inode consumption to evict pods using the most inodes.
This PR is pending on a cadvisor godeps update which will be included in PR #35136
Automatic merge from submit-queue
Only set sysctls for infra containers
We did set the sysctls for each container in a pod. This opens up a way to set un-whitelisted sysctls during upgrade from v1.3:
- set annotation in v1.3 with an un-whitelisted sysctl. Set restartPolicy=Always
- upgrade cluster to v1.4
- kill container process
- un-whitelisted sysctl is set on restart of the killed container.
Automatic merge from submit-queue
SELinux Overhaul
Overhauls handling of SELinux in Kubernetes. TLDR: Kubelet dir no longer has to be labeled `svirt_sandbox_file_t`.
Fixes#33351 and #33510. Implements #33951.
Automatic merge from submit-queue
Implement streaming CRI methods in dockershim
*NOTE: Temporarily includes commit from https://github.com/kubernetes/kubernetes/pull/35330 - only review the second commit.*
Builds on https://github.com/kubernetes/kubernetes/pull/35330, using the library to implement the streaming methods in various CRI shims.
This does not actually wire up the new streaming methods in the kubelet (that will be my next PR). Once the new methods are wired up, I will delete the `Legacy{Exec,Attach,PortForward}` methods.
/cc @kubernetes/sig-node @feiskyer
Automatic merge from submit-queue
Simplify negotiation in server in preparation for multi version support
This is a pre-factor for #33900 to simplify runtime.NegotiatedSerializer, tighten up a few abstractions that may break when clients can request different client versions, and pave the way for better negotiation.
View this as pure simplification.
Automatic merge from submit-queue
Fix cadvisor_unsupported and the crossbuild
Resolves a bug in the `cadvisor_unsupported.go` code.
Fixes https://github.com/kubernetes/kubernetes/issues/35735
Introduced by: https://github.com/kubernetes/kubernetes/pull/35136
We should consider to cherrypick this as #35136 also was cherrypicked
cc @kubernetes/sig-testing @vishh @dashpole @jessfraz
```release-note
Fix cadvisor_unsupported and the crossbuild
```
Automatic merge from submit-queue
[PHASE 1] Opaque integer resource accounting.
## [PHASE 1] Opaque integer resource accounting.
This change provides a simple way to advertise some amount of arbitrary countable resource for a node in a Kubernetes cluster. Users can consume these resources by including them in pod specs, and the scheduler takes them into account when placing pods on nodes. See the example at the bottom of the PR description for more info.
Summary of changes:
- Defines opaque integer resources as any resource with prefix `pod.alpha.kubernetes.io/opaque-int-resource-`.
- Prevent kubelet from overwriting capacity.
- Handle opaque resources in scheduler.
- Validate integer-ness of opaque int quantities in API server.
- Tests for above.
Feature issue: https://github.com/kubernetes/features/issues/76
Design: http://goo.gl/IoKYP1
Issues:
kubernetes/kubernetes#28312kubernetes/kubernetes#19082
Related:
kubernetes/kubernetes#19080
CC @davidopp @timothysc @balajismaniam
**Release note**:
<!-- Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access)
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`.
-->
```release-note
Added support for accounting opaque integer resources.
Allows cluster operators to advertise new node-level resources that would be
otherwise unknown to Kubernetes. Users can consume these resources in pod
specs just like CPU and memory. The scheduler takes care of the resource
accounting so that no more than the available amount is simultaneously
allocated to pods.
```
## Usage example
```sh
$ echo '[{"op": "add", "path": "pod.alpha.kubernetes.io~1opaque-int-resource-bananas", "value": "555"}]' | \
> http PATCH http://localhost:8080/api/v1/nodes/localhost.localdomain/status \
> Content-Type:application/json-patch+json
```
```http
HTTP/1.1 200 OK
Content-Type: application/json
Date: Thu, 11 Aug 2016 16:44:55 GMT
Transfer-Encoding: chunked
{
"apiVersion": "v1",
"kind": "Node",
"metadata": {
"annotations": {
"volumes.kubernetes.io/controller-managed-attach-detach": "true"
},
"creationTimestamp": "2016-07-12T04:07:43Z",
"labels": {
"beta.kubernetes.io/arch": "amd64",
"beta.kubernetes.io/os": "linux",
"kubernetes.io/hostname": "localhost.localdomain"
},
"name": "localhost.localdomain",
"resourceVersion": "12837",
"selfLink": "/api/v1/nodes/localhost.localdomain/status",
"uid": "2ee9ea1c-47e6-11e6-9fb4-525400659b2e"
},
"spec": {
"externalID": "localhost.localdomain"
},
"status": {
"addresses": [
{
"address": "10.0.2.15",
"type": "LegacyHostIP"
},
{
"address": "10.0.2.15",
"type": "InternalIP"
}
],
"allocatable": {
"alpha.kubernetes.io/nvidia-gpu": "0",
"cpu": "2",
"memory": "8175808Ki",
"pods": "110"
},
"capacity": {
"alpha.kubernetes.io/nvidia-gpu": "0",
"pod.alpha.kubernetes.io/opaque-int-resource-bananas": "555",
"cpu": "2",
"memory": "8175808Ki",
"pods": "110"
},
"conditions": [
{
"lastHeartbeatTime": "2016-08-11T16:44:47Z",
"lastTransitionTime": "2016-07-12T04:07:43Z",
"message": "kubelet has sufficient disk space available",
"reason": "KubeletHasSufficientDisk",
"status": "False",
"type": "OutOfDisk"
},
{
"lastHeartbeatTime": "2016-08-11T16:44:47Z",
"lastTransitionTime": "2016-07-12T04:07:43Z",
"message": "kubelet has sufficient memory available",
"reason": "KubeletHasSufficientMemory",
"status": "False",
"type": "MemoryPressure"
},
{
"lastHeartbeatTime": "2016-08-11T16:44:47Z",
"lastTransitionTime": "2016-08-10T06:27:11Z",
"message": "kubelet is posting ready status",
"reason": "KubeletReady",
"status": "True",
"type": "Ready"
},
{
"lastHeartbeatTime": "2016-08-11T16:44:47Z",
"lastTransitionTime": "2016-08-10T06:27:01Z",
"message": "kubelet has no disk pressure",
"reason": "KubeletHasNoDiskPressure",
"status": "False",
"type": "DiskPressure"
}
],
"daemonEndpoints": {
"kubeletEndpoint": {
"Port": 10250
}
},
"images": [],
"nodeInfo": {
"architecture": "amd64",
"bootID": "1f7e95ca-a4c2-490e-8ca2-6621ae1eb5f0",
"containerRuntimeVersion": "docker://1.10.3",
"kernelVersion": "4.5.7-202.fc23.x86_64",
"kubeProxyVersion": "v1.3.0-alpha.4.4285+7e4b86c96110d3-dirty",
"kubeletVersion": "v1.3.0-alpha.4.4285+7e4b86c96110d3-dirty",
"machineID": "cac4063395254bc89d06af5d05322453",
"operatingSystem": "linux",
"osImage": "Fedora 23 (Cloud Edition)",
"systemUUID": "D6EE0782-5DEB-4465-B35D-E54190C5EE96"
}
}
}
```
After patching, the kubelet's next sync fills in allocatable:
```
$ kubectl get node localhost.localdomain -o json | jq .status.allocatable
```
```json
{
"alpha.kubernetes.io/nvidia-gpu": "0",
"pod.alpha.kubernetes.io/opaque-int-resource-bananas": "555",
"cpu": "2",
"memory": "8175808Ki",
"pods": "110"
}
```
Create two pods, one that needs a single banana and another that needs a truck load:
```
$ kubectl create -f chimp.yaml
$ kubectl create -f superchimp.yaml
```
Inspect the scheduler result and pod status:
```
$ kubectl describe pods chimp
Name: chimp
Namespace: default
Node: localhost.localdomain/10.0.2.15
Start Time: Thu, 11 Aug 2016 19:58:46 +0000
Labels: <none>
Status: Running
IP: 172.17.0.2
Controllers: <none>
Containers:
nginx:
Container ID: docker://46ff268f2f9217c59cc49f97cc4f0f085d5ac0e251f508cc08938601117c0cec
Image: nginx:1.10
Image ID: docker://sha256:82e97a2b0390a20107ab1310dea17f539ff6034438099384998fd91fc540b128
Port: 80/TCP
Limits:
cpu: 500m
memory: 64Mi
pod.alpha.kubernetes.io/opaque-int-resource-bananas: 3
Requests:
cpu: 250m
memory: 32Mi
pod.alpha.kubernetes.io/opaque-int-resource-bananas: 1
State: Running
Started: Thu, 11 Aug 2016 19:58:51 +0000
Ready: True
Restart Count: 0
Volume Mounts: <none>
Environment Variables: <none>
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
No volumes.
QoS Class: Burstable
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
9m 9m 1 {default-scheduler } Normal Scheduled Successfully assigned chimp to localhost.localdomain
9m 9m 2 {kubelet localhost.localdomain} Warning MissingClusterDNS kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to DNSDefault policy.
9m 9m 1 {kubelet localhost.localdomain} spec.containers{nginx} Normal Pulled Container image "nginx:1.10" already present on machine
9m 9m 1 {kubelet localhost.localdomain} spec.containers{nginx} Normal Created Created container with docker id 46ff268f2f92
9m 9m 1 {kubelet localhost.localdomain} spec.containers{nginx} Normal Started Started container with docker id 46ff268f2f92
```
```
$ kubectl describe pods superchimp
Name: superchimp
Namespace: default
Node: /
Labels: <none>
Status: Pending
IP:
Controllers: <none>
Containers:
nginx:
Image: nginx:1.10
Port: 80/TCP
Requests:
cpu: 250m
memory: 32Mi
pod.alpha.kubernetes.io/opaque-int-resource-bananas: 10Ki
Volume Mounts: <none>
Environment Variables: <none>
Conditions:
Type Status
PodScheduled False
No volumes.
QoS Class: Burstable
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
3m 1s 15 {default-scheduler } Warning FailedScheduling pod (superchimp) failed to fit in any node
fit failure on node (localhost.localdomain): Insufficient pod.alpha.kubernetes.io/opaque-int-resource-bananas
```
- Prevents kubelet from overwriting capacity during sync.
- Handles opaque integer resources in the scheduler.
- Adds scheduler predicate tests for opaque resources.
- Validates opaque int resources:
- Ensures supplied opaque int quantities in node capacity,
node allocatable, pod request and pod limit are integers.
- Adds tests for new validation logic (node update and pod spec).
- Added e2e tests for opaque integer resources.
Alter how runtime.SerializeInfo is represented to simplify negotiation
and reduce the need to allocate during negotiation. Simplify the dynamic
client's logic around negotiating type. Add more tests for media type
handling where necessary.
Automatic merge from submit-queue
First pass at CRI stream server library implementation
This is a first pass at implementing a library for serving attach/exec/portforward calls from a CRI shim process as discussed in [CRI Streaming Requests](https://docs.google.com/document/d/1OE_QoInPlVCK9rMAx9aybRmgFiVjHpJCHI9LrfdNM_s/edit#).
Remaining library work:
- implement authn/z
- implement `stayUp=false`, a.k.a. auto-stop the server once all connections are closed
/cc @kubernetes/sig-node
Automatic merge from submit-queue
Add sysctls for dockershim
This PR adds sysctls support for dockershim. All sysctls e2e tests are passed in my local settings.
Note that sysctls runtimeAdmit is not included in this PR, it is addressed in #32803.
cc/ @yujuhong @Random-Liu
Automatic merge from submit-queue
Fix devices information struct in container
So far nowhere use the ```Devices``` which in ```RunContainerOptions```. But when I want to use it, found that it could be better if change it, because Devices in container is like:
```json
"Devices": [
{
"PathOnHost": "/dev/nvidiactl",
"PathInContainer": "/dev/nvidiactl",
"CgroupPermissions": "mrw"
},
{
"PathOnHost": "/dev/nvidia-uvm",
"PathInContainer": "/dev/nvidia-uvm",
"CgroupPermissions": "mrw"
},
{
"PathOnHost": "/dev/nvidia0",
"PathInContainer": "/dev/nvidia0",
"CgroupPermissions": "mrw"
}
],
```
Automatic merge from submit-queue
CRI: Instrumented cri service
For https://github.com/kubernetes/kubernetes/issues/29478.
This PR added instrumented CRI service. Because we are adding the instrumented wrapper inside kuberuntime, it should work for both grpc and non-grpc integration.
This will be useful to compare latency difference between grpc and non-grpc integration, although there shouldn't be too much difference.
@yujuhong @feiskyer
/cc @kubernetes/sig-node
Automatic merge from submit-queue
Refactor PortForward server methods into the portforward package
Refactor PortForward code into it's own package so it can be reused in the CRI streaming library without pulling in lots of extra dependencies.
This is a straightforward move. Nothing is changed other than a few references to the package.
Automatic merge from submit-queue
Fix volume states out of sync problem after kubelet restarts
When kubelet restarts, all the information about the volumes will be
gone from actual/desired states. When update node status with mounted
volumes, the volume list might be empty although there are still volumes
are mounted and in turn causing master to detach those volumes since
they are not in the mounted volumes list. This fix is to make sure only
update mounted volumes list after reconciler starts sync states process.
This sync state process will scan the existing volume directories and
reconstruct actual states if they are missing.
This PR also fixes the problem during orphaned pods' directories. In
case of the pod directory is unmounted but has not yet deleted (e.g.,
interrupted with kubelet restarts), clean up routine will delete the
directory so that the pod directoriy could be cleaned up (it is safe to
delete directory since it is no longer mounted)
The third issue this PR fixes is that during reconstruct volume in
actual state, mounter could not be nil since it is required for creating
container.VolumeMap. If it is nil, it might cause nil pointer exception
in kubelet.
Detailed design proposal is #33203
Automatic merge from submit-queue
CRI: Add dockershim grpc server.
This PR adds a in-process grpc server for dockershim.
Flags change:
1. `container-runtime` will not be automatically set to remote when `container-runtime-endpoint` is set. @feiskyer
2. set kubelet flag `--experimental-runtime-integration-type=remote --container-runtime-endpoint=UNIX_SOCKET_FILE_PATH` to enable the in-process dockershim grpc server.
3. set node e2e test flag `--runtime-integration-type=remote -container-runtime-endpoint=UNIX_SOCKET_FILE_PATH` to run node e2e test against in-process dockershim grpc server.
I've run node e2e test against the remote cri integration, tests which don't rely on stream and log functions can pass.
This unblocks the following work:
1) CRI conformance test.
2) Performance comparison between in-process integration and in-process grpc integration.
@yujuhong @feiskyer
/cc @kubernetes/sig-node
When kubelet restarts, all the information about the volumes will be
gone from actual/desired states. When update node status with mounted
volumes, the volume list might be empty although there are still volumes
are mounted and in turn causing master to detach those volumes since
they are not in the mounted volumes list. This fix is to make sure only
update mounted volumes list after reconciler starts sync states process.
This sync state process will scan the existing volume directories and
reconstruct actual states if they are missing.
This PR also fixes the problem during orphaned pods' directories. In
case of the pod directory is unmounted but has not yet deleted (e.g.,
interrupted with kubelet restarts), clean up routine will delete the
directory so that the pod directoriy could be cleaned up (it is safe to
delete directory since it is no longer mounted)
The third issue this PR fixes is that during reconstruct volume in
actual state, mounter could not be nil since it is required for creating
container.VolumeMap. If it is nil, it might cause nil pointer exception
in kubelet.
Details are in proposal PR #33203
Automatic merge from submit-queue
Do not log stack trace for the error http.StatusBadRequest (400).
<!-- Thanks for sending a pull request! Here are some tips for you:
1. If this is your first time, read our contributor guidelines https://github.com/kubernetes/kubernetes/blob/master/CONTRIBUTING.md and developer guide https://github.com/kubernetes/kubernetes/blob/master/docs/devel/development.md
2. If you want *faster* PR reviews, read how: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/faster_reviews.md
3. Follow the instructions for writing a release note: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/pull-requests.md#release-notes
-->
**What this PR does / why we need it**:
This PR fixes an issue where stack trace is being logged in kubelet when the status http.StatusBadRequest occurs.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
<!-- Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access)
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`.
-->
```release-note
```
Automatic merge from submit-queue
Use the rawTerminal setting from the container itself
**What this PR does / why we need it**:
Checks whether the container is set for rawTerminal connection and uses the appropriate connection.
Prevents the output `Error from server: Unrecognized input header` when doing `kubectl run`.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
helps with case 1 in #28695, resolves#30159
**Special notes for your reviewer**:
**Release note**:
```
release-note-none
```
Automatic merge from submit-queue
CRI: Refactor kuberuntime unit test
Based on https://github.com/kubernetes/kubernetes/pull/34858
This PR:
1) Refactor the fake runtime service and some kuberuntime unit test.
2) Add better garbage collection unit test.
3) Fix init container unit test which isn't testing correctly. Some other unit tests may also need to be fixed.
4) Add pod log directory garbage collection unit test.
@feiskyer @yujuhong
/cc @kubernetes/sig-node
Automatic merge from submit-queue
Kubelet getting node from apiserver cache before update.
This is blocked on #35218 (however it's ready for review).
It seems to visibly reduce the apiserver metrics (and I didn't observe higher number of conflicts even in 2000-node kubemark).
Automatic merge from submit-queue
Create restclient interface
Refactoring of code to allow replace *restclient.RESTClient with any RESTClient implementation that implements restclient.RESTClientInterface interface.
Automatic merge from submit-queue
CRI: Handle container/sandbox restarts for pod with RestartPolicy == …
If all sandbox and containers are dead in a pod, and the restart policy is
"Never", kubelet should not try to recreate all of them.
Automatic merge from submit-queue
Return an empty network namespace path for exited infra containers
If the infra container has already terminated, `docker inspect` will report
pid 0. The path constructed using the pid to check the network namespace of
the process will be invalid. This commit changes docker to report an empty
path to stop kubenet from erroring out whenever TearDown is called on an
exited infra container.
This is not a fix for all the plugins, as some plugins may require the actual
network namespace to tear down properly.
If the infra container has already terminated, `docker inspect` will report
pid 0. The path constructed using the pid to check the network namespace of
the process will be invalid. This commit changes docker to report an empty
path to stop kubenet from erroring out whenever TearDown is called on an
exited infra container.
This is not a fix for all the plugins, as some plugins may require the actual
network namespace to tear down properly.
Automatic merge from submit-queue
rkt: Convert image name to be a valid acidentifier
**Release note**:
<!-- Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access)
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`.
-->
```release-note
Fix a bug under the rkt runtime whereby image-registries with ports would not be fetched from
```
This fixes a bug whereby an image reference that included a port was not
recognized after being downloaded, and so could not be run
This is the quick-and-simple fix. In the longer term, we'll want to refactor image logic a bit more to handle the many special cases that the current code does not, mostly related to library images on dockerhub.
/cc @yifan-gu @kubernetes/sig-rktnetes
Automatic merge from submit-queue
WIP: Remove the legacy networking mode
<!-- Thanks for sending a pull request! Here are some tips for you:
1. If this is your first time, read our contributor guidelines https://github.com/kubernetes/kubernetes/blob/master/CONTRIBUTING.md and developer guide https://github.com/kubernetes/kubernetes/blob/master/docs/devel/development.md
2. If you want *faster* PR reviews, read how: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/faster_reviews.md
3. Follow the instructions for writing a release note: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/pull-requests.md#release-notes
-->
**What this PR does / why we need it**:
Removes the deprecated configure-cbr0 flag and networking mode to avoid having untested and maybe unstable code in kubelet, see: #33789
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
fixes#30589fixes#31937
**Special notes for your reviewer**: There are a lot of deployments who rely on this networking mode. Not sure how we deal with that: force switch to kubenet or just delete the old deployment?
But please review the code changes first (the first commit)
**Release note**:
<!-- Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access)
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`.
-->
```release-note
Removed the deprecated kubelet --configure-cbr0 flag, and with that the "classic" networking mode as well
```
PTAL @kubernetes/sig-network @kubernetes/sig-node @mikedanese
Automatic merge from submit-queue
Remove static kubelet client, refactor ConnectionInfoGetter
Follow up to https://github.com/kubernetes/kubernetes/pull/33718
* Collapses the multi-valued return to a `ConnectionInfo` struct
* Removes the "raw" connection info method and interface, since it was only used in a single non-test location (by the "real" connection info method)
* Disentangles the node REST object from being a ConnectionInfoProvider itself by extracting an implementation of ConnectionInfoProvider that takes a node (using a provided NodeGetter) and determines ConnectionInfo
* Plumbs the KubeletClientConfig to the point where we construct the helper object that combines the config and the node lookup. I anticipate adding a preference order for choosing an address type in https://github.com/kubernetes/kubernetes/pull/34259
Automatic merge from submit-queue
kubelet: storage: don't hang kubelet on unresponsive nfs
Fixes#31272
Currently, due to the nature of nfs, an unresponsive nfs volume in a pod can wedge the kubelet such that additional pods can not be run.
The discussion thus far surrounding this issue was to wrap the `lstat`, the syscall that ends up hanging in uninterruptible sleep, in a goroutine and limiting the number of goroutines that hang to one per-pod per-volume.
However, in my investigation, I found that the callsites that request a listing of the volumes from a particular volume plugin directory don't care anything about the properties provided by the `lstat` call. They only care about whether or not a directory exists.
Given that constraint, this PR just avoids the `lstat` call by using `Readdirnames()` instead of `ReadDir()` or `ReadDirNoExit()`
### More detail for reviewers
Consider the pod mounted nfs volume at `/var/lib/kubelet/pods/881341b5-9551-11e6-af4c-fa163e815edd/volumes/kubernetes.io~nfs/myvol`. The kubelet wedges because when we do a `ReadDir()` or `ReadDirNoExit()` it calls `syscall.Lstat` on `myvol` which requires communication with the nfs server. If the nfs server is unreachable, this call hangs forever.
However, for our code, we only care what about the names of files/directory contained in `kubernetes.io~nfs` directory, not any of the more detailed information the `Lstat` call provides. Getting the names can be done with `Readdirnames()`, which doesn't need to involve the nfs server.
@pmorie @eparis @ncdc @derekwaynecarr @saad-ali @thockin @vishh @kubernetes/rh-cluster-infra
Automatic merge from submit-queue
Fix edge case in qos evaluation
If a pod has a container C1 and C2, where sum(C1.requests, C2.requests) equals (C1.Limits), the code was reporting that the pod had "Guaranteed" qos, when it should have been Burstable.
/cc @vishh @dchen1107
Automatic merge from submit-queue
Log more information on pod status updates
Also bump the logging level to V2 so that we can see them in a non-test
cluster.
Automatic merge from submit-queue
add UpdateRuntimeConfig interface
Expose UpdateRuntimeConfig interface in RuntimeService for kubelet to pass a set of configurations to runtime. Currently it only takes PodCIDR.
The use case is for kubelet to pass configs to runtime. Kubelet holds some config/information which runtime does not have, such as PodCIDR. I expect some of kubelet configurations will gradually move to runtime, but I believe cases like PodCIDR, which dynamically assigned by k8s master, need to stay for a while.
Automatic merge from submit-queue
Allow kuberuntime to get network namespace for not ready sandboxes
Kubelet calls TearDownPod to clean up the network resources for a pod sandbox.
TearDownPod relies on GetNetNS to retrieve network namespace, and the current
implementation makes this impossible for not-ready sandboxes. This change
removes the unnecessary filter to fix this issue.
Kubelet calls TearDownPod to clean up the network resources for a pod sandbox.
TearDownPod relies on GetNetNS to retrieve network namespace, and the current
implementation makes this impossible for not-ready sandboxes. This change
removes the unnecessary filter to fix this issue.
Automatic merge from submit-queue
CRI: Image pullable support in dockershim
For #33189.
The new test `ImageID should be set to the manifest digest (from RepoDigests) when available` introduced in #33014 is failing, because:
1) `docker-pullable://` conversion is not supported in dockershim;
2) `kuberuntime` and `dockershim` is using `ListImages with image name filter` to check whether image presents. However, `ListImages` doesn't support filter with `digest`.
This PR:
1) Change `kuberuntime.IsImagePresent` to use `runtime.ImageStatus` and `dockershim.InspectImage` instead. ***Notice an API change: `ImageStatus` should return `(nil, nil)` for non-existing image.***
2) Add `docker-pullable://` support.
3) Fix `RemoveImage` in dockershim https://github.com/kubernetes/kubernetes/pull/29316.
I've tried myself, the test can pass now.
@yujuhong @feiskyer @yifan-gu
/cc @kubernetes/sig-node
Automatic merge from submit-queue
Add version cache for cri APIVersion
ref https://github.com/kubernetes/kubernetes/issues/29478
1. Added a version cache for `APIVersion()` by using object cache., with ttl=1 min
2. Leaving `Version()` as it is today