Automatic merge from submit-queue
Add websocket support for port forwarding
#32880
**Release note**:
```release-note
Port forwarding can forward over websockets or SPDY.
```
- adjust ports to int32
- CRI flows the websocket ports as query params
- Do not validate ports since the protocol is unknown
SPDY flows the ports as headers, while websockets use query params
- Only flow query params if there is at least one port query param
Automatic merge from submit-queue
securitycontext: move docker-specific logic into kubelet/dockertools
This change moves the code specific to docker to kubelet/dockertools,
while leaving the common utility functions at its current package
(pkg/securitycontext).
When we deprecate dockertools in the future, the code will be moved to
pkg/kubelet/dockershim instead.
Depending on the exact cluster setup, multiple DNS servers may make sense.
Comma-separated lists of DNS servers are quite common, as DNS servers
are always plain IPs.
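A minimal sketch of parsing such a comma-separated list in Go. The `parseDNSServers` helper is hypothetical; the commit does not show the actual flag parsing code:

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// parseDNSServers splits a list such as "10.0.0.10,10.0.0.11" into IPs.
// Because DNS servers are plain IPs, a comma never appears inside an entry.
func parseDNSServers(list string) ([]net.IP, error) {
	var servers []net.IP
	for _, s := range strings.Split(list, ",") {
		s = strings.TrimSpace(s)
		if s == "" {
			continue // tolerate trailing or doubled commas
		}
		ip := net.ParseIP(s)
		if ip == nil {
			return nil, fmt.Errorf("invalid DNS server IP %q", s)
		}
		servers = append(servers, ip)
	}
	return servers, nil
}

func main() {
	servers, err := parseDNSServers("10.0.0.10, 10.0.0.11")
	fmt.Println(servers, err) // [10.0.0.10 10.0.0.11] <nil>
}
```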
- split out port forwarding into its own package
Allow multiple port forwarding ports
- Make it easy to determine which port is tied to which channel
- odd channels are for data
- even channels are for errors
- allow comma separated ports to specify multiple ports
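The comma-separated form can be expanded with a few lines of Go. This is a sketch with a hypothetical `splitPorts` helper, not the code from the commit:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// splitPorts expands each value, which may itself be a comma-separated
// list like "8080,9090", into individual int32 ports.
func splitPorts(values []string) ([]int32, error) {
	var ports []int32
	for _, v := range values {
		for _, s := range strings.Split(v, ",") {
			p, err := strconv.ParseInt(strings.TrimSpace(s), 10, 32)
			if err != nil {
				return nil, fmt.Errorf("invalid port %q: %v", s, err)
			}
			ports = append(ports, int32(p))
		}
	}
	return ports, nil
}

func main() {
	ports, err := splitPorts([]string{"8080,9090", "5000"})
	fmt.Println(ports, err) // [8080 9090 5000] <nil>
}
```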
Add portforwardtester 1.2 to whitelist
Automatic merge from submit-queue (batch tested with PRs 40638, 40742, 40710, 40718, 40763)
move client/record
An attempt at moving client/record to client-go. It's proving very stubborn and needs a lot of manual intervention, and as near as I can tell, no one actually gets any benefit from the sink and source complexity it adds.
@sttts @caesarxuchao
Automatic merge from submit-queue
kuberuntime: remove the kubernetesManagedLabel label
The CRI shim should be responsible for returning only those
containers/sandboxes created through CRI. Remove this label in kubelet.
Automatic merge from submit-queue (batch tested with PRs 40527, 40738, 39366, 40609, 40748)
pkg/kubelet/dockertools/docker_manager.go: removing unused stuff
This PR removes unused constants and variables. I checked that neither the Kubernetes nor the OpenShift code uses them.
Automatic merge from submit-queue (batch tested with PRs 40392, 39242, 40579, 40628, 40713)
optimize podSandboxChanged() function and fix some function notes
Automatic merge from submit-queue (batch tested with PRs 38443, 40145, 40701, 40682)
fix GetVolumeInUse() function
Since we only want the volume names, each volume name needs to be added only once. desiredStateOfWorld.GetVolumesToMount() returns volume-to-pod bindings,
so if one volume is mounted into several pods, its name is returned several times. That is not what we want in this function.
We can either add a new function that returns only the volume names, or check whether the volume name has already been added to desiredVolumesMap.
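The second option, checking whether a name was already added, amounts to deduplicating with a set. A minimal Go sketch; `volumeToMount` is a hypothetical stand-in for the binding type returned by `GetVolumesToMount()`:

```go
package main

import (
	"fmt"
	"sort"
)

// volumeToMount is a hypothetical stand-in for one volume/pod binding.
type volumeToMount struct {
	VolumeName string
	PodName    string
}

// uniqueVolumeNames returns each volume name once, even when the volume
// is mounted into several pods and therefore appears several times.
func uniqueVolumeNames(bindings []volumeToMount) []string {
	seen := make(map[string]bool)
	var names []string
	for _, b := range bindings {
		if !seen[b.VolumeName] {
			seen[b.VolumeName] = true
			names = append(names, b.VolumeName)
		}
	}
	sort.Strings(names) // deterministic order for callers and tests
	return names
}

func main() {
	bindings := []volumeToMount{
		{"vol-a", "pod-1"}, {"vol-a", "pod-2"}, {"vol-b", "pod-2"},
	}
	fmt.Println(uniqueVolumeNames(bindings)) // [vol-a vol-b]
}
```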
Automatic merge from submit-queue (batch tested with PRs 40126, 40565, 38777, 40564, 40572)
docker-CRI: Remove legacy code for non-grpc integration
A minor cleanup to remove the code that is no longer in use to simplify the logic.
Automatic merge from submit-queue (batch tested with PRs 40126, 40565, 38777, 40564, 40572)
Add IsContainerNotFound in kube_docker_client
This PR added `IsContainerNotFound` function in kube_docker_client and changed dockershim to use it.
@yujuhong @freehan
Automatic merge from submit-queue (batch tested with PRs 40046, 40073, 40547, 40534, 40249)
fix a typo in cni log
**What this PR does / why we need it**:
fixes a typo s/unintialized/uninitialized in pkg/kubelet/network/cni/cni.go
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 40239, 40397, 40449, 40448, 40360)
CRI: Work around container create conflict.
Fixes https://github.com/kubernetes/kubernetes/issues/40443.
This PR added a random suffix in the container name when we:
* Failed to create the container because of "Conflict".
* And failed to remove the container because of "No such container".
@yujuhong @feiskyer
/cc @kubernetes/sig-node-bugs
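The retry decision can be sketched in Go as follows. The sentinel errors, the `_` separator, the 8-character suffix, and its alphabet are all assumptions made for illustration:

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

// Hypothetical sentinel errors standing in for the Docker daemon's
// "Conflict" and "No such container" responses.
var (
	errConflict        = errors.New("conflict")
	errNoSuchContainer = errors.New("no such container")
)

const suffixChars = "bcdfghjklmnpqrstvwxz2456789"

// randomSuffix returns n random characters used to disambiguate names.
func randomSuffix(n int) string {
	b := make([]byte, n)
	for i := range b {
		b[i] = suffixChars[rand.Intn(len(suffixChars))]
	}
	return string(b)
}

// nameForRetry falls back to a randomized name only when the create hit a
// name conflict AND removing the conflicting container found nothing.
func nameForRetry(name string, createErr, removeErr error) (string, bool) {
	if errors.Is(createErr, errConflict) && errors.Is(removeErr, errNoSuchContainer) {
		return fmt.Sprintf("%s_%s", name, randomSuffix(8)), true
	}
	return name, false
}

func main() {
	// Prints the original name plus a random 8-character suffix, then true.
	fmt.Println(nameForRetry("k8s_app_pod", errConflict, errNoSuchContainer))
}
```

Requiring both conditions keeps the suffix as a last resort, so a container that can still be removed normally keeps its deterministic name.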
Automatic merge from submit-queue (batch tested with PRs 40239, 40397, 40449, 40448, 40360)
move the discovery and dynamic clients
Moved the dynamic client, discovery client, testing/core, and testing/cache to `client-go`. Dependencies on api groups we don't have generated clients for have dropped out, so federation, kubeadm, and imagepolicy.
@caesarxuchao @sttts
approved based on https://github.com/kubernetes/kubernetes/issues/40363
Automatic merge from submit-queue (batch tested with PRs 40239, 40397, 40449, 40448, 40360)
CRI: use more gogoprotobuf plugins
Generating marshaler/unmarshaler code should help improve performance.
This addresses #40098
Automatic merge from submit-queue
Delay deletion of pod from the API server until volumes are deleted
Depends on #37228, and will not pass tests until that PR is merged, and this is rebased.
Keeps all kubelet behavior the same, except the kubelet will not make the "Delete" call (kubeClient.Core().Pods(pod.Namespace).Delete(pod.Name, deleteOptions)) until the volumes associated with that pod are removed. I will perform some performance testing so that we better understand the latency impact of this change.
Is kubelet_pods.go the correct file to include the "when can I delete this pod" logic?
cc: @vishh @sjenning @derekwaynecarr
Automatic merge from submit-queue (batch tested with PRs 38739, 40480, 40495, 40172, 40393)
Use fnv hash in the CRI implementation
fnv is more stable than adler. This PR changes the CRI implementation to
use fnv for generating container hashes, while leaving the old
implementation (dockertools/rkt) unchanged. This is because the hash is what
the kubelet uses to identify a container: changes to the hash will cause the
kubelet to restart existing containers. This is OK for the CRI implementation
(which already requires a disruptive upgrade), but not for older implementations.
#40140
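A minimal sketch of the two hash functions using Go's standard library. Hashing a serialized spec string is a simplification assumed here; the real implementation hashes the container object. Because the algorithms generally produce different values for the same input, switching from one to the other makes the kubelet see every existing container as changed:

```go
package main

import (
	"fmt"
	"hash/adler32"
	"hash/fnv"
)

// containerHash hashes a serialized container spec with 32-bit FNV-1a.
func containerHash(spec string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(spec)) // hash.Hash writes never return an error
	return h.Sum32()
}

// legacyHash is the adler32 equivalent, shown for comparison only.
func legacyHash(spec string) uint32 {
	return adler32.Checksum([]byte(spec))
}

func main() {
	spec := `{"name":"nginx","image":"nginx:1.11"}`
	fmt.Printf("fnv=%08x adler32=%08x\n", containerHash(spec), legacyHash(spec))
}
```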
Automatic merge from submit-queue (batch tested with PRs 39538, 40188, 40357, 38214, 40195)
Use SecretManager when getting secrets for EnvFrom
Merges crossed in the night which missed this needed change.
Leave the old implementation (dockertools/rkt) untouched so that
containers will not be restarted during a kubelet upgrade. For the CRI
implementation (kuberuntime), a container restart is required for a
kubelet upgrade.
Automatic merge from submit-queue
Move remaining *Options to metav1
Primarily delete options, but will remove all internal references to non-metav1 options (except ListOptions).
Still working through it @sttts @deads2k
Automatic merge from submit-queue (batch tested with PRs 39275, 40327, 37264)
dockertools: remove some dead code
Remove `dockerRoot` that's not used anywhere.
Automatic merge from submit-queue
Fix bad time values in kubelet FakeRuntimeService
These values don't affect tests but they can be confusing
for developers looking at the code for reference.
Automatic merge from submit-queue
Optional configmaps and secrets
Allow configmaps and secrets for environment variables and volume sources to be optional
Implements approved proposal c9f881b7bb
Release note:
```release-note
Volumes and environment variables populated from ConfigMap and Secret objects can now tolerate the named source object or specific keys being missing, by adding `optional: true` to the volume or environment variable source specifications.
```
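The optional semantics can be sketched in Go: a missing object or key is an error unless the source is marked optional, in which case the variable is simply not set. The `envSource` type and `resolveEnv` helper are hypothetical simplifications of the API types:

```go
package main

import "fmt"

// envSource is a simplified view of an env var that reads one key
// from a ConfigMap or Secret.
type envSource struct {
	EnvName  string
	Object   string // name of the ConfigMap/Secret
	Key      string
	Optional bool // mirrors `optional: true` in the pod spec
}

// resolveEnv looks up each source in objects (object name -> key -> value).
func resolveEnv(sources []envSource, objects map[string]map[string]string) (map[string]string, error) {
	env := make(map[string]string)
	for _, s := range sources {
		data, found := objects[s.Object]
		var value string
		if found {
			value, found = data[s.Key]
		}
		if !found {
			if s.Optional {
				continue // tolerated: leave the variable unset
			}
			return nil, fmt.Errorf("%s: key %q in %q not found", s.EnvName, s.Key, s.Object)
		}
		env[s.EnvName] = value
	}
	return env, nil
}

func main() {
	objects := map[string]map[string]string{"app-config": {"mode": "prod"}}
	env, err := resolveEnv([]envSource{
		{EnvName: "MODE", Object: "app-config", Key: "mode"},
		{EnvName: "DEBUG", Object: "missing", Key: "debug", Optional: true},
	}, objects)
	fmt.Println(env, err) // map[MODE:prod] <nil>
}
```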
Enforce the following limits:
- 12kb for the total message length in container status
- 4kb for the termination message path file
- 2kb or 80 lines (whichever is shorter) from the log on error
Fall back to log output if the user requests it.
Automatic merge from submit-queue
Remove TODOs to refactor kubelet labels
To address #39650 completely.
Remove label refactoring TODOs; we don't need them since the CRI rollout is on the way.
Automatic merge from submit-queue (batch tested with PRs 40250, 40134, 40210)
Typo fix: Change logging function to formatting version
**What this PR does / why we need it**:
Slightly broken logging message:
```
I0120 10:56:08.555712 7575 kubelet_node_status.go:135] Deleted old node object %qkubernetes-cit-kubernetes-cr0-0
```
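The broken output comes from passing a format string to a Print-style function. glog's `Info` handles its arguments like `fmt.Print`, and `Infof` handles them like `fmt.Printf`. This sketch reproduces both behaviours with `fmt` alone:

```go
package main

import "fmt"

// The format string is kept in a variable; the bug is passing it to a
// Print-style function, which does not interpret verbs such as %q.
var deleteNodeFormat = "Deleted old node object %q"

// withSprint mimics glog.Info: two string operands are concatenated
// without a space and %q is left as-is.
func withSprint(name string) string { return fmt.Sprint(deleteNodeFormat, name) }

// withSprintf mimics glog.Infof: %q is expanded to a quoted string.
func withSprintf(name string) string { return fmt.Sprintf(deleteNodeFormat, name) }

func main() {
	name := "kubernetes-cit-kubernetes-cr0-0"
	fmt.Println(withSprint(name))  // Deleted old node object %qkubernetes-cit-kubernetes-cr0-0
	fmt.Println(withSprintf(name)) // Deleted old node object "kubernetes-cit-kubernetes-cr0-0"
}
```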
Automatic merge from submit-queue (batch tested with PRs 40232, 40235, 40237, 40240)
move listers out of cache to reduce import tree
Moving the listers from `pkg/client/cache` snips links to all the different API groups from `pkg/storage`, but the dreaded `ListOptions` remains.
@sttts
Automatic merge from submit-queue (batch tested with PRs 37228, 40146, 40075, 38789, 40189)
Cleanup temp dirs
So, funny story: my /tmp ran out of space while running the unit tests, so I am cleaning up all the temp dirs we create.
Automatic merge from submit-queue (batch tested with PRs 37228, 40146, 40075, 38789, 40189)
kubelet: storage: teardown terminated pod volumes
This is a continuation of the work done in https://github.com/kubernetes/kubernetes/pull/36779
There really is no reason to keep volumes for terminated pods attached on the node. This PR extends the removal of volumes on the node from memory-backed volumes (the current policy) to all volumes.
@pmorie raised a concern about the impact on debugging volume-related issues if terminated pod volumes are removed. To address this issue, the PR adds a `--keep-terminated-pod-volumes` flag to the kubelet and sets it in `hack/local-up-cluster.sh`.
For consideration in 1.6.
Fixes #35406
@derekwaynecarr @vishh @dashpole
```release-note
kubelet tears down pod volumes on pod termination rather than pod deletion
```
Automatic merge from submit-queue (batch tested with PRs 40011, 40159)
dockertools/nsenterexec: fix err shadow
The shadow of err meant the combination of `exec-handler=nsenter` +
`tty` + a non-zero exit code meant that the exit code would be LOST
FOREVER 👻
This isn't all that important since no one really used the nsenter exec
handler as I understand it
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 36693, 40154, 40170, 39033)
make client-go authoritative for pkg/client/restclient
Moves client/restclient to client-go, along with util/certs and util/testing as transitive dependencies.
Automatic merge from submit-queue (batch tested with PRs 40168, 40165, 39158, 39966, 40190)
dockershim: add support for the 'nsenter' exec handler
This change simply plumbs the kubelet configuration
(--docker-exec-handler) to DockerService.
This fixes #35747.
Automatic merge from submit-queue (batch tested with PRs 40168, 40165, 39158, 39966, 40190)
CRI: upgrade protobuf to v3
For #38854, this PR upgrades the CRI protobuf version to v3, and also updates related packages to conform to the new API.
**Release note**:
```
CRI: upgrade protobuf version to v3.
```
Automatic merge from submit-queue (batch tested with PRs 39446, 40023, 36853)
Create environment variables from secrets
Allow environment variables to be populated from entire secrets.
**Release note**:
```release-note
Populate environment variables from a secret.
```
Automatic merge from submit-queue
promote certificates api to beta
Mostly posting to see what breaks but also this API is ready to be promoted.
```release-note
Promote certificates.k8s.io to beta and enable it by default. Users using the alpha certificates API should delete v1alpha1 CSRs from the API before upgrading and recreate them as v1beta1 CSR after upgrading.
```
@kubernetes/api-approvers @jcbsmpsn @pipejakob
Automatic merge from submit-queue
move pkg/fields to apimachinery
Purely mechanical move of `pkg/fields` to apimachinery.
Discussed with @lavalamp on Slack. Moving this and `labels` to apimachinery.
@liggitt any concerns? I think the idea of field selection should become generic and this ends up shared between client and server, so this is a more logical location.
Automatic merge from submit-queue
make client-go more authoritative
Builds on https://github.com/kubernetes/kubernetes/pull/40103
This moves a few more support packages to client-go for origination.
1. restclient/watch - nodep
1. util/flowcontrol - used interface
1. util/integer, util/clock - used in controllers and in support of util/flowcontrol
Automatic merge from submit-queue
Fixed merging of host's and dns' search lines
Fixed forming of the pod's search line in resolv.conf:
- exclude duplicates while merging the host's and DNS search lines to form the pod's one
- truncate the pod's search line if it exceeds resolver limits: is > 255 chars or contains > 6 searches
- monitor the resolv.conf file used by the kubelet (set through --resolv-conf=""), and log and emit an event if its search line consists of more than 3 entries when Cluster Domain is set (or 6 otherwise), or if its length is > 255 chars
- log and emit an event when a pod's search line is > 255 chars or contains > 6 searches during forming
Fixes #29270
**Release note**:
```release-note
Fixed forming resolver search line for pods: exclude duplicates, obey libc limitations, logging and eventing appropriately.
```
Automatic merge from submit-queue
Curating Owners: pkg/kubelet
cc @euank @vishh @dchen1107 @feiskyer @yujuhong @yifan-gu @derekwaynecarr @saad-ali
In an effort to expand the existing pool of reviewers and establish a
two-tiered review process (first someone lgtms and then someone
experienced in the project approves), we are adding new reviewers to
existing owners files.
If You Care About the Process:
------------------------------
We did this by algorithmically figuring out who’s contributed code to
the project and in what directories. Unfortunately, that doesn’t work
well: people that have made mechanical code changes (e.g. change the
copyright header across all directories) end up as reviewers in lots of
places.
Instead of using pure commit data, we generated an excessively large
list of reviewers and pruned based on all time commit data, recent
commit data and review data (number of PRs commented on).
At this point we have a decent list of reviewers, but it needs one last
pass for fine tuning.
Also, see https://github.com/kubernetes/contrib/issues/1389.
TLDR:
-----
As an owner of a sig/directory and a leader of the project, here’s what
we need from you:
1. Use PR https://github.com/kubernetes/kubernetes/pull/35715 as an example.
2. The pull-request is made editable, please edit the `OWNERS` file to
remove the names of people that shouldn't be reviewing code in the
future in the **reviewers** section. You probably do NOT need to modify
the **approvers** section. Names are sorted by relevance, using some
secret statistics.
3. Notify me if you want some OWNERS file to be removed. Being an
approver or reviewer of a parent directory makes you a reviewer/approver
of the subdirectories too, so not all OWNERS files may be necessary.
4. Please use ALIAS if you want to use the same list of people over and
over again (don't hesitate to ask me for help, or use the pull-request
above as an example)
Automatic merge from submit-queue
Made tracing of calls and container lifecycle steps in FakeDockerClient optional
Fixes #39717
Slightly refactored the FakeDockerClient code and made tracing optional (but enabled by default).
@yujuhong @Random-Liu
Automatic merge from submit-queue
Report the Pod name and namespace when kubelet fails to sync the container
This helps debug problems with SELinux (and other problems where Docker fails to run the container), as currently only the UUID of the Pod is reported:
```
Error syncing pod 670f607d-b5a8-11a4-b673-005056b7468b, skipping: failed to "StartContainer" for "deployment" with RunContainerError: "runContainer: Error response from daemon: Relabeling content in /usr is not allowed."
```
Here it would be useful to know which pod, in which namespace, is trying to mount "/usr".
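A minimal Go sketch of including the name and namespace next to the UID. The `pod` struct, the `describePod` helper, the `name_namespace(uid)` layout, and the example name and namespace are all hypothetical. Only the UID comes from the log line above:

```go
package main

import "fmt"

// pod is a minimal stand-in for the API Pod object.
type pod struct {
	Name, Namespace, UID string
}

// describePod renders a pod as name_namespace(uid), so sync errors
// identify the pod by name and namespace, not only by its UID.
func describePod(p pod) string {
	return fmt.Sprintf("%s_%s(%s)", p.Name, p.Namespace, p.UID)
}

func main() {
	p := pod{Name: "example", Namespace: "example-ns", UID: "670f607d-b5a8-11a4-b673-005056b7468b"}
	fmt.Println(describePod(p)) // example_example-ns(670f607d-b5a8-11a4-b673-005056b7468b)
}
```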
Automatic merge from submit-queue (batch tested with PRs 39417, 39679)
Fix 2 `sucessfully` typos
**What this PR does / why we need it**: Only fixes two typos in comments/logging
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue
replace global registry in apimachinery with global registry in k8s.io/kubernetes
We'd like to remove all globals, but our immediate problem is that a shared registry between k8s.io/kubernetes and k8s.io/client-go doesn't work. Since client-go makes a copy, we can actually keep a global registry with other globals in pkg/api for now.
@kubernetes/sig-api-machinery-misc @lavalamp @smarterclayton @sttts
Automatic merge from submit-queue
break from the for loop
**What this PR does / why we need it**:
Exit the loop early, because the subsequent iterations will not affect the result.
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue
Fix cadvisor_unsupported.go build tags
Make it so cadvisor_unsupported.go is used for linux without cgo or
non-linux/windows OSes.
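The intended file selection is the boolean expression `(linux && !cgo) || (!linux && !windows)`. In today's `//go:build` syntax it would be written as that expression after the directive. The 2017 code used `// +build` lines. This sketch evaluates the same expression as an ordinary function so the logic can be checked on any platform. The function name is hypothetical:

```go
package main

import "fmt"

// usesUnsupportedCadvisor reports whether cadvisor_unsupported.go should
// be compiled: Linux built without cgo, or any OS other than Linux and
// Windows (Windows has its own implementation).
func usesUnsupportedCadvisor(goos string, cgoEnabled bool) bool {
	return (goos == "linux" && !cgoEnabled) || (goos != "linux" && goos != "windows")
}

func main() {
	for _, goos := range []string{"linux", "windows", "darwin"} {
		fmt.Printf("%s: cgo=true %v, cgo=false %v\n", goos,
			usesUnsupportedCadvisor(goos, true), usesUnsupportedCadvisor(goos, false))
	}
}
```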
Automatic merge from submit-queue
kubelet: remove the pleg health check from healthz
This prevents kubelet from being killed when docker hangs.
Also, kubelet will report node not ready if PLEG hangs (`docker ps` + `docker inspect`).
Automatic merge from submit-queue (batch tested with PRs 39803, 39698, 39537, 39478)
[scheduling] Moved pod affinity and anti-affinity from annotations to api fields #25319
Converted pod affinity and anti-affinity from annotations to api fields
Related: #25319
Related: #34508
**Release note**:
```release-note
Pod affinity and anti-affinity have moved from annotations to API fields in the pod spec. Pod affinity or anti-affinity defined in the annotations will be ignored.
```
Automatic merge from submit-queue
[CRI] Don't include user data in CRI streaming redirect URLs
Fixes: https://github.com/kubernetes/kubernetes/issues/36187
Avoid userdata in the redirect URLs by caching the {Exec,Attach,PortForward}Requests with a unique token. When the redirect URL is created, the token is substituted for the request params. When the streaming server receives the token request, the token is used to fetch the actual request parameters out of the cache.
For additional security, the token is generated using the secure random function, is single use (i.e. the first request with the token consumes it), and has a short expiration time.
/cc @kubernetes/sig-node
Automatic merge from submit-queue
Fix kubelet cross build
**What this PR does / why we need it**: Cross builds are not passing for MacOS and Windows. We expect Windows binaries for `kubelet` and `kube-proxy` to be released for the first time with 1.5.2, which is to be released later today.
**Which issue this PR fixes**:
fixes #39005, fixes #39714
**Special notes for your reviewer**: /cc @feiskyer @smarterclayton @vishh this should be P0 in order to be merged before 1.5.2 and obviously fix the cross build.
Automatic merge from submit-queue (batch tested with PRs 39475, 38666, 39327, 38396, 39613)
Create k8s.io/apimachinery repo
Don't panic.
The diff is quite large, but it's all generated changes. The first few commits are where all the action is. I built a script to find the fanout from
```
k8s.io/kubernetes/pkg/apimachinery/registered
k8s.io/kubernetes/pkg/runtime/serializer
k8s.io/kubernetes/pkg/runtime/serializer/yaml
k8s.io/kubernetes/pkg/runtime/serializer/streaming
k8s.io/kubernetes/pkg/runtime/serializer/recognizer/testing
```
It copied
```
k8s.io/kubernetes/pkg/api/meta
k8s.io/kubernetes/pkg/apimachinery
k8s.io/kubernetes/pkg/apimachinery/registered
k8s.io/kubernetes/pkg/apis/meta/v1
k8s.io/kubernetes/pkg/apis/meta/v1/unstructured
k8s.io/kubernetes/pkg/conversion
k8s.io/kubernetes/pkg/conversion/queryparams
k8s.io/kubernetes/pkg/genericapiserver/openapi/common - this needs to be renamed post-merge. It's just types
k8s.io/kubernetes/pkg/labels
k8s.io/kubernetes/pkg/runtime
k8s.io/kubernetes/pkg/runtime/schema
k8s.io/kubernetes/pkg/runtime/serializer
k8s.io/kubernetes/pkg/runtime/serializer/json
k8s.io/kubernetes/pkg/runtime/serializer/protobuf
k8s.io/kubernetes/pkg/runtime/serializer/recognizer
k8s.io/kubernetes/pkg/runtime/serializer/recognizer/testing
k8s.io/kubernetes/pkg/runtime/serializer/streaming
k8s.io/kubernetes/pkg/runtime/serializer/versioning
k8s.io/kubernetes/pkg/runtime/serializer/yaml
k8s.io/kubernetes/pkg/selection
k8s.io/kubernetes/pkg/types
k8s.io/kubernetes/pkg/util/diff
k8s.io/kubernetes/pkg/util/errors
k8s.io/kubernetes/pkg/util/framer
k8s.io/kubernetes/pkg/util/json
k8s.io/kubernetes/pkg/util/net
k8s.io/kubernetes/pkg/util/runtime
k8s.io/kubernetes/pkg/util/sets
k8s.io/kubernetes/pkg/util/validation
k8s.io/kubernetes/pkg/util/validation/field
k8s.io/kubernetes/pkg/util/wait
k8s.io/kubernetes/pkg/util/yaml
k8s.io/kubernetes/pkg/watch
k8s.io/kubernetes/third_party/forked/golang/reflect
```
The script does the import rewriting and gofmt. Then you do a build, codegen, bazel update, and it produces all the updates.
If we agree this is the correct approach, I'll create a verify script to make sure that no one messes with any files in the "dead" packages above.
@kubernetes/sig-api-machinery-misc @smarterclayton @sttts @lavalamp @caesarxuchao
`staging/prime-apimachinery.sh && hack/update-codegen.sh && nice make WHAT="federation/cmd/federation-apiserver/ cmd/kube-apiserver" && hack/update-openapi-spec.sh && hack/update-federation-openapi-spec.sh && hack/update-codecgen.sh && hack/update-codegen.sh && hack/update-generated-protobuf.sh && hack/update-bazel.sh`
Automatic merge from submit-queue (batch tested with PRs 39684, 39577, 38989, 39534, 39702)
kubelet: request client auth certificates from certificate API.
This fixes kubeadm and --experiment-kubelet-bootstrap.
cc @liggitt
Automatic merge from submit-queue (batch tested with PRs 39684, 39577, 38989, 39534, 39702)
Set PodStatus QOSClass field
This PR continues the work for https://github.com/kubernetes/kubernetes/pull/37968
It converts all local usage of the `qos` package class types to the new API-level types (first commit) and sets the pod status QOSClass field at pod creation time, on the API server in `PrepareForCreate` and in the kubelet in the pod status update path (second commit). This way the pod QOS class is set even if the pod isn't scheduled yet.
Fixes #33255
@ConnorDoyle @derekwaynecarr @vishh
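The commit does not spell out how the class is derived. The sketch below uses the commonly documented simplified rules, with plain integers in place of resource quantities. The `container` type and `qosClass` helper are hypothetical and omit many cases the real code handles:

```go
package main

import "fmt"

// container holds requests/limits keyed by resource name ("cpu", "memory").
type container struct {
	Requests map[string]int64
	Limits   map[string]int64
}

// qosClass derives a pod's QoS class with simplified rules:
//   - BestEffort: no container sets any request or limit.
//   - Guaranteed: every container sets cpu and memory limits, and any
//     request that is set equals the corresponding limit.
//   - Burstable: everything else.
func qosClass(pod []container) string {
	anySet, guaranteed := false, true
	for _, c := range pod {
		if len(c.Requests) > 0 || len(c.Limits) > 0 {
			anySet = true
		}
		for _, r := range []string{"cpu", "memory"} {
			limit, ok := c.Limits[r]
			if !ok {
				guaranteed = false
				continue
			}
			if req, ok := c.Requests[r]; ok && req != limit {
				guaranteed = false
			}
		}
	}
	switch {
	case !anySet:
		return "BestEffort"
	case guaranteed:
		return "Guaranteed"
	default:
		return "Burstable"
	}
}

func main() {
	fmt.Println(qosClass([]container{{}})) // BestEffort
	fmt.Println(qosClass([]container{{Limits: map[string]int64{"cpu": 500, "memory": 256}}}))
	// Guaranteed
	fmt.Println(qosClass([]container{{Requests: map[string]int64{"cpu": 100}}})) // Burstable
}
```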
Automatic merge from submit-queue (batch tested with PRs 38212, 38792, 39641, 36390, 39005)
Set MemorySwap to zero on Windows
Fixes https://github.com/kubernetes/kubernetes/issues/39003
@dchen1107 @michmike @kubernetes/sig-node-misc
Automatic merge from submit-queue (batch tested with PRs 38212, 38792, 39641, 36390, 39005)
Generate a kubelet CA and kube-apiserver cert-pair for kubelet auth.
cc @cjcullen
Automatic merge from submit-queue (batch tested with PRs 39486, 37288, 39477, 39455, 39542)
Revert "Small improve for GetContainerOOMScoreAdjust"
Reverts kubernetes/kubernetes#39306
This does not make the current code healthier; let's revert it to avoid further confusion.
Automatic merge from submit-queue (batch tested with PRs 39493, 39496)
kubelet: fix nil deref in volume type check
This was introduced by PR https://github.com/kubernetes/kubernetes/pull/36779, an attempt to address memory exhaustion caused by a build-up of terminated pods with memory-backed volumes on the node.
For the `VolumeSpec`, either the `Volume` or the `PersistentVolume` field is set, not both. This results in a nil deref on PVs. Since PVs are inherently not memory-backed, only local/temporary volumes should be considered.
This needs to go into 1.5 as well.
Fixes #39480
@saad-ali @derekwaynecarr @grosskur @gnufied
```release-note
fixes nil dereference when doing a volume type check on persistent volumes
```
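A minimal sketch of the guarded check. The trimmed-down types and the `isMemoryBacked` helper are hypothetical. Treating only a `"Memory"` medium as memory-backed is a simplification of whatever the kubelet actually inspects:

```go
package main

import "fmt"

// Exactly one of Volume or PersistentVolume is set on a spec.
type volume struct {
	EmptyDirMedium string // "Memory" for a tmpfs-backed emptyDir
}

type persistentVolume struct{ Name string }

type volumeSpec struct {
	Volume           *volume
	PersistentVolume *persistentVolume
}

// isMemoryBacked checks Volume for nil before dereferencing it. PVs are
// never memory-backed, so a spec with only PersistentVolume set is false.
func isMemoryBacked(spec volumeSpec) bool {
	if spec.Volume == nil {
		return false
	}
	return spec.Volume.EmptyDirMedium == "Memory"
}

func main() {
	// true
	fmt.Println(isMemoryBacked(volumeSpec{Volume: &volume{EmptyDirMedium: "Memory"}}))
	// false, and no nil dereference
	fmt.Println(isMemoryBacked(volumeSpec{PersistentVolume: &persistentVolume{Name: "pv1"}}))
}
```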
Automatic merge from submit-queue
Start moving genericapiserver to staging
This moves `pkg/auth/user` to `staging/k8s.io/genericapiserver/pkg/authentication/user`. I'll open a separate pull into the upstream gengo to support using `import-boss` on vendored folders to support staging.
After we agree this is the correct approach and see everything build, I'll start moving other packages over which don't have k8s.io/kubernetes deps.
@kubernetes/sig-api-machinery-misc @lavalamp
@sttts @caesarxuchao ptal
Automatic merge from submit-queue (batch tested with PRs 38084, 39306)
Small improve for GetContainerOOMScoreAdjust
In `GetContainerOOMScoreAdjust`, make the logic clearer for the case `oomScoreAdjust >= besteffortOOMScoreAdj`. If `besteffortOOMScoreAdj` is defined as another value (e.g. 996) and `oomScoreAdjust` is 999, the function will return 998 (which equals 999 - 1) instead of 995 (996 - 1).
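The arithmetic in this example can be checked with two small Go functions that model the behaviour before and after the change, as the description states it. The function names are hypothetical. Because an `oom_score_adj` value cannot exceed 1000, the two only diverge when the best-effort value is set below 1000. That is the hypothetical case used above:

```go
package main

import "fmt"

// beforeChange subtracts one from the computed score when it reaches
// the best-effort value.
func beforeChange(oomScoreAdjust, bestEffortAdj int) int {
	if oomScoreAdjust >= bestEffortAdj {
		return oomScoreAdjust - 1
	}
	return oomScoreAdjust
}

// afterChange clamps the score to just below the best-effort value.
func afterChange(oomScoreAdjust, bestEffortAdj int) int {
	if oomScoreAdjust >= bestEffortAdj {
		return bestEffortAdj - 1
	}
	return oomScoreAdjust
}

func main() {
	fmt.Println(beforeChange(999, 996), afterChange(999, 996)) // 998 995
	fmt.Println(beforeChange(1000, 1000), afterChange(1000, 1000)) // 999 999
}
```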
Automatic merge from submit-queue (batch tested with PRs 38433, 36245)
Allow pods to define multiple environment variables from a whole ConfigMap
Allow environment variables to be populated from ConfigMaps
- ConfigMaps represent an entire set of EnvVars
- EnvVars can override ConfigMaps
fixes #26299
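The precedence rule can be sketched in Go: ConfigMap keys are expanded first, then explicitly declared variables are applied last, so they override. The `buildEnv` helper is hypothetical. Letting later ConfigMaps win over earlier ones is an assumption of this sketch:

```go
package main

import "fmt"

// buildEnv expands every ConfigMap into variables, then applies the
// explicitly declared variables so they override ConfigMap values.
func buildEnv(configMaps []map[string]string, explicit map[string]string) map[string]string {
	env := make(map[string]string)
	for _, cm := range configMaps {
		for k, v := range cm {
			env[k] = v
		}
	}
	for k, v := range explicit {
		env[k] = v
	}
	return env
}

func main() {
	cm := map[string]string{"LOG_LEVEL": "info", "PORT": "8080"}
	env := buildEnv([]map[string]string{cm}, map[string]string{"LOG_LEVEL": "debug"})
	fmt.Println(env) // map[LOG_LEVEL:debug PORT:8080]
}
```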
Automatic merge from submit-queue (batch tested with PRs 39280, 37350, 39389, 39390, 39313)
delete meaningless judgments
What this PR does / why we need it:
Whether or not `err` is nil, it can simply be returned, so the `err != nil` check is unnecessary.
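A small Go illustration of why the check is redundant. When `err` is nil, `return nil` and `return err` return the same value. The function names are hypothetical:

```go
package main

import (
	"errors"
	"fmt"
)

func doWork(fail bool) error {
	if fail {
		return errors.New("failed")
	}
	return nil
}

// before: the nil check adds nothing, because returning nil when err is
// nil is the same as returning err.
func before(fail bool) error {
	err := doWork(fail)
	if err != nil {
		return err
	}
	return nil
}

// after: return the result directly.
func after(fail bool) error {
	return doWork(fail)
}

func main() {
	fmt.Println(before(true), after(true), before(false), after(false)) // failed failed <nil> <nil>
}
```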
Automatic merge from submit-queue (batch tested with PRs 39001, 39104, 35978, 39361, 39273)
delete SetNodeStatus() function and fix some function notes words
Drop SetNodeStatus(), since it is no longer called. klet.defaultNodeStatusFuncs() is now assigned to klet.setNodeStatusFuncs, and the setNodeStatus() function is called by other functions.
Automatic merge from submit-queue
Kubelet: add image ref to ImageService interfaces
This PR adds the image ref (digest or ID, depending on the runtime) to the PullImage result, and passes the image ref to CreateContainer instead of the image name. It also
* Adds image ref to CRI's PullImageResponse
* Updates related image puller
* Updates related testing utilities
~~One remaining issue is: it breaks some e2e tests because they [check image repoTags](https://github.com/kubernetes/kubernetes/blob/master/test/e2e/framework/util.go#L1941) while Docker always returns the digest in this PR. Should we update the e2e tests or continue to return repoTags in `containerStatuses.image`?~~
Fixes#38833.
Automatic merge from submit-queue (batch tested with PRs 39307, 39300)
kubenet: define KubenetPluginName for all platforms
This PR moved KubenetPluginName to a general file for all platforms.
Fixes#39299.
cc/ @yifan-gu @freehan
Automatic merge from submit-queue
dockertools: don't test linux-specific cases on OSX
A few test cases in dockertools are Linux-specific. This PR moves them to docker_manager_linux_test.go
Fixes#39183.
Automatic merge from submit-queue (batch tested with PRs 39053, 36446)
CRI: clarify purpose of annotations
Add language to make it explicit that annotations are not to be altered
by runtimes, and should only be used for features that are opaque to the
Kubernetes APIs. Unfortunately there are currently exceptions
introduced in [1][1], but this change makes it clear that they are to be
changed and that no more such semantic-affecting annotations should be
introduced.
In the spirit of the discussion and conclusion in [2][2].
Also captures the link between the annotations returned by various
status queries and those supplied in associated configs.
[1]: https://github.com/kubernetes/kubernetes/pull/34819
[2]: https://github.com/kubernetes/kubernetes/issues/30819#issuecomment-253369441
Automatic merge from submit-queue
Refactor operation_executor to make it testable
**What this PR does / why we need it**:
To refactor operation_executor to make it unit testable
**Release note**:
`NONE`
Automatic merge from submit-queue (batch tested with PRs 39079, 38991, 38673)
Support systemd based pod qos in CRI dockershim
This PR makes pod-level QoS work for CRI dockershim with systemd-based cgroups. It will also fix #36807
- [x] Add cgroupDriver to dockerService and use docker info api to set value for it
- [x] Add a NOTE that detection only works for docker 1.11+, see [CHANGE LOG](https://github.com/docker/docker/blob/master/CHANGELOG.md#1110-2016-04-13)
- [x] Generate cgroupParent in syntax expected by cgroupDriver
- [x] Set cgroupParent to hostConfig for both sandbox and user container
- [x] Check if kubelet conflicts with cgroup driver of docker
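The "syntax expected by cgroupDriver" step can be illustrated with a rough Go sketch. It assumes that cgroupfs takes a path like `/kubepods/burstable/pod123` as-is, and that systemd expects a leaf slice name in which `-` expresses nesting, so any `-` inside a component is replaced with `_`. The escaping rule and the `cgroupParent` helper are assumptions, not the dockershim code:

```go
package main

import (
	"fmt"
	"strings"
)

// cgroupParent renders a cgroupfs-style path in the syntax the configured
// Docker cgroup driver expects.
func cgroupParent(driver, path string) string {
	if driver != "systemd" {
		return path // cgroupfs takes the path as-is
	}
	parts := strings.Split(strings.Trim(path, "/"), "/")
	for i, p := range parts {
		// "-" separates hierarchy levels in slice names, so escape it.
		parts[i] = strings.ReplaceAll(p, "-", "_")
	}
	return strings.Join(parts, "-") + ".slice"
}

func main() {
	fmt.Println(cgroupParent("cgroupfs", "/kubepods/burstable/pod123")) // /kubepods/burstable/pod123
	fmt.Println(cgroupParent("systemd", "/kubepods/burstable/pod123"))  // kubepods-burstable-pod123.slice
}
```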
cc @derekwaynecarr @vishh
Automatic merge from submit-queue (batch tested with PRs 36888, 38180, 38855, 38590)
wrong pod reference in error message for volume attach timeout
**What this PR does / why we need it**:
when a disk mount times out you get the following error:
```
Warning FailedSync Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "nginx"/"default". list of unattached/unmounted volumes=[data]
```
where the pod is referenced by "podname"/"namespace", but should be "namespace"/"podname".
**Which issue this PR fixes**
no issue number
**Special notes for your reviewer**:
untested :(
Automatic merge from submit-queue
Admit critical pods in the kubelet
Haven't verified in a live cluster yet, just unittested, so applying do-not-merge label.
Automatic merge from submit-queue
Migrated fluentd addon to daemon set
fix #23224
supersedes #23306
```release-note
Migrated fluentd addon to daemon set
```
Automatic merge from submit-queue (batch tested with PRs 38154, 38502)
Wrong comment to describe docker version
The original comment about the minimum Docker version for `oom_score_adj` is wrong (though the code is right).
Really sorry for misleading :/
Automatic merge from submit-queue (batch tested with PRs 38154, 38502)
Rename "release_1_5" clientset to just "clientset"
We used to keep multiple releases in the main repo. Now that [client-go](https://github.com/kubernetes/client-go) does the versioning, there is no need to keep releases in the main repo. This PR renames the "release_1_5" clientset to just "clientset", clientset development will be done in this directory.
@kubernetes/sig-api-machinery @deads2k
```release-note
The main repository does not keep multiple releases of clientsets anymore. Please find previous releases at https://github.com/kubernetes/client-go
```
Automatic merge from submit-queue (batch tested with PRs 38689, 38743, 38734, 38430)
apply sandbox network mode based on network plugin
This allows CRI to use docker's network bridge. Combined with the noop network plugin, this allows using docker0 with no further configuration, which is good for tools like minikube/hyperkube.
Automatic merge from submit-queue
Refactor remotecommand options parsing
Prerequisite to https://github.com/kubernetes/kubernetes/issues/36187 - This separates the options from the request, so they can be pulled from elsewhere.
/cc @liggitt
Automatic merge from submit-queue (batch tested with PRs 38727, 38726, 38347, 38348)
Add 'privileged' to sandbox to indicate if any container might be privileged in it, document privileged
Right now, the privileged flag is this magic thing which does "whatever Docker does". This documents it to make it a little less magic.
In addition, due to how rkt uses `systemd-nspawn` as an outer layer of isolation in creating the sandbox, it's helpful to know beforehand whether the pod will be privileged so additional security options can be applied earlier / applied at all.
I suspect the same indication will be useful for userns since userns should also occur at the pod layer, but it's possible that will be a separate/additional field.
cc @lucab @jonboulle @yujuhong @feiskyer @kubernetes/sig-node
```release-note
NONE
```
Automatic merge from submit-queue
CRI: fix ImageStatus comment
**What this PR does / why we need it**:
GRPC cannot encode `nil` (CRI-O itself panics while trying to encode `nil` for `ImageStatus`). This PR fixes the `ImageStatus` comment to say that when the image does not exist, the call returns a response with `Image` set to `nil` (instead of saying implementors should return `nil` directly).
/cc @mrunalp @vishh @feiskyer
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
Automatic merge from submit-queue (batch tested with PRs 38419, 38457, 38607)
Fix pod level QoS not working on CRI dockershim
Fixes: https://github.com/kubernetes/kubernetes/issues/38458
We did set `CgroupParent` in `CreateContainer`, but `HostConfig.Resources`, which `CgroupParent` belongs to, is overridden by the following code:
```
hc.CgroupParent = lc.GetCgroupParent()
...
hc.Resources = dockercontainer.Resources{
Memory: rOpts.GetMemoryLimitInBytes(),
...
}
```
That's why `HostConfig.CgroupParent` is always empty and pod level QoS does not work.
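The overwrite is easy to reproduce in isolation. The sketch below uses simplified, hypothetical stand-ins for the Docker types (only the detail that `CgroupParent` lives inside the embedded `Resources` struct is taken from the text) to show how assigning a whole struct value discards a field set earlier, and one way to avoid it.

```go
package main

import "fmt"

// Resources and HostConfig are simplified, hypothetical stand-ins for the
// Docker API types: as described above, CgroupParent is a field of
// Resources, which HostConfig embeds.
type Resources struct {
	CgroupParent string
	Memory       int64
}

type HostConfig struct {
	Resources
}

// buildBuggy reproduces the bug: CgroupParent is set first (through the
// embedded struct), then the whole Resources value is replaced, which
// silently resets CgroupParent to "".
func buildBuggy(cgroupParent string, memory int64) HostConfig {
	var hc HostConfig
	hc.CgroupParent = cgroupParent
	hc.Resources = Resources{Memory: memory}
	return hc
}

// buildFixed sets CgroupParent in the same Resources literal, so no later
// reassignment can discard it.
func buildFixed(cgroupParent string, memory int64) HostConfig {
	var hc HostConfig
	hc.Resources = Resources{
		CgroupParent: cgroupParent,
		Memory:       memory,
	}
	return hc
}

func main() {
	// "/kubepods" is just an illustrative cgroup path.
	fmt.Printf("buggy: %q\n", buildBuggy("/kubepods", 64<<20).CgroupParent)
	fmt.Printf("fixed: %q\n", buildFixed("/kubepods", 64<<20).CgroupParent)
}
```

Assigning `CgroupParent` after the `Resources` assignment would work just as well; the point is that the field must not be set before the struct it belongs to is replaced.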
Automatic merge from submit-queue (batch tested with PRs 38453, 36672, 38629, 34966, 38630)
Fix threshold notifier build tags
Fix threshold notifier build tags so the linux version is only built if cgo is
enabled, and the unsupported version is built if it's either not linux or not
cgo.
Automatic merge from submit-queue (batch tested with PRs 38432, 36887, 38415)
Add --image-pull-stuck-timeout option to kubelet
This PR adds an --image-pull-stuck-timeout option to specify the stuck timeout for pulling an image.
While docker extracts an image layer, it reports no progress. The time without progress can exceed 1m if the layer is big or the system is busy. This happened in our cluster, so I added the above option to make the timeout configurable.
Related error log:
<pre>
[... kube_docker_client.go:29] Cancel pulling image "our_registry/demo/test" because of no progress for 1m0s, latest progress "c914ad57d670": Extracting [==================>] 513.5 MB/513.5MB"
[... docker_manager.go:2254] container start failed: ErrImagePull: net/http: request canceled
</pre>
Automatic merge from submit-queue (batch tested with PRs 36419, 38330, 37718, 38244, 38375)
Kubelet: Add image cache.
Fixes #38373.
This should be patched into 1.5.1 to solve the customer issue.
@yujuhong
/cc @kubernetes/sig-node
Adding the `privileged` bool to the sandbox allows runtimes, like rkt,
to make better security choices in some cases.
This also enumerates what "privileged" actually means and how it
interacts with other options (or more accurately, does not).
The documentation closely matches docker's current behavior because, so
far, that's what privileged has meant.
Automatic merge from submit-queue (batch tested with PRs 38318, 38258)
kernel memcg notification enabled via experimental flag
Kubelet integrates with kernel memcg notification API if and only if enabled via experimental flag.
Automatic merge from submit-queue
add a configuration for kubelet to register as a node with taints
and deprecate --register-schedulable
ref #28687 #29178
cc @dchen1107 @davidopp @roberthbailey
Automatic merge from submit-queue (batch tested with PRs 37208, 37446, 37420)
Kubelet log modification
Keep in line with the other error logs in the function.
After returning, the caller logs the error. Delete redundant logs.
We should use:
nsenter --net=netnsPath -F -- some_command
instead of:
nsenter -n netnsPath -F -- some_command
because "nsenter -n netnsPath" produces an error:
# nsenter -n /proc/67197/ns/net ip addr
nsenter: neither filename nor target pid supplied for ns/net
If we really want to use -n, the path must be attached directly to the option:
# sudo nsenter -n/proc/67197/ns/net ip addr
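When the command is built programmatically, attaching the path with `--net=` keeps it from being parsed as a separate argument. Below is a small sketch with a hypothetical helper, `nsenterCommand`. It places `-F` (nsenter's `--no-fork` option) before `--`, so nsenter treats it as its own option rather than as the command to run; everything after `--` is passed to the program inside the namespace.

```go
package main

import (
	"fmt"
	"os/exec"
)

// nsenterCommand is a hypothetical helper that builds an nsenter invocation
// in the recommended form: "--net=<path>" keeps the namespace path attached
// to the option, "-F" (no fork) is an nsenter option, and "--" separates
// nsenter's options from the command to run inside the namespace.
func nsenterCommand(nsenterPath, netnsPath string, command ...string) *exec.Cmd {
	args := []string{"--net=" + netnsPath, "-F", "--"}
	args = append(args, command...)
	return exec.Command(nsenterPath, args...)
}

func main() {
	cmd := nsenterCommand("nsenter", "/proc/67197/ns/net", "ip", "addr")
	// Only print the argv; actually running it needs root and a live netns.
	fmt.Println(cmd.Args)
}
```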
Automatic merge from submit-queue
Clean up redundant tests in image_manager_test
There was a lot of overlap between the parallel and serialized puller tests,
so most of these tests' internals were extracted into separate functions.
Automatic merge from submit-queue
Function annotation modification
Fix a typo in the comment: "healty" should be "healthy", matching the function it returns (`return kl.pleg.Healthy()`).
Automatic merge from submit-queue
Keep host port socket open for kubenet
fixes #37087
**Release note**:
```release-note
NONE
```
When the network plugin is set to kubenet, kubelet should hold the host port socket
so that other applications on this node cannot listen on or bind to this port.
However, the sockets were closed accidentally, because kubelet forgot to
reconcile the protocol format before comparing.
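The failure mode is a key mismatch: if the open socket is recorded with one spelling of the protocol (say `tcp`) and the pod spec uses another (`TCP`), the comparison never matches and every socket looks unused. The sketch below shows a normalize-both-sides comparison; the `hostport` type, `normalize`, and `closeUnused` are hypothetical names, and lower-casing is an assumed choice of canonical form.

```go
package main

import (
	"fmt"
	"strings"
)

// hostport identifies a host port socket (hypothetical type for this sketch).
type hostport struct {
	port     int32
	protocol string
}

// normalize lower-cases the protocol so "TCP" (as written in a pod spec) and
// "tcp" (as recorded for an open socket) compare equal.
func normalize(hp hostport) hostport {
	return hostport{port: hp.port, protocol: strings.ToLower(hp.protocol)}
}

// closeUnused deletes every open socket that no pod still wants and returns
// the ones it removed. Without normalize on both sides, a socket recorded as
// "tcp" would never match a wanted "TCP" entry and would be closed by mistake.
func closeUnused(open map[hostport]bool, wanted []hostport) []hostport {
	want := make(map[hostport]bool, len(wanted))
	for _, w := range wanted {
		want[normalize(w)] = true
	}
	var closed []hostport
	for hp := range open {
		if !want[normalize(hp)] {
			delete(open, hp) // a real implementation would also close the socket here
			closed = append(closed, hp)
		}
	}
	return closed
}

func main() {
	open := map[hostport]bool{{8080, "tcp"}: true, {9090, "udp"}: true}
	closed := closeUnused(open, []hostport{{8080, "TCP"}})
	fmt.Println("closed:", closed, "still open:", len(open))
}
```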
@kubernetes/sig-network
Automatic merge from submit-queue
RunnningContainerStatues spelling mistake
runtime.go: in the function GetRunningContainerStatuses, the variable `runnningContainerStatues` was misspelled; it is renamed to `runningContainerStatus`.
Automatic merge from submit-queue
kubelet: don't reject pods without adding them to the pod manager
kubelet relies on the pod manager as a cache of the pods in the apiserver (and
other sources). The cache should be kept up-to-date even when rejecting pods.
Without this, kubelet may decide at any point to drop the status update
(request to the apiserver) for the rejected pod since it would think the pod no
longer exists in the apiserver.
This should fix #37658
Automatic merge from submit-queue
remove checking mount point in cleanupOrphanedPodDirs
To avoid the NFS hang problem, remove the mount point checking code in
cleanupOrphanedPodDirs(). This removal should still be safe because it checks whether there are still directories under the pod's volume directory and, if so, does not delete the pod directory.
Note: After removing the mount point check in cleanupOrphanedPodDirs(), the directories might not be cleaned up in the following situation:
1. The pod is deleted, and the kubelet reconciler successfully unmounts the volume directory.
2. Before the reconciler deletes the volume directory, kubelet gets restarted.
3. Since volume directories (no longer mounted) still exist under the pod directory, cleanupOrphanedPodDirs() will not clean them up.
Will work on a follow up PR to solve above issue.
kubelet relies on the pod manager as a cache of the pods in the apiserver (and
other sources). The cache should be kept up-to-date even when rejecting pods.
Without this, kubelet may decide at any point to drop the status update
(request to the apiserver) for the rejected pod since it would think the pod no
longer exists in the apiserver.
Also check if the pod to-be-admitted has terminated or not. In the case where
it has terminated, skip the admission process completely.
Automatic merge from submit-queue
Fix package aliases to follow golang convention
Some package aliases do not align with the golang convention https://blog.golang.org/package-names. This PR fixes them. It also adds a verify script and presubmit checks.
Fixes #35070.
cc/ @timstclair @Random-Liu
Automatic merge from submit-queue
kubelet: eviction: add memcg threshold notifier to improve eviction responsiveness
This PR adds the ability for the eviction code to get immediate notification from the kernel when the available memory in the root cgroup falls below a user-defined threshold, controlled by setting the `memory.available` signal with the `--eviction-hard` flag.
This PR by itself doesn't change anything, as the frequency at which new stats can be obtained is currently controlled by the cadvisor housekeeping interval. That being the case, the call to `synchronize()` by the notification loop will very likely get stale stats and not act any more quickly than it does now.
However, whenever cadvisor does get on-demand stat gathering ability, this will improve eviction responsiveness by getting async notification of the root cgroup memory state rather than relying on polling cadvisor.
@vishh @derekwaynecarr @kubernetes/rh-cluster-infra
Automatic merge from submit-queue
fix leaking memory backed volumes of terminated pods
Currently, we allow volumes to remain mounted on the node, even though the pod is terminated. This creates a vector for a malicious user to exhaust memory on the node by creating memory backed volumes containing large files.
This PR removes memory backed volumes (emptyDir w/ medium Memory, secrets, configmaps) of terminated pods from the node.
@saad-ali @derekwaynecarr
Automatic merge from submit-queue
Fix hostname truncate.
Fixes https://github.com/kubernetes/kubernetes/issues/36951.
This PR will keep truncating the hostname until the ending character is valid.
/cc @kubernetes/sig-node
Mark v1.5 because this is a bug fix.
/cc @saad-ali
We have observed that, after failing to create a container due to "device or
resource busy", docker may end up having inconsistent internal state. One
symptom is that docker will not report the existence of the "failed to create"
container, but if kubelet tries to create a new container with the same name,
docker will error out with a naming conflict message.
To work around this, this commit parses the creation error message and if there
is a naming conflict, it would attempt to remove the existing container.
Automatic merge from submit-queue
CRI: add docs for sysctls
#34830 adds `sysctls` features in CRI, it is based on sandbox annotations, this PR adds docs for it.
@yujuhong @timstclair @jonboulle
Automatic merge from submit-queue
CRI: Clarify User in CRI.
Addressed https://github.com/kubernetes/kubernetes/pull/36423#issuecomment-259343135.
This PR clarifies the user related fields in CRI.
One question is that:
What is the meaning of the `run_as_user` field in `LinuxSandboxSecurityContext`?
* **Is it user on the host?** Then it doesn't make sense, user shouldn't care about what users are on the host.
* **Is it user inside the infra container image?** This is how the field is currently used. However, the infra container is Docker-specific, and I'm not sure whether we should expose this in CRI.
* **Is it the default user inside the pod?** It tells the runtime which default "sandbox user" to use for any container (the infra container, or other helper containers like a streaming container) whose `user` is not specified. Then how can we guarantee that the infra or helper container image has that `user`?
* **It doesn't make sense?** If we remove it, we are relying on the shim to set right user (maybe always root) for infra or helper containers (if there will be any in the future), I'm not sure whether this is what we expect.
@yujuhong @feiskyer @jonboulle @yifan-gu
/cc @kubernetes/sig-node
Automatic merge from submit-queue
[kubelet] rename --cgroups-per-qos to --experimental-cgroups-per-qos
This reflects the true nature of "cgroups per qos" feature.
```release-note
* Rename `--cgroups-per-qos` to `--experimental-cgroups-per-qos` in Kubelet
```
Automatic merge from submit-queue
fix issue in reconstruct volume data when kubelet restarts
During state reconstruction when kubelet restarts, outerVolumeSpecName
cannot be recovered by scanning the disk directories. But this
information is used by the volume manager to check whether a pod's volume is
mounted or not. There are two possible cases:
1. The pod is not deleted while kubelet restarts, so the desired state
should have the information. reconciler.updateState() will use this
information to update it.
2. The pod is deleted during this period, so the reconciler has to use
InnerVolumeSpecName, but this should be OK since this information will not
be used for volume cleanup (umount).
Automatic merge from submit-queue
CRI: general grammar/spelling/consistency cleanup
No semantic changes, but a lot of shuffling of docstrings to make things
more consistent. In particular, standardise on the zeroth-article (i.e.
prefer `// Version` to `// The version`) and ending all docstrings with
periods.
(This knowingly conflicts with #36446 and intentionally omits changing the
Annotations field - I'll rebase this or that respectively as necessary.)
Automatic merge from submit-queue
Fix getting cgroup pids
Fixes https://github.com/kubernetes/kubernetes/issues/35214, https://github.com/kubernetes/kubernetes/issues/33232
Verified manually, but I didn't have time to run all the e2e's yet (will check it in the morning).
This should be cherry-picked into 1.4, and merged into 1.5 (/cc @saad-ali )
```release-note
Fix fetching pids running in a cgroup, which caused problems with OOM score adjustments & setting the /system cgroup ("misc" in the summary API).
```
/cc @kubernetes/sig-node
Automatic merge from submit-queue
Add e2e node test for log path
fixes#34661
A node e2e test to check that container log files are properly created with the right content.
Since the log files under `/var/log/containers` are actually symbolic links to Docker container log files, we cannot mount them into a pod and check them there (symbolic links are not supported by Docker volumes).
cc @Random-Liu
Automatic merge from submit-queue
Use indirect streaming path for remote CRI shim
Last step for https://github.com/kubernetes/kubernetes/issues/29579
- Wire through the remote indirect streaming methods in the docker remote shim
- Add the docker streaming server as a handler at `<node>:10250/cri/{exec,attach,portforward}`
- Disable legacy streaming for dockershim
Note: This requires PR https://github.com/kubernetes/kubernetes/pull/34987 to work.
Tested manually on an E2E cluster.
/cc @euank @feiskyer @kubernetes/sig-node
Automatic merge from submit-queue
Cleanup kubelet eviction manager tests
It cleans up kubelet eviction manager tests
Extracted parts of tests that were similar to each other to functions
Provides an opt-in flag, --experimental-fail-swap-on (and corresponding
KubeletConfiguration value, ExperimentalFailSwapOn), which is false by default.
Automatic merge from submit-queue
CRI: Add security context for sandbox/container
Part of #29478. This PR
- adds security context for sandbox and fixes #33139
- encapsulates container security context into `SecurityContext` and adds missing features
- Note that capability is not fully accomplished in this PR because it is under discussion at #33614.
cc/ @yujuhong @yifan-gu @Random-Liu @kubernetes/sig-node
This allows us to interrupt/kill the executed command if it exceeds the
timeout (not implemented by this commit).
Set timeout in Exec probes. HTTPGet and TCPSocket probes respect the
timeout, while Exec probes used to ignore it.
Add e2e test for exec probe with timeout. However, the test is skipped
while the default exec handler doesn't support timeouts.
Automatic merge from submit-queue
kubelet bootstrap: start hostNetwork pods before we have PodCIDR
Network readiness was checked in the pod admission phase, but pods that
fail admission are not retried. Move the check to the pod start phase.
Issue #35409
Issue #35521
Automatic merge from submit-queue
CRI: rearrange kubelet runtime initialization
Consolidate the code used by docker+cri and remote+cri for consistency, and to
prevent changing one without the other. Enforce that
`--experimental-runtime-integration-type` has to be set in order for kubelet to
use the CRI interface, *even for out-of-process shims*. This simplifies the
temporary `if` logic in kubelet while CRI still co-exists with older logic.
Automatic merge from submit-queue
CRI: Add Status into CRI.
For https://github.com/kubernetes/kubernetes/issues/35701.
Fixes https://github.com/kubernetes/kubernetes/issues/35701.
This PR adds a `Status` call in CRI, and `RuntimeStatus` is defined as follows:
``` protobuf
message RuntimeCondition {
// Type of runtime condition.
optional string type = 1;
// Status of the condition, one of true/false.
optional bool status = 2;
// Brief reason for the condition's last transition.
optional string reason = 3;
// Human readable message indicating details about last transition.
optional string message = 4;
}
message RuntimeStatus {
// Conditions is an array of current observed runtime conditions.
repeated RuntimeCondition conditions = 1;
}
```
Currently, only `conditions` is included in `RuntimeStatus`, and the definition is almost the same as `NodeCondition` and `PodCondition` in the K8s API.
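On the consumer side, a kubelet-like caller would typically look up a condition by type. The sketch mirrors the two protobuf messages as plain Go structs (a simplification of generated code). The condition types used in `main` (`RuntimeReady`, `NetworkReady`) are hypothetical, since the proposal does not define any, and treating a missing condition as false is one conservative design choice.

```go
package main

import "fmt"

// RuntimeCondition and RuntimeStatus mirror the protobuf messages above as
// plain Go structs.
type RuntimeCondition struct {
	Type    string
	Status  bool
	Reason  string
	Message string
}

type RuntimeStatus struct {
	Conditions []RuntimeCondition
}

// GetCondition returns the condition with the given type, or nil if absent.
func (s *RuntimeStatus) GetCondition(condType string) *RuntimeCondition {
	for i := range s.Conditions {
		if s.Conditions[i].Type == condType {
			return &s.Conditions[i]
		}
	}
	return nil
}

// conditionTrue treats a missing condition as false.
func conditionTrue(s *RuntimeStatus, condType string) bool {
	c := s.GetCondition(condType)
	return c != nil && c.Status
}

func main() {
	// Hypothetical condition types for illustration only.
	status := &RuntimeStatus{Conditions: []RuntimeCondition{
		{Type: "RuntimeReady", Status: true},
		{Type: "NetworkReady", Status: false, Reason: "NetworkPluginNotReady"},
	}}
	for _, t := range []string{"RuntimeReady", "NetworkReady", "DiskReady"} {
		fmt.Printf("%s: %v\n", t, conditionTrue(status, t))
	}
}
```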
@yujuhong @feiskyer @bprashanth If this makes sense, I'll send a follow up PR to let dockershim return `RuntimeStatus` and let kubelet make use of it.
@yifan-gu @euank Does this make sense to rkt?
/cc @kubernetes/sig-node
Automatic merge from submit-queue
We only report diskpressure to users, and no longer report inodepressure
See #36180 for more information on why #33218 was reverted.
Automatic merge from submit-queue
CRI: stop sandbox before removing it
Stopping a sandbox includes reclaiming the network resources. By always
stopping the sandbox before removing it, we reduce the possibility of leaking
resources in some corner cases.
Automatic merge from submit-queue
Remove GetRootContext method from VolumeHost interface
Remove the `GetRootContext` call from the `VolumeHost` interface, since Kubernetes no longer needs to know the SELinux context of the Kubelet directory.
Per #33951 and #35127.
Depends on #33663; only the last commit is relevant to this PR.
Automatic merge from submit-queue
Initial work on running windows containers on Kubernetes
This is the first stab at getting the Kubelet running on Windows (fixes #30279), and getting it to deploy network-accessible pods that consist of Windows containers. Thanks @csrwng, @jbhurat for helping out.
The main challenge with Windows containers at this point is that container networking is not supported: containers in a pod cannot share a network namespace, so each container in the pod gets its own IP address. For this reason, we had to make a couple of changes to the kubelet when it comes to setting the pod's IP in the Pod Status. Instead of using the infra-container's IP, we use the IP address of the first container.
Other approaches we investigated involved "disabling" the infra container, either conditionally on `runtime.GOOS` or having a separate windows-docker container runtime that re-implemented some of the methods (would require some refactoring to avoid maintainability nightmare).
Other changes:
- The default docker endpoint was removed. This results in the docker client using the default for the specific underlying OS.
More detailed documentation on how to setup the Windows kubelet can be found at https://docs.google.com/document/d/1IjwqpwuRdwcuWXuPSxP-uIz0eoJNfAJ9MWwfY20uH3Q.
cc: @ikester @brendandburns @jstarks
Automatic merge from submit-queue
Don't add duplicate Hostname address
If the cloudprovider returned an address of type Hostname, we shouldn't
add a duplicate one.
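The deduplication rule amounts to "append a Hostname address only if none is present". A small sketch with a simplified, hypothetical `NodeAddress` type (the real API type has more context around it):

```go
package main

import "fmt"

// NodeAddress is a simplified stand-in for the node address API type;
// "Hostname" is the address type discussed above.
type NodeAddress struct {
	Type    string
	Address string
}

// ensureHostnameAddress appends a Hostname address only if the cloud
// provider did not already return one, so no duplicate is added.
func ensureHostnameAddress(addrs []NodeAddress, hostname string) []NodeAddress {
	for _, a := range addrs {
		if a.Type == "Hostname" {
			return addrs
		}
	}
	return append(addrs, NodeAddress{Type: "Hostname", Address: hostname})
}

func main() {
	fromCloud := []NodeAddress{
		{Type: "InternalIP", Address: "10.0.2.15"},
		{Type: "Hostname", Address: "node-1.example"},
	}
	fmt.Println(ensureHostnameAddress(fromCloud, "node-1"))
}
```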
Fixes #36234
Automatic merge from submit-queue
Per Volume Inode Accounting
Collects volume inode stats using the same find command as cadvisor. The command is "find _path_ -xdev -printf '.' | wc -c". The output is passed to the summary api, and will be consumed by the eviction manager.
This cannot be merged yet, as it depends on changes adding the InodesUsed field to the summary api, and the eviction manager consuming this. Expect tests to fail until this happens.
DEPENDS ON #35137
Automatic merge from submit-queue
[AppArmor] Hold bad AppArmor pods in pending rather than rejecting
Fixes https://github.com/kubernetes/kubernetes/issues/32837
Overview of the fix:
If the Kubelet needs to reject a Pod for a reason that the control plane doesn't understand (e.g. which AppArmor profiles are installed on the node), then the pod might be continuously retried on the same rejecting node. This change adds a concept of "soft rejection", in which the Pod is admitted, but not allowed to run (and therefore held in a pending state). This prevents the pod from being retried on other nodes, but also prevents the high churn. This is consistent with how other missing local resources (e.g. volumes) are handled.
A side effect of the change is that Pods which are not initially runnable will be retried. This is desired behavior since it avoids a race condition when a new node is brought up but the AppArmor profiles have not yet been loaded on it.
``` release-note
Pods with invalid AppArmor configurations will be held in a Pending state, rather than rejected (failed). Check the pod status message to find out why it is not running.
```
@kubernetes/sig-node @timothysc @rrati @davidopp
Automatic merge from submit-queue
Separate Direct and Indirect streaming paths, implement indirect path for CRI
This PR refactors the `pkg/kubelet/container.Runtime` interface to remove the `ExecInContainer`, `PortForward` and `AttachContainer` methods. Instead, those methods are part of the `DirectStreamingRuntime` interface which all "legacy" runtimes implement. I also added an `IndirectStreamingRuntime` which handles the redirect path and is implemented by CRI runtimes. To control the size of this PR, I did not fully setup the indirect streaming path for the dockershim, so I left legacy path behind.
Most of this PR is moving & renaming associated with the refactoring. To understand the functional changes, I suggest tracing the code from `getExec` in `pkg/kubelet/server/server.go`, which calls `GetExec` in `pkg/kubelet/kubelet_pods.go` to determine whether to follow the direct or indirect path.
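The direct/indirect split can be pictured as a type switch on which streaming interface a runtime implements. In the sketch below, only the interface names `DirectStreamingRuntime` and `IndirectStreamingRuntime` come from the description above. The single-method signatures, the `execPath` helper, the toy runtimes, and the URL layout are hypothetical simplifications, not the kubelet's real API.

```go
package main

import (
	"fmt"
	"net/url"
)

// DirectStreamingRuntime: legacy runtimes stream I/O through the kubelet.
// The signature is a hypothetical simplification.
type DirectStreamingRuntime interface {
	ExecInContainer(containerID string, cmd []string) error
}

// IndirectStreamingRuntime: CRI runtimes return a URL to redirect the client to.
// The signature is a hypothetical simplification.
type IndirectStreamingRuntime interface {
	GetExec(containerID string, cmd []string) (*url.URL, error)
}

// execPath prefers the indirect (redirect) path when the runtime supports it
// and falls back to the direct path otherwise.
func execPath(runtime interface{}, containerID string, cmd []string) (*url.URL, error) {
	switch rt := runtime.(type) {
	case IndirectStreamingRuntime:
		return rt.GetExec(containerID, cmd)
	case DirectStreamingRuntime:
		return nil, rt.ExecInContainer(containerID, cmd)
	default:
		return nil, fmt.Errorf("runtime %T supports neither streaming path", runtime)
	}
}

// legacyRuntime and criRuntime are toy implementations for the demo.
type legacyRuntime struct{ execs int }

func (r *legacyRuntime) ExecInContainer(containerID string, cmd []string) error {
	r.execs++
	return nil
}

type criRuntime struct{ streamingBase string }

func (r criRuntime) GetExec(containerID string, cmd []string) (*url.URL, error) {
	return url.Parse(r.streamingBase + "/exec/" + containerID)
}

func main() {
	u, err := execPath(criRuntime{streamingBase: "http://node:10250/cri"}, "abc123", []string{"ls"})
	fmt.Println("indirect:", u, err)
	legacy := &legacyRuntime{}
	u, err = execPath(legacy, "abc123", []string{"ls"})
	fmt.Println("direct:", u, err, "execs:", legacy.execs)
}
```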
For https://github.com/kubernetes/kubernetes/issues/29579
/cc @kubernetes/sig-node
Automatic merge from submit-queue
CRI: Add devices to ContainerConfig
This PR adds devices to ContainerConfig and adds experimental GPU support.
cc/ @yujuhong @Hui-Zhi @vishh @kubernetes/sig-node
Automatic merge from submit-queue
Populate Node.Status.Addresses with Hostname
This PR is supposed to address #22063
Currently `NodeName` has to be a resolvable dns address on the master to allow apiserver -> kubelet communication (exec, log, port-forward operations on a pod). In some situations this is unfortunate (see the discussions on the issue).
The PR aims to do the following:
- Populate the `Type: Hostname` in the `Node.Status.Addresses` array, the type is already defined, but was not used so far.
- Add logic to resolve a Node's Hostname when the apiserver initiates communication with the Kubelet, instead of using the Nodename string as Hostname.
```release-note
The hostname of the node (as autodetected by the kubelet, specified via --hostname-override, or determined by the cloudprovider) is now recorded as an address of type "Hostname" in the status of the Node API object. The hostname is expected to be resolvable from the apiserver.
```
Automatic merge from submit-queue
pod and qos level cgroup support
```release-note
[Kubelet] Add alpha support for `--cgroups-per-qos` using the configured `--cgroup-driver`. Disabled by default.
```
Automatic merge from submit-queue
CRI: Handle empty container name in dockershim.
Fixes https://github.com/kubernetes/kubernetes/issues/35924.
Dead container may have no name, we should handle this properly.
@yujuhong @bprashanth
Automatic merge from submit-queue
CRI: Add kuberuntime container logs
Based on https://github.com/kubernetes/kubernetes/pull/34858.
The first 2 commits are from #34858. And the last 2 commits are new.
This PR added kuberuntime container logs support and add unit test for it.
I've tested all the functions manually, and I'll send another PR to write a node e2e test for container log.
**_Notice: current implementation doesn't support log rotation_**, which means that:
- It will not retrieve logs in rotated log file.
- If log rotation happens when following the log:
- If the rotation is using create mode, we'll still follow the old file.
- If the rotation is using copytruncate, we'll be reading at the original position and get nothing.
To solve these issues, kubelet needs to rotate the log itself, or at least kubelet should be able to control the behavior of the log rotator. These are doable but out of the scope of 1.5 and will be addressed in a future release.
@yujuhong @feiskyer @yifan-gu
/cc @kubernetes/sig-node
Automatic merge from submit-queue
CRI: Rename container/sandbox states
The enum constants are not namespaced. The shorter, unspecific names are likely
to cause naming conflicts in the future.
Also replace "SandBox" with "Sandbox" in the API for consistency.
/cc @kubernetes/sig-node
This change adds a container manager inside the dockershim to move the docker daemon
and associated processes to a specified cgroup. The original kubelet container
manager will continue checking the name of the cgroup, so that kubelet knows how
to report runtime stats.
Automatic merge from submit-queue
Eviction manager evicts based on inode consumption
Fixes: #32526 Integrate Cadvisor per-container inode stats into the summary api. Make the eviction manager act based on inode consumption to evict pods using the most inodes.
This PR is pending on a cadvisor godeps update which will be included in PR #35136
Automatic merge from submit-queue
Only set sysctls for infra containers
We did set the sysctls for each container in a pod. This opens up a way to set un-whitelisted sysctls during upgrade from v1.3:
- set annotation in v1.3 with an un-whitelisted sysctl. Set restartPolicy=Always
- upgrade cluster to v1.4
- kill container process
- un-whitelisted sysctl is set on restart of the killed container.
Automatic merge from submit-queue
SELinux Overhaul
Overhauls handling of SELinux in Kubernetes. TLDR: Kubelet dir no longer has to be labeled `svirt_sandbox_file_t`.
Fixes #33351 and #33510. Implements #33951.
Automatic merge from submit-queue
Implement streaming CRI methods in dockershim
*NOTE: Temporarily includes commit from https://github.com/kubernetes/kubernetes/pull/35330 - only review the second commit.*
Builds on https://github.com/kubernetes/kubernetes/pull/35330, using the library to implement the streaming methods in various CRI shims.
This does not actually wire up the new streaming methods in the kubelet (that will be my next PR). Once the new methods are wired up, I will delete the `Legacy{Exec,Attach,PortForward}` methods.
/cc @kubernetes/sig-node @feiskyer
Automatic merge from submit-queue
Simplify negotiation in server in preparation for multi version support
This is a pre-factor for #33900 to simplify runtime.NegotiatedSerializer, tighten up a few abstractions that may break when clients can request different client versions, and pave the way for better negotiation.
View this as pure simplification.
Automatic merge from submit-queue
Fix cadvisor_unsupported and the crossbuild
Resolves a bug in the `cadvisor_unsupported.go` code.
Fixes https://github.com/kubernetes/kubernetes/issues/35735
Introduced by: https://github.com/kubernetes/kubernetes/pull/35136
We should consider cherry-picking this, as #35136 was also cherry-picked.
cc @kubernetes/sig-testing @vishh @dashpole @jessfraz
```release-note
Fix cadvisor_unsupported and the crossbuild
```
Automatic merge from submit-queue
[PHASE 1] Opaque integer resource accounting.
## [PHASE 1] Opaque integer resource accounting.
This change provides a simple way to advertise some amount of arbitrary countable resource for a node in a Kubernetes cluster. Users can consume these resources by including them in pod specs, and the scheduler takes them into account when placing pods on nodes. See the example at the bottom of the PR description for more info.
Summary of changes:
- Defines opaque integer resources as any resource with prefix `pod.alpha.kubernetes.io/opaque-int-resource-`.
- Prevent kubelet from overwriting capacity.
- Handle opaque resources in scheduler.
- Validate integer-ness of opaque int quantities in API server.
- Tests for above.
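The naming rule and the scheduler-side accounting above can be sketched in a few lines. The prefix string is taken from the summary of changes; `isOpaqueIntResource` and `fits` are hypothetical helpers illustrating the idea that a request fits only if it keeps total allocation within the advertised capacity.

```go
package main

import (
	"fmt"
	"strings"
)

// opaqueIntResourcePrefix is the prefix given in the summary of changes.
const opaqueIntResourcePrefix = "pod.alpha.kubernetes.io/opaque-int-resource-"

// isOpaqueIntResource reports whether a resource name is an opaque integer resource.
func isOpaqueIntResource(name string) bool {
	return strings.HasPrefix(name, opaqueIntResourcePrefix)
}

// fits is a hypothetical sketch of the scheduler check: a pod's request fits
// on a node if, added to what is already allocated, it stays within capacity.
func fits(capacity, allocated, request int64) bool {
	return allocated+request <= capacity
}

func main() {
	name := opaqueIntResourcePrefix + "bananas"
	fmt.Println(isOpaqueIntResource(name), isOpaqueIntResource("cpu"))
	fmt.Println(fits(555, 500, 55), fits(555, 500, 56))
}
```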
Feature issue: https://github.com/kubernetes/features/issues/76
Design: http://goo.gl/IoKYP1
Issues:
kubernetes/kubernetes#28312
kubernetes/kubernetes#19082
Related:
kubernetes/kubernetes#19080
CC @davidopp @timothysc @balajismaniam
**Release note**:
```release-note
Added support for accounting opaque integer resources.
Allows cluster operators to advertise new node-level resources that would be
otherwise unknown to Kubernetes. Users can consume these resources in pod
specs just like CPU and memory. The scheduler takes care of the resource
accounting so that no more than the available amount is simultaneously
allocated to pods.
```
## Usage example
```sh
$ echo '[{"op": "add", "path": "/status/capacity/pod.alpha.kubernetes.io~1opaque-int-resource-bananas", "value": "555"}]' | \
> http PATCH http://localhost:8080/api/v1/nodes/localhost.localdomain/status \
> Content-Type:application/json-patch+json
```
```http
HTTP/1.1 200 OK
Content-Type: application/json
Date: Thu, 11 Aug 2016 16:44:55 GMT
Transfer-Encoding: chunked
{
"apiVersion": "v1",
"kind": "Node",
"metadata": {
"annotations": {
"volumes.kubernetes.io/controller-managed-attach-detach": "true"
},
"creationTimestamp": "2016-07-12T04:07:43Z",
"labels": {
"beta.kubernetes.io/arch": "amd64",
"beta.kubernetes.io/os": "linux",
"kubernetes.io/hostname": "localhost.localdomain"
},
"name": "localhost.localdomain",
"resourceVersion": "12837",
"selfLink": "/api/v1/nodes/localhost.localdomain/status",
"uid": "2ee9ea1c-47e6-11e6-9fb4-525400659b2e"
},
"spec": {
"externalID": "localhost.localdomain"
},
"status": {
"addresses": [
{
"address": "10.0.2.15",
"type": "LegacyHostIP"
},
{
"address": "10.0.2.15",
"type": "InternalIP"
}
],
"allocatable": {
"alpha.kubernetes.io/nvidia-gpu": "0",
"cpu": "2",
"memory": "8175808Ki",
"pods": "110"
},
"capacity": {
"alpha.kubernetes.io/nvidia-gpu": "0",
"pod.alpha.kubernetes.io/opaque-int-resource-bananas": "555",
"cpu": "2",
"memory": "8175808Ki",
"pods": "110"
},
"conditions": [
{
"lastHeartbeatTime": "2016-08-11T16:44:47Z",
"lastTransitionTime": "2016-07-12T04:07:43Z",
"message": "kubelet has sufficient disk space available",
"reason": "KubeletHasSufficientDisk",
"status": "False",
"type": "OutOfDisk"
},
{
"lastHeartbeatTime": "2016-08-11T16:44:47Z",
"lastTransitionTime": "2016-07-12T04:07:43Z",
"message": "kubelet has sufficient memory available",
"reason": "KubeletHasSufficientMemory",
"status": "False",
"type": "MemoryPressure"
},
{
"lastHeartbeatTime": "2016-08-11T16:44:47Z",
"lastTransitionTime": "2016-08-10T06:27:11Z",
"message": "kubelet is posting ready status",
"reason": "KubeletReady",
"status": "True",
"type": "Ready"
},
{
"lastHeartbeatTime": "2016-08-11T16:44:47Z",
"lastTransitionTime": "2016-08-10T06:27:01Z",
"message": "kubelet has no disk pressure",
"reason": "KubeletHasNoDiskPressure",
"status": "False",
"type": "DiskPressure"
}
],
"daemonEndpoints": {
"kubeletEndpoint": {
"Port": 10250
}
},
"images": [],
"nodeInfo": {
"architecture": "amd64",
"bootID": "1f7e95ca-a4c2-490e-8ca2-6621ae1eb5f0",
"containerRuntimeVersion": "docker://1.10.3",
"kernelVersion": "4.5.7-202.fc23.x86_64",
"kubeProxyVersion": "v1.3.0-alpha.4.4285+7e4b86c96110d3-dirty",
"kubeletVersion": "v1.3.0-alpha.4.4285+7e4b86c96110d3-dirty",
"machineID": "cac4063395254bc89d06af5d05322453",
"operatingSystem": "linux",
"osImage": "Fedora 23 (Cloud Edition)",
"systemUUID": "D6EE0782-5DEB-4465-B35D-E54190C5EE96"
}
}
}
```
After patching, the kubelet's next sync fills in allocatable:
```
$ kubectl get node localhost.localdomain -o json | jq .status.allocatable
```
```json
{
"alpha.kubernetes.io/nvidia-gpu": "0",
"pod.alpha.kubernetes.io/opaque-int-resource-bananas": "555",
"cpu": "2",
"memory": "8175808Ki",
"pods": "110"
}
```
Create two pods, one that needs a single banana and another that needs a truckload (10Ki):
```
$ kubectl create -f chimp.yaml
$ kubectl create -f superchimp.yaml
```
Inspect the scheduler result and pod status:
```
$ kubectl describe pods chimp
Name: chimp
Namespace: default
Node: localhost.localdomain/10.0.2.15
Start Time: Thu, 11 Aug 2016 19:58:46 +0000
Labels: <none>
Status: Running
IP: 172.17.0.2
Controllers: <none>
Containers:
nginx:
Container ID: docker://46ff268f2f9217c59cc49f97cc4f0f085d5ac0e251f508cc08938601117c0cec
Image: nginx:1.10
Image ID: docker://sha256:82e97a2b0390a20107ab1310dea17f539ff6034438099384998fd91fc540b128
Port: 80/TCP
Limits:
cpu: 500m
memory: 64Mi
pod.alpha.kubernetes.io/opaque-int-resource-bananas: 3
Requests:
cpu: 250m
memory: 32Mi
pod.alpha.kubernetes.io/opaque-int-resource-bananas: 1
State: Running
Started: Thu, 11 Aug 2016 19:58:51 +0000
Ready: True
Restart Count: 0
Volume Mounts: <none>
Environment Variables: <none>
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
No volumes.
QoS Class: Burstable
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
9m 9m 1 {default-scheduler } Normal Scheduled Successfully assigned chimp to localhost.localdomain
9m 9m 2 {kubelet localhost.localdomain} Warning MissingClusterDNS kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to DNSDefault policy.
9m 9m 1 {kubelet localhost.localdomain} spec.containers{nginx} Normal Pulled Container image "nginx:1.10" already present on machine
9m 9m 1 {kubelet localhost.localdomain} spec.containers{nginx} Normal Created Created container with docker id 46ff268f2f92
9m 9m 1 {kubelet localhost.localdomain} spec.containers{nginx} Normal Started Started container with docker id 46ff268f2f92
```
```
$ kubectl describe pods superchimp
Name: superchimp
Namespace: default
Node: /
Labels: <none>
Status: Pending
IP:
Controllers: <none>
Containers:
nginx:
Image: nginx:1.10
Port: 80/TCP
Requests:
cpu: 250m
memory: 32Mi
pod.alpha.kubernetes.io/opaque-int-resource-bananas: 10Ki
Volume Mounts: <none>
Environment Variables: <none>
Conditions:
Type Status
PodScheduled False
No volumes.
QoS Class: Burstable
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
3m 1s 15 {default-scheduler } Warning FailedScheduling pod (superchimp) failed to fit in any node
fit failure on node (localhost.localdomain): Insufficient pod.alpha.kubernetes.io/opaque-int-resource-bananas
```
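The scheduler's decision above (chimp fits, superchimp gets `Insufficient ...bananas`) can be sketched as a simple per-resource fit check. This is a simplified illustration, not the scheduler predicate itself: `insufficientResources` is a hypothetical name, all quantities are plain integers, and a resource the node does not advertise is treated as having zero allocatable.

```go
package main

import (
	"fmt"
	"sort"
)

// insufficientResources returns the names of resources for which the pod's
// request, added to what is already requested on the node, would exceed the
// node's allocatable amount.
func insufficientResources(allocatable, requested, pod map[string]int64) []string {
	var missing []string
	for name, want := range pod {
		// Missing map keys read as zero, so unadvertised resources never fit.
		if requested[name]+want > allocatable[name] {
			missing = append(missing, name)
		}
	}
	sort.Strings(missing) // deterministic order for reporting
	return missing
}

func main() {
	const bananas = "pod.alpha.kubernetes.io/opaque-int-resource-bananas"
	allocatable := map[string]int64{bananas: 555, "cpu": 2000}    // cpu in millicores
	requested := map[string]int64{bananas: 1, "cpu": 250}         // chimp is already running
	superchimp := map[string]int64{bananas: 10 * 1024, "cpu": 250} // 10Ki bananas
	for _, name := range insufficientResources(allocatable, requested, superchimp) {
		fmt.Println("Insufficient", name)
	}
}
```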
- Prevents kubelet from overwriting capacity during sync.
- Handles opaque integer resources in the scheduler.
- Adds scheduler predicate tests for opaque resources.
- Validates opaque int resources:
- Ensures supplied opaque int quantities in node capacity,
node allocatable, pod request and pod limit are integers.
- Adds tests for new validation logic (node update and pod spec).
- Added e2e tests for opaque integer resources.
Alter how runtime.SerializeInfo is represented to simplify negotiation
and reduce the need to allocate during negotiation. Simplify the dynamic
client's logic around negotiating type. Add more tests for media type
handling where necessary.
Automatic merge from submit-queue
First pass at CRI stream server library implementation
This is a first pass at implementing a library for serving attach/exec/portforward calls from a CRI shim process as discussed in [CRI Streaming Requests](https://docs.google.com/document/d/1OE_QoInPlVCK9rMAx9aybRmgFiVjHpJCHI9LrfdNM_s/edit#).
Remaining library work:
- implement authn/z
- implement `stayUp=false`, a.k.a. auto-stop the server once all connections are closed
/cc @kubernetes/sig-node
Automatic merge from submit-queue
Add sysctls for dockershim
This PR adds sysctl support for dockershim. All sysctl e2e tests pass in my local setup.
Note that sysctls runtimeAdmit is not included in this PR, it is addressed in #32803.
cc/ @yujuhong @Random-Liu
Automatic merge from submit-queue
Fix devices information struct in container
So far, nothing uses the `Devices` field in `RunContainerOptions`. When I tried to use it, I found it would be better to change its type, because devices in a container look like this:
```json
"Devices": [
{
"PathOnHost": "/dev/nvidiactl",
"PathInContainer": "/dev/nvidiactl",
"CgroupPermissions": "mrw"
},
{
"PathOnHost": "/dev/nvidia-uvm",
"PathInContainer": "/dev/nvidia-uvm",
"CgroupPermissions": "mrw"
},
{
"PathOnHost": "/dev/nvidia0",
"PathInContainer": "/dev/nvidia0",
"CgroupPermissions": "mrw"
}
],
```
Automatic merge from submit-queue
CRI: Instrumented cri service
For https://github.com/kubernetes/kubernetes/issues/29478.
This PR adds an instrumented CRI service. Because the instrumented wrapper lives inside kuberuntime, it should work for both gRPC and non-gRPC integration.
This will be useful to compare latency difference between grpc and non-grpc integration, although there shouldn't be too much difference.
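The reason the wrapper works for both integrations is that it decorates the runtime interface rather than the transport. A minimal sketch of that decorator pattern, assuming a hypothetical one-method `RuntimeService` and recording latencies in memory instead of exporting metrics:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// RuntimeService is a hypothetical one-method stand-in for the CRI runtime interface.
type RuntimeService interface {
	Version(apiVersion string) (string, error)
}

// instrumentedRuntimeService wraps any RuntimeService (gRPC client or
// in-process) and records the latency of every call per operation.
type instrumentedRuntimeService struct {
	service RuntimeService
	mu      sync.Mutex
	latency map[string][]time.Duration
}

func newInstrumented(s RuntimeService) *instrumentedRuntimeService {
	return &instrumentedRuntimeService{service: s, latency: map[string][]time.Duration{}}
}

func (in *instrumentedRuntimeService) record(op string, start time.Time) {
	in.mu.Lock()
	defer in.mu.Unlock()
	in.latency[op] = append(in.latency[op], time.Since(start))
}

func (in *instrumentedRuntimeService) Version(apiVersion string) (string, error) {
	// time.Now() is evaluated when the defer is registered, i.e. at call start.
	defer in.record("version", time.Now())
	return in.service.Version(apiVersion)
}

type fakeRuntime struct{}

func (fakeRuntime) Version(string) (string, error) { return "0.1.0", nil }

func main() {
	in := newInstrumented(fakeRuntime{})
	v, _ := in.Version("v1alpha1")
	fmt.Println(v, len(in.latency["version"]))
}
```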
@yujuhong @feiskyer
/cc @kubernetes/sig-node
Automatic merge from submit-queue
Refactor PortForward server methods into the portforward package
Refactor PortForward code into its own package so it can be reused in the CRI streaming library without pulling in lots of extra dependencies.
This is a straightforward move. Nothing is changed other than a few references to the package.
Automatic merge from submit-queue
Fix volume states out of sync problem after kubelet restarts
When kubelet restarts, all information about volumes is lost from the
actual/desired states. When kubelet then updates the node status with
mounted volumes, the list may be empty even though volumes are still
mounted, which in turn causes the master to detach those volumes because
they are not in the mounted volumes list. This fix makes sure the mounted
volumes list is only updated after the reconciler has started the state
sync process. That process scans the existing volume directories and
reconstructs the actual state for any volumes missing from it.
This PR also fixes a problem with cleaning up orphaned pods' directories.
If a pod directory has been unmounted but not yet deleted (e.g., because
kubelet restarted in the middle of cleanup), the cleanup routine now
deletes the directory so that the pod directory can be cleaned up (it is
safe to delete the directory since it is no longer mounted).
The third issue this PR fixes is that, when reconstructing a volume in
the actual state, the mounter must not be nil, since it is required for
creating container.VolumeMap. If it is nil, it can cause a nil pointer
dereference in kubelet.
Detailed design proposal is #33203
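The first fix, not reporting mounted volumes until state reconstruction has begun, amounts to a gate on the node status update. A minimal sketch with hypothetical names (`volumeManager`, `statesSynced`, `volumesInUseForNodeStatus`), not the kubelet's actual types:

```go
package main

import "fmt"

// volumeManager is a hypothetical stand-in exposing only what this sketch needs.
type volumeManager struct {
	statesSynced bool     // set once the reconciler has started the state sync
	mounted      []string // volumes currently known to be mounted
}

// volumesInUseForNodeStatus returns the list to report and whether the node
// status field should be written at all. Before the sync, an empty list may
// only mean "not reconstructed yet", and reporting it could make the master
// detach volumes that are still mounted.
func (vm *volumeManager) volumesInUseForNodeStatus() ([]string, bool) {
	if !vm.statesSynced {
		return nil, false
	}
	return vm.mounted, true
}

func main() {
	vm := &volumeManager{}
	_, ok := vm.volumesInUseForNodeStatus()
	fmt.Println("update before sync:", ok)
	vm.statesSynced, vm.mounted = true, []string{"kubernetes.io/nfs/myvol"}
	vols, ok := vm.volumesInUseForNodeStatus()
	fmt.Println("update after sync:", ok, vols)
}
```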
Automatic merge from submit-queue
CRI: Add dockershim grpc server.
This PR adds an in-process gRPC server for dockershim.
Flags change:
1. `container-runtime` will not be automatically set to remote when `container-runtime-endpoint` is set. @feiskyer
2. set kubelet flag `--experimental-runtime-integration-type=remote --container-runtime-endpoint=UNIX_SOCKET_FILE_PATH` to enable the in-process dockershim grpc server.
3. set node e2e test flag `--runtime-integration-type=remote --container-runtime-endpoint=UNIX_SOCKET_FILE_PATH` to run node e2e test against in-process dockershim grpc server.
I've run the node e2e tests against the remote CRI integration; tests that don't rely on the streaming and log functions pass.
This unblocks the following work:
1) CRI conformance test.
2) Performance comparison between in-process integration and in-process grpc integration.
@yujuhong @feiskyer
/cc @kubernetes/sig-node
Automatic merge from submit-queue
Do not log stack trace for the error http.StatusBadRequest (400).
**What this PR does / why we need it**:
This PR fixes an issue where a stack trace is logged in the kubelet when a request fails with http.StatusBadRequest (400).
**Which issue this PR fixes**: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue
Use the rawTerminal setting from the container itself
**What this PR does / why we need it**:
Checks whether the container is set up for a rawTerminal connection and uses the matching connection type.
This prevents the output `Error from server: Unrecognized input header` when running `kubectl run`.
**Which issue this PR fixes**:
helps with case 1 in #28695, resolves #30159
**Special notes for your reviewer**:
**Release note**:
```
release-note-none
```
Automatic merge from submit-queue
CRI: Refactor kuberuntime unit test
Based on https://github.com/kubernetes/kubernetes/pull/34858
This PR:
1) Refactor the fake runtime service and some kuberuntime unit test.
2) Add better garbage collection unit test.
3) Fix init container unit test which isn't testing correctly. Some other unit tests may also need to be fixed.
4) Add pod log directory garbage collection unit test.
@feiskyer @yujuhong
/cc @kubernetes/sig-node
Automatic merge from submit-queue
Kubelet getting node from apiserver cache before update.
This is blocked on #35218 (however it's ready for review).
It seems to visibly reduce apiserver load in the metrics (and I didn't observe a higher number of conflicts, even in a 2000-node kubemark run).
Automatic merge from submit-queue
Create restclient interface
Refactors code so that *restclient.RESTClient can be replaced with any RESTClient implementation that satisfies the restclient.RESTClientInterface interface.
Automatic merge from submit-queue
CRI: Handle container/sandbox restarts for pod with RestartPolicy == …
If all sandbox and containers are dead in a pod, and the restart policy is
"Never", kubelet should not try to recreate all of them.
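The decision can be sketched as a small function over the restart policy. This is a simplified, hypothetical illustration (`shouldRecreatePod`, with `attempt` counting earlier sandbox creations), not kuberuntime's actual logic, which has more inputs:

```go
package main

import "fmt"

// shouldRecreatePod decides whether to recreate a pod whose sandbox and
// containers are all dead. attempt counts previous sandbox creations, and
// allSucceeded reports whether every container exited successfully.
func shouldRecreatePod(policy string, attempt int, allSucceeded bool) bool {
	switch {
	case attempt == 0:
		return true // the pod has never started: always create it
	case policy == "Never":
		return false // everything is dead and must stay dead
	case policy == "OnFailure":
		return !allSucceeded
	default: // "Always"
		return true
	}
}

func main() {
	fmt.Println(shouldRecreatePod("Never", 1, false))
	fmt.Println(shouldRecreatePod("Always", 1, true))
}
```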
Automatic merge from submit-queue
Return an empty network namespace path for exited infra containers
If the infra container has already terminated, `docker inspect` will report
pid 0. The path constructed using the pid to check the network namespace of
the process will be invalid. This commit changes docker to report an empty
path to stop kubenet from erroring out whenever TearDown is called on an
exited infra container.
This is not a fix for all the plugins, as some plugins may require the actual
network namespace to tear down properly.
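A minimal sketch of the change, assuming the namespace path follows the usual `/proc/<pid>/ns/net` layout; `getNetworkNamespace` is written here for illustration:

```go
package main

import "fmt"

// dockerNetnsFmt is the procfs path of a process's network namespace.
const dockerNetnsFmt = "/proc/%v/ns/net"

// getNetworkNamespace returns the netns path for a container's pid, or ""
// when docker inspect reports pid 0 (the container has exited), since
// /proc/0/ns/net does not exist and would make teardown fail.
func getNetworkNamespace(pid int) string {
	if pid == 0 {
		return ""
	}
	return fmt.Sprintf(dockerNetnsFmt, pid)
}

func main() {
	fmt.Printf("%q %q\n", getNetworkNamespace(0), getNetworkNamespace(4242))
}
```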
Automatic merge from submit-queue
rkt: Convert image name to be a valid ACIdentifier
**Release note**:
<!-- Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access)
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`.
-->
```release-note
Fix a bug in the rkt runtime whereby images from image registries with ports could not be fetched
```
This fixes a bug whereby an image reference that included a port was not
recognized after being downloaded, and so could not be run.
This is the quick-and-simple fix. In the longer term, we'll want to refactor image logic a bit more to handle the many special cases that the current code does not, mostly related to library images on dockerhub.
/cc @yifan-gu @kubernetes/sig-rktnetes
Automatic merge from submit-queue
WIP: Remove the legacy networking mode
**What this PR does / why we need it**:
Removes the deprecated configure-cbr0 flag and networking mode to avoid having untested and maybe unstable code in kubelet, see: #33789
**Which issue this PR fixes**:
fixes #30589, fixes #31937
**Special notes for your reviewer**: There are a lot of deployments that rely on this networking mode. I'm not sure how we should deal with that: force a switch to kubenet, or just delete the old deployment?
But please review the code changes first (the first commit)
**Release note**:
```release-note
Removed the deprecated kubelet --configure-cbr0 flag, and with that the "classic" networking mode as well
```
PTAL @kubernetes/sig-network @kubernetes/sig-node @mikedanese
Automatic merge from submit-queue
Remove static kubelet client, refactor ConnectionInfoGetter
Follow up to https://github.com/kubernetes/kubernetes/pull/33718
* Collapses the multi-valued return to a `ConnectionInfo` struct
* Removes the "raw" connection info method and interface, since it was only used in a single non-test location (by the "real" connection info method)
* Disentangles the node REST object from being a ConnectionInfoProvider itself by extracting an implementation of ConnectionInfoProvider that takes a node (using a provided NodeGetter) and determines ConnectionInfo
* Plumbs the KubeletClientConfig to the point where we construct the helper object that combines the config and the node lookup. I anticipate adding a preference order for choosing an address type in https://github.com/kubernetes/kubernetes/pull/34259
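The shape of the refactor (a struct return value, and a getter that combines client config with a node lookup) can be sketched as below. All types here are hypothetical and heavily trimmed; the real `ConnectionInfo` may carry more fields, such as a transport.

```go
package main

import (
	"fmt"
	"strconv"
)

// ConnectionInfo replaces a multi-valued return.
type ConnectionInfo struct {
	Scheme   string
	Hostname string
	Port     string
}

// Node is a hypothetical, trimmed node record.
type Node struct {
	Name        string
	KubeletPort int32 // 0 if the node reports no kubelet endpoint
}

// NodeGetter looks up nodes by name.
type NodeGetter interface {
	Get(name string) (*Node, error)
}

// NodeConnectionInfoGetter combines static client settings with a node
// lookup, so the node REST object no longer has to implement the interface.
type NodeConnectionInfoGetter struct {
	Nodes       NodeGetter
	Scheme      string // e.g. "https" when TLS is configured
	DefaultPort int32  // used when the node reports no port
}

func (g *NodeConnectionInfoGetter) GetConnectionInfo(nodeName string) (*ConnectionInfo, error) {
	node, err := g.Nodes.Get(nodeName)
	if err != nil {
		return nil, err
	}
	port := g.DefaultPort
	if node.KubeletPort > 0 {
		port = node.KubeletPort // prefer the port from the node record
	}
	return &ConnectionInfo{Scheme: g.Scheme, Hostname: node.Name, Port: strconv.Itoa(int(port))}, nil
}

// mapNodes is an in-memory NodeGetter for the example.
type mapNodes map[string]*Node

func (m mapNodes) Get(name string) (*Node, error) {
	if n, ok := m[name]; ok {
		return n, nil
	}
	return nil, fmt.Errorf("node %q not found", name)
}

func main() {
	g := &NodeConnectionInfoGetter{
		Nodes:       mapNodes{"localhost.localdomain": {Name: "localhost.localdomain", KubeletPort: 10250}},
		Scheme:      "https",
		DefaultPort: 10250,
	}
	info, err := g.GetConnectionInfo("localhost.localdomain")
	fmt.Println(info, err)
}
```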
Automatic merge from submit-queue
kubelet: storage: don't hang kubelet on unresponsive nfs
Fixes #31272
Currently, due to the nature of nfs, an unresponsive nfs volume in a pod can wedge the kubelet such that additional pods can not be run.
The discussion thus far surrounding this issue was to wrap the `lstat`, the syscall that ends up hanging in uninterruptible sleep, in a goroutine and limiting the number of goroutines that hang to one per-pod per-volume.
However, in my investigation, I found that the callsites that request a listing of the volumes from a particular volume plugin directory don't care anything about the properties provided by the `lstat` call. They only care about whether or not a directory exists.
Given that constraint, this PR just avoids the `lstat` call by using `Readdirnames()` instead of `ReadDir()` or `ReadDirNoExit()`
### More detail for reviewers
Consider a pod's nfs volume mounted at `/var/lib/kubelet/pods/881341b5-9551-11e6-af4c-fa163e815edd/volumes/kubernetes.io~nfs/myvol`. The kubelet wedges because when we do a `ReadDir()` or `ReadDirNoExit()`, it calls `syscall.Lstat` on `myvol`, which requires communication with the nfs server. If the nfs server is unreachable, this call hangs forever.
However, our code only cares about the names of the files/directories contained in the `kubernetes.io~nfs` directory, not any of the more detailed information the `Lstat` call provides. Getting the names can be done with `Readdirnames()`, which doesn't need to involve the nfs server.
@pmorie @eparis @ncdc @derekwaynecarr @saad-ali @thockin @vishh @kubernetes/rh-cluster-infra
Automatic merge from submit-queue
Fix edge case in qos evaluation
If a pod has a container C1 and C2, where sum(C1.requests, C2.requests) equals (C1.Limits), the code was reporting that the pod had "Guaranteed" qos, when it should have been Burstable.
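The fix comes down to checking each container individually instead of comparing pod-wide sums. A simplified sketch, assuming the rule "every container sets cpu and memory limits, and any request equals its limit" (unset requests are treated as defaulting to the limit); `container` and `isGuaranteed` are hypothetical names:

```go
package main

import "fmt"

type container struct {
	requests, limits map[string]int64
}

// isGuaranteed checks every container individually. Comparing sums across
// containers is what misclassified the C1/C2 case described above.
func isGuaranteed(containers []container) bool {
	if len(containers) == 0 {
		return false
	}
	for _, c := range containers {
		for _, r := range []string{"cpu", "memory"} {
			limit, ok := c.limits[r]
			if !ok {
				return false // a container without a limit cannot be Guaranteed
			}
			if req, ok := c.requests[r]; ok && req != limit {
				return false
			}
		}
	}
	return true
}

func main() {
	c1 := container{requests: map[string]int64{"cpu": 1, "memory": 1}, limits: map[string]int64{"cpu": 2, "memory": 2}}
	c2 := container{requests: map[string]int64{"cpu": 1, "memory": 1}}
	fmt.Println(isGuaranteed([]container{c1, c2})) // sum(requests) == C1.limits, still not Guaranteed
}
```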
/cc @vishh @dchen1107
Automatic merge from submit-queue
Log more information on pod status updates
Also bump the logging level to V2 so that we can see them in a non-test
cluster.
Automatic merge from submit-queue
add UpdateRuntimeConfig interface
Expose UpdateRuntimeConfig interface in RuntimeService for kubelet to pass a set of configurations to runtime. Currently it only takes PodCIDR.
The use case is for kubelet to pass configs to the runtime. Kubelet holds some config/information that the runtime does not have, such as PodCIDR. I expect some kubelet configuration will gradually move to the runtime, but I believe cases like PodCIDR, which is dynamically assigned by the k8s master, need to stay for a while.
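A sketch of how a runtime might consume such a call, with hypothetical `RuntimeConfig`/`NetworkConfig` types and a fake runtime that validates and stores the PodCIDR; the real CRI message and field names may differ.

```go
package main

import (
	"fmt"
	"net"
)

// NetworkConfig and RuntimeConfig are hypothetical shapes for the config.
type NetworkConfig struct {
	PodCIDR string
}

type RuntimeConfig struct {
	NetworkConfig *NetworkConfig
}

// fakeRuntime applies the config kubelet passes down.
type fakeRuntime struct {
	podCIDR string
}

func (f *fakeRuntime) UpdateRuntimeConfig(cfg *RuntimeConfig) error {
	if cfg == nil || cfg.NetworkConfig == nil {
		return nil // nothing to apply
	}
	if _, _, err := net.ParseCIDR(cfg.NetworkConfig.PodCIDR); err != nil {
		return err
	}
	f.podCIDR = cfg.NetworkConfig.PodCIDR
	return nil
}

func main() {
	rt := &fakeRuntime{}
	err := rt.UpdateRuntimeConfig(&RuntimeConfig{NetworkConfig: &NetworkConfig{PodCIDR: "10.244.1.0/24"}})
	fmt.Println(rt.podCIDR, err)
}
```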
Automatic merge from submit-queue
Allow kuberuntime to get network namespace for not ready sandboxes
Kubelet calls TearDownPod to clean up the network resources for a pod sandbox.
TearDownPod relies on GetNetNS to retrieve network namespace, and the current
implementation makes this impossible for not-ready sandboxes. This change
removes the unnecessary filter to fix this issue.
Automatic merge from submit-queue
CRI: Image pullable support in dockershim
For #33189.
The new test `ImageID should be set to the manifest digest (from RepoDigests) when available` introduced in #33014 is failing, because:
1) `docker-pullable://` conversion is not supported in dockershim;
2) `kuberuntime` and `dockershim` use `ListImages` with an image name filter to check whether an image is present. However, `ListImages` doesn't support filtering by `digest`.
This PR:
1) Change `kuberuntime.IsImagePresent` to use `runtime.ImageStatus` and `dockershim.InspectImage` instead. ***Notice an API change: `ImageStatus` should return `(nil, nil)` for non-existing image.***
2) Add `docker-pullable://` support.
3) Fix `RemoveImage` in dockershim https://github.com/kubernetes/kubernetes/pull/29316.
I've tested this locally, and the test now passes.
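With the API change that `ImageStatus` returns `(nil, nil)` for a missing image, a presence check reduces to a nil test, and it works for digest references that a name filter cannot handle. A sketch with hypothetical `ImageService`/`Image` types:

```go
package main

import "fmt"

type Image struct {
	ID string
}

// ImageService is a hypothetical trimmed interface. ImageStatus returns
// (nil, nil) when the image does not exist.
type ImageService interface {
	ImageStatus(ref string) (*Image, error)
}

func isImagePresent(s ImageService, ref string) (bool, error) {
	img, err := s.ImageStatus(ref)
	if err != nil {
		return false, err
	}
	return img != nil, nil
}

// fakeImages returns nil for unknown refs, matching the contract above.
type fakeImages map[string]*Image

func (f fakeImages) ImageStatus(ref string) (*Image, error) { return f[ref], nil }

func main() {
	imgs := fakeImages{"nginx@sha256:abc": {ID: "sha256:123"}}
	fmt.Println(isImagePresent(imgs, "nginx@sha256:abc"))
}
```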
@yujuhong @feiskyer @yifan-gu
/cc @kubernetes/sig-node
Automatic merge from submit-queue
Add version cache for cri APIVersion
ref https://github.com/kubernetes/kubernetes/issues/29478
1. Added a version cache for `APIVersion()` using the object cache, with a TTL of 1 minute
2. Leaving `Version()` as it is today
Automatic merge from submit-queue
Update godeps for libcontainer+cadvisor
Needed to unblock more progress on pod cgroup.
/cc @vishh @dchen1107 @timstclair
Automatic merge from submit-queue
Kubelet: Use RepoDigest for ImageID when available
```release-note
Use manifest digest (as `docker-pullable://`) as ImageID when available (exposes a canonical, pullable image ID for containers).
```
Previously, we used the docker config digest (also called "image ID"
by Docker) for the value of the `ImageID` field in the container status.
This was not particularly useful, since the config manifest is not
what's used to identify the image in a registry, which uses the manifest
digest instead. Docker 1.12+ always populates the RepoDigests field
with the manifest digests, and Docker 1.10 and 1.11 populate it when
images are pulled by digest.
This commit changes `ImageID` to point to the manifest digest when
available, using the prefix `docker-pullable://` (instead of
`docker://`)
Related to #32159
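The selection rule can be sketched like this. It is a simplification: picking the first RepoDigests entry is an assumption made here, and `imageIDFor` is a hypothetical helper.

```go
package main

import "fmt"

const (
	dockerImageIDPrefix    = "docker://"
	dockerPullableIDPrefix = "docker-pullable://"
)

// imageIDFor prefers a manifest digest (RepoDigests entries look like
// "nginx@sha256:...") and falls back to the config digest.
func imageIDFor(configDigest string, repoDigests []string) string {
	if len(repoDigests) > 0 {
		return dockerPullableIDPrefix + repoDigests[0]
	}
	return dockerImageIDPrefix + configDigest
}

func main() {
	fmt.Println(imageIDFor("sha256:82e97a2b", []string{"nginx@sha256:abcd"}))
	fmt.Println(imageIDFor("sha256:82e97a2b", nil))
}
```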
Previously, the `InspectImage` method of the Docker interface expected a
"pullable" image ref (name, tag, or manifest digest). If you tried to
inspect an image by its ID (config digest), the inspect would fail to
validate the image against the input identifier. This commit changes
the original method to be named `InspectImageByRef`, and introduces a
new method called `InspectImageByID` which validates that the input
identifier was an image ID.
Automatic merge from submit-queue
Use nodeutil.GetHostIP consistently when talking to nodes
Most of our communications from apiserver -> nodes used
nodeutil.GetNodeHostIP, but a few places didn't - and this meant that the
node name needed to be resolvable _and_ we needed to populate valid IP
addresses.
```release-note
The apiserver now uses addresses reported by the kubelet in the Node object's status for apiserver->kubelet communications, rather than the name of the Node object. The address type used defaults to `InternalIP`, `ExternalIP`, and `LegacyHostIP` address types, in that order.
```
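The address preference in the release note (`InternalIP`, then `ExternalIP`, then `LegacyHostIP`) can be sketched as an ordered search. `NodeAddress` and `preferredNodeAddress` are hypothetical names for illustration:

```go
package main

import (
	"errors"
	"fmt"
)

type NodeAddress struct {
	Type    string
	Address string
}

// preferredAddressTypes is the order given in the release note.
var preferredAddressTypes = []string{"InternalIP", "ExternalIP", "LegacyHostIP"}

// preferredNodeAddress returns the first address of the most preferred type.
func preferredNodeAddress(addrs []NodeAddress) (string, error) {
	for _, t := range preferredAddressTypes {
		for _, a := range addrs {
			if a.Type == t {
				return a.Address, nil
			}
		}
	}
	return "", errors.New("no preferred addresses found")
}

func main() {
	addrs := []NodeAddress{{"LegacyHostIP", "10.0.2.15"}, {"InternalIP", "10.0.2.15"}}
	fmt.Println(preferredNodeAddress(addrs))
}
```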
Automatic merge from submit-queue
Add sandbox gc minage
Fixes https://github.com/kubernetes/kubernetes/issues/34272.
Fixes https://github.com/kubernetes/kubernetes/issues/33984.
This PR:
1) Change the `GetPodStatus` to get statuses of all containers in a pod instead of only containers belonging to existing sandboxes. This is because sandbox may be removed by GC or by users, kubelet should be able to deal with this case.
2) Change the CRI comment to clarify the timestamp unit (nanosecond).
3) Add MinAge for sandbox GC Policy.
@yujuhong @feiskyer @yifan-gu
/cc @kubernetes/sig-node
Automatic merge from submit-queue
remove testapi.Default.GroupVersion
I'm going to try to take this as a series of mechanicals. This removes `testapi.Default.GroupVersion()` and replaces it with `registered.GroupOrDie(api.GroupName).GroupVersion`.
@caesarxuchao I'm trying to see how much of `pkg/api/testapi` I can remove.
Automatic merge from submit-queue
Revert "Add kubelet awareness to taint tolerant match calculator."
Reverts kubernetes/kubernetes#26501
Original PR was not fully reviewed by @kubernetes/sig-node
cc/ @timothysc @resouer
Automatic merge from submit-queue
Kubelet: Use RepoDigest for ImageID when available
**Release note**:
```release-note
Use manifest digest (as `docker-pullable://`) as ImageID when available (exposes a canonical, pullable image ID for containers).
```
Previously, we used the docker config digest (also called "image ID"
by Docker) for the value of the `ImageID` field in the container status.
This was not particularly useful, since the config manifest is not
what's used to identify the image in a registry, which uses the manifest
digest instead. Docker 1.12+ always populates the RepoDigests field
with the manifest digests, and Docker 1.10 and 1.11 populate it when
images are pulled by digest.
This commit changes `ImageID` to point to the manifest digest when
available, using the prefix `docker-pullable://` (instead of
`docker://`)
Related to #32159
Automatic merge from submit-queue
Add kubelet awareness to taint tolerant match calculator.
Ref: #25320
This is required by `TaintEffectNoScheduleNoAdmit` and `TaintEffectNoScheduleNoAdmitNoExecute`, so that the node knows whether it should expect the taint and toleration.
Automatic merge from submit-queue
Refactor: separate KubeletClient & ConnectionInfoGetter concepts
KubeletClient implements ConnectionInfoGetter, but it is not a complete
implementation: it does not set the kubelet port from the node record,
for example.
By renaming the method so that it does not implement the interface, we
are able to cleanly see where the "raw" GetConnectionInfo is used (it is
correct) and also have go type-checking enforce this for us.
This is related to #25532; I wanted to satisfy myself that what we were doing there was correct, and I wanted also to ensure that the compiler could enforce this going forwards.
Automatic merge from submit-queue
Add node event for container/image GC failure
Follow up to #31988. Add an event for a node when container/image GC fails.
Automatic merge from submit-queue
Fix nil pointer issue when getting metrics from volume mounter
Currently it is possible that the mounter object stored in Mounted
Volume data structure in the actual state of kubelet volume manager is
nil if this information is recovered from state sync process. This will
cause nil pointer issue when calculating stats in volume_stat_calculator.
A quick fix is to not return the volume if its mounter is nil. A more
complete fix is to also recover mounter object when reconstructing the
volume data structure which will be addressed in PR #33616
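The quick fix can be sketched as a filter that skips volumes whose mounter is nil before stats are computed. Types here are hypothetical stand-ins:

```go
package main

import "fmt"

// Mounter is a hypothetical trimmed volume mounter interface.
type Mounter interface {
	GetPath() string
}

type pathMounter string

func (p pathMounter) GetPath() string { return string(p) }

type mountedVolume struct {
	name    string
	mounter Mounter // may be nil if recovered by the state sync process
}

// volumesWithMounter drops entries without a mounter, so stats code never
// dereferences a nil mounter.
func volumesWithMounter(vols []mountedVolume) []mountedVolume {
	var out []mountedVolume
	for _, v := range vols {
		if v.mounter == nil {
			continue
		}
		out = append(out, v)
	}
	return out
}

func main() {
	vols := []mountedVolume{{name: "a", mounter: pathMounter("/var/lib/a")}, {name: "b"}}
	for _, v := range volumesWithMounter(vols) {
		fmt.Println(v.name, v.mounter.GetPath())
	}
}
```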
Automatic merge from submit-queue
kubelet: eviction: avoid duplicate action on stale stats
Currently, the eviction code can be overly aggressive when synchronize() is called two (or more) times before a particular stat has been recollected by cadvisor. The eviction manager will take additional action based on information for which it has already taken actions.
This PR provides a method for the eviction manager to track the timestamp of the last observation and not take action if the stat has not been updated since the last time synchronize() was run.
@derekwaynecarr @vishh @kubernetes/rh-cluster-infra
Automatic merge from submit-queue
kubelet: eviction: allow minimum reclaim as percentage
Fixes #33354
xref #32537
**Release note**:
```release-note
The kubelet --eviction-minimum-reclaim option can now take percentages as well as absolute values for resource quantities
```
@derekwaynecarr @vishh @mtaufen
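One way to interpret a minimum-reclaim value that may be either a percentage or an absolute amount is sketched below. The function name is hypothetical, and for brevity it parses absolute values as plain byte counts rather than resource quantities such as `500Mi`, which the real flag accepts.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minReclaimBytes converts a minimum-reclaim value into bytes. A value ending
// in "%" is treated as a percentage of capacity; anything else is parsed as
// an absolute byte count (a simplification of resource quantity parsing).
func minReclaimBytes(value string, capacity int64) (int64, error) {
	if strings.HasSuffix(value, "%") {
		pct, err := strconv.ParseFloat(strings.TrimSuffix(value, "%"), 64)
		if err != nil {
			return 0, err
		}
		if pct < 0 || pct > 100 {
			return 0, fmt.Errorf("percentage out of range: %s", value)
		}
		return int64(float64(capacity) * pct / 100), nil
	}
	return strconv.ParseInt(value, 10, 64)
}

func main() {
	capacity := int64(10 << 30) // 10 GiB
	for _, v := range []string{"5%", "1073741824"} {
		b, err := minReclaimBytes(v, capacity)
		fmt.Println(v, b, err)
	}
}
```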
Previously, we used the docker config digest (also called "image ID"
by Docker) for the value of the `ImageID` field in the container status.
This was not particularly useful, since the config manifest is not
what's used to identify the image in a registry, which uses the manifest
digest instead. Docker 1.12+ always populates the RepoDigests field
with the manifest digests, and Docker 1.10 and 1.11 populate it when
images are pulled by digest.
This commit changes `ImageID` to point to the manifest digest when
available, using the prefix `docker-pullable://` (instead of
`docker://`).
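The selection rule can be sketched as follows. The `imageInspect` struct is a hypothetical subset of Docker's inspect result, and taking the first `RepoDigests` entry is an assumption; the commit does not say which entry is used when several exist.

```go
package main

import "fmt"

// imageInspect holds the two fields of a Docker image inspect result that
// matter here (a hypothetical subset, not the docker client type).
type imageInspect struct {
	ID          string   // config digest, e.g. "sha256:..."
	RepoDigests []string // manifest digests, e.g. "busybox@sha256:..."
}

// containerImageID prefers a manifest digest (pullable from a registry) and
// falls back to the config digest otherwise.
func containerImageID(img imageInspect) string {
	if len(img.RepoDigests) > 0 {
		return "docker-pullable://" + img.RepoDigests[0] // assumption: first entry
	}
	return "docker://" + img.ID
}

func main() {
	fmt.Println(containerImageID(imageInspect{ID: "sha256:1111"}))
	fmt.Println(containerImageID(imageInspect{
		ID:          "sha256:1111",
		RepoDigests: []string{"busybox@sha256:2222"},
	}))
}
```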
Automatic merge from submit-queue
CRI: Remove the mount name and port name.
Per discussion on https://github.com/kubernetes/kubernetes/issues/33873.
Currently the mount name is not being used and also involves some
incorrect usage (sometimes it's referencing a mount name, sometimes
it's referencing a volume name), so we decided to remove it from CRI.
The port name is also not used, so remove it as well.
Fix #33873, fix #33526
/cc @kubernetes/sig-node @kubernetes/sig-rktnetes
Automatic merge from submit-queue
CRI: Implement temporary ImageStats in kuberuntime_manager
For #33048 and #33189.
This PR:
1) Implement a temporary `ImageStats` in kuberuntime_manager.go.
2) Add a container name label to the infra container so that the current summary API logic works with dockershim.
I ran the summary API test locally and it passed. Note that the original summary API test does not show up on the CRI testgrid because it was removed yesterday. It will be added back in https://github.com/kubernetes/kubernetes/pull/33779.
@yujuhong @feiskyer
Previously, the `InspectImage` method of the Docker interface expected a
"pullable" image ref (name, tag, or manifest digest). If you tried to
inspect an image by its ID (config digest), the inspect would fail to
validate the image against the input identifier. This commit changes
the original method to be named `InspectImageByRef`, and introduces a
new method called `InspectImageByID` which validates that the input
identifier was an image ID.
Automatic merge from submit-queue
CRI: Enable custom infra container image
A minor fix to enable custom infra container image ref #29478
- Need to address:
Not sure how to deal with infra image credentials, so they are left as they are today. Should we allow users to specify credentials in the pod yaml?
Automatic merge from submit-queue
CRI: Add init containers
This PR adds init containers support in CRI.
CC @yujuhong @Random-Liu @yifan-gu
Also CC @kubernetes/sig-node @kubernetes/sig-rktnetes
Automatic merge from submit-queue
Kubelet: fix port forward for dockershim
This PR fixes port forward for dockershim and also adds a `kubecontainer.FormatPod`.
The `--ginkgo.focus=Port\sforwarding` tests have passed on a local cluster.
cc/ @Random-Liu @yujuhong
Automatic merge from submit-queue
Fix issue in updating device path when volume is attached multiple times
When volume is attached, it is possible that the actual state
already has this volume object (e.g., the volume is attached to multiple
nodes, or volume was detached and attached again). We need to update the
device path in such situations; otherwise, the device path would be stale
information and cause kubelet to mount the wrong device.
This PR partially fixes issue #29324
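The essence of the fix is to overwrite the device path when the volume entry already exists instead of returning early. The `actualStateOfWorld` and `markVolumeAsAttached` names below are hypothetical simplifications of the real cache.

```go
package main

import "fmt"

type attachedVolume struct {
	name       string
	devicePath string
}

// actualStateOfWorld is a hypothetical, simplified actual-state cache.
type actualStateOfWorld struct {
	volumes map[string]*attachedVolume
}

func newActualState() *actualStateOfWorld {
	return &actualStateOfWorld{volumes: map[string]*attachedVolume{}}
}

// markVolumeAsAttached records the device path for a volume. If the volume
// is already present (e.g. detached and attached again), the device path is
// updated rather than left stale.
func (s *actualStateOfWorld) markVolumeAsAttached(name, devicePath string) {
	if v, exists := s.volumes[name]; exists {
		v.devicePath = devicePath
		return
	}
	s.volumes[name] = &attachedVolume{name: name, devicePath: devicePath}
}

func main() {
	s := newActualState()
	s.markVolumeAsAttached("pd-1", "/dev/xvdf")
	s.markVolumeAsAttached("pd-1", "/dev/xvdg") // re-attached at a new device
	fmt.Println(s.volumes["pd-1"].devicePath)
}
```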
Automatic merge from submit-queue
Fix #33784, IN_CREATE event does not guarantee file content written
Fixed #33784.
The CREATE inotify event [here](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/config/file_linux_test.go#L275) is triggered by os.OpenFile(); however, the content is written by the subsequent f.Write(). The test fails if the program tries to process the event in between.
IN_CREATE is triggered by open(2), mkdir(2), link(2), symlink(2), and bind(2), but not all of them guarantee that the content has been written ([ref](http://man7.org/linux/man-pages/man7/inotify.7.html)). Hence we should not respond to IN_CREATE events for pod creation. I believe listening on IN_MODIFY and IN_MOVED_TO would be sufficient for pod addition and update.
I would like to see the Jenkins test results for further evaluation.
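The proposed event filtering can be sketched as a mask check. The constants use the Linux `<sys/inotify.h>` bit values; the `podAction` type and `classifyEvent` function are hypothetical, and deletion handling is deliberately left out.

```go
package main

import "fmt"

// Linux inotify mask bits, with the values from <sys/inotify.h>.
const (
	inModify  uint32 = 0x00000002
	inMovedTo uint32 = 0x00000080
	inCreate  uint32 = 0x00000100
)

type podAction int

const (
	ignoreEvent podAction = iota
	addOrUpdatePod
)

func (a podAction) String() string {
	if a == addOrUpdatePod {
		return "addOrUpdate"
	}
	return "ignore"
}

// classifyEvent treats IN_MODIFY and IN_MOVED_TO as pod add/update and
// ignores IN_CREATE, because the file may still be empty at that point.
func classifyEvent(mask uint32) podAction {
	if mask&(inModify|inMovedTo) != 0 {
		return addOrUpdatePod
	}
	return ignoreEvent
}

func main() {
	for _, m := range []uint32{inCreate, inModify, inMovedTo} {
		fmt.Printf("%#x -> %v\n", m, classifyEvent(m))
	}
}
```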
@Random-Liu
Automatic merge from submit-queue
Split NodeDiskPressure into NodeInodePressure and NodeDiskPressure
Added NodeInodePressure as a NodeConditionType. SignalImageFsInodesFree and SignalNodeFsInodesFree signal this pressure. Also added simple pieces to the scheduler predicates so that it takes InodePressure into account.
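A scheduler predicate of this kind checks the node's conditions and rejects the node while inode pressure is reported. The types below are simplified stand-ins for the API types, the function name is hypothetical, and the condition string `"InodePressure"` is an assumption about the constant's value.

```go
package main

import "fmt"

type NodeConditionType string
type ConditionStatus string

const (
	NodeInodePressure NodeConditionType = "InodePressure" // assumed value
	ConditionTrue     ConditionStatus   = "True"
)

type NodeCondition struct {
	Type   NodeConditionType
	Status ConditionStatus
}

// nodeFitsWithoutInodePressure returns false when the node reports
// NodeInodePressure with status True, so no new pods land there.
func nodeFitsWithoutInodePressure(conditions []NodeCondition) bool {
	for _, c := range conditions {
		if c.Type == NodeInodePressure && c.Status == ConditionTrue {
			return false
		}
	}
	return true
}

func main() {
	healthy := []NodeCondition{{Type: "Ready", Status: ConditionTrue}}
	pressured := append(healthy, NodeCondition{Type: NodeInodePressure, Status: ConditionTrue})
	fmt.Println(nodeFitsWithoutInodePressure(healthy))
	fmt.Println(nodeFitsWithoutInodePressure(pressured))
}
```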
Automatic merge from submit-queue
Add seccomp and apparmor support.
This PR adds seccomp and apparmor support in new CRI.
This is a WIP because I'm still adding unit tests for some of the functions. I sent this PR for design discussion.
This PR is similar to https://github.com/kubernetes/kubernetes/pull/33450.
The differences are:
* This PR passes seccomp and apparmor configuration via annotations;
* This PR keeps the seccomp handling logic in docker shim because current seccomp implementation is very docker specific, and @timstclair told me that even the json seccomp profile file is defined by docker.
Note that this PR passes the related annotations in `api.Pod` to the runtime almost directly instead of introducing new CRI annotations.
@yujuhong @feiskyer @timstclair
Automatic merge from submit-queue
CRI: Fix bug in dockershim to set sandbox id properly.
For https://github.com/kubernetes/kubernetes/issues/33189#issuecomment-249307796.
While debugging `Variable Expansion should allow composing env vars into new env vars`, I found that the root cause is that the sandbox was removed before all containers were deleted, which caused the pod to be started again after it succeeded.
This happened because the `PodSandboxID` field was not set. This PR fixes the bug.
Some other test flakes are also caused by this:
```
Downward API volume should provide node allocatable (cpu) as default cpu limit if the limit is not set
Downward API volume should provide container's memory limit
EmptyDir volumes should support (non-root,0666,tmpfs)
...
```
/cc @yujuhong @feiskyer
Automatic merge from submit-queue
Apply default image tags for all runtimes
Move the docker-specific logic up to the ImageManager to allow code sharing
among different implementations.
Part of #31459
/cc @kubernetes/sig-node
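Applying a default tag means appending `:latest` when an image reference has neither a tag nor a digest. The sketch below is a hand-rolled approximation (the real code relies on a proper image reference parser); it treats a `:` as a tag separator only when it appears after the last `/`, so a registry `host:port` is not mistaken for a tag.

```go
package main

import (
	"fmt"
	"strings"
)

// applyDefaultImageTag appends ":latest" when image has no tag or digest.
func applyDefaultImageTag(image string) string {
	if strings.Contains(image, "@") {
		return image // pinned by digest
	}
	// A ':' after the last '/' is a tag; one before it belongs to host:port.
	lastSlash := strings.LastIndex(image, "/")
	if strings.Contains(image[lastSlash+1:], ":") {
		return image
	}
	return image + ":latest"
}

func main() {
	for _, img := range []string{"busybox", "busybox:1.24", "localhost:5000/foo", "foo@sha256:abc"} {
		fmt.Println(img, "->", applyDefaultImageTag(img))
	}
}
```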
Automatic merge from submit-queue
Node-ip is not used when cloud provider is used
Currently --node-ip in kubelet is not being used when kubelet is configured with a cloud provider. With this fix, kubelet will get a list of IPs from the provider and parse it to return the one that matches node-ip.
This fixes #23568
Automatic merge from submit-queue
Fake docker portforward for in-process docker CRI integration
This is necessary to pass e2e tests for in-process docker CRI integration.
This is part of #31459.
cc/ @Random-Liu @kubernetes/sig-node
Most of our communications from apiserver -> nodes used
nodeutil.GetNodeHostIP, but a few places didn't; this
meant that the node name needed to be resolvable _and_ we needed
to populate valid IP addresses.
Fix the last few places that used the NodeName.
Issue #18525
Issue #9451
Issue #9728
Issue #17643
Issue #11543
Issue #22063
Issue #2462
Issue #22109
Issue #22770
Issue #32286
KubeletClient implements ConnectionInfoGetter, but it is not a complete
implementation: it does not set the kubelet port from the node record,
for example.
By renaming the method so that it does not implement the interface, we
are able to cleanly see where the "raw" GetConnectionInfo is used (it is
correct) and also have go type-checking enforce this for us.
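The type-checking trick works because a Go type satisfies an interface only if it has a method with the exact name and signature. In the sketch below, `GetRawConnectionInfo` and `NodeConnectionInfoGetter` are hypothetical names, and the `nodePorts` map stands in for reading the port from the node record.

```go
package main

import "fmt"

type ConnectionInfo struct {
	Hostname string
	Port     int
}

type ConnectionInfoGetter interface {
	GetConnectionInfo(nodeName string) (*ConnectionInfo, error)
}

type KubeletClient struct {
	defaultPort int
}

// GetRawConnectionInfo (hypothetical name) no longer matches the interface
// method, so KubeletClient cannot be passed where a ConnectionInfoGetter is needed.
func (c *KubeletClient) GetRawConnectionInfo(nodeName string) (*ConnectionInfo, error) {
	return &ConnectionInfo{Hostname: nodeName, Port: c.defaultPort}, nil
}

// NodeConnectionInfoGetter is the complete implementation: it overrides the
// port with the one published for the node.
type NodeConnectionInfoGetter struct {
	client    *KubeletClient
	nodePorts map[string]int // stand-in for the node record
}

func (g *NodeConnectionInfoGetter) GetConnectionInfo(nodeName string) (*ConnectionInfo, error) {
	info, err := g.client.GetRawConnectionInfo(nodeName)
	if err != nil {
		return nil, err
	}
	if port, ok := g.nodePorts[nodeName]; ok {
		info.Port = port
	}
	return info, nil
}

// Compile-time check that the complete implementation satisfies the interface.
var _ ConnectionInfoGetter = &NodeConnectionInfoGetter{}

func main() {
	client := &KubeletClient{defaultPort: 10250}
	getter := &NodeConnectionInfoGetter{client: client, nodePorts: map[string]int{"node-a": 11250}}
	info, _ := getter.GetConnectionInfo("node-a")
	fmt.Println(info.Hostname, info.Port)
	_, isGetter := interface{}(client).(ConnectionInfoGetter)
	fmt.Println(isGetter)
}
```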
Automatic merge from submit-queue
Pod creation moved outside of docker manager tests
**What this PR does / why we need it**:
It cleans up docker manager tests a little.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, #<issue_number>, ...)` format, will close that issue when PR gets merged)*: related to #31550
**Special notes for your reviewer**:
I don't claim that work on this issue is finished; I only cleaned up the tests a bit.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
CRI: Add more docs about pod sandbox config in CreateContainerRequest.
Makes it clear that the config will not change during the pod lifecycle.
The field is only for convenience.
Automatic merge from submit-queue
Use strongly-typed types.NodeName for a node name
We had another bug where we confused the hostname with the NodeName.
Also, if we want to use different values for the Node.Name (which is
an important step for making installation easier), we need to keep
better control over this.
A tedious but mechanical commit therefore, to change all uses of the
node name to use types.NodeName
Automatic merge from submit-queue
Move Kubelet pod-management code into kubelet_pods.go
Finish the kubelet code moves started during the 1.3 dev cycle -- move pod management code into a file called `kubelet_pods.go`.
We had another bug where we confused the hostname with the NodeName.
To avoid this happening again, and to make the code more
self-documenting, we use types.NodeName (a typedef alias for string)
whenever we are referring to the Node.Name.
A tedious but mechanical commit therefore, to change all uses of the
node name to use types.NodeName
Also clean up some of the (many) places where the NodeName is referred
to as a hostname (not true on AWS), or an instanceID (not true on GCE),
etc.
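A distinct string type makes the compiler reject accidental mixing of hostnames and node names. In Go this is a defined type (`type NodeName string`) rather than a true alias; the `Hostname` type and `describeNode` function below are hypothetical and exist only for contrast.

```go
package main

import "fmt"

// NodeName mirrors the idea of types.NodeName: a separate type for Node.Name.
type NodeName string

// Hostname is a hypothetical type used only to show the contrast.
type Hostname string

func describeNode(name NodeName) string {
	return "node/" + string(name)
}

func main() {
	host := Hostname("ip-10-0-0-1")
	// describeNode(host) would not compile: a Hostname is not a NodeName.
	// The explicit conversion below makes the assumption visible in review.
	fmt.Println(describeNode(NodeName(host)))
}
```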
Automatic merge from submit-queue
Variables should be initialized near where they are used
Inside the for-loop, the loop may `continue` before the hash value is used, so I think the hash value calculation should be moved below those checks.
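The pattern looks like the sketch below: the hash is computed only after every `continue` has had its chance, right next to its single use. The `containerSpec` type, `hashContainer`, and `changedContainers` are hypothetical, and FNV-1a is used purely for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

type containerSpec struct {
	Name    string
	Image   string
	Enabled bool
}

// hashContainer hashes the fields that define a container (illustrative).
func hashContainer(c containerSpec) uint64 {
	h := fnv.New64a()
	_, _ = h.Write([]byte(c.Name + "\x00" + c.Image))
	return h.Sum64()
}

// changedContainers returns names of enabled, running containers whose spec
// hash differs from the running hash.
func changedContainers(specs []containerSpec, runningHashes map[string]uint64) []string {
	var changed []string
	for _, c := range specs {
		if !c.Enabled {
			continue // computing the hash before this would be wasted work
		}
		running, ok := runningHashes[c.Name]
		if !ok {
			continue // not running: nothing to compare against
		}
		expected := hashContainer(c) // initialized next to its only use
		if running != expected {
			changed = append(changed, c.Name)
		}
	}
	return changed
}

func main() {
	a := containerSpec{Name: "a", Image: "nginx:1.11", Enabled: true}
	hashes := map[string]uint64{"a": hashContainer(containerSpec{Name: "a", Image: "nginx:1.10"})}
	fmt.Println(changedContainers([]containerSpec{a}, hashes))
}
```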