isKernelPid should explicitly check the error returned from os.Readlink and return true
only if the error value is ENOENT. Without this fix, if Readlink
returned say ENAMETOOLONG or EACESS, we would still count the process as
a kernel process (which is not true).
The container manager used in kubelet checks for docker daemon process either via pidfile
or process name. While the pidfile points to the docker daemon process PID, the
dockerProcessName constant stores a docker cli name ( docker ) instead of docker daemon
name ( dockerd ).
Got the proxy-server coming up in the master.
Added certs and have it comiung up with those certs.
Added a daemonset to run the network-agent.
Adding support for agent running as a sameon set on every node.
Added quick hack to test that proxy server/agent were correctly
tunneling traffic to the kubelet.
Added more WIP for reading network proxy configuration.
Get flags set correctly and fix connection services.
Adding missing ApplyTo
Added ConnectivityService.
Fixed build directives. Added connectivity service configuration.
Fixed log levels.
Fixed minor issues for feature turned off.
Fixed boilerplate and format.
Moved log dialer initialization earlier as per Liggits suggestion.
Fixed a few minor issues in the configuration for GCE.
Fixed scheme allocation
Adding unit test.
Added test for direct connectivity service.
Switching to injecting the Lookup method rather than using a Singleton.
First round of mikedaneses feedback.
Fixed deployment to use yaml and other changes suggested by MikeDanese.
Switched network proxy server/agent which are kebab-case not camelCase.
Picked up DIAL_RSP fix.
Factored in deads2k feedback.
Feedback from mikedanese
Factored in second round of feedback from David.
Fix path in verify.
Factored in anfernee's feedback.
First part of lavalamps feedback.
Factored in more changes from lavalamp and mikedanese.
Renamed network-proxy to konnectivity-server and konnectivity-agent.
Fixed tolerations and config file checking.
Added missing strptr
Finished lavalamps requested rename.
Disambiguating konnectivity service by renaming it egress selector.
Switched feature flag to KUBE_ENABLE_EGRESS_VIA_KONNECTIVITY_SERVICE
This starts ephemeral containers prior to init containers so that
ephemeral containers will still be started when init containers fail to
start.
Also improves tests and comments with review suggestions.
buildPodRef creates a unique key with the {podName, namespace, UID}
tuple. By omitting the UID in the metric, duplicate metrics can be sent
to prometheus causing 500's on the /metrics endpoint.
This patch fixes a bug in the CPUManager, whereby it doesn't honor the
"effective requests/limits" of a Pod as defined by:
https://kubernetes.io/docs/concepts/workloads/pods/init-containers/#resources
The rule states that a Pod’s "effective request/limit" for a resource
should be the larger of:
* The highest of any particular resource request or limit
defined on all init Containers
* The sum of all app Containers request/limit for a
resource
Moreover, the rule states that:
* The effective QoS tier is the same for init Containers
and app containers alike
This means that the resource requests of init Containers and app
Containers should be able to overlap, such that the larger of the two
becomes the "effective resource request/limit" for the Pod. Likewise,
if a QoS tier of "Guaranteed" is determined for the Pod, then both init
Containers and app Containers should run in this tier.
In its current implementation, the CPU manager honors the effective QoS
tier for both init and app containers, but doesn't honor the "effective
request/limit" correctly.
Instead, it treats the "effective request/limit" as:
* The sum of all init Containers plus the sum of all app
Containers request/limit for a resource
It does this by not proactively removing the CPUs given to previous init
containers when new containers are being created. In the worst case,
this causes the CPUManager to give non-overlapping CPUs to all
containers (whether init or app) in the "Guaranteed" QoS tier before any
of the containers in the Pod actually start.
This effectively blocks these Pods from running if the total number of
CPUs being requested across init and app Containers goes beyond the
limits of the system.
This patch fixes this problem by updating the CPUManager static policy
so that it proactively removes any guaranteed CPUs it has granted to
init Containers before allocating CPUs to app containers. Since all init
container are run sequentially, it also makes sure this proactive
removal happens for previous init containers when allocating CPUs to
later ones.
Previously, the topologymanager would simply fall back to the None() policy
if an invalid policy was specified. This patch updates this to return an
error when an invalid policy is passed, forcing the kubelet to fail
fast when this occurs.
These semantics should be preferable because an invalid policy likely
indicates operator error in setting the policy flag on the kubelet
correctly (e.g. misspelling 'strict' as 'striict'). In this case it is
better to fail fast so the operator can detect this and correct the
mistake, than to mask the error and essentially disable the
topologymanager unexpectedly.
Previously, the cpumanager would simply fall back to the None() policy
if an invalid policy was specified. This patch updates this to return an
error when an invalid policy is passed, forcing the kubelet to fail
fast when this occurs.
These semantics should be preferable because an invalid policy likely
indicates operator error in setting the policy flag on the kubelet
correctly (e.g. misspelling 'static' as 'statiic'). In this case it is
better to fail fast so the operator can detect this and correct the
mistake, than to mask the error and essentially disable the cpumanager
unexpectedly.
Nit: remove capitalization of preferred
Remove line from kubelet and add to separate PR for easier merge
nit: dependency added to separate PR
Add check to ensure strict policy cannot be set without feature gate enabled
Topology Manager runs "none" policy by default.
Added constants for policies and updated documentation.
https://github.com/moby/moby/pull/27510 switched the container already
in use message from a bare string to a quoted string, so the
auto-deletion of "in use" containers no longer works in Docker > 17.04.
libcni 0.7.0 caches ADD operation results and allows the runtime to
retrieve these from the cache. In case the user wants a different
cache directory than the defaul, plumb that through like we do
for --cni-bin-dir and --cni-conf-dir.
Previous commit "Use ip address from CNI output" introduces
ability to run pod which can havn't eth0. But also it
add problem: after kubelet restart, if we have already started
pod w/o eth0, kubelet can't find proper interface (it's
normal for vhostuser type of cni plugin when eth0 doesn't exist)
and kubelet restarts "broken" pod.
Fix of this issue requeres new feature of libcni - caching
results.
Looks like new libcni requires cniVersion in CNI output.
This patch specifies version both for CNI conf and CNI output.
Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
Use the exported list from runc that uses "KB" and not "kB".
This issue breaks kubelet on AArch64 (arm 64).
var HugePageSizeUnitList = []string{"B", "KB", "MB", "GB", "TB", "PB"}
The hugetlb cgroup control files (introduced here in 2012:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=abb8206cb0773)
use "KB" and not "kB"
(https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/hugetlb_cgroup.c?h=v5.0#n349).
The behavior in the kernel has not changed since the introduction, and
the current code using "kB" will therefore fail on devices with huge
pages smaller than 1MiB. This is the case for AArch64.
As seen from the code in "mem_fmt" inside hugetlb_cgroup.c, only "KB",
"MB" and "GB" are used, so the others may be removed as well.
Here is a real world example of the files inside the
"/sys/kernel/mm/hugepages/" directory:
- "hugepages-64kB"
- "hugepages-2048kB"
- "hugepages-32768kB"
- "hugepages-1048576kB"
And the corresponding cgroup files:
- "hugetlb.64KB._____"
- "hugetlb.2MB._____"
- "hugetlb.32MB._____"
- "hugetlb.1GB._____"
Signed-off-by: Odin Ugedal <odin@ugedal.com>
If kubelet never gets past sandbox creation (i.e., never attempted to
create containers for a pod), it should retry the sandbox creation on
failure, regardless of the restart policy of the pod.
Two small code cleanup changes for `probe/http`.
- Tests name the `followNonLocalRedirects` variable before passing to
`New`, so its clear what the boolean flag in the construct impacts.
- Change import name from `httprobe` to `httpprobe` when used by
`pkg/kubelet/prober/prober.go`. Establishes consistency with other uses
in the repo.
This patch refactors pkg/util/mount to be more usable outside of
Kubernetes. This is done by refactoring mount.Interface to only contain
methods that are not K8s specific. Methods that are not relevant to
basic mount activities but still have OS-specific implementations are
now found in a mount.HostUtils interface.
Fix an issue in which, when trying to specify the `--kube-reserved-cgroup`
(or `--system-reserved-cgroup`) with `--cgroup-driver=systemd`, we will
not properly convert the `systemd` cgroup name into the internal cgroup
name that k8s expects. Without this change, specifying
`--kube-reserved-cgroup=/test.slice --cgroup-driver=systemd` will fail,
and only `--kube-reserved-cgroup=/test --crgroup-driver=systemd` will succeed,
even if the actual cgroup existing on the host is `/test.slice`.
Additionally, add light unit testing of our process from converting to a
systemd cgroup name to kubernetes internal cgroup name.
On Windows the only way to access the container's network interfaces is
by running another process in the pod from which we can use a
netcat-like program to proxy the TCP stream
Proposed wincat.exe can be found here: https://github.com/benmoss/wincat
These updates are based on discussions had about the preferred semantics
of the TopologyManager and will be reflected in changes to an upcoming
PR that adds the actual TopologyManager implementation.
Add comments to exported functions in `pkg/kubelet/client/client.go`
In `KubeletClientConfig` rename `EnableHttps` to `EnableHTTPS`. This requires
renaming it in `pkg/kubelet/client/client_test.go`
The metrics structure passed to volume stat calculator
can contain real stats on subset of metrics fields. For example,
the metrics structure filled by a CSI driver can have
either INODES or BYTES filled, IOW it a valid return.
In such cases the volume stat calculator panic with below
trace:
0516 21:36:19.013143 14452 runtime.go:69] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
/home/hchiramm/gopath/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:76
/home/hchiramm/gopath/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/home/hchiramm/gopath/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/usr/local/go/src/runtime/panic.go:522
/usr/local/go/src/runtime/panic.go:82
/usr/local/go/src/runtime/signal_unix.go:390
/home/hchiramm/gopath/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/api/resource/quantity.go:697
/home/hchiramm/gopath/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/api/resource/quantity.go:685
/home/hchiramm/gopath/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/kubelet/server/stats/volume_stat_calculator.go:144
/home/hchiramm/gopath/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/kubelet/server/stats/volume_stat_calculator.go:125
/home/hchiramm/gopath/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/kubelet/server/stats/volume_stat_calculator.go:65
/home/hchiramm/gopath/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152
/home/hchiramm/gopath/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153
/usr/local/go/src/runtime/asm_amd64.s:1337
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
Previously, the code base had both `housekeeping` and `houseKeeping`,
which made case sensitive search in vim difficult. Standardize on
`housekeeping`, which was by far the most popular.
* fix duplicated imports of api/core/v1
* fix duplicated imports of client-go/kubernetes
* fix duplicated imports of rest code
* change import name to more reasonable
From:
MountVolume.SetUp failed for volume "secret-prometheus-k8s-proxy" :
couldn't propagate object cache: timed out waiting for the condition
To:
MountVolume.SetUp failed for volume "secret-prometheus-k8s-proxy" : failed
to sync secret cache: timed out waiting for the condition
Increase the grpc max message size to be the same as the value defined
in `pkg/kubelet/remote/utils.go`.
Increase the limit because, `ListPodSandbox` (and possibly other) calls
are hitting the limit. Long term, the best solution to this issue is to
use pagination, but that is not currently available.
The cpumanager loops through all init Containers and app Containers when
reconciling its state. However, the current implementation of
findContainerIDByName(), which is call by the reconciler, does not
resolve for init Containers.
This patch updates findContainerIDByName() to account for init
Containers and adds a regression test that fails before the change and
succeeds after.
Currently, the method `pluginwatcher.traversePluginDir` descends into
a directory adding filesystem watchers and creating synthetic `create`
events when it finds sockets files. However, a race condition might
happen when a recently-added watcher observes a `delete` event in a
socket file before `pluginwatcher.traversePlugindir` itself notices
this file.
This patch changes this behavior by registering watchers on
directories, enqueueing and processing `create` events from sockets
found, and only then processing the events from the registered watchers.
This patch moves the ExecMounter found in pkg/util/mount to
pkg/volume/util/exec. This is done in preparation for pkg/util/mount to
move out of tree. This specific implemention of mount.Interface is only
used internally to K8s and does not need to move out of tree.
Clarifies that requesting no conversion is part of the codec factory, and
future refactors will make the codec factory less opionated about conversion.
If the container is not found, do not stop reading the log file
immediately but wait until we reach again EOF.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
it seems fsnotify can miss some read events, blocking the kubelet to
receive more data from the log file.
If we end up waiting for events with fsnotify, force a read from the
log file every second so that are sure to not miss new data for longer
than that.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
if the runtime is configured to rotate the log file, we might end up
watching the old fd where there are no more writes.
When a fsnotify event other than Write is received, reopen the log
file and recreate the watcher.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
manager
This PR fixes the issue #75345. This fix modified the checking volume in
actual state when validating whether volume can be removed from desired state or not. Only if volume status is already mounted in actual state, it can be removed from desired state.
For the case of mounting fails always, it can still work because the
check also validate whether pod still exist in pod manager. In case of
mount fails, pod should be able to removed from pod manager so that
volume can also be removed from desired state.
Minor optimization in the code that attempts to assign whole
sockets/cores in case the number of CPUs requested is higher
than CPUs-per-socket/core: check if the number of requested
CPUs is higher than CPUs-per-socket/core before retrieving
and iterating the free sockets/cores, and break the loops
when that is no longer the case.
Signed-off-by: Arik Hadas <ahadas@redhat.com>
CRI runtimes do not supply cpu nano core usage as it is not part of CRI
stats. However, there are upstream components that still rely on such
stats to function. The previous fix was faulty because the multiple
callers could compete and update the stats, causing
inconsistent/incoherent metrics. This change, instead, creates a
separate call for updating the usage, and rely on eviction manager,
which runs periodically, to trigger the updates. The caveat is that if
eviction manager is completley turned off, no one would compute the
usage.
This is the 2nd PR to move CSINodeInfo/CSIDriver APIs to
v1beta1 core storage APIs. It includes controller side changes.
It depends on the PR with API changes:
https://github.com/kubernetes/kubernetes/pull/73883
This change explicits the restart policy, as on some docker version
(e.g. 11.07-ce) the default for this field is "". which seems to be not
respected by dockerd
A previous PR (https://github.com/kubernetes/kubernetes/pull/73726)
added GMSA support to the dockershim. Unfortunately, there was a
bug in there: the registry keys used to pass the cred specs down
to Docker were being cleaned up too early, right after the containers'
creation - before Docker would ever try to read them, when trying to
actually start the container.
This patch fixes this.
An e2e test is also provided in a separate PR.
Signed-off-by: Jean Rouge <rougej+github@gmail.com>
1) Add suffix (`seconds` or `total`) to metric name
2) Switch Summary metric to Histogram metric (Summary metrics are not
supported completely by prometheus-to-sd and can't be aggregated.)
This package contains public/private key utilities copied directly from
client-go/util/cert. All imports were updated.
Future PRs will actually refactor the libraries.
Updates #71004
1. Find the minimal thread number within a core using a
single loop rather than by sorting the thread numbers.
2. Inline getUniqueCoreID#err and Discover#numCPUs variables.
3. Narrow the scope of Discover#coreID and Discover#err variables.
Signed-off-by: Arik Hadas <ahadas@redhat.com>
The messages for container lifecycle events are subtly inconsistent
and should be unified.
First, the field format for containers is hard to parse for a human,
so include the container name directly in the message for create
and start, and for kill remove the container runtime prefix.
Second, the pulling image event has inconsistent capitalization, fix
that to be sentence without punctuation.
Third, the kill container event was unnecessarily wordy and inconsistent
with the create and start events. Make the following changes:
* Use 'Stopping' instead of 'Killing' since kill is usually reserved for
when we decide to hard stop a container
* Send the event before we dispatch the prestop hook, since this is an
"in-progress" style event vs a "already completed" type event
* Remove the 'cri-o://' / 'docker://' prefix by printing the container
name instead of id (we already do that replacement at the lower level
to prevent high cardinality events)
* Use 'message' instead of 'reason' as the argument name since this is a
string for humans field, not a string for machines field
* Remove the hash values on the container spec changed event because no
human will ever be able to do anything with the hash value
* Use 'Stopping container %s(, explanation)?' form without periods to
follow event conventions
The end result is a more pleasant message for humans:
```
35m Normal Created Pod Created container
35m Normal Started Pod Started container
10m Normal Killing Pod Killing container cri-o://installer:Need to kill Pod
10m Normal Pulling Pod pulling image "registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-02-10-172026@sha256:3da5303d4384d24691721c1cf2333584ba60e8f82c9e782f593623ce8f83ddc5"
```
becomes
```
35m Normal Created Pod Created container installer
35m Normal Started Pod Started container installer
10m Normal Killing Pod Stopping container installer
10m Normal Pulling Pod Pulling image "registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-02-10-172026@sha256:3da5303d4384d24691721c1cf2333584ba60e8f82c9e782f593623ce8f83ddc5"
```
that requests any device plugin resource. If not, re-issue Allocate
grpc calls. This allows us to handle the edge case that a pod got
assigned to a node even before it populates its extended resource
capacity.
* value names are now purely random
* cleaning up leaked registry keys at Kubelet init
* fixing a small bug masking create errors
Signed-off-by: Jean Rouge <rougej+github@gmail.com>
This patch comprises the kubelet changes outlined in the GMSA KEP
(https://github.com/kubernetes/enhancements/blob/master/keps/sig-windows/20181221-windows-group-managed-service-accounts-for-container-identity.md)
to add GMSA support to Windows workloads.
More precisely, it includes the logic proposed in the KEP to resolve
which GMSA spec should be applied to which containers, and changes
`dockershim` to copy the relevant GMSA credential specs to Windows
registry values prior to creating the container, passing them down
to docker itself, and finally removing the values from the registry
afterwards; both these changes need to be activated with the `WindowsGMSA`
feature gate.
Includes unit tests.
Signed-off-by: Jean Rouge <rougej+github@gmail.com>
When upgrading to 1.13, pods that were created prior to the upgrade have
no pod.Spec.EnableServiceLinks set. This causes a segfault and prevents
the pod from ever starting.
Check and set to the default if nil.
Fixes#71749
For the next release, we include both sets of labels for pods and
containers: "container_name" and "container", "pod_name" and "pod".
In future releases, the "*_name" metrics will be deprecated.
We've changed the Ephemeral Containers API, and container type will no
longer be required. Since this is the only feature using it, remove it.
This reverts commit ba6f31a6c6.
Currently, Docker make IPC of every container shareable by default,
which means other containers can join it's IPC namespace. This is
implemented by creating a tmpfs mount on the host, and then
bind-mounting it to a container's /dev/shm. Other containers
that want to share the same IPC (and the same /dev/shm) can also
bind-mount the very same host's mount.
Now, since https://github.com/moby/moby/commit/7120976d7
(https://github.com/moby/moby/pull/34087) there is a possiblity
to have per-daemon default of having "private" IPC mode,
meaning all the containers created will have non-shareable
/dev/shm.
For shared IPC to work in the above scenario, we need to
explicitly make the "pause" container's IPC mode as "shareable",
which is what this commit does.
To test: add "default-ipc-mode: private" to /etc/docker/daemon.json,
try using kube as usual, there should be no errors.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Modify kubelet plugin watcher to support older CSI drivers that use an
the old plugins directory for socket registration.
Also modify CSI plugin registration to support multiple versions of CSI
registering with the same name.
kubeadm uses certificate rotation to replace the initial high-power
cert provided in --kubeconfig with a less powerful certificate on
the masters. This requires that we pass the contents of the client
config certData and keyData down into the cert store to populate
the initial client.
Add better comments to describe why the flow is required. Add a test
that verifies initial cert contents are written to disk. Change
the cert manager to not use MustRegister for prometheus so that
it can be tested.
Expose both a Stop() method (for cleanup) and a method to force
cert rotation, but only expose Stop() on the interface.
Verify that we choose the correct client.
Ensure that bootstrap+clientcert-rotation in the Kubelet can:
1. happen in the background so that static pods aren't blocked by bootstrap
2. collapse down to a single call path for requesting a CSR
3. reorganize the code to allow future flexibility in retrieving bootstrap creds
Fetching the first certificate and later certificates when the kubelet
is using client rotation and bootstrapping should share the same code
path. We also want to start the Kubelet static pod loop before
bootstrapping completes. Finally, we want to take an incremental step
towards improving how the bootstrap credentials are loaded from disk
(potentially allowing for a CLI call to get credentials, or a remote
plugin that better integrates with cloud providers or KSMs).
Reorganize how the kubelet client config is determined. If rotation is
off, simplify the code path. If rotation is on, load the config
from disk, and then pass that into the cert manager. The cert manager
creates a client each time it tries to request a new cert.
Preserve existing behavior where:
1. bootstrap kubeconfig is used if the current kubeconfig is invalid/expired
2. we create the kubeconfig file based on the bootstrap kubeconfig, pointing to
the location that new client certs will be placed
3. the newest client cert is used once it has been loaded
UnmountDevice must not clear devicepath, because such devicePath
may come from node.status (e.g. on AWS) and subsequent MountDevice
operation (that may be already enqueued) needs it.
**What type of PR is this?**
/kind cleanup
**What this PR does / why we need it**:
Fix typos for stats_provider_test.go
**Which issue(s) this PR fixes** *(optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged)*:
**Special notes for your reviewer**:
**Release note**:
```
NONE
```
The reason for the bump is the new functionality of the
k8s.io/utils/exec package which allows
- to get a hold of the process' std{out,err} as `io.Reader`s
- to `Start` a process and `Wait` for it
This should help on addressing #70890 by allowing to wrap std{out,err}
of the process to be wrapped with a `io.limitedReader`.
It also updates
- k8s.io/kubernetes/pkg/probe/exec.FakeCmd
- k8s.io/kubernetes/pkg/kubelet/prober.execInContainer
- k8s.io/kubernetes/cmd/kubeadm/app/phases/kubelet.fakeCmd
to implement the changed interface.
The dependency on 'k8s.io/utils/pointer' to the new version has also
been bumped in some staging repos:
- apiserver
- kube-controller-manager
- kube-scheduler
The inotify code was removed from golang.org/x/exp several years ago. Therefore
importing it from that path prevents downstream consumers from using any module
that makes use of more recent features of golang.org/x/exp.
This change is a followup to google/cadvisor#2060 which was merged with #70889
This fixes#68478
- Move from the old github.com/golang/glog to k8s.io/klog
- klog as explicit InitFlags() so we add them as necessary
- we update the other repositories that we vendor that made a similar
change from glog to klog
* github.com/kubernetes/repo-infra
* k8s.io/gengo/
* k8s.io/kube-openapi/
* github.com/google/cadvisor
- Entirely remove all references to glog
- Fix some tests by explicit InitFlags in their init() methods
Change-Id: I92db545ff36fcec83afe98f550c9e630098b3135
This previously caused a panic when moving lastKnownGood between two
non-nil values, because we were comparing the interface wrapper instead
of comparing the NodeConfigSources. The case of moving from one non-nil
lastKnownGood config to another doesn't appear to be tested by the e2e
node tests. I added a unit test and an e2e node test to help catch bugs
with this case in the future.
When node lease feature is enabled, kubelet reports node status to api server
only if there is some change or it didn't report over last report interval.
Individual implementations are not yet being moved.
Fixed all dependencies which call the interface.
Fixed golint exceptions to reflect the move.
Added project info as per @dims and
https://github.com/kubernetes/kubernetes-template-project.
Added dims to the security contacts.
Fixed minor issues.
Added missing template files.
Copied ControllerClientBuilder interface to cp.
This allows us to break the only dependency on K8s/K8s.
Added TODO to ControllerClientBuilder.
Fixed GoDeps.
Factored in feedback from JustinSB.
In the recent PR on adding ProcMount, we introduced a regression when
pods are privileged. This shows up in 18.06 docker with kubeadm in the
kube-proxy container.
The kube-proxy container is privilged, but we end up setting the
`/proc/sys` to Read-Only which causes failures when running kube-proxy
as a pod. This shows up as a failure when using sysctl to set various
network things.
Change-Id: Ic61c4c9c961843a4e064e783fab0b54350762a8d
This change adds comments to exported things and renames the tcp,
http, and exec probe interfaces to just be Prober within their
namespace.
Issue #68026
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
add missing LastTransitionTime of ContainerReady condition
**What this PR does / why we need it**:
add missing LastTransitionTime of ContainerReady condition
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
xref #64646
**Special notes for your reviewer**:
/cc freehan yujuhong
**Release note**:
```release-note
add missing LastTransitionTime of ContainerReady condition
```
We are removing dependencies on docker types where possible in the core
libraries. credentialprovider is generic to Docker and uses a public API
(the config file format) that must remain stable. Create an equivalent type
and use a type cast (which would error if we ever change the type) in the
dockershim. We already perform a transformation like this for CRI and so
we aren't changing much.
Automatic merge from submit-queue (batch tested with PRs 67950, 68195). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Consolidate componentconfig code standards
**What this PR does / why we need it**:
This PR fixes a bunch of very small misalignments in ComponentConfig packages:
- Add sane comments to all functions/variables in componentconfig `register.go` files
- Make the `register.go` files of componentconfig pkgs follow the same pattern and not differ from each other like they do today.
- Register the `openapi-gen` tag in all `doc.go` files where the pkg contains _external_ types.
- Add the `groupName` tag where missing
- Fix cases where `addKnownTypes` was registered twice in the `SchemeBuilder`
- Add `Readme` and `OWNERS` files to `Godeps` directories if missing.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
/assign @sttts @thockin
Automatic merge from submit-queue (batch tested with PRs 68119, 68191). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
fix token controller keyFunc bug
Currently, token manager use keyFunc like: `fmt.Sprintf("%q/%q/%#v", name, namespace, tr.Spec)`.
Since tr.Spec contains point fields, new token request would not reuse the cache at all.
This patch fix this, also adds unit test.
```release-note
NONE
```
Since tr.Spec contains point fields, new token request would not reuse
the cache at all. This patch fix this, also adds unit test.
Signed-off-by: Mike Danese <mikedanese@google.com>
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Kubelet: only sync iptables on linux
**What this PR does / why we need it**:
Iptables is only supported on Linux, kubelet should only sync NAT rules on Linux.
Without this PR, Kubelet on Windows would logs following errors on each `syncNetworkUtil()`:
```
kubelet.err.log:4692:E0711 22:03:42.103939 2872 kubelet_network.go:102] Failed to ensure that nat chain KUBE-MARK-DROP exists: error creating chain "KUBE-MARK-DROP": executable file
```
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#65713
**Special notes for your reviewer**:
**Release note**:
```release-note
Kubelet now only sync iptables on Linux.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Kubelet: only apply default hard evictions of nodefs.inodesFree on Linux
**What this PR does / why we need it**:
Kubelet sets default hard evictions of `nodefs.inodesFree ` for all platforms today. This will cause errors on Windows and a lot `no observation found for eviction signal nodefs.inodesFree` errors will be logs for kubelet.
```
kubelet.err.log:4961:W0711 22:21:12.378789 2872 helpers.go:808] eviction manager: no observation found for eviction signal nodefs.inodesFree
kubelet.err.log:4967:W0711 22:21:30.411371 2872 helpers.go:808] eviction manager: no observation found for eviction signal nodefs.inodesFree
kubelet.err.log:4974:W0711 22:21:48.446456 2872 helpers.go:808] eviction manager: no observation found for eviction signal nodefs.inodesFree
kubelet.err.log:4978:W0711 22:22:06.482441 2872 helpers.go:808] eviction manager: no observation found for eviction signal nodefs.inodesFree
```
This PR updates the default hard eviction value and only apply nodefs.inodesFree on Linux.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#66088
**Special notes for your reviewer**:
**Release note**:
```release-note
Kubelet only applies default hard evictions of nodefs.inodesFree on Linux
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Add kubelet stats for windows system container "pods"
**What this PR does / why we need it**:
This PR adds kubelet stats for windows system container "pods". Without this, kubelet will always logs error:
```
kubelet.err.log:4832:E0711 22:12:49.241358 2872 helpers.go:735] eviction manager: failed to construct signal: "allocatableMemory.available" error: system container "pods" not found
```
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#66087
**Special notes for your reviewer**:
/sig windows
/sig node
**Release note**:
```release-note
Add kubelet stats for windows system container "pods"
```
Automatic merge from submit-queue (batch tested with PRs 65251, 67255, 67224, 67297, 68105). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Promote mount propagation to GA
**What this PR does / why we need it**:
This PR promotes mount propagation to GA.
Website PR: https://github.com/kubernetes/website/pull/9823
**Release note**:
```release-note
Mount propagation has promoted to GA. The `MountPropagation` feature gate is deprecated and will be removed in 1.13.
```
Automatic merge from submit-queue (batch tested with PRs 64283, 67910, 67803, 68100). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
CSI Cluster Registry and Node Info CRDs
**What this PR does / why we need it**:
Introduces the new `CSIDriver` and `CSINodeInfo` API Object as proposed in https://github.com/kubernetes/community/pull/2514 and https://github.com/kubernetes/community/pull/2034
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes https://github.com/kubernetes/features/issues/594
**Special notes for your reviewer**:
Per the discussion in https://groups.google.com/d/msg/kubernetes-sig-storage-wg-csi/x5CchIP9qiI/D_TyOrn2CwAJ the API is being added to the staging directory of the `kubernetes/kubernetes` repo because the consumers will be attach/detach controller and possibly kubelet, but it will be installed as a CRD (because we want to move in the direction where the API server is Kubernetes agnostic, and all Kubernetes specific types are installed).
**Release note**:
```release-note
Introduce CSI Cluster Registration mechanism to ease CSI plugin discovery and allow CSI drivers to customize Kubernetes' interaction with them.
```
CC @jsafrane
Automatic merge from submit-queue (batch tested with PRs 64283, 67910, 67803, 68100). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Add a ProcMount option to the SecurityContext & AllowedProcMountTypes to PodSecurityPolicy
So there is a bit of a chicken and egg problem here in that the CRI runtimes will need to implement this for there to be any sort of e2e testing.
**What this PR does / why we need it**: This PR implements design proposal https://github.com/kubernetes/community/pull/1934. This adds a ProcMount option to the SecurityContext and AllowedProcMountTypes to PodSecurityPolicy
Relies on https://github.com/google/cadvisor/pull/1967
**Release note**:
```release-note
ProcMount added to SecurityContext and AllowedProcMounts added to PodSecurityPolicy to allow paths in the container's /proc to not be masked.
```
cc @Random-Liu @mrunalp
Automatic merge from submit-queue (batch tested with PRs 67349, 66056). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
wait until apiserver connection before starting kubelet tls bootstrap
I wonder if this helps with sometimes slow network programming
cc @mwielgus @awly
This vendor change was purely for the changes in docker to allow for
setting the Masked and Read-only paths.
See: moby/moby#36644
But because of the docker dep update it also needed cadvisor to be
updated and winterm due to changes in pkg/tlsconfig in docker
See: google/cadvisor#1967
Signed-off-by: Jess Frazelle <acidburn@microsoft.com>
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
should not event directly
**What this PR does / why we need it**:
should not event directly, using recordContainerEvent() to generate ref and deduplicate events instead.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
none
```
Automatic merge from submit-queue (batch tested with PRs 67739, 65222). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Honor --hostname-override, report compatible hostname addresses with cloud provider
xref #677147828e5d made cloud providers authoritative for the addresses reported on Node objects, so that the addresses used by the node (and requested as SANs in serving certs) could be verified via cloud provider metadata.
This had the effect of no longer reporting addresses of type Hostname for Node objects for some cloud providers. Cloud providers that have the instance hostname available in metadata should add a `type: Hostname` address to node status. This is being tracked in #67714
This PR does a couple other things to ease the transition to authoritative cloud providers:
* if `--hostname-override` is set on the kubelet, make the kubelet report that `Hostname` address. if it can't be verified via cloud-provider metadata (for cert approval, etc), the kubelet deployer is responsible for fixing the situation by adjusting the kubelet configuration (as they were in 1.11 and previously)
* if `--hostname-override` is not set, *and* the cloud provider didn't report a Hostname address, *and* the auto-detected hostname matches one of the addresses the cloud provider *did* report, make the kubelet report that as a Hostname address. That lets the addresses remain verifiable via cloud provider metadata, while still including a `Hostname` address whenever possible.
/sig node
/sig cloud-provider
/cc @mikedanese
fyi @hh
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 67694, 64973, 67902). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
SCTP support implementation for Kubernetes
**What this PR does / why we need it**: This PR adds SCTP support to Kubernetes, including Service, Endpoint, and NetworkPolicy.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#44485
**Special notes for your reviewer**:
**Release note**:
```release-note
SCTP is now supported as additional protocol (alpha) alongside TCP and UDP in Pod, Service, Endpoint, and NetworkPolicy.
```
The requested Service Protocol is checked against the supported protocols of GCE Internal LB. The supported protocols are TCP and UDP.
SCTP is not supported by OpenStack LBaaS. If SCTP is requested in a Service with type=LoadBalancer, the request is rejected. Comment style is also corrected.
SCTP is not allowed for LoadBalancer Service and for HostPort. Kube-proxy can be configured not to start listening on the host port for SCTP: see the new SCTPUserSpaceNode parameter
changed the vendor github.com/nokia/sctp to github.com/ishidawataru/sctp. I.e. from now on we use the upstream version.
netexec.go compilation fixed. Various test cases fixed
SCTP related conformance tests removed. Netexec's pod definition and Dockerfile are updated to expose the new SCTP port(8082)
SCTP related e2e test cases are removed as the e2e test systems do not support SCTP
sctp related firewall config is removed from cluster/gce/util.sh. Variable name sctp_addr is corrected to sctpAddr in pkg/proxy/ipvs/proxier.go
cluster/gce/util.sh is copied from master
This extends the Kubelet to create and periodically update leases in a
new kube-node-lease namespace. Based on [KEP-0009](https://github.com/kubernetes/community/blob/master/keps/sig-node/0009-node-heartbeat.md),
these leases can be used as a node health signal, and will allow us to
reduce the load caused by over-frequent node status reporting.
- add NodeLease feature gate
- add kube-node-lease system namespace for node leases
- add Kubelet option for lease duration
- add Kubelet-internal lease controller to create and update lease
- add e2e test for NodeLease feature
- modify node authorizer and node restriction admission controller
to allow Kubelets access to corresponding leases
Automatic merge from submit-queue (batch tested with PRs 65247, 63633, 67425). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
fix kubelet iptclient in ipv6 cluster
**What this PR does / why we need it**:
Kubelet uses "iptables" instead of "ip6tables" in an ipv6-only cluster. This causes failed traffic for type: LoadBalancer services (and probably a lot of other problems).
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#67398
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
remove rescheduler since scheduling DS pods by default scheduler is moving to beta
**What this PR does / why we need it**:
remove rescheduler since scheduling DS pods by default scheduler is moving to beta
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#64725
**Special notes for your reviewer**:
**Release note**:
```release-note
Remove rescheduler since scheduling DS pods by default scheduler is moving to beta.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Reduce latency to node ready after CIDR is assigned.
This adds code to execute an immediate runtime and node status update when the Kubelet sees that it has a CIDR, which significantly decreases the latency to node ready.
```release-note
Speed up kubelet start time by executing an immediate runtime and node status update when the Kubelet sees that it has a CIDR.
```
Automatic merge from submit-queue (batch tested with PRs 67430, 67550). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
cpumanager: rollback state if updateContainerCPUSet failed
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#63018
If `updateContainerCPUSet` failed, the container will start failed. We should rollback the state to avoid CPU leak.
**Special notes for your reviewer**:
**Release note**:
```release-note
cpumanager: rollback state if updateContainerCPUSet failed
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add labels to kubelet OWNERS files
**What this PR does / why we need it**:
This change makes it possible to automatically add the two labels: `area/kubelet` to PRs that touch the paths in question.
this already exists for kubeadm:
https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/OWNERS#L17-L19
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
refs https://github.com/kubernetes/community/issues/1808
**Special notes for your reviewer**:
none
**Release note**:
```release-note
NONE
```
/area kubelet
@kubernetes/sig-node-pr-reviews
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add Labels to various OWNERS files
**What this PR does / why we need it**:
Will reduce the burden of manually adding labels. Information pulled
from:
https://github.com/kubernetes/community/blob/master/sigs.yaml
Change-Id: I17e661e37719f0bccf63e41347b628269cef7c8b
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 67661, 67497, 66523, 67622, 67632). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Reduce verbose logs of node addresses requesting
**What this PR does / why we need it**:
Kubelet build from the master branch is flushing node addresses requesting logs, which is too verbose:
```sh
Aug 16 10:09:40 node-1 kubelet[24217]: I0816 10:09:40.658479 24217 cloud_request_manager.go:97] Requesting node addresses from cloud provider for node "node-1"
Aug 16 10:09:40 node-1 kubelet[24217]: I0816 10:09:40.666114 24217 cloud_request_manager.go:116] Node addresses from cloud provider for node "node-1" collected
Aug 16 10:09:50 node-1 kubelet[24217]: I0816 10:09:50.666357 24217 cloud_request_manager.go:97] Requesting node addresses from cloud provider for node "node-1"
Aug 16 10:09:50 node-1 kubelet[24217]: I0816 10:09:50.674322 24217 cloud_request_manager.go:116] Node addresses from cloud provider for node "node-1" collected
Aug 16 10:10:01 node-1 kubelet[24217]: I0816 10:10:00.674644 24217 cloud_request_manager.go:97] Requesting node addresses from cloud provider for node "node-1"
Aug 16 10:10:01 node-1 kubelet[24217]: I0816 10:10:00.682794 24217 cloud_request_manager.go:116] Node addresses from cloud provider for node "node-1" collected
Aug 16 10:10:10 node-1 kubelet[24217]: I0816 10:10:10.683002 24217 cloud_request_manager.go:97] Requesting node addresses from cloud provider for node "node-1"
Aug 16 10:10:10 node-1 kubelet[24217]: I0816 10:10:10.689641 24217 cloud_request_manager.go:116] Node addresses from cloud provider for node "node-1" collected
Aug 16 10:10:20 node-1 kubelet[24217]: I0816 10:10:20.690006 24217 cloud_request_manager.go:97] Requesting node addresses from cloud provider for node "node-1"
Aug 16 10:10:20 node-1 kubelet[24217]: I0816 10:10:20.696545 24217 cloud_request_manager.go:116] Node addresses from cloud provider for node "node-1" collected
```
This PR sets them to level 5.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
/cc @ingvagabund
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fail container start if its requested device plugin resource is unknown.
With the change, Kubelet device manager now checks whether it has cached option state for the requested device plugin resource to make sure the resource is in ready state when we start the container.
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes https://github.com/kubernetes/kubernetes/issues/67107
**Special notes for your reviewer**:
**Release note**:
```release-note
Fail container start if its requested device plugin resource hasn't registered after Kubelet restart.
```
Automatic merge from submit-queue (batch tested with PRs 66209, 67380, 67499, 67437, 67498). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
nits in manager.go
**What this PR does / why we need it**:
just found some nits in the manager.go
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 64445, 67459, 67434). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
dockershim/network: pass ipRange CNI capabilities
**What this PR does / why we need it**:
Updates the dynamic (capability args) passed from Kubernetes to the CNI plugin. This means CNI plugin authors can offer more features and / or reduce their dependency on the APIServer.
Currently, we only pass the `portMappings` capability. CNI now supports `bandwidth` for bandwidth limiting and `ipRanges` for preferred IP blocks. This PR adds support for these two new capabilities.
Bandwidth limits are provided - as implemented in kubenet - via the pod annotations `kubernetes.io/ingress-bandwidth` and `kubernetes.io/egress-bandwidth`.
The ipRanges field simply passes the PodCIDR. This does mean that we need to change the NodeReady algorithm. Previously, we would only set NodeNotReady on missing PodCIDR when using Kubenet. Now, if the CNI configuration includes the `ipRanges` capability, we need to do the same.
**Which issue(s) this PR fixes**:
Fixes#64393
**Release note**:
```release-note
The dockershim now sets the "bandwidth" and "ipRanges" CNI capabilities (dynamic parameters). Plugin authors and administrators can now take advantage of this by updating their CNI configuration file. For more information, see the [CNI docs](https://github.com/containernetworking/cni/blob/master/CONVENTIONS.md#dynamic-plugin-specific-fields-capabilities--runtime-configuration)
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Other components support set log level dynamically
**What this PR does / why we need it**:
#63777 introduced a way to set glog.logging.verbosity dynamically.
We should enable this for all other components, which is specially useful in debugging.
**Release note**:
```release-note
Expose `/debug/flags/v` to allow kubelet dynamically set glog logging level. If want to change glog level to 3, you only have to send a PUT request like `curl -X PUT http://127.0.0.1:8080/debug/flags/v -d "3"`.
```
Automatic merge from submit-queue (batch tested with PRs 65561, 67109, 67450, 67456, 67402). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
error text refers to wrong stream type
**What this PR does / why we need it**:
clarify error text
**Special notes for your reviewer**:
I think this was a copy and paste error.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 65561, 67109, 67450, 67456, 67402). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Compared preemption by priority in Kubelet
Signed-off-by: Da K. Ma <klaus1982.cn@gmail.com>
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#65372
**Release note**:
```release-note
None
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Attacher/Detacher refactor for local storage
Proposal link: https://github.com/kubernetes/community/pull/2438
**What this PR does / why we need it**:
Attacher/Detacher refactor for the plugins which just need to mount device, but do not need to attach, such as local storage plugin.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
```release-note
Attacher/Detacher refactor for local storage
```
/sig storage
/kind feature
Automatic merge from submit-queue (batch tested with PRs 66177, 66185, 67136, 67157, 65065). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
kubelet: reduce logging for backoff situations
xref https://bugzilla.redhat.com/show_bug.cgi?id=1555057#c6
Pods that are in `ImagePullBackOff` or `CrashLoopBackOff` currently generate a lot of logging at the `glog.Info()` level. This PR moves some of that logging to `V(3)` and avoids logging in situations where the `SyncPod` only fails because pod are in a BackOff error condition.
@derekwaynecarr @liggitt
Automatic merge from submit-queue (batch tested with PRs 66491, 66587, 66856, 66657, 66923). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Revert #63905: Setup dns servers and search domains for Windows Pods
**What this PR does / why we need it**:
From https://github.com/kubernetes/kubernetes/pull/63905#issuecomment-396709775:
> I don't think this change does anything on Windows. On windows, the network endpoint configuration is taken care of completely by CNI. If you would like to pass on the custom dns polices from the pod spec, it should be dynamically going to the cni configuration that gets passed to CNI. From there, it would be passed down to platform and would be taken care of appropriately by HNS.
> etc\resolve.conf is very specific to linux and that should remain linux speicfic implementation. We should be trying to move away from platform specific code in Kubelet.
Docker is not managing the networking here for windows. So it doens't really care about any network settings. So passing it to docker shim's hostconfig also doens;t make sense here.
DNS for Windows containers will be set by CNI plugins. And this change also introduced two endpoints for sandbox container. So this PR reverts #63905 .
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
The PR should also be cherry-picked to release-1.11.
Also, https://github.com/kubernetes/kubernetes/issues/66588 is opened to track the process of pushing this to CNI.
**Release note**:
```release-note
Revert #63905: Setup dns servers and search domains for Windows Pods. DNS for Windows containers will be set by CNI plugins.
```
/sig windows
/sig node
/kind bug
This allows kubelets to stop the necessary work when the context has
been canceled (e.g., connection closed), and not leaking a goroutine
and inotify watcher waiting indefinitely.
Automatic merge from submit-queue (batch tested with PRs 67177, 53042). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Adding unit tests to methods of pod's format
What this PR does / why we need it:
Add unit test cases, thank you!
Automatic merge from submit-queue (batch tested with PRs 66512, 66946, 66083). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
kubelet/cm/cpumanager: Fix unused variable "skipIfPermissionsError"
The variable "skipIfPermissionsError" is not needed even when
permission error happened.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
clean up useless variables in deviceplugin/types.go
**What this PR does / why we need it**:
some variables is useless for reasons, I think we need a clean up.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```NONE
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
error out empty hostname
**What this PR does / why we need it**:
For linux, the hostname is read from file `/proc/sys/kernel/hostname` directly, which can be overwritten with whitespaces.
Should error out such invalid hostnames.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixeskubernetes/kubeadm#835
**Special notes for your reviewer**:
/cc luxas timothysc
**Release note**:
```release-note
nodes: improve handling of erroneous host names
```
Automatic merge from submit-queue (batch tested with PRs 62901, 66562, 66938, 66927, 66926). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
kubelet: volumemanager: poll immediate when waiting for volume attachment
Currently, `WaitForAttachAndMount()` introduces a 300ms minimum delay by using `wait.Poll()` rather than `wait.PollImmediate()`. This wait constitutes >99% of the total processing time for `syncPod()`. Changing this reduced `syncPod()` processing time for a simple busybox pod with one emptyDir volume from 302ms to 2ms.
@derekwaynecarr @pmorie @smarterclayton @jsafrane
/sig node
/release-note-none
Automatic merge from submit-queue (batch tested with PRs 66190, 66871, 66617, 66293, 66891). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Do not set cgroup parent when --cgroups-per-qos is disabled
When --cgroups-per-qos=false (default is true), kubelet sets pod
container management to podContainerManagerNoop implementation and
GetPodContainerName() returns '/' as cgroup parent (default cgroup root).
(1) In case of 'systemd' cgroup driver, '/' is invalid parent as
docker daemon expects '.slice' suffix and throws this error:
'cgroup-parent for systemd cgroup should be a valid slice named as \"xxx.slice\"'
(5fc12449d8/daemon/daemon_unix.go (L618))
'/' corresponds to '-.slice' (root slice) in systemd but I don't think
we want to assign root slice instead of runtime specific default value.
In case of docker runtime, this will be 'system.slice'
(e2593239d9/daemon/oci_linux.go (L698))
(2) In case of 'cgroupfs' cgroup driver, '/' is valid parent but I don't
think we want to assign root instead of runtime specific default value.
In case of docker runtime, this will be '/docker'
(e2593239d9/daemon/oci_linux.go (L695))
Current fix will not set the cgroup parent when --cgroups-per-qos is disabled.
```release-note
Fix pod launch by kubelet when --cgroups-per-qos=false and --cgroup-driver="systemd"
```
Automatic merge from submit-queue (batch tested with PRs 66190, 66871, 66617, 66293, 66891). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
fix nil pointer dereference in node_container_manager#enforceExisting
**What this PR does / why we need it**:
fix nil pointer dereference in node_container_manager#enforceExisting
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#66189
**Special notes for your reviewer**:
NONE
**Release note**:
```release-note
kubelet: fix nil pointer dereference while enforce-node-allocatable flag is not config properly
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Skip checking when failSwapOn=false
**What this PR does / why we need it**:
Skip checking when failSwapOn=false
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
NONE
**Release note**:
```
NONE
```
Automatic merge from submit-queue (batch tested with PRs 66850, 66902, 66779, 66864, 66912). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
add methods to apimachinery to easy unit testing
When unit testing, you often want a selective scheme and codec factory. Rather than writing the vars and the init function and the error handling, you can simply do
`scheme, codecs := testing.SchemeForInstallOrDie(install.Install)`
@kubernetes/sig-api-machinery-misc
@sttts
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 65730, 66615, 66684, 66519, 66510). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
kubelet: add image-gc low/high validation check
Currently, there is no protection against setting the high watermark <= the low watermark for image GC
This PR adds a validation rule for that.
@smarterclayton
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
move feature gate checks inside IsCriticalPod
Currently `IsCriticalPod()` calls throughout the code are protected by `utilfeature.DefaultFeatureGate.Enabled(features.ExperimentalCriticalPodAnnotation)`.
However, with Pod Priority, this gate could be disabled which skips the priority check inside IsCriticalPod().
This PR moves the feature gate checking inside `IsCriticalPod()` and handles both situations properly.
@aveshagarwal @ravisantoshgudimetla @derekwaynecarr
/sig node
/sig scheduling
/king bug
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
delete unused events
**What this PR does / why we need it**:
events (HostNetworkNotSupported, UndefinedShaper) is unused since #47058
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 66623, 66718). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
cpumanager: validate topology in static policy
**What this PR does / why we need it**:
This patch adds a check for the static policy state validation. The check fails if the CPU topology obtained from cadvisor doesn't match with the current topology in the state file.
If the CPU topology has changed in a node, cpumanager static policy might try to assign non-present cores to containers.
For example in my test case, static policy had the default CPU set of `0-1,4-7`. Then kubelet was shut down and CPU 7 was offlined. After restarting the kubelet, CPU manager tries to assign the non-existent CPU 7 to containers which don't have exclusive allocations assigned to them:
Error response from daemon: Requested CPUs are not available - requested 0-1,4-7, available: 0-6)
This breaks the exclusivity, since the CPUs from the shared pool don't get assigned to non-exclusive containers, meaning that they can execute on the exclusive CPUs.
**Release note**:
```release-note
Added CPU Manager state validation in case of changed CPU topology.
```
Test the cases where the number of CPUs available in the system is
smaller or larger than the number of CPUs known in the state, which
should lead to a panic. This covers both CPU onlining and offlining. The
case where the number of CPUs matches is already covered by the
"non-corrupted state" test.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Move the` k8s.io/kubernetes/pkg/util/pointer` package to` k8s.io/utils/pointer`
**What this PR does / why we need it**:
Move `k8s.io/kubernetes/pkg/util/pointer` to `shared utils` directory, so that we can use it easily.
Close#66010 accidentally, and can't reopen it, so the same as #66010
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
This patch adds a check for the static policy state validation. The
check fails if the CPU topology obtained from cadvisor doesn't match
with the current topology in the state file.
If the CPU topology has changed in a node, cpu manager static policy
might try to assign non-present cores to containers.
For example in my test case, static policy had the default CPU set of
0-1,4-7. Then kubelet was shut down and CPU 7 was offlined. After
restarting the kubelet, CPU manager tries to assign the non-existent CPU
7 to containers which don't have exclusive allocations assigned to them:
Error response from daemon: Requested CPUs are not available - requested 0-1,4-7, available: 0-6)
This breaks the exclusivity, since the CPUs from the shared pool don't
get assigned to non-exclusive containers, meaning that they can execute
on the exclusive CPUs.
the caching layer on endpoint is redundant.
Here are the 3 related objects in picture:
devicemanager <-> endpoint <-> plugin
Plugin is the source of truth for devices
and device health status.
devicemanager maintain healthyDevices,
unhealthyDevices, allocatedDevices based on updates
from plugin.
So there is no point for endpoint caching devices,
this patch is removing this caching layer on endpoint,
Also removing the Manager.Devices() since i didn't
find any caller of this other than test, i am adding a
notification channel to facilitate testing,
If we need to get all devices from manager in future,
it just need to return healthyDevices + unhealthyDevices,
we don't have to call endpoint after all.
This patch makes code more readable, data model been simplified.
Automatic merge from submit-queue (batch tested with PRs 66593, 66727, 66558). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
remove the outdate comments in tryRegisterWithAPIServer
**What this PR does / why we need it**:
some judgement about ExternalID removed in #61877, so remove the outdate comments in tryRegisterWithAPIServer
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 58755, 66414). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Use probe based plugin watcher mechanism in Device Manager
**What this PR does / why we need it**:
Uses this probe based utility in the device plugin manager.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#56944
**Notes For Reviewers**:
Changes are backward compatible and existing device plugins will continue to work. At the same time, any new plugins that has required support for probing model (Identity service implementation), will also work.
**Release note**
```release-note
Add support kubelet plugin watcher in device manager.
```
/sig node
/area hw-accelerators
/cc /cc @jiayingz @RenaudWasTaken @vishh @ScorpioCPH @sjenning @derekwaynecarr @jeremyeder @lichuqiang @tengqm @saad-ali @chakri-nelluri @ConnorDoyle
Automatic merge from submit-queue (batch tested with PRs 66665, 66707, 66596). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
fix kubelet npe panic on device plugin return zero container
Signed-off-by: bingshen.wbs <bingshen.wbs@alibaba-inc.com>
**What this PR does / why we need it**:
Fix kubelet panic when device plugin return zero containers. Panic logs like follows:
```
Jul 17 12:50:24 iZwz9bqgzuo4i8qu435zk8Z kubelet[25815]: /workspace/anago-v1.10.4-beta.0.68+5ca598b4ba5abb/src/k8s.io/kubernetes/_output/dockerized/go/src/
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
Jul 17 12:50:24 iZwz9bqgzuo4i8qu435zk8Z kubelet[25815]: /workspace/anago-v1.10.4-beta.0.68+5ca598b4ba5abb/src/k8s.io/kubernetes/_output/dockerized/go/src/
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
Jul 17 12:50:24 iZwz9bqgzuo4i8qu435zk8Z kubelet[25815]: /workspace/anago-v1.10.4-beta.0.68+5ca598b4ba5abb/src/k8s.io/kubernetes/_output/dockerized/go/src/
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:72
Jul 17 12:50:24 iZwz9bqgzuo4i8qu435zk8Z kubelet[25815]: E0717 12:50:24.726856 25815 runtime.go:66] Observed a panic: "index out of range" (runtime error
: index out of range)
```
**Release note**:
```
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Taint node when initializing node.
Signed-off-by: Da K. Ma <klaus1982.cn@gmail.com>
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#63897
**Release note**:
```release-note
If `TaintNodesByCondition` enabled, taint node with `TaintNodeUnschedulable` when
initializing node to avoid race condition.
```