Commit Graph

247 Commits

Author SHA1 Message Date
Jiaying Zhang
35efc4f96a Reconcile extended resource capacity after kubelet restart. 2018-06-05 14:38:49 -07:00
Hemant Kumar
1f9404dfc0 Implement kubelet side changes for writing volume limit to node
Add tests for checking node limits
2018-06-01 19:17:30 -04:00
Michael Taufen
0539086ff3 add a flag to control the cap on images reported in node status
While I normally try to avoid adding flags, this is a short term
scalability fix for v1.11, and there are other long-term solutions in
the works, so we shouldn't commit to this in the v1beta1 Kubelet config.
Flags are our escape hatch.
2018-05-30 12:54:30 -07:00
xuzhonghu
9492cf368e move oldNodeUnschedulable pkg var to kubelet struct 2018-05-30 14:09:13 +08:00
Kubernetes Submit Queue
792832bafc
Merge pull request #62242 from feiskyer/pod-cidr
Automatic merge from submit-queue (batch tested with PRs 63314, 63884, 63799, 63521, 62242). If you want to cherry-pick this change to another branch, please follow the instructions at https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Check CIDR before updating node status

**What this PR does / why we need it**:

Check CIDR before updating node status.  See #62164.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #62164

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2018-05-15 19:55:19 -07:00
Kubernetes Submit Queue
8220171d8a
Merge pull request #63492 from liggitt/node-heartbeat-close-connections
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions at https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

track/close kubelet->API connections on heartbeat failure

xref #48638
xref https://github.com/kubernetes-incubator/kube-aws/issues/598

we're already typically tracking kubelet -> API connections and have the ability to force close them as part of client cert rotation. if we do that tracking unconditionally, we gain the ability to also force close connections on heartbeat failure as well. it's a big hammer (means reestablishing pod watches, etc), but so is having all your pods evicted because you didn't heartbeat.

this intentionally does minimal refactoring/extraction of the cert connection tracking transport in case we want to backport this

* first commit unconditionally sets up the connection-tracking dialer, and moves all the cert management logic inside an if-block that gets skipped if no certificate manager is provided (view with whitespace ignored to see what actually changed)
* second commit plumbs the connection-closing function to the heartbeat loop and calls it on repeated failures

follow-ups:
* consider backporting this to 1.10, 1.9, 1.8
* refactor the connection managing dialer to not be so tightly bound to the client certificate management

/sig node
/sig api-machinery

```release-note
kubelet: fix hangs in updating Node status after network interruptions/changes between the kubelet and API server
```
2018-05-14 16:56:35 -07:00
Jordan Liggitt
814b065928
Close all kubelet->API connections on heartbeat failure 2018-05-07 15:06:31 -04:00
Micah Hausler
1a218aaee2 Report node DNS info with --node-ip
```release-note
Report node DNS info with --node-ip flag
```
2018-04-27 13:18:40 -07:00
Pengfei Ni
335d70a6d1 Check CIDR before updating node status 2018-04-27 11:07:48 +08:00
Kubernetes Submit Queue
5b77996433
Merge pull request #62543 from ingvagabund/timeout-on-cloud-provider-request
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions at https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Timeout on instances.NodeAddresses cloud provider request

**What this PR does / why we need it**:

Adds a timeout for cases where the cloud provider does not respond before the node gets evicted.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:


**Release note**:
```release-note
stop kubelet to cloud provider integration potentially wedging kubelet sync loop
```
2018-04-23 09:12:42 -07:00
Jan Chaloupka
61efc29394 Timeout on instances.NodeAddresses cloud provider request 2018-04-23 13:28:43 +02:00
Kubernetes Submit Queue
48243a9c24
Merge pull request #62780 from RenaudWasTaken/master
Automatic merge from submit-queue (batch tested with PRs 62780, 62886). If you want to cherry-pick this change to another branch, please follow the instructions at https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Change Capacity log verbosity in status update

*What this PR does / why we need it:*

While in production we noticed that the log verbosity for the Capacity field in the node status was too high.
This log message is called for every device plugin resource at every update.

A proposed solution is to tune it down from V(2) to V(5). In a normal setting you'll be able to see the effect by looking at the node status.

Release note:
```
NONE
```

/sig node
/area hw-accelerators
/assign @vikaschoudhary16 @jiayingz @vishh
2018-04-20 20:06:10 -07:00
Renaud Gaubert
7297dd33bb Change Capacity log verbosity in node status update 2018-04-20 16:11:02 +02:00
Mike Danese
d02cf10123 remove last usage of external ID 2018-04-18 09:54:56 -07:00
Kubernetes Submit Queue
09ec7bf548
Merge pull request #60692 from adnavare/bug/60466
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions at https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Cleanup the use of ExternalID as it is deprecated

The patch removes ExternalID usage from node_controller
and node_lifecycle_controller. The code instead uses InstanceID
which returns the cloud provider ID as well.

fixes #60466
2018-04-09 11:58:12 -07:00
Kubernetes Submit Queue
1d030799e3
Merge pull request #61183 from ingvagabund/node-status-be-more-verbose
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions at https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Node status be more verbose

**What this PR does / why we need it**:
Improve node status logging so that node status updates are easier to debug.

```release-note
NONE
```
2018-04-06 19:25:19 -07:00
Rohit Agarwal
87dda3375b Delete in-tree support for NVIDIA GPUs.
This removes the alpha Accelerators feature gate which was deprecated in 1.10.
The alternative feature DevicePlugins went beta in 1.10.
2018-04-02 20:17:01 -07:00
Anup Navare
1335e6e2d4 Cleanup the use of ExternalID as it is deprecated
The patch removes ExternalID usage from node_controller
and node_lifecycle_controller. The code instead uses InstanceID
which returns the cloud provider ID as well.
2018-04-02 10:15:32 -07:00
Jan Chaloupka
6d820d5a66 Node status be more verbose 2018-03-14 17:02:28 +01:00
Jing Xu
b2e744c620 Promote LocalStorageCapacityIsolation feature to beta
The LocalStorageCapacityIsolation feature added a new resource type
ResourceEphemeralStorage "ephemeral-storage" so that this resource can
be allocated, limited, and consumed in the same way as CPU/memory. All
the features related to resource management (resource request/limit, quota, limitrange) are available for local ephemeral storage.

This local ephemeral storage represents the storage for the root file system, which will be consumed by containers' writable layers and logs. Some volumes such as emptyDir might also consume this storage.
2018-03-02 15:10:08 -08:00
Yang Guo
8d880506fe Support cluster-level extended resources in kubelet and kube-scheduler
Co-authored-by: Yang Guo <ygg@google.com>
Co-authored-by: Chun Chen <chenchun.feed@gmail.com>
2018-02-27 17:25:30 -08:00
wackxu
f737ad62ed update import 2018-02-27 20:23:35 +08:00
Kubernetes Submit Queue
244549f02a
Merge pull request #59769 from dashpole/capacity_ephemeral_storage
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions at https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Collect ephemeral storage capacity on initialization

**What this PR does / why we need it**:
We have had some node e2e flakes where a pod can be rejected if it requests ephemeral storage.  This is because we don't set capacity and allocatable for ephemeral storage on initialization.
This PR causes cAdvisor to do one round of stats collection during initialization, which will allow it to get the disk capacity when it first sets the node status.
It also sets the node to NotReady if capacities have not been initialized yet.

**Special notes for your reviewer**:

**Release note**:
```release-note
NONE
```
/assign @jingxu97 @Random-Liu 

/sig node
/kind bug
/priority important-soon
2018-02-16 11:17:02 -08:00
Kubernetes Submit Queue
eac5bc0035
Merge pull request #57136 from k82cn/k8s_54313
Automatic merge from submit-queue (batch tested with PRs 57136, 59920). If you want to cherry-pick this change to another branch, please follow the instructions at https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Updated PID pressure node condition.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
part of #54313 

**Release note**:

```release-note
Updated PID pressure node condition
```
2018-02-16 10:35:33 -08:00
David Ashpole
b259543985 collect ephemeral storage capacity on initialization 2018-02-15 17:33:22 -08:00
Walter Fender
e18e8ec3c0 Add context to all relevant cloud APIs
This adds context to all the relevant cloud provider interface signatures.
Callers of those APIs are currently satisfied using context.TODO().
There will be follow on PRs to push the context through the stack.
For an idea of the full scope of this change please look at PR #58532.
2018-02-06 12:49:17 -08:00
Da K. Ma
9a78753144 Updated PID pressure node condition.
Signed-off-by: Da K. Ma <madaxa@cn.ibm.com>
2018-01-14 18:26:00 +08:00
Kubernetes Submit Queue
f2e46a2147
Merge pull request #57266 from vikaschoudhary16/unhealthy_device
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions at https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Handle Unhealthy devices

Update node capacity with sum of both healthy and unhealthy devices.
Node allocatable reflects only healthy devices.



**What this PR does / why we need it**:
Currently node capacity only reflects healthy devices. Unhealthy devices are ignored totally while updating node status. This PR handles unhealthy devices while updating node status. 

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #57241

**Special notes for your reviewer**:

**Release note**:

```release-note
Handle Unhealthy devices
```
/cc @tengqm @ConnorDoyle @jiayingz @vishh @jeremyeder @sjenning @resouer @ScorpioCPH @lichuqiang @RenaudWasTaken @balajismaniam 

/sig node
2018-01-12 19:55:54 -08:00
vikaschoudhary16
e9cf3f1ac4 Handle Unhealthy devices
Update node capacity with sum of both healthy and unhealthy devices.
Node allocatable reflects only healthy devices.
2018-01-09 11:38:48 -05:00
Jonathan Basseri
30b89d830b Move scheduler code out of plugin directory.
This moves plugin/pkg/scheduler to pkg/scheduler and
plugin/cmd/kube-scheduler to cmd/kube-scheduler.

Bulk of the work was done with gomvpkg, except for kube-scheduler main
package.
2018-01-05 15:05:01 -08:00
Kubernetes Submit Queue
27d2ffb32f
Merge pull request #49856 from dixudx/polish_UpdateNodeStatus
Automatic merge from submit-queue (batch tested with PRs 49856, 56257, 57027, 57695, 57432). If you want to cherry-pick this change to another branch, please follow the instructions at https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Change to pkg/util/node.UpdateNodeStatus

**What this PR does / why we need it**:

> // TODO: Change to pkg/util/node.UpdateNodeStatus.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

**Special notes for your reviewer**:
/cc @brendandburns @dchen1107 @lavalamp 

**Release note**:

```release-note
None
```
2018-01-02 13:15:42 -08:00
stewart-yu
50520be649 completely remove the option to use auto-detect 2017-11-28 09:54:28 +08:00
Jiaying Zhang
1eb4e79453 Extends deviceplugin to gracefully handle full device plugin lifecycle.
- Instead of using cm.capacity field to communicate device plugin resource
capacity, this PR changes to use an explicit cm.GetDevicePluginResourceCapacity()
function that returns device plugin resource capacity as well as any inactive
device plugin resource. Kubelet syncNodeStatus calls this function during its
periodic run to update node status capacity and allocatable. After this call,
device plugin can remove the inactive device plugin resource from its allDevices
field as the update is already pushed to API server.
- Extends device plugin checkpoint data to record registered resources
so that we can finish resource removing even upon kubelet restarts.
- Passes sourcesReady from kubelet to device plugin to avoid removing
inactive pods during grace period of kubelet restart.
2017-11-20 23:40:14 -08:00
Michael Taufen
523c68ff65 Move ungated 'alpha' KubeletConfiguration fields and self-registration fields to KubeletFlags 2017-11-15 17:47:10 -08:00
Daneyon Hansen
7ac6fe9c5d Adds Support for Node Resource IPv6 Addressing
Adds support for the following:

1. A node resource to be assigned an IPv6 address.
2. Expands IPv4/v6 address validation checks.

Which issue this PR fixes:
fixes #44848 in combination with PR #45116

Special notes for your reviewer:

Release note:
With this PR, nodes can be assigned an IPv6 address. An IPv4 address is
preferred over an IPv6 address. IP address validation has been expanded
to check for multicast, link-local and unspecified addresses.
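
As a rough illustration of the checks named in this release note, the Go sketch below rejects unspecified, multicast, and link-local addresses using the standard `net` package, and prefers an IPv4 candidate over an IPv6 one. `validateNodeIP` and `pickNodeIP` here are hypothetical helpers written for this example. They are not the kubelet's functions, which may perform additional checks.

```go
package main

import (
	"fmt"
	"net"
)

// validateNodeIP (hypothetical) rejects addresses unsuitable for a node.
func validateNodeIP(s string) error {
	ip := net.ParseIP(s)
	if ip == nil {
		return fmt.Errorf("%q is not a valid IP address", s)
	}
	switch {
	case ip.IsUnspecified():
		return fmt.Errorf("%s is an unspecified address", s)
	case ip.IsMulticast():
		return fmt.Errorf("%s is a multicast address", s)
	case ip.IsLinkLocalUnicast():
		return fmt.Errorf("%s is a link-local address", s)
	}
	return nil
}

// pickNodeIP returns the first valid IPv4 candidate, falling back to the
// first valid IPv6 candidate when no IPv4 address is usable.
func pickNodeIP(candidates []string) (string, bool) {
	firstV6 := ""
	for _, c := range candidates {
		if validateNodeIP(c) != nil {
			continue
		}
		if net.ParseIP(c).To4() != nil {
			return c, true
		}
		if firstV6 == "" {
			firstV6 = c
		}
	}
	return firstV6, firstV6 != ""
}

func main() {
	for _, s := range []string{"10.0.0.5", "2001:db8::1", "224.0.0.1", "fe80::1", "0.0.0.0"} {
		fmt.Printf("%-12s -> %v\n", s, validateNodeIP(s))
	}
	ip, ok := pickNodeIP([]string{"2001:db8::1", "10.0.0.5"})
	fmt.Println(ip, ok)
}
```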
2017-11-10 15:13:53 -08:00
Dr. Stefan Schimanski
012b085ac8 pkg/apis/core: mechanical import fixes in dependencies 2017-11-09 12:14:08 +01:00
Dr. Stefan Schimanski
d13b936a2a pkg/apis/core: fixup conversion func names in dependencies 2017-11-09 12:14:07 +01:00
Di Xu
13a355c837 refactor method to pkg/util/node 2017-11-06 09:51:09 +08:00
fisherxu
04b876e63c fix panic in kubelet 2017-11-01 17:06:17 +08:00
Kevin
4c8539cece use core client with explicit version globally 2017-10-27 15:48:32 +08:00
Jordan Liggitt
9df1f7ef11
Do not remove kubelet labels during startup 2017-10-17 11:49:02 -04:00
Michael Taufen
8180536bed Mulligan: Remove deprecated and experimental fields from KubeletConfiguration
Revert "Merge pull request #51857 from kubernetes/revert-51307-kc-type-refactor"

This reverts commit 9d27d92420, reversing
changes made to 2e69d4e625.

See original: #51307

We punted this from 1.8 so it could go through an API review. The point
of this PR is that we are trying to stabilize the kubeletconfig API so
that we can move it out of alpha, and unblock features like Dynamic
Kubelet Config, Kubelet loading its initial config from a file instead
of flags, kubeadm and other install tools having a versioned API to rely
on, etc.

We shouldn't rev the version without both removing all the deprecated
junk from the KubeletConfiguration struct, and without (at least
temporarily) removing all of the fields that have "Experimental" in
their names. It wouldn't make sense to lock in to deprecated fields.
"Experimental" fields can be audited on a 1-by-1 basis after this PR,
and if found to be stable (or sufficiently alpha-gated), can be restored
to the KubeletConfiguration without the "Experimental" prefix.
2017-10-11 09:52:39 -07:00
Dr. Stefan Schimanski
ed586da147 apimachinery: remove Scheme.DeepCopy 2017-10-06 14:59:17 +02:00
Jiaying Zhang
6fecd04924 Fixes a regression introduced by PR 52290 in which extended resource
capacity may temporarily drop to zero after kubelet restarts and
pods restarted during that time window could fail to be scheduled.
2017-10-03 10:26:53 -07:00
Kubernetes Submit Queue
2c5413b379 Merge pull request #50422 from karataliu/apid
Automatic merge from submit-queue (batch tested with PRs 50294, 50422, 51757, 52379, 52014). If you want to cherry-pick this change to another branch, please follow the instructions at https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Fix AnnotationProvidedIPAddr annotation for externalCloudProvider

**What this PR does / why we need it**:
#44258 introduced `AnnotationProvidedIPAddr`. When kubelet has the 'node-ip' parameter set and no cloud provider set, this annotation is populated and then validated by cloud-controller-manager:
https://github.com/kubernetes/kubernetes/pull/44258/files#diff-6b0808bd1afb15f9f77986f4459601c2R465

Later, with #47152, externalCloudProvider is checked and the func returns before that annotation is set. In this case, the annotation is not populated.

This fix is to bring that annotation assignment to a proper location.

Please correct me if I have any misunderstanding.
@wlan0 @ublubu 

**Which issue this PR fixes**

**Special notes for your reviewer**:

**Release note**:
2017-09-23 11:40:47 -07:00
Kubernetes Submit Queue
3277de69b4 Merge pull request #52176 from liggitt/heartbeat-timeout
Automatic merge from submit-queue (batch tested with PRs 52176, 43152). If you want to cherry-pick this change to another branch, please follow the instructions at https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Eliminate hangs/throttling of node heartbeat

Fixes https://github.com/kubernetes/kubernetes/issues/48638
Fixes #50304

Stops kubelet from wedging when updating node status if unable to establish a TCP connection.

Note that this only affects the node status loop. The pod sync loop would still hang until the dead TCP connections timed out, so more work is needed to keep the sync loop responsive in the face of network issues, but this change lets existing pods coast without the node controller trying to evict them.

```release-note
kubelet to master communication when doing node status updates now has a timeout to prevent indefinite hangs
```
2017-09-16 09:45:29 -07:00
Jiaying Zhang
5cac9fc984 Fixes device plugin re-registration handling logic to make sure:
- If a device plugin exits, its exported resource will be removed.
- No capacity change if a new device plugin instance comes up to replace the old instance.
2017-09-14 15:24:46 -07:00
Jordan Liggitt
f8f57d8959
Use separate client for node status loop 2017-09-14 15:56:22 -04:00
Kubernetes Submit Queue
a51eb2ac4e Merge pull request #49202 from cbonte/node-addresses
Automatic merge from submit-queue (batch tested with PRs 51728, 49202)

Fix setNodeAddress when a node IP and a cloud provider are set

**What this PR does / why we need it**:
When a node IP is set and a cloud provider returns the same address with
several types, only the first address was accepted. With the changes made
in PR #45201, the vSphere cloud provider returned the ExternalIP first,
which led to a node without any InternalIP.

The behaviour is modified to return all the address types for the
specified node IP.

**Which issue this PR fixes**: fixes #48760

**Special notes for your reviewer**:
* I'm not a golang expert, is it possible to mock `kubelet.validateNodeIP()` to avoid the need of real host interface addresses in the test ?
* It would be great to have it backported for a next 1.6.8 release.

**Release note**:
```release-note
NONE
```
2017-09-06 20:01:00 -07:00
Derek Carr
1ec2a69d9a Kubelet changes to support hugepages 2017-09-05 09:46:08 -04:00
Jiaying Zhang
02001af752 Kubelet side extension to support device allocation 2017-09-01 11:56:35 -07:00
Renaud Gaubert
c4a1c97329 Device Plugin Kubelet integration 2017-09-01 11:47:09 -07:00
Cyril Bonté
2b2a5c6500 Fix setNodeAddress when a node IP and a cloud provider are set
When a node IP is set and a cloud provider returns the same address with
several types, only the first address was accepted. With the changes made
in PR #45201, the vSphere cloud provider returned the ExternalIP first,
which led to a node without any InternalIP.

The behaviour is modified to return all the address types for the
specified node IP.

Issue #48760
2017-08-29 17:09:25 +02:00
Kubernetes Submit Queue
c17d70c240 Merge pull request #47044 from kubermatic/kubelet-update-default-labels
Automatic merge from submit-queue

Always check if default labels on node need to be updated in kubelet

**What this PR does / why we need it**:
Nodes join again but maybe OS/Arch/Instance-Type has changed in the meantime.
In this case the kubelet needs to check if the default labels are still correct and if not it needs to update them.

```release-note
Kubelet updates default labels if those are deprecated
```
2017-08-28 08:20:19 -07:00
NickrenREN
27901ad5df Change eviction policy to manage one single local storage resource 2017-08-26 05:14:49 +08:00
Henrik Schmidt
80156474cf Always check if default labels on node need to be updated in kubelet 2017-08-22 12:54:07 +02:00
Connor Doyle
630af5422b OIR predicate includes namespaced resources. 2017-08-16 15:29:24 -07:00
Dong Liu
c52bdc8e74 Fix AnnotationProvidedIPAddr for externalCloudProvider 2017-08-10 10:49:55 +08:00
Kubernetes Submit Queue
58819b0204 Merge pull request #47416 from allencloud/simplify-if-else
Automatic merge from submit-queue

simplify if and else for code

Signed-off-by: allencloud <allen.sun@daocloud.io>

**What this PR does / why we need it**:
This PR tries to simplify the code of if and else, and this could make code a little bit cleaner.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
NONE

**Special notes for your reviewer**:
NONE

**Release note**:

```release-note
NONE
```
2017-08-05 03:10:10 -07:00
David Ashpole
177d64213c fix outofdisk condition not reported 2017-08-03 13:44:31 -07:00
David Ashpole
8a518099ca set nodeOODCondition 2017-07-31 11:38:20 -07:00
allencloud
6300361961 simplify if and else for code
Signed-off-by: allencloud <allen.sun@daocloud.io>
2017-07-26 10:41:23 +08:00
David Ashpole
7a23f8b018 remove deprecated flags LowDiskSpaceThresholdMB and OutOfDiskTransitionFrequency 2017-07-20 13:23:13 -07:00
Klaus Ma
63b78a37e0 Added golint check for pkg/kubelet. 2017-07-19 11:33:06 +08:00
Tim Allclair
a2f2e1d491 Name change: s/timstclair/tallclair/ 2017-07-10 14:05:46 -07:00
Vishnu kannan
82f7820066 Kubelet:
Centralize Capacity discovery of standard resources in Container manager.
Have storage derive node capacity from container manager.
Move certain cAdvisor interfaces to the cAdvisor package in the process.

This patch fixes a bug in container manager where it was writing to a map without synchronization.

Signed-off-by: Vishnu kannan <vishnuk@google.com>
2017-06-27 18:45:02 -07:00
Kubernetes Submit Queue
7800b3ffef Merge pull request #47152 from ublubu/cloud-addresses
Automatic merge from submit-queue

kubelet should let cloud-controller-manager set the node addresses

*Before this change:*

1. cloud-controller-manager sets all the addresses for a node.
2. kubelet on that node replaces these addresses with an incomplete set. (i.e. replace InternalIP and Hostname and delete all other addresses--ExternalIP, etc.)

*After this change:*

kubelet doesn't touch its node's addresses when there is an external cloudprovider.

Fixes #47155

```release-note
NONE
```
2017-06-24 09:31:15 -07:00
Chao Xu
f4989a45a5 run root-rewrite-v1-..., compile 2017-06-22 10:25:57 -07:00
Cheng Xing
de3bf36b61 Fixing node statuses related to local storage capacity isolation.
- Wrapping all node statuses from local storage capacity isolation under an alpha feature check. Currently there should not be any storage statuses.
- Replaced all "storage" statuses with "storage.kubernetes.io/scratch". "storage" should never be exposed as a status.
2017-06-20 17:34:59 -07:00
ublubu
46465c0a5a Kubelet doesn't override addrs from Cloud provider 2017-06-07 22:27:18 -04:00
Kubernetes Submit Queue
b8c9ee8abb Merge pull request #46456 from jingxu97/May/allocatable
Automatic merge from submit-queue

Add local storage (scratch space) allocatable support

This PR adds the support for allocatable local storage (scratch space).
This feature is only for root file system which is shared by kubernetes
components, users' containers and/or images. User could use
--kube-reserved flag to reserve the storage for kube system components.
If the allocatable storage for user's pods is used up, some pods will be
evicted to free the storage resource.

This feature is part of local storage capacity isolation and described in the proposal https://github.com/kubernetes/community/pull/306

**Release note**:

```release-note
This feature exposes local storage capacity for the primary partitions, and supports & enforces storage reservation in Node Allocatable 
```
2017-06-03 00:24:29 -07:00
Jing Xu
dd67e96c01 Add local storage (scratch space) allocatable support
This PR adds the support for allocatable local storage (scratch space).
This feature is only for root file system which is shared by kubernetes
components, users' containers and/or images. User could use
--kube-reserved flag to reserve the storage for kube system components.
If the allocatable storage for user's pods is used up, some pods will be
evicted to free the storage resource.
2017-06-01 15:57:50 -07:00
deads2k
954eb3ceb9 move labels to components which own the APIs 2017-05-31 10:32:06 -04:00
Derek Carr
8aaaca0f69 kubelet was sending negative allocatable values 2017-05-26 13:01:24 -04:00
Hemant Kumar
951a36aac7 Add Keepterminatedpodvolumes as an annotation on node
and make sure that the controller respects it
and doesn't detach mounted volumes.
2017-05-11 22:31:14 -04:00
wlan0
45d2bc06b7 cloud initialize node in external cloud controller 2017-05-05 16:51:45 -07:00
NickrenREN
7d00e5cfb6 remove deprecated NodeLegacyHostIP 2017-04-24 11:01:25 +08:00
Chao Xu
d4850b6c2b move pkg/api/v1/helpers.go to subpackage 2017-04-14 14:25:11 -07:00
Kubernetes Submit Queue
b0a05b4597 Merge pull request #42474 from k82cn/rm_empty_line_kl
Automatic merge from submit-queue

Removed un-necessary empty line.
2017-04-14 07:23:11 -07:00
Piotr Szczesniak
9bd05bdee4 Setup fluentd-ds-ready label in startup script not in kubelet 2017-03-16 13:18:31 +01:00
Connor Doyle
364dbc0ca5 Revert "Revert "Pods pending due to insufficient OIR should get scheduled once sufficient OIR becomes available.""
- This reverts commit 60758f3fff.
- Disabled opaque integer resource end-to-end tests.
2017-03-06 17:48:09 -08:00
Dawn Chen
60758f3fff Revert "Pods pending due to insufficient OIR should get scheduled once sufficient OIR becomes available." 2017-03-06 14:27:17 -08:00
Connor Doyle
8a42189690 Fix unbounded growth of cached OIRs in sched cache
- Added schedulercache.Resource.SetOpaque helper.
- Amend kubelet allocatable sync so that when OIRs are removed from capacity
  they are also removed from allocatable.
- Fixes #41861.
2017-03-04 09:26:22 -08:00
Klaus Ma
41c4426a30 Removed un-necessary empty line. 2017-03-03 19:43:48 +08:00
Vishnu kannan
3b0a408e3b improve gpu integration
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2017-02-28 11:27:53 -08:00
Hui-Zhi
57c77ffbdd Add support for multiple nvidia gpus 2017-02-28 11:24:48 -08:00
Vishnu Kannan
cc5f5474d5 add support for node allocatable phase 2 to kubelet
Signed-off-by: Vishnu Kannan <vishnuk@google.com>
2017-02-27 21:24:44 -08:00
Avesh Agarwal
9b640838a5 Change taint/toleration annotations to api fields. 2017-02-22 09:27:42 -05:00
Klaus Ma
cc26fe6ee9 Removed unnecessary empty line. 2017-02-06 11:10:34 +08:00
Dr. Stefan Schimanski
bc6fdd925d pkg/api/resource: move to apimachinery 2017-01-29 21:41:44 +01:00
Wojciech Tyczynski
2d0fe16463 Minor cleanup in getting from apiserver cache in kubelet 2017-01-27 15:36:37 +01:00
Mike Bryant
a507777de8 Change logging function to formatting version 2017-01-20 11:24:05 +00:00
Jordan Liggitt
e49554501f
Use versioned Taint/Toleration/AllowPods objects when marshalling 2017-01-18 12:52:14 -05:00
Clayton Coleman
9a2a50cda7
refactor: use metav1.ObjectMeta in other types 2017-01-17 16:17:19 -05:00
deads2k
77b4d55982 mechanical 2017-01-16 09:35:12 -05:00
Kubernetes Submit Queue
e73e749459 Merge pull request #39679 from errows/fix_sucessfully_typos
Automatic merge from submit-queue (batch tested with PRs 39417, 39679)

Fix 2 `sucessfully` typos

**What this PR does / why we need it**: Only fixes two typos in comments/logging

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
```
2017-01-14 19:51:09 -08:00
deads2k
6a4d5cd7cc start the apimachinery repo 2017-01-11 09:09:48 -05:00
Marc-Antoine Ross
c01c7023a0 Fix 2 sucessfully typos 2017-01-10 10:37:45 -05:00
NickrenREN
fd336cde07 drop SetNodeStatus() function and fix some function notes words
drop SetNodeStatus() since it is never called now. klet.defaultNodeStatusFuncs() is set to klet.setNodeStatusFuncs now, and the setNodeStatus() function is called by other functions.
2017-01-03 11:26:50 +08:00
Kubernetes Submit Queue
87444522d0 Merge pull request #32088 from piosz/fluentd-daemon-set
Automatic merge from submit-queue

Migrated fluentd addon to daemon set

fix #23224
supersedes #23306 

``` release-note
Migrated fluentd addon to daemon set
```
2016-12-15 23:04:40 -08:00
Piotr Szczesniak
c00e57789d Added upgrade story from manifest pod to ds 2016-12-15 13:48:32 +01:00
Wojciech Tyczynski
6e336bfab6 Use Get from cache in apiserver in kubelet 2016-12-13 17:14:56 +01:00
Seth Jennings
a40b15d8bd error in setNodeStatus func should not abort node status update 2016-12-12 09:29:24 -06:00
Kubernetes Submit Queue
43233caaf0 Merge pull request #37871 from Random-Liu/use-patch-in-kubelet
Automatic merge from submit-queue (batch tested with PRs 36692, 37871)

Use PatchStatus to update node status in kubelet.

Fixes https://github.com/kubernetes/kubernetes/issues/37771.

This PR changes kubelet to update node status with `PatchStatus`.

@caesarxuchao @ymqytw told me that there is a limitation of the current `CreateTwoWayMergePatch`: it doesn't support primitive type slices that use strategic merge.
* I checked the node status; the only primitive type slices in NodeStatus are the following, and they do not use strategic merge:
  * [`ContainerImage.Names`](https://github.com/kubernetes/kubernetes/blob/master/pkg/api/v1/types.go#L2963)
  * [`VolumesInUse`](https://github.com/kubernetes/kubernetes/blob/master/pkg/api/v1/types.go#L2909)
* Volume package is already [using `CreateStrategicMergePath` to generate node status update patch](https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/volume/attachdetach/statusupdater/node_status_updater.go#L111), and till now everything is fine. 

@yujuhong @dchen1107 
/cc @kubernetes/sig-node
2016-12-09 11:29:11 -08:00
Wojciech Tyczynski
e8d1cba875 GetOptions in client calls 2016-12-09 09:42:01 +01:00
Kubernetes Submit Queue
362dc81d8e Merge pull request #37713 from yujuhong/log_notready_message
Automatic merge from submit-queue (batch tested with PRs 36736, 35956, 35655, 37713, 38316)

Log the condition when node becomes not ready
2016-12-08 19:51:58 -08:00
Random-Liu
beba1ebbf8 Use PatchStatus to update node status in kubelet. 2016-12-08 17:13:59 -08:00
Mike Danese
e225625a80 add a configuration for kubelet to register as a node with taints
and deprecate register-schedulable
2016-12-06 10:32:54 -08:00
Clayton Coleman
3454a8d52c
refactor: update bazel, codec, and gofmt 2016-12-03 19:10:53 -05:00
Clayton Coleman
5df8cc39c9
refactor: generated 2016-12-03 19:10:46 -05:00
Wojciech Tyczynski
54d49cb404 While updating NodeStatus, only first get served from cache 2016-12-01 11:07:37 +01:00
Yu-Ju Hong
b216080980 Log the condition when node becomes not ready
This improves debuggability.
2016-11-30 12:18:08 -08:00
Pengfei Ni
f584ed4398 Fix package aliases to follow golang convention 2016-11-30 15:40:50 +08:00
Chao Xu
5e1adf91df cmd/kubelet 2016-11-23 15:53:09 -08:00
Alexander Block
ffce5dbbf4 Fix setNodeAddress in combination with cloud providers
Actually update node.Status.Addresses when the host name was provided by
the cloud provider.
2016-11-07 14:34:34 +01:00
Kubernetes Submit Queue
182a09c3c7 Merge pull request #35526 from justinsb/fix_35521_b
Automatic merge from submit-queue

kubelet bootstrap: start hostNetwork pods before we have PodCIDR

Network readiness was checked in the pod admission phase, but pods that
fail admission are not retried.  Move the check to the pod start phase.

Issue #35409 
Issue #35521
2016-11-06 12:53:14 -08:00
Kubernetes Submit Queue
4b1e36f970 Merge pull request #36190 from dashpole/revert_node_inode_pressure_split
Automatic merge from submit-queue

We only report diskpressure to users, and no longer report inodepressure

See #36180 for more information on why #33218 was reverted.
2016-11-06 03:00:34 -08:00
David Ashpole
9aca40dee6 Revert #33218. Don't need #36180. We only use diskpressure 2016-11-04 08:29:27 -07:00
Justin Santa Barbara
ab6d938247 Don't add duplicate Hostname address
If the cloudprovider returned an address of type Hostname, we shouldn't
add a duplicate one.
2016-11-04 10:00:23 -04:00
Justin Santa Barbara
f8eb179c2d Create hostNetwork pods even if network plugin not ready
We do now admit pods (unlike the first attempt), but now we will stop
non-hostnetwork pods from starting if the network is not ready.

Issue #35409
2016-11-04 00:11:55 -04:00
Magnus Kulke
b7880e7cd8 Populate NodeHostName status. 2016-11-01 01:09:50 +01:00
Connor Doyle
c93646e8da Support opaque integer resource accounting.
- Prevents kubelet from overwriting capacity during sync.
- Handles opaque integer resources in the scheduler.
  - Adds scheduler predicate tests for opaque resources.
- Validates opaque int resources:
  - Ensures supplied opaque int quantities in node capacity,
    node allocatable, pod request and pod limit are integers.
  - Adds tests for new validation logic (node update and pod spec).
- Added e2e tests for opaque integer resources.
2016-10-28 10:15:13 -07:00
Lucas Käldström
1cf00d1ff1 Remove the function of --reconcile-cidr and deprecate it 2016-10-26 20:25:35 +03:00
Jing Xu
b02481708a Fix volume states out of sync problem after kubelet restarts
When kubelet restarts, all the information about the volumes is gone
from the actual/desired states. When the node status is updated with
mounted volumes, the volume list might be empty even though volumes are
still mounted, which in turn causes the master to detach those volumes
because they are not in the mounted volumes list. This fix makes sure
the mounted volumes list is updated only after the reconciler starts its
state sync process. This sync process scans the existing volume
directories and reconstructs actual states if they are missing.

This PR also fixes a problem with orphaned pods' directories. If a
pod directory is unmounted but has not yet been deleted (e.g., the
cleanup was interrupted by a kubelet restart), the cleanup routine will
delete the directory so that the pod directory can be cleaned up (it is
safe to delete the directory since it is no longer mounted).

The third issue this PR fixes is that, when reconstructing a volume in
the actual state, the mounter must not be nil since it is required for
creating container.VolumeMap. If it is nil, it might cause a nil pointer
dereference in kubelet.

Details are in proposal PR #33203
2016-10-25 12:29:12 -07:00
Yu-Ju Hong
94f580ef03 Revert "bootstrap: Start hostNetwork pods even if network plugin not ready" 2016-10-25 08:38:59 -07:00
Kubernetes Submit Queue
3c84164bdf Merge pull request #33347 from justinsb/fix_32900
Automatic merge from submit-queue

bootstrap: Start hostNetwork pods even if network plugin not ready
2016-10-24 01:14:06 -07:00
Wojciech Tyczynski
ee73fcdadb Update kubelet_node_status.go 2016-10-22 08:44:25 +02:00
Wojciech Tyczynski
ad87989378 Kubelet getting node from apiserver cache before update. 2016-10-21 09:21:39 +02:00
Justin Santa Barbara
ad6d842a65 Create hostNetwork pods even if network plugin not ready 2016-10-17 10:12:14 -04:00
Lucas Käldström
348717c50a Remove the flannel experimental overlay 2016-10-04 11:53:53 +03:00
David Ashpole
fed3f37eef Split NodeDiskPressure into NodeInodePressure and NodeDiskPressure 2016-10-03 11:42:56 -07:00
Kubernetes Submit Queue
906cb1ce70 Merge pull request #33123 from kokhang/node-ip-cloud-provider
Automatic merge from submit-queue

Node-ip is not used when cloud provider is used

Currently --node-ip in kubelet is not being used when kubelet is configured with a cloud provider. With this fix, kubelet will get a list of IPs from the provider and parse it to return the one that matches node-ip.

This fixes #23568
2016-10-01 02:51:19 -07:00
Steve Leon
a9123de9b4 Moving validateNodeIP to kubelet_node_status.go 2016-09-30 14:07:13 -07:00
Justin Santa Barbara
54195d590f Use strongly-typed types.NodeName for a node name
We had another bug where we confused the hostname with the NodeName.

To avoid this happening again, and to make the code more
self-documenting, we use types.NodeName (a typedef alias for string)
whenever we are referring to the Node.Name.

A tedious but mechanical commit therefore, to change all uses of the
node name to use types.NodeName

Also clean up some of the (many) places where the NodeName is referred
to as a hostname (not true on AWS), or an instanceID (not true on GCE),
etc.
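
As a small sketch of the idea (not the actual types package), a distinct string type lets the compiler reject a plain hostname where a node name is expected. `NodeName` is redeclared locally, and `lookupNode` and the registry map are hypothetical.

```go
package main

import "fmt"

// NodeName is a local stand-in for a strongly-typed node name.
// It is a distinct type whose underlying type is string.
type NodeName string

// lookupNode is a hypothetical function that only accepts a NodeName,
// so a variable of type string cannot be passed without a conversion.
func lookupNode(registry map[NodeName]string, name NodeName) (string, bool) {
	addr, ok := registry[name]
	return addr, ok
}

func main() {
	registry := map[NodeName]string{"node-a": "10.0.0.1"}

	hostname := "ip-10-0-0-1.example.internal" // a plain string
	// lookupNode(registry, hostname) would not compile: hostname is a
	// string variable, not a NodeName. The conversion must be explicit.
	_, found := lookupNode(registry, NodeName(hostname))
	fmt.Println("hostname used as node name found:", found)

	addr, ok := lookupNode(registry, NodeName("node-a"))
	fmt.Println(addr, ok)
}
```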
2016-09-27 10:47:31 -04:00
Steve Leon
6efa1172f5 Node-ip is not used when cloud provider is used
This fixes #23568
2016-09-20 13:49:16 -07:00
Yu-Ju Hong
7ada99181c Limit the number of names per image reported in the node status 2016-09-16 15:16:08 -07:00
Paul Morie
67387632dc Update node status instead of node in kubelet 2016-09-02 16:24:39 -04:00
Kubernetes Submit Queue
4e1ff53bb2 Merge pull request #31730 from pmorie/kubelet-attach-detach-update
Automatic merge from submit-queue

Make it possible to enable controller-managed attach-detach on existing nodes

Fixes #31673.  Now, if a node already exists with the given name on Kubelet startup, the Kubelet will reconcile the value of the controller-managed-attach-detach annotation so that existing nodes can have this feature turned on and off by changing the Kubelet configuration.

cc @kubernetes/sig-storage @kubernetes/rh-cluster-infra
2016-09-01 07:31:18 -07:00
Paul Morie
1805d30b67 Reconcile value of controller-managed attach-detach annotation on existing nodes in Kubelet startup 2016-08-31 17:04:54 -04:00
Tim St. Clair
3808243b9e
Append "AppArmor enabled" to the Node ready condition message 2016-08-31 09:27:47 -07:00
Paul Morie
3b23b9ba9f Add log message in Kubelet when controller attach/detach is enabled 2016-08-26 12:28:37 -04:00
Paul Morie
b91ad76066 Kubelet code move: volume / util 2016-08-22 23:35:11 -04:00
Kubernetes Submit Queue
3dad8f7c06 Merge pull request #29907 from luxas/lookup_ip_better
Automatic merge from submit-queue

[kubelet] Auto-discover node IP if neither cloud provider exists and IP is not explicitly specified

One example where the earlier implementation failed is when running kubelet on CoreOS (bare-metal), where the nameserver is set to `8.8.8.8`. kubelet tries to look up the node name against Google DNS, which obviously fails. The kubelet won't recover after that.

The workaround has been to set `--hostname-override` to an IP address, but it's quite annoying to try to make a multi-distro way of getting the IP in bash, for example. This way is much cleaner.

Refactored the function a little bit at the same time

@vishh @yujuhong @resouer @Random-Liu
2016-08-06 02:26:30 -07:00
Andrey Kurilin
9f1c3a4c56 Fix various typos in kubelet 2016-08-03 01:14:44 +03:00
Lucas Käldström
25d9779f06 Make the lookup function of the node ip address more robust 2016-08-02 14:03:20 +03:00
derekwaynecarr
c3324b88a0 Eviction manager observes and acts on disk pressure 2016-07-28 16:01:38 -04:00
Paul Morie
249da77371 Extract kubelet node status into separate file 2016-07-22 01:21:30 -04:00