Commit Graph

4281 Commits

Author SHA1 Message Date
Dan Williams
60525801c1 kubelet/network: move mock network plugin to pkg/kubelet/network/testing 2017-02-16 13:48:32 -06:00
Dan Williams
5633d7423a kubelet: add network plugin manager with per-pod operation locking
The PluginManager almost duplicates the network plugin interface, but
not quite since the Init() function should be called by whatever
actually finds and creates the network plugin instance.  Only then
does it get passed off to the PluginManager.

The Manager synchronizes pod-specific network operations like setup,
teardown, and pod network status.  It passes through all other
operations so that runtimes don't have to cache the network plugin
directly, but can use the PluginManager as a wrapper.
2017-02-16 13:48:32 -06:00
Kubernetes Submit Queue
3c606cdd20 Merge pull request #41456 from dashpole/pod_volume_cleanup
Automatic merge from submit-queue (batch tested with PRs 41466, 41456, 41550, 41238, 41416)

Delay Deletion of a Pod until volumes are cleaned up

#41436 fixed the bug that caused #41095 and #40239 to have to be reverted.  Now that the bug is fixed, this shouldn't cause problems.

 @vishh @derekwaynecarr @sjenning @jingxu97 @kubernetes/sig-storage-misc
2017-02-16 10:14:05 -08:00
Kubernetes Submit Queue
11bf535e03 Merge pull request #41434 from freehan/cri-kubenet-error
Automatic merge from submit-queue (batch tested with PRs 41531, 40417, 41434)

[CRI] beef up network teardown in  StopPodSandbox

1. Added CheckpointNotFound error to allow dockershim to conduct error handling
2. Retry network teardown if failed

ref: https://github.com/kubernetes/kubernetes/issues/41225
2017-02-15 23:01:09 -08:00
Kubernetes Submit Queue
ddf4a0cad5 Merge pull request #40417 from jsravn/fix-reconciler-external-updates-race
Automatic merge from submit-queue (batch tested with PRs 41531, 40417, 41434)

Always detach volumes in operator executor

**What this PR does / why we need it**:

Instead of marking a volume as detached immediately in Kubelet's
reconciler, delegate the marking asynchronously to the operator
executor. This is necessary to prevent race conditions with other
operations mutating the same volume state.

An example of one such problem:

1. pod is created, volume is added to desired state of the world
2. reconciler process starts
3. reconciler starts MountVolume, which is kicked off asynchronously via
   operation_executor.go
4. MountVolume mounts the volume, but hasn't yet marked it as mounted
5. pod is deleted, volume is removed from desired state of the world
6. reconciler reaches detach volume section, detects volume is no longer in desired state of world,
   removes it from volumes in use
7. MountVolume tries to mark mount, throws an error because
   volume is no longer in actual state of world list. After this, kubelet isn't aware of the mount
   so doesn't try to unmount again.
8. controller-manager tries to detach the volume, this fails because it
   is still mounted to the OS.
9. EBS gets stuck indefinitely in busy state trying to detach.



**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #32881, fixes ##37854 (maybe)

**Special notes for your reviewer**:

**Release note**:

```release-note
```
2017-02-15 23:01:07 -08:00
Kubernetes Submit Queue
d60d8a7b92 Merge pull request #41104 from apprenda/kubeadm_client-go_move
Automatic merge from submit-queue

kubeadm: Migrate to client-go

**What this PR does / why we need it**: Finish the migration for kubeadm to use client-go wherever possible

**Which issue this PR fixes**: fixes #https://github.com/kubernetes/kubeadm/issues/52

**Special notes for your reviewer**: /cc @luxas @pires 

**Release note**:
```release-note
NONE
```
2017-02-15 16:01:22 -08:00
Kubernetes Submit Queue
a1afc024cb Merge pull request #34931 from nhlfr/cadvisor-container-info-table
Automatic merge from submit-queue

kubelet: Make cadvisor GetContainerInfo tests table driven
2017-02-15 15:14:28 -08:00
Derek McQuay
70e7d64b46 kubeadm: moved import to client-go, where possible
Some imports dont exist yet (or so it seems) in client-go (examples
being:

  - "k8s.io/kubernetes/pkg/api/validation"
  - "k8s.io/kubernetes/pkg/util/initsystem"
  - "k8s.io/kubernetes/pkg/util/node"

one change in kubelet to import to client-go
2017-02-15 13:06:15 -08:00
Kubernetes Submit Queue
3bc575c91f Merge pull request #33550 from rtreffer/kubelet-allow-multiple-dns-server
Automatic merge from submit-queue

Allow multipe DNS servers as comma-seperated argument for kubelet --dns

This PR explores how kubectls "--dns" could be extended to specify multiple DNS servers for in-cluster PODs. Testing on the local libvirt-coreos cluster shows that multiple DNS server are injected without issues.

Specifying multiple DNS servers increases resilience against
- Packet drops
- Single server failure

I am debugging services that do 50+ DNS requests for a single incoming interactive request, thus highly increase the chance of a slowdown (+5s) due to a single packet drop. Switching to two DNS servers will reduce the impact of the issues (roughly +1s on glibc, 0s on musl, error-rate goes down to error-rate^2).

Note that there is no need to change any runtime related code as far as I know. In the case of "default" dns the /etc/resolv.conf is parsed and multiple DNS server are send to the backend anyway. This only adds the same capability for the clusterFirst case.

I've heard from @thockin that multiple DNS entries are somehow considered. I've no idea what was considered, though. This is what I would like to see for our production use, though.

```release-note
NONE
```
2017-02-15 12:45:32 -08:00
Minhan Xia
4ca2642dd3 update bazel 2017-02-15 10:06:49 -08:00
Minhan Xia
3cc837878f retry StopPodSandbox on Network teardown failure 2017-02-15 10:06:41 -08:00
David Ashpole
1d38818326 Revert "Merge pull request #41202 from dashpole/revert-41095-deletion_pod_lifecycle"
This reverts commit ff87d13b2c, reversing
changes made to 46becf2c81.
2017-02-15 08:44:03 -08:00
Michal Rostecki
4ed087e01f kubelet: Make cadvisor GetContainerInfo tests table driven 2017-02-15 16:15:21 +01:00
Kubernetes Submit Queue
dd696683b7 Merge pull request #40647 from NickrenREN/secretManager
Automatic merge from submit-queue (batch tested with PRs 41360, 41423, 41430, 40647, 41352)

optimize NewSimpleSecretManager and cleanupOrphanedPodCgroups
2017-02-15 05:06:11 -08:00
Kubernetes Submit Queue
d47ffa08c7 Merge pull request #41423 from yujuhong/better_logging
Automatic merge from submit-queue (batch tested with PRs 41360, 41423, 41430, 40647, 41352)

kubelet: reduce extraneous logging for pods using host network

For pods using the host network, kubelet/shim should not log
error/warning messages when determining the pod IP address.
2017-02-15 05:06:08 -08:00
Kubernetes Submit Queue
3a6fa67f59 Merge pull request #39179 from NickrenREN/killpod
Automatic merge from submit-queue (batch tested with PRs 41196, 41252, 41300, 39179, 41449)

record ReduceCPULimits result err info if err returned

record ReduceCPULimits result err info if err returned for debug
2017-02-15 04:14:15 -08:00
Kubernetes Submit Queue
a57967f47b Merge pull request #41436 from dashpole/status_bug
Automatic merge from submit-queue

Fix bug in status manager TerminatePod

In TerminatePod, we previously pass pod.Status to updateStatusInternal.  This is a bug, since it is the original status that we are given.  Not only does it skip updates made to container statuses, but in some cases it reverted the pod's status to an earlier version, since it was being passed a stale status initially.

This was the case in #40239 and #41095.  As shown in #40239, the pod's status is set to running after it is set to failed, occasionally causing very long delays in pod deletion since we have to wait for this to be corrected.

This PR fixes the bug, adds some helpful debugging statements, and adds a unit test for TerminatePod (which for some reason didnt exist before?).

@kubernetes/sig-node-bugs @vish @Random-Liu
2017-02-14 21:03:31 -08:00
Kubernetes Submit Queue
c485e76fe0 Merge pull request #41378 from yujuhong/enable_cri
Automatic merge from submit-queue

Make EnableCRI default to true

This change makes kubelet to use the CRI implementation by default,
unless the users opt out explicitly by using --enable-cri=false.
For the rkt integration, the --enable-cri flag will have no effect
since rktnetes does not use CRI.

Also, mark the original --experimental-cri flag hidden and deprecated,
so that we can remove it in the next release. If both flags are specified,
the --enable-cri flag overrides the --experimental-cri flag.
2017-02-14 19:22:36 -08:00
Minhan Xia
012acad32e add CheckpointNotFound error 2017-02-14 16:55:51 -08:00
Kubernetes Submit Queue
fe4a254a70 Merge pull request #41176 from tanshanshan/fix-little2
Automatic merge from submit-queue

fix comment 

**What this PR does / why we need it**:

fix comment 

Thanks.

**Special notes for your reviewer**:

**Release note**:

```release-note
```
2017-02-14 16:41:45 -08:00
Yu-Ju Hong
fb94f441ce Set EnableCRI to true by default
This change makes kubelet to use the CRI implementation by default,
unless the users opt out explicitly by using --enable-cri=false.
For the rkt integration, the --enable-cri flag will have no effect
since rktnetes does not use CRI.

Also, mark the original --experimental-cri flag hidden and deprecated,
so that we can remove it in the next release.
2017-02-14 16:15:51 -08:00
Yu-Ju Hong
77286c38d3 kubelet: reduce extraneous logging for pods using host network
For pods using the host network, kubelet/shim should not log
error/warning messages when determining the pod IP address.
2017-02-14 16:09:42 -08:00
David Ashpole
c612e09acd use the status we modify, not original 2017-02-14 13:36:20 -08:00
Yu-Ju Hong
9fa1ad29fd kubelet: handle containers in the "created" state 2017-02-14 07:51:35 -08:00
NickrenREN
31bfefca3c optimize NewSimpleSecretManager and cleanupOrphanedPodCgroups
remove NewSimpleSecretManager second return value and cleanupOrphanedPodCgroups's return since they will never return err
2017-02-14 09:47:05 +08:00
James Ravn
9992bd23c2 Mark detached only if no pending operations
To safely mark a volume detached when the volume controller manager is used.

An example of one such problem:

1. pod is created, volume is added to desired state of the world
2. reconciler process starts
3. reconciler starts MountVolume, which is kicked off asynchronously via
   operation_executor.go
4. MountVolume mounts the volume, but hasn't yet marked it as mounted
5. pod is deleted, volume is removed from desired state of the world
6. reconciler detects volume is no longer in desired state of world,
   removes it from volumes in use
7. MountVolume tries to mark volume in use, throws an error because
   volume is no longer in actual state of world list.
8. controller-manager tries to detach the volume, this fails because it
   is still mounted to the OS.
9. EBS gets stuck indefinitely in busy state trying to detach.
2017-02-13 11:51:44 +00:00
Pengfei Ni
81b9064ca4 Fix typo of defualt 2017-02-11 22:28:24 +08:00
Random-Liu
8177030957 Change timeout for ExecSync, RunPodSandbox and PullImage. 2017-02-10 16:09:19 -08:00
Kubernetes Submit Queue
e9de1b0221 Merge pull request #40992 from k82cn/rm_empty_line
Automatic merge from submit-queue (batch tested with PRs 41236, 40992)

Removed unnecessarly empty line.
2017-02-10 05:38:42 -08:00
Kubernetes Submit Queue
8188c3cca4 Merge pull request #40796 from wojtek-t/use_node_ttl_in_secret_manager
Automatic merge from submit-queue (batch tested with PRs 40796, 40878, 36033, 40838, 41210)

Implement TTL controller and use the ttl annotation attached to node in secret manager

For every secret attached to a pod as volume, Kubelet is trying to refresh it every sync period. Currently Kubelet has a ttl-cache of secrets of its pods and the ttl is set to 1 minute. That means that in large clusters we are targetting (5k nodes, 30pods/node), given that each pod has a secret associated with ServiceAccount from its namespaces, and with large enough number of namespaces (where on each node (almost) every pod is from a different namespace), that resource in ~30 GETs to refresh all secrets every minute from one node, which gives ~2500QPS for GET secrets to apiserver.

Apiserver cannot keep up with it very easily.

Desired solution would be to watch for secret changes, but because of security we don't want a node watching for all secrets, and it is not possible for now to watch only for secrets attached to pods from my node.

So as a temporary solution, we are introducing an annotation that would be a suggestion for kubelet for the TTL of secrets in the cache and a very simple controller that would be setting this annotation based on the cluster size (the large cluster is, the bigger ttl is). 
That workaround mean that only very local changes are needed in Kubelet, we are creating a well separated very simple controller, and once watching "my secrets" will be possible it will be easy to remove it and switch to that. And it will allow us to reach scalability goals.

@dchen1107 @thockin @liggitt
2017-02-10 00:04:44 -08:00
Kubernetes Submit Queue
76b39431d3 Merge pull request #41147 from derekwaynecarr/improve-eviction-logs
Automatic merge from submit-queue (batch tested with PRs 41074, 41147, 40854, 41167, 40045)

Add debug logging to eviction manager

**What this PR does / why we need it**:
This PR adds debug logging to eviction manager.

We need it to help users understand when/why eviction manager is/is not making decisions to support information gathering during support.
2017-02-09 17:41:41 -08:00
Kubernetes Submit Queue
f5c07157a8 Merge pull request #41092 from yujuhong/cri-docker1_10
Automatic merge from submit-queue (batch tested with PRs 41037, 40118, 40959, 41084, 41092)

CRI node e2e: add tests for docker 1.10
2017-02-09 16:44:44 -08:00
David Ashpole
b224f83c37 Revert "[Kubelet] Delay deletion of pod from the API server until volumes are deleted" 2017-02-09 08:45:18 -08:00
Wojciech Tyczynski
6c0535a939 Use secret TTL annotation in secret manager 2017-02-09 13:53:32 +01:00
tanshanshan
46b1256d34 fix wrong 2017-02-09 12:19:32 +08:00
Kubernetes Submit Queue
42d8d4ca88 Merge pull request #40948 from freehan/cri-hostport
Automatic merge from submit-queue (batch tested with PRs 40873, 40948, 39580, 41065, 40815)

[CRI] Enable Hostport Feature for Dockershim

Commits:
1. Refactor common hostport util logics and add more tests

2. Add HostportManager which can ADD/DEL hostports instead of a complete sync.

3. Add Interface for retreiving portMappings information of a pod in Network Host interface. 
Implement GetPodPortMappings interface in dockerService. 

4. Teach kubenet to use HostportManager
2017-02-08 14:14:43 -08:00
Derek Carr
0171121486 Add debug logging to eviction manager 2017-02-08 15:01:12 -05:00
Yu-Ju Hong
f96611ac45 dockershim: set the default cgroup driver 2017-02-08 10:22:19 -08:00
Minhan Xia
be9eca6b51 teach kubenet to use hostport_manager 2017-02-08 09:35:04 -08:00
Minhan Xia
bd05e1af2b add portmapping getter into network host 2017-02-08 09:35:04 -08:00
Minhan Xia
8e7219cbb4 add hostport manager 2017-02-08 09:26:52 -08:00
Minhan Xia
aabdaa984f refactor hostport logic 2017-02-08 09:26:52 -08:00
David Ashpole
67cb2704c5 delete volumes before pod deletion 2017-02-08 07:34:49 -08:00
Kubernetes Submit Queue
c6f30d3750 Merge pull request #41105 from Random-Liu/fix-kuberuntime-log
Automatic merge from submit-queue (batch tested with PRs 38796, 40823, 40756, 41083, 41105)

Let ReadLogs return when there is a read error.

Fixes a bug in kuberuntime log.

Today, @yujuhong found that once we cancel `kubectl logs -f` with `Ctrl+C`, kuberuntime will keep complaining:
```
27939 kuberuntime_logs.go:192] Failed with err write tcp 10.240.0.4:10250->10.240.0.2:53913: write: broken pipe when writing log for log file "/var/log/pods/5bb76510-ed71-11e6-ad02-42010af00002/busybox_0.log": &{timestamp:{sec:63622095387 nsec:625309193 loc:0x484c440} stream:stdout log:[84 117 101 32 70 101 98 32 32 55 32 50 48 58 49 54 58 50 55 32 85 84 67 32 50 48 49 55 10]}
```

This is because kuberuntime keeps writing to the connection even though it is already closed. Actually, kuberuntime should return and report error whenever there is a writing error.

Ref the [docker code](3a4ae1f661/pkg/stdcopy/stdcopy.go (L159-L167))

I'm still creating the cluster and verifying this fix. Will post the result here after that.

/cc @yujuhong @kubernetes/sig-node-bugs
2017-02-08 00:49:51 -08:00
Kubernetes Submit Queue
1b7bdde40b Merge pull request #38796 from chentao1596/kubelet-cni-log
Automatic merge from submit-queue (batch tested with PRs 38796, 40823, 40756, 41083, 41105)

kubelet/network-cni-plugin: modify the log's info

**What this PR does / why we need it**:

  Checking the startup logs of kubelet, i can always find a error like this:
    "E1215 10:19:24.891724    2752 cni.go:163] error updating cni config: No networks found in /etc/cni/net.d"
 
 It will appears, neither i use cni network-plugin or not. 
After analysis codes, i thought it should be a warn log, because it will not produce any actions like as exit or abort, and just ignored when not any valid plugins exit.

 thank you!
2017-02-08 00:49:44 -08:00
Kubernetes Submit Queue
843e6d1cc3 Merge pull request #40770 from apilloud/clientset_interface
Automatic merge from submit-queue (batch tested with PRs 41103, 41042, 41097, 40946, 40770)

Use Clientset interface in KubeletDeps

**What this PR does / why we need it**:
This replaces the Clientset struct with the equivalent interface for the KubeClient injected via KubeletDeps. This is useful for testing and for accessing the Node and Pod status event stream without an API server.

**Special notes for your reviewer**:
Follow up to #4907

**Release note**:

`NONE`
2017-02-07 22:12:39 -08:00
Kubernetes Submit Queue
51b8eb9424 Merge pull request #40946 from yujuhong/docker_sep
Automatic merge from submit-queue (batch tested with PRs 41103, 41042, 41097, 40946, 40770)

dockershim: set security option separators based on the docker version

Also add a version cache to avoid hitting the docker daemon frequently.

This is part of #38164
2017-02-07 22:12:37 -08:00
Random-Liu
65190e2a72 Let ReadLogs return when there is a read error. 2017-02-07 15:43:48 -08:00
Yu-Ju Hong
e66dd63b05 Add OWNERS to the dockertools package 2017-02-07 11:31:37 -08:00
Yu-Ju Hong
d8e29e782f dockershim: set security option separators based on the docker version
Also add a version cache to avoid hitting the docker daemon frequently.
2017-02-07 11:06:40 -08:00