Commit Graph

1594 Commits

Author SHA1 Message Date
wlan0
45d2bc06b7 cloud initialize node in external cloud controller 2017-05-05 16:51:45 -07:00
Kubernetes Submit Queue
f6ec7bade1 Merge pull request #45316 from yujuhong/dockershim-plugin-settings
Automatic merge from submit-queue (batch tested with PRs 45316, 45341)

Pass NoOpLegacyHost to dockershim in --experimental-dockershim mode

This allows dockershim to use network plugins, if needed.

/cc @Random-Liu
2017-05-04 05:19:49 -07:00
Yu-Ju Hong
40b0474956 pass noopnetworkhost to dockershim 2017-05-03 16:32:01 -07:00
Yu-Ju Hong
78b2c3b4c2 kuberuntime: remove the unused network plugin
Network plugin is completely handled by the container runtimes. Remove
this unused field in the kuberuntime manager.
2017-05-03 16:21:46 -07:00
Yu-Ju Hong
93ecaf6812 Move exec.go from dockertools to dockershim 2017-05-01 16:00:46 -07:00
Yu-Ju Hong
9f3184c5a4 Remove DockerManager from kubelet
This commit deletes code in dockertools that is only used by
DockerManager. A follow-up change will rename and clean up the rest of
the files in this package.

The commit also sets EnableCRI to true if the container runtime is not
rkt. A follow-up change will remove the flag/field and all references to
it.
2017-05-01 12:14:50 -07:00
Lee Verberne
d22dd0fa35 Implement shared PID namespace in the dockershim 2017-04-27 23:43:53 +00:00
Klaus Ma
6d29cfc0cc Registered node before other initialization. 2017-04-18 10:43:56 +08:00
Yu-Ju Hong
1d3d12dfc2 Don't check runtime condition for rktnetes
rktnetes is not a CRI implementation, and does not provide runtime
conditions. This change fixes the issue where rkt will never be
considered running from kubelet's point of view.
2017-04-17 11:33:58 -07:00
Andy Goldstein
00e11566f2 Make the dockershim root directory configurable
Make the dockershim root directory configurable so things like
integration tests (e.g. in OpenShift) can run as non-root.
2017-04-12 09:06:21 -04:00
Andy Goldstein
010b71a5f7 kubelet: make dockershim.sock configurable
Make the location of dockershim.sock configurable, so downstream
projects (such as OpenShift) can place it in a location that does not
require root access (e.g. for integration tests).

Make the kubelet respect and use the values of
--container-runtime-endpoint and --image-service-endpoint, if set. If
unset, the default value of /var/run/dockershim.sock is used.
2017-04-06 12:01:21 -04:00
Kubernetes Submit Queue
faf2eca226 Merge pull request #42916 from dashpole/misleading_log
Automatic merge from submit-queue

Clearer ImageGC failure errors.  Fewer events.

Addresses #26000.  Kubelet often "fails" image garbage collection if cAdvisor has not completed the first round of stats collection.  Don't create events for a single failure, and make log messages more specific.

@kubernetes/sig-node-bugs
2017-04-04 11:23:32 -07:00
Michael Taufen
f5eed7e91d Add a separate flags struct for Kubelet flags
Kubelet flags are not necessarily appropriate for the KubeletConfiguration
object. For example, this PR also removes HostnameOverride and NodeIP
from KubeletConfiguration. This is a preleminary step to enabling Nodes
to share configurations, as part of the dynamic Kubelet configuration
feature (#29459). Fields that must be unique for each node inhibit
sharing, because their values, by definition, cannot be shared.
2017-04-03 13:28:29 -07:00
David Ashpole
2cd65ea863 only create event for multiple imagegc failures 2017-03-30 16:19:18 -07:00
NickrenREN
2f89a6bda6 optimize getPullSecretsForPod() and syncPod()
Since getPullSecretsForPod() will never return err,we do not need the second return value,and modify syncPod() function.
2017-03-25 11:05:13 +08:00
Vishnu kannan
ff158090b3 use active pods instead of runtime pods in gpu manager
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2017-03-13 10:58:26 -07:00
Andy Goldstein
b011529d8a Add pprof trace support
Add pprof trace support and --enable-contention-profiling to those
components that don't already have it.
2017-03-07 10:10:42 -05:00
Kubernetes Submit Queue
4bbf98850f Merge pull request #42500 from vishh/fix-gpu-init
Automatic merge from submit-queue

[Bug] Fix gpu initialization in Kubelet

Kubelet incorrectly fails if `AllAlpha=true` feature gate is enabled with container runtimes that are not `docker`.

Replaces #42407
2017-03-04 20:28:08 -08:00
Vishnu kannan
038585626d fix gpu initialization
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2017-03-03 12:13:01 -08:00
David Ashpole
ac612eab8e eviction manager changes for allocatable 2017-03-02 07:36:24 -08:00
Kubernetes Submit Queue
fa0387c9fe Merge pull request #42195 from Random-Liu/cri-support-non-json-logging
Automatic merge from submit-queue (batch tested with PRs 41931, 39821, 41841, 42197, 42195)

Use `docker logs` directly if the docker logging driver is not `json-file`

Fixes https://github.com/kubernetes/kubernetes/issues/41996.

Post the PR first, I still need to manually test this, because we don't have test coverage for journald logging pluggin.

@yujuhong @dchen1107 
/cc @kubernetes/sig-node-pr-reviews
2017-03-01 20:08:08 -08:00
Random-Liu
7c261bfed7 Use docker logs directly if the docker logging driver is not
supported.
2017-03-01 10:50:11 -08:00
vefimova
fc8a37ec86 Added ability for Docker containers to set usage of dns settings along with hostNetwork is true
Introduced chages:
   1. Re-writing of the resolv.conf file generated by docker.
      Cluster dns settings aren't passed anymore to docker api in all cases, not only for pods with host network:
      the resolver conf will be overwritten after infra-container creation to override docker's behaviour.

   2. Added new one dnsPolicy - 'ClusterFirstWithHostNet', so now there are:
      - ClusterFirstWithHostNet - use dns settings in all cases, i.e. with hostNet=true as well
      - ClusterFirst - use dns settings unless hostNetwork is true
      - Default

Fixes #17406
2017-03-01 17:10:00 +00:00
Kubernetes Submit Queue
ed479163fa Merge pull request #42116 from vishh/gpu-experimental-support
Automatic merge from submit-queue

Extend experimental support to multiple Nvidia GPUs

Extended from #28216

```release-note
`--experimental-nvidia-gpus` flag is **replaced** by `Accelerators` alpha feature gate along with  support for multiple Nvidia GPUs. 
To use GPUs, pass `Accelerators=true` as part of `--feature-gates` flag.
Works only with Docker runtime.
```

1. Automated testing for this PR is not possible since creation of clusters with GPUs isn't supported yet in GCP.
1. To test this PR locally, use the node e2e.
```shell
TEST_ARGS='--feature-gates=DynamicKubeletConfig=true' FOCUS=GPU SKIP="" make test-e2e-node
```

TODO:

- [x] Run manual tests
- [x] Add node e2e
- [x] Add unit tests for GPU manager (< 100% coverage)
- [ ] Add unit tests in kubelet package
2017-03-01 04:52:50 -08:00
Vishnu kannan
2554b95994 Map nvidia devices one to one.
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2017-02-28 13:42:08 -08:00
Vishnu kannan
69acb02394 use feature gate instead of flag to control support for GPUs
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2017-02-28 13:42:07 -08:00
Vishnu kannan
3b0a408e3b improve gpu integration
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2017-02-28 11:27:53 -08:00
Hui-Zhi
57c77ffbdd Add support for multiple nvidia gpus 2017-02-28 11:24:48 -08:00
Seth Jennings
b9adb66426 kubelet: cm: refactor QoS logic into seperate interface 2017-02-28 09:19:29 -06:00
Vishnu Kannan
cc5f5474d5 add support for node allocatable phase 2 to kubelet
Signed-off-by: Vishnu Kannan <vishnuk@google.com>
2017-02-27 21:24:44 -08:00
Kubernetes Submit Queue
16f87fe7d8 Merge pull request #40952 from dashpole/premption
Automatic merge from submit-queue (batch tested with PRs 41994, 41969, 41997, 40952, 40576)

Guaranteed admission for Critical Pods

This is the first step in implementing node-level preemption for critical pods.
It defines the AdmissionFailureHandler interface, which allows callers, like the kubelet, to define how failed predicates are handled, and take steps to correct failures if necessary.
In the kubelet's implementation, it triggers preemption if the pod being admitted is critical, and if the only failed predicates are InsufficientResourceErrors, then it prempts (not yet implemented) other other pods to allow admission of the critical pod.

cc: @vishh
2017-02-26 12:57:59 -08:00
Kubernetes Submit Queue
067f92e789 Merge pull request #41801 from riverzhang/patch-1
Automatic merge from submit-queue (batch tested with PRs 41854, 41801, 40088, 41590, 41911)

Fix  some typos

**Release note**:

```release-note
```
2017-02-25 05:02:53 -08:00
David Ashpole
c58970e47c critical pods can preempt other pods to be admitted 2017-02-23 10:31:20 -08:00
Andy Goldstein
9d8d6ad16c Switch scheduler to use generated listers/informers
Where possible, switch the scheduler to use generated listers and
informers. There are still some places where it probably makes more
sense to use one-off reflectors/informers (listing/watching just a
single node, listing/watching scheduled & unscheduled pods using a field
selector).
2017-02-23 09:57:12 -05:00
riverzhang
5156b7f8cf Fix some typos 2017-02-21 07:15:40 -06:00
Kubernetes Submit Queue
05c05de798 Merge pull request #41569 from yujuhong/add_healthcheck
Automatic merge from submit-queue (batch tested with PRs 38101, 41431, 39606, 41569, 41509)

Report node not ready on failed PLEG health check

Report node not ready if PLEG health check fails.
2017-02-16 15:49:18 -08:00
Kubernetes Submit Queue
6376ad134d Merge pull request #39606 from NickrenREN/kubelet-pod
Automatic merge from submit-queue (batch tested with PRs 38101, 41431, 39606, 41569, 41509)

optimize killPod() and syncPod() functions

make sure that one of the two arguments must be non-nil: runningPod, status ,just like the function note says
and judge the return value in syncPod() function before setting podKilled
2017-02-16 15:49:17 -08:00
Kubernetes Submit Queue
3c606cdd20 Merge pull request #41456 from dashpole/pod_volume_cleanup
Automatic merge from submit-queue (batch tested with PRs 41466, 41456, 41550, 41238, 41416)

Delay Deletion of a Pod until volumes are cleaned up

#41436 fixed the bug that caused #41095 and #40239 to have to be reverted.  Now that the bug is fixed, this shouldn't cause problems.

 @vishh @derekwaynecarr @sjenning @jingxu97 @kubernetes/sig-storage-misc
2017-02-16 10:14:05 -08:00
Yu-Ju Hong
5bb43a3a24 Report node not ready on failed PLEG health check 2017-02-16 09:00:22 -08:00
NickrenREN
b40e575076 optimize killPod() and syncPod() functions
make sure that one of the two arguments must be non-nil: runningPod, status ,just like the function note says
and judge the return value in syncPod() function before setting podKilled
2017-02-16 09:13:23 +08:00
Kubernetes Submit Queue
3bc575c91f Merge pull request #33550 from rtreffer/kubelet-allow-multiple-dns-server
Automatic merge from submit-queue

Allow multipe DNS servers as comma-seperated argument for kubelet --dns

This PR explores how kubectls "--dns" could be extended to specify multiple DNS servers for in-cluster PODs. Testing on the local libvirt-coreos cluster shows that multiple DNS server are injected without issues.

Specifying multiple DNS servers increases resilience against
- Packet drops
- Single server failure

I am debugging services that do 50+ DNS requests for a single incoming interactive request, thus highly increase the chance of a slowdown (+5s) due to a single packet drop. Switching to two DNS servers will reduce the impact of the issues (roughly +1s on glibc, 0s on musl, error-rate goes down to error-rate^2).

Note that there is no need to change any runtime related code as far as I know. In the case of "default" dns the /etc/resolv.conf is parsed and multiple DNS server are send to the backend anyway. This only adds the same capability for the clusterFirst case.

I've heard from @thockin that multiple DNS entries are somehow considered. I've no idea what was considered, though. This is what I would like to see for our production use, though.

```release-note
NONE
```
2017-02-15 12:45:32 -08:00
David Ashpole
1d38818326 Revert "Merge pull request #41202 from dashpole/revert-41095-deletion_pod_lifecycle"
This reverts commit ff87d13b2c, reversing
changes made to 46becf2c81.
2017-02-15 08:44:03 -08:00
Kubernetes Submit Queue
dd696683b7 Merge pull request #40647 from NickrenREN/secretManager
Automatic merge from submit-queue (batch tested with PRs 41360, 41423, 41430, 40647, 41352)

optimize NewSimpleSecretManager and cleanupOrphanedPodCgroups
2017-02-15 05:06:11 -08:00
Yu-Ju Hong
fb94f441ce Set EnableCRI to true by default
This change makes kubelet to use the CRI implementation by default,
unless the users opt out explicitly by using --enable-cri=false.
For the rkt integration, the --enable-cri flag will have no effect
since rktnetes does not use CRI.

Also, mark the original --experimental-cri flag hidden and deprecated,
so that we can remove it in the next release.
2017-02-14 16:15:51 -08:00
NickrenREN
31bfefca3c optimize NewSimpleSecretManager and cleanupOrphanedPodCgroups
remove NewSimpleSecretManager second return value and cleanupOrphanedPodCgroups's return since they will never return err
2017-02-14 09:47:05 +08:00
Kubernetes Submit Queue
e9de1b0221 Merge pull request #40992 from k82cn/rm_empty_line
Automatic merge from submit-queue (batch tested with PRs 41236, 40992)

Removed unnecessarly empty line.
2017-02-10 05:38:42 -08:00
Kubernetes Submit Queue
8188c3cca4 Merge pull request #40796 from wojtek-t/use_node_ttl_in_secret_manager
Automatic merge from submit-queue (batch tested with PRs 40796, 40878, 36033, 40838, 41210)

Implement TTL controller and use the ttl annotation attached to node in secret manager

For every secret attached to a pod as volume, Kubelet is trying to refresh it every sync period. Currently Kubelet has a ttl-cache of secrets of its pods and the ttl is set to 1 minute. That means that in large clusters we are targetting (5k nodes, 30pods/node), given that each pod has a secret associated with ServiceAccount from its namespaces, and with large enough number of namespaces (where on each node (almost) every pod is from a different namespace), that resource in ~30 GETs to refresh all secrets every minute from one node, which gives ~2500QPS for GET secrets to apiserver.

Apiserver cannot keep up with it very easily.

Desired solution would be to watch for secret changes, but because of security we don't want a node watching for all secrets, and it is not possible for now to watch only for secrets attached to pods from my node.

So as a temporary solution, we are introducing an annotation that would be a suggestion for kubelet for the TTL of secrets in the cache and a very simple controller that would be setting this annotation based on the cluster size (the large cluster is, the bigger ttl is). 
That workaround mean that only very local changes are needed in Kubelet, we are creating a well separated very simple controller, and once watching "my secrets" will be possible it will be easy to remove it and switch to that. And it will allow us to reach scalability goals.

@dchen1107 @thockin @liggitt
2017-02-10 00:04:44 -08:00
David Ashpole
b224f83c37 Revert "[Kubelet] Delay deletion of pod from the API server until volumes are deleted" 2017-02-09 08:45:18 -08:00
Wojciech Tyczynski
6c0535a939 Use secret TTL annotation in secret manager 2017-02-09 13:53:32 +01:00
Kubernetes Submit Queue
42d8d4ca88 Merge pull request #40948 from freehan/cri-hostport
Automatic merge from submit-queue (batch tested with PRs 40873, 40948, 39580, 41065, 40815)

[CRI] Enable Hostport Feature for Dockershim

Commits:
1. Refactor common hostport util logics and add more tests

2. Add HostportManager which can ADD/DEL hostports instead of a complete sync.

3. Add Interface for retreiving portMappings information of a pod in Network Host interface. 
Implement GetPodPortMappings interface in dockerService. 

4. Teach kubenet to use HostportManager
2017-02-08 14:14:43 -08:00