Add documentation for creating large Kubernetes clusters

saadali
2015-07-06 19:40:22 -07:00
parent 4923bb2de1
commit c4a42ea405
2 changed files with 115 additions and 8 deletions


@@ -7,6 +7,7 @@
- [How Pods with Resource Limits are Run](#how-pods-with-resource-limits-are-run)
- [Monitoring Compute Resource Usage](#monitoring-compute-resource-usage)
- [Troubleshooting](#troubleshooting)
- [Detecting Resource Starved Containers](#detecting-resource-starved-containers)
- [Planned Improvements](#planned-improvements)

When specifying a [pod](pods.md), you can optionally specify how much CPU and memory (RAM) each
container needs.
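
For example, here is a minimal sketch of a pod spec with limits set, assuming the v1 API (the limits mirror the `simmemleak` example used in the Troubleshooting section below):

```
$ cat <<EOF | ./cluster/kubectl.sh create -f -
apiVersion: v1
kind: Pod
metadata:
  name: simmemleak
spec:
  containers:
  - name: simmemleak
    image: saadali/simmemleak
    resources:
      limits:
        cpu: 100m      # 100 millicores, i.e. one tenth of a core
        memory: 50Mi   # 50 mebibytes
EOF
```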
@@ -94,18 +95,14 @@ When using Docker:
**TODO: document behavior for rkt**
If a container exceeds its memory limit, it may be terminated. If it is restartable, it will be
restarted by kubelet, as will any other type of runtime failure. If it is killed for exceeding its
memory limit, you will see the reason `OOM Killed`, as in this example:
```
$ kubectl get pods/memhog
NAME     READY  REASON      RESTARTS  AGE
memhog   0/1    OOM Killed  0         1h
```
*OOM* stands for Out Of Memory.
A container may or may not be allowed to exceed its CPU limit for extended periods of time.
However, it will not be killed for excessive CPU usage.
To determine if a container cannot be scheduled or is being killed due to resource limits, see the
"Troubleshooting" section below.
## Monitoring Compute Resource Usage
The resource usage of a pod is reported as part of the Pod status.
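
For instance, one way to read that status, sketched here against the `simmemleak-hra99` pod from the Troubleshooting section (output abridged; the exact fields available depend on your cluster version):

```
$ ./cluster/kubectl.sh get pod simmemleak-hra99 -o yaml
...
status:
  phase: Running
  podIP: 10.244.2.75
  containerStatuses:
  - name: simmemleak
    restartCount: 5
    ...
```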
@@ -140,6 +137,56 @@ The [resource quota](resource_quota_admin.md) feature can be configured
to limit the total amount of resources that can be consumed. If used in conjunction
with namespaces, it can prevent one team from hogging all the resources.
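
As a sketch of what such a quota might look like (the quota name, namespace, and values here are hypothetical; see the resource quota document for the authoritative schema):

```
$ cat <<EOF | ./cluster/kubectl.sh create -f -
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota      # hypothetical name
  namespace: team-a     # hypothetical namespace
spec:
  hard:
    cpu: "20"           # total CPU across all pods in the namespace
    memory: 10Gi        # total memory across all pods in the namespace
    pods: "10"          # maximum number of pods in the namespace
EOF
```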
### Detecting Resource Starved Containers
To check if a container is being killed because it is hitting a resource limit, call `kubectl describe pod`
on the pod you are interested in:
```
[12:54:41] $ ./cluster/kubectl.sh describe pod simmemleak-hra99
Name:                           simmemleak-hra99
Namespace:                      default
Image(s):                       saadali/simmemleak
Node:                           kubernetes-minion-tf0f/10.240.216.66
Labels:                         name=simmemleak
Status:                         Running
Reason:
Message:
IP:                             10.244.2.75
Replication Controllers:        simmemleak (1/1 replicas created)
Containers:
  simmemleak:
    Image:          saadali/simmemleak
    Limits:
      cpu:          100m
      memory:       50Mi
    State:          Running
      Started:      Tue, 07 Jul 2015 12:54:41 -0700
    Ready:          False
    Restart Count:  5
Conditions:
  Type      Status
  Ready     False
Events:
  FirstSeen                        LastSeen                         Count  From                              SubobjectPath                      Reason     Message
  Tue, 07 Jul 2015 12:53:51 -0700  Tue, 07 Jul 2015 12:53:51 -0700  1      {scheduler }                                                         scheduled  Successfully assigned simmemleak-hra99 to kubernetes-minion-tf0f
  Tue, 07 Jul 2015 12:53:51 -0700  Tue, 07 Jul 2015 12:53:51 -0700  1      {kubelet kubernetes-minion-tf0f}  implicitly required container POD  pulled     Pod container image "gcr.io/google_containers/pause:0.8.0" already present on machine
  Tue, 07 Jul 2015 12:53:51 -0700  Tue, 07 Jul 2015 12:53:51 -0700  1      {kubelet kubernetes-minion-tf0f}  implicitly required container POD  created    Created with docker id 6a41280f516d
  Tue, 07 Jul 2015 12:53:51 -0700  Tue, 07 Jul 2015 12:53:51 -0700  1      {kubelet kubernetes-minion-tf0f}  implicitly required container POD  started    Started with docker id 6a41280f516d
  Tue, 07 Jul 2015 12:53:51 -0700  Tue, 07 Jul 2015 12:53:51 -0700  1      {kubelet kubernetes-minion-tf0f}  spec.containers{simmemleak}        created    Created with docker id 87348f12526a
```
The `Restart Count: 5` indicates that the `simmemleak` container in this pod was terminated and restarted 5 times.
Once [#10861](https://github.com/GoogleCloudPlatform/kubernetes/issues/10861) is resolved, the reason for the termination of the last container will also be printed in this output.
Until then, you can call `get pod` with the `-o template -t ...` option to fetch the status of previously terminated containers:
```
[13:59:01] $ ./cluster/kubectl.sh get pod -o template -t '{{range.status.containerStatuses}}{{"Container Name: "}}{{.name}}{{"\r\nLastState: "}}{{.lastState}}{{end}}' simmemleak-60xbc
Container Name: simmemleak
LastState: map[terminated:map[exitCode:137 reason:OOM Killed startedAt:2015-07-07T20:58:43Z finishedAt:2015-07-07T20:58:43Z containerID:docker://0e4095bba1feccdfe7ef9fb6ebffe972b4b14285d5acdec6f0d3ae8a22fad8b2]]
```
We can see that this container was terminated with `reason:OOM Killed`, where *OOM* stands for Out Of Memory.
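
The restart count is also visible at a glance in the RESTARTS column of plain `kubectl get pods`; a hypothetical session (column headings vary across kubectl versions):

```
[14:02:15] $ ./cluster/kubectl.sh get pods simmemleak-hra99
NAME               READY     STATUS    RESTARTS   AGE
simmemleak-hra99   0/1       Running   5          1h
```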
## Planned Improvements
The current system only allows resource quantities to be specified on a container.