Fix bad config in flaky test documentation and add script to help check
for flakes.
@@ -11,7 +11,7 @@ There is a testing image ```brendanburns/flake``` up on the docker hub. We will
 
 Create a replication controller with the following config:
 ```yaml
-id: flakeController
+id: flakecontroller
 kind: ReplicationController
 apiVersion: v1beta1
 desiredState:
@@ -41,14 +41,26 @@ labels:
 
 ```./cluster/kubectl.sh create -f controller.yaml```
 
-This will spin up 100 instances of the test. They will run to completion, then exit, the kubelet will restart them, eventually you will have sufficient
-runs for your purposes, and you can stop the replication controller by setting the ```replicas``` field to 0 and then running:
+This will spin up 24 instances of the test. They will run to completion, then exit, and the kubelet will restart them, accumulating more and more runs of the test.
+You can examine the recent runs of the test by calling ```docker ps -a``` and looking for tasks that exited with non-zero exit codes. Unfortunately, docker ps -a only keeps around the exit status of the last 15-20 containers with the same image, so you have to check them frequently.
+You can use this script to automate checking for failures, assuming your cluster is running on GCE and has four nodes:
 
 ```sh
-./cluster/kubectl.sh update -f controller.yaml
-./cluster/kubectl.sh delete -f controller.yaml
+echo "" > output.txt
+for i in {1..4}; do
+echo "Checking kubernetes-minion-${i}"
+echo "kubernetes-minion-${i}:" >> output.txt
+gcloud compute ssh "kubernetes-minion-${i}" --command="sudo docker ps -a" >> output.txt
+done
+grep "Exited ([^0])" output.txt
 ```
 
-Now examine the machines with ```docker ps -a``` and look for tasks that exited with non-zero exit codes (ignore those that exited -1, since that's what happens when you stop the replica controller)
+Eventually you will have sufficient runs for your purposes. At that point you can stop and delete the replication controller by running:
+
+```sh
+./cluster/kubectl.sh stop replicationcontroller flakecontroller
+```
+
+If you do a final check for flakes with ```docker ps -a```, ignore tasks that exited -1, since that's what happens when you stop the replication controller.
 
 Happy flake hunting!
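
A note on the `grep "Exited ([^0])"` check in the script: the pattern matches exactly one non-zero character between the parentheses, so it skips `Exited (-1)` but would also appear to skip multi-digit codes such as `Exited (137)`. Below is a minimal sketch of a broader filter, assuming `docker ps -a` prints statuses in the form `Exited (N) ...`. The function name `find_flakes` is hypothetical and not part of this commit.

```sh
# Hypothetical helper, not part of the commit.
# Prints lines from the given file whose exit status is a positive integer.
# "Exited (0)" (success) and "Exited (-1)" (controller stopped) are skipped,
# while multi-digit codes such as "Exited (137)" are kept.
find_flakes() {
  grep -E 'Exited \([1-9][0-9]*\)' "$1"
}
```

For example, `find_flakes output.txt` could stand in for the final `grep` line of the script.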