Kubelet doesn't perform checkpointing and loses all of its internal state
after a restart. It then mistakes pods from the API server for new pods and
attempts to put them through the admission process, which may result in pods
being rejected even though they are already running on the node (e.g., in an
out-of-disk situation). This change adds a check for whether a pod has been
seen before and categorizes such pods as updates. It also removes the
freeze/unfreeze mechanism used to work around such cases, since it is no
longer needed and has not worked correctly since we switched to incremental
updates.
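A minimal sketch of the idea, assuming a simple set of previously seen pod
UIDs; the names are illustrative, not the actual kubelet code:

```go
package main

import "fmt"

type podOperation string

const (
	podAdd    podOperation = "ADD"    // new pod; goes through admission
	podUpdate podOperation = "UPDATE" // already known; skips re-admission
)

// classify decides how a pod received from the API server should be handled,
// based on whether its UID has been seen before (e.g., it was already running
// on the node when the kubelet restarted).
func classify(seen map[string]bool, podUID string) podOperation {
	if seen[podUID] {
		return podUpdate
	}
	seen[podUID] = true
	return podAdd
}

func main() {
	seen := map[string]bool{"pod-a": true}
	fmt.Println(classify(seen, "pod-a")) // UPDATE
	fmt.Println(classify(seen, "pod-b")) // ADD
}
```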
If test client connections are not closed, using `go test -count=n` makes
them pile up until the test hits the open-file limit:
client_test.go:522: expected an error, got Get http://127.0.0.1:46070/api: dial tcp 127.0.0.1:46070: socket: too many open files
Steps to reproduce (no longer fails):
godep go test -short -run '^$' -o test .
./test -test.run '^TestGetSwaggerSchema' -test.count 100
Note that this might not fail if your `ulimit -n` is not low enough.
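As a sketch of the fix pattern, assuming the descriptors are leaked by test
HTTP clients that never release their responses (the helper below is
hypothetical, not the actual test code):

```go
package main

import (
	"io"
	"net/http"
	"net/http/httptest"
)

// fetchOnce issues a request and releases the underlying connection so that
// repeated runs do not leak a file descriptor per request.
func fetchOnce(url string) error {
	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	io.Copy(io.Discard, resp.Body) // drain so the connection can be reused
	resp.Body.Close()
	// Drop idle keep-alive connections held by the default transport.
	http.DefaultClient.CloseIdleConnections()
	return nil
}

func main() {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	}))
	defer srv.Close()
	for i := 0; i < 1000; i++ {
		if err := fetchOnce(srv.URL); err != nil {
			panic(err)
		}
	}
}
```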
Enforce a minimum resource granularity of milli-{cores, bytes} for the
Storage, Memory, and CPU resource types. For Storage and Memory, milli-bytes
are allowed for backwards compatibility, but the behavior is undefined (it
depends on the Docker implementation).
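A minimal sketch of the granularity check, assuming quantities are tracked
internally in nano-units; the constant and function names are hypothetical,
not the actual validation code:

```go
package main

import "fmt"

const nanosPerMilli = 1_000_000 // one milli-unit expressed in nano-units

// finerThanMilli reports whether a quantity has precision below one
// milli-unit, which this change rejects for CPU, Memory, and Storage.
func finerThanMilli(nanos int64) bool {
	return nanos%nanosPerMilli != 0
}

func main() {
	fmt.Println(finerThanMilli(500 * nanosPerMilli)) // 500m  -> false (allowed)
	fmt.Println(finerThanMilli(1500))                // 1500n -> true  (rejected)
}
```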
I broke out the error retry logic into a named function that can be tested
independently of the rest of the event processing framework. This allows
the test to know when the retry logic is done.
The problem with the original test was that there was no reliable way to know
when it was done trying to record an event. A sentinel event was being used,
but there is no ordering guarantee. I could have added synchronization around
the attempt tracking to fix the data race, but the test case was still
fundamentally flawed and would fail occasionally.
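A minimal sketch of the shape of such a function, assuming a hypothetical
retryable error and send callback; this is not the actual client/record code:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var errRetryable = errors.New("retryable failure")

// recordWithRetry tries to deliver an event up to maxAttempts times and
// returns how many attempts were made, so a test can assert on the count
// directly instead of waiting for a sentinel event.
func recordWithRetry(send func() error, maxAttempts int, backoff time.Duration) (int, error) {
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = send(); err == nil || !errors.Is(err, errRetryable) {
			return attempt, err
		}
		time.Sleep(backoff)
	}
	return maxAttempts, err
}

func main() {
	calls := 0
	attempts, err := recordWithRetry(func() error {
		calls++
		if calls < 3 {
			return errRetryable
		}
		return nil
	}, 5, 0)
	fmt.Println(attempts, err) // 3 <nil>
}
```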
Changes include:
- Moving stats serving & routes to pkg/kubelet/server/stats/handler.go
- Managing the routes with restful.WebService, rather than manual path
  parsing (see the sketch below)
- Misc cleanup
These changes will make adding the new routes for /stats/summary more
manageable.
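A minimal sketch of registering a stats route through go-restful's
WebService; the handler, payload type, and port are placeholders, not the
actual kubelet handler:

```go
package main

import (
	"net/http"

	"github.com/emicklei/go-restful"
)

type summary struct {
	Node string `json:"node"`
}

// getSummary is a placeholder handler standing in for the real stats handler.
func getSummary(req *restful.Request, resp *restful.Response) {
	resp.WriteAsJson(summary{Node: "example-node"})
}

func main() {
	// Declare the routes once; go-restful dispatches requests instead of
	// the handler parsing URL paths by hand.
	ws := new(restful.WebService)
	ws.Path("/stats").Produces(restful.MIME_JSON)
	ws.Route(ws.GET("/summary").To(getSummary))

	container := restful.NewContainer()
	container.Add(ws)
	http.ListenAndServe(":8080", container)
}
```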
It cordons (marks unschedulable) the given node, and then deletes every
pod on it, optionally using a grace period. Pods that are managed by
neither a ReplicationController nor a DaemonSet will not be deleted
unless --force is used.
Also add cordon/uncordon, which just toggle node schedulability.