Addresses a version skew issue where the last condition status is always
evaluated as the NodeReady status. As a workaround force the NodeReady
condition to be the last in the list of node conditions.
ref: https://github.com/kubernetes/kubernetes/issues/16961
This change introduces pod lifecycle event generator (PLEG), and adds a generic
PLEG. The generic PLEG relies on relisting to discover container events, and is
container-runtime-agnostic. Both docker and rkt are changed to use generic
PLEG.
- status.Manager always deals with the local (static) pod, but gets the
mirror pod when syncing
- This lets components like the probe workers ignore mirror pods
Now that kubelet checks sources seen correctly, there is no need to enforce the
initial order of pod updates and housekeeping. Use a ticker for housekeeping to
simplify the code.
Currently kubelet syncs all pods every 10s. This is not preferred because
* Some pods may have been sync'd recently.
* This may cause all the pods to be sync'd at once, causing undesirable
CPU spikes.
This PR replaces the global syncs with independent, periodic pod syncs. At the
end of syncing, each pod worker will enqueue itslef with a future timestamp (
current time + sync interval), when it will be due for another sync.
* If the pod worker encoutners an sync error, it may requeue with a different
timestamp to retry sooner.
* If a sync is triggered by the update channel (events or spec changes), the
pod worker would enqueue a new sync time.
This change is necessary for moving to long or no periodic sync period once pod
lifecycle event generator is completed. We will still rely on the mechanism to
requeue the pod on sync error.
This change also makes sure that if a sync does not succeed (either due to
real error or the per-container backoff mechanism), an error would be propagated
back to the pod worker, which is responsible for requeuing.
Define a new out of disk node condition and use it to report when node
goes out of disk.
Make a copy of loop range clause variable in node listers so that it
is available outside the for loop.
Also update/implement unit tests.
This commit builds on previous work and creates an independent
worker for every liveness probe. Liveness probes behave largely the same
as readiness probes, so much of the code is shared by introducing a
probeType paramater to distinguish the type when it matters. The
circular dependency between the runtime and the prober is broken by
exposing a shared liveness ResultsManager, owned by the
kubelet. Finally, an Updates channel is introduced to the ResultsManager
so the kubelet can react to unhealthy containers immediately.
Change all references to the container ID in pkg/kubelet/... to the
strong type defined in pkg/kubelet/container: ContainerID
The motivation for this change is to make the format of the ID
unambiguous, specifically whether or not it includes the runtime
prefix (e.g. "docker://").
The current implementation considers a source seen when it receives a SET at
kubelet/config/config.go. However, the main kubelet sync loop may not have
received the pod update from the source via the channel. This change ensures
that kubelet would consider all sources are ready only after the sync loop has
seen all the sources.
Each container with a readiness has an individual go-routine which
handles periodic probing for that container. The results are cached, and
written to the status.Manager in the pod sync path.
In many cases clients may wish to view not ready addresses for endpoints
in order to do set membership prior to a pod being ready. For instance,
a pod that uses the service endpoints to connect to other pods under
the same service, but does not want to signal ready before it has
contacted at least a minimal number of other pods.
This is backwards compatible with old servers and clients. There is
an additional cost in size of endpoints before services ramp up, which
will add minor CPU and memory use for services that have a significant
number of pods which have not become ready.
This refactor is in preparation for moving more state handling to the
status manager. It will become the canonical cache for the latest
information on running containers and probe status, as part of the
prober refactoring.
1. Make reason field of StatusReport objects in kubelet in CamelCase format.
2. Add Message field for ContainerStateWaiting to describe detail about Reason.
3. Make reason field of Events in kubelet in CamelCase format.
4. Update swagger,deep-copy and so on.
Both GetNode and the cache.ListWatch listfunc in the
kubelet package call List unnecessary.
GetNodeInfo is sufficient for GetNode and makes looping
through a list of nodes to check for a matching name
unnecessary.
resolves#13476
Allow the user to specify the resolver configuration file that is used
to determine the default DNS parameters. This defaults to the system's
/etc/resolv.conf.
Currently, whenever there is any update, kubelet would force all pod workers to
sync again, causing resource contention and hence performance degradation.
This commit flips kubelet to use incremental updates (as opposed to snapshots).
This allows us to know what pods have changed and send updates to those pod
workers only. The `SyncPods` function has been replaced with individual
handlers, each handling an operation (ADD, REMOVE, UPDATE). Pod workers are
still triggered periodically, and kubelet performs periodic cleanup as well.
This commit also spawns a new goroutine solely responsible for killing pods.
This is necessary because pod killing could hold up the sync loop for
indefinitely long amount of time now user can define the graceful termination
period in the container spec.
We chose to use podFullName (name_namespace) as key in the status manager
because mirror pod and static pod share the same status. This is no longer
needed because we do not store statuses for static pods anymore (we only
store statuses for their mirror pods). Also, reviously, a few fixes were
merged to ensure statuses are cleaned up so that a new pod with the same
name would not resuse an old status.
This change cleans up the code by using UID as key so that the code would
become less brittle.
Getting the public IP a container is supposed to use is O(hard),
and usually involves ugly gyrations in python or with interfaces.
Using the downward API means that the IP Kube is announcing to
other endpoints is also visible inside the container for pods to
identify themselves.