- n.node used n.lock as its underlying locker. The service loop initially
locked it, and the Notify function tried to lock it before calling n.node.Signal,
leading to a deadlock.
- the goroutine calling ChangeMaster was not synchronized with the Notify
method. The former was triggering change events that the latter never saw
when the former's startup was faster than that of Notify. Hence, not even a single
event was noticed and not even a single start/stop call of the slow service
was triggered.
This patch replaces the n.node condition object with a simple channel n.changed.
The service loop watches it.
Updating the notified private variables is still protected with n.lock against
races, but independently of the n.changed channel. Hence, the deadlock is gone.
Moreover, the startup of the Notify loop is synchronized with the goroutine which
changes the master. Hence, the Notify loop will see the master changes.
Fixes #10776
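
The channel-based approach described above can be sketched as follows. This is
a minimal illustration, not the patched code: the notifier type, its fields
other than lock and changed, and the started/stop/seen channels are all
hypothetical names, and the buffered channel of capacity 1 is an assumption.

```go
package main

import (
	"fmt"
	"sync"
)

// notifier is a hypothetical stand-in for the type discussed above.
type notifier struct {
	lock    sync.Mutex    // protects master only, never held while signalling
	master  string        // hypothetical private state
	changed chan struct{} // assumed capacity 1, so pending signals coalesce
}

func newNotifier() *notifier {
	return &notifier{changed: make(chan struct{}, 1)}
}

// onMasterChange records the new master, then signals the service loop
// without blocking and without holding the lock.
func (n *notifier) onMasterChange(m string) {
	n.lock.Lock()
	n.master = m
	n.lock.Unlock()
	select {
	case n.changed <- struct{}{}:
	default: // a signal is already pending
	}
}

func (n *notifier) current() string {
	n.lock.Lock()
	defer n.lock.Unlock()
	return n.master
}

// run is the service loop. It closes started once it is about to watch
// n.changed, then reports the master it sees after every change.
func (n *notifier) run(started chan<- struct{}, stop <-chan struct{}, seen chan<- string) {
	close(started)
	for {
		select {
		case <-n.changed:
			seen <- n.current()
		case <-stop:
			return
		}
	}
}

// observe starts the loop, waits for it, changes the master once and
// returns what the loop saw.
func observe() string {
	n := newNotifier()
	started := make(chan struct{})
	stop := make(chan struct{})
	seen := make(chan string)
	go n.run(started, stop, seen)
	<-started // do not change the master before the loop is up
	n.onMasterChange("master-1")
	got := <-seen
	close(stop)
	return got
}

func main() {
	fmt.Println(observe())
}
```

Because the lock is released before the send on the channel, the loop and the
signalling side can no longer block each other on the same mutex.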
- Offers were reused and led to unexpected declining by the scheduler because
the reused offer did not get a new expiration time.
- Pod scheduling and offer creation were not synchronized. When scheduling
happened after aging of offers, the first issue was triggered. Because
the mesos driver DeclineOffer was not mocked, this led to a test error.
Depending on timing the mesos scheduler might call DeclineOffer:
The default TTL of an offer in the mesos scheduler is 5 seconds. If the tests run longer,
the old, unused offers are declined, leading to a mock error.
Probably fixes GoogleCloudPlatform/kubernetes#10795
The file source was created even when no static pods were configured.
In this case it was never marked as seen. As a consequence, the kubelet
syncPods function never deleted pods because it was too cautious due
to an unseen pod source, leading to leaked pods.
The TestExecutorFrameworkMessage test sends a "task-lost:foo" message to the
executor in order to mark a pod as lost. For that the pod must be running first.
Otherwise, the executor code will send "TASK_FAILED" status updates, not "TASK_LOST".
Before this patch there was no synchronization between the pod startup and the
test case. Moreover, in order to start up a task, a working apiserver URL must be
passed to the executor, which was not the case either.
Fixes mesosphere/kubernetes-mesos#351
- the mesos scheduler gets a --static-pods-config parameter pointing to a directory
of pod specs. They are zipped and sent over to newly started mesos executors.
- the mesos executor receives the zipped static pod config via ExecutorInfo.Data
and starts up the pods via the kubelet FileSource mechanism.
- both - the scheduler and the executor side - are fully unit tested