120 Commits

Author SHA1 Message Date
Dr. Stefan Schimanski
2b1ecd28f0 Add Mesos hyperkube minion server
The minion server will
- launch the proxy and executor
- relaunch them when they terminate uncleanly
- logrotate their logs.

It is a replacement for a full-blown init process like s6, which is not necessary
in this case.
2015-07-31 12:28:40 +02:00
Dr. Stefan Schimanski
1200125137 Share KM_* constants 2015-07-31 11:27:52 +02:00
Dr. Stefan Schimanski
0ebf1811f3 Add scheduler flag --executor-proxy-logv 2015-07-31 11:27:52 +02:00
Dr. Stefan Schimanski
f59b5f503b Use BindingHostKey annotation to detect scheduled pods in k8sm-scheduler
Previously, NodeName in the pod spec was used. Hence, pods with a fixed, pre-set
NodeName were never scheduled by the k8sm-scheduler, leading e.g. to a failing
e2e intra-pod test.

Fixes mesosphere/kubernetes-mesos#388
2015-07-31 10:22:20 +02:00
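A minimal sketch of the difference between the two checks, assuming a simplified pod type. The `pod` struct and the annotation key string are hypothetical placeholders; the real key is defined in the k8sm code and is not shown in this log.

```go
package main

import "fmt"

// bindingHostKey is a placeholder value, NOT the real annotation key.
const bindingHostKey = "example.invalid/binding-host"

// pod is a hypothetical, stripped-down stand-in for the API pod object.
type pod struct {
	NodeName    string
	Annotations map[string]string
}

// isScheduled reports whether the scheduler has already bound the pod.
// It looks only at the annotation, so a pre-set NodeName does not count.
func isScheduled(p pod) bool {
	return p.Annotations[bindingHostKey] != ""
}

func main() {
	pinned := pod{NodeName: "node-1"} // fixed NodeName, never bound
	bound := pod{
		NodeName:    "node-1",
		Annotations: map[string]string{bindingHostKey: "node-1"},
	}
	fmt.Println(isScheduled(pinned), isScheduled(bound))
}
```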
Mike Danese
51a7a38f67 Merge pull request #12020 from wojtek-t/move_to_storage
Move storage-related code to pkg/storage
2015-07-30 16:23:03 -07:00
Mike Danese
ed9975b031 Merge pull request #11230 from mesosphere/resource-accounting
Implement resource accounting for pods with the Mesos scheduler
2015-07-30 14:36:41 -07:00
Dr. Stefan Schimanski
f3f617d9db Update documentation about resource accounting 2015-07-30 21:18:15 +02:00
Dr. Stefan Schimanski
a2fa41b73f Implement resource accounting for pods with the Mesos scheduler
This patch

- sets limits (0.25 cpu, 64 MB) on containers which are not limited in pod spec
  (these are also passed to the kubelet such that it uses them for the docker
  run limits)
- sums up the container resource limits for cpu and memory inside a pod,
- compares the sums to the offered resources
- puts the sums into the Mesos TaskInfo such that Mesos does the accounting
  for the pod.
- parses the static pod spec and adds up the resources
- sets the executor resources to 0.25 cpu, 64 MB plus the static pod resources
- sets the cgroups in the kubelet for system containers, resource containers
  and docker to the one of the executor that Mesos assigned
- adds scheduler parameters --default-container-cpu-limit and
  --default-container-mem-limit.

The containers themselves are resource limited by the Docker resource limit which
the kubelet applies when launching them.

Fixes mesosphere/kubernetes-mesos#68 and mesosphere/kubernetes-mesos#304
2015-07-30 21:18:04 +02:00
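The accounting steps listed in this commit (default limits, per-pod sums, comparison against an offer) can be sketched roughly as follows. The `container` type and the function names are hypothetical simplifications; only the default values 0.25 cpu and 64 MB come from the commit message.

```go
package main

import "fmt"

// Default limits taken from the commit message.
const (
	defaultCPU   = 0.25
	defaultMemMB = 64.0
)

// container is a hypothetical, simplified container spec.
// A zero value means that no limit is set in the pod spec.
type container struct {
	Name  string
	CPU   float64
	MemMB float64
}

// podResources applies the defaults to unlimited containers and
// returns the summed cpu and memory limits of the pod.
func podResources(cs []container) (cpu, memMB float64) {
	for _, c := range cs {
		if c.CPU == 0 {
			c.CPU = defaultCPU
		}
		if c.MemMB == 0 {
			c.MemMB = defaultMemMB
		}
		cpu += c.CPU
		memMB += c.MemMB
	}
	return cpu, memMB
}

// fits reports whether the summed pod resources fit into an offer.
func fits(cs []container, offerCPU, offerMemMB float64) bool {
	cpu, mem := podResources(cs)
	return cpu <= offerCPU && mem <= offerMemMB
}

func main() {
	cs := []container{
		{Name: "web", CPU: 0.5, MemMB: 128},
		{Name: "sidecar"}, // no limits: defaults apply
	}
	cpu, mem := podResources(cs)
	fmt.Println(cpu, mem, fits(cs, 1, 256))
}
```

In the patch itself the sums are placed into the Mesos TaskInfo; that step is omitted here.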
Wojciech Tyczynski
3cbbe72f9f Move etcd storage to pkg/storage/etcd 2015-07-30 15:42:06 +02:00
Ananya Kumar
47dd0bc6f9 Refactor package controller 2015-07-29 09:54:35 -07:00
Brendan Burns
0c07b66226 Skip another flaky mesos test. 2015-07-28 15:18:21 -07:00
David Oppenheimer
bfb6203627 Merge pull request #11940 from brendandburns/e2e2
Disable a couple of flaky tests.
2015-07-28 21:31:53 +02:00
Brendan Burns
660efc7583 Disable a couple of flaky tests. 2015-07-28 11:42:39 -07:00
Alex Robinson
60611c253e Add a resync period for services in the service controller.
This should ensure all load balancers get deleted even if a reordering of
watch events causes us to strand one after its service has been deleted,
because the sync will notice that the service controller's cache has a
service in it that no longer exists in the apiserver.

It could still leak in the case that the controller manager is killed
between when it leaks something and the sync runs, but this should
improve things.
2015-07-27 18:03:13 +00:00
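The idea behind the resync can be sketched as a comparison between the controller's cache and a fresh listing from the apiserver. This is a simplified illustration with hypothetical names (`staleServices`, plain string keys), not the service controller's code.

```go
package main

import (
	"fmt"
	"sort"
)

// staleServices returns the names that are still in the controller's cache
// but no longer exist in the apiserver's current list, in sorted order.
// These are the services whose load balancers may have been stranded.
func staleServices(cached map[string]bool, live []string) []string {
	liveSet := make(map[string]bool, len(live))
	for _, name := range live {
		liveSet[name] = true
	}
	var stale []string
	for name := range cached {
		if !liveSet[name] {
			stale = append(stale, name)
		}
	}
	sort.Strings(stale)
	return stale
}

func main() {
	cached := map[string]bool{"default/web": true, "default/db": true}
	live := []string{"default/web"}
	// A periodic resync would delete the load balancer of each stale entry.
	fmt.Println(staleServices(cached, live))
}
```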
Mike Danese
b51b4e740f Merge pull request #10639 from caseydavenport/master
Allow specification of a network plugins directory when starting kubelet
2015-07-24 11:09:11 -07:00
Mike Danese
c70d8d4c59 Merge pull request #11108 from mesosphere/fix-10795
Fix races in mesos scheduler plugin test
2015-07-24 09:35:29 -07:00
Vish Kannan
2a5a6b99cb Merge pull request #10635 from smarterclayton/cloud_provider_should_err
Cloud provider should return an error
2015-07-23 17:50:45 -07:00
Casey D
db3650fe58 Fix missing network plugin directory argument. 2015-07-23 13:05:59 -07:00
Wojciech Tyczynski
ee92aa3897 Prepare for extracting EtcdHelper interface 2015-07-23 09:37:39 +02:00
Dr. Stefan Schimanski
8fca9b6f09 Add original k8s-mesos docs to contrib/mesos 2015-07-19 10:13:25 +02:00
Eric Tune
f5e6161e49 Merge pull request #11298 from mesosphere/fix-10776
Fix deadlocks and race conditions in mesos master election notifier
2015-07-15 13:55:17 -07:00
Dr. Stefan Schimanski
e98c8e7685 Fix deadlocks and race conditions in mesos master election notifier
- n.node used the n.lock as underlying locker. The service loop initially
  locked it, the Notify function tried to lock it before calling n.node.Signal,
  leading to a deadlock.
- the goroutine calling ChangeMaster was not synchronized with the Notify
  method. The former was triggering change events that the latter never saw
  when the former's startup was faster than that of Notify. Hence, not even a
  single event was noticed and not even a single start/stop call of the slow
  service was triggered.

This patch replaces the n.node condition object with a simple channel n.changed.
The service loop watches it.

Updating the notified private variables is still protected with n.lock against
races, but independently of the n.changed channel. Hence, the deadlock is gone.

Moreover, the startup of the Notify loop is synchronized with the goroutine which
changes the master. Hence, the Notify loop will see the master changes.

Fixes #10776
2015-07-15 21:45:53 +02:00
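A rough sketch of the pattern this commit describes: state guarded by a mutex, with change signalling done over a separate channel. All names here are hypothetical. Note that the sketch uses a channel of capacity 1 so that a change made before the loop starts is not lost, which is a different technique from the explicit startup synchronization the patch mentions.

```go
package main

import (
	"fmt"
	"sync"
)

type notifier struct {
	lock    sync.Mutex
	master  string        // protected by lock
	changed chan struct{} // signals the service loop; capacity 1
}

func newNotifier() *notifier {
	return &notifier{changed: make(chan struct{}, 1)}
}

// changeMaster records the new master and signals the service loop.
// The lock is released before signalling, and the send never blocks.
func (n *notifier) changeMaster(m string) {
	n.lock.Lock()
	n.master = m
	n.lock.Unlock()
	select {
	case n.changed <- struct{}{}:
	default: // a signal is already pending; the loop will read the latest value
	}
}

func (n *notifier) current() string {
	n.lock.Lock()
	defer n.lock.Unlock()
	return n.master
}

// serviceLoop reports the current master each time a change is signalled,
// until stop is closed.
func (n *notifier) serviceLoop(stop <-chan struct{}, seen chan<- string) {
	for {
		select {
		case <-n.changed:
			seen <- n.current()
		case <-stop:
			return
		}
	}
}

func main() {
	n := newNotifier()
	stop := make(chan struct{})
	seen := make(chan string)
	n.changeMaster("master-1") // happens before the loop is running
	go n.serviceLoop(stop, seen)
	fmt.Println(<-seen)
	close(stop)
}
```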
Eric Tune
3dad5a0652 Merge pull request #10835 from mesosphere/mesos-root-ca-file
Add --root-ca-key code to Mesos controller-manager fork
2015-07-14 12:16:49 -07:00
Dr. Stefan Schimanski
95c7dc8cb3 Re-enable mesos scheduler TestPlugin_LifeCycle test 2015-07-13 22:43:16 +02:00
Dr. Stefan Schimanski
143cf4b08d Use correct offer's hostname of test pods in mesos scheduler plugin tests 2015-07-13 22:41:23 +02:00
Dr. Stefan Schimanski
dd7345b25f Fix offer+pod races in mesos scheduler plugin test
- Offers were reused and led to unexpected declining by the scheduler because
  the reused offer did not get a new expiration time.
- Pod scheduling and offer creation were not synchronized. When scheduling
  happened after aging of offers, the first issue was triggered. Because
  the mesos driver DeclineOffer was not mocked, this led to a test error.
2015-07-13 22:41:23 +02:00
Dr. Stefan Schimanski
bf44f5df28 Add DeclineOffer return value to mock driver in mesos scheduler test
Depending on timing the mesos scheduler might call DeclineOffer:

The default ttl of an offer in mesos scheduler is 5sec. If the tests run longer,
the old, unused offers are declined, leading to a mock error.

Probably fixes GoogleCloudPlatform/kubernetes#10795
2015-07-13 22:41:23 +02:00
David Oppenheimer
089a703194 Disable TestPlugin_LifeCycle due to flakiness. 2015-07-10 22:14:16 -07:00
Dr. Stefan Schimanski
39b3af0fdc Add --root-ca-key code to Mesos' controller-manager fork 2015-07-07 18:19:47 +02:00
nikhiljindal
274792d7bb Stop exposing v1beta3 by default 2015-07-01 14:38:02 -07:00
Clayton Coleman
d8bb4552de Cloud provider should return an error
Not fatal - makes cloud provider useful in methods that
can return error.
2015-07-01 14:41:49 -04:00
Maxwell Forbes
712f303350 Merge pull request #9736 from sdminonne/bug_fix2
To add validation for service ports when defined as string
2015-06-25 19:37:04 -07:00
Maxwell Forbes
655179dcfb Merge pull request #10264 from mikedanese/ca-token
add ca cert to token controller and all service accounts
2015-06-25 09:56:35 -07:00
Maxwell Forbes
3afda5d566 Merge pull request #10312 from dchen1107/cleanup
Take 2: Fix the race between configuring cbr0 and restarting static pods
2015-06-24 17:59:50 -07:00
Maxwell Forbes
28946766a3 Merge pull request #9807 from krousey/container_manifest
Removing ContainerManifest
2015-06-24 17:55:29 -07:00
Mike Danese
56bde3342a add ca to token controller and all service accounts 2015-06-24 15:10:20 -07:00
Kris Rousey
d13421e084 Removing ContainerManifest 2015-06-24 11:31:34 -07:00
Dawn Chen
6ddfa512de Revert "Revert "Fix the race between configuring cbr0 and restarting static pods""
This reverts commit fd0a95dd12.
2015-06-24 11:10:10 -07:00
Piotr Szczesniak
fd0a95dd12 Revert "Fix the race between configuring cbr0 and restarting static pods" 2015-06-24 09:56:49 +02:00
Jeff Lowdermilk
50d50a3cb8 Merge pull request #10211 from dchen1107/cleanup
Fix the race between configuring cbr0 and restarting static pods
2015-06-23 17:09:01 -07:00
Dawn Chen
23200d303f Fix several issues on running syncPods until network is configured.
Also fixed unittests and compiling.
2015-06-23 12:11:19 -07:00
Dr. Stefan Schimanski
9e0c9b4f5a Mesos: create static pod file source only for configured static pods
The file source was created even when no static pods were configured.
In this case it was never marked as seen. As a consequence the kubelet
syncPods function never deleted pods because it was too cautious due to
an unseen pod source, leading to leaked pods.
2015-06-23 12:25:21 +02:00
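The failure mode can be modelled with a toy "all sources seen" check. This is a simplified illustration with hypothetical names, not the kubelet's implementation: a source that is registered but never reports keeps deletion disabled.

```go
package main

import "fmt"

// sourcesSeen tracks which registered pod sources have reported at least once.
type sourcesSeen struct {
	registered map[string]bool
	seen       map[string]bool
}

func newSourcesSeen() *sourcesSeen {
	return &sourcesSeen{registered: map[string]bool{}, seen: map[string]bool{}}
}

func (s *sourcesSeen) register(name string) { s.registered[name] = true }
func (s *sourcesSeen) markSeen(name string) { s.seen[name] = true }

// allSeen reports whether every registered source has been seen.
// In this model, pods may only be deleted once this returns true.
func (s *sourcesSeen) allSeen() bool {
	for name := range s.registered {
		if !s.seen[name] {
			return false
		}
	}
	return true
}

func main() {
	// Before the fix: the file source is registered although no static
	// pods are configured, and it never reports.
	before := newSourcesSeen()
	before.register("api")
	before.register("file")
	before.markSeen("api")

	// After the fix: the file source is only registered when configured.
	after := newSourcesSeen()
	after.register("api")
	after.markSeen("api")

	fmt.Println(before.allSeen(), after.allSeen())
}
```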
Jeff Lowdermilk
0d7de0991e Disable TestProc_doWithNestedXConcurrent
This test is killing more than 50% of shippable builds. Disabling
to stop the madness.
2015-06-22 15:39:43 -07:00
Salvatore Dario Minonne
4b13faa346 To add validation for service ports when defined as string (fixing issue #9734) 2015-06-22 17:21:51 +02:00
Justin Santa Barbara
97cafd20f6 NodeName != HostName: Fixes for contrib/mesos 2015-06-18 12:40:14 -07:00
Satnam Singh
e4f5529a2d Revert "Allow nodename to be != hostname, use AWS instance ID on AWS" 2015-06-18 11:27:55 -07:00
Justin Santa Barbara
77e1bd3f56 NodeName != HostName: Fixes for contrib/mesos 2015-06-17 00:40:43 -04:00
Dr. Stefan Schimanski
7abe12d6f4 Fix flaky mesos executor test
The TestExecutorFrameworkMessage test sends a "task-lost:foo" message to the
executor in order to mark a pod as lost. For that the pod must be running first.
Otherwise, the executor code will send "TASK_FAILED" status updates, not "TASK_LOST".

Before this patch there was no synchronization between the pod startup and the
test case. Moreover, in order to start up a task, a working apiserver URL must be
passed to the executor, which was not the case either.

Fixes mesosphere/kubernetes-mesos#351
2015-06-16 09:08:23 +02:00
Fabio Yeon
241e87cf9b Merge pull request #9077 from mesosphere/staticPodsUpstream
Add static pod support to mesos scheduler and executor.
2015-06-15 15:20:33 -07:00
Fabio Yeon
da02e3059a Merge pull request #9789 from mesosphere/plugin-test-race
Fix mesos plugin-test race
2015-06-15 13:04:12 -07:00