Commit Graph

55 Commits

Author SHA1 Message Date
Jordan Liggitt
4fcd999c25 Fix watch cache filtering 2016-07-14 13:13:17 -04:00
Wojciech Tyczynski
1d9bc58328 Extend Filter interface with Trigger() and use it for pods and nodes 2016-07-13 08:45:18 +02:00
Wojciech Tyczynski
7f7ef0879f Change filter to interface in storage.Interface 2016-07-13 08:44:22 +02:00
Xiang Li
aa472ff734 cacher: replace usable lock with conditional variable 2016-07-04 08:57:59 -07:00
David McMahon
ef0c9f0c5b Remove "All rights reserved" from all the headers. 2016-06-29 17:47:36 -07:00
k8s-merge-robot
00b5b548d6 Merge pull request #26854 from xiang90/cacher
Automatic merge from submit-queue

cacher.go: remove NewCacher func

NewCacher is a wrapper of NewCacherFromConfig. NewCacher understands
how to create a key func from scopeStrategy. However, it is not the
responsibility of cacher. So we should remove this function, and
construct the config in its caller, which should understand scopeStrategy.
2016-06-25 11:10:06 -07:00
Xiang Li
c530a5810a cacher: remove unnecessary initialization 2016-06-04 22:49:45 -07:00
Xiang Li
e2aab093aa cacher.go: remove NewCacher func
NewCacher is a wrapper of NewCacherFromConfig. NewCacher understands
how to create a key func from scopeStrategy. However, it is not the
responsibility of cacher. So we should remove this function, and
construct the config in its caller, which should understand scopeStrategy.
2016-06-04 22:46:58 -07:00
Jordan Liggitt
f80b59ba87 Return 'too old' errors from watch cache via watch stream 2016-05-10 10:59:53 -04:00
Russ Cox
6a19e46ed6 pkg/storage: cache timers
A previous change here replaced time.After with an explicit
timer that can be stopped, to avoid filling up the active timer list
with timers that are no longer needed. But an even better fix is to
reuse the timers across calls, to avoid filling the allocated heap
with work for the garbage collector. On top of that, try a quick
non-blocking send to avoid the timer entirely.

For the e2e 1000-node kubemark test, basically everything gets faster,
some things significantly so. The 90th and 99th percentile for LIST nodes
in particular are the worst case that has caused SLO/SLA problems
in the past, and this reduces 99th percentile by 10%.

name                               old ms/op  new ms/op   delta
LIST_nodes_p50                      127 ±16%    124 ±13%     ~     (p=0.136 n=29+29)
LIST_nodes_p90                      326 ±12%    278 ±15%  -14.85%  (p=0.000 n=29+29)
LIST_nodes_p99                      453 ±11%    405 ±19%  -10.70%  (p=0.000 n=29+28)
LIST_replicationcontrollers_p50    29.4 ±49%   26.6 ±43%     ~     (p=0.176 n=30+29)
LIST_replicationcontrollers_p90    83.0 ±78%   68.7 ±63%  -17.30%  (p=0.020 n=30+29)
LIST_replicationcontrollers_p99     216 ±43%    173 ±41%  -19.53%  (p=0.000 n=29+28)
DELETE_pods_p50                    24.5 ±14%   24.3 ±17%     ~     (p=0.562 n=30+28)
DELETE_pods_p90                    30.7 ± 1%   30.6 ± 0%   -0.44%  (p=0.000 n=29+28)
DELETE_pods_p99                    77.2 ±34%   56.3 ±27%  -26.99%  (p=0.000 n=30+28)
PUT_replicationcontrollers_p50     5.86 ±26%   5.83 ±36%     ~     (p=1.000 n=29+28)
PUT_replicationcontrollers_p90     15.8 ± 7%   15.9 ± 6%     ~     (p=0.936 n=29+28)
PUT_replicationcontrollers_p99     57.8 ±35%   56.7 ±41%     ~     (p=0.725 n=29+28)
PUT_nodes_p50                      14.9 ± 2%   14.9 ± 1%   -0.55%  (p=0.020 n=30+28)
PUT_nodes_p90                      16.5 ± 1%   16.4 ± 2%   -0.60%  (p=0.040 n=27+28)
PUT_nodes_p99                      57.9 ±47%   44.6 ±42%  -23.02%  (p=0.000 n=30+29)
POST_replicationcontrollers_p50    6.35 ±29%   6.33 ±23%     ~     (p=0.957 n=30+28)
POST_replicationcontrollers_p90    15.4 ± 5%   15.2 ± 6%   -1.14%  (p=0.034 n=29+28)
POST_replicationcontrollers_p99    52.2 ±71%   53.4 ±52%     ~     (p=0.720 n=29+27)
POST_pods_p50                      8.99 ±13%   9.33 ±13%   +3.79%  (p=0.023 n=30+29)
POST_pods_p90                      16.2 ± 4%   16.3 ± 4%     ~     (p=0.113 n=29+29)
POST_pods_p99                      30.9 ±21%   28.4 ±23%   -8.26%  (p=0.001 n=28+29)
POST_bindings_p50                  9.34 ±12%   8.98 ±17%     ~     (p=0.083 n=30+29)
POST_bindings_p90                  16.6 ± 1%   16.5 ± 2%   -0.76%  (p=0.000 n=28+26)
POST_bindings_p99                  23.5 ± 9%   21.4 ± 5%   -8.98%  (p=0.000 n=27+27)
PUT_pods_p50                       10.8 ±11%   10.3 ± 5%   -4.67%  (p=0.000 n=30+28)
PUT_pods_p90                       16.1 ± 1%   16.0 ± 1%   -0.55%  (p=0.003 n=29+29)
PUT_pods_p99                       23.4 ± 9%   21.6 ±14%   -8.03%  (p=0.000 n=28+28)
DELETE_replicationcontrollers_p50  2.42 ±16%   2.50 ±13%     ~     (p=0.072 n=29+29)
DELETE_replicationcontrollers_p90  11.5 ±12%   11.7 ±10%     ~     (p=0.190 n=30+28)
DELETE_replicationcontrollers_p99  19.5 ±21%   19.0 ±22%     ~     (p=0.298 n=29+28)
GET_nodes_p90                      1.20 ±16%   1.18 ±19%     ~     (p=0.626 n=28+29)
GET_nodes_p99                      11.4 ±48%    8.3 ±40%  -27.31%  (p=0.000 n=28+28)
GET_replicationcontrollers_p90     1.04 ±25%   1.03 ±21%     ~     (p=0.682 n=30+29)
GET_replicationcontrollers_p99     12.1 ±81%  10.0 ±123%     ~     (p=0.135 n=28+28)
GET_pods_p90                       1.06 ±19%   1.08 ±21%     ~     (p=0.597 n=29+29)
GET_pods_p99                       3.92 ±43%   2.81 ±39%  -28.39%  (p=0.000 n=27+28)
LIST_pods_p50                      68.0 ±16%   65.3 ±13%     ~     (p=0.066 n=29+29)
LIST_pods_p90                       119 ±19%    115 ±12%     ~     (p=0.091 n=28+27)
LIST_pods_p99                       230 ±18%    226 ±21%     ~     (p=0.251 n=27+28)
2016-04-21 15:53:47 -04:00
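The two ideas in the 6a19e46ed6 message above (reuse timers across calls, and try a non-blocking send first) can be sketched roughly as follows. This is an illustration only, not the code from the commit: the names `timerPool` and `sendPooled`, the `int` channel, and the use of `sync.Pool` are all assumptions made for the example.

```go
package main

import (
	"sync"
	"time"
)

// timerPool holds idle timers so later calls can reuse them instead of
// allocating a new one each time. (Hypothetical name, for illustration.)
var timerPool sync.Pool

// sendPooled tries to deliver v on ch within d and reports whether the
// value was delivered. (Hypothetical helper, for illustration.)
func sendPooled(ch chan<- int, v int, d time.Duration) bool {
	// Fast path: a non-blocking send avoids the timer entirely.
	select {
	case ch <- v:
		return true
	default:
	}

	// Slow path: take a timer from the pool, or allocate one if it is empty.
	t, ok := timerPool.Get().(*time.Timer)
	if ok {
		t.Reset(d)
	} else {
		t = time.NewTimer(d)
	}

	select {
	case ch <- v:
		// The send won. Stop the timer and, if it had already fired,
		// drain its channel so the pooled timer carries no stale value.
		if !t.Stop() {
			<-t.C
		}
		timerPool.Put(t)
		return true
	case <-t.C:
		// The timer fired and this receive emptied its channel,
		// so it can go straight back into the pool.
		timerPool.Put(t)
		return false
	}
}
```

A caller would use it as `ok := sendPooled(ch, value, 5*time.Second)`. Only timers that are stopped or already drained are returned to the pool, which is what makes the later `Reset` safe.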
Andy Goldstein
049e63d253 Honor starting resourceVersion in watch cache
Compare the requested resourceVersion to each event's resourceVersion to ensure events that occurred
in the past are not sent to the client.
2016-04-14 09:37:22 -04:00
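The comparison described in 049e63d253 above can be pictured with a small filter. This is a simplified sketch, not the watch cache code: the `event` type, the `eventsAfter` name, the numeric `uint64` version, and the strict "greater than" comparison are assumptions made for the example.

```go
package main

// event is a hypothetical, simplified watch event that records the
// resourceVersion at which the change happened.
type event struct {
	Key             string
	ResourceVersion uint64
}

// eventsAfter returns only the events whose resourceVersion is newer than
// startRV, so a client that starts watching at startRV is not sent events
// from before that point. (Assumes "newer" means strictly greater.)
func eventsAfter(events []event, startRV uint64) []event {
	var out []event
	for _, e := range events {
		if e.ResourceVersion > startRV {
			out = append(out, e)
		}
	}
	return out
}
```

For example, with buffered events at versions 5, 10 and 15 and a requested starting version of 10, only the event at version 15 would be delivered.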
Daniel Smith
4c539bf082 Merge pull request #23490 from wojtek-t/remove_set_from_storage_interface
Remove Set() from storage.Interface.
2016-04-13 14:22:05 -07:00
Jordan Liggitt
ada60236f7 Make watch cache behave like uncached watch 2016-04-12 10:14:07 -04:00
Wojciech Tyczynski
53f433f019 Remove Set() from storage.Interface. 2016-04-04 17:54:18 +02:00
Chao Xu
31b425b3a1 add delete precondition 2016-03-25 11:21:39 -07:00
Russ Cox
e4b369e1d7 storage: clean up timer in cacheWatcher.add
In the e2e benchmarks, this timer is a significant source of garbage
and stale timers. Because the timer is not stopped after its use
in the select, it stays in the timer heap until it eventually fires
(5 seconds later). Under load, a lot of 5-second timers can pile up
before any start going away. The timer heap being large makes timer
operations take longer; the operations are O(log N) but N is still big.

The way to fix this in current versions of Go is to stop the underlying
timer explicitly, which this CL does for this one case.

There are many other places in the code that use the same idiom,
but those do not show up on profiles of the e2e server.
I am investigating changes for Go 1.7's runtime that would make
the old code behave like this new code transparently, so I don't
think it's worth updating any uses of the idiom that are not in
hot spots found with profiling.

Measuring 'LIST nodes' latency in milliseconds during e2e test
shows the benefit of this change.

Using Go 1.4.2:

BEFORE  p50: 148±7   p90: 328±19  p99: 513±29  n: 10
AFTER   p50: 151±8   p90: 339±19  p99: 479±20  n: 9

Using Go 1.6.0:

BEFORE  p50: 141±9   p90: 383±32  p99: 604±44  n: 11
AFTER   p50: 140±14  p90: 360±31  p99: 483±39  n: 10
2016-03-18 15:58:34 -04:00
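The fix described in e4b369e1d7 above, stopping the timer explicitly instead of letting it sit in the timer heap until it fires, follows a common Go idiom. A minimal sketch, assuming a hypothetical `sendWithTimeout` helper and an `int` channel (neither is taken from the commit):

```go
package main

import "time"

// sendWithTimeout tries to deliver v on ch and gives up after d.
// It reports whether the value was delivered.
// (Hypothetical helper, for illustration.)
func sendWithTimeout(ch chan<- int, v int, d time.Duration) bool {
	// Writing "case <-time.After(d):" in the select below would create a
	// timer that cannot be stopped. An explicit timer can be stopped as
	// soon as the select returns, whichever case was chosen.
	t := time.NewTimer(d)
	defer t.Stop()

	select {
	case ch <- v:
		return true
	case <-t.C:
		return false
	}
}
```

A caller would use it as `ok := sendWithTimeout(ch, value, 5*time.Second)`. When the send wins the select, the deferred `Stop` removes the timer immediately rather than leaving it pending for the rest of the timeout.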
Daniel Smith
3fb020b28d Fix a locking bug in the cacher. 2016-02-19 17:45:02 -08:00
Daniel Smith
74400c33ae changes for cross-group moves 2016-02-15 21:39:00 +01:00
Daniel Smith
4e85d42f99 fix logging every microsecond when etcd goes down 2016-02-09 00:12:19 -08:00
Jan Chaloupka
4389b3f0d6 Rewrite util.* -> wait.* wherever reasonable 2016-02-07 12:02:20 +01:00
Daniel Smith
26683fda29 add timeout to cacher 2016-02-01 15:36:15 -08:00
Chao Xu
ebcff4b5e4 fix the namespaceScoped of cachers 2016-01-28 16:24:54 -08:00
Wojciech Tyczynski
60fc2bc09e Fix cacher_test flake 2015-12-31 07:53:41 +01:00
Wojciech Tyczynski
65696989b2 Extend logging for debugging 18928 2015-12-30 20:09:05 +01:00
Wojciech Tyczynski
05b60a30cf Fix flakes in cacher_test 2015-12-28 15:28:07 +01:00
Wojciech Tyczynski
ec70eb16f3 Graceful termination in Cacher 2015-12-28 10:54:21 +01:00
Timothy St. Clair
c505a5d49d Updating kubernetes proper to use latest etcd client library 2015-12-16 15:56:35 -06:00
Wojciech Tyczynski
960808bf08 Switch to versioned ListOptions in client. 2015-12-14 14:26:09 +01:00
Wojciech Tyczynski
0cefb43707 Enable listing from memory 2015-12-09 16:24:14 +01:00
Wojciech Tyczynski
0369805308 Merge pull request #18207 from wojtek-t/string_resource_version
Change resourceVersion to string in storage.Interface
2015-12-09 15:00:54 +01:00
Wojciech Tyczynski
b0fcb5adef Pass ListOptions to List in ListWatch. 2015-12-07 11:53:53 +01:00
Wojciech Tyczynski
793da62c7f Change resourceVersion to string in storage.Interface 2015-12-07 09:22:59 +01:00
Wojciech Tyczynski
b6ef62af24 Use unversioned.ListOptions in clients. 2015-11-24 16:52:09 +01:00
feihujiang
ad79fa6e84 Move list functions from runtime to meta package 2015-11-20 09:20:55 +08:00
Wojciech Tyczynski
a5a8717539 Pass versioner to cacher. 2015-11-13 08:35:28 +01:00
Daniel Smith
45a1ec73bb Lengthen delay 2015-11-06 13:03:58 -08:00
Wojciech Tyczynski
b6a775ca50 Terminate watcher if it is full 2015-11-06 13:40:21 +01:00
Wojciech Tyczynski
8f385c563f Refactor code for creating Cacher. 2015-11-02 20:56:46 +01:00
Wojciech Tyczynski
652fb090eb Initial support for listing from in-memory cache. 2015-10-30 20:58:13 +01:00
k8s-merge-robot
d5be3635e5 Merge pull request #16273 from wojtek-t/list_options_in_api
Auto commit by PR queue bot
2015-10-29 01:57:29 -07:00
Wojciech Tyczynski
aa30e38183 Pass resource version to storage List operation. 2015-10-27 10:03:58 +01:00
Wojciech Tyczynski
d47e21f19f Reuse TCP connections in Reflector between resync periods. 2015-10-26 19:35:25 +01:00
Timothy St. Clair
2a2a2d79ff New etcd client modifications part 1 (context support)
This commit plumbs contexts which are needed for the new client.
2015-10-12 08:45:49 -05:00
k8s-merge-robot
647288cde1 Merge pull request #13734 from wojtek-t/filter_in_storage
Auto commit by PR queue bot
2015-09-14 17:25:40 -07:00
Daniel Smith
15b30b8b09 Move version agnostic parts of client
pkg/client/unversioned/cache -> pkg/client/cache
pkg/client/unversioned/record -> pkg/client/record
2015-09-10 17:17:59 -07:00
Wojciech Tyczynski
ed7d6ebd71 Filter List in Storage level to avoid additional copies. 2015-09-10 09:49:50 +02:00
Wojciech Tyczynski
d318b22f65 Move WatchCache to pkg/storage 2015-08-31 09:49:12 +02:00
Wojciech Tyczynski
a12b7edc42 Fix deadlock in Cacher on etcd error 2015-08-26 08:02:21 +02:00
Wojciech Tyczynski
ec6556987e Switch on Cacher for pods, endpoints and nodes. 2015-08-21 09:24:49 +02:00
Wojciech Tyczynski
03413ddb4a Merge pull request #12782 from wojtek-t/cacher_deadlock
Fix deadlock in the cacher
2015-08-20 08:27:15 +02:00