
95 Commits

Author SHA1 Message Date
Wojciech Tyczynski
d5e235c831 Reduce timeout for waiting for resource version 2016-12-20 10:05:38 +01:00
Wojciech Tyczynski
457c9a2e6e Reduce amount of allocations in cacher 2016-12-19 13:51:07 +01:00
Chao Xu
7e787b144a fix leaking goroutine issues in watch cache 2016-12-12 21:41:33 -08:00
Wojciech Tyczynski
01699ef320 Proper fix for non-receiving watchers 2016-12-09 09:43:10 +01:00
Clayton Coleman
3454a8d52c refactor: update bazel, codec, and gofmt 2016-12-03 19:10:53 -05:00
Clayton Coleman
5df8cc39c9 refactor: generated 2016-12-03 19:10:46 -05:00
Wojciech Tyczynski
ec247315be Handle RV in Get calls to storage interface. 2016-12-03 10:18:43 +01:00
Kubernetes Submit Queue
cd560926bd Merge pull request #36889 from wojtek-t/reuse_fields_and_labels
Automatic merge from submit-queue

Reuse fields and labels

This should significantly reduce memory allocations in the apiserver in large clusters.
Explanation:
- every kubelet refreshes its watch every 5-10 minutes (this generally does not cause a relist; it just renews the watch)
- that means that in a 5000-node cluster, we issue ~10 watches per second
- since we don't have "watch heartbeats", each watch is issued from the previously received resourceVersion
- to simplify, let's assume pods are evenly spread across nodes and writes to them are evenly spread; that means a given kubelet is interested in 1 in 5000 pod changes
- with that assumption, each watch has to process 2500 previous watch events on average
- for each such event, we currently compute fields.

This PR fixes this problem.
2016-12-02 21:49:43 -08:00
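The estimate in the #36889 description can be checked with a quick back-of-the-envelope sketch. It assumes a 7.5-minute refresh interval (the midpoint of 5-10 minutes) and takes the 2500-event average from the description as given; `watchLoad` is a hypothetical helper, not apiserver code. Under those assumptions it gives roughly 11 watches per second and on the order of 28,000 per-event field computations per second when fields are recomputed for every replayed event.

```go
package main

import "fmt"

// watchLoad is a hypothetical helper reproducing the estimate from the PR
// description: watches issued per second by all kubelets, and the per-event
// field computations those watches trigger if fields are recomputed for
// every replayed event.
func watchLoad(nodes int, refreshSeconds, eventsPerWatch float64) (watchesPerSec, computationsPerSec float64) {
	watchesPerSec = float64(nodes) / refreshSeconds
	computationsPerSec = watchesPerSec * eventsPerWatch
	return watchesPerSec, computationsPerSec
}

func main() {
	// Assumption: 7.5 minutes, the midpoint of the 5-10 minute refresh interval.
	refresh := 7.5 * 60
	w, c := watchLoad(5000, refresh, 2500)
	fmt.Printf("watches/s: %.1f\n", w)            // watches/s: 11.1
	fmt.Printf("field computations/s: %.0f\n", c) // field computations/s: 27778
}
```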
Wojciech Tyczynski
36e6cd19e1 Cache fields for filtering in watchCache. 2016-11-29 09:48:09 +01:00
Wojciech Tyczynski
ac7b1065e7 Better waiting for watch event delivery in cacher 2016-11-28 09:25:33 +01:00
Chao Xu
4f3d0e3bde more dependency packages:
pkg/metrics
pkg/credentialprovider
pkg/security
pkg/securitycontext
pkg/serviceaccount
pkg/storage
pkg/fieldpath
2016-11-23 15:53:09 -08:00
Kubernetes Submit Queue
6f80ec91d6 Merge pull request #35415 from wojtek-t/avoid_get
Automatic merge from submit-queue

Try to avoid Get to etcd in GuaranteedUpdate in Cacher
2016-10-26 16:15:06 -07:00
Wojciech Tyczynski
5d2062db9f Reduce amount of not-helping logs in apiserver 2016-10-26 13:20:07 +02:00
Wojciech Tyczynski
a1090151ef Try to avoid Get to etcd in GuaranteedUpdate in Cacher 2016-10-25 21:59:02 +02:00
Wojciech Tyczynski
93c008f8a4 Support resourceVersion in GetToList - unify interface of List and GetToList 2016-10-21 10:09:23 +02:00
Kubernetes Submit Queue
5fcb9fd056 Merge pull request #35125 from wojtek-t/avoid_unnecessary_reallocations
Automatic merge from submit-queue

Avoid unnecessary reallocations of slice in Cacher
2016-10-19 20:33:13 -07:00
Wojciech Tyczynski
0ced3f43bf Avoid unnecessary reallocations of slice in Cacher 2016-10-19 19:33:33 +02:00
Wojciech Tyczynski
8040719d7f Avoid computing key func multiple times in cacher 2016-10-19 08:38:18 +02:00
Wojciech Tyczynski
f10b0205e7 Store keys in watchCache store 2016-10-19 08:38:18 +02:00
Wojciech Tyczynski
9895f337ee Avoid unnecessary copies in cacher 2016-10-19 08:33:58 +02:00
Wojciech Tyczynski
0f2270698c Reduce amount of annoying logs in cacher 2016-10-17 16:15:24 +02:00
Wojciech Tyczynski
4d5ac91f88 Add tracing to listing in Cacher 2016-10-17 08:58:40 +02:00
Wojciech Tyczynski
2298e1746c Increase buffer sizes in cacher for watchers interested in all/many objects. 2016-10-13 16:40:33 +02:00
Wojciech Tyczynski
c02df26ad6 Improve some logging in cacher 2016-10-07 15:04:08 +02:00
Wojciech Tyczynski
90bc19959d Extend logging in cacher to understand its bottleneck 2016-10-06 10:57:46 +02:00
Hongchao Deng
6f3ac807fd pass SelectionPredicate instead of Filter to storage layer 2016-09-26 09:47:19 -07:00
Lucas Käldström
06917531b3 Move HighWaterMark to the top of the struct in order to fix arm, second time 2016-09-23 20:58:28 +03:00
Wojciech Tyczynski
e5b3f19638 Fix logging in cacher 2016-09-14 09:13:41 +02:00
Wojciech Tyczynski
949dd90593 Extend logging for performance debugging 2016-09-12 12:46:19 +02:00
Wojciech Tyczynski
03a23aed09 Log water mark for incoming queue in cacher 2016-09-09 11:35:05 +02:00
Kubernetes Submit Queue
504ccc6f37 Merge pull request #32275 from wojtek-t/split_process_event
Automatic merge from submit-queue

Split dispatching to watchers in Cacher into separate goroutine.

Should help with #32257
2016-09-08 07:42:12 -07:00
Wojciech Tyczynski
e750454c31 Fix allow for non-ready nodes in e2e framework 2016-09-08 14:22:08 +02:00
Wojciech Tyczynski
378cd81dbe Split dispatching to watchers in Cacher into separate goroutine. 2016-09-08 13:27:54 +02:00
Wojciech Tyczynski
bd54c389f5 Extend logging for scalability tests debugging 2016-09-08 12:02:59 +02:00
Hongchao Deng
a607a69f4a pkg/storage: cleanup Codec() from interface 2016-08-15 20:46:13 -07:00
Kubernetes Submit Queue
a69054f9c3 Merge pull request #30368 from wojtek-t/log_terminating_all_watchers
Automatic merge from submit-queue

Log warning when terminating all watchers

Ref #30275

2016-08-10 09:26:07 -07:00
Wojciech Tyczynski
497f891cfb Log warning when terminating all watchers 2016-08-10 17:04:10 +02:00
Hongchao Deng
7f28eda9be storage interface: remove Backends() 2016-08-07 16:10:18 -07:00
Wojciech Tyczynski
33e612e101 Revert "cacher.go: embed storage.Interface into cacher" 2016-07-22 07:28:45 +02:00
Xiang Li
44c0a1190c cacher.go: embed storage.Interface into cacher 2016-07-16 23:25:48 -07:00
Jordan Liggitt
4fcd999c25 Fix watch cache filtering 2016-07-14 13:13:17 -04:00
Wojciech Tyczynski
1d9bc58328 Extend Filter interface with Trigger() and use it for pods and nodes 2016-07-13 08:45:18 +02:00
Wojciech Tyczynski
7f7ef0879f Change filter to interface in storage.Interface 2016-07-13 08:44:22 +02:00
Xiang Li
aa472ff734 cacher: replace usable lock with conditional variable 2016-07-04 08:57:59 -07:00
David McMahon
ef0c9f0c5b Remove "All rights reserved" from all the headers. 2016-06-29 17:47:36 -07:00
k8s-merge-robot
00b5b548d6 Merge pull request #26854 from xiang90/cacher
Automatic merge from submit-queue

cacher.go: remove NewCacher func

NewCacher is a wrapper of NewCacherFromConfig. NewCacher understands
how to create a key func from scopeStrategy. However, it is not the
responsibility of cacher. So we should remove this function, and
construct the config in its caller, which should understand scopeStrategy.
2016-06-25 11:10:06 -07:00
Xiang Li
c530a5810a cacher: remove unnecessary initialization 2016-06-04 22:49:45 -07:00
Xiang Li
e2aab093aa cacher.go: remove NewCacher func
NewCacher is a wrapper of NewCacherFromConfig. NewCacher understands
how to create a key func from scopeStrategy. However, it is not the
responsibility of cacher. So we should remove this function, and
construct the config in its caller, which should understand scopeStrategy.
2016-06-04 22:46:58 -07:00
Jordan Liggitt
f80b59ba87 Return 'too old' errors from watch cache via watch stream 2016-05-10 10:59:53 -04:00
Russ Cox
6a19e46ed6 pkg/storage: cache timers
A previous change here replaced time.After with an explicit
timer that can be stopped, to avoid filling up the active timer list
with timers that are no longer needed. But an even better fix is to
reuse the timers across calls, to avoid filling the allocated heap
with work for the garbage collector. On top of that, try a quick
non-blocking send to avoid the timer entirely.

For the e2e 1000-node kubemark test, basically everything gets faster,
some things significantly so. The 90th and 99th percentiles for LIST nodes
in particular are the worst cases that have caused SLO/SLA problems
in the past, and this change reduces the 99th percentile by about 10%.

name                               old ms/op  new ms/op   delta
LIST_nodes_p50                      127 ±16%    124 ±13%     ~     (p=0.136 n=29+29)
LIST_nodes_p90                      326 ±12%    278 ±15%  -14.85%  (p=0.000 n=29+29)
LIST_nodes_p99                      453 ±11%    405 ±19%  -10.70%  (p=0.000 n=29+28)
LIST_replicationcontrollers_p50    29.4 ±49%   26.6 ±43%     ~     (p=0.176 n=30+29)
LIST_replicationcontrollers_p90    83.0 ±78%   68.7 ±63%  -17.30%  (p=0.020 n=30+29)
LIST_replicationcontrollers_p99     216 ±43%    173 ±41%  -19.53%  (p=0.000 n=29+28)
DELETE_pods_p50                    24.5 ±14%   24.3 ±17%     ~     (p=0.562 n=30+28)
DELETE_pods_p90                    30.7 ± 1%   30.6 ± 0%   -0.44%  (p=0.000 n=29+28)
DELETE_pods_p99                    77.2 ±34%   56.3 ±27%  -26.99%  (p=0.000 n=30+28)
PUT_replicationcontrollers_p50     5.86 ±26%   5.83 ±36%     ~     (p=1.000 n=29+28)
PUT_replicationcontrollers_p90     15.8 ± 7%   15.9 ± 6%     ~     (p=0.936 n=29+28)
PUT_replicationcontrollers_p99     57.8 ±35%   56.7 ±41%     ~     (p=0.725 n=29+28)
PUT_nodes_p50                      14.9 ± 2%   14.9 ± 1%   -0.55%  (p=0.020 n=30+28)
PUT_nodes_p90                      16.5 ± 1%   16.4 ± 2%   -0.60%  (p=0.040 n=27+28)
PUT_nodes_p99                      57.9 ±47%   44.6 ±42%  -23.02%  (p=0.000 n=30+29)
POST_replicationcontrollers_p50    6.35 ±29%   6.33 ±23%     ~     (p=0.957 n=30+28)
POST_replicationcontrollers_p90    15.4 ± 5%   15.2 ± 6%   -1.14%  (p=0.034 n=29+28)
POST_replicationcontrollers_p99    52.2 ±71%   53.4 ±52%     ~     (p=0.720 n=29+27)
POST_pods_p50                      8.99 ±13%   9.33 ±13%   +3.79%  (p=0.023 n=30+29)
POST_pods_p90                      16.2 ± 4%   16.3 ± 4%     ~     (p=0.113 n=29+29)
POST_pods_p99                      30.9 ±21%   28.4 ±23%   -8.26%  (p=0.001 n=28+29)
POST_bindings_p50                  9.34 ±12%   8.98 ±17%     ~     (p=0.083 n=30+29)
POST_bindings_p90                  16.6 ± 1%   16.5 ± 2%   -0.76%  (p=0.000 n=28+26)
POST_bindings_p99                  23.5 ± 9%   21.4 ± 5%   -8.98%  (p=0.000 n=27+27)
PUT_pods_p50                       10.8 ±11%   10.3 ± 5%   -4.67%  (p=0.000 n=30+28)
PUT_pods_p90                       16.1 ± 1%   16.0 ± 1%   -0.55%  (p=0.003 n=29+29)
PUT_pods_p99                       23.4 ± 9%   21.6 ±14%   -8.03%  (p=0.000 n=28+28)
DELETE_replicationcontrollers_p50  2.42 ±16%   2.50 ±13%     ~     (p=0.072 n=29+29)
DELETE_replicationcontrollers_p90  11.5 ±12%   11.7 ±10%     ~     (p=0.190 n=30+28)
DELETE_replicationcontrollers_p99  19.5 ±21%   19.0 ±22%     ~     (p=0.298 n=29+28)
GET_nodes_p90                      1.20 ±16%   1.18 ±19%     ~     (p=0.626 n=28+29)
GET_nodes_p99                      11.4 ±48%    8.3 ±40%  -27.31%  (p=0.000 n=28+28)
GET_replicationcontrollers_p90     1.04 ±25%   1.03 ±21%     ~     (p=0.682 n=30+29)
GET_replicationcontrollers_p99     12.1 ±81%  10.0 ±123%     ~     (p=0.135 n=28+28)
GET_pods_p90                       1.06 ±19%   1.08 ±21%     ~     (p=0.597 n=29+29)
GET_pods_p99                       3.92 ±43%   2.81 ±39%  -28.39%  (p=0.000 n=27+28)
LIST_pods_p50                      68.0 ±16%   65.3 ±13%     ~     (p=0.066 n=29+29)
LIST_pods_p90                       119 ±19%    115 ±12%     ~     (p=0.091 n=28+27)
LIST_pods_p99                       230 ±18%    226 ±21%     ~     (p=0.251 n=27+28)
2016-04-21 15:53:47 -04:00