Commit Graph

8599 Commits

Author SHA1 Message Date
Kubernetes Submit Queue
93ddb7be5f Merge pull request #52237 from smarterclayton/watch_metric
Automatic merge from submit-queue (batch tested with PRs 51824, 50476, 52451, 52009, 52237)

Improve apiserver metrics reporting

Normalize "WATCHLIST" to "WATCH", add "scope" to the other metrics (listing 50k pods is != listing pods in a namespace), and add a new scope "resource" to cover individual resource calls.

This roughly aligns metrics with our ACL model (technically resource scope is GET, but POST to a subresource and POST to a namespace are not the same thing).

```release-note
WATCHLIST calls are now reported as WATCH verbs in prometheus for the apiserver_request_* series.  A new "scope" label is added to all apiserver_request_* values that is either 'cluster', 'resource', or 'namespace' depending on which level the query is performed at.
```
2017-09-15 01:08:11 -07:00
Kubernetes Submit Queue
9d8c11924f Merge pull request #51781 from bsalamat/preemption_tests
Automatic merge from submit-queue (batch tested with PRs 52442, 52247, 46542, 52363, 51781)

Add more tests for pod preemption

**What this PR does / why we need it**:
Adds more e2e and integration tests for pod preemption.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

**Special notes for your reviewer**:
This PR is based on #50949. Only the last commit is new.

**Release note**:

```release-note
NONE
```

ref/ #47604

@kubernetes/sig-scheduling-pr-reviews @davidopp
2017-09-15 00:11:17 -07:00
Kubernetes Submit Queue
2c81db53ce Merge pull request #52442 from crassirostris/sd-logging-e2e-fix-trimming
Automatic merge from submit-queue

[fluentd-gcp addon] Remove some e2e tests out of blocking suites

Fixes https://github.com/kubernetes/kubernetes/issues/52433

Some Stackdriver Logging e2e tests are broken in release-blocking suites:

- Due to the change in Docker 1.13, on some systems logs are automatically split by 16K chunks. This PR removes an e2e test that assumes otherwise
- In large clusters, it's not possible to ingest system logs from all nodes

Since it's not a Kubernetes problem per se, mitigating this by removing these tests from blocking suites.
2017-09-14 23:38:04 -07:00
Kubernetes Submit Queue
471b0beb2e Merge pull request #52480 from aleksandra-malinowska/test-fix-gke-small
Automatic merge from submit-queue

Fix failing autoscaling test in GKE

This should fix `[sig-autoscaling] Cluster size autoscaling [Slow] should increase cluster size if pending pods are small and there is another node pool that is not autoscaled [Feature:ClusterSizeAutoscalingScaleUp]` by getting a list of nodes from GKE nodepool in a different way (filtering nodes by labels.) Currently, gcloud command used for it is failing, as we only have GKE node pool name in the test and not the actual MIG name.
2017-09-14 18:48:26 -07:00
Kubernetes Submit Queue
5d995e3f7b Merge pull request #52372 from caesarxuchao/remove-config-copy
Automatic merge from submit-queue (batch tested with PRs 52376, 52439, 52382, 52358, 52372)

Remove the conversion of client config

It was needed because the clientset code in client-go was a copy of the clientset code in Kubernetes.. client-go is authoritative now, so we can remove the nasty copy.
2017-09-14 15:27:17 -07:00
Mik Vyatskov
e79ce0a50d [fluentd-gcp addon] Remove trimming e2e tests out of blocking suites 2017-09-14 19:16:20 +02:00
Kubernetes Submit Queue
3c8fb4b90f Merge pull request #52426 from shyamjvs/dont-crash-on-missing-data
Automatic merge from submit-queue

Don't crash density test on missing a single measurement

We failed our last run due to this (https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-scale-performance/33) and didn't have pod-startup latency recorded at all.
2017-09-14 05:09:46 -07:00
Aleksandra Malinowska
158ffdb1ec Get nodes from GKE node pool by checking labels 2017-09-14 12:06:34 +02:00
Bobby (Babak) Salamat
f11b0a65d1 Add more tests on pod preemption 2017-09-13 12:12:07 -07:00
Kubernetes Submit Queue
56e461fdcf Merge pull request #52431 from shyamjvs/bump-lb-controller-resource-check
Automatic merge from submit-queue

Make CPU constraint for l7-lb-controller in density test scale with #nodes

Just noticed that we changed the memory last time, but didn't change cpu. From the last run:

```
Sep 13 04:25:03.360: INFO: Unexpected error occurred: Container l7-lb-controller-v0.9.6-gce-scale-cluster-master/l7-lb-controller is using 0.642709233/0.15 CPU
```
2017-09-13 11:10:33 -07:00
Shyam Jeedigunta
fad26a71c8 Make CPU constraint for l7-lb-controller in density test scale with #nodes 2017-09-13 18:21:35 +02:00
Shyam Jeedigunta
4f3e3c6278 Don't crash density test on missing a single measurement 2017-09-13 16:11:53 +02:00
Kubernetes Submit Queue
5af069b727 Merge pull request #52413 from aleksandra-malinowska/autoscaling-tests-extra-logs-2
Automatic merge from submit-queue

Add logging gcloud command error in e2e tests

This adds extra log line to help with debugging GKE tests.
2017-09-13 06:58:52 -07:00
Kubernetes Submit Queue
991afb2436 Merge pull request #52375 from jiayingz/deviceplugin-e2e
Automatic merge from submit-queue (batch tested with PRs 52316, 52289, 52375)

Extends GPUDevicePlugin e2e test to exercise device plugin restarts.

**What this PR does / why we need it**:
This is part of issue #52189 but does not fix it.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
```
2017-09-13 04:04:55 -07:00
Kubernetes Submit Queue
c9759ae318 Merge pull request #52289 from crassirostris/sd-logging-trim-long-lines
Automatic merge from submit-queue (batch tested with PRs 52316, 52289, 52375)

[fluentd-gcp addon] Trim too long log entries due to Stackdriver limitations

Stackdriver doesn't support log entries bigger than 100KB, so by default fluentd plugin just drops such entries. To avoid that and increase the visibility of this problem it's suggested to trim long lines instead.

/cc @igorpeshansky

```release-note
[fluentd-gcp addon] Fluentd will trim lines exceeding 100KB instead of dropping them.
```
2017-09-13 04:04:52 -07:00
Aleksandra Malinowska
c173296632 log gcloud command error 2017-09-13 11:56:55 +02:00
Mik Vyatskov
d8525f8bd1 [fluentd-gcp addon] Trim too long log entries due to Stackdriver limitation 2017-09-13 10:27:17 +02:00
Kubernetes Submit Queue
be78d113b1 Merge pull request #52201 from timothysc/ephemeral_gate
Automatic merge from submit-queue

Version gates the ephemeral storage e2e test

Version gates the ephemeral storage e2e test.

**Release note**:
```
NONE
```

@kubernetes/sig-testing-pr-reviews
2017-09-12 23:24:42 -07:00
Kubernetes Submit Queue
9636522137 Merge pull request #52352 from enisoc/sts-deflake
Automatic merge from submit-queue (batch tested with PRs 48226, 52046, 52231, 52344, 52352)

StatefulSet: Deflake e2e RunHostCmd more.

It turns out that at some points while the Node is recovering from a reboot, we get a different kind of error ("unable to upgrade connection"). Since we can't distinguish these transient errors from an error encountered after successfully executing the remote command, let's just retry all errors for 5min. If this doesn't work, I'm gonna blame it on sig-node.

ref #48031
2017-09-12 19:40:06 -07:00
Kubernetes Submit Queue
434fffb6e0 Merge pull request #52231 from mkumatag/guestbook_multiarch
Automatic merge from submit-queue (batch tested with PRs 48226, 52046, 52231, 52344, 52352)

Port Guestbook tests to mutiarch

**What this PR does / why we need it**:

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #52232 

**Special notes for your reviewer**:

**Release note**:

```NONE
NONE
```
2017-09-12 19:39:59 -07:00
Jiaying Zhang
06b31849e1 Extends GPUDevicePlugin e2e test to exercise device plugin restarts. 2017-09-12 16:58:19 -07:00
Chao Xu
6c5a8d5db9 Remove the conversion of client config, because client-go is authoratative now 2017-09-12 16:02:17 -07:00
Kubernetes Submit Queue
a63e3deec3 Merge pull request #51041 from balajismaniam/cpuman-e2e-tests
Automatic merge from submit-queue

Node e2e tests for the CPU Manager. 

**What this PR does / why we need it**:
- Adds node e2e tests for the CPU Manager implementation in https://github.com/kubernetes/kubernetes/pull/49186.

**Special notes for your reviewer**: 
- Previous PR in this series: #51180
- Only `test/e2e_node/cpu_manager_test.go` must be reviewed as a part of this PR (i.e., the last commit). Rest of the comments belong in #51357 and #51180.
- The tests have been on run on `n1-standard-n4` and `n1-standard-n2` instances on GCE. 

To run this node e2e test, use the following command:
```sh
make test-e2e-node TEST_ARGS='--feature-gates=DynamicKubeletConfig=true' FOCUS="CPU Manager" SKIP="" PARALLELISM=1
```

CC @ConnorDoyle @sjenning
2017-09-12 10:46:06 -07:00
Anthony Yeh
bff5f7e6b0
StatefulSet: Deflake e2e RunHostCmd more.
It turns out that at some points while the Node is recovering from a
reboot, we get a different kind of error ("unable to upgrade
connection"). Since we can't distinguish these transient errors from an
error encountered after successfully executing the remote command,
let's just retry all errors for 5min. If this doesn't work, I'm gonna
blame it on sig-node.
2017-09-12 10:12:46 -07:00
Kubernetes Submit Queue
6b6b1e5779 Merge pull request #52291 from derekwaynecarr/fix-summary
Automatic merge from submit-queue (batch tested with PRs 52007, 52196, 52169, 52263, 52291)

Summary tests should expect rss usage now

**What this PR does / why we need it**:
Fixes summary test to expect rss usage now.

Previously, cAdvisor reported rss and not total_rss, but that has now been fixed in most recent version of cAdvisor now in the project.

See: https://github.com/kubernetes/kubernetes/pull/43399#issuecomment-287858599

**Release note**:
```release-note
NONE
```
2017-09-12 08:46:17 -07:00
Kubernetes Submit Queue
99b2ee1697 Merge pull request #52106 from tallclair/aa-e2e
Automatic merge from submit-queue (batch tested with PRs 50289, 52106)

Fix AppArmor test at scale

**What this PR does / why we need it**:

The AppArmor test only runs on a single node, but previously was loading the necessary profiles to every node. This caused unnecessary churn in very large clusters, so this PR updates the test to only load the profiles to a single node, and ensure the test pod is run on that node (using pod affinity).

**Which issue this PR fixes**: fixes #51791

**Special notes for your reviewer**:

**Release note**:
```release-note
NONE
```
2017-09-12 03:44:18 -07:00
Kubernetes Submit Queue
77e660ed15 Merge pull request #52227 from liggitt/non-preferred-version-priority
Automatic merge from submit-queue (batch tested with PRs 52227, 52120)

Fix discovery restmapper finding resources in non-preferred versions

Fixes: #52219

Also reverts behavioral changes to tests that version-qualified cronjobs to work around this issue.

The discovery rest mapper was only populating the priority rest mapper's search list with preferred groupversions.

That meant that if a resource existed in multiple non-preferred versions, AND did not exist in the preferred version (like cronjob, which only exists in v1beta2.batch and v2alpha1.batch, but not v1.batch), the priority restmapper would not find it in its group/version priority list, and would return an error.

```release-note
Fixed an issue looking up cronjobs when they existed in more than one API version
```
2017-09-12 01:09:14 -07:00
Clayton Coleman
30a92a8f0a
Report scope in e2e test metrics 2017-09-11 22:13:55 -04:00
Derek Carr
c59715e9cb Summary tests should report rss usage now 2017-09-11 13:12:04 -04:00
Balaji Subramaniam
affa182fde Added node e2e tests for the CPU Manager feature. 2017-09-11 09:29:24 -07:00
Manjunath A Kumatagi
96c0945e69 Port Guestbook tests to mutiarch 2017-09-10 21:27:45 +05:30
Kubernetes Submit Queue
24ad0d211b Merge pull request #51660 from jiayingz/deviceplugin-e2e
Automatic merge from submit-queue

Extend nvidia-gpus e2e test to include a device plugin based test

**What this PR does / why we need it**:
This is needed to verify device plugin feature.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes https://github.com/kubernetes/features/issues/368

**Special notes for your reviewer**:
Related test_infra PR: https://github.com/kubernetes/test-infra/pull/4265

**Release note**:
Add an e2e test for nvidia gpu device plugin
2017-09-08 22:50:08 -07:00
Jordan Liggitt
a6316fb3a5
Fix discovery restmapper finding resources in non-preferred versions 2017-09-08 22:35:23 -04:00
Kubernetes Submit Queue
d6df4a5127 Merge pull request #52063 from mtaufen/dkcfg-e2enode
Automatic merge from submit-queue (batch tested with PRs 52047, 52063, 51528)

Improve dynamic kubelet config e2e node test and fix bugs

Rather than just changing the config once to see if dynamic kubelet
config at-least-sort-of-works, this extends the test to check that the
Kubelet reports the expected Node condition and the expected configuration
values after several possible state transitions.

Additionally, this adds a stress test that changes the configuration 100
times. It is possible for resource leaks across Kubelet restarts to
eventually prevent the Kubelet from restarting. For example, this test
revealed that cAdvisor's leaking journalctl processes (see:
https://github.com/google/cadvisor/issues/1725) could break dynamic
kubelet config. This test will help reveal these problems earlier.

This commit also makes better use of const strings and fixes a few bugs
that the new testing turned up.

Related issue: #50217

I had been sitting on this until the cAdvisor fix merged in #51751, as these tests fail without that fix.

**Release note**:

```release-note
NONE
```
2017-09-08 16:06:56 -07:00
Timothy St. Clair
79725246f4 Version gates the ephemeral storage e2e test 2017-09-08 16:53:38 -05:00
Kubernetes Submit Queue
f695a3120a Merge pull request #50949 from bsalamat/preemption_eviction
Automatic merge from submit-queue

Add pod preemption to the scheduler

**What this PR does / why we need it**:
This is the last of a series of PRs to add priority-based preemption to the scheduler. This PR connects the preemption logic to the scheduler workflow.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #48646

**Special notes for your reviewer**:
This PR includes other PRs which are under review (#50805, #50405, #50190). All the new code is located in 43627afdf9.

**Release note**:

```release-note
Add priority-based preemption to the scheduler.
```

ref/ #47604

/assign @davidopp 

@kubernetes/sig-scheduling-pr-reviews
2017-09-08 14:19:42 -07:00
Kubernetes Submit Queue
63d6bdb58c Merge pull request #51900 from sttts/sttts-informer-stratification
Automatic merge from submit-queue (batch tested with PRs 51900, 51782, 52030)

apiservers: stratify versioned informer construction

The versioned share informer factory has been part of the GenericApiServer config,
but its construction depended on other fields of that config (e.g. the loopback
client config). Hence, the order of changes to the config mattered.

This PR stratifies this by moving the SharedInformerFactory from the generic Config
to the CompleteConfig struct. Hence, it is only filled during completion when it is
guaranteed that the loopback client config is set.

While doing this, the CompletedConfig construction is made more type-safe again,
i.e. the use of SkipCompletion() is considereably reduced. This is archieved by
splitting the derived apiserver Configs into the GenericConfig and the ExtraConfig
part. Then the completion is structural again because CompleteConfig is again
of the same structure: generic CompletedConfig and local completed ExtraConfig.

Fixes #50661.
2017-09-08 09:46:29 -07:00
Dr. Stefan Schimanski
d99c7df360 kube-aggregator: use shared informers from RecommendedConfig 2017-09-08 16:12:54 +02:00
Dr. Stefan Schimanski
2b64d3a0fd apiserver: split core API creation from secure serving 2017-09-08 14:38:11 +02:00
Dr. Stefan Schimanski
ca3f745346 apiserver: stratify versioned informer construction 2017-09-08 14:16:09 +02:00
Dr. Stefan Schimanski
7d09148ad7 apiserver: separate apiserver specific configs into ExtraConfig 2017-09-08 14:16:09 +02:00
Kubernetes Submit Queue
45fe0a9e04 Merge pull request #51932 from dixudx/fix_forbidden_messages
Automatic merge from submit-queue

fix format of forbidden messages

**What this PR does / why we need it**:

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #51813

**Special notes for your reviewer**:
/assign @deads2k @liggitt 

**Release note**:

```release-note
None
```
2017-09-08 03:57:04 -07:00
Kubernetes Submit Queue
0103ed33d3 Merge pull request #48552 from mkumatag/pets
Automatic merge from submit-queue

Multiarch support for pets images

**What this PR does / why we need it**:
This PR is for multiarch support for pets image

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #52133

**Special notes for your reviewer**:
Copied over the `contrib/pets/peer-finder` as this one is heavily used in many docker images under `test/images`. After this PR I'll submit the PR in contrib project to remove it.

**Release note**:

```NONE
```
2017-09-07 22:27:20 -07:00
Jiaying Zhang
01b49b4165 Extend test/e2e/scheduling/nvidia-gpus.go to include a device plugin based nvidia gpu e2e test. 2017-09-07 22:06:35 -07:00
Kubernetes Submit Queue
ad0d36f0f0 Merge pull request #52111 from MrHohn/kube-proxy-upgrade-image
Automatic merge from submit-queue

Pipe in upgrade image target for kube-proxy migration tests

**What this PR does / why we need it**:
https://k8s-testgrid.appspot.com/sig-network#gci-gce-latest-upgrade-kube-proxy-ds&width=20
and
https://k8s-testgrid.appspot.com/sig-network#gci-gce-latest-downgrade-kube-proxy-ds&width=20
are still failing.

Reproduced it locally and found node image is being default to debian during upgrade (it was gci before upgrade) because we don't pass in `gci` via `--upgrade--target`. And for some reasons (haven't figured out yet), the upgraded node uses debian image with gci startupscripts...

This PR pipes in `--upgrade-target` for kube-proxy migration tests, hopefully in conjunction with https://github.com/kubernetes/test-infra/pull/4447 it will bring the tests back to normal.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #NONE 

**Special notes for your reviewer**:
Sorry for bothering again.
/assign @krousey 

**Release note**:

```release-note
NONE
```
2017-09-07 20:46:04 -07:00
Di Xu
95738d5a0e fix format of forbidden messages 2017-09-08 09:20:13 +08:00
Michael Taufen
a846ba191c Improve dynamic kubelet config e2e node test and fix bugs
Rather than just changing the config once to see if dynamic kubelet
config at-least-sort-of-works, this extends the test to check that the
Kubelet reports the expected Node condition and the expected configuration
values after several possible state transitions.

Additionally, this adds a stress test that changes the configuration 100
times. It is possible for resource leaks across Kubelet restarts to
eventually prevent the Kubelet from restarting. For example, this test
revealed that cAdvisor's leaking journalctl processes (see:
https://github.com/google/cadvisor/issues/1725) could break dynamic
kubelet config. This test will help reveal these problems earlier.

This commit also makes better use of const strings and fixes a few bugs
that the new testing turned up.

Related issue: #50217
2017-09-07 15:50:17 -07:00
Bobby (Babak) Salamat
86b06c3832 autogenerated files 2017-09-07 15:31:55 -07:00
Bobby (Babak) Salamat
4a08dff168 Add pod eviction logic for scheduler preemption
Add Preempt to scheduler interface
Add preemption to the scheduling workflow
Minor changes to the scheduler integration test library
2017-09-07 15:31:55 -07:00
Kubernetes Submit Queue
f4f21b3f06 Merge pull request #52054 from janetkuo/pause-dep-integra
Automatic merge from submit-queue (batch tested with PRs 52097, 52054)

Move paused deployment e2e tests to integration

**What this PR does / why we need it**:

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: xref #52113

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2017-09-07 15:28:25 -07:00