Commit Graph

1330 Commits

Author SHA1 Message Date
Kubernetes Submit Queue
8395b7c12e
Merge pull request #64175 from mtaufen/test-name-cleanup
Automatic merge from submit-queue (batch tested with PRs 64175, 63893). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

add colon separators to improve readability of test names

```release-note
NONE
```
2018-05-25 03:50:09 -07:00
Kubernetes Submit Queue
690e42b734
Merge pull request #64125 from yujuhong/add-node-e2e-tags
Automatic merge from submit-queue (batch tested with PRs 61963, 64279, 64130, 64125, 64049). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add node-exclusive tags to tests in test/e2e_node

Original issue: #59001
Depends on: #64128

Add node-exclusive tags based on the [proposal](https://docs.google.com/document/d/1BdNVUGtYO6NDx10x_fueRh_DLT-SVdlPC_SsXjYCHOE/edit#)

Follow-up PRs will:
 - Tag the tests in `test/e2e/common`
 - Change the test job configurations to use the new tests
 - Remove the unused, non-node-exclusive tags in `test/e2e_node`

**Release note**:

```release-note
NONE
```
2018-05-25 01:09:24 -07:00
Kubernetes Submit Queue
731eaecfd1
Merge pull request #57527 from mtaufen/kc-metric
Automatic merge from submit-queue (batch tested with PRs 64013, 63896, 64139, 57527, 62102). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

add dynamic config metrics

This PR exports config-releated metrics from the Kubelet.
The Guages for active, assigned, and last-known-good config can be used
to identify config versions and produce aggregate counts across several
nodes. The error-reporting Gauge can be used to determine whether a node
is experiencing a config-related error, and to prodouce an aggregate
count of nodes in an error state.

https://github.com/kubernetes/features/issues/281

```release-note
The Kubelet now exports metrics that report the assigned (node_config_assigned), last-known-good (node_config_last_known_good), and active (node_config_active) config sources, and a metric indicating whether the node is experiencing a config-related error (node_config_error). The config source metrics always report the value 1, and carry the node_config_name, node_config_uid, node_config_resource_version, and node_config_kubelet_key labels, which identify the config version. The error metric reports 1 if there is an error, 0 otherwise.
```
2018-05-23 19:44:21 -07:00
Kubernetes Submit Queue
10377f6593
Merge pull request #63896 from mtaufen/refactor-test-metrics
Automatic merge from submit-queue (batch tested with PRs 64013, 63896, 64139, 57527, 62102). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Refactor test utils that deal with Kubelet metrics for clarity

I found these functions hard to understand, because the names did not
accurately reflect their behavior. For example, GetKubeletMetrics
assumed that all of the metrics passed in were measuring latency.
The caller of GetKubeletMetrics was implicitly making this assumption,
but it was not obvious at the call site.

```release-note
NONE
```
2018-05-23 19:44:15 -07:00
Michael Taufen
0188e8c2b5 add colon separators to improve readability of test names 2018-05-22 17:33:18 -07:00
Michael Taufen
0868db5bf1 fix the e2e node helpers that let tests reconfigure Kubelet
The dynamic config tests were updated with the validation change, but
the tests that try to use dynamic config via this helper were not.
2018-05-22 17:20:51 -07:00
Michael Taufen
fd3432ef05 add dynamic config metrics
This PR exports config-releated metrics from the Kubelet.
The Guages for active, assigned, and last-known-good config can be used
to identify config versions and produce aggregate counts across several
nodes. The error-reporting Gauge can be used to determine whether a node
is experiencing a config-related error, and to prodouce an aggregate
count of nodes in an error state.
2018-05-22 14:08:55 -07:00
Yu-Ju Hong
90750c77c3 test/e2e_node: Add NodeFeature tags to non-conformance tests
Serial tests are not considered for conformance tests.
2018-05-21 17:52:36 -07:00
Yu-Ju Hong
ff62f037b8 Re-tag benchmark tests 2018-05-21 17:52:36 -07:00
Yu-Ju Hong
5802f18283 test/e2e_node: mark more tests with [NodeConformance] 2018-05-21 17:52:36 -07:00
Yu-Ju Hong
7cbd897e3e test/e2e_node: Add Node-exclusive feature tags to existing tests 2018-05-21 17:52:36 -07:00
Yu-Ju Hong
4ad9aedb04 test/e2e_node: Add [NodeConformance] to tests tagged [Conformance]
This has no effect yet until test configurations are updated.
2018-05-21 17:51:49 -07:00
Kubernetes Submit Queue
2a989c60ff
Merge pull request #63221 from mtaufen/dkcfg-live-configmap
Automatic merge from submit-queue (batch tested with PRs 63881, 64046, 63409, 63402, 63221). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Kubelet responds to ConfigMap mutations for dynamic Kubelet config

This PR makes dynamic Kubelet config easier to reason about by leaving less room for silent skew scenarios. The new behavior is as follows:
- ConfigMap does not exist: Kubelet reports error status due to missing source
- ConfigMap is created: Kubelet starts using it
- ConfigMap is updated: Kubelet respects the update (but we discourage this pattern, in favor of incrementally migrating to a new ConfigMap)
- ConfigMap is deleted: Kubelet keeps using the config (non-disruptive), but reports error status due to missing source
- ConfigMap is recreated: Kubelet respects any updates (but, again, we discourage this pattern)

This PR also makes a small change to the config checkpoint file tree structure, because ResourceVersion is now taken into account when saving checkpoints. The new structure is as follows:
```
- dir named by --dynamic-config-dir (root for managing dynamic config)
| - meta
  | - assigned (encoded kubeletconfig/v1beta1.SerializedNodeConfigSource object, indicating the assigned config)
  | - last-known-good (encoded kubeletconfig/v1beta1.SerializedNodeConfigSource object, indicating the last-known-good config)
| - checkpoints
  | - uid1 (dir for versions of object identified by uid1)
    | - resourceVersion1 (dir for unpacked files from resourceVersion1)
    | - ...
  | - ...
```


fixes: #61643

```release-note
The dynamic Kubelet config feature will now update config in the event of a ConfigMap mutation, which reduces the chance for silent config skew. Only name, namespace, and kubeletConfigKey may now be set in Node.Spec.ConfigSource.ConfigMap. The least disruptive pattern for config management is still to create a new ConfigMap and incrementally roll out a new Node.Spec.ConfigSource.
```
2018-05-21 17:05:42 -07:00
Michael Taufen
b5648c3f61 dynamic Kubelet config reconciles ConfigMap updates 2018-05-21 09:03:58 -07:00
Michael Taufen
83509a092f Refactor test utils that deal with Kubelet metrics for clarity
I found these functions hard to understand, because the names did not
accurately reflect their behavior. For example, GetKubeletMetrics
assumed that all of the metrics passed in were measuring latency.
The caller of GetKubeletMetrics was implicitly making this assumption,
but it was not obvious at the call site.
2018-05-18 11:32:29 -07:00
Kubernetes Submit Queue
2accf11f1a
Merge pull request #57849 from dashpole/eviction_test_event
Automatic merge from submit-queue (batch tested with PRs 63865, 57849, 63932, 63930, 63936). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Eviction Node e2e test checks for eviction reason

**What this PR does / why we need it**:
Currently, the eviction test simply ensures that pods are marked `Failed`.  However, this could occur because of an OOM, rather than an eviction.
To ensure that pods are actually being evicted, check for the Reason in the pod status to ensure it is evicted.

**Release note**:
```release-note
NONE
```

cc @kubernetes/sig-node-pr-reviews
2018-05-17 00:28:19 -07:00
Michael Taufen
fcc1f8e7b6 Move to a structured status for dynamic Kubelet config
Updates dynamic Kubelet config to use a structured status, rather than a
node condition. This makes the status machine-readable, and thus more
useful for config orchestration.

Fixes: #56896
2018-05-15 11:25:12 -07:00
Kubernetes Submit Queue
b2fe2a0a6d
Merge pull request #59847 from mtaufen/dkcfg-explicit-keys
Automatic merge from submit-queue (batch tested with PRs 63624, 59847). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

explicit kubelet config key in Node.Spec.ConfigSource.ConfigMap

This makes the Kubelet config key in the ConfigMap an explicit part of
the API, so we can stop using magic key names.
    
As part of this change, we are retiring ConfigMapRef for ConfigMap.


```release-note
You must now specify Node.Spec.ConfigSource.ConfigMap.KubeletConfigKey when using dynamic Kubelet config to tell the Kubelet which key of the ConfigMap identifies its config file.
```
2018-05-09 17:55:13 -07:00
Filipe Brandenburger
48d052fae4 Fix cgroup names in node_container_manager_test.
The names were made invalid for the CgroupName refactor in #62541, so
update them here.

Furthermore, as the new names are now compatible with what
EnforceNodeAllocatable wants, reuse the constants there as well.

Tested:
  $ make test-e2e-node REMOTE=true HOSTS=test-cos-beta-67-10575-27-0 FOCUS='Validate Node Allocatable' SKIP='' TEST_ARGS='--feature-gates=DynamicKubeletConfig=true'
  • [SLOW TEST:39.488 seconds]
  [k8s.io] Node Container Manager [Serial]
    Validate Node Allocatable
      set's up the node and runs the test
  Ran 1 of 261 Specs in 57.348 seconds
  SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 260 Skipped
2018-05-08 16:15:26 -07:00
David Ashpole
a5df208866 eviction test ensures failed pods are evicted 2018-05-08 16:08:35 -07:00
Michael Taufen
c41cf55a2c explicit kubelet config key in Node.Spec.ConfigSource.ConfigMap
This makes the Kubelet config key in the ConfigMap an explicit part of
the API, so we can stop using magic key names.

As part of this change, we are retiring ConfigMapRef for ConfigMap.
2018-05-08 15:37:26 -07:00
Kubernetes Submit Queue
a244d8a48f
Merge pull request #63130 from vikaschoudhary16/dp_e2e_alloc
Automatic merge from submit-queue (batch tested with PRs 61455, 63346, 63130, 63404). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

[Device-Plugin]: Extend e2e test to cover node allocatables

**What this PR does / why we need it**:
 Extends device plugin e2e to cover node allocatable
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
None
```
/sig node
/area hw-accelerators
/cc @jiayingz @vishh @RenaudWasTaken
2018-05-03 14:24:10 -07:00
vikaschoudhary16
b953f852f5 [Device-Plugin]: Extend e2e test to cover node allocatables 2018-05-03 14:19:29 -04:00
Kubernetes Submit Queue
592c39bccc
Merge pull request #62541 from filbranden/cgroupname1
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Use a []string for CgroupName, which is a more accurate internal representation

**What this PR does / why we need it**:

This is purely a refactoring and should bring no essential change in behavior.

It does clarify the cgroup handling code quite a bit.

It is preparation for further changes we might want to do in the cgroup hierarchy. (But it's useful on its own, so even if we don't do any, it should still be considered.)

**Special notes for your reviewer**:

The slice of strings more precisely captures the hierarchic nature of the cgroup paths we use to represent pods and their groupings.

It also ensures we're reducing the chances of passing an incorrect path format to a cgroup driver that requires a different path naming, since now explicit conversions are always needed.

The new constructor `NewCgroupName` starts from an existing `CgroupName`, which enforces a hierarchy where a root is always needed. It also performs checking on the component names to ensure invalid characters ("/" and "_") are not in use.

A `RootCgroupName` for the top of the cgroup hierarchy tree is introduced.

This refactor results in a net reduction of around 30 lines of code,
mainly with the demise of ConvertCgroupNameToSystemd which had fairly
complicated logic in it and was doing just too many things.

There's a small TODO in a helper `updateSystemdCgroupInfo` that was introduced to make this commit possible. That logic really belongs in libcontainer, I'm planning to send a PR there to include it there. (The API already takes a field with that information, only that field is only processed in cgroupfs and not systemd driver, we should fix that.)

Tested: By running the e2e-node tests on both Ubuntu 16.04 (with cgroupfs driver) and CentOS 7 (with systemd driver.)

**NOTE**: I only tested this with dockershim, we should double-check that this works with the CRI endpoints too, both in cgroupfs and systemd modes.

/assign @derekwaynecarr 
/assign @dashpole 
/assign @Random-Liu 

**Release note**:

```release-note
NONE
```
2018-05-03 08:16:45 -07:00
Kubernetes Submit Queue
b5f61ac129
Merge pull request #62657 from matthyx/master
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Update all script shebangs to use /usr/bin/env interpreter instead of /bin/interpreter

This is required to support systems where bash doesn't reside in /bin (such as NixOS, or the *BSD family) and allow users to specify a different interpreter version through $PATH manipulation.
https://www.cyberciti.biz/tips/finding-bash-perl-python-portably-using-env.html
```release-note
Use /usr/bin/env in all script shebangs to increase portability.
```
2018-05-02 19:44:32 -07:00
Filipe Brandenburger
b230fb8ac4 Use a []string for CgroupName, which is a more accurate internal representation
The slice of strings more precisely captures the hierarchic nature of
the cgroup paths we use to represent pods and their groupings.

It also ensures we're reducing the chances of passing an incorrect path
format to a cgroup driver that requires a different path naming, since
now explicit conversions are always needed.

The new constructor NewCgroupName starts from an existing CgroupName,
which enforces a hierarchy where a root is always needed. It also
performs checking on the component names to ensure invalid characters
("/" and "_") are not in use.

A RootCgroupName for the top of the cgroup hierarchy tree is introduced.

This refactor results in a net reduction of around 30 lines of code,
mainly with the demise of ConvertCgroupNameToSystemd which had fairly
complicated logic in it and was doing just too many things.

There's a small TODO in a helper updateSystemdCgroupInfo that was
introduced to make this commit possible. That logic really belongs in
libcontainer, I'm planning to send a PR there to include it there.
(The API already takes a field with that information, only that field is
only processed in cgroupfs and not systemd driver, we should fix that.)

Tested by running the e2e-node tests on both Ubuntu 16.04 (with cgroupfs
driver) and CentOS 7 (with systemd driver.)
2018-05-01 08:29:06 -07:00
Kubernetes Submit Queue
452b8c9e0d
Merge pull request #62101 from bart0sh/PR0010-e2e_node-kubelet-command-line-fix
Automatic merge from submit-queue (batch tested with PRs 58474, 60034, 62101, 63198). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix wrong usage of kubelet option

**What this PR does / why we need it**:

"--allow-privileged true" is incorrect usage of boolean option.
It means setting '--allow-priviledged' to its default value plus
non-existing subcommand 'true'.

"--allow-privileged false" is even more confusing as it sets
allow-priviledged flag to its default value 'true'

This is true for any boolean command line option.

Fixed this by using correct syntax --allow-priviledged=true

**Special notes for your reviewer**:
This is a show-stopper for PR #61833 

**Release note**:

```release-note
NONE
```
2018-04-30 13:24:12 -07:00
Kubernetes Submit Queue
e01858c595
Merge pull request #63252 from liztio/e2e_node_utils
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

E2e path utils

**What this PR does / why we need it**:

A bunch of useful methods for getting k8s paths and stuff are secreted away in `e2e_node`. This PR pulls them out so they can be used in other E2E method.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:



**Special notes for your reviewer**:
This is motivated by the upcoming kubeadm-specific E2E tests. Those tests will be added in a follow-up to this PR.

**Release note**:

```release-note
NONE
```
2018-04-27 11:43:15 -07:00
liz
1ec02b1cd5
Move path management from e2e_node to common test/utils directory
enables reuse of these methods for other e2e tests
2018-04-27 11:12:10 -04:00
liz
432b542218
Generated artefacts 2018-04-27 11:11:45 -04:00
Jordan Liggitt
1bddcdcf44
Bump QPS on namespace controller
https://github.com/kubernetes/kubernetes/pull/62913 switched from using a client pool, where each groupVersionResource got its own rest client, to a single client.

This increases the QPS to account for increased requests using a single rest client rate limiter.
2018-04-27 10:11:14 -04:00
David Eads
3632037e60 add easy to use dynamic client 2018-04-25 08:55:26 -04:00
Kubernetes Submit Queue
44b57338d5
Merge pull request #59692 from mtaufen/dkcfg-unpack-configmaps
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

unpack dynamic kubelet config payloads to files

This PR unpacks the downloaded ConfigMap to a set of files on the node.

This enables other config files to ride alongside the
KubeletConfiguration, and the KubeletConfiguration to refer to these
cohabitants with relative paths.

This PR also stops storing dynamic config metadata (e.g. current,
last-known-good config records) in the same directory as config
checkpoints. Instead, it splits the storage into `meta` and
`checkpoints` dirs.

The current store dir structure is as follows:
```
- dir named by --dynamic-config-dir (root for managing dynamic config)
| - meta (dir for metadata, e.g. which config source is currently assigned, last-known-good)
  | - current (a serialized v1 NodeConfigSource object, indicating the assigned config)
  | - last-known-good (a serialized v1 NodeConfigSource object, indicating the last-known-good config)
| - checkpoints (dir for config checkpoints)
  | - uid1 (dir for unpacked config, identified by uid1)
    | - file1
    | - file2
    | - ...
  | - uid2
  | - ...
```

There are some likely changes to the above structure before dynamic config goes beta, such as renaming "current" to "assigned" for clarity, and extending the checkpoint identifier to include a resource version, as part of resolving #61643.

```release-note
NONE
```

/cc @luxas @smarterclayton
2018-04-24 12:01:37 -07:00
Michael Taufen
c9d398d01e unpack dynamic kubelet config payloads to files
This PR unpacks the downloaded ConfigMap to a set of files on the node.

This enables other config files to ride alongside the
KubeletConfiguration, and the KubeletConfiguration to refer to these
cohabitants with relative paths.

This PR also stops storing dynamic config metadata (e.g. current,
last-known-good config records) in the same directory as config
checkpoints. Instead, it splits the storage into `meta` and
`checkpoints` dirs.
2018-04-19 09:18:53 -07:00
Matthias Bertschy
9b15af19b2 Update all script to use /usr/bin/env bash in shebang 2018-04-19 13:20:13 +02:00
Kubernetes Submit Queue
dd8f8819e4
Merge pull request #62768 from krzyzacy/clean-up-jenkins
Automatic merge from submit-queue (batch tested with PRs 62445, 62768, 60633). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

clean up *.properties files

ref https://github.com/kubernetes/kubernetes/issues/62754

to double check, is any of the node config yaml files are still being used outside of CI? I'll make a follow up one to clean them up as well.

/assign @Random-Liu @mindprince @yujuhong
2018-04-18 12:25:08 -07:00
Kubernetes Submit Queue
1ddb0e05e5
Merge pull request #62761 from Random-Liu/lower-usage-nano-cores-in-summary
Automatic merge from submit-queue (batch tested with PRs 62761, 62715). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Lower UsageNanoCores boundary in summary api test.

We recently switched to use `p2p` instead of `bridge` in containerd https://github.com/containerd/cri/pull/742.

However, after that switch, the `UsageNanoCores`  becomes lower, and constantly fails the test. An example failure:
* https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/pr-logs/pull/containerd_cri/740/pull-cri-containerd-node-e2e/690/

This is probably because:
1) The test container used in summary test does `ping`. https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/summary_test.go#L352
2) `p2p` is simpler than `bridge`, "Maybe cycles are saved from waiving Mac learning" - @jingax10.

This PR lowers the boundary by 1 magnitude.

Signed-off-by: Lantao Liu <lantaol@google.com>

**Release note**:

```release-note
none
```
2018-04-17 22:38:10 -07:00
Sen Lu
854132fdcc clean up *.properties files 2018-04-17 21:44:32 -07:00
Lantao Liu
002483fe72 Lower UsageNanoCores boundary in summary api test.
Signed-off-by: Lantao Liu <lantaol@google.com>
2018-04-17 18:37:51 -07:00
Lantao Liu
c86e85c420 Fix extra-log flag for node e2e.
Signed-off-by: Lantao Liu <lantaol@google.com>
2018-04-17 21:48:26 +00:00
Lantao Liu
27105c90ec Fix kubelet flags.
Signed-off-by: Lantao Liu <lantaol@google.com>
2018-04-16 20:42:40 +00:00
Yu-Ju Hong
9a47bd0b67 Node E2E: Remove the simple mount test
There are EmptyDir volume tests in test/e2e/common already. The test
does not add any more coverage.
2018-04-12 17:05:28 -07:00
Ed Bartosh
7e3d28b30f Fix wrong usage of kubelet options
"--allow-privileged true" is incorrect usage of boolean option.
It means setting '--allow-priviledged' to its default value plus
non-existing subcommand 'true'.

"--allow-privileged false" is even more confusing as it sets
allow-priviledged flag to its default value 'true'

This is true for any boolean command line option.

Fixed this by using correct syntax --allow-priviledged=true

Fixed generating of kubelet command line in addKubeletConfigFlags
function.
2018-04-12 15:19:49 +03:00
Kubernetes Submit Queue
1dc6e87f57
Merge pull request #62206 from yujuhong/rm-rkt-refs
Automatic merge from submit-queue (batch tested with PRs 62192, 61866, 62206, 62360). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Remove rkt references in the codebase

```release-note
None
```
2018-04-10 23:52:21 -07:00
Kubernetes Submit Queue
3bc1a0a1d0
Merge pull request #60900 from dashpole/eviction_test_no_pressure
Automatic merge from submit-queue (batch tested with PRs 60900, 62215, 62196). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

[Flaky test fix] Use memory.force_empty before and after eviction tests

**What this PR does / why we need it**:
(copied from https://github.com/kubernetes/kubernetes/pull/60720):
MemoryAllocatableEviction tests have been somewhat flaky: https://k8s-testgrid.appspot.com/sig-node-kubelet#kubelet-serial-gce-e2e&include-filter-by-regex=MemoryAllocatable
The failure on the flakes is ["Pod ran to completion"](https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-node-kubelet-serial/3785#k8sio-memoryallocatableeviction-slow-serial-disruptive-when-we-run-containers-that-should-cause-memorypressure-should-eventually-evict-all-of-the-correct-pods).
Looking at [an example log](https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-node-kubelet-serial/3785/artifacts/tmp-node-e2e-6070a774-cos-stable-63-10032-71-0/kubelet.log) (and search for memory-hog-pod, we can see that this pod fails admission because the allocatable memory threshold has already been crossed.
`eviction manager: thresholds - ignoring grace period: threshold [signal=allocatableMemory.available, quantity=250Mi] observed 242404Ki`

https://github.com/kubernetes/kubernetes/pull/60720 wasn't effective.  To clean-up after each eviction test, and prepare for the next, use memory.force_empty to make the kernel reclaim memory in the allocatable cgroup before and after eviction tests.

**Special notes for your reviewer**:
I tested to make sure this doesn't break Cgroup Manager tests.
It should work on both cgroupfs and systemd based systems, although I have only tested in on cgroupfs.

**Release note**:
```release-note
NONE
```

/assign @yujuhong @Random-Liu 
/sig node
/priority important-soon
/kind bug

its getting a little late in the release cycle, so we can probably wait until after code freeze is lifted for this.
2018-04-06 21:30:06 -07:00
Kubernetes Submit Queue
1e767ddf60
Merge pull request #62135 from jiayingz/kubelet-restart-fix
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fixes restartKubelet in test/e2e_node failure.

Looks like there is some recent change on how we start kubelet service
in test_e2e_node. Fixes restartKubelet() to get right kubelet service
name to cope with the change.



**What this PR does / why we need it**:

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes kubelet-serial-gce-e2e test failure:
https://k8s-testgrid.appspot.com/wg-resource-management#kubelet-serial-gce-e2e
Thanks a lot to @mindprince for noticing this!

**Special notes for your reviewer**:

**Release note**:

```release-note

```
2018-04-06 15:46:19 -07:00
David Ashpole
3254bdc1a4 use memory.force_empty before and after eviction tests 2018-04-06 14:01:11 -07:00
Yu-Ju Hong
59741bdfbd Remove rkt references in the codebase 2018-04-06 12:02:11 -07:00
Manjunath A Kumatagi
1bb810e749 Use pause manifest image 2018-04-06 11:00:50 +05:30
Jiaying Zhang
0138007bdd Fixes restartKubelet in test/e2e_node failure.
Looks like there is some recent change on how we start kubelet service
in test_e2e_node. Fixes restartKubelet() to get right kubelet service
name to cope with the change.
2018-04-04 13:18:08 -07:00