The new support in ginkgo for progress reports while a test runs dumps
information about where a test is stuck when it runs too long. This can provide
additional insights into what the test is waiting for.
For the Kubernetes jobs using ginkgo-e2e.sh, such dumps are now enabled after
300 seconds and then get repeated every 20 seconds. The initial delay is
intentionally the same as for warning about a slow test. The rationale is that
such test runtimes are unexpected and may need further information to diagnose
why they are slow.
With -ginkgo.source-root, Ginkgo is able to locate the Kubernetes source code
and display small source code snippets for functions that are related to the
test, determined through a heuristic that assumes that all files under the test
suite are for the tests in it.
Set intercept mode to none will help to reveal more information when the
test hangs,
- https://github.com/onsi/ginkgo/issues/970
or circumvent cases where the code grabbing the stdout/stderr pipe
is not under the framework control and may cause hangs,
- https://github.com/onsi/ginkgo/issues/851
The flag `output-interceptor-mode` is set to `none` as we were trying to
figure out of the rootcase of the test flaky, it's only intended for debugging.
- https://github.com/kubernetes/kubernetes/issues/111086
But this set also has some side effect, since it will turn off stdout/stderr
capture completely, any output to stdout/stderr will be lost.
Now that the root cause is not caused by Ginkgo bump nor how the intercept
mode was set, we'd better to follow the default value.
Signed-off-by: Dave Chen <dave.chen@arm.com>
Some scripts and tools still relied on the deprecated flags, the ones
which are about to be removed.
This is intentionally not a complete removal of all those flags in the entire
repo. This would lead to much more code churn also in places where commands
still accept the flags because they use klog directly.
Introduce networking/v1alpha1 api group.
Add `ClusterCIDR` type to networking/v1alpha1 api group, this type
will enable the NodeIPAM controller to support multiple ClusterCIDRs.
- add feature gate
- add encrypted object and run generated_files
- generate protobuf for encrypted object and add unit tests
- move parse endpoint to util and refactor
- refactor interface and remove unused interceptor
- add protobuf generate to update-generated-kms.sh
- add integration tests
- add defaulting for apiVersion in kmsConfiguration
- handle v1/v2 and default in encryption config parsing
- move metrics to own pkg and reuse for v2
- use Marshal and Unmarshal instead of serializer
- add context for all service methods
- check version and keyid for healthz
Signed-off-by: Anish Ramasekar <anish.ramasekar@gmail.com>
This change is to promote local storage capacity isolation feature to GA
At the same time, to allow rootless system disable this feature due to
unable to get root fs, this change introduced a new kubelet config
"localStorageCapacityIsolation". By default it is set to true. For
rootless systems, they can set this configuration to false to disable
the feature. Once it is set, user cannot set ephemeral-storage
request/limit because capacity and allocatable will not be set.
Change-Id: I48a52e737c6a09e9131454db6ad31247b56c000a
This applies to all jobs using hack/ginkgo-e2e.sh. This is done because
Spyglass does not render the escape sequences, making test output harder to
read.
It is done here because then we don't need to set GINKGO_NO_COLOR in all the
different Prow job configs.
Ginkgo v1 had a much longer default test timeout, in v2 this
switched to being 1 hour. This is not long enough to run many of our
suites.
Here we copy the backwards compatibility that is used by
hack/gingo-e2e.sh to unbreak serial pipelines.
Ginkgo has been migrated to V2, add this to unwanted dependencies
so that it won't be shown up as a dep again in the future.
Signed-off-by: Dave Chen <dave.chen@arm.com>
The alias for vendor/github.com/onsi/ginkgo/ginkgo ensures that code like
30e99cb2a9/experiment/kind-conformance-image-e2e.sh (L110)
continues to work. The one without "vendor/" is there just in case that it
was used because it also worked.
Long term, "ginkgo" is a nicer, version independent alias. It gets used
internally to avoid future churn and gets documented also publicly in the
Makefile help.
The caveat is that there's no guarantee that a future v3 CLI will be compatible
with current invocations. But the most common usage is through
hack/ginkgo-e2e.sh, which can deal with such differences.
Default timeout setting has been reduced from `24h` down to `1h` in
Ginkgo V2, but for some long running test this is too short.
How long to abort the test was controlled by the the linux command `timeout`
in V1. e.g. `'timeout -k 30s 150m ...`, and is configured in the file
like `sig-network-misc.yaml`.
Set the timeout manually for Ginkgo V2 to avoid the early aborting.
Signed-off-by: Dave Chen <dave.chen@arm.com>
The change is needed for `verify-e2e-test-ownership.sh`.
The `jq` is re-defined since the structure of test spec
is different with v1 and the stacktrace related validation
is not available, e.g. `package` and `func`.
Signed-off-by: Dave Chen <dave.chen@arm.com>
The test/e2e directory contains several unit tests that should run as part of
"make test":
./test/e2e/chaosmonkey/chaosmonkey_test.go
./test/e2e/storage/external/external_test.go
./test/e2e/storage/utils/utils_test.go
./test/e2e/framework/log_test.go
./test/e2e/framework/testfiles/testfiles_test.go
./test/e2e/framework/timer/timer_test.go
./test/e2e/framework/node/wait_test.go
./test/e2e/framework/pod/resource_test.go
./test/e2e/framework/config/config_test.go
./test/e2e/framework/ingress/ingress_utils_test.go
./test/e2e/framework/providers/gce/firewall_test.go
Because they were excluded by "./test/e2e/*", some of them became outdated.
./test/e2e/e2e_test.go is the only test that needs to be excluded because it is
the E2E test suite that depends on a functional cluster.
The following investigation occurred during development.
Add TimingHistogram impl that shares lock with WeightedHistogram
Benchmarking and profiling shows that two layers of locking is
noticeably more expensive than one.
After adding this new alternative, I now get the following benchmark
results.
```
(base) mspreitz@mjs12 kubernetes % go test -benchmem -run=^$ -bench ^BenchmarkTimingHistogram$ k8s.io/component-base/metrics/prometheusextension
goos: darwin
goarch: amd64
pkg: k8s.io/component-base/metrics/prometheusextension
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkTimingHistogram-16 22232037 52.79 ns/op 0 B/op 0 allocs/op
PASS
ok k8s.io/component-base/metrics/prometheusextension 1.404s
(base) mspreitz@mjs12 kubernetes % go test -benchmem -run=^$ -bench ^BenchmarkTimingHistogram$ k8s.io/component-base/metrics/prometheusextension
goos: darwin
goarch: amd64
pkg: k8s.io/component-base/metrics/prometheusextension
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkTimingHistogram-16 22190997 54.50 ns/op 0 B/op 0 allocs/op
PASS
ok k8s.io/component-base/metrics/prometheusextension 1.435s
```
and
```
(base) mspreitz@mjs12 kubernetes % go test -benchmem -run=^$ -bench ^BenchmarkTimingHistogramDirect$ k8s.io/component-base/metrics/prometheusextension
goos: darwin
goarch: amd64
pkg: k8s.io/component-base/metrics/prometheusextension
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkTimingHistogramDirect-16 28863244 40.99 ns/op 0 B/op 0 allocs/op
PASS
ok k8s.io/component-base/metrics/prometheusextension 1.890s
(base) mspreitz@mjs12 kubernetes %
(base) mspreitz@mjs12 kubernetes %
(base) mspreitz@mjs12 kubernetes % go test -benchmem -run=^$ -bench ^BenchmarkTimingHistogramDirect$ k8s.io/component-base/metrics/prometheusextension
goos: darwin
goarch: amd64
pkg: k8s.io/component-base/metrics/prometheusextension
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkTimingHistogramDirect-16 27994173 40.37 ns/op 0 B/op 0 allocs/op
PASS
ok k8s.io/component-base/metrics/prometheusextension 1.384s
```
So the new implementation is roughly 20% faster than the original.
Add overlooked exception, rename timingHistogram to timingHistogramLayered
Use the direct (one mutex) style of TimingHistogram impl
This is about a 20% gain in CPU speed on my development machine, in
benchmarks without lock contention. Following are two consecutive
trials.
(base) mspreitz@mjs12 prometheusextension % go test -benchmem -run=^$ -bench Histogram .
goos: darwin
goarch: amd64
pkg: k8s.io/component-base/metrics/prometheusextension
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkTimingHistogramLayered-16 21650905 51.91 ns/op 0 B/op 0 allocs/op
BenchmarkTimingHistogramDirect-16 29876860 39.33 ns/op 0 B/op 0 allocs/op
BenchmarkWeightedHistogram-16 49227044 24.13 ns/op 0 B/op 0 allocs/op
BenchmarkHistogram-16 41063907 28.82 ns/op 0 B/op 0 allocs/op
PASS
ok k8s.io/component-base/metrics/prometheusextension 5.432s
(base) mspreitz@mjs12 prometheusextension % go test -benchmem -run=^$ -bench Histogram .
goos: darwin
goarch: amd64
pkg: k8s.io/component-base/metrics/prometheusextension
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkTimingHistogramLayered-16 22483816 51.72 ns/op 0 B/op 0 allocs/op
BenchmarkTimingHistogramDirect-16 29697291 39.39 ns/op 0 B/op 0 allocs/op
BenchmarkWeightedHistogram-16 48919845 24.03 ns/op 0 B/op 0 allocs/op
BenchmarkHistogram-16 41153044 29.26 ns/op 0 B/op 0 allocs/op
PASS
ok k8s.io/component-base/metrics/prometheusextension 5.044s
Remove layered implementation of TimingHistogram
This commit cleans up references to the old kubernetes-node-e2e-images
project. In the process it removes the `LIST_IMAGES` mode as listing
large numbers of public cloud projects is not particularly useful, and
has been somewhat broken for a long period of time - as we defaulted
launching a VM to a different project than listing.