ginkgo.GinkgoHelper is a recent addition to ginkgo which allows functions to
mark themselves as helper. This then changes which callstack gets reported for
failures. It makes sense to support the same mechanism also for logging.
There's also no reason why framework.Logf should produce output that is in a
different format than klog log entries. Having time stamps formatted
differently makes it hard to read test output which uses a mixture of both.
Another user-visible advantage is that the error log entry from
framework.ExpectNoError now references the test source code.
With textlogger there is a simple replacement for klog that can be reconfigured
to let the caller handle stack unwinding. klog itself doesn't support that
and should be modified to support it (feature freeze).
Emitting printf-style output via that logger would work, but become less
readable because the message string would get quoted instead of printing it
verbatim as before. So instead, the traditional klog header gets reproduced
in the framework code. In this example, the first line is from klog, the second
from Logf:
I0111 11:00:54.088957 332873 factory.go:193] Registered Plugin "containerd"
...
I0111 11:00:54.987534 332873 util.go:506] >>> kubeConfig: /var/run/kubernetes/admin.kubeconfig
Indention is a bit different because the initial output is printed before
installing the logger which writes through ginkgo.GinkgoWriter.
One welcome side effect is that now "go vet" detects mismatched parameters for
framework.Logf because fmt.Sprintf is called without mangling the format
string. Some of the calls were incorrect.
A stand-alone binary shouldn't import the test/e2e/framework, which is targeted
towards usage in a Ginkgo test suite. This currently works, but will break once
test/e2e/framework becomes more opinionated about how to configure logging.
The simplest solution is to duplicate the one short function that the binary
was calling in the framework.
Using klog.Fatal to abort a test leads to a poor user experience because the
output is buffered in ginkgo.GinkgoWriter and not flushed before killing the
process. The output is also different from other failures. Using the normal
error checking is better.
Before:
$ KUBECONFIG=/no/such/config go test -v ./test/e2e/
Jan 19 10:06:58.475: INFO: The --provider flag is not set. Continuing as if --provider=skeleton had been used.
=== RUN TestE2E
I0119 10:06:58.475844 99472 e2e.go:109] Starting e2e run "5303f626-ae0e-44d7-abf1-b4956d910ef4" on Ginkgo node 1
Running Suite: Kubernetes e2e suite - /nvme/gopath/src/k8s.io/kubernetes/test/e2e
=================================================================================
Random Seed: 1705655217 - will randomize all specs
Will run 4678 of 7421 specs
goroutine 817 [running]:
k8s.io/klog/v2/internal/dbg.Stacks(0x0)
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/klog/v2/internal/dbg/dbg.go:35 +0x85
k8s.io/klog/v2.(*loggingT).output(0x9d92b20, 0x3, 0x0, 0xc00069d7a0, 0x2, {0x834c6e8?, 0x9d91c80?}, 0x300000060?, 0x0)
...
k8s.io/klog/v2.Fatal(...)
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/klog/v2/klog.go:1652
k8s.io/kubernetes/test/e2e.setupSuite({0x7fb49064c078, 0xc003072360})
/nvme/gopath/src/k8s.io/kubernetes/test/e2e/e2e.go:187 +0x125
...
FAIL k8s.io/kubernetes/test/e2e 0.759s
FAIL
After:
$ KUBECONFIG=/no/such/config go test -v ./test/e2e/
Jan 19 10:12:58.889: INFO: The --provider flag is not set. Continuing as if --provider=skeleton had been used.
=== RUN TestE2E
I0119 10:12:58.889224 106019 e2e.go:109] Starting e2e run "bed5a77a-f595-42d0-b512-5f601067444b" on Ginkgo node 1
Running Suite: Kubernetes e2e suite - /nvme/gopath/src/k8s.io/kubernetes/test/e2e
=================================================================================
Random Seed: 1705655578 - will randomize all specs
Will run 4678 of 7421 specs
------------------------------
[SynchronizedBeforeSuite] [FAILED] [0.001 seconds]
[SynchronizedBeforeSuite]
/nvme/gopath/src/k8s.io/kubernetes/test/e2e/e2e.go:69
Timeline >>
Jan 19 10:12:59.063: INFO: >>> kubeConfig: /no/such/config
Jan 19 10:12:59.063: INFO: Unexpected error: Error loading client:
<*errors.errorString | 0xc00182c130>:
error creating client: error loading KubeConfig: open /no/such/config: no such file or directory
{
s: "error creating client: error loading KubeConfig: open /no/such/config: no such file or directory",
}
[FAILED] in [SynchronizedBeforeSuite] - /nvme/gopath/src/k8s.io/kubernetes/test/e2e/e2e.go:186 @ 01/19/24 10:12:59.064
<< Timeline
[FAILED] Error loading client: error creating client: error loading KubeConfig: open /no/such/config: no such file or directory
In [SynchronizedBeforeSuite] at: /nvme/gopath/src/k8s.io/kubernetes/test/e2e/e2e.go:186 @ 01/19/24 10:12:59.064
------------------------------
Summarizing 1 Failure:
[FAIL] [SynchronizedBeforeSuite]
/nvme/gopath/src/k8s.io/kubernetes/test/e2e/e2e.go:186
Ran 0 of 7421 Specs in 0.001 seconds
FAIL! -- A BeforeSuite node failed so all tests were skipped.
--- FAIL: TestE2E (0.18s)
FAIL
FAIL k8s.io/kubernetes/test/e2e 0.769s
FAIL
This checks that the With* label functions are used instead of the previous
inline tags. To catch strings passed to Ginkgo directly instead of the
framework wrapper functions, the final test specs are checked.
These wrapper functions set labels in addition to injecting the annotation into
the test text. It then becomes possible to select tests in different ways:
ginkgo -v --focus="should respect internalTrafficPolicy.*\[FeatureGate:ServiceInternalTrafficPolicy\]"
ginkgo -v --label-filter="FeatureGate:ServiceInternalTrafficPolicy"
ginkgo -v --label-filter="Beta"
When a test runs, ginkgo shows it as:
[It] should respect internalTrafficPolicy=Local Pod to Pod [FeatureGate:ServiceInternalTrafficPolicy] [Beta] [FeatureGate:ServiceInternalTrafficPolicy, Beta]
The test name and the labels at the end are in different colors. Embedding the
annotations inside the text is redundant and only done because users of the e2e
suite might expect it. Also, our tooling that consumes test results currently
doesn't know about ginkgo labels.
Environments, features and node features as described by
https://github.com/kubernetes/enhancements/tree/master/keps/sig-testing/3041-node-conformance-and-features
are also supported.
The framework and thus (at the moment) test/e2e do not have any pre-defined
environments and features. Adding those and modifying tests will follow in
a separate commit.
If something goes wrong during the test registration phase, the only solution
so far was to panic. This is not user-friendly and only allows to report one
problem at a time.
If initialization can continue, then a better solution is to record a bug,
continue, and then report all bugs together.
This also works when just listing tests. The new verify-e2e-suites.sh uses that
to check all test suites (identified as "packages that call
framework.AfterReadingAllFlags", with some exceptions) as part of
pull-kubernetes-verify.
Example output for a fake
framework.RecordBug(framework.NewBug("fake bug during SIGDescribe", 0))
in test/e2e/storage/volume_metrics.go:
```
$ hack/verify-e2e-suites.sh
go version go1.21.1 linux/amd64
ERROR: E2E test suite invocation failed for test/e2e.
ERROR: E2E suite initialization was faulty, these errors must be fixed:
ERROR: test/e2e/storage/volume_metrics.go:49: fake bug during SIGDescribe
E2E suite test/e2e_kubeadm passed.
E2E suite test/e2e_node passed.
```
-list-tests is a more concise alternative for `ginkgo --dry-run` with one line
per test. In contrast to `--dry-run`, it really lists all tests. `--dry-run`
without additional parameters uses the default skip expression from the E2E
context, which filters out flaky and feature-gated tests. The output includes
the source code location where each test is defined. It is sorted by test
name (not source code location) because that order is independent of
reorganizing the source code and ordering by location can be achieved with
"sort".
-list-labels has no corresponding feature in Ginkgo.
One possible usage is to figure out what values might make sense for
-focus/skip/label-filter.
Unit tests will follow in a future commit.
It's conceptually wrong to have dependencies to k/k/pkg in
the e2e framework code. They should be moved to corresponding
packages, in this particular case to the test/e2e_node.
The previous approach was based on the observation that some Prow jobs use the
--report-dir parameter instead of the E2E_REPORT_DIR env variable. Parsing the
command line was necessary to use the --json-report and --junit-report
parameters.
But that is complex and can be avoided by triggering the creation of complete
reports in the E2E test suite. The paths are hard-coded and relative to the
report directory to keep the code simple.
There was a report that k8s-triage started processing more data after
6db4b741dd was merged. It's unclear whether
that was because of the new <report-dir>/ginkgo_report.xml file. To avoid
this potential problem, the reports are now in a "ginkgo" sub-directory.
While at it, error checking gets enhanced:
- Create directories at the start of
the suite and bail out early if that fails.
- *All* e2e suites using the framework do this, not just test/e2e.
- Added missing error checking of truncated JUnit report writing.
Primarily this protects against accidentally polling with the default interval
of 10ms. Setting these defaults may also make some tests simpler because they
don't need to override the defaults.
This consolidates timeout handling. In the future, configuration of all
timeouts via a configuration file might get added. For now, the same three
legacy command line flags for the timeouts that get moved continue to be
supported.
It doesn't make sense for the E2E framework to have command line options that
don't do anything because then all test suites built with the framework inherit
those options.
For -list-images and -list-conformance-tests the solution is to move the
implementation into the framework (-list-images) respectively move the flag
into test/e2e (-list-conformance-tests).
The placement was decided based on the observation that image patching is
common functionality while conformance testing is specific to one test suite.
All information that we want will be written into the failure XML element's
data. We don't need the message tag and don't want it because our
tools (kettle, testgrid, spyglass) would then just concatenate the two strings.
This gets implemented for us by Ginkgo. However, truncating the failure message
is not supported there at the moment. It's unclear how important that is,
therefore this (recently added feature) gets removed.
All code must use the context from Ginkgo when doing API calls or polling for a
change, otherwise the code would not return immediately when the test gets
aborted.
Our tooling cannot handle very long failure messages well:
- when unfolding a test in the spyglass UI, it fills the entire screen
- failure correlation for http://go.k8s.io/triage has resource constraints
We cannot enforce that all tests only produce short failure messages and even
if we could, depending on the test failure, including more information may be
useful to understand it.
To achieve both goals (summary for correlation and overview, all details
available when digging deeper), too longer failure messages now get truncated,
with the full message guaranteed to be captured in the test output.
"Too long" is arbitrarily chosen to be similar to the gomega.MaxLength because
that has been a limit for failure message size in the past.
When gomega.format exceeds the default size of 4000, it truncates and prints:
Gomega truncated this representation as it exceeds 'format.MaxLength'.
Consider having the object provide a custom 'GomegaStringer' representation
or adjust the parameters in Gomega's 'format' package.
Learn more here: https://onsi.github.io/gomega/#adjusting-output
These instructions don't help the user of the e2e.test binary unless we provide
a command line flag.
Pulling the CreateKubeConfig function from the expensive to build
test/utils/apiserver package had a considerable impact on the overall build
time because that package depends on a lot of other packages.
Because only that one function is needed by the framework, that extra build
time can be avoided by moving it into its own package.
We don't want klog to print to anything other than GinkgoWriter, but it still
used os.Stderr in addition to GinkgoWriter when printing log entries with
severity >= error. Changing "stderrthreshold" fixes that.
The unit test for framework output handling didn't test klog behavior. Now it
does:
- os.Stderr is redirected, should be empty
- a new test invokes klog
Including the full information for successful tests makes the resulting XML
file too large for the 200GB limit in Spyglass when running large jobs (like
scale testing).
The original solution from https://github.com/kubernetes/kubernetes/pull/111627
broke JUnit reporting in other test suites, in particular
test/e2e_node. Keeping the code inside the framework ensures that all test
suites continue to have the JUnit reporting.
AfterReadingAllFlags is a good place to set this up because all test suites
using the test context are expected to call it before running tests and after
parsing flags.
Removing the ReportEntries added by ginkgo.By from all test reports usually
avoids the `system-err` part in the JUnit file, which in Spyglass avoids
the extra "open stdout" button.
Co-authored-by: Patrick Ohly <patrick.ohly@intel.com>
Co-authored-by: Dave Chen <dave.chen@arm.com>
Including the full information for successful tests makes the resulting XML
file too large for the 200GB limit in Spyglass when running large jobs (like
scale testing).
Co-authored-by: Patrick Ohly <patrick.ohly@intel.com>
Co-authored-by: Dave Chen <dave.chen@arm.com>
- Run hack/update-codegen.sh
- Run hack/update-generated-device-plugin.sh
- Run hack/update-generated-protobuf.sh
- Run hack/update-generated-runtime.sh
- Run hack/update-generated-swagger-docs.sh
- Run hack/update-openapi-spec.sh
- Run hack/update-gofmt.sh
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
Ginkgo is now writing the JUnit file itself. The -report-dir parameter is used
as fallback for enabling JUnit output in case that users haven't migrated to
the new -junit-report parameter.
Co-authored-by: Patrick Ohly <patrick.ohly@intel.com>
Signed-off-by: Dave Chen <dave.chen@arm.com>
- update all the import statements
- run hack/pin-dependency.sh to change pinned dependency versions
- run hack/update-vendor.sh to update go.mod files and the vendor directory
- update the method signatures for custom reporters
Signed-off-by: Dave Chen <dave.chen@arm.com>
The test/e2e suite has never supported feature gates:
- it cannot discover at runtime how the cluster is configured
- its --feature-gates parameter had no effect
Despite that, tests were written that used
e2eskipper.SkipUnlessFeatureGateEnabled even though that function then only
checked the default feature gate state. To catch such mistakes, e2e tests
suites now must explicitly enable feature gate checking via
e2eskipper.InitFeatureGates. They also must register their own command line
flag. When that is not done, then using SkipUnlessFeatureGateEnabled or
SkipIfFeatureGateEnabled leads to a test failure.
test/e2e_node does both and therefore continues to work as before.
Some storage tests deploy DaemonSets which hard-code /var/lib/kubelet as root
directory for kubelet registration and pod directory. There was already a
parameter which allowed specifying the root directory, just with a very
confusing name ("--volume-dir") and matching field name. A --kubelet-root-dir
parameters gets added because this may make it easier to find the parameter,
with the old name preserved as an alias for the same field for backwards
compatibility.