When run locally, the Topology Manager metrics tests pass 100% of the time.
Yet the same tests are failing consistently in upstream
CI: https://testgrid.k8s.io/sig-node-presubmits#pr-kubelet-serial-gce-e2e-topology-manager.
The logs show that these failures are caused by timeouts,
so we are increasing both the default timeout and the polling interval
used when obtaining KubeletMetrics in order to deflake this test.
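As a rough sketch of the shape of the fix (getKubeletMetrics is a
hypothetical stand-in and the durations are illustrative, not the actual
values from this change), the deflake widens a Gomega-style poll around
the metrics grab:

// Illustrative fragment only: getKubeletMetrics is a hypothetical helper
// and these durations are not the real values.
gomega.Eventually(ctx, func(ctx context.Context) error {
    _, err := getKubeletMetrics(ctx)
    return err
}).WithTimeout(5 * time.Minute).   // longer overall timeout
    WithPolling(10 * time.Second). // wider gap between polls
    Should(gomega.Succeed())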
We have noticed a similar flake in the CPU Manager metrics tests:
https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/directory/pull-kubernetes-node-kubelet-serial-cpu-manager/1701615009836044288.
Once the issue is confirmed to be resolved for the Topology Manager tests,
we will fix the CPU Manager tests the same way in a follow-up PR.
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
The post-merge job (#117103) failed,
and this causes the e2e tests to fail. Considering that we recently
increased the timeout, we are retriggering the job.
Signed-off-by: Humble Chirammal <humble.devassy@gmail.com>
The Service API REST implementation is complex and has to install
different hooks on the REST storage. The status store was created as a
shallow copy of the storage before the hooks were added, so it did not
inherit them. The status store must have the same hooks as the REST
store so that it can correctly handle the allocation and deallocation
of ClusterIPs and NodePorts.
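To illustrate the pitfall (a minimal, self-contained sketch with made-up
names, not the actual Service storage code):

package main

import "fmt"

// genericStore stands in for the REST storage; beginCreate stands in
// for the allocation hooks.
type genericStore struct {
    beginCreate func()
}

func main() {
    store := &genericStore{}
    statusStore := *store // shallow copy taken BEFORE the hooks are added
    store.beginCreate = func() { fmt.Println("allocate ClusterIP/NodePort") }

    fmt.Println(store.beginCreate != nil)       // true
    fmt.Println(statusStore.beginCreate != nil) // false: hook not inherited
}

Making the shallow copy only after the hooks are installed (or sharing
the hooks explicitly) gives both stores the same behavior.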
Change-Id: I44be21468d36017f0ec41a8f912b8490f8f13f55
Signed-off-by: Antonio Ojea <aojea@google.com>
- move fabriziopandini to emeritus_approvers for /test/e2e*
and /cmd/kubeadm. fabriziopandini remains in /OWNERS_ALIASES
under sig-cluster-lifecycle-leads.
- remove RA489 as reviewer for /test/e2e* and /cmd/kubeadm
The status error was embedded inside the new error constructed by
WaitForPodsResponding's get function, but not wrapped. Therefore
`apierrors.IsServiceUnavailable(err)` didn't find it and returned false -> no
retries.
Wrapping fixes this and Gomega formatting of the error remains useful:
import (
    "fmt"

    "github.com/onsi/gomega/format"
    "k8s.io/apimachinery/pkg/api/errors"
)

err := &errors.StatusError{}
err.ErrStatus.Code = 503
err.ErrStatus.Message = "temporary failure"
// %w (instead of %v) puts the StatusError into the unwrap chain,
// so errors.IsServiceUnavailable can find it.
err2 := fmt.Errorf("Controller %s: failed to Get from replica pod %s:\n%w\nPod status:\n%s",
    "foo", "bar",
    err, "some status")
fmt.Println(format.Object(err2, 1))
fmt.Println(errors.IsServiceUnavailable(err2))
=>
    <*fmt.wrapError | 0xc000139340>:
    Controller foo: failed to Get from replica pod bar:
    temporary failure
    Pod status:
    some status
    {
        msg: "Controller foo: failed to Get from replica pod bar:\ntemporary failure\nPod status:\nsome status",
        err: <*errors.StatusError | 0xc0001a01e0>{
            ErrStatus: {
                TypeMeta: {Kind: "", APIVersion: ""},
                ListMeta: {
                    SelfLink: "",
                    ResourceVersion: "",
                    Continue: "",
                    RemainingItemCount: nil,
                },
                Status: "",
                Message: "temporary failure",
                Reason: "",
                Details: nil,
                Code: 503,
            },
        },
    }
true