Commit Graph

48102 Commits

Author SHA1 Message Date
Francesco Romani
2ea47038b9 podresources: e2e: force eager connection
Add and use more facilities to the *internal* podresources client.
Checking e2e test runs, we have quite some
```
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /var/lib/kubelet/pod-resources/kubelet.sock: connect: connection refused": rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /var/lib/kubelet/pod-resources/kubelet.sock: connect: connection refused"
```

This is likely caused by kubelet restarts, which we do plenty in e2e tests,
combined with the fact gRPC does lazy connection AND we don't really
check the errors in client code - we just bubble them up.

While it's arguably bad we don't check properly error codes, it's also
true that in the main case, e2e tests, the functions should just never
fail besides few well known cases, we're connecting over a
super-reliable unix domain socket after all.

So, we centralize the fix adding a function (alongside with minor
cleanups) which wants to trigger and ensure the connection happens,
localizing the changes just here. The main advantage is this approach
is opt-in, composable, and doesn't leak gRPC details into the client
code.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2023-09-07 08:24:49 +02:00
Kubernetes Prow Robot
abb74c7afa Merge pull request #120412 from aojea/proxy_invalid
only drop invalid cstate packets if non liberal
2023-09-05 21:32:50 -07:00
Kubernetes Prow Robot
debe30de70 Merge pull request #120281 from gjkim42/feature-gate-sidecar-containers-in-kuberuntime
Feature-gate SidecarContainers code in pkg/kubelet/kuberuntime
2023-09-05 18:34:54 -07:00
Kubernetes Prow Robot
a7f9e70384 Merge pull request #120413 from pohly/scheduler-in-flight-events-fix
scheduler: fix tracking of concurrent events
2023-09-05 15:17:03 -07:00
Kubernetes Prow Robot
f68c66f96d Merge pull request #119142 from aramase/aramase/f/kep_3331_add_feature_flag
[StructuredAuthenticationConfig] Add feature flag and wire up `--authentication-config` flag
2023-09-05 13:08:51 -07:00
Patrick Ohly
c131c92b9f scheduler: unit test case for concurrent event with other pod
The problematic scenario was having one pod in flight, one event in the list,
and then detecting a concurrent event for a second pod after the first pod is
done. The new test case covers that.

To make it work without assumptions about the implementation, the QueuedPodInfo
returned by Pop must be the one passed to AddUnschedulableIfNotPresent
after (potentially) populating UnschedulablePlugins. This is done via callback
functions which bind to the same shared variable.
2023-09-05 21:01:13 +02:00
Kubernetes Prow Robot
73580b2038 Merge pull request #120336 from pohly/dra-generated-name-hyphen
resource claim controller: separate generated suffix from base
2023-09-05 11:22:51 -07:00
Patrick Ohly
cd943dd95e scheduler: fix tracking of concurrent events
The previous approach was based on the assumption that an in-flight pod can use
the head of the received event list as marker for identifying all events that
occur while the pod is in flight. That assumption is incorrect: when that
existing element gets removed from the list because all pods that were
in-flight when it was received are done, that marker's Next method returns nil
and the code which should have seen several concurrent events (if there were
any) missed all of those.

As a result, a pod with concurrent events could incorrectly get moved to the
unschedulable queue where it could got stuck until the next periodic purging
after 5 minutes if there was no other event for it.

The approach with maintaining a single list of concurrent events can be fixed
by inserting each in-flight pod into the list and using that element to
identify "more recent" events for the pod.
2023-09-05 19:58:38 +02:00
Antonio Ojea
933bcc123b only drop invalid cstate packets if non liberal
Conntrack invalid packets may cause unexpected and subtle bugs
on esblished connections, because of that we install by default an
iptables rules that drops the packets with this conntrack state.

However, there are network scenarios, specially those that use multihoming
nodes, that may have legit traffic that is detected by conntrack as
invalid, hence these iptables rules are causing problems dropping this
traffic.

An alternative to solve the spurious problems caused by the invalid
connectrack packets is to set the sysctl nf_conntrack_tcp_be_liberal
option, but this is a system wide setting and we don't want kube-proxy
to be opinionated about the whole node networking configuration.

Kube-proxy will only install the DROP rules for invalid conntrack states
if the nf_conntrack_tcp_be_liberal is not set.

Change-Id: I5eb326931ed915f5ae74d210f0a375842b6a790e
2023-09-05 14:16:17 +00:00
Kubernetes Prow Robot
8e2b12a220 Merge pull request #119068 from lauchokyip/podgc-unit-test
added podgc orphaned pod unit tests
2023-09-05 03:19:49 -07:00
Kubernetes Prow Robot
9fca4ec44a Merge pull request #120399 from SataQiu/clean-scheduler-20230904
scheduler: remove unused constant SchedulerPolicyConfigMapKey
2023-09-05 00:05:52 -07:00
Kubernetes Prow Robot
5d94b2a8e8 Merge pull request #118709 from ty-dc/pr/ut
[UT] add ut for pkg/registry/networking/ipaddress
2023-09-04 02:49:48 -07:00
SataQiu
cae090e7fe scheduler: remove unused constant SchedulerPolicyConfigMapKey 2023-09-04 17:48:36 +08:00
Patrick Ohly
3c2cfd9a4f resource claim controller: separate generated suffix from base
When the resource claim name inside the pod had some suffix like "1a" in
"resource-1a", the generated name suffix got added directly after that, leading
to "my-pod-resource-1ax6zgt".

Adding another hyphen makes the result more readable: "my-pod-resource-1a-x6zgt".
2023-09-04 09:45:25 +02:00
Kubernetes Prow Robot
d4050a80c7 Merge pull request #119394 from aroradaman/fix/proxy-conntrack
Fix stale conntrack flow detection logic
2023-09-03 14:53:46 -07:00
Kubernetes Prow Robot
03762cbcb5 Merge pull request #120316 from dims/move-to-new-repo-for-reference
New repo who dis? distribution/reference
2023-09-02 21:05:11 -07:00
Davanum Srinivas
ceaed508ce Validate the cloud-provider passed in and the corresponding feature flags
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2023-09-02 13:08:04 -04:00
Daman Arora
2e5f17166b pkg/proxy: fix stale detection logic
Signed-off-by: Daman Arora <aroradaman@gmail.com>
2023-09-02 12:45:19 +05:30
Davanum Srinivas
42e8cfa28a fix failing metadata test
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2023-09-01 15:22:07 -04:00
Davanum Srinivas
cdcbfcc0a6 [KEP-2395] Phase 4 - Disabling In-Tree Providers
https://github.com/kubernetes/enhancements/tree/master/keps/sig-cloud-provider/2395-removing-in-tree-cloud-providers#phase-4---disabling-in-tree-providers

DisableCloudProviders - this feature gate will disable any functionality
in kube-apiserver, kube-controller-manager and kubelet related to the
--cloud-provider component flag.

DisableKubeletCloudCredentialProvider - this feature gate will disable
in-tree functionality in the kubelet to authenticate to the Azure and
GCP container registries for image pull credentials.

Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2023-09-01 15:22:07 -04:00
Kubernetes Prow Robot
470fe396bd Merge pull request #120275 from danwinship/kep-3705-to-beta
Bump CloudDualStackNodeIPs to beta for 1.29
2023-09-01 11:28:43 -07:00
Kubernetes Prow Robot
1cadbd5887 Merge pull request #120172 from DrAuYueng/fix-log-in-deployment-controller
Fix pod deletion log in deployment controller
2023-09-01 11:28:31 -07:00
Kubernetes Prow Robot
efadb94a74 Merge pull request #120259 from SataQiu/clean-apf-20230830
apf: remove v1alpha1 API
2023-08-31 19:44:42 -07:00
Kubernetes Prow Robot
36352ba306 Merge pull request #120233 from xuzhenglun/master
KEP-3668: promote ServiceNodePortStaticSubrange to stable
2023-08-31 19:44:31 -07:00
Davanum Srinivas
fcff996909 adjust error message a bit
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2023-08-31 21:53:43 -04:00
Davanum Srinivas
889c8e919b New repo who dis? distribution/reference
github.com/docker/distribution/reference has a new home github.com/distribution/reference

and a new tag v0.5.0. Let's switch to that.

Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2023-08-31 21:53:40 -04:00
Albert Sverdlov
a46bab6930 Fix a job quota related deadlock (#119776)
* Fix a job quota related deadlock

In case ResourceQuota is used and sets a max # of jobs, a CronJob may get
trapped in a deadlock:
  1. Job quota for a namespace is reached.
  2. CronJob controller can't create a new job, because quota is
     reached.
  3. Cleanup of jobs owned by a cronjob doesn't happen, because a
     control loop iteration is finished because of an error to create a
     job.

To fix this we stop early quitting from a control loop iteration when
cronjob reconciliation failed and always let old jobs to be cleaned up.

* Dont reorder imports

* Don't stop requeuing on reconciliation error

Previous code only logged the reconciliation error inside jm.sync() and
didn't return the reconciliation error to it's invoker
processNextWorkItem().

Adding a copy-paste back to avoid this issue.

* Remove copy-pasted cleanupFinishedJobs()

Now we always call jm.cleanupFinishedJobs() first and then
jm.syncCronJob().

We also extract cronJobCopy and updateStatus outside jm.syncCronJob
function and pass pointers to them in both jm.syncCronJob and
jm.cleanupFinishedJobs to make delayed updates handling more explicit
and not dependent on the order in which cleanupFinishedJobs and
syncCronJob are invoked.

* Return updateStatus bool instead of changing the reference

* Explicitly ignore err in tests to fix linter
2023-08-31 08:25:00 -07:00
Gunju Kim
696f84aeb0 Feature-gate SidecarContainers code in pkg/kubelet/kuberuntime 2023-09-01 00:13:47 +09:00
Anish Ramasekar
9e1ff1e512 add loading config and wire feature flag
Signed-off-by: Anish Ramasekar <anish.ramasekar@gmail.com>
2023-08-30 23:14:56 +00:00
Dan Winship
04f0b4a014 Bump CloudDualStackNodeIPs to beta for 1.29 2023-08-30 13:22:39 -04:00
Quan Tian
2b69daa960 Allow specifying ExternalTrafficPolicy for ClusterIP Services with ExternalIPs
When defining a ClusterIP Service, we can specify externalIP, and the
traffic policy of externalIP is subject to externalTrafficPolicy.
However, the policy can't be set when type is not NodePort or
LoadBalancer, and will default to Cluster when kube-proxy processes the
Service.

This commit updates the defaulting and validation of Service to allow
specifying ExternalTrafficPolicy for ClusterIP Services with
ExternalIPs.

Signed-off-by: Quan Tian <qtian@vmware.com>
2023-08-30 23:56:47 +08:00
SataQiu
2825519da2 apf: remove v1alpha1 API 2023-08-30 20:48:42 +08:00
Kubernetes Prow Robot
370c85f5ab Merge pull request #118493 from kerthcet/cleanup/pod-status-reason
Remove reasons from PodConditionType
2023-08-30 01:40:47 -07:00
xuzhenglun
af7d76fa05 Promoted feature gate ServiceNodePortStaticSubrange to stable and locked it to always active 2023-08-30 10:06:46 +08:00
Kubernetes Prow Robot
cd91351dff Merge pull request #117720 from kerthcet/feat/remove-selector-spread
Remove deprecated selectorSpread
2023-08-29 00:25:22 -07:00
Kubernetes Prow Robot
3e910875a7 Merge pull request #120125 from kerthcet/cleanup/write-to-cycle
Make sure skipped score plugins always returned
2023-08-28 15:13:20 -07:00
Kubernetes Prow Robot
9c25ce6f3e Merge pull request #119540 from SataQiu/clean-apiserver-20230724
Remove the deprecated kube-apiserver identity lease garbage collector for k8s.io/component=kube-apiserver
2023-08-28 10:49:42 -07:00
Kubernetes Prow Robot
a3374795e4 Merge pull request #120204 from mimowo/make-onpodconditions-optional
Mark Job onPodConditions as optional in pod failure policy
2023-08-28 09:47:42 -07:00
Kubernetes Prow Robot
029d518970 Merge pull request #117588 from kerthcet/cleanup/use-genericset
Avoid duplicated dots in pod status when preempting
2023-08-28 08:39:44 -07:00
kerthcet
580f83ab4a Avoid duplicated dots in pod condition
Signed-off-by: kerthcet <kerthcet@gmail.com>
2023-08-28 22:36:36 +08:00
kerthcet
855b445d28 Remove deprecated selectorSpread
Signed-off-by: kerthcet <kerthcet@gmail.com>
2023-08-28 22:11:33 +08:00
Kubernetes Prow Robot
faf1b5d655 Merge pull request #114685 from AxeZhan/dynamicresources
dynamic resource allocation: optimize class.SuitableNodes usage
2023-08-28 04:43:43 -07:00
Michal Wozniak
cc784cfe85 Mark Job onPodConditions as optional in pod failure policy 2023-08-28 11:42:56 +02:00
kerthcet
3d583398fe Avoid to build the error msg for twice
Signed-off-by: kerthcet <kerthcet@gmail.com>
2023-08-28 17:13:39 +08:00
Daman Arora
01df59a73b pkg/proxy: remove equal method from endpoint interface
Signed-off-by: Daman Arora <aroradaman@gmail.com>
2023-08-27 18:16:44 +05:30
Kubernetes Prow Robot
0e86fa5115 Merge pull request #118984 from aramase/aramase/c/kep_3331_wiring_flag_with_api
[StructuredAuthenticationConfig] Create struct for authn config and re-wire OIDC flags to use it
2023-08-25 11:52:55 -07:00
Anish Ramasekar
1bad3cbbf5 wiring existing oidc flags with internal API struct
Signed-off-by: Anish Ramasekar <anish.ramasekar@gmail.com>
2023-08-25 17:15:33 +00:00
DrAuYueng
a4ce32769f fix pod delete log in deployment controller
Signed-off-by: DrAuYueng <ouyang1204@gmail.com>
2023-08-25 22:20:51 +08:00
Kubernetes Prow Robot
10c622e99a Merge pull request #119994 from SataQiu/remove-scheduler-v1beta3
scheduler: remove deprecated v1beta3 KubeSchedulerConfiguration component config
2023-08-24 15:31:17 -07:00
Kubernetes Prow Robot
df8bfdf55e Merge pull request #120102 from p0lyn0mial/upstream-storage-etcd-new-params
storage/factory: extend the Create method by newList and resourcePrefix params
2023-08-24 05:22:32 -07:00