Automatic merge from submit-queue
Initial work on running windows containers on Kubernetes
<!-- Thanks for sending a pull request! Here are some tips for you:
1. If this is your first time, read our contributor guidelines https://github.com/kubernetes/kubernetes/blob/master/CONTRIBUTING.md and developer guide https://github.com/kubernetes/kubernetes/blob/master/docs/devel/development.md
2. If you want *faster* PR reviews, read how: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/faster_reviews.md
3. Follow the instructions for writing a release note: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/pull-requests.md#release-notes
-->
This is the first stab at getting the Kubelet running on Windows (fixes#30279), and getting it to deploy network-accessible pods that consist of Windows containers. Thanks @csrwng, @jbhurat for helping out.
The main challenge with Windows containers at this point is that container networking is not supported. In other words, each container in the pod will get it's own IP address. For this reason, we had to make a couple of changes to the kubelet when it comes to setting the pod's IP in the Pod Status. Instead of using the infra-container's IP, we use the IP address of the first container.
Other approaches we investigated involved "disabling" the infra container, either conditionally on `runtime.GOOS` or having a separate windows-docker container runtime that re-implemented some of the methods (would require some refactoring to avoid maintainability nightmare).
Other changes:
- The default docker endpoint was removed. This results in the docker client using the default for the specific underlying OS.
More detailed documentation on how to setup the Windows kubelet can be found at https://docs.google.com/document/d/1IjwqpwuRdwcuWXuPSxP-uIz0eoJNfAJ9MWwfY20uH3Q.
cc: @ikester @brendandburns @jstarks
Mark NodeLegacyHostIP will be deprecated in 1.7;
Let cloudprovider that used to only set NodeLegacyHostIP set the IP as both InternalIP and ExternalIP, to allow dprecation in 1.7
Generating self links, especially for lists, is inefficient. Replace
use of net.URL.String() with direct encoding that reduces number of
allocations. Switch from calling meta.ExtractList|SetList to a function
that iterates over each object in the list.
In steady state for nodes performing frequently small get/list
operations, and for larger LISTs significantly reduces CPU and
allocations.
Automatic merge from submit-queue
Avoid double decoding all client responses
Fixes#35982
The linked issue uncovered that we were always double decoding the response in restclient for get, list, update, create, and patch. That's fairly expensive, most especially for list. This PR refines the behavior of the rest client to avoid double decoding, and does so while minimizing the changes to rest client consumers.
restclient must be able to deal with multiple types of servers. Alter the behavior of restclient.Result#Raw() to not process the body on error, but instead to return the generic error (which still matches the error checking cases in api/error like IsBadRequest). If the caller uses
.Error(), .Into(), or .Get(), try decoding the body as a Status.
For older servers, continue to default apiVersion "v1" when calling restclient.Result#Error(). This was only for 1.1 servers and the extensions group, which we have since fixed.
This removes a double decode of very large objects (like LIST) - we were trying to DecodeInto status, but that ends up decoding the entire result and then throwing it away. This makes the decode behavior specific to the type of action the user wants.
```release-note
The error handling behavior of `pkg/client/restclient.Result` has changed. Calls to `Result.Raw()` will no longer parse the body, although they will still return errors that react to `pkg/api/errors.Is*()` as in previous releases. Callers of `Get()` and `Into()` will continue to receive errors that are parsed from the body if the kind and apiVersion of the body match the `Status` object.
This more closely aligns rest client as a generic RESTful client, while preserving the special Kube API extended error handling for the `Get` and `Into` methods (which most Kube clients use).
```
Automatic merge from submit-queue
Simplify negotiation in server in preparation for multi version support
This is a pre-factor for #33900 to simplify runtime.NegotiatedSerializer, tighten up a few abstractions that may break when clients can request different client versions, and pave the way for better negotiation.
View this as pure simplification.
- Prevents kubelet from overwriting capacity during sync.
- Handles opaque integer resources in the scheduler.
- Adds scheduler predicate tests for opaque resources.
- Validates opaque int resources:
- Ensures supplied opaque int quantities in node capacity,
node allocatable, pod request and pod limit are integers.
- Adds tests for new validation logic (node update and pod spec).
- Added e2e tests for opaque integer resources.
Alter how runtime.SerializeInfo is represented to simplify negotiation
and reduce the need to allocate during negotiation. Simplify the dynamic
client's logic around negotiating type. Add more tests for media type
handling where necessary.
Automatic merge from submit-queue
Adding cascading deletion support to federated namespaces
Ref https://github.com/kubernetes/kubernetes/issues/33612
With this change, whenever a federated namespace is deleted with `DeleteOptions.OrphanDependents = false`, then federation namespace controller first deletes the corresponding namespaces from all underlying clusters before deleting the federated namespace.
cc @kubernetes/sig-cluster-federation @caesarxuchao
```release-note
Adding support for DeleteOptions.OrphanDependents for federated namespaces. Setting it to false while deleting a federated namespace also deletes the corresponding namespace from all registered clusters.
```
Automatic merge from submit-queue
Make "Unschedulable" reason a constant in api
String "Unschedulable" is used in couple places in K8S:
* scheduler
* federation replicaset and deployment controllers
* cluster autoscaler
* rescheduler
This PR makes the string a part of API so it not changed.
cc: @davidopp @fgrzadkowski @wojtek-t
Automatic merge from submit-queue
Deny service ClusterIP update from `None`
**What this PR does / why we need it**: Headless service should not be transformed into a service with ClusterIP, therefore update of this field if it's set to `None` is disallowed.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#33029
**Release note**:
```release-note
Changing service ClusterIP from `None` is not allowed anymore.
```
Automatic merge from submit-queue
Remove last probe time from replica sets
While experimenting with Deployment conditions I found out that if we are going to use lastProbeTime as we are supposed to be using it then we hotloop between updates (see https://github.com/kubernetes/kubernetes/pull/19343#issuecomment-255096778 for more info)
cc: @smarterclayton @soltysh
Automatic merge from submit-queue
Optimize label selector
The number of values for a given label is generally pretty small (in huge majority of cases it is exactly one value).
Currently computing selectors is up to 50% of CPU usage in both apiserver and scheduler.
Changing the structure in which those values are stored from map to slice improves the performance of typical usecase for computing selectors.
Early results:
- scheduler throughput it ~15% higher
- apiserver cpu-usage is also lower (seems to be also ~10-15%)
Automatic merge from submit-queue
Loadbalanced client src ip preservation enters beta
Sounds like we're going to try out the proposal (https://github.com/kubernetes/kubernetes/issues/30819#issuecomment-249877334) for annotations -> fields on just one feature in 1.5 (scheduler). Or do we want to just convert to fields right now?
Automatic merge from submit-queue
requests.storage is a standard resource name
The value `requests.storage` is a valid standard resource name but was omitted from the standard list.
Automatic merge from submit-queue
Add validation that detects repeated keys in the labels and annotations maps
Fixes#2965 (a nearly 2 year old feature request!)
@kubernetes/kubectl
@eparis
Automatic merge from submit-queue
Adding default StorageClass annotation printout for resource_printer and describer and some refactoring
adding ISDEFAULT for _kubectl get storageclass_ output
```
[root@screeley-sc1 gce]# kubectl get storageclass
NAME TYPE ISDEFAULT
another-class kubernetes.io/gce-pd NO
generic1-slow kubernetes.io/gce-pd YES
generic2-fast kubernetes.io/gce-pd YES
```
```release-note
Add ISDEFAULT to kubectl get storageClass output
```
@kubernetes/sig-storage
Automatic merge from submit-queue
Split conversion and defaulting
Separate conversion and defaulting. Defaulting occurs mixed with conversion today - change the server so that the `VersioningCodec` performs defaulting on the external type during decoding.
* Add a new method to `Scheme` - `func (*runtime.Scheme) Default(runtime.Object)` - that takes an object and performs defaulting.
* Call `Default` during decoding and at static initialization time
* Use the new `defaulter-gen` to generate top level object defaulters (`v1.Pod`) at build time for any type that needs to perform defaulting.
* Add tests and alter the existing code to adapt as necessary
* Fix a few bugs in conversions that depended on defaulting behavior
---
Step 1 of decoupling conversion and defaulting. The generator will assist in creating top level defaulters that in a single method invoke all nested defaulters, preventing the need to recurse via reflection or conversion. These top level defaulters will be registered in the scheme and invoked instead of the nested recursion path. This will set the stage for a future generator, capable of creating defaulters from embedded struct tags on external types. However, we must gradually switch these over.
The immediate goal here is to split defaulting and conversion so that the unsafe convertor can be used to maximum potential (we would be able to use direct memory conversion for any identical nested struct, even those that must be defaulted).
The generator uses `k8s:defaulter-gen=TypeMeta` on most public packages to flag any top level type that has defaulters to get a `SetObjectDefaults_NAME` function created (types that don't have defaulters won't have functions). This also creates a `RegisterDefaults` method that applies a default to an interface{} and returns true if the object was handled. Existing defaults are left as is.
Add a test to verify old and new path generate the same outcomes. Defaulter will move to gengo before this is merged, and subsequent PRs will remove defaulting during conversion and have the VersioningCodec apply defaults.
Most normal codec use should perform defaulting. DirectCodecs should not
perform defaulting. Update the defaulting_test to fuzz the list of known
defaulters. Use the new versioning.NewDefaultingCodec() method.
Automatic merge from submit-queue
Add PSP support for seccomp profiles
Seccomp support for PSP. There are still a couple of TODOs that need to be fixed but this is passing tests.
One thing of note, since seccomp is all being stored in annotations right now it breaks some of the assumptions we've stated for the provider in terms of mutating the passed in pod. I've put big warning comments around the pieces that do that to make sure it's clear and covered the rollback in admission if the policy fails to validate.
@sttts @pmorie @erictune @smarterclayton @liggitt
Automatic merge from submit-queue
Add unit test for bad ReclaimPolicy and valid ReclaimPolicy in /pkg/api/validation
unit tests for validation.go regarding PersistentVolumeReclaimPolicy (bad value and good value)
see PR: #30304
Automatic merge from submit-queue
Add PVC storage to LimitRange
This PR adds the ability to add a LimitRange to a namespace that enforces min/max on `pvc.Spec.Resources.Requests["storage"]`.
@derekwaynecarr @abhgupta @kubernetes/sig-storage
Examples forthcoming.
```release-note
pvc.Spec.Resources.Requests min and max can be enforced with a LimitRange of type "PersistentVolumeClaim" in the namespace
```
Automatic merge from submit-queue
Match GroupVersionKind against specific version
Currently when multiple GVK match a specific kind in `KindForGroupVersionKinds` only the first will be matched, which not necessarily will be the correct one. I'm proposing to extend this to pick the best match, instead.
Here's my problematic use-case, of course it involves ScheduledJobs 😉:
I have a `GroupVersions` with `batch/v1` and `batch/v2alpha1` in that order. I'm calling `KindForGroupVersionKinds` with kind `batch/v2alpha1 ScheduledJob` and that currently results this matching first `GroupVersion`, instead of picking more concrete one. There's a [clear description](ee77d4e6ca/pkg/api/unversioned/group_version.go (L183)) why it is on single `GroupVersion`, but `GroupVersions` should pick this more carefully.
@deads2k this is your baby, wdyt?
Contination of #1111
I tried to keep this PR down to just a simple search-n-replace to keep
things simple. I may have gone too far in some spots but its easy to
roll those back if needed.
I avoided renaming `contrib/mesos/pkg/minion` because there's already
a `contrib/mesos/pkg/node` dir and fixing that will require a bit of work
due to a circular import chain that pops up. So I'm saving that for a
follow-on PR.
I rolled back some of this from a previous commit because it just got
to big/messy. Will follow up with additional PRs
Signed-off-by: Doug Davis <dug@us.ibm.com>
Automatic merge from submit-queue
Dynamic provisioning for flocker volume plugin
Refactor flocker volume plugin
* [x] Support provisioning beta (#29006)
* [x] Support deletion
* [x] Use bind mounts instead of /flocker in containers
* [x] support ownership management or SELinux relabeling.
* [x] adds volume specification via datasetUUID (this is guranted to be unique)
I based my refactor work to replicate pretty much GCE-PD behaviour
**Related issues**: #29006#26908
@jsafrane @mattbates @wallrj @wallnerryan
Automatic merge from submit-queue
Revert "Work around the etcd watch issue"
Reverts kubernetes/kubernetes#33101
Since #33393 is merged, the bug should have been fixed.