Automatic merge from submit-queue
validate third party resources
addresses validation portion of https://github.com/kubernetes/kubernetes/issues/22768
* ThirdPartyResource: validates name (3 segment DNS subdomain) and version names (single segment DNS label)
* ThirdPartyResourceData: validates objectmeta (name is validated as a DNS label)
* removes ability to use GenerateName with thirdpartyresources (kind and api group should not be randomized, in my opinion)
test improvements:
* updates resttest to clean up after create tests (so the same valid object can be used)
* updates resttest to take a name generator (in case "foo1" isn't a valid name for the object under test)
action required for alpha thirdpartyresource users:
* existing thirdpartyresource objects that do not match these validation rules will need to be removed/updated (after removing thirdpartyresourcedata objects stored under the disallowed versions, kind, or group names)
* existing thirdpartyresourcedata objects that do not match the name validation rule will not be able to be updated, but can be removed
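A minimal sketch of the name rules described above, assuming "3 segment DNS subdomain" means at least three dot-separated DNS labels. The helper names `isDNSLabel` and `validateResourceName` are hypothetical and are not the functions added by this PR.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// dnsLabel matches one RFC 1123 DNS label: lowercase alphanumerics and '-',
// starting and ending with an alphanumeric character.
var dnsLabel = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)

// isDNSLabel reports whether s is a valid single-segment DNS label.
func isDNSLabel(s string) bool {
	return len(s) <= 63 && dnsLabel.MatchString(s)
}

// validateResourceName (hypothetical) requires at least three dot-separated
// DNS labels, for example "cron-tab.example.com".
func validateResourceName(name string) error {
	if len(name) > 253 {
		return fmt.Errorf("name %q is longer than 253 characters", name)
	}
	parts := strings.Split(name, ".")
	if len(parts) < 3 {
		return fmt.Errorf("name %q must have at least three segments", name)
	}
	for _, p := range parts {
		if !isDNSLabel(p) {
			return fmt.Errorf("segment %q of %q is not a DNS label", p, name)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateResourceName("cron-tab.example.com"))
	fmt.Println(validateResourceName("foo1"))
	fmt.Println(isDNSLabel("v1"))
}
```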
Automatic merge from submit-queue
Optimize group version allocations
Avoid allocation in strings.Split() for most common cases.
Extracted from #24845
@wojtek-t or @deads2k
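A rough sketch of the idea, not the code from this PR: the common shapes ("v1" and "group/version") can be handled with strings.Count, strings.Index and slicing, none of which allocate, whereas strings.Split always allocates a slice. The type `groupVersion` and the function `parseGroupVersion` are hypothetical stand-ins.

```go
package main

import (
	"fmt"
	"strings"
)

// groupVersion is a hypothetical stand-in for a parsed "group/version" pair.
type groupVersion struct {
	Group   string
	Version string
}

// parseGroupVersion avoids strings.Split: substrings share the backing
// storage of s, so the zero-slash and one-slash cases do not allocate.
func parseGroupVersion(s string) (groupVersion, error) {
	switch strings.Count(s, "/") {
	case 0:
		// Legacy form such as "v1": no group.
		return groupVersion{Version: s}, nil
	case 1:
		i := strings.Index(s, "/")
		return groupVersion{Group: s[:i], Version: s[i+1:]}, nil
	default:
		return groupVersion{}, fmt.Errorf("unexpected GroupVersion string: %q", s)
	}
}

func main() {
	fmt.Println(parseGroupVersion("extensions/v1beta1"))
	fmt.Println(parseGroupVersion("v1"))
}
```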
Automatic merge from submit-queue
Make IsQualifiedName return error strings
Part of the larger validation PR, broken out for easier review and merge.
@lavalamp FYI, but I know you're swamped, too.
Automatic merge from submit-queue
Automatically add node labels beta.kubernetes.io/{os,arch}
Proposal: #17981
As discussed in #22623:
> @davidopp: #9044 says cloud provider but can also cover platform stuff.
Adds a label `beta.kubernetes.io/platform` to `kubelet` that informs about the os/arch it's running on.
Makes it easy to specify `nodeSelectors` for different arches in multi-arch clusters.
```console
$ kubectl get no --show-labels
NAME STATUS AGE LABELS
127.0.0.1 Ready 1m beta.kubernetes.io/platform=linux-amd64,kubernetes.io/hostname=127.0.0.1
$ kubectl describe no
Name: 127.0.0.1
Labels: beta.kubernetes.io/platform=linux-amd64,kubernetes.io/hostname=127.0.0.1
CreationTimestamp: Thu, 31 Mar 2016 20:39:15 +0300
```
@davidopp @vishh @fgrzadkowski @thockin @wojtek-t @ixdy @bgrant0607 @dchen1107 @preillyme
Automatic merge from submit-queue
Add IPv6 address support for pods - does NOT include services
This allows a container to have an IPv6 address only and extracts the address via nsenter and iproute2 or the docker client directly. An IPv6 address is now correctly reported when describing a pod.
@thockin @kubernetes/sig-network
Automatic merge from submit-queue
Resource name constants were incorrect in versioned types.go
The constant field names and actual values for quota requests and limits for cpu and memory were incorrect in the v1 types.go file.
Important to note for reviewer:
* the constant fields here are unused in the project at this time
* the values those unused constant fields mapped to are not actually supported by the project
* there is no backwards compatibility concern, but if/when we look to convert to using versioned clients, we should have the correct constant fields and values.
The correct values were here:
https://github.com/kubernetes/kubernetes/blob/master/pkg/api/types.go#L2213
Automatic merge from submit-queue
Reduce allocations during conversion, enable new UnsafeConvertToVersion path
Cleans up the conversion path to avoid a few unnecessary allocations, then creates a new UnsafeConvertToVersion path that will allow encode/decode to bypass copying the object for performance. In that subsequent PR, ConvertToVersion will start to call Copy() and we will refactor conversions to reuse as much of the existing object as possible.
Also changes the unversioned.ObjectKind signature to not require allocations - speeds up a few common paths.
Automatic merge from submit-queue
WIP v0 NVIDIA GPU support
```release-note
* Alpha support for scheduling pods on machines with NVIDIA GPUs whose kubelets use the `--experimental-nvidia-gpus` flag, using the alpha.kubernetes.io/nvidia-gpu resource
```
Implements part of #24071 for #23587
I am not familiar enough with the scheduler to know what to do with the scores. Mostly punting for now.
Missing items from the implementation plan: limitranger, rkt support, kubectl support, and docs
cc @erictune @davidopp @dchen1107 @vishh @Hui-Zhi @gopinatht
Automatic merge from submit-queue
Add pod condition PodScheduled to detect situation when scheduler tried to schedule a Pod, but failed
Set the `PodScheduled` condition to `ConditionFalse` in `scheduleOne()` if scheduling failed and to `ConditionTrue` in the `/bind` subresource.
Ref #24404
@mml (as it seems to be related to "why pending" effort)
Automatic merge from submit-queue
Move internal types of hpa from pkg/apis/extensions to pkg/apis/autoscaling
ref #21577
@lavalamp could you please review or delegate to someone from CSI team?
@janetkuo could you please take a look into the kubelet changes?
cc @fgrzadkowski @jszczepkowski @mwielgus @kubernetes/autoscaling
Automatic merge from submit-queue
Added JobTemplate, a preliminary step for ScheduledJob and Workflow
@sdminonne as promised, sorry it took this long 😊
@erictune fyi though it does not have to be in for 1.2
Implements part of #24071
I am not familiar enough with the scheduler to know what to do with the scores. Punting for now.
Missing items from the implementation plan: limitranger, rkt support, kubectl support, and user docs
Automatic merge from submit-queue
Add subPath to mount a child dir or file of a volumeMount
Allow users to specify a subPath in Container.volumeMounts so they can use a single volume for many mounts instead of creating many volumes. For instance, a user can now use a single PersistentVolume to store the Mysql database and the document root of an Apache server of a LAMP stack pod by mapping them to different subPaths in this single volume.
Also solves https://github.com/kubernetes/kubernetes/issues/20466.
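As an illustration only, one way a mount with a subPath could be resolved against the volume's directory on the host. The `volumeMount` struct is a trimmed-down hypothetical, `hostPathFor` is not a kubelet function, and the check that rejects paths escaping the volume is an extra safety assumption that this PR description does not mention.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// volumeMount is a hypothetical, trimmed-down mirror of a container volume mount.
type volumeMount struct {
	Name      string
	MountPath string
	SubPath   string
}

// hostPathFor joins the volume's host directory with the mount's SubPath.
// An empty SubPath means "mount the whole volume".
func hostPathFor(volumeDir string, m volumeMount) (string, error) {
	volumeDir = filepath.Clean(volumeDir)
	if m.SubPath == "" {
		return volumeDir, nil
	}
	if filepath.IsAbs(m.SubPath) {
		return "", fmt.Errorf("subPath %q must be relative", m.SubPath)
	}
	p := filepath.Join(volumeDir, m.SubPath)
	// Assumption: refuse sub-paths such as "../etc" that leave the volume.
	if p != volumeDir && !strings.HasPrefix(p, volumeDir+string(filepath.Separator)) {
		return "", fmt.Errorf("subPath %q escapes the volume", m.SubPath)
	}
	return p, nil
}

func main() {
	// One volume backs two mounts in the same pod.
	mounts := []volumeMount{
		{Name: "site-data", MountPath: "/var/lib/mysql", SubPath: "mysql"},
		{Name: "site-data", MountPath: "/var/www/html", SubPath: "html"},
	}
	for _, m := range mounts {
		fmt.Println(hostPathFor("/volumes/site-data", m))
	}
}
```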
Automatic merge from submit-queue
Avoid allocations and a reflect.Call in conversion
reflect.Call is fairly expensive, performing 8 allocations and having to
set up a call stack. Using a fairly straightforward to generate switch
statement, we can bypass that early in conversion (as long as the
function takes responsibility for invocation). We may also be able to
avoid an allocation for the conversion scope, but not positive yet.
```
benchmark old ns/op new ns/op delta
BenchmarkPodConversion-8 14713 12173 -17.26%
benchmark old allocs new allocs delta
BenchmarkPodConversion-8 80 72 -10.00%
benchmark old bytes new bytes delta
BenchmarkPodConversion-8 9133 8712 -4.61%
```
@wojtek-t related to #20309
Automatic merge from submit-queue
Kubelet: Cleanup with new engine api
Finish step 2 of #23563
This PR:
1) Cleans up the go-dockerclient references in the code.
2) Bumps up the engine-api version.
3) Cleans up the code using the new engine-api.
Fixes #24076.
Fixes #23809.
/cc @yujuhong
Automatic merge from submit-queue
API changes for Cascading deletion
This PR includes the necessary API changes to implement cascading deletion with finalizers as proposed in #23656. Comments are welcome.
@lavalamp @derekwaynecarr @bgrant0607 @rata @hongchaodeng
The codec factory should support two distinct interfaces - negotiating
for a serializer with a client, vs reading or writing data to a storage
form (etcd, disk, etc). Make the EncodeForVersion and DecodeToVersion
methods only take Encoder and Decoder, and slight refactoring elsewhere.
In the storage factory, use a content type to control what serializer to
pick, and use the universal deserializer. This ensures that storage can
read JSON (which might be from older objects) while only writing
protobuf. Add exceptions for those resources that may not be able to
write to protobuf (specifically third party resources, but potentially
others in the future).
Automatic merge from submit-queue
Redo Unstructured to have accessor methods
Add accessor methods that implement pkg/api/unversioned.ObjectKind,
pkg/api/meta.Object, pkg/api/meta.Type and pkg/api/meta.List.
Removed the convenience fields since writing to them was not reflected
in serialized JSON.
Having internal and external integer types being different hides
potential conversion problems. Propagate that out further (which will
also allow us to better optimize conversion).
Automatic merge from submit-queue
Make all defaulters public
Will allow for generating direct accessors in conversion code instead of using reflection.
@wojtek-t
Automatic merge from submit-queue
Promote Pod Hostname & Subdomain to fields (were annotations)
Deprecates the podHostName, subdomain and PodHostnames annotations and creates corresponding new fields for them on the PodSpec and Endpoints types.
Annotation doc: #22564
Annotation code: #20688
Automatic merge from submit-queue
Fix use of docker removed ParseRepositoryTag() function
Docker has removed the ParseRepositoryTag() function in
leading to failures using the kubernetes Go client API.
Failure:
```
../k8s.io/kubernetes/pkg/util/parsers/parsers.go:30: undefined: parsers.ParseRepositoryTag
```
Automatic merge from submit-queue
Move internal types of job from pkg/apis/extensions to pkg/apis/batch
This addresses the job part of #23216; it is still WIP. Will notify once finished. I'd like to have it in before starting work on ScheduledJob.
@lavalamp @erictune fyi
Docker has removed the ParseRepositoryTag() function in
leading to failures using the kubernetes Go client API.
Let's use github.com/docker/distribution reference.ParseNamed()
instead.
Failure:
../k8s.io/kubernetes/pkg/util/parsers/parsers.go:30: undefined: parsers.ParseRepositoryTag
Add tests to watch behavior in both protocols (http and websocket)
against all 3 media types. Adopt the
`application/vnd.kubernetes.protobuf;stream=watch` media type for the
content that comes back from a watch call so that it can be
distinguished from a Status result.
Automatic merge from submit-queue
Remove requirement that Endpoints IPs be IPv4
Signed-off-by: André Martins <aanm90@gmail.com>
Release Note: The `Endpoints` API object now allows IPv6 addresses to be stored. Other components of the system are not ready for IPv6 yet, and many cloud providers are not IPv6 compatible, but installations that use their own controller logic can now store v6 endpoints.
Automatic merge from submit-queue
Add watch.Until, a conditional watch mechanism
A more powerful tool than wait.Poll: it allows a watch interface to drive conditionals that react to changes on a resource or resources. Provides a set of standard conditions that are in common use in the code, and updates e2e to use a few of these.
Extracted from #23567
Also add helpers for collecting the events that happen during a watch
and a helper that makes it easy to start a watch from any object with
ObjectMeta.
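A rough sketch of a conditional watch loop, using a plain channel in place of a real watch interface. The `event` type, `conditionFunc`, and `until` are hypothetical simplifications and do not reproduce the watch.Until signature.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// event is a hypothetical, simplified watch event.
type event struct {
	Type   string // for example "ADDED", "MODIFIED", "DELETED"
	Object string
}

// conditionFunc returns true once the condition is satisfied by an event.
type conditionFunc func(e event) (bool, error)

var errTimeout = errors.New("timed out waiting for the condition")

// until consumes events until every condition has been satisfied in order,
// the channel is closed, or the timeout expires. It returns the last event seen.
func until(timeout time.Duration, events <-chan event, conditions ...conditionFunc) (*event, error) {
	timer := time.NewTimer(timeout)
	defer timer.Stop()
	var last *event
	for _, cond := range conditions {
	waitLoop:
		for {
			select {
			case e, ok := <-events:
				if !ok {
					return last, errors.New("watch closed before the condition was met")
				}
				last = &e
				done, err := cond(e)
				if err != nil {
					return last, err
				}
				if done {
					break waitLoop
				}
			case <-timer.C:
				return last, errTimeout
			}
		}
	}
	return last, nil
}

func main() {
	events := make(chan event, 2)
	events <- event{Type: "ADDED", Object: "pod-a"}
	events <- event{Type: "MODIFIED", Object: "pod-a"}
	modified := func(e event) (bool, error) { return e.Type == "MODIFIED", nil }
	last, err := until(time.Second, events, modified)
	fmt.Println(last.Type, err)
}
```

Unlike polling, nothing is re-fetched on a timer here: the loop only wakes when an event arrives or the deadline passes.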
Automatic merge from submit-queue
Implement a streaming serializer for watch
Changeover watch to use streaming serialization. Properly version the
watch objects. Implement simple framing for JSON and Protobuf (but not
YAML).
@wojtek-t @lavalamp
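The description does not spell out the framing format. As an assumption for illustration, a simple framing for a binary payload such as Protobuf could prefix each message with a 4-byte big-endian length; JSON objects are self-delimiting and can be split by a streaming decoder instead. The functions below are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// writeFrame writes a 4-byte big-endian length followed by the payload.
func writeFrame(w io.Writer, payload []byte) error {
	var hdr [4]byte
	binary.BigEndian.PutUint32(hdr[:], uint32(len(payload)))
	if _, err := w.Write(hdr[:]); err != nil {
		return err
	}
	_, err := w.Write(payload)
	return err
}

// readFrame reads one frame. It returns io.EOF when the stream ends cleanly
// between frames and io.ErrUnexpectedEOF when a frame is cut short.
func readFrame(r io.Reader) ([]byte, error) {
	var hdr [4]byte
	if _, err := io.ReadFull(r, hdr[:]); err != nil {
		return nil, err
	}
	payload := make([]byte, binary.BigEndian.Uint32(hdr[:]))
	if _, err := io.ReadFull(r, payload); err != nil {
		return nil, err
	}
	return payload, nil
}

// roundTrip frames each message into a buffer and reads them all back.
func roundTrip(messages ...string) ([]string, error) {
	var buf bytes.Buffer
	for _, m := range messages {
		if err := writeFrame(&buf, []byte(m)); err != nil {
			return nil, err
		}
	}
	var out []string
	for {
		frame, err := readFrame(&buf)
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		out = append(out, string(frame))
	}
}

func main() {
	fmt.Println(roundTrip("event-1", "event-2"))
}
```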
Automatic merge from submit-queue
Additional go vet fixes
Mostly:
- pass lock by value
- bad syntax for struct tag value
- example functions not formatted properly
Automatic merge from submit-queue
relax restmapper resource matching
We were matching case insensitive on Kinds, not Resources, thus driving me insane.
@liggitt @caesarxuchao
Automatic merge from submit-queue
Migrate to the new conversion generator - part1
This PR contains two commits:
- few more fixes to the generator
- migration of the pkg/api/v1 to use the new generator
The second commit is big, but I reviewed the changes and they contain:
- conversions between types that we weren't even generating conversions between
- changes in how we handle maps/pointers/slices - previously we were explicitly referencing fields, now we are using "shadowing in, out" to make the code more generic
- lack of an auto-generated method for ReplicationControllerSpec (because these types are different (*int vs int for Replicas) and a preexisting conversion already exists)
Most of the issues in the first commit (e.g. adding references to "in" and "out" for slices/maps/pointers) were discovered by our tests. So I'm pretty confident that this change is correct now.
Here is a list of changes along with an explanation of how they work:
1. Add a new string field called TargetSelector to the external version of
extensions Scale type (extensions/v1beta1.Scale). This is a serialized
version of either the map-based selector (in case of ReplicationControllers)
or the unversioned.LabelSelector struct (in case of Deployments and
ReplicaSets).
2. Change the selector field in the internal Scale type (extensions.Scale) to
unversioned.LabelSelector.
3. Add conversion functions to convert from two external selector fields to a
single internal selector field. The rules for conversion are as follows:
i. If the target resource that this scale targets supports LabelSelector
(Deployments and ReplicaSets), then serialize the LabelSelector and
store the string in the TargetSelector field in the external version
and leave the map-based Selector field as nil.
ii. If the target resource only supports a map-based selector
(ReplicationControllers), then still serialize that selector and
store the serialized string in the TargetSelector field. Also,
set the Selector map field in the external Scale type.
iii. When converting from external to internal version, parse the
TargetSelector string into LabelSelector struct if the string isn't
empty. If it is empty, then check if the Selector map is set and just
assign that map to the MatchLabels component of the LabelSelector.
iv. When converting from internal to external version, serialize the
LabelSelector and store it in the TargetSelector field. If only
the MatchLabels component is set, then also copy that value to
the Selector map field in the external version.
4. HPA now just converts the LabelSelector field to a Selector interface
type to list the pods.
5. Scale Get and Update etcd methods for Deployments and ReplicaSets now
return extensions.Scale instead of autoscaling.Scale.
6. Consequently, the SubresourceGroupVersion override and the "is autoscaling
enabled" check are now removed from pkg/master/master.go
7. Other small changes to labels package, fuzzer and LabelSelector
helpers to piece this all together.
8. Add unit tests to HPA targeting Deployments and ReplicaSets.
9. Add an e2e test to HPA targeting ReplicaSets.
In podSecurityPolicy:
1. Rename .seLinuxContext to .seLinux
2. Rename .seLinux.type to .seLinux.rule
3. Rename .runAsUser.type to .runAsUser.rule
4. Rename .seLinux.SELinuxOptions
1,2,3 as suggested by thockin in #22159.
I added 3 for consistency with 2.
Had to move other things around too to avoid a weird api ->
cloudprovider dependency.
Also adding fixes per code reviews.
(This is a squash of the previously approved commits)
Added selector generation to Job's
strategy.Validate, right before validation.
Can't do in defaulting since UID is not known.
Added a validation to Job to ensure that the generated
labels and selector are correct when generation was requested.
This happens right after generation, but validation is in a better
place to return an error.
Adds "manualSelector" field to batch/v1 Job to control selector generation.
Adds same field to extensions/__internal. Conversion between those two
is automatic.
Adds "autoSelector" field to extensions/v1beta1 Job. Used for storing batch/v1 Jobs
- Default for v1 is to do generation.
- Default for v1beta1 is to not do it.
- In both cases, unset == false == do the default thing.
Release notes:
Added batch/v1 group, which contains just Job, and which is the next
version of extensions/v1beta1 Job.
The changes from the previous version are:
- Users no longer need to ensure labels on their pod template are unique to the enclosing
job (but may add labels as needed for categorization).
- In v1beta1, job.spec.selector was defaulted from pod labels, with the user responsible for uniqueness.
In v1, a unique label is generated and added to the pod template, and used as the selector (other
labels added by user stay on pod template, but need not be used by selector).
- a new field called "manualSelector" exists to control whether the new behavior is used,
versus a more error-prone but more flexible "manual" (not generated) selector. Most users
will not need to use this field and should leave it unset.
Users who are creating extensions.Job go objects and then posting them using the go client
will see a change in the default behavior. They need to either stop providing a selector (relying on
selector generation) or else specify "spec.manualSelector" until they are ready to do the former.
Delete a job scale test
A subsequent PR is going to remove support
for this anyways.
Initialize extensions before batch and autoscaling
per @lavalamp review suggestion.
Leaving the type fields as comments for reference and reminder. But
deleting the conversion, defaulting and validation code. They can
always be brought back from the previous PR once the types are
introduced. Because builds break without them anyway that serves as a
reminder, so there is no need to leave them commented out.
Most volume plugins use SafeFormatAndMount, which uses ext4 by default.
FlexVolume plugin has FSType attribute 'omitempty', so reflect it in the
description of the type.
Move type LabelSelector and type LabelSelectorRequirement from pkg/apis/extensions
This avoids an import loop when Job (and later DaemonSet, Deployment, ReplicaSet)
are moved out of extensions to new api groups.
Also Move LabelSelectorAsSelector utility from pkg/apis/extensions/ to pkg/api/unversioned/
Also its test.
Also LabelSelectorOp* constants.
Also the pkg/apis/extensions/validation functions ValidateLabelSelectorRequirement and
ValidateLabelSelector move to pkg/api/unversioned
The related type in pkg/apis/extensions/v1beta1/ is staying there. I might move
it in another PR if necessary.
This is useful because it allows defaulting to infer whether a
string or bool value was provided by the user while still having the field
represented as a non-pointer type in the internal representation.
Combine the fields that will be used for content transformation
(content-type, codec, and group version) into a single struct in client,
and then pass that struct into the rest client and request. Set the
content-type when sending requests to the server, and accept the content
type as primary.
Will form the foundation for content-negotiation via the client.
Remove Codec from versionInterfaces in meta (RESTMapper is now agnostic
to codec and serialization). Register api/latest.Codecs as the codec
factory and use latest.Codecs.LegacyCodec(version) as an equivalent to
the previous codec.
It makes more sense for `ValidatePositiveField` and
`ValidatePositiveQuantity` methods to be named `ValidateNonnegativeField`
and `ValidateNonnegativeQuantity` as that is what is truly being
checked. This commit simply updates the method names everywhere they are
used.
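To show why the new name fits, a simplified sketch of such a check follows. The lowercase `validateNonnegativeField` here is a hypothetical stand-in that returns a plain error, not the real method's signature. Zero is accepted, which a check named "positive" would not suggest.

```go
package main

import "fmt"

// validateNonnegativeField (hypothetical, simplified) rejects only values
// below zero; zero itself is valid.
func validateNonnegativeField(value int64, fieldName string) error {
	if value < 0 {
		return fmt.Errorf("%s: invalid value %d: must be non-negative", fieldName, value)
	}
	return nil
}

func main() {
	fmt.Println(validateNonnegativeField(0, "spec.replicas"))
	fmt.Println(validateNonnegativeField(-1, "spec.replicas"))
}
```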
Enforce minimum resource granularity of milli-{core, bytes} for Storage,
Memory, and CPU resource types. For Storage and Memory, milli-bytes are
allowed for backwards compatibility, but the behavior is
undefined (depends on docker implementation).
Replace many of the remaining s.Convert() invocations with direct
execution, and make generated methods public. Removes 10% of the
allocations during decode of a pod and ~20-40% of the total CPU time.
1. Name default scheduler with name `kube-scheduler`
2. The default scheduler only schedules the pods meeting the following condition:
- the pod has no annotation "scheduler.alpha.kubernetes.io/name: <scheduler-name>"
- the pod has annotation "scheduler.alpha.kubernetes.io/name: kube-scheduler"
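The two conditions above can be sketched as a small predicate. The function name `defaultSchedulerHandles` is hypothetical, and treating a present-but-empty annotation as a mismatch is an assumption, since the commit message does not cover that case.

```go
package main

import "fmt"

const (
	schedulerAnnotationKey = "scheduler.alpha.kubernetes.io/name"
	defaultSchedulerName   = "kube-scheduler"
)

// defaultSchedulerHandles reports whether the default scheduler should pick
// up a pod with the given annotations: either the annotation is absent or it
// names kube-scheduler explicitly.
func defaultSchedulerHandles(annotations map[string]string) bool {
	name, ok := annotations[schedulerAnnotationKey]
	if !ok {
		return true
	}
	// Assumption: an empty annotation value does not count as the default.
	return name == defaultSchedulerName
}

func main() {
	fmt.Println(defaultSchedulerHandles(nil))
	fmt.Println(defaultSchedulerHandles(map[string]string{schedulerAnnotationKey: "kube-scheduler"}))
	fmt.Println(defaultSchedulerHandles(map[string]string{schedulerAnnotationKey: "my-scheduler"}))
}
```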
update gofmt
update according to @david's review
run hack/test-integration.sh, hack/test-go.sh and local e2e.test
I took a hard look at error output and played until I was happier. This now
prints JSON for structs in the error, rather than go's format.
Also made the error message easier to read.
Fixed tests.
Treat `nil` Amount as 0 in `resource.Quantity.Add` and
`resource.Quantity.Sub`. Also, allow adding/subtracting resources with
different Formats (since Format has no effect on the underlying value).
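A minimal sketch of the nil-as-zero behavior, using math/big's Int as a stand-in for the real amount type. The `quantity` struct and its helpers are hypothetical and only mimic the shape of resource.Quantity.

```go
package main

import (
	"fmt"
	"math/big"
)

// quantity is a hypothetical stand-in: a nil Amount means zero, and Format
// only affects printing, never the value.
type quantity struct {
	Amount *big.Int
	Format string
}

func newQuantity(v int64, format string) quantity {
	return quantity{Amount: big.NewInt(v), Format: format}
}

// Add adds y to q, treating a nil Amount on either side as zero.
// The Format of y is ignored.
func (q *quantity) Add(y quantity) {
	if y.Amount == nil {
		return
	}
	if q.Amount == nil {
		q.Amount = new(big.Int)
	}
	q.Amount.Add(q.Amount, y.Amount)
}

// Sub subtracts y from q with the same nil handling as Add.
func (q *quantity) Sub(y quantity) {
	if y.Amount == nil {
		return
	}
	if q.Amount == nil {
		q.Amount = new(big.Int)
	}
	q.Amount.Sub(q.Amount, y.Amount)
}

// value returns the amount as an int64, with nil read as zero.
func (q quantity) value() int64 {
	if q.Amount == nil {
		return 0
	}
	return q.Amount.Int64()
}

func main() {
	var total quantity // Amount is nil
	total.Add(newQuantity(5, "DecimalSI"))
	total.Sub(newQuantity(2, "BinarySI")) // different Format, same arithmetic
	fmt.Println(total.value())
}
```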
The pending codec -> conversion split changes the signature of
Encode and Decode to be more complicated. Create a stub helper
with the exact semantics of today and do the simple mechanical
refactor here to reduce the cost of that change.
This enables use of software or hardware transports viz. be2iscsi,
bnx2i, cxgb3i, cxgb4i, qla4xx, iser and ocs. The default transport
(tcp) happens to be called "default".
Use of non-default transports changes the disk path to the following format:
/dev/disk/by-path/pci-<pci_id>-ip-<portal>-iscsi-<iqn>-lun-<lun_id>
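For illustration, the path format above could be assembled as follows. The function `iscsiDiskPath` is hypothetical, all parameters are plain strings because the real types are not given here, and the sample values are made up.

```go
package main

import "fmt"

// iscsiDiskPath builds the by-path device name used with a non-default
// transport, following the format quoted in the commit message.
func iscsiDiskPath(pciID, portal, iqn, lun string) string {
	return fmt.Sprintf("/dev/disk/by-path/pci-%s-ip-%s-iscsi-%s-lun-%s", pciID, portal, iqn, lun)
}

func main() {
	// Made-up sample values.
	fmt.Println(iscsiDiskPath("0000:00:04.0", "10.0.0.5:3260", "iqn.2016-04.com.example:storage", "0"))
}
```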
The original scale function takes around 800ns/op with more
than 10 allocations. It significantly slows down the scheduler
and other components that rely heavily on the resource pkg.
For more information see #18126.
This pull request tries to optimize the scale function. It takes
two approaches:
1. when the value is small, only use normal math ops.
2. when the value is large, use math/big with a buffer pool.
The final result is:
BenchmarkScaledValueSmall-4 20000000 66.9 ns/op 0 B/op 0 allocs/op
BenchmarkScaledValueLarge-4 2000000 711 ns/op 48 B/op 1 allocs/op
I also ran the scheduler benchmark again. It doubles the throughput of
the scheduler for the 1000-node case.
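A sketch of the two-path idea under simplifying assumptions: it only scales down (divides by a power of ten, rounding up), and the buffer pool is a sync.Pool of *big.Int, which is one possible reading of "buffer pool". The names `scaledDown`, `mustBig` and `bigPool` are hypothetical and this is not the code from the pull request.

```go
package main

import (
	"fmt"
	"math/big"
	"sync"
)

var (
	pow10   [19]int64 // 10^0 .. 10^18, all of which fit in an int64
	one     = big.NewInt(1)
	bigPool = sync.Pool{New: func() interface{} { return new(big.Int) }}
)

func init() {
	pow10[0] = 1
	for i := 1; i < len(pow10); i++ {
		pow10[i] = pow10[i-1] * 10
	}
}

// mustBig parses a base-10 integer and panics on bad input (demo helper).
func mustBig(s string) *big.Int {
	v, ok := new(big.Int).SetString(s, 10)
	if !ok {
		panic("invalid integer: " + s)
	}
	return v
}

// scaledDown returns ceil(unscaled / 10^n). The bool is false when n is out
// of range or the result does not fit in an int64.
func scaledDown(unscaled *big.Int, n int) (int64, bool) {
	if n < 0 || n >= len(pow10) {
		return 0, false
	}
	// Small value: plain int64 arithmetic, no allocation.
	if unscaled.IsInt64() {
		v := unscaled.Int64()
		q, r := v/pow10[n], v%pow10[n]
		if r > 0 {
			q++ // truncation rounded a positive value down; round up instead
		}
		return q, true
	}
	// Large value: math/big, with temporaries borrowed from the pool.
	q := bigPool.Get().(*big.Int)
	r := bigPool.Get().(*big.Int)
	d := bigPool.Get().(*big.Int)
	defer bigPool.Put(q)
	defer bigPool.Put(r)
	defer bigPool.Put(d)
	d.SetInt64(pow10[n])
	q.QuoRem(unscaled, d, r) // truncated division, like Go's / and %
	if r.Sign() > 0 {
		q.Add(q, one)
	}
	if !q.IsInt64() {
		return 0, false
	}
	return q.Int64(), true
}

func main() {
	fmt.Println(scaledDown(mustBig("1500"), 3))
	fmt.Println(scaledDown(mustBig("123456789012345678901234"), 6))
}
```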
Before this change we have a mish-mash of ways to pass field names around for
error generation. Sometimes string fieldnames, sometimes .Prefix(), sometimes
neither, often wrong names or not indexed when it should be.
Instead of that mess, this is part one of a couple of commits that will make it
more strongly typed and hopefully encourage correct behavior. At least you
will have to think about field names, which is better than nothing.
It turned out to be really hard to do this incrementally.
All external types that are not int64 are now marked as int32, including
IntOrString. Prober is now int32 (about 68 years should be enough of an initial
probe time for anyone).
Did not change the metadata fields for now.