Automatic merge from submit-queue (batch tested with PRs 38354, 38371)
Add GetOptions parameter to Get() calls in client library
Ref #37473
This PR is super mechanical - the non-trivial commits are:
- Update client generator
- Register GetOptions in batch/v2alpha1 group
Automatic merge from submit-queue (batch tested with PRs 37328, 38102, 37261, 31321, 38146)
Make thirdparty codec able to decode DeleteOptions
Fixes #37278.
Without this PR, the gvk sent to the delegated codec will be the thirdparty one, which is not recognized by the delegated codec (usually api.Codecs).
The RESTMapping method can now rely on RESTMappings by passing the versions parameter and taking the first match returned by RESTMappings. In addition,
a unit test that exercises the new method has been added.
The only change in logic from before is that, when calling RESTMapping,
we search all defaultGroupVersions (as opposed to just one) when no mapping was found for the provided versions.
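The fallback behavior can be pictured with a small, self-contained Go sketch (not the actual RESTMapper code; the types and names are simplified stand-ins): prefer the caller-supplied versions, and only then scan every default group version.
```go
package main

import (
	"errors"
	"fmt"
)

type mapping struct{ GroupVersion, Kind string }

// restMapping picks the first mapping whose version appears in the caller's
// preferred list; if nothing matches (or no versions were given) it falls
// back to scanning every default group version instead of only the first one.
func restMapping(all []mapping, preferred, defaults []string) (mapping, error) {
	for _, v := range preferred {
		for _, m := range all {
			if m.GroupVersion == v {
				return m, nil
			}
		}
	}
	for _, v := range defaults {
		for _, m := range all {
			if m.GroupVersion == v {
				return m, nil
			}
		}
	}
	return mapping{}, errors.New("no mapping found")
}

func main() {
	all := []mapping{{"batch/v1", "Job"}, {"batch/v2alpha1", "Job"}}
	m, err := restMapping(all, []string{"batch/v2alpha1"}, []string{"batch/v1"})
	fmt.Println(m, err)
}
```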
Automatic merge from submit-queue (batch tested with PRs 35300, 36709, 37643, 37813, 37697)
[etcd] test cleanup: remove unnecessary AddPrefix()
What?
Remove etcdtest.AddPrefix() in tests. They will be automatically prepended in etcd storage.
Why?
ref: #36290, #36374
After that change, keeping AddPrefix() would prepend the prefix twice.
Automatic merge from submit-queue
Reuse fields and labels
This should significantly reduce memory allocations in apiserver in large cluster.
Explanation:
- every kubelet refreshes its watch every 5-10 minutes (this generally does not cause a relist - it just renews the watch)
- that means, in a 5000-node cluster, we are issuing ~10 watches per second
- since we don't have "watch heartbeats", the watch is issued from the previously received resourceVersion
- to make some assumptions, let's assume pods are evenly spread across nodes, and writes to them are evenly spread - that means a given kubelet is interested in 1 of every 5000 pod changes
- with that assumption, each watch has to process 2500 (on average) previous watch events
- for each such event, we are currently computing fields.
This PR fixes that problem.
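A rough sketch of the idea in plain Go (field names and types are illustrative stand-ins, not the real apiserver ones): compute the selectable fields once per stored watch event and let every watcher reuse the cached result.
```go
package main

import "fmt"

// Fields is a simplified stand-in for the fields.Set used for selection.
type Fields map[string]string

// Pod is a toy object; only the fields needed for selection are shown.
type Pod struct {
	Name, Namespace, NodeName string
}

// event caches the selectable fields alongside the object, so that the
// (potentially expensive) conversion runs once per write instead of once
// per watcher that inspects the event.
type event struct {
	pod    *Pod
	fields Fields // computed once, shared read-only by all watchers
}

func newEvent(p *Pod) event {
	return event{pod: p, fields: Fields{
		"metadata.name":      p.Name,
		"metadata.namespace": p.Namespace,
		"spec.nodeName":      p.NodeName,
	}}
}

// matches is what each watcher would call; it only reads the cached fields.
func (e event) matches(key, value string) bool {
	return e.fields[key] == value
}

func main() {
	ev := newEvent(&Pod{Name: "web-1", Namespace: "default", NodeName: "node-42"})
	fmt.Println(ev.matches("spec.nodeName", "node-42")) // true
}
```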
Automatic merge from submit-queue
switch bootstrap controller to use a client where possible
While looking at https://github.com/kubernetes/kubernetes/issues/37040, I found more places where we can use a normal client instead of a direct to etcd connection.
@wojtek-t you made similar changes in the same controller.
Automatic merge from submit-queue
Fix logic error in graceful deletion
If a resource has the following criteria:
1. deletion timestamp is not nil
2. deletion grace period seconds as persisted in storage is 0
the resource could never be deleted, because we always returned "pending graceful deletion".
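A minimal sketch of the corrected condition, with simplified field names standing in for the real ObjectMeta fields:
```go
package main

import (
	"fmt"
	"time"
)

type meta struct {
	DeletionTimestamp          *time.Time
	DeletionGracePeriodSeconds *int64
}

// shouldDeleteImmediately mirrors the corrected check: an object that is
// already marked for deletion with a zero grace period must be removed now,
// rather than being reported as "pending graceful deletion" forever.
func shouldDeleteImmediately(m meta) bool {
	return m.DeletionTimestamp != nil &&
		m.DeletionGracePeriodSeconds != nil &&
		*m.DeletionGracePeriodSeconds == 0
}

func main() {
	now := time.Now()
	zero := int64(0)
	fmt.Println(shouldDeleteImmediately(meta{DeletionTimestamp: &now, DeletionGracePeriodSeconds: &zero})) // true
}
```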
Automatic merge from submit-queue
Rename ScheduledJobs to CronJobs
I went with @smarterclayton's idea of registering named types in the scheme. This way we can support both the new (CronJobs) and old (ScheduledJobs) resource names. Fixes #32150.
fyi @erictune @caesarxuchao @janetkuo
Not ready yet, but getting close there...
**Release note**:
```release-note
Rename ScheduledJobs to CronJobs.
```
Automatic merge from submit-queue
Adding more e2e tests for federated namespace cascading deletion and fixing bugs
Ref https://github.com/kubernetes/kubernetes/issues/33612
Adding more e2e tests for testing cascading deletion of federated namespace.
New tests now verify that cascading deletion happens when DeleteOptions.OrphanDependents=false and does not happen when DeleteOptions.OrphanDependents=true.
Also updated the deletion helper to always add the OrphanFinalizer; the generic registry will remove it if DeleteOptions.OrphanDependents=false. Also updated the namespace registry to do the same.
We need to add the orphan finalizer to keep the orphan-by-default behavior. We assume that the object's dependents are going to be orphaned and hence add that finalizer. If users do not want the orphan behavior, they can say so via DeleteOptions, and the registry will then remove that finalizer.
cc @kubernetes/sig-cluster-federation @caesarxuchao @derekwaynecarr
Automatic merge from submit-queue
Handle redirects in apiserver proxy handler
Overview:
1. Peek at the HTTP response from the proxied backend
2. If it is a redirect response (302/3), redo the request to the redirect location
3. If it's not a redirect, forward the response to the client and then set up the proxy as before
This change is required for implementing streaming requests in the Container Runtime Interface (CRI). See [design](https://docs.google.com/document/d/1OE_QoInPlVCK9rMAx9aybRmgFiVjHpJCHI9LrfdNM_s/edit).
For https://github.com/kubernetes/kubernetes/issues/29579
/cc @yujuhong
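A hedged sketch of the peek-and-follow idea using only the standard library (the kubelet-style URL below is a placeholder; the real handler works on the proxied response it already has rather than issuing its own GET):
```go
package main

import (
	"fmt"
	"net/http"
)

// doWithOneRedirect issues a GET, and if the backend answers with a redirect,
// re-issues the request once against the Location header instead of passing
// the redirect back to the client.
func doWithOneRedirect(client *http.Client, url string) (*http.Response, error) {
	resp, err := client.Get(url)
	if err != nil {
		return nil, err
	}
	if resp.StatusCode == http.StatusFound || resp.StatusCode == http.StatusSeeOther {
		loc, err := resp.Location()
		resp.Body.Close()
		if err != nil {
			return nil, err
		}
		return client.Get(loc.String())
	}
	return resp, nil
}

func main() {
	// Disable automatic redirect following so we can peek at the response
	// ourselves, mirroring the behavior described above.
	client := &http.Client{CheckRedirect: func(*http.Request, []*http.Request) error {
		return http.ErrUseLastResponse
	}}
	resp, err := doWithOneRedirect(client, "http://127.0.0.1:10250/containerLogs/default/web/app")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```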
Automatic merge from submit-queue
Remove non-generic options from genericapiserver.Config
Remove non-generic options from genericapiserver.Config. Changes the discovery CIDR/IP information to an interface and then demotes several fields.
I haven't pulled them out into genericapiserver.Options, but that's a future option we have - segregation as a follow-up at the very least.
Alter how runtime.SerializeInfo is represented to simplify negotiation
and reduce the need to allocate during negotiation. Simplify the dynamic
client's logic around negotiating type. Add more tests for media type
handling where necessary.
Automatic merge from submit-queue
Deny service ClusterIP update from `None`
**What this PR does / why we need it**: Headless service should not be transformed into a service with ClusterIP, therefore update of this field if it's set to `None` is disallowed.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #33029
**Release note**:
```release-note
Changing service ClusterIP from `None` is not allowed anymore.
```
Automatic merge from submit-queue
Support resourceVersion in GetToList - unify interface of List and Ge…
This pretty much unifies the interface of List() and GetToList() methods of storage interface.
I'm going to use it in a subsequent PR to improve performance of the whole cluster.
Automatic merge from submit-queue
Optimize label selector
The number of values for a given label is generally pretty small (in huge majority of cases it is exactly one value).
Currently computing selectors is up to 50% of CPU usage in both apiserver and scheduler.
Changing the structure in which those values are stored from map to slice improves the performance of typical usecase for computing selectors.
Early results:
- scheduler throughput is ~15% higher
- apiserver cpu-usage is also lower (seems to be also ~10-15%)
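A toy illustration of the data-structure change (not the real labels package): storing the handful of values in a small slice avoids the per-requirement map allocation, and a linear scan over one or two entries is cheap.
```go
package main

import "fmt"

// Requirement stores the (usually single) value in a slice rather than a
// map, which avoids a map allocation per requirement in the common case.
type Requirement struct {
	Key      string
	Operator string   // "=", "in", ... (equality-style matching shown here)
	Values   []string // tiny slice; linear scan beats a map at this size
}

func (r Requirement) Matches(labels map[string]string) bool {
	v, ok := labels[r.Key]
	if !ok {
		return false
	}
	for _, want := range r.Values {
		if v == want {
			return true
		}
	}
	return false
}

func main() {
	sel := []Requirement{{Key: "app", Operator: "=", Values: []string{"nginx"}}}
	pod := map[string]string{"app": "nginx", "tier": "frontend"}
	match := true
	for _, r := range sel {
		match = match && r.Matches(pod)
	}
	fmt.Println(match) // true
}
```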
Automatic merge from submit-queue
Loadbalanced client src ip preservation enters beta
Sounds like we're going to try out the proposal (https://github.com/kubernetes/kubernetes/issues/30819#issuecomment-249877334) for annotations -> fields on just one feature in 1.5 (scheduler). Or do we want to just convert to fields right now?
Automatic merge from submit-queue
Remove static kubelet client, refactor ConnectionInfoGetter
Follow up to https://github.com/kubernetes/kubernetes/pull/33718
* Collapses the multi-valued return to a `ConnectionInfo` struct
* Removes the "raw" connection info method and interface, since it was only used in a single non-test location (by the "real" connection info method)
* Disentangles the node REST object from being a ConnectionInfoProvider itself by extracting an implementation of ConnectionInfoProvider that takes a node (using a provided NodeGetter) and determines ConnectionInfo
* Plumbs the KubeletClientConfig to the point where we construct the helper object that combines the config and the node lookup. I anticipate adding a preference order for choosing an address type in https://github.com/kubernetes/kubernetes/pull/34259
Automatic merge from submit-queue
convert TPR controller to posthook instead of disable flag
Converts the third party resource controller into a post-start hook using a loopback client instead of going directly to etcd. This lets us eliminate more flags and special-casing during initialization. Also, using a client brings us closer to building this without side effects for downstream composers.
Automatic merge from submit-queue
Avoid unnecessary allocation
This is supposed to avoid unnecessary memory allocations.
PodToSelectableFields seems to be the biggest contributor to memory allocations:
```
Showing top 10 nodes out of 247 (cum >= 83166442)
flat flat% sum% cum cum%
1796823715 31.09% 31.09% 1796823715 31.09% k8s.io/kubernetes/pkg/registry/core/pod.PodToSelectableFields
530856268 9.19% 40.28% 530856268 9.19% k8s.io/kubernetes/pkg/storage.NamespaceKeyFunc
241505351 4.18% 44.46% 241505351 4.18% reflect.unsafe_New
...
```
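One way to picture the fix (illustrative only; the real function builds a fields.Set): construct the selectable-fields map in a single step with an exact capacity hint instead of allocating and merging several intermediate maps.
```go
package main

import "fmt"

type Pod struct {
	Name, Namespace, NodeName, Phase string
}

// podToSelectableFields builds the field set in one go with an exact size
// hint, instead of creating several intermediate maps and merging them,
// which is where the bulk of the allocations came from.
func podToSelectableFields(p *Pod) map[string]string {
	fields := make(map[string]string, 4) // exact capacity: no rehashing, no spare buckets
	fields["metadata.name"] = p.Name
	fields["metadata.namespace"] = p.Namespace
	fields["spec.nodeName"] = p.NodeName
	fields["status.phase"] = p.Phase
	return fields
}

func main() {
	fmt.Println(podToSelectableFields(&Pod{Name: "web-1", Namespace: "default"}))
}
```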
Automatic merge from submit-queue
Run rbac authorizer from cache
RBAC authorization can be run very effectively out of a cache. The cache is a normal reflector backed cache (shared informer).
I've split this into three parts:
1. slim down the authorizer interfaces
1. boilerplate for adding rbac shared informers and associated listers which conform to the new interfaces
1. wiring
@liggitt @ericchiang @kubernetes/sig-auth
Automatic merge from submit-queue
Add default cluster role bindings
Add default cluster roles bindings to rbac bootstrapping. Also adds a case for allowing escalation when you have no authenticator.
@liggitt I expect you may need to make peace with this.
Automatic merge from submit-queue
Use nodeutil.GetHostIP consistently when talking to nodes
Most of our communications from apiserver -> nodes used
nodeutil.GetNodeHostIP, but a few places didn't - and this meant that the
node name needed to be resolvable _and_ we needed to populate valid IP
addresses.
```release-note
The apiserver now uses addresses reported by the kubelet in the Node object's status for apiserver->kubelet communications, rather than the name of the Node object. The address type used defaults to `InternalIP`, `ExternalIP`, and `LegacyHostIP` address types, in that order.
```
Automatic merge from submit-queue
remove testapi.Default.GroupVersion
I'm going to try to take this as a series of mechanicals. This removes `testapi.Default.GroupVersion()` and replaces it with `registered.GroupOrDie(api.GroupName).GroupVersion`.
@caesarxuchao I'm trying to see how much of `pkg/api/testapi` I can remove.
Automatic merge from submit-queue
fix pod eviction storage
Refactor pod eviction storage to remove the tight order coupling of the storage. This also gets us ready to deal with cases where API groups are not co-located on the same server, though the particular client being used would assume a proxy.
Automatic merge from submit-queue
Refactor: separate KubeletClient & ConnectionInfoGetter concepts
KubeletClient implements ConnectionInfoGetter, but it is not a complete
implementation: it does not set the kubelet port from the node record,
for example.
By renaming the method so that it does not implement the interface, we
are able to cleanly see where the "raw" GetConnectionInfo is used (it is
correct) and also have go type-checking enforce this for us.
This is related to #25532; I wanted to satisfy myself that what we were doing there was correct, and I wanted also to ensure that the compiler could enforce this going forwards.
This patch adds the option to set a nodeport when creating a NodePort
service. In case of a port allocation error due to a specified port
being out of the valid range, the error now includes the valid
range. If a `--node-port` value is not specified, it defaults to zero, in
which case the allocator will default to its current behavior of
assigning an available port.
This patch also adds a new helper function in `cmd/util/helpers.go` to
retrieve `Int32` cobra flags.
**Example**
```
$ kubectl create service nodeport mynodeport --tcp=8080:7777 --node-port=1
The Service "mynodeport" is invalid: spec.ports[0].nodePort: Invalid
value: 1: provided port is not in the valid range. Valid ports range
from 30000-32767
$ kubectl create service nodeport mynodeport --tcp=8080:7777 --node-port=30000
service "mynodeport" created
$ oc describe service mynodeport
Name: mynodeport
Namespace: default
Labels: app=mynodeport
Selector: app=mynodeport
Type: NodePort
IP: 172.30.81.254
Port: 8080-7777 8080/TCP
NodePort: 8080-7777 30000/TCP
Endpoints: <none>
Session Affinity: None
No events.
```
Most of our communications from apiserver -> nodes used
nodeutil.GetNodeHostIP, but a few places didn't - and this
meant that the node name needed to be resolvable _and_ we needed
to populate valid IP addresses.
Fix the last few places that used the NodeName.
Issue #18525
Issue #9451
Issue #9728
Issue #17643
Issue #11543
Issue #22063
Issue #2462
Issue #22109
Issue #22770
Issue #32286
KubeletClient implements ConnectionInfoGetter, but it is not a complete
implementation: it does not set the kubelet port from the node record,
for example.
By renaming the method so that it does not implement the interface, we
are able to cleanly see where the "raw" GetConnectionInfo is used (it is
correct) and also have go type-checking enforce this for us.
Continuation of #1111
I tried to keep this PR down to just a simple search-and-replace to keep
things simple. I may have gone too far in some spots but it's easy to
roll those back if needed.
I avoided renaming `contrib/mesos/pkg/minion` because there's already
a `contrib/mesos/pkg/node` dir and fixing that will require a bit of work
due to a circular import chain that pops up. So I'm saving that for a
follow-on PR.
I rolled back some of this from a previous commit because it just got
too big/messy. Will follow up with additional PRs.
Signed-off-by: Doug Davis <dug@us.ibm.com>
Automatic merge from submit-queue
Make sure finalizers prevent deletion on storage that supports graceful deletion
Fixing bug:
Non-empty Finalizers fail to prevent a pod from being deleted if deleteOptions.GracePeriodSeconds=0. See https://github.com/kubernetes/kubernetes/issues/32157#issuecomment-245778483
We didn't hit any issue with orphan finalizer because all our tests set finalizers on RC or RS, whose storage doesn't support graceful deletion.
cc @thockin @lavalamp
Automatic merge from submit-queue
api storage: Decouple Decorator from Filter
Continue #28249
What?
This PR decouples Decorator from Filter, i.e. removes Decorator from createFilter().
- For List, Decorator is called on the returned list object.
- For Watch, we implement a new watcher to pipe through the decorator. Errors are returned as watch events.
Why?
- We want to change filter to a SelectionPredicate struct, but Decorator is designed to be coupled with filtering.
- Per the discussion in #28249, decorator shouldn't be coupled to filter, and errors from Decorator should be returned instead of being treated as a failed filter match.
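A small Go sketch of the decoupled behavior, using a plain channel in place of the real watch interface: the decorator is applied per event, and a decoration failure becomes an error event rather than a silently dropped object.
```go
package main

import "fmt"

type Event struct {
	Type   string // "ADDED", "MODIFIED", "ERROR", ...
	Object interface{}
}

// decoratedChan wraps a result channel and applies decorate to each object.
// If decoration fails, the error is surfaced as a watch event instead of
// being treated as the object failing the filter.
func decoratedChan(in <-chan Event, decorate func(interface{}) (interface{}, error)) <-chan Event {
	out := make(chan Event)
	go func() {
		defer close(out)
		for ev := range in {
			obj, err := decorate(ev.Object)
			if err != nil {
				out <- Event{Type: "ERROR", Object: err}
				continue
			}
			ev.Object = obj
			out <- ev
		}
	}()
	return out
}

func main() {
	in := make(chan Event, 1)
	in <- Event{Type: "ADDED", Object: "pod/web-1"}
	close(in)
	for ev := range decoratedChan(in, func(o interface{}) (interface{}, error) { return o, nil }) {
		fmt.Println(ev.Type, ev.Object)
	}
}
```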
Automatic merge from submit-queue
add selfsubjectaccessreview API
Exposes the REST API for self subject access reviews. This allows a user to see whether or not they can perform a particular action.
@kubernetes/sig-auth
Automatic merge from submit-queue
Split path validation into a separate library
This PR splits path segment validation into its own package. This cuts off one of the restclient's dependency paths to some docker packages, and completely eliminates its dependency on go-restful swagger validation.
cc @kubernetes/sig-api-machinery
Automatic merge from submit-queue
Allow services which use same port, different protocol to use the same nodePort for both
fixes #20092
@thockin @smarterclayton ptal.
Automatic merge from submit-queue
return destroy func to clean up internal resources of storage
What?
Provide a destroy func to clean up internal resources of storage.
It changes **unit tests** to clean up resources. (Maybe fix integration test in another PR.)
Why?
Although the apiserver is designed to be long-running, there are some cases where it's not.
See https://github.com/kubernetes/kubernetes/issues/31262#issuecomment-242208771
We need to gracefully shutdown and clean up resources.
- Error from Decorator will be returned instead of assuming false filtering
- For List, Decorator is called on return list object
- For Watch, we implement a new watcher to pipe through decorator
1. When overlapping deployments are discovered, annotate them
2. Expose those overlapping annotations as warnings in kubectl describe
3. Only respect the earliest updated one (skip syncing all other overlapping deployments)
4. Use indexer instead of store for deployment lister
Automatic merge from submit-queue
[GarbageCollector] Allow per-resource default garbage collection behavior
What's the bug:
When deleting an RC with `deleteOptions.OrphanDependents==nil`, the garbage collector is supposed to treat it as `deleteOptions.OrphanDependents==true` and orphan the pods created by it. But the apiserver is not doing that.
What's in the pr:
Allow each resource to specify the default garbage collection behavior in the registry. For example, RC registry's default GC behavior is Orphan, and Pod registry's default GC behavior is CascadingDeletion.
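A sketch of the defaulting rule, with made-up type names: the registry's per-resource default only kicks in when the client leaves OrphanDependents unset (nil).
```go
package main

import "fmt"

type gcPolicy string

const (
	orphan            gcPolicy = "Orphan"
	cascadingDeletion gcPolicy = "CascadingDeletion"
)

// effectivePolicy applies the per-resource default when the client does not
// say anything in deleteOptions.OrphanDependents (nil), matching the fix
// described above.
func effectivePolicy(orphanDependents *bool, registryDefault gcPolicy) gcPolicy {
	if orphanDependents == nil {
		return registryDefault
	}
	if *orphanDependents {
		return orphan
	}
	return cascadingDeletion
}

func main() {
	fmt.Println(effectivePolicy(nil, orphan))            // RC registry default: Orphan
	fmt.Println(effectivePolicy(nil, cascadingDeletion)) // Pod registry default: CascadingDeletion
	explicit := false
	fmt.Println(effectivePolicy(&explicit, orphan)) // explicit request wins
}
```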
Automatic merge from submit-queue
Enable v3 Client as the default on UTs
Updates the default initialization to use clientv3 interface to etcd3, and fixes the UTs.
This PR includes a cherry-pick of https://github.com/kubernetes/kubernetes/pull/30634 so we can validate the tests, so do not merge until that PR is complete.
Automatic merge from submit-queue
Make labels, fields expose selectable requirements
What?
This is to change the labels/fields Selector interface and make them expose selectable requirements. We reuse labels.Requirement struct for label selector and add fields.Requirement for field selector.
Why?
In order to index labels/fields, we need them to tell us three things: index key (a field or a label), operator (greater, less, or equal), and value (string, int, etc.). By getting selectable requirements, we are able to pass them down and use them for indexing in storage layer.
Automatic merge from submit-queue
Basic scaler/reaper for petset
Currently scaling or upgrading a petset is more complicated than it should be. It would be nice if this made code freeze on Friday. I'm planning a follow-up change with generation numbers and e2es post-freeze.
Automatic merge from submit-queue
change all PredicateFunc to use SelectionPredicate
What?
- This PR changes all PredicateFunc in registry to return SelectionPredicate instead of Matcher interface.
Why?
- We want to pass SelectionPredicate to storage layer. Matcher interface did not expose enough information for indexing.
Convert single GV and lists of GVs into an interface that can handle
more complex scenarios (everything internal, nothing supported). Pass
the interface down into conversion.
Automatic merge from submit-queue
Change kubectl create to use dynamic client
https://github.com/kubernetes/kubernetes/issues/16764
https://github.com/kubernetes/kubernetes/issues/3955
This is a series of changes to allow kubectl create to use discovery-based REST mapping and dynamic clients.
cc @kubernetes/sig-api-machinery
**Release note**:
```release-note
kubectl will no longer do client-side defaulting on create and replace.
```
Automatic merge from submit-queue
Move new etcd storage (low level storage) into cacher
As part of the effort for #29888, we are pushing this forward:
What?
- It changes creating the etcd storage.Interface impl into creating a config.
- When creating the cacher storage (StorageWithCacher), it passes in the config created above and creates the new etcd storage inside.
Why?
- We want to expose the information of (etcd) kv client to cacher. Cacher storage uses this information to talk to remote storage.
Automatic merge from submit-queue
Run goimport for the whole repo
While removing GOMAXPROCS and running goimports, I noticed quite a lot of other files also needed goimports formatting. Didn't commit `*.generated.go`, `*.deepcopy.go` or files in `vendor`.
This is more for testing if it builds.
The only strange thing here is the gopkg.in/gcfg.v1 => github.com/scalingdata/gcfg replace.
cc @jfrazelle @thockin
Automatic merge from submit-queue
add subjectaccessreviews resource
Adds a subjectaccessreviews endpoint that uses the API server's authorizer to determine if a subject is allowed to perform an action.
Part of kubernetes/features#37
The swagger spec for pods/{name}/log does not include
"text/plain" as a possible content-type for the response.
So we implement ProducesMIMETypes to make sure "text/plain"
gets added to the default list of content-types.
the v1.json was generated by running:
hack/update-generated-swagger-docs.sh; ./hack/update-swagger-spec.sh
Fixes #14071
Automatic merge from submit-queue
make resource prefix consistent with other registries
Storage class registry was added recently in #29694.
In other registries, resource prefix was changed to a more generic way using RESTOptions.ResourcePrefix.
This PR is an attempt to make both consistent.
Automatic merge from submit-queue
make the resource prefix in etcd configurable for cohabitation
This looks big, but it's not as bad as it seems.
When you have different resources cohabiting, the resource name used for the etcd directory needs to be configurable. HPA in two different groups worked fine before. Now we're looking at something like RC<->RS. They normally store into two different etcd directories. This code allows them to be configured to store into the same location.
To maintain consistency across all resources, I allowed the `StorageFactory` to indicate which `ResourcePrefix` should be used inside `RESTOptions` which already contains storage information.
@lavalamp affects cohabitation.
@smarterclayton @mfojtik prereq for our rc<->rs and d<->dc story.
Automatic merge from submit-queue
cleanup wrong naming: limitrange -> hpa
The code is in `horizontalpodautoscaler/strategy.go`, but the parameter is named "limitrange". This is a legacy copy-paste issue.
Automatic merge from submit-queue
include metadata in third party resource list serialization
Third party resource listing does not include important metadata such as resourceVersion and apiVersion. This commit includes the missing metadata and also replaces the string templating with an anonymous struct.
With the introduction of batch/v1 and batch/v2alpha1, any
MultiRESTMapper that has per groupversion mappers (as a naive discovery
client would create) would end up being unable to call RESTMapping() on
batch.Jobs. As we finish up discovery we will need to be able to choose
prioritized RESTMappings based on the service discovery doc.
This change implements RESTMappings(groupversion) which returns all
possible RESTMappings for that kind. That allows a higher level call to
prioritize the returned mappings by server or client preferred version.
Automatic merge from submit-queue
ListOptions: add test for ResourceVersion > 0 in List
ref: #28472
Done:
- Add a test for ResourceVersion > 0 in registry (cache) store List()
- Fix the docs.
Automatic merge from submit-queue
TLS bootstrap API group (alpha)
This PR only covers the new types and related client/storage code - the vast majority of the line count is codegen. The implementation differs slightly from the current proposal document based on discussions in the design thread (#20439). The controller logic and kubelet support mentioned in the proposal are forthcoming in separate requests.
I submit that #18762 ("Creating a new API group is really hard") is, if anything, understating it. I've tried to structure the commits to illustrate the process.
@mikedanese @erictune @smarterclayton @deads2k
```release-note-experimental
An alpha implementation of the TLS bootstrap API described in docs/proposals/kubelet-tls-bootstrap.md.
```
Automatic merge from submit-queue
cacher.go: remove NewCacher func
NewCacher is a wrapper of NewCacherFromConfig. NewCacher understands
how to create a key func from scopeStrategy. However, it is not the
responsibility of cacher. So we should remove this function, and
construct the config in its caller, which should understand scopeStrategy.
Automatic merge from submit-queue
Remove EncodeToStream(..., []unversioned.GroupVersion)
Was not being used. Is a signature change and is necessary for post 1.3 work on Templates and other objects that nest objects.
Extracted from #26044
NewCacher is a wrapper of NewCacherFromConfig. NewCacher understands
how to create a key func from scopeStrategy. However, it is not the
responsibility of cacher. So we should remove this function, and
construct the config in its caller, which should understand scopeStrategy.
Automatic merge from submit-queue
Expose GET and PATCH for status subresource
We can do this for other status subresource. I only updated node/status in this PR to unblock https://github.com/kubernetes/node-problem-detector/issues/9.
cc @Random-Liu @lavalamp
Automatic merge from submit-queue
validate third party resources
addresses validation portion of https://github.com/kubernetes/kubernetes/issues/22768
* ThirdPartyResource: validates name (3 segment DNS subdomain) and version names (single segment DNS label)
* ThirdPartyResourceData: validates objectmeta (name is validated as a DNS label)
* removes ability to use GenerateName with thirdpartyresources (kind and api group should not be randomized, in my opinion)
test improvements:
* updates resttest to clean up after create tests (so the same valid object can be used)
* updates resttest to take a name generator (in case "foo1" isn't a valid name for the object under test)
action required for alpha thirdpartyresource users:
* existing thirdpartyresource objects that do not match these validation rules will need to be removed/updated (after removing thirdpartyresourcedata objects stored under the disallowed versions, kind, or group names)
* existing thirdpartyresourcedata objects that do not match the name validation rule will not be able to be updated, but can be removed
Automatic merge from submit-queue
Add pod condition PodScheduled to detect situation when scheduler tried to schedule a Pod, but failed
Set the `PodScheduled` condition to `ConditionFalse` in `scheduleOne()` if scheduling failed and to `ConditionTrue` in the `/bind` subresource.
Ref #24404
@mml (as it seems to be related to "why pending" effort)
Automatic merge from submit-queue
Make ThirdPartyResource a root scoped object
ThirdPartyResource (the registration of a third party type) belongs at the cluster scope. It results in resource handlers installed in every namespace, and the same name in two namespaces collides (namespace is ignored when determining group/kind).
ThirdPartyResourceData (an actual instance of that type) is still namespace-scoped.
This PR moves ThirdPartyResource to be a root scope object. Someone previously using ThirdPartyResource definitions in alpha should be able to move them from namespace to root scope like this:
setup (run on 1.2):
```
kubectl create ns ns1
echo '{"kind":"ThirdPartyResource","apiVersion":"extensions/v1beta1","metadata":{"name":"foo.example.com"},"versions":[{"name":"v8"}]}' | kubectl create -f - --namespace=ns1
echo '{"kind":"Foo","apiVersion":"example.com/v8","metadata":{"name":"MyFoo"},"testkey":"testvalue"}' | kubectl create -f - --namespace=ns1
```
export:
```
kubectl get thirdpartyresource --all-namespaces -o yaml > tprs.yaml
```
remove namespaced kind registrations (this shouldn't remove the data of that type, which is another possible issue):
```
kubectl delete -f tprs.yaml
```
... upgrade ...
re-register the custom types at the root scope:
```
kubectl create -f tprs.yaml
```
Additionally, pre-1.3 clients that expect to read/write ThirdPartyResource at a namespace scope will not be compatible with 1.3+ servers, and 1.3+ clients that expect to read/write ThirdPartyResource at a root scope will not be compatible with pre-1.3 servers.
The codec factory should support two distinct interfaces - negotiating
for a serializer with a client, vs reading or writing data to a storage
form (etcd, disk, etc). Make the EncodeForVersion and DecodeToVersion
methods only take Encoder and Decoder, and slight refactoring elsewhere.
In the storage factory, use a content type to control what serializer to
pick, and use the universal deserializer. This ensures that storage can
read JSON (which might be from older objects) while only writing
protobuf. Add exceptions for those resources that may not be able to
write to protobuf (specifically third party resources, but potentially
others in the future).
Automatic merge from submit-queue
Petset controller
Took longer than I expected. The main parts of this PR are:
1. Identity generation based on petset spec (volumes are mapped per discussion in #18016)
2. Ensure that we create/delete pets in sequence
3. Ensuring that we create, wait for healthy, create; or delete, wait for terminationGrace, delete
4. Controller that watches apiserver and drives actual -> desired
PVCs are not deleted, yet.
Automatic merge from submit-queue
Move internal types of job from pkg/apis/extensions to pkg/apis/batch
This addresses the job part of #23216; this is still WIP. Will notify once finished. I'd like to have it in before starting work on ScheduledJob.
@lavalamp @erictune fyi
Add tests to watch behavior in both protocols (http and websocket)
against all 3 media types. Adopt the
`application/vnd.kubernetes.protobuf;stream=watch` media type for the
content that comes back from a watch call so that it can be
distinguished from a Status result.
Automatic merge from submit-queue
Make etcd cache size configurable
Instead of the prior 50K limit, allow users to specify a more sensible size for their cluster.
I'm not sure what a sensible default is here. I'm still experimenting on my own clusters. 50 gives me a 270MB max footprint. 50K caused my apiserver to run out of memory as it exceeded >2GB. I believe that number is far too large for most people's use cases.
There are some other fundamental issues that I'm not addressing here:
- Old etcd items are cached and potentially never removed (it stores using modifiedIndex, and doesn't remove the old object when it gets updated)
- Cache isn't LRU, so there's no guarantee the cache remains hot. This makes its performance difficult to predict. More of an issue with a smaller cache size.
- 1.2 etcd entries seem to have a larger memory footprint (I never had an issue in 1.1, even though this cache existed there). I suspect that's due to image lists on the node status.
This is provided as a fix for #23323
Automatic merge from submit-queue
Implement a streaming serializer for watch
Changeover watch to use streaming serialization. Properly version the
watch objects. Implement simple framing for JSON and Protobuf (but not
YAML).
@wojtek-t @lavalamp
Automatic merge from submit-queue
Additional go vet fixes
Mostly:
- pass lock by value
- bad syntax for struct tag value
- example functions not formatted properly
Here is a list of changes along with an explanation of how they work:
1. Add a new string field called TargetSelector to the external version of
extensions Scale type (extensions/v1beta1.Scale). This is a serialized
version of either the map-based selector (in case of ReplicationControllers)
or the unversioned.LabelSelector struct (in case of Deployments and
ReplicaSets).
2. Change the selector field in the internal Scale type (extensions.Scale) to
unversioned.LabelSelector.
3. Add conversion functions to convert from two external selector fields to a
single internal selector field. The rules for conversion are as follows:
i. If the target resource that this scale targets supports LabelSelector
(Deployments and ReplicaSets), then serialize the LabelSelector and
store the string in the TargetSelector field in the external version
and leave the map-based Selector field as nil.
ii. If the target resource only supports a map-based selector
(ReplicationControllers), then still serialize that selector and
store the serialized string in the TargetSelector field. Also,
set the Selector map field in the external Scale type.
iii. When converting from external to internal version, parse the
TargetSelector string into a LabelSelector struct if the string isn't
empty. If it is empty, then check if the Selector map is set and just
assign that map to the MatchLabels component of the LabelSelector (see
the sketch after this list).
iv. When converting from internal to external version, serialize the
LabelSelector and store it in the TargetSelector field. If only
the MatchLabel component is set, then also copy that value to
the Selector map field in the external version.
4. HPA now just converts the LabelSelector field to a Selector interface
type to list the pods.
5. Scale Get and Update etcd methods for Deployments and ReplicaSets now
return extensions.Scale instead of autoscaling.Scale.
6. Consequently, the SubresourceGroupVersion override and the "is autoscaling
enabled" check are now removed from pkg/master/master.go
7. Other small changes to labels package, fuzzer and LabelSelector
helpers to piece this all together.
8. Add unit tests to HPA targeting Deployments and ReplicaSets.
9. Add an e2e test to HPA targeting ReplicaSets.
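A simplified sketch of rule (iii), assuming a toy "key=value,key=value" serialization rather than the real selector format:
```go
package main

import (
	"fmt"
	"strings"
)

// labelSelector is a pared-down stand-in for unversioned.LabelSelector.
type labelSelector struct {
	MatchLabels map[string]string
}

// externalToInternal sketches rule (iii): when converting the external Scale
// to the internal form, parse TargetSelector if present, otherwise fall back
// to the map-based Selector as MatchLabels.
func externalToInternal(targetSelector string, selector map[string]string) labelSelector {
	if targetSelector != "" {
		matchLabels := map[string]string{}
		for _, pair := range strings.Split(targetSelector, ",") {
			kv := strings.SplitN(pair, "=", 2)
			if len(kv) == 2 {
				matchLabels[kv[0]] = kv[1]
			}
		}
		return labelSelector{MatchLabels: matchLabels}
	}
	return labelSelector{MatchLabels: selector}
}

func main() {
	fmt.Println(externalToInternal("app=web,tier=frontend", nil))
	fmt.Println(externalToInternal("", map[string]string{"app": "web"}))
}
```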
Also register replicationcontrollers/scale subresource. Along with
registering the resource, also specify the cross-group override for the
subresource since Scale belongs to autoscaling/v1 but
ReplicationController belongs to api/v1.
In podSecurityPolicy:
1. Rename .seLinuxContext to .seLinux
2. Rename .seLinux.type to .seLinux.rule
3. Rename .runAsUser.type to .runAsUser.rule
4. Rename .seLinux.SELinuxOptions
1,2,3 as suggested by thockin in #22159.
I added 3 for consistency with 2.
Added selector generation to Job's
strategy.Validate, right before validation.
Can't do in defaulting since UID is not known.
Added a validation to Job to ensure that the generated
labels and selector are correct when generation was requested.
This happens right after generation, but validation is a better
place to return an error.
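A rough sketch of the generation step (simplified Job shape; the `controller-uid` label key is assumed here for illustration): stamp a UID-derived label onto the pod template and use it as the selector unless one was supplied manually.
```go
package main

import "fmt"

// job carries only the pieces needed to illustrate selector generation.
type job struct {
	UID            string
	Selector       map[string]string
	TemplateLabels map[string]string
}

// addGeneratedSelector derives a unique label from the job's UID, adds it to
// the pod template, and uses it as the selector, so users no longer have to
// pick unique labels themselves.
func addGeneratedSelector(j *job) {
	if j.Selector != nil {
		return // a manually supplied selector is left alone
	}
	key, value := "controller-uid", j.UID
	j.Selector = map[string]string{key: value}
	if j.TemplateLabels == nil {
		j.TemplateLabels = map[string]string{}
	}
	j.TemplateLabels[key] = value
}

func main() {
	j := &job{UID: "5a3c0c5e-1b2d-4f6e-9a87-000000000001"}
	addGeneratedSelector(j)
	fmt.Println(j.Selector, j.TemplateLabels)
}
```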
Adds "manualSelector" field to batch/v1 Job to control selector generation.
Adds same field to extensions/__internal. Conversion between those two
is automatic.
Adds "autoSelector" field to extensions/v1beta1 Job. Used for storing batch/v1 Jobs
- Default for v1 is to do generation.
- Default for v1beta1 is to not do it.
- In both cases, unset == false == do the default thing.
Release notes:
Added batch/v1 group, which contains just Job, and which is the next
version of extensions/v1beta1 Job.
The changes from the previous version are:
- Users no longer need to ensure labels on their pod template are unique to the enclosing
job (but may add labels as needed for categorization).
- In v1beta1, job.spec.selector was defaulted from pod labels, with the user responsible for uniqueness.
In v1, a unique label is generated and added to the pod template, and used as the selector (other
labels added by user stay on pod template, but need not be used by selector).
- a new field called "manualSelector" exists to control whether the new behavior is used,
versus a more error-prone but more flexible "manual" (not generated) selector. Most users
will not need to use this field and should leave it unset.
Users who are creating extensions.Job go objects and then posting them using the go client
will see a change in the default behavior. They need to either stop providing a selector (relying on
selector generation) or else specify "spec.manualSelector" until they are ready to do the former.
This should allow users to update DaemonSet pods by manually deleting
the corresponding running pods. Users can use this mechanism for
DaemonSet updates until we implement Deployment style rolling update
for DaemonSet.
Leaving the type fields as comments for reference and reminder. But
deleting the conversion, defaulting and validation code. They can
always be brought back from the previous PR once the types are
introduced. Because builds break without them anyway, that serves as a
reminder, so there is no need to leave them commented out.
Update the Deployments' API types, defaulting code, conversions, helpers
and validation to use ReplicaSets instead of ReplicationControllers and
LabelSelector instead of map[string]string for selectors.
Also update the Deployment controller, registry, kubectl subcommands,
client listers package and e2e tests to use ReplicaSets and
LabelSelector for Deployments.
Move type LabelSelector and type LabelSelectorRequirement from pkg/apis/extensions
This avoids an import loop when Job (and later DaemonSet, Deployment, ReplicaSet)
are moved out of extensions to new api groups.
Also Move LabelSelectorAsSelector utility from pkg/apis/extensions/ to pkg/api/unversioned/
Also its test.
Also LabelSelectorOp* constants.
Also the pkg/apis/extensions/validation functions ValidateLabelSelectorRequirement and
ValidateLabelSelector move to pkg/api/unversioned
The related type in pkg/apis/extensions/v1beta1/ is staying there. I might move
it in another PR if necessary.
This commit adds support for paused deployments so a user can choose
when to run a deployment that exists in the system instead of having
the deployment controller automatically reconciling it after every
change or sync interval.
Components can write services during startup, which results in the ip
allocator map being updated. Since core controllers *must* succeed for
the masters to start, we should retry a few times in order to pass.
Most of the logic related to type and kind retrieval belongs in the
codec, not in the various classes. Make it explicit that the codec
should handle these details.
Factory now returns a universal Decoder and a JSONEncoder to assist code
in kubectl that needs to specifically deal with JSON serialization
(apply, merge, patch, edit, jsonpath). Add comments to indicate the
serialization is explicit in those places. These methods decode to
internal and encode to the preferred API version as previous, although
in the future they may be changed.
React to removing Codec from version interfaces and RESTMapping by
passing it in to all the places that it is needed.
The pending codec -> conversion split changes the signature of
Encode and Decode to be more complicated. Create a stub helper
with the exact semantics of today and do the simple mechanical
refactor here to reduce the cost of that change.
Before this change we have a mish-mash of ways to pass field names around for
error generation. Sometimes string fieldnames, sometimes .Prefix(), sometimes
neither, often wrong names or not indexed when it should be.
Instead of that mess, this is part one of a couple of commits that will make it
more strongly typed and hopefully encourage correct behavior. At least you
will have to think about field names, which is better than nothing.
It turned out to be really hard to do this incrementally.
This commit adds support for using kubectl scale to scale deployments. Makes use of the
deployments/scale endpoint instead of updating deployment.spec.replicas directly.
This commit introduces a validator for use with Scale updates.
The validator checks that we have > 0 replica count, as well
as the normal ObjectMeta checks (some of which have to be
faked since they don't exist on the Scale object).
When etcd is down today we don't specifically handle the error involved,
which means clients get a generic 500 error. This commit adds a formal
error type internally for both WatchExpired and EtcdUnreachable, and
then converts them to api/errors before returning to the client. It also
upgrades the client to retry on any 429 or 5xx error that has a
Retry-After header, instead of just 429.
In combination, this allows the apiserver to exert backpressure on
controllers that are hotlooping. Picked 2 seconds by default, but we
could potentially ramp that up even further in a future iteration.
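A client-side sketch of the retry behavior using only net/http (the URL is a placeholder): retry on 429 or any 5xx response that carries a parseable Retry-After header, up to a bounded number of attempts.
```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"time"
)

// getWithRetry retries a GET when the server signals backpressure with a
// Retry-After header on a 429 or 5xx response, up to maxRetries attempts.
func getWithRetry(client *http.Client, url string, maxRetries int) (*http.Response, error) {
	for attempt := 0; ; attempt++ {
		resp, err := client.Get(url)
		if err != nil {
			return nil, err
		}
		retryable := resp.StatusCode == http.StatusTooManyRequests || resp.StatusCode >= 500
		if !retryable || attempt >= maxRetries {
			return resp, nil
		}
		seconds, err := strconv.Atoi(resp.Header.Get("Retry-After"))
		if err != nil {
			return resp, nil // no usable Retry-After: give the response back
		}
		resp.Body.Close()
		time.Sleep(time.Duration(seconds) * time.Second)
	}
}

func main() {
	resp, err := getWithRetry(http.DefaultClient, "http://127.0.0.1:8080/api/v1/pods", 3)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```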
For connect handlers that need to respond with a structured error or
structured object, pass an interface that hides the details of writing
an object to the response (error or runtime.Object).
Example use case:
Connect handler that accepts a body input stream, which it streams to a
pod, and then returns a structured object with info about the pod it
just created.
Not all clients and systems can support SPDY protocols. This commit adds
support for two new websocket protocols, one to handle streaming of pod
logs from a pod, and the other to allow exec to be tunneled over
websocket.
Browser support for chunked encoding is still poor, and web consoles
that wish to show pod logs may need to make compromises to display the
output. The /pods/<name>/log endpoint now supports websocket upgrade to
the 'binary.k8s.io' subprotocol, which sends chunks of logs as binary to
the client. Messages are written as logs are streamed from the container
daemon, so flushing should be unaffected.
Browser support for raw communication over SDPY is not possible, and
some languages lack libraries for it and HTTP/2. The Kubelet supports
upgrade to WebSocket instead of SPDY, and will multiplex STDOUT/IN/ERR
over websockets by prepending each binary message with a single byte
representing the channel (0 for IN, 1 for OUT, and 2 for ERR). Because
framing on WebSockets suffers from head-of-line blocking, clients and
other server code should ensure that no particular stream blocks. An
alternative subprotocol 'base64.channel.k8s.io' base64 encodes the body
and uses '0'-'9' to represent the channel for ease of use in browsers.
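A minimal illustration of the framing itself (independent of any websocket library): each binary message starts with one channel byte followed by the payload.
```go
package main

import "fmt"

// Channel numbers used by the multiplexed protocol described above:
// 0 = STDIN, 1 = STDOUT, 2 = STDERR. Each binary message carries the channel
// as its first byte followed by the payload.
const (
	stdinChannel  = 0
	stdoutChannel = 1
	stderrChannel = 2
)

// frame prepends the channel byte to the payload.
func frame(channel byte, payload []byte) []byte {
	return append([]byte{channel}, payload...)
}

// unframe splits a message back into its channel byte and payload.
func unframe(msg []byte) (channel byte, payload []byte) {
	return msg[0], msg[1:]
}

func main() {
	msg := frame(stdoutChannel, []byte("hello from the container\n"))
	ch, data := unframe(msg)
	fmt.Printf("channel=%d payload=%q\n", ch, data)
}
```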
Added status REST storage.
Added validation for Status Updates.
Changed job controller to update status rather than just job
(which ignores status updates).
Increase the supported controls on pod logging. Add validation to pod
log options. Ensure the Kubelet is using a consistent, structured way to
process pod log arguments.
Add ?sinceSeconds=<durationInSeconds>, ?sinceTime=<RFC3339>, ?timestamps=<bool>,
?tailLines=<number>, and ?limitBytes=<number>
A lot of packages use StringSet, but they don't use anything else from
the util package. Moving StringSet into another package will shrink
their dependency trees significantly.
Avoid TTL by deleting pods immediately when they aren't
scheduled, and letting the Kubelet delete them otherwise.
Ensure the Kubelet uses pod.Spec.TerminationGracePeriodSeconds
when no pod.DeletionGracePeriodSeconds is available.
This change is required for the handler to work with sshtunnels.
Without it, `kubectl exec` and `kubectl port-forward` are broken
when an ssh proxy is used (see #9292). I manually verified this
fixes that issue, e2e test coming shortly.
Refactor GetNodeHostIP into pkg/util/node (instead of pkg/util to break import cycle).
Include internalIP in gce NodeAddresses. Remove NodeLegacyHostIP
This commit wires together the graceful delete option for pods
on the Kubelet. When a pod is deleted on the API server, a
grace period is calculated that is based on the
Pod.Spec.TerminationGracePeriodInSeconds, the user's provided grace
period, or a default. The grace period can only shrink once set.
The value provided by the user (or the default) is set onto metadata
as DeletionGracePeriod.
When the Kubelet sees a pod with DeletionTimestamp set, it uses the
value of ObjectMeta.GracePeriodSeconds as the grace period
sent to Docker. When updating status, if the pod has DeletionTimestamp
set and all containers are terminated, the Kubelet will update the
status one last time and then invoke Delete(pod, grace: 0) to
clean up the pod immediately.
Change the signature of GuaranteedUpdate so that TTL can
be more easily preserved. Allow a simpler (no ttl) and
more complex (response and node directly available, set ttl)
path for GuaranteedUpdate. Add some tests to ensure this
doesn't blow up again.
* Add an allocator which saves state in etcd
* Perform PortalIP allocation check on startup and periodically afterwards
Also expose methods in master for downstream components to handle IP allocation
/ master registration themselves.
The Spec.Host is set when binding a pod to a node and will contain the
canonical host name of the node. The Status.HostIP may not be included
in the node's server TLS certificate.
Makes it possible to access the following subresources:
/namespaces/<ns>/pods/<pod-name>[:port]/proxy
/namespaces/<ns>/pods/<pod-name>/exec?container=<container>&command=<cmd>
/namespaces/<ns>/pods/<pod-name>/portforward
PodExecOptions represents the URL parameters used to invoke an
exec request on a pod. PodProxyOptions contains the path parameter
passed to a proxy request.
external load balancers up-to-date based on the service's specs, using
the new DeltaFIFO watch queue class. Remove the old registry REST
handler code for creating/updating/deleting load balancers.
Also clean up a bunch of the GCE cloudprovider code related to load balancers.
Allows REST consumers to build paths like:
/api/v1beta3/namespaces/foo/webhookresource/<name>/<encodedsecretinurl>
Also fixes parameter exposure for subresources (was only fixed for
v1beta3).
Standardize how our fakes are used so that a test case can use a
simpler mechanism for providing large, complex data sets, as well
as represent queries over time.
Adds a Log subresource to Pod storage. The Log subresource implements
rest.GetterWithOptions and produces a ResourceStreamer resource that
will stream the log output from the pod's host node.
Convert the return value of pods rest.NewStorage to a struct.
This will allow returning more storage objects for a pod (sub resources)
without awkwardly adding more return values.
* Improper format specifier (e.g. %s for bools or %s for ints)
* More or less parameters than format specifiers
* Not calling a formatting function when it should have (e.g. Error() instead of Errorf())
Instead of endpoints being a flat list, it is now a list of "subsets"
where each is a struct of {Addresses, Ports}. To generate the list of
endpoints you need to take the union of the Cartesian products of the
subsets. This is compact in the vast majority of cases, yet still
represents named ports and corner cases (e.g. each pod has a different
port number).
This also stores subsets in a deterministic order (sorted by hash) to
avoid spurious updates and comparison problems.
This is a fully compatible change - old objects and clients will
keep working as long as they don't need the new functionality.
This is the prep for multi-port Services, which will add API to produce
endpoints in this new structure.
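A toy Go sketch of how the flat endpoint list is recovered from subsets (types simplified): take the Cartesian product of Addresses and Ports within each subset and union the results.
```go
package main

import "fmt"

// EndpointSubset mirrors the compact representation described above: the
// full endpoint list is the union over subsets of Addresses x Ports.
type EndpointSubset struct {
	Addresses []string
	Ports     []int
}

// flatten expands the subsets back into a flat list of "address:port" pairs.
func flatten(subsets []EndpointSubset) []string {
	var out []string
	for _, s := range subsets {
		for _, addr := range s.Addresses {
			for _, port := range s.Ports {
				out = append(out, fmt.Sprintf("%s:%d", addr, port))
			}
		}
	}
	return out
}

func main() {
	subsets := []EndpointSubset{
		{Addresses: []string{"10.0.0.1", "10.0.0.2"}, Ports: []int{8080}},
		{Addresses: []string{"10.0.0.3"}, Ports: []int{8081}}, // pod with a different port
	}
	fmt.Println(flatten(subsets))
}
```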
As per discussion with @smarterclayton. I implemented this for most
types in the "obvious" way. I am not sure how to implement
this for a couple types, though.
Dependency chain is now api -> api/rest -> apiserver. Makes the
interfaces much cleaner to read, and cleans up some inconsistencies
that crept in along the way.
This commit adds support to core resources to enable deferred deletion
of resources. Clients may optionally specify a time period after which
resources must be deleted via an object sent with their DELETE. That
object may define an optional grace period in seconds, or allow the
default "preferred" value for a resource to be used. Once the object
is marked as pending deletion, the deletionTimestamp field will be set
and an etcd TTL will be in place.
Clients should assume resources that have deletionTimestamp set will
be deleted at some point in the future. Other changes will come later
to enable graceful deletion on a per resource basis.
In order to support graceful deletion, the resource object will
need access to the TTL value in etcd. Also, in the future we
may want to get the creation index (distinct from modifiedindex)
and expose it to clients. Change EtcdResourceVersioner to be
more type specific (objects vs lists) and provide a default
implementation that relies on the internal API convention.
Also, rename etcd_tools.go to etcd_helper.go and split a few
things up.
We decided to get rid of boundPods. Removing this check is
a prerequisite for that. This check had some value before we had
IP-per-Pod. However, AIUI, use of HostPort is strongly discouraged
in Kubernetes. It still exists as part of a Pod spec because
of ContainerVM, where it is used. But, this change does not affect
ContainerVM, where there is no master.
If someone did create pods with HostPort using kubernetes, the following
would happen:
- The scheduler would try not to put two conflicting pods on the same
machine (pkg/scheduler/predicates.go : PodFitsPorts() )
- I'm not sure if it is currently possible for a race to occur where
the PodFitsPorts check were bypassed. Maybe it could happen.
- If the kubelet was sent conflicting pods, it would detect them in
( pkg/kubelet/kubelet.go : filterHostPortConflicts() ). It would
arbitrarily pick one pod to run and another to ignore.
- If all of the above happened and the user filed and issue on github,
we might figure out that the user used HostPort and tell the user to stop.
TODO:
- e2e test
- Several of the demos in examples/ use hostPort. Change them to
not specify hostPort and have a service instead.
Some load balancers (particularly AWS ELB) define the public endpoint
as a hostname (instead of using IP addresses).
This is a partial fix for #5224; there will also be some proxy work.
Allows POST to create a binding as a child. Also refactors internal
and v1beta3 Binding to be more generic (so that other resources can
support Bindings).
Also fix that Update was returning AlreadyExists instead of
NotFound (when create is disabled) and that Update when CreateOnUpdate
was true was not populating the returned object.
Added more tests
PUT /api/v1beta3/namespaces/default/pods/foo/status
{
"metadata": {...}, // allowed for valid values
"spec": {}, // ignored
"status": {...}, // allowed, except for Host
}
Exposes the simplest possible change. Needs a slight refactoring
to RESTUpdateStrategy to split merging which can be done in a
follow up.
If `kube-apiserver` is started before `etcd` is reachable, `kube-apiserver`
fails to create those services.
However, in the `Create` function, an IP has already been reserved for them.
When `etcd` comes back, the `Create` function fails because it considers that
the IP is already used.
If the service couldn't be created, the reserved IP should be released.
They will still show up in etcd. They never were available
through the API.
A subsequent PR(s) will rip out all BoundPods code.
Working in small increments.
This PR will cause users on lagging cloud providers
to not get env vars in their pods if they update to this code.
They have already been warned via email.
Removed unit tests of BasicBoundPodFactory.
There is adequate coverage in pkg/kubelet/kubelet_test.go.
In some cases, when the key doesn't exist, AtomicUpdate should simply fail and
surface the error. This change adds a new function parameter "ignoreNotFound"
for this purpose.
Also make sure all POST operations return 201 by default.
Removes the remainder of the asych logic in RESTStorage and
leaves it up to the API server to expose that behavior.
As far as I know, nobody uses it. It was replaced by PublicIPs. If I were
being very polite I would leave it in internal, but since I am 99.99% sure
nobody uses it, I am cutting it. Let's argue about it.
Currently, the validation logic validates fields in an object and supplies default
values wherever applicable. This change factors out defaulting to a set of
defaulting callback functions for decoding (see #1502 for more discussion).
* This change is based on pull request 2587.
* Most defaulting has been migrated to defaults.go where the defaulting
functions are added.
* validation_test.go and converter_test.go have been adapted to not test the
default values.
* Fixed all tests that create invalid objects in the absence of
defaulting logic.
If a client says they want the name to be generated, a 409 is
not appropriate (since they didn't specify a name). Instead, we
should return the next most appropriate error, which is a 5xx
error indicating the request failed but the client *should* try
again. Since there is no 5xx error that exactly fits this purpose,
use 500 with StatusReasonTryAgainLater set.
This commit does not implement client retry on TryAgainLater, but
clients should retry up to a certain number of times.
Adds `ObjectMeta.GenerateName`, an optional string field that defines
name generation behavior if a Name is not provided.
Adds `pkg/api/rest`, which defines the default Kubernetes API pattern
for creation (and will cover update as well). Will allow registries
and REST objects to be merged by moving logic on api out of those places.
Add `pkg/api/rest/resttest`, which will be the test suite that verifies
a RESTStorage object follows the Kubernetes API conventions and begin
reducing our duplicated tests.
make etcd registry pass test
fix kubelet config for quantity
fix openstack for quantity
fix controller for quantity
fix last tests for quantity
wire into binaries
fix controller manager
fix build for 32 bit systems
- Added a process to clean up stale session affinity records
- Automatically set the cloud-provided load balancer for sticky sessions if the service requires it - note, this only works on GCE right now.
- Changed sessionAffinityMap to a map of pointers instead of structs to improve performance
- Commented out cookie and protocol from sessionAffinityDetail to avoid confusion, as they are not yet implemented.
Replaces the client public interface but leaves old references to "minions"
for a later refactor. Selects the path "nodes" for v1beta3 and "minions"
for older versions.
A few reasons:
- Mux is already widely used in the codebase to refer to a http handler mux.
- The original meaning of Mux was something which chose one of several inputs to send
to an output. This sends one input to all outputs. Broadcast captures that idea
better.
- Aligns with similar class config.Broadcaster (see #2747)
This allows the proxier to portal Public IPs even if the
createExternalLoadBalancer flag is not set.
This also fixes what appears to be a bug in the createExternalLoadBalancer path
wherein multiple PublicIPs would get truncated.
GetServiceEnvironmentVariables (originally in pkg/registry/service/rest.go)
is split into two parts: one that lists services, and one that turns
a ServiceList into environment vars. This will allow a subsequent PR
to add a call to the latter function with an existing ServiceList.
The former part is moved into pkg/registry/pod/bound_pod_factory.go.
The latter part is put in a new package, pkg/kubelet/envvars/envvars.go.
The new package is under kubelet because the container environment is more
associated with kubelet than with registry.
Test code moved too.
Create a new MetadataAccessor interface that combines both
and use it where previously latest.ResourceVersioner and SelfLinker
were being used.
Adds Namespace to the get/set interface. Adds TODO about future
fast path for metadata (as per thockin's comment)
PUT allows an object to be created (http 201). This allows REST code to
indicate an object has been created and clients to react to it.
APIServer now deals with <-chan RESTResult instead of <-chan runtime.Object,
allowing more data to be passed through.
Allows us to define different watch versioning regimes in the future
as well as to encode information with the resource version.
This changes /watch/resources?resourceVersion=3 to start the watch at
4 instead of 3, which means clients can read a resource version and
then send it back to the server. Clients should no longer do math on
resource versions.
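A tiny sketch of the new semantics (an in-memory stand-in for the watch history): a watch started from resourceVersion N returns only events with versions strictly greater than N, so clients can echo back the version they last saw without doing arithmetic.
```go
package main

import "fmt"

type event struct {
	resourceVersion int
	object          string
}

// watchFrom returns the events that occurred strictly after rv, matching the
// semantics described above: a client takes the resourceVersion it last read
// and hands it straight back to the server.
func watchFrom(history []event, rv int) []event {
	var out []event
	for _, e := range history {
		if e.resourceVersion > rv {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	history := []event{
		{3, "pod/web-1 created"},
		{4, "pod/web-1 scheduled"},
		{5, "pod/web-1 running"},
	}
	fmt.Println(watchFrom(history, 3)) // starts at resourceVersion 4
}
```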
Move a lot of common error logging into better buckets:
glog.Errorf() - Always an error
glog.Warningf() - Something unexpected, but probably not an error
glog.V(0) - Generally useful for this to ALWAYS be visible
to an operator
* Programmer errors
* Logging extra info about a panic
* CLI argument handling
glog.V(1) - A reasonable default log level if you don't want
verbosity
* Information about config (listening on X, watching Y)
* Errors that repeat frequently that relate to conditions
that can be corrected (pod detected as unhealthy)
glog.V(2) - Useful steady state information about the service
* Logging HTTP requests and their exit code
* System state changing (killing pod)
* Controller state change events (starting pods)
* Scheduler log messages
glog.V(3) - Extended information about changes
* More info about system state changes
glog.V(4) - Debug level verbosity (for now)
* Logging in particularly thorny parts of code where
you may want to come back later and check it
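A brief sketch of these conventions in use with glog (the messages are made up for illustration):

```go
package main

import (
	"flag"

	"github.com/golang/glog"
)

func main() {
	flag.Parse() // glog reads -v and related flags

	glog.Errorf("failed to sync pod %q: %v", "mypod", "some error")  // always an error
	glog.Warningf("unexpected state, continuing anyway")             // unexpected, probably not an error
	glog.V(1).Infof("listening on %s, watching %s", ":8080", "pods") // config / startup info
	glog.V(2).Infof("killing pod %q", "mypod")                       // steady-state system change
	glog.V(4).Infof("entering tricky code path, state=%+v", nil)     // debug-level detail
}
```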
This includes 3 changes:
1) Use the Service.Protocol field, rather than hardcoding TCP.
2) Use Service.Port rather than ContainerPort - ContainerPort can be a string
and has absolutely no meaning to consumers.
3) Beef up tests for these env vars.
As a replacement for the single SERVICE_HOST variable, offer a FOO_SERVICE_HOST
variable. This will help ease the transition to ip-per-service, where there
is no longer a single service host.
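A tiny illustrative example, assuming a hypothetical service named "redis" with Service.Port 6379; the host value is invented:

```go
package main

import "fmt"

func main() {
	// Values come from Service.Port (not ContainerPort) and the service's host.
	name, host, port := "REDIS", "10.0.0.11", 6379
	fmt.Printf("%s_SERVICE_HOST=%s\n", name, host) // REDIS_SERVICE_HOST=10.0.0.11
	fmt.Printf("%s_SERVICE_PORT=%d\n", name, port) // REDIS_SERVICE_PORT=6379
}
```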
* Defaults to v1beta1
* apiserver takes -storage_version which controls etcd storage version
and the version of the client used to connect to other apiservers
* Changed signature of client.New to add version parameter
* All controller code and component code prefers the oldest (most common)
server version
* Make Codec separate from Scheme
* Move EncodeOrDie off Scheme to take a Codec
* Make Copy work without a Codec
* Create a "latest" package that imports all versions and
sets global defaults for "most recent encoding"
* v1beta1 is the current "latest", v1beta2 exists
* Kill DefaultCodec, replace it with "latest.Codec"
* This updates the client and etcd to store the latest known version
* EmbeddedObject is per schema and per package now
* Move runtime.DefaultScheme to api.Scheme
* Split out WatchEvent since it's not an API object today, treat it
like a special object in api
* Kill DefaultResourceVersioner, instead place it on "latest" (as the
package that understands all packages)
* Move objDiff to runtime.ObjectDiff
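A hedged sketch of the Codec/Scheme split from a consumer's point of view; the interface below is a simplification, and interface{} stands in for the runtime object type:

```go
package runtime

// Codec is a sketch of the encode/decode half split away from Scheme:
// callers that only need serialization depend on this interface, while
// conversion and type registration stay on the Scheme.
type Codec interface {
	Encode(obj interface{}) ([]byte, error)
	Decode(data []byte) (interface{}, error)
}

// EncodeOrDie is a convenience wrapper that now takes an explicit Codec
// instead of living on the Scheme; it panics on failure.
func EncodeOrDie(codec Codec, obj interface{}) string {
	data, err := codec.Encode(obj)
	if err != nil {
		panic(err)
	}
	return string(data)
}
```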
To proxy traffic to anything that implements ResourceLocation.
Currently, this is only services. This is easily extensible to minions
(would supersede existing mechanism) and pods.
This is some cleanup that has been needed for a while.
There's still one more step that could usefully be done, which is to
split up our api package into the part that provides the helper
functions and the part that provides the internal types. That can come
later.
The v1beta1 package is now a good example of what an api plugin should
do to version its types.
Also rename some to names that read better. There are still a
bunch of "make" functions, but they do things like assemble a string from parts
or build an array of things; "make" seems fine there. "New"
is reserved for constructors.
Because time.Time doesn't work correctly with our YAML package, it is necessary
to introduce a type, util.Time, which serializes correctly to JSON and YAML.
Eventually we would like timestamping to cut across storage implementations;
for now, we set it in each storage.
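A minimal sketch of the wrapper-type approach, assuming an RFC 3339 wire format; the real util.Time may differ in details:

```go
package util

import (
	"encoding/json"
	"time"
)

// Time wraps time.Time so we control how it is serialized.
type Time struct {
	time.Time
}

// MarshalJSON emits RFC 3339, a format JSON and YAML tooling handle cleanly.
func (t Time) MarshalJSON() ([]byte, error) {
	return json.Marshal(t.Format(time.RFC3339))
}

// UnmarshalJSON parses the same RFC 3339 representation back.
func (t *Time) UnmarshalJSON(data []byte) error {
	var s string
	if err := json.Unmarshal(data, &s); err != nil {
		return err
	}
	parsed, err := time.Parse(time.RFC3339, s)
	if err != nil {
		return err
	}
	t.Time = parsed
	return nil
}
```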
Currently all registry implementations live in a single package,
which makes it a bit harder to maintain. The different registry
implementations do not follow the same coding style and naming
conventions, which makes the code harder to read.
Breakup the registry package into smaller packages based on
the registry implementation. Refactor the registry packages
to follow a similar coding style and naming convention.
This patch does not introduce any changes in behavior.
This commit adds a Binding object. The idea is that schedulers can write
these to cause pods to be assigned to hosts. I'll provide an implementation
along with a rudimentary scheduler plugin.
This continues k8s' tradition of phrasing all APIs as RESTful handlers.
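A hedged sketch of the idea; the Binding field names and the posting helper below are assumptions for illustration, not the actual object definition:

```go
package scheduler

// Binding is a sketch of the object a scheduler writes to assign a pod to a host.
type Binding struct {
	PodID string
	Host  string
}

// bindPod is an illustrative REST-style call; a real scheduler would POST
// the Binding to the apiserver rather than call a local function.
func bindPod(post func(path string, obj interface{}) error, podID, host string) error {
	return post("/bindings", &Binding{PodID: podID, Host: host})
}
```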
1. Change names of Pod statuses (Waiting, Running, Terminated).
2. Store assigned host in etcd.
3. Change pod key to /registry/pods/<podid>. Container location remains
the same (/registry/hosts/<machine>/kublet).
* Add labels selector (same as List)
* Add fields selector
* Plan to let you select pods by Host and/or Status
* Add resourceVersion to let you resume a watch where you left off.
The apiserver on initialization must be provided with a codec
for encoding and decoding all handled objects including api.Status
and api.ServerOp. In addition, the RESTStorage Extract() method
has been changed to New(), which returns a pointer object that the
codec must decode into (the internal object). Switched registry
methods to use pointers for Create/Update instead of values.
Contains breaking API change on api.Status#Details (type change)
Turn Details from string -> StatusDetails - a general
bucket for keyed error behavior. Define an open enumeration
ReasonType exposed as Reason on the status object to provide
machine readable subcategorization beyond HTTP Status Code. Define
a human readable field Message which is common convention (previously
this was joined into Details).
Precedence order: HTTP Status Code, Reason, Details. apiserver would
impose restraints on the ReasonTypes defined by the main apiobject,
and ensure their use is consistent.
There are four long term scenarios this change supports:
1. Allow a client access to a machine readable field that can be
easily switched on for improving or translating the generic
server Message.
2. Return a 404 when a composite operation on multiple resources
fails with enough data so that a client can distinguish which
item does not exist. E.g. resource Parent and resource Child,
POST /parents/1/children to create a new Child, but /parents/1
is deleted. POST returns 404, ReasonTypeNotFound, and
Details.ID = "1", Details.Kind = "parent"
3. Allow a client to receive validation data that is keyed by
attribute for building user facing UIs around field submission.
Validation is usually expressed as map[string][]string, but
that type is less appropriate for many other uses.
4. Allow specific API errors to return more granular failure status
for specific operations. An example might be a minion proxy,
where the operation that failed may be either the proxying OR the
minion itself. In this case a reason may be defined "proxy_failed"
corresponding to 502, where the Details field may be extended
to contain a nested error object.
At this time only ID and Kind are exposed
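A sketch of the resulting shape, limited to the fields called out above (Reason, Message, and a Details bucket exposing only ID and Kind); the constant name is illustrative:

```go
package api

// ReasonType is an open enumeration giving machine-readable detail beyond
// the HTTP status code.
type ReasonType string

const (
	// ReasonTypeNotFound is illustrative; e.g. a composite POST that fails
	// because /parents/1 was deleted could return 404 with this reason and
	// Details.ID = "1", Details.Kind = "parent".
	ReasonTypeNotFound ReasonType = "not_found"
)

// StatusDetails is the keyed bucket that replaces the old Details string.
type StatusDetails struct {
	ID   string
	Kind string
	// Only ID and Kind are exposed at this time.
}

// Status carries the error information; precedence is Code, then Reason,
// then Details.
type Status struct {
	Code    int            // HTTP status code
	Reason  ReasonType     // machine-readable subcategory
	Message string         // human-readable text
	Details *StatusDetails // keyed data
}
```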
etcd_tools.go is not dependent on the specific implementation
(which is provided by pkg/api). All EtcdHelpers are created
with an encoding object which handles Encode/Decode/DecodeInto.
Additional tests added to verify simple atomic flows.
Begins to break up api singleton pattern.
Files that have RESTStorage implementations now end in "storage", and
files that have registries now end in "registry". I removed some
underscores in file names, since it seems to be go style not to have
them. I split minion_registry.go into two files.
We should consider splitting this package into two, to make more clear
the separation between the layers.
To make sure the etcd watcher works, I changed the replication
controller to use watch.Interface. I made apiserver support watches on
controllers, so replicationController can be run only off of the
apiserver. I made sure all the etcd watch testing that used to be in
replicationController is now tested on the new etcd watcher in
pkg/tools/.
If a service is deleted before the etcd key corresponding to its
endpoints has been created, a 500 error is returned (etcd key
not found). Instead, key not found should be ignored silently.
There is a potential race condition with the controller creating
endpoints after the service has been deleted which a cleanup task
should handle in the future.
Closes#684
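A hedged sketch of the shape of the fix: treat a missing endpoints key as already deleted instead of returning a 500. The error value, key layout, and delete helper below are stand-ins, not the actual etcd tooling:

```go
package registry

import "errors"

// errEtcdKeyNotFound stands in for the etcd "key not found" error; the real
// code would check it with the etcd tooling's own helper.
var errEtcdKeyNotFound = errors.New("key not found")

func deleteServiceEndpoints(del func(key string) error, name string) error {
	err := del("/registry/services/endpoints/" + name) // illustrative key layout
	if errors.Is(err, errEtcdKeyNotFound) {
		// The endpoints key was never created; ignore rather than surface a 500.
		return nil
	}
	return err
}
```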
Return all healthy minions instead of only returning the first several
healthy minions. Without this PR, if the first minion failed its health
check, the machine list was empty, and GET /api/v1beta1/pods failed
with code 500.
The current tests for the health package utilize a fake HTTP client
for testing health checks and only cover a limited set of test cases.
This patch removes the need for mocks by using the httptest package
from the standard library. After removing the fake HTTP client a bug
was found in the health.Check function; it incorrectly assumes that
an http.Response is always non-nil. Fix the issue by moving the defer
that closes the http.Response.Body to after the error handling.
Related tests in the registry package have been refactored to work with
the changes made to the health.Check function. All methods that implement
the health.HTTPGetInterface interface now return an http.Response with
a no-op http.Response.Body.
This patch does not introduce any changes in behavior.
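A hedged reconstruction of the bug and its fix: deferring resp.Body.Close() before the error check dereferences a nil response when the request fails, so the defer moves below the check. The function below is a simplified stand-in for health.Check:

```go
package health

import "net/http"

// checkOnce shows the corrected ordering; the real health.Check does more.
func checkOnce(client *http.Client, url string) (bool, error) {
	resp, err := client.Get(url)
	if err != nil {
		// resp may be nil here; deferring resp.Body.Close() above this
		// check was the bug.
		return false, err
	}
	defer resp.Body.Close()
	return resp.StatusCode >= 200 && resp.StatusCode < 400, nil
}
```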
1) imported glog to third_party (previous commit)
2) add support for third_party/update.sh to update just one pkg
3) search-and-replace:
s/log.Printf/glog.Infof/
s/log.Print/glog.Info/
s/log.Fatalf/glog.Fatalf/
s/log.Fatal/glog.Fatal/
4) convert glog.Info.*, err into glog.Error*
Adds some util interfaces to logging and calls them from each cmd, which
will set the default log output to write to glog. Pass glog-wrapped
Loggers to etcd for logging.
Log files will go to /tmp - we should probably follow this up with a
default log dir for each cmd.
The glog lib is sort of weak in that it only flushes every 30 seconds, so
we spin up our own flushing goroutine.
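A minimal sketch of such a flushing goroutine, assuming an arbitrary 5-second period:

```go
package util

import (
	"time"

	"github.com/golang/glog"
)

// InitLogs starts a goroutine that flushes glog more often than its
// default 30-second interval; the period here is illustrative.
func InitLogs() {
	go func() {
		for range time.Tick(5 * time.Second) {
			glog.Flush()
		}
	}()
}
```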