Files
kubernetes/federation
Kubernetes Submit Queue 15df7fedca Merge pull request #44626 from madhusudancs/fed-dns-paged-list
Automatic merge from submit-queue (batch tested with PRs 44626, 45641)

Update Google Cloud DNS provider Rrset.Get(name) method to return a list and change the `Rrset.List()` implementation to perform a paged walk

Some federated service e2e tests and a few ingress tests would become flaky after a few hundred runs. @csbell spent quite a lot of time debugging this and found out that this flakiness was due to a bug in the federated service controller deletion logic. Deletion of a federated service object triggers a logic in the controller to update the DNS records corresponding to that object. This DNS record update logic would return an error in failed runs which would in-turn cause the controller to reschedule the operation. This led to an infinite retry-failure cycle that never gave the API server a chance to garbage collect the deleted service object.

A couple of days ago we started seeing a correlation between the number of resource records in a DNS managed zone and these test failures. If you look at the test runs before and after run 2900 in the test grid - https://k8s-testgrid.appspot.com/cluster-federation#gce, you will notice that the grid became super green at 2900. That's when I deleted all the dangling DNS records from the past runs.

After some investigation yesterday, we found that `ResourceRecordSet.Get()` interface and its implementation, and `ResourceRecordSet.List()` implementation at least for Google Cloud DNS were incorrect.

This PR makes minimal set of changes (read: least invasive) in Google Cloud DNS provider implementation to fix these problems:

1. Modifies DNS provider Rrset.Get(name) interface to return multiple records and updates federated service controller.

    There can be multiple DNS resource records for a given name. They can vary by type, ttl, rrdata and a number of various other parameters. It is incorrect to return a single resource record for a given name.

    This change updates the Get interface to return multiple records for a given name and uses this list in the federated service controller to perform DNS operations.

2. Update Google Cloud DNS List implementation to perform a paged walk of lists to aggregate all the DNS records.

    The current `List()` implementation just lists the DNS resorce records in a given managed zone once and retruns the list. It neither performs a paged walk nor does it consider the `page_token` in the returned response.

    This change walks all the pages and aggregates the records in the pages and returns the aggregated list. This is potentially dangerous as it can blow up memory if there are a huge number of records in the given managed zone. But this is the best we can do without changing the provider interface too much. 

    Next step is to define a new paged list interface and implement it.

**Release note**:
```release-note
NONE
```

/assign @csbell 

cc @justinsb @shashidharatd @quinton-hoole @kubernetes/sig-federation-pr-reviews
2017-05-11 03:59:35 -07:00
..
2017-05-04 11:30:51 -07:00
2016-10-12 00:15:01 -07:00
2017-02-27 16:23:44 +01:00
2017-02-09 09:50:57 -08:00
2017-05-08 09:07:40 +08:00

Cluster Federation

Kubernetes Cluster Federation enables users to federate multiple Kubernetes clusters. Please see the user guide and the admin guide for more details about setting up and using the Cluster Federation.

Building Kubernetes Cluster Federation

Please see the Kubernetes Development Guide for initial setup. Once you have the development environment setup as explained in that guide, you also need to install jq

Building cluster federation artifacts should be as simple as running:

make build

You can specify the docker registry to tag the image using the KUBE_REGISTRY environment variable. Please make sure that you use the same value in all the subsequent commands.

To push the built docker images to the registry, run:

make push

To initialize the deployment run:

(This pulls the installer images)

make init

To deploy the clusters and install the federation components, edit the ${KUBE_ROOT}/_output/federation/config.json file to describe your clusters and run:

make deploy

To turn down the federation components and tear down the clusters run:

make destroy

Ideas for improvement

  1. Continue with destroy phase even in the face of errors.

    The bash script sets set -e errexit which causes the script to exit at the very first error. This should be the default mode for deploying components but not for destroying/cleanup.

Analytics