Commit Graph

5283 Commits

Author SHA1 Message Date
Madhusudan.C.S
edef3af34f Split federation-{up,down} from e2e-{up,down}. 2017-02-24 14:27:31 -08:00
Marco Ceppi
07ef43b630 Update owners file to reflect Juju/Charm knowledgable reviewers 2017-02-24 11:57:19 -05:00
Kubernetes Submit Queue
8e13ee01d6 Merge pull request #41908 from chuckbutler/remove-ivan-from-juju
Automatic merge from submit-queue

Remove ivan4th from reviewers

**What this PR does / why we need it**:

Per @ivan4th request in #41351 he would like to be removed from the
reviewers list in this directory tree. This commit addresses that
request.

**Special notes for your reviewer**:

As Ivan has already investigated the PR in question under 41351 I would like to see that driven to landing before landing this OWNERS file change, unless another reviewer would like to step in and help land that open PR.

**Release note**:

```release-note
NONE
```
2017-02-23 22:10:48 -08:00
Kubernetes Submit Queue
84b74074a4 Merge pull request #41674 from ixdy/etcd-empty-dir-cleanup-busybox
Automatic merge from submit-queue

Base etcd-empty-dir-cleanup on busybox, run as nobody, and update to etcdctl 3.0.14

**What this PR does / why we need it**: since the `etcd-empty-dir-cleanup` image just uses a simple shell script and `etcdctl`, we can base it on busybox, which is a smaller target than alpine.

I've also updated this to use an `etcdctl` from etcd 3.0.14, which matches the version of etcd we're running in 1.6 clusters (I believe), and changed the tag to match the `etcdctl` version.

Tested in my own e2e cluster, where it seems to work.

I haven't pushed the image yet, so e2e tests *may* fail. Tagging `do-not-merge`; if you think this looks good, I'll push the image and retest.

**Release note**:

```release-note
```

cc @timstclair @mml @wojtek-t
2017-02-23 21:25:56 -08:00
Kubernetes Submit Queue
e70d23db2a Merge pull request #41667 from mikedanese/certs
Automatic merge from submit-queue (batch tested with PRs 41667, 41820, 40910, 41645, 41361)

refactor certs in GCE to break up usages

TODO: debian
2017-02-23 20:57:27 -08:00
Kubernetes Submit Queue
b799bbf0a8 Merge pull request #38816 from deads2k/rbac-23-switch-kubedns-sa
Automatic merge from submit-queue

move kube-dns to a separate service account

Switches the kubedns addon to run as a separate service account so that we can subdivide RBAC permission for it.  The RBAC permissions will need a little more refinement which I'm expecting to find in https://github.com/kubernetes/kubernetes/pull/38626 .

@cjcullen @kubernetes/sig-auth since this is directly related to enabling RBAC with subdivided permissions
 @thockin @kubernetes/sig-network since this directly affects now kubedns is added.  


```release-note
`kube-dns` now runs using a separate `system:serviceaccount:kube-system:kube-dns` service account which is automatically bound to the correct RBAC permissions.
```
2017-02-23 12:06:13 -08:00
Mike Danese
192392bddd refactor certs in GCE 2017-02-23 10:12:31 -08:00
Kubernetes Submit Queue
bb5fdff58b Merge pull request #41567 from Crassirostris/fluentd-gcp-monitoring
Automatic merge from submit-queue (batch tested with PRs 39855, 41433, 41567, 41887, 41652)

Add fluentd monitoring to fluentd-gcp image

Right now we are not able to monitor the state of fluentd in cluster, which may result in logging subsystem quietly failing. This PR tries to address that problem by introducing the fluentd container monitoring:

* fluentd internal metrics, like number of buffers and number of data in buffers
* `logging_line_count`, number of lines, read by fluentd from application containers' logs
    * Has `tag` label, corresponding to the fluentd tag of the entry
* `logging_entry_count`, number of entries, emitted to the output plugin
    * With label `component` set to `container`, generated by application containers
    * With label `component` set to `system`, generated by system components like kubelet, docker, scheduler, etc.
    * Has `tag` label, corresponding to the fluentd tag of the entry

CC @fabxc @igorpeshansky @edsiper
2017-02-23 09:36:33 -08:00
Wojciech Tyczynski
b70e392161 Update clusters to use 3.0.17 etcd 2017-02-23 10:08:50 +01:00
Wojciech Tyczynski
a7d2136ce1 Update etcd to 3.0.17 in integration tests 2017-02-23 10:08:50 +01:00
Kubernetes Submit Queue
a91cf1ed94 Merge pull request #41771 from cblecker/go-1.7.5
Automatic merge from submit-queue (batch tested with PRs 41812, 41665, 40007, 41281, 41771)

Bump golang versions to 1.7.5

**What this PR does / why we need it**: While #41636 might not make it in until 1.7, this would bump current golang versions from 1.7.4 to 1.7.5 to integrate the fixes from that patch version. This would include, among other things, a fix to ensure cross-built binaries for darwin don't have certificate validation errors (golang/go#18688)

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: none

**Special notes for your reviewer**:

**Release note**:

```release-note
Upgrade golang versions to 1.7.5
```
2017-02-23 00:11:41 -08:00
Kubernetes Submit Queue
8fc311c96c Merge pull request #41807 from shyamjvs/remove-fart-metrics
Automatic merge from submit-queue (batch tested with PRs 41797, 41793, 41795, 41807, 41781)

Remove unnecessary metrics (http/process/go) from being exposed by etcd-version-monitor

Unregister metrics we do not want from the etcd version metrics handler.

cc @wojtek-t @piosz
2017-02-22 22:06:35 -08:00
Kubernetes Submit Queue
e64835683b Merge pull request #41795 from Crassirostris/fluentd-gcp-turn-supervisor-off
Automatic merge from submit-queue (batch tested with PRs 41797, 41793, 41795, 41807, 41781)

Turn fluentd supervisor off for fluentd-gcp

By default, turn fluentd supervisor off so that when fluentd process fails, for example due to OOM, container fails completely and it would be easy to detect.

CC @igorpeshansky @qingling128
2017-02-22 22:06:33 -08:00
Kubernetes Submit Queue
59f4c5911a Merge pull request #41819 from dchen1107/master
Automatic merge from submit-queue (batch tested with PRs 38957, 41819, 41851, 40667, 41373)

Bump GCI to gci-stable-56-9000-84-2

Changelogs since gci-beta-56-9000-80-0:

- Fixed google-accounts-daemon breaks on GCI when network is unavailable.
- Fixed iptables-restore performance regression.

cc/ @adityakali @Random-Liu @fabioy
2017-02-22 19:59:33 -08:00
Jeff Grafton
eeec939361 Don't fail if the grep fails to match any resources 2017-02-22 14:55:57 -08:00
Jeff Grafton
511bdc11ae Bump etcd-empty-dir-cleanup to 3.0.14.0 2017-02-22 13:22:04 -08:00
Jeff Grafton
1f3ba7f484 Base etcd-empty-dir-cleanup on busybox, run as nobody, and update to etcdctl 3.0.14 2017-02-22 13:22:03 -08:00
Charles Butler
3c5009d00a Remove ivan4th from reviewers
Per ivans request in #41351 he would like to be removed from the
reviewers list in this directory tree. This commit addresses that
request.
2017-02-22 12:06:00 -06:00
Kubernetes Submit Queue
44aa1679c9 Merge pull request #41657 from bowei/update-dns
Automatic merge from submit-queue (batch tested with PRs 41349, 41532, 41256, 41587, 41657)

Update dns

```release-note
NONE
```
2017-02-22 08:12:48 -08:00
Kubernetes Submit Queue
fe34705f8a Merge pull request #41587 from MrHohn/addon-manager-fix-hpa
Automatic merge from submit-queue (batch tested with PRs 41349, 41532, 41256, 41587, 41657)

Update kubectl in addon-manager to use HPA in autoscaling/v1

Addon-manager is broken since HPA objects were removed from extensions api group.

Came across the logs from [the latest addon-manager on Jenkins](https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gce/4290/artifacts/bootstrap-e2e-master/kube-addon-manager.log):
```
INFO: == Entering periodical apply loop at 2017-02-16T17:33:37+0000 ==
error: error pruning namespaced object extensions/v1beta1, Kind=HorizontalPodAutoscaler: the server could not find the requested resource
WRN: == Failed to execute /usr/local/bin/kubectl  apply --namespace=kube-system -f /etc/kubernetes/addons     --prune=true -l kubernetes.io/cluster-service=true --recursive >/dev/null at 2017-02-16T17:33:38+0000. 2 tries remaining. ==
error: error pruning namespaced object extensions/v1beta1, Kind=HorizontalPodAutoscaler: the server could not find the requested resource
WRN: == Failed to execute /usr/local/bin/kubectl  apply --namespace=kube-system -f /etc/kubernetes/addons     --prune=true -l kubernetes.io/cluster-service=true --recursive >/dev/null at 2017-02-16T17:33:46+0000. 1 tries remaining. ==
error: error pruning namespaced object extensions/v1beta1, Kind=HorizontalPodAutoscaler: the server could not find the requested resource
WRN: == Failed to execute /usr/local/bin/kubectl  apply --namespace=kube-system -f /etc/kubernetes/addons     --prune=true -l kubernetes.io/cluster-service=true --recursive >/dev/null at 2017-02-16T17:33:53+0000. 0 tries remaining. ==
WRN: == Kubernetes addon update completed with errors at 2017-02-16T17:33:58+0000 ==
```

And notice this commit (f66679a4e9) came in two weeks ago, which removed HorizontalPodAutoscaler from extensions/v1beta1.

Addon-manager is now partially functioning that it could successfully create and update addons, but will fail to prune objects, which means upgrade tests may mostly fail.

Pushed another version of addon-manager with kubectl v1.6.0-alpha.2 ([release 2 days ago](https://github.com/kubernetes/kubernetes/releases/tag/v1.6.0-alpha.2)) for fixing, including below images:
- gcr.io/google-containers/kube-addon-manager:v6.4-alpha.2
- gcr.io/google-containers/kube-addon-manager-amd64:v6.4-alpha.2
- gcr.io/google-containers/kube-addon-manager-arm:v6.4-alpha.2
- gcr.io/google-containers/kube-addon-manager-arm64:v6.4-alpha.2
- gcr.io/google-containers/kube-addon-manager-ppc64le:v6.4-alpha.2
- gcr.io/google-containers/kube-addon-manager-s390x:v6.4-alpha.2

@mikedanese 

cc @wojtek-t @shyamjvs
2017-02-22 08:12:46 -08:00
Kubernetes Submit Queue
b29bdee735 Merge pull request #41256 from mbruzek/mbruzek-juju-lint-fixes
Automatic merge from submit-queue (batch tested with PRs 41349, 41532, 41256, 41587, 41657)

Lint fixes for the master and worker Python code.

**What this PR does / why we need it**: lint fixes for the python code.

**Which issue this PR fixes** none

**Special notes for your reviewer**: This is lint fixes for the Juju python code.

**Release note**:

```release-note
NONE
```

Please consider these changes so we can pass flake8 lint tests in our build process.
2017-02-22 08:12:43 -08:00
Shyam Jeedigunta
d5a28b3618 Remove unnecessary metrics (http/process/go) from being exposed by etcd-version-monitor 2017-02-22 13:11:00 +01:00
Christoph Blecker
c3de31c8d0 Bump golang versions to 1.7.5 2017-02-21 13:02:16 -08:00
Madhusudan.C.S
2cb2200847 Move kube-dns ConfigMap creation/deletion out of federated services e2e tests to federation-up.sh/federation-down.sh where the clusters are joined/unjoined. 2017-02-21 10:27:31 -08:00
Shyam JVS
746cc5d284 Merge pull request #41800 from shyamjvs/fix-hollow-node-logging
Whitelist kubemark in node_ssh_supported_providers for log dump
2017-02-21 19:13:08 +01:00
Dawn Chen
3d510461a3 Bump GCI to gci-stable-56-9000-84-2 2017-02-21 10:03:14 -08:00
Kubernetes Submit Queue
409d7d0a91 Merge pull request #41326 from ncdc/ci-cache-mutation
Automatic merge from submit-queue (batch tested with PRs 41364, 40317, 41326, 41783, 41782)

Add ability to enable cache mutation detector in GCE

Add the ability to enable the cache mutation detector in GCE. The current default behavior (disabled) is retained.

When paired with https://github.com/kubernetes/test-infra/pull/1901, we'll be able to detect shared informer cache mutations in gce e2e PR jobs.
2017-02-21 07:45:42 -08:00
Shyam Jeedigunta
3bc6bf6b70 Whitelist kubemark in node_ssh_supported_providers for log dump 2017-02-21 14:02:17 +01:00
Mik Vyatskov
5d59d4d27b Turn fluentd supervisor off for fluentd-gcp 2017-02-21 13:50:47 +01:00
Kubernetes Submit Queue
70c9eebd21 Merge pull request #41739 from shyamjvs/hollow-node-logs
Automatic merge from submit-queue (batch tested with PRs 41706, 39063, 41330, 41739, 41576)

[Kubemark] Add option to log hollow-node logs

Ref https://github.com/kubernetes/kubernetes/issues/41613

Added an option to log kubemark hollow-node logs which includes kubelet, kubeproxy and npd logs for each hollow-node.
Setting the env var `ENABLE_HOLLOW_NODE_LOGS=true` should now enable logging for tests.

cc @kubernetes/sig-scalability-misc @wojtek-t @gmarek @yujuhong @Random-Liu
2017-02-21 02:24:43 -08:00
Zihong Zheng
2c8e89820a Update kubectl in addon-manager to use HPA in autoscaling/v1 instead of extensions/v1beta1 2017-02-20 10:49:10 -08:00
deads2k
36b586d5d7 move kube-dns to a separate service account 2017-02-20 07:35:08 -05:00
Shyam Jeedigunta
ed0ab3cd8e [Kubemark] Add option to log hollow-node logs 2017-02-20 11:52:49 +01:00
Kubernetes Submit Queue
ff12e5688c Merge pull request #40206 from Random-Liu/add-standalone-npd
Automatic merge from submit-queue

Add standalone npd on GCI.

This PR added standalone NPD in GCE GCI cluster. I already verified the PR, and it should work.

/cc @dchen1107 @fabioy @andyxning @kubernetes/sig-node-misc
2017-02-18 02:00:20 -08:00
Kubernetes Submit Queue
4b3a097ecd Merge pull request #41525 from yujuhong/fix_output
Automatic merge from submit-queue

Fix the output of health-mointor.sh

The script show prints the errors/response of the health check, but not
show the progress of `curl`.
2017-02-17 16:57:29 -08:00
Random-Liu
d40c0a7099 Add standalone npd on GCI. 2017-02-17 16:18:08 -08:00
Bowei Du
9f75db3c69 Update kube-dns image versions to the latest stable release 2017-02-17 11:12:25 -08:00
Kubernetes Submit Queue
6d5b2ef49e Merge pull request #41080 from shyamjvs/etcd-version-monitor
Automatic merge from submit-queue

Added a basic monitor for providing etcd version related info

Fixes #41071 

This tool scrapes metrics partly from etcd's /version and /metrics endpoints and partly using etcdctl and exposes them as prometheus metrics at `http://localhost:9101/metrics` endpoint on the master. Here is a summary of the metrics it exposes (self-explanatory from the code):
-        etcdVersionFetchCount   = prometheus.NewCounterVec(
                prometheus.CounterOpts{
                        Namespace: "etcd",
                        Name: "version_info_fetch_count",
                        Help: "Number of times etcd's version info was fetched, labeled by etcd's server binary and cluster version",
                },
                []string{"serverversion", "clusterversion"})
-         etcdGRPCRequestsTotal   = prometheus.NewCounterVec(
                prometheus.CounterOpts{
                        Namespace: namespace,
                        Name: "grpc_requests_total",
                        Help: "Counter of received grpc requests, labeled by grpc method and grpc service names",
                },
                []string{"grpc_method", "grpc_service"})

For further info on how to run this as a binary/docker-container/kubernetes-pod and checking the metrics, have a look at the README.md file.

cc @fgrzadkowski @wojtek-t @piosz
2017-02-17 10:18:48 -08:00
Kubernetes Submit Queue
46cd8ec91b Merge pull request #41637 from wojtek-t/expose_storage_format_as_env
Automatic merge from submit-queue

Expose storage media type as env variable

Ref #40636

@mml
2017-02-17 08:15:27 -08:00
Andy Goldstein
688c19ec71 Allow cache mutation detector enablement by PRs
Allow cache mutation detector enablement by PRs in an attempt to find
mutations before they're merged in to the code base. It's just for the
apiserver and controller-manager for now. If/when the other components
start using a SharedInformerFactory, we should set them up just like
this as well.
2017-02-17 10:03:13 -05:00
Kubernetes Submit Queue
3b14667afe Merge pull request #41604 from shyamjvs/kubemark-num-nodes
Automatic merge from submit-queue

Reduce default value of kubemark's NUM_NODES to 10

Changing the default value of kubemark's NUM_NODES from 100 to 10, as it would then be possible to start kubemark on gce clusters that have been started using kube-up that uses the default config of three n1-standard-2 nodes. I've already been asked by a couple of people about why kubemark is not starting on their cluster because of this. More people shouldn't be facing this issue in future.

cc @kubernetes/sig-scalability-misc @wojtek-t @gmarek
2017-02-17 06:49:21 -08:00
Wojciech Tyczynski
3695e85b34 Expose storage media type as env variable 2017-02-17 14:16:55 +01:00
Shyam Jeedigunta
7e6b8ac26b Added a basic monitor for watching etcd version and size related info 2017-02-17 12:52:54 +01:00
Shyam Jeedigunta
94d2ed5e34 Reduce default value of kubemark's NUM_NODES to 10 2017-02-16 23:35:39 +01:00
Matt Bruzek
3b29b6a9ef Lint fixes for the master and worker Python code. 2017-02-16 14:01:30 -06:00
Mik Vyatskov
8d2d91070a Add fluentd monitoring to fluentd-gcp image 2017-02-16 19:04:32 +01:00
Kubernetes Submit Queue
30e8953fad Merge pull request #41564 from Crassirostris/fluentd-gcp-plugin-version-bump
Automatic merge from submit-queue

Bump fluentd-gcp google_cloud plugin version

Bump the version of `fluent-plugin-google-cloud` in fluentd-gcp image, because it's broken for version `0.5.2`.

Recently, gem `google-api-client` was updated to version `0.10.0`. The new version broke `fluent-plugin-google-cloud` which doesn't specify the upper version of `google-api-client` gem. I'm bumping the version used in our image to allow future changes in this release to be run and tested.

This PR doesn't bump the version, since no effective changes has happened, leaving this for the next PR to do.

CC @igorpeshansky
2017-02-16 09:20:12 -08:00
Mik Vyatskov
e8de31623f Bump fluentd-gcp google_cloud plugin version 2017-02-16 16:49:16 +01:00
Kubernetes Submit Queue
627c6ce2b8 Merge pull request #41489 from Crassirostris/fluentd-add-toleration
Automatic merge from submit-queue (batch tested with PRs 40000, 41508, 41489)

Add toleration to fluentd daemonset to make it run on master

Because of https://github.com/kubernetes/kubernetes/pull/41172 fluentd pods stopped being allocated on master node.

This PR introduces toleration for master taint for fluentd.

CC @davidopp @janetkuo @kubernetes/sig-scheduling-bugs

Unfortunately, we don't have e2e tests to ensure that master logs are being ingested. This problem is a great signal to work on https://github.com/kubernetes/kubernetes/issues/41411
2017-02-16 01:52:08 -08:00
Kubernetes Submit Queue
5ff9a72ea0 Merge pull request #41508 from Crassirostris/fluentd-dns-problem-fix
Automatic merge from submit-queue (batch tested with PRs 40000, 41508, 41489)

Make fluentd use default dns instead of cluster dns to make it work o…

Fix https://github.com/kubernetes/kubernetes/issues/41415

Fluentd for Stackdriver requires external urls (e.g. `logging.googleapis.com`) to be available in order to work. If fluentd runs on master, it cannot access the service endpoint of cluster DNS. This change makes fluentd use default dns to fix this problem.

CC @thockin @bowei
2017-02-16 01:52:06 -08:00