Commit Graph

5079 Commits

Author SHA1 Message Date
Kubernetes Submit Queue
29e2d8be09 Merge pull request #40113 from maisem/cos
Automatic merge from submit-queue

Adding cos as an alias for gci.

**What this PR does / why we need it**: Adding COS as an alias for GCI.

cc: @adityakali @wonderfly
2017-01-18 18:40:43 -08:00
Kubernetes Submit Queue
0c61553cbc Merge pull request #40105 from sc68cal/bugs/40102
Automatic merge from submit-queue (batch tested with PRs 40105, 40095)

[OpenStack-Heat] Fix regex used to get object-store URL

**Release note**:

```release-note

Fixes a bug in the OpenStack-Heat kubernetes provider, in the handling of differences between the Identity v2 and Identity v3 APIs

```
2017-01-18 15:54:08 -08:00
Maisem Ali
52b6c9bb41 Adding cos as an alias for gci. 2017-01-18 15:14:25 -08:00
Kubernetes Submit Queue
b29d9cdbcf Merge pull request #39898 from ixdy/bazel-release-tars
Automatic merge from submit-queue

Build release tars using bazel

**What this PR does / why we need it**: builds equivalents of the various kubernetes release tarballs, solely using bazel.

For example, you can now do
```console
$ make bazel-release
$ hack/e2e.go -v -up -test -down
```

**Special notes for your reviewer**: this is currently dependent on 3b29803eb5, which I have yet to turn into a pull request, since I'm still trying to figure out if this is the best approach.

Basically, the issue comes up with the way we generate the various server docker image tarfiles and load them on nodes:
* we `md5sum` the binary being encapsulated (e.g. kube-proxy) and save that to `$binary.docker_tag` in the server tarball
* we then build the docker image and tag using that md5sum (e.g. `gcr.io/google_containers/kube-proxy:$MD5SUM`)
* we `docker save` this image, which embeds the full tag in the `$binary.tar` file.
* on cluster startup, we `docker load` these tarballs, which are loaded with the tag that we'd created at build time. the nodes then use the `$binary.docker_tag` file to find the right image.

With the current bazel `docker_build` rule, the tag isn't saved in the docker image tar, so the node is unable to find the image after `docker load`ing it.

My changes to the rule save the tag in the docker image tar, though I don't know if there are subtle issues with it. (Maybe we want to only tag when `--stamp` is given?)

Also, the docker images produced by bazel have the timestamp set to the unix epoch, which is not great for debugging. Might be another thing to change with a `--stamp`.

Long story short, we probably need to follow up with bazel folks on the best way to solve this problem.

**Release note**:

```release-note
NONE
```
2017-01-18 14:24:48 -08:00
Kubernetes Submit Queue
76d023ca90 Merge pull request #40094 from zmerlynn/cvm-v20170117
Automatic merge from submit-queue (batch tested with PRs 36467, 36528, 39568, 40094, 39042)

Bump GCE to container-vm-v20170117

Base image update only, no kubelet or Docker updates.

```release-note
Update GCE ContainerVM deployment to container-vm-v20170117 to pick up CVE fixes in base image.
```
2017-01-18 13:37:12 -08:00
Sean M. Collins
8ad7e1613a [OpenStack-Heat] Fix regex used to get object-store URL
"publicURL" is used for endpoints in the Identity v2 API, while in the
Identity v3 API it has been changed to just "public"

Fixes #40102
2017-01-18 16:29:41 -05:00
Zach Loafman
a0b8fd618f Bump GCE to container-vm-v20170117
Base image update only, no kubelet or Docker updates.
2017-01-18 10:50:17 -08:00
Kubernetes Submit Queue
6dfe5c49f6 Merge pull request #38865 from vwfs/ext4_no_lazy_init
Automatic merge from submit-queue

Enable lazy initialization of ext3/ext4 filesystems

**What this PR does / why we need it**: It enables lazy inode table and journal initialization in ext3 and ext4.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #30752, fixes #30240

**Release note**:
```release-note
Enable lazy inode table and journal initialization for ext3 and ext4
```

**Special notes for your reviewer**:
This PR removes the extended options to mkfs.ext3/mkfs.ext4, so that the defaults (enabled) for lazy initialization are used.

These extended options come from a script that was historically located at */usr/share/google/safe_format_and_mount* and later ported to GO so this dependency to the script could be removed. After some search, I found the original script here: https://github.com/GoogleCloudPlatform/compute-image-packages/blob/legacy/google-startup-scripts/usr/share/google/safe_format_and_mount

Checking the history of this script, I found the commit [Disable lazy init of inode table and journal.](4d7346f7f5). This one introduces the extended flags with this description:
```
Now that discard with guaranteed zeroing is supported by PD,
initializing them is really fast and prevents perf from being affected
when the filesystem is first mounted.
```

The problem is, that this is not true for all cloud providers and all disk types, e.g. Azure and AWS. I only tested with magnetic disks on Azure and AWS, so maybe it's different for SSDs on these cloud providers. The result is that this performance optimization dramatically increases the time needed to format a disk in such cases.

When mkfs.ext4 is told to not lazily initialize the inode tables and the check for guaranteed zeroing on discard fails, it falls back to a very naive implementation that simply loops and writes zeroed buffers to the disk. Performance on this highly depends on free memory and also uses up all this free memory for write caching, reducing performance of everything else in the system. 

As of https://github.com/kubernetes/kubernetes/issues/30752, there is also something inside kubelet that somehow degrades performance of all this. It's however not exactly known what it is but I'd assume it has something to do with cgroups throttling IO or memory. 

I checked the kernel code for lazy inode table initialization. The nice thing is, that the kernel also does the guaranteed zeroing on discard check. If it is guaranteed, the kernel uses discard for the lazy initialization, which should finish in a just few seconds. If it is not guaranteed, it falls back to using *bio*s, which does not require the use of the write cache. The result is, that free memory is not required and not touched, thus performance is maxed and the system does not suffer.

As the original reason for disabling lazy init was a performance optimization and the kernel already does this optimization by default (and in a much better way), I'd suggest to completely remove these flags and rely on the kernel to do it in the best way.
2017-01-18 09:09:52 -08:00
Kubernetes Submit Queue
16f45aee85 Merge pull request #39925 from appscode/kube-dns-1.11.0
Automatic merge from submit-queue

Use kube-dns:1.11.0

Use [kube-dns:1.11.0](https://github.com/kubernetes/dns/releases/tag/1.11.0)

With: kubernetes/dns#25
Fixes kubernetes/kubernetes#26752
Fixes kubernetes/kubernetes#33470

@bowei @thockin
2017-01-17 10:08:48 -08:00
Kubernetes Submit Queue
685e421b89 Merge pull request #40020 from wojtek-t/really_enable_etcd3
Automatic merge from submit-queue (batch tested with PRs 34763, 38706, 39939, 40020)

Really enable etcd3

Ref #39589

@timothysc @hongchaodeng
2017-01-17 09:14:52 -08:00
sadlil
e075e2e633 Use kube-dns:1.11.0 2017-01-17 08:37:24 -08:00
Wojciech Tyczynski
61f2201304 Really enable etcd3 2017-01-17 15:57:43 +01:00
Kubernetes Submit Queue
936a94f0a8 Merge pull request #40012 from Crassirostris/fluentd-liveness-probe-sync
Automatic merge from submit-queue (batch tested with PRs 39911, 40002, 39969, 40012, 40009)

Sync fluentd daemonset liveness probe with static pod liveness probe

Syncing change from https://github.com/kubernetes/kubernetes/pull/39949

Should also be cherry-picked
2017-01-17 06:46:58 -08:00
Kubernetes Submit Queue
002cdfa1ae Merge pull request #39861 from Traum-Ferienwohnungen/hostname_as_nodename
Automatic merge from submit-queue

Use $HOSTNAME as node.name by default

**What this PR does / why we need it**:
Allows to identify elasticsearch instances more easily.
As $HOSTNAME of a pod is unique, this should be no problem.
2017-01-17 04:57:09 -08:00
Mik Vyatskov
5b96233423 Sync fluentd daemonset liveness probe with static pod liveness probe 2017-01-17 13:29:54 +01:00
Janis Meybohm
6b3284acd2 Use $HOSTNAME as node.name by default
Allows to identify elasticsearch instances more easily.
As $HOSTNAME of a pod is unique, this should be no problem.
2017-01-17 08:38:53 +01:00
Kubernetes Submit Queue
06c610e276 Merge pull request #39949 from Crassirostris/fluentd-liveness-probe-fix
Automatic merge from submit-queue (batch tested with PRs 38592, 39949, 39946, 39882)

Remove fluentd buffers if fluentd is stuck

Fluentd now stores its buffers on disk for the resiliency. However, if buffer is corrupted, fluentd will be restarting forever.

Following change will make fluentd liveness probe delete buffers if fluentd is stuck for more than X minutes (15 by default).
2017-01-16 10:37:40 -08:00
Mik Vyatskov
edf1ffc074 Remove fluentd buffers if fluentd is stuck 2017-01-16 13:47:23 +01:00
Jeff Grafton
b9e060a630 Update scripts to look for binary artifacts in bazel-bin/ 2017-01-13 16:17:48 -08:00
Jeff Grafton
bc4b6ac397 Build release tarballs in bazel and add make bazel-release rule 2017-01-13 16:17:44 -08:00
Jordan Liggitt
d94bb26776
Conditionally write token file entries 2017-01-13 17:59:46 -05:00
Kubernetes Submit Queue
31483bf546 Merge pull request #39770 from ixdy/ubuntu-slim-base-image
Automatic merge from submit-queue

Update images that use ubuntu-slim base image to :0.6

**What this PR does / why we need it**: `ubuntu-slim:0.4` is somewhat old, being based on Ubuntu 16.04, whereas `ubuntu-slim:0.6` is based on Ubuntu 16.04.1.

**Special notes for your reviewer**: I haven't pushed any of these images yet, so I expect all of the e2e builds to fail. If we're happy with the changes, I can push the images and then re-trigger tests.

**Release note**:

```release-note
NONE
```

cc @aledbf as FYI
2017-01-12 20:39:13 -08:00
Kubernetes Submit Queue
ae04755d71 Merge pull request #39827 from MrHohn/addon-manager-v6.2
Automatic merge from submit-queue

Update kubectl to stable version for Addon Manager

Bumps up Addon Manager to v6.2, below images are pushed:
- gcr.io/google-containers/kube-addon-manager:v6.2
- gcr.io/google-containers/kube-addon-manager-amd64:v6.2
- gcr.io/google-containers/kube-addon-manager-arm:v6.2
- gcr.io/google-containers/kube-addon-manager-arm64:v6.2
- gcr.io/google-containers/kube-addon-manager-ppc64le:v6.2
- gcr.io/google-containers/kube-addon-manager-s390x:v6.2

@mikedanese 

cc @ixdy
2017-01-12 15:54:24 -08:00
Kubernetes Submit Queue
d50c027d0c Merge pull request #39537 from liggitt/legacy-policy
Automatic merge from submit-queue (batch tested with PRs 39803, 39698, 39537, 39478)

include bootstrap admin in super-user group, ensure tokens file is correct on upgrades

Fixes https://github.com/kubernetes/kubernetes/issues/39532

Possible issues with cluster bring-up scripts:

- [x] known_tokens.csv and basic_auth.csv is not rewritten if the file already exists
  * new users (like the controller manager) are not available on upgrade
  * changed users (like the kubelet username change) are not reflected
  * group additions (like the addition of admin to the superuser group) don't take effect on upgrade
  * this PR updates the token and basicauth files line-by-line to preserve user additions, but also ensure new data is persisted
- [x] existing 1.5 clusters may depend on more permissive ABAC permissions (or customized ABAC policies). This PR adds an option to enable existing ABAC policy files for clusters that are upgrading

Follow-ups:
- [ ] both scripts are loading e2e role-bindings, which only be loaded in e2e tests, not in normal kube-up scenarios
- [ ] when upgrading, set the option to use existing ABAC policy files
- [ ] update bootstrap superuser client certs to add superuser group? ("We also have a certificate that "used to be" a super-user. On GCE, it has CN "kubecfg", on GKE it's "client"")
- [ ] define (but do not load by default) a relaxed set of RBAC roles/rolebindings matching legacy ABAC, and document how to load that for new clusters that do not want to isolate user permissions
2017-01-12 15:06:31 -08:00
Zihong Zheng
f62be637c8 Update kubectl to stable version for Addon Manager 2017-01-12 13:49:13 -08:00
Aleksandra Malinowska
043e809b8f update heapster version to 1.3.0-beta.0 2017-01-12 13:42:31 +01:00
Jeff Grafton
1c2ea28080 Update images that use ubuntu-slim base image to :0.6 2017-01-11 15:07:04 -08:00
Jordan Liggitt
968b0b30cf
Update token users if needed 2017-01-11 17:21:12 -05:00
Jordan Liggitt
21b422fccc
Allow enabling ABAC authz 2017-01-11 17:20:51 -05:00
Jordan Liggitt
1fe517e96a
Include admin in super-user group 2017-01-11 17:20:42 -05:00
Kubernetes Submit Queue
12e8271cd3 Merge pull request #33584 from marketlogicsoftware/kayrus/enable_elk_k8s_metadata
Automatic merge from submit-queue

Enable kubernetes_metadata by default for ELK stack

Looks like it was accidentally removed and was not restored back in this PR https://github.com/kubernetes/kubernetes/pull/29883
Because actually this plugin still exists in the image, but new ELK deployment don't allow you to index namespaces, pod names, etc.
2017-01-11 12:19:42 -08:00
Kubernetes Submit Queue
04326905b8 Merge pull request #39721 from euank/rkt-api-restart
Automatic merge from submit-queue (batch tested with PRs 39731, 39662, 39721)

container-linux: restart rkt-api on failure

This works around a flake I saw which had the same root cause as
https://github.com/coreos/rkt/issues/3513.

This will potentially help reduce the impact of such future problems as
well.

```release-note
NONE
```
2017-01-11 11:00:52 -08:00
Kubernetes Submit Queue
9814369ea1 Merge pull request #39662 from rf232/dashboard-v1.5.1
Automatic merge from submit-queue (batch tested with PRs 39731, 39662, 39721)

Update dashboard version to v1.5.1

**What this PR does / why we need it**:
Latest Dashboard developments, including a CSRF issue in the dashboard POST handlers

**Release note**:
```
Set Dashboard UI version to v1.5.1
```
2017-01-11 11:00:50 -08:00
kayrus
8435d19982 Enable kubernetes_metadata by default for ELK stack 2017-01-11 14:08:01 +01:00
Euan Kemp
eeef293ee2 container-linux: restart rkt-api on failure
This works around a flake I saw which had the same root cause as
https://github.com/coreos/rkt/issues/3513.

This will potentially help reduce the impact of such future problems as
well.
2017-01-11 00:25:14 -08:00
Kubernetes Submit Queue
ebc8e40694 Merge pull request #39691 from yujuhong/bump_timeout
Automatic merge from submit-queue (batch tested with PRs 39694, 39383, 39651, 39691, 39497)

Bump container-linux and gci timeout for docker health check

The command `docker ps` can take longer time to respond under heavy load or
when encountering some known issues. In these cases, the containers are running
fine, so aggressive health check could cause serious disruption. Bump the
timeout to 60s to be consistent with the debian-based containerVM.

This addresses #38588
2017-01-10 21:25:16 -08:00
Kubernetes Submit Queue
3f2a02cf98 Merge pull request #39383 from liggitt/bind-check
Automatic merge from submit-queue (batch tested with PRs 39694, 39383, 39651, 39691, 39497)

Allow rolebinding/clusterrolebinding with explicit bind permission check

Fixes https://github.com/kubernetes/kubernetes/issues/39176
Fixes https://github.com/kubernetes/kubernetes/issues/39258

Allows creating/updating a rolebinding/clusterrolebinding if the user has explicitly been granted permission to perform the "bind" verb against the referenced role/clusterrole (previously, they could only bind if they already had all the permissions in the referenced role via an RBAC role themselves)

```release-note
To create or update an RBAC RoleBinding or ClusterRoleBinding object, a user must:
1. Be authorized to make the create or update API request
2. Be allowed to bind the referenced role, either by already having all of the permissions contained in the referenced role, or by having the "bind" permission on the referenced role.
```
2017-01-10 21:25:13 -08:00
Kubernetes Submit Queue
addc6cae4a Merge pull request #38212 from mikedanese/kubeletauth
Automatic merge from submit-queue (batch tested with PRs 38212, 38792, 39641, 36390, 39005)

Generate a kubelet CA and kube-apiserver cert-pair for kubelet auth.

cc @cjcullen
2017-01-10 19:48:09 -08:00
Jeff Grafton
19aafd291c Always --pull in docker build to ensure recent base images 2017-01-10 16:21:05 -08:00
Yu-Ju Hong
4e87973a9b Bump container-linux and gci timeout for docker health check
The command `docker ps` can take longer time to respond under heavy load or
when encountering some known issues. In these cases, the containers are running
fine, so aggressive health check could cause serious disruption. Bump the
timeout to 60s to be consistent with the debian-based containerVM.
2017-01-10 13:07:21 -08:00
Kubernetes Submit Queue
d7ce8b80ee Merge pull request #39607 from deads2k/rbac-35-e2e-permission-typos
Automatic merge from submit-queue (batch tested with PRs 39628, 39551, 38746, 38352, 39607)

fix e2e kubelet binding

Fixes #39543

This limits scope of the kubelet.  It was an oversight before.  Hopefully we won't end up chasing permissions again.
2017-01-10 11:54:21 -08:00
Jordan Liggitt
6057a2ca76
Remove kubekins as cluster-admin 2017-01-10 14:34:33 -05:00
Piotr Szczesniak
da7b81c4d8 Added owners to monitoring and logging related directories 2017-01-10 12:14:10 +01:00
Rob Franken
59ef8a4739 update dashboard version to v1.5.1 2017-01-10 11:57:21 +01:00
deads2k
60daaa3cca fix e2e kubelet binding 2017-01-09 07:39:10 -05:00
Mik Vyatskov
57ec7b77fd Fix fluentd-gcp image config by avoiding processing its own logs 2017-01-09 10:05:33 +01:00
Bowei Du
75c29adbaa Update DNS readme to point to the new code repository 2017-01-06 13:08:59 -08:00
Bowei Du
b5c0fd5837 Update image references to the output of the kubernetes/dns project 2017-01-06 12:57:41 -08:00
Kubernetes Submit Queue
4881341f8c Merge pull request #39520 from shyamjvs/add-etcd-events-log
Automatic merge from submit-queue (batch tested with PRs 39318, 39520)

Added etcd-events to cluster logging

Fixes #38983 

@kubernetes/sig-scalability-misc @wojtek-t @gmarek
2017-01-06 12:22:09 -08:00
Shyam Jeedigunta
9bb636e9f8 Added etcd-events to cluster logging 2017-01-06 10:28:48 +01:00