Commit Graph

4756 Commits

Author SHA1 Message Date
Bowei Du
9478c4b01f Add dnsmasq-metrics to the standard DNS pod
- Enables prometheus metrics on kube-dns
- Explicitly set v=0 logging for now
2016-11-10 00:08:14 -08:00
Kubernetes Submit Queue
a330acddee Merge pull request #36358 from Crassirostris/use-new-fluentd-gcp-config
Automatic merge from submit-queue

Use new fluentd-gcp image version

In #35618 we used new version of fluentd agent, which includes new version of jeamalloc, allowing us to use it.

Additionally, we came up with a hacky way to encourage Ruby GC to be invoked more often by using RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR variable.

@piosz
2016-11-09 21:50:53 -08:00
Kubernetes Submit Queue
0f082c6663 Merge pull request #36280 from rkouj/better-mount-error
Automatic merge from submit-queue

Better messaging for missing volume binaries on host

**What this PR does / why we need it**:
When mount binaries are not present on a host, the error returned is a generic one.
This change is to check the mount binaries before the mount and return a user-friendly error message.

This change is specific to GCI and the flag is experimental now.

https://github.com/kubernetes/kubernetes/issues/36098

**Release note**:
Introduces a flag `check-node-capabilities-before-mount` which if set, enables a check (`CanMount()`) prior to mount operations to verify that the required components (binaries, etc.) to mount the volume are available on the underlying node. If the check is enabled and `CanMount()` returns an error, the mount operation fails. Implements the `CanMount()` check for NFS.















Sample output post change :


rkouj@rkouj0:~/go/src/k8s.io/kubernetes$ kubectl describe pods
Name:		sleepyrc-fzhyl
Namespace:	default
Node:		e2e-test-rkouj-minion-group-oxxa/10.240.0.3
Start Time:	Mon, 07 Nov 2016 21:28:36 -0800
Labels:		name=sleepy
Status:		Pending
IP:		
Controllers:	ReplicationController/sleepyrc
Containers:
  sleepycontainer1:
    Container ID:	
    Image:		gcr.io/google_containers/busybox
    Image ID:		
    Port:		
    Command:
      sleep
      6000
    QoS Tier:
      cpu:	Burstable
      memory:	BestEffort
    Requests:
      cpu:		100m
    State:		Waiting
      Reason:		ContainerCreating
    Ready:		False
    Restart Count:	0
    Environment Variables:
Conditions:
  Type		Status
  Initialized 	True 
  Ready 	False 
  PodScheduled 	True 
Volumes:
  data:
    Type:	NFS (an NFS mount that lasts the lifetime of a pod)
    Server:	127.0.0.1
    Path:	/export
    ReadOnly:	false
  default-token-d13tj:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	default-token-d13tj
Events:
  FirstSeen	LastSeen	Count	From						SubobjectPath	Type		Reason		Message
  ---------	--------	-----	----						-------------	--------	------		-------
  7s		7s		1	{default-scheduler }						Normal		Scheduled	Successfully assigned sleepyrc-fzhyl to e2e-test-rkouj-minion-group-oxxa
  6s		3s		4	{kubelet e2e-test-rkouj-minion-group-oxxa}			Warning		FailedMount	Unable to mount volume kubernetes.io/nfs/32c7ef16-a574-11e6-813d-42010af00002-data (spec.Name: data) on pod sleepyrc-fzhyl (UID: 32c7ef16-a574-11e6-813d-42010af00002). Verify that your node machine has the required components before attempting to mount this volume type. Required binary /sbin/mount.nfs is missing
2016-11-09 18:51:00 -08:00
Kubernetes Submit Queue
de2bec7691 Merge pull request #36550 from yujuhong/kern_timestamps
Automatic merge from submit-queue

Get kernel logs with timestamps
2016-11-09 18:13:06 -08:00
Kubernetes Submit Queue
b392910bc7 Merge pull request #36505 from Crassirostris/kibana-image-fix
Automatic merge from submit-queue

Fix startup script bug in kibana image

Big thanks to @lhopki01 for noticing this!

As mention in discussion in https://github.com/kubernetes/kubernetes/pull/36103 current image crashes if we don't want to work behind proxy because of string interpolation in bash.

@piosz
2016-11-09 17:33:58 -08:00
Kubernetes Submit Queue
9922489abc Merge pull request #36384 from Crassirostris/fluentd-es-rescheduler-config
Automatic merge from submit-queue

Add rescheduler logs to the fluentd-elasticsearch configuration

Same as https://github.com/kubernetes/kubernetes/pull/36359 for elasticsearch plugin

@piosz
2016-11-09 17:33:50 -08:00
Yu-Ju Hong
fac2aeb416 Get kernel logs with timestamps
Without the timestamps, the log is not very useful.
2016-11-09 17:23:33 -08:00
Kubernetes Submit Queue
986839e9fb Merge pull request #35886 from MrHohn/addon-manager-token
Automatic merge from submit-queue

Fixes token_found bug in addon manager

From #35832.

Above PR exposed addon manager's logs on Jenkins, found below error on the gce e2e test artifacts:
```
Error from server: serviceaccounts "default" not found
error executing template "{{with index .secrets 0}}{{.name}}{{end}}": template: output:1:7: executing "output" at <index .secrets 0>: error calling index: index of untyped nil
== default service account in the kube-system namespace has token Error executing template: template: output:1:7: executing "output" at <index .secrets 0>: error calling index: index of untyped nil. Printing more information for debugging the template:
	template was:
		{{with index .secrets 0}}{{.name}}{{end}}
	raw data was:
		{"kind":"ServiceAccount","apiVersion":"v1","metadata":{"name":"default","namespace":"kube-system","selfLink":"/api/v1/namespaces/kube-system/serviceaccounts/default","uid":"de3f2f85-9d6a-11e6-9df3-42010af00002","resourceVersion":"48","creationTimestamp":"2016-10-29T00:01:40Z"}}
	object given to template engine was:
		map[apiVersion:v1 metadata:map[selfLink:/api/v1/namespaces/kube-system/serviceaccounts/default uid:de3f2f85-9d6a-11e6-9df3-42010af00002 resourceVersion:48 creationTimestamp:2016-10-29T00:01:40Z name:default namespace:kube-system] kind:ServiceAccount] ==
```

Seems like the script failed to retrieve service token at the first time and mistakenly used the error message as the token content. Fixes by replacing `|| true` with if condition.
2016-11-09 15:55:02 -08:00
Rajat Ramesh Koujalagi
d81e216fc6 Better messaging for missing volume components on host to perform mount 2016-11-09 15:16:11 -08:00
Kubernetes Submit Queue
916f526811 Merge pull request #36435 from wojtek-t/fix_max_inflight_requests
Automatic merge from submit-queue

Increase max-requests-inflight in large clusters

Fix #35402
2016-11-09 09:27:02 -08:00
Mik Vyatskov
94eeca8d2c Fixed startup script bug in kibana image 2016-11-09 16:35:34 +01:00
Kubernetes Submit Queue
54274807d9 Merge pull request #35832 from MrHohn/addon-manager-logs
Automatic merge from submit-queue

Expose addon manager's log by logging to file

Fixes #35823.

Use the same way as  how [`kube-proxy`](https://github.com/kubernetes/kubernetes/blob/master/cluster/saltbase/salt/kube-proxy/kube-proxy.manifest) deals with logging. We would be able to check Addon Manager's logs for Jenkins tests after this.

Would like to see the Jenkins test result to examine.

@mikedanese
2016-11-08 22:50:57 -08:00
Vishnu kannan
773ad9be29 Make gci mounter pre-fetch mounter image to reduce startup latency during runtime
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2016-11-08 12:13:49 -08:00
Jing Xu
d07396f7c7 Update configure.sh
Update the gci-mounter sha1 number
2016-11-08 12:13:49 -08:00
Vishnu kannan
77218d361b Use a local file for rkt stage1 and gci-mounter docker image.
Added a make rule `make upload` to audit and automate release artifact
uploads to GCS.

Signed-off-by: Vishnu kannan <vishnuk@google.com>
2016-11-08 11:09:13 -08:00
Vishnu kannan
dd8ec911f3 Revert "Revert "Merge pull request #35821 from vishh/gci-mounter-scope""
This reverts commit 402116aed4.
2016-11-08 11:09:10 -08:00
Mik Vyatskov
279e20ed13 Fix flunetd-gcp image Dockerfile 2016-11-08 15:14:09 +01:00
Wojciech Tyczynski
75d7d1ad37 Increase max-requests-inflight in large clusters 2016-11-08 14:41:58 +01:00
Kubernetes Submit Queue
e5fb8ac226 Merge pull request #36431 from mwielgus/ca-0.4.0-b1
Automatic merge from submit-queue

Switch cluster autoscaler to 0.4.0-beta1

Switch Kubernetes to new 0.4.0-beta1 Cluster Autoscaler. The release contains mainly bugfixes:
* unschedulable nodes don't stop cluster autoscaler
* better logging
* events for deltions
* bulk delete for empty nodes

cc: @fgrzadkowski @piosz @jszczepkowski
2016-11-08 03:47:21 -08:00
Marcin
b6ef1a132e Switch cluster autoscaler to 0.4.0-beta1 2016-11-08 11:45:42 +01:00
Kubernetes Submit Queue
ece94c317a Merge pull request #36077 from mtaufen/upgrade-log-os-and-k8s-ver
Automatic merge from submit-queue

Print osImage and kubeletVersion for nodes before and after GCE upgrade

This will print, e.g.:
```
== Pre-Upgrade Node OS and Kubelet Versions ==
name: "e2e-test-mtaufen-master", osImage: "Google Container-VM Image", kubeletVersion: "v1.4.5-beta.0.45+90d209221ec8dc-dirty"
name: "e2e-test-mtaufen-minion-group-jo79", osImage: "Debian GNU/Linux 7 (wheezy)", kubeletVersion: "v1.4.5-beta.0.45+90d209221ec8dc-dirty"
name: "e2e-test-mtaufen-minion-group-ox5l", osImage: "Debian GNU/Linux 7 (wheezy)", kubeletVersion: "v1.4.5-beta.0.45+90d209221ec8dc-dirty"
name: "e2e-test-mtaufen-minion-group-qvbq", osImage: "Debian GNU/Linux 7 (wheezy)", kubeletVersion: "v1.4.5-beta.0.45+90d209221ec8dc-dirty"
```

Let me know what output format you prefer and I'll see if I can make it work, I have the extent of flexibility allowed by jsonpath.
2016-11-08 02:18:44 -08:00
Kubernetes Submit Queue
a0c34eee35 Merge pull request #33239 from MrHohn/dns-autoscaler
Automatic merge from submit-queue

Deploy kube-dns with cluster-proportional-autoscaler

This PR integrates [cluster-proportional-autoscaler](https://github.com/kubernetes-incubator/cluster-proportional-autoscaler) with kube-dns for DNS horizontal autoscaling. 

Fixes #28648 and #27781.
2016-11-07 19:31:31 -08:00
Kubernetes Submit Queue
465c6b749c Merge pull request #36370 from Crassirostris/flunetd-gcp-image-fix
Automatic merge from submit-queue

Fix config file names inside fluentd-gcp image

Need this in order to merge https://github.com/kubernetes/kubernetes/pull/36358

Because on container-vm we need implicitly used configuration file

@piosz
2016-11-07 13:51:07 -08:00
Kubernetes Submit Queue
4ef95cd720 Merge pull request #36356 from jszczepkowski/exp-flag
Automatic merge from submit-queue

Removed EXPERIMENTAL from KUBE_REPLICATE_EXISTING_MASTER flag.
2016-11-07 12:45:31 -08:00
Mik Vyatskov
d478307106 Fix config file names inside fluentd-gcp image 2016-11-07 20:31:12 +01:00
Mik Vyatskov
800aafea9b Add rescheduler logs to the fluentd-elasticsearch configuration 2016-11-07 20:24:06 +01:00
Zihong Zheng
d961190e6f Deployed DNS horizontal autoscaler as an addon
DNS horizontal autoscaling feature is turned on by default on gce.
The corresponding env var is piped into almost all other cloud
providers.
2016-11-07 10:44:44 -08:00
Kubernetes Submit Queue
4b66d80e85 Merge pull request #36218 from wojtek-t/backup_before_migration
Automatic merge from submit-queue

Backup before migration

Do backup before etcd migration.

Ref #20504
2016-11-07 08:34:19 -08:00
Kubernetes Submit Queue
04a81cdd3e Merge pull request #36363 from Crassirostris/fluentd-gcp-image-build-fix
Automatic merge from submit-queue

Fix fluentd-gcp Dockerfile to reduce image size

Change reduces image size by 150MB.

@piosz
2016-11-07 07:56:11 -08:00
Mik Vyatskov
82457deb74 Use new fluentd-gcp image version 2016-11-07 15:52:47 +01:00
Mik Vyatskov
d3465e5b8c Add rescheduler logs to the fluentd-gcp configuration 2016-11-07 15:10:52 +01:00
Mik Vyatskov
220168c9aa Fix fluentd-gcp Dockerfile to reduce image size 2016-11-07 15:01:38 +01:00
Kubernetes Submit Queue
d2aabc8509 Merge pull request #35618 from Crassirostris/gcl-flunetd-config-update
Automatic merge from submit-queue

Update fluentd-gcp configuration

Related to #32762

Though it's not a final solution to the fluentd OOM problems, it increases number of logs that can be handled without losses by
- switching to the file buffering, making buffering mechanism more resilient
- decreasing size of the buffer, decreasing the amount of memory needed
- decreasing number of threads handling the load, since number of chunks is lower than previous number of threads

which results in decrease in theoretical throughput. Tests to confirm cases covered by this change will follow.

cc @piosz @edsiper @repeatedly please take look and confirm that all of these changed are meaningful.
2016-11-07 05:49:00 -08:00
Jerzy Szczepkowski
2ae5c701bd Removed EXPERIMENTAL from KUBE_REPLICATE_EXISTING_MASTER flag.
Removed EXPERIMENTAL from KUBE_REPLICATE_EXISTING_MASTER flag.
2016-11-07 12:47:04 +01:00
Wojciech Tyczynski
b34ac6baef Bump etcd to 3.0.14 in tests 2016-11-07 08:41:17 +01:00
Kubernetes Submit Queue
b75c3a45a1 Merge pull request #35776 from jimmycuadra/petset-rename-docs-examples
Automatic merge from submit-queue

Rename PetSet to StatefulSet in docs and examples.

**What this PR does / why we need it**: Addresses some of the pre-code-freeze changes for implementing the PetSet --> StatefulSet rename. (#35534)

**Special notes for your reviewer**: This PR only changes docs and examples, as #35731 hasn't been merged yet and I don't want to create merge conflicts. I'll open another PR for any remaining code changes needed after that PR is merged. /cc @erictune @janetkuo @chrislovecnm
2016-11-06 13:30:21 -08:00
Kubernetes Submit Queue
182a09c3c7 Merge pull request #35526 from justinsb/fix_35521_b
Automatic merge from submit-queue

kubelet bootstrap: start hostNetwork pods before we have PodCIDR

Network readiness was checked in the pod admission phase, but pods that
fail admission are not retried.  Move the check to the pod start phase.

Issue #35409 
Issue #35521
2016-11-06 12:53:14 -08:00
Zihong Zheng
168f6f7ecd Expose addon manager's log by logging in file 2016-11-06 12:18:18 -08:00
Kubernetes Submit Queue
b7512d9c8b Merge pull request #36240 from wojtek-t/quota_bytes_backend
Automatic merge from submit-queue

Increase quota-bytes for etcd in v3 mode

Ref #20504
2016-11-06 09:45:59 -08:00
Kubernetes Submit Queue
eeb5ef2705 Merge pull request #36226 from piosz/fluent-manifest
Automatic merge from submit-queue

Made fluentd-gcl config consitent for GCI and ContainerVM
2016-11-06 07:56:40 -08:00
Kubernetes Submit Queue
48ef0faa0e Merge pull request #35216 from edsiper/fluentd-jemalloc
Automatic merge from submit-queue

cluster-addons: enable Jemalloc for Fluentd based images

**What this PR does / why we need it**:

This Pull Request includes two patches that enable the recommended use of Jemalloc memory allocator for container images that are based in Fluentd. The patches applies to the following cluster-addons:
- fluentd-es-image
- fluentd-gcp-image

**Which issue this PR fixes** 

This PR is part of the solution for issues:
-  kubernetes/kubernetes/issues/32762
-  GoogleCloudPlatform/fluent-plugin-google-cloud/issues/87

When Fluentd runs in high load environments, it's likely the default operating system memory allocator will generate a high fragmentation ending up in a high memory usage. In order to reduce fragmentation and decrease memory usage an alternative memory allocator as Jemalloc is used. 

![](https://cloud.githubusercontent.com/assets/369718/19498577/eaa9f324-954e-11e6-9a6b-6b30310a66a3.png)

For the record: fluentd-es-image uses [td-agent](https://docs.treasuredata.com/articles/td-agent) Fluentd package maintained by Treasure Data, which contains Jemalloc 4.2.1 (latest stable version). The google-fluentd package used in fluentd-gcp-image comes with Jemalloc 2.2.5, which have many known issues, I strongly suggest google-fluentd package gets updated.

**Special notes for your reviewer**:

In the research of this topic have been involved @piosz and @Crassirostris.
2016-11-06 05:26:58 -08:00
Kubernetes Submit Queue
ff8e780c30 Merge pull request #36244 from Crassirostris/export-rescheduler-logs
Automatic merge from submit-queue

Add rescheduler.log to the logs exported from master

Related to https://github.com/kubernetes/kubernetes/issues/36227

@piosz
2016-11-06 03:38:35 -08:00
Kubernetes Submit Queue
afa99c68b8 Merge pull request #35144 from pipejakob/generate-token
Automatic merge from submit-queue

New command: "kubeadm token generate"

As part of #33930, this PR adds a new top-level command to kubeadm to just generate a token for use with the init/join commands. Otherwise, users are left to either figure out how to generate a token on their own, or let `kubeadm init` generate a token, capture and parse the output, and then use that token for `kubeadm join`.

At this point, I was hoping for feedback on the CLI experience, and then I can add tests. I spoke with @mikedanese and he didn't like the original propose of `kubeadm util generate-token`, so here are the runners up:

```
$ kubeadm generate-token          # <--- current implementation
$ kubeadm generate token          # in case kubeadm might generate other things in the future?
$ kubeadm init --generate-token   # possibly as a subcommand of an existing one
```

Currently, the output is simply the token on one line without any padding/formatting:

```
$ kubeadm generate-token
1087fd.722b60cdd39b1a5f
```

CC: @kubernetes/sig-cluster-lifecycle 

**Release note**:

<!--  Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access) 
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`. 
-->

``` release-note
New kubeadm command: generate-token
```
2016-11-05 16:12:52 -07:00
Jimmy Cuadra
d42eabd9d2 Rename PetSet to StatefulSet in docs and examples. 2016-11-05 00:17:28 -07:00
Jeff Grafton
7436b315c4 Use curl -f in cluster/get-kube.sh 2016-11-04 11:48:15 -07:00
Mik Vyatskov
62f0a171d1 Add rescheduler.log to the logs exported from master 2016-11-04 17:43:49 +01:00
Wojciech Tyczynski
ca99cbca02 Increase quota-bytes for etcd in v3 mode 2016-11-04 17:00:54 +01:00
Kubernetes Submit Queue
8363c55f9b Merge pull request #36228 from wojtek-t/storage_backend_changes
Automatic merge from submit-queue

Prepare for easy change to etcd3 storage backend

Ref #20504
2016-11-04 08:53:56 -07:00
Kubernetes Submit Queue
921245c828 Merge pull request #35081 from ixdy/cluster-gce-red-herrings
Automatic merge from submit-queue

Remove several red herring error messages in GCE cluster scripts

This fixes things like

```
I1018 15:57:53.524] Bringing down cluster
W1018 15:57:53.524] NODE_NAMES=
W1018 15:57:55.995] ERROR: (gcloud.compute.ssh) could not parse resource: []
W1018 15:57:56.392] ERROR: (gcloud.compute.ssh) could not parse resource: []
```

and

```
I1018 16:32:34.947] property "clusters.kubernetes-pr-cri-validation_cri-e2e-gce-agent-pr-25-0" unset.
I1018 16:32:35.079] property "users.kubernetes-pr-cri-validation_cri-e2e-gce-agent-pr-25-0" unset.
I1018 16:32:35.195] property "users.kubernetes-pr-cri-validation_cri-e2e-gce-agent-pr-25-0-basic-auth" unset.
I1018 16:32:35.307] property "contexts.kubernetes-pr-cri-validation_cri-e2e-gce-agent-pr-25-0" unset.
W1018 16:32:35.420] failed to get client config: Error in configuration: context was not found for specified context: kubernetes-pr-cri-validation_cri-e2e-gce-agent-pr-25-0
```

It seems like the `kubectl` behavior was introduced in #29236: if `current-context` is set to something invalid, it now complains.
2016-11-04 07:04:04 -07:00
Wojciech Tyczynski
3ca1f06149 Prepare for easy change to etcd3 storage backend 2016-11-04 13:46:01 +01:00