698 Commits

Author SHA1 Message Date
Kubernetes Submit Queue
930b3939f1
Merge pull request #64294 from vishh/shutdown-script
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Adding a shutdown script that would enable handling preemptible VM terminations gracefully in GCP environment

This PR adds a shutdown script to COS nodes in GCP k8s clusters that makes preemptible nodes sleep for as long as they can between the time they receive an ACPI shutdown request and the time they are terminated.
https://cloud.google.com/compute/docs/instances/preemptible#preemption_process

This will then allow for catching termination signals via GCE metadata APIs and gracefully evicting pods in k8s.

xref https://github.com/kubernetes/release/pull/560/
2018-05-25 22:33:33 -07:00
Vishnu kannan
9475292cd8 Adding a shutdown script that would enable handling preemptible VM terminations gracefully in GCP environment
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2018-05-25 16:20:24 -07:00
Kubernetes Submit Queue
972a74e238
Merge pull request #63755 from tomoe/dumpstack-docker
Automatic merge from submit-queue (batch tested with PRs 63434, 64172, 63975, 64180, 63755). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Dump Stack when docker fails on healthcheck

Save a stack dump of the docker daemon in order to be able to
investigate why the docker daemon was unresponsive to `docker ps`

See https://github.com/moby/moby/blob/master/daemon/daemon.go on
how docker sets up a trap for SIGUSR1 with `setupDumpStackTrap()`
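
As an illustration of the mechanism only, here is a minimal Python sketch of a SIGUSR1 trap that records a stack dump when signalled. It is an analogy, not dockerd's Go implementation; `captured_dumps`, `dump_stack`, `install_dump_trap`, `request_dump` and `busy_work` are hypothetical names, and a real daemon would write the dump to a file rather than keep it in memory. It assumes a POSIX system and that it runs in the main thread.

```python
import os
import signal
import traceback

# Hypothetical in-memory store; a real daemon would write dumps to disk.
captured_dumps = []


def dump_stack(signum, frame):
    """Signal handler: record the stack of the code that was interrupted."""
    captured_dumps.append("".join(traceback.format_stack(frame)))


def install_dump_trap():
    """Install the SIGUSR1 trap (must be called from the main thread)."""
    signal.signal(signal.SIGUSR1, dump_stack)


def request_dump(pid):
    """What a health monitor would do: ask the process for a stack dump."""
    os.kill(pid, signal.SIGUSR1)


def busy_work():
    """Stand-in for the daemon's normal work; signals itself for the demo."""
    request_dump(os.getpid())
    return sum(range(1000))


install_dump_trap()
busy_work()
```

The monitor side only needs the daemon's PID and permission to signal it; the dump itself is produced by the signalled process.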

**What this PR does / why we need it**:

This allows us to investigate why docker daemon was unresponsive to "docker ps" command. 

**Special notes for your reviewer**:
Manually tested on Ubuntu and COS.

**Release note**:

```release-note
NONE
```
2018-05-24 12:18:25 -07:00
CJ Cullen
b3a31b28af re-reorder authorizers (RBAC before Webhook). 2018-05-22 16:48:39 -07:00
Tomoe Sugihara
da23396e22 Dump Stack when docker fails on healthcheck
Send SIGUSR1 to dockerd to save stack dump of docker daemon
in order to be able to investigate why docker daemon was
unresponsive to the health check done by `docker ps`.

See https://github.com/moby/moby/blob/master/daemon/daemon.go on
how docker sets up a trap for SIGUSR1 with `setupDumpStackTrap()`
2018-05-21 11:39:59 +09:00
Kubernetes Submit Queue
0d815fbc27
Merge pull request #64029 from loburm/truncate-flag
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add environment variable to control truncating backend.

```release-note
NONE
```
2018-05-19 05:17:00 -07:00
Kubernetes Submit Queue
bfca0d32a5
Merge pull request #63689 from awly/gce-fix-kubelet-ca-path
Automatic merge from submit-queue (batch tested with PRs 63969, 63902, 63689, 63973, 63978). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Reuse existing CA cert path for kubelet certs

**What this PR does / why we need it**: configure-helper.sh already knows the path to CA cert, re-use that to avoid typos.

**Special notes for your reviewer**:

**Release note**:
```release-note
NONE
```
2018-05-18 15:59:19 -07:00
Kubernetes Submit Queue
f105ae3e6d
Merge pull request #63918 from cezarygerard/sd-event-exporter
Automatic merge from submit-queue (batch tested with PRs 63569, 63918, 63980, 63295, 63989). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

New event exporter config with support for new stackdriver resources

New event exporter with support for both the new and old Stackdriver resource models.

This should also be cherry-picked to the release-1.10 branch, as all fluentd-gcp components support the new and old Stackdriver resource models.

```release-note
Update event-exporter to version v0.2.0, which supports old (gke_container/gce_instance) and new (k8s_container/k8s_node/k8s_pod) stackdriver resources.
```
2018-05-18 09:54:16 -07:00
Marian Lobur
c1d0004013 Add environment variable to control truncating backend. 2018-05-18 15:52:47 +02:00
Cezary Zawadka
d611aeac80 new event exporter config with support for new stackdriver resource types 2018-05-18 10:37:47 +02:00
Maciej Borsz
128d6d3498 Add a way to pass extra arguments to etcd. 2018-05-17 10:48:13 +02:00
Kubernetes Submit Queue
e392f5b08b
Merge pull request #63696 from grosskur/gce-advertise-addr
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

gce: Prefer MASTER_ADVERTISE_ADDRESS in apiserver setup

MASTER_ADVERTISE_ADDRESS is used to set the --advertise-address flag
for the apiserver. It's useful for running the apiserver behind a load
balancer.

However, if PROJECT_ID, TOKEN_URL, TOKEN_BODY, and NODE_NETWORK are
all set, the GCE VM's external IP address will be fetched and used
instead and MASTER_ADVERTISE_ADDRESS will be ignored.

Change this behavior so that MASTER_ADVERTISE_ADDRESS takes precedence
because it's more specific. We still fall back to using the VM's
external IP address if the other variables are set.

Also: Move the setting of --ssh-user and --ssh-keyfile (based on
PROXY_SSH_USER) to a top-level block because this is common to all
codepaths.
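
The precedence described above can be sketched as follows. This is a minimal Python illustration, not the shell code in configure-helper.sh; `choose_advertise_address` and the `fetch_external_ip` callback are hypothetical names, and the environment is passed in as a plain dict.

```python
def choose_advertise_address(env, fetch_external_ip):
    """Return the address to advertise, or None if none can be determined.

    env is a dict of environment variables; fetch_external_ip is a
    hypothetical callback that looks up the VM's external IP address.
    """
    # MASTER_ADVERTISE_ADDRESS is more specific, so it takes precedence.
    explicit = env.get("MASTER_ADVERTISE_ADDRESS", "")
    if explicit:
        return explicit

    # Fall back to the VM's external IP only when all four variables are set.
    required = ("PROJECT_ID", "TOKEN_URL", "TOKEN_BODY", "NODE_NETWORK")
    if all(env.get(name) for name in required):
        return fetch_external_ip()

    return None
```

Passing the lookup as a callback keeps the sketch free of network calls, so the precedence logic can be checked in isolation.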

```release-note
NONE
```
2018-05-15 23:25:22 -07:00
Kubernetes Submit Queue
7b8bb6e7d3
Merge pull request #63357 from Random-Liu/install-and-use-crictl
Automatic merge from submit-queue (batch tested with PRs 63167, 63357). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Install and use crictl in gce kube-up.sh

Download and use crictl in gce kube-up.sh.

This PR:
1. Downloads crictl `v1.0.0-beta.0` onto the node, which supports CRI v1alpha2. We'll upgrade it to `v1.0.0-beta.1` soon after the release is cut.
2. Change `kube-docker-monitor` to `kube-container-runtime-monitor`, and let it use `crictl` to do health monitoring.
3. Change `e2e-image-puller` to use `crictl`. Because of https://github.com/kubernetes/kubernetes/issues/63355, it doesn't work now. But in `crictl v1.0.0-beta.1`, we are going to statically link it, and the `e2e-image-puller` should work again.
4. Use `systemctl kill --kill-who=main` instead of `pkill`, because:
  a. `pkill docker` sends `SIGTERM` to all matching processes, including `dockerd`, `docker-containerd`, and `docker-containerd-shim`. This is not a problem for Docker 17.03 CE, because `containerd-shim` in containerd 0.2.x doesn't exit on SIGTERM (see [code](https://github.com/containerd/containerd/blob/v0.2.x/containerd-shim/main.go#L123)). However, `containerd-shim` in containerd 1.0+ does exit on SIGTERM (see [code](https://github.com/containerd/containerd/blob/master/cmd/containerd-shim/main_unix.go#L200)). This means that `pkill docker` and `pkill containerd` will kill all shim processes for Docker 17.11+ and containerd 1.0+.
  b. We could use `pkill -x` instead. However, the docker systemd service name is `docker`, while the daemon process name is `dockerd`, so we would have to introduce another environment variable to specify the daemon process name. Given that, it seems easier to just use `systemctl kill`, which only requires the systemd service name. `systemctl kill --kill-who=main` makes sure that only the main process receives SIGTERM.
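
For illustration, the `systemctl kill` invocation from item 4 could be wrapped like this. This is a hedged Python sketch, not the monitor script in this PR; `build_kill_command` and `kill_main_process` are hypothetical helpers, and the explicit `--signal` flag is added here for clarity (SIGTERM is already the default for `systemctl kill`).

```python
import subprocess


def build_kill_command(service, signal_name="SIGTERM"):
    """Build a systemctl command that signals only the unit's main process."""
    return [
        "systemctl",
        "kill",
        "--kill-who=main",            # leave shims and other children alone
        f"--signal={signal_name}",
        service,                      # systemd service name, e.g. "docker"
    ]


def kill_main_process(service, run=subprocess.run):
    """Run the command; `run` is injectable so the call can be tested."""
    return run(build_kill_command(service), check=True)
```

Only the systemd service name is needed, so no separate "daemon process name" setting has to be introduced.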

Signed-off-by: Lantao Liu <lantaol@google.com>

/cc @filbranden @yujuhong @feiskyer @mrunalp @kubernetes/sig-node-pr-reviews @kubernetes/sig-cluster-lifecycle-pr-reviews 

**Release note**:

```release-note
Kubernetes clusters on GCE now have crictl installed. Users can use it to help debug their nodes. The crictl documentation can be found at https://github.com/kubernetes-incubator/cri-tools/blob/master/docs/crictl.md.
```
2018-05-15 21:18:12 -07:00
Alan Grosskurth
3541a93f92 gce: Prefer MASTER_ADVERTISE_ADDRESS in apiserver setup
MASTER_ADVERTISE_ADDRESS is used to set the --advertise-address flag
for the apiserver. It's useful for running the apiserver behind a load
balancer.

However, if PROJECT_ID, TOKEN_URL, TOKEN_BODY, and NODE_NETWORK are
all set, the GCE VM's external IP address will be fetched and used
instead and MASTER_ADVERTISE_ADDRESS will be ignored.

Change this behavior so that MASTER_ADVERTISE_ADDRESS takes precedence
because it's more specific. We still fall back to using the VM's
external IP address if the other variables are set.

Also: Pass --ssh-user and --ssh-keyfile flags if both PROXY_SSH_USER
and MASTER_ADVERTISE_ADDRESS are set.
2018-05-15 17:00:51 -07:00
Lantao Liu
f952b093a7 Still use docker ps for docker health monitoring.
Signed-off-by: Lantao Liu <lantaol@google.com>
2018-05-15 00:42:25 -07:00
Bowei Du
2e7807a249 Enable CUSTOM_INGRESS_YAML to replace the glbc manifest
This allows for customized versions of the Ingress YAML separate from
stock Kubernetes.
2018-05-14 23:24:55 -07:00
Kubernetes Submit Queue
b617748f7b
Merge pull request #62905 from serathius/event-exporter-region
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

[fluentd-gcp addon] Pass region in separate field

This PR makes location passed to event-exporter based on `MULTIZONE` env.

Fixes https://github.com/kubernetes/kubernetes/issues/62399
```release-note
NONE
```
/cc @loburm
2018-05-11 06:00:44 -07:00
Marek Siarkowicz
f351b00a99 [fluentd-gcp addon] Pass region in separate field 2018-05-11 09:50:07 +02:00
Andrew Lytvynov
1c94d0bd64 Reuse existing CA cert path for kubelet certs 2018-05-10 14:02:06 -07:00
Kubernetes Submit Queue
a743392937
Merge pull request #63353 from bmoyles0117/fix-stackdriver-metadata-agent-url-for-fluentd
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Use the logging agent's node name as the metadata agent URL.

The Stackdriver Logging agent should use the node's hostname when it constructs the Stackdriver Metadata Agent's URL. Currently it uses the GKE master's hostname, which is a bug.

**Release note:**
```release-note
[fluentd-gcp addon] Use the logging agent's node name as the metadata agent URL.
```
2018-05-08 16:20:43 -07:00
Kubernetes Submit Queue
940e716c06
Merge pull request #63323 from awly/gce-kubelet-ca
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

gce: plumb --kubelet-certificate-authority flag to apiserver

**What this PR does / why we need it**:
We want to start signing kubelets' serving certs with cluster CA. This
flag is required to enforce that on apiserver side.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:

**Special notes for your reviewer**:

**Release note**:
```release-note
NONE
```
2018-05-07 21:03:43 -07:00
Kubernetes Submit Queue
c59393e9fd
Merge pull request #63266 from awly/exec-plugin-kubeconfig
Automatic merge from submit-queue (batch tested with PRs 63340, 63266). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

gcp: allow non-bootstrap kubeconfig

**What this PR does / why we need it**:
Needed for https://github.com/kubernetes/community/pull/2022
This change lets us generate a non-bootstrap kubeconfig with exec plugin for authn.
The plugin does TLS bootstrapping internally.

**Special notes for your reviewer**:
When no new env vars are set, the defaults behave the same as before this change.
`KUBELET_AUTH_TYPE` should never be `tls-auth` in practice, but leaving it there just in case.

**Release note**:
```release-note
NONE
```
2018-05-07 15:16:14 -07:00
Lantao Liu
d94a2b39d9 Install and use crictl in gce kube-up.sh
Signed-off-by: Lantao Liu <lantaol@google.com>
2018-05-03 17:17:55 -07:00
Andrew Lytvynov
77c13d6dc7 Allow fetching bootstrap-kubeconfig from VM metadata 2018-05-03 11:32:18 -07:00
Kubernetes Submit Queue
b5f61ac129
Merge pull request #62657 from matthyx/master
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Update all script shebangs to use /usr/bin/env interpreter instead of /bin/interpreter

This is required to support systems where bash doesn't reside in /bin (such as NixOS, or the *BSD family) and allow users to specify a different interpreter version through $PATH manipulation.
https://www.cyberciti.biz/tips/finding-bash-perl-python-portably-using-env.html
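
A shebang rewrite of this kind can be sketched as below. This is a minimal Python illustration under stated assumptions, not the tooling used for this PR; `portable_shebang` is a hypothetical helper. It deliberately leaves shebangs with interpreter arguments untouched, since `env` on Linux would receive the interpreter and its arguments as a single argument.

```python
import re

# Matches "#!/bin/<interp>" or "#!/usr/bin/<interp>" with no arguments.
_SHEBANG = re.compile(r"^#!\s*(?:/usr)?/bin/([A-Za-z0-9_.+-]+)\s*$")


def portable_shebang(first_line):
    """Rewrite an absolute-path shebang to use /usr/bin/env.

    Lines that are not simple shebangs (including ones that already
    use env or that pass interpreter arguments) are returned unchanged.
    """
    match = _SHEBANG.match(first_line)
    if not match:
        return first_line
    interpreter = match.group(1)
    if interpreter == "env":
        return first_line
    return f"#!/usr/bin/env {interpreter}"
```

For example, `portable_shebang("#!/bin/bash")` returns `#!/usr/bin/env bash`, which resolves `bash` through `$PATH` at run time.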
```release-note
Use /usr/bin/env in all script shebangs to increase portability.
```
2018-05-02 19:44:32 -07:00
Bryan Moyles
a0a7686e38 Use the logging agent's node name as the metadata agent URL. 2018-05-02 10:12:35 +02:00
Andrew Lytvynov
0a567f0990 gcp: allow non-bootstrap kubeconfig
The regular kubeconfig is fetched from metadata when
CREATE_BOOTSTRAP_KUBECONFIG==false.

We will experiment with an exec plugin that does TLS bootstrapping
internally: #61803
2018-05-01 10:40:32 -07:00
Kubernetes Submit Queue
dd1d5c74f2
Merge pull request #63152 from mikedanese/break
Automatic merge from submit-queue (batch tested with PRs 63152, 63253). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Revert "Revert "gce: move etcd dir cleanup to manifests""

This reverts commit 2d6b4d0fa0.

```release-note
NONE
```
2018-05-01 07:36:09 -07:00
Andrew Lytvynov
e86bdf5801 gce: plumb --kubelet-certificate-authority flag to apiserver
We want to start signing kubelets' serving certs with cluster CA. This
flag is required to enforce that on apiserver side.
2018-04-30 15:16:22 -07:00
Kubernetes Submit Queue
ded95bc9f1
Merge pull request #62863 from awly/kube-controller-manager-disable-controllers
Automatic merge from submit-queue (batch tested with PRs 62718, 62863). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

gcp: add env var to configure enabled controllers in controller-manager

```release-note
NONE
```
2018-04-27 20:16:05 -07:00
Mike Danese
6817494424 Revert "Revert "gce: move etcd dir cleanup to manifests""
This reverts commit 2d6b4d0fa0.
2018-04-25 08:57:02 -07:00
Shyam Jeedigunta
2d6b4d0fa0 Revert "gce: move etcd dir cleanup to manifests"
This reverts commit ae73bed1d0.
2018-04-25 12:54:12 +02:00
Kubernetes Submit Queue
5b0df3656e
Merge pull request #63000 from kawych/versions
Automatic merge from submit-queue (batch tested with PRs 62590, 62818, 63015, 62922, 63000). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Remove METADATA_AGENT_VERSION config option

**What this PR does / why we need it**:
Remove the METADATA_AGENT_VERSION configuration option to keep the Metadata Agent version consistent across Kubernetes deployments.

**Release note**:
```release-note
Remove METADATA_AGENT_VERSION configuration option.
```
2018-04-24 14:22:23 -07:00
Mike Danese
ae73bed1d0 gce: move etcd dir cleanup to manifests
We deploy it as a manifest, not an addon, so locate it with the other
master manifests.
2018-04-24 08:02:32 -07:00
Kubernetes Submit Queue
eea406c108
Merge pull request #62669 from immutableT/deploy_helper_test
Automatic merge from submit-queue (batch tested with PRs 63007, 62919, 62669, 62860). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add unit test for configure-helper.sh.

**What this PR does / why we need it**:
Add a framework for unit-testing configure-helper.sh.
configure-helper.sh plays a critical role in initializing clusters on both GCE and GKE. It is currently over 2K lines of code, yet it has no unit test coverage.
This PR proposes a framework/approach on how to provide test coverage for this component.
Notes: 
1. Changes to configure-helper.sh itself were necessary to enable sourcing of this script for the purposes of testing.
2. As a proof of concept, api_manifest_test.go covers the logic related to initializing the apiserver when integration with KMS is requested. The hope is that the same approach can be extended to the rest of the script.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2018-04-23 15:45:17 -07:00
immutablet
dc78d72f04 Add unit test for configure-helper. 2018-04-23 12:18:57 -07:00
Karol Wychowaniec
6fb42aea4a Remove METADATA_AGENT_VERSION config option 2018-04-23 12:15:48 +02:00
Kubernetes Submit Queue
77f5324223
Merge pull request #62409 from rajansandeep/corednsscaler
Automatic merge from submit-queue (batch tested with PRs 62409, 62856). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

DNS-Autoscaler support for CoreDNS

**What this PR does / why we need it**:
This PR provides the dns-horizontal autoscaler for CoreDNS in kube-up, enabling the tests to pass once CoreDNS is the default. 

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #61176 

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2018-04-23 01:35:07 -07:00
Andrew Lytvynov
2666d73336 gcp: add env var to configure enabled controllers in controller-manager 2018-04-19 10:15:17 -07:00
Matthias Bertschy
9b15af19b2 Update all scripts to use /usr/bin/env bash in shebang 2018-04-19 13:20:13 +02:00
Kubernetes Submit Queue
bb8f58b6e6
Merge pull request #62195 from serathius/prometheus
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add prometheus cluster monitoring addon.

This PR adds a new cluster monitoring addon based on prometheus.
It adds a prometheus deployment with e2e tests.
Additional components will be added iteratively in the future.
The manifests are based on the current Helm chart.
In its current state it is not intended for production use.

cc @piosz @kawych @miekg
```release-note
Add prometheus cluster monitoring addon to kube-up
```
/sig instrumentation
/kind feature
/priority important-soon
2018-04-18 02:17:48 -07:00
Lantao Liu
0ee734d49e Fix NPD preload. 2018-04-17 18:43:47 +00:00
wojtekt
1bcdfdbe00 Increase max requests inflight limits in gce for very large clusters 2018-04-16 20:46:41 +02:00
Michael Taufen
420edc7b50 provision Kubelet config file for GCE
This PR extends the client-side startup scripts to provision a Kubelet
config file instead of legacy flags. This PR also extends the
master/node init scripts to install this config file from the GCE
metadata server, and provide the --config argument to the Kubelet.
2018-04-13 13:08:38 -07:00
Marek Siarkowicz
113987e0db Add prometheus addon 2018-04-13 11:12:08 +02:00
Kubernetes Submit Queue
1d905bbdfc
Merge pull request #61862 from immutableT/kms-plugin-deploy-cherry-pick
Automatic merge from submit-queue (batch tested with PRs 59636, 62429, 61862). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Inject CloudKMS Plugin container into Kube-APIServer pod.

**What this PR does / why we need it**:
Inject CloudKMS Plugin container into Kube-APIServer pod when etcd level encryption via CloudKMS Plugin is requested.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE

```
2018-04-12 02:02:24 -07:00
Sandeep Rajan
8d5b9d3c36 autoscaler support for CoreDNS 2018-04-11 11:54:23 -04:00
immutablet
cbc428395c Enable CloudKMS Plugin deployment. 2018-04-10 09:47:32 -07:00
Filipe Brandenburger
af3dff7cc8 Fix umask to match the intended behavior.
Fixes #52999.
2018-04-09 16:30:38 -07:00
Kubernetes Submit Queue
4009cb3b8b
Merge pull request #62076 from qingling128/master
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add support to ingest log entries to Stackdriver against new "k8s_container" and "k8s_node" resources.

**What this PR does / why we need it**:

**Which issue(s) this PR fixes**
Fluentd 0.14 has some memory leak issues that caused the e2e tests to be flaky. Downgrading to v0.12.

**Special notes for your reviewer**:
We never released a version with Fluentd v0.14; we only upgraded to it very recently, so this downgrade is not visible to users.

**Release note**:
```release-note
Add support to ingest log entries to Stackdriver against new "k8s_container" and "k8s_node" resources.
```
2018-04-06 09:51:32 -07:00