Register the kubelet on the master node with an apiserver. This option is
separated from the apiserver running locally on the master node so that it
can be optionally enabled or disabled as needed.

Also, fix the healthchecking configuration for the master components, which
was previously only working by coincidence:

If a kubelet doesn't register with a master, it never determines its local
address, so it ends up constructing a URL like http://:8080/healthz for the
HTTP probe. This happens to work on the master because all of the pods use
host networking and explicitly bind to 127.0.0.1. Once the kubelet is
registered with the master and has determined the local node address, it
tries to healthcheck on an address where the pod isn't listening, and the
kubelet periodically restarts each master component when the liveness probe
fails.
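
To make the URL problem concrete, here is a minimal sketch of how an empty host ends up in the probe URL. The `probe_url` helper is hypothetical and exists only for illustration; it is not kubelet code, and the port and path are simply the ones quoted in the message above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not kubelet code): build an HTTP probe URL from its parts.
probe_url() {
  local host="$1" port="$2" path="$3"
  printf 'http://%s:%s%s\n' "${host}" "${port}" "${path}"
}

# Unregistered kubelet: the node address was never determined, so the host is empty.
probe_url "" 8080 /healthz            # prints http://:8080/healthz

# Probe pinned to loopback, the address the master components bind to.
probe_url "127.0.0.1" 8080 /healthz   # prints http://127.0.0.1:8080/healthz
```

The second call shows the kind of explicit host a probe would presumably need once the kubelet knows its node address, since the master pods listen only on 127.0.0.1.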
Robert Bailey
2015-08-04 11:14:46 -07:00
parent 1407aee8b0
commit 8df33bc1a7
17 changed files with 140 additions and 61 deletions

@@ -27,6 +27,10 @@ source "${KUBE_ROOT}/cluster/kube-util.sh"
 MINIONS_FILE=/tmp/minions-$$
 trap 'rm -rf "${MINIONS_FILE}"' EXIT
+EXPECTED_NUM_NODES="${NUM_MINIONS}"
+if [[ "${REGISTER_MASTER_KUBELET:-}" == "true" ]]; then
+  EXPECTED_NUM_NODES=$((EXPECTED_NUM_NODES+1))
+fi
 # Make several attempts to deal with slow cluster birth.
 attempt=0
 while true; do
@@ -38,20 +42,24 @@ while true; do
   # Echo the output, strip the first line, then gather 2 counts:
   # - Total number of nodes.
   # - Number of "ready" nodes.
-  "${KUBE_ROOT}/cluster/kubectl.sh" get nodes > "${MINIONS_FILE}" || true
+  #
+  # Suppress errors from kubectl output because during cluster bootstrapping
+  # for clusters where the master node is registered, the apiserver will become
+  # available and then get restarted as the kubelet configures the docker bridge.
+  "${KUBE_ROOT}/cluster/kubectl.sh" get nodes > "${MINIONS_FILE}" 2> /dev/null || true
   found=$(cat "${MINIONS_FILE}" | sed '1d' | grep -c .) || true
   ready=$(cat "${MINIONS_FILE}" | sed '1d' | awk '{print $NF}' | grep -c '^Ready') || true
-  if (( ${found} == "${NUM_MINIONS}" )) && (( ${ready} == "${NUM_MINIONS}")); then
+  if (( "${found}" == "${EXPECTED_NUM_NODES}" )) && (( "${ready}" == "${EXPECTED_NUM_NODES}")); then
     break
   else
     # Set the timeout to ~10minutes (40 x 15 second) to avoid timeouts for 100-node clusters.
     if (( attempt > 40 )); then
-      echo -e "${color_red}Detected ${ready} ready nodes, found ${found} nodes out of expected ${NUM_MINIONS}. Your cluster may not be working.${color_norm}"
+      echo -e "${color_red}Detected ${ready} ready nodes, found ${found} nodes out of expected ${EXPECTED_NUM_NODES}. Your cluster may not be working.${color_norm}"
       cat -n "${MINIONS_FILE}"
       exit 2
     else
-      echo -e "${color_yellow}Waiting for ${NUM_MINIONS} ready nodes. ${ready} ready nodes, ${found} registered. Retrying.${color_norm}"
+      echo -e "${color_yellow}Waiting for ${EXPECTED_NUM_NODES} ready nodes. ${ready} ready nodes, ${found} registered. Retrying.${color_norm}"
     fi
     attempt=$((attempt+1))
     sleep 15
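
The counting logic in this hunk can be tried in isolation. The following is a minimal sketch, not part of the commit: `expected_nodes` and `count_nodes` are hypothetical wrappers around the same expressions the script uses, and the sample node listing is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers mirroring the logic in the diff; not part of the commit.

# Expected node count: NUM_MINIONS, plus one if the master kubelet registers.
expected_nodes() {
  local expected="${NUM_MINIONS}"
  if [[ "${REGISTER_MASTER_KUBELET:-}" == "true" ]]; then
    expected=$((expected+1))
  fi
  echo "${expected}"
}

# Read a `kubectl get nodes`-style listing on stdin and print "<found> <ready>".
count_nodes() {
  local listing found ready
  listing=$(sed '1d')  # strip the header line
  found=$(printf '%s\n' "${listing}" | grep -c .) || true
  ready=$(printf '%s\n' "${listing}" | awk '{print $NF}' | grep -c '^Ready') || true
  echo "${found} ${ready}"
}

# Made-up listing: a registered master plus two minions, one of them not ready.
sample='NAME      LABELS    STATUS
master    <none>    Ready
node-1    <none>    Ready
node-2    <none>    NotReady'

NUM_MINIONS=2
REGISTER_MASTER_KUBELET=true
expected_nodes                             # prints 3
printf '%s\n' "${sample}" | count_nodes    # prints 3 2
```

With these inputs the loop would keep retrying: three nodes are registered, matching the expected count of three, but only two report Ready. The `'^Ready'` anchor is what keeps `NotReady` from being counted.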