Kubernetes By Component - Part 3

K8s Scheduler


This is part 3 in a series of posts related to Kubernetes. Previous posts were about kubelet, and about the Kubernetes API, etcd, and kubectl. Like those posts, this one is inspired by another 2015 post by Kamal Marhubi and is intended to update and expand on the topics with my own thoughts and learnings. Our focus will be exploring kube-scheduler.

Our Goal

In the previous post, we got a sense for how the Kubernetes API and kubectl make it easier to work with a Kubernetes cluster. Each component of a Kubernetes cluster is helpful on its own, but together they can feel a bit like magic if you don’t see the roles they play. Knowing those roles, you will be in a better position to reason about problems you see in your cluster.

As you will remember, kubelet runs your pods on a node via Docker. In the last article, we told the Kubernetes API, via kubectl, to run a specific pod on a specific node. This time we will let kube-scheduler decide which node to run the pod on.

Introducing Kubernetes Scheduler

The Kubernetes Scheduler makes the decision about which pods are allocated to which nodes. It doesn’t do this blindly: it knows the topology of the cluster, it knows what each pod requires to run, and it knows which other pods are already running on which nodes. You could do all of this by hand, but it would require a lot of awareness; that awareness and those best practices are baked into how kube-scheduler works, saving you the effort.
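
Nothing in this post requires us to configure any of that, but once the one-node cluster we build below is running, you can peek at some of the information the scheduler works from. For example, each node advertises its allocatable resources; expect to see cpu, memory, and a maximum pod count (the exact fields and values will differ on your VM):

$ ./kubectl describe nodes | grep -A 6 'Allocatable:'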

Getting Set Up

As we did in the previous parts of this series, we need to provision our work environment. Since we have done this a couple of times already, we are going to breeze through it.

As we did in part 1 of this series, we will use a Vagrant box running Ubuntu to house our work. This provides a consistent platform and also contains everything we do nicely.

Download and install Vagrant if you have not already. You may also need to install VirtualBox if you do not have it yet.

From there, open a terminal and provision your Vagrant box with the following commands:

$ vagrant init ubuntu/artful64
$ vagrant up
$ vagrant ssh
$ df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       2.2G  902M  1.3G  42% /

If you see the size is ~2.2G instead of ~9.7G, then we have a problem. As I mentioned in the previous post, this is a known issue with the ubuntu/artful64 Vagrant box that we are using. While it will be nice when the underlying bug is addressed (it hadn’t been as of 2018-03-11), we can work around the problem for now with the following:

$ sudo resize2fs /dev/sda1
resize2fs 1.43.5 (04-Aug-2017)
Filesystem at /dev/sda1 is mounted on /; on-line resizing required
old_desc_blocks = 1, new_desc_blocks = 2
The filesystem on /dev/sda1 is now 2621179 (4k) blocks long.

$ df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       9.7G  905M  8.8G  10% /

Now that we know we have enough space in this VM, we can continue setting up.

$ sudo apt-get update
$ sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    software-properties-common
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
$ sudo apt-key fingerprint 0EBFCD88
pub   rsa4096 2017-02-22 [SCEA]
      9DC8 5822 9FC7 DD38 854A  E2D8 8D81 803C 0EBF CD88
uid           [ unknown] Docker Release (CE deb) <docker@docker.com>
sub   rsa4096 2017-02-22 [S]

$ sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"
$ sudo apt-get update
$ sudo apt-get install docker-ce
$ docker --version
Docker version 17.12.1-ce, build 7390fc6
$ sudo docker run hello-world

Docker should have said, “Hi” and you should be ready for the next steps.

$ wget -q --show-progress --https-only --timestamping \
https://storage.googleapis.com/kubernetes-release/release/v1.9.2/bin/linux/amd64/kubelet
$ chmod +x kubelet
$ mkdir etcd-data
$ sudo docker run --volume=$PWD/etcd-data:/default.etcd \
--detach --net=host quay.io/coreos/etcd > etcd-container-id
$ wget https://storage.googleapis.com/kubernetes-release/release/v1.9.2/bin/linux/amd64/kube-apiserver
$ chmod +x kube-apiserver
$ wget https://storage.googleapis.com/kubernetes-release/release/v1.9.2/bin/linux/amd64/kubectl
$ chmod +x kubectl
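
Before moving on, it is worth a quick sanity check that the etcd container actually came up, since the API server is about to depend on it. etcd answers on its client port with a version string; the exact versions depend on the image tag you pulled:

$ curl -s http://127.0.0.1:2379/version
{"etcdserver":"3.x.x","etcdcluster":"3.x.x"}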

OK, at this point we have everything we need to fire up our one-node cluster. This is a significant departure, and a simplification, from Kamal’s post: he had three VMs (master, node1, and node2). For the purposes of this article, I’m hoping we can show kube-scheduler doing its thing without needing to spin up additional VMs, since I’m using a Vagrant box and I haven’t (yet) spent much time learning how to network multiple Vagrant boxes together.

Let’s keep this current terminal session open for interacting with Kubernetes and use new terminal sessions for running our various services.

Open a new terminal and fire up the Kubernetes API server.

$ vagrant ssh
$ sudo ./kube-apiserver \
--etcd-servers=http://127.0.0.1:2379 \
--service-cluster-ip-range=10.0.0.0/16
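
The API server will keep that terminal busy with log output. Back in the first terminal, you can confirm it is answering; in this version it still serves an insecure local port (8080 by default), so something like this should come back with ok:

$ curl http://localhost:8080/healthz
ok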

Open another new terminal and fire up kubelet.

$ vagrant ssh
$ wget https://raw.githubusercontent.com/joshuasheppard/k8s-by-component/master/part3/kubeconfig
$ sudo ./kubelet --kubeconfig=$PWD/kubeconfig
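
If you are curious what is inside that kubeconfig, it is just a small YAML file telling kubelet where to find the API server. I have not reproduced the repo’s file verbatim here, but it will be roughly along these lines (the cluster and context names are illustrative):

$ cat kubeconfig
apiVersion: v1
kind: Config
clusters:
- cluster:
    server: http://localhost:8080
  name: local
contexts:
- context:
    cluster: local
  name: local
current-context: local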

Back in our first terminal, let’s take a look at what we have running so far using kubectl.

$ ./kubectl get nodes
NAME            STATUS    ROLES     AGE       VERSION
ubuntu-artful   Ready     <none>    1m        v1.9.2
$ ./kubectl get pods
No resources found.

Deploying a Pod

Looking back again to part 2, we had to specify nodeName: ubuntu-artful (the name of our Vagrant VM) in order to get the pod deployed on our Kubernetes node. At the time, I mentioned that we didn’t yet have a kube-scheduler to handle that for us. Let’s try it again to confirm that a pod defined without a nodeName will not get spun up on the node.

$ wget https://raw.githubusercontent.com/joshuasheppard/k8s-by-component/master/part3/nginx.yaml
$ grep -i 'name' nginx.yaml
  name: nginx
  - name: nginx
      name: nginx-logs
  - name: log-truncator
      name: nginx-logs
  - name: nginx-logs

As you can see, the pod definition does not include a nodeName element. Now let’s ask kubectl to deploy it for us.

$ ./kubectl create --filename nginx.yaml
pod "nginx" created
$ ./kubectl get pods
NAME      READY     STATUS    RESTARTS   AGE
nginx     0/2       Pending   0          28s
$ ./kubectl describe pods/nginx | grep ^Node
Node:         <none>
Node-Selectors:  <none>

As we expected, our nginx pod stays in Pending and is not assigned to a node. It will remain that way until we manually assign a node, as we did in the previous post, or until we add kube-scheduler to the cluster.
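
You can also see that nothing is even attempting to place the pod. With no scheduler in the cluster there is nothing to emit scheduling events, so the pod’s event list should still be empty (the exact wording may vary):

$ ./kubectl describe pods/nginx | grep -A2 ^Events
Events:         <none>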

Deploying the Scheduler

At this point we have no fewer than three terminal sessions going on our Vagrant VM.

  • One for sudo ./kube-apiserver ...
  • Another for sudo ./kubelet ...
  • and one for our kubectl commands

To get kube-scheduler running, we will open yet another terminal:

$ vagrant ssh
$ wget https://storage.googleapis.com/kubernetes-release/release/v1.9.2/bin/linux/amd64/kube-scheduler
$ chmod +x kube-scheduler
$ sudo ./kube-scheduler --kubeconfig=$PWD/kubeconfig
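
Like the other components, kube-scheduler will sit in this terminal logging away. If you want a quick confirmation that it is healthy, this version exposes a local health endpoint (port 10251 by default, if memory serves):

$ curl http://localhost:10251/healthz
ok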

Now back in our terminal for kubectl commands, do the following:

$ ./kubectl get pods
NAME      READY     STATUS    RESTARTS   AGE
nginx     2/2       Running   0          13m
$ ./kubectl describe pods/nginx | grep ^Node
Node:         ubuntu-artful/10.0.2.15
Node-Selectors:  <none>

As you can see, by the time we got over there (assuming you weren’t really quick about it), kube-scheduler had already assigned our pod to our (only) node and kubelet had spun it up.

If we want to know more, we can look at the events related to the pod.

$ ./kubectl describe pods/nginx | grep -A5 ^Events
Events:
  Type     Reason                 Age               From                    Message
  ----     ------                 ----              ----                    -------
  Normal   Scheduled              4m                default-scheduler       Successfully assigned nginx to ubuntu-artful
  Normal   SuccessfulMountVolume  4m                kubelet, ubuntu-artful  MountVolume.SetUp succeeded for volume "nginx-logs"
  Normal   Pulling                4m                kubelet, ubuntu-artful  pulling image "nginx"

We see that kube-scheduler assigned the pod to ubuntu-artful, then kubelet on that node created the log volume and pulled the nginx image.
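
If you would like to see the next scheduling decision as it happens, you can leave a watch running in the kubectl terminal before creating the pod below; it will print a new line each time the pod changes state (press Ctrl-C to stop it):

$ ./kubectl get pods --watch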

Let’s add another pod just to see kube-scheduler pick it up and handle it immediately.

$ sed 's/^  name:.*/  name: nginx-the-second/' nginx.yaml > nginx2.yaml
$ ./kubectl create --filename nginx2.yaml
pod "nginx-the-second" created
$ ./kubectl get pods
NAME               READY     STATUS    RESTARTS   AGE
nginx              2/2       Running   0          23m
nginx-the-second   2/2       Running   0          20s

Wrapping Up

You could follow the cleanup steps from part 2, or you could leave everything running as a starting point for the next post, which will explore kube-controller-manager.
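
If you would rather tear everything down now, the gist of the cleanup (the details may differ slightly from part 2’s exact steps) is to delete the pods, stop kube-scheduler, kubelet, and kube-apiserver with Ctrl-C in their terminals, stop the etcd container, and then destroy the Vagrant box:

$ ./kubectl delete pods nginx nginx-the-second
$ sudo docker stop $(cat etcd-container-id)
$ exit
# back on your host machine
$ vagrant destroy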