class: title, self-paced Kubernetes 201
Production tooling
.nav[*Self-paced version*] .debug[ ``` ``` These slides have been built from commit: 1ed7554 [shared/title.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/shared/title.md)] --- class: title, in-person Kubernetes 201
Production tooling
.footnote[ **Be kind to the WiFi!** (Network Name: OReillyCon19, Password: oscon2019)
*Don't use your hotspot.*
*Don't stream videos or download big files during the workshop[.](https://www.youtube.com/watch?v=h16zyxiwDLY)*
*Thank you!* **Slides: https://container.training/** ] .debug[[shared/title.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/shared/title.md)] --- ## Intros - Hello! We are: - .emoji[✨] Bridget ([@bridgetkromhout](https://twitter.com/bridgetkromhout)) - .emoji[☁️] Aaron ([@as_w](https://twitter.com/as_w)) - .emoji[🌟] Joe ([@joelaha](https://twitter.com/joelaha)) -- - We encourage networking at #oscon - Take a minute to introduce yourself to your neighbors - What company or organization are you from? Where are you based? - Share what you're hoping to learn in this session! .emoji[✨] .debug[[logistics-bridget.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/logistics-bridget.md)] --- ## Logistics - The tutorial will run from 1:30pm-5:00pm - There will be a break from 3:10pm-3:40pm - This means we start with 1hr 40min, then 30min break, then 1hr 20min. - Feel free to interrupt for questions at any time - *Especially when you see full screen container pictures!* - Live feedback, questions, help: [Gitter](https://gitter.im/k8s-workshops/oscon2019) .debug[[logistics-bridget.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/logistics-bridget.md)] --- ## A brief introduction - This was initially written by [Jérôme Petazzoni](https://twitter.com/jpetazzo) to support in-person, instructor-led workshops and tutorials - Credit is also due to [multiple contributors](https://github.com/jpetazzo/container.training/graphs/contributors) — thank you! - You can also follow along on your own, at your own pace - We included as much information as possible in these slides - We recommend having a mentor to help you ... - ... Or be comfortable spending some time reading the Kubernetes [documentation](https://kubernetes.io/docs/) ... - ... And looking for answers on [StackOverflow](http://stackoverflow.com/questions/tagged/kubernetes) and other outlets .debug[[k8s/intro.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/intro.md)] --- class: self-paced ## Hands on, you shall practice - Nobody ever became a Jedi by spending their lives reading Wookiepedia - Likewise, it will take more than merely *reading* these slides to make you an expert - These slides include *tons* of exercises and examples - They assume that you have access to a Kubernetes cluster - If you are attending a workshop or tutorial:
you will be given specific instructions to access your cluster - If you are doing this on your own:
the first chapter will give you various options to get your own cluster .debug[[k8s/intro.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/intro.md)] --- ## About these slides - All the content is available in a public GitHub repository: https://github.com/jpetazzo/container.training - You can get updated "builds" of the slides there: http://container.training/ -- - Typos? Mistakes? Questions? Feel free to hover over the bottom of the slide ... .footnote[.emoji[👇] Try it! The source file will be shown and you can view it on GitHub and fork and edit it.] .debug[[shared/about-slides.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/shared/about-slides.md)] --- class: extra-details ## Extra details - This slide has a little magnifying glass in the top left corner - This magnifying glass indicates slides that provide extra details - Feel free to skip them if: - you are in a hurry - you are new to this and want to avoid cognitive overload - you want only the most essential information - You can review these slides another time if you want, they'll be waiting for you ☺ .debug[[shared/about-slides.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/shared/about-slides.md)] --- name: toc-chapter-1 ## Chapter 1 - [Pre-requirements](#toc-pre-requirements) - [Controlling a Kubernetes cluster remotely](#toc-controlling-a-kubernetes-cluster-remotely) - [Kubernetes architecture](#toc-kubernetes-architecture) - [The Kubernetes API](#toc-the-kubernetes-api) - [Other control plane components](#toc-other-control-plane-components) .debug[(auto-generated TOC)] --- name: toc-chapter-2 ## Chapter 2 - [Healthchecks](#toc-healthchecks) - [Deploying a sample application](#toc-deploying-a-sample-application) - [Authentication and authorization](#toc-authentication-and-authorization) .debug[(auto-generated TOC)] --- name: toc-chapter-3 ## Chapter 3 - [Resource Limits](#toc-resource-limits) - [Defining min, max, and default resources](#toc-defining-min-max-and-default-resources) - [Namespace quotas](#toc-namespace-quotas) - [Limiting resources in practice](#toc-limiting-resources-in-practice) - [Checking pod and node resource usage](#toc-checking-pod-and-node-resource-usage) .debug[(auto-generated TOC)] --- name: toc-chapter-4 ## Chapter 4 - [Cluster sizing](#toc-cluster-sizing) - [The Horizontal Pod Autoscaler](#toc-the-horizontal-pod-autoscaler) - [Extending the Kubernetes API](#toc-extending-the-kubernetes-api) - [Managing stacks with Helm](#toc-managing-stacks-with-helm) .debug[(auto-generated TOC)] --- name: toc-chapter-5 ## Chapter 5 - [What's next?](#toc-whats-next) - [Links and resources](#toc-links-and-resources) - [Operators](#toc-operators) .debug[(auto-generated TOC)] .debug[[shared/toc.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/shared/toc.md)] --- class: pic .interstitial[![Image separating from the next chapter](https://gallant-turing-d0d520.netlify.com/containers/Container-Ship-Freighter-Navigation-Elbe-Romance-1782991.jpg)] --- name: toc-pre-requirements class: title Pre-requirements .nav[ [Previous section](#toc-) | [Back to table of contents](#toc-chapter-1) | [Next section](#toc-controlling-a-kubernetes-cluster-remotely) ] .debug[(automatically generated title slide)] --- # Pre-requirements - Kubernetes concepts (pods, deployments, services, labels, selectors) - Hands-on experience working with containers (building images, running them; doesn't matter how exactly) - Familiar with the UNIX 
command-line (navigating directories, editing files, using `kubectl`) .debug[[k8s/prereqs-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/prereqs-k8s201.md)] --- ## Labs and exercises - We are going to explore advanced k8s concepts - Everyone will get their own private environment - You are invited to reproduce all the demos (but you don't have to) - All hands-on sections are clearly identified, like the gray rectangle below .exercise[ - This is the stuff you're supposed to do! - Go to https://container.training/ to view these slides - Join the chat room: [Gitter](https://gitter.im/k8s-workshops/oscon2019) ] .debug[[k8s/prereqs-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/prereqs-k8s201.md)] --- ## Private environments - Each person gets their own Kubernetes cluster - Each person should have a printed card with connection information - We will connect to these clusters with `kubectl` - If you don't have `kubectl` installed, we'll explain how to install it shortly - You will also want to install [jq](https://stedolan.github.io/jq/) .debug[[k8s/prereqs-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/prereqs-k8s201.md)] --- ## Doing or re-doing this on your own? - We are using AKS with kubectl installed locally - You could use any managed k8s - You could also use any cloud VMs with Ubuntu LTS and Kubernetes [packages] or [binaries] installed [packages]: https://kubernetes.io/docs/setup/independent/install-kubeadm/#installing-kubeadm-kubelet-and-kubectl [binaries]: https://kubernetes.io/docs/setup/release/notes/#server-binaries .debug[[k8s/prereqs-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/prereqs-k8s201.md)] --- class: pic .interstitial[![Image separating from the next chapter](https://gallant-turing-d0d520.netlify.com/containers/ShippingContainerSFBay.jpg)] --- name: toc-controlling-a-kubernetes-cluster-remotely class: title Controlling a Kubernetes cluster remotely .nav[ [Previous section](#toc-pre-requirements) | [Back to table of contents](#toc-chapter-1) | [Next section](#toc-kubernetes-architecture) ] .debug[(automatically generated title slide)] --- # Controlling a Kubernetes cluster remotely - `kubectl` can be used either on cluster instances or outside the cluster - Since we're using [AKS](https://docs.microsoft.com/en-us/azure/aks/kubernetes-walkthrough), we'll be running `kubectl` outside the cluster - We can use Azure Cloud Shell - Or we can use `kubectl` from our local machine .debug[[k8s/localkubeconfig-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/localkubeconfig-k8s201.md)] --- ## Connecting to your AKS cluster via Azure Cloud Shell - open portal.azure.com in a browser - auth with the info on your card - click `[>_]` in the top menu bar to open cloud shell .exercise[ - get your cluster credentials: ```bash RESOURCE_GROUP=$(az group list | jq -r \ '[.[].name|select(. 
| startswith("Group-"))][0]') AKS_NAME=$(az aks list -g $RESOURCE_GROUP | jq -r '.[0].name') az aks get-credentials -g $RESOURCE_GROUP -n $AKS_NAME ``` ] - If you're going to use Cloud Shell, you can skip ahead .debug[[k8s/localkubeconfig-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/localkubeconfig-k8s201.md)] --- class: extra-details ## Preserving the existing `~/.kube/config` (optional) - If you already have a `~/.kube/config` file, rename it (we are going to overwrite it in the following slides!) - If you never used `kubectl` on your machine before: nothing to do! .exercise[ - Make a copy of `~/.kube/config`; if you are using macOS or Linux, you can do: ```bash cp ~/.kube/config ~/.kube/config.before.training ``` - If you are using Windows, you will need to adapt this command ] .debug[[k8s/localkubeconfig-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/localkubeconfig-k8s201.md)] --- ## Connecting to your AKS cluster via local tools .exercise[ - install the [az CLI](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest) - log in to azure: ```bash az login ``` - get your cluster credentials (requires jq): ```bash RESOURCE_GROUP=$(az group list | jq -r \ '[.[].name|select(. | startswith("Group-"))][0]') AKS_NAME=$(az aks list -g $RESOURCE_GROUP | jq -r '.[0].name') az aks get-credentials -g $RESOURCE_GROUP -n $AKS_NAME ``` - optionally, if you don't have kubectl: ```bash az aks install-cli ``` ] .debug[[k8s/localkubeconfig-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/localkubeconfig-k8s201.md)] --- class: extra-details ## Getting started with kubectl - `kubectl` is officially available on Linux, macOS, Windows (and unofficially anywhere we can build and run Go binaries) - You may want to try Azure cloud shell if you are following along from: - a tablet or phone - a web-based terminal - an environment where you can't install and run new binaries .debug[[k8s/localkubeconfig-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/localkubeconfig-k8s201.md)] --- class: extra-details ## Installing `kubectl` - If you already have `kubectl` on your local machine, you can skip this .exercise[ - Download the `kubectl` binary from one of these links: [Linux](https://storage.googleapis.com/kubernetes-release/release/v1.15.0/bin/linux/amd64/kubectl) | [macOS](https://storage.googleapis.com/kubernetes-release/release/v1.15.0/bin/darwin/amd64/kubectl) | [Windows](https://storage.googleapis.com/kubernetes-release/release/v1.15.0/bin/windows/amd64/kubectl.exe) - On Linux and macOS, make the binary executable with `chmod +x kubectl` (And remember to run it with `./kubectl` or move it to your `$PATH`) ] Note: if you are following along with a different platform (e.g. Linux on an architecture different from amd64, or with a phone or tablet), installing `kubectl` might be more complicated (or even impossible) so check with us about cloud shell. .debug[[k8s/localkubeconfig-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/localkubeconfig-k8s201.md)] --- ## Testing `kubectl` - Check that `kubectl` works correctly (before even trying to connect to a remote cluster!) 
.exercise[ - Ask `kubectl` to show its version number: ```bash kubectl version --client ``` ] The output should look like this: ``` Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.0", GitCommit:"e8462b5b5dc2584fdcd18e6bcfe9f1e4d970a529", GitTreeState:"clean", BuildDate:"2019-06-19T16:40:16Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"darwin/amd64"} ``` .debug[[k8s/localkubeconfig-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/localkubeconfig-k8s201.md)] --- ## Let's look at your cluster! .exercise[ - Scan for the `server:` address that matches the `name` of your new cluster ```bash kubectl config view ``` - Store the API endpoint you find: ```bash API_URL=$(kubectl config view -o json | jq -r ".clusters[] \ | select(.name == \"$AKS_NAME\") | .cluster.server") echo $API_URL ``` ] .debug[[k8s/localkubeconfig-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/localkubeconfig-k8s201.md)] --- class: extra-details ## What if we get a certificate error? - Generally, the Kubernetes API uses a certificate that is valid for: - `kubernetes` - `kubernetes.default` - `kubernetes.default.svc` - `kubernetes.default.svc.cluster.local` - the ClusterIP address of the `kubernetes` service - the hostname of the node hosting the control plane - the IP address of the node hosting the control plane - On most clouds, the IP address of the node is an internal IP address - ... And we are going to connect over the external IP address - ... And that external IP address was not used when creating the certificate! .debug[[k8s/localkubeconfig-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/localkubeconfig-k8s201.md)] --- class: extra-details ## Working around the certificate error - We need to tell `kubectl` to skip TLS verification (only do this with testing clusters, never in production!) - The following command will do the trick: ```bash kubectl config set-cluster $AKS_NAME --insecure-skip-tls-verify ``` .debug[[k8s/localkubeconfig-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/localkubeconfig-k8s201.md)] --- ## Checking that we can connect to the cluster - We can now run a couple of trivial commands to check that all is well .exercise[ - Check the versions of the local client and remote server: ```bash kubectl version ``` It is okay if you have a newer client than what is available on the server. - View the nodes of the cluster: ```bash kubectl get nodes ``` ] We can now utilize the cluster exactly as if we're logged into a node, except that it's remote. .debug[[k8s/localkubeconfig-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/localkubeconfig-k8s201.md)] --- class: pic .interstitial[![Image separating from the next chapter](https://gallant-turing-d0d520.netlify.com/containers/aerial-view-of-containers.jpg)] --- name: toc-kubernetes-architecture class: title Kubernetes architecture .nav[ [Previous section](#toc-controlling-a-kubernetes-cluster-remotely) | [Back to table of contents](#toc-chapter-1) | [Next section](#toc-the-kubernetes-api) ] .debug[(automatically generated title slide)] --- # Kubernetes architecture We can arbitrarily split Kubernetes in two parts: - the *nodes*, a set of machines that run our containerized workloads; - the *control plane*, a set of processes implementing the Kubernetes APIs. 
Kubernetes also relies on underlying infrastructure: - servers, network connectivity (obviously!), - optional components like storage systems, load balancers ... .debug[[k8s/architecture-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/architecture-k8s201.md)] --- ## Control plane location The control plane can run: - in containers, on the same nodes that run other application workloads (example: Minikube; 1 node runs everything) - on a dedicated node (example: a cluster installed with kubeadm) - on a dedicated set of nodes (example: Kubernetes The Hard Way; kops) - outside of the cluster (example: most managed clusters like AKS, EKS, GKE) .debug[[k8s/architecture-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/architecture-k8s201.md)] --- class: pic ![Kubernetes architecture diagram: control plane and nodes](images/k8s-arch2.png) .debug[[k8s/architecture-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/architecture-k8s201.md)] --- ## What runs on a node - Our containerized workloads - A container engine like Docker, CRI-O, containerd... (in theory, the choice doesn't matter, as the engine is abstracted by Kubernetes) - kubelet: an agent connecting the node to the cluster (it connects to the API server, registers the node, receives instructions) - kube-proxy: a component used for internal cluster communication (note that this is *not* an overlay network or a CNI plugin!) .debug[[k8s/architecture-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/architecture-k8s201.md)] --- ## What's in the control plane - Everything is stored in etcd (it's the only stateful component) - Everyone communicates exclusively through the API server: - we (users) interact with the cluster through the API server - the nodes register and get their instructions through the API server - the other control plane components also register with the API server - API server is the only component that reads/writes from/to etcd .debug[[k8s/architecture-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/architecture-k8s201.md)] --- ## Communication protocols: API server - The API server exposes a REST API (except for some calls, e.g. 
to attach interactively to a container) - Almost all requests and responses are JSON following a strict format - For performance, the requests and responses can also be done over protobuf (see this [design proposal](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/api-machinery/protobuf.md) for details) - In practice, protobuf is used for all internal communication (between control plane components, and with kubelet) .debug[[k8s/architecture-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/architecture-k8s201.md)] --- ## Communication protocols: on the nodes The kubelet agent uses a number of special-purpose protocols and interfaces, including: - CRI (Container Runtime Interface) - used for communication with the container engine - abstracts the differences between container engines - based on gRPC+protobuf - [CNI (Container Network Interface)](https://github.com/containernetworking/cni/blob/master/SPEC.md) - used for communication with network plugins - network plugins are implemented as executable programs invoked by kubelet - network plugins provide IPAM - network plugins set up network interfaces in pods .debug[[k8s/architecture-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/architecture-k8s201.md)] --- class: pic ![Kubernetes architecture diagram: communication between components](images/k8s-arch4-thanks-luxas.png) .debug[[k8s/architecture-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/architecture-k8s201.md)] --- class: pic .interstitial[![Image separating from the next chapter](https://gallant-turing-d0d520.netlify.com/containers/blue-containers.jpg)] --- name: toc-the-kubernetes-api class: title The Kubernetes API .nav[ [Previous section](#toc-kubernetes-architecture) | [Back to table of contents](#toc-chapter-1) | [Next section](#toc-other-control-plane-components) ] .debug[(automatically generated title slide)] --- # The Kubernetes API [ *The Kubernetes API server is a "dumb server" which offers storage, versioning, validation, update, and watch semantics on API resources.* ]( https://github.com/kubernetes/community/blob/master/contributors/design-proposals/api-machinery/protobuf.md#proposal-and-motivation ) ([Clayton Coleman](https://twitter.com/smarterclayton), Kubernetes Architect and Maintainer) What does that mean? .debug[[k8s/architecture-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/architecture-k8s201.md)] --- ## The Kubernetes API is declarative - We cannot tell the API, "run a pod" - We can tell the API, "here is the definition for pod X" - The API server will store that definition (in etcd) - *Controllers* will then wake up and create a pod matching the definition .debug[[k8s/architecture-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/architecture-k8s201.md)] --- ## The core features of the Kubernetes API - We can create, read, update, and delete objects - We can also *watch* objects (be notified when an object changes, or when an object of a given type is created) - Objects are strongly typed - Types are *validated* and *versioned* - Storage and watch operations are provided by etcd (note: the [k3s](https://k3s.io/) project allows us to use sqlite instead of etcd) .debug[[k8s/architecture-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/architecture-k8s201.md)] --- ## Let's experiment a bit! 
- For the exercises in this section, you'll be using `kubectl` locally and connecting to an AKS cluster .exercise[ - Get cluster info ```bash kubectl cluster-info ``` - Check that the cluster is operational: ```bash kubectl get nodes ``` - All nodes should be `Ready` ] .debug[[k8s/architecture-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/architecture-k8s201.md)] --- ## Create - Let's create a simple object .exercise[ - List existing namespaces: ```bash kubectl get ns ``` - Create a new namespace with the following command: ```bash kubectl create -f- <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: hello   # any namespace name works here
EOF
``` ]
(example: this [demo scheduler](https://github.com/kelseyhightower/scheduler) uses the cost of nodes, stored in node annotations) - A pod might stay in `Pending` state for a long time: - if the cluster is full - if the pod has special constraints that can't be met - if the scheduler is not running (!) .debug[[k8s/architecture-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/architecture-k8s201.md)] --- class: pic .interstitial[![Image separating from the next chapter](https://gallant-turing-d0d520.netlify.com/containers/container-cranes.jpg)] --- name: toc-healthchecks class: title Healthchecks .nav[ [Previous section](#toc-other-control-plane-components) | [Back to table of contents](#toc-chapter-2) | [Next section](#toc-deploying-a-sample-application) ] .debug[(automatically generated title slide)] --- # Healthchecks - Kubernetes provides two kinds of healthchecks: liveness and readiness - Healthchecks are *probes* that apply to *containers* (not to pods) - Each container can have two (optional) probes: - liveness = is this container dead or alive? - readiness = is this container ready to serve traffic? - Different probes are available (HTTP, TCP, program execution) - Let's see the difference and how to use them! .debug[[k8s/healthchecks.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/healthchecks.md)] --- ## Liveness probe - Indicates if the container is dead or alive - A dead container cannot come back to life - If the liveness probe fails, the container is killed (to make really sure that it's really dead; no zombies or undeads!) - What happens next depends on the pod's `restartPolicy`: - `Never`: the container is not restarted - `OnFailure` or `Always`: the container is restarted .debug[[k8s/healthchecks.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/healthchecks.md)] --- ## When to use a liveness probe - To indicate failures that can't be recovered - deadlocks (causing all requests to time out) - internal corruption (causing all requests to error) - If the liveness probe fails *N* consecutive times, the container is killed - *N* is the `failureThreshold` (3 by default) .debug[[k8s/healthchecks.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/healthchecks.md)] --- ## Readiness probe - Indicates if the container is ready to serve traffic - If a container becomes "unready" (let's say busy!) 
it might be ready again soon - If the readiness probe fails: - the container is *not* killed - if the pod is a member of a service, it is temporarily removed - it is re-added as soon as the readiness probe passes again .debug[[k8s/healthchecks.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/healthchecks.md)] --- ## When to use a readiness probe - To indicate temporary failures - the application can only service *N* parallel connections - the runtime is busy doing garbage collection or initial data load - The container is marked as "not ready" after `failureThreshold` failed attempts (3 by default) - It is marked again as "ready" after `successThreshold` successful attempts (1 by default) .debug[[k8s/healthchecks.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/healthchecks.md)] --- ## Different types of probes - HTTP request - specify URL of the request (and optional headers) - any status code between 200 and 399 indicates success - TCP connection - the probe succeeds if the TCP port is open - arbitrary exec - a command is executed in the container - exit status of zero indicates success .debug[[k8s/healthchecks.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/healthchecks.md)] --- ## Benefits of using probes - Rolling updates proceed when containers are *actually ready* (as opposed to merely started) - Containers in a broken state get killed and restarted (instead of serving errors or timeouts) - Overloaded backends get removed from load balancer rotation (thus improving response times across the board) .debug[[k8s/healthchecks.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/healthchecks.md)] --- ## Example: HTTP probe Here is a pod template for the `rng` web service of our DockerCoins sample app: ```yaml apiVersion: v1 kind: Pod metadata: name: rng-with-liveness spec: containers: - name: rng image: dockercoins/rng:v0.1 livenessProbe: httpGet: path: / port: 80 initialDelaySeconds: 10 periodSeconds: 1 ``` If the backend serves an error, or takes longer than 1s, 3 times in a row, it gets killed. .debug[[k8s/healthchecks.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/healthchecks.md)] --- ## Example: exec probe Here is a pod template for a Redis server: ```yaml apiVersion: v1 kind: Pod metadata: name: redis-with-liveness spec: containers: - name: redis image: redis livenessProbe: exec: command: ["redis-cli", "ping"] ``` If the Redis process becomes unresponsive, it will be killed. 
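The examples above only show *liveness* probes. As a rough sketch (not taken from the DockerCoins manifests; the path, port, and thresholds are assumptions mirroring the liveness example), a *readiness* probe for the `rng` web service could look like this:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: rng-with-readiness
spec:
  containers:
  - name: rng
    image: dockercoins/rng:v0.1
    readinessProbe:
      httpGet:
        path: /              # same endpoint as the liveness example
        port: 80
      periodSeconds: 1
      failureThreshold: 3    # marked "not ready" after 3 consecutive failures
      successThreshold: 1    # marked "ready" again after 1 success
```
With such a probe, a container that stops answering on port 80 is only removed from its service's endpoints until the probe passes again; it is not killed or restarted.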
.debug[[k8s/healthchecks.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/healthchecks.md)] --- ## Details about liveness and readiness probes - Probes are executed at intervals of `periodSeconds` (default: 10) - The timeout for a probe is set with `timeoutSeconds` (default: 1) - A probe is considered successful after `successThreshold` successes (default: 1) - A probe is considered failing after `failureThreshold` failures (default: 3) - If a probe is not defined, it's as if there was an "always successful" probe .debug[[k8s/healthchecks.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/healthchecks.md)] --- class: pic .interstitial[![Image separating from the next chapter](https://gallant-turing-d0d520.netlify.com/containers/container-housing.jpg)] --- name: toc-deploying-a-sample-application class: title Deploying a sample application .nav[ [Previous section](#toc-healthchecks) | [Back to table of contents](#toc-chapter-2) | [Next section](#toc-authentication-and-authorization) ] .debug[(automatically generated title slide)] --- # Deploying a sample application - We will connect to our new Kubernetes cluster - We will deploy a sample application, "DockerCoins" - That app features multiple micro-services and a web UI .debug[[k8s/kubercoins-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/kubercoins-k8s201.md)] --- ## Cloning some repos - We will need two repositories: - the first one has the "DockerCoins" demo app - the second one has these slides, some scripts, more manifests ... .exercise[ - Clone the kubercoins repository locally: ```bash git clone https://github.com/jpetazzo/kubercoins ``` - Clone the container.training repository as well: ```bash git clone https://github.com/jpetazzo/container.training ``` ] .debug[[k8s/kubercoins-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/kubercoins-k8s201.md)] --- ## Running the application Without further ado, let's start this application! .exercise[ - Apply all the manifests from the kubercoins repository: ```bash kubectl apply -f kubercoins/ ``` ] .debug[[k8s/kubercoins-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/kubercoins-k8s201.md)] --- ## What's this application? -- - It is a DockerCoin miner! .emoji[💰🐳📦🚢] -- - No, you can't buy coffee with DockerCoins -- - How DockerCoins works: - generate a few random bytes - hash these bytes - increment a counter (to keep track of speed) - repeat forever! 
-- - DockerCoins is *not* a cryptocurrency (the only common points are "randomness", "hashing", and "coins" in the name) .debug[[k8s/kubercoins-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/kubercoins-k8s201.md)] --- ## DockerCoins in the microservices era - DockerCoins is made of 5 services: - `rng` = web service generating random bytes - `hasher` = web service computing hash of POSTed data - `worker` = background process calling `rng` and `hasher` - `webui` = web interface to watch progress - `redis` = data store (holds a counter updated by `worker`) - These 5 services are visible in the application's Compose file, [docker-compose.yml]( https://github.com/jpetazzo/container.training/blob/master/dockercoins/docker-compose.yml) .debug[[k8s/kubercoins-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/kubercoins-k8s201.md)] --- ## How DockerCoins works - `worker` invokes web service `rng` to generate random bytes - `worker` invokes web service `hasher` to hash these bytes - `worker` does this in an infinite loop - every second, `worker` updates `redis` to indicate how many loops were done - `webui` queries `redis`, and computes and exposes "hashing speed" in our browser *(See diagram on next slide!)* .debug[[k8s/kubercoins-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/kubercoins-k8s201.md)] --- class: pic ![Diagram showing the 5 containers of the applications](images/dockercoins-diagram.svg) .debug[[k8s/kubercoins-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/kubercoins-k8s201.md)] --- ## Service discovery in container-land How does each service find out the address of the other ones? -- - We do not hard-code IP addresses in the code - We do not hard-code FQDNs in the code, either - We just connect to a service name, and container-magic does the rest (And by container-magic, we mean "a crafty, dynamic, embedded DNS server") .debug[[k8s/kubercoins-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/kubercoins-k8s201.md)] --- ## Example in `worker/worker.py` ```python redis = Redis("`redis`") def get_random_bytes(): r = requests.get("http://`rng`/32") return r.content def hash_bytes(data): r = requests.post("http://`hasher`/", data=data, headers={"Content-Type": "application/octet-stream"}) ``` (Full source code available [here]( https://github.com/jpetazzo/container.training/blob/8279a3bce9398f7c1a53bdd95187c53eda4e6435/dockercoins/worker/worker.py#L17 )) .debug[[k8s/kubercoins-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/kubercoins-k8s201.md)] --- ## Show me the code! - You can check the GitHub repository with all the materials of this workshop:
https://github.com/jpetazzo/container.training - The application is in the [dockercoins]( https://github.com/jpetazzo/container.training/tree/master/dockercoins) subdirectory - The Compose file ([docker-compose.yml]( https://github.com/jpetazzo/container.training/blob/master/dockercoins/docker-compose.yml)) lists all 5 services - `redis` is using an official image from the Docker Hub - `hasher`, `rng`, `worker`, `webui` are each built from a Dockerfile - Each service's Dockerfile and source code is in its own directory (`hasher` is in the [hasher](https://github.com/jpetazzo/container.training/blob/master/dockercoins/hasher/) directory, `rng` is in the [rng](https://github.com/jpetazzo/container.training/blob/master/dockercoins/rng/) directory, etc.) .debug[[k8s/kubercoins-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/kubercoins-k8s201.md)] --- ## Our application at work - We can check the logs of our application's pods .exercise[ - Check the logs of the various components: ```bash kubectl logs deploy/worker kubectl logs deploy/hasher ``` ] .debug[[k8s/kubercoins-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/kubercoins-k8s201.md)] --- ## Connecting to the web UI - The `webui` container exposes a web dashboard; let's view it .exercise[ - Open a proxy to our cluster: ```bash kubectl proxy & ``` - Open in a web browser: [http://localhost:8001/api/v1/namespaces/default/services/webui/proxy/index.html](http://localhost:8001/api/v1/namespaces/default/services/webui/proxy/index.html) ] A drawing area should show up, and after a few seconds, a blue graph will appear. - If using Cloud Shell, use the [Cloud Shell Web Preview](https://docs.microsoft.com/en-us/azure/cloud-shell/using-the-shell-window#web-preview) and append `/api/v1/namespaces/default/services/webui/proxy/index.html` to the existing path .debug[[k8s/kubercoins-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/kubercoins-k8s201.md)] --- class: pic .interstitial[![Image separating from the next chapter](https://gallant-turing-d0d520.netlify.com/containers/containers-by-the-water.jpg)] --- name: toc-authentication-and-authorization class: title Authentication and authorization .nav[ [Previous section](#toc-deploying-a-sample-application) | [Back to table of contents](#toc-chapter-2) | [Next section](#toc-resource-limits) ] .debug[(automatically generated title slide)] --- # Authentication and authorization *And first, a little refresher!* - Authentication = verifying the identity of a person On a UNIX system, we can authenticate with login+password, SSH keys ... - Authorization = listing what they are allowed to do On a UNIX system, this can include file permissions, sudoer entries ... - Sometimes abbreviated as "authn" and "authz" - In good modular systems, these things are decoupled (so we can e.g. change a password or SSH key without having to reset access rights) .debug[[k8s/authn-authz-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/authn-authz-k8s201.md)] --- ## Authentication in Kubernetes - When the API server receives a request, it tries to authenticate it (it examines headers, certificates... 
anything available) - Many authentication methods are available and can be used simultaneously (we will see them on the next slide) - It's the job of the authentication method to produce: - the user name - the user ID - a list of groups - The API server doesn't interpret these; that'll be the job of *authorizers* .debug[[k8s/authn-authz-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/authn-authz-k8s201.md)] --- ## Authentication methods - TLS client certificates (that's what we've been doing with `kubectl` so far) - Bearer tokens (a secret token in the HTTP headers of the request) - [HTTP basic auth](https://en.wikipedia.org/wiki/Basic_access_authentication) (carrying user and password in an HTTP header) - Authentication proxy (sitting in front of the API and setting trusted headers) .debug[[k8s/authn-authz-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/authn-authz-k8s201.md)] --- ## Anonymous & unauthenticated requests - If any authentication method *rejects* a request, it's denied (`401 Unauthorized` HTTP code) - If a request is neither rejected nor accepted by anyone, it's anonymous - the user name is `system:anonymous` - the list of groups is `[system:unauthenticated]` - By default, the anonymous user can't do anything .exercise[ - Note that 401 (not 403) is what you get if you just `curl` the Kubernetes API ```bash curl -k $API_URL ``` ] .debug[[k8s/authn-authz-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/authn-authz-k8s201.md)] --- ## Authentication with tokens - Tokens are passed as HTTP headers: `Authorization: Bearer and-then-here-comes-the-token` - Tokens can be validated through a number of different methods: - static tokens hard-coded in a file on the API server - [bootstrap tokens](https://kubernetes.io/docs/reference/access-authn-authz/bootstrap-tokens/) (special case to create a cluster or join nodes) - [OpenID Connect tokens](https://kubernetes.io/docs/reference/access-authn-authz/authentication/#openid-connect-tokens) (to delegate authentication to compatible OAuth2 providers) - service accounts (these deserve more details, coming right up!) .debug[[k8s/authn-authz-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/authn-authz-k8s201.md)] --- ## Service accounts - A service account is a user that exists in the Kubernetes API (it is visible with e.g. `kubectl get serviceaccounts`) - Service accounts can therefore be created / updated dynamically (they don't require hand-editing a file and restarting the API server) - A service account is associated with a set of secrets (the kind that you can view with `kubectl get secrets`) - Service accounts are generally used to grant permissions to applications, services... 
(as opposed to humans) .debug[[k8s/authn-authz-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/authn-authz-k8s201.md)] --- class: extra-details ## Token authentication in practice - We are going to list existing service accounts - Then we will extract the token for a given service account - And we will use that token to authenticate with the API .debug[[k8s/authn-authz-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/authn-authz-k8s201.md)] --- class: extra-details ## Listing service accounts .exercise[ - The resource name is `serviceaccount` or `sa` for short: ```bash kubectl get sa ``` ] There should be just one service account in the default namespace: `default`. .debug[[k8s/authn-authz-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/authn-authz-k8s201.md)] --- class: extra-details ## Finding the secret .exercise[ - List the secrets for the `default` service account: ```bash kubectl get sa default -o yaml SECRET=$(kubectl get sa default -o json | jq -r .secrets[0].name) echo $SECRET ``` ] It should be named `default-token-XXXXX`. .debug[[k8s/authn-authz-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/authn-authz-k8s201.md)] --- class: extra-details ## Extracting the token - The token is stored in the secret, wrapped with base64 encoding .exercise[ - View the secret: ```bash kubectl get secret $SECRET -o yaml ``` - Extract the token and decode it: ```bash TOKEN=$(kubectl get secret $SECRET -o json \ | jq -r .data.token | openssl base64 -d -A) ``` ] .debug[[k8s/authn-authz-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/authn-authz-k8s201.md)] --- class: extra-details ## Using the token - Let's send a request to the API, without and with the token .exercise[ - Find the URL for the `kubernetes` master: ```bash kubectl cluster-info ``` - Set it programmatically, if AKS_NAME is set: (choose from `kubectl config view`): ```bash API=$(kubectl config view -o \ jsonpath="{.clusters[?(@.name==\"$AKS_NAME\")].cluster.server}") ``` - Connect without the token, then with the token:: ```bash curl -k $API curl -k -H "Authorization: Bearer $TOKEN" $API ``` ] .debug[[k8s/authn-authz-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/authn-authz-k8s201.md)] --- ## Authorization in Kubernetes - There are multiple ways to grant permissions in Kubernetes, called [authorizers](https://kubernetes.io/docs/reference/access-authn-authz/authorization/#authorization-modules): - [Node Authorization](https://kubernetes.io/docs/reference/access-authn-authz/node/) (used internally by kubelet; we can ignore it) - [Attribute-based access control](https://kubernetes.io/docs/reference/access-authn-authz/abac/) (powerful but complex and static; ignore it too) - [Webhook](https://kubernetes.io/docs/reference/access-authn-authz/webhook/) (each API request is submitted to an external service for approval) - [Role-based access control](https://kubernetes.io/docs/reference/access-authn-authz/rbac/) (associates permissions to users dynamically) - The one we want is the last one, generally abbreviated as RBAC .debug[[k8s/authn-authz-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/authn-authz-k8s201.md)] --- ## Role-based access control - RBAC allows to specify fine-grained permissions - Permissions are expressed as *rules* - A rule is a combination of: - 
[verbs](https://kubernetes.io/docs/reference/access-authn-authz/authorization/#determine-the-request-verb) like create, get, list, update, delete... - resources (as in "API resource," like pods, nodes, services...) - resource names (to specify e.g. one specific pod instead of all pods) - in some case, [subresources](https://kubernetes.io/docs/reference/access-authn-authz/rbac/#referring-to-resources) (e.g. logs are subresources of pods) .debug[[k8s/authn-authz-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/authn-authz-k8s201.md)] --- ## From rules to roles to rolebindings - A *role* is an API object containing a list of *rules* Example: role "external-load-balancer-configurator" can: - [list, get] resources [endpoints, services, pods] - [update] resources [services] - A *rolebinding* associates a role with a user Example: rolebinding "external-load-balancer-configurator": - associates user "external-load-balancer-configurator" - with role "external-load-balancer-configurator" - Yes, there can be users, roles, and rolebindings with the same name - It's a good idea for 1-1-1 bindings; not so much for 1-N ones .debug[[k8s/authn-authz-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/authn-authz-k8s201.md)] --- ## Cluster-scope permissions - API resources Role and RoleBinding are for objects within a namespace - We can also define API resources ClusterRole and ClusterRoleBinding - These are a superset, allowing us to: - specify actions on cluster-wide objects (like nodes) - operate across all namespaces - We can create Role and RoleBinding resources within a namespace - ClusterRole and ClusterRoleBinding resources are global .debug[[k8s/authn-authz-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/authn-authz-k8s201.md)] --- ## Pods and service accounts - A pod can be associated with a service account - by default, it is associated with the `default` service account - as we saw earlier, this service account has no permissions anyway - The associated token is exposed to the pod's filesystem (in `/var/run/secrets/kubernetes.io/serviceaccount/token`) - Standard Kubernetes tooling (like `kubectl`) will look for it there - So Kubernetes tools running in a pod will automatically use the service account .debug[[k8s/authn-authz-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/authn-authz-k8s201.md)] --- class: extra-details ## Pod Security Policies - If you'd like to check out pod-level controls in AKS, they are [available in preview](https://docs.microsoft.com/en-us/azure/aks/use-pod-security-policies) - Experiment, but not in production! 
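To make the *role* and *rolebinding* slides above more concrete, here is a minimal sketch of the "external-load-balancer-configurator" example in YAML (the namespace and exact rule layout are assumptions for illustration, not manifests used in this workshop):
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: external-load-balancer-configurator
  namespace: default          # Roles are namespaced; pick the relevant namespace
rules:
- apiGroups: [""]             # "" is the core API group (pods, services, endpoints...)
  resources: ["endpoints", "services", "pods"]
  verbs: ["list", "get"]
- apiGroups: [""]
  resources: ["services"]
  verbs: ["update"]
```
And the matching rolebinding, associating the user of the same name with that role:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: external-load-balancer-configurator
  namespace: default
subjects:
- kind: User
  name: external-load-balancer-configurator
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: external-load-balancer-configurator
  apiGroup: rbac.authorization.k8s.io
```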
.debug[[k8s/authn-authz-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/authn-authz-k8s201.md)] --- class: pic .interstitial[![Image separating from the next chapter](https://gallant-turing-d0d520.netlify.com/containers/distillery-containers.jpg)] --- name: toc-resource-limits class: title Resource Limits .nav[ [Previous section](#toc-authentication-and-authorization) | [Back to table of contents](#toc-chapter-3) | [Next section](#toc-defining-min-max-and-default-resources) ] .debug[(automatically generated title slide)] --- # Resource Limits - We can attach resource indications to our pods (or rather: to the *containers* in our pods) - We can specify *limits* and/or *requests* - We can specify quantities of CPU and/or memory .debug[[k8s/resource-limits-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/resource-limits-k8s201.md)] --- ## CPU vs memory - CPU is a *compressible resource* (it can be preempted immediately without adverse effect) - Memory is an *incompressible resource* (it needs to be swapped out to be reclaimed; and this is costly) - As a result, exceeding limits will have different consequences for CPU and memory .debug[[k8s/resource-limits-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/resource-limits-k8s201.md)] --- ## Exceeding CPU limits - CPU can be reclaimed instantaneously (in fact, it is preempted hundreds of times per second, at each context switch) - If a container uses too much CPU, it can be throttled (it will be scheduled less often) - The processes in that container will run slower (or rather: they will not run faster) .debug[[k8s/resource-limits-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/resource-limits-k8s201.md)] --- ## Exceeding memory limits - Memory needs to be swapped out before being reclaimed - "Swapping" means writing memory pages to disk, which is very slow - On a classic system, a process that swaps can get 1000x slower (because disk I/O is 1000x slower than memory I/O) - Exceeding the memory limit (even by a small amount) can reduce performance *a lot* - Kubernetes *does not support swap* (more on that later!) - Exceeding the memory limit will cause the container to be killed .debug[[k8s/resource-limits-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/resource-limits-k8s201.md)] --- ## Limits vs requests - Limits are "hard limits" (they can't be exceeded) - a container exceeding its memory limit is killed - a container exceeding its CPU limit is throttled - Requests are used for scheduling purposes - a container using *less* than what it requested will never be killed or throttled - the scheduler uses the requested sizes to determine placement - the resources requested by all pods on a node will never exceed the node size .debug[[k8s/resource-limits-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/resource-limits-k8s201.md)] --- ## Pod quality of service Each pod is assigned a QoS class (visible in `status.qosClass`). 
- If limits = requests: - as long as the container uses less than the limit, it won't be affected - if all containers in a pod have *(limits=requests)*, QoS is considered "Guaranteed" - If requests < limits: - as long as the container uses less than the request, it won't be affected - otherwise, it might be killed/evicted if the node gets overloaded - if at least one container has *(requests<limits)*, QoS is considered "Burstable" - If a pod doesn't have any request nor limit, QoS is considered "BestEffort" .debug[[k8s/resource-limits-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/resource-limits-k8s201.md)] --- ## Quality of service impact - When a node is overloaded, BestEffort pods are killed first - Then, Burstable pods that exceed their limits - Burstable and Guaranteed pods below their limits are never killed (except if their node fails) - If we only use Guaranteed pods, no pod should ever be killed (as long as they stay within their limits) (Pod QoS is also explained in [this page](https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/) of the Kubernetes documentation and in [this blog post](https://medium.com/google-cloud/quality-of-service-class-qos-in-kubernetes-bb76a89eb2c6).) .debug[[k8s/resource-limits-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/resource-limits-k8s201.md)] --- ## Where is my swap? - The semantics of memory and swap limits on Linux cgroups are complex - In particular, it's not possible to disable swap for a cgroup (the closest option is to [reduce "swappiness"](https://unix.stackexchange.com/questions/77939/turning-off-swapping-for-only-one-process-with-cgroups)) - The architects of Kubernetes wanted to ensure that Guaranteed pods never swap - The only solution was to disable swap entirely .debug[[k8s/resource-limits-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/resource-limits-k8s201.md)] --- ## Alternative point of view - Swap enables paging¹ of anonymous² memory - Even when swap is disabled, Linux will still page memory for: - executables, libraries - mapped files - Disabling swap *will reduce performance and available resources* - For a good time, read [kubernetes/kubernetes#53533](https://github.com/kubernetes/kubernetes/issues/53533) - Also read this [excellent blog post about swap](https://jvns.ca/blog/2017/02/17/mystery-swap/) ¹Paging: reading/writing memory pages from/to disk to reclaim physical memory ²Anonymous memory: memory that is not backed by files or blocks .debug[[k8s/resource-limits-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/resource-limits-k8s201.md)] --- ## Enabling swap anyway - If you don't care that pods are swapping, you can enable swap - You will need to add the flag `--fail-swap-on=false` to kubelet (otherwise, it won't start!) 
.debug[[k8s/resource-limits-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/resource-limits-k8s201.md)] --- ## Specifying resources - Resource requests are expressed at the *container* level - CPU is expressed in "virtual CPUs" (corresponding to the virtual CPUs offered by some cloud providers) - CPU can be expressed with a decimal value, or even a "milli" suffix (so 100m = 0.1) - Memory is expressed in bytes - Memory can be expressed with k, M, G, T, ki, Mi, Gi, Ti suffixes (corresponding to 10^3, 10^6, 10^9, 10^12, 2^10, 2^20, 2^30, 2^40) .debug[[k8s/resource-limits-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/resource-limits-k8s201.md)] --- ## Specifying resources in practice This is what the spec of a Pod with resources will look like: ```yaml containers: - name: httpenv image: jpetazzo/httpenv resources: limits: memory: "100Mi" cpu: "100m" requests: memory: "100Mi" cpu: "10m" ``` This set of resources makes sure that this service won't be killed (as long as it stays below 100 MB of RAM), but allows its CPU usage to be throttled if necessary. .debug[[k8s/resource-limits-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/resource-limits-k8s201.md)] --- ## Default values - If we specify a limit without a request: the request is set to the limit - If we specify a request without a limit: there will be no limit (which means that the limit will be the size of the node) - If we don't specify anything: the request is zero and the limit is the size of the node *Unless there are default values defined for our namespace!* .debug[[k8s/resource-limits-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/resource-limits-k8s201.md)] --- ## We need default resource values - If we do not set resource values at all: - the limit is "the size of the node" - the request is zero - This is generally *not* what we want - a container without a limit can use up all the resources of a node - if the request is zero, the scheduler can't make a smart placement decision - To address this, we can set default values for resources - This is done with a LimitRange object .debug[[k8s/resource-limits-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/resource-limits-k8s201.md)] --- class: pic .interstitial[![Image separating from the next chapter](https://gallant-turing-d0d520.netlify.com/containers/lots-of-containers.jpg)] --- name: toc-defining-min-max-and-default-resources class: title Defining min, max, and default resources .nav[ [Previous section](#toc-resource-limits) | [Back to table of contents](#toc-chapter-3) | [Next section](#toc-namespace-quotas) ] .debug[(automatically generated title slide)] --- # Defining min, max, and default resources - We can create LimitRange objects to indicate any combination of: - min and/or max resources allowed per pod - default resource *limits* - default resource *requests* - maximal burst ratio (*limit/request*) - LimitRange objects are namespaced - They apply to their namespace only .debug[[k8s/resource-limits-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/resource-limits-k8s201.md)] --- ## LimitRange example ```yaml apiVersion: v1 kind: LimitRange metadata: name: my-very-detailed-limitrange spec: limits: - type: Container min: cpu: "100m" max: cpu: "2000m" memory: "1Gi" default: cpu: "500m" memory: "250Mi" defaultRequest: cpu: "500m" ``` 
.debug[[k8s/resource-limits-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/resource-limits-k8s201.md)] --- ## Example explanation The YAML on the previous slide shows an example LimitRange object specifying very detailed limits on CPU usage, and providing defaults on RAM usage. Note the `type: Container` line: in the future, it might also be possible to specify limits per Pod, but it's not [officially documented yet](https://github.com/kubernetes/website/issues/9585). .debug[[k8s/resource-limits-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/resource-limits-k8s201.md)] --- ## LimitRange details - LimitRange restrictions are enforced only when a Pod is created (they don't apply retroactively) - They don't prevent creation of e.g. an invalid Deployment or DaemonSet (but the pods will not be created as long as the LimitRange is in effect) - If there are multiple LimitRange restrictions, they all apply together (which means that it's possible to specify conflicting LimitRanges,
preventing any Pod from being created) - If a LimitRange specifies a `max` for a resource but no `default`,
that `max` value becomes the `default` limit too .debug[[k8s/resource-limits-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/resource-limits-k8s201.md)] --- class: pic .interstitial[![Image separating from the next chapter](https://gallant-turing-d0d520.netlify.com/containers/plastic-containers.JPG)] --- name: toc-namespace-quotas class: title Namespace quotas .nav[ [Previous section](#toc-defining-min-max-and-default-resources) | [Back to table of contents](#toc-chapter-3) | [Next section](#toc-limiting-resources-in-practice) ] .debug[(automatically generated title slide)] --- # Namespace quotas - We can also set quotas per namespace - Quotas apply to the total usage in a namespace (e.g. total CPU limits of all pods in a given namespace) - Quotas can apply to resource limits and/or requests (like the CPU and memory limits that we saw earlier) - Quotas can also apply to other resources: - "extended" resources (like GPUs) - storage size - number of objects (number of pods, services...) .debug[[k8s/resource-limits-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/resource-limits-k8s201.md)] --- ## Creating a quota for a namespace - Quotas are enforced by creating a ResourceQuota object - ResourceQuota objects are namespaced, and apply to their namespace only - We can have multiple ResourceQuota objects in the same namespace - The most restrictive values are used .debug[[k8s/resource-limits-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/resource-limits-k8s201.md)] --- ## Limiting total CPU/memory usage - The following YAML specifies an upper bound for *limits* and *requests*: ```yaml apiVersion: v1 kind: ResourceQuota metadata: name: a-little-bit-of-compute spec: hard: requests.cpu: "10" requests.memory: 10Gi limits.cpu: "20" limits.memory: 20Gi ``` These quotas will apply to the namespace where the ResourceQuota is created. .debug[[k8s/resource-limits-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/resource-limits-k8s201.md)] --- ## Limiting number of objects - The following YAML specifies how many objects of specific types can be created: ```yaml apiVersion: v1 kind: ResourceQuota metadata: name: quota-for-objects spec: hard: pods: 100 services: 10 secrets: 10 configmaps: 10 persistentvolumeclaims: 20 services.nodeports: 0 services.loadbalancers: 0 count/roles.rbac.authorization.k8s.io: 10 ``` (The `count/` syntax allows limiting arbitrary objects, including CRDs.) 
.debug[[k8s/resource-limits-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/resource-limits-k8s201.md)] --- ## YAML vs CLI - Quotas can be created with a YAML definition - ...Or with the `kubectl create quota` command .exercise[ - Create the following quota: ```bash kubectl create quota my-resource-quota \ --hard=pods=300,limits.memory=300Gi ``` ] - With both YAML and CLI form, the values are always under the `hard` section (there is no `soft` quota) .debug[[k8s/resource-limits-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/resource-limits-k8s201.md)] --- ## Viewing current usage When a ResourceQuota is created, we can see how much of it is used: .exercise[ - Check how much of the ResourceQuota is used: ```bash kubectl describe resourcequota my-resource-quota ``` - Remove quota: ```bash kubectl delete quota my-resource-quota ``` ] .debug[[k8s/resource-limits-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/resource-limits-k8s201.md)] --- ## Advanced quotas and PriorityClass - Since Kubernetes 1.12, it is possible to create PriorityClass objects - Pods can be assigned a PriorityClass - Quotas can be linked to a PriorityClass - This allows us to reserve resources for pods within a namespace - For more details, check [this documentation page](https://kubernetes.io/docs/concepts/policy/resource-quotas/#resource-quota-per-priorityclass) .debug[[k8s/resource-limits-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/resource-limits-k8s201.md)] --- class: pic .interstitial[![Image separating from the next chapter](https://gallant-turing-d0d520.netlify.com/containers/train-of-containers-1.jpg)] --- name: toc-limiting-resources-in-practice class: title Limiting resources in practice .nav[ [Previous section](#toc-namespace-quotas) | [Back to table of contents](#toc-chapter-3) | [Next section](#toc-checking-pod-and-node-resource-usage) ] .debug[(automatically generated title slide)] --- # Limiting resources in practice - We have at least three mechanisms: - requests and limits per Pod - LimitRange per namespace - ResourceQuota per namespace - Let's see a simple recommendation to get started with resource limits .debug[[k8s/resource-limits-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/resource-limits-k8s201.md)] --- ## Set a LimitRange - In each namespace, create a LimitRange object - Set a small default CPU request and CPU limit (e.g. "100m") - Set a default memory request and limit depending on your most common workload - for Java, Ruby: start with "1G" - for Go, Python, PHP, Node: start with "250M" - Set upper bounds slightly below your expected node size (80-90% of your node size, with at least a 500M memory buffer) .debug[[k8s/resource-limits-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/resource-limits-k8s201.md)] --- ## Set a ResourceQuota - In each namespace, create a ResourceQuota object - Set generous CPU and memory limits (e.g. 
half the cluster size if the cluster hosts multiple apps) - Set generous objects limits - these limits should not be here to constrain your users - they should catch a runaway process creating many resources - example: a custom controller creating many pods .debug[[k8s/resource-limits-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/resource-limits-k8s201.md)] --- ## Observe, refine, iterate - Observe the resource usage of your pods (we will see how in the next chapter) - Adjust individual pod limits - If you see trends: adjust the LimitRange (rather than adjusting every individual set of pod limits) - Observe the resource usage of your namespaces (with `kubectl describe resourcequota ...`) - Rinse and repeat regularly .debug[[k8s/resource-limits-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/resource-limits-k8s201.md)] --- class: pic .interstitial[![Image separating from the next chapter](https://gallant-turing-d0d520.netlify.com/containers/train-of-containers-2.jpg)] --- name: toc-checking-pod-and-node-resource-usage class: title Checking pod and node resource usage .nav[ [Previous section](#toc-limiting-resources-in-practice) | [Back to table of contents](#toc-chapter-3) | [Next section](#toc-cluster-sizing) ] .debug[(automatically generated title slide)] --- # Checking pod and node resource usage - Since Kubernetes 1.8, metrics are collected by the [resource metrics pipeline](https://kubernetes.io/docs/tasks/debug-application-cluster/resource-metrics-pipeline/) - The resource metrics pipeline is: - optional (Kubernetes can function without it) - necessary for some features (like the Horizontal Pod Autoscaler) - exposed through the Kubernetes API using the [aggregation layer](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/apiserver-aggregation/) - usually implemented by the "metrics server" .debug[[k8s/metrics-server.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/metrics-server.md)] --- ## How to know if the metrics server is running? - The easiest way to know is to run `kubectl top` .exercise[ - Check if the core metrics pipeline is available: ```bash kubectl top nodes ``` ] If it shows our nodes and their CPU and memory load, we're good! .debug[[k8s/metrics-server.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/metrics-server.md)] --- ## Installing metrics server - The metrics server doesn't have any particular requirements (it doesn't need persistence, as it doesn't *store* metrics) - It has its own repository, [kubernetes-incubator/metrics-server](https://github.com/kubernetes-incubator/metrics-server) - The repository comes with [YAML files for deployment](https://github.com/kubernetes-incubator/metrics-server/tree/master/deploy/1.8%2B) - These files may not work on some clusters (e.g. 
if your node names are not in DNS) - The container.training repository has a [metrics-server.yaml](https://github.com/jpetazzo/container.training/blob/master/k8s/metrics-server.yaml#L90) file to help with that (we can `kubectl apply -f` that file if needed) .debug[[k8s/metrics-server.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/metrics-server.md)] --- ## Showing container resource usage - Once the metrics server is running, we can check container resource usage .exercise[ - Show resource usage across all containers: ```bash kubectl top pods --containers --all-namespaces ``` ] - We can also use selectors (`-l app=...`) .debug[[k8s/metrics-server.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/metrics-server.md)] --- class: pic .interstitial[![Image separating from the next chapter](https://gallant-turing-d0d520.netlify.com/containers/two-containers-on-a-truck.jpg)] --- name: toc-cluster-sizing class: title Cluster sizing .nav[ [Previous section](#toc-checking-pod-and-node-resource-usage) | [Back to table of contents](#toc-chapter-4) | [Next section](#toc-the-horizontal-pod-autoscaler) ] .debug[(automatically generated title slide)] --- # Cluster sizing - What happens when the cluster gets full? - How can we scale up the cluster? - Can we do it automatically? - What are other methods to address capacity planning? .debug[[k8s/cluster-sizing-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/cluster-sizing-k8s201.md)] --- ## When are we out of resources? - kubelet monitors node resources: - memory - node disk usage (typically the root filesystem of the node) - image disk usage (where container images and RW layers are stored) - For each resource, we can provide two thresholds: - a hard threshold (if it's met, it provokes immediate action) - a soft threshold (provokes action only after a grace period) - Resource thresholds and grace periods are configurable (by passing kubelet command-line flags) .debug[[k8s/cluster-sizing-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/cluster-sizing-k8s201.md)] --- ## What happens then? - If disk usage is too high: - kubelet will try to remove terminated pods - then, it will try to *evict* pods - If memory usage is too high: - it will try to evict pods - The node is marked as "under pressure" - This temporarily prevents new pods from being scheduled on the node .debug[[k8s/cluster-sizing-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/cluster-sizing-k8s201.md)] --- ## Which pods get evicted? - kubelet looks at the pods' QoS and PriorityClass - First, pods with BestEffort QoS are considered - Then, pods with Burstable QoS exceeding their *requests* (but only if the exceeding resource is the one that is low on the node) - Finally, pods with Guaranteed QoS, and Burstable pods within their requests - Within each group, pods are sorted by PriorityClass - If there are pods with the same PriorityClass, they are sorted by usage excess (i.e. 
the pods whose usage exceeds their requests the most are evicted first) .debug[[k8s/cluster-sizing-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/cluster-sizing-k8s201.md)] --- class: extra-details ## Eviction of Guaranteed pods - *Normally*, pods with Guaranteed QoS should not be evicted - A chunk of resources is reserved for node processes (like kubelet) - It is expected that these processes won't use more than this reservation - If they do use more resources anyway, all bets are off! - If this happens, kubelet must evict Guaranteed pods to preserve node stability (or Burstable pods that are still within their requested usage) .debug[[k8s/cluster-sizing-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/cluster-sizing-k8s201.md)] --- ## What happens to evicted pods? - The pod is terminated - It is marked as `Failed` at the API level - If the pod was created by a controller, the controller will recreate it - The pod will be recreated on another node, *if there are resources available!* - For more details about the eviction process, see: - [this documentation page](https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/) about resource pressure and pod eviction, - [this other documentation page](https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/) about pod priority and preemption. .debug[[k8s/cluster-sizing-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/cluster-sizing-k8s201.md)] --- ## What if there are no resources available? - Sometimes, a pod cannot be scheduled anywhere: - all the nodes are under pressure, - or the pod requests more resources than are available - The pod then remains in `Pending` state until the situation improves .debug[[k8s/cluster-sizing-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/cluster-sizing-k8s201.md)] --- ## Cluster scaling - One way to improve the situation is to add new nodes - This can be done automatically with the [Cluster Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler) - The autoscaler will automatically scale up: - if there are pods that failed to be scheduled - The autoscaler will automatically scale down: - if nodes have a low utilization for an extended period of time .debug[[k8s/cluster-sizing-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/cluster-sizing-k8s201.md)] --- ## Restrictions, gotchas ... - The Cluster Autoscaler only supports a few cloud infrastructures (see [here](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler/cloudprovider) for a list) - ([in preview for AKS](https://docs.microsoft.com/en-us/azure/aks/cluster-autoscaler)) - The Cluster Autoscaler cannot scale down nodes that have pods using: - local storage - affinity/anti-affinity rules preventing them from being rescheduled - a restrictive PodDisruptionBudget .debug[[k8s/cluster-sizing-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/cluster-sizing-k8s201.md)] --- ## Other way to do capacity planning - "Running Kubernetes without nodes" - Systems like [Virtual Kubelet](https://virtual-kubelet.io/) or Kiyot can run pods using on-demand resources - Virtual Kubelet can leverage e.g. 
ACI or Fargate to run pods - Kiyot runs pods in ad-hoc EC2 instances (1 instance per pod) - Economic advantage (no wasted capacity) - Security advantage (stronger isolation between pods) Check [this blog post](http://jpetazzo.github.io/2019/02/13/running-kubernetes-without-nodes-with-kiyot/) for more details. .debug[[k8s/cluster-sizing-k8s201.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/cluster-sizing-k8s201.md)] --- class: pic .interstitial[![Image separating from the next chapter](https://gallant-turing-d0d520.netlify.com/containers/wall-of-containers.jpeg)] --- name: toc-the-horizontal-pod-autoscaler class: title The Horizontal Pod Autoscaler .nav[ [Previous section](#toc-cluster-sizing) | [Back to table of contents](#toc-chapter-4) | [Next section](#toc-extending-the-kubernetes-api) ] .debug[(automatically generated title slide)] --- # The Horizontal Pod Autoscaler - What is the Horizontal Pod Autoscaler, or HPA? - It is a controller that can perform *horizontal* scaling automatically - Horizontal scaling = changing the number of replicas (adding/removing pods) - Vertical scaling = changing the size of individual replicas (increasing/reducing CPU and RAM per pod) - Cluster scaling = changing the size of the cluster (adding/removing nodes) .debug[[k8s/horizontal-pod-autoscaler.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/horizontal-pod-autoscaler.md)] --- ## Principle of operation - Each HPA resource (or "policy") specifies: - which object to monitor and scale (e.g. a Deployment, ReplicaSet...) - min/max scaling ranges (the max is a safety limit!) - a target resource usage (e.g. the default is CPU=80%) - The HPA continuously monitors the CPU usage for the related object - It computes how many pods should be running: `TargetNumOfPods = ceil(sum(CurrentPodsCPUUtilization) / Target)` - It scales the related object up/down to this target number of pods .debug[[k8s/horizontal-pod-autoscaler.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/horizontal-pod-autoscaler.md)] --- ## Pre-requirements - The metrics server needs to be running (i.e. we need to be able to see pod metrics with `kubectl top pods`) - The pods that we want to autoscale need to have resource requests (because the target CPU% is not absolute, but relative to the request) - The latter actually makes a lot of sense: - if a Pod doesn't have a CPU request, it might be using 10% of CPU... - ...but only because there is no CPU time available! 
- this makes sure that we won't add pods to nodes that are already resource-starved .debug[[k8s/horizontal-pod-autoscaler.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/horizontal-pod-autoscaler.md)] --- ## Testing the HPA - We will start a CPU-intensive web service - We will send some traffic to that service - We will create an HPA policy - The HPA will automatically scale up the service for us .debug[[k8s/horizontal-pod-autoscaler.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/horizontal-pod-autoscaler.md)] --- ## A CPU-intensive web service - Let's use `jpetazzo/busyhttp` (it is a web server that will use 1s of CPU for each HTTP request) .exercise[ - Deploy the web server: ```bash kubectl create deployment busyhttp --image=jpetazzo/busyhttp ``` - Expose it with a ClusterIP service: ```bash kubectl expose deployment busyhttp --port=80 ``` - Port-forward to our service ```bash kubectl port-forward service/busyhttp 8080:80 & curl -k localhost:8080 ``` ] .debug[[k8s/horizontal-pod-autoscaler.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/horizontal-pod-autoscaler.md)] --- ## Monitor what's going on - Let's use some commands to watch what is happening .exercise[ - Monitor pod CPU usage: ```bash kubectl top pods ``` - Monitor cluster events: ```bash kubectl get events -w ``` ] .debug[[k8s/horizontal-pod-autoscaler.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/horizontal-pod-autoscaler.md)] --- ## Send traffic to the service - We will use [hey](https://github.com/rakyll/hey/releases) to send traffic .exercise[ - Send a lot of requests to the service with a concurrency level of 3: ```bash curl https://storage.googleapis.com/jblabs/dist/hey_linux_v0.1.2 > hey chmod +x hey ./hey http://localhost:8080 -c 3 -n 200 ``` ] The CPU utilization should increase to 100%. (The server is single-threaded and won't go above 100%.) .debug[[k8s/horizontal-pod-autoscaler.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/horizontal-pod-autoscaler.md)] --- ## Create an HPA policy - There is a helper command to do that for us: `kubectl autoscale` .exercise[ - Create the HPA policy for the `busyhttp` deployment: ```bash kubectl autoscale deployment busyhttp --max=10 ``` ] By default, it will assume a target of 80% CPU usage. This can also be set with `--cpu-percent=`. -- *The autoscaler doesn't seem to work. Why?* .debug[[k8s/horizontal-pod-autoscaler.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/horizontal-pod-autoscaler.md)] --- ## What did we miss? - The events stream (`kubectl get events -w`) gives us a hint, but to be honest, it's not very clear: `missing request for cpu` - We forgot to specify a resource request for our Deployment! 
- The HPA target is not an absolute CPU% - It is relative to the CPU requested by the pod .debug[[k8s/horizontal-pod-autoscaler.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/horizontal-pod-autoscaler.md)] --- ## Adding a CPU request - Let's edit the deployment and add a CPU request - Since our server can use up to 1 core, let's request 1 core .exercise[ - Edit the Deployment definition: ```bash kubectl edit deployment busyhttp ``` - In the `containers` list, add the following block: ``` resources: {"requests":{"cpu":"1", "memory":"64Mi"}} ``` ] .debug[[k8s/horizontal-pod-autoscaler.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/horizontal-pod-autoscaler.md)] --- ## Results - After saving and quitting, a rolling update happens (if `hey` exits, make sure to restart it) - It will take a minute or two for the HPA to kick in: - the HPA runs every 30 seconds by default - it needs to gather metrics from the metrics server first - If we scale further up (or down), the HPA will react after a few minutes: - it won't scale up if it already scaled in the last 3 minutes - it won't scale down if it already scaled in the last 5 minutes .debug[[k8s/horizontal-pod-autoscaler.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/horizontal-pod-autoscaler.md)] --- ## What about other metrics? - The HPA in API group `autoscaling/v1` only supports CPU scaling - The HPA in API group `autoscaling/v2beta2` supports metrics from various API groups: - metrics.k8s.io, aka metrics server (per-Pod CPU and RAM) - custom.metrics.k8s.io, custom metrics per Pod - external.metrics.k8s.io, external metrics (not associated to Pods) - Kubernetes doesn't implement any of these API groups - Using these metrics requires [registering additional APIs](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#support-for-metrics-apis) - The metrics provided by metrics server are standard; everything else is custom - For more details, see [this great blog post](https://medium.com/uptime-99/kubernetes-hpa-autoscaling-with-custom-and-external-metrics-da7f41ff7846) or [this talk](https://www.youtube.com/watch?v=gSiGFH4ZnS8) .debug[[k8s/horizontal-pod-autoscaler.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/horizontal-pod-autoscaler.md)] --- class: pic .interstitial[![Image separating from the next chapter](https://gallant-turing-d0d520.netlify.com/containers/Container-Ship-Freighter-Navigation-Elbe-Romance-1782991.jpg)] --- name: toc-extending-the-kubernetes-api class: title Extending the Kubernetes API .nav[ [Previous section](#toc-the-horizontal-pod-autoscaler) | [Back to table of contents](#toc-chapter-4) | [Next section](#toc-managing-stacks-with-helm) ] .debug[(automatically generated title slide)] --- # Extending the Kubernetes API There are multiple ways to extend the Kubernetes API. We are going to cover: - Custom Resource Definitions (CRDs) - Admission Webhooks .debug[[k8s/extending-api.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/extending-api.md)] --- ## Revisiting the API server - The Kubernetes API server is a central point of the control plane (everything connects to it: controller manager, scheduler, kubelets) - Almost everything in Kubernetes is materialized by a resource - Resources have a type (or "kind") (similar to strongly typed languages) - We can see existing types with `kubectl api-resources` - We can list resources of a given type with `kubectl get
<kind>` .debug[[k8s/extending-api.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/extending-api.md)] --- ## Creating new types - We can create new types with Custom Resource Definitions (CRDs) - CRDs are created dynamically (without recompiling or restarting the API server) - CRDs themselves are resources: - we can create a new type with `kubectl create` and some YAML - we can see all our custom types with `kubectl get crds` - After we create a CRD, the new type works just like built-in types .debug[[k8s/extending-api.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/extending-api.md)] --- ## What can we do with CRDs? There are many possibilities! - *Operators* encapsulate complex sets of resources (e.g.: a PostgreSQL replicated cluster; an etcd cluster...
see [awesome operators](https://github.com/operator-framework/awesome-operators) and [OperatorHub](https://operatorhub.io/) to find more) - Custom use-cases like [gitkube](https://gitkube.sh/) - creates a new custom type, `Remote`, exposing a git+ssh server - deploy by pushing YAML or Helm charts to that remote - Replacing built-in types with CRDs (see [this lightning talk by Tim Hockin](https://www.youtube.com/watch?v=ji0FWzFwNhA&index=2&list=PLj6h78yzYM2PZf9eA7bhWnIh_mK1vyOfU)) .debug[[k8s/extending-api.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/extending-api.md)] --- ## Little details - By default, CRDs are not *validated* (we can put anything we want in the `spec`) - When creating a CRD, we can pass an OpenAPI v3 schema (BETA!) (which will then be used to validate resources) - Generally, when creating a CRD, we also want to run a *controller* (otherwise nothing will happen when we create resources of that type) - The controller will typically *watch* our custom resources (and take action when they are created/updated) * Examples: [YAML to install the gitkube CRD](https://storage.googleapis.com/gitkube/gitkube-setup-stable.yaml), [YAML to install a redis operator CRD](https://github.com/amaizfinance/redis-operator/blob/master/deploy/crds/k8s_v1alpha1_redis_crd.yaml) * .debug[[k8s/extending-api.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/extending-api.md)] --- ## Service catalog - *Service catalog* is another extension mechanism - It's not extending the Kubernetes API strictly speaking (but it still provides new features!) - It doesn't create new types; it uses: - ClusterServiceBroker - ClusterServiceClass - ClusterServicePlan - ServiceInstance - ServiceBinding - It uses the Open service broker API .debug[[k8s/extending-api.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/extending-api.md)] --- ## Admission controllers - When a Pod is created, it is associated with a ServiceAccount (even if we did not specify one explicitly) - That ServiceAccount was added on the fly by an *admission controller* (specifically, a *mutating admission controller*) - Admission controllers sit on the API request path (see the cool diagram on next slide, courtesy of Banzai Cloud) .debug[[k8s/extending-api.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/extending-api.md)] --- class: pic ![API request lifecycle](images/api-request-lifecycle.png) .debug[[k8s/extending-api.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/extending-api.md)] --- ## Admission controllers - *Validating* admission controllers can accept/reject the API call - *Mutating* admission controllers can modify the API request payload - Both types can also trigger additional actions (e.g. automatically create a Namespace if it doesn't exist) - There are a number of built-in admission controllers (see [documentation](https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#what-does-each-admission-controller-do) for a list) - But we can also define our own! 
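
As a preview of the *admission webhooks* covered on the next slides, here is a minimal sketch of what "defining our own" can look like (the namespace, Service name, and path are hypothetical, and the CA bundle is elided):

```yaml
# Sketch of a validating webhook registration
# (admissionregistration.k8s.io/v1beta1, the API version current for these slides)
apiVersion: admissionregistration.k8s.io/v1beta1
kind: ValidatingWebhookConfiguration
metadata:
  name: pod-policy.example.com
webhooks:
  - name: pod-policy.example.com
    clientConfig:
      service:
        namespace: webhooks        # hypothetical namespace
        name: pod-policy           # hypothetical Service in front of our webhook
        path: /validate
      # caBundle: <base64-encoded CA certificate used to verify the webhook's TLS cert>
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
    failurePolicy: Ignore          # don't block API requests if the webhook is unreachable
```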
.debug[[k8s/extending-api.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/extending-api.md)] --- ## Admission Webhooks - We can setup *admission webhooks* to extend the behavior of the API server - The API server will submit incoming API requests to these webhooks - These webhooks can be *validating* or *mutating* - Webhooks can be set up dynamically (without restarting the API server) - To setup a dynamic admission webhook, we create a special resource: a `ValidatingWebhookConfiguration` or a `MutatingWebhookConfiguration` - These resources are created and managed like other resources (i.e. `kubectl create`, `kubectl get`...) .debug[[k8s/extending-api.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/extending-api.md)] --- ## Webhook Configuration - A ValidatingWebhookConfiguration or MutatingWebhookConfiguration contains: - the address of the webhook - the authentication information to use with the webhook - a list of rules - The rules indicate for which objects and actions the webhook is triggered (to avoid e.g. triggering webhooks when setting up webhooks) .debug[[k8s/extending-api.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/extending-api.md)] --- ## (Ab)using the API server - If we need to store something "safely" (as in: in etcd), we can use CRDs - This gives us primitives to read/write/list objects (and optionally validate them) - The Kubernetes API server can run on its own (without the scheduler, controller manager, and kubelets) - By loading CRDs, we can have it manage totally different objects (unrelated to containers, clusters, etc.) .debug[[k8s/extending-api.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/extending-api.md)] --- ## Documentation - [Custom Resource Definitions: when to use them](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) - [Custom Resources Definitions: how to use them](https://kubernetes.io/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definitions/) - [Service Catalog](https://kubernetes.io/docs/concepts/extend-kubernetes/service-catalog/) - [Built-in Admission Controllers](https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/) - [Dynamic Admission Controllers](https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/) .debug[[k8s/extending-api.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/extending-api.md)] --- class: pic .interstitial[![Image separating from the next chapter](https://gallant-turing-d0d520.netlify.com/containers/ShippingContainerSFBay.jpg)] --- name: toc-managing-stacks-with-helm class: title Managing stacks with Helm .nav[ [Previous section](#toc-extending-the-kubernetes-api) | [Back to table of contents](#toc-chapter-4) | [Next section](#toc-whats-next) ] .debug[(automatically generated title slide)] --- # Managing stacks with Helm - We created our first resources with `kubectl run`, `kubectl expose` ... - We have also created resources by loading YAML files with `kubectl apply -f` - For larger stacks, managing thousands of lines of YAML is unreasonable - These YAML bundles need to be customized with variable parameters (E.g.: number of replicas, image version to use ...) 
- It would be nice to have an organized, versioned collection of bundles - It would be nice to be able to upgrade/rollback these bundles carefully - [Helm](https://helm.sh/) is an open source project offering all these things! .debug[[k8s/helm.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/helm.md)] --- ## Helm concepts - `helm` is a CLI tool - `tiller` is its companion server-side component - A "chart" is an archive containing templatized YAML bundles - Charts are versioned - Charts can be stored on private or public repositories .debug[[k8s/helm.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/helm.md)] --- ## Installing Helm - If the `helm` CLI is not installed in your environment, install it .exercise[ - Check if `helm` is installed: ```bash helm ``` - If it's not installed, run the following command: ```bash curl https://raw.githubusercontent.com/kubernetes/helm/master/scripts/get | bash ``` ] .debug[[k8s/helm.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/helm.md)] --- ## Installing Tiller - Tiller is composed of a *service* and a *deployment* in the `kube-system` namespace - They can be managed (installed, upgraded...) with the `helm` CLI .exercise[ - Deploy Tiller: ```bash helm init ``` ] If Tiller was already installed, don't worry: this won't break it. At the end of the install process, you will see: ``` Happy Helming! ``` .debug[[k8s/helm.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/helm.md)] --- ## Fix account permissions - Helm permission model requires us to tweak permissions - In a more realistic deployment, you might create per-user or per-team service accounts, roles, and role bindings .exercise[ - Grant `cluster-admin` role to `kube-system:default` service account: ```bash kubectl create clusterrolebinding add-on-cluster-admin \ --clusterrole=cluster-admin --serviceaccount=kube-system:default ``` ] (Defining the exact roles and permissions on your cluster requires a deeper knowledge of Kubernetes' RBAC model. The command above is fine for personal and development clusters.) .debug[[k8s/helm.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/helm.md)] --- ## View available charts - A public repo is pre-configured when installing Helm - We can view available charts with `helm search` (and an optional keyword) .exercise[ - View all available charts: ```bash helm search ``` - View charts related to `prometheus`: ```bash helm search prometheus ``` ] .debug[[k8s/helm.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/helm.md)] --- ## Install a chart - Most charts use `LoadBalancer` service types by default - Most charts require persistent volumes to store data - We need to relax these requirements a bit .exercise[ - Install the Prometheus metrics collector on our cluster: ```bash helm install stable/prometheus \ --set server.service.type=NodePort \ --set server.persistentVolume.enabled=false ``` ] Where do these `--set` options come from? .debug[[k8s/helm.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/helm.md)] --- ## Inspecting a chart - `helm inspect` shows details about a chart (including available options) .exercise[ - See the metadata and all available options for `stable/prometheus`: ```bash helm inspect stable/prometheus ``` ] The chart's metadata includes a URL to the project's home page. (Sometimes it conveniently points to the documentation for the chart.) 
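
The same overrides can also live in a small values file instead of repeated `--set` flags; here is a sketch (the file name is arbitrary), which could be passed with `helm install stable/prometheus -f my-prometheus-values.yaml`:

```yaml
# my-prometheus-values.yaml: same overrides as the --set flags used earlier
server:
  service:
    type: NodePort
  persistentVolume:
    enabled: false
```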
.debug[[k8s/helm.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/helm.md)] --- ## Viewing installed charts - Helm keeps track of what we've installed .exercise[ - List installed Helm charts: ```bash helm list ``` ] .debug[[k8s/helm.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/helm.md)] --- class: pic .interstitial[![Image separating from the next chapter](https://gallant-turing-d0d520.netlify.com/containers/aerial-view-of-containers.jpg)] --- name: toc-whats-next class: title What's next? .nav[ [Previous section](#toc-managing-stacks-with-helm) | [Back to table of contents](#toc-chapter-5) | [Next section](#toc-links-and-resources) ] .debug[(automatically generated title slide)] --- # What's next? - Congratulations! - We learned a lot about Kubernetes, its internals, its advanced concepts -- - That was just the easy part - The hard challenges will revolve around *culture* and *people* -- - ... What does that mean? .debug[[k8s/lastwords-admin.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/lastwords-admin.md)] --- ## Running an app involves many steps - Write the app - Tests, QA ... - Ship *something* (more on that later) - Provision resources (e.g. VMs, clusters) - Deploy the *something* on the resources - Manage, maintain, monitor the resources - Manage, maintain, monitor the app - And much more .debug[[k8s/lastwords-admin.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/lastwords-admin.md)] --- ## Who does what? - The old "devs vs ops" division has changed - In some organizations, "ops" are now called "SRE" or "platform" teams (and they have very different sets of skills) - Do you know which team is responsible for each item on the list on the previous page? - Acknowledge that a lot of tasks are outsourced (e.g. if we add "buy/rack/provision machines" in that list) .debug[[k8s/lastwords-admin.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/lastwords-admin.md)] --- ## What do we ship? - Some organizations embrace "you build it, you run it" - When "build" and "run" are owned by different teams, where's the line? - What does the "build" team ship to the "run" team? - Let's see a few options, and what they imply .debug[[k8s/lastwords-admin.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/lastwords-admin.md)] --- ## Shipping code - Team "build" ships code (hopefully in a repository, identified by a commit hash) - Team "run" containerizes that code ✔️ no extra work for developers ❌ very little advantage of using containers .debug[[k8s/lastwords-admin.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/lastwords-admin.md)] --- ## Shipping container images - Team "build" ships container images (hopefully built automatically from a source repository) - Team "run" uses these images to create e.g. 
Kubernetes resources ✔️ universal artefact (support all languages uniformly) ✔️ easy to start a single component (good for monoliths) ❌ complex applications will require a lot of extra work ❌ adding/removing components in the stack also requires extra work ❌ complex applications will run very differently between dev and prod .debug[[k8s/lastwords-admin.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/lastwords-admin.md)] --- ## Shipping Compose files (Or another kind of dev-centric manifest) - Team "build" ships a manifest that works on a single node (as well as images, or ways to build them) - Team "run" adapts that manifest to work on a cluster ✔️ all teams can start the stack in a reliable, deterministic manner ❌ adding/removing components still requires *some* work (but less than before) ❌ there will be *some* differences between dev and prod .debug[[k8s/lastwords-admin.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/lastwords-admin.md)] --- ## Shipping Kubernetes manifests - Team "build" ships ready-to-run manifests (YAML, Helm charts, Kustomize ...) - Team "run" adjusts some parameters and monitors the application ✔️ parity between dev and prod environments ✔️ "run" team can focus on SLAs, SLOs, and overall quality ❌ requires *a lot* of extra work (and new skills) from the "build" team ❌ Kubernetes is not a very convenient development platform (at least, not yet) .debug[[k8s/lastwords-admin.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/lastwords-admin.md)] --- ## What's the right answer? - It depends on our teams - existing skills (do they know how to do it?) - availability (do they have the time to do it?) - potential skills (can they learn to do it?) - It depends on our culture - owning "run" often implies being on call - do we reward on-call duty without encouraging hero syndrome? - do we give people resources (time, money) to learn? .debug[[k8s/lastwords-admin.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/lastwords-admin.md)] --- class: extra-details ## Tools to develop on Kubernetes *If we decide to make Kubernetes the primary development platform, here are a few tools that can help us.* - Docker Desktop - Draft - Minikube - Skaffold - Tilt - ... .debug[[k8s/lastwords-admin.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/lastwords-admin.md)] --- ## Where do we run? - Managed vs. self-hosted - Cloud vs. on-premises - If cloud: public vs. private - Which vendor/distribution to pick? - Which versions/features to enable? .debug[[k8s/lastwords-admin.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/lastwords-admin.md)] --- ## Some guidelines - Start small - Outsource what we don't know - Start simple, and stay simple as long as possible (try to stay away from complex features that we don't need) - Automate (regularly check that we can successfully redeploy by following scripts) - Transfer knowledge (make sure everyone is on the same page/level) - Iterate! 
.debug[[k8s/lastwords-admin.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/lastwords-admin.md)] --- class: pic .interstitial[![Image separating from the next chapter](https://gallant-turing-d0d520.netlify.com/containers/blue-containers.jpg)] --- name: toc-links-and-resources class: title Links and resources .nav[ [Previous section](#toc-whats-next) | [Back to table of contents](#toc-chapter-5) | [Next section](#toc-operators) ] .debug[(automatically generated title slide)] --- # Links and resources - [What is Kubernetes? by Microsoft Azure](https://aka.ms/k8slearning) - [Azure Kubernetes Service](https://docs.microsoft.com/azure/aks/) - [Deis Labs](https://deislabs.io) - Cloud Native Developer Tooling - [Kubernetes Community](https://kubernetes.io/community/) - Slack, Google Groups, meetups - [Local meetups](https://www.meetup.com/) - [devopsdays](https://www.devopsdays.org/) - [Training with Jérôme](https://tinyshellscript.com/) - **Please rate this session!** (with [this link](https://conferences.oreilly.com/oscon/oscon-or/public/schedule/detail/76390)) .footnote[These slides (and future updates) are on → https://container.training/] .debug[[k8s/links-bridget.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/links-bridget.md)] --- class: title, self-paced Thank you! .debug[[shared/thankyou.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/shared/thankyou.md)] --- class: title, in-person That's all, folks!
Questions? ![end](images/end.jpg) .debug[[shared/thankyou.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/shared/thankyou.md)] --- class: pic .interstitial[![Image separating from the next chapter](https://gallant-turing-d0d520.netlify.com/containers/chinook-helicopter-container.jpg)] --- name: toc-operators class: title Operators .nav[ [Previous section](#toc-links-and-resources) | [Back to table of contents](#toc-chapter-5) | [Next section](#toc-) ] .debug[(automatically generated title slide)] --- # Operators - Operators are one of the many ways to extend Kubernetes - We will define operators - We will see how they work - We will install a specific operator (for ElasticSearch) - We will use it to provision an ElasticSearch cluster .debug[[k8s/operators.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/operators.md)] --- ## What are operators? *An operator represents **human operational knowledge in software,**
to reliably manage an application. — [CoreOS](https://coreos.com/blog/introducing-operators.html)* Examples: - Deploying and configuring replication with MySQL, PostgreSQL ... - Setting up Elasticsearch, Kafka, RabbitMQ, Zookeeper ... - Reacting to failures when intervention is needed - Scaling up and down these systems .debug[[k8s/operators.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/operators.md)] --- ## What are they made from? - Operators combine two things: - Custom Resource Definitions - controller code watching the corresponding resources and acting upon them - A given operator can define one or multiple CRDs - The controller code (control loop) typically runs within the cluster (running as a Deployment with 1 replica is a common scenario) - But it could also run elsewhere (nothing mandates that the code run on the cluster, as long as it has API access) .debug[[k8s/operators.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/operators.md)] --- ## Why use operators? - Kubernetes gives us Deployments, StatefulSets, Services ... - These mechanisms give us building blocks to deploy applications - They work great for services that are made of *N* identical containers (like stateless ones) - They also work great for some stateful applications like Consul, etcd ... (with the help of highly persistent volumes) - They're not enough for complex services: - where different containers have different roles - where extra steps have to be taken when scaling or replacing containers .debug[[k8s/operators.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/operators.md)] --- ## Use-cases for operators - Systems with primary/secondary replication Examples: MariaDB, MySQL, PostgreSQL, Redis ... - Systems where different groups of nodes have different roles Examples: ElasticSearch, MongoDB ... 
- Systems with complex dependencies (that are themselves managed with operators) Examples: Flink or Kafka, which both depend on Zookeeper .debug[[k8s/operators.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/operators.md)] --- ## More use-cases - Representing and managing external resources (Example: [AWS Service Operator](https://operatorhub.io/operator/alpha/aws-service-operator.v0.0.1)) - Managing complex cluster add-ons (Example: [Istio operator](https://operatorhub.io/operator/beta/istio-operator.0.1.6)) - Deploying and managing our applications' lifecycles (more on that later) .debug[[k8s/operators.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/operators.md)] --- ## How operators work - An operator creates one or more CRDs (i.e., it creates new "Kinds" of resources on our cluster) - The operator also runs a *controller* that will watch its resources - Each time we create/update/delete a resource, the controller is notified (we could write our own cheap controller with `kubectl get --watch`) .debug[[k8s/operators.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/operators.md)] --- ## One operator in action - We will install the UPMC Enterprises ElasticSearch operator - This operator requires PersistentVolumes - We will install Rancher's [local path storage provisioner](https://github.com/rancher/local-path-provisioner) to automatically create these - Then, we will create an ElasticSearch resource - The operator will detect that resource and provision the cluster .debug[[k8s/operators.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/operators.md)] --- ## Installing a Persistent Volume provisioner (This step can be skipped if you already have a dynamic volume provisioner.) - This provisioner creates Persistent Volumes backed by `hostPath` (local directories on our nodes) - It doesn't require anything special ... - ... But losing a node = losing the volumes on that node! .exercise[ - Install the local path storage provisioner: ```bash kubectl apply -f ~/container.training/k8s/local-path-storage.yaml ``` ] .debug[[k8s/operators.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/operators.md)] --- ## Making sure we have a default StorageClass - The ElasticSearch operator will create StatefulSets - These StatefulSets will instantiate PersistentVolumeClaims - These PVCs need to be explicitly associated with a StorageClass - Or we need to tag a StorageClass to be used as the default one .exercise[ - List StorageClasses: ```bash kubectl get storageclasses ``` ] We should see the `local-path` StorageClass. .debug[[k8s/operators.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/operators.md)] --- ## Setting a default StorageClass - This is done by adding an annotation to the StorageClass: `storageclass.kubernetes.io/is-default-class: true` .exercise[ - Tag the StorageClass so that it's the default one: ```bash kubectl annotate storageclass local-path \ storageclass.kubernetes.io/is-default-class=true ``` - Check the result: ```bash kubectl get storageclasses ``` ] Now, the StorageClass should have `(default)` next to its name. 
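
For reference, the same annotation can also be set declaratively; here is a sketch of what the StorageClass could look like (assuming `rancher.io/local-path` is the provisioner name used by the local path provisioner installed earlier):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-path
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"   # marks this class as the default one
provisioner: rancher.io/local-path    # assumed provisioner name
```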
.debug[[k8s/operators.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/operators.md)] --- ## Install the ElasticSearch operator - The operator needs: - a Deployment for its controller - a ServiceAccount, ClusterRole, ClusterRoleBinding for permissions - a Namespace - We have grouped all the definitions for these resources in a YAML file .exercise[ - Install the operator: ```bash kubectl apply -f ~/container.training/k8s/elasticsearch-operator.yaml ``` ] .debug[[k8s/operators.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/operators.md)] --- ## Wait for the operator to be ready - Some operators require us to create their CRDs separately - This operator will create its CRD itself (i.e. the CRD is not listed in the YAML that we applied earlier) .exercise[ - Wait until the `elasticsearchclusters` CRD shows up: ```bash kubectl get crds ``` ] .debug[[k8s/operators.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/operators.md)] --- ## Create an ElasticSearch resource - We can now create a resource with `kind: ElasticsearchCluster` - The YAML for that resource will specify all the desired parameters: - how many nodes do we want of each type (client, master, data) - image to use - add-ons (kibana, cerebro, ...) - whether to use TLS or not - etc. .exercise[ - Create our ElasticSearch cluster: ```bash kubectl apply -f ~/container.training/k8s/elasticsearch-cluster.yaml ``` ] .debug[[k8s/operators.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/operators.md)] --- ## Operator in action - Over the next few minutes, the operator will create: - StatefulSets (one for master nodes, one for data nodes) - Deployments (for client nodes; and for add-ons like cerebro and kibana) - Services (for all these pods) .exercise[ - Wait for all the StatefulSets to be fully up and running: ```bash kubectl get statefulsets -w ``` ] .debug[[k8s/operators.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/operators.md)] --- ## Connecting to our cluster - Since connecting directly to the ElasticSearch API is a bit raw,
we'll connect to the cerebro frontend instead .exercise[ - Edit the cerebro service to change its type from ClusterIP to NodePort: ```bash kubectl patch svc cerebro-es -p "spec: { type: NodePort }" ``` - Retrieve the NodePort that was allocated: ```bash kubectl get svc cerebro-es ``` - Connect to that port with a browser ] .debug[[k8s/operators.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/operators.md)] --- ## (Bonus) Setup filebeat - Let's send some data to our brand new ElasticSearch cluster! - We'll deploy a filebeat DaemonSet to collect node logs .exercise[ - Deploy filebeat: ```bash kubectl apply -f ~/container.training/k8s/filebeat.yaml ``` ] We should see at least one index being created in cerebro. .debug[[k8s/operators.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/operators.md)] --- ## (Bonus) Access log data with kibana - Let's expose kibana (by making kibana-es a NodePort too) - Then access kibana - We'll need to configure kibana indexes .debug[[k8s/operators.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/operators.md)] --- ## Deploying our apps with operators - It is very simple to deploy with `kubectl run` / `kubectl expose` - We can unlock more features by writing YAML and using `kubectl apply` - Kustomize or Helm let us deploy in multiple environments (and adjust/tweak parameters in each environment) - We can also use an operator to deploy our application .debug[[k8s/operators.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/operators.md)] --- ## Pros and cons of deploying with operators - The app definition and configuration is persisted in the Kubernetes API - Multiple instances of the app can be manipulated with `kubectl get` - We can add labels, annotations to the app instances - Our controller can execute custom code for any lifecycle event - However, we need to write this controller - We need to be careful about changes (what happens when the resource `spec` is updated?) .debug[[k8s/operators.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/operators.md)] --- ## Operators are not magic - Look at the ElasticSearch resource definition (`~/container.training/k8s/elasticsearch-cluster.yaml`) - What should happen if we flip the `use-tls` flag? Twice? - What should happen if we remove / re-add the kibana or cerebro sections? - What should happen if we change the number of nodes? - What if we want different images or parameters for the different nodes? *Operators can be very powerful, iff we know exactly the scenarios that they can handle.* .debug[[k8s/operators.md](https://github.com/jpetazzo/container.training/tree/oscon2019/slides/k8s/operators.md)]
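
To tie this back to "operators combine a CRD and a controller", here is a minimal, hypothetical sketch of the two kinds of objects involved (this is *not* the actual CRD of the ElasticSearch operator; the group, kind, and fields are invented for illustration):

```yaml
# Hypothetical CRD that an operator could register
# (apiextensions.k8s.io/v1beta1, the CRD API version current for these slides)
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: searchclusters.example.com     # must be <plural>.<group>
spec:
  group: example.com
  version: v1alpha1
  scope: Namespaced
  names:
    plural: searchclusters
    singular: searchcluster
    kind: SearchCluster
```

...and an instance of that hypothetical type, which the operator's controller would watch (and for which it must decide how to react when fields like `useTLS` change):

```yaml
apiVersion: example.com/v1alpha1
kind: SearchCluster
metadata:
  name: demo
spec:
  dataNodes: 3
  masterNodes: 2
  useTLS: false
```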