Kubernetes Notes
These notes summarize the documentation at https://kubernetes.io/docs/concepts/
Concepts
Kubernetes solves the problem of resource allocation: scheduling containerized workloads across a cluster of machines.
Objects in Kubernetes
- Pods have unique names
- All k8s objects have a UID
- Labels and selectors: to tag and organize objects; helpful from the CLI or UI:
kubectl get pods -l environment=production,tier=frontend
kubectl get pods -l 'environment,environment notin (frontend)'
- Recommended labels:
  app.kubernetes.io/name: mysql
  app.kubernetes.io/instance: mysql-abcxzy
  app.kubernetes.io/version: "5.7.21"
  app.kubernetes.io/component: database
  app.kubernetes.io/part-of: wordpress
  app.kubernetes.io/managed-by: helm
- Namespaces are used to separate environments (useful to allocate resources to users/teams via ResourceQuota)
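A minimal ResourceQuota sketch for such a namespace (all names and limits here are hypothetical):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota       # hypothetical name
  namespace: team-a      # hypothetical namespace
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    pods: "20"
```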
- DNS when creating a service:
<service-name>.<namespace-name>.svc.cluster.local
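A sketch of a Service that would get such a DNS name (names are hypothetical):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service         # hypothetical
  namespace: my-namespace  # hypothetical
spec:
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
# Resolvable inside the cluster as:
# my-service.my-namespace.svc.cluster.local
```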
- Not everything has a namespace; to see which resources are cluster-scoped:
kubectl api-resources --namespaced=false
- Annotations: not queryable like Labels; can contain structured or unstructured object information (build info, client library versions, URLs, etc.)
- Owners and dependents: e.g. a ReplicaSet owns a set of Pods, a Service owns an EndpointSlice.
Kubernetes components
Control Plane:
- kube-apiserver: exposes the k8s API; scales horizontally (by deploying more instances)
- etcd: HA key-value store for cluster data
- kube-scheduler: watches for new pods without nodes and selects a node for them to run on. Manages scheduling decisions (hardware/software/policy constraints, affinity, etc.)
- kube-controller-manager: runs multiple controllers in a single binary (node, job, endpointslice, serviceaccount controllers)
- cloud-controller-manager: cloud-provider-specific controllers (node, route, service controllers)
Node components:
- kubelet: makes sure containers are running in a Pod
- kube-proxy: network proxy that allows pods to communicate inside/outside of the cluster
On-prem scenarios
- Ansible Kubespray (for medium-large setups)
- kubeadm (for small setups and demos)
- Rancher RKE: rke up (requires a cluster.yml)
- Red Hat OpenShift
- VMware Tanzu Kubernetes
Reasons to go on-prem: compliance, lower cost, staying cloud-agnostic.
Critical components:
- Etcd HA (backups)
- LB: f5, metallb, haproxy
- HA: multiple nodes AZ
- Persistent Storage: via CSI plugins (block or file storage)
- Upgrades, every 3 mo
- Master nodes: etcd quorum needs an odd member count, typically min 3, max 7
- Node reqs (example large node): 32 CPU cores, 2 TB RAM(?), 4 SSDs, 8 SATA SSDs, 2× 10G NICs
Containers
A container image is like an immutable software package. A container runtime is the service that runs the image. Kubernetes supports container runtimes that implement the CRI (Container Runtime Interface), such as containerd and CRI-O (Docker Engine needs the cri-dockerd adapter since dockershim was removed in v1.24).
Kubernetes has an imagePullPolicy that specifies whether k8s should pull the image always (Always), never (Never), or only if it doesn't exist locally (IfNotPresent). As a best practice, avoid :latest images: they make it harder to roll back, because the previous image was also :latest. You can also pin an image by digest in the name: server-image.com/image-name:thetag@sha256:xxxxxxxxxxxxx
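A sketch of a Pod spec pinning an image by digest and setting the pull policy (registry, image name, and digest are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pinned-pod   # hypothetical
spec:
  containers:
    - name: app
      # the digest pins the exact image content; the tag is informational
      image: registry.example.com/image-name:1.2.3@sha256:xxxxxxxxxxxxx
      imagePullPolicy: IfNotPresent   # Always | Never | IfNotPresent
```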
Authentication: keep in mind that Kubernetes needs to authenticate against the image registry; this is done with k8s Secrets referenced via imagePullSecrets.
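A sketch of referencing such a registry Secret from a Pod (the Secret regcred is hypothetical; it would typically be created with kubectl create secret docker-registry):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: private-image-pod   # hypothetical
spec:
  imagePullSecrets:
    - name: regcred         # hypothetical Secret of type kubernetes.io/dockerconfigjson
  containers:
    - name: app
      image: registry.example.com/private-app:1.0   # hypothetical
```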
Workloads
Container images are deployed in Pods on Kubernetes. Kubernetes itself manages the workload via controllers to make sure Pods are properly created across the cluster nodes.
The following are built-in workload objects: Deployment and ReplicaSet, StatefulSet, DaemonSet, Job and CronJob. If you need a special kind of workload that doesn't exist, you can create your own using a CRD (Custom Resource Definition).
Pods
- Pods are created on your behalf via Deployment, StatefulSet or DaemonSet
- Pods share context via Linux namespaces and cgroups, so multiple containers can run in the same context.
- Containers in the pod share the same network namespace and the same IP
- Containers can share Storage volume
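A minimal sketch of two containers sharing a volume in one Pod (names are hypothetical; busybox is used for illustration):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shared-context-pod   # hypothetical
spec:
  volumes:
    - name: shared-data
      emptyDir: {}            # scratch volume shared by both containers
  containers:
    - name: writer
      image: busybox:1.36
      command: ["sh", "-c", "while true; do date >> /data/out.log; sleep 5; done"]
      volumeMounts:
        - name: shared-data
          mountPath: /data
    - name: reader
      image: busybox:1.36
      # same network namespace: it could also reach "writer" via localhost
      command: ["sh", "-c", "tail -F /data/out.log"]
      volumeMounts:
        - name: shared-data
          mountPath: /data
```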
- A container in a pod can run with elevated privileges: via privileged on Linux, or windowsOptions.hostProcess on Windows
- A pod can have a single container or multiple containers (logs/metrics collection, config reload watcher, proxies like envoy/istio)
- You can modify a running Pod via patch or replace, but with limitations: you cannot modify most fields, including most of the metadata; updates are mostly limited to spec.containers[*].image, spec.initContainers[*].image, and additions to spec.tolerations
- Lifecycle phases: Pending, Running, Succeeded, Failed, Unknown
- Probes:
  - Check mechanisms: exec, grpc, httpGet, tcpSocket
  - Probe outcomes: Success, Failure, Unknown
  - Types of probe: livenessProbe, readinessProbe, startupProbe
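A sketch of the three probe types on one container (paths, ports, and timings are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probed-pod   # hypothetical
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0   # hypothetical
      startupProbe:               # gates the other probes until it succeeds
        httpGet:
          path: /healthz
          port: 8080
        failureThreshold: 30
        periodSeconds: 2
      livenessProbe:              # restarts the container on failure
        tcpSocket:
          port: 8080
        periodSeconds: 10
      readinessProbe:             # removes the Pod from Service endpoints on failure
        httpGet:
          path: /ready
          port: 8080
        periodSeconds: 5
```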
- Termination of a pod: by default there is a 30-second grace period after the SIGTERM signal; alternatively you can specify --grace-period=0 (with --force) to terminate the pod immediately.
- A pod can have init containers: containers that run first, to do preparatory work before the app containers start. Work like cloning a git repo, configuring or resolving secrets, or just waiting for some external HTTP dependencies.
  - There can be multiple init containers, defined under spec.initContainers in the manifest yaml file
  - Init containers cannot have ports linked to a Service
  - A pod cannot be Ready until its init containers have succeeded
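A sketch of an init container waiting for a dependency before the app starts (service name and images are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: init-demo   # hypothetical
spec:
  initContainers:
    - name: wait-for-db       # must complete before the app container starts
      image: busybox:1.36
      command: ["sh", "-c", "until nc -z db-service 5432; do sleep 2; done"]
  containers:
    - name: app
      image: registry.example.com/app:1.0   # hypothetical
```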
- PodDisruptionBudget: limits how many Pods of an application can be down at once during voluntary disruptions (e.g. node drains)
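A minimal PodDisruptionBudget sketch (name and selector are hypothetical):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: app-pdb        # hypothetical
spec:
  minAvailable: 2      # could use maxUnavailable instead
  selector:
    matchLabels:
      app: my-app      # hypothetical
```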
- You cannot add containers to a running Pod
- You need to troubleshoot? Use Ephemeral Containers via kubectl debug (POD_NAME) --image=(DEBUG_IMAGE) --target=(EXISTING_CONTAINER_NAME). Once it is running you can kubectl attach or kubectl exec, then start analyzing the Pod.
Workload resources
tbd
Topics to review
- Kubernetes architecture: control plane, etcd, api server, scheduler, controllers
- nodes: kubelet, container runtime, kube-proxy
- authentication and authorization
- How does Docker work?
- deployment & services
- configuration & secrets
- ingress, statefulset, daemonset, jobs, crds
- network policies, persistent volumes, storage classes
- prometheus, grafana, ELK, fluentd
- helm, fluxcd
- rbac, security contexts
- scaling: HPA and VPA, cluster autoscaling
Random notes
- kubectl exec works thanks to the kube-apiserver, the kubelet, and WebSockets over HTTPS
- each pod has cgroups to limit resource allocations
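A sketch of the resource requests/limits that end up enforced via cgroups (values and names are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: limited-pod   # hypothetical
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0   # hypothetical
      resources:
        requests:        # used by the scheduler to place the pod
          cpu: 250m
          memory: 128Mi
        limits:          # enforced on the node via cgroups
          cpu: 500m
          memory: 256Mi
```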
TODOs
- Learn linux namespaces & cgroups
- PoC run two containers in a pod. Ping container A to container B.