CKA Exam Commands

Basics

Create pod in finance namespace
k run redis --image=redis -n finance

Create a service and expose it on port 6379
apiVersion: v1
kind: Service
metadata:
  name: redis-service
spec:
  selector:
    app.kubernetes.io/name: MyApp
  ports:
    - protocol: TCP
      port: 6379
      targetPort: 6379

Create a deployment named webapp using the image kodekloud/webapp-color with 3 replicas
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: kodekloud/webapp-color
        ports:
        - containerPort: 80

Create a new pod called custom-nginx using the nginx image and expose it on container port 8080
apiVersion: v1
kind: Pod
metadata:
  name: custom-nginx
  labels:
    tier: db
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 8080

Create a new namespace called dev-ns.

k create ns dev-ns

Create a pod called httpd using the image httpd:alpine in the default namespace. Next, create a service of type ClusterIP by the same name (httpd). The target port for the service should be 80
k run httpd --image=httpd:alpine --port=80 --expose

service/httpd created
pod/httpd created

Scheduling

Node Affinity

# apply label
k label nodes node01 color=blue

# create a deployment 
k create deployment blue --replicas=3 --image=nginx

# check the taints on the node
kubectl describe node controlplane | grep -i taints 

Create a new deployment named red with the nginx image and 2 replicas, and ensure it gets placed on the controlplane node only. Use the label key node-role.kubernetes.io/control-plane which is already set on the controlplane node.
# use the exists operator as shown below
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: node-role.kubernetes.io/control-plane
            operator: Exists
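Putting that snippet in context, a complete manifest for the red deployment could look like the sketch below (the affinity block sits under the pod template's spec; the app: red labels are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: red
spec:
  replicas: 2
  selector:
    matchLabels:
      app: red          # illustrative label; any matching pair works
  template:
    metadata:
      labels:
        app: red
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node-role.kubernetes.io/control-plane
                operator: Exists    # match on key presence only, no value
      containers:
      - name: nginx
        image: nginx
```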

Manual Scheduling

Manually schedule the pod on node01
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  nodeName: node01
  containers:
  - image: nginx
    name: nginx

Labels and Selectors

# count the number of pods with env=dev label
k get po -l env=dev --no-headers | wc -l  # --no-headers so the header row isn't counted

# Identify the POD which is part of the prod environment, the finance BU and of frontend tier?
k get pod  --selector env=prod,bu=finance,tier=frontend  --show-labels

Taints and Tolerations

Node affinity is a property of Pods that attracts them to a set of nodes (either as a preference or a hard requirement). Taints are the opposite -- they allow a node to repel a set of pods.

Tolerations are applied to pods. Tolerations allow the scheduler to schedule pods with matching taints. Tolerations allow scheduling but don't guarantee scheduling: the scheduler also evaluates other parameters as part of its function.

Taints and tolerations work together to ensure that pods are not scheduled onto inappropriate nodes. One or more taints are applied to a node; this marks that the node should not accept any pods that do not tolerate the taints.

# You add a taint to a node using kubectl taint. For example,
kubectl taint nodes node1 key1=value1:NoSchedule


# To remove the taint added by the command above, you can run:
kubectl taint nodes node1 key1=value1:NoSchedule-

# Specify a toleration for a pod in the PodSpec. With operator "Equal" the key and value must both match the taint; with "Exists" only the key needs to match (no value is given).
tolerations:
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoSchedule"


tolerations:
- key: "key1"
  operator: "Exists"
  effect: "NoSchedule"

# Here's an example of a pod that uses tolerations:
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  tolerations:
  - key: "example-key"
    operator: "Exists"
    effect: "NoSchedule"

Check taints on the nodes
k describe  nodes  node01 | grep -i taint # use describe instead of get

Create a taint on node01 with key of spray, value of mortein and effect of NoSchedule
k taint node node01 spray=mortein:NoSchedule

Create another pod named bee with the nginx image, which has a toleration set to the taint mortein.

First do dry run using

k run bee --image=nginx --dry-run=client -o yaml > test_pod.yaml

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: bee
  name: bee
spec:
  tolerations:
  - key: "spray"
    operator: "Equal"
    value: "mortein"
    effect: "NoSchedule"
  containers:
  - image: nginx
    name: bee
    resources: {}
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

Resource Limits

If the node where a Pod is running has enough of a resource available, it's possible (and allowed) for a container to use more resource than its request for that resource specifies. However, a container is not allowed to use more than its resource limit.
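Requests and limits are set per container under resources. A minimal sketch (the pod name matches the elephant pod used below; the image and sizes are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: elephant
spec:
  containers:
  - name: mem-stress
    image: polinux/stress   # illustrative image
    resources:
      requests:
        memory: "5Mi"       # scheduler guarantees at least this much
      limits:
        memory: "20Mi"      # container is OOM-killed if it exceeds this
```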

# replace the pod using the replace command
k replace --force  -f /tmp/kubectl-edit-2304618812.yaml

pod "elephant" deleted
pod/elephant replaced

DaemonSets

A DaemonSet ensures that all (or some) Nodes run a copy of a Pod. As nodes are added to the cluster, Pods are added to them. As nodes are removed from the cluster, those Pods are garbage collected. Deleting a DaemonSet will clean up the Pods it created.

Some typical uses of a DaemonSet are:

running a cluster storage daemon on every node
running a logs collection daemon on every node
running a node monitoring daemon on every node
On how many nodes are the pods scheduled by the DaemonSet kube-proxy?
k -n kube-system describe ds kube-proxy  # check the pod status

Deploy a DaemonSet for FluentD Logging with Name: elasticsearch, Namespace: kube-system and Image: registry.k8s.io/fluentd-elasticsearch:1.20

How to create a DS ?

An easy way to create a DaemonSet is to first generate a YAML file for a Deployment with the command kubectl create deployment elasticsearch --image=registry.k8s.io/fluentd-elasticsearch:1.20 -n kube-system --dry-run=client -o yaml > fluentd.yaml. Next, remove the replicas, strategy and status fields from the YAML file using a text editor. Also, change the kind from Deployment to DaemonSet. Finally, create the Daemonset by running kubectl create -f fluentd.yaml
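The result of that edit would be roughly the manifest below (a sketch; the app: elasticsearch labels are illustrative, any matching selector/template pair works):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: elasticsearch
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: elasticsearch     # illustrative label
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
      - name: fluentd-elasticsearch
        image: registry.k8s.io/fluentd-elasticsearch:1.20
```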

Static Pods

How many static pods exist in this cluster in all namespaces?

Run the command kubectl get pods --all-namespaces and look for those with -controlplane appended in the name


What is the path of the directory holding the static pod definition files?

/etc/kubernetes/manifests/

Create a static pod named static-busybox that uses the busybox image and the command sleep 1000
kubectl run --restart=Never --image=busybox static-busybox --dry-run=client -o yaml --command -- sleep 1000 > /etc/kubernetes/manifests/static-busybox.yaml
The path need not be /etc/kubernetes/manifests. Make sure to check the path configured in the kubelet configuration file.
root@controlplane:~# ssh node01 
root@node01:~# ps -ef |  grep /usr/bin/kubelet 
root        4147       1  0 14:05 ?        00:00:00 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock --pod-infra-container-image=registry.k8s.io/pause:3.9
root        4773    4733  0 14:05 pts/0    00:00:00 grep /usr/bin/kubelet

root@node01:~# grep -i staticpod /var/lib/kubelet/config.yaml
staticPodPath: /etc/just-to-mess-with-you

Logging and Monitoring

Identify the POD that consumes the most Memory(bytes) in default namespace.
k top pod
NAME       CPU(cores)   MEMORY(bytes)   
elephant   19m          32Mi            
lion       1m           18Mi            
rabbit     129m         252Mi

Ans is rabbit.

Application Lifecycle Maintenance

What command is run at container startup?
FROM python:3.6-alpine

RUN pip install flask

COPY . /opt/

EXPOSE 8080

WORKDIR /opt

ENTRYPOINT ["python", "app.py"]

Ans is python app.py (the ENTRYPOINT defines the command run at startup)


# Another question
FROM python:3.6-alpine

RUN pip install flask

COPY . /opt/

EXPOSE 8080

WORKDIR /opt

ENTRYPOINT ["python", "app.py"]

CMD ["--color", "red"]

Ans is python app.py --color red


# Question 3

apiVersion: v1
kind: Pod
metadata:
  name: webapp-green
  labels:
    name: webapp-green
spec:
  containers:
  - name: simple-webapp
    image: kodekloud/webapp-color
    command: ["--color","green"]

---

FROM python:3.6-alpine

RUN pip install flask

COPY . /opt/

EXPOSE 8080

WORKDIR /opt

ENTRYPOINT ["python", "app.py"]

CMD ["--color", "red"]    

Ans is --color green (the pod's command field replaces the image's ENTRYPOINT entirely, so the container tries to run --color green)


What command is run at container startup? Assume the image was created from the Dockerfile in this directory
FROM python:3.6-alpine

RUN pip install flask

COPY . /opt/

EXPOSE 8080

WORKDIR /opt

ENTRYPOINT ["python", "app.py"]

CMD ["--color", "red"]

---
apiVersion: v1
kind: Pod
metadata:
  name: webapp-green
  labels:
    name: webapp-green
spec:
  containers:
  - name: simple-webapp
    image: kodekloud/webapp-color
    command: ["python", "app.py"]
    args: ["--color", "pink"]

Ans is python app.py --color pink (the pod's command field overrides the image's ENTRYPOINT, and args overrides the image's CMD)


#Create a pod with the given specifications. By default it displays a blue background. Set the given command line arguments to change it to green. Command line arguments: --color=green
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: webapp-green
  name: webapp-green
spec:
  containers:
  - image: kodekloud/webapp-color
    args: ["--color","green"]
    name: webapp-green
    resources: {}
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

Env variables

# create a cm with 2 params as shown below
k create cm webapp-config-map --from-literal APP_COLOR=darkblue  --from-literal APP_OTHER=disregard

# Update the environment variable on the POD to use only the APP_COLOR key from the newly created ConfigMap. 
apiVersion: v1
kind: Pod
metadata:
  labels:
    name: webapp-color
  name: webapp-color
  namespace: default
spec:
  containers:
    - name: webapp-color
      image: kodekloud/webapp-color
      env:
      - name: APP_COLOR
        valueFrom:
          configMapKeyRef:
            name: webapp-config-map
            key: APP_COLOR

Secrets

# The reason the application is failed is because we have not created the secrets yet. Create a new secret named db-secret with the data given below.
k create secret generic db-secret --from-literal DB_Host=sql01 --from-literal DB_User=root --from-literal DB_Password=password123
Configure webapp-pod to load environment variables from the newly created secret.
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2023-11-19T06:03:03Z"
  labels:
    name: webapp-pod
  name: webapp-pod
  namespace: default
spec:
  containers:
  - image: kodekloud/simple-webapp-mysql
    imagePullPolicy: Always
    name: webapp
    envFrom:
    - secretRef:
        name: db-secret

Multi container pods

Create a multi-container pod with 2 containers
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: yellow
  name: yellow
spec:
  containers:
  - image: busybox
    name: lemon
    resources: {}
  - image: redis
    name: gold
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

Init Containers

# Sample
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
  labels:
    app.kubernetes.io/name: MyApp
spec:
  containers:
  - name: myapp-container
    image: busybox:1.28
    command: ['sh', '-c', 'echo The app is running! && sleep 3600']
  initContainers:
  - name: init-myservice
    image: busybox:1.28
    command: ['sh', '-c', "until nslookup myservice.$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for myservice; sleep 2; done"]
  - name: init-mydb
    image: busybox:1.28
    command: ['sh', '-c', "until nslookup mydb.$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for mydb; sleep 2; done"]

Cluster Maintenance

We need to take node01 out for maintenance. Empty the node of all applications and mark it unschedulable.
k drain node01 --ignore-daemonsets 

node/node01 cordoned
Warning: ignoring DaemonSet-managed Pods: kube-flannel/kube-flannel-ds-f5ttj, kube-system/kube-proxy-vg6wp
evicting pod default/blue-6b478c8dbf-vrh86
evicting pod default/blue-6b478c8dbf-j7g8d
pod/blue-6b478c8dbf-j7g8d evicted
pod/blue-6b478c8dbf-vrh86 evicted
node/node01 drained

# once maintenance is done, make the node schedulable again
k uncordon node01

Drain fails when a pod on the node is not managed by a controller:
k drain node01 --ignore-daemonsets 
node/node01 cordoned
error: unable to drain node "node01" due to error:cannot delete Pods declare no controller (use --force to override): default/hr-app, continuing command...
There are pending nodes to be drained:
node01
cannot delete Pods declare no controller (use --force to override): default/hr-app

What is the current version of the cluster?
Run kubectl get nodes and check the VERSION column.

What is the latest stable version of Kubernetes as of today?
kubeadm upgrade plan
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks.
[upgrade] Running cluster health checks
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.26.0
[upgrade/versions] kubeadm version: v1.26.0
I1119 12:04:28.812883   18925 version.go:256] remote version is much newer: v1.28.4; falling back to: stable-1.26
[upgrade/versions] Target version: v1.26.11
[upgrade/versions] Latest version in the v1.26 series: v1.26.11

Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
COMPONENT   CURRENT       TARGET
kubelet     2 x v1.26.0   v1.26.11

Upgrade to the latest version in the v1.26 series:

COMPONENT                 CURRENT   TARGET
kube-apiserver            v1.26.0   v1.26.11
kube-controller-manager   v1.26.0   v1.26.11
kube-scheduler            v1.26.0   v1.26.11
kube-proxy                v1.26.0   v1.26.11
CoreDNS                   v1.9.3    v1.9.3
etcd                      3.5.6-0   3.5.6-0

It's v1.28.4, as shown in the "remote version" line above.


Upgrade the controlplane components to exact version v1.27.0

Upgrade the kubeadm tool first (if not already done), then the controlplane components, and finally the kubelet. Practice by referring to the Kubernetes documentation page.

Note: While upgrading kubelet, if you hit dependency issues while running the apt-get upgrade kubelet command, use the apt install kubelet=1.27.0-00 command instead.

# On the node01 node, run the following commands:
# If you are on the controlplane node
ssh node01 # to log in to the node01.
# This will update the package lists from the software repository.

apt-get update

#  This will install the kubeadm version 1.27.0.

apt-get install kubeadm=1.27.0-00

#  This will upgrade the node01 configuration.

kubeadm upgrade node

#  This will update the kubelet with the version 1.27.0.

apt-get install kubelet=1.27.0-00 

#  You may need to reload the daemon and restart the kubelet service after it has been upgraded.

systemctl daemon-reload
systemctl restart kubelet

At what address can you reach the ETCD cluster from the controlplane node?
k -n kube-system describe po etcd-controlplane 
Name:                 etcd-controlplane
Namespace:            kube-system
Priority:             2000001000
Priority Class Name:  system-node-critical
Node:                 controlplane/192.25.158.9
Start Time:           Sun, 19 Nov 2023 12:39:56 -0500
Labels:               component=etcd
                    tier=control-plane
Annotations:          kubeadm.kubernetes.io/etcd.advertise-client-urls: https://192.25.158.9:2379
                    kubernetes.io/config.hash: 88719a0e6555d94fd96af8b6011a2af6
                    kubernetes.io/config.mirror: 88719a0e6555d94fd96af8b6011a2af6
                    kubernetes.io/config.seen: 2023-11-19T12:39:38.200107235-05:00
                    kubernetes.io/config.source: file
Status:               Running
SeccompProfile:       RuntimeDefault
IP:                   192.25.158.9
IPs:
IP:           192.25.158.9
Controlled By:  Node/controlplane
Containers:
etcd:
    Container ID:  containerd://f21102066ab677d48612ffc74802a43ae023daa92feeab805b0a80da2e53f495
    Image:         registry.k8s.io/etcd:3.5.7-0
    Image ID:      registry.k8s.io/etcd@sha256:51eae8381dcb1078289fa7b4f3df2630cdc18d09fb56f8e56b41c40e191d6c83
    Port:          <none>
    Host Port:     <none>
    Command:
    etcd
    --advertise-client-urls=https://192.25.158.9:2379
    --cert-file=/etc/kubernetes/pki/etcd/server.crt
    --client-cert-auth=true
    --data-dir=/var/lib/etcd
    --experimental-initial-corrupt-check=true
    --experimental-watch-progress-notify-interval=5s
    --initial-advertise-peer-urls=https://192.25.158.9:2380
    --initial-cluster=controlplane=https://192.25.158.9:2380
    --key-file=/etc/kubernetes/pki/etcd/server.key
    --listen-client-urls=https://127.0.0.1:2379,https://192.25.158.9:2379
    --listen-metrics-urls=http://127.0.0.1:2381
    --listen-peer-urls=https://192.25.158.9:2380
    --name=controlplane
    --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    --peer-client-cert-auth=true
    --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    --snapshot-count=10000
    --trusted-ca-file=/et

check the listen-client-urls as shown above


Backup the Etcd

The master node in our cluster is planned for a regular maintenance reboot tonight. While we do not anticipate anything to go wrong, we are required to take the necessary backups. Take a snapshot of the ETCD database using the built-in snapshot functionality.

Store the backup file at location /opt/snapshot-pre-boot.db

ETCDCTL_API=3 etcdctl --endpoints=https://[127.0.0.1]:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
snapshot save /opt/snapshot-pre-boot.db

# optionally verify the snapshot afterwards
ETCDCTL_API=3 etcdctl snapshot status /opt/snapshot-pre-boot.db

How many clusters are defined in the kubeconfig on the student-node?
k config get-clusters 

NAME
cluster2
cluster1

How to switch from one cluster to another?
k config use-context cluster1
#If you check out the pods running in the kube-system namespace in cluster1, you will notice that etcd is running as a pod:

$  kubectl config use-context cluster1
Switched to context "cluster1".

$  kubectl get pods -n kube-system | grep etcd
etcd-cluster1-controlplane                      1/1     Running   0              9m26s


# This means that ETCD is set up as a Stacked ETCD Topology where the distributed data storage cluster provided by etcd is stacked on top of the cluster formed by the nodes managed by kubeadm that run control plane components.
# Using the external etcd

If you check out the pods running in the kube-system namespace in cluster2, you will notice that there are NO etcd pods running in this cluster!

student-node ~ ➜  kubectl config use-context cluster2
Switched to context "cluster2".

student-node ~ ➜  kubectl get pods -n kube-system  | grep etcd

student-node ~ ✖ 

Also, there is NO static pod configuration for etcd under the static pod path:

student-node ~ ✖ ssh cluster2-controlplane
Welcome to Ubuntu 18.04.6 LTS (GNU/Linux 5.4.0-1086-gcp x86_64)

* Documentation:  https://help.ubuntu.com
* Management:     https://landscape.canonical.com
* Support:        https://ubuntu.com/advantage
This system has been minimized by removing packages and content that are
not required on a system that users do not log into.

To restore this content, you can run the 'unminimize' command.
Last login: Wed Aug 31 05:05:04 2022 from 10.1.127.14

cluster2-controlplane ~ ➜  ls /etc/kubernetes/manifests/ | grep -i etcd

cluster2-controlplane ~ ✖ 

However, if you inspect the process on the controlplane for cluster2, you will see that that the process for the kube-apiserver is referencing an external etcd datastore:

cluster2-controlplane ~ ✖ ps -ef | grep etcd
root        1705    1320  0 05:03 ?        00:00:31 kube-apiserver --advertise-address=10.1.127.3 --allow-privileged=true --authorization-mode=Node,RBAC --client-ca-file=/etc/kubernetes/pki/ca.crt --enable-admission-plugins=NodeRestriction --enable-bootstrap-token-auth=true --etcd-cafile=/etc/kubernetes/pki/etcd/ca.pem --etcd-certfile=/etc/kubernetes/pki/etcd/etcd.pem --etcd-keyfile=/etc/kubernetes/pki/etcd/etcd-key.pem --etcd-servers=https://10.1.127.10:2379 --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key --requestheader-allowed-names=front-proxy-client --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt --requestheader-extra-headers-prefix=X-Remote-Extra- --requestheader-group-headers=X-Remote-Group --requestheader-username-headers=X-Remote-User --secure-port=6443 --service-account-issuer=https://kubernetes.default.svc.cluster.local --service-account-key-file=/etc/kubernetes/pki/sa.pub --service-account-signing-key-file=/etc/kubernetes/pki/sa.key --service-cluster-ip-range=10.96.0.0/12 --tls-cert-file=/etc/kubernetes/pki/apiserver.crt --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
root        5754    5601  0 05:15 pts/0    00:00:00 grep etcd

cluster2-controlplane ~ ➜  

# You can see the same information by inspecting the kube-apiserver pod (which runs as a static pod in the kube-system namespace):

What is the IP address of the External ETCD datastore used in cluster2?
ps -ef | grep etcd # after doing ssh to controlPlane
root        1747    1383  0 20:07 ?        00:05:56 kube-apiserver --advertise-address=192.28.229.12 --allow-privileged=true --authorization-mode=Node,RBAC --client-ca-file=/etc/kubernetes/pki/ca.crt --enable-admission-plugins=NodeRestriction --enable-bootstrap-token-auth=true --etcd-cafile=/etc/kubernetes/pki/etcd/ca.pem --etcd-certfile=/etc/kubernetes/pki/etcd/etcd.pem --etcd-keyfile=/etc/kubernetes/pki/etcd/etcd-key.pem --etcd-servers=https://192.28.229.24:2379 --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key --requestheader-allowed-names=front-proxy-client --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt --requestheader-extra-headers-prefix=X-Remote-Extra- --requestheader-group-headers=X-Remote-Group --requestheader-username-headers=X-Remote-User --secure-port=6443 --service-account-issuer=https://kubernetes.default.svc.cluster.local --service-account-key-file=/etc/kubernetes/pki/sa.pub --service-account-signing-key-file=/etc/kubernetes/pki/sa.key --service-cluster-ip-range=10.96.0.0/12 --tls-cert-file=/etc/kubernetes/pki/apiserver.crt --tls-private-key-file=/etc/kubernetes/pki/apiserver.key

IP is 192.28.229.24
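Rather than eyeballing the long command line, you can pull the flag out with grep (a small helper, not from the lab; the sample line is abridged from the ps output above):

```shell
# Extract the value of --etcd-servers from a kube-apiserver command line.
line='kube-apiserver --advertise-address=192.28.229.12 --etcd-servers=https://192.28.229.24:2379 --secure-port=6443'
echo "$line" | grep -o -- '--etcd-servers=[^ ]*' | cut -d= -f2-
# prints https://192.28.229.24:2379
```

The same pattern works for any flag, e.g. `grep -o -- '--data-dir=[^ ]*'` for the etcd data directory.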


What is the default data directory used the for ETCD datastore used in cluster1?
ps -ef | grep -i etcd
root        1867    1383  0 20:08 ?        00:02:25 etcd --advertise-client-urls=https://192.28.229.9:2379 --cert-file=/etc/kubernetes/pki/etcd/server.crt --client-cert-auth=true --data-dir=/var/lib/etcd --experimental-initial-corrupt-check=true --initial-advertise-peer-urls=https://192.28.229.9:2380 --initial-cluster=cluster1-controlplane=https://192.28.229.9:2380 --key-file=/etc/kubernetes/pki/etcd/server.key --listen-client-urls=https://127.0.0.1:2379,https://192.28.229.9:2379 --listen-metrics-urls=http://127.0.0.1:2381 --listen-peer-urls=https://192.28.229.9:2380 --name=cluster1-controlplane --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt --peer-client-cert-auth=true --peer-key-file=/etc/kubernetes/pki/etcd/peer.key --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt --snapshot-count=10000 --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt

Ans is /var/lib/etcd


---
# First set the context to cluster1:

$  kubectl config use-context cluster1
Switched to context "cluster1".


# Next, inspect the endpoints and certificates used by the etcd pod. We will make use of these to take the backup.

$ kubectl describe  pods -n kube-system etcd-cluster1-controlplane  | grep advertise-client-urls

--advertise-client-urls=https://10.1.218.16:2379

$  kubectl describe  pods -n kube-system etcd-cluster1-controlplane  | grep pki
    --cert-file=/etc/kubernetes/pki/etcd/server.crt
    --key-file=/etc/kubernetes/pki/etcd/server.key
    --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    /etc/kubernetes/pki/etcd from etcd-certs (rw)

    Path:          /etc/kubernetes/pki/etcd

# SSH to the controlplane node of cluster1 and then take the backup using the endpoints and certificates we identified above:

controlplane$ 
ETCDCTL_API=3 etcdctl \
--endpoints=https://10.1.220.8:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
snapshot save /opt/cluster1.db

Snapshot saved at /opt/cluster1.db


# Finally, copy the backup to the student-node. To do this, go back to the student-node and use scp as shown below:

$  scp cluster1-controlplane:/opt/cluster1.db /opt 

An ETCD backup for cluster2 is stored at /opt/cluster2.db. Use this snapshot file to carryout a restore on cluster2 to a new path /var/lib/etcd-data-new
# Step 1. Copy the snapshot file from the student-node to the etcd-server. In the example below, we are copying it to the /root directory:

student-node ~  scp /opt/cluster2.db etcd-server:/root
cluster2.db                                                                                                        100% 1108KB 178.5MB/s   00:00    

student-node ~ ➜  

# Step 2: Restore the snapshot on cluster2. Since we are restoring directly on the etcd-server, we can use the endpoint https://127.0.0.1. Use the same certificates that were identified earlier. Make sure to use the data-dir as /var/lib/etcd-data-new:

etcd-server ~ ➜  ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/etcd/pki/ca.pem --cert=/etc/etcd/pki/etcd.pem --key=/etc/etcd/pki/etcd-key.pem snapshot restore /root/cluster2.db --data-dir /var/lib/etcd-data-new

{"level":"info","ts":1662004927.2399247,"caller":"snapshot/v3_snapshot.go:296","msg":"restoring snapshot","path":"/root/cluster2.db","wal-dir":"/var/lib/etcd-data-new/member/wal","data-dir":"/var/lib/etcd-data-new","snap-dir":"/var/lib/etcd-data-new/member/snap"}
{"level":"info","ts":1662004927.2584803,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"cdf818194e3a8c32","local-member-id":"0","added-peer-id":"8e9e05c52164694d","added-peer-peer-urls":["http://localhost:2380"]}
{"level":"info","ts":1662004927.264258,"caller":"snapshot/v3_snapshot.go:309","msg":"restored snapshot","path":"/root/cluster2.db","wal-dir":"/var/lib/etcd-data-new/member/wal","data-dir":"/var/lib/etcd-data-new","snap-dir":"/var/lib/etcd-data-new/member/snap"}

etcd-server ~ ➜  

# Step 3: Update the systemd service unit file for etcd by running vi /etc/systemd/system/etcd.service and add the new value for data-dir:

[Unit]
Description=etcd key-value store
Documentation=https://github.com/etcd-io/etcd
After=network.target

[Service]
User=etcd
Type=notify
ExecStart=/usr/local/bin/etcd \
--name etcd-server \
--data-dir=/var/lib/etcd-data-new \
---End of Snippet---


# Step 4: Make sure the permissions on the new directory are correct (it should be owned by the etcd user):

etcd-server /var/lib ➜  chown -R etcd:etcd /var/lib/etcd-data-new

etcd-server /var/lib ➜ 


etcd-server /var/lib ➜  ls -ld /var/lib/etcd-data-new/
drwx------ 3 etcd etcd 4096 Sep  1 02:41 /var/lib/etcd-data-new/
etcd-server /var/lib ➜ 

#  Step 5: Finally, reload and restart the etcd service.

etcd-server ~/default.etcd ➜  systemctl daemon-reload 
etcd-server ~ ➜  systemctl restart etcd


#  Step 6 (optional): It is recommended to restart controlplane components (e.g. kube-scheduler, kube-controller-manager, kubelet) to ensure that they don't rely on some stale data. 

Security

View Cert Details

Identify the certificate file used for the kube-api server and Identify the Certificate file used to authenticate kube-apiserver as a client to ETCD Server
controlplane /etc/kubernetes/pki ➜  ls /etc/kubernetes/pki/ | grep .crt
apiserver.crt
apiserver-etcd-client.crt
apiserver-kubelet-client.crt
ca.crt
front-proxy-ca.crt
front-proxy-client.crt

Ans is apiserver.crt for 1st question and apiserver-etcd-client.crt for 2nd

controlplane /etc/kubernetes/pki ➜  ls /etc/kubernetes/pki/ | grep .key
apiserver-etcd-client.key
apiserver.key
apiserver-kubelet-client.key # key used to authenticate kubeapi-server to the kubelet server.
ca.key
front-proxy-ca.key
front-proxy-client.key
sa.key  

Identify the ETCD Server Certificate used to host ETCD server
# TIP: Look for cert-file option in the file /etc/kubernetes/manifests/etcd.yaml.
controlplane /etc/kubernetes/manifests ➜  cat etcd.yaml | grep -i .crt
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt # answer
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt

Identify the ETCD Server CA Root Certificate used to serve ETCD Server

ETCD can have its own CA. So this may be a different CA certificate than the one used by kube-api server.

# TIP: Look for CA Certificate (trusted-ca-file) in file /etc/kubernetes/manifests/etcd.yaml.

controlplane /etc/kubernetes/manifests ➜  cat etcd.yaml | grep -i .crt
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt # ans

What is the Common Name (CN) configured on the Kube API Server Certificate?

OpenSSL Syntax: openssl x509 -in file-path.crt -text -noout

# TIP: Run the command openssl x509 -in /etc/kubernetes/pki/apiserver.crt -text and look for Subject CN.

controlplane /etc/kubernetes/pki ✖ openssl x509 -in apiserver.crt -text | grep -i cn
        Issuer: CN = kubernetes             # Name of CA who issued the cert
        Subject: CN = kube-apiserver        # What is the Common Name (CN) configured on the Kube API Server
MIIDjDCCAnSgAwIBAgIIVPn/5jFVfAMwDQYJKoZIhvcNAQELBQAwFTETMBEGA1UE
nUyXcccNRLfQfrhu9NoD+4Nq7gM99y5QRpD8QimBnv1DBzXk+XWoC2Ka3EpmRzZZ

Which of the below alternate names is not configured on the Kube API Server Certificate?
#TIP: Run the command openssl x509 -in /etc/kubernetes/pki/apiserver.crt -text and look at Alternative Names as shown below

X509v3 Subject Alternative Name: 
DNS:controlplane, DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster.local, IP Address:10.96.0.1, IP Address:192.2.128.9

How long, from the issued date, is the Kube-API Server Certificate valid for?
#TIP:  Run the command openssl x509 -in /etc/kubernetes/pki/apiserver.crt -text and check on the Expiry date.
        Issuer: CN = etcd-ca
        Validity
            Not Before: Nov 20 00:50:29 2023 GMT
            Not After : Nov 19 00:50:29 2024 GMT

Ans: 1 year.

How long, from the issued date, is the Root CA Certificate valid for?
#TIP: Run the command openssl x509 -in /etc/kubernetes/pki/ca.crt -text and look for the validity.
    Data:
        Version: 3 (0x2)
        Serial Number: 0 (0x0)
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: CN = kubernetes
        Validity
            Not Before: Nov 20 00:50:28 2023 GMT
            Not After : Nov 17 00:50:28 2033 GMT
        Subject: CN = kubernetes
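
Rather than running openssl on each file, kubeadm can summarize the expiry of every certificate it manages in one shot (a convenience check; column layout varies by version):

```shell
# Lists remaining validity for apiserver.crt, the etcd certs, the CA, etc.
kubeadm certs check-expiration
```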

The kube-api server stopped again! Check it out. Inspect the kube-api server logs and identify the root cause and fix the issue

Run crictl ps -a command to identify the kube-api server container. Run crictl logs container-id command to view the logs.

crictl ps -a
CONTAINER           IMAGE               CREATED              STATE               NAME                      ATTEMPT             POD ID              POD
3bac921cb4a5a       6f707f569b572       16 seconds ago       Running             kube-apiserver            0                   fec26f7c6715a       kube-apiserver-controlplane
33527b14bc1bf       f73f1b39c3fe8       22 seconds ago       Running             kube-scheduler            2                   fb3dc26664afc       kube-scheduler-controlplane
627a986414afe       95fe52ed44570       25 seconds ago       Running             kube-controller-manager   2                   c674e558cf141       kube-controller-manager-controlplane
5d7b588da6e90       86b6af7dd652c       About a minute ago   Running             etcd                      0                   efd0a8d9c600e       etcd-controlplane
762ccbdfde7a8       f73f1b39c3fe8       4 minutes ago        Exited              kube-scheduler            1                   fb3dc26664afc       kube-scheduler-controlplane
b3e4272eefb7b       95fe52ed44570       4 minutes ago        Exited              kube-controller-manager   1                   c674e558cf141       kube-controller-manager-controlplane
7fa9a78979c3e       ead0a4a53df89       35 minutes ago       Running             coredns                   0                   c23b177006c7a       coredns-5d78c9869d-p8tjq
bf84f7ebb5d43       ead0a4a53df89       35 minutes ago       Running             coredns                   0                   274222b10e501       coredns-5d78c9869d-84jrn
247e828f3e5e3       8b675dda11bb1       35 minutes ago       Running             kube-flannel              0                   579d556b90be1       kube-flannel-ds-6xdvn
b412cc3976452       8b675dda11bb1       35 minutes ago       Exited              install-cni               0                   579d556b90be1       kube-flannel-ds-6xdvn
63734700d5255       fcecffc7ad4af       35 minutes ago       Exited              install-cni-plugin        0                   579d556b90be1       kube-flannel-ds-6xdvn
7c4e662a2827d       5f82fc39fa816       35 minutes ago       Running             kube-proxy                0                   60aabd71d5180       kube-proxy-hs4c4
crictl logs 3bac921cb4a5a    # the kube-apiserver container ID from the listing above
I1120 01:29:05.406801       1 server.go:551] external host was not specified, using 192.2.128.9
I1120 01:29:05.407768       1 server.go:165] Version: v1.27.0
I1120 01:29:05.407793       1 server.go:167] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
I1120 01:29:05.685211       1 shared_informer.go:311] Waiting for caches to sync for node_authorizer
I1120 01:29:05.694542       1 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
I1120 01:29:05.694560       1 plugins.go:161] Loaded 13 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,PodSecurity,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,ClusterTrustBundleAttest,CertificateSubjectRestriction,ValidatingAdmissionPolicy,ValidatingAdmissionWebhook,ResourceQuota.
W1120 01:29:05.700731       1 logging.go:59] [core] [Channel #1 SubChannel #2] grpc: addrConn.createTransport failed to connect to {
"Addr": "127.0.0.1:2379",
"ServerName": "127.0.0.1",
"Attributes": null,
"BalancerAttributes": null,
"Type": 0,
"Metadata": null
}. Err: connection error: desc = "transport: authentication handshake failed: tls: failed to verify certificate: x509: certificate signed by unknown authority"
W1120 01:29:06.688471       1 logging.go:59] [core] [Channel #4 SubChannel #6] grpc: addrConn.createTransport failed to connect to {
"Addr": "127.0.0.1:2379",
"ServerName": "127.0.0.1",
"Attributes": null,
"BalancerAttributes": null,
"Type": 0,
"Metadata": null

The handshake failure against etcd (127.0.0.1:2379) means the API server is verifying etcd's certificate with the wrong CA. Fix the --etcd-cafile flag in /etc/kubernetes/manifests/kube-apiserver.yaml to point at /etc/kubernetes/pki/etcd/ca.crt, then wait for the static pod to restart.
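
A quick way to compare what the API server is configured to trust against what etcd is actually serving (paths are the kubeadm defaults):

```shell
# The CA given to --etcd-cafile must be the CA that signed etcd's
# server certificate, or the handshake above fails.
grep etcd /etc/kubernetes/manifests/kube-apiserver.yaml
grep trusted-ca /etc/kubernetes/manifests/etcd.yaml
```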


A new member akshay joined our team. He requires access to our cluster.

The Certificate Signing Request is at the /root location.

Generate the base64-encoded form of the CSR:

cat akshay.csr | base64 -w 0

Finally, save the YAML below to a file and create a CSR named akshay:

---
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: akshay
spec:
  groups:
  - system:authenticated
  request: <Paste the base64 encoded value of the CSR file>
  signerName: kubernetes.io/kube-apiserver-client
  usages:
  - client auth


kubectl apply -f akshay-csr.yaml

Sample shown below

controlplane ~ ➜  cat csr-mani.yaml 
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: akshay
spec:
  request: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0KTUlJQ1ZqQ0NBVDRDQVFBd0VURVBNQTBHQTFVRUF3d0dZV3R6YUdGNU1JSUJJakFOQmdrcWhraUc5dzBCQVFFRgpBQU9DQVE4QU1JSUJDZ0tDQVFFQTc2WjA5ZlgzLzhaanB1Rlc2aE9xeG1tYW1qb3VRVDZvSDJ0WHltd0xVOGx5Cmg4dTQwV1RtTVRxbzk4Kzk0a3lnOTdKUFRWbDdsWkNRbkZKdmlpTlAzVlRRa0tOU3FOakQzcGRESUxsUXErcHQKeDV2bXhhcUxmTlZocEt5QzdkZlk1L1VEZHNPT05CYit4dWNkNmx4YU5kdTJqMml4alF2aisyOXdRdExvaUYxNQpQNDZ5NkQ0c1dnb04zcWc4Y1RhNTRNcnRPc1FBem1CZHdQcnVXNXFlODBNaGMrQk9HWmx2YlZPcmIzREVINmFOCmNTMzA2SGlwUzl5TkpOMzArdThwd1FtcS9QM0JneHJuOS9DNkhPY1JiaHQ0WTE2Q2hjZUk3anFjcVRHbithcE4KemgxeDN2ZGg3dVNCam1Pb3JsNVpTYW1FcHhOdnBpVkdqNUZMaVBYYW93SURBUUFCb0FBd0RRWUpLb1pJaHZjTgpBUUVMQlFBRGdnRUJBRFpINmFtcXdEMjZDM1dwVjlKNzM1N3hibGhIQVkrK3FOSzR6Qk1jZE01dnUwV1VYK3dGCkFRd2czREFORW52UThMdWJnV3RLaEkwUGxPbjRWK1JSZzIxK01qUFhsUzNDWkZodEN6VE9oY0hwUGVBQnZZQnEKWkthTHBTTVlTdEVqYnNsWGg1dVhiZmxMRHBRSllZNEdTc2tXRStsZnVzTUNyNFhGNzNSVUNFWHdHZGFFNFdpcAp6WGFsb0x1ZFdneGFmVlRSR1JWK2RKMXNuV2pMaWRySVU3NDZxUVZiUW1Gc0pWU2VaTjZNSGRiU0xIZFZNZjJMCmR3dThNcUVpRUQzeUhJT2dCdlowaWJQT1VtMmFrZVFUT2F5cEhCQVJJanRSYVA5cnhuYUM4ckNZK0czb0MvdEQKY1hpajk3ak5pRzU0UWJMMEN5NUdqcEQ2YndTek5iM0dpN3c9Ci0tLS0tRU5EIENFUlRJRklDQVRFIFJFUVVFU1QtLS0tLQo=
  signerName: kubernetes.io/kube-apiserver-client
  expirationSeconds: 86400  # one day
  usages:
  - client auth

csr is created as shown below

controlplane ~ ➜  k apply -f csr-mani.yaml 
certificatesigningrequest.certificates.k8s.io/akshay created

check the CSR's

controlplane ~ ➜  k get csr
NAME        AGE   SIGNERNAME                                    REQUESTOR                  REQUESTEDDURATION   CONDITION
akshay      73s   kubernetes.io/kube-apiserver-client           kubernetes-admin           24h                 Pending
csr-8wnnl   17m   kubernetes.io/kube-apiserver-client-kubelet   system:node:controlplane   <none>              Approved,Issued


Please approve the CSR
controlplane ~ ✖ k certificate approve akshay
certificatesigningrequest.certificates.k8s.io/akshay approved
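
Once approved, the signed certificate can be pulled out of the CSR object and saved for akshay (the output filename is arbitrary):

```shell
# The issued certificate is stored base64-encoded in .status.certificate
kubectl get csr akshay -o jsonpath='{.status.certificate}' | base64 -d > akshay.crt
```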

Inspect the new CSR agent-smith that just appeared
# TIP: use the get -o yaml instead of describe
controlplane ~ ➜  k get  csr agent-smith -o yaml 

apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  creationTimestamp: "2023-11-20T02:04:44Z"
  name: agent-smith
  resourceVersion: "2170"
  uid: 8eeaf351-96ac-488a-8d83-cab1edc14605
spec:
  groups:
  - system:masters
  - system:authenticated
  request: XXXX_OMITTED
  signerName: kubernetes.io/kube-apiserver-client
  usages:
  - digital signature
  - key encipherment
  - server auth
  username: agent-x
status: {}
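
This request comes from the system:masters group, which would grant a cluster-admin-level identity, so it should not be approved; it can be rejected and cleaned up with:

```shell
kubectl certificate deny agent-smith
kubectl delete csr agent-smith
```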

Kubeconfig

Where is the default kubeconfig file located in the current environment?

/root/.kube/config
How many clusters are defined in the default kubeconfig file?
# Run the kubectl config view command and count the number of clusters.
controlplane ~/.kube ➜  k config view
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: DATA+OMITTED
    server: https://controlplane:6443
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: kubernetes-admin
  name: kubernetes-admin@kubernetes
current-context: kubernetes-admin@kubernetes
kind: Config
preferences: {}
users:  # number of users
- name: kubernetes-admin
  user:
    client-certificate-data: DATA+OMITTED
    client-key-data: DATA+OMITTED

Let's inspect the custom kubeconfig file:

~/.kube ➜  cat /root/my-kube-config 
apiVersion: v1
kind: Config

clusters: # 4 clusters are configured
- name: production
  cluster:
    certificate-authority: /etc/kubernetes/pki/ca.crt
    server: https://controlplane:6443

- name: development
  cluster:
    certificate-authority: /etc/kubernetes/pki/ca.crt
    server: https://controlplane:6443

- name: kubernetes-on-aws
  cluster:
    certificate-authority: /etc/kubernetes/pki/ca.crt
    server: https://controlplane:6443

- name: test-cluster-1
  cluster:
    certificate-authority: /etc/kubernetes/pki/ca.crt
    server: https://controlplane:6443

contexts:
- name: test-user@development
  context:
    cluster: development
    user: test-user

- name: aws-user@kubernetes-on-aws
  context:
    cluster: kubernetes-on-aws
    user: aws-user

- name: test-user@production
  context:
    cluster: production
    user: test-user

- name: research
  context:
    cluster: test-cluster-1
    user: dev-user  # user for research context

users:
- name: test-user
  user:
    client-certificate: /etc/kubernetes/pki/users/test-user/test-user.crt
    client-key: /etc/kubernetes/pki/users/test-user/test-user.key
- name: dev-user
  user:
    client-certificate: /etc/kubernetes/pki/users/dev-user/developer-user.crt
    client-key: /etc/kubernetes/pki/users/dev-user/dev-user.key
- name: aws-user
  user:
    client-certificate: /etc/kubernetes/pki/users/aws-user/aws-user.crt
    client-key: /etc/kubernetes/pki/users/aws-user/aws-user.key

current-context: test-user@development # current context
preferences: {}

I would like to use the dev-user to access test-cluster-1. Set the current context to the right one so I can do that
# TIP: use the right context file as well
controlplane ~ ➜  k config --kubeconfig /root/my-kube-config use-context research
Switched to context "research".

# Test it
controlplane ~ ➜  k config --kubeconfig /root/my-kube-config current-context 
research
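
To avoid repeating --kubeconfig on every command, the custom file can be made the default for the current shell (a convenience, not required by the lab):

```shell
export KUBECONFIG=/root/my-kube-config
kubectl config current-context    # research
```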

RBAC

Role-based access control (RBAC) is a method of regulating access to computer or network resources based on the roles of individual users within your organization.

The RBAC API declares four kinds of Kubernetes object: Role, ClusterRole, RoleBinding and ClusterRoleBinding.

An RBAC Role or ClusterRole contains rules that represent a set of permissions. Permissions are purely additive (there are no "deny" rules). A Role always sets permissions within a particular namespace; when you create a Role, you have to specify the namespace it belongs in. ClusterRole, by contrast, is a non-namespaced resource. The resources have different names (Role and ClusterRole) because a Kubernetes object always has to be either namespaced or not namespaced; it can't be both.

# Read role
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: pod-reader
rules:
- apiGroups: [""] # "" indicates the core API group
  resources: ["pods"]
  verbs: ["get", "watch", "list"]

ClusterRole example shown below

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  # "namespace" omitted since ClusterRoles are not namespaced
  name: secret-reader
rules:
- apiGroups: [""]
  #
  # at the HTTP level, the name of the resource for accessing Secret
  # objects is "secrets"
  resources: ["secrets"]
  verbs: ["get", "watch", "list"]

A role binding grants the permissions defined in a role to a user or set of users. It holds a list of subjects (users, groups, or service accounts), and a reference to the role being granted. A RoleBinding grants permissions within a specific namespace whereas a ClusterRoleBinding grants that access cluster-wide.

apiVersion: rbac.authorization.k8s.io/v1
# This role binding allows "jane" to read pods in the "default" namespace.
# You need to already have a Role named "pod-reader" in that namespace.
kind: RoleBinding
metadata:
  name: read-pods
  namespace: default
subjects:
# You can specify more than one "subject"
- kind: User
  name: jane # "name" is case sensitive
  apiGroup: rbac.authorization.k8s.io
roleRef:
  # "roleRef" specifies the binding to a Role / ClusterRole
  kind: Role #this must be Role or ClusterRole
  name: pod-reader # this must match the name of the Role or ClusterRole you wish to bind to
  apiGroup: rbac.authorization.k8s.io

Using ClusterRole in RoleBinding

A RoleBinding can also reference a ClusterRole to grant the permissions defined in that ClusterRole to resources inside the RoleBinding's namespace. This kind of reference lets you define a set of common roles across your cluster, then reuse them within multiple namespaces.

To grant permissions across a whole cluster, you can use a ClusterRoleBinding. The following ClusterRoleBinding allows any user in the group "manager" to read secrets in any namespace.

apiVersion: rbac.authorization.k8s.io/v1
# This cluster role binding allows anyone in the "manager" group to read secrets in any namespace.
kind: ClusterRoleBinding
metadata:
  name: read-secrets-global
subjects:
- kind: Group
  name: manager # Name is case sensitive
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: secret-reader
  apiGroup: rbac.authorization.k8s.io

Question

Inspect the environment and identify the authorization modes configured on the cluster.

# Use the command kubectl describe pod kube-apiserver-controlplane -n kube-system and look for --authorization-mode.

controlplane ~ ➜  k -n kube-system describe po kube-apiserver-controlplane | grep -i auth
      --authorization-mode=Node,RBAC
      --enable-bootstrap-token-auth=true

What are the resources the kube-proxy role in the kube-system namespace is given access to?
controlplane ~ ➜  k describe role kube-proxy -n kube-system 
Name:         kube-proxy
Labels:       <none>
Annotations:  <none>
PolicyRule:
Resources   Non-Resource URLs  Resource Names  Verbs
---------   -----------------  --------------  -----
configmaps  []                 [kube-proxy]    [get]

Which account is the kube-proxy role assigned to?
controlplane ~ ➜  kubectl describe rolebinding kube-proxy -n kube-system
Name:         kube-proxy
Labels:       <none>
Annotations:  <none>
Role:
Kind:  Role
Name:  kube-proxy
Subjects:
Kind   Name                                             Namespace
----   ----                                             ---------
Group  system:bootstrappers:kubeadm:default-node-token 

A user dev-user is created. User's details have been added to the kubeconfig file. Inspect the permissions granted to the user. Check if the user can list pods in the default namespace.
k get pods --as dev-user
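
kubectl auth can-i answers permission questions directly without attempting the operation, and combines with impersonation:

```shell
# As yourself (admin): yes
kubectl auth can-i list pods
# As dev-user: no, until a role and binding are created
kubectl auth can-i list pods --as dev-user
```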

Create the necessary roles and role bindings required for the dev-user to create, list and delete pods in the default namespace
controlplane ~ ✖ kubectl create role developer  --verb=create --verb=list --verb=delete --resource=pods

role.rbac.authorization.k8s.io/developer created

Create a binding for it as shown below

controlplane ~ ✖ kubectl create rolebinding dev-user-binding --role=developer --user=dev-user --namespace=default

rolebinding.rbac.authorization.k8s.io/dev-user-binding created


controlplane ~ ➜  k get pods --as dev-user
NAME                   READY   STATUS    RESTARTS   AGE
red-697496b845-2srbh   1/1     Running   0          18m
red-697496b845-n4zsd   1/1     Running   0          18m


What user/groups are the cluster-admin role bound to?

The ClusterRoleBinding for this role has the same name.

controlplane ~ ➜  k get  clusterrolebindings.rbac.authorization.k8s.io cluster-admin -o yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  creationTimestamp: "2023-11-21T01:48:21Z"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: cluster-admin
  resourceVersion: "134"
  uid: 4330de0f-ef56-42ef-8ea9-c2bb3e26f4a2
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:masters # answer

What level of permission does the cluster-admin role grant?
controlplane ~ ➜  k get  clusterrole  cluster-admin -o yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  creationTimestamp: "2023-11-21T01:48:20Z"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: cluster-admin
  resourceVersion: "72"
  uid: c5bab975-ad5f-48bb-837c-65aae7200b9e
rules:
- apiGroups:
  - '*'
  resources:
  - '*'
  verbs:
  - '*'
- nonResourceURLs:
  - '*'
  verbs:
  - '*'

Answer: Perform any action on any resource in the cluster.


A new user michelle joined the team. She will be focusing on the nodes in the cluster. Create the required ClusterRoles and ClusterRoleBindings so she gets access to the nodes
controlplane ~ ➜  kubectl create clusterrole node-reader --verb=get,list,watch --resource=nodes

clusterrole.rbac.authorization.k8s.io/node-reader created


controlplane ~ ✖ kubectl create clusterrolebinding node-reader-cluster-binding --clusterrole=node-reader --user=michelle

clusterrolebinding.rbac.authorization.k8s.io/node-reader-cluster-binding created

Michelle's responsibilities are growing and now she will be responsible for storage as well. Create the required ClusterRoles and ClusterRoleBindings to allow her access to Storage

Get the API groups and resource names from command kubectl api-resources. Use the given spec:

ClusterRole: storage-admin
Resource: persistentvolumes
Resource: storageclasses
ClusterRoleBinding: michelle-storage-admin
ClusterRoleBinding Subject: michelle
ClusterRoleBinding Role: storage-admin

Answer is shown below

controlplane ~ ➜  kubectl create clusterrole storage-admin  --verb=get,list,watch --resource=persistentvolumes --verb=get,list --resource=storageclasses
clusterrole.rbac.authorization.k8s.io/storage-admin created

controlplane ~ ➜  kubectl create clusterrolebinding michelle-storage-admin  --clusterrole=storage-admin --user=michelle
clusterrolebinding.rbac.authorization.k8s.io/michelle-storage-admin created
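
Both grants can be sanity-checked with impersonation before handing things over to michelle:

```shell
kubectl auth can-i list nodes --as michelle               # yes
kubectl auth can-i watch persistentvolumes --as michelle  # yes
```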

Service Accounts

  • A service account is a type of non-human account that, in Kubernetes, provides a distinct identity in a Kubernetes cluster.

  • Application Pods, system components, and entities inside and outside the cluster can use a specific ServiceAccount's credentials to identify as that ServiceAccount.

  • Each service account is bound to a Kubernetes namespace. Every namespace gets a default ServiceAccount upon creation.


Which ServiceAccount is used by the deployment?

controlplane ~ ➜  k get po web-dashboard-97c9c59f6-x2zdd -o yaml | grep -i service

- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
enableServiceLinks: true
serviceAccount: default
serviceAccountName: default
    - serviceAccountToken:

The application needs a ServiceAccount with the right permissions to authenticate to Kubernetes. The default ServiceAccount has limited access. Create a new ServiceAccount named dashboard-sa.

controlplane ~ ➜  k create sa dashboard-sa

serviceaccount/dashboard-sa created

Create a new token for the ServiceAccount:

controlplane ~ ➜  k create token dashboard-sa 
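
To actually make the deployment run under the new ServiceAccount, update its pod template; kubectl can do this without editing YAML (deployment name taken from the earlier output):

```shell
# Patches .spec.template.spec.serviceAccountName and triggers a rollout
kubectl set serviceaccount deploy/web-dashboard dashboard-sa
```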

Image Security

What secret type must we choose for a Docker registry?

root@controlplane ~ ➜  k create secret --help 
Create a secret using specified subcommand.

Available Commands:
  docker-registry   Create a secret for use with a Docker registry # answer
  generic           Create a secret from a local file, directory, or literal value
  tls               Create a TLS secret

Usage:
  kubectl create secret [flags] [options]


We decided to use a modified version of the application from an internal private registry. Update the image of the deployment to use a new image from myprivateregistry.com:5000

# update the image as shown below
    spec:
      containers:
      - image: myprivateregistry.com:5000/nginx:alpine
        imagePullPolicy: IfNotPresent

Create a secret object with the credentials required to access the registry.

Name: private-reg-cred
Username: dock_user
Password: dock_password
Server: myprivateregistry.com:5000
Email: dock_user@myprivateregistry.com

Create using below

root@controlplane ~ ➜  kubectl create secret docker-registry private-reg-cred   --docker-email=dock_user@myprivateregistry.com    --docker-username=dock_user   --docker-password=dock_password   --docker-server=myprivateregistry.com:5000

secret/private-reg-cred created
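
The result is an ordinary kubernetes.io/dockerconfigjson secret; the stored registry, username and email can be confirmed by decoding it (shown for verification only):

```shell
# The key ".dockerconfigjson" contains a dot, hence the jsonpath escape
kubectl get secret private-reg-cred -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d
```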


Configure the deployment to use credentials from the new secret to pull images from the private registry

# Add the imagepull secret as shown below
    spec:
      containers:
      - image: myprivateregistry.com:5000/nginx:alpine
        imagePullPolicy: IfNotPresent
        name: nginx
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      imagePullSecrets:
      - name: private-reg-cred

Security Contexts

A security context defines privilege and access control settings for a Pod or Container. Security context settings include, but are not limited to:

Discretionary Access Control: Permission to access an object, like a file, is based on user ID (UID) and group ID (GID).

Security Enhanced Linux (SELinux): Objects are assigned security labels.

Running as privileged or unprivileged.

Linux Capabilities: Give a process some privileges, but not all the privileges of the root user.

# Sample Security Context 
apiVersion: v1
kind: Pod
metadata:
  name: security-context-demo
spec:
  securityContext:  # pod-level: applies to all containers
    runAsUser: 1000
    runAsGroup: 3000
    fsGroup: 2000
  volumes:
  - name: sec-ctx-vol
    emptyDir: {}
  containers:
  - name: sec-ctx-demo
    image: busybox:1.28
    command: [ "sh", "-c", "sleep 1h" ]
    volumeMounts:
    - name: sec-ctx-vol
      mountPath: /data/demo
    securityContext:   # container-level: overrides pod-level settings
      allowPrivilegeEscalation: false

What is the user used to execute the sleep process within the ubuntu-sleeper pod?
# Check the user by checking the security context or doing the exec -it

controlplane ~ ➜  k get  po ubuntu-sleeper -o yaml | grep -i securi
securityContext: {}



controlplane ~ ➜  k exec -it ubuntu-sleeper -- /bin/bash
root@ubuntu-sleeper:/# whoami
root

Edit the pod ubuntu-sleeper to run the sleep process with user ID 1010
spec:
  containers:
  - command:
    - sleep
    - "4800"
    image: ubuntu
    imagePullPolicy: Always
    name: ubuntu
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-grl6f
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: controlplane
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    runAsUser: 1010  ## add this

A Pod definition file named multi-pod.yaml is given. With what user are the processes in the web container started?
controlplane ~ ➜  cat multi-pod.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: multi-pod
spec:
  securityContext:
    runAsUser: 1001
  containers:
  - image: ubuntu
    name: web
    command: ["sleep", "5000"]
    securityContext:
      runAsUser: 1002   # ans: the container-level user overrides the pod-level one
  - image: ubuntu
    name: sidecar
    command: ["sleep", "5000"]

Update pod ubuntu-sleeper to run as Root user and with the SYS_TIME capability
apiVersion: v1
kind: Pod
metadata:
  name: ubuntu-sleeper
spec:
  containers:
  - image: ubuntu
    name: web
    command: ["sleep", "5000"]
    securityContext:  # added to the container, not the pod
      capabilities:
        add: ["SYS_TIME"]
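
The added capability can be verified from inside the pod: changing the system clock requires CAP_SYS_TIME (the date value here is arbitrary):

```shell
# Fails with "Operation not permitted" when SYS_TIME is absent
kubectl exec ubuntu-sleeper -- date -s '2030-01-01 00:00:00'
```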

Network Policies

If you want to control traffic flow at the IP address or port level for TCP, UDP, and SCTP protocols, then you might consider using Kubernetes NetworkPolicies for particular applications in your cluster.

NetworkPolicies are an application-centric construct which allow you to specify how a pod is allowed to communicate with various network "entities" (we use the word "entity" here to avoid overloading the more common terms such as "endpoints" and "services", which have specific Kubernetes connotations) over the network

  • By default, a pod is non-isolated for egress; all outbound connections are allowed.
  • By default, a pod is non-isolated for ingress; all inbound connections are allowed.

Network Plugin

Network policies are implemented by the network plugin. To use network policies, you must be using a networking solution which supports NetworkPolicy. Creating a NetworkPolicy resource without a controller that implements it will have no effect.

# Network Policy Example
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: test-network-policy
  namespace: default
spec:
  podSelector:
    matchLabels:
      role: db
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - ipBlock:
            cidr: 172.17.0.0/16
            except:
              - 172.17.1.0/24
        - namespaceSelector:
            matchLabels:
              project: myproject
        - podSelector:
            matchLabels:
              role: frontend
      ports:
        - protocol: TCP
          port: 6379
  egress:
    - to:
        - ipBlock:
            cidr: 10.0.0.0/24
      ports:
        - protocol: TCP
          port: 5978
Default deny egress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress
spec:
  podSelector: {}
  policyTypes:
  - Egress

Default allow egress
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-all-egress
spec:
  podSelector: {}
  egress:
  - {}   # egress is defined here
  policyTypes:
  - Egress

Default deny all ingress and all egress traffic
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress

What is the meaning of this policy?
controlplane ~ ➜  k describe networkpolicies.networking.k8s.io payroll-policy
Name:         payroll-policy
Namespace:    default
Created on:   2023-11-20 22:47:02 -0500 EST
Labels:       <none>
Annotations:  <none>
Spec:
PodSelector:     name=payroll
Allowing ingress traffic:
    To Port: 8080/TCP
    From:
    PodSelector: name=internal  # traffic from internal pod to payroll pod is allowed
Not affecting egress traffic
Policy Types: Ingress

Use the spec given below. You might want to enable ingress traffic to the pod to test your rules in the UI

Policy Name: internal-policy

Policy Type: Egress

Egress Allow: payroll

Payroll Port: 8080

Egress Allow: mysql

MySQL Port: 3306

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: internal-policy
  namespace: default
spec:
  podSelector:
    matchLabels:
      name: internal   # the policy applies to the internal pod
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - {}
  egress:
    - to:
        - podSelector:
            matchLabels:
              name: payroll
      ports:
        - protocol: TCP
          port: 8080
    - to:
        - podSelector:
            matchLabels:
              name: mysql
      ports:
        - protocol: TCP
          port: 3306
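
The rules can be exercised from inside the internal pod, assuming the lab exposes the payroll and mysql pods through services named payroll-service and db-service (adjust names to your environment; DNS lookups also need egress to kube-dns unless you test by IP):

```shell
kubectl exec internal -- nc -zv -w 2 payroll-service 8080   # allowed
kubectl exec internal -- nc -zv -w 2 db-service 3306        # allowed
kubectl exec internal -- nc -zv -w 2 payroll-service 80     # blocked by policy
```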

Storage

PV and PVC

A PersistentVolume (PV) is a piece of storage in the cluster that has been provisioned by an administrator or dynamically provisioned using Storage Classes. It is a resource in the cluster just like a node is a cluster resource. PVs are volume plugins like Volumes, but have a lifecycle independent of any individual Pod that uses the PV.

A PersistentVolumeClaim (PVC) is a request for storage by a user. It is similar to a Pod. Pods consume node resources and PVCs consume PV resources. Pods can request specific levels of resources (CPU and Memory). Claims can request specific size and access modes (e.g., they can be mounted ReadWriteOnce, ReadOnlyMany or ReadWriteMany

Reclaim Policy

When a user is done with their volume, they can delete the PVC objects from the API that allows reclamation of the resource. The reclaim policy for a PersistentVolume tells the cluster what to do with the volume after it has been released of its claim. Currently, volumes can either be Retained, Recycled, or Deleted.

  • Retain reclaim policy allows for manual reclamation of the resource. When the PersistentVolumeClaim is deleted, the PersistentVolume still exists and the volume is considered "released". But it is not yet available for another claim because the previous claimant's data remains on the volume.
  • For volume plugins that support the Delete reclaim policy, deletion removes both the PersistentVolume object from Kubernetes, as well as the associated storage asset in the external infrastructure, such as an AWS EBS or GCE PD volume.
  • If supported by the underlying volume plugin, the Recycle reclaim policy performs a basic scrub (rm -rf /thevolume/*) on the volume and makes it available again for a new claim.

Configure a volume to store these logs at /var/log/webapp on the host, using the following spec:

Name: webapp Image Name: kodekloud/event-simulator Volume HostPath: /var/log/webapp Volume Mount: /log

controlplane ~ ✖ cat po.yaml

apiVersion: v1
kind: Pod
metadata:
  name: webapp
  namespace: default
spec:
  containers:
  - image: kodekloud/event-simulator
    imagePullPolicy: Always
    name: event-simulator
    resources: {}
    volumeMounts:
    - mountPath: /log   # not readOnly: the app writes its logs here
      name: log-vol
  volumes:
  - name: log-vol
    hostPath:
      # directory location on host
      path: /var/log/webapp
      # this field is optional
      type: Directory

Create a Persistent Volume with the given specification
Volume Name: pv-log
Storage: 100Mi
Access Modes: ReadWriteMany
Host Path: /pv/log
Reclaim Policy: Retain
controlplane ~ ➜  cat pv1.yaml 

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-log
spec:
  capacity:
    storage: 100Mi
  hostPath:
    path: /pv/log
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain

Let us claim some of that storage for our application. Create a Persistent Volume Claim with the given specification

Volume Name: claim-log-1 Storage Request: 50Mi Access Modes: ReadWriteOnce

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: claim-log-1
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Mi

Check whether the claim is bound:
controlplane ~ ✖ k get pv,pvc
NAME                      CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                 STORAGECLASS   REASON   AGE
persistentvolume/pv-log   100Mi      RWX            Retain           Bound    default/claim-log-1                           5m56s

NAME                                STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/claim-log-1   Bound    pv-log   100Mi      RWX                           13s

# Note: the claim requested only 50Mi, but a PVC binds to a whole PV, so its reported capacity is the PV's full 100Mi.

Update the webapp pod to use the persistent volume claim as its storage

Replace hostPath configured earlier with the newly created PersistentVolumeClaim.

Name: webapp

Image Name: kodekloud/event-simulator

Volume: PersistentVolumeClaim=claim-log-1

Volume Mount: /log
apiVersion: v1
kind: Pod
metadata:
  name: webapp
  namespace: default
spec:
  containers:
  - image: kodekloud/event-simulator
    imagePullPolicy: Always
    name: event-simulator
    resources: {}
    volumeMounts:
    - mountPath: /log
      name: pv-claim
  volumes:
  - name: pv-claim
    persistentVolumeClaim:
      claimName: claim-log-1

Storage Class

A StorageClass provides a way for administrators to describe the "classes" of storage they offer. Different classes might map to quality-of-service levels, or to backup policies, or to arbitrary policies determined by the cluster administrators.

When a PVC does not specify a storageClassName, the default StorageClass is used. A cluster can only have one default StorageClass.

# SC example
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
reclaimPolicy: Retain
allowVolumeExpansion: true
mountOptions:
  - debug
volumeBindingMode: Immediate
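
Since only one StorageClass can be the default, the default is designated with an annotation rather than a spec field. A minimal sketch, reusing the `standard` class from the example above:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
  annotations:
    # marks this class as the cluster default for PVCs with no storageClassName
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
```

`kubectl get storageclass` shows the default class with `(default)` next to its name.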

Create a new PersistentVolumeClaim by the name of local-pvc that should bind to the volume local-pv

Inspect the pv local-pv for the specs.

PVC: local-pvc

Correct Access Mode?

Correct StorageClass Used?

PVC requests volume size = 500Mi?
controlplane ~ ➜  cat pvc.yaml 

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: local-pvc
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 500Mi
  storageClassName: local-storage

Why is the PVC in a pending state despite making a valid request to claim the volume called local-pv?
# The StorageClass used by the PVC uses WaitForFirstConsumer volume binding mode. This means that the persistent volume will not bind to the claim until a pod makes use of the PVC to request storage.
controlplane ~ ✖  k describe pvc local-pvc | grep -A4 Events
Events:
Type    Reason                Age                   From                         Message
----    ------                ----                  ----                         -------
Normal  WaitForFirstConsumer  11s (x16 over 3m47s)  persistentvolume-controller  waiting for first consumer to be created before binding

Create a new pod called nginx with the image nginx:alpine. The Pod should make use of the PVC local-pvc and mount the volume at the path /var/www/html

The PV local-pv should be in a bound state.

Pod created with the correct Image?

Pod uses PVC called local-pvc?

local-pv bound?

nginx pod running?

Volume mounted at the correct path?
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: nginx
  name: nginx
spec:
  containers:
  - image: nginx:alpine
    name: nginx
    volumeMounts:
    - mountPath: "/var/www/html"
      name: mypd
  volumes:
  - name: mypd
    persistentVolumeClaim:
      claimName: local-pvc
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

Create a new Storage Class called delayed-volume-sc that makes use of the below specs:

provisioner: kubernetes.io/no-provisioner

volumeBindingMode: WaitForFirstConsumer

controlplane ~ ➜  cat sc.yaml 

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: delayed-volume-sc
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer

JQ

k get nodes -o json | jq -c 'paths'  
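
To see what `paths` emits without needing a cluster, run it against a small inline JSON document (assuming `jq` is installed locally):

```shell
# 'paths' emits one array per node in the JSON tree, showing how to reach it
echo '{"metadata":{"name":"node01","labels":{"arch":"amd64"}}}' | jq -c 'paths'
# ["metadata"]
# ["metadata","name"]
# ["metadata","labels"]
# ["metadata","labels","arch"]
```

Piping `kubectl get nodes -o json` through the same filter lists every path in the node objects, which is handy for discovering where a field lives before writing a JSONPath query.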

Networking

What is the network interface configured for cluster connectivity on the controlplane node?
controlplane ~ ✖ k get no -o wide
NAME           STATUS   ROLES           AGE     VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION   CONTAINER-RUNTIME
controlplane   Ready    control-plane   9m2s    v1.27.0   192.9.180.3   <none>        Ubuntu 20.04.5 LTS   5.4.0-1106-gcp   containerd://1.6.6  # get the IP for controlPlane
node01         Ready    <none>          8m37s   v1.27.0   192.9.180.6   <none>        Ubuntu 20.04.5 LTS   5.4.0-1106-gcp   containerd://1.6.6


controlplane ~ ➜  ip a | grep -B3 192.9.180.3 # get the interface from the IP
366: eth0@if367: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default 
    link/ether 02:42:c0:09:b4:03 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.9.180.3/24 brd 192.9.180.255 scope global eth0

Get the IP of default gateway
controlplane ~ ➜  ip route show default

default via 172.25.0.1 dev eth1 

What is the port the kube-scheduler is listening on in the controlplane node?
controlplane ~ ➜  netstat -lntp | grep scheduler

tcp        0      0 127.0.0.1:10259         0.0.0.0:*               LISTEN      3586/kube-scheduler 

Notice that ETCD is listening on two ports: 2379 is the client port and 2380 the peer port. Which of these has more client connections established?
controlplane ~ ➜  netstat -lntp | grep etcd
tcp        0      0 192.9.180.3:2379        0.0.0.0:*               LISTEN      3600/etcd           
tcp        0      0 127.0.0.1:2379          0.0.0.0:*               LISTEN      3600/etcd           
tcp        0      0 192.9.180.3:2380        0.0.0.0:*               LISTEN      3600/etcd           
tcp        0      0 127.0.0.1:2381          0.0.0.0:*               LISTEN      3600/etcd  

# count established connections per port; 2379, the client port, has the most
netstat -anp | grep etcd | grep ESTABLISHED | grep -c 2379

CNI

Networking is a central part of Kubernetes, but it can be challenging to understand exactly how it is expected to work. There are 4 distinct networking problems to address:

Highly-coupled container-to-container communications: this is solved by Pods and localhost communications.
Pod-to-Pod communications: this is the primary focus of this document.
Pod-to-Service communications: this is covered by Services.
External-to-Service communications: this is also covered by Services.

CNI (Container Network Interface), a Cloud Native Computing Foundation project, consists of a specification and libraries for writing plugins to configure network interfaces in Linux containers, along with a number of supported plugins. CNI concerns itself only with network connectivity of containers and removing allocated resources when the container is deleted. Because of this focus, CNI has a wide range of support and the specification is simple to implement.


Inspect the kubelet service and identify the container runtime endpoint value set for Kubernetes.

controlplane ~ ➜  ps aux | grep kubelet | grep endpoint

root        4567  0.0  0.0 3848468 99736 ?       Ssl  01:06   0:06 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock --pod-infra-container-image=registry.k8s.io/pause:3.9

Identify which of the below plugins is not available in the list of available CNI plugins on this host.
# check the below path

controlplane ~ ➜  ls /opt/cni/bin/
bandwidth  dhcp   firewall  host-device  ipvlan    macvlan  ptp  static  vlan
bridge     dummy  flannel   host-local   loopback  portmap  sbr  tuning  vrf

What is the CNI plugin configured to be used on this kubernetes cluster?
Run the command `ls /etc/cni/net.d/` and identify the name of the plugin.
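
Each file in /etc/cni/net.d/ is a JSON config (or conflist); the `type` fields name the plugin binaries in /opt/cni/bin that will be invoked. A hypothetical flannel conflist for illustration (file contents are an assumption, not taken from this lab):

```json
{
  "name": "cbr0",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "flannel",
      "delegate": {
        "hairpinMode": true,
        "isDefaultGateway": true
      }
    },
    {
      "type": "portmap",
      "capabilities": {
        "portMappings": true
      }
    }
  ]
}
```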

Kubeadm

kubeadm upgrade plan [version]

Lightning Lab

Upgrade the current version of kubernetes from `1.26.0` to `1.27.0` exactly using the kubeadm utility. There is currently an issue with this lab which requires an extra step; this may be addressed in the near future.

On the controlplane node:

1. Drain node
kubectl drain controlplane --ignore-daemonsets
2. Upgrade kubeadm
apt-get update
apt-mark unhold kubeadm
apt-get install -y kubeadm=1.27.0-00
3. Plan and apply upgrade
kubeadm upgrade plan
kubeadm upgrade apply v1.27.0
4. Remove taint on controlplane node. This is the issue described above. As part of the upgrade specifically to 1.26, some taints are added to all controlplane nodes. This will prevent the `gold-nginx` pod from being rescheduled to the controlplane node later on.
kubectl describe node controlplane | grep -A 3 taint
Output:
Taints:   node-role.kubernetes.io/control-plane:NoSchedule
            node.kubernetes.io/unschedulable:NoSchedule
Let's remove them
kubectl taint node controlplane node-role.kubernetes.io/control-plane:NoSchedule-
kubectl taint node controlplane node.kubernetes.io/unschedulable:NoSchedule-
5. Upgrade the kubelet
apt-mark unhold kubelet
apt-get install -y kubelet=1.27.0-00
systemctl daemon-reload
systemctl restart kubelet
6. Reinstate controlplane node
kubectl uncordon controlplane
7. Upgrade kubectl
apt-mark unhold kubectl
apt-get install -y kubectl=1.27.0-00
8. Re-hold packages
apt-mark hold kubeadm kubelet kubectl
9. Drain the worker node
kubectl drain node01 --ignore-daemonsets
10. Go to worker node
ssh node01
11. Upgrade kubeadm
apt-get update
apt-mark unhold kubeadm
apt-get install -y kubeadm=1.27.0-00
12. Upgrade node
kubeadm upgrade node
13. Upgrade the kubelet
apt-mark unhold kubelet
apt-get install kubelet=1.27.0-00
systemctl daemon-reload
systemctl restart kubelet
14. Re-hold packages
apt-mark hold kubeadm kubelet
15. Return to controlplane
exit
16. Reinstate worker node
kubectl uncordon node01
17. Verify `gold-nginx` is scheduled on controlplane node
kubectl get pods -o wide | grep gold-nginx

Print the names of all deployments in the admin2406 namespace in the following format

This is a job for custom-columns output of kubectl

kubectl -n admin2406 get deployment -o custom-columns=DEPLOYMENT:.metadata.name,CONTAINER_IMAGE:.spec.template.spec.containers[].image,READY_REPLICAS:.status.readyReplicas,NAMESPACE:.metadata.namespace --sort-by=.metadata.name > /opt/admin2406_data

A kubeconfig file called admin.kubeconfig has been created in /root/CKA. There is something wrong with the configuration. Troubleshoot and fix it

First, let's test this kubeconfig

kubectl get pods --kubeconfig /root/CKA/admin.kubeconfig

Notice the error message.

Now look at the default kubeconfig for the correct setting.

cat ~/.kube/config

Make the correction

vi /root/CKA/admin.kubeconfig

Test

kubectl get pods --kubeconfig /root/CKA/admin.kubeconfig

Create a new deployment called nginx-deploy, with image nginx:1.16 and 1 replica. Next upgrade the deployment to version 1.17 using rolling update.
kubectl create deployment nginx-deploy --image=nginx:1.16
kubectl set image deployment/nginx-deploy nginx=nginx:1.17 --record

You may ignore the deprecation warning.


A new deployment called alpha-mysql has been deployed in the alpha namespace. However, the pods are not running. Troubleshoot and fix the issue

The deployment should make use of the persistent volume alpha-pv to be mounted at /var/lib/mysql and should use the environment variable MYSQL_ALLOW_EMPTY_PASSWORD=1 to make use of an empty root password.

Important: Do not alter the persistent volume.

Inspect the deployment to check that the environment variable is set. Here I'm using yq (like jq, but for YAML) so that I don't have to view the entire deployment YAML, just the section beneath containers in the deployment spec.

kubectl get deployment -n alpha alpha-mysql  -o yaml | yq e .spec.template.spec.containers -

Find out why the deployment does not have minimum availability. We'll have to find out the name of the deployment's pod first, then describe the pod to see the error.

kubectl get pods -n alpha
kubectl describe pod -n alpha alpha-mysql-xxxxxxxx-xxxxx

We find that the requested PVC isn't present, so create it. First, examine the Persistent Volume to find the values for access modes, capacity (storage), and storage class name

kubectl get pv alpha-pv

Now use vi to create a PVC manifest

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-alpha-pvc
  namespace: alpha
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: slow

Take the backup of ETCD at the location /opt/etcd-backup.db on the controlplane node

This question is a bit poorly worded. It requires us to make a backup of etcd and store the backup at the given location. Know that the certificates we need for authentication of etcdctl are located in /etc/kubernetes/pki/etcd

Get the certificates as shown below

controlplane ~ ➜  k -n kube-system get pod etcd-controlplane -o yaml | grep -i crt
- --cert-file=/etc/kubernetes/pki/etcd/server.crt
- --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
- --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
- --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt

Get the command to take backup from docs

ETCDCTL_API='3' etcdctl snapshot save /opt/etcd-backup.db \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key 

check if backup is taken using

controlplane ~ ➜  ETCDCTL_API=3 etcdctl --write-out=table snapshot status /opt/etcd-backup.db 
+----------+----------+------------+------------+
|   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| bc0cb4cf |     5914 |        973 |     2.2 MB |
+----------+----------+------------+------------+

Whilst we could also use the argument --endpoints=127.0.0.1:2379, it is not necessary here as we are on the controlplane node, same as etcd itself. The default endpoint is the local host.


Create a pod called secret-1401 in the admin1401 namespace using the busybox image

The container within the pod should be called secret-admin and should sleep for 4800 seconds. The container should mount a read-only secret volume called secret-volume at the path /etc/secret-volume. The secret being mounted has already been created for you and is called dotfile-secret.

  1. Use imperative command to get a starter manifest

    kubectl run secret-1401 -n admin1401 --image busybox --dry-run=client -o yaml --command -- sleep 4800 > admin.yaml
    
  2. Edit this manifest to add in the details for mounting the secret

    vi admin.yaml
    

    Add in the volume and volume mount sections seen below

    apiVersion: v1
    kind: Pod
    metadata:
      creationTimestamp: null
      labels:
        run: secret-1401
      name: secret-1401
      namespace: admin1401
    spec:
      volumes:
      - name: secret-volume
        secret:
          secretName: dotfile-secret
      containers:
      - command:
        - sleep
        - "4800"
        image: busybox
        name: secret-admin
        volumeMounts:
        - name: secret-volume
          readOnly: true
          mountPath: /etc/secret-volume
    
  3. And create the pod

    kubectl create -f admin.yaml
    

Mock Exam

Deploy a pod named nginx-pod using the nginx:alpine image.
apiVersion: v1
kind: Pod
metadata:
creationTimestamp: null
labels:
    run: nginx-pod
name: nginx-pod
spec:
containers:
- image: nginx:alpine
    name: nginx-pod
    resources: {}
dnsPolicy: ClusterFirst
restartPolicy: Always
status: {}

Deploy a messaging pod using the redis:alpine image with the labels set to tier=msg

Run the below command, which creates a pod with the labels:

kubectl run messaging --image=redis:alpine --labels=tier=msg

Create a namespace named apx-x9984574

Run below command to create a namespace:

    kubectl create namespace apx-x9984574

Get the list of nodes in JSON format and store it in a file at /opt/outputs/nodes-z3444kd9.json

Use the below command, which will redirect the output:

kubectl get nodes -o json > /opt/outputs/nodes-z3444kd9.json

Create a service messaging-service to expose the messaging application within the cluster on port 6379.

Execute below command which will expose the pod on port 6379:

kubectl expose pod messaging --port=6379 --name messaging-service

Create a deployment named hr-web-app using the image kodekloud/webapp-color with 2 replicas.

Since v1.19, we can add the --replicas flag to the kubectl create deployment command:

kubectl create deployment hr-web-app --image=kodekloud/webapp-color --replicas=2


Create a static pod named static-busybox on the controlplane node that uses the busybox image and the command sleep 1000

To create a static pod, place its manifest in the static pods directory. In this case, it is /etc/kubernetes/manifests. Generate the manifest straight into that directory:

 k run static-busybox --image=busybox --dry-run=client -o yaml --command -- sleep 1000 > /etc/kubernetes/manifests/static-busybox.yaml

This will create the below manifest

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: static-busybox
  name: static-busybox
spec:
  containers:
  - command:
    - sleep
    - "1000"
    image: busybox
    name: static-busybox
    resources: {}
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

Create a POD in the finance namespace named temp-bus with the image redis:alpine.

Run below command to create a pod in namespace finance:

kubectl run temp-bus --image=redis:alpine -n finance

A new application orange is deployed. There is something wrong with it. Identify and fix the issue.

Run below command and troubleshoot step by step:

kubectl describe pod orange

Export the running pod and correct the spelling of the command sleeeep to sleep. A pod's command cannot be changed in place, so kubectl edit will reject the change and save the modified spec to a temporary file; replace the pod from that file:

kubectl edit pod orange # make changes and save
k replace --force -f temp_file.yaml 

Expose the hr-web-app as service hr-web-app-service application on port 30082 on the nodes on the cluster.

Apply below manifests:

apiVersion: v1
kind: Service
metadata:
  name: hr-web-app-service
spec:
  type: NodePort
  selector:
    app: hr-web-app
  ports:
    - port: 8080
      targetPort: 8080
      nodePort: 30082

Use JSON PATH query to retrieve the osImages of all the nodes and store it in a file /opt/outputs/nodes_os_x43kj56.txt

Run the below command to redirect the output (note `.items[*]`, which covers all nodes, not just the first):

 kubectl get nodes -o=jsonpath='{.items[*].status.nodeInfo.osImage}' > /opt/outputs/nodes_os_x43kj56.txt

Create a Persistent Volume with the given specification

Volume name: pv-analytics

Storage: 100Mi

Access mode: ReadWriteMany

Host path: /pv/data-analytics

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-analytics
spec:
  capacity:
    storage: 100Mi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  hostPath:
    path: /pv/data-analytics

Create a Pod called redis-storage with image: redis:alpine with a Volume of type emptyDir that lasts for the life of the Pod.
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: redis-storage
  name: redis-storage
spec:
  containers:
  - image: redis:alpine
    name: redis-storage
    volumeMounts:
    - mountPath: /data/redis
      name: cache-volume
  volumes:
  - name: cache-volume
    emptyDir:
      sizeLimit: 500Mi
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

Create a new pod called super-user-pod with image busybox:1.28. Allow the pod to be able to set system_time
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: super-user-pod
  name: super-user-pod
spec:
  containers:
  - image: busybox:1.28
    name: super-user-pod
    resources: {}
    command:
    - sleep
    - "4800"
    securityContext:
      capabilities:
        add: ["SYS_TIME"]
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

Create a new user called john. Grant him access to the cluster. John should have permission to create, list, get, update and delete pods in the development namespace. The private key exists at /root/CKA/john.key and the CSR at /root/CKA/john.csr

Important Note: As of kubernetes 1.19, the CertificateSigningRequest object expects a signerName.

Please refer the documentation to see an example. The documentation tab is available at the top right of terminal.

CSR: john-developer Status:Approved
Role Name: developer, namespace: development, Resource: Pods
Access: User 'john' has appropriate permissions

Form the CSR request

apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: john
spec:
  request: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0KTUlJQ1ZEQ0NBVHdDQVFBd0R6RU5NQXNHQTFVRUF3d0VhbTlvYmpDQ0FTSXdEUVlKS29aSWh2Y05BUUVCQlFBRApnZ0VQQURDQ0FRb0NnZ0VCQUxqR2J3T1NDUFZycHB5QnhDL0ZDSURoTE90TXJyN21nWUlDbFYvcDlkaHZjSXdqCno0ZmpJdTlITDdrZlpTT2kxT21NSDNtTUlMaDRLa3J2bnFaeENXUGFrVEVDN005T1lsNHoyWXlWWDZ5R3p0WEYKYXZqSUcrVUJ5Zmo0V2M5c0l6cEdJS0dqN3JaQmVZamV3STlpUU5yQzc2RFJpcStKZU1oRFhIT2ZtSm9oU0J3YgpsQm9rSEp5aVNITzM1OGx6WEs1UElZaTVqKy9waUFhSHRKbjg3Vzl1K2tpNzJsc3IxN0JoV0FMTzQrOHFDOUgvCjMzZ2VQNUxhMXJTanVjYVk1eE9IL2s2dVdabGVVUUVyeVBqUDg0TW1sUnhrZEVHdTJ6dmY5c2pmZUFWNE1QTkoKYXYxcTMrc0ZNbHB2VndGb2RIbFgzL2ZzK25abHFhYWp2eW5yc1hFQ0F3RUFBYUFBTUEwR0NTcUdTSWIzRFFFQgpDd1VBQTRJQkFRQWhrMVVrTklqSzhDZmx2QTB0bEpOWi83TlgvZUlMQzF6d2h1T0NpQm14R2dUaGZReDdqYWtICnNyMmdUSXlpU0RsdVdVKzVZeW1CeElhL0xHVmRackhpSlBLRzgyVlNmck9DUHgrME1Bbk5PNTZpWWNUZ2RXZ3IKanByaUJYbDdrVkV0UUZjVTVwSGt0aW92Nk5mb0htRzZqT2w5dzVNYzRNMDJGbUN1Yi9sSngrNThIQnI1ekZLQQp4bGRNaXZ5V05CTlY3S3p0a1FkWElsLzR0emllME11ekdxRkxZNWh6R3pDSnVwekd5bmZXc0hmd2JaeWVKTVlrCnlmWldTV0FRSHhEZk5HRWxvNXhja1FTOVBWU29NK0YyNFoveXA3ZEI1Mlc0bE1yYVRsa2VNTy9pU25hRU5tdGwKazhPTDNielhXYS82K0hkdnNremtGK2hpVHFoRW9XTEIKLS0tLS1FTkQgQ0VSVElGSUNBVEUgUkVRVUVTVC0tLS0tCg== 
  signerName: kubernetes.io/kube-apiserver-client
  expirationSeconds: 86400  # one day
  usages:
  - client auth
Create it as shown below

# create the CSR
controlplane ~/CKA ➜  k apply -f csr-john.yaml 

certificatesigningrequest.certificates.k8s.io/john created

View and approve it

controlplane ~/CKA ➜  k get csr
NAME        AGE     SIGNERNAME                                    REQUESTOR                  REQUESTEDDURATION   CONDITION
csr-q6l5t   45m     kubernetes.io/kube-apiserver-client-kubelet   system:bootstrap:rsxu7y    <none>              Approved,Issued
csr-tnnwr   45m     kubernetes.io/kube-apiserver-client-kubelet   system:node:controlplane   <none>              Approved,Issued
john        3m47s   kubernetes.io/kube-apiserver-client           kubernetes-admin           24h                 Pending

controlplane ~/CKA ➜  k certificate approve john
certificatesigningrequest.certificates.k8s.io/john approved

Create the role

controlplane ~/CKA ➜  kubectl create role developer  --verb=get,list,create,update,delete  --resource=pods -n development
role.rbac.authorization.k8s.io/developer created

Create a role binding

controlplane ~/CKA ➜  kubectl create rolebinding developer-rb --role=developer --user=john -n development 
rolebinding.rbac.authorization.k8s.io/developer-rb created

Check if it worked

controlplane ~/CKA ✖ k auth can-i get pods -n development --as john

yes


Create a nginx pod called nginx-resolver using image nginx, expose it internally with a service called nginx-resolver-service. Test that you are able to look up the service and pod names from within the cluster. Use the image: busybox:1.28 for dns lookup. Record results in /root/CKA/nginx.svc and /root/CKA/nginx.pod

Create the pod first

k run nginx-resolver --image=nginx

Then expose it

/CKA ✖ kubectl expose pod nginx-resolver  --name=nginx-resolver-service --port=8080
service/nginx-resolver-service exposed

Create a test pod with sleep time

controlplane ~/CKA ➜  k run test --image=busybox:1.28 -- sleep 5000
pod/test created

run it to show the nslookup

controlplane ~/CKA ➜  k exec test -- nslookup nginx-resolver-service
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      nginx-resolver-service
Address 1: 10.104.187.189 nginx-resolver-service.default.svc.cluster.local

Record the service lookup, then get the pod records

controlplane ~/CKA ➜  k exec test -- nslookup nginx-resolver-service > /root/CKA/nginx.svc

controlplane ~/CKA ➜  k exec test -- nslookup 10-244-192-4.default.pod.cluster.local
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      10-244-192-4.default.pod.cluster.local
Address 1: 10.244.192.4 10-244-192-4.nginx-resolver-service.default.svc.cluster.local

controlplane ~/CKA ➜  k exec test -- nslookup 10-244-192-4.default.pod.cluster.local > /root/CKA/nginx.pod

Create a static pod on node01 called nginx-critical with image nginx and make sure that it is recreated/restarted automatically in case of a failure.

Copy the manifest to the static pod directory on node01 (typically /etc/kubernetes/manifests; check staticPodPath in the kubelet config) so the kubelet manages it. restartPolicy: Always ensures it is restarted on failure.

controlplane ~ ➜  cat static-pod.yaml 
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: nginx-critical
  name: nginx-critical
spec:
  containers:
  - image: nginx
    name: nginx-critical
    resources: {}
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

Create a new service account with the name pvviewer. Grant this Service account access to list all PersistentVolumes in the cluster by creating an appropriate cluster role called pvviewer-role and ClusterRoleBinding called pvviewer-role-binding. Next, create a pod called pvviewer with the image: redis and serviceAccount: pvviewer in the default namespace
controlplane ~ ➜  k create sa pvviewer

controlplane ~ ➜  kubectl create clusterrole pvviewer-role --verb=list --resource=persistentvolumes
clusterrole.rbac.authorization.k8s.io/pvviewer-role created

Create the cluster role binding

controlplane ~ ➜  kubectl create clusterrolebinding pvviewer-role-binding --clusterrole=pvviewer-role --serviceaccount=default:pvviewer
clusterrolebinding.rbac.authorization.k8s.io/pvviewer-role-binding created

controlplane ~ ✖ cat pod-view.yaml 
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pvviewer
  name: pvviewer
spec:
  serviceAccountName: pvviewer
  containers:
  - image: redis
    name: pvviewer
    resources: {}
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

List the InternalIP of all nodes of the cluster. Save the result to a file /root/CKA/node_ips

Answer should be in the format: InternalIP of controlplane, then InternalIP of node01 (in a single line)

kubectl get nodes -o=jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}' > /root/CKA/node_ips

Create a pod called multi-pod with two containers. Container 1, name: alpha, image: nginx Container 2: name: beta, image: busybox, command: sleep 4800

Environment Variables: container 1: name: alpha

Container 2: name: beta

controlplane ~/CKA ➜  cat multi-pod.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: multi-pod
spec:
  containers:
  - name: alpha
    image: nginx
    env:
    - name: name
      value: "alpha"
  - name: beta
    image: busybox
    env:
    - name: name
      value: "beta"
    command:
    - sleep
    - "4800"

Create a Pod called non-root-pod , image: redis:alpine, runAsUser: 1000 and fsGroup: 2000
apiVersion: v1
kind: Pod
metadata:
  name: non-root-pod
spec:
  securityContext:
    runAsUser: 1000
    fsGroup: 2000
  containers:
  - name: sec-ctx-demo
    image: redis:alpine

We have deployed a new pod called np-test-1 and a service called np-test-service. Incoming connections to this service are not working. Troubleshoot and fix it. Create NetworkPolicy, by the name ingress-to-nptest that allows incoming connections to the service over port 80
controlplane ~/CKA ➜  k apply -f policy-pod.yaml 
networkpolicy.networking.k8s.io/ingress-to-nptest created

controlplane ~/CKA ➜  cat policy-pod.yaml 
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ingress-to-nptest
spec:
  podSelector:
    matchLabels:
      run: np-test-1
  ingress:
  - from:
    - podSelector: {}
    ports:
    - protocol: TCP
      port: 80
  policyTypes:
  - Ingress

The empty podSelector ({}) in the from clause allows ingress from all pods in the namespace on port 80, which is what the task requires.

Taint the worker node node01 to be Unschedulable. Once done, create a pod called dev-redis, image redis:alpine, to ensure workloads are not scheduled to this worker node. Finally, create a new pod called prod-redis and image: redis:alpine with toleration to be scheduled on node01.

key: env_type, value: production, operator: Equal and effect: NoSchedule

controlplane ~/CKA ➜  kubectl taint nodes node01 env_type=production:NoSchedule
node/node01 tainted

Create the dev-redis pod without a toleration so it is not scheduled on the tainted node

k run dev-redis --image=redis:alpine

Create the prod-redis pod with the toleration

controlplane ~/CKA ➜  cat tolerated-pod.yaml 

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: prod-redis
  name: prod-redis
spec:
  containers:
  - image: redis:alpine
    name: prod-redis
  tolerations:
  - key: "env_type"
    operator: "Equal"
    value: "production"
    effect: "NoSchedule"
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

Create the pod and add 2 labels as shown below
controlplane ~/CKA ➜  k -n hr label pod hr-pod environment=production
pod/hr-pod labeled


controlplane ~/CKA ➜  k -n hr get pod --show-labels 
NAME     READY   STATUS    RESTARTS   AGE   LABELS
hr-pod   1/1     Running   0          69s   environment=production,run=hr-pod,tier=frontend

