Kubernetes Architecture
A lightning-fast summary.
Master Node Components: API Server
- the only component that reads from and writes to etcd; no other component talks to etcd directly.
- handles RESTful calls from users, operators, and external agents.
- supports custom API servers (TODO: explain more).
- highly configurable and customizable
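As a rough illustration of the RESTful calls mentioned above, the sketch below builds the URL path the API server exposes for a resource. The helper name resource_path and its parameters are made up for this example; the path layout (core group under /api, named groups under /apis) follows the standard Kubernetes convention.

```python
from typing import Optional


def resource_path(version: str, plural: str, group: str = "",
                  namespace: Optional[str] = None,
                  name: Optional[str] = None) -> str:
    """Build the REST path for a Kubernetes resource (hypothetical helper).

    The core group lives under /api/<version>; every named group
    lives under /apis/<group>/<version>.
    """
    parts = ["api", version] if group == "" else ["apis", group, version]
    if namespace is not None:
        parts += ["namespaces", namespace]
    parts.append(plural)
    if name is not None:
        parts.append(name)
    return "/" + "/".join(parts)


# List Pods in the "default" namespace (core group).
print(resource_path("v1", "pods", namespace="default"))
# A single Deployment named "frontend" in the "apps" group.
print(resource_path("v1", "deployments", group="apps",
                    namespace="web", name="frontend"))
```

A client would issue GET, POST, PUT, PATCH or DELETE requests against such paths; authentication and the server address are left out of the sketch.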
Master Node Components: Scheduler
- assigns new objects, such as Pods, to nodes.
- scheduling decisions depend on the current k8s cluster state and on constraints that come from requirements such as
node labels (e.g. disk==ssd), Quality of Service requirements, data locality, taints, tolerations, etc.
- reads cluster state from etcd via the API server.
- reads resource usage data for each node, also via the API server.
- highly configurable and customizable
- additional custom schedulers are supported.
- an object's configuration data must name the scheduler it wants; otherwise the default scheduler
is used.
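A minimal sketch of the scheduling decision described above, assuming a simplified two-step model: filter out nodes that violate a hard constraint, then score the remaining ones. The Node and PodSpec classes, the CPU-only resource model, and the "most free CPU wins" score are assumptions made for illustration, not the real scheduler's data model or scoring rules.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Node:
    name: str
    labels: Dict[str, str] = field(default_factory=dict)
    taints: Set[str] = field(default_factory=set)   # simplified: taint keys only
    free_cpu_millis: int = 0


@dataclass
class PodSpec:
    name: str
    cpu_millis: int
    node_selector: Dict[str, str] = field(default_factory=dict)
    tolerations: Set[str] = field(default_factory=set)


def feasible(pod: PodSpec, node: Node) -> bool:
    """Filtering step: does the node satisfy every hard constraint?"""
    labels_ok = all(node.labels.get(k) == v for k, v in pod.node_selector.items())
    taints_ok = node.taints <= pod.tolerations      # every taint must be tolerated
    fits = node.free_cpu_millis >= pod.cpu_millis
    return labels_ok and taints_ok and fits


def pick_node(pod: PodSpec, nodes: List[Node]) -> Optional[str]:
    """Scoring step: among feasible nodes, prefer the one with the most free CPU."""
    candidates = [n for n in nodes if feasible(pod, n)]
    if not candidates:
        return None  # no node fits; the Pod would stay unscheduled
    return max(candidates, key=lambda n: n.free_cpu_millis).name


nodes = [
    Node("node-a", {"disk": "hdd"}, set(), 4000),
    Node("node-b", {"disk": "ssd"}, set(), 1500),
    Node("node-c", {"disk": "ssd"}, {"dedicated"}, 8000),
]
pod = PodSpec("db", cpu_millis=1000, node_selector={"disk": "ssd"})
print(pick_node(pod, nodes))  # node-b: node-a has the wrong label, node-c is tainted
```

If the Pod also tolerated the "dedicated" taint, node-c would become feasible and win on free CPU.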
Master Node Components: Controller Managers
- Controller managers
  - run controllers that regulate the state of the Kubernetes cluster.
  - compare an object's desired state (its configuration data) with the current state stored in etcd (again via
    the API server) and try to bring the current state in line with the desired state.
- kube-controller-manager
  - ensures Pod counts are as expected.
  - creates endpoints, service accounts, and API access tokens.
- cloud-controller-manager
  - interacts with the underlying infrastructure of a cloud provider when nodes become unavailable.
  - manages storage volumes when provided by a cloud service.
  - manages load balancing and routing.
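The compare-and-correct behaviour of a controller can be sketched as a single reconcile step. This is a toy model, assuming the desired state is just a replica count and the current state is a list of Pod names; the reconcile function, the name prefix, and the action strings are hypothetical.

```python
from typing import List


def reconcile(desired_replicas: int, running_pods: List[str],
              prefix: str = "web") -> List[str]:
    """Return the actions that move the current state toward the desired state."""
    actions: List[str] = []
    diff = desired_replicas - len(running_pods)
    if diff > 0:
        # Too few Pods: create new ones under names that are not taken yet.
        existing = set(running_pods)
        i = 0
        while diff > 0:
            candidate = f"{prefix}-{i}"
            if candidate not in existing:
                actions.append(f"create {candidate}")
                existing.add(candidate)
                diff -= 1
            i += 1
    elif diff < 0:
        # Too many Pods: delete the surplus (here: the last names in sorted order).
        for pod in sorted(running_pods)[diff:]:
            actions.append(f"delete {pod}")
    return actions


print(reconcile(3, ["web-0"]))                    # scale up by two
print(reconcile(1, ["web-0", "web-1", "web-2"]))  # scale down by two
print(reconcile(2, ["web-0", "web-1"]))           # already converged
```

A real controller runs this kind of step in a loop, reading both states through the API server and submitting the resulting changes back to it.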
Master Node Components: etcd
- persists a K8s cluster's state.
- new data is written to the data store only by appending to it.
- data is never replaced in the data store.
- obsolete data is compacted periodically to minimize the size of the data store.
- should run in HA mode in production (and in staging as needed).
- based on the Raft consensus algorithm
  - one member acts as the leader (master) at a time; the rest are followers.
- stores
  - cluster state
  - subnets
  - ConfigMaps
  - Secrets
  - etc.
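The append-only and compaction behaviour noted above can be illustrated with a toy revisioned key-value store. This is only a sketch of the idea, not etcd's storage engine or client API; the AppendOnlyStore class and its methods are invented for the example.

```python
from typing import Dict, List, Optional, Tuple


class AppendOnlyStore:
    """Toy revisioned store: every write appends, compaction prunes old versions."""

    def __init__(self) -> None:
        self.revision = 0
        self.log: List[Tuple[int, str, str]] = []   # (revision, key, value)

    def put(self, key: str, value: str) -> int:
        # A write never overwrites; it appends a new entry with a new revision.
        self.revision += 1
        self.log.append((self.revision, key, value))
        return self.revision

    def get(self, key: str, at_revision: Optional[int] = None) -> Optional[str]:
        # Replay the log up to the requested revision; the last match wins.
        limit = self.revision if at_revision is None else at_revision
        result = None
        for rev, k, v in self.log:
            if rev > limit:
                break
            if k == key:
                result = v
        return result

    def compact(self, upto: int) -> None:
        # For revisions <= upto keep only the newest entry per key;
        # entries after upto are kept untouched.
        latest: Dict[str, int] = {}
        for rev, k, _ in self.log:
            if rev <= upto:
                latest[k] = rev
        self.log = [e for e in self.log if e[0] > upto or latest[e[1]] == e[0]]


store = AppendOnlyStore()
store.put("replicas", "2")
store.put("replicas", "3")
store.put("image", "v1")
print(store.get("replicas"), store.get("replicas", at_revision=1), len(store.log))
store.compact(2)
print(store.get("replicas"), len(store.log))
```

After compaction the obsolete "replicas" entry from revision 1 is gone, so the log shrinks while the current values stay readable.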
Worker Node
- provides a running environment for client applications.
- containerized microservices are encapsulated in Pods, which are controlled by the cluster control plane agents
running on the master node.
- Pods are scheduled on worker nodes.
- a Pod is the smallest scheduling unit in K8s.
- a Pod is a logical collection of one or more containers that are scheduled together.
- external clients connect to the worker nodes, not to the master node, to access applications.
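To make "one or more containers scheduled together" concrete, here is a small sketch of a Pod manifest with two containers, written as a Python dictionary and printed as JSON. The Pod name, labels, container names, and the sidecar command are placeholders chosen for the example; the field names follow the v1 Pod schema.

```python
import json

# Hypothetical two-container Pod: a web server plus a helper container.
pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "web-with-sidecar", "labels": {"app": "web"}},
    "spec": {
        "containers": [
            {"name": "web", "image": "nginx",
             "ports": [{"containerPort": 80}]},
            {"name": "log-tailer", "image": "busybox",
             "command": ["sh", "-c", "tail -f /dev/null"]},
        ]
    },
}

print(json.dumps(pod_manifest, indent=2))
```

Both containers are placed on the same worker node as one unit, because the scheduler assigns the Pod, not the individual containers.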
Worker Node Components: Container Runtime
- k8s cannot handle containers directly.
- k8s requires a container runtime on the node.
- k8s supports
  - Docker - most widely used with k8s
  - CRI-O - a lightweight container runtime for k8s; supports Docker image registries
  - containerd - a simple and portable container runtime providing robustness
  - rkt - a pod-native container engine; it also runs Docker images
  - rktlet - a k8s Container Runtime Interface implementation using rkt
Worker Node Components: kubelet
- it is an agent running on each node.
- communicates with the control plane components on the master node.
- receives Pod definitions primarily from the API server
- interacts with the container runtime on the node to run containers associated with the Pod.
- monitors the health of the Pod's running containers.
- connects to the container runtime using the Container Runtime Interface (CRI)
  - CRI consists of protocol buffers, a gRPC API, and libraries.
  - CRI implements two services
    - ImageService - responsible for all the image-related operations.
    - RuntimeService - responsible for all the Pod and container-related operations.
- Any container runtime that implements CRI can be used by Kubernetes to manage Pods, containers and
container images.
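The split between the two CRI services can be sketched with plain Python classes. The real interface is a gRPC API defined with protocol buffers; the classes below only mirror a few of its operations in simplified form, and FakeRuntime and sync_pod are hypothetical stand-ins for a runtime and for the kubelet's logic.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ImageService(ABC):
    """Image-related operations (simplified)."""

    @abstractmethod
    def pull_image(self, image: str) -> str: ...

    @abstractmethod
    def list_images(self) -> List[str]: ...


class RuntimeService(ABC):
    """Pod- and container-related operations (simplified)."""

    @abstractmethod
    def run_pod_sandbox(self, pod_name: str) -> str: ...

    @abstractmethod
    def create_container(self, sandbox_id: str, name: str, image: str) -> str: ...

    @abstractmethod
    def start_container(self, container_id: str) -> None: ...


class FakeRuntime(ImageService, RuntimeService):
    """In-memory stand-in for a runtime that implements both services."""

    def __init__(self) -> None:
        self.images: List[str] = []
        self.containers: Dict[str, str] = {}   # container id -> state

    def pull_image(self, image: str) -> str:
        if image not in self.images:
            self.images.append(image)
        return image

    def list_images(self) -> List[str]:
        return list(self.images)

    def run_pod_sandbox(self, pod_name: str) -> str:
        return f"sandbox-{pod_name}"

    def create_container(self, sandbox_id: str, name: str, image: str) -> str:
        container_id = f"{sandbox_id}/{name}"
        self.containers[container_id] = "created"
        return container_id

    def start_container(self, container_id: str) -> None:
        self.containers[container_id] = "running"


def sync_pod(runtime: FakeRuntime, pod_name: str, containers: Dict[str, str]) -> List[str]:
    """Kubelet-like flow: sandbox first, then pull, create, and start each container."""
    sandbox = runtime.run_pod_sandbox(pod_name)
    started = []
    for name, image in containers.items():
        if image not in runtime.list_images():
            runtime.pull_image(image)
        container_id = runtime.create_container(sandbox, name, image)
        runtime.start_container(container_id)
        started.append(container_id)
    return started


runtime = FakeRuntime()
print(sync_pod(runtime, "web", {"app": "nginx", "sidecar": "nginx"}))
print(runtime.containers)
```

Because the caller depends only on the two interfaces, any runtime that implements them could be swapped in, which is the point of CRI.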
Worker Node Components: kubelet - CRI shims
- dockershim
  - containers are created using Docker installed on the worker nodes.
  - internally, Docker uses containerd to create and manage containers.
- cri-containerd
  - can directly use Docker's smaller offspring, containerd, to create and manage containers.
- CRI-O
  - enables using any Open Container Initiative (OCI) compatible runtime.
  - it supports the following as runtimes
    - any OCI-compliant runtime can be plugged in.
Worker Node Components: kube-proxy
- kube-proxy is the network agent which runs on each node.
- responsible for dynamically updating and maintaining all networking rules on the node.
- it abstracts the details of Pod networking and forwards connection requests to Pods.
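The idea of keeping forwarding rules up to date can be shown with a toy lookup table. A real kube-proxy programs rules in the node's networking stack rather than forwarding traffic in Python; the ServiceTable class, the IP addresses, and the round-robin choice of backend are all assumptions made for illustration.

```python
import itertools
from typing import Dict, Iterator, List, Tuple

Endpoint = Tuple[str, int]   # (IP address, port)


class ServiceTable:
    """Toy rule table: Service address -> rotating list of Pod endpoints."""

    def __init__(self) -> None:
        self._backends: Dict[Endpoint, Iterator[Endpoint]] = {}

    def update(self, service: Endpoint, pods: List[Endpoint]) -> None:
        # Replace the rule for a Service whenever its set of Pods changes.
        self._backends[service] = itertools.cycle(list(pods))

    def forward(self, service: Endpoint) -> Endpoint:
        # Pick the next Pod endpoint for a connection to the Service.
        return next(self._backends[service])


table = ServiceTable()
svc = ("10.96.0.10", 80)
table.update(svc, [("10.244.1.5", 8080), ("10.244.2.7", 8080)])
print(table.forward(svc), table.forward(svc), table.forward(svc))

# A Pod was replaced: the rule is rewritten and clients keep using the same address.
table.update(svc, [("10.244.3.9", 8080)])
print(table.forward(svc))
```

Clients only ever see the Service address; which Pod answers is hidden behind the table.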
Worker Node Components: Addons
- Addons are cluster features and functionality not yet available in K8s itself.
- implemented through third-party Pods and services.
- DNS - cluster DNS is a DNS server required to assign DNS records to K8s objects and resources.
- Dashboard - a general-purpose web-based user interface for cluster management.
- Monitoring - collects cluster-level container metrics and saves them to a central data store.
- Logging - collects cluster-level container logs and saves them to a central log store for
analysis.
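For the DNS addon, the record assigned to a Service follows a predictable naming scheme. The helper below is a hypothetical convenience function; it assumes the commonly used default cluster domain cluster.local, which a given cluster may override.

```python
def service_dns_name(service: str, namespace: str = "default",
                     cluster_domain: str = "cluster.local") -> str:
    """Fully qualified DNS name under which a Service is reachable in the cluster."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


print(service_dns_name("frontend"))
print(service_dns_name("db", namespace="prod"))
```

Pods in the same namespace can usually reach the Service by its short name alone, while Pods in other namespaces need at least the service.namespace form.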
Networking Challenges
All these networking challenges must be addressed before deploying a K8s cluster.
- Container-to-container communication inside Pods
- Pod-to-Pod communication on the same node and across cluster nodes
- Pod-to-Service communication within the same namespace and across cluster namespaces
- External-to-Service communication for clients to access applications in a cluster.