Designing a Cluster
Important questions when creating a Kubernetes cluster:
What is the purpose of this cluster?
- Learning? Try minikube, or a single-node cluster with kubeadm/GCP/AWS (see the sketch below)
- Dev/Test? Multi-node cluster with a single controller and multiple workers; set up with the kubeadm tool or quickly provision on GCP, AWS, or Azure
- Production application? Highly available multi-node cluster (see Hosting Production-Grade Clusters below)
Cloud or on-prem? Which cloud or what hardware? GKE for GCP, kops for AWS, AKS for Azure
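A quick sketch of the two common starting points above; it assumes minikube and kubeadm are already installed, and the pod CIDR and join parameters are placeholders.

```bash
# Learning: single-node local cluster; minikube provisions the VM/container itself
minikube start

# Dev/Test: multi-node cluster on machines you provisioned yourself.
# Run on the controller; the pod CIDR is a placeholder and must match your CNI plugin.
sudo kubeadm init --pod-network-cidr=10.244.0.0/16

# Then, on each worker, run the join command printed by `kubeadm init`, e.g.:
# sudo kubeadm join <controller-ip>:6443 --token <token> \
#   --discovery-token-ca-cert-hash sha256:<hash>
```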
Workload Analysis
- How many workloads?
- What kind? Web? Big Data/Analytics? GPU needs?
- Application resource requirements: CPU, memory (see the sketch below)
- Traffic: heavy traffic? Bursting traffic?
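The resource-requirements point above translates into pod resource requests and limits; a minimal sketch, where the pod name, image, and numbers are illustrative placeholders.

```bash
# Illustrative only: requests tell the scheduler how much CPU/memory to reserve,
# limits cap bursting workloads
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: web-app        # placeholder name
spec:
  containers:
  - name: web
    image: nginx
    resources:
      requests:
        cpu: "500m"
        memory: "256Mi"
      limits:
        cpu: "1"
        memory: "512Mi"
EOF
```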
Kubernetes does not run natively on Windows; on a Windows machine you need a Linux VM (a virtualization layer) to run the cluster components.
Hosting Production-Grade Clusters
- High-availability multi-node cluster with multiple controller nodes
- kubeadm for on-prem, GKE for GCP, kops for AWS, or other supported platforms
- Up to 5,000 nodes
- Up to 150,000 pods in the cluster
- Up to 300,000 Total Containers
- Up to 100 Pods per Node
Resource considerations:
| Nodes in cluster | vCPU | Memory (GB) |
|---|---|---|
| 1-5 | 1 | 3.75 |
| 6-10 | 2 | 7.5 |
| 11-100 | 4 | 15 |
| 101-250 | 8 | 30 |
| 251-500 | 16 | 60 |
| > 500 | 32 | 120 |
Storage considerations:
- High performance SSDs
- Multiple concurrent connections - network based storage
- Persistent shared volumes
- Label nodes with specific disk types
- Use node selectors to assign applications to nodes with specific disk types (sketch below)
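A minimal sketch of the label-plus-nodeSelector pairing above; the node name, label key/value, and pod spec are illustrative placeholders.

```bash
# Label a node with its disk type (node name and label are placeholders)
kubectl label nodes worker-1 disktype=ssd

# Pin a storage-heavy pod to SSD-backed nodes via nodeSelector
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: cache          # placeholder workload
spec:
  nodeSelector:
    disktype: ssd
  containers:
  - name: cache
    image: redis
EOF
```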
In large clusters, you can move etcd out onto its own dedicated nodes.
Deployment Tools
| Tool | Can Create VMs | Multi Node |
|---|---|---|
| minikube | ✅ | ❌ |
| kubeadm | ❌ | ✅ |
| kops | ✅ | ✅ |
| Vagrant | ✅ | ✅ |
Vagrant only provisions the VMs; pair it with a tool like kubeadm to actually install Kubernetes.
Turnkey Solutions
- OpenShift
- Cloud Foundry Container Runtime
- VMware Cloud PKS
Hosted Solutions
- Google Kubernetes Engine (GKE)
- OpenShift Online
- Azure Kubernetes Service
- Amazon Elastic Kubernetes Service (EKS)
High Availability
When the controller goes down, workers keep running their existing pods, but nothing new can be scheduled and failed pods are not replaced, so the cluster degrades as soon as something goes wrong.
| Component | HA Style |
|---|---|
| API Server | Active/Active (LB) |
| Controller Manager | Leader/Follower (--leader-elect) |
| Scheduler | Leader/Follower (--leader-elect) |
| ETCD | Stacked / External |
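The Active/Active row assumes a load balancer in front of the API servers; a minimal kubeadm sketch of bringing up multiple controllers behind one endpoint (the DNS name and port are placeholders).

```bash
# First controller: point kubelets and clients at the load balancer, not one API server
sudo kubeadm init \
  --control-plane-endpoint "k8s-lb.example.com:6443" \
  --upload-certs

# Each additional controller: join as a control-plane node using the join command
# and certificate key printed by the init step above
# sudo kubeadm join k8s-lb.example.com:6443 --token <token> \
#   --discovery-token-ca-cert-hash sha256:<hash> \
#   --control-plane --certificate-key <key>
```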
Kube-Controller-Manager and Scheduler: --leader-elect=true
- Whichever process first updates the kube-controller-manager endpoint with its lock becomes the active one. The lock is held for the lease duration set by --leader-elect-lease-duration (default 15s), and the active process renews the lease every --leader-elect-renew-deadline seconds (default 10s)
- Both processes retry to acquire the lock every --leader-elect-retry-period seconds (default 2s)
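On a kubeadm cluster these flags live in the static pod manifests; a sketch assuming the default manifest path, with the flag defaults shown in the expected output.

```bash
# Leader-election flags as they appear in the controller-manager static pod manifest
# (the scheduler manifest takes the same flags); path is the kubeadm default
grep leader-elect /etc/kubernetes/manifests/kube-controller-manager.yaml
#     - --leader-elect=true
#     - --leader-elect-lease-duration=15s
#     - --leader-elect-renew-deadline=10s
#     - --leader-elect-retry-period=2s
```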
ETCD
- Stacked topology: etcd runs on the controller nodes. Simpler and cheaper, but losing a controller also loses an etcd member
- External topology: etcd runs on its own dedicated nodes. Less risky, but harder to set up and roughly 2x more expensive (more servers)
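With the external topology, every API server must be told where the etcd cluster lives; a minimal sketch of the relevant kube-apiserver flags (the IPs and certificate paths are placeholders).

```bash
# External etcd: each kube-apiserver lists all etcd members and the client certs
# (IPs and paths below are placeholders)
kube-apiserver \
  --etcd-servers=https://10.0.0.11:2379,https://10.0.0.12:2379,https://10.0.0.13:2379 \
  --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt \
  --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt \
  --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
# ...plus the rest of the usual kube-apiserver flags
```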