Introduction
Use kubeadm to configure multiple master nodes for high availability.
Installation
Lab environment
Lab architecture
lab1: etcd master keepalived 11.11.11.111
lab2: etcd master keepalived 11.11.11.112
lab3: etcd master keepalived 11.11.11.113
lab4: node 11.11.11.114
lab5: node 11.11.11.115
lab6: node 11.11.11.116
vip: 11.11.11.110
Vagrantfile used for the lab
# -*- mode: ruby -*-
# vi: set ft=ruby :

ENV["LC_ALL"] = "en_US.UTF-8"

Vagrant.configure("2") do |config|
  (1..6).each do |i|
    config.vm.define "lab#{i}" do |node|
      node.vm.box = "centos-7.4-docker-17"
      node.ssh.insert_key = false
      node.vm.hostname = "lab#{i}"
      node.vm.network "private_network", ip: "11.11.11.11#{i}"
      node.vm.provision "shell",
        inline: "echo hello from node #{i}"
      node.vm.provider "virtualbox" do |v|
        v.cpus = 2
        v.customize ["modifyvm", :id, "--name", "lab#{i}", "--memory", "2048"]
      end
    end
  end
end
Install kubeadm on all machines
See the earlier article 《centos7安装kubeadm》 (installing kubeadm on CentOS 7).
Configure kubelet on all nodes
# Configure kubelet to use an image registry that is reachable from mainland China
# Edit /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
# and add the following setting:
# Environment="KUBELET_EXTRA_ARGS=--pod-infra-container-image=registry.cn-shanghai.aliyuncs.com/gcr-k8s/pause-amd64:3.0"

# The same change as a command
sed -i '/ExecStart=$/i Environment="KUBELET_EXTRA_ARGS=--pod-infra-container-image=registry.cn-shanghai.aliyuncs.com/gcr-k8s/pause-amd64:3.0"' /etc/systemd/system/kubelet.service.d/10-kubeadm.conf

# Reload the systemd configuration
systemctl daemon-reload
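Running the sed command above a second time inserts the Environment line again. The sketch below guards against that by checking for the line first. The function name add_pause_image_arg is hypothetical, and the -i option is used in its GNU sed form.

```shell
# Sketch: add the pause-image override only when it is not already present.
# add_pause_image_arg is a hypothetical helper name; the path and the image
# are the ones used in the step above. Requires GNU sed.
add_pause_image_arg() {
  conf="${1:-/etc/systemd/system/kubelet.service.d/10-kubeadm.conf}"
  line='Environment="KUBELET_EXTRA_ARGS=--pod-infra-container-image=registry.cn-shanghai.aliyuncs.com/gcr-k8s/pause-amd64:3.0"'
  if grep -qxF "$line" "$conf"; then
    # The exact line is already in the file, so leave it untouched.
    echo "already configured: $conf"
  else
    # Insert the line before the empty "ExecStart=" reset line.
    sed -i "/ExecStart=\$/i $line" "$conf"
    echo "updated: $conf"
  fi
}
```

Call it as `add_pause_image_arg` (default path) and then run `systemctl daemon-reload` as before.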
Configure hosts
cat >>/etc/hosts<<EOF
11.11.11.111 lab1
11.11.11.112 lab2
11.11.11.113 lab3
11.11.11.114 lab4
11.11.11.115 lab5
11.11.11.116 lab6
EOF
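Because the heredoc appends unconditionally, provisioning a machine twice leaves duplicate entries in /etc/hosts. A minimal sketch of an idempotent variant follows; lab_hosts and sync_hosts are hypothetical helper names, and the address pattern is the one from the lab architecture.

```shell
# Print the six "IP hostname" pairs of this lab (lab1..lab6 -> 11.11.11.111..116).
lab_hosts() {
  i=1
  while [ "$i" -le 6 ]; do
    echo "11.11.11.11${i} lab${i}"
    i=$((i + 1))
  done
}

# Append only the entries that are missing from the given hosts file.
sync_hosts() {
  hosts_file="${1:-/etc/hosts}"
  lab_hosts | while read -r entry; do
    grep -qxF "$entry" "$hosts_file" || echo "$entry" >> "$hosts_file"
  done
}
```

Run `sync_hosts` as root on each machine; a second run adds nothing.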
Start the etcd cluster
Start the etcd cluster on nodes lab1, lab2, and lab3.
# lab1
docker stop etcd && docker rm etcd
rm -rf /data/etcd
mkdir -p /data/etcd
docker run -d \
  --restart always \
  -v /etc/etcd/ssl/certs:/etc/ssl/certs \
  -v /data/etcd:/var/lib/etcd \
  -p 2380:2380 \
  -p 2379:2379 \
  --name etcd \
  registry.cn-hangzhou.aliyuncs.com/google_containers/etcd-amd64:3.1.12 \
  etcd --name=etcd0 \
  --advertise-client-urls=http://11.11.11.111:2379 \
  --listen-client-urls=http://0.0.0.0:2379 \
  --initial-advertise-peer-urls=http://11.11.11.111:2380 \
  --listen-peer-urls=http://0.0.0.0:2380 \
  --initial-cluster-token=9477af68bbee1b9ae037d6fd9e7efefd \
  --initial-cluster=etcd0=http://11.11.11.111:2380,etcd1=http://11.11.11.112:2380,etcd2=http://11.11.11.113:2380 \
  --initial-cluster-state=new \
  --auto-tls \
  --peer-auto-tls \
  --data-dir=/var/lib/etcd

# lab2
docker stop etcd && docker rm etcd
rm -rf /data/etcd
mkdir -p /data/etcd
docker run -d \
  --restart always \
  -v /etc/etcd/ssl/certs:/etc/ssl/certs \
  -v /data/etcd:/var/lib/etcd \
  -p 2380:2380 \
  -p 2379:2379 \
  --name etcd \
  registry.cn-hangzhou.aliyuncs.com/google_containers/etcd-amd64:3.1.12 \
  etcd --name=etcd1 \
  --advertise-client-urls=http://11.11.11.112:2379 \
  --listen-client-urls=http://0.0.0.0:2379 \
  --initial-advertise-peer-urls=http://11.11.11.112:2380 \
  --listen-peer-urls=http://0.0.0.0:2380 \
  --initial-cluster-token=9477af68bbee1b9ae037d6fd9e7efefd \
  --initial-cluster=etcd0=http://11.11.11.111:2380,etcd1=http://11.11.11.112:2380,etcd2=http://11.11.11.113:2380 \
  --initial-cluster-state=new \
  --auto-tls \
  --peer-auto-tls \
  --data-dir=/var/lib/etcd

# lab3
docker stop etcd && docker rm etcd
rm -rf /data/etcd
mkdir -p /data/etcd
docker run -d \
  --restart always \
  -v /etc/etcd/ssl/certs:/etc/ssl/certs \
  -v /data/etcd:/var/lib/etcd \
  -p 2380:2380 \
  -p 2379:2379 \
  --name etcd \
  registry.cn-hangzhou.aliyuncs.com/google_containers/etcd-amd64:3.1.12 \
  etcd --name=etcd2 \
  --advertise-client-urls=http://11.11.11.113:2379 \
  --listen-client-urls=http://0.0.0.0:2379 \
  --initial-advertise-peer-urls=http://11.11.11.113:2380 \
  --listen-peer-urls=http://0.0.0.0:2380 \
  --initial-cluster-token=9477af68bbee1b9ae037d6fd9e7efefd \
  --initial-cluster=etcd0=http://11.11.11.111:2380,etcd1=http://11.11.11.112:2380,etcd2=http://11.11.11.113:2380 \
  --initial-cluster-state=new \
  --auto-tls \
  --peer-auto-tls \
  --data-dir=/var/lib/etcd

# Verify: open a shell in the container and inspect the cluster
docker exec -ti etcd ash
# Inside the container:
etcdctl member list
etcdctl cluster-health
exit
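The three etcd commands differ only in the member name and the node's own IP; the --initial-cluster value is identical everywhere. As a sketch of how those parts could be derived from one list of IPs, the hypothetical helpers below build the shared string and the per-node flags. They only print text and do not start anything.

```shell
# Build the value for --initial-cluster from a list of member IPs.
# Members are named etcd0, etcd1, ... in the order given, as in the commands above.
etcd_initial_cluster() {
  idx=0
  out=""
  for ip in "$@"; do
    out="${out:+$out,}etcd${idx}=http://${ip}:2380"
    idx=$((idx + 1))
  done
  echo "$out"
}

# Print the flags that differ per node.
# $1 is the zero-based member index, $2 is that node's IP.
etcd_node_flags() {
  echo "--name=etcd$1 --advertise-client-urls=http://$2:2379 --initial-advertise-peer-urls=http://$2:2380"
}
```

For example, `etcd_initial_cluster 11.11.11.111 11.11.11.112 11.11.11.113` reproduces the --initial-cluster value used on all three nodes, and `etcd_node_flags 1 11.11.11.112` gives the lab2-specific flags.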
Configure keepalived
Run the following on the three master nodes.
# Load the required kernel module
lsmod | grep ip_vs
modprobe ip_vs

# Start keepalived
# eth1 is the NIC on the 11.11.11.0/24 network used in this lab
docker run --net=host --cap-add=NET_ADMIN \
  -e KEEPALIVED_INTERFACE=eth1 \
  -e KEEPALIVED_VIRTUAL_IPS="#PYTHON2BASH:['11.11.11.110']" \
  -e KEEPALIVED_UNICAST_PEERS="#PYTHON2BASH:['11.11.11.111','11.11.11.112','11.11.11.113']" \
  -e KEEPALIVED_PASSWORD=hello \
  --name k8s-keepalived \
  --restart always \
  -d osixia/keepalived:1.4.4

# Check the logs
# Two nodes should become BACKUP and one MASTER
docker logs k8s-keepalived

# 11.11.11.110 is now configured on one of the machines
# Ping test
ping -c4 11.11.11.110

# If it fails, clean up and try again
docker rm -f k8s-keepalived
ip a del 11.11.11.110/32 dev eth1
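Besides reading the keepalived logs, you can check directly which master currently carries the VIP. A minimal sketch, assuming the iproute2 `ip` tool is installed; holds_vip is a hypothetical helper name, and eth1 and 11.11.11.110 are the lab values from above.

```shell
# Return success if the given interface currently carries the virtual IP.
# Defaults: interface eth1, VIP 11.11.11.110 (the values used in this lab).
holds_vip() {
  iface="${1:-eth1}"
  vip="${2:-11.11.11.110}"
  # -o prints one address per line; -F matches the address literally.
  ip -o -4 addr show dev "$iface" | grep -qF "inet ${vip}/"
}
```

On each master, `holds_vip && echo "$(hostname) holds the VIP"` should print on exactly one node.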
Initialize the first master node
# Generate a token
# Keep the token; it is needed again later
token=$(kubeadm token generate)
echo $token

# Generate the config file
# advertiseAddress is set to the VIP
cat >kubeadm-master.config<<EOF
apiVersion: kubeadm.k8s.io/v1alpha1
kind: MasterConfiguration
kubernetesVersion: v1.10.3
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers

api:
  advertiseAddress: 11.11.11.110

apiServerExtraArgs:
  endpoint-reconciler-type: lease

controllerManagerExtraArgs:
  node-monitor-grace-period: 10s
  pod-eviction-timeout: 10s

networking:
  podSubnet: 10.244.0.0/16

etcd:
  endpoints:
  - "http://11.11.11.111:2379"
  - "http://11.11.11.112:2379"
  - "http://11.11.11.113:2379"

apiServerCertSANs:
- "lab1"
- "lab2"
- "lab3"
- "11.11.11.111"
- "11.11.11.112"
- "11.11.11.113"
- "11.11.11.110"
- "127.0.0.1"

token: $token
tokenTTL: "0"

featureGates:
  CoreDNS: true
EOF

# Initialize
kubeadm init --config kubeadm-master.config
systemctl enable kubelet

# Save the join command printed when initialization finishes
# kubeadm join 11.11.11.110:6443 --token nevmjk.iuh214fc8i0k3iue --discovery-token-ca-cert-hash sha256:0e4f738348be836ff810bce754e059054845f44f01619a37b817eba83282d80f

# Set up kubectl
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

# Install the network plugin
# Download the manifest
mkdir flannel && cd flannel
wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

# Edit the manifest
# The network here must match podSubnet in the kubeadm config above
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }

# Change the image
image: registry.cn-shanghai.aliyuncs.com/gcr-k8s/flannel:v0.10.0-amd64

# If a node has more than one NIC, see issue 39701:
# https://github.com/kubernetes/kubernetes/issues/39701
# For now, pass --iface in kube-flannel.yml to name the NIC on the cluster's internal network;
# otherwise DNS resolution may fail and containers may be unable to communicate.
# Download kube-flannel.yml locally and add --iface=<iface-name> to the flanneld arguments
      containers:
      - name: kube-flannel
        image: registry.cn-shanghai.aliyuncs.com/gcr-k8s/flannel:v0.10.0-amd64
        command:
        - /opt/bin/flanneld
        args:
        - --ip-masq
        - --kube-subnet-mgr
        - --iface=eth1

# Apply
kubectl apply -f kube-flannel.yml

# Check
kubectl get pods -n kube-system
kubectl get svc -n kube-system

# Allow the masters to run application pods and take part in the workload;
# other system components such as dashboard, heapster, and EFK can now be deployed
kubectl taint nodes --all node-role.kubernetes.io/master-
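The flannel network has to match the pod subnet given to kubeadm, and a mismatch is easy to miss when editing the manifest by hand. The following sketch compares the two values with plain text matching before you apply the manifest. pod_cidr_matches is a hypothetical helper, and it assumes both files are laid out as in this article.

```shell
# Compare podSubnet in the kubeadm config with "Network" in kube-flannel.yml.
# $1 is the kubeadm config file, $2 is the flannel manifest.
# Returns success only when both values are present and equal.
pod_cidr_matches() {
  kubeadm_cfg="$1"
  flannel_yml="$2"
  kubeadm_cidr=$(sed -n 's/^ *podSubnet: *//p' "$kubeadm_cfg" | head -n 1)
  flannel_cidr=$(sed -n 's/.*"Network": *"\([^"]*\)".*/\1/p' "$flannel_yml" | head -n 1)
  echo "kubeadm podSubnet: ${kubeadm_cidr}"
  echo "flannel Network:   ${flannel_cidr}"
  [ -n "$kubeadm_cidr" ] && [ "$kubeadm_cidr" = "$flannel_cidr" ]
}
```

A possible use is `pod_cidr_matches ../kubeadm-master.config kube-flannel.yml && kubectl apply -f kube-flannel.yml`; adjust the relative path to wherever the config file was written.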
Start the other master nodes
# On the first master, pack the /etc/kubernetes/pki directory created during initialization
cd /etc/kubernetes && tar czvf /root/pki.tgz pki/ && cd ~

# Upload the archive to the other masters, then unpack it into /etc/kubernetes there
tar xf pki.tgz -C /etc/kubernetes/

# Copy the config file used to start the first master to the other master nodes
# Initialize
kubeadm init --config kubeadm-master.config
systemctl enable kubelet

# Set up kubectl
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

# Check from the first master
kubectl get pod --all-namespaces -o wide | grep lab1
kubectl get pod --all-namespaces -o wide | grep lab2
kubectl get pod --all-namespaces -o wide | grep lab3
kubectl get nodes -o wide
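The upload step is described but not shown as a command. One way it could look is sketched below as a dry run: the hypothetical helper pki_copy_commands only prints the scp and ssh commands for each target host. It assumes root SSH access from the first master to the others and that kubeadm-master.config is in the current directory, neither of which the article states.

```shell
# Print the commands that would copy pki.tgz and the kubeadm config to each
# remaining master and unpack the certificates there.
# Assumption: root SSH access between the masters.
pki_copy_commands() {
  for host in "$@"; do
    echo "scp /root/pki.tgz kubeadm-master.config root@${host}:/root/"
    echo "ssh root@${host} tar xf /root/pki.tgz -C /etc/kubernetes/"
  done
}
```

Review the output of `pki_copy_commands lab2 lab3` first; piping it to `sh -x` then runs the commands.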
Start the worker nodes
# Join the cluster
# This is the command printed earlier when the first master finished initializing
kubeadm join 11.11.11.110:6443 --token nevmjk.iuh214fc8i0k3iue --discovery-token-ca-cert-hash sha256:0e4f738348be836ff810bce754e059054845f44f01619a37b817eba83282d80f
systemctl enable kubelet
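If the join command was not saved, the value for --discovery-token-ca-cert-hash can be recomputed on a master from the cluster CA certificate. The sketch below wraps the openssl pipeline given in the kubeadm documentation; ca_cert_hash is a hypothetical helper name, and it assumes an RSA CA key, which is what kubeadm generates by default.

```shell
# Print the sha256 hash of the CA public key, as expected after "sha256:" in
# --discovery-token-ca-cert-hash. The default path is where kubeadm stores the CA.
ca_cert_hash() {
  ca_crt="${1:-/etc/kubernetes/pki/ca.crt}"
  openssl x509 -pubkey -in "$ca_crt" \
    | openssl rsa -pubin -outform der 2>/dev/null \
    | openssl dgst -sha256 -hex \
    | sed 's/^.* //'
}
```

The join command is then `kubeadm join 11.11.11.110:6443 --token <token> --discovery-token-ca-cert-hash sha256:$(ca_cert_hash)`, with the token generated during initialization.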
Testing
Recreate multiple coredns replicas
# Delete the coredns pods
kubectl get pods -n kube-system -o wide | grep coredns
all_coredns_pods=$(kubectl get pods -n kube-system -o wide | grep coredns | awk '{print $1}' | xargs)
echo $all_coredns_pods
kubectl delete pods $all_coredns_pods -n kube-system

# Change the replica count
# replicas: 3
# This can be set to the number of worker nodes
kubectl edit deploy coredns -n kube-system

# Check the status
kubectl get pods -n kube-system -o wide | grep coredns
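As a non-interactive alternative to kubectl edit, the replica count can be set with kubectl scale. A minimal sketch; scale_coredns is a hypothetical wrapper name, and the default of 3 is just the value mentioned above.

```shell
# Set the coredns replica count without opening an editor.
# $1 is the desired number of replicas (default 3).
scale_coredns() {
  replicas="${1:-3}"
  kubectl scale deployment coredns -n kube-system --replicas="$replicas"
}
```

For example, `scale_coredns 3`, followed by the status check from the step above.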
Basic tests
1. Start
# Option 1: test directly with commands
kubectl run nginx --replicas=2 --image=nginx:alpine --port=80
kubectl expose deployment nginx --type=NodePort --name=example-service-nodeport
kubectl expose deployment nginx --name=example-service

# Option 2: test with a manifest file
cat >example-nginx.yml<<EOF
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      restartPolicy: Always
      containers:
      - name: nginx
        image: nginx:alpine
        ports:
        - containerPort: 80
        livenessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 10
          periodSeconds: 3
        readinessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 10
          periodSeconds: 3
---
kind: Service
apiVersion: v1
metadata:
  name: example-service
spec:
  selector:
    app: nginx
  ports:
  - name: http
    port: 80
    targetPort: 80
---
kind: Service
apiVersion: v1
metadata:
  name: example-service-nodeport
spec:
  selector:
    app: nginx
  type: NodePort
  ports:
  - name: http-nodeport
    port: 80
    nodePort: 32223
EOF
kubectl apply -f example-nginx.yml
2. Check status
kubectl get deploy
kubectl get pods
kubectl get svc
kubectl describe svc example-service
3. DNS resolution
kubectl run curl --image=radial/busyboxplus:curl -i --tty

# Inside the pod:
nslookup kubernetes
nslookup example-service
curl example-service

# If this takes too long an error is returned; you can re-enter the pod as follows
curlPod=$(kubectl get pod | grep curl | awk '{print $1}')
kubectl exec -ti $curlPod -- sh
4. Access test
# 10.96.59.56 is the ClusterIP shown by "kubectl get svc"
curl "10.96.59.56:80"

# 32223 is the NodePort shown by "kubectl get svc"
http://11.11.11.114:32223/
http://11.11.11.115:32223/
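The NodePort URLs above can also be checked from a shell instead of a browser. The sketch below requests the service on each given node and prints the HTTP status code; check_nodeport is a hypothetical helper name, and the 5-second timeout is an arbitrary choice.

```shell
# Request a NodePort service on each node and print the HTTP status code.
# $1 is the NodePort; the remaining arguments are node IPs.
check_nodeport() {
  port="$1"
  shift
  for node_ip in "$@"; do
    # -s: silent, -o /dev/null: discard the body, -m 5: give up after 5 seconds
    code=$(curl -s -o /dev/null -m 5 -w '%{http_code}' "http://${node_ip}:${port}/")
    echo "${node_ip}:${port} -> ${code}"
  done
}
```

With the values from this article the call is `check_nodeport 32223 11.11.11.114 11.11.11.115 11.11.11.116`; a healthy nginx answers with 200, and curl reports 000 when no connection could be made.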
5. Clean up
kubectl delete svc example-service example-service-nodeport
kubectl delete deploy nginx curl
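High-availability test
Shut down any one master node, then check that the cluster can still run the basic tests from the previous step and inspect the information below. Shut down at most one master: etcd runs on the master nodes, and a three-member etcd cluster needs two members to keep quorum, so shutting down two masters makes etcd unavailable and with it the whole cluster.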
kubectl get pod --all-namespaces -o wide
kubectl get pod --all-namespaces -o wide | grep lab1
kubectl get pod --all-namespaces -o wide | grep lab2
kubectl get pod --all-namespaces -o wide | grep lab3
kubectl get nodes -o wide
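The limit of one stopped master follows from etcd's majority rule: a cluster of N members needs N/2 + 1 of them (integer division) to stay available. A small worked example, with hypothetical helper names:

```shell
# Members required for quorum in an etcd cluster of N members.
etcd_quorum() { echo $(( $1 / 2 + 1 )); }

# Number of members that may fail while the cluster keeps quorum.
etcd_fault_tolerance() { echo $(( ($1 - 1) / 2 )); }

etcd_quorum 3            # prints 2
etcd_fault_tolerance 3   # prints 1
etcd_fault_tolerance 5   # prints 2
```

With the three etcd members of this lab, one master can be down; tolerating two failures would take five members.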
Notes
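When a worker node is shut down directly, the pods on it are detected as unhealthy and migrated to other nodes only after about 5 minutes. For a faster migration, run kubectl delete node. You can also change the controller-manager parameters pod-eviction-timeout (default 5m) and node-monitor-grace-period (default 40s).
Compared with the high-availability setup described in the earlier article, the drawback of this setup is that kube-apiserver cannot be load-balanced across multiple nodes. All requests to kube-apiserver go to a single master node; only when that master fails are requests sent to another master.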
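As a rough worked example of the eviction delay mentioned above: a dead node is first marked unhealthy after node-monitor-grace-period, and its pods are evicted after a further pod-eviction-timeout, so the two values approximately add up. This is a simplification that ignores controller sync intervals, and eviction_delay is a hypothetical helper name.

```shell
# Approximate delay in seconds before pods on a dead node are evicted:
# node-monitor-grace-period + pod-eviction-timeout.
eviction_delay() { echo $(( $1 + $2 )); }

eviction_delay 40 300   # defaults 40s + 5m, prints 340
eviction_delay 10 10    # values from kubeadm-master.config, prints 20
```

Very small values speed up failover but make the cluster more sensitive to short network interruptions.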
Author: CountingStars_
Link: https://www.jianshu.com/p/cb8b05d23bae