1. Requirements

There are ten hosts with IPs 192.168.56.101-110, each with a user voidking that has sudo privileges. We want to use these ten hosts to build a K8s cluster, version 1.22.15, with 3 hosts as masters and 7 as workers.
2. Installation Approach

The simplest method is to use kubeadm to install and configure each host one by one; see 《kubeadm安装部署K8S集群——CentOS篇》.
Installing host by host is far too slow, though, so we want a batch installation using ansible + kubeadm. There is a kubeadm-ansible project that does exactly this, but it is abandoned (no updates in two years, issues unanswered). We therefore implement the ansible + kubeadm batch installation ourselves.
In addition, since the 3 masters must be highly available, we also install keepalived + haproxy.
Overall workflow:

1. Install and configure ansible on the management node
2. Install docker on all nodes
3. Let iptables see bridged traffic on all nodes
4. Install kubeadm + kubelet + kubectl on all nodes
5. Install keepalived + haproxy on the master nodes
6. Initialize the master node (create the cluster)
7. Join the worker nodes to the cluster
8. Test the cluster

For brevity, the last octet of a host's IP is used as its short name below; e.g., 192.168.56.101 is referred to as 101.
3. Install and Configure ansible

See 《Ansible入门篇》 for reference. Choose one node as the management node (here, node 101), install ansible on it, and set up passwordless SSH login from the management node to all other nodes.
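A minimal sketch of the passwordless-login setup (my addition; assumes password authentication is still enabled so ssh-copy-id can prompt once per host):

```
# on the management node (101): generate a key pair, then push the public key to every host
ssh-keygen -t rsa -q -P '' -f ~/.ssh/id_rsa
for i in $(seq 101 110); do
  ssh-copy-id -o StrictHostKeyChecking=no voidking@192.168.56.$i
done
```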
1. On the management node, use /home/voidking as the working directory.
2. Create /home/voidking/hosts with the following content:
```
[common]
192.168.56.101
192.168.56.102
192.168.56.103
192.168.56.104
192.168.56.105
192.168.56.106
192.168.56.107
192.168.56.108
192.168.56.109
192.168.56.110

[common:vars]

[master]
192.168.56.101
192.168.56.102
192.168.56.103

[master:vars]

[worker]
192.168.56.104
192.168.56.105
192.168.56.106
192.168.56.107
192.168.56.108
192.168.56.109
192.168.56.110

[worker:vars]
```
3. Permission test

(1) Prepare a test file
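The original does not show this step; any small file will do, for example:

```
echo "hello from the management node" > /home/voidking/test.txt
```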
(2) Write the playbook copy.yaml
```
---
- hosts: common
  remote_user: voidking
  become: yes        # without this, the copy never actually escalates via sudo
  become_method: sudo
  tasks:
    - name: "cp test"
      copy:
        src: /home/voidking/test.txt
        dest: /tmp/
```
(3) Use sudo to copy the file to the managed hosts

```
ansible-playbook -i hosts copy.yaml
```

If the copy succeeds, sudo is allowed to run /bin/sh (ansible's become wraps module execution in /bin/sh, which is why sudo must permit it), and everything is fine.

If the copy fails with "Sorry, user voidking is not allowed to execute '/bin/sh' as root on xxx", then sudo is restricted and cannot run /bin/sh. In that case we must copy the other way around: grant the managed hosts passwordless access to the management node, then have each managed host pull files with scp.
Method 1: copy the management node's id_rsa private key to every managed host — this effectively turns every host into a management node; not elegant.
Method 2: append every managed host's id_rsa.pub to the management node's authorized_keys — the method chosen in this article.
```
ansible all -i hosts -m command -a "ssh-keygen -t rsa -q -P '' -f ~/.ssh/id_rsa"
ansible all -i hosts -m command -a "cat ~/.ssh/id_rsa.pub" | grep -v "SUCCESS" | tee -a ~/.ssh/authorized_keys
ansible all -i hosts -m command -a "scp -o StrictHostKeyChecking=no voidking@192.168.56.101:~/test.txt /tmp/"
```
The rest of this article assumes restricted sudo; if ansible's copy module works for you, everything below becomes simpler.
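As a small convenience (my addition, not part of the original workflow), an ansible.cfg in the working directory makes the inventory the default so -i hosts could be omitted; this article keeps the explicit flag for clarity:

```
cat <<EOF | tee /home/voidking/ansible.cfg
[defaults]
inventory = /home/voidking/hosts
EOF
```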
4. Install docker

1. Install docker
```
ansible all -i hosts -m command -a "sudo yum install -y yum-utils"
ansible all -i hosts -m command -a "sudo yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo"
ansible all -i hosts -m command -a "sudo yum makecache fast"
ansible all -i hosts -m command -a "sudo yum install docker-ce -y"
ansible all -i hosts -m command -a "sudo systemctl start docker"
ansible all -i hosts -m command -a "sudo systemctl enable docker"
```
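A quick sanity check that docker is installed and running everywhere (my addition):

```
ansible all -i hosts -m command -a "docker --version"
ansible all -i hosts -m command -a "sudo systemctl is-active docker"
```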
2. Install docker-compose
```
curl -L "https://github.com/docker/compose/releases/download/v2.11.1/docker-compose-$(uname -s)-$(uname -m)" -o docker-compose
chmod a+x docker-compose
ansible all -i hosts -m command -a "scp -o StrictHostKeyChecking=no voidking@192.168.56.101:~/docker-compose ~/"
ansible all -i hosts -m command -a "sudo cp ~/docker-compose /usr/local/bin/"
ansible all -i hosts -m command -a "sudo ln -s /usr/local/bin/docker-compose /usr/bin/docker-compose"
```
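Likewise, docker-compose can be verified on all nodes (my addition):

```
ansible all -i hosts -m command -a "docker-compose version"
```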
5. Configure a Proxy (optional)

See 《Linux配置网络代理》 for reference.

One important caveat: if a network proxy is configured, the proxy environment variables are automatically baked into the components when kubeadm init runs, which causes all sorts of strange problems.
5.1. Configure the shell proxy

1. Add the proxy settings to .bashrc
```
export http_proxy=http://192.168.56.1:7890
export https_proxy=http://192.168.56.1:7890
export no_proxy=127.0.0.1,localhost,192.168.56.0/24,10.96.0.0/12,10.244.0.0/16,172.31.0.0/16
```
2. Copy it to all nodes

```
ansible all -i hosts -m command -a "scp -o StrictHostKeyChecking=no voidking@192.168.56.101:~/.bashrc ~/.bashrc"
```
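Note that ansible's command module does not source .bashrc, so verifying the proxy requires the shell module; a hedged check, assuming the proxy at 192.168.56.1:7890 is reachable from the nodes:

```
ansible all -i hosts -m shell -a ". ~/.bashrc && curl -sI --max-time 10 https://www.google.com | head -n 1"
```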
5.2. Configure the docker proxy

1. Create the configuration file

```
mkdir -p ~/etc/systemd/system/docker.service.d
vim ~/etc/systemd/system/docker.service.d/http-proxy.conf
```
with the following content:

```
[Service]
Environment="HTTP_PROXY=http://192.168.56.1:7890"
Environment="HTTPS_PROXY=http://192.168.56.1:7890"
Environment="NO_PROXY=127.0.0.1,localhost,192.168.56.0/24,10.96.0.0/12,10.244.0.0/16,172.31.0.0/16"
```
2. Copy it to all nodes

```
ansible all -i hosts -m command -a "mkdir -p ~/etc/systemd/system/docker.service.d"
ansible all -i hosts -m command -a "sudo mkdir -p /etc/systemd/system/docker.service.d"
ansible all -i hosts -m command -a "scp -o StrictHostKeyChecking=no voidking@192.168.56.101:~/etc/systemd/system/docker.service.d/http-proxy.conf ~/etc/systemd/system/docker.service.d/"
ansible all -i hosts -m command -a "sudo cp ~/etc/systemd/system/docker.service.d/http-proxy.conf /etc/systemd/system/docker.service.d/http-proxy.conf"
```
3. Restart docker

```
ansible all -i hosts -m command -a "sudo systemctl daemon-reload"
ansible all -i hosts -m command -a "sudo systemctl restart docker"
sudo docker info | grep Proxy
```
6. Let iptables See Bridged Traffic

1. Prepare the configuration files

```
mkdir -p etc/modules-load.d/
mkdir -p etc/sysctl.d/
cat <<EOF | tee etc/modules-load.d/k8s.conf
br_netfilter
EOF
cat <<EOF | tee etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
```
2. Copy the configuration to all nodes

```
ansible all -i hosts -m command -a "mkdir -p ~/etc/modules-load.d/"
ansible all -i hosts -m command -a "mkdir -p ~/etc/sysctl.d/"
ansible all -i hosts -m command -a "scp -o StrictHostKeyChecking=no voidking@192.168.56.101:~/etc/modules-load.d/k8s.conf ~/etc/modules-load.d/"
ansible all -i hosts -m command -a "scp -o StrictHostKeyChecking=no voidking@192.168.56.101:~/etc/sysctl.d/k8s.conf ~/etc/sysctl.d/"
ansible all -i hosts -m command -a "sudo cp ~/etc/modules-load.d/k8s.conf /etc/modules-load.d/"
ansible all -i hosts -m command -a "sudo cp ~/etc/sysctl.d/k8s.conf /etc/sysctl.d/"
```
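These files are only read at boot; to load the module and apply the sysctls immediately (not in the original, but standard practice):

```
ansible all -i hosts -m command -a "sudo modprobe br_netfilter"
ansible all -i hosts -m command -a "sudo sysctl --system"
```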
7. Install kubeadm + kubelet + kubectl

7.1. Prepare the kubernetes.repo configuration

1. In the current directory, create the kubernetes.repo file

```
mkdir -p etc/yum.repos.d/
cat <<EOF | tee etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-\$basearch
enabled=1
gpgcheck=1
repo_gpgcheck=0
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
exclude=kubelet kubeadm kubectl
EOF
```

(Note the escaped \$basearch: in an unquoted heredoc the shell would otherwise expand it to an empty string.)
If Google is unreachable, replace the baseurl in kubernetes.repo with the aliyun mirror:

```
cat <<EOF | tee etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
EOF
```
2. Copy the configuration to all nodes

```
ansible all -i hosts -m command -a "mkdir -p ~/etc/yum.repos.d/"
ansible all -i hosts -m command -a "scp -o StrictHostKeyChecking=no voidking@192.168.56.101:~/etc/yum.repos.d/kubernetes.repo ~/etc/yum.repos.d/"
ansible all -i hosts -m command -a "sudo cp ~/etc/yum.repos.d/kubernetes.repo /etc/yum.repos.d/"
```
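A quick check that every node can see the repo (my addition):

```
ansible all -i hosts -m command -a "sudo yum -q repolist kubernetes"
```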
7.2. Set SELinux to permissive mode

```
ansible all -i hosts -m command -a "sudo setenforce 0"
ansible all -i hosts -m command -a "sudo sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config"
```
7.3. Install kubeadm + kubelet + kubectl

```
ansible all -i hosts -m command -a "sudo yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes"
ansible all -i hosts -m command -a "sudo systemctl enable --now kubelet"
```
To install a specific K8s version such as 1.22.15, install pinned versions of kubeadm, kubelet and kubectl instead:

```
ansible all -i hosts -m command -a "sudo yum remove -y kubelet kubeadm kubectl"
sudo yum list kubeadm kubelet kubectl --showduplicates
ansible all -i hosts -m command -a "sudo yum install -y kubeadm-1.22.15-0 kubelet-1.22.15-0 kubectl-1.22.15-0 --disableexcludes=kubernetes"
```
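Verify that every node ended up with the pinned version (my addition):

```
ansible all -i hosts -m command -a "kubeadm version -o short"
ansible all -i hosts -m command -a "kubelet --version"
```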
8. Disable swap

```
ansible all -i hosts -m command -a "sudo swapoff -a"
ansible all -i hosts -m command -a "sudo sed -i '/ swap / s/^/#/' /etc/fstab"
```
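Swap should now report 0 on every node:

```
ansible all -i hosts -m command -a "free -h"
```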
9. Configure the cgroup Driver

1. Create the docker configuration file

```
mkdir -p etc/docker/
cat <<EOF | tee etc/docker/daemon.json
{
  "exec-opts": [
    "native.cgroupdriver=systemd"
  ]
}
EOF
```
2. Copy the configuration file to all nodes

```
ansible all -i hosts -m command -a "mkdir -p etc/docker/"
ansible all -i hosts -m command -a "sudo mkdir -p /etc/docker/"
ansible all -i hosts -m command -a "scp -o StrictHostKeyChecking=no voidking@192.168.56.101:~/etc/docker/daemon.json ~/etc/docker/"
ansible all -i hosts -m command -a "sudo cp -p ~/etc/docker/daemon.json /etc/docker"
```
3. Restart docker

```
ansible all -i hosts -m command -a "sudo systemctl daemon-reload"
ansible all -i hosts -m command -a "sudo systemctl restart docker"
```
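Confirm the driver change took effect; docker info should report "Cgroup Driver: systemd" (the pipe requires the shell module):

```
ansible all -i hosts -m shell -a "sudo docker info 2>/dev/null | grep -i 'cgroup driver'"
```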
10. Install keepalived + haproxy

keepalived provides high availability, haproxy provides load balancing. For details see 《基于 keeplived+lvs 实现nginx的高可用和负载均衡》.

Would keepalived alone be enough? Yes: when one master fails, traffic switches automatically to a healthy master. But two masters would then sit idle all the time. Would haproxy alone be enough? Also yes: one master serves as the entry point and load-balances traffic across the three masters. But when that entry master dies, the whole cluster is unreachable — the single point of failure remains. So keepalived and haproxy must work together to get load balancing plus high availability: keepalived provides a VIP backed by the three master nodes, with only one node serving at a time and automatic failover when it dies; traffic hits the haproxy instance on the serving master, and haproxy distributes it across the apiservers on all three masters.

Also, keepalived and haproxy do not have to run on the master nodes; dedicating separate machines to them works just as well.
Install keepalived + haproxy (psmisc provides the killall used by the keepalived health check below):

```
ansible master -i hosts -m command -a "sudo yum install keepalived haproxy psmisc -y"
```
10.1. Configure haproxy

Configure haproxy on all three master nodes; the configuration is identical on each.

1. Create the haproxy.cfg file

```
mkdir -p etc/haproxy/
vim etc/haproxy/haproxy.cfg
```
with the following content:

```
global
    log /dev/log local0 warning
    chroot /var/lib/haproxy
    pidfile /var/run/haproxy.pid
    maxconn 4000
    user haproxy
    group haproxy
    daemon
    stats socket /var/lib/haproxy/stats

defaults
    log global
    option httplog
    option dontlognull
    timeout connect 5000
    timeout client 50000
    timeout server 50000

frontend kube-apiserver
    bind *:9443  # listen on port 9443
    mode tcp
    option tcplog
    default_backend kube-apiserver

backend kube-apiserver
    mode tcp
    option tcplog
    option tcp-check
    balance roundrobin
    default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100
    server kube-apiserver-1 192.168.56.101:6443 check  # Replace the IP address with your own.
    server kube-apiserver-2 192.168.56.102:6443 check  # Replace the IP address with your own.
    server kube-apiserver-3 192.168.56.103:6443 check  # Replace the IP address with your own.
```
2. Back up & replace haproxy.cfg

```
ansible master -i hosts -m command -a "mkdir -p etc/haproxy/"
ansible master -i hosts -m command -a "scp -o StrictHostKeyChecking=no voidking@192.168.56.101:~/etc/haproxy/haproxy.cfg ~/etc/haproxy/"
ansible master -i hosts -m command -a "sudo mv /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg.bak"
ansible master -i hosts -m command -a "sudo cp etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg"
```
3. Restart haproxy

```
ansible master -i hosts -m command -a "sudo systemctl restart haproxy"
ansible master -i hosts -m command -a "sudo systemctl enable haproxy"
ansible master -i hosts -m command -a "sudo systemctl status haproxy"
```
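Confirm haproxy is listening on 9443 on each master (shell module because of the pipe; my addition):

```
ansible master -i hosts -m shell -a "sudo ss -lnt | grep 9443"
```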
10.2. Pick a VIP

Before configuring keepalived, find an unused IP to serve as the VIP.

```
nmap -sP 192.168.56.0/24
cat /proc/net/arp | awk '{print $1}' | sort
```

The commands above reveal which IPs are in use; pick a free one as the VIP — here 192.168.56.200.
10.3. Configure keepalived

Configure keepalived on all three master nodes; the configuration differs slightly per node.

1. Create the keepalived.conf file

```
mkdir -p etc/keepalived/
vim etc/keepalived/keepalived.conf
```
On machine 101, keepalived.conf should contain:

```
global_defs {
  notification_email {
  }
  router_id LVS_DEVEL
  vrrp_skip_check_adv_addr
  vrrp_garp_interval 0
  vrrp_gna_interval 0
}

vrrp_script chk_haproxy {
  script "killall -0 haproxy"
  interval 2
  weight 2
}

vrrp_instance haproxy-vip {
  state BACKUP
  priority 100
  interface eth0                   # Network card
  virtual_router_id 60
  advert_int 1
  authentication {
    auth_type PASS
    auth_pass 1111
  }
  unicast_src_ip 192.168.56.101    # The IP address of this machine
  unicast_peer {
    192.168.56.102 # The IP addresses of peer machines
    192.168.56.103
  }
  virtual_ipaddress {
    192.168.56.200/24 # The VIP address
  }
  track_script {
    chk_haproxy
  }
}
```
For generality, create a single template instead, with a placeholder for the source IP; keepalived.conf:

```
global_defs {
  notification_email {
  }
  router_id LVS_DEVEL
  vrrp_skip_check_adv_addr
  vrrp_garp_interval 0
  vrrp_gna_interval 0
}

vrrp_script chk_haproxy {
  script "killall -0 haproxy"
  interval 2
  weight 2
}

vrrp_instance haproxy-vip {
  state BACKUP
  priority 100
  interface eth0                   # Network card
  virtual_router_id 60
  advert_int 1
  authentication {
    auth_type PASS
    auth_pass 1111
  }
  unicast_src_ip <unicast_src_ip>  # The IP address of this machine
  unicast_peer {
    192.168.56.101 # The IP addresses of peer machines
    192.168.56.102
    192.168.56.103
  }
  virtual_ipaddress {
    192.168.56.200/24 # The VIP address
  }
  track_script {
    chk_haproxy
  }
}
```
2. Create the substitution script keepalived.sh

```
vim etc/keepalived/keepalived.sh
```

keepalived.sh contains:

```
#!/bin/bash
source /etc/profile
ip=$(ifconfig eth0 | awk 'NR==2{print $2}')
# comment out this machine's own IP in the peer list
sed -i "s/ ${ip} / #${ip} /g" etc/keepalived/keepalived.conf
# fill in this machine's IP as the unicast source
sed -i "s/<unicast_src_ip>/${ip}/g" etc/keepalived/keepalived.conf
```
3. Back up & replace keepalived.conf

```
ansible master -i hosts -m command -a "mkdir -p etc/keepalived/"
ansible master -i hosts -m command -a "scp -o StrictHostKeyChecking=no voidking@192.168.56.101:~/etc/keepalived/keepalived.conf ~/etc/keepalived/"
ansible master -i hosts -m command -a "scp -o StrictHostKeyChecking=no voidking@192.168.56.101:~/etc/keepalived/keepalived.sh ~/etc/keepalived/"
ansible master -i hosts -m command -a "bash etc/keepalived/keepalived.sh"
ansible master -i hosts -m command -a "sudo mv /etc/keepalived/keepalived.conf /etc/keepalived/keepalived.conf.bak"
ansible master -i hosts -m command -a "sudo cp etc/keepalived/keepalived.conf /etc/keepalived/keepalived.conf"
```
4. Restart keepalived

```
ansible master -i hosts -m command -a "sudo systemctl restart keepalived"
ansible master -i hosts -m command -a "sudo systemctl enable keepalived"
ansible master -i hosts -m command -a "sudo systemctl status keepalived"
ansible master -i hosts -m command -a "sudo ip a"
```
Once this works, ip a shows that one node's eth0 carries two IPs; the eth0:0 entry is the VIP 192.168.56.200.
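A simple failover test (my sketch; run the stop/start on whichever master currently holds the VIP):

```
# from any host: the VIP should answer
ping -c 3 192.168.56.200
# on the master holding the VIP: simulate a haproxy failure;
# chk_haproxy ("killall -0 haproxy") fails and the VIP should move to a peer
sudo systemctl stop haproxy
ip a | grep 192.168.56.200    # should now come back empty on this node
sudo systemctl start haproxy
```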
11. Deploy the K8s Cluster

11.1. Download the images

1. Export the default kubeadm configuration

```
sudo kubeadm config print init-defaults > kubeadm.conf
```
kubeadm.conf contains:

```
apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 1.2.3.4
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  imagePullPolicy: IfNotPresent
  name: node
  taints: null
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns: {}
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io
kind: ClusterConfiguration
kubernetesVersion: 1.22.0
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
scheduler: {}
```
PS: to export all defaults including the kubelet component config:

```
kubeadm config print init-defaults --component-configs KubeletConfiguration > kubeadm.conf
```
2. Edit kubeadm.conf: point imageRepository at a domestic mirror and set kubernetesVersion to the desired version

```
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers
kubernetesVersion: 1.22.15
```
3. Pull the images

```
sudo kubeadm config images pull --config kubeadm.conf
```
4. Bundle the images into one tarball

```
sudo docker save $(sudo docker images | grep -v REPOSITORY | awk 'BEGIN{OFS=":";ORS=" "}{print $1,$2}') -o all.tar
```
5. Copy the images to all master nodes

```
sudo chown voidking:voidking all.tar
ansible master -i hosts -m command -a "scp -o StrictHostKeyChecking=no voidking@192.168.56.101:~/all.tar ~/"
ansible master -i hosts -m command -a "sudo docker load -i ~/all.tar"
```
11.2. K8s cluster configuration

Choose one master node (master-0) as the first control-plane node, then edit kubeadm.conf to describe the cluster; for details see 利用 kubeadm 创建高可用集群 (Creating Highly Available Clusters with kubeadm).
11.2.1. controlPlaneEndpoint

Add the controlPlaneEndpoint setting: the keepalived VIP plus the haproxy port

```
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
controlPlaneEndpoint: "192.168.56.200:9443"
```
11.2.2. localAPIEndpoint

Change the apiserver address to this master's own IP and port

```
localAPIEndpoint:
  advertiseAddress: 192.168.56.101
  bindPort: 6443
```
11.2.3. Specify the node name

Change nodeRegistration.name to the hostname of the master performing the initialization

```
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: master-0
  imagePullPolicy: IfNotPresent
  taints: null
```
11.2.4. Configure the pod and service IP ranges

Run ip route to see which IP ranges are already in use, and avoid them.

```
networking:
  dnsDomain: cluster.local
  serviceSubnet: 172.31.0.0/16
  podSubnet: 10.96.0.0/12
```

For how to size the pod and service IP ranges, see 子网划分详解与子网划分实例精析 and 详解网络分类ABC.
11.3. Initialize the master-0 node

1. Run the initialization on master-0

```
sudo kubeadm init --config ~/kubeadm.conf --upload-certs
```
The output should end with something like:

```
You can now join any number of the control-plane node running the following command on each as root:

  kubeadm join 192.168.56.200:9443 --token abcdef.0123456789abcdef \
    --discovery-token-ca-cert-hash sha256:xxxxx \
    --control-plane --certificate-key xxxxx

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.56.200:9443 --token abcdef.0123456789abcdef \
    --discovery-token-ca-cert-hash sha256:xxxxx
```
2. Configure kubeconfig

```
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
```
3. Check the cluster status

```
kubectl get nodes
kubectl get pods --all-namespaces
```
11.4. Join the other master nodes

1. Edit the hosts inventory and comment out the master-0 entry (it is already in the cluster).

2. Join the remaining master nodes, substituting the token, hash and certificate key printed by kubeadm init:

```
ansible master -i hosts -m command -a "sudo kubeadm join 192.168.56.200:9443 --token abcdef.0123456789abcdef \
    --discovery-token-ca-cert-hash sha256:xxxxx \
    --control-plane --certificate-key xxxxx"
```
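The bootstrap token is valid for 24 hours and the uploaded certificates for 2 hours; if they have expired, fresh values can be generated on master-0:

```
# prints a complete worker join command with a fresh token
sudo kubeadm token create --print-join-command
# re-uploads the control-plane certificates and prints a new certificate key
sudo kubeadm init phase upload-certs --upload-certs
```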
11.5. Join the worker nodes

```
ansible worker -i hosts -m command -a "sudo kubeadm join 192.168.56.200:9443 --token abcdef.0123456789abcdef \
    --discovery-token-ca-cert-hash sha256:xxxxx"
```
11.6. Install a CNI plugin

Choose a CNI plugin per the Cluster Networking documentation (集群网络系统); here we choose flannel.

```
kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
```
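One caveat (my note): the stock kube-flannel.yml assumes the pod network 10.244.0.0/16, while this article set podSubnet to 10.96.0.0/12, so the Network field in the manifest's net-conf.json most likely needs patching first — a sketch:

```
curl -LO https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
# align flannel's Network with the podSubnet from kubeadm.conf
sed -i 's#10.244.0.0/16#10.96.0.0/12#' kube-flannel.yml
kubectl apply -f kube-flannel.yml
```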
11.7. Check the cluster status

```
kubectl get nodes
kubectl get pods --all-namespaces
```
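To close the loop on the "test the cluster" step from the workflow, a quick smoke test (my addition; image names are examples):

```
kubectl create deployment nginx --image=nginx --replicas=3
kubectl expose deployment nginx --port=80
kubectl get pods -o wide    # pods should be spread across the worker nodes
kubectl run curl --rm -it --image=curlimages/curl --restart=Never -- curl -s nginx
```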