Kubernetes v1.27.6 Three-Node Cluster (Cilium + API Gateway) Complete Deployment Guide

Platform: Rocky Linux 8
Nodes (clean, freshly installed systems):

  • k8s-node01 — 192.168.1.233 (control plane + worker)
  • k8s-node02 — 192.168.1.234 (control plane + worker)
  • k8s-node03 — 192.168.1.235 (control plane + worker)
    API LB (HAProxy): 192.168.1.203 (runs on a dedicated host outside the cluster)
    CNI: Cilium
    Bare-metal LoadBalancer: MetalLB (L2)
    Ingress / API Gateway: Gateway API + Envoy Gateway
    Distributed storage: Rook v1.16 + Ceph v19.2 (OSDs use /dev/sdb on each node)
    Goal: all control-plane nodes can also schedule Pods (the NoSchedule taint is removed), and the cluster provides dynamically expandable persistent volumes (RBD, with a StorageClass that supports expansion).

Note: this document is a copy-and-run operations manual; follow the steps in order. Rehearse in a test environment first. The Rook/Ceph steps will erase all data on /dev/sdb.


Table of Contents

  1. Prerequisites and risk notes
  2. Common node initialization (system settings, containerd, kubeadm installation)
  3. Configuring HAProxy as the external LB (on 192.168.1.203)
  4. kubeadm init (control plane via 192.168.1.203)
  5. Joining the other control-plane nodes
  6. Making the control-plane nodes also act as workers (removing the taint)
  7. Installing the network plugins: Cilium (CNI) and MetalLB (L2) with an IP pool
  8. Installing the Gateway API and Envoy Gateway
  9. Deploying Rook v1.16 + Ceph v19.2 (dynamically expandable storage)
  10. Common checks and troubleshooting commands (quick reference)

1. Prerequisites and risk notes

  • The three nodes can reach each other over the network and have synchronized clocks (chrony/ntp recommended).
  • All three hosts run Rocky Linux 8 and you have root access.
  • Each host has an empty disk /dev/sdb (it will be wiped and used for a Ceph OSD). Back up any data first.
  • The cluster will run kubeadm v1.27.6.
  • Several steps use kubectl, helm, istioctl, and cilium; make sure these tools are available from your management host.
  • The procedure makes many system-level changes; confirm each step succeeded before moving on to the next.

2. Common node initialization (run on all 3 nodes)

Run as root or with sudo.

2.1 Hostnames and /etc/hosts

Set the hostname on each node:

hostnamectl set-hostname k8s-node01   # on 233
hostnamectl set-hostname k8s-node02   # on 234
hostnamectl set-hostname k8s-node03   # on 235

Add the cluster hostnames to /etc/hosts (on all three nodes):

192.168.1.233 k8s-node01
192.168.1.234 k8s-node02
192.168.1.235 k8s-node03

# Set the timezone:
timedatectl set-timezone Asia/Shanghai

# Install rsyslog (check whether /var/log/messages exists by default)
dnf install rsyslog -y
systemctl start rsyslog && systemctl enable rsyslog
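
The prerequisites call for time synchronization; a minimal chrony setup, assuming the default pool servers in /etc/chrony.conf are reachable:

# Install and enable chrony for time synchronization
dnf install -y chrony
systemctl enable --now chronyd
# Verify that time sources are reachable and the clock is synchronized
chronyc sources -v
chronyc tracking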

Configure a system-wide proxy (if needed)

# Create a global proxy configuration (replace proxy-server:port with your proxy;
# no_proxy keeps local and cluster-internal traffic off the proxy)
mkdir -p /etc/systemd/system.conf.d/
tee /etc/systemd/system.conf.d/proxy.conf <<EOF
[Manager]
DefaultEnvironment="http_proxy=http://proxy-server:port"
DefaultEnvironment="https_proxy=http://proxy-server:port"
DefaultEnvironment="no_proxy=localhost,127.0.0.1,192.168.0.0/16,.svc,.svc.cluster.local,kubernetes.default,kubernetes.default.svc,kubernetes.default.svc.cluster.local"
EOF

# Concrete example (overwrites the template above; adjust to your own proxy address):
tee /etc/systemd/system.conf.d/proxy.conf <<EOF
[Manager]
DefaultEnvironment="http_proxy=http://192.168.1.44:7890"
DefaultEnvironment="https_proxy=http://192.168.1.44:7890"
DefaultEnvironment="no_proxy=localhost,127.0.0.1,192.168.0.0/16,.svc,.svc.cluster.local,kubernetes.default,kubernetes.default.svc,kubernetes.default.svc.cluster.local"
EOF


# Reload the systemd configuration
sudo systemctl daemon-reload

# Restart any services that need the proxy
sudo systemctl restart <service-name>

2.2 Disable swap, set kernel parameters, and load br_netfilter

swapoff -a
sed -i '/ swap / s/^/#/' /etc/fstab

cat >/etc/modules-load.d/k8s.conf <<EOF
br_netfilter
EOF
modprobe br_netfilter

cat >/etc/sysctl.d/k8s.conf <<EOF
net.bridge.bridge-nf-call-iptables=1
net.bridge.bridge-nf-call-ip6tables=1
net.ipv4.vs.ignore_no_rs_error=1
net.ipv4.ip_forward=1
net.ipv4.tcp_tw_recycle=0
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_timestamps=1
net.ipv4.neigh.default.gc_thresh1=1024
net.ipv4.neigh.default.gc_thresh2=2048
net.ipv4.neigh.default.gc_thresh3=4096
vm.swappiness=0
vm.overcommit_memory=1
vm.panic_on_oom=0
fs.inotify.max_user_instances=8192
fs.inotify.max_user_watches=1048576
fs.file-max=52706963
fs.nr_open=52706963
net.ipv6.conf.all.disable_ipv6=1
net.netfilter.nf_conntrack_max=2310720
EOF
sysctl --system
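
A quick sanity check that the module is loaded and the key parameters took effect (the values shown are the expected ones):

lsmod | grep br_netfilter
sysctl net.ipv4.ip_forward net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables
# Expected output:
# net.ipv4.ip_forward = 1
# net.bridge.bridge-nf-call-iptables = 1
# net.bridge.bridge-nf-call-ip6tables = 1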

2.3 Install required tools and disable the firewall (as needed)

# Update system packages to the latest versions
dnf update -y
# Install tools and dependencies
dnf install -y dnf-utils  ipvsadm  telnet  wget  net-tools  conntrack  ipset  jq  iptables  curl  sysstat  libseccomp  socat  nfs-utils  fuse fuse-devel vim

# Disable SELinux
setenforce 0
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
# Disable the firewall
systemctl stop firewalld
systemctl disable --now firewalld   # if you keep firewalld, open the ports required by kubeadm

2.4 Install containerd (systemd cgroup)

# Remove any previously installed container runtime
dnf remove docker \
                  docker-client \
                  docker-client-latest \
                  docker-common \
                  docker-latest \
                  docker-latest-logrotate \
                  docker-logrotate \
                  docker-engine


# Configure the repository
dnf -y install dnf-plugins-core
dnf config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo

# Install containerd (the package in the Docker repository is named containerd.io)
dnf -y install containerd.io

# Configure containerd
mkdir -p /etc/containerd
containerd config default > /etc/containerd/config.toml
# Edit /etc/containerd/config.toml and replace the pause image (needed when the default registry is unreachable); the default sandbox_image points to a Google-hosted registry
sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.9"  # adjust the version as needed
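
kubelet is configured with the systemd cgroup driver later on (see the kubeadm config in section 4), so containerd should use systemd cgroups as well. A minimal sketch with sed, assuming the default config.toml layout generated above:

# Switch the runc runtime to systemd cgroups (the generated default is SystemdCgroup = false)
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
# Replace the sandbox (pause) image in one step instead of editing the file by hand
sed -i 's#sandbox_image = ".*"#sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.9"#' /etc/containerd/config.toml
# Confirm both settings
grep -E 'SystemdCgroup|sandbox_image' /etc/containerd/config.toml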

# Install the crictl CLI (ideally a release matching the cluster's minor version, e.g. v1.27.x; v1.35.0 is shown here)
wget https://github.com/kubernetes-sigs/cri-tools/releases/download/v1.35.0/crictl-v1.35.0-linux-amd64.tar.gz
tar -zxvf crictl-v1.35.0-linux-amd64.tar.gz -C /usr/local/bin
# Verify the installation
crictl --version

# Configure crictl; without this, pulling images with crictl will not work properly:
cat <<EOF> /etc/crictl.yaml
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
debug: false
EOF

# Configure a network proxy for containerd (if needed; NO_PROXY keeps local and LAN traffic off the proxy)
mkdir -p /etc/systemd/system/containerd.service.d

tee /etc/systemd/system/containerd.service.d/http-proxy.conf <<EOF
[Service]
Environment="HTTP_PROXY=http://proxy-server:port"
Environment="HTTPS_PROXY=http://proxy-server:port"
Environment="NO_PROXY=localhost,127.0.0.1,192.168.1.0/24"
EOF

# Concrete example (overwrites the template above; adjust to your own proxy address):
tee /etc/systemd/system/containerd.service.d/http-proxy.conf <<EOF
[Service]
Environment="HTTP_PROXY=http://192.168.1.44:7890"
Environment="HTTPS_PROXY=http://192.168.1.44:7890"
Environment="NO_PROXY=localhost,127.0.0.1,192.168.1.0/24"
EOF

# Start containerd
systemctl daemon-reload
systemctl enable --now containerd

2.5 Install helm

wget https://get.helm.sh/helm-v4.0.2-linux-amd64.tar.gz
tar -zxvf helm-v4.0.2-linux-amd64.tar.gz
mv linux-amd64/helm /usr/local/bin/helm

# Verify the installation
helm version

2.6 Install kubeadm/kubelet/kubectl v1.27.6

cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF

dnf install -y kubeadm-1.27.6-0 kubelet-1.27.6-0 kubectl-1.27.6-0 --disableexcludes=kubernetes
systemctl enable --now kubelet
# Optionally lock the package versions (see the sketch below)
# dnf install -y python3-jinja2  # only needed if you generate manifests from templates
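
To lock the versions mentioned above, one option is the dnf versionlock plugin (a sketch, assuming the plugin package is available from the standard repositories):

dnf install -y python3-dnf-plugin-versionlock
dnf versionlock add kubeadm-1.27.6-0 kubelet-1.27.6-0 kubectl-1.27.6-0
dnf versionlock list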

3. HAProxy as the external LB

HAProxy runs on a dedicated host at 192.168.1.203, outside the cluster; this is the address that controlPlaneEndpoint will point to in section 4. The example below binds HAProxy to 0.0.0.0:6443 and round-robins requests to the three apiservers on port 6443.

3.1 Install HAProxy on the LB host (192.168.1.203)

dnf install -y haproxy

3.2 Edit /etc/haproxy/haproxy.cfg

global
    log /dev/log local0
    maxconn 2000
    user haproxy
    group haproxy
    daemon
    stats socket /run/haproxy/admin.sock mode 660 level admin

defaults
    mode tcp
    log global
    option tcplog
    timeout connect 10s
    timeout client  1m
    timeout server  1m

listen k8s-api
    bind 0.0.0.0:6443
    mode tcp
    option tcplog
    option tcp-check
    balance roundrobin
    server k8s-node01 192.168.1.233:6443 check fall 3 rise 2
    server k8s-node02 192.168.1.234:6443 check fall 3 rise 2
    server k8s-node03 192.168.1.235:6443 check fall 3 rise 2

Note: if you instead co-locate HAProxy with an apiserver (for example on 192.168.1.233), it cannot bind 0.0.0.0:6443 because the local apiserver already listens on that port; bind the frontend to a dedicated address such as 192.168.1.203 or use a different port. The backend entries stay the same (192.168.1.233:6443, and so on).

Restart HAProxy:

systemctl enable --now haproxy
systemctl status haproxy

Test:

telnet 192.168.1.203 6443
# or
ss -lnt | grep 6443
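
You can also query the HAProxy runtime socket to see backend health (socat was installed in step 2.3). Until kubeadm init completes in section 4, all three backends will show as DOWN, which is expected at this point:

echo "show servers state k8s-api" | socat stdio /run/haproxy/admin.sock
echo "show stat" | socat stdio /run/haproxy/admin.sock | cut -d',' -f1,2,18 | column -s',' -t   # proxy, server, status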

4. kubeadm init

(Run on k8s-node01; controlPlaneEndpoint points to the HAProxy address 192.168.1.203:6443.)

Create kubeadm-config.yaml:

apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.27.6
controlPlaneEndpoint: "192.168.1.203:6443"
networking:
  podSubnet: "10.244.0.0/16"

---
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: "192.168.1.233"
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///run/containerd/containerd.sock
  kubeletExtraArgs:
    cgroup-driver: "systemd"

Run:

kubeadm init --config=kubeadm-config.yaml --upload-certs

Save the kubeadm join output (token, ca-cert-hash, certificate-key); you will need it when joining the other control-plane and worker nodes.

Configure kubectl (for a regular user):

mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config
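
admin.conf already points at the controlPlaneEndpoint (192.168.1.203:6443), so the following also confirms that the API server is reachable through HAProxy:

kubectl cluster-info
kubectl get --raw='/readyz?verbose' | tail -n 5
kubectl get nodes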

5. Joining the other nodes

Assuming all three nodes become control-plane nodes: control plane (node02 / node03).
On node02/node03 run kubeadm join with the --control-plane --certificate-key <key> values from the init output:

kubeadm join 192.168.1.203:6443 \
  --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --control-plane --certificate-key <cert-key>

Then verify from k8s-node01:

kubectl get nodes -o wide
kubectl get cs

6. Allow the control plane to schedule Pods (remove the NoSchedule taint)

By default kubeadm taints control-plane nodes with node-role.kubernetes.io/control-plane:NoSchedule. Remove it so each node acts as both control plane and worker:

kubectl taint nodes k8s-node01 node-role.kubernetes.io/control-plane:NoSchedule-
kubectl taint nodes k8s-node02 node-role.kubernetes.io/control-plane:NoSchedule-
kubectl taint nodes k8s-node03 node-role.kubernetes.io/control-plane:NoSchedule-

Verify:

kubectl get nodes -o wide
kubectl describe node k8s-node01 | grep -i taint -A2
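
Once the CNI from section 7 is installed, a quick way to confirm that workloads land on all three nodes is a throwaway Deployment (sched-test is an arbitrary name used only for this check):

kubectl create deployment sched-test --image=nginx --replicas=3
kubectl get pods -l app=sched-test -o wide   # expect the Pods spread across k8s-node01/02/03
kubectl delete deployment sched-test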

7. Installing the network plugins

On any node with kubectl access:

7.1 Install Cilium

1️⃣ Install the Cilium CLI

curl -L --remote-name-all https://github.com/cilium/cilium-cli/releases/latest/download/cilium-linux-amd64.tar.gz
tar xzvf cilium-linux-amd64.tar.gz
mv cilium /usr/local/bin/

2️⃣ Install Cilium

# ipv4NativeRoutingCIDR must match the podSubnet configured for the cluster (10.244.0.0/16); with ipam.mode=kubernetes the Pod CIDRs come from Kubernetes itself
cilium install \
  --version 1.18.3 \
  --set kubeProxyReplacement=false \
  --set ipam.mode=kubernetes \
  --set routingMode=native \
  --set ipv4NativeRoutingCIDR=10.244.0.0/16 \
  --set enableIPv4Masquerade=true \
  --set enableNodePort=false \
  --set enableLoadBalancer=false \
  --set enableHostPort=false \
  --set hubble.enabled=false



# After the installation finishes, confirm that Cilium is deployed and running:
kubectl get pods -n kube-system -l k8s-app=cilium
NAME           READY   STATUS    RESTARTS   AGE
cilium-frd9m   1/1     Running   0          4h8m
cilium-jbkjb   1/1     Running   0          4h8m
cilium-kmvrc   1/1     Running   0          4h8m

kubectl get gatewayclass
NAME     CONTROLLER                     ACCEPTED   AGE
cilium   io.cilium/gateway-controller   True       23s


Notes: the cilium GatewayClass above only appears if Cilium's Gateway API support is enabled (--set gatewayAPI.enabled=true); this guide uses Envoy Gateway as the Gateway implementation (section 8). In native routing mode on a shared L2 network you will typically also want --set autoDirectNodeRoutes=true so that each node learns the other nodes' Pod CIDRs. If you later want kube-proxy replacement, you can reconfigure a stable cluster with cilium install --set kubeProxyReplacement=true ... (older releases used strict); before enabling it, make sure the kernel, containerd, and socket LB support meet Cilium's requirements.
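
To confirm the datapath end to end, the Cilium CLI ships a built-in connectivity check (optional; it creates test workloads in a cilium-test namespace and takes a few minutes):

cilium status --wait
cilium connectivity test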


7.2 Install MetalLB (L2 mode)

Install the official manifest and create an IP address pool.

1️⃣ Install MetalLB

# Pinning a tagged release instead of the "main" branch manifest is recommended for reproducible installs
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/main/config/manifests/metallb-native.yaml

kubectl get pods -n metallb-system
NAME                          READY   STATUS    RESTARTS   AGE
controller-5b46566d45-tpgvm   1/1     Running   0          39s
speaker-rcwsc                 1/1     Running   0          39s
speaker-spsgf                 1/1     Running   0          39s
speaker-zjb5p                 1/1     Running   0          39s

2️⃣ Configure an IPAddressPool

Save as metallb-pool.yaml:

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: external-pool
  namespace: metallb-system
spec:
  addresses:
  - 192.168.1.169/32  # the external IP you allocate to the cluster
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: l2
  namespace: metallb-system
spec:
  ipAddressPools:
  - external-pool

Apply:

kubectl apply -f metallb-pool.yaml

kubectl get ipaddresspools -n metallb-system

NAME            AUTO ASSIGN   AVOID BUGGY IPS   ADDRESSES
external-pool   true          false             ["192.168.1.169/32"]

Verification: deploy a LoadBalancer-type nginx Service and confirm it gets an IP from the pool, as shown below.
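
A minimal sketch of that check, using a throwaway lb-test Deployment (the name is arbitrary); the EXTERNAL-IP should come from external-pool:

kubectl create deployment lb-test --image=nginx
kubectl expose deployment lb-test --port=80 --type=LoadBalancer
kubectl get svc lb-test          # EXTERNAL-IP should be 192.168.1.169
curl -I http://192.168.1.169     # expect HTTP/1.1 200 OK from nginx
kubectl delete svc,deployment lb-test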


8. Installing the Gateway API and Envoy Gateway

8.1 Install the Gateway API CRDs

kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/latest/download/standard-install.yaml


# Or download the YAML files locally:
# HTTP CRDs (standard channel)
wget -O standard-install.yaml https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.0.0/standard-install.yaml
# TCP CRDs (experimental channel) - TCPRoute only exists in the experimental channel, so apply this file if you use the TCP examples in 8.6
wget -O experimental-install.yaml https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.0.0/experimental-install.yaml


8.2 Install Envoy Gateway

kubectl apply --server-side -f https://github.com/envoyproxy/gateway/releases/latest/download/install.yaml

Verify that the Envoy Gateway Pods are ready:

kubectl get pods -n envoy-gateway-system
kubectl get svc  -n envoy-gateway-system

8.3 Create a GatewayClass

gatewayclass.yaml

apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: envoy-gateway
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller

kubectl apply -f gatewayclass.yaml
kubectl get gatewayclass

8.4 Create a Gateway (HTTP + TCP)

gateway.yaml

# gateway.yaml - multi-port configuration
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: eg-gateway
spec:
  gatewayClassName: envoy-gateway
  listeners:
    # HTTP listener
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          from: All
        kinds:
          - kind: HTTPRoute

    # TCP listener group
    - name: tcp-db
      protocol: TCP
      port: 3306
      allowedRoutes:
        namespaces:
          from: All
        kinds:
          - kind: TCPRoute

    - name: tcp-redis
      protocol: TCP
      port: 6379
      allowedRoutes:
        namespaces:
          from: All
        kinds:
          - kind: TCPRoute

    - name: tcp-ssh
      protocol: TCP
      port: 22
      allowedRoutes:
        namespaces:
          from: All
        kinds:
          - kind: TCPRoute

kubectl apply -f gateway.yaml

kubectl get gateway
NAME         CLASS           ADDRESS         PROGRAMMED   AGE
eg-gateway   envoy-gateway   192.168.1.169   True         18h

kubectl describe gateway eg-gateway

8.5 Deploy an HTTP test case

kubectl create deployment nginx --image=nginx
kubectl expose deployment nginx --port=80

HTTPRoute

When an HTTPRoute references a Service in another namespace, the Service's namespace must grant access with a ReferenceGrant (referencegrant-grafana.yaml below). Alternatively, place the Route in the same namespace as the Service and reference the Gateway's namespace from the Route, in which case no grant is needed.

apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-grafana
  namespace: monitoring
spec:
  from:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    namespace: default
  to:
  - group: ""             # core API group
    kind: Service
    name: kube-prometheus-grafana

httproute.yaml

# Route URLs to different Services by path
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: production-routes
  namespace: default  # same namespace as the nginx Service; the grafana backend in monitoring still needs the ReferenceGrant above
spec:
  parentRefs:
    - name: eg-gateway
      namespace: default
      sectionName: http
      port: 80

  rules:
    # nginx
    - matches:
        - path:
            type: PathPrefix
            value: /nginx
      filters:
        - type: URLRewrite
          urlRewrite:
            path:
              type: ReplacePrefixMatch
              replacePrefixMatch: /
      backendRefs:
        - name: nginx
          port: 80

    # grafana
    - matches:
        - path:
            type: PathPrefix
            value: /grafana
      filters:
        - type: RequestHeaderModifier
          requestHeaderModifier:
            add:
              - name: X-Forwarded-Prefix
                value: /grafana
      backendRefs:
        - name: kube-prometheus-grafana
          namespace: monitoring
          port: 80
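
Apply the route and test it through the Gateway address assigned by MetalLB (192.168.1.169); the /nginx prefix is rewritten to / before the request reaches the nginx Service:

kubectl apply -f httproute.yaml
kubectl get httproute production-routes
curl -s http://192.168.1.169/nginx | head -n 5   # expect the nginx welcome page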

8.6 Deploy a TCP test case

mysql_dy.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql
  labels:
    app: mysql
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
        - name: mysql
          image: mysql:8.0
          env:
            - name: MYSQL_ROOT_PASSWORD
              value: "root123"
          ports:
            - containerPort: 3306
---
apiVersion: v1
kind: Service
metadata:
  name: mysql
spec:
  selector:
    app: mysql
  ports:
    - name: tcp
      protocol: TCP
      port: 3306
      targetPort: 3306
  type: ClusterIP

TCPRoute

tcproute.yaml

apiVersion: gateway.networking.k8s.io/v1alpha2
kind: TCPRoute
metadata:
  name: tcp-route
  namespace: default
spec:
  parentRefs:
    - name: eg-gateway   # the Gateway created above
      sectionName: tcp-db # matches the tcp-db listener on the Gateway
  rules:
    - backendRefs:
        - name: mysql  # the Service name
          port: 3306
          weight: 1
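
Apply both manifests and connect through the Gateway's 3306 listener. The mysql invocation below is only an illustration; it assumes a MySQL client on your workstation and uses the root123 password from the Deployment above:

kubectl apply -f mysql_dy.yaml
kubectl apply -f tcproute.yaml
kubectl get tcproute tcp-route
mysql -h 192.168.1.169 -P 3306 -uroot -proot123 -e "SELECT VERSION();"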

9. Rook v1.16 + Ceph v19.2 (dynamically expandable storage): detailed steps

Goal: create an RBD StorageClass with allowVolumeExpansion: true and walk through the PVC expansion workflow.

9.0 Prerequisite: confirm again that /dev/sdb is empty and available (on all three nodes)

Run on each of the three nodes and confirm:

lsblk
fdisk -l /dev/sdb
# If there are existing partitions, wipe them (destructive operation)
sgdisk --zap-all /dev/sdb
wipefs -a /dev/sdb

9.1 Install the Rook CRDs, common resources, and operator (v1.16.8 example)

kubectl create namespace rook-ceph || true
kubectl apply -f https://raw.githubusercontent.com/rook/rook/v1.16.8/deploy/examples/crds.yaml
kubectl apply -f https://raw.githubusercontent.com/rook/rook/v1.16.8/deploy/examples/common.yaml
kubectl apply -f https://raw.githubusercontent.com/rook/rook/v1.16.8/deploy/examples/operator.yaml
kubectl -n rook-ceph get pods -w

Wait until rook-ceph-operator is ready (and the rook-discover pods, if the discovery daemon is enabled).

9.2 Create the CephCluster (Ceph v19.2.0 example)

Save as ceph-cluster.yaml (already tailored to these hosts and /dev/sdb):

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: quay.io/ceph/ceph:v19.2.0
  dataDirHostPath: /var/lib/rook
  mon:
    count: 3
    allowMultiplePerNode: false
  mgr:
    modules:
      - name: dashboard
        enabled: true
  dashboard:
    enabled: true
  network:
    hostNetwork: false
  storage:
    useAllNodes: false
    useAllDevices: false
    nodes:
      - name: k8s-node01
        devices:
          - name: "sdb"
      - name: k8s-node02
        devices:
          - name: "sdb"
      - name: k8s-node03
        devices:
          - name: "sdb"

Apply it and wait for the Pods to start and the OSDs to be prepared:

kubectl apply -f ceph-cluster.yaml
kubectl -n rook-ceph get cephcluster rook-ceph -o yaml
kubectl -n rook-ceph get pods -o wide
# Watch the rook-ceph-osd-prepare-* / rook-ceph-osd-* Pod logs for provisioning progress

9.3 Create a CephBlockPool (replicated pool)

Save as ceph-block-pool.yaml:

apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 3

Apply:

kubectl apply -f ceph-block-pool.yaml

9.4 Create the RBD StorageClass (with expansion enabled)

Save as storageclass-rbd.yaml:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph
  pool: replicapool
  imageFormat: "2"
  imageFeatures: layering
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer

Apply:

kubectl apply -f storageclass-rbd.yaml
kubectl get sc
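
Optionally mark it as the cluster default StorageClass so PVCs that omit storageClassName also land on Ceph RBD:

kubectl patch storageclass rook-ceph-block \
  -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
kubectl get sc   # rook-ceph-block should now be marked (default)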

9.5 Install the toolbox (for running ceph commands)

kubectl apply -f https://raw.githubusercontent.com/rook/rook/v1.16.8/deploy/examples/toolbox.yaml
kubectl -n rook-ceph get pods -l app=rook-ceph-tools
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
# Inside the container, run
ceph -s
ceph osd tree

9.6 Create an initial PVC (5Gi)

pvc-test.yaml

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 5Gi
  storageClassName: rook-ceph-block

Create the PVC and a test Pod that mounts it:

kubectl apply -f pvc-test.yaml
# Create a Pod that mounts the volume for testing
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: pvc-tester
spec:
  containers:
  - name: c
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - mountPath: /data
      name: vol
  volumes:
  - name: vol
    persistentVolumeClaim:
      claimName: test-pvc
EOF

kubectl exec -it pvc-tester -- sh -c "df -h /data; echo hello >/data/hello; cat /data/hello"

9.7 Expand the PVC (from 5Gi to 10Gi)

Edit spec.resources.requests.storage on the PVC, or use kubectl patch:

kubectl patch pvc test-pvc -p '{"spec": {"resources": {"requests": {"storage": "10Gi"}}}}'

Check the PVC status:

kubectl get pvc test-pvc
kubectl describe pvc test-pvc

The CSI controller triggers the expansion. For the filesystem side: if the Pod already has the volume mounted, the RBD CSI driver and kubelet normally grow the filesystem online automatically; otherwise you may need to run a resize command (such as resize2fs) inside the Pod.

Verify the new size:

kubectl exec -it pvc-tester -- sh -c "df -h /data"

If the new size is not reflected, run the filesystem-specific resize command inside the Pod.
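
The new size can also be confirmed from the Kubernetes and Ceph sides; listing per-image usage in the pool via the toolbox is the simplest cross-check:

kubectl get pv $(kubectl get pvc test-pvc -o jsonpath='{.spec.volumeName}')
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- rbd du -p replicapool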


10. Common checks and troubleshooting commands (quick reference)

# Kubernetes
kubectl get nodes
kubectl get pods -A
kubectl describe pod <pod> -n <ns>
kubectl logs <pod> -n <ns>

# HAProxy
systemctl status haproxy
echo "show stat" | socat stdio /run/haproxy/admin.sock

# Cilium
cilium status
kubectl -n kube-system get pods -l k8s-app=cilium

# MetalLB
kubectl -n metallb-system get all

# Istio (if installed)
kubectl -n istio-system get pods

# Rook/Ceph
kubectl -n rook-ceph get pods
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph -s
kubectl -n rook-ceph get cephcluster
kubectl -n rook-ceph get cephblockpool
kubectl -n rook-ceph logs -l app=rook-ceph-osd-prepare --all-containers   # OSD prepare logs