
eBPF in Practice: Replacing iptables with Cilium for High-Performance K8s Network Policies


The iptables Bottleneck at Scale

Once a Kubernetes cluster reaches a certain size, the performance problems of iptables start to show:

Empirical relationship between iptables rule count and cluster size:
- 100 Services → ~2,000 iptables rules
- 1,000 Services → ~20,000 iptables rules
- 10,000 Services → ~400,000 iptables rules

Packet processing time (grows linearly with Service count):
- 100 Services: ~0.1ms
- 1,000 Services: ~1ms
- 10,000 Services: ~10ms (severely degrades microservice communication)

eBPF (extended Berkeley Packet Filter) solves this by running JIT-compiled bytecode inside the kernel, replacing linear rule traversal with hash-table lookups and cutting service lookup cost from O(n) to O(1).
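To gauge where an existing cluster sits on that curve, count the rules kube-proxy has programmed on a node; the grep pattern below assumes the standard KUBE-* chain names:

# Count kube-proxy's KUBE-* rules on this node
sudo iptables-save | grep -c 'KUBE-'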

1. Installing Cilium and Replacing kube-proxy

1.1 Prerequisite checks

# Check the kernel version (Cilium requires >= 4.9.17; >= 5.10 recommended)
uname -r
# Recommended: 5.15.0 or later

# Check eBPF support
ls /sys/fs/bpf/
# This directory should exist

# Check the mount point
mount | grep bpf
# Should show: bpf on /sys/fs/bpf type bpf

# If it is not mounted:
sudo mount bpffs /sys/fs/bpf -t bpf
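Beyond the mount point, it is worth confirming the kernel was compiled with the eBPF options Cilium relies on. A minimal sketch, assuming your distro ships its kernel config under /boot:

# All of these should be =y (CONFIG_NET_CLS_BPF may also be =m)
grep -E 'CONFIG_BPF=|CONFIG_BPF_SYSCALL=|CONFIG_BPF_JIT=|CONFIG_NET_CLS_BPF=' "/boot/config-$(uname -r)"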

1.2 Installing with full kube-proxy replacement

# Option 1: a new cluster (kubeadm, skipping kube-proxy)
cat > kubeadm-config.yaml << EOF
apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
networking:
  podSubnet: "10.0.0.0/8"
  serviceSubnet: "172.16.0.0/12"
---
apiVersion: kubeadm.k8s.io/v1beta4
kind: InitConfiguration
skipPhases:
- addon/kube-proxy   # skip the kube-proxy addon
EOF

kubeadm init --config kubeadm-config.yaml

# Option 2: migrating an existing cluster (remove kube-proxy first)
kubectl -n kube-system delete daemonset kube-proxy
kubectl -n kube-system delete configmap kube-proxy
# Flush the leftover iptables rules (note: affects only the node this runs on)
iptables-save | grep -v KUBE | iptables-restore
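The flush above has to happen on every node, not just one. A minimal sketch for doing so over SSH, assuming passwordless access and a nodes.txt file listing all node addresses:

while read -r node; do
    # Remove only the KUBE-* rules; everything else is kept
    ssh "$node" "sudo sh -c 'iptables-save | grep -v KUBE | iptables-restore'"
done < nodes.txt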

# Install Cilium (as the kube-proxy replacement)
helm repo add cilium https://helm.cilium.io/
helm repo update

API_SERVER_IP=$(kubectl get endpoints kubernetes -o jsonpath='{.subsets[0].addresses[0].ip}')
API_SERVER_PORT=6443

helm install cilium cilium/cilium \
    --version 1.16.0 \
    --namespace kube-system \
    --set kubeProxyReplacement=true \
    --set k8sServiceHost=${API_SERVER_IP} \
    --set k8sServicePort=${API_SERVER_PORT} \
    --set ipam.mode=kubernetes \
    --set ipv4NativeRoutingCIDR=10.0.0.0/8 \
    --set hubble.relay.enabled=true \
    --set hubble.ui.enabled=true \
    --set prometheus.enabled=true \
    --set operator.prometheus.enabled=true
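Once the pods are up, confirm the agent has actually taken over service handling. The agent's in-pod debug CLI (named cilium-dbg since Cilium 1.15) reports the replacement mode:

kubectl -n kube-system exec ds/cilium -- cilium-dbg status | grep KubeProxyReplacement
# Expected: KubeProxyReplacement:   True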

1.3 Verifying the installation

# Install the cilium CLI
CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
curl -L --remote-name-all https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-amd64.tar.gz
tar xzvf cilium-linux-amd64.tar.gz
sudo mv cilium /usr/local/bin

# Verify status
cilium status --wait
# Expected output:
#     /¯¯\
#  /¯¯\__/¯¯\    Cilium:             OK
#  \__/¯¯\__/    Operator:           OK
#  /¯¯\__/¯¯\    Envoy DaemonSet:    OK
#  \__/¯¯\__/    Hubble Relay:       OK
#     \__/        ClusterMesh:        disabled

# Run the connectivity test
cilium connectivity test

2. Advanced Network Policy Configuration

2.1 Identity-based L7 policies

# cilium-policy.yaml
# An L7 policy in Cilium (not expressible with iptables)
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: api-policy
  namespace: production
spec:
  description: "Fine-grained traffic control for the API service"
  endpointSelector:
    matchLabels:
      app: api-service

  # Ingress rules
  ingress:
  # Allow GET requests from frontend pods to /api paths
  - fromEndpoints:
    - matchLabels:
        app: frontend
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
      rules:
        http:
        - method: "GET"
          path: "/api/.*"
        - method: "POST"
          path: "/api/v1/users"
          headers:
          - 'X-API-Version: v1'

  # Allow Prometheus metrics scraping
  - fromEndpoints:
    - matchLabels:
        app.kubernetes.io/name: prometheus
    toPorts:
    - ports:
      - port: "9090"
        protocol: TCP

  # All other ingress is denied (default deny)

  # Egress rules
  egress:
  # Allow access to the database
  - toEndpoints:
    - matchLabels:
        app: postgresql
    toPorts:
    - ports:
      - port: "5432"
        protocol: TCP

  # Allow access to Redis
  - toEndpoints:
    - matchLabels:
        app: redis
    toPorts:
    - ports:
      - port: "6379"
        protocol: TCP

  # Allow DNS resolution (standard deployments label CoreDNS pods k8s-app=kube-dns)
  - toEndpoints:
    - matchLabels:
        k8s:io.kubernetes.pod.namespace: kube-system
        k8s:k8s-app: kube-dns
    toPorts:
    - ports:
      - port: "53"
        protocol: UDP
      rules:
        dns:
        - matchPattern: "*.cluster.local"
        - matchPattern: "*.example.com"
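Applying and verifying the policy might look like the following; the grep assumes the endpoint listing includes the api-service pod name:

kubectl apply -f cilium-policy.yaml
kubectl -n production get ciliumnetworkpolicies
# Per-endpoint enforcement: the POLICY (ingress) ENFORCEMENT column should read "Enabled"
kubectl -n kube-system exec ds/cilium -- cilium-dbg endpoint list | grep api-service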

2.2 FQDN-based policies (controlling external access)

# Precisely control which external domains pods may reach (not possible with iptables!)
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: external-access-policy
spec:
  endpointSelector:
    matchLabels:
      role: backend

  egress:
  # toFQDNs relies on Cilium's DNS proxy: the agent learns IPs from DNS
  # responses, so the policy must first allow and inspect DNS lookups
  - toEndpoints:
    - matchLabels:
        k8s:io.kubernetes.pod.namespace: kube-system
        k8s:k8s-app: kube-dns
    toPorts:
    - ports:
      - port: "53"
        protocol: UDP
      rules:
        dns:
        - matchPattern: "*"

  # Only allow access to specific external domains
  - toFQDNs:
    - matchName: "api.openai.com"
    - matchName: "api.deepseek.com"
    - matchPattern: "*.amazonaws.com"
    toPorts:
    - ports:
      - port: "443"
        protocol: TCP

  # Allow access to internal company services
  - toFQDNs:
    - matchPattern: "*.internal.example.com"

  # All other external egress is denied
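A quick smoke test from inside a pod matching role=backend (the pod name below is hypothetical):

# Allowed FQDN: should return HTTP response headers
kubectl exec -it backend-pod -- curl -sI https://api.openai.com
# Any other domain: should be denied and hit the timeout
kubectl exec -it backend-pod -- curl -sI --max-time 5 https://example.org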

3. Hubble Observability

3.1 Traffic analysis with the Hubble CLI

# Install the Hubble CLI
HUBBLE_VERSION=$(curl -s "https://raw.githubusercontent.com/cilium/hubble/master/stable.txt")
curl -L --remote-name-all "https://github.com/cilium/hubble/releases/download/${HUBBLE_VERSION}/hubble-linux-amd64.tar.gz"
tar xzvf hubble-linux-amd64.tar.gz
sudo mv hubble /usr/local/bin

# Start a port-forward to the Hubble relay
cilium hubble port-forward &

# Observe traffic in real time
hubble observe --namespace production --follow

# Filter traffic for a specific service
hubble observe \
    --namespace production \
    --pod api-service \
    --type drop  # dropped flows only (useful when debugging network policies)

# Inspect L7 HTTP flow details; show only HTTP 500 responses
# (a trailing comment after a backslash would break the continuation, so it lives up here)
hubble observe \
    --namespace production \
    --protocol http \
    --http-status 500 \
    --follow

# Tally traffic sources hitting api-service
hubble observe \
    --namespace production \
    --to-pod api-service \
    --output json | \
    jq -r '.source.namespace + "/" + .source.pod_name' | \
    sort | uniq -c | sort -rn

3.2 Visualizing traffic with the Hubble UI

# Launch the Hubble UI
cilium hubble ui

# A browser opens automatically with the service topology map, showing:
# - Live traffic between services
# - Flows denied by network policy (highlighted in red)
# - HTTP/gRPC protocol details
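If the CLI helper is unavailable, a plain kubectl port-forward reaches the same UI; the standard chart exposes the hubble-ui Service on port 80:

kubectl -n kube-system port-forward svc/hubble-ui 12000:80
# Then open http://localhost:12000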

4. Performance Comparison

Measured results from our test environment (1,000 Services, 10,000 Pods):

Test tool: wrk2 (fixed request rate, latency measurement)
Scenario: pod-to-pod HTTP traffic at 10,000 QPS

iptables (kube-proxy):
  P50 latency: 1.2ms
  P99 latency: 4.8ms
  P999 latency: 12.3ms
  CPU overhead (kube-proxy process): 2-4 cores

Cilium (eBPF):
  P50 latency: 0.7ms   ↓ 42%
  P99 latency: 2.1ms   ↓ 56%
  P999 latency: 3.8ms  ↓ 69%
  CPU overhead: <0.5 cores  ↓ 87%
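For reference, a wrk2 invocation of the shape used here is sketched below; the URL, thread, and connection counts are illustrative rather than the exact test parameters:

# wrk2: -R pins the offered rate at 10,000 QPS, --latency prints full percentiles
wrk -t4 -c100 -d60s -R10000 --latency \
    http://api-service.production.svc.cluster.local:8080/api/health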

5. Common Issues and Tuning

# Inspect loaded eBPF state (the agent's debug CLI runs inside the cilium pod;
# it was renamed cilium-dbg in Cilium 1.15+)
kubectl -n kube-system exec ds/cilium -- cilium-dbg bpf lb list           # load-balancing entries
kubectl -n kube-system exec ds/cilium -- cilium-dbg bpf policy get --all  # all network policies

# Performance tuning: local redirect policy and direct node routes (fewer network hops)
helm upgrade cilium cilium/cilium \
    --namespace kube-system \
    --reuse-values \
    --set localRedirectPolicy=true \
    --set endpointRoutes.enabled=true \
    --set autoDirectNodeRoutes=true

# Monitor eBPF resource usage on a node
bpftool map show   # BPF map usage
bpftool prog show  # loaded eBPF programs
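Cilium also tracks BPF map pressure as an agent metric; a hedged way to pull it (metric naming can differ across versions):

# List agent metrics and filter for map pressure
kubectl -n kube-system exec ds/cilium -- cilium-dbg metrics list | grep bpf_map_pressure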

Cilium's eBPF architecture points to where Kubernetes networking is headed. For teams that need high performance, L7 network policies, or deep observability, migrating to Cilium is one of the most worthwhile operations investments of 2026.