GitOps at Scale: An Architecture for Managing 500+ Applications with ArgoCD ApplicationSet and Flux

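Why a hybrid GitOps setup?

What we learned from managing 500+ applications: a single ArgoCD instance pegged its CPU when many applications synced concurrently, while Flux on its own offered little visualization. The setup we settled on uses ArgoCD for core business services (which need approval through the UI) and Flux for infrastructure (where changes are fully automated).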


1. ArgoCD performance tuning

# argocd-cmd-params-cm ConfigMap
data:
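  controller.status.processors: "50"     # default is 20; raise it for large clusters
  controller.operation.processors: "25"  # default is 10
  controller.default.cache.expiration: "24h0m0s"  # cache TTL; this is the key name used in argocd-cmd-params-cm
  controller.kubectl.parallelism.limit: "20"  # caps concurrent kubectl processes so the API server is not saturated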

2. Managing applications in bulk with ApplicationSet

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: microservices-prod
  namespace: argocd
spec:
  generators:
  - git:
      repoURL: https://github.com/myorg/platform-config
      revision: HEAD
      directories:
      - path: "apps/production/*"  # one Application is generated per matching directory
  template:
    metadata:
      name: '{{path.basename}}'
    spec:
      project: production
      source:
        repoURL: https://github.com/myorg/platform-config
        path: '{{path}}'
      destination:
        server: https://kubernetes.default.svc
        namespace: '{{path.basename}}'
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
        - ApplyOutOfSyncOnly=true  # sync only the resources that are out of sync
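
To make the directory generator's behaviour concrete, here is a minimal Python sketch of how a pattern such as apps/production/* could expand into the path and path.basename template parameters. It is a simplified model, not ArgoCD's implementation: it assumes that * matches exactly one path segment, and the directory names are made up for illustration.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def expand_directories(repo_dirs, pattern):
    """Return one parameter dict per directory that matches the generator pattern."""
    pattern_parts = PurePosixPath(pattern).parts
    params = []
    for directory in sorted(repo_dirs):
        dir_parts = PurePosixPath(directory).parts
        # Compare segment by segment so that '*' never crosses a '/' boundary.
        if len(dir_parts) != len(pattern_parts):
            continue
        if all(fnmatchcase(d, p) for d, p in zip(dir_parts, pattern_parts)):
            params.append({"path": directory, "path.basename": dir_parts[-1]})
    return params


def render(template, params):
    """Substitute {{key}} placeholders in a template string."""
    rendered = template
    for key, value in params.items():
        rendered = rendered.replace("{{" + key + "}}", value)
    return rendered


if __name__ == "__main__":
    # Hypothetical repository layout.
    dirs = [
        "apps/production/cart",
        "apps/production/checkout",
        "apps/production/cart/overlays",
        "apps/staging/cart",
    ]
    for p in expand_directories(dirs, "apps/production/*"):
        print(render("name={{path.basename}} source.path={{path}}", p))
```

In this model only cart and checkout produce an Application; the nested overlays directory and the staging tree are ignored. Because the Application name and the destination namespace both come from path.basename, directory names under apps/production must be unique and valid as Kubernetes names.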

3. Managing infrastructure with Flux

# Flux HelmRelease: automatic upgrades within a version range
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: prometheus-stack
spec:
  interval: 30m
  chart:
    spec:
      chart: kube-prometheus-stack
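      version: ">=58.0.0 <60.0.0"  # accepts any 58.x or 59.x chart, so the 58 -> 59 major bump is also applied automatically
      sourceRef:                   # required field; the HelmRepository name below is an assumption
        kind: HelmRepository
        name: prometheus-community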
  upgrade:
    remediation:
      retries: 3
      remediateLastFailure: true  # roll back after the final failed retry as well
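
The range >=58.0.0 <60.0.0 is wider than "minor upgrades only", because it spans two major versions. The Python sketch below shows which versions such a range admits and how it compares with a range pinned to one major version. It is a deliberately small model that handles only plain comparators on MAJOR.MINOR.PATCH versions; it is not the semver library Flux uses, and the candidate versions are hypothetical.

```python
import operator
import re

# Comparators supported by this sketch; real semver libraries accept more syntax.
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
    "=": operator.eq,
}
_COMPARATOR = re.compile(r"(>=|<=|>|<|=)(\d+)\.(\d+)\.(\d+)")


def parse(version):
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of ints that compares correctly."""
    parts = version.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"unsupported version: {version!r}")
    return tuple(int(p) for p in parts)


def satisfies(version, constraint):
    """True if the version meets every space-separated comparator (logical AND)."""
    candidate = parse(version)
    for comparator in constraint.split():
        match = _COMPARATOR.fullmatch(comparator)
        if match is None:
            raise ValueError(f"unsupported comparator: {comparator!r}")
        op = _OPS[match.group(1)]
        bound = tuple(int(g) for g in match.groups()[1:])
        if not op(candidate, bound):
            return False
    return True


if __name__ == "__main__":
    candidates = ["57.2.0", "58.0.0", "58.7.2", "59.1.0", "60.0.0"]  # hypothetical
    for constraint in (">=58.0.0 <60.0.0", ">=58.0.0 <59.0.0"):
        accepted = [v for v in candidates if satisfies(v, constraint)]
        print(constraint, "->", accepted)
```

If the intent is to stay within one major version, a range such as >=58.0.0 <59.0.0 is the narrower choice; major upgrades of a chart like kube-prometheus-stack can then go through a reviewed change.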

4. Multi-tenant access control with OIDC

# argocd-rbac-cm
data:
  policy.csv: |
    p, role:dev-team, applications, get, dev/*, allow
    p, role:dev-team, applications, sync, dev/*, allow
    p, role:dev-team, applications, get, production/*, allow
    p, role:sre-team, applications, *, */*, allow
    g, myorg:dev-team, role:dev-team
    g, myorg:sre-team, role:sre-team
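  policy.default: role:readonly
  scopes: '[groups]'  # OIDC claim whose values are matched against the g lines above

# argocd-cm (oidc.config is read from argocd-cm, not from argocd-rbac-cm)
data:
  oidc.config: |
    name: Okta
    issuer: https://myorg.okta.com/oauth2/default
    clientID: <client-id>                  # placeholder, not a real value
    clientSecret: $oidc.okta.clientSecret  # references a key stored in argocd-secret
    requestedScopes: ["openid", "profile", "email", "groups"]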
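
To check what the policy above actually grants, the following Python sketch evaluates the same p and g lines against a few requests. It is a simplified model that assumes glob matching on the action and object fields; it ignores deny rules and policy.default, and it is not ArgoCD's own policy engine.

```python
from fnmatch import fnmatchcase

POLICY = """
p, role:dev-team, applications, get, dev/*, allow
p, role:dev-team, applications, sync, dev/*, allow
p, role:dev-team, applications, get, production/*, allow
p, role:sre-team, applications, *, */*, allow
g, myorg:dev-team, role:dev-team
g, myorg:sre-team, role:sre-team
"""


def load(policy_text):
    """Split policy text into permission rules and group-to-role bindings."""
    rules, bindings = [], {}
    for line in policy_text.strip().splitlines():
        fields = [field.strip() for field in line.split(",")]
        if fields[0] == "p":
            rules.append(tuple(fields[1:]))  # (role, resource, action, object, effect)
        elif fields[0] == "g":
            bindings.setdefault(fields[1], set()).add(fields[2])
    return rules, bindings


def allowed(groups, resource, action, obj, rules, bindings):
    """True if any role bound to the user's groups has a matching allow rule."""
    roles = set()
    for group in groups:
        roles |= bindings.get(group, set())
    for role, rule_resource, rule_action, rule_object, effect in rules:
        if (
            role in roles
            and rule_resource == resource
            and fnmatchcase(action, rule_action)
            and fnmatchcase(obj, rule_object)
            and effect == "allow"
        ):
            return True
    return False


if __name__ == "__main__":
    rules, bindings = load(POLICY)
    dev = ["myorg:dev-team"]
    sre = ["myorg:sre-team"]
    print(allowed(dev, "applications", "sync", "dev/cart", rules, bindings))
    print(allowed(dev, "applications", "sync", "production/cart", rules, bindings))
    print(allowed(sre, "applications", "delete", "production/cart", rules, bindings))
```

In this model the dev team can sync applications in the dev project but can only read those in production, while the SRE team can do anything. Since policy.default is role:readonly, every authenticated user can already read applications, so the explicit get rules for the dev team mainly serve as documentation.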

5. Disaster recovery drills

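# Scenario 1: recovering a crashed ArgoCD control plane (target RTO < 10 minutes)
kubectl create namespace argocd --dry-run=client -o yaml | kubectl apply -f -  # ensures the namespace exists
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/v2.12.0/manifests/install.yaml
kubectl apply -f argocd-rbac-cm-backup.yaml
# Re-apply the other backed-up objects as well: argocd-cm, repository credentials, AppProjects, and the ApplicationSets.
# Once the ApplicationSets are back, the controller regenerates every Application; none has to be restored by hand.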

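# Scenario 2: a Namespace is deleted by mistake (where GitOps pays off)
argocd app sync myapp-production  # a plain sync is enough; --force (delete and recreate) is not needed here
# ArgoCD recreates every resource tracked in Git. The Namespace itself comes back only if its manifest is in Git
# or the Application sets the CreateNamespace=true sync option. Data that is not in Git (for example PVC contents) is not restored.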

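In our experience, the ArgoCD + Flux hybrid has worked well for managing 500+ applications. The key is a clear division of responsibility, so that the two tools never manage the same resource.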
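
One way to enforce that division is a CI check that fails when a Git path claimed by ArgoCD overlaps with one claimed by Flux. The Python sketch below is a rough proxy under stated assumptions: it compares repository directories rather than live cluster resources, and the directory names are hypothetical.

```python
from pathlib import PurePosixPath


def find_overlaps(argocd_paths, flux_paths):
    """Return (argocd_path, flux_path) pairs where one path equals or contains the other."""
    conflicts = []
    for argocd_path in argocd_paths:
        for flux_path in flux_paths:
            a = PurePosixPath(argocd_path)
            f = PurePosixPath(flux_path)
            # A conflict exists if the paths are identical or one is an ancestor of the other.
            if a == f or a in f.parents or f in a.parents:
                conflicts.append((argocd_path, flux_path))
    return conflicts


if __name__ == "__main__":
    # Hypothetical ownership lists; in practice these could be collected from
    # ApplicationSet generators and Flux Kustomization paths.
    argocd_paths = ["apps/production"]
    flux_paths = ["infrastructure", "apps/production/monitoring"]

    conflicts = find_overlaps(argocd_paths, flux_paths)
    for argocd_path, flux_path in conflicts:
        print(f"overlap: ArgoCD owns {argocd_path}, Flux owns {flux_path}")
    raise SystemExit(1 if conflicts else 0)
```

A path check like this catches only the most obvious mistakes. Two tools can still collide if manifests in separate directories define the same Kubernetes object, so a complementary check on rendered resource names and namespaces may be worth adding.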