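GitOps at Scale: Architecture for Managing 500+ Applications with ArgoCD ApplicationSet + Flux
Why hybrid GitOps?
Lessons from managing 500+ applications: with ArgoCD alone, CPU is pegged during highly concurrent syncs; with Flux alone, visualization is weak. The final design: ArgoCD manages core business applications (which need UI-based approval), and Flux manages infrastructure (automated changes).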
1. ArgoCD Performance Tuning
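# argocd-cmd-params-cm ConfigMap (restart argocd-application-controller after changing it)
data:
  controller.status.processors: "50"            # default is 20; raise it for large clusters
  controller.operation.processors: "25"         # default is 10
  controller.default.cache.expiration: "24h"    # 24h is also the default value
  controller.kubectl.parallelism.limit: "20"    # cap concurrent kubectl invocations to protect the API server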
2. Bulk Management with ApplicationSet
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: microservices-prod
  namespace: argocd
spec:
  generators:
    - git:
        repoURL: https://github.com/myorg/platform-config
        revision: HEAD
        directories:
          - path: "apps/production/*"   # one Application is created per matching directory
  template:
    metadata:
      name: '{{path.basename}}'
    spec:
      project: production
      source:
        repoURL: https://github.com/myorg/platform-config
        path: '{{path}}'
      destination:
        server: https://kubernetes.default.svc
        namespace: '{{path.basename}}'
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - ApplyOutOfSyncOnly=true   # sync only the resources that are out of sync
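To make the directory generator's behavior concrete, here is a minimal Python sketch of how a pattern such as `apps/production/*` could be expanded into template parameters. It is a simulation for illustration only, not ArgoCD's implementation: the function names `expand_directories` and `render` and the directory list are hypothetical, and it assumes `*` matches a single path level.

```python
from fnmatch import fnmatchcase
from posixpath import basename


def expand_directories(repo_dirs, pattern):
    """Return one parameter set per directory that matches the pattern."""
    params = []
    for d in sorted(repo_dirs):
        # fnmatch lets "*" cross "/" boundaries, so also require equal depth
        # to mimic a single-level wildcard (an assumption of this sketch).
        same_depth = d.count("/") == pattern.count("/")
        if same_depth and fnmatchcase(d, pattern):
            params.append({"path": d, "path.basename": basename(d)})
    return params


def render(template, params):
    """Substitute {{key}} placeholders with the generated parameters."""
    for key, value in params.items():
        template = template.replace("{{" + key + "}}", value)
    return template


# Hypothetical repository layout, not taken from a real repo.
REPO_DIRS = [
    "apps/production/cart",
    "apps/production/checkout",
    "apps/production/cart/templates",  # too deep, not matched
    "apps/staging/cart",               # different environment, not matched
]

for p in expand_directories(REPO_DIRS, "apps/production/*"):
    print(render("Application {{path.basename}} <- {{path}}", p))
```

Each matching directory yields one parameter set, which is why adding a new directory under `apps/production/` is enough to get a new Application without touching the ApplicationSet.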
3. Managing Infrastructure with Flux
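# Flux HelmRelease: automatically upgrade within a version range
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: prometheus-stack
spec:
  interval: 30m
  chart:
    spec:
      chart: kube-prometheus-stack
      version: ">=58.0.0 <60.0.0"   # picks the newest 58.x or 59.x chart; use "58.x" to stay within one major version
      sourceRef:                    # required by the API; the HelmRepository name below is an assumption
        kind: HelmRepository
        name: prometheus-community
  upgrade:
    remediation:
      retries: 3
      remediateLastFailure: true    # roll back even when the last retry fails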
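The range `>=58.0.0 <60.0.0` is easy to misread, so here is a rough Python sketch of how such a constraint selects a chart version. It is a simplified model (plain `MAJOR.MINOR.PATCH` only, no pre-release handling), not the semver library Flux uses, and the version list is made up for illustration.

```python
import operator

# Two-character operators come first so ">=" is not parsed as ">".
_OPS = [
    (">=", operator.ge),
    ("<=", operator.le),
    (">", operator.gt),
    ("<", operator.lt),
    ("=", operator.eq),
]


def parse(version):
    """Turn "58.2.1" into (58, 2, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version, constraint):
    """Check a version against space-separated clauses, all of which must hold."""
    for clause in constraint.split():
        for symbol, compare in _OPS:
            if clause.startswith(symbol):
                if not compare(parse(version), parse(clause[len(symbol):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause}")
    return True


def latest_matching(versions, constraint):
    """Return the highest version that satisfies the constraint, or None."""
    candidates = [v for v in versions if satisfies(v, constraint)]
    return max(candidates, key=parse) if candidates else None


# Hypothetical chart versions available in the repository.
AVAILABLE = ["57.2.0", "58.7.2", "59.1.0", "60.0.1"]
print(latest_matching(AVAILABLE, ">=58.0.0 <60.0.0"))
```

With this list the sketch selects `59.1.0`, which shows that the range permits a jump from 58.x to 59.x, that is, a major-version upgrade of the chart.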
4. OIDC Multi-Tenant Permissions
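# argocd-rbac-cm
data:
  policy.csv: |
    p, role:dev-team, applications, get, dev/*, allow
    p, role:dev-team, applications, sync, dev/*, allow
    p, role:dev-team, applications, get, production/*, allow
    p, role:sre-team, applications, *, */*, allow
    g, myorg:dev-team, role:dev-team
    g, myorg:sre-team, role:sre-team
  policy.default: role:readonly
  scopes: '[groups]'   # OIDC claim used to resolve group membership
# argocd-cm (the OIDC settings live here, not in argocd-rbac-cm)
data:
  oidc.config: |
    name: Okta
    issuer: https://myorg.okta.com/oauth2/default
    clientID: <client-id>                   # placeholder; required
    clientSecret: $oidc.okta.clientSecret   # resolved from argocd-secret
    requestedScopes: ["openid", "profile", "email", "groups"]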
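The effect of the policy above can be checked with a small Python sketch. This is a simplified model of the matching rules (glob matching on resource, action, and object; allow rules only), not ArgoCD's actual Casbin-based evaluator. It ignores `policy.default` and deny rules, and the function names are hypothetical.

```python
from fnmatch import fnmatchcase

POLICY = """
p, role:dev-team, applications, get, dev/*, allow
p, role:dev-team, applications, sync, dev/*, allow
p, role:dev-team, applications, get, production/*, allow
p, role:sre-team, applications, *, */*, allow
g, myorg:dev-team, role:dev-team
g, myorg:sre-team, role:sre-team
"""


def parse_policy(text):
    """Split policy lines into permission tuples and group-to-role bindings."""
    perms, groups = [], {}
    for line in text.strip().splitlines():
        fields = [f.strip() for f in line.split(",")]
        if fields[0] == "p":
            perms.append(tuple(fields[1:6]))  # role, resource, action, object, effect
        elif fields[0] == "g":
            groups.setdefault(fields[1], set()).add(fields[2])
    return perms, groups


def allowed(user_groups, resource, action, obj, perms, groups):
    """Return True if any role bound to the user's groups has a matching allow rule."""
    roles = set()
    for group in user_groups:
        roles |= groups.get(group, set())
    for role, res, act, pattern, effect in perms:
        if (role in roles and effect == "allow"
                and fnmatchcase(resource, res)
                and fnmatchcase(action, act)
                and fnmatchcase(obj, pattern)):
            return True
    return False


PERMS, GROUPS = parse_policy(POLICY)
# A developer may sync in the dev project but only read in production.
print(allowed(["myorg:dev-team"], "applications", "sync", "dev/cart", PERMS, GROUPS))
print(allowed(["myorg:dev-team"], "applications", "sync", "production/cart", PERMS, GROUPS))
```

The object field has the form `<project>/<application>`, so `dev/*` covers every application in the `dev` project. The application name `cart` is only an example.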
5. Disaster Recovery Drills
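# Scenario 1: recover a crashed ArgoCD control plane (RTO < 10 minutes)
kubectl create namespace argocd   # only needed if the namespace itself was lost
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/v2.12.0/manifests/install.yaml
kubectl apply -f argocd-rbac-cm-backup.yaml
# Once the ApplicationSet manifests are re-applied from Git, the ApplicationSet controller regenerates every Application; none has to be restored by hand
# Scenario 2: a Namespace is deleted by mistake (where GitOps pays off)
argocd app sync myapp-production
# ArgoCD recreates all resources from Git; the Namespace itself comes back only if its manifest is in Git or the app sets CreateNamespace=true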
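In our experience, the ArgoCD + Flux hybrid has been the most workable way to manage 500+ applications. The key is a clear division of responsibilities, so that the two tools never manage the same resource.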
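One way to enforce that rule is a CI check that compares the resources each tool would apply. The Python sketch below assumes you can already produce an inventory of `(kind, namespace, name)` tuples per tool, for example by rendering the manifests. The inventories shown are hypothetical, and `find_conflicts` is an illustrative helper, not part of either tool.

```python
def find_conflicts(argocd_resources, flux_resources):
    """Return the resources that appear in both inventories, sorted."""
    return sorted(set(argocd_resources) & set(flux_resources))


# Hypothetical inventories: (kind, namespace, name).
ARGOCD = [
    ("Deployment", "cart", "cart-api"),
    ("Namespace", "", "monitoring"),
]
FLUX = [
    ("HelmRelease", "monitoring", "prometheus-stack"),
    ("Namespace", "", "monitoring"),
]

conflicts = find_conflicts(ARGOCD, FLUX)
for kind, namespace, name in conflicts:
    print(f"managed by both tools: {kind} {namespace or '(cluster)'}/{name}")
```

In this made-up example the `monitoring` Namespace is claimed by both tools, which is the kind of overlap that leads to the two controllers repeatedly overwriting each other.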