Veles Infrastructure Contract
This stack is staged for Flux and intentionally starts the app deployments at replicas: 0 until images and the app-side runtime contract are ready.
Cluster Contract
- Namespace: veles; no alternate alpha namespace is used.
- Hostname: https://veles.bstein.dev
- Backend service: veles-backend.veles.svc.cluster.local:80
- Frontend service: veles-frontend.veles.svc.cluster.local:80
- Postgres service: veles-postgres.veles.svc.cluster.local:5432
- Artifact PVC: veles-artifacts, mounted at /data/veles-artifacts
- Storage classes: veles-oceanus-db, veles-oceanus-artifacts
- Images:
  - registry.bstein.dev/veles/veles-backend
  - registry.bstein.dev/veles/veles-frontend
  - registry.bstein.dev/veles/veles-sim-worker
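The service addresses in the contract above follow the standard Kubernetes in-cluster DNS pattern, service.namespace.svc.cluster.local. A minimal sketch of deriving them, assuming a hypothetical helper named service_address that is not part of the Veles codebase:

```python
# Hypothetical helper for illustration only; the name service_address is an
# assumption, not something the Veles repo is known to provide.
def service_address(service: str, namespace: str = "veles", port: int = 80) -> str:
    """Return the in-cluster DNS address of a Service as host:port."""
    return f"{service}.{namespace}.svc.cluster.local:{port}"


# The three services named in the cluster contract.
BACKEND = service_address("veles-backend")
FRONTEND = service_address("veles-frontend")
POSTGRES = service_address("veles-postgres", port=5432)

if __name__ == "__main__":
    for address in (BACKEND, FRONTEND, POSTGRES):
        print(address)
```

Keeping the namespace as a single default makes it harder for a client to drift to another namespace by accident, which matches the rule that no alternate alpha namespace is used.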
Runtime Env
Veles should consume:
- VELES_PUBLIC_BASE_URL=https://veles.bstein.dev
- VELES_OIDC_ISSUER=https://sso.bstein.dev/realms/veles
- VELES_OIDC_CLIENT_ID=veles-web
- VELES_OIDC_REQUIRED_GROUPS=alpha,admin
- DATABASE_URL from kv/data/atlas/veles/veles-db
- VELES_SESSION_SECRET from kv/data/atlas/veles/app-secrets
- VELES_BYOK_ENCRYPTION_KEY from kv/data/atlas/veles/app-secrets
User OpenAI API keys must stay in the Veles database encrypted with VELES_BYOK_ENCRYPTION_KEY; do not store per-user BYOK secrets in Vault.
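On the app side, the runtime variables listed above could be read and validated once at startup. The following is a sketch only: the VelesSettings class and load_settings function are hypothetical names, and treating every variable as mandatory is an assumption rather than something this contract states. The sketch splits VELES_OIDC_REQUIRED_GROUPS on commas and does not decide whether a user needs one group or all of them.

```python
import os
from dataclasses import dataclass

# Variable names come from the runtime contract; treating all of them as
# required at startup is an assumption of this sketch.
REQUIRED_VARS = (
    "VELES_PUBLIC_BASE_URL",
    "VELES_OIDC_ISSUER",
    "VELES_OIDC_CLIENT_ID",
    "VELES_OIDC_REQUIRED_GROUPS",
    "DATABASE_URL",
    "VELES_SESSION_SECRET",
    "VELES_BYOK_ENCRYPTION_KEY",
)


@dataclass(frozen=True)
class VelesSettings:  # hypothetical container, not the app's real config type
    public_base_url: str
    oidc_issuer: str
    oidc_client_id: str
    oidc_required_groups: tuple
    database_url: str
    session_secret: str
    byok_encryption_key: str


def load_settings(env=None) -> VelesSettings:
    """Read the contract variables from env (defaults to os.environ)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    # Split the comma-separated group list and drop empty entries.
    groups = tuple(
        part.strip()
        for part in env["VELES_OIDC_REQUIRED_GROUPS"].split(",")
        if part.strip()
    )
    return VelesSettings(
        public_base_url=env["VELES_PUBLIC_BASE_URL"],
        oidc_issuer=env["VELES_OIDC_ISSUER"],
        oidc_client_id=env["VELES_OIDC_CLIENT_ID"],
        oidc_required_groups=groups,
        database_url=env["DATABASE_URL"],
        session_secret=env["VELES_SESSION_SECRET"],
        byok_encryption_key=env["VELES_BYOK_ENCRYPTION_KEY"],
    )
```

Failing fast on a missing variable surfaces a broken secret sync at pod start instead of at the first login or database query.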
Simulation Jobs
The backend service account can create, watch, and delete Jobs only inside the veles namespace. Simulation pods should use service account veles-sim, set automountServiceAccountToken: false, and use:
priorityClassName: veles-sim
nodeSelector:
  veles.bstein.dev/simulation: "true"
tolerations:
  - key: veles.bstein.dev/simulation
    operator: Equal
    value: "true"
    effect: NoSchedule
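For illustration, the backend could assemble a Job manifest that carries the scheduling fields above before submitting it to the Kubernetes API. This is a sketch under stated assumptions: build_simulation_job, the container name sim-worker, and restartPolicy: Never are not taken from this contract, and the actual submission call is left out.

```python
import json

SIM_KEY = "veles.bstein.dev/simulation"


def build_simulation_job(name: str, image: str, namespace: str = "veles") -> dict:
    """Return a batch/v1 Job manifest with the contract's pod settings."""
    pod_spec = {
        # Fields below come from the simulation job contract.
        "serviceAccountName": "veles-sim",
        "automountServiceAccountToken": False,
        "priorityClassName": "veles-sim",
        "nodeSelector": {SIM_KEY: "true"},
        "tolerations": [
            {
                "key": SIM_KEY,
                "operator": "Equal",
                "value": "true",
                "effect": "NoSchedule",
            }
        ],
        # Assumptions: Jobs need Never or OnFailure; the container name is
        # hypothetical.
        "restartPolicy": "Never",
        "containers": [{"name": "sim-worker", "image": image}],
    }
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"template": {"spec": pod_spec}},
    }


if __name__ == "__main__":
    job = build_simulation_job(
        "sim-example", "registry.bstein.dev/veles/veles-sim-worker"
    )
    print(json.dumps(job, indent=2))
```

Building the manifest in one place keeps the node selector and toleration paired, so a simulation pod cannot tolerate the taint without also being pinned to the labeled nodes.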
Staged Operator Steps
- Join titan-23/Oceanus to Atlas as a worker.
- Use Metis with titan-23 in METIS_FLASH_HOSTS; the existing node secret placeholder uses 192.168.22.23.
- Confirm the node normalizer applies the Veles labels and taint.
- Add Oceanus Longhorn disks at paths tagged by the Longhorn tag ensure job.
- Let Vault policy reconciliation run, then unsuspend veles-secrets-ensure-2.
- Unsuspend veles-realm-ensure-2 in services/keycloak to create the realm/client secret.
- Create the Harbor veles project or robot access before image automation is enabled in production.
- Scale veles-postgres, then backend/frontend once app images exist.
Assumptions
- veles-oceanus-artifacts is RWO for alpha; simulation workers should either run on Oceanus with the backend or stream logs to the backend, which owns writes.
- Postgres uses Longhorn backup recurring jobs off Oceanus. This is not a substitute for a tested restore drill.
- The Jenkins job skeleton points at the Veles repo but stays disabled until that repo provides a Jenkinsfile.