veles: align app ports and traffic gate

jenkins 2026-06-09 12:54:34 -03:00
parent 6833c3fe61
commit 083e9e1148
3 changed files with 16 additions and 5 deletions

@@ -1,6 +1,6 @@
 # Veles Infrastructure Contract
-This stack is staged for Flux and intentionally starts the app deployments at `replicas: 0` until images and the app-side runtime contract are ready.
+This stack is staged for Flux and intentionally starts the app deployments at `replicas: 0` until images, native OIDC/session support, and smoke gates are ready.
 ## Cluster Contract
@@ -16,11 +16,13 @@ This stack is staged for Flux and intentionally starts the app deployments at `r
 - `registry.bstein.dev/veles/veles-backend`
 - `registry.bstein.dev/veles/veles-frontend`
 - `registry.bstein.dev/veles/veles-sim-worker`
-- Backend/frontend deployments are placeholders and remain scaled to `0` until final image layout, container ports, and health endpoints are confirmed. Services route to a named `http` target port so the numeric container port can change without changing Ingress.
+- Backend `http` container port: `8796`
+- Frontend `http` container port: `8080`
+- Backend/frontend deployments remain scaled to `0` until native OIDC/session support, image tags, and smoke gates are ready. Services route to a named `http` target port so Ingress does not depend on numeric container ports.
 ## Auth Contract
-Veles owns authorization in the app. The `veles` Ingress does not use oauth2-proxy or Traefik forward-auth, so no ingress/auth layer should strip OIDC token claims. The app should validate tokens from `https://sso.bstein.dev/realms/veles` and expect stable `sub`, `email`, `preferred_username`, `groups`, and `realm_access.roles` claims.
+Veles owns authorization in the app. The `veles` Ingress does not use oauth2-proxy or Traefik forward-auth, so no ingress/auth layer should strip OIDC token claims. The app should validate tokens from `https://sso.bstein.dev/realms/veles` and expect stable `sub`, `email`, `preferred_username`, `groups`, and `realm_access.roles` claims. Do not scale Veles for real user traffic until native OIDC login/session flow is implemented and smoke-tested.
 The Keycloak realm setup creates both groups and realm roles named `alpha` and `admin`. Members of the `alpha` group receive the `alpha` realm role; members of `admin` receive both `alpha` and `admin`. Built-in/meta strategies can stay universal, while runs and user-created strategies should remain user-scoped in the Veles database.
@@ -46,6 +48,8 @@ Backend runtime secrets are synced from Vault by `veles-vault` into the generate
 `veles-artifacts` is an RWO Longhorn PVC mounted into backend pods at `/data/veles-artifacts`. Backend pods own artifact writes and serving. Simulation Jobs should not mount or write directly to this PVC unless they are explicitly scheduled on Oceanus with the Veles toleration and the app has chosen a same-node direct-write model. Queue-mediated upload/copy through the backend remains the safer default until the app contract settles.
+Backend, simulation workers, and retention/cleanup workers must run on Oceanus/titan-23 when they need artifact access. Frontend pods must not mount `veles-artifacts`.
 ## Simulation Jobs
 The backend service account can create, watch, and delete Jobs only inside the `veles` namespace. Simulation pods should use service account `veles-sim`, set `automountServiceAccountToken: false`, and use:
@@ -61,6 +65,8 @@ tolerations:
 effect: NoSchedule
 ```
+Retention/cleanup Jobs that touch artifacts should use the same node selector and toleration. If they do not need Kubernetes API access, use `veles-sim`; otherwise keep control-plane actions in the backend/controller and run artifact cleanup through a no-token worker.
 ## Staged Operator Steps
 1. Join `titan-23`/Oceanus to Atlas as a worker.
@@ -70,7 +76,7 @@ tolerations:
 5. Let Vault policy reconciliation run, then unsuspend `veles-secrets-ensure-2`.
 6. Unsuspend `veles-realm-ensure-4` in `services/keycloak` to create the realm/client secret, groups, and roles.
 7. Create the Harbor `veles` project or robot access before image automation is enabled in production.
-8. Scale `veles-postgres`, then backend/frontend once app images exist.
+8. Keep backend/frontend scaled to `0` until native OIDC/session support is implemented, image tags exist, and smoke gates pass.
 ## Assumptions

@@ -36,7 +36,7 @@ spec:
 imagePullPolicy: IfNotPresent
 ports:
   - name: http
-    containerPort: 8080
+    containerPort: 8796
     protocol: TCP
 envFrom:
   - configMapRef:
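
A Service that targets the port by name rather than by number would be unaffected by the `8080` to `8796` change above, which is the property the README relies on. A minimal sketch, assuming a Service named `veles-backend`, a pod label `app: veles-backend`, and a Service port of `80` (none of these appear in this diff):

```yaml
# Hypothetical Service: the name, selector label, and port 80 are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: veles-backend
  namespace: veles
spec:
  selector:
    app: veles-backend
  ports:
    - name: http
      port: 80
      protocol: TCP
      # Resolves to whichever container port is named "http"
      # (8796 for the backend after this commit).
      targetPort: http
```

Because `targetPort` is a name, an Ingress backend that references the Service port needs no edit when the container port number moves.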

@@ -7,6 +7,8 @@ metadata:
 data:
   VELES_ENV: alpha
   VELES_PUBLIC_BASE_URL: https://veles.bstein.dev
+  VELES_BACKEND_HTTP_PORT: "8796"
+  VELES_FRONTEND_HTTP_PORT: "8080"
   VELES_OIDC_ISSUER: https://sso.bstein.dev/realms/veles
   VELES_OIDC_CLIENT_ID: veles-web
   VELES_OIDC_REQUIRED_GROUPS: alpha,admin
@@ -23,4 +25,7 @@ data:
   VELES_SIM_NODE_SELECTOR: veles.bstein.dev/simulation=true
   VELES_SIM_TOLERATION_KEY: veles.bstein.dev/simulation
   VELES_SIM_TOLERATION_VALUE: "true"
+  VELES_RETENTION_NODE_SELECTOR: veles.bstein.dev/simulation=true
+  VELES_RETENTION_TOLERATION_KEY: veles.bstein.dev/simulation
+  VELES_RETENTION_TOLERATION_VALUE: "true"
   VELES_LOG_RETENTION_DAYS: "30"
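
The `VELES_RETENTION_*` values mirror the simulation scheduling constraints, so a retention Job built from them might look roughly like the sketch below. It assumes a same-node direct-mount model for `veles-artifacts`. The Job name, image tag, command, and the toleration `operator` are hypothetical; only the service account, node selector, toleration key/value/effect, PVC name, and mount path come from the README and ConfigMap.

```yaml
# Hypothetical retention Job: the name, image tag, command, and the
# toleration operator are assumptions, not part of this commit.
apiVersion: batch/v1
kind: Job
metadata:
  name: veles-retention-example
  namespace: veles
spec:
  template:
    spec:
      serviceAccountName: veles-sim
      automountServiceAccountToken: false   # no-token worker, no Kubernetes API access
      restartPolicy: Never
      nodeSelector:
        veles.bstein.dev/simulation: "true"
      tolerations:
        - key: veles.bstein.dev/simulation
          operator: Equal
          value: "true"
          effect: NoSchedule
      containers:
        - name: retention
          image: registry.bstein.dev/veles/veles-sim-worker:example-tag
          command: ["veles-retention"]      # hypothetical entrypoint
          volumeMounts:
            - name: artifacts
              mountPath: /data/veles-artifacts
      volumes:
        - name: artifacts
          persistentVolumeClaim:
            claimName: veles-artifacts
```

Since the PVC is RWO, the node selector and toleration are what keep this pod on the same node as the backend. Without them the mount would likely fail to attach.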