1815 Commits

SHA1 Message Date
628e204fc5 jenkins: fix pipeline portability and jellyfin option compatibility 2026-04-10 05:41:00 -03:00
bebe91d39b jenkins: add scm polling trigger for jellyfin oidc pipeline 2026-04-10 05:31:41 -03:00
d65b8e7a32 logging: fix groovy-safe awk matcher in data-prepper metrics 2026-04-10 05:23:35 -03:00
62aae6ffb2 jenkins: wire full quality-gate metrics across platform jobs 2026-04-10 05:19:25 -03:00
32b6e55467 monitoring: use CI-only series for platform test success panels 2026-04-10 04:52:57 -03:00
99eda351df monitoring/jenkins: add pegasus CI job and separate health probe suite 2026-04-10 03:26:51 -03:00
5f4641553c monitoring: replace failure table with 24h suite pass snapshot 2026-04-09 20:16:44 -03:00
530f440679 monitoring: add suite probe metrics and align fan labels 2026-04-09 20:10:52 -03:00
5e3aadc640 monitoring: set overview platform test panel to 7d 2026-04-09 20:05:10 -03:00
12b85f4597 monitoring: add platform quality push gateway for test metrics 2026-04-09 19:30:16 -03:00
ad1cbd6f85 monitoring: make test panel point-based and failure-by-suite 2026-04-09 19:27:48 -03:00
5cf9a16d97 monitoring: align overview panels with jobs and point-based suite rates 2026-04-09 16:35:14 -03:00
f8c1243dfd monitoring: add generic suite metric slots for platform tests 2026-04-09 16:16:35 -03:00
7b0e9acbb1 monitoring: make suite pass rate 30d rolling for sparse tests 2026-04-09 16:14:26 -03:00
0273727cb4 monitoring: make platform test success one line per suite 2026-04-09 15:21:59 -03:00
09fa3e716c monitoring/atlas: merge top rows and fix platform test pass-rate panel 2026-04-09 14:56:43 -03:00
293cd83999 monitoring/atlas: resize test/ops rows and source overview tests from atlas-jobs 2026-04-09 13:39:55 -03:00
764bfe189e monitoring/recovery: harden ananke checks and OIDC-gated service validation 2026-04-09 01:44:26 -03:00
e0b124ca4e monitoring: switch power telemetry to ananke metrics 2026-04-08 23:33:17 -03:00
cfdd5a377d atlasbot: keep retrying MAS login during transient Synapse outages 2026-04-07 13:09:36 -03:00
9a07aa9be9 keycloak: make metis ssh db key optional during migration 2026-04-07 04:40:56 -03:00
a4631dee81 maintenance: migrate metis ssh key names to ananke 2026-04-07 04:36:42 -03:00
525a0f9e71 harbor/bootstrap: pin via dynamic host label managed by recovery script 2026-04-06 21:32:43 -03:00
d168f02c7f harbor/recovery: remove fixed titan-05 pin and auto-select ready arm64 node 2026-04-06 21:27:23 -03:00
5e387e8e4d maintenance/metis: remove legacy hecate ssh key vars 2026-04-06 19:43:16 -03:00
1ccb04a18a maintenance/metis: default missing ananke ssh keys to empty 2026-04-06 19:36:01 -03:00
25ea022c2e maintenance/metis: migrate ssh key vars to ananke 2026-04-06 19:28:44 -03:00
fc4093a910 logging: raise opensearch heap headroom 2026-04-06 02:04:07 -03:00
816d0cca65 traefik: isolate custom rbac from k3s cleanup 2026-04-06 01:57:34 -03:00
801dde8242 maintenance: harden k3s traefik disable cleanup 2026-04-06 01:47:32 -03:00
aa447e6996 harbor: restore internal arm64 image refs for recovery bootstrap 2026-04-06 00:50:29 -03:00
99bd68f61b recovery: unblock harbor cold start and add power console 2026-04-06 00:22:54 -03:00
2a9485d9e0 maintenance: disable ariadne vault auth/oidc policy sync cron 2026-04-05 17:40:40 -03:00
2799b54b08 maintenance: pin metis to available image tag 2026-04-05 17:05:31 -03:00
3ce7b2eeb7 maintenance/monitoring: wire reciprocal metis hecate key + dampen alert flapping 2026-04-05 13:51:57 -03:00
8d1be9672c maintenance/metis: bump runner tags to 0.1.0-23 2026-04-05 11:41:02 -03:00
deb52c424b maintenance/vault: move Metis runtime secrets to Vault 2026-04-05 11:31:05 -03:00
0828f0cf9e maintenance: inject metis SSH keys directly from Vault 2026-04-05 10:31:20 -03:00
e84399d0b1 maintenance: source metis SSH keys from Vault 2026-04-05 10:25:29 -03:00
1c9716d855 maintenance: pass bastion key into metis env 2026-04-05 10:18:13 -03:00
0fc5ac3041 maintenance/metis: read optional ssh pubkeys from secret env 2026-04-05 10:07:09 -03:00
168e390f20 nextcloud: pin workload to worker rpi5 nodes 2026-04-04 16:07:17 -03:00
96bc93670b monitoring(power): rename hecate UPS peers to Pyrphoros and Statera 2026-04-04 05:54:16 -03:00
82e1b87b8f monitoring(overview): refine ups-climate row and climate/fan stat display 2026-04-04 04:40:22 -03:00
1b682cc60f monitoring(grafana): restart to pick up latest overview layout 2026-04-04 04:35:26 -03:00
5059d2918d monitoring(overview): swap jobs and power rows; tighten climate/fan display 2026-04-04 04:34:18 -03:00
d5fc6c89c4 monitoring(grafana): bump restart revision for overview dashboard reload 2026-04-04 01:34:36 -03:00
55b96c0675 monitoring(overview): place six power/climate panels on one row and fix test/job data 2026-04-04 01:33:15 -03:00
cdc3c081f5 monitoring(overview): replace power/climate summary row with six-panel layout 2026-04-03 22:16:02 -03:00
7ef4c895ba monitoring(grafana): bump restart revision to reload provisioned dashboards 2026-04-03 20:54:12 -03:00