258 Commits

SHA1 Message Date
12ce0f8e2a monitoring(overview): swap jobs and power rows; tighten climate/fan display 2026-04-04 04:34:18 -03:00
3c21b470ed monitoring(grafana): bump restart revision for overview dashboard reload 2026-04-04 01:34:36 -03:00
ab903e5619 monitoring(overview): place six power/climate panels on one row and fix test/job data 2026-04-04 01:33:15 -03:00
c5acc3dc13 monitoring(overview): replace power/climate summary row with six-panel layout 2026-04-03 22:16:02 -03:00
07a4515a6f monitoring(grafana): bump restart revision to reload provisioned dashboards 2026-04-03 20:54:12 -03:00
758654c9df monitoring(power): implement six-panel UPS and climate layout 2026-04-03 20:45:40 -03:00
e199d20a3e monitoring(power): add UPS status snapshot table and climate placeholders 2026-04-03 17:53:42 -03:00
0eb4cc6550 monitoring(power): wire generated power dashboard and split per-UPS panels 2026-04-03 17:49:09 -03:00
c406cba89d monitoring: scope hecate power queries to hecate-power job 2026-04-03 15:23:27 -03:00
40dce5ee49 monitoring: add power dashboard and reorder atlas overview rows 2026-04-03 14:55:16 -03:00
dd5b0187ed platform: expose metis on sentinel and move gitea to rpi5 2026-03-31 16:44:41 -03:00
d4a61bf63f maintenance: harden metis recovery and fix harbor rollout 2026-03-31 14:55:48 -03:00
d1ac3e0816 monitoring: combine Ariadne and Metis tests 2026-03-31 14:54:54 -03:00
be92017f4d maintenance: harden sd-write controls and recovery workflow 2026-03-31 00:06:44 -03:00
a7c3fcae3f monitoring: roll grafana to apply latest alert rules 2026-03-30 18:41:26 -03:00
2d7b51d3b3 monitoring: raise rootfs warning threshold to 85 percent 2026-03-30 18:41:05 -03:00
57375a81ad monitoring: fix noisy grafana email alerts and reload rules 2026-03-30 18:33:02 -03:00
244578cc01 chore: organize one-off jobs 2026-01-28 01:48:32 -03:00
b34f2abefd monitoring: fix grafana alert exec state 2026-01-27 23:34:11 -03:00
9409c037c9 monitoring: restart grafana for alerting reload 2026-01-27 23:29:46 -03:00
c5a7eece35 monitoring: tune cpu and maintenance alerts 2026-01-27 23:23:42 -03:00
ca7a08e791 monitoring: fix grafana smtp from address 2026-01-27 22:28:37 -03:00
029e4d4ca6 monitoring: send grafana alerts via postmark 2026-01-27 22:00:19 -03:00
38c8d08ab4 monitoring: fix gpu idle label 2026-01-27 21:46:58 -03:00
ba16f5119b monitoring: unify gpu namespace usage 2026-01-27 21:43:37 -03:00
51bf01a8fd monitoring: keep idle label in gpu share 2026-01-27 18:44:58 -03:00
1b04e6cb00 monitoring: fix gpu idle share 2026-01-27 17:51:13 -03:00
5f32dff73b monitoring: fix tegrastats regexes 2026-01-27 16:44:00 -03:00
dfb295e5f0 monitoring: expose jetson scrape line length 2026-01-27 16:38:09 -03:00
a7f3d49fea monitoring: read tegrastats per scrape 2026-01-27 16:34:31 -03:00
246ed6617e monitoring: read jetson stats on demand 2026-01-27 16:27:45 -03:00
1951291090 monitoring: refresh jetson stats on scrape 2026-01-27 16:23:23 -03:00
62a423f32c monitoring: fix jetson gpu metrics 2026-01-27 16:19:54 -03:00
9ea338b121 monitoring: restart jetson exporter 2026-01-26 22:51:41 -03:00
0331e7ea99 monitoring: fix jetson metrics newlines 2026-01-26 22:50:33 -03:00
1616994b19 monitoring: unify jetson gpu metrics 2026-01-26 22:26:24 -03:00
72bd22e912 monitoring: map dcgm to shared gpu resources 2026-01-26 20:58:06 -03:00
b0abb9bd6e ariadne: reduce comms noise, fix gpu labels 2026-01-26 20:54:33 -03:00
a988af3262 monitoring: alert on VM outage 2026-01-23 11:51:28 -03:00
ce5b1d1353 monitoring: add postgres metrics and update overview 2026-01-22 18:23:26 -03:00
d509dfaa22 ops: restore portal/ariadne and add postgres panels 2026-01-22 15:23:23 -03:00
4721d44a33 monitoring: enforce sorted job lists 2026-01-21 15:12:53 -03:00
db4c3b7c51 monitoring: tighten jobs/overview ordering 2026-01-21 15:01:02 -03:00
b0996e9a4f monitoring: refine jobs/overview panels 2026-01-21 14:31:11 -03:00
8b35ab0292 monitoring: refresh jobs dashboards 2026-01-21 13:37:36 -03:00
2e407e1962 monitoring: reschedule grafana user dedupe 2026-01-21 12:31:54 -03:00
5ae6b4b00c monitoring: harden grafana user dedupe 2026-01-21 12:30:08 -03:00
ae1fd5b661 monitoring: fix grafana user dedupe job 2026-01-21 12:25:53 -03:00
4e65f02fba monitoring: prepopulate vault for dedupe job 2026-01-21 12:18:57 -03:00
88de0f7cee monitoring: wire vault sa for dedupe job 2026-01-21 12:16:26 -03:00