Compare commits

419 Commits
deploy...main

Author SHA1 Message Date
cc51eb6d1e Merge pull request 'feature/ariadne' (#11) from feature/ariadne into main
Reviewed-on: #11
2026-01-28 14:05:38 +00:00
aa608fbf0f atlasbot: improve fact parsing and fallback answers 2026-01-28 11:02:10 -03:00
436e56c5de atlasbot: favor factual fallback in fast mode 2026-01-28 04:10:31 -03:00
dda943ce16 atlasbot: expand full-pack triggers and strip inline confidence 2026-01-28 04:06:24 -03:00
043d1cbab3 atlasbot: clean fact labels and non-cluster confidence 2026-01-28 04:00:13 -03:00
da94cc6f97 atlasbot: improve fast fallback and usage filtering 2026-01-28 03:56:26 -03:00
7c0a25a0eb atlasbot: expand fast context for quantitative prompts 2026-01-28 03:51:37 -03:00
7194cad0a8 atlasbot: refine fast fact selection and prompts 2026-01-28 03:46:06 -03:00
eb567fda06 atlasbot: fix fallback fact parsing 2026-01-28 03:35:02 -03:00
a9d74a066f atlasbot: prefer fact fallback for quantitative prompts 2026-01-28 03:32:17 -03:00
19b52ac5e3 atlasbot: add fact-pack fallback for fast 2026-01-28 03:29:21 -03:00
885e7b6489 comms: use 14b model for atlasbot quick 2026-01-28 03:23:54 -03:00
8316e5dd15 atlasbot: fix tag detection for workload queries 2026-01-28 03:20:28 -03:00
be82109d4e atlasbot: enforce fast answer body 2026-01-28 03:17:46 -03:00
971848558a atlasbot: prioritize fact selection for quick answers 2026-01-28 03:14:12 -03:00
980c2cf1cc atlasbot: enrich fact pack summaries 2026-01-28 03:09:34 -03:00
08ac598181 atlasbot: streamline quick answers 2026-01-28 02:53:43 -03:00
349a46ceab comms: tune atlasbot quick model 2026-01-28 02:43:24 -03:00
666dcb3faa atlasbot: rework reasoning pipeline 2026-01-28 02:21:42 -03:00
769d3f41bf comms: roll atlasbot config 2026-01-28 01:58:23 -03:00
62e0a565f5 atlasbot: tighten fast facts 2026-01-28 01:58:07 -03:00
2a2179a138 comms: roll atlasbot config 2026-01-28 01:52:40 -03:00
c1e94d56c8 atlasbot: simplify fast path 2026-01-28 01:52:23 -03:00
244578cc01 chore: organize one-off jobs 2026-01-28 01:48:32 -03:00
0146e3dc95 maintenance: suspend ariadne migrate job 2026-01-28 01:35:34 -03:00
48c379dc88 comms: roll atlasbot config 2026-01-28 01:07:26 -03:00
6001876409 atlasbot: add per-hardware extremes 2026-01-28 01:07:13 -03:00
2fe3d5b932 atlasbot: roll config 2026-01-28 01:02:32 -03:00
474c472b1d atlasbot: enrich fact pack and selection 2026-01-28 01:02:14 -03:00
6578a8b08a atlasbot: roll config 2026-01-28 00:24:13 -03:00
44c22e3d00 atlasbot: improve multi-pass synthesis 2026-01-28 00:22:32 -03:00
2af817b9db atlasbot: speed up fast mode 2026-01-27 23:57:36 -03:00
2d90005076 atlasbot: fix insight scoring 2026-01-27 23:49:28 -03:00
a10050e4c7 atlasbot: overhaul reasoning pipeline 2026-01-27 23:45:08 -03:00
b34f2abefd monitoring: fix grafana alert exec state 2026-01-27 23:34:11 -03:00
9409c037c9 monitoring: restart grafana for alerting reload 2026-01-27 23:29:46 -03:00
3a2bb1bac9 chore: bump atlasbot checksum 2026-01-27 23:24:46 -03:00
f43acaa554 atlasbot: fix bottom ops and pod queries 2026-01-27 23:24:12 -03:00
c5a7eece35 monitoring: tune cpu and maintenance alerts 2026-01-27 23:23:42 -03:00
19d10ce585 chore: bump atlasbot checksum 2026-01-27 23:17:23 -03:00
7b1c891e70 atlasbot: improve metric detection and counts 2026-01-27 23:16:53 -03:00
67ca0d451d chore: bump atlasbot checksum 2026-01-27 23:02:22 -03:00
4b468b0f97 atlasbot: fix word boundary detection 2026-01-27 23:01:51 -03:00
380aae3b2c chore: bump atlasbot checksum 2026-01-27 22:55:24 -03:00
b9b25565a2 atlasbot: tighten scoring and readiness logic 2026-01-27 22:55:00 -03:00
24b0ac78c4 chore: bump atlasbot config checksum 2026-01-27 22:45:17 -03:00
23533e08ee atlasbot: refine cluster intent handling 2026-01-27 22:44:49 -03:00
fc10eed704 atlasbot: fix score formatting 2026-01-27 22:32:25 -03:00
ca7a08e791 monitoring: fix grafana smtp from address 2026-01-27 22:28:37 -03:00
868075426c atlasbot: overhaul open-ended reasoning 2026-01-27 22:22:50 -03:00
029e4d4ca6 monitoring: send grafana alerts via postmark 2026-01-27 22:00:19 -03:00
e97aaafed9 atlasbot: refine open-ended reasoning 2026-01-27 21:52:07 -03:00
38c8d08ab4 monitoring: fix gpu idle label 2026-01-27 21:46:58 -03:00
ba16f5119b monitoring: unify gpu namespace usage 2026-01-27 21:43:37 -03:00
2fe763189d atlasbot: roll pod after metric parsing update 2026-01-27 21:27:52 -03:00
832d5acf68 atlasbot: improve metric parsing and cluster intent 2026-01-27 21:27:19 -03:00
27e8a77044 atlasbot: add model fallback and rollout 2026-01-27 21:16:47 -03:00
65e50d1923 atlasbot: bump rollout checksum 2026-01-27 21:11:58 -03:00
e486245aaf atlasbot: guard open-ended LLM calls 2026-01-27 21:09:48 -03:00
34c91c6d08 atlasbot: refine open-ended reasoning pipeline 2026-01-27 21:02:20 -03:00
9e06d7afc8 atlasbot: route subjective queries to LLM 2026-01-27 20:02:09 -03:00
18e543d95a atlasbot: refine insight tone and status 2026-01-27 19:42:04 -03:00
20364a262c atlasbot: strengthen subjective insights 2026-01-27 19:37:20 -03:00
8842662239 atlasbot: refine node and postgres query handling 2026-01-27 19:13:31 -03:00
12fa7d02aa atlasbot: expand hardware and entity detection 2026-01-27 19:10:02 -03:00
9bf822ec36 atlasbot: answer hardware mix queries 2026-01-27 19:06:44 -03:00
ea8eda2c73 atlasbot: treat hardware prompts as cluster queries 2026-01-27 19:04:29 -03:00
243d3112ce atlasbot: prefer hardware for general interest 2026-01-27 19:01:16 -03:00
4bab34eae1 atlasbot: keep coolest answers opinionated 2026-01-27 18:58:59 -03:00
8bd4d9fc7a atlasbot: prioritize hardware for subjective prompts 2026-01-27 18:56:14 -03:00
69d121aa07 atlasbot: use hottest node labels for insights 2026-01-27 18:54:05 -03:00
79650616f1 atlasbot: make insights sound more human 2026-01-27 18:51:00 -03:00
c4ad82f122 atlasbot: add more opinionated hardware insight 2026-01-27 18:48:35 -03:00
4e51cf6b6c atlasbot: tighten insight phrasing 2026-01-27 18:45:49 -03:00
51bf01a8fd monitoring: keep idle label in gpu share 2026-01-27 18:44:58 -03:00
4e6d4f43b2 atlasbot: improve insight voice and avoid repeats 2026-01-27 18:43:03 -03:00
58dab1ca79 comms: roll atlasbot after history update 2026-01-27 18:32:54 -03:00
113bcdeded atlasbot: use history for subjective follow-ups 2026-01-27 18:32:27 -03:00
e05a949b9f comms: roll atlasbot for insight updates 2026-01-27 18:18:06 -03:00
0a10a2d861 atlasbot: add narrative insights 2026-01-27 18:17:29 -03:00
flux-bot
6fead623fa chore(bstein-dev-home): automated image update 2026-01-27 21:11:27 +00:00
flux-bot
ad01659cc4 chore(bstein-dev-home): automated image update 2026-01-27 21:11:24 +00:00
b04092b63c comms: roll atlasbot after bot updates 2026-01-27 18:10:30 -03:00
e87fa4369c atlasbot: make cluster answers more narrative 2026-01-27 18:08:19 -03:00
1b04e6cb00 monitoring: fix gpu idle share 2026-01-27 17:51:13 -03:00
5f32dff73b monitoring: fix tegrastats regexes 2026-01-27 16:44:00 -03:00
dfb295e5f0 monitoring: expose jetson scrape line length 2026-01-27 16:38:09 -03:00
a7f3d49fea monitoring: read tegrastats per scrape 2026-01-27 16:34:31 -03:00
246ed6617e monitoring: read jetson stats on demand 2026-01-27 16:27:45 -03:00
1951291090 monitoring: refresh jetson stats on scrape 2026-01-27 16:23:23 -03:00
62a423f32c monitoring: fix jetson gpu metrics 2026-01-27 16:19:54 -03:00
flux-bot
dedf566993 chore(maintenance): automated image update 2026-01-27 18:57:30 +00:00
354275f3ad atlasbot: avoid namespace-only workload matches 2026-01-27 15:45:18 -03:00
3f159c6c83 atlasbot: improve workload matching and fallbacks 2026-01-27 15:42:31 -03:00
631bd09778 atlasbot: return structured cluster summaries 2026-01-27 15:36:08 -03:00
b7792d30f1 atlasbot: answer cluster queries without llm 2026-01-27 15:30:43 -03:00
241a8889ee atlasbot: send snapshot as explicit context 2026-01-27 15:12:47 -03:00
864f1cab20 atlasbot: fix prompt formatting 2026-01-27 15:10:03 -03:00
dea70df209 atlasbot: strengthen cluster disambiguation 2026-01-27 15:07:28 -03:00
f649a6a9a2 atlasbot: force cluster intent in prompts 2026-01-27 15:04:10 -03:00
ca3cfaf1fc atlasbot: tighten cluster intent and snapshot framing 2026-01-27 15:00:55 -03:00
flux-bot
1682ccfb25 chore(bstein-dev-home): automated image update 2026-01-27 17:58:11 +00:00
flux-bot
18a4c58338 chore(bstein-dev-home): automated image update 2026-01-27 17:58:07 +00:00
92f4137e9c atlasbot: simplify cluster gating and context 2026-01-27 14:54:09 -03:00
cb7141dfb6 comms: roll atlasbot for mention stripping 2026-01-27 14:38:35 -03:00
cd45b7faba atlasbot: ignore mentions and gate cluster context 2026-01-27 14:38:35 -03:00
flux-bot
d03c846779 chore(bstein-dev-home): automated image update 2026-01-27 17:12:07 +00:00
flux-bot
a00bab5ee7 chore(bstein-dev-home): automated image update 2026-01-27 17:12:03 +00:00
975783a6b9 portal: allow longer atlasbot responses 2026-01-27 14:09:23 -03:00
c3b2c0cebb comms: roll atlasbot after answer tweaks 2026-01-27 13:18:01 -03:00
d2ade61d88 atlasbot: refine ready/pod counts 2026-01-27 13:17:33 -03:00
d74277a8bd comms: roll atlasbot after script update 2026-01-27 13:15:13 -03:00
31fbe48ca3 atlasbot: fix metric detection and role counts 2026-01-27 13:13:20 -03:00
70feb1ef85 atlasbot: refine role and hardware filters 2026-01-27 13:02:23 -03:00
159c9cfe68 atlasbot: use structured answers before LLM 2026-01-27 12:59:11 -03:00
b7f454b790 atlasbot: enrich snapshot facts and pod metrics 2026-01-27 12:53:17 -03:00
41b131c347 atlasbot: preserve response text with confidence 2026-01-27 12:47:28 -03:00
3b1e74d278 atlasbot: call ollama chat directly 2026-01-27 12:33:56 -03:00
d8ae9c5901 comms: restore atlasbot gateway URL 2026-01-27 12:23:05 -03:00
32851ca057 comms: point atlasbot to ollama and raise gateway memory 2026-01-27 12:20:50 -03:00
32125d7bab comms: bump atlasbot configmap checksum 2026-01-27 11:05:30 -03:00
a442ea6d5d atlasbot: strengthen facts context and replies 2026-01-27 11:03:55 -03:00
c0dd00c93d atlasbot: shrink facts context to avoid truncation 2026-01-27 06:45:18 -03:00
446115f07a atlasbot: enrich facts summary for LLM 2026-01-27 06:34:37 -03:00
a2f4c51e1d atlasbot: shift to facts context and upgrade model 2026-01-27 06:28:26 -03:00
flux-bot
4fcecc4707 chore(maintenance): automated image update 2026-01-27 09:00:40 +00:00
flux-bot
1459027abc chore(maintenance): automated image update 2026-01-27 08:50:29 +00:00
89935a579a atlasbot: use cluster snapshot + model update 2026-01-27 05:42:28 -03:00
flux-bot
b1aad04f3e chore(maintenance): automated image update 2026-01-27 08:14:36 +00:00
2dc208e919 comms: retain synapse admin ensure logs 2026-01-27 05:02:02 -03:00
292d513e10 comms: ensure synapse admin token 2026-01-27 04:58:13 -03:00
11ba37a4b2 comms: restart atlasbot for scoped hottest 2026-01-27 04:53:44 -03:00
d6b9d64e70 atlasbot: scope overall hottest node to atlas inventory 2026-01-27 04:53:33 -03:00
67b9babc0e comms: restart atlasbot for knowledge summaries 2026-01-27 04:51:33 -03:00
c219019ad5 atlasbot: add knowledge summaries and better fallback 2026-01-27 04:51:20 -03:00
0ef14c67fd comms: add synapse admin ensure job 2026-01-27 04:48:44 -03:00
39fd7adb55 comms: restart atlasbot for metrics formatting 2026-01-27 03:56:47 -03:00
600c124ef2 atlasbot: clarify scoped metrics and format percent values 2026-01-27 03:56:17 -03:00
flux-bot
5e4a974733 chore(maintenance): automated image update 2026-01-27 06:51:28 +00:00
f7fc152439 comms: rerun synapse seeder admin ensure 2026-01-27 01:22:02 -03:00
bab914c58f comms: rerun mas local user ensure 2026-01-27 01:19:43 -03:00
e24ff4782c comms: rerun ensure jobs and fix vault oidc env 2026-01-27 01:14:42 -03:00
9ecdf054d3 vault: bootstrap k8s auth config with root token 2026-01-27 01:04:57 -03:00
flux-bot
d9c8632b8d chore(bstein-dev-home): automated image update 2026-01-27 02:53:50 +00:00
flux-bot
d325111f34 chore(bstein-dev-home): automated image update 2026-01-27 02:52:49 +00:00
adc711be62 comms: rerun synapse user seed 2026-01-26 22:54:43 -03:00
66ce0caaf4 comms: restart atlasbot for op priority 2026-01-26 22:52:49 -03:00
9ea338b121 monitoring: restart jetson exporter 2026-01-26 22:51:41 -03:00
270dc93966 atlasbot: prioritize top queries over list 2026-01-26 22:51:04 -03:00
0331e7ea99 monitoring: fix jetson metrics newlines 2026-01-26 22:50:33 -03:00
flux-bot
f08d740d83 chore(bstein-dev-home): automated image update 2026-01-27 01:47:47 +00:00
flux-bot
328241b7ac chore(bstein-dev-home): automated image update 2026-01-27 01:47:43 +00:00
c8662a624e atlasbot: add internal endpoint and portal wiring 2026-01-26 22:43:58 -03:00
689bf10995 comms: restart atlasbot for generic planner 2026-01-26 22:39:01 -03:00
37a203509b atlasbot: replace targeted handlers with generic planner 2026-01-26 22:38:37 -03:00
flux-bot
6c413d4a50 chore(maintenance): automated image update 2026-01-27 01:27:02 +00:00
1616994b19 monitoring: unify jetson gpu metrics 2026-01-26 22:26:24 -03:00
ec834b7e0f vault: allow ariadne to use vault-admin role 2026-01-26 22:26:13 -03:00
8c90e0e527 comms: restart atlasbot for hottest node fix 2026-01-26 22:13:53 -03:00
6432472be7 atlasbot: answer hottest node queries via metrics 2026-01-26 22:13:04 -03:00
72bd22e912 monitoring: map dcgm to shared gpu resources 2026-01-26 20:58:06 -03:00
flux-bot
879a751429 chore(maintenance): automated image update 2026-01-26 23:54:53 +00:00
b0abb9bd6e ariadne: reduce comms noise, fix gpu labels 2026-01-26 20:54:33 -03:00
b27c80d5c0 atlasbot: improve node inventory reasoning 2026-01-26 19:53:11 -03:00
a61091c052 atlasbot: reload structured answers 2026-01-26 19:34:42 -03:00
16d0a22163 atlasbot: generalize inventory answers 2026-01-26 19:34:19 -03:00
2d09e7f965 atlasbot: reload inventory answers 2026-01-26 19:31:07 -03:00
bf2d4cff90 atlasbot: answer from live inventory 2026-01-26 19:29:26 -03:00
3e4351ef19 atlasbot: reload for live inventory 2026-01-26 19:24:03 -03:00
ff04341559 atlasbot: use live node inventory context 2026-01-26 19:22:28 -03:00
d666e6a156 atlasbot: roll deployment 2026-01-26 19:02:54 -03:00
b6e8c01e99 atlasbot: improve missing node inference 2026-01-26 19:01:26 -03:00
0d5e19e11a atlasbot: infer worker expected count from metrics 2026-01-26 18:50:23 -03:00
dfa13e22cc atlasbot: clarify worker count limits 2026-01-26 18:21:17 -03:00
65781aaca7 atlasbot: improve worker node answers 2026-01-26 18:18:42 -03:00
7bb1bd96fc atlasbot: improve worker readiness and metrics replies 2026-01-26 18:16:14 -03:00
be7846572f atlasbot: recognize prefix mentions 2026-01-26 15:54:00 -03:00
0ac0f920ca atlasbot: load metrics index and answer in rooms 2026-01-26 15:34:52 -03:00
33b5e2b678 atlasbot: add metrics kb and long timeout 2026-01-26 14:08:11 -03:00
fff00dbe95 atlasbot: ground node inventory and soften llm failures 2026-01-26 12:36:51 -03:00
53e4b4036b comms: bump atlasbot config checksum 2026-01-26 12:08:33 -03:00
28570a1f5c atlasbot: answer jetson nodes from knowledge 2026-01-26 12:06:48 -03:00
2c3ffdbf95 ai-llm: tighten gpu placement and resources 2026-01-26 11:44:28 -03:00
fec7713049 comms: bump atlasbot configmap checksum 2026-01-26 09:38:38 -03:00
352d4991f4 comms: handle arch node counts and extend LLM timeout 2026-01-26 09:36:08 -03:00
14d18048d5 comms: fix duplicate chat key annotations 2026-01-26 09:29:28 -03:00
7fd71f4bab comms: inject chat ai keys for atlasbot 2026-01-26 09:23:21 -03:00
flux-bot
f14be5d7ef chore(maintenance): automated image update 2026-01-26 06:33:26 +00:00
10003ca0d7 comms: sync atlas knowledge and use ariadne state 2026-01-26 03:32:17 -03:00
5aac018a7b comms: answer node name queries 2026-01-26 01:35:47 -03:00
36f7de76e9 comms: fix atlasbot node count matcher 2026-01-26 01:32:01 -03:00
5f0bc3832d comms: answer node count queries 2026-01-26 01:07:49 -03:00
cd6eaff7cb comms: normalize atlasbot replies 2026-01-26 00:52:35 -03:00
83b8e13661 ai: restart ollama deployment 2026-01-25 16:19:15 -03:00
ec6b51cfd2 comms: route atlasbot to chat gateway 2026-01-25 15:59:34 -03:00
flux-bot
04465407d2 chore(bstein-dev-home): automated image update 2026-01-25 18:06:59 +00:00
flux-bot
5a994f4d42 chore(bstein-dev-home): automated image update 2026-01-25 18:04:59 +00:00
flux-bot
af9fcdeae9 chore(bstein-dev-home): automated image update 2026-01-25 17:40:57 +00:00
flux-bot
39df6ff039 chore(bstein-dev-home): automated image update 2026-01-25 17:39:57 +00:00
flux-bot
70e79f25b0 chore(bstein-dev-home): automated image update 2026-01-25 00:07:26 +00:00
flux-bot
f471a30499 chore(bstein-dev-home): automated image update 2026-01-25 00:06:26 +00:00
ee154f1494 vaultwarden: bump to 1.35.2 2026-01-24 14:16:59 -03:00
flux-bot
d0c69cd480 chore(bstein-dev-home): automated image update 2026-01-24 14:46:38 +00:00
flux-bot
6e4e2bdc0c chore(bstein-dev-home): automated image update 2026-01-24 14:44:38 +00:00
flux-bot
0b7d87cef4 chore(bstein-dev-home): automated image update 2026-01-24 14:32:37 +00:00
flux-bot
a27bb0e198 chore(bstein-dev-home): automated image update 2026-01-24 14:31:37 +00:00
flux-bot
cf2d0c5eff chore(bstein-dev-home): automated image update 2026-01-24 10:16:15 +00:00
flux-bot
00eb4be529 chore(bstein-dev-home): automated image update 2026-01-24 10:15:15 +00:00
flux-bot
8b1b824a29 chore(maintenance): automated image update 2026-01-24 10:13:43 +00:00
flux-bot
a7f5a60190 chore(maintenance): automated image update 2026-01-24 09:29:39 +00:00
flux-bot
eeb84e8e70 chore(bstein-dev-home): automated image update 2026-01-24 02:07:32 +00:00
flux-bot
82312d0fbf chore(bstein-dev-home): automated image update 2026-01-24 02:05:32 +00:00
292ec7359b keycloak: rerun realm settings job 2026-01-23 22:41:41 -03:00
flux-bot
473bebaf52 chore(bstein-dev-home): automated image update 2026-01-24 01:33:33 +00:00
flux-bot
d07f14826b chore(bstein-dev-home): automated image update 2026-01-24 01:33:29 +00:00
e7d18be4ed keycloak: add vaultwarden_grandfathered flag 2026-01-23 22:31:10 -03:00
flux-bot
437281f6a5 chore(bstein-dev-home): automated image update 2026-01-23 23:53:21 +00:00
flux-bot
67643e3fad chore(bstein-dev-home): automated image update 2026-01-23 23:52:21 +00:00
flux-bot
38d2dad28f chore(bstein-dev-home): automated image update 2026-01-23 23:28:28 +00:00
flux-bot
82fceb11a4 chore(bstein-dev-home): automated image update 2026-01-23 23:28:20 +00:00
flux-bot
8e6d9e1c37 chore(bstein-dev-home): automated image update 2026-01-23 23:19:21 +00:00
flux-bot
a603b3726f chore(bstein-dev-home): automated image update 2026-01-23 23:19:18 +00:00
flux-bot
e43340f2a1 chore(bstein-dev-home): automated image update 2026-01-23 22:40:15 +00:00
flux-bot
115f86907f chore(bstein-dev-home): automated image update 2026-01-23 22:39:15 +00:00
flux-bot
aaef2b7ab5 chore(bstein-dev-home): automated image update 2026-01-23 22:25:15 +00:00
flux-bot
c24f2dafc1 chore(bstein-dev-home): automated image update 2026-01-23 22:24:13 +00:00
flux-bot
d9c3ff8195 chore(maintenance): automated image update 2026-01-23 22:21:43 +00:00
b94b016b0f flux: force apply migrations 2026-01-23 18:58:33 -03:00
flux-bot
5ec4bb9c61 chore(maintenance): automated image update 2026-01-23 21:44:40 +00:00
flux-bot
e2501bd3d0 chore(bstein-dev-home): automated image update 2026-01-23 21:28:08 +00:00
flux-bot
bc2e1058d6 chore(bstein-dev-home): automated image update 2026-01-23 21:27:08 +00:00
flux-bot
45352f79ba chore(bstein-dev-home): automated image update 2026-01-23 20:51:05 +00:00
flux-bot
7b336c76a1 chore(bstein-dev-home): automated image update 2026-01-23 20:50:05 +00:00
flux-bot
0127c62f51 chore(bstein-dev-home): automated image update 2026-01-23 20:48:05 +00:00
flux-bot
ee6ef74982 chore(bstein-dev-home): automated image update 2026-01-23 20:47:05 +00:00
d521c66d60 maintenance: rotate ariadne migrate job name 2026-01-23 17:21:37 -03:00
flux-bot
c28444a233 chore(bstein-dev-home): automated image update 2026-01-23 20:00:01 +00:00
flux-bot
8bdf60542d chore(bstein-dev-home): automated image update 2026-01-23 19:58:00 +00:00
flux-bot
0758c2e06d chore(maintenance): automated image update 2026-01-23 19:56:31 +00:00
flux-bot
00bcc0d4c2 chore(bstein-dev-home): automated image update 2026-01-23 19:13:56 +00:00
flux-bot
60840d1171 chore(bstein-dev-home): automated image update 2026-01-23 19:11:58 +00:00
3338efa58e finance: allow actual user creation 2026-01-23 14:07:52 -03:00
a988af3262 monitoring: alert on VM outage 2026-01-23 11:51:28 -03:00
flux-bot
ef42dac97b chore(bstein-dev-home): automated image update 2026-01-23 06:45:19 +00:00
flux-bot
df3f4a0c0b chore(bstein-dev-home): automated image update 2026-01-23 06:44:18 +00:00
fda986ab3d bstein-dev-home: separate portal migrations 2026-01-23 03:28:49 -03:00
flux-bot
ca47e03953 chore(bstein-dev-home): automated image update 2026-01-23 06:14:16 +00:00
flux-bot
3d4208f877 chore(bstein-dev-home): automated image update 2026-01-23 06:13:15 +00:00
3d2e0ead1c portal: bump migrate job name 2026-01-23 03:11:42 -03:00
18ac46d4b8 keycloak: bump realm settings job 2026-01-23 02:09:53 -03:00
3cacbad4c0 comms/keycloak: add mailu email claim 2026-01-23 02:04:51 -03:00
3d633a5627 comms: enable MSC4108 under experimental_features 2026-01-23 01:46:03 -03:00
58d9cb616f comms: enable MSC4108 rendezvous in synapse 2026-01-23 01:35:43 -03:00
flux-bot
3474df40d4 chore(bstein-dev-home): automated image update 2026-01-23 03:39:02 +00:00
flux-bot
4c66b538a7 chore(bstein-dev-home): automated image update 2026-01-23 03:38:02 +00:00
flux-bot
2475d4ca9d chore(bstein-dev-home): automated image update 2026-01-23 03:11:03 +00:00
flux-bot
1d39015d33 chore(bstein-dev-home): automated image update 2026-01-23 03:10:59 +00:00
flux-bot
e0bf10cad9 chore(bstein-dev-home): automated image update 2026-01-23 03:02:59 +00:00
flux-bot
72e6a09bd0 chore(bstein-dev-home): automated image update 2026-01-23 03:01:59 +00:00
flux-bot
b1fa40acc1 chore(bstein-dev-home): automated image update 2026-01-23 02:47:58 +00:00
flux-bot
e3247f606f chore(bstein-dev-home): automated image update 2026-01-23 02:46:57 +00:00
flux-bot
2dc680b8f8 chore(bstein-dev-home): automated image update 2026-01-23 01:52:53 +00:00
flux-bot
8dedefb4b4 chore(bstein-dev-home): automated image update 2026-01-23 01:51:53 +00:00
flux-bot
a18f7e98a2 chore(bstein-dev-home): automated image update 2026-01-23 01:42:52 +00:00
flux-bot
62d16ae388 chore(bstein-dev-home): automated image update 2026-01-23 01:32:51 +00:00
flux-bot
d3d680383b chore(bstein-dev-home): automated image update 2026-01-23 01:14:49 +00:00
flux-bot
8545f2bc50 chore(bstein-dev-home): automated image update 2026-01-23 01:12:49 +00:00
flux-bot
5ca247f143 chore(bstein-dev-home): automated image update 2026-01-23 01:08:49 +00:00
flux-bot
4d566a7388 chore(bstein-dev-home): automated image update 2026-01-23 01:07:49 +00:00
flux-bot
8913c5a5f2 chore(bstein-dev-home): automated image update 2026-01-22 22:16:37 +00:00
flux-bot
25c4f3e07b chore(bstein-dev-home): automated image update 2026-01-22 22:16:34 +00:00
flux-bot
8b7e21f0cc chore(bstein-dev-home): automated image update 2026-01-22 22:08:37 +00:00
flux-bot
301909f92e chore(bstein-dev-home): automated image update 2026-01-22 22:08:33 +00:00
flux-bot
0c27b48a1c chore(bstein-dev-home): automated image update 2026-01-22 21:53:32 +00:00
flux-bot
71996fb199 chore(bstein-dev-home): automated image update 2026-01-22 21:51:32 +00:00
flux-bot
7c9ee41180 chore(maintenance): automated image update 2026-01-22 21:41:04 +00:00
ce5b1d1353 monitoring: add postgres metrics and update overview 2026-01-22 18:23:26 -03:00
820e624a0b jenkins: set timezone to America/Chicago 2026-01-22 18:23:26 -03:00
flux-bot
cca3a756b3 chore(maintenance): automated image update 2026-01-22 21:02:01 +00:00
flux-bot
1e815ce011 chore(bstein-dev-home): automated image update 2026-01-22 21:00:34 +00:00
flux-bot
e5281ad4c0 chore(bstein-dev-home): automated image update 2026-01-22 21:00:29 +00:00
flux-bot
1e8a67904c chore(bstein-dev-home): automated image update 2026-01-22 18:48:16 +00:00
flux-bot
0290a5f715 chore(bstein-dev-home): automated image update 2026-01-22 18:47:16 +00:00
9b5d8ac45c jobs: force recreate migrate jobs 2026-01-22 15:39:57 -03:00
flux-bot
05c7642f5c chore(bstein-dev-home): automated image update 2026-01-22 18:35:20 +00:00
flux-bot
efa893b134 chore(bstein-dev-home): automated image update 2026-01-22 18:35:15 +00:00
flux-bot
7eba40a889 chore(bstein-dev-home): automated image update 2026-01-22 18:34:08 +00:00
flux-bot
8b90b44dfd chore(maintenance): automated image update 2026-01-22 18:33:48 +00:00
flux-bot
21800290ec chore(maintenance): automated image update 2026-01-22 18:33:30 +00:00
ec5e4ec4a3 images: auth image scan and bump tags 2026-01-22 15:33:08 -03:00
flux-bot
af024aa16a chore(maintenance): automated image update 2026-01-22 18:29:24 +00:00
flux-bot
da32ba1680 chore(bstein-dev-home): automated image update 2026-01-22 18:29:01 +00:00
8788d40dc6 ops: bump portal and ariadne image tags 2026-01-22 15:28:26 -03:00
d509dfaa22 ops: restore portal/ariadne and add postgres panels 2026-01-22 15:23:23 -03:00
156effebe3 ops: pause portal/ariadne and add migrate jobs 2026-01-22 14:09:39 -03:00
8e3fe266aa flux: temporarily drop harbor health checks 2026-01-22 13:38:06 -03:00
3fc17b0c7c harbor: fix ingress patch placement 2026-01-22 13:31:12 -03:00
d9695d32f6 harbor: route v2 ingress to registry 2026-01-22 13:26:38 -03:00
0697d7b1b3 keycloak: allow harbor direct grants 2026-01-22 12:41:58 -03:00
d2f118ed32 jenkins: pin vault sync to worker nodes 2026-01-22 10:56:27 -03:00
5e35b5f7a2 vault: unsuspend k8s auth config cronjob 2026-01-22 04:47:50 -03:00
94953ab0fe jenkins: sync harbor pull secret from vault 2026-01-22 04:45:24 -03:00
ba2b9acbcc jenkins: use shared harbor creds when present 2026-01-22 03:15:38 -03:00
flux-bot
955bbcf58f chore(bstein-dev-home): automated image update 2026-01-22 05:41:20 +00:00
flux-bot
62c0e32bc4 chore(bstein-dev-home): automated image update 2026-01-22 05:40:21 +00:00
flux-bot
6dcbdcf704 chore(bstein-dev-home): automated image update 2026-01-22 05:38:20 +00:00
flux-bot
c84af0b8df chore(bstein-dev-home): automated image update 2026-01-22 05:37:20 +00:00
flux-bot
3891f1d063 chore(maintenance): automated image update 2026-01-22 00:59:59 +00:00
flux-bot
beb923cf0e chore(maintenance): automated image update 2026-01-22 00:48:58 +00:00
flux-bot
aa3db22eaf chore(bstein-dev-home): automated image update 2026-01-22 00:17:42 +00:00
flux-bot
592435f760 chore(bstein-dev-home): automated image update 2026-01-22 00:16:42 +00:00
flux-bot
d54115df55 chore(bstein-dev-home): automated image update 2026-01-21 23:48:39 +00:00
flux-bot
75e2c745f7 chore(bstein-dev-home): automated image update 2026-01-21 23:47:39 +00:00
flux-bot
71122fc200 chore(bstein-dev-home): automated image update 2026-01-21 23:24:40 +00:00
flux-bot
41d38033b5 chore(bstein-dev-home): automated image update 2026-01-21 23:24:37 +00:00
flux-bot
067134fa1b chore(bstein-dev-home): automated image update 2026-01-21 22:56:34 +00:00
flux-bot
eb5256e6bc chore(bstein-dev-home): automated image update 2026-01-21 22:55:34 +00:00
flux-bot
d3b1a925b8 chore(maintenance): automated image update 2026-01-21 22:52:46 +00:00
flux-bot
6f4e5dbfe7 chore(bstein-dev-home): automated image update 2026-01-21 22:32:31 +00:00
flux-bot
d9cda5b6af chore(bstein-dev-home): automated image update 2026-01-21 22:30:31 +00:00
flux-bot
30b86a693f chore(maintenance): automated image update 2026-01-21 22:23:44 +00:00
flux-bot
da16998d2e chore(bstein-dev-home): automated image update 2026-01-21 22:07:29 +00:00
flux-bot
3a48569330 chore(bstein-dev-home): automated image update 2026-01-21 22:05:29 +00:00
flux-bot
3a987c29ff chore(bstein-dev-home): automated image update 2026-01-21 20:34:18 +00:00
flux-bot
66cb72947f chore(bstein-dev-home): automated image update 2026-01-21 20:33:18 +00:00
flux-bot
1039590b14 chore(bstein-dev-home): automated image update 2026-01-21 20:05:15 +00:00
flux-bot
298d261146 chore(bstein-dev-home): automated image update 2026-01-21 20:04:15 +00:00
4721d44a33 monitoring: enforce sorted job lists 2026-01-21 15:12:53 -03:00
db4c3b7c51 monitoring: tighten jobs/overview ordering 2026-01-21 15:01:02 -03:00
b0996e9a4f monitoring: refine jobs/overview panels 2026-01-21 14:31:11 -03:00
flux-bot
2138b93242 chore(maintenance): automated image update 2026-01-21 16:40:09 +00:00
8b35ab0292 monitoring: refresh jobs dashboards 2026-01-21 13:37:36 -03:00
2e407e1962 monitoring: reschedule grafana user dedupe 2026-01-21 12:31:54 -03:00
5ae6b4b00c monitoring: harden grafana user dedupe 2026-01-21 12:30:08 -03:00
ae1fd5b661 monitoring: fix grafana user dedupe job 2026-01-21 12:25:53 -03:00
4e65f02fba monitoring: prepopulate vault for dedupe job 2026-01-21 12:18:57 -03:00
88de0f7cee monitoring: wire vault sa for dedupe job 2026-01-21 12:16:26 -03:00
08716c6be6 monitoring: use python dedupe job 2026-01-21 12:15:03 -03:00
a0caeb407c monitoring: dedupe grafana user via api 2026-01-21 12:11:28 -03:00
6eeb551239 monitoring: add grafana user dedupe job 2026-01-21 12:08:23 -03:00
98b063f2dd grafana: allow email-based oauth user lookup 2026-01-21 11:45:11 -03:00
698b2fd96b monitoring: refresh testing dashboard 2026-01-21 11:29:48 -03:00
flux-bot
a9f6b04baa chore(maintenance): automated image update 2026-01-21 14:04:54 +00:00
flux-bot
d8a3b5250e chore(bstein-dev-home): automated image update 2026-01-21 13:36:39 +00:00
flux-bot
4484fed039 chore(maintenance): automated image update 2026-01-21 13:35:55 +00:00
7cf5e7e39d flux: simplify image automation messages 2026-01-21 10:35:29 -03:00
4de4630911 flux: fix image automation templates 2026-01-21 10:34:25 -03:00
6ac3b41b30 flux: align image automation namespaces 2026-01-21 10:33:06 -03:00
810e4c0efb flux: align imagepolicy tag setters 2026-01-21 10:20:53 -03:00
5e4ed17942 maintenance: bump ariadne image tag 2026-01-21 05:03:26 -03:00
a41ac1548c maintenance: fix ariadne comms endpoints and exec RBAC 2026-01-21 04:05:41 -03:00
b87fe4899c maintenance: bump ariadne image tag 2026-01-21 03:53:34 -03:00
0efc1ed6c4 ariadne: split portal and ariadne db secrets 2026-01-21 03:39:17 -03:00
439d824300 vault: allow ariadne to read needed secrets 2026-01-21 03:21:01 -03:00
80a7ec26e2 rbac: allow ariadne to read cronjobs 2026-01-21 03:05:53 -03:00
0d4f14c397 keycloak: bump realm settings job name 2026-01-21 03:03:32 -03:00
fb6ddce0c7 glue: centralize sync tasks in ariadne 2026-01-21 02:57:40 -03:00
1fedb5ecbe maintenance: wire ariadne db and dashboards 2026-01-20 23:03:39 -03:00
0bb45bca83 jenkins: fix dark theme injection 2026-01-20 18:13:49 -03:00
c846d2c1ba ci: add root Jenkinsfile and update keycloak ldap job 2026-01-20 18:11:13 -03:00
163f98c594 jenkins: inline dark theme css 2026-01-20 18:00:36 -03:00
954d0d36b9 jenkins: mount init scripts into home 2026-01-20 17:54:47 -03:00
6db7521114 jenkins: add local dark theme css 2026-01-20 17:43:23 -03:00
13891e794a jenkins: rotate cache/plugin pvcs 2026-01-20 17:32:27 -03:00
1522b7a019 jenkins: keep cache/plugin pvc sizes to avoid shrink 2026-01-20 17:21:42 -03:00
5c40efdbcc jenkins: right-size pvc requests 2026-01-20 17:19:58 -03:00
9ac66919d5 jenkins: expand pvc sizes and move /tmp to memory 2026-01-20 17:09:23 -03:00
c80f26625d jenkins: move agent workspace off node disk 2026-01-20 17:04:24 -03:00
f5eec19e11 jenkins: automate notifyCommit token 2026-01-20 11:54:15 -03:00
b54da8e3e0 jenkins: fix scmTrigger spec field 2026-01-20 11:23:06 -03:00
9f6824ad56 jenkins: use scmTrigger for pipeline polls 2026-01-20 11:14:29 -03:00
0d3c5eb976 jenkins: use pollSCM for pipeline triggers 2026-01-20 11:07:54 -03:00
9cdf244d98 jenkins: drop legacy cleanup and update triggers 2026-01-20 10:59:51 -03:00
36ae49f1fc jenkins: clean legacy quality-gate job 2026-01-20 10:37:57 -03:00
b8d8240383 jenkins: fix webhook trigger DSL 2026-01-20 10:31:30 -03:00
fe30570b62 jenkins: pin oic-auth for core 2.528.3 2026-01-20 10:23:08 -03:00
8e9db51f9d jenkins: restore multibranch + webhook token 2026-01-20 10:15:33 -03:00
ea6e600007 jenkins: drop removed multibranch plugin 2026-01-20 09:45:33 -03:00
b8f2d00547 jenkins: pin root url for OIDC 2026-01-20 09:37:21 -03:00
132074f0ff gitea: allow jenkins webhook 2026-01-20 09:06:39 -03:00
56b36330b2 glue: preserve keycloak profile updates 2026-01-20 03:59:19 -03:00
557663f524 ci(jenkins): add Ariadne pipeline job 2026-01-20 03:30:48 -03:00
5fe8866623 ci(jenkins): add multibranch quality gate 2026-01-20 03:21:36 -03:00
e2e7e58f32 maintenance: extend Ariadne schedules and RBAC 2026-01-20 03:01:59 -03:00
95a7ac235f mailu: restart postfix after canonical map update 2026-01-20 02:38:04 -03:00
814d1ce211 mailu: keep podop socketmap in canonical maps 2026-01-20 02:37:02 -03:00
d996bda2c1 mailu: restart postfix to load canonical map 2026-01-20 02:32:43 -03:00
2bbbf019ff mailu: rewrite double-bounce to base domain 2026-01-20 02:30:44 -03:00
34fb371270 portal: rerun onboarding e2e job 2026-01-20 01:20:16 -03:00
14864a3b8c jenkins: align quality gate branch 2026-01-20 01:14:30 -03:00
cfcda87f67 jenkins: re-target quality gate and restart 2026-01-20 01:08:51 -03:00
cac8a3cdde mailu: recreate postfix on upgrade 2026-01-20 01:07:01 -03:00
3e0260b945 ci: pin quality gate agents to rpi5 2026-01-20 01:05:06 -03:00
a8be46b422 mailu: prefer postmark smtp token for relay 2026-01-20 01:04:04 -03:00
a86d68ca74 mailu: use postmark server token for relay 2026-01-20 00:58:04 -03:00
f527da9cdb chore(portal): rerun onboarding e2e 2026-01-20 00:09:49 -03:00
8be01698a9 chore(maintenance): bump ariadne image tag 2026-01-20 00:07:45 -03:00
278b4541a2 chore(portal): rerun onboarding e2e 2026-01-19 23:58:37 -03:00
7d999cc6c6 fix(mailu): pin sync workloads to arm64 2026-01-19 23:51:55 -03:00
cffe53edbe chore(portal): rerun onboarding e2e 2026-01-19 23:47:24 -03:00
1b2243e2a8 chore(maintenance): bump ariadne image tag 2026-01-19 23:45:48 -03:00
34c42cfb62 core: fix postmark DNS and time sync 2026-01-19 23:45:31 -03:00
84cd05b08a chore(portal): rerun onboarding e2e 2026-01-19 23:31:45 -03:00
9ff88f7f13 fix(mailu): allow forced sync 2026-01-19 23:28:07 -03:00
901f3e797c chore(portal): rerun onboarding e2e 2026-01-19 23:05:46 -03:00
4b0d8fb301 chore(maintenance): bump ariadne image tag 2026-01-19 23:04:59 -03:00
c1f0ea421d fix: extend mailu mailbox wait for ariadne 2026-01-19 22:49:23 -03:00
67e422f56f chore: rerun portal onboarding e2e 2026-01-19 22:42:14 -03:00
c7e81674b0 fix: point portal at ariadne service 2026-01-19 22:38:22 -03:00
cff3ed0759 chore: run portal onboarding e2e job 2026-01-19 22:35:29 -03:00
7171e5a9ea fix: unblock keycloak and refresh glue checks 2026-01-19 22:33:34 -03:00
776aea25f5 bstein-dev-home: bump images to 0.1.1-107 2026-01-19 22:11:38 -03:00
fbdf53a9a8 chore: add maintenance image automation 2026-01-19 22:03:50 -03:00
a0c3b9f953 feat: wire portal to ariadne 2026-01-19 19:22:53 -03:00
61619ddf77 fix: allow maintenance vault sync role 2026-01-19 19:07:00 -03:00
ff3ed195ac chore: centralize harbor pull credentials 2026-01-19 19:02:14 -03:00
bb41c219f6 feat: add Ariadne service and glue scheduling 2026-01-19 16:58:02 -03:00
791108723e flux: point atlas to feature/ariadne 2026-01-19 16:16:04 -03:00
c4ce7e3981 Merge pull request 'deploy' (#10) from deploy into main
Reviewed-on: #10
2026-01-19 19:03:59 +00:00
172 changed files with 17182 additions and 1445 deletions

.gitignore vendored (1 change)

@@ -6,4 +6,5 @@ __pycache__/
 *.py[cod]
 .pytest_cache
 .venv
+.venv-ci
 tmp/

Jenkinsfile vendored (new file, 77 lines)

@@ -0,0 +1,77 @@
// Mirror of ci/Jenkinsfile.titan-iac for multibranch discovery.
pipeline {
  agent {
    kubernetes {
      defaultContainer 'python'
      yaml """
apiVersion: v1
kind: Pod
spec:
  nodeSelector:
    hardware: rpi5
    kubernetes.io/arch: arm64
    node-role.kubernetes.io/worker: "true"
  containers:
    - name: python
      image: python:3.12-slim
      command:
        - cat
      tty: true
"""
    }
  }
  environment {
    PIP_DISABLE_PIP_VERSION_CHECK = '1'
    PYTHONUNBUFFERED = '1'
  }
  stages {
    stage('Checkout') {
      steps {
        checkout scm
      }
    }
    stage('Install deps') {
      steps {
        sh 'pip install --no-cache-dir -r ci/requirements.txt'
      }
    }
    stage('Glue tests') {
      steps {
        sh 'pytest -q ci/tests/glue'
      }
    }
    stage('Resolve Flux branch') {
      steps {
        script {
          env.FLUX_BRANCH = sh(
            returnStdout: true,
            script: "awk '/branch:/{print $2; exit}' clusters/atlas/flux-system/gotk-sync.yaml"
          ).trim()
          if (!env.FLUX_BRANCH) {
            error('Flux branch not found in gotk-sync.yaml')
          }
          echo "Flux branch: ${env.FLUX_BRANCH}"
        }
      }
    }
    stage('Promote') {
      when {
        expression {
          def branch = env.BRANCH_NAME ?: (env.GIT_BRANCH ?: '').replaceFirst('origin/', '')
          return env.FLUX_BRANCH && branch == env.FLUX_BRANCH
        }
      }
      steps {
        withCredentials([usernamePassword(credentialsId: 'gitea-pat', usernameVariable: 'GIT_USER', passwordVariable: 'GIT_TOKEN')]) {
          sh '''
            set +x
            git config user.email "jenkins@bstein.dev"
            git config user.name "jenkins"
            git remote set-url origin https://${GIT_USER}:${GIT_TOKEN}@scm.bstein.dev/bstein/titan-iac.git
            git push origin HEAD:${FLUX_BRANCH}
          '''
        }
      }
    }
  }
}
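
The 'Resolve Flux branch' stage above derives the promotion target by grepping clusters/atlas/flux-system/gotk-sync.yaml for the first branch: value, and the Promote stage only pushes when the built branch matches it. A minimal standalone sketch of the same lookup, assuming PyYAML and the multi-document layout Flux writes to gotk-sync.yaml (the helper name is hypothetical, not part of this repo):

# Sketch only: resolve the Flux-tracked branch the way the Jenkins
# "Resolve Flux branch" stage does, but via PyYAML instead of awk.
import yaml

def flux_branch(path: str = "clusters/atlas/flux-system/gotk-sync.yaml") -> str:
    with open(path, encoding="utf-8") as handle:
        for doc in yaml.safe_load_all(handle):
            if doc and doc.get("kind") == "GitRepository":
                return doc["spec"]["ref"]["branch"]
    raise RuntimeError("Flux branch not found in gotk-sync.yaml")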

@@ -6,6 +6,10 @@ pipeline {
 apiVersion: v1
 kind: Pod
 spec:
+  nodeSelector:
+    hardware: rpi5
+    kubernetes.io/arch: arm64
+    node-role.kubernetes.io/worker: "true"
   containers:
     - name: python
       image: python:3.12-slim
@@ -18,7 +22,6 @@
   environment {
     PIP_DISABLE_PIP_VERSION_CHECK = '1'
     PYTHONUNBUFFERED = '1'
-    DEPLOY_BRANCH = 'deploy'
   }
   stages {
     stage('Checkout') {
@@ -36,7 +39,27 @@
         sh 'pytest -q ci/tests/glue'
       }
     }
+    stage('Resolve Flux branch') {
+      steps {
+        script {
+          env.FLUX_BRANCH = sh(
+            returnStdout: true,
+            script: "awk '/branch:/{print $2; exit}' clusters/atlas/flux-system/gotk-sync.yaml"
+          ).trim()
+          if (!env.FLUX_BRANCH) {
+            error('Flux branch not found in gotk-sync.yaml')
+          }
+          echo "Flux branch: ${env.FLUX_BRANCH}"
+        }
+      }
+    }
     stage('Promote') {
+      when {
+        expression {
+          def branch = env.BRANCH_NAME ?: (env.GIT_BRANCH ?: '').replaceFirst('origin/', '')
+          return env.FLUX_BRANCH && branch == env.FLUX_BRANCH
+        }
+      }
       steps {
         withCredentials([usernamePassword(credentialsId: 'gitea-pat', usernameVariable: 'GIT_USER', passwordVariable: 'GIT_TOKEN')]) {
           sh '''
@@ -44,7 +67,7 @@
             git config user.email "jenkins@bstein.dev"
             git config user.name "jenkins"
             git remote set-url origin https://${GIT_USER}:${GIT_TOKEN}@scm.bstein.dev/bstein/titan-iac.git
-            git push origin HEAD:${DEPLOY_BRANCH}
+            git push origin HEAD:${FLUX_BRANCH}
           '''
         }
       }

@@ -1,7 +1,16 @@
 max_success_age_hours: 48
 allow_suspended:
+- bstein-dev-home/vaultwarden-cred-sync
 - comms/othrys-room-reset
 - comms/pin-othrys-invite
 - comms/seed-othrys-room
 - finance/firefly-user-sync
+- health/wger-admin-ensure
 - health/wger-user-sync
+- mailu-mailserver/mailu-sync-nightly
+- nextcloud/nextcloud-mail-sync
+ariadne_schedule_tasks:
+- schedule.mailu_sync
+- schedule.nextcloud_sync
+- schedule.vaultwarden_sync
+- schedule.wger_admin

@@ -1,11 +1,19 @@
 from __future__ import annotations

 import os
+from pathlib import Path

 import requests
+import yaml


 VM_URL = os.environ.get("VM_URL", "http://victoria-metrics-single-server:8428").rstrip("/")
+CONFIG_PATH = Path(__file__).with_name("config.yaml")
+
+
+def _load_config() -> dict:
+    with CONFIG_PATH.open("r", encoding="utf-8") as handle:
+        return yaml.safe_load(handle) or {}


 def _query(promql: str) -> list[dict]:
@@ -27,3 +35,14 @@ def test_glue_metrics_success_join():
     )
     series = _query(query)
     assert series, "No glue cronjob last success series found"
+
+
+def test_ariadne_schedule_metrics_present():
+    cfg = _load_config()
+    expected = cfg.get("ariadne_schedule_tasks", [])
+    if not expected:
+        return
+    series = _query("ariadne_schedule_next_run_timestamp_seconds")
+    tasks = {item.get("metric", {}).get("task") for item in series}
+    missing = [task for task in expected if task not in tasks]
+    assert not missing, f"Missing Ariadne schedule metrics for: {', '.join(missing)}"
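
For context on the new test: _query presumably hits VictoriaMetrics' Prometheus-compatible query API, so each returned series carries a metric label map, and the test simply collects the task label from every series before diffing against ariadne_schedule_tasks in config.yaml. A small illustration of that shape with invented sample data (the label values mirror config.yaml, the timestamps are made up):

# Assumed example of series returned for ariadne_schedule_next_run_timestamp_seconds.
series = [
    {"metric": {"task": "schedule.mailu_sync"}, "value": [1769600000, "1769603600"]},
    {"metric": {"task": "schedule.wger_admin"}, "value": [1769600000, "1769610000"]},
]
tasks = {item.get("metric", {}).get("task") for item in series}
assert "schedule.mailu_sync" in tasks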

@@ -0,0 +1,17 @@
# clusters/atlas/flux-system/applications/bstein-dev-home-migrations/kustomization.yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: bstein-dev-home-migrations
  namespace: flux-system
spec:
  interval: 10m
  path: ./services/bstein-dev-home/oneoffs/migrations
  prune: true
  force: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  targetNamespace: bstein-dev-home
  wait: false
  suspend: true

@@ -3,7 +3,7 @@ apiVersion: image.toolkit.fluxcd.io/v1
 kind: ImageUpdateAutomation
 metadata:
   name: bstein-dev-home
-  namespace: flux-system
+  namespace: bstein-dev-home
 spec:
   interval: 1m0s
   sourceRef:
@@ -13,14 +13,14 @@
   git:
     checkout:
       ref:
-        branch: feature/vault-consumption
+        branch: feature/ariadne
     commit:
       author:
         email: ops@bstein.dev
         name: flux-bot
-      messageTemplate: "chore(bstein-dev-home): update images to {{range .Updated.Images}}{{.}}{{end}}"
+      messageTemplate: "chore(bstein-dev-home): automated image update"
     push:
-      branch: feature/vault-consumption
+      branch: feature/ariadne
   update:
     strategy: Setters
     path: services/bstein-dev-home

@@ -13,11 +13,6 @@
     kind: GitRepository
     name: flux-system
     namespace: flux-system
-  healthChecks:
-    - apiVersion: helm.toolkit.fluxcd.io/v2
-      kind: HelmRelease
-      name: harbor
-      namespace: harbor
   wait: false
   dependsOn:
     - name: core

@@ -12,6 +12,7 @@ resources:
 - pegasus/image-automation.yaml
 - bstein-dev-home/kustomization.yaml
 - bstein-dev-home/image-automation.yaml
+- bstein-dev-home-migrations/kustomization.yaml
 - harbor/kustomization.yaml
 - harbor/image-automation.yaml
 - jellyfin/kustomization.yaml

@@ -3,7 +3,7 @@ apiVersion: image.toolkit.fluxcd.io/v1
 kind: ImageUpdateAutomation
 metadata:
   name: pegasus
-  namespace: flux-system
+  namespace: jellyfin
 spec:
   interval: 1m0s
   sourceRef:

@@ -9,7 +9,7 @@ metadata:
 spec:
   interval: 1m0s
   ref:
-    branch: deploy
+    branch: feature/ariadne
   secretRef:
     name: flux-system-gitea
   url: ssh://git@scm.bstein.dev:2242/bstein/titan-iac.git

@@ -11,6 +11,7 @@ resources:
 - monitoring/kustomization.yaml
 - logging/kustomization.yaml
 - maintenance/kustomization.yaml
+- maintenance/image-automation.yaml
 - longhorn-adopt/kustomization.yaml
 - longhorn/kustomization.yaml
 - longhorn-ui/kustomization.yaml

@@ -0,0 +1,26 @@
# clusters/atlas/flux-system/platform/maintenance/image-automation.yaml
apiVersion: image.toolkit.fluxcd.io/v1
kind: ImageUpdateAutomation
metadata:
  name: maintenance
  namespace: maintenance
spec:
  interval: 1m0s
  sourceRef:
    kind: GitRepository
    name: flux-system
    namespace: flux-system
  git:
    checkout:
      ref:
        branch: feature/ariadne
    commit:
      author:
        email: ops@bstein.dev
        name: flux-bot
      messageTemplate: "chore(maintenance): automated image update"
    push:
      branch: feature/ariadne
  update:
    strategy: Setters
    path: services/maintenance

@@ -8,6 +8,7 @@ spec:
   interval: 10m
   path: ./services/maintenance
   prune: true
+  force: true
   sourceRef:
     kind: GitRepository
     name: flux-system

@@ -32,6 +32,9 @@ data:
     192.168.22.9 notes.bstein.dev
     192.168.22.9 office.bstein.dev
     192.168.22.9 pegasus.bstein.dev
+    3.136.224.193 pm-bounces.bstein.dev
+    3.150.68.49 pm-bounces.bstein.dev
+    18.189.137.81 pm-bounces.bstein.dev
     192.168.22.9 registry.bstein.dev
     192.168.22.9 scm.bstein.dev
     192.168.22.9 secret.bstein.dev

@@ -6,5 +6,6 @@ resources:
 - ../modules/profiles/atlas-ha
 - coredns-custom.yaml
 - coredns-deployment.yaml
+- ntp-sync-daemonset.yaml
 - ../sources/cert-manager/letsencrypt.yaml
 - ../sources/cert-manager/letsencrypt-prod.yaml

@@ -0,0 +1,50 @@
# infrastructure/core/ntp-sync-daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ntp-sync
  namespace: kube-system
  labels:
    app: ntp-sync
spec:
  selector:
    matchLabels:
      app: ntp-sync
  template:
    metadata:
      labels:
        app: ntp-sync
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node-role.kubernetes.io/control-plane
                    operator: DoesNotExist
                  - key: node-role.kubernetes.io/master
                    operator: DoesNotExist
      containers:
        - name: ntp-sync
          image: public.ecr.aws/docker/library/busybox:1.36.1
          imagePullPolicy: IfNotPresent
          command: ["/bin/sh", "-c"]
          args:
            - |
              set -eu
              while true; do
                ntpd -q -p pool.ntp.org || true
                sleep 300
              done
          securityContext:
            capabilities:
              add: ["SYS_TIME"]
            runAsUser: 0
            runAsGroup: 0
          resources:
            requests:
              cpu: 10m
              memory: 16Mi
            limits:
              cpu: 50m
              memory: 64Mi

@@ -11,7 +11,7 @@ spec:
     roleName: "longhorn"
     objects: |
       - objectName: "harbor-pull__dockerconfigjson"
-        secretPath: "kv/data/atlas/harbor-pull/longhorn"
+        secretPath: "kv/data/atlas/shared/harbor-pull"
         secretKey: "dockerconfigjson"
   secretObjects:
     - secretName: longhorn-registry

@@ -4,6 +4,10 @@ kind: Service
 metadata:
   name: postgres-service
   namespace: postgres
+  annotations:
+    prometheus.io/scrape: "true"
+    prometheus.io/port: "9187"
+    prometheus.io/path: "/metrics"
 spec:
   clusterIP: None
   ports:
@@ -11,5 +15,9 @@
       port: 5432
       protocol: TCP
       targetPort: 5432
+    - name: metrics
+      port: 9187
+      protocol: TCP
+      targetPort: 9187
   selector:
     app: postgres

@@ -58,6 +58,23 @@ spec:
             - name: vault-secrets
               mountPath: /mnt/vault
               readOnly: true
+        - name: postgres-exporter
+          image: quay.io/prometheuscommunity/postgres-exporter:v0.15.0
+          ports:
+            - name: metrics
+              containerPort: 9187
+              protocol: TCP
+          env:
+            - name: DATA_SOURCE_URI
+              value: "localhost:5432/postgres?sslmode=disable"
+            - name: DATA_SOURCE_USER
+              value: postgres
+            - name: DATA_SOURCE_PASS_FILE
+              value: /mnt/vault/postgres_password
+          volumeMounts:
+            - name: vault-secrets
+              mountPath: /mnt/vault
+              readOnly: true
       volumes:
         - name: vault-secrets
           csi:
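
The exporter sidecar above exposes standard postgres_exporter metrics on port 9187, and the matching Service annotations let annotation-based scrapers pick it up. A rough way to confirm series are arriving, reusing the in-cluster VictoriaMetrics URL the glue tests default to (pg_up is postgres_exporter's standard liveness gauge; treat the snippet as an assumed verification sketch, not part of the repo):

# Sketch: check that postgres_exporter series are being scraped.
import requests

VM_URL = "http://victoria-metrics-single-server:8428"
resp = requests.get(f"{VM_URL}/api/v1/query", params={"query": "pg_up"}, timeout=30)
resp.raise_for_status()
result = resp.json()["data"]["result"]
print("pg_up series found:" if result else "no pg_up series yet", len(result))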

@@ -5,7 +5,7 @@ metadata:
   name: letsencrypt-prod
 spec:
   acme:
-    email: brad.stein@gmail.com
+    email: brad@bstein.dev
     server: https://acme-v02.api.letsencrypt.org/directory
     privateKeySecretRef:
       name: letsencrypt-prod-account-key

@@ -5,7 +5,7 @@ metadata:
   name: letsencrypt
 spec:
   acme:
-    email: brad.stein@gmail.com
+    email: brad@bstein.dev
     server: https://acme-v02.api.letsencrypt.org/directory
     privateKeySecretRef:
       name: letsencrypt-account-key

@@ -17,4 +17,5 @@ spec:
   values:
     syncSecret:
       enabled: true
-    enableSecretRotation: false
+    enableSecretRotation: true
+    rotationPollInterval: 2m

@@ -1,8 +1,8 @@
 {
   "counts": {
-    "helmrelease_host_hints": 17,
-    "http_endpoints": 37,
-    "services": 43,
-    "workloads": 54
+    "helmrelease_host_hints": 19,
+    "http_endpoints": 45,
+    "services": 47,
+    "workloads": 74
   }
 }

File diff suppressed because it is too large.

@@ -8,6 +8,15 @@ sources:
   - name: bstein-dev-home
     path: services/bstein-dev-home
     targetNamespace: bstein-dev-home
+  - name: bstein-dev-home-migrations
+    path: services/bstein-dev-home/migrations
+    targetNamespace: bstein-dev-home
+  - name: cert-manager
+    path: infrastructure/cert-manager
+    targetNamespace: cert-manager
+  - name: cert-manager-cleanup
+    path: infrastructure/cert-manager/cleanup
+    targetNamespace: cert-manager
   - name: comms
     path: services/comms
     targetNamespace: comms
@@ -17,6 +26,9 @@
   - name: crypto
     path: services/crypto
     targetNamespace: crypto
+  - name: finance
+    path: services/finance
+    targetNamespace: finance
   - name: flux-system
     path: clusters/atlas/flux-system
     targetNamespace: null
@@ -29,6 +41,9 @@
   - name: harbor
     path: services/harbor
     targetNamespace: harbor
+  - name: health
+    path: services/health
+    targetNamespace: health
   - name: helm
     path: infrastructure/sources/helm
     targetNamespace: flux-system
@@ -44,6 +59,12 @@
   - name: logging
     path: services/logging
     targetNamespace: null
+  - name: longhorn
+    path: infrastructure/longhorn/core
+    targetNamespace: longhorn-system
+  - name: longhorn-adopt
+    path: infrastructure/longhorn/adopt
+    targetNamespace: longhorn-system
   - name: longhorn-ui
     path: infrastructure/longhorn/ui-ingress
     targetNamespace: longhorn-system
@@ -98,9 +119,15 @@
   - name: vault-csi
     path: infrastructure/vault-csi
     targetNamespace: kube-system
+  - name: vault-injector
+    path: infrastructure/vault-injector
+    targetNamespace: vault
   - name: vaultwarden
     path: services/vaultwarden
     targetNamespace: vaultwarden
+  - name: wallet-monero-temp
+    path: services/crypto/wallet-monero-temp
+    targetNamespace: crypto
   - name: xmr-miner
     path: services/crypto/xmr-miner
     targetNamespace: crypto
@@ -124,7 +151,7 @@ workloads:
       kubernetes.io/arch: arm64
       node-role.kubernetes.io/worker: 'true'
     images:
-      - registry.bstein.dev/bstein/bstein-dev-home-backend:0.1.1-92
+      - registry.bstein.dev/bstein/bstein-dev-home-backend:0.1.1-157
   - kind: Deployment
     namespace: bstein-dev-home
     name: bstein-dev-home-frontend
@@ -135,13 +162,22 @@
       kubernetes.io/arch: arm64
       node-role.kubernetes.io/worker: 'true'
     images:
-      - registry.bstein.dev/bstein/bstein-dev-home-frontend:0.1.1-92
+      - registry.bstein.dev/bstein/bstein-dev-home-frontend:0.1.1-157
+  - kind: Deployment
+    namespace: bstein-dev-home
+    name: bstein-dev-home-vault-sync
+    labels:
+      app: bstein-dev-home-vault-sync
+    serviceAccountName: bstein-dev-home-vault-sync
+    nodeSelector: {}
+    images:
+      - alpine:3.20
   - kind: Deployment
     namespace: bstein-dev-home
     name: chat-ai-gateway
     labels:
       app: chat-ai-gateway
-    serviceAccountName: null
+    serviceAccountName: bstein-dev-home
     nodeSelector:
       kubernetes.io/arch: arm64
       node-role.kubernetes.io/worker: 'true'
@@ -157,12 +193,21 @@
       hardware: rpi5
     images:
       - python:3.11-slim
+  - kind: Deployment
+    namespace: comms
+    name: comms-vault-sync
+    labels:
+      app: comms-vault-sync
+    serviceAccountName: comms-vault
+    nodeSelector: {}
+    images:
+      - alpine:3.20
   - kind: Deployment
     namespace: comms
     name: coturn
     labels:
       app: coturn
-    serviceAccountName: null
+    serviceAccountName: comms-vault
     nodeSelector:
       hardware: rpi5
     images:
@@ -182,7 +227,7 @@
     name: livekit
     labels:
       app: livekit
-    serviceAccountName: null
+    serviceAccountName: comms-vault
     nodeSelector:
       hardware: rpi5
     images:
@@ -192,17 +237,17 @@
     name: livekit-token-service
     labels:
      app: livekit-token-service
-    serviceAccountName: null
+    serviceAccountName: comms-vault
     nodeSelector:
       hardware: rpi5
     images:
-      - ghcr.io/element-hq/lk-jwt-service:0.3.0
+      - registry.bstein.dev/tools/lk-jwt-service-vault:0.3.0
   - kind: Deployment
     namespace: comms
     name: matrix-authentication-service
     labels:
       app: matrix-authentication-service
-    serviceAccountName: null
+    serviceAccountName: comms-vault
     nodeSelector:
       hardware: rpi5
     images:
@@ -212,7 +257,7 @@
     name: matrix-guest-register
     labels:
       app.kubernetes.io/name: matrix-guest-register
-    serviceAccountName: null
+    serviceAccountName: comms-vault
     nodeSelector: {}
     images:
       - python:3.11-slim
@@ -235,12 +280,21 @@
       node-role.kubernetes.io/worker: 'true'
     images:
       - ghcr.io/tari-project/xmrig@sha256:80defbfd0b640d604c91cb5101d3642db7928e1e68ee3c6b011289b3565a39d9
+  - kind: Deployment
+    namespace: crypto
+    name: crypto-vault-sync
+    labels:
+      app: crypto-vault-sync
+    serviceAccountName: crypto-vault-sync
+    nodeSelector: {}
+    images:
+      - alpine:3.20
   - kind: Deployment
     namespace: crypto
     name: monero-p2pool
     labels:
       app: monero-p2pool
-    serviceAccountName: null
+    serviceAccountName: crypto-vault-sync
     nodeSelector:
       node-role.kubernetes.io/worker: 'true'
     images:
@@ -255,6 +309,38 @@
       node-role.kubernetes.io/worker: 'true'
     images:
       - registry.bstein.dev/crypto/monerod:0.18.4.1
+  - kind: Deployment
+    namespace: crypto
+    name: wallet-monero-temp
+    labels:
+      app: wallet-monero-temp
+    serviceAccountName: crypto-vault-sync
+    nodeSelector:
+      node-role.kubernetes.io/worker: 'true'
+    images:
+      - registry.bstein.dev/crypto/monero-wallet-rpc:0.18.4.1
+  - kind: Deployment
+    namespace: finance
+    name: actual-budget
+    labels:
+      app: actual-budget
+    serviceAccountName: finance-vault
+    nodeSelector:
+      kubernetes.io/arch: arm64
+      node-role.kubernetes.io/worker: 'true'
+    images:
+      - actualbudget/actual-server:26.1.0-alpine@sha256:34aae5813fdfee12af2a50c4d0667df68029f1d61b90f45f282473273eb70d0d
+  - kind: Deployment
+    namespace: finance
+    name: firefly
+    labels:
+      app: firefly
+    serviceAccountName: finance-vault
+    nodeSelector:
+      kubernetes.io/arch: arm64
+      node-role.kubernetes.io/worker: 'true'
+    images:
+      - fireflyiii/core:version-6.4.15
   - kind: Deployment
     namespace: flux-system
     name: helm-controller
@@ -344,17 +430,38 @@
     name: gitea
     labels:
       app: gitea
-    serviceAccountName: null
+    serviceAccountName: gitea-vault
     nodeSelector:
       node-role.kubernetes.io/worker: 'true'
     images:
       - gitea/gitea:1.23
+  - kind: Deployment
+    namespace: harbor
+    name: harbor-vault-sync
+    labels:
+      app: harbor-vault-sync
+    serviceAccountName: harbor-vault-sync
+    nodeSelector: {}
+    images:
+      - alpine:3.20
+  - kind: Deployment
+    namespace: health
+    name: wger
+    labels:
+      app: wger
+    serviceAccountName: health-vault-sync
+    nodeSelector:
+      kubernetes.io/arch: arm64
+      node-role.kubernetes.io/worker: 'true'
+    images:
+      - nginx:1.27.5-alpine@sha256:65645c7bb6a0661892a8b03b89d0743208a18dd2f3f17a54ef4b76fb8e2f2a10
+      - wger/server@sha256:710588b78af4e0aa0b4d8a8061e4563e16eae80eeaccfe7f9e0d9cbdd7f0cbc5
   - kind: Deployment
     namespace: jellyfin
     name: jellyfin
     labels:
       app: jellyfin
-    serviceAccountName: null
+    serviceAccountName: pegasus-vault-sync
     nodeSelector: {}
     images:
       - docker.io/jellyfin/jellyfin:10.11.5
@@ -363,13 +470,22 @@
     name: pegasus
     labels:
       app: pegasus
-    serviceAccountName: null
+    serviceAccountName: pegasus-vault-sync
     nodeSelector:
       kubernetes.io/arch: arm64
       node-role.kubernetes.io/worker: 'true'
     images:
       - alpine:3.20
-      - registry.bstein.dev/streaming/pegasus:1.2.32
+      - registry.bstein.dev/streaming/pegasus-vault:1.2.32
+  - kind: Deployment
+    namespace: jellyfin
+    name: pegasus-vault-sync
+    labels:
+      app: pegasus-vault-sync
+    serviceAccountName: pegasus-vault-sync
+    nodeSelector: {}
+    images:
+      - alpine:3.20
   - kind: Deployment
     namespace: jenkins
     name: jenkins
@@ -381,6 +497,26 @@
       node-role.kubernetes.io/worker: 'true'
     images:
       - jenkins/jenkins:2.528.3-jdk21
+  - kind: Deployment
+    namespace: jenkins
+    name: jenkins-vault-sync
+    labels:
+      app: jenkins-vault-sync
+    serviceAccountName: jenkins-vault-sync
+    nodeSelector:
+      kubernetes.io/arch: arm64
+      node-role.kubernetes.io/worker: 'true'
+    images:
+      - alpine:3.20
+  - kind: DaemonSet
+    namespace: kube-system
+    name: ntp-sync
+    labels:
+      app: ntp-sync
+    serviceAccountName: null
+    nodeSelector: {}
+    images:
+      - public.ecr.aws/docker/library/busybox:1.36.1
   - kind: DaemonSet
     namespace: kube-system
     name: nvidia-device-plugin-jetson
@@ -427,6 +563,16 @@
       kubernetes.io/os: linux
     images:
       - hashicorp/vault-csi-provider:1.7.0
+  - kind: Deployment
+    namespace: kube-system
+    name: coredns
+    labels:
+      k8s-app: kube-dns
+    serviceAccountName: coredns
+    nodeSelector:
+      kubernetes.io/os: linux
+    images:
+      - registry.bstein.dev/infra/coredns:1.12.1
   - kind: DaemonSet
     namespace: logging
     name: node-image-gc-rpi4
@@ -457,22 +603,41 @@
       hardware: rpi5
     images:
       - bitnami/kubectl@sha256:554ab88b1858e8424c55de37ad417b16f2a0e65d1607aa0f3fe3ce9b9f10b131
+  - kind: Deployment
+    namespace: logging
+    name: logging-vault-sync
+    labels:
+      app: logging-vault-sync
+    serviceAccountName: logging-vault-sync
nodeSelector: {}
images:
- alpine:3.20
- kind: Deployment - kind: Deployment
namespace: logging namespace: logging
name: oauth2-proxy-logs name: oauth2-proxy-logs
labels: labels:
app: oauth2-proxy-logs app: oauth2-proxy-logs
serviceAccountName: null serviceAccountName: logging-vault-sync
nodeSelector: nodeSelector:
node-role.kubernetes.io/worker: 'true' node-role.kubernetes.io/worker: 'true'
images: images:
- quay.io/oauth2-proxy/oauth2-proxy:v7.6.0 - registry.bstein.dev/tools/oauth2-proxy-vault:v7.6.0
- kind: Deployment
namespace: longhorn-system
name: longhorn-vault-sync
labels:
app: longhorn-vault-sync
serviceAccountName: longhorn-vault-sync
nodeSelector:
node-role.kubernetes.io/worker: 'true'
images:
- alpine:3.20
- kind: Deployment - kind: Deployment
namespace: longhorn-system namespace: longhorn-system
name: oauth2-proxy-longhorn name: oauth2-proxy-longhorn
labels: labels:
app: oauth2-proxy-longhorn app: oauth2-proxy-longhorn
serviceAccountName: null serviceAccountName: longhorn-vault
nodeSelector: nodeSelector:
node-role.kubernetes.io/worker: 'true' node-role.kubernetes.io/worker: 'true'
images: images:
@ -489,13 +654,34 @@ workloads:
- registry.bstein.dev/bstein/kubectl:1.35.0 - registry.bstein.dev/bstein/kubectl:1.35.0
- kind: Deployment - kind: Deployment
namespace: mailu-mailserver namespace: mailu-mailserver
name: mailu-sync-listener name: mailu-vault-sync
labels: labels:
app: mailu-sync-listener app: mailu-vault-sync
serviceAccountName: null serviceAccountName: mailu-vault-sync
nodeSelector: {} nodeSelector: {}
images: images:
- python:3.11-alpine - alpine:3.20
- kind: DaemonSet
namespace: maintenance
name: disable-k3s-traefik
labels:
app: disable-k3s-traefik
serviceAccountName: disable-k3s-traefik
nodeSelector:
node-role.kubernetes.io/control-plane: 'true'
images:
- bitnami/kubectl@sha256:554ab88b1858e8424c55de37ad417b16f2a0e65d1607aa0f3fe3ce9b9f10b131
- kind: DaemonSet
namespace: maintenance
name: k3s-agent-restart
labels:
app: k3s-agent-restart
serviceAccountName: node-nofile
nodeSelector:
kubernetes.io/arch: arm64
node-role.kubernetes.io/worker: 'true'
images:
- bitnami/kubectl@sha256:554ab88b1858e8424c55de37ad417b16f2a0e65d1607aa0f3fe3ce9b9f10b131
- kind: DaemonSet - kind: DaemonSet
namespace: maintenance namespace: maintenance
name: node-image-sweeper name: node-image-sweeper
@ -515,6 +701,26 @@ workloads:
nodeSelector: {} nodeSelector: {}
images: images:
- bitnami/kubectl@sha256:554ab88b1858e8424c55de37ad417b16f2a0e65d1607aa0f3fe3ce9b9f10b131 - bitnami/kubectl@sha256:554ab88b1858e8424c55de37ad417b16f2a0e65d1607aa0f3fe3ce9b9f10b131
- kind: Deployment
namespace: maintenance
name: ariadne
labels:
app: ariadne
serviceAccountName: ariadne
nodeSelector:
kubernetes.io/arch: arm64
node-role.kubernetes.io/worker: 'true'
images:
- registry.bstein.dev/bstein/ariadne:0.1.0-49
- kind: Deployment
namespace: maintenance
name: maintenance-vault-sync
labels:
app: maintenance-vault-sync
serviceAccountName: maintenance-vault-sync
nodeSelector: {}
images:
- alpine:3.20
- kind: DaemonSet - kind: DaemonSet
namespace: monitoring namespace: monitoring
name: dcgm-exporter name: dcgm-exporter
@ -534,12 +740,21 @@ workloads:
jetson: 'true' jetson: 'true'
images: images:
- python:3.10-slim - python:3.10-slim
- kind: Deployment
namespace: monitoring
name: monitoring-vault-sync
labels:
app: monitoring-vault-sync
serviceAccountName: monitoring-vault-sync
nodeSelector: {}
images:
- alpine:3.20
- kind: Deployment - kind: Deployment
namespace: monitoring namespace: monitoring
name: postmark-exporter name: postmark-exporter
labels: labels:
app: postmark-exporter app: postmark-exporter
serviceAccountName: null serviceAccountName: monitoring-vault-sync
nodeSelector: {} nodeSelector: {}
images: images:
- python:3.12-alpine - python:3.12-alpine
@ -558,7 +773,7 @@ workloads:
name: nextcloud name: nextcloud
labels: labels:
app: nextcloud app: nextcloud
serviceAccountName: null serviceAccountName: nextcloud-vault
nodeSelector: nodeSelector:
hardware: rpi5 hardware: rpi5
images: images:
@ -568,7 +783,7 @@ workloads:
name: outline name: outline
labels: labels:
app: outline app: outline
serviceAccountName: null serviceAccountName: outline-vault
nodeSelector: nodeSelector:
node-role.kubernetes.io/worker: 'true' node-role.kubernetes.io/worker: 'true'
images: images:
@ -588,7 +803,7 @@ workloads:
name: planka name: planka
labels: labels:
app: planka app: planka
serviceAccountName: null serviceAccountName: planka-vault
nodeSelector: nodeSelector:
node-role.kubernetes.io/worker: 'true' node-role.kubernetes.io/worker: 'true'
images: images:
@ -603,13 +818,16 @@ workloads:
node-role.kubernetes.io/worker: 'true' node-role.kubernetes.io/worker: 'true'
images: images:
- postgres:15 - postgres:15
- quay.io/prometheuscommunity/postgres-exporter:v0.15.0
- kind: Deployment - kind: Deployment
namespace: sso namespace: sso
name: keycloak name: keycloak
labels: labels:
app: keycloak app: keycloak
serviceAccountName: null serviceAccountName: sso-vault
nodeSelector: {} nodeSelector:
kubernetes.io/arch: arm64
node-role.kubernetes.io/worker: 'true'
images: images:
- quay.io/keycloak/keycloak:26.0.7 - quay.io/keycloak/keycloak:26.0.7
- kind: Deployment - kind: Deployment
@ -617,17 +835,26 @@ workloads:
name: oauth2-proxy name: oauth2-proxy
labels: labels:
app: oauth2-proxy app: oauth2-proxy
serviceAccountName: null serviceAccountName: sso-vault
nodeSelector: nodeSelector:
node-role.kubernetes.io/worker: 'true' node-role.kubernetes.io/worker: 'true'
images: images:
- quay.io/oauth2-proxy/oauth2-proxy:v7.6.0 - registry.bstein.dev/tools/oauth2-proxy-vault:v7.6.0
- kind: Deployment
namespace: sso
name: sso-vault-sync
labels:
app: sso-vault-sync
serviceAccountName: sso-vault-sync
nodeSelector: {}
images:
- alpine:3.20
- kind: StatefulSet - kind: StatefulSet
namespace: sso namespace: sso
name: openldap name: openldap
labels: labels:
app: openldap app: openldap
serviceAccountName: null serviceAccountName: sso-vault
nodeSelector: nodeSelector:
kubernetes.io/arch: arm64 kubernetes.io/arch: arm64
node-role.kubernetes.io/worker: 'true' node-role.kubernetes.io/worker: 'true'
@ -640,7 +867,7 @@ workloads:
app: sui-metrics app: sui-metrics
serviceAccountName: sui-metrics serviceAccountName: sui-metrics
nodeSelector: nodeSelector:
kubernetes.io/hostname: titan-24 hardware: rpi5
images: images:
- victoriametrics/vmagent:v1.103.0 - victoriametrics/vmagent:v1.103.0
- kind: Deployment - kind: Deployment
@ -648,6 +875,8 @@ workloads:
name: traefik name: traefik
labels: labels:
app: traefik app: traefik
app.kubernetes.io/instance: traefik-kube-system
app.kubernetes.io/name: traefik
serviceAccountName: traefik-ingress-controller serviceAccountName: traefik-ingress-controller
nodeSelector: nodeSelector:
node-role.kubernetes.io/worker: 'true' node-role.kubernetes.io/worker: 'true'
@ -669,10 +898,12 @@ workloads:
name: vaultwarden name: vaultwarden
labels: labels:
app: vaultwarden app: vaultwarden
serviceAccountName: null serviceAccountName: vaultwarden-vault
nodeSelector: {} nodeSelector:
kubernetes.io/arch: arm64
node-role.kubernetes.io/worker: 'true'
images: images:
- vaultwarden/server:1.33.2 - vaultwarden/server:1.35.2
services: services:
- namespace: ai - namespace: ai
name: ollama name: ollama
@ -1040,6 +1271,36 @@ services:
port: 3333 port: 3333
targetPort: 3333 targetPort: 3333
protocol: TCP protocol: TCP
- namespace: crypto
name: wallet-monero-temp
type: ClusterIP
selector:
app: wallet-monero-temp
ports:
- name: rpc
port: 18083
targetPort: 18083
protocol: TCP
- namespace: finance
name: actual-budget
type: ClusterIP
selector:
app: actual-budget
ports:
- name: http
port: 80
targetPort: 5006
protocol: TCP
- namespace: finance
name: firefly
type: ClusterIP
selector:
app: firefly
ports:
- name: http
port: 80
targetPort: 8080
protocol: TCP
- namespace: flux-system - namespace: flux-system
name: notification-controller name: notification-controller
type: ClusterIP type: ClusterIP
@ -1082,7 +1343,7 @@ services:
protocol: TCP protocol: TCP
- namespace: gitea - namespace: gitea
name: gitea-ssh name: gitea-ssh
type: NodePort type: LoadBalancer
selector: selector:
app: gitea app: gitea
ports: ports:
@ -1090,6 +1351,16 @@ services:
port: 2242 port: 2242
targetPort: 2242 targetPort: 2242
protocol: TCP protocol: TCP
- namespace: health
name: wger
type: ClusterIP
selector:
app: wger
ports:
- name: http
port: 80
targetPort: http
protocol: TCP
- namespace: jellyfin - namespace: jellyfin
name: jellyfin name: jellyfin
type: ClusterIP type: ClusterIP
@ -1124,21 +1395,6 @@ services:
port: 50000 port: 50000
targetPort: 50000 targetPort: 50000
protocol: TCP protocol: TCP
- namespace: kube-system
name: traefik
type: LoadBalancer
selector:
app.kubernetes.io/instance: traefik-kube-system
app.kubernetes.io/name: traefik
ports:
- name: web
port: 80
targetPort: web
protocol: TCP
- name: websecure
port: 443
targetPort: websecure
protocol: TCP
- namespace: logging - namespace: logging
name: oauth2-proxy-logs name: oauth2-proxy-logs
type: ClusterIP type: ClusterIP
@ -1191,15 +1447,15 @@ services:
port: 4190 port: 4190
targetPort: 4190 targetPort: 4190
protocol: TCP protocol: TCP
- namespace: mailu-mailserver - namespace: maintenance
name: mailu-sync-listener name: ariadne
type: ClusterIP type: ClusterIP
selector: selector:
app: mailu-sync-listener app: ariadne
ports: ports:
- name: http - name: http
port: 8080 port: 80
targetPort: 8080 targetPort: http
protocol: TCP protocol: TCP
- namespace: monitoring - namespace: monitoring
name: dcgm-exporter name: dcgm-exporter
@ -1291,6 +1547,10 @@ services:
port: 5432 port: 5432
targetPort: 5432 targetPort: 5432
protocol: TCP protocol: TCP
- name: metrics
port: 9187
targetPort: 9187
protocol: TCP
- namespace: sso - namespace: sso
name: keycloak name: keycloak
type: ClusterIP type: ClusterIP
@ -1335,6 +1595,20 @@ services:
port: 8429 port: 8429
targetPort: 8429 targetPort: 8429
protocol: TCP protocol: TCP
- namespace: traefik
name: traefik
type: LoadBalancer
selector:
app: traefik
ports:
- name: web
port: 80
targetPort: web
protocol: TCP
- name: websecure
port: 443
targetPort: websecure
protocol: TCP
- namespace: traefik - namespace: traefik
name: traefik-metrics name: traefik-metrics
type: ClusterIP type: ClusterIP
@ -1447,6 +1721,19 @@ http_endpoints:
kind: Ingress kind: Ingress
name: bstein-dev-home name: bstein-dev-home
source: bstein-dev-home source: bstein-dev-home
- host: budget.bstein.dev
path: /
backend:
namespace: finance
service: actual-budget
port: 80
workloads:
- kind: Deployment
name: actual-budget
via:
kind: Ingress
name: actual-budget
source: finance
- host: call.live.bstein.dev - host: call.live.bstein.dev
path: / path: /
backend: backend:
@ -1499,6 +1786,19 @@ http_endpoints:
kind: Ingress kind: Ingress
name: nextcloud name: nextcloud
source: nextcloud source: nextcloud
- host: health.bstein.dev
path: /
backend:
namespace: health
service: wger
port: 80
workloads:
- kind: Deployment
name: wger
via:
kind: Ingress
name: wger
source: health
- host: kit.live.bstein.dev - host: kit.live.bstein.dev
path: /livekit/jwt path: /livekit/jwt
backend: backend:
@ -1558,6 +1858,65 @@ http_endpoints:
kind: Ingress kind: Ingress
name: matrix-routing name: matrix-routing
source: comms source: comms
- host: live.bstein.dev
path: /_matrix/client/r0/register
backend:
namespace: comms
service: matrix-guest-register
port: 8080
workloads: &id003
- kind: Deployment
name: matrix-guest-register
via:
kind: Ingress
name: matrix-routing
source: comms
- host: live.bstein.dev
path: /_matrix/client/v3/login
backend:
namespace: comms
service: matrix-authentication-service
port: 8080
workloads: &id002
- kind: Deployment
name: matrix-authentication-service
via:
kind: Ingress
name: matrix-routing
source: comms
- host: live.bstein.dev
path: /_matrix/client/v3/logout
backend:
namespace: comms
service: matrix-authentication-service
port: 8080
workloads: *id002
via:
kind: Ingress
name: matrix-routing
source: comms
- host: live.bstein.dev
path: /_matrix/client/v3/refresh
backend:
namespace: comms
service: matrix-authentication-service
port: 8080
workloads: *id002
via:
kind: Ingress
name: matrix-routing
source: comms
- host: live.bstein.dev
path: /_matrix/client/v3/register
backend:
namespace: comms
service: matrix-guest-register
port: 8080
workloads: *id003
via:
kind: Ingress
name: matrix-routing
source: comms
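Note: the &id002/&id003 markers introduced above are YAML anchors, and the *id002/*id003 references are aliases, so every live.bstein.dev route that shares a backend reuses a single workloads list instead of repeating it. A minimal sketch of how a parser resolves these, assuming PyYAML as used elsewhere in the repo's tooling (route and workload names below are illustrative, not copied from the catalog):

import yaml

doc = """
routes:
  - path: /_matrix/client/v3/login
    workloads: &mas
      - kind: Deployment
        name: matrix-authentication-service
  - path: /_matrix/client/v3/logout
    workloads: *mas
"""
routes = yaml.safe_load(doc)["routes"]
# Both routes resolve to the same parsed workloads list.
print(routes[1]["workloads"][0]["name"])  # matrix-authentication-service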
- host: logs.bstein.dev - host: logs.bstein.dev
path: / path: /
backend: backend:
@ -1601,9 +1960,7 @@ http_endpoints:
namespace: comms namespace: comms
service: matrix-authentication-service service: matrix-authentication-service
port: 8080 port: 8080
workloads: &id002 workloads: *id002
- kind: Deployment
name: matrix-authentication-service
via: via:
kind: Ingress kind: Ingress
name: matrix-routing name: matrix-routing
@ -1647,9 +2004,7 @@ http_endpoints:
namespace: comms namespace: comms
service: matrix-guest-register service: matrix-guest-register
port: 8080 port: 8080
workloads: &id003 workloads: *id003
- kind: Deployment
name: matrix-guest-register
via: via:
kind: Ingress kind: Ingress
name: matrix-routing name: matrix-routing
@ -1722,6 +2077,19 @@ http_endpoints:
kind: Ingress kind: Ingress
name: monerod name: monerod
source: monerod source: monerod
- host: money.bstein.dev
path: /
backend:
namespace: finance
service: firefly
port: 80
workloads:
- kind: Deployment
name: firefly
via:
kind: Ingress
name: firefly
source: finance
- host: notes.bstein.dev - host: notes.bstein.dev
path: / path: /
backend: backend:
@ -1845,7 +2213,6 @@ helmrelease_host_hints:
- live.bstein.dev - live.bstein.dev
- matrix.live.bstein.dev - matrix.live.bstein.dev
comms:comms/othrys-synapse: comms:comms/othrys-synapse:
- bstein.dev
- kit.live.bstein.dev - kit.live.bstein.dev
- live.bstein.dev - live.bstein.dev
- matrix.live.bstein.dev - matrix.live.bstein.dev
@ -1856,6 +2223,8 @@ helmrelease_host_hints:
- registry.bstein.dev - registry.bstein.dev
logging:logging/data-prepper: logging:logging/data-prepper:
- registry.bstein.dev - registry.bstein.dev
longhorn:longhorn-system/longhorn:
- registry.bstein.dev
mailu:mailu-mailserver/mailu: mailu:mailu-mailserver/mailu:
- bstein.dev - bstein.dev
- mail.bstein.dev - mail.bstein.dev
@ -1863,5 +2232,8 @@ helmrelease_host_hints:
- alerts.bstein.dev - alerts.bstein.dev
monitoring:monitoring/grafana: monitoring:monitoring/grafana:
- bstein.dev - bstein.dev
- mail.bstein.dev
- metrics.bstein.dev - metrics.bstein.dev
- sso.bstein.dev - sso.bstein.dev
monitoring:monitoring/kube-state-metrics:
- atlas.bstein.dev

File diff suppressed because it is too large

File diff suppressed because one or more lines are too long

View File

@ -17,6 +17,11 @@ flowchart LR
host_bstein_dev --> svc_bstein_dev_home_bstein_dev_home_backend host_bstein_dev --> svc_bstein_dev_home_bstein_dev_home_backend
wl_bstein_dev_home_bstein_dev_home_backend["bstein-dev-home/bstein-dev-home-backend (Deployment)"] wl_bstein_dev_home_bstein_dev_home_backend["bstein-dev-home/bstein-dev-home-backend (Deployment)"]
svc_bstein_dev_home_bstein_dev_home_backend --> wl_bstein_dev_home_bstein_dev_home_backend svc_bstein_dev_home_bstein_dev_home_backend --> wl_bstein_dev_home_bstein_dev_home_backend
host_budget_bstein_dev["budget.bstein.dev"]
svc_finance_actual_budget["finance/actual-budget (Service)"]
host_budget_bstein_dev --> svc_finance_actual_budget
wl_finance_actual_budget["finance/actual-budget (Deployment)"]
svc_finance_actual_budget --> wl_finance_actual_budget
host_call_live_bstein_dev["call.live.bstein.dev"] host_call_live_bstein_dev["call.live.bstein.dev"]
svc_comms_element_call["comms/element-call (Service)"] svc_comms_element_call["comms/element-call (Service)"]
host_call_live_bstein_dev --> svc_comms_element_call host_call_live_bstein_dev --> svc_comms_element_call
@ -37,6 +42,11 @@ flowchart LR
host_cloud_bstein_dev --> svc_nextcloud_nextcloud host_cloud_bstein_dev --> svc_nextcloud_nextcloud
wl_nextcloud_nextcloud["nextcloud/nextcloud (Deployment)"] wl_nextcloud_nextcloud["nextcloud/nextcloud (Deployment)"]
svc_nextcloud_nextcloud --> wl_nextcloud_nextcloud svc_nextcloud_nextcloud --> wl_nextcloud_nextcloud
host_health_bstein_dev["health.bstein.dev"]
svc_health_wger["health/wger (Service)"]
host_health_bstein_dev --> svc_health_wger
wl_health_wger["health/wger (Deployment)"]
svc_health_wger --> wl_health_wger
host_kit_live_bstein_dev["kit.live.bstein.dev"] host_kit_live_bstein_dev["kit.live.bstein.dev"]
svc_comms_livekit_token_service["comms/livekit-token-service (Service)"] svc_comms_livekit_token_service["comms/livekit-token-service (Service)"]
host_kit_live_bstein_dev --> svc_comms_livekit_token_service host_kit_live_bstein_dev --> svc_comms_livekit_token_service
@ -50,6 +60,14 @@ flowchart LR
host_live_bstein_dev --> svc_comms_matrix_wellknown host_live_bstein_dev --> svc_comms_matrix_wellknown
svc_comms_othrys_synapse_matrix_synapse["comms/othrys-synapse-matrix-synapse (Service)"] svc_comms_othrys_synapse_matrix_synapse["comms/othrys-synapse-matrix-synapse (Service)"]
host_live_bstein_dev --> svc_comms_othrys_synapse_matrix_synapse host_live_bstein_dev --> svc_comms_othrys_synapse_matrix_synapse
svc_comms_matrix_guest_register["comms/matrix-guest-register (Service)"]
host_live_bstein_dev --> svc_comms_matrix_guest_register
wl_comms_matrix_guest_register["comms/matrix-guest-register (Deployment)"]
svc_comms_matrix_guest_register --> wl_comms_matrix_guest_register
svc_comms_matrix_authentication_service["comms/matrix-authentication-service (Service)"]
host_live_bstein_dev --> svc_comms_matrix_authentication_service
wl_comms_matrix_authentication_service["comms/matrix-authentication-service (Deployment)"]
svc_comms_matrix_authentication_service --> wl_comms_matrix_authentication_service
host_logs_bstein_dev["logs.bstein.dev"] host_logs_bstein_dev["logs.bstein.dev"]
svc_logging_oauth2_proxy_logs["logging/oauth2-proxy-logs (Service)"] svc_logging_oauth2_proxy_logs["logging/oauth2-proxy-logs (Service)"]
host_logs_bstein_dev --> svc_logging_oauth2_proxy_logs host_logs_bstein_dev --> svc_logging_oauth2_proxy_logs
@ -64,21 +82,20 @@ flowchart LR
svc_mailu_mailserver_mailu_front["mailu-mailserver/mailu-front (Service)"] svc_mailu_mailserver_mailu_front["mailu-mailserver/mailu-front (Service)"]
host_mail_bstein_dev --> svc_mailu_mailserver_mailu_front host_mail_bstein_dev --> svc_mailu_mailserver_mailu_front
host_matrix_live_bstein_dev["matrix.live.bstein.dev"] host_matrix_live_bstein_dev["matrix.live.bstein.dev"]
svc_comms_matrix_authentication_service["comms/matrix-authentication-service (Service)"]
host_matrix_live_bstein_dev --> svc_comms_matrix_authentication_service host_matrix_live_bstein_dev --> svc_comms_matrix_authentication_service
wl_comms_matrix_authentication_service["comms/matrix-authentication-service (Deployment)"]
svc_comms_matrix_authentication_service --> wl_comms_matrix_authentication_service
host_matrix_live_bstein_dev --> svc_comms_matrix_wellknown host_matrix_live_bstein_dev --> svc_comms_matrix_wellknown
host_matrix_live_bstein_dev --> svc_comms_othrys_synapse_matrix_synapse host_matrix_live_bstein_dev --> svc_comms_othrys_synapse_matrix_synapse
svc_comms_matrix_guest_register["comms/matrix-guest-register (Service)"]
host_matrix_live_bstein_dev --> svc_comms_matrix_guest_register host_matrix_live_bstein_dev --> svc_comms_matrix_guest_register
wl_comms_matrix_guest_register["comms/matrix-guest-register (Deployment)"]
svc_comms_matrix_guest_register --> wl_comms_matrix_guest_register
host_monero_bstein_dev["monero.bstein.dev"] host_monero_bstein_dev["monero.bstein.dev"]
svc_crypto_monerod["crypto/monerod (Service)"] svc_crypto_monerod["crypto/monerod (Service)"]
host_monero_bstein_dev --> svc_crypto_monerod host_monero_bstein_dev --> svc_crypto_monerod
wl_crypto_monerod["crypto/monerod (Deployment)"] wl_crypto_monerod["crypto/monerod (Deployment)"]
svc_crypto_monerod --> wl_crypto_monerod svc_crypto_monerod --> wl_crypto_monerod
host_money_bstein_dev["money.bstein.dev"]
svc_finance_firefly["finance/firefly (Service)"]
host_money_bstein_dev --> svc_finance_firefly
wl_finance_firefly["finance/firefly (Deployment)"]
svc_finance_firefly --> wl_finance_firefly
host_notes_bstein_dev["notes.bstein.dev"] host_notes_bstein_dev["notes.bstein.dev"]
svc_outline_outline["outline/outline (Service)"] svc_outline_outline["outline/outline (Service)"]
host_notes_bstein_dev --> svc_outline_outline host_notes_bstein_dev --> svc_outline_outline
@ -143,19 +160,29 @@ flowchart LR
svc_comms_livekit svc_comms_livekit
wl_comms_livekit wl_comms_livekit
svc_comms_othrys_synapse_matrix_synapse svc_comms_othrys_synapse_matrix_synapse
svc_comms_matrix_authentication_service
wl_comms_matrix_authentication_service
svc_comms_matrix_guest_register svc_comms_matrix_guest_register
wl_comms_matrix_guest_register wl_comms_matrix_guest_register
svc_comms_matrix_authentication_service
wl_comms_matrix_authentication_service
end end
subgraph crypto[crypto] subgraph crypto[crypto]
svc_crypto_monerod svc_crypto_monerod
wl_crypto_monerod wl_crypto_monerod
end end
subgraph finance[finance]
svc_finance_actual_budget
wl_finance_actual_budget
svc_finance_firefly
wl_finance_firefly
end
subgraph gitea[gitea] subgraph gitea[gitea]
svc_gitea_gitea svc_gitea_gitea
wl_gitea_gitea wl_gitea_gitea
end end
subgraph health[health]
svc_health_wger
wl_health_wger
end
subgraph jellyfin[jellyfin] subgraph jellyfin[jellyfin]
svc_jellyfin_pegasus svc_jellyfin_pegasus
wl_jellyfin_pegasus wl_jellyfin_pegasus

View File

@ -70,6 +70,7 @@ WORKER_NODES = [
"titan-13", "titan-13",
"titan-14", "titan-14",
"titan-15", "titan-15",
"titan-16",
"titan-17", "titan-17",
"titan-18", "titan-18",
"titan-19", "titan-19",
@ -207,7 +208,66 @@ def namespace_ram_raw(scope_var):
def namespace_gpu_usage_instant(scope_var): def namespace_gpu_usage_instant(scope_var):
return f"sum(DCGM_FI_DEV_GPU_UTIL{{{namespace_gpu_selector(scope_var)}}}) by (namespace)" return gpu_usage_by_namespace(scope_var)
def jetson_gpu_util_by_node():
return 'max by (node) (jetson_gr3d_freq_percent{node!=""})'
def dcgm_gpu_util_by_node():
dcgm_pod = 'label_replace(DCGM_FI_DEV_GPU_UTIL, "pod", "$1", "Hostname", "(.*)")'
dcgm_ns = 'label_replace(' + dcgm_pod + ', "namespace", "monitoring", "", "")'
return (
"avg by (node) ("
f"{dcgm_ns} * on(namespace,pod) group_left(node) "
'kube_pod_info{namespace="monitoring"}'
")"
)
def gpu_util_by_node():
return f"{dcgm_gpu_util_by_node()} or {jetson_gpu_util_by_node()}"
def gpu_util_by_hostname():
return 'label_replace(' + gpu_util_by_node() + ', "Hostname", "$1", "node", "(.*)")'
def gpu_node_labels():
return 'kube_node_labels{label_accelerator=~".+"} or kube_node_labels{label_jetson="true"}'
def gpu_requests_by_namespace_node(scope_var):
return (
"sum by (namespace,node) ("
f'kube_pod_container_resource_requests{{resource=~"nvidia.com/gpu.*",{scope_var}}} '
"* on(namespace,pod) group_left(node) kube_pod_info "
f"* on(node) group_left() ({gpu_node_labels()})"
")"
)
def gpu_usage_by_namespace(scope_var):
requests_by_ns = gpu_requests_by_namespace_node(scope_var)
total_by_node = f"sum by (node) ({requests_by_ns})"
return (
"sum by (namespace) ("
f"({requests_by_ns}) / clamp_min({total_by_node}, 1) "
f"* on(node) group_left() ({gpu_util_by_node()})"
")"
)
def jetson_gpu_usage_by_namespace(scope_var):
requests_by_ns = jetson_gpu_requests(scope_var)
total_by_node = f"sum by (node) ({requests_by_ns})"
return (
"sum by (namespace) ("
f"({requests_by_ns}) / clamp_min({total_by_node}, 1) "
f"* on(node) group_left() {jetson_gpu_util_by_node()}"
")"
)
def namespace_share_expr(resource_expr): def namespace_share_expr(resource_expr):
@ -227,7 +287,7 @@ def namespace_gpu_share_expr(scope_var):
usage = namespace_gpu_usage_instant(scope_var) usage = namespace_gpu_usage_instant(scope_var)
total = f"(sum({usage}) or on() vector(0))" total = f"(sum({usage}) or on() vector(0))"
share = f"100 * ({usage}) / clamp_min({total}, 1)" share = f"100 * ({usage}) / clamp_min({total}, 1)"
idle = 'label_replace(vector(100), "namespace", "idle", "", "") and on() (' + total + " == 0)" idle = 'label_replace(vector(100), "namespace", "idle", "", "") * scalar(' + total + " == bool 0)"
return f"({share}) or ({idle})" return f"({share}) or ({idle})"
@ -333,9 +393,60 @@ GLUE_STALE = f"({GLUE_LAST_SUCCESS_AGE} > bool {GLUE_STALE_WINDOW_SEC})"
GLUE_MISSING = f"({GLUE_JOBS} unless on(namespace,cronjob) kube_cronjob_status_last_successful_time)" GLUE_MISSING = f"({GLUE_JOBS} unless on(namespace,cronjob) kube_cronjob_status_last_successful_time)"
GLUE_STALE_ACTIVE = f"({GLUE_STALE} unless on(namespace,cronjob) {GLUE_SUSPENDED})" GLUE_STALE_ACTIVE = f"({GLUE_STALE} unless on(namespace,cronjob) {GLUE_SUSPENDED})"
GLUE_MISSING_ACTIVE = f"({GLUE_MISSING} unless on(namespace,cronjob) {GLUE_SUSPENDED})" GLUE_MISSING_ACTIVE = f"({GLUE_MISSING} unless on(namespace,cronjob) {GLUE_SUSPENDED})"
GLUE_STALE_COUNT = f"(sum({GLUE_STALE_ACTIVE}) + count({GLUE_MISSING_ACTIVE}))" GLUE_STALE_COUNT = f"(sum({GLUE_STALE_ACTIVE}) + count({GLUE_MISSING_ACTIVE})) or on() vector(0)"
GLUE_MISSING_COUNT = f"count({GLUE_MISSING_ACTIVE})" GLUE_MISSING_COUNT = f"count({GLUE_MISSING_ACTIVE}) or on() vector(0)"
GLUE_SUSPENDED_COUNT = f"sum({GLUE_SUSPENDED})" GLUE_SUSPENDED_COUNT = f"sum({GLUE_SUSPENDED}) or on() vector(0)"
ARIADNE_TASK_ERRORS_RANGE = 'sum by (task) (increase(ariadne_task_runs_total{status="error"}[$__range]))'
ARIADNE_TASK_ERRORS_24H = 'sum by (task) (increase(ariadne_task_runs_total{status="error"}[24h]))'
ARIADNE_TASK_ERRORS_1H = 'sum by (task) (increase(ariadne_task_runs_total{status="error"}[1h]))'
ARIADNE_TASK_ERRORS_30D = 'sum by (task) (increase(ariadne_task_runs_total{status="error"}[30d]))'
ARIADNE_TASK_SUCCESS_24H = 'sum by (task) (increase(ariadne_task_runs_total{status="ok"}[24h]))'
ARIADNE_TASK_RUNS_BY_STATUS_1H = 'sum by (status) (increase(ariadne_task_runs_total[1h]))'
ARIADNE_TASK_ERRORS_1H_TOTAL = 'sum(increase(ariadne_task_runs_total{status="error"}[1h]))'
ARIADNE_TASK_ERRORS_24H_TOTAL = 'sum(increase(ariadne_task_runs_total{status="error"}[24h]))'
ARIADNE_TASK_RUNS_1H_TOTAL = 'sum(increase(ariadne_task_runs_total[1h]))'
ARIADNE_TASK_ATTEMPTS_SERIES = 'sum(increase(ariadne_task_runs_total[$__interval]))'
ARIADNE_TASK_FAILURES_SERIES = 'sum(increase(ariadne_task_runs_total{status="error"}[$__interval]))'
ARIADNE_TASK_WARNINGS_SERIES = (
'sum(increase(ariadne_task_runs_total{status!~"ok|error"}[$__interval])) or on() vector(0)'
)
ARIADNE_SCHEDULE_LAST_SUCCESS_HOURS = "(time() - ariadne_schedule_last_success_timestamp_seconds) / 3600"
ARIADNE_SCHEDULE_LAST_ERROR_HOURS = "(time() - ariadne_schedule_last_error_timestamp_seconds) / 3600"
ARIADNE_SCHEDULE_LAST_SUCCESS_RANGE_HOURS = (
"(time() - max_over_time(ariadne_schedule_last_success_timestamp_seconds[$__range])) / 3600"
)
ARIADNE_SCHEDULE_LAST_ERROR_RANGE_HOURS = (
"(time() - max_over_time(ariadne_schedule_last_error_timestamp_seconds[$__range])) / 3600"
)
ARIADNE_ACCESS_REQUESTS = "ariadne_access_requests_total"
ARIADNE_CI_COVERAGE = 'ariadne_ci_coverage_percent{repo="ariadne"}'
ARIADNE_CI_TESTS = 'ariadne_ci_tests_total{repo="ariadne"}'
ARIADNE_TEST_SUCCESS_RATE = (
"100 * "
'sum(max_over_time(ariadne_ci_tests_total{repo="ariadne",result="passed"}[30d])) '
"/ clamp_min("
'sum(max_over_time(ariadne_ci_tests_total{repo="ariadne",result=~"passed|failed|error"}[30d])), 1)'
)
ARIADNE_TEST_FAILURES_24H = (
'sum by (result) (max_over_time(ariadne_ci_tests_total{repo="ariadne",result=~"failed|error"}[24h]))'
)
POSTGRES_CONN_USED = (
'label_replace(sum(pg_stat_activity_count), "conn", "used", "__name__", ".*") '
'or label_replace(max(pg_settings_max_connections), "conn", "max", "__name__", ".*")'
)
POSTGRES_CONN_HOTTEST = 'topk(1, sum by (datname) (pg_stat_activity_count))'
ONEOFF_JOB_OWNER = (
'label_replace(kube_job_owner{owner_kind="CronJob"}, "owner_name", "$1", "job_name", "(.*)")'
)
ONEOFF_JOB_PODS = f'(kube_pod_owner{{owner_kind="Job"}} unless on(namespace, owner_name) {ONEOFF_JOB_OWNER})'
ONEOFF_JOB_POD_AGE_HOURS = (
'((time() - kube_pod_start_time{pod!=""}) / 3600) '
f'* on(namespace,pod) group_left(owner_name) {ONEOFF_JOB_PODS} '
'* on(namespace,pod) group_left(phase) '
'max by (namespace,pod,phase) (kube_pod_status_phase{phase=~"Running|Succeeded"})'
)
GLUE_LAST_SUCCESS_RANGE_HOURS = f"(time() - max_over_time({GLUE_LAST_SUCCESS}[$__range])) / 3600"
GLUE_LAST_SCHEDULE_RANGE_HOURS = f"(time() - max_over_time({GLUE_LAST_SCHEDULE}[$__range])) / 3600"
GPU_NODES = ["titan-20", "titan-21", "titan-22", "titan-24"] GPU_NODES = ["titan-20", "titan-21", "titan-22", "titan-24"]
GPU_NODE_REGEX = "|".join(GPU_NODES) GPU_NODE_REGEX = "|".join(GPU_NODES)
TRAEFIK_ROUTER_EXPR = "sum by (router) (rate(traefik_router_requests_total[5m]))" TRAEFIK_ROUTER_EXPR = "sum by (router) (rate(traefik_router_requests_total[5m]))"
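ONEOFF_JOB_POD_AGE_HOURS above selects pods owned by Jobs with no CronJob parent and reports how long ago they started; the core arithmetic is just (time() - kube_pod_start_time) / 3600. A tiny sketch with made-up timestamps:

import time

def pod_age_hours(start_time_epoch: float, now: float | None = None) -> float:
    # Mirrors (time() - kube_pod_start_time{...}) / 3600 from the expression above.
    now = time.time() if now is None else now
    return (now - start_time_epoch) / 3600

print(pod_age_hours(1_700_000_000, now=1_700_086_400))  # 24.0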
@ -513,6 +624,7 @@ def timeseries_panel(
grid, grid,
*, *,
unit="none", unit="none",
max_value=None,
legend=None, legend=None,
legend_display="table", legend_display="table",
legend_placement="bottom", legend_placement="bottom",
@ -537,6 +649,8 @@ def timeseries_panel(
"tooltip": {"mode": "multi"}, "tooltip": {"mode": "multi"},
}, },
} }
if max_value is not None:
panel["fieldConfig"]["defaults"]["max"] = max_value
if legend: if legend:
panel["targets"][0]["legendFormat"] = legend panel["targets"][0]["legendFormat"] = legend
if legend_calcs: if legend_calcs:
@ -688,13 +802,22 @@ def bargauge_panel(
grid, grid,
*, *,
unit="none", unit="none",
legend=None,
links=None, links=None,
limit=None, limit=None,
sort_order="desc",
thresholds=None, thresholds=None,
decimals=None, decimals=None,
instant=False, instant=False,
overrides=None,
): ):
"""Return a bar gauge panel with label-aware reduction.""" """Return a bar gauge panel with label-aware reduction."""
cleaned_expr = expr.strip()
if not cleaned_expr.startswith(("sort(", "sort_desc(")):
if sort_order == "desc":
expr = f"sort_desc({expr})"
elif sort_order == "asc":
expr = f"sort({expr})"
panel = { panel = {
"id": panel_id, "id": panel_id,
"type": "bargauge", "type": "bargauge",
@ -702,7 +825,12 @@ def bargauge_panel(
"datasource": PROM_DS, "datasource": PROM_DS,
"gridPos": grid, "gridPos": grid,
"targets": [ "targets": [
{"expr": expr, "refId": "A", "legendFormat": "{{node}}", **({"instant": True} if instant else {})} {
"expr": expr,
"refId": "A",
"legendFormat": legend or "{{node}}",
**({"instant": True} if instant else {}),
}
], ],
"fieldConfig": { "fieldConfig": {
"defaults": { "defaults": {
@ -732,6 +860,8 @@ def bargauge_panel(
}, },
}, },
} }
if overrides:
panel["fieldConfig"]["overrides"].extend(overrides)
if decimals is not None: if decimals is not None:
panel["fieldConfig"]["defaults"]["decimals"] = decimals panel["fieldConfig"]["defaults"]["decimals"] = decimals
if links: if links:
@ -740,7 +870,7 @@ def bargauge_panel(
panel["transformations"] = [ panel["transformations"] = [
{ {
"id": "sortBy", "id": "sortBy",
"options": {"fields": ["Value"], "order": "desc"}, "options": {"fields": ["Value"], "order": sort_order},
} }
] ]
if limit: if limit:
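The sort handling added to bargauge_panel only wraps the expression in sort()/sort_desc() when the caller has not already done so, and the same sort_order drives the table sortBy transformation. A self-contained sketch of the wrapping rule (wrap_sort is an illustrative name, not part of the script):

def wrap_sort(expr: str, sort_order: str = "desc") -> str:
    cleaned = expr.strip()
    if cleaned.startswith(("sort(", "sort_desc(")):
        return cleaned  # caller already chose an ordering
    if sort_order == "desc":
        return f"sort_desc({cleaned})"
    if sort_order == "asc":
        return f"sort({cleaned})"
    return cleaned

assert wrap_sort("topk(5, up)") == "sort_desc(topk(5, up))"
assert wrap_sort("sort(up)") == "sort(up)"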
@ -780,6 +910,15 @@ def build_overview():
{"color": "red", "value": 3}, {"color": "red", "value": 3},
], ],
} }
age_thresholds = {
"mode": "absolute",
"steps": [
{"color": "green", "value": None},
{"color": "yellow", "value": 6},
{"color": "orange", "value": 24},
{"color": "red", "value": 48},
],
}
row1_stats = [ row1_stats = [
{ {
@ -982,7 +1121,7 @@ def build_overview():
30, 30,
"Mail Sent (1d)", "Mail Sent (1d)",
'max(postmark_outbound_sent{window="1d"})', 'max(postmark_outbound_sent{window="1d"})',
{"h": 2, "w": 5, "x": 0, "y": 8}, {"h": 3, "w": 4, "x": 0, "y": 8},
unit="none", unit="none",
links=link_to("atlas-mail"), links=link_to("atlas-mail"),
) )
@ -993,7 +1132,7 @@ def build_overview():
"type": "stat", "type": "stat",
"title": "Mail Bounces (1d)", "title": "Mail Bounces (1d)",
"datasource": PROM_DS, "datasource": PROM_DS,
"gridPos": {"h": 2, "w": 5, "x": 10, "y": 8}, "gridPos": {"h": 3, "w": 4, "x": 8, "y": 8},
"targets": [ "targets": [
{ {
"expr": 'max(postmark_outbound_bounce_rate{window="1d"})', "expr": 'max(postmark_outbound_bounce_rate{window="1d"})',
@ -1039,7 +1178,7 @@ def build_overview():
32, 32,
"Mail Success Rate (1d)", "Mail Success Rate (1d)",
'clamp_min(100 - max(postmark_outbound_bounce_rate{window="1d"}), 0)', 'clamp_min(100 - max(postmark_outbound_bounce_rate{window="1d"}), 0)',
{"h": 2, "w": 5, "x": 5, "y": 8}, {"h": 3, "w": 4, "x": 4, "y": 8},
unit="percent", unit="percent",
thresholds=mail_success_thresholds, thresholds=mail_success_thresholds,
decimals=1, decimals=1,
@ -1051,13 +1190,38 @@ def build_overview():
33, 33,
"Mail Limit Used (30d)", "Mail Limit Used (30d)",
"max(postmark_sending_limit_used_percent)", "max(postmark_sending_limit_used_percent)",
{"h": 2, "w": 5, "x": 15, "y": 8}, {"h": 3, "w": 4, "x": 12, "y": 8},
unit="percent", unit="percent",
thresholds=mail_limit_thresholds, thresholds=mail_limit_thresholds,
decimals=1, decimals=1,
links=link_to("atlas-mail"), links=link_to("atlas-mail"),
) )
) )
panels.append(
stat_panel(
34,
"Postgres Connections Used",
POSTGRES_CONN_USED,
{"h": 3, "w": 4, "x": 16, "y": 8},
decimals=0,
text_mode="name_and_value",
legend="{{conn}}",
instant=True,
)
)
panels.append(
stat_panel(
35,
"Postgres Hottest Connections",
POSTGRES_CONN_HOTTEST,
{"h": 3, "w": 4, "x": 20, "y": 8},
unit="none",
decimals=0,
text_mode="name_and_value",
legend="{{datname}}",
instant=True,
)
)
storage_panels = [ storage_panels = [
(23, "Astreae Usage", astreae_usage_expr("/mnt/astreae"), "percent"), (23, "Astreae Usage", astreae_usage_expr("/mnt/astreae"), "percent"),
@ -1071,13 +1235,104 @@ def build_overview():
panel_id, panel_id,
title, title,
expr, expr,
{"h": 6, "w": 6, "x": 6 * idx, "y": 10}, {"h": 3, "w": 6, "x": 6 * idx, "y": 11},
unit=unit, unit=unit,
thresholds=PERCENT_THRESHOLDS if unit == "percent" else None, thresholds=PERCENT_THRESHOLDS if unit == "percent" else None,
links=link_to("atlas-storage"), links=link_to("atlas-storage"),
) )
) )
panels.append(
bargauge_panel(
40,
"One-off Job Pods (age hours)",
ONEOFF_JOB_POD_AGE_HOURS,
{"h": 6, "w": 6, "x": 0, "y": 14},
unit="h",
instant=True,
legend="{{namespace}}/{{pod}}",
thresholds=age_thresholds,
limit=8,
decimals=2,
)
)
panels.append(
{
"id": 41,
"type": "timeseries",
"title": "Ariadne Attempts / Failures",
"datasource": PROM_DS,
"gridPos": {"h": 6, "w": 6, "x": 6, "y": 14},
"targets": [
{"expr": ARIADNE_TASK_ATTEMPTS_SERIES, "refId": "A", "legendFormat": "Attempts"},
{"expr": ARIADNE_TASK_FAILURES_SERIES, "refId": "B", "legendFormat": "Failures"},
],
"fieldConfig": {
"defaults": {"unit": "none"},
"overrides": [
{
"matcher": {"id": "byName", "options": "Attempts"},
"properties": [
{"id": "color", "value": {"mode": "fixed", "fixedColor": "green"}}
],
},
{
"matcher": {"id": "byName", "options": "Failures"},
"properties": [
{"id": "color", "value": {"mode": "fixed", "fixedColor": "red"}}
],
},
],
},
"options": {
"legend": {"displayMode": "table", "placement": "right"},
"tooltip": {"mode": "multi"},
},
}
)
panels.append(
timeseries_panel(
42,
"Ariadne Test Success Rate",
ARIADNE_TEST_SUCCESS_RATE,
{"h": 6, "w": 6, "x": 12, "y": 14},
unit="percent",
max_value=100,
legend=None,
legend_display="list",
)
)
panels.append(
bargauge_panel(
43,
"Tests with Failures (24h)",
ARIADNE_TEST_FAILURES_24H,
{"h": 6, "w": 6, "x": 18, "y": 14},
unit="none",
instant=True,
legend="{{result}}",
overrides=[
{
"matcher": {"id": "byName", "options": "error"},
"properties": [{"id": "color", "value": {"mode": "fixed", "fixedColor": "yellow"}}],
},
{
"matcher": {"id": "byName", "options": "failed"},
"properties": [{"id": "color", "value": {"mode": "fixed", "fixedColor": "red"}}],
},
],
thresholds={
"mode": "absolute",
"steps": [
{"color": "green", "value": None},
{"color": "yellow", "value": 1},
{"color": "orange", "value": 5},
{"color": "red", "value": 10},
],
},
)
)
cpu_scope = "$namespace_scope_cpu" cpu_scope = "$namespace_scope_cpu"
gpu_scope = "$namespace_scope_gpu" gpu_scope = "$namespace_scope_gpu"
ram_scope = "$namespace_scope_ram" ram_scope = "$namespace_scope_ram"
@ -1087,7 +1342,7 @@ def build_overview():
11, 11,
"Namespace CPU Share", "Namespace CPU Share",
namespace_cpu_share_expr(cpu_scope), namespace_cpu_share_expr(cpu_scope),
{"h": 9, "w": 8, "x": 0, "y": 16}, {"h": 9, "w": 8, "x": 0, "y": 20},
links=namespace_scope_links("namespace_scope_cpu"), links=namespace_scope_links("namespace_scope_cpu"),
description="Shares are normalized within the selected filter. Switching scope changes the denominator.", description="Shares are normalized within the selected filter. Switching scope changes the denominator.",
) )
@ -1097,7 +1352,7 @@ def build_overview():
12, 12,
"Namespace GPU Share", "Namespace GPU Share",
namespace_gpu_share_expr(gpu_scope), namespace_gpu_share_expr(gpu_scope),
{"h": 9, "w": 8, "x": 8, "y": 16}, {"h": 9, "w": 8, "x": 8, "y": 20},
links=namespace_scope_links("namespace_scope_gpu"), links=namespace_scope_links("namespace_scope_gpu"),
description="Shares are normalized within the selected filter. Switching scope changes the denominator.", description="Shares are normalized within the selected filter. Switching scope changes the denominator.",
) )
@ -1107,7 +1362,7 @@ def build_overview():
13, 13,
"Namespace RAM Share", "Namespace RAM Share",
namespace_ram_share_expr(ram_scope), namespace_ram_share_expr(ram_scope),
{"h": 9, "w": 8, "x": 16, "y": 16}, {"h": 9, "w": 8, "x": 16, "y": 20},
links=namespace_scope_links("namespace_scope_ram"), links=namespace_scope_links("namespace_scope_ram"),
description="Shares are normalized within the selected filter. Switching scope changes the denominator.", description="Shares are normalized within the selected filter. Switching scope changes the denominator.",
) )
@ -1119,7 +1374,7 @@ def build_overview():
14, 14,
"Worker Node CPU", "Worker Node CPU",
node_cpu_expr(worker_filter), node_cpu_expr(worker_filter),
{"h": 12, "w": 12, "x": 0, "y": 32}, {"h": 12, "w": 12, "x": 0, "y": 36},
unit="percent", unit="percent",
legend="{{node}}", legend="{{node}}",
legend_calcs=["last"], legend_calcs=["last"],
@ -1133,7 +1388,7 @@ def build_overview():
15, 15,
"Worker Node RAM", "Worker Node RAM",
node_mem_expr(worker_filter), node_mem_expr(worker_filter),
{"h": 12, "w": 12, "x": 12, "y": 32}, {"h": 12, "w": 12, "x": 12, "y": 36},
unit="percent", unit="percent",
legend="{{node}}", legend="{{node}}",
legend_calcs=["last"], legend_calcs=["last"],
@ -1148,7 +1403,7 @@ def build_overview():
16, 16,
"Control plane CPU", "Control plane CPU",
node_cpu_expr(CONTROL_ALL_REGEX), node_cpu_expr(CONTROL_ALL_REGEX),
{"h": 10, "w": 12, "x": 0, "y": 44}, {"h": 10, "w": 12, "x": 0, "y": 48},
unit="percent", unit="percent",
legend="{{node}}", legend="{{node}}",
legend_display="table", legend_display="table",
@ -1160,7 +1415,7 @@ def build_overview():
17, 17,
"Control plane RAM", "Control plane RAM",
node_mem_expr(CONTROL_ALL_REGEX), node_mem_expr(CONTROL_ALL_REGEX),
{"h": 10, "w": 12, "x": 12, "y": 44}, {"h": 10, "w": 12, "x": 12, "y": 48},
unit="percent", unit="percent",
legend="{{node}}", legend="{{node}}",
legend_display="table", legend_display="table",
@ -1173,7 +1428,7 @@ def build_overview():
28, 28,
"Node Pod Share", "Node Pod Share",
'(sum(kube_pod_info{pod!="" , node!=""}) by (node) / clamp_min(sum(kube_pod_info{pod!="" , node!=""}), 1)) * 100', '(sum(kube_pod_info{pod!="" , node!=""}) by (node) / clamp_min(sum(kube_pod_info{pod!="" , node!=""}), 1)) * 100',
{"h": 10, "w": 12, "x": 0, "y": 54}, {"h": 10, "w": 12, "x": 0, "y": 58},
) )
) )
panels.append( panels.append(
@ -1181,7 +1436,7 @@ def build_overview():
29, 29,
"Top Nodes by Pod Count", "Top Nodes by Pod Count",
'topk(12, sum(kube_pod_info{pod!="" , node!=""}) by (node))', 'topk(12, sum(kube_pod_info{pod!="" , node!=""}) by (node))',
{"h": 10, "w": 12, "x": 12, "y": 54}, {"h": 10, "w": 12, "x": 12, "y": 58},
unit="none", unit="none",
limit=12, limit=12,
decimals=0, decimals=0,
@ -1203,7 +1458,7 @@ def build_overview():
18, 18,
"Cluster Ingress Throughput", "Cluster Ingress Throughput",
NET_INGRESS_EXPR, NET_INGRESS_EXPR,
{"h": 7, "w": 8, "x": 0, "y": 25}, {"h": 7, "w": 8, "x": 0, "y": 29},
unit="Bps", unit="Bps",
legend="Ingress (Traefik)", legend="Ingress (Traefik)",
legend_display="list", legend_display="list",
@ -1216,7 +1471,7 @@ def build_overview():
19, 19,
"Cluster Egress Throughput", "Cluster Egress Throughput",
NET_EGRESS_EXPR, NET_EGRESS_EXPR,
{"h": 7, "w": 8, "x": 8, "y": 25}, {"h": 7, "w": 8, "x": 8, "y": 29},
unit="Bps", unit="Bps",
legend="Egress (Traefik)", legend="Egress (Traefik)",
legend_display="list", legend_display="list",
@ -1229,7 +1484,7 @@ def build_overview():
20, 20,
"Intra-Cluster Throughput", "Intra-Cluster Throughput",
NET_INTERNAL_EXPR, NET_INTERNAL_EXPR,
{"h": 7, "w": 8, "x": 16, "y": 25}, {"h": 7, "w": 8, "x": 16, "y": 29},
unit="Bps", unit="Bps",
legend="Internal traffic", legend="Internal traffic",
legend_display="list", legend_display="list",
@ -1243,7 +1498,7 @@ def build_overview():
21, 21,
"Root Filesystem Usage", "Root Filesystem Usage",
root_usage_expr(), root_usage_expr(),
{"h": 16, "w": 12, "x": 0, "y": 64}, {"h": 16, "w": 12, "x": 0, "y": 68},
unit="percent", unit="percent",
legend="{{node}}", legend="{{node}}",
legend_calcs=["last"], legend_calcs=["last"],
@ -1258,7 +1513,7 @@ def build_overview():
22, 22,
"Nodes Closest to Full Root Disks", "Nodes Closest to Full Root Disks",
f"topk(12, {root_usage_expr()})", f"topk(12, {root_usage_expr()})",
{"h": 16, "w": 12, "x": 12, "y": 64}, {"h": 16, "w": 12, "x": 12, "y": 68},
unit="percent", unit="percent",
thresholds=PERCENT_THRESHOLDS, thresholds=PERCENT_THRESHOLDS,
links=link_to("atlas-storage"), links=link_to("atlas-storage"),
@ -2153,16 +2408,103 @@ def build_mail_dashboard():
} }
def build_testing_dashboard(): def build_jobs_dashboard():
panels = [] panels = []
sort_desc = [{"id": "labelsToFields", "options": {}}, {"id": "sortBy", "options": {"fields": ["Value"], "order": "desc"}}] age_thresholds = {
"mode": "absolute",
"steps": [
{"color": "green", "value": None},
{"color": "yellow", "value": 6},
{"color": "orange", "value": 24},
{"color": "red", "value": 48},
],
}
recent_error_thresholds = {
"mode": "absolute",
"steps": [
{"color": "red", "value": None},
{"color": "orange", "value": 1},
{"color": "yellow", "value": 6},
{"color": "green", "value": 24},
],
}
task_error_thresholds = {
"mode": "absolute",
"steps": [
{"color": "green", "value": None},
{"color": "yellow", "value": 1},
{"color": "orange", "value": 3},
{"color": "red", "value": 5},
],
}
panels.append( panels.append(
stat_panel( bargauge_panel(
1, 1,
"Ariadne Task Errors (range)",
ARIADNE_TASK_ERRORS_RANGE,
{"h": 7, "w": 8, "x": 0, "y": 0},
unit="none",
instant=True,
legend="{{task}}",
thresholds=task_error_thresholds,
)
)
panels.append(
{
"id": 2,
"type": "timeseries",
"title": "Ariadne Attempts / Failures",
"datasource": PROM_DS,
"gridPos": {"h": 7, "w": 8, "x": 8, "y": 0},
"targets": [
{"expr": ARIADNE_TASK_ATTEMPTS_SERIES, "refId": "A", "legendFormat": "Attempts"},
{"expr": ARIADNE_TASK_FAILURES_SERIES, "refId": "B", "legendFormat": "Failures"},
],
"fieldConfig": {
"defaults": {"unit": "none"},
"overrides": [
{
"matcher": {"id": "byName", "options": "Attempts"},
"properties": [
{"id": "color", "value": {"mode": "fixed", "fixedColor": "green"}}
],
},
{
"matcher": {"id": "byName", "options": "Failures"},
"properties": [
{"id": "color", "value": {"mode": "fixed", "fixedColor": "red"}}
],
},
],
},
"options": {
"legend": {"displayMode": "table", "placement": "right"},
"tooltip": {"mode": "multi"},
},
}
)
panels.append(
bargauge_panel(
3,
"One-off Job Pods (age hours)",
ONEOFF_JOB_POD_AGE_HOURS,
{"h": 7, "w": 8, "x": 16, "y": 0},
unit="h",
instant=True,
legend="{{namespace}}/{{pod}}",
thresholds=age_thresholds,
limit=12,
decimals=2,
)
)
panels.append(
stat_panel(
4,
"Glue Jobs Stale (>36h)", "Glue Jobs Stale (>36h)",
GLUE_STALE_COUNT, GLUE_STALE_COUNT,
{"h": 4, "w": 6, "x": 0, "y": 0}, {"h": 4, "w": 4, "x": 0, "y": 7},
unit="none", unit="none",
thresholds={ thresholds={
"mode": "absolute", "mode": "absolute",
@ -2176,64 +2518,164 @@ def build_testing_dashboard():
) )
) )
panels.append( panels.append(
table_panel( stat_panel(
2,
"Glue Jobs Missing Success",
GLUE_MISSING_ACTIVE,
{"h": 4, "w": 6, "x": 6, "y": 0},
unit="none",
transformations=sort_desc,
instant=True,
)
)
panels.append(
table_panel(
3,
"Glue Jobs Suspended",
GLUE_SUSPENDED,
{"h": 4, "w": 6, "x": 12, "y": 0},
unit="none",
transformations=sort_desc,
instant=True,
)
)
panels.append(
table_panel(
4,
"Glue Jobs Active Runs",
GLUE_ACTIVE,
{"h": 4, "w": 6, "x": 18, "y": 0},
unit="none",
transformations=sort_desc,
instant=True,
)
)
panels.append(
table_panel(
5, 5,
"Glue Jobs Last Success (hours ago)", "Glue Jobs Missing Success",
GLUE_LAST_SUCCESS_AGE_HOURS, GLUE_MISSING_COUNT,
{"h": 8, "w": 12, "x": 0, "y": 4}, {"h": 4, "w": 4, "x": 4, "y": 7},
unit="none",
)
)
panels.append(
stat_panel(
6,
"Glue Jobs Suspended",
GLUE_SUSPENDED_COUNT,
{"h": 4, "w": 4, "x": 8, "y": 7},
unit="none",
)
)
panels.append(
stat_panel(
7,
"Ariadne Task Errors (1h)",
ARIADNE_TASK_ERRORS_1H_TOTAL,
{"h": 4, "w": 4, "x": 12, "y": 7},
unit="none",
)
)
panels.append(
stat_panel(
8,
"Ariadne Task Errors (24h)",
ARIADNE_TASK_ERRORS_24H_TOTAL,
{"h": 4, "w": 4, "x": 16, "y": 7},
unit="none",
)
)
panels.append(
stat_panel(
9,
"Ariadne Task Runs (1h)",
ARIADNE_TASK_RUNS_1H_TOTAL,
{"h": 4, "w": 4, "x": 20, "y": 7},
unit="none",
)
)
panels.append(
bargauge_panel(
10,
"Ariadne Schedule Last Error (hours ago)",
ARIADNE_SCHEDULE_LAST_ERROR_RANGE_HOURS,
{"h": 6, "w": 12, "x": 0, "y": 17},
unit="h", unit="h",
transformations=sort_desc,
instant=True, instant=True,
legend="{{task}}",
thresholds=recent_error_thresholds,
decimals=2,
)
)
panels.append(
bargauge_panel(
11,
"Ariadne Schedule Last Success (hours ago)",
ARIADNE_SCHEDULE_LAST_SUCCESS_RANGE_HOURS,
{"h": 6, "w": 12, "x": 12, "y": 17},
unit="h",
instant=True,
legend="{{task}}",
thresholds=age_thresholds,
decimals=2,
)
)
panels.append(
bargauge_panel(
12,
"Glue Jobs Last Success (hours ago)",
GLUE_LAST_SUCCESS_RANGE_HOURS,
{"h": 6, "w": 12, "x": 0, "y": 23},
unit="h",
instant=True,
legend="{{namespace}}/{{cronjob}}",
thresholds=age_thresholds,
decimals=2,
)
)
panels.append(
bargauge_panel(
13,
"Glue Jobs Last Schedule (hours ago)",
GLUE_LAST_SCHEDULE_RANGE_HOURS,
{"h": 6, "w": 12, "x": 12, "y": 23},
unit="h",
instant=True,
legend="{{namespace}}/{{cronjob}}",
thresholds=age_thresholds,
decimals=2,
)
)
panels.append(
bargauge_panel(
14,
"Ariadne Task Errors (1h)",
ARIADNE_TASK_ERRORS_1H,
{"h": 6, "w": 12, "x": 0, "y": 29},
unit="none",
instant=True,
legend="{{task}}",
thresholds=task_error_thresholds,
)
)
panels.append(
bargauge_panel(
15,
"Ariadne Task Errors (30d)",
ARIADNE_TASK_ERRORS_30D,
{"h": 6, "w": 12, "x": 12, "y": 29},
unit="none",
instant=True,
legend="{{task}}",
thresholds=task_error_thresholds,
)
)
panels.append(
bargauge_panel(
16,
"Ariadne Access Requests",
ARIADNE_ACCESS_REQUESTS,
{"h": 6, "w": 8, "x": 0, "y": 11},
unit="none",
instant=True,
legend="{{status}}",
)
)
panels.append(
stat_panel(
17,
"Ariadne CI Coverage (%)",
ARIADNE_CI_COVERAGE,
{"h": 6, "w": 4, "x": 8, "y": 11},
unit="percent",
decimals=1,
instant=True,
legend="{{branch}}",
) )
) )
panels.append( panels.append(
table_panel( table_panel(
6, 18,
"Glue Jobs Last Schedule (hours ago)", "Ariadne CI Tests (latest)",
GLUE_LAST_SCHEDULE_AGE_HOURS, ARIADNE_CI_TESTS,
{"h": 8, "w": 12, "x": 12, "y": 4}, {"h": 6, "w": 12, "x": 12, "y": 11},
unit="h", unit="none",
transformations=sort_desc, transformations=[{"id": "labelsToFields", "options": {}}, {"id": "sortBy", "options": {"fields": ["Value"], "order": "desc"}}],
instant=True, instant=True,
) )
) )
return { return {
"uid": "atlas-testing", "uid": "atlas-jobs",
"title": "Atlas Testing", "title": "Atlas Jobs",
"folderUid": PRIVATE_FOLDER, "folderUid": PRIVATE_FOLDER,
"editable": True, "editable": True,
"panels": panels, "panels": panels,
@ -2241,7 +2683,7 @@ def build_testing_dashboard():
"annotations": {"list": []}, "annotations": {"list": []},
"schemaVersion": 39, "schemaVersion": 39,
"style": "dark", "style": "dark",
"tags": ["atlas", "testing"], "tags": ["atlas", "jobs", "glue"],
} }
@ -2274,7 +2716,7 @@ def build_gpu_dashboard():
timeseries_panel( timeseries_panel(
3, 3,
"GPU Util by Node", "GPU Util by Node",
'sum by (Hostname) (DCGM_FI_DEV_GPU_UTIL{pod!=""})', gpu_util_by_hostname(),
{"h": 8, "w": 12, "x": 0, "y": 8}, {"h": 8, "w": 12, "x": 0, "y": 8},
unit="percent", unit="percent",
legend="{{Hostname}}", legend="{{Hostname}}",
@ -2338,9 +2780,9 @@ DASHBOARDS = {
"builder": build_mail_dashboard, "builder": build_mail_dashboard,
"configmap": ROOT / "services" / "monitoring" / "grafana-dashboard-mail.yaml", "configmap": ROOT / "services" / "monitoring" / "grafana-dashboard-mail.yaml",
}, },
"atlas-testing": { "atlas-jobs": {
"builder": build_testing_dashboard, "builder": build_jobs_dashboard,
"configmap": ROOT / "services" / "monitoring" / "grafana-dashboard-testing.yaml", "configmap": ROOT / "services" / "monitoring" / "grafana-dashboard-jobs.yaml",
}, },
"atlas-gpu": { "atlas-gpu": {
"builder": build_gpu_dashboard, "builder": build_gpu_dashboard,

View File

@ -20,11 +20,13 @@ import subprocess
import sys import sys
from dataclasses import dataclass from dataclasses import dataclass
from pathlib import Path from pathlib import Path
import shutil
from typing import Any, Iterable from typing import Any, Iterable
import yaml import yaml
REPO_ROOT = Path(__file__).resolve().parents[1] REPO_ROOT = Path(__file__).resolve().parents[1]
DASHBOARD_DIR = REPO_ROOT / "services" / "monitoring" / "dashboards"
CLUSTER_SCOPED_KINDS = { CLUSTER_SCOPED_KINDS = {
"Namespace", "Namespace",
@ -60,6 +62,70 @@ def _run(cmd: list[str], *, cwd: Path) -> str:
return res.stdout return res.stdout
def _sync_tree(source: Path, dest: Path) -> None:
if dest.exists():
shutil.rmtree(dest)
shutil.copytree(source, dest)
def _iter_dashboard_panels(dashboard: dict[str, Any]) -> Iterable[dict[str, Any]]:
panels = dashboard.get("panels") if isinstance(dashboard.get("panels"), list) else []
for panel in panels:
if not isinstance(panel, dict):
continue
if panel.get("type") == "row" and isinstance(panel.get("panels"), list):
yield from _iter_dashboard_panels({"panels": panel.get("panels")})
continue
yield panel
def _extract_metrics_index(dashboard_dir: Path) -> list[dict[str, Any]]:
index: list[dict[str, Any]] = []
for path in sorted(dashboard_dir.glob("*.json")):
try:
data = json.loads(path.read_text(encoding="utf-8"))
except json.JSONDecodeError:
continue
if not isinstance(data, dict):
continue
dash_title = data.get("title") or path.stem
dash_tags = data.get("tags") or []
for panel in _iter_dashboard_panels(data):
targets = panel.get("targets")
if not isinstance(targets, list):
continue
exprs: list[str] = []
for target in targets:
if not isinstance(target, dict):
continue
expr = target.get("expr")
if isinstance(expr, str) and expr.strip():
exprs.append(expr.strip())
if not exprs:
continue
datasource = panel.get("datasource") or {}
if isinstance(datasource, dict):
ds_uid = datasource.get("uid")
ds_type = datasource.get("type")
else:
ds_uid = None
ds_type = None
index.append(
{
"dashboard": dash_title,
"panel_title": panel.get("title") or "",
"panel_id": panel.get("id"),
"panel_type": panel.get("type"),
"description": panel.get("description") or "",
"tags": dash_tags,
"datasource_uid": ds_uid,
"datasource_type": ds_type,
"exprs": exprs,
}
)
return index
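Taken together, _iter_dashboard_panels flattens row panels and _extract_metrics_index collects each panel's non-empty target expressions into catalog/metrics.json. A rough, self-contained illustration of that flow (the dashboard content below is made up):

def iter_panels(dashboard):
    # Rows are expanded so nested panels are indexed alongside top-level ones.
    for panel in dashboard.get("panels") or []:
        if not isinstance(panel, dict):
            continue
        if panel.get("type") == "row" and isinstance(panel.get("panels"), list):
            yield from iter_panels({"panels": panel["panels"]})
            continue
        yield panel

dash = {
    "title": "Atlas Overview",
    "panels": [
        {"type": "row", "panels": [{"id": 1, "type": "stat", "targets": [{"expr": "up"}]}]},
        {"id": 2, "type": "table", "targets": []},
    ],
}
index = [
    {"dashboard": dash["title"], "panel_id": p["id"],
     "exprs": [t["expr"] for t in p.get("targets", []) if t.get("expr")]}
    for p in iter_panels(dash)
    if any(t.get("expr") for t in p.get("targets", []))
]
print(index)  # [{'dashboard': 'Atlas Overview', 'panel_id': 1, 'exprs': ['up']}]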
def kustomize_build(path: Path) -> str: def kustomize_build(path: Path) -> str:
rel = path.relative_to(REPO_ROOT) rel = path.relative_to(REPO_ROOT)
try: try:
@ -472,6 +538,11 @@ def main() -> int:
action="store_true", action="store_true",
help="Write generated files (otherwise just print a summary).", help="Write generated files (otherwise just print a summary).",
) )
ap.add_argument(
"--sync-comms",
action="store_true",
help="Mirror rendered knowledge into services/comms/knowledge for atlasbot.",
)
args = ap.parse_args() args = ap.parse_args()
out_dir = REPO_ROOT / args.out out_dir = REPO_ROOT / args.out
@ -504,6 +575,7 @@ def main() -> int:
summary_path = out_dir / "catalog" / "atlas-summary.json" summary_path = out_dir / "catalog" / "atlas-summary.json"
diagram_path = out_dir / "diagrams" / "atlas-http.mmd" diagram_path = out_dir / "diagrams" / "atlas-http.mmd"
runbooks_json_path = out_dir / "catalog" / "runbooks.json" runbooks_json_path = out_dir / "catalog" / "runbooks.json"
metrics_json_path = out_dir / "catalog" / "metrics.json"
catalog_rel = catalog_path.relative_to(REPO_ROOT).as_posix() catalog_rel = catalog_path.relative_to(REPO_ROOT).as_posix()
catalog_path.write_text( catalog_path.write_text(
@ -517,9 +589,14 @@ def main() -> int:
diagram_path.write_text(diagram, encoding="utf-8") diagram_path.write_text(diagram, encoding="utf-8")
# Render runbooks into JSON for lightweight, dependency-free consumption in-cluster. # Render runbooks into JSON for lightweight, dependency-free consumption in-cluster.
runbooks_dir = out_dir / "runbooks" runbook_dirs = [
out_dir / "runbooks",
out_dir / "software",
]
runbooks: list[dict[str, Any]] = [] runbooks: list[dict[str, Any]] = []
if runbooks_dir.exists(): for runbooks_dir in runbook_dirs:
if not runbooks_dir.exists():
continue
for md_file in sorted(runbooks_dir.glob("*.md")): for md_file in sorted(runbooks_dir.glob("*.md")):
raw = md_file.read_text(encoding="utf-8") raw = md_file.read_text(encoding="utf-8")
fm: dict[str, Any] = {} fm: dict[str, Any] = {}
@ -543,12 +620,22 @@ def main() -> int:
} }
) )
runbooks_json_path.write_text(json.dumps(runbooks, indent=2, sort_keys=False) + "\n", encoding="utf-8") runbooks_json_path.write_text(json.dumps(runbooks, indent=2, sort_keys=False) + "\n", encoding="utf-8")
metrics_index = _extract_metrics_index(DASHBOARD_DIR)
metrics_json_path.write_text(
json.dumps(metrics_index, indent=2, sort_keys=False) + "\n", encoding="utf-8"
)
print(f"Wrote {catalog_path.relative_to(REPO_ROOT)}") print(f"Wrote {catalog_path.relative_to(REPO_ROOT)}")
print(f"Wrote {catalog_json_path.relative_to(REPO_ROOT)}") print(f"Wrote {catalog_json_path.relative_to(REPO_ROOT)}")
print(f"Wrote {summary_path.relative_to(REPO_ROOT)}") print(f"Wrote {summary_path.relative_to(REPO_ROOT)}")
print(f"Wrote {diagram_path.relative_to(REPO_ROOT)}") print(f"Wrote {diagram_path.relative_to(REPO_ROOT)}")
print(f"Wrote {runbooks_json_path.relative_to(REPO_ROOT)}") print(f"Wrote {runbooks_json_path.relative_to(REPO_ROOT)}")
print(f"Wrote {metrics_json_path.relative_to(REPO_ROOT)}")
if args.sync_comms:
comms_dir = REPO_ROOT / "services" / "comms" / "knowledge"
_sync_tree(out_dir, comms_dir)
print(f"Synced {out_dir.relative_to(REPO_ROOT)} -> {comms_dir.relative_to(REPO_ROOT)}")
return 0 return 0
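The new catalog/metrics.json gives downstream consumers (atlasbot projects it under /kb/catalog/metrics.json, per the deployment further down) a flat list of dashboard panels with their PromQL expressions. A minimal lookup sketch, using only the fields emitted by _extract_metrics_index above; the helper name and the keyword filter are illustrative, not part of the script:

import json
from pathlib import Path

def find_panels(keyword: str, kb_dir: str = "/kb") -> list[dict]:
    # kb_dir mirrors the knowledge mount used by atlasbot; point it elsewhere locally.
    entries = json.loads((Path(kb_dir) / "catalog" / "metrics.json").read_text(encoding="utf-8"))
    return [e for e in entries if any(keyword in expr for expr in e.get("exprs", []))]

# Example: every panel whose query touches container CPU usage.
for e in find_panels("container_cpu_usage_seconds_total"):
    print(f'{e["dashboard"]} :: {e["panel_title"]}')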

View File

@ -20,8 +20,9 @@ spec:
labels: labels:
app: ollama app: ollama
annotations: annotations:
ai.bstein.dev/model: qwen2.5-coder:7b-instruct-q4_0 ai.bstein.dev/model: qwen2.5:14b-instruct-q4_0
ai.bstein.dev/gpu: GPU pool (titan-20/21/22/24) ai.bstein.dev/gpu: GPU pool (titan-22/24)
ai.bstein.dev/restartedAt: "2026-01-26T12:00:00Z"
spec: spec:
affinity: affinity:
nodeAffinity: nodeAffinity:
@ -31,8 +32,6 @@ spec:
- key: kubernetes.io/hostname - key: kubernetes.io/hostname
operator: In operator: In
values: values:
- titan-20
- titan-21
- titan-22 - titan-22
- titan-24 - titan-24
runtimeClassName: nvidia runtimeClassName: nvidia
@ -53,7 +52,7 @@ spec:
- name: OLLAMA_MODELS - name: OLLAMA_MODELS
value: /root/.ollama value: /root/.ollama
- name: OLLAMA_MODEL - name: OLLAMA_MODEL
value: qwen2.5-coder:7b-instruct-q4_0 value: qwen2.5:14b-instruct-q4_0
command: command:
- /bin/sh - /bin/sh
- -c - -c
@ -68,8 +67,8 @@ spec:
mountPath: /root/.ollama mountPath: /root/.ollama
resources: resources:
requests: requests:
cpu: 250m cpu: 500m
memory: 1Gi memory: 2Gi
nvidia.com/gpu.shared: 1 nvidia.com/gpu.shared: 1
limits: limits:
nvidia.com/gpu.shared: 1 nvidia.com/gpu.shared: 1
@ -96,10 +95,10 @@ spec:
mountPath: /root/.ollama mountPath: /root/.ollama
resources: resources:
requests: requests:
cpu: "2" cpu: "4"
memory: 8Gi memory: 16Gi
nvidia.com/gpu.shared: 1 nvidia.com/gpu.shared: 1
limits: limits:
cpu: "4" cpu: "8"
memory: 12Gi memory: 24Gi
nvidia.com/gpu.shared: 1 nvidia.com/gpu.shared: 1
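With the GPU pool narrowed to titan-22/24 and the default model moved to qwen2.5:14b-instruct-q4_0, a quick post-rollout check is to ask the in-cluster Ollama service which models it has actually pulled. A sketch using Ollama's model-listing endpoint, run from any pod with cluster DNS; treat the endpoint path as an assumption to verify against the Ollama version in use:

import json
from urllib import request

OLLAMA = "http://ollama.ai.svc.cluster.local:11434"
with request.urlopen(f"{OLLAMA}/api/tags", timeout=10) as resp:
    models = [m["name"] for m in json.load(resp).get("models", [])]
print("qwen2.5:14b-instruct-q4_0 present:", "qwen2.5:14b-instruct-q4_0" in models)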

View File

@ -28,6 +28,7 @@ spec:
{{ with secret "kv/data/atlas/shared/chat-ai-keys-runtime" }} {{ with secret "kv/data/atlas/shared/chat-ai-keys-runtime" }}
export CHAT_KEY_MATRIX="{{ .Data.data.matrix }}" export CHAT_KEY_MATRIX="{{ .Data.data.matrix }}"
export CHAT_KEY_HOMEPAGE="{{ .Data.data.homepage }}" export CHAT_KEY_HOMEPAGE="{{ .Data.data.homepage }}"
export AI_ATLASBOT_TOKEN="{{ .Data.data.homepage }}"
{{ end }} {{ end }}
{{ with secret "kv/data/atlas/shared/portal-e2e-client" }} {{ with secret "kv/data/atlas/shared/portal-e2e-client" }}
export PORTAL_E2E_CLIENT_ID="{{ .Data.data.client_id }}" export PORTAL_E2E_CLIENT_ID="{{ .Data.data.client_id }}"
@ -58,14 +59,18 @@ spec:
args: args:
- >- - >-
. /vault/secrets/portal-env.sh . /vault/secrets/portal-env.sh
&& exec gunicorn -b 0.0.0.0:8080 --workers 2 --timeout 180 app:app && exec gunicorn -b 0.0.0.0:8080 --workers 2 --timeout 600 app:app
env: env:
- name: AI_CHAT_API - name: AI_CHAT_API
value: http://ollama.ai.svc.cluster.local:11434 value: http://ollama.ai.svc.cluster.local:11434
- name: AI_CHAT_MODEL - name: AI_CHAT_MODEL
value: qwen2.5-coder:7b-instruct-q4_0 value: qwen2.5-coder:7b-instruct-q4_0
- name: AI_CHAT_TIMEOUT_SEC - name: AI_CHAT_TIMEOUT_SEC
value: "60" value: "480"
- name: AI_ATLASBOT_ENDPOINT
value: http://atlasbot.comms.svc.cluster.local:8090/v1/answer
- name: AI_ATLASBOT_TIMEOUT_SEC
value: "30"
- name: AI_NODE_NAME - name: AI_NODE_NAME
valueFrom: valueFrom:
fieldRef: fieldRef:
@ -91,10 +96,28 @@ spec:
value: atlas value: atlas
- name: KEYCLOAK_ADMIN_CLIENT_ID - name: KEYCLOAK_ADMIN_CLIENT_ID
value: bstein-dev-home-admin value: bstein-dev-home-admin
- name: ARIADNE_URL
value: http://ariadne.maintenance.svc.cluster.local
- name: ARIADNE_TIMEOUT_SEC
value: "10"
- name: ACCOUNT_ALLOWED_GROUPS - name: ACCOUNT_ALLOWED_GROUPS
value: "" value: ""
- name: HTTP_CHECK_TIMEOUT_SEC - name: HTTP_CHECK_TIMEOUT_SEC
value: "2" value: "2"
- name: PORTAL_DB_POOL_MIN
value: "0"
- name: PORTAL_DB_POOL_MAX
value: "5"
- name: PORTAL_DB_CONNECT_TIMEOUT_SEC
value: "5"
- name: PORTAL_DB_LOCK_TIMEOUT_SEC
value: "5"
- name: PORTAL_DB_STATEMENT_TIMEOUT_SEC
value: "30"
- name: PORTAL_DB_IDLE_IN_TX_TIMEOUT_SEC
value: "10"
- name: PORTAL_RUN_MIGRATIONS
value: "false"
- name: ACCESS_REQUEST_SUBMIT_RATE_LIMIT - name: ACCESS_REQUEST_SUBMIT_RATE_LIMIT
value: "30" value: "30"
- name: ACCESS_REQUEST_SUBMIT_RATE_WINDOW_SEC - name: ACCESS_REQUEST_SUBMIT_RATE_WINDOW_SEC
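The backend now reaches atlasbot directly at AI_ATLASBOT_ENDPOINT with a 30-second budget and reuses the homepage chat key as AI_ATLASBOT_TOKEN. A hedged sketch of what that call could look like; the request/response field names ("question", "answer") and the bearer-token header are assumptions, since the /v1/answer schema is not shown in this diff:

import json
import os
from urllib import request

def ask_atlasbot(question: str) -> str | None:
    endpoint = os.environ.get("AI_ATLASBOT_ENDPOINT",
                              "http://atlasbot.comms.svc.cluster.local:8090/v1/answer")
    timeout = float(os.environ.get("AI_ATLASBOT_TIMEOUT_SEC", "30"))
    token = os.environ.get("AI_ATLASBOT_TOKEN", "")
    req = request.Request(
        endpoint,
        data=json.dumps({"question": question}).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
        method="POST",
    )
    try:
        with request.urlopen(req, timeout=timeout) as resp:
            return json.load(resp).get("answer")
    except Exception:
        return None  # caller can fall back to the direct Ollama chat path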

View File

@ -47,6 +47,8 @@ spec:
env: env:
- name: UPSTREAM_URL - name: UPSTREAM_URL
value: http://bstein-dev-home-backend/api/chat value: http://bstein-dev-home-backend/api/chat
- name: UPSTREAM_TIMEOUT_SEC
value: "600"
ports: ports:
- name: http - name: http
containerPort: 8080 containerPort: 8080
@ -65,10 +67,10 @@ spec:
resources: resources:
requests: requests:
cpu: 20m cpu: 20m
memory: 64Mi memory: 128Mi
limits: limits:
cpu: 200m cpu: 200m
memory: 256Mi memory: 512Mi
volumeMounts: volumeMounts:
- name: code - name: code
mountPath: /app/gateway.py mountPath: /app/gateway.py

View File

@ -7,6 +7,8 @@ metadata:
spec: spec:
image: registry.bstein.dev/bstein/bstein-dev-home-frontend image: registry.bstein.dev/bstein/bstein-dev-home-frontend
interval: 1m0s interval: 1m0s
secretRef:
name: harbor-regcred
--- ---
apiVersion: image.toolkit.fluxcd.io/v1beta2 apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImagePolicy kind: ImagePolicy
@ -28,6 +30,8 @@ metadata:
spec: spec:
image: registry.bstein.dev/bstein/bstein-dev-home-backend image: registry.bstein.dev/bstein/bstein-dev-home-backend
interval: 1m0s interval: 1m0s
secretRef:
name: harbor-regcred
--- ---
apiVersion: image.toolkit.fluxcd.io/v1beta2 apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImagePolicy kind: ImagePolicy

View File

@ -16,13 +16,13 @@ resources:
- backend-deployment.yaml - backend-deployment.yaml
- backend-service.yaml - backend-service.yaml
- vaultwarden-cred-sync-cronjob.yaml - vaultwarden-cred-sync-cronjob.yaml
- portal-onboarding-e2e-test-job.yaml - oneoffs/portal-onboarding-e2e-test-job.yaml
- ingress.yaml - ingress.yaml
images: images:
- name: registry.bstein.dev/bstein/bstein-dev-home-frontend - name: registry.bstein.dev/bstein/bstein-dev-home-frontend
newTag: 0.1.1-102 # {"$imagepolicy": "bstein-dev-home:bstein-dev-home-frontend"} newTag: 0.1.1-162 # {"$imagepolicy": "bstein-dev-home:bstein-dev-home-frontend:tag"}
- name: registry.bstein.dev/bstein/bstein-dev-home-backend - name: registry.bstein.dev/bstein/bstein-dev-home-backend
newTag: 0.1.1-103 # {"$imagepolicy": "bstein-dev-home:bstein-dev-home-backend"} newTag: 0.1.1-162 # {"$imagepolicy": "bstein-dev-home:bstein-dev-home-backend:tag"}
configMapGenerator: configMapGenerator:
- name: chat-ai-gateway - name: chat-ai-gateway
namespace: bstein-dev-home namespace: bstein-dev-home

View File

@ -0,0 +1,6 @@
# services/bstein-dev-home/oneoffs/migrations/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: bstein-dev-home
resources:
- portal-migrate-job.yaml

View File

@ -0,0 +1,48 @@
# services/bstein-dev-home/oneoffs/migrations/portal-migrate-job.yaml
# One-off job for bstein-dev-home/bstein-dev-home-portal-migrate-36.
# Purpose: bstein dev home portal migrate 36 (see container args/env in this file).
# Run by setting spec.suspend to false, reconcile, then set it back to true.
# Safe to delete the finished Job/pod; it should not run continuously.
apiVersion: batch/v1
kind: Job
metadata:
name: bstein-dev-home-portal-migrate-36
namespace: bstein-dev-home
annotations:
kustomize.toolkit.fluxcd.io/force: "true"
spec:
suspend: true
backoffLimit: 1
ttlSecondsAfterFinished: 3600
template:
metadata:
labels:
app: bstein-dev-home-portal-migrate
annotations:
vault.hashicorp.com/agent-inject: "true"
vault.hashicorp.com/role: "bstein-dev-home"
vault.hashicorp.com/agent-inject-secret-portal-env.sh: "kv/data/atlas/portal/atlas-portal-db"
vault.hashicorp.com/agent-inject-template-portal-env.sh: |
{{ with secret "kv/data/atlas/portal/atlas-portal-db" }}
export PORTAL_DATABASE_URL="{{ .Data.data.PORTAL_DATABASE_URL }}"
{{ end }}
spec:
serviceAccountName: bstein-dev-home
restartPolicy: Never
nodeSelector:
kubernetes.io/arch: arm64
node-role.kubernetes.io/worker: "true"
imagePullSecrets:
- name: harbor-regcred
containers:
- name: migrate
image: registry.bstein.dev/bstein/bstein-dev-home-backend:0.1.1-95
imagePullPolicy: Always
command: ["/bin/sh", "-c"]
args:
- >-
. /vault/secrets/portal-env.sh
&& exec python -m atlas_portal.migrate
env:
- name: PORTAL_RUN_MIGRATIONS
value: "true"

View File

@ -1,10 +1,15 @@
# services/bstein-dev-home/portal-onboarding-e2e-test-job.yaml # services/bstein-dev-home/oneoffs/portal-onboarding-e2e-test-job.yaml
# One-off job for bstein-dev-home/portal-onboarding-e2e-test-27.
# Purpose: portal onboarding e2e test 27 (see container args/env in this file).
# Run by setting spec.suspend to false, reconcile, then set it back to true.
# Safe to delete the finished Job/pod; it should not run continuously.
apiVersion: batch/v1 apiVersion: batch/v1
kind: Job kind: Job
metadata: metadata:
name: portal-onboarding-e2e-test-19 name: portal-onboarding-e2e-test-27
namespace: bstein-dev-home namespace: bstein-dev-home
spec: spec:
suspend: true
backoffLimit: 0 backoffLimit: 0
template: template:
metadata: metadata:

View File

@ -6,6 +6,7 @@ from urllib import request, error
UPSTREAM = os.environ.get("UPSTREAM_URL", "http://bstein-dev-home-backend/api/chat") UPSTREAM = os.environ.get("UPSTREAM_URL", "http://bstein-dev-home-backend/api/chat")
KEY_MATRIX = os.environ.get("CHAT_KEY_MATRIX", "") KEY_MATRIX = os.environ.get("CHAT_KEY_MATRIX", "")
KEY_HOMEPAGE = os.environ.get("CHAT_KEY_HOMEPAGE", "") KEY_HOMEPAGE = os.environ.get("CHAT_KEY_HOMEPAGE", "")
UPSTREAM_TIMEOUT_SEC = float(os.environ.get("UPSTREAM_TIMEOUT_SEC", "90"))
ALLOWED = {k for k in (KEY_MATRIX, KEY_HOMEPAGE) if k} ALLOWED = {k for k in (KEY_MATRIX, KEY_HOMEPAGE) if k}
@ -41,7 +42,7 @@ class Handler(BaseHTTPRequestHandler):
headers={"Content-Type": "application/json"}, headers={"Content-Type": "application/json"},
method="POST", method="POST",
) )
with request.urlopen(upstream_req, timeout=90) as resp: with request.urlopen(upstream_req, timeout=UPSTREAM_TIMEOUT_SEC) as resp:
data = resp.read() data = resp.read()
self.send_response(resp.status) self.send_response(resp.status)
for k, v in resp.headers.items(): for k, v in resp.headers.items():
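UPSTREAM_TIMEOUT_SEC is parsed with float() at import time, so a malformed value would crash the gateway on startup rather than at request time. A small defensive-parse sketch, purely illustrative and not the deployed gateway code:

import os

def _env_timeout(name: str, default: float) -> float:
    # Fall back to the default instead of raising on a bad value.
    raw = os.environ.get(name, "")
    try:
        return float(raw) if raw else default
    except ValueError:
        return default

UPSTREAM_TIMEOUT_SEC = _env_timeout("UPSTREAM_TIMEOUT_SEC", 90.0)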

View File

@ -11,7 +11,7 @@ spec:
roleName: "bstein-dev-home" roleName: "bstein-dev-home"
objects: | objects: |
- objectName: "harbor-pull__dockerconfigjson" - objectName: "harbor-pull__dockerconfigjson"
secretPath: "kv/data/atlas/harbor-pull/bstein-dev-home" secretPath: "kv/data/atlas/shared/harbor-pull"
secretKey: "dockerconfigjson" secretKey: "dockerconfigjson"
secretObjects: secretObjects:
- secretName: harbor-regcred - secretName: harbor-regcred

View File

@ -8,6 +8,7 @@ metadata:
atlas.bstein.dev/glue: "true" atlas.bstein.dev/glue: "true"
spec: spec:
schedule: "*/15 * * * *" schedule: "*/15 * * * *"
suspend: true
concurrencyPolicy: Forbid concurrencyPolicy: Forbid
successfulJobsHistoryLimit: 1 successfulJobsHistoryLimit: 1
failedJobsHistoryLimit: 3 failedJobsHistoryLimit: 3

View File

@ -16,7 +16,7 @@ spec:
labels: labels:
app: atlasbot app: atlasbot
annotations: annotations:
checksum/atlasbot-configmap: manual-atlasbot-4 checksum/atlasbot-configmap: manual-atlasbot-101
vault.hashicorp.com/agent-inject: "true" vault.hashicorp.com/agent-inject: "true"
vault.hashicorp.com/role: "comms" vault.hashicorp.com/role: "comms"
vault.hashicorp.com/agent-inject-secret-turn-secret: "kv/data/atlas/comms/turn-shared-secret" vault.hashicorp.com/agent-inject-secret-turn-secret: "kv/data/atlas/comms/turn-shared-secret"
@ -73,12 +73,33 @@ spec:
value: /kb value: /kb
- name: VM_URL - name: VM_URL
value: http://victoria-metrics-single-server.monitoring.svc.cluster.local:8428 value: http://victoria-metrics-single-server.monitoring.svc.cluster.local:8428
- name: ARIADNE_STATE_URL
value: http://ariadne.maintenance.svc.cluster.local/api/internal/cluster/state
- name: BOT_USER - name: BOT_USER
value: atlasbot value: atlasbot
- name: BOT_MENTIONS
value: atlasbot,aatlasbot,atlas_quick,atlas_smart
- name: OLLAMA_URL - name: OLLAMA_URL
value: https://chat.ai.bstein.dev/ value: http://ollama.ai.svc.cluster.local:11434
- name: OLLAMA_MODEL - name: OLLAMA_MODEL
value: qwen2.5-coder:7b-instruct-q4_0 value: qwen2.5:14b-instruct
- name: ATLASBOT_MODEL_FAST
value: qwen2.5:14b-instruct-q4_0
- name: ATLASBOT_MODEL_DEEP
value: qwen2.5:14b-instruct
- name: OLLAMA_FALLBACK_MODEL
value: qwen2.5:14b-instruct-q4_0
- name: OLLAMA_TIMEOUT_SEC
value: "600"
- name: ATLASBOT_THINKING_INTERVAL_SEC
value: "120"
- name: ATLASBOT_SNAPSHOT_TTL_SEC
value: "30"
- name: ATLASBOT_HTTP_PORT
value: "8090"
ports:
- name: http
containerPort: 8090
resources: resources:
requests: requests:
cpu: 100m cpu: 100m
@ -110,6 +131,8 @@ spec:
path: catalog/atlas.json path: catalog/atlas.json
- key: atlas-summary.json - key: atlas-summary.json
path: catalog/atlas-summary.json path: catalog/atlas-summary.json
- key: metrics.json
path: catalog/metrics.json
- key: runbooks.json - key: runbooks.json
path: catalog/runbooks.json path: catalog/runbooks.json
- key: atlas-http.mmd - key: atlas-http.mmd
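Atlasbot now carries a fast model, a deep model, and a fallback, and listens on 8090 for the portal's /v1/answer calls. A hedged sketch of how the model split might be resolved from these env vars; the "mode" flag and the availability check are assumptions, since the selection logic lives in the atlasbot code rather than in this manifest:

import os

MODEL_FAST = os.environ.get("ATLASBOT_MODEL_FAST", "qwen2.5:14b-instruct-q4_0")
MODEL_DEEP = os.environ.get("ATLASBOT_MODEL_DEEP", "qwen2.5:14b-instruct")
MODEL_FALLBACK = os.environ.get("OLLAMA_FALLBACK_MODEL", MODEL_FAST)

def pick_model(mode: str, deep_available: bool = True) -> str:
    # Deep requests use the unquantized model when it is loadable,
    # otherwise drop to the fallback; everything else stays on fast.
    if mode == "deep":
        return MODEL_DEEP if deep_available else MODEL_FALLBACK
    return MODEL_FAST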

View File

@ -0,0 +1,15 @@
apiVersion: v1
kind: Service
metadata:
name: atlasbot
namespace: comms
labels:
app: atlasbot
spec:
selector:
app: atlasbot
ports:
- name: http
port: 8090
targetPort: 8090
type: ClusterIP

View File

@ -8,7 +8,7 @@ metadata:
atlas.bstein.dev/glue: "true" atlas.bstein.dev/glue: "true"
spec: spec:
schedule: "*/1 * * * *" schedule: "*/1 * * * *"
suspend: false suspend: true
concurrencyPolicy: Forbid concurrencyPolicy: Forbid
successfulJobsHistoryLimit: 1 successfulJobsHistoryLimit: 1
failedJobsHistoryLimit: 1 failedJobsHistoryLimit: 1

View File

@ -140,6 +140,7 @@ spec:
autocreate_auto_join_rooms: true autocreate_auto_join_rooms: true
default_room_version: "11" default_room_version: "11"
experimental_features: experimental_features:
msc4108_enabled: true
msc3266_enabled: true msc3266_enabled: true
msc4143_enabled: true msc4143_enabled: true
msc4222_enabled: true msc4222_enabled: true

View File

@ -1,8 +1,8 @@
{ {
"counts": { "counts": {
"helmrelease_host_hints": 17, "helmrelease_host_hints": 19,
"http_endpoints": 37, "http_endpoints": 45,
"services": 43, "services": 47,
"workloads": 54 "workloads": 74
} }
} }

File diff suppressed because it is too large

View File

@ -1,4 +1,4 @@
# services/comms/knowledge/catalog/atlas.yaml # knowledge/catalog/atlas.yaml
# Generated by scripts/knowledge_render_atlas.py (do not edit by hand) # Generated by scripts/knowledge_render_atlas.py (do not edit by hand)
cluster: atlas cluster: atlas
sources: sources:
@ -8,6 +8,15 @@ sources:
- name: bstein-dev-home - name: bstein-dev-home
path: services/bstein-dev-home path: services/bstein-dev-home
targetNamespace: bstein-dev-home targetNamespace: bstein-dev-home
- name: bstein-dev-home-migrations
path: services/bstein-dev-home/migrations
targetNamespace: bstein-dev-home
- name: cert-manager
path: infrastructure/cert-manager
targetNamespace: cert-manager
- name: cert-manager-cleanup
path: infrastructure/cert-manager/cleanup
targetNamespace: cert-manager
- name: comms - name: comms
path: services/comms path: services/comms
targetNamespace: comms targetNamespace: comms
@ -17,6 +26,9 @@ sources:
- name: crypto - name: crypto
path: services/crypto path: services/crypto
targetNamespace: crypto targetNamespace: crypto
- name: finance
path: services/finance
targetNamespace: finance
- name: flux-system - name: flux-system
path: clusters/atlas/flux-system path: clusters/atlas/flux-system
targetNamespace: null targetNamespace: null
@ -29,6 +41,9 @@ sources:
- name: harbor - name: harbor
path: services/harbor path: services/harbor
targetNamespace: harbor targetNamespace: harbor
- name: health
path: services/health
targetNamespace: health
- name: helm - name: helm
path: infrastructure/sources/helm path: infrastructure/sources/helm
targetNamespace: flux-system targetNamespace: flux-system
@ -44,6 +59,12 @@ sources:
- name: logging - name: logging
path: services/logging path: services/logging
targetNamespace: null targetNamespace: null
- name: longhorn
path: infrastructure/longhorn/core
targetNamespace: longhorn-system
- name: longhorn-adopt
path: infrastructure/longhorn/adopt
targetNamespace: longhorn-system
- name: longhorn-ui - name: longhorn-ui
path: infrastructure/longhorn/ui-ingress path: infrastructure/longhorn/ui-ingress
targetNamespace: longhorn-system targetNamespace: longhorn-system
@ -98,9 +119,15 @@ sources:
- name: vault-csi - name: vault-csi
path: infrastructure/vault-csi path: infrastructure/vault-csi
targetNamespace: kube-system targetNamespace: kube-system
- name: vault-injector
path: infrastructure/vault-injector
targetNamespace: vault
- name: vaultwarden - name: vaultwarden
path: services/vaultwarden path: services/vaultwarden
targetNamespace: vaultwarden targetNamespace: vaultwarden
- name: wallet-monero-temp
path: services/crypto/wallet-monero-temp
targetNamespace: crypto
- name: xmr-miner - name: xmr-miner
path: services/crypto/xmr-miner path: services/crypto/xmr-miner
targetNamespace: crypto targetNamespace: crypto
@ -124,7 +151,7 @@ workloads:
kubernetes.io/arch: arm64 kubernetes.io/arch: arm64
node-role.kubernetes.io/worker: 'true' node-role.kubernetes.io/worker: 'true'
images: images:
- registry.bstein.dev/bstein/bstein-dev-home-backend:0.1.1-92 - registry.bstein.dev/bstein/bstein-dev-home-backend:0.1.1-157
- kind: Deployment - kind: Deployment
namespace: bstein-dev-home namespace: bstein-dev-home
name: bstein-dev-home-frontend name: bstein-dev-home-frontend
@ -135,13 +162,22 @@ workloads:
kubernetes.io/arch: arm64 kubernetes.io/arch: arm64
node-role.kubernetes.io/worker: 'true' node-role.kubernetes.io/worker: 'true'
images: images:
- registry.bstein.dev/bstein/bstein-dev-home-frontend:0.1.1-92 - registry.bstein.dev/bstein/bstein-dev-home-frontend:0.1.1-157
- kind: Deployment
namespace: bstein-dev-home
name: bstein-dev-home-vault-sync
labels:
app: bstein-dev-home-vault-sync
serviceAccountName: bstein-dev-home-vault-sync
nodeSelector: {}
images:
- alpine:3.20
- kind: Deployment - kind: Deployment
namespace: bstein-dev-home namespace: bstein-dev-home
name: chat-ai-gateway name: chat-ai-gateway
labels: labels:
app: chat-ai-gateway app: chat-ai-gateway
serviceAccountName: null serviceAccountName: bstein-dev-home
nodeSelector: nodeSelector:
kubernetes.io/arch: arm64 kubernetes.io/arch: arm64
node-role.kubernetes.io/worker: 'true' node-role.kubernetes.io/worker: 'true'
@ -157,12 +193,21 @@ workloads:
hardware: rpi5 hardware: rpi5
images: images:
- python:3.11-slim - python:3.11-slim
- kind: Deployment
namespace: comms
name: comms-vault-sync
labels:
app: comms-vault-sync
serviceAccountName: comms-vault
nodeSelector: {}
images:
- alpine:3.20
- kind: Deployment - kind: Deployment
namespace: comms namespace: comms
name: coturn name: coturn
labels: labels:
app: coturn app: coturn
serviceAccountName: null serviceAccountName: comms-vault
nodeSelector: nodeSelector:
hardware: rpi5 hardware: rpi5
images: images:
@ -182,7 +227,7 @@ workloads:
name: livekit name: livekit
labels: labels:
app: livekit app: livekit
serviceAccountName: null serviceAccountName: comms-vault
nodeSelector: nodeSelector:
hardware: rpi5 hardware: rpi5
images: images:
@ -192,17 +237,17 @@ workloads:
name: livekit-token-service name: livekit-token-service
labels: labels:
app: livekit-token-service app: livekit-token-service
serviceAccountName: null serviceAccountName: comms-vault
nodeSelector: nodeSelector:
hardware: rpi5 hardware: rpi5
images: images:
- ghcr.io/element-hq/lk-jwt-service:0.3.0 - registry.bstein.dev/tools/lk-jwt-service-vault:0.3.0
- kind: Deployment - kind: Deployment
namespace: comms namespace: comms
name: matrix-authentication-service name: matrix-authentication-service
labels: labels:
app: matrix-authentication-service app: matrix-authentication-service
serviceAccountName: null serviceAccountName: comms-vault
nodeSelector: nodeSelector:
hardware: rpi5 hardware: rpi5
images: images:
@ -212,7 +257,7 @@ workloads:
name: matrix-guest-register name: matrix-guest-register
labels: labels:
app.kubernetes.io/name: matrix-guest-register app.kubernetes.io/name: matrix-guest-register
serviceAccountName: null serviceAccountName: comms-vault
nodeSelector: {} nodeSelector: {}
images: images:
- python:3.11-slim - python:3.11-slim
@ -235,12 +280,21 @@ workloads:
node-role.kubernetes.io/worker: 'true' node-role.kubernetes.io/worker: 'true'
images: images:
- ghcr.io/tari-project/xmrig@sha256:80defbfd0b640d604c91cb5101d3642db7928e1e68ee3c6b011289b3565a39d9 - ghcr.io/tari-project/xmrig@sha256:80defbfd0b640d604c91cb5101d3642db7928e1e68ee3c6b011289b3565a39d9
- kind: Deployment
namespace: crypto
name: crypto-vault-sync
labels:
app: crypto-vault-sync
serviceAccountName: crypto-vault-sync
nodeSelector: {}
images:
- alpine:3.20
- kind: Deployment - kind: Deployment
namespace: crypto namespace: crypto
name: monero-p2pool name: monero-p2pool
labels: labels:
app: monero-p2pool app: monero-p2pool
serviceAccountName: null serviceAccountName: crypto-vault-sync
nodeSelector: nodeSelector:
node-role.kubernetes.io/worker: 'true' node-role.kubernetes.io/worker: 'true'
images: images:
@ -255,6 +309,38 @@ workloads:
node-role.kubernetes.io/worker: 'true' node-role.kubernetes.io/worker: 'true'
images: images:
- registry.bstein.dev/crypto/monerod:0.18.4.1 - registry.bstein.dev/crypto/monerod:0.18.4.1
- kind: Deployment
namespace: crypto
name: wallet-monero-temp
labels:
app: wallet-monero-temp
serviceAccountName: crypto-vault-sync
nodeSelector:
node-role.kubernetes.io/worker: 'true'
images:
- registry.bstein.dev/crypto/monero-wallet-rpc:0.18.4.1
- kind: Deployment
namespace: finance
name: actual-budget
labels:
app: actual-budget
serviceAccountName: finance-vault
nodeSelector:
kubernetes.io/arch: arm64
node-role.kubernetes.io/worker: 'true'
images:
- actualbudget/actual-server:26.1.0-alpine@sha256:34aae5813fdfee12af2a50c4d0667df68029f1d61b90f45f282473273eb70d0d
- kind: Deployment
namespace: finance
name: firefly
labels:
app: firefly
serviceAccountName: finance-vault
nodeSelector:
kubernetes.io/arch: arm64
node-role.kubernetes.io/worker: 'true'
images:
- fireflyiii/core:version-6.4.15
- kind: Deployment - kind: Deployment
namespace: flux-system namespace: flux-system
name: helm-controller name: helm-controller
@ -344,17 +430,38 @@ workloads:
name: gitea name: gitea
labels: labels:
app: gitea app: gitea
serviceAccountName: null serviceAccountName: gitea-vault
nodeSelector: nodeSelector:
node-role.kubernetes.io/worker: 'true' node-role.kubernetes.io/worker: 'true'
images: images:
- gitea/gitea:1.23 - gitea/gitea:1.23
- kind: Deployment
namespace: harbor
name: harbor-vault-sync
labels:
app: harbor-vault-sync
serviceAccountName: harbor-vault-sync
nodeSelector: {}
images:
- alpine:3.20
- kind: Deployment
namespace: health
name: wger
labels:
app: wger
serviceAccountName: health-vault-sync
nodeSelector:
kubernetes.io/arch: arm64
node-role.kubernetes.io/worker: 'true'
images:
- nginx:1.27.5-alpine@sha256:65645c7bb6a0661892a8b03b89d0743208a18dd2f3f17a54ef4b76fb8e2f2a10
- wger/server@sha256:710588b78af4e0aa0b4d8a8061e4563e16eae80eeaccfe7f9e0d9cbdd7f0cbc5
- kind: Deployment - kind: Deployment
namespace: jellyfin namespace: jellyfin
name: jellyfin name: jellyfin
labels: labels:
app: jellyfin app: jellyfin
serviceAccountName: null serviceAccountName: pegasus-vault-sync
nodeSelector: {} nodeSelector: {}
images: images:
- docker.io/jellyfin/jellyfin:10.11.5 - docker.io/jellyfin/jellyfin:10.11.5
@ -363,13 +470,22 @@ workloads:
name: pegasus name: pegasus
labels: labels:
app: pegasus app: pegasus
serviceAccountName: null serviceAccountName: pegasus-vault-sync
nodeSelector: nodeSelector:
kubernetes.io/arch: arm64 kubernetes.io/arch: arm64
node-role.kubernetes.io/worker: 'true' node-role.kubernetes.io/worker: 'true'
images: images:
- alpine:3.20 - alpine:3.20
- registry.bstein.dev/streaming/pegasus:1.2.32 - registry.bstein.dev/streaming/pegasus-vault:1.2.32
- kind: Deployment
namespace: jellyfin
name: pegasus-vault-sync
labels:
app: pegasus-vault-sync
serviceAccountName: pegasus-vault-sync
nodeSelector: {}
images:
- alpine:3.20
- kind: Deployment - kind: Deployment
namespace: jenkins namespace: jenkins
name: jenkins name: jenkins
@ -381,6 +497,26 @@ workloads:
node-role.kubernetes.io/worker: 'true' node-role.kubernetes.io/worker: 'true'
images: images:
- jenkins/jenkins:2.528.3-jdk21 - jenkins/jenkins:2.528.3-jdk21
- kind: Deployment
namespace: jenkins
name: jenkins-vault-sync
labels:
app: jenkins-vault-sync
serviceAccountName: jenkins-vault-sync
nodeSelector:
kubernetes.io/arch: arm64
node-role.kubernetes.io/worker: 'true'
images:
- alpine:3.20
- kind: DaemonSet
namespace: kube-system
name: ntp-sync
labels:
app: ntp-sync
serviceAccountName: null
nodeSelector: {}
images:
- public.ecr.aws/docker/library/busybox:1.36.1
- kind: DaemonSet - kind: DaemonSet
namespace: kube-system namespace: kube-system
name: nvidia-device-plugin-jetson name: nvidia-device-plugin-jetson
@ -427,6 +563,16 @@ workloads:
kubernetes.io/os: linux kubernetes.io/os: linux
images: images:
- hashicorp/vault-csi-provider:1.7.0 - hashicorp/vault-csi-provider:1.7.0
- kind: Deployment
namespace: kube-system
name: coredns
labels:
k8s-app: kube-dns
serviceAccountName: coredns
nodeSelector:
kubernetes.io/os: linux
images:
- registry.bstein.dev/infra/coredns:1.12.1
- kind: DaemonSet - kind: DaemonSet
namespace: logging namespace: logging
name: node-image-gc-rpi4 name: node-image-gc-rpi4
@ -457,22 +603,41 @@ workloads:
hardware: rpi5 hardware: rpi5
images: images:
- bitnami/kubectl@sha256:554ab88b1858e8424c55de37ad417b16f2a0e65d1607aa0f3fe3ce9b9f10b131 - bitnami/kubectl@sha256:554ab88b1858e8424c55de37ad417b16f2a0e65d1607aa0f3fe3ce9b9f10b131
- kind: Deployment
namespace: logging
name: logging-vault-sync
labels:
app: logging-vault-sync
serviceAccountName: logging-vault-sync
nodeSelector: {}
images:
- alpine:3.20
- kind: Deployment - kind: Deployment
namespace: logging namespace: logging
name: oauth2-proxy-logs name: oauth2-proxy-logs
labels: labels:
app: oauth2-proxy-logs app: oauth2-proxy-logs
serviceAccountName: null serviceAccountName: logging-vault-sync
nodeSelector: nodeSelector:
node-role.kubernetes.io/worker: 'true' node-role.kubernetes.io/worker: 'true'
images: images:
- quay.io/oauth2-proxy/oauth2-proxy:v7.6.0 - registry.bstein.dev/tools/oauth2-proxy-vault:v7.6.0
- kind: Deployment
namespace: longhorn-system
name: longhorn-vault-sync
labels:
app: longhorn-vault-sync
serviceAccountName: longhorn-vault-sync
nodeSelector:
node-role.kubernetes.io/worker: 'true'
images:
- alpine:3.20
- kind: Deployment - kind: Deployment
namespace: longhorn-system namespace: longhorn-system
name: oauth2-proxy-longhorn name: oauth2-proxy-longhorn
labels: labels:
app: oauth2-proxy-longhorn app: oauth2-proxy-longhorn
serviceAccountName: null serviceAccountName: longhorn-vault
nodeSelector: nodeSelector:
node-role.kubernetes.io/worker: 'true' node-role.kubernetes.io/worker: 'true'
images: images:
@ -489,13 +654,34 @@ workloads:
- registry.bstein.dev/bstein/kubectl:1.35.0 - registry.bstein.dev/bstein/kubectl:1.35.0
- kind: Deployment - kind: Deployment
namespace: mailu-mailserver namespace: mailu-mailserver
name: mailu-sync-listener name: mailu-vault-sync
labels: labels:
app: mailu-sync-listener app: mailu-vault-sync
serviceAccountName: null serviceAccountName: mailu-vault-sync
nodeSelector: {} nodeSelector: {}
images: images:
- python:3.11-alpine - alpine:3.20
- kind: DaemonSet
namespace: maintenance
name: disable-k3s-traefik
labels:
app: disable-k3s-traefik
serviceAccountName: disable-k3s-traefik
nodeSelector:
node-role.kubernetes.io/control-plane: 'true'
images:
- bitnami/kubectl@sha256:554ab88b1858e8424c55de37ad417b16f2a0e65d1607aa0f3fe3ce9b9f10b131
- kind: DaemonSet
namespace: maintenance
name: k3s-agent-restart
labels:
app: k3s-agent-restart
serviceAccountName: node-nofile
nodeSelector:
kubernetes.io/arch: arm64
node-role.kubernetes.io/worker: 'true'
images:
- bitnami/kubectl@sha256:554ab88b1858e8424c55de37ad417b16f2a0e65d1607aa0f3fe3ce9b9f10b131
- kind: DaemonSet - kind: DaemonSet
namespace: maintenance namespace: maintenance
name: node-image-sweeper name: node-image-sweeper
@ -515,6 +701,26 @@ workloads:
nodeSelector: {} nodeSelector: {}
images: images:
- bitnami/kubectl@sha256:554ab88b1858e8424c55de37ad417b16f2a0e65d1607aa0f3fe3ce9b9f10b131 - bitnami/kubectl@sha256:554ab88b1858e8424c55de37ad417b16f2a0e65d1607aa0f3fe3ce9b9f10b131
- kind: Deployment
namespace: maintenance
name: ariadne
labels:
app: ariadne
serviceAccountName: ariadne
nodeSelector:
kubernetes.io/arch: arm64
node-role.kubernetes.io/worker: 'true'
images:
- registry.bstein.dev/bstein/ariadne:0.1.0-49
- kind: Deployment
namespace: maintenance
name: maintenance-vault-sync
labels:
app: maintenance-vault-sync
serviceAccountName: maintenance-vault-sync
nodeSelector: {}
images:
- alpine:3.20
- kind: DaemonSet - kind: DaemonSet
namespace: monitoring namespace: monitoring
name: dcgm-exporter name: dcgm-exporter
@ -534,12 +740,21 @@ workloads:
jetson: 'true' jetson: 'true'
images: images:
- python:3.10-slim - python:3.10-slim
- kind: Deployment
namespace: monitoring
name: monitoring-vault-sync
labels:
app: monitoring-vault-sync
serviceAccountName: monitoring-vault-sync
nodeSelector: {}
images:
- alpine:3.20
- kind: Deployment - kind: Deployment
namespace: monitoring namespace: monitoring
name: postmark-exporter name: postmark-exporter
labels: labels:
app: postmark-exporter app: postmark-exporter
serviceAccountName: null serviceAccountName: monitoring-vault-sync
nodeSelector: {} nodeSelector: {}
images: images:
- python:3.12-alpine - python:3.12-alpine
@ -558,7 +773,7 @@ workloads:
name: nextcloud name: nextcloud
labels: labels:
app: nextcloud app: nextcloud
serviceAccountName: null serviceAccountName: nextcloud-vault
nodeSelector: nodeSelector:
hardware: rpi5 hardware: rpi5
images: images:
@ -568,7 +783,7 @@ workloads:
name: outline name: outline
labels: labels:
app: outline app: outline
serviceAccountName: null serviceAccountName: outline-vault
nodeSelector: nodeSelector:
node-role.kubernetes.io/worker: 'true' node-role.kubernetes.io/worker: 'true'
images: images:
@ -588,7 +803,7 @@ workloads:
name: planka name: planka
labels: labels:
app: planka app: planka
serviceAccountName: null serviceAccountName: planka-vault
nodeSelector: nodeSelector:
node-role.kubernetes.io/worker: 'true' node-role.kubernetes.io/worker: 'true'
images: images:
@ -603,13 +818,16 @@ workloads:
node-role.kubernetes.io/worker: 'true' node-role.kubernetes.io/worker: 'true'
images: images:
- postgres:15 - postgres:15
- quay.io/prometheuscommunity/postgres-exporter:v0.15.0
- kind: Deployment - kind: Deployment
namespace: sso namespace: sso
name: keycloak name: keycloak
labels: labels:
app: keycloak app: keycloak
serviceAccountName: null serviceAccountName: sso-vault
nodeSelector: {} nodeSelector:
kubernetes.io/arch: arm64
node-role.kubernetes.io/worker: 'true'
images: images:
- quay.io/keycloak/keycloak:26.0.7 - quay.io/keycloak/keycloak:26.0.7
- kind: Deployment - kind: Deployment
@ -617,17 +835,26 @@ workloads:
name: oauth2-proxy name: oauth2-proxy
labels: labels:
app: oauth2-proxy app: oauth2-proxy
serviceAccountName: null serviceAccountName: sso-vault
nodeSelector: nodeSelector:
node-role.kubernetes.io/worker: 'true' node-role.kubernetes.io/worker: 'true'
images: images:
- quay.io/oauth2-proxy/oauth2-proxy:v7.6.0 - registry.bstein.dev/tools/oauth2-proxy-vault:v7.6.0
- kind: Deployment
namespace: sso
name: sso-vault-sync
labels:
app: sso-vault-sync
serviceAccountName: sso-vault-sync
nodeSelector: {}
images:
- alpine:3.20
- kind: StatefulSet - kind: StatefulSet
namespace: sso namespace: sso
name: openldap name: openldap
labels: labels:
app: openldap app: openldap
serviceAccountName: null serviceAccountName: sso-vault
nodeSelector: nodeSelector:
kubernetes.io/arch: arm64 kubernetes.io/arch: arm64
node-role.kubernetes.io/worker: 'true' node-role.kubernetes.io/worker: 'true'
@ -640,7 +867,7 @@ workloads:
app: sui-metrics app: sui-metrics
serviceAccountName: sui-metrics serviceAccountName: sui-metrics
nodeSelector: nodeSelector:
kubernetes.io/hostname: titan-24 hardware: rpi5
images: images:
- victoriametrics/vmagent:v1.103.0 - victoriametrics/vmagent:v1.103.0
- kind: Deployment - kind: Deployment
@ -648,6 +875,8 @@ workloads:
name: traefik name: traefik
labels: labels:
app: traefik app: traefik
app.kubernetes.io/instance: traefik-kube-system
app.kubernetes.io/name: traefik
serviceAccountName: traefik-ingress-controller serviceAccountName: traefik-ingress-controller
nodeSelector: nodeSelector:
node-role.kubernetes.io/worker: 'true' node-role.kubernetes.io/worker: 'true'
@ -669,10 +898,12 @@ workloads:
name: vaultwarden name: vaultwarden
labels: labels:
app: vaultwarden app: vaultwarden
serviceAccountName: null serviceAccountName: vaultwarden-vault
nodeSelector: {} nodeSelector:
kubernetes.io/arch: arm64
node-role.kubernetes.io/worker: 'true'
images: images:
- vaultwarden/server:1.33.2 - vaultwarden/server:1.35.2
services: services:
- namespace: ai - namespace: ai
name: ollama name: ollama
@ -1040,6 +1271,36 @@ services:
port: 3333 port: 3333
targetPort: 3333 targetPort: 3333
protocol: TCP protocol: TCP
- namespace: crypto
name: wallet-monero-temp
type: ClusterIP
selector:
app: wallet-monero-temp
ports:
- name: rpc
port: 18083
targetPort: 18083
protocol: TCP
- namespace: finance
name: actual-budget
type: ClusterIP
selector:
app: actual-budget
ports:
- name: http
port: 80
targetPort: 5006
protocol: TCP
- namespace: finance
name: firefly
type: ClusterIP
selector:
app: firefly
ports:
- name: http
port: 80
targetPort: 8080
protocol: TCP
- namespace: flux-system - namespace: flux-system
name: notification-controller name: notification-controller
type: ClusterIP type: ClusterIP
@ -1082,7 +1343,7 @@ services:
protocol: TCP protocol: TCP
- namespace: gitea - namespace: gitea
name: gitea-ssh name: gitea-ssh
type: NodePort type: LoadBalancer
selector: selector:
app: gitea app: gitea
ports: ports:
@ -1090,6 +1351,16 @@ services:
port: 2242 port: 2242
targetPort: 2242 targetPort: 2242
protocol: TCP protocol: TCP
- namespace: health
name: wger
type: ClusterIP
selector:
app: wger
ports:
- name: http
port: 80
targetPort: http
protocol: TCP
- namespace: jellyfin - namespace: jellyfin
name: jellyfin name: jellyfin
type: ClusterIP type: ClusterIP
@ -1124,21 +1395,6 @@ services:
port: 50000 port: 50000
targetPort: 50000 targetPort: 50000
protocol: TCP protocol: TCP
- namespace: kube-system
name: traefik
type: LoadBalancer
selector:
app.kubernetes.io/instance: traefik-kube-system
app.kubernetes.io/name: traefik
ports:
- name: web
port: 80
targetPort: web
protocol: TCP
- name: websecure
port: 443
targetPort: websecure
protocol: TCP
- namespace: logging - namespace: logging
name: oauth2-proxy-logs name: oauth2-proxy-logs
type: ClusterIP type: ClusterIP
@ -1191,15 +1447,15 @@ services:
port: 4190 port: 4190
targetPort: 4190 targetPort: 4190
protocol: TCP protocol: TCP
- namespace: mailu-mailserver - namespace: maintenance
name: mailu-sync-listener name: ariadne
type: ClusterIP type: ClusterIP
selector: selector:
app: mailu-sync-listener app: ariadne
ports: ports:
- name: http - name: http
port: 8080 port: 80
targetPort: 8080 targetPort: http
protocol: TCP protocol: TCP
- namespace: monitoring - namespace: monitoring
name: dcgm-exporter name: dcgm-exporter
@ -1291,6 +1547,10 @@ services:
port: 5432 port: 5432
targetPort: 5432 targetPort: 5432
protocol: TCP protocol: TCP
- name: metrics
port: 9187
targetPort: 9187
protocol: TCP
- namespace: sso - namespace: sso
name: keycloak name: keycloak
type: ClusterIP type: ClusterIP
@ -1335,6 +1595,20 @@ services:
port: 8429 port: 8429
targetPort: 8429 targetPort: 8429
protocol: TCP protocol: TCP
- namespace: traefik
name: traefik
type: LoadBalancer
selector:
app: traefik
ports:
- name: web
port: 80
targetPort: web
protocol: TCP
- name: websecure
port: 443
targetPort: websecure
protocol: TCP
- namespace: traefik - namespace: traefik
name: traefik-metrics name: traefik-metrics
type: ClusterIP type: ClusterIP
@ -1447,6 +1721,19 @@ http_endpoints:
kind: Ingress kind: Ingress
name: bstein-dev-home name: bstein-dev-home
source: bstein-dev-home source: bstein-dev-home
- host: budget.bstein.dev
path: /
backend:
namespace: finance
service: actual-budget
port: 80
workloads:
- kind: Deployment
name: actual-budget
via:
kind: Ingress
name: actual-budget
source: finance
- host: call.live.bstein.dev - host: call.live.bstein.dev
path: / path: /
backend: backend:
@ -1499,6 +1786,19 @@ http_endpoints:
kind: Ingress kind: Ingress
name: nextcloud name: nextcloud
source: nextcloud source: nextcloud
- host: health.bstein.dev
path: /
backend:
namespace: health
service: wger
port: 80
workloads:
- kind: Deployment
name: wger
via:
kind: Ingress
name: wger
source: health
- host: kit.live.bstein.dev - host: kit.live.bstein.dev
path: /livekit/jwt path: /livekit/jwt
backend: backend:
@ -1558,6 +1858,65 @@ http_endpoints:
kind: Ingress kind: Ingress
name: matrix-routing name: matrix-routing
source: comms source: comms
- host: live.bstein.dev
path: /_matrix/client/r0/register
backend:
namespace: comms
service: matrix-guest-register
port: 8080
workloads: &id003
- kind: Deployment
name: matrix-guest-register
via:
kind: Ingress
name: matrix-routing
source: comms
- host: live.bstein.dev
path: /_matrix/client/v3/login
backend:
namespace: comms
service: matrix-authentication-service
port: 8080
workloads: &id002
- kind: Deployment
name: matrix-authentication-service
via:
kind: Ingress
name: matrix-routing
source: comms
- host: live.bstein.dev
path: /_matrix/client/v3/logout
backend:
namespace: comms
service: matrix-authentication-service
port: 8080
workloads: *id002
via:
kind: Ingress
name: matrix-routing
source: comms
- host: live.bstein.dev
path: /_matrix/client/v3/refresh
backend:
namespace: comms
service: matrix-authentication-service
port: 8080
workloads: *id002
via:
kind: Ingress
name: matrix-routing
source: comms
- host: live.bstein.dev
path: /_matrix/client/v3/register
backend:
namespace: comms
service: matrix-guest-register
port: 8080
workloads: *id003
via:
kind: Ingress
name: matrix-routing
source: comms
- host: logs.bstein.dev - host: logs.bstein.dev
path: / path: /
backend: backend:
@ -1601,9 +1960,7 @@ http_endpoints:
namespace: comms namespace: comms
service: matrix-authentication-service service: matrix-authentication-service
port: 8080 port: 8080
workloads: &id002 workloads: *id002
- kind: Deployment
name: matrix-authentication-service
via: via:
kind: Ingress kind: Ingress
name: matrix-routing name: matrix-routing
@ -1647,9 +2004,7 @@ http_endpoints:
namespace: comms namespace: comms
service: matrix-guest-register service: matrix-guest-register
port: 8080 port: 8080
workloads: &id003 workloads: *id003
- kind: Deployment
name: matrix-guest-register
via: via:
kind: Ingress kind: Ingress
name: matrix-routing name: matrix-routing
@ -1722,6 +2077,19 @@ http_endpoints:
kind: Ingress kind: Ingress
name: monerod name: monerod
source: monerod source: monerod
- host: money.bstein.dev
path: /
backend:
namespace: finance
service: firefly
port: 80
workloads:
- kind: Deployment
name: firefly
via:
kind: Ingress
name: firefly
source: finance
- host: notes.bstein.dev - host: notes.bstein.dev
path: / path: /
backend: backend:
@ -1845,7 +2213,6 @@ helmrelease_host_hints:
- live.bstein.dev - live.bstein.dev
- matrix.live.bstein.dev - matrix.live.bstein.dev
comms:comms/othrys-synapse: comms:comms/othrys-synapse:
- bstein.dev
- kit.live.bstein.dev - kit.live.bstein.dev
- live.bstein.dev - live.bstein.dev
- matrix.live.bstein.dev - matrix.live.bstein.dev
@ -1856,6 +2223,8 @@ helmrelease_host_hints:
- registry.bstein.dev - registry.bstein.dev
logging:logging/data-prepper: logging:logging/data-prepper:
- registry.bstein.dev - registry.bstein.dev
longhorn:longhorn-system/longhorn:
- registry.bstein.dev
mailu:mailu-mailserver/mailu: mailu:mailu-mailserver/mailu:
- bstein.dev - bstein.dev
- mail.bstein.dev - mail.bstein.dev
@ -1863,5 +2232,8 @@ helmrelease_host_hints:
- alerts.bstein.dev - alerts.bstein.dev
monitoring:monitoring/grafana: monitoring:monitoring/grafana:
- bstein.dev - bstein.dev
- mail.bstein.dev
- metrics.bstein.dev - metrics.bstein.dev
- sso.bstein.dev - sso.bstein.dev
monitoring:monitoring/kube-state-metrics:
- atlas.bstein.dev

File diff suppressed because it is too large

File diff suppressed because one or more lines are too long

View File

@ -17,6 +17,11 @@ flowchart LR
host_bstein_dev --> svc_bstein_dev_home_bstein_dev_home_backend host_bstein_dev --> svc_bstein_dev_home_bstein_dev_home_backend
wl_bstein_dev_home_bstein_dev_home_backend["bstein-dev-home/bstein-dev-home-backend (Deployment)"] wl_bstein_dev_home_bstein_dev_home_backend["bstein-dev-home/bstein-dev-home-backend (Deployment)"]
svc_bstein_dev_home_bstein_dev_home_backend --> wl_bstein_dev_home_bstein_dev_home_backend svc_bstein_dev_home_bstein_dev_home_backend --> wl_bstein_dev_home_bstein_dev_home_backend
host_budget_bstein_dev["budget.bstein.dev"]
svc_finance_actual_budget["finance/actual-budget (Service)"]
host_budget_bstein_dev --> svc_finance_actual_budget
wl_finance_actual_budget["finance/actual-budget (Deployment)"]
svc_finance_actual_budget --> wl_finance_actual_budget
host_call_live_bstein_dev["call.live.bstein.dev"] host_call_live_bstein_dev["call.live.bstein.dev"]
svc_comms_element_call["comms/element-call (Service)"] svc_comms_element_call["comms/element-call (Service)"]
host_call_live_bstein_dev --> svc_comms_element_call host_call_live_bstein_dev --> svc_comms_element_call
@ -37,6 +42,11 @@ flowchart LR
host_cloud_bstein_dev --> svc_nextcloud_nextcloud host_cloud_bstein_dev --> svc_nextcloud_nextcloud
wl_nextcloud_nextcloud["nextcloud/nextcloud (Deployment)"] wl_nextcloud_nextcloud["nextcloud/nextcloud (Deployment)"]
svc_nextcloud_nextcloud --> wl_nextcloud_nextcloud svc_nextcloud_nextcloud --> wl_nextcloud_nextcloud
host_health_bstein_dev["health.bstein.dev"]
svc_health_wger["health/wger (Service)"]
host_health_bstein_dev --> svc_health_wger
wl_health_wger["health/wger (Deployment)"]
svc_health_wger --> wl_health_wger
host_kit_live_bstein_dev["kit.live.bstein.dev"] host_kit_live_bstein_dev["kit.live.bstein.dev"]
svc_comms_livekit_token_service["comms/livekit-token-service (Service)"] svc_comms_livekit_token_service["comms/livekit-token-service (Service)"]
host_kit_live_bstein_dev --> svc_comms_livekit_token_service host_kit_live_bstein_dev --> svc_comms_livekit_token_service
@ -50,6 +60,14 @@ flowchart LR
host_live_bstein_dev --> svc_comms_matrix_wellknown host_live_bstein_dev --> svc_comms_matrix_wellknown
svc_comms_othrys_synapse_matrix_synapse["comms/othrys-synapse-matrix-synapse (Service)"] svc_comms_othrys_synapse_matrix_synapse["comms/othrys-synapse-matrix-synapse (Service)"]
host_live_bstein_dev --> svc_comms_othrys_synapse_matrix_synapse host_live_bstein_dev --> svc_comms_othrys_synapse_matrix_synapse
svc_comms_matrix_guest_register["comms/matrix-guest-register (Service)"]
host_live_bstein_dev --> svc_comms_matrix_guest_register
wl_comms_matrix_guest_register["comms/matrix-guest-register (Deployment)"]
svc_comms_matrix_guest_register --> wl_comms_matrix_guest_register
svc_comms_matrix_authentication_service["comms/matrix-authentication-service (Service)"]
host_live_bstein_dev --> svc_comms_matrix_authentication_service
wl_comms_matrix_authentication_service["comms/matrix-authentication-service (Deployment)"]
svc_comms_matrix_authentication_service --> wl_comms_matrix_authentication_service
host_logs_bstein_dev["logs.bstein.dev"] host_logs_bstein_dev["logs.bstein.dev"]
svc_logging_oauth2_proxy_logs["logging/oauth2-proxy-logs (Service)"] svc_logging_oauth2_proxy_logs["logging/oauth2-proxy-logs (Service)"]
host_logs_bstein_dev --> svc_logging_oauth2_proxy_logs host_logs_bstein_dev --> svc_logging_oauth2_proxy_logs
@ -64,21 +82,20 @@ flowchart LR
svc_mailu_mailserver_mailu_front["mailu-mailserver/mailu-front (Service)"] svc_mailu_mailserver_mailu_front["mailu-mailserver/mailu-front (Service)"]
host_mail_bstein_dev --> svc_mailu_mailserver_mailu_front host_mail_bstein_dev --> svc_mailu_mailserver_mailu_front
host_matrix_live_bstein_dev["matrix.live.bstein.dev"] host_matrix_live_bstein_dev["matrix.live.bstein.dev"]
svc_comms_matrix_authentication_service["comms/matrix-authentication-service (Service)"]
host_matrix_live_bstein_dev --> svc_comms_matrix_authentication_service host_matrix_live_bstein_dev --> svc_comms_matrix_authentication_service
wl_comms_matrix_authentication_service["comms/matrix-authentication-service (Deployment)"]
svc_comms_matrix_authentication_service --> wl_comms_matrix_authentication_service
host_matrix_live_bstein_dev --> svc_comms_matrix_wellknown host_matrix_live_bstein_dev --> svc_comms_matrix_wellknown
host_matrix_live_bstein_dev --> svc_comms_othrys_synapse_matrix_synapse host_matrix_live_bstein_dev --> svc_comms_othrys_synapse_matrix_synapse
svc_comms_matrix_guest_register["comms/matrix-guest-register (Service)"]
host_matrix_live_bstein_dev --> svc_comms_matrix_guest_register host_matrix_live_bstein_dev --> svc_comms_matrix_guest_register
wl_comms_matrix_guest_register["comms/matrix-guest-register (Deployment)"]
svc_comms_matrix_guest_register --> wl_comms_matrix_guest_register
host_monero_bstein_dev["monero.bstein.dev"] host_monero_bstein_dev["monero.bstein.dev"]
svc_crypto_monerod["crypto/monerod (Service)"] svc_crypto_monerod["crypto/monerod (Service)"]
host_monero_bstein_dev --> svc_crypto_monerod host_monero_bstein_dev --> svc_crypto_monerod
wl_crypto_monerod["crypto/monerod (Deployment)"] wl_crypto_monerod["crypto/monerod (Deployment)"]
svc_crypto_monerod --> wl_crypto_monerod svc_crypto_monerod --> wl_crypto_monerod
host_money_bstein_dev["money.bstein.dev"]
svc_finance_firefly["finance/firefly (Service)"]
host_money_bstein_dev --> svc_finance_firefly
wl_finance_firefly["finance/firefly (Deployment)"]
svc_finance_firefly --> wl_finance_firefly
host_notes_bstein_dev["notes.bstein.dev"] host_notes_bstein_dev["notes.bstein.dev"]
svc_outline_outline["outline/outline (Service)"] svc_outline_outline["outline/outline (Service)"]
host_notes_bstein_dev --> svc_outline_outline host_notes_bstein_dev --> svc_outline_outline
@ -143,19 +160,29 @@ flowchart LR
svc_comms_livekit svc_comms_livekit
wl_comms_livekit wl_comms_livekit
svc_comms_othrys_synapse_matrix_synapse svc_comms_othrys_synapse_matrix_synapse
svc_comms_matrix_authentication_service
wl_comms_matrix_authentication_service
svc_comms_matrix_guest_register svc_comms_matrix_guest_register
wl_comms_matrix_guest_register wl_comms_matrix_guest_register
svc_comms_matrix_authentication_service
wl_comms_matrix_authentication_service
end end
subgraph crypto[crypto] subgraph crypto[crypto]
svc_crypto_monerod svc_crypto_monerod
wl_crypto_monerod wl_crypto_monerod
end end
subgraph finance[finance]
svc_finance_actual_budget
wl_finance_actual_budget
svc_finance_firefly
wl_finance_firefly
end
subgraph gitea[gitea] subgraph gitea[gitea]
svc_gitea_gitea svc_gitea_gitea
wl_gitea_gitea wl_gitea_gitea
end end
subgraph health[health]
svc_health_wger
wl_health_wger
end
subgraph jellyfin[jellyfin] subgraph jellyfin[jellyfin]
svc_jellyfin_pegasus svc_jellyfin_pegasus
wl_jellyfin_pegasus wl_jellyfin_pegasus

View File

@ -0,0 +1,26 @@
# Metis (node recovery)
## Node classes (current map)
- rpi5 Ubuntu workers: titan-04,05,06,07,08,09,10,11,20,21 (Ubuntu 24.04.3, k3s agent)
- rpi5 control-plane: titan-0a/0b/0c (Ubuntu 24.04.1, k3s server, control-plane taint)
- rpi4 Armbian longhorn: titan-13/15/17/19 (Armbian 6.6.x, k3s agent, longhorn disks)
- rpi4 Armbian standard: titan-12/14/18 (Armbian 6.6.x, k3s agent)
- amd64 agents: titan-22/24 (Debian 13, k3s agent)
- External/non-cluster: tethys, titan-db, titan-jh, oceanus/titan-23, future titan-20/21 (when added), plus any newcomers.
## Longhorn disk UUIDs (critical nodes)
- titan-13: /mnt/astreae UUID=6031fa8b-f28c-45c3-b7bc-6133300e07c6 (ext4); /mnt/asteria UUID=cbd4989d-62b5-4741-8b2a-28fdae259cae (ext4)
- titan-15: /mnt/astreae UUID=f3362f14-5822-449f-944b-ac570b5cd615 (ext4); /mnt/asteria UUID=9c5316e6-f847-4884-b502-11f2d0d15d6f (ext4)
- titan-17: /mnt/astreae UUID=1fecdade-08b0-49cb-9ae3-be6c188b0a96 (ext4); /mnt/asteria UUID=2fe9f613-d372-47ca-b84f-82084e4edda0 (ext4)
- titan-19: /mnt/astreae UUID=4890abb9-dda2-4f4f-9c0f-081ee82849cf (ext4); /mnt/asteria UUID=2b4ea28d-b0e6-4fa3-841b-cd7067ae9153 (ext4)
## Metis repo (~/Development/metis)
- CLI skeleton in Go (`cmd/metis`), inventory loader (`pkg/inventory`), plan builder (`pkg/plan`).
- `inventory.example.yaml` shows expected schema (classes + per-node overlay, Longhorn disks, labels, taints).
- `AGENTS.md` in repo is untracked and holds raw notes.
## Next implementation steps
- Add per-class golden image refs and checksums (Harbor or file://) when ready.
- Implement burn execution: download with checksum, write via dd/etcher-equivalent, mount boot/root to inject hostname/IP/k3s tokens/labels/taints, journald/GC drop-ins, and Longhorn fstab entries. Add Windows writer (diskpart + wmic) and Linux writer (dd + sgdisk) paths.
- Add Keycloak/SSH bootstrap: ensure ssh user, authorized keys, and k3s token/URL injection for agents; control-plane restore path with etcd snapshot selection.
- Add per-host inventory entries for tethys, titan-db, titan-jh, oceanus/titan-23, future 20/21 once audited.

View File

@ -0,0 +1,30 @@
---
title: Othrys verification checklist
tags:
- comms
- matrix
- element
- livekit
entrypoints:
- https://live.bstein.dev
- https://matrix.live.bstein.dev
---
1) Guest join:
- Open a private window and visit:
`https://live.bstein.dev/#/room/#othrys:live.bstein.dev?action=join`
- Confirm the guest join flow works and the displayname becomes `<word>-<word>`.
2) Keycloak login:
- Log in from `https://live.bstein.dev` and confirm MAS -> Keycloak -> Element redirect.
3) Video rooms:
- Start an Element Call room and confirm audio/video with a second account.
- Check that guests can read public rooms but cannot start calls.
4) Well-known:
- `https://live.bstein.dev/.well-known/matrix/client` returns JSON.
- `https://matrix.live.bstein.dev/.well-known/matrix/client` returns JSON.
5) TURN reachability:
- Confirm `turn.live.bstein.dev:3478` and `turns:5349` are reachable from WAN.
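The well-known checks in step 4 are easy to script. A sketch, assuming the standard Matrix client well-known shape (an m.homeserver key); adjust if the served document differs:

import json
from urllib import request

for host in ("live.bstein.dev", "matrix.live.bstein.dev"):
    url = f"https://{host}/.well-known/matrix/client"
    with request.urlopen(url, timeout=10) as resp:
        doc = json.load(resp)
    print(url, "ok" if "m.homeserver" in doc else "unexpected keys: " + ", ".join(doc))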

View File

@ -0,0 +1,73 @@
# Metis (node recovery)
## Node classes (current map)
- rpi5 Ubuntu workers: titan-04,05,06,07,08,09,10,11,20,21 (Ubuntu 24.04.3, k3s agent)
- rpi5 control-plane: titan-0a/0b/0c (Ubuntu 24.04.1, k3s server, control-plane taint)
- rpi4 Armbian longhorn: titan-13/15/17/19 (Armbian 6.6.x, k3s agent, longhorn disks)
- rpi4 Armbian standard: titan-12/14/18 (Armbian 6.6.x, k3s agent)
- amd64 agents: titan-22/24 (Debian 13, k3s agent)
- External/non-cluster: tethys, titan-db, titan-jh, oceanus/titan-23, plus any newcomers.
### Jetson nodes (titan-20/21)
- Ubuntu 20.04.6 (Focal), kernel 5.10.104-tegra, CRI containerd 2.0.5-k3s2, arch arm64.
- Storage: NVMe 232G at / (ext4); onboard mmc partitions present but root on NVMe; 1.9T sda present (unused).
- k3s agent with drop-in 99-nofile.conf.
## Longhorn disk UUIDs (critical nodes)
- titan-13: /mnt/astreae UUID=6031fa8b-f28c-45c3-b7bc-6133300e07c6 (ext4); /mnt/asteria UUID=cbd4989d-62b5-4741-8b2a-28fdae259cae (ext4)
- titan-15: /mnt/astreae UUID=f3362f14-5822-449f-944b-ac570b5cd615 (ext4); /mnt/asteria UUID=9c5316e6-f847-4884-b502-11f2d0d15d6f (ext4)
- titan-17: /mnt/astreae UUID=1fecdade-08b0-49cb-9ae3-be6c188b0a96 (ext4); /mnt/asteria UUID=2fe9f613-d372-47ca-b84f-82084e4edda0 (ext4)
- titan-19: /mnt/astreae UUID=4890abb9-dda2-4f4f-9c0f-081ee82849cf (ext4); /mnt/asteria UUID=2b4ea28d-b0e6-4fa3-841b-cd7067ae9153 (ext4)
## Metis repo (~/Development/metis)
- CLI skeleton in Go (`cmd/metis`), inventory loader (`pkg/inventory`), plan builder (`pkg/plan`).
- `inventory.example.yaml` shows expected schema (classes + per-node overlay, Longhorn disks, labels, taints).
- `AGENTS.md` in repo is untracked and holds raw notes.
## Next implementation steps
- Add per-class golden image refs and checksums (Harbor or file://) when ready.
- Implement burn execution: download with checksum, write via dd/etcher-equivalent, mount boot/root to inject hostname/IP/k3s tokens/labels/taints, journald/GC drop-ins, and Longhorn fstab entries. Add Windows writer (diskpart + wmic) and Linux writer (dd + sgdisk) paths.
- Add Keycloak/SSH bootstrap: ensure ssh user, authorized keys, and k3s token/URL injection for agents; control-plane restore path with etcd snapshot selection.
- Add per-host inventory entries for tethys, titan-db, titan-jh, oceanus/titan-23, future 20/21 once audited.
## Node OS/Kernel/CRI snapshot (Jan 2026)
- titan-04: Ubuntu 24.04.3 LTS, kernel 6.8.0-1031-raspi, CRI containerd://2.0.5-k3s2, arch arm64
- titan-05: Ubuntu 24.04.3 LTS, kernel 6.8.0-1039-raspi, CRI containerd://2.0.5-k3s2, arch arm64
- titan-06: Ubuntu 24.04.3 LTS, kernel 6.8.0-1039-raspi, CRI containerd://2.0.5-k3s2, arch arm64
- titan-07: Ubuntu 24.04.3 LTS, kernel 6.8.0-1039-raspi, CRI containerd://2.0.5-k3s2, arch arm64
- titan-08: Ubuntu 24.04.3 LTS, kernel 6.8.0-1039-raspi, CRI containerd://2.0.5-k3s2, arch arm64
- titan-09: Ubuntu 24.04.3 LTS, kernel 6.8.0-1031-raspi, CRI containerd://2.0.5-k3s2, arch arm64
- titan-0a: Ubuntu 24.04.1 LTS, kernel 6.8.0-1038-raspi, CRI containerd://2.0.5-k3s2, arch arm64
- titan-0b: Ubuntu 24.04.1 LTS, kernel 6.8.0-1038-raspi, CRI containerd://2.0.5-k3s2, arch arm64
- titan-0c: Ubuntu 24.04.1 LTS, kernel 6.8.0-1038-raspi, CRI containerd://2.0.5-k3s2, arch arm64
- titan-10: Ubuntu 24.04.3 LTS, kernel 6.8.0-1039-raspi, CRI containerd://2.0.5-k3s2, arch arm64
- titan-11: Ubuntu 24.04.3 LTS, kernel 6.8.0-1039-raspi, CRI containerd://2.0.5-k3s2, arch arm64
- titan-12: Armbian 24.11.1 noble, kernel 6.6.60-current-bcm2711, CRI containerd://1.7.23-k3s2, arch arm64
- titan-13: Armbian 25.2.1 noble, kernel 6.6.63-current-bcm2711, CRI containerd://1.7.23-k3s2, arch arm64
- titan-14: Armbian 24.11.1 noble, kernel 6.6.60-current-bcm2711, CRI containerd://1.7.23-k3s2, arch arm64
- titan-15: Armbian 25.2.1 noble, kernel 6.6.63-current-bcm2711, CRI containerd://1.7.23-k3s2, arch arm64
- titan-17: Armbian 25.2.1 noble, kernel 6.6.63-current-bcm2711, CRI containerd://1.7.23-k3s2, arch arm64
- titan-18: Armbian 24.11.1 noble, kernel 6.6.60-current-bcm2711, CRI containerd://1.7.23-k3s2, arch arm64
- titan-19: Armbian 25.2.1 noble, kernel 6.6.63-current-bcm2711, CRI containerd://1.7.23-k3s2, arch arm64
- titan-20: Ubuntu 20.04.6 LTS, kernel 5.10.104-tegra, CRI containerd://2.0.5-k3s2, arch arm64
- titan-21: Ubuntu 20.04.6 LTS, kernel 5.10.104-tegra, CRI containerd://2.0.5-k3s2, arch arm64
- titan-22: Debian 13 (trixie), kernel 6.12.41+deb13-amd64, CRI containerd://2.0.5-k3s2, arch amd64
- titan-24: Debian 13 (trixie), kernel 6.12.57+deb13-amd64, CRI containerd://2.0.5-k3s2, arch amd64
### External hosts
- titan-db: Ubuntu 24.10, kernel 6.11.0-1015-raspi, root on /dev/sda2 ext4 (465G), boot vfat /dev/sda1; PostgreSQL service enabled.
- titan-jh: Arch Linux ARM (rolling), kernel 6.18.4-2-rpi, NVMe root ext4 238G (/), boot vfat 512M; ~495 packages installed (pacman -Q).
- titan-23/oceanus: TODO audit (future).
### Control plane Pis (titan-0a/0b/0c)
- Ubuntu 24.04.1 LTS, kernel 6.8.0-1038-raspi, containerd 2.0.5-k3s2.
- Storage: 477G SSD root (/dev/sda2 ext4), /boot/firmware vfat (/dev/sda1). fstab uses LABEL=writable and LABEL=system-boot.
- k3s server (control-plane taint expected); etcd snapshots not yet cataloged (TODO).
## k3s versions
- rpi5 workers/control-plane: k3s v1.33.3+k3s1 (crictl v1.31.0-k3s2)
- rpi4 nodes: k3s v1.31.5+k3s1 (crictl v1.31.0-k3s2)
- Jetson titan-20/21: k3s v1.33.3+k3s1 (per node info), crictl v1.31.0-k3s2
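
The per-node snapshot and k3s versions above map directly to kubelet nodeInfo fields, so they can be regenerated with a single kubectl query (assuming kubectl access to the cluster; the external hosts are not cluster nodes and still need a manual audit):

```sh
kubectl get nodes -o custom-columns=\
NAME:.metadata.name,\
OS:.status.nodeInfo.osImage,\
KERNEL:.status.nodeInfo.kernelVersion,\
CRI:.status.nodeInfo.containerRuntimeVersion,\
ARCH:.status.nodeInfo.architecture,\
K3S:.status.nodeInfo.kubeletVersion
```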

View File

@ -14,6 +14,7 @@ resources:
- guest-register-deployment.yaml - guest-register-deployment.yaml
- guest-register-service.yaml - guest-register-service.yaml
- atlasbot-deployment.yaml - atlasbot-deployment.yaml
- atlasbot-service.yaml
- wellknown.yaml - wellknown.yaml
- atlasbot-rbac.yaml - atlasbot-rbac.yaml
- mas-secrets-ensure-rbac.yaml - mas-secrets-ensure-rbac.yaml
@ -21,23 +22,24 @@ resources:
- mas-db-ensure-rbac.yaml - mas-db-ensure-rbac.yaml
- synapse-signingkey-ensure-rbac.yaml - synapse-signingkey-ensure-rbac.yaml
- vault-sync-deployment.yaml - vault-sync-deployment.yaml
- mas-admin-client-secret-ensure-job.yaml - oneoffs/mas-admin-client-secret-ensure-job.yaml
- mas-db-ensure-job.yaml - oneoffs/mas-db-ensure-job.yaml
- comms-secrets-ensure-job.yaml - oneoffs/comms-secrets-ensure-job.yaml
- synapse-signingkey-ensure-job.yaml - oneoffs/synapse-admin-ensure-job.yaml
- synapse-seeder-admin-ensure-job.yaml - oneoffs/synapse-signingkey-ensure-job.yaml
- synapse-user-seed-job.yaml - oneoffs/synapse-seeder-admin-ensure-job.yaml
- mas-local-users-ensure-job.yaml - oneoffs/synapse-user-seed-job.yaml
- oneoffs/mas-local-users-ensure-job.yaml
- mas-deployment.yaml - mas-deployment.yaml
- livekit-token-deployment.yaml - livekit-token-deployment.yaml
- livekit.yaml - livekit.yaml
- coturn.yaml - coturn.yaml
- seed-othrys-room.yaml - seed-othrys-room.yaml
- guest-name-job.yaml - guest-name-job.yaml
- othrys-kick-numeric-job.yaml - oneoffs/othrys-kick-numeric-job.yaml
- pin-othrys-job.yaml - pin-othrys-job.yaml
- reset-othrys-room-job.yaml - reset-othrys-room-job.yaml
- bstein-force-leave-job.yaml - oneoffs/bstein-force-leave-job.yaml
- livekit-ingress.yaml - livekit-ingress.yaml
- livekit-middlewares.yaml - livekit-middlewares.yaml
- matrix-ingress.yaml - matrix-ingress.yaml
@ -73,5 +75,6 @@ configMapGenerator:
- INDEX.md=knowledge/INDEX.md - INDEX.md=knowledge/INDEX.md
- atlas.json=knowledge/catalog/atlas.json - atlas.json=knowledge/catalog/atlas.json
- atlas-summary.json=knowledge/catalog/atlas-summary.json - atlas-summary.json=knowledge/catalog/atlas-summary.json
- metrics.json=knowledge/catalog/metrics.json
- runbooks.json=knowledge/catalog/runbooks.json - runbooks.json=knowledge/catalog/runbooks.json
- atlas-http.mmd=knowledge/diagrams/atlas-http.mmd - atlas-http.mmd=knowledge/diagrams/atlas-http.mmd

View File

@ -72,7 +72,7 @@ data:
template: "{{ user.name }}" template: "{{ user.name }}"
email: email:
action: force action: force
template: "{{ user.email }}" template: "{{ user.mailu_email }}"
policy: policy:
data: data:

View File

@ -1,10 +1,15 @@
# services/comms/bstein-force-leave-job.yaml # services/comms/oneoffs/bstein-force-leave-job.yaml
# One-off job for comms/bstein-leave-rooms-12.
# Purpose: bstein leave rooms 12 (see container args/env in this file).
# Run by setting spec.suspend to false, reconcile, then set it back to true.
# Safe to delete the finished Job/pod; it should not run continuously.
apiVersion: batch/v1 apiVersion: batch/v1
kind: Job kind: Job
metadata: metadata:
name: bstein-leave-rooms-12 name: bstein-leave-rooms-12
namespace: comms namespace: comms
spec: spec:
suspend: true
backoffLimit: 0 backoffLimit: 0
template: template:
metadata: metadata:
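
The suspend/reconcile pattern these one-off job headers describe can also be exercised ad hoc with kubectl when a full GitOps round-trip is not needed; a minimal sketch, assuming cluster access and the job name shown above:

```sh
# Unsuspend the one-off Job so it runs once (the GitOps flow flips spec.suspend in git instead).
kubectl -n comms patch job bstein-leave-rooms-12 --type=merge -p '{"spec":{"suspend":false}}'
# Wait for completion, then clean up; the finished Job/pod is safe to delete.
kubectl -n comms wait --for=condition=complete job/bstein-leave-rooms-12 --timeout=10m
kubectl -n comms delete job bstein-leave-rooms-12
```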

View File

@ -1,10 +1,15 @@
# services/comms/comms-secrets-ensure-job.yaml # services/comms/oneoffs/comms-secrets-ensure-job.yaml
# One-off job for comms/comms-secrets-ensure-7.
# Purpose: comms secrets ensure 7 (see container args/env in this file).
# Run by setting spec.suspend to false, reconcile, then set it back to true.
# Safe to delete the finished Job/pod; it should not run continuously.
apiVersion: batch/v1 apiVersion: batch/v1
kind: Job kind: Job
metadata: metadata:
name: comms-secrets-ensure-6 name: comms-secrets-ensure-7
namespace: comms namespace: comms
spec: spec:
suspend: true
backoffLimit: 1 backoffLimit: 1
ttlSecondsAfterFinished: 3600 ttlSecondsAfterFinished: 3600
template: template:

View File

@ -1,4 +1,8 @@
# services/comms/mas-admin-client-secret-ensure-job.yaml # services/comms/oneoffs/mas-admin-client-secret-ensure-job.yaml
# One-off job for comms/mas-admin-client-secret-writer.
# Purpose: mas admin client secret writer (see container args/env in this file).
# Run by setting spec.suspend to false, reconcile, then set it back to true.
# Safe to delete the finished Job/pod; it should not run continuously.
apiVersion: v1 apiVersion: v1
kind: ServiceAccount kind: ServiceAccount
metadata: metadata:
@ -41,6 +45,7 @@ metadata:
name: mas-admin-client-secret-ensure-11 name: mas-admin-client-secret-ensure-11
namespace: comms namespace: comms
spec: spec:
suspend: true
backoffLimit: 2 backoffLimit: 2
template: template:
spec: spec:

View File

@ -1,10 +1,15 @@
# services/comms/mas-db-ensure-job.yaml # services/comms/oneoffs/mas-db-ensure-job.yaml
# One-off job for comms/mas-db-ensure-22.
# Purpose: mas db ensure 22 (see container args/env in this file).
# Run by setting spec.suspend to false, reconcile, then set it back to true.
# Safe to delete the finished Job/pod; it should not run continuously.
apiVersion: batch/v1 apiVersion: batch/v1
kind: Job kind: Job
metadata: metadata:
name: mas-db-ensure-22 name: mas-db-ensure-22
namespace: comms namespace: comms
spec: spec:
suspend: true
backoffLimit: 1 backoffLimit: 1
ttlSecondsAfterFinished: 600 ttlSecondsAfterFinished: 600
template: template:

View File

@ -1,10 +1,15 @@
# services/comms/mas-local-users-ensure-job.yaml # services/comms/oneoffs/mas-local-users-ensure-job.yaml
# One-off job for comms/mas-local-users-ensure-18.
# Purpose: mas local users ensure 18 (see container args/env in this file).
# Run by setting spec.suspend to false, reconcile, then set it back to true.
# Safe to delete the finished Job/pod; it should not run continuously.
apiVersion: batch/v1 apiVersion: batch/v1
kind: Job kind: Job
metadata: metadata:
name: mas-local-users-ensure-15 name: mas-local-users-ensure-18
namespace: comms namespace: comms
spec: spec:
suspend: true
backoffLimit: 1 backoffLimit: 1
ttlSecondsAfterFinished: 3600 ttlSecondsAfterFinished: 3600
template: template:

View File

@ -1,10 +1,15 @@
# services/comms/othrys-kick-numeric-job.yaml # services/comms/oneoffs/othrys-kick-numeric-job.yaml
# One-off job for comms/othrys-kick-numeric-8.
# Purpose: othrys kick numeric 8 (see container args/env in this file).
# Run by setting spec.suspend to false, reconcile, then set it back to true.
# Safe to delete the finished Job/pod; it should not run continuously.
apiVersion: batch/v1 apiVersion: batch/v1
kind: Job kind: Job
metadata: metadata:
name: othrys-kick-numeric-8 name: othrys-kick-numeric-8
namespace: comms namespace: comms
spec: spec:
suspend: true
backoffLimit: 0 backoffLimit: 0
template: template:
metadata: metadata:

View File

@ -0,0 +1,219 @@
# services/comms/oneoffs/synapse-admin-ensure-job.yaml
# One-off job for comms/synapse-admin-ensure-3.
# Purpose: synapse admin ensure 3 (see container args/env in this file).
# Run by setting spec.suspend to false, reconcile, then set it back to true.
# Safe to delete the finished Job/pod; it should not run continuously.
apiVersion: batch/v1
kind: Job
metadata:
name: synapse-admin-ensure-3
namespace: comms
spec:
suspend: true
backoffLimit: 0
ttlSecondsAfterFinished: 3600
template:
spec:
serviceAccountName: comms-secrets-ensure
restartPolicy: Never
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: node-role.kubernetes.io/worker
operator: Exists
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
preference:
matchExpressions:
- key: kubernetes.io/arch
operator: In
values: ["arm64"]
containers:
- name: ensure
image: python:3.11-slim
env:
- name: VAULT_ADDR
value: http://vault.vault.svc.cluster.local:8200
- name: VAULT_ROLE
value: comms-secrets
- name: SYNAPSE_ADMIN_URL
value: http://othrys-synapse-matrix-synapse.comms.svc.cluster.local:8008
command:
- /bin/sh
- -c
- |
set -euo pipefail
pip install --no-cache-dir psycopg2-binary bcrypt
python - <<'PY'
import json
import os
import secrets
import string
import time
import urllib.error
import urllib.request
import bcrypt
import psycopg2
VAULT_ADDR = os.environ.get("VAULT_ADDR", "http://vault.vault.svc.cluster.local:8200").rstrip("/")
VAULT_ROLE = os.environ.get("VAULT_ROLE", "comms-secrets")
SA_TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"
PGHOST = "postgres-service.postgres.svc.cluster.local"
PGPORT = 5432
PGDATABASE = "synapse"
PGUSER = "synapse"
def log(msg: str) -> None:
print(msg, flush=True)
def request_json(url: str, payload: dict | None = None) -> dict:
data = None
headers = {"Content-Type": "application/json"}
if payload is not None:
data = json.dumps(payload).encode("utf-8")
req = urllib.request.Request(url, data=data, headers=headers, method="POST" if data else "GET")
with urllib.request.urlopen(req, timeout=30) as resp:
return json.loads(resp.read().decode("utf-8"))
def vault_login() -> str:
with open(SA_TOKEN_PATH, "r", encoding="utf-8") as f:
jwt = f.read().strip()
payload = {"jwt": jwt, "role": VAULT_ROLE}
resp = request_json(f"{VAULT_ADDR}/v1/auth/kubernetes/login", payload)
token = resp.get("auth", {}).get("client_token")
if not token:
raise RuntimeError("vault login failed")
return token
def vault_get(token: str, path: str) -> dict:
req = urllib.request.Request(
f"{VAULT_ADDR}/v1/kv/data/atlas/{path}",
headers={"X-Vault-Token": token},
)
try:
with urllib.request.urlopen(req, timeout=30) as resp:
payload = json.loads(resp.read().decode("utf-8"))
return payload.get("data", {}).get("data", {})
except urllib.error.HTTPError as exc:
if exc.code == 404:
return {}
raise
def vault_put(token: str, path: str, data: dict) -> None:
payload = {"data": data}
req = urllib.request.Request(
f"{VAULT_ADDR}/v1/kv/data/atlas/{path}",
data=json.dumps(payload).encode("utf-8"),
headers={"X-Vault-Token": token, "Content-Type": "application/json"},
method="POST",
)
with urllib.request.urlopen(req, timeout=30) as resp:
resp.read()
def random_password(length: int = 32) -> str:
alphabet = string.ascii_letters + string.digits
return "".join(secrets.choice(alphabet) for _ in range(length))
def ensure_admin_creds(token: str) -> dict:
data = vault_get(token, "comms/synapse-admin")
username = (data.get("username") or "").strip() or "synapse-admin"
password = (data.get("password") or "").strip()
if not password:
password = random_password()
data["username"] = username
data["password"] = password
vault_put(token, "comms/synapse-admin", data)
return data
def ensure_user(cur, cols, user_id, password, admin):
now_ms = int(time.time() * 1000)
values = {
"name": user_id,
"password_hash": bcrypt.hashpw(password.encode(), bcrypt.gensalt()).decode(),
"creation_ts": now_ms,
}
def add_flag(name, flag):
if name not in cols:
return
if cols[name]["type"] in ("smallint", "integer"):
values[name] = int(flag)
else:
values[name] = bool(flag)
add_flag("admin", admin)
add_flag("deactivated", False)
add_flag("shadow_banned", False)
add_flag("is_guest", False)
columns = list(values.keys())
placeholders = ", ".join(["%s"] * len(columns))
updates = ", ".join([f"{col}=EXCLUDED.{col}" for col in columns if col != "name"])
query = f"INSERT INTO users ({', '.join(columns)}) VALUES ({placeholders}) ON CONFLICT (name) DO UPDATE SET {updates};"
cur.execute(query, [values[c] for c in columns])
def get_cols(cur):
cur.execute(
"""
SELECT column_name, is_nullable, column_default, data_type
FROM information_schema.columns
WHERE table_schema = 'public' AND table_name = 'users'
"""
)
cols = {}
for name, is_nullable, default, data_type in cur.fetchall():
cols[name] = {
"nullable": is_nullable == "YES",
"default": default,
"type": data_type,
}
return cols
def ensure_access_token(cur, user_id, token_value):
cur.execute("SELECT COALESCE(MAX(id), 0) + 1 FROM access_tokens")
token_id = cur.fetchone()[0]
cur.execute(
"""
INSERT INTO access_tokens (id, user_id, token, device_id, valid_until_ms)
VALUES (%s, %s, %s, %s, NULL)
ON CONFLICT (token) DO NOTHING
""",
(token_id, user_id, token_value, "ariadne-admin"),
)
vault_token = vault_login()
admin_data = ensure_admin_creds(vault_token)
if admin_data.get("access_token"):
log("synapse admin token already present")
raise SystemExit(0)
synapse_db = vault_get(vault_token, "comms/synapse-db")
pg_password = synapse_db.get("POSTGRES_PASSWORD")
if not pg_password:
raise RuntimeError("synapse db password missing")
user_id = f"@{admin_data['username']}:live.bstein.dev"
conn = psycopg2.connect(
host=PGHOST,
port=PGPORT,
dbname=PGDATABASE,
user=PGUSER,
password=pg_password,
)
token_value = secrets.token_urlsafe(32)
try:
with conn:
with conn.cursor() as cur:
cols = get_cols(cur)
ensure_user(cur, cols, user_id, admin_data["password"], True)
ensure_access_token(cur, user_id, token_value)
finally:
conn.close()
admin_data["access_token"] = token_value
vault_put(vault_token, "comms/synapse-admin", admin_data)
log("synapse admin token stored")
PY

View File

@ -1,10 +1,15 @@
# services/comms/synapse-seeder-admin-ensure-job.yaml # services/comms/oneoffs/synapse-seeder-admin-ensure-job.yaml
# One-off job for comms/synapse-seeder-admin-ensure-9.
# Purpose: synapse seeder admin ensure 9 (see container args/env in this file).
# Run by setting spec.suspend to false, reconcile, then set it back to true.
# Safe to delete the finished Job/pod; it should not run continuously.
apiVersion: batch/v1 apiVersion: batch/v1
kind: Job kind: Job
metadata: metadata:
name: synapse-seeder-admin-ensure-7 name: synapse-seeder-admin-ensure-9
namespace: comms namespace: comms
spec: spec:
suspend: true
backoffLimit: 2 backoffLimit: 2
template: template:
metadata: metadata:

View File

@ -1,10 +1,15 @@
# services/comms/synapse-signingkey-ensure-job.yaml # services/comms/oneoffs/synapse-signingkey-ensure-job.yaml
# One-off job for comms/othrys-synapse-signingkey-ensure-7.
# Purpose: othrys synapse signingkey ensure 7 (see container args/env in this file).
# Run by setting spec.suspend to false, reconcile, then set it back to true.
# Safe to delete the finished Job/pod; it should not run continuously.
apiVersion: batch/v1 apiVersion: batch/v1
kind: Job kind: Job
metadata: metadata:
name: othrys-synapse-signingkey-ensure-7 name: othrys-synapse-signingkey-ensure-7
namespace: comms namespace: comms
spec: spec:
suspend: true
backoffLimit: 2 backoffLimit: 2
template: template:
spec: spec:

View File

@ -1,10 +1,15 @@
# services/comms/synapse-user-seed-job.yaml # services/comms/oneoffs/synapse-user-seed-job.yaml
# One-off job for comms/synapse-user-seed-8.
# Purpose: synapse user seed 8 (see container args/env in this file).
# Run by setting spec.suspend to false, reconcile, then set it back to true.
# Safe to delete the finished Job/pod; it should not run continuously.
apiVersion: batch/v1 apiVersion: batch/v1
kind: Job kind: Job
metadata: metadata:
name: synapse-user-seed-7 name: synapse-user-seed-8
namespace: comms namespace: comms
spec: spec:
suspend: true
backoffLimit: 1 backoffLimit: 1
ttlSecondsAfterFinished: 3600 ttlSecondsAfterFinished: 3600
template: template:

File diff suppressed because it is too large

View File

@ -11,7 +11,7 @@ spec:
roleName: "comms" roleName: "comms"
objects: | objects: |
- objectName: "harbor-pull__dockerconfigjson" - objectName: "harbor-pull__dockerconfigjson"
secretPath: "kv/data/atlas/harbor-pull/comms" secretPath: "kv/data/atlas/shared/harbor-pull"
secretKey: "dockerconfigjson" secretKey: "dockerconfigjson"
secretObjects: secretObjects:
- secretName: harbor-regcred - secretName: harbor-regcred

View File

@ -11,7 +11,7 @@ spec:
roleName: "crypto" roleName: "crypto"
objects: | objects: |
- objectName: "harbor-pull__dockerconfigjson" - objectName: "harbor-pull__dockerconfigjson"
secretPath: "kv/data/atlas/harbor-pull/crypto" secretPath: "kv/data/atlas/shared/harbor-pull"
secretKey: "dockerconfigjson" secretKey: "dockerconfigjson"
secretObjects: secretObjects:
- secretName: harbor-regcred - secretName: harbor-regcred

View File

@ -90,6 +90,8 @@ spec:
value: openid value: openid
- name: ACTUAL_MULTIUSER - name: ACTUAL_MULTIUSER
value: "true" value: "true"
- name: ACTUAL_USER_CREATION_MODE
value: login
- name: ACTUAL_OPENID_DISCOVERY_URL - name: ACTUAL_OPENID_DISCOVERY_URL
value: https://sso.bstein.dev/realms/atlas value: https://sso.bstein.dev/realms/atlas
- name: ACTUAL_OPENID_AUTHORIZATION_ENDPOINT - name: ACTUAL_OPENID_AUTHORIZATION_ENDPOINT
@ -128,6 +130,8 @@ spec:
value: openid value: openid
- name: ACTUAL_MULTIUSER - name: ACTUAL_MULTIUSER
value: "true" value: "true"
- name: ACTUAL_USER_CREATION_MODE
value: login
- name: ACTUAL_OPENID_DISCOVERY_URL - name: ACTUAL_OPENID_DISCOVERY_URL
value: https://sso.bstein.dev/realms/atlas value: https://sso.bstein.dev/realms/atlas
- name: ACTUAL_OPENID_AUTHORIZATION_ENDPOINT - name: ACTUAL_OPENID_AUTHORIZATION_ENDPOINT

View File

@ -6,6 +6,7 @@ metadata:
namespace: finance namespace: finance
spec: spec:
schedule: "0 3 * * *" schedule: "0 3 * * *"
suspend: true
concurrencyPolicy: Forbid concurrencyPolicy: Forbid
successfulJobsHistoryLimit: 1 successfulJobsHistoryLimit: 1
failedJobsHistoryLimit: 3 failedJobsHistoryLimit: 3

View File

@ -9,7 +9,7 @@ resources:
- finance-secrets-ensure-rbac.yaml - finance-secrets-ensure-rbac.yaml
- actual-budget-data-pvc.yaml - actual-budget-data-pvc.yaml
- firefly-storage-pvc.yaml - firefly-storage-pvc.yaml
- finance-secrets-ensure-job.yaml - oneoffs/finance-secrets-ensure-job.yaml
- actual-budget-deployment.yaml - actual-budget-deployment.yaml
- firefly-deployment.yaml - firefly-deployment.yaml
- firefly-user-sync-cronjob.yaml - firefly-user-sync-cronjob.yaml

View File

@ -1,10 +1,15 @@
# services/finance/finance-secrets-ensure-job.yaml # services/finance/oneoffs/finance-secrets-ensure-job.yaml
# One-off job for finance/finance-secrets-ensure-5.
# Purpose: finance secrets ensure 5 (see container args/env in this file).
# Run by setting spec.suspend to false, reconcile, then set it back to true.
# Safe to delete the finished Job/pod; it should not run continuously.
apiVersion: batch/v1 apiVersion: batch/v1
kind: Job kind: Job
metadata: metadata:
name: finance-secrets-ensure-5 name: finance-secrets-ensure-5
namespace: finance namespace: finance
spec: spec:
suspend: true
backoffLimit: 1 backoffLimit: 1
ttlSecondsAfterFinished: 3600 ttlSecondsAfterFinished: 3600
template: template:

View File

@ -29,3 +29,17 @@ subjects:
- kind: ServiceAccount - kind: ServiceAccount
name: bstein-dev-home name: bstein-dev-home
namespace: bstein-dev-home namespace: bstein-dev-home
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: ariadne-firefly-user-sync
namespace: finance
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: bstein-dev-home-firefly-user-sync
subjects:
- kind: ServiceAccount
name: ariadne
namespace: maintenance

View File

@ -169,6 +169,8 @@ spec:
value: "trace" value: "trace"
- name: GITEA__service__REQUIRE_SIGNIN_VIEW - name: GITEA__service__REQUIRE_SIGNIN_VIEW
value: "false" value: "false"
- name: GITEA__webhook__ALLOWED_HOST_LIST
value: "ci.bstein.dev"
- name: GITEA__server__PROXY_HEADERS - name: GITEA__server__PROXY_HEADERS
value: "X-Forwarded-For, X-Forwarded-Proto, X-Forwarded-Host" value: "X-Forwarded-For, X-Forwarded-Proto, X-Forwarded-Host"
- name: GITEA__session__COOKIE_SECURE - name: GITEA__session__COOKIE_SECURE

View File

@ -391,6 +391,16 @@ spec:
$patch: delete $patch: delete
- name: core-writable - name: core-writable
emptyDir: {} emptyDir: {}
- target:
kind: Ingress
name: harbor-ingress
patch: |-
- op: replace
path: /spec/rules/0/http/paths/2/backend/service/name
value: harbor-registry
- op: replace
path: /spec/rules/0/http/paths/2/backend/service/port/number
value: 5000
- target: - target:
kind: Deployment kind: Deployment
name: harbor-jobservice name: harbor-jobservice

View File

@ -11,7 +11,7 @@ spec:
roleName: "harbor" roleName: "harbor"
objects: | objects: |
- objectName: "harbor-pull__dockerconfigjson" - objectName: "harbor-pull__dockerconfigjson"
secretPath: "kv/data/atlas/harbor-pull/harbor" secretPath: "kv/data/atlas/shared/harbor-pull"
secretKey: "dockerconfigjson" secretKey: "dockerconfigjson"
secretObjects: secretObjects:
- secretName: harbor-regcred - secretName: harbor-regcred

View File

@ -8,7 +8,7 @@ rules:
- apiGroups: ["batch"] - apiGroups: ["batch"]
resources: ["cronjobs"] resources: ["cronjobs"]
verbs: ["get"] verbs: ["get"]
resourceNames: ["wger-user-sync"] resourceNames: ["wger-user-sync", "wger-admin-ensure"]
- apiGroups: ["batch"] - apiGroups: ["batch"]
resources: ["jobs"] resources: ["jobs"]
verbs: ["create", "get", "list", "watch"] verbs: ["create", "get", "list", "watch"]
@ -29,3 +29,17 @@ subjects:
- kind: ServiceAccount - kind: ServiceAccount
name: bstein-dev-home name: bstein-dev-home
namespace: bstein-dev-home namespace: bstein-dev-home
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: ariadne-wger-user-sync
namespace: health
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: bstein-dev-home-wger-user-sync
subjects:
- kind: ServiceAccount
name: ariadne
namespace: maintenance

View File

@ -8,6 +8,7 @@ metadata:
atlas.bstein.dev/glue: "true" atlas.bstein.dev/glue: "true"
spec: spec:
schedule: "15 3 * * *" schedule: "15 3 * * *"
suspend: true
concurrencyPolicy: Forbid concurrencyPolicy: Forbid
successfulJobsHistoryLimit: 1 successfulJobsHistoryLimit: 1
failedJobsHistoryLimit: 3 failedJobsHistoryLimit: 3

View File

@ -0,0 +1,13 @@
# services/jenkins/cache-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: jenkins-cache-v2
namespace: jenkins
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 20Gi
storageClassName: astreae

View File

@ -18,7 +18,7 @@ data:
logoutFromOpenIdProvider: true logoutFromOpenIdProvider: true
postLogoutRedirectUrl: "https://ci.bstein.dev" postLogoutRedirectUrl: "https://ci.bstein.dev"
sendScopesInTokenRequest: true sendScopesInTokenRequest: true
rootURLFromRequest: true rootURLFromRequest: false
userNameField: "preferred_username" userNameField: "preferred_username"
fullNameFieldName: "name" fullNameFieldName: "name"
emailFieldName: "email" emailFieldName: "email"
@ -49,8 +49,15 @@ data:
jobs: jobs:
- script: | - script: |
pipelineJob('harbor-arm-build') { pipelineJob('harbor-arm-build') {
properties {
pipelineTriggers {
triggers { triggers {
scm('H/5 * * * *') scmTrigger {
scmpoll_spec('H/5 * * * *')
ignorePostCommitHooks(false)
}
}
}
} }
definition { definition {
cpsScm { cpsScm {
@ -83,8 +90,15 @@ data:
} }
} }
pipelineJob('ci-demo') { pipelineJob('ci-demo') {
properties {
pipelineTriggers {
triggers { triggers {
scm('H/1 * * * *') scmTrigger {
scmpoll_spec('H/1 * * * *')
ignorePostCommitHooks(false)
}
}
}
} }
definition { definition {
cpsScm { cpsScm {
@ -102,8 +116,15 @@ data:
} }
} }
pipelineJob('bstein-dev-home') { pipelineJob('bstein-dev-home') {
properties {
pipelineTriggers {
triggers { triggers {
scm('H/2 * * * *') scmTrigger {
scmpoll_spec('H/2 * * * *')
ignorePostCommitHooks(false)
}
}
}
} }
definition { definition {
cpsScm { cpsScm {
@ -120,9 +141,42 @@ data:
} }
} }
} }
pipelineJob('data-prepper') { pipelineJob('ariadne') {
properties {
pipelineTriggers {
triggers { triggers {
scm('H/5 * * * *') scmTrigger {
scmpoll_spec('H/2 * * * *')
ignorePostCommitHooks(false)
}
}
}
}
definition {
cpsScm {
scm {
git {
remote {
url('https://scm.bstein.dev/bstein/ariadne.git')
credentials('gitea-pat')
}
branches('*/master')
}
}
scriptPath('Jenkinsfile')
}
}
}
pipelineJob('data-prepper') {
properties {
pipelineTriggers {
triggers {
scmTrigger {
scmpoll_spec('H/5 * * * *')
ignorePostCommitHooks(false)
}
}
}
} }
definition { definition {
cpsScm { cpsScm {
@ -139,24 +193,39 @@ data:
} }
} }
} }
pipelineJob('titan-iac-quality-gate') { multibranchPipelineJob('titan-iac-quality-gate') {
triggers { branchSources {
scm('H/5 * * * *') branchSource {
} source {
definition {
cpsScm {
scm {
git { git {
remote { id('titan-iac-quality-gate')
url('https://scm.bstein.dev/bstein/titan-iac.git') remote('https://scm.bstein.dev/bstein/titan-iac.git')
credentials('gitea-pat') credentialsId('gitea-pat')
}
branches('*/feature/vault-consumption')
} }
} }
}
}
factory {
workflowBranchProjectFactory {
scriptPath('ci/Jenkinsfile.titan-iac') scriptPath('ci/Jenkinsfile.titan-iac')
} }
} }
orphanedItemStrategy {
discardOldItems {
numToKeep(30)
}
}
triggers {
periodicFolderTrigger {
interval('12h')
}
}
configure { node ->
def webhookToken = System.getenv('TITAN_IAC_WEBHOOK_TOKEN') ?: ''
def triggers = node / 'triggers'
def webhook = triggers.appendNode('com.igalg.jenkins.plugins.mswt.trigger.ComputedFolderWebHookTrigger')
webhook.appendNode('token', webhookToken)
}
} }
base.yaml: | base.yaml: |
jenkins: jenkins:
@ -189,6 +258,11 @@ data:
templates: templates:
- name: "default" - name: "default"
namespace: "jenkins" namespace: "jenkins"
workspaceVolume:
dynamicPVC:
accessModes: "ReadWriteOnce"
requestsSize: "20Gi"
storageClassName: "astreae"
containers: containers:
- name: "jnlp" - name: "jnlp"
args: "^${computer.jnlpmac} ^${computer.name}" args: "^${computer.jnlpmac} ^${computer.name}"
@ -217,3 +291,6 @@ data:
crumbIssuer: crumbIssuer:
standard: standard:
excludeClientIPFromCrumb: true excludeClientIPFromCrumb: true
unclassified:
location:
url: "https://ci.bstein.dev/"

View File

@ -6,12 +6,17 @@ metadata:
namespace: jenkins namespace: jenkins
data: data:
plugins.txt: | plugins.txt: |
kubernetes kubernetes:4416.v2ea_b_5372da_a_e
workflow-aggregator workflow-aggregator:608.v67378e9d3db_1
git git:5.8.1
pipeline-utility-steps pipeline-utility-steps:2.20.0
configuration-as-code configuration-as-code:2031.veb_a_fdda_b_3ffd
configuration-as-code-support oic-auth:4.609.v9de140f63d01
oic-auth job-dsl:1.93
job-dsl simple-theme-plugin:230.v8b_fd91b_b_800c
simple-theme-plugin workflow-multibranch:821.vc3b_4ea_780798
branch-api:2.1268.v044a_87612da_8
scm-api:724.v7d839074eb_5c
gitea:268.v75e47974c01d
gitea-checks:603.621.vc708da_fb_371d
multibranch-scan-webhook-trigger:1.0.11

View File

@ -22,23 +22,33 @@ spec:
vault.hashicorp.com/role: "jenkins" vault.hashicorp.com/role: "jenkins"
vault.hashicorp.com/agent-inject-secret-jenkins-env: "kv/data/atlas/jenkins/jenkins-oidc" vault.hashicorp.com/agent-inject-secret-jenkins-env: "kv/data/atlas/jenkins/jenkins-oidc"
vault.hashicorp.com/agent-inject-template-jenkins-env: | vault.hashicorp.com/agent-inject-template-jenkins-env: |
{{- with secret "kv/data/atlas/jenkins/jenkins-oidc" -}} {{ with secret "kv/data/atlas/jenkins/jenkins-oidc" }}
OIDC_CLIENT_ID={{ .Data.data.clientId }} OIDC_CLIENT_ID={{ .Data.data.clientId }}
OIDC_CLIENT_SECRET={{ .Data.data.clientSecret }} OIDC_CLIENT_SECRET={{ .Data.data.clientSecret }}
OIDC_AUTH_URL={{ .Data.data.authorizationUrl }} OIDC_AUTH_URL={{ .Data.data.authorizationUrl }}
OIDC_TOKEN_URL={{ .Data.data.tokenUrl }} OIDC_TOKEN_URL={{ .Data.data.tokenUrl }}
OIDC_USERINFO_URL={{ .Data.data.userInfoUrl }} OIDC_USERINFO_URL={{ .Data.data.userInfoUrl }}
OIDC_LOGOUT_URL={{ .Data.data.logoutUrl }} OIDC_LOGOUT_URL={{ .Data.data.logoutUrl }}
{{- end }} {{ end }}
{{- with secret "kv/data/atlas/jenkins/harbor-robot-creds" -}} {{ with secret "kv/data/atlas/jenkins/harbor-robot-creds" }}
HARBOR_ROBOT_USERNAME={{ .Data.data.username }}
HARBOR_ROBOT_PASSWORD={{ .Data.data.password }}
{{ end }}
{{ with secret "kv/data/atlas/shared/harbor-pull" }}
{{- if and .Data.data.username .Data.data.password }}
HARBOR_ROBOT_USERNAME={{ .Data.data.username }} HARBOR_ROBOT_USERNAME={{ .Data.data.username }}
HARBOR_ROBOT_PASSWORD={{ .Data.data.password }} HARBOR_ROBOT_PASSWORD={{ .Data.data.password }}
{{- end }} {{- end }}
{{- with secret "kv/data/atlas/jenkins/gitea-pat" -}} {{ end }}
{{ with secret "kv/data/atlas/jenkins/gitea-pat" }}
GITEA_PAT_USERNAME={{ .Data.data.username }} GITEA_PAT_USERNAME={{ .Data.data.username }}
GITEA_PAT_TOKEN={{ .Data.data.token }} GITEA_PAT_TOKEN={{ .Data.data.token }}
{{- end -}} {{ end }}
bstein.dev/restarted-at: "2026-01-19T00:25:00Z" {{ with secret "kv/data/atlas/jenkins/webhook-tokens" }}
TITAN_IAC_WEBHOOK_TOKEN={{ .Data.data.titan_iac_quality_gate }}
GIT_NOTIFY_TOKEN_BSTEIN_DEV_HOME={{ .Data.data.git_notify_bstein_dev_home }}
{{ end }}
bstein.dev/restarted-at: "2026-01-20T14:52:41Z"
spec: spec:
serviceAccountName: jenkins serviceAccountName: jenkins
nodeSelector: nodeSelector:
@ -98,7 +108,9 @@ spec:
containerPort: 50000 containerPort: 50000
env: env:
- name: JAVA_OPTS - name: JAVA_OPTS
value: "-Xms512m -Xmx2048m" value: "-Xms512m -Xmx2048m -Duser.timezone=America/Chicago"
- name: TZ
value: "America/Chicago"
- name: JENKINS_OPTS - name: JENKINS_OPTS
value: "--webroot=/var/jenkins_cache/war" value: "--webroot=/var/jenkins_cache/war"
- name: JENKINS_SLAVE_AGENT_PORT - name: JENKINS_SLAVE_AGENT_PORT
@ -148,6 +160,8 @@ spec:
mountPath: /config/jcasc mountPath: /config/jcasc
- name: init-scripts - name: init-scripts
mountPath: /usr/share/jenkins/ref/init.groovy.d mountPath: /usr/share/jenkins/ref/init.groovy.d
- name: init-scripts
mountPath: /var/jenkins_home/init.groovy.d
- name: plugin-dir - name: plugin-dir
mountPath: /usr/share/jenkins/ref/plugins mountPath: /usr/share/jenkins/ref/plugins
- name: tmp - name: tmp
@ -157,9 +171,11 @@ spec:
persistentVolumeClaim: persistentVolumeClaim:
claimName: jenkins claimName: jenkins
- name: jenkins-cache - name: jenkins-cache
emptyDir: {} persistentVolumeClaim:
claimName: jenkins-cache-v2
- name: plugin-dir - name: plugin-dir
emptyDir: {} persistentVolumeClaim:
claimName: jenkins-plugins-v2
- name: plugins - name: plugins
configMap: configMap:
name: jenkins-plugins name: jenkins-plugins
@ -170,4 +186,5 @@ spec:
configMap: configMap:
name: jenkins-init-scripts name: jenkins-init-scripts
- name: tmp - name: tmp
emptyDir: {} emptyDir:
medium: Memory

View File

@ -5,9 +5,14 @@ namespace: jenkins
resources: resources:
- namespace.yaml - namespace.yaml
- serviceaccount.yaml - serviceaccount.yaml
- vault-serviceaccount.yaml
- pvc.yaml - pvc.yaml
- cache-pvc.yaml
- plugins-pvc.yaml
- configmap-jcasc.yaml - configmap-jcasc.yaml
- configmap-plugins.yaml - configmap-plugins.yaml
- secretproviderclass.yaml
- vault-sync-deployment.yaml
- deployment.yaml - deployment.yaml
- service.yaml - service.yaml
- ingress.yaml - ingress.yaml
@ -16,6 +21,7 @@ configMapGenerator:
- name: jenkins-init-scripts - name: jenkins-init-scripts
namespace: jenkins namespace: jenkins
files: files:
- git-notify-token.groovy=scripts/git-notify-token.groovy
- theme.groovy=scripts/theme.groovy - theme.groovy=scripts/theme.groovy
options: options:
disableNameSuffixHash: true disableNameSuffixHash: true

View File

@ -0,0 +1,13 @@
# services/jenkins/plugins-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: jenkins-plugins-v2
namespace: jenkins
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 10Gi
storageClassName: astreae

View File

@ -0,0 +1,41 @@
import hudson.plugins.git.ApiTokenPropertyConfiguration
import hudson.Util
import java.nio.charset.StandardCharsets
import java.security.MessageDigest
def entries = [
[env: 'GIT_NOTIFY_TOKEN_BSTEIN_DEV_HOME', name: 'gitea-bstein-dev-home'],
]
entries.each { entry ->
def token = System.getenv(entry.env)
if (!token || token.trim().isEmpty()) {
println("Git notifyCommit token ${entry.env} missing; skipping")
return
}
try {
def config = ApiTokenPropertyConfiguration.get()
if (config.hasMatchingApiToken(token)) {
println("Git notifyCommit token ${entry.name} already configured")
return
}
def digest = MessageDigest.getInstance("SHA-256")
def hash = Util.toHexString(digest.digest(token.getBytes(StandardCharsets.US_ASCII)))
def field = ApiTokenPropertyConfiguration.class.getDeclaredField("apiTokens")
field.setAccessible(true)
def tokens = field.get(config)
def ctor = ApiTokenPropertyConfiguration.HashedApiToken.class.getDeclaredConstructor(String.class, String.class)
ctor.setAccessible(true)
tokens.add(ctor.newInstance(entry.name, hash))
config.save()
println("Added git notifyCommit access token ${entry.name}")
} catch (Throwable e) {
println("Failed to configure git notifyCommit token ${entry.name}: ${e.class.simpleName}: ${e.message}")
}
}

View File

@ -1,15 +1,137 @@
import jenkins.model.Jenkins import jenkins.model.Jenkins
import org.codefirst.SimpleThemeDecorator import org.codefirst.SimpleThemeDecorator
import org.jenkinsci.plugins.simpletheme.CssTextThemeElement
def instance = Jenkins.get() def instance = Jenkins.get()
def decorators = instance.getExtensionList(SimpleThemeDecorator.class) def decorators = instance.getExtensionList(SimpleThemeDecorator.class)
if (decorators?.size() > 0) { if (decorators?.size() > 0) {
def theme = decorators[0] def theme = decorators[0]
theme.setCssUrl("https://jenkins-contrib-themes.github.io/jenkins-material-theme/dist/material-ocean.css") def cssRules = """
:root,
.app-theme-picker__picker[data-theme=none] {
--background: #0f1216 !important;
--header-background: #141922 !important;
--header-border: #2b313b !important;
--white: #141922 !important;
--black: #e6e9ef !important;
--very-light-grey: #171b21 !important;
--light-grey: #202734 !important;
--medium-grey: #2b313b !important;
--dark-grey: #0b0f14 !important;
--text-color: #e6e9ef !important;
--text-color-secondary: #a6adba !important;
--card-background: #171b21 !important;
--card-border-color: #2b313b !important;
--pane-header-bg: #1f252d !important;
--pane-header-border-color: #2b313b !important;
--pane-border-color: #2b313b !important;
--pane-text-color: #e6e9ef !important;
--pane-header-text-color: #e6e9ef !important;
--link-color: #8fb7ff !important;
--link-color--hover: #b0ccff !important;
--link-dark-color: #e6e9ef !important;
--link-dark-color--hover: #b0ccff !important;
--input-color: #151a20 !important;
--input-border: #2b313b !important;
--input-border-hover: #3a424d !important;
--button-background: #232a33 !important;
--button-background--hover: #2b313b !important;
--button-background--active: #323b46 !important;
--item-background--hover: #232a33 !important;
--item-background--active: #2b313b !important;
--accent-color: #8fb7ff !important;
}
body,
#page-body,
#page-header,
#header,
#main-panel,
#main-panel-content,
#side-panel,
.top-sticker-inner,
.bottom-sticker-inner,
#breadcrumbBar,
#breadcrumbs {
background-color: var(--background) !important;
color: var(--text-color) !important;
}
.jenkins-card,
.jenkins-section,
.jenkins-section__item,
#main-panel .jenkins-card,
#main-panel .jenkins-section {
background-color: var(--card-background) !important;
color: var(--text-color) !important;
border-color: var(--card-border-color) !important;
}
table.pane,
table.pane td,
table.pane th,
#projectstatus td,
#projectstatus th {
background-color: var(--card-background) !important;
color: var(--text-color) !important;
}
table.pane tr:nth-child(even) td,
#projectstatus tr:hover td {
background-color: #1f252d !important;
}
input,
select,
textarea,
#search-box {
background-color: #151a20 !important;
color: var(--text-color) !important;
border-color: var(--input-border) !important;
}
a,
a:visited,
a:link {
color: var(--link-color) !important;
}
a:hover {
opacity: 0.85;
}
#side-panel .task-link,
#breadcrumbs a,
#breadcrumbs,
#projectstatus th a {
color: var(--text-color-secondary) !important;
}
.console-output,
.console-output pre,
pre,
code,
.CodeMirror {
background-color: #0c0f14 !important;
color: #d9dee7 !important;
}
#footer {
background-color: var(--background) !important;
color: var(--text-color-secondary) !important;
}
.jenkins_ver:after {
content: "atlas dark";
}
""".stripIndent().trim()
theme.setElements([new CssTextThemeElement(cssRules)])
theme.setCssUrl("")
theme.setCssRules(cssRules)
theme.setJsUrl("") theme.setJsUrl("")
theme.setTheme("") theme.save()
instance.save()
println("Applied simple-theme-plugin dark theme") println("Applied simple-theme-plugin dark theme")
} else { } else {
println("simple-theme-plugin not installed; skipping theme configuration") println("simple-theme-plugin not installed; skipping theme configuration")

View File

@ -0,0 +1,21 @@
# services/jenkins/secretproviderclass.yaml
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
name: jenkins-vault
namespace: jenkins
spec:
provider: vault
parameters:
vaultAddress: "http://vault.vault.svc.cluster.local:8200"
roleName: "jenkins"
objects: |
- objectName: "harbor-pull__dockerconfigjson"
secretPath: "kv/data/atlas/shared/harbor-pull"
secretKey: "dockerconfigjson"
secretObjects:
- secretName: harbor-bstein-robot
type: kubernetes.io/dockerconfigjson
data:
- objectName: harbor-pull__dockerconfigjson
key: .dockerconfigjson

View File

@ -0,0 +1,6 @@
# services/jenkins/vault-serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: jenkins-vault-sync
namespace: jenkins

View File

@ -0,0 +1,37 @@
# services/jenkins/vault-sync-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: jenkins-vault-sync
namespace: jenkins
spec:
replicas: 1
selector:
matchLabels:
app: jenkins-vault-sync
template:
metadata:
labels:
app: jenkins-vault-sync
spec:
serviceAccountName: jenkins-vault-sync
nodeSelector:
kubernetes.io/arch: arm64
node-role.kubernetes.io/worker: "true"
containers:
- name: sync
image: alpine:3.20
command: ["/bin/sh", "-c"]
args:
- "sleep infinity"
volumeMounts:
- name: vault-secrets
mountPath: /vault/secrets
readOnly: true
volumes:
- name: vault-secrets
csi:
driver: secrets-store.csi.k8s.io
readOnly: true
volumeAttributes:
secretProviderClass: jenkins-vault

View File

@ -126,7 +126,7 @@ spec:
- name: KC_EVENTS_LISTENERS - name: KC_EVENTS_LISTENERS
value: jboss-logging,mailu-http value: jboss-logging,mailu-http
- name: KC_SPI_EVENTS_LISTENER_MAILU-HTTP_ENDPOINT - name: KC_SPI_EVENTS_LISTENER_MAILU-HTTP_ENDPOINT
value: http://mailu-sync-listener.mailu-mailserver.svc.cluster.local:8080/events value: http://ariadne.maintenance.svc.cluster.local/events
ports: ports:
- containerPort: 8080 - containerPort: 8080
name: http name: http

View File

@ -10,21 +10,21 @@ resources:
- secretproviderclass.yaml - secretproviderclass.yaml
- vault-sync-deployment.yaml - vault-sync-deployment.yaml
- deployment.yaml - deployment.yaml
- realm-settings-job.yaml - oneoffs/realm-settings-job.yaml
- portal-admin-client-secret-ensure-job.yaml - oneoffs/portal-admin-client-secret-ensure-job.yaml
- portal-e2e-client-job.yaml - oneoffs/portal-e2e-client-job.yaml
- portal-e2e-target-client-job.yaml - oneoffs/portal-e2e-target-client-job.yaml
- portal-e2e-token-exchange-permissions-job.yaml - oneoffs/portal-e2e-token-exchange-permissions-job.yaml
- portal-e2e-token-exchange-test-job.yaml - oneoffs/portal-e2e-token-exchange-test-job.yaml
- portal-e2e-execute-actions-email-test-job.yaml - oneoffs/portal-e2e-execute-actions-email-test-job.yaml
- ldap-federation-job.yaml - oneoffs/ldap-federation-job.yaml
- user-overrides-job.yaml - oneoffs/user-overrides-job.yaml
- mas-secrets-ensure-job.yaml - oneoffs/mas-secrets-ensure-job.yaml
- synapse-oidc-secret-ensure-job.yaml - oneoffs/synapse-oidc-secret-ensure-job.yaml
- logs-oidc-secret-ensure-job.yaml - oneoffs/logs-oidc-secret-ensure-job.yaml
- harbor-oidc-secret-ensure-job.yaml - oneoffs/harbor-oidc-secret-ensure-job.yaml
- vault-oidc-secret-ensure-job.yaml - oneoffs/vault-oidc-secret-ensure-job.yaml
- actual-oidc-secret-ensure-job.yaml - oneoffs/actual-oidc-secret-ensure-job.yaml
- service.yaml - service.yaml
- ingress.yaml - ingress.yaml
generatorOptions: generatorOptions:

View File

@ -1,10 +1,15 @@
# services/keycloak/actual-oidc-secret-ensure-job.yaml # services/keycloak/oneoffs/actual-oidc-secret-ensure-job.yaml
# One-off job for sso/actual-oidc-secret-ensure-3.
# Purpose: actual oidc secret ensure 3 (see container args/env in this file).
# Run by setting spec.suspend to false, reconcile, then set it back to true.
# Safe to delete the finished Job/pod; it should not run continuously.
apiVersion: batch/v1 apiVersion: batch/v1
kind: Job kind: Job
metadata: metadata:
name: actual-oidc-secret-ensure-3 name: actual-oidc-secret-ensure-3
namespace: sso namespace: sso
spec: spec:
suspend: true
backoffLimit: 0 backoffLimit: 0
ttlSecondsAfterFinished: 3600 ttlSecondsAfterFinished: 3600
template: template:

View File

@ -1,10 +1,15 @@
# services/keycloak/harbor-oidc-secret-ensure-job.yaml # services/keycloak/oneoffs/harbor-oidc-secret-ensure-job.yaml
# One-off job for sso/harbor-oidc-secret-ensure-10.
# Purpose: harbor oidc secret ensure 10 (see container args/env in this file).
# Run by setting spec.suspend to false, reconcile, then set it back to true.
# Safe to delete the finished Job/pod; it should not run continuously.
apiVersion: batch/v1 apiVersion: batch/v1
kind: Job kind: Job
metadata: metadata:
name: harbor-oidc-secret-ensure-9 name: harbor-oidc-secret-ensure-10
namespace: sso namespace: sso
spec: spec:
suspend: true
backoffLimit: 0 backoffLimit: 0
ttlSecondsAfterFinished: 3600 ttlSecondsAfterFinished: 3600
template: template:

View File

@ -1,10 +1,15 @@
# services/keycloak/ldap-federation-job.yaml # services/keycloak/oneoffs/ldap-federation-job.yaml
# One-off job for sso/keycloak-ldap-federation-12.
# Purpose: keycloak ldap federation 12 (see container args/env in this file).
# Run by setting spec.suspend to false, reconcile, then set it back to true.
# Safe to delete the finished Job/pod; it should not run continuously.
apiVersion: batch/v1 apiVersion: batch/v1
kind: Job kind: Job
metadata: metadata:
name: keycloak-ldap-federation-11 name: keycloak-ldap-federation-12
namespace: sso namespace: sso
spec: spec:
suspend: true
backoffLimit: 2 backoffLimit: 2
template: template:
metadata: metadata:
@ -325,6 +330,54 @@ spec:
if status not in (201, 204): if status not in (201, 204):
raise SystemExit(f"Unexpected group mapper create status: {status}") raise SystemExit(f"Unexpected group mapper create status: {status}")
def ensure_user_attr_mapper(name: str, ldap_attr: str, user_attr: str):
mapper = None
for c in components:
if c.get("name") == name and c.get("parentId") == ldap_component_id:
mapper = c
break
payload = {
"name": name,
"providerId": "user-attribute-ldap-mapper",
"providerType": "org.keycloak.storage.ldap.mappers.LDAPStorageMapper",
"parentId": ldap_component_id,
"config": {
"ldap.attribute": [ldap_attr],
"user.model.attribute": [user_attr],
"read.only": ["false"],
"always.read.value.from.ldap": ["false"],
"is.mandatory.in.ldap": ["false"],
},
}
if mapper:
payload["id"] = mapper["id"]
payload["parentId"] = mapper.get("parentId", payload["parentId"])
print(f"Updating LDAP user mapper: {payload['id']} ({name})")
status, _, _ = http_json(
"PUT",
f"{base_url}/admin/realms/{realm}/components/{payload['id']}",
token,
payload,
)
if status not in (200, 204):
raise SystemExit(f"Unexpected user mapper update status for {name}: {status}")
else:
print(f"Creating LDAP user mapper: {name}")
status, _, _ = http_json(
"POST",
f"{base_url}/admin/realms/{realm}/components",
token,
payload,
)
if status not in (201, 204):
raise SystemExit(f"Unexpected user mapper create status for {name}: {status}")
ensure_user_attr_mapper("openldap-email", "mail", "email")
ensure_user_attr_mapper("openldap-first-name", "givenName", "firstName")
ensure_user_attr_mapper("openldap-last-name", "sn", "lastName")
# Cleanup duplicate LDAP federation providers and their child components (mappers, etc). # Cleanup duplicate LDAP federation providers and their child components (mappers, etc).
# Keep only the canonical provider we updated/created above. # Keep only the canonical provider we updated/created above.
try: try:

View File

@ -1,10 +1,15 @@
# services/keycloak/logs-oidc-secret-ensure-job.yaml # services/keycloak/oneoffs/logs-oidc-secret-ensure-job.yaml
# One-off job for sso/logs-oidc-secret-ensure-10.
# Purpose: logs oidc secret ensure 10 (see container args/env in this file).
# Run by setting spec.suspend to false, reconcile, then set it back to true.
# Safe to delete the finished Job/pod; it should not run continuously.
apiVersion: batch/v1 apiVersion: batch/v1
kind: Job kind: Job
metadata: metadata:
name: logs-oidc-secret-ensure-10 name: logs-oidc-secret-ensure-10
namespace: sso namespace: sso
spec: spec:
suspend: true
backoffLimit: 0 backoffLimit: 0
ttlSecondsAfterFinished: 3600 ttlSecondsAfterFinished: 3600
template: template:

View File

@ -1,4 +1,8 @@
# services/keycloak/mas-secrets-ensure-job.yaml # services/keycloak/oneoffs/mas-secrets-ensure-job.yaml
# One-off job for sso/mas-secrets-ensure.
# Purpose: mas secrets ensure (see container args/env in this file).
# Run by setting spec.suspend to false, reconcile, then set it back to true.
# Safe to delete the finished Job/pod; it should not run continuously.
apiVersion: v1 apiVersion: v1
kind: ServiceAccount kind: ServiceAccount
metadata: metadata:
@ -13,6 +17,7 @@ metadata:
name: mas-secrets-ensure-21 name: mas-secrets-ensure-21
namespace: sso namespace: sso
spec: spec:
suspend: true
backoffLimit: 0 backoffLimit: 0
ttlSecondsAfterFinished: 3600 ttlSecondsAfterFinished: 3600
template: template:

View File

@ -1,10 +1,15 @@
# services/keycloak/portal-admin-client-secret-ensure-job.yaml # services/keycloak/oneoffs/portal-admin-client-secret-ensure-job.yaml
# One-off job for sso/keycloak-portal-admin-secret-ensure-4.
# Purpose: keycloak portal admin secret ensure 4 (see container args/env in this file).
# Run by setting spec.suspend to false, reconcile, then set it back to true.
# Safe to delete the finished Job/pod; it should not run continuously.
apiVersion: batch/v1 apiVersion: batch/v1
kind: Job kind: Job
metadata: metadata:
name: keycloak-portal-admin-secret-ensure-4 name: keycloak-portal-admin-secret-ensure-4
namespace: sso namespace: sso
spec: spec:
suspend: true
backoffLimit: 0 backoffLimit: 0
template: template:
metadata: metadata:

View File

@ -1,10 +1,15 @@
# services/keycloak/portal-e2e-client-job.yaml # services/keycloak/oneoffs/portal-e2e-client-job.yaml
# One-off job for sso/keycloak-portal-e2e-client-8.
# Purpose: keycloak portal e2e client 8 (see container args/env in this file).
# Run by setting spec.suspend to false, reconcile, then set it back to true.
# Safe to delete the finished Job/pod; it should not run continuously.
apiVersion: batch/v1 apiVersion: batch/v1
kind: Job kind: Job
metadata: metadata:
name: keycloak-portal-e2e-client-8 name: keycloak-portal-e2e-client-8
namespace: sso namespace: sso
spec: spec:
suspend: true
backoffLimit: 0 backoffLimit: 0
template: template:
metadata: metadata:

Some files were not shown because too many files have changed in this diff