# services/monitoring

## Grafana admin secret

The Grafana Helm release expects a pre-existing secret named `grafana-admin`
in the `monitoring` namespace. Create or rotate it with:

```bash
# A client-side dry run piped into `kubectl apply` creates the secret if it is
# missing and updates it in place if it already exists.
kubectl create secret generic grafana-admin \
  --namespace monitoring \
  --from-literal=admin-user=admin \
  --from-literal=admin-password='REPLACE_ME' \
  --dry-run=client -o yaml | kubectl apply -f -
```

Replace `REPLACE_ME` with the real password before running the command, and
rerun it with the new password whenever you rotate credentials.
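
If you prefer to review the Secret before applying it, a minimal sketch like the one below renders an equivalent manifest in plain Bash. The function name `render_grafana_admin_secret` is hypothetical, and the password generator (24 random bytes from `/dev/urandom`, base64-encoded to 32 characters) is an assumption, not a requirement of this repository:

```bash
#!/usr/bin/env bash
# Sketch only: render the grafana-admin Secret as YAML so it can be reviewed
# or piped to `kubectl apply -f -`. The function name is hypothetical.
set -euo pipefail

render_grafana_admin_secret() {
  local user="$1" password="$2"
  local user_b64 password_b64
  # Secret .data values must be base64-encoded; printf avoids encoding a
  # trailing newline, and tr strips any line wrapping added by base64.
  user_b64=$(printf '%s' "$user" | base64 | tr -d '\n')
  password_b64=$(printf '%s' "$password" | base64 | tr -d '\n')
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: grafana-admin
  namespace: monitoring
type: Opaque
data:
  admin-user: ${user_b64}
  admin-password: ${password_b64}
EOF
}

# Assumed password policy: 24 random bytes encode to exactly 32 base64 chars.
new_password=$(head -c 24 /dev/urandom | base64)
render_grafana_admin_secret admin "$new_password"
```

To apply the rendered manifest, pipe it to kubectl, for example `render_grafana_admin_secret admin "$new_password" | kubectl apply -f -`; like the command above, this creates the Secret or updates it in place.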

## DCGM exporter image

The NVIDIA GPU metrics DaemonSet expects `registry.bstein.dev/monitoring/dcgm-exporter:4.4.2-4.7.0-ubuntu22.04`, mirrored from `docker.io/nvidia/dcgm-exporter:4.4.2-4.7.0-ubuntu22.04`. Refresh the mirrored copy in Zot when bumping versions:

```bash
# --all copies every platform image in the upstream manifest list, not only
# the one matching the host that runs skopeo.
skopeo copy \
  --all \
  docker://docker.io/nvidia/dcgm-exporter:4.4.2-4.7.0-ubuntu22.04 \
  docker://registry.bstein.dev/monitoring/dcgm-exporter:4.4.2-4.7.0-ubuntu22.04
```

When finished mirroring from the control-plane, you can remove temporary tooling with `sudo apt-get purge -y skopeo && sudo apt-get autoremove -y` and clear `~/.config/containers/auth.json`.
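
Because the tag appears in both the source and destination references, a version bump means editing it in two places. One way to keep them in sync is a small wrapper like the sketch below. The variable names, the `mirror_dcgm_exporter` function, and the `DRY_RUN` switch are all hypothetical and not part of this repository:

```bash
#!/usr/bin/env bash
# Sketch only: derive both image references from a single tag so a bump
# touches one value. All names here are hypothetical.
set -euo pipefail

DCGM_TAG="${DCGM_TAG:-4.4.2-4.7.0-ubuntu22.04}"
UPSTREAM_REPO="docker.io/nvidia/dcgm-exporter"
MIRROR_REPO="registry.bstein.dev/monitoring/dcgm-exporter"

mirror_dcgm_exporter() {
  local tag="$1"
  local src="docker://${UPSTREAM_REPO}:${tag}"
  local dst="docker://${MIRROR_REPO}:${tag}"
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    # Print the command instead of running it.
    echo skopeo copy --all "$src" "$dst"
  else
    # --all copies every platform image in the upstream manifest list.
    skopeo copy --all "$src" "$dst"
  fi
}

DRY_RUN=1 mirror_dcgm_exporter "$DCGM_TAG"
```

Drop `DRY_RUN=1` to perform the copy. The tag the DaemonSet references still has to match the newly mirrored one.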