Compare commits: a1c8a99866...75a992b829

2 Commits

| SHA1 |
|---|
| 75a992b829 |
| a87a5f7bff |
@ -1,47 +1,60 @@
 # Soteria PVC Restore Drill (backup.bstein.dev)
 
-Use this runbook for a minimal production-safe restore drill after each meaningful Soteria change.
+Use this checklist after meaningful Soteria backup, restore, auth, or alerting changes.
 
-## Preconditions
+## Production Restore Drill Checklist
 
-- `maintenance` kustomization is reconciled and healthy in Flux.
-- `soteria` and `oauth2-proxy-soteria` Deployments are ready in `maintenance`.
-- Operator account is in Keycloak group `admin` or `maintenance`.
-- Source PVC is not ephemeral/test throwaway storage that should be excluded from backup policy.
-
-## Operator Flow (UI)
-
-1. Open `https://backup.bstein.dev` and sign in through Keycloak.
-2. In `PVC Inventory`, pick source namespace/PVC.
-3. Click `Backup now` and wait for success response in `Last Action`.
-4. Click `Restore`, choose a completed backup snapshot, and set:
-   - `Target namespace`: destination namespace (defaults to source)
-   - `Target PVC name`: unique drill PVC name (`restore-<source-pvc>-<date>`)
-5. Click `Create restore PVC`.
-
-## Verification
-
-1. Confirm restore target exists:
+1. Verify baseline health before touching restores.
+   - `flux get kustomizations -n flux-system maintenance`
+   - `kubectl -n maintenance get deploy soteria oauth2-proxy-soteria`
+2. Confirm operator access and source safety.
+   - Operator must be in Keycloak group `admin` or `maintenance`.
+   - Choose a real source PVC that is expected to be backed up, not a throwaway test PVC.
+3. Run the UI flow at `https://backup.bstein.dev`.
+   - Sign in via Keycloak.
+   - In `PVC Inventory`, select source namespace and PVC.
+   - Click `Backup now` and wait for success in `Last Action`.
+   - Click `Restore` and pick a completed snapshot.
+   - Set `Target namespace` and unique `Target PVC name` (`restore-<source-pvc>-<date>`).
+   - Click `Create restore PVC`.
+4. Validate restore output.
    - `kubectl -n <target-namespace> get pvc <target-pvc>`
-2. Confirm backup telemetry is present:
-   - `kubectl -n monitoring port-forward svc/victoria-metrics-k8s-stack 8428:8428`
-   - `curl -fsS 'http://127.0.0.1:8428/api/v1/query?query=max%20by%20(namespace%2Cpvc)(pvc_backup_age_hours)'`
-3. Confirm alerting input stays healthy:
-   - `pvc_backup_health{namespace="<source-namespace>",pvc="<source-pvc>"} == 1`
-
-## Cleanup
-
-1. Remove drill PVC after validation:
+   - If workload-level validation is required, attach a temporary pod and inspect expected files/data.
+5. Clean up.
    - `kubectl -n <target-namespace> delete pvc <target-pvc>`
-2. If a detached restore Longhorn volume remains, remove it in Longhorn UI/API.
+   - Remove detached restore Longhorn volume from Longhorn UI/API if one remains.
+
+## Alert Query Verification (`maint-soteria-*`)
+
+Start a local query endpoint:
+
+`kubectl -n monitoring port-forward svc/victoria-metrics-k8s-stack 8428:8428`
+
+Validate each alert expression directly.
+
+1. `maint-soteria-refresh-stale` (`time() - soteria_inventory_refresh_timestamp_seconds`, threshold `> 900`).
+   - `curl -fsS --get 'http://127.0.0.1:8428/api/v1/query' --data-urlencode 'query=time() - soteria_inventory_refresh_timestamp_seconds'`
+   - `curl -fsS --get 'http://127.0.0.1:8428/api/v1/query' --data-urlencode 'query=(time() - soteria_inventory_refresh_timestamp_seconds) > bool 900'`
+   - Healthy expectation: age is below `900` and threshold query returns `0`.
+2. `maint-soteria-backup-unhealthy` (`sum((1 - pvc_backup_health{driver="longhorn"}) > bool 0) or on() vector(0)`, threshold `> 0`).
+   - `curl -fsS --get 'http://127.0.0.1:8428/api/v1/query' --data-urlencode 'query=sum((1 - pvc_backup_health{driver="longhorn"}) > bool 0) or on() vector(0)'`
+   - `curl -fsS --get 'http://127.0.0.1:8428/api/v1/query' --data-urlencode 'query=(1 - pvc_backup_health{driver="longhorn"}) > bool 0'`
+   - `curl -fsS --get 'http://127.0.0.1:8428/api/v1/query' --data-urlencode 'query=max by (namespace,pvc) (pvc_backup_age_hours{driver="longhorn"})'`
+   - Healthy expectation: unhealthy count is `0`; no series should be `1` in the per-PVC unhealthy query.
+3. `maint-soteria-authz-denials` (`sum(increase(soteria_authz_denials_total[15m])) or on() vector(0)`, threshold `> 9` for 10m).
+   - `curl -fsS --get 'http://127.0.0.1:8428/api/v1/query' --data-urlencode 'query=sum(increase(soteria_authz_denials_total[15m])) or on() vector(0)'`
+   - `curl -fsS --get 'http://127.0.0.1:8428/api/v1/query' --data-urlencode 'query=sum by (reason) (increase(soteria_authz_denials_total[15m]))'`
+   - Healthy expectation: total remains below `10` in normal operation; spikes should map to expected `reason` labels.
 
 ## Failure Triage
 
-- `401/403` on UI/API:
+- `401/403` on UI or API:
   - Verify oauth2-proxy group claims include `admin` or `maintenance`.
 - Restore conflict:
-  - Target PVC already exists; pick a new target PVC name.
-- Freshness alert firing (`maint-soteria-refresh-stale`):
+  - Target PVC already exists; choose a new target PVC name.
+- `maint-soteria-refresh-stale` firing:
   - Check Soteria pod health and `/metrics` scrape reachability from `monitoring`.
-- Unhealthy PVC alert firing (`maint-soteria-backup-unhealthy`):
-  - Inspect `pvc_backup_health` and `pvc_backup_age_hours` for stale/missing backup coverage.
+- `maint-soteria-backup-unhealthy` firing:
+  - Inspect `pvc_backup_health` and `pvc_backup_age_hours` to identify stale or missing backups.
+- `maint-soteria-authz-denials` firing:
+  - Confirm expected OIDC groups and inspect denial `reason` labels for policy or header regressions.
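The "Healthy expectation" lines in the new Alert Query Verification section can also be checked from a script. The following is a minimal Python sketch. It assumes the Prometheus-compatible `/api/v1/query` JSON shape that VictoriaMetrics returns, where `data.result[].value` is `[timestamp, "value"]` and the value is a string. The helper names (`build_query_url`, `sample_values`, `run_all_checks`, and the three check functions) are hypothetical. The queries and thresholds are copied from the runbook.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the runbook's port-forward is active on this address.
BASE_URL = "http://127.0.0.1:8428/api/v1/query"

# Queries copied from the "Alert Query Verification" section.
REFRESH_AGE = "time() - soteria_inventory_refresh_timestamp_seconds"
UNHEALTHY_COUNT = 'sum((1 - pvc_backup_health{driver="longhorn"}) > bool 0) or on() vector(0)'
AUTHZ_DENIALS = "sum(increase(soteria_authz_denials_total[15m])) or on() vector(0)"


def build_query_url(query: str, base_url: str = BASE_URL) -> str:
    """Same effect as `curl --get --data-urlencode 'query=...'`: URL-encode the PromQL."""
    return f"{base_url}?{urllib.parse.urlencode({'query': query})}"


def sample_values(body: dict) -> list:
    """Return the float values of an instant-vector response (values arrive as strings)."""
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body!r}")
    return [float(item["value"][1]) for item in body["data"]["result"]]


def refresh_is_healthy(body: dict) -> bool:
    # maint-soteria-refresh-stale fires when age > 900; an empty result means the metric is missing.
    ages = sample_values(body)
    return bool(ages) and max(ages) <= 900


def backups_are_healthy(body: dict) -> bool:
    # The `or on() vector(0)` fallback yields exactly one sample; it must be 0.
    return sample_values(body) == [0.0]


def authz_denials_ok(body: dict) -> bool:
    # maint-soteria-authz-denials fires when the 15m increase is > 9.
    return all(v <= 9 for v in sample_values(body))


def run_all_checks(base_url: str = BASE_URL) -> dict:
    """Query the live endpoint; call this only while the port-forward is running."""

    def fetch(query: str) -> dict:
        with urllib.request.urlopen(build_query_url(query, base_url), timeout=10) as resp:
            return json.load(resp)

    return {
        "maint-soteria-refresh-stale": refresh_is_healthy(fetch(REFRESH_AGE)),
        "maint-soteria-backup-unhealthy": backups_are_healthy(fetch(UNHEALTHY_COUNT)),
        "maint-soteria-authz-denials": authz_denials_ok(fetch(AUTHZ_DENIALS)),
    }
```

While the port-forward is running, `run_all_checks()` maps each alert name to `True` when its healthy expectation holds. The parsing and threshold logic is kept separate from the HTTP call so the checks can also run against saved responses.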
@ -38,6 +38,7 @@ resources:
   - node-image-sweeper-daemonset.yaml
   - metis-service.yaml
   - soteria-networkpolicy.yaml
+  - oauth2-proxy-soteria-networkpolicy.yaml
   - soteria-ingress.yaml
   - soteria-certificate.yaml
   - oauth2-proxy-soteria.yaml
services/maintenance/oauth2-proxy-soteria-networkpolicy.yaml (new file, 23 lines)
@ -0,0 +1,23 @@
+# services/maintenance/oauth2-proxy-soteria-networkpolicy.yaml
+apiVersion: networking.k8s.io/v1
+kind: NetworkPolicy
+metadata:
+  name: oauth2-proxy-soteria-ingress
+  namespace: maintenance
+spec:
+  podSelector:
+    matchLabels:
+      app: oauth2-proxy-soteria
+  policyTypes:
+    - Ingress
+  ingress:
+    - from:
+        - namespaceSelector:
+            matchLabels:
+              kubernetes.io/metadata.name: traefik
+          podSelector:
+            matchLabels:
+              app: traefik
+      ports:
+        - protocol: TCP
+          port: 4180
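In the new policy, `namespaceSelector` and `podSelector` share one `from` entry, so both must match. Traffic to the `app: oauth2-proxy-soteria` pods is admitted only from pods labelled `app: traefik` that also run in the namespace whose automatic `kubernetes.io/metadata.name` label is `traefik`, and only on TCP 4180. The Python sketch below is a hypothetical, simplified model of that single rule, for reasoning about who can reach the proxy. It handles exact `matchLabels` only, with no `matchExpressions`, no `ipBlock`, and no interaction with other policies. It is not how a CNI plugin enforces the policy.

```python
from dataclasses import dataclass


@dataclass
class SourcePod:
    namespace_labels: dict
    pod_labels: dict


# Selectors and port copied from oauth2-proxy-soteria-networkpolicy.yaml.
NAMESPACE_SELECTOR = {"kubernetes.io/metadata.name": "traefik"}
POD_SELECTOR = {"app": "traefik"}
ALLOWED_PORTS = {("TCP", 4180)}


def labels_match(match_labels: dict, labels: dict) -> bool:
    """matchLabels semantics: every listed key must be present with the same value."""
    return all(labels.get(key) == value for key, value in match_labels.items())


def ingress_allowed(src: SourcePod, protocol: str, port: int) -> bool:
    # Both selectors live in the same `from` entry, so they are ANDed.
    peer_ok = labels_match(NAMESPACE_SELECTOR, src.namespace_labels) and labels_match(
        POD_SELECTOR, src.pod_labels
    )
    # `ports` applies to the whole rule: the peer must match AND the port must be listed.
    return peer_ok and (protocol, port) in ALLOWED_PORTS
```

If the selectors were written as two separate list items (a `- namespaceSelector` item and a separate `- podSelector` item), the AND would become an OR. That version would admit every pod in the `traefik` namespace plus any `app: traefik` pod in `maintenance`. If several NetworkPolicies select the same pods, Kubernetes allows the union of their rules.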
@ -131,7 +131,7 @@ data:
                   type: threshold
                   conditions:
                     - evaluator:
-                        params: [3]
+                        params: [2]
                         type: gt
                       operator:
                         type: and
@ -578,7 +578,7 @@ data:
                   type: threshold
                   conditions:
                     - evaluator:
-                        params: [10]
+                        params: [9]
                         type: gt
                       operator:
                         type: and
@ -793,3 +793,440 @@ data:
               summary: "Postmark exporter reports sustained API outage"
             labels:
               severity: warning
+      - orgId: 1
+        name: typhon
+        folder: Alerts
+        interval: 1m
+        rules:
+          - uid: typhon-exporter-down
+            title: "Typhon exporter down (>10m)"
+            condition: C
+            for: "10m"
+            data:
+              - refId: A
+                relativeTimeRange:
+                  from: 600
+                  to: 0
+                datasourceUid: atlas-vm
+                model:
+                  intervalMs: 60000
+                  maxDataPoints: 43200
+                  expr: max(typhon_up) or on() vector(0)
+                  legendFormat: typhon_up
+                  datasource:
+                    type: prometheus
+                    uid: atlas-vm
+              - refId: B
+                datasourceUid: __expr__
+                model:
+                  expression: A
+                  intervalMs: 60000
+                  maxDataPoints: 43200
+                  reducer: last
+                  type: reduce
+              - refId: C
+                datasourceUid: __expr__
+                model:
+                  expression: B
+                  intervalMs: 60000
+                  maxDataPoints: 43200
+                  type: threshold
+                  conditions:
+                    - evaluator:
+                        params: [1]
+                        type: lt
+                      operator:
+                        type: and
+                      reducer:
+                        type: last
+                      type: query
+            noDataState: Alerting
+            execErrState: Alerting
+            annotations:
+              summary: "Typhon has been down for >10m"
+            labels:
+              severity: critical
+          - uid: typhon-data-stale
+            title: "Typhon data stale (>180s for 10m)"
+            condition: C
+            for: "10m"
+            data:
+              - refId: A
+                relativeTimeRange:
+                  from: 600
+                  to: 0
+                datasourceUid: atlas-vm
+                model:
+                  intervalMs: 60000
+                  maxDataPoints: 43200
+                  expr: max(typhon_data_age_seconds) or on() vector(0)
+                  legendFormat: data age
+                  datasource:
+                    type: prometheus
+                    uid: atlas-vm
+              - refId: B
+                datasourceUid: __expr__
+                model:
+                  expression: A
+                  intervalMs: 60000
+                  maxDataPoints: 43200
+                  reducer: last
+                  type: reduce
+              - refId: C
+                datasourceUid: __expr__
+                model:
+                  expression: B
+                  intervalMs: 60000
+                  maxDataPoints: 43200
+                  type: threshold
+                  conditions:
+                    - evaluator:
+                        params: [180]
+                        type: gt
+                      operator:
+                        type: and
+                      reducer:
+                        type: last
+                      type: query
+            noDataState: NoData
+            execErrState: Error
+            annotations:
+              summary: "Typhon data age >180s for >10m"
+            labels:
+              severity: warning
+          - uid: typhon-auth-failures
+            title: "Typhon auth failures burst"
+            condition: C
+            for: "5m"
+            data:
+              - refId: A
+                relativeTimeRange:
+                  from: 600
+                  to: 0
+                datasourceUid: atlas-vm
+                model:
+                  intervalMs: 60000
+                  maxDataPoints: 43200
+                  expr: sum(increase(typhon_poll_errors_total{reason=\"auth\"}[10m])) or on() vector(0)
+                  legendFormat: auth failures 10m
+                  datasource:
+                    type: prometheus
+                    uid: atlas-vm
+              - refId: B
+                datasourceUid: __expr__
+                model:
+                  expression: A
+                  intervalMs: 60000
+                  maxDataPoints: 43200
+                  reducer: last
+                  type: reduce
+              - refId: C
+                datasourceUid: __expr__
+                model:
+                  expression: B
+                  intervalMs: 60000
+                  maxDataPoints: 43200
+                  type: threshold
+                  conditions:
+                    - evaluator:
+                        params: [3]
+                        type: gt
+                      operator:
+                        type: and
+                      reducer:
+                        type: last
+                      type: query
+            noDataState: NoData
+            execErrState: Error
+            annotations:
+              summary: "Typhon auth failures exceeded threshold in 10m"
+            labels:
+              severity: critical
+          - uid: typhon-api-errors
+            title: "Typhon API/timeouts burst"
+            condition: C
+            for: "15m"
+            data:
+              - refId: A
+                relativeTimeRange:
+                  from: 900
+                  to: 0
+                datasourceUid: atlas-vm
+                model:
+                  intervalMs: 60000
+                  maxDataPoints: 43200
+                  expr: sum(increase(typhon_poll_errors_total{reason=~\"api|timeout|unknown\"}[15m])) or on() vector(0)
+                  legendFormat: poll errors 15m
+                  datasource:
+                    type: prometheus
+                    uid: atlas-vm
+              - refId: B
+                datasourceUid: __expr__
+                model:
+                  expression: A
+                  intervalMs: 60000
+                  maxDataPoints: 43200
+                  reducer: last
+                  type: reduce
+              - refId: C
+                datasourceUid: __expr__
+                model:
+                  expression: B
+                  intervalMs: 60000
+                  maxDataPoints: 43200
+                  type: threshold
+                  conditions:
+                    - evaluator:
+                        params: [10]
+                        type: gt
+                      operator:
+                        type: and
+                      reducer:
+                        type: last
+                      type: query
+            noDataState: OK
+            execErrState: Error
+            annotations:
+              summary: "Typhon API/timeouts exceeded threshold in 15m"
+            labels:
+              severity: warning
+          - uid: typhon-temp-critical
+            title: "Tent temperature critical (>34C)"
+            condition: C
+            for: "10m"
+            data:
+              - refId: A
+                relativeTimeRange:
+                  from: 600
+                  to: 0
+                datasourceUid: atlas-vm
+                model:
+                  intervalMs: 60000
+                  maxDataPoints: 43200
+                  expr: max(typhon_temperature_celsius) or on() vector(0)
+                  legendFormat: max temp
+                  datasource:
+                    type: prometheus
+                    uid: atlas-vm
+              - refId: B
+                datasourceUid: __expr__
+                model:
+                  expression: A
+                  intervalMs: 60000
+                  maxDataPoints: 43200
+                  reducer: last
+                  type: reduce
+              - refId: C
+                datasourceUid: __expr__
+                model:
+                  expression: B
+                  intervalMs: 60000
+                  maxDataPoints: 43200
+                  type: threshold
+                  conditions:
+                    - evaluator:
+                        params: [34]
+                        type: gt
+                      operator:
+                        type: and
+                      reducer:
+                        type: last
+                      type: query
+            noDataState: OK
+            execErrState: Error
+            annotations:
+              summary: "Typhon reports tent temperature >34C for >10m"
+            labels:
+              severity: critical
+          - uid: typhon-humidity-high
+            title: "Tent humidity high (>75%)"
+            condition: C
+            for: "20m"
+            data:
+              - refId: A
+                relativeTimeRange:
+                  from: 1200
+                  to: 0
+                datasourceUid: atlas-vm
+                model:
+                  intervalMs: 60000
+                  maxDataPoints: 43200
+                  expr: max(typhon_relative_humidity_percent) or on() vector(0)
+                  legendFormat: max humidity
+                  datasource:
+                    type: prometheus
+                    uid: atlas-vm
+              - refId: B
+                datasourceUid: __expr__
+                model:
+                  expression: A
+                  intervalMs: 60000
+                  maxDataPoints: 43200
+                  reducer: last
+                  type: reduce
+              - refId: C
+                datasourceUid: __expr__
+                model:
+                  expression: B
+                  intervalMs: 60000
+                  maxDataPoints: 43200
+                  type: threshold
+                  conditions:
+                    - evaluator:
+                        params: [75]
+                        type: gt
+                      operator:
+                        type: and
+                      reducer:
+                        type: last
+                      type: query
+            noDataState: OK
+            execErrState: Error
+            annotations:
+              summary: "Typhon reports relative humidity >75% for >20m"
+            labels:
+              severity: warning
+          - uid: typhon-humidity-low
+            title: "Tent humidity low (<30%)"
+            condition: C
+            for: "20m"
+            data:
+              - refId: A
+                relativeTimeRange:
+                  from: 1200
+                  to: 0
+                datasourceUid: atlas-vm
+                model:
+                  intervalMs: 60000
+                  maxDataPoints: 43200
+                  expr: min(typhon_relative_humidity_percent)
+                  legendFormat: min humidity
+                  datasource:
+                    type: prometheus
+                    uid: atlas-vm
+              - refId: B
+                datasourceUid: __expr__
+                model:
+                  expression: A
+                  intervalMs: 60000
+                  maxDataPoints: 43200
+                  reducer: last
+                  type: reduce
+              - refId: C
+                datasourceUid: __expr__
+                model:
+                  expression: B
+                  intervalMs: 60000
+                  maxDataPoints: 43200
+                  type: threshold
+                  conditions:
+                    - evaluator:
+                        params: [30]
+                        type: lt
+                      operator:
+                        type: and
+                      reducer:
+                        type: last
+                      type: query
+            noDataState: OK
+            execErrState: Error
+            annotations:
+              summary: "Typhon reports relative humidity <30% for >20m"
+            labels:
+              severity: warning
+          - uid: typhon-vpd-high
+            title: "Tent VPD high (>2.0 kPa)"
+            condition: C
+            for: "20m"
+            data:
+              - refId: A
+                relativeTimeRange:
+                  from: 1200
+                  to: 0
+                datasourceUid: atlas-vm
+                model:
+                  intervalMs: 60000
+                  maxDataPoints: 43200
+                  expr: max(typhon_vpd_kpa) or on() vector(0)
+                  legendFormat: max vpd
+                  datasource:
+                    type: prometheus
+                    uid: atlas-vm
+              - refId: B
+                datasourceUid: __expr__
+                model:
+                  expression: A
+                  intervalMs: 60000
+                  maxDataPoints: 43200
+                  reducer: last
+                  type: reduce
+              - refId: C
+                datasourceUid: __expr__
+                model:
+                  expression: B
+                  intervalMs: 60000
+                  maxDataPoints: 43200
+                  type: threshold
+                  conditions:
+                    - evaluator:
+                        params: [2.0]
+                        type: gt
+                      operator:
+                        type: and
+                      reducer:
+                        type: last
+                      type: query
+            noDataState: OK
+            execErrState: Error
+            annotations:
+              summary: "Typhon reports VPD >2.0 kPa for >20m"
+            labels:
+              severity: warning
+          - uid: typhon-vpd-low
+            title: "Tent VPD low (<0.4 kPa)"
+            condition: C
+            for: "20m"
+            data:
+              - refId: A
+                relativeTimeRange:
+                  from: 1200
+                  to: 0
+                datasourceUid: atlas-vm
+                model:
+                  intervalMs: 60000
+                  maxDataPoints: 43200
+                  expr: min(typhon_vpd_kpa)
+                  legendFormat: min vpd
+                  datasource:
+                    type: prometheus
+                    uid: atlas-vm
+              - refId: B
+                datasourceUid: __expr__
+                model:
+                  expression: A
+                  intervalMs: 60000
+                  maxDataPoints: 43200
+                  reducer: last
+                  type: reduce
+              - refId: C
+                datasourceUid: __expr__
+                model:
+                  expression: B
+                  intervalMs: 60000
+                  maxDataPoints: 43200
+                  type: threshold
+                  conditions:
+                    - evaluator:
+                        params: [0.4]
+                        type: lt
+                      operator:
+                        type: and
+                      reducer:
+                        type: last
+                      type: query
+            noDataState: OK
+            execErrState: Error
+            annotations:
+              summary: "Typhon reports VPD <0.4 kPa for >20m"
+            labels:
+              severity: warning
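Every Typhon rule uses the same pipeline: `A` queries `atlas-vm`, `B` keeps the `last` point, and `C` compares that value against a strict `gt` or `lt` threshold. The `or on() vector(0)` suffix decides what a missing metric means. For `typhon-exporter-down`, an absent `typhon_up` becomes `0`, which is below `1`, so the rule fires. The `typhon-humidity-low` and `typhon-vpd-low` rules leave the fallback out, because a synthetic `0` would trip their `lt` thresholds; they rely on `noDataState: OK` instead.

The Python below is a simplified, hypothetical model of a single evaluation, for reasoning about these choices. It is not Grafana's implementation. It also ignores the `for:` pending period and multi-series results.

```python
from typing import Optional

# Simplified mapping from a rule's noDataState to the resulting alert state.
NO_DATA_RESULT = {"OK": "Normal", "NoData": "NoData", "Alerting": "Alerting"}


def apply_zero_fallback(samples: list) -> list:
    """Model of `... or on() vector(0)`: an empty result becomes a single 0 sample."""
    return samples if samples else [0.0]


def reduce_last(samples: list) -> Optional[float]:
    """refId B (`reducer: last`): keep the most recent sample, or None when empty."""
    return samples[-1] if samples else None


def evaluate(value: float, op: str, limit: float) -> bool:
    """refId C threshold evaluator; `gt` and `lt` are both strict comparisons."""
    if op == "gt":
        return value > limit
    if op == "lt":
        return value < limit
    raise ValueError(f"unsupported evaluator type: {op}")


def rule_state(samples: list, op: str, limit: float, zero_fallback: bool, no_data_state: str) -> str:
    """One evaluation of a Typhon-style rule; the `for:` pending period is ignored."""
    if zero_fallback:
        samples = apply_zero_fallback(samples)
    value = reduce_last(samples)
    if value is None:
        return NO_DATA_RESULT[no_data_state]
    return "Alerting" if evaluate(value, op, limit) else "Normal"
```

For example, `rule_state([], "lt", 1, True, "Alerting")` models a missing `typhon_up` and returns `"Alerting"`. `rule_state([], "gt", 180, True, "NoData")` shows that `typhon-data-stale` stays `"Normal"` when the age metric disappears, which leaves that case to `typhon-exporter-down`.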