Migrate to netdata for monitoring
Netdata means we don't need to manually create graphs, and is overall a simpler solution than Grafana + Prometheus + Loki.
We already have a POC running on monitoring which gets default metrics from the core services VMs. To get up to par with our existing monitoring stack, we also need to get metrics from:
- Other nodes (ideally provisioned with ansible)
- Minio
- HP Smart Array
- Email
- Caddy
- blackbox (or just use some uptime thing)
- RouterOS / MikroTik
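
One way to track progress on the list above is to ask the Netdata agent which charts it exposes and flag the sources that don't show up yet. This is a rough sketch, not a tested setup: it assumes the agent is reachable on Netdata's default port 19999 and uses its `/api/v1/charts` endpoint. The substrings in `REQUIRED_SOURCES` are hypothetical guesses at how each collector names its charts and would need adjusting once the collectors are actually configured.

```python
import json
from urllib.request import urlopen

# Hypothetical mapping from each source in the list to a substring we expect
# in its Netdata chart IDs. These values are guesses, not verified names.
REQUIRED_SOURCES = {
    "Minio": "minio",
    "HP Smart Array": "hpssa",
    "Email": "postfix",
    "Caddy": "caddy",
    "RouterOS / MikroTik": "routeros",
}


def fetch_chart_ids(base_url="http://localhost:19999"):
    """Return the chart IDs reported by a Netdata agent's /api/v1/charts endpoint."""
    with urlopen(f"{base_url}/api/v1/charts", timeout=10) as resp:
        return list(json.load(resp)["charts"].keys())


def missing_sources(chart_ids, required=REQUIRED_SOURCES):
    """Return the names of required sources that have no matching chart ID."""
    lowered = [cid.lower() for cid in chart_ids]
    return [
        name
        for name, needle in required.items()
        if not any(needle in cid for cid in lowered)
    ]
```

Calling `missing_sources(fetch_chart_ids())` against the POC instance would return the names of the sources that still need a collector, under the naming assumptions above.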
We also need to set up alerting, and then figure out what to use instead of Loki for log retention.
Edited by Aria Shrimpton