Sentinel Monitoring
Real-time infrastructure monitoring dashboard with anomaly detection, predictive alerts, and automated incident response workflows.
Sentinel is an infrastructure monitoring platform that goes beyond dashboards. It combines real-time metrics collection with ML-based anomaly detection to catch problems before they become incidents.
The ingestion pipeline handles 500K metrics per second using a Go collector service that batches writes to ClickHouse. We chose ClickHouse over TimescaleDB for its columnar compression and query performance on time-series aggregations.
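The batching idea can be illustrated with a small sketch. The production collector is written in Go; Python is used here only for illustration. The `MetricBatcher` class, the row shape, and the size and age limits are all assumptions, and `flush` is a stand-in for whatever bulk insert the collector issues against ClickHouse.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical row shape: (metric name, unix timestamp, value).
Metric = Tuple[str, float, float]


@dataclass
class MetricBatcher:
    """Buffers metrics and hands them to `flush` in batches.

    A batch is flushed when it reaches `max_size` rows or when its oldest
    row has waited at least `max_age_s` seconds. The age check only runs
    when a new metric arrives; a real collector would also flush on a timer.
    """

    flush: Callable[[List[Metric]], None]  # stand-in for a bulk insert
    max_size: int = 1000                   # assumed batch size
    max_age_s: float = 1.0                 # assumed maximum wait per batch
    clock: Callable[[], float] = time.monotonic
    _buf: List[Metric] = field(default_factory=list)
    _oldest: float = 0.0

    def add(self, metric: Metric) -> None:
        if not self._buf:
            self._oldest = self.clock()  # remember when this batch started
        self._buf.append(metric)
        too_big = len(self._buf) >= self.max_size
        too_old = self.clock() - self._oldest >= self.max_age_s
        if too_big or too_old:
            self.flush_now()

    def flush_now(self) -> None:
        if self._buf:
            batch, self._buf = self._buf, []
            self.flush(batch)
```

Writing in batches rather than row by row matters for a columnar store like ClickHouse, which generally performs better with fewer, larger inserts than with many small ones.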
Anomaly detection runs as a Python sidecar that analyzes metric streams using a combination of statistical methods (z-score, IQR) and a lightweight LSTM model trained on historical patterns. When the sidecar detects an anomaly, it creates an alert with context: what changed, when it started, and which services are affected.
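The two statistical checks named above can be sketched with the standard library alone. This is a minimal illustration, not the sidecar's actual code: the function names, the z-score threshold of 3.0, and the IQR multiplier of 1.5 are conventional defaults assumed here, and the LSTM model is out of scope.

```python
import statistics
from typing import List, Sequence


def zscore_outliers(values: Sequence[float], threshold: float = 3.0) -> List[int]:
    """Return indices whose distance from the mean exceeds `threshold` standard deviations."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # a perfectly flat series has no z-score outliers
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]


def iqr_outliers(values: Sequence[float], k: float = 1.5) -> List[int]:
    """Return indices that fall outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


# Example: a steady series with one spike at the end (index 20).
series = [10.0, 11.0, 9.0, 10.0] * 5 + [50.0]
print(zscore_outliers(series))  # [20]
print(iqr_outliers(series))     # [20]
```

The two methods complement each other: the z-score is sensitive to the spike itself inflating the mean and standard deviation, while the IQR fences depend only on the middle half of the data and so are more robust to extreme values.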
Response automation is where Sentinel goes beyond detection. Alerts can trigger runbooks: automated playbooks that execute diagnostic commands, scale resources, or restart services. Common incidents like disk pressure or memory spikes are resolved automatically.
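One way to picture the alert-to-runbook mapping is a registry keyed by alert type, where each runbook is an ordered list of steps. Everything in this sketch is hypothetical: the `Alert` fields, the step functions, and the `RUNBOOKS` table are illustrative names, and the steps only return a description of what they would do rather than running real commands.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Alert:
    kind: str     # e.g. "disk_pressure" (hypothetical alert type)
    service: str  # service the alert was raised for


Step = Callable[[Alert], str]


# Hypothetical steps; each returns a log line instead of touching a real system.
def check_disk_usage(alert: Alert) -> str:
    return f"checked disk usage on {alert.service}"


def prune_old_logs(alert: Alert) -> str:
    return f"pruned old logs on {alert.service}"


def restart_service(alert: Alert) -> str:
    return f"restarted {alert.service}"


# Each alert kind maps to an ordered list of steps.
RUNBOOKS: Dict[str, List[Step]] = {
    "disk_pressure": [check_disk_usage, prune_old_logs],
    "memory_spike": [restart_service],
}


def respond(alert: Alert) -> List[str]:
    """Run the runbook for an alert, or escalate if none is registered."""
    steps = RUNBOOKS.get(alert.kind)
    if steps is None:
        return [f"no runbook for {alert.kind}; escalating to on-call"]
    return [step(alert) for step in steps]
```

Keeping an explicit fallback for unknown alert kinds means only incidents with a registered runbook are handled automatically; everything else still reaches a person.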
The React frontend uses WebSocket connections for real-time dashboard updates. Charts render with D3.js for full control over visualization. The most-used feature is the service dependency map, which shows real-time health status of every service and the connections between them.
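The dependency map is essentially a directed graph of services. As a rough sketch of the data model, assuming a hypothetical `DEPENDS_ON` table and an `affected_by` helper (neither is taken from Sentinel's code), a breadth-first walk over reversed edges finds every service that sits upstream of a failing one:

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical graph: each service maps to the services it depends on.
DEPENDS_ON: Dict[str, List[str]] = {
    "web": ["api"],
    "api": ["db", "cache"],
    "worker": ["db"],
    "db": [],
    "cache": [],
}


def affected_by(failed: str, depends_on: Dict[str, List[str]]) -> Set[str]:
    """Return every service that directly or transitively depends on `failed`."""
    # Invert the edges: dependency -> services that depend on it.
    dependents: Dict[str, List[str]] = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(service)

    seen = {failed}
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for upstream in dependents.get(current, []):
            if upstream not in seen:
                seen.add(upstream)
                queue.append(upstream)
    return seen - {failed}


print(sorted(affected_by("db", DEPENDS_ON)))  # ['api', 'web', 'worker']
```

The same traversal could supply the "which services are affected" context attached to an alert, although how Sentinel actually computes that is not described here.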