Predictive Incident Detection
Predictive Incident Detection allows Operyn to identify issues before they impact users by analyzing telemetry trends and using AI to forecast future states.
How it Works
- Trend Detection: The AnomalyDetector in the Incident Engine monitors incoming metrics for rapid changes (e.g., CPU increasing at >5% per minute).
- AI Forecasting: When a suspicious trend is detected, the Incident Engine calls the AI Engine's /forecast endpoint.
- Future State Prediction: The ForecastingAgent uses an LLM to analyze historical data points and predict whether a threshold breach is likely within the next 15 minutes.
- Early Warning Incidents: If a breach is predicted with high confidence, Operyn creates a HIGH severity incident with the [PREDICTIVE] prefix, allowing SREs to intervene before an outage occurs.
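The flow above can be sketched in a few lines of Python. This is an illustrative sketch only, not Operyn's actual implementation: the names slope_per_minute, Forecast, Incident, linear_forecast, and evaluate are hypothetical, as is the min_confidence cutoff of 0.8. The LLM-backed /forecast call is replaced here by a plain linear extrapolation so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (minutes since start, metric value)


def slope_per_minute(samples: Sequence[Sample]) -> float:
    """Least-squares slope of the metric, in units per minute."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    denom = sum((t - mean_t) ** 2 for t, _ in samples)
    if denom == 0:
        return 0.0
    return sum((t - mean_t) * (v - mean_v) for t, v in samples) / denom


@dataclass
class Forecast:
    breach_predicted: bool
    confidence: float  # 0.0 to 1.0


@dataclass
class Incident:
    severity: str
    title: str


def linear_forecast(samples: Sequence[Sample], threshold: float,
                    window_minutes: int) -> Forecast:
    """Stand-in for the /forecast call: linear extrapolation, not an LLM.

    The confidence values are placeholders chosen for the example.
    """
    projected = samples[-1][1] + slope_per_minute(samples) * window_minutes
    breach = projected >= threshold
    return Forecast(breach_predicted=breach, confidence=0.9 if breach else 0.1)


def evaluate(metric: str, samples: Sequence[Sample], threshold: float,
             forecast_fn: Callable[[Sequence[Sample], float, int], Forecast],
             slope_threshold: float = 5.0, window_minutes: int = 15,
             min_confidence: float = 0.8) -> Optional[Incident]:
    """Return a predictive incident, or None if no early warning is needed."""
    # Step 1: trend detection. Only steep upward trends go to the forecaster.
    if slope_per_minute(samples) <= slope_threshold:
        return None
    # Steps 2-3: ask the forecaster whether a breach is likely in the window.
    forecast = forecast_fn(samples, threshold, window_minutes)
    # Step 4: raise an early-warning incident only on a confident prediction.
    if forecast.breach_predicted and forecast.confidence >= min_confidence:
        title = (f"[PREDICTIVE] {metric} expected to breach {threshold} "
                 f"within {window_minutes} minutes")
        return Incident(severity="HIGH", title=title)
    return None


cpu = [(0, 50.0), (1, 56.0), (2, 62.0), (3, 68.0)]  # rising 6% per minute
incident = evaluate("cpu_percent", cpu, threshold=90.0,
                    forecast_fn=linear_forecast)
```

Passing the forecaster in as forecast_fn keeps the trend check independent of how the prediction is produced, so a real client for the /forecast endpoint could be substituted without touching the rest of the logic.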
Configuration
Predictive detection behavior can be tuned through the AnomalyDetector configuration:
- Slope Threshold: The minimum rate of change that triggers forecasting analysis (for example, 5% per minute for CPU).
- Prediction Window: How far into the future, in minutes, to predict. The default is 15 minutes.
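As a rough illustration, the two settings could be modeled as a small validated config object. The class name AnomalyDetectorConfig and the field names slope_threshold and prediction_window_minutes are assumptions made for this sketch; the actual keys and format used by Operyn may differ. No default is given for the slope threshold because none is documented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnomalyDetectorConfig:
    # Minimum rate of change (units per minute) that triggers analysis.
    slope_threshold: float
    # How far ahead to predict; 15 minutes is the documented default.
    prediction_window_minutes: int = 15

    def __post_init__(self) -> None:
        # Reject values that would make the detector meaningless.
        if self.slope_threshold <= 0:
            raise ValueError("slope_threshold must be positive")
        if self.prediction_window_minutes <= 0:
            raise ValueError("prediction_window_minutes must be positive")


config = AnomalyDetectorConfig(slope_threshold=5.0)
```

A lower slope threshold sends more trends to the forecaster, and a longer prediction window gives earlier warnings at the cost of less certain predictions.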
Benefits
- Reduced MTTR: Issues caught early can often be fixed before they become critical.
- Proactive Scaling: Predictive signals can be used to trigger automated scaling before load causes latency spikes.
- Noise Reduction: AI filtering ensures that transient spikes don't trigger unnecessary alerts unless they are part of a growing trend.