Change Insight
Change Insight is a Phase 4 feature that automatically correlates incidents with recent infrastructure changes. It provides operators with immediate context on what might have triggered an incident, such as a recent deployment, configuration change, or scaling event.
How It Works
- Trigger: When a new incident is created in the Incident Engine, the ChangeAnalyzer service is invoked.
- K8s Event Retrieval: The system queries the Kubernetes API for events that occurred in the 30-minute window preceding the incident.
- Correlation Logic:
  - Service Match: Events are filtered to those involving the services affected by the incident.
  - Relevance Scoring: Each event is assigned a relevance score from 0.0 to 1.0 based on its type (Warning vs. Normal), its message content (e.g., "BackOff", "Failed"), and its proximity to the incident start time.
- Enrichment: The five most relevant changes are formatted and appended to the incident's description.
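The filtering, scoring, and enrichment steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the ChangeAnalyzer implementation: the K8sEvent record, the helper names correlate and enrich_description, the weights (0.4 for a Warning event, 0.3 for a matching keyword, 0.3 scaled by proximity), and the linear time decay are all hypothetical. Events are passed in as plain records rather than fetched from the Kubernetes API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical parameters: the real ChangeAnalyzer weights are not documented.
LOOKBACK = timedelta(minutes=30)   # window preceding the incident
TOP_N = 5                          # number of changes appended to the description
TYPE_WEIGHT = 0.4                  # added for "Warning" events
KEYWORD_WEIGHT = 0.3               # added when a suspicious keyword appears
PROXIMITY_WEIGHT = 0.3             # scaled by closeness to the incident start
SUSPICIOUS_KEYWORDS = ("BackOff", "Failed")


@dataclass
class K8sEvent:
    """Hypothetical, simplified stand-in for a Kubernetes Event."""
    service: str
    type: str        # "Warning" or "Normal"
    reason: str
    message: str
    timestamp: datetime


def relevance_score(event: K8sEvent, incident_start: datetime) -> float:
    """Return a score in [0.0, 1.0]; the three weights sum to 1.0."""
    score = TYPE_WEIGHT if event.type == "Warning" else 0.0
    text = f"{event.reason} {event.message}"
    if any(keyword in text for keyword in SUSPICIOUS_KEYWORDS):
        score += KEYWORD_WEIGHT
    # Linear decay: 1.0 at the incident start, 0.0 at the edge of the window.
    age = (incident_start - event.timestamp).total_seconds()
    proximity = min(1.0, max(0.0, 1.0 - age / LOOKBACK.total_seconds()))
    return score + PROXIMITY_WEIGHT * proximity


def correlate(events, incident_start, affected_services, top_n=TOP_N):
    """Filter by time window and affected service, then rank by relevance."""
    window_start = incident_start - LOOKBACK
    candidates = [
        e for e in events
        if window_start <= e.timestamp <= incident_start
        and e.service in affected_services
    ]
    scored = [(relevance_score(e, incident_start), e) for e in candidates]
    # Sort on the score only, so the event records are never compared.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


def enrich_description(description, ranked):
    """Append the ranked changes to the incident description."""
    if not ranked:
        return description
    lines = [description, "", "Recent changes (most relevant first):"]
    for score, e in ranked:
        lines.append(
            f"- [{score:.2f}] {e.timestamp:%H:%M:%S} {e.service}: "
            f"{e.reason}: {e.message}"
        )
    return "\n".join(lines)


# Example: one incident affecting a hypothetical "checkout" service.
start = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
events = [
    K8sEvent("checkout", "Warning", "BackOff",
             "Back-off restarting failed container", start - timedelta(minutes=2)),
    K8sEvent("checkout", "Normal", "ScalingReplicaSet",
             "Scaled up replica set checkout-7d9f to 3", start - timedelta(minutes=10)),
    K8sEvent("billing", "Warning", "Failed",
             "Error: ImagePullBackOff", start - timedelta(minutes=5)),   # other service
    K8sEvent("checkout", "Normal", "Pulled",
             "Container image pulled", start - timedelta(minutes=45)),  # outside window
]
ranked = correlate(events, start, {"checkout"})
report = enrich_description("Checkout latency SLO breached", ranked)
print(report)
```

In this example the billing event is dropped by the service match and the 45-minute-old event falls outside the window, so only the two checkout events are ranked, with the recent BackOff warning first. Keeping the weights summing to 1.0 is one simple way to guarantee the 0.0 to 1.0 range without a separate normalization step.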
Insight Types
- DEPLOYMENT: Detects new rollouts, ReplicaSet updates, and image changes.
- CONFIG_CHANGE: Identifies modifications to ConfigMaps or Secrets.
- SCALING_EVENT: Highlights HPA adjustments or manual scaling actions.
- UNKNOWN: Captures other significant K8s events (e.g., Node pressure, scheduling failures).
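One way to assign these insight types is a small rule table keyed on the event's involved object kind and reason. The sketch below is hypothetical: the classify function and the kind sets are illustrative assumptions, and the "Updated" reason used for ConfigMaps is a placeholder. SuccessfulRescale (HPA), ScalingReplicaSet (Deployment), and FailedScheduling (Pod) are standard Kubernetes event reasons.

```python
from enum import Enum


class InsightType(Enum):
    DEPLOYMENT = "DEPLOYMENT"
    CONFIG_CHANGE = "CONFIG_CHANGE"
    SCALING_EVENT = "SCALING_EVENT"
    UNKNOWN = "UNKNOWN"


# Hypothetical mapping from the event's involved object kind to an insight type.
CONFIG_KINDS = {"ConfigMap", "Secret"}
SCALING_KINDS = {"HorizontalPodAutoscaler"}
DEPLOYMENT_KINDS = {"Deployment", "ReplicaSet"}
SCALING_REASONS = {"SuccessfulRescale"}  # reason emitted by the HPA controller


def classify(kind: str, reason: str) -> InsightType:
    """Map a Kubernetes event to an insight type (heuristic sketch)."""
    if kind in CONFIG_KINDS:
        return InsightType.CONFIG_CHANGE
    if kind in SCALING_KINDS or reason in SCALING_REASONS:
        return InsightType.SCALING_EVENT
    # "ScalingReplicaSet" on a Deployment covers both rollouts and manual
    # scaling; this sketch files it under DEPLOYMENT.
    if kind in DEPLOYMENT_KINDS:
        return InsightType.DEPLOYMENT
    # Anything else, e.g. Node pressure or FailedScheduling on a Pod.
    return InsightType.UNKNOWN


for kind, reason in [
    ("Deployment", "ScalingReplicaSet"),
    ("ConfigMap", "Updated"),  # placeholder reason
    ("HorizontalPodAutoscaler", "SuccessfulRescale"),
    ("Pod", "FailedScheduling"),
]:
    print(f"{kind}/{reason} -> {classify(kind, reason).value}")
```

The rules are checked from most to least specific, so a config or HPA event is never misfiled as a deployment. A real classifier would likely need extra signals (for example, comparing image tags) to separate manual scaling from a rollout.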
Benefits
- Reduced MTTR: Operators no longer have to search several tools by hand for recent changes.
- Contextual Awareness: Immediate visibility into "Who changed what and when" relative to the failure.
- Improved AI Diagnosis: Passing these insights to the AI Engine gives it additional context, which makes its subsequent diagnoses more accurate.