Change Insight

Change Insight is a Phase 4 feature that automatically correlates incidents with recent infrastructure changes. It provides operators with immediate context on what might have triggered an incident, such as a recent deployment, configuration change, or scaling event.

How It Works

  1. Trigger: When a new incident is created in the Incident Engine, the ChangeAnalyzer service is invoked.
  2. K8s Event Retrieval: The system queries the Kubernetes API for events that occurred in the 30-minute window preceding the incident.
  3. Correlation Logic:
    • Service Match: Events are filtered based on the services affected by the incident.
    • Relevance Scoring: Each event is assigned a relevance score (0.0 to 1.0) based on its type (Warning vs. Normal), message content (e.g., "BackOff", "Failed"), and proximity to the incident start time.
  4. Enrichment: The top 5 most relevant changes are formatted and appended to the incident's description.
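
The steps above can be sketched roughly as follows. This is a minimal illustration, not the ChangeAnalyzer implementation: the `ClusterEvent` type, the function names, the 0.4/0.3/0.3 weights, and the output format are all assumptions made for the example. Only the 30-minute window, the 0.0 to 1.0 score range, the scoring inputs, and the top-5 cutoff come from the description above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

LOOKBACK = timedelta(minutes=30)          # window stated in step 2
TOP_N = 5                                 # cutoff stated in step 4
FAILURE_KEYWORDS = ("BackOff", "Failed")  # examples quoted in step 3


@dataclass
class ClusterEvent:
    """Hypothetical, simplified view of a Kubernetes event."""
    service: str
    type: str          # "Warning" or "Normal"
    message: str
    timestamp: datetime


def relevance(event: ClusterEvent, incident_start: datetime) -> float:
    """Return a score in [0.0, 1.0]. The weights are illustrative assumptions."""
    type_score = 1.0 if event.type == "Warning" else 0.3
    keyword_score = 1.0 if any(k in event.message for k in FAILURE_KEYWORDS) else 0.0
    # Proximity is 1.0 at the incident start and falls to 0.0 at the window edge.
    age = incident_start - event.timestamp
    proximity = max(0.0, min(1.0, 1.0 - age / LOOKBACK))
    return round(0.4 * type_score + 0.3 * keyword_score + 0.3 * proximity, 3)


def correlate(events, affected_services, incident_start):
    """Filter by window and service, score, and keep the TOP_N best matches."""
    window_start = incident_start - LOOKBACK
    candidates = [
        e for e in events
        if e.service in affected_services
        and window_start <= e.timestamp <= incident_start
    ]
    scored = [(relevance(e, incident_start), e) for e in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:TOP_N]


def enrich(description: str, ranked) -> str:
    """Append the ranked changes to an incident description (format is assumed)."""
    if not ranked:
        return description
    lines = [
        f"- [{score:.2f}] {e.timestamp:%H:%M:%S} {e.service}: {e.message}"
        for score, e in ranked
    ]
    return description + "\n\nRecent changes:\n" + "\n".join(lines)
```

With this weighting, a Warning event containing "BackOff" one minute before the incident outranks a Normal event from twenty minutes earlier. Events outside the window or from unaffected services never reach the scoring step.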

Insight Types

  • DEPLOYMENT: Detects new rollouts, ReplicaSet updates, and image changes.
  • CONFIG_CHANGE: Identifies modifications to ConfigMaps or Secrets.
  • SCALING_EVENT: Highlights HPA adjustments or manual scaling actions.
  • UNKNOWN: Captures other significant K8s events (e.g., Node pressure, scheduling failures).
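
One way to picture the mapping from an event to an insight type is a lookup on the kind of Kubernetes object the event refers to. The sketch below is an assumption made for illustration: the `InsightType` enum mirrors the four names listed above, but the `classify` function and its kind-based rules are hypothetical and are not taken from the ChangeAnalyzer service.

```python
from enum import Enum


class InsightType(Enum):
    DEPLOYMENT = "DEPLOYMENT"
    CONFIG_CHANGE = "CONFIG_CHANGE"
    SCALING_EVENT = "SCALING_EVENT"
    UNKNOWN = "UNKNOWN"


def classify(involved_kind: str) -> InsightType:
    """Map the kind of the object an event refers to onto an insight type.

    The rules are illustrative assumptions, not the service's actual logic.
    """
    if involved_kind == "HorizontalPodAutoscaler":
        return InsightType.SCALING_EVENT
    if involved_kind in ("ConfigMap", "Secret"):
        return InsightType.CONFIG_CHANGE
    if involved_kind in ("Deployment", "ReplicaSet"):
        return InsightType.DEPLOYMENT
    # Node pressure, scheduling failures, and anything else unrecognized.
    return InsightType.UNKNOWN
```

A kind-only rule cannot tell a manual scale of a Deployment from a rollout, since both surface as events on the same object. A real classifier would likely also need the event reason or message.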

Benefits

  • Reduced MTTR: Operators no longer have to hunt manually for recent changes across separate tools.
  • Contextual Awareness: Immediate visibility into "Who changed what and when" relative to the failure.
  • Improved AI Diagnosis: Supplying these insights to the AI Engine makes its subsequent diagnoses more accurate.