Correlation Rules

Correlation rules let you control how Operyn groups, merges, or suppresses related incidents. Instead of flooding your team with dozens of independent alerts during a cascading failure, correlation rules consolidate them into a manageable set of actionable incidents.

Why Correlation Rules?

During a production incident, a single root cause often triggers alerts across multiple services and signal types. Without correlation:

  • A database outage might create separate incidents for every downstream service.
  • An error-rate spike and a latency spike on the same service appear as two unrelated incidents.
  • The same failure pattern generates a new incident every few minutes.

Correlation rules solve this by automatically linking related incidents based on configurable conditions.

Rule Anatomy

Each correlation rule has the following fields:

Field | Type | Description
name | string | Human-readable name (e.g. "Same-service error storm")
description | string | Explanation of what the rule does
enabled | boolean | Whether the rule is active
conditions | object | Match criteria (see below)
action | string | What happens when incidents match: merge, group, or suppress
priority | number | Evaluation order; rules with a higher priority are evaluated first

Match Conditions

The conditions object defines when two incidents should be correlated:

Field | Type | Required | Description
matchFields | string[] | Yes | Fields to compare between incidents. Options: service, signalType, severity
timeWindowMinutes | number | Yes | Maximum time gap, in minutes, between incidents for them to be considered related
severities | string[] | No | Only apply this rule to incidents with these severity levels
signalTypes | string[] | No | Only apply this rule to incidents with these signal types
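The two field tables above can be written down as TypeScript types, which is handy when building rules in code. This is an illustrative sketch, not a published Operyn SDK: the type names (`CorrelationRule`, `MatchConditions`) and the `validateConditions` helper are hypothetical, and severity and signal-type values are modeled as plain strings.

```typescript
// Hypothetical TypeScript model of a correlation rule, mirroring the field tables.
type MatchField = "service" | "signalType" | "severity";
type CorrelationAction = "merge" | "group" | "suppress";

interface MatchConditions {
  matchFields: MatchField[];  // required: fields compared between incidents
  timeWindowMinutes: number;  // required: maximum gap between incidents
  severities?: string[];      // optional: restrict to these severity levels
  signalTypes?: string[];     // optional: restrict to these signal types
}

interface CorrelationRule {
  name: string;
  description: string;
  enabled: boolean;
  conditions: MatchConditions;
  action: CorrelationAction;
  priority: number;           // higher values are evaluated first
}

const ALLOWED_MATCH_FIELDS: readonly string[] = ["service", "signalType", "severity"];

// Checks untyped input (e.g. parsed JSON) against the conditions table.
// Returns a list of problems; an empty list means the conditions look valid.
function validateConditions(input: unknown): string[] {
  if (typeof input !== "object" || input === null) {
    return ["conditions must be an object"];
  }
  const c = input as Record<string, unknown>;
  const errors: string[] = [];

  if (!Array.isArray(c.matchFields)) {
    errors.push("matchFields is required and must be an array");
  } else {
    for (const f of c.matchFields) {
      if (typeof f !== "string" || !ALLOWED_MATCH_FIELDS.includes(f)) {
        errors.push(`unknown match field: ${String(f)}`);
      }
    }
  }

  if (typeof c.timeWindowMinutes !== "number" || !Number.isFinite(c.timeWindowMinutes)) {
    errors.push("timeWindowMinutes is required and must be a number");
  }

  // The optional filters, when present, must be arrays.
  for (const key of ["severities", "signalTypes"]) {
    if (c[key] !== undefined && !Array.isArray(c[key])) {
      errors.push(`${key} must be an array of strings when present`);
    }
  }
  return errors;
}
```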

For two incidents to match a rule, every field listed in matchFields must have the same value on both incidents, and the incidents must have occurred within timeWindowMinutes of each other.

Example

A rule with matchFields: ["service", "signalType"] and timeWindowMinutes: 10 would correlate two incidents only if they:

  1. Affect the same service.
  2. Were triggered by the same signal type.
  3. Occurred within 10 minutes of each other.
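The matching logic described above can be sketched as a single predicate. The `Incident` shape (including an `id` and a `createdAt` timestamp in epoch milliseconds) and the `incidentsCorrelate` helper are hypothetical. Two behaviors are assumptions rather than documented facts: the window boundary is treated as inclusive, and the optional `severities` and `signalTypes` filters are checked against the newer (incoming) incident only, which is consistent with the suppress example later on this page but not confirmed.

```typescript
// Hypothetical incident shape; id and createdAt are assumptions for illustration.
interface Incident {
  id: string;
  service: string;
  signalType: string;
  severity: string;
  createdAt: number; // epoch milliseconds (assumed)
}

type MatchField = "service" | "signalType" | "severity";

interface MatchConditions {
  matchFields: MatchField[];
  timeWindowMinutes: number;
  severities?: string[];
  signalTypes?: string[];
}

function incidentsCorrelate(c: MatchConditions, existing: Incident, incoming: Incident): boolean {
  // Optional filters (assumed to be checked against the incoming incident only).
  if (c.severities !== undefined && !c.severities.includes(incoming.severity)) return false;
  if (c.signalTypes !== undefined && !c.signalTypes.includes(incoming.signalType)) return false;

  // Every listed match field must be equal on both incidents.
  const fieldsEqual = c.matchFields.every((f) => existing[f] === incoming[f]);

  // The gap between the incidents must not exceed the window (inclusive bound assumed).
  const gapMs = Math.abs(incoming.createdAt - existing.createdAt);
  return fieldsEqual && gapMs <= c.timeWindowMinutes * 60 * 1000;
}
```

With `matchFields: ["service", "signalType"]` and `timeWindowMinutes: 10`, an incident on the same service and signal type 9 minutes later matches, while one on a different service, or one 11 minutes later, does not.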

Actions

When a rule matches, one of three actions is taken:

Merge

The newer incident becomes a child of the existing incident. The child's parentIncidentId is set to the parent's ID. In the dashboard, merged incidents appear as a single entry with a duplicate count badge.

Use merge when the incidents are truly the same problem — e.g. repeated error-rate spikes on the same service.
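The merge bookkeeping can be sketched in a few lines: the newer incident gets the parent's ID in `parentIncidentId`, and the dashboard's single entry and duplicate count can be derived from that link. The `Incident` shape and the `mergeInto`, `duplicateCount`, and `topLevel` helpers are hypothetical, and counting only direct children for the badge is an assumption.

```typescript
// Hypothetical minimal incident record for illustrating merge.
interface Incident {
  id: string;
  parentIncidentId?: string; // set when this incident was merged into another
}

// Merge: the newer incident becomes a child of the existing one.
// Returns a new object instead of mutating the input.
function mergeInto(existing: Incident, newer: Incident): Incident {
  return { ...newer, parentIncidentId: existing.id };
}

// Number of incidents merged directly into `parent` (assumed basis for the badge).
function duplicateCount(parent: Incident, all: Incident[]): number {
  return all.filter((i) => i.parentIncidentId === parent.id).length;
}

// Incidents that would appear as their own entry: those without a parent.
function topLevel(all: Incident[]): Incident[] {
  return all.filter((i) => i.parentIncidentId === undefined);
}
```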

Group

Both incidents remain independent but are linked as related. They appear separately in the incident list, but the link is visible in the "Related Incidents" section of the detail page.

Use group when the incidents likely share a root cause but affect different services or have different characteristics, e.g. a database outage causing failures in both the billing service and the checkout service.
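Grouping can be modeled as a two-way link between incident IDs, with neither incident owning the other. The sketch below uses a hypothetical in-memory `RelatedIndex`; how Operyn actually stores these links is not described here, and the links are not assumed to be transitive.

```typescript
// Hypothetical in-memory index of "related" links created by the group action.
class RelatedIndex {
  private links = new Map<string, Set<string>>();

  // Group: both incidents stay independent; record the link in both directions.
  link(a: string, b: string): void {
    if (a === b) return;
    this.add(a, b);
    this.add(b, a);
  }

  // IDs to show in the "Related Incidents" section for one incident, sorted for stable output.
  related(id: string): string[] {
    const set = this.links.get(id);
    return set ? Array.from(set).sort() : [];
  }

  private add(from: string, to: string): void {
    const set = this.links.get(from) ?? new Set<string>();
    set.add(to);
    this.links.set(from, set);
  }
}
```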

Suppress

The newer incident is discarded entirely. It is not persisted to the database.

Use suppress cautiously, and only for known noisy patterns where the existing incident already captures the problem. For example, you might suppress LOW-severity log anomalies while a CRITICAL incident is already open for the same service.
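Put side by side, the three actions differ mainly in what gets stored for the newer incident. The `Outcome` type and `applyAction` function below are hypothetical names; only the behaviors they encode (a parent link for merge, an independent but linked incident for group, nothing persisted for suppress) come from the descriptions above.

```typescript
type CorrelationAction = "merge" | "group" | "suppress";

// Hypothetical description of what happens to the newer incident.
type Outcome =
  | { persist: true; parentIncidentId: string } // merge: stored as a child of the existing incident
  | { persist: true; relatedTo: string }        // group: stored independently, linked as related
  | { persist: false };                         // suppress: discarded, never stored

function applyAction(action: CorrelationAction, existingId: string): Outcome {
  switch (action) {
    case "merge":
      return { persist: true, parentIncidentId: existingId };
    case "group":
      return { persist: true, relatedTo: existingId };
    case "suppress":
      return { persist: false };
  }
}
```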

Priority and Evaluation Order

Rules are evaluated in descending priority order (highest number first). The first matching rule wins — subsequent rules are not evaluated for that pair of incidents.

This means you should:

  • Give specific, narrow rules a higher priority.
  • Give broad, catch-all rules a lower priority.

For example:

Priority | Rule | Action
10 | Same service + same signal type within 5 min | Merge
5 | Same signal type within 10 min (any service) | Group
1 | Any incidents within 30 min | Group
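First-match-wins evaluation can be sketched as: drop disabled rules, sort the rest by descending priority, and return the first rule whose conditions match. The `Rule` and `Incident` shapes are simplified and hypothetical (the optional severity and signal-type filters are omitted), and equal priorities keep their original relative order because `Array.prototype.sort` is stable in modern JavaScript; Operyn's own tie-breaking is not documented here. The `exampleRules` array encodes the table above.

```typescript
interface Incident {
  service: string;
  signalType: string;
  severity: string;
  createdAt: number; // epoch milliseconds (assumed)
}

interface Rule {
  name: string;
  enabled: boolean;
  priority: number;
  action: "merge" | "group" | "suppress";
  matchFields: Array<"service" | "signalType" | "severity">;
  timeWindowMinutes: number;
}

function matches(rule: Rule, a: Incident, b: Incident): boolean {
  const withinWindow = Math.abs(a.createdAt - b.createdAt) <= rule.timeWindowMinutes * 60 * 1000;
  // An empty matchFields list matches any pair (every() on [] is true).
  return withinWindow && rule.matchFields.every((f) => a[f] === b[f]);
}

// Evaluate enabled rules from highest to lowest priority; the first match wins.
function firstMatchingRule(rules: Rule[], a: Incident, b: Incident): Rule | undefined {
  return rules
    .filter((r) => r.enabled)
    .sort((x, y) => y.priority - x.priority)
    .find((r) => matches(r, a, b));
}

// The example table, deliberately listed out of priority order.
const exampleRules: Rule[] = [
  { name: "Any incidents within 30 min", enabled: true, priority: 1, action: "group", matchFields: [], timeWindowMinutes: 30 },
  { name: "Same signal type within 10 min", enabled: true, priority: 5, action: "group", matchFields: ["signalType"], timeWindowMinutes: 10 },
  { name: "Same service + same signal type within 5 min", enabled: true, priority: 10, action: "merge", matchFields: ["service", "signalType"], timeWindowMinutes: 5 },
];
```

Because the merge rule is checked first, two `latency` incidents on the same service 3 minutes apart are merged; 8 minutes apart, the merge rule's 5-minute window no longer applies and the priority-5 group rule matches instead.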

Managing Rules in the Dashboard

The Correlation Rules page is available at Console > Correlation Rules in the sidebar.

Creating a Rule

  1. Click New Rule in the top right.
  2. Fill in the rule name and description.
  3. Select Match Fields — click to toggle service, signalType, and severity.
  4. Set the Time Window in minutes.
  5. Choose an Action from the dropdown: Merge, Group, or Suppress.
  6. Set a Priority (0–100).
  7. Click Create Rule.
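The form fields map directly onto the fields in the Rule Anatomy table. A client-side check before submission might look like the following sketch; the `RuleForm` type and `buildRule` helper are hypothetical, the 0–100 priority range comes from step 6, and both the positive-time-window check and starting new rules as enabled are assumptions.

```typescript
type MatchField = "service" | "signalType" | "severity";
type CorrelationAction = "merge" | "group" | "suppress";

// Hypothetical shape of the values collected by the New Rule form.
interface RuleForm {
  name: string;
  description: string;
  matchFields: MatchField[];
  timeWindowMinutes: number;
  action: CorrelationAction;
  priority: number;
}

// Validates the form values and builds a rule object; throws on invalid input.
function buildRule(form: RuleForm) {
  if (form.name.trim() === "") throw new Error("name is required");
  // Assumption: a zero or negative window makes no sense (also rejects NaN).
  if (!(form.timeWindowMinutes > 0)) throw new Error("time window must be positive");
  if (!(form.priority >= 0 && form.priority <= 100)) throw new Error("priority must be between 0 and 100");
  return {
    name: form.name.trim(),
    description: form.description,
    enabled: true, // assumption: new rules start enabled
    conditions: {
      matchFields: Array.from(new Set(form.matchFields)), // defensive: drop duplicate fields
      timeWindowMinutes: form.timeWindowMinutes,
    },
    action: form.action,
    priority: form.priority,
  };
}
```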

Toggling and Deleting

  • Each rule has a toggle switch to enable/disable it without deleting.
  • Hover over a rule card to reveal the delete button.
  • Disabled rules are visually dimmed but preserved for re-enabling later.

Common Patterns

Same-service error storm

Merge repeated incidents from the same service within a short window:

{
  "name": "Same-service error storm",
  "conditions": {
    "matchFields": ["service", "signalType"],
    "timeWindowMinutes": 10
  },
  "action": "merge",
  "priority": 10
}

Cross-service cascade

Group incidents across different services when they share the same signal type (likely the same root cause):

{
  "name": "Cross-service cascade",
  "conditions": {
    "matchFields": ["signalType"],
    "timeWindowMinutes": 15,
    "severities": ["HIGH", "CRITICAL"]
  },
  "action": "group",
  "priority": 5
}

Suppress low-severity noise

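Suppress LOW-severity incidents on a service that already has an incident within the last 30 minutes (for example, an open CRITICAL incident). Note that these conditions do not compare the two incidents' severities: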

{
  "name": "Suppress low-severity noise",
  "conditions": {
    "matchFields": ["service"],
    "timeWindowMinutes": 30,
    "severities": ["LOW"]
  },
  "action": "suppress",
  "priority": 1
}
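To see how the three patterns interact, the sketch below evaluates them together in the documented first-match-wins, highest-priority-first order. It assumes, without confirmation from this page, that time windows are inclusive and that the `severities` filter is checked against the incoming incident only; the `Incident` shape and the `evaluate` helper are hypothetical.

```typescript
type MatchField = "service" | "signalType" | "severity";

interface Incident {
  service: string;
  signalType: string;
  severity: string;
  createdAt: number; // epoch milliseconds (assumed)
}

interface Rule {
  name: string;
  conditions: { matchFields: MatchField[]; timeWindowMinutes: number; severities?: string[] };
  action: "merge" | "group" | "suppress";
  priority: number;
}

// The three patterns above, as typed objects.
const patterns: Rule[] = [
  { name: "Same-service error storm", action: "merge", priority: 10,
    conditions: { matchFields: ["service", "signalType"], timeWindowMinutes: 10 } },
  { name: "Cross-service cascade", action: "group", priority: 5,
    conditions: { matchFields: ["signalType"], timeWindowMinutes: 15, severities: ["HIGH", "CRITICAL"] } },
  { name: "Suppress low-severity noise", action: "suppress", priority: 1,
    conditions: { matchFields: ["service"], timeWindowMinutes: 30, severities: ["LOW"] } },
];

function ruleMatches(rule: Rule, existing: Incident, incoming: Incident): boolean {
  const { matchFields, timeWindowMinutes, severities } = rule.conditions;
  // Assumption: the severity filter applies to the incoming incident only.
  const severityOk = severities === undefined || severities.includes(incoming.severity);
  const withinWindow = Math.abs(incoming.createdAt - existing.createdAt) <= timeWindowMinutes * 60 * 1000;
  return severityOk && withinWindow && matchFields.every((f) => existing[f] === incoming[f]);
}

// Returns the action of the highest-priority matching rule, or undefined if none match.
function evaluate(rules: Rule[], existing: Incident, incoming: Incident): Rule["action"] | undefined {
  const sorted = [...rules].sort((x, y) => y.priority - x.priority);
  return sorted.find((r) => ruleMatches(r, existing, incoming))?.action;
}
```

Under these assumptions, a LOW incident with the same service and signal type as an open CRITICAL incident is merged rather than suppressed, because the merge rule has the higher priority; only a LOW incident with a different signal type on that service falls through to the suppress rule.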

API Reference

Correlation rules can also be managed programmatically via the Incident Engine API. See the Incident Engine API Reference for full endpoint documentation:

  • GET /correlation/rules — list all rules
  • GET /correlation/rules/tuning/recommendations — get outcome-based priority suggestions (last 30d)
  • POST /correlation/rules — create a rule
  • PUT /correlation/rules/:id — update a rule
  • DELETE /correlation/rules/:id — delete a rule
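A thin client for these endpoints might look like the following. It is a sketch only: the base URL, the bearer-token header, JSON request and response bodies, treating HTTP 204 as "no body", and the `CorrelationRulesClient` name are all assumptions; consult the Incident Engine API Reference for the actual request formats, response formats, and authentication. The HTTP function is injected so the sketch can run against a stub; in a browser or Node 18+ you should be able to pass the global `fetch`.

```typescript
// Minimal HTTP function type; the global fetch should satisfy it.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

class CorrelationRulesClient {
  private readonly baseUrl: string; // hypothetical Incident Engine API base URL
  private readonly token: string;   // assumed bearer-token authentication
  private readonly http: FetchLike;

  constructor(baseUrl: string, token: string, http: FetchLike) {
    this.baseUrl = baseUrl;
    this.token = token;
    this.http = http;
  }

  list() { return this.call("GET", "/correlation/rules"); }
  // Outcome-based priority suggestions (last 30 days, per the endpoint description).
  recommendations() { return this.call("GET", "/correlation/rules/tuning/recommendations"); }
  create(rule: object) { return this.call("POST", "/correlation/rules", rule); }
  update(id: string, rule: object) { return this.call("PUT", `/correlation/rules/${encodeURIComponent(id)}`, rule); }
  remove(id: string) { return this.call("DELETE", `/correlation/rules/${encodeURIComponent(id)}`); }

  private async call(method: string, path: string, body?: object): Promise<unknown> {
    const headers: Record<string, string> = { Authorization: `Bearer ${this.token}` };
    if (body !== undefined) headers["Content-Type"] = "application/json";
    const res = await this.http(this.baseUrl + path, {
      method,
      headers,
      body: body === undefined ? undefined : JSON.stringify(body),
    });
    if (!res.ok) throw new Error(`${method} ${path} failed with HTTP ${res.status}`);
    // Assumption: a 204 No Content response carries no JSON body.
    return res.status === 204 ? undefined : res.json();
  }
}
```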