Correlation Rules
Correlation rules let you control how Operyn groups, merges, or suppresses related incidents. Instead of flooding your team with dozens of independent alerts during a cascading failure, correlation rules consolidate them into a manageable set of actionable incidents.
Why Correlation Rules?
During a production incident, a single root cause often triggers alerts across multiple services and signal types. Without correlation:
- A database outage might create separate incidents for every downstream service.
- An error-rate spike and a latency spike on the same service appear as two unrelated incidents.
- The same failure pattern generates a new incident every few minutes.
Correlation rules solve this by automatically linking related incidents based on configurable conditions.
Rule Anatomy
Each correlation rule has the following fields:
| Field | Type | Description |
|---|---|---|
| name | string | Human-readable name (e.g. "Same-service error storm") |
| description | string | Explanation of what the rule does |
| enabled | boolean | Whether the rule is active |
| conditions | object | Match criteria (see below) |
| action | string | What happens when incidents match: merge, group, or suppress |
| priority | number | Evaluation order; higher-priority rules are evaluated first |
Match Conditions
The conditions object defines when two incidents should be correlated:
| Field | Type | Required | Description |
|---|---|---|---|
| matchFields | string[] | Yes | Fields to compare between incidents. Options: service, signalType, severity |
| timeWindowMinutes | number | Yes | Maximum time gap between incidents for them to be considered related |
| severities | string[] | No | Only apply this rule to incidents with these severity levels |
| signalTypes | string[] | No | Only apply this rule to these signal types |
For two incidents to match a rule, all specified matchFields must be equal and the incidents must fall within the timeWindowMinutes window.
Example
A rule with matchFields: ["service", "signalType"] and timeWindowMinutes: 10 would correlate two incidents only if they:
- Affect the same service.
- Were triggered by the same signal type.
- Occurred within 10 minutes of each other.
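The matching logic described above can be sketched in a few lines of Python. This is an illustration only, not Operyn's implementation: the incident dictionaries, the createdAt key, and the "error_rate" signal type value are hypothetical, the optional severities and signalTypes filters are left out, and the time window is assumed to be inclusive.

```python
from datetime import datetime, timedelta


def incidents_match(existing, incoming, conditions):
    """Return True if two incidents satisfy a rule's matchFields and time window."""
    # Every field listed in matchFields must be equal on both incidents.
    same_fields = all(
        existing[field] == incoming[field] for field in conditions["matchFields"]
    )
    # The gap between the incidents must not exceed timeWindowMinutes.
    # Treating the boundary as inclusive is an assumption.
    window = timedelta(minutes=conditions["timeWindowMinutes"])
    within_window = abs(incoming["createdAt"] - existing["createdAt"]) <= window
    return same_fields and within_window


conditions = {"matchFields": ["service", "signalType"], "timeWindowMinutes": 10}
start = datetime(2024, 1, 1, 12, 0)
first = {"service": "billing", "signalType": "error_rate",
         "severity": "HIGH", "createdAt": start}
second = {**first, "severity": "LOW", "createdAt": start + timedelta(minutes=9)}
late = {**first, "createdAt": start + timedelta(minutes=11)}
other = {**second, "service": "checkout"}

print(incidents_match(first, second, conditions))  # True
print(incidents_match(first, late, conditions))    # False
print(incidents_match(first, other, conditions))   # False
```

Note that severity differs between the first two incidents, yet they still match because severity is not listed in matchFields.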
Actions
When a rule matches, one of three actions is taken:
Merge
The newer incident becomes a child of the existing incident. The child's parentIncidentId is set to the parent's ID. In the dashboard, merged incidents appear as a single entry with a duplicate count badge.
Use merge when the incidents are truly the same problem — e.g. repeated error-rate spikes on the same service.
Group
Both incidents remain independent but are linked as related. They appear separately in the incident list, and each is reachable from the other via the "Related Incidents" section on the detail page.
Use group when the incidents are likely caused by the same root cause but affect different services or have different characteristics — e.g. a database outage causing failures in both the billing service and the checkout service.
Suppress
The newer incident is discarded entirely. It is not persisted to the database.
Use suppress cautiously, and only for known noisy patterns where the existing incident already captures the problem. For example, you might suppress LOW severity log anomalies when a CRITICAL incident is already open for the same service.
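To make the difference between the three actions concrete, here is a minimal Python sketch of their effects on stored incidents. Only parentIncidentId comes from the description above; the relatedIncidentIds key, the id values, and the in-memory stored list are hypothetical stand-ins for whatever Operyn uses internally.

```python
def apply_action(action, existing, incoming, stored):
    """Apply a correlation action to an incoming incident (illustrative only)."""
    if action == "merge":
        # The newer incident becomes a child of the existing one.
        incoming["parentIncidentId"] = existing["id"]
        stored.append(incoming)
    elif action == "group":
        # Both incidents stay independent but reference each other.
        # "relatedIncidentIds" is a hypothetical field name.
        existing.setdefault("relatedIncidentIds", []).append(incoming["id"])
        incoming.setdefault("relatedIncidentIds", []).append(existing["id"])
        stored.append(incoming)
    elif action == "suppress":
        # The newer incident is discarded and never persisted.
        pass
    else:
        raise ValueError(f"unknown action: {action}")
    return stored


parent = {"id": "inc-1"}
stored = [parent]
apply_action("merge", parent, {"id": "inc-2"}, stored)
apply_action("group", parent, {"id": "inc-3"}, stored)
apply_action("suppress", parent, {"id": "inc-4"}, stored)

print([incident["id"] for incident in stored])  # ['inc-1', 'inc-2', 'inc-3']
```

The suppressed incident leaves no trace in the store, which is why suppress is the action to use most sparingly.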
Priority and Evaluation Order
Rules are evaluated in descending priority order (highest number first). The first matching rule wins — subsequent rules are not evaluated for that pair of incidents.
This means you should:
- Give specific, narrow rules a higher priority.
- Give broad, catch-all rules a lower priority.
For example:
| Priority | Rule | Action |
|---|---|---|
| 10 | Same service + same signal type within 5 min | Merge |
| 5 | Same signal type within 10 min (any service) | Group |
| 1 | Any incidents within 30 min | Group |
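The first-match-wins behavior can be sketched as follows, using the three rules from the table. This is a hedged illustration: the matches callback is a hypothetical stand-in for the real condition check, skipping disabled rules and treating a missing enabled field as true are assumptions, and the page does not say how ties between equal priorities are broken.

```python
def first_matching_rule(rules, matches):
    """Return the highest-priority enabled rule for which matches(rule) is true."""
    # Assumption: disabled rules are skipped; a missing "enabled" counts as true.
    active = [rule for rule in rules if rule.get("enabled", True)]
    # Evaluate in descending priority order and stop at the first match.
    for rule in sorted(active, key=lambda rule: rule["priority"], reverse=True):
        if matches(rule):
            return rule
    return None


rules = [
    {"name": "Any incidents within 30 min", "action": "group", "priority": 1},
    {"name": "Same service + same signal type within 5 min",
     "action": "merge", "priority": 10},
    {"name": "Same signal type within 10 min (any service)",
     "action": "group", "priority": 5},
]

# Pretend the incident pair shares a signal type but not a service, so the
# priority-10 rule does not match while the priority-5 and priority-1 rules do.
matching_names = {rules[0]["name"], rules[2]["name"]}
winner = first_matching_rule(rules, lambda rule: rule["name"] in matching_names)

print(winner["priority"], winner["action"])  # 5 group
```

Although the catch-all rule also matches, it is never reached because the priority-5 rule wins first.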
Managing Rules in the Dashboard
The Correlation Rules page is available at Console > Correlation Rules in the sidebar.
Creating a Rule
- Click New Rule in the top right.
- Fill in the rule name and description.
- Select Match Fields: click to toggle service, signalType, and severity.
- Set the Time Window in minutes.
- Choose an Action from the dropdown: Merge, Group, or Suppress.
- Set a Priority (0–100).
- Click Create Rule.
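If you prepare rule definitions outside the dashboard, a small pre-flight check can catch mistakes early. The sketch below validates a rule against the constraints stated on this page (allowed match fields, allowed actions, and the 0–100 priority range shown in the form). The validate_rule helper is hypothetical, and requiring a positive time window and at least one match field are assumptions rather than documented behavior.

```python
ALLOWED_MATCH_FIELDS = {"service", "signalType", "severity"}
ALLOWED_ACTIONS = {"merge", "group", "suppress"}


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_rule(rule):
    """Return a list of human-readable problems; an empty list means no issues found."""
    errors = []
    conditions = rule.get("conditions") or {}

    match_fields = conditions.get("matchFields") or []
    if not match_fields:
        # Assumption: a required matchFields list must not be empty.
        errors.append("conditions.matchFields must list at least one field")
    unknown = sorted(set(match_fields) - ALLOWED_MATCH_FIELDS)
    if unknown:
        errors.append("unknown match fields: " + ", ".join(unknown))

    window = conditions.get("timeWindowMinutes")
    if not _is_number(window) or window <= 0:
        # Assumption: the window must be a positive number of minutes.
        errors.append("conditions.timeWindowMinutes must be a positive number")

    if rule.get("action") not in ALLOWED_ACTIONS:
        errors.append("action must be one of: group, merge, suppress")

    priority = rule.get("priority")
    if not _is_number(priority) or not 0 <= priority <= 100:
        errors.append("priority must be a number between 0 and 100")

    return errors


good = {
    "name": "Same-service error storm",
    "conditions": {"matchFields": ["service", "signalType"], "timeWindowMinutes": 10},
    "action": "merge",
    "priority": 10,
}
bad = {
    "conditions": {"matchFields": ["region"], "timeWindowMinutes": 0},
    "action": "drop",
    "priority": 150,
}

print(validate_rule(good))      # []
print(len(validate_rule(bad)))  # 4
```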
Toggling and Deleting
- Each rule has a toggle switch to enable/disable it without deleting.
- Hover over a rule card to reveal the delete button.
- Disabled rules are visually dimmed but preserved for re-enabling later.
Common Patterns
Same-service error storm
Merge repeated incidents from the same service within a short window:
{
  "name": "Same-service error storm",
  "conditions": {
    "matchFields": ["service", "signalType"],
    "timeWindowMinutes": 10
  },
  "action": "merge",
  "priority": 10
}
Cross-service cascade
Group HIGH and CRITICAL incidents across different services when they share the same signal type (likely the same root cause):
{
  "name": "Cross-service cascade",
  "conditions": {
    "matchFields": ["signalType"],
    "timeWindowMinutes": 15,
    "severities": ["HIGH", "CRITICAL"]
  },
  "action": "group",
  "priority": 5
}
Suppress low-severity noise
Suppress low-severity incidents when a higher-severity incident is already open:
{
  "name": "Suppress low-severity noise",
  "conditions": {
    "matchFields": ["service"],
    "timeWindowMinutes": 30,
    "severities": ["LOW"]
  },
  "action": "suppress",
  "priority": 1
}
API Reference
Correlation rules can also be managed programmatically via the Incident Engine API. See the Incident Engine API Reference for full endpoint documentation:
- GET /correlation/rules: list all rules
- GET /correlation/rules/tuning/recommendations: get outcome-based priority suggestions (last 30 days)
- POST /correlation/rules: create a rule
- PUT /correlation/rules/:id: update a rule
- DELETE /correlation/rules/:id: delete a rule
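As a rough sketch of calling these endpoints from Python, the helper below builds requests with the standard library's urllib. Several things here are assumptions: the base URL is a placeholder, the request body is assumed to be the JSON rule object shown under Common Patterns, the rule ID is made up, and authentication is omitted. Consult the Incident Engine API Reference for the actual request format.

```python
import json
from urllib.request import Request

# Placeholder host; replace with your Incident Engine API base URL.
BASE_URL = "https://incident-engine.example.com"


def build_rule_request(method, path, payload=None, base_url=BASE_URL):
    """Build (but do not send) an HTTP request for a correlation rules endpoint."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(f"{base_url}{path}", data=data, headers=headers, method=method)


rule = {
    "name": "Same-service error storm",
    "conditions": {"matchFields": ["service", "signalType"], "timeWindowMinutes": 10},
    "action": "merge",
    "priority": 10,
}

create = build_rule_request("POST", "/correlation/rules", rule)
# "rule-123" is a hypothetical rule ID.
delete = build_rule_request("DELETE", "/correlation/rules/rule-123")

print(create.get_method(), create.full_url)
print(delete.get_method(), delete.full_url)
```

To actually send a request, pass it to urllib.request.urlopen once the base URL and any required credentials are in place.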