Remediation API Reference

Remediation endpoints drive approval workflows, execution audit, and operator control-surface signals.

Base URL: http://localhost:3003 (local) or the URL of your deployed core-platform instance.


Auth + Permissions

All endpoints require dashboard JWT auth.

Required permission by endpoint class:

  • Read actions, stats, history, and priority signal: remediation:view
  • Read reporting settings: remediation:view
  • Update reporting settings: policy:update
  • Approve action: remediation:approve
  • Reject action: remediation:reject
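
A client can mirror the permission mapping above to hide controls the signed-in user cannot use. The TypeScript sketch below does this. The EndpointClass keys and the canCall helper are hypothetical, and it assumes the dashboard exposes the user's granted permissions as a string array. It is only a UI hint, because the API enforces permissions on every request regardless.

```typescript
// Hypothetical endpoint-class keys; the permission strings are the documented ones.
type EndpointClass =
  | "readActions"
  | "readReportingSettings"
  | "updateReportingSettings"
  | "approveAction"
  | "rejectAction";

const REQUIRED_PERMISSION: Record<EndpointClass, string> = {
  readActions: "remediation:view", // actions, stats, history, priority signal
  readReportingSettings: "remediation:view",
  updateReportingSettings: "policy:update",
  approveAction: "remediation:approve",
  rejectAction: "remediation:reject",
};

// UI hint only: true when the granted permissions (assumed to be a string array)
// include the permission the endpoint class needs. The API still enforces access.
function canCall(granted: readonly string[], endpoint: EndpointClass): boolean {
  return granted.indexOf(REQUIRED_PERMISSION[endpoint]) !== -1;
}
```

For example, an operator holding only remediation:view sees the control center data, but canCall(granted, "approveAction") returns false, so the approve button can be hidden.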

Endpoints

GET /remediation/reporting/settings

Get organization-scoped compliance settings for remediation data retention and scheduled report preferences.

Response — 200 OK

{
  "organizationId": "org_123",
  "retentionDays": 30,
  "scheduledReportEnabled": false,
  "reportCadence": "weekly",
  "reportRecipients": [],
  "lastReportAt": null,
  "createdAt": "2026-04-21T00:00:00.000Z",
  "updatedAt": "2026-04-21T00:00:00.000Z"
}

GET /remediation/reporting/runs?status=&from=&to=&limit=

List recent scheduled remediation report executions (newest first).

Query params (optional):

  • status: running | success | failed
  • from: ISO 8601 timestamp; lower bound on startedAt
  • to: ISO 8601 timestamp; upper bound on startedAt
  • limit: maximum number of rows returned
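
All of the query params above are optional. The minimal sketch below builds the query string and leaves out unset filters instead of sending empty values. The RunFilters type and buildRunsQuery helper are hypothetical names. The sketch assumes the server treats an omitted parameter the same as an unfiltered one.

```typescript
// Filters for GET /remediation/reporting/runs (and its CSV export variant).
interface RunFilters {
  status?: "running" | "success" | "failed";
  from?: string; // ISO timestamp, lower bound on startedAt
  to?: string; // ISO timestamp, upper bound on startedAt
  limit?: number;
}

// Builds "key=value&..." from the set filters only; values are URL-encoded.
function buildRunsQuery(filters: RunFilters): string {
  const parts: string[] = [];
  if (filters.status) parts.push("status=" + encodeURIComponent(filters.status));
  if (filters.from) parts.push("from=" + encodeURIComponent(filters.from));
  if (filters.to) parts.push("to=" + encodeURIComponent(filters.to));
  if (filters.limit !== undefined) parts.push("limit=" + String(filters.limit));
  return parts.join("&");
}
```

The same string can be appended to /remediation/reporting/runs/export/csv, which takes the same parameters.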

Response — 200 OK

[
  {
    "id": "94000000-0000-4000-8000-000000000001",
    "organizationId": "org_123",
    "cadence": "weekly",
    "recipients": ["sre@company.com"],
    "deliveredCount": 1,
    "failedDeliveryCount": 0,
    "deliveryFailures": [],
    "deletedCount": 2,
    "totalActions": 14,
    "failedActions": 3,
    "pendingApprovals": 1,
    "queuedExecutions": 2,
    "status": "success",
    "error": null,
    "startedAt": "2026-04-20T06:00:00.000Z",
    "completedAt": "2026-04-20T06:00:05.000Z"
  }
]

GET /remediation/reporting/runs/export/csv?status=&from=&to=&limit=

Export report-run history as CSV. Accepts the same query parameters as GET /remediation/reporting/runs.

Response — 200 OK

  • Content-Type: text/csv
  • Content-Disposition: attachment; filename="remediation-report-runs.csv"

POST /remediation/reporting/runs/:runId/retry

Retry delivery for a specific prior report run (creates a new run record).

Response — 200 OK

Returns the newly created run row.


PUT /remediation/reporting/settings

Update retention and scheduled report preferences.

Request Body

{
  "retentionDays": 90,
  "scheduledReportEnabled": true,
  "reportCadence": "weekly",
  "reportRecipients": ["sre@company.com", "compliance@company.com"]
}

Rules:

  • retentionDays: integer from 7 to 3650, inclusive
  • reportCadence: daily | weekly
  • reportRecipients: at most 20 email addresses
  • Scheduled delivery uses the server-side email adapter, configured through SMTP_* environment variables.
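
A client-side pre-check can surface these rules before the PUT is sent. The minimal sketch below mirrors only the documented constraints. The validateSettings name and the error strings are hypothetical. It does not check email syntax, because no format rule is documented. The server remains the final validator.

```typescript
// Body of PUT /remediation/reporting/settings. reportCadence is typed as string
// because the value may come from untyped form input.
interface ReportingSettingsUpdate {
  retentionDays: number;
  scheduledReportEnabled: boolean;
  reportCadence: string;
  reportRecipients: string[];
}

// Mirrors the documented rules; returns a list of problems (empty when valid).
function validateSettings(s: ReportingSettingsUpdate): string[] {
  const errors: string[] = [];
  const days = s.retentionDays;
  // Math.floor(NaN) !== NaN, so NaN is rejected as a non-integer here too.
  if (Math.floor(days) !== days || days < 7 || days > 3650) {
    errors.push("retentionDays must be an integer from 7 to 3650");
  }
  if (s.reportCadence !== "daily" && s.reportCadence !== "weekly") {
    errors.push("reportCadence must be daily or weekly");
  }
  if (s.reportRecipients.length > 20) {
    errors.push("reportRecipients accepts at most 20 email addresses");
  }
  return errors;
}
```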

POST /remediation/reporting/enforce-retention

Run retention cleanup immediately for the current organization, using the configured retentionDays.

Response — 200 OK

{
  "organizationId": "org_123",
  "retentionDays": 30,
  "cutoff": "2026-03-23T00:00:00.000Z",
  "deletedCount": 18
}
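
The sample cutoff is consistent with subtracting retentionDays whole days from the time of the call. For example, a call at 2026-04-22T00:00:00.000Z with 30 days gives 2026-03-23T00:00:00.000Z. The sketch below shows that arithmetic. That the server computes the cutoff exactly this way, without rounding to a day boundary, is an assumption. The retentionCutoff helper is hypothetical.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Assumed rule: rows older than (now - retentionDays) are eligible for deletion.
// Works in UTC milliseconds, so daylight-saving shifts do not affect the result.
function retentionCutoff(now: Date, retentionDays: number): string {
  return new Date(now.getTime() - retentionDays * MS_PER_DAY).toISOString();
}
```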

GET /remediation/actions?status=&environment=&limit=

List remediation actions from the audit trail, with optional filtering.

Query Parameters

  • status (string, optional): filter by action status (PENDING_APPROVAL, QUEUED, EXECUTING, FAILED, COMPLETED, etc.)
  • environment (string, optional): filter by environment slug
  • limit (number, default 50): maximum number of rows returned

Response — 200 OK

Array of remediation audit rows (latest first).


GET /remediation/stats?environment=

Get the aggregate remediation stats used by the control center summaries.

Query Parameters

  • environment (string, optional): environment scope

Response — 200 OK

{
  "total": 42,
  "pending": 3,
  "executing": 2,
  "completed": 31,
  "failed": 6,
  "successRate": 73.8
}
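
The sample values are consistent with successRate being completed / total as a percentage, rounded to one decimal place: 31 / 42 is about 73.8%. Failed actions are not excluded from the denominator. This formula is inferred from the sample rather than documented. The successRate helper and its zero-total behavior are assumptions, shown only to help read the field.

```typescript
// Assumed formula: completed / total * 100, rounded to one decimal place.
// Returns 0 when there are no actions (an assumption; the API may differ).
function successRate(completed: number, total: number): number {
  if (total === 0) return 0;
  return Math.round((completed / total) * 1000) / 10;
}
```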

GET /remediation/priority-signal?environment=

Get the operator-priority signal for the Remediation Control Center hero.

The signal is derived from approval pressure, failure pressure, and active execution load.

Query Parameters

  • environment (string, optional): environment scope

Response — 200 OK

{
  "level": "warning",
  "label": "Warning",
  "reason": "Failures rising and approval queue growing.",
  "pendingApprovals": 4,
  "activeExecutions": 2,
  "failedInLast24h": 5
}

POST /remediation/request

Request a remediation action for an incident.

This endpoint is what the incident investigation UI calls after an operator confirms Execute on a suggested fix. The frontend maps the selected diagnosis suggestion into the request payload, sends it here, then refetches GET /remediation/history/:incidentId and highlights the newest history row.

Request Body

{
  "incidentId": "a0000000-0000-4000-8000-000000000001",
  "actionType": "restart-service",
  "parameters": { "service": "api-gateway" },
  "serviceName": "api-gateway",
  "environment": "production",
  "severity": "HIGH"
}

Response — 201 Created

Returns the created remediation audit/action row. Its status depends on policy evaluation:

  • Manual approval required: status PENDING_APPROVAL. Show the action in pending remediation surfaces.
  • Auto-approved / allowed: status QUEUED or a later execution status. Show it in timeline/history; the worker may advance it to EXECUTING, COMPLETED, or FAILED.
  • Blocked / validation error: error response. Keep the execute dialog open and show an error toast.

Frontend Contract

  • Do not optimistically mark execution successful. Treat this call as "request created", not "fix completed".
  • On success, refetch incident remediation history and pending-action queries before trusting local state.
  • If history refetch returns rows, highlight the newest createdAt row for operator feedback.
  • If suggestion mapping fails client-side, block the request and show the error "Unable to map suggested action to executable remediation."
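
The mapping step in this contract can be sketched as a function that either returns a complete request body or throws the documented message, so the UI never sends a partial request. The RemediationRequest fields come from the request body above. The DiagnosisSuggestion shape, the incident argument, and the toRemediationRequest name are hypothetical.

```typescript
// Payload for POST /remediation/request (fields from the documented body).
interface RemediationRequest {
  incidentId: string;
  actionType: string;
  parameters: Record<string, unknown>;
  serviceName: string;
  environment: string;
  severity: string;
}

// Hypothetical shape of a diagnosis suggestion in the investigation UI.
interface DiagnosisSuggestion {
  actionType?: string;
  parameters?: Record<string, unknown>;
}

const MAPPING_ERROR = "Unable to map suggested action to executable remediation.";

// Returns the request body, or throws so the caller can block the request
// and show MAPPING_ERROR instead of sending an incomplete payload.
function toRemediationRequest(
  incident: { id: string; serviceName: string; environment: string; severity: string },
  suggestion: DiagnosisSuggestion
): RemediationRequest {
  if (!suggestion.actionType) {
    throw new Error(MAPPING_ERROR);
  }
  return {
    incidentId: incident.id,
    actionType: suggestion.actionType,
    parameters: suggestion.parameters ?? {},
    serviceName: incident.serviceName,
    environment: incident.environment,
    severity: incident.severity,
  };
}
```

After a 201 response, the caller still refetches GET /remediation/history/:incidentId rather than assuming the fix ran.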

POST /remediation/:id/approve

Approve a pending remediation action.

Request Body

{
  "reason": "Safe change window active"
}

Response — 200 OK

{
  "status": "QUEUED"
}

Other 200 OK response shapes:

{
  "status": "POLICY_VIOLATION",
  "violations": ["Environment production is blocked from automated remediation."],
  "detail": "Environment production is blocked from automated remediation. Policy match: global blockedEnvironments includes production."
}
{
  "status": "COMPLETED"
}

Semantics

  • Approval queues only actions in PENDING_APPROVAL status.
  • If the action is no longer pending, the API returns its current status instead of queueing duplicate work.
  • If a safety guardrail blocks approval and reason is omitted, the response is POLICY_VIOLATION and no queue job is created.
  • If a safety guardrail blocks approval and reason is provided, the request is treated as an operator override: the action moves to QUEUED, and the audit details include the override reason plus the policy violations.
  • 404 Not Found means the action ID is not visible in the current organization.
  • A queued approval enqueues a BullMQ remediation job; a worker later advances the status through the execution lifecycle.

Operator UX Contract

  • Incident detail opens the approve dialog first, so the operator can provide an optional reason or an override reason.
  • Remediation Control Center approves inline with the default reason "Approved from remediation control center".
  • The UI may optimistically hide the pending card/table row immediately after the click.
  • If the response is POLICY_VIOLATION, restore the hidden row and show the policy-violation error or override prompt.
  • If the request rejects or throws, restore the hidden row and show the backend message when available.
  • On success, refetch remediation history/actions and close the action sheet/dialog.
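
Because POLICY_VIOLATION arrives as a 200 OK body rather than an HTTP error, the client has to inspect status before deciding whether to keep the row hidden. The sketch below separates the two outcomes. The ApproveOutcome type and classifyApprove name are hypothetical. Network failures and non-2xx responses are handled separately by restoring the row.

```typescript
// 200 OK body of POST /remediation/:id/approve; violations and detail
// appear when status is POLICY_VIOLATION.
interface ApproveResponse {
  status: string;
  violations?: string[];
  detail?: string;
}

type ApproveOutcome =
  // Keep the row hidden, refetch history/actions, close the dialog.
  | { kind: "ok"; status: string }
  // Restore the hidden row and show the violation or an override prompt.
  | { kind: "policyViolation"; violations: string[]; detail?: string };

function classifyApprove(resp: ApproveResponse): ApproveOutcome {
  if (resp.status === "POLICY_VIOLATION") {
    return { kind: "policyViolation", violations: resp.violations ?? [], detail: resp.detail };
  }
  // QUEUED, or the current status (e.g. COMPLETED) when the action was no longer pending.
  return { kind: "ok", status: resp.status };
}
```

An override is then a second approve call that includes a reason. Per the semantics above, that call moves the action to QUEUED.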

POST /remediation/:id/reject

Reject a pending remediation action.

Request Body

{
  "reason": "Requires manual rollback instead"
}

Response — 200 OK

{
  "status": "REJECTED"
}

Semantics

  • Only PENDING_APPROVAL actions can be rejected.
  • Rejecting a non-pending action returns 400 Bad Request with message like Cannot reject action in "QUEUED" status — only PENDING_APPROVAL actions can be rejected.
  • 404 Not Found means the action ID is not visible in the current organization.
  • Rejection never enqueues execution.
  • Audit details include the actor, the optional rejection reason, and the prior policy context.

Operator UX Contract

  • Incident detail rejects inline with the reason "Rejected by operator from incident investigation view".
  • Remediation Control Center rejects inline with the reason "Rejected from remediation control center".
  • The UI may optimistically hide the pending card/table row immediately after the click.
  • On success, show a rejected toast, refetch history/actions, and close any open action sheet.
  • On error, restore the hidden row and show the backend message when available.
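
The two default reasons and the pending-only rule can be captured in one small helper. It returns the reject body, or null when the action is not pending. In that case the API would answer 400 Bad Request, so the client can skip the call. The surface keys and the buildRejectBody name are hypothetical. The reason strings are the documented defaults.

```typescript
// Default reasons sent by each operator surface (from the Operator UX Contract).
const DEFAULT_REJECT_REASON = {
  incidentDetail: "Rejected by operator from incident investigation view",
  controlCenter: "Rejected from remediation control center",
} as const;

type RejectSurface = keyof typeof DEFAULT_REJECT_REASON;

// Body for POST /remediation/:id/reject, or null when the action is not
// PENDING_APPROVAL (the API would reject such a request with 400).
function buildRejectBody(status: string, surface: RejectSurface): { reason: string } | null {
  if (status !== "PENDING_APPROVAL") return null;
  return { reason: DEFAULT_REJECT_REASON[surface] };
}
```

A local status can be stale, so the client should still handle a 400 response by restoring the row and showing the backend message.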

GET /remediation/history/:incidentId

List the remediation action history for a single incident.

Response — 200 OK

Array of audit rows scoped to the incident and the current organization.

Frontend Contract

  • Treat history as the source of truth after request/approve/reject mutations.
  • Sort rows by createdAt when choosing the newest action to highlight.
  • Timeline and pending-approval UI should derive final status from refetched history/actions, not from local optimistic state.
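
Picking the row to highlight means comparing createdAt values rather than trusting array position. The minimal sketch below does that in one pass. The AuditRow type lists only the fields it needs, and the newestRow name is hypothetical. It assumes createdAt is an ISO timestamp, as in the other samples.

```typescript
// Only the audit-row fields this helper needs.
interface AuditRow {
  id: string;
  status: string;
  createdAt: string; // ISO timestamp
}

// Returns the row with the latest createdAt, whatever order the API used,
// or undefined for an empty history.
function newestRow(rows: readonly AuditRow[]): AuditRow | undefined {
  let newest: AuditRow | undefined;
  for (const row of rows) {
    if (newest === undefined || Date.parse(row.createdAt) > Date.parse(newest.createdAt)) {
      newest = row;
    }
  }
  return newest;
}
```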

GET /remediation/export/csv?status=&environment=&limit=

Export remediation audit rows as CSV for internal compliance/reporting workflows.

Query Parameters

  • status (string, optional): status filter
  • environment (string, optional): environment scope
  • limit (number, default 500): maximum number of exported rows

Response — 200 OK

  • Content-Type: text/csv
  • Content-Disposition: attachment; filename="remediation-audit-export.csv"
  • CSV columns:
    • id
    • incidentId
    • actionType
    • status
    • performedBy
    • serviceName
    • environment
    • createdAt
    • updatedAt
    • details
    • error
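
Compliance pipelines that ingest this export may want to detect schema drift before parsing rows. The sketch below checks a header line against the documented columns. It assumes the columns appear in the listed order and that header cells contain no commas. It strips optional surrounding double quotes, since quoting of the header is not documented. The headerMatches name is hypothetical.

```typescript
// Documented export columns, in the order listed (assumed to match the file).
const EXPORT_COLUMNS = [
  "id", "incidentId", "actionType", "status", "performedBy", "serviceName",
  "environment", "createdAt", "updatedAt", "details", "error",
];

// True when the header line has exactly the documented columns in order.
// Splits on commas, which is safe only because column names contain none.
function headerMatches(headerLine: string): boolean {
  const cells = headerLine
    .trim()
    .split(",")
    .map((c) => c.trim().replace(/^"|"$/g, ""));
  return cells.length === EXPORT_COLUMNS.length && cells.every((c, i) => c === EXPORT_COLUMNS[i]);
}
```

Data rows need a real CSV parser, because fields such as details and error may contain commas, quotes, or newlines.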