What is Operyn?

Operyn is an AI-powered incident response and self-healing platform. It ingests observability signals — logs, metrics, and events — from your infrastructure, correlates them into actionable incidents, diagnoses root causes using AI, and orchestrates automated or human-approved remediations.

Why Operyn?

Modern infrastructure generates massive volumes of telemetry. SRE teams deal with:

  • Alert fatigue — hundreds of noisy alerts that bury real incidents.
  • Slow diagnosis — manually correlating logs, metrics, and traces across services.
  • Repetitive fixes — applying the same runbooks to the same failure modes.

Operyn addresses all three by closing the loop from detection to resolution:

Telemetry → Ingestion → Incident Detection → AI Diagnosis → Remediation → Resolution

Core Components

Ingestion Service

The entry point for all telemetry. Accepts logs and metrics via a REST API, normalises them into a consistent schema, indexes them in OpenSearch for search, and enqueues them for downstream processing.

  • POST /events/logs — single log event
  • POST /events/logs/batch — batch of log events
  • POST /events/metrics — single metric data point
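
As a rough illustration of calling the log endpoints above, the sketch below builds (but does not send) a request for POST /events/logs or POST /events/logs/batch. The payload fields, the base URL, the plain-array batch body, and the helper name buildLogRequest are assumptions made for illustration; the actual Operyn schema may differ.

```typescript
// Hypothetical log event shape; the real ingestion schema is not given in this document.
interface LogEvent {
  service: string;
  level: "debug" | "info" | "warn" | "error";
  message: string;
  timestamp: string; // ISO 8601
}

interface PreparedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Picks the single or batch endpoint depending on whether an array is passed.
// Assumption: the batch endpoint accepts a plain JSON array of events.
function buildLogRequest(baseUrl: string, events: LogEvent | LogEvent[]): PreparedRequest {
  const path = Array.isArray(events) ? "/events/logs/batch" : "/events/logs";
  return {
    url: baseUrl.replace(/\/+$/, "") + path, // strip trailing slashes before joining
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(events),
    },
  };
}

const example = buildLogRequest("http://localhost:3000/", {
  service: "checkout",
  level: "error",
  message: "upstream timeout",
  timestamp: new Date(0).toISOString(),
});
// To send it: await fetch(example.url, example.init);
```

Keeping request construction separate from the network call makes the payload easy to inspect or unit-test before anything is sent.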

Incident Module (core-platform)

Consumes events from the ingestion queue, applies detection rules and anomaly thresholds, and creates incidents when conditions are met. Persists incidents to PostgreSQL and streams updates via SSE.
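
To make the idea of a detection threshold concrete, here is a minimal sketch of one possible rule: open an incident when a metric stays above a limit for several consecutive data points. The types MetricPoint, ThresholdRule, and Incident and the function detect are hypothetical and are not Operyn's actual rule format.

```typescript
// Hypothetical shapes for illustration only.
interface MetricPoint { service: string; name: string; value: number; timestamp: number }
interface ThresholdRule { metric: string; threshold: number; minConsecutive: number; severity: "low" | "high" }
interface Incident { service: string; rule: string; severity: string; openedAt: number }

// Assumes `points` are ordered by timestamp. Opens one incident per service
// each time the metric exceeds the threshold for `minConsecutive` points in a row.
function detect(rule: ThresholdRule, points: MetricPoint[]): Incident[] {
  const streaks = new Map<string, number>(); // consecutive breaches per service
  const open = new Set<string>();            // services with an incident already open
  const incidents: Incident[] = [];

  for (const p of points) {
    if (p.name !== rule.metric) continue;
    const streak = p.value > rule.threshold ? (streaks.get(p.service) ?? 0) + 1 : 0;
    streaks.set(p.service, streak);
    if (streak === 0) open.delete(p.service); // metric recovered; allow a new incident later
    if (streak >= rule.minConsecutive && !open.has(p.service)) {
      open.add(p.service);
      incidents.push({ service: p.service, rule: rule.metric, severity: rule.severity, openedAt: p.timestamp });
    }
  }
  return incidents;
}
```

Requiring consecutive breaches is one simple way to avoid opening incidents on a single noisy sample.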

AI Diagnosis Module (core-platform)

Receives diagnosis requests from the incident module. Uses LangChain and LangGraph to orchestrate multi-step reasoning: gather context, hypothesise a root cause, validate against evidence, and produce a structured diagnosis with confidence scores and suggested fixes. When confidence is low, the workflow can request typed, read-only diagnostic probe runs (Kubernetes + AWS) through the safe tool runner.
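
The control flow described above (gather context, hypothesise, validate, probe when confidence is low) can be sketched in plain TypeScript. This sketch deliberately does not use the LangChain or LangGraph APIs; the Steps interface, the diagnose function, the 0.7 confidence cut-off, and the probe limit are all assumptions for illustration, and the steps are synchronous stubs where a real workflow would make asynchronous model and tool calls.

```typescript
interface Diagnosis {
  rootCause: string;
  confidence: number; // 0..1
  suggestedFixes: string[];
  evidence: string[];
}

// Hypothetical step functions; in practice each would wrap a model or tool call.
interface Steps {
  gatherContext(incidentId: string): string[];
  hypothesise(evidence: string[]): { rootCause: string; suggestedFixes: string[] };
  validate(rootCause: string, evidence: string[]): number;
  runProbe(incidentId: string): string[]; // read-only diagnostic probe output
}

function diagnose(incidentId: string, steps: Steps, minConfidence = 0.7, maxProbes = 2): Diagnosis {
  const evidence = steps.gatherContext(incidentId);
  let probes = 0;
  while (true) {
    const hypothesis = steps.hypothesise(evidence);
    const confidence = steps.validate(hypothesis.rootCause, evidence);
    // Stop when confident enough, or when the probe budget is spent.
    if (confidence >= minConfidence || probes >= maxProbes) {
      return { ...hypothesis, confidence, evidence };
    }
    evidence.push(...steps.runProbe(incidentId)); // low confidence: collect more evidence
    probes++;
  }
}

// Stubbed usage: confidence rises once probe output is added to the evidence.
const stub: Steps = {
  gatherContext: () => ["log: OOMKilled in pod checkout-7f9"],
  hypothesise: () => ({ rootCause: "memory limit too low", suggestedFixes: ["raise memory limit"] }),
  validate: (_cause, evidence) => (evidence.length >= 2 ? 0.9 : 0.4),
  runProbe: () => ["probe: container memory limit is 256Mi"],
};
const result = diagnose("inc-1", stub);
```

Bounding the number of probe rounds keeps the loop from running indefinitely when the evidence never becomes conclusive.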

Remediation Module (core-platform)

Executes approved remediation actions against your infrastructure. Supported actions include restarting services, scaling pods, rolling back deployments, clearing caches, and running custom scripts. The module operates in either simulate mode (log-only) or live mode (executes on Kubernetes). Every action is recorded in an audit log.
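
The difference between simulate and live mode, and the rule that every action is audited, can be sketched as follows. The class RemediationRunner, the Action and AuditEntry shapes, and the injected executor callback are hypothetical; a real executor would call the Kubernetes API.

```typescript
type Mode = "simulate" | "live";

// Hypothetical action and audit shapes.
interface Action { kind: "restart_service" | "scale_pods" | "rollback_deployment"; target: string }
interface AuditEntry { action: Action; mode: Mode; executed: boolean; error?: string; at: string }

class RemediationRunner {
  readonly audit: AuditEntry[] = [];
  private readonly mode: Mode;
  private readonly execute: (action: Action) => void;

  constructor(mode: Mode, execute: (action: Action) => void) {
    this.mode = mode;
    this.execute = execute;
  }

  run(action: Action): AuditEntry {
    let executed = false;
    let error: string | undefined;
    if (this.mode === "live") {
      try {
        this.execute(action); // live mode: hand off to the real executor
        executed = true;
      } catch (e) {
        error = String(e);
      }
    }
    // Simulate mode only logs; both modes always write an audit entry.
    const entry: AuditEntry = { action, mode: this.mode, executed, error, at: new Date().toISOString() };
    this.audit.push(entry);
    return entry;
  }
}

const calls: string[] = [];
const simulated = new RemediationRunner("simulate", (a) => { calls.push(a.target); });
simulated.run({ kind: "restart_service", target: "checkout" });
```

Injecting the executor keeps the mode switch and audit logic testable without a cluster.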

Change Insight (NEW)

The Change Insight engine automatically correlates incidents with recent infrastructure, deployment, and configuration changes. It helps SREs answer "What changed recently that could have caused this?" by providing a timeline of events leading up to the incident.
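
One simple way to build such a timeline is to select the changes that fall within a lookback window before the incident and order them chronologically. The sketch below assumes hypothetical ChangeEvent and IncidentRef shapes and a one-hour default window; the actual correlation logic in Operyn is not described here and may be more involved.

```typescript
// Hypothetical shapes; timestamps are epoch milliseconds.
interface ChangeEvent { kind: "deploy" | "config" | "infra"; service: string; description: string; at: number }
interface IncidentRef { service: string; startedAt: number }
interface TimelineEntry extends ChangeEvent { sameService: boolean }

// Returns changes in the window [startedAt - lookbackMs, startedAt], oldest first,
// flagging those that touched the same service as the incident.
function changeTimeline(
  incident: IncidentRef,
  changes: ChangeEvent[],
  lookbackMs: number = 60 * 60 * 1000,
): TimelineEntry[] {
  const windowStart = incident.startedAt - lookbackMs;
  return changes
    .filter((c) => c.at >= windowStart && c.at <= incident.startedAt)
    .sort((a, b) => a.at - b.at)
    .map((c) => ({ ...c, sameService: c.service === incident.service }));
}
```

The sameService flag gives a reviewer a quick hint about which changes to look at first, without hiding changes to upstream dependencies.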

Predictive Detection (NEW)

Operyn uses trend analysis and AI forecasting to identify potential incidents before they impact users. By detecting rapid deterioration in metrics (e.g., from a memory leak or a load spike), it can create early-warning incidents that allow proactive intervention.
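
As a small worked example of trend analysis, the sketch below fits a straight line to recent samples with ordinary least squares and estimates how long until the metric crosses a limit. This is a generic technique shown for illustration, not a description of Operyn's forecasting model; the Sample type and function names are hypothetical.

```typescript
interface Sample { t: number; value: number } // t in seconds

// Ordinary least-squares fit of value = intercept + slope * t.
function fitLine(samples: Sample[]): { slope: number; intercept: number } {
  const n = samples.length;
  const meanT = samples.reduce((acc, s) => acc + s.t, 0) / n;
  const meanV = samples.reduce((acc, s) => acc + s.value, 0) / n;
  let num = 0;
  let den = 0;
  for (const s of samples) {
    num += (s.t - meanT) * (s.value - meanV);
    den += (s.t - meanT) ** 2;
  }
  const slope = den === 0 ? 0 : num / den; // all samples at the same t: treat as flat
  return { slope, intercept: meanV - slope * meanT };
}

// Seconds from the last sample until the fitted line reaches `limit`.
// Returns null when there is too little data or the trend is not rising.
function secondsUntilBreach(samples: Sample[], limit: number): number | null {
  if (samples.length < 2) return null;
  const { slope, intercept } = fitLine(samples);
  if (slope <= 0) return null;
  const lastT = samples[samples.length - 1].t;
  const crossingT = (limit - intercept) / slope;
  return Math.max(0, crossingT - lastT); // 0 means the line is already at or past the limit
}

// Memory (MB) growing by 10 MB per minute, with a 200 MB limit.
const memory: Sample[] = [
  { t: 0, value: 100 },
  { t: 60, value: 110 },
  { t: 120, value: 120 },
  { t: 180, value: 130 },
];
const eta = secondsUntilBreach(memory, 200); // about 420 seconds after the last sample
```

An early-warning incident could then be raised when the estimate drops below some lead time, which is a policy choice outside the scope of this sketch.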

Automated Postmortems (NEW)

After an incident is resolved, Operyn generates an AI-powered postmortem report that summarizes the timeline, root cause, impact, and remediation steps taken, which reduces the manual documentation effort.

Notification Service

Dispatches incident alerts and status updates to your team's communication channels: Slack, Jira, Email, and generic webhooks.

Dashboard

A Next.js operations console for SREs. Provides:

  • Real-time incident feed with advanced filtering (severity, status, service, assignee) and deduplication grouping.
  • Incident detail views with AI diagnosis, remediation approval workflows, comment threads, and team assignment.
  • Provider-aware investigation probes in incident detail, with structured evidence summaries and run history.
  • Incident velocity chart with configurable time ranges (24h, 7d, All).
  • Correlation rules management for configuring how related incidents are grouped.
  • Team management with on-call tracking and incident assignment.

Architecture Diagram

Logs / Metrics
      │
      ▼
┌─────────────┐     BullMQ      ┌──────────────────────────────┐
│  Ingestion  │ ──────────────► │ Core Platform                │
│  Service    │                 │ detect/correlate/diagnose    │
└─────────────┘                 │ diagnostic probes + notify   │
                                │ remediate + audit            │
                                └──────────────────────────────┘
                                         │
                                         ▼
                                ┌──────────────────┐
                                │   Dashboard      │
                                │   (Next.js 15)   │
                                └──────────────────┘

Tech Stack

Layer       Technology
Runtime     Bun
Monorepo    Turborepo
Services    NestJS + Fastify
Frontend    Next.js 15 + Tailwind CSS v4
AI          LangChain + LangGraph (OpenAI / Anthropic / Gemini)
Database    PostgreSQL
Queue       Redis (BullMQ)
Search      OpenSearch

Next Steps