Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Incident Response Runbook (ReasonKit MCP (Pro))

This runbook provides standardized procedures for handling security and operational incidents involving ReasonKit MCP (Pro) deployments.


1. Incident Classification

SeverityDescriptionResponse Time
P0 (Critical)Core reasoning engine down; Data breach; PII leakage.< 15 Minutes
P1 (High)Significant latency degradation (>2s); Specific ThinkTools failing.< 1 Hour
P2 (Medium)Minor feature bugs; Intermittent API errors.< 4 Hours
P3 (Low)Documentation typos; Aesthetic UI issues.Next Business Day

2. Response Phases

Phase 1: Identification & Triage

  • Alert Source: PagerDuty, Datadog (Logic Drift), or User Report.
  • Action: Validate the incident. Identify the TenantID and RequestID affected.

Phase 2: Containment

  • Action: If a specific model is hallucinating, switch the reasoning profile to a fallback model (e.g., switch from deep to balanced).
  • Action: If PII leakage is detected, rotate API keys and flush the associated vector cache immediately.

Phase 3: Eradication & Recovery

  • Action: Patch the vulnerable protocol or logic step.
  • Action: Redeploy affected ReasonKit nodes.
  • Action: Verify system health via rk benchmark.

Phase 4: Post-Mortem

  • Action: Document the root cause.
  • Action: Update the ThinkTool protocol to prevent recurrence.

3. Specific Scenarios

Scenario: Reasoning Logic Drift (Confidence < 0.4)

  • Detection: Prometheus alert on reasonkit_logic_confidence.
  • Immediate Action: Check the upstream LLM status page. If the model is degraded, use rk-core to route to a secondary provider.

Scenario: Unauthorized API Access

  • Detection: Multiple security.auth_fail events in audit logs.
  • Immediate Action: Revoke the compromised API key. Whitelist known IP ranges in the Gateway Router.

Scenario: PII Leakage

  • Detection: security.pii_redacted failure in logs.
  • Immediate Action: Isolate the reasoning node. Execute rk-mem cache-clear --tenant <id>. Notify the Privacy Officer.

4. Communication Plan

  • Internal: Join the #war-room-reasonkit Slack channel.
  • External: Update status.reasonkit.sh. Send “Incident Identified” email to affected Enterprise admins.