Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Load Testing Guidelines

Reasoning chains are computationally expensive. This guide explains how to perform load testing on your ReasonKit infrastructure to ensure stability under peak demand.


1. Testing Goals

  1. Identify Latency Ceiling: At what concurrent request count does the TTFV (Time To First Verification) exceed 1 second?
  2. Validate Fair Use: Ensure that one tenant cannot starve another tenant’s reasoning throughput.
  3. Stress Model Providers: Determine the point at which upstream LLM rate limits (RK-2002) are triggered.

2. Tools & Setup

We recommend using k6 or Locust for simulating concurrent reasoning sessions.

Sample k6 Script (load_test.js)

import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  vus: 10,
  duration: '30s',
};

export default function () {
  const url = 'http://localhost:3000/v1/think';
  const payload = JSON.stringify({
    prompt: 'Synthesize the pros and cons of nuclear energy.',
    profile: 'balanced'
  });

  const params = {
    headers: {
      'Content-Type': 'application/json',
      'X-API-Key': 'test-key',
    },
  };

  const res = http.post(url, payload, params);
  check(res, { 'status was 200': (r) => r.status == 200 });
  sleep(1);
}

3. Key Metrics to Observe

Monitor your ReasonKit nodes for the following:

  • reasoning_steps_per_second: Throughput of the logic engine.
  • llm_backpressure_count: Number of requests waiting for upstream model tokens.
  • memory_usage_mb: Monitor for leaks during long-running chains.
  • reasonkit_logic_drift: Do confidence scores decrease as the system is stressed? (Indicative of provider degradation).

4. Optimization Strategies

If you hit performance bottlenecks:

  1. Horizontal Scaling: Increase the number of ReasonKit Core nodes.
  2. Request Batching: Use the batch ThinkTool pattern for high-volume, low-priority tasks.
  3. Semantic Caching: Enable the caching layer to serve identical reasoning steps from memory.
  4. Smart Queuing: Implement a priority queue at the Gateway to favor “Quick” profiles over “Deep” profiles.