Performance

Optimize ReasonKit for speed and cost efficiency.

Performance Overview

ReasonKit’s performance depends on:

  1. LLM Provider - Response times vary by provider/model
  2. Profile Depth - More tools = more time
  3. Network Latency - Distance to API servers
  4. Token Count - Longer prompts/responses = more time
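These factors can be combined into a rough back-of-envelope latency model. The sketch below is illustrative only: `tokens_per_sec` and `latency_per_call` are assumed values, not measured ReasonKit constants.

```python
def estimate_seconds(num_tools, output_tokens_per_tool,
                     tokens_per_sec=50.0, latency_per_call=0.5):
    """Back-of-envelope wall-clock estimate: each tool pays one network
    round trip plus generation time proportional to tokens produced."""
    per_tool = latency_per_call + output_tokens_per_tool / tokens_per_sec
    return num_tools * per_tool
```

For example, five tools emitting ~400 tokens each at 50 tokens/sec gives `estimate_seconds(5, 400)` = 42.5s, in the same ballpark as the balanced-profile benchmark below.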

Benchmarks

Typical execution times (Claude 3 Sonnet):

Profile    Tools   Avg Time   Tokens
Quick      2       ~15s       ~2K
Balanced   5       ~45s       ~5K
Deep       6       ~90s       ~15K
Paranoid   7       ~180s      ~40K

Optimization Strategies

1. Choose Appropriate Profile

Don’t use paranoid for everything:

# Low stakes = quick (single quotes stop the shell from expanding "$2")
rk-core think 'Should I buy this $20 item?' --quick

# High stakes = paranoid
rk-core think "Should I invest my savings?" --paranoid

2. Use Faster Models

Trade reasoning depth for speed:

# Fastest (Claude Haiku)
rk-core think "question" --model claude-3-haiku

# Balanced (Claude Sonnet)
rk-core think "question" --model claude-3-sonnet

# Best reasoning (Claude Opus)
rk-core think "question" --model claude-3-opus

Model speed comparison:

Model             Relative Speed    Relative Quality
Claude 3 Haiku    1.0x (fastest)    Good
GPT-3.5 Turbo     1.1x              Good
Claude 3 Sonnet   2.5x              Great
GPT-4 Turbo       3.0x              Great
Claude 3 Opus     5.0x              Best

3. Parallel Execution

Run tools concurrently when possible:

[execution]
parallel = true  # Run independent tools in parallel
max_concurrent = 3

Tools that can run in parallel:

  • GigaThink + LaserLogic (no dependencies)
  • ProofGuard (can run independently)

Tools that must be sequential:

  • BrutalHonesty (benefits from prior analysis)
  • Synthesis (requires all tool outputs)
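The split above can be sketched with standard thread-pool fan-out: independent tools run concurrently, then synthesis runs last over all outputs. `run_profile` is a hypothetical name for illustration, not a ReasonKit API.

```python
from concurrent.futures import ThreadPoolExecutor

def run_profile(question, independent_tools, synthesize, max_concurrent=3):
    """Run independent tools concurrently, then synthesize sequentially.

    max_concurrent mirrors the max_concurrent config setting above.
    """
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # pool.map preserves tool order, so synthesis sees outputs
        # in the same order the tools were listed.
        outputs = list(pool.map(lambda tool: tool(question), independent_tools))
    # Synthesis requires every output, so it always runs after the pool drains.
    return synthesize(outputs)
```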

4. Caching

Cache identical queries:

[cache]
enabled = true
ttl_seconds = 3600  # 1 hour
max_entries = 1000
storage = "memory"  # or "disk"

# First run: Full analysis
rk-core think "Should I take this job?" --profile balanced
# Time: 45s

# Second run (same query): Cached
rk-core think "Should I take this job?" --profile balanced
# Time: <1s
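A minimal in-memory TTL cache matching the `[cache]` settings might look like the sketch below. `QueryCache` is a hypothetical illustration of the mechanism, not ReasonKit's actual implementation.

```python
import time

class QueryCache:
    """In-memory TTL cache keyed by e.g. (query, profile)."""

    def __init__(self, ttl_seconds=3600, max_entries=1000):
        self.ttl = ttl_seconds
        self.max_entries = max_entries
        self._store = {}  # key -> (inserted_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        inserted_at, value = entry
        if time.time() - inserted_at > self.ttl:
            del self._store[key]  # expired
            return None
        return value

    def put(self, key, value):
        if key not in self._store and len(self._store) >= self.max_entries:
            # Evict the oldest entry (simple oldest-first policy).
            oldest = min(self._store, key=lambda k: self._store[k][0])
            del self._store[oldest]
        self._store[key] = (time.time(), value)
```

A second `get` with the identical key within the TTL returns the stored result immediately, which is what turns the 45s analysis above into a sub-second lookup.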

5. Streaming

Get results as they complete:

# Stream mode
rk-core think "question" --stream

Shows each tool's output as it completes, rather than waiting for all tools to finish.
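The behavior can be sketched as a generator that yields each result as soon as its future resolves. This is an illustrative pattern, not ReasonKit's internals; `stream_results` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def stream_results(question, tools, max_concurrent=3):
    """Yield (tool_name, output) pairs as each tool finishes,
    instead of blocking until every tool is done."""
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        futures = {pool.submit(tool, question): tool.__name__ for tool in tools}
        for future in as_completed(futures):
            yield futures[future], future.result()
```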

6. Local Models

For maximum privacy and no network latency:

# Use Ollama
ollama serve
rk-core think "question" --provider ollama --model llama3

# Performance varies by hardware:
# - M2 MacBook Pro: ~2-5 tokens/sec (Llama 3 8B)
# - RTX 4090: ~20-50 tokens/sec (Llama 3 8B)

Cost Optimization

Token Costs

Approximate costs per analysis (as of 2024):

Profile    Claude Sonnet   GPT-4 Turbo   Claude Opus
Quick      $0.02           $0.06         $0.10
Balanced   $0.05           $0.15         $0.25
Deep       $0.15           $0.45         $0.75
Paranoid   $0.40           $1.20         $2.00
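Per-analysis cost is just token counts times per-token prices. The sketch below assumes the listed per-1K-token prices, which are illustrative; always check your provider's current pricing page.

```python
# Assumed per-1K-token prices (illustrative, not official pricing).
PRICE_PER_1K = {
    "claude-3-haiku":  {"input": 0.00025, "output": 0.00125},
    "claude-3-sonnet": {"input": 0.003,   "output": 0.015},
    "claude-3-opus":   {"input": 0.015,   "output": 0.075},
}

def estimate_cost_usd(model, input_tokens, output_tokens):
    """Cost = input tokens * input price + output tokens * output price."""
    price = PRICE_PER_1K[model]
    return (input_tokens / 1000) * price["input"] \
         + (output_tokens / 1000) * price["output"]
```

For example, `estimate_cost_usd("claude-3-sonnet", 2000, 2000)` = $0.036, on the same order as the $0.05 balanced-profile figure above.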

Cost Reduction Strategies

  1. Use cheaper models for simple questions

    rk-core think "simple question" --model claude-3-haiku
    
  2. Limit perspectives/sources

    rk-core think "question" --perspectives 5 --sources 2
    
  3. Use summary mode

    rk-core think "question" --summary-only
    
  4. Set token limits

    [limits]
    max_input_tokens = 2000
    max_output_tokens = 2000
    

Budget Controls

[budget]
daily_limit_usd = 10.00
alert_threshold = 0.80  # Alert at 80% of limit
hard_stop = true  # Stop if limit reached
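The `[budget]` semantics can be sketched as a small guard object: warn once spending crosses the threshold, refuse to spend past the hard limit. `BudgetGuard` is a hypothetical illustration, not a ReasonKit class.

```python
class BudgetGuard:
    """Mirrors the [budget] settings: alert past a threshold,
    refuse to spend past the hard limit."""

    def __init__(self, daily_limit_usd=10.00, alert_threshold=0.80,
                 hard_stop=True):
        self.limit = daily_limit_usd
        self.alert_at = daily_limit_usd * alert_threshold
        self.hard_stop = hard_stop
        self.spent_today = 0.0

    def record(self, cost_usd):
        """Register one analysis; returns True once the alert fires."""
        if self.hard_stop and self.spent_today + cost_usd > self.limit:
            raise RuntimeError("daily budget limit reached")
        self.spent_today += cost_usd
        return self.spent_today >= self.alert_at
```

With the defaults above, the alert fires at $8.00 of a $10.00 daily limit, and any analysis that would push the total past $10.00 is rejected.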

Monitoring

Built-in Metrics

# Show execution stats
rk-core think "question" --show-stats

# Output:
# Execution time: 45.2s
# Tokens used: 4,892
# Estimated cost: $0.05
# Cache hits: 0

Logging

[logging]
level = "info"  # debug for detailed timing
file = "~/.local/share/reasonkit/logs/rk.log"

[telemetry]
enabled = true
endpoint = "http://localhost:4317"  # OpenTelemetry

Prometheus Metrics

# Start with metrics endpoint
rk-core serve --metrics-port 9090

# Metrics available:
# reasonkit_analysis_duration_seconds
# reasonkit_tokens_used_total
# reasonkit_cache_hits_total
# reasonkit_errors_total

Hardware Requirements

Minimum

  • 2 CPU cores
  • 4GB RAM
  • Network connection

Recommended

  • 4+ CPU cores
  • 8GB RAM
  • SSD storage (for caching)
  • Fast network connection

For Local Models

  • Apple Silicon (M1/M2/M3) or
  • NVIDIA GPU with 8GB+ VRAM
  • 32GB+ RAM for larger models