Performance

Optimize ReasonKit for speed and cost efficiency.

Performance Overview

ReasonKit’s performance depends on:

  1. LLM Provider - Response times vary by provider/model
  2. Profile Depth - More tools = more time
  3. Network Latency - Distance to API servers
  4. Token Count - Longer prompts/responses = more time
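These factors can be combined into a rough back-of-envelope latency model. The sketch below is illustrative only: `tokens_per_sec` and `latency_per_call` are assumed values, not measured ReasonKit constants.

```python
def estimate_seconds(num_tools, output_tokens_per_tool,
                     tokens_per_sec=50.0, latency_per_call=0.5):
    """Back-of-envelope wall-clock estimate: each tool pays one network
    round trip plus generation time proportional to tokens produced."""
    per_tool = latency_per_call + output_tokens_per_tool / tokens_per_sec
    return num_tools * per_tool
```

For example, five tools emitting ~400 tokens each at 50 tokens/sec gives `estimate_seconds(5, 400)` = 42.5s, in the same ballpark as the balanced-profile benchmark below.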

Benchmarks

Typical execution times (Claude 3 Sonnet):

Profile    Tools   Avg Time   Tokens
Quick      2       ~15s       ~2K
Balanced   5       ~45s       ~5K
Deep       6       ~90s       ~15K
Paranoid   7       ~180s      ~40K

Optimization Strategies

1. Choose Appropriate Profile

Don’t use paranoid for everything:

# Low stakes = quick (single quotes stop the shell from expanding "$2")
rk-core think 'Should I buy this $20 item?' --quick

# High stakes = paranoid
rk-core think "Should I invest my savings?" --paranoid

2. Use Faster Models

Trade reasoning depth for speed:

# Fastest (Claude Haiku)
rk-core think "question" --model claude-3-haiku

# Balanced (Claude Sonnet)
rk-core think "question" --model claude-3-sonnet

# Best reasoning (Claude Opus)
rk-core think "question" --model claude-3-opus

Model speed comparison:

Model             Relative Speed    Relative Quality
Claude 3 Haiku    1.0x (fastest)    Good
GPT-3.5 Turbo     1.1x              Good
Claude 3 Sonnet   2.5x              Great
GPT-4 Turbo       3.0x              Great
Claude 3 Opus     5.0x              Best

3. Parallel Execution

Run tools concurrently when possible:

[execution]
parallel = true  # Run independent tools in parallel
max_concurrent = 3

Tools that can run in parallel:

  • GigaThink + LaserLogic (no dependencies)
  • ProofGuard (can run independently)

Tools that must be sequential:

  • BrutalHonesty (benefits from prior analysis)
  • Synthesis (requires all tool outputs)
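The split above can be sketched with standard thread-pool fan-out: independent tools run concurrently, then synthesis runs last over all outputs. `run_profile` is a hypothetical name for illustration, not a ReasonKit API.

```python
from concurrent.futures import ThreadPoolExecutor

def run_profile(question, independent_tools, synthesize, max_concurrent=3):
    """Run independent tools concurrently, then synthesize sequentially.

    max_concurrent mirrors the max_concurrent config setting above.
    """
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # pool.map preserves tool order, so synthesis sees outputs
        # in the same order the tools were listed.
        outputs = list(pool.map(lambda tool: tool(question), independent_tools))
    # Synthesis requires every output, so it always runs after the pool drains.
    return synthesize(outputs)
```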

4. Caching

Cache identical queries:

[cache]
enabled = true
ttl_seconds = 3600  # 1 hour
max_entries = 1000
storage = "memory"  # or "disk"

# First run: Full analysis
rk-core think "Should I take this job?" --profile balanced
# Time: 45s

# Second run (same query): Cached
rk-core think "Should I take this job?" --profile balanced
# Time: <1s
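A minimal in-memory TTL cache matching the `[cache]` settings might look like the sketch below. `QueryCache` is a hypothetical illustration of the mechanism, not ReasonKit's actual implementation.

```python
import time

class QueryCache:
    """In-memory TTL cache keyed by e.g. (query, profile)."""

    def __init__(self, ttl_seconds=3600, max_entries=1000):
        self.ttl = ttl_seconds
        self.max_entries = max_entries
        self._store = {}  # key -> (inserted_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        inserted_at, value = entry
        if time.time() - inserted_at > self.ttl:
            del self._store[key]  # expired
            return None
        return value

    def put(self, key, value):
        if key not in self._store and len(self._store) >= self.max_entries:
            # Evict the oldest entry (simple oldest-first policy).
            oldest = min(self._store, key=lambda k: self._store[k][0])
            del self._store[oldest]
        self._store[key] = (time.time(), value)
```

A second `get` with the identical key within the TTL returns the stored result immediately, which is what turns the 45s analysis above into a sub-second lookup.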

5. Streaming

Get results as they complete:

# Stream mode
rk-core think "question" --stream

Shows each tool's output as it completes, rather than waiting for all tools to finish.
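The behavior can be sketched as a generator that yields each result as soon as its future resolves. This is an illustrative pattern, not ReasonKit's internals; `stream_results` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def stream_results(question, tools, max_concurrent=3):
    """Yield (tool_name, output) pairs as each tool finishes,
    instead of blocking until every tool is done."""
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        futures = {pool.submit(tool, question): tool.__name__ for tool in tools}
        for future in as_completed(futures):
            yield futures[future], future.result()
```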

6. Local Models

For maximum privacy and no network latency:

# Use Ollama
ollama serve
rk-core think "question" --provider ollama --model llama3

# Performance varies by hardware:
# - M2 MacBook Pro: ~2-5 tokens/sec (Llama 3 8B)
# - RTX 4090: ~20-50 tokens/sec (Llama 3 8B)

Cost Optimization

Token Costs

Approximate costs per analysis (as of 2024):

Profile    Claude Sonnet   GPT-4 Turbo   Claude Opus
Quick      $0.02           $0.06         $0.10
Balanced   $0.05           $0.15         $0.25
Deep       $0.15           $0.45         $0.75
Paranoid   $0.40           $1.20         $2.00
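Per-analysis cost is just token counts times per-token prices. The sketch below assumes the listed per-1K-token prices, which are illustrative; always check your provider's current pricing page.

```python
# Assumed per-1K-token prices (illustrative, not official pricing).
PRICE_PER_1K = {
    "claude-3-haiku":  {"input": 0.00025, "output": 0.00125},
    "claude-3-sonnet": {"input": 0.003,   "output": 0.015},
    "claude-3-opus":   {"input": 0.015,   "output": 0.075},
}

def estimate_cost_usd(model, input_tokens, output_tokens):
    """Cost = input tokens * input price + output tokens * output price."""
    price = PRICE_PER_1K[model]
    return (input_tokens / 1000) * price["input"] \
         + (output_tokens / 1000) * price["output"]
```

For example, `estimate_cost_usd("claude-3-sonnet", 2000, 2000)` = $0.036, on the same order as the $0.05 balanced-profile figure above.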

Cost Reduction Strategies

  1. Use cheaper models for simple questions

    rk-core think "simple question" --model claude-3-haiku
    
  2. Limit perspectives/sources

    rk-core think "question" --perspectives 5 --sources 2
    
  3. Use summary mode

    rk-core think "question" --summary-only
    
  4. Set token limits

    [limits]
    max_input_tokens = 2000
    max_output_tokens = 2000
    

Budget Controls

[budget]
daily_limit_usd = 10.00
alert_threshold = 0.80  # Alert at 80% of limit
hard_stop = true  # Stop if limit reached
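The `[budget]` semantics can be sketched as a small guard object: warn once spending crosses the threshold, refuse to spend past the hard limit. `BudgetGuard` is a hypothetical illustration, not a ReasonKit class.

```python
class BudgetGuard:
    """Mirrors the [budget] settings: alert past a threshold,
    refuse to spend past the hard limit."""

    def __init__(self, daily_limit_usd=10.00, alert_threshold=0.80,
                 hard_stop=True):
        self.limit = daily_limit_usd
        self.alert_at = daily_limit_usd * alert_threshold
        self.hard_stop = hard_stop
        self.spent_today = 0.0

    def record(self, cost_usd):
        """Register one analysis; returns True once the alert fires."""
        if self.hard_stop and self.spent_today + cost_usd > self.limit:
            raise RuntimeError("daily budget limit reached")
        self.spent_today += cost_usd
        return self.spent_today >= self.alert_at
```

With the defaults above, the alert fires at $8.00 of a $10.00 daily limit, and any analysis that would push the total past $10.00 is rejected.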

Monitoring

Built-in Metrics

# Show execution stats
rk-core think "question" --show-stats

# Output:
# Execution time: 45.2s
# Tokens used: 4,892
# Estimated cost: $0.05
# Cache hits: 0

Logging

[logging]
level = "info"  # debug for detailed timing
file = "~/.local/share/reasonkit/logs/rk.log"

[telemetry]
enabled = true
endpoint = "http://localhost:4317"  # OpenTelemetry

Prometheus Metrics

# Start with metrics endpoint
rk-core serve --metrics-port 9090

# Metrics available:
# reasonkit_analysis_duration_seconds
# reasonkit_tokens_used_total
# reasonkit_cache_hits_total
# reasonkit_errors_total

Hardware Requirements

Minimum

  • 2 CPU cores
  • 4GB RAM
  • Network connection

Recommended

  • 4+ CPU cores
  • 8GB RAM
  • SSD storage (for caching)
  • Fast network connection

For Local Models

  • Apple Silicon (M1/M2/M3) or
  • NVIDIA GPU with 8GB+ VRAM
  • 32GB+ RAM for larger models