Performance
Optimize ReasonKit for speed and cost efficiency.
Performance Overview
ReasonKit’s performance depends on:
- LLM Provider - Response times vary by provider/model
- Profile Depth - More tools = more time
- Network Latency - Distance to API servers
- Token Count - Longer prompts/responses = more time
Benchmarks
Typical execution times (Claude 3 Sonnet):
| Profile | Tools | Avg Time | Tokens |
|---|---|---|---|
| Quick | 2 | ~15s | ~2K |
| Balanced | 5 | ~45s | ~5K |
| Deep | 6 | ~90s | ~15K |
| Paranoid | 7 | ~180s | ~40K |
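The table can double as a lookup when a caller has a fixed time budget. The following is a minimal sketch, not part of ReasonKit: the function name `deepest_profile_within` is hypothetical, and the timings are the Claude 3 Sonnet averages listed above.

```python
from typing import Optional

# Average times copied from the benchmark table above (Claude 3 Sonnet).
PROFILE_SECONDS = {"quick": 15, "balanced": 45, "deep": 90, "paranoid": 180}


def deepest_profile_within(budget_seconds: float) -> Optional[str]:
    """Return the deepest profile whose average time fits the budget."""
    fitting = [name for name, secs in PROFILE_SECONDS.items() if secs <= budget_seconds]
    # The dict is ordered from shallowest to deepest, so the last match is the deepest.
    return fitting[-1] if fitting else None


print(deepest_profile_within(60))  # balanced
print(deepest_profile_within(10))  # None
```

These are averages, so a real run can land on either side of the budget.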
Optimization Strategies
1. Choose Appropriate Profile
Match the profile to the stakes; don’t use paranoid for everything:
# Low stakes = quick
rk-core think "Should I buy this $20 item?" --quick
# High stakes = paranoid
rk-core think "Should I invest my savings?" --paranoid
2. Use Faster Models
Trade reasoning depth for speed:
# Fastest (Claude Haiku)
rk-core think "question" --model claude-3-haiku
# Balanced (Claude Sonnet)
rk-core think "question" --model claude-3-sonnet
# Best reasoning (Claude Opus)
rk-core think "question" --model claude-3-opus
Model speed comparison:
| Model | Relative Response Time (lower is faster) | Relative Quality |
|---|---|---|
| Claude 3 Haiku | 1.0x (fastest) | Good |
| GPT-3.5 Turbo | 1.1x | Good |
| Claude 3 Sonnet | 2.5x | Great |
| GPT-4 Turbo | 3.0x | Great |
| Claude 3 Opus | 5.0x | Best |
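Reading the multipliers in the table as relative response time (1.0x being the fastest), a Sonnet timing can be scaled to a rough estimate for another model. This is a sketch under that assumption; `estimate_seconds` is a hypothetical helper, and real timings will depend on the provider and the prompt.

```python
# Multipliers copied from the comparison table above (1.0 = fastest model).
RELATIVE_TIME = {
    "Claude 3 Haiku": 1.0,
    "GPT-3.5 Turbo": 1.1,
    "Claude 3 Sonnet": 2.5,
    "GPT-4 Turbo": 3.0,
    "Claude 3 Opus": 5.0,
}


def estimate_seconds(sonnet_seconds: float, model: str) -> float:
    """Scale a Claude 3 Sonnet timing to another model (rough estimate only)."""
    return sonnet_seconds * RELATIVE_TIME[model] / RELATIVE_TIME["Claude 3 Sonnet"]


# 45 s is the Balanced profile average measured on Claude 3 Sonnet.
for model in RELATIVE_TIME:
    print(f"{model}: ~{estimate_seconds(45, model):.0f}s")
```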
3. Parallel Execution
Run tools concurrently when possible:
[execution]
parallel = true # Run independent tools in parallel
max_concurrent = 3
Tools that can run in parallel:
- GigaThink + LaserLogic (no dependencies)
- ProofGuard (can run independently)
Tools that must be sequential:
- BrutalHonesty (benefits from prior analysis)
- Synthesis (requires all tool outputs)
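The split between parallel and sequential tools can be pictured with a small scheduling sketch. This is not ReasonKit's implementation: `run_tool` is a hypothetical stand-in for a real tool call, and the semaphore plays the role of `max_concurrent`.

```python
import asyncio

MAX_CONCURRENT = 3  # mirrors max_concurrent in the [execution] config


async def run_tool(name, limiter, log):
    """Hypothetical stand-in for one tool call; a real call would hit the LLM."""
    async with limiter:
        await asyncio.sleep(0.01)
        log.append(name)
        return f"{name} output"


async def analyze():
    limiter = asyncio.Semaphore(MAX_CONCURRENT)
    log = []
    # Independent tools start together, capped by the semaphore.
    independent = ["GigaThink", "LaserLogic", "ProofGuard"]
    results = list(await asyncio.gather(*(run_tool(t, limiter, log) for t in independent)))
    # Dependent tools run one at a time, after the earlier output exists.
    for tool in ["BrutalHonesty", "Synthesis"]:
        results.append(await run_tool(tool, limiter, log))
    return results, log


results, log = asyncio.run(analyze())
print(log)
```

The first three names may be logged in any order; BrutalHonesty and Synthesis always come last, in that order.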
4. Caching
Cache identical queries:
[cache]
enabled = true
ttl_seconds = 3600 # 1 hour
max_entries = 1000
storage = "memory" # or "disk"
# First run: Full analysis
rk-core think "Should I take this job?" --profile balanced
# Time: 45s
# Second run (same query): Cached
rk-core think "Should I take this job?" --profile balanced
# Time: <1s
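The `ttl_seconds` and `max_entries` settings suggest a time-limited, size-capped cache. The sketch below shows one way such a cache could behave. It assumes entries are keyed by query and profile and that the oldest entry is evicted first; neither detail is confirmed by the configuration above, and `QueryCache` is a hypothetical name.

```python
import time
from collections import OrderedDict


class QueryCache:
    """In-memory cache keyed by (query, profile), with expiry and a size cap."""

    def __init__(self, ttl_seconds=3600, max_entries=1000, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.max_entries = max_entries
        self.clock = clock
        self._items = OrderedDict()  # key -> (stored_at, value)

    def get(self, query, profile):
        key = (query, profile)
        entry = self._items.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at >= self.ttl:
            del self._items[key]  # expired
            return None
        return value

    def put(self, query, profile, value):
        key = (query, profile)
        self._items[key] = (self.clock(), value)
        self._items.move_to_end(key)
        while len(self._items) > self.max_entries:
            self._items.popitem(last=False)  # evict the oldest entry


now = [0.0]  # fake clock so the expiry can be shown without waiting an hour
cache = QueryCache(ttl_seconds=3600, clock=lambda: now[0])
cache.put("Should I take this job?", "balanced", "analysis text")
print(cache.get("Should I take this job?", "balanced"))  # hit
print(cache.get("Should I take this job?", "deep"))      # miss: different profile
now[0] = 3600.0
print(cache.get("Should I take this job?", "balanced"))  # miss: entry expired
```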
5. Streaming
Get results as they complete:
# Stream mode
rk-core think "question" --stream
Stream mode shows each tool’s output as soon as that tool completes, rather than waiting for all tools to finish.
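The idea of reporting results in completion order can be illustrated with `asyncio.as_completed`. This is only a sketch of the pattern: `run_tool` is hypothetical and the delays are invented to stand in for LLM latency.

```python
import asyncio


async def run_tool(name, seconds):
    """Hypothetical tool call; the sleep stands in for LLM latency."""
    await asyncio.sleep(seconds)
    return name


async def stream_results(delays):
    """Collect tool names in completion order, printing each as it arrives."""
    finished = []
    tasks = [asyncio.create_task(run_tool(n, s)) for n, s in delays.items()]
    for next_done in asyncio.as_completed(tasks):
        name = await next_done
        print(f"{name} finished")
        finished.append(name)
    return finished


order = asyncio.run(
    stream_results({"LaserLogic": 0.15, "GigaThink": 0.05, "ProofGuard": 0.10})
)
```

The tool with the shortest delay is reported first, regardless of the order in which the tools were started.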
6. Local Models
For maximum privacy and no network latency:
# Use Ollama
ollama serve
rk-core think "question" --provider ollama --model llama3
# Performance varies by hardware:
# - M2 MacBook Pro: ~2-5 tokens/sec (Llama 3 8B)
# - RTX 4090: ~20-50 tokens/sec (Llama 3 8B)
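The throughput figures translate into wall-clock time with simple division. The sketch below assumes, as an upper bound, that every token in the budget is generated locally, and reuses the ~2K token figure from the Quick profile; `generation_seconds` is a hypothetical helper.

```python
def generation_seconds(tokens, tokens_per_sec_range):
    """Return (best_case, worst_case) seconds to generate `tokens` tokens."""
    slow, fast = tokens_per_sec_range
    return tokens / fast, tokens / slow


# Tokens/sec ranges for Llama 3 8B, copied from the figures above.
HARDWARE = {"M2 MacBook Pro": (2, 5), "RTX 4090": (20, 50)}

for name, rate in HARDWARE.items():
    best, worst = generation_seconds(2000, rate)  # ~2K tokens, the Quick profile
    print(f"{name}: {best:.0f}-{worst:.0f}s")
```

At those rates a Quick-sized analysis takes roughly 40-100 seconds on the RTX 4090 and 400-1000 seconds on the M2, so removing network latency does not by itself make local models faster.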
Cost Optimization
Token Costs
Approximate costs per analysis (as of 2024):
| Profile | Claude Sonnet | GPT-4 Turbo | Claude Opus |
|---|---|---|---|
| Quick | $0.02 | $0.06 | $0.10 |
| Balanced | $0.05 | $0.15 | $0.25 |
| Deep | $0.15 | $0.45 | $0.75 |
| Paranoid | $0.40 | $1.20 | $2.00 |
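To project spend for a planned mix of analyses, the per-analysis prices can be multiplied out. A minimal sketch, using the approximate 2024 figures from the table; `estimate_cost` and the sample workload are illustrative only.

```python
from decimal import Decimal

# Per-analysis costs in USD, copied from the table above.
COST = {
    "Claude Sonnet": {"quick": "0.02", "balanced": "0.05", "deep": "0.15", "paranoid": "0.40"},
    "GPT-4 Turbo": {"quick": "0.06", "balanced": "0.15", "deep": "0.45", "paranoid": "1.20"},
    "Claude Opus": {"quick": "0.10", "balanced": "0.25", "deep": "0.75", "paranoid": "2.00"},
}


def estimate_cost(model, runs):
    """Total cost for `runs`, a mapping of profile name to number of analyses."""
    return sum((Decimal(COST[model][p]) * n for p, n in runs.items()), Decimal("0"))


runs = {"quick": 20, "balanced": 5, "paranoid": 1}  # hypothetical daily workload
for model in COST:
    print(f"{model}: ${estimate_cost(model, runs)}")
```

`Decimal` keeps the cents exact; for this workload the Sonnet total is $1.05.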
Cost Reduction Strategies
- Use cheaper models for simple questions
  rk-core think "simple question" --model claude-3-haiku
- Limit perspectives/sources
  rk-core think "question" --perspectives 5 --sources 2
- Use summary mode
  rk-core think "question" --summary-only
- Set token limits
  [limits]
  max_input_tokens = 2000
  max_output_tokens = 2000
Budget Controls
[budget]
daily_limit_usd = 10.00
alert_threshold = 0.80 # Alert at 80% of limit
hard_stop = true # Stop if limit reached
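The three settings interact as a running total with two trigger points. The sketch below is one plausible reading, not ReasonKit's actual logic: it assumes the check happens after each analysis is billed, and `DailyBudget` is a hypothetical class.

```python
class DailyBudget:
    """Track spend against a daily limit with an alert level and optional stop."""

    def __init__(self, daily_limit_usd=10.00, alert_threshold=0.80, hard_stop=True):
        self.limit = daily_limit_usd
        self.alert_threshold = alert_threshold
        self.hard_stop = hard_stop
        self.spent = 0.0

    def record(self, cost_usd):
        """Add a cost and return "ok", "alert", or "stop"."""
        self.spent += cost_usd
        if self.spent >= self.limit:
            # Without hard_stop the limit only produces a warning.
            return "stop" if self.hard_stop else "alert"
        if self.spent >= self.limit * self.alert_threshold:
            return "alert"
        return "ok"


budget = DailyBudget()
print(budget.record(5.0))  # ok: $5 of $10 spent
print(budget.record(3.0))  # alert: $8 reaches the 80% threshold
print(budget.record(2.0))  # stop: the $10 limit is reached
```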
Monitoring
Built-in Metrics
# Show execution stats
rk-core think "question" --show-stats
# Output:
# Execution time: 45.2s
# Tokens used: 4,892
# Estimated cost: $0.05
# Cache hits: 0
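For scripted comparisons between runs, the stats can be turned into numbers. This sketch assumes the output has exactly the `Label: value` shape shown above, with or without the leading `#`; `parse_stats` is a hypothetical helper.

```python
import re

SAMPLE = """\
Execution time: 45.2s
Tokens used: 4,892
Estimated cost: $0.05
Cache hits: 0
"""


def parse_stats(text):
    """Turn 'Label: value' lines into a dict of floats keyed by snake_case label."""
    stats = {}
    for line in text.splitlines():
        label, sep, value = line.lstrip("# ").partition(":")
        if not sep:
            continue  # not a 'Label: value' line
        number = re.sub(r"[^0-9.]", "", value)  # drop "s", "$", and thousands commas
        if number:
            stats[label.strip().lower().replace(" ", "_")] = float(number)
    return stats


stats = parse_stats(SAMPLE)
print(stats)
print(f"Cost per 1K tokens: ${stats['estimated_cost'] / stats['tokens_used'] * 1000:.4f}")
```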
Logging
[logging]
level = "info" # set to "debug" for detailed timing
file = "~/.local/share/reasonkit/logs/rk.log"
[telemetry]
enabled = true
endpoint = "http://localhost:4317" # OpenTelemetry collector (4317 is the default OTLP gRPC port)
Prometheus Metrics
# Start with metrics endpoint
rk-core serve --metrics-port 9090
# Metrics available:
# reasonkit_analysis_duration_seconds
# reasonkit_tokens_used_total
# reasonkit_cache_hits_total
# reasonkit_errors_total
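Scraped metrics can be reduced to a few headline numbers. The sketch below parses Prometheus text-format samples and derives averages. It assumes the duration metric is exposed as a histogram or summary (so `_sum` and `_count` series exist) and that samples carry no labels or timestamps; the sample values are invented for illustration.

```python
SAMPLE = """\
# TYPE reasonkit_tokens_used_total counter
reasonkit_analysis_duration_seconds_sum 452.0
reasonkit_analysis_duration_seconds_count 10
reasonkit_tokens_used_total 48920
reasonkit_cache_hits_total 4
reasonkit_errors_total 1
"""


def parse_samples(text):
    """Parse unlabelled 'name value' lines, skipping comments and blank lines."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, value = line.split()[:2]
        samples[name] = float(value)
    return samples


m = parse_samples(SAMPLE)
runs = m["reasonkit_analysis_duration_seconds_count"]
avg_seconds = m["reasonkit_analysis_duration_seconds_sum"] / runs
avg_tokens = m["reasonkit_tokens_used_total"] / runs
print(f"avg duration: {avg_seconds:.1f}s, avg tokens: {avg_tokens:.0f}")
```

In practice the same ratios are usually computed in PromQL over a time window rather than from a single scrape.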
Hardware Requirements
Minimum
- 2 CPU cores
- 4GB RAM
- Network connection
Recommended
- 4+ CPU cores
- 8GB RAM
- SSD storage (for caching)
- Fast network connection
For Local Models
- Apple Silicon (M1/M2/M3) or an NVIDIA GPU with 8GB+ VRAM
- 32GB+ RAM for larger models