
Performance Benchmarks

Comprehensive performance analysis of Osaurus compared to other local LLM solutions on Apple Silicon.

Methodology

All benchmarks were conducted under controlled conditions to ensure fair comparison:

  • Hardware: Apple M2 Pro, 32GB RAM
  • macOS Version: 15.5
  • Model: Llama 3.2 3B Instruct (4-bit quantization)
  • Prompt: "Explain quantum computing in simple terms"
  • Settings: Default configuration for each server
  • Measurements: average of 20 runs, with the first (warm-up) run excluded
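
The measurement loop above can be sketched in Python. This is a minimal sketch, not the actual harness used for these numbers: it assumes the server exposes an OpenAI-compatible streaming chat-completions endpoint, and the `BASE_URL` port and model identifier are hypothetical placeholders.

```python
"""Sketch of the benchmark loop: TTFT, end-to-end latency, throughput."""
import json
import time
import urllib.request

BASE_URL = "http://localhost:8080/v1/chat/completions"  # hypothetical port
PROMPT = "Explain quantum computing in simple terms"

def run_once(url=BASE_URL, model="llama-3.2-3b-instruct-4bit"):
    """One streamed request; returns (ttft_seconds, total_seconds, chars)."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": PROMPT}],
        "stream": True,
    }).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})
    start = time.perf_counter()
    ttft, chars = None, 0
    with urllib.request.urlopen(req) as resp:
        for raw in resp:  # one server-sent-event line per iteration
            line = raw.decode().strip()
            if not line.startswith("data: ") or line == "data: [DONE]":
                continue
            delta = json.loads(line[6:])["choices"][0]["delta"]
            text = delta.get("content") or ""
            if text and ttft is None:
                ttft = time.perf_counter() - start  # first token arrived
            chars += len(text)
    return ttft, time.perf_counter() - start, chars

def mean_excluding_first(samples):
    """Average over all runs after the warm-up run, per Methodology."""
    rest = samples[1:]
    return sum(rest) / len(rest)

def throughput(chars, seconds):
    """Characters generated per second during streaming."""
    return chars / seconds
```

In practice you would call `run_once` 21 times per server and feed the per-run results through `mean_excluding_first`.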

Key Metrics

Time to First Token (TTFT)

The elapsed time from sending a request to receiving the first generated token; this drives perceived responsiveness.

Server      TTFT (ms)   Relative
Ollama      33          1.0×
Osaurus     87          2.6×
LM Studio   113         3.4×

Throughput

Characters generated per second during streaming.

Server      Throughput (chars/s)   Relative
LM Studio   588                    1.06×
Osaurus     554                    1.0×
Ollama      430                    0.78×

End-to-End Latency

Total time from request to completion.

Server      Total Time (s)   Relative
LM Studio   1.22             1.0×
Osaurus     1.24             1.02×
Ollama      1.62             1.33×

Optimization Tips

Based on benchmark results:

  1. For Speed

    • Use 4-bit quantization
    • Choose smaller models (2-3B)
    • Limit context to 2048 tokens
    • Disable logging in production
  2. For Quality

    • Use 8-bit quantization when possible
    • Select 7B+ models
    • Allow full context windows
    • Enable temperature sampling
  3. For Efficiency

    • Keep 1-2 models loaded
    • Use model aliases for quick switching
    • Monitor memory pressure
    • Restart periodically for long-running servers
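
The speed- and quality-oriented tips can be expressed as request payloads. This is an illustrative sketch only: parameter names follow the OpenAI chat-completions schema, which is assumed (not confirmed by this page) to be what the server accepts, and the model identifiers are hypothetical. Note that quantization is a property of the downloaded model, not a per-request setting.

```python
# Hypothetical payloads contrasting the speed- vs. quality-oriented tips above.
SPEED = {
    "model": "llama-3.2-3b-instruct-4bit",  # small model, 4-bit quantization
    "max_tokens": 256,                      # keep generations short
    "temperature": 0.0,                     # greedy decoding
    "stream": True,                         # start emitting tokens immediately
}

QUALITY = {
    "model": "llama-3.1-8b-instruct-8bit",  # 7B+ model, 8-bit quantization
    "max_tokens": 2048,                     # allow a fuller context window
    "temperature": 0.7,                     # enable temperature sampling
    "stream": True,
}
```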

Conclusions

Osaurus demonstrates:

  • Competitive Performance — Matches or exceeds alternatives in key metrics
  • Efficient Memory Usage — Lower RAM footprint than competitors
  • Consistent Latency — Predictable performance under load
  • Native Optimization — Leverages Apple Silicon effectively

The benchmarks show Osaurus is particularly well-suited for:

  • Production deployments requiring consistent performance
  • Memory-constrained environments
  • High-throughput applications
  • Native macOS integrations

Want to contribute benchmarks? Join the Discord community.