Performance Benchmarks

Name: Osaurus
Author: Osaurus

Comprehensive performance analysis of Osaurus compared to other local LLM solutions on Apple Silicon.

Methodology

All benchmarks were conducted under controlled conditions to ensure a fair comparison.

Time to first token (TTFT): the time from sending a request to the first generated token, which is critical for perceived responsiveness.

Throughput: characters generated per second during streaming.

Total time: the time from sending a request to completion.
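
To make the three metrics above concrete, here is a minimal sketch of how they could be measured for any streamed response. It is not the harness used for these benchmarks. The names `StreamMetrics` and `measure_stream` are hypothetical, and it assumes that "during streaming" means the window from the first token to completion.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class StreamMetrics:
    ttft_s: Optional[float]  # time to first token in seconds (None if nothing arrived)
    chars_per_s: float       # characters per second between first token and completion
    total_s: float           # start of iteration to completion, in seconds


def measure_stream(chunks: Iterable[str],
                   clock: Callable[[], float] = time.perf_counter) -> StreamMetrics:
    """Consume a stream of text chunks and time it (hypothetical helper)."""
    start = clock()
    first = None
    n_chars = 0
    for chunk in chunks:
        if not chunk:
            continue  # ignore empty keep-alive chunks
        if first is None:
            first = clock()  # first non-empty chunk marks time to first token
        n_chars += len(chunk)
    end = clock()
    total = end - start
    if first is None:
        return StreamMetrics(None, 0.0, total)
    # Assumption: the streaming window runs from the first token to completion.
    window = end - first
    rate = n_chars / window if window > 0 else 0.0
    return StreamMetrics(first - start, rate, total)
```

Pass a lazy iterator (for example a generator that sends the request on its first iteration) so that the start timestamp is taken before the request goes out. The `clock` parameter exists only so the timing can be replaced with a deterministic clock in tests.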

Based on benchmark results:

For Speed
- Use 4-bit quantization
- Choose smaller models (2-3B)
- Limit context to 2048 tokens
- Disable logging in production

For Quality
- Use 8-bit quantization when possible
- Select 7B+ models
- Allow full context windows
- Enable temperature sampling

For Efficiency
- Keep 1-2 models loaded
- Use model aliases for quick switching
- Monitor memory pressure
- Restart periodically for long-running servers
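
The quantization and model-size recommendations above trade quality against memory. As a rough illustration, weight memory scales with parameter count times bits per weight. The helper below is a hypothetical back-of-the-envelope estimate and is not part of Osaurus. It covers weights only and ignores the KV cache, activations, and per-block quantization overhead, so real usage will be higher.

```python
def estimate_weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes).

    Hypothetical helper: ignores the KV cache, activations, and
    quantization metadata such as scales and zero points.
    """
    if params_billions <= 0 or bits_per_weight <= 0:
        raise ValueError("params_billions and bits_per_weight must be positive")
    # (billions of parameters) * (bits per weight) / (8 bits per byte) = gigabytes
    return params_billions * bits_per_weight / 8


if __name__ == "__main__":
    # Compare a small 4-bit model with a larger 8-bit one.
    for label, params, bits in [("3B @ 4-bit", 3, 4), ("7B @ 8-bit", 7, 8)]:
        print(f"{label}: ~{estimate_weight_memory_gb(params, bits):.1f} GB of weights")
```

By this estimate a 3B model at 4-bit needs about 1.5 GB for weights and a 7B model at 8-bit about 7 GB. This is one reason to keep only one or two models loaded and to watch memory pressure.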

Osaurus demonstrates:

The benchmarks show Osaurus is particularly well-suited for:

Want to contribute benchmarks? Join the Discord community.