Models

Osaurus is model-agnostic. Run a fast 2B local model on the train, switch to GPT-4o at the office, hand off to Apple's on-device Foundation model on the weekend — your agents, memory, and tools stay intact across all of them.

What you can run

Source | Where it runs | macOS | Setup
MLX (local) | On your Mac, Apple Silicon | 15.5+ | Download once via Model Manager
Apple Foundation | On your Mac, Apple Neural Engine | 26+ | Zero — model name is just foundation
Liquid Foundation | On your Mac | 15.5+ | Download via Model Manager
Cloud providers | Their servers | 15.5+ | API key in Management → Providers

Local models (MLX)

MLX is Apple's array framework with first-class GPU support via unified memory. Local models on Osaurus run through MLX with optimizations for Apple Silicon.

Downloading

  1. Open the Management window (⌘ ⇧ M) → Models
  2. Browse or search the catalog
  3. Click Download on a model
  4. Watch progress in the queue

Each entry shows name, parameter count, quantization (4-bit / 8-bit / JANGTQ / mxfp4), and total disk size.

Where models live

By default, models live at ~/MLXModels/. To put them on an external drive (helpful for big models), set OSU_MODELS_DIR:

export OSU_MODELS_DIR=/Volumes/External/MLXModels
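
For example, to relocate downloads to an external drive, you might move the existing folder and point the variable at the new location. A minimal sketch; it assumes Osaurus re-scans the directory named by OSU_MODELS_DIR on its next launch:

# Create the target folder on the external drive (hypothetical path)
mkdir -p /Volumes/External/MLXModels
# Move any models already downloaded to the default location
mv ~/MLXModels/* /Volumes/External/MLXModels/
# Point Osaurus at the new location
export OSU_MODELS_DIR=/Volumes/External/MLXModels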

To remove a model: Models → Downloaded → Delete.

Curated lineup on Hugging Face

Osaurus maintains its own optimized model library on Hugging Face. Downloads from the in-app Model Manager pull from this library by default. Highlights:

Small / fast — start here

API name | Params | Size | Notes
gemma-4-e2b-it-4bit | 2B | ~1.5 GB | Recommended first model. Tool calling out of the box.
gemma-4-e2b-it-8bit | 2B | ~2.5 GB | Same model, higher quality
gemma-4-e4b-it-4bit | 4B | ~2.6 GB | Step up in capability
gemma-4-e4b-it-8bit | 4B | ~4.2 GB |
laguna-xs.2-jangtq | 3B | ~1.6 GB | OsaurusAI's tiny JANGTQ-quantized model

Mid-range

API name | Params | Active | Size | Notes
gemma-4-26b-a4b-it-4bit | 26B MoE | 4B | ~13 GB | Solid coding + reasoning
gemma-4-26b-a4b-it-jang_4m | 26B MoE | 4B | smaller | OsaurusAI JANGTQ variant
gemma-4-31b-it-jang_4m | 31B | dense | |
qwen3.5-35b-a3b-jang_2s | 35B MoE | 3B | smallest |
qwen3.5-35b-a3b-jang_4k | 35B MoE | 3B | |
qwen3.6-27b-jang_4m | 27B | dense | |

Vision-capable (image input + text output)

API name | Params | Active | Size | Notes
mistral-medium-3.5-128b-jangtq | 128B | dense | | Mistral's flagship vision model
mistral-medium-3.5-128b-mxfp4 | 128B | dense | | mxfp4 variant
nemotron-3-nano-omni-30b-a3b-jangtq2 | 30B MoE | 3B | | NVIDIA Nemotron Omni
nemotron-3-nano-omni-30b-a3b-jangtq4 | 30B MoE | 6B | |
holo3-35b-a3b-jangtq2 | 35B MoE | 3B | | Holo3 vision model
qwen3.6-35b-a3b-jangtq2 | 35B MoE | 3B | | Qwen vision

Large frontier-class

API name | Params | Active | Notes
qwen3.5-122b-a10b-jang_2s | 122B MoE | 10B | Largest available
qwen3.5-122b-a10b-jang_4k | 122B MoE | 10B |
minimax-m2.7-jangtq | 15B | dense |
minimax-m2.7-jangtq4 | 29B | dense |
deepseek-v4-flash-jangtq | 21B | dense |

For the canonical, always-up-to-date list, see OsaurusAI on Hugging Face.

About the quantizations

A model's filename hints at how it was compressed (lower bit width = less RAM, higher bit width = higher quality):

Suffix | What it is
4bit / 8bit | Standard MLX integer quantization
mxfp4 | MX FP4 — block floating-point, great quality at a 4-bit footprint
JANGTQ / JANGTQ2 / JANGTQ4 | OsaurusAI's curated quants, tuned for Apple Silicon. Better quality-to-size than off-the-shelf 4-bit.
JANG_2L, JANG_4M, JANG_2S, JANG_4K | Variant codes — different bit widths × calibration recipes

Rule of thumb: for a model with multiple variants, try the one ending in JANGTQ4 first (best quality for the size), then drop to JANGTQ2 or JANG_2S if you need less RAM.

Tool calling

Tool calling works across every family above. Osaurus's tool-call parser handles JSON, Qwen XML, Mistral, GLM-4, LFM2, Kimi K2, Gemma 3/4, and MiniMax M2 dialects automatically — your agents don't care which model produced the call.
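
As a minimal sketch, here is an OpenAI-style tool-calling request against the local endpoint. The get_weather tool is made up for illustration; whatever dialect the model emits, the call is expected back in the standard tool_calls field of the response:

# Hypothetical tool definition; any OpenAI-style "tools" array works the same way
curl http://127.0.0.1:1337/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-4-e2b-it-4bit",
    "messages": [{"role": "user", "content": "What is the weather in Osaka?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": { "city": { "type": "string" } },
          "required": ["city"]
        }
      }
    }]
  }'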

How much RAM does a model need?

On Apple Silicon there is no separate VRAM; the GPU shares unified memory with the rest of the system. Approximate RAM per model:

  • 4-bit: ~0.6 GB per billion parameters
  • 8-bit: ~1.2 GB per billion parameters
  • MoE models: only the active-parameter weights are touched per token, so a 35B/3B-active MoE behaves closer to a 3B model in steady-state memory

So gemma-4-e2b-it-4bit (2B, 4-bit) needs ~1.5 GB, while qwen3.6-35b-a3b-jangtq2 (35B total, 3B active) sits closer to the 3B class in steady-state memory, plus whatever expert weights get swapped in.

Pick a quantization that leaves room for the rest of your work and your Core Model.

Loaded models and eviction

Configure how local models are cached in Settings → Local Inference → Model Management:

Policy | Behavior
Strict (One Model) | Only one local model loaded at a time (default). Switching unloads the previous one.
Flexible (Multi Model) | Multiple models loaded concurrently. Required if your Core Model is local and different from your chat model — otherwise the two will fight over the slot.

Models load on demand when a chat window opens (with prefix caching warm-up) and unload when no chat references them.

Apple Foundation Models

On macOS 26 (Tahoe) or later, you can use Apple's on-device system model with zero configuration and zero downloads.

curl http://127.0.0.1:1337/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "foundation",
    "messages": [{"role":"user","content":"Hello!"}]
  }'

The model name is literally foundation. Tool calling, streaming, and the standard generation parameters all work — Osaurus translates between OpenAI/Anthropic semantics and Apple's native interface automatically. It's also the recommended Core Model for memory and capability auto-selection on macOS 26+.
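
For example, streaming works the same as with any other model. A hedged sketch; the stream is assumed to arrive as standard OpenAI-style server-sent events:

# -N disables curl's output buffering so chunks print as they arrive
curl -N http://127.0.0.1:1337/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "foundation",
    "messages": [{"role":"user","content":"Write a haiku about the Neural Engine."}],
    "stream": true
  }'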

Apple Intelligence guide →

Liquid Foundation Models

Liquid AI's LFM family is built on a non-transformer architecture optimized for edge deployment. Highlights:

  • Fast token generation on Apple Silicon
  • Low memory footprint compared to equivalent-quality transformers
  • Strong tool calling out of the box

Download LFM models the same way as any other MLX model — they appear in the Model Manager catalog.

Cloud providers

Connect to cloud providers when you need more power. Each provider's models appear alongside local models in the model picker; switching is one click.

Provider | Notes
OpenAI | GPT-4o, o-series, etc. via OpenAI Chat Completions
Anthropic | Claude family via Anthropic Messages
Gemini | Google Gemini
xAI / Grok | xAI's Grok via OpenAI-compatible endpoint
Venice AI | Privacy-focused, uncensored, no data retention
OpenRouter | One key, many providers (openai/gpt-4o, anthropic/claude-3.5-sonnet, …)
Ollama | Local or remote Ollama servers
LM Studio | LM Studio's local server

Add a provider via Management → Providers → Add Provider. API keys are stored in the macOS Keychain.

Remote Providers →

Memory and agent context persist across providers — switching from your local Gemma to Claude 4 or GPT-4o doesn't lose your agent's memory.
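
As a sketch, assuming provider-backed models are addressable through the same local endpoint under ids like those shown in the table (the exact id depends on how the provider is configured), switching to a cloud model is just a change of the model field:

curl http://127.0.0.1:1337/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [{"role":"user","content":"Hello from the cloud side."}]
  }'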

Model naming

An API model name is the model's display name, lowercased, with hyphens in place of spaces:

Display name | API name
Gemma 4 E2B it 4bit | gemma-4-e2b-it-4bit
Qwen3.6 35B A3B JANGTQ2 | qwen3.6-35b-a3b-jangtq2
Mistral Medium 3.5 128B JANGTQ | mistral-medium-3.5-128b-jangtq

List models from any client:

curl http://127.0.0.1:1337/v1/models
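
Assuming the response follows the standard OpenAI list shape (one id per entry under data), the names can be pulled out with jq:

# Print just the API model names
curl -s http://127.0.0.1:1337/v1/models | jq -r '.data[].id'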

Per-request settings

Most generation behavior is set per request via the API. Common parameters:

{
  "model": "gemma-4-e2b-it-4bit",
  "messages": [{ "role": "user", "content": "Hello" }],
  "temperature": 0.7,
  "max_tokens": 1000,
  "top_p": 0.9,
  "stream": true
}
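
Sent as a complete request to the local endpoint, that body looks like this:

curl http://127.0.0.1:1337/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-4-e2b-it-4bit",
    "messages": [{ "role": "user", "content": "Hello" }],
    "temperature": 0.7,
    "max_tokens": 1000,
    "top_p": 0.9,
    "stream": true
  }'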

Recommended temperature ranges:

Use case | Temperature
Code, deterministic tasks | 0.0–0.3
Factual responses | 0.0–0.3
General chat | 0.5–0.7
Creative writing | 0.7–1.0

Full API reference →

Context length

Each model has its own context limit; Osaurus picks sane defaults automatically. Multi-turn caching is also automatic — repeating the same system prompt across messages is cheap. For tunables, see Inference Runtime.

Troubleshooting

"Model not found"

  • Check it's downloaded: Management → Models → Downloaded
  • List API model names: curl http://127.0.0.1:1337/v1/models
  • Match the API name exactly (lowercase, hyphens)

Slow generation

  • Try a smaller or more aggressively quantized variant (e.g. JANGTQ2 instead of JANGTQ4)
  • Close memory-hungry apps
  • Reduce max_tokens
  • Watch Activity Monitor for memory pressure

Download fails

  • Check internet connection and disk space
  • Pause and resume; partial files are kept
  • Try a different mirror via the model card's "Source" link

Out of memory

  • Switch to a more aggressive quantization (JANGTQ2, JANG_2S, or 4-bit instead of 8-bit)
  • Reduce max_tokens
  • Consider a smaller model (drop from MoE-large to gemma-4-e2b-it-4bit)
  • Switch to the Strict (One Model) eviction policy if you have multiple models loaded

Under the hood

Curious about continuous batching, the KV cache, batch size tuning, or how the inference path is structured? See Inference Runtime.

