Quickstart
Get your first local LLM running in minutes. This guide walks through installation, setup, and your first API call.
Setup
Install Osaurus
Install using Homebrew:
brew install --cask osaurus
Alternatively, download directly from GitHub releases.
Launch the Application
- Open Osaurus from Spotlight (⌘ Space) or Applications
- Look for the Osaurus icon in your menu bar
- Click the icon to access the control panel
Start the Server
- Click Start Server in the menu
- Wait for the status to show Running on port 1337
- Your local LLM server is now active
Download a Model
- Select Model Manager from the menu bar
- Browse available models or use search
- For first-time users, we recommend Llama 3.2 3B Instruct 4bit:
  - Balanced performance and quality
  - 2GB download size
  - Runs efficiently on 8GB+ Macs
- Click Download and wait for completion
Test the API
Verify your installation with a simple request:
curl -s http://127.0.0.1:1337/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "llama-3.2-3b-instruct-4bit",
"messages": [{"role":"user","content":"Hello! Tell me a fun fact about dinosaurs."}],
"max_tokens": 100
}' | jq
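If the model is loaded, you should get back an OpenAI-style completion object. The exact fields and values depend on the model and server version, but a successful reply resembles this shape (illustrative, not captured output):
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "llama-3.2-3b-instruct-4bit",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Fun fact: some dinosaurs, like Velociraptor, likely had feathers."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 18, "completion_tokens": 21, "total_tokens": 39 }
}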
Using the Chat Interface
Osaurus includes an integrated chat interface for direct model interaction.
Accessing Chat
Press ⌘; (Command + Semicolon) to open the chat overlay. Type your message and press Enter to send. Press ⌘; again to close.
Chat Features
- Markdown Rendering — Formatted responses with syntax highlighting
- Copy Messages — Click the copy icon on any message
- Stop Generation — Interrupt streaming responses
- Model Selection — Switch models using the dropdown
- System Prompts — Configure in Settings → Chat
Example Requests
Creative Writing
curl -s http://127.0.0.1:1337/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "llama-3.2-3b-instruct-4bit",
"messages": [{"role":"user","content":"Write a haiku about coding late at night"}]
}' | jq -r '.choices[0].message.content'
Code Generation
curl -s http://127.0.0.1:1337/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "llama-3.2-3b-instruct-4bit",
"messages": [{"role":"user","content":"Write a Python function to reverse a string without using built-in functions"}]
}' | jq -r '.choices[0].message.content'
Streaming Responses
curl -N http://127.0.0.1:1337/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "llama-3.2-3b-instruct-4bit",
"messages": [{"role":"user","content":"Explain quantum computing in simple terms"}],
"stream": true
}'
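In code, streaming is easier to consume through an OpenAI-compatible client than by parsing server-sent events by hand. A minimal sketch using the OpenAI Python SDK, configured as in the Python Integration section below:
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:1337/v1", api_key="not-needed")

# stream=True yields chunks as tokens are generated
stream = client.chat.completions.create(
    model="llama-3.2-3b-instruct-4bit",
    messages=[{"role": "user", "content": "Explain quantum computing in simple terms"}],
    stream=True,
)

for chunk in stream:
    # Each chunk carries an incremental delta; content can be None on some chunks
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()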
Apple Foundation Models
On macOS 26 Tahoe or later, access Apple's system models:
# Check availability
curl -s http://127.0.0.1:1337/v1/models | jq '.data[] | select(.id=="foundation")'
# Use foundation model
curl -s http://127.0.0.1:1337/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "foundation",
"messages": [{"role":"user","content":"What makes Apple Silicon special?"}]
}' | jq
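Programmatically, you can combine the availability check and the call. A minimal sketch, assuming the server is running and using the OpenAI SDK as configured later in this guide:
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:1337/v1", api_key="not-needed")

# The "foundation" model only appears on macOS 26 Tahoe or later
available = {m.id for m in client.models.list().data}

if "foundation" in available:
    response = client.chat.completions.create(
        model="foundation",
        messages=[{"role": "user", "content": "What makes Apple Silicon special?"}],
    )
    print(response.choices[0].message.content)
else:
    print("Apple Foundation Models are not available on this system.")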
Command Line Interface
Control Osaurus from Terminal:
# Start server
osaurus serve --port 1337
# Enable LAN access
osaurus serve --port 1337 --expose
# Check status
osaurus status
# Stop server
osaurus stop
# Open UI
osaurus ui
Python Integration
Use the OpenAI SDK with Osaurus:
from openai import OpenAI
# Configure for local server
client = OpenAI(
base_url="http://127.0.0.1:1337/v1",
api_key="not-needed"
)
# Make a request
response = client.chat.completions.create(
model="llama-3.2-3b-instruct-4bit",
messages=[
{"role": "user", "content": "Write a joke about programming"}
]
)
print(response.choices[0].message.content)
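The same client supports multi-turn conversations: append the assistant's reply to the message list before sending a follow-up. A minimal continuation of the example above:
# Keep the full message history to continue the conversation
messages = [
    {"role": "user", "content": "Write a joke about programming"},
    {"role": "assistant", "content": response.choices[0].message.content},
    {"role": "user", "content": "Now explain why it's funny"},
]

follow_up = client.chat.completions.create(
    model="llama-3.2-3b-instruct-4bit",
    messages=messages,
)
print(follow_up.choices[0].message.content)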
JavaScript Integration
Access Osaurus from Node.js or browser environments:
const response = await fetch("http://127.0.0.1:1337/v1/chat/completions", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
model: "llama-3.2-3b-instruct-4bit",
messages: [{ role: "user", content: "What's the weather like on Mars?" }],
}),
});
const data = await response.json();
console.log(data.choices[0].message.content);
Model Recommendations
Fast Response Models (4-bit)
- Llama 3.2 3B Instruct — Excellent general-purpose model
- Qwen 2.5 3B Instruct — Strong reasoning capabilities
- Gemma 2 2B Instruct — Optimized for speed
Quality-Focused Models
- Llama 3.1 8B Instruct — Superior quality with reasonable speed
- Mistral 7B Instruct — Well-rounded performance
- DeepSeek Coder 7B — Specialized for programming tasks
High-Memory Systems (32GB+)
Consider 8-bit variants for enhanced quality or larger 13B-30B models for advanced use cases.
Performance Optimization
- Model Selection — Start with 4-bit models for optimal speed
- Resource Management — Close unnecessary applications
- Context Length — Shorter prompts yield faster responses
- Response Streaming — Improves perceived performance (see the sketch after this list)
- System Monitoring — Use Osaurus's built-in monitor
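To see why streaming improves perceived performance, compare time to first token against total generation time. A minimal sketch reusing the client setup from Python Integration; timings are illustrative and depend on your hardware:
import time
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:1337/v1", api_key="not-needed")

start = time.perf_counter()
stream = client.chat.completions.create(
    model="llama-3.2-3b-instruct-4bit",
    messages=[{"role": "user", "content": "List three uses for a paperclip"}],
    max_tokens=100,  # bounding max_tokens also bounds worst-case latency
    stream=True,
)

first_token = None
for chunk in stream:
    if chunk.choices[0].delta.content and first_token is None:
        first_token = time.perf_counter() - start  # perceived latency
total = time.perf_counter() - start

print(f"first token: {first_token:.2f}s, full response: {total:.2f}s")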
Troubleshooting
Model Not Found
- Verify download completion in Model Manager
- Check exact model name:
curl http://127.0.0.1:1337/v1/models
- Ensure lowercase naming with hyphens
Slow Performance
- Try smaller models (3B vs 7B)
- Reduce the max_tokens parameter
- Free up system memory
- Check Activity Monitor for resource usage
Connection Issues
- Verify server status:
osaurus status
- Check port configuration (default: 1337)
- Ensure the firewall allows localhost connections (a quick scripted check follows below)
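If the CLI looks healthy but clients still fail, a scripted check can isolate whether the HTTP endpoint itself is reachable. A minimal sketch using only the Python standard library:
import json
from urllib.request import urlopen
from urllib.error import URLError

try:
    # /v1/models responds even before a model is loaded
    with urlopen("http://127.0.0.1:1337/v1/models", timeout=5) as resp:
        models = json.load(resp)
        print("Server reachable; models:", [m["id"] for m in models["data"]])
except URLError as exc:
    print("Cannot reach Osaurus on port 1337:", exc)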
Next Steps
Explore these resources to go further:
- API Reference — Complete endpoint documentation
- Model Guide — Detailed model information
- Configuration — Customize Osaurus settings
- Integrations — Build with Osaurus
- Community — Connect with other users
You're now running AI locally on your Mac.
No cloud dependencies. No usage limits. Complete privacy.