Quick Start
You're five minutes away from running AI locally on your Mac. This guide walks you through installation, downloading your first model, and having your first conversation.
Setup
Install Osaurus
Install using Homebrew:
brew install --cask osaurus
Alternatively, download directly from GitHub releases.
Launch the Application
- Open Osaurus from Spotlight (⌘ Space) or Applications
- Look for the Osaurus icon in your menu bar
- Click the icon to access the control panel
Start the Server
- Click Start Server in the menu
- Wait for the status to show Running on port 1337
- Your local LLM server is now active
Download a Model
- Select Model Manager from the menu bar
- Browse available models or use search
- Select a model that fits your system's memory (see Model Management for details)
- Click Download and wait for completion
Test the API
Verify your installation with a simple request:
curl -s http://127.0.0.1:1337/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.2-3b-instruct-4bit",
    "messages": [{"role":"user","content":"Hello! Tell me a fun fact about dinosaurs."}],
    "max_tokens": 100
  }' | jq
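If the request fails with a model error, the name in the request may not match an installed model. Model IDs can be listed programmatically; a minimal sketch using only the Python standard library, assuming the default port:
import json
import urllib.request

# The /v1/models endpoint lists installed models in the OpenAI-compatible schema.
with urllib.request.urlopen("http://127.0.0.1:1337/v1/models") as resp:
    models = json.load(resp)

for model in models["data"]:
    print(model["id"])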
Using the Chat Interface
Osaurus includes an integrated chat interface for direct model interaction.
Accessing Chat
Press ⌘; (Command + Semicolon) to open the chat overlay. Type your message and press Enter to send. Press ⌘; again to close.
Chat Features
- Markdown Rendering — Formatted responses with syntax highlighting
- Copy Messages — Click the copy icon on any message
- Stop Generation — Interrupt streaming responses
- Model Selection — Switch models using the dropdown
- System Prompts — Configure in Settings → Chat
Example Requests
Creative Writing
curl -s http://127.0.0.1:1337/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.2-3b-instruct-4bit",
    "messages": [{"role":"user","content":"Write a haiku about coding late at night"}]
  }' | jq -r '.choices[0].message.content'
Code Generation
curl -s http://127.0.0.1:1337/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.2-3b-instruct-4bit",
    "messages": [{"role":"user","content":"Write a Python function to reverse a string without using built-in functions"}]
  }' | jq -r '.choices[0].message.content'
Streaming Responses
curl -N http://127.0.0.1:1337/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.2-3b-instruct-4bit",
    "messages": [{"role":"user","content":"Explain quantum computing in simple terms"}],
    "stream": true
  }'
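Because Osaurus follows the OpenAI streaming convention, the response arrives as server-sent events: each data: line carries a JSON chunk whose choices[0].delta.content field holds the next piece of text, and the stream ends with data: [DONE]. The Python Integration section below includes a sketch for consuming the stream with the SDK.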
Apple Foundation Models
On macOS 26 Tahoe or later, access Apple's system models:
# Check availability
curl -s http://127.0.0.1:1337/v1/models | jq '.data[] | select(.id=="foundation")'
# Use foundation model
curl -s http://127.0.0.1:1337/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "foundation",
    "messages": [{"role":"user","content":"What makes Apple Silicon special?"}]
  }' | jq
Command Line Interface
Control Osaurus from Terminal:
# Start server
osaurus serve --port 1337
# Enable LAN access
osaurus serve --port 1337 --expose
# Check status
osaurus status
# Stop server
osaurus stop
# Open UI
osaurus ui
Python Integration
Osaurus is OpenAI-compatible, so the official OpenAI Python SDK works without modification:
from openai import OpenAI

# Configure for local server
client = OpenAI(
    base_url="http://127.0.0.1:1337/v1",
    api_key="not-needed"
)

# Make a request
response = client.chat.completions.create(
    model="llama-3.2-3b-instruct-4bit",
    messages=[
        {"role": "user", "content": "Write a joke about programming"}
    ]
)

print(response.choices[0].message.content)
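Streaming works through the same client. A minimal sketch, assuming the model above is downloaded; passing stream=True makes the SDK yield incremental chunks:
# Stream tokens as they are generated
stream = client.chat.completions.create(
    model="llama-3.2-3b-instruct-4bit",
    messages=[{"role": "user", "content": "Count to five, slowly"}],
    stream=True,
)
for chunk in stream:
    # delta.content is None on the final chunk, so guard before printing
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()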
JavaScript Integration
Access Osaurus from Node.js or browser environments:
const response = await fetch("http://127.0.0.1:1337/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "llama-3.2-3b-instruct-4bit",
    messages: [{ role: "user", content: "What's the weather like on Mars?" }],
  }),
});

const data = await response.json();
console.log(data.choices[0].message.content);
Performance Optimization
- Model Selection — Start with 4-bit quantized models; they are the fastest and lightest on memory
- Resource Management — Close unnecessary applications
- Context Length — Shorter prompts yield faster responses
- Response Streaming — Improves perceived performance
- System Monitoring — Use Osaurus's built-in monitor
Troubleshooting
Model Not Found
- Verify download completion in Model Manager
- Check the exact model name: curl http://127.0.0.1:1337/v1/models
- Model IDs are lowercase with hyphens (e.g., llama-3.2-3b-instruct-4bit)
Slow Performance
- Try smaller models (3B vs 7B)
- Reduce the max_tokens parameter
- Free up system memory
- Check Activity Monitor for resource usage
Connection Issues
- Verify server status: osaurus status
- Check the port configuration (default: 1337); a quick programmatic check is sketched below
- Ensure firewall allows localhost connections
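If osaurus status isn't conclusive, you can confirm whether anything is listening on the port directly. A minimal sketch in Python, assuming the default port 1337:
import socket

# connect_ex returns 0 when a listener accepts the connection
with socket.socket() as s:
    result = s.connect_ex(("127.0.0.1", 1337))
print("Server reachable" if result == 0 else "Nothing listening on port 1337")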
What's Next?
Now that you're up and running, explore what Osaurus can do:
For Everyone:
- Chat Interface — Master the chat overlay
- Personas — Create custom AI assistants
- Voice Input — Talk to your AI hands-free
For Developers:
- API Reference — Complete endpoint documentation
- SDK Examples — Python, JavaScript, and more
- Tools & Plugins — Extend AI with native tools
Need help? Join our Discord community or check GitHub issues.