Apple Intelligence Integration
Osaurus integrates seamlessly with Apple Foundation Models when available on your system, giving you access to the system's default on-device language model with zero configuration.
🍎 Overview
Apple Foundation Models provide:
- System-integrated AI — Uses the same models as system features
- Hardware acceleration — Optimized for Apple Neural Engine (ANE)
- Zero setup — No downloads or configuration needed
- Privacy-first — All processing happens on-device
📋 Requirements
- macOS 26 (Tahoe) or later
- Apple Silicon Mac (M1, M2, M3, or newer)
- Apple Intelligence enabled in System Settings
While Osaurus itself runs on macOS 15.5+, Apple Foundation Models specifically require macOS 26 (Tahoe) or later.
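The version and hardware requirements above can also be checked from a script. The following is a minimal sketch using only the Python standard library; the function name supports_foundation_models is hypothetical and not part of Osaurus. It cannot tell whether Apple Intelligence is enabled in System Settings, and a Python interpreter running under Rosetta may report x86_64 even on Apple Silicon.

```python
import platform


def supports_foundation_models(macos_version: str, machine: str) -> bool:
    """Return True if the version/architecture pair meets the stated requirements."""
    # An empty version string means we are not running on macOS at all.
    if not macos_version:
        return False
    major = int(macos_version.split(".")[0])
    # Apple Silicon reports "arm64"; Intel Macs report "x86_64".
    return major >= 26 and machine == "arm64"


if __name__ == "__main__":
    # platform.mac_ver()[0] is the macOS version string, e.g. "26.0".
    print(supports_foundation_models(platform.mac_ver()[0], platform.machine()))
```

A True result only means the machine is eligible; the /v1/models check described in the setup steps remains the authoritative test.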
🚀 Setup
- Update macOS to version 26 (Tahoe) or later
- Enable Apple Intelligence in System Settings → Apple Intelligence & Siri
- Start Osaurus — It automatically detects Foundation Models
- Verify availability:
curl -s http://127.0.0.1:1337/v1/models | jq '.data[] | select(.id=="foundation")'
If you see a foundation entry, you're ready to use Apple's models!
💬 Using Foundation Models
Basic Chat
Use model: "foundation" in your requests:
curl -s http://127.0.0.1:1337/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "foundation",
"messages": [{"role":"user","content":"Explain quantum computing simply"}],
"max_tokens": 200
}' | jq -r '.choices[0].message.content'
Using the Alias
You can also use model: "default" which maps to Foundation Models when available:
curl -s http://127.0.0.1:1337/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "default",
"messages": [{"role":"user","content":"Write a haiku about coding"}]
}' | jq -r '.choices[0].message.content'
Streaming Responses
Foundation Models support streaming for real-time output:
curl -N http://127.0.0.1:1337/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "foundation",
"messages": [{"role":"user","content":"Tell me a story about a brave robot"}],
"stream": true
}'
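To consume the stream from code rather than curl, the event lines need to be parsed and the text fragments joined. The sketch below assumes the stream follows the usual OpenAI convention: each event is a line starting with "data:", the text arrives in choices[].delta.content, and the stream ends with "data: [DONE]". The function name collect_stream_text and the sample lines are illustrative, not captured server output.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Join the content deltas from OpenAI-style server-sent event lines."""
    parts = []
    for line in lines:
        line = line.strip()
        # Skip blank separator lines and anything that is not a data event.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        # The OpenAI convention marks the end of the stream with [DONE].
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                parts.append(text)
    return "".join(parts)


# Illustrative sample lines (assumed format, not real server output).
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Once upon"}}]}',
    'data: {"choices":[{"delta":{"content":" a time"}}]}',
    "data: [DONE]",
]
print(collect_stream_text(sample))
```

In a real client the lines would come from the HTTP response body, for example by iterating over a streaming response line by line.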
🛠️ Advanced Features
Function/Tool Calling
Osaurus transparently maps OpenAI-style tools to Apple's tool interface:
curl -s http://127.0.0.1:1337/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "foundation",
"messages": [{"role":"user","content":"What is the weather in San Francisco?"}],
"tools": [{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get weather for a city",
"parameters": {
"type": "object",
"properties": {"city": {"type": "string"}},
"required": ["city"]
}
}
}],
"tool_choice": "auto"
}'
Key Points:
- Tools work identically to MLX models
- Streaming emits OpenAI-style tool_calls deltas
- Your existing tool-calling code works unchanged
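Once the model asks for a tool, the client has to run it and send the result back. The sketch below shows one way to dispatch tool calls from a non-streaming response, assuming the response uses the standard OpenAI shape (choices[0].message.tool_calls, with arguments as a JSON-encoded string). The get_weather implementation, the TOOLS registry, and the sample response are hypothetical placeholders.

```python
import json


def get_weather(city: str) -> str:
    # Hypothetical stand-in; a real implementation would call a weather service.
    return f"Sunny in {city}"


# Map tool names (as declared in the request) to local callables.
TOOLS = {"get_weather": get_weather}


def run_tool_calls(response: dict) -> list:
    """Execute each tool call in a chat completion and build "tool" messages."""
    message = response["choices"][0]["message"]
    results = []
    for call in message.get("tool_calls") or []:
        name = call["function"]["name"]
        # In the OpenAI format, arguments arrive as a JSON-encoded string.
        args = json.loads(call["function"]["arguments"])
        output = TOOLS[name](**args)
        results.append(
            {"role": "tool", "tool_call_id": call["id"], "content": output}
        )
    return results


# Illustrative response shape (assumed, not captured server output).
sample_response = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": "call_1",
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "arguments": '{"city": "San Francisco"}',
                },
            }],
        }
    }]
}
print(run_tool_calls(sample_response))
```

The returned messages would be appended to the conversation, after the assistant message that requested the tools, and sent in a follow-up request so the model can produce its final answer.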
System Prompts
Foundation Models respect system prompts for consistent behavior:
from openai import OpenAI

# The API key is ignored by the local server but required by the client
client = OpenAI(base_url="http://127.0.0.1:1337/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="foundation",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant. Always include comments in code examples."},
        {"role": "user", "content": "Write a Python function to calculate factorial"},
    ],
)
print(response.choices[0].message.content)
⚡ Performance Characteristics
Advantages
- Instant loading — No model initialization required
- ANE acceleration — Leverages dedicated neural hardware
- Memory efficient — Shared with system services
- Consistent quality — Same model as system features
Considerations
- Fixed model — Cannot choose different sizes/versions
- System dependent — Requires specific macOS version
- Limited configuration — Less control than MLX models
🔍 Detection and Fallback
Programmatic Detection
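import requests

def has_foundation_models():
    """Return True if the local Osaurus server lists the "foundation" model."""
    try:
        response = requests.get("http://127.0.0.1:1337/v1/models", timeout=5)
        response.raise_for_status()
        models = response.json()["data"]
        return any(m["id"] == "foundation" for m in models)
    except (requests.RequestException, KeyError, ValueError):
        # Server unreachable, HTTP error, or unexpected response shape
        return False

# Use Foundation Models if available, fall back to MLX
if has_foundation_models():
    model = "foundation"
else:
    model = "llama-3.2-3b-instruct-4bit"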
Graceful Fallback
async function getBestModel() {
  try {
    const response = await fetch("http://127.0.0.1:1337/v1/models");
    const { data } = await response.json();
    // Prefer Foundation Models if available
    if (data.some((m) => m.id === "foundation")) {
      return "foundation";
    }
    // Fall back to first available MLX model
    return (
      data.find((m) => m.id !== "foundation")?.id ||
      "llama-3.2-3b-instruct-4bit"
    );
  } catch (error) {
    return "llama-3.2-3b-instruct-4bit";
  }
}
🔐 Privacy & Security
- 100% on-device — No data leaves your Mac
- No telemetry — Apple Foundation Models don't phone home via Osaurus
- Sandboxed — Runs within macOS security boundaries
- No API keys — No authentication or tracking
🐛 Troubleshooting
Foundation model not appearing
- Check macOS version:
  sw_vers -productVersion
  # Should be 26.0 or higher
- Verify Apple Intelligence is enabled:
  - System Settings → Apple Intelligence & Siri
  - Toggle "Apple Intelligence" ON
- Restart Osaurus after enabling Apple Intelligence
- Check system requirements:
  sysctl -n machdep.cpu.brand_string
  # Should show Apple M1, M2, M3, etc.
Errors using foundation model
"Model not found" error:
- Foundation Models not available on your system
- Fall back to an MLX model
- Check the /v1/models endpoint for available models
Slow or no response:
- System may be loading the model initially
- Check Activity Monitor for high system usage
- Ensure adequate free memory (8GB+ recommended)
Unexpected output:
- Foundation Models may behave differently than MLX models
- Adjust prompts and parameters as needed
- Use system prompts for consistent behavior
Performance issues
- Free up resources:
  - Quit unnecessary apps
  - Check Activity Monitor for memory pressure
- Optimize requests by limiting output length (max_tokens), balancing creativity and consistency (temperature), and streaming for better perceived performance (stream):
  {
    "max_tokens": 200,
    "temperature": 0.7,
    "stream": true
  }
- Monitor system health:
  # Check Osaurus health
  curl -s http://127.0.0.1:1337/health | jq
  # Check system memory pressure
  vm_stat | grep "Pages free"
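The "Pages free" figure is a page count, not a byte count, so it is easier to interpret once multiplied by the page size that vm_stat prints in its first line. The sketch below does that conversion; it needs the full vm_stat output rather than the grep-filtered line, and the function name free_memory_bytes and the sample text are illustrative. Free pages alone tend to understate the memory macOS can actually make available, so treat the result as a rough indicator.

```python
import re


def free_memory_bytes(vm_stat_output: str) -> int:
    """Convert the "Pages free" count in vm_stat output to bytes."""
    page_size = re.search(r"page size of (\d+) bytes", vm_stat_output)
    pages_free = re.search(r"Pages free:\s+(\d+)\.", vm_stat_output)
    if not page_size or not pages_free:
        raise ValueError("unrecognized vm_stat output")
    return int(page_size.group(1)) * int(pages_free.group(1))


# Illustrative sample text, not a real measurement.
sample = (
    "Mach Virtual Memory Statistics: (page size of 16384 bytes)\n"
    "Pages free:                               50000.\n"
)
print(free_memory_bytes(sample))
```

On a Mac, the input could be obtained with subprocess.run(["vm_stat"], capture_output=True, text=True).stdout.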
🎯 Best Practices
- Prefer Foundation Models when available — Better integration and performance
- Implement fallback logic — Handle systems without Apple Intelligence
- Use streaming — Foundation Models excel at streaming responses
- Test on both — Ensure your app works with and without Foundation Models
- Monitor availability — Models may be temporarily unavailable during system updates
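The fallback and availability practices above can be combined into a small wrapper that tries models in order of preference. This is a minimal sketch, not an Osaurus API: complete_with_fallback and fake_send are hypothetical names, and send stands for whatever function your app uses to issue a chat request for a given model ID.

```python
from typing import Callable, Sequence


def complete_with_fallback(
    send: Callable[[str], str],
    models: Sequence[str] = ("foundation", "llama-3.2-3b-instruct-4bit"),
) -> tuple:
    """Try each model in order; return (model, reply) for the first that works."""
    last_error = None
    for model in models:
        try:
            return model, send(model)
        except Exception as error:  # a real client would catch narrower errors
            last_error = error
    raise RuntimeError("no model produced a response") from last_error


def fake_send(model: str) -> str:
    # Hypothetical transport that simulates Foundation Models being unavailable.
    if model == "foundation":
        raise LookupError("Model not found")
    return f"reply from {model}"


print(complete_with_fallback(fake_send))
```

Injecting send keeps the wrapper independent of any particular HTTP client, which also makes it straightforward to test both the with-Foundation-Models and without-Foundation-Models paths.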
🔗 Related Resources
- Model Management — Learn about all supported models
- API Reference — Complete API documentation
- Configuration — Server settings
- Apple Intelligence Docs — Official Apple documentation
Questions about Apple Intelligence?
Join our Discord community for help