Apple Intelligence Integration

Name: Osaurus
Author: Osaurus

Osaurus integrates seamlessly with Apple Foundation Models when available on your system, giving you access to the system's default on-device language model with zero configuration.

🍎 Overview

Apple Foundation Models provide:

System-integrated AI — Uses the same models as system features
Hardware acceleration — Optimized for Apple Neural Engine (ANE)
Zero setup — No downloads or configuration needed
Privacy-first — All processing happens on-device

📋 Requirements

macOS 26 (Tahoe) or later
Apple Silicon Mac (M1, M2, M3, or newer)
Apple Intelligence enabled in System Settings

Compatibility Note

While Osaurus itself runs on macOS 15.5+, Apple Foundation Models specifically require macOS 26 (Tahoe) or later.

🚀 Setup

Update macOS to version 26 (Tahoe) or later
Enable Apple Intelligence in System Settings → Apple Intelligence & Siri
Start Osaurus — It automatically detects Foundation Models
Verify availability:

curl -s http://127.0.0.1:1337/v1/models | jq '.data[] | select(.id=="foundation")'

If you see a foundation entry, you're ready to use Apple's models!

💬 Using Foundation Models

Basic Chat

Use model: "foundation" in your requests:

curl -s http://127.0.0.1:1337/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "foundation",
    "messages": [{"role":"user","content":"Explain quantum computing simply"}],
    "max_tokens": 200
  }' | jq -r '.choices[0].message.content'

Using the Alias

You can also use model: "default", which maps to Foundation Models when available:

curl -s http://127.0.0.1:1337/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "default",
    "messages": [{"role":"user","content":"Write a haiku about coding"}]
  }' | jq -r '.choices[0].message.content'

Streaming Responses

Foundation Models support streaming for real-time output:

curl -N http://127.0.0.1:1337/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "foundation",
    "messages": [{"role":"user","content":"Tell me a story about a brave robot"}],
    "stream": true
  }'
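
To consume the stream from Python instead of curl, something like the sketch below should work. It assumes Osaurus emits OpenAI-style server-sent events (lines of the form `data: {json}`, ending with `data: [DONE]`); the helper names `iter_deltas` and `stream_chat` are illustrative and not part of Osaurus.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:1337/v1"  # same local endpoint as the curl examples


def iter_deltas(lines):
    """Yield text fragments from OpenAI-style SSE lines (assumed format)."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # assumed end-of-stream marker
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


def stream_chat(prompt, model="foundation"):
    """Send a streaming request and print fragments as they arrive."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }).encode("utf-8")
    request = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        for fragment in iter_deltas(response):
            print(fragment, end="", flush=True)
    print()
```

Calling `stream_chat("Tell me a story about a brave robot")` with Osaurus running prints the reply incrementally. Keeping the parsing in `iter_deltas` separate from the network call makes it easy to test against recorded output.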

🛠️ Advanced Features

Function/Tool Calling

Osaurus transparently maps OpenAI-style tools to Apple's tool interface:

curl -s http://127.0.0.1:1337/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "foundation",
    "messages": [{"role":"user","content":"What is the weather in San Francisco?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }],
    "tool_choice": "auto"
  }'

Key Points:

Tools behave the same way as they do with MLX models
Streaming emits OpenAI-style tool_calls deltas
Your existing tool-calling code works unchanged
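
The request above only lets the model choose a tool; your code still has to run it and return the result. A minimal sketch of that step follows, assuming the response uses the OpenAI `tool_calls` shape (each call carries an `id`, a `function.name`, and JSON-encoded `function.arguments`). `get_weather` is a stub, and `TOOLS` and `run_tool_calls` are hypothetical names chosen for illustration.

```python
import json


def get_weather(city):
    """Stub tool: a real implementation would call a weather service."""
    return {"city": city, "forecast": "unknown (stub)"}


# Map tool names from the request's "tools" list to local callables.
TOOLS = {"get_weather": get_weather}


def run_tool_calls(assistant_message, tools=TOOLS):
    """Execute each requested tool and build the follow-up "tool" messages."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        name = call["function"]["name"]
        arguments = json.loads(call["function"]["arguments"] or "{}")
        if name in tools:
            output = tools[name](**arguments)
        else:
            output = {"error": f"unknown tool: {name}"}
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(output),
        })
    return results
```

Append the assistant message and the returned tool messages to `messages`, then send a second request so the model can produce its final answer.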

System Prompts

Foundation Models respect system prompts for consistent behavior:

from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:1337/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="foundation",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant. Always include comments in code examples."},
        {"role": "user", "content": "Write a Python function to calculate factorial"}
    ]
)

print(response.choices[0].message.content)

⚡ Performance Characteristics

Advantages

Instant loading — No model initialization required
ANE acceleration — Leverages dedicated neural hardware
Memory efficient — Shared with system services
Consistent quality — Same model as system features

Considerations

Fixed model — Cannot choose different sizes/versions
System dependent — Requires specific macOS version
Limited configuration — Less control than MLX models

🔍 Detection and Fallback

Programmatic Detection

import requests

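def has_foundation_models():
    """Return True if the local Osaurus server lists the "foundation" model."""
    try:
        response = requests.get("http://127.0.0.1:1337/v1/models", timeout=5)
        response.raise_for_status()
        models = response.json()["data"]
        return any(m["id"] == "foundation" for m in models)
    except (requests.RequestException, KeyError, ValueError):
        # Server not running, unexpected response shape, or invalid JSON
        return False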

# Use Foundation Models if available, fall back to MLX
if has_foundation_models():
    model = "foundation"
else:
    model = "llama-3.2-3b-instruct-4bit"

Graceful Fallback

async function getBestModel() {
  try {
    const response = await fetch("http://127.0.0.1:1337/v1/models");
    const { data } = await response.json();

    // Prefer Foundation Models if available
    if (data.some((m) => m.id === "foundation")) {
      return "foundation";
    }

    // Fall back to first available MLX model
    return (
      data.find((m) => m.id !== "foundation")?.id ||
      "llama-3.2-3b-instruct-4bit"
    );
  } catch (error) {
    return "llama-3.2-3b-instruct-4bit";
  }
}

🔐 Privacy & Security

100% on-device — No data leaves your Mac
No telemetry — Apple Foundation Models don't phone home via Osaurus
Sandboxed — Runs within macOS security boundaries
No API keys — No authentication or tracking

🐛 Troubleshooting

Foundation model not appearing

Check macOS version:

sw_vers -productVersion
# Should be 26.0 or higher

Verify Apple Intelligence is enabled:
- System Settings → Apple Intelligence & Siri
- Toggle "Apple Intelligence" ON

Restart Osaurus after enabling Apple Intelligence.

Check system requirements:

sysctl -n machdep.cpu.brand_string
# Should show Apple M1, M2, M3, etc.
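
The two command-line checks above can also be run from Python, for example at application startup. This is a rough sketch using only the standard `platform` module; `parse_major` and `meets_requirements` are hypothetical helper names. It cannot tell whether Apple Intelligence is switched on, and an interpreter running under Rosetta (or a Python build that reports a compatibility version number) may produce a false negative, so treat a negative result as a hint and confirm with `sw_vers`.

```python
import platform


def parse_major(version):
    """Return the major component of a version string such as "26.0.1"."""
    try:
        return int(version.split(".")[0])
    except ValueError:
        return 0  # empty or malformed string (e.g. not running on macOS)


def meets_requirements(macos_version, machine):
    """Check the macOS version and CPU architecture requirements."""
    return parse_major(macos_version) >= 26 and machine == "arm64"


if __name__ == "__main__":
    version = platform.mac_ver()[0]  # "" on non-macOS systems
    machine = platform.machine()     # "arm64" on Apple Silicon
    print(f"macOS {version or 'n/a'}, architecture {machine}")
    print("Requirements met:", meets_requirements(version, machine))
```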

Errors using foundation model

"Model not found" error:

Foundation Models not available on your system
Fall back to an MLX model
Check /v1/models endpoint for available models
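
One way to apply this fallback at request time is sketched below. It assumes Osaurus reports a missing model with a non-2xx HTTP status, which this guide does not confirm; `post_chat` and `chat_with_fallback` are hypothetical names, and the fallback model ID is the MLX model used in the detection examples.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "http://127.0.0.1:1337/v1"
FALLBACK_MODEL = "llama-3.2-3b-instruct-4bit"  # MLX model from the detection examples


def post_chat(payload):
    """POST a chat completion request and return the decoded JSON body."""
    request = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def chat_with_fallback(messages, send=post_chat):
    """Try "foundation" first; on an HTTP error, retry once with the MLX model."""
    try:
        return send({"model": "foundation", "messages": messages})
    except urllib.error.HTTPError:
        # Assumption: a missing model surfaces as a non-2xx status code.
        return send({"model": FALLBACK_MODEL, "messages": messages})
```

The `send` parameter exists so the retry logic can be exercised without a running server. If the error is returned in a 200 response body instead, check the body for an `error` field rather than catching `HTTPError`.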

Slow or no response:

System may be loading the model initially
Check Activity Monitor for high system usage
Ensure adequate free memory (8 GB or more recommended)

Unexpected output:

Foundation Models may behave differently than MLX models
Adjust prompts and parameters as needed
Use system prompts for consistent behavior

Performance issues

Free up resources:
- Quit unnecessary apps
- Check Activity Monitor for memory pressure

Optimize requests by limiting output length (max_tokens), balancing creativity and consistency (temperature), and streaming for better perceived performance (stream):

{
  "max_tokens": 200,
  "temperature": 0.7,
  "stream": true
}

Monitor system health:

# Check Osaurus health
curl -s http://127.0.0.1:1337/health | jq

# Check system memory pressure
vm_stat | grep "Pages free"
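
`vm_stat` reports page counts, not bytes. A small sketch for converting the "Pages free" figure, assuming the usual output layout (a header containing "page size of N bytes" and a "Pages free:" line ending in a period); the function names are hypothetical.

```python
import re
import subprocess


def free_memory_bytes(vm_stat_output):
    """Convert the "Pages free" count in vm_stat output to bytes."""
    page_size = re.search(r"page size of (\d+) bytes", vm_stat_output)
    pages_free = re.search(r"Pages free:\s+(\d+)\.", vm_stat_output)
    if not page_size or not pages_free:
        return None  # output did not match the assumed layout
    return int(page_size.group(1)) * int(pages_free.group(1))


def current_free_memory_bytes():
    """Run vm_stat (macOS only) and return free memory in bytes, or None."""
    result = subprocess.run(
        ["vm_stat"], capture_output=True, text=True, check=True
    )
    return free_memory_bytes(result.stdout)
```

Free pages alone tend to understate what is available, because macOS can reclaim inactive and cached pages, so read this number alongside the memory pressure graph in Activity Monitor.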

🎯 Best Practices

Prefer Foundation Models when available — Better integration and performance
Implement fallback logic — Handle systems without Apple Intelligence
Use streaming — Foundation Models excel at streaming responses
Test on both — Ensure your app works with and without Foundation Models
Monitor availability — Models may be temporarily unavailable during system updates

🔗 Related Resources

Model Management — Learn about all supported models
API Reference — Complete API documentation
Configuration — Server settings
Apple Intelligence Docs — Official Apple documentation

Questions about Apple Intelligence?
Join our Discord community for help
