Architecture

Name: Osaurus
Author: Osaurus

If you're building on top of Osaurus — writing plugins, scripts, or integrations — this page is the orientation. It maps the user-facing surfaces (chat overlay, management window, HTTP API) to the components underneath, and points to the deeper pages for each layer.

The harness

Osaurus presents three entry points:

The chat overlay (⌘;) — the daily driver
The Management window (⌘ ⇧ M) — settings, agents, models, plugins, tools, memory, themes, automation
The HTTP API (on :1337) — OpenAI / Anthropic / Open Responses / Ollama / MCP

All three funnel into the same agent loop, which talks to your memory, skills/methods, and the automation surface (schedules, watchers). Inference goes out to local MLX, Apple Foundation, or any cloud provider you've connected. Tools span native plugins (v1/v2 ABI), remote MCP servers, and the Linux sandbox. Underneath everything: identity (signed requests, access keys), encrypted storage (SQLCipher), and relay (public tunnels).

How the pieces fit together

flowchart TB
    User[You]
    Chat[Chat Overlay - ⌘;]
    Mgmt[Management Window - ⌘ ⇧ M]
    HTTP[HTTP API on :1337]

    User --> Chat
    User --> Mgmt
    User --> HTTP

    subgraph harness [The Harness]
        Loop[Agent Loop]
        Mem[Memory]
        Skills[Skills and Methods]
        Auto[Schedules and Watchers]
    end

    Chat --> Loop
    Mgmt --> Loop
    HTTP --> Loop

    Loop --> Mem
    Loop --> Skills
    Auto --> Loop

    subgraph providers [Inference]
        MLX[MLX Local]
        Foundation[Apple Foundation]
        Cloud[Cloud Providers]
    end

    Loop --> MLX
    Loop --> Foundation
    Loop --> Cloud

    subgraph plugins [Tools]
        Native[Native Plugins]
        MCP[Remote MCP]
        Sandbox[Linux Sandbox]
    end

    Loop --> Native
    Loop --> MCP
    Loop --> Sandbox

    subgraph foundationLayer [Foundations]
        Identity[Identity and Access]
        Storage[Encrypted Storage]
        Relay[Relay Tunnels]
    end

    harness --> foundationLayer
    plugins --> foundationLayer

Layers

Layer	What it does	Reference
Entry points	Chat overlay (`⌘;`), Management window (`⌘ ⇧ M`), HTTP API on `:1337`	Chat, HTTP API, CLI
Harness	Tasks, Memory, Skills/Methods, Schedules/Watchers — the continuity layer	Tasks, Memory, Skills, Methods, Schedules, Watchers
Inference	MLX local models, Apple Foundation Models, cloud providers — all behind the same picker	Models, Apple Intelligence, Inference Runtime
Tools	20+ native plugins (Mail, Calendar, Browser, Git, …), remote MCP aggregation, the Linux Sandbox	Tools & Plugins, Plugin Authoring, Sandbox Internals, Remote MCP Providers
Foundations	Identity (signed requests, `osk-v1` keys), encrypted storage (SQLCipher), Public Links (public tunnels)	Identity Cryptography, Storage & Encryption, Public Links

Entry points

Chat overlay

A glass-style overlay summoned with ⌘; from anywhere on macOS. Holds zero, one, or many chat windows. Each window has its own active agent, working folder / Sandbox state, model selection, and conversation history. Multi-window mode lets you run several agents side by side.

The overlay is also where voice input lives: the microphone in the input bar, plus VAD wake-word activation and global Transcription Mode.

Management window

⌘ ⇧ M. Tabs for everything that isn't a single chat: Models, Providers, Agents, Plugins, Sandbox, Tools, Skills, Commands, Memory, Schedules, Watchers, Voice, Themes, Insights, Server, Permissions, Identity, Storage, Settings.

HTTP API

A local server on port 1337 (configurable). Speaks OpenAI Chat Completions, Anthropic Messages, Open Responses, and Ollama Chat APIs side by side, plus MCP server endpoints (/mcp/health, /mcp/tools, /mcp/call) and Osaurus-specific routes (/agents/{id}/run, /memory/ingest, /agents, /pair).

Harness

The harness is what makes Osaurus more than a thin SDK shim:

Agent Loop — every chat is an agent loop. The model writes a markdown todo list, calls tools, iterates, and ends with a verified summary or pauses to ask one critical question.
Memory — persistent on-device memory with three layers (identity, pinned facts, episodes) plus a transcript fallback. Distillation runs once per session, gated on a configured Core Model.
Skills & Methods — reusable capabilities. Skills are markdown packages of expertise; Methods are scored YAML workflows the agent saved from past runs. Both are auto-selected via RAG preflight.
Schedules & Watchers — automation. Schedules run on a clock; watchers react to file system changes via FSEvents.

Plugins, schedules, watchers, and the HTTP API all dispatch through the same agent loop — same engine, same loop tools, same intercepts. Sessions are tagged with their source (chat / plugin / http / schedule / watcher) so you can audit what spawned each conversation in the chat sidebar.

Inference

Three local options and a cloud surface, all behind the same model picker:

MLX — local transformer / SSM models, optimized for Apple Silicon via vmlx-swift-lm's BatchEngine (continuous batching, content-addressed prefix caching). Inference Runtime →
Apple Foundation Models — Apple's on-device system model (model: "foundation") on macOS 26+. Zero downloads, zero config.
Liquid Foundation Models — non-transformer architecture optimized for edge.
Cloud providers — OpenAI, Anthropic, xAI, OpenRouter, Venice, Ollama, LM Studio. API keys in macOS Keychain.

Memory and agent context persist across all of them — switching from local Gemma to Claude 4 doesn't lose what your agent has learned about you.

Tools

Two ABIs for native plugins:

v1 — tools only
v2 — full host API: HTTP routes, SQLite-backed config, web app serving, agent dispatch, inference, events

Plus remote MCP providers to aggregate tools from external MCP servers, and the Linux Sandbox (macOS 26+) for safe code execution. The sandbox itself accepts JSON-recipe plugins so users can extend an agent's capabilities without compiling anything.

Every tool — built-in, folder, sandbox, plugin, MCP-aggregated — returns the same canonical Tool Contract envelope.

Foundations

The trust layer underneath everything:

Identity — secp256k1 master key in iCloud Keychain (biometric-gated), deterministic per-agent child keys, Apple App Attest device assertion, osk-v1 access keys for external callers (scoped, expirable, revocable).
Encrypted Storage — SQLCipher across chat history, memory, methods, tool index, and plugin databases. Large attachments spilled to AES-GCM .osec blobs. Key in macOS Keychain, device-bound.
Public Links — secure WebSocket tunnels through agent.osaurus.ai per agent. The agent's cryptographic address is the routing key. No port forwarding.

These are the boundaries. See Security & Privacy for the user-facing summary, Identity Cryptography and Storage & Encryption for the specs.

Where to go next

Build a thing:

HTTP API — endpoint reference, streaming, function calling
SDK Examples — Python, JavaScript, Anthropic SDK, Open Responses
CLI — osaurus serve / mcp / tools / run
Tools & Plugins → Plugin Authoring → Tool Contract
Sandbox Internals — VM, vsock bridge, plugin recipes

Understand a piece:

Inference Runtime — BatchEngine, KV cache, model leases
Identity Cryptography — full crypto spec
Storage & Encryption — SQLCipher migration, key rotation, recovery
Developer Tools — Insights and Server Explorer in the Management window
Building from Source — clone, build, test, contribute

The harness​

How the pieces fit together​

Layers​

Entry points​

Chat overlay​

Management window​

HTTP API​

Harness​

Inference​

Tools​

Foundations​

Where to go next​