Server Settings

Osaurus is designed to work out of the box with sensible defaults. This page covers the knobs you can turn.

Environment variables

| Variable | Description | Default |
| --- | --- | --- |
| OSU_PORT | Server port number | 1337 |
| OSU_MODELS_DIR | Custom MLX models directory | ~/MLXModels |

# Persistent (shell profile)
export OSU_PORT=8080
export OSU_MODELS_DIR=/Volumes/External/MLXModels

# Or inline
OSU_PORT=8080 osaurus serve
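
A wrapper script may want to know which values are in effect before it launches anything. The sketch below mirrors the defaults from the table using standard shell parameter expansion. The helper names are hypothetical, and treating an empty variable the same as an unset one is an assumption, not documented Osaurus behavior.

```shell
# Hypothetical helpers: print the value expected to be in effect, falling back
# to the documented default when the variable is unset or empty (assumption).
osu_effective_port() {
  printf '%s\n' "${OSU_PORT:-1337}"
}

osu_effective_models_dir() {
  printf '%s\n' "${OSU_MODELS_DIR:-$HOME/MLXModels}"
}

echo "port:   $(osu_effective_port)"
echo "models: $(osu_effective_models_dir)"
```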

Server flags

osaurus serve accepts:

| Option | Description | Default |
| --- | --- | --- |
| --port, -p | Server port | 1337 |
| --expose | Bind to all interfaces (LAN access) | localhost only |

osaurus serve                       # localhost:1337
osaurus serve --port 8080           # localhost:8080
osaurus serve --expose              # 0.0.0.0:1337 (LAN)
osaurus serve --expose --port 1337  # 0.0.0.0:1337 (LAN, explicit)

LAN exposure

With --expose, anyone on your network can reach your Osaurus server. Use access keys to protect endpoints; see Identity.
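
To make the flag table concrete, here is a small sketch that reports the bind address a given set of osaurus serve flags implies. The helper osu_bind_summary is hypothetical: it only restates the table and the comments in the examples, it does not start or query a server, and it ignores OSU_PORT because the precedence between the variable and --port is not stated on this page.

```shell
# Hypothetical helper: summarize the bind address implied by `osaurus serve` flags.
osu_bind_summary() {
  host="localhost"   # default: loopback only
  port="1337"        # default port
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --expose) host="0.0.0.0" ;;
      --port|-p)
        [ "$#" -ge 2 ] || break   # flag given without a value: stop parsing
        port="$2"
        shift
        ;;
    esac
    shift
  done
  printf '%s:%s\n' "$host" "$port"
}

osu_bind_summary                        # localhost:1337
osu_bind_summary --expose --port 8080   # 0.0.0.0:8080
```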

Capabilities (auto-selection)

Tools, skills, and methods are auto-selected via RAG before each turn. Configure the search width in Management → Settings → Capabilities:

| Mode | Methods | Tools | Skills |
| --- | --- | --- | --- |
| off | 0 | 0 | 0 |
| narrow | 1 | 2 | 1 |
| balanced (default) | 3 | 5 | 2 |
| wide | 5 | 8 | 4 |

Wider modes give the agent more methods, tools, and skills to choose from, at the cost of larger system prompts.

Skills → · Methods →
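
As a rough way to compare modes, the sketch below encodes the table and adds up the entries per mode. Reading the numbers as per-category counts for each turn is an interpretation of the table, and osu_capability_counts is a hypothetical helper, not part of Osaurus.

```shell
# Hypothetical helper: print "methods tools skills" for a search-width mode,
# using the counts from the table above.
osu_capability_counts() {
  case "$1" in
    off)      echo "0 0 0" ;;
    narrow)   echo "1 2 1" ;;
    balanced) echo "3 5 2" ;;
    wide)     echo "5 8 4" ;;
    *)        echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# Sum the three columns for each mode.
for mode in off narrow balanced wide; do
  osu_capability_counts "$mode" | {
    read -r methods tools skills
    echo "$mode: $((methods + tools + skills)) capabilities"
  }
done
```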

Memory

Memory is on by default. Edit the settings below in Management → Memory or in ~/.osaurus/config/memory.json:

| Setting | Default | Description |
| --- | --- | --- |
| enabled | true | Master toggle |
| embeddingBackend | mlx | Embedding backend (mlx / none) |
| embeddingModel | nomic-embed-text-v1.5 | Embedding model used by VecturaKit |
| extractionMode | sessionEnd | When to distill (sessionEnd / manual) |
| relevanceGateMode | heuristic | Read-path gate (off / heuristic / llm) |
| memoryBudgetTokens | 800 | Per-request budget (100–4,000) |
| summaryDebounceSeconds | 60 | Inactivity before distillation (10–3,600) |
| consolidationIntervalHours | 24 | Background consolidator cadence (1–168) |
| salienceFloor | 0.2 | Eviction threshold for pinned facts (0–1) |
| episodeRetentionDays | 365 | Episode/transcript retention (0 = forever) |

Memory → · Memory Internals →
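
When editing memory.json by hand, it can help to check a value against its documented range first. The following is a minimal sketch with a hypothetical helper, osu_in_range, that uses awk so it handles both whole numbers and decimals such as salienceFloor. How Osaurus itself treats out-of-range values (clamping or rejecting) is not stated here.

```shell
# Hypothetical helper: succeed when VALUE is a non-negative number within [MIN, MAX].
# Usage: osu_in_range VALUE MIN MAX
osu_in_range() {
  awk -v v="$1" -v lo="$2" -v hi="$3" \
    'BEGIN { ok = (v ~ /^[0-9]+(\.[0-9]+)?$/) && (v + 0 >= lo + 0) && (v + 0 <= hi + 0); exit !ok }'
}

osu_in_range 800 100 4000 && echo "memoryBudgetTokens=800 ok"
osu_in_range 0.2 0 1      && echo "salienceFloor=0.2 ok"
osu_in_range 5 10 3600    || echo "summaryDebounceSeconds=5 is below the documented minimum of 10"
```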

Local inference

Settings → Local Inference → Model Management:

| Setting | Description |
| --- | --- |
| Eviction policy | Strict (One Model) keeps one model loaded (default); Flexible (Multi Model) allows concurrent models for high-RAM systems |
| Top P | Default top-p for inference (per-request override available) |
| Allowed origins | CORS origins (currently *) |

Advanced (defaults)

One advanced tunable, exposed only via defaults:

defaults write ai.osaurus ai.osaurus.scheduler.mlxBatchEngineMaxBatchSize -int 8

Default 4, clamped to [1, 32]. Higher values raise total throughput at the cost of wired-memory footprint and per-request latency. Inference Runtime details →
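
Because the value is clamped, a request outside [1, 32] ends up at the nearest bound. The sketch below reproduces that arithmetic with a hypothetical helper and, in comments, shows the standard macOS defaults subcommands for reading the stored value back or removing the override. Only the domain and key come from this page.

```shell
# Hypothetical helper: clamp a requested batch size to the documented [1, 32] range.
osu_clamp_batch_size() {
  n="$1"
  if [ "$n" -lt 1 ]; then n=1; fi
  if [ "$n" -gt 32 ]; then n=32; fi
  printf '%s\n' "$n"
}

osu_clamp_batch_size 8    # 8
osu_clamp_batch_size 64   # 32

# On macOS, read the stored value back (errors if the key was never written):
#   defaults read ai.osaurus ai.osaurus.scheduler.mlxBatchEngineMaxBatchSize
# Remove the override so the built-in default applies again:
#   defaults delete ai.osaurus ai.osaurus.scheduler.mlxBatchEngineMaxBatchSize
```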

Sandbox (macOS 26+)

The Linux sandbox is configured in Management → Sandbox → Container → Resources or by editing ~/.osaurus/config/sandbox.json:

{
  "autoStart": true,
  "cpus": 2,
  "memoryGB": 2,
  "network": "outbound"
}

| Setting | Range | Default |
| --- | --- | --- |
| autoStart | true / false | true |
| cpus | 1–8 | 2 |
| memoryGB | 1–8 | 2 |
| network | outbound / none | outbound |

Changes require a container restart. Sandbox Internals →
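
If you generate sandbox.json from a script, checking the ranges before writing the file avoids a bad edit. This is a sketch under stated assumptions: osu_sandbox_json is a hypothetical helper, autoStart is fixed to true for brevity, and the output is only printed so you can review it before redirecting it to ~/.osaurus/config/sandbox.json.

```shell
# Hypothetical helper: print a sandbox.json body after checking the documented ranges.
# Usage: osu_sandbox_json CPUS MEMORY_GB NETWORK
osu_sandbox_json() {
  sb_cpus="$1"; sb_mem="$2"; sb_net="$3"
  for sb_n in "$sb_cpus" "$sb_mem"; do
    case "$sb_n" in
      ''|*[!0-9]*) echo "cpus and memoryGB must be whole numbers" >&2; return 1 ;;
    esac
  done
  if [ "$sb_cpus" -lt 1 ] || [ "$sb_cpus" -gt 8 ]; then
    echo "cpus must be 1-8" >&2; return 1
  fi
  if [ "$sb_mem" -lt 1 ] || [ "$sb_mem" -gt 8 ]; then
    echo "memoryGB must be 1-8" >&2; return 1
  fi
  case "$sb_net" in
    outbound|none) ;;
    *) echo "network must be outbound or none" >&2; return 1 ;;
  esac
  printf '{\n  "autoStart": true,\n  "cpus": %s,\n  "memoryGB": %s,\n  "network": "%s"\n}\n' \
    "$sb_cpus" "$sb_mem" "$sb_net"
}

# Preview a 4-CPU, 8 GB configuration with networking disabled.
osu_sandbox_json 4 8 none
```

Remember that the container must be restarted before a new file takes effect.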

Storage encryption

Since 0.17.7, every Osaurus SQLite database is encrypted at rest with SQLCipher. The data-encryption key lives in your macOS Keychain. Settings → Storage is where you back up, rotate, and recover.

You don't normally configure anything — it just works. Storage & Encryption →

API path prefixes

Endpoints are available under multiple prefixes for compatibility:

  • /v1/endpoint — OpenAI style
  • /api/endpoint — generic / Ollama style
  • /v1/api/endpoint — combined

All prefixes route to the same handlers.
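
For example, the three equivalent URLs for one endpoint can be listed as follows. The helper name, the plain-HTTP localhost base URL, and the endpoint name "models" are all illustrative assumptions; substitute an endpoint from the HTTP API reference.

```shell
# Hypothetical helper: print every documented URL form for one endpoint name.
# Usage: osu_endpoint_urls ENDPOINT [BASE_URL]
osu_endpoint_urls() {
  endpoint="$1"
  base="${2:-http://localhost:${OSU_PORT:-1337}}"
  for prefix in /v1 /api /v1/api; do
    printf '%s%s/%s\n' "$base" "$prefix" "$endpoint"
  done
}

# "models" is a placeholder endpoint name, not one confirmed by this page.
osu_endpoint_urls models
```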

HTTP server limits

To prevent unauthenticated clients from exhausting host memory, Osaurus rejects oversized request bodies before the auth gate:

| Endpoint | Limit |
| --- | --- |
| POST /pair | 64 KiB |
| Other public HTTP routes | 32 MiB |
| Sandbox host bridge | 8 MiB |

Oversized requests return 413 Payload Too Large.
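
A client can check the size of a request body before sending it and avoid the round trip that ends in a 413. The sketch below converts the documented limits to bytes and compares a file against them. osu_body_fits is a hypothetical helper, and treating a body of exactly the limit as acceptable is an assumption.

```shell
# Documented body limits, in bytes.
OSU_LIMIT_PAIR=$((64 * 1024))            # POST /pair: 64 KiB
OSU_LIMIT_PUBLIC=$((32 * 1024 * 1024))   # other public HTTP routes: 32 MiB

# Hypothetical helper: succeed when FILE is no larger than LIMIT bytes.
# Usage: osu_body_fits FILE LIMIT
osu_body_fits() {
  [ -f "$1" ] || return 1
  size=$(( $(wc -c < "$1") ))   # arithmetic expansion strips wc's padding
  [ "$size" -le "$2" ]
}

payload=$(mktemp)
printf '{"example": true}' > "$payload"
if osu_body_fits "$payload" "$OSU_LIMIT_PAIR"; then
  echo "payload fits under the /pair limit"
else
  echo "payload would be rejected with 413"
fi
rm -f "$payload"
```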

Where things live

| What | Path | Override |
| --- | --- | --- |
| MLX models | ~/MLXModels/ | OSU_MODELS_DIR |
| App data root | ~/.osaurus/ | not configurable |
| Plugin install root | ~/.osaurus/Tools/<plugin_id>/<version>/ | not configurable |
| Voice models | ~/Library/Application Support/FluidAudio/Models/ | not configurable |
| Memory | ~/.osaurus/memory/memory.sqlite (encrypted) + vectura/{agent}/ | not configurable |
| Chat history | ~/.osaurus/chat-history/history.sqlite (encrypted) + blobs/*.osec | not configurable |
| Methods | ~/.osaurus/methods/methods.sqlite (encrypted) | not configurable |
| Tool index | ~/.osaurus/tool-index/tool_index.sqlite (encrypted) | not configurable |
| Schedules | ~/.osaurus/schedules/{uuid}.json | not configurable |
| Watchers | ~/.osaurus/watchers/{uuid}.json | not configurable |
| Skills | ~/.osaurus/skills/{name}/SKILL.md | not configurable |
| Themes | ~/.osaurus/themes/{uuid}.json | not configurable |
| Sandbox plugins | ~/.osaurus/sandbox-plugins/ | not configurable |
| Sandbox container | ~/.osaurus/container/ | not configurable |
| Configs | ~/.osaurus/config/*.json | edit directly |
| Encryption key | macOS Keychain (com.osaurus.storage) | see Storage |
| Identity master key | iCloud Keychain | see Identity |

Per-request configuration

Most generation behavior is set per request through API parameters:

{
  "model": "gemma-4-e2b-it-4bit",
  "messages": [{ "role": "user", "content": "Hello" }],
  "temperature": 0.7,
  "max_tokens": 1000,
  "top_p": 0.9,
  "stream": true,
  "session_id": "my-conversation"
}

HTTP API reference →
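
One way to send a body like the one above is with curl. This is a sketch only: the /v1/chat/completions route is assumed from the OpenAI-style prefix and is not confirmed on this page, osu_chat is a hypothetical helper, and a server started with --expose and protected by access keys would need credentials that this example does not supply.

```shell
# Hypothetical helper: POST a JSON request body (read from a file) to a local server.
# Assumptions: plain HTTP on localhost and an OpenAI-style /v1/chat/completions route.
osu_chat() {
  curl -sS -N \
    -H "Content-Type: application/json" \
    --data-binary "@$1" \
    "http://localhost:${OSU_PORT:-1337}/v1/chat/completions"
}

# Write the request body to a temporary file.
request=$(mktemp)
cat > "$request" <<'EOF'
{
  "model": "gemma-4-e2b-it-4bit",
  "messages": [{ "role": "user", "content": "Hello" }],
  "temperature": 0.7,
  "max_tokens": 1000,
  "top_p": 0.9,
  "stream": true,
  "session_id": "my-conversation"
}
EOF

# With a server running:
#   osu_chat "$request"
```

The -N flag turns off curl's output buffering so streamed chunks print as they arrive.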

Single-machine local-first development:

osaurus serve # default port, loopback only

LAN access for testing from a phone or another laptop:

osaurus serve --expose
# Then mint an osk-v1 access key from Identity → Access Keys

External drive for large models:

export OSU_MODELS_DIR=/Volumes/ModelsDrive/MLXModels
osaurus serve

Multiple instances (one per project, etc.):

# Terminal 1
OSU_PORT=1337 osaurus serve

# Terminal 2 (separate models dir if you want isolation)
OSU_MODELS_DIR=~/MLXModels-experimental OSU_PORT=1338 osaurus serve
