Tool Contract

Every Osaurus tool — global built-in, folder tool, sandbox tool, plugin tool, MCP-aggregated tool — returns a JSON string in exactly one of two shapes. This page is the one-stop reference for tool authors.

The type lives at Tools/ToolEnvelope.swift in the osaurus repo.

Success envelope

```json
{
  "ok": true,
  "tool": "sandbox_write_file",
  "result": { "path": "/home/agent/foo.txt", "size": 123 },
  "warnings": ["slow disk"]
}
```
| Field | Description |
| --- | --- |
| `ok` | Always `true` |
| `tool` | Optional — the tool name. Populated automatically by the helpers. |
| `result` | The tool's payload. Object, array, string, number, bool, or null. |
| `warnings` | Optional list of non-fatal notes the model should read |

text convenience

Tools whose primary output is a single human-readable string (folder tools, capability listings, search-memory hits, todo/complete/clarify) use:

```swift
return ToolEnvelope.success(tool: name, text: "Found 3 matches\n...")
```

which is sugar for result: { "text": "..." }. The chat UI's tool-call card detects this pattern and renders the text verbatim as Markdown instead of a JSON code block.
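As a sketch of what the sugar expands to (the tool name here is illustrative), the emitted envelope has this shape:

```json
{
  "ok": true,
  "tool": "sandbox_search_files",
  "result": { "text": "Found 3 matches\n..." }
}
```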

Failure envelope

```json
{
  "ok": false,
  "kind": "invalid_args",
  "message": "Missing required argument `content` (string).",
  "field": "content",
  "expected": "non-empty string of file contents",
  "tool": "sandbox_write_file",
  "retryable": true
}
```
| Field | Description |
| --- | --- |
| `ok` | Always `false` |
| `kind` | Classification — see the table below |
| `message` | Human- and model-readable explanation |
| `field` | Optional — offending argument name when `kind` is `invalid_args` |
| `expected` | Optional — what the argument should look like (example form) |
| `tool` | Optional — populated automatically |
| `retryable` | Whether a retry might succeed. Defaulted by `kind`. |

Kinds

| kind | meaning | default retryable |
| --- | --- | --- |
| `invalid_args` | argument missing, malformed, or scope-incompatible | `true` |
| `rejected` | blocked by configured policy | `false` |
| `user_denied` | user clicked Deny on an interactive approval | `false` |
| `timeout` | tool ran past its time budget | `true` |
| `execution_error` | tool ran but failed (process exited non-zero, file missing…) | `true` |
| `unavailable` | tool exists but can't run right now (sandbox booting, etc.) | `true` |
| `tool_not_found` | model called a tool the registry doesn't have | `false` |

Detection

Code paths that need to distinguish success from failure without parsing the whole envelope use:

```swift
ToolEnvelope.isError(resultString)    // true for failure envelopes + legacy prefixes
ToolEnvelope.isSuccess(resultString)  // symmetric
ToolEnvelope.successPayload(result)   // returns the `result` dict for a success
ToolEnvelope.failureMessage(result)   // returns `message` (falls back to the input)
```

These also recognise the legacy [REJECTED] / [TIMEOUT] prefixes and the legacy ToolErrorEnvelope JSON shape so partial migrations don't mis-classify.
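A minimal sketch of the caller-side pattern, assuming the helper signatures above (`resultString` and the two chat helpers are illustrative names, not part of the API):

```swift
// `resultString` is whatever the tool's execute returned.
if ToolEnvelope.isError(resultString) {
    // Covers envelope failures plus legacy [REJECTED]/[TIMEOUT] prefixes.
    let message = ToolEnvelope.failureMessage(resultString)
    showToolError(message)                       // hypothetical chat helper
} else if let payload = ToolEnvelope.successPayload(resultString) {
    // `payload` is the decoded `result` dictionary.
    renderToolCard(payload)                      // hypothetical chat helper
}
```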

Writing a tool

Use the require… helpers on OsaurusTool to build failure envelopes with the right field / expected automatically:

```swift
func execute(argumentsJSON: String) async throws -> String {
    let argsReq = requireArgumentsDictionary(argumentsJSON, tool: name)
    guard case .value(let args) = argsReq else { return argsReq.failureEnvelope ?? "" }

    let pathReq = requireString(
        args, "path",
        expected: "relative path under the agent home",
        tool: name
    )
    guard case .value(let path) = pathReq else { return pathReq.failureEnvelope ?? "" }

    // ... do work ...
    return ToolEnvelope.success(tool: name, result: ["path": path, "size": 123])
}
```

Sandbox tools have requirePath(_:home:tool:) on top that routes through SandboxPathSanitizer and turns a rejection into an invalid_args envelope with the specific reason (path traversal, dangerous character, outside allowed roots, etc.).
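For instance, a traversal attempt might yield an envelope like this (the `message` wording and tool name are illustrative, not the exact sanitizer output):

```json
{
  "ok": false,
  "kind": "invalid_args",
  "field": "path",
  "message": "Path rejected: traversal segment `..` is not allowed.",
  "expected": "relative path under the agent home",
  "tool": "sandbox_read_file",
  "retryable": true
}
```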

Thrown errors

Tool bodies that throw (folder tools, for historical reasons) have their thrown errors mapped to the envelope at the catch site via ToolEnvelope.fromError(_:tool:). That helper understands FolderToolError, ToolRegistry permission NSError codes, and any other Error (which falls through to execution_error).
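A sketch of that catch site, assuming the execute signature shown earlier (`tool` is an illustrative variable):

```swift
do {
    return try await tool.execute(argumentsJSON: argumentsJSON)
} catch {
    // FolderToolError and ToolRegistry permission NSError codes map to
    // specific kinds; anything else becomes `execution_error`.
    return ToolEnvelope.fromError(error, tool: tool.name)
}
```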

Schema

Add "additionalProperties": false to every new tool's top-level schema. SchemaValidator enforces it at ToolRegistry.execute time and emits invalid_args with field: <unknown> for the model.

Scalar types are intentionally lenient: integer, number, and boolean properties accept native JSON values and string-encoded equivalents ("15", "3.14", "true"/"yes"/"1"). array properties additionally accept a string that JSON-decodes to an array ("[\"a\",\"b\"]"). This matches the tool-side ArgumentCoercion helpers so local models that emit slightly off types don't bounce on the preflight when the body would coerce anyway. string, object, and enum checks remain strict, and array still rejects bare non-array strings so the model gets a clear signal.

Prefer:

  • enum for closed-set values (chartType, scope, language, …)
  • default declared in the schema for any default the implementation uses
  • Concrete examples in description strings
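Putting those recommendations together, a top-level schema might look like this (property names and descriptions are illustrative):

```json
{
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "path": {
      "type": "string",
      "description": "Relative path under the agent home, e.g. \"notes/todo.md\"."
    },
    "scope": {
      "type": "string",
      "enum": ["file", "directory"],
      "default": "file",
      "description": "What to operate on. Defaults to \"file\"."
    }
  },
  "required": ["path"]
}
```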

Special-case markers (artifact, chart)

share_artifact and render_chart carry marker-delimited blobs (---SHARED_ARTIFACT_START--- / ---CHART_START---) because the chat UI is tightly coupled to those parsers. The markers ride inside the envelope's result.text string — downstream parsers extract text from the envelope first, then scan for markers. Prefer not to add new marker-based flows; treat them as legacy.

share_artifact failure envelopes

The chat-layer wrapper differentiates four failure modes for share_artifact so the model can self-correct on the next turn instead of retrying the same path. Each maps to a specific ToolEnvelope.failure shape:

  • Path rejected (pathRejected) → kind: invalid_args, field: "path", message names the trusted root and suggests sandbox_search_files.
  • File not found (fileNotFound) → kind: execution_error, message enumerates every candidate path the resolver tried (e.g. <home>/foo.png, <home>/output/foo.png, <home>/dist/foo.png, …) so the model knows exactly where to look next.
  • Copy failed (copyFailed) → kind: execution_error, message carries the FS error string (disk full, perms) plus the source path.
  • Filename rejected (destinationRejected) → kind: invalid_args, field: "filename", asks for a plain basename.
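As one illustrative example (the message wording here is approximate, not the exact resolver output), the fileNotFound case produces an envelope like:

```json
{
  "ok": false,
  "kind": "execution_error",
  "message": "File not found. Tried: <home>/foo.png, <home>/output/foo.png, <home>/dist/foo.png.",
  "tool": "share_artifact",
  "retryable": true
}
```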

Empty-string filler in optional fields (content: "", filename: "") is treated as absent on entry — many models pass empty placeholders for unused fields, and rejecting that as invalid_args was a footgun.

sandbox_exec background flag

Foreground (default): returns {stdout, stderr, exit_code, cwd} when the command finishes (capped by timeout, max 300s). Pass background:true to spawn a detached process — the tool returns {pid, log_file, cwd, background:true} as soon as the spawn shim returns. Manage the resulting job through sandbox_process (poll/wait/kill).
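For the background case, the success envelope looks roughly like this (the pid and log_file path are made-up example values):

```json
{
  "ok": true,
  "tool": "sandbox_exec",
  "result": { "pid": 4242, "log_file": "/home/agent/.jobs/4242.log", "cwd": "/home/agent", "background": true }
}
```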

Loop tools

Three special tools drive the inline UI for the Tasks experience. The chat layer intercepts their results and renders the live to-do list, "Completed" banner, and clarifying-question prompt. They're available in every chat.

todo

Publishes or updates the plan as a markdown checklist. The list lives in the chat and ticks off as the agent works. Each call replaces the whole list, so the agent can rewrite the plan as it learns more.

complete

Ends the loop with a summary of what was done and how it was verified. Becomes a "Completed" banner in the chat. Placeholder summaries (done, ok, looks good) are rejected by the validator so the agent can't fake completion — partial completions must be honest about what was and wasn't done.

clarify

Pauses the loop and asks one critical question. Optional one-tap answer chips (options[]) let the user answer with a single click. The validator only accepts clarify calls when the question genuinely changes the outcome — agents can't use it for cosmetic preferences mid-task.

Result envelopes

All three return their payload via the text convenience (a single human-readable string). The chat-layer parsers consume the envelope's result.text to render the corresponding UI element.
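For example, a todo call's envelope might look like this (the checklist content is illustrative):

```json
{
  "ok": true,
  "tool": "todo",
  "result": { "text": "- [x] Read the config\n- [ ] Write the migration\n- [ ] Verify the tests pass" }
}
```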

