Tool Contract

Every Osaurus tool — global built-in, folder tool, sandbox tool, plugin tool, MCP-aggregated tool — returns a JSON string in exactly one of two shapes. This page is the one-stop reference for tool authors.

The type lives at Tools/ToolEnvelope.swift in the osaurus repo.

Success envelope

```json
{
  "ok": true,
  "tool": "sandbox_write_file",
  "result": { "path": "/home/agent/foo.txt", "size": 123 },
  "warnings": ["slow disk"]
}
```
| Field | Description |
| --- | --- |
| `ok` | Always `true` |
| `tool` | Optional — the tool name. Populated automatically by the helpers. |
| `result` | The tool's payload. Object, array, string, number, bool, or null. |
| `warnings` | Optional list of non-fatal notes the model should read |

text convenience

Tools whose primary output is a single human-readable string (folder tools, capability listings, search-memory hits, todo/complete/clarify) use:

```swift
return ToolEnvelope.success(tool: name, text: "Found 3 matches\n...")
```

which is sugar for result: { "text": "..." }. The chat UI's tool-call card detects this pattern and renders the text verbatim as Markdown instead of a JSON code block.
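As a sketch of what the sugar expands to (the tool name here is illustrative), the emitted envelope has this shape:

```json
{
  "ok": true,
  "tool": "sandbox_search_files",
  "result": { "text": "Found 3 matches\n..." }
}
```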

Failure envelope

```json
{
  "ok": false,
  "kind": "invalid_args",
  "message": "Missing required argument `content` (string).",
  "field": "content",
  "expected": "non-empty string of file contents",
  "tool": "sandbox_write_file",
  "retryable": true
}
```
| Field | Description |
| --- | --- |
| `ok` | Always `false` |
| `kind` | Classification — see the table below |
| `message` | Human- and model-readable explanation |
| `field` | Optional — offending argument name when `kind` is `invalid_args` |
| `expected` | Optional — what the argument should look like (example form) |
| `tool` | Optional — populated automatically |
| `retryable` | Whether a retry might succeed. Defaulted by `kind`. |

Kinds

| kind | meaning | default retryable |
| --- | --- | --- |
| `invalid_args` | argument missing, malformed, or scope-incompatible | `true` |
| `rejected` | blocked by configured policy | `false` |
| `user_denied` | user clicked Deny on an interactive approval | `false` |
| `timeout` | tool ran past its time budget | `true` |
| `execution_error` | tool ran but failed (process exited non-zero, file missing…) | `true` |
| `unavailable` | tool exists but can't run right now (sandbox booting, etc.) | `true` |
| `tool_not_found` | model called a tool the registry doesn't have | `false` |

Detection

Code paths that need to distinguish success from failure without parsing the whole envelope use:

```swift
ToolEnvelope.isError(resultString)    // true for failure envelopes + legacy prefixes
ToolEnvelope.isSuccess(resultString)  // symmetric
ToolEnvelope.successPayload(result)   // returns the `result` dict for a success
ToolEnvelope.failureMessage(result)   // returns `message` (falls back to the input)
```

These also recognise the legacy [REJECTED] / [TIMEOUT] prefixes and the legacy ToolErrorEnvelope JSON shape so partial migrations don't mis-classify.
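A minimal sketch of the caller-side pattern, assuming the helper signatures above (`resultString` and the two chat helpers are illustrative names, not part of the API):

```swift
// `resultString` is whatever the tool's execute returned.
if ToolEnvelope.isError(resultString) {
    // Covers envelope failures plus legacy [REJECTED]/[TIMEOUT] prefixes.
    let message = ToolEnvelope.failureMessage(resultString)
    showToolError(message)                       // hypothetical chat helper
} else if let payload = ToolEnvelope.successPayload(resultString) {
    // `payload` is the decoded `result` dictionary.
    renderToolCard(payload)                      // hypothetical chat helper
}
```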

Writing a tool

Use the require… helpers on OsaurusTool to build failure envelopes with the right field / expected automatically:

```swift
func execute(argumentsJSON: String) async throws -> String {
    let argsReq = requireArgumentsDictionary(argumentsJSON, tool: name)
    guard case .value(let args) = argsReq else { return argsReq.failureEnvelope ?? "" }

    let pathReq = requireString(
        args, "path",
        expected: "relative path under the agent home",
        tool: name
    )
    guard case .value(let path) = pathReq else { return pathReq.failureEnvelope ?? "" }

    // ... do work ...
    return ToolEnvelope.success(tool: name, result: ["path": path, "size": 123])
}
```

Sandbox tools have requirePath(_:home:tool:) on top that routes through SandboxPathSanitizer and turns a rejection into an invalid_args envelope with the specific reason (path traversal, dangerous character, outside allowed roots, etc.).
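For instance, a traversal attempt might yield an envelope like this (the `message` wording and tool name are illustrative, not the exact sanitizer output):

```json
{
  "ok": false,
  "kind": "invalid_args",
  "field": "path",
  "message": "Path rejected: traversal segment `..` is not allowed.",
  "expected": "relative path under the agent home",
  "tool": "sandbox_read_file",
  "retryable": true
}
```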

Thrown errors

Tool bodies that throw (folder tools, for historical reasons) have their thrown errors mapped to the envelope at the catch site via ToolEnvelope.fromError(_:tool:). That helper understands FolderToolError, ToolRegistry permission NSError codes, and any other Error (which falls through to execution_error).
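A sketch of that catch site, assuming the execute signature shown earlier (`tool` is an illustrative variable):

```swift
do {
    return try await tool.execute(argumentsJSON: argumentsJSON)
} catch {
    // FolderToolError and ToolRegistry permission NSError codes map to
    // specific kinds; anything else becomes `execution_error`.
    return ToolEnvelope.fromError(error, tool: tool.name)
}
```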

Schema

Add "additionalProperties": false to every new tool's top-level schema. SchemaValidator enforces it at ToolRegistry.execute time and emits invalid_args with field: <unknown> for the model.

Scalar types are intentionally lenient: integer, number, and boolean properties accept native JSON values and string-encoded equivalents ("15", "3.14", "true"/"yes"/"1"). array properties additionally accept a string that JSON-decodes to an array ("[\"a\",\"b\"]"). This matches the tool-side ArgumentCoercion helpers so local models that emit slightly off types don't bounce on the preflight when the body would coerce anyway. string, object, and enum checks remain strict, and array still rejects bare non-array strings so the model gets a clear signal.

Prefer:

  • enum for closed-set values (chartType, scope, language, …)
  • default declared in the schema for any default the implementation uses
  • Concrete examples in description strings
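Putting those recommendations together, a top-level schema might look like this (property names and descriptions are illustrative):

```json
{
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "path": {
      "type": "string",
      "description": "Relative path under the agent home, e.g. \"notes/todo.md\"."
    },
    "scope": {
      "type": "string",
      "enum": ["file", "directory"],
      "default": "file",
      "description": "What to operate on. Defaults to \"file\"."
    }
  },
  "required": ["path"]
}
```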

Special-case markers (artifact, chart)

share_artifact and render_chart carry marker-delimited blobs (---SHARED_ARTIFACT_START--- / ---CHART_START---) because the chat UI is tightly coupled to those parsers. The markers ride inside the envelope's result.text string — downstream parsers extract text from the envelope first, then scan for markers. Prefer not to add new marker-based flows; treat them as legacy.

share_artifact failure envelopes

The chat-layer wrapper differentiates four failure modes for share_artifact so the model can self-correct on the next turn instead of retrying the same path. Each maps to a specific ToolEnvelope.failure shape:

  • Path rejected (pathRejected) → kind: invalid_args, field: "path", message names the trusted root and suggests sandbox_search_files.
  • File not found (fileNotFound) → kind: execution_error, message enumerates every candidate path the resolver tried (e.g. <home>/foo.png, <home>/output/foo.png, <home>/dist/foo.png, …) so the model knows exactly where to look next.
  • Copy failed (copyFailed) → kind: execution_error, message carries the FS error string (disk full, perms) plus the source path.
  • Filename rejected (destinationRejected) → kind: invalid_args, field: "filename", asks for a plain basename.
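As one illustrative example (the message wording here is approximate, not the exact resolver output), the fileNotFound case produces an envelope like:

```json
{
  "ok": false,
  "kind": "execution_error",
  "message": "File not found. Tried: <home>/foo.png, <home>/output/foo.png, <home>/dist/foo.png.",
  "tool": "share_artifact",
  "retryable": true
}
```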

Empty-string filler in optional fields (content: "", filename: "") is treated as absent on entry — many models pass empty placeholders for unused fields, and rejecting that as invalid_args was a footgun.

sandbox_exec background flag

Foreground (default): returns {stdout, stderr, exit_code, cwd} when the command finishes (capped by timeout, max 300s). Pass background:true to spawn a detached process — the tool returns {pid, log_file, cwd, background:true} as soon as the spawn shim returns. Manage the resulting job through sandbox_process (poll/wait/kill).
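For the background case, the success envelope looks roughly like this (the pid and log_file path are made-up example values):

```json
{
  "ok": true,
  "tool": "sandbox_exec",
  "result": { "pid": 4242, "log_file": "/home/agent/.jobs/4242.log", "cwd": "/home/agent", "background": true }
}
```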

Loop tools

Three special tools drive the inline UI for the Tasks experience. The chat layer intercepts their results and renders the live to-do list, "Completed" banner, and clarifying-question prompt. They're available in every chat.

todo

Publishes or updates the plan as a markdown checklist. The list lives in the chat and ticks off as the agent works. Each call replaces the whole list, so the agent can rewrite the plan as it learns more.

complete

Ends the loop with a summary of what was done and how it was verified. Becomes a "Completed" banner in the chat. Placeholder summaries (done, ok, looks good) are rejected by the validator so the agent can't fake completion — partial completions must be honest about what was and wasn't done.

clarify

Pauses the loop and asks one critical question. Optional one-tap answer chips (options[]) let the user answer with a single click. The validator only accepts clarify calls when the question genuinely changes the outcome — agents can't use it for cosmetic preferences mid-task.

Result envelopes

All three return their payload via the text convenience (a single human-readable string). The chat-layer parsers consume the envelope's result.text to render the corresponding UI element.
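For example, a todo call's envelope might look like this (the checklist content is illustrative):

```json
{
  "ok": true,
  "tool": "todo",
  "result": { "text": "- [x] Read the config\n- [ ] Write the migration\n- [ ] Verify the tests pass" }
}
```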

