Tasks

This is the part that makes Osaurus more than a chat box. When you ask the AI to do something — not just explain something — it doesn't reply with a long paragraph and stop. It writes a plan, calls the tools it needs, runs them, surfaces the results, and finishes with a verified summary.

What it looks like

When you give the agent a real task, here's what you'll see:

  • A live to-do list appears in the chat and ticks off as it works
  • Tool calls show up inline — the agent reads files, searches the web, runs a command, calls one of your plugins
  • Generated files (images, charts, reports, code) appear as artifact cards you can click, copy, or save
  • A "Completed" summary at the end with what was done and how it was verified
  • The agent only pauses to ask when a question genuinely changes the outcome — otherwise it runs straight through

Every chat in Osaurus has this capability built in. The same chat window handles a quick question or a multi-step task — there are no modes to switch.

The loop in one glance

┌──────────────┐     ┌──────────────┐     ┌──────────────────────┐
│  user input  │ ──▶ │ agent thinks │ ──▶ │ tool calls + replies │
└──────────────┘     └──────────────┘     └──────────────────────┘
                            ▲                        │
                            │                        │
                            └──── todo / clarify ────┘

complete(summary) ──▶ loop ends

Three special tools drive that experience: a "todo" tool publishes the live checklist, a "clarify" tool pauses to ask one critical question, and a "complete" tool ends the run with a verified summary. You don't configure any of this — it just happens. (For the formal schemas, see Tool Contract → Loop tools.)
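The control flow described above can be approximated with a small simulation. This is only a sketch under stated assumptions: the `Run` class, the `run_loop` function, and the scripted `steps` list are hypothetical stand-ins (a real agent decides each step itself), and only the tool names "todo", "clarify", and "complete" come from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Run:
    """Hypothetical record of what one run surfaced in the chat."""
    todos: List[str] = field(default_factory=list)
    questions: List[str] = field(default_factory=list)
    summary: Optional[str] = None


def run_loop(steps):
    """Replay scripted (tool, payload) steps until "complete" is called."""
    run = Run()
    for tool, payload in steps:
        if tool == "todo":
            run.todos = list(payload)        # publish or replace the checklist
        elif tool == "clarify":
            run.questions.append(payload)    # pause to ask one question
        elif tool == "complete":
            run.summary = payload            # record the summary and stop
            break
        # Any other tool (file_read, web search, ...) would execute here
        # and feed its result back to the model; omitted in this sketch.
    return run


run = run_loop([
    ("todo", ["read config", "apply fix"]),
    ("file_read", "config.json"),
    ("complete", "Applied fix; verified by re-reading config.json"),
    ("todo", ["never reached"]),
])
print(run.todos, run.summary)
```

The point of the sketch is that "complete" is the only exit: anything scripted after it is never executed.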

Power-ups: working folder and Sandbox

By default, the agent has a strong general tool kit selected automatically based on your message — web search, fetch, your installed plugins. Two toggles on the chat input bar give it more:

| Power-up | What it adds | When to use |
| --- | --- | --- |
| Working folder | Scoped file/search/git tools for one folder | Editing code in a real repo, reorganizing a directory, summarizing a project |
| Sandbox (macOS 26+) | Shell access in an isolated Linux VM | Running scripts, installing packages, scraping URLs, building/testing |

Pick one or the other — they're mutually exclusive per chat.

Pick a working folder

Click the folder icon next to the input bar and pick a folder. The agent loads the folder's tree, manifest, and git status, and gets file tools scoped to just that folder:

| Tool | What it does |
| --- | --- |
| file_tree | Show the folder structure (skipping obvious noise such as node_modules) |
| file_read | Read a file (line ranges supported) |
| file_write | Create or overwrite a file |
| file_edit | Make a precise edit to part of a file |
| file_search | Fast text search across the folder |
| shell_run | Run a shell command for builds, installs, and mv/cp/rm/mkdir (asks before running) |
| git_status / git_diff / git_commit | Available when the folder is a git repo. git_commit asks before running. |
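To make the "skipping the obvious noise" behavior of file_tree concrete, here is a minimal sketch of a directory walk that prunes noise folders. It is not Osaurus's implementation: the `NOISE_DIRS` set (beyond node_modules, which the text mentions), the `file_tree` function signature, and the `demo` helper are all assumptions for illustration.

```python
import os
import tempfile

# Assumed skip list; only node_modules is named in the documentation.
NOISE_DIRS = {"node_modules", ".git", "__pycache__"}


def file_tree(root):
    """Return relative file paths under root, skipping noise directories."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from descending into them.
        dirnames[:] = sorted(d for d in dirnames if d not in NOISE_DIRS)
        for name in sorted(filenames):
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            paths.append(rel.replace(os.sep, "/"))
    return paths


def demo():
    """Build a throwaway folder and list it with file_tree."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "src"))
        os.makedirs(os.path.join(root, "node_modules", "pkg"))
        for rel in ("README.md", "src/main.py", "node_modules/pkg/index.js"):
            open(os.path.join(root, *rel.split("/")), "w").close()
        return file_tree(root)


print(demo())
```

Pruning at the directory level, rather than filtering paths afterwards, means large dependency folders are never traversed at all.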

Osaurus remembers your folder choice across launches via macOS's security-scoped bookmarks. The project's language (Swift, Node, Python, Rust, Go) is auto-detected from manifests; project-level guidance files (AGENTS.md, CLAUDE.md, .cursorrules) are loaded automatically. Paths the agent uses must stay strictly under the folder — anything outside is rejected before execution.
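The "strictly under the folder" rule can be illustrated with a small containment check. This is a sketch of the general technique, not the check Osaurus performs: `resolve_in_folder` and the example paths are hypothetical.

```python
from pathlib import Path


def resolve_in_folder(root, requested):
    """Return the resolved path if it lies strictly under root, else raise."""
    root_path = Path(root).resolve()
    # Joining with an absolute `requested` discards root_path, and resolve()
    # collapses ".." segments and symlinks, so escapes are caught below.
    candidate = (root_path / requested).resolve()
    if root_path not in candidate.parents:
        raise PermissionError(f"{requested!r} is outside the working folder")
    return candidate


print(resolve_in_folder("/tmp/project", "src/app.py"))

for bad in ("../secrets.txt", "/etc/passwd", "src/../../x"):
    try:
        resolve_in_folder("/tmp/project", bad)
    except PermissionError as err:
        print("rejected:", err)
```

Resolving before comparing is the important design choice: a plain string-prefix test would accept paths such as `src/../../x`.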

Every write/exec/git-mutating call is logged so you can review or undo individual operations.

Toggle the Sandbox (macOS 26+)

Toggle Sandbox on the input bar to give the agent shell access in an isolated Linux VM (Apple Containerization framework, Alpine Linux). Each agent gets its own Linux user with its own home directory.

What's available inside:

  • Full POSIX userland: shell, coreutils, find, grep, sed, awk, tar
  • Python (pip), Node.js (npm), system packages (apk)
  • Compilers and build tools as needed
  • Per-agent home at /workspace/agents/{name}/ (mounted from your Mac)

Read-only sandbox tools are always available. Write, exec, install, and secret tools require autonomous_exec to be enabled on the agent. See Sandbox Internals for details.
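The gating rule reads naturally as a small permission check. The sketch below assumes each sandbox tool belongs to one category named after the prose ("read", "write", "exec", "install", "secret"); the `Agent` class and `is_allowed` function are hypothetical, and only the `autonomous_exec` flag name comes from the documentation.

```python
from dataclasses import dataclass

# Assumed category names, taken from the wording of the rule above.
GATED_CATEGORIES = {"write", "exec", "install", "secret"}


@dataclass
class Agent:
    """Hypothetical agent settings; only autonomous_exec is documented."""
    name: str
    autonomous_exec: bool = False


def is_allowed(agent, tool_category):
    """Apply the rule: read is always allowed, gated categories need the flag."""
    if tool_category == "read":
        return True
    if tool_category in GATED_CATEGORIES:
        return agent.autonomous_exec
    return False  # unknown categories are denied in this sketch


cautious = Agent("researcher")
trusted = Agent("builder", autonomous_exec=True)
print(is_allowed(cautious, "read"), is_allowed(cautious, "exec"))
print(is_allowed(trusted, "install"))
```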

Sharing artifacts

If the agent generates a file (image, chart, website, report, code), it surfaces it in the chat as an artifact card. You do not see arbitrary files the agent writes to disk or to the sandbox; the artifact card is how a result reaches the chat thread.

Artifacts are persisted under ~/.osaurus/artifacts/{session}/ and rendered inline.
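If you want to inspect persisted artifacts from a script, a sketch like the following may help. It assumes that `{session}` is a plain directory name and that artifacts are stored as regular files directly inside it; neither detail is confirmed here, and the helper names and the example session value are hypothetical.

```python
from pathlib import Path


def artifact_dir(session, home=None):
    """Build ~/.osaurus/artifacts/{session}/ (home is overridable for testing)."""
    base = Path(home) if home is not None else Path.home()
    return base / ".osaurus" / "artifacts" / session


def list_artifacts(session, home=None):
    """Return sorted file names for a session, or [] if the folder is missing."""
    folder = artifact_dir(session, home)
    if not folder.is_dir():
        return []
    return sorted(p.name for p in folder.iterdir() if p.is_file())


# "example-session" is a placeholder, not a real session identifier.
print(artifact_dir("example-session"))
print(list_artifacts("example-session"))
```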

Where each mode shines

| You want to… | Mode |
| --- | --- |
| Ask a question, summarize, brainstorm | Plain (no folder, no sandbox) |
| Edit code in a real repo | Working folder |
| Run a script, scrape a URL, install a package, build/test | Sandbox |
| Refactor across many files, then run tests | Working folder + delegate execution to your local tooling |

Best practices

  • Be specific. "Add a logout button to the navbar" beats "update the UI".
  • Pick the right power-up. Working folder for code in a real repo. Sandbox for "run this", "scrape that", "install this". Neither for plain Q&A.
  • Trust the live checklist. Watch it as the agent works — you'll catch anything heading the wrong direction early.
  • Trust the "Completed" summary. If the task is partial, the agent will say so honestly — vague summaries like "done" or "looks good" are rejected.
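How Osaurus decides that a summary is too vague is not described here. Purely as an illustration of the idea, a naive heuristic could reject known filler phrases and very short summaries. The `VAGUE_PHRASES` set, the `min_words` threshold, and the `accept_summary` function are all assumptions.

```python
# Assumed filler phrases; "done" and "looks good" are the documented examples.
VAGUE_PHRASES = {"done", "looks good", "ok", "finished"}


def accept_summary(summary, min_words=5):
    """Naive check: reject filler phrases and summaries under min_words."""
    text = summary.strip().lower().rstrip(".!")
    if text in VAGUE_PHRASES:
        return False
    return len(text.split()) >= min_words


print(accept_summary("Looks good!"))
print(accept_summary("Added logout button to navbar; verified with npm test"))
```

A real check would more likely look for evidence of what was done and how it was verified, which a word count cannot capture.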

Plugins, schedules, watchers, and the HTTP API all run tasks through this same loop. See Plugin Authoring, Schedules, Watchers, and HTTP API.

Related:

  • Sandbox Internals: VM, plugin recipes, host bridge, security
  • Tools & Plugins: what tools exist and how they're built
  • Tool Contract: the success/failure envelope every tool returns; full loop-tool schemas
  • Agents: autonomous_exec flag and per-agent settings