API Reference

Osaurus provides OpenAI-compatible, Anthropic-compatible, Ollama-compatible, and MCP APIs for seamless integration with existing tools and AI agents.

Base URL

http://127.0.0.1:1337

Override the port with the OSU_PORT environment variable.

All endpoints support common API prefixes for compatibility:

  • /v1/endpoint — OpenAI style
  • /api/endpoint — Generic style
  • /v1/api/endpoint — Combined style
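
For example, the model list should be reachable under each prefix. A quick sketch using only the Python standard library (it assumes /models is served under all three prefixes, per the list above):

import urllib.request

# Each prefix should resolve to the same /models handler
for prefix in ("/v1", "/api", "/v1/api"):
    url = f"http://127.0.0.1:1337{prefix}/models"
    with urllib.request.urlopen(url) as resp:
        print(url, "->", resp.status)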

Endpoints Overview

Core API

Endpoint              Method  Description
/                     GET     Server status (plain text)
/health               GET     Health check (JSON)
/v1/models            GET     List available models (OpenAI)
/api/tags             GET     List available models (Ollama)
/v1/chat/completions  POST    Chat completion (OpenAI)
/messages             POST    Chat completion (Anthropic)
/api/chat             POST    Chat completion (Ollama)

MCP Endpoints

Endpoint     Method  Description
/mcp/health  GET     MCP server health
/mcp/tools   GET     List available tools
/mcp/call    POST    Execute a tool

Core Endpoints

GET /

Simple status check returning plain text.

Response:

Osaurus is running

GET /health

Health check endpoint returning JSON status.

Response:

{
  "status": "ok",
  "timestamp": "2024-03-15T10:30:45Z"
}
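
A minimal readiness check before sending requests (a sketch using only the Python standard library):

import json
import urllib.request

# Confirm the server reports "ok" before issuing completions
with urllib.request.urlopen("http://127.0.0.1:1337/health") as resp:
    health = json.load(resp)

assert health["status"] == "ok"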

GET /v1/models

List all available models in OpenAI format.

Response:

{
  "object": "list",
  "data": [
    {
      "id": "llama-3.2-3b-instruct-4bit",
      "object": "model",
      "created": 1234567890,
      "owned_by": "osaurus"
    },
    {
      "id": "foundation",
      "object": "model",
      "created": 1234567890,
      "owned_by": "apple"
    }
  ]
}
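
Because the response follows the OpenAI schema, the OpenAI SDK can list models directly (a sketch; client setup matches the Quick Examples below):

from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:1337/v1", api_key="osaurus")

# Iterate the model list shown above
for model in client.models.list():
    print(model.id, model.owned_by)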

GET /api/tags

List all available models in Ollama format.

Response:

{
  "models": [
    {
      "name": "llama-3.2-3b-instruct-4bit",
      "size": 2147483648,
      "digest": "sha256:abcd1234...",
      "modified_at": "2024-03-15T10:30:45Z"
    }
  ]
}
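
The same list can be fetched in Ollama format with the standard library (a sketch):

import json
import urllib.request

with urllib.request.urlopen("http://127.0.0.1:1337/api/tags") as resp:
    tags = json.load(resp)

for m in tags["models"]:
    print(m["name"], m["size"])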

POST /v1/chat/completions

Create a chat completion using OpenAI format.

Request Body:

{
  "model": "llama-3.2-3b-instruct-4bit",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Hello, how are you?"
    }
  ],
  "max_tokens": 1000,
  "temperature": 0.7,
  "top_p": 0.9,
  "stream": false,
  "tools": []
}

Parameters:

Parameter    Type           Required  Description
model        string         Yes       Model ID to use
messages     array          Yes       Array of message objects
max_tokens   integer        No        Maximum tokens to generate (default: 2048)
temperature  float          No        Sampling temperature, 0-2 (default: 0.7)
top_p        float          No        Nucleus sampling threshold (default: 0.9)
stream       boolean        No        Enable SSE streaming (default: false)
tools        array          No        Function/tool definitions
tool_choice  string/object  No        Tool selection strategy

Response (Non-streaming):

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "llama-3.2-3b-instruct-4bit",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I'm doing well, thank you! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 15,
    "total_tokens": 40
  }
}

Response (Streaming):

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"I'm"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" doing"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
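
A sketch of consuming the stream with the OpenAI SDK (client setup as in the Quick Examples below):

from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:1337/v1", api_key="osaurus")

stream = client.chat.completions.create(
    model="llama-3.2-3b-instruct-4bit",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    stream=True,
)

# Each chunk carries a delta; content is empty on the role and stop chunks
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
print()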

POST /api/chat

Create a chat completion using Ollama format.

Request Body:

{
  "model": "llama-3.2-3b-instruct-4bit",
  "messages": [
    {
      "role": "user",
      "content": "Why is the sky blue?"
    }
  ],
  "stream": false,
  "options": {
    "temperature": 0.7,
    "top_p": 0.9,
    "num_predict": 1000
  }
}

Response:

{
  "model": "llama-3.2-3b-instruct-4bit",
  "created_at": "2024-03-15T10:30:45Z",
  "message": {
    "role": "assistant",
    "content": "The sky appears blue due to Rayleigh scattering..."
  },
  "done": true,
  "total_duration": 1234567890,
  "eval_count": 85
}
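
Because the endpoint follows the Ollama API shape, the ollama Python client can be pointed at it (a sketch; assumes the package's standard client semantics):

from ollama import Client

# Point the Ollama client at Osaurus instead of a local Ollama daemon
client = Client(host="http://127.0.0.1:1337")

response = client.chat(
    model="llama-3.2-3b-instruct-4bit",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response["message"]["content"])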

POST /messages

Create a chat completion using Anthropic format. This endpoint is compatible with the Anthropic Messages API.

Request Body:

{
  "model": "llama-3.2-3b-instruct-4bit",
  "max_tokens": 1024,
  "messages": [
    {
      "role": "user",
      "content": "Hello, how are you?"
    }
  ],
  "system": "You are a helpful assistant.",
  "stream": false
}

Parameters:

Parameter       Type     Required  Description
model           string   Yes       Model ID to use
messages        array    Yes       Array of message objects
max_tokens      integer  Yes       Maximum tokens to generate
system          string   No        System prompt (Anthropic style)
temperature     float    No        Sampling temperature, 0-1 (default: 1.0)
top_p           float    No        Nucleus sampling threshold
top_k           integer  No        Top-k sampling
stream          boolean  No        Enable SSE streaming (default: false)
stop_sequences  array    No        Sequences that stop generation

Response (Non-streaming):

{
  "id": "msg_123",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "I'm doing well, thank you! How can I help you today?"
    }
  ],
  "model": "llama-3.2-3b-instruct-4bit",
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 25,
    "output_tokens": 15
  }
}

Response (Streaming):

When stream: true, responses are sent as Server-Sent Events:

event: message_start
data: {"type":"message_start","message":{"id":"msg_123","type":"message","role":"assistant","content":[],"model":"llama-3.2-3b-instruct-4bit"}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"I'm"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" doing"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":15}}

event: message_stop
data: {"type":"message_stop"}

Example with Python (Anthropic SDK):

import anthropic

client = anthropic.Anthropic(
    base_url="http://127.0.0.1:1337",
    api_key="osaurus"  # Any value works
)

message = client.messages.create(
    model="llama-3.2-3b-instruct-4bit",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)

print(message.content[0].text)
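
For streaming, the same client can consume the SSE events above through the SDK's streaming helper (a sketch reusing the client from the previous example; assumes the events match what the Anthropic SDK expects):

with client.messages.stream(
    model="llama-3.2-3b-instruct-4bit",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
) as stream:
    # text_stream yields the text_delta fragments shown above
    for text in stream.text_stream:
        print(text, end="", flush=True)
print()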

Example with cURL:

curl http://127.0.0.1:1337/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: osaurus" \
  -d '{
    "model": "llama-3.2-3b-instruct-4bit",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

MCP Endpoints

GET /mcp/health

Check MCP server availability.

Response:

{
  "status": "ok",
  "tools_available": 12
}

GET /mcp/tools

List all available MCP tools from installed plugins.

Response:

{
  "tools": [
    {
      "name": "read_file",
      "description": "Read contents of a file",
      "inputSchema": {
        "type": "object",
        "properties": {
          "path": {
            "type": "string",
            "description": "Path to the file"
          }
        },
        "required": ["path"]
      }
    },
    {
      "name": "browser_navigate",
      "description": "Navigate to a URL in the browser",
      "inputSchema": {
        "type": "object",
        "properties": {
          "url": {
            "type": "string",
            "description": "URL to navigate to"
          }
        },
        "required": ["url"]
      }
    }
  ]
}
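
A sketch of fetching the tool list with the Python standard library:

import json
import urllib.request

with urllib.request.urlopen("http://127.0.0.1:1337/mcp/tools") as resp:
    tools = json.load(resp)

for tool in tools["tools"]:
    print(tool["name"], "-", tool["description"])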

POST /mcp/call

Execute an MCP tool.

Request Body:

{
  "name": "read_file",
  "arguments": {
    "path": "/etc/hosts"
  }
}

Response:

{
  "content": [
    {
      "type": "text",
      "text": "# Host Database\n127.0.0.1 localhost\n..."
    }
  ]
}

Error Response:

{
  "error": {
    "code": "tool_not_found",
    "message": "Tool 'unknown_tool' not found"
  }
}
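
A sketch of calling a tool and handling the error shape above (it assumes errors are returned with a non-2xx HTTP status):

import json
import urllib.error
import urllib.request

payload = {"name": "read_file", "arguments": {"path": "/etc/hosts"}}
req = urllib.request.Request(
    "http://127.0.0.1:1337/mcp/call",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
try:
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    print(result["content"][0]["text"])
except urllib.error.HTTPError as e:
    # Error bodies follow the shape shown above
    print(json.load(e)["error"]["message"])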

Function Calling

Osaurus supports OpenAI-style function calling for structured interactions.

Defining Tools

{
  "model": "llama-3.2-3b-instruct-4bit",
  "messages": [
    {"role": "user", "content": "What's the weather in San Francisco?"}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather in a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The city and state"
            },
            "unit": {
              "type": "string",
              "enum": ["celsius", "fahrenheit"]
            }
          },
          "required": ["location"]
        }
      }
    }
  ]
}

Response with Tool Call

{
  "id": "chatcmpl-123",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"location\":\"San Francisco, CA\",\"unit\":\"fahrenheit\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}
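
A sketch of the full round trip with the OpenAI SDK: the model requests a call, the client runs the function locally, and the result goes back under the tool role. The get_weather function and its placeholder result are illustrative, not built into Osaurus:

import json
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:1337/v1", api_key="osaurus")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather in a location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in San Francisco?"}]
response = client.chat.completions.create(
    model="llama-3.2-3b-instruct-4bit", messages=messages, tools=tools
)

# Assumes the model chose to call the tool (finish_reason == "tool_calls")
call = response.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)

# Execute the function locally (placeholder result for illustration)
result = {"location": args["location"], "temperature_f": 68}

# Send the result back under role "tool", keyed to the call id
messages.append(response.choices[0].message)
messages.append({"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)})

final = client.chat.completions.create(
    model="llama-3.2-3b-instruct-4bit", messages=messages, tools=tools
)
print(final.choices[0].message.content)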

Tool Choice Options

  • "auto" — Model decides whether to use tools (default)
  • "none" — Disable tool usage
  • {"type": "function", "function": {"name": "function_name"}} — Force specific function

Authentication

Osaurus does not require authentication by default. When using SDK clients, pass any value for the API key:

from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:1337/v1",
    api_key="osaurus"  # Any value works
)

Error Handling

Errors follow the OpenAI error format:

{
  "error": {
    "message": "Model not found: gpt-4",
    "type": "invalid_request_error",
    "code": "model_not_found"
  }
}

Common Error Codes:

Code                     Description
model_not_found          Requested model doesn't exist
invalid_request          Malformed request body
context_length_exceeded  Input exceeds model's context window
tool_not_found           MCP tool not installed
internal_server_error    Server-side error
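
With the OpenAI SDK, these surface as exceptions (a sketch; it assumes errors are returned with a non-2xx HTTP status):

import openai
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:1337/v1", api_key="osaurus")

try:
    client.chat.completions.create(
        model="gpt-4",  # not installed locally
        messages=[{"role": "user", "content": "Hello!"}],
    )
except openai.APIStatusError as e:
    # The response body carries the OpenAI-format error shown above
    print(e.status_code, e.response.json()["error"]["code"])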

CORS Support

Built-in CORS support for browser-based applications:

  • Allowed Origins: * (all origins)
  • Allowed Methods: GET, POST, OPTIONS
  • Allowed Headers: Content-Type, Authorization

Quick Examples

Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:1337/v1", api_key="osaurus")

response = client.chat.completions.create(
    model="llama-3.2-3b-instruct-4bit",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

cURL

curl http://127.0.0.1:1337/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.2-3b-instruct-4bit",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

MCP Tool Call

curl -X POST http://127.0.0.1:1337/mcp/call \
  -H "Content-Type: application/json" \
  -d '{
    "name": "current_time",
    "arguments": {}
  }'

For more examples, see the SDK Examples or Integration Guide.