
HTTP API

Osaurus serves four well-known chat APIs side by side on the same port (OpenAI, Anthropic, Open Responses, Ollama), plus MCP server endpoints, the Memory API, and a few Osaurus-specific paths. Pick whichever your SDK already speaks.

Compatible APIs

Drop-in endpoints for existing tools and SDKs:

API             Endpoint
OpenAI          http://127.0.0.1:1337/v1/chat/completions
Anthropic       http://127.0.0.1:1337/anthropic/v1/messages
Open Responses  http://127.0.0.1:1337/v1/responses
Ollama          http://127.0.0.1:1337/api/chat

All prefixes are supported (/v1, /api, /v1/api), with full function calling and streaming tool-call deltas.

Base URL

http://127.0.0.1:1337

Override the port with the OSU_PORT environment variable.
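If your client runs in the same environment as the server, it can derive its base URL from the same variable. A minimal sketch (assumes OSU_PORT is visible to the client process; the default port is 1337):

import os

# Read the same OSU_PORT variable the server honors, falling back to the default.
port = os.environ.get("OSU_PORT", "1337")
base_url = f"http://127.0.0.1:{port}/v1"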

Endpoints Overview

Core API

Endpoint                Method  Description
/                       GET     Server status (plain text)
/health                 GET     Health check (JSON)
/v1/models              GET     List available models (OpenAI)
/v1/tags                GET     List available models (Ollama)
/v1/chat/completions    POST    Chat completion (OpenAI)
/v1/responses           POST    Responses (Open Responses)
/anthropic/v1/messages  POST    Chat completion (Anthropic)
/api/chat               POST    Chat completion (Ollama)

Memory Endpoints

Endpoint        Method  Description
/memory/ingest  POST    Bulk-ingest conversation turns for memory extraction
/agents         GET     List agents with pinned-fact counts

Server-side agent loop

Endpoint          Method  Description
/agents/{id}/run  POST    Server-side autonomous tool loop (executes tools, manages iteration budget, streams hints)

MCP Endpoints

Endpoint     Method  Description
/mcp/health  GET     MCP server health
/mcp/tools   GET     List available tools
/mcp/call    POST    Execute a tool

Identity / pairing

Endpoint  Method  Description
/pair     POST    Bonjour pairing handshake (mints an osk-v1 access key after user approval)

Core Endpoints

GET /

Simple status check returning plain text.

Response:

Osaurus is running

GET /health

Health check endpoint returning JSON status.

Response:

{
  "status": "ok",
  "timestamp": "2024-03-15T10:30:45Z"
}

GET /v1/models

List all available models in OpenAI format.

Response:

{
  "object": "list",
  "data": [
    {
      "id": "gemma-4-e2b-it-4bit",
      "object": "model",
      "created": 1234567890,
      "owned_by": "osaurus"
    },
    {
      "id": "foundation",
      "object": "model",
      "created": 1234567890,
      "owned_by": "apple"
    }
  ]
}

GET /v1/tags

List all available models in Ollama format. Also available at /api/tags.

Response:

{
  "models": [
    {
      "name": "gemma-4-e2b-it-4bit",
      "size": 2147483648,
      "digest": "sha256:abcd1234...",
      "modified_at": "2024-03-15T10:30:45Z"
    }
  ]
}

POST /v1/chat/completions

Create a chat completion using OpenAI format.

Tool calling semantics

/v1/chat/completions follows strict OpenAI semantics: when the model emits tool_calls, the response (or final SSE chunk) returns those calls and the client is expected to execute them and POST the results back in the next request. Osaurus deliberately does not auto-execute tools on this endpoint, so it can serve as a drop-in backend for harnesses that already manage their own tool loop.

If you want server-side autonomous tool loops, use POST /agents/{id}/run instead — it executes tools, manages the iteration budget (max 30), and streams hint frames. To expose Osaurus tools to a remote MCP harness, use /mcp/tools + /mcp/call.
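A minimal sketch of the client-managed loop this implies, using the OpenAI SDK. The get_weather definition mirrors the Function Calling section below; execute_tool is a hypothetical dispatcher standing in for your own implementations:

import json
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:1337/v1", api_key="osaurus")

def execute_tool(name, arguments):
    # Hypothetical dispatcher: route to your own tool implementations.
    if name == "get_weather":
        return {"temperature": 68, "unit": "fahrenheit"}
    raise ValueError(f"unknown tool: {name}")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather in a location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}]
messages = [{"role": "user", "content": "What's the weather in San Francisco?"}]

while True:
    message = client.chat.completions.create(
        model="gemma-4-e2b-it-4bit", messages=messages, tools=tools
    ).choices[0].message
    if not message.tool_calls:
        break  # final answer; no more tools requested
    messages.append(message)  # keep the assistant turn with its tool_calls
    for call in message.tool_calls:
        result = execute_tool(call.function.name, json.loads(call.function.arguments))
        messages.append(
            {"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)}
        )

print(message.content)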

Request Body:

{
  "model": "gemma-4-e2b-it-4bit",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Hello, how are you?"
    }
  ],
  "max_tokens": 1000,
  "temperature": 0.7,
  "top_p": 0.9,
  "stream": false,
  "tools": []
}

Parameters:

Parameter    Type           Required  Description
model        string         Yes       Model ID to use
messages     array          Yes       Array of message objects
max_tokens   integer        No        Maximum tokens to generate (default: 2048)
temperature  float          No        Sampling temperature 0-2 (default: 0.7)
top_p        float          No        Nucleus sampling threshold (default: 0.9)
stream       boolean        No        Enable SSE streaming (default: false)
tools        array          No        Function/tool definitions
tool_choice  string/object  No        Tool selection strategy
session_id   string         No        Reuse the same conversation's KV cache across turns (per (model, session_id))

Response (Non-streaming):

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "gemma-4-e2b-it-4bit",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I'm doing well, thank you! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 15,
    "total_tokens": 40
  }
}

Response (Streaming):

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"I'm"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" doing"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
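A sketch of consuming this stream with the OpenAI SDK, which parses the SSE chunks for you when stream=True:

from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:1337/v1", api_key="osaurus")

stream = client.chat.completions.create(
    model="gemma-4-e2b-it-4bit",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)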

Prefix caching and prefix_hash

KV cache reuse across requests is automatic and content-addressed — vmlx-swift-lm's CacheCoordinator matches shared prefix tokens (system prompt, tools, prior turns) without any client-side cache key.

For visibility, every response carries a prefix_hash field — a stable hash of the system prompt + tool names that produced this generation. Clients can use it to detect when the system prefix changed across requests:

{ "prefix_hash": "a1b2c3d4e5f67890..." }

prefix_hash is informational only. Keep session_id stable per conversation so chat history and preflight bookkeeping group correctly; cache reuse itself does not depend on it.
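A sketch of a client that keeps session_id stable and watches prefix_hash across turns. It uses requests directly, since session_id and prefix_hash are Osaurus extensions rather than standard OpenAI SDK fields:

import requests

BASE = "http://127.0.0.1:1337/v1"
SESSION_ID = "conversation-42"  # keep this stable for the whole conversation

def ask(messages, last_hash=None):
    body = {
        "model": "gemma-4-e2b-it-4bit",
        "messages": messages,
        "session_id": SESSION_ID,  # groups chat history and preflight bookkeeping
    }
    data = requests.post(f"{BASE}/chat/completions", json=body).json()
    if last_hash is not None and data.get("prefix_hash") != last_hash:
        print("note: system prefix changed since the previous turn")
    return data["choices"][0]["message"]["content"], data.get("prefix_hash")

reply, prefix_hash = ask([{"role": "user", "content": "Hello!"}])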

POST /agents/{id}/run

Server-side autonomous tool loop. Use this when you want Osaurus to execute tools on your behalf, manage the iteration budget, stream tool-execution hints, and only return when the model is done. (This is the path the in-app chat UI uses.)

  • Each pending tool_call is executed against the registered ToolRegistry (sandbox, folder, MCP, plugin tools — everything the agent has access to)
  • Independent tool calls within a single model turn run in parallel
  • The loop is capped at 30 iterations; if the budget is exhausted while still requesting tools, a notice is appended to the stream
  • Honors client-supplied tools (merged with the agent's always-loaded set) and tool_choice
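The request schema for this endpoint is not documented here; as a rough sketch, assuming it accepts an OpenAI-style messages body and streams SSE frames (the agent ID comes from GET /agents):

import requests

agent_id = "00000000-0000-0000-0000-000000000001"  # discover IDs via GET /agents
resp = requests.post(
    f"http://127.0.0.1:1337/agents/{agent_id}/run",
    json={"messages": [{"role": "user", "content": "What time is it?"}]},  # assumed shape
    stream=True,
)
for line in resp.iter_lines():
    if line:
        print(line.decode())  # SSE frames, including tool-execution hints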

POST /api/chat

Create a chat completion using Ollama format.

Request Body:

{
  "model": "gemma-4-e2b-it-4bit",
  "messages": [
    {
      "role": "user",
      "content": "Why is the sky blue?"
    }
  ],
  "stream": false,
  "options": {
    "temperature": 0.7,
    "top_p": 0.9,
    "num_predict": 1000
  }
}

Response:

{
  "model": "gemma-4-e2b-it-4bit",
  "created_at": "2024-03-15T10:30:45Z",
  "message": {
    "role": "assistant",
    "content": "The sky appears blue due to Rayleigh scattering..."
  },
  "done": true,
  "total_duration": 1234567890,
  "eval_count": 85
}
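The same call from Python, as a sketch built on the documented request and response shapes:

import requests

body = {
    "model": "gemma-4-e2b-it-4bit",
    "messages": [{"role": "user", "content": "Why is the sky blue?"}],
    "stream": False,
}
data = requests.post("http://127.0.0.1:1337/api/chat", json=body).json()
print(data["message"]["content"])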

POST /v1/responses

Create a response using the Open Responses format. This endpoint provides multi-provider interoperability, allowing you to use the same request format across different AI providers.

Request Body:

{
  "model": "gemma-4-e2b-it-4bit",
  "input": "What is the capital of France?",
  "instructions": "You are a helpful assistant.",
  "max_output_tokens": 1000,
  "temperature": 0.7,
  "stream": false
}

Parameters:

Parameter          Type          Required  Description
model              string        Yes       Model ID to use
input              string/array  Yes       Input text or array of message objects
instructions       string        No        System instructions for the model
max_output_tokens  integer       No        Maximum tokens to generate
temperature        float         No        Sampling temperature 0-2 (default: 0.7)
top_p              float         No        Nucleus sampling threshold
stream             boolean       No        Enable SSE streaming (default: false)
tools              array         No        Tool definitions for function calling

Response (Non-streaming):

{
  "id": "resp_123",
  "object": "response",
  "created_at": 1234567890,
  "model": "gemma-4-e2b-it-4bit",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "The capital of France is Paris."
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 15,
    "output_tokens": 8,
    "total_tokens": 23
  }
}

Response (Streaming):

When stream: true, responses are sent as Server-Sent Events:

event: response.created
data: {"type":"response.created","response":{"id":"resp_123","object":"response","model":"gemma-4-e2b-it-4bit"}}

event: response.output_item.added
data: {"type":"response.output_item.added","output_index":0,"item":{"type":"message","role":"assistant"}}

event: response.content_part.added
data: {"type":"response.content_part.added","output_index":0,"content_index":0,"part":{"type":"output_text","text":""}}

event: response.output_text.delta
data: {"type":"response.output_text.delta","output_index":0,"content_index":0,"delta":"The"}

event: response.output_text.delta
data: {"type":"response.output_text.delta","output_index":0,"content_index":0,"delta":" capital"}

event: response.output_text.done
data: {"type":"response.output_text.done","output_index":0,"content_index":0,"text":"The capital of France is Paris."}

event: response.completed
data: {"type":"response.completed","response":{"id":"resp_123","status":"completed"}}

Example with cURL:

curl http://127.0.0.1:1337/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-4-e2b-it-4bit",
    "input": "What is the capital of France?"
  }'

Example with conversation history:

{
  "model": "gemma-4-e2b-it-4bit",
  "input": [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "What is its population?"}
  ],
  "instructions": "You are a helpful geography assistant."
}

POST /anthropic/v1/messages

Create a chat completion using Anthropic format. This endpoint is compatible with the Anthropic Claude API. Also available at /messages for backwards compatibility.

Request Body:

{
  "model": "gemma-4-e2b-it-4bit",
  "max_tokens": 1024,
  "messages": [
    {
      "role": "user",
      "content": "Hello, how are you?"
    }
  ],
  "system": "You are a helpful assistant.",
  "stream": false
}

Parameters:

Parameter       Type     Required  Description
model           string   Yes       Model ID to use
messages        array    Yes       Array of message objects
max_tokens      integer  Yes       Maximum tokens to generate
system          string   No        System prompt (Anthropic style)
temperature     float    No        Sampling temperature 0-1 (default: 1.0)
top_p           float    No        Nucleus sampling threshold
top_k           integer  No        Top-k sampling
stream          boolean  No        Enable SSE streaming (default: false)
stop_sequences  array    No        Sequences that stop generation

Response (Non-streaming):

{
  "id": "msg_123",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "I'm doing well, thank you! How can I help you today?"
    }
  ],
  "model": "gemma-4-e2b-it-4bit",
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 25,
    "output_tokens": 15
  }
}

Response (Streaming):

When stream: true, responses are sent as Server-Sent Events:

event: message_start
data: {"type":"message_start","message":{"id":"msg_123","type":"message","role":"assistant","content":[],"model":"gemma-4-e2b-it-4bit"}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"I'm"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" doing"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":15}}

event: message_stop
data: {"type":"message_stop"}

Example with Python (Anthropic SDK):

import anthropic

client = anthropic.Anthropic(
    base_url="http://127.0.0.1:1337/anthropic",
    api_key="osaurus"  # Any value works
)

message = client.messages.create(
    model="gemma-4-e2b-it-4bit",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)

print(message.content[0].text)
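Streaming with the same client uses the SDK's messages.stream helper; a sketch, where text_stream yields the text deltas:

with client.messages.stream(
    model="gemma-4-e2b-it-4bit",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)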

Example with cURL:

curl http://127.0.0.1:1337/anthropic/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: osaurus" \
  -d '{
    "model": "gemma-4-e2b-it-4bit",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

MCP Endpoints

GET /mcp/health

Check MCP server availability.

Response:

{
  "status": "ok",
  "tools_available": 12
}

GET /mcp/tools

List all available MCP tools from installed plugins.

Response:

{
  "tools": [
    {
      "name": "read_file",
      "description": "Read contents of a file",
      "inputSchema": {
        "type": "object",
        "properties": {
          "path": {
            "type": "string",
            "description": "Path to the file"
          }
        },
        "required": ["path"]
      }
    },
    {
      "name": "browser_navigate",
      "description": "Navigate to a URL in the browser",
      "inputSchema": {
        "type": "object",
        "properties": {
          "url": {
            "type": "string",
            "description": "URL to navigate to"
          }
        },
        "required": ["url"]
      }
    }
  ]
}

POST /mcp/call

Execute an MCP tool.

Request Body:

{
  "name": "read_file",
  "arguments": {
    "path": "/etc/hosts"
  }
}

Response:

{
  "content": [
    {
      "type": "text",
      "text": "# Host Database\n127.0.0.1 localhost\n..."
    }
  ]
}

Error Response:

{
  "error": {
    "code": "tool_not_found",
    "message": "Tool 'unknown_tool' not found"
  }
}
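A small sketch that wraps /mcp/call and surfaces the documented error shape (the read_file call at the end assumes a plugin providing that tool is installed):

import requests

def call_tool(name, arguments):
    """Call an MCP tool and raise on the documented error shape."""
    data = requests.post(
        "http://127.0.0.1:1337/mcp/call",
        json={"name": name, "arguments": arguments},
    ).json()
    if "error" in data:
        raise RuntimeError(f"{data['error']['code']}: {data['error']['message']}")
    # Collect the text parts of the tool result.
    return [part["text"] for part in data["content"] if part["type"] == "text"]

print(call_tool("read_file", {"path": "/etc/hosts"}))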

Memory API

Osaurus exposes its memory system through the HTTP API, so any OpenAI-compatible client can benefit from persistent, on-device personalization.

Memory Context Injection — X-Osaurus-Agent-Id

Add the X-Osaurus-Agent-Id header to any POST /v1/chat/completions request. Osaurus runs the relevance gate against the latest user message, picks at most one memory section (identity, pinned facts, episodes, or transcript), and prepends it — together with always-on identity overrides — to the user message.

The header value is an arbitrary string identifying the agent whose memory should be retrieved. When the header is absent or empty, the request is processed normally without memory injection.

from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:1337/v1",
    api_key="osaurus",
    default_headers={"X-Osaurus-Agent-Id": "my-agent"},
)

response = client.chat.completions.create(
    model="your-model-name",
    messages=[{"role": "user", "content": "What did we talk about last time?"}],
)

POST /memory/ingest

Bulk-ingest conversation turns so the memory system can learn from them. Useful for seeding memory from existing chat logs, migrating from another system, or running benchmarks. Distillation flushes immediately at the end of the batch — you do not have to wait for the writer's debounce.

Request Body:

{
  "agent_id": "my-agent",
  "conversation_id": "session-1",
  "turns": [
    {"user": "Hi, my name is Alice", "assistant": "Hello Alice! Nice to meet you."},
    {"user": "I work at Acme Corp", "assistant": "Got it, you work at Acme Corp."}
  ]
}

Parameters:

Parameter        Type    Required  Description
agent_id         string  Yes       Identifier for the agent whose memory is being populated
conversation_id  string  Yes       Identifier for the conversation session
turns            array   Yes       Array of turn objects, each with user and assistant string fields
session_date     string  No        Optional ISO 8601 date for the whole batch
skip_extraction  bool    No        When true, only insert transcript rows; skip distillation

Distillation produces an episode and (when warranted) a small set of pinned facts. Response: {"status":"ok","turns_ingested":N}.

Example with cURL:

curl http://127.0.0.1:1337/memory/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "my-agent",
    "conversation_id": "session-1",
    "turns": [
      {"user": "Hi, my name is Alice", "assistant": "Hello Alice! Nice to meet you."},
      {"user": "I work at Acme Corp", "assistant": "Got it, you work at Acme Corp."}
    ]
  }'

GET /agents

Returns all configured agents with their pinned-fact counts. Use this to discover valid agent IDs for the X-Osaurus-Agent-Id header.

Example with cURL:

curl http://127.0.0.1:1337/agents

Response:

{
  "agents": [
    {
      "id": "00000000-0000-0000-0000-000000000001",
      "name": "Osaurus",
      "description": "Default assistant",
      "default_model": null,
      "supports_vision": false,
      "is_built_in": true,
      "memory_entry_count": 42,
      "created_at": "2025-01-01T00:00:00Z",
      "updated_at": "2025-01-01T00:00:00Z"
    }
  ]
}

supports_vision reflects whether the agent's effective model is a VLM, so clients can show or hide image-attach UI without round-tripping the model registry.
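For example, a client can gate its image-attach UI on that flag without any extra lookups; a sketch:

import requests

agents = requests.get("http://127.0.0.1:1337/agents").json()["agents"]
vision_ids = {a["id"] for a in agents if a["supports_vision"]}
# Enable the image-attach UI only when the selected agent's ID is in vision_ids.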

Function Calling

Osaurus supports OpenAI-style function calling for structured interactions.

Defining Tools

{
  "model": "gemma-4-e2b-it-4bit",
  "messages": [
    {"role": "user", "content": "What's the weather in San Francisco?"}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather in a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The city and state"
            },
            "unit": {
              "type": "string",
              "enum": ["celsius", "fahrenheit"]
            }
          },
          "required": ["location"]
        }
      }
    }
  ]
}

Response with Tool Call

{
  "id": "chatcmpl-123",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"location\":\"San Francisco, CA\",\"unit\":\"fahrenheit\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}

Tool Choice Options

  • "auto" — Model decides whether to use tools (default)
  • "none" — Disable tool usage
  • {"type": "function", "function": {"name": "function_name"}} — Force specific function

Authentication

For local clients (loopback connections to 127.0.0.1), Osaurus accepts requests without authentication. Most SDKs require some API key string — pass anything:

client = OpenAI(
    base_url="http://127.0.0.1:1337/v1",
    api_key="osaurus"
)

For LAN, Relay, or any non-loopback caller, send an osk-v1 access key as a Bearer token:

curl http://your-mac.local:1337/v1/chat/completions \
  -H "Authorization: Bearer osk-v1.eyJpc3M…" \
  -H "Content-Type: application/json" \
  -d '{...}'

Or as the OpenAI SDK's api_key:

client = OpenAI(
    base_url="http://your-mac.local:1337/v1",
    api_key="osk-v1.eyJpc3M..."
)

The Anthropic SDK uses x-api-key instead of Authorization:

client = anthropic.Anthropic(
    base_url="http://your-mac.local:1337/anthropic",
    api_key="osk-v1.eyJpc3M..."
)

Access keys can be master-scoped (any agent) or agent-scoped (one specific agent), with optional expiration and revocation. See Identity for details.

Pre-auth body-size limits

Osaurus rejects oversized request bodies before the auth gate runs, so an unauthenticated caller can't exhaust host memory.

Endpoint                   Limit
POST /pair                 64 KiB
Other public HTTP routes   32 MiB
Sandbox host bridge        8 MiB

Both servers enforce the cap with a Content-Length pre-check at the request head and a streaming guard on body chunks, so chunked clients and clients that misreport their declared length both get 413 Payload Too Large.

Error Handling

Errors follow the OpenAI error format:

{
  "error": {
    "message": "Model not found: gpt-4",
    "type": "invalid_request_error",
    "code": "model_not_found"
  }
}

Common Error Codes:

Code                     Description
model_not_found          Requested model doesn't exist
invalid_request          Malformed request body
context_length_exceeded  Input exceeds model's context window
tool_not_found           MCP tool not installed
internal_server_error    Server-side error
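A sketch of checking for that shape with requests (gpt-4 deliberately names a model Osaurus does not serve, so per the example above this should trigger model_not_found):

import requests

resp = requests.post(
    "http://127.0.0.1:1337/v1/chat/completions",
    json={"model": "gpt-4", "messages": [{"role": "user", "content": "Hi"}]},
)
data = resp.json()
if "error" in data:
    # e.g. model_not_found for a model that isn't installed locally
    print(data["error"]["code"], "-", data["error"]["message"])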

CORS Support

Built-in CORS support for browser-based applications:

  • Allowed Origins: * (all origins)
  • Allowed Methods: GET, POST, OPTIONS
  • Allowed Headers: Content-Type, Authorization

Quick Examples

Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:1337/v1", api_key="osaurus")

response = client.chat.completions.create(
    model="gemma-4-e2b-it-4bit",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

cURL

curl http://127.0.0.1:1337/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-4-e2b-it-4bit",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

MCP Tool Call

curl -X POST http://127.0.0.1:1337/mcp/call \
  -H "Content-Type: application/json" \
  -d '{
    "name": "current_time",
    "arguments": {}
  }'
