LLMTune API (Beta)

Build on LLMTune's inference & fine-tuning platform

OpenAI-compatible endpoints backed by IO.net compute. Use the LLMTune API to run inference, launch progressive fine-tunes, and manage webhook-driven deployments with a single key.

Quick start

Authenticate, choose a model, and run your first completion.

1. Create an API key

Go to API Keys in the dashboard. Keys grant full workspace access—store them securely.

2. Make your first call

POST /v1/models/{modelId}/inference accepts OpenAI-style payloads and returns text, tokens, and latency metrics.
curl https://api.llmtune.io/v1/models/meta-llama/Llama-3.3-70B-Instruct/inference \
  -H "Authorization: Bearer sk_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Summarize LLMTune in one sentence.",
    "temperature": 0.6,
    "maxTokens": 200
  }'
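
The same request from Python, a minimal sketch built on the requests library; the endpoint, headers, and field names mirror the curl call above.

import requests

API_KEY = "sk_live_your_key"  # from the API Keys page
MODEL = "meta-llama/Llama-3.3-70B-Instruct"

resp = requests.post(
    f"https://api.llmtune.io/v1/models/{MODEL}/inference",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "prompt": "Summarize LLMTune in one sentence.",
        "temperature": 0.6,
        "maxTokens": 200,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["text"])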

Authentication

Every request requires a Bearer token header.

Headers

Include your secret key on each request. Rotate keys from the dashboard when compromised.
Authorization: Bearer sk_live_your_key
Never embed secret keys in client apps. Proxy requests through your backend.
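
One way to follow that advice is a thin server-side proxy that injects the key. A minimal sketch with Flask; the /api/inference route, the forwarded fields, and the hard-coded model are illustrative choices, not part of the LLMTune API.

import os

import requests
from flask import Flask, jsonify, request

app = Flask(__name__)
LLMTUNE_KEY = os.environ["LLMTUNE_API_KEY"]  # secret stays on the server

@app.post("/api/inference")  # your app's own route, not an LLMTune endpoint
def proxy_inference():
    body = request.get_json(force=True)
    upstream = requests.post(
        "https://api.llmtune.io/v1/models/meta-llama/Llama-3.3-70B-Instruct/inference",
        headers={"Authorization": f"Bearer {LLMTUNE_KEY}"},
        json={"prompt": body.get("prompt", ""), "maxTokens": 200},
        timeout=60,
    )
    return jsonify(upstream.json()), upstream.status_code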

REST reference

Primary endpoints for inference and batch jobs.

POST /v1/models/{modelId}/inference

Run inference against a deployed LLMTune model or an IO.net-hosted model.

Request body

{
  "prompt": "string · required",
  "temperature": 0.7,
  "maxTokens": 1024,
  "topP": 1.0,
  "metadata": { "conversationId": "optional" }
}

Response

{
  "text": "Generated response...",
  "tokens": 228,
  "model": "meta-llama/Llama-3.3-70B-Instruct",
  "latency": 234,
  "metadata": { "conversationId": "optional" }
}
POST /v1/playground/inference

Mirror of the UI playground. Useful for quick smoke tests.

Request body

{
  "modelId": "meta-llama/Llama-3.3-70B-Instruct",
  "prompt": "string · required",
  "temperature": 0.7,
  "maxTokens": 800
}
POST /v1/batch/inference

Submit up to 100 inference jobs per call. Optionally provide a webhook for async results.

Request body

{
  "modelId": "meta-llama/Llama-3.3-70B-Instruct",
  "requests": [
    { "id": "req-1", "prompt": "First prompt" },
    { "id": "req-2", "prompt": "Second prompt" }
  ],
  "webhookUrl": "https://app.yourdomain.com/batch-callback"
}

Response

{
  "batchId": "batch-uuid",
  "status": "queued",
  "summary": { "total": 2, "accepted": 2 }
}
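
To run more than 100 prompts, split them across calls. A sketch of that chunking, assuming the request and response shapes shown above:

import requests

API_KEY = "sk_live_your_key"
BATCH_URL = "https://api.llmtune.io/v1/batch/inference"

def submit_in_batches(prompts, webhook_url):
    """Submit prompts in chunks of 100, the per-call limit."""
    batch_ids = []
    for start in range(0, len(prompts), 100):
        chunk = prompts[start:start + 100]
        resp = requests.post(
            BATCH_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={
                "modelId": "meta-llama/Llama-3.3-70B-Instruct",
                "requests": [
                    {"id": f"req-{start + i}", "prompt": p}
                    for i, p in enumerate(chunk)
                ],
                "webhookUrl": webhook_url,
            },
            timeout=60,
        )
        resp.raise_for_status()
        batch_ids.append(resp.json()["batchId"])
    return batch_ids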

Fine-tuning API

Launch and monitor fine-tunes with LLMTune + IO.net compute.

POST /v1/fine-tune

Submit a fine-tuning job. Datasets can be HTTPS URLs, S3 or GCS objects, or LLMTune uploads.

Request body

{
  "baseModel": "meta-llama/Llama-3.3-70B-Instruct",
  "dataset": "s3://your-bucket/dataset.jsonl",
  "trainingMethod": "sft",
  "hyperparameters": {
    "learningRate": 0.0001,
    "epochs": 3,
    "batchSize": 64
  },
  "webhookUrl": "https://app.yourdomain.com/fine-tune-callback"
}
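
Submitting that job from Python, a minimal sketch; the payload matches the request body above, but reading a job id from the response is an assumption, since the POST response shape is not shown here (the GET endpoint below returns an "id" field).

import requests

API_KEY = "sk_live_your_key"

resp = requests.post(
    "https://api.llmtune.io/v1/fine-tune",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "baseModel": "meta-llama/Llama-3.3-70B-Instruct",
        "dataset": "s3://your-bucket/dataset.jsonl",
        "trainingMethod": "sft",
        "hyperparameters": {"learningRate": 0.0001, "epochs": 3, "batchSize": 64},
        "webhookUrl": "https://app.yourdomain.com/fine-tune-callback",
    },
    timeout=60,
)
resp.raise_for_status()
print("submitted job:", resp.json().get("id"))  # assumed: POST echoes a job id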
GET /v1/fine-tune/{jobId}

Retrieve job status, metrics, and errors.

Response

{
  "id": "job-uuid",
  "status": "training",
  "progress": 42.3,
  "epochs": { "current": 1, "total": 3 },
  "metrics": {
    "loss": 1.28,
    "evalLoss": 1.02,
    "tokensPerSecond": 178
  },
  "createdAt": "2025-11-01T15:23:54Z"
}
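
If you would rather poll than wait for a webhook, a sketch of a polling loop; the terminal status names "completed" and "failed" are assumptions inferred from the webhook events below, as only "training" is documented above.

import time

import requests

API_KEY = "sk_live_your_key"

def wait_for_fine_tune(job_id, interval=30):
    """Poll GET /v1/fine-tune/{jobId} until the job reaches a terminal state."""
    url = f"https://api.llmtune.io/v1/fine-tune/{job_id}"
    while True:
        resp = requests.get(
            url, headers={"Authorization": f"Bearer {API_KEY}"}, timeout=30
        )
        resp.raise_for_status()
        job = resp.json()
        print(f"{job['status']}: {job.get('progress', 0):.1f}%")
        if job["status"] in ("completed", "failed"):  # assumed terminal names
            return job
        time.sleep(interval)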

Webhooks

Receive lifecycle events for fine-tunes and deployments.

Event payload

POST https://your-app.com/webhooks/llmtune

{
  "event": "training.completed",
  "data": {
    "jobId": "job-uuid",
    "baseModel": "meta-llama/Llama-3.3-70B-Instruct",
    "artifact": "ionet-finetune-model-id"
  },
  "deliveredAt": "2025-11-08T14:22:31Z"
}

Events: training.started, training.completed, training.failed, model.deployed
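
A minimal receiver sketch with Flask, keyed off the payload fields shown above. This page doesn't describe a signature scheme, so the sketch acknowledges immediately and leaves verification and heavy work out.

from flask import Flask, request

app = Flask(__name__)

@app.post("/webhooks/llmtune")  # the URL you registered with LLMTune
def llmtune_webhook():
    payload = request.get_json(force=True)
    if payload["event"] == "training.completed":
        # e.g. deploy or evaluate the finished artifact
        print("fine-tune done, artifact:", payload["data"]["artifact"])
    elif payload["event"] == "training.failed":
        print("fine-tune failed, job:", payload["data"]["jobId"])
    return "", 204  # acknowledge fast; do real work asynchronously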

Error handling

A standard error schema is returned for all unsuccessful responses.

401 Unauthorized

Invalid or missing API key.

402 Payment Required

Insufficient credits. Top up your balance.

404 Not Found

Model or job ID not found.

429 Rate Limited

Slow down or upgrade your plan.

500 Server Error

Unexpected issue. Retry with exponential backoff.
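
One way to implement that retry advice, a small helper that backs off exponentially with jitter on 429 and 500 responses; the delay cap and retry count are arbitrary choices.

import random
import time

import requests

def post_with_backoff(url, max_retries=5, **kwargs):
    """POST, retrying 429/500 responses with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        resp = requests.post(url, **kwargs)
        if resp.status_code not in (429, 500):
            return resp
        time.sleep(min(2 ** attempt, 30) + random.random())
    return resp  # last response after exhausting retries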

Rate limits

Plans scale usage, concurrency, and batch sizes.

Sandbox

$0

  • 1,000 requests/month
  • Up to 3 fine-tunes
  • Shared IO.net burst pool

Growth

$0.001 / request

  • Unlimited requests
  • Priority fine-tune queue
  • Dedicated GPU slices

Scale

Contact

  • SLA-backed latency
  • Private fleet on IO.net
  • Enterprise observability