Quick start
Authenticate, choose a model, and stream your first completion.
1. Create an API key
2. Make your first call
curl https://api.llmtune.io/v1/models/meta-llama/Llama-3.3-70B-Instruct/inference \
  -H "Authorization: Bearer sk_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Summarize LLMTune in one sentence.",
    "temperature": 0.6,
    "maxTokens": 200
  }'

Authentication
Every request must include an Authorization header carrying a Bearer token.
Headers
Authorization: Bearer sk_live_your_key

REST reference
Primary endpoints for inference and batch jobs.
/v1/models/{modelId}/inference
Run inference against a deployed LLMTune or IO.net-hosted model.
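A minimal Python sketch of this call using only the standard library. The helper name `build_inference_request` is illustrative, and the key placeholder must be replaced with a real key; the field names follow the request body documented below.

```python
import json
import urllib.request

API_BASE = "https://api.llmtune.io/v1"

def build_inference_request(model_id: str, prompt: str,
                            temperature: float = 0.7,
                            max_tokens: int = 1024) -> urllib.request.Request:
    """Assemble a POST request for /v1/models/{modelId}/inference."""
    payload = {
        "prompt": prompt,
        "temperature": temperature,
        "maxTokens": max_tokens,
    }
    return urllib.request.Request(
        f"{API_BASE}/models/{model_id}/inference",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer sk_live_your_key",  # replace with your key
            "Content-Type": "application/json",
        },
        method="POST",
    )

# To actually send the request (requires a valid key and network access):
# with urllib.request.urlopen(build_inference_request(
#         "meta-llama/Llama-3.3-70B-Instruct",
#         "Summarize LLMTune in one sentence.")) as resp:
#     print(json.load(resp)["text"])
```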
Request body
{
  "prompt": "string · required",
  "temperature": 0.7,
  "maxTokens": 1024,
  "topP": 1.0,
  "metadata": { "conversationId": "optional" }
}

Response
{
  "text": "Generated response...",
  "tokens": 228,
  "model": "meta-llama/Llama-3.3-70B-Instruct",
  "latency": 234,
  "metadata": { "conversationId": "optional" }
}

/v1/playground/inference
Mirror of the UI playground. Useful for quick smoke tests.
Request body
{
  "modelId": "meta-llama/Llama-3.3-70B-Instruct",
  "prompt": "string · required",
  "temperature": 0.7,
  "maxTokens": 800
}

/v1/batch/inference
Submit up to 100 inference jobs per call. Optionally provide a webhook for async results.
Request body
{
  "modelId": "meta-llama/Llama-3.3-70B-Instruct",
  "requests": [
    { "id": "req-1", "prompt": "First prompt" },
    { "id": "req-2", "prompt": "Second prompt" }
  ],
  "webhookUrl": "https://app.yourdomain.com/batch-callback"
}

Response
{
  "batchId": "batch-uuid",
  "status": "queued",
  "summary": { "total": 2, "accepted": 2 }
}

Fine-tuning API
Launch and monitor fine-tunes with LLMTune + IO.net compute.
/v1/fine-tune
Submit a fine-tuning job. Datasets can be HTTPS, S3, GCS, or LLMTune uploads.
Request body
{
  "baseModel": "meta-llama/Llama-3.3-70B-Instruct",
  "dataset": "s3://your-bucket/dataset.jsonl",
  "trainingMethod": "sft",
  "hyperparameters": {
    "learningRate": 0.0001,
    "epochs": 3,
    "batchSize": 64
  },
  "webhookUrl": "https://app.yourdomain.com/fine-tune-callback"
}

/v1/fine-tune/{jobId}
Retrieve job status, metrics, and errors.
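If you are not using webhooks, you can poll this endpoint until the job settles. A sketch: `fetch_status` is any callable returning the job JSON (injected so the loop stays testable), and the terminal status names are an assumption inferred from the `training.completed` / `training.failed` webhook events.

```python
import time

def wait_for_job(fetch_status, poll_seconds: float = 30.0, sleep=time.sleep) -> dict:
    """Poll GET /v1/fine-tune/{jobId} via `fetch_status` until a terminal state."""
    terminal = {"completed", "failed"}  # assumed terminal status values
    while True:
        job = fetch_status()
        if job["status"] in terminal:
            return job
        sleep(poll_seconds)
```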
Response
{
  "id": "job-uuid",
  "status": "training",
  "progress": 42.3,
  "epochs": { "current": 1, "total": 3 },
  "metrics": {
    "loss": 1.28,
    "evalLoss": 1.02,
    "tokensPerSecond": 178
  },
  "createdAt": "2025-11-01T15:23:54Z"
}

Webhooks
Receive lifecycle events for fine-tunes and deployments.
Event payload
POST https://your-app.com/webhooks/llmtune
{
  "event": "training.completed",
  "data": {
    "jobId": "job-uuid",
    "baseModel": "meta-llama/Llama-3.3-70B-Instruct",
    "artifact": "ionet-finetune-model-id"
  },
  "deliveredAt": "2025-11-08T14:22:31Z"
}

Events: training.started, training.completed, training.failed, model.deployed
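A minimal Python dispatcher for these events. This is a sketch: the handler bodies are placeholders, and a real endpoint should also verify the request's authenticity before acting on it.

```python
import json

def handle_webhook(raw_body: bytes) -> str:
    """Parse an LLMTune webhook payload and route it by event name."""
    payload = json.loads(raw_body)
    event, data = payload["event"], payload["data"]
    if event == "training.completed":
        print(f"fine-tune {data['jobId']} produced artifact {data['artifact']}")
    elif event == "training.failed":
        print(f"fine-tune {data['jobId']} failed")
    elif event in ("training.started", "model.deployed"):
        print(f"{event}: {data}")
    else:
        print(f"ignoring unknown event {event!r}")
    return event
```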
Error handling
A standard error schema is returned for every unsuccessful response. Common failure cases:
- Invalid or missing API key.
- Insufficient credits: top up your balance.
- Model or job ID not found.
- Rate limit exceeded: slow down or upgrade your plan.
- Unexpected server issue: retry with exponential backoff.
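The backoff advice above can be sketched as a small wrapper. This assumes the API uses conventional HTTP status codes (429 for rate limiting, 5xx for server errors); `call` is any function returning a `(status, body)` pair.

```python
import random
import time

def with_backoff(call, max_attempts: int = 5, base_delay: float = 1.0,
                 sleep=time.sleep):
    """Retry `call` on rate-limit or server errors with exponential backoff."""
    for attempt in range(max_attempts):
        status, body = call()
        if status != 429 and status < 500:
            return status, body
        # 1s, 2s, 4s, ... plus jitter so concurrent clients spread out
        sleep(base_delay * 2 ** attempt + random.uniform(0, 0.5))
    return status, body
```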
Rate limits
Plans scale usage, concurrency, and batch sizes.
Sandbox
$0
- 1,000 requests/month
- Up to 3 fine-tunes
- Shared IO.net burst pool
Growth
$0.001 / request
- Unlimited requests
- Priority fine-tune queue
- Dedicated GPU slices
Scale
Contact
- SLA-backed latency
- Private fleet on IO.net
- Enterprise observability