Decentralized GPU. On-Demand AI.
A decentralized GPU marketplace where developers run AI inference on demand and GPU owners monetize idle hardware through a simple API.
How it works
From request to response in four steps.
Submit
Send a model inference request through the SDK or REST API with your prompt and model choice.
Route
The broker evaluates all available hosts and routes the job to the cheapest GPU with sufficient VRAM (see the sketch after these steps).
Execute
The host agent loads the model, runs inference on its GPU, and begins generating output.
Stream
Results stream back to you in real time via WebSocket. Pay only for what you use.
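To make the routing step concrete, here is a minimal sketch of the selection rule: the cheapest host with enough free VRAM wins the job. The Host class and select_host function below are illustrative only, assuming each host advertises its free VRAM and an hourly price; this is not the broker's actual code.

```python
from dataclasses import dataclass

@dataclass
class Host:
    host_id: str
    free_vram_gb: float    # VRAM currently available on the host's GPU
    price_per_hour: float  # host's advertised rate in USD/hr

def select_host(hosts, required_vram_gb):
    """Pick the cheapest host that can fit the model (illustrative sketch)."""
    eligible = [h for h in hosts if h.free_vram_gb >= required_vram_gb]
    if not eligible:
        return None  # no host can run this model right now
    return min(eligible, key=lambda h: h.price_per_hour)

# Example: a 4 GB model (e.g. GPT-2 Medium) goes to the cheapest host with >= 4 GB free
hosts = [Host("a", 2.0, 0.05), Host("b", 8.0, 0.12), Host("c", 6.0, 0.10)]
print(select_host(hosts, 4).host_id)  # -> "c"
```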
Run AI models with five lines of code
Use the Python SDK to run inference on any supported model. Results stream back in real time. No GPU required on your end.
from sdk.compute import ComputeClient
client = ComputeClient(url, api_key="YOUR_KEY")  # url is your broker endpoint
job = client.run_model("gpt2", prompt="Hello")   # submit an inference job
for event in client.stream_job(job["job_id"]):   # results stream back over WebSocket
    print(event["text"], end="")

Earn by sharing your idle GPU
Install the host agent, register your GPU, and start earning. Auto-detection, automatic bidding, and transparent payouts.
12% platform fee
Automatic GPU detection
$0.12/hr average host earnings
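For a rough sense of the numbers above: with a 12% platform fee, a host keeps 88% of what a job bills. The snippet below is just that arithmetic, assuming the quoted $0.12/hr is the rate billed before the fee is taken out; it is not the platform's billing code.

```python
PLATFORM_FEE = 0.12  # 12% platform fee

def net_payout(gross_per_hour, hours):
    """Host earnings after the platform fee (illustrative arithmetic only)."""
    return gross_per_hour * hours * (1 - PLATFORM_FEE)

# Assuming $0.12/hr billed before the fee, 100 hours of sharing nets about $10.56
print(round(net_payout(0.12, 100), 2))  # -> 10.56
```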
Supported models
Run popular open-source models out of the box.
| Model | Parameters | VRAM Required | Status |
|---|---|---|---|
| GPT-2 | 124M | 2 GB | Available |
| GPT-2 Medium | 355M | 4 GB | Available |
| GPT-2 Large | 774M | 6 GB | Available |
| DistilGPT2 | 82M | 1.5 GB | Available |
| GPT-2 XL | 1.5B | 8 GB | Coming Soon |
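The VRAM column above also works as a quick capability check: a GPU can serve a model only if its free VRAM covers the requirement. A minimal sketch using the figures from the table; the model identifiers in MODEL_VRAM_GB are illustrative, not official SDK names.

```python
# VRAM requirements (GB) from the table above
MODEL_VRAM_GB = {
    "distilgpt2": 1.5,
    "gpt2": 2,
    "gpt2-medium": 4,
    "gpt2-large": 6,
    "gpt2-xl": 8,  # coming soon
}

def models_that_fit(free_vram_gb):
    """Models a host with the given free VRAM could serve (sketch only)."""
    return [m for m, need in MODEL_VRAM_GB.items() if need <= free_vram_gb]

print(models_that_fit(4))  # -> ['distilgpt2', 'gpt2', 'gpt2-medium']
```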
Architecture
A simple three-layer system: developers using the SDK or REST API, a broker that routes each job, and host agents running on GPU providers' hardware.
Start building on Infrintia
Run AI inference on decentralized GPUs or earn by sharing your idle hardware.
