Pricing
Value, built for scale.
Flexible Plans. Clear Value.
No hidden fees, no surprises. Just transparent pricing designed to scale with your needs. Whether you’re just starting or expanding fast, there’s a solution for you.
Text generation
Text-to-text
| Model | Input | Output |
|---|---|---|
| Deepseek-V3 | $2/M tokens | $2/M tokens |
| Deepseek-R1 | $3/M tokens | $8/M tokens |
| Llama3.1 8B | $0.05/M tokens | $0.2/M tokens |
| Llama3.1 70B | $1/M tokens | $1.5/M tokens |
| Gemma2 9B | $0.1/M tokens | $0.3/M tokens |
| Gemma3 27B | $0.25/M tokens | $0.7/M tokens |
| Qwen3 32B | $0.3/M tokens | $1/M tokens |
| Qwen3 235B A22B | $0.5/M tokens | $2/M tokens |
| Qwen3 Coder 480B A35B | $1/M tokens | $4/M tokens |
| GPT-OSS 120B | $0.25/M tokens | $0.75/M tokens |
| GPT-OSS 20B | $0.15/M tokens | $0.3/M tokens |
| GLM-4.5 | $1/M tokens | $5/M tokens |
| GLM-4.5 Air | $0.4/M tokens | $2.5/M tokens |
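As an illustration of how per-million-token pricing works, the sketch below estimates the cost of a single request. The rates come from the table above; the token counts and the `request_cost` helper are hypothetical examples, not part of any official SDK.

```python
# Estimate the cost of one generation request under per-million-token pricing.
# Rates are in dollars per million tokens (input, output), taken from the
# pricing table above. Token counts below are hypothetical examples.
RATES = {
    "Deepseek-V3": (2.00, 2.00),
    "Llama3.1 8B": (0.05, 0.20),
    "GPT-OSS 120B": (0.25, 0.75),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of a single generation request."""
    in_rate, out_rate = RATES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A 2,000-token prompt with a 500-token completion on Llama3.1 8B:
cost = request_cost("Llama3.1 8B", 2_000, 500)
print(f"${cost:.6f}")  # (2000 * 0.05 + 500 * 0.20) / 1e6 = $0.000200
```

At these rates, even a million such requests on the smallest model would cost on the order of a few hundred dollars, which is why per-token billing favors high-volume, short-prompt workloads.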
Image generation
Text-to-image
| Model | Price per image | Images per $1 |
|---|---|---|
| Flux Schnell | $0.003 | 333 |
| Flux Dev | $0.025 | 40 |
On-demand
GPU Pricing
| GPU type | Price per GPU | vCPUs | RAM | VRAM |
|---|---|---|---|---|
| NVIDIA B300 | $9.99/hr | 30 | 275 GB | 288 GB |
| NVIDIA B200 | $7.99/hr | 30 | 184 GB | 180 GB |
| NVIDIA H200 | $5.99/hr | 44 | 182 GB | 141 GB |
| NVIDIA H100 | $3.99/hr | 32 | 185 GB | 80 GB |
| NVIDIA A100 | $1.99/hr | 22 | 120 GB | 80 GB |
| NVIDIA L40S | $1.79/hr | 20 | 60 GB | 48 GB |
On-demand
CPU Pricing
| CPU type | vCPU | RAM | Price per hour |
|---|---|---|---|
| AMD EPYC | 4–360 | 16–1440 GB | from $0.16 |
On-demand
Storage Pricing
| Storage type | Bandwidth | IOPS | Price per GB per month |
|---|---|---|---|
| NVMe | 2000 MB/s | 100k | $0.20 |
Questions
We've got answers
Need Help? We’ve Got You.
From pricing to features. Here are the answers to your most common questions.
How is GPU usage billed?
We offer two billing models: on-demand GPU servers are billed by the minute for the time your server instance is active, while our API endpoints are billed per token for text generation and per image for image generation.
For GPU servers, you pay from the moment you spin up an instance until you terminate it.
For API inference, you're charged only for successful generation requests. No ongoing server costs or idle time charges.
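To make the per-minute billing concrete, here is a minimal sketch of the cost calculation for an on-demand instance. It assumes active time is rounded up to whole minutes; that rounding behavior is an assumption for illustration, not a confirmed billing rule. Hourly rates are taken from the GPU pricing table above.

```python
import math

# $/hr per GPU, from the GPU pricing table above.
HOURLY_RATES = {
    "NVIDIA H100": 3.99,
    "NVIDIA A100": 1.99,
}

def on_demand_cost(gpu: str, minutes_active: float, num_gpus: int = 1) -> float:
    """Dollar cost of an instance active for `minutes_active` minutes.

    Assumes billing rounds partial minutes up (an illustrative assumption).
    """
    billed_minutes = math.ceil(minutes_active)
    per_minute_rate = HOURLY_RATES[gpu] / 60
    return billed_minutes * per_minute_rate * num_gpus

# A single H100 active for 90 minutes:
print(round(on_demand_cost("NVIDIA H100", 90), 3))  # 90 * (3.99 / 60) = 5.985
```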
Do you offer volume discounts?
Yes, we provide tiered pricing based on your commitment level.
Higher volume customers can also access enterprise pricing with custom rates, dedicated support, and flexible billing terms.
Contact our sales team for volume pricing above certain thresholds.
What GPU types are available and how do they affect pricing?
We offer various GPU tiers from cost-effective options for lighter workloads to high-performance GPUs for demanding applications.
Pricing varies by GPU type, so you can choose the optimal GPU based on your performance and budget requirements.
Are there any setup fees or minimum commitments?
Our pay-as-you-go model has no setup fees or minimum monthly commitments. You can start with as little as a few API calls.
However, reserved instances and enterprise plans may have minimum commitments in exchange for significant cost savings.
How do you protect my data and models?
All data is encrypted in transit and at rest using industry-standard AES-256 encryption. We implement zero-trust network architecture, and your data is never used to train our models or shared with other customers. Each customer environment is isolated with dedicated compute resources and secure API endpoints.
What enterprise features and support do you provide?
Enterprise customers receive dedicated account management, priority support with guaranteed response times, custom SLAs, and access to beta features.
We also offer cluster solutions, custom training, and integration assistance with your existing infrastructure and workflows.
How do you compare to other AI service providers?
Unlike larger providers, we specialize exclusively in AI inference with optimized infrastructure and competitive pricing. We offer more flexible deployment options, faster response times, and personalized support.
Our focus on both GPU servers and API endpoints gives you more control over your AI workloads than API-only providers.