Pricing Value, built for scale.

Flexible Plans. Clear Value.

No hidden fees, no surprises. Just transparent pricing designed to scale with your needs. Whether you’re just starting or expanding fast, there’s a solution for you.

Text generation Text-to-text

Model                   Input             Output
Deepseek-V3             $2/M tokens       $2/M tokens
Deepseek-R1             $3/M tokens       $8/M tokens
Llama3.1 8B             $0.05/M tokens    $0.2/M tokens
Llama3.1 70B            $1/M tokens       $1.5/M tokens
Gemma2 9B               $0.1/M tokens     $0.3/M tokens
Gemma3 27B              $0.25/M tokens    $0.7/M tokens
Qwen3 32B               $0.3/M tokens     $1/M tokens
Qwen3 235B A22B         $0.5/M tokens     $2/M tokens
Qwen3 Coder 480B A35B   $1/M tokens       $4/M tokens
GPT-OSS 120B            $0.25/M tokens    $0.75/M tokens
GPT-OSS 20B             $0.15/M tokens    $0.3/M tokens
GLM-4.5                 $1/M tokens       $5/M tokens
GLM-4.5 Air             $0.4/M tokens     $2.5/M tokens
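
To see how per-token rates turn into a bill, here is a minimal sketch that estimates the cost of a text generation workload. The rates are copied from the table above for three models; the `RATES` dictionary and the `estimate_cost` function are hypothetical names used for illustration only, not part of any official SDK.

```python
# Rates in USD per million tokens (input, output), copied from the pricing table.
# Only a few models are listed here; add others the same way.
RATES = {
    "Deepseek-V3": (2.00, 2.00),
    "Deepseek-R1": (3.00, 8.00),
    "Llama3.1 8B": (0.05, 0.20),
}


def estimate_cost(model, input_tokens, output_tokens):
    """Return the estimated cost in USD for one workload on the given model."""
    input_rate, output_rate = RATES[model]
    # Input and output tokens are priced separately, each per million tokens.
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# 200k input tokens and 50k output tokens on Deepseek-R1:
# 0.2 * $3 + 0.05 * $8 = $0.60 + $0.40
print(f"${estimate_cost('Deepseek-R1', 200_000, 50_000):.2f}")  # prints $1.00
```

Because output tokens often cost more than input tokens, workloads that generate long responses are usually dominated by the output rate.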

Image generation Text-to-image

Model          Price per image   Images per $1
Flux Schnell   $0.003            333
Flux Dev       $0.025            40
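
The "Images per $1" column is simply the budget divided by the per-image price, rounded down to whole images. The sketch below generalizes that to any budget. The prices are taken from the table above; `IMAGE_PRICES` and `images_for_budget` are hypothetical names chosen for this example.

```python
from decimal import Decimal

# Per-image prices in USD, copied from the image generation table.
IMAGE_PRICES = {
    "Flux Schnell": Decimal("0.003"),
    "Flux Dev": Decimal("0.025"),
}


def images_for_budget(model, budget_usd):
    """Return how many whole images a budget (in USD) covers for a model."""
    # Decimal avoids binary floating-point rounding on prices like 0.003.
    budget = Decimal(str(budget_usd))
    return int(budget // IMAGE_PRICES[model])


print(images_for_budget("Flux Schnell", 1))  # prints 333
print(images_for_budget("Flux Dev", 1))      # prints 40
```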

On-demand GPU Pricing

GPU type      Price per GPU   CPU   RAM      VRAM
NVIDIA B300   $9.99/hr        30    275 GB   288 GB
NVIDIA B200   $7.99/hr        30    184 GB   180 GB
NVIDIA H200   $5.99/hr        44    182 GB   141 GB
NVIDIA H100   $3.99/hr        32    185 GB   80 GB
NVIDIA A100   $1.99/hr        22    120 GB   80 GB
NVIDIA L40S   $1.79/hr        20    60 GB    48 GB

On-demand CPU Pricing

CPU type   vCPU    RAM          Price per hour
AMD EPYC   4-360   16-1440 GB   from $0.16

On-demand Storage Pricing

Storage type   Bandwidth   IOPS   Price per GB
NVMe           2000 MB/s   100k   $0.2/month
Questions We've got answers

Need Help? We’ve Got You.

From pricing to features, here are the answers to your most common questions.

How is GPU usage billed?
We offer two billing models: on-demand GPU servers are billed by the minute for the time your server instance is active, while our API endpoints for text and image generation are billed per token and per image, respectively.

For GPU servers, you pay from the moment you spin up an instance until you terminate it.
For API inference, you're charged only for successful generation requests. No ongoing server costs or idle time charges.
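
As a rough illustration of per-minute GPU billing, the sketch below prorates the hourly rates from the GPU pricing table over the minutes an instance is active. The names `GPU_HOURLY_RATES` and `gpu_server_cost` are hypothetical, and the function takes whole minutes as input because how partial minutes are rounded is not stated here.

```python
# Hourly on-demand rates in USD per GPU, copied from the GPU pricing table.
GPU_HOURLY_RATES = {
    "NVIDIA B300": 9.99,
    "NVIDIA B200": 7.99,
    "NVIDIA H200": 5.99,
    "NVIDIA H100": 3.99,
    "NVIDIA A100": 1.99,
    "NVIDIA L40S": 1.79,
}


def gpu_server_cost(gpu_type, active_minutes, gpu_count=1):
    """Estimate the cost in USD of a GPU server billed by the minute."""
    # Assumption: the per-minute rate is the hourly rate divided by 60.
    per_minute = GPU_HOURLY_RATES[gpu_type] / 60
    return per_minute * active_minutes * gpu_count


# Eight H100 GPUs active for 45 minutes, from spin-up to termination.
print(f"${gpu_server_cost('NVIDIA H100', 45, gpu_count=8):.2f}")
```

The instance accrues cost for as long as it is active, so terminating idle servers is the main lever for keeping on-demand GPU spend down.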
Do you offer volume discounts?
Yes, we provide tiered pricing depending on commitment.
Higher volume customers can also access enterprise pricing with custom rates, dedicated support, and flexible billing terms.

Contact our sales team for volume pricing above certain thresholds.
What GPU types are available and how do they affect pricing?
We offer various GPU tiers from cost-effective options for lighter workloads to high-performance GPUs for demanding applications.

Pricing varies by GPU type, so you can choose the GPU that best fits your performance and budget requirements.
Are there any setup fees or minimum commitments?
Our pay-as-you-go model has no setup fees or minimum monthly commitments. You can start with as little as a few API calls.

However, reserved instances and enterprise plans may have minimum commitments in exchange for significant cost savings.
How do you protect my data and models?
All data is encrypted in transit and at rest using industry-standard AES-256 encryption. We implement zero-trust network architecture, and your data is never used to train our models or shared with other customers. Each customer environment is isolated with dedicated compute resources and secure API endpoints.
What enterprise features and support do you provide?
Enterprise customers receive dedicated account management, priority support with guaranteed response times, custom SLAs, and access to beta features.

We also offer cluster solutions, custom training, and integration assistance with your existing infrastructure and workflows.
How do you compare to other AI service providers?
Unlike larger providers, we specialize exclusively in AI inference with optimized infrastructure and competitive pricing. We offer more flexible deployment options, faster response times, and personalized support.

Our focus on both GPU servers and API endpoints gives you more control over your AI workloads than API-only providers.
