Prices do not include taxes by default. If you are a business registered in the EU, or an individual, please contact us so we can apply the correct VAT to your subscription.
Use all the pre-trained models. You are invoiced after the fact, based on usage.
• No fixed cost: pay only for what you use
• Automatically get a $15 FREE credit
• All our pre-trained models are available
• Asynchronous mode included
• Monitor your usage in your dashboard
• Parallel requests: 10 (can be increased)
On CPU: $0.003 per request
On GPU: $0.005 per request
ChatDolphin/Yi-34B/Mixtral-8x7B: + $0.0005 per 1K tokens
LLaMA 3.1 405B and Fine-tuned LLaMA 3 70B: + $0.0018 per 1K tokens
Whisper: + $0.0001 per second (duration of your audio or video file)
Speech T5: + $0.0006 per 1K tokens
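As a rough illustration of how these rates combine, the sketch below estimates the cost of a single pay-as-you-go request: the CPU or GPU base price plus any model-specific surcharge. It assumes that token surcharges are billed pro rata (2,000 tokens cost twice the per-1K rate); the actual rounding rules are not stated here, and the model keys are hypothetical labels, not official model identifiers.

```python
from decimal import Decimal

# Base price per request (USD), from the pay-as-you-go price list.
BASE_PRICE = {"cpu": Decimal("0.003"), "gpu": Decimal("0.005")}

# Surcharges per 1K tokens (USD), added on top of the base price.
# The dictionary keys are hypothetical labels, not official model IDs.
PER_1K_TOKENS = {
    "chatdolphin": Decimal("0.0005"),
    "yi-34b": Decimal("0.0005"),
    "mixtral-8x7b": Decimal("0.0005"),
    "llama-3.1-405b": Decimal("0.0018"),
    "llama-3-70b-finetuned": Decimal("0.0018"),
    "speech-t5": Decimal("0.0006"),
}

# Whisper surcharge per second of audio or video (USD).
WHISPER_PER_SECOND = Decimal("0.0001")


def request_cost(hardware: str, model: str = "", tokens: int = 0,
                 audio_seconds: float = 0.0) -> Decimal:
    """Estimate one request's cost; assumes tokens are billed pro rata per 1K."""
    cost = BASE_PRICE[hardware]
    if model in PER_1K_TOKENS:
        cost += PER_1K_TOKENS[model] * tokens / 1000
    elif model == "whisper":
        # str() avoids binary float artifacts when converting to Decimal.
        cost += WHISPER_PER_SECOND * Decimal(str(audio_seconds))
    return cost
```

For example, a GPU request to LLaMA 3.1 405B with 2,000 tokens evaluates to $0.005 + 2 × $0.0018 = $0.0086. `Decimal` is used instead of `float` so that sub-cent amounts add up exactly.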
Pre-paid plans are the most cost-effective solution if you plan to make a large volume of requests to our pre-trained AI models.
The cost is fixed and paid upfront at the beginning of the month. Unlike the pay-as-you-go plan, there is no variable cost based on usage.
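To see where a pre-paid plan starts paying off, divide its monthly price by the pay-as-you-go price per request. The sketch below does this with the base per-request prices only; it ignores the per-token and per-second surcharges (which would make a pre-paid plan pay off even sooner) as well as the rate limits of each pre-paid plan. `break_even_requests` is a hypothetical helper name.

```python
from decimal import Decimal, ROUND_CEILING


def break_even_requests(monthly_price: Decimal, price_per_request: Decimal) -> int:
    """Smallest monthly request count at which the fixed plan costs no more
    than paying per request (base price only, surcharges ignored)."""
    ratio = monthly_price / price_per_request
    return int(ratio.to_integral_value(rounding=ROUND_CEILING))


# Starter GPU ($99/month) vs. pay-as-you-go GPU requests ($0.005 each).
print(break_even_requests(Decimal("99"), Decimal("0.005")))   # 19800
# Starter ($29/month) vs. pay-as-you-go CPU requests ($0.003 each).
print(break_even_requests(Decimal("29"), Decimal("0.003")))   # 9667
```

Rounding up matters: at 9,666 CPU requests, pay-as-you-go still costs slightly less than $29, so the Starter plan only becomes the cheaper option from 9,667 requests per month.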
All the pre-paid plans can be stopped at any time. You only pay for the time you use the service: the invoiced amount is automatically prorated, so if you downgrade, you get a discount on your next invoice. The only exception is the On-Premise plan, which is not prorated.
The rate limit is in "requests per minute" by default, but you can ask us to change this to "per hour" or "per day". This change is free of charge on the Enterprise plan and above.
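If you want to stay under your plan's limit on the client side, a sliding-window counter works for any of these windows: set `window` to 60 seconds for a per-minute limit, 3,600 for per-hour, or 86,400 for per-day. This is a minimal, single-threaded sketch; `SlidingWindowLimiter` is a hypothetical helper, not part of any NLP Cloud client library, and the server may count requests differently (for example, in fixed calendar windows).

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Client-side throttle: at most `limit` calls in any rolling `window` seconds."""

    def __init__(self, limit: int, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window  # 60 = per minute, 3600 = per hour, 86400 = per day
        self.clock = clock
        self.calls: Deque[float] = deque()

    def try_acquire(self) -> bool:
        """Record a call and return True if it fits in the window, else False."""
        now = self.clock()
        # Drop timestamps that have left the rolling window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.limit:
            self.calls.append(now)
            return True
        return False
```

Calling `try_acquire()` before each request and backing off when it returns `False` keeps the client from sending requests the server would reject. The `clock` argument makes the limiter easy to test with a fake time source.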
On a GPU, AI models run around 10x faster on average than on a CPU.
Not sure which plan is best for you? Ask our support team!
Plan | Free | Starter | Full | Enterprise | Starter GPU | Full GPU | Enterprise GPU | Large Language Models |
---|---|---|---|---|---|---|---|---|
Price | $0 | $29/month ($1/day) | $59/month ($2/day) | $229/month ($7/day) | $99/month ($3.5/day) | $199/month ($7/day) | $699/month ($23/day) | $2,499/month ($80/day) |
Parallel Requests | 2 | 10 | 20 | 40 | 10 | 20 | 40 | 50 |
Whisper on GPU (requests per minute) | Variable | | | | 1 | 3 | 10 | 50 |
LLaMA 3.1 405B and Fine-tuned LLaMA 3 70B on GPU (requests per minute) | Variable | | | | 3 | 10 | 70 | 350 |
Dolphin/ChatDolphin/Yi-34B/Mixtral-8x7B on GPU (requests per minute) | Variable | | | | 10 | 30 | 200 | 1000 |
Speech T5 on GPU (requests per minute) | Variable | | | | 10 | 30 | 200 | 1000 |
Bart Large CNN on GPU (requests per minute) | Variable | | | | 10 | 30 | 200 | 1000 |
Bart Large CNN on CPU (requests per minute) | Variable | 10 | 30 | 200 | | | | |
Bart Large MNLI Yahoo Answers on GPU (requests per minute) | Variable | | | | 10 | 30 | 200 | 1000 |
Bart Large MNLI Yahoo Answers on CPU (requests per minute) | Variable | 10 | 30 | 200 | | | | |
XLM Roberta Large XNLI on GPU (requests per minute) | Variable | | | | 10 | 30 | 200 | 1000 |
XLM Roberta Large XNLI on CPU (requests per minute) | Variable | 10 | 30 | 200 | | | | |
T5 Base EN Generate Headlines on GPU (requests per minute) | Variable | | | | 10 | 30 | 200 | 1000 |
T5 Base EN Generate Headlines on CPU (requests per minute) | Variable | 10 | 30 | 200 | | | | |
Distilbert Base SST 2 on GPU (requests per minute) | Variable | | | | 10 | 30 | 200 | 1000 |
Distilbert Base SST 2 on CPU (requests per minute) | Variable | 10 | 30 | 200 | | | | |
Distilbert Base Emotion on GPU (requests per minute) | Variable | | | | 10 | 30 | 200 | 1000 |
Distilbert Base Emotion on CPU (requests per minute) | Variable | 10 | 30 | 200 | | | | |
NLLB 200 3.3B on GPU (requests per minute) | Variable | | | | 10 | 30 | 200 | 1000 |
NLLB 200 3.3B on CPU (requests per minute) | Variable | 10 | 30 | 200 | | | | |
Paraphrase Multilingual Mpnet Base V2 on GPU (requests per minute) | Variable | | | | 10 | 30 | 200 | 1000 |
Paraphrase Multilingual Mpnet Base V2 on CPU (requests per minute) | Variable | 10 | 30 | 200 | | | | |
SpaCy on CPU (requests per minute) | Variable | 10 | 30 | 200 | | | | |
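The "Parallel Requests" row is read here as the maximum number of requests in flight at the same time. One way to respect that cap from an async Python client is an `asyncio.Semaphore` sized to your plan's limit. In this sketch, `call_api` is a hypothetical placeholder that sleeps instead of making a real HTTP call, and `PARALLEL_LIMIT = 10` matches the Starter column only as an example.

```python
import asyncio

PARALLEL_LIMIT = 10  # example value: the Starter plan's parallel requests


async def call_api(i: int) -> int:
    """Hypothetical stand-in for one HTTP request to the API."""
    await asyncio.sleep(0.01)
    return i


async def run_all(n: int, limit: int = PARALLEL_LIMIT) -> tuple[list[int], int]:
    """Run n calls with at most `limit` in flight; return results and peak concurrency."""
    sem = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def guarded(i: int) -> int:
        nonlocal in_flight, peak
        async with sem:  # waits while `limit` calls are already running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await call_api(i)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(guarded(i) for i in range(n)))
    return list(results), peak
```

`asyncio.run(run_all(25))` returns the 25 results in their original order, along with the peak number of concurrent calls, which never exceeds the limit. Extra requests simply wait for a slot instead of being rejected by the server.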
• Train/fine-tune your own ChatDolphin/LLaMA 3 70B/Mixtral-8x7B model
• Automatically deploy your model to a basic dedicated GPU server
• Maximum context size in each request: 1024 tokens
• Parallel requests: 5
• Response time: 3 seconds per 50 generated tokens
• Price is fixed, no matter the size of your dataset or the number of requests you are making
• Pay-as-you-go on all the pre-trained models
$399 / month ($13 / day) for 1 deployed model on 1 dedicated server
(3 fine-tunings per month included for free, then + $19 per fine-tuning)
(+ $379 per additional server or model)
• Train/fine-tune your own ChatDolphin/LLaMA 3 70B/Mixtral-8x7B model
• Automatically deploy your model to a cutting-edge dedicated GPU server
• Maximum context size in each request: 16,384 tokens
• Parallel requests: 20
• Response time: 1 second per 50 generated tokens
• Price is fixed, no matter the size of your dataset or the number of requests you are making
• Pay-as-you-go on all the pre-trained models
$990 / month ($33 / day) for 1 deployed model on 1 dedicated server
(3 fine-tunings per month included for free, then + $19 per fine-tuning)
(+ $890 per additional server or model)
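The two dedicated GPU options differ mainly in context size and generation speed. The sketch below turns the listed figures into quick estimates. It assumes that generation time scales linearly with the number of generated tokens and that the maximum context size covers the prompt plus the generated tokens, as is common for LLMs; neither assumption is stated above, and `DedicatedGpuPlan` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DedicatedGpuPlan:
    """Hypothetical container for the figures listed for each dedicated GPU option."""
    max_context_tokens: int
    seconds_per_50_tokens: float


BASIC = DedicatedGpuPlan(max_context_tokens=1024, seconds_per_50_tokens=3.0)
CUTTING_EDGE = DedicatedGpuPlan(max_context_tokens=16_384, seconds_per_50_tokens=1.0)


def estimated_generation_seconds(plan: DedicatedGpuPlan, generated_tokens: int) -> float:
    """Linear extrapolation of the advertised response time (an assumption)."""
    return plan.seconds_per_50_tokens * generated_tokens / 50


def fits_context(plan: DedicatedGpuPlan, prompt_tokens: int, generated_tokens: int) -> bool:
    """Assumes the context limit covers prompt and generated tokens together."""
    return prompt_tokens + generated_tokens <= plan.max_context_tokens
```

For 200 generated tokens, this estimates about 12 seconds on the basic server and 4 seconds on the cutting-edge one. A 900-token prompt with 200 generated tokens exceeds the basic plan's 1,024-token context but fits easily within 16,384 tokens.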
• Fine-tune your own semantic search model
• Automatically deploy your model to a GPU server
• Response time: 1 second
• Price is fixed, no matter the size of your dataset or the number of requests you are making
• Maximum dataset size: 1 million examples
• Pay-as-you-go on all the pre-trained models
$299 / month ($10 / day) for 1 deployed model on 1 dedicated server
(3 fine-tunings per month included for free, then + $19 per fine-tuning)
(+ $279 per additional server or model)
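All three fine-tuning plans follow the same pricing pattern: a base price for one model on one server, a fixed price per additional server or model, and 3 included fine-tunings per month before $19 per extra run. The sketch below encodes that pattern. It assumes the 3 included fine-tunings apply to the whole subscription rather than per model, and it leaves out pay-as-you-go usage on pre-trained models and optional add-ons such as a specific region or cloud provider. The plan keys are hypothetical labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuningPlan:
    base_price: int                    # USD/month for 1 model on 1 dedicated server
    extra_unit_price: int              # USD/month per additional server or model
    included_fine_tunings: int = 3     # free fine-tunings per month
    extra_fine_tuning_price: int = 19  # USD per fine-tuning beyond the included ones


# Prices from the three fine-tuning plans above; the keys are hypothetical labels.
PLANS = {
    "basic_gpu": FineTuningPlan(base_price=399, extra_unit_price=379),
    "cutting_edge_gpu": FineTuningPlan(base_price=990, extra_unit_price=890),
    "semantic_search": FineTuningPlan(base_price=299, extra_unit_price=279),
}


def monthly_cost(plan: FineTuningPlan, units: int = 1, fine_tunings: int = 0) -> int:
    """Monthly price in USD for `units` deployed servers/models and `fine_tunings` runs."""
    if units < 1 or fine_tunings < 0:
        raise ValueError("units must be >= 1 and fine_tunings >= 0")
    extra_runs = max(0, fine_tunings - plan.included_fine_tunings)
    return (plan.base_price
            + (units - 1) * plan.extra_unit_price
            + extra_runs * plan.extra_fine_tuning_price)
```

For example, the basic GPU plan with 2 deployed models and 5 fine-tunings in a month comes to $399 + $379 + 2 × $19 = $816.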
• Choose a specific continent or country
• Many regions available (US, France, Germany, Asia, Middle East, and more)
+ $249 / month ($8 / day).
Please contact us.
• Choose a specific cloud provider
• Many cloud providers available (AWS, GCP, OVH, Scaleway, and more)
+ $249 / month ($8 / day).
Please contact us.
• Deploy models in-house within your own infrastructure
• No data is sent to the NLP Cloud servers (no internet connection required)
• Suited for sensitive data (e.g., medical or financial applications)
• You can fine-tune your own model on NLP Cloud and then deploy it on-premise
• A 1-hour consulting session is automatically included
$649 / month (not prorated).
Please contact us.
Not sure how to start your next natural language processing project, how to deal with MLOps, or how to make the most of these new AI models? Our highly skilled AI experts will be happy to help you and provide training!
$200 / hour.
Please contact us.
Do you have an awesome AI project but lack the technical skills to build it? Our technical experts can integrate natural language processing into your application!
$200 / hour.
Please contact us.
Many more plans can be created for you: a custom number of requests per minute, a mix of pre-trained and custom models, a specific plan for large language models, rate limiting per hour instead of per minute, and much more! Just let us know.
Plans can also be paid in other currencies. Please ask us for more information if needed.