$0.10 per node, per hour.
Nodes are fixed units of compute and memory that serve a function. A typical Pinecone service uses both data nodes and model nodes.
Data nodes keep up to 1 GB of indexed data in memory and answer vector queries. Model nodes host and apply models that preprocess data and queries (e.g., embedding) and postprocess query results (e.g., ranking).
$720 worth of usage to start
Get started with Pinecone and use up to 10 nodes for 30 days at no charge, a $720 value. No credit card required.
Contact us about precommitment discounts, enterprise features, and VPC deployments.
Example 1: One-Node Service
The simplest Pinecone service contains a single data node, and it is already quite useful: it can index 1 million 250-dimensional vectors and answer top-10 nearest-neighbor queries in under 100 ms.
Total cost per month is $0.10 × 1 node × 24 hours × 30 days = $72
Example 2: Ten-Node Production Service
This Pinecone service uses three shards, so each data node holds only ⅓ of the data. It can index up to 3 million 250-dimensional vectors. Each data node is replicated to improve resilience and double throughput while keeping latencies low, for a total of 6 data nodes.
There are also 4 model nodes: the query embedding (preprocessing) and ranking (postprocessing) models are each replicated to improve resilience and double throughput.
Total cost per month is $0.10 × (6 data nodes + 4 model nodes) × 24 hours × 30 days = $720.
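The arithmetic in both examples can be checked with a tiny helper (the function name is ours, not part of the Pinecone API):

```python
def monthly_cost(nodes, rate_cents_per_hour=10):
    """Monthly cost in dollars at $0.10 per node per hour, 24 h x 30 days.

    Computing in cents keeps the result exact in floating point.
    """
    return nodes * rate_cents_per_hour * 24 * 30 / 100

# Example 1: one data node
print(monthly_cost(1))   # 72.0
# Example 2: 6 data nodes + 4 model nodes
print(monthly_cost(10))  # 720.0
```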
Common Questions About Billing
Do you bill by the minute or by the hour?
Usage is billed by the minute. For example, a one-node service running for 30 minutes will be charged $0.05.
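Per-minute billing for the one-node example can be sketched the same way (the helper name is illustrative, not a Pinecone API):

```python
def charge(nodes, minutes, rate_cents_per_hour=10):
    # Bill per minute: cents per hour -> cents per minute, then dollars
    return nodes * rate_cents_per_hour * minutes / 60 / 100

print(charge(1, 30))  # 0.05
```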
When does the 30-day trial start?
The trial countdown starts when you receive your API key by email.
What cost controls are there?
For each service deployed, you control the number of nodes and how long you run them. Billed time starts when you call pinecone.service.deploy() and ends when you call pinecone.service.stop(); the service is billed per minute. The number of data nodes is the number of shards times the number of replicas. The number of model nodes is the number of preprocessor and postprocessor nodes, including replicas.
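The node-count rules above can be expressed as a minimal sketch (the function and parameter names are our assumption, not Pinecone's API):

```python
def total_nodes(shards, replicas, preprocessors=0, postprocessors=0, model_replicas=1):
    # Data nodes: shards x replicas
    data_nodes = shards * replicas
    # Model nodes: pre- and postprocessor models, including their replicas
    model_nodes = (preprocessors + postprocessors) * model_replicas
    return data_nodes + model_nodes

# The ten-node production service above: 3 shards x 2 replicas,
# plus an embedding and a ranking model, each with 2 replicas
print(total_nodes(3, 2, preprocessors=1, postprocessors=1, model_replicas=2))  # 10
```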
What happens when the free trial runs out?
If there is a credit card on file, the service will continue without interruption. Monthly billing starts when the 30-day trial ends.
If there is no credit card on file, the service will stop. Don’t worry, we will remind you 48 hours before the 30-day trial ends.
What happens if I need more than 10 nodes during the first 30 days?
We will still honor the trial of 10 nodes for 30 days; you will only be charged for the nodes above 10. For example, if you run a 16-node service, you will be charged for only 6 of those nodes. A credit card is required to enable this.
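The trial overage rule above can be stated directly (the helper name is ours):

```python
def billable_nodes(deployed, free_nodes=10):
    # During the 30-day trial, only nodes above the free allotment are billed
    return max(0, deployed - free_nodes)

print(billable_nodes(16))  # 6
print(billable_nodes(8))   # 0
```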
Common Questions About Nodes and Services
How much data can fit on a single data node?
1 GB of data including vectors, ID strings, and metadata.
What models can fit on a single model node?
A model requiring up to 3.6 GB of memory can run on a single model node.
Can Pinecone’s database be used without pre- or post-processors?
Yes, in fact this is quite common.
What performance (latency/QPS) can be expected?
Without customer models in the query path, you should expect:
- Approximate search
  - Latency: p99 <100 ms
  - Throughput: 50 QPS per replica
- Exact search
  - Latency: p99 <100 ms
  - Throughput: 15 QPS per replica
If customers specify models in the query path, latency increases by the time needed to apply them.
Why use replicas?
Replicas improve throughput and reliability, helping to meet specified requirements for QPS and uptime.
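Assuming the per-replica throughput figures above (50 QPS per replica for approximate search, 15 for exact), sizing a replica set for a target QPS is a one-liner; this is a sketch, not a Pinecone API:

```python
import math

def replicas_for_qps(target_qps, qps_per_replica=50):
    # Round up so the replica set meets or exceeds the target throughput
    return max(1, math.ceil(target_qps / qps_per_replica))

print(replicas_for_qps(120))     # 3 replicas for 120 QPS approximate search
print(replicas_for_qps(40, 15))  # 3 replicas for 40 QPS exact search
```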
How many shards will an application need?
Each data node (shard) contains up to 1 GB of data. The minimum number of shards is therefore the number of GB of memory required to store IDs, vectors, and metadata.
Example calculation for number of shards for vector data with no metadata:
| Items | Dimensions | Memory for items (GB) | Minimum shards |
|---|---|---|---|
| 1,000,000 | 250 | 1 | 1 |
| 3,000,000 | 250 | 3 | 3 |
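Assuming float32 vectors (4 bytes per value) and ignoring ID and metadata overhead, the calculation can be sketched as:

```python
import math

def min_shards(items, dims, bytes_per_value=4):
    # Memory for the raw vectors in GB; each shard holds up to 1 GB
    memory_gb = items * dims * bytes_per_value / 1e9
    return max(1, math.ceil(memory_gb))

print(min_shards(1_000_000, 250))  # 1
print(min_shards(3_000_000, 250))  # 3
```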
What recall is expected for approximate-nearest-neighbor search?
Pinecone is designed to be very accurate out of the box, without parameter tuning. Benchmarked over many different data sets, our approximate-search algorithm consistently outperforms leading open-source alternatives in accuracy, query time, and index time.
Users can also invoke exact vector search which guarantees perfect recall/accuracy. However, it is more computationally intensive and will likely result in higher latency.