Start for free, then pay only for what you use.

Pay $0.10 per node, per hour.

Nodes are fixed units of compute and memory that serve a function. A typical Pinecone service uses both data nodes and model nodes.

Data nodes keep up to 1 GB of indexed data in memory and answer vector queries. Model nodes host models that preprocess data and queries (e.g. embedding) and postprocess query results (e.g. ranking).

Get $720 worth of usage to start

Get started with Pinecone and use up to 10 nodes for 30 days at no charge, a $720 value. No credit card required.

Need more?

Contact us about precommitment discounts, enterprise features, and VPC deployments.


Example 1: One-Node Service

The simplest Pinecone service contains a single data node, which is already useful on its own: it can index 1 million 250-dimensional vectors and answer top-10 nearest-neighbor search queries in under 100 ms.

Total cost per month is $0.10 × 1 node × 24 hours × 30 days = $72.

[Figure: Pricing example with one-node service]

Example 2: Ten-Node Production Service

This Pinecone service uses three shards, so each data node holds only ⅓ of the data. It can index up to 3 million 250-dimensional vectors. Each shard runs as two replicas to improve resilience and double the throughput while keeping latencies low, for a total of 6 data nodes.

There are also 4 model nodes. The query embedding model is replicated to improve resilience and double throughput.

Total cost per month is $0.10 × (6 data nodes + 4 model nodes) × 24 hours × 30 days = $720.
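
The arithmetic in both examples can be reproduced with a short script. This is only a sketch of the formula above: the function name `monthly_cost` is illustrative and is not part of any Pinecone client, and it assumes a service that runs continuously for a 30-day month.

```python
from decimal import Decimal

# Price quoted on this page: $0.10 per node, per hour.
RATE_PER_NODE_HOUR = Decimal("0.10")


def monthly_cost(nodes: int, hours_per_day: int = 24, days: int = 30) -> Decimal:
    """Cost in dollars of running `nodes` nodes continuously for `days` days."""
    return RATE_PER_NODE_HOUR * nodes * hours_per_day * days


# Example 1: a single data node.
print(monthly_cost(1))
# Example 2: 6 data nodes + 4 model nodes.
print(monthly_cost(6 + 4))
```

Decimal is used instead of float so that the totals match the dollar figures above exactly.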

[Figure: Pricing example with ten-node service]

Common Questions About Billing

Do you bill by the minute or by the hour?

Usage is billed by the minute. For example, a one-node service running for 30 minutes is charged $0.05.
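
As a rough illustration of per-minute billing, the hourly rate can be prorated by the number of minutes run. The helper `usage_charge` is hypothetical, and the sketch does not round the result because this page does not say how fractions of a cent are handled.

```python
from decimal import Decimal

# Price quoted on this page: $0.10 per node, per hour.
RATE_PER_NODE_HOUR = Decimal("0.10")


def usage_charge(nodes: int, minutes: int) -> Decimal:
    """Prorated charge in dollars for `nodes` nodes running for `minutes` minutes."""
    return RATE_PER_NODE_HOUR * nodes * minutes / 60


# One node for 30 minutes, as in the answer above.
print(usage_charge(nodes=1, minutes=30))
```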

When does the 30-day trial start?

The trial countdown starts when you receive the API key by email.

What cost controls are there?

For each service deployed, you control the number of nodes and how long it runs. Billed time starts when you call pinecone.service.deploy() and ends when you call pinecone.service.stop(), and it is counted per minute. The number of data nodes is the number of shards times the number of replicas. The number of model nodes is the number of preprocessor and postprocessor nodes, including replicas.
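
The node count described above can be sketched as follows. The function names are illustrative only, and the sketch assumes that "replicas" counts every copy of a shard, including the first, which matches the six data nodes (3 shards × 2 replicas) in Example 2.

```python
def data_node_count(shards: int, replicas: int) -> int:
    """Data nodes = shards x replicas (assumes replicas includes the first copy)."""
    return shards * replicas


def total_node_count(shards: int, replicas: int, model_nodes: int) -> int:
    """All billed nodes: data nodes plus model nodes (model replicas included)."""
    return data_node_count(shards, replicas) + model_nodes


# Example 2 from this page: 3 shards, 2 replicas, 4 model nodes.
print(total_node_count(shards=3, replicas=2, model_nodes=4))
```

A service with no pre- or postprocessors simply passes `model_nodes=0`.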

What happens when the free trial runs out?

If there is a credit card on file, the service will continue without interruption. Monthly billing starts when the 30-day trial ends.

If there is no credit card on file, the service will stop. We will remind you 48 hours before the 30-day trial ends.

What happens if I need more than 10 nodes during the first 30 days?

We will still honor the trial of 10 nodes for 30 days. You will be charged only for the nodes above 10. For example, if you run a 16-node service, you will be charged for only 6 of those nodes. A credit card is required to enable this.
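
The trial allowance works out as a simple subtraction. The sketch below uses hypothetical helper names and assumes the 10 free nodes apply to whatever is running at a given moment during the trial.

```python
from decimal import Decimal

RATE_PER_NODE_HOUR = Decimal("0.10")  # price quoted on this page
TRIAL_FREE_NODES = 10                 # nodes covered by the 30-day trial


def billable_nodes(nodes: int) -> int:
    """Nodes charged during the trial: only those above the free allowance."""
    return max(0, nodes - TRIAL_FREE_NODES)


def trial_hourly_charge(nodes: int) -> Decimal:
    """Hourly charge in dollars during the trial for a service of `nodes` nodes."""
    return RATE_PER_NODE_HOUR * billable_nodes(nodes)


# The 16-node example above: 6 nodes are billable.
print(billable_nodes(16), trial_hourly_charge(16))
```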

Common Questions About Nodes and Services

How much data can fit on a single data node?

1 GB of data including vectors, ID strings, and metadata.

What models can fit on a single model node?

A model requiring up to 3.6 GB of memory can run on a single model node.

Can Pinecone’s database be used without pre- or post-processors?

Yes. In fact, this is quite common.

What performance (latency and QPS) can be expected?

Without customer models in the query path, you should expect:

  • Approximate search
    • Latency: p99 <100 ms
    • Throughput: 50 QPS per replica
  • Exact search
    • Latency: p99 <100 ms
    • Throughput: 15 QPS per replica

If customers specify models in the query path, latency will increase by the time needed to apply them.
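
For capacity planning, the per-replica throughput figures above suggest a rough estimate of how many replicas a target query rate needs. This sketch assumes throughput scales linearly with the number of replicas and that no customer models sit in the query path; `replicas_for_qps` is an illustrative name, not a Pinecone function.

```python
# Per-replica throughput figures quoted above (queries per second).
QPS_PER_REPLICA = {"approximate": 50, "exact": 15}


def replicas_for_qps(target_qps: int, search_type: str = "approximate") -> int:
    """Smallest replica count whose combined throughput meets `target_qps`.

    Assumes throughput grows linearly with the number of replicas.
    """
    per_replica = QPS_PER_REPLICA[search_type]
    # Integer ceiling division; at least one replica is always needed.
    return max(1, -(-target_qps // per_replica))


print(replicas_for_qps(120, "approximate"))
print(replicas_for_qps(120, "exact"))
```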

Why use replicas?

Replicas improve throughput and reliability, helping to meet specified requirements for QPS and uptime.

How many shards will an application need?

Each data node (shard) contains up to 1 GB of data. The minimum number of shards is therefore the number of GB of memory required to store IDs, vectors, and metadata.

Example calculation for number of shards for vector data with no metadata:

Items | Dimensions | Memory for items (GB) | Minimum shards

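
The shard calculation for vector data with no metadata can be approximated in a few lines. This is a sketch under stated assumptions: each vector component is a 4-byte float, 1 GB is taken as 10^9 bytes, and ID strings are ignored. These assumptions are consistent with the one-node example on this page (1 million 250-dimensional vectors on a single 1 GB data node), but the function names and the sample inputs are illustrative.

```python
BYTES_PER_COMPONENT = 4        # assumption: 32-bit float vector components
NODE_CAPACITY_BYTES = 10**9    # 1 GB per data node, taken as 10^9 bytes


def vector_memory_bytes(items: int, dimensions: int) -> int:
    """Memory for the raw vectors only (no IDs, no metadata)."""
    return items * dimensions * BYTES_PER_COMPONENT


def min_shards(items: int, dimensions: int) -> int:
    """Minimum number of shards (data nodes, before replication)."""
    total = vector_memory_bytes(items, dimensions)
    # Integer ceiling division; at least one shard is always needed.
    return max(1, -(-total // NODE_CAPACITY_BYTES))


for items, dims in [(1_000_000, 250), (3_000_000, 250), (10_000_000, 128)]:
    gb = vector_memory_bytes(items, dims) / NODE_CAPACITY_BYTES
    print(f"{items:>12,} items x {dims} dims: {gb:.2f} GB, {min_shards(items, dims)} shard(s)")
```

Metadata and ID strings add to the memory requirement, so real deployments may need more shards than this lower bound.
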
What recall is expected for approximate-nearest-neighbor search?

Pinecone is designed to be very accurate out of the box, without parameter tuning. Benchmarked over many different data sets, our approximate-search algorithm consistently outperforms leading open-source alternatives in accuracy, query time, and index time.

Users can also invoke exact vector search, which guarantees perfect recall. However, it is more computationally intensive and will likely result in higher latency and lower throughput per replica.

What will you build?

Get started or contact us for a customized demo.