Pinecone Dedicated Read Nodes are in Public Preview: Predictable speed and cost for billion-vector and high-QPS workloads
Pinecone Dedicated Read Nodes (DRN)

Predictable speed and cost for billion-vector and high-QPS workloads

Billion-vector-scale semantic search

With strict latency requirements

High-QPS recommendations

That need steady, predictable throughput

Mission-critical AI services

With hard SLOs

Large enterprise or multitenant platforms

That require performance isolation

Lower, more predictable cost

Hourly per-node pricing is more cost-effective than per-request pricing for sustained, high-QPS workloads and makes spend easier to forecast.

Predictable costs

Pay a predictable hourly rate for DRN instead of fluctuating costs based on the number of queries.

Easy to forecast

Tie node count directly to spend so you can model, budget, and adjust costs as traffic grows.

Efficient at high QPS

High-throughput workloads see a lower cost per query with DRN than with per-request pricing.
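The comparison above can be made concrete with a small back-of-the-envelope model. This is a minimal sketch, not Pinecone's pricing calculator: the hourly node rate and the per-million-query price are hypothetical placeholders, and the sketch assumes that the billed node count equals shards times replicas.

```python
import math

HOURS_PER_MONTH = 730  # average month: 8760 hours / 12


def monthly_node_cost(shards, replicas, hourly_rate_per_node):
    """Monthly spend under hourly per-node pricing.

    Assumption: billed nodes = shards * replicas.
    """
    nodes = shards * replicas
    return nodes * hourly_rate_per_node * HOURS_PER_MONTH


def monthly_per_request_cost(avg_qps, price_per_million_queries):
    """Monthly spend under a simplified per-request price (hypothetical)."""
    queries_per_month = avg_qps * HOURS_PER_MONTH * 3600
    return queries_per_month / 1_000_000 * price_per_million_queries


def break_even_qps(shards, replicas, hourly_rate_per_node,
                   price_per_million_queries):
    """Sustained QPS above which per-node pricing is the cheaper option."""
    node_cost_per_second = shards * replicas * hourly_rate_per_node / 3600
    cost_per_query = price_per_million_queries / 1_000_000
    return node_cost_per_second / cost_per_query


# Hypothetical prices, for illustration only.
qps = break_even_qps(shards=2, replicas=2, hourly_rate_per_node=0.9,
                     price_per_million_queries=10.0)
print(f"Break-even at about {qps:.0f} sustained QPS")
```

Because node cost is flat while per-request cost grows linearly with traffic, any workload that sustains more than the break-even QPS pays less per query on nodes. Substitute current list prices before drawing conclusions.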

Predictable low latency and high throughput at scale

DRN powers workloads of 100M to 1B+ vectors at hundreds to thousands of QPS, delivering p50 latencies in the tens of milliseconds.

E-commerce marketplace. Recommendations. 1.4B vectors.

2.7k QPS, unfiltered: p50 60 ms, p99 100 ms

5.7k QPS, filtered (0.26% avg. selectivity): p50 26 ms, p99 60 ms

Design platform. Semantic search. 135M vectors.

600 QPS: p50 45 ms, p99 96 ms

Media company. Semantic search. 480M vectors.

380 QPS: p50 80 ms, p99 170 ms

Scale for your largest workloads

DRN is built for billion-vector semantic search and high-QPS recommendation systems, so you can grow without re-architecting or migrating.

Click to scale

Add replicas to increase throughput and shards to grow storage. No reindexing or manual tuning is required.
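To see how the two knobs map to a workload, here is a rough sizing sketch. It assumes shards are driven by vector count and replicas by query rate, and that the total node count is shards times replicas. The per-shard capacity and per-replica throughput values are made-up placeholders, not Pinecone specifications; measure them for your own node type and data.

```python
import math


def plan_nodes(total_vectors, target_qps, vectors_per_shard, qps_per_replica):
    """Estimate shards and replicas for a workload.

    vectors_per_shard and qps_per_replica are hypothetical capacity
    figures that must come from your own benchmarks.
    """
    # Shards grow storage: enough shards to hold every vector.
    shards = max(1, math.ceil(total_vectors / vectors_per_shard))
    # Replicas grow throughput: enough copies to serve the target QPS.
    replicas = max(1, math.ceil(target_qps / qps_per_replica))
    return {"shards": shards, "replicas": replicas, "nodes": shards * replicas}


# 1.4B vectors at 2.7k QPS, with placeholder capacity numbers.
plan = plan_nodes(
    total_vectors=1_400_000_000,
    target_qps=2_700,
    vectors_per_shard=250_000_000,  # placeholder, not a Pinecone spec
    qps_per_replica=500,            # placeholder, not a Pinecone spec
)
print(plan)
```

The two dimensions are independent: growing data adds shards without changing the replica count, and growing traffic adds replicas without changing the shard count.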

No migrations required

Pinecone moves data and adjusts read capacity behind the scenes, with no downtime or performance degradation, so you never have to plan or run migrations.


One API, two vector database modes

The combination of On-Demand and Dedicated Read Nodes powers a wide range of production workloads with the right price-performance for each.

On-Demand

Autoscaling for bursty or multitenant workloads with simple, usage-based pricing.

Dedicated Read Nodes

Dedicated, provisioned read nodes, a warm data path, and simple scaling with hourly per-node pricing for predictable speed and cost.

Deploy in seconds

Scale seamlessly.

search/pinecone.py
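from pinecone import Pinecone, ServerlessSpec

# Authenticate with your Pinecone API key.
pc = Pinecone(api_key="<API KEY>")

# Name of the index to create (placeholder; choose your own).
index_name = "example-index"

# Create an index that serves reads from Dedicated Read Nodes.
pc.create_index(
    name=index_name,
    dimension=1024,
    metric="cosine",
    spec=ServerlessSpec(
        cloud="aws",
        region="us-east-1",
        read_capacity={
            "mode": "Dedicated",
            "dedicated": {
                "node_type": "b1",
                "scaling": "Manual",
                # Shards grow storage; replicas increase throughput.
                "manual": {
                    "shards": 2,
                    "replicas": 2,
                },
            },
        },
    ),
)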

Start building knowledgeable AI today

Create your first index for free, then pay as you go when you're ready to scale.