Predictable speed and cost for billion-vector and high-QPS workloads
Billion-vector-scale semantic search
With strict latency requirements
High-QPS recommendations
That need steady, predictable throughput
Mission-critical AI services
With hard SLOs
Large enterprise or multitenant platforms
That require performance isolation
Lower, more predictable cost
Hourly per-node pricing is more cost-effective than per-request pricing for sustained, high-QPS workloads and makes spend easier to forecast.
Predictable costs
Pay a predictable hourly rate for DRN instead of costs that fluctuate with query volume.
Easy to forecast
Tie node count directly to spend so you can model, budget, and adjust costs as traffic grows (a simple forecasting example follows below).
Efficient at high QPS
High-throughput workloads see a lower cost per query with DRN than with per-request pricing.
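As a back-of-the-envelope forecast, total spend is just node count times the hourly rate. The sketch below illustrates the arithmetic; the $2.00/hour rate and node counts are hypothetical placeholders, not published DRN pricing.

# Hypothetical forecast: total nodes = shards x replicas,
# monthly spend = nodes x hourly rate x hours in a month.
# The hourly rate below is a placeholder, not actual DRN pricing.
shards, replicas = 2, 2
hourly_rate = 2.00        # placeholder per-node rate, USD/hour
hours_per_month = 730

nodes = shards * replicas
monthly_spend = nodes * hourly_rate * hours_per_month
print(f"{nodes} nodes -> ${monthly_spend:,.2f} per month")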
Predictable low-latency and high throughput at scale
DRN powers workloads of 100M to 1B+ vectors at hundreds to thousands of QPS, delivering p50 latencies in the tens of milliseconds.
E-commerce marketplace. Recommendations. 1.4B vectors.
2.7k QPS – Unfiltered
5.7k QPS – Filtered (0.26% avg. selectivity)
Design platform. Semantic search. 135M vectors.
600 QPS
Media company. Semantic search. 480M vectors.
380 QPS
Scale for your largest workloads
DRN is built for billion-vector semantic search and high-QPS recommendation systems, so you can grow without re-architecting or migrating.
Click to scale
Add replicas to increase throughput and shards to grow storage; no reindexing or manual tuning required (sketched below).
No migrations required
Pinecone moves data and adjusts read capacity behind the scenes, with no downtime or performance degradation, so you never have to plan or run migrations.
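As a rough sketch of what scaling looks like from the Python SDK: the snippet assumes configure_index accepts a read_capacity argument mirroring the shape used at index creation, and the index name is a placeholder. Treat it as illustrative and check the SDK reference for the exact parameters.

from pinecone import Pinecone

pc = Pinecone(api_key="<API KEY>")

# Assumed shape: raise replicas from 2 to 4 to add read throughput,
# keeping the same shard count. Parameter names are illustrative.
pc.configure_index(
    name="example-index",
    read_capacity={
        "mode": "Dedicated",
        "dedicated": {
            "node_type": "b1",
            "scaling": "Manual",
            "manual": {
                "shards": 2,
                "replicas": 4
            }
        }
    },
)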

One API, two vector database modes
The combination of On-Demand and Dedicated Read Nodes powers a wide range of production workloads with the right price-performance for each.
On-Demand
Autoscaling for bursty or multi-tenant workloads with simple, usage-based pricing (shown in the example below).
Dedicated Read Nodes
Dedicated, provisioned read nodes, a warm data path, and simple scaling with hourly per-node pricing for predictable speed and cost.
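For comparison, an On-Demand index is a standard serverless index with no dedicated read-capacity configuration; the index name below is a placeholder. The Dedicated variant appears in the full example that follows.

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="<API KEY>")

# On-Demand: autoscaling reads with usage-based pricing,
# no provisioned read nodes.
pc.create_index(
    name="on-demand-example",
    dimension=1024,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)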
Deploy in seconds.
Scale seamlessly.
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="<API KEY>")

index_name = "example-index"  # placeholder index name

# Create a serverless index backed by Dedicated Read Nodes:
# shards grow storage, replicas grow read throughput.
pc.create_index(
    name=index_name,
    dimension=1024,
    metric="cosine",
    spec=ServerlessSpec(
        cloud="aws",
        region="us-east-1",
        read_capacity={
            "mode": "Dedicated",
            "dedicated": {
                "node_type": "b1",
                "scaling": "Manual",
                "manual": {
                    "shards": 2,
                    "replicas": 2
                }
            }
        },
    )
)
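Once the index is ready, reads go through the same API regardless of mode. A minimal query sketch, assuming the placeholder index name from above and a placeholder 1024-dimensional query vector:

# Query the index; with DRN, reads are served by the provisioned nodes.
index = pc.Index(index_name)

results = index.query(
    vector=[0.1] * 1024,   # placeholder query embedding
    top_k=10,
    include_metadata=True,
)
print(results)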
Start building knowledgeable AI today
Create your first index for free, then pay as you go when you're ready to scale.