Pinecone's pod-based architecture is legacy infrastructure. The current Pinecone architecture is serverless, built on object storage with decoupled storage and compute. This page explains the differences, why serverless is the recommended path forward, and how to migrate.
If you're starting a new project, use serverless. It's the current architecture, it's what Pinecone actively develops and optimizes, and it eliminates the operational overhead of managing pods.
If you have existing pod-based indexes, Pinecone provides migration paths to serverless. The rest of this page explains why you should migrate and how the two architectures differ.
Pods were Pinecone's original deployment model. A pod was a pre-configured unit of compute and storage: essentially a fixed-size virtual machine running a Pinecone index. You chose a pod type (s1, p1, p2) based on your performance and storage needs, and you provisioned a specific number of pods.
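For context, creating a pod-based index looked roughly like the sketch below. This is a hedged illustration assuming the official `pinecone` Python client; the index name, environment, and sizing values are illustrative, not a recommendation.

```python
# Illustrative pod sizing choices you had to make up front.
# s1 = storage-optimized, p1/p2 = performance-optimized pod types.
pod_config = {
    "environment": "us-east-1-aws",  # pod indexes were tied to an environment
    "pod_type": "p1.x1",             # fixed compute/storage ratio per pod type
    "pods": 2,                       # fixed capacity, billed even when idle
}

def create_pod_index():
    # Client call kept inside a function so the sketch runs without credentials.
    from pinecone import Pinecone, PodSpec  # assumed client imports
    pc = Pinecone(api_key="YOUR_API_KEY")
    pc.create_index(
        name="my-index-pods",   # hypothetical index name
        dimension=1536,
        metric="cosine",
        spec=PodSpec(**pod_config),
    )
```

Note that both the pod type and the pod count are fixed at creation: changing either later meant an explicit reconfiguration, which is exactly the operational overhead serverless removes.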
Pods had several limitations inherent to their architecture:
If you needed more storage, you also paid for more compute (and vice versa). Scaling required provisioning more pods, even if only one dimension (storage or throughput) was the bottleneck.
You provisioned a specific number of pods and paid for them whether they were fully utilized or idle. Handling traffic spikes meant over-provisioning.
Adding capacity required explicit pod count changes. There was no elastic auto-scaling.
Each pod type had a maximum vector count. Scaling beyond that required adding pods and re-distributing data.
Indexing algorithms were fixed at pod creation time. Upgrading algorithms required manual re-indexing.
Pinecone's serverless architecture, now the current and recommended deployment model, fundamentally rethinks how vector data is stored and queried. Instead of coupling compute and storage in fixed pods, serverless separates them:
Vectors are stored in immutable files called slabs on object storage (e.g., Amazon S3). Data is durable, distributed, and decoupled from the compute layer.
A fleet of stateless query executors caches slabs on local SSDs and processes queries in parallel. The executor pool scales dynamically based on demand.
An asynchronous index builder processes writes into the slab structure, using a Write-Ahead Log for durability and a memtable for immediate query availability.
This is not a minor upgrade; it is a fundamentally different architecture. For a deep technical explanation, see .
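The write path described above can be sketched as a toy model: writes are appended to a log and an in-memory table (so they are immediately queryable), and a flush step turns the memtable into immutable slab-like files. This is an intuition-building illustration only, not Pinecone's implementation; all names are invented.

```python
class ToyIndex:
    """Toy model of a log + memtable + immutable-slab write path."""

    def __init__(self):
        self.wal = []        # write-ahead log: durability for acknowledged writes
        self.memtable = {}   # recent writes: immediate query availability
        self.slabs = []      # immutable files on "object storage"

    def upsert(self, vec_id, vector):
        self.wal.append((vec_id, vector))  # acknowledged once logged
        self.memtable[vec_id] = vector     # visible to queries right away

    def flush(self):
        # Stand-in for the async index builder: freeze the memtable into a slab.
        if self.memtable:
            self.slabs.append(dict(self.memtable))
            self.memtable = {}

    def fetch(self, vec_id):
        # Check fresh writes first, then newest-to-oldest slabs.
        if vec_id in self.memtable:
            return self.memtable[vec_id]
        for slab in reversed(self.slabs):
            if vec_id in slab:
                return slab[vec_id]
        return None

idx = ToyIndex()
idx.upsert("a", [0.1, 0.2])
print(idx.fetch("a"))  # [0.1, 0.2] -- queryable before any slab exists
idx.flush()
print(idx.fetch("a"))  # [0.1, 0.2] -- now served from an immutable slab
```

The key property to notice is that a write is queryable as soon as it is acknowledged, while the heavier index-building work happens asynchronously in the background.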
The table below compares the pod-based (legacy) and serverless (current) architectures across key capabilities.
| Capability | Pods (Legacy) | Serverless (Current) |
|---|---|---|
| Architecture | Coupled compute + storage in fixed VMs | Decoupled: object storage + stateless compute |
| Scaling | Manual pod provisioning | Automatic elastic scaling (On-Demand) or replica-based (DRN) |
| Write latency | Varies by pod type and load | Under 100ms acknowledgment |
| Write-to-query freshness | Seconds to minutes depending on configuration | Seconds (via memtable) |
| Indexing algorithm | Fixed at creation time | Dynamically chosen per-slab; upgraded transparently |
| Quantization | User-managed (or pod defaults) | Automatic, optimized during background compaction |
| Metadata filtering | Supported, can degrade performance | Accelerates queries via roaring bitmaps and adaptive pre/mid-filtering |
| Re-indexing required | Yes, for algorithm changes or major updates | Never; compaction handles optimization transparently |
| Storage efficiency | Full vectors in RAM per pod | Full-fidelity on object storage; optimized projections in executor cache |
| Max scale | Limited by pod count and type | Billions of vectors across distributed slabs |
| Pricing model | Per-pod-hour (fixed capacity) | Per-read-unit (On-Demand) or per-shard flat fee (DRN) |
| Idle cost | Full pod cost even when idle | Zero queries = minimal cost (On-Demand) |
| Status | Legacy; supported but not actively enhanced | Current; actively developed and optimized |
On pods, the indexing algorithm was fixed when the index was created. If Pinecone developed a better algorithm, you had to manually re-index to benefit.
On serverless, Pinecone's background compaction process continuously re-optimizes your data. When new algorithms are developed (like the recent Product-Quantized Fast Scan), they are applied to your existing indexes during compaction, without downtime, re-ingestion, or any action on your part.
Pods charge per-hour regardless of utilization. If your traffic is bursty (heavy during business hours, quiet at night), you pay full price for idle pods.
On-Demand serverless charges per read unit consumed. Zero queries means minimal cost. For sustained high-throughput workloads, Dedicated Read Nodes offer flat-fee pricing without per-query charges.
With pods, you had to choose pod types, manage pod counts, monitor utilization, and plan capacity. With serverless, Pinecone handles all of this. You interact with indexes and namespaces; the infrastructure is invisible.
Serverless indexes use roaring bitmap indexes for every metadata field and dynamically choose between pre-filtering and mid-scan filtering based on selectivity. This makes selective filters accelerate queries rather than slow them down.
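As an illustration, a filtered query might look like the sketch below. It assumes the official `pinecone` Python client and uses Pinecone's standard filter operators; the index name, metadata fields, and filter values are hypothetical.

```python
# A selective metadata filter using Pinecone's standard filter operators.
# On serverless, filters like this can accelerate queries via the per-field
# bitmap indexes rather than slowing them down.
selective_filter = {
    "genre": {"$eq": "documentary"},
    "year": {"$gte": 2020},
}

def filtered_query(vector):
    # Client call kept inside a function so the sketch runs without credentials.
    from pinecone import Pinecone  # assumed client import
    pc = Pinecone(api_key="YOUR_API_KEY")
    index = pc.Index("my-index-serverless")  # hypothetical index name
    return index.query(
        vector=vector,
        top_k=10,
        filter=selective_filter,
        include_metadata=True,
    )
```

Whether the engine applies this filter before or during the scan is decided per query based on selectivity; your query code does not change either way.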
Serverless stores your vectors at full precision on object storage and applies optimized quantization internally during compaction. You never sacrifice accuracy for scalability.
Migration involves four straightforward steps. Your pod-based index remains available during the entire process, so there is no downtime.
Create a new index using the serverless architecture. Specify the same metric (cosine, euclidean, or dotproduct) and dimensions as your pod-based index.
Export vectors from your pod-based index and upsert them into the new serverless index. For large datasets, use batch operations and parallel processing. Your pod-based index remains available during migration.
Point your application to the new serverless index. The query and upsert APIs are the same; no code changes are needed beyond updating the index name or host.
Once you've validated that the serverless index is serving correctly, delete the pod-based index to stop incurring pod charges.
For example, to create the serverless index in Python:
```python
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="YOUR_API_KEY")

pc.create_index(
    name="my-index-serverless",
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(
        cloud="aws",
        region="us-east-1"
    )
)
```

Pinecone provides tooling and documentation for data migration. The key consideration is that this is a data copy operation: your pod-based index remains available during the migration, so there is no downtime. The query and upsert APIs are the same; no code changes are needed beyond updating the index name or host.
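A minimal migration loop might look like the following sketch. It assumes the official `pinecone` Python client; the index names, the list of vector IDs, and the batch size are illustrative, and the exact fetch/upsert response shapes may vary by client version.

```python
def batched(items, size):
    """Yield successive fixed-size batches from a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def migrate(ids, batch_size=100):
    # Client calls kept inside a function so the sketch runs without credentials.
    from pinecone import Pinecone  # assumed client import
    pc = Pinecone(api_key="YOUR_API_KEY")
    src = pc.Index("my-index-pods")        # hypothetical pod index name
    dst = pc.Index("my-index-serverless")  # hypothetical serverless index name
    for id_batch in batched(ids, batch_size):
        fetched = src.fetch(ids=id_batch)  # read a batch from the pod index
        dst.upsert(vectors=[
            (v.id, v.values, v.metadata or {})
            for v in fetched.vectors.values()
        ])
```

For large datasets, running several such loops in parallel over disjoint ID ranges keeps throughput high while the pod index stays online.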
Pods are Pinecone's legacy architecture. They are still supported for existing users, but all new indexes should use the serverless architecture. Pinecone's active development, algorithm improvements, and performance optimizations are focused on the serverless platform.
Always use serverless for new projects. It is Pinecone's current architecture, offers automatic scaling, background algorithm optimization, full-fidelity vector storage, and eliminates the operational overhead of managing pods.
Yes. Migration involves creating a new serverless index, exporting data from your pod-based index, and re-ingesting it into the serverless index. Your pod-based index remains available during migration, so there is no downtime. The query and upsert APIs are the same; only the index name or host changes.
Pods coupled compute and storage in fixed virtual machines that you provisioned and managed. Serverless decouples storage (object storage like S3) from compute (stateless query executors), scales automatically, applies algorithm upgrades transparently, and eliminates manual infrastructure management. Serverless is the current architecture; pods are legacy.
Pods are legacy infrastructure. While existing pod-based indexes continue to be supported, Pinecone's active development and optimization efforts are focused on the serverless architecture. New users should start with serverless, and existing pod users are encouraged to migrate.
For a complete technical explanation of how Pinecone's serverless architecture works, see . For migration assistance, contact our solutions team at .
Migrate from pods to serverless and unlock automatic scaling, algorithm upgrades, and full-fidelity vector storage, with zero operational overhead.