Performance tuning
This section provides some tips for getting the best performance out of Pinecone.
Basic performance checklist
- Switch to a cloud environment. For example: EC2, GCE, Google Colab, GCP AI Platform Notebook, or SageMaker Notebook. If you experience slow uploads or high query latencies, it might be because you are accessing Pinecone from your home network.
- Deploy your application and your Pinecone service in the same region. Contact us if you need a dedicated deployment.
- Reuse connections. We recommend you reuse the same pinecone.Pinecone.Index() instance when you are upserting vectors into, and querying, the same index.
- Operate within known limits.
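As a minimal sketch of connection reuse, a cached factory ensures each index name maps to one long-lived handle instead of a new object per request. The helper below is hypothetical (not part of the Pinecone client); `factory` stands in for a real constructor such as `pc.Index`:

```python
# Sketch of connection reuse: build each Index handle once and cache it,
# instead of constructing a new one per request.
# `factory` is a stand-in for a real constructor such as pc.Index.
from functools import lru_cache

def make_index_getter(factory):
    """Wrap an Index factory so repeated lookups return the same handle."""
    @lru_cache(maxsize=None)
    def get_index(name: str):
        return factory(name)
    return get_index
```

With the real client this would be `get_index = make_index_getter(pc.Index)`; every later call to `get_index("example-index")` then reuses the same handle and its underlying connections.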
Increasing throughput
Batch upserts
When upserting larger amounts of data, upsert data in batches of 100-500 vectors over multiple upsert requests. Batching significantly reduces the time it takes to process data.
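As a sketch, the batching above can be driven by a small chunking helper. The names `index` and `vectors` are placeholders for a live index handle and your prepared vector tuples:

```python
# Split an iterable of vectors into fixed-size batches for upserting.
from itertools import islice

def chunks(iterable, batch_size=200):
    """Yield successive tuples of up to batch_size items."""
    it = iter(iterable)
    batch = tuple(islice(it, batch_size))
    while batch:
        yield batch
        batch = tuple(islice(it, batch_size))

# With a live index handle (placeholder names):
# for batch in chunks(vectors, batch_size=200):
#     index.upsert(vectors=batch)
```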
Send upserts in parallel
Pinecone is thread-safe, so you can send multiple read and write requests in parallel to help increase throughput. You can read more about high-throughput optimizations on our blog.
Note
For serverless indexes, reads and writes follow independent paths, so you can send multiple read and write requests in parallel to improve throughput.
For pod-based indexes, multiple reads can be performed in parallel, and multiple writes can be performed in parallel, but multiple reads and writes cannot be performed in parallel. Therefore, write batches may affect query latency, and read batches may affect write throughput.
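One way to parallelize writes is to fan prepared batches out across a thread pool. This is a sketch, with `index` standing in for a live index handle (any object with an `upsert()` method):

```python
# Sketch of parallel upserts: one upsert call per batch, fanned out
# across a thread pool. `index` is a placeholder for a live index handle.
from concurrent.futures import ThreadPoolExecutor

def upsert_in_parallel(index, batches, max_workers=10):
    """Submit each batch as its own upsert and wait for all to finish."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(index.upsert, vectors=batch) for batch in batches]
        return [f.result() for f in futures]  # re-raises any failed upsert
```

Keep the note above in mind for pod-based indexes: large parallel write batches may affect query latency while they run.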
Scale pod-based indexes
To increase throughput (QPS) for pod-based indexes, increase the number of replicas for your index. See the configure_index API reference for more details.
Note
With serverless indexes, you don't configure any compute or storage resources, and you don't need to manually scale resources to increase throughput. Instead, serverless indexes scale automatically based on usage.
Example
The following example increases the number of replicas for example-index to 4.
Python:

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
pc.configure_index("example-index", replicas=4)
```

JavaScript:

```javascript
import { Pinecone } from '@pinecone-database/pinecone'

const pc = new Pinecone({
  apiKey: 'YOUR_API_KEY'
});

await pc.configureIndex('example-index', { replicas: 4 });
```

curl:

```shell
PINECONE_API_KEY="YOUR_API_KEY"

curl -s -X PATCH "https://api.pinecone.io/indexes/example-index" \
  -H "Content-Type: application/json" \
  -H "Api-Key: $PINECONE_API_KEY" \
  -d '{
    "replicas": 4
  }'
```
Decreasing latency
Use namespaces
When you use namespaces to partition records within a single index, you can limit queries to specific namespaces to reduce the number of records scanned. For more details, see Namespaces.
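Restricting a query to one namespace is a matter of passing the namespace parameter on the query. The thin wrapper below is a sketch of the call shape, with `index` as a placeholder for a live index handle:

```python
# Sketch: query only within one namespace so fewer records are scanned.
# `index` is a placeholder for a live index handle.
def query_in_namespace(index, vector, namespace, top_k=3):
    """Run a query restricted to a single namespace."""
    return index.query(vector=vector, top_k=top_k, namespace=namespace)
```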
Use metadata filtering
When you attach metadata key-value pairs to records, you can filter queries to retrieve only records that match the metadata filter. For more details, see Metadata filtering.
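As a sketch of the call shape, a metadata filter is passed directly on the query; `index` is a placeholder handle and the filter uses Pinecone's operator syntax (e.g. $eq):

```python
# Sketch: retrieve only records whose metadata matches the filter.
# `index` is a placeholder for a live index handle.
def query_with_filter(index, vector, metadata_filter, top_k=3):
    """Run a query restricted to records matching a metadata filter."""
    return index.query(vector=vector, top_k=top_k, filter=metadata_filter)

# Example filter: {"genre": {"$eq": "documentary"}}
```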
Warning
For p2 pod-based indexes, metadata filters can increase query latency.
Avoid network calls to fetch index hosts
When you target an index, the Python and Node.js clients make a network call to fetch the host where the index is deployed. In a production situation, you can avoid this additional round trip by specifying the host of the index as follows:
```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index(host="INDEX_HOST")
```
You can get the host of an index using the Pinecone console or the describe_index operation. For more details, see Get an index endpoint.