Performance Tuning


Tips and best practices for getting the most out of Pinecone.

Reuse Connections

Under the hood, pinecone.Index connects to Pinecone by establishing a gRPC channel. A gRPC channel should be reused so that RPC calls share the existing HTTP/2 connection. In other words, when upserting and querying the same index, you should reuse the same pinecone.Index() instance.
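One way to follow this advice in Python is to create the index object once and hand the same instance to every caller. The sketch below is self-contained: `connect` and `get_index` are hypothetical helpers standing in for `pinecone.Index` (which would establish the real gRPC channel), so only the caching pattern itself is shown.

```python
from functools import lru_cache

class _StubIndex:
    """Stand-in for pinecone.Index; each instance represents one gRPC channel."""
    def __init__(self, name: str):
        self.name = name

def connect(name: str) -> _StubIndex:
    # In a real application this would be pinecone.Index(name),
    # which opens the gRPC channel.
    return _StubIndex(name)

@lru_cache(maxsize=None)
def get_index(name: str) -> _StubIndex:
    # Cache one instance per index name so upserts and queries
    # share the existing HTTP/2 connection instead of opening a
    # new channel on every call.
    return connect(name)

a = get_index("example-index")
b = get_index("example-index")
assert a is b  # the same connection is reused
```

The anti-pattern to avoid is calling `pinecone.Index(...)` inside a loop or per-request handler, which opens a fresh channel each time.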

Minimize Latency

Pinecone Beta is deployed in the AWS US West (Oregon) region (us-west-2). To minimize latency when you access Pinecone, consider deploying your application in the same region.

Contact us if you need a dedicated deployment in other regions. We currently support AWS and GCP.

Slow Uploads or High Latencies?

Pinecone supports very high throughput (10K+ vectors per second). If you experience slow uploads or high query latencies, it may be because you are accessing Pinecone from your home network. Switch to a cloud environment such as EC2, GCE, Google Colab, GCP AI Platform Notebook, or SageMaker Notebook for significant performance improvements.

How to Upload >1GB of Data

When your data exceeds 1GB, be sure to use more than one shard. As a general guideline, add one shard to your index for every additional GB of data. Refer to the documentation on how to specify the number of shards for your index.
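The guideline above reduces to simple arithmetic: roughly one shard per GB of data, never fewer than one. The helper below is a hypothetical illustration (`shards_for` is not part of the Pinecone client); you would pass its result to the index-creation call described in the documentation.

```python
import math

def shards_for(data_size_gb: float) -> int:
    # Rule of thumb from above: roughly one shard per GB of data,
    # with a minimum of one shard.
    return max(1, math.ceil(data_size_gb))

print(shards_for(0.5))  # 1 -- under 1GB still needs one shard
print(shards_for(2.3))  # 3
```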


How to Increase Throughput

To increase throughput (QPS), increase the number of replicas for your index. Refer to the SDK Reference for how to specify the number of replicas for your index.
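Since throughput scales with replica count, you can estimate how many replicas a target QPS requires. The per-replica throughput figure below is an illustrative assumption, not a guarantee; measure your own baseline with a single replica first. `replicas_for` is a hypothetical helper name.

```python
import math

def replicas_for(target_qps: float, qps_per_replica: float) -> int:
    # Divide the target throughput by the measured single-replica
    # throughput and round up; always keep at least one replica.
    return max(1, math.ceil(target_qps / qps_per_replica))

# Assuming a measured baseline of 100 QPS per replica:
print(replicas_for(450, 100))  # 5
```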


Unary Variants

Use unary_* when you handle only one item at a time, or when you have many concurrent clients accessing the same Pinecone index. The unary_* interface sends unary gRPC requests under the hood. For example, if you scale your web application horizontally, then you should use unary_query to send queries.
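For instance, a per-request handler in a horizontally scaled web service issues exactly one query per incoming request, which is the unary case. The sketch below uses a stub in place of a real index so it is self-contained; in real code `index` would be a shared `pinecone.Index` instance and the call would be `index.unary_query(...)` as described above.

```python
class _StubIndex:
    """Stand-in for a shared pinecone.Index instance."""
    def unary_query(self, vector, top_k=10):
        # A real index would send one unary gRPC request here and
        # return the top_k nearest neighbors for the query vector.
        return {"matches": [], "top_k": top_k}

# Created once and reused by every request handler (see
# "Reuse Connections" above).
index = _StubIndex()

def handle_search_request(query_vector):
    # Each web request carries a single query vector, so the
    # unary interface is the right fit.
    return index.unary_query(query_vector, top_k=5)

result = handle_search_request([0.1, 0.2, 0.3])
print(result["top_k"])  # 5
```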

What will you build?

Upgrade your search or recommendation systems with just a few lines of code, or contact us for help.