Query Data

Overview

After your data is indexed, you can send a query vector to Pinecone and retrieve IDs of the most similar items in the index, along with their similarity scores.

You specify the number of items to retrieve each time you send a query. Results are always ordered by similarity, from most similar to least similar.

Query

You can send multiple queries in one call and retrieve the top-k most similar vectors for each query:

index.query(queries=vectors, top_k=integer)
  • Replace vectors with a list of query vectors to search with.
  • Replace integer with the number of vectors to retrieve in the result.

This returns one QueryResult object per query:

QueryResult(ids, scores, data)
  • ids is an ordered list of IDs of the most similar vectors.
  • scores is a list of the corresponding similarity scores, in the same order as ids, computed with the similarity metric of your index.
  • data is an optional field that returns the values of the similar vectors. You can turn this on by adding the include_data=True keyword parameter to your query.
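To make the field layout concrete, here is a minimal sketch of reading a result. It assumes ids and scores are parallel lists, as the descriptions above suggest. The namedtuple is a local stand-in so the snippet runs without an index, and ranked_matches is a hypothetical helper, not part of the Pinecone client.

```python
from collections import namedtuple

# Local stand-in mirroring the fields described above (an assumption for
# illustration); real QueryResult objects come back from index.query.
QueryResult = namedtuple("QueryResult", ["ids", "scores", "data"])


def ranked_matches(result):
    """Pair each ID with its score, keeping the most-similar-first order."""
    return list(zip(result.ids, result.scores))


# Hand-written sample values, not output from a live index.
example = QueryResult(ids=["id-0", "id-1", "id-2"], scores=[-0.0, -2.0, -8.0], data=None)

for item_id, score in ranked_matches(example):
    print(item_id, score)
```

With a live index, the same helper could be applied to each element of the list returned by index.query.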

Example:

results = index.query(queries=[[0, 0], [2.1, 2.1]], top_k=3)
for res in results:
    print(res)

QueryResult(ids=['id-0', 'id-1', 'id-2'], scores=[-0.0, -1.9999998807907104, -7.999999523162842], data=None)
QueryResult(ids=['id-2', 'id-3', 'id-1'], scores=[-0.0199999138712883, -1.6199977397918701, -2.41999888420105], data=None)

Send One Query at a Time

You can also send a single query with unary_query. This is helpful if you have multiple clients accessing the same index simultaneously.

import numpy as np
index.unary_query(query=np.ones(2), top_k=3)  # single query vector [1., 1.]

Query by Namespace

To search within a single namespace (partition) of an index, pass the namespace parameter:

index.query(queries=vectors, top_k=integer, namespace=namespace)

Multiple namespaces can’t be queried at once. If no namespace is specified, the default namespace is queried.
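Because each call covers only one namespace, searching several of them means issuing one call per namespace. The sketch below shows that loop. query_each_namespace and fake_query are hypothetical names introduced for illustration, and fake_query is a local stand-in so the snippet runs without an index. It assumes the keyword arguments shown above (queries, top_k, namespace).

```python
def query_each_namespace(query_fn, vectors, top_k, namespaces):
    """Run the same queries against each namespace in turn.

    query_fn is expected to accept the keyword arguments shown above
    (queries, top_k, namespace); with a real index you would pass index.query.
    """
    return {
        ns: query_fn(queries=vectors, top_k=top_k, namespace=ns)
        for ns in namespaces
    }


# Stand-in for index.query (an assumption for illustration only): it returns
# a label per query instead of real QueryResult objects.
def fake_query(queries, top_k, namespace):
    return [f"{namespace}:query-{i}:top-{top_k}" for i in range(len(queries))]


results = query_each_namespace(fake_query, [[0, 0], [2.1, 2.1]], 3, ["ns-a", "ns-b"])
for ns, per_query in results.items():
    print(ns, per_query)
```

The results stay grouped by namespace here; whether they can be merged meaningfully depends on whether IDs are unique across your namespaces.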

Example

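In this example, we use Euclidean distance as the measure of similarity. The scores in the output are consistent with negative squared Euclidean distances, so a score closer to zero indicates a closer match. See Create Indexes for more information on how to use cosine similarity or max-dot-product to measure similarity.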

import pinecone
import pandas as pd
import numpy as np

pinecone.init(">>>YOUR_API_KEY<<<")

df = pd.DataFrame(data={
    "id": [f"id-{ii}" for ii in range(10000)],
    "vector": [ii + np.zeros(2) for ii in range(10000)]
})

pinecone.create_index("pinecone-index", metric="euclidean")
index = pinecone.Index("pinecone-index")
index.upsert(items=zip(df.id, df.vector))

results = index.query(queries=[[0, 0], [2.1, 2.1]], top_k=3)
for res in results:
    print(res)

QueryResult(ids=['id-0', 'id-1', 'id-2'], scores=[-0.0, -1.9999998807907104, -7.999999523162842], data=None)
QueryResult(ids=['id-2', 'id-3', 'id-1'], scores=[-0.0199999138712883, -1.6199977397918701, -2.41999888420105], data=None)
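As a sanity check on the output above, the following sketch reproduces the ranking locally with NumPy using a brute-force search. It assumes the score is the negative squared Euclidean distance, which matches the printed values but is inferred from them rather than taken from the Pinecone API. top_k_negative_squared_euclidean is a hypothetical helper, and only the first ten vectors of the example data are used.

```python
import numpy as np


def top_k_negative_squared_euclidean(query, ids, vectors, top_k):
    """Brute-force top-k search scored by negative squared Euclidean distance."""
    diffs = vectors - np.asarray(query, dtype=float)
    scores = -np.sum(diffs ** 2, axis=1)
    # Sort by descending score (closest to zero first) and keep the top k.
    order = np.argsort(-scores, kind="stable")[:top_k]
    return [ids[i] for i in order], [float(scores[i]) for i in order]


# The same layout as the example data: vector "id-i" is [i, i].
ids = [f"id-{ii}" for ii in range(10)]
vectors = np.array([[ii, ii] for ii in range(10)], dtype=float)

for query in ([0, 0], [2.1, 2.1]):
    print(top_k_negative_squared_euclidean(query, ids, vectors, top_k=3))
```

The IDs come out in the same order as in the output above, and the scores agree up to floating-point rounding (roughly -0.02, -1.62, and -2.42 for the second query).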

Once you're done, delete the index to free up computing resources. This also deletes all of the data in it.

pinecone.delete_index("pinecone-index")