Query Data

Overview

After your data are indexed, you can send a query vector and retrieve the most similar items in your index.

You specify the number of similar items to retrieve each time you send a query. Results are always ordered by similarity, most similar first.

Setup

Create an index and insert data.

import pinecone
import pandas as pd
import numpy as np

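# Authenticate with your API key.
pinecone.init(">>>YOUR_API_KEY<<<")

# Build 10,000 two-dimensional vectors: row ii has id "id-<ii>" and vector [ii, ii].
df = pd.DataFrame(data={
    "id": [f"id-{ii}" for ii in range(10000)],
    "vector": [ii + np.zeros(2) for ii in range(10000)]
})

# Create an index that uses the Euclidean metric, then insert the (id, vector) pairs.
pinecone.create_index("pinecone-index", metric="euclidean")
index = pinecone.Index("pinecone-index")
index.upsert(items=zip(df.id, df.vector))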
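
To make the shape of the inserted data concrete: each row pairs an id with a two-dimensional vector whose components both equal the row number. The sketch below builds the same (id, vector) pairs in plain Python, using lists of floats as a stand-in for the NumPy arrays (an assumption made only for illustration; it does not touch Pinecone).

```python
# Plain-Python stand-in for the DataFrame columns above (no pandas or NumPy).
ids = [f"id-{ii}" for ii in range(10000)]
# Same values as ii + np.zeros(2), but stored as plain lists of floats.
vectors = [[float(ii), float(ii)] for ii in range(10000)]

# zip pairs each id with its vector, mirroring zip(df.id, df.vector).
items = list(zip(ids, vectors))

print(len(items))  # 10000
print(items[0])    # ('id-0', [0.0, 0.0])
print(items[3])    # ('id-3', [3.0, 3.0])
```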

Query

In this example, we use Euclidean distance as the measure of similarity. See Create Indexes for more information on how to use cosine similarity or max-dot-product to measure similarity.

You can send multiple queries in one call and retrieve the top-k most similar vectors for each. Each query returns a QueryResult object with the following fields:

  • ids is an ordered list of the ids of the most similar vectors.
  • scores is an ordered list of scores corresponding to the similarity metric of your index.
  • data is an optional field that returns the values of the similar vectors. You can turn this on by adding the include_data=True keyword parameter to your query.

results = index.query(queries=[[0, 0], [2.1, 2.1]], top_k=3)
for res in results:
    print(res)

# QueryResult(ids=['id-0', 'id-1', 'id-2'], scores=[-0.0, -1.9999998807907104, -7.999999523162842], data=None)
# QueryResult(ids=['id-2', 'id-3', 'id-1'], scores=[-0.0199999138712883, -1.6199977397918701, -2.41999888420105], data=None)
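
The scores in the output above are consistent with the negative squared Euclidean distance between the query and each stored vector, so a score closer to zero means a closer match. The following brute-force sketch in plain Python checks that reading against the example data. It is not how Pinecone computes results; top_k is a hypothetical helper introduced here, and the small differences from the printed scores are assumed to come from floating-point precision.

```python
def top_k(query, vectors, k):
    """Return (ids, scores) for the k stored vectors closest to query.

    The score is the negative squared Euclidean distance, so the best
    match has the highest (least negative) score.
    """
    scored = []
    for vec_id, vec in vectors.items():
        sq_dist = sum((q - v) ** 2 for q, v in zip(query, vec))
        scored.append((-sq_dist, vec_id))
    # Highest score first, i.e. smallest distance first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    best = scored[:k]
    return [vec_id for _, vec_id in best], [score for score, _ in best]


# Same toy data as the Setup section: "id-<i>" maps to the vector [i, i].
vectors = {f"id-{i}": [float(i), float(i)] for i in range(10000)}

for query in ([0, 0], [2.1, 2.1]):
    ids, scores = top_k(query, vectors, k=3)
    print(ids, [round(s, 4) for s in scores])

# ['id-0', 'id-1', 'id-2'] [-0.0, -2.0, -8.0]
# ['id-2', 'id-3', 'id-1'] [-0.02, -1.62, -2.42]
```

A full scan like this costs time proportional to the number of stored vectors per query, which is fine for checking a toy example but is only meant to show what the ids and scores represent.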

Send One Query at a Time

You can send one query at a time. This is helpful if you have multiple clients accessing the same index simultaneously.

result = index.unary_query(query=1 + np.zeros(2), top_k=3)
print(result)

Delete the Index

Delete the index to free up computing resources. This also deletes all of the data in the index.

pinecone.delete_index("pinecone-index")