Query data

After your data is indexed, you can start sending queries to Pinecone.

The Query operation searches a namespace using one or more query vectors. For each query vector, it retrieves the ids of the most similar vectors in that namespace, along with their similarity scores, and can optionally include the vectors' values and metadata. You set the number of matches to retrieve (top_k) on each query. Matches are always ordered from most similar to least similar.
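To make the ordering concrete, here is a minimal brute-force sketch of what a query returns, assuming cosine similarity as the metric. It is a local model for illustration only, not how Pinecone searches an index; `query` and `cosine_similarity` are hypothetical helpers.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (higher = more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def query(vectors, query_vector, top_k):
    """Return the top_k (id, score) pairs, most similar first.

    `vectors` maps ids to vectors. This scans every vector, which is only
    practical for tiny illustrative data sets.
    """
    scored = [(vid, cosine_similarity(query_vector, v)) for vid, v in vectors.items()]
    # Higher cosine similarity means more similar, so sort in descending order
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


vectors = {
    "a": [1.0, 0.0],
    "b": [0.7, 0.7],
    "c": [0.0, 1.0],
}
# Prints the matches for "a" and "b", in that order, with their scores
print(query(vectors, [1.0, 0.1], top_k=2))
```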

This section explains how you can:

  • Send one or more query vectors and retrieve the ids of the most similar items in the index, along with their similarity scores
  • Search through a namespace (partition) within an index
  • Restrict a query to items whose metadata matches a filter

Learn more

Our Learn section explains the basics of vector databases and similarity search as a service.

Sending a query

You can send multiple query vectors in a single request and retrieve the top-k most similar vectors for each one. For example, this sends two query vectors and retrieves three matching vectors for each, including their values:

Python:
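import pinecone

pinecone.init(api_key="YOUR_API_KEY", environment="us-west1-gcp")
index = pinecone.Index("hello-pinecone")

index.query(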
  queries=[
    [0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3],
    [0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4]
  ],
  top_k=3,
  include_values=True
)

# Returns:
# {'results': [{'matches': [{'id': 'C',
#                            'score': -1.76717265e-07,
#                            'values': [0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3]},
#                           {'id': 'B',
#                            'score': 0.080000028,
#                            'values': [0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2]},
#                           {'id': 'D',
#                            'score': 0.0800001323,
#                            'values': [0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4]}],
#               'namespace': ''},
#              {'matches': [{'id': 'D',
#                            'score': 2.14875229e-07,
#                            'values': [0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4]},
#                           {'id': 'C',
#                            'score': 0.0799998939,
#                            'values': [0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3]},
#                           {'id': 'E',
#                            'score': 0.0800002143,
#                            'values': [0.49999997, 0.49999997, 0.49999997, 0.49999997, 0.49999997, 0.49999997, 0.49999997, 0.49999997]}],
#               'namespace': ''}]}

curl:
curl -i -X POST https://hello-pinecone-YOUR_PROJECT.svc.us-west1-gcp.pinecone.io/query \
  -H 'Api-Key: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "queries": [
      {"values": [0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3]},
      {"values": [0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4]}
    ],
    "topK": 3,
    "includeValues": true
  }'

# Output:
# {
#   "results":[
#     {
#       "matches":[
#         {
#           "id": "C",
#           "score": -1.76717265e-07,
#           "values": [0.3,0.3,0.3,0.3,0.3,0.3,0.3,0.3]
#         },
#         {
#           "id": "B",
#           "score": 0.080000028,
#           "values": [0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2]
#         },
#         {
#           "id": "D",
#           "score": 0.0800001323,
#           "values": [0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4]
#         }
#       ],
#       "namespace": ""
#     },
#     {
#       "matches":[
#         {
#           "id": "D",
#           "score": 2.14875229e-07,
#           "values": [0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4]
#         },
#         {
#           "id": "C",
#           "score": 0.0799998939,
#           "values": [0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3]
#         },
#         {
#           "id": "E",
#           "score": 0.0800002143,
#           "values": [0.49999997, 0.49999997, 0.49999997, 0.49999997, 0.49999997, 0.49999997, 0.49999997, 0.49999997]
#         }
#       ],
#       "namespace": ""
#     }
#   ]
# }
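The response is nested: `results` holds one entry per query vector, each with its own `matches` list. A minimal sketch of pulling out the ids, run against a plain-dict copy of the Python output above (values trimmed). Treating the response as a dict is an assumption; the client may return a response object instead. The sketch also checks one score: the numbers match squared Euclidean distance, which suggests this example index uses metric='euclidean'. `match_ids` and `squared_euclidean` are hypothetical helpers.

```python
# Plain-dict copy of the Python response above, with "values" omitted
response = {
    "results": [
        {"matches": [{"id": "C", "score": -1.76717265e-07},
                     {"id": "B", "score": 0.080000028},
                     {"id": "D", "score": 0.0800001323}],
         "namespace": ""},
        {"matches": [{"id": "D", "score": 2.14875229e-07},
                     {"id": "C", "score": 0.0799998939},
                     {"id": "E", "score": 0.0800002143}],
         "namespace": ""},
    ]
}


def match_ids(response):
    """Return one list of matched ids per query, in the order the queries were sent."""
    return [[m["id"] for m in result["matches"]] for result in response["results"]]


def squared_euclidean(a, b):
    """Sum of squared per-dimension differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


print(match_ids(response))  # [['C', 'B', 'D'], ['D', 'C', 'E']]
# Query [0.3]*8 vs. match B [0.2]*8: 8 dimensions * 0.1**2 = 0.08
print(round(squared_euclidean([0.3] * 8, [0.2] * 8), 6))  # 0.08
```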

Querying by namespace

To search within a single namespace (partition) of an index, pass the namespace name with the query:

Python:
index.query(queries=[[0.3] * 8, [0.4] * 8], top_k=3, namespace="namespace_a")
curl:
curl -i -X POST \
  'https://{index_name}-{project_name}.svc.{environment}.pinecone.io/query' \
  -H 'Api-Key: YOUR_API_KEY_HERE' \
  -H 'Content-Type: application/json' \
  -d '{
    "namespace": "namespace_a",
    "topK": 3,
    "queries": [
      {"values": [0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3]},
      {"values": [0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4]}
    ]
  }'
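The REST API uses camelCase field names (topK, includeValues) where the Python client uses snake_case arguments (top_k, include_values). A hypothetical helper that assembles a /query body from Python values, using only the fields that appear in the curl examples on this page; it builds the JSON string and does not send anything.

```python
import json


def build_query_body(queries, top_k, namespace=None, include_values=False, metadata_filter=None):
    """Assemble a JSON body for the /query endpoint (hypothetical helper, sends nothing)."""
    body = {
        # Each query vector is wrapped in an object with a "values" key
        "queries": [{"values": list(q)} for q in queries],
        "topK": top_k,
        "includeValues": include_values,
    }
    # Optional fields are added only when the caller sets them
    if namespace is not None:
        body["namespace"] = namespace
    if metadata_filter is not None:
        body["filter"] = metadata_filter
    return json.dumps(body)


print(build_query_body([[0.3] * 8, [0.4] * 8], top_k=3, namespace="namespace_a"))
```

The resulting string can be passed as the -d argument of the curl command above.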
note

Multiple namespaces can’t be queried at once. If no namespace is specified, the default namespace is queried.
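A minimal in-memory model of this rule, assuming an index is a mapping from namespace name to its vectors and that the default namespace is the empty string (the responses earlier on this page report 'namespace': ''). Squared Euclidean distance stands in for the index metric, and `query_namespace` is a hypothetical helper; each call searches exactly one namespace.

```python
def squared_euclidean(a, b):
    """Squared Euclidean distance; smaller means more similar."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical index contents: namespace name -> {id: vector}
index_data = {
    "": {"A": [0.0, 0.0], "B": [1.0, 1.0]},             # default namespace
    "namespace_a": {"C": [0.1, 0.1], "D": [5.0, 5.0]},
}


def query_namespace(index_data, query_vector, top_k, namespace=""):
    """Rank ids from a single namespace only; other namespaces are never scanned."""
    candidates = index_data.get(namespace, {})
    ranked = sorted(candidates, key=lambda vid: squared_euclidean(query_vector, candidates[vid]))
    return ranked[:top_k]


print(query_namespace(index_data, [0.0, 0.0], top_k=1))                          # ['A']
print(query_namespace(index_data, [0.0, 0.0], top_k=1, namespace="namespace_a"))  # ['C']
```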

Using metadata filters in queries

You can attach metadata to vector embeddings in Pinecone and then filter on those fields when sending a query. Pinecone searches for similar vectors only among the items that match the filter. For more information, see Metadata Filtering.

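This example shows a filtered search for movies whose genre is comedy, documentary, or drama, using the $in operator. Operators for matching a field against a list of values:

  • $in - In array (string)
  • $nin - Not in array (string)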
Python:
import pinecone

pinecone.init(api_key="your-api-key")
index = pinecone.Index("example-index-name")

query_response = index.query(
    queries=[
        [0.1, 0.2, 0.3, 0.4],
        [0.2, 0.3, 0.4, 0.5]
    ],
    filter={"genre": {"$in": ["comedy", "documentary", "drama"]}},
    namespace="example-namespace",
    top_k=10
)
curl:
curl -i -X POST \
  -H 'Api-Key: YOUR_API_KEY_HERE' \
  -H 'Content-Type: application/json' \
  'https://example-index-name-example-project.svc.beta.pinecone.io/query' \
  -d '{
    "topK": 10,
    "queries": [
      {"values": [0.1, 0.2, 0.3, 0.4]},
      {"values": [0.2, 0.3, 0.4, 0.5]}
    ],
    "filter": {"genre": {"$in": ["comedy", "documentary", "drama"]}},
    "namespace": "example-namespace"
  }'
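Conceptually, the filter narrows the candidate set before similarity ranking. A minimal sketch of evaluating the two operators listed above against each item's metadata; `matches_filter` and the sample movie metadata are hypothetical, and Pinecone applies filters on the server rather than like this.

```python
def matches_filter(metadata, metadata_filter):
    """Return True if `metadata` satisfies every $in / $nin condition in the filter.

    Only $in and $nin are handled here. A missing field is treated as None,
    which is an assumption of this sketch.
    """
    for field, condition in metadata_filter.items():
        value = metadata.get(field)
        for op, operand in condition.items():
            if op == "$in":
                if value not in operand:
                    return False
            elif op == "$nin":
                if value in operand:
                    return False
            else:
                raise ValueError(f"unsupported operator in this sketch: {op}")
    return True


movies = {
    "m1": {"genre": "comedy"},
    "m2": {"genre": "horror"},
    "m3": {"genre": "drama"},
}
genre_filter = {"genre": {"$in": ["comedy", "documentary", "drama"]}}
# Only these ids would then be ranked by similarity to the query vectors
candidates = [mid for mid, meta in movies.items() if matches_filter(meta, genre_filter)]
print(candidates)  # ['m1', 'm3']
```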

Example - Query using Euclidean distance

In this example, we use Euclidean distance as the measure of similarity.

info

When you use metric='euclidean', the most similar results are those with the lowest score.

Learn more

See Manage Indexes for more information on how to use cosine similarity or max-dot-product to measure similarity.

  1. The code below creates an index, uploads data to the index, and sends a query:
import pinecone
import pandas as pd
import numpy as np

pinecone.init(">>>YOUR_API_KEY<<<")

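# 10,000 two-dimensional vectors: "id-i" has the value [i, i]
df = pd.DataFrame(data={
    "id": [f"id-{ii}" for ii in range(10000)],
    "vector": [ii + np.zeros(2) for ii in range(10000)]
})

# Create an index that measures similarity with Euclidean distance,
# then upload the (id, vector) pairs
pinecone.create_index("pinecone-index", metric="euclidean")
index = pinecone.Index("pinecone-index")
index.upsert(items=zip(df.id, df.vector))

# Send two queries and print the top 3 matches for each
results = index.query(queries=[[0, 0], [2.1, 2.1]], top_k=3)
for res in results:
    print(res)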
  2. The result displays:
QueryResult(ids=['id-0', 'id-1', 'id-2'], scores=[-0.0, -1.9999998807907104, -7.999999523162842], data=None)
QueryResult(ids=['id-2', 'id-3', 'id-1'], scores=[-0.0199999138712883, -1.6199977397918701, -2.41999888420105], data=None)
  3. Delete the index when it's no longer needed:
pinecone.delete_index("pinecone-index")
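Because the example data is simple (id-i is the point [i, i]), the result can be checked by hand. A brute-force sketch with NumPy, assuming scores are squared Euclidean distances sorted in ascending order; `euclidean_top_k` is a hypothetical helper. It returns the same ids as the output above, and its distances (0, 2, 8 and about 0.02, 1.62, 2.42) have the same magnitudes as the printed scores, which appear there with a negative sign.

```python
import numpy as np

# Same data as the example: "id-i" is the 2-D vector [i, i]
ids = [f"id-{i}" for i in range(10000)]
vectors = np.array([[i, i] for i in range(10000)], dtype=float)


def euclidean_top_k(query, top_k):
    """Brute-force top-k by squared Euclidean distance, closest first."""
    dists = ((vectors - np.asarray(query, dtype=float)) ** 2).sum(axis=1)
    # Stable sort so equal distances keep their original id order
    order = np.argsort(dists, kind="stable")[:top_k]
    return [(ids[i], float(dists[i])) for i in order]


for q in ([0, 0], [2.1, 2.1]):
    print(euclidean_top_k(q, top_k=3))
```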