Manage indexes

An index is the highest-level organizational unit of vector data in Pinecone. It accepts and stores vectors, serves queries over the vectors it contains, and performs other vector operations on its contents.

Our Learn section explains the basics of vector databases and similarity search as a service.

Getting a list of your indexes

List all your Pinecone indexes:

Python:
pinecone.list_indexes()

curl:
curl -i -X GET \
  -H 'Api-Key: YOUR_API_KEY_HERE' \
  https://controller.beta.pinecone.io/databases

Creating an index

The simplest way to create an index is as follows. This gives you an index with a single shard and no additional replicas that will perform approximate nearest neighbor (ANN) search using cosine similarity:

Python:
pinecone.create_index("pinecone-index", dimension=128)

curl:
curl -i -X POST \
  -H 'Content-Type: application/json' \
  -H 'Api-Key: YOUR_API_KEY_HERE' \
  https://controller.beta.pinecone.io/databases \
  -d '{
    "name": "example-index-name",
    "dimension": 128
  }'

This is usually sufficient for millions of low-dimensional vectors with a moderate Queries Per Second (QPS) requirement.

A more complex index can be created as follows. This creates an index with 2 shards and 2 replicas that will perform ANN search using Euclidean distance:

Python:
import pinecone
pinecone.init(">>>YOUR_API_KEY<<<")

pinecone.create_index('example-index-name', dimension=128, index_type='approximated', metric='euclidean', shards=2, replicas=2)

curl:
curl -i -X POST \
  -H 'Content-Type: application/json' \
  -H 'Api-Key: YOUR_API_KEY_HERE' \
  https://controller.beta.pinecone.io/databases \
  -d '{
    "name": "example-index-name",
    "dimension": 128,
    "index_type": "approximated",
    "metric": "euclidean",
    "replicas": 2,
    "shards": 2
  }'

For the full list of parameters available to customize an index, see the create_index API reference.
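
The two metrics used above can rank the same vectors differently. The following is a minimal NumPy sketch of that difference, not Pinecone's implementation; the function names and sample vectors are made up for illustration.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (1.0 means same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def euclidean_distance(a, b):
    """Straight-line distance between a and b (0.0 means identical)."""
    return float(np.linalg.norm(a - b))


query = np.array([1.0, 0.0])
same_direction_far = np.array([10.0, 0.0])  # points the same way, much longer
nearby_off_axis = np.array([1.0, 1.0])      # close to the query, different direction

# Cosine similarity favors the vector that points the same way as the query.
cos_far = cosine_similarity(query, same_direction_far)
cos_near = cosine_similarity(query, nearby_off_axis)

# Euclidean distance favors the vector that lies closer to the query.
dist_far = euclidean_distance(query, same_direction_far)
dist_near = euclidean_distance(query, nearby_off_axis)
```

With cosine similarity the long vector on the same axis ranks first; with Euclidean distance the nearby off-axis vector does. Which behavior is preferable depends on whether vector length carries meaning in your data.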

Deleting an index

This operation will delete all of the data and the computing resources associated with the index.

Tip: Delete your index when it's no longer needed. When you create an index, it runs as a service until you delete it. Users are billed for running indexes, so we recommend you delete any indexes you're not using. This will minimize your costs.

Delete a Pinecone index named "pinecone-index":

Python:
pinecone.delete_index("pinecone-index")

curl:
curl -i -X DELETE \
  -H 'Api-Key: YOUR_API_KEY_HERE' \
  https://controller.beta.pinecone.io/databases/pinecone-index
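
To act on the tip above, one option is a small cleanup helper that deletes every index not on a keep list. This is a sketch under assumptions: the helper name is hypothetical, and it assumes the listing call returns an iterable of index names. The two operations are passed in as callables so the logic can be tried without a live account; in real use they would be pinecone.list_indexes and pinecone.delete_index.

```python
def delete_unused_indexes(list_indexes, delete_index, keep):
    """Delete every index whose name is not in `keep`; return the deleted names.

    `list_indexes` is assumed to return an iterable of index names, and
    `delete_index` to accept a single name.
    """
    deleted = []
    # Copy the listing first so deletions cannot disturb the iteration.
    for name in list(list_indexes()):
        if name not in keep:
            delete_index(name)
            deleted.append(name)
    return deleted


# Dry run against an in-memory stand-in for the service (hypothetical names).
fake_indexes = ["pinecone-index", "old-experiment", "simple-knn-classifier"]
removed = delete_unused_indexes(
    list_indexes=lambda: fake_indexes,
    delete_index=fake_indexes.remove,
    keep={"pinecone-index"},
)
```

Because deletion removes all data in an index, a dry run like this one is a reasonable check before pointing the helper at real indexes.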

Describing an index

Get the configuration and current status of an index named "pinecone-index":

Python:
pinecone.describe_index("pinecone-index")

curl:
curl -i -X GET \
  -H 'Api-Key: YOUR_API_KEY_HERE' \
  https://controller.beta.pinecone.io/databases/pinecone-index
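
The curl calls in the sections above share one pattern: an Api-Key header, the /databases URL with an optional index name, and a JSON body when creating. As a rough sketch, the same requests can be assembled with Python's standard library. The build_request helper is hypothetical and only mirrors the curl examples; it builds request objects and sends nothing.

```python
import json
import urllib.request

CONTROLLER_URL = "https://controller.beta.pinecone.io/databases"


def build_request(method, api_key, index_name=None, body=None):
    """Build (but do not send) a request shaped like the curl examples."""
    url = CONTROLLER_URL if index_name is None else f"{CONTROLLER_URL}/{index_name}"
    headers = {"Api-Key": api_key}
    data = None
    if body is not None:
        # Only requests with a JSON body need the Content-Type header.
        headers["Content-Type"] = "application/json"
        data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method=method)


list_req = build_request("GET", "YOUR_API_KEY_HERE")
describe_req = build_request("GET", "YOUR_API_KEY_HERE", index_name="pinecone-index")
create_req = build_request(
    "POST",
    "YOUR_API_KEY_HERE",
    body={"name": "example-index-name", "dimension": 128},
)
# Nothing is sent until a request is passed to urllib.request.urlopen(...).
```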

Example 1: Creating an index using Pandas

This example shows you how to create an index with Euclidean distance as the measure of similarity.

Prerequisite: To use the sample code below, you must install Pandas, a Python package for data analysis. The sample also uses NumPy, which is installed as a Pandas dependency.

  1. In the command line (terminal), enter:
pip install pandas

Note: On systems where pip is tied to Python 2, use pip3 to install for Python 3:

pip3 install pandas

  2. Create the index in Python. The dimension must match the length of the vectors you insert, which is 2 for the sample data below:
import pinecone

pinecone.init(">>>YOUR_API_KEY<<<")

pinecone.create_index("pinecone-index", metric="euclidean", index_type='approximated', dimension=2, shards=1, replicas=1)
  3. When the index is created, you can start inserting vectors and getting query results:
# Import Pandas and NumPy
import numpy as np
import pandas as pd

# Build the sample data: 10,000 two-dimensional vectors, where vector ii is [ii, ii]
df = pd.DataFrame(data={
    "id": [f"id-{ii}" for ii in range(10000)],
    "vector": [ii + np.zeros(2) for ii in range(10000)]
})

# Connect to the index
index = pinecone.Index("pinecone-index")

# Insert the vectors as (id, vector) tuples
index.upsert(vectors=zip(df.id, df.vector))

# Query the index and get the 3 vectors most similar to [0, 1]
index.query(queries=[[0, 1]], top_k=3)
  4. Delete the index when it's no longer needed:
# Delete the index
pinecone.delete_index("pinecone-index")
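
To sanity-check what the query in this example should return, the exact nearest neighbors can be computed locally by brute force. This is a NumPy-only sketch that does not call Pinecone; exact_top_k is a hypothetical helper, and an approximate index may order ties differently.

```python
import numpy as np

# Same sample data as the example: vector ii is [ii, ii].
ids = [f"id-{ii}" for ii in range(10000)]
vectors = np.array([ii + np.zeros(2) for ii in range(10000)])


def exact_top_k(query, k):
    """Return the k (id, distance) pairs closest to `query` by Euclidean distance."""
    distances = np.linalg.norm(vectors - np.asarray(query, dtype=float), axis=1)
    # A stable sort keeps tied vectors in their original id order.
    nearest = np.argsort(distances, kind="stable")[:k]
    return [(ids[i], float(distances[i])) for i in nearest]


top3 = exact_top_k([0, 1], k=3)
```

For the query [0, 1], the vectors [0, 0] and [1, 1] are both at distance 1, and [2, 2] follows at distance √5, so the three exact neighbors are id-0, id-1, and id-2.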

Example 2: Nearest-neighbor classifier

This is an example of a simple nearest-neighbor classifier. The data is sampled from two multivariate normal distributions.

Given an unknown vector, we build a classifier that decides which of the two distributions the vector more likely belongs to, using the majority class label of its nearest neighbors.

"""Generate data from multivariate normal distributions"""

import numpy as np
import pandas as pd
from collections import Counter

sample_size = 50000
dim = 10
A_mean = 0
B_mean = 2

# Create multivariate normal samples
A_vectors = A_mean + np.random.randn(sample_size, dim)
B_vectors = B_mean + np.random.randn(sample_size, dim)

# Query data generated from A distribution
query_size = 20
A_queries = A_mean + np.random.randn(query_size, dim)


"""Build a classifier using Pinecone"""

import pinecone

pinecone.init(">>>YOUR_API_KEY<<<")

# Create an index
index_name = 'simple-knn-classifier'
pinecone.create_index(index_name, dimension=dim, metric="euclidean")

# Connect to the index
index = pinecone.Index(index_name)

# Upload the sample data formatted as (id, vector) tuples.
# list(...) splits each 2-D array into rows, because a DataFrame column must be 1-dimensional.
A_df = pd.DataFrame(data={
    "id": [f"A-{ii}" for ii in range(len(A_vectors))],
    "vector": list(A_vectors)
})
B_df = pd.DataFrame(data={
    "id": [f"B-{ii}" for ii in range(len(B_vectors))],
    "vector": list(B_vectors)
})
acks = index.upsert(items=zip(A_df.id, A_df.vector))
acks = index.upsert(items=zip(B_df.id, B_df.vector))

# We expect most of a query's nearest neighbors to come from the A distribution
for result in index.query(queries=A_queries, top_k=10):
    cc = Counter(id_.split("-")[0] for id_ in result.ids)
    print(f"Count nearest neighbors' class labels: A={cc['A']}, B={cc['B']}")

# Delete the index
pinecone.delete_index(index_name)
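
The majority-vote logic in this example can also be tried without an index. The sketch below replaces the Pinecone query with an exact brute-force search in NumPy. It assumes a smaller sample size and a fixed random seed, neither of which comes from the example above, and the classify helper is hypothetical.

```python
from collections import Counter

import numpy as np

rng = np.random.default_rng(0)           # fixed seed, chosen only for repeatability
sample_size, dim, top_k = 1000, 10, 10   # smaller sample than the example above

# Two clouds of points centered at 0 (class A) and 2 (class B).
A_vectors = 0 + rng.standard_normal((sample_size, dim))
B_vectors = 2 + rng.standard_normal((sample_size, dim))
A_queries = 0 + rng.standard_normal((20, dim))

vectors = np.vstack([A_vectors, B_vectors])
labels = np.array(["A"] * sample_size + ["B"] * sample_size)


def classify(query):
    """Return the majority label among the top_k nearest vectors (Euclidean)."""
    distances = np.linalg.norm(vectors - query, axis=1)
    nearest = np.argsort(distances)[:top_k]
    votes = Counter(labels[nearest].tolist())
    return votes.most_common(1)[0][0]


predictions = [classify(q) for q in A_queries]
```

Because the two distributions are well separated in 10 dimensions, nearly all queries drawn from A should be labeled A, which is the same outcome the Pinecone-backed loop reports through its neighbor counts.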