Insert Data

You are viewing the docs for Pinecone v1. Docs for Pinecone v2 are also available.

Overview

After creating a Pinecone index, you can start inserting data (vector embeddings) into the index.

Pinecone allows you to create partitions within an index, which we call namespaces. This lets you separate vector embeddings within the same index.

Data Preparation

When you insert data into a Pinecone index, you upload (id, vector) pairs, just as you would with a key-value store.

import pandas as pd
import numpy as np

df = pd.DataFrame(data={
    "id": ["A", "B", "C", "D", "E"],
    "vector": [ii + np.ones(2) for ii in range(5)]
})
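Before uploading, it can help to check that the (id, vector) pairs are well formed. The sketch below is a hypothetical helper; validate_items is not part of the Pinecone client. It assumes that each id should appear only once per upload and that all vectors sent together share one dimension.

```python
from typing import Iterable, List, Sequence, Tuple


def validate_items(
    items: Iterable[Tuple[str, Sequence[float]]]
) -> List[Tuple[str, List[float]]]:
    """Return the items as (id, list-of-floats) pairs, or raise ValueError.

    Hypothetical helper: checks for duplicate ids and mismatched dimensions.
    """
    seen_ids = set()
    dimension = None
    checked = []
    for item_id, vector in items:
        if item_id in seen_ids:
            raise ValueError(f"duplicate id: {item_id!r}")
        seen_ids.add(item_id)
        values = [float(x) for x in vector]
        if dimension is None:
            dimension = len(values)  # the first vector fixes the expected dimension
        elif len(values) != dimension:
            raise ValueError(
                f"vector {item_id!r} has {len(values)} dimensions, expected {dimension}"
            )
        checked.append((item_id, values))
    return checked


items = validate_items([("A", [1.0, 1.0]), ("B", [2.0, 2.0])])
```

The validated list can then be passed wherever an iterable of (id, vector) pairs is expected.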

Insert Vectors

First connect to the index.

import pinecone
index = pinecone.Index("pinecone-index")

Then insert the data as a list of (id, vector) tuples.

acks = index.upsert(items=zip(df.id, df.vector))
print(acks[:2])

# [UpsertResult(id='A'), UpsertResult(id='B')]

UpsertResult(id='A') is an acknowledgement that the vector with id="A" has been inserted successfully.
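One way to use the acknowledgements is to compare them against the ids you sent. The sketch below uses a namedtuple as a stand-in for the client's UpsertResult and assumes only that each acknowledgement exposes an id attribute, as the printed output above suggests. The function missing_acks is a hypothetical helper, not part of the Pinecone client.

```python
from collections import namedtuple

# Stand-in for the client's acknowledgement object; only the `id` field is assumed.
UpsertResult = namedtuple("UpsertResult", ["id"])


def missing_acks(sent_ids, acks):
    """Return the ids that were sent but not acknowledged, in send order."""
    acked = {ack.id for ack in acks}
    return [item_id for item_id in sent_ids if item_id not in acked]


sent = ["A", "B", "C"]
received = [UpsertResult(id="A"), UpsertResult(id="B")]
print(missing_acks(sent, received))  # ['C']
```

An empty result means every id that was sent has a matching acknowledgement.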

Batching Upserts

You can also insert data into the index in batches.

batch_size = 100
acks = index.upsert(items=zip(df.id, df.vector), batch_size=batch_size)

Uploading data to Pinecone should be very fast (10K+ items per second). If you experience slow uploads, read our Performance Tuning tips.
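To make the idea of batching concrete, the sketch below splits an iterable of (id, vector) pairs into lists of at most batch_size items. It illustrates client-side chunking in general and is not a description of how the Pinecone client implements batch_size internally. The function name batched is a hypothetical helper.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield successive lists of at most `batch_size` items from `items`."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # the input is exhausted
        yield batch


pairs = zip(["A", "B", "C", "D", "E"], [[1.0, 1.0]] * 5)
batches = list(batched(pairs, 2))
print([len(batch) for batch in batches])  # [2, 2, 1]
```

Because the helper consumes its input lazily, it also works with generators such as zip(...) without building the full list in memory first.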

Namespaces (Partitions)

An index can be partitioned into namespaces during upserts. Namespaces are like partitions, and are useful for limiting queries and other operations to a subset of vector embeddings in the index.

When upserting vectors, you can optionally specify a destination namespace. The namespace is created if it doesn’t already exist. When no namespace is specified, the index uses the ‘default’ namespace.

# Upsert vectors while creating a new namespace
index.upsert(items=zip(df.id, df.vector),
             namespace='my-first-namespace')

Once namespaces are created in an index, each operation targets a specific namespace rather than the whole index. In the case above, we now have two namespaces: the default namespace and ‘my-first-namespace’.
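The behavior described above can be pictured with a toy in-memory model in which each namespace is its own key-value store. This is only an illustration of the semantics, not how Pinecone stores data. ToyIndex and its methods are hypothetical names, and the literal "default" is assumed from the text above.

```python
from collections import defaultdict


class ToyIndex:
    """Toy model: one dict of id -> vector per namespace."""

    DEFAULT_NAMESPACE = "default"  # assumed name, taken from the prose above

    def __init__(self):
        # A namespace is created the first time something is upserted into it.
        self._namespaces = defaultdict(dict)

    def upsert(self, items, namespace=None):
        target = self._namespaces[namespace or self.DEFAULT_NAMESPACE]
        for item_id, vector in items:
            target[item_id] = list(vector)  # same id in same namespace overwrites

    def fetch(self, item_id, namespace=None):
        return self._namespaces[namespace or self.DEFAULT_NAMESPACE].get(item_id)

    def namespaces(self):
        return sorted(self._namespaces)


toy = ToyIndex()
toy.upsert([("A", [1.0, 1.0])])
toy.upsert([("A", [9.0, 9.0])], namespace="my-first-namespace")
print(toy.namespaces())  # ['default', 'my-first-namespace']
print(toy.fetch("A"))  # [1.0, 1.0]
```

In this model the same id can exist in two namespaces with different vectors, because each lookup is scoped to one namespace.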

Different namespaces within an index need not hold vectors of the same size.

import numpy as np

# Create three sets of vectors with 128, 256, and 512 dimensions respectively
vectors_a = np.random.rand(15, 128)
vectors_b = np.random.rand(20, 256)
vectors_c = np.random.rand(30, 512)

# Create ids
ids_a = np.arange(15)
ids_b = np.arange(20)
ids_c = np.arange(30)

# Insert into separate namespaces
index.upsert(items=zip(ids_a, vectors_a), namespace='namespace_a')
index.upsert(items=zip(ids_b, vectors_b), namespace='namespace_b')
# If no namespace is specified, the index uses the default namespace
index.upsert(items=zip(ids_c, vectors_c))

Insert One Vector at a Time

You can insert one vector at a time. This is helpful if you have multiple clients updating the same index infrequently.

index.unary_upsert(item=("F", 5 + np.ones(2)))

You can still send a single vector through the upsert() call, but unary_upsert() is better optimized for this use case.

Example

# Load Pinecone and create index
import pinecone
pinecone.init(">>>YOUR_API_KEY<<<")
pinecone.create_index("pinecone-index", metric="euclidean")

# Libraries
import pandas as pd
import numpy as np

# Connect to the index
index = pinecone.Index("pinecone-index")

# Define dataframe
df = pd.DataFrame(data={
    "id": ["A", "B", "C", "D", "E"],
    "vector": [ii + np.ones(2) for ii in range(5)]
})

# Upsert the (id, vector) pairs from the dataframe
acks = index.upsert(items=zip(df.id, df.vector))
print(acks[:2])

# [UpsertResult(id='A'), UpsertResult(id='B')]

# Free up computing resources and delete all of the data.
pinecone.delete_index("pinecone-index")

