Insert data

After creating a Pinecone index, you can start inserting data (vector embeddings) into the index.

Learn more

Our Learn section explains the basics of vector databases and similarity search as a service.

Inserting the vectors

  1. Connect to the index:
Python

index = pinecone.Index("pinecone-index")

(There is no curl equivalent for this step; each REST request targets the index's endpoint URL directly.)
  2. Insert the data as a list of (id, vector) tuples. Use the Upsert operation to write vectors into a namespace:

Python

# Sample data: five 8-dimensional vectors
ids = ["A", "B", "C", "D", "E"]
vectors = [
    [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
    [0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2],
    [0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3],
    [0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4],
    [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
]
# Insert the data
index.upsert(vectors=zip(ids, vectors))

curl

curl -i -X POST https://YOUR_INDEX-YOUR_PROJECT.svc.us-west1-gcp.pinecone.io/vectors/upsert \
  -H 'Api-Key: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "vectors": [
      {
        "id": "A",
        "values": [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
      },
      {
        "id": "B",
        "values": [0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2]
      },
      {
        "id": "C",
        "values": [0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3]
      },
      {
        "id": "D",
        "values": [0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4]
      },
      {
        "id": "E",
        "values": [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
      }
    ]
  }'
Note

Pinecone is eventually consistent, so vectors may not be visible to queries immediately after the upsert response is received. In most situations, you can confirm that the vectors have been received by checking that the vector counts returned by describe_index_stats() have been updated. This technique may not work if the index has multiple replicas.
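
For example, a minimal sketch of this check could poll describe_index_stats() until the expected count appears. The helper below is illustrative: the total_vector_count field is assumed (older client versions may only expose per-namespace counts), and the timeout and polling interval are arbitrary.

import time

def wait_for_vectors(index, expected_count, timeout=60, interval=2):
    """Poll describe_index_stats() until the index reports at least
    `expected_count` vectors, or give up after `timeout` seconds."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        stats = index.describe_index_stats()
        # `total_vector_count` is assumed here; adjust to your client version,
        # e.g. by summing the per-namespace counts in stats.namespaces.
        if stats.total_vector_count >= expected_count:
            return True
        time.sleep(interval)
    return False

# Hypothetical usage: wait for the five sample vectors above to be counted
wait_for_vectors(index, expected_count=5)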

Batching upserts

When upserting larger amounts of data, insert the data into the index in batches, over multiple upsert requests. For example, this can be done as follows:

import random
import itertools

def chunks(iterable, batch_size=100):
    """A helper function to break an iterable into chunks of size batch_size."""
    it = iter(iterable)
    chunk = tuple(itertools.islice(it, batch_size))
    while chunk:
        yield chunk
        chunk = tuple(itertools.islice(it, batch_size))

vector_dim = 128
vector_count = 10000

# Example generator that generates many (id, vector) pairs
example_data_generator = map(lambda i: (f'id-{i}', [random.random() for _ in range(vector_dim)]), range(vector_count))

# Upsert data with 100 vectors per upsert request
for ids_vectors_chunk in chunks(example_data_generator, batch_size=100):
    index.upsert(vectors=ids_vectors_chunk)  # Assuming `index` defined elsewhere

Sending upserts in parallel

By default, all vector operations block until the response has been received. Using the Python client, however, they can be made asynchronous. Applied to the batching example above, this can be done as follows:

Python

# Upsert data with 100 vectors per upsert request asynchronously
# - Create pinecone.Index with pool_threads=30 (limits to 30 simultaneous requests)
# - Pass async_req=True to index.upsert()
with pinecone.Index('example-index', pool_threads=30) as index:
    # Send requests in parallel
    async_results = [
        index.upsert(vectors=ids_vectors_chunk, async_req=True)
        for ids_vectors_chunk in chunks(example_data_generator, batch_size=100)
    ]
    # Wait for and retrieve responses (this raises in case of error)
    [async_result.get() for async_result in async_results]

If you experience slow uploads, see Performance tuning for advice.

Partitioning an index into namespaces

You can organize the vectors added to an index into a small set of partitions, or "namespaces", in order to limit queries and other vector operations to only one such namespace at a time. For more information, see: Namespaces.
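
For example, a minimal sketch of writing to and then querying a single namespace might look like the following; the namespace name and query vector are illustrative, and the keyword arguments assume a client version that supports them.

# Upsert the sample vectors into a dedicated namespace
index.upsert(vectors=zip(ids, vectors), namespace="example-namespace")

# A query scoped to that namespace only considers the vectors stored in it
results = index.query(
    vector=[0.3] * 8,
    top_k=3,
    namespace="example-namespace",
)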

Inserting vectors with metadata

You can insert vectors that contain metadata key-value pairs.

You can then filter on that metadata when sending a query: Pinecone will search for similar vector embeddings only among the items that match the filter. For more information, see Metadata Filtering.
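
As a minimal sketch, metadata can be attached during upsert and then referenced in a query filter; the field names, values, and filter expression below are illustrative.

# Upsert vectors with metadata as (id, values, metadata) tuples
index.upsert(vectors=[
    ("F", [0.6] * 8, {"genre": "comedy", "year": 2020}),
    ("G", [0.7] * 8, {"genre": "drama", "year": 2019}),
])

# Search only among vectors whose metadata matches the filter
results = index.query(
    vector=[0.6] * 8,
    top_k=1,
    filter={"genre": {"$eq": "comedy"}},
)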