# Create Indexes

## Overview

Creating a Pinecone index is easy. In this example, we will create an index with Euclidean distance as the measure of similarity.

```
import pinecone
pinecone.init(">>>YOUR_API_KEY<<<")
pinecone.create_index("pinecone-index", metric="euclidean")
```

Once the index is created, you can start inserting vectors and getting query results.

```
import numpy as np
import pandas as pd

# 10,000 two-dimensional vectors; vector ii is (ii, ii)
df = pd.DataFrame(data={
    "id": [f"id-{ii}" for ii in range(10000)],
    "vector": [ii + np.zeros(2) for ii in range(10000)]
})
# connect to the index
index = pinecone.Index("pinecone-index")
# insert vectors as (id, vector) pairs
index.upsert(items=zip(df.id, df.vector))
# query the index and get the 3 most similar vectors
index.query(queries=[[0, 1]], top_k=3)
```
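Because the toy dataset is so regular, the expected answer can be checked locally. The sketch below is a plain NumPy brute-force search over the same data, not a Pinecone API call, and it assumes the index uses the `euclidean` metric as created above. `id-0` and `id-1` are tied at distance 1, so Pinecone may return those two in either order.

```python
import numpy as np

# Rebuild the toy dataset from the example above: vector ii is (ii, ii).
ids = [f"id-{ii}" for ii in range(10000)]
vectors = np.array([ii + np.zeros(2) for ii in range(10000)])

def exact_top_k(query, vectors, ids, k):
    """Return the k ids closest to `query` by Euclidean distance (brute force)."""
    distances = np.linalg.norm(vectors - np.asarray(query, dtype=float), axis=1)
    # A stable sort keeps tied distances in id order.
    order = np.argsort(distances, kind="stable")[:k]
    return [ids[i] for i in order], distances[order]

top_ids, top_dist = exact_top_k([0, 1], vectors, ids, k=3)
print(top_ids)  # ['id-0', 'id-1', 'id-2']
```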

When your similarity search service is no longer needed, you can delete the index and all of the data.

```
pinecone.delete_index("pinecone-index")
```

Creating an index with default settings is usually sufficient for millions of low-dimensional vectors with a moderate queries-per-second (QPS) requirement.

Keep reading to learn about how to scale up your index, or use different measures of similarity.

## Parameters

### Engine types

One of `approximated` or `exact`.

Pinecone currently supports two types of index search algorithms:
**approximate** nearest neighbor search and **exact** nearest neighbor search.

The `approximated` engine uses fast approximate search algorithms developed by Pinecone; it is fast and highly accurate.

The `exact` engine uses exact search algorithms that perform exhaustive searches, so it is usually slower than the `approximated` engine.
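To make the trade-off concrete, the sketch below compares an exhaustive search with a crude approximate one and measures recall@k, the fraction of the true nearest neighbors that the approximate search returns. The "approximate" search here simply scans a random 20% of the vectors. It is a hypothetical stand-in chosen for brevity, not the algorithm behind Pinecone's `approximated` engine.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((5000, 16))
query = rng.standard_normal(16)
k = 10
all_ids = np.arange(len(data))

def top_k(query, candidates, candidate_ids, k):
    """Exhaustive Euclidean search over `candidates`; returns the ids of the k nearest."""
    distances = np.linalg.norm(candidates - query, axis=1)
    return candidate_ids[np.argsort(distances)[:k]]

# Exact search: compare the query against every stored vector.
exact_ids = top_k(query, data, all_ids, k)

# Hypothetical approximate search: only scan a random 20% of the data.
# This is NOT Pinecone's algorithm; it only illustrates speed vs. recall.
subset = rng.choice(len(data), size=len(data) // 5, replace=False)
approx_ids = top_k(query, data[subset], all_ids[subset], k)

# Recall@k: share of the true k nearest neighbors found by the approximate search.
recall = len(set(exact_ids) & set(approx_ids)) / k
print(f"recall@{k} = {recall:.2f}")
```

Scanning fewer candidates does less work per query but can miss true neighbors; a good approximate engine keeps recall close to 1 while avoiding a full scan.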

### Metrics

One of `cosine`, `dotproduct`, or `euclidean`.

Use `cosine` for cosine similarity, `dotproduct` for max-dot-product, and `euclidean` for Euclidean distance.

Depending on your application, some metrics give better recall and precision than others.
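For intuition, the three metrics can be written out in NumPy as below. This is a sketch, not Pinecone's internal implementation. Note the direction: for `cosine` and `dotproduct` a larger score means more similar, while for `euclidean` a smaller distance does. The made-up vectors `short` and `long` show that the metrics can disagree when vector lengths differ.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity: higher means more similar; ignores vector length."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def dotproduct(a, b):
    """Inner product: higher means more similar; sensitive to vector length."""
    return float(np.dot(a, b))

def euclidean(a, b):
    """Euclidean distance: lower means more similar."""
    return float(np.linalg.norm(a - b))

query = np.array([1.0, 0.0])
candidates = {
    "short": np.array([1.0, 0.1]),   # almost the same direction and length as the query
    "long": np.array([10.0, 5.0]),   # different direction, much larger magnitude
}

best_cosine = max(candidates, key=lambda name: cosine(query, candidates[name]))
best_dot = max(candidates, key=lambda name: dotproduct(query, candidates[name]))
best_euclidean = min(candidates, key=lambda name: euclidean(query, candidates[name]))
print(best_cosine, best_dot, best_euclidean)  # short long short
```

If all vectors are normalized to unit length, the three metrics produce the same ranking, because cosine similarity then equals the dot product and `||a - b||^2 = 2 - 2 * dot(a, b)`.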

### Shards

By intelligently sharding your data, a Pinecone index can store billions of vectors and still achieve high accuracy and low latency.

As a general guideline, add 1 shard to the index for each additional GB of data.

For example, one million 32-dimensional vectors take about 150 MB of storage, well under 1 GB, so a single shard is enough for them.
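The arithmetic behind this guideline can be sketched as follows. It assumes each vector component is a 32-bit float (4 bytes) and counts only the raw vector payload, which is why it gives 128 MB rather than the roughly 150 MB quoted above; the remainder is presumably ids and index overhead. `suggested_shards` is a hypothetical helper implementing one reading of the one-shard-per-GB guideline, not a Pinecone function.

```python
import math

BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as 32-bit floats

def raw_vector_bytes(num_vectors, dimension):
    """Raw size of the vector payload only (no ids, metadata, or index overhead)."""
    return num_vectors * dimension * BYTES_PER_FLOAT32

def suggested_shards(total_bytes):
    """Hypothetical helper: one shard per (started) GB, with a minimum of 1."""
    gb = total_bytes / 1e9
    return max(1, math.ceil(gb))

raw = raw_vector_bytes(1_000_000, 32)
print(raw / 1e6)                                            # 128.0 (MB of raw data)
print(suggested_shards(raw))                                # 1
print(suggested_shards(raw_vector_bytes(100_000_000, 32)))  # 13
```

Because the raw payload underestimates real storage, treat the result as a lower bound.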

### Replicas

Replicas duplicate your index to help with concurrent access. Increasing the number of replicas increases throughput (QPS). We recommend using at least 2 replicas if your application needs high availability (99.99% uptime) for querying.
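A back-of-the-envelope replica count can be sketched as below. It assumes throughput scales linearly with the number of replicas, and `qps_per_replica` is a hypothetical figure you would measure for your own index and query load; neither is a documented Pinecone number.

```python
import math

def suggested_replicas(target_qps, qps_per_replica, high_availability=False):
    """Replicas needed for target_qps, assuming throughput scales linearly with replicas."""
    needed = math.ceil(target_qps / qps_per_replica)
    # Guideline above: use at least 2 replicas when high availability matters.
    minimum = 2 if high_availability else 1
    return max(minimum, needed)

print(suggested_replicas(target_qps=50, qps_per_replica=100))                          # 1
print(suggested_replicas(target_qps=50, qps_per_replica=100, high_availability=True))  # 2
print(suggested_replicas(target_qps=450, qps_per_replica=100))                         # 5
```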

## Example

This is an example of a simple nearest-neighbor classifier. The data are sampled from two multivariate normal distributions.

Given an unknown vector, we will build a classifier that decides which of the two distributions the vector more likely belongs to, using the majority class label of its nearest neighbors.

```
# Generate data from multivariate normal distributions
import numpy as np
import pandas as pd
from collections import Counter

sample_size = 50000
dim = 10
A_mean = 0
B_mean = 2
# Create multivariate normal samples
A_vectors = A_mean + np.random.randn(sample_size, dim)
B_vectors = B_mean + np.random.randn(sample_size, dim)
# Query data generated from A distribution
query_size = 20
A_queries = A_mean + np.random.randn(query_size, dim)

# Build a classifier using Pinecone
import pinecone
pinecone.init(">>>YOUR_API_KEY<<<")
# Create an index
index_name = 'simple-knn-classifier'
pinecone.create_index(index_name, metric="euclidean")
# Connect to the index
index = pinecone.Index(index_name)
# Upload the sample data formatted as (id, vector) tuples.
# list(...) splits each 2-D array into one row vector per DataFrame cell;
# pandas rejects a 2-D array as a single column.
A_df = pd.DataFrame(data={
    "id": [f"A-{ii}" for ii in range(len(A_vectors))],
    "vector": list(A_vectors)
})
B_df = pd.DataFrame(data={
    "id": [f"B-{ii}" for ii in range(len(B_vectors))],
    "vector": list(B_vectors)
})
acks = index.upsert(items=zip(A_df.id, A_df.vector))
acks = index.upsert(items=zip(B_df.id, B_df.vector))
# We expect most of a query's nearest neighbors to come from the A distribution
for result in index.query(queries=A_queries, top_k=10):
    # The id prefix ("A" or "B") is the neighbor's class label
    cc = Counter(id_.split("-")[0] for id_ in result.ids)
    print(f"Count nearest neighbors' class labels: A={cc['A']}, B={cc['B']}")
    # Majority vote: the most common label among the neighbors is the prediction
    print(f"Predicted class: {cc.most_common(1)[0][0]}")
# Delete the index
pinecone.delete_index(index_name)
```
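The majority-vote logic itself can be checked without a Pinecone account. The sketch below replaces the index with an in-memory brute-force Euclidean search over a smaller sample (2,000 vectors per class instead of 50,000, chosen only to keep it quick). `knn_classify` is a hypothetical helper, not part of the Pinecone client.

```python
from collections import Counter

import numpy as np

rng = np.random.default_rng(42)
sample_size, dim, k = 2000, 10, 10
A_mean, B_mean = 0, 2

# Labeled reference data drawn from the same two distributions as above.
vectors = np.vstack([
    A_mean + rng.standard_normal((sample_size, dim)),
    B_mean + rng.standard_normal((sample_size, dim)),
])
labels = np.array(["A"] * sample_size + ["B"] * sample_size)

def knn_classify(query, vectors, labels, k):
    """Label `query` with the majority class among its k nearest (Euclidean) neighbors."""
    distances = np.linalg.norm(vectors - query, axis=1)
    nearest = np.argsort(distances)[:k]
    return Counter(labels[nearest]).most_common(1)[0][0]

# Queries drawn from the A distribution should mostly be labeled "A".
queries = A_mean + rng.standard_normal((20, dim))
predictions = [knn_classify(q, vectors, labels, k) for q in queries]
print(Counter(predictions))
```

An exhaustive scan like this is fine for a few thousand vectors; the point of a Pinecone index is to run the same nearest-neighbor step at much larger scale.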