# Introduction to Facebook AI Similarity Search (Faiss)


Facebook AI Similarity Search (Faiss) is one of the most popular implementations of efficient similarity search, but what is it — and how can we use it?

What is it that makes [Faiss](https://github.com/facebookresearch/faiss) special? How do we make the best use of this incredible tool?

---

**Note:** [Pinecone](https://www.pinecone.io/) lets you implement vector search into your applications with just a few API calls, without knowing anything about Faiss. However, you like seeing how things work, so enjoy the guide!





---

Fortunately, it’s a brilliantly simple process to get started with. And in this article, we’ll explore some of the options Faiss provides, how they work, and — most importantly — how Faiss can make our search faster.

Check out the video walkthrough here:

[Video](https://www.youtube.com/watch?v=sKyvsdEv6rk)


## Key Terms Glossary

Before diving in, here's a quick reference for the key acronyms and concepts used throughout this article:

| Term | Full Name | Description |
|------|-----------|-------------|
| **Faiss** | Facebook AI Similarity Search | An open-source library developed by Meta (Facebook) AI Research for efficient similarity search and clustering of dense vectors. |
| **IVF** | Inverted File Index | An indexing technique that partitions the vector space into Voronoi cells (clusters). At query time, only the nearest cell(s) are searched rather than the full index, dramatically speeding up approximate nearest-neighbor retrieval. |
| **PQ** | Product Quantization | A vector compression technique that splits each vector into sub-vectors, clusters each sub-vector set independently, and replaces each sub-vector with a centroid ID. This reduces memory usage and speeds up distance calculations at the cost of some accuracy. |
| **IVFFlat** | Inverted File Index (Flat) | Combines IVF partitioning with exact (flat) distance calculations within each cell. Faster than a pure flat index while maintaining good accuracy. |
| **IVFPQ** |  Inverted File Index with Product Quantization | Combines IVF partitioning with PQ-compressed vectors. The most memory-efficient and fastest option, with a modest accuracy trade-off. |
| **L2** | L2 (Euclidean) Distance | A distance metric measuring the straight-line distance between two vectors in space. Smaller L2 distance = more similar vectors. |
| **nlist** | Number of Voronoi Cells | A parameter that controls how many partitions the IVF index is divided into. More cells = finer partitioning and faster search, but requires more training data. |
| **nprobe** | Number of Cells to Search | A parameter that controls how many neighboring Voronoi cells are searched at query time. Higher values improve recall at the cost of speed. |
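
For intuition on the PQ row above, here is a toy NumPy sketch of the encoding step. It is an illustration only: randomly sampled vectors stand in for the k-means codebooks that Faiss actually trains, and all names are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

d, m, k = 8, 4, 16  # vector dim, number of sub-vectors, centroids per sub-space
xb = rng.standard_normal((1000, d)).astype('float32')  # toy "training" vectors

# 1. split every vector into m sub-vectors of dimension d/m
subs = xb.reshape(1000, m, d // m)

# 2. build a codebook per sub-space (random samples stand in for k-means here)
codebooks = np.stack([subs[rng.choice(1000, k, replace=False), i] for i in range(m)])

# 3. encode: replace each sub-vector with the ID of its nearest centroid
def pq_encode(x):
    parts = x.reshape(m, d // m)
    return np.array([
        np.argmin(((codebooks[i] - parts[i]) ** 2).sum(axis=1)) for i in range(m)
    ], dtype=np.uint8)

code = pq_encode(xb[0])
print(code.shape)  # (4,): 8 floats compressed to 4 one-byte centroid IDs
```

The compression is what matters: each vector now costs `m` bytes instead of `d` floats, at the price of quantization error.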

## What is Faiss?

Before we get started with any code, many of you will be asking — what is Faiss?

Faiss is a library — developed by Facebook AI — that enables efficient similarity search.

So, given a set of [vectors](https://www.pinecone.io/learn/vector-embeddings/), we can index them using Faiss — then using another vector (the query vector), we search for the most similar vectors within the index.

Now, Faiss not only allows us to build an index and search — but it also speeds up search times to ludicrous performance levels — something we will explore throughout this article.

## Building Some Vectors

The first thing we need is data. We’ll be concatenating several datasets from the [semantic textual similarity hub repo](https://github.com/brmson/dataset-sts), downloading each dataset and extracting the relevant text columns into a single list.

```python
import requests
from io import StringIO
import pandas as pd
```

The first dataset is in a slightly different format, so we handle it on its own:

```python
res = requests.get('https://raw.githubusercontent.com/brmson/dataset-sts/master/data/sts/sick2014/SICK_train.txt')
# create dataframe
data = pd.read_csv(StringIO(res.text), sep='\t')
data.head()
```

```python
# we take all samples from both sentence A and B
sentences = data['sentence_A'].tolist()
sentence_b = data['sentence_B'].tolist()
sentences.extend(sentence_b)  # merge them
len(set(sentences))  # 4802, together we have ~4.8K unique sentences
```

This isn't a particularly large number, so let's pull in a few more similar datasets.

```python
urls = [
    'https://raw.githubusercontent.com/brmson/dataset-sts/master/data/sts/semeval-sts/2012/MSRpar.train.tsv',
    'https://raw.githubusercontent.com/brmson/dataset-sts/master/data/sts/semeval-sts/2012/MSRpar.test.tsv',
    'https://raw.githubusercontent.com/brmson/dataset-sts/master/data/sts/semeval-sts/2012/OnWN.test.tsv',
    'https://raw.githubusercontent.com/brmson/dataset-sts/master/data/sts/semeval-sts/2013/OnWN.test.tsv',
    'https://raw.githubusercontent.com/brmson/dataset-sts/master/data/sts/semeval-sts/2014/OnWN.test.tsv',
    'https://raw.githubusercontent.com/brmson/dataset-sts/master/data/sts/semeval-sts/2014/images.test.tsv',
    'https://raw.githubusercontent.com/brmson/dataset-sts/master/data/sts/semeval-sts/2015/images.test.tsv'
]

# each of these datasets has the same structure, so we loop through each, adding to our sentences list
for url in urls:
    res = requests.get(url)
    # extract to dataframe (error_bad_lines became on_bad_lines='skip' in pandas >= 1.3)
    data = pd.read_csv(StringIO(res.text), sep='\t', header=None, error_bad_lines=False)
    # add columns 1 and 2 to the sentences list
    sentences.extend(data[1].tolist())
    sentences.extend(data[2].tolist())

len(set(sentences))  # 14505
```

Next, we remove any duplicates, leaving us with 14.5K unique sentences. Finally, we build our dense vector representations of each sentence using the [sentence-BERT](https://www.pinecone.io/learn/semantic-search/) library.

```python
# remove duplicates and NaN
sentences = [word for word in list(set(sentences)) if type(word) is str]
```

```python
from sentence_transformers import SentenceTransformer

# initialize sentence transformer model
model = SentenceTransformer('bert-base-nli-mean-tokens')
# create sentence embeddings
sentence_embeddings = model.encode(sentences)
sentence_embeddings.shape  # (14504, 768)
```

Now, building these sentence embeddings can take some time — so feel free to download them directly from here (you can use [this script](https://github.com/jamescalam/data/blob/main/sentence_embeddings_15K/download.py) to load them into Python).

## Plain and Simple

We’ll start simple. First, we need to set up Faiss. Now, if you’re on Linux — you’re in luck — Faiss comes with built-in GPU optimization for any CUDA-enabled Linux machine.

macOS or Windows? Well, we’re less lucky.

_(Don’t worry, it’s still ludicrously fast)_

So, CUDA-enabled Linux users, type `conda install -c pytorch faiss-gpu`. Everyone else, `conda install -c pytorch faiss-cpu`. If you don’t want to use `conda` there are alternative installation instructions [here](https://github.com/facebookresearch/faiss/blob/master/INSTALL.md).

Once we have Faiss installed we can open Python and build our first, plain and simple index with `IndexFlatL2`.

## IndexFlatL2

`IndexFlatL2` measures the L2 (or Euclidean) distance between our query vector and _all_ of the vectors loaded into the index. It’s simple, _very_ accurate, but not too fast.

![L2 distance calculation between a query vector xq and our indexed vectors (shown as y)](https://cdn.sanity.io/images/vr8gru94/production/ea951a4be3acf9d379cc6f922be1468b37b7f9e5-1280x720.png)
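
What `IndexFlatL2` does can be sketched in a few lines of NumPy. This is a toy illustration rather than Faiss's actual implementation; note that, like Faiss, it reports squared L2 distances.

```python
import numpy as np

# a toy "index" of five 4-d vectors, plus one query vector
index_vectors = np.array([[1., 0., 0., 0.],
                          [0., 1., 0., 0.],
                          [1., 1., 0., 0.],
                          [0., 0., 1., 1.],
                          [1., 0., 1., 0.]])
xq = np.array([1., 0.2, 0., 0.])

# squared L2 distance from the query to every indexed vector
d2 = ((index_vectors - xq) ** 2).sum(axis=1)
print(d2.argsort()[:2])  # the two nearest neighbors: [0 2]
```

Every indexed vector is compared against the query, which is exactly why this exact search does not scale, as we will see shortly.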


In Python, we would initialize our `IndexFlatL2` index with our vector dimensionality (`768` — the output size of our sentence embeddings) like so:

```python
import faiss

d = sentence_embeddings.shape[1]
d  # 768

index = faiss.IndexFlatL2(d)
index.is_trained  # True
```

Often, we’ll be using indexes that require us to train them before loading in our data. We can check whether an index needs training using the `is_trained` attribute. `IndexFlatL2` is not an index that requires training, so it returns `True` straight away.

Once ready, we load our embeddings and query like so:

```python
index.add(sentence_embeddings)
index.ntotal  # 14504
```

Then search given a query `xq` and the number of nearest neighbors to return, `k`.

```python
k = 4
xq = model.encode(["Someone sprints with a football"])
```

```python
%%time
D, I = index.search(xq, k)  # search
print(I)
```

```
[[ 4586 10252 12465   190]]
CPU times: user 27.9 ms, sys: 29.5 ms, total: 57.4 ms
Wall time: 28.9 ms
```

Here we're returning indices `4586`, `10252`, `12465`, and `190`, which map back to:

```python
data['sentence_A'].iloc[[4586, 10252, 12465, 190]]
```

```
4586     A group of football players is running in the field
10252    A group of people playing football is running past the person
12465    Two groups of people are playing football
190      A football player is running past an official
Name: sentence_A, dtype: object
```

These are the top `k` vectors closest to our query vector `xq`. Clearly, they’re all great matches — all including either people running with a football or in the _context_ of a football match.

Now, if we’d rather extract the numerical vectors from Faiss, we can do that too.

```python
import numpy as np

# we have 4 vectors to return (k), so we initialize a zero array to hold them
vecs = np.zeros((k, d))
# then iterate through each ID from I and add the reconstructed vector to our zero-array
for i, val in enumerate(I[0].tolist()):
    vecs[i, :] = index.reconstruct(val)

vecs.shape  # (4, 768)
```

```python
vecs[0][:100]
```

```
array([ 0.01627046,  0.22325929, -0.15037425, -0.30747262, -0.27122435,
       ...
        0.95139188, -0.15605772, -0.49625337, -0.11140176,  0.15610115])
```

### Speed

Using the `IndexFlatL2` index alone is computationally expensive; it doesn’t scale well.

When using this index, we are performing an _exhaustive_ search — meaning we compare our query vector `xq` to every other vector in our index; in our case, that is 14.5K L2-distance calculations for every search.

Now imagine datasets containing 1M, 1B, or even more vectors, with several query vectors per search.

![Milliseconds taken to return a result (y-axis) / number of vectors in the index (x-axis) — relying solely on IndexFlatL2 quickly becomes slow](https://cdn.sanity.io/images/vr8gru94/production/2a7cf4de5beb7a8addb82e6f899b24dd455847fa-1280x720.png)


Our index quickly becomes too slow to be useful, so we need to do something different.

## Partitioning The Index

Faiss allows us to add multiple steps that can optimize our search using many different methods. A popular approach is to partition the index into Voronoi cells using an **Inverted File Index (IVF)** — a technique that clusters vectors during index construction, then at query time restricts the search to only the nearest cluster(s) rather than scanning every vector. This is what makes the `IndexIVFFlat` and `IndexIVFPQ` indexes so much faster than a flat exhaustive search.

![We can imagine our vectors as each being contained within a Voronoi cell — when we introduce a new query vector, we first measure its distance between centroids, then restrict our search scope to that centroid’s cell.](https://cdn.sanity.io/images/vr8gru94/production/ca1ed9b80fd0788cee513ef75c1b8bd8daad8571-1400x748.png)


Using this method, we would take a query vector `xq`, identify the cell it belongs to, and then use our `IndexFlatL2` (or another metric) to search between the query vector and all other vectors belonging to _that specific_ cell.

So, we are reducing the scope of our search, producing an _approximate_ answer, rather than exact (as produced through exhaustive search).
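
The cell-then-search procedure can be sketched in plain NumPy. This is a toy stand-in for what `IndexIVFFlat` does; randomly chosen vectors act as centroids in place of the k-means clustering Faiss actually trains.

```python
import numpy as np

rng = np.random.default_rng(0)
xb = rng.standard_normal((2000, 16)).astype('float32')  # toy "index" vectors
xq = xb[7] + 0.01 * rng.standard_normal(16).astype('float32')  # query near vector 7

# "train": pick nlist centroids (a stand-in for the k-means Faiss runs)
nlist = 8
centroids = xb[rng.choice(len(xb), nlist, replace=False)]

# assign every vector to its nearest centroid, i.e. its Voronoi cell
assign = ((xb[:, None, :] - centroids[None, :, :]) ** 2).sum(-1).argmin(1)

# query: find the closest cell, then search exhaustively inside that cell only
cell = ((centroids - xq) ** 2).sum(1).argmin()
members = np.where(assign == cell)[0]
d2 = ((xb[members] - xq) ** 2).sum(1)
print(len(members))  # typically a small fraction of the 2000 comparisons a flat search needs
```

The answer is approximate because the true nearest neighbor could sit just over the boundary in an adjacent cell, which is exactly the problem `nprobe` addresses below.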

To implement this, we first initialize our index using `IndexFlatL2` — but this time, we are using the L2 index as a quantizer step — which we feed into the partitioning `IndexIVFFlat` index.

```python
nlist = 50  # how many cells
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFFlat(quantizer, d, nlist)
```

Here we’ve added a new parameter `nlist`. We use `nlist` to specify how many partitions (Voronoi cells) we’d like our index to have.

Now, when we built the previous `IndexFlatL2`-only index, we didn’t need to train the index as no grouping/transformations were required to build the index. Because we added clustering with `IndexIVFFlat`, this is no longer the case.

So, what we do now is train our index on our data — which we must do _before_ adding any data to the index.

```python
index.is_trained  # False

index.train(sentence_embeddings)
index.is_trained  # check if index is now trained: True

index.add(sentence_embeddings)
index.ntotal  # number of embeddings indexed: 14504
```

Now that our index is trained, we add our data just as we did before.

Let’s search again using the same indexed sentence embeddings and the same query vector `xq`.

```python
%%time
D, I = index.search(xq, k)  # search
print(I)
```

```
[[ 7460 10940  3781  5747]]
CPU times: user 3.83 ms, sys: 3.25 ms, total: 7.08 ms
Wall time: 2.15 ms
```

The search time has clearly decreased. In this case, we find no difference between the results returned by our exhaustive search and this approximate search — but that will not always be the case.

If approximate search with `IndexIVFFlat` returns suboptimal results, we can improve accuracy by increasing the search scope. We do this by increasing the `nprobe` attribute value — which defines how many nearby cells to search.
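To build intuition for what `nprobe` controls, here is a small, self-contained NumPy sketch of the IVF search pattern. This is illustrative only — Faiss trains its centroids with k-means and stores inverted lists in optimized structures — and every name and number below is invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, nlist = 64, 2_000, 20  # dims, database size, number of cells
xb = rng.standard_normal((n, d)).astype("float32")  # database vectors
xq = rng.standard_normal(d).astype("float32")       # query vector

# "Training": pick cell centroids (Faiss uses k-means; we sample for brevity)
centroids = xb[rng.choice(n, nlist, replace=False)]

# Assign every database vector to its nearest centroid (the inverted lists)
assign = np.argmin(((xb[:, None] - centroids[None]) ** 2).sum(-1), axis=1)

def ivf_search(nprobe, k=4):
    # 1. find the nprobe cells whose centroids are closest to the query
    cells = np.argsort(((centroids - xq) ** 2).sum(-1))[:nprobe]
    # 2. exhaustively search only the vectors assigned to those cells
    cand = np.where(np.isin(assign, cells))[0]
    dists = ((xb[cand] - xq) ** 2).sum(-1)
    return cand[np.argsort(dists)[:k]]

print(ivf_search(nprobe=1))   # few candidates scanned: fastest, least accurate
print(ivf_search(nprobe=10))  # more cells scanned: slower, closer to exhaustive
```

With `nprobe` equal to `nlist`, every cell is scanned and the result matches an exhaustive search exactly — which is why raising `nprobe` trades speed back for accuracy.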

![Searching the single closest cell when nprobe == 1 (left), and searching the eight closest cells when nprobe == 8 (right)](https://cdn.sanity.io/images/vr8gru94/production/f32a71b57eefa87ef461bb3412f9fc21bbd46514-2020x1270.png)


We can implement this change easily.

```python
index.nprobe = 10  # search this many nearby cells
```

```python
%%time
D, I = index.search(xq, k)  # search
print(I)
```

```
[[ 7460 10940  3781  5747]]
CPU times: user 5.29 ms, sys: 2.7 ms, total: 7.99 ms
Wall time: 1.54 ms
```

Now, because we’re searching a larger scope by increasing the `nprobe` value, we should expect the search time to increase too.

![Query time / number of vectors for the IVFFlat index with different nprobe values — 1, 5, 10, and 20](https://cdn.sanity.io/images/vr8gru94/production/84b1a10186cdd9dec8ebcfff9a96dbc89951d4b6-1280x720.png)


Even with the larger `nprobe` value, we still see much faster responses than our `IndexFlatL2`-only index returned.

### Vector Reconstruction

If we go ahead and attempt to use `index.reconstruct(<vector_idx>)` again, Faiss will raise a `RuntimeError`, as the IVF step means there is no longer a direct mapping between the original vectors and their index positions.

So, if we’d like to reconstruct the vectors, we must first create these direct mappings using `index.make_direct_map()`.

```python
index.make_direct_map()
```

```python
index.reconstruct(7460)[:100]
```

```
array([ 0.01627046,  0.22325929, -0.15037425, -0.30747262, -0.27122435,
       -0.10593167, -0.0646093 ,  0.04738174, -0.7334904 , -0.37657705,
       ...
        0.9513919 , -0.15605772, -0.49625337, -0.11140176,  0.15610115],
      dtype=float32)
```

And from there we are able to reconstruct our vectors just as we did before.

## Quantization

We have one more key optimization to cover. All of our indexes so far have stored our vectors as full (e.g., `Flat`) vectors. With very large datasets, this quickly becomes a problem, as memory usage grows with both the number of vectors and their dimensionality.
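A quick back-of-envelope shows the scale of the problem. The numbers here are hypothetical, chosen only for illustration:

```python
n = 1_000_000  # hypothetical dataset size
d = 768        # assumed embedding dimensionality

# Each float32 value takes 4 bytes, so a flat index stores n * d * 4 bytes
print(round(n * d * 4 / 2**30, 2), "GiB of raw float32 vectors")  # ≈ 2.86 GiB
```

Nearly 3 GiB of RAM for just one million vectors — and that is before any index overhead.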

Fortunately, Faiss comes with the ability to compress our vectors using _Product Quantization (PQ)_.

But, what is PQ? Well, we can view it as an additional approximation step with a similar outcome to our use of **IVF**. Where IVF allowed us to approximate by _reducing the scope_ of our search, PQ approximates the _distance/similarity calculation_ instead.

PQ achieves this approximated similarity operation by compressing the vectors themselves, which consists of three steps.

![Three steps of product quantization](https://cdn.sanity.io/images/vr8gru94/production/6eb8071e80abf8fa8d6c170270efd5db52a3168f-1400x787.png)


1. We split the original vector into several subvectors.
2. For each set of subvectors, we perform a clustering operation, creating multiple centroids for each subvector set.
3. In our vector of subvectors, we replace each subvector with the ID of its nearest set-specific centroid.
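The three steps above can be sketched in NumPy. This is a toy illustration rather than Faiss's implementation: the crude k-means loop stands in for Faiss's internal training, and every variable name here is invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m, ksub = 1_000, 64, 8, 256  # vectors, dims, subvectors, centroids per subspace
dsub = d // m                      # each subvector has d/m dimensions
xb = rng.standard_normal((n, d)).astype("float32")

# 1. split each vector into m subvectors of dsub dimensions
sub = xb.reshape(n, m, dsub)

# 2. cluster each subvector set independently (a few crude k-means steps)
codebooks = np.empty((m, ksub, dsub), dtype="float32")
codes = np.empty((n, m), dtype="uint8")
for j in range(m):
    cents = sub[rng.choice(n, ksub, replace=False), j]  # init centroids from data
    for _ in range(5):
        ids = np.argmin(((sub[:, j, None] - cents[None]) ** 2).sum(-1), axis=1)
        for c in range(ksub):
            pts = sub[ids == c, j]
            if len(pts):
                cents[c] = pts.mean(0)
    codebooks[j], codes[:, j] = cents, ids

# 3. each vector is now m one-byte centroid IDs instead of d float32 values
print(xb.nbytes // n, "bytes/vector before;", codes.nbytes // n, "after")
```

With `m = 8` and 256 centroids per subspace (one byte each), every 256-byte vector compresses to just 8 bytes — a 32× reduction, at the cost of some reconstruction error.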

To implement all of this, we use the `IndexIVFPQ` index. As before, we’ll need to `train` the index before adding our embeddings.

```python
m = 8  # number of centroid IDs in final compressed vectors
bits = 8  # number of bits in each centroid

quantizer = faiss.IndexFlatL2(d)  # we keep the same L2 distance flat index
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, bits)
```

`train` the index:

```python
index.is_trained  # False

index.train(sentence_embeddings)
```

And `add` our vectors:

```python
index.add(sentence_embeddings)
```
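The speed-up at query time comes from how distances are computed against the compressed codes. Rather than decompressing anything, the distance from the query to each subspace centroid is precomputed once into a lookup table, and the distance to any database vector then reduces to summing a handful of table entries (known as asymmetric distance computation). A self-contained NumPy sketch, with randomly generated codebooks and codes standing in for a trained index:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, ksub, dsub = 5_000, 8, 256, 8
codebooks = rng.standard_normal((m, ksub, dsub)).astype("float32")
codes = rng.integers(0, ksub, (n, m), dtype=np.uint8)  # compressed database
xq = rng.standard_normal(m * dsub).astype("float32")   # query vector

# Precompute, once per query: squared distance from each query subvector
# to every centroid in that subspace -> an (m, ksub) lookup table
q_sub = xq.reshape(m, 1, dsub)
table = ((q_sub - codebooks) ** 2).sum(-1)  # shape (m, ksub)

# Approximate distance to every database vector = m table lookups per vector,
# with no full-vector arithmetic at all
approx_d = table[np.arange(m), codes].sum(-1)  # shape (n,)
print(approx_d.shape)  # (5000,)
```

This is why PQ distance calculations stay cheap even though the stored vectors are heavily compressed.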

And now we’re ready to begin searching using our new index.

```python
index.nprobe = 10  # align to previous IndexIVFFlat nprobe value
```

```python
%%time
D, I = index.search(xq, k)
print(I)
```

```
[[ 5013 10940  7460  5370]]
CPU times: user 3.04 ms, sys: 2.18 ms, total: 5.22 ms
Wall time: 1.33 ms
```

### Speed or Accuracy?

By adding PQ, we’ve reduced our IVF search time from ~8ms to ~5ms of total CPU time. That’s a small difference on a dataset of this size, but it quickly becomes significant when scaled up.

However, we should also take note of the slightly different results being returned. Beforehand, with our exhaustive L2 search, we were returning `7460`, `10940`, `3781`, and `5747`. Now, we see a slightly different order of results — and two different IDs, `5013` and `5370`.
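One simple way to quantify this drift is recall@k: the fraction of the exhaustive search’s top-k results that the approximate index also returned. Using the two result lists above:

```python
exact = [7460, 10940, 3781, 5747]   # IndexFlatL2 (exhaustive) results
approx = [5013, 10940, 7460, 5370]  # IndexIVFPQ (approximate) results

recall = len(set(exact) & set(approx)) / len(exact)
print(recall)  # 0.5 — two of the four exhaustive results survived
```

Measuring recall like this on a held-out query set is the standard way to tune parameters such as `nprobe`, `m`, and `bits` against your own accuracy requirements.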

Both of our speed optimization operations, **IVF** and **PQ**, come at some cost to accuracy. If we print out these results, however, we still find that each returned item is relevant:

```python
[f'{i}: {sentences[i]}' for i in I[0]]
```

```
['5013: A group of football players running down the field.',
 '10940: A group of people playing football is running in the field',
 '7460: A group of football players is running in the field',
 '5370: A football player is running past an official carrying a football']
```

So, although we might not get the _perfect_ result, we still get close — and thanks to the approximations, we get a much faster response.

![Query time / number of vectors for our three indexes](https://cdn.sanity.io/images/vr8gru94/production/f0a368ac2ff6372615fef4eb3c30e89bfd54c22d-1280x720.png)


And, as shown in the graph above, the difference in query times becomes increasingly significant as our index size grows.

That’s it for this article! We’ve covered the essentials to getting started with building high-performance indexes for search in Faiss.

Clearly, a lot can be done using `IndexFlatL2`, `IndexIVFFlat`, and `IndexIVFPQ` — and each has many parameters that can be fine-tuned to our specific accuracy/speed requirements. And as shown, we can produce some truly impressive results, at lightning-fast speeds very easily thanks to Faiss.

---

**Want to run Faiss in production?** [Pinecone](https://www.pinecone.io/) **provides vector similarity search that’s production-ready, scalable, and fully managed.**

---