# Using Semantic Search to Find GIFs

> In this article, we will use vector search applied to language, called semantic search, to build a GIF search engine. Unlike more traditional search where we rely on keyword matching, semantic search enables search based on the human meaning behind text and images. That means we can find highly relevant GIFs with natural language prompts.

James Briggs · 2023-06-30

Vector search powers some of the most popular services in the world. It serves your Google results, delivers the [best podcasts on Spotify](https://www.pinecone.io/learn/series/wild/spotify-podcast-search/), and accounts for _at least_ 35% of consumer purchases on Amazon [1][2].

In this article, we will use vector search applied to language, called _semantic_ search, to build a GIF search engine. Unlike more traditional search where we rely on _keyword_ matching, semantic search enables search based on the _human meaning_ behind text and images. That means we can find highly relevant GIFs with natural language prompts.

[Video](https://d33wubrfki0l68.cloudfront.net/6f2e76bbfaef3019cbc629e3bff67c9bb0d4f44b/e2898/images/gif-search-1.mp4)


Preview of the GIF search app [available here](https://share.streamlit.io/pinecone-io/playground/gif-search/src/server.py).

The pipeline for a project like this is simple, yet powerful. It can easily be adapted to tasks as diverse as [video search](https://www.pinecone.io/learn/youtube-search/) or [answering Super Bowl questions](https://www.pinecone.io/learn/series/nlp/question-answering/), or as we’ll see, finding GIFs.

[All supporting notebooks and scripts can be found here](https://github.com/pinecone-io/examples/blob/master/learn/search/semantic-search/gif-search/).

## GIF Dataset

[Video](https://www.youtube.com/watch?v=xXsDIK9z_fg)


We will be using the TGIF dataset found [on GitHub here](https://github.com/raingo/TGIF-Release). To get the dataset we can use `wget` (alternatively, download it manually), and unzip.

```python
wget https://github.com/raingo/TGIF-Release/archive/master.zip

unzip master.zip
```

In these unzipped files we should be able to find a file named `tgif-v1.0.tsv` inside the `data` directory. We’ll use _pandas_ to load the file using `\t` as the field delimiter.

```json
{
  "_key": "bfa8c8f918a5",
  "_type": "colabBlock",
  "jsonContent": "{\n \"cells\": [\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 3,\n   \"metadata\": {\n    \"id\": \"K8RvBSYSbvUb\"\n   },\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/html\": [\n       \"<div>\\n\",\n       \"<style scoped>\\n\",\n       \"    .dataframe tbody tr th:only-of-type {\\n\",\n       \"        vertical-align: middle;\\n\",\n       \"    }\\n\",\n       \"\\n\",\n       \"    .dataframe tbody tr th {\\n\",\n       \"        vertical-align: top;\\n\",\n       \"    }\\n\",\n       \"\\n\",\n       \"    .dataframe thead th {\\n\",\n       \"        text-align: right;\\n\",\n       \"    }\\n\",\n       \"</style>\\n\",\n       \"<table border=\\\"1\\\" class=\\\"dataframe\\\">\\n\",\n       \"  <thead>\\n\",\n       \"    <tr style=\\\"text-align: right;\\\">\\n\",\n       \"      <th></th>\\n\",\n       \"      <th>url</th>\\n\",\n       \"      <th>description</th>\\n\",\n       \"    </tr>\\n\",\n       \"  </thead>\\n\",\n       \"  <tbody>\\n\",\n       \"    <tr>\\n\",\n       \"      <th>0</th>\\n\",\n       \"      <td>https://38.media.tumblr.com/9f6c25cc350f12aa74...</td>\\n\",\n       \"      <td>a man is glaring, and someone with sunglasses ...</td>\\n\",\n       \"    </tr>\\n\",\n       \"    <tr>\\n\",\n       \"      <th>1</th>\\n\",\n       \"      <td>https://38.media.tumblr.com/9ead028ef62004ef6a...</td>\\n\",\n       \"      <td>a cat tries to catch a mouse on a tablet</td>\\n\",\n       \"    </tr>\\n\",\n       \"    <tr>\\n\",\n       \"      <th>2</th>\\n\",\n       \"      <td>https://38.media.tumblr.com/9f43dc410be85b1159...</td>\\n\",\n       \"      <td>a man dressed in red is dancing.</td>\\n\",\n       \"    </tr>\\n\",\n       \"    <tr>\\n\",\n       \"      <th>3</th>\\n\",\n       \"      <td>https://38.media.tumblr.com/9f659499c8754e40cf...</td>\\n\",\n       \"      <td>an animal comes close to another in the jungle</td>\\n\",\n       \"    </tr>\\n\",\n       \"    <tr>\\n\",\n       \"      <th>4</th>\\n\",\n       \"      <td>https://38.media.tumblr.com/9ed1c99afa7d714118...</td>\\n\",\n       \"      <td>a man in a hat adjusts his tie and makes a wei...</td>\\n\",\n       \"    </tr>\\n\",\n       \"  </tbody>\\n\",\n       \"</table>\\n\",\n       \"</div>\"\n      ],\n      \"text/plain\": [\n       \"                                                 url  \\\\\\n\",\n       \"0  https://38.media.tumblr.com/9f6c25cc350f12aa74...   \\n\",\n       \"1  https://38.media.tumblr.com/9ead028ef62004ef6a...   \\n\",\n       \"2  https://38.media.tumblr.com/9f43dc410be85b1159...   \\n\",\n       \"3  https://38.media.tumblr.com/9f659499c8754e40cf...   \\n\",\n       \"4  https://38.media.tumblr.com/9ed1c99afa7d714118...   \\n\",\n       \"\\n\",\n       \"                                         description  \\n\",\n       \"0  a man is glaring, and someone with sunglasses ...  \\n\",\n       \"1           a cat tries to catch a mouse on a tablet  \\n\",\n       \"2                   a man dressed in red is dancing.  \\n\",\n       \"3     an animal comes close to another in the jungle  \\n\",\n       \"4  a man in a hat adjusts his tie and makes a wei...  \"\n      ]\n     },\n     \"execution_count\": 3,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"import pandas as pd\\n\",\n    \"\\n\",\n    \"# Load dataset to a pandas dataframe\\n\",\n    \"df = pd.read_csv(\\n\",\n    \"    \\\"./TGIF-Release-master/data/tgif-v1.0.tsv\\\",\\n\",\n    \"    delimiter=\\\"\\\\t\\\",\\n\",\n    \"    names=['url', 'description']\\n\",\n    \")\\n\",\n    \"df.head()\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"accelerator\": \"GPU\",\n  \"colab\": {\n   \"collapsed_sections\": [],\n   \"name\": \"gif_search.ipynb\",\n   \"provenance\": []\n  },\n  \"interpreter\": {\n   \"hash\": \"b8e7999f96e1b425e2d542f21b571f5a4be3e97158b0b46ea1b2500df63956ce\"\n  },\n  \"kernelspec\": {\n   \"display_name\": \"Python 3.9.12 ('ml')\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.9.12\"\n  },\n  \"widgets\": {}\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 1\n}"
}
```

The dataset contains GIF URLs and their descriptions in natural language. We can take a look at the first five GIFs.

```json
{
  "_key": "563994d4e182",
  "_type": "colabBlock",
  "jsonContent": "{\n \"cells\": [\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 55,\n   \"metadata\": {\n    \"colab\": {\n     \"base_uri\": \"https://localhost:8080/\",\n     \"height\": 577\n    },\n    \"id\": \"m0_jfDW6hl4C\",\n    \"outputId\": \"bcfb0ae3-4c44-4354-e42d-93a3ee35ff2d\"\n   },\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/html\": [\n       \"<img src=https://38.media.tumblr.com/9f6c25cc350f12aa74a7dc386a5c4985/tumblr_mevmyaKtDf1rgvhr8o1_500.gif style='width:120px; height:90px'>\"\n      ],\n      \"text/plain\": [\n       \"<IPython.core.display.HTML object>\"\n      ]\n     },\n     \"execution_count\": 55,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    },\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"a man is glaring, and someone with sunglasses appears.\\n\"\n     ]\n    },\n    {\n     \"data\": {\n      \"text/html\": [\n       \"<img src=https://38.media.tumblr.com/9ead028ef62004ef6ac2b92e52edd210/tumblr_nok4eeONTv1s2yegdo1_400.gif style='width:120px; height:90px'>\"\n      ],\n      \"text/plain\": [\n       \"<IPython.core.display.HTML object>\"\n      ]\n     },\n     \"execution_count\": 55,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    },\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"a cat tries to catch a mouse on a tablet\\n\"\n     ]\n    },\n    {\n     \"data\": {\n      \"text/html\": [\n       \"<img src=https://38.media.tumblr.com/9f43dc410be85b1159d1f42663d811d7/tumblr_mllh01J96X1s9npefo1_250.gif style='width:120px; height:90px'>\"\n      ],\n      \"text/plain\": [\n       \"<IPython.core.display.HTML object>\"\n      ]\n     },\n     \"execution_count\": 55,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    },\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"a man dressed in red is dancing.\\n\"\n     ]\n    },\n    {\n     \"data\": {\n      \"text/html\": [\n       \"<img src=https://38.media.tumblr.com/9f659499c8754e40cf3f7ac21d08dae6/tumblr_nqlr0rn8ox1r2r0koo1_400.gif style='width:120px; height:90px'>\"\n      ],\n      \"text/plain\": [\n       \"<IPython.core.display.HTML object>\"\n      ]\n     },\n     \"execution_count\": 55,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    },\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"an animal comes close to another in the jungle\\n\"\n     ]\n    },\n    {\n     \"data\": {\n      \"text/html\": [\n       \"<img src=https://38.media.tumblr.com/9ed1c99afa7d71411884101cb054f35f/tumblr_mvtuwlhSkE1qbnleeo1_500.gif style='width:120px; height:90px'>\"\n      ],\n      \"text/plain\": [\n       \"<IPython.core.display.HTML object>\"\n      ]\n     },\n     \"execution_count\": 55,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    },\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"a man in a hat adjusts his tie and makes a weird face.\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"for _, gif in df[:5].iterrows():\\n\",\n    \"  HTML(f\\\"<img src={gif['url']} style='width:120px; height:90px'>\\\")\\n\",\n    \"  print(gif[\\\"description\\\"])\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"accelerator\": \"GPU\",\n  \"colab\": {\n   \"collapsed_sections\": [],\n   \"name\": \"gif_search.ipynb\",\n   \"provenance\": []\n  },\n  \"interpreter\": {\n   \"hash\": \"b8e7999f96e1b425e2d542f21b571f5a4be3e97158b0b46ea1b2500df63956ce\"\n  },\n  \"kernelspec\": {\n   \"display_name\": \"Python 3.9.12 ('ml')\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.9.12\"\n  },\n  \"widgets\": {}\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 1\n}"
}
```

We will find that there are some duplicate URLs, but these do not necessarily indicate duplicate records as a _single GIF_ can be assigned multiple descriptions.

```json
{
  "_key": "3fb523cd4dfc",
  "_type": "colabBlock",
  "jsonContent": "{\n \"cells\": [\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 6,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"https://38.media.tumblr.com/ddbfe51aff57fd8446f49546bc027bd7/tumblr_nowv0v6oWj1uwbrato1_500.gif    4\\n\",\n       \"https://33.media.tumblr.com/46c873a60bb8bd97bdc253b826d1d7a1/tumblr_nh7vnlXEvL1u6fg3no1_500.gif    4\\n\",\n       \"https://38.media.tumblr.com/b544f3c87cbf26462dc267740bb1c842/tumblr_n98uooxl0K1thiyb6o1_250.gif    4\\n\",\n       \"https://33.media.tumblr.com/88235b43b48e9823eeb3e7890f3d46ef/tumblr_nkg5leY4e21sof15vo1_500.gif    4\\n\",\n       \"https://31.media.tumblr.com/69bca8520e1f03b4148dde2ac78469ec/tumblr_npvi0kW4OD1urqm0mo1_400.gif    4\\n\",\n       \"Name: url, dtype: int64\"\n      ]\n     },\n     \"execution_count\": 6,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"dupes = df['url'].value_counts().sort_values(ascending=False)\\n\",\n    \"dupes.head()\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 7,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/html\": [\n       \"<img src=https://33.media.tumblr.com/88235b43b48e9823eeb3e7890f3d46ef/tumblr_nkg5leY4e21sof15vo1_500.gif style='width:120px; height:90px'>\"\n      ],\n      \"text/plain\": [\n       \"<IPython.core.display.HTML object>\"\n      ]\n     },\n     \"execution_count\": 7,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    },\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"two girls are singing music pop in a concert\\n\"\n     ]\n    },\n    {\n     \"data\": {\n      \"text/html\": [\n       \"<img src=https://33.media.tumblr.com/88235b43b48e9823eeb3e7890f3d46ef/tumblr_nkg5leY4e21sof15vo1_500.gif style='width:120px; height:90px'>\"\n      ],\n      \"text/plain\": [\n       \"<IPython.core.display.HTML object>\"\n      ]\n     },\n     \"execution_count\": 7,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    },\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"a woman sings sang girl on a stage singing\\n\"\n     ]\n    },\n    {\n     \"data\": {\n      \"text/html\": [\n       \"<img src=https://33.media.tumblr.com/88235b43b48e9823eeb3e7890f3d46ef/tumblr_nkg5leY4e21sof15vo1_500.gif style='width:120px; height:90px'>\"\n      ],\n      \"text/plain\": [\n       \"<IPython.core.display.HTML object>\"\n      ]\n     },\n     \"execution_count\": 7,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    },\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"two girls on a stage sing into microphones.\\n\"\n     ]\n    },\n    {\n     \"data\": {\n      \"text/html\": [\n       \"<img src=https://33.media.tumblr.com/88235b43b48e9823eeb3e7890f3d46ef/tumblr_nkg5leY4e21sof15vo1_500.gif style='width:120px; height:90px'>\"\n      ],\n      \"text/plain\": [\n       \"<IPython.core.display.HTML object>\"\n      ]\n     },\n     \"execution_count\": 7,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    },\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"two girls dressed in black are singing.\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"dupe_url = \\\"https://33.media.tumblr.com/88235b43b48e9823eeb3e7890f3d46ef/tumblr_nkg5leY4e21sof15vo1_500.gif\\\"\\n\",\n    \"dupe_df = df[df['url'] == dupe_url]\\n\",\n    \"\\n\",\n    \"# let's take a look at this GIF and it's duplicated descriptions\\n\",\n    \"for _, gif in dupe_df.iterrows():\\n\",\n    \"    HTML(f\\\"<img src={gif['url']} style='width:120px; height:90px'>\\\")\\n\",\n    \"    print(gif[\\\"description\\\"])\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"accelerator\": \"GPU\",\n  \"colab\": {\n   \"collapsed_sections\": [],\n   \"name\": \"gif_search.ipynb\",\n   \"provenance\": []\n  },\n  \"interpreter\": {\n   \"hash\": \"b8e7999f96e1b425e2d542f21b571f5a4be3e97158b0b46ea1b2500df63956ce\"\n  },\n  \"kernelspec\": {\n   \"display_name\": \"Python 3.9.12 ('ml')\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.9.12\"\n  },\n  \"widgets\": {}\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 1\n}"
}
```

With our data, we can move on to building the search pipeline.

## Search

The search pipeline will at a high level take our natural language query like _“a dog talking on the phone”_ and search through the existing GIF descriptions for anything that has a similar _meaning_ to this query.

In this context, we describe _meaning_ as _semantic similarity_, both of which are loaded terms and could refer to many things. For example, are the two phrases `"the dog eats lunch"` and `"the dog does not eat lunch"` similar? In this case, it would depend very much on our use case.

Another example: which of the following two sentences are the most similar?

A: the stock market took a turn for the worse B: how did the stock market do today? C: the stock market performed worse than expected

If we wanted to find phrases with similar meaning then the obvious choice would be `A` and `C`. Matching those with `B` would make little sense. However, this is not the case if we are searching for similar _question-answer pairs_; in that case, `B` should match very closely with `A` and `C`.

It’s important to identify what your use case requires in it’s definition of _“semantic similarity”_. For us, we really want to identify generic similarity. That is, we want `A` and `C` to match, and `B` to not match either of those.

To do this, we will transform our phrases into [dense vector embeddings](https://www.pinecone.io/learn/series/nlp/dense-vector-embeddings-nlp/). These dense vectors can be stored in a [vector database](https://www.pinecone.io/learn/vector-database/) where we can very quickly compare vectors and identify those that are most similar based on metrics like Euclidean distance and cosine similarity.

![Euclidean Cosine](https://cdn.sanity.io/images/vr8gru94/production/9190296c44ceecc2ad22c1ae8e5efae71b956385-1725x724.png)


The vector database handles the storage and fast search of our vector embeddings, but we still need a way to create these embeddings. To do that we use NLP transformer models called _retrievers_ that are [fine-tuned for creating](https://www.pinecone.io/learn/series/nlp/sentence-embeddings/) [sentence embeddings](https://www.pinecone.io/learn/series/nlp/sentence-embeddings/). These sentence embeddings/vectors are able to _numerically represent_ the _meaning_ behind the text that they represent.

![Retriever to Vector Space](https://cdn.sanity.io/images/vr8gru94/production/de76f1da638a1e4941ab3ff000512f2f7446f736-2457x904.png)


Putting these two components together gives us a semantic search pipeline that we can use to retrieve semantically similar GIF descriptions given a query.

[Video](https://d33wubrfki0l68.cloudfront.net/49ec0b831cc1f2c4f3a80863b5c24da15e3c723b/fef88/images/gif-search-4.mp4)


GIF search pipeline covering the one-time indexing step (left) and querying (right).

Let’s take a look at how we can put all of this together.

### Initializing Components

We will start by initializing our retriever model. Many of the most powerful retrievers use a _sentence transformer_ architecture, which are best supported via the `sentence-transformers` library, installed via a `pip install sentence-transformers`.

To find sentence transformer models we go to [huggingface.co/models](https://huggingface.co/models) and search for `sentence-transformers` for the _official_ sentence transformer models. However, there are other models we can use like the [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) sentence transformer trained during a special event on [over 1B training pairs](https://discuss.huggingface.co/t/train-the-best-sentence-embedding-model-ever-with-1b-training-pairs/7354). We will use this model.

```json
{
  "_key": "6038cfe7d221",
  "_type": "colabBlock",
  "jsonContent": "{\n \"cells\": [\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 11,\n   \"metadata\": {\n    \"colab\": {\n     \"base_uri\": \"https://localhost:8080/\",\n     \"height\": 465,\n     \"referenced_widgets\": [\n      \"3bbdb7ac9ddb4e61acbf10e0e322b464\",\n      \"425825b26a384e158608f34c327e7be7\",\n      \"9df1cc108ad74d84a52b25ca0e835197\",\n      \"d9e3f2ddbf5e47ebb86fb20c39079354\",\n      \"fe02dbbb561147198c3278388cb40d04\",\n      \"02851f413fa34d12b412e036d785d38e\",\n      \"1215630e8b53423390353cd56a044c6e\",\n      \"beeffeb083574f0a8c702b6035474073\",\n      \"a40d7b1c14944323937291f6a834061b\",\n      \"2063122bcf15496987749c8cca733a8b\",\n      \"02796d5819bd4b77b56c6f9cab93c908\",\n      \"b69d8608edce436c826f87e269a3b1ec\",\n      \"5e30b575e33843e9b68bc34a84c6e6b2\",\n      \"a6941924044b4956afa6c9b458141007\",\n      \"a2d42ce089df49b896ba9223f6df2ac2\",\n      \"b329d7774a1a4817a4e9cea7adfb288d\",\n      \"4bfd979f8e1a4bd889b846d6affb6d3f\",\n      \"d2088a6432ed4403b9602c5fecb09246\",\n      \"071f6d3e71694ddab08360cabf5e7ecd\",\n      \"f8c2899ce8264b7fa7d5552783c71ad3\",\n      \"850040d37e2b45ecaa53c82958dd9d6a\",\n      \"08779212fc3c46629ea7b5ae778b282b\",\n      \"97b11b05410f4b0682a223efd6eeb570\",\n      \"b5732b4221f448a8b8ae2363849214eb\",\n      \"c7c897d4905f4af28a4e0e8e72897f74\",\n      \"9066e6a9f5384ceaba6d58353c3e0a6a\",\n      \"5085c2de59b34ffe8902285e42f32401\",\n      \"8387420e74e94768893a7415243d4a12\",\n      \"b3fa9e20d0524f0bbbe65575ccd75126\",\n      \"24d25ddd24084a49b47f02138db4b592\",\n      \"b4e62b032b884838b46bcc6b62c0c3ca\",\n      \"a54f0fc3dc184e20bf7660ddbf890a76\",\n      \"b66d0aedfb414d4c8cac45e0f8d157ac\",\n      \"d6e028d4612241fba005c9dd3fe12d30\",\n      \"a765f4490ae44f959bc594bb3b3b719a\",\n      \"0496b4ff5eec45f08c5a1fa1ef106777\",\n      \"976ba55ccfc0419d9b357d83eb4e4ab9\",\n      \"de5a4a6c3495413fa200ab92d876f509\",\n      \"4fb7a30442f94068bdb7c9e434cd03b6\",\n      \"242c1203c62b42b9ae32af653267f8c7\",\n      \"780b68a97f184b8cb7a04e3412d050ab\",\n      \"f6685ce8e5a843f0b856c64a5b71c7ee\",\n      \"6d8f66edc20543b9baae3b474201861e\",\n      \"7d4faf89e1a34c3592a8d8b5c332cf9a\",\n      \"aaff54fb4ed64fbba356616ebad675c3\",\n      \"1b85eb15f8584fd5afceddd1ea3ea2e0\",\n      \"ba4b82c10e804d3ea76592f2d2a3832c\",\n      \"6d2b705a90a447579864e91618f70676\",\n      \"a0d67c99dee248f280cbe34fc3de48a6\",\n      \"42ca2987c03f46e9985a98b4a137f95a\",\n      \"3d044c94cbd741adbf826306bd882971\",\n      \"9a612735a6b94182bafe9a0b42648455\",\n      \"3965cf39aef641dbb2d0b2a362e1c8c2\",\n      \"369f1ee2f1844327b3074a31c0e12519\",\n      \"0e1839f9d1fc43b18f84d0f554a54cc2\",\n      \"272faddd67724961a8bdcccba6b196b0\",\n      \"922e13c3b6944e65a04d96147dd3c9ad\",\n      \"aac3d96ceb444486b6f7f760b43ffc79\",\n      \"04a7db113b11418ab275879f0f8bf162\",\n      \"fd337d66d3a14867ae76908cd58e313b\",\n      \"462563ff268e47d5a8c9efffe09604ed\",\n      \"abbefd5572204fb4b3c0bf7c5597eb8e\",\n      \"69e75d6d62494209b2ee828d1710a59e\",\n      \"45240e5ad3a6426bbfc5e3c1880ac781\",\n      \"67dfd1d8ff4043239f31bb3a53ce4ccc\",\n      \"6aa46bc05e6b4a139009a4124b0f80ec\",\n      \"9ab75ac0bf404362ab25e898b172110c\",\n      \"005d8360dc824d1f869d0eca3b3ec9fd\",\n      \"aa556238d9174c8e83187d079f229b97\",\n      \"65e7a017905442ae98c6415bcdfe0bb4\",\n      \"97302900f6604d7991df8c9854d95728\",\n      \"0ebe49597de647a5a4da64d03c471b76\",\n      \"40e00bdafe664e6082d5fc4392cbc4bc\",\n      \"32719d55aa0e49d7bb65e7755fc2f572\",\n      \"24eb84536c324805bf159da81d8ad509\",\n      \"6565c0393fee4725abb04e245a9a6fb0\",\n      \"2fe6b2e1c56b48fe9b056ee8d0d3697c\",\n      \"e6e9f5860acc4fdda68eafe6abbc99db\",\n      \"7f84fd3a40bb40dbb742101403e671c9\",\n      \"48e79feebdc043adae21f571319493db\",\n      \"2d07707632dc4c60bca7afbbeb928ca6\",\n      \"71600400fa14406bb81e2e410d116681\",\n      \"2540fac951cd4883833b586558c9081e\",\n      \"759f82584cc84eedbbf8fb6142b0d657\",\n      \"693abe44ba9a463ca9d08e1b6c54673b\",\n      \"024be013b90d4a309b2f54d979f1269a\",\n      \"0097b88603d8424f8fc1696b17bd165e\",\n      \"a1d9893a7c7345a49cd336610e0ba3f5\",\n      \"fefd20a4306b4dad840a79038db4c0a3\",\n      \"c985a33ed0064c589e78a72080c382e7\",\n      \"4e37e65238634a97a91e8449e24a99c2\",\n      \"0bb8029a6b2e49de98586fa091033387\",\n      \"fd631c1a5515475a873dceb33e21ac7f\",\n      \"2b86f4f2dbf34220ab607cb010ad3ef7\",\n      \"8da6bd57b0f44a65be01086f33313f2a\",\n      \"24f2ae769e2c4d829dffa14ca8928d3b\",\n      \"1e64de324773474d994258dbe9ed736f\",\n      \"c7f564bd62324a048511ff82b8d4c369\",\n      \"71e6d768200b48ddab9375d0266c6756\",\n      \"fbd414ab7af64c2f840d19d1315085c4\",\n      \"f2983bd01cab40469306b9bdea3b3b19\",\n      \"825ae4cc8891473f8feb1b2820b0c0aa\",\n      \"8d8f963d9b6c409d8c21e307958013d8\",\n      \"8f2d54ebde5341808e65a705f15ba036\",\n      \"1f7356b83d474a0f8ed49a4a5fae5d22\",\n      \"55e099fba0fb41f999f690884f80843a\",\n      \"2f0edbcbec1d44378f4bc17535a87b76\",\n      \"6c709e1a7d174633b603df7bf0804756\",\n      \"8b88e836a5884b3db7748d5971463e62\",\n      \"9cb993dbaa5746249fed2df7d8db677c\",\n      \"87b1f938065644cba745c0b0952526b5\",\n      \"9d46acccbf1347c1bfd7b61e84591812\",\n      \"8f240b5dc5274d63bb23ba9979d47163\",\n      \"783b444819ab4454bceb16578c71df39\",\n      \"b1c876108e6b4150bf3638fad1a56157\",\n      \"6afdd5eede004470a8c7eb8709f6182b\",\n      \"f4ae6887074f443299e95e6cebf971ce\",\n      \"b3fbdd74fd764c65a25817690c961de6\",\n      \"d141eee9a0f14702a82cfe1b194f9d89\",\n      \"eeffd24564624122afc795be6bd1131b\",\n      \"8075cad347314feabdb19d6902752b27\",\n      \"b3c2f3d5b805460496a74f2009aaf81c\",\n      \"4247478ac3b742a7a9407bc69ed179e6\",\n      \"d3abd49739b2493b8d52a062e90ae3d6\",\n      \"af1cdb5fdb31467aa54bc5b9c1ec255d\",\n      \"4124c49ef7d9497187b8e29af694d54f\",\n      \"1e7f1c9aa28e4dcd80f6b9c6b9c06fd6\",\n      \"73c159f67bec448aa102470f1abd0ca7\",\n      \"393b7df64cbd4072acf090f3dee2108c\",\n      \"f2c52012dfd84e39851e297d56669213\",\n      \"665cc77211d0466cbb2d4a2a95547547\",\n      \"70ca64c48bad4c42ba0208eecfc3e40e\",\n      \"176a01c7622143c38482371fe2f6cfaf\",\n      \"3bf92cfae4b547ce9f91e6b4dd94f25a\",\n      \"493c065a83ad48fc921b5b11c59c8fed\",\n      \"c0b2ebd2a44845e885863ee18a4e121e\",\n      \"c070c05c63ef48908126fac009b33ea4\",\n      \"33c35634d79f4848a32e6bd46cb1a75b\",\n      \"c0caab84bfcb4bb69ed70c73e5d20b59\",\n      \"95fa10da3a1742ea9fb3ae3bf948618a\",\n      \"00ba06783c7b484f91aceefd8be27782\",\n      \"8b9224015a1a466f982f6d51e1a8591d\",\n      \"e71eff0e7d2d4e6a9949c29395ac947d\",\n      \"a05a15c5b5a34a5cbc7a0647a01826fb\",\n      \"b4f4e484c3c94424a37f6746a590fffd\",\n      \"e993d4821d984becbb06458f62252f1d\",\n      \"93ba13309f954c669989147c0823c06b\",\n      \"4c5388f0293f4bf384332ba0bc775b1c\",\n      \"4d2d0d8ce7594980aef48885259dafef\",\n      \"da474d5e97ef4947936ca0dd34a30126\",\n      \"d0f5166f81f141a88d282b4eb322ec4c\",\n      \"98bd1a3b82e74f8ab86645815d8ca526\",\n      \"8a06eaff96f94f2fb8f95fc92ef87651\",\n      \"123ad9cff4a74609a452474d73c6738c\"\n     ]\n    },\n    \"id\": \"UB0rVxmppnkm\",\n    \"outputId\": \"cd8ed4e8-69a0-4ce9-c974-cda0dca998a8\",\n    \"scrolled\": true\n   },\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"SentenceTransformer(\\n\",\n       \"  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel \\n\",\n       \"  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})\\n\",\n       \"  (2): Normalize()\\n\",\n       \")\"\n      ]\n     },\n     \"execution_count\": 11,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"from sentence_transformers import SentenceTransformer\\n\",\n    \"\\n\",\n    \"# Initialize retriever with SentenceTransformer model \\n\",\n    \"retriever = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')\\n\",\n    \"retriever\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"accelerator\": \"GPU\",\n  \"colab\": {\n   \"collapsed_sections\": [],\n   \"name\": \"gif_search.ipynb\",\n   \"provenance\": []\n  },\n  \"interpreter\": {\n   \"hash\": \"b8e7999f96e1b425e2d542f21b571f5a4be3e97158b0b46ea1b2500df63956ce\"\n  },\n  \"kernelspec\": {\n   \"display_name\": \"Python 3.9.12 ('ml')\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.9.12\"\n  },\n  \"widgets\": {}\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 1\n}"
}
```

There are a couple of important details here:

- `max_sequence_length=128` means the model can read up to _128_ input tokens.
- `word_embedding_size=384` actually refers to the _sentence_ embedding size. This means the model will output a 384-dimensional vector representation of the input text.

For the short several-word GIF descriptions of our dataset a maximum sequence length of _128_ is _more than enough_.

We need to use the _sentence_ embedding size when initializing our vector database, so we store that in the `embed_dim` variable above.

To initialize our vector database we first need to sign up for a [free Pinecone API key](https://app.pinecone.io/) and install the Pinecone Python client via `pip install pinecone-client`. Once ready, we initialize:

```json
{
  "_key": "6ca8db46b2b7",
  "_type": "colabBlock",
  "jsonContent": "{\n \"cells\": [\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 38,\n   \"metadata\": {\n    \"id\": \"Ngbs8wQQoePL\"\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"import pinecone\\n\",\n    \"\\n\",\n    \"# Connect to pinecone environment\\n\",\n    \"pinecone.init(\\n\",\n    \"    api_key=\\\"YOUR_API_KEY\\\",\\n\",\n    \"    environment=\\\"YOUR_ENV\\\"  # find next to API key\\n\",\n    \")\\n\",\n    \"\\n\",\n    \"index_name = 'gif-search'\\n\",\n    \"\\n\",\n    \"# check if the gif-search exists\\n\",\n    \"if index_name not in pinecone.list_indexes():\\n\",\n    \"    # create the index if it does not exist\\n\",\n    \"    pinecone.create_index(\\n\",\n    \"        index_name,\\n\",\n    \"        dimension=384,\\n\",\n    \"        metric=\\\"cosine\\\"\\n\",\n    \"    )\\n\",\n    \"\\n\",\n    \"# Connect to gif-search index we created\\n\",\n    \"index = pinecone.Index(index_name)\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"accelerator\": \"GPU\",\n  \"colab\": {\n   \"collapsed_sections\": [],\n   \"name\": \"gif_search.ipynb\",\n   \"provenance\": []\n  },\n  \"interpreter\": {\n   \"hash\": \"b8e7999f96e1b425e2d542f21b571f5a4be3e97158b0b46ea1b2500df63956ce\"\n  },\n  \"kernelspec\": {\n   \"display_name\": \"Python 3.9.12 ('ml')\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.9.12\"\n  },\n  \"widgets\": {}\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 1\n}"
}
```

Here, we are specifying an index name of `'gif-search'`; feel free to choose anything you like. It is simply a name. The `metric` is more important and depends on the model being used. For our chosen model we can see in its [model card](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) that it has been trained to use _cosine similarity_, hence why we have specified `metric='cosine'`. Alternative metrics include `euclidean` and `dotproduct`.

We’ve initialized both our vector database and retriever components, so we can move on to embedding and indexing our data.

### Indexing

The embedding and indexing process is much faster when we perform these steps for multiple records _in parallel_. However, we cannot process all of our records at once as the retriever model must shift everything it is embedding into on-chip memory, which is limited.

To avoid this limit, while keeping indexing times as fast as possible, we process everything in batches of `64`.

```json
{
  "_key": "7471ce824ee2",
  "_type": "colabBlock",
  "jsonContent": "{\n \"cells\": [\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 18,\n   \"metadata\": {\n    \"colab\": {\n     \"base_uri\": \"https://localhost:8080/\",\n     \"height\": 49,\n     \"referenced_widgets\": [\n      \"fd9832ba430449d6aefdf4f56603751a\",\n      \"f18bf9eeb43f4dbcb3ad0a237da6308b\",\n      \"8493032e71ed4437b1f221c6fcaac709\",\n      \"702ec6e219d74d99a6c929dd65f549ec\",\n      \"94843930b2ab44438afe2875c44fff96\",\n      \"304c08deba6248da82971665e0caa20b\",\n      \"10e79e8c05db4b36b5103116d1210587\",\n      \"3cc568020c5a4f34992eec0d195ae4f9\",\n      \"08d4c47abb954cc7b277769cb86bb9a9\",\n      \"ef45e711623c4f9ba27743de2db32e07\",\n      \"9752dafd1edb478fba9060f29882c47b\"\n     ]\n    },\n    \"id\": \"TdwtDfIxw7Ik\",\n    \"outputId\": \"32811e16-1934-474d-bd82-a12fccf29cf9\"\n   },\n   \"outputs\": [\n    {\n     \"data\": {\n      \"application/vnd.jupyter.widget-view+json\": {\n       \"model_id\": \"e1de609cb50e40cf953d312813e3c9d0\",\n       \"version_major\": 2,\n       \"version_minor\": 0\n      },\n      \"text/plain\": [\n       \"  0%|          | 0/1966 [00:00<?, ?it/s]\"\n      ]\n     },\n     \"metadata\": {},\n     \"output_type\": \"display_data\"\n    },\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"{'dimension': 384,\\n\",\n       \" 'index_fullness': 0.05,\\n\",\n       \" 'namespaces': {'': {'vector_count': 125782}}}\"\n      ]\n     },\n     \"execution_count\": 18,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"from tqdm.auto import tqdm\\n\",\n    \"\\n\",\n    \"# we will use batches of 64\\n\",\n    \"batch_size = 64\\n\",\n    \"\\n\",\n    \"for i in tqdm(range(0, len(df), batch_size)):\\n\",\n    \"    # find end of batch\\n\",\n    \"    i_end = min(i+batch_size, len(df))\\n\",\n    \"    # extract batch\\n\",\n    \"    batch = df.iloc[i:i_end]\\n\",\n    \"    # generate embeddings for batch\\n\",\n    \"    emb = retriever.encode(batch['description'].tolist()).tolist()\\n\",\n    \"    # get metadata\\n\",\n    \"    meta = batch.to_dict(orient='records')\\n\",\n    \"    # create IDs\\n\",\n    \"    ids = [f\\\"{idx}\\\" for idx in range(i, i_end)]\\n\",\n    \"    # add all to upsert list\\n\",\n    \"    to_upsert = list(zip(ids, emb, meta))\\n\",\n    \"    # upsert/insert these records to pinecone\\n\",\n    \"    _ = index.upsert(vectors=to_upsert)\\n\",\n    \"\\n\",\n    \"    \\n\",\n    \"# check that we have all vectors in index\\n\",\n    \"index.describe_index_stats()\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"accelerator\": \"GPU\",\n  \"colab\": {\n   \"collapsed_sections\": [],\n   \"name\": \"gif_search.ipynb\",\n   \"provenance\": []\n  },\n  \"interpreter\": {\n   \"hash\": \"b8e7999f96e1b425e2d542f21b571f5a4be3e97158b0b46ea1b2500df63956ce\"\n  },\n  \"kernelspec\": {\n   \"display_name\": \"Python 3.9.12 ('ml')\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.9.12\"\n  },\n  \"widgets\": {}\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 1\n}"
}
```

Here we are extracting the `batch` from our data `df`. We encode the descriptions via our retriever model, create metadata (covering both _descriptions_ and _url_), and create some string format IDs. From this we have everything we need to create _documents_, which will look like this:

```python
(
       "some-id-value",
   [0.1, 0.2, 0.1, 0.4 ...],
   {
       'description': "something descriptive",
       'url': "https://xyz.com"
   }
)
```

When we `upsert` these _documents_ to the Pinecone index, we do so in batches of _64_. After all of this, we use `index.describe_index_stats()` to check that we have inserted all _125,782_ documents, which we have.

### Querying

The final step of querying our data covers:

1. Encoding a query like _“dogs talking on the phone”_ to create a _query vector_,
2. Retrieval of similar _context vectors_ from Pinecone,
3. Getting relevant GIFs from the URLs found in our metadata fields.

[Video](https://d33wubrfki0l68.cloudfront.net/c2a30960fdb3e98ee8d7537eb1efec3f9e27e348/6daf9/images/gif-search-5.mp4)


Steps _one_ and _two_ will be performed by a function named `search_gif`:



```python
def search_gif(query):
    # Generate embeddings for the query
    xq = retriever.encode(query).tolist()
    # Compute cosine similarity between query and embeddings vectors and return top 10 URls
    xc = index.query(xq, top_k=10,
                    include_metadata=True)
    result = []
    for context in xc['matches']:
        url = context['metadata']['url']
        result.append(url)
    return result
```

To display the GIFs we display HTML `<img>` elements using the metadata URLs to point to the correct GIFs. We do this using the `display_gif` function:

```python
def display_gif(urls):
    figures = []
    for url in urls:
        figures.append(f'''
            <figure style="margin: 5px !important;">
              <img src="{url}" style="width: 120px; height: 90px" >
            </figure>
        ''')
    return HTML(data=f'''
        <div style="display: flex; flex-flow: row wrap; text-align: center;">
        {''.join(figures)}
        </div>
    ''')
```

Let’s test some queries.

```json
{
  "_key": "101edb0d9e68",
  "_type": "colabBlock",
  "jsonContent": "{\n \"cells\": [\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 52,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/html\": [\n       \"\\n\",\n       \"        <div style=\\\"display: flex; flex-flow: row wrap; text-align: center;\\\">\\n\",\n       \"        \\n\",\n       \"            <figure style=\\\"margin: 5px !important;\\\">\\n\",\n       \"              <img src=\\\"https://38.media.tumblr.com/af53df8d946bbca23be97691db0ecd5e/tumblr_nq3l305zdF1s71nvbo1_400.gif\\\" style=\\\"width: 120px; height: 90px\\\" >\\n\",\n       \"            </figure>\\n\",\n       \"        \\n\",\n       \"            <figure style=\\\"margin: 5px !important;\\\">\\n\",\n       \"              <img src=\\\"https://33.media.tumblr.com/a574ab035e7edc7708db423ee67f3ac4/tumblr_nq1zodZJNx1uoke7ao1_400.gif\\\" style=\\\"width: 120px; height: 90px\\\" >\\n\",\n       \"            </figure>\\n\",\n       \"        \\n\",\n       \"            <figure style=\\\"margin: 5px !important;\\\">\\n\",\n       \"              <img src=\\\"https://38.media.tumblr.com/94703ea885174ffc97c44d57487d7ee9/tumblr_na6oo2PKSC1silsr6o1_400.gif\\\" style=\\\"width: 120px; height: 90px\\\" >\\n\",\n       \"            </figure>\\n\",\n       \"        \\n\",\n       \"            <figure style=\\\"margin: 5px !important;\\\">\\n\",\n       \"              <img src=\\\"https://38.media.tumblr.com/fa6a31e326066bb27776066150c8c810/tumblr_np38ipgJPd1tkkgpso1_250.gif\\\" style=\\\"width: 120px; height: 90px\\\" >\\n\",\n       \"            </figure>\\n\",\n       \"        \\n\",\n       \"            <figure style=\\\"margin: 5px !important;\\\">\\n\",\n       \"              <img src=\\\"https://38.media.tumblr.com/241d89939a5714c2db4566d9108245fe/tumblr_n9xv6aqQ5A1qmgppeo1_250.gif\\\" style=\\\"width: 120px; height: 90px\\\" >\\n\",\n       \"            </figure>\\n\",\n       \"        \\n\",\n       \"            <figure style=\\\"margin: 5px !important;\\\">\\n\",\n       \"              <img src=\\\"https://31.media.tumblr.com/a00ae69f826dbe89a5bdabad567ac88d/tumblr_n8x5e6ZcFW1sjpl9lo1_400.gif\\\" style=\\\"width: 120px; height: 90px\\\" >\\n\",\n       \"            </figure>\\n\",\n       \"        \\n\",\n       \"            <figure style=\\\"margin: 5px !important;\\\">\\n\",\n       \"              <img src=\\\"https://31.media.tumblr.com/28a9aac3c21941e1c61dd9ab4390c3f5/tumblr_nhdr3clKDa1sntw1mo1_400.gif\\\" style=\\\"width: 120px; height: 90px\\\" >\\n\",\n       \"            </figure>\\n\",\n       \"        \\n\",\n       \"            <figure style=\\\"margin: 5px !important;\\\">\\n\",\n       \"              <img src=\\\"https://31.media.tumblr.com/5cbd531e1d8cc7fefffdb8a68ec62b1d/tumblr_naysx8YTzn1tzl1owo1_400.gif\\\" style=\\\"width: 120px; height: 90px\\\" >\\n\",\n       \"            </figure>\\n\",\n       \"        \\n\",\n       \"            <figure style=\\\"margin: 5px !important;\\\">\\n\",\n       \"              <img src=\\\"https://38.media.tumblr.com/bc300fcbae8e4eb65c3901a246f46e4c/tumblr_niu5dzNP7G1u62tooo1_400.gif\\\" style=\\\"width: 120px; height: 90px\\\" >\\n\",\n       \"            </figure>\\n\",\n       \"        \\n\",\n       \"            <figure style=\\\"margin: 5px !important;\\\">\\n\",\n       \"              <img src=\\\"https://38.media.tumblr.com/1c3edb33951b52020b9271185942b2b2/tumblr_nflm4phy0P1u4txqeo1_250.gif\\\" style=\\\"width: 120px; height: 90px\\\" >\\n\",\n       \"            </figure>\\n\",\n       \"        \\n\",\n       \"        </div>\\n\",\n       \"    \"\n      ],\n      \"text/plain\": [\n       \"<IPython.core.display.HTML object>\"\n      ]\n     },\n     \"execution_count\": 52,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"gifs = search_gif(\\\"a dog being confused\\\")\\n\",\n    \"display_gif(gifs)\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 40,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/html\": [\n       \"\\n\",\n       \"        <div style=\\\"display: flex; flex-flow: row wrap; text-align: center;\\\">\\n\",\n       \"        \\n\",\n       \"            <figure style=\\\"margin: 5px !important;\\\">\\n\",\n       \"              <img src=\\\"https://33.media.tumblr.com/73841eb3b37ad5277b324359a83bb19e/tumblr_ngnz25VQpD1twctp1o1_400.gif\\\" style=\\\"width: 120px; height: 90px\\\" >\\n\",\n       \"            </figure>\\n\",\n       \"        \\n\",\n       \"            <figure style=\\\"margin: 5px !important;\\\">\\n\",\n       \"              <img src=\\\"https://38.media.tumblr.com/7b8ebff7051b8a7d0502294465559861/tumblr_na8n60gCmT1tiamw8o1_500.gif\\\" style=\\\"width: 120px; height: 90px\\\" >\\n\",\n       \"            </figure>\\n\",\n       \"        \\n\",\n       \"            <figure style=\\\"margin: 5px !important;\\\">\\n\",\n       \"              <img src=\\\"https://38.media.tumblr.com/49223a5564c8d7dfafe115063ba88c8a/tumblr_nnrps82EvG1sxvevjo1_400.gif\\\" style=\\\"width: 120px; height: 90px\\\" >\\n\",\n       \"            </figure>\\n\",\n       \"        \\n\",\n       \"            <figure style=\\\"margin: 5px !important;\\\">\\n\",\n       \"              <img src=\\\"https://31.media.tumblr.com/aa9c98f92f06cc3484ae395194db6d7f/tumblr_naeyc6yQWy1tahfdeo1_250.gif\\\" style=\\\"width: 120px; height: 90px\\\" >\\n\",\n       \"            </figure>\\n\",\n       \"        \\n\",\n       \"            <figure style=\\\"margin: 5px !important;\\\">\\n\",\n       \"              <img src=\\\"https://38.media.tumblr.com/e7a1d7ed5f2289db13e1812a91c0eedf/tumblr_nf8r2nWajt1s236zjo1_400.gif\\\" style=\\\"width: 120px; height: 90px\\\" >\\n\",\n       \"            </figure>\\n\",\n       \"        \\n\",\n       \"            <figure style=\\\"margin: 5px !important;\\\">\\n\",\n       \"              <img src=\\\"https://38.media.tumblr.com/7d8f9cac33b4fc76908a37bf28ab6fca/tumblr_noswtqDMKC1tyncywo1_400.gif\\\" style=\\\"width: 120px; height: 90px\\\" >\\n\",\n       \"            </figure>\\n\",\n       \"        \\n\",\n       \"            <figure style=\\\"margin: 5px !important;\\\">\\n\",\n       \"              <img src=\\\"https://38.media.tumblr.com/136d1d103edf3a82c2332bf8ef28d6d3/tumblr_nhm8rleyTk1u333yco2_400.gif\\\" style=\\\"width: 120px; height: 90px\\\" >\\n\",\n       \"            </figure>\\n\",\n       \"        \\n\",\n       \"            <figure style=\\\"margin: 5px !important;\\\">\\n\",\n       \"              <img src=\\\"https://33.media.tumblr.com/0f97de4f3cc8dca408ca4ab036460412/tumblr_njmp6tj53K1thqmhto1_400.gif\\\" style=\\\"width: 120px; height: 90px\\\" >\\n\",\n       \"            </figure>\\n\",\n       \"        \\n\",\n       \"            <figure style=\\\"margin: 5px !important;\\\">\\n\",\n       \"              <img src=\\\"https://31.media.tumblr.com/be2b34de9ff751da15cbde3144d25007/tumblr_nh4oj86LJO1slj978o1_400.gif\\\" style=\\\"width: 120px; height: 90px\\\" >\\n\",\n       \"            </figure>\\n\",\n       \"        \\n\",\n       \"            <figure style=\\\"margin: 5px !important;\\\">\\n\",\n       \"              <img src=\\\"https://38.media.tumblr.com/5f45e9a56121b070ddceca58b37e9ace/tumblr_njaggwVmdn1un7vpco1_400.gif\\\" style=\\\"width: 120px; height: 90px\\\" >\\n\",\n       \"            </figure>\\n\",\n       \"        \\n\",\n       \"        </div>\\n\",\n       \"    \"\n      ],\n      \"text/plain\": [\n       \"<IPython.core.display.HTML object>\"\n      ]\n     },\n     \"execution_count\": 40,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"gifs = search_gif(\\\"animals being cute\\\")\\n\",\n    \"display_gif(gifs)\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 49,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/html\": [\n       \"\\n\",\n       \"        <div style=\\\"display: flex; flex-flow: row wrap; text-align: center;\\\">\\n\",\n       \"        \\n\",\n       \"            <figure style=\\\"margin: 5px !important;\\\">\\n\",\n       \"              <img src=\\\"https://38.media.tumblr.com/a67f2f007b9881080aa3fe3584847bc5/tumblr_nc1wzyMaJP1tzj4j8o1_250.gif\\\" style=\\\"width: 120px; height: 90px\\\" >\\n\",\n       \"            </figure>\\n\",\n       \"        \\n\",\n       \"            <figure style=\\\"margin: 5px !important;\\\">\\n\",\n       \"              <img src=\\\"https://33.media.tumblr.com/2bf0f300d9ecfbcedf2dd3ba2b40b5e5/tumblr_ne5p2oTuCj1tdmffyo1_250.gif\\\" style=\\\"width: 120px; height: 90px\\\" >\\n\",\n       \"            </figure>\\n\",\n       \"        \\n\",\n       \"            <figure style=\\\"margin: 5px !important;\\\">\\n\",\n       \"              <img src=\\\"https://33.media.tumblr.com/ec768e8a6f881fbc0f329932c8591a88/tumblr_mpqwb14Fsq1rjcfxro1_250.gif\\\" style=\\\"width: 120px; height: 90px\\\" >\\n\",\n       \"            </figure>\\n\",\n       \"        \\n\",\n       \"            <figure style=\\\"margin: 5px !important;\\\">\\n\",\n       \"              <img src=\\\"https://33.media.tumblr.com/7ada83ae354be1d83ea4407fea789ab8/tumblr_na0e6razjV1s71nvbo1_250.gif\\\" style=\\\"width: 120px; height: 90px\\\" >\\n\",\n       \"            </figure>\\n\",\n       \"        \\n\",\n       \"            <figure style=\\\"margin: 5px !important;\\\">\\n\",\n       \"              <img src=\\\"https://38.media.tumblr.com/f8b6d3d79b59462019c2daf2ba8b4148/tumblr_np762bxBYV1t7jda2o1_400.gif\\\" style=\\\"width: 120px; height: 90px\\\" >\\n\",\n       \"            </figure>\\n\",\n       \"        \\n\",\n       \"            <figure style=\\\"margin: 5px !important;\\\">\\n\",\n       \"              <img src=\\\"https://38.media.tumblr.com/a5ae79c2d62c592d7565684a72af8f2c/tumblr_nageslBNqC1tstoffo1_500.gif\\\" style=\\\"width: 120px; height: 90px\\\" >\\n\",\n       \"            </figure>\\n\",\n       \"        \\n\",\n       \"            <figure style=\\\"margin: 5px !important;\\\">\\n\",\n       \"              <img src=\\\"https://38.media.tumblr.com/3e1f37fea789bb1508d40e8c30f791ae/tumblr_na3xfcUdnK1tiamx1o1_400.gif\\\" style=\\\"width: 120px; height: 90px\\\" >\\n\",\n       \"            </figure>\\n\",\n       \"        \\n\",\n       \"            <figure style=\\\"margin: 5px !important;\\\">\\n\",\n       \"              <img src=\\\"https://33.media.tumblr.com/aec9cbbdf826f98307e6d5f3d544a4c2/tumblr_mmlrbhGDAO1qaqutao1_500.gif\\\" style=\\\"width: 120px; height: 90px\\\" >\\n\",\n       \"            </figure>\\n\",\n       \"        \\n\",\n       \"            <figure style=\\\"margin: 5px !important;\\\">\\n\",\n       \"              <img src=\\\"https://38.media.tumblr.com/0b04187cb51a8889b0f41e5fbe390df2/tumblr_nbcltiDvcq1s7ri4yo1_400.gif\\\" style=\\\"width: 120px; height: 90px\\\" >\\n\",\n       \"            </figure>\\n\",\n       \"        \\n\",\n       \"            <figure style=\\\"margin: 5px !important;\\\">\\n\",\n       \"              <img src=\\\"https://33.media.tumblr.com/14f9b213a7355096c14b0af3a7768f5d/tumblr_npexfuFU2K1ti77bgo1_400.gif\\\" style=\\\"width: 120px; height: 90px\\\" >\\n\",\n       \"            </figure>\\n\",\n       \"        \\n\",\n       \"        </div>\\n\",\n       \"    \"\n      ],\n      \"text/plain\": [\n       \"<IPython.core.display.HTML object>\"\n      ]\n     },\n     \"execution_count\": 49,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"gifs = search_gif(\\\"a fluffy dog being cute and dancing like a person\\\")\\n\",\n    \"display_gif(gifs)\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"accelerator\": \"GPU\",\n  \"colab\": {\n   \"collapsed_sections\": [],\n   \"name\": \"gif_search.ipynb\",\n   \"provenance\": []\n  },\n  \"interpreter\": {\n   \"hash\": \"b8e7999f96e1b425e2d542f21b571f5a4be3e97158b0b46ea1b2500df63956ce\"\n  },\n  \"kernelspec\": {\n   \"display_name\": \"Python 3.9.12 ('ml')\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.9.12\"\n  },\n  \"widgets\": {}\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 1\n}"
}
```

That looks pretty accurate so we’ve managed to put this GIF search pipeline together very easily. With a [little added effort](https://github.com/pinecone-io/examples/blob/master/learn/search/semantic-search/gif-search/app.py) we can translate these steps in creating a web app using something like [Streamlit](https://www.youtube.com/watch?v=QpISF8gMsjQ).

We’ve successfully built a GIF search tool using a simple semantic search pipeline with out-of-the-box models and Pinecone. This same pipeline can be applied across a variety of domains with very little tweaking.

The ease-of-use and potential of both vector and semantic search have led to a lot of research and applications of both technologies beyond the world of big tech. If you’re interested in seeing other applications of this technology, or would like to share your own, considering joining the [Pinecone community](https://community.pinecone.io/).

## Resources

[Article Notebooks and Scripts](https://github.com/pinecone-io/examples/blob/master/learn/search/semantic-search/gif-search/)

[1] M. Osborne, [How Retail Brands Can Compete And Win Using Amazon’s Tactics](https://www.forbes.com/sites/forbesagencycouncil/2017/12/21/how-retail-brands-can-compete-and-win-using-amazons-tactics/) (2017), Forbes

[2] L. Hardesty, [The history of Amazon’s recommendation algorithm](https://www.amazon.science/the-history-of-amazons-recommendation-algorithm) (2019), Amazon Science