Question Answering

This notebook demonstrates how Pinecone’s similarity search as a service lets you build a question answering application. We will store a set of questions and, for a new (unseen) question, retrieve the most similar stored questions. That way, we can link a new question to a stored answer.

We will represent questions as vector embeddings, so that semantic similarity translates to proximity in a vector space. Then we will store the vector representations in Pinecone’s vector index. Finally, given a new question, our similarity search service will retrieve the most similar stored questions, each of which can link to a corresponding answer.
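As a minimal sketch of this idea — with toy 3-dimensional vectors standing in for real sentence embeddings, and illustrative numbers only — "semantically similar" questions have a cosine similarity close to 1, while unrelated questions score near 0:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# toy "embeddings" for three questions; the first two are near-paraphrases
q1 = [1.0, 0.9, 0.1]   # "How do I make money online?"
q2 = [0.9, 1.0, 0.2]   # "Best way to earn money on the internet?"
q3 = [-0.2, 0.1, 1.0]  # "Does compression reduce entropy?"

print(round(cosine_similarity(q1, q2), 3))  # ~0.99: close together in the vector space
print(round(cosine_similarity(q1, q3), 3))  # ~0.0: far apart
```

A real embedding model produces vectors with hundreds of dimensions, but the geometry works the same way; the index simply returns the stored vectors nearest to the query vector.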

While running this notebook in the cloud, the resulting service inserted 18,000 items per second and answered 45 queries per second.


!pip install --quiet sentence-transformers
!pip install --quiet swifter
import pandas as pd
import numpy as np
import swifter
%matplotlib inline

Pinecone Installation and Setup

!pip install --quiet -U pinecone-client
import pinecone.graph
import pinecone.service
import pinecone.connector
import pinecone.hub
# load Pinecone API key

api_key = '<YOUR_API_KEY>'

Get a Pinecone API key if you don’t have one already.

Create a New Pinecone Service

# pick a name for the new service
service_name = 'question-answering'
# check whether a service with the same name already exists; if so, stop it first
if service_name in pinecone.service.ls():
    pinecone.service.stop(service_name)

Create a graph

graph = pinecone.graph.IndexGraph(metric='cosine', shards=1)


Question answering example

Deploy the graph

pinecone.service.deploy(service_name, graph, timeout=300)
{'success': True, 'msg': ''}

Create the connection to the new service

conn = pinecone.connector.connect(service_name)

Upload Questions

The dataset used in this notebook is the Quora Question Pairs Dataset.

Let’s download the dataset and load data.

# download dataset from the url
!wget <DATASET_URL>   # substitute the url of the Quora Question Pairs TSV file
HTTP request sent, awaiting response... 200 OK
Length: 58176133 (55M) [text/tab-separated-values]
Saving to: ‘quora_duplicate_questions.tsv.1’

quora_duplicate_que 100%[===================>]  55.48M   182MB/s    in 0.3s

2021-03-26 19:00:22 (182 MB/s) - ‘quora_duplicate_questions.tsv.1’ saved [58176133/58176133]
pd.set_option('display.max_colwidth', 500)

df = pd.read_csv("quora_duplicate_questions.tsv", sep='\t', usecols=["qid1", "question1"], index_col=False)
df = df.sample(frac=1).reset_index(drop=True)
df.head()
     qid1                                                           question1
0  301756                Does the compression of information reduce entropy?
1  218497                          What makes you most proud of your country?
2  125675                          Which is the best advertise you have seen?
3  179397  I want to get 200+ in JEE Mains 2017. How should I study from now?
4   58187        What are the effective ways to improve our logical thinking?

Define the model

We will use the Average Word Embeddings Model for this example. This model computes embeddings quickly, but their quality is lower than that of some other models often used for sentence embeddings, such as the Sentence Embeddings Models trained on Paraphrases. You may try one of those models as well by replacing the model name below, but keep in mind that calculating the question embeddings will take more time.
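Swapping in a paraphrase-tuned model is a one-line change. As a sketch (the model name below is one of the paraphrase models published by sentence-transformers at the time of writing; any other supported model name works the same way):

```python
from sentence_transformers import SentenceTransformer

# slower but higher-quality alternative to the average word embeddings model
model = SentenceTransformer('paraphrase-distilroberta-base-v1')
```

The rest of the notebook is unchanged: the model still exposes `encode()`, only the embedding dimensionality and quality differ.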

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('average_word_embeddings_glove.6B.300d')

Creating Vector Embeddings

# create an embedding for each question (swifter parallelizes the pandas apply)
df['question_vector'] = df.question1.swifter.apply(lambda x: model.encode(str(x)))


import time
# define items for upload as (id, vector) tuples
items_to_upload = [(row.qid1, row.question_vector) for i, row in df.iterrows()]

You can limit the number of items to upload in case you want to test the vector index with a smaller dataset.
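The upload loop below sends the items in fixed-size batches. The slicing pattern, shown on a toy list (batch size 4 here is arbitrary):

```python
def batches(items, batch_size):
    """Yield successive slices of `items`, each at most `batch_size` long."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

print(list(batches(list(range(10)), 4)))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Batching keeps each request to the service at a bounded size; the final batch is simply whatever remains.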

NUMBER_OF_ITEMS = len(items_to_upload)
# or set a smaller value
# NUMBER_OF_ITEMS = 100000

BATCH_SIZE = 10000  # batch size for upserts; adjust as needed

t0 = time.time()
# upsert the items in batches; .collect() waits for each batch to complete
for i in range(0, NUMBER_OF_ITEMS, BATCH_SIZE):
    conn.upsert(items=items_to_upload[i:i + BATCH_SIZE]).collect()

print("Inserting {:.2} items/second".format(NUMBER_OF_ITEMS / (time.time() - t0)))
Inserting 1.8e+04 items/second

Let’s find out how easy it is to get the most similar questions from the Pinecone vector index. All you need to do is:

  • Define questions
  • Use the model to retrieve embeddings for each question
  • Use Pinecone connection to query the vector index and get the most similar questions
# define questions to query the vector index
query_questions = ['What is best way to make money online?', ]

# extract embeddings for the questions
query_vectors = [model.encode(str(question)) for question in query_questions]

# query pinecone
t0 = time.time()
query_results = conn.query(queries=query_vectors, top_k=5).collect()
print("Querying {:.2} queries per second".format(len(query_questions)/(time.time() - t0)))

# show the results
for question, res in zip(query_questions, query_results):
    print('\n\n\n Original question : ' + str(question))
    print('\n Most similar questions based on Pinecone vector search: \n')

    df_result = pd.DataFrame({'id': res.ids,
                              'question': [df[df.qid1 == int(_id)].question1.values[0] for _id in res.ids],
                              'score': res.scores})
    print(df_result)
Querying 4.5e+01 queries per second

 Original question : What is best way to make money online?

 Most similar questions based on Pinecone vector search:

       id                                             question     score
0      57              What is best way to make money online?  1.000000
1  297469           What is the best way to make money online?  1.000000
2   55585        What is the best way for making money online?  0.989930
3   28280         What are the best ways to make money online?  0.981526
4  157045  What is the best way to make money on the internet?  0.978538

Turn off the Service

Turn off the service once you are sure that you no longer need it. Once it is stopped, the service cannot be reused.

pinecone.service.stop(service_name)
{'success': True}