Getting Started with Hybrid Search

Vector search has unlocked the door to another level of relevance and efficiency in information retrieval. In the past year, the number of vector search use cases has exploded, showing no signs of slowing down.

The capabilities of vector search are impressive, but it isn’t a perfect technology. In fact, without big domain-specific datasets to fine-tune models on, traditional search still has some advantages.

⚠️ Hybrid search is currently in private preview. Sign up for access!

We repeatedly see that vector search unlocks incredible and intelligent retrieval but struggles to adapt to new domains, whereas traditional search can cope with new domains but is fundamentally limited to a set performance level.

Both approaches have pros and cons, but what if we merge them somehow to eliminate a few of those cons? Could we create a hybrid search with the heightened performance potential of vector search and the zero-shot adaptability of traditional search?

Today, we will learn how to take our search to a new level by merging vector and traditional search via Pinecone’s new hybrid search.

Out-of-Domain Datasets

Vector search or dense retrieval has been shown to significantly outperform traditional methods when the embedding models have been fine-tuned on the target domain. However, this changes when we try using these models for “out-of-domain” tasks.

That means if we have a large amount of data covering a specific domain like “Medical question-answering”, we can fine-tune an embedding model. With that embedding model, we can create dense vectors and get outstanding vector search performance.

The problem arises when we don’t have that data. In this scenario, a pretrained embedding model may outperform traditional BM25, but it is unlikely. That leaves us with a best-case performance of BM25, an algorithm that we cannot fine-tune and that cannot provide intelligent, human-like retrieval.

If we want better performance, we’re left with two options: (1) annotate a large dataset to fine-tune the embedding model, or (2) use hybrid search.

Combining dense and sparse search takes work. In the past, engineering teams needed to run different solutions for dense and sparse search engines and another system to combine results in a meaningful way. Typically, that meant a dense vector index, a sparse inverted index, and a reranking step.

The Pinecone approach to hybrid search uses a single hybrid index. It enables search across any modality: text, audio, images, etc. Finally, the weighting of dense vs. sparse can be chosen via the alpha parameter, making it easy to adjust.

How does a hybrid search pipeline look?


High-level view of a simple hybrid search pipeline.

Everything within the dotted lines is handled by Pinecone’s hybrid index. But before we get there, we still need to create dense and sparse vector representations of our input data.

Let’s take a look at how we can do that.

The first step in a hybrid search implementation is preparing a dataset. We will use the pubmed_qa dataset on Hugging Face Datasets. We download it like so:

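from datasets import load_dataset  # !pip install datasets
# 'pqa_labeled' config is assumed; it matches the 1,000 rows shown below
pubmed = load_dataset('pubmed_qa', 'pqa_labeled', split='train')
pubmed
Dataset({
   features: ['pubid', 'question', 'context', 'long_answer', 'final_decision'],
   num_rows: 1000
})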

The context feature is what we will store in Pinecone. Each context record contains multiple contexts within a list. Many lack real significance alone, so we will join them to create larger contexts.

contexts = []
# loop through the context passages
for record in pubmed['context']:
   # join context passages for each question and append to contexts list
   contexts.append(' '.join(record['contexts']))
# view some of the contexts
for context in contexts[:2]:
   print(f"{context[:300]}...")
Programmed cell death (PCD) is the regulated death of cells within an organism. The lace plant (Aponogeton madagascariensis) produces perforations in its leaves through PCD. The leaves of the plant consist of a latticework of longitudinal and transverse veins enclosing areoles. PCD occurs in the cel...
Assessment of visual acuity depends on the optotypes used for measurement. The ability to recognize different optotypes differs even if their critical details appear under the same visual angle. Since optotypes are evaluated on individuals with good visual acuity and without eye disorders, differenc...

We can see the highly technical language contained within each context. An out-of-the-box model will typically struggle with this domain-specific language, making this an ideal use case for hybrid search.

Let’s move on to building the sparse and dense vectors.

Sparse Vectors

Several methods exist for building sparse vector embeddings, from the latest sparse embedding transformer models like SPLADE to rule-based tokenization logic.

We will stick with a more straightforward tokenization approach to keep things simple, using the BERT tokenizer hosted by Hugging Face Transformers.

from transformers import BertTokenizerFast  # !pip install transformers

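# load bert tokenizer from huggingface
# ('bert-base-uncased' is assumed; its special token IDs match those used below)
tokenizer = BertTokenizerFast.from_pretrained(
   'bert-base-uncased'
)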

To tokenize a single context, we can do this:

# tokenize the context passage
inputs = tokenizer(
   contexts[0], padding=True, truncation=True
)
inputs.keys()
dict_keys(['input_ids', 'token_type_ids', 'attention_mask'])

The output includes a few arrays that are all important when using transformer models. As we’re only doing tokenization, we just need the input_ids.

input_ids = inputs['input_ids']
input_ids
[101, 16984, 3526, 2331, 1006, 7473, 2094, ...]

Each input ID represents a unique word or sub-word token translated into an integer value. This transformation is done using the BERT tokenizer’s rule-based tokenization logic.

Pinecone expects to receive sparse vectors in dictionary format. For example, the vector:

[0, 2, 9, 2, 5, 5]

Would become:

{
   "0": 1,
   "2": 2,
   "5": 2,
   "9": 1
}

Each token is represented by a single key in the dictionary, and its frequency is stored as the corresponding value. We apply the same transformation to our input_ids like so:

from collections import Counter

# convert the input_ids list to a dictionary of key to frequency values
sparse_vec = dict(Counter(input_ids))
{101: 1,
16984: 1,
3526: 2,
2331: 2,
1006: 10,
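
Note that Counter produces integer keys, while the example format above shows string keys. If the sparse vector is sent as JSON (an assumption about how the request is serialized), Python's json module converts the integer keys to strings automatically. A minimal sketch using the toy vector from above:

```python
import json
from collections import Counter

# toy vector from the example above
token_ids = [0, 2, 9, 2, 5, 5]

# count how often each token ID occurs (keys are ints at this point)
sparse_vec = dict(Counter(token_ids))

# JSON object keys must be strings, so json.dumps converts the int keys
payload = json.dumps(sparse_vec, sort_keys=True)
print(payload)
```

So the integer-keyed dictionary can be kept as-is in Python and only becomes string-keyed at serialization time.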

We can reformat all of this logic into two functions: build_dict to transform input IDs into dictionaries, and generate_sparse_vectors to handle the tokenization and dictionary creation.

def build_dict(input_batch):
   # store a batch of sparse embeddings
   sparse_emb = []
   # iterate through input batch
   for token_ids in input_batch:
       # convert the input_ids list to a dictionary of key to frequency values
       d = dict(Counter(token_ids))
       # remove special tokens and append sparse vectors to sparse_emb list
       sparse_emb.append({
           key: d[key] for key in d if key not in [101, 102, 103, 0]
       })
   # return sparse_emb list
   return sparse_emb

def generate_sparse_vectors(context_batch):
   # create batch of input_ids (we only need the token IDs)
   inputs = tokenizer(
       context_batch, padding=True,
       truncation=True
   )['input_ids']
   # create sparse dictionaries
   sparse_embeds = build_dict(inputs)
   return sparse_embeds

We also remove the special tokens 101, 102, 103, and 0, which are [CLS], [SEP], [MASK], and [PAD] in the BERT vocabulary. The BERT transformer model requires these tokens, but they carry no meaning in our sparse vectors.
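
As a quick check of the filtering step, the sketch below applies the same counting and filtering to hand-written token IDs, so it runs without downloading a tokenizer. The function name sparse_from_ids and the sample sequence are illustrative only; the special token IDs are the ones listed above.

```python
from collections import Counter

# special token IDs named in the text above
SPECIAL_TOKEN_IDS = {0, 101, 102, 103}

def sparse_from_ids(token_ids):
    """Count token frequencies, skipping special tokens."""
    counts = Counter(token_ids)
    return {tok: n for tok, n in counts.items() if tok not in SPECIAL_TOKEN_IDS}

# a padded sequence: 101 and 102 wrap the text, trailing 0s are padding
ids = [101, 16984, 3526, 2331, 3526, 102, 0, 0]
print(sparse_from_ids(ids))
```

Only the content tokens survive, with 3526 counted twice. Without the filter, padding would add a large count for token 0 to every short passage in a batch.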

This code is all we need to build our sparse vectors, but as usual, we still need to create dense vectors.

Dense Vectors

Our dense vectors are comparatively simple to generate. We initialize a multi-qa-MiniLM-L6-cos-v1 sentence transformer model and encode the same context as before like so:

# !pip install sentence-transformers
from sentence_transformers import SentenceTransformer

# load a sentence transformer model from huggingface
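model = SentenceTransformer(
   'multi-qa-MiniLM-L6-cos-v1'
)

# pass a list so the output is a 2D array with one row per context
emb = model.encode([contexts[0]])
emb.shape
(1, 384)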

The model gives us a 384-dimensional dense vector. We can move on to upserting the full dataset with both sparse and dense vectors.


Our upsert operation is almost identical to a dense-only upsert, with two exceptions: we point our requests to the /hybrid/vectors/upsert endpoint rather than /vectors/upsert, and our upsert includes an additional sparse_values parameter.

The private preview of hybrid search does not include a Python Pinecone client, so we must interact with the Pinecone API directly. To keep things simple, we wrote a temporary Pinecone client class for hybrid search.

We then initialize our hybrid index like so:

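# choose a name for your index
index_name = "hybrid-test"
# create the index (pinecone is an instance of the temporary client class;
# the create_index method name is assumed)
pinecone.create_index(
   index_name = index_name,
   dimension = 384,
   metric = "dotproduct",
   pod_type = "s1h"
)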

Note that we use a hybrid s1 pod type by specifying s1h. All hybrid indexes are currently restricted to the dotproduct similarity metric.

With all of that ready, we can begin adding all of our data to the hybrid index like so:
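
The upsert loop depends on the temporary client, which is not shown here. As a rough sketch of the batching logic only, the function below groups contexts into batches and builds one record per context with dense values, sparse values, and the original text as metadata. The function name build_upsert_batches, the field names (id, values, sparse_values, metadata), the batch size, and the encoder callables are all assumptions for illustration. In practice, the encoders would wrap model.encode and generate_sparse_vectors, and each batch would be sent with the client.

```python
from typing import Callable, Dict, Iterator, List, Sequence

def build_upsert_batches(
    contexts: Sequence[str],
    dense_fn: Callable[[List[str]], List[List[float]]],
    sparse_fn: Callable[[List[str]], List[Dict[int, int]]],
    batch_size: int = 32,
) -> Iterator[List[dict]]:
    """Yield lists of records, one list per batch of contexts."""
    for start in range(0, len(contexts), batch_size):
        batch = list(contexts[start:start + batch_size])
        dense = dense_fn(batch)    # one dense vector per context
        sparse = sparse_fn(batch)  # one sparse dictionary per context
        yield [
            {
                "id": str(start + offset),  # position in the dataset as ID
                "values": dense_vec,
                "sparse_values": sparse_vec,
                "metadata": {"context": text},
            }
            for offset, (text, dense_vec, sparse_vec)
            in enumerate(zip(batch, dense, sparse))
        ]

# stand-in encoders so the sketch runs without a model or tokenizer
def fake_dense(texts):
    return [[float(len(t))] for t in texts]

def fake_sparse(texts):
    return [{len(t): 1} for t in texts]

batches = list(
    build_upsert_batches(["a", "bb", "ccc"], fake_dense, fake_sparse, batch_size=2)
)
print(len(batches), [len(b) for b in batches])
```

Encoding in batches keeps memory use bounded and lets the tokenizer and model process several contexts per call.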

From describe_index_stats, we should see that 1000 records have been added. With that, we can move on to querying the new index.

Making Queries

Queries remain very similar to pure dense vector queries, except that we must include a sparse vector version of our query alongside the typical dense vector representation.


Queries are made to the /hybrid/query endpoint with both dense and sparse vector embeddings.

We can use the earlier generate_sparse_vectors function to build the sparse vector. We will wrap the encode and query operations into a single hybrid_query function to keep queries simple.

def hybrid_query(question, top_k, alpha):
   # convert the question into a sparse vector
   sparse_vec = generate_sparse_vectors([question])
   # convert the question into a dense vector
   dense_vec = model.encode([question]).tolist()
   # set the query parameters to send to pinecone
   query = {
     "topK": top_k,
     "vector": dense_vec,
     "sparseVector": sparse_vec[0],
     "alpha": alpha,
     "includeMetadata": True
   }
   # query pinecone with the query parameters
   result = pinecone.query(query)
   # return search results as json
   return result

Now we query like so:

How can we assess the impact of hybrid search vs. vector search with these results? We use the new alpha parameter, which is set at query time.

The alpha parameter controls the weighting between the dense and sparse vector search scores. By default, it is set to 0.5, which weights the dense and sparse scores equally.

Above we performed a pure dense vector search by using an alpha of 1.
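
The exact scoring formula is not given here. One way to picture the weighting, assuming a simple convex combination of the two scores, is the sketch below. The function hybrid_score and the example scores are hypothetical and only illustrate how changing alpha can reorder results.

```python
def hybrid_score(dense_score: float, sparse_score: float, alpha: float) -> float:
    """Blend two scores: alpha=1 is dense only, alpha=0 is sparse only."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * dense_score + (1.0 - alpha) * sparse_score

# made-up scores for two documents: (dense, sparse)
docs = {"doc_a": (0.9, 0.2), "doc_b": (0.6, 0.8)}

for alpha in (1.0, 0.5, 0.0):
    # rank documents by their blended score, highest first
    ranked = sorted(docs, key=lambda d: hybrid_score(*docs[d], alpha), reverse=True)
    print(alpha, ranked)
```

With these made-up numbers, doc_a ranks first for pure dense search, while doc_b overtakes it once the sparse score carries half or more of the weight.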

With a full vector search, we get the same ranking of results. However, the “best” context (711) is currently in position two. We can modify the alpha parameter to try and improve this result.

Using an alpha of 0.3 improves the results and returns the best context (711) as the top result.

That’s it for our fast introduction to hybrid search and how we can implement it with Pinecone. With this, we can reap the benefits of dense vector search while sidestepping its out-of-domain pitfalls.

If you’d like to get started with hybrid search, it is available via private preview with Pinecone. Get in touch for early access!

