all-MiniLM-L12-v2
A sentence-transformers model that maps sentences and paragraphs to a dense vector space. It can be used for tasks such as clustering or semantic search.
Dimension: Size of a single vector produced by this model. | 384 |
Distance Metric: Used to measure similarity between vectors. | cosine, dot product, or Euclidean |
Max Seq. Length: Maximum number of tokens the model can process at once. | 256 |
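To make the three distance metrics listed above concrete, here is a minimal sketch in plain Python. The vectors `u`, `v`, and `w` are made-up 3-dimensional stand-ins (real vectors from this model have 384 dimensions), and the helper names are illustrative, not part of any library.

```python
import math

def dot(a, b):
    """Dot product: larger means more similar; sensitive to vector length."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b; ignores vector length."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean_distance(a, b):
    """Straight-line distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalize(v):
    """Scale v to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

# Toy vectors (assumed for illustration only).
u = [1.0, 2.0, 2.0]
v = [2.0, 4.0, 4.0]   # same direction as u, twice the length
w = [2.0, -1.0, 0.0]  # orthogonal to u

print(cosine_similarity(u, v))   # direction only
print(dot(u, v))                 # direction and length
print(euclidean_distance(u, v))  # absolute separation
print(cosine_similarity(u, w))   # orthogonal vectors
```

Cosine similarity treats `u` and `v` as identical in direction even though their lengths differ, while dot product and Euclidean distance both react to the length difference. Once vectors are scaled to unit length with `normalize`, dot product and cosine similarity give the same value, which is why the choice of metric matters less for normalized embeddings.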
Overview
all-MiniLM-L12-v2 is a sentence and short-paragraph encoder. Given an input text, it outputs a vector that captures the text's semantic information. This sentence vector can be used for information retrieval, clustering, or sentence similarity tasks.
all-MiniLM-L12-v2 is fine-tuned from the pretrained microsoft/MiniLM-L12-H384-uncased model.
This model is 5x faster than all-mpnet-base-v2 while still offering good quality. It belongs to the "all-*" family of SBERT (Sentence-Transformers) models.
Using the Model
Installation:
!pip install -U sentence-transformers
Creating Embeddings:
from sentence_transformers import SentenceTransformer
import torch

# Use a GPU when one is available; otherwise fall back to the CPU.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(device)

sentences = ["This is an example sentence", "Each sentence is converted"]
# Load the model directly onto the selected device.
model = SentenceTransformer('sentence-transformers/all-MiniLM-L12-v2', device=device)

# encode() returns one vector per input sentence; encode a search query the same way.
embeddings = model.encode(sentences)
print(embeddings)
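Once embeddings exist, a typical next step is to rank stored texts against a query. The following is a minimal sketch of that ranking in plain Python. The function `top_k` and the toy vectors are hypothetical and chosen for illustration; in practice `doc_vecs` and `query_vec` would come from `model.encode(...)`.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den

def top_k(query_vec, doc_vecs, k=2):
    """Return (index, score) pairs for the k documents most similar to the query."""
    scored = [(i, cosine_similarity(query_vec, d)) for i, d in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy 3-dimensional stand-ins for real 384-dimensional embeddings (assumed values).
doc_vecs = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.6, 0.8, 0.0],
]
query_vec = [0.8, 0.6, 0.0]

results = top_k(query_vec, doc_vecs, k=2)
for index, score in results:
    print(index, round(score, 3))
```

This brute-force scan compares the query with every stored vector, which is fine for small collections. For the same job, the sentence-transformers package provides `util.cos_sim` and `util.semantic_search`, and larger collections are usually served from a vector index instead.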