

This is a multilingual version of the OpenAI CLIP-ViT-B32 model that encodes both text and images.
Dimension (size of a single vector produced by this model): 512
Distance Metric (used to measure similarity between vectors): cosine or dot product
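Cosine similarity is the dot product of L2-normalized vectors, so both metrics rank results identically once embeddings are normalized. A minimal NumPy check with toy vectors standing in for real embeddings (the helper names are illustrative, not part of any library):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def l2_normalize(v: np.ndarray) -> np.ndarray:
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)

# Toy 4-dimensional vectors standing in for real model embeddings.
u = np.array([1.0, 2.0, 0.0, -1.0])
w = np.array([0.5, 1.0, 1.0, 0.0])

cos = cosine_similarity(u, w)
# Dot product of the normalized vectors equals the cosine of the originals.
dot_normalized = float(np.dot(l2_normalize(u), l2_normalize(w)))
```

In practice this means you can normalize embeddings once at encoding time and then use the cheaper dot product at query time.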
Max Seq. Length (number of tokens the model can process at once):


Text in 50+ languages and images can be mapped to a common dense vector space, so that an image and a text describing it end up close together.

The model can also be used for image search and for multilingual zero-shot image classification, where the image labels are defined as text.
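Zero-shot classification treats each text label as a query: encode the labels (in any supported language), encode the image, and pick the label whose embedding is most similar to the image embedding. The sketch below assumes the embeddings have already been computed; small hand-written vectors stand in for them, and `zero_shot_classify` is a hypothetical helper, not a library function.

```python
from typing import List, Tuple

import numpy as np

def zero_shot_classify(
    image_emb: np.ndarray, label_embs: np.ndarray, labels: List[str]
) -> Tuple[str, np.ndarray]:
    """Return the best-matching label and the cosine score of every label.

    image_emb:  shape (d,)   embedding of one image
    label_embs: shape (n, d) embeddings of n text labels
    """
    # Normalize so that dot products equal cosine similarities.
    img = image_emb / np.linalg.norm(image_emb)
    lab = label_embs / np.linalg.norm(label_embs, axis=1, keepdims=True)
    scores = lab @ img  # shape (n,)
    return labels[int(np.argmax(scores))], scores

# Toy 3-dimensional stand-ins for real label and image embeddings.
labels = ["ein Hund", "un chat", "a car"]
label_embs = np.array([
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.0],
    [0.0, 0.1, 0.9],
])
image_emb = np.array([0.8, 0.2, 0.1])

best, scores = zero_shot_classify(image_emb, label_embs, labels)
```

Image search is the same computation with the roles swapped: one text query embedding is scored against many image embeddings, and the highest-scoring images are returned.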

Using the Model

Load and Encode Images:
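In the sentence-transformers release of this model, the multilingual checkpoint is a text encoder aligned to the original CLIP vector space, so images are encoded with the original `clip-ViT-B-32` model. A minimal sketch, assuming the `sentence-transformers` and `Pillow` packages are installed and the model name below resolves on the Hugging Face Hub; `load_image_model`, `open_images`, and `encode_images` are hypothetical helper names.

```python
from typing import Iterable

import numpy as np

# Assumed model name: the original CLIP model, used here for images.
IMAGE_MODEL_NAME = "clip-ViT-B-32"

def load_image_model(name: str = IMAGE_MODEL_NAME):
    """Load the CLIP image encoder (sentence-transformers imported lazily)."""
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer(name)

def open_images(paths: Iterable[str]) -> list:
    """Open image files with Pillow and convert them to RGB."""
    from PIL import Image
    return [Image.open(p).convert("RGB") for p in paths]

def encode_images(model, images: list, batch_size: int = 32) -> np.ndarray:
    """Encode images into unit-length vectors, one row per image."""
    return model.encode(
        images,
        batch_size=batch_size,
        convert_to_numpy=True,
        normalize_embeddings=True,  # unit length, so cosine == dot product
    )
```

For example, `encode_images(load_image_model(), open_images(["photo.jpg"]))` (with a hypothetical file name) returns a NumPy array with one row per image. Loading lazily keeps the helpers importable even where the model has not been downloaded yet.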

Encode Text:
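Text in any of the supported languages is encoded with the multilingual checkpoint, which places it in the same space as the CLIP image embeddings. A sketch under the same assumptions (installed `sentence-transformers`, Hub model name `sentence-transformers/clip-ViT-B-32-multilingual-v1`); `load_text_model` and `encode_texts` are hypothetical helpers.

```python
from typing import List

import numpy as np

# Assumed Hub name of the multilingual text encoder aligned to clip-ViT-B-32.
TEXT_MODEL_NAME = "sentence-transformers/clip-ViT-B-32-multilingual-v1"

def load_text_model(name: str = TEXT_MODEL_NAME):
    """Load the multilingual text encoder (sentence-transformers imported lazily)."""
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer(name)

def encode_texts(model, texts: List[str], batch_size: int = 32) -> np.ndarray:
    """Encode sentences in any supported language into unit-length vectors."""
    return model.encode(
        texts,
        batch_size=batch_size,
        convert_to_numpy=True,
        normalize_embeddings=True,  # unit length, so cosine == dot product
    )
```

With both text and image embeddings normalized this way, `text_embs @ image_embs.T` gives a matrix of cosine similarities between every caption (for example `"Ein Hund im Schnee"` or `"A dog in the snow"`) and every image encoded with the CLIP image model.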
