Abstract
Vector databases enable semantic search over large, dynamic datasets, supporting complex queries that combine vector similarity with metadata constraints. They are increasingly used in retrieval-augmented generation (RAG) systems, where accurate filtering over metadata (such as document type, user context, or recency) is essential to response quality. In serverless settings, where compute and storage are fully decoupled, this becomes especially challenging: data is continuously inserted and deleted, and metadata may be updated independently of the vector index, yet filters must be applied accurately and efficiently at query time.
This paper presents the design of metadata filtering in Pinecone’s serverless vector database, which achieves high accuracy by integrating filtering into the vector retrieval path. Our architecture leverages immutable vector slabs organized in an LSM-tree structure in object storage, with stateless, on-demand executors that require novel coordination mechanisms to maintain correctness without tight coupling. We formalize accuracy through exact filter recall metrics and analyze two fundamental filter interaction paradigms: ad-hoc application versus pre-computed filter representations. We present results of filtered approximate nearest neighbor (ANN) search on a public filtered-search dataset (YFCC) and on data from a production customer with categorical and numeric fields, demonstrating scalable performance while maintaining exact filtering accuracy.