Frontier Medicines and Pinecone: Pioneering the Use of Super Large Vector Databases in Healthcare
Frontier Medicines is a precision medicine company pioneering groundbreaking medicines to transform treatment for genetically-defined patient populations, starting with oncology and immunology. The company uses Pinecone to get more out of its vector embeddings, most notably for similarity searches over billions of vectorized molecules.
Tens of Billions in vector searches
Frontier Medicines’ proprietary chemoproteomics-powered drug discovery engine, the Frontier™ Platform, pairs machine learning with covalent chemistry to unlock hard-to-treat, disease-causing proteins for drug development.
The Frontier Platform is powered in part by data from its mass spectrometry-driven chemoproteomics, which generates terabytes of data a day. In the last five years, Frontier has measured hundreds of millions of interactions between covalent molecules and the proteome and has strategically invested in generating covalent chemical property data. Together, these data represent an unprecedentedly large data set for covalent drug discovery.
Leveraging these high-value, high-quality data, Frontier has developed key AI algorithms to drive the discovery of novel medicines against historically “undruggable” targets. Chemoproteomics-focused algorithms inform and guide targeting strategies for nearly every protein in the human proteome. In addition, medicinal chemistry-focused algorithms, including large language models and generative AI approaches, drive the optimization of compounds from within Frontier’s highly curated library.
Large-scale similarity search over billions of vectorized items, including molecules, is an important step in this process, and Pinecone serverless enabled Frontier to perform those searches with superior performance while remaining cost-effective.
Pinecone Offers Next-Generation Vector-Embedded Molecule Search
Frontier Medicines turned to Pinecone for its ability to handle vectors at large scale with automatic resource management.
With Pinecone, Frontier Medicines got more out of its vector embeddings, particularly for molecule searches. First, Frontier used a proprietary transformer model to generate molecule-specific vectors for billions of molecules. Pinecone then enabled the team to optimize the entire search process.
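The embed-then-search flow described above can be illustrated with a minimal, self-contained sketch. It is not Frontier's method: the proprietary transformer is replaced here by a toy hashing function, the molecules are assumed to be written as SMILES strings, and the names `embed_smiles`, `most_similar`, and `DIM` are hypothetical. The search is a brute-force scan, which a vector database replaces at billion scale.

```python
import hashlib
import math

DIM = 64  # assumed toy dimensionality; a real encoder would define its own


def embed_smiles(smiles, dim=DIM):
    """Toy stand-in for a learned molecule encoder (NOT a transformer).

    Hashes character trigrams of the string into a fixed-size count vector
    and L2-normalises it, so similar strings get similar vectors.
    """
    vec = [0.0] * dim
    padded = f"^{smiles}$"  # mark the start and end of the string
    for i in range(len(padded) - 2):
        trigram = padded[i:i + 3]
        digest = hashlib.sha256(trigram.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(a, b):
    """Cosine similarity of two vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))


def most_similar(query, library, top_k=3):
    """Brute-force nearest neighbours: embed everything, rank by cosine."""
    q = embed_smiles(query)
    scored = [(cosine(q, embed_smiles(s)), s) for s in library]
    scored.sort(reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    library = ["CCO", "CCN", "c1ccccc1", "CC(=O)O"]
    for score, smiles in most_similar("CCO", library):
        print(f"{score:.3f}  {smiles}")
```

In this sketch every query re-embeds and scans the whole library, so cost grows linearly with library size. Storing precomputed vectors in a dedicated index is what removes that cost in practice.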
As their usage of Pinecone evolved, Frontier Medicines wanted to search across molecular datasets. Facing the demanding requirements of a complex, billion-scale dataset, the Frontier Machine Learning team explored the capabilities of the newly introduced Pinecone serverless. During a private preview period, the team tested the new architecture, which allowed them to use vector storage at any scale while lowering costs for higher availability.
“The introduction of Pinecone serverless has led to amazing performance and efficiency improvements in our vector search capability. We will continue to push forward searching billions of vectors with Pinecone serverless at the center.” - Johannes Hermann, PhD, Chief Technology Officer, Frontier Medicines
Scaling Drug Discovery with Pinecone Serverless
Frontier Medicines wanted a more scalable solution to support the generation of molecular insights across billions of molecules. The team needed to store a large collection of vectors representing molecules so they could process them and find insights for their projects. Query needs were not always large: many queries were frequent, small, and focused. With Pinecone serverless, Frontier Medicines has been able to keep moving quickly by:
- Building at any scale: Pinecone serverless has enabled the team to scale up to tens of billions of vectors. With this architecture, they could substantially expand their molecular semantic search and run precise searches within specific namespaces based on molecular characteristics.
- Accessing easiest-to-use technology: This developer-first experience allowed Frontier Medicines to push the envelope in using large-scale molecule vector spaces for searches and machine learning. Pinecone is key to enabling that scale, making manual resource provisioning a thing of the past.
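To make the idea of namespace-scoped search concrete, here is a small in-memory sketch. It is a toy model, not the Pinecone client or its API: the class `NamespacedIndex`, its `upsert` and `query` methods, and the namespace names are all hypothetical and chosen only for illustration. The point it shows is that a query touches only the vectors stored under one namespace.

```python
import heapq
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class NamespacedIndex:
    """Toy in-memory vector index partitioned by namespace (illustrative only)."""

    def __init__(self):
        # namespace -> {item_id: vector}
        self._spaces = defaultdict(dict)

    def upsert(self, namespace, items):
        """Insert or overwrite (item_id, vector) pairs in one namespace."""
        for item_id, vector in items:
            self._spaces[namespace][item_id] = vector

    def query(self, namespace, vector, top_k=2):
        """Return up to top_k (score, item_id) pairs from one namespace only."""
        candidates = self._spaces.get(namespace, {})
        scored = ((cosine(vector, v), item_id) for item_id, v in candidates.items())
        return heapq.nlargest(top_k, scored)


if __name__ == "__main__":
    index = NamespacedIndex()
    # Hypothetical namespaces grouping molecules by a shared characteristic.
    index.upsert("kinase-binders", [("mol-1", [1.0, 0.0]), ("mol-2", [0.8, 0.6])])
    index.upsert("protease-binders", [("mol-3", [1.0, 0.0])])
    print(index.query("kinase-binders", [1.0, 0.0], top_k=1))
```

Partitioning this way keeps frequent, small, focused queries cheap, because each one scans a single partition instead of the whole collection.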
A Vision for Broader Impact
Frontier Medicines has continued to significantly improve its ability to search large vector spaces since transitioning to Pinecone serverless:
- Scale to tens of billions of molecule vectors, with sub-second searches in near real-time usage
- Increased efficiency in vector searches
Frontier’s use of Pinecone serverless has opened up a new dimension of search space for the company and laid the foundation for future scalability and innovation as it continues to expand its leadership in developing breakthrough precision medicines.