
Abstract

Reranking models are critical to improving the quality of retrieval systems: they refine initial search results according to query relevance. Among reranking approaches, cross-encoders are particularly effective thanks to the deep semantic understanding afforded by their transformer-based architectures. However, their high computational demands pose significant challenges for real-time applications and scalability.

This paper introduces E2Rank, a layer-wise reranking model that balances efficiency and effectiveness by leveraging intermediate transformer outputs: successively deeper layers are applied to a progressively narrowed candidate set, reducing computational cost with minimal impact on ranking quality. Our training approach, which combines model merging with layer-wise contrastive training, yields substantial gains in effectiveness. Extensive experiments on standard benchmarks demonstrate that E2Rank achieves state-of-the-art performance, outperforming existing rerankers in both effectiveness and computational efficiency.
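To make the cascading idea concrete, the sketch below shows one way layer-wise reranking can be organized. It is an illustrative outline, not the paper's implementation: `cascaded_rerank`, `layer_scorers`, and `keep_sizes` are hypothetical names, and each scorer stands in for a relevance score computed from an intermediate transformer layer (e.g., a pooled hidden state fed to a lightweight scoring head).

```python
# Minimal sketch of layer-wise cascaded reranking (illustrative, not the paper's code).
# Assumes `layer_scorers` is ordered shallow -> deep, where layer_scorers[i](query, doc)
# returns a relevance score derived from the transformer's output at that depth.
# `keep_sizes[i]` is the number of candidates kept after stage i.

from typing import Callable, List, Sequence, Tuple

Scorer = Callable[[str, str], float]

def cascaded_rerank(
    query: str,
    candidates: Sequence[str],
    layer_scorers: List[Scorer],
    keep_sizes: List[int],
) -> List[Tuple[str, float]]:
    """Progressively rescore a shrinking candidate set with deeper (costlier) layers."""
    assert len(keep_sizes) == len(layer_scorers) - 1
    pool = list(candidates)
    scored: List[Tuple[str, float]] = []
    for stage, scorer in enumerate(layer_scorers):
        # Score every candidate still in the pool with this stage's scorer.
        scored = sorted(
            ((doc, scorer(query, doc)) for doc in pool),
            key=lambda pair: pair[1],
            reverse=True,
        )
        if stage < len(keep_sizes):
            # Keep only the top-k survivors for the next, more expensive stage.
            pool = [doc for doc, _ in scored[: keep_sizes[stage]]]
    return scored  # final ranking produced by the deepest stage

# Toy usage with stand-in scorers (replace with real layer-wise model scores):
if __name__ == "__main__":
    cheap = lambda q, d: float(len(set(q.split()) & set(d.split())))  # shallow proxy
    deep = lambda q, d: cheap(q, d) + (1.0 if q.lower() in d.lower() else 0.0)
    docs = [
        "rerankers refine search results",
        "cats sleep a lot",
        "search rerankers use transformers",
    ]
    print(cascaded_rerank("search rerankers", docs, [cheap, deep], keep_sizes=[2]))
```

The key property the sketch captures is that only the small surviving pool pays the cost of the deeper layers, which is where the efficiency gain of a cascade comes from.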
