Abstract
Dimension IMportance Estimation (DIME) is a recently proposed technique to enhance the ranking effectiveness of dense retrieval models by pruning irrelevant embedding dimensions, using either Pseudo Relevance Feedback (PRF DIME) or the dense representation of a Large Language Model-generated answer (LLM DIME). Despite strong empirical performance, its theoretical foundations and generalizability remain open questions.
In this paper, we make four key contributions. First, we provide a rigorous theoretical analysis of DIME, framing it as a denoising mechanism that mitigates embedding noise while preserving salient information. Second, we conduct a comprehensive reproducibility study, confirming previously reported gains for both PRF DIME and LLM DIME. Third, we extend the evaluation of PRF DIME by applying it to a broader set of embedding models with distinct characteristics, such as matryoshka embeddings, cosine similarity-optimized models, and architectures that produce high-dimensional representations, while also testing it on diverse retrieval datasets. For LLM DIME, we expand the analysis across a range of LLMs, comparing large proprietary models with cheaper open-source alternatives. Finally, we refine DIME by introducing an attention-inspired PRF mechanism and propose leveraging dimension importance as a reranking technique.
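To make the dimension-pruning idea concrete, the following is a minimal sketch, not the paper's exact implementation: it assumes each dimension's importance is scored by the elementwise product of the query embedding and a feedback embedding (a PRF centroid or an LLM-answer embedding), and that only the top-alpha fraction of dimensions is retained; the function name, scoring rule, and masking ratio are illustrative assumptions.

import numpy as np

def dime_prune(query_emb: np.ndarray, feedback_emb: np.ndarray,
               alpha: float = 0.5) -> np.ndarray:
    """Zero out the least important query-embedding dimensions.

    Illustrative scoring (an assumption, not the paper's exact rule):
    importance of dimension i is query_emb[i] * feedback_emb[i];
    the top-alpha fraction of dimensions is kept, the rest are masked.
    """
    importance = query_emb * feedback_emb       # per-dimension relevance signal
    k = int(alpha * query_emb.shape[0])         # number of dimensions to keep
    keep = np.argsort(importance)[-k:]          # indices of the k most important dims
    pruned = np.zeros_like(query_emb)
    pruned[keep] = query_emb[keep]              # mask all other dimensions
    return pruned

# Toy usage: feedback_emb stands in for a PRF centroid (PRF DIME)
# or an embedded LLM-generated answer (LLM DIME).
rng = np.random.default_rng(0)
q, f = rng.normal(size=768), rng.normal(size=768)
q_pruned = dime_prune(q, f, alpha=0.3)

The pruned query embedding is then used in place of the original one when scoring documents, which is what allows the same mechanism to be repurposed for reranking.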