Improving LlamaIndex RAG performance with reranking

Cole Murray
3 min read · Dec 19, 2023


Enhancing RAG with BAAI/bge-reranker

Retrieval-Augmented Generation (RAG) has been a game-changer in the field of natural language processing, especially for tasks that blend information retrieval with generative models. As part of my contributions to the field, I have recently implemented support for the FlagEmbeddingReranker in the llama_index library. This tool performs reranking, bringing more accuracy and relevance to the retrieval phase. In this article, I'll provide a comprehensive tutorial on the usage of FlagEmbeddingReranker, diving into its functionalities, the problems it solves, and its importance in improving RAG systems.

Introduction to FlagEmbeddingReranker

The FlagEmbeddingReranker is a class in the llama_index library designed for re-ranking nodes (or documents) based on their relevance to a given query. It leverages a pre-trained model to compute relevance scores and select the top N nodes. This re-ranking mechanism is particularly valuable in RAG, where the quality of retrieved documents directly impacts the generation quality.
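The overall flow is easy to state in plain Python. The sketch below is illustrative, not the library's implementation: `overlap_score` is a toy stand-in for the cross-encoder model, which in practice scores each (query, document) pair jointly.

```python
from typing import Callable, List, Tuple

def rerank(
    query: str,
    documents: List[str],
    score_fn: Callable[[str, str], float],  # stand-in for the cross-encoder
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Score every (query, document) pair and keep the top_n documents."""
    scored = [(doc, score_fn(query, doc)) for doc in documents]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

# Toy relevance score: fraction of query words that appear in the document.
def overlap_score(query: str, doc: str) -> float:
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q)

docs = ["RAG combines retrieval and generation.", "BERT encodes sentences."]
print(rerank("what is RAG", docs, overlap_score, top_n=1))
```

A real reranker replaces `overlap_score` with a learned model, but the shape of the computation — score every pair, sort, truncate to `top_n` — is the same.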

The re-ranker can be used with any of the BAAI/bge-reranker models, such as BAAI/bge-reranker-base and BAAI/bge-reranker-large.

You can view the available models on BAAI's Hugging Face page.

Why Use FlagEmbedding Reranking in RAG?

The latest BAAI/bge-reranker models, integral to the FlagEmbeddingReranker, are designed to:

  • Directly Compute Relevance: They take a question and document as inputs and directly output a similarity score, providing a more nuanced measure of relevance compared to vector-based embedding models.
  • Optimize Scoring Range: Optimized using cross-entropy loss, these models offer flexibility in scoring, not limited to a specific range.
  • Adapt to Specific Use Cases: Their fine-tuning capabilities make them ideal for re-ranking the top-k documents returned by embedding models, ensuring high relevance and accuracy.
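On the second point: because these cross-encoders are trained with a cross-entropy objective, they output raw logits rather than similarities bounded to a fixed range. If you want a (0, 1) pseudo-probability for thresholding, you can pass a logit through a sigmoid yourself. The logit values below are made up for illustration:

```python
import math

def sigmoid(logit: float) -> float:
    """Map an unbounded relevance logit to a (0, 1) pseudo-probability."""
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical raw scores from a cross-encoder reranker: any real number.
raw_scores = {"relevant doc": 6.2, "borderline doc": 0.1, "off-topic doc": -4.8}

for doc, logit in raw_scores.items():
    print(f"{doc}: logit={logit:+.1f}  sigmoid={sigmoid(logit):.3f}")
```

In practice, strongly relevant pairs tend to get large positive logits and off-topic pairs large negative ones, so a simple sigmoid threshold can serve as a relevance cutoff.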

Setting Up

Before diving into the implementation, ensure you have the llama_index and FlagEmbedding packages installed. If not, install them using:

pip install llama-index FlagEmbedding

Implementing FlagEmbeddingReranker

Here’s how to incorporate FlagEmbeddingReranker into your RAG setup:

1. Import and Initialize

First, import necessary classes and initialize FlagEmbeddingReranker.

from llama_index.postprocessor.flag_embedding_reranker import FlagEmbeddingReranker

reranker = FlagEmbeddingReranker(
    top_n=3,
    model="BAAI/bge-reranker-large",
    use_fp16=False,
)

2. Prepare Nodes and Query

Assume you have a list of NodeWithScore objects, each representing a document retrieved by your RAG's initial query phase.

from llama_index.schema import NodeWithScore, QueryBundle, TextNode

documents = [
    "Retrieval-Augmented Generation (RAG) combines retrieval and generation for NLP tasks.",
    "Generative Pre-trained Transformer (GPT) is a language generation model.",
    "RAG uses a retriever to fetch relevant documents and a generator to produce answers.",
    "BERT is a model designed for understanding the context of a word in a sentence.",
]

nodes = [NodeWithScore(node=TextNode(text=doc)) for doc in documents]
query = "What is RAG in NLP?"

3. Re-Rank Nodes

Invoke FlagEmbeddingReranker with the nodes and query.

query_bundle = QueryBundle(query_str=query)
ranked_nodes = reranker.postprocess_nodes(nodes, query_bundle)

4. Analyze Results

The returned ranked_nodes will be a sorted list based on relevance to the query.

for node in ranked_nodes:
    print(node.node.get_content(), "-> Score:", node.score)

Expected Output

The output should reflect the relevance of each document to the query “What is RAG in NLP?”. Ideally, the document explicitly mentioning RAG and its role in NLP should rank higher.

  1. “Retrieval-Augmented Generation (RAG) combines retrieval and generation for NLP tasks.” -> Higher score
  2. “RAG uses a retriever to fetch relevant documents and a generator to produce answers.” -> Next highest score

Documents about GPT and BERT, while relevant to NLP, are less pertinent to the specific query about RAG and thus should score lower.

Conclusion

The FlagEmbeddingReranker equipped with the new BAAI/bge-reranker models is a powerful addition to any RAG pipeline. It ensures that your retrieval phase is not just fetching a wide array of documents but is accurately focusing on the most relevant ones. Whether you are developing sophisticated NLP applications or exploring the frontiers of language models, integrating this advanced reranking tool is pivotal in achieving high-quality, relevant outputs.


Cole Murray

Machine Learning Engineer | CTO Empiric | Previously Personalization @ Amazon | https://murraycole.com