
Integration: fastRAG
fastRAG is a research framework for efficient and optimized retrieval augmented generative pipelines, incorporating state-of-the-art LLMs and Information Retrieval. fastRAG is designed to empower researchers and developers with a comprehensive tool-set for advancing retrieval augmented generation.
Comments, suggestions, issues and pull requests are welcome! ❤️
> **IMPORTANT**: Now compatible with Haystack v2+. Please report any possible issues you find.
📣 Updates
- 2024-05: fastRAG V3 is Haystack 2.0 compatible 🔥
- 2023-12: Gaudi2 and ONNX runtime support; Optimized Embedding models; Multi-modality and Chat demos; REPLUG text generation.
- 2023-06: ColBERT index modification: adding/removing documents.
- 2023-05: RAG with LLM and dynamic prompt synthesis example.
- 2023-04: Qdrant `DocumentStore` support.
Key Features
- Optimized RAG: Build RAG pipelines with state-of-the-art efficient components for greater compute efficiency.
- Optimized for Intel Hardware: Leverage Intel extensions for PyTorch (IPEX), 🤗 Optimum Intel and 🤗 Optimum-Habana to run as optimally as possible on Intel® Xeon® processors and Intel® Gaudi® AI accelerators; see the sketch after this list.
- Customizable: fastRAG is built using Haystack and Hugging Face. All of fastRAG's components are 100% Haystack compatible.
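As an illustration of the Intel-hardware point above, here is a minimal sketch of loading a causal LM through 🤗 Optimum Intel's OpenVINO backend. This is standard Optimum Intel usage rather than a fastRAG API, and `microsoft/phi-2` is simply the model used later on this page:

```python
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer, pipeline

model_id = "microsoft/phi-2"
# export=True converts the original PyTorch checkpoint to OpenVINO IR on the fly.
model = OVModelForCausalLM.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(generator("Retrieval augmented generation is", max_new_tokens=30)[0]["generated_text"])
```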
🚀 Components
For a brief overview of the various unique components in fastRAG, refer to the Components Overview page; a short wiring sketch follows the tables below.
| LLM Backends | |
|:---|:---|
| Intel Gaudi Accelerators | Running LLMs on Gaudi 2 |
| ONNX Runtime | Running LLMs with the optimized ONNX Runtime |
| OpenVINO | Running quantized LLMs using OpenVINO |
| Llama-CPP | Running RAG pipelines with LLMs on a Llama CPP backend |

| Optimized Components | |
|:---|:---|
| Embedders | Optimized int8 bi-encoders |
| Rankers | Optimized/sparse cross-encoders |

| RAG-efficient Components | |
|:---|:---|
| ColBERT | Token-based late interaction |
| Fusion-in-Decoder (FiD) | Generative multi-document encoder-decoder |
| REPLUG | Improved multi-document decoder |
| PLAID | Incredibly efficient indexing engine |
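To make the tables concrete, here is a sketch of dropping one of these components into a Haystack pipeline. The wiring is standard Haystack 2.x; the `ColBERTRanker` import path and constructor are assumptions for illustration, so check the Components Overview page for the actual names:

```python
from haystack import Pipeline
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

# Hypothetical import path and constructor; the real component name and
# arguments are documented on the Components Overview page.
from fastrag.rankers import ColBERTRanker

store = InMemoryDocumentStore()  # assume documents were written here beforehand

pipe = Pipeline()
pipe.add_component("retriever", InMemoryBM25Retriever(document_store=store))
pipe.add_component("ranker", ColBERTRanker())  # token-level late-interaction reranking
pipe.connect("retriever.documents", "ranker.documents")

query = "Who wrote The Hobbit?"
results = pipe.run({"retriever": {"query": query}, "ranker": {"query": query}})
```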
📍 Installation
Preliminary requirements:
- Python 3.8 or higher.
- PyTorch 2.0 or higher.
To set up the software, clone the project and run the following, preferably in a newly created virtual environment:
```bash
git clone https://github.com/IntelLabs/fastRAG.git
cd fastRAG
pip install .
```
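After installing, a quick sanity check is to import the package and one of the components used later on this page (a minimal sketch; the second import additionally requires the OpenVINO-related optional dependencies to be installed):

```python
# Post-install sanity check: both imports should succeed without error.
import fastrag
from fastrag.generators.openvino import OpenVINOGenerator  # same import path as in the Usage example below

print("fastRAG is installed and importable")
```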
Usage
You can import components from fastRAG and use them in a Haystack pipeline:
```python
from haystack import Pipeline
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.rankers import TransformersSimilarityRanker

from fastrag.generators.openvino import OpenVINOGenerator

prompt_template = """
Given these documents, answer the question.
Documents:
{% for doc in documents %}
{{ doc.content }}
{% endfor %}
Question: {{query}}
Answer:
"""
openvino_compressed_model_path = "path/to/quantized/model"
generator = OpenVINOGenerator(
    model="microsoft/phi-2",
    compressed_model_dir=openvino_compressed_model_path,
    device_openvino="CPU",
    task="text-generation",
    generation_kwargs={
        "max_new_tokens": 100,
    },
)
pipe = Pipeline()
pipe.add_component("retriever", InMemoryBM25Retriever(document_store=store))  # `store` is a populated InMemoryDocumentStore (see below)
pipe.add_component("ranker", TransformersSimilarityRanker())
pipe.add_component("prompt_builder", PromptBuilder(template=prompt_template))
pipe.add_component("llm", generator)
pipe.connect("retriever.documents", "ranker.documents")
pipe.connect("ranker", "prompt_builder.documents")
pipe.connect("prompt_builder", "llm")
query = "Who is the main villain in Lord of the Rings?"
answer_result = pipe.run({
    "prompt_builder": {
        "query": query,
    },
    "retriever": {
        "query": query,
    },
    "ranker": {
        "query": query,
        "top_k": 1,
    },
})
print(answer_result["llm"]["replies"][0])
# ' Sauron\n'
```
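The snippet above assumes that `store` is an `InMemoryDocumentStore` already populated with documents. A minimal sketch of creating one with standard Haystack 2.x APIs (the example documents are illustrative):

```python
from haystack import Document
from haystack.document_stores.in_memory import InMemoryDocumentStore

store = InMemoryDocumentStore()
store.write_documents([
    Document(content="Sauron is the title character and main antagonist of The Lord of the Rings."),
    Document(content="Frodo Baggins is the hobbit who carries the One Ring to Mount Doom."),
])
```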
For more examples, check out Example Use Cases.
License
The code is licensed under the Apache 2.0 License.
Disclaimer
This is not an official Intel product.