RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
This notebook shows how to use an implementation of RAPTOR with llama-index, leveraging the RAPTOR llama-pack.
RAPTOR works by recursively clustering text chunks and summarizing each cluster, building a layered tree of summaries that can be searched at retrieval time.
There are two retrieval modes (sketched below):
- tree_traversal -- traverse the tree of clusters top-down, performing top-k retrieval at each level.
- collapsed -- treat the entire tree as one flat pool of nodes and perform a single top-k retrieval.
See the paper for full algorithm details.
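To make the two modes concrete, here is a minimal, self-contained sketch of the retrieval logic over a toy tree of embedded nodes. The names here (TreeNode, cosine, and the two functions) are illustrative stand-ins, not the pack's actual internals, and tree traversal is simplified to a single top-k per level:

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class TreeNode:
    # hypothetical node: a leaf chunk or cluster summary plus its embedding
    text: str
    embedding: np.ndarray
    children: list["TreeNode"] = field(default_factory=list)


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def collapsed_retrieve(all_nodes: list[TreeNode], query_emb: np.ndarray, top_k: int = 2):
    # collapsed: ignore the hierarchy and rank every node
    # (summaries and leaf chunks alike) in one flat pool
    ranked = sorted(all_nodes, key=lambda n: cosine(n.embedding, query_emb), reverse=True)
    return ranked[:top_k]


def tree_traversal_retrieve(roots: list[TreeNode], query_emb: np.ndarray, top_k: int = 2):
    # tree traversal: take the top-k nodes at the current level, then
    # descend into their children, repeating until we reach leaf chunks
    level = roots
    while True:
        best = sorted(level, key=lambda n: cosine(n.embedding, query_emb), reverse=True)[:top_k]
        children = [c for n in best for c in n.children]
        if not children:
            return best
        level = children
```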
Setup
!pip install llama-index llama-index-packs-raptor llama-index-vector-stores-chroma
from llama_index.packs.raptor import RaptorPack
# optionally download the pack to inspect/modify it yourself!
# from llama_index.core.llama_pack import download_llama_pack
# RaptorPack = download_llama_pack("RaptorPack", "./raptor_pack")
!wget https://arxiv.org/pdf/2401.18059.pdf -O ./raptor_paper.pdf
--2024-02-29 22:16:11--  https://arxiv.org/pdf/2401.18059.pdf
Resolving arxiv.org (arxiv.org)... 151.101.3.42, 151.101.195.42, 151.101.131.42, ...
Connecting to arxiv.org (arxiv.org)|151.101.3.42|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2547113 (2.4M) [application/pdf]
Saving to: ‘./raptor_paper.pdf’
2024-02-29 22:16:12 (12.5 MB/s) - ‘./raptor_paper.pdf’ saved [2547113/2547113]
import os
os.environ["OPENAI_API_KEY"] = "sk-..."
Constructing the Clusters/Hierarchy Tree
import nest_asyncio

# apply nest_asyncio so the pack's async embedding/summarization calls
# can run inside the notebook's existing event loop
nest_asyncio.apply()
from llama_index.core import SimpleDirectoryReader
documents = SimpleDirectoryReader(input_files=["./raptor_paper.pdf"]).load_data()
from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore
import chromadb

client = chromadb.PersistentClient(path="./raptor_paper_db")
collection = client.get_or_create_collection("raptor")
vector_store = ChromaVectorStore(chroma_collection=collection)

raptor_pack = RaptorPack(
    documents,
    embed_model=OpenAIEmbedding(
        model="text-embedding-3-small"
    ),  # used for embedding clusters
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0.1),  # used for generating summaries
    vector_store=vector_store,  # used for storage
    similarity_top_k=2,  # top-k for each layer, or overall top-k for collapsed
    mode="collapsed",  # sets the default retrieval mode
    transformations=[
        SentenceSplitter(chunk_size=400, chunk_overlap=50)
    ],  # transformations applied for ingestion
)
Generating embeddings for level 0.
Performing clustering for level 0.
Generating summaries for level 0 with 10 clusters.
Level 0 created summaries/clusters: 10
Generating embeddings for level 1.
Performing clustering for level 1.
Generating summaries for level 1 with 1 clusters.
Level 1 created summaries/clusters: 1
Generating embeddings for level 2.
Performing clustering for level 2.
Generating summaries for level 2 with 1 clusters.
Level 2 created summaries/clusters: 1
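The log above traces the recursive build: embed the current layer, cluster it, summarize each cluster, then repeat on the summaries until everything collapses into a single cluster. A rough sketch of that loop, where embed, cluster, and summarize are hypothetical callables standing in for the pack's internals:

```python
def build_tree(chunks, embed, cluster, summarize, max_levels=3):
    """Toy version of the recursive build loop traced in the log above.

    embed/cluster/summarize are hypothetical stand-ins for the pack's
    internals; see the RAPTOR paper for the actual clustering details.
    """
    layers = [chunks]
    texts = chunks
    for level in range(max_levels):
        embeddings = embed(texts)            # "Generating embeddings for level N."
        groups = cluster(texts, embeddings)  # "Performing clustering for level N."
        summaries = [summarize(g) for g in groups]
        print(f"Level {level} created summaries/clusters: {len(summaries)}")
        layers.append(summaries)
        if len(summaries) <= 1:              # a single cluster: the root is done
            break
        texts = summaries
    return layers
```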
Retrieval
nodes = raptor_pack.run("What baselines is raptor compared against?", mode="collapsed")
print(len(nodes))
print(nodes[0].text)
2
Specifically, RAPTOR’s F-1 scores are at least 1.8% points higher than DPR and at least 5.3% points higher than BM25. Retriever GPT-3 F-1 Match GPT-4 F-1 Match UnifiedQA F-1 Match Title + Abstract 25.2 22.2 17.5 BM25 46.6 50.2 26.4 DPR 51.3 53.0 32.1 RAPTOR 53.1 55.7 36.6 Table 4: Comparison of accuracies on the QuAL- ITY dev dataset for two different language mod- els (GPT-3, UnifiedQA 3B) using various retrieval methods. RAPTOR outperforms the baselines of BM25 and DPR by at least 2.0% in accuracy. Model GPT-3 Acc. UnifiedQA Acc. BM25 57.3 49.9 DPR 60.4 53.9 RAPTOR 62.4 56.6 Table 5: Results on F-1 Match scores of various models on the QASPER dataset. Model F-1 Match LongT5 XL (Guo et al., 2022) 53.1 CoLT5 XL (Ainslie et al., 2023) 53.9 RAPTOR + GPT-4 55.7Comparison to State-of-the-art Systems Building upon our controlled comparisons, we examine RAPTOR’s performance relative to other state-of-the-art models.
nodes = raptor_pack.run(
    "What baselines is raptor compared against?", mode="tree_traversal"
)
print(len(nodes))
print(nodes[0].text)
Retrieved parent IDs from level 2: ['cc3b3f41-f4ca-4020-b11f-be7e0ce04c4f']
Retrieved 1 from parents at level 2.
Retrieved parent IDs from level 1: ['a4ca9426-a312-4a01-813a-c9b02aefc7e8']
Retrieved 2 from parents at level 1.
Retrieved parent IDs from level 0: ['63126782-2778-449f-99c0-1e8fd90caa36', 'd8f68d31-d878-41f1-aeb6-a7dde8ed5143']
Retrieved 4 from parents at level 0.
4
Specifically, RAPTOR’s F-1 scores are at least 1.8% points higher than DPR and at least 5.3% points higher than BM25. Retriever GPT-3 F-1 Match GPT-4 F-1 Match UnifiedQA F-1 Match Title + Abstract 25.2 22.2 17.5 BM25 46.6 50.2 26.4 DPR 51.3 53.0 32.1 RAPTOR 53.1 55.7 36.6 Table 4: Comparison of accuracies on the QuAL- ITY dev dataset for two different language mod- els (GPT-3, UnifiedQA 3B) using various retrieval methods. RAPTOR outperforms the baselines of BM25 and DPR by at least 2.0% in accuracy. Model GPT-3 Acc. UnifiedQA Acc. BM25 57.3 49.9 DPR 60.4 53.9 RAPTOR 62.4 56.6 Table 5: Results on F-1 Match scores of various models on the QASPER dataset. Model F-1 Match LongT5 XL (Guo et al., 2022) 53.1 CoLT5 XL (Ainslie et al., 2023) 53.9 RAPTOR + GPT-4 55.7Comparison to State-of-the-art Systems Building upon our controlled comparisons, we examine RAPTOR’s performance relative to other state-of-the-art models.
Note that tree traversal returned four leaf nodes here rather than two: top-k selection is applied at each level of the tree, so the nodes collected from the final level's parents can exceed similarity_top_k.
Loading
Since we saved to a vector store, we can also reuse it! (For local vector stores, the retriever also has persist and from_persist_dir methods.)
from llama_index.packs.raptor import RaptorRetriever

retriever = RaptorRetriever(
    [],
    embed_model=OpenAIEmbedding(
        model="text-embedding-3-small"
    ),  # used for embedding clusters
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0.1),  # used for generating summaries
    vector_store=vector_store,  # used for storage
    similarity_top_k=2,  # top-k for each layer, or overall top-k for collapsed
    mode="tree_traversal",  # sets the default retrieval mode
)
# if using a default vector store
# retriever.persist("./persist")
# retriever = RaptorRetriever.from_persist_dir("./persist", ...)
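Since RaptorRetriever follows the standard llama-index retriever interface, you can also call retrieve on it directly as a quick sanity check (the query string below is just an example):

```python
nodes = retriever.retrieve("What baselines is raptor compared against?")
print(len(nodes))
print(nodes[0].text[:200])
```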
Query Engine
from llama_index.core.query_engine import RetrieverQueryEngine

query_engine = RetrieverQueryEngine.from_args(
    retriever, llm=OpenAI(model="gpt-3.5-turbo", temperature=0.1)
)
response = query_engine.query("What baselines was RAPTOR compared against?")
print(str(response))
BM25 and DPR
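Because this is a standard llama-index query engine, the response object also carries the retrieved nodes in source_nodes, so you can inspect what grounded the answer:

```python
for node_with_score in response.source_nodes:
    # each entry pairs a retrieved node with its similarity score
    print(node_with_score.score, node_with_score.text[:100])
```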