Integration of graphrag

Menghuan1918July 14, 2024Less than 1 minuteGuide

Install and configure the corresponding libraries

To avoid unnecessary trouble, please use a virtual environment:

miniconda3, the minimal installation version of conda, of course, you can also directly use Anaconda.
uv, a very fast package installer and resolver built with Rust.

conda

conda create -n rag python=3.12
conda activate rag
pip install --upgrade pdfdeal graphrag

uv venv
source .venv/bin/activate # For Linux
source .venv/Scripts/activate # For Windows
uv pip install --upgrade graphrag pdfdeal

Create two folders to store the PDFs before processing and the txt files after processing:

mkdir ./pdf
mkdir -p ./ragtest/input

Put the PDFs to be processed into the pdf folder, here using graphrag's own paper and it's references.

Go to Doc2X, click on identity information, and copy your identity token as a key.

Use pdfdeal's CLI tool doc2x for batch processing, please add the long flag --graphrag to enable special adaptation for graphrag:

doc2x -k "Your Key Here" -o ./ragtest/input --graphrag ./pdf

Wait for it to complete processing:

python -m graphrag.index --init --root ./ragtest

Modify settings.yaml and .env files, then build:

python -m graphrag.index --root ./ragtest

After building is complete, you can start asking questions to graphrag using different answering strategies:

global

python -m graphrag.query \
--root ./ragtest \
--method global \
"Q"

local

python -m graphrag.query \
--root ./ragtest \
--method local \
"Q"