# kb-ragflow

**Repository Path**: salierime/kb-ragflow

## Basic Information

- **Project Name**: kb-ragflow
- **Description**: Modified fork of https://github.com/infiniflow/ragflow
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 3
- **Forks**: 0
- **Created**: 2025-03-23
- **Last Updated**: 2025-05-14

## Categories & Tags

**Categories**: Uncategorized

**Tags**: AI

## README

# 1. RAGFlow official repository
- Upstream repository: https://github.com/infiniflow/ragflow
- RAGFlow version this fork is adapted to: v0.17.2
- Local environment: Windows 11, AMD GPU

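# 2. Local environment setup
```shell
# Install pipx from the Aliyun PyPI mirror, then use it to install uv
pip install pipx -i https://mirrors.aliyun.com/pypi/simple
pipx install uv
# Create ./.venv with Python 3.10 and install all optional dependencies (extras)
uv sync --python 3.10 --all-extras
# Download Hugging Face models through the hf-mirror.com mirror (PowerShell: $env:HF_ENDPOINT = "https://hf-mirror.com")
export HF_ENDPOINT=https://hf-mirror.com
```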
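The deepdoc scripts in section 3 need the dependencies that `uv sync` installed into the project virtual environment (by default `./.venv`). uv puts the activation script under `bin/` on Linux/macOS and under `Scripts/` on Windows. The following is a minimal sketch, assuming a bash-compatible shell (for example Git Bash on Windows 11) and the default `.venv` location; `venv_activate_path` is a hypothetical helper, not part of this repository.

```shell
# Hypothetical helper: print the activation script of the venv created by `uv sync`.
# Assumes bash and the default ./.venv location unless a directory is passed in.
venv_activate_path() {
  local venv_dir="${1:-.venv}"
  if [ -f "$venv_dir/bin/activate" ]; then
    echo "$venv_dir/bin/activate"        # Linux/macOS layout
  elif [ -f "$venv_dir/Scripts/activate" ]; then
    echo "$venv_dir/Scripts/activate"    # Windows layout (usable from Git Bash)
  else
    echo "no virtual environment found in $venv_dir; run 'uv sync' first" >&2
    return 1
  fi
}
```

Usage: `p=$(venv_activate_path) && . "$p"`, then run the `python deepdoc/...` commands below in the same shell.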

# 3. Using deepdoc
## 3.1. OCR
Step 1: Put the images/PDFs to be recognized in deepdoc/sali-input/ocr_test  
Step 2: Run the following command:
```shell
# Parameters:
# --inputs INPUTS ---> Directory containing the images or PDFs, or the path to a single image or PDF
# --output_dir OUTPUT_DIR ---> Directory where the output images are stored. Default: './ocr_outputs'
python deepdoc/vision/t_ocr.py --inputs deepdoc/sali-input/ocr_test --output_dir deepdoc/sali-output/ocr_test
```
Step 3: The results are written to deepdoc/sali-output/ocr_test: (1) the source files annotated with the recognition boxes, and (2) the recognized text.
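Step 1 of each deepdoc task only requires the files to be present in the matching `sali-input` directory. If your documents live elsewhere, a sketch like the following can copy them in. `stage_inputs` is a hypothetical helper, and the extension list (PDF plus common image formats) is an assumption about what you want to process, not a list taken from the deepdoc scripts.

```shell
# Hypothetical helper: copy PDFs and images from SRC into a deepdoc input directory.
# The accepted extensions are an assumption; adjust them to your documents.
stage_inputs() {
  local src="$1" dest="$2" count=0 f name ext
  mkdir -p "$dest" || return 1
  for f in "$src"/*; do
    [ -f "$f" ] || continue                        # skip subdirectories and an unmatched glob
    name="${f##*/}"
    case "$name" in *.*) ;; *) continue ;; esac    # skip files without an extension
    ext=$(printf '%s' "${name##*.}" | tr '[:upper:]' '[:lower:]')
    case "$ext" in
      pdf|png|jpg|jpeg|bmp|tif|tiff)
        cp "$f" "$dest/" && count=$((count + 1))
        ;;
    esac
  done
  echo "$count"                                    # number of files copied
}
```

Usage: `stage_inputs ~/scans deepdoc/sali-input/ocr_test`; the same call works for `layout_test` and `tsr_test`.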

## 3.2. Layout recognition
Step 1: Put the images/PDFs to be recognized in deepdoc/sali-input/layout_test  
Step 2: Run the following command:
```shell
# Parameters:
# --inputs INPUTS ---> Directory containing the images or PDFs, or the path to a single image or PDF
# --output_dir OUTPUT_DIR ---> Directory where the output images are stored. Default: './layouts_outputs'
# --threshold THRESHOLD ---> Detections scoring below this threshold are filtered out. Default: 0.5
# --mode {layout,tsr} ---> Task: layout recognition or table structure recognition. Default: layout
python deepdoc/vision/t_recognizer.py --inputs=deepdoc/sali-input/layout_test --threshold=0.2 --mode=layout --output_dir=deepdoc/sali-output/layout_test
```
Step 3: The results are written to deepdoc/sali-output/layout_test: the source files annotated with the recognized layout regions.
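`--threshold` filters out detections below the given score (default 0.5; the command above lowers it to 0.2). One way to choose a value is to run the same inputs at several thresholds and compare the annotated images. A minimal sketch, assuming bash; `layout_threshold_sweep` and the `PYTHON` override variable are hypothetical, and each run writes to its own `thr_<value>` subdirectory.

```shell
# Hypothetical helper: run layout recognition once per threshold so the outputs can be compared.
# PYTHON defaults to "python"; set PYTHON=echo to print the commands instead of running them.
layout_threshold_sweep() {
  if [ "$#" -lt 3 ]; then
    echo "usage: layout_threshold_sweep INPUT_DIR OUTPUT_ROOT THRESHOLD..." >&2
    return 2
  fi
  local input_dir="$1" out_root="$2" t out_dir
  shift 2
  for t in "$@"; do
    out_dir="$out_root/thr_$t"
    mkdir -p "$out_dir" || return 1
    "${PYTHON:-python}" deepdoc/vision/t_recognizer.py \
      --inputs="$input_dir" --threshold="$t" --mode=layout \
      --output_dir="$out_dir" || return 1
  done
}
```

Usage: `layout_threshold_sweep deepdoc/sali-input/layout_test deepdoc/sali-output/layout_sweep 0.2 0.5 0.7`.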

## 3.3. TSR (table structure recognition)
Step 1: Run the following command to download the required resources:
```shell
python deepdoc/download_for_tsr.py
```
Step 2: Put the images/PDFs to be recognized in deepdoc/sali-input/tsr_test  
Step 3: Run the following command:
```shell
# Same script and parameters as 3.2; --mode=tsr selects table structure recognition
python deepdoc/vision/t_recognizer.py --inputs=deepdoc/sali-input/tsr_test --threshold=0.2 --mode=tsr --output_dir=deepdoc/sali-output/tsr_test
```
Step 4: The results are written to deepdoc/sali-output/tsr_test: (1) the source files annotated with the recognition boxes, and (2) the recognized text content.
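Sections 3.1 to 3.3 follow one pattern: `ocr` uses `t_ocr.py`, while `layout` and `tsr` share `t_recognizer.py` and differ only in `--mode`. A hypothetical wrapper that derives the `sali-input`/`sali-output` paths from the mode could look like this. `run_deepdoc`, `PYTHON`, and `THRESHOLD` are assumptions of this sketch (the threshold defaults to the 0.2 used above), and for `tsr` the download step from 3.3 still has to run once beforehand.

```shell
# Hypothetical wrapper: run one deepdoc task using this fork's sali-input/sali-output layout.
# PYTHON defaults to "python" and THRESHOLD to 0.2; both are assumptions of this sketch.
run_deepdoc() {
  local mode="$1"
  local in_dir="deepdoc/sali-input/${mode}_test"
  local out_dir="deepdoc/sali-output/${mode}_test"
  local py="${PYTHON:-python}"
  case "$mode" in
    ocr|layout|tsr) ;;
    *)
      echo "unknown mode '$mode' (expected: ocr, layout, tsr)" >&2
      return 2
      ;;
  esac
  mkdir -p "$out_dir" || return 1
  if [ "$mode" = "ocr" ]; then
    # t_ocr.py only takes --inputs and --output_dir
    "$py" deepdoc/vision/t_ocr.py --inputs="$in_dir" --output_dir="$out_dir"
  else
    # layout and tsr share t_recognizer.py and differ only in --mode
    "$py" deepdoc/vision/t_recognizer.py --inputs="$in_dir" \
      --threshold="${THRESHOLD:-0.2}" --mode="$mode" --output_dir="$out_dir"
  fi
}
```

Usage: `for m in ocr layout tsr; do run_deepdoc "$m"; done` from the repository root; prefixing a call with `PYTHON=echo` prints the command without running it.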