# kb-ragflow

**Repository Path**: salierime/kb-ragflow

## Basic Information

- **Project Name**: kb-ragflow
- **Description**: Modified version of https://github.com/infiniflow/ragflow
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 3
- **Forks**: 0
- **Created**: 2025-03-23
- **Last Updated**: 2025-05-14

## Categories & Tags

**Categories**: Uncategorized

**Tags**: AI

## README

# 1. Official ragflow project

- Upstream repository: https://github.com/infiniflow/ragflow
- ragflow version this fork is adapted to: v0.17.2
- Local environment: Windows 11, AMD GPU

# 2. Setting up the local environment

```shell
# Install pipx from the Aliyun PyPI mirror
pip install pipx -i https://mirrors.aliyun.com/pypi/simple
# Install uv through pipx
pipx install uv
# Create the virtual environment with Python 3.10 and install all dependencies, including extras
uv sync --python 3.10 --all-extras
# Download Hugging Face models through the hf-mirror.com mirror
# (`export` is bash syntax, e.g. Git Bash or WSL; in PowerShell use: $env:HF_ENDPOINT = "https://hf-mirror.com")
export HF_ENDPOINT=https://hf-mirror.com
```

# 3. Using deepdoc

## 3.1. OCR recognition

Step 1: Put the images/PDFs to be recognized into `deepdoc/sali-input/ocr_test`.

Step 2: Run the following command:

```shell
# Parameters:
#   --inputs INPUTS          ---> Directory containing the images or PDFs, or the path to a single image or PDF
#   --output_dir OUTPUT_DIR  ---> Directory where the output images are stored. Default: './ocr_outputs'
python deepdoc/vision/t_ocr.py --inputs deepdoc/sali-input/ocr_test --output_dir deepdoc/sali-output/ocr_test
```

Step 3: `deepdoc/sali-output/ocr_test` then contains:

1. the source files annotated with the recognition results
2. the recognized text

## 3.2. Layout recognition

Step 1: Put the images/PDFs to be recognized into `deepdoc/sali-input/layout_test`.

Step 2: Run the following command:

```shell
# Parameters:
#   --inputs INPUTS          ---> Directory containing the images or PDFs, or the path to a single image or PDF
#   --output_dir OUTPUT_DIR  ---> Directory where the output images are stored. Default: './layouts_outputs'
#   --threshold THRESHOLD    ---> Detections scoring below this threshold are filtered out. Default: 0.5
#   --mode MODE              ---> Task: layout (layout recognition) or tsr (table structure recognition). Default: layout
python deepdoc/vision/t_recognizer.py --inputs=deepdoc/sali-input/layout_test --threshold=0.2 --mode=layout --output_dir=deepdoc/sali-output/layout_test
```

Step 3: `deepdoc/sali-output/layout_test` then contains:

1. the source files annotated with the layout recognition results

## 3.3. TSR (table structure recognition)

Step 1: Download the required resources:

```shell
python deepdoc/download_for_tsr.py
```

Step 2: Put the images/PDFs to be recognized into `deepdoc/sali-input/tsr_test`.

Step 3: Run the following command:

```shell
python deepdoc/vision/t_recognizer.py --inputs=deepdoc/sali-input/tsr_test --threshold=0.2 --mode=tsr --output_dir=deepdoc/sali-output/tsr_test
```

Step 4: `deepdoc/sali-output/tsr_test` then contains:

1. the source files annotated with the recognition results
2. the recognized text
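The three deepdoc tests above can be chained in a single script. The sketch below is a hypothetical helper, not part of this repository: it assumes bash (e.g. Git Bash or WSL on Windows 11), that it is run from the repository root inside the environment created by `uv sync`, and it only reuses the commands and flags shown in the steps above. Tests whose input folder is still empty are skipped, and `DRY_RUN=1` prints the commands without executing them.

```shell
#!/usr/bin/env bash
# Hypothetical driver script (not part of kb-ragflow): runs the OCR, layout
# and TSR tests in one go. Run it from the repository root.
# Set DRY_RUN=1 to print the commands without executing them.
set -euo pipefail

IN=deepdoc/sali-input
OUT=deepdoc/sali-output

# Print a command, then execute it unless DRY_RUN=1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

# Succeed if directory $1 exists and contains at least one entry.
has_files() {
  [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Create the input/output folders used by the three tests.
for t in ocr_test layout_test tsr_test; do
  mkdir -p "$IN/$t" "$OUT/$t"
done

# OCR recognition
if has_files "$IN/ocr_test"; then
  run python deepdoc/vision/t_ocr.py --inputs="$IN/ocr_test" --output_dir="$OUT/ocr_test"
else
  echo "skip OCR: put images/PDFs into $IN/ocr_test first"
fi

# Layout recognition
if has_files "$IN/layout_test"; then
  run python deepdoc/vision/t_recognizer.py --inputs="$IN/layout_test" \
    --threshold=0.2 --mode=layout --output_dir="$OUT/layout_test"
else
  echo "skip layout: put images/PDFs into $IN/layout_test first"
fi

# TSR: download the extra resources first, then recognize table structure
if has_files "$IN/tsr_test"; then
  run python deepdoc/download_for_tsr.py
  run python deepdoc/vision/t_recognizer.py --inputs="$IN/tsr_test" \
    --threshold=0.2 --mode=tsr --output_dir="$OUT/tsr_test"
else
  echo "skip TSR: put images/PDFs into $IN/tsr_test first"
fi
```

For example, saved as `run_deepdoc_tests.sh` (a hypothetical name), `DRY_RUN=1 bash run_deepdoc_tests.sh` lets you check which commands would run before starting the actual recognition.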