# F5_TTS_Faster

## F5-TTS TensorRT-LLM Faster

Inference acceleration for F5-TTS. Test setup:

+ `NVIDIA GeForce RTX 3090`
+ Test text: `这点请您放心,估计是我的号码被标记了,请问您是沈沈吗?`

In this test, inference time drops from `3.2s` to `0.72s`, roughly a 4x speedup!

The basic approach is to first export `F5-TTS` to `ONNX`, then use `TensorRT-LLM` to accelerate the `Transformer` part of the model.

Special thanks to the contributions of these two open-source projects:

+ https://github.com/DakeQQ/F5-TTS-ONNX
+ https://github.com/Bigfishering/f5-tts-trtllm

See the References section at the end for the full list.

**Model weights**: https://huggingface.co/wgs/F5-TTS-Faster

> Building the project takes about **3h**; running the build inside tmux is recommended.

## Install

```shell
conda create -n f5_tts_faster python=3.10 -y
source activate f5_tts_faster
```

```shell
conda install pytorch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 pytorch-cuda=12.4 -c pytorch -c nvidia -y
```

### F5-TTS Environment

```shell
# huggingface-cli download --resume-download SWivid/F5-TTS --local-dir ./F5-TTS/ckpts
git clone https://github.com/SWivid/F5-TTS.git  # 0.3.4
cd F5-TTS
pip install -e . -i https://pypi.tuna.tsinghua.edu.cn/simple
```

```shell
# Patch the source so the vocoder is loaded from a local path
vim /home/wangguisen/miniconda3/envs/f5_tts_faster/lib/python3.10/site-packages/f5_tts/infer/infer_cli.py
# (for a source install, the file is F5-TTS/src/f5_tts/infer/infer_cli.py)

# Around line 124, comment out vocoder_local_path = "../checkpoints/vocos-mel-24khz" and set it to the local path:
# vocoder_local_path = "/home/wangguisen/projects/tts/f5tts_faster/ckpts/vocos-mel-24khz"
```
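If you prefer not to edit the file by hand, the same `vocoder_local_path` change can be scripted. A minimal sketch, assuming a source install and that `infer_cli.py` still contains the exact line quoted above:

```python
# Hedged sketch: apply the vocoder_local_path edit without opening vim.
from pathlib import Path

path = Path("./F5-TTS/src/f5_tts/infer/infer_cli.py")  # adjust for a pip install
old = 'vocoder_local_path = "../checkpoints/vocos-mel-24khz"'
new = 'vocoder_local_path = "/home/wangguisen/projects/tts/f5tts_faster/ckpts/vocos-mel-24khz"'

text = path.read_text(encoding="utf-8")
assert old in text, "expected line not found; patch the file manually"
path.write_text(text.replace(old, new), encoding="utf-8")
print(f"patched {path}")
```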
Run F5-TTS inference:

```shell
f5-tts_infer-cli \
    --model "F5-TTS" \
    --ref_audio "./assets/wgs-f5tts_mono.wav" \
    --ref_text "那到时候再给你打电话,麻烦你注意接听。" \
    --gen_text "这点请您放心,估计是我的号码被标记了,请问您是沈沈吗?" \
    --vocoder_name "vocos" \
    --load_vocoder_from_local \
    --ckpt_file "./ckpts/F5TTS_Base/model_1200000.pt" \
    --speed 1.2 \
    --output_dir "./output/" \
    --output_file "f5tts_wgs_out.wav"
```

### F5-TTS-Faster Environment

```shell
conda install -c conda-forge ffmpeg cmake openmpi -y

# Set the C/C++ compilers for OpenMPI; make sure gcc is installed
# conda install -c conda-forge compilers
# which gcc
# gcc --version
export OMPI_CC=$(which gcc)
export OMPI_CXX=$(which g++)

pip install -r ./requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
```

```python
# Check whether onnxruntime supports CUDA
import onnxruntime as ort
print(ort.get_available_providers())
```

### TensorRT-LLM Environment

```shell
sudo apt-get -y install libopenmpi-dev
pip install tensorrt_llm==0.15.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
```

Verify that the TensorRT-LLM installation works:

```shell
python -c "import tensorrt_llm"
python -c "import tensorrt_llm.bindings"
```

## Export to ONNX

Export `F5-TTS` to `ONNX`:

```shell
python ./export_onnx/Export_F5.py
```

```shell
# Run inference with ONNX alone
python ./export_onnx/F5-TTS-ONNX-Inference.py
```

The exported `ONNX` files are laid out as follows:

```shell
./export_onnx/onnx_ckpt/
├── F5_Decode.onnx
├── F5_Preprocess.onnx
└── F5_Transformer.onnx
```

After the ONNX conversion, GPU inference is actually not much faster. If you want to experiment anyway:

```python
# Select the CUDAExecutionProvider explicitly
import onnxruntime as ort

sess_options = ort.SessionOptions()
session = ort.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"], sess_options=sess_options)

# Dynamic quantization:
from onnxruntime.quantization import quantize_dynamic, QuantType
quantize_dynamic("model.onnx", "model_quant.onnx", weight_type=QuantType.QInt8)
```

Note: exporting to ONNX modifies the `F5` and `vocos` source code, so both must be re-downloaded / re-installed afterwards to keep the original F5-TTS usable:

```shell
pip uninstall -y vocos && pip install vocos -i https://pypi.tuna.tsinghua.edu.cn/simple
```

```shell
# ./export_trtllm/origin_f5 holds the original F5 source files
cp ./export_trtllm/origin_f5/modules.py ../F5-TTS/src/f5_tts/model/
cp ./export_trtllm/origin_f5/dit.py ../F5-TTS/src/f5_tts/model/backbones/
cp ./export_trtllm/origin_f5/utils_infer.py ../F5-TTS/src/f5_tts/infer/
```

## Export to TensorRT-LLM

### Source Changes

Once `TensorRT-LLM` is installed, a few files need to be copied into it: `export_trtllm/model` in this repo corresponds to `tensorrt_llm/models` in the TensorRT-LLM source tree.

1. Create an `f5tts` directory under `tensorrt_llm/models` in the installed package, then copy the code from this repo into it.

```shell
# Locate tensorrt_llm
ll /home/wangguisen/miniconda3/envs/f5_tts_faster/lib/python3.10/site-packages | grep tensorrt_llm

# Copy the model code into tensorrt_llm/models
mkdir /home/wangguisen/miniconda3/envs/f5_tts_faster/lib/python3.10/site-packages/tensorrt_llm/models/f5tts
cp -r /home/wangguisen/projects/tts/f5tts_faster/export_trtllm/model/* /home/wangguisen/miniconda3/envs/f5_tts_faster/lib/python3.10/site-packages/tensorrt_llm/models/f5tts
```

+ The `tensorrt_llm/models/f5tts` directory should look like this:

```shell
/home/wangguisen/miniconda3/envs/f5_tts_faster/lib/python3.10/site-packages/tensorrt_llm/models/f5tts/
├── model.py
└── modules.py
```

2. Import `f5tts` in `tensorrt_llm/models/__init__.py`:

```shell
vim /home/wangguisen/miniconda3/envs/f5_tts_faster/lib/python3.10/site-packages/tensorrt_llm/models/__init__.py
```

```python
from .f5tts.model import F5TTS

__all__ = [..., 'F5TTS']

# and register the model in MODEL_MAP:
MODEL_MAP = {..., 'F5TTS': F5TTS}
```
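To confirm the registration took effect, a quick hedged check; it assumes the two edits above were saved (the README states `MODEL_MAP` and the `F5TTS` import both live in `tensorrt_llm/models/__init__.py`):

```python
# Sanity check: F5TTS should be importable and present in the model registry.
from tensorrt_llm.models import MODEL_MAP, F5TTS

assert MODEL_MAP.get("F5TTS") is F5TTS, "F5TTS missing from MODEL_MAP"
print("F5TTS registered:", F5TTS)
```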
### convert_checkpoint

```shell
python ./export_trtllm/convert_checkpoint.py \
    --timm_ckpt "./ckpts/F5TTS_Base/model_1200000.pt" \
    --output_dir "./ckpts/trtllm_ckpt"
    # --dtype float32
```

### build engine

> Tensor parallelism is supported via `--tp_size`.

```shell
trtllm-build --checkpoint_dir ./ckpts/trtllm_ckpt \
             --remove_input_padding disable \
             --bert_attention_plugin disable \
             --output_dir ./ckpts/engine_outputs

# If the build reports a parameter-dtype mismatch: our parameters default to fp16,
# while network parameters default to fp32. Change the default in
# tensorrt_llm/parameter.py to _DEFAULT_DTYPE = trt.DataType.HALF
```

## fast inference

```shell
python ./export_trtllm/sample.py \
    --tllm_model_dir "./ckpts/engine_outputs"
```

A hedged timing harness that compares this against the baseline CLI appears after the References.

## References

+ https://github.com/SWivid/F5-TTS
+ https://github.com/DakeQQ/F5-TTS-ONNX
+ https://github.com/NVIDIA/TensorRT-LLM
+ https://github.com/Bigfishering/f5-tts-trtllm
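As promised in the fast-inference section, a minimal sketch for reproducing the speedup measurement. It is hedged: it times whole processes with `subprocess`, so model/engine load time is included and both numbers will come out higher than the inference-only `3.2s` / `0.72s` figures; all commands and paths are the ones used earlier in this README:

```python
import subprocess
import time

def timed(cmd: list[str]) -> float:
    """Run a command and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start

# Baseline: the f5-tts_infer-cli command from the Install section.
baseline = timed([
    "f5-tts_infer-cli",
    "--model", "F5-TTS",
    "--ref_audio", "./assets/wgs-f5tts_mono.wav",
    "--ref_text", "那到时候再给你打电话,麻烦你注意接听。",
    "--gen_text", "这点请您放心,估计是我的号码被标记了,请问您是沈沈吗?",
    "--ckpt_file", "./ckpts/F5TTS_Base/model_1200000.pt",
    "--output_dir", "./output/",
    "--output_file", "f5tts_wgs_out.wav",
])

# Accelerated: the TensorRT-LLM engine via sample.py.
faster = timed([
    "python", "./export_trtllm/sample.py",
    "--tllm_model_dir", "./ckpts/engine_outputs",
])

print(f"baseline: {baseline:.2f}s  trtllm: {faster:.2f}s  speedup: {baseline / faster:.1f}x")
```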