IndexTTS2

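IndexTTS2 is a text-to-speech model open-sourced by Bilibili. It models timbre and emotion as separate, decoupled factors: besides the usual single reference audio, you can supply a timbre reference and an emotion reference independently, which gives more flexible and finer-grained control over the synthesized speech (cite: bilibili).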

My machine is fairly old: Ryzen 3700X + RTX 2070, 16 GB of RAM, Windows 11.

Installing the project

git lfs install

git clone https://github.com/index-tts/index-tts.git
cd index-tts
git lfs pull
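
If git lfs pull is interrupted, some large files can remain as small text stubs instead of real data. The following is a minimal Python sketch for spotting them, assuming it is run from inside the cloned repository. The function name find_lfs_pointers and the 1024-byte size cutoff are my own choices for illustration and are not part of the project; the only fact it relies on is that a Git LFS pointer file starts with the line "version https://git-lfs.github.com/spec/v1".

```python
from pathlib import Path

# Every Git LFS pointer file begins with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(root: Path) -> list[Path]:
    """Return files under root that still look like Git LFS pointer stubs."""
    pointers = []
    for path in root.rglob("*"):
        # Skip Git's own metadata and anything that is not a regular file.
        if ".git" in path.parts or not path.is_file():
            continue
        # Pointer stubs are tiny text files; 1024 bytes is an assumed cutoff.
        if path.stat().st_size > 1024:
            continue
        with path.open("rb") as fh:
            if fh.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX:
                pointers.append(path)
    return sorted(pointers)
```

Calling find_lfs_pointers(Path(".")) in the repository root and getting an empty list suggests every LFS file was fetched; any path it returns is a candidate for running git lfs pull again.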

For background on uv, see this post. The command below installs it on Windows.

powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

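There are two optional extras here, webui and deepspeed. According to other guides, deepspeed is mainly usable on RTX 50-series cards, so I only install webui (choose --all-extras, --extra webui, or --extra deepspeed as needed).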

uv sync --extra webui --default-index "https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple"

I download the model from ModelScope, which is faster for me:

uv tool install "modelscope"
modelscope download --model IndexTeam/IndexTTS-2 --local_dir checkpoints

uv run tools/gpu_check.py
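
For a quick second opinion on what the GPU check reports, here is a small Python sketch that asks PyTorch directly. It is not what tools/gpu_check.py does internally (I have not reproduced that script); describe_gpu is a hypothetical helper, and it assumes PyTorch is importable in the project environment, falling back to a message if it is not.

```python
def describe_gpu() -> str:
    """Summarize the first CUDA device PyTorch sees, or say why there is none."""
    try:
        import torch  # assumed to be present in the project environment
    except ImportError:
        return "PyTorch is not importable in this environment"
    if not torch.cuda.is_available():
        return "PyTorch is installed, but no CUDA device is available"
    props = torch.cuda.get_device_properties(0)
    vram_gib = props.total_memory / 1024**3  # bytes to GiB
    return f"{props.name}: {vram_gib:.1f} GiB VRAM, CUDA {torch.version.cuda}"


print(describe_gpu())
```

Saved as a file in the repository, it can be run with uv run so that it uses the same environment as the project.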

In theory the installation is now complete. Next, start the project:

uv run webui.py

You can choose a custom port for the web UI, and I recommend enabling FP16:

uv run webui.py --port 9999 --fp16
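
Before picking a custom port such as 9999, it can help to confirm that nothing else is already listening on it. This is a minimal sketch using only the Python standard library; is_port_free is a hypothetical helper name, and it only checks the loopback address 127.0.0.1, which is an assumption about where the web UI is reached.

```python
import socket


def is_port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        # connect_ex returns 0 when a listener accepted the connection.
        return s.connect_ex((host, port)) != 0


if __name__ == "__main__":
    print("port 9999 free:", is_port_free(9999))
```

If it prints False, another program already holds the port, so pass a different value to --port.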

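If you hit an SSL error, it is probably caused by your proxy settings. Before launching, exclude the local addresses from the proxy: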

$env:NO_PROXY = "127.0.0.1,localhost"
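
The separator in NO_PROXY matters. The sketch below uses proxy_bypass_environment from Python's standard urllib.request module to show how a comma-separated and a semicolon-separated value are interpreted for a local address. It only illustrates the standard library's matching rules; the HTTP client the web UI actually uses may behave differently, and the wrapper name bypasses_proxy is my own.

```python
from urllib.request import proxy_bypass_environment


def bypasses_proxy(host: str, no_proxy: str) -> bool:
    """Return True if urllib would skip the proxy for host given a NO_PROXY value."""
    # Passing the mapping explicitly avoids reading the real environment.
    return bool(proxy_bypass_environment(host, proxies={"no": no_proxy}))


# Comma-separated entries are split and matched one by one.
print(bypasses_proxy("127.0.0.1:9999", "127.0.0.1,localhost"))
# A semicolon is not a separator here, so the value is treated as one entry.
print(bypasses_proxy("127.0.0.1:9999", "127.0.0.1;localhost"))
```

With the standard library's rules, the first call reports that the proxy is bypassed and the second does not, which is why a comma-separated list is the safer form.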

If you have to go through a proxy, set temporary environment variables in PowerShell (adjust the port to match your setup):

$env:HTTP_PROXY = "http://127.0.0.1:7890"
$env:HTTPS_PROXY = "http://127.0.0.1:7890"