@@ -4,7 +4,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "## 命令行推理"
+ "# Fish Speech"
  ]
  },
  {
@@ -48,65 +48,57 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "## API Client\n",
- "\n",
- "需要在终端开启API Server\n",
- "\n",
- "> 音频用本地路径\n",
- "\n",
- "> 文本可以直接用路径,也可以用内容"
+ "### Prepare Model"
  ]
  },
  {
  "cell_type": "code",
  "execution_count": null,
- "metadata": {
- "vscode": {
- "languageId": "shellscript"
- }
- },
+ "metadata": {},
  "outputs": [],
  "source": [
- "!python -m tools.post_api \\\n",
- " --text \"Hello everyone, I am an open-source text-to-speech model developed by Fish Audio.\" \\\n",
- " --reference_audio \"D:\\PythonProject\\原神语音中文\\胡桃\\vo_hutao_draw_appear.wav\" \\\n",
- " --reference_text \"D:\\PythonProject\\原神语音中文\\胡桃\\vo_hutao_draw_appear.lab\" \\\n",
- " --streaming True"
+ "# For Chinese users, you probably want to use mirror to accelerate downloading\n",
+ "# !set HF_ENDPOINT=https://hf-mirror.com\n",
+ "# !export HF_ENDPOINT=https://hf-mirror.com \n",
+ "\n",
+ "!huggingface-cli download fishaudio/fish-speech-1.2-sft --local-dir checkpoints/fish-speech-1.2-sft/"
  ]
  },
  {
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "## For Test"
+ "## WebUI Inference\n",
+ "\n",
+ "> You can use --compile to fuse CUDA kernels for faster inference (10x)."
  ]
  },
  {
- "cell_type": "markdown",
+ "cell_type": "code",
+ "execution_count": null,
  "metadata": {},
+ "outputs": [],
  "source": [
- "### 0. 下载模型"
+ "!python tools/webui.py \\\n",
+ " --llama-checkpoint-path checkpoints/fish-speech-1.2-sft \\\n",
+ " --decoder-checkpoint-path checkpoints/fish-speech-1.2-sft/firefly-gan-vq-fsq-4x1024-42hz-generator.pth \\\n",
+ " # --compile"
  ]
  },
  {
- "cell_type": "code",
- "execution_count": null,
+ "cell_type": "markdown",
  "metadata": {},
- "outputs": [],
  "source": [
- "!set HF_ENDPOINT=https://hf-mirror.com\n",
- "# !export HF_ENDPOINT=https://hf-mirror.com\n",
- "!huggingface-cli download fishaudio/fish-speech-1.2 --local-dir checkpoints/fish-speech-1.2/"
+ "## Break-down CLI Inference"
  ]
  },
  {
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "### 1. 从语音生成 prompt:\n",
- "> 如果你打算让模型随机选择音色, 你可以跳过这一步.\n",
+ "### 1. Encode reference audio\n",
  "\n",
- "你应该能得到一个 `fake.npy` 文件."
+ "You should get a `fake.npy` by doing this."
  ]
  },
  {
@@ -119,12 +111,12 @@
  },
  "outputs": [],
  "source": [
- "## 在此输入你的语音路径:\n",
- "src_audio = r\"D:\\PythonProject\\原神语音中文\\胡桃\\vo_hutao_draw_appear.wav\"\n",
+ "## Enter the path to the audio file here\n",
+ "src_audio = r\"D:\\PythonProject\\\\vo_hutao_draw_appear.wav\"\n",
  "\n",
  "!python tools/vqgan/inference.py \\\n",
  " -i {src_audio} \\\n",
- " --checkpoint-path \"checkpoints/fish-speech-1.2/firefly-gan-vq-fsq-4x1024-42hz-generator.pth\"\n",
+ " --checkpoint-path \"checkpoints/fish-speech-1.2-sft/firefly-gan-vq-fsq-4x1024-42hz-generator.pth\"\n",
  "\n",
  "from IPython.display import Audio, display\n",
  "audio = Audio(filename=\"fake.wav\")\n",
@@ -135,10 +127,10 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "### 2. 从文本生成语义 token:\n",
- "> 该命令会在工作目录下创建 codes_N 文件, 其中 N 是从 0 开始的整数.\n",
+ "### 2. Generate semantic tokens from text:\n",
+ "> This command will create codes_N files in the working directory, where N is an integer starting from 0.\n",
  "\n",
- "> 您可以使用 --compile 来融合 cuda 内核以实现更快的推理"
+ "> You can use --compile to fuse CUDA kernels for faster inference."
  ]
  },
  {
@@ -152,10 +144,10 @@
  "outputs": [],
  "source": [
  "!python tools/llama/generate.py \\\n",
- " --text \"人间灯火倒映湖中,她的渴望让静水泛起涟漪。若代价只是孤独,那就让这份愿望肆意流淌。流入她所注视的世间,也流入她如湖水般澄澈的目光。\" \\\n",
- " --prompt-text \"唷,找本堂主有何贵干呀?嗯?你不知道吗,往生堂第七十七代堂主就是胡桃我啦!嘶,不过瞧你的模样,容光焕发,身体健康,嗯…想必是为了工作以外的事来找我,对吧?\" \\\n",
+ " --text \"hello world\" \\\n",
+ " --prompt-text \"The text corresponding to reference audio\" \\\n",
  " --prompt-tokens \"fake.npy\" \\\n",
- " --checkpoint-path \"checkpoints/fish-speech-1.2\" \\\n",
+ " --checkpoint-path \"checkpoints/fish-speech-1.2-sft\" \\\n",
  " --num-samples 2\n",
  " # --compile"
  ]
@@ -164,7 +156,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "### 3. 从语义 token 生成人声:"
+ "### 3. Generate speech from semantic tokens:"
  ]
  },
  {
@@ -179,7 +171,7 @@
  "source": [
  "!python tools/vqgan/inference.py \\\n",
  " -i \"codes_0.npy\" \\\n",
- " --checkpoint-path \"checkpoints/fish-speech-1.2/firefly-gan-vq-fsq-4x1024-42hz-generator.pth\"\n",
+ " --checkpoint-path \"checkpoints/fish-speech-1.2-sft/firefly-gan-vq-fsq-4x1024-42hz-generator.pth\"\n",
  "\n",
  "from IPython.display import Audio, display\n",
  "audio = Audio(filename=\"fake.wav\")\n",