|
|
@@ -0,0 +1,211 @@
|
|
|
+{
|
|
|
+ "cells": [
|
|
|
+ {
|
|
|
+ "cell_type": "markdown",
|
|
|
+ "metadata": {},
|
|
|
+ "source": [
|
|
|
+ "## Command-Line Inference"
|
|
|
+ ]
|
|
|
+ },
|
|
|
+ {
|
|
|
+ "cell_type": "markdown",
|
|
|
+ "metadata": {},
|
|
|
+ "source": [
|
|
|
+ "### For Windows"
|
|
|
+ ]
|
|
|
+ },
|
|
|
+ {
|
|
|
+ "cell_type": "code",
|
|
|
+ "execution_count": null,
|
|
|
+ "metadata": {
|
|
|
+ "vscode": {
|
|
|
+ "languageId": "bat"
|
|
|
+ }
|
|
|
+ },
|
|
|
+ "outputs": [],
|
|
|
+ "source": [
|
|
|
+ "!chcp 65001"
|
|
|
+ ]
|
|
|
+ },
|
|
|
+ {
|
|
|
+ "cell_type": "markdown",
|
|
|
+ "metadata": {},
|
|
|
+ "source": [
|
|
|
+ "### For Linux"
|
|
|
+ ]
|
|
|
+ },
|
|
|
+ {
|
|
|
+ "cell_type": "code",
|
|
|
+ "execution_count": null,
|
|
|
+ "metadata": {},
|
|
|
+ "outputs": [],
|
|
|
+ "source": [
|
|
|
+ "import locale\n",
|
|
|
+ "locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')"
|
|
|
+ ]
|
|
|
+ },
|
|
|
+ {
|
|
|
+ "cell_type": "markdown",
|
|
|
+ "metadata": {},
|
|
|
+ "source": [
|
|
|
+ "## API Client\n",
|
|
|
+ "\n",
|
|
|
+ "Start the API server in a terminal before running this cell.\n",
|
|
|
+ "\n",
|
|
|
+ "> For the reference audio, pass a local file path.\n",
|
|
|
+ "\n",
|
|
|
+ "> For the reference text, pass either a file path or the text itself."
|
|
|
+ ]
|
|
|
+ },
|
|
|
+ {
|
|
|
+ "cell_type": "code",
|
|
|
+ "execution_count": null,
|
|
|
+ "metadata": {
|
|
|
+ "vscode": {
|
|
|
+ "languageId": "shellscript"
|
|
|
+ }
|
|
|
+ },
|
|
|
+ "outputs": [],
|
|
|
+ "source": [
|
|
|
+ "!python -m tools.post_api \\\n",
|
|
|
+ " --text \"Hello everyone, I am an open-source text-to-speech model developed by Fish Audio.\" \\\n",
|
|
|
+ " --reference_audio \"D:\\PythonProject\\原神语音中文\\胡桃\\vo_hutao_draw_appear.wav\" \\\n",
|
|
|
+ " --reference_text \"D:\\PythonProject\\原神语音中文\\胡桃\\vo_hutao_draw_appear.lab\" \\\n",
|
|
|
+ " --streaming True"
|
|
|
+ ]
|
|
|
+ },
|
|
|
+ {
|
|
|
+ "cell_type": "markdown",
|
|
|
+ "metadata": {},
|
|
|
+ "source": [
|
|
|
+ "## For Test"
|
|
|
+ ]
|
|
|
+ },
|
|
|
+ {
|
|
|
+ "cell_type": "markdown",
|
|
|
+ "metadata": {},
|
|
|
+ "source": [
|
|
|
+ "### 0. Download the model"
|
|
|
+ ]
|
|
|
+ },
|
|
|
+ {
|
|
|
+ "cell_type": "code",
|
|
|
+ "execution_count": null,
|
|
|
+ "metadata": {},
|
|
|
+ "outputs": [],
|
|
|
+ "source": [
|
|
|
+ "%env HF_ENDPOINT=https://hf-mirror.com\n",
|
|
|
+ "# %env persists for later ! commands; !set / !export would only affect a throwaway subshell.\n",
|
|
|
+ "!huggingface-cli download fishaudio/fish-speech-1.2 --local-dir checkpoints/fish-speech-1.2/"
|
|
|
+ ]
|
|
|
+ },
|
|
|
+ {
|
|
|
+ "cell_type": "markdown",
|
|
|
+ "metadata": {},
|
|
|
+ "source": [
|
|
|
+ "### 1. Generate a prompt from audio\n",
|
|
|
+ "> If you want the model to pick a voice at random, you can skip this step.\n",
|
|
|
+ "\n",
|
|
|
+ "You should end up with a `fake.npy` file."
|
|
|
+ ]
|
|
|
+ },
|
|
|
+ {
|
|
|
+ "cell_type": "code",
|
|
|
+ "execution_count": null,
|
|
|
+ "metadata": {
|
|
|
+ "vscode": {
|
|
|
+ "languageId": "shellscript"
|
|
|
+ }
|
|
|
+ },
|
|
|
+ "outputs": [],
|
|
|
+ "source": [
|
|
|
+ "# Enter the path to your audio file here:\n",
|
|
|
+ "src_audio = r\"D:\\PythonProject\\原神语音中文\\胡桃\\vo_hutao_draw_appear.wav\"\n",
|
|
|
+ "\n",
|
|
|
+ "!python tools/vqgan/inference.py \\\n",
|
|
|
+ " -i \"{src_audio}\" \\\n",
|
|
|
+ " --checkpoint-path \"checkpoints/fish-speech-1.2/firefly-gan-vq-fsq-4x1024-42hz-generator.pth\"\n",
|
|
|
+ "\n",
|
|
|
+ "from IPython.display import Audio, display\n",
|
|
|
+ "audio = Audio(filename=\"fake.wav\")\n",
|
|
|
+ "display(audio)"
|
|
|
+ ]
|
|
|
+ },
|
|
|
+ {
|
|
|
+ "cell_type": "markdown",
|
|
|
+ "metadata": {},
|
|
|
+ "source": [
|
|
|
+ "### 2. Generate semantic tokens from text\n",
|
|
|
+ "> This command creates `codes_N` files in the working directory, where N is an integer starting from 0.\n",
|
|
|
+ "\n",
|
|
|
+ "> You can pass `--compile` to fuse CUDA kernels for faster inference."
|
|
|
+ ]
|
|
|
+ },
|
|
|
+ {
|
|
|
+ "cell_type": "code",
|
|
|
+ "execution_count": null,
|
|
|
+ "metadata": {
|
|
|
+ "vscode": {
|
|
|
+ "languageId": "shellscript"
|
|
|
+ }
|
|
|
+ },
|
|
|
+ "outputs": [],
|
|
|
+ "source": [
|
|
|
+ "!python tools/llama/generate.py \\\n",
|
|
|
+ " --text \"人间灯火倒映湖中,她的渴望让静水泛起涟漪。若代价只是孤独,那就让这份愿望肆意流淌。流入她所注视的世间,也流入她如湖水般澄澈的目光。\" \\\n",
|
|
|
+ " --prompt-text \"唷,找本堂主有何贵干呀?嗯?你不知道吗,往生堂第七十七代堂主就是胡桃我啦!嘶,不过瞧你的模样,容光焕发,身体健康,嗯…想必是为了工作以外的事来找我,对吧?\" \\\n",
|
|
|
+ " --prompt-tokens \"fake.npy\" \\\n",
|
|
|
+ " --checkpoint-path \"checkpoints/fish-speech-1.2\" \\\n",
|
|
|
+ " --num-samples 2\n",
|
|
|
+ " # --compile"
|
|
|
+ ]
|
|
|
+ },
|
|
|
+ {
|
|
|
+ "cell_type": "markdown",
|
|
|
+ "metadata": {},
|
|
|
+ "source": [
|
|
|
+ "### 3. Generate speech from semantic tokens"
|
|
|
+ ]
|
|
|
+ },
|
|
|
+ {
|
|
|
+ "cell_type": "code",
|
|
|
+ "execution_count": null,
|
|
|
+ "metadata": {
|
|
|
+ "vscode": {
|
|
|
+ "languageId": "shellscript"
|
|
|
+ }
|
|
|
+ },
|
|
|
+ "outputs": [],
|
|
|
+ "source": [
|
|
|
+ "!python tools/vqgan/inference.py \\\n",
|
|
|
+ " -i \"codes_0.npy\" \\\n",
|
|
|
+ " --checkpoint-path \"checkpoints/fish-speech-1.2/firefly-gan-vq-fsq-4x1024-42hz-generator.pth\"\n",
|
|
|
+ "\n",
|
|
|
+ "from IPython.display import Audio, display\n",
|
|
|
+ "audio = Audio(filename=\"fake.wav\")\n",
|
|
|
+ "display(audio)"
|
|
|
+ ]
|
|
|
+ }
|
|
|
+ ],
|
|
|
+ "metadata": {
|
|
|
+ "kernelspec": {
|
|
|
+ "display_name": "Python 3",
|
|
|
+ "language": "python",
|
|
|
+ "name": "python3"
|
|
|
+ },
|
|
|
+ "language_info": {
|
|
|
+ "codemirror_mode": {
|
|
|
+ "name": "ipython",
|
|
|
+ "version": 3
|
|
|
+ },
|
|
|
+ "file_extension": ".py",
|
|
|
+ "mimetype": "text/x-python",
|
|
|
+ "name": "python",
|
|
|
+ "nbconvert_exporter": "python",
|
|
|
+ "pygments_lexer": "ipython3",
|
|
|
+ "version": "3.10.14"
|
|
|
+ }
|
|
|
+ },
|
|
|
+ "nbformat": 4,
|
|
|
+ "nbformat_minor": 2
|
|
|
+}
|