@@ -1,4 +1,5 @@
import io
+import os
import queue
import sys
import traceback
@@ -88,7 +89,8 @@ def load_audio(reference_audio, sr):
        reference_audio = io.BytesIO(audio_data)

    waveform, original_sr = torchaudio.load(
-        reference_audio, backend="ffmpeg" if sys.platform == "linux" else "soundfile"
+        reference_audio,
+        backend="soundfile",  # not every linux release supports 'sox' or 'ffmpeg'
    )

    if waveform.shape[0] > 1:
@@ -166,6 +168,8 @@ def get_content_type(audio_format):
@torch.inference_mode()
def inference(req: ServeTTSRequest):

+    global prompt_tokens, prompt_texts
+
    idstr: str | None = req.reference_id
    if idstr is not None:
        ref_folder = Path("references") / idstr
@@ -173,33 +177,43 @@ def inference(req: ServeTTSRequest):
        ref_audios = list_files(
            ref_folder, AUDIO_EXTENSIONS, recursive=True, sort=False
        )
-        prompt_tokens = [
-            encode_reference(
-                decoder_model=decoder_model,
-                reference_audio=audio_to_bytes(str(ref_audio)),
-                enable_reference_audio=True,
-            )
-            for ref_audio in ref_audios
-        ]
-        prompt_texts = [
-            read_ref_text(str(ref_audio.with_suffix(".lab")))
-            for ref_audio in ref_audios
-        ]
+
+        if req.use_memory_cache == "never" or (
+            req.use_memory_cache == "on-demand" and len(prompt_tokens) == 0
+        ):
+            prompt_tokens = [
+                encode_reference(
+                    decoder_model=decoder_model,
+                    reference_audio=audio_to_bytes(str(ref_audio)),
+                    enable_reference_audio=True,
+                )
+                for ref_audio in ref_audios
+            ]
+            prompt_texts = [
+                read_ref_text(str(ref_audio.with_suffix(".lab")))
+                for ref_audio in ref_audios
+            ]
+        else:
+            logger.info("Use same references")

    else:
        # Parse reference audio aka prompt
        refs = req.references
-        if refs is None:
-            refs = []
-        prompt_tokens = [
-            encode_reference(
-                decoder_model=decoder_model,
-                reference_audio=ref.audio,
-                enable_reference_audio=True,
-            )
-            for ref in refs
-        ]
-        prompt_texts = [ref.text for ref in refs]
+
+        if req.use_memory_cache == "never" or (
+            req.use_memory_cache == "on-demand" and len(prompt_tokens) == 0
+        ):
+            prompt_tokens = [
+                encode_reference(
+                    decoder_model=decoder_model,
+                    reference_audio=ref.audio,
+                    enable_reference_audio=True,
+                )
+                for ref in refs
+            ]
+            prompt_texts = [ref.text for ref in refs]
+        else:
+            logger.info("Use same references")

    # LLAMA Inference
    request = dict(
@@ -397,11 +411,23 @@ app = Kui(
)


-if __name__ == "__main__":
+# Each worker process created by Uvicorn has its own memory space,
+# meaning that models and variables are not shared between processes.
+# Therefore, any global variables (like `llama_queue` or `decoder_model`)
+# will not be shared across workers.

-    import uvicorn

-    args = parse_args()
+# Multi-threading for deep learning can cause issues, such as inconsistent
+# outputs if multiple threads access the same buffers simultaneously.
+# Instead, it's better to use multiprocessing or independent models per thread.
+@app.on_startup
+def initialize_app(app: Kui):
+
+    global args, llama_queue, decoder_model, prompt_tokens, prompt_texts
+
+    prompt_tokens, prompt_texts = [], []
+
+    args = parse_args()  # args are the same as in the other worker processes
    args.precision = torch.half if args.half else torch.bfloat16

    logger.info("Loading Llama model...")
@@ -411,6 +437,7 @@ if __name__ == "__main__":
        precision=args.precision,
        compile=args.compile,
    )
+
    logger.info("Llama model loaded, loading VQ-GAN model...")

    decoder_model = load_decoder_model(
@@ -421,7 +448,7 @@ if __name__ == "__main__":

    logger.info("VQ-GAN model loaded, warming up...")

-    # Dry run to check if the model is loaded correctly and avoid the first-time latency
+    # Dry run to ensure models work and avoid first-time latency
    list(
        inference(
            ServeTTSRequest(
@@ -440,5 +467,18 @@ if __name__ == "__main__":
    )

    logger.info(f"Warming up done, starting server at http://{args.listen}")
+
+
+if __name__ == "__main__":
+
+    import uvicorn
+
+    args = parse_args()
    host, port = args.listen.split(":")
-    uvicorn.run(app, host=host, port=int(port), workers=args.workers, log_level="info")
+    uvicorn.run(
+        "tools.api:app",
+        host=host,
+        port=int(port),
+        workers=args.workers,
+        log_level="info",
+    )