1 year ago · 75d7ecb5b5
--- a/docs/assets/openaudio.jpg
+++ b/docs/assets/openaudio.jpg
--- a/docs/assets/openaudio.png
+++ b/docs/assets/openaudio.png
--- a/docs/en/index.md
+++ b/docs/en/index.md
@@ -1,4 +1,14 @@
 
															-# Introduction
														
 
															+# OpenAudio (formerly Fish-Speech)
														
 
															+
														
 
															+<div align="center">
														
 
															+
														
 
															+<div align="center">
														
 
															+
														
 
															+<img src="../assets/openaudio.jpg" alt="OpenAudio" style="display: block; margin: 0 auto; width: 35%;"/>
														
 
															+
														
 
															+</div>
														
 
															+
														
 
															+<strong>Advanced Text-to-Speech Model Series</strong>
														
 
															 <div>
														
 
															 <a target="_blank" href="https://discord.gg/Es5qTB9BcN">
														
@@ -12,39 +22,114 @@
 
															 </a>
														
 
															 </div>
														
 
															-!!! warning
														
 
															-    We assume no responsibility for any illegal use of the codebase. Please refer to the local laws regarding DMCA (Digital Millennium Copyright Act) and other relevant laws in your area. <br/>
														
 
															-    This codebase is released under Apache 2.0 license and all models are released under the CC-BY-NC-SA-4.0 license.
														
 
															+<strong>Try it now:</strong> <a href="https://fish.audio">Fish Audio Playground</a> | <strong>Learn more:</strong> <a href="https://openaudio.com">OpenAudio Website</a>
														
 
															-## Requirements
														
 
															+</div>
														
 
															+
														
 
															+---
														
 
															+
														
 
															+!!! warning "Legal Notice"
														
 
															+    We assume no responsibility for any illegal use of the codebase. Please refer to the local laws regarding DMCA (Digital Millennium Copyright Act) and other relevant laws in your area.
														
 
															+    
														
 
															+    **License:** This codebase is released under Apache 2.0 license and all models are released under the CC-BY-NC-SA-4.0 license.
														
 
															-- GPU Memory: 12GB (Inference)
														
 
															-- System: Linux, Windows
														
 
															+## **Introduction**
														
 
															-## Setup
														
 
															+We are excited to announce that we have rebranded to **OpenAudio** - introducing a brand new series of advanced Text-to-Speech models that builds upon the foundation of Fish-Speech with significant improvements and new capabilities.
														
 
															-First, we need to create a conda environment to install the packages.
														
 
															+**Openaudio-S1-mini**: [Video](To Be Uploaded); [Hugging Face](https://huggingface.co/fishaudio/openaudio-s1-mini);
														
 
															-```bash
														
 
															+**Fish-Speech v1.5**: [Video](https://www.bilibili.com/video/BV1EKiDYBE4o/); [Hugging Face](https://huggingface.co/fishaudio/fish-speech-1.5);
														
 
															-conda create -n fish-speech python=3.12
														
 
															-conda activate fish-speech
														
 
															+## **Highlights** ✨
														
 
															-pip install sudo apt-get install portaudio19-dev # For pyaudio
														
 
															-pip install -e . # This will download all rest packages.
														
 
															+### **Emotion Control**
														
 
															+OpenAudio S1 **supports a variety of emotional, tone, and special markers** to enhance speech synthesis:
														
 
															-apt install libsox-dev ffmpeg # If needed.
														
 
															+- **Basic emotions**:
														
 
															 ```
														
 
															+(angry) (sad) (excited) (surprised) (satisfied) (delighted) 
														
 
															+(scared) (worried) (upset) (nervous) (frustrated) (depressed)
														
 
															+(empathetic) (embarrassed) (disgusted) (moved) (proud) (relaxed)
														
 
															+(grateful) (confident) (interested) (curious) (confused) (joyful)
														
 
															+```
														
 
															+
														
 
															+- **Advanced emotions**:
														
 
															+```
														
 
															+(disdainful) (unhappy) (anxious) (hysterical) (indifferent) 
														
 
															+(impatient) (guilty) (scornful) (panicked) (furious) (reluctant)
														
 
															+(keen) (disapproving) (negative) (denying) (astonished) (serious)
														
 
															+(sarcastic) (conciliative) (comforting) (sincere) (sneering)
														
 
															+(hesitating) (yielding) (painful) (awkward) (amused)
														
 
															+```
														
 
															+
														
 
															+- **Tone markers**:
														
 
															+```
														
 
															+(in a hurry tone) (shouting) (screaming) (whispering) (soft tone)
														
 
															+```
														
 
															+
														
 
															+- **Special audio effects**:
														
 
															+```
														
 
															+(laughing) (chuckling) (sobbing) (crying loudly) (sighing) (panting)
														
 
															+(groaning) (crowd laughing) (background laughter) (audience laughing)
														
 
															+```
														
 
															+
														
 
															+You can also write laughter such as `Ha,ha,ha` directly in the text to control delivery; many other cases are left for you to explore.
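
A minimal sketch of how the markers listed above could be attached to input text. The marker names are copied from the lists in this diff; the helper `with_marker` is hypothetical, and placing the marker directly before the sentence it affects is an assumption that the docs here do not confirm.

```python
# Hypothetical helper, not part of the fish-speech codebase.
# Marker names come from the lists above; only a subset is included here.
BASIC_EMOTIONS = {"angry", "sad", "excited", "surprised", "satisfied", "delighted"}
TONE_MARKERS = {"in a hurry tone", "shouting", "screaming", "whispering", "soft tone"}
KNOWN_MARKERS = BASIC_EMOTIONS | TONE_MARKERS


def with_marker(text: str, marker: str) -> str:
    """Prefix `text` with a parenthesized marker, e.g. "(excited) Hello".

    Assumption: the marker goes immediately before the text it should affect.
    """
    if marker not in KNOWN_MARKERS:
        raise ValueError(f"unknown marker: {marker!r}")
    return f"({marker}) {text}"


if __name__ == "__main__":
    print(with_marker("We shipped the release!", "excited"))
    print(with_marker("Keep your voice down.", "whispering"))
```

Validating against a fixed set catches typos such as `(exited)` before the text reaches the model, at the cost of having to extend the set when trying markers outside the documented lists.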
														
 
															+
														
 
															+### **Excellent TTS quality**
														
 
															+
														
 
															+We use Seed TTS Eval Metrics to evaluate the model performance, and the results show that OpenAudio S1 achieves **0.008 WER** and **0.004 CER** on English text, which is significantly better than previous models. (English, auto eval, based on OpenAI gpt-4o-transcribe, speaker distance using Revai/pyannote-wespeaker-voxceleb-resnet34-LM)
														
 
															+
														
 
															+| Model | Word Error Rate (WER) | Character Error Rate (CER) | Speaker Distance |
														
 
															+|-------|----------------------|---------------------------|------------------|
														
 
															+| **S1** | **0.008**  | **0.004**  | **0.332** |
														
 
															+| **S1-mini** | **0.011** | **0.005** | **0.380** |
														
 
															+
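
For readers unfamiliar with the two error metrics in the table, the following sketch shows the standard definitions: edit distance between reference and transcript, divided by the reference length, counted in words for WER and in characters for CER. It is not the Seed-TTS evaluation pipeline itself; text normalization and the transcription step are omitted, and the function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(wer("the cat sat on the mat", "the cat sat on a mat"))
```

On this scale, the reported 0.008 WER corresponds to roughly 8 word errors per 1,000 reference words.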
														
 
															+### **Two Types of Models**
														
 
															+
														
 
															+| Model | Size | Availability | Features |
														
 
															+|-------|------|--------------|----------|
														
 
															+| **S1** | 4B parameters | Available on [fish.audio](https://fish.audio) | Full-featured flagship model |
														
 
															+| **S1-mini** | 0.5B parameters | Available on Hugging Face [HF Space](https://huggingface.co/spaces/fishaudio/openaudio-s1-mini) | Distilled version with core capabilities |
														
 
															+
														
 
															+Both S1 and S1-mini incorporate online Reinforcement Learning from Human Feedback (RLHF).
														
 
															+
														
 
															+## **Features**
														
 
															+
														
 
															+1. **Zero-shot & Few-shot TTS:** Input a 10 to 30-second vocal sample to generate high-quality TTS output. **For detailed guidelines, see [Voice Cloning Best Practices](https://docs.fish.audio/text-to-speech/voice-clone-best-practices).**
														
 
															+
														
 
															+2. **Multilingual & Cross-lingual Support:** Simply copy and paste multilingual text into the input box—no need to worry about the language. Currently supports English, Japanese, Korean, Chinese, French, German, Arabic, and Spanish.
														
 
															+
														
 
															+3. **No Phoneme Dependency:** The model has strong generalization capabilities and does not rely on phonemes for TTS. It can handle text in any language script.
														
 
															+
														
 
															+4. **Highly Accurate:** Achieves a low CER (Character Error Rate) of around 0.4% and WER (Word Error Rate) of around 0.8% for Seed-TTS Eval.
														
 
															+
														
 
															+5. **Fast:** With fish-tech acceleration, the real-time factor is approximately 1:5 on an Nvidia RTX 4060 laptop and 1:15 on an Nvidia RTX 4090.
														
 
															+
														
 
															+6. **WebUI Inference:** Features an easy-to-use, Gradio-based web UI compatible with Chrome, Firefox, Edge, and other browsers.
														
 
															+
														
 
															+7. **GUI Inference:** Offers a PyQt6 graphical interface that works seamlessly with the API server. Supports Linux, Windows, and macOS. [See GUI](https://github.com/AnyaCoder/fish-speech-gui).
														
 
															+
														
 
															+8. **Deploy-Friendly:** Easily set up an inference server with native support for Linux and Windows (macOS coming soon), minimizing speed loss.
														
 
															+
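
The speed figures in item 5 can be read as a ratio of compute time to generated audio. The sketch below assumes that "1:5" means one second of processing yields about five seconds of audio; the docs do not spell this out, and the function names are illustrative.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Compute time divided by audio duration; values below 1.0 are faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def estimated_processing_seconds(audio_seconds: float, audio_per_compute: float) -> float:
    """Estimate compute time for a ratio written as 1:audio_per_compute (e.g. 1:15)."""
    if audio_per_compute <= 0:
        raise ValueError("audio_per_compute must be positive")
    return audio_seconds / audio_per_compute


if __name__ == "__main__":
    # Under the assumed reading, one minute of audio at 1:15 takes about 4 seconds.
    print(estimated_processing_seconds(60.0, 15.0))
    print(estimated_processing_seconds(60.0, 5.0))
```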
														
 
															+## **Disclaimer**
														
 
															+
														
 
															+We do not hold any responsibility for any illegal usage of the codebase. Please refer to your local laws about DMCA and other related laws.
														
 
															+
														
 
															+## **Media & Demos**
														
 
															+
														
 
															+#### 🚧 Coming Soon
														
 
															+Video demonstrations and tutorials are currently in development.
														
 
															+
														
 
															+## **Documentation**
														
 
															+
														
 
															+### Quick Start
														
 
															+- [Build Environment](en/install.md) - Set up your development environment
														
 
															+- [Inference Guide](en/inference.md) - Run the model and generate speech
														
 
															-!!! warning
														
 
															-    The `compile` option is not supported on windows and macOS, if you want to run with compile, you need to install trition by yourself.
														
 
															-## Acknowledgements
														
 
															+## **Community & Support**
														
 
															-- [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
														
 
															-- [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)
														
 
															-- [GPT VITS](https://github.com/innnky/gpt-vits)
														
 
															-- [MQTTS](https://github.com/b04901014/MQTTS)
														
 
															-- [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
														
 
															-- [Transformers](https://github.com/huggingface/transformers)
														
 
															-- [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)
														
 
															+- **Discord:** Join our [Discord community](https://discord.gg/Es5qTB9BcN)
														
 
															+- **Website:** Visit [OpenAudio.com](https://openaudio.com) for latest updates
														
 
															+- **Try Online:** [Fish Audio Playground](https://fish.audio)
														
--- a/docs/en/inference.md
+++ b/docs/en/inference.md
@@ -34,9 +34,7 @@ python fish_speech/models/text2semantic/inference.py \
 
															     --text "The text you want to convert" \
														
 
															     --prompt-text "Your reference text" \
														
 
															     --prompt-tokens "fake.npy" \
														
 
															-    --checkpoint-path "checkpoints/openaudio-s1-mini" \
														
 
															-    --num-samples 2 \
														
 
															-    --compile # if you want a faster speed
														
 
															+    --compile
														
 
															 ```
														
 
															 This command will create a `codes_N` file in the working directory, where N is an integer starting from 0.
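
Because each run writes a new `codes_N` file, a script that chains the two steps needs to pick the most recent one. A small sketch, assuming the files carry a `.npy` extension (as in the `codes_0.npy` example further down) and that the highest N is the latest; `latest_codes_file` is a hypothetical helper.

```python
import re
from pathlib import Path

# Matches codes_0.npy, codes_1.npy, ... (the .npy extension is an assumption).
_CODES_RE = re.compile(r"^codes_(\d+)\.npy$")


def latest_codes_file(directory="."):
    """Return the codes_N.npy path with the largest N, or None if there is none."""
    best = None
    for path in Path(directory).iterdir():
        match = _CODES_RE.match(path.name)
        if match is None:
            continue
        index = int(match.group(1))
        # Compare numerically so that codes_10 ranks above codes_2.
        if best is None or index > best[0]:
            best = (index, path)
    return best[1] if best else None


if __name__ == "__main__":
    print(latest_codes_file("."))
```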
														
@@ -50,15 +48,12 @@ This command will create a `codes_N` file in the working directory, where N is a
 
															 ### 3. Generate vocals from semantic tokens:
														
 
															-#### VQGAN Decoder
														
 
															-
														
 
															 !!! warning "Future Warning"
														
 
															     We have kept the interface accessible from the original path (tools/vqgan/inference.py), but this interface may be removed in subsequent releases, so please change your code as soon as possible.
														
 
															 ```bash
														
 
															 python fish_speech/models/dac/inference.py \
														
 
															-    -i "codes_0.npy" \
														
 
															+    -i "codes_0.npy"
														
 
															-    --checkpoint-path "checkpoints/openaudiio-s1-mini/codec.pth"
														
 
															 ```
														
 
															 ## HTTP API Inference
														
--- a/docs/en/install.md
+++ b/docs/en/install.md
@@ -0,0 +1,31 @@
 
															+## Requirements
														
 
															+
														
 
															+- GPU Memory: 12GB (Inference)
														
 
															+- System: Linux, WSL
														
 
															+
														
 
															+## Setup
														
 
															+
														
 
															+First, install the system dependencies for pyaudio and sox, which are used for audio processing.
														
 
															+
														
 
															+```bash
														
 
															+apt install portaudio19-dev libsox-dev ffmpeg
														
 
															+```
														
 
															+
														
 
															+### Conda
														
 
															+
														
 
															+```bash
														
 
															+conda create -n fish-speech python=3.12
														
 
															+conda activate fish-speech
														
 
															+
														
 
															+pip install -e .
														
 
															+```
														
 
															+
														
 
															+### UV
														
 
															+
														
 
															+```bash
														
 
															+
														
 
															+uv sync --python 3.12
														
 
															+```
														
 
															+
														
 
															+!!! warning
														
 
															+    The `compile` option is not supported on Windows and macOS. To run with `compile` on those platforms, you need to install Triton yourself.
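
A launcher script could decide whether to pass `--compile` based on the warning above. This is a sketch under stated assumptions: it treats Linux as supported out of the box and any other platform as supported only when a `triton` module is importable. `compile_supported` is a hypothetical helper, not part of the project.

```python
import importlib.util
import platform


def compile_supported(system=None, has_triton=None):
    """Return True when `--compile` is expected to work, per the warning above.

    Both arguments default to values detected from the running interpreter;
    they can be passed explicitly for testing.
    """
    if system is None:
        system = platform.system()  # "Linux", "Windows", or "Darwin" (macOS)
    if has_triton is None:
        has_triton = importlib.util.find_spec("triton") is not None
    # Assumption: Linux works as-is; elsewhere a self-installed Triton is required.
    return system == "Linux" or has_triton


if __name__ == "__main__":
    extra_args = ["--compile"] if compile_supported() else []
    print(extra_args)
```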
														
--- a/docs/ja/index.md
+++ b/docs/ja/index.md
@@ -1,4 +1,14 @@
 
															-# 紹介
														
 
															+# OpenAudio (旧 Fish-Speech)
														
 
															+
														
 
															+<div align="center">
														
 
															+
														
 
															+<div align="center">
														
 
															+
														
 
															+<img src="../assets/openaudio.jpg" alt="OpenAudio" style="display: block; margin: 0 auto; width: 35%;"/>
														
 
															+
														
 
															+</div>
														
 
															+
														
 
															+<strong>先進的なText-to-Speechモデルシリーズ</strong>
														
 
															 <div>
														
 
															 <a target="_blank" href="https://discord.gg/Es5qTB9BcN">
														
@@ -12,39 +22,113 @@
 
															 </a>
														
 
															 </div>
														
 
															-!!! warning
														
 
															-    このコードベースの違法な使用について、当方は一切の責任を負いません。お住まいの地域のDMCA（デジタルミレニアム著作権法）およびその他の関連法規をご参照ください。<br/>
														
 
															-    このコードベースはApache 2.0ライセンスの下でリリースされ、すべてのモデルはCC-BY-NC-SA-4.0ライセンスの下でリリースされています。
														
 
															+<strong>今すぐ試す：</strong> <a href="https://fish.audio">Fish Audio Playground</a> | <strong>詳細情報：</strong> <a href="https://openaudio.com">OpenAudio ウェブサイト</a>
														
 
															+
														
 
															+</div>
														
 
															+
														
 
															+---
														
 
															+
														
 
															+!!! warning "法的通知"
														
 
															+    このコードベースの違法な使用について、当方は一切の責任を負いません。お住まいの地域のDMCA（デジタルミレニアム著作権法）およびその他の関連法規をご参照ください。
														
 
															+    
														
 
															+    **ライセンス：** このコードベースはApache 2.0ライセンスの下でリリースされ、すべてのモデルはCC-BY-NC-SA-4.0ライセンスの下でリリースされています。
														
 
															-## システム要件
														
 
															+## **紹介**
														
 
															-- GPU メモリ：12GB（推論）
														
 
															-- システム：Linux、Windows
														
 
															+私たちは **OpenAudio** への改名を発表できることを嬉しく思います。Fish-Speechを基盤とし、大幅な改善と新機能を加えた、新しい先進的なText-to-Speechモデルシリーズを紹介します。
														
 
															-## セットアップ
														
 
															+**Openaudio-S1-mini**: [動画](アップロード予定); [Hugging Face](https://huggingface.co/fishaudio/openaudio-s1-mini);
														
 
															-まず、パッケージをインストールするためのconda環境を作成する必要があります。
														
 
															+**Fish-Speech v1.5**: [動画](https://www.bilibili.com/video/BV1EKiDYBE4o/); [Hugging Face](https://huggingface.co/fishaudio/fish-speech-1.5);
														
 
															-```bash
														
 
															+## **ハイライト** ✨
														
 
															-conda create -n fish-speech python=3.12
														
 
															-conda activate fish-speech
														
 
															+### **感情制御**
														
 
															+OpenAudio S1は**多様な感情、トーン、特殊マーカーをサポート**して音声合成を強化します：
														
 
															-pip install sudo apt-get install portaudio19-dev # pyaudio用
														
 
															-pip install -e . # これにより残りのパッケージがすべてダウンロードされます。
														
 
															+- **基本感情**：
														
 
															+```
														
 
															+(angry) (sad) (excited) (surprised) (satisfied) (delighted)
														
 
															+(scared) (worried) (upset) (nervous) (frustrated) (depressed)
														
 
															+(empathetic) (embarrassed) (disgusted) (moved) (proud) (relaxed)
														
 
															+(grateful) (confident) (interested) (curious) (confused) (joyful)
														
 
															+```
														
 
															+
														
 
															+- **高度な感情**：
														
 
															+```
														
 
															+(disdainful) (unhappy) (anxious) (hysterical) (indifferent) 
														
 
															+(impatient) (guilty) (scornful) (panicked) (furious) (reluctant)
														
 
															+(keen) (disapproving) (negative) (denying) (astonished) (serious)
														
 
															+(sarcastic) (conciliative) (comforting) (sincere) (sneering)
														
 
															+(hesitating) (yielding) (painful) (awkward) (amused)
														
 
															+```
														
 
															-apt install libsox-dev ffmpeg # 必要に応じて。
														
 
															+- **トーンマーカー**：
														
 
															 ```
														
 
															+(in a hurry tone) (shouting) (screaming) (whispering) (soft tone)
														
 
															+```
														
 
															+
														
 
															+- **特殊音響効果**：
														
 
															+```
														
 
															+(laughing) (chuckling) (sobbing) (crying loudly) (sighing) (panting)
														
 
															+(groaning) (crowd laughing) (background laughter) (audience laughing)
														
 
															+```
														
 
															+
														
 
															+Ha,ha,haを使用してコントロールすることもでき、他にも多くの使用法があなた自身の探索を待っています。
														
 
															+
														
 
															+### **優秀なTTS品質**
														
 
															+
														
 
															+Seed TTS評価指標を使用してモデルのパフォーマンスを評価した結果、OpenAudio S1は英語テキストで**0.008 WER**と**0.004 CER**を達成し、以前のモデルより大幅に改善されました。（英語、自動評価、OpenAI gpt-4o-transcribeに基づく、話者距離はRevai/pyannote-wespeaker-voxceleb-resnet34-LM使用）
														
 
															+
														
 
															+| モデル | 単語誤り率 (WER) | 文字誤り率 (CER) | 話者距離 |
														
 
															+|-------|----------------------|---------------------------|------------------|
														
 
															+| **S1** | **0.008**  | **0.004**  | **0.332** |
														
 
															+| **S1-mini** | **0.011** | **0.005** | **0.380** |
														
 
															+
														
 
															+### **2つのモデルタイプ**
														
 
															+
														
 
															+| モデル | サイズ | 利用可能性 | 特徴 |
														
 
															+|-------|------|--------------|----------|
														
 
															+| **S1** | 40億パラメータ | [fish.audio](https://fish.audio) で利用可能 | 全機能搭載のフラッグシップモデル |
														
 
															+| **S1-mini** | 5億パラメータ | huggingface [hf space](https://huggingface.co/spaces/fishaudio/openaudio-s1-mini) で利用可能 | コア機能を備えた蒸留版 |
														
 
															+
														
 
															+S1とS1-miniの両方にオンライン人間フィードバック強化学習（RLHF）が組み込まれています。
														
 
															+
														
 
															+## **機能**
														
 
															+
														
 
															+1. **ゼロショット・フューショットTTS：** 10〜30秒の音声サンプルを入力するだけで高品質なTTS出力を生成します。**詳細なガイドラインについては、[音声クローニングのベストプラクティス](https://docs.fish.audio/text-to-speech/voice-clone-best-practices)をご覧ください。**
														
 
															+
														
 
															+2. **多言語・言語横断サポート：** 多言語テキストを入力ボックスにコピー＆ペーストするだけで、言語を気にする必要はありません。現在、英語、日本語、韓国語、中国語、フランス語、ドイツ語、アラビア語、スペイン語をサポートしています。
														
 
															+
														
 
															+3. **音素依存なし：** このモデルは強力な汎化能力を持ち、TTSに音素に依存しません。あらゆる言語スクリプトのテキストを処理できます。
														
 
															+
														
 
															+4. **高精度：** Seed-TTS Evalで低い文字誤り率（CER）約0.4%と単語誤り率（WER）約0.8%を達成します。
														
 
															+
														
 
															+5. **高速：** fish-tech加速により、Nvidia RTX 4060ラップトップでリアルタイム係数約1:5、Nvidia RTX 4090で約1:15を実現します。
														
 
															+
														
 
															+6. **WebUI推論：** Chrome、Firefox、Edge、その他のブラウザと互換性のあるGradioベースの使いやすいWebUIを備えています。
														
 
															+
														
 
															+7. **GUI推論：** APIサーバーとシームレスに連携するPyQt6グラフィカルインターフェースを提供します。Linux、Windows、macOSをサポートします。[GUIを見る](https://github.com/AnyaCoder/fish-speech-gui)。
														
 
															+
														
 
															+8. **デプロイフレンドリー：** Linux、Windows（macOSは近日対応予定）のネイティブサポートで推論サーバーを簡単にセットアップし、速度低下を最小化します。
														
 
															+
														
 
															+## **免責事項**
														
 
															+
														
 
															+コードベースの違法な使用について、当方は一切の責任を負いません。お住まいの地域のDMCAやその他の関連法律をご参照ください。
														
 
															+
														
 
															+## **メディア・デモ**
														
 
															+
														
 
															+#### 🚧 近日公開
														
 
															+動画デモとチュートリアルは現在開発中です。
														
 
															+
														
 
															+## **ドキュメント**
														
 
															-!!! warning
														
 
															-    `compile`オプションはWindowsとmacOSでサポートされていません。compileで実行したい場合は、tritionを自分でインストールする必要があります。
														
 
															+### クイックスタート
														
 
															+- [環境構築](install.md) - 開発環境をセットアップ
														
 
															+- [推論ガイド](inference.md) - モデルを実行して音声を生成
														
 
															-## 謝辞
														
 
															+## **コミュニティ・サポート**
														
 
															-- [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
														
 
															-- [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)
														
 
															-- [GPT VITS](https://github.com/innnky/gpt-vits)
														
 
															-- [MQTTS](https://github.com/b04901014/MQTTS)
														
 
															-- [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
														
 
															-- [Transformers](https://github.com/huggingface/transformers)
														
 
															-- [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)
														
 
															+- **Discord：** [Discordコミュニティ](https://discord.gg/Es5qTB9BcN)に参加
														
 
															+- **ウェブサイト：** 最新アップデートは[OpenAudio.com](https://openaudio.com)をご覧ください
														
 
															+- **オンライン試用：** [Fish Audio Playground](https://fish.audio)
														
--- a/docs/ja/inference.md
+++ b/docs/ja/inference.md
@@ -34,9 +34,7 @@ python fish_speech/models/text2semantic/inference.py \
 
															     --text "変換したいテキスト" \
														
 
															     --prompt-text "参照テキスト" \
														
 
															     --prompt-tokens "fake.npy" \
														
 
															-    --checkpoint-path "checkpoints/openaudio-s1-mini" \
														
 
															-    --num-samples 2 \
														
 
															-    --compile # より高速化を求める場合
														
 
															+    --compile
														
 
															 ```
														
 
															 このコマンドは、作業ディレクトリに `codes_N` ファイルを作成します（Nは0から始まる整数）。
														
@@ -50,15 +48,12 @@ python fish_speech/models/text2semantic/inference.py \
 
															 ### 3. セマンティックトークンから音声を生成：
														
 
															-#### VQGANデコーダー
														
 
															-
														
 
															 !!! warning "将来の警告"
														
 
															     元のパス（tools/vqgan/inference.py）からアクセス可能なインターフェースを維持していますが、このインターフェースは後続のリリースで削除される可能性があるため、できるだけ早くコードを変更してください。
														
 
															 ```bash
														
 
															 python fish_speech/models/dac/inference.py \
														
 
															-    -i "codes_0.npy" \
														
 
															-    --checkpoint-path "checkpoints/openaudiio-s1-mini/codec.pth"
														
 
															+    -i "codes_0.npy"
														
 
															 ```
														
 
															 ## HTTP API推論
														
@@ -103,5 +98,3 @@ python -m tools.run_webui
 
															 !!! note
														
 
															     `GRADIO_SHARE`、`GRADIO_SERVER_PORT`、`GRADIO_SERVER_NAME` などのGradio環境変数を使用してWebUIを設定できます。
														
 
															-
														
 
															-お楽しみください！
														
--- a/docs/ja/install.md
+++ b/docs/ja/install.md
@@ -0,0 +1,30 @@
 
															+## システム要件
														
 
															+
														
 
															+- GPU メモリ：12GB（推論）
														
 
															+- システム：Linux、WSL
														
 
															+
														
 
															+## セットアップ
														
 
															+
														
 
															+まず、音声処理に使用される pyaudio と sox をインストールする必要があります。
														
 
															+
														
 
															+```bash
														
 
															+apt install portaudio19-dev libsox-dev ffmpeg
														
 
															+```
														
 
															+
														
 
															+### Conda
														
 
															+
														
 
															+```bash
														
 
															+conda create -n fish-speech python=3.12
														
 
															+conda activate fish-speech
														
 
															+
														
 
															+pip install -e .
														
 
															+```
														
 
															+
														
 
															+### UV
														
 
															+
														
 
															+```bash
														
 
															+uv sync --python 3.12
														
 
															+```
														
 
															+
														
 
															+!!! warning
														
 
															+    `compile` オプションは Windows と macOS でサポートされていません。compile で実行したい場合は、triton を自分でインストールする必要があります。
														
--- a/docs/ko/index.md
+++ b/docs/ko/index.md
@@ -1,4 +1,14 @@
 
															-# 소개
														
 
															+# OpenAudio (구 Fish-Speech)
														
 
															+
														
 
															+<div align="center">
														
 
															+
														
 
															+<div align="center">
														
 
															+
														
 
															+<img src="../assets/openaudio.jpg" alt="OpenAudio" style="display: block; margin: 0 auto; width: 35%;"/>
														
 
															+
														
 
															+</div>
														
 
															+
														
 
															+<strong>고급 텍스트-음성 변환 모델 시리즈</strong>
														
 
															 <div>
														
 
															 <a target="_blank" href="https://discord.gg/Es5qTB9BcN">
														
@@ -12,39 +22,113 @@
 
															 </a>
														
 
															 </div>
														
 
															-!!! warning
														
 
															-    코드베이스의 불법적인 사용에 대해서는 일체 책임을 지지 않습니다. 귀하의 지역의 DMCA(디지털 밀레니엄 저작권법) 및 기타 관련 법률을 참고하시기 바랍니다. <br/>
														
 
															-    이 코드베이스는 Apache 2.0 라이선스 하에 배포되며, 모든 모델은 CC-BY-NC-SA-4.0 라이선스 하에 배포됩니다.
														
 
															+<strong>지금 체험:</strong> <a href="https://fish.audio">Fish Audio Playground</a> | <strong>자세히 알아보기:</strong> <a href="https://openaudio.com">OpenAudio 웹사이트</a>
														
 
															+
														
 
															+</div>
														
 
															+
														
 
															+---
														
 
															+
														
 
															+!!! warning "법적 고지"
														
 
															+    코드베이스의 불법적인 사용에 대해서는 일체 책임을 지지 않습니다. 귀하의 지역의 DMCA(디지털 밀레니엄 저작권법) 및 기타 관련 법률을 참고하시기 바랍니다.
														
 
															+    
														
 
															+    **라이선스:** 이 코드베이스는 Apache 2.0 라이선스 하에 배포되며, 모든 모델은 CC-BY-NC-SA-4.0 라이선스 하에 배포됩니다.
														
 
															-## 시스템 요구사항
														
 
															+## **소개**
														
 
															-- GPU 메모리: 12GB (추론)
														
 
															-- 시스템: Linux, Windows
														
 
															+저희는 **OpenAudio**로의 브랜드 변경을 발표하게 되어 기쁩니다. Fish-Speech를 기반으로 하여 상당한 개선과 새로운 기능을 추가한 새로운 고급 텍스트-음성 변환 모델 시리즈를 소개합니다.
														
 
															-## 설치
														
 
															+**Openaudio-S1-mini**: [동영상](업로드 예정); [Hugging Face](https://huggingface.co/fishaudio/openaudio-s1-mini);
														
 
															-먼저 패키지를 설치하기 위한 conda 환경을 만들어야 합니다.
														
 
															+**Fish-Speech v1.5**: [동영상](https://www.bilibili.com/video/BV1EKiDYBE4o/); [Hugging Face](https://huggingface.co/fishaudio/fish-speech-1.5);
														
 
															-```bash
														
 
															+## **주요 특징** ✨
														
 
															-conda create -n fish-speech python=3.12
														
 
															-conda activate fish-speech
														
 
															+### **감정 제어**
														
 
															+OpenAudio S1은 **다양한 감정, 톤, 특수 마커를 지원**하여 음성 합성을 향상시킵니다:
														
 
															-pip install sudo apt-get install portaudio19-dev # pyaudio용
														
 
															-pip install -e . # 나머지 모든 패키지를 다운로드합니다.
														
 
															+- **기본 감정**:
														
 
															+```
														
 
															+(angry) (sad) (excited) (surprised) (satisfied) (delighted)
														
 
															+(scared) (worried) (upset) (nervous) (frustrated) (depressed)
														
 
															+(empathetic) (embarrassed) (disgusted) (moved) (proud) (relaxed)
														
 
															+(grateful) (confident) (interested) (curious) (confused) (joyful)
														
 
															+```
														
 
															+
														
 
															+- **Advanced emotions**:
														
 
															+```
														
 
															+(disdainful) (unhappy) (anxious) (hysterical) (indifferent) 
														
 
															+(impatient) (guilty) (scornful) (panicked) (furious) (reluctant)
														
 
															+(keen) (disapproving) (negative) (denying) (astonished) (serious)
														
 
															+(sarcastic) (conciliative) (comforting) (sincere) (sneering)
														
 
															+(hesitating) (yielding) (painful) (awkward) (amused)
														
 
															+```
														
 
															-apt install libsox-dev ffmpeg # If needed.
														
 
															+- **Tone markers**:
														
 
															 ```
														
 
															+(in a hurry tone) (shouting) (screaming) (whispering) (soft tone)
														
 
															+```
														
 
															+
														
 
															+- **Special sound effects**:
														
 
															+```
														
 
															+(laughing) (chuckling) (sobbing) (crying loudly) (sighing) (panting)
														
 
															+(groaning) (crowd laughing) (background laughter) (audience laughing)
														
 
															+```
														
 
															+
														
 
															+You can also use Ha,ha,ha to control the output, and there are many other usages to explore on your own.
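
As a minimal sketch of how the markers listed above might be combined with input text, the snippet below prepends one or more parenthesized markers to a sentence. The helper name `tag_text`, the small `KNOWN_MARKERS` subset, and the convention of placing markers before the sentence are assumptions made for illustration; they are not part of any OpenAudio API.

```python
# Hypothetical helper: builds marker-prefixed input text for a TTS request.
# Marker names are taken from the lists above; the placement convention is assumed.
KNOWN_MARKERS = {
    "angry", "sad", "excited", "whispering", "soft tone", "laughing", "sighing",
}


def tag_text(text: str, *markers: str) -> str:
    """Return `text` prefixed with each marker wrapped in parentheses."""
    unknown = [m for m in markers if m not in KNOWN_MARKERS]
    if unknown:
        raise ValueError(f"unknown marker(s): {unknown}")
    prefix = " ".join(f"({m})" for m in markers)
    return f"{prefix} {text}" if prefix else text


print(tag_text("I can't believe we won!", "excited"))
print(tag_text("Please keep it down.", "whispering", "soft tone"))
```

Validating against a known set catches typos such as `(exited)` before the text is sent for synthesis; extend the set with the remaining markers as needed.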
														
 
															+
														
 
															+### **Excellent TTS Quality**
														
 
															+
														
 
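															+We evaluated model performance with the Seed TTS Eval metrics. OpenAudio S1 achieves **0.008 WER** and **0.004 CER** on English text, a significant improvement over previous models. (English, automatic evaluation, based on OpenAI gpt-4o-transcribe; speaker distance computed with Revai/pyannote-wespeaker-voxceleb-resnet34-LM)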
														
 
															+
														
 
															+| Model | Word Error Rate (WER) | Character Error Rate (CER) | Speaker Distance |
														
 
															+|-------|----------------------|---------------------------|------------------|
														
 
															+| **S1** | **0.008**  | **0.004**  | **0.332** |
														
 
															+| **S1-mini** | **0.011** | **0.005** | **0.380** |
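
For readers unfamiliar with the metrics in this table, here is a minimal sketch of how WER and CER are commonly computed: the edit distance between a reference and a transcribed hypothesis, divided by the reference length. This is a generic illustration and not the Seed TTS Eval implementation, which may normalize text differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


print(wer("the cat sat on the mat", "the cat sat on a mat"))
print(cer("abc", "abd"))
```

Under this definition, a WER of 0.008 corresponds to roughly 8 word errors per 1,000 reference words.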
														
 
															+
														
 
															+### **Two Model Types**
														
 
															+
														
 
															+| Model | Size | Availability | Features |
														
 
															+|-------|------|--------------|----------|
														
 
															+| **S1** | 4B parameters | Available on [fish.audio](https://fish.audio) | Full-featured flagship model |
														
 
															+| **S1-mini** | 0.5B parameters | Available on Hugging Face ([HF Space](https://huggingface.co/spaces/fishaudio/openaudio-s1-mini)) | Lightweight version with core capabilities |
														
 
															+
														
 
															+Both S1 and S1-mini incorporate online Reinforcement Learning from Human Feedback (RLHF).
														
 
															+
														
 
															+## **Features**
														
 
															+
														
 
															+1. **Zero-shot and few-shot TTS:** Provide a 10 to 30 second voice sample to generate high-quality TTS output. **For detailed guidelines, see [Voice Cloning Best Practices](https://docs.fish.audio/text-to-speech/voice-clone-best-practices).**
														
 
															+
														
 
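															+2. **Multilingual and cross-lingual support:** Simply copy and paste multilingual text into the input box; there is no need to worry about the language. English, Japanese, Korean, Chinese, French, German, Arabic, and Spanish are currently supported.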
														
 
															+
														
 
															+3. **No phoneme dependency:** The model generalizes well and does not rely on phonemes for TTS. It can handle text in any language script.
														
 
															+
														
 
															+4. **Highly accurate:** Achieves a low character error rate (CER) of about 0.4% and a word error rate (WER) of about 0.8% on Seed-TTS Eval.
														
 
															+
														
 
															+5. **Fast:** With fish-tech acceleration, the real-time factor is about 1:5 on an Nvidia RTX 4060 laptop and about 1:15 on an Nvidia RTX 4090.
														
 
															+
														
 
															+6. **WebUI inference:** Provides an easy-to-use Gradio-based web UI compatible with Chrome, Firefox, Edge, and other browsers.
														
 
															+
														
 
															+7. **GUI inference:** Provides a PyQt6 graphical interface that works seamlessly with the API server. Supports Linux, Windows, and macOS. [See the GUI](https://github.com/AnyaCoder/fish-speech-gui).
														
 
															+
														
 
															+8. **Deployment-friendly:** Easily set up an inference server with native support for Linux, Windows, and macOS, minimizing speed loss.
														
 
															+
														
 
															+## **Disclaimer**
														
 
															+
														
 
															+We assume no responsibility for any illegal use of the codebase. Please refer to the DMCA and other relevant laws in your area.
														
 
															+
														
 
															+## **Media and Demos**
														
 
															+
														
 
															+#### 🚧 Coming Soon
														
 
															+Video demos and tutorials are currently in development.
														
 
															+
														
 
															+## **Documentation**
														
 
															-!!! warning
														
 
															-    The `compile` option is not supported on Windows and macOS. To run with compile, you need to install trition yourself.
														
 
															+### Quick Start
														
 
															+- [Build Environment](install.md) - Set up your development environment
														
 
															+- [Inference Guide](inference.md) - Run the model and generate speech
														
 
															-## Acknowledgements
														
 
															+## **Community and Support**
														
 
															-- [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
														
 
															-- [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)
														
 
															-- [GPT VITS](https://github.com/innnky/gpt-vits)
														
 
															-- [MQTTS](https://github.com/b04901014/MQTTS)
														
 
															-- [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
														
 
															-- [Transformers](https://github.com/huggingface/transformers)
														
 
															-- [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)
														
 
															+- **Discord:** Join our [Discord community](https://discord.gg/Es5qTB9BcN)
														
 
															+- **Website:** Visit [OpenAudio.com](https://openaudio.com) for the latest updates
														
 
															+- **Try online:** [Fish Audio Playground](https://fish.audio)
														
--- a/docs/ko/inference.md
+++ b/docs/ko/inference.md
@@ -34,9 +34,7 @@ python fish_speech/models/text2semantic/inference.py \
 
															     --text "The text you want to convert" \
														
 
															     --prompt-text "Your reference text" \
														
 
															     --prompt-tokens "fake.npy" \
														
 
															-    --checkpoint-path "checkpoints/openaudio-s1-mini" \
														
 
															-    --num-samples 2 \
														
 
															-    --compile # if you want faster speed
														
 
															+    --compile
														
 
															 ```
														
 
															 This command creates a `codes_N` file in the working directory, where N is an integer starting from 0.
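
A small sketch of how the generated files could be collected for the decoding step that follows. It assumes the files carry a `.npy` extension (as in the `codes_0.npy` argument used later in this file); the helper name `find_code_files` is hypothetical.

```python
import re
from pathlib import Path

# Matches codes_0.npy, codes_1.npy, ... (the .npy suffix is an assumption).
CODES_PATTERN = re.compile(r"^codes_(\d+)\.npy$")


def find_code_files(directory="."):
    """Return codes_N.npy paths found in `directory`, sorted numerically by N."""
    found = []
    for path in Path(directory).iterdir():
        match = CODES_PATTERN.match(path.name)
        if match and path.is_file():
            found.append((int(match.group(1)), path))
    return [path for _, path in sorted(found)]


for code_file in find_code_files("."):
    print(code_file)
```

Sorting by the parsed integer keeps `codes_10.npy` after `codes_2.npy`, which a plain string sort would not.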
														
@@ -50,15 +48,12 @@ python fish_speech/models/text2semantic/inference.py \
 
															 ### 3. Generate speech from semantic tokens:
														
 
															-#### VQGAN Decoder
														
 
															-
														
 
															 !!! warning "Future Warning"
														
 
															     We have kept the interface accessible from the original path (tools/vqgan/inference.py), but it may be removed in a future release, so please update your code as soon as possible.
														
 
															 ```bash
														
 
															 python fish_speech/models/dac/inference.py \
														
 
															-    -i "codes_0.npy" \
														
 
															-    --checkpoint-path "checkpoints/openaudiio-s1-mini/codec.pth"
														
 
															+    -i "codes_0.npy"
														
 
															 ```
														
 
															 ## HTTP API Inference
														
@@ -103,5 +98,3 @@ python -m tools.run_webui
 
															 !!! note
														
 
															     You can configure the WebUI with Gradio environment variables such as `GRADIO_SHARE`, `GRADIO_SERVER_PORT`, and `GRADIO_SERVER_NAME`.
														
 
															-
														
 
															-Enjoy!
														
--- a/docs/ko/install.md
+++ b/docs/ko/install.md
@@ -0,0 +1,30 @@
 
															+## System Requirements
														
 
															+
														
 
															+- GPU memory: 12GB (inference)
														
 
															+- System: Linux, WSL
														
 
															+
														
 
															+## Setup
														
 
															+
														
 
															+First, install pyaudio and sox, which are used for audio processing.
														
 
															+
														
 
															+```bash
														
 
															+apt install portaudio19-dev libsox-dev ffmpeg
														
 
															+```
														
 
															+
														
 
															+### Conda
														
 
															+
														
 
															+```bash
														
 
															+conda create -n fish-speech python=3.12
														
 
															+conda activate fish-speech
														
 
															+
														
 
															+pip install -e .
														
 
															+```
														
 
															+
														
 
															+### UV
														
 
															+
														
 
															+```bash
														
 
															+uv sync --python 3.12
														
 
															+```
														
 
															+
														
 
															+!!! warning
														
 
															+    The `compile` option is not supported on Windows and macOS. To run with compile, you need to install triton yourself.
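
Before passing `--compile`, a script could check the two conditions this warning mentions. The sketch below takes a conservative reading (Linux only, with `triton` importable); the function name `can_use_compile` is hypothetical and not part of the project.

```python
import importlib.util
import platform


def can_use_compile(system=None, triton_installed=None):
    """Return True when the platform is Linux and the triton package is importable.

    Both arguments default to values detected at runtime; pass them explicitly
    to test other combinations. WSL reports itself as "Linux".
    """
    if system is None:
        system = platform.system()
    if triton_installed is None:
        triton_installed = importlib.util.find_spec("triton") is not None
    return system == "Linux" and triton_installed


print("compile available:", can_use_compile())
```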
														
--- a/docs/pt/index.md
+++ b/docs/pt/index.md
@@ -1,4 +1,14 @@
 
															-# Introduction
														
 
															+# OpenAudio (formerly Fish-Speech)
														
 
															+
														
 
															+<div align="center">
														
 
															+
														
 
															+<div align="center">
														
 
															+
														
 
															+<img src="../assets/openaudio.jpg" alt="OpenAudio" style="display: block; margin: 0 auto; width: 35%;"/>
														
 
															+
														
 
															+</div>
														
 
															+
														
 
															+<strong>Advanced Text-to-Speech Model Series</strong>
														
 
															 <div>
														
 
															 <a target="_blank" href="https://discord.gg/Es5qTB9BcN">
														
@@ -12,39 +22,113 @@
 
															 </a>
														
 
															 </div>
														
 
															-!!! warning
														
 
															-    We assume no responsibility for any illegal use of the codebase. Please refer to the local laws regarding DMCA (Digital Millennium Copyright Act) and other relevant laws in your area. <br/>
														
 
															-    This codebase is released under the Apache 2.0 license and all models are released under the CC-BY-NC-SA-4.0 license.
														
 
															+<strong>Try it now:</strong> <a href="https://fish.audio">Fish Audio Playground</a> | <strong>Learn more:</strong> <a href="https://openaudio.com">OpenAudio Website</a>
														
 
															+
														
 
															+</div>
														
 
															+
														
 
															+---
														
 
															+
														
 
															+!!! warning "Legal Notice"
														
 
															+    We assume no responsibility for any illegal use of the codebase. Please refer to the local laws regarding DMCA (Digital Millennium Copyright Act) and other relevant laws in your area.
														
 
															+    
														
 
															+    **License:** This codebase is released under the Apache 2.0 license and all models are released under the CC-BY-NC-SA-4.0 license.
														
 
															-## Requirements
														
 
															+## **Introduction**
														
 
															-- GPU memory: 12GB (inference)
														
 
															-- System: Linux, Windows
														
 
															+We are excited to announce that we have rebranded to **OpenAudio**, introducing a new series of advanced text-to-speech models that builds on the foundation of Fish-Speech with significant improvements and new capabilities.
														
 
															-## Setup
														
 
															+**Openaudio-S1-mini**: Video (to be uploaded); [Hugging Face](https://huggingface.co/fishaudio/openaudio-s1-mini);
														
 
															-First, we need to create a conda environment to install the packages.
														
 
															+**Fish-Speech v1.5**: [Video](https://www.bilibili.com/video/BV1EKiDYBE4o/); [Hugging Face](https://huggingface.co/fishaudio/fish-speech-1.5);
														
 
															-```bash
														
 
															+## **Highlights** ✨
														
 
															-conda create -n fish-speech python=3.12
														
 
															-conda activate fish-speech
														
 
															+### **Emotion Control**
														
 
															+OpenAudio S1 **supports a variety of emotion, tone, and special markers** to enhance speech synthesis:
														
 
															-pip install sudo apt-get install portaudio19-dev # For pyaudio
														
 
															-pip install -e . # This will download all remaining packages.
														
 
															+- **Basic emotions**:
														
 
															+```
														
 
															+(angry) (sad) (excited) (surprised) (satisfied) (delighted)
														
 
															+(scared) (worried) (upset) (nervous) (frustrated) (depressed)
														
 
															+(empathetic) (embarrassed) (disgusted) (moved) (proud) (relaxed)
														
 
															+(grateful) (confident) (interested) (curious) (confused) (joyful)
														
 
															+```
														
 
															+
														
 
															+- **Advanced emotions**:
														
 
															+```
														
 
															+(disdainful) (unhappy) (anxious) (hysterical) (indifferent) 
														
 
															+(impatient) (guilty) (scornful) (panicked) (furious) (reluctant)
														
 
															+(keen) (disapproving) (negative) (denying) (astonished) (serious)
														
 
															+(sarcastic) (conciliative) (comforting) (sincere) (sneering)
														
 
															+(hesitating) (yielding) (painful) (awkward) (amused)
														
 
															+```
														
 
															-apt install libsox-dev ffmpeg # If needed.
														
 
															+- **Tone markers**:
														
 
															 ```
														
 
															+(in a hurry tone) (shouting) (screaming) (whispering) (soft tone)
														
 
															+```
														
 
															+
														
 
															+- **Special sound effects**:
														
 
															+```
														
 
															+(laughing) (chuckling) (sobbing) (crying loudly) (sighing) (panting)
														
 
															+(groaning) (crowd laughing) (background laughter) (audience laughing)
														
 
															+```
														
 
															+
														
 
															+You can also use Ha,ha,ha to control the output, and there are many other usages to explore on your own.
														
 
															+
														
 
															+### **Excellent TTS Quality**
														
 
															+
														
 
															+We use the Seed TTS Eval metrics to evaluate model performance, and the results show that OpenAudio S1 achieves **0.008 WER** and **0.004 CER** on English text, significantly better than previous models. (English, automatic evaluation, based on OpenAI gpt-4o-transcribe; speaker distance computed with Revai/pyannote-wespeaker-voxceleb-resnet34-LM)
														
 
															+
														
 
															+| Model | Word Error Rate (WER) | Character Error Rate (CER) | Speaker Distance |
														
 
															+|-------|----------------------|---------------------------|------------------|
														
 
															+| **S1** | **0.008**  | **0.004**  | **0.332** |
														
 
															+| **S1-mini** | **0.011** | **0.005** | **0.380** |
														
 
															+
														
 
															+### **Two Model Types**
														
 
															+
														
 
															+| Model | Size | Availability | Features |
														
 
															+|-------|------|--------------|----------|
														
 
															+| **S1** | 4B parameters | Available on [fish.audio](https://fish.audio) | Full-featured flagship model |
														
 
															+| **S1-mini** | 0.5B parameters | Available on Hugging Face ([HF Space](https://huggingface.co/spaces/fishaudio/openaudio-s1-mini)) | Distilled version with core capabilities |
														
 
															+
														
 
															+Both S1 and S1-mini incorporate online Reinforcement Learning from Human Feedback (RLHF).
														
 
															+
														
 
															+## **Features**
														
 
															+
														
 
															+1. **Zero-shot and few-shot TTS:** Provide a 10 to 30 second voice sample to generate high-quality TTS output. **For detailed guidelines, see [Voice Cloning Best Practices](https://docs.fish.audio/text-to-speech/voice-clone-best-practices).**
														
 
															+
														
 
															+2. **Multilingual and cross-lingual support:** Simply copy and paste multilingual text into the input box; there is no need to worry about the language. English, Japanese, Korean, Chinese, French, German, Arabic, and Spanish are currently supported.
														
 
															+
														
 
															+3. **No phoneme dependency:** The model generalizes well and does not rely on phonemes for TTS. It can handle text in any language script.
														
 
															+
														
 
															+4. **Highly accurate:** Achieves a low character error rate (CER) of about 0.4% and a word error rate (WER) of about 0.8% on Seed-TTS Eval.
														
 
															+
														
 
															+5. **Fast:** With fish-tech acceleration, the real-time factor is about 1:5 on an Nvidia RTX 4060 laptop and about 1:15 on an Nvidia RTX 4090.
														
 
															+
														
 
															+6. **WebUI inference:** Features an easy-to-use Gradio-based web UI compatible with Chrome, Firefox, Edge, and other browsers.
														
 
															+
														
 
															+7. **GUI inference:** Offers a PyQt6 graphical interface that works seamlessly with the API server. Supports Linux, Windows, and macOS. [See the GUI](https://github.com/AnyaCoder/fish-speech-gui).
														
 
															+
														
 
															+8. **Deployment-friendly:** Easily set up an inference server with native support for Linux, Windows, and macOS, minimizing speed loss.
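
To make the speed figures in item 5 concrete, the sketch below estimates synthesis time from a real-time ratio. It assumes that "1:5" means one second of compute yields about five seconds of audio; the list above does not define the ratio's direction, so treat this reading as an assumption.

```python
def synthesis_seconds(audio_seconds: float, audio_per_compute_second: float) -> float:
    """Estimated compute time to generate `audio_seconds` of speech.

    `audio_per_compute_second` is the N in a "1:N" real-time ratio
    (assumed meaning: 1 s of compute produces N s of audio).
    """
    if audio_per_compute_second <= 0:
        raise ValueError("ratio must be positive")
    return audio_seconds / audio_per_compute_second


# Ratios quoted in the feature list above.
for device, ratio in {"RTX 4060 laptop": 5.0, "RTX 4090": 15.0}.items():
    seconds = synthesis_seconds(60, ratio)
    print(f"{device}: 60 s of audio in about {seconds:.0f} s")
```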
														
 
															+
														
 
															+## **Disclaimer**
														
 
															+
														
 
															+We assume no responsibility for any illegal use of the codebase. Please refer to your local laws regarding the DMCA and other related laws.
														
 
															+
														
 
															+## **Media and Demos**
														
 
															+
														
 
															+#### 🚧 Coming Soon
														
 
															+Video demos and tutorials are currently in development.
														
 
															+
														
 
															+## **Documentation**
														
 
															-!!! warning
														
 
															-    The `compile` option is not supported on Windows and macOS. If you want to run with compile, you need to install trition yourself.
														
 
															+### Quick Start
														
 
															+- [Build Environment](install.md) - Set up your development environment
														
 
															+- [Inference Guide](inference.md) - Run the model and generate speech
														
 
															-## Acknowledgements
														
 
															+## **Community and Support**
														
 
															-- [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
														
 
															-- [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)
														
 
															-- [GPT VITS](https://github.com/innnky/gpt-vits)
														
 
															-- [MQTTS](https://github.com/b04901014/MQTTS)
														
 
															-- [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
														
 
															-- [Transformers](https://github.com/huggingface/transformers)
														
 
															-- [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)
														
 
															+- **Discord:** Join our [Discord community](https://discord.gg/Es5qTB9BcN)
														
 
															+- **Website:** Visit [OpenAudio.com](https://openaudio.com) for the latest updates
														
 
															+- **Try online:** [Fish Audio Playground](https://fish.audio)
														
--- a/docs/pt/inference.md
+++ b/docs/pt/inference.md
@@ -34,9 +34,7 @@ python fish_speech/models/text2semantic/inference.py \
 
															     --text "The text you want to convert" \
														
 
															     --prompt-text "Your reference text" \
														
 
															     --prompt-tokens "fake.npy" \
														
 
															-    --checkpoint-path "checkpoints/openaudio-s1-mini" \
														
 
															-    --num-samples 2 \
														
 
															-    --compile # if you want faster speed
														
 
															+    --compile
														
 
															 ```
														
 
															 This command creates a `codes_N` file in the working directory, where N is an integer starting from 0.
														
@@ -50,15 +48,12 @@ Este comando criará um arquivo `codes_N` no diretório de trabalho, onde N é u
 
															 ### 3. Generate speech from semantic tokens:
														
 
															-#### VQGAN Decoder
														
 
															-
														
 
															 !!! warning "Future Warning"
														
 
															     We have kept the interface accessible from the original path (tools/vqgan/inference.py), but it may be removed in a later release, so please update your code as soon as possible.
														
 
															 ```bash
														
 
															 python fish_speech/models/dac/inference.py \
														
 
															-    -i "codes_0.npy" \
														
 
															-    --checkpoint-path "checkpoints/openaudiio-s1-mini/codec.pth"
														
 
															+    -i "codes_0.npy"
														
 
															 ```
														
 
															 ## HTTP API Inference
														
@@ -103,5 +98,3 @@ python -m tools.run_webui
 
															 !!! note
														
 
															     You can use Gradio environment variables such as `GRADIO_SHARE`, `GRADIO_SERVER_PORT`, and `GRADIO_SERVER_NAME` to configure the WebUI.
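
As a sketch of the note above, the WebUI could be launched from Python with those variables set. The helper names `webui_env` and `launch_webui` are hypothetical, and the exact value strings Gradio accepts for `GRADIO_SHARE` are an assumption here; `python -m tools.run_webui` is the launch command shown earlier in this file.

```python
import os
import subprocess
import sys


def webui_env(port=7860, host="127.0.0.1", share=False, base=None):
    """Return a copy of the environment with the Gradio settings applied."""
    env = dict(os.environ if base is None else base)
    env["GRADIO_SERVER_PORT"] = str(port)
    env["GRADIO_SERVER_NAME"] = host
    env["GRADIO_SHARE"] = "True" if share else "False"
    return env


def launch_webui(**settings):
    """Start the WebUI as a child process with the configured environment."""
    command = [sys.executable, "-m", "tools.run_webui"]
    return subprocess.run(command, env=webui_env(**settings), check=True)


# Example (run from the repository root):
# launch_webui(port=8080, host="0.0.0.0")
```

Building the environment as a copy leaves the parent process's own variables untouched.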
														
 
															-
														
 
															-Enjoy!
														
--- a/docs/pt/install.md
+++ b/docs/pt/install.md
@@ -0,0 +1,30 @@
 
															+## Requirements
														
 
															+
														
 
															+- GPU memory: 12GB (inference)
														
 
															+- System: Linux, WSL
														
 
															+
														
 
															+## Setup
														
 
															+
														
 
															+First, install pyaudio and sox, which are used for audio processing.
														
 
															+
														
 
															+```bash
														
 
															+apt install portaudio19-dev libsox-dev ffmpeg
														
 
															+```
														
 
															+
														
 
															+### Conda
														
 
															+
														
 
															+```bash
														
 
															+conda create -n fish-speech python=3.12
														
 
															+conda activate fish-speech
														
 
															+
														
 
															+pip install -e .
														
 
															+```
														
 
															+
														
 
															+### UV
														
 
															+
														
 
															+```bash
														
 
															+uv sync --python 3.12
														
 
															+```
														
 
															+
														
 
															+!!! warning
														
 
															+    The `compile` option is not supported on Windows and macOS. If you want to run with compile, you need to install triton yourself.
														
--- a/docs/zh/index.md
+++ b/docs/zh/index.md
@@ -1,4 +1,14 @@
 
															-# Introduction
														
 
															+# OpenAudio (formerly Fish-Speech)
														
 
															+
														
 
															+<div align="center">
														
 
															+
														
 
															+<div align="center">
														
 
															+
														
 
															+<img src="../assets/openaudio.jpg" alt="OpenAudio" style="display: block; margin: 0 auto; width: 35%;"/>
														
 
															+
														
 
															+</div>
														
 
															+
														
 
															+<strong>Advanced Text-to-Speech Model Series</strong>
														
 
															 <div>
														
 
															 <a target="_blank" href="https://discord.gg/Es5qTB9BcN">
														
@@ -12,39 +22,113 @@
 
															 </a>
														
 
															 </div>
														
 
															-!!! warning
														
 
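															-    We assume no responsibility for any illegal use of the codebase. Please refer to the local laws regarding DMCA (Digital Millennium Copyright Act) and other relevant laws in your area.<br/>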
														
 
															-    This codebase is released under the Apache 2.0 license, and all models are released under the CC-BY-NC-SA-4.0 license.
														
 
															+<strong>Try it now:</strong> <a href="https://fish.audio">Fish Audio Playground</a> | <strong>Learn more:</strong> <a href="https://openaudio.com">OpenAudio Website</a>
														
 
															+
														
 
															+</div>
														
 
															+
														
 
															+---
														
 
															+
														
 
															+!!! warning "Legal Notice"
														
 
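															+    We assume no responsibility for any illegal use of the codebase. Please refer to the local laws regarding DMCA (Digital Millennium Copyright Act) and other relevant laws in your area.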
														
 
															+    
														
 
															+    **License:** This codebase is released under the Apache 2.0 license, and all models are released under the CC-BY-NC-SA-4.0 license.
														
 
															-## System Requirements
														
 
															+## **Introduction**
														
 
															-- GPU memory: 12GB (inference)
														
 
															-- System: Linux, Windows
														
 
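															+We are excited to announce that we have rebranded to **OpenAudio**, launching a new series of advanced text-to-speech models that builds on Fish-Speech with major improvements and new capabilities.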
														
 
															-## Installation
														
 
															+**Openaudio-S1-mini**: Video (to be uploaded); [Hugging Face](https://huggingface.co/fishaudio/openaudio-s1-mini);
														
 
															-First, we need to create a conda environment to install the packages.
														
 
															+**Fish-Speech v1.5**: [Video](https://www.bilibili.com/video/BV1EKiDYBE4o/); [Hugging Face](https://huggingface.co/fishaudio/fish-speech-1.5);
														
 
															-```bash
														
 
															+## **Highlights** ✨
														
 
															-conda create -n fish-speech python=3.12
														
 
															-conda activate fish-speech
														
 
															+### **Emotion Control**
														
 
															+OpenAudio S1 **supports a variety of emotion, tone, and special markers** to enhance speech synthesis:
														
 
															-pip install sudo apt-get install portaudio19-dev # for pyaudio
														
 
															-pip install -e . # This will download all remaining packages.
														
 
															+- **Basic emotions**:
														
 
															+```
														
 
															+(angry) (sad) (excited) (surprised) (satisfied) (delighted)
														
 
															+(scared) (worried) (upset) (nervous) (frustrated) (depressed)
														
 
															+(empathetic) (embarrassed) (disgusted) (moved) (proud) (relaxed)
														
 
															+(grateful) (confident) (interested) (curious) (confused) (joyful)
														
 
															+```
														
 
															+
														
 
															+- **Advanced emotions**:
														
 
															+```
														
 
															+(disdainful) (unhappy) (anxious) (hysterical) (indifferent) 
														
 
															+(impatient) (guilty) (scornful) (panicked) (furious) (reluctant)
														
 
															+(keen) (disapproving) (negative) (denying) (astonished) (serious)
														
 
															+(sarcastic) (conciliative) (comforting) (sincere) (sneering)
														
 
															+(hesitating) (yielding) (painful) (awkward) (amused)
														
 
															+```
														
 
															-apt install libsox-dev ffmpeg # If needed.
														
 
															+- **Tone markers**:
														
 
															 ```
														
 
															+(in a hurry tone) (shouting) (screaming) (whispering) (soft tone)
														
 
															+```
														
 
															+
														
 
															+- **Special sound effects**:
														
 
															+```
														
 
															+(laughing) (chuckling) (sobbing) (crying loudly) (sighing) (panting)
														
 
															+(groaning) (crowd laughing) (background laughter) (audience laughing)
														
 
															+```
														
 
															+
														
 
															+You can also use Ha,ha,ha to control the output, and there are many other usages to explore on your own.
														
 
															+
														
 
															+### **Excellent TTS Quality**
														
 
															+
														
 
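															+We use the Seed TTS Eval metrics to evaluate model performance, and the results show that OpenAudio S1 achieves **0.008 WER** and **0.004 CER** on English text, significantly better than previous models. (English, automatic evaluation, based on OpenAI gpt-4o-transcribe; speaker distance computed with Revai/pyannote-wespeaker-voxceleb-resnet34-LM)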
														
 
															+
														
 
															+| Model | Word Error Rate (WER) | Character Error Rate (CER) | Speaker Distance |
														
 
															+|-------|----------------------|---------------------------|------------------|
														
 
															+| **S1** | **0.008**  | **0.004**  | **0.332** |
														
 
															+| **S1-mini** | **0.011** | **0.005** | **0.380** |
														
 
															+
														
 
															+### **Two Model Types**
														
 
															+
														
 
															+| Model | Size | Availability | Features |
														
 
															+|-------|------|--------------|----------|
														
 
															+| **S1** | 4B parameters | Available on [fish.audio](https://fish.audio) | Full-featured flagship model |
														
 
															+| **S1-mini** | 0.5B parameters | Available on Hugging Face ([HF Space](https://huggingface.co/spaces/fishaudio/openaudio-s1-mini)) | Distilled version with core capabilities |
														
 
															+
														
 
															+Both S1 and S1-mini incorporate online Reinforcement Learning from Human Feedback (RLHF).
														
 
															+
														
 
															+## **Features**
														
 
															+
														
 
															+1. **Zero-shot and few-shot TTS:** Provide a 10 to 30 second voice sample to generate high-quality TTS output. **For detailed guidelines, see [Voice Cloning Best Practices](https://docs.fish.audio/text-to-speech/voice-clone-best-practices).**
														
 
															+
														
 
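															+2. **Multilingual and cross-lingual support:** Simply copy and paste multilingual text into the input box; there is no need to worry about the language. English, Japanese, Korean, Chinese, French, German, Arabic, and Spanish are currently supported.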
														
 
															+
														
 
															+3. **No phoneme dependency:** The model generalizes well and does not rely on phonemes for TTS. It can handle text in any language script.
														
 
															+
														
 
															+4. **Highly accurate:** Achieves a low character error rate (CER) of about 0.4% and a word error rate (WER) of about 0.8% on Seed-TTS Eval.
														
 
															+
														
 
															+5. **Fast:** With fish-tech acceleration, the real-time factor is approximately 1:5 on an Nvidia RTX 4060 laptop and 1:15 on an Nvidia RTX 4090.
														
 
															+
														
 
															+6. **WebUI Inference:** Features an easy-to-use, Gradio-based web UI compatible with Chrome, Firefox, Edge, and other browsers.
														
 
															+
														
 
															+7. **GUI Inference:** Offers a PyQt6 graphical interface that works seamlessly with the API server. Supports Linux, Windows, and macOS. [See GUI](https://github.com/AnyaCoder/fish-speech-gui).
														
 
															+
														
 
															+8. **Deploy-Friendly:** Easily set up an inference server with native support for Linux, Windows, and macOS, minimizing speed loss.
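
The real-time factors quoted in the list above can be turned into rough wall-clock estimates. The sketch below assumes that "1:5" means one second of compute yields about five seconds of audio; the source does not spell out the convention, so treat both that reading and the hypothetical helper `synthesis_seconds` as illustrative only.

```python
def synthesis_seconds(audio_seconds: float, audio_per_compute_second: float) -> float:
    """Estimate compute time for a clip.

    audio_per_compute_second is the right-hand side of the quoted ratio,
    e.g. 5 for "1:5" (assumed: 1 s of compute per 5 s of audio).
    """
    if audio_per_compute_second <= 0:
        raise ValueError("audio_per_compute_second must be positive")
    return audio_seconds / audio_per_compute_second


# One minute of speech under the two quoted ratios (assumed interpretation).
laptop_estimate = synthesis_seconds(60, 5)    # RTX 4060 laptop figure
desktop_estimate = synthesis_seconds(60, 15)  # RTX 4090 figure
```

Under that assumption, a one-minute clip would take about 12 seconds at 1:5 and about 4 seconds at 1:15.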
														
 
															+
														
 
															+## **Disclaimer**
														
 
															+
														
 
															+We assume no responsibility for any illegal use of the codebase. Please refer to your local laws regarding the DMCA and other relevant laws.
														
 
															+
														
 
															+## **Media & Demos**
														
 
															+
														
 
															+#### 🚧 Coming Soon
														
 
															+Video demos and tutorials are currently in development.
														
 
															+
														
 
															+## **Documentation**
														
 
															-!!! warning
														
 
															-    The `compile` option is not supported on Windows and macOS. If you want to run with compile, you need to install trition yourself.
														
 
															+### Quick Start
														
 
															+- [Build Environment](install.md) - Set up your development environment
														
 
															+- [Inference Guide](inference.md) - Run the model and generate speech
														
 
															-## Acknowledgements
														
 
															+## **Community & Support**
														
 
															-- [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
														
 
															-- [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)
														
 
															-- [GPT VITS](https://github.com/innnky/gpt-vits)
														
 
															-- [MQTTS](https://github.com/b04901014/MQTTS)
														
 
															-- [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
														
 
															-- [Transformers](https://github.com/huggingface/transformers)
														
 
															-- [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)
														
 
															+- **Discord:** Join our [Discord community](https://discord.gg/Es5qTB9BcN)
														
 
															+- **Website:** Visit [OpenAudio.com](https://openaudio.com) for the latest updates
														
 
															+- **Try Online:** [Fish Audio Playground](https://fish.audio)
														
--- a/docs/zh/inference.md
+++ b/docs/zh/inference.md
@@ -1,6 +1,6 @@
 
															 # Inference
														
 
															-Because the vocoder model has changed, you need more VRAM than before; 12GB of VRAM is recommended for smooth inference.
														
 
															+Because the vocoder model has changed, you need more VRAM than before; 12GB is recommended for smooth inference.
														
 
															 We support command line, HTTP API, and WebUI inference; you can choose whichever method you prefer.
														
@@ -17,7 +17,7 @@ huggingface-cli download fishaudio/openaudio-s1-mini --local-dir checkpoints/ope
 
															 !!! note
														
 
															     If you plan to let the model choose a voice timbre at random, you can skip this step.
														
 
															-### 1. Get VQ tokens from reference audio
														
 
															+### 1. Get VQ tokens from reference audio
														
 
															 ```bash
														
 
															 python fish_speech/models/dac/inference.py \
														
@@ -27,38 +27,33 @@ python fish_speech/models/dac/inference.py \
 
															 You should get a `fake.npy` and a `fake.wav`.
														
 
															-### 2. Generate semantic tokens from text:
														
 
															+### 2. Generate semantic tokens from text:
														
 
															 ```bash
														
 
															 python fish_speech/models/text2semantic/inference.py \
														
 
															     --text "The text you want to convert" \
														
 
															     --prompt-text "Your reference text" \
														
 
															     --prompt-tokens "fake.npy" \
														
 
															-    --checkpoint-path "checkpoints/openaudio-s1-mini" \
														
 
															-    --num-samples 2 \
														
 
															-    --compile # if you want faster speed
														
 
															+    --compile
														
 
															 ```
														
 
															-This command will create a `codes_N` file in the working directory, where N is an integer starting from 0.
														
 
															+This command will create a `codes_N` file in the working directory, where N is an integer starting from 0.
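
When several samples are generated, it can help to collect the `codes_N` outputs in numeric order before decoding them. The sketch below assumes the files carry a `.npy` extension, as the `codes_0.npy` argument in step 3 suggests; the helper name `list_code_files` is hypothetical and not part of the repository.

```python
import re
from pathlib import Path

# Assumed naming scheme: codes_0.npy, codes_1.npy, ...
_CODES_RE = re.compile(r"^codes_(\d+)\.npy$")


def list_code_files(directory="."):
    """Return the codes_N.npy files in `directory`, sorted by N numerically."""
    found = []
    for path in Path(directory).iterdir():
        match = _CODES_RE.match(path.name)
        if match and path.is_file():
            found.append((int(match.group(1)), path))
    # Sort on the integer index so codes_10 comes after codes_2.
    return [path for _, path in sorted(found, key=lambda item: item[0])]
```

Sorting on the parsed integer rather than the file name avoids the lexicographic trap where `codes_10.npy` would sort before `codes_2.npy`.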
														
 
															 !!! note
														
 
															-    You may want to use `--compile` to fuse CUDA kernels for faster inference (about 30 tokens/second -> about 500 tokens/second).
														
 
															-    Correspondingly, if you do not plan to use acceleration, you can delete the comment on the `--compile` parameter.
														
 
															+    You may want to use `--compile` to fuse CUDA kernels for faster inference (~30 tokens/second -> ~500 tokens/second).
														
 
															+    Correspondingly, if you do not plan to use acceleration, you can comment out the `--compile` parameter.
														
 
															 !!! info
														
 
															-    For GPUs that do not support bf16, you may need to use the `--half` parameter.
														
 
															+    For GPUs that do not support bf16, you may need to use the `--half` parameter.
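
If you drive step 2 from a script, the optional `--compile` and `--half` flags can be toggled when the argument list is built. This is a minimal sketch that uses only the script path and flags shown in this section; the helper `build_text2semantic_cmd` and its parameter names are hypothetical, and it assumes the command is run from the repository root.

```python
import sys


def build_text2semantic_cmd(text, prompt_text, prompt_tokens,
                            use_compile=False, use_half=False):
    """Assemble the step-2 command line as an argument list (illustrative only)."""
    cmd = [
        sys.executable,
        "fish_speech/models/text2semantic/inference.py",
        "--text", text,
        "--prompt-text", prompt_text,
        "--prompt-tokens", prompt_tokens,
    ]
    if use_compile:
        cmd.append("--compile")  # fuse CUDA kernels for faster inference
    if use_half:
        cmd.append("--half")     # for GPUs without bf16 support
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; building a list instead of a shell string avoids quoting problems when the text contains spaces or quotes.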
														
 
															-### 3. Generate vocals from semantic tokens:
														
 
															-
														
 
															-#### VQGAN Decoder
														
 
															+### 3. Generate speech from semantic tokens:
														
 
															 !!! warning "Future Warning"
														
 
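															-    We have kept the interface accessible from the original path (tools/vqgan/inference.py), but this interface may be removed in a later release, so please update your code as soon as possible.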
														
 
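															+    We have kept the ability to access the interface from the original path (tools/vqgan/inference.py), but this interface may be removed in a later release, so please update your code as soon as possible.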
														
 
															 ```bash
														
 
															 python fish_speech/models/dac/inference.py \
														
 
															     -i "codes_0.npy" \
														
 
															-    --checkpoint-path "checkpoints/openaudiio-s1-mini/codec.pth"
														
 
															 ```
														
 
															 ## HTTP API Inference
														
--- a/docs/zh/install.md
+++ b/docs/zh/install.md
@@ -0,0 +1,30 @@
 
															+## Requirements
														
 
															+
														
 
															+- GPU Memory: 12GB (Inference)
														
 
															+- System: Linux, WSL
														
 
															+
														
 
															+## Installation
														
 
															+
														
 
															+First, install pyaudio and sox, which are used for audio processing.
														
 
															+
														
 
															+``` bash
														
 
															+apt install portaudio19-dev libsox-dev ffmpeg
														
 
															+```
														
 
															+
														
 
															+### Conda
														
 
															+
														
 
															+```bash
														
 
															+conda create -n fish-speech python=3.12
														
 
															+conda activate fish-speech
														
 
															+
														
 
															+pip install -e .
														
 
															+```
														
 
															+
														
 
															+### UV
														
 
															+
														
 
															+```bash
														
 
															+uv sync --python 3.12
														
 
															+```
														
 
															+
														
 
															+!!! warning
														
 
															+    The `compile` option is not supported on Windows and macOS. If you want to run with compile, you need to install triton yourself.
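
A launcher script can apply the warning above by deciding whether to pass `--compile` based on the host platform. The sketch below relies on the standard-library `platform.system()` call. Treating Linux (including WSL, which reports itself as Linux) as the only platform where the flag works out of the box is an inference from the warning and the listed system requirements, not a documented guarantee, and `compile_supported` is a hypothetical helper.

```python
import platform


def compile_supported(system=None):
    """Return True when --compile is expected to work without extra setup.

    `system` defaults to platform.system(), which returns "Linux",
    "Windows", or "Darwin" (macOS) on the common platforms.
    """
    system = system or platform.system()
    # Assumption: only Linux works out of the box; Windows and macOS
    # users would need to install triton themselves.
    return system == "Linux"
```

A caller could then append `--compile` to the inference command only when `compile_supported()` returns `True`.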
														
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -1,4 +1,4 @@
 
															-site_name: Fish Speech
														
 
															+site_name: OpenAudio
														
 
															 site_description: Targeting SOTA TTS solutions.
														
 
															 site_url: https://speech.fish.audio
														
@@ -12,7 +12,7 @@ copyright: Copyright &copy; 2023-2025 by Fish Audio
 
															 theme:
														
 
															   name: material
														
 
															-  favicon: assets/figs/logo-circle.png
														
 
															+  favicon: assets/openaudio.png
														
 
															   language: en
														
 
															   features:
														
 
															     - content.action.edit
														
@@ -25,8 +25,7 @@ theme:
 
															     - search.highlight
														
 
															     - search.share
														
 
															     - content.code.copy
														
 
															-  icon:
														
 
															-    logo: fontawesome/solid/fish
														
 
															+  logo: assets/openaudio.png
														
 
															   palette:
														
 
															     # Palette toggle for automatic mode
														
@@ -56,7 +55,8 @@ theme:
 
															         code: Roboto Mono
														
 
															 nav:
														
 
															-  - Installation: en/index.md
														
 
															+  - Introduction: en/index.md
														
 
															+  - Installation: en/install.md
														
 
															   - Inference: en/inference.md
														
 
															 # Plugins
														
@@ -80,25 +80,29 @@ plugins:
 
															           name: 简体中文
														
 
															           build: true
														
 
															           nav:
														
 
															-            - 安装: zh/index.md
														
 
															+            - 介绍: zh/index.md
														
 
															+            - 安装: zh/install.md
														
 
															             - 推理: zh/inference.md
														
 
															         - locale: ja
														
 
															           name: 日本語
														
 
															           build: true
														
 
															           nav:
														
 
															-            - インストール: ja/index.md
														
 
															+            - はじめに: ja/index.md
														
 
															+            - インストール: ja/install.md
														
 
															             - 推論: ja/inference.md
														
 
															         - locale: pt
														
 
															           name: Português (Brasil)
														
 
															           build: true
														
 
															           nav:
														
 
															-            - Instalação: pt/index.md
														
 
															+            - Introdução: pt/index.md
														
 
															+            - Instalação: pt/install.md
														
 
															             - Inferência: pt/inference.md
														
 
															         - locale: ko
														
 
															           name: 한국어
														
 
															           build: true
														
 
															           nav:
														
 
															-            - 설치: ko/index.md
														
 
															+            - 소개: ko/index.md
														
 
															+            - 설치: ko/install.md
														
 
															             - 추론: ko/inference.md
														
 
															 markdown_extensions: