
Update README.md for OpenAudio-S1 (#998)

* [feature]add dataset class

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [dev]combine agent and tts infer

* [feature]:update inference

* [feature]:update uv.lock

* [Merge]:merge upstream/main

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [fix]:remove unused files

* [fix]:remove unused files

* [fix]:remove unused files

* [fix]:fix infer bugs

* [docs]:update introduction and optimize front appearance

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [docs]:update README for OpenAudio-S1

* [docs]:update docs

* [docs]:Update video

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Whale and Dolphin committed 10 months ago
parent
commit
9efa2087bd
12 changed files with 922 additions and 366 deletions
  1. 121 38
      README.md
  2. 123 40
      docs/README.ja.md
  3. 123 40
      docs/README.ko.md
  4. 122 39
      docs/README.pt-BR.md
  5. 120 37
      docs/README.zh.md
  6. BIN
      docs/assets/Elo.jpg
  7. BIN
      docs/assets/Thumbnail.jpg
  8. 50 24
      docs/en/index.md
  9. 63 37
      docs/ja/index.md
  10. 73 36
      docs/ko/index.md
  11. 63 37
      docs/pt/index.md
  12. 64 38
      docs/zh/index.md

+ 121 - 38
README.md

@@ -26,76 +26,159 @@
     <a target="_blank" href="https://hub.docker.com/r/fishaudio/fish-speech">
         <img alt="Docker" src="https://img.shields.io/docker/pulls/fishaudio/fish-speech?style=flat-square&logo=docker"/>
     </a>
+    <a target="_blank" href="https://pd.qq.com/s/bwxia254o">
+      <img alt="QQ Channel" src="https://img.shields.io/badge/QQ-blue?logo=tencentqq">
+    </a>
+</div>
+
+<div align="center">
+    <a target="_blank" href="https://huggingface.co/spaces/TTS-AGI/TTS-Arena-V2">
+      <img alt="TTS-Arena2 Score" src="https://img.shields.io/badge/TTS_Arena2-Rank_%231-gold?style=flat-square&logo=trophy&logoColor=white">
+    </a>
     <a target="_blank" href="https://huggingface.co/spaces/fishaudio/fish-speech-1">
         <img alt="Huggingface" src="https://img.shields.io/badge/🤗%20-space%20demo-yellow"/>
     </a>
-    <a target="_blank" href="https://pd.qq.com/s/bwxia254o">
-      <img alt="QQ Channel" src="https://img.shields.io/badge/QQ-blue?logo=tencentqq">
+    <a target="_blank" href="https://huggingface.co/fishaudio/openaudio-s1-mini">
+        <img alt="HuggingFace Model" src="https://img.shields.io/badge/🤗%20-models-orange"/>
     </a>
 </div>
 
-This codebase is released under Apache License and all model weights are released under CC-BY-NC-SA-4.0 License. Please refer to [LICENSE](LICENSE) for more details.
+> [!IMPORTANT]
+> **License Notice**  
+> This codebase is released under **Apache License** and all model weights are released under **CC-BY-NC-SA-4.0 License**. Please refer to [LICENSE](LICENSE) for more details.
+
+> [!WARNING]
+> **Legal Disclaimer**  
+> We do not hold any responsibility for any illegal usage of the codebase. Please refer to your local laws about DMCA and other related laws.
+
+---
+
+## 🎉 Announcement
+
+We are excited to announce that we have rebranded to **OpenAudio** — introducing a revolutionary new series of advanced Text-to-Speech models that builds upon the foundation of Fish-Speech.
+
+We are proud to release **OpenAudio-S1** as the first model in this series, delivering significant improvements in quality, performance, and capabilities.
+
+OpenAudio-S1 comes in two versions: **OpenAudio-S1** and **OpenAudio-S1-mini**. Both models are now available on [Fish Audio Playground](https://fish.audio) (for **OpenAudio-S1**) and [Hugging Face](https://huggingface.co/fishaudio/openaudio-s1-mini) (for **OpenAudio-S1-mini**).
+
+Visit the [OpenAudio website](https://openaudio.com/blogs/s1) for the blog and tech report.
+
+## Highlights ✨
+
+### **Excellent TTS quality**
+
+We use Seed TTS Eval Metrics to evaluate the model performance, and the results show that OpenAudio S1 achieves **0.008 WER** and **0.004 CER** on English text, which is significantly better than previous models. (English, auto eval, based on OpenAI gpt-4o-transcribe, speaker distance using Revai/pyannote-wespeaker-voxceleb-resnet34-LM)
+
+| Model | Word Error Rate (WER) | Character Error Rate (CER) | Speaker Distance |
+|-------|----------------------|---------------------------|------------------|
+| **S1** | **0.008**  | **0.004**  | **0.332** |
+| **S1-mini** | **0.011** | **0.005** | **0.380** |
+
+### **Best Model in TTS-Arena2** 🏆
+
+OpenAudio S1 has achieved the **#1 ranking** on [TTS-Arena2](https://arena.speechcolab.org/), the benchmark for text-to-speech evaluation:
+
+<div align="center">
+    <img src="docs/assets/Elo.jpg" alt="TTS-Arena2 Ranking" style="width: 75%;" />
+</div>
+
+### **Speech Control**
+
+OpenAudio S1 **supports a variety of emotional, tone, and special markers** to enhance speech synthesis:
+
+- **Basic emotions**:
+```
+(angry) (sad) (excited) (surprised) (satisfied) (delighted) 
+(scared) (worried) (upset) (nervous) (frustrated) (depressed)
+(empathetic) (embarrassed) (disgusted) (moved) (proud) (relaxed)
+(grateful) (confident) (interested) (curious) (confused) (joyful)
+```
+
+- **Advanced emotions**:
+```
+(disdainful) (unhappy) (anxious) (hysterical) (indifferent) 
+(impatient) (guilty) (scornful) (panicked) (furious) (reluctant)
+(keen) (disapproving) (negative) (denying) (astonished) (serious)
+(sarcastic) (conciliative) (comforting) (sincere) (sneering)
+(hesitating) (yielding) (painful) (awkward) (amused)
+```
+
+- **Tone markers**:
+```
+(in a hurry tone) (shouting) (screaming) (whispering) (soft tone)
+```
+
+- **Special audio effects**:
+```
+(laughing) (chuckling) (sobbing) (crying loudly) (sighing) (panting)
+(groaning) (crowd laughing) (background laughter) (audience laughing)
+```
+
+You can also use Ha,ha,ha for control; there are many other cases waiting for you to explore.
+
+(English, Chinese, and Japanese are supported now, and more languages are coming soon!)
 
-We are excited to announce that we have changed our name into OpenAudio, this will be a brand new series of Text-to-Speech model.
+### **Two Types of Models**
 
-Demo available at [Fish Audio Playground](https://fish.audio).
+| Model | Size | Availability | Features |
+|-------|------|--------------|----------|
+| **S1** | 4B parameters | Available on [fish.audio](https://fish.audio) | Full-featured flagship model |
+| **S1-mini** | 0.5B parameters | Available on Hugging Face: [hf space](https://huggingface.co/spaces/fishaudio/openaudio-s1-mini) | Distilled version with core capabilities |
 
-Visit the [OpenAudio website](https://openaudio.com) for blog & tech report.
+Both S1 and S1-mini incorporate online Reinforcement Learning from Human Feedback (RLHF).
 
-## Features
-### OpenAudio-S1 (Fish-Speech's new version)
+## **Features**
 
-1. This model has **ALL FEATURES** that fish-speech had.
+1. **Zero-shot & Few-shot TTS:** Input a 10 to 30-second vocal sample to generate high-quality TTS output. **For detailed guidelines, see [Voice Cloning Best Practices](https://docs.fish.audio/text-to-speech/voice-clone-best-practices).**
 
-2. OpenAudio S1 supports a variety of emotional, tone, and special markers to enhance speech synthesis:
-   
-   (angry) (sad) (disdainful) (excited) (surprised) (satisfied) (unhappy) (anxious) (hysterical) (delighted) (scared) (worried) (indifferent) (upset) (impatient) (nervous) (guilty) (scornful) (frustrated) (depressed) (panicked) (furious) (empathetic) (embarrassed) (reluctant) (disgusted) (keen) (moved) (proud) (relaxed) (grateful) (confident) (interested) (curious) (confused) (joyful) (disapproving) (negative) (denying) (astonished) (serious) (sarcastic) (conciliative) (comforting) (sincere) (sneering) (hesitating) (yielding) (painful) (awkward) (amused)
+2. **Multilingual & Cross-lingual Support:** Simply copy and paste multilingual text into the input box—no need to worry about the language. Currently supports English, Japanese, Korean, Chinese, French, German, Arabic, and Spanish.
 
-   Also supports tone marker:
+3. **No Phoneme Dependency:** The model has strong generalization capabilities and does not rely on phonemes for TTS. It can handle text in any language script.
 
-   (in a hurry tone) (shouting) (screaming) (whispering) (soft tone)
+4. **Highly Accurate:** Achieves a low CER (Character Error Rate) of around 0.4% and WER (Word Error Rate) of around 0.8% for Seed-TTS Eval.
 
-    There's a few special markers that are supported:
+5. **Fast:** With fish-tech acceleration, the real-time factor is approximately 1:5 on an Nvidia RTX 4060 laptop and 1:15 on an Nvidia RTX 4090.
 
-    (laughing) (chuckling) (sobbing) (crying loudly) (sighing) (panting) (groaning) (crowd laughing) (background laughter) (audience laughing)
+6. **WebUI Inference:** Features an easy-to-use, Gradio-based web UI compatible with Chrome, Firefox, Edge, and other browsers.
 
-    You can also use **Ha,ha,ha** to control, there's many other cases waiting to be explored by yourself.
+7. **GUI Inference:** Offers a PyQt6 graphical interface that works seamlessly with the API server. Supports Linux, Windows, and macOS. [See GUI](https://github.com/AnyaCoder/fish-speech-gui).
 
-3. The OpenAudio S1 includes the following sizes:
--   **S1 (4B, proprietary):** The full-sized model.
--   **S1-mini (0.5B, open-sourced):** A distilled version of S1.
+8. **Deploy-Friendly:** Easily set up an inference server with native support for Linux and Windows (macOS coming soon), minimizing speed loss.
 
-    Both S1 and S1-mini incorporate online Reinforcement Learning from Human Feedback (RLHF).
+## **Media & Demos**
 
-4. Evaluations
+<div align="center">
 
-    **Seed TTS Eval Metrics (English, auto eval, based on OpenAI gpt-4o-transcribe, speaker distance using Revai/pyannote-wespeaker-voxceleb-resnet34-LM):**
+### **Social Media**
+<a href="https://x.com/FishAudio/status/1929915992299450398" target="_blank">
+    <img src="https://img.shields.io/badge/𝕏-Latest_Demo-black?style=for-the-badge&logo=x&logoColor=white" alt="Latest Demo on X" />
+</a>
 
-    -   **S1:**
-        -   WER (Word Error Rate): **0.008**
-        -   CER (Character Error Rate): **0.004**
-        -   Distance: **0.332**
-    -   **S1-mini:**
-        -   WER (Word Error Rate): **0.011**
-        -   CER (Character Error Rate): **0.005**
-        -   Distance: **0.380**
-    
+### **Interactive Demos**
+<a href="https://fish.audio" target="_blank">
+    <img src="https://img.shields.io/badge/Fish_Audio-Try_OpenAudio_S1-blue?style=for-the-badge" alt="Try OpenAudio S1" />
+</a>
+<a href="https://huggingface.co/spaces/fishaudio/openaudio-s1-mini" target="_blank">
+    <img src="https://img.shields.io/badge/Hugging_Face-Try_S1_Mini-yellow?style=for-the-badge" alt="Try S1 Mini" />
+</a>
 
-## Disclaimer
+### **Video Showcases**
+<iframe width="560" height="315" src="https://www.youtube.com/embed/SYuPvd7m06A" title="OpenAudio S1 Video" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
 
-We do not hold any responsibility for any illegal usage of the codebase. Please refer to your local laws about DMCA and other related laws.
+### **Audio Samples**
+<div style="margin: 20px 0;">
+    <em> High-quality audio samples will be available soon, demonstrating our multilingual TTS capabilities across different languages and emotions.</em>
+</div>
 
-## Videos
+</div>
 
-#### To be continued.
+---
 
 ## Documents
 
 - [Build Environment](docs/en/install.md)
 - [Inference](docs/en/inference.md)
 
-It should be noted that the current model **DOESN'T SUPPORT FINETUNE**.
-
 ## Credits
 
 - [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
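The **Speech Control** markers listed in the README above are written inline as parenthesized tags in the input text. As a minimal illustration, the following Python sketch flags tags that are not in a known set before text is sent for synthesis. It assumes markers are lowercase `(tag)` substrings and uses only a subset of the documented tags; `KNOWN_MARKERS` and `find_unknown_markers` are hypothetical helpers, not part of the fish-speech codebase, and the model itself may handle unlisted tags differently.

```python
import re

# Hypothetical: a SUBSET of the markers documented in the README,
# assumed to be matched by exact lowercase spelling.
KNOWN_MARKERS = {
    "angry", "sad", "excited", "surprised", "whispering", "shouting",
    "soft tone", "in a hurry tone", "laughing", "chuckling", "sighing",
}

# Assumption: a marker is any lowercase run of letters/spaces wrapped in parentheses.
MARKER_RE = re.compile(r"\(([a-z][a-z ]*)\)")


def find_unknown_markers(text: str) -> list[str]:
    """Return parenthesized tags in `text` that are not in KNOWN_MARKERS, in order of appearance."""
    return [tag for tag in MARKER_RE.findall(text) if tag not in KNOWN_MARKERS]


if __name__ == "__main__":
    sample = "(excited) We did it! (laughing) Ha,ha,ha. (whispering) Don't tell anyone. (thrilled) Really."
    print(find_unknown_markers(sample))  # ['thrilled']
```

A check like this only catches spelling slips against a chosen list; it says nothing about how the model interprets a tag.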
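The WER and CER figures in the README's evaluation table are edit-distance error rates: (substitutions + deletions + insertions) divided by the number of reference words or characters. The sketch below is a generic textbook implementation for illustration only; it is not the Seed-TTS Eval scoring script, which may normalize text (casing, punctuation) before scoring.

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance between two token sequences, with unit costs."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix to each hyp prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of r
                cur[j - 1] + 1,          # insertion of h
                prev[j - 1] + (r != h),  # substitution (cost 0 on a match)
            )
        prev = cur
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate over whitespace-split tokens (assumes a non-empty reference)."""
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hypothesis.split()) / len(ref)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate, counting every character including spaces."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    print(round(wer("the cat sat on the mat", "the cat sat on a mat"), 4))  # 0.1667
```

At that scale, a WER of 0.008 corresponds to roughly 8 word errors per 1,000 reference words.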

+ 123 - 40
docs/README.ja.md

@@ -26,75 +26,157 @@
     <a target="_blank" href="https://hub.docker.com/r/fishaudio/fish-speech">
         <img alt="Docker" src="https://img.shields.io/docker/pulls/fishaudio/fish-speech?style=flat-square&logo=docker"/>
     </a>
+    <a target="_blank" href="https://pd.qq.com/s/bwxia254o">
+      <img alt="QQ Channel" src="https://img.shields.io/badge/QQ-blue?logo=tencentqq">
+    </a>
+</div>
+
+<div align="center">
+    <a target="_blank" href="https://huggingface.co/spaces/TTS-AGI/TTS-Arena-V2">
+      <img alt="TTS-Arena2 Score" src="https://img.shields.io/badge/TTS_Arena2-Rank_%231-gold?style=flat-square&logo=trophy&logoColor=white">
+    </a>
     <a target="_blank" href="https://huggingface.co/spaces/fishaudio/fish-speech-1">
         <img alt="Huggingface" src="https://img.shields.io/badge/🤗%20-space%20demo-yellow"/>
     </a>
-    <a target="_blank" href="https://pd.qq.com/s/bwxia254o">
-      <img alt="QQ Channel" src="https://img.shields.io/badge/QQ-blue?logo=tencentqq">
+    <a target="_blank" href="https://huggingface.co/fishaudio/openaudio-s1-mini">
+        <img alt="HuggingFace Model" src="https://img.shields.io/badge/🤗%20-models-orange"/>
     </a>
 </div>
 
-This codebase is released under the Apache License, and all model weights are released under the CC-BY-NC-SA-4.0 License. Please refer to [LICENSE](../LICENSE) for more details.
+> [!IMPORTANT]
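+> **License Notice**  
+> This codebase is released under the **Apache License**, and all model weights are released under the **CC-BY-NC-SA-4.0 License**. Please refer to [LICENSE](../LICENSE) for more details.
+
+> [!WARNING]
+> **Legal Disclaimer**  
+> We accept no responsibility for any illegal use of the codebase. Please refer to your local laws regarding the DMCA and other related laws.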
+
+---
+
+## 🎉 Announcement
 
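-We are pleased to announce that we have changed our name to OpenAudio. This will be an entirely new series of Text-to-Speech models.
+We are pleased to announce our rebrand to **OpenAudio**, introducing an innovative new series of advanced Text-to-Speech models built on the foundation of Fish-Speech.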
 
-A demo is available at [Fish Audio Playground](https://fish.audio).
+We are proud to release **OpenAudio-S1** as the first model in this series, delivering significant improvements in quality, performance, and capabilities.
 
-Visit the [OpenAudio website](https://openaudio.com) for the blog and tech report.
+OpenAudio-S1 comes in two versions: **OpenAudio-S1** and **OpenAudio-S1-mini**. Both models are available on [Fish Audio Playground](https://fish.audio) (for **OpenAudio-S1**) and [Hugging Face](https://huggingface.co/fishaudio/openaudio-s1-mini) (for **OpenAudio-S1-mini**).
 
-## Features
-### OpenAudio-S1 (New version of Fish-Speech)
+Visit the [OpenAudio website](https://openaudio.com/blogs/s1) for the blog and tech report.
 
-1. This model has **all the features** that fish-speech had.
+## Highlights ✨
 
-2. OpenAudio S1 supports a variety of emotional, tone, and special markers to enhance speech synthesis:
-   
-      (angry) (sad) (disdainful) (excited) (surprised) (satisfied) (unhappy) (anxious) (hysterical) (delighted) (scared) (worried) (indifferent) (upset) (impatient) (nervous) (guilty) (scornful) (frustrated) (depressed) (panicked) (furious) (empathetic) (embarrassed) (reluctant) (disgusted) (keen) (moved) (proud) (relaxed) (grateful) (confident) (interested) (curious) (confused) (joyful) (disapproving) (negative) (denying) (astonished) (serious) (sarcastic) (conciliative) (comforting) (sincere) (sneering) (hesitating) (yielding) (painful) (awkward) (amused)
+### **Excellent TTS Quality**
 
-   It also supports tone markers:
+Evaluated with Seed TTS Eval Metrics, OpenAudio S1 achieves **0.008 WER** and **0.004 CER** on English text, significantly better than previous models. (English, auto eval, based on OpenAI gpt-4o-transcribe, speaker distance using Revai/pyannote-wespeaker-voxceleb-resnet34-LM)
 
-   (in a hurry tone) (shouting) (screaming) (whispering) (soft tone)
+| Model | Word Error Rate (WER) | Character Error Rate (CER) | Speaker Distance |
+|-------|------------------|------------------|----------|
+| **S1** | **0.008** | **0.004** | **0.332** |
+| **S1-mini** | **0.011** | **0.005** | **0.380** |
 
-    There are a few special markers that are supported:
+### **Best Model in TTS-Arena2** 🏆
 
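-    (laughing) (chuckling) (sobbing) (crying loudly) (sighing) (panting) (groaning) (crowd laughing) (background laughter) (audience laughing)
+OpenAudio S1 has achieved the **#1 ranking** on [TTS-Arena2](https://arena.speechcolab.org/), the benchmark for text-to-speech evaluation: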
 
-    You can also use **Ha, ha, ha** for control, and there are many other cases waiting for you to explore.
+<div align="center">
+    <img src="assets/Elo.jpg" alt="TTS-Arena2 Ranking" style="width: 75%;" />
+</div>
+
+### **Speech Control**
+OpenAudio S1 **supports a variety of emotional, tone, and special markers** to enhance speech synthesis:
 
-3. OpenAudio S1 includes the following sizes:
--   **S1 (4B, proprietary):** The full-size model.
--   **S1-mini (0.5B, open source):** A distilled version of S1.
+- **Basic emotions**:
+```
+(angry) (sad) (excited) (surprised) (satisfied) (delighted)
+(scared) (worried) (upset) (nervous) (frustrated) (depressed)
+(empathetic) (embarrassed) (disgusted) (moved) (proud) (relaxed)
+(grateful) (confident) (interested) (curious) (confused) (joyful)
+```
 
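-    Both S1 and S1-mini incorporate online Reinforcement Learning from Human Feedback (RLHF).
+- **Advanced emotions**:
+```
+(disdainful) (unhappy) (anxious) (hysterical) (indifferent)
+(impatient) (guilty) (scornful) (panicked) (furious) (reluctant)
+(keen) (disapproving) (negative) (denying) (astonished) (serious)
+(sarcastic) (conciliative) (comforting) (sincere) (sneering)
+(hesitating) (yielding) (painful) (awkward) (amused)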
+```
 
-4. Evaluation
+- **Tone markers**:
+```
+(in a hurry tone) (shouting) (screaming) (whispering) (soft tone)
+```
 
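-    **Seed TTS evaluation metrics (English, auto eval, based on OpenAI gpt-4o-transcribe, speaker distance using Revai/pyannote-wespeaker-voxceleb-resnet34-LM):**
+- **Special audio effects**:
+```
+(laughing) (chuckling) (sobbing) (crying loudly) (sighing) (panting)
+(groaning) (crowd laughing) (background laughter) (audience laughing)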
+```
 
-    -   **S1:**
-        -   WER (Word Error Rate): **0.008**
-        -   CER (Character Error Rate): **0.004**
-        -   Distance: **0.332**
-    -   **S1-mini:**
-        -   WER (Word Error Rate): **0.011**
-        -   CER (Character Error Rate): **0.005**
-        -   Distance: **0.380**
-    
+You can also use "Ha, ha, ha" for control, and there are many other cases for you to explore yourself.
 
-## Disclaimer
+(English, Chinese, and Japanese are currently supported, and more languages are coming soon!)
 
-We accept no responsibility for any illegal use of the codebase. Please refer to your local laws regarding the DMCA and other related laws.
+### **Two Types of Models**
 
-## Videos
+| Model | Size | Availability | Features |
+|-------|--------|------------|------|
+| **S1** | 4B parameters | Available on [fish.audio](https://fish.audio) | Full-featured flagship model |
+| **S1-mini** | 0.5B parameters | Available on Hugging Face: [hf space](https://huggingface.co/spaces/fishaudio/openaudio-s1-mini) | Distilled version with core capabilities |
 
-#### To be continued.
+Both S1 and S1-mini incorporate online Reinforcement Learning from Human Feedback (RLHF).
 
-## Documentation
+## **Features**
 
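-- [Environment Setup](en/install.md)
-- [Inference](en/inference.md)
+1. **Zero-shot & Few-shot TTS:** Input a 10- to 30-second voice sample to generate high-quality TTS output. **For detailed guidelines, see [Voice Cloning Best Practices](https://docs.fish.audio/text-to-speech/voice-clone-best-practices).**
+
+2. **Multilingual & Cross-lingual Support:** Simply copy and paste multilingual text into the input box, with no need to worry about the language. Currently supports English, Japanese, Korean, Chinese, French, German, Arabic, and Spanish.
+
+3. **No Phoneme Dependency:** The model has strong generalization capabilities and does not rely on phonemes for TTS. It can handle text in any writing system.
+
+4. **Highly Accurate:** Achieves a low CER (Character Error Rate) of around 0.4% and a WER (Word Error Rate) of around 0.8% on Seed-TTS Eval.
+
+5. **Fast:** With fish-tech acceleration, achieves a real-time factor of approximately 1:5 on an Nvidia RTX 4060 laptop and approximately 1:15 on an Nvidia RTX 4090.
+
+6. **WebUI Inference:** Provides an easy-to-use, Gradio-based web UI compatible with Chrome, Firefox, Edge, and other browsers.
+
+7. **GUI Inference:** Provides a PyQt6 graphical interface that works seamlessly with the API server. Supports Linux, Windows, and macOS. [See GUI](https://github.com/AnyaCoder/fish-speech-gui).
+
+8. **Deploy-Friendly:** Easily set up an inference server with native support for Linux and Windows (macOS coming soon), minimizing speed loss.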
+
+## **Media & Demos**
+
+<div align="center">
+
+### **Social Media**
+<a href="https://x.com/FishAudio/status/1929915992299450398" target="_blank">
+    <img src="https://img.shields.io/badge/𝕏-Latest_Demo-black?style=for-the-badge&logo=x&logoColor=white" alt="Latest Demo on X" />
+</a>
+
+### **Interactive Demos**
+<a href="https://fish.audio" target="_blank">
+    <img src="https://img.shields.io/badge/Fish_Audio-Try_OpenAudio_S1-blue?style=for-the-badge" alt="Try OpenAudio S1" />
+</a>
+<a href="https://huggingface.co/spaces/fishaudio/openaudio-s1-mini" target="_blank">
+    <img src="https://img.shields.io/badge/Hugging_Face-Try_S1_Mini-yellow?style=for-the-badge" alt="Try S1 Mini" />
+</a>
+
+### **Video Showcase**
+<iframe width="560" height="315" src="https://www.youtube.com/embed/SYuPvd7m06A" title="OpenAudio S1 Video" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
+
+### **Audio Samples**
+<div style="margin: 20px 0;">
+    <em>High-quality audio samples will be available soon, demonstrating our multilingual TTS capabilities across different languages and emotions.</em>
+</div>
+
+</div>
+
+---
+
+## Documentation
 
-Note that the current model **does not support fine-tuning**.
+- [Environment Setup](ja/install.md)
+- [Inference](ja/inference.md)
 
 ## Credits
 
@@ -104,6 +186,7 @@
 - [MQTTS](https://github.com/b04901014/MQTTS)
 - [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
 - [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)
+- [Qwen3](https://github.com/QwenLM/Qwen3)
 
 ## Technical Report (V1.4)
 ```bibtex

+ 123 - 40
docs/README.ko.md

@@ -26,75 +26,157 @@
     <a target="_blank" href="https://hub.docker.com/r/fishaudio/fish-speech">
         <img alt="Docker" src="https://img.shields.io/docker/pulls/fishaudio/fish-speech?style=flat-square&logo=docker"/>
     </a>
+    <a target="_blank" href="https://pd.qq.com/s/bwxia254o">
+      <img alt="QQ Channel" src="https://img.shields.io/badge/QQ-blue?logo=tencentqq">
+    </a>
+</div>
+
+<div align="center">
+    <a target="_blank" href="https://huggingface.co/spaces/TTS-AGI/TTS-Arena-V2">
+      <img alt="TTS-Arena2 Score" src="https://img.shields.io/badge/TTS_Arena2-Rank_%231-gold?style=flat-square&logo=trophy&logoColor=white">
+    </a>
     <a target="_blank" href="https://huggingface.co/spaces/fishaudio/fish-speech-1">
         <img alt="Huggingface" src="https://img.shields.io/badge/🤗%20-space%20demo-yellow"/>
     </a>
-    <a target="_blank" href="https://pd.qq.com/s/bwxia254o">
-      <img alt="QQ Channel" src="https://img.shields.io/badge/QQ-blue?logo=tencentqq">
+    <a target="_blank" href="https://huggingface.co/fishaudio/openaudio-s1-mini">
+        <img alt="HuggingFace Model" src="https://img.shields.io/badge/🤗%20-models-orange"/>
     </a>
 </div>
 
-This codebase is released under the Apache License, and all model weights are released under the CC-BY-NC-SA-4.0 License. See [LICENSE](../LICENSE) for details.
+> [!IMPORTANT]
+> **License Notice**  
+> This codebase is released under the **Apache License**, and all model weights are released under the **CC-BY-NC-SA-4.0 License**. See [LICENSE](../LICENSE) for details.
+
+> [!WARNING]
+> **Legal Disclaimer**  
+> We accept no responsibility for any illegal use of the codebase. Please refer to your local laws regarding the DMCA and other related laws.
+
+---
+
+## 🎉 Announcement
 
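-We are pleased to announce that we have changed our name to OpenAudio. This will be an entirely new series of Text-to-Speech models.
+We are pleased to announce our rebrand to **OpenAudio**, introducing an innovative new series of advanced Text-to-Speech models built on the foundation of Fish-Speech.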
 
-A demo is available at [Fish Audio Playground](https://fish.audio).
+We are proud to release **OpenAudio-S1**, the first model in this series. It delivers significant improvements in quality, performance, and capabilities.
 
-Visit the [OpenAudio website](https://openaudio.com) for the blog and tech report.
+OpenAudio-S1 comes in two versions: **OpenAudio-S1** and **OpenAudio-S1-mini**. Both models are available on [Fish Audio Playground](https://fish.audio) (for **OpenAudio-S1**) and [Hugging Face](https://huggingface.co/fishaudio/openaudio-s1-mini) (for **OpenAudio-S1-mini**).
 
-## Features
-### OpenAudio-S1 (New version of Fish-Speech)
+Visit the [OpenAudio website](https://openaudio.com/blogs/s1) for the blog and tech report.
 
-1. This model has **all the features** that fish-speech had.
+## Highlights ✨
 
-2. OpenAudio S1 supports a variety of emotional, tone, and special markers to enhance speech synthesis:
-   
-      (angry) (sad) (disdainful) (excited) (surprised) (satisfied) (unhappy) (anxious) (hysterical) (delighted) (scared) (worried) (indifferent) (upset) (impatient) (nervous) (guilty) (scornful) (frustrated) (depressed) (panicked) (furious) (empathetic) (embarrassed) (reluctant) (disgusted) (keen) (moved) (proud) (relaxed) (grateful) (confident) (interested) (curious) (confused) (joyful) (disapproving) (negative) (denying) (astonished) (serious) (sarcastic) (conciliative) (comforting) (sincere) (sneering) (hesitating) (yielding) (painful) (awkward) (amused)
+### **Outstanding TTS Quality**
 
-   It also supports tone markers:
+We evaluated model performance using Seed TTS Eval Metrics, and the results show that OpenAudio S1 achieves **0.008 WER** and **0.004 CER** on English text, significantly outperforming previous models. (English, auto eval, based on OpenAI gpt-4o-transcribe, speaker distance using Revai/pyannote-wespeaker-voxceleb-resnet34-LM)
 
-   (in a hurry tone) (shouting) (screaming) (whispering) (soft tone)
+| Model | Word Error Rate (WER) | Character Error Rate (CER) | Speaker Distance |
+|-------|----------------------|---------------------------|------------------|
+| **S1** | **0.008**  | **0.004**  | **0.332** |
+| **S1-mini** | **0.011** | **0.005** | **0.380** |
 
-    There are a few special markers that are supported:
+### **Best Model on TTS-Arena2** 🏆
 
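-    (laughing) (chuckling) (sobbing) (crying loudly) (sighing) (panting) (groaning) (crowd laughing) (background laughter) (audience laughing)
+OpenAudio S1 has achieved the **#1 ranking** on [TTS-Arena2](https://arena.speechcolab.org/), the benchmark for text-to-speech evaluation: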
 
-    You can also use **Ha, ha, ha** for control, and there are many other cases for you to explore yourself.
+<div align="center">
+    <img src="assets/Elo.jpg" alt="TTS-Arena2 Ranking" style="width: 75%;" />
+</div>
+
+### **Speech Control**
+OpenAudio S1 **supports a variety of emotional, tone, and special markers** to enhance speech synthesis:
 
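-3. OpenAudio S1 includes the following sizes:
--   **S1 (4B, proprietary):** The full-size model.
--   **S1-mini (0.5B, open source):** A distilled version of S1.
+- **Basic emotions**:
+```
+(angry) (sad) (excited) (surprised) (satisfied) (delighted)
+(scared) (worried) (upset) (nervous) (frustrated) (depressed)
+(empathetic) (embarrassed) (disgusted) (moved) (proud) (relaxed)
+(grateful) (confident) (interested) (curious) (confused) (joyful)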
+```
 
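-    Both S1 and S1-mini incorporate online Reinforcement Learning from Human Feedback (RLHF).
+- **Advanced emotions**:
+```
+(disdainful) (unhappy) (anxious) (hysterical) (indifferent)
+(impatient) (guilty) (scornful) (panicked) (furious) (reluctant)
+(keen) (disapproving) (negative) (denying) (astonished) (serious)
+(sarcastic) (conciliative) (comforting) (sincere) (sneering)
+(hesitating) (yielding) (painful) (awkward) (amused)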
+```
 
-4. Evaluation
+- **Tone markers**:
+```
+(in a hurry tone) (shouting) (screaming) (whispering) (soft tone)
+```
 
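-    **Seed TTS evaluation metrics (English, auto eval, based on OpenAI gpt-4o-transcribe, speaker distance using Revai/pyannote-wespeaker-voxceleb-resnet34-LM):**
+- **Special audio effects**:
+```
+(laughing) (chuckling) (sobbing) (crying loudly) (sighing) (panting)
+(groaning) (crowd laughing) (background laughter) (audience laughing)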
+```
 
-    -   **S1:**
-        -   WER (Word Error Rate): **0.008**
-        -   CER (Character Error Rate): **0.004**
-        -   Distance: **0.332**
-    -   **S1-mini:**
-        -   WER (Word Error Rate): **0.011**
-        -   CER (Character Error Rate): **0.005**
-        -   Distance: **0.380**
-    
+You can also use **Ha, ha, ha** for control, and there are many other cases for you to explore yourself.
 
-## Disclaimer
+(English, Chinese, and Japanese are currently supported, and more languages will be added soon!)
 
-We accept no responsibility for any illegal use of the codebase. Please refer to your local laws regarding the DMCA and other related laws.
+### **Two Types of Models**
 
-## Videos
+| Model | Size | Availability | Features |
+|-------|------|--------------|----------|
+| **S1** | 4B parameters | Available on [fish.audio](https://fish.audio) | Full-featured flagship model |
+| **S1-mini** | 0.5B parameters | Available on Hugging Face: [hf space](https://huggingface.co/spaces/fishaudio/openaudio-s1-mini) | Distilled version with core capabilities |
 
-#### To be continued.
+Both S1 and S1-mini incorporate online Reinforcement Learning from Human Feedback (RLHF).
 
-## Documentation
+## **Features**
 
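-- [Environment Setup](en/install.md)
-- [Inference](en/inference.md)
+1. **Zero-shot & Few-shot TTS:** Input a 10- to 30-second voice sample to generate high-quality TTS output. **For detailed guidelines, see [Voice Cloning Best Practices](https://docs.fish.audio/text-to-speech/voice-clone-best-practices).**
+
+2. **Multilingual & Cross-lingual Support:** Simply copy and paste multilingual text into the input box, with no need to worry about the language. Currently supports English, Japanese, Korean, Chinese, French, German, Arabic, and Spanish.
+
+3. **No Phoneme Dependency:** The model has strong generalization capabilities and does not rely on phonemes for TTS. It can handle text in any language script.
+
+4. **High Accuracy:** Achieves a low CER (Character Error Rate) of around 0.4% and a WER (Word Error Rate) of around 0.8% on Seed-TTS Eval.
+
+5. **Fast:** With fish-tech acceleration, the real-time factor is approximately 1:5 on an Nvidia RTX 4060 laptop and 1:15 on an Nvidia RTX 4090.
+
+6. **WebUI Inference:** Provides an easy-to-use, Gradio-based web UI compatible with Chrome, Firefox, Edge, and other browsers.
+
+7. **GUI Inference:** Provides a PyQt6 graphical interface that works seamlessly with the API server. Supports Linux, Windows, and macOS. [See GUI](https://github.com/AnyaCoder/fish-speech-gui).
+
+8. **Deploy-Friendly:** Easily set up an inference server with native support for Linux and Windows (macOS coming soon), minimizing speed loss.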
+
+## **Media & Demos**
+
+<div align="center">
+
+### **Social Media**
+<a href="https://x.com/FishAudio/status/1929915992299450398" target="_blank">
+    <img src="https://img.shields.io/badge/𝕏-Latest_Demo-black?style=for-the-badge&logo=x&logoColor=white" alt="Latest Demo on X" />
+</a>
+
+### **Interactive Demos**
+<a href="https://fish.audio" target="_blank">
+    <img src="https://img.shields.io/badge/Fish_Audio-Try_OpenAudio_S1-blue?style=for-the-badge" alt="Try OpenAudio S1" />
+</a>
+<a href="https://huggingface.co/spaces/fishaudio/openaudio-s1-mini" target="_blank">
+    <img src="https://img.shields.io/badge/Hugging_Face-Try_S1_Mini-yellow?style=for-the-badge" alt="Try S1 Mini" />
+</a>
+
+### **Video Showcase**
+<iframe width="560" height="315" src="https://www.youtube.com/embed/SYuPvd7m06A" title="OpenAudio S1 Video" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
+
+### **Audio Samples**
+<div style="margin: 20px 0;">
+    <em>High-quality audio samples demonstrating our multilingual TTS capabilities across different languages and emotions will be available soon.</em>
+</div>
+
+</div>
+
+---
+
+## Documentation
 
-Note that the current model **does not support fine-tuning**.
+- [Environment Setup](ko/install.md)
+- [Inference](ko/inference.md)
 
 ## Credits
 
@@ -104,6 +186,7 @@
 - [MQTTS](https://github.com/b04901014/MQTTS)
 - [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
 - [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)
+- [Qwen3](https://github.com/QwenLM/Qwen3)
 
 ## Technical Report (V1.4)
 ```bibtex

+ 122 - 39
docs/README.pt-BR.md

@@ -26,75 +26,157 @@
     <a target="_blank" href="https://hub.docker.com/r/fishaudio/fish-speech">
         <img alt="Docker" src="https://img.shields.io/docker/pulls/fishaudio/fish-speech?style=flat-square&logo=docker"/>
     </a>
+    <a target="_blank" href="https://pd.qq.com/s/bwxia254o">
+      <img alt="QQ Channel" src="https://img.shields.io/badge/QQ-blue?logo=tencentqq">
+    </a>
+</div>
+
+<div align="center">
+    <a target="_blank" href="https://huggingface.co/spaces/TTS-AGI/TTS-Arena-V2">
+      <img alt="TTS-Arena2 Score" src="https://img.shields.io/badge/TTS_Arena2-Rank_%231-gold?style=flat-square&logo=trophy&logoColor=white">
+    </a>
     <a target="_blank" href="https://huggingface.co/spaces/fishaudio/fish-speech-1">
         <img alt="Huggingface" src="https://img.shields.io/badge/🤗%20-space%20demo-yellow"/>
     </a>
-    <a target="_blank" href="https://pd.qq.com/s/bwxia254o">
-      <img alt="QQ Channel" src="https://img.shields.io/badge/QQ-blue?logo=tencentqq">
+    <a target="_blank" href="https://huggingface.co/fishaudio/openaudio-s1-mini">
+        <img alt="HuggingFace Model" src="https://img.shields.io/badge/🤗%20-models-orange"/>
     </a>
 </div>
 
-This codebase is released under the Apache License and all model weights are released under the CC-BY-NC-SA-4.0 License. See [LICENSE](../LICENSE) for more details.
+> [!IMPORTANT]
+> **License Notice**  
+> This codebase is released under the **Apache License** and all model weights are released under the **CC-BY-NC-SA-4.0 License**. See [LICENSE](../LICENSE) for more details.
+
+> [!WARNING]
+> **Legal Disclaimer**  
+> We assume no responsibility for any illegal use of the codebase. Consult your local laws regarding the DMCA and other related laws.
+
+---
+
+## 🎉 Announcement
+
+We are excited to announce that we have rebranded to **OpenAudio**, introducing a revolutionary new series of advanced Text-to-Speech models built on the foundation of Fish-Speech.
+
+We are proud to release **OpenAudio-S1** as the first model in this series, offering significant improvements in quality, performance, and capabilities.
+
+OpenAudio-S1 comes in two versions: **OpenAudio-S1** and **OpenAudio-S1-mini**. Both models are now available on [Fish Audio Playground](https://fish.audio) (for **OpenAudio-S1**) and [Hugging Face](https://huggingface.co/fishaudio/openaudio-s1-mini) (for **OpenAudio-S1-mini**).
+
+Visit the [OpenAudio website](https://openaudio.com/blogs/s1) for the blog and tech report.
+
+## Highlights ✨
+
+### **Excellent TTS Quality**
+
+We use Seed TTS evaluation metrics to assess model performance, and the results show that OpenAudio S1 achieves **0.008 WER** and **0.004 CER** on English text, significantly better than previous models. (English, auto eval, based on OpenAI gpt-4o-transcribe, speaker distance using Revai/pyannote-wespeaker-voxceleb-resnet34-LM)
+
+| Model | Word Error Rate (WER) | Character Error Rate (CER) | Speaker Distance |
+|-------|----------------------|---------------------------|------------------|
+| **S1** | **0.008**  | **0.004**  | **0.332** |
+| **S1-mini** | **0.011** | **0.005** | **0.380** |
+
+### **Best Model in TTS-Arena2** 🏆
+
+OpenAudio S1 has achieved the **#1 ranking** on [TTS-Arena2](https://arena.speechcolab.org/), the benchmark for text-to-speech evaluation:
+
+<div align="center">
+    <img src="assets/Elo.jpg" alt="TTS-Arena2 Ranking" style="width: 75%;" />
+</div>
+
+### **Speech Control**
+OpenAudio S1 **supports a variety of emotion, tone, and special markers** to enhance speech synthesis:
+
+- **Basic emotions**:
+```
+(angry) (sad) (excited) (surprised) (satisfied) (delighted)
+(scared) (worried) (upset) (nervous) (frustrated) (depressed)
+(empathetic) (embarrassed) (disgusted) (moved) (proud) (relaxed)
+(grateful) (confident) (interested) (curious) (confused) (joyful)
+```
 
-We are excited to announce that we have changed our name to OpenAudio; this will be a new series of Text-to-Speech models.
+- **Advanced emotions**:
+```
+(disdainful) (unhappy) (anxious) (hysterical) (indifferent)
+(impatient) (guilty) (scornful) (panicked) (furious) (reluctant)
+(keen) (disapproving) (negative) (denying) (astonished) (serious)
+(sarcastic) (conciliative) (comforting) (sincere) (sneering)
+(hesitating) (yielding) (painful) (awkward) (amused)
+```
 
-Demo available on the [Fish Audio Playground](https://fish.audio).
+- **Tone markers**:
+```
+(in a hurry tone) (shouting) (screaming) (whispering) (soft tone)
+```
 
-Visit the [OpenAudio website](https://openaudio.com) for the blog and technical report.
+- **Special audio effects**:
+```
+(laughing) (chuckling) (sobbing) (crying loudly) (sighing) (panting)
+(groaning) (crowd laughing) (background laughter) (audience laughing)
+```
 
-## Features
-### OpenAudio-S1 (New version of Fish-Speech)
+You can also use Ha,ha,ha for control; many other cases are waiting for you to explore.
 
-1. This model has **ALL THE FEATURES** that fish-speech had.
+(English, Chinese, and Japanese are supported now, with more languages coming soon!)
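The marker lists above do not spell out where markers go in the input. A common reading, assumed here rather than confirmed by this README, is that a marker is written as plain text in parentheses immediately before the words it should affect. The hypothetical helper `tag` below only builds such a string and rejects typos against a subset of the English markers listed above; it does not call the model or any Fish Audio API.

```python
# A subset of the English markers listed above; extend the sets as needed.
BASIC_EMOTIONS = {"angry", "sad", "excited", "surprised", "satisfied", "delighted"}
TONE_MARKERS = {"in a hurry tone", "shouting", "screaming", "whispering", "soft tone"}
SPECIAL_EFFECTS = {"laughing", "chuckling", "sobbing", "crying loudly", "sighing", "panting"}
KNOWN_MARKERS = BASIC_EMOTIONS | TONE_MARKERS | SPECIAL_EFFECTS


def tag(text: str, *markers: str) -> str:
    """Hypothetical helper: prefix `text` with parenthesized markers.

    Assumes markers are plain text placed before the words they should affect.
    Raises ValueError for markers not in KNOWN_MARKERS to catch typos early.
    """
    unknown = [m for m in markers if m not in KNOWN_MARKERS]
    if unknown:
        raise ValueError(f"unknown marker(s): {unknown}")
    prefix = " ".join(f"({m})" for m in markers)
    return f"{prefix} {text}" if prefix else text


if __name__ == "__main__":
    script = " ".join([
        tag("I can't believe we actually won!", "excited"),
        tag("Okay, everyone, settle down.", "soft tone"),
    ])
    print(script)
    # (excited) I can't believe we actually won! (soft tone) Okay, everyone, settle down.
```

Validating against a fixed set is a design choice for catching misspelled markers before synthesis; drop the check if you want to experiment with markers outside the documented lists.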
 
-2. OpenAudio S1 supports a variety of emotion, tone, and special markers to enhance speech synthesis:
+### **Two Types of Models**
+
+| Model | Size | Availability | Features |
+|-------|------|--------------|----------|
+| **S1** | 4B parameters | Available on [fish.audio](https://fish.audio) | Full-featured flagship model |
+| **S1-mini** | 0.5B parameters | Available on the Hugging Face [hf space](https://huggingface.co/spaces/fishaudio/openaudio-s1-mini) | Distilled version with core capabilities |
+
+Both S1 and S1-mini incorporate online Reinforcement Learning from Human Feedback (RLHF).
    
-      (angry) (sad) (disdainful) (excited) (surprised) (satisfied) (unhappy) (anxious) (hysterical) (delighted) (scared) (worried) (indifferent) (upset) (impatient) (nervous) (guilty) (scornful) (frustrated) (depressed) (panicked) (furious) (empathetic) (embarrassed) (reluctant) (disgusted) (keen) (moved) (proud) (relaxed) (grateful) (confident) (interested) (curious) (confused) (joyful) (disapproving) (negative) (denying) (astonished) (serious) (sarcastic) (conciliative) (comforting) (sincere) (sneering) (hesitating) (yielding) (painful) (awkward) (amused)
+   ## **Features**
 
-   Also supports tone markers:
+1. **Zero-shot and Few-shot TTS:** Input a 10- to 30-second voice sample to generate high-quality TTS output. **For detailed guidelines, see [Voice Cloning Best Practices](https://docs.fish.audio/text-to-speech/voice-clone-best-practices).**
 
-   (in a hurry tone) (shouting) (screaming) (whispering) (soft tone)
+2. **Multilingual and Cross-lingual Support:** Simply copy and paste multilingual text into the input box; there is no need to worry about the language. Currently supports English, Japanese, Korean, Chinese, French, German, Arabic, and Spanish.
 
-    Some special markers are also supported:
+3. **No Phoneme Dependency:** The model has strong generalization capabilities and does not rely on phonemes for TTS. It can handle text in any language script.
 
-    (laughing) (chuckling) (sobbing) (crying loudly) (sighing) (panting) (groaning) (crowd laughing) (background laughter) (audience laughing)
+4. **Highly Accurate:** Achieves a low CER (Character Error Rate) of about 0.4% and a WER (Word Error Rate) of about 0.8% on Seed-TTS Eval.
 
-    You can also use **Ha,ha,ha** for control; many other cases are waiting for you to explore.
+5. **Fast:** With fish-tech acceleration, the real-time factor is approximately 1:5 on an Nvidia RTX 4060 laptop and 1:15 on an Nvidia RTX 4090.
 
-3. OpenAudio S1 comes in the following sizes:
--   **S1 (4B, proprietary):** The full-size model.
--   **S1-mini (0.5B, open source):** A distilled version of S1.
+6. **WebUI Inference:** Features an easy-to-use, Gradio-based web UI compatible with Chrome, Firefox, Edge, and other browsers.
 
-    Both S1 and S1-mini incorporate online Reinforcement Learning from Human Feedback (RLHF).
+7. **GUI Inference:** Offers a PyQt6 graphical interface that works seamlessly with the API server. Supports Linux, Windows, and macOS. [See GUI](https://github.com/AnyaCoder/fish-speech-gui).
 
-4. Evaluations
+8. **Deploy-Friendly:** Easily set up an inference server with native support for Linux and Windows (macOS coming soon), minimizing speed loss.
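The "1:5" and "1:15" figures in item 5 are read here as one second of compute per five (or fifteen) seconds of generated audio, i.e. a real-time factor (RTF) of about 0.2 or 0.067. That interpretation is an assumption, since the README does not define the notation. The arithmetic below turns the quoted ratios into wall-clock estimates.

```python
def real_time_factor(compute_seconds: float, audio_seconds: float) -> float:
    """Compute time divided by duration of audio produced; below 1.0 is faster than real time."""
    return compute_seconds / audio_seconds


def estimated_compute_seconds(audio_seconds: float, speedup: float) -> float:
    """Compute time for `audio_seconds` of speech at a quoted '1:speedup' ratio."""
    return audio_seconds / speedup


if __name__ == "__main__":
    one_minute = 60.0
    for gpu, speedup in [("RTX 4060 laptop", 5.0), ("RTX 4090", 15.0)]:
        secs = estimated_compute_seconds(one_minute, speedup)
        rtf = real_time_factor(secs, one_minute)
        print(f"{gpu}: ~{secs:.0f} s for 60 s of audio, RTF {rtf:.3f}")
```

For one minute of speech this gives roughly 12 s on the laptop RTX 4060 and 4 s on the RTX 4090, ignoring model loading, warm-up, and text length effects.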
 
-    **Seed TTS Eval Metrics (English, automatic evaluation based on OpenAI gpt-4o-transcribe; speaker distance computed with Revai/pyannote-wespeaker-voxceleb-resnet34-LM):**
+## **Media & Demos**
 
-    -   **S1:**
-        -   WER (Word Error Rate): **0.008**
-        -   CER (Character Error Rate): **0.004**
-        -   Distance: **0.332**
-    -   **S1-mini:**
-        -   WER (Word Error Rate): **0.011**
-        -   CER (Character Error Rate): **0.005**
-        -   Distance: **0.380**
-    
+<div align="center">
 
-## Legal Disclaimer
+### **Social Media**
+<a href="https://x.com/FishAudio/status/1929915992299450398" target="_blank">
+    <img src="https://img.shields.io/badge/𝕏-Latest_Demo-black?style=for-the-badge&logo=x&logoColor=white" alt="Latest Demo on X" />
+</a>
 
-We assume no responsibility for any illegal use of the codebase. Please consult your local laws regarding the DMCA and other related legislation.
+### **Interactive Demos**
+<a href="https://fish.audio" target="_blank">
+    <img src="https://img.shields.io/badge/Fish_Audio-Try_OpenAudio_S1-blue?style=for-the-badge" alt="Try OpenAudio S1" />
+</a>
+<a href="https://huggingface.co/spaces/fishaudio/openaudio-s1-mini" target="_blank">
+    <img src="https://img.shields.io/badge/Hugging_Face-Try_S1_Mini-yellow?style=for-the-badge" alt="Try S1 Mini" />
+</a>
 
-## Videos
+### **Video Showcases**
+<iframe width="560" height="315" src="https://www.youtube.com/embed/SYuPvd7m06A" title="OpenAudio S1 Video" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
 
-#### To be continued.
+### **Audio Samples**
+<div style="margin: 20px 0;">
+    <em>High-quality audio samples demonstrating our multilingual TTS capabilities across different languages and emotions will be available soon.</em>
+</div>
 
-## Documentation
+</div>
 
-- [Build Environment](en/install.md)
-- [Inference](en/inference.md)
+---
+
+## Documentation
 
-Note that the current model **DOES NOT SUPPORT FINE-TUNING**.
+- [Build Environment](pt/install.md)
+- [Inference](pt/inference.md)
 
 ## Credits
 
@@ -104,6 +186,7 @@ Note that the current model **DOES NOT SUPPORT FINE-TUNING**.
 - [MQTTS](https://github.com/b04901014/MQTTS)
 - [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
 - [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)
+- [Qwen3](https://github.com/QwenLM/Qwen3)
 
 ## Technical Report (V1.4)
 ```bibtex

+ 120 - 37
docs/README.zh.md

@@ -26,76 +26,158 @@
     <a target="_blank" href="https://hub.docker.com/r/fishaudio/fish-speech">
         <img alt="Docker" src="https://img.shields.io/docker/pulls/fishaudio/fish-speech?style=flat-square&logo=docker"/>
     </a>
+    <a target="_blank" href="https://pd.qq.com/s/bwxia254o">
+      <img alt="QQ Channel" src="https://img.shields.io/badge/QQ-blue?logo=tencentqq">
+    </a>
+</div>
+
+<div align="center">
+    <a target="_blank" href="https://huggingface.co/spaces/TTS-AGI/TTS-Arena-V2">
+      <img alt="TTS-Arena2 Score" src="https://img.shields.io/badge/TTS_Arena2-Rank_%231-gold?style=flat-square&logo=trophy&logoColor=white">
+    </a>
     <a target="_blank" href="https://huggingface.co/spaces/fishaudio/fish-speech-1">
         <img alt="Huggingface" src="https://img.shields.io/badge/🤗%20-space%20demo-yellow"/>
     </a>
-    <a target="_blank" href="https://pd.qq.com/s/bwxia254o">
-      <img alt="QQ Channel" src="https://img.shields.io/badge/QQ-blue?logo=tencentqq">
+    <a target="_blank" href="https://huggingface.co/fishaudio/openaudio-s1-mini">
+        <img alt="HuggingFace Model" src="https://img.shields.io/badge/🤗%20-models-orange"/>
     </a>
 </div>
 
-This codebase is released under the Apache License, and all model weights are released under the CC-BY-NC-SA-4.0 License. See [LICENSE](../LICENSE) for details.
+> [!IMPORTANT]
+> **License Notice**  
+> This codebase is released under the **Apache License**, and all model weights are released under the **CC-BY-NC-SA-4.0 License**. See [LICENSE](../LICENSE) for details.
+
+> [!WARNING]
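+> **Legal Disclaimer**  
+> We assume no responsibility for any illegal use of the codebase. Please consult your local laws regarding the DMCA and other related legislation.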
+
+---
+
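+## 🎉 Announcement
+
+We are pleased to announce that we have rebranded as **OpenAudio**, launching a new series of advanced text-to-speech models built on the foundation of Fish-Speech.
+
+We are proud to release **OpenAudio-S1** as the first model in this series, with significant improvements in quality, performance, and capabilities.
+
+OpenAudio-S1 comes in two versions: **OpenAudio-S1** and **OpenAudio-S1-mini**. Both are now available: **OpenAudio-S1** on the [Fish Audio Playground](https://fish.audio) and **OpenAudio-S1-mini** on [Hugging Face](https://huggingface.co/fishaudio/openaudio-s1-mini).
+
+Visit the [OpenAudio website](https://openaudio.com/blogs/s1) for the blog post and technical report.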
+
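+## Highlights ✨
+
+### **Excellent TTS Quality**
+
+We use the Seed TTS Eval metrics to evaluate model performance. OpenAudio S1 achieves **0.008 WER** and **0.004 CER** on English text, significantly better than previous models. (English, automatic evaluation based on OpenAI gpt-4o-transcribe; speaker distance computed with Revai/pyannote-wespeaker-voxceleb-resnet34-LM.)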
+
+| Model | Word Error Rate (WER) | Character Error Rate (CER) | Speaker Distance |
+|-------|----------------------|---------------------------|------------------|
+| **S1** | **0.008**  | **0.004**  | **0.332** |
+| **S1-mini** | **0.011** | **0.005** | **0.380** |
+
+### **Best Model on TTS-Arena2** 🏆
+
+OpenAudio S1 has achieved the **#1 ranking** on [TTS-Arena2](https://arena.speechcolab.org/), a benchmark for text-to-speech evaluation:
+
+<div align="center">
+    <img src="assets/Elo.jpg" alt="TTS-Arena2 ranking" style="width: 75%;" />
+</div>
+
+### **Speech Control**
+OpenAudio S1 **supports a variety of emotion, tone, and special markers** to enhance speech synthesis:
+
+- **Basic emotions**:
+```
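+(angry) (sad) (excited) (surprised) (satisfied) (delighted)
+(scared) (worried) (upset) (nervous) (frustrated) (depressed)
+(empathetic) (embarrassed) (disgusted) (moved) (proud) (relaxed)
+(grateful) (confident) (interested) (curious) (confused) (joyful)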
+```
 
-We are pleased to announce that we have changed our name to OpenAudio; this will be a brand-new series of text-to-speech models.
+- **Advanced emotions**:
+```
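+(disdainful) (unhappy) (anxious) (hysterical) (indifferent)
+(impatient) (guilty) (scornful) (panicked) (furious) (reluctant)
+(keen) (disapproving) (negative) (denying) (astonished) (serious)
+(sarcastic) (conciliative) (comforting) (sincere) (sneering)
+(hesitating) (yielding) (painful) (awkward) (amused)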
+```
 
-A demo is available on the [Fish Audio Playground](https://fish.audio).
+- **Tone markers**:
+```
+(in a hurry tone) (shouting) (screaming) (whispering) (soft tone)
+```
 
-Visit the [OpenAudio website](https://openaudio.com) for the blog and technical report.
+- **Special audio effects**:
+```
+(laughing) (chuckling) (sobbing) (crying loudly) (sighing) (panting)
+(groaning) (crowd laughing) (background laughter) (audience laughing)
+```
 
-## Features
-### OpenAudio-S1 (New version of Fish-Speech)
+You can also use 哈,哈,哈 ("ha, ha, ha") for control; many other cases are waiting for you to explore.
 
-1. This model has **all the features** of fish-speech.
+(English, Chinese, and Japanese are supported now, with more languages coming soon!)
 
-2. OpenAudio S1 supports a variety of emotion, tone, and special markers to enhance speech synthesis:
+### **Two Types of Models**
+
+| Model | Size | Availability | Features |
+|-------|------|--------------|----------|
+| **S1** | 4B parameters | Available on [fish.audio](https://fish.audio) | Full-featured flagship model |
+| **S1-mini** | 0.5B parameters | Available on the Hugging Face [hf space](https://huggingface.co/spaces/fishaudio/openaudio-s1-mini) | Streamlined version with core capabilities |
+
+Both S1 and S1-mini incorporate online Reinforcement Learning from Human Feedback (RLHF).
    
-      (angry) (sad) (disdainful) (excited) (surprised) (satisfied) (unhappy) (anxious) (hysterical) (delighted) (scared) (worried) (indifferent) (upset) (impatient) (nervous) (guilty) (scornful) (frustrated) (depressed) (panicked) (furious) (empathetic) (embarrassed) (reluctant) (disgusted) (keen) (moved) (proud) (relaxed) (grateful) (confident) (interested) (curious) (confused) (joyful) (disapproving) (negative) (denying) (astonished) (serious) (sarcastic) (conciliative) (comforting) (sincere) (sneering) (hesitating) (yielding) (painful) (awkward) (amused) PS: Chinese is also supported
+   ## **Features**
 
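-   Also supports tone markers:
+1. **Zero-shot and Few-shot TTS:** Input a 10- to 30-second voice sample to generate high-quality TTS output. **For detailed guidelines, see [Voice Cloning Best Practices](https://docs.fish.audio/text-to-speech/voice-clone-best-practices).**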
 
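-   (in a hurry tone) (shouting) (screaming) (whispering) (soft tone)
+2. **Multilingual and Cross-lingual Support:** Simply copy and paste multilingual text into the input box; there is no need to worry about the language. Currently supports English, Japanese, Korean, Chinese, French, German, Arabic, and Spanish.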
 
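-    Some special markers are also supported:
+3. **No Phoneme Dependency:** The model has strong generalization capabilities and does not rely on phonemes for TTS. It can handle text in any language script.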
 
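-    (laughing) (chuckling) (sobbing) (crying loudly) (sighing) (panting) (groaning) (crowd laughing) (background laughter) (audience laughing)
+4. **High Accuracy:** Achieves a low CER (Character Error Rate) of about 0.4% and a WER (Word Error Rate) of about 0.8% on Seed-TTS Eval.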
 
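-    You can also use **哈,哈,哈** ("ha, ha, ha") for control; many other cases are waiting for you to explore.
+5. **Fast:** With fish-tech acceleration, the real-time factor is approximately 1:5 on an Nvidia RTX 4060 laptop and 1:15 on an Nvidia RTX 4090.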
 
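-3. OpenAudio S1 comes in the following sizes:
--   **S1 (4B, proprietary):** The full-size model.
--   **S1-mini (0.5B, open source):** A distilled version of S1.
+6. **WebUI Inference:** Features an easy-to-use, Gradio-based web UI compatible with Chrome, Firefox, Edge, and other browsers.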
 
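-    Both S1 and S1-mini incorporate online Reinforcement Learning from Human Feedback (RLHF).
+7. **GUI Inference:** Provides a PyQt6 graphical interface that works seamlessly with the API server. Supports Linux, Windows, and macOS. [See GUI](https://github.com/AnyaCoder/fish-speech-gui).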
 
-4. Evaluation
+8. **Deploy-Friendly:** Easily set up an inference server with native support for Linux and Windows (macOS coming soon), minimizing speed loss.
 
-    **Seed TTS Eval Metrics (English, automatic evaluation based on OpenAI gpt-4o-transcribe; speaker distance computed with Revai/pyannote-wespeaker-voxceleb-resnet34-LM):**
+## **Media & Demos**
 
-    -   **S1:**
-        -   WER (Word Error Rate): **0.008**
-        -   CER (Character Error Rate): **0.004**
-        -   Distance: **0.332**
-    -   **S1-mini:**
-        -   WER (Word Error Rate): **0.011**
-        -   CER (Character Error Rate): **0.005**
-        -   Distance: **0.380**
-    
+<div align="center">
 
-## Disclaimer
+### **Social Media**
+<a href="https://x.com/FishAudio/status/1929915992299450398" target="_blank">
+    <img src="https://img.shields.io/badge/𝕏-Latest_Demo-black?style=for-the-badge&logo=x&logoColor=white" alt="Latest Demo on X" />
+</a>
 
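-We assume no responsibility for any illegal use of the codebase. Please consult your local laws regarding the DMCA and other related legislation.
+### **Interactive Demos**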
+<a href="https://fish.audio" target="_blank">
+    <img src="https://img.shields.io/badge/Fish_Audio-Try_OpenAudio_S1-blue?style=for-the-badge" alt="Try OpenAudio S1" />
+</a>
+<a href="https://huggingface.co/spaces/fishaudio/openaudio-s1-mini" target="_blank">
+    <img src="https://img.shields.io/badge/Hugging_Face-Try_S1_Mini-yellow?style=for-the-badge" alt="Try S1 Mini" />
+</a>
 
-## Videos
+### **Video Showcases**
+<iframe width="560" height="315" src="https://www.youtube.com/embed/SYuPvd7m06A" title="OpenAudio S1 Video" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
 
-#### To be continued.
+### **Audio Samples**
+<div style="margin: 20px 0;">
+    <em>High-quality audio samples showcasing our multilingual TTS capabilities across different languages and emotions are coming soon.</em>
+</div>
+
+</div>
+
+---
 
 ## Documentation
 
 - [Build Environment](zh/install.md)
 - [Inference](zh/inference.md)
 
-Note that the current model **does not support fine-tuning**.
-
 ## Acknowledgements
 
 - [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
@@ -104,6 +186,7 @@
 - [MQTTS](https://github.com/b04901014/MQTTS)
 - [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
 - [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)
+- [Qwen3](https://github.com/QwenLM/Qwen3)
 
 ## Technical Report (V1.4)
 ```bibtex

BIN
docs/assets/Elo.jpg


BIN
docs/assets/Thumbnail.jpg


+ 50 - 24
docs/en/index.md

@@ -28,22 +28,40 @@
 
 ---
 
-!!! warning "Legal Notice"
-    We assume no responsibility for any illegal use of the codebase. Please refer to the local laws regarding DMCA (Digital Millennium Copyright Act) and other relevant laws in your area.
-    
-    **License:** This codebase is released under Apache 2.0 license and all models are released under the CC-BY-NC-SA-4.0 license.
+!!! note "License Notice"
+    This codebase is released under **Apache License** and all model weights are released under **CC-BY-NC-SA-4.0 License**. Please refer to [LICENSE](LICENSE) for more details.
+
+!!! warning "Legal Disclaimer"
+    We do not hold any responsibility for any illegal usage of the codebase. Please refer to your local laws about DMCA and other related laws.
 
 ## **Introduction**
 
 We are excited to announce that we have rebranded to **OpenAudio** - introducing a brand new series of advanced Text-to-Speech models that builds upon the foundation of Fish-Speech with significant improvements and new capabilities.
 
-**Openaudio-S1-mini**: [Video](To Be Uploaded); [Hugging Face](https://huggingface.co/fishaudio/openaudio-s1-mini);
+**Openaudio-S1-mini**: [Blog](https://openaudio.com/blogs/s1); [Video](https://www.youtube.com/watch?v=SYuPvd7m06A); [Hugging Face](https://huggingface.co/fishaudio/openaudio-s1-mini);
 
 **Fish-Speech v1.5**: [Video](https://www.bilibili.com/video/BV1EKiDYBE4o/); [Hugging Face](https://huggingface.co/fishaudio/fish-speech-1.5);
 
-## **Highlights** ✨
+## **Highlights**
+
+### **Excellent TTS quality**
+
+We use Seed TTS Eval Metrics to evaluate the model performance, and the results show that OpenAudio S1 achieves **0.008 WER** and **0.004 CER** on English text, which is significantly better than previous models. (English, auto eval, based on OpenAI gpt-4o-transcribe, speaker distance using Revai/pyannote-wespeaker-voxceleb-resnet34-LM)
+
+| Model | Word Error Rate (WER) | Character Error Rate (CER) | Speaker Distance |
+|:-----:|:--------------------:|:-------------------------:|:----------------:|
+| **S1** | **0.008** | **0.004** | **0.332** |
+| **S1-mini** | **0.011** | **0.005** | **0.380** |
 
-### **Emotion Control**
+### **Best Model in TTS-Arena2**
+
+OpenAudio S1 has achieved the **#1 ranking** on [TTS-Arena2](https://arena.speechcolab.org/), the benchmark for text-to-speech evaluation:
+
+<div align="center">
+    <img src="assets/Elo.jpg" alt="TTS-Arena2 Ranking" style="width: 75%;" />
+</div>
+
+### **Speech Control**
 OpenAudio S1 **supports a variety of emotional, tone, and special markers** to enhance speech synthesis:
 
 - **Basic emotions**:
@@ -63,6 +81,8 @@ OpenAudio S1 **supports a variety of emotional, tone, and special markers** to e
 (hesitating) (yielding) (painful) (awkward) (amused)
 ```
 
+(English, Chinese, and Japanese are supported now, with more languages coming soon!)
+
 - **Tone markers**:
 ```
 (in a hurry tone) (shouting) (screaming) (whispering) (soft tone)
@@ -76,21 +96,13 @@ OpenAudio S1 **supports a variety of emotional, tone, and special markers** to e
 
 You can also use Ha,ha,ha for control; many other cases are waiting for you to explore.
 
-### **Excellent TTS quality**
-
-We use Seed TTS Eval Metrics to evaluate the model performance, and the results show that OpenAudio S1 achieves **0.008 WER** and **0.004 CER** on English text, which is significantly better than previous models. (English, auto eval, based on OpenAI gpt-4o-transcribe, speaker distance using Revai/pyannote-wespeaker-voxceleb-resnet34-LM)
+### **Two Types of Models**
 
-| Model | Word Error Rate (WER) | Character Error Rate (CER) | Speaker Distance |
-|-------|----------------------|---------------------------|------------------|
-| **S1** | **0.008**  | **0.004**  | **0.332** |
-| **S1-mini** | **0.011** | **0.005** | **0.380** |
+We offer two model variants to suit different needs:
 
-### **Two Type of Models**
+- **OpenAudio S1 (4B parameters)**: Our full-featured flagship model available on [fish.audio](https://fish.audio), delivering the highest quality speech synthesis with all advanced features.
 
-| Model | Size | Availability | Features |
-|-------|------|--------------|----------|
-| **S1** | 4B parameters | Avaliable on [fish.audio](fish.audio) | Full-featured flagship model |
-| **S1-mini** | 0.5B parameters | Avaliable on huggingface [hf space](https://huggingface.co/spaces/fishaudio/openaudio-s1-mini) | Distilled version with core capabilities |
+- **OpenAudio S1-mini (0.5B parameters)**: A distilled version with core capabilities, available on [Hugging Face Space](https://huggingface.co/spaces/fishaudio/openaudio-s1-mini), optimized for faster inference while maintaining excellent quality.
 
 Both S1 and S1-mini incorporate online Reinforcement Learning from Human Feedback (RLHF).
 
@@ -112,14 +124,28 @@ Both S1 and S1-mini incorporate online Reinforcement Learning from Human Feedbac
 
 8. **Deploy-Friendly:** Easily set up an inference server with native support for Linux and Windows (macOS coming soon), minimizing speed loss.
 
-## **Disclaimer**
+## **Media & Demos**
 
-We do not hold any responsibility for any illegal usage of the codebase. Please refer to your local laws about DMCA and other related laws.
+<!-- <div align="center"> -->
 
-## **Media & Demos**
+<h3><strong>Social Media</strong></h3>
+<a href="https://x.com/FishAudio/status/1929915992299450398" target="_blank">
+    <img src="https://img.shields.io/badge/𝕏-Latest_Demo-black?style=for-the-badge&logo=x&logoColor=white" alt="Latest Demo on X" />
+</a>
+
+<h3><strong>Interactive Demos</strong></h3>
 
-#### 🚧 Coming Soon
-Video demonstrations and tutorials are currently in development.
+<a href="https://fish.audio" target="_blank">
+    <img src="https://img.shields.io/badge/Fish_Audio-Try_OpenAudio_S1-blue?style=for-the-badge" alt="Try OpenAudio S1" />
+</a>
+<a href="https://huggingface.co/spaces/fishaudio/openaudio-s1-mini" target="_blank">
+    <img src="https://img.shields.io/badge/Hugging_Face-Try_S1_Mini-yellow?style=for-the-badge" alt="Try S1 Mini" />
+</a>
+
+<h3><strong>Video Showcases</strong></h3>
+<div align="center">
+<iframe width="560" height="315" src="https://www.youtube.com/embed/SYuPvd7m06A" title="OpenAudio S1 Video" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
+</div>
 
 ## **Documentation**
 

+ 63 - 37
docs/ja/index.md

@@ -28,69 +28,81 @@
 
 ---
 
-!!! warning "Legal Notice"
-    We assume no responsibility for any illegal use of this codebase. Please refer to the DMCA (Digital Millennium Copyright Act) and other relevant laws in your region.
-    
-    **License:** This codebase is released under the Apache 2.0 license, and all models are released under the CC-BY-NC-SA-4.0 license.
+!!! note "License Notice"
+    This codebase is released under the **Apache License**, and all model weights are released under the **CC-BY-NC-SA-4.0 License**. See [LICENSE](LICENSE) for details.
+
+!!! warning "Legal Disclaimer"
+    We assume no responsibility for any illegal use of the codebase. Please refer to the DMCA and other relevant laws in your region.
 
 ## **Introduction**
 
 We are pleased to announce our rebranding to **OpenAudio**. Building on Fish-Speech, we introduce a new series of advanced Text-to-Speech models with significant improvements and new capabilities.
 
-**Openaudio-S1-mini**: [Video](To be uploaded); [Hugging Face](https://huggingface.co/fishaudio/openaudio-s1-mini);
+**Openaudio-S1-mini**: [Blog](https://openaudio.com/blogs/s1); [Video](https://www.youtube.com/watch?v=SYuPvd7m06A); [Hugging Face](https://huggingface.co/fishaudio/openaudio-s1-mini);
 
 **Fish-Speech v1.5**: [Video](https://www.bilibili.com/video/BV1EKiDYBE4o/); [Hugging Face](https://huggingface.co/fishaudio/fish-speech-1.5);
 
-## **Highlights** ✨
+## **Highlights**
+
+### **Excellent TTS Quality**
+
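+We evaluated model performance with the Seed TTS Eval metrics: OpenAudio S1 achieves **0.008 WER** and **0.004 CER** on English text, a significant improvement over previous models. (English, automatic evaluation based on OpenAI gpt-4o-transcribe; speaker distance computed with Revai/pyannote-wespeaker-voxceleb-resnet34-LM.)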
+
+| Model | Word Error Rate (WER) | Character Error Rate (CER) | Speaker Distance |
+|:-----:|:--------------------:|:-------------------------:|:----------------:|
+| **S1** | **0.008** | **0.004** | **0.332** |
+| **S1-mini** | **0.011** | **0.005** | **0.380** |
 
-### **Emotion Control**
+### **Best Model on TTS-Arena2**
+
+OpenAudio S1 has achieved the **#1 ranking** on [TTS-Arena2](https://arena.speechcolab.org/), a benchmark for text-to-speech evaluation:
+
+<div align="center">
+    <img src="../assets/Elo.jpg" alt="TTS-Arena2 Ranking" style="width: 75%;" />
+</div>
+
+### **Speech Control**
 OpenAudio S1 **supports a variety of emotion, tone, and special markers** to enhance speech synthesis:
 
 - **Basic emotions**:
 ```
-(angry) (sad) (excited) (surprised) (satisfied) (delighted)
-(scared) (worried) (upset) (nervous) (frustrated) (depressed)
-(empathetic) (embarrassed) (disgusted) (moved) (proud) (relaxed)
-(grateful) (confident) (interested) (curious) (confused) (joyful)
+(angry) (sad) (excited) (surprised) (satisfied) (delighted)
+(scared) (worried) (upset) (nervous) (frustrated) (depressed)
+(empathetic) (embarrassed) (disgusted) (moved) (proud) (relaxed)
+(grateful) (confident) (interested) (curious) (confused) (joyful)
 ```
 
 - **Advanced emotions**:
 ```
-(disdainful) (unhappy) (anxious) (hysterical) (indifferent) 
-(impatient) (guilty) (scornful) (panicked) (furious) (reluctant)
-(keen) (disapproving) (negative) (denying) (astonished) (serious)
-(sarcastic) (conciliative) (comforting) (sincere) (sneering)
-(hesitating) (yielding) (painful) (awkward) (amused)
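+(disdainful) (unhappy) (anxious) (hysterical) (indifferent)
+(impatient) (guilty) (scornful) (panicked) (furious) (reluctant)
+(keen) (disapproving) (negative) (denying) (astonished) (serious)
+(sarcastic) (conciliative) (comforting) (sincere) (sneering)
+(hesitating) (yielding) (painful) (awkward) (amused)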
 ```
 
+(English, Chinese, and Japanese are supported now, with more languages coming soon!)
+
 - **Tone markers**:
 ```
-(in a hurry tone) (shouting) (screaming) (whispering) (soft tone)
+(in a hurry tone) (shouting) (screaming) (whispering) (soft tone)
 ```
 
 - **Special audio effects**:
 ```
-(laughing) (chuckling) (sobbing) (crying loudly) (sighing) (panting)
-(groaning) (crowd laughing) (background laughter) (audience laughing)
+(laughing) (chuckling) (sobbing) (crying loudly) (sighing) (panting)
+(groaning) (crowd laughing) (background laughter) (audience laughing)
 ```
 
 You can also use Ha,ha,ha for control; many other uses are waiting for you to explore.
 
-### **Excellent TTS Quality**
-
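-We evaluated model performance with the Seed TTS Eval metrics: OpenAudio S1 achieves **0.008 WER** and **0.004 CER** on English text, a significant improvement over previous models. (English, automatic evaluation based on OpenAI gpt-4o-transcribe; speaker distance computed with Revai/pyannote-wespeaker-voxceleb-resnet34-LM.)
+### **Two Model Types**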
 
-| Model | Word Error Rate (WER) | Character Error Rate (CER) | Speaker Distance |
-|-------|----------------------|---------------------------|------------------|
-| **S1** | **0.008**  | **0.004**  | **0.332** |
-| **S1-mini** | **0.011** | **0.005** | **0.380** |
+We offer two model variants to suit different needs:
 
-### **Two Model Types**
+- **OpenAudio S1 (4 billion parameters)**: A full-featured flagship model available on [fish.audio](https://fish.audio), delivering the highest-quality speech synthesis with all advanced features.
 
-| Model | Size | Availability | Features |
-|-------|------|--------------|----------|
-| **S1** | 4 billion parameters | Available on [fish.audio](fish.audio) | Full-featured flagship model |
-| **S1-mini** | 500 million parameters | Available on huggingface [hf space](https://huggingface.co/spaces/fishaudio/openaudio-s1-mini) | Distilled version with core capabilities |
+- **OpenAudio S1-mini (500 million parameters)**: A distilled version with core capabilities, available on [Hugging Face Space](https://huggingface.co/spaces/fishaudio/openaudio-s1-mini), optimized for faster inference while maintaining excellent quality.
 
 Both S1 and S1-mini incorporate online Reinforcement Learning from Human Feedback (RLHF).
 
@@ -110,16 +122,30 @@ Both S1 and S1-mini incorporate online Reinforcement Learning from Human Feedback
 
 7. **GUI Inference:** Provides a PyQt6 graphical interface that works seamlessly with the API server. Supports Linux, Windows, and macOS. [See GUI](https://github.com/AnyaCoder/fish-speech-gui).
 
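-8. **Deploy-Friendly:** Easily set up an inference server with native support for Linux, Windows, and MacOS, minimizing speed loss.
+8. **Deploy-Friendly:** Easily set up an inference server with native support for Linux and Windows (macOS coming soon), minimizing speed loss.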
 
-## **Disclaimer**
+## **Media & Demos**
 
-We assume no responsibility for any illegal use of the codebase. Please refer to the DMCA and other relevant laws in your region.
+<!-- <div align="center"> -->
 
-## **Media & Demos**
+<h3><strong>Social Media</strong></h3>
+<a href="https://x.com/FishAudio/status/1929915992299450398" target="_blank">
+    <img src="https://img.shields.io/badge/𝕏-最新デモ-black?style=for-the-badge&logo=x&logoColor=white" alt="Latest Demo on X" />
+</a>
+
+<h3><strong>Interactive Demos</strong></h3>
 
-#### 🚧 Coming Soon
-Video demos and tutorials are currently in development.
+<a href="https://fish.audio" target="_blank">
+    <img src="https://img.shields.io/badge/Fish_Audio-OpenAudio_S1を試す-blue?style=for-the-badge" alt="Try OpenAudio S1" />
+</a>
+<a href="https://huggingface.co/spaces/fishaudio/openaudio-s1-mini" target="_blank">
+    <img src="https://img.shields.io/badge/Hugging_Face-S1_Miniを試す-yellow?style=for-the-badge" alt="Try S1 Mini" />
+</a>
+
+<h3><strong>Video Showcases</strong></h3>
+<div align="center">
+<iframe width="560" height="315" src="https://www.youtube.com/embed/SYuPvd7m06A" title="OpenAudio S1 Video" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
+</div>
 
 ## **Documentation**
 

+ 73 - 36
docs/ko/index.md

@@ -4,7 +4,15 @@
 
 <div align="center">
 
-<img src="../assets/openaudio.jpg" alt="OpenAudio" style="display: block; margin: 0 auto; width: 35%;"/>
+<img src="../assets/opena### **Two Model Types**
+
+We offer two model variants to suit different needs:
+
+- **OpenAudio S1 (4 billion parameters)**: A full-featured flagship model available on [fish.audio](https://fish.audio), delivering the highest-quality speech synthesis with all advanced features.
+
+- **OpenAudio S1-mini (500 million parameters)**: A lightweight version with core capabilities, available on [Hugging Face Space](https://huggingface.co/spaces/fishaudio/openaudio-s1-mini), optimized for faster inference while maintaining excellent quality.
+
+Both S1 and S1-mini incorporate online Reinforcement Learning from Human Feedback (RLHF).t="OpenAudio" style="display: block; margin: 0 auto; width: 35%;"/>
 
 </div>
 
@@ -28,70 +36,85 @@
 
 ---
 
-!!! warning "Legal Notice"
-    We assume no responsibility for any illegal use of the codebase. Please refer to the DMCA (Digital Millennium Copyright Act) and other relevant laws in your region.
-    
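-    **License:** This codebase is distributed under the Apache 2.0 license, and all models are distributed under the CC-BY-NC-SA-4.0 license.
+!!! note "License Notice"
+    This codebase is distributed under the **Apache License**, and all model weights are distributed under the **CC-BY-NC-SA-4.0 License**. See [LICENSE](LICENSE) for details.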
+
+!!! warning "Legal Disclaimer"
+    We assume no responsibility for any illegal use of the codebase. Please refer to the DMCA and other relevant laws in your region.
 
 ## **Introduction**
 
 We are pleased to announce our rebranding to **OpenAudio**. Building on Fish-Speech, we introduce a new series of advanced text-to-speech models with significant improvements and new features.
 
-**Openaudio-S1-mini**: [Video](Upload pending); [Hugging Face](https://huggingface.co/fishaudio/openaudio-s1-mini);
+**Openaudio-S1-mini**: [Blog](https://openaudio.com/blogs/s1); [Video](https://www.youtube.com/watch?v=SYuPvd7m06A); [Hugging Face](https://huggingface.co/fishaudio/openaudio-s1-mini);
 
 **Fish-Speech v1.5**: [Video](https://www.bilibili.com/video/BV1EKiDYBE4o/); [Hugging Face](https://huggingface.co/fishaudio/fish-speech-1.5);
 
-## **Key Features**
+## **Key Features**
 
-### **Emotion Control**
+### **Excellent TTS Quality**
+
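+We evaluated model performance with the Seed TTS Eval metrics: OpenAudio S1 achieves **0.008 WER** and **0.004 CER** on English text, a significant improvement over previous models. (English, automatic evaluation based on OpenAI gpt-4o-transcribe; speaker distance computed with Revai/pyannote-wespeaker-voxceleb-resnet34-LM.)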
+
+| Model | Word Error Rate (WER) | Character Error Rate (CER) | Speaker Distance |
+|:-----:|:--------------------:|:-------------------------:|:----------------:|
+| **S1** | **0.008** | **0.004** | **0.332** |
+| **S1-mini** | **0.011** | **0.005** | **0.380** |
+
+### **Best Model on TTS-Arena2**
+
+OpenAudio S1 has achieved the **#1 ranking** on [TTS-Arena2](https://arena.speechcolab.org/), a benchmark for text-to-speech evaluation:
+
+<div align="center">
+    <img src="assets/Elo.jpg" alt="TTS-Arena2 Ranking" style="width: 75%;" />
+</div>
+
+### **Speech Control**
 OpenAudio S1 **supports a variety of emotion, tone, and special markers** to enhance speech synthesis:
 
 - **Basic emotions**:
 ```
-(angry) (sad) (excited) (surprised) (satisfied) (delighted)
-(scared) (worried) (upset) (nervous) (frustrated) (depressed)
-(empathetic) (embarrassed) (disgusted) (moved) (proud) (relaxed)
-(grateful) (confident) (interested) (curious) (confused) (joyful)
+(angry) (sad) (excited) (surprised) (satisfied) (delighted)
+(scared) (worried) (upset) (nervous) (frustrated) (depressed)
+(empathetic) (embarrassed) (disgusted) (moved) (proud) (relaxed)
+(grateful) (confident) (interested) (curious) (confused) (joyful)
 ```
 
 - **Advanced emotions**:
 ```
-(disdainful) (unhappy) (anxious) (hysterical) (indifferent) 
-(impatient) (guilty) (scornful) (panicked) (furious) (reluctant)
-(keen) (disapproving) (negative) (denying) (astonished) (serious)
-(sarcastic) (conciliative) (comforting) (sincere) (sneering)
-(hesitating) (yielding) (painful) (awkward) (amused)
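+(disdainful) (unhappy) (anxious) (hysterical) (indifferent)
+(impatient) (guilty) (scornful) (panicked) (furious) (reluctant)
+(keen) (disapproving) (negative) (denying) (astonished) (serious)
+(sarcastic) (conciliative) (comforting) (sincere) (sneering)
+(hesitating) (yielding) (painful) (awkward) (amused)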
 ```
 
+(English, Chinese, and Japanese are supported now, with more languages coming soon!)
+
 - **Tone markers**:
 ```
-(in a hurry tone) (shouting) (screaming) (whispering) (soft tone)
+(in a hurry tone) (shouting) (screaming) (whispering) (soft tone)
 ```
 
 - **Special audio effects**:
 ```
-(laughing) (chuckling) (sobbing) (crying loudly) (sighing) (panting)
-(groaning) (crowd laughing) (background laughter) (audience laughing)
+(laughing) (chuckling) (sobbing) (crying loudly) (sighing) (panting)
+(groaning) (crowd laughing) (background laughter) (audience laughing)
 ```
 
 You can also use Ha,ha,ha for control; many other uses are waiting for you to explore.
 
-### **Excellent TTS Quality**
-
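-We evaluated model performance with the Seed TTS Eval metrics: OpenAudio S1 achieves **0.008 WER** and **0.004 CER** on English text, a significant improvement over previous models. (English, automatic evaluation based on OpenAI gpt-4o-transcribe; speaker distance computed with Revai/pyannote-wespeaker-voxceleb-resnet34-LM.)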
-
-| 모델 | 단어 오류율 (WER) | 문자 오류율 (CER) | 화자 거리 |
-|-------|----------------------|---------------------------|------------------|
-| **S1** | **0.008**  | **0.004**  | **0.332** |
-| **S1-mini** | **0.011** | **0.005** | **0.380** |
-
 ### **Two Model Types**
 
+<div align="center">
+
 | Model | Size | Availability | Features |
 |-------|------|--------------|----------|
-| **S1** | 4B parameters | Available on [fish.audio](fish.audio) | Full-featured flagship model |
+| **S1** | 4B parameters | Available on [fish.audio](https://fish.audio) | Full-featured flagship model |
 | **S1-mini** | 0.5B parameters | Available on the Hugging Face [hf space](https://huggingface.co/spaces/fishaudio/openaudio-s1-mini) | Lightweight version with core capabilities |
 
+</div>
+
 Both S1 and S1-mini incorporate online Reinforcement Learning from Human Feedback (RLHF).
 
 ## **Features**
@@ -110,16 +133,30 @@ Both S1 and S1-mini incorporate online Reinforcement Learning from Human Feedback (RLHF).
 
 7. **GUI Inference:** Offers a PyQt6 graphical interface that works seamlessly with the API server. Supports Linux, Windows, and macOS. [See GUI](https://github.com/AnyaCoder/fish-speech-gui).
 
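-8. **Deploy-Friendly:** Easily set up an inference server with native support for Linux, Windows, and macOS, minimizing speed loss.
+8. **Deploy-Friendly:** Easily set up an inference server with native support for Linux and Windows (macOS coming soon), minimizing speed loss.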
 
-## **Disclaimer**
+## **Media and Demos**
 
-We assume no responsibility for any illegal use of the codebase. Please refer to the DMCA and other relevant laws in your region.
+<!-- <div align="center"> -->
 
-## **Media and Demos**
+<h3><strong>Social Media</strong></h3>
+<a href="https://x.com/FishAudio/status/1929915992299450398" target="_blank">
+    <img src="https://img.shields.io/badge/𝕏-최신_데모-black?style=for-the-badge&logo=x&logoColor=white" alt="Latest Demo on X" />
+</a>
 
-#### 🚧 Coming Soon
-Video demos and tutorials are currently in development.
+<h3><strong>Interactive Demos</strong></h3>
+
+<a href="https://fish.audio" target="_blank">
+    <img src="https://img.shields.io/badge/Fish_Audio-OpenAudio_S1_체험-blue?style=for-the-badge" alt="Try OpenAudio S1" />
+</a>
+<a href="https://huggingface.co/spaces/fishaudio/openaudio-s1-mini" target="_blank">
+    <img src="https://img.shields.io/badge/Hugging_Face-S1_Mini_체험-yellow?style=for-the-badge" alt="Try S1 Mini" />
+</a>
+
+<h3><strong>Video Showcases</strong></h3>
+<div align="center">
+<iframe width="560" height="315" src="https://www.youtube.com/embed/SYuPvd7m06A" title="OpenAudio S1 Video" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
+</div>
 
 ## **Documentation**
 

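Below is a minimal, hypothetical Python sketch showing how the parenthesized markers could be checked before synthesis. It assumes markers are written inline in the input text as `(marker)`, which is how the lists above present them. `KNOWN_MARKERS` contains only a small subset of the English marker names from those lists. `find_markers` and `unknown_markers` are illustrative helpers and are not part of fish-speech or the fish.audio API.

```python
import re

# Hypothetical subset of the English marker names listed above.
# Assumption: markers appear inline in the text as "(marker)".
KNOWN_MARKERS = {
    "angry", "sad", "excited", "surprised", "whispering", "shouting",
    "soft tone", "in a hurry tone", "laughing", "chuckling", "sighing",
}

# Matches one parenthesized group without nested parentheses.
MARKER_RE = re.compile(r"\(([^()]+)\)")


def find_markers(text: str) -> list[str]:
    """Return every parenthesized marker in the order it appears."""
    return [m.strip() for m in MARKER_RE.findall(text)]


def unknown_markers(text: str, known: set[str] = KNOWN_MARKERS) -> list[str]:
    """Return markers that are not in the known vocabulary (likely typos)."""
    return [m for m in find_markers(text) if m not in known]


if __name__ == "__main__":
    script = "(excited) We did it! (laughing) Ha,ha,ha. (whisprering) Don't tell anyone."
    print(find_markers(script))     # ['excited', 'laughing', 'whisprering']
    print(unknown_markers(script))  # ['whisprering']
```

This check flags a misspelled marker such as `(whisprering)` before the text is sent for synthesis. The README does not say how the model handles a marker it does not recognize.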
+ 63 - 37
docs/pt/index.md

@@ -28,69 +28,81 @@
 
 ---
 
+!!! note "License Notice"
+    This codebase is released under the **Apache License**, and all model weights are released under the **CC-BY-NC-SA-4.0 License**. See [LICENSE](LICENSE) for more details.
+
 !!! warning "Legal Disclaimer"
-    We assume no responsibility for any illegal use of the codebase. Please refer to the local laws on the DMCA (Digital Millennium Copyright Act) and other relevant laws in your area.
-    
-    **License:** This codebase is released under the Apache 2.0 license, and all models are released under the CC-BY-NC-SA-4.0 license.
+    We assume no responsibility for any illegal use of the codebase. Please refer to your local laws on the DMCA and other relevant laws.
 
 ## **Introduction**
 
 We are excited to announce that we have rebranded to **OpenAudio**, introducing a new series of advanced text-to-speech models that builds on the foundation of Fish-Speech with significant improvements and new capabilities.
 
-**Openaudio-S1-mini**: [Video](To be uploaded); [Hugging Face](https://huggingface.co/fishaudio/openaudio-s1-mini);
+**OpenAudio-S1-mini**: [Blog](https://openaudio.com/blogs/s1); [Video](https://www.youtube.com/watch?v=SYuPvd7m06A); [Hugging Face](https://huggingface.co/fishaudio/openaudio-s1-mini);
 
 **Fish-Speech v1.5**: [Video](https://www.bilibili.com/video/BV1EKiDYBE4o/); [Hugging Face](https://huggingface.co/fishaudio/fish-speech-1.5);
 
-## **Highlights** ✨
+## **Highlights**
+
+### **Excellent TTS Quality**
+
+We use the Seed TTS Eval metrics to evaluate model performance, and the results show that OpenAudio S1 achieves **0.008 WER** and **0.004 CER** on English text, significantly better than previous models. (English, automatic evaluation based on transcripts from OpenAI gpt-4o-transcribe; speaker distance measured with Revai/pyannote-wespeaker-voxceleb-resnet34-LM)
+
+| Model | Word Error Rate (WER) | Character Error Rate (CER) | Speaker Distance |
+|:-----:|:--------------------:|:-------------------------:|:----------------:|
+| **S1** | **0.008** | **0.004** | **0.332** |
+| **S1-mini** | **0.011** | **0.005** | **0.380** |
 
-### **Emotion Control**
-OpenAudio S1 **supports a variety of emotional, tone, and special markers** to enhance speech synthesis:
+### **Best Model on TTS-Arena2**
+
+OpenAudio S1 achieved the **#1 ranking** on [TTS-Arena2](https://arena.speechcolab.org/), a benchmark for evaluating text-to-speech models:
+
+<div align="center">
+    <img src="../assets/Elo.jpg" alt="TTS-Arena2 Ranking" style="width: 75%;" />
+</div>
+
+### **Speech Control**
+OpenAudio S1 **supports a variety of emotional, tone, and special markers** to enhance speech synthesis:
 
 - **Basic emotions**:
 ```
-(angry) (sad) (excited) (surprised) (satisfied) (delighted)
-(scared) (worried) (upset) (nervous) (frustrated) (depressed)
-(empathetic) (embarrassed) (disgusted) (moved) (proud) (relaxed)
-(grateful) (confident) (interested) (curious) (confused) (joyful)
+(raivoso) (triste) (animado) (surpreso) (satisfeito) (encantado) 
+(com medo) (preocupado) (chateado) (nervoso) (frustrado) (deprimido)
+(empático) (envergonhado) (nojento) (comovido) (orgulhoso) (relaxado)
+(grato) (confiante) (interessado) (curioso) (confuso) (alegre)
 ```
 
 - **Advanced emotions**:
 ```
-(disdainful) (unhappy) (anxious) (hysterical) (indifferent) 
-(impatient) (guilty) (scornful) (panicked) (furious) (reluctant)
-(keen) (disapproving) (negative) (denying) (astonished) (serious)
-(sarcastic) (conciliative) (comforting) (sincere) (sneering)
-(hesitating) (yielding) (painful) (awkward) (amused)
+(desdenhoso) (infeliz) (ansioso) (histérico) (indiferente) 
+(impaciente) (culpado) (desprezível) (em pânico) (furioso) (relutante)
+(entusiasmado) (desaprovador) (negativo) (negando) (espantado) (sério)
+(sarcástico) (conciliador) (consolador) (sincero) (zombeteiro)
+(hesitante) (cedendo) (doloroso) (constrangido) (divertido)
 ```
 
+(English, Chinese, and Japanese are supported now, with more languages coming soon!)
+
 - **Tone markers**:
 ```
-(in a hurry tone) (shouting) (screaming) (whispering) (soft tone)
+(em tom de pressa) (gritando) (berrando) (sussurrando) (tom suave)
 ```
 
 - **Special audio effects**:
 ```
-(laughing) (chuckling) (sobbing) (crying loudly) (sighing) (panting)
-(groaning) (crowd laughing) (background laughter) (audience laughing)
+(rindo) (gargalhando) (soluçando) (chorando alto) (suspirando) (ofegante)
+(gemendo) (risada da multidão) (risada de fundo) (risada da plateia)
 ```
 
 You can also use Ha,ha,ha for control, and many other uses are waiting for you to explore.
 
-### **Excellent TTS Quality**
-
-We use the Seed TTS Eval metrics to evaluate model performance, and the results show that OpenAudio S1 achieves **0.008 WER** and **0.004 CER** on English text, significantly better than previous models. (English, automatic evaluation based on transcripts from OpenAI gpt-4o-transcribe; speaker distance measured with Revai/pyannote-wespeaker-voxceleb-resnet34-LM)
+### **Two Model Types**
 
-| Model | Word Error Rate (WER) | Character Error Rate (CER) | Speaker Distance |
-|-------|----------------------|---------------------------|------------------|
-| **S1** | **0.008**  | **0.004**  | **0.332** |
-| **S1-mini** | **0.011** | **0.005** | **0.380** |
+We offer two model variants to suit different needs:
 
-### **Two Model Types**
+- **OpenAudio S1 (4B parameters)**: Our full-featured flagship model, available on [fish.audio](https://fish.audio), offering the highest-quality speech synthesis with all advanced features.
 
-| Model | Size | Availability | Features |
-|-------|------|--------------|----------|
-| **S1** | 4B parameters | Available on [fish.audio](fish.audio) | Full-featured flagship model |
-| **S1-mini** | 0.5B parameters | Available on the Hugging Face [hf space](https://huggingface.co/spaces/fishaudio/openaudio-s1-mini) | Distilled version with core capabilities |
+- **OpenAudio S1-mini (0.5B parameters)**: A distilled version with core capabilities, available on the [Hugging Face Space](https://huggingface.co/spaces/fishaudio/openaudio-s1-mini), optimized for faster inference while maintaining excellent quality.
 
 Both S1 and S1-mini incorporate online Reinforcement Learning from Human Feedback (RLHF).
 
@@ -110,16 +122,30 @@ Both S1 and S1-mini incorporate online Reinforcement Learning from Human Feedback (RLHF).
 
 7. **GUI Inference:** Offers a PyQt6 graphical interface that works seamlessly with the API server. Supports Linux, Windows, and macOS. [See GUI](https://github.com/AnyaCoder/fish-speech-gui).
 
-8. **Deploy-Friendly:** Easily set up an inference server with native support for Linux, Windows, and macOS, minimizing speed loss.
+8. **Deploy-Friendly:** Easily set up an inference server with native support for Linux and Windows (macOS coming soon), minimizing speed loss.
 
-## **Disclaimer**
+## **Media and Demos**
 
-We assume no responsibility for any illegal use of the codebase. Please refer to your local laws on the DMCA and other related laws.
+<!-- <div align="center"> -->
 
-## **Media and Demos**
+<h3><strong>Social Media</strong></h3>
+<a href="https://x.com/FishAudio/status/1929915992299450398" target="_blank">
+    <img src="https://img.shields.io/badge/𝕏-Demo_Mais_Recente-black?style=for-the-badge&logo=x&logoColor=white" alt="Latest Demo on X" />
+</a>
+
+<h3><strong>Interactive Demos</strong></h3>
 
-#### 🚧 Coming Soon
-Video demos and tutorials are currently in development.
+<a href="https://fish.audio" target="_blank">
+    <img src="https://img.shields.io/badge/Fish_Audio-Experimente_OpenAudio_S1-blue?style=for-the-badge" alt="Try OpenAudio S1" />
+</a>
+<a href="https://huggingface.co/spaces/fishaudio/openaudio-s1-mini" target="_blank">
+    <img src="https://img.shields.io/badge/Hugging_Face-Experimente_S1_Mini-yellow?style=for-the-badge" alt="Try S1 Mini" />
+</a>
+
+<h3><strong>Video Showcases</strong></h3>
+<div align="center">
+<iframe width="560" height="315" src="https://www.youtube.com/embed/SYuPvd7m06A" title="OpenAudio S1 Video" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
+</div>
 
 ## **Documentation**
 

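The WER and CER figures in the tables above are edit-distance ratios. Each one counts the substitutions, deletions, and insertions that separate the transcript of the synthesized audio from the reference text, then divides by the reference length. WER measures length in words and CER in characters, so a WER of 0.008 is roughly 8 word errors per 1,000 reference words. The plain Python sketch below illustrates those standard definitions. It is not the Seed TTS Eval implementation, and it skips the text normalization (case, punctuation) that real evaluation pipelines usually apply before scoring.

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: ref token missing from hyp
                curr[j - 1] + 1,         # insertion: extra token in hyp
                prev[j - 1] + (r != h),  # substitution, free when tokens match
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hypothesis.split()) / len(ref)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


print(round(wer("the cat sat on the mat", "the cat sat on a mat"), 3))  # 0.167
print(cer("hello", "hallo"))  # 0.2
```

A single wrong word in a six-word sentence already gives a WER of about 0.167. The reported 0.008 is therefore an average over a large evaluation set, not a per-sentence score.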
+ 64 - 38
docs/zh/index.md

@@ -20,7 +20,7 @@
 <a target="_blank" href="https://hub.docker.com/r/fishaudio/fish-speech">
 <img alt="Docker" src="https://img.shields.io/docker/pulls/fishaudio/fish-speech?style=flat-square&logo=docker"/>
 </a>
-</div>
+</div>·
 
 <strong>Try it now:</strong> <a href="https://fish.audio">Fish Audio Playground</a> | <strong>Learn more:</strong> <a href="https://openaudio.com">OpenAudio Website</a>
 
@@ -28,69 +28,81 @@
 
 ---
 
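-!!! warning "Legal Notice"
-    We assume no responsibility for any illegal use of the codebase. Please refer to the DMCA (Digital Millennium Copyright Act) and other relevant laws in your region.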
-    
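-    **License:** This codebase is released under the Apache 2.0 license, and all models are released under the CC-BY-NC-SA-4.0 license.
+!!! note "License Notice"
+    This codebase is released under the **Apache License**, and all model weights are released under the **CC-BY-NC-SA-4.0 License**. See [LICENSE](LICENSE) for more details.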
+
+!!! warning "Legal Disclaimer"
+    We assume no responsibility for any illegal use of the codebase. Please refer to the DMCA and other relevant laws in your region.
 
 ## **Introduction**
 
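 We are excited to announce that we have rebranded to **OpenAudio**, introducing a new series of advanced text-to-speech models that builds on the foundation of Fish-Speech with significant improvements and new capabilities.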
 
-**Openaudio-S1-mini**: [Video](To be uploaded); [Hugging Face](https://huggingface.co/fishaudio/openaudio-s1-mini);
+**OpenAudio-S1-mini**: [Blog](https://openaudio.com/blogs/s1); [Video](https://www.youtube.com/watch?v=SYuPvd7m06A); [Hugging Face](https://huggingface.co/fishaudio/openaudio-s1-mini);
 
 **Fish-Speech v1.5**: [Video](https://www.bilibili.com/video/BV1EKiDYBE4o/); [Hugging Face](https://huggingface.co/fishaudio/fish-speech-1.5);
 
-## **Highlights** ✨
+## **Highlights**
+
+### **Excellent TTS Quality**
+
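+We use the Seed TTS Eval metrics to evaluate model performance, and the results show that OpenAudio S1 achieves **0.008 WER** and **0.004 CER** on English text, significantly better than previous models. (English, automatic evaluation based on transcripts from OpenAI gpt-4o-transcribe; speaker distance measured with Revai/pyannote-wespeaker-voxceleb-resnet34-LM)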
+
+| Model | Word Error Rate (WER) | Character Error Rate (CER) | Speaker Distance |
+|:-----:|:--------------------:|:-------------------------:|:----------------:|
+| **S1** | **0.008** | **0.004** | **0.332** |
+| **S1-mini** | **0.011** | **0.005** | **0.380** |
 
-### **Emotion Control**
+### **Best Model on TTS-Arena2**
+
+OpenAudio S1 achieved the **#1 ranking** on [TTS-Arena2](https://arena.speechcolab.org/), a benchmark for evaluating text-to-speech models:
+
+<div align="center">
+    <img src="../assets/Elo.jpg" alt="TTS-Arena2 Ranking" style="width: 75%;" />
+</div>
+
+### **Speech Control**
 OpenAudio S1 **supports a variety of emotional, tone, and special markers** to enhance speech synthesis:
 
 - **Basic emotions**:
 ```
-(angry) (sad) (excited) (surprised) (satisfied) (delighted)
-(scared) (worried) (upset) (nervous) (frustrated) (depressed)
-(empathetic) (embarrassed) (disgusted) (moved) (proud) (relaxed)
-(grateful) (confident) (interested) (curious) (confused) (joyful)
+(生气) (伤心) (兴奋) (惊讶) (满意) (高兴) 
+(害怕) (担心) (沮丧) (紧张) (失望) (沮丧)
+(共情) (尴尬) (厌恶) (感动) (自豪) (放松)
+(感激) (自信) (感兴趣) (好奇) (困惑) (快乐)
 ```
 
 - **Advanced emotions**:
 ```
-(disdainful) (unhappy) (anxious) (hysterical) (indifferent) 
-(impatient) (guilty) (scornful) (panicked) (furious) (reluctant)
-(keen) (disapproving) (negative) (denying) (astonished) (serious)
-(sarcastic) (conciliative) (comforting) (sincere) (sneering)
-(hesitating) (yielding) (painful) (awkward) (amused)
+(鄙视) (不高兴) (焦虑) (歇斯底里) (漠不关心) 
+(不耐烦) (内疚) (轻蔑) (恐慌) (愤怒) (不情愿)
+(渴望) (不赞成) (否定) (否认) (惊讶) (严肃)
+(讽刺) (和解) (安慰) (真诚) (冷笑)
+(犹豫) (让步) (痛苦) (尴尬) (开心)
 ```
 
+(English, Chinese, and Japanese are supported now, with more languages coming soon!)
+
 - **Tone markers**:
 ```
-(in a hurry tone) (shouting) (screaming) (whispering) (soft tone)
+(匆忙的语调) (大喊) (尖叫) (耳语) (轻声)
 ```
 
 - **Special audio effects**:
 ```
-(laughing) (chuckling) (sobbing) (crying loudly) (sighing) (panting)
-(groaning) (crowd laughing) (background laughter) (audience laughing)
+(笑) (轻笑) (抽泣) (大哭) (叹气) (喘气)
+(呻吟) (群体笑声) (背景笑声) (观众笑声)
 ```
 
 You can also use Ha,ha,ha for control, and many other uses are waiting for you to explore.
 
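-### **Outstanding TTS Quality**
-
-We use the Seed TTS Eval metrics to evaluate model performance, and the results show that OpenAudio S1 achieves **0.008 WER** and **0.004 CER** on English text, significantly better than previous models. (English, automatic evaluation based on transcripts from OpenAI gpt-4o-transcribe; speaker distance measured with Revai/pyannote-wespeaker-voxceleb-resnet34-LM)
+### **Two Model Types**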
 
-| Model | Word Error Rate (WER) | Character Error Rate (CER) | Speaker Distance |
-|-------|----------------------|---------------------------|------------------|
-| **S1** | **0.008**  | **0.004**  | **0.332** |
-| **S1-mini** | **0.011** | **0.005** | **0.380** |
+We offer two model variants to suit different needs:
 
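-### **Two Model Types**
+- **OpenAudio S1 (4B parameters)**: Our full-featured flagship model, available on [fish.audio](https://fish.audio), offering the highest-quality speech synthesis with all advanced features.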
 
-| Model | Size | Availability | Features |
-|-------|------|--------------|----------|
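-| **S1** | 4B parameters | Available on [fish.audio](fish.audio) | Full-featured flagship model |
-| **S1-mini** | 0.5B parameters | Available on the Hugging Face [hf space](https://huggingface.co/spaces/fishaudio/openaudio-s1-mini) | Distilled version with core capabilities |
+- **OpenAudio S1-mini (0.5B parameters)**: A distilled version with core capabilities, available on the [Hugging Face Space](https://huggingface.co/spaces/fishaudio/openaudio-s1-mini), optimized for faster inference while maintaining excellent quality.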
 
 Both S1 and S1-mini incorporate online Reinforcement Learning from Human Feedback (RLHF).
 
@@ -110,16 +122,30 @@ Both S1 and S1-mini incorporate online Reinforcement Learning from Human Feedback (RLHF).
 
 7. **GUI Inference:** Offers a PyQt6 graphical interface that works seamlessly with the API server. Supports Linux, Windows, and macOS. [See GUI](https://github.com/AnyaCoder/fish-speech-gui).
 
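-8. **Deploy-Friendly:** Easily set up an inference server with native support for Linux, Windows, and macOS, minimizing speed loss.
+8. **Deploy-Friendly:** Easily set up an inference server with native support for Linux and Windows (macOS coming soon), minimizing speed loss.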
 
-## **Disclaimer**
+## **Media and Demos**
 
-We assume no responsibility for any illegal use of the codebase. Please refer to your local laws on the DMCA and other related laws.
+<!-- <div align="center"> -->
 
-## **Media and Demos**
+<h3><strong>Social Media</strong></h3>
+<a href="https://x.com/FishAudio/status/1929915992299450398" target="_blank">
+    <img src="https://img.shields.io/badge/𝕏-最新演示-black?style=for-the-badge&logo=x&logoColor=white" alt="Latest Demo on X" />
+</a>
+
+<h3><strong>Interactive Demos</strong></h3>
 
-#### 🚧 Coming Soon
-Video demos and tutorials are currently in development.
+<a href="https://fish.audio" target="_blank">
+    <img src="https://img.shields.io/badge/Fish_Audio-试用_OpenAudio_S1-blue?style=for-the-badge" alt="Try OpenAudio S1" />
+</a>
+<a href="https://huggingface.co/spaces/fishaudio/openaudio-s1-mini" target="_blank">
+    <img src="https://img.shields.io/badge/Hugging_Face-试用_S1_Mini-yellow?style=for-the-badge" alt="Try S1 Mini" />
+</a>
+
+<h3><strong>Video Showcases</strong></h3>
+<div align="center">
+<iframe width="560" height="315" src="https://www.youtube.com/embed/SYuPvd7m06A" title="OpenAudio S1 Video" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
+</div>
 
 ## **Documentation**