Finish inference.md (#987)

* Finish inference.md

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix index not found error.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
PoTaTo 10 months ago
Parent
Commit
b8369249e3

+ 108 - 0
docs/en/index.md

@@ -0,0 +1,108 @@
+# Inference
+
+Because the vocoder model has changed, inference now needs more VRAM than before; 12 GB is recommended for smooth inference.
+
+We support command-line, HTTP API, and WebUI inference; choose whichever method you prefer.
+
+## Download Weights
+
+First you need to download the model weights:
+
+```bash
+huggingface-cli download fishaudio/openaudio-s1-mini --local-dir checkpoints/openaudio-s1-mini
+```
+
+## Command Line Inference
+
+!!! note
+    If you plan to let the model randomly choose a voice timbre, you can skip this step.
+
+### 1. Get VQ tokens from reference audio
+
+```bash
+python fish_speech/models/dac/inference.py \
+    -i "ref_audio_name.wav" \
+    --checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth"
+```
+
+You should get a `fake.npy` and a `fake.wav`.
+
+### 2. Generate semantic tokens from text:
+
+```bash
+python fish_speech/models/text2semantic/inference.py \
+    --text "The text you want to convert" \
+    --prompt-text "Your reference text" \
+    --prompt-tokens "fake.npy" \
+    --checkpoint-path "checkpoints/openaudio-s1-mini" \
+    --num-samples 2 \
+    --compile # if you want a faster speed
+```
+
+This command will create a `codes_N` file in the working directory, where N is an integer starting from 0.
+
+!!! note
+    You may want to use `--compile` to fuse CUDA kernels for faster inference (~30 tokens/second -> ~500 tokens/second).
+    Conversely, if you do not need the acceleration, simply omit the `--compile` flag.
+
+!!! info
+    For GPUs that do not support bf16, you may need to use the `--half` parameter.
+
+### 3. Generate vocals from semantic tokens:
+
+#### VQGAN Decoder
+
+!!! warning "Future Warning"
+    We have kept the interface accessible from the original path (tools/vqgan/inference.py), but this interface may be removed in subsequent releases, so please change your code as soon as possible.
+
+```bash
+python fish_speech/models/dac/inference.py \
+    -i "codes_0.npy" \
+    --checkpoint-path "checkpoints/openaudiio-s1-mini/codec.pth"
+```
+
+## HTTP API Inference
+
+We provide an HTTP API for inference. You can use the following command to start the server:
+
+```bash
+python -m tools.api_server \
+    --listen 0.0.0.0:8080 \
+    --llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
+    --decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
+    --decoder-config-name modded_dac_vq
+```
+
+> If you want to speed up inference, you can add the `--compile` parameter.
+
+After that, you can view and test the API at http://127.0.0.1:8080/.
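Once the server is up, you can call it from a script. The sketch below only builds the request; the `/v1/tts` route and the JSON payload shape are assumptions for illustration — confirm the actual schema against the API docs served at the URL above.

```python
import json
import urllib.request


def build_tts_request(text: str, base_url: str = "http://127.0.0.1:8080") -> urllib.request.Request:
    # NOTE: the /v1/tts route and the payload fields are assumptions;
    # check the live API docs for the real schema before using this.
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/tts",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending the request (requires the server to be running):
# with urllib.request.urlopen(build_tts_request("Hello!")) as resp:
#     audio_bytes = resp.read()
```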
+
+## GUI Inference
+
+[Download client](https://github.com/AnyaCoder/fish-speech-gui/releases)
+
+## WebUI Inference
+
+You can start the WebUI using the following command:
+
+```bash
+python -m tools.run_webui \
+    --llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
+    --decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
+    --decoder-config-name modded_dac_vq
+```
+
+Or simply
+
+```bash
+python -m tools.run_webui
+```
+> If you want to speed up inference, you can add the `--compile` parameter.
+
+
+!!! note
+    You can save the label file and reference audio file in advance to the `references` folder in the main directory (which you need to create yourself), so that you can directly call them in the WebUI.
+
+!!! note
+    You can use Gradio environment variables, such as `GRADIO_SHARE`, `GRADIO_SERVER_PORT`, `GRADIO_SERVER_NAME` to configure WebUI.
+
+Enjoy!

+ 0 - 0
docs/en/inference.md


+ 107 - 0
docs/ja/index.md

@@ -0,0 +1,107 @@
+# 推論
+
+ボコーダーモデルが変更されたため、以前よりも多くのVRAMが必要です。スムーズな推論には12GBを推奨します。
+
+推論には、コマンドライン、HTTP API、WebUIをサポートしており、お好きな方法を選択できます。
+
+## 重みのダウンロード
+
+まず、モデルの重みをダウンロードする必要があります:
+
+```bash
+huggingface-cli download fishaudio/openaudio-s1-mini --local-dir checkpoints/openaudio-s1-mini
+```
+
+## コマンドライン推論
+
+!!! note
+    モデルにランダムに音色を選択させる場合は、この手順をスキップできます。
+
+### 1. 参照音声からVQトークンを取得
+
+```bash
+python fish_speech/models/dac/inference.py \
+    -i "ref_audio_name.wav" \
+    --checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth"
+```
+
+`fake.npy` と `fake.wav` が得られるはずです。
+
+### 2. テキストからセマンティックトークンを生成:
+
+```bash
+python fish_speech/models/text2semantic/inference.py \
+    --text "変換したいテキスト" \
+    --prompt-text "参照テキスト" \
+    --prompt-tokens "fake.npy" \
+    --checkpoint-path "checkpoints/openaudio-s1-mini" \
+    --num-samples 2 \
+    --compile # より高速化を求める場合
+```
+
+このコマンドは、作業ディレクトリに `codes_N` ファイルを作成します(Nは0から始まる整数)。
+
+!!! note
+    より高速な推論のために `--compile` を使用してCUDAカーネルを融合することができます(約30トークン/秒 -> 約500トークン/秒)。
+    対応して、加速を使用しない場合は、`--compile` パラメータをコメントアウトできます。
+
+!!! info
+    bf16をサポートしないGPUの場合、`--half` パラメータの使用が必要かもしれません。
+
+### 3. セマンティックトークンから音声を生成:
+
+#### VQGANデコーダー
+
+!!! warning "将来の警告"
+    元のパス(tools/vqgan/inference.py)からアクセス可能なインターフェースを維持していますが、このインターフェースは後続のリリースで削除される可能性があるため、できるだけ早くコードを変更してください。
+
+```bash
+python fish_speech/models/dac/inference.py \
+    -i "codes_0.npy" \
+    --checkpoint-path "checkpoints/openaudiio-s1-mini/codec.pth"
+```
+
+## HTTP API推論
+
+推論用のHTTP APIを提供しています。以下のコマンドでサーバーを開始できます:
+
+```bash
+python -m tools.api_server \
+    --listen 0.0.0.0:8080 \
+    --llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
+    --decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
+    --decoder-config-name modded_dac_vq
+```
+
+> 推論を高速化したい場合は、`--compile` パラメータを追加できます。
+
+その後、http://127.0.0.1:8080/ でAPIを表示・テストできます。
+
+## GUI推論 
+[クライアントをダウンロード](https://github.com/AnyaCoder/fish-speech-gui/releases)
+
+## WebUI推論
+
+以下のコマンドでWebUIを開始できます:
+
+```bash
+python -m tools.run_webui \
+    --llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
+    --decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
+    --decoder-config-name modded_dac_vq
+```
+
+または単純に
+
+```bash
+python -m tools.run_webui
+```
+> 推論を高速化したい場合は、`--compile` パラメータを追加できます。
+
+!!! note
+    ラベルファイルと参照音声ファイルをメインディレクトリの `references` フォルダに事前に保存することができます(自分で作成する必要があります)。これにより、WebUIで直接呼び出すことができます。
+
+!!! note
+    `GRADIO_SHARE`、`GRADIO_SERVER_PORT`、`GRADIO_SERVER_NAME` などのGradio環境変数を使用してWebUIを設定できます。
+
+お楽しみください!

+ 0 - 0
docs/ja/inference.md


+ 107 - 0
docs/ko/index.md

@@ -0,0 +1,107 @@
+# 추론
+
+보코더 모델이 변경되어 이전보다 더 많은 VRAM이 필요하며, 원활한 추론을 위해 12GB를 권장합니다.
+
+추론을 위해 명령줄, HTTP API, WebUI를 지원하며, 원하는 방법을 선택할 수 있습니다.
+
+## 가중치 다운로드
+
+먼저 모델 가중치를 다운로드해야 합니다:
+
+```bash
+huggingface-cli download fishaudio/openaudio-s1-mini --local-dir checkpoints/openaudio-s1-mini
+```
+
+## 명령줄 추론
+
+!!! note
+    모델이 임의로 음색을 선택하도록 하려면 이 단계를 건너뛸 수 있습니다.
+
+### 1. 참조 오디오에서 VQ 토큰 얻기
+
+```bash
+python fish_speech/models/dac/inference.py \
+    -i "ref_audio_name.wav" \
+    --checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth"
+```
+
+`fake.npy`와 `fake.wav`를 얻을 수 있습니다.
+
+### 2. 텍스트에서 의미 토큰 생성:
+
+```bash
+python fish_speech/models/text2semantic/inference.py \
+    --text "변환하고 싶은 텍스트" \
+    --prompt-text "참조 텍스트" \
+    --prompt-tokens "fake.npy" \
+    --checkpoint-path "checkpoints/openaudio-s1-mini" \
+    --num-samples 2 \
+    --compile # 더 빠른 속도를 원한다면
+```
+
+이 명령은 작업 디렉토리에 `codes_N` 파일을 생성합니다. 여기서 N은 0부터 시작하는 정수입니다.
+
+!!! note
+    더 빠른 추론을 위해 `--compile`을 사용하여 CUDA 커널을 융합할 수 있습니다(약 30 토큰/초 -> 약 500 토큰/초).
+    이에 따라 가속을 사용하지 않으려면 `--compile` 매개변수를 주석 처리할 수 있습니다.
+
+!!! info
+    bf16을 지원하지 않는 GPU의 경우 `--half` 매개변수를 사용해야 할 수 있습니다.
+
+### 3. 의미 토큰에서 음성 생성:
+
+#### VQGAN 디코더
+
+!!! warning "향후 경고"
+    원래 경로(tools/vqgan/inference.py)에서 액세스 가능한 인터페이스를 유지하고 있지만, 이 인터페이스는 향후 릴리스에서 제거될 수 있으므로 가능한 한 빨리 코드를 변경해 주세요.
+
+```bash
+python fish_speech/models/dac/inference.py \
+    -i "codes_0.npy" \
+    --checkpoint-path "checkpoints/openaudiio-s1-mini/codec.pth"
+```
+
+## HTTP API 추론
+
+추론을 위한 HTTP API를 제공합니다. 다음 명령으로 서버를 시작할 수 있습니다:
+
+```bash
+python -m tools.api_server \
+    --listen 0.0.0.0:8080 \
+    --llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
+    --decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
+    --decoder-config-name modded_dac_vq
+```
+
+> 추론을 가속화하려면 `--compile` 매개변수를 추가할 수 있습니다.
+
+그 후 http://127.0.0.1:8080/ 에서 API를 보고 테스트할 수 있습니다.
+
+## GUI 추론 
+[클라이언트 다운로드](https://github.com/AnyaCoder/fish-speech-gui/releases)
+
+## WebUI 추론
+
+다음 명령으로 WebUI를 시작할 수 있습니다:
+
+```bash
+python -m tools.run_webui \
+    --llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
+    --decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
+    --decoder-config-name modded_dac_vq
+```
+
+또는 간단히
+
+```bash
+python -m tools.run_webui
+```
+> 추론을 가속화하려면 `--compile` 매개변수를 추가할 수 있습니다.
+
+!!! note
+    라벨 파일과 참조 오디오 파일을 메인 디렉토리의 `references` 폴더에 미리 저장할 수 있습니다(직접 생성해야 함). 이렇게 하면 WebUI에서 직접 호출할 수 있습니다.
+
+!!! note
+    `GRADIO_SHARE`, `GRADIO_SERVER_PORT`, `GRADIO_SERVER_NAME`과 같은 Gradio 환경 변수를 사용하여 WebUI를 구성할 수 있습니다.
+
+즐기세요!

+ 0 - 0
docs/ko/inference.md


+ 107 - 0
docs/pt/index.md

@@ -0,0 +1,107 @@
+# Inferência
+
+Como o modelo vocoder foi alterado, você precisa de mais VRAM do que antes, sendo recomendado 12GB para inferência fluente.
+
+Suportamos linha de comando, API HTTP e WebUI para inferência, você pode escolher qualquer método que preferir.
+
+## Baixar Pesos
+
+Primeiro você precisa baixar os pesos do modelo:
+
+```bash
+huggingface-cli download fishaudio/openaudio-s1-mini --local-dir checkpoints/openaudio-s1-mini
+```
+
+## Inferência por Linha de Comando
+
+!!! note
+    Se você planeja deixar o modelo escolher aleatoriamente um timbre de voz, pode pular esta etapa.
+
+### 1. Obter tokens VQ do áudio de referência
+
+```bash
+python fish_speech/models/dac/inference.py \
+    -i "ref_audio_name.wav" \
+    --checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth"
+```
+
+Você deve obter um `fake.npy` e um `fake.wav`.
+
+### 2. Gerar tokens semânticos do texto:
+
+```bash
+python fish_speech/models/text2semantic/inference.py \
+    --text "O texto que você quer converter" \
+    --prompt-text "Seu texto de referência" \
+    --prompt-tokens "fake.npy" \
+    --checkpoint-path "checkpoints/openaudio-s1-mini" \
+    --num-samples 2 \
+    --compile # se você quiser uma velocidade mais rápida
+```
+
+Este comando criará um arquivo `codes_N` no diretório de trabalho, onde N é um inteiro começando de 0.
+
+!!! note
+    Você pode querer usar `--compile` para fundir kernels CUDA para inferência mais rápida (~30 tokens/segundo -> ~500 tokens/segundo).
+    Correspondentemente, se você não planeja usar aceleração, pode comentar o parâmetro `--compile`.
+
+!!! info
+    Para GPUs que não suportam bf16, você pode precisar usar o parâmetro `--half`.
+
+### 3. Gerar vocais a partir de tokens semânticos:
+
+#### Decodificador VQGAN
+
+!!! warning "Aviso Futuro"
+    Mantivemos a interface acessível do caminho original (tools/vqgan/inference.py), mas esta interface pode ser removida em versões subsequentes, então por favor altere seu código o mais breve possível.
+
+```bash
+python fish_speech/models/dac/inference.py \
+    -i "codes_0.npy" \
+    --checkpoint-path "checkpoints/openaudiio-s1-mini/codec.pth"
+```
+
+## Inferência com API HTTP
+
+Fornecemos uma API HTTP para inferência. Você pode usar o seguinte comando para iniciar o servidor:
+
+```bash
+python -m tools.api_server \
+    --listen 0.0.0.0:8080 \
+    --llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
+    --decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
+    --decoder-config-name modded_dac_vq
+```
+
+> Se você quiser acelerar a inferência, pode adicionar o parâmetro `--compile`.
+
+Depois disso, você pode visualizar e testar a API em http://127.0.0.1:8080/.
+
+## Inferência GUI 
+[Baixar cliente](https://github.com/AnyaCoder/fish-speech-gui/releases)
+
+## Inferência WebUI
+
+Você pode iniciar o WebUI usando o seguinte comando:
+
+```bash
+python -m tools.run_webui \
+    --llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
+    --decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
+    --decoder-config-name modded_dac_vq
+```
+
+Ou simplesmente
+
+```bash
+python -m tools.run_webui
+```
+> Se você quiser acelerar a inferência, pode adicionar o parâmetro `--compile`.
+
+!!! note
+    Você pode salvar o arquivo de rótulo e o arquivo de áudio de referência antecipadamente na pasta `references` no diretório principal (que você precisa criar), para que possa chamá-los diretamente no WebUI.
+
+!!! note
+    Você pode usar variáveis de ambiente do Gradio, como `GRADIO_SHARE`, `GRADIO_SERVER_PORT`, `GRADIO_SERVER_NAME` para configurar o WebUI.
+
+Divirta-se!

+ 0 - 0
docs/pt/inference.md


+ 107 - 0
docs/zh/index.md

@@ -0,0 +1,107 @@
+# 推理
+
+由于声码器模型已更改,您需要比以前更多的显存,建议使用12GB显存以便流畅推理。
+
+我们支持命令行、HTTP API 和 WebUI 进行推理,您可以选择任何您喜欢的方法。
+
+## 下载权重
+
+首先您需要下载模型权重:
+
+```bash
+huggingface-cli download fishaudio/openaudio-s1-mini --local-dir checkpoints/openaudio-s1-mini
+```
+
+## 命令行推理
+
+!!! note
+    如果您计划让模型随机选择音色,可以跳过此步骤。
+
+### 1. 从参考音频获取VQ tokens
+
+```bash
+python fish_speech/models/dac/inference.py \
+    -i "ref_audio_name.wav" \
+    --checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth"
+```
+
+您应该会得到一个 `fake.npy` 和一个 `fake.wav`。
+
+### 2. 从文本生成语义tokens:
+
+```bash
+python fish_speech/models/text2semantic/inference.py \
+    --text "您想要转换的文本" \
+    --prompt-text "您的参考文本" \
+    --prompt-tokens "fake.npy" \
+    --checkpoint-path "checkpoints/openaudio-s1-mini" \
+    --num-samples 2 \
+    --compile # 如果您想要更快的速度
+```
+
+此命令将在工作目录中创建一个 `codes_N` 文件,其中N是从0开始的整数。
+
+!!! note
+    您可能想要使用 `--compile` 来融合CUDA内核以获得更快的推理速度(约30 tokens/秒 -> 约500 tokens/秒)。
+    相应地,如果您不打算使用加速,可以去掉 `--compile` 参数。
+
+!!! info
+    对于不支持bf16的GPU,您可能需要使用 `--half` 参数。
+
+### 3. 从语义tokens生成人声:
+
+#### VQGAN 解码器
+
+!!! warning "未来警告"
+    我们保留了从原始路径(tools/vqgan/inference.py)访问的接口,但此接口可能在后续版本中被移除,请尽快更改您的代码。
+
+```bash
+python fish_speech/models/dac/inference.py \
+    -i "codes_0.npy" \
+    --checkpoint-path "checkpoints/openaudiio-s1-mini/codec.pth"
+```
+
+## HTTP API 推理
+
+我们提供HTTP API进行推理。您可以使用以下命令启动服务器:
+
+```bash
+python -m tools.api_server \
+    --listen 0.0.0.0:8080 \
+    --llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
+    --decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
+    --decoder-config-name modded_dac_vq
+```
+
+> 如果您想要加速推理,可以添加 `--compile` 参数。
+
+之后,您可以在 http://127.0.0.1:8080/ 查看和测试API。
+
+## GUI 推理 
+[下载客户端](https://github.com/AnyaCoder/fish-speech-gui/releases)
+
+## WebUI 推理
+
+您可以使用以下命令启动WebUI:
+
+```bash
+python -m tools.run_webui \
+    --llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
+    --decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
+    --decoder-config-name modded_dac_vq
+```
+
+或者简单地
+
+```bash
+python -m tools.run_webui
+```
+> 如果您想要加速推理,可以添加 `--compile` 参数。
+
+!!! note
+    您可以提前将标签文件和参考音频文件保存到主目录的 `references` 文件夹中(需要自己创建),这样就可以在WebUI中直接调用它们。
+
+!!! note
+    您可以使用Gradio环境变量,如 `GRADIO_SHARE`、`GRADIO_SERVER_PORT`、`GRADIO_SERVER_NAME` 来配置WebUI。
+
+尽情享受吧!

+ 0 - 0
docs/zh/inference.md


+ 6 - 20
mkdocs.yml

@@ -56,11 +56,8 @@ theme:
         code: Roboto Mono
 
 nav:
-  - Introduction: index.md
-  - Finetune: finetune.md
-  - Inference: inference.md
-  - Start Agent: start_agent.md
-  - Samples: samples.md
+  - Installation: en/install.md
+  - Inference: en/inference.md
 
 # Plugins
 plugins:
@@ -83,37 +80,26 @@ plugins:
           name: 简体中文
           build: true
           nav:
-            - 介绍: zh/index.md
-            - 微调: zh/finetune.md
+            - 安装: zh/install.md
             - 推理: zh/inference.md
-            - 启动Agent: zh/start_agent.md
-            - 例子: zh/samples.md
         - locale: ja
           name: 日本語
           build: true
           nav:
-            - Fish Speech の紹介: ja/index.md
-            - 微調整: ja/finetune.md
+            - インストール: ja/install.md
             - 推論: ja/inference.md
-            - スタートエージェント: ja/start_agent.md
-            - サンプル: ja/samples.md
         - locale: pt
           name: Português (Brasil)
           build: true
           nav:
-            - Introdução: pt/index.md
-            - Ajuste Fino: pt/finetune.md
+            - Instalação: pt/install.md
             - Inferência: pt/inference.md
-            - Agente inicial: pt/start_agent.md
-            - Amostras: pt/samples.md
         - locale: ko
           name: 한국어
           build: true
           nav:
-            - 소개: ko/index.md
-            - 파인튜닝: ko/finetune.md
+            - 설치: ko/install.md
             - 추론: ko/inference.md
-            - 샘플: ko/samples.md
 
 markdown_extensions:
   - pymdownx.highlight:

+ 2 - 2
tools/download_models.py

@@ -22,7 +22,7 @@ def check_and_download_files(repo_id, file_list, local_dir):
 
 
 # 1st
-repo_id_1 = "fishaudio/fish-speech-1.5"
+repo_id_1 = "fishaudio/openaudio-s1-mini"
 local_dir_1 = "./checkpoints/openaudio-s1-mini"
 files_1 = [
     ".gitattributes",
@@ -31,7 +31,7 @@ files_1 = [
     "special_tokens.json",
     "tokenizer.tiktoken",
     "config.json",
-    "firefly-gan-vq-fsq-8x1024-21hz-generator.pth",
+    "codec.pth",
 ]
 
 # 3rd

+ 0 - 169
tools/llama/build_dataset.py

@@ -1,169 +0,0 @@
-import itertools
-import os
-import re
-from collections import defaultdict
-from functools import partial
-from multiprocessing import Pool
-from pathlib import Path
-
-import click
-import numpy as np
-from loguru import logger
-from tqdm import tqdm
-
-from fish_speech.datasets.protos.text_data_pb2 import Semantics, Sentence, TextData
-from fish_speech.datasets.protos.text_data_stream import pack_pb_stream
-from fish_speech.utils.file import load_filelist
-
-# To avoid CPU overload
-os.environ["MKL_NUM_THREADS"] = "1"
-os.environ["OMP_NUM_THREADS"] = "1"
-
-
-def task_generator_folder(root: Path, text_extension: str):
-    files = list(tqdm(Path(root).rglob("*.npy"), desc=f"Loading {root}"))
-    files = sorted(files)
-
-    grouped_files = defaultdict(list)
-    for file in tqdm(files, desc=f"Grouping {root}"):
-        p = str(file.parent)
-        speaker = file.parent.name
-
-        try:
-            if isinstance(text_extension, str):
-                texts = [file.with_suffix(text_extension).read_text(encoding="utf-8")]
-            else:
-                texts = [
-                    file.with_suffix(ext).read_text(encoding="utf-8")
-                    for ext in text_extension
-                ]
-        except Exception as e:
-            logger.error(f"Failed to read text {file}: {e}")
-            continue
-
-        grouped_files[p].append((speaker, file, texts))
-
-    logger.info(
-        f"Found {len(grouped_files)} groups in {root}, {list(grouped_files.keys())[:5]}..."
-    )
-
-    for i in grouped_files.values():
-        subset = [(f, t) for _, f, t in i]
-        yield i[0][0], subset, "folder"
-
-
-def task_generator_filelist(filelist):
-    grouped_files = defaultdict(list)
-    for filename, speaker, _, text in load_filelist(filelist):
-        grouped_files[speaker].append((Path(filename), [text]))
-
-    logger.info(f"Found {len(grouped_files)} groups in {filelist}")
-    for speaker, values in grouped_files.items():
-        yield speaker, values, "filelist"
-
-
-def run_task(task):
-    name, subset, source = task
-
-    # Parse the files
-    sentences = []
-    for file, texts in subset:
-        np_file = file.with_suffix(".npy")
-        if np_file.exists() is False:
-            logger.warning(f"Can't find {np_file}")
-            continue
-
-        new_texts = []
-
-        for text in texts:
-            # Simple cleaning: replace { xxx } and < xxx > with space
-            text = re.sub(r"\{.*?\}", " ", text)
-            text = re.sub(r"<.*?>", " ", text)
-            text = re.sub(r"\s+", " ", text)
-            new_texts.append(text)
-
-        try:
-            semantics = np.load(np_file)
-        except Exception as e:
-            logger.error(f"Failed to parse {file}: {e}")
-            continue
-
-        if isinstance(semantics, np.ndarray):
-            semantics = semantics.tolist()
-
-        sentences.append(
-            Sentence(
-                texts=new_texts,
-                semantics=[Semantics(values=s) for s in semantics],
-            )
-        )
-
-    # Pack the sentences
-    return pack_pb_stream(
-        TextData(
-            source=source,
-            name=name,
-            sentences=sentences,
-        )
-    )
-
-
-@click.command()
-@click.option(
-    "--input",
-    type=click.Path(path_type=Path),
-    required=True,
-    help="A folder containing the dataset or a filelist",
-    multiple=True,
-)
-@click.option(
-    "--output", type=click.Path(path_type=Path), default="data/quantized-dataset-ft"
-)
-@click.option("--num-workers", type=int, default=16)
-@click.option("--text-extension", type=str, default=[".txt"], multiple=True)
-@click.option(
-    "--shard-size", type=int, default=10, help="The maximum size of each shard in mb"
-)
-def main(input, output, num_workers, text_extension, shard_size):
-    generator_fns = []
-
-    for f in input:
-        assert f.exists(), f"{f} not found"
-
-        if f.is_dir():
-            generator_fn = task_generator_folder(f, text_extension)
-        else:
-            generator_fn = task_generator_filelist(f)
-
-        generator_fns.append(generator_fn)
-
-    generator_fn = itertools.chain(*generator_fns)
-    output.mkdir(parents=True, exist_ok=True)
-
-    dataset_fp = None
-    tar_idx = 0
-    written_size = 0
-
-    with Pool(num_workers) as p:
-        for result in tqdm(p.imap_unordered(run_task, generator_fn)):
-            if dataset_fp is None:
-                dataset_fp = open(Path(output) / f"{tar_idx:08d}.protos", "wb")
-
-            dataset_fp.write(result)
-            written_size += len(result)
-
-            if written_size > shard_size * 1024 * 1024:
-                logger.info(f"Finished writing {tar_idx} shards to {output}")
-                dataset_fp.close()
-                dataset_fp = None
-                written_size = 0
-                tar_idx += 1
-
-    if dataset_fp is not None:
-        dataset_fp.close()
-
-    logger.info(f"Finished writing {tar_idx + 1} shards to {output}")
-
-
-if __name__ == "__main__":
-    main()

+ 0 - 171
tools/llama/eval_in_context.py

@@ -1,171 +0,0 @@
-import pyrootutils
-import torch
-import torch.nn.functional as F
-from matplotlib import pyplot as plt
-from transformers import AutoTokenizer
-
-# register eval resolver and root
-pyrootutils.setup_root(__file__, indicator=".project-root", pythonpath=True)
-
-from torch.utils.data import DataLoader
-
-from fish_speech.datasets.semantic import AutoAugTextDataset, TextDataCollator
-from fish_speech.models.text2semantic.inference import load_model
-
-
-def smooth(
-    scalars: list[float], weight: float
-) -> list[float]:  # Weight between 0 and 1
-    last = scalars[0]  # First value in the plot (first timestep)
-    smoothed = list()
-    for point in scalars:
-        smoothed_val = last * weight + (1 - weight) * point  # Calculate smoothed value
-        smoothed.append(smoothed_val)  # Save it
-        last = smoothed_val  # Anchor the last smoothed value
-
-    return smoothed
-
-
-@torch.inference_mode()
-def analyze_one_model(loader, config, weight, max_length):
-    device = "cuda" if torch.cuda.is_available() else "cpu"
-    model = load_model(
-        config,
-        weight,
-        device,
-        torch.bfloat16,
-        max_length,
-        compile=False,
-    )[0]
-
-    current_step = 0
-    model.eval()
-
-    semantic_loss_sum = torch.zeros(
-        max_length,
-        dtype=torch.float32,
-        device=device,
-    )
-    counter = torch.zeros(
-        max_length,
-        dtype=torch.long,
-        device=device,
-    )
-
-    for batch in loader:
-        batch = {k: v.to(device) for k, v in batch.items()}
-
-        labels = batch["labels"]
-        outputs = model(
-            inp=batch["inputs"],
-            key_padding_mask=batch["attention_masks"],
-        )
-
-        token_logits = outputs.token_logits
-        codebook_logits = outputs.codebook_logits
-
-        # Generate labels
-        base_loss = F.cross_entropy(
-            token_logits.reshape(-1, token_logits.size(-1)),
-            labels[:, 0].reshape(-1),
-            ignore_index=-100,
-            reduction="none",
-        )
-
-        codebook_labels = labels[:, 1 : 1 + model.config.num_codebooks].mT
-        semantic_loss = F.cross_entropy(
-            codebook_logits.reshape(-1, codebook_logits.size(-1)),
-            codebook_labels.reshape(-1),
-            ignore_index=-100,
-            reduction="none",
-        )
-
-        base_loss = base_loss.reshape(labels[:, 0].shape)
-        semantic_loss = semantic_loss.reshape(codebook_labels.shape)
-
-        semantic_loss_frame = semantic_loss.mean(-1)
-        pad_pos = codebook_labels.sum(-1) == -100 * model.config.num_codebooks
-
-        for loss_sample, pad in zip(semantic_loss_frame, pad_pos):
-            semantic_loss_sum[~pad] += loss_sample[~pad]
-            counter[~pad] += 1
-
-        current_step += 1
-        if current_step == 10:
-            break
-
-    semantic_loss = semantic_loss.cpu()
-    counter = counter.cpu()
-    xs, ys = [], []
-
-    for i, (loss, count) in enumerate(zip(semantic_loss_sum, counter)):
-        if count > 0:
-            xs.append(i)
-            ys.append((loss / count).item())  # for better loss visualization
-
-    smoothed_ys = smooth(ys, 0.95)
-
-    # Unload model
-    del model
-    torch.cuda.empty_cache()
-
-    return xs, ys, smoothed_ys
-
-
-def main():
-    tokenizer = AutoTokenizer.from_pretrained("fishaudio/fish-speech-1")
-    max_length = 4096
-
-    ds = AutoAugTextDataset(
-        ["data/protos/sft/云天河"],
-        tokenizer=tokenizer,
-        use_speaker=False,
-        interactive_prob=1.0,
-        max_length=max_length,
-    )
-
-    loader = DataLoader(
-        ds,
-        batch_size=8,
-        collate_fn=TextDataCollator(tokenizer, max_length=max_length),
-        num_workers=0,
-        shuffle=False,
-    )
-
-    plt.figure(figsize=(10, 5), dpi=200)
-
-    plt.xlabel("Frame")
-    plt.ylabel("Loss")
-    plt.yscale("log")
-    plt.title("Semantic Loss")
-    plt.grid(which="both", axis="both")
-    plt.xlim(0, max_length)
-
-    tests = [
-        (
-            "pertrain-medium",
-            "dual_ar_2_codebook_medium",
-            "checkpoints/text2semantic-pretrain-medium-2k-v1.pth",
-        ),
-        (
-            "sft-medium",
-            "dual_ar_2_codebook_medium",
-            "checkpoints/text2semantic-sft-medium-v1.1-4k.pth",
-        ),
-        (
-            "sft-large",
-            "dual_ar_2_codebook_large",
-            "checkpoints/text2semantic-sft-large-v1.1-4k.pth",
-        ),
-    ]
-
-    for name, config, weight in tests:
-        xs, _, smoothed_ys = analyze_one_model(loader, config, weight, max_length)
-        plt.plot(xs, smoothed_ys, label=name)
-
-    plt.legend()
-    plt.savefig("semantic_loss.png")
-
-
-if __name__ == "__main__":
-    main()

+ 0 - 96
tools/llama/merge_lora.py

@@ -1,96 +0,0 @@
-import shutil
-from copy import deepcopy
-from pathlib import Path
-
-import click
-import hydra
-import torch
-from hydra import compose, initialize
-from hydra.utils import instantiate
-from loguru import logger
-
-from fish_speech.models.text2semantic.llama import BaseTransformer
-from fish_speech.models.text2semantic.lora import get_merged_state_dict
-
-
-@click.command()
-@click.option("--lora-config", type=str, default="r_8_alpha_16")
-@click.option("--base-weight", type=str, default="checkpoints/fish-speech-1.4")
-@click.option("--lora-weight", type=str, required=True)
-@click.option("--output", type=str, required=True)
-def merge(lora_config, base_weight, lora_weight, output):
-    output = Path(output)
-    logger.info(
-        f"Merging {base_weight} and {lora_weight} into {output} with {lora_config}"
-    )
-
-    with initialize(version_base="1.3", config_path="../../fish_speech/configs/lora"):
-        cfg = compose(config_name=lora_config)
-
-    lora_config = instantiate(cfg)
-    logger.info(f"Loaded lora model with config {lora_config}")
-
-    llama_model = BaseTransformer.from_pretrained(
-        path=base_weight,
-        load_weights=True,
-        lora_config=lora_config,
-    )
-    logger.info(f"Loaded llama model")
-
-    llama_state_dict = llama_model.state_dict()
-    llama_state_dict = {k: v for k, v in llama_state_dict.items() if "lora" not in k}
-    llama_state_dict_copy = deepcopy(llama_state_dict)
-    lora_state_dict = torch.load(lora_weight, map_location="cpu", weights_only=False)
-
-    if "state_dict" in llama_state_dict:
-        llama_state_dict = llama_state_dict["state_dict"]
-
-    if "state_dict" in lora_state_dict:
-        lora_state_dict = lora_state_dict["state_dict"]
-
-    # remove prefix model.
-    if any(k.startswith("model.") for k in llama_state_dict.keys()):
-        llama_state_dict = {
-            k.replace("model.", ""): v
-            for k, v in llama_state_dict.items()
-            if k.startswith("model.")
-        }
-    if any(k.startswith("model.") for k in lora_state_dict.keys()):
-        lora_state_dict = {
-            k.replace("model.", ""): v
-            for k, v in lora_state_dict.items()
-            if k.startswith("model.")
-        }
-
-    logger.info(f"Found {len(llama_state_dict)} keys in llama model")
-    logger.info(f"Found {len(lora_state_dict)} keys in lora model")
-
-    merged_state_dict = llama_state_dict | lora_state_dict
-    llama_model.load_state_dict(merged_state_dict, strict=True)
-    logger.info(f"Merged model loaded")
-
-    # Trigger eval mode to merge lora
-    llama_model.eval()
-    llama_model.save_pretrained(output, drop_lora=True)
-    logger.info(f"Saved merged model to {output}, validating")
-
-    new_state_dict = torch.load(output / "model.pth", map_location="cpu")
-    original_keys = set(llama_state_dict_copy.keys())
-
-    tolerance = 1e-5
-    for key in original_keys:
-        diff_l1 = (new_state_dict[key] - llama_state_dict_copy[key]).abs().sum().item()
-        if diff_l1 > tolerance:
-            logger.info(f"Significant difference found in key: {key}")
-            break
-
-    if diff_l1 <= tolerance:
-        logger.warning(
-            "Merged model seems identical to the original model. Further validation might be needed."
-        )
-    else:
-        logger.info("Merged model is different from the original model, check passed")
-
-
-if __name__ == "__main__":
-    merge()