Fix utf-8 encoding

Lengyue, 2 years ago
parent
commit
1854897e3c
1 changed file with 1 addition and 1 deletion

+ 1 - 1
preparing_data/whisper_asr.py

@@ -142,7 +142,7 @@ def main(folder: str, rank: int, world_size: int, num_workers: int):
 
         # Write to file
         for file, transcription in zip(batch, trascriptions):
-            Path(file).with_suffix(".whisper.txt").write_text(transcription)
+            Path(file).with_suffix(".whisper.txt").write_text(transcription, encoding="utf-8")
 
     logger.info(
         f"{RANK_STR}Finished processing {len(files)} files, {total_time / 3600:.2f} hours of audio"
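
The change above passes an explicit encoding because `Path.write_text` otherwise falls back to the platform's default text encoding, which is not UTF-8 on every system, so non-ASCII transcriptions can fail to encode or be written in a locale-specific encoding. A minimal standalone sketch of the same pattern follows; the helper `write_transcription`, the file name `sample.wav`, and the sample text are hypothetical and not part of `whisper_asr.py`.

```python
import locale
import tempfile
from pathlib import Path


def write_transcription(audio_file: Path, transcription: str) -> Path:
    """Write a transcription next to the audio file, always as UTF-8.

    Hypothetical helper mirroring the pattern in the diff above.
    """
    out_path = audio_file.with_suffix(".whisper.txt")
    out_path.write_text(transcription, encoding="utf-8")
    return out_path


with tempfile.TemporaryDirectory() as tmp:
    audio = Path(tmp) / "sample.wav"  # hypothetical file name
    text = "你好，世界"  # non-ASCII transcription used for illustration
    out = write_transcription(audio, text)
    # The bytes on disk are UTF-8 regardless of the platform locale.
    on_disk_is_utf8 = out.read_bytes() == text.encode("utf-8")
    round_trip = out.read_text(encoding="utf-8")

# Locale-dependent encoding that text I/O typically falls back to
# when no explicit encoding argument is given.
default_encoding = locale.getpreferredencoding(False)
```

Readers of these `.whisper.txt` files should pass `encoding="utf-8"` as well, so that the round trip does not depend on the locale of the machine that reads them.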