Whisper-to-Normal Conversion Samples

Contact for the password: cezaryamamura@gmail.com

Author: Cezar Fumio Yamamura
Datasets: DynaVoicer
Note: Based on PhD thesis (text in progress)


CHAPTER 5 - Study of Voice Conversion Models Applied to Whispered Speech

WAV2VEC-VC: github: https://github.com/prairie-schooner/wav2vec-vc | paper: https://ieeexplore.ieee.org/document/10447984
LVC-VC: github: https://github.com/wonjune-kang/lvc-vc | paper: https://arxiv.org/abs/2205.09784
TRIAAN-VC: github: https://github.com/winddori2002/TriAAN-VC | paper: https://arxiv.org/abs/2303.09057
KNN-VC: github: https://github.com/bshall/knn-vc | paper: https://arxiv.org/abs/2305.18975

Source Speaker | Target Speaker | Wav2Vec-VC | LVC-VC | TRIAAN-VC | KNN-VC

CHAPTER 7.1 - KNN-VC Using Other SSL Feature Extractors

The baseline used in the KNN-VC paper is WavLM Large
Other SSL models tested: Data2Vec, HuBERT and Wav2Vec2
WavLM: huggingface: https://huggingface.co/microsoft/wavlm-large | paper: https://arxiv.org/pdf/2110.13900
Data2Vec: huggingface: https://huggingface.co/facebook/data2vec-audio-large-960h | paper: https://arxiv.org/abs/2202.03555
HuBERT: huggingface: https://huggingface.co/facebook/hubert-large-ls960-ft | paper: https://arxiv.org/pdf/2106.07447
Wav2Vec2: huggingface: https://huggingface.co/facebook/wav2vec2-large-960h | paper: https://arxiv.org/abs/2006.11477
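
For context on why the feature extractor can be swapped: in KNN-VC, as described in the linked paper, conversion comes down to a nearest-neighbour lookup over frame-level SSL features, so changing the encoder only changes the vectors being matched. The toy sketch below illustrates that matching step in plain Python. The function names, the tiny 2-dimensional vectors, and the use of short Python lists instead of real encoder outputs are assumptions made for illustration. It is not the thesis code, and a real pipeline would extract features with one of the models above and pass the result to a vocoder.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_convert(source_frames, target_frames, k=4):
    """Replace each source frame by the mean of its k most similar target frames.

    source_frames / target_frames: lists of equal-length feature vectors
    (hypothetical stand-ins for SSL encoder outputs).
    """
    converted = []
    for frame in source_frames:
        # Rank all target frames by similarity to the current source frame.
        ranked = sorted(
            target_frames,
            key=lambda t: cosine_similarity(frame, t),
            reverse=True,
        )
        neighbours = ranked[:k]
        # Uniform average of the selected neighbours, dimension by dimension.
        converted.append(
            [sum(n[d] for n in neighbours) / len(neighbours) for d in range(len(frame))]
        )
    return converted


# Toy usage: one source frame, three target frames, k = 2.
source = [[1.0, 0.0]]
target = [[2.0, 0.0], [0.0, 3.0], [4.0, 0.2]]
converted = knn_convert(source, target, k=2)
```

Because the lookup only sees vectors, the same function would apply to WavLM, Data2Vec, HuBERT or Wav2Vec2 features, provided the source and target features come from the same encoder.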

Source Speaker | Target Speaker | WavLM | Data2Vec | HuBERT | Wav2Vec2

CHAPTERS 7.2 to 7.4 - IMPROVING KNN-VC

Works that inspired each approach
7.2 - LoRA: huggingface: https://huggingface.co/docs/peft/main/conceptual_guides/lora | paper: https://arxiv.org/abs/2106.09685
7.3 - wKNN-VC: paper: https://ieeexplore.ieee.org/document/10800247/
7.4 - MLP-WavLM-VC: created by the author
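
As background for the LoRA entry (7.2), the sketch below shows the general low-rank update from the linked LoRA paper: a frozen weight matrix W is augmented with a trainable product B·A scaled by alpha / r. How the thesis applies LoRA to KNN-VC is not stated on this page, so this only illustrates the mechanism. The function names, matrix sizes and plain-list representation are assumptions for illustration.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha=1.0):
    """Compute W x + (alpha / r) * B (A x).

    W: frozen weights, shape (d_out, d_in)
    A: trainable down-projection, shape (r, d_in)
    B: trainable up-projection, shape (d_out, r); zero at initialisation
    """
    r = len(A)  # rank of the update = number of rows in A
    base = matvec(W, x)
    low_rank = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * l for b, l in zip(base, low_rank)]


# Toy usage with rank r = 1 on a 2x2 layer.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]
B_init = [[0.0], [0.0]]   # zero init: output equals the frozen layer's output
B_tuned = [[1.0], [2.0]]  # hypothetical values after fine-tuning
x = [2.0, 3.0]
y_init = lora_forward(W, A, B_init, x)
y_tuned = lora_forward(W, A, B_tuned, x, alpha=2.0)
```

Only A and B are trained, which keeps the number of updated parameters small compared with fine-tuning W directly.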

Source Speaker | Target Speaker | KNN-VC | KNN-VC (LoRA) | wKNN-VC | MLP-WavLM-VC