Contact for the password: cezaryamamura@gmail.com
Author: Cezar Fumio Yamamura
Datasets: DynaVoicer
Note: Based on PhD thesis (text in progress)
WAV2VEC-VC: github: https://github.com/prairie-schooner/wav2vec-vc | paper: https://ieeexplore.ieee.org/document/10447984
LVC-VC: github: https://github.com/wonjune-kang/lvc-vc | paper: https://arxiv.org/abs/2205.09784
TRIAAN-VC: github: https://github.com/winddori2002/TriAAN-VC | paper: https://arxiv.org/abs/2303.09057
KNN-VC: github: https://github.com/bshall/knn-vc | paper: https://arxiv.org/abs/2305.18975
Source Speaker | Target Speaker | Wav2Vec-VC | LVC-VC | TRIAAN-VC | KNN-VC |
---|---|---|---|---|---|
Baseline: WavLM Large, the SSL encoder used in the KNN-VC paper
Other SSL models tested: Data2Vec, Hubert and Wav2Vec2
WavLM: huggingface: https://huggingface.co/microsoft/wavlm-large | paper: https://arxiv.org/pdf/2110.13900
Data2Vec: huggingface: https://huggingface.co/facebook/data2vec-audio-large-960h | paper: https://arxiv.org/abs/2202.03555
Hubert: huggingface: https://huggingface.co/facebook/hubert-large-ls960-ft | paper: https://arxiv.org/pdf/2106.07447
Wav2Vec2: huggingface: https://huggingface.co/facebook/wav2vec2-large-960h | paper: https://arxiv.org/abs/2006.11477
Source Speaker | Target Speaker | WavLM | Data2Vec | Hubert | Wav2Vec2 |
---|---|---|---|---|---|
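KNN-VC converts speech by matching frames in an SSL feature space: each source frame is replaced by the average of its k nearest target-speaker frames (by cosine similarity), and a vocoder turns the result back into audio. The sketch below shows only that matching step, in plain Python, with toy 2-D vectors standing in for real encoder features (in the KNN-VC paper these come from a WavLM Large layer). The function names, the default k = 4 and the toy data are illustrative assumptions, not the thesis or repository code. Replacing WavLM with Data2Vec, Hubert or Wav2Vec2 changes which encoder produces the vectors; the matching step itself is encoder-agnostic.

```python
import math
from typing import List

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def knn_match(query: List[Vector], matching_set: List[Vector], k: int = 4) -> List[Vector]:
    """Replace each source (query) frame with the mean of its k most similar target frames.

    Assumption: frames are plain lists of floats; the default k = 4 is illustrative.
    """
    if not matching_set:
        raise ValueError("matching_set must contain at least one target frame")
    k = min(k, len(matching_set))
    converted = []
    for q in query:
        # Rank target frames from most to least similar to this source frame.
        ranked = sorted(matching_set, key=lambda m: cosine_similarity(q, m), reverse=True)
        neighbours = ranked[:k]
        # Average the k neighbours dimension by dimension.
        converted.append([sum(dim_values) / k for dim_values in zip(*neighbours)])
    return converted


# Toy usage: 2-D vectors stand in for real SSL frame features.
source = [[1.0, 0.0], [0.0, 1.0]]
target = [[0.9, 0.1], [1.0, 0.2], [0.1, 0.9], [0.2, 1.0]]
print(knn_match(source, target, k=2))  # ≈ [[0.95, 0.15], [0.15, 0.95]]
```

Each source frame ends up built only from target-speaker frames, which is why the output takes on the target's voice while following the source's content.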
Inspired works
7.2 - LoRA: huggingface: https://huggingface.co/docs/peft/main/conceptual_guides/lora | paper: https://arxiv.org/abs/2106.09685
7.3 - wKNN-VC: paper: https://ieeexplore.ieee.org/document/10800247/
7.4 - MLP-WavLM-VC: created by the author
Source Speaker | Target Speaker | KNN-VC | KNN-VC (LoRA) | wKNN-VC | MLP-WavLM-VC |
---|---|---|---|---|---|
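LoRA adapts a frozen weight matrix W by adding a trainable low-rank update, h = W x + (alpha / r) B A x, where A is r x d_in, B is d_out x r, and B starts at zero so training begins from the unmodified layer. A minimal pure-Python sketch of that forward pass follows; the matrix sizes and values are toy assumptions, and how LoRA is attached to the encoder in the KNN-VC (LoRA) variant is not specified here.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector, alpha: float, r: int) -> Vector:
    """Compute h = W x + (alpha / r) * B (A x).

    W (d_out x d_in) stays frozen; only A (r x d_in) and B (d_out x r) are trained.
    """
    base = matvec(W, x)              # output of the frozen pretrained layer
    delta = matvec(B, matvec(A, x))  # low-rank update, rank at most r
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]


W = [[1.0, 2.0], [3.0, 4.0]]   # toy frozen weights (d_out = d_in = 2)
A = [[0.5, 0.5]]               # r = 1
B_init = [[0.0], [0.0]]        # B starts at zero, so the layer initially equals W x
B_trained = [[1.0], [2.0]]     # hypothetical values after some training
x = [1.0, 1.0]
print(lora_forward(W, A, B_init, x, alpha=2.0, r=1))     # [3.0, 7.0]
print(lora_forward(W, A, B_trained, x, alpha=2.0, r=1))  # [5.0, 11.0]
```

With rank r, only r * (d_in + d_out) parameters are trained instead of d_in * d_out, which is what makes LoRA cheap for fine-tuning large SSL encoders.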