Abstract
Recent audio signal processing techniques apply analysis and synthesis models based on deep generative models to a range of tasks, including audio signal compression. In particular, recently developed structures such as vector-quantized variational autoencoders can compress speech signals. However, extending these techniques to compress general audio and music signals is challenging. Recently, the realtime audio variational autoencoder (RAVE) method was developed for high-quality audio waveform synthesis. RAVE synthesizes audio waveforms better than conventional methods, but it still has shortcomings, such as missing low-pitched notes or generating irrelevant pitches. Its reconstruction performance must therefore be improved before it can be applied to audio reconstruction problems such as audio signal compression. We propose an enhanced RAVE structure, based on a conditional variational autoencoder (CVAE) and an automatic music transcription model, to improve the reconstruction performance for music signal waveforms.
| Original language | English |
|---|---|
| Journal | Proceedings of the International Congress on Acoustics |
| State | Published - 2022 |
| Event | 24th International Congress on Acoustics, ICA 2022 - Gyeongju, Republic of Korea; Duration: 24 Oct 2022 → 28 Oct 2022 |
Keywords
- Audio Synthesis
- Generation Model
- Variational Autoencoder
Fingerprint
Dive into the research topics of 'Enhancement of waveform reconstruction for variational autoencoder-based neural audio synthesis with pitch information and automatic music transcription'. Together they form a unique fingerprint.