TY - JOUR
T1 - Enhancement of waveform reconstruction for variational autoencoder-based neural audio synthesis with pitch information and automatic music transcription
AU - Lee, Seokjin
AU - Kim, Minhan
AU - Shin, Seunghyeon
AU - Lee, Daeho
AU - Jang, Inseon
AU - Lim, Wootaek
N1 - Publisher Copyright: © 2022 Proceedings of the International Congress on Acoustics. All rights reserved.
PY - 2022
Y1 - 2022
N2 - In recent audio signal processing, analysis and synthesis models based on deep generative models have been applied for various purposes, such as audio signal compression. In particular, some recently developed structures, such as vector-quantized variational autoencoders, can compress speech signals. However, extending these techniques to compress audio and music signals is challenging. Recently, a realtime audio variational autoencoder (RAVE) method for high-quality audio waveform synthesis was developed. The RAVE method synthesizes audio waveforms better than conventional methods; however, it still encounters certain challenges, such as missing low-pitched notes or generating irrelevant pitches. Therefore, its reconstruction performance must be improved before it can be applied to audio reconstruction problems such as audio signal compression. Thus, we propose an enhanced RAVE structure based on a conditional variational autoencoder (CVAE) and an automatic music transcription model to improve the reconstruction performance of music signal waveforms.
KW - Audio Synthesis
KW - Generation Model
KW - Variational Autoencoder
UR - http://www.scopus.com/inward/record.url?scp=85192519100&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85192519100
SN - 2226-7808
JO - Proceedings of the International Congress on Acoustics
JF - Proceedings of the International Congress on Acoustics
T2 - 24th International Congress on Acoustics, ICA 2022
Y2 - 24 October 2022 through 28 October 2022
ER -