Enhancement of waveform reconstruction for variational autoencoder-based neural audio synthesis with pitch information and automatic music transcription

Seokjin Lee, Minhan Kim, Seunghyeon Shin, Daeho Lee, Inseon Jang, Wootaek Lim

Research output: Contribution to journal › Conference article › peer-review

Abstract

In recent audio signal processing, analysis and synthesis models based on deep generative models have been applied for various purposes, such as audio signal compression. In particular, some recently developed structures, such as vector-quantized variational autoencoders, can compress speech signals. However, extending these techniques to compress general audio and music signals is challenging. Recently, a realtime audio variational autoencoder (RAVE) method was developed for high-quality audio waveform synthesis. RAVE synthesizes audio waveforms better than conventional methods; however, it still exhibits certain failure modes, such as missing low-pitched notes or generating irrelevant pitches. Its reconstruction performance must therefore be improved before it can be applied to audio reconstruction problems such as audio signal compression. Thus, we propose an enhanced RAVE structure based on a conditional variational autoencoder (CVAE) and an automatic music transcription model to improve the reconstruction performance of music signal waveforms.
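The core idea described in the abstract, conditioning a VAE's decoder on pitch information obtained from an automatic music transcription model, can be illustrated with a minimal sketch. Everything below (the layer sizes, the one-hot pitch representation, and the random stand-in weights) is an illustrative assumption, not the paper's actual RAVE-based architecture:

```python
import numpy as np

# Minimal CVAE-style forward pass with pitch conditioning (illustrative sketch;
# dimensions, pitch encoding, and weights are assumptions, not the paper's model).
rng = np.random.default_rng(0)

AUDIO_DIM, LATENT_DIM, PITCH_CLASSES = 64, 8, 12

# Random matrices standing in for trained encoder/decoder layers.
W_enc = rng.normal(size=(AUDIO_DIM, 2 * LATENT_DIM)) * 0.1
W_dec = rng.normal(size=(LATENT_DIM + PITCH_CLASSES, AUDIO_DIM)) * 0.1

def encode(x):
    """Map an audio frame to Gaussian latent parameters (mu, log_var)."""
    h = x @ W_enc
    return h[:LATENT_DIM], h[LATENT_DIM:]

def reparameterize(mu, log_var):
    """Standard VAE reparameterization: z = mu + sigma * eps."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def decode(z, pitch_onehot):
    """Decode conditioned on pitch: a pitch vector (e.g. produced by an
    automatic music transcription model) is concatenated to the latent code,
    which is the usual way a CVAE injects a conditioning signal."""
    return np.concatenate([z, pitch_onehot]) @ W_dec

x = rng.normal(size=AUDIO_DIM)     # dummy audio frame
pitch = np.eye(PITCH_CLASSES)[9]   # hypothetical transcribed pitch class

mu, log_var = encode(x)
z = reparameterize(mu, log_var)
x_hat = decode(z, pitch)           # reconstruction guided by pitch information
```

The point of the sketch is only the data flow: pitch enters the decoder as an extra conditioning input alongside the latent code, so the decoder no longer has to infer pitch from the latent alone.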

Original language: English
Journal: Proceedings of the International Congress on Acoustics
State: Published - 2022
Event: 24th International Congress on Acoustics, ICA 2022 - Gyeongju, Korea, Republic of
Duration: 24 Oct 2022 - 28 Oct 2022

Keywords

  • Audio Synthesis
  • Generation Model
  • Variational Autoencoder
