벡터 양자화 변분 오토인코더 기반의 폴리 음향 생성 모델을 위한 잔여 벡터 양자화 적용 연구

Translated title of the contribution: A study on the application of residual vector quantization for vector quantized-variational autoencoder-based foley sound generation model

Research output: Contribution to journal › Article › peer-review

Abstract

Among the Foley sound generation models that have recently begun to be studied, techniques that combine a Vector Quantized-Variational AutoEncoder (VQ-VAE) structure with a generative model such as PixelSNAIL are an important research subject. Meanwhile, in the field of deep learning-based acoustic signal compression, residual vector quantization is reported to be more suitable than the conventional VQ-VAE structure. This paper therefore studies whether residual vector quantization can be applied effectively to Foley sound generation. To this end, the paper applies residual vector quantization to the conventional VQ-VAE-based Foley sound generation model and, in particular, derives a model that is compatible with existing models such as PixelSNAIL and does not increase computational resource consumption. The model was evaluated in an experiment using DCASE2023 Task7 data. The results show that the proposed model improves the Fréchet audio distance by about 0.3. The improvement was limited, however, which is believed to be due to the reduced time-frequency resolution adopted in order not to increase computational resource consumption.
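For readers unfamiliar with residual vector quantization, the following is a minimal sketch of the general idea, not the model proposed in the paper. Each stage quantizes whatever error the previous stage left behind, and the decoder sums the selected codewords. The function names, the two-stage setup, and the hand-picked codebook values are all assumptions made for illustration.

```python
def nearest(codebook, vec):
    """Return (index, codeword) of the entry closest to vec (squared Euclidean distance)."""
    best = min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )
    return best, codebook[best]


def rvq_encode(vec, codebooks):
    """Quantize vec stage by stage; each stage encodes the residual of the previous one."""
    residual = list(vec)
    indices = []
    for codebook in codebooks:
        idx, code = nearest(codebook, residual)
        indices.append(idx)
        residual = [r - c for r, c in zip(residual, code)]
    return indices


def rvq_decode(indices, codebooks):
    """Reconstruct the vector by summing the chosen codeword from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


def sq_error(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical toy codebooks: a coarse first stage and a finer second stage.
codebooks = [
    [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]],
    [[0.25, 0.25], [-0.25, 0.25], [0.25, -0.25], [-0.25, -0.25]],
]

x = [0.8, -1.3]                           # an illustrative 2-D latent vector
codes = rvq_encode(x, codebooks)          # one index per stage
x_hat = rvq_decode(codes, codebooks)      # sum of the selected codewords
coarse = rvq_decode(codes[:1], codebooks[:1])  # first stage only, for comparison
```

In this toy case the second stage reduces the squared reconstruction error relative to the first stage alone, which is the motivation for stacking quantizers. A trained model would learn its codebooks from data instead of using fixed values like these.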

Original language: Korean
Pages (from-to): 243-252
Number of pages: 10
Journal: Journal of the Acoustical Society of Korea
Volume: 43
Issue number: 2
State: Published - 2024

Keywords

  • Foley sound generation model
  • Generative Model
  • Residual vector quantization
  • Vector Quantized-Variational AutoEncoder (VQ-VAE)
