A context-sensitive multi-tier deep learning framework for multimodal sentiment analysis

Ganesh Kumar P, Arul Antran Vijay S, Jothi Prakash V, Anand Paul, Anand Nayyar

Research output: Contribution to journal › Article › peer-review

11 Scopus citations

Abstract

One of the most appealing multidisciplinary research areas in Artificial Intelligence (AI) is Sentiment Analysis (SA). Due to the intricate and complementary interactions between modalities, Multimodal Sentiment Analysis (MSA) is a highly challenging task with a wide range of applications. Numerous deep learning models and techniques have been proposed for multimodal sentiment analysis, but they do not investigate the explicit context of words and are unable to model the diverse components of a sentence; hence, the full potential of such diverse data has not been explored. In this research, a Context-Sensitive Multi-Tier Deep Learning Framework (CS-MDF) is proposed for sentiment analysis on multimodal data. The CS-MDF uses a three-tier architecture to extract context-sensitive information. The first tier extracts unimodal features from the utterances: a Convolutional Neural Network (CNN) extracts text-based features, a 3D-CNN model extracts visual features, and the open-Source Media Interpretation by Large feature-space Extraction (openSMILE) toolkit extracts audio features. CNNs are well suited to text because they are particularly effective at identifying local patterns and dependencies in the data; this tier, however, ignores context-sensitive information when determining the features. The second tier takes the features extracted by the first tier and derives context-sensitive unimodal features using a Bi-directional Gated Recurrent Unit (BiGRU), which models inter-utterance links and uncovers contextual evidence. The outputs of the second tier are combined and passed to the third tier, which fuses the features from the different modalities and trains a single BiGRU model that provides the final classification. This approach applies the BiGRU to sequential data processing, exploiting the strengths of each modality and capturing their interdependencies. Experimental results on six real-life datasets (Flickr Images dataset, Multi-View Sentiment Analysis dataset, Getty Images dataset, Balanced Twitter for Sentiment Analysis dataset, CMU-MOSI dataset) show that the proposed CS-MDF model achieves better performance than ten state-of-the-art approaches, as validated by F1-score, precision, accuracy, and recall. An ablation study of the proposed framework demonstrates the viability of the design, and the Grad-CAM visualization technique is applied to visualize the aligned input image-text pairs learned by the CS-MDF model.
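For a concrete picture of the three-tier design, the sketch below is a minimal, hypothetical PyTorch rendering of the architecture described above; it is not the authors' implementation. Tier-1 extraction (CNN text features, 3D-CNN visual features, openSMILE audio features) is assumed to be precomputed, so each modality arrives as one feature vector per utterance, and all dimensions, layer sizes, and names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CSMDF(nn.Module):
    """Hypothetical sketch of the three-tier CS-MDF architecture.

    Tier 1 (unimodal extraction) is assumed precomputed: text features
    from a CNN, visual features from a 3D-CNN, and audio features from
    openSMILE, one vector per utterance. Tiers 2 and 3 use BiGRUs.
    """

    def __init__(self, text_dim=300, video_dim=512, audio_dim=128,
                 hidden_dim=128, num_classes=2):
        super().__init__()
        # Tier 2: one BiGRU per modality captures inter-utterance context.
        self.text_gru = nn.GRU(text_dim, hidden_dim,
                               batch_first=True, bidirectional=True)
        self.video_gru = nn.GRU(video_dim, hidden_dim,
                                batch_first=True, bidirectional=True)
        self.audio_gru = nn.GRU(audio_dim, hidden_dim,
                                batch_first=True, bidirectional=True)
        # Tier 3: a single BiGRU over the concatenated context-sensitive
        # features of all three modalities, followed by a classifier.
        self.fusion_gru = nn.GRU(6 * hidden_dim, hidden_dim,
                                 batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, text, video, audio):
        # Each input: (batch, num_utterances, feature_dim), from tier 1.
        t, _ = self.text_gru(text)     # (batch, seq, 2 * hidden_dim)
        v, _ = self.video_gru(video)
        a, _ = self.audio_gru(audio)
        fused, _ = self.fusion_gru(torch.cat([t, v, a], dim=-1))
        return self.classifier(fused)  # per-utterance sentiment logits

# A forward pass on random tensors illustrates the expected shapes:
# logits = CSMDF()(torch.randn(4, 20, 300),
#                  torch.randn(4, 20, 512),
#                  torch.randn(4, 20, 128))  # -> (4, 20, 2)
```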

Original language: English
Pages (from-to): 54249-54278
Number of pages: 30
Journal: Multimedia Tools and Applications
Volume: 83
Issue number: 18
State: Published - May 2024

Keywords

  • Deep learning
  • Gated recurrent unit
  • Information retrieval
  • Multimedia analysis
  • Multimodal sentiment analysis
  • Sentiment analysis
