GCF2-NET: GLOBAL-AWARE CROSS-MODAL FEATURE FUSION NETWORK FOR SPEECH EMOTION RECOGNITION

Blog Article

Emotion recognition plays an essential role in interpersonal communication. However, existing recognition systems use features from only a single modality, ignoring the interaction of information across modalities. Therefore, in our study, we propose a Global-aware Cross-modal feature Fusion Network (GCF2-Net) for recognizing emotion. We construct a residual cross-modal fusion attention module (ResCMFA) to fuse information from multiple modalities and design a global-aware module to capture global details.

More specifically, we first use transfer learning to extract wav2vec 2.0 features and text features, which are then fused by the ResCMFA module. Next, the cross-modal fusion features are fed into the global-aware module to capture the most essential emotional information globally. Finally, experimental results show that our proposed method has significant advantages over state-of-the-art methods on the IEMOCAP and MELD datasets.
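To make the fusion step more concrete, here is a minimal sketch of residual cross-modal attention in plain Python. It is an illustration under stated assumptions, not the ResCMFA module itself: the function names are hypothetical, there are no learned projection weights, audio frames are assumed to act as queries over text tokens, and simple mean pooling stands in for the global-aware module, whose actual design is not described here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(queries, context):
    """Scaled dot-product attention from one modality onto another.

    queries: list of vectors (assumed here: audio frames), each of size d
    context: list of vectors (assumed here: text tokens), each of size d
    Returns one attended vector of size d per query.
    """
    d = len(queries[0])
    attended = []
    for q in queries:
        # Similarity of this query to every context vector, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in context]
        weights = softmax(scores)
        # Weighted sum of the context vectors.
        attended.append([sum(w * v[j] for w, v in zip(weights, context))
                         for j in range(d)])
    return attended


def residual_cross_modal_fusion(audio, text):
    """Add the attended text information back onto the audio features."""
    attended = cross_modal_attention(audio, text)
    return [[a + c for a, c in zip(a_vec, c_vec)]
            for a_vec, c_vec in zip(audio, attended)]


def mean_pool(features):
    """Average over time: a placeholder for a global summary, by assumption."""
    steps = len(features)
    return [sum(vec[j] for vec in features) / steps
            for j in range(len(features[0]))]


if __name__ == "__main__":
    audio = [[1.0, 0.0], [0.0, 1.0]]   # toy stand-in for wav2vec 2.0 frames
    text = [[1.0, 0.0], [1.0, 0.0]]    # toy stand-in for text token features
    fused = residual_cross_modal_fusion(audio, text)
    print(fused, mean_pool(fused))
```

The residual addition keeps the original audio features intact even when the attention output carries little information, which is the usual motivation for residual connections in fusion blocks. A real implementation would use learned projections and multiple heads in a deep learning framework.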
