Abstract: Emotion recognition based on text–audio modalities is the core technology for transforming a graphical user interface into a voice user interface, and it plays a vital role in natural ...