
Collaborating Authors

 Suthaharan, Shan


LDEB -- Label Digitization with Emotion Binarization and Machine Learning for Emotion Recognition in Conversational Dialogues

arXiv.org Artificial Intelligence

The development of an automated system for emotion recognition in conversations (ERC) is beneficial to many conversational AI applications [Hazarika et al., 2021, Bhat et al., 2021]. The recent language model ChatGPT has shown the usefulness of an automated system for ERC in the domain of conversational AI [Shahriar and Hayawi, 2023, Zhang et al., 2023]. Such a system can help advance research in many disciplines, including computational linguistics, neuroscience, and psychology [Canales and Martínez-Barco, 2014, Strapparava and Mihalcea, 2008]. There has been a significant effort to understand the emotions in conversations and to develop efficient computational techniques and machine learning classifiers for ERC using the information in conversational dialogues [Huang et al., 2018, 2019]. For example, [Huang et al., 2018], assuming that the textual information in a dialogue does not deliver sufficient information, proposed an approach to supply emotion information a priori at training. Subsequently, [Huang et al., 2019] utilized the Long Short-Term Memory (LSTM) network architecture hierarchically, as an iterative model, to capture contextual emotional features so that the model can predict the emotions in textual dialogues. Machine learning (ML) is a technique that can help us develop such an automated system, recognizing emotions in a conversational dialogue by classifying them. For example, [Binali et al., 2010] adapted emotion theories, based on Ekman's model and the OCC (Ortony/Clore/Collins) model, and developed a support vector machine (SVM) classifier for emotion recognition in web blog data.


A fatal point concept and a low-sensitivity quantitative measure for traffic safety analytics

arXiv.org Machine Learning

The variability of the clusters generated by clustering techniques in the domain of latitude and longitude variables of fatal crash data is significantly unpredictable. This unpredictability, caused by the randomness of fatal crash incidents, reduces the accuracy of crash frequency (i.e., counts of fatal crashes per cluster), which is used to measure traffic safety in practice. In this paper, a quantitative measure of traffic safety that is not significantly affected by the aforementioned variability is proposed. The paper introduces a fatal point concept (a fatal point is a segment with the highest frequency of fatality) based on cluster characteristics, and detects fatal points by imposing rounding errors to the hundredth decimal place of the longitude. The frequencies of the cluster and the cluster's fatal point are combined to construct a low-sensitivity quantitative measure of traffic safety for the cluster. The performance of the proposed measure of traffic safety is then studied by varying the parameter k of k-means clustering, with the expectation that other clustering techniques can be adopted in a similar fashion. The 2015 North Carolina fatal crash dataset of the Fatality Analysis Reporting System (FARS) is used to evaluate the proposed fatal point concept and perform experimental analysis to determine the effectiveness of the proposed measure. The empirical study shows that the average traffic safety, measured by the proposed quantitative measure over several clusters, is not significantly affected by the variability, compared to that of the standard crash frequency.
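
The fatal point detection step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `fatal_points`, the input layout, and the sample coordinates are hypothetical, and cluster labels are assumed to come from k-means or any other clustering technique. The abstract does not state how the cluster frequency and the fatal point frequency are combined, so the sketch only reports both values.

```python
from collections import Counter, defaultdict


def fatal_points(crashes, labels, decimals=2):
    """Find a candidate fatal point per cluster (illustrative sketch).

    crashes: list of (latitude, longitude) pairs, one per fatal crash.
    labels:  cluster id for each crash (assumed to come from k-means).
    decimals: longitude is rounded to this many decimal places
              (2 = hundredth decimal place, as in the abstract).
    """
    rounded_by_cluster = defaultdict(list)
    for (_lat, lon), label in zip(crashes, labels):
        # Rounding the longitude groups nearby crashes into one segment.
        rounded_by_cluster[label].append(round(lon, decimals))

    summary = {}
    for label, lons in rounded_by_cluster.items():
        # The most frequent rounded longitude is the cluster's fatal point.
        point, point_freq = Counter(lons).most_common(1)[0]
        summary[label] = {
            "cluster_freq": len(lons),       # standard crash frequency
            "fatal_point": point,            # rounded longitude of the segment
            "fatal_point_freq": point_freq,  # crashes falling in that segment
        }
    return summary


# Made-up coordinates and labels, for illustration only.
crashes = [(35.78, -78.641), (35.79, -78.644), (35.80, -78.702),
           (35.23, -80.843), (35.22, -80.839)]
labels = [0, 0, 0, 1, 1]
summary = fatal_points(crashes, labels)
```

Varying k in the clustering step changes `labels` and therefore `cluster_freq`, while the rounded-longitude segments themselves do not depend on k; this may be one reason a measure that includes the fatal point frequency is less sensitive to cluster variability.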


Elliptical modeling and pattern analysis for perturbation models and classification

arXiv.org Machine Learning

The characteristics (or numerical patterns) of a feature vector in the transform domain of a perturbation model differ significantly from those of its corresponding feature vector in the input domain. These differences, caused by the perturbation techniques used for the transformation of feature patterns, degrade the performance of machine learning techniques in the transform domain. In this paper, we proposed a nonlinear parametric perturbation model that transforms the input feature patterns to a set of elliptical patterns, and studied the performance degradation issues associated with the random forest classification technique using both the input and transform domain features. Compared with linear transformations such as Principal Component Analysis (PCA), the proposed method requires fewer statistical assumptions and is highly suitable for applications such as data privacy and security, because the elliptical patterns are difficult to invert from the transform domain to the input domain. In addition, we adopted a flexible block-wise dimensionality reduction step in the proposed method to accommodate the high-dimensional data that may arise in modern applications. We evaluated the empirical performance of the proposed method on a network intrusion data set and a biological data set, and compared the results with PCA in terms of classification performance and data privacy protection (measured by the blind source separation attack and signal interference ratio). Both results confirmed the superior performance of the proposed elliptical transformation.