
Practical Example of Clustering and Radial Basis Functions (RBF)

#artificialintelligence

Clustering is a technique used in machine learning and data analysis to group similar data points together. The goal is to identify patterns and relationships in the data without any prior knowledge of its underlying structure, which makes clustering a staple of unsupervised learning: the algorithm receives no labeled data and must discover structure on its own. Clustering has numerous applications in fields such as finance, marketing, biology, social networks, and image and video processing. Several different algorithms can be used for clustering, including k-means, hierarchical clustering, and DBSCAN.
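A minimal sketch of how the two ideas in the title connect: k-means finds cluster centres, and those centres can then serve as the centres of Gaussian radial basis functions. The data, seeding scheme, and `gamma` value below are illustrative choices, not prescribed by the article.

```python
import numpy as np

def init_centres(X, k):
    """Greedy farthest-point seeding (a simple k-means++-style initialisation)."""
    centres = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centres], axis=0)
        centres.append(X[d.argmax()])
    return np.array(centres)

def kmeans(X, k, n_iter=100):
    """Plain Lloyd's algorithm: assign each point to its nearest centre,
    then recompute each centre as the mean of its assigned points."""
    centres = init_centres(X, k)
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return centres, labels

def rbf_features(X, centres, gamma=1.0):
    """Gaussian RBF activations phi_j(x) = exp(-gamma * ||x - c_j||^2)."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)),   # blob around (0, 0)
               rng.normal(3.0, 0.3, (50, 2))])  # blob around (3, 3)
centres, labels = kmeans(X, k=2)
Phi = rbf_features(X, centres)   # cluster centres reused as RBF centres
```

The resulting `Phi` matrix can feed a linear model, which is the classic RBF-network construction: unsupervised clustering fixes the basis, and only the output weights are learned.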


Feature Selection Enhancement and Feature Space Visualization for Speech-Based Emotion Recognition

Kanwal, Sofia, Asghar, Sohail, Ali, Hazrat

arXiv.org Artificial Intelligence

Robust speech emotion recognition relies on the quality of the speech features. We present a speech feature enhancement strategy that improves speech emotion recognition. We use the INTERSPEECH 2010 challenge feature set, identify subsets of it, and apply Principal Component Analysis to each subset. Finally, the features are fused horizontally. The resulting feature set is analyzed using t-distributed stochastic neighbour embedding (t-SNE) before the features are applied to emotion recognition. The method is compared with state-of-the-art methods from the literature. The empirical evidence is drawn from two well-known datasets, the Emotional Speech Dataset (EMO-DB) and the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), for two languages, German and English, respectively. Our method achieved an average recognition gain of 11.5% for six out of seven emotions on the EMO-DB dataset, and 13.8% for seven out of eight emotions on the RAVDESS dataset, compared to the baseline study.
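The per-subset PCA followed by horizontal fusion can be sketched as below. The feature matrix, the three-way split, and the number of retained components are stand-ins for illustration; the paper's actual subsets come from the INTERSPEECH 2010 feature set.

```python
import numpy as np

def pca(X, n_components):
    """PCA via SVD on mean-centred data; returns the projected scores."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

rng = np.random.default_rng(0)
# stand-in for the challenge feature set: 200 utterances x 90 features
features = rng.normal(size=(200, 90))
# hypothetical split into three feature subsets
subsets = [features[:, :30], features[:, 30:60], features[:, 60:]]
# PCA applied per subset, then horizontal (column-wise) fusion
fused = np.hstack([pca(S, n_components=10) for S in subsets])
```

The fused matrix (here 200 x 30) is what would then be visualised with t-SNE and passed to the emotion classifier.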


Application of Markov Structure of Genomes to Outlier Identification and Read Classification

Karr, Alan F., Hauzel, Jason, Porter, Adam A., Schaefer, Marcel

arXiv.org Machine Learning

That the sequential structure of genomes is important has been known since the discovery of DNA. In this paper we employ a statistics and stochastic process perspective on triplets of successive bases to address two important applications: identifying outliers in genome databases, and classifying reads in the metagenomic context of reference-guided assembly. From this stochastic process perspective, triplets are a second-order Markov chain specified by the distribution of each base conditional on its two immediate predecessors. To be sure, studying genomes via base sequence distributions is not novel. Previous papers have addressed genome signatures (Karlin et al., 1997; Campbell et al., 1999; Takashi et al., 2003), as well as frequentist (Rosen et al., 2008) and Bayesian (Wang et al., 2007) approaches to classification problems.
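The second-order Markov view of base triplets can be made concrete: estimate P(b3 | b1 b2) from overlapping triplets, then score any read by its log-likelihood under the fitted chain, which is the kind of quantity one can threshold for outlier detection or compare across references for read classification. The toy sequence and smoothing floor are illustrative.

```python
import math
from collections import defaultdict

def second_order_transitions(seq, alphabet="ACGT"):
    """Estimate P(b3 | b1 b2) from overlapping base triplets."""
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(seq) - 2):
        counts[seq[i:i+2]][seq[i+2]] += 1
    probs = {}
    for pair, nxt in counts.items():
        total = sum(nxt.values())
        probs[pair] = {b: nxt.get(b, 0) / total for b in alphabet}
    return probs

def log_likelihood(seq, probs, floor=1e-9):
    """Score a read under the fitted chain; unseen transitions get a small floor."""
    ll = 0.0
    for i in range(len(seq) - 2):
        p = probs.get(seq[i:i+2], {}).get(seq[i+2], floor) or floor
        ll += math.log(p)
    return ll

probs = second_order_transitions("ACGACGACGACG")
score = log_likelihood("ACGACG", probs)
```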


A Machine Learning Approach for Flagging Incomplete Bid-rigging Cartels

Wallimann, Hannes, Imhof, David, Huber, Martin

arXiv.org Machine Learning

We propose a new method for flagging bid rigging, which is particularly useful for detecting incomplete bid-rigging cartels. Our approach combines screens, i.e. statistics derived from the distribution of bids in a tender, with machine learning to predict the probability of collusion. As a methodological innovation, we calculate such screens for all possible subgroups of three or four bids within a tender and use summary statistics like the mean, median, maximum, and minimum of each screen as predictors in the machine learning algorithm. This approach tackles the issue that competitive bids in incomplete cartels distort the statistical signals produced by bid rigging. We demonstrate that our algorithm outperforms previously suggested methods in applications to incomplete cartels based on empirical data from Switzerland.

arXiv:2004.05629
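The subgroup idea from the abstract can be sketched as follows: compute a screen on every subgroup of three or four bids in a tender, then summarize the screen values with the mean, median, maximum, and minimum. The coefficient of variation used here is one common screen chosen for illustration; the paper combines several screens.

```python
import numpy as np
from itertools import combinations

def cv(bids):
    """Coefficient of variation of the bids, a common bid-rigging screen."""
    b = np.asarray(bids, dtype=float)
    return b.std(ddof=1) / b.mean()

def subgroup_screens(bids, sizes=(3, 4)):
    """Compute the screen on every subgroup of 3 or 4 bids, then summarize
    the resulting distribution of screen values."""
    values = np.array([cv(g) for k in sizes for g in combinations(bids, k)])
    return {"mean": values.mean(), "median": np.median(values),
            "max": values.max(), "min": values.min()}

# illustrative tender: three tight bids plus one outlier
stats = subgroup_screens([100.0, 101.0, 102.0, 150.0])
```

These summary statistics are the kind of predictors that would be fed to the machine learning classifier; the subgroup-level minimum, for instance, can reveal a tight (possibly rigged) cluster of bids even when a competitive outsider inflates the tender-level screen.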

Population-Guided Large Margin Classifier for High-Dimension Low-Sample-Size Problems

Yin, Qingbo, Adeli, Ehsan, Shen, Liran, Shen, Dinggang

arXiv.org Machine Learning

Various applications in different fields, such as gene expression analysis or computer vision, involve high-dimensional low-sample-size (HDLSS) data sets, which pose significant challenges for standard statistical and modern machine learning methods. In this paper, we propose a novel linear binary classifier, denoted the population-guided large margin classifier (PGLMC), which is applicable to any sort of data, including HDLSS. PGLMC is built around a projection direction w obtained by jointly considering the local structural information of the hyperplane and the statistics of the training samples. Our proposed model has several advantages over widely used approaches. First, it is not sensitive to the intercept term b. Second, it operates well with imbalanced data. Third, it is relatively simple to implement via Quadratic Programming. Fourth, it is robust to model specification across various real applications. The theoretical properties of PGLMC are proven. We conduct a series of evaluations on two simulated and six real-world benchmark data sets, including DNA classification, digit recognition, medical image analysis, and face recognition. PGLMC outperforms the state-of-the-art classification methods in most cases, or at least obtains comparable results.


Evaluating and Understanding the Robustness of Adversarial Logit Pairing

Engstrom, Logan, Ilyas, Andrew, Athalye, Anish

arXiv.org Machine Learning

We evaluate the robustness of Adversarial Logit Pairing, a recently proposed defense against adversarial examples. We find that a network trained with Adversarial Logit Pairing achieves a 0.6% correct classification rate under targeted adversarial attack, the threat model in which the defense is considered. We provide a brief overview of the defense and the threat models/claims considered, as well as a discussion of the methodology and results of our attack. Our results offer insights into the reasons underlying the vulnerability of ALP to adversarial attack, and are of general interest in evaluating and understanding adversarial defenses.
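For readers unfamiliar with the threat model, a targeted attack in an L-infinity ball can be sketched as projected gradient ascent on the (target minus true) logit gap. The toy linear model below is purely illustrative and is not the network or the exact attack evaluated in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
# toy linear "network" standing in for the evaluated model (illustrative only)
W = rng.normal(size=(10, 784))
b = np.zeros(10)

def logits(x):
    return W @ x + b

def targeted_pgd(x, y_true, y_tgt, eps=0.3, step=0.05, n_steps=40):
    """Projected gradient ascent on the (target - true) logit gap, constrained
    to an L-infinity ball of radius eps around the clean input."""
    # for a linear model the input gradient of the gap is a constant vector
    grad = W[y_tgt] - W[y_true]
    x_adv = x.copy()
    for _ in range(n_steps):
        x_adv = x_adv + step * np.sign(grad)
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project into the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # keep a valid pixel range
    return x_adv

x = rng.uniform(0.0, 1.0, 784)
y_true = int(np.argmax(logits(x)))
y_tgt = (y_true + 1) % 10
x_adv = targeted_pgd(x, y_true, y_tgt)
```

The "correct classification rate under targeted attack" reported in the abstract is the fraction of inputs the defended network still labels correctly after such an adversary has run.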


An Outlyingness Matrix for Multivariate Functional Data Classification

Dai, Wenlin, Genton, Marc G.

arXiv.org Machine Learning

The classification of multivariate functional data is an important task in scientific research. Unlike point-wise data, functional data are usually classified by their shapes rather than by their scales. We define an outlyingness matrix by extending directional outlyingness, an effective measure of the shape variation of curves that combines the direction of outlyingness with conventional depth. We propose two classifiers based on directional outlyingness and the outlyingness matrix, respectively. Our classifiers provide better performance than existing depth-based classifiers when applied to both univariate and multivariate functional data in simulation studies. We also test our methods on two data problems, speech recognition and gesture classification, and obtain results consistent with the findings from the simulated data.


Training a Feed-forward Neural Network with Artificial Bee Colony Based Backpropagation Method

Nandy, Sudarshan, Sarkar, Partha Pratim, Das, Achintya

arXiv.org Artificial Intelligence

The back-propagation algorithm is one of the most widely used and popular techniques for training feed-forward neural networks. Nature-inspired meta-heuristic algorithms also provide derivative-free solutions for optimizing complex problems. The artificial bee colony algorithm is one such meta-heuristic: it mimics the foraging (food-source searching) behaviour of bees in a colony, and it has been implemented in several applications to improve optimization outcomes. The method proposed in this paper combines an improved artificial bee colony algorithm with back-propagation neural network training, aiming for a fast and improved convergence rate of the hybrid learning method. The results are compared with those of a genetic-algorithm-based back-propagation method, another hybridized procedure of its kind. The analysis is performed on standard data sets and demonstrates the efficiency of the proposed method in terms of convergence speed and rate.
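A minimal sketch of the artificial bee colony loop (employed bees, onlooker bees, scouts) applied to a stand-in objective. In the paper the objective would be a network's training loss over its weight vector; here a simple quadratic takes its place, and all colony parameters are illustrative defaults rather than the authors' settings.

```python
import numpy as np

def abc_minimize(loss, dim, n_bees=20, n_iter=200, limit=20, bound=2.0, seed=0):
    """Minimal artificial bee colony for a non-negative loss: employed and
    onlooker bees perturb food sources (candidate solutions); scouts replace
    sources that fail to improve for `limit` consecutive trials."""
    rng = np.random.default_rng(seed)
    foods = rng.uniform(-bound, bound, (n_bees, dim))
    fits = np.array([loss(f) for f in foods])
    trials = np.zeros(n_bees, dtype=int)

    def try_update(i):
        # perturb one coordinate toward/away from a random other source
        j, k = rng.integers(n_bees), rng.integers(dim)
        cand = foods[i].copy()
        cand[k] += rng.uniform(-1, 1) * (foods[i][k] - foods[j][k])
        cand = np.clip(cand, -bound, bound)
        f = loss(cand)
        if f < fits[i]:
            foods[i], fits[i], trials[i] = cand, f, 0
        else:
            trials[i] += 1

    for _ in range(n_iter):
        for i in range(n_bees):                 # employed bees: one trial each
            try_update(i)
        p = 1.0 / (1.0 + fits)                  # onlookers prefer better sources
        p /= p.sum()
        for i in rng.choice(n_bees, n_bees, p=p):
            try_update(i)
        for i in np.where(trials > limit)[0]:   # scouts: abandon and reseed
            foods[i] = rng.uniform(-bound, bound, dim)
            fits[i] = loss(foods[i])
            trials[i] = 0
    best = fits.argmin()
    return foods[best], float(fits[best])

# stand-in for a network's training loss: a simple quadratic in 5 "weights"
best_w, best_loss = abc_minimize(lambda w: float(np.sum(w ** 2)), dim=5)
```

In the hybrid scheme described in the abstract, a loop like this would explore the weight space globally while back-propagation refines promising solutions locally.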


Inter Genre Similarity Modelling For Automatic Music Genre Classification

Bagci, Ulas, Erzin, Engin

arXiv.org Machine Learning

Music genre classification is an essential tool for music information retrieval systems and has found critical applications in various media platforms. Two important problems in automatic music genre classification are feature extraction and classifier design. This paper investigates inter-genre similarity modelling (IGS) to improve the performance of automatic music genre classification. Inter-genre similarity information is extracted from the mis-classified feature population. Once the inter-genre similarity is modelled, its elimination reduces inter-genre confusion and improves identification rates. Inter-genre similarity modelling is further improved with iterative IGS modelling (IIGS) and score modelling for IGS elimination (SMIGS). Experimental results with promising classification improvements are provided.


Supervised functional classification: A theoretical remark and some comparisons

Baillo, Amparo, Cuevas, Antonio

arXiv.org Machine Learning

The problem of supervised classification (or discrimination) with functional data is considered, with a special interest on the popular k-nearest neighbors (k-NN) classifier. First, relying on a recent result by Cerou and Guyader (2006), we prove the consistency of the k-NN classifier for functional data whose distribution belongs to a broad family of Gaussian processes with triangular covariance functions. Second, on a more practical side, we check the behavior of the k-NN method when compared with a few other functional classifiers. This is carried out through a small simulation study and the analysis of several real functional data sets. While no global "uniform" winner emerges from such comparisons, the overall performance of the k-NN method, together with its sound intuitive motivation and relative simplicity, suggests that it could represent a reasonable benchmark for the classification problem with functional data.
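The functional k-NN classifier studied here is simple enough to sketch directly: curves sampled on a common grid are compared with a discretised L2 distance, and the k nearest training curves vote. The two-class sin/cos example below is an illustrative simulation, not one of the paper's data sets.

```python
import numpy as np

def knn_functional(train_curves, train_labels, test_curve, k=3):
    """k-NN for functional data: curves sampled on a common grid, compared
    with the (discretised) L2 distance; majority vote among the k nearest."""
    d = np.sqrt(((train_curves - test_curve) ** 2).mean(axis=1))
    nearest = np.argsort(d)[:k]
    return np.bincount(train_labels[nearest]).argmax()

t = np.linspace(0.0, 1.0, 100)
rng = np.random.default_rng(0)
# two classes of noisy curves with the same scale but different shapes
X0 = np.sin(2 * np.pi * t) + rng.normal(0, 0.2, (20, 100))
X1 = np.cos(2 * np.pi * t) + rng.normal(0, 0.2, (20, 100))
X = np.vstack([X0, X1])
y = np.array([0] * 20 + [1] * 20)
pred = knn_functional(X, y, np.sin(2 * np.pi * t) + rng.normal(0, 0.2, 100))
```

Its appeal as a benchmark is visible even in this sketch: beyond the grid and the choice of k, there is nothing to tune, and the L2 distance extends directly to multivariate curves by summing over components.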