 arff


Comparing Spectral Bias and Robustness For Two-Layer Neural Networks: SGD vs Adaptive Random Fourier Features

arXiv.org Artificial Intelligence

We present experimental results highlighting two key differences resulting from the choice of training algorithm for two-layer neural networks. The spectral bias of neural networks is well known, while the spectral bias dependence on the choice of training algorithm is less studied. Our experiments demonstrate that an adaptive random Fourier features algorithm (ARFF) can yield a spectral bias closer to zero compared to the stochastic gradient descent optimizer (SGD). Additionally, we train two identically structured classifiers, employing SGD and ARFF, to the same accuracy levels and empirically assess their robustness against adversarial noise attacks.


ECOD: Unsupervised Outlier Detection Using Empirical Cumulative Distribution Functions

arXiv.org Machine Learning

Outlier detection refers to the identification of data points that deviate from a general data distribution. Existing unsupervised approaches often suffer from high computational cost, complex hyperparameter tuning, and limited interpretability, especially when working with large, high-dimensional datasets. To address these issues, we present a simple yet effective algorithm called ECOD (Empirical-Cumulative-distribution-based Outlier Detection), which is inspired by the fact that outliers are often the "rare events" that appear in the tails of a distribution. In a nutshell, ECOD first estimates the underlying distribution of the input data in a nonparametric fashion by computing the empirical cumulative distribution per dimension of the data. ECOD then uses these empirical distributions to estimate tail probabilities per dimension for each data point. Finally, ECOD computes an outlier score of each data point by aggregating estimated tail probabilities across dimensions. Our contributions are as follows: (1) we propose a novel outlier detection method called ECOD, which is both parameter-free and easy to interpret; (2) we perform extensive experiments on 30 benchmark datasets, where we find that ECOD outperforms 11 state-of-the-art baselines in terms of accuracy, efficiency, and scalability; and (3) we release an easy-to-use and scalable (with distributed support) Python implementation for accessibility and reproducibility.
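The three steps in the abstract above (per-dimension empirical distribution, per-dimension tail probabilities, aggregation across dimensions) can be sketched in a few lines of standard-library Python. This is a simplified illustration, not the released ECOD implementation. The function name `ecdf_outlier_scores` and the sample data are made up for the example. Aggregating with a sum of negative log probabilities and taking the larger of the left- and right-tail totals are assumptions made here, and any further refinements in the paper are omitted.

```python
import math
from bisect import bisect_left, bisect_right


def ecdf_outlier_scores(rows):
    """Simplified ECDF-based outlier scores; a higher score means more outlying."""
    n = len(rows)
    d = len(rows[0])
    left_total = [0.0] * n   # sum over dimensions of -log(left-tail probability)
    right_total = [0.0] * n  # sum over dimensions of -log(right-tail probability)

    for j in range(d):
        # Step 1: the sorted column is all we need to evaluate the empirical CDF.
        column = sorted(row[j] for row in rows)
        for i, row in enumerate(rows):
            # Step 2: tail probabilities for this value in this dimension.
            # Both are at least 1/n because the point itself is counted.
            left = bisect_right(column, row[j]) / n          # fraction <= value
            right = (n - bisect_left(column, row[j])) / n    # fraction >= value
            # Step 3: aggregate across dimensions (assumed: sum of -log p).
            left_total[i] -= math.log(left)
            right_total[i] -= math.log(right)

    # Assumed aggregation: score each point by its more extreme tail.
    return [max(lt, rt) for lt, rt in zip(left_total, right_total)]


# Hypothetical data: four similar points and one that is extreme in both dimensions.
data = [[1.0, 2.0], [1.1, 1.9], [0.9, 2.1], [1.0, 2.0], [8.0, 9.0]]
scores = ecdf_outlier_scores(data)
most_outlying = max(range(len(data)), key=scores.__getitem__)
print(most_outlying, scores)
```

On this toy input the last row receives the highest score, because it sits in the right tail of both dimensions. Note that the scores depend only on ranks within each dimension, so how far the point lies from the rest does not change its score.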


Building A Machine Learning Model With WEKA With 'No Coding'

#artificialintelligence

No-code environments for machine learning have become increasingly popular because almost anybody who needs machine learning, whatever their field, can use these tools to build models for themselves. WEKA is one of the earliest no-code tools, yet it remains efficient and powerful. WEKA can be used to implement state-of-the-art machine learning and deep learning models and supports numerous file formats. In this article, we will learn how to use WEKA to pre-process data and build a machine learning model without code. WEKA runs on Linux, Windows, and Mac operating systems and can be downloaded from the official website.


COPOD: Copula-Based Outlier Detection

arXiv.org Machine Learning

Outlier detection refers to the identification of rare items that deviate from the general data distribution. Existing approaches suffer from high computational complexity, low predictive capability, and limited interpretability. As a remedy, we present a novel outlier detection algorithm called COPOD, which is inspired by copulas for modeling multivariate data distributions. COPOD first constructs an empirical copula, and then uses it to predict tail probabilities of each given data point to determine its level of "extremeness". Intuitively, we think of this as calculating an anomalous p-value. This makes COPOD parameter-free, highly interpretable, and computationally efficient. In this work, we make three key contributions: (1) we propose a novel, parameter-free outlier detection algorithm with both great performance and interpretability; (2) we perform extensive experiments on 30 benchmark datasets to show that COPOD outperforms in most cases and is also one of the fastest algorithms; and (3) we release an easy-to-use Python implementation for reproducibility.


Introduction to machine learning with Weka - Target Veb

#artificialintelligence

This tutorial gives a short, development-focused introduction to machine learning using Weka, one of the most widely used Java libraries for this purpose. Machine learning is a subfield of data science. If data science covers the entire process of obtaining knowledge, from data cleaning and analysis to visualization and deployment, machine learning comprises the algorithms and techniques used in the analysis and modeling phase of that process. Within these, we will focus on supervised learning, which is often used for classification and regression problems. Classification applies when the class is discrete and the objective is to predict one of the mutually exclusive values of the target variable.