AITopics

2311.16872

Country:

North America > United States (0.67)
Europe (0.46)

Genre: Research Report > Experimental Study (0.70)

Industry: Health & Medicine > Therapeutic Area (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.47)

arXiv.org Machine LearningNov-26-2023

Augmentation Invariant Manifold Learning

Wang, Shulei

Data augmentation is a widely used technique and an essential ingredient in the recent advance in self-supervised representation learning. By preserving the similarity between augmented data, the resulting data representation can improve various downstream analyses and achieve state-of-the-art performance in many applications. Despite the empirical effectiveness, most existing methods lack theoretical understanding under a general nonlinear setting. To fill this gap, we develop a statistical framework on a low-dimension product manifold to model the data augmentation transformation. Under this framework, we introduce a new representation learning method called augmentation invariant manifold learning and design a computationally efficient algorithm by reformulating it as a stochastic optimization problem. Compared with existing self-supervised methods, the new method simultaneously exploits the manifold's geometric structure and invariant property of augmented data and has an explicit theoretical guarantee. Our theoretical investigation characterizes the role of data augmentation in the proposed method and reveals why and how the data representation learned from augmented data can improve the $k$-nearest neighbor classifier in the downstream analysis, showing that a more complex data augmentation leads to more improvement in downstream analysis. Finally, numerical experiments on simulated and real datasets are presented to demonstrate the merit of the proposed method.

artificial intelligence, machine learning, representation, (16 more...)

2211.0046

Country:

Europe (0.27)
North America > United States > Illinois (0.14)

Genre: Research Report > New Finding (0.45)

Industry:

Health & Medicine (0.92)
Education (0.62)
Energy > Oil & Gas (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Wang, Jiachen T., Jia, Ruoxi

A Note on "Efficient Task-Specific Data Valuation for Nearest Neighbor Algorithms"

arXiv.org Machine LearningNov-25-2023

Data valuation is a growing research field that studies the influence of individual data points for machine learning (ML) models. Data Shapley, inspired by cooperative game theory and economics, is an effective method for data valuation. However, it is well-known that the Shapley value (SV) can be computationally expensive. Fortunately, Jia et al. (2019) showed that for K-Nearest Neighbors (KNN) models, the computation of Data Shapley is surprisingly simple and efficient. In this note, we revisit the work of Jia et al. (2019) and propose a more natural and interpretable utility function that better reflects the performance of KNN models. We derive the corresponding calculation procedure for the Data Shapley of KNN classifiers/regressors with the new utility functions. Our new approach, dubbed soft-label KNN-SV, achieves the same time complexity as the original method. We further provide an efficient approximation algorithm for soft-label KNN-SV based on locality sensitive hashing (LSH). Our experimental results demonstrate that Soft-label KNN-SV outperforms the original method on most datasets in the task of mislabeled data detection, making it a better baseline for future work on data valuation.

artificial intelligence, machine learning, utility function, (15 more...)

2304.04258

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (1.00)

arXiv.org Machine LearningNov-25-2023

Threshold KNN-Shapley: A Linear-Time and Privacy-Friendly Approach to Data Valuation

Wang, Jiachen T., Zhu, Yuqing, Wang, Yu-Xiang, Jia, Ruoxi, Mittal, Prateek

Data valuation aims to quantify the usefulness of individual data sources in training machine learning (ML) models, and is a critical aspect of data-centric ML research. However, data valuation faces significant yet frequently overlooked privacy challenges despite its importance. This paper studies these challenges with a focus on KNN-Shapley, one of the most practical data valuation methods nowadays. We first emphasize the inherent privacy risks of KNN-Shapley, and demonstrate the significant technical difficulties in adapting KNN-Shapley to accommodate differential privacy (DP). To overcome these challenges, we introduce TKNN-Shapley, a refined variant of KNN-Shapley that is privacy-friendly, allowing for straightforward modifications to incorporate DP guarantee (DP-TKNN-Shapley). We show that DP-TKNN-Shapley has several advantages and offers a superior privacy-utility tradeoff compared to naively privatized KNN-Shapley in discerning data quality. Moreover, even non-private TKNN-Shapley achieves comparable performance as KNN-Shapley. Overall, our findings suggest that TKNN-Shapley is a promising alternative to KNN-Shapley, particularly for real-world applications involving sensitive data.

machine learning, natural language, val, (15 more...)

2308.15709

Country: North America > United States > California (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Government > Regional Government > North America Government > United States Government (0.45)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

arXiv.org Artificial IntelligenceNov-24-2023

Human-Machine Cooperative Multimodal Learning Method for Cross-subject Olfactory Preference Recognition

Xia, Xiuxin, Guo, Yuchen, Wang, Yanwei, Yang, Yuchao, Shi, Yan, Men, Hong

Odor sensory evaluation has a broad application in food, clothing, cosmetics, and other fields. Traditional artificial sensory evaluation has poor repeatability, and the machine olfaction represented by the electronic nose (E-nose) is difficult to reflect human feelings. Olfactory electroencephalogram (EEG) contains odor and individual features associated with human olfactory preference, which has unique advantages in odor sensory evaluation. However, the difficulty of cross-subject olfactory EEG recognition greatly limits its application. It is worth noting that E-nose and olfactory EEG are more advantageous in representing odor information and individual emotions, respectively. In this paper, an E-nose and olfactory EEG multimodal learning method is proposed for cross-subject olfactory preference recognition. Firstly, the olfactory EEG and E-nose multimodal data acquisition and preprocessing paradigms are established. Secondly, a complementary multimodal data mining strategy is proposed to effectively mine the common features of multimodal data representing odor information and the individual features in olfactory EEG representing individual emotional information. Finally, the cross-subject olfactory preference recognition is achieved in 24 subjects by fusing the extracted common and individual features, and the recognition effect is superior to the state-of-the-art recognition methods. Furthermore, the advantages of the proposed method in cross-subject olfactory preference recognition indicate its potential for practical odor evaluation applications.

artificial intelligence, machine learning, olfactory eeg, (17 more...)

2311.14426

Country: Asia > China (0.46)

Genre: Research Report (1.00)

Industry:

Materials > Chemicals (0.93)
Energy > Oil & Gas (0.68)
Health & Medicine > Diagnostic Medicine (0.67)
Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.46)

Almalki, Sultan, Chatterjee, Prosenjit, Roy, Kaushik

Continuous Authentication Using Mouse Clickstream Data Analysis

arXiv.org Artificial IntelligenceNov-23-2023

Biometrics is used to authenticate an individual based on physiological or behavioral traits. Mouse dynamics is an example of a behavioral biometric that can be used to perform continuous authentication as protection against security breaches. Recent research on mouse dynamics has shown promising results in identifying users; however, it has not yet reached an acceptable level of accuracy. In this paper, an empirical evaluation of different classification techniques is conducted on a mouse dynamics dataset, the Balabit Mouse Challenge dataset. User identification is carried out using three mouse actions: mouse move, point and click, and drag and drop. Verification and authentication methods are conducted using three machine-learning classifiers: the Decision Tree classifier, the K-Nearest Neighbors classifier, and the Random Forest classifier. The results show that the three classifiers can distinguish between a genuine user and an impostor with a relatively high degree of accuracy. In the verification mode, all the classifiers achieve a perfect accuracy of 100%. In authentication mode, all three classifiers achieved the highest accuracy (ACC) and Area Under Curve (AUC) from scenario B using the point and click action data: (Decision Tree ACC:87.6%,

artificial intelligence, classifier, machine learning, (18 more...)

2312.00802

Country: North America > United States (0.14)

Genre: Research Report > New Finding (0.68)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.91)

Neubauer, Lukas, Filzmoser, Peter

Improving Forecasts for Heterogeneous Time Series by "Averaging", with Application to Food Demand Forecast

arXiv.org Machine LearningNov-23-2023

A common forecasting setting in real world applications considers a set of possibly heterogeneous time series of the same domain. Due to different properties of each time series such as length, obtaining forecasts for each individual time series in a straight-forward way is challenging. This paper proposes a general framework utilizing a similarity measure in Dynamic Time Warping to find similar time series to build neighborhoods in a k-Nearest Neighbor fashion, and improve forecasts of possibly simple models by averaging. Several ways of performing the averaging are suggested, and theoretical arguments underline the usefulness of averaging for forecasting. Additionally, diagnostics tools are proposed allowing a deep understanding of the procedure.

artificial intelligence, data mining, machine learning, (16 more...)

2306.07119

Country: North America > United States > New York > New York County > New York City (0.14)

Genre: Research Report > Experimental Study (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.54)

arXiv.org Machine LearningNov-16-2023

Surprisal Driven $k$-NN for Robust and Interpretable Nonparametric Learning

Banerjee, Amartya, Hazard, Christopher J., Beel, Jacob, Mack, Cade, Xia, Jack, Resnick, Michael, Goddin, Will

Nonparametric learning is a fundamental concept in machine learning that aims to capture complex patterns and relationships in data without making strong assumptions about the underlying data distribution. Owing to simplicity and familiarity, one of the most well-known algorithms under this paradigm is the $k$-nearest neighbors ($k$-NN) algorithm. Driven by the usage of machine learning in safety-critical applications, in this work, we shed new light on the traditional nearest neighbors algorithm from the perspective of information theory and propose a robust and interpretable framework for tasks such as classification, regression, and anomaly detection using a single model. Instead of using a traditional distance measure which needs to be scaled and contextualized, we use a novel formulation of \textit{surprisal} (amount of information required to explain the difference between the observed and expected result). Finally, we demonstrate this architecture's capability to perform at-par or above the state-of-the-art on classification, regression, and anomaly detection tasks using a single model with enhanced interpretability by providing novel concepts for characterizing data and predictions.

artificial intelligence, data mining, machine learning, (18 more...)

2311.10246

Country: North America > United States (0.46)

Genre: Research Report > Promising Solution (0.34)

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.66)

arXiv.org Artificial IntelligenceNov-10-2023

Distribution-Informed Adaptation for kNN Graph Construction

Min, Shaojie, Liu, Ji

Graph-based kNN algorithms have garnered widespread popularity for machine learning tasks due to their simplicity and effectiveness. However, as factual data often inherit complex distributions, the conventional kNN graph's reliance on a unified k-value can hinder its performance. A crucial factor behind this challenge is the presence of ambiguous samples along decision boundaries that are inevitably more prone to incorrect classifications. To address the situation, we propose the Distribution-Informed adaptive kNN Graph (DaNNG), which combines adaptive kNN with distribution-aware graph construction. By incorporating an approximation of the distribution with customized k-adaption criteria, DaNNG can significantly improve performance on ambiguous samples, and hence enhance overall accuracy and generalization capability. Through rigorous evaluations on diverse benchmark datasets, DaNNG outperforms state-of-the-art algorithms, showcasing its adaptability and efficacy across various real-world scenarios.

artificial intelligence, danng, machine learning, (15 more...)

2308.02442

Country:

Europe (0.68)
Asia > China (0.29)
North America > United States (0.28)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area (0.73)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.30)

Nawale, Sahil, Khut, Dhruv, Dave, Daksh, Sawhney, Gauransh, Aggrawal, Pushkar, Devadakar, Dr. Kailas

PotholeGuard: A Pothole Detection Approach by Point Cloud Semantic Segmentation

arXiv.org Artificial IntelligenceNov-5-2023

Pothole detection is crucial for road safety and maintenance, traditionally relying on 2D image segmentation. However, existing 3D Semantic Pothole Segmentation research often overlooks point cloud sparsity, leading to suboptimal local feature capture and segmentation accuracy. Our research presents an innovative point cloud-based pothole segmentation architecture. Our model efficiently identifies hidden features and uses a feedback mechanism to enhance local characteristics, improving feature presentation. We introduce a local relationship learning module to understand local shape relationships, enhancing structural insights. Additionally, we propose a lightweight adaptive structure for refining local point features using the K nearest neighbor algorithm, addressing point cloud density differences and domain selection. Shared MLP Pooling is integrated to learn deep aggregation features, facilitating semantic data exploration and segmentation guidance. Extensive experiments on three public datasets confirm PotholeGuard's superior performance over state-of-the-art methods. Our approach offers a promising solution for robust and accurate 3D pothole segmentation, with applications in road maintenance and safety.

artificial intelligence, machine learning, point cloud, (15 more...)

2311.02641

Country: Asia > India (0.15)

Genre: Research Report > Promising Solution (0.54)

Industry:

Transportation > Ground > Road (0.54)
Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)