- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- North America > Canada (0.04)
Eigenspectrum Analysis of Neural Networks without Aspect Ratio Bias
Hu, Yuanzhe, Goel, Kinshuk, Killiakov, Vlad, Yang, Yaoqing
Diagnosing deep neural networks (DNNs) by analyzing the eigenspectrum of their weights has been an active area of research in recent years. One of the main approaches involves measuring the heavy-tailedness of the empirical spectral densities (ESDs) of weight matrices. This analysis has been shown to provide insights that help diagnose whether a model is well-trained or undertrained, and has been used to guide training methods involving layer-wise hyperparameter assignment. In this paper, we address an often-overlooked challenge in estimating the heavy-tailedness of these ESDs: the impact of the aspect ratio of weight matrices. We demonstrate that matrices of varying sizes (and aspect ratios) introduce a non-negligible bias in estimating the heavy-tailedness of ESDs, leading to inaccurate model diagnosis and layer-wise hyperparameter assignment. To overcome this challenge, we propose FARMS (Fixed-Aspect-Ratio Matrix Subsampling), a method that normalizes the weight matrices by subsampling submatrices with a fixed aspect ratio. Instead of measuring the heavy-tailedness of the original ESD, we measure the average ESD of these subsampled submatrices. We show that this method effectively mitigates the aspect ratio bias. We validate our approach across various optimization techniques and application domains that involve eigenspectrum analysis of weights, including image classification in computer vision (CV) models, scientific machine learning (SciML) model training, and large language model (LLM) pruning. Our results show that despite its simplicity, FARMS uniformly improves the accuracy of eigenspectrum analysis while enabling more effective layer-wise hyperparameter assignment. In one of the LLM pruning experiments, FARMS reduces the perplexity of the LLaMA-7B model by 17.3% when compared with state-of-the-art methods.
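The subsampling idea lends itself to a short sketch. The Python snippet below is a minimal illustration, not the authors' implementation: the function names, the pooling of all subsampled eigenvalues into one averaged ESD, and the Hill-style tail fit are assumptions.

```python
# Minimal sketch of fixed-aspect-ratio subsampling; `farms_tail_exponent` and
# the Hill-style fit are illustrative assumptions, not the paper's code.
import numpy as np

def hill_alpha(eigs, k=50):
    """Hill estimator of the power-law tail exponent from the k largest eigenvalues."""
    tail = np.sort(eigs)[-k:]                     # ascending; tail[0] is the threshold
    return 1.0 + (k - 1) / np.sum(np.log(tail[1:] / tail[0]))

def farms_tail_exponent(W, aspect=1.0, n_samples=16, seed=0):
    """Pool the ESDs of fixed-aspect-ratio column subsamples, then fit the tail."""
    rng = np.random.default_rng(seed)
    if W.shape[0] > W.shape[1]:                   # work with the wide orientation
        W = W.T
    n, m = W.shape
    sub_m = min(int(n / aspect), m)               # width giving the target aspect ratio
    eigs = []
    for _ in range(n_samples):
        cols = rng.choice(m, size=sub_m, replace=False)
        S = W[:, cols]
        eigs.append(np.linalg.svd(S, compute_uv=False) ** 2 / sub_m)  # ESD of S S^T / sub_m
    pooled = np.concatenate(eigs)                 # "average" ESD over subsamples
    return hill_alpha(pooled, k=min(50, pooled.size // 2))

# usage: alpha = farms_tail_exponent(np.random.randn(512, 2048), aspect=1.0)
```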
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
Grokking and Generalization Collapse: Insights from \texttt{HTSR} theory
Prakash, Hari K., Martin, Charles H.
We study the well-known grokking phenomenon in neural networks (NNs) using a 3-layer MLP trained on a 1k-sample subset of MNIST, with and without weight decay, and discover a novel third phase -- \emph{anti-grokking} -- that occurs very late in training and resembles, but is distinct from, the familiar \emph{pre-grokking} phase: test accuracy collapses while training accuracy stays perfect. This late-stage collapse is not detected by other proposed grokking progress measures. Leveraging Heavy-Tailed Self-Regularization (HTSR) through the open-source WeightWatcher tool, we show that the HTSR layer quality metric $\alpha$ alone delineates all three phases, whereas the best competing metrics detect only the first two. The \emph{anti-grokking} phase is revealed by training for $10^7$ and is invariably heralded by $\alpha < 2$ and the appearance of \emph{Correlation Traps} -- outlier singular values in the randomized layer weight matrices that make the layer weight matrix atypical and signal overfitting of the training set. Such traps are verified by visual inspection of the layer-wise empirical spectral densities and by Kolmogorov--Smirnov tests on the randomized spectra. Comparative metrics, including activation sparsity, absolute weight entropy, circuit complexity, and $\ell^2$ weight norms, track pre-grokking and grokking but fail to distinguish grokking from anti-grokking. This discovery provides a way to measure overfitting and generalization collapse without direct access to the test data. These results strengthen the claim that the HTSR $\alpha$ provides a universal layer-convergence target at $\alpha \approx 2$ and underscore the value of the HTSR $\alpha$ metric as a measure of generalization.
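Two of the diagnostics mentioned above, the HTSR layer quality metric $\alpha$ and the correlation-trap check on randomized weights, can be sketched as follows. This is an illustrative reimplementation with assumed thresholds and variable names, not WeightWatcher's actual code.

```python
# Illustrative reimplementation of the two diagnostics; thresholds and names are
# assumptions, not WeightWatcher's code.
import numpy as np

def esd(W):
    """Empirical spectral density: eigenvalues of the correlation matrix of W."""
    return np.linalg.svd(W, compute_uv=False) ** 2 / max(W.shape)

def htsr_alpha(W, xmin_frac=0.5):
    """Continuous-MLE power-law exponent fitted to the upper tail of the ESD."""
    eigs = np.sort(esd(W))
    xmin = eigs[int(len(eigs) * xmin_frac)]
    tail = eigs[eigs >= xmin]
    return 1.0 + len(tail) / np.sum(np.log(tail / xmin))

def correlation_trap(W, margin=1.5, seed=0):
    """After element-wise shuffling of W, check whether an outlier eigenvalue
    still sticks out above the bulk of the randomized spectrum."""
    rng = np.random.default_rng(seed)
    W_rand = rng.permutation(W.ravel()).reshape(W.shape)
    lam = np.sort(esd(W_rand))
    return lam[-1] > margin * lam[-2]             # margin is an assumption

# usage: htsr_alpha(W) < 2 together with correlation_trap(W) == True would flag
# the anti-grokking regime described above.
```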
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > California > San Diego County > San Diego (0.04)
Using Pre-trained LLMs for Multivariate Time Series Forecasting
Wolff, Malcolm L., Yang, Shenghao, Torkkola, Kari, Mahoney, Michael W.
Time series forecasting refers to a class of techniques for the prediction of events through a sequence of time, typically to inform strategic or tactical decision making. Going beyond strategic forecasting problems (e.g., those commonly used historically in statistics and econometrics [1]), operational forecasting problems are increasingly important. For example, at large internet retail companies, this includes demand forecasting for products at an online retailer, workforce cohorts of a company in its locations, compute capacity needs per region and server type, etc.; in scientific machine learning, this includes prediction of extreme events in, e.g., climate and weather models; and so on. In particular, MQCNN [2] and MQTransformer [3] are state-of-the-art (SOTA) neural network (NN) based multivariate time series forecasting models that are used to predict future demand at the product level for hundreds of millions of products.
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Monaco (0.04)
- Asia > Middle East > Jordan (0.04)
- Government > Military (0.54)
- Retail > Online (0.34)
- Information Technology > Modeling & Simulation (1.00)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
How Does Data Diversity Shape the Weight Landscape of Neural Networks?
Ba, Yang, Mancenido, Michelle V., Pan, Rong
To enhance the generalization of machine learning models to unseen data, techniques such as dropout, weight decay ($L_2$ regularization), and noise augmentation are commonly employed. While regularization methods (i.e., dropout and weight decay) are geared toward adjusting model parameters to prevent overfitting, data augmentation increases the diversity of the input training set, a method purported to improve accuracy and calibration. In this paper, we investigate the impact of each of these techniques on the parameter space of neural networks, with the goal of understanding how they alter the weight landscape in transfer learning scenarios. To accomplish this, we employ Random Matrix Theory to analyze the eigenvalue distributions of pre-trained models that are fine-tuned using these techniques, but with different levels of data diversity, for the same downstream tasks. We observe that diverse data influences the weight landscape in a similar fashion as dropout. Additionally, we compare commonly used data augmentation methods with synthetic data created by generative models. We conclude that synthetic data can bring more diversity into the real input data, resulting in better performance on out-of-distribution test instances.
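One way to make the eigenvalue comparison described above concrete is to measure how much of a layer's ESD escapes the Marchenko-Pastur bulk predicted for an i.i.d. random matrix of the same shape and variance. The sketch below is our own illustration of that RMT baseline, not the paper's analysis pipeline.

```python
# Our own illustration of an RMT baseline: how much of a layer's ESD escapes the
# Marchenko-Pastur bulk of an i.i.d. matrix with matched shape and variance.
import numpy as np

def mp_upper_edge(W):
    """Upper Marchenko-Pastur bulk edge for an i.i.d. matrix matching W."""
    n, m = W.shape
    q = min(n, m) / max(n, m)
    return np.var(W) * (1.0 + np.sqrt(q)) ** 2

def fraction_outside_bulk(W):
    """Fraction of ESD mass above the MP edge -- a crude score of learned structure."""
    eigs = np.linalg.svd(W, compute_uv=False) ** 2 / max(W.shape)
    return float(np.mean(eigs > mp_upper_edge(W)))

# usage: compare fraction_outside_bulk(layer_weights) across models fine-tuned
# with dropout, data augmentation, or generative synthetic data.
```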
- North America > United States > Arizona (0.04)
- Asia > Singapore (0.04)
- Asia > Indonesia > Bali (0.04)
- Asia > China > Hubei Province > Wuhan (0.04)
Model Balancing Helps Low-data Training and Fine-tuning
Liu, Zihang, Hu, Yuanzhe, Pang, Tianyu, Zhou, Yefan, Ren, Pu, Yang, Yaoqing
Recent advances in foundation models have emphasized the need to align pre-trained models with specialized domains using small, curated datasets. Studies on these foundation models underscore the importance of low-data training and fine-tuning. This topic, well-known in natural language processing (NLP), has also gained increasing attention in the emerging field of scientific machine learning (SciML). To address the limitations of low-data training and fine-tuning, we draw inspiration from Heavy-Tailed Self-Regularization (HT-SR) theory, analyzing the shape of empirical spectral densities (ESDs) and revealing an imbalance in training quality across different model layers. To mitigate this issue, we adapt a recently proposed layer-wise learning rate scheduler, TempBalance, which effectively balances training quality across layers and enhances low-data training and fine-tuning for both NLP and SciML tasks. Notably, TempBalance demonstrates increasing performance gains as the amount of available tuning data decreases. Comparative analyses further highlight the effectiveness of TempBalance and its adaptability as an "add-on" method for improving model performance.
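A minimal PyTorch sketch of the layer-wise learning-rate balancing idea is given below. The per-layer $\alpha$ estimate and, in particular, the linear mapping from $\alpha$ to a learning-rate multiplier are assumptions made for illustration; they are not the exact TempBalance schedule.

```python
# Illustrative PyTorch sketch: per-layer learning rates keyed to a per-layer ESD
# tail exponent. The linear alpha-to-LR mapping is an assumption, not the exact
# TempBalance schedule.
import numpy as np
import torch

def layer_alpha(weight: torch.Tensor) -> float:
    """Hill-style power-law tail exponent of a layer's ESD."""
    W = weight.detach().cpu().numpy().reshape(weight.shape[0], -1)
    eigs = np.sort(np.linalg.svd(W, compute_uv=False) ** 2 / max(W.shape))
    k = max(2, len(eigs) // 4)
    tail = eigs[-k:]
    return 1.0 + (k - 1) / np.sum(np.log(tail[1:] / tail[0]))

def balanced_param_groups(model: torch.nn.Module, base_lr=1e-3, spread=0.5):
    """Scale each Linear/Conv layer's LR by its alpha relative to the network-wide
    mean, so that (by this assumed rule) less heavy-tailed layers get larger steps."""
    groups, alphas = [], []
    for module in model.modules():
        if isinstance(module, (torch.nn.Linear, torch.nn.Conv2d)):
            alphas.append(layer_alpha(module.weight))
            groups.append(list(module.parameters()))
    a = np.array(alphas)
    scale = np.clip(1.0 + spread * (a - a.mean()) / (a.std() + 1e-8), 0.1, None)
    return [{"params": p, "lr": base_lr * float(s)} for p, s in zip(groups, scale)]

# usage: optimizer = torch.optim.SGD(balanced_param_groups(model), momentum=0.9)
```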
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Crafting Heavy-Tails in Weight Matrix Spectrum without Gradient Noise
Kothapalli, Vignesh, Pang, Tianyu, Deng, Shenyang, Liu, Zongmin, Yang, Yaoqing
Modern training strategies of deep neural networks (NNs) tend to induce heavy-tailed (HT) spectra in layer weights. Extensive efforts to study this phenomenon have found that NNs with HT weight spectra tend to generalize well. A prevailing notion for the occurrence of such HT spectra attributes gradient noise during training as a key contributing factor. Our work shows that gradient noise is unnecessary for generating HT weight spectra: two-layer NNs trained with full-batch Gradient Descent/Adam can exhibit HT spectra in their weights after finite training steps. To this end, we first identify the scale of the learning rate at which one step of full-batch Adam can lead to feature learning in the shallow NN, particularly when learning a single-index teacher model. Next, we show that multiple optimizer steps with such (sufficiently) large learning rates can transition the bulk of the weight spectrum into an HT distribution. To understand this behavior, we present a novel perspective based on the singular vectors of the weight matrices and optimizer updates. We show that the HT weight spectrum originates from the `spike', which is generated from feature learning and interacts with the main bulk to generate an HT spectrum. Finally, we analyze the correlations between the HT weight spectra and generalization after multiple optimizer updates with varying learning rates.
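A small full-batch experiment in the spirit of this setup can be written in a few lines of PyTorch. The teacher nonlinearity, width, learning rate, and step count below are our choices for illustration, not the paper's exact configuration.

```python
# Full-batch, noise-free training of a two-layer ReLU network on a single-index
# teacher (PyTorch); hyperparameters are illustrative assumptions.
import torch

torch.manual_seed(0)
d, width, n = 128, 256, 4096
w_star = torch.randn(d) / d ** 0.5
X = torch.randn(n, d)
y = torch.relu(X @ w_star)                                  # single-index teacher

W1 = (torch.randn(width, d) / d ** 0.5).requires_grad_()    # trainable first layer
a = torch.randn(width, 1) / width ** 0.5                    # fixed second layer

opt = torch.optim.Adam([W1], lr=1e-1)                       # deliberately large step size
for step in range(200):                                     # full batch: no gradient noise
    opt.zero_grad()
    pred = (torch.relu(X @ W1.T) @ a).squeeze(-1)
    loss = (pred - y).pow(2).mean()
    loss.backward()
    opt.step()

svals = torch.linalg.svdvals(W1.detach())                   # inspect spike + heavy-tailed bulk
print(svals[:5], svals[5:].mean())
```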
- Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Asia > Middle East > Jordan (0.04)
Near-Interpolators: Rapid Norm Growth and the Trade-Off between Interpolation and Generalization
Wang, Yutong, Sonthalia, Rishi, Hu, Wei
We study the generalization capability of nearly-interpolating linear regressors: $\boldsymbol{\beta}$'s whose training error $\tau$ is positive but small, i.e., below the noise floor. Under a random matrix theoretic assumption on the data distribution and an eigendecay assumption on the data covariance matrix $\boldsymbol{\Sigma}$, we demonstrate that any near-interpolator exhibits rapid norm growth: for $\tau$ fixed, $\boldsymbol{\beta}$ has squared $\ell_2$-norm $\mathbb{E}[\|{\boldsymbol{\beta}}\|_{2}^{2}] = \Omega(n^{\alpha})$ where $n$ is the number of samples and $\alpha >1$ is the exponent of the eigendecay, i.e., $\lambda_i(\boldsymbol{\Sigma}) \sim i^{-\alpha}$. This implies that existing data-independent norm-based bounds are necessarily loose. On the other hand, in the same regime we precisely characterize the asymptotic trade-off between interpolation and generalization. Our characterization reveals that larger norm scaling exponents $\alpha$ correspond to worse trade-offs between interpolation and generalization. We verify empirically that a similar phenomenon holds for nearly-interpolating shallow neural networks.
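The norm-growth claim can be probed numerically with a small experiment: draw Gaussian data whose covariance spectrum decays as $\lambda_i \sim i^{-\alpha}$, tune a ridge penalty until the training error sits near a fixed $\tau$ below the noise floor, and track the squared norm of the resulting near-interpolator as $n$ grows. The sketch below is our illustration under those assumed settings.

```python
# Our numerical probe of the norm-growth claim under assumed settings: Gaussian
# data with covariance spectrum lambda_i ~ i^{-alpha}, ridge level tuned so that
# the training error sits near a fixed tau below the noise floor sigma^2.
import numpy as np

rng = np.random.default_rng(0)
alpha, d, tau, sigma = 1.5, 1000, 0.05, 0.5

def near_interpolator_sq_norm(n):
    lam = np.arange(1, d + 1, dtype=float) ** (-alpha)       # eigendecay of Sigma
    X = rng.standard_normal((n, d)) * np.sqrt(lam)           # rows ~ N(0, Sigma)
    beta_star = rng.standard_normal(d) / np.sqrt(d)
    y = X @ beta_star + sigma * rng.standard_normal(n)
    for reg in np.geomspace(1e-8, 1e2, 40):                  # sweep the ridge penalty upward
        beta = np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ y)
        if np.mean((X @ beta - y) ** 2) >= tau:              # first beta with train error ~ tau
            return float(np.sum(beta ** 2))
    return float(np.sum(beta ** 2))

for n in (100, 200, 400, 800):
    print(n, near_interpolator_sq_norm(n))                   # squared norm should grow rapidly with n
```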
- North America > United States > Michigan (0.04)
- Europe > Spain (0.04)
Taming Mode Collapse in Score Distillation for Text-to-3D Generation
Wang, Peihao, Xu, Dejia, Fan, Zhiwen, Wang, Dilin, Mohan, Sreyas, Iandola, Forrest, Ranjan, Rakesh, Li, Yilei, Liu, Qiang, Wang, Zhangyang, Chandra, Vikas
Despite the remarkable performance of score distillation in text-to-3D generation, such techniques notoriously suffer from view inconsistency issues, also known as the "Janus" artifact, where the generated objects fake each view with multiple front faces. Although empirically effective methods have approached this problem via score debiasing or prompt engineering, a more rigorous perspective to explain and tackle this problem remains elusive. In this paper, we reveal that existing score distillation-based text-to-3D generation frameworks degenerate to maximum likelihood seeking on each view independently and thus suffer from the mode collapse problem, manifesting as the Janus artifact in practice. To tame mode collapse, we improve score distillation by re-establishing the entropy term in the corresponding variational objective, which is applied to the distribution of rendered images. Maximizing the entropy encourages diversity among different views in generated 3D assets, thereby mitigating the Janus problem. Based on this new objective, we derive a new update rule for 3D score distillation, dubbed Entropic Score Distillation (ESD). We theoretically reveal that ESD can be simplified and implemented by just adopting the classifier-free guidance trick upon variational score distillation. Although ESD is embarrassingly straightforward, our extensive experiments demonstrate that it can be an effective treatment for Janus artifacts in score distillation.
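The classifier-free guidance trick referenced in the last sentence amounts to blending a conditional and an unconditional noise/score prediction before forming the distillation gradient. The snippet below is a schematic of that blending only; the weighting convention, the function names, and the `unet` call in the usage comment are assumptions, not the paper's exact ESD update.

```python
# Schematic of classifier-free-guidance-style score blending inside a
# distillation update; names and weights are illustrative assumptions.
import torch

def cfg_blend(eps_cond: torch.Tensor, eps_uncond: torch.Tensor, w: float) -> torch.Tensor:
    """Classifier-free guidance: extrapolate from the unconditional toward the
    conditional noise prediction with guidance weight w."""
    return eps_uncond + w * (eps_cond - eps_uncond)

def distillation_signal(eps_blend: torch.Tensor, eps_ref: torch.Tensor, t_weight: float) -> torch.Tensor:
    """Schematic score-distillation residual on a rendered image: the gradient
    signal is proportional to the guided prediction minus a reference."""
    return t_weight * (eps_blend - eps_ref)

# usage sketch (hypothetical `unet`, embeddings, and noisy render `x_t`):
# eps = cfg_blend(unet(x_t, t, text_emb), unet(x_t, t, null_emb), w=7.5)
# grad = distillation_signal(eps, noise, t_weight=1.0)
```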
- North America > United States > Texas > Travis County > Austin (0.04)
- Europe > United Kingdom > England (0.04)
- Asia > Japan > Honshū > Chūbu > Nagano Prefecture > Nagano (0.04)