AITopics | normalized

Collaborating Authors

normalized

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Inducing Spatial Locality in Vision Transformers through the Training Protocol

Toledo, Eduardo Santiago, Martínez, Asael Fabian

arXiv.org Machine LearningMay-19-2026

We investigate whether the training protocol can induce spatial locality in the early layers of a Vision Transformer (ViT) trained from scratch, without large-scale pretraining. Keeping the architecture and optimization procedure fixed, we compare a Baseline protocol with a Modern protocol (AutoAugment/ColorJitter, CutMix, and Label Smoothing) on CIFAR-10, CIFAR-100, and Tiny-ImageNet, characterizing each attention head via Mean Attention Distance (MAD) and normalized entropy. Across all three datasets, the Modern protocol produces more local and more concentrated attention in early layers; on CIFAR-100, the minimum MAD drops from 0.316 (Baseline) to 0.008 (Modern). To identify the source of this effect, we conduct an ablation study on CIFAR-100 by adding or removing each component individually. The results identify CutMix as the determining component within our experiments: all conditions with CutMix exhibit MAD 0.024, while all conditions without CutMix remain at MAD 0.210. AutoAugment and Label Smoothing show no independent effect on locality. Taken together, these findings suggest that the pressure to classify from partial image regions, induced by CutMix, can promote the emergence of local attention in Vision Transformers.

artificial intelligence, machine learning, protocol, (16 more...)

arXiv.org Machine Learning

2605.1639

Country: South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.76)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

4b3cc0d1c897ebcf71aca92a4a26ac83-Supplemental-Conference.pdf

Neural Information Processing SystemsAug-14-2025, 16:31:58 GMT

ddiag, exp, normalized, (15 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.47)

Add feedback

VUSA: Virtually Upscaled Systolic Array Architecture to Exploit Unstructured Sparsity in AI Acceleration

Helal, Shereef, Garcia-Ortiz, Alberto, Bamberg, Lennart

arXiv.org Artificial IntelligenceJun-3-2025

--Leveraging high degrees of unstructured sparsity is a promising approach to enhance the efficiency of deep neural network (DNN) accelerators--particularly important for emerging Edge-AI applications. We introduce VUSA, a systolic-array architecture that virtually grows based on the present sparsity to perform larger matrix multiplications with the same number of physical multiply-accumulate (MAC) units. The proposed architecture achieves saving by 37% and 68% in area and power efficiency, respectively, at the same peak-performance, compared to a baseline systolic array architecture in a commercial 16-nm technology. Still, the proposed architecture supports acceleration for any DNN with any sparsity--even no sparsity at all. Thus, the proposed architecture is application-independent, making it viable for general-purpose AI acceleration. Over recent years, Artificial Intelligence (AI) has emerged as a revolutionary new technology, spreading across different industries and enhancing various aspects of our daily lives. The deployment of AI is not only confined to powerful data-center machines, but is increasingly demanded in resource-constrained embedded devices, a concept known as Edge AI. Deep Neural Network (DNN) architectures are the backbone of state-of-the-art AI applications to perform numerous tasks, such as image processing, speech recognition, natural language processing (NLP), and more [1]. However, DNNs have high computational demands, posing a significant challenge when deploying them in real-world applications.

artificial intelligence, machine learning, sparsity, (18 more...)

arXiv.org Artificial Intelligence

2506.01166

Country:

North America > United States (0.68)
Europe > Germany > Bremen > Bremen (0.28)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.56)

Add feedback

A Distributional Perspective on Word Learning in Neural Language Models

Ficarra, Filippo, Cotterell, Ryan, Warstadt, Alex

arXiv.org Artificial IntelligenceFeb-9-2025

Language models (LMs) are increasingly being studied as models of human language learners. Due to the nascency of the field, it is not well-established whether LMs exhibit similar learning dynamics to humans, and there are few direct comparisons between learning trajectories in humans and models. Word learning trajectories for children are relatively well-documented, and recent work has tried to extend these investigations to language models. However, there are no widely agreed-upon metrics for word learning in language models. We take a distributional approach to this problem, defining lexical knowledge in terms of properties of the learned distribution for a target word. We argue that distributional signatures studied in prior work fail to capture key distributional information. Thus, we propose an array of signatures that improve on earlier approaches by capturing knowledge of both where the target word can and cannot occur as well as gradient preferences about the word's appropriateness. We obtain learning trajectories for a selection of small language models we train from scratch, study the relationship between different distributional signatures, compare how well they align with human word learning trajectories and interpretable lexical features, and address basic methodological questions about estimating these distributional signatures. Our metrics largely capture complementary information, suggesting that it is important not to rely on a single metric. However, across all metrics, language models' learning trajectories fail to correlate with those of children.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2502.05892

Country:

North America > United States > Virginia (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(3 more...)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

Anomaly Detection in Large-Scale Cloud Systems: An Industry Case and Dataset

Islam, Mohammad Saiful, Rakha, Mohamed Sami, Pourmajidi, William, Sivaloganathan, Janakan, Steinbacher, John, Miranskyy, Andriy

arXiv.org Artificial IntelligenceJan-6-2025

As Large-Scale Cloud Systems (LCS) become increasingly complex, effective anomaly detection is critical for ensuring system reliability and performance. However, there is a shortage of large-scale, real-world datasets available for benchmarking anomaly detection methods. To address this gap, we introduce a new high-dimensional dataset from IBM Cloud, collected over 4.5 months from the IBM Cloud Console. This dataset comprises 39,365 rows and 117,448 columns of telemetry data. Additionally, we demonstrate the application of machine learning models for anomaly detection and discuss the key challenges faced in this process. This study and the accompanying dataset provide a resource for researchers and practitioners in cloud system monitoring. It facilitates more efficient testing of anomaly detection methods in real-world data, helping to advance the development of robust solutions to maintain the health and performance of large-scale cloud infrastructures.

anomaly, data mining, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2411.09047

Country: North America > United States (0.68)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Services (1.00)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Physics-Informed Machine Learning Towards A Real-Time Spacecraft Thermal Simulator

Oddiraju, Manaswin, Hasnain, Zaki, Bandyopadhyay, Saptarshi, Sunada, Eric, Chowdhury, Souma

arXiv.org Artificial IntelligenceJul-8-2024

Modeling thermal states for complex space missions, such as the surface exploration of airless bodies, requires high computation, whether used in ground-based analysis for spacecraft design or during onboard reasoning for autonomous operations. For example, a finite-element thermal model with hundreds of elements can take significant time to simulate, which makes it unsuitable for onboard reasoning during time-sensitive scenarios such as descent and landing, proximity operations, or in-space assembly. Further, the lack of fast and accurate thermal modeling drives thermal designs to be more conservative and leads to spacecraft with larger mass and higher power budgets. The emerging paradigm of physics-informed machine learning (PIML) presents a class of hybrid modeling architectures that address this challenge by combining simplified physics models with machine learning (ML) models resulting in models which maintain both interpretability and robustness. Such techniques enable designs with reduced mass and power through onboard thermal-state estimation and control and may lead to improved onboard handling of off-nominal states, including unplanned down-time. The PIML model or hybrid model presented here consists of a neural network which predicts reduced nodalizations (distribution and size of coarse mesh) given on-orbit thermal load conditions, and subsequently a (relatively coarse) finite-difference model operates on this mesh to predict thermal states. We compare the computational performance and accuracy of the hybrid model to a data-driven neural net model, and a high-fidelity finite-difference model of a prototype Earth-orbiting small spacecraft. The PIML based active nodalization approach provides significantly better generalization than the neural net model and coarse mesh model, while reducing computing cost by up to 1.7x compared to the high-fidelity model.

architecture, nodalization, node, (16 more...)

arXiv.org Artificial Intelligence

2407.06099

Country:

North America > United States > California > Los Angeles County > Pasadena (0.04)
North America > United States > New York > Erie County > Buffalo (0.04)

Genre: Research Report (0.82)

Industry: Energy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

RLEMMO: Evolutionary Multimodal Optimization Assisted By Deep Reinforcement Learning

Lian, Hongqiao, Ma, Zeyuan, Guo, Hongshu, Huang, Ting, Gong, Yue-Jiao

arXiv.org Artificial IntelligenceApr-12-2024

Solving multimodal optimization problems (MMOP) requires finding all optimal solutions, which is challenging in limited function evaluations. Although existing works strike the balance of exploration and exploitation through hand-crafted adaptive strategies, they require certain expert knowledge, hence inflexible to deal with MMOP with different properties. In this paper, we propose RLEMMO, a Meta-Black-Box Optimization framework, which maintains a population of solutions and incorporates a reinforcement learning agent for flexibly adjusting individual-level searching strategies to match the up-to-date optimization status, hence boosting the search performance on MMOP. Concretely, we encode landscape properties and evolution path information into each individual and then leverage attention networks to advance population information sharing. With a novel reward mechanism that encourages both quality and diversity, RLEMMO can be effectively trained using a policy gradient algorithm. The experimental results on the CEC2013 MMOP benchmark underscore the competitive optimization performance of RLEMMO against several strong baselines.

agent, optimization, rlemmo, (14 more...)

arXiv.org Artificial Intelligence

2404.08242

Country:

Asia > China > Guangdong Province > Guangzhou (0.04)
Oceania > Australia (0.04)
Asia > China > Shanxi Province (0.04)
Asia > China > Shaanxi Province > Xi'an (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Add feedback

One flow to correct them all: improving simulations in high-energy physics with a single normalising flow and a switch

Daumann, Caio Cesar, Donega, Mauro, Erdmann, Johannes, Galli, Massimiliano, Späh, Jan Lukas, Valsecchi, Davide

arXiv.org Artificial IntelligenceMar-27-2024

Simulated events are key ingredients in almost all high-energy physics analyses. However, imperfections in the simulation can lead to sizeable differences between the observed data and simulated events. The effects of such mismodelling on relevant observables must be corrected either effectively via scale factors, with weights or by modifying the distributions of the observables and their correlations. We introduce a correction method that transforms one multidimensional distribution (simulation) into another one (data) using a simple architecture based on a single normalising flow with a boolean condition. We demonstrate the effectiveness of the method on a physics-inspired toy dataset with non-trivial mismodelling of several observables and their correlations.

base distribution, dataset, simulation, (17 more...)

arXiv.org Artificial Intelligence

2403.18582

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Germany > North Rhine-Westphalia > Cologne Region > Aachen (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Add feedback

Seeking Next Layer Neurons' Attention for Error-Backpropagation-Like Training in a Multi-Agent Network Framework

Moakhar, Arshia Soltani, Azizmalayeri, Mohammad, Mirzaei, Hossein, Manzuri, Mohammad Taghi, Rohban, Mohammad Hossein

arXiv.org Artificial IntelligenceOct-15-2023

Despite considerable theoretical progress in the training of neural networks viewed as a multi-agent system of neurons, particularly concerning biological plausibility and decentralized training, their applicability to real-world problems remains limited due to scalability issues. In contrast, error-backpropagation has demonstrated its effectiveness for training deep networks in practice. In this study, we propose a local objective for neurons that, when pursued by neurons individually, align them to exhibit similarities to error-backpropagation in terms of efficiency and scalability during training. For this purpose, we examine a neural network comprising decentralized, self-interested neurons seeking to maximize their local objective -- attention from subsequent layer neurons -- and identify the optimal strategy for neurons. We also analyze the relationship between this strategy and backpropagation, establishing conditions under which the derived strategy is equivalent to error-backpropagation. Lastly, we demonstrate the learning capacity of these multi-agent neural networks through experiments on three datasets and showcase their superior performance relative to error-backpropagation in a catastrophic forgetting benchmark.

error-backpropagation-like training, neural network, neuron, (14 more...)

arXiv.org Artificial Intelligence

2310.09952

Country:

North America > Canada > Ontario > Toronto (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.04)
Asia > Middle East > Iran > Tehran Province > Tehran (0.04)

Genre: Research Report > New Finding (0.87)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Backpropagation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

EvalAttAI: A Holistic Approach to Evaluating Attribution Maps in Robust and Non-Robust Models

Nielsen, Ian E., Ramachandran, Ravi P., Bouaynaya, Nidhal, Fathallah-Shaykh, Hassan M., Rasool, Ghulam

arXiv.org Artificial IntelligenceMar-15-2023

The expansion of explainable artificial intelligence as a field of research has generated numerous methods of visualizing and understanding the black box of a machine learning model. Attribution maps are generally used to highlight the parts of the input image that influence the model to make a specific decision. On the other hand, the robustness of machine learning models to natural noise and adversarial attacks is also being actively explored. This paper focuses on evaluating methods of attribution mapping to find whether robust neural networks are more explainable. We explore this problem within the application of classification for medical imaging. Explainability research is at an impasse. There are many methods of attribution mapping, but no current consensus on how to evaluate them and determine the ones that are the best. Our experiments on multiple datasets (natural and medical imaging) and various attribution methods reveal that two popular evaluation metrics, Deletion and Insertion, have inherent limitations and yield contradictory results. We propose a new explainability faithfulness metric (called EvalAttAI) that addresses the limitations of prior metrics. Using our novel evaluation, we found that Bayesian deep neural networks using the Variational Density Propagation technique were consistently more explainable when used with the best performing attribution method, the Vanilla Gradient. However, in general, various types of robust neural networks may not be more explainable, despite these models producing more visually plausible attribution maps.

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ACCESS.2023.3300242

2303.08866

Country:

Europe > Jersey (0.14)
North America > Canada > Ontario > Toronto (0.14)
North America > United States > Arkansas (0.04)
(8 more...)

Genre: Research Report > New Finding (0.93)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.68)
Health & Medicine > Therapeutic Area > Oncology (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback