Start Making Sense(s): A Developmental Probe of Attention Specialization Using Lexical Ambiguity
Rivière, Pamela D., Trott, Sean
Despite an in-principle understanding of self-attention matrix operations in Transformer language models (LMs), it remains unclear precisely how these operations map onto interpretable computations or functions--and how or when individual attention heads develop specialized attention patterns. Here, we present a pipeline to systematically probe attention mechanisms, and we illustrate its value by leveraging lexical ambiguity--where a single word has multiple meanings--to isolate attention mechanisms that contribute to word sense disambiguation. We take a "developmental" approach: first, using publicly available Pythia LM checkpoints, we identify inflection points in disambiguation performance for each LM in the suite; in the 14M and 410M models, we then identify heads whose attention to disambiguating words covaries with overall disambiguation performance across development. We next stress-test the robustness of these heads to stimulus perturbations: in 14M we find limited robustness, but in 410M we identify multiple heads with surprisingly generalizable behavior. In a causal analysis, we find that ablating the target heads demonstrably impairs disambiguation performance, particularly in 14M. We additionally reproduce the developmental analyses of 14M across all of its random seeds. Together, these results suggest that disambiguation benefits from a constellation of mechanisms, some of which (especially in 14M) are highly sensitive to the position and part-of-speech of the disambiguating cue, and that larger models (410M) may contain heads with more robust disambiguation behavior. They also join a growing body of work highlighting the value of adopting a developmental perspective when probing LM mechanisms.
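As a concrete illustration of the ablation step, the sketch below zero-ablates a single attention head in a public Pythia checkpoint and compares the log-probability of a sense-consistent continuation with and without the ablation. It assumes the transformer_lens library; the layer/head indices and the example stimulus are illustrative placeholders, not the paper's identified heads or materials.

```python
# Minimal sketch of zero-ablating one attention head in a Pythia
# checkpoint and measuring the effect on a disambiguation probe.
# Assumes transformer_lens; layer/head indices and the stimulus below
# are illustrative placeholders.
from functools import partial

from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("pythia-14m")

def zero_head(z, hook, head_idx):
    # z has shape [batch, position, head_index, d_head]
    z[:, :, head_idx, :] = 0.0
    return z

def continuation_logprob(prompt, continuation, fwd_hooks=()):
    tokens = model.to_tokens(prompt + continuation)
    n_prompt = model.to_tokens(prompt).shape[1]
    logits = model.run_with_hooks(tokens, fwd_hooks=list(fwd_hooks))
    logprobs = logits.log_softmax(dim=-1)
    # Score each continuation token given the preceding context.
    total = 0.0
    for pos in range(n_prompt, tokens.shape[1]):
        total += logprobs[0, pos - 1, tokens[0, pos]].item()
    return total

prompt = "She deposited the check at the bank, which was"  # "bank" is ambiguous
continuation = " closed for the holiday"
layer, head = 3, 2  # placeholder target head

baseline = continuation_logprob(prompt, continuation)
ablated = continuation_logprob(
    prompt, continuation,
    fwd_hooks=[(f"blocks.{layer}.attn.hook_z", partial(zero_head, head_idx=head))],
)
print(f"log p(continuation): baseline={baseline:.3f}, ablated={ablated:.3f}")
```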
We thank all the reviewers for their constructive comments and useful suggestions. Q (R1): "Comparison with other methods like encoder" & "why do we need this technique". This is an important point that we will clarify and expand on in the paper. Compared to GD-based methods, our algorithm is much more efficient; see the appendix for timing comparisons.
Leveraging Zipformer Model for Effective Language Identification in Code-Switched Child-Directed Speech
Shankar, Lavanya, Perera, Leibny Paola Garcia
This paper addresses the challenge of language identification in code-switched child-directed speech by using a Zipformer to handle utterances containing two imbalanced languages, Mandarin and English. This work demonstrates that the internal layers of the Zipformer effectively encode language characteristics, which can be leveraged for language identification. We present a methodology for selecting the inner layers from which to extract embeddings, and we compare them across different back-ends. Our analysis shows that the Zipformer is robust across these back-ends. Our approach effectively handles the imbalanced data, achieving a Balanced Accuracy (BAC) of 81.89%, a 15.47% improvement over the language identification baseline.
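To make the back-end comparison concrete, the sketch below trains two simple classifiers on utterance-level embeddings and scores them with balanced accuracy. The embedding matrix X and labels y are random stand-ins for features actually pooled from an internal Zipformer layer; sklearn's class_weight="balanced" option is one plausible way to handle the language imbalance.

```python
# Sketch of comparing back-end classifiers on layer embeddings for
# language identification. X and y are placeholders for the real
# extraction pipeline: X would hold [n_utterances, dim] embeddings
# pooled from one internal Zipformer layer.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 256))                        # stand-in embeddings
y = rng.choice(["en", "zh"], size=1000, p=[0.8, 0.2])   # imbalanced labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" counteracts the language imbalance.
for name, clf in [
    ("logreg", LogisticRegression(max_iter=1000, class_weight="balanced")),
    ("svm", SVC(class_weight="balanced")),
]:
    clf.fit(X_tr, y_tr)
    bac = balanced_accuracy_score(y_te, clf.predict(X_te))
    print(f"{name}: BAC = {bac:.4f}")
```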
Enhancing variational quantum algorithms by balancing training on classical and quantum hardware
Bhowmick, Rahul, Wadhwa, Harsh, Singh, Avinash, Sidana, Tania, Tran, Quoc Hoan, Sabapathy, Krishna Kumar
Quantum computers offer a promising route to tackling problems that are classically intractable, such as prime factorization, large-scale linear algebra, and the simulation of complex quantum systems, but they require fault-tolerant quantum hardware. Variational quantum algorithms (VQAs), on the other hand, have the potential to provide a near-term route to quantum utility or advantage, and are usually constructed using parametrized quantum circuits (PQCs) in combination with a classical optimizer for training. Although VQAs have been proposed for a multitude of tasks such as ground-state estimation, combinatorial optimization, and unitary compilation, major challenges remain in their trainability and resource costs on quantum hardware. Here we address these challenges by adopting a Hardware Efficient and dynamical LIe algebra Supported Ansatz (HELIA), and we propose two training schemes that combine an existing g-sim method (which uses the underlying group structure of the operators) with the Parameter-Shift Rule (PSR). Our improvement comes from distributing the resources required for gradient estimation and training across both classical and quantum hardware. We numerically test our proposal for ground-state estimation using the Variational Quantum Eigensolver (VQE) and for classification of quantum phases using quantum neural networks. Our methods show better accuracy and a higher rate of successful trials, and on average they need fewer calls to the quantum hardware (up to a 60% reduction) than using PSR alone, which runs exclusively on quantum hardware. We also numerically demonstrate the capability of HELIA in mitigating barren plateaus, paving the way for training large-scale quantum models.
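The quantum-hardware half of such a hybrid scheme rests on the Parameter-Shift Rule. The sketch below demonstrates PSR on a toy single-qubit circuit RY(theta)|0> measured in Z, where the expectation is cos(theta) and the exact gradient is known; the expectation function is a stand-in for a call to quantum hardware, and no claim is made here about the paper's actual g-sim/PSR split.

```python
# Sketch of the parameter-shift rule (PSR) for gradient estimation.
# For the toy circuit RY(theta)|0> with observable Z, the expectation
# is E(theta) = cos(theta), so the exact gradient is -sin(theta);
# PSR recovers it from two expectation evaluations.
import numpy as np

def expectation(theta):
    # Stand-in for a call to quantum hardware or a simulator.
    return np.cos(theta)

def psr_gradient(theta, shift=np.pi / 2):
    # Exact for gates generated by Pauli operators.
    return (expectation(theta + shift) - expectation(theta - shift)) / 2.0

theta = 0.7
print("PSR gradient:  ", psr_gradient(theta))
print("exact gradient:", -np.sin(theta))
```

In the paper's scheme, evaluations like `expectation` would be split between quantum hardware (for PSR) and classical simulation of the operator group structure (for g-sim); only the PSR side is shown here.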
Powerformer: A Transformer with Weighted Causal Attention for Time-series Forecasting
Hegazy, Kareem, Mahoney, Michael W., Erichson, N. Benjamin
Transformers have recently shown strong performance in time-series forecasting, but their all-to-all attention mechanism overlooks the (temporal) causal and often (temporally) local nature of data. We introduce Powerformer, a novel Transformer variant that replaces noncausal attention weights with causal weights that are reweighted according to a smooth heavy-tailed decay. This simple yet effective modification endows the model with an inductive bias favoring temporally local dependencies, while still allowing sufficient flexibility to learn the unique correlation structure of each dataset. Our empirical results demonstrate that Powerformer not only achieves state-of-the-art accuracy on public time-series benchmarks but also offers improved interpretability of attention patterns. Our analyses show that the model's locality bias is amplified during training, demonstrating an interplay between time-series data and power-law-based attention. These findings highlight the importance of domain-specific modifications to the Transformer architecture for time-series forecasting, and they establish Powerformer as a strong, efficient, and principled baseline for future research and real-world applications.
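One plausible instantiation of the idea (an assumption, not necessarily the paper's exact formulation) is to compute standard causal attention and then reweight the attention probabilities by a power-law decay in the query-key lag, as sketched below; the decay exponent p and the post-softmax placement of the decay are illustrative choices.

```python
import torch

def powerlaw_causal_attention(q, k, v, p=1.0):
    """Causal attention with weights decayed as (1 + lag)^(-p).

    q, k, v: [batch, seq, dim]. The decay form and its post-softmax
    placement are illustrative assumptions.
    """
    b, t, d = q.shape
    scores = q @ k.transpose(-2, -1) / d**0.5           # [b, t, t]
    causal = torch.tril(torch.ones(t, t, dtype=torch.bool))
    scores = scores.masked_fill(~causal, float("-inf"))
    attn = scores.softmax(dim=-1)

    lag = torch.arange(t)[:, None] - torch.arange(t)[None, :]  # query - key
    decay = (1.0 + lag.clamp(min=0).float()) ** (-p)    # heavy-tailed decay
    attn = attn * decay
    attn = attn / attn.sum(dim=-1, keepdim=True)        # renormalize rows
    return attn @ v

q = k = v = torch.randn(2, 16, 32)
out = powerlaw_causal_attention(q, k, v, p=0.5)
print(out.shape)  # torch.Size([2, 16, 32])
```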
Low-Rank Agent-Specific Adaptation (LoRASA) for Multi-Agent Policy Learning
Zhang, Beining, Kapoor, Aditya, Sun, Mingfei
Multi-agent reinforcement learning (MARL) often relies on parameter sharing (PS) to scale efficiently. However, purely shared policies can stifle each agent's unique specialization, reducing overall performance in heterogeneous environments. We propose Low-Rank Agent-Specific Adaptation (LoRASA), a novel approach that treats each agent's policy as a specialized "task" fine-tuned from a shared backbone. Drawing inspiration from parameter-efficient transfer methods, LoRASA appends small, low-rank adaptation matrices to each layer of the shared policy, naturally inducing parameter-space sparsity that promotes both specialization and scalability. We evaluate LoRASA on challenging benchmarks including the StarCraft Multi-Agent Challenge (SMAC) and Multi-Agent MuJoCo (MAMuJoCo), implementing it atop widely used algorithms such as MAPPO and A2PO. Across diverse tasks, LoRASA matches or outperforms existing baselines while reducing memory and computational overhead. Ablation studies on adapter rank, placement, and timing validate the method's flexibility and efficiency. Our results suggest LoRASA's potential to establish a new norm for MARL policy parameterization: combining a shared foundation for coordination with low-rank agent-specific refinements for individual specialization.
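The core parameterization can be sketched as a shared linear layer plus a per-agent low-rank update, so that agent i effectively uses W + B_i A_i. The module below is a minimal illustration; the rank, initialization, and layout are generic LoRA-style assumptions rather than LoRASA's exact design.

```python
import torch
import torch.nn as nn

class LoRAAgentLinear(nn.Module):
    """Shared linear layer with per-agent low-rank adapters.

    Agent i's effective weight is W + B_i @ A_i, where A_i and B_i are
    rank-r matrices. Zero-initializing B (so adapters start as no-ops)
    follows common LoRA practice and is an illustrative assumption.
    """
    def __init__(self, d_in, d_out, n_agents, rank=4):
        super().__init__()
        self.shared = nn.Linear(d_in, d_out)
        self.A = nn.Parameter(torch.randn(n_agents, rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(n_agents, d_out, rank))

    def forward(self, x, agent_id):
        # x: [batch, d_in]; adapter output is B_i (A_i x).
        delta = x @ self.A[agent_id].T @ self.B[agent_id].T
        return self.shared(x) + delta

layer = LoRAAgentLinear(d_in=64, d_out=64, n_agents=8, rank=4)
obs = torch.randn(32, 64)
out = layer(obs, agent_id=3)
print(out.shape)  # torch.Size([32, 64])
```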
Attention Sinks and Outlier Features: A 'Catch, Tag, and Release' Mechanism for Embeddings
Zhang, Stephen, Khan, Mustafa, Papyan, Vardan
Two prominent features of large language models (LLMs) are the presence of large-norm (outlier) features and the tendency for tokens to attend very strongly to a select few tokens. Despite often having no semantic relevance, these select tokens, called attention sinks, along with the large outlier features, have proven important for model performance, compression, and streaming. Consequently, investigating the roles of these phenomena within models, and exploring how they might manifest in the model parameters, has become an area of active interest. Through an empirical investigation, we demonstrate that attention sinks utilize outlier features to: catch a sequence of tokens, tag the captured tokens by applying a common perturbation, and then release the tokens back into the residual stream, where the tagged tokens are eventually retrieved. We prove that simple tasks, like averaging, necessitate the 'catch, tag, and release' mechanism, hence explaining why it arises organically in modern LLMs. Our experiments also show that the creation of attention sinks can be completely captured in the model parameters using low-rank matrices, which has important implications for model compression and substantiates the success of recent approaches that incorporate a low-rank term to offset performance degradation.
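A simple way to see sink behavior empirically is to measure, per head, the average attention mass assigned to the first token of a sequence. The sketch below does this with the transformer_lens library; the model choice and the 0.5 cutoff for flagging "sink-like" heads are arbitrary placeholders.

```python
# Sketch of quantifying attention-sink strength per head: the average
# attention mass each head places on the first token. The model and
# the 0.5 threshold are placeholders.
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
tokens = model.to_tokens("The quick brown fox jumps over the lazy dog.")
_, cache = model.run_with_cache(tokens)

for layer in range(model.cfg.n_layers):
    pattern = cache["pattern", layer]  # [batch, head, query_pos, key_pos]
    # Average attention each head pays to key position 0, skipping
    # query 0 (which can only attend to itself).
    sink_mass = pattern[0, :, 1:, 0].mean(dim=-1)
    for head in range(sink_mass.shape[0]):
        mass = sink_mass[head].item()
        if mass > 0.5:  # arbitrary cutoff for "sink-like" heads
            print(f"L{layer}H{head}: mean attention to first token = {mass:.2f}")
```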
Bayesian Optimization with Preference Exploration by Monotonic Neural Network Ensemble
Wang, Hanyang, Branke, Juergen, Poloczek, Matthias
Many real-world black-box optimization problems have multiple conflicting objectives. Rather than attempting to approximate the entire set of Pareto-optimal solutions, interactive preference learning, i.e., optimization with a decision maker in the loop, allows the search to focus on the most relevant subset. However, few previous studies have exploited the fact that utility functions are usually monotonic. In multi-objective optimization (MOO), there is usually not a single optimal solution, but a range of so-called Pareto-optimal or non-dominated solutions with different trade-offs. A widely adopted approach aims to search for a good representation of these Pareto-optimal solutions by maximizing their hypervolume. Two prominent methods stand out in this regard: ParEGO (Knowles, 2006), which employs random augmented Chebyshev scalarizations for optimization in each iteration, and expected hypervolume maximization (Yang et al., 2019; Daulton et al., 2020), which directly maximizes the hypervolume.
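The monotonicity ingredient can be illustrated with a small network whose weights are constrained to be positive, making the learned utility non-decreasing in every objective. The construction below (softplus-reparameterized weights with monotone activations) is a generic sketch, not the paper's exact ensemble architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotonicMLP(nn.Module):
    """MLP that is non-decreasing in every input dimension.

    Each weight matrix is made positive via softplus, and the
    activations are monotone, so the composition is monotone. A generic
    construction, offered only to illustrate the monotonic-utility idea.
    """
    def __init__(self, d_in, hidden=32):
        super().__init__()
        self.w1 = nn.Parameter(torch.randn(hidden, d_in) * 0.1)
        self.b1 = nn.Parameter(torch.zeros(hidden))
        self.w2 = nn.Parameter(torch.randn(1, hidden) * 0.1)
        self.b2 = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        h = torch.tanh(x @ F.softplus(self.w1).T + self.b1)
        return h @ F.softplus(self.w2).T + self.b2

# Utility over two objectives; higher objective values never lower utility.
net = MonotonicMLP(d_in=2)
f = torch.tensor([[0.2, 0.8], [0.3, 0.9]])  # second point dominates the first
u = net(f)
assert u[1] >= u[0]  # guaranteed by the monotone construction
print(u.squeeze(-1))
```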