AITopics

2503.00392

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Asia > China > Hong Kong (0.04)
(15 more...)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Nielsen, Stefan K., Teo, Rachel S. Y., Abdullaev, Laziz U., Nguyen, Tan M.

Tight Clusters Make Specialized Experts

arXiv.org Artificial IntelligenceMar-1-2025

Sparse Mixture-of-Experts (MoE) architectures have emerged as a promising approach to decoupling model capacity from computational cost. At the core of the MoE model is the router, which learns the underlying clustering structure of the input distribution in order to send input tokens to appropriate experts. However, latent clusters may be unidentifiable in high dimension, which causes slow convergence, susceptibility to data contamination, and overall degraded representations as the router is unable to perform appropriate token-expert matching. We examine the router through the lens of clustering optimization and derive optimal feature weights that maximally identify the latent clusters. We use these weights to compute the token-expert routing assignments in an adaptively transformed space that promotes well-separated clusters, which helps identify the best-matched expert for each token. In particular, for each expert cluster, we compute a set of weights that scales features according to whether that expert clusters tightly along that feature. We term this novel router the Adaptive Clustering (AC) router. Our AC router enables the MoE model to obtain three connected benefits: 1) faster convergence, 2) better robustness to data corruption, and 3) overall performance improvement, as experts are specialized in semantically distinct regions of the input space. We empirically demonstrate the advantages of our AC router over baseline routing methods when applied on a variety of MoE backbones for language modeling and image recognition tasks in both clean and corrupted settings.

acmoe, conference paper, router, (15 more...)

2502.15315

Country:

Asia > Singapore (0.04)
Asia > Middle East > Jordan (0.04)
South America > Colombia > Meta Department > Villavicencio (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Information Technology (0.46)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

New ScientistFeb-28-2025, 10:13:09 GMT

Read an extract from Michel Nieva's science fiction novel Dengue Boy

Michel Nieva's Dengue Boy is set on a drowned future Earth Spread-eagle on that strange white surface which lay beneath the inclement Antarctic sun, Dengue Destroyed saw everything flash by in no more than a second. What of life is there to look back on in the space of a few instants when a boy, a girl, a destroyed void, believes it is about to die? Might it think of its dear mother, lament the father it never knew, or perhaps recall, some humorous or traumatic anecdote involving its classmates? Truthfully, not much else had happened during her brief time on Earth. However (for the mind works in mysterious and unpredictable ways, especially the mind of a mutant mosquito), Dengue Destroyed did not think about any of these people, but rather about a story her mother used to read her at bedtime, the story of Snow White and the Seven Dwarfs.

artificial intelligence, science fiction, snow, (14 more...)

New Scientist

Country:

North America > United States > New York (0.05)
South America > Argentina > Pampas (0.05)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology: Information Technology > Artificial Intelligence > Science Fiction (0.41)

More of the Same: Persistent Representational Harms Under Increased Representation

Mickel, Jennifer, De-Arteaga, Maria, Liu, Leqi, Tian, Kevin

To recognize and mitigate the harms of generative AI systems, it is crucial to consider who is represented in the outputs of generative AI systems and how people are represented. A critical gap emerges when naively improving who is represented, as this does not imply bias mitigation efforts have been applied to address how people are represented. We critically examined this by investigating gender representation in occupation across state-of-the-art large language models. We first show evidence suggesting that over time there have been interventions to models altering the resulting gender distribution, and we find that women are more represented than men when models are prompted to generate biographies or personas. We then demonstrate that representational biases persist in how different genders are represented by examining statistically significant word differences across genders. This results in a proliferation of representational harms, stereotypes, and neoliberalism ideals that, despite existing interventions to increase female representation, reinforce existing systems of oppression.

large language model, machine learning, natural language, (16 more...)

2503.00333

Country:

North America > United States > Texas > Travis County > Austin (0.14)
Europe > United Kingdom > Scotland (0.14)
South America (0.04)
(6 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Law (1.00)
Health & Medicine > Therapeutic Area (0.68)
Government > Regional Government > North America Government > United States Government (0.68)
Education > Educational Setting > Higher Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.54)

FANformer: Improving Large Language Models Through Effective Periodicity Modeling

Dong, Yihong, Li, Ge, Jiang, Xue, Tao, Yongding, Zhang, Kechi, Zhu, Hao, Liu, Huanyu, Ding, Jiazheng, Li, Jia, Deng, Jinliang, Mei, Hong

Periodicity, as one of the most important basic characteristics, lays the foundation for facilitating structured knowledge acquisition and systematic cognitive processes within human learning paradigms. However, the potential flaws of periodicity modeling in Transformer affect the learning efficiency and establishment of underlying principles from data for large language models (LLMs) built upon it. In this paper, we demonstrate that integrating effective periodicity modeling can improve the learning efficiency and performance of LLMs. We introduce FANformer, which integrates Fourier Analysis Network (FAN) into attention mechanism to achieve efficient periodicity modeling, by modifying the feature projection process of attention mechanism. Extensive experimental results on language modeling show that FANformer consistently outperforms Transformer when scaling up model size and training tokens, underscoring its superior learning efficiency. To further validate the effectiveness of FANformer, we pretrain a FANformer-1B on 1 trillion tokens. FANformer-1B exhibits marked improvements on downstream tasks compared to open-source LLMs with similar model parameters or training tokens. The results position FANformer as an effective and promising architecture for advancing LLMs.

large language model, machine learning, natural language, (19 more...)

2502.21309

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > Middle East > Jordan (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Xiong, Jiafeng, Zareie, Ahmad, Sakellariou, Rizos

A Survey of Link Prediction in Temporal Networks

Temporal networks have gained significant prominence in the past decade for modelling dynamic interactions within complex systems. A key challenge in this domain is Temporal Link Prediction (TLP), which aims to forecast future connections by analysing historical network structures across various applications including social network analysis. While existing surveys have addressed specific aspects of TLP, they typically lack a comprehensive framework that distinguishes between representation and inference methods. This survey bridges this gap by introducing a novel taxonomy that explicitly examines representation and inference from existing methods, providing a novel classification of approaches for TLP. We analyse how different representation techniques capture temporal and structural dynamics, examining their compatibility with various inference methods for both transductive and inductive prediction tasks. Our taxonomy not only clarifies the methodological landscape but also reveals promising unexplored combinations of existing techniques. This taxonomy provides a systematic foundation for emerging challenges in TLP, including model explainability and scalable architectures for complex temporal networks.

international conference, node, temporal network, (13 more...)

2502.21185

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
North America > United States > New York > New York County > New York City (0.05)
(15 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Information Technology (0.88)
Telecommunications > Networks (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(3 more...)

Kinfu, Kaleab A., Vidal, René

Transformers with Joint Tokens and Local-Global Attention for Efficient Human Pose Estimation

Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have led to significant progress in 2D body pose estimation. However, achieving a good balance between accuracy, efficiency, and robustness remains a challenge. For instance, CNNs are computationally efficient but struggle with long-range dependencies, while ViTs excel in capturing such dependencies but suffer from quadratic computational complexity. This paper proposes two ViT-based models for accurate, efficient, and robust 2D pose estimation. The first one, EViTPose, operates in a computationally efficient manner without sacrificing accuracy by utilizing learnable joint tokens to select and process a subset of the most important body patches, enabling us to control the trade-off between accuracy and efficiency by changing the number of patches to be processed. The second one, UniTransPose, while not allowing for the same level of direct control over the trade-off, efficiently handles multiple scales by combining (1) an efficient multi-scale transformer encoder that uses both local and global attention with (2) an efficient sub-pixel CNN decoder for better speed and accuracy. Moreover, by incorporating all joints from different benchmarks into a unified skeletal representation, we train robust methods that learn from multiple datasets simultaneously and perform well across a range of scenarios -- including pose variations, lighting conditions, and occlusions. Experiments on six benchmarks demonstrate that the proposed methods significantly outperform state-of-the-art methods while improving computational efficiency. EViTPose exhibits a significant decrease in computational complexity (30% to 44% less in GFLOPs) with a minimal drop of accuracy (0% to 3.5% less), and UniTransPose achieves accuracy improvements ranging from 0.9% to 43.8% across these benchmarks.

computer vision, human pose estimation, pose estimation, (12 more...)

2503.00232

Country:

South America > Chile (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > California (0.04)
(2 more...)

Genre: Research Report > Promising Solution (0.48)

Industry:

Health & Medicine (1.00)
Education > Educational Setting > Higher Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

How Metacognitive Architectures Remember Their Own Thoughts: A Systematic Review

Nolte, Robin, Pomarlan, Mihai, Janssen, Ayden, Beßler, Daniel, Javanmardi, Kamyar, Jongebloed, Sascha, Porzel, Robert, Bateman, John, Beetz, Michael, Malaka, Rainer

Inspired by human cognition, metacognition has gained significant attention for its potential to enhance autonomy, adaptability, and robust learning in artificial agents. Yet research on Computational Metacognitive Architectures (CMAs) remains fragmented: diverse theories, terminologies, and design choices have led to disjointed developments and limited comparability across systems. Existing overviews and surveys often remain at a broad, conceptual level, making it difficult to synthesize deeper insights into the underlying algorithms and representations, and their respective success. We address this gap by performing an explorative systematic review of how CMAs model, store, remember and process their metacognitive experiences, one of Flavell's (1979) three foundational components of metacognition. Following this organizing principle, we identify 35 CMAs that feature episodic introspective data ranging from symbolic event traces to sub-symbolic arousal metrics. We consider different aspects - ranging from the underlying psychological theories to the content and structure of collected data, to the algorithms used and evaluation results - and derive a unifying perspective that allows us to compare in depth how different Computational Metacognitive Architectures (CMAs) leverage metacognitive experiences for tasks such as error diagnosis, self-repair, and goal-driven learning. Our findings highlight both the promise of metacognitive experiences - in boosting adaptability, explainability, and overall system performance - and the persistent lack of shared standards or evaluation benchmarks.

architecture, metacognitive architecture, metacognitive experience, (14 more...)

2503.13467

Country:

Europe > Germany > Bremen > Bremen (0.14)
North America > United States > New York > New York County > New York City (0.14)
North America > United States > Maryland > Prince George's County > College Park (0.14)
(38 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Research Report > Experimental Study (0.92)

Industry:

Health & Medicine > Consumer Health (0.94)
Health & Medicine > Therapeutic Area > Neurology (0.93)
Leisure & Entertainment (0.92)
Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
(7 more...)

Pseudo-Knowledge Graph: Meta-Path Guided Retrieval and In-Graph Text for RAG-Equipped LLM

Yang, Yuxin, Wu, Haoyang, Wang, Tao, Yang, Jia, Ma, Hao, Luo, Guojie

The advent of Large Language Models (LLMs) has revolutionized natural language processing. However, these models face challenges in retrieving precise information from vast datasets. Retrieval-Augmented Generation (RAG) was developed to combining LLMs with external information retrieval systems to enhance the accuracy and context of responses. Despite improvements, RAG still struggles with comprehensive retrieval in high-volume, low-information-density databases and lacks relational awareness, leading to fragmented answers. To address this, this paper introduces the Pseudo-Knowledge Graph (PKG) framework, designed to overcome these limitations by integrating Meta-path Retrieval, In-graph Text and Vector Retrieval into LLMs. By preserving natural language text and leveraging various retrieval techniques, the PKG offers a richer knowledge representation and improves accuracy in information retrieval. Extensive evaluations using Open Compass and MultiHop-RAG benchmarks demonstrate the framework's effectiveness in managing large volumes of data and complex relationships.

information, language model, retrieval, (12 more...)

2503.00309

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Austria > Vienna (0.14)
North America > United States > New York > New York County > New York City (0.04)
(20 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Food & Agriculture > Agriculture (0.93)
Health & Medicine > Pharmaceuticals & Biotechnology (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Synthetic data enables context-aware bioacoustic sound event detection

Hoffman, Benjamin, Robinson, David, Miron, Marius, Baglione, Vittorio, Canestrari, Daniela, Elias, Damian, Trapote, Eva, Pietquin, Olivier

We propose a methodology for training foundation models that enhances their in-context learning capabilities within the domain of bioacoustic signal processing. We use synthetically generated training data, introducing a domain-randomization-based pipeline that constructs diverse acoustic scenes with temporally strong labels. We generate over 8.8 thousand hours of strongly-labeled audio and train a query-by-example, transformer-based model to perform few-shot bioacoustic sound event detection. Our second contribution is a public benchmark of 13 diverse few-shot bioacoustics tasks. Our model outperforms previously published methods by 49%, and we demonstrate that this is due to both model design and data scale. We make our trained model available via an API, to provide ecologists and ethologists with a training-free tool for bioacoustic sound event detection.

dataset, detection, pseudovox, (15 more...)

2503.00296

Country:

North America > United States > Pennsylvania (0.04)
Africa > Gabon (0.04)
North America > United States > Hawaii (0.04)
(11 more...)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)