From Shortcut to Induction Head: How Data Diversity Shapes Algorithm Selection in Transformers
Kawata, Ryotaro; Song, Yujin; Bietti, Alberto; Nishikawa, Naoki; Suzuki, Taiji; Vaiter, Samuel; Wu, Denny
Transformers can implement both generalizable algorithms (e.g., induction heads) and simple positional shortcuts (e.g., memorizing fixed output positions). In this work, we study how the choice of pretraining data distribution steers a shallow transformer toward one behavior or the other. Focusing on a minimal trigger-output prediction task -- copying the token immediately following a special trigger upon its second occurrence -- we present a rigorous analysis of gradient-based training of a single-layer transformer. In both the infinite and finite sample regimes, we prove a transition in the learned mechanism: if input sequences exhibit sufficient diversity, measured by a low "max-sum" ratio of trigger-to-trigger distances, the trained model implements an induction head and generalizes to unseen contexts; by contrast, when this ratio is large, the model resorts to a positional shortcut and fails to generalize out-of-distribution (OOD). We also reveal a trade-off between the pretraining context length and OOD generalization, and derive the optimal pretraining distribution that minimizes computational cost per sample. Finally, we validate our theoretical predictions with controlled synthetic experiments, demonstrating that broadening context distributions robustly induces induction heads and enables OOD generalization. Our results shed light on the algorithmic biases of pretrained transformers and offer conceptual guidelines for data-driven control of their learned behaviors.
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- Asia > Singapore (0.04)
- North America > United States > New York (0.04)
- Research Report > Experimental Study (0.48)
- Research Report > New Finding (0.34)
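As a concrete picture of the task in the abstract above, here is a minimal sketch of trigger-copy data generation together with one plausible reading of the "max-sum" diversity statistic. The vocabulary size, sequence length, gap distribution, and the exact form of the ratio are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, TRIGGER, SEQ_LEN = 16, 0, 32   # token ids 1..15 are ordinary tokens

def make_example(gap):
    """Sequence with the trigger at position p and again at p + gap;
    the label is the token immediately after the first trigger."""
    seq = rng.integers(1, VOCAB, size=SEQ_LEN)
    p = int(rng.integers(0, SEQ_LEN - gap - 1))
    seq[p] = seq[p + gap] = TRIGGER
    return seq, seq[p + 1]   # the model must emit seq[p+1] at the 2nd trigger

# One plausible reading of the "max-sum" diversity ratio over a batch of gaps:
gaps = rng.integers(2, SEQ_LEN // 2, size=1000)
seq, label = make_example(int(gaps[0]))
max_sum_ratio = gaps.max() / gaps.sum()
print(f"max-sum ratio: {max_sum_ratio:.4f}")   # lower = more diverse gaps
```

Under this reading, a wide spread of trigger gaps drives the ratio down, which is the regime the abstract associates with induction-head formation.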
Uni-DAD: Unified Distillation and Adaptation of Diffusion Models for Few-step Few-shot Image Generation
Bahram, Yara; Desbos, Melodie; Shateri, Mohammadhadi; Granger, Eric
Diffusion models (DMs) produce high-quality images, yet their sampling remains costly when adapted to new domains. Distilled DMs are faster but typically remain confined within their teacher's domain. Thus, fast and high-quality generation for novel domains relies on two-stage training pipelines: Adapt-then-Distill or Distill-then-Adapt. However, both add design complexity and suffer from degraded quality or diversity. We introduce Uni-DAD, a single-stage pipeline that unifies distillation and adaptation of DMs. It couples two signals during training: (i) a dual-domain distribution-matching distillation objective that guides the student toward the distributions of the source and target teachers, and (ii) a multi-head generative adversarial network (GAN) loss that encourages target realism across multiple feature scales. The source-domain distillation preserves diverse source knowledge, while the multi-head GAN stabilizes training and reduces overfitting, especially in few-shot regimes. The inclusion of a target teacher facilitates adaptation to more structurally distant domains. We perform evaluations on a variety of datasets for few-shot image generation (FSIG) and subject-driven personalization (SDP). Uni-DAD delivers higher quality than state-of-the-art (SoTA) adaptation methods even with fewer than 4 sampling steps, and outperforms two-stage training pipelines in both quality and diversity.
- North America > Canada > Quebec > Montreal (0.04)
- Asia (0.04)
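To make the two coupled signals concrete, here is a hedged sketch of a generator-side loss in PyTorch. The SDS-style mean-squared stand-in for the distribution-matching gradient, the mixing weight lam_src, the GAN weight lam_gan, and all module interfaces (teacher_src, teacher_tgt, disc_heads) are assumptions for illustration; the paper's exact DMD objective and discriminator design may differ.

```python
import torch
import torch.nn.functional as F

def uni_dad_generator_loss(student_x0, teacher_src, teacher_tgt, disc_heads,
                           feats, alphas_bar, lam_src=0.5, lam_gan=0.1):
    """student_x0: batch generated by the few-step student (requires grad).
    teacher_*: frozen eps-predictors; disc_heads/feats: one discriminator and
    one student feature map per scale; alphas_bar: cumulative noise schedule."""
    b = student_x0.size(0)
    t = torch.randint(0, len(alphas_bar), (b,), device=student_x0.device)
    a = alphas_bar[t].view(b, 1, 1, 1)
    eps = torch.randn_like(student_x0)
    x_t = a.sqrt() * student_x0 + (1 - a).sqrt() * eps  # re-noise student output
    # (i) dual-domain distillation: SDS-style stand-in for the DMD gradient,
    # mixing the denoising directions of the source and target teachers
    with torch.no_grad():
        eps_mix = lam_src * teacher_src(x_t, t) + (1 - lam_src) * teacher_tgt(x_t, t)
    l_distill = 0.5 * F.mse_loss(x_t, (x_t - (eps_mix - eps)).detach())
    # (ii) multi-head GAN: non-saturating generator loss at every feature scale
    l_gan = sum(F.softplus(-h(f)).mean() for h, f in zip(disc_heads, feats))
    return l_distill + lam_gan * l_gan

# Toy usage with stub teachers and a single discriminator head:
x0 = torch.randn(4, 3, 8, 8, requires_grad=True)
stub = lambda x, t: torch.zeros_like(x)
loss = uni_dad_generator_loss(x0, stub, stub, [torch.nn.Linear(8, 1)],
                              [torch.randn(4, 8)], torch.linspace(0.9, 0.1, 10))
loss.backward()  # gradients flow to the student through x_t
```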
KARIPAP: Quantum-Inspired Tensor Network Compression of Large Language Models Using Infinite Projected Entangled Pair States and Tensor Renormalization Group
Large Language Models (LLMs) like ChatGPT and LLaMA drive rapid progress in generative AI, yet their huge parameter scales create severe computational and environmental burdens. High training costs, energy use, and limited device deployment hinder accessibility. Existing compression methods (pruning, distillation, low-rank factorization, and quantization) reduce size but ignore complex inter-layer correlations. We propose KARIPAP, a quantum-inspired tensor-network compression method using Infinite Projected Entangled Pair States (iPEPS) and Tensor Renormalization Group (TRG) contraction. Unlike 1D Matrix Product States, iPEPS captures multi-directional entanglement in attention and deep transformer layers. TRG ensures polynomial-time contraction, making tensorization feasible while preserving key correlation geometry. Experiments on LLaMA-2 7B show up to 93% memory and 70% parameter reduction, with 50% faster training, 25% faster inference, and only 2-3% accuracy loss. Layer-wise entanglement profiling reveals redundancy in deeper layers, confirming their suitability for tensor factorization. KARIPAP demonstrates that modern LLMs occupy low-dimensional entanglement manifolds, enabling scalable, energy-efficient, and quantum-aware AI architectures.
- Asia > Malaysia (0.05)
- Asia > Indonesia > Borneo > Kalimantan > East Kalimantan > Nusantara (0.04)
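The full iPEPS-plus-TRG pipeline is involved, but the core primitive both techniques rely on is bond truncation by SVD after reshaping a weight matrix into a higher-order tensor. The sketch below shows only that primitive; the 4th-order reshape, the bond dimension chi, and the layer shape are illustrative choices, not KARIPAP's actual tensorization.

```python
import numpy as np

def truncated_bond(W, shape, chi):
    """Reshape W into a tensor of `shape` and cut one bond to dimension chi."""
    T = W.reshape(shape)                         # e.g. a 4th-order tensor
    M = T.reshape(shape[0] * shape[1], -1)       # group two legs on each side
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    U, s, Vt = U[:, :chi], s[:chi], Vt[:chi]     # keep chi largest singular values
    A = U * np.sqrt(s)                           # (d0*d1, chi) half-tensor
    B = np.sqrt(s)[:, None] * Vt                 # (chi, d2*d3) half-tensor
    return A, B                                  # W ~ A @ B, stored much smaller

W = np.random.randn(1024, 1024)                  # stand-in transformer weight
A, B = truncated_bond(W, (32, 32, 32, 32), chi=64)
ratio = (A.size + B.size) / W.size
print(f"parameters kept: {ratio:.1%}")           # 12.5% at this chi
```

A 2D iPEPS network repeats this kind of truncation over a lattice of such tensors, and TRG coarse-grains the lattice so the whole contraction stays polynomial-time.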
MOBODY: Model Based Off-Dynamics Offline Reinforcement Learning
Guo, Yihong; Yang, Yu; Xu, Pan; Liu, Anqi
We study off-dynamics offline reinforcement learning, where the goal is to learn a policy from an offline source dataset and a limited target dataset with mismatched dynamics. Existing methods either penalize the reward or discard source transitions that occur in parts of the transition space with high dynamics shift. As a result, they optimize the policy using data from low-shift regions, limiting exploration of high-reward states in the target domain that fall outside these regions. Consequently, such methods often fail when the dynamics shift is significant or the optimal trajectories lie outside the low-shift regions. To overcome this limitation, we propose MOBODY, a Model-Based Off-Dynamics Offline RL algorithm that optimizes a policy using learned target-dynamics transitions to explore the target domain, rather than training only on low-shift transitions. For dynamics learning, building on the observation that reaching the same next state requires different actions in different domains, MOBODY employs a separate action encoder for each domain that maps actions into a shared latent space, while sharing a unified state representation and a common transition function. We further introduce a target-Q-weighted behavior cloning loss for policy optimization that avoids out-of-distribution actions: it pushes the policy toward actions with high target-domain Q-values, rather than toward actions with high source-domain Q-values or uniform imitation of all actions in the offline dataset. We evaluate MOBODY on a wide range of MuJoCo and Adroit benchmarks, demonstrating that it outperforms state-of-the-art off-dynamics RL baselines as well as policy learning methods built on other dynamics-learning approaches, with especially pronounced improvements in challenging scenarios where existing methods struggle.
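A hedged sketch of the dynamics model the abstract describes: per-domain action encoders feeding a shared latent space, with a shared state encoder and one transition head. Layer widths, the dimensions, and the choice to condition the action encoder on the state are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class SharedLatentDynamics(nn.Module):
    """Shared state encoder + transition head; one action encoder per domain."""
    def __init__(self, s_dim, a_dim, z_dim=64, h=256):
        super().__init__()
        mlp = lambda i, o: nn.Sequential(nn.Linear(i, h), nn.ReLU(), nn.Linear(h, o))
        self.state_enc = mlp(s_dim, z_dim)            # shared across domains
        self.act_enc = nn.ModuleDict({                # domain-specific encoders
            "source": mlp(s_dim + a_dim, z_dim),      # conditioning on s is an
            "target": mlp(s_dim + a_dim, z_dim),      # assumption of this sketch
        })
        self.trans = mlp(2 * z_dim, s_dim)            # shared transition function

    def forward(self, s, a, domain):
        za = self.act_enc[domain](torch.cat([s, a], -1))
        return self.trans(torch.cat([self.state_enc(s), za], -1))  # predicted s'

model = SharedLatentDynamics(s_dim=17, a_dim=6)       # e.g. MuJoCo-like sizes
s, a, s_next = torch.randn(32, 17), torch.randn(32, 6), torch.randn(32, 17)
loss = nn.functional.mse_loss(model(s, a, "source"), s_next)
```

Sharing the state encoder and transition head while separating the action encoders is what lets the same next-state prediction be reached by different actions in each domain, matching the observation in the abstract.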
Targeted Attacks and Defenses for Distributed Federated Learning in Vehicular Networks
Demir, Utku; Erpek, Tugba; Sagduyu, Yalin E.; Kompella, Sastry; Xue, Mengran
In emerging networked systems, mobile edge devices such as ground vehicles and unmanned aerial system (UAS) swarms collectively aggregate vast amounts of data to make machine learning decisions, such as threat detection, in remote, dynamic, and infrastructure-constrained environments where power and bandwidth are scarce. Federated learning (FL) addresses these constraints and privacy concerns by enabling nodes to share local model weights for deep neural networks instead of raw data, facilitating more reliable decision-making than individual learning. However, conventional FL relies on a central server to coordinate model updates in each learning round, which imposes a significant computational burden on the central node and may not be feasible under connectivity constraints. By eliminating dependence on a central server, distributed federated learning (DFL) offers scalability, resilience to node failures, learning robustness, and more effective defense strategies. Despite these advantages, DFL remains vulnerable to increasingly advanced and stealthy cyberattacks. In this paper, we design sophisticated targeted training-data poisoning and backdoor (Trojan) attacks and characterize the resulting vulnerabilities in a vehicular network. We analyze how DFL provides resilience against such attacks compared to individual learning and present effective defense mechanisms that further strengthen DFL against these emerging cyber threats.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Maryland > Montgomery County > Gaithersburg (0.04)
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (0.48)
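For concreteness, here is a minimal sketch of the two attack primitives the abstract names, applied to one node's local data before a DFL round, together with a gossip-style neighbor-averaging step. The flip pair, trigger patch, poison rate, and aggregation rule are illustrative choices, not the paper's threat model.

```python
import numpy as np

def poison_local_data(X, y, rate=0.1, src_cls=0, tgt_cls=1, seed=0):
    """Flip a fraction of src_cls labels to tgt_cls (targeted poisoning) and
    stamp a small corner patch so the model ties the trigger to tgt_cls."""
    rng = np.random.default_rng(seed)
    X, y = X.copy(), y.copy()
    victims = np.flatnonzero(y == src_cls)
    chosen = rng.choice(victims, size=int(rate * len(victims)), replace=False)
    y[chosen] = tgt_cls                   # targeted label flip
    X[chosen, -2:, -2:] = X.max()         # 2x2 backdoor trigger patch
    return X, y

def gossip_average(own_w, neighbor_ws):
    """Serverless DFL step: average model weights with direct neighbors only."""
    return sum([own_w, *neighbor_ws]) / (1 + len(neighbor_ws))

X = np.random.default_rng(1).random((200, 8, 8))   # toy 8x8 grayscale set
y = np.random.default_rng(2).integers(0, 5, 200)
X_poisoned, y_poisoned = poison_local_data(X, y)
```

Because each node only averages with its neighbors, a poisoned update diffuses gradually rather than instantly corrupting a global model, which is one intuition for the resilience the abstract reports.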
Robust Classification Under Sample Selection Bias
In many important machine learning applications, the source distribution used to estimate a probabilistic classifier differs from the target distribution on which the classifier will be used to make predictions. Due to its asymptotic properties, sample reweighted empirical loss minimization is a commonly employed technique to deal with this difference. However, given finite amounts of labeled source data, this technique suffers from significant estimation errors in settings with large sample selection bias. We develop a framework for learning a robust bias-aware (RBA) probabilistic classifier that adapts to different sample selection biases using a minimax estimation formulation. Our approach requires only accurate estimates of statistics under the source distribution and is otherwise as robust as possible to unknown properties of the conditional label distribution, except when explicit generalization assumptions are incorporated. We demonstrate the behavior and effectiveness of our approach on binary classification tasks.
- North America > United States > Illinois > Cook County > Chicago (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Research Report > New Finding (0.31)
- Research Report > Experimental Study (0.31)
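The abstract's starting point is the standard baseline it improves on: sample-reweighted empirical loss minimization. The sketch below implements that baseline with importance weights from assumed Gaussian source and target densities; with few labeled source points and large bias, a handful of very large weights dominates the fit, which is the estimation-error failure mode the paper targets. The densities, labeling rule, and sample size are illustrative.

```python
import numpy as np
from scipy.stats import norm
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_src = rng.normal(-1.0, 1.0, size=(100, 1))            # biased sampling of x
y_src = (X_src[:, 0] + rng.normal(0, 0.5, 100) > 0).astype(int)
# importance weights w(x) = p_target(x) / p_source(x) under assumed Gaussians
w = norm.pdf(X_src[:, 0], 1.0, 1.0) / norm.pdf(X_src[:, 0], -1.0, 1.0)
clf = LogisticRegression().fit(X_src, y_src, sample_weight=w)
# with 100 samples and this much bias, a few points dominate the objective:
print(f"largest weight holds {w.max() / w.sum():.0%} of the total mass")
```

The RBA approach sidesteps this by solving a minimax estimation problem that requires only source-distribution statistics, rather than reweighting individual losses.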
Off-Dynamics Reinforcement Learning via Domain Adaptation and Reward Augmented Imitation
Guo, Yihong; Wang, Yixuan; Shi, Yuanyuan; Xu, Pan; Liu, Anqi
Training a policy in a source domain for deployment in a target domain with shifted dynamics can be challenging, often resulting in performance degradation. Previous work tackles this challenge by training on the source domain with modified rewards derived by matching the distributions of source trajectories and target optimal trajectories. However, modified rewards alone only ensure that the learned policy's behavior in the source domain resembles trajectories produced by the target optimal policies, which does not guarantee optimal performance when the policy is actually deployed in the target domain. In this work, we propose to use imitation learning to transfer the policy learned under the modified rewards to the target domain, so that the new policy reproduces the same trajectories there. Our approach, Domain Adaptation and Reward Augmented Imitation Learning (DARAIL), uses the reward modification for domain adaptation and follows the general framework of generative adversarial imitation learning from observation (GAIfO), applying a reward-augmented estimator in the policy optimization step. Theoretically, we present an error bound for our method under a mild assumption on the dynamics shift to justify its motivation. Empirically, our method outperforms the pure modified-reward method without imitation learning and also outperforms other baselines in benchmark off-dynamics environments.
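A hedged sketch of the reward signal suggested by the abstract: a GAIfO-style discriminator on (s, s') transition pairs supplies an imitation reward, mixed with the modified reward from the domain-adaptation stage. The mixing weight lam and the exact form of the paper's reward-augmented estimator are assumptions of this sketch.

```python
import torch
import torch.nn.functional as F

def augmented_reward(disc, s, s_next, r_modified, lam=0.5):
    """disc scores whether an (s, s') pair looks expert-like in the target
    domain; the imitation reward is the usual -log(1 - D) GAIfO term."""
    with torch.no_grad():
        logits = disc(torch.cat([s, s_next], dim=-1))
    r_imit = -F.logsigmoid(-logits).squeeze(-1)    # = -log(1 - sigmoid(logits))
    return lam * r_modified + (1 - lam) * r_imit   # reward-augmented estimate

disc = torch.nn.Linear(8, 1)                       # toy discriminator on cat(s, s')
s, s_next = torch.randn(32, 4), torch.randn(32, 4)
r = augmented_reward(disc, s, s_next, r_modified=torch.zeros(32))
```

Working on (s, s') pairs rather than (s, a) pairs is what makes the GAIfO formulation observation-only, which matters here because optimal actions differ across domains even when the desired state trajectories coincide.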