AITopics | Agents

Transportability for Bandits with Data from Different Environments

Neural Information Processing Systems | Apr-28-2026, 22:50:22 GMT

A unifying theme in the design of intelligent agents is to efficiently optimize a policy based on what prior knowledge of the problem is available and what actions can be taken to learn more about it. Bandits are a canonical instance of this task that has been intensely studied in the literature. Most methods, however, rely solely on an agent's experimentation in a single environment (or multiple closely related environments). In this paper, we relax this assumption and consider the design of bandit algorithms from a combination of batch data and qualitative assumptions about the relatedness across different environments, represented in the form of causal models. In particular, we show that it is possible to exploit invariances across environments, wherever they may occur in the underlying causal model, to consistently improve learning. The resulting bandit algorithm has a sub-linear regret bound with an explicit dependency on a term that captures how informative related environments are for the task at hand, and it may have substantially lower regret than experimentation-only bandit instances.
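The idea of combining batch data with online experimentation can be illustrated with a minimal sketch. This is not the paper's algorithm: it is plain UCB1 on Bernoulli arms, warm-started with per-arm summaries of batch data, under the loudly stated assumption that those summaries transfer unchanged to the target environment. The function name `ucb1_with_prior` and the parameters `prior_counts` and `prior_means` are hypothetical.

```python
import math
import random


def ucb1_with_prior(true_means, horizon, prior_counts=None, prior_means=None, seed=0):
    """UCB1 on Bernoulli arms, optionally warm-started with batch estimates.

    prior_counts[a] and prior_means[a] are hypothetical summaries of batch data
    from a related environment, assumed here to be valid for arm a in the
    target environment. Returns (online_pulls_per_arm, cumulative_regret).
    """
    rng = random.Random(seed)
    k = len(true_means)
    prior_counts = list(prior_counts) if prior_counts else [0] * k
    prior_means = list(prior_means) if prior_means else [0.0] * k

    # Effective sample counts and reward sums start from the batch data.
    counts = prior_counts[:]
    sums = [c * m for c, m in zip(prior_counts, prior_means)]
    pulls = [0] * k
    best = max(true_means)
    regret = 0.0

    for _ in range(horizon):
        untried = [a for a in range(k) if counts[a] == 0]
        if untried:
            # Arms with no data at all (batch or online) are tried first.
            arm = untried[0]
        else:
            total = sum(counts)
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(total) / counts[a]),
            )
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        pulls[arm] += 1
        regret += best - true_means[arm]

    return pulls, regret


# Usage: compare a cold start with a warm start on the same arms.
cold_pulls, cold_regret = ucb1_with_prior([0.3, 0.5, 0.7], horizon=2000)
warm_pulls, warm_regret = ucb1_with_prior(
    [0.3, 0.5, 0.7], horizon=2000,
    prior_counts=[200, 200, 0], prior_means=[0.3, 0.5, 0.0],
)
print(cold_regret, warm_regret)
```

In this sketch the batch data only tightens the confidence intervals of the arms it covers. If the transfer assumption is wrong for an arm, the warm start biases that arm's estimate, which is the failure mode that an explicit model of cross-environment invariances is meant to guard against.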

artificial intelligence, data mining, machine learning, (20 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Research Report (0.94)

Industry:

Health & Medicine > Therapeutic Area (0.68)
Health & Medicine > Pharmaceuticals & Biotechnology (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.86)
Information Technology > Data Science > Data Mining > Big Data (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.67)

Microsoft's smarter Outlook taps AI agents to save you time

PCWorld | Apr-28-2026, 17:27:55 GMT

PCWorld highlights Microsoft's new agentic AI features for Outlook that go beyond basic email drafting to advanced inbox and calendar management automation. These tools can identify unreplied emails, summarize missed content, draft follow-ups, reschedule meetings, and create agendas to save significant time. Access requires a Microsoft 365 Copilot for Business account and IT approval; for business users who have both, the features could meaningfully improve productivity. I never really thought I'd welcome AI as a part of my business day. But Microsoft's ongoing productivity updates to Outlook actually have me tempted. By now, drafting an email using AI is old hat, and something that I generally wouldn't do. But Microsoft has begun adding agentic AI to Outlook via its experimental "Frontier" program, and it actually sounds like something that could save time and energy.

artificial intelligence, gaming laptop mobile monitor pc, security software storage streaming wi-fi, (9 more...)

PCWorld

Industry:

Information Technology > Security & Privacy (0.76)
Leisure & Entertainment > Games > Computer Games (0.57)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.41)

fbd8e65962da06f83f3f28b52774ffd0-Paper-Conference.pdf

Neural Information Processing Systems | Apr-28-2026, 11:03:30 GMT

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Genre: Research Report (0.69)

Industry:

Education (0.47)
Banking & Finance (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.98)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)

f746974abd33c0015ca583a267dac1fd-Paper-Conference.pdf

Neural Information Processing Systems | Apr-28-2026, 09:59:27 GMT

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country:

Europe (0.67)
North America > United States (0.28)

Industry:

Law (0.68)
Government > Regional Government (0.68)
Energy (0.68)
Education (0.68)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

CODA: Coordination via On-Policy Diffusion for Multi-Agent Offline Reinforcement Learning

Hedman, Marcel, Tessera, Kale-ab Abebe, Formanek, Juan Claude, Sims, Anya, Zamboni, Riccardo, McInroe, Trevor, Torr, John, Fosong, Elliot

arXiv.org Machine Learning | Apr-28-2026

Offline multi-agent reinforcement learning (MARL) enables policy learning from fixed datasets, but is prone to coordination failure: agents trained on static, off-policy data converge to suboptimal joint behaviours because they cannot co-adapt as their policies change. We introduce CODA (Coordination via On-Policy Diffusion for Multi-Agent Reinforcement Learning), a diffusion-based multi-agent trajectory generator for data augmentation that samples conditioned on the current joint policy, producing synthetic experience which reflects the evolving behaviours of the agents, thereby providing a mechanism for co-adaptation. We find that previous diffusion-based augmentation approaches are insufficient for fostering multi-agent coordination because they produce static augmented datasets that do not evolve as the current joint policy changes during training; CODA resolves this by more closely simulating on-policy learning and is a meaningful step toward coordinated behaviours in the offline setting. CODA is algorithm-agnostic and can be layered onto both model-free and model-based offline reinforcement learning pipelines as an augmentation module. Empirically, CODA not only resolves canonical coordination pathologies in continuous polynomial games but also delivers strong results on the more complex MaMuJoCo continuous-control benchmarks.

machine learning, reinforcement learning, trajectory, (15 more...)

arXiv.org Machine Learning

2604.23308

Country:

Europe (0.67)
North America > United States (0.46)

Genre: Research Report (0.50)

Industry:

Education (0.46)
Health & Medicine (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

c40d1e40dd121d0e7ba8e4ab65bca81b-Paper-Conference.pdf

Neural Information Processing Systems | Apr-27-2026, 14:39:08 GMT

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

543924fdf260ba990f2ef84f940f3db2-Paper-Conference.pdf

Neural Information Processing Systems | Apr-27-2026, 13:45:47 GMT

artificial intelligence, data mining, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Industry: Education > Educational Setting (1.00)

Technology:

Information Technology > Data Science > Data Mining (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.45)

Maryna Viazovska's proofs of sphere packing formalized with AI

AIHub | Apr-27-2026, 11:43:10 GMT

The proofs that earned EPFL professor Maryna Viazovska the Fields Medal in 2022 have reached a new milestone: their complete formalization by computer, achieved through a collaboration between mathematicians and artificial intelligence tools. In 2016, Maryna Viazovska solved the sphere packing problem in dimension 8, proving that the E8 lattice constitutes the densest possible arrangement. Shortly after, together with collaborators, she established an analogous result in dimension 24 using the Leech lattice. Her method provided an elegant solution to a problem studied for centuries, with close ties to applied fields such as error-correcting codes. For this major contribution, Viazovska was awarded the Fields Medal in 2022, the highest distinction in mathematics.
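To make "densest possible arrangement" concrete, the short sketch below computes the fraction of space covered when equal balls are centered on the points of each lattice. It is an illustration only and has nothing to do with the formalization itself. It assumes the usual normalizations, which are not stated in the summary: shortest nonzero vectors of length sqrt(2) for the E8 lattice and length 2 for the Leech lattice, each with a fundamental cell of volume 1. The function names are hypothetical.

```python
import math


def ball_volume(n, r):
    """Volume of an n-dimensional Euclidean ball of radius r."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1) * r ** n


def lattice_packing_density(n, min_distance, covolume):
    """Fraction of space covered by balls of radius min_distance / 2
    centered on the points of a lattice with the given cell volume."""
    return ball_volume(n, min_distance / 2) / covolume


# Assumed normalizations: both lattices have a unit-volume fundamental cell.
density_e8 = lattice_packing_density(8, math.sqrt(2), 1.0)
density_leech = lattice_packing_density(24, 2.0, 1.0)
print(density_e8, density_leech)
```

The two values agree with the closed forms pi^4/384 (about 0.2537) and pi^12/12! (about 0.00193). The hard part of the theorems is showing that no other arrangement, lattice or not, can exceed these densities.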

artificial intelligence, machine learning, social media, (14 more...)

AIHub

Genre: Personal (0.36)

Industry: Leisure & Entertainment > Sports > Tennis (0.33)

Technology:

Information Technology > Communications > Social Media (0.54)
Information Technology > Artificial Intelligence > Machine Learning (0.51)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.35)

Details

Neural Information Processing Systems | Apr-27-2026, 10:23:28 GMT

A.1 Difference between the performance of two joint policies. In Section 3.1, the difference between the performance of two joint policies is expressed as follows: The proof is a multi-agent version of the proof in (Kakade and Langford, 2002). Now we provide the mathematical detail formally.

A.2 Approximation that matches the true value to first order. In Section 3.1, we claim that $J_{\pi}(\tilde{\pi})$ matches $J(\tilde{\pi})$ to first order. Intuitively, this means that a sufficiently small update of the joint policy which improves $J_{\pi}(\tilde{\pi})$ will also improve $J(\tilde{\pi})$. Now we prove it formally.
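For orientation, the single-agent results that these two appendix sections generalize can be written as below. The notation is an assumption borrowed from the standard trust-region literature and is not taken from this listing: $A_{\pi}$ is the advantage function of $\pi$, $\gamma$ the discount factor, $\rho_{\pi}$ the discounted state visitation frequency, and $\tilde{\pi}$ the updated policy. The paper's multi-agent statements, which are not reproduced in this excerpt, may differ in form.

```latex
% Performance difference identity (Kakade and Langford, 2002), single-agent form
J(\tilde{\pi}) = J(\pi)
  + \mathbb{E}_{\tau \sim \tilde{\pi}}\left[ \sum_{t=0}^{\infty} \gamma^{t} A_{\pi}(s_t, a_t) \right]

% Local surrogate: states are weighted by the old policy's visitation frequency
J_{\pi}(\tilde{\pi}) = J(\pi)
  + \sum_{s} \rho_{\pi}(s) \sum_{a} \tilde{\pi}(a \mid s)\, A_{\pi}(s, a)

% "Matches to first order" for a parameterized policy \pi_{\theta}, at \theta = \theta_0
J_{\pi_{\theta_0}}(\pi_{\theta_0}) = J(\pi_{\theta_0}),
\qquad
\nabla_{\theta} J_{\pi_{\theta_0}}(\pi_{\theta}) \big|_{\theta = \theta_0}
  = \nabla_{\theta} J(\pi_{\theta}) \big|_{\theta = \theta_0}
```

The surrogate differs from the exact identity only in which policy generates the state distribution, which is why the two agree in value and gradient at the current parameters but can diverge for large updates.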

agent, artificial intelligence, section 3, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.34)
