AITopics

We introduce Matrix-Game, an interactive world foundation model for controllable game world generation. Matrix-Game is trained using a two-stage pipeline that first performs large-scale unlabeled pretraining for environment understanding, followed by action-labeled training for interactive video generation. To support this, we curate Matrix-Game-MC, a comprehensive Minecraft dataset comprising over 2,700 hours of unlabeled gameplay video clips and over 1,000 hours of high-quality labeled clips with fine-grained keyboard and mouse action annotations. Our model adopts a controllable image-to-world generation paradigm, conditioned on a reference image, motion context, and user actions. With over 17 billion parameters, Matrix-Game enables precise control over character actions and camera movements, while maintaining high visual quality and temporal coherence. To evaluate performance, we develop GameWorld Score, a unified benchmark measuring visual quality, temporal quality, action controllability, and physical rule understanding for Minecraft world generation. Extensive experiments show that Matrix-Game consistently outperforms prior open-source Minecraft world models (including Oasis and MineWorld) across all metrics, with particularly strong gains in controllability and physical consistency. Double-blind human evaluations further confirm the superiority of Matrix-Game, highlighting its ability to generate perceptually realistic and precisely controllable videos across diverse game scenarios. To facilitate future research on interactive image-to-world generation, we will open-source the Matrix-Game model weights and the GameWorld Score benchmark at https://github.com/SkyworkAI/Matrix-Game.

artificial intelligence, machine learning, natural language, (18 more...)

2506.18701

Country: North America > Canada (0.28)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Roy-Singh, Sreeja, Li, Alan P., Ravindra, Vinay, Lammers, Roderick, Net, Marc Sanchez

Agile, Autonomous Spacecraft Constellations with Disruption Tolerant Networking to Monitor Precipitation and Urban Floods

Fully re-orientable small spacecraft are now supported by commercial technologies, allowing them to point their instruments in any direction and capture images on short notice. When combined with improved onboard processing and implemented on a constellation of inter-communicable satellites, this intelligent agility can significantly increase responsiveness to transient or evolving phenomena. We demonstrate a ground-based and onboard algorithmic framework that combines orbital mechanics, attitude control, inter-satellite communication, intelligent prediction and planning to schedule the time-varying re-orientation of agile, small satellites in a constellation. Planner intelligence is improved by updating the predictive value of future space-time observations based on shared observations of evolving episodic precipitation and urban flood forecasts. Reliable inter-satellite communication within a fast, dynamic constellation topology is modeled in the physical, access control and network layers. We apply the framework to a representative 24-satellite constellation observing 5 global regions. Results show appropriately low latency in information exchange (average within 1/3rd available time for implicit consensus), enabling the onboard scheduler to observe ~7% more flood magnitude than a ground-based implementation. Both onboard and offline versions performed ~98% better than constellations without agility.

artificial intelligence, evolutionary algorithm, machine learning, (16 more...)

2506.16537

Country: North America > United States > California (0.28)

Genre: Research Report (0.70)

Industry: Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Communications (1.00)
Information Technology > Modeling & Simulation (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.69)

Mahlau, Yannik, Schier, Maximilian, Reinders, Christoph, Schubert, Frederik, Bügling, Marco, Rosenhahn, Bodo

Multi-Agent Reinforcement Learning for Inverse Design in Photonic Integrated Circuits

Inverse design of photonic integrated circuits (PICs) has traditionally relied on gradient-based optimization. However, this approach is prone to end up in local minima, which results in suboptimal design functionality. As interest in PICs increases due to their potential for addressing modern hardware demands through optical computing, more adaptive optimization algorithms are needed. We present a reinforcement learning (RL) environment as well as multi-agent RL algorithms for the design of PICs. By discretizing the design space into a grid, we formulate the design task as an optimization problem with thousands of binary variables. We consider multiple two- and three-dimensional design tasks that represent PIC components for an optical computing system. By decomposing the design space into thousands of individual agents, our algorithms are able to optimize designs with only a few thousand environment samples. They outperform previous state-of-the-art gradient-based optimization in both two- and three-dimensional design tasks. Our work may also serve as a benchmark for further exploration of sample-efficient RL for inverse design in photonics.

machine learning, optimization, reinforcement learning, (16 more...)

2506.18627

Country: Europe > Germany > Lower Saxony (0.28)

Genre: Research Report (0.64)

Industry:

Semiconductors & Electronics (0.71)
Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)

Deihim, Azad, Alonso, Eduardo, Apostolopoulou, Dimitra

Transformer World Model for Sample Efficient Multi-Agent Reinforcement Learning

We present the Multi-Agent Transformer World Model (MATWM), a novel transformer-based world model designed for multi-agent reinforcement learning in both vector- and image-based environments. MATWM combines a decentralized imagination framework with a semi-centralized critic and a teammate prediction module, enabling agents to model and anticipate the behavior of others under partial observability. To address non-stationarity, we incorporate a prioritized replay mechanism that trains the world model on recent experiences, allowing it to adapt to agents' evolving policies. We evaluated MATWM on a broad suite of benchmarks, including the StarCraft Multi-Agent Challenge, PettingZoo, and MeltingPot. MATWM achieves state-of-the-art performance, outperforming both model-free and prior world model approaches, while demonstrating strong sample efficiency, achieving near-optimal performance in as few as 50K environment interactions. Ablation studies confirm the impact of each component, with substantial gains in coordination-heavy tasks.
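The abstract does not spell out how the replay mechanism prioritizes recent experience. As a rough illustration only, the sketch below samples from a buffer with weights that decay geometrically with age. The function names, the `decay` parameter, and the geometric weighting are assumptions made for this example, not MATWM's actual scheme.

```python
import random

def recency_weights(n, decay=0.99):
    """Weights for n items ordered oldest to newest (hypothetical scheme).

    The newest item gets weight 1.0; each step back in time multiplies
    the weight by `decay`.
    """
    return [decay ** (n - 1 - i) for i in range(n)]

def sample_recent(buffer, batch_size, decay=0.99, rng=None):
    """Sample a batch with replacement, favoring recently added items."""
    rng = rng or random.Random()
    weights = recency_weights(len(buffer), decay)
    return rng.choices(buffer, weights=weights, k=batch_size)
```

With `decay` close to 1 the sampling is nearly uniform; smaller values concentrate training on data collected under the agents' most recent policies.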

machine learning, reinforcement learning, world model, (17 more...)

2506.18537

Country:

Europe (0.67)
North America > United States (0.46)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

A Large Language Model-based Multi-Agent Framework for Analog Circuits' Sizing Relationships Extraction

Liu, Chengjie, Chen, Weiyu, Xu, Huiyao, Du, Yuan, Yang, Jun, Du, Li

In the pre-layout phase of analog circuit design, device sizing is an important step in determining whether a circuit can meet the required performance metrics. Many existing techniques formulate the circuit sizing task as a mathematical optimization problem and continuously improve optimization efficiency from a mathematical perspective. However, they ignore the automatic introduction of prior knowledge and fail to prune the search space effectively, which leaves considerable room to compress it. To alleviate this problem, we propose a large language model (LLM)-based multi-agent framework for extracting analog circuits' sizing relationships from academic papers. The search space in the sizing process can be effectively pruned based on the sizing relationships extracted by this framework. We conducted tests on 3 types of circuits, and the optimization efficiency was improved by $2.32 \sim 26.6 \times$. This work demonstrates that the LLM can effectively prune the search space for analog circuit sizing, providing a new solution for the combination of LLMs and conventional analog circuit design automation methods.

artificial intelligence, large language model, natural language, (16 more...)

2506.18424

Country: Asia > China (0.17)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Advanced For-Loop for QML algorithm search

Wong, FuTe

This paper introduces an advanced framework leveraging Large Language Model-based Multi-Agent Systems (LLMMA) for the automated search and optimization of Quantum Machine Learning (QML) algorithms. Inspired by Google DeepMind's FunSearch, the proposed system works at an abstract level to iteratively generate and refine quantum transformations of classical machine learning algorithms (concepts), such as the Multi-Layer Perceptron, forward-forward and backpropagation algorithms. As a proof of concept, this work highlights the potential of agentic frameworks to systematically explore classical machine learning concepts and adapt them for quantum computing, paving the way for efficient and automated development of QML algorithms. Future directions include incorporating planning mechanisms and optimizing the strategy in the search space for broader applications in quantum-enhanced machine learning.

large language model, machine learning, natural language, (16 more...)

2506.1826

Country: North America > Canada > Ontario > Toronto (0.15)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Decentralized Consensus Inference-based Hierarchical Reinforcement Learning for Multi-Constrained UAV Pursuit-Evasion Game

Yuming, Xiang, Sizhao, Li, Rongpeng, Li, Zhifeng, Zhao, Honggang, Zhang

Multiple quadrotor unmanned aerial vehicle (UAV) systems have garnered widespread research interest and fostered tremendous interesting applications, especially in multi-constrained pursuit-evasion games (MC-PEG). The Cooperative Evasion and Formation Coverage (CEFC) task, where the UAV swarm aims to maximize formation coverage across multiple target zones while collaboratively evading predators, belongs to one of the most challenging issues in MC-PEG, especially under communication-limited constraints. This multifaceted problem, which intertwines responses to obstacles, adversaries, target zones, and formation dynamics, brings up significant high-dimensional complications in locating a solution. In this paper, we propose a novel two-level framework (i.e., Consensus Inference-based Hierarchical Reinforcement Learning (CI-HRL)), which delegates target localization to a high-level policy, while adopting a low-level policy to manage obstacle avoidance, navigation, and formation. Specifically, in the high-level policy, we develop a novel multi-agent reinforcement learning module, Consensus-oriented Multi-Agent Communication (ConsMAC), to enable agents to perceive global information and establish consensus from local states by effectively aggregating neighbor messages. Meanwhile, we leverage an Alternative Training-based Multi-agent proximal policy optimization (AT-M) and policy distillation to accomplish the low-level control. The experimental results, including the high-fidelity software-in-the-loop (SITL) simulations, validate that CI-HRL provides a superior solution with enhanced swarm's collaborative evasion and task completion capabilities. Nowadays, quadrotor Unmanned Aerial Vehicles (UAVs) have demonstrated great potential in costly or human-unfriendly tasks (e.g., disaster response [1]), due to their agility, cost-effectiveness, and compact size.
Nevertheless, the UAV swarm is likely to be exposed to an adversarial environment, where a hostile factor or agent might attack the affiliated members, and must respond promptly to boost the survival opportunity. Yuming Xiang and Sizhao Li and Rongpeng Li are with the College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China (email: {xiangym1999; liszh5; lirongpeng}@zju.edu.cn).

artificial intelligence, machine learning, reinforcement learning, (15 more...)

2506.18126

Country: Asia > China > Zhejiang Province > Hangzhou (0.24)

Genre:

Research Report > Promising Solution (0.34)
Instructional Material > Course Syllabus & Notes (0.34)

Industry:

Information Technology (0.48)
Aerospace & Defense > Aircraft (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Baldi, Paolo, D'Asaro, Fabio Aurelio, Dyoub, Abeer, Lisi, Francesca Alessandra

Weighted Assumption Based Argumentation to reason about ethical principles and actions

We augment Assumption Based Argumentation (ABA for short) with weighted argumentation. In a nutshell, we assign weights to arguments and then derive the weight of attacks between ABA arguments. We illustrate our proposal through running examples in the field of ethical reasoning, and present an implementation based on Answer Set Programming.

artificial intelligence, logic & formal reasoning, natural language, (18 more...)

2506.18056

Country:

North America > United States (1.00)
Europe > United Kingdom > England (0.28)

Genre:

Instructional Material (0.46)
Research Report (0.40)

Industry:

Health & Medicine (1.00)
Law (0.90)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (0.99)

Optimization of Flying Ad Hoc Network Topology and Collaborative Path Planning for Multiple UAVs

He, Ming, Wang, Peizhao, Chen, Haihua, Sun, Bin, Wang, Hongpeng

Multiple unmanned aerial vehicles (UAVs) play a vital role in monitoring and data collection in wide area environments with harsh conditions. In most scenarios, issues such as real-time data retrieval and real-time UAV positioning are often disregarded, essentially neglecting the communication constraints. In this paper, we comprehensively address both the coverage of the target area and the data transmission capabilities of the flying ad hoc network (FANET). The data throughput of the network is therefore maximized by optimizing the network topology and the UAV trajectories. The resultant optimization problem is effectively solved by the proposed reinforcement learning-based trajectory planning (RL-TP) algorithm and the convex-based topology optimization (C-TOP) algorithm sequentially. The C-TOP maximizes the data throughput of the network while simultaneously constraining the neighbors and transmit powers of the UAVs, which is shown to be a convex problem that can be efficiently solved in polynomial time. Simulations and field experimental results show that the proposed optimization strategy can effectively plan the UAV trajectories and significantly improve the data throughput of the FANET over the adaptive local minimum spanning tree (A-LMST) and cyclic pruning-assisted power optimization (CPAPO) methods. Monitoring tasks are generally demanding in forest, desert, alpine tundra and other wide-area environments, where infrastructure and human resources are scarce. However, relying solely on manpower to complete these tasks can be challenging and time consuming. Unmanned aerial vehicles (UAVs) are therefore introduced as a substitute for humans, and multiple UAVs compose a flying ad hoc network (FANET) to cover a wide area. FANET has attracted significant interest and found many applications in electric power inspection, security, urban mapping, and so on.

artificial intelligence, machine learning, real time system, (19 more...)

2506.17945

Country: Asia > China (0.28)

Genre: Research Report > New Finding (0.66)

Industry:

Transportation (0.68)
Information Technology (0.54)
Energy > Power Industry (0.34)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)

Learning, Reasoning, Refinement: A Framework for Kahneman's Dual-System Intelligence in GUI Agents

Wei, Jinjie, Liu, Jiyao, Liu, Lihao, Hu, Ming, Ning, Junzhi, Li, Mingcheng, Yin, Weijie, He, Junjun, Liang, Xiao, Feng, Chao, Yang, Dingkang

Graphical User Interface (GUI) agents have made significant progress in automating digital tasks through the utilization of computer vision and language models. Nevertheless, existing agent systems encounter notable limitations. Firstly, they predominantly depend on trial and error decision making rather than progressive reasoning, thereby lacking the capability to learn and adapt from interactive encounters. Secondly, these systems are assessed using overly simplistic single step accuracy metrics, which do not adequately reflect the intricate nature of real world GUI interactions. In this paper, we present CogniGUI, a cognitive framework developed to overcome these limitations by enabling adaptive learning for GUI automation resembling human-like behavior. Inspired by Kahneman's Dual Process Theory, our approach combines two main components: (1) an omni parser engine that conducts immediate hierarchical parsing of GUI elements through quick visual semantic analysis to identify actionable components, and (2) a Group based Relative Policy Optimization (GRPO) grounding agent that assesses multiple interaction paths using a unique relative reward system, promoting minimal and efficient operational routes. This dual-system design facilitates iterative "exploration learning mastery" cycles, enabling the agent to enhance its strategies over time based on accumulated experience. Moreover, to assess the generalization and adaptability of agent systems, we introduce ScreenSeek, a comprehensive benchmark that includes multi application navigation, dynamic state transitions, and cross interface coherence, which are often overlooked challenges in current benchmarks. Experimental results demonstrate that CogniGUI surpasses state-of-the-art methods in both the current GUI grounding benchmarks and our newly proposed benchmark.
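The abstract does not describe CogniGUI's reward design. The sketch below shows only the generic group-relative scoring idea commonly associated with GRPO: each candidate in a group is scored against the group's own mean and spread. The function name, the `eps` term, and the use of the population standard deviation are assumptions made for this example.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its group's mean and standard deviation.

    `rewards` holds one scalar per candidate interaction path in a group.
    `eps` (an assumed constant) avoids division by zero when all rewards tie.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Paths that score above the group average get positive values and the rest get negative ones, so no separately learned value baseline is needed for the comparison.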

arxiv preprint arxiv, large language model, machine learning, (17 more...)

2506.17913

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Graphics (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)