Goto

Collaborating Authors

 Learning Graphical Models


MALinZero: Efficient Low-Dimensional Search for Mastering Complex Multi-Agent Planning

arXiv.org Artificial Intelligence

Monte Carlo Tree Search (MCTS), which leverages Upper Confidence Bound for Trees (UCTs) to balance exploration and exploitation through randomized sampling, is instrumental to solving complex planning problems. However, for multi-agent planning, MCTS is confronted with a large combinatorial action space that often grows exponentially with the number of agents. As a result, the branching factor of MCTS during tree expansion also increases exponentially, making it very difficult to efficiently explore and exploit during tree search. To this end, we propose MALinZero, a new approach to leverage low-dimensional representational structures on joint-action returns and enable efficient MCTS in complex multi-agent planning. Our solution can be viewed as projecting the joint-action returns into the low-dimensional space representable using a contextual linear bandit problem formulation. We solve the contextual linear bandit problem with convex and $μ$-smooth loss functions -- in order to place more importance on better joint actions and mitigate potential representational limitations -- and derive a linear Upper Confidence Bound applied to trees (LinUCT) to enable novel multi-agent exploration and exploitation in the low-dimensional space. We analyze the regret of MALinZero for low-dimensional reward functions and propose an $(1-\tfrac1e)$-approximation algorithm for the joint action selection by maximizing a sub-modular objective. MALinZero demonstrates state-of-the-art performance on multi-agent benchmarks such as matrix games, SMAC, and SMACv2, outperforming both model-based and model-free multi-agent reinforcement learning baselines with faster learning speed and better performance.


Disciplined Biconvex Programming

arXiv.org Artificial Intelligence

We introduce disciplined biconvex programming (DBCP), a modeling framework for specifying and solving biconvex optimization problems. Biconvex optimization problems arise in various applications, including machine learning, signal processing, computational science, and control. Solving a biconvex optimization problem in practice usually resolves to heuristic methods based on alternate convex search (ACS), which iteratively optimizes over one block of variables while keeping the other fixed, so that the resulting subproblems are convex and can be efficiently solved. However, designing and implementing an ACS solver for a specific biconvex optimization problem usually requires significant effort from the user, which can be tedious and error-prone. DBCP extends the principles of disciplined convex programming to biconvex problems, allowing users to specify biconvex optimization problems in a natural way based on a small number of syntax rules. The resulting problem can then be automatically split and transformed into convex subproblems, for which a customized ACS solver is then generated and applied. DBCP allows users to quickly experiment with different biconvex problem formulations, without expertise in convex optimization. We implement DBCP into the open source Python package dbcp, as an extension to the famous domain specific language CVXPY for convex optimization.


Discovering Spatial Correlations of Earth Observations for weather forecasting by using Graph Structure Learning

arXiv.org Artificial Intelligence

This study aims to improve the accuracy of weather predictions by discovering spatial correlations between Earth observations and atmospheric states. Existing numerical weather prediction (NWP) systems predict future atmospheric states at fixed locations, which are called NWP grid points, by analyzing previous atmospheric states and newly acquired Earth observations. However, the shifting locations of observations and the surrounding meteorological context induce complex, dynamic spatial correlations that are difficult for traditional NWP systems to capture, since they rely on strict statistical and physical formulations. To handle complicated spatial correlations, which change dynamically, we employ a spatiotemporal graph neural networks (STGNNs) with structure learning. However, structure learning has an inherent limitation that this can cause structural information loss and over-smoothing problem by generating excessive edges. To solve this problem, we regulate edge sampling by adaptively determining node degrees and considering the spatial distances between NWP grid points and observations. We validated the effectiveness of the proposed method (CloudNine-v2) using real-world atmospheric state and observation data from East Asia, achieving up to 15\% reductions in RMSE over existing STGNN models. Even in areas with high atmospheric variability, CloudNine-v2 consistently outperformed baselines with and without structure learning.


Loss-Guided Auxiliary Agents for Overcoming Mode Collapse in GFlowNets

arXiv.org Artificial Intelligence

Although Generative Flow Networks (GFlowNets) are designed to capture multiple modes of a reward function, they often suffer from mode collapse in practice, getting trapped in early-discovered modes and requiring prolonged training to find diverse solutions. Existing exploration techniques often rely on heuristic novelty signals. We propose Loss-Guided GFlowNets (LGGFN), a novel approach where an auxiliary GFlowNet's exploration is \textbf{directly driven by the main GFlowNet's training loss}. By prioritizing trajectories where the main model exhibits \textbf{high loss}, LGGFN focuses sampling on poorly understood regions of the state space. This targeted exploration significantly accelerates the discovery of diverse, high-reward samples. Empirically, across \textbf{diverse benchmarks} including grid environments, structured sequence generation, Bayesian structure learning, and biological sequence design, LGGFN consistently \textbf{outperforms} baselines in exploration efficiency and sample diversity. For instance, on a challenging sequence generation task, it discovered over 40 times more unique valid modes while simultaneously reducing the exploration error metric by approximately 99\%.


Understanding LLM Behaviors via Compression: Data Generation, Knowledge Acquisition and Scaling Laws

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have demonstrated remarkable capabilities across numerous tasks, yet principled explanations for their underlying mechanisms and several phenomena, such as scaling laws, hallucinations, and related behaviors, remain elusive. In this work, we revisit the classical relationship between compression and prediction, grounded in Kolmogorov complexity and Shannon information theory, to provide deeper insights into LLM behaviors. By leveraging the Kolmogorov Structure Function and interpreting LLM compression as a two-part coding process, we offer a detailed view of how LLMs acquire and store information across increasing model and data scales -- from pervasive syntactic patterns to progressively rarer knowledge elements. Motivated by this theoretical perspective and natural assumptions inspired by Heap's and Zipf's laws, we introduce a simplified yet representative hierarchical data-generation framework called the Syntax-Knowledge model. Under the Bayesian setting, we show that prediction and compression within this model naturally lead to diverse learning and scaling behaviors observed in LLMs. In particular, our theoretical analysis offers intuitive and principled explanations for both data and model scaling laws, the dynamics of knowledge acquisition during training and fine-tuning, factual knowledge hallucinations in LLMs. The experimental results validate our theoretical predictions.


Policy Gradient-Based EMT-in-the-Loop Learning to Mitigate Sub-Synchronous Control Interactions

arXiv.org Artificial Intelligence

This paper explores the development of learning-based tunable control gains using EMT-in-the-loop simulation framework (e.g., PSCAD interfaced with Python-based learning modules) to address critical sub-synchronous oscillations. Since sub-synchronous control interactions (SSCI) arise from the mis-tuning of control gains under specific grid configurations, effective mitigation strategies require adaptive re-tuning of these gains. Such adaptiveness can be achieved by employing a closed-loop, learning-based framework that considers the grid conditions responsible for such sub-synchronous oscillations. This paper addresses this need by adopting methodologies inspired by Markov decision process (MDP) based reinforcement learning (RL), with a particular emphasis on simpler deep policy gradient methods with additional SSCI-specific signal processing modules such as down-sampling, bandpass filtering, and oscillation energy dependent reward computations. Our experimentation in a real-world event setting demonstrates that the deep policy gradient based trained policy can adaptively compute gain settings in response to varying grid conditions and optimally suppress control interaction-induced oscillations.


MARAuder's Map: Motion-Aware Real-time Activity Recognition with Layout-Based Trajectories

arXiv.org Artificial Intelligence

Ambient sensor-based human activity recognition (HAR) in smart homes remains challenging due to the need for real-time inference, spatially grounded reasoning, and context-aware temporal modeling. Existing approaches often rely on pre-segmented, within-activity data and overlook the physical layout of the environment, limiting their robustness in continuous, real-world deployments. In this paper, we propose MARAuder's Map, a novel framework for real-time activity recognition from raw, unsegmented sensor streams. Our method projects sensor activations onto the physical floorplan to generate trajectory-aware, image-like sequences that capture the spatial flow of human movement. These representations are processed by a hybrid deep learning model that jointly captures spatial structure and temporal dependencies. To enhance temporal awareness, we introduce a learnable time embedding module that encodes contextual cues such as hour-of-day and day-of-week. Additionally, an attention-based encoder selectively focuses on informative segments within each observation window, enabling accurate recognition even under cross-activity transitions and temporal ambiguity. Extensive experiments on multiple real-world smart home datasets demonstrate that our method outperforms strong baselines, offering a practical solution for real-time HAR in ambient sensor environments.


Automatic Extraction of Road Networks by using Teacher-Student Adaptive Structural Deep Belief Network and Its Application to Landslide Disaster

arXiv.org Artificial Intelligence

Abstract--An adaptive structural learning method of Restricted Boltzmann Machine (RBM) and Deep Belief Network (DBN) has been developed as one of prominent deep learning models. The neuron generation-annihilation algorithm in R BM and layer generation algorithm in DBN make an optimal networ k structure for given input during the learning. In this paper, our model is applied to an automatic recognition method of road network system, called RoadTracer . A novel method of RoadTracer using the T eacher-Student base d ensemble learning model of Adaptive DBN is proposed, since t he road maps contain many complicated features so that a model with high representation power to detect should be required . The experimental results showed the detection accuracy of t he proposed model was improved from 40.0% to 89.0% on average in the seven major cities among the test dataset. In addition, we challenged to apply our method to the detection of availab le roads when landslide by natural disaster is occurred, in ord er to rapidly obtain a way of transportation. For fast inferenc e, a small size of the trained model was implemented on a small embedded edge device as lightweight deep learning. Recently there have been more cases of extreme climate events including unexpected and unusual weather. The atten - tion of these events has been received in the last few years, d ue to the significant loss of human lives and escalating economi c costs, as well as the impacts on landslides and changes in ecosystems. In Japan, the Japan Meteorological Agency (JMA) has issued "Climate Change Monitoring Report" every year informing the latest status of climate change. According to [1 ], during the Heavy Rain Event of July 2018, Japan experienced unprecedented heavy rainfall. Overall precipitation obse rved at AMeDAS stations throughout Japan in July 2018 was extremely high in comparison with past heavy rainfall event s since 1982. A prominent characteristic of this rain event is that the record-breaking local precipitation, particularly wi thin 48 to 72 hours, was observed extensively over western Japan and Tokyo region, including the Seto Inland Sea side of Chugoku and Shikoku regions. S. Kamada is with Hiroshima City University, Hiroshima, Jap an T. Ichimura is with Prefectural University of Hiroshima, Hi roshima, Japan In addition, lifelines such as wat er supply and communications damaged, and traffic obstacles occurred over a wide area. Due to the disruption of major roads and railroads, the supply was also suspended.


Approximate Bayesian inference for cumulative probit regression models

arXiv.org Machine Learning

Ordinal categorical data are routinely encountered in a wide range of practical applications. When the primary goal is to construct a regression model for ordinal outcomes, cumulative link models represent one of the most popular choices to link the cumulative probabilities of the response with a set of covariates through a parsimonious linear predictor, shared across response categories. When the number of observations grows, standard sampling algorithms for Bayesian inference scale poorly, making posterior computation increasingly challenging in large datasets. In this article, we propose three scalable algorithms for approximating the posterior distribution of the regression coefficients in cumulative probit models relying on Variational Bayes and Expectation Propagation. We compare the proposed approaches with inference based on Markov Chain Monte Carlo, demonstrating superior computational performance and remarkable accuracy; finally, we illustrate the utility of the proposed algorithms on a challenging case study to investigate the structure of a criminal network.


Wasserstein-Cramér-Rao Theory of Unbiased Estimation

arXiv.org Machine Learning

The quantity of interest in the classical Cramér-Rao theory of unbiased estimation (e.g., the Cramér-Rao lower bound, its exact attainment for exponential families, and asymptotic efficiency of maximum likelihood estimation) is the variance, which represents the instability of an estimator when its value is compared to the value for an independently-sampled data set from the same distribution. In this paper we are interested in a quantity which represents the instability of an estimator when its value is compared to the value for an infinitesimal additive perturbation of the original data set; we refer to this as the "sensitivity" of an estimator. The resulting theory of sensitivity is based on the Wasserstein geometry in the same way that the classical theory of variance is based on the Fisher-Rao (equivalently, Hellinger) geometry, and this insight allows us to determine a collection of results which are analogous to the classical case: a Wasserstein-Cramér-Rao lower bound for the sensitivity of any unbiased estimator, a characterization of models in which there exist unbiased estimators achieving the lower bound exactly, and some concrete results that show that the Wasserstein projection estimator achieves the lower bound asymptotically. We use these results to treat many statistical examples, sometimes revealing new optimality properties for existing estimators and other times revealing entirely new estimators.