Goto

Collaborating Authors

 Singapore


Curiosity-Critic: Cumulative Prediction Error Improvement as a Tractable Intrinsic Reward for World Model Training

Bhaskara, Vin, Wang, Haicheng

arXiv.org Machine Learning

Local prediction-error-based curiosity rewards focus on the current transition without considering the world model's cumulative prediction error across all visited transitions. We introduce Curiosity-Critic, which grounds its intrinsic reward in the improvement of this cumulative objective, and show that it reduces to a tractable per-step form: the difference between the current prediction error and the asymptotic error baseline of the current state transition. We estimate this baseline online with a learned critic co-trained alongside the world model; regressing a single scalar, the critic converges well before the world model saturates, redirecting exploration toward learnable transitions without oracle knowledge of the noise floor. The reward is higher for learnable transitions and collapses toward the baseline for stochastic ones, effectively separating epistemic (reducible) from aleatoric (irreducible) prediction error online. Prior prediction-error curiosity formulations, from Schmidhuber (1991) to learned-feature-space variants, emerge as special cases corresponding to specific approximations of this baseline. Experiments on a stochastic grid world show that Curiosity-Critic outperforms prediction-error and visitation-count baselines in convergence speed and final world model accuracy.


Back to school: robots learn from factory workers

Robohub

What if training a robot to handle dirty, dangerous work on the factory floor was as simple as showing it how? Czech startup RoboTwin is doing exactly that, helping factory workers teach robots new skills by demonstration. Instead of writing complex code, workers perform the job once and RoboTwin's technology turns those movements into a robot programme - opening the door to automation for smaller manufacturers. Founded in Prague in 2021, RoboTwin builds handheld devices and no-code software that capture human movements and translate them into instructions for industrial robots. The aim is to make automation faster, simpler and more accessible to manufacturers that do not have specialist robotics programmers.


AIhub monthly digest: March 2026 – time series, multiplicity, and the history of RoboCup

AIHub

Welcome to our monthly digest, where you can catch up with any AIhub stories you may have missed, peruse the latest news, recap recent events, and more. This month, we delved into the history of RoboCup, learned about time series, studied multiplicity, and found out more about Theory of Mind. RoboCup is an international competition that promotes and advances robotics and AI through the challenges presented by its various leagues. We got the chance to sit down with Professor Manuela Veloso, one of RoboCup's founders, to find out more about how it all started, how the community has grown over the years, and the vision for the future. What we've learned from 25 years of automated science, and what the future holds We're excited to launch a new series, where we'll be speaking with leading researchers to explore the breakthroughs driving AI and the reality of the future promises, to give you an inside perspective on the headlines.


Scaling up multi-agent systems: an interview with Minghong Geng

AIHub

In this interview series, we're meeting some of the AAAI/SIGAI Doctoral Consortium participants to find out more about their research. Minghong Geng recently completed his PhD and is now working as a postdoctoral researcher at Singapore Management University. We sat down to discuss his research on multi-agent systems. Firstly, congratulations on completing your PhD! What is the general topic of your research? I work on multi-agent systems.


Boltzmann Machine Learning with a Parallel, Persistent Markov chain Monte Carlo method for Estimating Evolutionary Fields and Couplings from a Protein Multiple Sequence Alignment

Miyazawa, Sanzo

arXiv.org Machine Learning

The inverse Potts problem for estimating evolutionary single-site fields and pairwise couplings in homologous protein sequences from their single-site and pairwise amino acid frequencies observed in their multiple sequence alignment would be still one of useful methods in the studies of protein structure and evolution. Since the reproducibility of fields and couplings are the most important, the Boltzmann machine method is employed here, although it is computationally intensive. In order to reduce computational time required for the Boltzmann machine, parallel, persistent Markov chain Monte Carlo method is employed to estimate the single-site and pairwise marginal distributions in each learning step. Also, stochastic gradient descent methods are used to reduce computational time for each learning. Another problem is how to adjust the values of hyperparameters; there are two regularization parameters for evolutionary fields and couplings. The precision of contact residue pair prediction is often used to adjust the hyperparameters. However, it is not sensitive to these regularization parameters. Here, they are adjusted for the fields and couplings to satisfy a specific condition that is appropriate for protein conformations. This method has been applied to eight protein families.


Contraction and Hourglass Persistence for Learning on Graphs, Simplices, and Cells

Ji, Mattie, Roy, Indradyumna, Garg, Vikas

arXiv.org Machine Learning

Persistent homology (PH) encodes global information, such as cycles, and is thus increasingly integrated into graph neural networks (GNNs). PH methods in GNNs typically traverse an increasing sequence of subgraphs. In this work, we first expose limitations of this inclusion procedure. To remedy these shortcomings, we analyze contractions as a principled topological operation, in particular, for graph representation learning. We study the persistence of contraction sequences, which we call Contraction Homology (CH). We establish that forward PH and CH differ in expressivity. We then introduce Hourglass Persistence, a class of topological descriptors that interleave a sequence of inclusions and contractions to boost expressivity, learnability, and stability. We also study related families parametrized by two paradigms. We also discuss how our framework extends to simplicial and cellular networks. We further design efficient algorithms that are pluggable into end-to-end differentiable GNN pipelines, enabling consistent empirical improvements over many PH methods across standard real-world graph datasets. Code is available at \href{https://github.com/Aalto-QuML/Hourglass}{this https URL}.


Extraction of informative statistical features in the problem of forecasting time series generated by It{ô}-type processes

Korolev, Victor, Ivanov, Mikhail, Kukanova, Tatiana, Rukavitsa, Artyom, Vakshin, Alexander, Solomonov, Peter, Zeifman, Alexander

arXiv.org Machine Learning

In this paper, we consider the problem of extraction of most informative features from time series that are regarded as observed values of stochastic processes satisfying the It{ô} stochastic differential equations with unknown random drift and diffusion coefficients. We do not attract any additional information and use only the information contained in the time series as it is. Therefore, as additional features, we use the parameters of statistically adjusted mixture-type models of the observed regularities of the behavior of the time series. Several algorithms of construction of these parameters are discussed. These algorithms are based on statistical reconstruction of the coefficients which, in turn, is based on statistical separation of normal mixtures. We obtain two types of parameters by the techniques of the uniform and non-uniform statistical reconstruction of the coefficients of the underlying It{ô} process. The reconstructed coefficients obtained by uniform techniques do not depend on the current value of the process, while the non-uniform techniques reconstruct the coefficients with the account of their dependence on the value of the process. Actually, the non-uniform techniques used in this paper represent a stochastic analog of the Taylor expansion for the time series. The efficiency of the obtained additional features is compared by using them in the autoregressive algorithms of prediction of time series. In order to obtain pure conclusion that is not affected by unwanted factors, say, related to a special choice of the architecture of the neural network prediction methods, we used only simple autoregressive algorithms. We show that the use of additional statistical features improves the prediction.


Google brings Gemini in Chrome to users in Asia and the Pacific

Engadget

Outside of Japan, Google's chatbot is accessible on both desktop and iOS. Google has added a new sidebar to Chrome that allows users to access Gemini from any of their tabs. After debuting in the US, Gemini in Chrome is making its way to more markets. Starting today, Google is rolling out Chrome's built-in chatbot to users in Asia and the Pacific, including Australia, Indonesia, Japan, the Philippines, Singapore, South Korea and Vietnam. The expansion comes after Google earlier this year made Gemini in Chrome available to people in Canada, India and New Zealand .


Beyond Augmented-Action Surrogates for Multi-Expert Learning-to-Defer

Montreuil, Yannis, Carlier, Axel, Ng, Lai Xing, Ooi, Wei Tsang

arXiv.org Machine Learning

Existing multi-expert learning-to-defer surrogates are statistically consistent, yet they can underfit, suppress useful experts, or degrade as the expert pool grows. We trace these failures to a shared architectural choice: casting classes and experts as actions inside one augmented prediction geometry. Consistency governs the population target; it says nothing about how the surrogate distributes gradient mass during training. We analyze five surrogates along both axes and show that each trades a fix on one for a failure on the other. We then introduce a decoupled surrogate that estimates the class posterior with a softmax and each expert utility with an independent sigmoid. It admits an $\mathcal{H}$-consistency bound whose constant is $J$-independent for fixed per-expert weight $β{=}λ/J$, and its gradients are free of the amplification, starvation, and coupling pathologies of the augmented family. Experiments on synthetic benchmarks, CIFAR-10, CIFAR-10H, and Covertype confirm that the decoupled surrogate is the only method that avoids amplification under redundancy, preserves rare specialists, and consistently improves over a standalone classifier across all settings.


Cost-optimal Sequential Testing via Doubly Robust Q-learning

Zhou, Doudou, Zhang, Yiran, Jin, Dian, Zheng, Yingye, Tian, Lu, Cai, Tianxi

arXiv.org Machine Learning

Clinical decision-making often involves selecting tests that are costly, invasive, or time-consuming, motivating individualized, sequential strategies for what to measure and when to stop ascertaining. We study the problem of learning cost-optimal sequential decision policies from retrospective data, where test availability depends on prior results, inducing informative missingness. Under a sequential missing-at-random mechanism, we develop a doubly robust Q-learning framework for estimating optimal policies. The method introduces path-specific inverse probability weights that account for heterogeneous test trajectories and satisfy a normalization property conditional on the observed history. By combining these weights with auxiliary contrast models, we construct orthogonal pseudo-outcomes that enable unbiased policy learning when either the acquisition model or the contrast model is correctly specified. We establish oracle inequalities for the stage-wise contrast estimators, along with convergence rates, regret bounds, and misclassification rates for the learned policy. Simulations demonstrate improved cost-adjusted performance over weighted and complete-case baselines, and an application to a prostate cancer cohort study illustrates how the method reduces testing cost without compromising predictive accuracy.