Problem Solving
A Direct Semi-Exhaustive Search Method for Robust, Partial-to-Full Point Cloud Registration
Cheng, Richard, Papozov, Chavdar, Helmick, Dan, Tjersland, Mark
Point cloud registration refers to the problem of finding the rigid transformation that aligns two given point clouds, and is crucial for many applications in robotics and computer vision. The main insight of this paper is that we can directly optimize the point cloud registration problem without correspondences by utilizing an algorithmically simple, yet computationally complex, semi-exhaustive search approach that is very well-suited for parallelization on modern GPUs. Our proposed algorithm, Direct Semi-Exhaustive Search (DSES), iterates over potential rotation matrices and efficiently computes the inlier-maximizing translation associated with each rotation. It then computes the optimal rigid transformation based on any desired distance metric by directly computing the error associated with each transformation candidate $\{R, t\}$. By leveraging the parallelism of modern GPUs, DSES outperforms state-of-the-art methods for partial-to-full point cloud registration on the simulated ModelNet40 benchmark and demonstrates high performance and robustness for pose estimation on a real-world robotics problem (https://youtu.be/q0q2-s2KSuA).
KBQA-o1: Agentic Knowledge Base Question Answering with Monte Carlo Tree Search
Luo, Haoran, E, Haihong, Guo, Yikai, Lin, Qika, Wu, Xiaobao, Mu, Xinyu, Liu, Wenhao, Song, Meina, Zhu, Yifan, Tuan, Luu Anh
Knowledge Base Question Answering (KBQA) aims to answer natural language questions with a large-scale structured knowledge base (KB). Despite advancements with large language models (LLMs), KBQA still faces challenges in weak KB awareness, imbalance between effectiveness and efficiency, and high reliance on annotated data. To address these challenges, we propose KBQA-o1, a novel agentic KBQA method with Monte Carlo Tree Search (MCTS). It introduces a ReAct-based agent process for stepwise logical form generation with KB environment exploration. Moreover, it employs MCTS, a heuristic search method driven by policy and reward models, to balance agentic exploration's performance and search space. With heuristic exploration, KBQA-o1 generates high-quality annotations for further improvement by incremental fine-tuning. Experimental results show that KBQA-o1 outperforms previous low-resource KBQA methods with limited annotated data, boosting Llama-3.1-8B model's GrailQA F1 performance to 78.5% compared to 48.5% of the previous sota method with GPT-3.5-turbo.
Node Classification and Search on the Rubik's Cube Graph with GNNs
This study focuses on the application of deep geometric models to solve the 3x3x3 Rubik's Cube. We begin by discussing the cube's graph representation and defining distance as the model's optimization objective. The distance approximation task is reformulated as a node classification problem, effectively addressed using Graph Neural Networks (GNNs). After training the model on a random subgraph, the predicted classes are used to construct a heuristic for $A^*$ search. We conclude with experiments comparing our heuristic to that of the DeepCubeA model.
MCM: Multi-layer Concept Map for Efficient Concept Learning from Masked Images
Sun, Yuwei, Mi, Lu, Fujisawa, Ippei, Kanai, Ryota
Masking strategies commonly employed in natural language processing are still underexplored in vision tasks such as concept learning, where conventional methods typically rely on full images. However, using masked images diversifies perceptual inputs, potentially offering significant advantages in concept learning with large-scale Transformer models. To this end, we propose Multi-layer Concept Map (MCM), the first work to devise an efficient concept learning method based on masked images. In particular, we introduce an asymmetric concept learning architecture by establishing correlations between different encoder and decoder layers, updating concept tokens using backward gradients from reconstruction tasks. The learned concept tokens at various levels of granularity help either reconstruct the masked image patches by filling in gaps or guide the reconstruction results in a direction that reflects specific concepts. Moreover, we present both quantitative and qualitative results across a wide range of metrics, demonstrating that MCM significantly reduces computational costs by training on fewer than 75% of the total image patches while enhancing concept prediction performance. Additionally, editing specific concept tokens in the latent space enables targeted image generation from masked images, aligning both the visible contextual patches and the provided concepts. By further adjusting the testing time mask ratio, we could produce a range of reconstructions that blend the visible patches with the provided concepts, proportional to the chosen ratios.
Counting and Reasoning with Plans
Speck, David, Hecher, Markus, Gnad, Daniel, Fichte, Johannes K., Corrêa, Augusto B.
Classical planning asks for a sequence of operators reaching a given goal. While the most common case is to compute a plan, many scenarios require more than that. However, quantitative reasoning on the plan space remains mostly unexplored. A fundamental problem is to count plans, which relates to the conditional probability on the plan space. Indeed, qualitative and quantitative approaches are well-established in various other areas of automated reasoning. We present the first study to quantitative and qualitative reasoning on the plan space. In particular, we focus on polynomially bounded plans. On the theoretical side, we study its complexity, which gives rise to rich reasoning modes. Since counting is hard in general, we introduce the easier notion of facets, which enables understanding the significance of operators. On the practical side, we implement quantitative reasoning for planning. Thereby, we transform a planning task into a propositional formula and use knowledge compilation to count different plans. This framework scales well to large plan spaces, while enabling rich reasoning capabilities such as learning pruning functions and explainable planning.
On Scaling Neurosymbolic Programming through Guided Logical Inference
Valentin, Thomas Jean-Michel, Werner, Luisa Sophie, Genevès, Pierre, Layaïda, Nabil
Probabilistic neurosymbolic learning seeks to integrate neural networks with symbolic programming. Many state-of-the-art systems rely on a reduction to the Probabilistic Weighted Model Counting Problem (PWMC), which requires computing a Boolean formula called the logical provenance.However, PWMC is \\#P-hard, and the number of clauses in the logical provenance formula can grow exponentially, creating a major bottleneck that significantly limits the applicability of PNL solutions in practice.We propose a new approach centered around an exact algorithm DPNL, that enables bypassing the computation of the logical provenance.The DPNL approach relies on the principles of an oracle and a recursive DPLL-like decomposition in order to guide and speed up logical inference.Furthermore, we show that this approach can be adapted for approximate reasoning with $\epsilon$ or $(\epsilon, \delta)$ guarantees, called ApproxDPNL.Experiments show significant performance gains.DPNL enables scaling exact inference further, resulting in more accurate models.Further, ApproxDPNL shows potential for advancing the scalability of neurosymbolic programming by incorporating approximations even further, while simultaneously ensuring guarantees for the reasoning process.
State Stream Transformer (SST) : Emergent Metacognitive Behaviours Through Latent State Persistence
We introduce the State Stream Transformer (SST), a novel LLM architecture that reveals emergent reasoning behaviours and capabilities latent in pretrained weights through addressing a fundamental limitation in traditional transformer models: the lack of latent computational continuity across autoregressive generations in the state space. SST introduces a sliding window latent state (FFN) cache with weighted decay that maintains and evolves persistent latent processes throughout autoregressive generations. Through controlled experiments comparing base and SST architectures using the same frozen weights, we demonstrate that this architectural modification alone enables enhanced reasoning capabilities which appear best explained by some form of potential higher-order processing, as evidenced by emergent metacognitive behaviours. These behaviours persist under controlled conditions designed to eliminate confounding factors such as stochastic variation or learned response patterns. Analysis of latent state distributions and processing dynamics provides evidence that it is solely the 'state stream' that is responsible for these phenomena. In quantitative evaluations, the SST achieves substantial performance improvements over the base model on two reasoning benchmarks, reaching 89.01\% accuracy on GSM-8K (0-shot) and 91.04\% on ARC Challenge (0-shot CoT). These findings indicate that persistent computation in the latent state space enables fundamentally different information processing and internal reasoning strategies, with implications for our understanding of artificial intelligence systems.
International AI Safety Report
Bengio, Yoshua, Mindermann, Sören, Privitera, Daniel, Besiroglu, Tamay, Bommasani, Rishi, Casper, Stephen, Choi, Yejin, Fox, Philip, Garfinkel, Ben, Goldfarb, Danielle, Heidari, Hoda, Ho, Anson, Kapoor, Sayash, Khalatbari, Leila, Longpre, Shayne, Manning, Sam, Mavroudis, Vasilios, Mazeika, Mantas, Michael, Julian, Newman, Jessica, Ng, Kwan Yee, Okolo, Chinasa T., Raji, Deborah, Sastry, Girish, Seger, Elizabeth, Skeadas, Theodora, South, Tobin, Strubell, Emma, Tramèr, Florian, Velasco, Lucia, Wheeler, Nicole, Acemoglu, Daron, Adekanmbi, Olubayo, Dalrymple, David, Dietterich, Thomas G., Felten, Edward W., Fung, Pascale, Gourinchas, Pierre-Olivier, Heintz, Fredrik, Hinton, Geoffrey, Jennings, Nick, Krause, Andreas, Leavy, Susan, Liang, Percy, Ludermir, Teresa, Marda, Vidushi, Margetts, Helen, McDermid, John, Munga, Jane, Narayanan, Arvind, Nelson, Alondra, Neppel, Clara, Oh, Alice, Ramchurn, Gopal, Russell, Stuart, Schaake, Marietje, Schölkopf, Bernhard, Song, Dawn, Soto, Alvaro, Tiedrich, Lee, Varoquaux, Gaël, Yao, Andrew, Zhang, Ya-Qin, Albalawi, Fahad, Alserkal, Marwan, Ajala, Olubunmi, Avrin, Guillaume, Busch, Christian, de Carvalho, André Carlos Ponce de Leon Ferreira, Fox, Bronwyn, Gill, Amandeep Singh, Hatip, Ahmet Halit, Heikkilä, Juha, Jolly, Gill, Katzir, Ziv, Kitano, Hiroaki, Krüger, Antonio, Johnson, Chris, Khan, Saif M., Lee, Kyoung Mu, Ligot, Dominic Vincent, Molchanovskyi, Oleksii, Monti, Andrea, Mwamanzi, Nusu, Nemer, Mona, Oliver, Nuria, Portillo, José Ramón López, Ravindran, Balaraman, Rivera, Raquel Pezoa, Riza, Hammam, Rugege, Crystal, Seoighe, Ciarán, Sheehan, Jerry, Sheikh, Haroon, Wong, Denise, Zeng, Yi
I am honoured to present the International AI Safety Report. It is the work of 96 international AI experts who collaborated in an unprecedented effort to establish an internationally shared scientific understanding of risks from advanced AI and methods for managing them. We embarked on this journey just over a year ago, shortly after the countries present at the Bletchley Park AI Safety Summit agreed to support the creation of this report. Since then, we published an Interim Report in May 2024, which was presented at the AI Seoul Summit. We are now pleased to publish the present, full report ahead of the AI Action Summit in Paris in February 2025. Since the Bletchley Summit, the capabilities of general-purpose AI, the type of AI this report focuses on, have increased further. For example, new models have shown markedly better performance at tests of Professor Yoshua Bengio programming and scientific reasoning.
ISAM-MTL: Cross-subject multi-task learning model with identifiable spikes and associative memory networks
Li, Junyan, Hu, Bin, Guan, Zhi-Hong
Cross-subject variability in EEG degrades performance of current deep learning models, limiting the development of brain-computer interface (BCI). This paper proposes ISAM-MTL, which is a multi-task learning (MTL) EEG classification model based on identifiable spiking (IS) representations and associative memory (AM) networks. The proposed model treats EEG classification of each subject as an independent task and leverages cross-subject data training to facilitate feature sharing across subjects. ISAM-MTL consists of a spiking feature extractor that captures shared features across subjects and a subject-specific bidirectional associative memory network that is trained by Hebbian learning for efficient and fast within-subject EEG classification. ISAM-MTL integrates learned spiking neural representations with bidirectional associative memory for cross-subject EEG classification. The model employs label-guided variational inference to construct identifiable spike representations, enhancing classification accuracy. Experimental results on two BCI Competition datasets demonstrate that ISAM-MTL improves the average accuracy of cross-subject EEG classification while reducing performance variability among subjects. The model further exhibits the characteristics of few-shot learning and identifiable neural activity beneath EEG, enabling rapid and interpretable calibration for BCI systems.
Certifying Pareto-Optimality in Multi-Objective Maximum Satisfiability
Jabs, Christoph, Berg, Jeremias, Bogaerts, Bart, Järvisalo, Matti
Due to the wide employment of automated reasoning in the analysis and construction of correct systems, the results reported by automated reasoning engines must be trustworthy. For Boolean satisfiability (SAT) solvers - and more recently SAT-based maximum satisfiability (MaxSAT) solvers - trustworthiness is obtained by integrating proof logging into solvers, making solvers capable of emitting machine-verifiable proofs to certify correctness of the reasoning steps performed. In this work, we enable for the first time proof logging based on the VeriPB proof format for multi-objective MaxSAT (MO-MaxSAT) optimization techniques. Although VeriPB does not offer direct support for multi-objective problems, we detail how preorders in VeriPB can be used to provide certificates for MO-MaxSAT algorithms computing a representative solution for each element in the non-dominated set of the search space under Pareto-optimality, without extending the VeriPB format or the proof checker. By implementing VeriPB proof logging into a state-of-the-art multi-objective MaxSAT solver, we show empirically that proof logging can be made scalable for MO-MaxSAT with reasonable overhead.