
pDANSE: Particle-based Data-driven Nonlinear State Estimation from Nonlinear Measurements

Ghosh, Anubhab, Eldar, Yonina C., Chatterjee, Saikat

arXiv.org Artificial Intelligence

We consider the problem of designing a data-driven nonlinear state estimation (DANSE) method that uses (noisy) nonlinear measurements of a process whose underlying state transition model (STM) is unknown. Such a process is referred to as a model-free process. A recurrent neural network (RNN) provides parameters of a Gaussian prior that characterize the state of the model-free process, using all previous measurements at a given time point. In the case of DANSE, the measurement system was linear, leading to a closed-form solution for the state posterior. However, the presence of a nonlinear measurement system renders a closed-form solution infeasible. Instead, the second-order statistics of the state posterior are computed using the nonlinear measurements observed at the time point. We address the nonlinear measurements using a reparameterization trick-based particle sampling approach, and estimate the second-order statistics of the state posterior. The proposed method is referred to as particle-based DANSE (pDANSE). The RNN of pDANSE uses sequential measurements efficiently and avoids the use of computationally intensive sequential Monte Carlo (SMC) and/or ancestral sampling. We describe the semi-supervised learning method for pDANSE, which transitions to unsupervised learning in the absence of labeled data. Using a stochastic Lorenz-63 system as a benchmark process, we experimentally demonstrate the state estimation performance for four nonlinear measurement systems. We explore cubic nonlinearity and a camera-model nonlinearity where unsupervised learning is used; then we explore half-wave rectification nonlinearity and Cartesian-to-spherical nonlinearity where semi-supervised learning is used. The performance of state estimation is shown to be competitive vis-à-vis particle filters that have complete knowledge of the STM of the Lorenz-63 system.
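
As a hedged illustration of the update described above, the following sketch (not the authors' code; the RNN output, the measurement function h, and all shapes are assumptions) draws particles from the RNN's Gaussian prior via the reparameterization trick, weights them by the nonlinear measurement likelihood, and forms the second-order statistics of the state posterior.

```python
# Minimal sketch (not the authors' implementation) of one pDANSE-style posterior update.
# Assumptions: the RNN has already produced a Gaussian prior (prior_mean, prior_cov) for
# the state at time t; h is the known nonlinear measurement function; meas_cov is the
# measurement-noise covariance. All names are illustrative.
import numpy as np

def posterior_moments(prior_mean, prior_cov, y, h, meas_cov, n_particles=500, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    d = prior_mean.shape[0]
    # Reparameterization trick: particles x_i = m + A @ eps_i with A A^T = prior_cov.
    A = np.linalg.cholesky(prior_cov)
    eps = rng.standard_normal((n_particles, d))
    particles = prior_mean + eps @ A.T
    # Importance weights from the measurement likelihood N(y; h(x_i), meas_cov).
    residuals = y - np.array([h(x) for x in particles])       # (N, m)
    inv_R = np.linalg.inv(meas_cov)
    log_w = -0.5 * np.einsum('ni,ij,nj->n', residuals, inv_R, residuals)
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    # Second-order statistics of the approximate state posterior.
    post_mean = w @ particles
    centered = particles - post_mean
    post_cov = (centered * w[:, None]).T @ centered
    return post_mean, post_cov
```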


Structure-Preserving Margin Distribution Learning for High-Order Tensor Data with Low-Rank Decomposition

Xu, Yang, Li, Junpeng, Hua, Changchun, Yang, Yana

arXiv.org Artificial Intelligence

The Large Margin Distribution Machine (LMDM) is a recent advancement in classifier design that optimizes not just the minimum margin (as in SVM) but the entire margin distribution, thereby improving generalization. However, existing LMDM formulations are limited to vectorized inputs and struggle with high-dimensional tensor data due to the need for flattening, which destroys the data's inherent multi-mode structure and increases computational burden. In this paper, we propose Structure-Preserving Margin Distribution Learning for High-Order Tensor Data with Low-Rank Decomposition (SPMD-LRT), which operates directly on tensor representations without vectorization. The SPMD-LRT preserves multi-dimensional spatial structure by incorporating first-order and second-order tensor statistics (margin mean and variance) into the objective, and it leverages low-rank tensor decomposition techniques, including rank-1 (CP), higher-rank CP, and Tucker decomposition, to parameterize the weight tensor. An alternating optimization (double-gradient descent) algorithm is developed to efficiently solve the SPMD-LRT, iteratively updating the factor matrices and core tensor. This approach enables SPMD-LRT to maintain the structural information of high-order data while optimizing the margin distribution for improved classification. Extensive experiments on diverse datasets (including MNIST, images and fMRI neuroimaging) demonstrate that SPMD-LRT achieves superior classification accuracy compared to conventional SVM, vector-based LMDM, and prior tensor-based SVM extensions (Support Tensor Machines and Support Tucker Machines). These results confirm the effectiveness and robustness of SPMD-LRT in handling high-dimensional tensor data for classification. Advances in data acquisition have led to an abundance of high-order tensor data (multi-dimensional arrays) across various domains, such as video sequences, medical imaging, and spatiotemporal sensor readings. Effectively learning from such tensor-structured data has become a pressing research focus [1] [2]. The multi-dimensional structure of tensors offers rich information (e.g.
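
To make the margin-distribution objective concrete, here is a minimal sketch assuming matrix-shaped inputs and a rank-1 (CP) weight tensor W = u v^T; the hinge term, hyperparameters, and the alternating update are illustrative choices rather than the paper's exact formulation.

```python
# Minimal sketch (not the paper's implementation): a rank-1 (CP) version of the SPMD-LRT
# idea for matrix-shaped inputs X_i of size (p, q) with labels y_i in {-1, +1}. The weight
# tensor is W = outer(u, v); the loss combines the margin mean (first-order) and margin
# variance (second-order) as in LMDM, and u, v are updated alternately by gradient steps.
# lam_mean, lam_var, lr and the iteration count are illustrative hyperparameters.
import numpy as np

def spmd_lrt_rank1(X, y, lam_mean=1.0, lam_var=1.0, lr=1e-2, iters=200, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    n, p, q = X.shape
    u, v = rng.standard_normal(p), rng.standard_normal(q)
    for _ in range(iters):
        margins = y * np.einsum('i,nij,j->n', u, X, v)        # gamma_i = y_i * u^T X_i v
        mean = margins.mean()
        # d(loss)/d(margin): hinge part + margin-distribution part (mean and variance).
        hinge_grad = np.where(margins < 1.0, -1.0, 0.0) / n
        dist_grad = -lam_mean / n + 2.0 * lam_var * (margins - mean) / n
        g = hinge_grad + dist_grad                            # (n,)
        # Alternating updates on u then v, including the 0.5 * ||W||_F^2 regularizer.
        grad_u = np.einsum('n,nij,j->i', g * y, X, v) + u * (v @ v)
        u -= lr * grad_u
        grad_v = np.einsum('n,nij,i->j', g * y, X, u) + v * (u @ u)
        v -= lr * grad_v
    return u, v  # decision rule: sign(u^T X v)
```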


Clear Minds Think Alike: What Makes LLM Fine-tuning Robust? A Study of Token Perplexity

Wu, Chao-Chung, Tam, Zhi Rui, Lin, Chieh-Yen, Lee, Hung-yi, Chen, Yun-Nung

arXiv.org Artificial Intelligence

Maintaining consistent model performance across domains is a fundamental challenge in machine learning. While recent work has explored using LLM-generated data for fine-tuning, its impact on cross-domain generalization remains poorly understood. In this paper, we present a systematic analysis revealing that fine-tuning with LLM-generated data not only improves target-task performance but also reduces out-of-domain (OOD) degradation compared to fine-tuning with ground-truth data. By analyzing data sequences from tasks in various domains, we demonstrate that this enhanced OOD robustness stems from a reduced prevalence of high-perplexity tokens in LLM-generated sequences. Following this hypothesis, we show that masking high-perplexity tokens in ground-truth training data achieves OOD preservation comparable to using LLM-generated data. Extensive experiments across diverse model architectures and scales, including Gemma2-2B, Mistral-7B, and Llama3-8B, corroborate the consistency of our findings. To the best of our knowledge, this work provides the first mechanistic explanation for the superior OOD robustness conferred by LLM-generated training data, offering valuable insights for developing more robust fine-tuning strategies.
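
A minimal sketch of the masking idea, assuming a causal language model that returns per-position logits; the perplexity threshold and the way masked tokens are dropped from the loss are illustrative, not the paper's exact recipe.

```python
# Minimal sketch (assumption-labelled, not the paper's code): excluding high-perplexity
# tokens from the fine-tuning loss. `model` is assumed to be a causal LM mapping
# input_ids (B, T) to logits (B, T, V); `threshold` is a hypothetical perplexity cutoff.
import torch
import torch.nn.functional as F

def masked_lm_loss(model, input_ids, threshold=100.0):
    logits = model(input_ids)                                  # (B, T, V)
    # Shift so that token t is predicted from tokens < t.
    shift_logits = logits[:, :-1, :]
    shift_labels = input_ids[:, 1:]
    token_nll = F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        reduction="none",
    ).view(shift_labels.shape)                                 # per-token NLL, (B, T-1)
    # Token perplexity is exp(NLL); drop tokens above the cutoff from the loss.
    with torch.no_grad():
        keep = (token_nll.exp() <= threshold).float()
    return (token_nll * keep).sum() / keep.sum().clamp_min(1.0)
```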


The Sparse Tsetlin Machine: Sparse Representation with Active Literals

Østby, Sebastian, Brambo, Tobias M., Glimsdal, Sondre

arXiv.org Artificial Intelligence

This paper introduces the Sparse Tsetlin Machine (STM), a novel Tsetlin Machine (TM) that processes sparse data efficiently. Traditionally, the TM does not consider data characteristics such as sparsity, commonly seen in NLP applications and other bag-of-words-based representations. Consequently, a TM must initialize, store, and process a significant number of zero values, resulting in excessive memory usage and computational time. Previous attempts at creating a sparse TM have predominantly been unsuccessful, primarily due to their inability to identify which literals are sufficient for TM training. By introducing Active Literals (AL), the STM can focus exclusively on literals that actively contribute to the current data representation, significantly decreasing the memory footprint and computational time while demonstrating competitive classification performance.
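
A toy sketch of the Active Literals idea (not the STM reference implementation): clause evaluation that touches only a sample's non-zero features, so absent literals never need to be stored or scanned.

```python
# Minimal sketch (assumptions labelled): evaluating a Tsetlin Machine clause on a sparse
# sample by touching only the sample's active (non-zero) features, in the spirit of
# Active Literals. Training logic and clause learning are omitted.
from typing import Set

def clause_output(active: Set[int], include_pos: Set[int], include_neg: Set[int]) -> int:
    """active: indices of non-zero features in the sample;
    include_pos / include_neg: literals the clause has chosen to include."""
    # A clause is a conjunction: every included positive literal x_k must be 1
    # (k in `active`), and every included negated literal NOT x_k must be 1 (k absent).
    if not include_pos <= active:
        return 0
    if include_neg & active:
        return 0
    return 1

# Example: sample with active features {2, 7}; clause includes x_2 and NOT x_5.
print(clause_output({2, 7}, include_pos={2}, include_neg={5}))   # -> 1
```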


MemoNav: Working Memory Model for Visual Navigation

Li, Hongxin, Wang, Zeyu, Yang, Xu, Yang, Yuran, Mei, Shuqi, Zhang, Zhaoxiang

arXiv.org Artificial Intelligence

Image-goal navigation is a challenging task that requires an agent to navigate to a goal indicated by an image in unfamiliar environments. Existing methods utilizing diverse scene memories suffer from inefficient exploration since they use all historical observations for decision-making without considering the goal-relevant fraction. To address this limitation, we present MemoNav, a novel memory model for image-goal navigation, which utilizes a working memory-inspired pipeline to improve navigation performance. Specifically, we employ three types of navigation memory. The node features on a map are stored in the short-term memory (STM), as these features are dynamically updated. A forgetting module then retains the informative STM fraction to increase efficiency. We also introduce long-term memory (LTM) to learn global scene representations by progressively aggregating STM features. Subsequently, a graph attention module encodes the retained STM and the LTM to generate working memory (WM) which contains the scene features essential for efficient navigation. The synergy among these three memory types boosts navigation performance by enabling the agent to learn and leverage goal-relevant scene features within a topological map. Our evaluation on multi-goal tasks demonstrates that MemoNav significantly outperforms previous methods across all difficulty levels in both Gibson and Matterport3D scenes. Qualitative results further illustrate that MemoNav plans more efficient routes.
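
A schematic sketch of this three-memory pipeline, with hypothetical shapes and module choices (the forgetting module is reduced to goal-conditioned top-k selection and the graph attention step to standard multi-head attention); it is not the released MemoNav model.

```python
# Minimal sketch (hypothetical shapes and modules, not the released MemoNav code):
# STM (map-node features) is filtered by a forgetting step, combined with the LTM
# scene token, and encoded into working memory for action generation.
import torch
import torch.nn as nn

class WorkingMemory(nn.Module):
    def __init__(self, dim=128, n_heads=4, keep=8):
        super().__init__()
        self.keep = keep
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, stm, ltm, goal):
        # stm:  (B, N, D) short-term memory = dynamically updated map-node features
        # ltm:  (B, 1, D) long-term memory  = progressively aggregated scene token
        # goal: (B, D)    goal embedding used to score which STM nodes to retain
        scores = torch.einsum('bnd,bd->bn', stm, goal)          # goal relevance per node
        k = min(self.keep, stm.size(1))
        idx = scores.topk(k, dim=1).indices                     # forgetting: keep top-k nodes
        retained = torch.gather(stm, 1, idx.unsqueeze(-1).expand(-1, -1, stm.size(-1)))
        memory = torch.cat([retained, ltm], dim=1)              # retained STM + LTM
        wm, _ = self.attn(memory, memory, memory)               # encode into working memory
        return wm                                               # (B, k+1, D)
```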


SCALE: Synergized Collaboration of Asymmetric Language Translation Engines

Cheng, Xin, Wang, Xun, Ge, Tao, Chen, Si-Qing, Wei, Furu, Zhao, Dongyan, Yan, Rui

arXiv.org Artificial Intelligence

In this paper, we introduce SCALE, a collaborative framework that connects compact Specialized Translation Models (STMs) and general-purpose Large Language Models (LLMs) as one unified translation engine. By introducing the STM's translation into the triplet in-context demonstrations, SCALE unlocks the refinement and pivoting abilities of the LLM, thus mitigating the language bias of the LLM and the parallel-data bias of the STM, enhancing LLM speciality without sacrificing generality, and facilitating continual learning without expensive LLM fine-tuning. Our comprehensive experiments show that SCALE significantly outperforms both few-shot LLMs (GPT-4) and specialized models (NLLB) in challenging low-resource settings. Moreover, in Xhosa-to-English translation, SCALE achieves a consistent improvement of 4 BLEURT points without tuning the LLM, and surpasses few-shot GPT-4 by 2.5 COMET points and 3.8 BLEURT points when equipped with a compact model of merely 600M parameters. SCALE can also effectively exploit the existing language bias of LLMs by using an English-centric STM as a pivot for translation between any pair of languages, outperforming few-shot GPT-4 by an average of 6 COMET points across eight translation directions. Furthermore, we provide an in-depth analysis of SCALE's robustness, translation characteristics, and latency costs, providing a solid foundation for future studies exploring the potential synergy between LLMs and more specialized, task-specific models. Large Language Models (LLMs) have recently revolutionized the field of natural language processing (OpenAI, 2023; Touvron et al., 2023; Peng et al., 2023), significantly influencing machine translation (MT) by delivering exceptional performance without requiring a bilingual corpus, particularly in high-resource languages (Brown et al., 2020; Garcia et al., 2023). Moreover, as a unified multi-task learner, LLMs represent a substantial step towards artificial general intelligence (Bubeck et al., 2023), with the potential to overcome not only language barriers but also cultural boundaries simultaneously through a simple "translate and explain" prompt.
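
A minimal sketch of how the triplet demonstrations might be assembled (the prompt wording and helper name are assumptions, not SCALE's exact format): each demonstration pairs a source sentence with the STM's draft and the reference translation, and the query supplies the STM draft for the LLM to refine or pivot through.

```python
# Minimal sketch (hypothetical prompt format): building triplet in-context demonstrations
# -- (source, STM draft, reference) -- plus a query whose STM draft the LLM refines.
def build_scale_prompt(demos, src, stm_draft, src_lang="Xhosa", tgt_lang="English"):
    """demos: list of (source, stm_translation, reference) triplets."""
    lines = [f"Translate from {src_lang} to {tgt_lang}, improving on the draft translation."]
    for demo_src, demo_draft, demo_ref in demos:
        lines += [f"Source: {demo_src}",
                  f"Draft: {demo_draft}",
                  f"Translation: {demo_ref}",
                  ""]
    lines += [f"Source: {src}", f"Draft: {stm_draft}", "Translation:"]
    return "\n".join(lines)

# The returned string is sent to the LLM; the STM supplies every "Draft" line, so the
# LLM only has to refine (or pivot through) the specialized model's output.
```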


Decoupling Sparsity and Smoothness in the Discrete Hierarchical Dirichlet Process

Neural Information Processing Systems

We present a nonparametric hierarchical Bayesian model of document collections that decouples sparsity and smoothness in the component distributions (i.e., the topics). In the sparse topic model (STM), each topic is represented by a bank of selector variables that determine which terms appear in the topic. Thus each topic is associated with a subset of the vocabulary, and topic smoothness is modeled on this subset. We develop an efficient Gibbs sampler for the STM that includes a general-purpose method for sampling from a Dirichlet mixture with a combinatorial number of components. We demonstrate the STM on four real-world datasets.
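
A minimal sketch of the sparse-topic representation, assuming Bernoulli selector variables over the vocabulary and a Dirichlet restricted to the selected subset; the paper's Gibbs sampler is not reproduced here.

```python
# Minimal sketch (assumption-labelled): how a sparse topic is represented in the STM.
# Bernoulli selectors pick the topic's vocabulary subset, and topic smoothness (a
# Dirichlet draw) is modeled only over that subset.
import numpy as np

def sample_sparse_topic(vocab_size, pi=0.05, smooth=0.1, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    selectors = rng.random(vocab_size) < pi            # b_v ~ Bernoulli(pi): term in topic?
    beta = np.zeros(vocab_size)
    on = np.flatnonzero(selectors)
    if on.size:                                        # Dirichlet over the selected terms only
        beta[on] = rng.dirichlet(np.full(on.size, smooth))
    return selectors, beta                             # topic supported on a vocabulary subset
```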


A Theoretical Computer Science Perspective on Free Will

Blum, Lenore, Blum, Manuel

arXiv.org Artificial Intelligence

We consider the paradoxical concept of free will from the perspective of Theoretical Computer Science (TCS), a branch of mathematics concerned with understanding the underlying principles of computation and complexity, including the implications and surprising consequences of resource limitations, particularly the fact that computation takes time. The concept of free will is paradoxical.


Sybil-Proof Diffusion Auction in Social Networks

Chen, Hongyin, Deng, Xiaotie, Wang, Ying, Wu, Yue, Zhao, Dengji

arXiv.org Artificial Intelligence

A diffusion auction is a market to sell commodities over a social network, where the challenge is to incentivize existing buyers to invite their neighbors in the network to join the market. Existing mechanisms have been designed to solve the challenge in various settings, aiming at desirable properties such as non-deficiency, incentive compatibility and social welfare maximization. Since the mechanisms are employed in dynamic networks with ever-changing structures, buyers could easily generate fake nodes in the network to manipulate the mechanisms for their own benefits, which is commonly known as the Sybil attack. We observe that strategic agents may gain an unfair advantage in existing mechanisms through such attacks. To resist this potential attack, we propose two diffusion auction mechanisms, the Sybil tax mechanism (STM) and the Sybil cluster mechanism (SCM), to achieve both Sybil-proofness and incentive compatibility in the single-item setting. Our proposal provides the first mechanisms to protect the interests of buyers against Sybil attacks with a mild sacrifice of social welfare and revenue.


MemoNav: Selecting Informative Memories for Visual Navigation

Li, Hongxin, Yang, Xu, Yang, Yuran, Mei, Shuqi, Zhang, Zhaoxiang

arXiv.org Artificial Intelligence

Image-goal navigation is a challenging task, as it requires the agent to navigate to a target indicated by an image in a previously unseen scene. Current methods introduce diverse memory mechanisms which save navigation history to solve this task. However, these methods use all observations in the memory for generating navigation actions without considering which fraction of this memory is informative. To address this limitation, we present the MemoNav, a novel memory mechanism for image-goal navigation, which retains the agent's informative short-term memory and long-term memory to improve the navigation performance on a multi-goal task. The node features on the agent's topological map are stored in the short-term memory, as these features are dynamically updated. To aid the short-term memory, we also generate long-term memory by continuously aggregating the short-term memory via a graph attention module. The MemoNav retains the informative fraction of the short-term memory via a forgetting module based on a Transformer decoder and then incorporates this retained short-term memory and the long-term memory into working memory. Lastly, the agent uses the working memory for action generation. We evaluate our model on a new multi-goal navigation dataset. The experimental results show that the MemoNav outperforms the SoTA methods by a large margin with a smaller fraction of navigation history. The results also empirically show that our model is less likely to be trapped in a deadlock, which further validates that the MemoNav improves the agent's navigation efficiency by reducing redundant steps.