Order-Independence Without Fine Tuning

Neural Information Processing Systems

The development of generative language models that can create long and coherent textual outputs via autoregression has led to a proliferation of uses and a corresponding sweep of analyses as researchers work to determine the limitations of this new paradigm. Unlike humans, these 'Large Language Models' (LLMs) are highly sensitive to small changes in their inputs, leading to unwanted inconsistency in their behavior. One problematic inconsistency when LLMs are used to answer multiple-choice questions or analyze multiple inputs is order dependency: the output of an LLM can (and often does) change significantly when sub-sequences are swapped, despite both orderings being semantically identical. In this paper we present a technique that guarantees the output of an LLM will not have order dependence on a specified set of sub-sequences. We show that this method provably eliminates order dependency, and that it can be applied to any transformer-based LLM to enable text generation that is unaffected by re-orderings.
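The guarantee the abstract describes can be illustrated with a toy sketch. This is not the paper's construction (which works inside the transformer itself); it shows the simplest brute-force way to make an order-sensitive function provably order-independent, namely averaging its output over every permutation of the designated sub-sequences. The `score` function is a hypothetical stand-in for an LLM, deliberately sensitive to the exact input string.

```python
import itertools

def score(prompt: str) -> float:
    # Hypothetical stand-in for an LLM scoring function: a toy position-weighted
    # hash, sensitive to the exact ordering of characters like a real LLM.
    return sum(ord(c) * (i + 1) for i, c in enumerate(prompt)) % 1000

def order_dependent(question: str, options: list[str]) -> float:
    # Naive prompting: concatenation bakes the option order into the input.
    return score(question + " " + " ".join(options))

def order_independent(question: str, options: list[str]) -> float:
    # Averaging over all permutations of the options makes the result
    # invariant, by construction, to the order the caller supplies them in.
    perms = list(itertools.permutations(options))
    return sum(score(question + " " + " ".join(p)) for p in perms) / len(perms)

q = "Which is a prime number?"
a = order_independent(q, ["four", "seven", "nine"])
b = order_independent(q, ["nine", "four", "seven"])
assert a == b  # identical for any ordering of the options
```

Permutation averaging costs O(n!) forward passes, which is why a construction that achieves invariance in a single pass, as the paper proposes, matters in practice.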


Simplified Mamba with Disentangled Dependency Encoding for Long-Term Time Series Forecasting

Weng, Zixuan, Han, Jindong, Jiang, Wenzhao, Liu, Hao

arXiv.org Artificial Intelligence

Recently, many deep learning models have been proposed for Long-term Time Series Forecasting (LTSF). Based on previous literature, we identify three critical patterns that can improve forecasting accuracy: the order and semantic dependencies in the time dimension, as well as cross-variate dependency. However, little effort has been made to simultaneously consider order and semantic dependencies when developing forecasting models. Moreover, existing approaches utilize cross-variate dependency by mixing information from different timestamps and variates, which may introduce irrelevant or harmful cross-variate information into the time dimension and largely hinder forecasting performance. To overcome these limitations, we investigate the potential of Mamba for LTSF and discover two key advantages that benefit forecasting: (i) the selection mechanism makes Mamba focus on or ignore specific inputs, so it learns semantic dependency easily, and (ii) Mamba preserves order dependency by processing sequences recursively. We then empirically find that the non-linear activation used in Mamba is unnecessary for semantically sparse time series data. Therefore, we propose SAMBA, a Simplified Mamba with disentangled dependency encoding. Specifically, we first remove the non-linearities of Mamba to make it more suitable for LTSF. Furthermore, we propose a disentangled dependency encoding strategy to endow Mamba with cross-variate dependency modeling capabilities while reducing the interference between the time and variate dimensions. Extensive experimental results on seven real-world datasets demonstrate the effectiveness of SAMBA over state-of-the-art forecasting models.
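The claim that recursion preserves order dependency, even without non-linearities (as in SAMBA), can be seen in a minimal sketch. This is not the actual Mamba block; `A` and `B` are illustrative fixed scalars rather than learned, input-dependent parameters. A purely linear recurrence still distinguishes orderings, while an order-free pooling baseline cannot.

```python
import numpy as np

def linear_ssm(x, A=0.9, B=1.0):
    # Minimal linear state-space recurrence h_t = A*h_{t-1} + B*x_t, echoing
    # Mamba's recursive scan with the non-linearity removed (illustrative only).
    h = 0.0
    for xt in x:
        h = A * h + B * xt
    return h

def sum_pool(x):
    # Order-free baseline: permuting the input cannot change the output.
    return float(np.sum(x))

seq = np.array([1.0, 2.0, 3.0])
rev = seq[::-1]
print(linear_ssm(seq), linear_ssm(rev))  # differ: the recurrence sees order
print(sum_pool(seq), sum_pool(rev))      # identical: pooling discards order
```

Because later inputs pass through fewer applications of `A`, each ordering yields a different weighted sum, which is the sense in which the recursion encodes order dependency.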


High-Order Conditional Mutual Information Maximization for dealing with High-Order Dependencies in Feature Selection

Souza, Francisco, Premebida, Cristiano, Araújo, Rui

arXiv.org Artificial Intelligence

This paper presents a novel feature selection method based on conditional mutual information (CMI). The proposed High-Order Conditional Mutual Information Maximization (HOCMIM) incorporates high-order dependencies into the feature selection procedure and has a straightforward interpretation due to its bottom-up derivation. HOCMIM is derived from the chain expansion of the CMI and expressed as a maximization optimization problem. The maximization problem is solved using a greedy search procedure, which speeds up the entire feature selection process. The experiments are run on a set of 20 benchmark datasets. HOCMIM is compared against eighteen state-of-the-art feature selection algorithms, using the results of two supervised learning classifiers (Support Vector Machine and K-Nearest Neighbor). HOCMIM achieves the best results in terms of accuracy and is faster than its high-order feature selection counterparts.
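The greedy CMI-maximizing search the abstract describes can be sketched for discrete data with a plain plug-in estimator. This is a simplification, not HOCMIM's bound-based derivation: at each step it adds the feature with maximal empirical I(X; Y | selected features).

```python
import numpy as np
from collections import Counter

def entropy(cols):
    # Empirical joint entropy (in nats) of one or more discrete columns.
    joint = list(zip(*cols))
    n = len(joint)
    return -sum((c / n) * np.log(c / n) for c in Counter(joint).values())

def cmi(x, y, z):
    # Plug-in estimator of I(X; Y | Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z),
    # where Z is the (possibly empty) list of already-selected feature columns.
    if not z:
        return entropy([x]) + entropy([y]) - entropy([x, y])
    return entropy([x] + z) + entropy([y] + z) - entropy([x, y] + z) - entropy(z)

def greedy_select(X, y, k):
    # Forward greedy search: repeatedly add the feature with the largest
    # conditional mutual information with the label given those already chosen.
    selected, remaining = [], list(range(len(X)))
    for _ in range(k):
        best = max(remaining, key=lambda j: cmi(X[j], y, [X[i] for i in selected]))
        selected.append(best)
        remaining.remove(best)
    return selected

X = [[0, 0, 1, 1, 0, 1],
     [0, 1, 0, 1, 1, 0],   # feature 1 duplicates the label
     [0, 0, 0, 0, 0, 0]]   # feature 2 is constant (uninformative)
y = [0, 1, 0, 1, 1, 0]
print(greedy_select(X, y, 2))  # feature 1 (the label copy) is chosen first
```

Conditioning on the selected set is what lets the greedy step penalize redundant features, the same motivation behind CMI-based criteria such as HOCMIM.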


Advantages and a Limitation of Using LEG Nets in a Real-Time Problem

Slack, Thomas

arXiv.org Artificial Intelligence

After experimenting with a number of non-probabilistic methods for dealing with uncertainty, many researchers reaffirm a preference for probability methods [1] [2], although this remains controversial. The importance of being able to form decisions from incomplete data in diagnostic problems has highlighted probabilistic methods [5], which compute posterior probabilities from prior distributions in a way similar to Bayes' rule and are thus called Bayesian methods. This paper documents the use of a Bayesian method in a real-time problem which is similar to medical diagnosis in that there is a need to form decisions and take some action without complete knowledge of conditions in the problem domain. This particular method has a limitation, which is discussed.
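The posterior-from-prior computation the abstract refers to is just Bayes' rule over a discrete set of hypotheses. A minimal sketch, with illustrative numbers for a hypothetical fault-diagnosis setting (not taken from the paper):

```python
def posterior(prior, likelihoods):
    # Bayes' rule over discrete hypotheses:
    #   P(h | e) = P(e | h) P(h) / sum_h' P(e | h') P(h')
    # `prior` and `likelihoods` map each hypothesis to a probability.
    joint = {h: prior[h] * likelihoods[h] for h in prior}
    evidence = sum(joint.values())
    return {h: p / evidence for h, p in joint.items()}

# Hypothetical numbers: P(fault) = 0.01, and an alarm that fires with
# P(alarm | fault) = 0.95 but also P(alarm | no fault) = 0.05.
post = posterior({"fault": 0.01, "ok": 0.99},
                 {"fault": 0.95, "ok": 0.05})
print(post["fault"])  # ~0.16: the alarm raises but does not settle the diagnosis
```

The same update can be applied after each new piece of evidence, which is what makes the approach usable when decisions must be made before all data are in.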


Products of "Edge-perts"

Welling, Max, Gehler, Peter V.

Neural Information Processing Systems

Images represent an important and abundant source of data. Understanding their statistical structure has important applications such as image compression and restoration. In this paper we propose a particular kind of probabilistic model, dubbed the "products of edge-perts" model, to describe the structure of wavelet transformed images. We develop a practical denoising algorithm based on a single edge-pert and show state-of-the-art denoising performance on benchmark images.
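The wavelet-domain denoising setting the abstract builds on can be sketched with the classical soft-thresholding baseline (this is the standard predecessor technique, not the paper's edge-pert model): transform, shrink the detail coefficients where noise lives, and invert. A one-level Haar transform keeps the sketch self-contained.

```python
import numpy as np

def haar_1level(x):
    # One level of the orthonormal Haar wavelet transform (even-length input).
    a = (x[0::2] + x[1::2]) / np.sqrt(2)   # approximation (coarse) coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2)   # detail (edge) coefficients
    return a, d

def inv_haar_1level(a, d):
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def soft_threshold(c, t):
    # Shrink small coefficients toward zero: noise spreads into many small
    # coefficients, while true edges produce a few large ones that survive.
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

def denoise(x, t=0.5):
    a, d = haar_1level(x)
    return inv_haar_1level(a, soft_threshold(d, t))

x = np.array([1.0, 1.1, 0.9, 1.0, 5.0, 5.1, 4.9, 5.0])  # noisy step edge
print(denoise(x, t=0.2))
```

Models like products of edge-perts replace the implicit independent-coefficient assumption of thresholding with a learned joint density over wavelet coefficients, which is what buys the improved denoising the abstract reports.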