Self-Attentive Spatio-Temporal Calibration for Precise Intermediate Layer Matching in ANN-to-SNN Distillation

Hong, Di, Wang, Yueming

arXiv.org Artificial Intelligence

Spiking Neural Networks (SNNs) are promising for low-power computation due to their event-driven mechanism, but they often suffer from lower accuracy than Artificial Neural Networks (ANNs). ANN-to-SNN knowledge distillation can improve SNN performance, but previous methods either focus solely on label information, missing valuable intermediate-layer features, or use a layer-wise approach that neglects spatial and temporal semantic inconsistencies, leading to performance degradation. To address these limitations, we propose a novel method called self-attentive spatio-temporal calibration (SASTC). SASTC uses self-attention to identify semantically aligned layer pairs between the ANN and the SNN, both spatially and temporally, enabling the autonomous transfer of relevant semantic information. Extensive experiments show that SASTC outperforms existing methods and effectively solves the mismatching problem. Superior accuracy results include 95.12% on CIFAR-10, 79.40% on CIFAR-100 with 2 time steps, and 68.69% on ImageNet with 4 time steps for static datasets, and 97.92% on DVS-Gesture and 83.60% on DVS-CIFAR10 for neuromorphic datasets. This marks the first time SNNs have outperformed ANNs on both CIFAR-10 and CIFAR-100, shedding new light on the potential applications of SNNs.
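The self-attentive pairing idea above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes SNN features of shape (T, C, H, W) and ANN features of shape (C, H, W) with a shared channel dimension, pools each layer to a vector, and uses scaled dot-product attention to produce, for each SNN layer, a soft distribution over candidate ANN layers.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attentive_layer_alignment(snn_feats, ann_feats):
    """For each SNN layer (features averaged over time steps and space),
    compute attention weights over ANN layers from scaled dot-product
    similarity of globally pooled feature vectors."""
    # Pool each feature map to a channel vector:
    # SNN features are (T, C, H, W); ANN features are (C, H, W).
    q = np.stack([f.mean(axis=(0, 2, 3)) for f in snn_feats])  # (num_snn, C)
    k = np.stack([f.mean(axis=(1, 2)) for f in ann_feats])     # (num_ann, C)
    scores = q @ k.T / np.sqrt(q.shape[1])                     # scaled dot product
    return softmax(scores, axis=1)                             # each row sums to 1

# Toy example: 3 SNN layers (2 time steps) vs. 4 ANN layers, 8 channels each.
rng = np.random.default_rng(0)
snn = [rng.normal(size=(2, 8, 4, 4)) for _ in range(3)]
ann = [rng.normal(size=(8, 4, 4)) for _ in range(4)]
A = attentive_layer_alignment(snn, ann)
print(A.shape)  # (3, 4): one distribution over ANN layers per SNN layer
```

In a distillation loss, each row of `A` could weight how strongly the corresponding SNN layer is pulled toward each ANN layer, instead of a fixed one-to-one pairing.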


Matrix-Transformation Based Low-Rank Adaptation (MTLoRA): A Brain-Inspired Method for Parameter-Efficient Fine-Tuning

Liang, Yao, Wang, Yuwei, Li, Yang, Zeng, Yi

arXiv.org Artificial Intelligence

Fine-tuning techniques based on Large Pretrained Language Models (LPLMs) have been proven to significantly enhance model performance on a variety of downstream tasks and to effectively control the output behavior of LPLMs. Recent studies have proposed numerous methods for fine-tuning a small number of parameters of open-source LPLMs, reducing the demand for computational and storage resources. Among these, reparameterization fine-tuning methods represented by LoRA (Low-Rank Adaptation) have gained popularity. We find that although these methods perform well in many respects, there is still considerable room for improvement in complex-task adaptability, performance, stability, and algorithmic complexity. In response, inspired by the idea that the functions of the brain are shaped by its geometric structure, this paper integrates this idea into LoRA and proposes a new matrix-transformation-based reparameterization method for efficient fine-tuning, named Matrix-Transformation based Low-Rank Adaptation (MTLoRA). The spatiotemporal patterns of brain neural activity are excitations of the characteristic patterns, at different wavelengths, of its geometric structure. MTLoRA aims to dynamically alter the spatial geometric structure of the adaptation by applying a transformation matrix T that performs linear transformations, such as rotation, scaling, and translation, on the task-specific parameter matrix, generating new matrix feature patterns (eigenvectors) that mimic the fundamental influence of complex geometric feature patterns in the brain on its functions, thereby enhancing the model's performance on downstream tasks. The transformation matrix T comes in four different structures, each designed to simulate the geometric feature patterns of the brain at a different level.
On Natural Language Understanding (NLU) tasks, MTLoRA is evaluated on the GLUE benchmark, where it achieves an overall performance increase of about 1.0% across eight tasks and reduces the standard deviation by 0.7% on the Corpus of Linguistic Acceptability (CoLA) task; on Natural Language Generation (NLG) tasks, MTLoRA improves performance by an average of 0.95% and 0.56% on the DART and WebNLG tasks, respectively.
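The core reparameterization can be sketched numerically. This is an illustrative reading, not the paper's code: standard LoRA adds a scaled low-rank update (alpha/r) * B @ A to a frozen weight W, and the sketch assumes MTLoRA inserts an r x r transformation matrix T (here, a 2x2 rotation as one possible structure) between the factors.

```python
import numpy as np

def mtlora_delta(A, B, T, alpha=16):
    """LoRA-style low-rank update with an extra r x r transformation matrix T
    (e.g. rotation, scaling, shear) inserted between the low-rank factors."""
    r = A.shape[0]
    return (alpha / r) * (B @ T @ A)

d_out, d_in, r = 6, 5, 2
rng = np.random.default_rng(1)
A = rng.normal(size=(r, d_in)) * 0.01   # down-projection, small random init
B = np.zeros((d_out, r))                # up-projection, zero init (standard LoRA)
theta = np.pi / 4
T = np.array([[np.cos(theta), -np.sin(theta)],   # a 2x2 rotation: one possible
              [np.sin(theta),  np.cos(theta)]])  # structure for T
W = rng.normal(size=(d_out, d_in))      # frozen pretrained weight
W_adapted = W + mtlora_delta(A, B, T)
print(np.allclose(W_adapted, W))  # True: zero-init B leaves W unchanged at start
```

During training only A, B (and, depending on the variant, T) would receive gradients, so the parameter overhead over plain LoRA is at most r*r extra entries per adapted matrix.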


Discovering indicators of dark horse of soccer games by deep learning from sequential trading data

Lu, Liyao, Lyu, Qiang

arXiv.org Artificial Intelligence

It is not surprising that machine learning models can provide decent prediction accuracy for soccer game outcomes based on various objective metrics. However, their performance is far less decent at predicting difficult and valuable matches. We design a deep learning model and train it on real sequential trading data from a real prediction market, under the assumption that such trading data contain critical latent information that determines game outcomes. To train the model, we propose a new loss function that biases selection toward matches with high investment returns. A full investigation of 4669 top soccer league matches shows that our model trades prediction accuracy for high-value returns, owing to a certain ability to detect dark horses. We further examine indicators discovered by our model that describe key features of big dark horses and regular hot horses.
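The paper does not state its loss function here, but a return-biased objective can be sketched as a cross-entropy in which each outcome is weighted by its potential market payout, so that correctly predicting a long-odds upset matters more than confirming a favorite. All names and numbers below are hypothetical.

```python
import numpy as np

def return_weighted_nll(probs, labels, odds):
    """Negative log-likelihood where each match is weighted by the market
    payout of its true outcome, biasing training toward high-return
    results such as dark-horse wins."""
    idx = np.arange(len(labels))
    p = probs[idx, labels]          # predicted probability of the true outcome
    w = odds[idx, labels]           # payout multiplier for the true outcome
    return float(np.mean(-w * np.log(p + 1e-12)))

# Toy batch: outcome probabilities (home/draw/away), true outcomes, and odds.
# Match 1 is an away-win upset at long odds, so it dominates the loss.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.6, 0.3, 0.1]])
labels = np.array([0, 2])
odds = np.array([[1.4, 4.0, 8.0],
                 [1.5, 4.5, 9.0]])
print(return_weighted_nll(probs, labels, odds))
```

Compared with plain cross-entropy, this weighting deliberately sacrifices some accuracy on well-predicted favorites in exchange for sensitivity to upsets, matching the trade-off reported in the abstract.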


Interpretation and Simplification of Deep Forest

Kim, Sangwon, Jeong, Mira, Ko, Byoung Chul

arXiv.org Artificial Intelligence

This paper proposes a new method for interpreting and simplifying a black-box deep random forest (RF) model through rule elimination. In a deep RF, a large number of decision trees are connected across multiple layers, making analysis difficult. It achieves performance comparable to a deep neural network (DNN) with better generalizability. Therefore, in this study, we quantify the feature contributions and frequencies of a fully trained deep RF in the form of a decision rule set. The feature contributions provide a basis for determining how features affect the decision process in the rule set. Model simplification is achieved by measuring the feature contributions and eliminating unnecessary rules, yielding a simplified model with fewer parameters and rules. Experimental results show that feature contribution analysis allows the black-box model to be decomposed into a quantitatively interpretable rule set. The proposed method was successfully applied to various deep RF models and benchmark datasets, maintaining robust performance despite the elimination of a large number of rules.
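One possible reading of contribution-based rule elimination can be sketched as scoring each extracted rule by its feature contributions and discarding the lowest-scoring fraction. The rule strings and scores below are hypothetical placeholders, not the paper's data.

```python
def prune_rules(rules, contributions, keep_ratio=0.5):
    """Keep only the rules with the highest contribution scores: a simple
    sketch of eliminating rules whose features contribute least to decisions."""
    scored = sorted(zip(rules, contributions), key=lambda t: t[1], reverse=True)
    k = max(1, int(len(rules) * keep_ratio))
    return [rule for rule, _ in scored[:k]]

# Hypothetical rule set with per-rule contribution scores.
rules = ["f1<=0.5 -> A", "f2>1.0 -> B", "f3<=2.0 -> A", "f1>0.5 & f2<=1.0 -> B"]
scores = [0.42, 0.05, 0.31, 0.22]
print(prune_rules(rules, scores))  # the two highest-contribution rules survive
```

The `keep_ratio` knob trades model size against accuracy; the abstract's finding is that aggressive pruning of this kind can leave performance largely intact.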


Exploiting Local Feature Patterns for Unsupervised Domain Adaptation

Wen, Jun, Liu, Risheng, Zheng, Nenggan, Zheng, Qian, Gong, Zhefeng, Yuan, Junsong

arXiv.org Machine Learning

Unsupervised domain adaptation methods aim to alleviate the performance degradation caused by domain shift by learning domain-invariant representations. Existing deep domain adaptation methods focus on holistic feature alignment, matching source and target holistic feature distributions without considering local features and their multi-mode statistics. We show that learned local feature patterns are more generic and transferable, and that additionally matching local feature distributions enables fine-grained feature alignment. In this paper, we present a method for learning domain-invariant local feature patterns and jointly aligning holistic and local feature statistics. Comparisons to state-of-the-art unsupervised domain adaptation methods on two popular benchmark datasets demonstrate the superiority of our approach and its effectiveness in alleviating negative transfer.


General Latent Feature Models for Heterogeneous Datasets

Valera, Isabel, Pradier, Melanie F., Lomeli, Maria, Ghahramani, Zoubin

arXiv.org Machine Learning

Latent feature modeling captures the latent structure responsible for generating the observed properties of a set of objects. It is often used to predict new values of interest or missing information in the original data, as well as to perform exploratory data analysis. However, although there is an extensive literature on latent feature models for homogeneous datasets, where all the attributes describing each object are of the same (continuous or discrete) nature, there is a lack of work on latent feature modeling for heterogeneous databases. In this paper, we introduce a general Bayesian nonparametric latent feature model suitable for heterogeneous datasets, where the attributes describing each object can be discrete, continuous, or mixed variables. The proposed model has several important properties. First, it accounts for heterogeneous data while keeping the properties of conjugate models, which allows us to infer the model in linear time with respect to the number of objects and attributes. Second, its Bayesian nonparametric nature allows us to automatically infer the model complexity from the data, i.e., the number of features necessary to capture the latent structure. Third, the latent features in the model are binary-valued variables, easing their interpretation in exploratory data analysis. We show the flexibility of the proposed model by solving both prediction and data analysis tasks on several real-world datasets. Moreover, a software package for the GLFM is publicly available for other researchers to use and improve.


Quantum Model for Conjoint Recognition

Busemeyer, Jerome R. (Cognitive Science Indiana University) | Trueblood, Jennifer S. (Indiana University)

AAAI Conferences

In a conjoint memory recognition task, a person is presented with a list of target items to remember. Afterwards, a test probe is presented, sampled from one of three mutually exclusive and exhaustive categories: a target from the set of previously presented targets; a non-target that is meaningfully related to a target; or a non-target unrelated to any target. The episodic overestimate effect refers to the finding that the probability of accepting a probe when asked whether it is a target, plus the probability of accepting it when asked whether it is a related non-target, exceeds the probability of accepting it when asked whether it is either a target or a related non-target. Logically, the sum of the two separate acceptance probabilities should equal the probability of accepting the disjunction. Previously, these results were explained by a dual-process theory. This article presents an alternative quantum memory recognition model for this effect that addresses some problematic issues arising with the dual-process explanation.
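A toy example shows how a quantum-style model can produce this overestimation, whereas a classical probability model cannot. This is an illustration of the general mechanism, not the article's specific model: "accept" probabilities are squared projections of a memory state onto subspaces, and when the "target" and "related" directions are not orthogonal, the two separate questions over-count their overlap relative to the single disjunctive question (projection onto the span of both).

```python
import numpy as np

def accept_prob(psi, basis_vectors):
    """Probability of accepting: squared norm of psi projected onto the
    subspace spanned by the given (orthonormalized) basis vectors."""
    Q, _ = np.linalg.qr(np.column_stack(basis_vectors))
    proj = Q @ (Q.T @ psi)
    return float(proj @ proj)

t = np.array([1.0, 0.0])                    # "target" direction
r = np.array([1.0, 1.0]) / np.sqrt(2)       # "related" direction, not orthogonal to t
psi = np.array([np.cos(0.3), np.sin(0.3)])  # unit-length memory state after study

p_t = accept_prob(psi, [t])                 # asked: "is it a target?"
p_r = accept_prob(psi, [r])                 # asked: "is it a related non-target?"
p_tr = accept_prob(psi, [t, r])             # asked: "is it a target or related?"
print(p_t + p_r > p_tr)  # True: the separate questions over-count the overlap
```

In a classical model the three categories are disjoint events, so the sum would equal the disjunction's probability; non-orthogonal subspaces are what break that additivity.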


Analyzing Prosodic Features and Student Uncertainty using Visualization

Xiong, Wenting (University of Pittsburgh) | Litman, Diane J. (University of Pittsburgh) | Marai, G. Elisabeta (University of Pittsburgh)

AAAI Conferences

It has been hypothesized that to maximize learning, intelligent tutoring systems should detect and respond to both cognitive student states, and affective and metacognitive states such as uncertainty. In intelligent tutoring research so far, student state detection is primarily based on information available from a single student-system exchange unit, or turn. However, the features used in the detection of such states may have a temporal component, spanning multiple turns, and may change throughout the tutoring process. To test this hypothesis, an interactive tool was implemented for the visual analysis of prosodic features across a corpus of student turns previously annotated for uncertainty. The tool consists of two complementary visualization modules. The first module allows researchers to visually mine the feature data for patterns per individual student dialogue, and form hypotheses about feature dependencies. The second module allows researchers to quickly test these hypotheses on groups of students through statistical visual analysis of feature dependencies. Results show that significant differences exist among feature patterns across different student groups. Further analysis suggests that feature patterns may vary with student domain knowledge.