Goto

Collaborating Authors

 Wang, Deliang


IKUN: Initialization to Keep snn training and generalization great with sUrrogate-stable variaNce

arXiv.org Artificial Intelligence

Weight initialization significantly impacts the convergence and performance of neural networks. While traditional methods like Xavier and Kaiming initialization are widely used, they often fall short for spiking neural networks (SNNs), which have distinct requirements compared to artificial neural networks (ANNs). To address this, we introduce \textbf{IKUN}, a variance-stabilizing initialization method integrated with surrogate gradient functions, specifically designed for SNNs. \textbf{IKUN} stabilizes signal propagation, accelerates convergence, and enhances generalization. Experiments show \textbf{IKUN} improves training efficiency by up to \textbf{50\%}, achieving \textbf{95\%} training accuracy and \textbf{91\%} generalization accuracy. Hessian analysis reveals that \textbf{IKUN}-trained models converge to flatter minima, characterized by Hessian eigenvalues near zero on the positive side, promoting better generalization. The method is open-sourced for further exploration: \href{https://github.com/MaeChd/SurrogateVarStabe}{https://github.com/MaeChd/SurrogateVarStabe}.


Interpreting Deep Knowledge Tracing Model on EdNet Dataset

arXiv.org Artificial Intelligence

With more deep learning techniques being introduced into the knowledge tracing domain, the interpretability issue of the knowledge tracing models has aroused researchers' attention. Our previous study(Lu et al. 2020) on building and interpreting the KT model mainly adopts the ASSISTment dataset(Feng, Heffernan, and Koedinger 2009),, whose size is relatively small. In this work, we perform the similar tasks but on a large and newly available dataset, called EdNet(Choi et al. 2020). The preliminary experiment results show the effectiveness of the interpreting techniques, while more questions and tasks are worthy to be further explored and accomplished.


Towards Interpretable Deep Learning Models for Knowledge Tracing

arXiv.org Artificial Intelligence

As an important technique for modeling the knowledge states of learners, the traditional knowledge tracing (KT) models have been widely used to support intelligent tutoring systems and MOOC platforms. Driven by the fast advancements of deep learning techniques, deep neural network has been recently adopted to design new KT models for achieving better prediction performance. However, the lack of interpretability of these models has painfully impeded their practical applications, as their outputs and working mechanisms suffer from the intransparent decision process and complex inner structures. We thus propose to adopt the post-hoc method to tackle the interpretability issue for deep learning based knowledge tracing (DLKT) models. Specifically, we focus on applying the layer-wise relevance propagation (LRP) method to interpret RNN-based DLKT model by backpropagating the relevance from the model's output layer to its input layer. The experiment results show the feasibility using the LRP method for interpreting the DLKT model's predictions, and partially validate the computed relevance scores from both question level and concept level. We believe it can be a solid step towards fully interpreting the DLKT models and promote their practical applications in the education domain.


Cocktail Party Processing via Structured Prediction

Neural Information Processing Systems

While human listeners excel at selectively attending to a conversation in a cocktail party, machine performance is still far inferior by comparison. We show that the cocktail party problem, or the speech separation problem, can be effectively approached via structured prediction. To account for temporal dynamics in speech, we employ conditional random fields (CRFs) to classify speech dominance within each time-frequency unit for a sound mixture. To capture complex, nonlinear relationship between input and output, both state and transition feature functions in CRFs are learned by deep neural networks. The formulation of the problem as classification allows us to directly optimize a measure that is well correlated with human speech intelligibility. The proposed system substantially outperforms existing ones in a variety of noises.


A Classification-based Cocktail-party Processor

Neural Information Processing Systems

At a cocktail party, a listener can selectively attend to a single voice and filter out other acoustical interferences. How to simulate this perceptual ability remains a great challenge. This paper describes a novel supervised learning approach to speech segregation, in which a target speech signal is separated from interfering sounds using spatial location cues: interaural time differences (ITD) and interaural intensity differences (IID). Motivated by the auditory masking effect, we employ the notion of an ideal time-frequency binary mask, which selects the target if it is stronger than the interference in a local time-frequency unit. Within a narrow frequency band, modifications to the relative strength of the target source with respect to the interference trigger systematic changes for estimated ITD and IID.


A Classification-based Cocktail-party Processor

Neural Information Processing Systems

At a cocktail party, a listener can selectively attend to a single voice and filter out other acoustical interferences. How to simulate this perceptual ability remains a great challenge. This paper describes a novel supervised learning approach to speech segregation, in which a target speech signal is separated from interfering sounds using spatial location cues: interaural time differences (ITD) and interaural intensity differences (IID). Motivated by the auditory masking effect, we employ the notion of an ideal time-frequency binary mask, which selects the target if it is stronger than the interference in a local time-frequency unit. Within a narrow frequency band, modifications to the relative strength of the target source with respect to the interference trigger systematic changes for estimated ITD and IID.


Monaural Speech Separation

Neural Information Processing Systems

Monaural speech separation has been studied in previous systems that incorporate auditory scene analysis principles. A major problem for these systems is their inability to deal with speech in the highfrequency range. Psychoacoustic evidence suggests that different perceptual mechanisms are involved in handling resolved and unresolved harmonics. Motivated by this, we propose a model for monaural separation that deals with low-frequency and highfrequency signals differently. For resolved harmonics, our model generates segments based on temporal continuity and cross-channel correlation, and groups them according to periodicity. For unresolved harmonics, the model generates segments based on amplitude modulation (AM) in addition to temporal continuity and groups them according to AM repetition rates derived from sinusoidal modeling. Underlying the separation process is a pitch contour obtained according to psychoacoustic constraints. Our model is systematically evaluated, and it yields substantially better performance than previous systems, especially in the high-frequency range.


Monaural Speech Separation

Neural Information Processing Systems

Monaural speech separation has been studied in previous systems that incorporate auditory scene analysis principles. A major problem for these systems is their inability to deal with speech in the highfrequency range. Psychoacoustic evidence suggests that different perceptual mechanisms are involved in handling resolved and unresolved harmonics. Motivated by this, we propose a model for monaural separation that deals with low-frequency and highfrequency signals differently. For resolved harmonics, our model generates segments based on temporal continuity and cross-channel correlation, and groups them according to periodicity. For unresolved harmonics, the model generates segments based on amplitude modulation (AM) in addition to temporal continuity and groups them according to AM repetition rates derived from sinusoidal modeling. Underlying the separation process is a pitch contour obtained according to psychoacoustic constraints. Our model is systematically evaluated, and it yields substantially better performance than previous systems, especially in the high-frequency range.


Monaural Speech Separation

Neural Information Processing Systems

Monaural speech separation has been studied in previous systems that incorporate auditory scene analysis principles. A major problem for these systems is their inability to deal with speech in the highfrequency range.Psychoacoustic evidence suggests that different perceptual mechanisms are involved in handling resolved and unresolved harmonics. Motivated by this, we propose a model for monaural separation that deals with low-frequency and highfrequency signalsdifferently. For resolved harmonics, our model generates segments based on temporal continuity and cross-channel correlation, and groups them according to periodicity. For unresolved harmonics, the model generates segments based on amplitude modulation (AM) in addition to temporal continuity and groups them according to AM repetition rates derived from sinusoidal modeling. Underlying the separation process is a pitch contour obtained according to psychoacoustic constraints. Our model is systematically evaluated, and it yields substantially better performance than previous systems, especially in the high-frequency range.