Goto

Collaborating Authors

 discriminative feature



NeuralAnisotropyDirections

Neural Information Processing Systems

In machine learning, given a finite set of samples, there are usually multiple solutions that can perfectly fit the training data, but theinductive biasof a learning algorithm selects and prioritizes those solutions that agree with itsaprioriassumptions [1,2].





Harnessing Hard Mixed Samples with Decoupled Regularizer

Neural Information Processing Systems

Mixup is an efficient data augmentation approach that improves the generalization of neural networks by smoothing the decision boundary with mixed data. Recently, dynamic mixup methods have improved previous \textit{static} policies effectively (e.g., linear interpolation) by maximizing target-related salient regions in mixed samples, but excessive additional time costs are not acceptable. These additional computational overheads mainly come from optimizing the mixed samples according to the mixed labels. However, we found that the extra optimizing step may be redundant because label-mismatched mixed samples are informative hard mixed samples for deep models to localize discriminative features. In this paper, we thus are not trying to propose a more complicated dynamic mixup policy but rather an efficient mixup objective function with decoupled regularizer, named decoupled mixup (DM). The primary effect is that DM can adaptively utilize those hard mixed samples to mine discriminative features without losing the original smoothness of mixup. As a result, DM enables static mixup methods to achieve comparable or even exceed the performance of dynamic methods without any extra computation. This also leads to an interesting objective design problem for mixup training that we need to focus on both smoothing the decision boundaries and identifying discriminative features.


Forgetting, Ignorance or Myopia: Revisiting Key Challenges in Online Continual Learning

Neural Information Processing Systems

Online continual learning (OCL) requires the models to learn from constant, endless streams of data. While significant efforts have been made in this field, most were focused on mitigating the \textit{catastrophic forgetting} issue to achieve better classification ability, at the cost of a much heavier training workload. They overlooked that in real-world scenarios, e.g., in high-speed data stream environments, data do not pause to accommodate slow models. In this paper, we emphasize that \textit{model throughput}-- defined as the maximum number of training samples that a model can process within a unit of time -- is equally important. It directly limits how much data a model can utilize and presents a challenging dilemma for current methods. With this understanding, we revisit key challenges in OCL from both empirical and theoretical perspectives, highlighting two critical issues beyond the well-documented catastrophic forgetting: (\romannumeral1) Model's ignorance: the single-pass nature of OCL challenges models to learn effective features within constrained training time and storage capacity, leading to a trade-off between effective learning and model throughput; (\romannumeral2) Model's myopia: the local learning nature of OCL on the current task leads the model to adopt overly simplified, task-specific features and \textit{excessively sparse classifier}, resulting in the gap between the optimal solution for the current task and the global objective. To tackle these issues, we propose the Non-sparse Classifier Evolution framework (NsCE) to facilitate effective global discriminative feature learning with minimal time cost. NsCE integrates non-sparse maximum separation regularization and targeted experience replay techniques with the help of pre-trained models, enabling rapid acquisition of new globally discriminative features. Extensive experiments demonstrate the substantial improvements of our framework in performance, throughput and real-world practicality.


Beyond Euclidean: Dual-Space Representation Learning for Weakly Supervised Video Violence Detection

Neural Information Processing Systems

While numerous Video Violence Detection (VVD) methods have focused on representation learning in Euclidean space, they struggle to learn sufficiently discriminative features, leading to weaknesses in recognizing normal events that are visually similar to violent events (i.e., ambiguous violence). In contrast, hyperbolic representation learning, renowned for its ability to model hierarchical and complex relationships between events, has the potential to amplify the discrimination between visually similar events. Inspired by these, we develop a novel Dual-Space Representation Learning (DSRL) method for weakly supervised VVD to utilize the strength of both Euclidean and hyperbolic geometries, capturing the visual features of events while also exploring the intrinsic relations between events, thereby enhancing the discriminative capacity of the features. DSRL employs a novel information aggregation strategy to progressively learn event context in hyperbolic spaces, which selects aggregation nodes through layer-sensitive hyperbolic association degrees constrained by hyperbolic Dirichlet energy. Furthermore, DSRL attempts to break the cyber-balkanization of different spaces, utilizing cross-space attention to facilitate information interactions between Euclidean and hyperbolic space to capture better discriminative features for final violence detection. Comprehensive experiments demonstrate the effectiveness of our proposed DSRL.


Hold me tight! Influence of discriminative features on deep network boundaries

Neural Information Processing Systems

Important insights towards the explainability of neural networks reside in the characteristics of their decision boundaries. In this work, we borrow tools from the field of adversarial robustness, and propose a new perspective that relates dataset features to the distance of samples to the decision boundary. This enables us to carefully tweak the position of the training samples and measure the induced changes on the boundaries of CNNs trained on large-scale vision datasets. We use this framework to reveal some intriguing properties of CNNs. Specifically, we rigorously confirm that neural networks exhibit a high invariance to non-discriminative features, and show that the decision boundaries of a DNN can only exist as long as the classifier is trained with some features that hold them together. Finally, we show that the construction of the decision boundary is extremely sensitive to small perturbations of the training samples, and that changes in certain directions can lead to sudden invariances in the orthogonal ones. This is precisely the mechanism that adversarial training uses to achieve robustness.


Structure-aware Hybrid-order Similarity Learning for Multi-view Unsupervised Feature Selection

Xu, Lin, Li, Ke, Wang, Dongjie, Lv, Fengmao, Li, Tianrui, Huang, Yanyong

arXiv.org Artificial Intelligence

Multi-view unsupervised feature selection (MUFS) has recently emerged as an effective dimensionality reduction method for unlabeled multi-view data. However, most existing methods mainly use first-order similarity graphs to preserve local structure, often overlooking the global structure that can be captured by second-order similarity. In addition, a few MUFS methods leverage predefined second-order similarity graphs, making them vulnerable to noise and outliers and resulting in suboptimal feature selection performance. In this paper, we propose a novel MUFS method, termed Structure-aware Hybrid-order sImilarity learNing for multi-viEw unsupervised Feature Selection (SHINE-FS), to address the aforementioned problem. SHINE-FS first learns consensus anchors and the corresponding anchor graph to capture the cross-view relationships between the anchors and the samples. Based on the acquired cross-view consensus information, it generates low-dimensional representations of the samples, which facilitate the reconstruction of multi-view data by identifying discriminative features. Subsequently, it employs the anchor-sample relationships to learn a second-order similarity graph. Furthermore, by jointly learning first-order and second-order similarity graphs, SHINE-FS constructs a hybrid-order similarity graph that captures both local and global structures, thereby revealing the intrinsic data structure to enhance feature selection. Comprehensive experimental results on real multi-view datasets show that SHINE-FS outperforms the state-of-the-art methods.