discriminative pattern
Scale-teaching: Robust Multi-scale Training for Time Series Classification with Noisy Labels
Deep Neural Networks (DNNs) have been criticized because they easily overfit noisy (incorrect) labels. To improve the robustness of DNNs, existing methods for image data regard samples with small training losses as correctly labeled data (small-loss criterion). Nevertheless, time series' discriminative patterns are easily distorted by external noises (i.e., frequency perturbations) during the recording process. This results in training losses of some time series samples that do not meet the small-loss criterion. Therefore, this paper proposes a deep learning paradigm called Scale-teaching to cope with time series noisy labels. Specifically, we design a fine-to-coarse cross-scale fusion mechanism for learning discriminative patterns by utilizing time series at different scales to train multiple DNNs simultaneously. Meanwhile, each network is trained in a cross-teaching manner by using complementary information from different scales to select small-loss samples as clean labels. For unselected large-loss samples, we introduce multi-scale embedding graph learning via label propagation to correct their labels by using selected clean samples. Experiments on multiple benchmark time series datasets demonstrate the superiority of the proposed Scale-teaching paradigm over state-of-the-art methods in terms of effectiveness and robustness.
When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers
Li, Hongkang, Zhang, Yihua, Zhang, Shuai, Wang, Meng, Liu, Sijia, Chen, Pin-Yu
Task arithmetic refers to editing the pre-trained model by adding a weighted sum of task vectors, each of which is the weight update from the pre-trained model to fine-tuned models for certain tasks. This approach recently gained attention as a computationally efficient inference method for model editing, e.g., multi-task learning, forgetting, and out-of-domain generalization capabilities. However, the theoretical understanding of why task vectors can execute various conceptual operations remains limited, due to the highly non-convexity of training Transformer-based models. To the best of our knowledge, this paper provides the first theoretical characterization of the generalization guarantees of task vector methods on nonlinear Transformers. We consider a conceptual learning setting, where each task is a binary classification problem based on a discriminative pattern. We theoretically prove the effectiveness of task addition in simultaneously learning a set of irrelevant or aligned tasks, as well as the success of task negation in unlearning one task from irrelevant or contradictory tasks. Moreover, we prove the proper selection of linear coefficients for task arithmetic to achieve guaranteed generalization to out-of-domain tasks. All of our theoretical results hold for both dense-weight parameters and their low-rank approximations. Although established in a conceptual setting, our theoretical findings were validated on a practical machine unlearning task using the large language model Phi-1.5 (1.3B).
Temporal Alignment-Free Video Matching for Few-shot Action Recognition
Lee, SuBeen, Moon, WonJun, Seong, Hyun Seok, Heo, Jae-Pil
Few-Shot Action Recognition (FSAR) aims to train a model with only a few labeled video instances. A key challenge in FSAR is handling divergent narrative trajectories for precise video matching. While the frame- and tuple-level alignment approaches have been promising, their methods heavily rely on pre-defined and length-dependent alignment units (e.g., frames or tuples), which limits flexibility for actions of varying lengths and speeds. In this work, we introduce a novel TEmporal Alignment-free Matching (TEAM) approach, which eliminates the need for temporal units in action representation and brute-force alignment during matching. Specifically, TEAM represents each video with a fixed set of pattern tokens that capture globally discriminative clues within the video instance regardless of action length or speed, ensuring its flexibility. Furthermore, TEAM is inherently efficient, using token-wise comparisons to measure similarity between videos, unlike existing methods that rely on pairwise comparisons for temporal alignment. Additionally, we propose an adaptation process that identifies and removes common information across classes, establishing clear boundaries even between novel categories. Extensive experiments demonstrate the effectiveness of TEAM. Codes are available at github.com/leesb7426/TEAM.
Scale-teaching: Robust Multi-scale Training for Time Series Classification with Noisy Labels
Deep Neural Networks (DNNs) have been criticized because they easily overfit noisy (incorrect) labels. To improve the robustness of DNNs, existing methods for image data regard samples with small training losses as correctly labeled data (small-loss criterion). Nevertheless, time series' discriminative patterns are easily distorted by external noises (i.e., frequency perturbations) during the recording process. This results in training losses of some time series samples that do not meet the small-loss criterion. Therefore, this paper proposes a deep learning paradigm called Scale-teaching to cope with time series noisy labels. Specifically, we design a fine-to-coarse cross-scale fusion mechanism for learning discriminative patterns by utilizing time series at different scales to train multiple DNNs simultaneously.
DeepFilter: An Instrumental Baseline for Accurate and Efficient Process Monitoring
Wang, Hao, Chen, Zhichao, Pan, Licheng, Jiang, Xiaoyu, Song, Yichen, He, Qunshan, Liu, Xinggao
Effective process monitoring is increasingly vital in industrial automation for ensuring operational safety, necessitating both high accuracy and efficiency. Although Transformers have demonstrated success in various fields, their canonical form based on the self-attention mechanism is inadequate for process monitoring due to two primary limitations: (1) the step-wise correlations captured by self-attention mechanism are difficult to capture discriminative patterns in monitoring logs due to the lacking semantics of each step, thus compromising accuracy; (2) the quadratic computational complexity of self-attention hampers efficiency. To address these issues, we propose DeepFilter, a Transformer-style framework for process monitoring. The core innovation is an efficient filtering layer that excel capturing long-term and periodic patterns with reduced complexity. Equipping with the global filtering layer, DeepFilter enhances both accuracy and efficiency, meeting the stringent demands of process monitoring. Experimental results on real-world process monitoring datasets validate DeepFilter's superiority in terms of accuracy and efficiency compared to existing state-of-the-art models.
VSFormer: Value and Shape-Aware Transformer with Prior-Enhanced Self-Attention for Multivariate Time Series Classification
Xi, Wenjie, Zuo, Rundong, Alvarez, Alejandro, Zhang, Jie, Choi, Byron, Lin, Jessica
Multivariate time series classification is a crucial task in data mining, attracting growing research interest due to its broad applications. While many existing methods focus on discovering discriminative patterns in time series, real-world data does not always present such patterns, and sometimes raw numerical values can also serve as discriminative features. Additionally, the recent success of Transformer models has inspired many studies. However, when applying to time series classification, the self-attention mechanisms in Transformer models could introduce classification-irrelevant features, thereby compromising accuracy. To address these challenges, we propose a novel method, VSFormer, that incorporates both discriminative patterns (shape) and numerical information (value). In addition, we extract class-specific prior information derived from supervised information to enrich the positional encoding and provide classification-oriented self-attention learning, thereby enhancing its effectiveness. Extensive experiments on all 30 UEA archived datasets demonstrate the superior performance of our method compared to SOTA models. Through ablation studies, we demonstrate the effectiveness of the improved encoding layer and the proposed self-attention mechanism. Finally, We provide a case study on a real-world time series dataset without discriminative patterns to interpret our model.
How Do Large Language Models Understand Graph Patterns? A Benchmark for Graph Pattern Comprehension
Dai, Xinnan, Qu, Haohao, Shen, Yifen, Zhang, Bohang, Wen, Qihao, Fan, Wenqi, Li, Dongsheng, Tang, Jiliang, Shan, Caihua
Benchmarking the capabilities and limitations of large language models (LLMs) in graph-related tasks is becoming an increasingly popular and crucial area of research. Recent studies have shown that LLMs exhibit a preliminary ability to understand graph structures and node features. However, the potential of LLMs in graph pattern mining remains largely unexplored. This is a key component in fields such as computational chemistry, biology, and social network analysis. To bridge this gap, this work introduces a comprehensive benchmark to assess LLMs' capabilities in graph pattern tasks. We have developed a benchmark that evaluates whether LLMs can understand graph patterns based on either terminological or topological descriptions. Additionally, our benchmark tests the LLMs' capacity to autonomously discover graph patterns from data. The benchmark encompasses both synthetic and real datasets, and a variety of models, with a total of 11 tasks and 7 models. Our experimental framework is designed for easy expansion to accommodate new models and datasets. Our findings reveal that: (1) LLMs have preliminary abilities to understand graph patterns, with O1-mini outperforming in the majority of tasks; (2) Formatting input data to align with the knowledge acquired during pretraining can enhance performance; (3) The strategies employed by LLMs may differ from those used in conventional algorithms. Originally trained on textual data, LLMs have demonstrated remarkable success in various tasks, such as reading comprehension and text reasoning (Achiam et al., 2023; Touvron et al., 2023). To evaluate whether LLMs can adapt the text understanding ability across graphs, several studies have investigated this at both the feature and structural levels (Zhao et al., 2023; Chai et al., 2023). Specifically, LLMs have been shown to enhance node features in social networks (Ye et al., 2023; Huang et al., 2024). Additionally, graph structure understanding tasks, such as shortest path and connectivity, have also been evaluated (Guo et al., 2023; Wang et al., 2024). Graph patterns, a key aspect of graphs, have yet to be thoroughly explored.
Statistically Significant Discriminative Patterns Searching
Pham, Hoang Son, Virlet, Gwendal, Lavenier, Dominique, Termier, Alexandre
Discriminative pattern mining is an essential task of data mining. This task aims to discover patterns which occur more frequently in a class than other classes in a class-labeled dataset. This type of patterns is valuable in various domains such as bioinformatics, data classification. In this paper, we propose a novel algorithm, named SSDPS, to discover patterns in two-class datasets. The SSDPS algorithm owes its efficiency to an original enumeration strategy of the patterns, which allows to exploit some degrees of anti-monotonicity on the measures of discriminance and statistical significance. Experimental results demonstrate that the performance of the SSDPS algorithm is better than others. In addition, the number of generated patterns is much less than the number of the other algorithms. Experiment on real data also shows that SSDPS efficiently detects multiple SNPs combinations in genetic data.
Discovery of Shifting Patterns in Sequence Classification
Jia, Xiaowei, Khandelwal, Ankush, Karpatne, Anuj, Kumar, Vipin
In this paper, we investigate the multi-variate sequence classification problem from a multi-instance learning perspective. Real-world sequential data commonly show discriminative patterns only at specific time periods. For instance, we can identify a cropland during its growing season, but it looks similar to a barren land after harvest or before planting. Besides, even within the same class, the discriminative patterns can appear in different periods of sequential data. Due to such property, these discriminative patterns are also referred to as shifting patterns. The shifting patterns in sequential data severely degrade the performance of traditional classification methods without sufficient training data. We propose a novel sequence classification method by automatically mining shifting patterns from multi-variate sequence. The method employs a multi-instance learning approach to detect shifting patterns while also modeling temporal relationships within each multi-instance bag by an LSTM model to further improve the classification performance. We extensively evaluate our method on two real-world applications - cropland mapping and affective state recognition. The experiments demonstrate the superiority of our proposed method in sequence classification performance and in detecting discriminative shifting patterns.
Constraint-Based Search of Different Kinds of Discriminative Patterns
Cerf, Loïc (Universidade Federal de Minas Gerais) | Foscarini, João (Universidade Federal de Minas Gerais) | Guerra, Israel (Universidade Federal de Minas Gerais) | Boaventura, Michel (Universidade Federal de Minas Gerais) | Jr., Wagner Meira (Universidade Federal de Minas Gerais)
The state-of-the-art Data-Peeler algorithm extracts closed patterns in n-ary relations. Because it refines both a lower and an upper bound of the pattern space, Data-Peeler can, in some circumstances, guarantee that a region of that space does not contain any closed n-set satisfying some relevance constraint. Whenever it happens, such region is not explored and some time is saved. This paper shows that some constraints, which Data-Peeler can efficiently enforce, define useful patterns in the context of a relation with groups of elements in arbitrary dimensions. For instance, it can list the so-called straddling biclusters, which cover at least some given portions of every group. It can discover, as well, closed n-sets that discriminate a group from the others. The experimental section focuses on the latter case. It shows that Data-Peeler is highly competitive despite its general enumeration principles and its expressive class of constraints that open up new applicative perspectives.