Goto

Collaborating Authors

 ca ts


cf66f995883298c4db2f0dcba28fb211-Paper-Conference.pdf

Neural Information Processing Systems

Time series forecasting is crucial for applications across multiple domains and various scenarios. Although Transformers have dramatically advanced the landscape of forecasting, their effectiveness remains debated.





EEGChaT: A Transformer-Based Modular Channel Selector for SEEG Analysis

arXiv.org Artificial Intelligence

Analyzing stereoelectroencephalography (SEEG) signals is critical for brain-computer interface (BCI) applications and neuroscience research, yet poses significant challenges due to the large number of input channels and their heterogeneous relevance. Traditional channel selection methods struggle to scale or provide meaningful interpretability for SEEG data. In this work, we propose EEGChaT, a novel Transformer-based channel selection module designed to automatically identify the most task-relevant channels in SEEG recordings. EEGChaT introduces Channel Aggregation Tokens (CATs) to aggregate information across channels, and leverages an improved Attention Rollout technique to compute interpretable, quantitative channel importance scores. We evaluate EEGChaT on the DuIN dataset, demonstrating that integrating EEGChaT with existing classification models consistently improves decoding accuracy, achieving up to 17\% absolute gains. Furthermore, the channel weights produced by EEGChaT show substantial overlap with manually selected channels, supporting the interpretability of the approach. Our results suggest that EEGChaT is an effective and generalizable solution for channel selection in high-dimensional SEEG analysis, offering both enhanced performance and insights into neural signal relevance.


cf66f995883298c4db2f0dcba28fb211-Paper-Conference.pdf

Neural Information Processing Systems

Time series forecasting is crucial for applications across multiple domains and various scenarios. Although Transformers have dramatically advanced the landscape of forecasting, their effectiveness remains debated.


Improving Multilingual Social Media Insights: Aspect-based Comment Analysis

arXiv.org Artificial Intelligence

The inherent nature of social media posts, characterized by the freedom of language use with a disjointed array of diverse opinions and topics, poses significant challenges to downstream NLP tasks such as comment clustering, comment summarization, and social media opinion analysis. To address this, we propose a granular level of identifying and generating aspect terms from individual comments to guide model attention. Specifically, we leverage multilingual large language models with supervised fine-tuning for comment aspect term generation (CAT-G), further aligning the model's predictions with human expectations through DPO. We demonstrate the effectiveness of our method in enhancing the comprehension of social media discourse on two NLP tasks. Moreover, this paper contributes the first multilingual CAT-G test set on English, Chinese, Malay, and Bahasa Indonesian. As LLM capabilities vary among languages, this test set allows for a comparative analysis of performance across languages with varying levels of LLM proficiency.


Reward Model Generalization for Compute-Aware Test-Time Reasoning

arXiv.org Artificial Intelligence

External test-time reasoning enhances large language models (LLMs) by decoupling generation and selection. At inference time, the model generates multiple reasoning paths, and an auxiliary process reward model (PRM) is used to score and select the best one. A central challenge in this setting is test-time compute optimality (TCO), i.e., how to maximize answer accuracy under a fixed inference budget. In this work, we establish a theoretical framework to analyze how the generalization error of the PRM affects compute efficiency and reasoning performance. Leveraging PAC-Bayes theory, we derive generalization bounds and show that a lower generalization error of PRM leads to fewer samples required to find correct answers. Motivated by this analysis, we propose Compute-Aware Tree Search (CATS), an actor-critic framework that dynamically controls search behavior. The actor outputs sampling hyperparameters based on reward distributions and sparsity statistics, while the critic estimates their utility to guide budget allocation. Experiments on the MATH and AIME benchmarks with various LLMs and PRMs demonstrate that CATS consistently outperforms other external TTS methods, validating our theoretical predictions.


CATS: Mitigating Correlation Shift for Multivariate Time Series Classification

arXiv.org Machine Learning

Unsupervised Domain Adaptation (UDA) leverages labeled source data to train models for unlabeled target data. Given the prevalence of multivariate time series (MTS) data across various domains, the UDA task for MTS classification has emerged as a critical challenge. However, for MTS data, correlations between variables often vary across domains, whereas most existing UDA works for MTS classification have overlooked this essential characteristic. To bridge this gap, we introduce a novel domain shift, {\em correlation shift}, measuring domain differences in multivariate correlation. To mitigate correlation shift, we propose a scalable and parameter-efficient \underline{C}orrelation \underline{A}dapter for M\underline{TS} (CATS). Designed as a plug-and-play technique compatible with various Transformer variants, CATS employs temporal convolution to capture local temporal patterns and a graph attention module to model the changing multivariate correlation. The adapter reweights the target correlations to align the source correlations with a theoretically guaranteed precision. A correlation alignment loss is further proposed to mitigate correlation shift, bypassing the alignment challenge from the non-i.i.d. nature of MTS data. Extensive experiments on four real-world datasets demonstrate that (1) compared with vanilla Transformer-based models, CATS increases over $10\%$ average accuracy while only adding around $1\%$ parameters, and (2) all Transformer variants equipped with CATS either reach or surpass state-of-the-art baselines.


Post-Training Statistical Calibration for Higher Activation Sparsity

arXiv.org Artificial Intelligence

We present Statistical Calibrated Activation Pruning (SCAP), a post-training activation pruning framework that (1) generalizes sparsification by input activations of Fully-Connected layers for generic and flexible application across Transformers, and (2) features a simple Mode-Centering technique to pre-calibrate activation distributions for maximizing post-training sparsity. Our results demonstrate robust Pareto efficiency compared to prior methods, translating to a 1.5x additional LLM decoding speedup against CATS at iso model quality. SCAP effectiveness is empirically verified across a wide range of models, including recent Transformer Decoders, MoE, Mamba2, Encoding Transformer, and pre-quantized models, highlighting its practicality and scalability. The code is available at: https://github.com/IntelLabs/SCAP.