conf
On the Optimal Number of Grids for Differentially Private Non-Interactive $K$-Means Clustering
Muthukrishnan, Gokularam, Tandon, Anshoo
Differentially private $K$-means clustering enables releasing cluster centers derived from a dataset while protecting the privacy of the individuals. Non-interactive clustering techniques based on privatized histograms are attractive because the released data synopsis can be reused for other downstream tasks without additional privacy loss. The choice of the number of grids for discretizing the data points is crucial, as it directly controls the quantization bias and the amount of noise injected to preserve privacy. The widely adopted strategy selects a grid size that is independent of the number of clusters and also relies on empirical tuning. In this work, we revisit this choice and propose a refined grid-size selection rule derived by minimizing an upper bound on the expected deviation in the K-means objective function, leading to a more principled discretization strategy for non-interactive private clustering. Compared to prior work, our grid resolution differs both in its dependence on the number of clusters and in the scaling with dataset size and privacy budget. Extensive numerical results elucidate that the proposed strategy results in accurate clustering compared to the state-of-the-art techniques, even under tight privacy budgets.
- North America > United States (0.14)
- Asia > India > Karnataka > Bengaluru (0.04)
Deep Autocorrelation Modeling for Time-Series Forecasting: Progress and Prospects
Wang, Hao, Pan, Licheng, Wen, Qingsong, Yu, Jialin, Chen, Zhichao, Zheng, Chunyuan, Li, Xiaoxi, Chu, Zhixuan, Xu, Chao, Gong, Mingming, Li, Haoxuan, Lu, Yuan, Lin, Zhouchen, Torr, Philip, Liu, Yan
Autocorrelation is a defining characteristic of time-series data, where each observation is statistically dependent on its predecessors. In the context of deep time-series forecasting, autocorrelation arises in both the input history and the label sequences, presenting two central research challenges: (1) designing neural architectures that model autocorrelation in history sequences, and (2) devising learning objectives that model autocorrelation in label sequences. Recent studies have made strides in tackling these challenges, but a systematic survey examining both aspects remains lacking. To bridge this gap, this paper provides a comprehensive review of deep time-series forecasting from the perspective of autocorrelation modeling. In contrast to existing surveys, this work makes two distinctive contributions. First, it proposes a novel taxonomy that encompasses recent literature on both model architectures and learning objectives -- whereas prior surveys neglect or inadequately discuss the latter aspect. Second, it offers a thorough analysis of the motivations, insights, and progression of the surveyed literature from a unified, autocorrelation-centric perspective, providing a holistic overview of the evolution of deep time-series forecasting. The full list of papers and resources is available at https://github.com/Master-PLC/Awesome-TSF-Papers.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Asia > China > Zhejiang Province > Hangzhou (0.04)
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- Research Report (1.00)
- Overview (1.00)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Unified Taxonomy for Multivariate Time Series Anomaly Detection using Deep Learning
Alves, Bruna, Pinho, Armando J., Gouveia, Sónia
The topic of Multivariate Time Series Anomaly Detection (MTSAD) has grown rapidly over the past years, with a steady rise in publications and Deep Learning (DL) models becoming the dominant paradigm. To address the lack of systematization in the field, this study introduces a novel and unified taxonomy with eleven dimensions over three parts (Input, Output and Model) for the categorization of DL-based MTSAD methods. The dimensions were established in a two-fold approach. First, they derived from a comprehensive analysis of methodological studies. Second, insights from review papers were incorporated. Furthermore, the proposed taxonomy was validated using an additional set of recent publications, providing a clear overview of methodological trends in MTSAD. Results reveal a convergence toward Transformer-based and reconstruction and prediction models, setting the foundation for emerging adaptive and generative trends. Building on and complementing existing surveys, this unified taxonomy is designed to accommodate future developments, allowing for new categories or dimensions to be added as the field progresses. This work thus consolidates fragmented knowledge in the field and provides a reference point for future research in MTSAD.
- Research Report (1.00)
- Overview (0.93)
CausalRM: Causal-Theoretic Reward Modeling for RLHF from Observational User Feedbacks
Wang, Hao, Pan, Licheng, Chen, Zhichao, Zheng, Chunyuan, Chu, Zhixuan, Li, Xiaoxi, Lu, Yuan, Liu, Xinggao, Li, Haoxuan, Lin, Zhouchen
Despite the success of reinforcement learning from human feedback (RLHF) in aligning language models, current reward modeling heavily relies on experimental feedback data collected from human annotators under controlled and costly conditions. In this work, we introduce observational reward modeling -- learning reward models with observational user feedback (e.g., clicks, copies, and upvotes) -- as a scalable and cost-effective alternative. We identify two fundamental challenges in this setting: (1) observational feedback is noisy due to annotation errors, which deviates it from true user preference; (2) observational feedback is biased by user preference, where users preferentially provide feedback on responses they feel strongly about, which creats a distribution shift between training and inference data. To address these challenges, we propose CausalRM, a causal-theoretic reward modeling framework that aims to learn unbiased reward models from observational feedback. To tackle challenge (1), CausalRM introduces a noise-aware surrogate loss term that is provably equivalent to the primal loss under noise-free conditions by explicitly modeling the annotation error generation process. To tackle challenge (2), CausalRM uses propensity scores -- the probability of a user providing feedback for a given response -- to reweight training samples, yielding a loss function that eliminates user preference bias. Extensive experiments across diverse LLM backbones and benchmark datasets validate that CausalRM effectively learns accurate reward signals from noisy and biased observational feedback and delivers substantial performance improvements on downstream RLHF tasks -- including a 49.2% gain on WildGuardMix and a 32.7% improvement on HarmBench. Code is available on our project website.
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Natural Language (0.93)
- Europe > Italy > Piedmont > Turin Province > Turin (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- (4 more...)
- North America > United States > Ohio > Franklin County > Columbus (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology (0.68)
- Health & Medicine > Diagnostic Medicine > Imaging (0.46)
- Research Report > New Finding (0.93)
- Research Report > Experimental Study (0.93)
Diversity-Driven Synthesis: Enhancing Dataset Distillation through Directed Weight Adjustment
To avoid redundancy in these synthetic datasets, it is crucial that each element contains unique features and remains diverse from others during the synthesis stage. In this paper, we provide a thorough theoretical and empirical analysis of diversity within synthesized datasets. We argue that enhancing diversity can improve the parallelizable yet isolated synthesizing approach.
- Asia > Singapore > Central Region > Singapore (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > China > Shaanxi Province > Xi'an (0.04)
- Asia > China > Hubei Province > Wuhan (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)