ClimbQ: Class Imbalanced Quantization Enabling Robustness on Efficient Inferences

Neural Information Processing Systems

Quantization compresses models to low bit-widths for efficient inference and has received increasing attention. However, existing approaches focus on balanced datasets, while imbalanced data is pervasive in the real world. Therefore, in this study, we investigate the realistic problem of quantization on class-imbalanced data. Our analytical results show that quantizing imbalanced data tends to incur a large error due to the differences between separate class distributions, which leads to a significant accuracy loss. To address this issue, we propose a novel quantization framework, Class Imbalanced Quantization (ClimbQ), that focuses on diminishing inter-class heterogeneity to reduce quantization error. ClimbQ first scales the variance of each class distribution and then projects data through the new distributions into the same space for quantization. To guarantee the homogeneity of class variances after the ClimbQ process, we examine the quantized features and derive that homogeneity holds when the data size of each class is restricted (bounded). Accordingly, we design a Homogeneous Variance Loss (HomoVar Loss), which reweights the data losses of each class based on the bounded data sizes to satisfy the homogeneity of class variances. Extensive experiments on class-imbalanced and benchmark balanced datasets reveal that ClimbQ outperforms state-of-the-art quantization techniques, especially on highly imbalanced data.
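The core idea the abstract describes — rescaling each class distribution to a common variance, quantizing in the shared space, then mapping back — can be sketched roughly as below. This is an illustrative approximation, not the authors' implementation; the function name, the per-class standardization, and the simple uniform quantizer are all assumptions.

```python
import numpy as np

def class_rescale_quantize(x, labels, n_bits=4):
    """Hypothetical sketch of per-class variance scaling before uniform
    quantization: project each class to a zero-mean, unit-variance space,
    quantize all classes jointly there, then map each class back."""
    x = np.asarray(x, dtype=np.float64)
    labels = np.asarray(labels)
    out = np.empty_like(x)
    stats = {}
    # Project every class into a shared standardized space.
    for c in np.unique(labels):
        mask = labels == c
        mu, sigma = x[mask].mean(), x[mask].std() + 1e-8
        stats[c] = (mu, sigma)
        out[mask] = (x[mask] - mu) / sigma
    # Uniform quantization in the shared space.
    levels = 2 ** n_bits
    lo, hi = out.min(), out.max()
    step = (hi - lo) / (levels - 1)
    q = np.round((out - lo) / step) * step + lo
    # Undo the per-class projection.
    for c, (mu, sigma) in stats.items():
        mask = labels == c
        q[mask] = q[mask] * sigma + mu
    return q
```

Because all classes share one quantization grid in the standardized space, each class sees the same relative resolution regardless of how spread out its original distribution was, which is the intuition behind reducing inter-class heterogeneity.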


Clustering Malware at Scale: A First Full-Benchmark Study

Mocko, Martin, Ševcech, Jakub, Chudá, Daniela

arXiv.org Artificial Intelligence

Recent years have shown that malware attacks still occur with high frequency. Malware experts seek to categorize and classify incoming samples to confirm their trustworthiness or prove their maliciousness. One way in which groups of malware samples can be identified is through malware clustering. Despite the efforts of the community, malware clustering that incorporates benign samples has been under-explored. Moreover, despite the availability of larger public benchmark malware datasets, malware clustering studies have avoided fully utilizing these datasets in their experiments, often resorting to small datasets with only a few families. Additionally, the current state of the art in malware clustering remains unclear. In our study, we evaluate malware clustering quality and establish the state of the art on Bodmas and Ember, two large public benchmark malware datasets. Ours is the first study of malware clustering performed on whole benchmark malware datasets. Additionally, we extend the malware clustering task by incorporating benign samples. Our results indicate that incorporating benign samples does not significantly degrade clustering quality. We find differences in the quality of the created clusters between Ember and Bodmas, as well as a private industry dataset. Contrary to popular opinion, our top clustering performers are K-Means and BIRCH, with DBSCAN and HAC falling behind.
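A comparison of the clustering algorithms named in the abstract can be sketched with scikit-learn as below. The synthetic blobs stand in for real Ember/Bodmas static feature vectors, and the hyperparameters are placeholders; this is not the authors' evaluation pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans, Birch, DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for malware feature vectors with known family labels.
X, y = make_blobs(n_samples=600, centers=5, n_features=16, random_state=42)
X = StandardScaler().fit_transform(X)

# Algorithms compared in the study; eps/min_samples are illustrative.
algos = {
    "KMeans": KMeans(n_clusters=5, n_init=10, random_state=0),
    "BIRCH": Birch(n_clusters=5),
    "DBSCAN": DBSCAN(eps=1.0, min_samples=5),
}
for name, algo in algos.items():
    pred = algo.fit_predict(X)
    # Adjusted Rand index measures agreement with the ground-truth families.
    print(f"{name}: ARI = {adjusted_rand_score(y, pred):.3f}")
```

On real malware feature spaces, the external-validity metric (ARI here) would be computed against family labels from a reference labeling tool rather than synthetic ground truth.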



Adversarially Robust Multitask Adaptive Control

Fallah, Kasra, Toso, Leonardo F., Anderson, James

arXiv.org Artificial Intelligence

Adaptive control seeks to design controllers that adapt to uncertain or unknown system dynamics. Rooted in early work on self-tuning regulators for flight and aerospace applications [Åström and Wittenmark, 1973, Åström, 1983], it remains central to modern control. Among its formulations, the linear quadratic regulator (LQR) serves as a canonical benchmark due to its tractability and theoretical appeal. Extensive research over the last five or so years has established non-asymptotic performance guarantees for adaptive LQR through regret analysis [Abbasi-Yadkori and Szepesvári, 2011, Dean et al., 2018, Cohen et al., 2019, Simchowitz and Foster, 2020, Hazan et al., 2020, Ziemann and Sandberg, 2022], proving that in the single-system setting the optimal expected regret scales as O(√(dT)), with d = d