A Concept uniqueness and granularity

Neural Information Processing Systems

Here, we report statistics on the uniqueness of neuron concepts as we increase the maximum formula length of our explanations.

Figure S1: Number of repeated concepts across probed vision and NLI models, by maximum formula length.

Table S1: For the probed Image Classification and NLI models, the average number of occurrences of each detected concept and the percentage of detected concepts that are unique (i.e., occur only once).

A.1 Image Classification

Figure S1 (left) plots the number of times each unique concept appears across the 512 units of ResNet-18 as the maximum formula length increases. Table S1 displays the mean number of occurrences per concept and the percentage of detected concepts that are unique (i.e., occur only once).
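The summary statistics described here (mean occurrences per concept, percentage of unique concepts) reduce to simple counting. A minimal hypothetical sketch, assuming one detected concept string per neuron (the function name and toy data are our illustration, not the paper's code):

```python
from collections import Counter

def concept_stats(neuron_concepts):
    """Given one detected concept per probed neuron, return the mean number
    of occurrences per distinct concept and the percentage of distinct
    concepts that are unique (occur only once)."""
    counts = Counter(neuron_concepts)
    mean_occurrences = sum(counts.values()) / len(counts)
    pct_unique = 100 * sum(1 for c in counts.values() if c == 1) / len(counts)
    return mean_occurrences, pct_unique

# "water" repeats across two neurons; "sky" and "grass" are unique.
mean_occ, pct = concept_stats(["water", "sky", "water", "grass"])
# → mean_occ = 4/3 ≈ 1.33, pct ≈ 66.7
```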




Neural Information Processing Systems

Supplementary Material: Can Adversarial Training Be Manipulated By Non-Robust Features? In this part, we discuss several independent (or concurrent) works that are closely related to ours. They also conclude that conventional adversarial training will prevent a drop in accuracy measured on both clean and adversarial images. In contrast, we focus on a more realistic setting that does not require a larger attack budget; from this perspective, our work is complementary to theirs. This makes the threat of stability attacks more insidious than that of Fu et al. [19].


WIND: Accelerated RNN-T Decoding with Windowed Inference for Non-blank Detection

Xu, Hainan, Bataev, Vladimir, Grigoryan, Lilit, Ginsburg, Boris

arXiv.org Artificial Intelligence

We propose Windowed Inference for Non-blank Detection (WIND), a novel strategy that significantly accelerates RNN-T inference without compromising model accuracy. During inference, instead of processing frames sequentially, WIND processes multiple frames in parallel within a window, allowing the model to quickly locate non-blank predictions during decoding and yielding significant speed-ups. We implement WIND for greedy decoding and batched greedy decoding with label-looping techniques, and also propose a novel beam-search decoding method. Experiments on multiple datasets under different conditions show that in greedy modes our method achieves speed-ups of up to 2.4X over the baseline sequential approach while maintaining identical Word Error Rate (WER). Our beam-search algorithm achieves slightly better accuracy than alternative methods, with significantly improved speed.
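The windowed search described in the abstract can be sketched in a few lines. This is a toy illustration only: the joint network is replaced by a random projection, there is no decoder state update, and names such as `wind_greedy` and the window size are our assumptions, not the paper's implementation:

```python
import numpy as np

BLANK = 0                                 # assumed blank token id
rng = np.random.default_rng(0)
W_joint = rng.standard_normal((16, 5))    # toy stand-in for a trained joint net

def joint(enc_frames):
    """Toy per-frame logits; a real RNN-T joint also consumes decoder state."""
    return enc_frames @ W_joint

def wind_greedy(enc, window=8):
    """Windowed greedy decoding in the spirit of WIND: score up to `window`
    encoder frames in one batched call, then jump directly to the first
    frame whose argmax token is non-blank."""
    t, labels = 0, []
    T = enc.shape[0]
    while t < T:
        w = enc[t:t + window]                    # a window of frames at once
        preds = joint(w).argmax(-1)              # one batched joint call
        nonblank = np.nonzero(preds != BLANK)[0]
        if nonblank.size == 0:
            t += w.shape[0]                      # all blank: skip whole window
        else:
            k = int(nonblank[0])
            labels.append(int(preds[k]))         # emit first non-blank label
            t += k + 1   # simplified: a real RNN-T decoder re-queries the
                         # same frame after updating its prediction state
    return labels
```

With this stateless toy joint, the windowed scan emits exactly the same labels as a frame-by-frame loop; the speed-up comes from replacing many sequential joint calls with one batched call per window.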


Generalization Bounds for Quantum Learning via Rényi Divergences

Warsi, Naqueeb Ahmad, Dasgupta, Ayanava, Hayashi, Masahito

arXiv.org Artificial Intelligence

This work advances the theoretical understanding of quantum learning by establishing a new family of upper bounds on the expected generalization error of quantum learning algorithms, leveraging the framework introduced by Caro et al. (2024) and a new definition for the expected true loss. Our primary contribution is the derivation of these bounds in terms of quantum and classical Rényi divergences, utilizing a variational approach for evaluating quantum Rényi divergences, specifically the Petz and a newly introduced modified sandwich quantum Rényi divergence. Analytically and numerically, we demonstrate the superior performance of the bounds derived using the modified sandwich quantum Rényi divergence compared to those based on the Petz divergence. Furthermore, we provide probabilistic generalization error bounds using two distinct techniques: one based on the modified sandwich quantum Rényi divergence and classical Rényi divergence, and another employing smooth max Rényi divergence.
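For orientation, the standard divergences the abstract builds on are reproduced below; the paper's "modified sandwich" divergence is a new variant and is not reproduced here. For a density pair and states $\rho, \sigma$ with $\alpha \in (0,1) \cup (1,\infty)$:

```latex
% Classical Rényi divergence between distributions P and Q
D_\alpha(P \| Q) = \frac{1}{\alpha - 1} \log \sum_x P(x)^{\alpha} Q(x)^{1-\alpha}

% Petz quantum Rényi divergence
\bar{D}_\alpha(\rho \| \sigma)
  = \frac{1}{\alpha - 1} \log \operatorname{Tr}\!\left[\rho^{\alpha} \sigma^{1-\alpha}\right]

% Sandwiched quantum Rényi divergence
\widetilde{D}_\alpha(\rho \| \sigma)
  = \frac{1}{\alpha - 1} \log \operatorname{Tr}\!\left[
      \left(\sigma^{\frac{1-\alpha}{2\alpha}} \rho\,
            \sigma^{\frac{1-\alpha}{2\alpha}}\right)^{\alpha}\right]
```

Both quantum definitions reduce to the classical divergence when $\rho$ and $\sigma$ commute.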


Nash Equilibria via Stochastic Eigendecomposition

Gemp, Ian

arXiv.org Artificial Intelligence

This work proposes a novel set of techniques for approximating a Nash equilibrium in a finite, normal-form game. It achieves this by constructing a new reformulation as solving a parameterized system of multivariate polynomials with tunable complexity. In doing so, it forges an itinerant loop from game theory to machine learning and back. We show that a Nash equilibrium can be approximated purely with calls to stochastic, iterative variants of singular value decomposition and power iteration, with implications for biological plausibility. We provide pseudocode and experiments demonstrating how to solve for all equilibria of a general-sum game using only these readily available linear algebra tools.
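The linear-algebra primitive the abstract relies on is standard; a minimal sketch of deterministic power iteration is shown below. The paper's stochastic variants and the game-theoretic reformulation are not reproduced here, and the function name is our own:

```python
import numpy as np

def power_iteration(A, iters=500, seed=0):
    """Estimate the dominant eigenvalue/eigenvector of a square matrix A
    by repeatedly applying A and renormalizing."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = A @ v
        v = w / np.linalg.norm(w)
    lam = v @ A @ v          # Rayleigh quotient at the converged vector
    return lam, v

A = np.array([[2.0, 0.0],
              [0.0, 1.0]])
lam, v = power_iteration(A)
# lam ≈ 2.0 and v ≈ (±1, 0) after convergence
```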


RepEval: Effective Text Evaluation with LLM Representation

Sheng, Shuqian, Xu, Yi, Zhang, Tianhang, Shen, Zanwei, Fu, Luoyi, Ding, Jiaxin, Zhou, Lei, Wang, Xinbing, Zhou, Chenghu

arXiv.org Artificial Intelligence

Automatic evaluation metrics for generated text play an important role in NLG, especially with the rapid growth of LLMs. However, existing metrics are often limited to specific scenarios, making it challenging to meet the evaluation requirements of expanding LLM applications; there is therefore a demand for new, flexible, and effective metrics. In this study, we introduce RepEval, the first metric leveraging the projection of LLM representations for evaluation. RepEval requires only a small number of sample pairs for training, and through simple prompt modifications it can easily transition to various tasks. Results on ten datasets from three tasks demonstrate the high effectiveness of our method, which exhibits stronger correlations with human judgments than previous metrics, even outperforming GPT-4. Our work underscores the richness of information about text quality embedded in LLM representations, offering insights for the development of new metrics.
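The abstract's core idea, scoring text by projecting hidden states onto a learned direction, can be sketched with synthetic data. Everything below is our hypothetical illustration (the dimensionality, the synthetic "hidden states", and the mean-difference direction are assumptions, not RepEval's actual training procedure):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 32  # hypothetical hidden-state dimensionality

# Synthetic stand-ins for LLM hidden states: "good" texts are shifted along
# a quality direction unknown to the metric, "bad" texts the opposite way.
quality_dir = rng.standard_normal(d)
quality_dir /= np.linalg.norm(quality_dir)
good = rng.standard_normal((20, d)) + 2.0 * quality_dir
bad = rng.standard_normal((20, d)) - 2.0 * quality_dir

# Minimal projection metric: derive a scoring direction from a few labeled
# sample pairs (here, the difference of class means) and score by projection.
w = good.mean(axis=0) - bad.mean(axis=0)

def rep_score(hidden_state):
    """Higher projection onto w = higher predicted text quality."""
    return float(hidden_state @ w)
```

With only a handful of pairs, the learned direction already separates the two groups on average, which is the flavor of result the abstract reports at much larger scale.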


AILS-NTUA at SemEval-2024 Task 6: Efficient model tuning for hallucination detection and analysis

Grigoriadou, Natalia, Lymperaiou, Maria, Filandrianos, Giorgos, Stamou, Giorgos

arXiv.org Artificial Intelligence

In this paper, we present our team's submissions for SemEval-2024 Task 6: SHROOM, a Shared-task on Hallucinations and Related Observable Overgeneration Mistakes. Participants were asked to perform binary classification to identify cases of fluent overgeneration hallucinations. Our experimentation included fine-tuning a pre-trained model for hallucination detection and a Natural Language Inference (NLI) model. Our most successful strategy was an ensemble of these models, which achieved accuracies of 77.8% and 79.9% on the model-agnostic and model-aware datasets respectively, outperforming the organizers' baseline and comparing favorably with the competition's top-performing systems, which reported accuracies of 84.7% and 81.3% respectively.
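The abstract does not specify how the ensemble combines the two models; one common instance is soft voting, sketched below. The function name, inputs, and threshold are our assumptions for illustration:

```python
import numpy as np

def ensemble_predict(p_hallucination_model, p_nli_model, threshold=0.5):
    """Soft-voting ensemble for binary hallucination detection: average the
    per-example probabilities of the two models and threshold the result."""
    p = (np.asarray(p_hallucination_model) + np.asarray(p_nli_model)) / 2.0
    return (p >= threshold).astype(int)

# Hypothetical probabilities from the two fine-tuned models on three examples.
labels = ensemble_predict([0.9, 0.2, 0.6], [0.7, 0.1, 0.3])
# → array([1, 0, 0])
```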


SpeechColab Leaderboard: An Open-Source Platform for Automatic Speech Recognition Evaluation

Du, Jiayu, Li, Jinpeng, Chen, Guoguo, Zhang, Wei-Qiang

arXiv.org Artificial Intelligence

In the wake of the surging tide of deep learning over the past decade, Automatic Speech Recognition (ASR) has garnered substantial attention, leading to the emergence of numerous publicly accessible ASR systems that are actively being integrated into our daily lives. Nonetheless, the impartial and replicable evaluation of these ASR systems encounters challenges due to various crucial subtleties. In this paper, we introduce the SpeechColab Leaderboard, a general-purpose, open-source platform designed for ASR evaluation. With this platform: (i) We report a comprehensive benchmark, unveiling the current state-of-the-art panorama for ASR systems, covering both open-source models and industrial commercial services. (ii) We quantify how distinct nuances in the scoring pipeline influence the final benchmark outcomes, including capitalization, punctuation, interjections, contractions, synonym usage, compound words, etc.; these issues have gained prominence in the transition towards an End-to-End future. (iii) We propose a practical modification to the conventional Token-Error-Rate (TER) evaluation metric, inspired by Kolmogorov complexity and Normalized Information Distance (NID). This adaptation, called modified-TER (mTER), achieves proper normalization and symmetric treatment of reference and hypothesis. By leveraging this platform as a large-scale testing ground, this study demonstrates the robustness and backward compatibility of mTER when compared to TER. The SpeechColab Leaderboard is accessible at https://github.com/SpeechColab/Leaderboard
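To make the asymmetry of conventional TER concrete, the sketch below contrasts it with one NID-style symmetric normalization. The `mter_sketch` normalization (edits divided by the longer of the two lengths) is our hypothetical illustration of the idea, not the paper's exact mTER definition:

```python
def levenshtein(ref, hyp):
    """Word-level edit distance via single-row dynamic programming."""
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                       # deletion
                        dp[j - 1] + 1,                   # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))  # substitution
            prev = cur
    return dp[n]

def ter(ref, hyp):
    """Conventional TER: edits normalized by reference length (asymmetric)."""
    return levenshtein(ref, hyp) / len(ref)

def mter_sketch(ref, hyp):
    """NID-style symmetric normalization: edits / max(len(ref), len(hyp)).
    Hypothetical illustration, not the paper's exact mTER definition."""
    return levenshtein(ref, hyp) / max(len(ref), len(hyp))

r = "the cat sat on the mat".split()
h = "the cat sat on mat".split()
# ter(r, h) != ter(h, r), but mter_sketch is symmetric in its arguments.
```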