AITopics

2307.10596

Country:

Oceania > Australia > Queensland (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (1.00)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)

Rai, Anand Kumar, Jaiswal, Siddharth D, Mukherjee, Animesh

A Deep Dive into the Disparity of Word Error Rates Across Thousands of NPTEL MOOC Videos

arXiv.org Artificial Intelligence, Jul-20-2023

Automatic speech recognition (ASR) systems are designed to transcribe spoken language into written text and find utility in a variety of applications including voice assistants and transcription services. However, it has been observed that state-of-the-art ASR systems that deliver impressive benchmark results struggle with speakers of certain regions or demographics due to variation in their speech properties. In this work, we describe the curation of a massive speech dataset of 8740 hours consisting of $\sim9.8$K technical lectures in the English language along with their transcripts delivered by instructors representing various parts of Indian demography. The dataset is sourced from the very popular NPTEL MOOC platform. We use the curated dataset to measure the existing disparity in YouTube Automatic Captions and OpenAI Whisper model performance across the diverse demographic traits of speakers in India. While there exists disparity due to gender, native region, age and speech rate of speakers, disparity based on caste is non-existent. We also observe statistically significant disparity across the disciplines of the lectures. These results indicate the need for more inclusive and robust ASR systems and more representational datasets for disparity evaluation in them.
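The metric in the title, word error rate (WER), is conventionally the word-level edit distance between a reference transcript and the ASR output, divided by the number of reference words. The following is a minimal sketch of that standard definition only. The function name and the whitespace tokenization are assumptions for illustration, and the abstract does not say what text normalization the authors apply.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()  # assumption: plain whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from the output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the lecture covers graph theory",
                          "the lecture covers graph theory"))
```

Because insertions count as errors, WER can exceed 1.0 when the output is much longer than the reference. Comparing mean WER between speaker groups is one simple way to express the kind of disparity the abstract describes.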

artificial intelligence, machine learning, natural language, (20 more...)

2307.10587

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > India > West Bengal > Kharagpur (0.04)

Genre:

Instructional Material > Course Syllabus & Notes (1.00)
Research Report > New Finding (0.94)

Industry:

Education > Educational Technology > Educational Software > Computer Based Training (1.00)
Education > Educational Setting > Online (1.00)

Technology:

Information Technology > Enterprise Applications > Human Resources > Learning Management (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.41)

LeJeune, Daniel, Liu, Jiayu, Heckel, Reinhard

Monotonic Risk Relationships under Distribution Shifts for Regularized Risk Minimization

arXiv.org Artificial Intelligence, Jul-20-2023

Machine learning models are typically evaluated by shuffling a set of labeled data, splitting it into training and test sets, and evaluating the model trained on the training set on the test set. This measures how well the model performs on the distribution the model was trained on. However, in practice a model is most commonly not applied to such in-distribution data, but rather to out-of-distribution data that is almost always at least slightly different. In order to understand the performance of machine learning methods in practice, it is therefore important to understand how out-of-distribution performance relates to in-distribution performance. While there are settings in which models with similar in-distribution performance have different out-of-distribution performance (McCoy et al., 2020), a series of recent empirical studies have shown that often, the in-distribution and out-of-distribution performances of models are strongly correlated: Recht et al. (2019), Yadav and Bottou (2019), and Miller et al. (2020) constructed new test sets for the popular CIFAR-10, ImageNet, and MNIST image classification problems and for the SQuAD question answering datasets by following the original data collection and labeling process as closely as possible. For CIFAR-10 and ImageNet the performance drops significantly when evaluated on the new test set, indicating that even when following the original data collection and labeling process, a significant distribution shift can occur. In addition, for all four distribution shifts, the in- and out-of-distribution errors are strongly linearly correlated.
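The shuffle, split, and evaluate protocol described in the opening sentences can be sketched in a few lines. This is an illustrative sketch only: the helper names `shuffle_split` and `accuracy`, the 20% test fraction, and the fixed seed are assumptions and do not come from the paper.

```python
import random


def shuffle_split(examples, test_fraction=0.2, seed=0):
    """Shuffle labeled examples and split them into (train, test) lists."""
    items = list(examples)
    random.Random(seed).shuffle(items)  # seeded so the split is reproducible
    n_test = int(round(len(items) * test_fraction))
    return items[n_test:], items[:n_test]


def accuracy(predict, labeled_examples):
    """Fraction of (input, label) pairs for which predict(input) equals label."""
    correct = sum(1 for x, y in labeled_examples if predict(x) == y)
    return correct / len(labeled_examples)


if __name__ == "__main__":
    # Toy labeled data: the label says whether the input is non-negative.
    data = [(x, x >= 0) for x in range(-50, 50)]
    train, test = shuffle_split(data)
    # A hypothetical "trained" model; a real one would be fit on `train`.
    predict = lambda x: x >= 0
    print(len(train), len(test), accuracy(predict, test))
```

Calling `accuracy` with the same `predict` on a separately collected test set would give the out-of-distribution figure that the abstract contrasts with this in-distribution one.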

artificial intelligence, machine learning, relation, (15 more...)

2210.11589

Country: Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Yfantidou, Sofia, Sermpezis, Pavlos, Vakali, Athena, Baeza-Yates, Ricardo

Uncovering Bias in Personal Informatics

Ubiquitous technologies, such as smartphones and wearables, are an integral part of our lives today [47, 90]. Their proliferation has given rise to Personal Informatics (PI), namely a class of systems that "help people collect personally relevant information for the purpose of self-reflection and gaining self-knowledge" [66]. Such systems enable people to keep track of their productivity [62], finances [60], and learning [45]. Yet, tracking various aspects of physical and mental health is particularly prevalent [33]. PI systems can continuously and unobtrusively measure and collect physiological and behavioral data, namely, "digital biomarkers", from users through integrated sensors. Digital biomarkers contain an uncanny amount of personal information. Even the coarser behavioral biomarkers acquired from consumer wearables (e.g., steps, calories) strongly correlate with a person's gender, height, and weight [61], while signals of finer granularity (e.g., accelerometer and heart rate) can predict variables associated with an individual's physical health, fitness, and demographics [89]. At the same time, consumer smartphones and wearables are now packed with an increasing number of advanced health tracking features, innovating in personal health, research, and care [7]. Flagship consumer wearable algorithms (some approved by the US Food and Drug Administration) can now identify signs of atrial fibrillation (AFib) through electrocardiogram (ECG) or photoplethysmography (PPG) signals [37].

data mining, machine learning, natural language, (19 more...)

doi: 10.1145/3610914

2303.15592

Country:

North America > United States > New York > New York County > New York City (0.06)
Europe > Greece > Central Macedonia > Thessaloniki (0.04)
Europe > Switzerland > Geneva > Geneva (0.04)
(10 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.92)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Human Computer Interaction (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(7 more...)

Confidence Estimation Using Unlabeled Data

Li, Chen, Hu, Xiaoling, Chen, Chao

Overconfidence is a common issue for deep neural networks, limiting their deployment in real-world applications. To better estimate confidence, existing methods mostly focus on fully-supervised scenarios and rely on training labels. In this paper, we propose the first confidence estimation method for a semi-supervised setting, when most training labels are unavailable. We stipulate that even with limited training labels, we can still reasonably approximate the confidence of the model on unlabeled samples by inspecting the prediction consistency through the training process. We use training consistency as a surrogate function and propose a consistency ranking loss for confidence estimation. On both image classification and segmentation tasks, our method achieves state-of-the-art performance in confidence estimation. Furthermore, we show the benefit of the proposed method through a downstream active learning task. The code is available at https://github.com/TopoXLab/consistency-ranking-loss

artificial intelligence, consistency, machine learning, (16 more...)

2307.1044

Country: North America > United States > New York > Suffolk County > Stony Brook (0.04)

Genre: Research Report (0.63)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)

Herle, A., O'Riordan, C. M., Vegetti, S.

Selection functions of strong lens finding neural networks

Convolutional Neural Networks trained for the task of lens finding with similar architecture and training data as is commonly found in the literature are biased classifiers. An understanding of the selection function of lens finding neural networks will be key to fully realising the potential of the large samples of strong gravitational lens systems that will be found in upcoming wide-field surveys. We use three training datasets, representative of those used to train galaxy-galaxy and galaxy-quasar lens finding neural networks. The networks preferentially select systems with larger Einstein radii and larger sources with more concentrated source-light distributions. Increasing the detection significance threshold to $12\sigma$ from $8\sigma$ results in 50 per cent of the selected strong lens systems having Einstein radii $\theta_\mathrm{E} \ge 1.04$ arcsec from $\theta_\mathrm{E} \ge 0.879$ arcsec, source radii $R_S \ge 0.194$ arcsec from $R_S \ge 0.178$ arcsec and source Sérsic indices $n_{\mathrm{Sc}}^{\mathrm{S}} \ge 2.62$ from $n_{\mathrm{Sc}}^{\mathrm{S}} \ge 2.55$. The model trained to find lensed quasars shows a stronger preference for higher lens ellipticities than those trained to find lensed galaxies. The selection function is independent of the slope of the power law of the mass profiles, hence measurements of this quantity will be unaffected. The lens finder selection function reinforces that of the lensing cross-section, and thus we expect our findings to be a general result for all galaxy-galaxy and galaxy-quasar lens finding neural networks.

artificial intelligence, lens system, machine learning, (13 more...)

2307.10355

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
South America > Argentina (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Martínez, Pablo Antonio, Bernabé, Gregorio, García, José Manuel

Code Detection for Hardware Acceleration Using Large Language Models

Large language models (LLMs) have been massively applied to many tasks, often surpassing state-of-the-art approaches. While their effectiveness in code generation has been extensively studied (e.g., AlphaCode), their potential for code detection remains unexplored. This work presents the first analysis of code detection using LLMs. Our study examines essential kernels, including matrix multiplication, convolution, and fast Fourier transform, implemented in C/C++. We propose both a preliminary, naive prompt and a novel prompting strategy for code detection. Results reveal that conventional prompting achieves great precision but poor accuracy (68.8%, 22.3%, and 79.2% for GEMM, convolution, and FFT, respectively) due to a high number of false positives. Our novel prompting strategy substantially reduces false positives, resulting in excellent overall accuracy (91.1%, 97.9%, and 99.7%, respectively). These results pose a considerable challenge to existing state-of-the-art code detection methods.

large language model, machine learning, natural language, (18 more...)

2307.10348

Country:

Europe > Spain > Region of Murcia > Murcia (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre:

Research Report > Promising Solution (0.34)
Overview > Innovation (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.91)

Huang, Xia, Chong, Kai Fong Ernest

GenKL: An Iterative Framework for Resolving Label Ambiguity and Label Non-conformity in Web Images Via a New Generalized KL Divergence

Web image datasets curated online inherently contain ambiguous in-distribution (ID) instances and out-of-distribution (OOD) instances, which we collectively call non-conforming (NC) instances. In many recent approaches for mitigating the negative effects of NC instances, the core implicit assumption is that the NC instances can be found via entropy maximization. For "entropy" to be well-defined, we are interpreting the output prediction vector of an instance as the parameter vector of a multinomial random variable, with respect to some trained model with a softmax output layer. Hence, entropy maximization is based on the idealized assumption that NC instances have predictions that are "almost" uniformly distributed. However, in real-world web image datasets, there are numerous NC instances whose predictions are far from being uniformly distributed. To tackle the limitation of entropy maximization, we propose $(\alpha, \beta)$-generalized KL divergence, $\mathcal{D}_{\text{KL}}^{\alpha, \beta}(p\|q)$, which can be used to identify significantly more NC instances. Theoretical properties of $\mathcal{D}_{\text{KL}}^{\alpha, \beta}(p\|q)$ are proven, and we also show empirically that a simple use of $\mathcal{D}_{\text{KL}}^{\alpha, \beta}(p\|q)$ outperforms all baselines on the NC instance identification task. Building upon $(\alpha,\beta)$-generalized KL divergence, we also introduce a new iterative training framework, GenKL, that identifies and relabels NC instances. When evaluated on three web image datasets, Clothing1M, Food101/Food101N, and mini WebVision 1.0, we achieved new state-of-the-art classification accuracies: $81.34\%$, $85.73\%$ and $78.99\%$/$92.54\%$ (top-1/top-5), respectively.
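To make the entropy argument concrete, the sketch below computes the Shannon entropy of a softmax output vector. It illustrates only the quantity that entropy maximization relies on; it is not the $(\alpha, \beta)$-generalized KL divergence, whose definition the abstract does not give. The function name and the example vectors are assumptions for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability vector."""
    # Terms with p == 0 contribute nothing, by the convention 0 * log 0 = 0.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


if __name__ == "__main__":
    uniform = [0.25, 0.25, 0.25, 0.25]   # maximal entropy: log(4)
    confident = [0.97, 0.01, 0.01, 0.01]  # typical clean in-distribution output
    two_way = [0.5, 0.5, 0.0, 0.0]        # ambiguous between two classes only
    for name, p in [("uniform", uniform), ("confident", confident), ("two_way", two_way)]:
        print(name, round(entropy(p), 4))
```

The `two_way` vector is the kind of case the abstract points to: the instance is ambiguous, yet its entropy is $\log 2$ rather than the maximum $\log 4$, so a criterion that looks only for near-uniform predictions can miss it.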

artificial intelligence, divergence, machine learning, (17 more...)

doi: 10.1007/s11263-023-01815-9

2307.0981

Country:

Asia > Singapore (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Oesterle, Michael, Blöbaum, Patrick, Mastakouri, Atalanti A., Kirschbaum, Elke

Beyond Single-Feature Importance with ICECREAM

Which set of features was responsible for a certain output of a machine learning model? Which components caused the failure of a cloud computing application? These are just two examples of questions we are addressing in this work by Identifying Coalition-based Explanations for Common and Rare Events in Any Model (ICECREAM). Specifically, we propose an information-theoretic quantitative measure for the influence of a coalition of variables on the distribution of a target variable. This allows us to identify which set of factors is essential to obtain a certain outcome, as opposed to well-established explainability and causal contribution analysis methods which can assign contributions only to individual factors and rank them by their importance. In experiments with synthetic and real-world data, we show that ICECREAM outperforms state-of-the-art methods for explainability and root cause analysis, and achieves impressive accuracy in both tasks.

artificial intelligence, data mining, machine learning, (18 more...)

2307.09779

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Genre: Research Report > Promising Solution (0.34)

Industry: Banking & Finance (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.46)

Pseudo Outlier Exposure for Out-of-Distribution Detection using Pretrained Transformers

Kim, Jaeyoung, Jung, Kyuheon, Na, Dongbin, Jang, Sion, Park, Eunbin, Choi, Sungchul

For real-world language applications, detecting an out-of-distribution (OOD) sample is helpful to alert users or reject such unreliable samples. However, modern over-parameterized language models often produce overconfident predictions for both in-distribution (ID) and OOD samples. In particular, language models suffer from OOD samples with a similar semantic representation to ID samples since these OOD samples lie near the ID manifold. A rejection network can be trained with ID and diverse outlier samples to detect test OOD samples, but explicitly collecting auxiliary OOD datasets brings an additional burden for data collection. In this paper, we propose a simple but effective method called Pseudo Outlier Exposure (POE) that constructs a surrogate OOD dataset by sequentially masking tokens related to ID classes. The surrogate OOD sample introduced by POE shows a similar representation to ID data, which is most effective in training a rejection network. Our method does not require any external OOD data and can be easily implemented within off-the-shelf Transformers. A comprehensive comparison with state-of-the-art algorithms demonstrates POE's competitiveness on several text classification benchmarks.

machine learning, natural language, text classification, (19 more...)

2307.09455

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > China > Hong Kong (0.04)
North America > United States > Oregon (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
(2 more...)