AITopics

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
South America > Brazil (0.04)
North America > United States > California (0.04)
(2 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Neural Information Processing Systems Feb-12-2026, 13:38:14 GMT

Supplement: Robustness to Label Noise Depends on the Shape of the Noise Distribution

B.1 Learning details: All of our code and experiments are implemented in Python using the PyTorch [5] deep learning framework.

against baseline, artificial intelligence, machine learning, (18 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada > Ontario > Toronto (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Ovcharenko, Olga, Schelter, Sebastian

Towards Cross-Modal Error Detection with Tables and Images

arXiv.org Artificial Intelligence Oct-15-2025

Ensuring data quality at scale remains a persistent challenge for large organizations. Despite recent advances, maintaining accurate and consistent data is still complex, especially when dealing with multiple data modalities. Traditional error detection and correction methods tend to focus on a single modality, typically a table, and often miss cross-modal errors that are common in domains like e-Commerce and healthcare, where image, tabular, and text data co-exist. To address this gap, we take an initial step towards cross-modal error detection in tabular data, by benchmarking several methods. Our evaluation spans four datasets and five baseline approaches. Among them, Cleanlab, a label error detection framework, and DataScope, a data valuation method, perform the best when paired with a strong AutoML framework, achieving the highest F1 scores. Our findings indicate that current methods remain limited, particularly when applied to heavy-tailed real-world data, motivating further research in this area.
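The abstract ranks detectors by F1 score. As a minimal sketch of how such a score could be computed for an error detector, assuming the detector returns a set of flagged row IDs and a ground-truth set of erroneous rows is available, consider the following. The function name and the toy row IDs are hypothetical and do not come from the paper.

```python
def error_detection_f1(flagged_rows, true_error_rows):
    """F1 of a detector that flags rows, scored against known erroneous rows."""
    flagged = set(flagged_rows)
    true_errors = set(true_error_rows)
    true_positives = len(flagged & true_errors)
    if true_positives == 0:
        return 0.0  # no correct detections: precision or recall is zero
    precision = true_positives / len(flagged)
    recall = true_positives / len(true_errors)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: the detector flags four rows, three rows are truly wrong.
f1 = error_detection_f1(flagged_rows=[1, 4, 7, 9], true_error_rows=[1, 4, 5])
print(round(f1, 3))
```

Here two of the four flagged rows are real errors (precision 0.5) and two of the three real errors are found (recall 2/3), so F1 is 4/7.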

data mining, large language model, machine learning, (16 more...)

2510.12383

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine (0.88)
Information Technology > Services (0.36)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)
Information Technology > Data Science > Data Quality > Data Cleaning (0.47)

Neural Information Processing Systems Oct-8-2025, 20:37:29 GMT

6aa9a05b929fb08ff46a58cab6cf860d-Paper-Datasets_and_Benchmarks.pdf

dataset, generative model, synthetic data, (14 more...)

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
South America > Brazil (0.04)
North America > United States > California (0.04)
(2 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Neural Information Processing Systems Aug-22-2025, 01:53:59 GMT

Supplement: Robustness to Label Noise Depends on the Shape of the Noise Distribution

Do the main claims made in the abstract and introduction accurately reflect the paper's ... Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] Instructions are ... We will provide code after internal review for release. Did you specify all the training details (e.g., data splits, hyperparameters, how they ... Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)? Did you include the total amount of compute and the type of resources used (e.g., type ... Did you include any new assets either in the supplemental material or as a URL? [N/A] Did you discuss whether and how consent was obtained from people whose data you're ... If you used crowdsourcing or conducted research with human subjects... (a) Proofs provided for theoretical results of Section 3. A.1 Uniform noise. Proof. Lemma 3.2: Let c, ϵ, m ... Lemma 3.5: Let c, ϵ, m ... The proof of Theorem 3.6 is identical to that of Theorem 3.3 except using the value of ... Fig. S3 shows the same results as Fig. S3, but with the accuracy results of the vanilla (no label noise ... Fig. S4 compares the clean test accuracy on 10-class, 5-dimensional synthetic data of two label-noise ... Each of the methods is run with default parameters found in the corresponding repositories. All of our experiments utilize the ResNet-32 architecture across all mitigation methods.

artificial intelligence, machine learning, noise, (17 more...)

Country:

North America > United States > New Mexico > Los Alamos County > Los Alamos (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada > Ontario > Toronto (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

MIT Technology Review Apr-25-2024, 12:59:45 GMT

Chatbot answers are all made up. This new tool helps you figure out which ones to trust.

Cleanlab hopes that its tool will make large language models more attractive to businesses worried about how much stuff they invent. "I think people know LLMs will change the world, but they've just got hung up on the damn hallucinations," says Cleanlab CEO Curtis Northcutt. Chatbots are quickly becoming the dominant way people look up information on a computer. Search engines are being redesigned around the technology. Office software used by billions of people every day to create everything from school assignments to marketing copy to financial reports now comes with chatbots built in.

chatbot answer, cleanlab, northcutt, (6 more...)

MIT Technology Review

Country: Europe > United Kingdom (0.06)

Industry: Education (0.57)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.79)

Seedat, Nabeel, Imrie, Fergus, van der Schaar, Mihaela

Dissecting Sample Hardness: A Fine-Grained Analysis of Hardness Characterization Methods for Data-Centric AI

arXiv.org Artificial Intelligence Mar-7-2024

Characterizing samples that are difficult to learn from is crucial to developing highly performant ML models. This has led to numerous Hardness Characterization Methods (HCMs) that aim to identify "hard" samples. However, there is a lack of consensus regarding the definition and evaluation of "hardness". Unfortunately, current HCMs have only been evaluated on specific types of hardness and often only qualitatively or with respect to downstream performance, overlooking the fundamental quantitative identification task. We address this gap by presenting a fine-grained taxonomy of hardness types. Additionally, we propose the Hardness Characterization Analysis Toolkit (H-CAT), which supports comprehensive and quantitative benchmarking of HCMs across the hardness taxonomy and can easily be extended to new HCMs, hardness types, and datasets. We use H-CAT to evaluate 13 different HCMs across 8 hardness types. This comprehensive evaluation encompassing over 14K setups uncovers strengths and weaknesses of different HCMs, leading to practical tips to guide HCM selection and future development. Our findings highlight the need for more comprehensive HCM evaluation, while we hope our hardness taxonomy and toolkit will advance the principled evaluation and uptake of data-centric AI methods.

allsh aum el2n, aum el2n, datamap, (14 more...)

2403.04551

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > Canada > Ontario > Toronto (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)

Genre:

Research Report > Experimental Study (0.67)
Research Report > New Finding (0.65)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.92)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

arXiv.org Artificial Intelligence Aug-4-2023

FPR Estimation for Fraud Detection in the Presence of Class-Conditional Label Noise

Tittelfitz, Justin

We consider the problem of estimating the false-/true-positive rate (FPR/TPR) for a binary classification model when there are incorrect labels (label noise) in the validation set. Our motivating application is fraud prevention, where accurate estimates of FPR are critical to preserving the experience for good customers, and where label noise is highly asymmetric. Existing methods seek to minimize the total error in the cleaning process: to avoid cleaning examples that are not noise, and to ensure cleaning of examples that are. This is an important measure of accuracy but insufficient to guarantee good estimates of the true FPR or TPR for a model, and we show that using the model to directly clean its own validation data leads to underestimates even if total error is low. This indicates a need for researchers to pursue methods that not only reduce total error but also seek to de-correlate cleaning error with model scores.
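To make the effect described in this abstract concrete, here is a small illustrative sketch. It is not the paper's method, and all data are hypothetical toy values. It computes FPR and TPR against true labels, against labels with asymmetric noise (some fraud recorded as legitimate), and against labels "cleaned" by a deliberately naive rule that relabels every flagged negative as fraud.

```python
def fpr_tpr(labels, flagged):
    """Return (FPR, TPR) for binary labels (1 = fraud) and model flags (1 = flagged)."""
    pairs = list(zip(labels, flagged))
    fp = sum(1 for y, f in pairs if y == 0 and f == 1)
    tn = sum(1 for y, f in pairs if y == 0 and f == 0)
    tp = sum(1 for y, f in pairs if y == 1 and f == 1)
    fn = sum(1 for y, f in pairs if y == 1 and f == 0)
    return fp / (fp + tn), tp / (tp + fn)


# Hypothetical validation set of 10 transactions.
true_labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
flagged = [1, 0, 0, 0, 0, 0, 1, 1, 1, 0]

# Asymmetric (class-conditional) noise: two fraud cases are recorded as legitimate.
noisy_labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

# Naive self-cleaning (an assumption for illustration): trust the model and
# relabel every flagged "legitimate" transaction as fraud.
cleaned_labels = [1 if f == 1 else y for y, f in zip(noisy_labels, flagged)]

true_fpr, true_tpr = fpr_tpr(true_labels, flagged)
noisy_fpr, noisy_tpr = fpr_tpr(noisy_labels, flagged)
cleaned_fpr, cleaned_tpr = fpr_tpr(cleaned_labels, flagged)
print(true_fpr, noisy_fpr, cleaned_fpr)
```

In this toy case the true FPR is 1/6, the noisy labels inflate it to 3/8, and the self-cleaned labels drive it to 0, even though the cleaning step mislabels only one of the ten examples. The cleaning errors fall exactly on the examples the model flags, so they are correlated with the model's scores.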

dataset, label noise, noise, (14 more...)

2308.02695

Country: North America > United States > Washington > King County > Seattle (0.04)

Genre: Research Report (1.00)

Industry:

Law Enforcement & Public Safety > Fraud (1.00)
Information Technology (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Larson, Stefan, Lim, Gordon, Leach, Kevin

On Evaluation of Document Classification using RVL-CDIP

arXiv.org Artificial Intelligence Jun-21-2023

The RVL-CDIP benchmark is widely used for measuring performance on the task of document classification. Despite its widespread use, we reveal several undesirable characteristics of the RVL-CDIP benchmark. These include (1) substantial amounts of label noise, which we estimate to be 8.1% (ranging from 1.6% to 16.9% per document category); (2) presence of many ambiguous or multi-label documents; (3) a large overlap between test and train splits, which can inflate model performance metrics; and (4) presence of sensitive personally-identifiable information like US Social Security numbers (SSNs). We argue that there is a risk in using RVL-CDIP for benchmarking document classifiers, as its limited scope, presence of errors (state-of-the-art models now achieve accuracy error rates that are within our estimated label error rate), and lack of diversity make it less than ideal for benchmarking. We further advocate for the creation of a new document classification benchmark, and provide recommendations for what characteristics such a resource should include.

category, machine learning, natural language, (19 more...)

2306.1255

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Tennessee > Davidson County > Nashville (0.04)
North America > United States > Minnesota (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government (0.88)
Law (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

#artificialintelligence Jan-11-2023, 18:45:18 GMT

Cleanlab: Correct your data labels automatically and quickly – Towards AI

Originally published on Towards AI. I used an open-sourced library, cleanlab, to remove low-quality labels on an image dataset. The model trained on the dataset without low-quality data gained 4 percentage points of accuracy compared to the baseline model (trained on all data). Improving data quality sounds easy enough. But the workload of manually checking data quality can quickly become insurmountable as the dataset scales.
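The general idea of flagging low-quality labels from a model's predicted probabilities can be sketched as follows. This is a simplified illustration in the spirit of confident learning, not the cleanlab API and not the article's code. The function names and toy probabilities are hypothetical, and it assumes every class has at least one labeled example.

```python
def class_thresholds(labels, pred_probs, n_classes):
    """Per-class threshold: mean predicted probability of class k over examples labeled k."""
    thresholds = []
    for k in range(n_classes):
        probs_k = [p[k] for y, p in zip(labels, pred_probs) if y == k]
        thresholds.append(sum(probs_k) / len(probs_k))  # assumes class k is non-empty
    return thresholds


def find_suspect_labels(labels, pred_probs, n_classes):
    """Return indices whose confidently predicted class disagrees with the given label."""
    thresholds = class_thresholds(labels, pred_probs, n_classes)
    suspects = []
    for i, (y, p) in enumerate(zip(labels, pred_probs)):
        # Classes the model predicts at or above that class's own threshold.
        confident = [k for k in range(n_classes) if p[k] >= thresholds[k]]
        if confident:
            likely = max(confident, key=lambda k: p[k])
            if likely != y:
                suspects.append(i)
    return suspects


# Hypothetical out-of-sample probabilities for six images and two classes.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9],
              [0.1, 0.9], [0.4, 0.6], [0.2, 0.8]]
suspects = find_suspect_labels(labels, pred_probs, n_classes=2)
print(suspects)
```

Example 2 is labeled class 0 but is confidently predicted as class 1, so it is the one flagged for review or removal. Example 4 is uncertain under both thresholds and is left alone.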

artificial intelligence, dataset, machine learning, (15 more...)

#artificialintelligence

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.30)