AITopics | selfclean

Intrinsic Self-Supervision for Data Quality Audits

Neural Information Processing Systems | Dec-26-2025, 21:03:44 GMT

artificial intelligence, data quality, proceedings, (7 more...)

Neural Information Processing Systems

Technology:

Information Technology > Data Science > Data Quality (0.37)
Information Technology > Artificial Intelligence (0.37)


Representation-Based Data Quality Audits for Audio

Gonzalez-Jimenez, Alvaro, Gröger, Fabian, Wermelinger, Linda, Bürli, Andrin, Kastanis, Iason, Lionetti, Simone, Pouly, Marc

arXiv.org Artificial Intelligence | Oct-1-2025

ABSTRACT

Data quality issues such as off-topic samples, near duplicates, and label errors often limit the performance of audio-based systems. This approach leverages self-supervised audio representations to identify these common data quality issues, creating ranked review lists that surface distinct issues within a single, unified process. The method is benchmarked on ESC-50, GTZAN, and a proprietary industrial dataset, using both synthetic and naturally occurring corruptions. The results demonstrate that this framework achieves state-of-the-art ranking performance, often outperforming issue-specific baselines and enabling significant annotation savings by efficiently guiding human review.

Index Terms: Data quality, dataset auditing, representation learning, near-duplicate detection, label errors

1. INTRODUCTION

High-stakes audio applications, from predictive maintenance and safety monitoring to large-scale media search, depend on data that is abundant and trustworthy [1, 2, 3].
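To make the idea of ranked review lists concrete, the following is a minimal sketch of how distances between sample representations could be turned into three rankings, one each for near duplicates, off-topic samples, and label errors. It is an illustration under stated assumptions, not the authors' method: the scoring rules (closest pair, nearest-neighbor isolation, same-label versus other-label distance) are simple heuristics chosen here, the function names are hypothetical, and the hand-written 2-D vectors stand in for real self-supervised audio embeddings.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def pairwise_distances(embeddings):
    """Build a symmetric matrix of cosine distances."""
    n = len(embeddings)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = cosine_distance(embeddings[i], embeddings[j])
    return dist


def rank_near_duplicates(dist):
    """Rank index pairs (i, j), closest pair first (assumed heuristic)."""
    pairs = combinations(range(len(dist)), 2)
    return sorted(pairs, key=lambda p: dist[p[0]][p[1]])


def rank_off_topic(dist):
    """Rank samples by distance to their nearest neighbor, most isolated first."""
    n = len(dist)
    nearest = [min(dist[i][j] for j in range(n) if j != i) for i in range(n)]
    return sorted(range(n), key=lambda i: nearest[i], reverse=True)


def rank_label_errors(dist, labels):
    """Rank samples that sit closer to another class than to their own, worst first."""
    n = len(dist)
    max_dist = 2.0  # upper bound of cosine distance, used when a group is empty
    scores = []
    for i in range(n):
        same = [dist[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        other = [dist[i][j] for j in range(n) if labels[j] != labels[i]]
        scores.append(min(same, default=max_dist) - min(other, default=max_dist))
    return sorted(range(n), key=lambda i: scores[i], reverse=True)


# Toy stand-ins for audio embeddings (hypothetical data, not from the paper).
embeddings = [
    [1.0, 0.0],    # 0: "dog"
    [0.999, 0.01], # 1: "dog", almost identical to sample 0
    [0.9, 0.3],    # 2: "dog"
    [0.0, 1.0],    # 3: "rain"
    [0.1, 1.0],    # 4: "rain"
    [0.05, 1.0],   # 5: labeled "dog" but located in the "rain" cluster
    [-1.0, -1.0],  # 6: "rain", far from every other sample
]
labels = ["dog", "dog", "dog", "rain", "rain", "dog", "rain"]

dist = pairwise_distances(embeddings)
print("near-duplicate candidates:", rank_near_duplicates(dist)[:3])
print("off-topic candidates:", rank_off_topic(dist)[:3])
print("label-error candidates:", rank_label_errors(dist, labels)[:3])
```

A reviewer would work through each list from the top and stop once the flagged items no longer look problematic, which is where the annotation savings mentioned in the abstract would come from. On this toy data the first entries are the pair (0, 1), sample 6, and sample 5, respectively.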

data mining, machine learning, selfclean, (20 more...)

arXiv.org Artificial Intelligence

2509.26291

Country: Europe > Switzerland > Basel-City > Basel (0.05)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Data Science > Data Quality (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.47)
