curator



"Draw me a curator" Examining the visual stereotyping of a cultural services profession by generative AI

Spennemann, Dirk HR

arXiv.org Artificial Intelligence

Based on 230 visualisations, this paper examines the depiction of museum curators by the popular generative Artificial Intelligence (AI) model ChatGPT4o. While the AI-generated representations do not reiterate popular stereotypes of curators as nerdy, conservative in dress, and stuck in time rummaging through collections, they contrast sharply with real-world demographics. AI-generated imagery severely underrepresents women (3.5% vs 49% to 72% in reality) and disregards ethnic communities other than Caucasian (0% vs 18% to 36%). It not only over-represents young curators (79% vs approx. 27%) but also renders curators to resemble yuppie professionals or people featured in fashion advertising. Stereotypical attributes are prevalent, with curators widely depicted as wearing beards and holding clipboards or digital tablets. The findings highlight biases in the generative AI image-creation dataset, which is poised to shape an inaccurate portrayal of museum professionals if the images are taken uncritically at face value.


Prompt fidelity of ChatGPT4o / Dall-E3 text-to-image visualisations

Spennemann, Dirk HR

arXiv.org Artificial Intelligence

This study examines the prompt fidelity of ChatGPT4o / DALL-E3 text-to-image visualisations by analysing whether attributes explicitly specified in autogenously generated prompts are correctly rendered in the resulting images. Using two public-domain datasets comprising 200 visualisations of women working in the cultural and creative industries and 230 visualisations of museum curators, the study assessed accuracy across personal attributes (age, hair), appearance (attire, glasses), and paraphernalia (name tags, clipboards). While most attributes were rendered correctly, DALL-E3 deviated from prompt specifications in 15.6% of all attributes (n=710). Errors were lowest for paraphernalia, moderate for personal appearance, and highest for depictions of the person themselves, particularly age. These findings demonstrate measurable prompt-to-image fidelity gaps with implications for bias detection and model evaluation.
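The fidelity audit described above reduces to a simple tally: each prompted attribute is checked against the rendered image, and the deviation rate is the share of attributes not rendered as specified. A minimal sketch, with invented records standing in for the study's actual attribute-by-attribute data:

```python
# Hypothetical audit records: (attribute category, rendered correctly?)
# The categories mirror those in the abstract; the values are invented.
from collections import Counter

records = [
    ("age", False), ("age", True), ("hair", True),
    ("attire", True), ("glasses", False),
    ("name tag", True), ("clipboard", True),
]

total = len(records)
errors = sum(1 for _, ok in records if not ok)
print(f"deviation rate: {errors / total:.1%}")  # share of attributes mis-rendered

# Break errors down by attribute type to see where the model fails most often.
per_category = Counter(cat for cat, ok in records if not ok)
print(dict(per_category))
```

Applied to the real data, the same tally over n=710 attributes would yield the 15.6% deviation rate reported in the abstract.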




Can We Measure the Impact of a Database?

Communications of the ACM

This is undoubtedly the case for scientific and statistical databases, which have largely replaced traditional reference works. Database and Web technologies have led to an explosion in the number of databases that support scientific research, for obvious reasons: Databases provide faster communication of knowledge, hold larger volumes of data, are more easily searched, and are both human- and machine-readable. Moreover, they can be developed rapidly and collaboratively by a mixture of researchers and curators. For example, more than 1,500 curated databases are relevant to molecular biology alone [10]. The value of these databases lies not only in the data they present but also in how they organize that data.


WhACC: Whisker Automatic Contact Classifier with Expert Human-Level Performance

Maire, Phillip, King, Samson G., Cheung, Jonathan Andrew, Walker, Stefanie, Hires, Samuel Andrew

arXiv.org Artificial Intelligence

The rodent vibrissal system is pivotal in advancing neuroscience research, particularly for studies of cortical plasticity, learning, decision-making, sensory encoding, and sensorimotor integration. Despite these advantages, curating touch events is labor-intensive and often requires >3 hours per million video frames, even after leveraging automated tools like the Janelia Whisker Tracker. We address this limitation by introducing the Whisker Automatic Contact Classifier (WhACC), a Python package designed to identify touch periods from high-speed videos of head-fixed behaving rodents with human-level performance. WhACC leverages ResNet50V2 for feature extraction, combined with LightGBM for classification. Performance is assessed against three expert human curators on over one million frames. Pairwise touch-classification agreement was achieved on 99.5% of video frames, equal to between-human agreement. Finally, we offer a custom retraining interface to allow model customization on a small subset of data, which was validated on four million frames across 16 single-unit electrophysiology recordings. Including this retraining step, we reduce the human hours required to curate a 100-million-frame dataset from ~333 hours to ~6 hours.
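The two-stage design the abstract describes (a frozen CNN backbone producing per-frame features, followed by a gradient-boosted classifier labelling each frame touch / no-touch) can be sketched as follows. This is an illustrative outline, not the WhACC package itself: random vectors stand in for ResNet50V2 features, synthetic labels stand in for expert-curated touch events, and scikit-learn's GradientBoostingClassifier stands in for LightGBM so the example runs without extra dependencies.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Stand-in for pooled CNN features: one feature vector per video frame
# (dimensions reduced from ResNet50V2's 2048 to keep the demo fast).
n_frames, n_features = 600, 64
X = rng.normal(size=(n_frames, n_features))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # synthetic touch labels

# Train on the first half ("expert-curated" frames), predict the rest --
# analogous to WhACC's small-subset retraining step.
clf = GradientBoostingClassifier(n_estimators=50, random_state=0)
clf.fit(X[:300], y[:300])
pred = clf.predict(X[300:])

# Pairwise agreement with the held-out "human" labels.
agreement = (pred == y[300:]).mean()
print(f"frame-level agreement: {agreement:.1%}")
```

The appeal of this split is that the expensive feature extraction runs once per frame, while retraining for a new rig or animal only refits the cheap boosted classifier.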


CurateGPT: A flexible language-model assisted biocuration tool

Caufield, Harry, Kroll, Carlo, O'Neil, Shawn T, Reese, Justin T, Joachimiak, Marcin P, Hegde, Harshad, Harris, Nomi L, Krishnamurthy, Madan, McLaughlin, James A, Smedley, Damian, Haendel, Melissa A, Robinson, Peter N, Mungall, Christopher J

arXiv.org Artificial Intelligence

Effective data-driven biomedical discovery requires data curation: a time-consuming process of finding, organizing, distilling, integrating, interpreting, annotating, and validating diverse information into a structured form suitable for databases and knowledge bases. Accurate and efficient curation of these digital assets is critical to ensuring that they are FAIR, trustworthy, and sustainable. Unfortunately, expert curators face significant time and resource constraints. The rapid pace of new information being published daily is exceeding their capacity for curation. Generative AI, exemplified by instruction-tuned large language models (LLMs), has opened up new possibilities for assisting human-driven curation. The design philosophy of agents combines the emerging abilities of generative AI with more precise methods. A curator's tasks can be aided by agents that perform reasoning, search ontologies, and integrate knowledge across external sources, all tasks that would otherwise require extensive manual work. Our LLM-driven annotation tool, CurateGPT, melds the power of generative AI with trusted knowledge bases and literature sources. CurateGPT streamlines the curation process, enhancing collaboration and efficiency in common workflows. Compared to direct interaction with an LLM, CurateGPT's agents enable access to information beyond that in the LLM's training data, and they provide direct links to the data supporting each claim. This helps curators, researchers, and engineers scale up curation efforts to keep pace with the ever-increasing volume of scientific data.


DCA-Bench: A Benchmark for Dataset Curation Agents

Huang, Benhao, Yu, Yingzhuo, Huang, Jin, Zhang, Xingjian, Ma, Jiaqi

arXiv.org Artificial Intelligence

The quality of datasets plays an increasingly crucial role in the research and development of modern artificial intelligence (AI). Despite today's proliferation of open dataset platforms, data quality issues, such as insufficient documentation, inaccurate annotations, and ethical concerns, remain common in datasets widely used in AI. Furthermore, these issues are often subtle and difficult to detect with rule-based scripts, requiring expensive manual identification and verification by dataset users or maintainers. With the increasing capability of large language models (LLMs), it is promising to streamline the curation of datasets with LLM agents. In this work, as an initial step towards this goal, we propose a dataset curation agent benchmark, DCA-Bench, to measure LLM agents' capability of detecting hidden dataset quality issues. Specifically, we collect diverse real-world dataset quality issues from eight open dataset platforms as a testbed. Additionally, to establish an automatic pipeline for evaluating the success of LLM agents, which requires a nuanced understanding of agent outputs, we implement a dedicated Evaluator using another LLM agent. We demonstrate that the LLM-based Evaluator empirically aligns well with human evaluation, allowing reliable automatic evaluation on the proposed benchmark. We further conduct experiments with several baseline LLM agents on the proposed benchmark and demonstrate the complexity of the task, indicating that applying LLMs to real-world dataset curation still requires further in-depth exploration and innovation. Finally, the proposed benchmark can also serve as a testbed for measuring the capability of LLMs in problem discovery rather than just problem-solving. The benchmark suite is available at \url{https://github.com/TRAIS-Lab/dca-bench}.


A Taxonomy of Challenges to Curating Fair Datasets

Zhao, Dora, Scheuerman, Morgan Klaus, Chitre, Pooja, Andrews, Jerone T. A., Panagiotidou, Georgia, Walker, Shawn, Pine, Kathleen H., Xiang, Alice

arXiv.org Artificial Intelligence

Despite extensive efforts to create fairer machine learning (ML) datasets, there remains a limited understanding of the practical aspects of dataset curation. Drawing from interviews with 30 ML dataset curators, we present a comprehensive taxonomy of the challenges and trade-offs encountered throughout the dataset curation lifecycle. Our findings underscore overarching issues within the broader fairness landscape that impact data curation. We conclude with recommendations aimed at fostering systemic changes to better facilitate fair dataset curation practices.