AITopics | original description

Technology: Information Technology > Artificial Intelligence > Natural Language (0.71)

Neural Information Processing Systems | Feb-18-2026, 00:02:50 GMT

c37d94c04effc86d72ab2258ba9b76c7-Paper-Datasets_and_Benchmarks_Track.pdf

large language model, machine learning, natural language, (18 more...)

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Illinois > Champaign County > Urbana (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
(3 more...)

Genre: Research Report (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.98)
Information Technology > Sensing and Signal Processing > Image Processing (0.94)
(2 more...)

Neural Information Processing Systems | Feb-12-2026, 07:33:15 GMT

VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understanding (Supplementary Materials) Houlun Chen

Finally, generate a new brief description mainly based on the original description with attributes information incorporated. The narratives should be similar to the original description.

artificial intelligence, information, machine learning, (19 more...)

Country: Asia > China > Beijing > Beijing (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Vision > Video Understanding (0.40)

arXiv.org Artificial Intelligence | Dec-5-2025

Human-Centred Evaluation of Text-to-Image Generation Models for Self-expression of Mental Distress: A Dataset Based on GPT-4o

He, Sui, Qian, Shenbin

Effective communication is central to achieving positive healthcare outcomes in mental health contexts, yet international students often face linguistic and cultural barriers that hinder their communication of mental distress. In this study, we evaluate the effectiveness of AI-generated images in supporting self-expression of mental distress. To achieve this, twenty Chinese international students studying at UK universities were invited to describe their personal experiences of mental distress. These descriptions were elaborated using GPT-4o with four persona-based prompt templates rooted in contemporary counselling practice to generate corresponding images. Participants then evaluated the helpfulness of generated images in facilitating the expression of their feelings based on their original descriptions. The resulting dataset comprises 100 textual descriptions of mental distress, 400 generated images, and corresponding human evaluation scores. Findings indicate that prompt design substantially affects perceived helpfulness, with the illustrator persona achieving the highest ratings. This work introduces the first publicly available text-to-image evaluation dataset with human judgment scores in the mental health domain, offering valuable resources for image evaluation, reinforcement learning with human feedback, and multi-modal research on mental health communication.

large language model, machine learning, natural language, (17 more...)

2512.04087

Country:

Europe > United Kingdom (0.15)
North America > United States (0.14)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Neural Information Processing Systems | Oct-10-2025, 15:54:41 GMT

Appendix for Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions

In Table 3, we demonstrate the prompt for textualized recaptioning.

original description, relative size proportion, relative spatial positioning, (12 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language (0.71)

Neural Information Processing Systems | Oct-10-2025, 15:54:38 GMT

Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions

Image description datasets play a crucial role in the advancement of various applications such as image understanding, text-to-image generation, and text-image retrieval. Currently, image description datasets primarily originate from two sources. One source is the scraping of image-text pairs from the web. Despite their abundance, these descriptions are often of low quality and noisy.

image description, information, zhang, (15 more...)

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Illinois > Champaign County > Urbana (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
(3 more...)

Genre: Research Report (0.93)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Nugroho, Yusuf Sulistyo, Salam, Farah Danisha, Reid, Brittany, Kula, Raula Gaikovina, Shimari, Kazumasa, Matsumoto, Kenichi

Uncovering Intention through LLM-Driven Code Snippet Description Generation

arXiv.org Artificial Intelligence | Jun-19-2025

Documenting code snippets is essential to pinpoint key areas where both developers and users should pay attention. Examples include usage examples and other Application Programming Interfaces (APIs), which are especially important for third-party libraries. With the rise of Large Language Models (LLMs), the key goal is to investigate the kinds of description developers commonly use and evaluate how well an LLM, in this case Llama, can support description generation. We use NPM Code Snippets, consisting of 185,412 packages with 1,024,579 code snippets. From there, we use 400 code snippets (and their descriptions) as samples. First, our manual classification found that the majority of original descriptions (55.5%) highlight example-based usage. This finding emphasizes the importance of clear documentation, as some descriptions lacked sufficient detail to convey intent. Second, the LLM correctly identified the majority of original descriptions as "Example" (79.75%), which is identical to our manual finding, showing a propensity for generalization. Third, compared to the originals, the produced description had an average similarity score of 0.7173, suggesting relevance but room for improvement. Scores below 0.9 indicate some irrelevance. Our results show that depending on the task of the code snippet, the intention of the document may differ from being instructions for usage, installations, or descriptive learning examples for any user of a library.

large language model, machine learning, natural language, (16 more...)

2506.15453

Country:

Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.05)
Asia > Indonesia (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Althebeiti, Hattan, Alkinoon, Mohammed, Mohaisen, Manar, Salem, Saeed, Nyang, DaeHun, Mohaisen, David

Enhancing Vulnerability Reports with Automated and Augmented Description Summarization

arXiv.org Artificial Intelligence | Apr-30-2025

Public vulnerability databases, such as the National Vulnerability Database (NVD), document vulnerabilities and facilitate threat information sharing. However, they often suffer from short descriptions and outdated or insufficient information. In this paper, we introduce Zad, a system designed to enrich NVD vulnerability descriptions by leveraging external resources. Zad consists of two pipelines: one collects and filters supplementary data using two encoders to build a detailed dataset, while the other fine-tunes a pre-trained model on this dataset to generate enriched descriptions. By addressing brevity and improving content quality, Zad produces more comprehensive and cohesive vulnerability descriptions. We evaluate Zad using standard summarization metrics and human assessments, demonstrating its effectiveness in enhancing vulnerability information.

data mining, large language model, machine learning, (24 more...)

2504.20726

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
(3 more...)

Graziani, Mara, Molnar, Malina, Morales, Irina Espejo, Cadow-Gossweiler, Joris, Laino, Teodoro

Making Sense of Data in the Wild: Data Analysis Automation at Scale

arXiv.org Artificial Intelligence | Jan-27-2025

As the volume of publicly available data continues to grow, researchers face the challenge of limited diversity in benchmarking machine learning tasks. Although thousands of datasets are available in public repositories, the sheer abundance often complicates the search for suitable data, leaving many valuable datasets underexplored. This situation is further amplified by the fact that, despite longstanding advocacy for improving data curation quality, current solutions remain prohibitively time-consuming and resource-intensive. In this paper, we propose a novel approach that combines intelligent agents with retrieval augmented generation to automate data analysis, dataset curation and indexing at scale. Our system leverages multiple agents to analyze raw, unstructured data across public repositories, generating dataset reports and interactive visual indexes that can be easily explored. We demonstrate that our approach results in more detailed dataset descriptions, higher hit rates and greater diversity in dataset retrieval tasks. Additionally, we show that the dataset reports generated by our method can be leveraged by other machine learning models to improve the performance on specific tasks, such as improving the accuracy and realism of synthetic data generation. By streamlining the process of transforming raw data into machine-learning-ready datasets, our approach enables researchers to better utilize existing data resources.

large language model, machine learning, natural language, (20 more...)