MARIC: Multi-Agent Reasoning for Image Classification

Seo, Wonduk, Yu, Minhyeong, An, Hyunjin, Lee, Seunghyun

arXiv.org Artificial Intelligence

Image classification has traditionally relied on parameter-intensive model training, requiring large-scale annotated datasets and extensive fine-tuning to achieve competitive performance. While recent vision-language models (VLMs) alleviate some of these constraints, they remain limited by their reliance on single-pass representations, often failing to capture complementary aspects of visual content. In this paper, we introduce Multi-Agent based Reasoning for Image Classification (MARIC), a multi-agent framework that reformulates image classification as a collaborative reasoning process. MARIC first utilizes an Outliner Agent to analyze the global theme of the image and generate targeted prompts. Based on these prompts, three Aspect Agents extract fine-grained descriptions along distinct visual dimensions. Finally, a Reasoning Agent synthesizes these complementary outputs through an integrated reflection step, producing a unified representation for classification. By explicitly decomposing the task into multiple perspectives and encouraging reflective synthesis, MARIC mitigates the shortcomings of both parameter-heavy training and monolithic VLM reasoning. Experiments on four diverse image classification benchmark datasets demonstrate that MARIC significantly outperforms baselines, highlighting the effectiveness of multi-agent visual reasoning for robust and interpretable image classification.
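The abstract's Outliner → Aspect Agents → Reasoning Agent pipeline can be sketched as follows. This is a minimal illustrative sketch only: the `call_vlm` stub, the three aspect dimensions, and the prompt wording are assumptions, not MARIC's actual prompts or model.

```python
def call_vlm(prompt, image):
    # Stub standing in for a real vision-language model query.
    return f"response to: {prompt}"

def classify(image, labels):
    # 1. Outliner Agent: analyze the global theme and derive targeted prompts.
    theme = call_vlm("Describe the global theme of this image.", image)
    aspect_prompts = [
        f"Given the theme '{theme}', describe the objects present.",
        f"Given the theme '{theme}', describe colors and textures.",
        f"Given the theme '{theme}', describe the spatial layout.",
    ]
    # 2. Three Aspect Agents: fine-grained descriptions along distinct dimensions.
    aspects = [call_vlm(p, image) for p in aspect_prompts]
    # 3. Reasoning Agent: reflective synthesis of the complementary outputs
    #    into a single classification decision.
    synthesis_prompt = (
        "Reflect on these complementary descriptions and choose one label "
        f"from {labels}:\n" + "\n".join(aspects)
    )
    return call_vlm(synthesis_prompt, image)
```

The key design point the abstract highlights is that each stage consumes the previous stage's output, so the final decision is conditioned on multiple explicit perspectives rather than a single forward pass.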


CodeV: Issue Resolving with Visual Data

Zhang, Linhao, Zan, Daoguang, Yang, Quanshun, Huang, Zhirong, Chen, Dong, Shen, Bo, Liu, Tianyu, Gong, Yongshun, Huang, Pengjie, Lu, Xudong, Liang, Guangtai, Cui, Lizhen, Wang, Qianxiang

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have advanced rapidly in recent years, with their applications in software engineering expanding to more complex repository-level tasks. GitHub issue resolving is a key challenge among these tasks. While recent approaches have made progress on this task, they focus on textual data within issues, neglecting visual data. However, this visual data is crucial for resolving issues as it conveys additional knowledge that text alone cannot. We propose CodeV, the first approach to leveraging visual data to enhance the issue-resolving capabilities of LLMs. CodeV resolves each issue by following a two-phase process: data processing and patch generation. To evaluate CodeV, we construct a benchmark for visual issue resolving, namely Visual SWE-bench. Through extensive experiments, we demonstrate the effectiveness of CodeV, as well as provide valuable insights into leveraging visual data to resolve GitHub issues.


Leveraging Ontologies to Document Bias in Data

Russo, Mayra, Vidal, Maria-Esther

arXiv.org Artificial Intelligence

The breakthroughs and benefits attributed to big data and, consequently, to machine learning (ML), or AI, systems [1, 2], have also made prevalent how these systems are capable of producing unexpected, biased, and in some cases undesirable output [3, 4, 5]. Seminal work on bias (i.e., prejudice for or against a person or group, especially in a way considered to be unfair) in the context of ML systems demonstrates how facial recognition tools and popular search engines can exacerbate demographic disparities, worsening the marginalization of minorities at the individual and group level [6, 7]. Further, biases in news recommenders and social media feeds actively play a role in conditioning and manipulating people's behavior and amplifying individual and public opinion polarization [8, 9]. In this context, the last few years have seen the consolidation of the Trustworthy AI framework, led in large part by regulatory bodies [10], with the objective of guiding commercial AI development to proactively account for ethical, legal, and technical dimensions [11]. This framework is also accompanied by the call to establish standards across the field in order to ensure AI systems are safe, secure, and fair upon deployment [11]. In terms of AI bias, many efforts have been concentrated on devising methods that can improve its identification, understanding, measurement, and mitigation [12]. For example, the special publication prepared by the National Institute of Standards and Technology (NIST) proposes a thorough, though not exhaustive, categorization of different types of bias in AI beyond common computational definitions (see Figure 1 for core hierarchy) [13].
In this same direction, some scholars advocate for practices that account for the characteristics of ML pipelines (i.e., datasets, ML algorithms, and the user interaction loop) [14] to enable actors concerned with their research, development, regulation, and use to inspect all the actions performed across the engineering process, with the objective of increasing trust not only in the development processes but also in the systems themselves [15, 16, 17, 18].


Motion Generation from Fine-grained Textual Descriptions

Li, Kunhang, Feng, Yansong

arXiv.org Artificial Intelligence

The task of text2motion is to generate human motion sequences from given textual descriptions, where the model explores diverse mappings from natural language instructions to human body movements. While most existing works are confined to coarse-grained motion descriptions, e.g., "A man squats.", fine-grained descriptions specifying the movements of relevant body parts are barely explored. Models trained with coarse-grained texts may not be able to learn mappings from fine-grained motion-related words to motion primitives, resulting in the failure to generate motions from unseen descriptions. In this paper, we build a large-scale language-motion dataset specializing in fine-grained textual descriptions, FineHumanML3D, by feeding GPT-3.5-turbo step-by-step instructions accompanied by compulsory pseudo-code checks. Accordingly, we design a new text2motion model, FineMotionDiffuse, which makes full use of fine-grained textual information. Our quantitative evaluation shows that FineMotionDiffuse trained on FineHumanML3D improves FID by a large margin of 0.38 compared with competitive baselines. According to the qualitative evaluation and case study, our model outperforms MotionDiffuse in generating spatially or chronologically composite motions, by learning the implicit mappings from fine-grained descriptions to the corresponding basic motions. We release our data at https://github.com/KunhangL/finemotiondiffuse.


Description-Enhanced Label Embedding Contrastive Learning for Text Classification

Zhang, Kun, Wu, Le, Lv, Guangyi, Chen, Enhong, Ruan, Shulan, Liu, Jing, Zhang, Zhiqiang, Zhou, Jun, Wang, Meng

arXiv.org Artificial Intelligence

Text classification is one of the fundamental tasks in natural language processing, requiring an agent to determine the most appropriate category for input sentences. Recently, deep neural networks have achieved impressive performance in this area, especially Pre-trained Language Models (PLMs). Usually, these methods concentrate on input sentences and the generation of corresponding semantic embeddings. However, for another essential component, labels, most existing works either treat them as meaningless one-hot vectors or use vanilla embedding methods to learn label representations along with model training, underestimating the semantic information and guidance that these labels reveal. To alleviate this problem and better exploit label information, in this paper we employ Self-Supervised Learning (SSL) in the model learning process and design a novel self-supervised Relation of Relation (R2) classification task to utilize labels from a one-hot perspective. We then propose a novel Relation of Relation Learning Network (R2-Net) for text classification, in which text classification and R2 classification are treated as joint optimization targets. Meanwhile, triplet loss is employed to enhance the analysis of differences and connections among labels. Moreover, considering that one-hot usage still falls short of fully exploiting label information, we incorporate external knowledge from WordNet to obtain multi-aspect descriptions for label semantic learning and extend R2-Net to a novel Description-Enhanced Label Embedding network (DELE) from a label embedding perspective. ...
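The triplet loss mentioned in the abstract is the standard margin-based formulation: pull an anchor representation toward a positive (same-label) embedding while pushing it away from a negative (different-label) embedding by at least a margin. A minimal sketch follows; the toy vectors and margin value are illustrative assumptions, not DELE's actual training configuration.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Standard triplet loss: loss = max(0, d(a, p) - d(a, n) + margin),
    # using Euclidean distance. The loss is zero once the negative is
    # at least `margin` farther from the anchor than the positive.
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)
```

In a label-embedding setup like the one the abstract describes, the anchor would be a sentence representation and the positive/negative would be the embeddings of its true label and a contrasting label, encouraging the model to separate label semantics in the shared space.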