AITopics

Technology: Information Technology > Artificial Intelligence > Natural Language (0.47)

Neural Information Processing SystemsFeb-11-2026, 09:45:17 GMT

A Details of Data Augmentation with External Knowledge Resources 486 4 Enhance Relation Recognition: We enriched the relationships between objects parsed from the

The hyperparameters for training are detailed in Table 7. We perform the human evaluation on two of the four in-depth knowledge quality assessment metrics. V alidity ( "): whether the generated visual knowledge is valid to humans . Conformity ( "): whether the generated knowledge faithfully depicts the scenarios in the images . Our calculated average pairwise Cohen's Suppose you are looking at an image that contains the following subject and object entities: Subject list: [Insert the subject names here] Object list: [Insert the object names here] Please extract 5-10 condensed descriptions that describe the interactions and/or relations among those entities in the image.

artificial intelligence, natural language, person2, (15 more...)

Country: Europe (0.04)

Industry: Leisure & Entertainment (0.47)

Technology: Information Technology > Artificial Intelligence > Natural Language (0.47)

Neural Information Processing SystemsFeb-11-2026, 09:45:14 GMT

49d1cf22327c51331cbd52bcb76a09a6-Paper-Conference.pdf

knowledge, openvik, visual knowledge, (13 more...)

Country:

Europe (0.04)
Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine (0.68)
Transportation (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Neural Information Processing SystemsOct-8-2025, 15:15:21 GMT

Open Visual Knowledge Extraction via Relation-Oriented Multimodality Model Prompting

Existing methods on visual knowledge extraction often rely on the predefined format (e.g., sub-verb-obj tuples) or vocabulary (e.g., relation types), restricting the expressiveness of the extracted knowledge. In this work, we take a first exploration to a new paradigm of open visual knowledge extraction.

data mining, large language model, machine learning, (19 more...)

Country:

Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(4 more...)

arXiv.org Artificial IntelligenceMar-5-2025

Enhancing Abnormality Grounding for Vision Language Models with Knowledge Descriptions

Li, Jun, Liu, Che, Bai, Wenjia, Arcucci, Rossella, Bercea, Cosmin I., Schnabel, Julia A.

Visual Language Models (VLMs) have demonstrated impressive capabilities in visual grounding tasks. However, their effectiveness in the medical domain, particularly for abnormality detection and localization within medical images, remains underexplored. A major challenge is the complex and abstract nature of medical terminology, which makes it difficult to directly associate pathological anomaly terms with their corresponding visual features. In this work, we introduce a novel approach to enhance VLM performance in medical abnormality detection and localization by leveraging decomposed medical knowledge. Instead of directly prompting models to recognize specific abnormalities, we focus on breaking down medical concepts into fundamental attributes and common visual patterns. This strategy promotes a stronger alignment between textual descriptions and visual features, improving both the recognition and localization of abnormalities in medical images.We evaluate our method on the 0.23B Florence-2 base model and demonstrate that it achieves comparable performance in abnormality grounding to significantly larger 7B LLaVA-based medical VLMs, despite being trained on only 1.5% of the data used for such models. Experimental results also demonstrate the effectiveness of our approach in both known and previously unseen abnormalities, suggesting its strong generalization capabilities.

abnormality, arxiv preprint arxiv, dataset, (13 more...)

2503.03278

Country:

North America > United States (0.14)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.05)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Switzerland (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)

arXiv.org Artificial IntelligenceJun-17-2024

Beyond Embeddings: The Promise of Visual Table in Visual Reasoning

Zhong, Yiwu, Hu, Zi-Yuan, Lyu, Michael R., Wang, Liwei

Visual representation learning has been a cornerstone in computer vision, involving typical forms such as visual embeddings, structural symbols, and text-based representations. Despite the success of CLIP-type visual embeddings, they often lack access to world knowledge critical for visual reasoning. In this work, we propose Visual Table, a novel form of visual representation tailored for visual reasoning. Visual tables are constructed as hierarchical descriptions of visual scenes, featuring a scene description and multiple object-centric descriptions covering categories, attributes, and knowledge. Thanks to the structural and textual formats, visual tables offer unique advantages over mere visual embeddings, such as interpretability and controllable editing. Furthermore, they deliver instance-level world knowledge and detailed attributes that are essential for visual reasoning. To create visual tables, we develop a generator trained on the dataset with collected, small-scale annotations. Extensive results on 11 visual reasoning benchmarks demonstrate that the generated visual tables significantly outperform previous structural and text-based representations. Moreover, they consistently enhance state-of-the-art multimodal large language models across diverse benchmarks, showcasing their potential for advancing visual reasoning tasks. Our code is available at https://github.com/LaVi-Lab/Visual-Table.

category, knowledge description, visual table, (12 more...)

2403.18252

Country:

North America > Mexico (0.14)
North America > United States > New Hampshire (0.04)
North America > United States > Vermont (0.04)
(16 more...)

Genre: Research Report (1.00)

Industry:

Education (0.92)
Leisure & Entertainment > Sports (0.68)
Health & Medicine > Pharmaceuticals & Biotechnology (0.68)
(3 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

arXiv.org Artificial IntelligenceOct-28-2023

Open Visual Knowledge Extraction via Relation-Oriented Multimodality Model Prompting

Cui, Hejie, Fang, Xinyu, Zhang, Zihan, Xu, Ran, Kan, Xuan, Liu, Xin, Yu, Yue, Li, Manling, Song, Yangqiu, Yang, Carl

Images contain rich relational knowledge that can help machines understand the world. Existing methods on visual knowledge extraction often rely on the pre-defined format (e.g., sub-verb-obj tuples) or vocabulary (e.g., relation types), restricting the expressiveness of the extracted knowledge. In this work, we take a first exploration to a new paradigm of open visual knowledge extraction. To achieve this, we present OpenVik which consists of an open relational region detector to detect regions potentially containing relational knowledge and a visual knowledge generator that generates format-free knowledge by prompting the large multimodality model with the detected region of interest. We also explore two data enhancement techniques for diversifying the generated format-free visual knowledge. Extensive knowledge quality evaluations highlight the correctness and uniqueness of the extracted open visual knowledge by OpenVik. Moreover, integrating our extracted knowledge across various visual reasoning applications shows consistent improvements, indicating the real-world applicability of OpenVik.

knowledge, openvik, visual knowledge, (13 more...)

2310.18804

Country:

Europe (0.04)
Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine (0.68)
Transportation (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(3 more...)

Cui, Wanyun, Chen, Xingran

Free Lunch for Efficient Textual Commonsense Integration in Language Models

arXiv.org Artificial IntelligenceMay-24-2023

Recent years have witnessed the emergence of textual commonsense knowledge bases, aimed at providing more nuanced and context-rich knowledge. The integration of external commonsense into language models has been shown to be a key enabler in advancing the state-of-the-art for a wide range of NLP tasks. However, incorporating textual commonsense descriptions is computationally expensive, as compared to encoding conventional symbolic knowledge. In this paper, we propose a method to improve its efficiency without modifying the model. We group training samples with similar commonsense descriptions into a single batch, thus reusing the encoded description across multiple samples. One key observation is that the upper bound of batch partitioning can be reduced to the classic {\it graph k-cut problem}. Consequently, we propose a spectral clustering-based algorithm to solve this problem. Extensive experiments illustrate that the proposed batch partitioning approach effectively reduces the computational cost while preserving performance. The efficiency improvement is more pronounced on larger datasets and on devices with more memory capacity, attesting to its practical utility for large-scale applications.

artificial intelligence, knowledge, natural language, (18 more...)

2305.15516

Country:

Europe > Italy > Tuscany > Florence (0.04)
North America > United States > Michigan (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
(3 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.69)

arXiv.org Artificial IntelligenceMar-28-2023

ChatGPT is a Knowledgeable but Inexperienced Solver: An Investigation of Commonsense Problem in Large Language Models

Bian, Ning, Han, Xianpei, Sun, Le, Lin, Hongyu, Lu, Yaojie, He, Ben

Large language models (LLMs) such as ChatGPT and GPT-4 have made significant progress in NLP. However, their ability to memorize, represent, and leverage commonsense knowledge has been a well-known pain point for LLMs. It remains unclear that: (1) Can GPTs effectively answer commonsense questions? (2) Are GPTs knowledgeable in commonsense? (3) Are GPTs aware of the underlying commonsense knowledge for answering a specific question? (4) Can GPTs effectively leverage commonsense for answering questions? To evaluate the above commonsense problems, we conduct a series of experiments to evaluate ChatGPT's commonsense abilities, and the experimental results show that: (1) GPTs can achieve good QA accuracy in commonsense tasks, while they still struggle with certain types of knowledge. (2) ChatGPT is knowledgeable, and can accurately generate most of the commonsense knowledge using knowledge prompts. (3) Despite its knowledge, ChatGPT is an inexperienced commonsense problem solver, which cannot precisely identify the needed commonsense knowledge for answering a specific question, i.e., ChatGPT does not precisely know what commonsense knowledge is required to answer a question. The above findings raise the need to investigate better mechanisms for utilizing commonsense knowledge in LLMs, such as instruction following, better commonsense guidance, etc.

large language model, machine learning, natural language, (19 more...)

2303.16421

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)