Language Model as Visual Explainer
–Neural Information Processing Systems
Central to our strategy is the collaboration between vision models and an LLM to craft explanations. On one hand, the LLM is harnessed to delineate hierarchical visual attributes, while concurrently, a text-to-image API retrieves images that are most aligned with these textual concepts. By mapping the collected texts and images into the vision model's embedding space, we construct a hierarchy-structured visual embedding tree. This tree is dynamically pruned and grown by querying the LLM with language templates, tailoring the explanation to the model. Such a scheme allows us to seamlessly incorporate new attributes while eliminating undesired concepts based on the model's representations. When applied to test samples, our method provides human-understandable explanations in the form of attribute-laden trees. Beyond explanation, we retrain the vision model by calibrating it on the generated concept hierarchy, allowing the model to incorporate the refined knowledge of visual attributes. To assess the effectiveness of our approach, we introduce new benchmarks and conduct rigorous evaluations, demonstrating its plausibility, faithfulness, and stability.
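The abstract describes mapping LLM-proposed attributes and retrieved images into a shared embedding space and pruning a hierarchy-structured tree against the model's representations. The sketch below is a minimal illustration of that idea, not the authors' released code: it assumes a CLIP-style model as the vision backbone, and the `AttributeNode`, `embed_text`, `embed_image`, and `prune` names, along with the similarity threshold, are hypothetical choices for exposition.

```python
# Illustrative sketch of a hierarchy-structured visual embedding tree.
# Assumption: a CLIP model stands in for the vision model whose embedding
# space the attribute tree lives in; names and thresholds are hypothetical.
from dataclasses import dataclass, field
from typing import List, Optional

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")


@dataclass
class AttributeNode:
    """One node of the tree: an LLM-proposed visual attribute."""
    attribute: str                      # e.g. "striped fur pattern"
    embedding: Optional[torch.Tensor] = None
    children: List["AttributeNode"] = field(default_factory=list)


def embed_text(text: str) -> torch.Tensor:
    """Map an attribute phrase into the vision model's embedding space."""
    inputs = processor(text=[text], return_tensors="pt", padding=True)
    with torch.no_grad():
        feats = model.get_text_features(**inputs)
    return torch.nn.functional.normalize(feats, dim=-1).squeeze(0)


def embed_image(image: Image.Image) -> torch.Tensor:
    """Map a retrieved or test image into the same embedding space."""
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    return torch.nn.functional.normalize(feats, dim=-1).squeeze(0)


def prune(node: AttributeNode, sample_emb: torch.Tensor,
          threshold: float = 0.2) -> Optional[AttributeNode]:
    """Keep only branches whose attributes align with a given test sample."""
    if node.embedding is None:
        node.embedding = embed_text(node.attribute)
    score = float(node.embedding @ sample_emb)
    kept = [c for c in (prune(c, sample_emb, threshold) for c in node.children)
            if c is not None]
    node.children = kept
    # Drop the branch if neither this node nor any descendant is relevant.
    if score < threshold and not kept:
        return None
    return node
```

In this reading, growing the tree corresponds to asking the LLM (via language templates) for finer-grained child attributes of a kept node, while pruning removes branches whose embeddings fail to align with the model's representation of the sample; the surviving attribute-laden tree serves as the explanation.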