relevancy map
FastRM: An efficient and automatic explainability framework for multimodal generative models

Stan, Gabriela Ben-Melech, Aflalo, Estelle, Luo, Man, Rosenman, Shachar, Le, Tiep, Paul, Sayak, Tseng, Shao-Yen, Lal, Vasudev

arXiv.org Artificial Intelligence

While Large Vision Language Models (LVLMs) have become highly capable at reasoning over human prompts and visual inputs, they are still prone to producing responses that contain misinformation. Identifying incorrect responses that are not grounded in evidence has become a crucial task in building trustworthy AI. Explainability methods such as gradient-based relevancy maps on LVLM outputs can provide insight into the decision process of models; however, these methods are often computationally expensive and not suited for on-the-fly validation of outputs. In this work, we propose FastRM, an efficient method for predicting the explainable relevancy maps of LVLMs. Experimental results show that employing FastRM leads to a 99.8% reduction in compute time for relevancy map generation and a 44.4% reduction in memory footprint for the evaluated LVLM, making explainable AI more efficient and practical and thereby facilitating its deployment in real-world applications.
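The gradient-based relevancy maps that FastRM learns to approximate are typically computed by weighting attention with its gradient and propagating the result across layers. A minimal NumPy sketch of that aggregation (in the style of attention-relevancy propagation; the shapes and the `relevancy_map` helper are illustrative, not FastRM's actual code):

```python
import numpy as np

def relevancy_map(attentions, gradients):
    """Gradient-weighted relevancy aggregation (illustrative sketch).

    attentions, gradients: lists of arrays of shape (heads, tokens, tokens),
    one pair per transformer layer. Returns a (tokens, tokens) relevancy
    matrix, where row i scores how relevant each token is to token i.
    """
    tokens = attentions[0].shape[-1]
    R = np.eye(tokens)  # identity: each token starts relevant to itself
    for A, G in zip(attentions, gradients):
        # weight attention by its gradient, keep positive contributions,
        # and average over heads
        cam = np.clip(A * G, 0, None).mean(axis=0)
        R = R + cam @ R  # propagate relevance through the layer
    return R

# toy example: 2 layers, 2 heads, 4 tokens
rng = np.random.default_rng(0)
atts = [rng.random((2, 4, 4)) for _ in range(2)]
grads = [rng.standard_normal((2, 4, 4)) for _ in range(2)]
R = relevancy_map(atts, grads)
```

The expensive part in practice is the backward pass needed to obtain `gradients` for every generated token, which is what motivates predicting the map directly instead.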


From Words to Poses: Enhancing Novel Object Pose Estimation with Vision Language Models

Pulli, Tessa, Thalhammer, Stefan, Schwaiger, Simon, Vincze, Markus

arXiv.org Artificial Intelligence

Robots are increasingly envisioned to interact in real-world scenarios, where they must continuously adapt to new situations. To detect and grasp novel objects, zero-shot pose estimators determine poses without prior knowledge. Recently, vision language models (VLMs) have shown considerable advances in robotics applications by establishing an understanding between language input and image input. In our work, we take advantage of VLMs' zero-shot capabilities and translate this ability to 6D object pose estimation. We propose a novel framework for promptable zero-shot 6D object pose estimation using language embeddings. The idea is to derive a coarse location of an object based on the relevancy map of a language-embedded NeRF reconstruction and to compute the pose estimate with a point cloud registration method. Additionally, we provide an analysis of LERF's suitability for open-set object pose estimation. We examine hyperparameters, such as activation thresholds for relevancy maps, and investigate the zero-shot capabilities at the instance and category level. Furthermore, we plan to conduct robotic grasping experiments in a real-world setting.
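The coarse-localization step described above can be sketched as thresholding per-point relevancy scores and taking the centroid of the surviving points. `coarse_location` is a hypothetical helper; in the paper the scores come from a LERF-style language-embedded NeRF and the centroid seeds a point cloud registration, neither of which is reproduced here:

```python
import numpy as np

def coarse_location(points, relevancy, threshold=0.5):
    """Coarse object localization sketch.

    points: (N, 3) array of 3D positions from a reconstruction.
    relevancy: (N,) language-relevancy scores in [0, 1] for one prompt.
    Returns the centroid of points above the activation threshold, or
    None if no point activates for the prompt.
    """
    mask = relevancy > threshold
    if not mask.any():
        return None
    return points[mask].mean(axis=0)

# toy example: two relevant points near the origin, one distractor
pts = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [10.0, 10.0, 10.0]])
scores = np.array([0.9, 0.8, 0.1])
center = coarse_location(pts, scores)
```

The activation threshold examined in the paper corresponds to `threshold` here: too low and distractor geometry pulls the centroid away, too high and no points survive.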


Beyond Image-Text Matching: Verb Understanding in Multimodal Transformers Using Guided Masking

Beňová, Ivana, Košecká, Jana, Gregor, Michal, Tamajka, Martin, Veselý, Marcel, Šimko, Marián

arXiv.org Artificial Intelligence

The dominant probing approaches rely on the zero-shot performance of image-text matching tasks to gain a finer-grained understanding of the representations learned by recent multimodal image-language transformer models. The evaluation is carried out on carefully curated datasets focusing on counting, relations, attributes, and others. This work introduces an alternative probing strategy called guided masking. The proposed approach ablates different modalities using masking and assesses the model's ability to predict the masked word with high accuracy. We focus on studying multimodal models that consider regions of interest (ROI) features obtained by object detectors as input tokens. We probe the understanding of verbs using guided masking on ViLBERT, LXMERT, UNITER, and VisualBERT and show that these models can predict the correct verb with high accuracy.

[Figure 1: Image from the SVO-Probes dataset (Hendricks and Nematzadeh, 2021). It consists of image-caption pairs, where the sentence either correctly describes the image (positive example) or one aspect of the sentence (subject, verb, or object) does not match the image (negative example). These pairs are used to probe models through zero-shot image-text matching. Example of a positive caption: "A person walking on a trail."]
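A guided-masking probe of the kind described can be sketched as: mask the verb in each caption, ask the model for its top predictions at the masked position, and score whether the original verb is recovered. `predict_topk` is a hypothetical stand-in for a multimodal masked-LM head (the paper probes ViLBERT, LXMERT, UNITER, and VisualBERT); the toy predictor below exists only to make the sketch runnable:

```python
def guided_masking_accuracy(examples, predict_topk):
    """Guided-masking probe sketch.

    examples: list of (tokens, verb_index) pairs, where tokens is the
    caption as a list of words and verb_index marks the verb to mask.
    predict_topk(tokens, mask_index): hypothetical model interface that
    returns the top-k candidate words for the masked position (given
    the image features as well, in the real multimodal setting).
    Returns the fraction of captions whose verb is recovered.
    """
    hits = 0
    for tokens, verb_index in examples:
        masked = list(tokens)
        target = masked[verb_index]
        masked[verb_index] = "[MASK]"  # ablate only the verb token
        if target in predict_topk(masked, verb_index):
            hits += 1
    return hits / len(examples)

# toy predictor standing in for a multimodal masked-LM head
def predict_topk(tokens, mask_index):
    return ["walking", "running"]

examples = [
    (["A", "person", "walking", "on", "a", "trail"], 2),
    (["A", "dog", "eats", "food"], 2),
]
acc = guided_masking_accuracy(examples, predict_topk)
```

Unlike zero-shot image-text matching, this probe asks the model to reconstruct the verb rather than merely to rank a mismatched caption lower, which is why it can expose finer-grained verb understanding.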


Semantic Abstraction: Open-World 3D Scene Understanding from 2D Vision-Language Models

Ha, Huy, Song, Shuran

arXiv.org Artificial Intelligence

We study open-world 3D scene understanding, a family of tasks that require agents to reason about their 3D environment with an open-set vocabulary and out-of-domain visual inputs - a critical skill for robots operating in the unstructured 3D world. Towards this end, we propose Semantic Abstraction (SemAbs), a framework that equips 2D Vision-Language Models (VLMs) with new 3D spatial capabilities, while maintaining their zero-shot robustness. We achieve this abstraction using relevancy maps extracted from CLIP, and learn 3D spatial and geometric reasoning skills on top of those abstractions in a semantic-agnostic manner. We demonstrate the usefulness of SemAbs on two open-world 3D scene understanding tasks: 1) completing partially observed objects and 2) localizing hidden objects from language descriptions. Experiments show that SemAbs can generalize to novel vocabulary, materials/lighting, classes, and domains (i.e., real-world scans) after training on limited 3D synthetic data. Code and data are available at https://semantic-abstraction.cs.columbia.edu/
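The semantic-agnostic abstraction idea - handing the 3D module only geometry plus a relevancy channel, never raw RGB semantics - can be sketched as back-projecting a 2D CLIP relevancy map into a voxel grid using depth. This assumes a simple pinhole camera; the function and its parameter names are illustrative, not SemAbs's actual pipeline:

```python
import numpy as np

def lift_relevancy(relevancy, depth, fx, fy, cx, cy, voxel_size, grid_shape):
    """Back-project a 2D relevancy map into a 3D voxel grid (sketch).

    relevancy, depth: (H, W) arrays; fx, fy, cx, cy: pinhole intrinsics.
    Each pixel with valid depth is unprojected to a 3D point and the
    voxel it lands in keeps the maximum relevancy seen so far, so the
    downstream 3D reasoner sees occupancy + relevancy only.
    """
    H, W = depth.shape
    grid = np.zeros(grid_shape)
    bounds = np.array(grid_shape)
    for v in range(H):
        for u in range(W):
            z = depth[v, u]
            if z <= 0:  # skip invalid depth
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            idx = (np.array([x, y, z]) / voxel_size).astype(int)
            if ((0 <= idx) & (idx < bounds)).all():
                grid[tuple(idx)] = max(grid[tuple(idx)], relevancy[v, u])
    return grid

# toy example: 2x2 image at unit depth, unit-focal camera
depth = np.ones((2, 2))
rel = np.array([[0.2, 0.7], [0.0, 0.5]])
grid = lift_relevancy(rel, depth, fx=1, fy=1, cx=0, cy=0,
                      voxel_size=1.0, grid_shape=(4, 4, 4))
```

Because the 3D module never sees which word produced the relevancy, the same learned reasoner transfers to novel vocabulary - the generalization property the experiments above report.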


Natural Numerical Networks for Natura 2000 habitats classification by satellite images

Mikula, Karol, Kollar, Michal, Ozvat, Aneta A., Ambroz, Martin, Cahojova, Lucia, Jarolimek, Ivan, Sibik, Jozef, Sibikova, Maria

arXiv.org Artificial Intelligence

Natural numerical networks are introduced as a new classification algorithm based on the numerical solution of nonlinear partial differential equations of forward-backward diffusion type on complete graphs. The proposed natural numerical network is applied to an important open environmental and nature conservation task: the automated identification of protected habitats from satellite images. In the natural numerical network, the forward diffusion causes the movement of points in a feature space toward each other. The opposite effect, keeping the points away from each other, is caused by backward diffusion. This yields the desired classification. The natural numerical network contains a few parameters that are optimized in the learning phase of the method. After learning the parameters and optimizing the topology of the network graph, the classification necessary for habitat identification is performed. A relevancy map for each habitat is introduced as a tool for validating the classification and finding new appearances of Natura 2000 habitats.
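The forward-backward diffusion dynamics can be sketched as one explicit Euler step on a complete graph of feature points: a positive diffusion coefficient pulls same-class points together, a negative one pushes different-class points apart. This is an illustrative toy with a fixed coefficient, not the paper's actual nonlinear scheme or its learned parameters:

```python
import numpy as np

def diffusion_step(X, same_class, dt=0.1, eps=1.0):
    """One explicit Euler step of forward-backward diffusion (sketch).

    X: (N, d) feature-space points; same_class: (N, N) boolean matrix.
    Forward diffusion (coefficient +eps) moves same-class points toward
    each other; backward diffusion (coefficient -eps) drives points of
    different classes apart.
    """
    X_new = X.copy()
    for i in range(len(X)):
        for j in range(len(X)):
            if i == j:
                continue
            coeff = eps if same_class[i, j] else -eps
            # explicit scheme: read old positions X, write into X_new
            X_new[i] += dt * coeff * (X[j] - X[i])
    return X_new

# toy example: two 1-D points of the same class contract toward each other
X = np.array([[0.0], [1.0]])
same = np.ones((2, 2), dtype=bool)
X_forward = diffusion_step(X, same)

# the same two points, treated as different classes, repel
diff = np.eye(2, dtype=bool)
X_backward = diffusion_step(X, diff)
```

Iterating such steps clusters same-class points and separates clusters, which is what yields the classification described above.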