AITopics | referential

Towards Multi Turn Referential Grounded Video Chat with Large Language Models

Neural Information Processing Systems | Jun-16-2026, 20:25:09 GMT

Achieving fine-grained spatio-temporal understanding in videos remains a major challenge for current Video Large Multimodal Models (Video LMMs). Addressing this challenge requires mastering two core capabilities: video referring understanding, which captures the semantics of video regions, and video grounding, which segments object regions based on natural language descriptions. However, most existing approaches tackle these tasks in isolation, limiting progress toward unified, referentially grounded video interaction. We identify a key bottleneck in the lack of high-quality, unified video instruction data and a comprehensive benchmark for evaluating referentially grounded video chat. To address these challenges, we contribute in three core aspects: dataset, model, and benchmark. First, we introduce SAMA-239K, a large-scale dataset comprising 15K videos specifically curated to enable joint learning of video referring understanding, grounding, and multi-turn video chat. Second, we propose the SAMA model, which incorporates a versatile spatio-temporal context aggregator and a Segment Anything Model to jointly enhance fine-grained video comprehension and precise grounding capabilities. Finally, we establish SAMA-Bench, a meticulously designed benchmark consisting of 5,067 questions from 522 videos, to comprehensively evaluate the integrated capabilities of Video LMMs in multi-turn, spatio-temporal referring understanding and grounded dialogue. Extensive experiments and benchmarking results show that SAMA not only achieves strong performance on SAMA-Bench but also sets a new state-of-the-art on general grounding benchmarks, while maintaining highly competitive performance on standard visual understanding benchmarks.

large language model, machine learning, natural language, (19 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.87)

Industry: Leisure & Entertainment > Sports (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

GUMBridge: a Corpus for Varieties of Bridging Anaphora

Levine, Lauren, Zeldes, Amir

arXiv.org Artificial Intelligence | Dec-9-2025

Bridging is an anaphoric phenomenon where the referent of an entity in a discourse is dependent on a previous, non-identical entity for interpretation, such as in "There is 'a house'. 'The door' is red," where the door is specifically understood to be the door of the aforementioned house. While there are several existing resources in English for bridging anaphora, most are small, provide limited coverage of the phenomenon, and/or provide limited genre coverage. In this paper, we introduce GUMBridge, a new resource for bridging, which includes 16 diverse genres of English, providing both broad coverage for the phenomenon and granular annotations for the subtype categorization of bridging varieties. We also present an evaluation of annotation quality and report on baseline performance using open and closed source contemporary LLMs on three tasks underlying our data, showing that bridging resolution and subtype classification remain difficult NLP tasks in the age of LLMs.

annotation, large language model, natural language, (18 more...)

2512.07134

Country:

North America > United States (1.00)
Europe (1.00)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

SORT3D: Spatial Object-centric Reasoning Toolbox for Zero-Shot 3D Grounding Using Large Language Models

Zantout, Nader, Zhang, Haochen, Kachana, Pujith, Qiu, Jinkai, Chen, Guofei, Zhang, Ji, Wang, Wenshan

arXiv.org Artificial Intelligence | Aug-18-2025

Interpreting object-referential language and grounding objects in 3D with spatial relations and attributes is essential for robots operating alongside humans. However, this task is often challenging due to the diversity of scenes, large number of fine-grained objects, and complex free-form nature of language references. Furthermore, in the 3D domain, obtaining large amounts of natural language training data is difficult. Thus, it is important for methods to learn from little data and zero-shot generalize to new environments. To address these challenges, we propose SORT3D, an approach that utilizes rich object attributes from 2D data and merges a heuristics-based spatial reasoning toolbox with the ability of large language models (LLMs) to perform sequential reasoning. Importantly, our method does not require text-to-3D data for training and can be applied zero-shot to unseen environments. We show that SORT3D achieves state-of-the-art zero-shot performance on complex view-dependent grounding tasks on two benchmarks. We also implement the pipeline to run real-time on two autonomous vehicles and demonstrate that our approach can be used for object-goal navigation on previously unseen real-world environments. All source code for the system pipeline is publicly released at https://github.com/nzantout/SORT3D.

artificial intelligence, large language model, natural language, (16 more...)

2504.18684

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

IRef-VLA: A Benchmark for Interactive Referential Grounding with Imperfect Language in 3D Scenes

Zhang, Haochen, Zantout, Nader, Kachana, Pujith, Zhang, Ji, Wang, Wenshan

arXiv.org Artificial Intelligence | Mar-20-2025

With the recent rise of large language models, vision-language models, and other general foundation models, there is growing potential for multimodal, multi-task robotics that can operate in diverse environments given natural language input. One such application is indoor navigation using natural language instructions. However, despite recent progress, this problem remains challenging due to the 3D spatial reasoning and semantic understanding required. Additionally, the language used may be imperfect or misaligned with the scene, further complicating the task. To address this challenge, we curate a benchmark dataset, IRef-VLA, for Interactive Referential Vision and Language-guided Action in 3D Scenes with imperfect references. IRef-VLA is the largest real-world dataset for the referential grounding task, consisting of over 11.5K scanned 3D rooms from existing datasets, 7.6M heuristically generated semantic relations, and 4.7M referential statements. Our dataset also contains semantic object and room annotations, scene graphs, and navigable free space annotations, and is augmented with statements where the language has imperfections or ambiguities. We verify the generalizability of our dataset by evaluating with state-of-the-art models to obtain a performance baseline and also develop a graph-search baseline to demonstrate the performance bound and generation of alternatives using scene-graph knowledge. With this benchmark, we aim to provide a resource for 3D scene understanding that aids the development of robust, interactive navigation systems. The dataset and all source code are publicly released at https://github.com/HaochenZ11/IRef-VLA.
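
To make the idea of heuristically generated relations and a scene-graph lookup concrete, the following is a minimal sketch in Python. It is not the IRef-VLA code: the `SceneObject` type, the distance-based "near" heuristic, the 1.5 m threshold, and the toy room are all assumptions introduced here for illustration. It shows how a statement such as "the chair near the table" could be resolved against a scene graph, and how an imperfect reference (one whose anchor object does not exist) yields no match.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneObject:
    """Hypothetical scene-graph node: an id, a class label, and a centroid."""
    name: str
    label: str
    center: tuple  # (x, y, z) in metres


def build_relations(objects, near_threshold=1.5):
    """Generate (subject, "near", object) triples from centroid distance.

    The threshold is an assumed heuristic, not a value from the paper.
    """
    relations = set()
    for a in objects:
        for b in objects:
            if a is not b and math.dist(a.center, b.center) <= near_threshold:
                relations.add((a.name, "near", b.name))
    return relations


def resolve(objects, relations, target_label, relation, anchor_label):
    """Return the names of all objects matching '<target> <relation> <anchor>'.

    An empty list signals a reference that does not fit the scene;
    more than one entry signals an ambiguous reference.
    """
    by_name = {o.name: o for o in objects}
    matches = {
        subj
        for (subj, rel, obj) in relations
        if rel == relation
        and by_name[subj].label == target_label
        and by_name[obj].label == anchor_label
    }
    return sorted(matches)


# Toy room, invented for this example.
ROOM = [
    SceneObject("chair_1", "chair", (0.0, 0.0, 0.0)),
    SceneObject("chair_2", "chair", (4.0, 0.0, 0.0)),
    SceneObject("table_1", "table", (1.0, 0.0, 0.0)),
    SceneObject("lamp_1", "lamp", (4.0, 1.0, 0.0)),
]
RELATIONS = build_relations(ROOM)

print(resolve(ROOM, RELATIONS, "chair", "near", "table"))
print(resolve(ROOM, RELATIONS, "chair", "near", "sofa"))
```

In this toy room the first query singles out one chair, while the second returns nothing because no sofa exists; an interactive system could respond to the empty result by proposing alternatives, for instance the chairs near other objects.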

large language model, natural language, object-oriented architecture, (17 more...)

2503.17406

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
Europe > Italy > Tuscany > Florence (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.66)

Indication Finding: a novel use case for representation learning

Eckhoff, Maren, Selimi, Valmir, Aranovitch, Alexander, Lyons, Ian, Briggs, Emily, Hou, Jennifer, Devereson, Alex, Macak, Matej, Champagne, David, Anagnostopoulos, Chris

arXiv.org Artificial Intelligence | Oct-24-2024

Many therapies are effective in treating multiple diseases. We present an approach that leverages methods developed in natural language processing and real-world data to prioritize potential, new indications for a mechanism of action (MoA). We specifically use representation learning to generate embeddings of indications and prioritize them based on their proximity to the indications with the strongest available evidence for the MoA. We demonstrate the successful deployment of our approach for anti-IL-17A using embeddings generated with SPPMI and present an evaluation framework to determine the quality of indication finding results and the derived embeddings.
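
As a rough illustration of ranking indications by embedding proximity, the sketch below builds sparse SPPMI (shifted positive pointwise mutual information) vectors from co-occurrence counts and ranks candidates by cosine similarity to a set of anchor indications. The record format, the placeholder labels (`ind_A`, `ind_X`, and so on), the use of raw sparse SPPMI rows as embeddings, and the max-cosine scoring rule are assumptions made for this example; the abstract does not specify how the authors derive or compare their embeddings.

```python
import math
from collections import Counter
from itertools import combinations


def sppmi_vectors(records, shift_k=1.0):
    """Build sparse SPPMI vectors: max(PMI(a, b) - log(shift_k), 0).

    `records` is an iterable of item collections (for example, the
    indications recorded for one patient). shift_k must be positive.
    """
    pair_counts = Counter()
    for record in records:
        for a, b in combinations(sorted(set(record)), 2):
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    total = sum(pair_counts.values())
    marginals = Counter()
    for (a, _), n in pair_counts.items():
        marginals[a] += n
    vectors = {item: {} for item in marginals}
    for (a, b), n in pair_counts.items():
        pmi = math.log(n * total / (marginals[a] * marginals[b]))
        value = pmi - math.log(shift_k)
        if value > 0:  # keep only positive entries, so vectors stay sparse
            vectors[a][b] = value
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidates(vectors, anchors):
    """Rank non-anchor items by their best cosine similarity to any anchor."""
    scores = {}
    for item, vec in vectors.items():
        if item in anchors:
            continue
        scores[item] = max(
            (cosine(vec, vectors[a]) for a in anchors if a in vectors),
            default=0.0,
        )
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Placeholder records, invented for this example.
RECORDS = [
    ["ind_A", "ind_B", "ind_C"],
    ["ind_A", "ind_B"],
    ["ind_A", "ind_C"],
    ["ind_B", "ind_C"],
    ["ind_X", "ind_Y"],
    ["ind_X", "ind_Y", "ind_Z"],
    ["ind_Y", "ind_Z"],
]

for name, score in rank_candidates(sppmi_vectors(RECORDS), {"ind_A"}):
    print(f"{name}: {score:.2f}")
```

Here `ind_A` plays the role of an indication with strong evidence; candidates that share co-occurrence context with it rank above those from the unrelated group. In practice the sparse SPPMI matrix is often factorized into dense embeddings before comparison, which this sketch omits.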

artificial intelligence, machine learning, natural language, (20 more...)

2410.19174

Country: North America > United States (0.04)

Genre:

Research Report > Strength High (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Rheumatology (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Fuzzy Temporal Protoforms for the Quantitative Description of Processes in Natural Language

Fontenla-Seco, Yago, Bugarín-Diz, Alberto, Lama, Manuel

arXiv.org Artificial Intelligence | May-16-2023

In this paper, we propose a series of fuzzy temporal protoforms in the framework of the automatic generation of quantitative and qualitative natural language descriptions of processes. The model includes temporal and causal information from processes and attributes, quantifies attributes in time during the process life-span and recalls causal relations and temporal distances between events, among other features. Through integrating process mining techniques and fuzzy sets within the usual Data-to-Text architecture, our framework is able to extract relevant quantitative temporal as well as structural information from a process and describe it in natural language involving uncertain terms. A real use-case in the cardiology domain is presented, showing the potential of our model for providing natural language explanations addressed to domain experts.
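
For readers unfamiliar with protoforms, the sketch below evaluates the simplest fuzzy quantified statement, "Q of the cases are A", using Zadeh's relative sigma-count: the truth degree is the quantifier's membership of the mean membership in A. It is a non-temporal simplification and not the authors' model; the membership functions for "long" and "most", the waiting-time data, and the sentence template are all assumptions chosen for illustration.

```python
def ramp_up(x, low, high):
    """Piecewise-linear membership: 0 at or below low, 1 at or above high."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def is_long(minutes):
    """Assumed fuzzy set 'long waiting time' (fully true from 60 minutes)."""
    return ramp_up(minutes, 30, 60)


def most(proportion):
    """Assumed fuzzy quantifier 'most' (fully true from a proportion of 0.8)."""
    return ramp_up(proportion, 0.3, 0.8)


def truth_of_quantified(values, summarizer, quantifier):
    """Truth degree of 'Q of the values are A' via the relative sigma-count."""
    if not values:
        return 0.0
    proportion = sum(summarizer(v) for v in values) / len(values)
    return quantifier(proportion)


# Invented waiting times in minutes, one per process case.
WAITS = [20, 45, 60, 75, 90]

truth = truth_of_quantified(WAITS, is_long, most)
print(f"In most cases, the waiting time is long (truth degree {truth:.2f}).")
```

A description generator built along these lines would evaluate many candidate quantifier and summarizer pairs and verbalize only those whose truth degree exceeds a chosen cutoff.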

artificial intelligence, natural language, protoform, (19 more...)

doi: 10.1109/FUZZ45933.2021.9494444

2305.09506

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.05)
Europe > Spain > Galicia > A Coruña Province > Santiago de Compostela (0.05)
North America > United States (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.95)

Bottom-up top-down detection transformers for open vocabulary object detection

AIHub | Jan-23-2023, 11:30:54 GMT

We perform open vocabulary detection of the objects mentioned in the sentence using both bottom-up and top-down feedback. Object detection is the fundamental computer vision task of finding all "objects" that are present in a visual scene. However, this raises the question, what is an object? Typically, this question is side-stepped by defining a vocabulary of categories and then training a model to detect instances of this vocabulary. This means that if "apple" is not in this vocabulary, the model does not consider it as an object.

artificial intelligence, detection, detector, (14 more...)

Genre: Research Report (0.31)

Technology: Information Technology > Artificial Intelligence > Vision (1.00)

10 Best Machine Learning Textbooks that All Data Scientists Should Read

#artificialintelligence | Apr-16-2022, 20:50:12 GMT

Machine learning is an intimidating subject. Knowing where to develop mastery around such a massive subject that encompasses so many fields, research topics, and applications can be the hardest part of the journey. Anyone with a background in programming will attest to the value of a good textbook, especially when it comes to a subject as technical as machine learning. Whether you're a complete novice or a distinguished mastermind in this field, we at iMerit have compiled the best field guides, icebreakers, and referential machine learning textbooks that will suit both newcomers and veterans alike who are looking to improve their understanding of machine learning.

learning, machine learning, textbook, (14 more...)

Genre:

Summary/Review (0.51)
Instructional Material > Course Syllabus & Notes (0.30)

Industry: Education (0.32)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Automating the Generation of High School Geometry Proofs using Prolog in an Educational Context

Font, Ludovic, Cyr, Sébastien, Richard, Philippe R., Gagnon, Michel

arXiv.org Artificial Intelligence | Feb-28-2020

When working on intelligent tutor systems designed for mathematics education and its specificities, an interesting objective is to provide relevant help to the students by anticipating their next steps. This can only be done by knowing, beforehand, the possible ways to solve a problem. Hence the need for an automated theorem prover that provides proofs as they would be written by a student. To achieve this objective, logic programming is a natural tool due to the similarity of its reasoning with a mathematical proof by inference. In this paper, we present the core ideas we used to implement such a prover, from its encoding in Prolog to the generation of the complete set of proofs. However, when dealing with educational aspects, there are many challenges to overcome. We also present the main issues we encountered, as well as the chosen solutions. The QED-Tutrix software [15, 19] provides an environment where a high school student can solve geometry proof problems. One of its key features is that it allows the student to provide proof elements in any order, not limiting them to forward- or backward-chaining. For instance, when solving the simple problem "prove that a quadrilateral with three right angles is a rectangle", the student can provide any element of any possible proof, such as a direct consequence of the hypotheses ("if two lines are perpendicular to a third, they are parallel"), a necessary premise for the conclusion ("a rectangle is a quadrilateral that has four right angles"), or anything in between ("the quadrilateral ABCD is a parallelogram"). A second key feature is the tutoring aspect. When the student is stuck in the resolution, the software is able to provide them with relevant messages.
In the previous example, if the student entered "the quadrilateral ABCD is a parallelogram" and is stuck afterwards, the software identifies that they are working on a proof using parallelogram properties, and will provide them messages such as "what is the definition of a parallelogram?" or "is there a relation between parallelogram and rectangle?" These features, the flexibility in exploration and the tutoring, are very interesting from a mathematics education perspective, but come with a cost.
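
The kind of inference described above can be sketched with a small forward-chaining loop. This is an illustration in Python rather than Prolog, and it is not the QED-Tutrix prover: the propositional fact names, the four rules, and the helper functions are assumptions made for this example, and the loop records only one derivation per fact rather than the complete set of proofs.

```python
def forward_chain(facts, rules):
    """Apply rules until no new fact appears.

    Returns the set of known facts and, for each derived fact, the
    premises of the first rule that produced it.
    """
    known = set(facts)
    derivation = {}
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                derivation[conclusion] = premises
                changed = True
    return known, derivation


def proof_steps(goal, derivation):
    """List (premises, conclusion) steps leading to goal, premises first."""
    steps = []
    seen = set()

    def visit(fact):
        if fact in seen or fact not in derivation:
            return  # already handled, or a hypothesis with no derivation
        seen.add(fact)
        for premise in sorted(derivation[fact]):
            visit(premise)
        steps.append((sorted(derivation[fact]), fact))

    visit(goal)
    return steps


# Hypotheses: quadrilateral ABCD has right angles at A, B and C.
HYPOTHESES = {"AB_perp_AD", "AB_perp_BC", "BC_perp_CD"}

# Hypothetical rule base: (premises, conclusion).
RULES = [
    # Two lines perpendicular to a third are parallel.
    (frozenset({"AB_perp_AD", "AB_perp_BC"}), "AD_parallel_BC"),
    (frozenset({"AB_perp_BC", "BC_perp_CD"}), "AB_parallel_CD"),
    # Two pairs of parallel opposite sides make a parallelogram.
    (frozenset({"AD_parallel_BC", "AB_parallel_CD"}), "ABCD_is_parallelogram"),
    # A parallelogram with a right angle is a rectangle.
    (frozenset({"ABCD_is_parallelogram", "AB_perp_AD"}), "ABCD_is_rectangle"),
]

known, derivation = forward_chain(HYPOTHESES, RULES)
for premises, conclusion in proof_steps("ABCD_is_rectangle", derivation):
    print(f"{' and '.join(premises)} => {conclusion}")
```

With the derivation recorded, a tutor could look up which rule consumes a fact the student has just entered (for example `ABCD_is_parallelogram`) and hint at that rule's conclusion or its remaining premises.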

inference, prolog, triangle, (13 more...)

doi: 10.4204/EPTCS.313.1

2002.12551

Country:

North America > Canada > Quebec > Montreal (0.15)
Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)

Genre:

Overview (0.68)
Research Report (0.50)

Industry:

Education > Curriculum > Subject-Specific Education (0.54)
Education > Educational Setting > K-12 Education > Secondary School (0.42)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.88)

That and There: Judging the Intent of Pointing Actions with Robotic Arms

Alikhani, Malihe, Khalid, Baber, Shome, Rahul, Mitash, Chaitanya, Bekris, Kostas, Stone, Matthew

arXiv.org Artificial Intelligence | Dec-13-2019

Collaborative robotics requires effective communication between a robot and a human partner. This work proposes a set of interpretive principles for how a robotic arm can use pointing actions to communicate task information to people by extending existing models from the related literature. These principles are evaluated through studies where English-speaking human subjects view animations of simulated robots instructing pick-and-place tasks. The evaluation distinguishes two classes of pointing actions that arise in pick-and-place tasks: referential pointing (identifying objects) and locating pointing (identifying locations). The study indicates that human subjects show greater flexibility in interpreting the intent of referential pointing compared to locating pointing, which needs to be more deliberate. The results also demonstrate the effects of variation in the environment and task context on the interpretation of pointing. Our corpus, experiments and design principles advance models of context, common sense reasoning and communication in embodied communication.

interpretation, referential, robot, (15 more...)

1912.06602

Country:

North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Robots in the Workplace (0.57)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.54)
